Bicocca Open Archive

A Neural Embedding for Source Code: Security Analysis and CWE Lists

Saletta, M; Ferretti, C

2020

Abstract

In this paper, we design a technique for mapping source code into a vector space and show its application to the recognition of security weaknesses. By applying ideas commonly used in Natural Language Processing, we train a model that produces an embedding of programs starting from their Abstract Syntax Trees. We then show that this embedding yields clusters that roughly separate different classes of software weaknesses. Although the embedding is trained in an unsupervised way on a generic Java dataset, we show that the model can be used for supervised learning of specific classes of vulnerabilities, helping to capture some of the features that distinguish them in code. Finally, we discuss how our model performs on the different types of vulnerabilities categorized by the CWE initiative.
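The idea of placing programs in a vector space via their syntax trees can be illustrated with a deliberately simplified sketch. It is not the model described in the abstract: it assumes Python's standard `ast` module as a stand-in for a Java parser, and it replaces the learned neural embedding with plain counts of AST node types. The helper names (`node_type_tokens`, `count_vector`, `cosine`) are hypothetical and introduced only for illustration.

```python
import ast
import math
from collections import Counter


def node_type_tokens(source):
    """Linearize a program into the list of its AST node type names."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def count_vector(source):
    """Map a program to a sparse vector of node-type counts.

    Assumption: this bag-of-node-types vector is only a stand-in for a
    learned embedding; it ignores tree structure and context entirely.
    """
    return Counter(node_type_tokens(source))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two loops with the same syntactic shape but different identifiers and
# constants, plus a structurally different conditional.
loop_a = "for i in range(10):\n    total = i * 2\n"
loop_b = "for j in range(5):\n    acc = j * 3\n"
branch = "if flag:\n    print('x')\n"

sim_same = cosine(count_vector(loop_a), count_vector(loop_b))
sim_diff = cosine(count_vector(loop_a), count_vector(branch))
print(sim_same, sim_diff)
```

In this toy setting the two loops land on the same point because identifiers and constants do not affect node types, while the conditional lies further away. A learned embedding would presumably capture much richer structure than these counts do.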

	Contribution type
	
				paper
			
	Keywords
	
				Security; source code embedding; static analysis; vulnerability classification
			
	Content language
	
				English
			
	Conference name
	
				18th IEEE International Conference on Dependable, Autonomic and Secure Computing, 18th IEEE International Conference on Pervasive Intelligence and Computing, 6th IEEE International Conference on Cloud and Big Data Computing and 5th IEEE Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2020 - 17 August 2020 through 24 August 2020
			
	Conference year
	
				2020
			
	Proceedings title
	
				Proceedings - IEEE 18th International Conference on Dependable, Autonomic and Secure Computing, IEEE 18th International Conference on Pervasive Intelligence and Computing, IEEE 6th International Conference on Cloud and Big Data Computing and IEEE 5th Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2020
			
	ISBN of the proceedings volume
	
				978-1-7281-6609-4
			
	Publication date
	
				2020
			
	First page
	
				523
			
	Last page
	
				530
			
	Article number
	
				9251115
			
	DOI of the contribution
	
				https://dx.doi.org/10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00095
			
	Fulltext
	
				none
			
	Citation
	
				Saletta, M., Ferretti, C. (2020). A Neural Embedding for Source Code: Security Analysis and CWE Lists. In Proceedings - IEEE 18th International Conference on Dependable, Autonomic and Secure Computing, IEEE 18th International Conference on Pervasive Intelligence and Computing, IEEE 6th International Conference on Cloud and Big Data Computing and IEEE 5th Cyber Science and Technology Congress, DASC/PiCom/CBDCom/CyberSciTech 2020 (pp.523-530). Institute of Electrical and Electronics Engineers Inc. [10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00095].
			
	Appears in types:
	
				02 - Conference contribution

Files in this product:

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/299207
