Bicocca Open Archive

Automatic detection of sources and sinks in arbitrary Java libraries

Sas, D.; Bessi, M.; Arcelli Fontana, F.

2018

Abstract

Over the last decade, data security has become a primary concern for a growing number of companies around the world. Protecting customers' privacy is now central to businesses in every kind of market, and the demand for technologies that safeguard user data and prevent data breaches has grown accordingly. In this work, we investigate a machine learning-based approach to automatically extract sources and sinks from arbitrary Java libraries. Our method exploits features based on semantic, syntactic, intra-procedural dataflow, and class-hierarchy traits embedded in the bytecode to distinguish sources and sinks. Our experiments show that, under certain conditions and after some preprocessing, sources and sinks across different libraries share common characteristics that allow a machine learning model to distinguish them from other library methods. The prototype model achieved 86% accuracy and an 81% F-measure on our validation set of roughly 600 methods.
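As an illustration of the kind of per-method features the abstract mentions, the following minimal sketch turns a Java method into a small numeric feature vector. It is not the authors' implementation: it uses reflection rather than bytecode analysis, covers only simple syntactic and class-hierarchy traits, and every name in it (`SignatureFeatureSketch`, the prefix lists, the feature keys) is a hypothetical choice made for this example. It assumes the usual taint-analysis reading of the terms, in which sources bring data into a program and sinks pass data out of it.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the feature set or the implementation used in the paper.
final class SignatureFeatureSketch {

    // Illustrative name prefixes (assumptions, not taken from the paper).
    private static final String[] READ_PREFIXES = {"get", "read", "load", "fetch"};
    private static final String[] WRITE_PREFIXES = {"set", "write", "send", "put"};

    // True if the method name starts with any of the given prefixes.
    static boolean startsWithAny(String name, String[] prefixes) {
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Walks the superclass chain to check whether `type` is, or extends, `ancestorName`.
    static boolean hasAncestor(Class<?> type, String ancestorName) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            if (c.getName().equals(ancestorName)) {
                return true;
            }
        }
        return false;
    }

    // Encodes a few syntactic and class-hierarchy traits as 0/1 flags or counts.
    static Map<String, Double> extract(Method m) {
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("namePrefixRead", startsWithAny(m.getName(), READ_PREFIXES) ? 1.0 : 0.0);
        features.put("namePrefixWrite", startsWithAny(m.getName(), WRITE_PREFIXES) ? 1.0 : 0.0);
        features.put("returnsValue", m.getReturnType() != void.class ? 1.0 : 0.0);
        features.put("paramCount", (double) m.getParameterCount());
        features.put("isPublic", Modifier.isPublic(m.getModifiers()) ? 1.0 : 0.0);
        features.put("inInputStreamHierarchy",
                hasAncestor(m.getDeclaringClass(), "java.io.InputStream") ? 1.0 : 0.0);
        return features;
    }

    // Convenience wrapper that converts the checked lookup exception to an unchecked one.
    static Method lookup(Class<?> type, String name, Class<?>... params) {
        try {
            return type.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A method that reads data in, and one that writes data out.
        System.out.println(extract(lookup(java.io.InputStream.class, "read")));
        System.out.println(extract(lookup(java.io.OutputStream.class, "write", int.class)));
    }
}
```

Vectors like these could be handed to any off-the-shelf classifier. The semantic and intra-procedural dataflow features named in the abstract would require inspecting method bodies in the bytecode, which this sketch does not attempt.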

Contribution type: paper
Keywords: Java; Machine Learning; Sink; Source; Static Analysis; Software; Safety, Risk, Reliability and Quality
Content language: English
Conference name: 18th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2018
Conference year: 2018
Proceedings title: Proceedings - 18th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2018
Proceedings ISBN: 978-153868290-6
Publication date: 2018
First page: 103
Last page: 112
Article number: 8530723
DOI: https://dx.doi.org/10.1109/SCAM.2018.00019
Alternative URL: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8528960
Full text: none
Citation: Sas, D., Bessi, M., Arcelli Fontana, F. (2018). Automatic detection of sources and sinks in arbitrary Java libraries. In Proceedings - 18th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2018 (pp. 103-112). Institute of Electrical and Electronics Engineers Inc. [10.1109/SCAM.2018.00019].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/219124
