Comparing and experimenting machine learning techniques for code smell detection

Arcelli Fontana, F; Mäntylä, M; Zanoni, M; Marino, A

doi:10.1007/s10664-015-9378-4

Several code smell detection tools have been developed, and they provide different results because smells can be interpreted subjectively, and hence detected, in different ways. In this paper, we perform what is, to the best of our knowledge, the largest experiment applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, while the worst was achieved by support vector machines. However, the lower prevalence of code smells in the entire data set (i.e., imbalanced data) caused varying performance, which needs to be addressed in future studies. We conclude that applying machine learning to the detection of these code smells can provide high accuracy (>96 %), and that only a hundred training examples are needed to reach at least 95 % accuracy.
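The abstract attributes part of the performance variation to the low prevalence of code smells, i.e., imbalanced data. The minimal Python sketch below illustrates why accuracy alone can be misleading in that setting. It is not the authors' evaluation pipeline: the counts are invented for illustration and are not taken from the paper, and the names `Confusion`, `evaluate`, `never_smelly` and `useful` are hypothetical. A detector that never reports a smell scores higher accuracy than one that actually finds most smells.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix where the positive class is 'smelly'."""
    tp: int  # smelly elements correctly flagged
    fp: int  # clean elements wrongly flagged
    fn: int  # smelly elements missed
    tn: int  # clean elements correctly left unflagged

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        smelly = self.tp + self.fn
        return self.tp / smelly if smelly else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def evaluate(labels, predictions) -> Confusion:
    """Count outcomes from parallel lists of booleans (True = smelly)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    return Confusion(tp, fp, fn, tn)


# Invented data set: 1000 code elements, only 20 (2 %) are smelly.
labels = [True] * 20 + [False] * 980

# Trivial detector that never reports a smell.
never_smelly = [False] * len(labels)

# Detector that finds 15 of the 20 smells but raises 25 false alarms.
useful = [True] * 15 + [False] * 5 + [True] * 25 + [False] * 955

for name, preds in (("never_smelly", never_smelly), ("useful", useful)):
    m = evaluate(labels, preds)
    print(f"{name}: accuracy={m.accuracy():.3f} "
          f"recall={m.recall():.3f} f1={m.f1():.3f}")
# never_smelly: accuracy=0.980 recall=0.000 f1=0.000
# useful: accuracy=0.970 recall=0.750 f1=0.500
```

With a 2 % prevalence the do-nothing detector reaches 98 % accuracy, so on imbalanced data it is worth reporting recall, precision or F1 alongside accuracy.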

Arcelli Fontana, F., Mäntylä, M., Zanoni, M., Marino, A. (2016). Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering, 21(3), 1143-1191 [10.1007/s10664-015-9378-4].



Subtype: Journal article - Scientific article
Keywords: Benchmark for code smell detection; Code smells detection; Machine learning techniques
Content language: English
Publication date: 2016
Journal: Empirical Software Engineering
Volume: 21
Issue: 3
First page: 1143
Last page: 1191
Article DOI: https://dx.doi.org/10.1007/s10664-015-9378-4
Alternative URL: http://link.springer.com/article/10.1007%2Fs10664-015-9378-4
Full text: reserved
Appears in item types: 01 - Journal article

Files in this item:

File	Description	Attachment type	Access	Size	Format
ERA-EXTENSION-SUBMITTED-3.docx	article	Author’s Accepted Manuscript, AAM (Post-print)	Archive managers only	601.83 kB	Microsoft Word XML
9-Comparing ML-ESE-Springer-2016.pdf		Publisher’s Version (Version of Record, VoR)	Archive managers only	1.9 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/84895

Citations

403

290

Bicocca Open Archive
