Bicocca Open Archive


Comment on "Using genomic data and machine learning to predict antibiotic resistance: A tutorial paper"

Chicco D.; Jurman G.

2025

Abstract

A recent study by Faye Orcales and colleagues proposes a teaching curriculum on supervised machine learning applied to genomics data to predict antibiotic resistance. The article describes a traditional machine learning pipeline step by step in a way that is accessible to anyone, including novices. However, the authors give a misleading piece of advice in the "Evaluating model performance" section, where they recommend that readers use accuracy and the F1 score for binary classification. We write this short formal comment on that article to reaffirm and explain why accuracy and the F1 score should be avoided in the evaluation of binary classification and why the Matthews correlation coefficient (MCC) should be employed instead. We also take this opportunity to warn readers about the dangers of k-fold cross-validation, which is suggested as a standard method for dividing data into a training set and a test set but has several flaws and pitfalls.
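The abstract does not spell out the arithmetic behind its argument, so here is a minimal sketch (our own illustration, not code from the comment or from the tutorial it discusses) that computes accuracy, the F1 score and the MCC from the four confusion-matrix counts (TP, FP, TN, FN) using their standard definitions. The function name `binary_metrics` and the counts are hypothetical: an imbalanced test set with 91 positives and 9 negatives, scored by a classifier that labels almost everything as positive. Accuracy and F1 both look high, while the MCC is slightly negative: F1 never uses TN, accuracy is dominated by the majority class, and the MCC is high only when the classifier does well on both classes. Returning 0 when the MCC denominator is zero is a common convention, assumed here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1, MCC) computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN).
    # Note that TN does not appear anywhere in the formula.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four cells of the confusion matrix.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is undefined when a marginal sum is zero; report 0.
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return accuracy, f1, mcc


# Hypothetical imbalanced test set: 91 positives, 9 negatives.
# The classifier gets 90 positives right but misclassifies every negative.
acc, f1, mcc = binary_metrics(tp=90, fp=9, tn=0, fn=1)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
# prints: accuracy=0.900  F1=0.947  MCC=-0.032
```

Relabelling the minority class as positive, `binary_metrics(tp=0, fp=1, tn=90, fn=9)`, drops F1 to 0 while accuracy and MCC stay exactly the same, which shows that F1 also depends on which class is called "positive", whereas the MCC treats the two classes symmetrically.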

Subtype: Journal article - Scientific article
Keywords: machine learning; antibiotic resistance; data science; binary classification
Content language: English
Ahead-of-print or first online publication date: 1-Dec-2025
Publication date: 2025
Journal: PLOS COMPUTATIONAL BIOLOGY
Volume: 21
Issue: 12
Article number: e1013673
Article DOI: https://dx.doi.org/10.1371/journal.pcbi.1013673
Full text: open access
Citation: Chicco, D., Jurman, G. (2025). Comment on "Using genomic data and machine learning to predict antibiotic resistance: A tutorial paper". PLOS COMPUTATIONAL BIOLOGY, 21(12) [10.1371/journal.pcbi.1013673].
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Chicco-Jurman-2025-PLoS Computational Biology-VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	316.7 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/580984
