Bicocca Open Archive

The importance of being external. Methodological insights for the external validation of machine learning models in medicine

Cabitza F.; Campagner A.; Soares F.; Garcia de Guadiana-Romualdo L.; Challa F.; Sulejmani A.; Seghezzi M.; Carobene A.

2021

Abstract

Background and Objective: Medical machine learning (ML) models tend to perform better on data from the same cohort than on new data, often due to overfitting or covariate shift. For these reasons, external validation (EV) is a necessary practice in the evaluation of medical ML. However, there is still a gap in the literature on how to interpret EV results and hence assess the robustness of ML models.

Methods: We fill this gap by proposing a meta-validation method to assess the soundness of EV procedures. In doing so, we complement the usual way to assess EV by considering both the cardinality of the EV dataset and its similarity to the training set. We then investigate how the notions of cardinality and similarity can be used to inform on the reliability of a validation procedure, by integrating them into two summative data visualizations.

Results: We illustrate our methodology by applying it to the validation of a state-of-the-art COVID-19 diagnostic model on 8 EV sets, collected across 3 different continents. The model performance was moderately impacted by data similarity (Pearson ρ = 0.38, p < 0.001). In the EV, the validated model reported good AUC (average: 0.84), acceptable calibration (average: 0.17) and utility (average: 0.50). The validation datasets were adequate in terms of dataset cardinality and similarity, thus suggesting the soundness of the results. We also provide a qualitative guideline to evaluate the reliability of validation procedures, and we discuss the importance of proper external validation in light of the obtained results.

Conclusions: In this paper, we propose a novel, lean methodology to: 1) study how the similarity between training and validation sets impacts the generalizability of an ML model; 2) assess the soundness of EV evaluations along three complementary performance dimensions: discrimination, utility and calibration; 3) draw conclusions on the robustness of the model under validation. We applied this methodology to a state-of-the-art model for the diagnosis of COVID-19 from routine blood tests, and showed how to interpret the results in light of the presented framework.
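As a rough illustration of the quantities named in the abstract, the sketch below computes a discrimination score (AUC), a calibration score, and the Pearson correlation between per-dataset similarity and performance. It is not the authors' implementation: the abstract does not say which calibration, utility, or similarity measures the paper uses, so the Brier score stands in for calibration as an assumption, utility is left out, and all function names and numbers are hypothetical.

```python
from math import sqrt

def auc(y_true, y_score):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared error of predicted probabilities (assumed stand-in for calibration)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Made-up labels and predicted probabilities for one external validation set.
    y_true = [1, 1, 0, 0, 1, 0]
    y_prob = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
    print("AUC:", auc(y_true, y_prob))
    print("Brier score:", brier_score(y_true, y_prob))

    # Made-up per-dataset similarity to the training set and per-dataset AUC.
    similarity = [0.2, 0.4, 0.5, 0.7, 0.9]
    dataset_auc = [0.78, 0.80, 0.85, 0.83, 0.90]
    print("Pearson r (similarity vs. AUC):", pearson_r(similarity, dataset_auc))
```

In a real analysis these scores would be computed once per external validation set and read together with the size of each set, since a score from a small or very dissimilar dataset carries less evidential weight.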

Subtype: Journal article - Scientific article
Keywords: COVID-19; Dataset cardinality; Dataset similarity; Medical machine learning; Validation
Content language: English
Ahead-of-print / first online publication date: 22 Jul 2021
Publication date: 2021
Journal: Computer Methods and Programs in Biomedicine
Volume: 208
Issue: September 2021
Article number: 106288
Article DOI: https://dx.doi.org/10.1016/j.cmpb.2021.106288
Full text: open
Citation: Cabitza, F., Campagner, A., Soares, F., Garcia de Guadiana-Romualdo, L., Challa, F., Sulejmani, A., et al. (2021). The importance of being external. Methodological insights for the external validation of machine learning models in medicine. Computer Methods and Programs in Biomedicine, 208 (September 2021), 106288 [10.1016/j.cmpb.2021.106288].
Appears in types: 01 - Journal article

Files in this item:

File	Access	Attachment type	Size	Format
Paper_Preprint.pdf	Open access	Submitted Version (Pre-print)	2.18 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/324839
