Bicocca Open Archive

Hassoun, S., Bruckmann, C., Ciardullo, S., Perseghin, G., Di Gaudio, F., Broccolo, F. (2023). Setting up of a machine learning algorithm for the identification of severe liver fibrosis profile in the general US population cohort. INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 170(February 2023) [10.1016/j.ijmedinf.2022.104932].

Setting up of a machine learning algorithm for the identification of severe liver fibrosis profile in the general US population cohort

Ciardullo, Stefano; Perseghin, Gianluca; Di Gaudio, Francesca; Broccolo, Francesco (last author)

2023

Abstract

Background: Digital transformation in clinical practice makes it possible to shift the clinical pathway for liver disease from late-stage to early-stage diagnosis. Early diagnosis of liver fibrosis can prevent disease progression and reduce liver-related morbidity and mortality. We developed a machine learning (ML) algorithm, based on standard parameters, that can identify liver fibrosis in the general US population.

Materials and methods: Starting from a public database representative of the American population, the National Health and Nutrition Examination Survey (NHANES), with 7265 eligible subjects (control population n = 6828, with Fibroscan values E < 9.7 kPa; target population n = 437, with Fibroscan values E ≥ 9.7 kPa), we set up a support vector machine (SVM) algorithm able to discriminate individuals with liver fibrosis within the general US population. Setting up the algorithm involved removing missing data and optimizing the sampling step to manage the data imbalance (only ∼5 % of the dataset belongs to the target population).

Results: For feature selection, we performed an unbiased analysis starting from 33 clinical, anthropometric, and biochemical parameters, regardless of their previous use as biomarkers of liver disease. Through principal component analysis (PCA), we identified the 26 most significant features and used them to set up a sampling method for the SVM algorithm. Oversampling with SMOTE-NC proved to be the best sampling technique for managing the data imbalance. For final model validation, we used a subset of 300 individuals (150 with liver fibrosis and 150 controls), held out from the main dataset before sampling. Performance was evaluated over multiple independent runs.

Conclusions: We provide proof of concept of an ML clinical decision support tool for liver fibrosis diagnosis in the general US population. Although the presented ML model is at this stage only a prototype, in the future it might be implemented and potentially applied to broad screening programs for liver fibrosis.
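The order of operations in the abstract (hold out a balanced validation subset first, then oversample only the remaining data) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names `balanced_holdout` and `smote` are hypothetical, the feature values are randomly generated rather than taken from NHANES, and the oversampler is plain SMOTE for continuous features only. SMOTE-NC, which the abstract reports using, additionally handles categorical features and is not reproduced here. Only the class sizes (6828 and 437) and the 150 + 150 hold-out come from the abstract.

```python
import math
import random


def balanced_holdout(labels, n_per_class, rng):
    """Reserve n_per_class indices of each class for validation; the rest is for training."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    holdout = []
    for label in sorted(by_class):
        holdout.extend(rng.sample(by_class[label], n_per_class))
    held = set(holdout)
    train = [i for i in range(len(labels)) if i not in held]
    return train, sorted(holdout)


def smote(minority, n_new, k, rng):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours (continuous features only)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


rng = random.Random(0)
labels = [0] * 6828 + [1] * 437  # class sizes reported in the abstract
# Made-up 2-D features for illustration only (assumption, not NHANES data).
features = [[rng.gauss(label, 1.0), rng.gauss(0.0, 1.0)] for label in labels]

# 1) Hold out 150 + 150 subjects BEFORE any oversampling.
train_idx, val_idx = balanced_holdout(labels, 150, rng)

# 2) Oversample the minority class in the training portion only.
train_minority = [features[i] for i in train_idx if labels[i] == 1]
n_majority = sum(1 for i in train_idx if labels[i] == 0)
new_points = smote(train_minority, n_majority - len(train_minority), k=5, rng=rng)

print(len(val_idx), n_majority, len(train_minority) + len(new_points))
# → 300 6678 6678
```

Splitting before oversampling keeps synthetic points, which are derived from real minority samples, out of the validation subset, so the validation data contains only real subjects.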

Subtype: Journal article - Scientific article
Keywords: Imbalanced dataset; Liver fibrosis; Machine learning; NHANES; Oversampling techniques
Content language: English
Ahead-of-print or first online publication date: 25 Nov 2022
Publication date: 2023
Journal: INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS
Volume: 170
Issue: February 2023
Article number: 104932
Article DOI: https://dx.doi.org/10.1016/j.ijmedinf.2022.104932
Full text: none
Appears in types: 01 - Journal article


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/397891
