Bicocca Open Archive

An ensemble learning approach for modeling the systems biology of drug-induced injury

Aguirre-Plans J.;Pinero J.;Souza T.;Callegaro G.;Kunnen S. J.;Sanz F.;Fernandez-Fuentes N.;Furlong L. I.;Guney E.;Oliva B.

2021

Abstract

Background: Drug-induced liver injury (DILI) is an adverse reaction in which commonly used drugs damage the liver. DILI is estimated to affect around 20 in 100,000 inhabitants worldwide each year. Although it is one of the main causes of liver failure, its pathophysiology and mechanisms are poorly understood. In the present study, we developed an ensemble learning approach based on different features (CMap gene expression, chemical structures, drug targets) to predict drugs that might cause DILI and to gain a better understanding of the mechanisms linked to the adverse reaction.

Results: We searched for gene signatures in CMap gene expression data using two approaches: phenotype-gene association data from DisGeNET, and a non-parametric test comparing the gene expression of DILI-Concern and No-DILI-Concern drugs (as per DILIrank definitions). The average accuracy of the classifiers in both approaches was 69%. Using chemical structures as features gave an accuracy of 65%. Combining both types of features produced an accuracy of around 63% but improved the accuracy on the independent hold-out test to 67%. Using drug-target associations as features gave the best accuracy (70%) in the independent hold-out test.

Conclusions: With CMap gene expression data, searching for a specific gene signature among the landmark genes improves the quality of the classifiers, but performance remains limited by the intrinsic noise of the dataset. With chemical structures as features, the structural diversity of the known DILI-causing drugs hampers prediction, a problem similar to the one encountered with gene expression information. Combining both feature types did not improve the quality of the classifiers but increased their robustness, as shown on independent hold-out tests. Using drug-target associations as features improved the prediction, especially the specificity, and the results were comparable to those of previous studies.
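The abstract reports accuracy and specificity for classifiers built on different feature types, but it does not say how the individual predictions are combined. As a minimal sketch only, the following Python example assumes a simple majority vote over three hypothetical per-feature classifiers and computes accuracy, sensitivity, and specificity on invented toy labels. The function names, the tie-breaking rule, and all data are illustrative assumptions, not the authors' implementation or results.

```python
from collections import Counter

POSITIVE = "DILI"      # stands in for DILI-Concern drugs
NEGATIVE = "NoDILI"    # stands in for No-DILI-Concern drugs


def majority_vote(labels):
    """Combine the labels that several classifiers assign to one drug.

    Assumption: ties are resolved in favour of the positive class.
    """
    counts = Counter(labels)
    return POSITIVE if counts[POSITIVE] >= counts[NEGATIVE] else NEGATIVE


def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == POSITIVE and p == POSITIVE for t, p in pairs)
    tn = sum(t == NEGATIVE and p == NEGATIVE for t, p in pairs)
    fp = sum(t == NEGATIVE and p == POSITIVE for t, p in pairs)
    fn = sum(t == POSITIVE and p == NEGATIVE for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Invented toy data: six drugs, three hypothetical per-feature classifiers.
y_true = ["DILI", "DILI", "DILI", "NoDILI", "NoDILI", "NoDILI"]
model_outputs = {
    "expression": ["DILI", "NoDILI", "DILI", "NoDILI", "DILI", "NoDILI"],
    "structure":  ["DILI", "DILI", "NoDILI", "DILI", "NoDILI", "NoDILI"],
    "targets":    ["DILI", "DILI", "NoDILI", "NoDILI", "NoDILI", "NoDILI"],
}

# Vote drug by drug across the three classifiers.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs.values())]
metrics = evaluate(y_true, ensemble)
print(ensemble)
print(metrics)
```

On this toy data the vote corrects the isolated false positives of the expression and structure classifiers, so specificity rises while one DILI drug is still missed. This illustrates, without reproducing, the kind of robustness gain the abstract attributes to combining features.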

Subtype: Journal article - Scientific article
Keywords: CAMDA; Cmap; Drug safety; Drug structure; Drug-induced liver injury; Hepatotoxicity; Machine learning; Systems biology
Content language: English
Publication date: 2021
Journal: BIOLOGY DIRECT
Volume: 16
Issue: 1
Article number: 5
Article DOI: https://dx.doi.org/10.1186/s13062-020-00288-x
Full text: open
Citation: Aguirre-Plans, J., Pinero, J., Souza, T., Callegaro, G., Kunnen, S., Sanz, F., et al. (2021). An ensemble learning approach for modeling the systems biology of drug-induced injury. BIOLOGY DIRECT, 16(1) [10.1186/s13062-020-00288-x].
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Aguirre-Plans-2021-Biology Direct-VoR.pdf (open access)	2.34 MB	Adobe PDF

Description: This article is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/511679
