Bicocca Open Archive

Multi-Task Neural Networks and Molecular Fingerprints to Enhance Compound Identification from LC-MS/MS Data

Consonni, Viviana; Gosetti, Fabio; Termopoli, Veronica; Todeschini, Roberto; Valsecchi, Cecile; Ballabio, Davide

2022

Abstract

Mass spectrometry (MS) is widely used to identify chemical compounds by matching an experimentally acquired mass spectrum against a database of reference spectra. However, existing databases have limited coverage, so identification fails for any compound that is not present in the database. Among the computational approaches for mining metabolite structures from MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use the predicted fingerprints to screen compound libraries. This can be done by calibrating multi-task artificial neural networks on large datasets, with mass spectra as inputs and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an online open repository. These data were used to train and evaluate deep-learning-based approaches that predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra. We evaluated the effects of data sparseness and the impact of different data curation and dimensionality reduction strategies on output accuracy. Moreover, extensive diagnostics were carried out to assess the advantages and drawbacks of the models as a function of the explored chemical space.
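The library-screening step described in the abstract can be illustrated with a minimal sketch. It assumes that the network emits one probability per fingerprint bit, that these are binarized at a threshold of 0.5, and that candidates are ranked by Tanimoto similarity. The threshold, the similarity measure, the function names, and the toy 8-bit fingerprints are all illustrative assumptions and are not taken from the article.

```python
from typing import Dict, List, Sequence, Tuple


def binarize(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-bit outputs (assumed to be probabilities) into a binary fingerprint."""
    return [1 if p >= threshold else 0 for p in probabilities]


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


def rank_candidates(
    predicted: Sequence[int],
    library: Dict[str, Sequence[int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every library compound against the predicted fingerprint, best first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical network output for one spectrum (8 fingerprint bits).
predicted_probs = [0.9, 0.1, 0.8, 0.7, 0.2, 0.05, 0.6, 0.3]

# Hypothetical compound library with precomputed fingerprints.
library = {
    "candidate_A": [1, 0, 1, 1, 0, 0, 1, 0],
    "candidate_B": [1, 1, 0, 1, 0, 0, 0, 0],
    "candidate_C": [0, 0, 0, 0, 1, 1, 0, 1],
}

predicted_fp = binarize(predicted_probs)
ranking = rank_candidates(predicted_fp, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would be far longer and the library far larger, so a vectorized implementation would be preferable; the ranking logic stays the same.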

Subtype	Journal article - Scientific article
Keywords	chemometrics; classification; fingerprints; LC-MS/MS; multi-task; neural networks; similarity matching
Content language	English
Ahead-of-print date or date of first online publication	8 Sep 2022
Publication date	2022
Journal	MOLECULES
Volume number	27
Issue	18
First page	1
Last page	16
Article number	5827
Article DOI	https://dx.doi.org/10.3390/molecules27185827
Fulltext	open
Citation	Consonni, V., Gosetti, F., Termopoli, V., Todeschini, R., Valsecchi, C., Ballabio, D. (2022). Multi-Task Neural Networks and Molecular Fingerprints to Enhance Compound Identification from LC-MS/MS Data. MOLECULES, 27(18), 1-16 [10.3390/molecules27185827].
Appears in types	01 - Journal article

Files in this item:

File	Access	Description	Attachment type	License	Size	Format
Consonni-2022-molecules-VoR.pdf	open access	Article	Publisher’s Version (Version of Record, VoR)	Creative Commons	2.42 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/391868
