Bicocca Open Archive

Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations

Masseroli, M.; Chicco, D.; Pinoli, P.

2012

Abstract

Consistency and completeness of biomolecular annotations are key to the correct interpretation of biological experiments. Yet the correctly annotated associations between genes (or proteins) and features are only a subset of all existing ones. Over time they grow in number and become more useful, but they remain incomplete, and some of them are incorrect. Computational methods that supply a ranked list of predicted annotations are therefore extremely useful: they can support and speed up the time-consuming curation procedure and improve the consistency of available annotations. We start from previous work on the automatic prediction of Gene Ontology (GO) annotations based on the Singular Value Decomposition (SVD) of the annotation matrix, in which each element corresponds to the association of a gene with a feature. We then propose a modified Probabilistic Latent Semantic Analysis (pLSA) algorithm, named pLSAnorm, to improve such prediction. pLSA is a statistical technique from natural language processing that had not yet been used for annotation prediction in bioinformatics. It exploits the latent information contained in the co-occurrences of the analyzed data. We tested the effectiveness of the pLSAnorm prediction method by k-fold cross-validation of the GO annotations of two organisms, Gallus gallus and Bos taurus. The results obtained demonstrate the efficacy of our approach.
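The abstract contrasts two ways of scoring gene-term pairs from a binary annotation matrix. One is a reconstruction from the SVD of the matrix. The other is pLSA, which models P(term | gene) as a mixture over latent topics.

The code below is a minimal sketch of both, assuming NumPy and a small dense toy matrix. It implements standard pLSA fitted by expectation-maximization. It is not the authors' pLSAnorm variant, whose modification is not described here. The plain truncated SVD is only a stand-in for the earlier SVD-based method. The function names `plsa`, `svd_scores` and `rank_candidates` are hypothetical, and the toy matrix is invented.

```python
import numpy as np


def plsa(A, n_topics, n_iter=200, seed=0, eps=1e-12):
    """Fit standard (asymmetric) pLSA to a gene-by-term matrix A by EM.

    Returns W = P(z|g), shape (genes, topics), and H = P(t|z), shape
    (topics, terms), so that W @ H approximates P(t|g). Assumes every gene
    has at least one annotation; otherwise its row of W stays all zero.
    """
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes, n_terms = A.shape
    # Random start, normalised so each row is a probability distribution.
    W = rng.random((n_genes, n_topics))
    W /= W.sum(axis=1, keepdims=True)
    H = rng.random((n_topics, n_terms))
    H /= H.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step, kept implicit: P(z|g,t) = W[g,z] * H[z,t] / R[g,t].
        R = W @ H
        Q = A / np.maximum(R, eps)
        # M-step: expected counts n(g,t) * P(z|g,t), summed over genes
        # (for P(t|z)) or over terms (for P(z|g)), using the same E-step.
        H_new = H * (W.T @ Q)
        W_new = W * (Q @ H.T)
        H = H_new / np.maximum(H_new.sum(axis=1, keepdims=True), eps)
        W = W_new / np.maximum(W_new.sum(axis=1, keepdims=True), eps)
    return W, H


def svd_scores(A, rank):
    """Rank-k truncated SVD reconstruction of A (a plain SVD baseline)."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]


def rank_candidates(A, scores, top=5):
    """Return (gene, term, score) for pairs not yet annotated, best first."""
    genes, terms = np.nonzero(np.asarray(A) == 0)
    values = scores[genes, terms]
    order = np.argsort(-values, kind="stable")
    return [(int(genes[i]), int(terms[i]), float(values[i])) for i in order[:top]]


# Invented toy data: 5 genes x 6 terms, 1 = gene annotated to term.
A = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
])

W, H = plsa(A, n_topics=2)
print("pLSA candidates:", rank_candidates(A, W @ H, top=3))
print("SVD candidates: ", rank_candidates(A, svd_scores(A, rank=2), top=3))
```

Candidates are the unannotated gene-term pairs with the highest reconstructed values. To mimic a k-fold cross-validation, you could set one held-out fold of known annotations to 0, refit, and check how highly the hidden pairs rank. The paper's own protocol and metrics may differ.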

Contribution type: paper
Keywords: bioinformatics; genetics; ontologies (artificial intelligence); probability; singular value decomposition
Content language: English
Conference name: 2012 Annual International Joint Conference on Neural Networks, IJCNN 2012, part of the 2012 IEEE World Congress on Computational Intelligence, WCCI 2012, 10 June 2012 through 15 June 2012
Conference year: 2012
Proceedings title: WCCI 2012 IEEE World Congress on Computational Intelligence; The 2012 International Joint Conference on Neural Networks (IJCNN)
Proceedings ISBN: 9781467314909
Publication date: 2012
First page: 2891
Last page: 2898
Article number: 6252767
DOI: https://dx.doi.org/10.1109/IJCNN.2012.6252767
Full text: reserved
Citation: Masseroli, M., Chicco, D., Pinoli, P. (2012). Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations. In WCCI 2012 IEEE World Congress on Computational Intelligence; The 2012 International Joint Conference on Neural Networks (IJCNN) (pp.2891-2898). IEEE [10.1109/IJCNN.2012.6252767].
Appears in item types: 02 - Conference contribution

Files in this item:

Masseroli-2012-IJCNN WCCI-VoR.pdf
Description: Conference contribution
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Access: archive administrators only
Size: 1.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/435461

Citations

39

14

Social impact