Consistent and complete biomolecular annotations are key to the correct interpretation of biological experiments. Yet the correctly annotated associations between genes (or proteins) and features are only a fraction of those that exist. Over time they grow in number and become more useful, but they remain incomplete, and some are incorrect. Computational methods that supply a ranked list of predicted annotations are therefore extremely useful, both to support and speed up the time-consuming curation procedure and to improve the consistency of available annotations. Starting from previous work on the automatic prediction of Gene Ontology (GO) annotations based on the Singular Value Decomposition of the annotation matrix, in which every matrix element corresponds to the association of a gene with a feature, we propose a modified Probabilistic Latent Semantic Analysis (pLSA) algorithm, named pLSAnorm, to improve such prediction. pLSA is a statistical technique from the natural language processing field that has not yet been used for annotation prediction in bioinformatics; it exploits the latent information contained in the co-occurrences of the analyzed data. We assessed the effectiveness of the pLSAnorm prediction method by k-fold cross-validation on the GO annotations of two organisms, Gallus gallus and Bos taurus. The results obtained demonstrate the efficacy of our approach.
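
To make the idea of scoring candidate annotations from a gene-by-feature matrix more concrete, the following is a minimal sketch of standard pLSA fitted with expectation-maximization on a toy binary annotation matrix. It is not the pLSAnorm variant, whose modification is not described in this record. The function names `plsa` and `reconstruct`, the number of latent topics, the iteration count, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def plsa(A, n_topics, n_iter=50, seed=0):
    """Fit standard (symmetric) pLSA to a gene-by-term matrix A using EM.

    Illustrative only: this is plain pLSA, not the pLSAnorm variant.
    Returns P(z), P(gene|z) and P(term|z).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_terms = A.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, normalised so each distribution sums to 1.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_g_z = rng.random((n_topics, n_genes))
    p_g_z /= p_g_z.sum(axis=1, keepdims=True)
    p_t_z = rng.random((n_topics, n_terms))
    p_t_z /= p_t_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z | gene, term) is proportional to
        # P(z) * P(gene|z) * P(term|z); array shape is (topics, genes, terms).
        joint = p_z[:, None, None] * p_g_z[:, :, None] * p_t_z[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)

        # M-step: re-estimate parameters from posteriors weighted by A.
        weighted = A[None, :, :] * post
        p_g_z = weighted.sum(axis=2)
        p_t_z = weighted.sum(axis=1)
        p_z = weighted.sum(axis=(1, 2))
        p_g_z /= p_g_z.sum(axis=1, keepdims=True) + eps
        p_t_z /= p_t_z.sum(axis=1, keepdims=True) + eps
        p_z /= p_z.sum() + eps

    return p_z, p_g_z, p_t_z


def reconstruct(p_z, p_g_z, p_t_z):
    """Model estimate P(gene, term) = sum_z P(z) P(gene|z) P(term|z)."""
    return np.einsum("z,zg,zt->gt", p_z, p_g_z, p_t_z)


# Toy annotation matrix (hypothetical data): rows are genes, columns are terms.
A = np.array(
    [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

scores = reconstruct(*plsa(A, n_topics=2))

# Rank the currently unannotated (gene, term) pairs by model score.
candidates = np.argwhere(A == 0)
order = np.argsort(-scores[A == 0])
ranked = [tuple(int(i) for i in candidates[k]) for k in order]
```

Here `ranked` lists the zero entries of `A` from highest to lowest model probability, which is one simple way to obtain a ranked list of candidate annotations for a curator to review. In a cross-validation setting, known annotations would be hidden before fitting and the ranking checked against them.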

Masseroli, M., Chicco, D., Pinoli, P. (2012). Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations. In WCCI 2012 IEEE World Congress on Computational Intelligence; The 2012 International Joint Conference on Neural Networks (IJCNN) (pp. 2891-2898). IEEE [10.1109/IJCNN.2012.6252767].

Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations

Chicco, D;
2012

Type: paper
Keywords: bioinformatics; genetics; ontologies (artificial intelligence); probability; singular value decomposition
Language: English
Conference: 2012 Annual International Joint Conference on Neural Networks, IJCNN 2012, Part of the 2012 IEEE World Congress on Computational Intelligence, WCCI 2012 - 10 June 2012 through 15 June 2012
Published in: WCCI 2012 IEEE World Congress on Computational Intelligence; The 2012 International Joint Conference on Neural Networks (IJCNN)
ISBN: 9781467314909
Year: 2012
Pages: 2891-2898
Article number: 6252767
Access: reserved
Files in this item:
Masseroli-2012-IJCNN WCCI-VoR.pdf

Archive managers only

Description: Conference contribution
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 1.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/435461
Citations
  • Scopus 36
  • Web of Science 13