Consistency and completeness of biomolecular annotations is a keypoint of correct interpretation of biological experiments. Yet, the associations between genes (or proteins) and features correctly annotated are just some of all the existing ones. As time goes by, they increase in number and become more useful, but they remain incomplete and some of them incorrect. To support and quicken their time-consuming curation procedure and to improve consistence of available annotations, computational methods that are able to supply a ranked list of predicted annotations are hence extremely useful. Starting from a previous work on the automatic prediction of Gene Ontology (GO) annotations based on the Singular Value Decomposition of the annotation matrix, where every matrix element corresponds to the association of a gene with a feature, we propose the use of a modified Probabilistic Latent Semantic Analysis (pLSA) algorithm, named pLSAnorm, to better perform such prediction. pLSA is a statistical technique from the natural language processing field, which has not been used in bioinformatics annotation prediction yet; it takes advantage of the latent information contained in the analyzed data co-occurrences. We proved the effectiveness of the pLSAnorm prediction method by performing k-fold cross-validation of the GO annotations of two organisms, Gallus gallus and Bos taurus. Obtained results demonstrate the efficacy of our approach.
Masseroli, M., Chicco, D., Pinoli, P. (2012). Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations. In WCCI 2012 IEEE World Congress on Computational Intelligence; The 2012 International Joint Conference on Neural Networks (IJCNN) (pp.2891-2898). IEEE [10.1109/IJCNN.2012.6252767].
Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations
Chicco, D;
2012
Abstract
Consistency and completeness of biomolecular annotations is a keypoint of correct interpretation of biological experiments. Yet, the associations between genes (or proteins) and features correctly annotated are just some of all the existing ones. As time goes by, they increase in number and become more useful, but they remain incomplete and some of them incorrect. To support and quicken their time-consuming curation procedure and to improve consistence of available annotations, computational methods that are able to supply a ranked list of predicted annotations are hence extremely useful. Starting from a previous work on the automatic prediction of Gene Ontology (GO) annotations based on the Singular Value Decomposition of the annotation matrix, where every matrix element corresponds to the association of a gene with a feature, we propose the use of a modified Probabilistic Latent Semantic Analysis (pLSA) algorithm, named pLSAnorm, to better perform such prediction. pLSA is a statistical technique from the natural language processing field, which has not been used in bioinformatics annotation prediction yet; it takes advantage of the latent information contained in the analyzed data co-occurrences. We proved the effectiveness of the pLSAnorm prediction method by performing k-fold cross-validation of the GO annotations of two organisms, Gallus gallus and Bos taurus. Obtained results demonstrate the efficacy of our approach.File | Dimensione | Formato | |
---|---|---|---|
Masseroli-2012-IJCNN WCCI-VoR.pdf
Solo gestori archivio
Descrizione: Intervento a convegno
Tipologia di allegato:
Publisher’s Version (Version of Record, VoR)
Licenza:
Tutti i diritti riservati
Dimensione
1.02 MB
Formato
Adobe PDF
|
1.02 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.