Pinoli, P., Chicco, D., Masseroli, M. (2014). Latent Dirichlet allocation based on Gibbs sampling for gene function prediction. In Proceedings of the International Conference on Computational Intelligence in Bioinformatics and Computational Biology – CIBCB 2014 (pp. 1-8). IEEE Computer Society [10.1109/CIBCB.2014.6845514].
Latent Dirichlet allocation based on Gibbs sampling for gene function prediction
Chicco, D;
2014
Abstract
Gene function annotations are key elements in biology and bioinformatics. A typical annotation is the association between a gene and a feature term that describes a functional feature of the gene by means of a controlled vocabulary (e.g. a Gene Ontology (GO) feature term). Unfortunately, available annotations contain errors, and biologically validated ones are incomplete by definition, since new knowledge is continuously discovered. Computational algorithms able to provide ranked lists of predicted new gene annotations are therefore a valuable contribution to bioinformatics research. Here, we propose two variants of the well-known Latent Dirichlet Allocation (LDA) algorithm, applied to the prediction of gene annotations. LDA is an efficient machine learning method built on two sets of multinomial probability distributions: one over a set of topics, given a document (a gene, in our case), and one over a set of words (feature terms, in our case), given a topic. In topic modeling, a topic can be considered a latent meta-category of words, and a document a mixture of topics. Our two LDA variants use the collapsed Gibbs sampling method during the training phase, with two distinct initialization approaches that adapt the LDA mathematical model to the biomolecular annotation scenario. Using six outdated datasets of GO annotations of human and brown rat genes, we compared the annotations predicted by our methods with those given by the previously developed truncated Singular Value Decomposition (tSVD) method; we then validated them against the annotations available in an updated version of the same datasets. The results obtained show the efficiency of our newly proposed algorithms.

File | Size | Format
---|---|---
Pinoli-2014-CIBCB-VoR.pdf | 513.98 kB | Adobe PDF

Access: archive managers only
Description: Conference contribution
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
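The abstract describes LDA trained with collapsed Gibbs sampling, where each gene is a document and each annotated feature term is a word. The following is a minimal sketch of that general technique on a toy dataset, not the authors' implementation. The function names, hyperparameter values, and toy gene-term lists are all assumptions made for illustration, and the uniform random initialization used here is not one of the two initialization approaches proposed in the paper, which the abstract does not detail.

```python
import random


def lda_gibbs(docs, n_terms, n_topics=2, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs[g] is the list of term ids annotated to gene g (hypothetical data).
    Returns theta (gene-topic) and phi (topic-term) probability estimates.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    gene_topic = [[0] * n_topics for _ in docs]        # counts per gene
    topic_term = [[0] * n_terms for _ in topics]       # counts per topic
    topic_total = [0] * n_topics                       # annotations per topic
    z = []                                             # topic of each annotation

    # Uniform random initialization (an assumption, not the paper's schemes).
    for g, terms in enumerate(docs):
        z.append([])
        for w in terms:
            k = rng.randrange(n_topics)
            z[g].append(k)
            gene_topic[g][k] += 1
            topic_term[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for g, terms in enumerate(docs):
            for i, w in enumerate(terms):
                # Remove the current assignment from the counts.
                k = z[g][i]
                gene_topic[g][k] -= 1
                topic_term[k][w] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics, up to a constant.
                weights = [
                    (gene_topic[g][j] + alpha)
                    * (topic_term[j][w] + beta)
                    / (topic_total[j] + n_terms * beta)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[g][i] = k
                gene_topic[g][k] += 1
                topic_term[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(gene_topic[g][k] + alpha) / (len(docs[g]) + n_topics * alpha)
         for k in topics]
        for g in range(len(docs))
    ]
    phi = [
        [(topic_term[k][w] + beta) / (topic_total[k] + n_terms * beta)
         for w in range(n_terms)]
        for k in topics
    ]
    return theta, phi


def rank_new_annotations(docs, theta, phi):
    """Score every gene-term pair not yet annotated; highest score first."""
    scored = []
    for g, terms in enumerate(docs):
        known = set(terms)
        for w in range(len(phi[0])):
            if w not in known:
                score = sum(theta[g][k] * phi[k][w] for k in range(len(phi)))
                scored.append((score, g, w))
    scored.sort(reverse=True)
    return scored


# Toy input: 4 genes, 4 feature terms (ids 0..3), purely illustrative.
toy_docs = [[0, 1], [0, 1, 2], [2, 3], [3]]
theta, phi = lda_gibbs(toy_docs, n_terms=4)
ranked = rank_new_annotations(toy_docs, theta, phi)
for score, gene, term in ranked[:3]:
    print(f"gene {gene} -> term {term}: {score:.3f}")
```

The ranked list plays the role of the "ranked lists of predicted new gene annotations" mentioned in the abstract: in a validation setting like the one described, the top-scoring pairs would be checked against a more recent version of the annotation dataset.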
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.