Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci without the need for domain-specific knowledge, owing to the automated feature and background selection processes built into the model. We also demonstrate the ability of MuStARD to identify functional elements across species by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets to identify novel small RNA loci, both intra- and inter-species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and to extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.

Georgakilas, G., Grioni, A., Liakos, K., Chalupova, E., Plessas, F., Alexiou, P. (2020). Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci. SCIENTIFIC REPORTS, 10(1) [10.1038/s41598-020-66454-3].

Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci

Grioni A.;
2020

Journal article - Scientific article
Algorithms; Animals; Computational Biology; Genomics; Humans; Mice; MicroRNAs; Neural Networks, Computer; RNA, Small Nucleolar; RNA, Untranslated; Software
Language: English
Year: 2020
Volume: 10
Issue: 1
Article number: 9486
Access: open
Files in this record:
Georgakilas-2020-Scientific Reports-VoR.pdf
Access: open access
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 2.54 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/510759
Citations
  • Scopus 14
  • ISI 13