Several experiments and observations have shown that small, distinct local structural features in RNA molecules are correlated with their biological function, for example in post-transcriptional regulation of gene expression. Finding similar structural features in a set of RNA sequences known to share the same biological function could therefore provide substantial information about which parts of the sequences are responsible for that function. Unfortunately, finding common structural elements in RNA molecules is very challenging, even when limited to secondary structure. The main difficulty is that in nearly all cases the structure of the molecules is unknown and has to be predicted, and that sequences with little or no similarity can fold into similar structures. Although they differ in some details, the approaches proposed so far are usually based on a preliminary alignment of the sequences and attempt to predict common structures (local, global, or for selected regions) for the aligned sequences. These methods give good results when sequence and structure similarity are very high, but perform less well when similarity is limited to small, local elements such as single stem-loop motifs. Instead of aligning the sequences, the algorithm we present directly searches for regions of the sequences that can fold into similar structures, where the degree of similarity can be defined by the user. Any information concerning sequence similarity in the motifs can be used either as a search constraint or a posteriori, by post-processing the output. The search for regions sharing structural similarity is implemented with the affix tree, a novel text-indexing structure that significantly accelerates the search for patterns having a symmetric layout, such as those forming stem-loop structures.
Tests based on experimentally known structures have shown that the algorithm is able to identify functional motifs in the secondary structure of non-coding RNA, such as Iron Responsive Elements (IRE) in the untranslated regions of ferritin mRNA and the domain IV stem-loop structure in SRP RNA.
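
The idea of searching unaligned sequences directly for regions that can fold into a similar stem-loop can be illustrated with a small brute-force sketch. This is not the affix-tree algorithm of the paper: it scans every window naively, requires a perfectly paired stem with no bulges or mismatches, ignores folding energy, and describes a hairpin only by its stem and loop lengths. The function names `hairpin_shapes` and `shared_shapes` and the default loop bounds are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Watson-Crick pairs plus the G-U wobble pair (assumed pairing rules)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_shapes(seq: str, stem: int, min_loop: int = 3,
                   max_loop: int = 8) -> Set[Tuple[int, int]]:
    """Return the (stem length, loop length) shapes some region of seq can form."""
    shapes: Set[Tuple[int, int]] = set()
    n = len(seq)
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem + loop  # exclusive end of the candidate region
            if end > n:
                break
            # The i-th base of the 5' arm must pair with the i-th base
            # counted backwards from the 3' end of the region.
            if all((seq[start + i], seq[end - 1 - i]) in PAIRS for i in range(stem)):
                shapes.add((stem, loop))
    return shapes


def shared_shapes(seqs: List[str], stem: int, min_loop: int = 3,
                  max_loop: int = 8) -> Set[Tuple[int, int]]:
    """Return hairpin shapes that every sequence in seqs can form somewhere."""
    common = None
    for s in seqs:
        rna = s.upper().replace("T", "U")  # accept DNA-style input as well
        found = hairpin_shapes(rna, stem, min_loop, max_loop)
        common = found if common is None else common & found
    return common or set()


if __name__ == "__main__":
    # Two sequences with no sequence similarity but the same hairpin shape
    print(shared_shapes(["GGGGAAAACCCC", "ACACUUUUGUGU"], stem=4))
```

The two example sequences share no bases at corresponding positions, yet both can form a hairpin with a 4-base stem and a 4-base loop, which is the kind of structural similarity an alignment-based method can miss.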

Pavesi, G., Mauri, G., Pesole, G. (2004). An algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 19(1), 2-12 [10.1007/BF02944781].

An algorithm for finding conserved secondary structure motifs in unaligned RNA sequences

MAURI, GIANCARLO;
2004

Type: Journal article - Scientific article
Keywords: pattern discovery; RNA secondary structure; affix trees
Language: English
Year: 2004
Volume: 19
Issue: 1
First page: 2
Last page: 12
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/2477