Mauri, G., Pavesi, G. (2003). Pattern discovery in RNA secondary structure using affix trees. In Combinatorial pattern matching (pp.278-294). Berlin : Springer [10.1007/3-540-44888-8_21].

Pattern discovery in RNA secondary structure using affix trees

MAURI, GIANCARLO; PAVESI, G.
2003

Abstract

We present an algorithm for finding common secondary structure motifs in a set of unaligned RNA sequences. The basic version of the algorithm takes as input a set of strings representing the secondary structure of the sequences, enumerates candidate secondary structure patterns, and reports all patterns that appear, possibly with variations, in all or most of the sequences. Because it considers structural information only, the algorithm can be applied even when the input sequences show no significant sequence similarity; sequence information can nevertheless be incorporated at different levels. Patterns describing RNA secondary structure elements have a characteristic symmetric layout, which makes affix trees a suitable indexing structure: they permit bidirectional search from the middle of a pattern outward, and this significantly accelerates the search. When the secondary structure of the input sequences is not available, we show how the algorithm can handle the uncertainty introduced by prediction methods, or can predict the structure itself on the fly while searching for patterns, again exploiting the information contained in the affix tree built for the sequences. Finally, we present case studies in which the algorithm detected experimentally known RNA stem–loop motifs, using either predicted structures or its own folding of the sequences.
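The idea of searching a stem-loop from the middle outward, and then keeping only the patterns shared by all or most structures, can be illustrated with a small sketch. It is not the authors' algorithm: a plain linear scan stands in for the affix tree, only exact stems without bulges or internal loops are handled, and the dot-bracket input format, the function names `stem_loops` and `common_patterns`, and the `quorum` and `min_loop` parameters are all assumptions made for illustration.

```python
from collections import Counter


def stem_loops(structure, min_loop=3):
    """Return the set of (stem_length, loop_length) hairpin patterns in a
    dot-bracket string, found by starting at a loop and extending outward."""
    hits = set()
    n = len(structure)
    for i, ch in enumerate(structure):
        if ch != "(":
            continue
        # Skip the unpaired run that follows this opening bracket.
        j = i + 1
        while j < n and structure[j] == ".":
            j += 1
        loop_len = j - i - 1
        # A hairpin loop must be closed by ')' and be long enough.
        if j >= n or structure[j] != ")" or loop_len < min_loop:
            continue
        # Extend the stem in both directions, one base pair at a time.
        left, right = i, j
        while (left > 0 and right < n - 1
               and structure[left - 1] == "(" and structure[right + 1] == ")"):
            left -= 1
            right += 1
        stem_len = i - left + 1
        # A stem of length k also contains every shorter stem around the loop.
        for k in range(1, stem_len + 1):
            hits.add((k, loop_len))
    return hits


def common_patterns(structures, quorum=1.0):
    """Report patterns occurring in at least a `quorum` fraction of inputs."""
    counts = Counter()
    for s in structures:
        counts.update(stem_loops(s))  # each pattern counted once per structure
    need = quorum * len(structures)
    return sorted(p for p, c in counts.items() if c >= need)


structures = [
    "..(((....)))..",
    "((((....))))",
    ".((....))..(((...)))",
]
print(common_patterns(structures))              # shared by all three inputs
print(common_patterns(structures, quorum=0.6))  # shared by at least two
```

Lowering `quorum` corresponds to accepting patterns that appear in most, rather than all, of the input structures. An index such as an affix tree would replace the per-structure scan in `stem_loops` with a single bidirectional traversal shared by all inputs.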
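Type: slide + paper
Keywords: Pattern discovery; affix trees
Language: English
Conference: CPM 2003 - 14th Symp. on Combinatorial Pattern Matching
Conference year: 2003
Book title: Combinatorial pattern matching
ISBN: 978-3-540-40311-1
Publication year: 2003
Series: LNCS 2676
First page: 278
Last page: 294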
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/17525