Graphs are the most suited data structure to summarize the transcript isoforms produced by a gene. Such graphs may be modeled by the notion of hypertext, that is a graph where nodes are texts representing the exons of the gene and edges connect consecutive exons of a transcript. Mapping reads obtained by deep transcriptome sequencing to such graphs is crucial to compare reads with an annotation of transcript isoforms and to infer novel events due to alternative splicing at the exonic level. In this paper, we propose an algorithm based on Maximal Exact Matches that efficiently solves the approximate pattern matching of a pattern P to a hypertext H. We implement it into Splicing Graph ALigner (SGAL), a tool that performs an accurate mapping of RNA-seq reads against a graph that is a representation of annotated and potentially new transcripts of a gene. Moreover, we performed an experimental analysis to compare SGAL to a state-of-art tool for spliced alignment (STAR), and to identify novel putative alternative splicing events such as exon skipping directly from mapping reads to the graph. Such analysis shows that our tool is able to perform accurate mapping of reads to exons, with good time and space performance. The software is freely available at https://github.com/AlgoLab/galig.

Beretta, S., Bonizzoni, P., Denti, L., Previtali, M., Rizzi, R. (2017). Mapping RNA-seq data to a transcript graph via approximate pattern matching to a hypertext. In Algorithms for Computational Biology (pp.49-61). Springer Verlag [10.1007/978-3-319-58163-7_3].

Mapping RNA-seq data to a transcript graph via approximate pattern matching to a hypertext

BERETTA, STEFANO
Primo
;
BONIZZONI, PAOLA
Secondo
;
DENTI, LUCA
;
PREVITALI, MARCO
Penultimo
;
RIZZI, RAFFAELLA
Ultimo
2017

Abstract

Graphs are the most suited data structure to summarize the transcript isoforms produced by a gene. Such graphs may be modeled by the notion of hypertext, that is a graph where nodes are texts representing the exons of the gene and edges connect consecutive exons of a transcript. Mapping reads obtained by deep transcriptome sequencing to such graphs is crucial to compare reads with an annotation of transcript isoforms and to infer novel events due to alternative splicing at the exonic level. In this paper, we propose an algorithm based on Maximal Exact Matches that efficiently solves the approximate pattern matching of a pattern P to a hypertext H. We implement it into Splicing Graph ALigner (SGAL), a tool that performs an accurate mapping of RNA-seq reads against a graph that is a representation of annotated and potentially new transcripts of a gene. Moreover, we performed an experimental analysis to compare SGAL to a state-of-art tool for spliced alignment (STAR), and to identify novel putative alternative splicing events such as exon skipping directly from mapping reads to the graph. Such analysis shows that our tool is able to perform accurate mapping of reads to exons, with good time and space performance. The software is freely available at https://github.com/AlgoLab/galig.
paper
Alternative splicing; Approximate sequence analysis; Graph-based alignment; Next-generation sequencing; Theoretical Computer Science; Computer Science (all)
English
International Conference, AlCoB 2017
2017
Algorithms for Computational Biology
9783319581620
2017
10252
49
61
none
Beretta, S., Bonizzoni, P., Denti, L., Previtali, M., Rizzi, R. (2017). Mapping RNA-seq data to a transcript graph via approximate pattern matching to a hypertext. In Algorithms for Computational Biology (pp.49-61). Springer Verlag [10.1007/978-3-319-58163-7_3].
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/168458
Citazioni
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 5
Social impact