Graphs are the most suited data structure to summarize the transcript isoforms produced by a gene. Such graphs may be modeled by the notion of hypertext, that is a graph where nodes are texts representing the exons of the gene and edges connect consecutive exons of a transcript. Mapping reads obtained by deep transcriptome sequencing to such graphs is crucial to compare reads with an annotation of transcript isoforms and to infer novel events due to alternative splicing at the exonic level. In this paper, we propose an algorithm based on Maximal Exact Matches that efficiently solves the approximate pattern matching of a pattern P to a hypertext H. We implement it into Splicing Graph ALigner (SGAL), a tool that performs an accurate mapping of RNA-seq reads against a graph that is a representation of annotated and potentially new transcripts of a gene. Moreover, we performed an experimental analysis to compare SGAL to a state-of-art tool for spliced alignment (STAR), and to identify novel putative alternative splicing events such as exon skipping directly from mapping reads to the graph. Such analysis shows that our tool is able to perform accurate mapping of reads to exons, with good time and space performance. The software is freely available at https://github.com/AlgoLab/galig.
Beretta, S., Bonizzoni, P., Denti, L., Previtali, M., Rizzi, R. (2017). Mapping RNA-seq data to a transcript graph via approximate pattern matching to a hypertext. In Algorithms for Computational Biology (pp.49-61). Springer Verlag [10.1007/978-3-319-58163-7_3].
Mapping RNA-seq data to a transcript graph via approximate pattern matching to a hypertext
BERETTA, STEFANOPrimo
;BONIZZONI, PAOLASecondo
;DENTI, LUCA
;PREVITALI, MARCOPenultimo
;RIZZI, RAFFAELLAUltimo
2017
Abstract
Graphs are the most suited data structure to summarize the transcript isoforms produced by a gene. Such graphs may be modeled by the notion of hypertext, that is a graph where nodes are texts representing the exons of the gene and edges connect consecutive exons of a transcript. Mapping reads obtained by deep transcriptome sequencing to such graphs is crucial to compare reads with an annotation of transcript isoforms and to infer novel events due to alternative splicing at the exonic level. In this paper, we propose an algorithm based on Maximal Exact Matches that efficiently solves the approximate pattern matching of a pattern P to a hypertext H. We implement it into Splicing Graph ALigner (SGAL), a tool that performs an accurate mapping of RNA-seq reads against a graph that is a representation of annotated and potentially new transcripts of a gene. Moreover, we performed an experimental analysis to compare SGAL to a state-of-art tool for spliced alignment (STAR), and to identify novel putative alternative splicing events such as exon skipping directly from mapping reads to the graph. Such analysis shows that our tool is able to perform accurate mapping of reads to exons, with good time and space performance. The software is freely available at https://github.com/AlgoLab/galig.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.