Motivation: Recent advances in high-throughput RNA-Seq technologies allow the production of massive datasets. When a study focuses on only a handful of genes, most reads are irrelevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset improves efficiency without compromising the results of the study.

Results: We introduce a novel computational problem, called gene assignment, and propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists of extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark significantly improves the performance of RNA-Seq analysis tools without affecting the final results.

Availability: The tool is distributed as a stand-alone module and the software is freely available at https://github.com/AlgoLab/shark.

Denti, L., Pirola, Y., Previtali, M., Ceccato, T., Della Vedova, G., Rizzi, R., et al. (2021). Shark: fishing relevant reads in an RNA-Seq sample. Bioinformatics, 37(4), 464-472. doi:10.1093/bioinformatics/btaa779.

Shark: fishing relevant reads in an RNA-Seq sample

Denti, Luca (co-first author); Pirola, Yuri (co-first author); Previtali, Marco (co-first author); Della Vedova, Gianluca; Rizzi, Raffaella; Bonizzoni, Paola (last author)
2021

Type: Journal article - Scientific article
Keywords: RNA-Seq; Alternative Splicing; Bioinformatics; Filtering; Succinct data structures; Bloom filters
Language: English
First page: 464
Last page: 472
Number of pages: 9
Files in this item:
File: btaa779.pdf
Access: open access
Description: Main article
Attachment type: Publisher's Version (Version of Record, VoR)
Size: 719.91 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/289095
Citations
  • Scopus 3
  • ISI (Web of Science) 2