The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this article, we explore a novel approach to compute the string graph, based on the FM-index and Burrows and Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the string graph assembler (SGA) as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.

Bonizzoni, P., Della Vedova, G., Pirola, Y., Previtali, M., Rizzi, R. (2017). FSG: Fast String Graph Construction for de Novo Assembly. JOURNAL OF COMPUTATIONAL BIOLOGY, 24(10), 953-968 [10.1089/cmb.2017.0089].

FSG: Fast String Graph Construction for de Novo Assembly

Bonizzoni, P
;
Della Vedova, G;Pirola, Y;Previtali, M;Rizzi, R
2017

Abstract

The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this article, we explore a novel approach to compute the string graph, based on the FM-index and Burrows and Wheeler Transform. We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the string graph assembler (SGA) as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.
Articolo in rivista - Articolo scientifico
Burrows and Wheeler Transform; genome assembly; string graph.; Modeling and Simulation; Molecular Biology; Genetics; Computational Mathematics; Computational Theory and Mathematics
English
2017
24
10
953
968
partially_open
Bonizzoni, P., Della Vedova, G., Pirola, Y., Previtali, M., Rizzi, R. (2017). FSG: Fast String Graph Construction for de Novo Assembly. JOURNAL OF COMPUTATIONAL BIOLOGY, 24(10), 953-968 [10.1089/cmb.2017.0089].
File in questo prodotto:
File Dimensione Formato  
FSG Fast String Graph Construction for De Novo As.pdf

Solo gestori archivio

Tipologia di allegato: Publisher’s Version (Version of Record, VoR)
Dimensione 421.05 kB
Formato Adobe PDF
421.05 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
FSG.pdf

accesso aperto

Tipologia di allegato: Author’s Accepted Manuscript, AAM (Post-print)
Dimensione 233.93 kB
Formato Adobe PDF
233.93 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/185846
Citazioni
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 6
Social impact