The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FMindex and Burrows-Wheeler Transform (BWT). We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the SGA assembler as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that FSG is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads.

Bonizzoni, P., DELLA VEDOVA, G., Pirola, Y., Previtali, M., Rizzi, R. (2016). FSG: Fast string graph construction for de novo assembly of reads data. In 12th International Symposium on Bioinformatics Research and Applications, ISBRA 2016; Minsk; Belarus; 5 June 2016 through 8 June 2016 (pp.27-39). Springer Verlag [10.1007/978-3-319-38782-6_3].

FSG: Fast string graph construction for de novo assembly of reads data

BONIZZONI, PAOLA;DELLA VEDOVA, GIANLUCA;PIROLA, YURI;PREVITALI, MARCO
;
RIZZI, RAFFAELLA
2016

Abstract

The string graph for a collection of next-generation reads is a lossless data representation that is fundamental for de novo assemblers based on the overlap-layout-consensus paradigm. In this paper, we explore a novel approach to compute the string graph, based on the FMindex and Burrows-Wheeler Transform (BWT). We describe a simple algorithm that uses only the FM-index representation of the collection of reads to construct the string graph, without accessing the input reads. Our algorithm has been integrated into the SGA assembler as a standalone module to construct the string graph. The new integrated assembler has been assessed on a standard benchmark, showing that FSG is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads.
slide + paper
Theoretical Computer Science; Computer Science
English
12th International Symposium on Bioinformatics Research and Applications, ISBRA 2016
2016
Bourgeois, A; Skums, P; Wan, X; Zelikovsky, A
12th International Symposium on Bioinformatics Research and Applications, ISBRA 2016; Minsk; Belarus; 5 June 2016 through 8 June 2016
9783319387819
2016
9683
27
39
reserved
Bonizzoni, P., DELLA VEDOVA, G., Pirola, Y., Previtali, M., Rizzi, R. (2016). FSG: Fast string graph construction for de novo assembly of reads data. In 12th International Symposium on Bioinformatics Research and Applications, ISBRA 2016; Minsk; Belarus; 5 June 2016 through 8 June 2016 (pp.27-39). Springer Verlag [10.1007/978-3-319-38782-6_3].
File in questo prodotto:
File Dimensione Formato  
conf-paper-16-isbra.pdf

Solo gestori archivio

Descrizione: Articolo principale
Dimensione 219.87 kB
Formato Adobe PDF
219.87 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/130712
Citazioni
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 2
Social impact