Background: De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn graphs have become the dominant technical device in the last decade. Those two kinds of graphs are collectively called assembly graphs. Results: In this review, we discuss the most recent advances in the problem of constructing, representing and navigating assembly graphs, focusing on very large datasets. We will also explore some computational techniques, such as the Bloom filter, to compactly store graphs while keeping all functionalities intact. Conclusions: We complete our analysis with a discussion on the algorithmic issues of assembling from long reads (e.g., PacBio and Oxford Nanopore). Finally, we present some of the most relevant open problems in this field. [Figure not available: see fulltext.]

Rizzi, R., Beretta, S., Patterson, M., Pirola, Y., Previtali, M., Della Vedova, G., et al. (2019). Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. QUANTITATIVE BIOLOGY, 7(4), 278-292 [10.1007/s40484-019-0181-x].

Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era

Rizzi R.;Beretta S.;Patterson M.;Pirola Y.;Previtali M.;Della Vedova G.;Bonizzoni P.
2019

Abstract

Background: De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn graphs have become the dominant technical device in the last decade. Those two kinds of graphs are collectively called assembly graphs. Results: In this review, we discuss the most recent advances in the problem of constructing, representing and navigating assembly graphs, focusing on very large datasets. We will also explore some computational techniques, such as the Bloom filter, to compactly store graphs while keeping all functionalities intact. Conclusions: We complete our analysis with a discussion on the algorithmic issues of assembling from long reads (e.g., PacBio and Oxford Nanopore). Finally, we present some of the most relevant open problems in this field. [Figure not available: see fulltext.]
Articolo in rivista - Review Essay
de Bruijn graphs; genome assembly; long reads; overlap graphs; string graphs
English
2019
7
4
278
292
open
Rizzi, R., Beretta, S., Patterson, M., Pirola, Y., Previtali, M., Della Vedova, G., et al. (2019). Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. QUANTITATIVE BIOLOGY, 7(4), 278-292 [10.1007/s40484-019-0181-x].
File in questo prodotto:
File Dimensione Formato  
10281-257222.pdf

accesso aperto

Tipologia di allegato: Publisher’s Version (Version of Record, VoR)
Dimensione 341.46 kB
Formato Adobe PDF
341.46 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/257222
Citazioni
  • Scopus 25
  • ???jsp.display-item.citation.isi??? 20
Social impact