Bonizzoni, P., Della Vedova, G., Pirola, Y., Previtali, M., Rizzi, R. (2017). An External-Memory Algorithm for String Graph Construction. ALGORITHMICA, 78(2), 394-424 [10.1007/s00453-016-0165-4].

An External-Memory Algorithm for String Graph Construction

Bonizzoni, P; Della Vedova, G; Pirola, Y; Previtali, M; Rizzi, R
2017

Abstract

Some recent results (Bauer et al. in Algorithms in bioinformatics, Springer, Berlin, pp. 326–337, 2012; Cox et al. in Algorithms in bioinformatics, Springer, Berlin, pp. 214–224, 2012; Rosone and Sciortino in The nature of computation. Logic, algorithms, applications, Springer, Berlin, pp. 353–364, 2013) have introduced external-memory algorithms to compute self-indexes of a set of strings, mainly via computing the Burrows–Wheeler transform of the input strings. The motivations for those results stem from bioinformatics, where a large number of short strings (called reads) are routinely produced and analyzed. In that field, a fundamental problem is to assemble a genome from a large set of much shorter samples extracted from the unknown genome. The approaches currently used to tackle this problem are memory-intensive, which does not bode well for the ongoing increase in the availability of genomic data. A data structure used in genome assembly is the string graph, where vertices correspond to samples and each arc represents a pair of overlapping samples. In this paper we address an open problem of Simpson and Durbin (Bioinformatics 26(12):i367–i373, 2010): to design an external-memory algorithm to compute the string graph.
Type: Journal article - Scientific article
Keywords: External memory algorithms, Burrows–Wheeler transform, String graphs, Genome assembly
Language: English
Date: 31 May 2016
Year: 2017
Volume: 78
Issue: 2
Pages: 394-424
Access: reserved
Files in this item:
journ-art-17-algorithmica.pdf
Access: archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
Size: 840.82 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/112989
Citations
  • Scopus 9
  • Web of Science (ISI) 8