Minimizer sketches summarize sequences compactly so that many reads can be compared with a small memory footprint; they are based on the notion of the lexicographically smallest k-mer in a window. Building on Lyndon factorization and on fingerprints, a compact representation of a sequence given by the lengths of the factors in its factorization, we extend the notion of minimizer sketches to read fingerprints. By leveraging the conservation property of Lyndon factorization, we propose a novel approach for the fast comparison of long reads to detect overlapping read pairs. An experimental evaluation of assemblies produced from the overlaps computed by our approach shows that it is competitive with the state-of-the-art tool minimap2 in terms of quality, while being up to 5 times faster at higher coverage levels.
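The two building blocks named in the abstract can be illustrated with a minimal sketch. It assumes the standard Duval algorithm for Lyndon factorization and a textbook window-minimizer scheme applied to k-mers of factor lengths. The function names and the parameters `k` and `w` are hypothetical and chosen for illustration only; the abstract does not specify how the paper defines sketches over fingerprints, so this is not the authors' implementation.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it remains a prefix of a Lyndon-word power.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def fingerprint(s):
    """Fingerprint of s: the lengths of its Lyndon factors, in order."""
    return [len(f) for f in lyndon_factorization(s)]


def minimizers(items, k, w):
    """Sorted (position, k-mer) pairs: the smallest k-mer in each window of w k-mers.

    Assumption: k-mers are tuples of consecutive items compared
    lexicographically, with ties broken by the leftmost position.
    """
    kmers = [tuple(items[i:i + k]) for i in range(len(items) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        best = min(range(w), key=lambda t: (window[t], t))
        picked.add((start + best, window[best]))
    return sorted(picked)
```

For example, `fingerprint("banana")` yields `[1, 2, 2, 1]`, corresponding to the factors `b`, `an`, `an`, `a`, and `minimizers([1, 2, 2, 1], 2, 2)` selects the k-mers `(1, 2)` at position 0 and `(2, 1)` at position 2.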
Masri, O., Pirola, Y., Borghi, C., Bonizzoni, P., Rizzi, R. (2025). Minimizer Sketches of Lyndon-Fingerprints: A Novel Approach to Compute Overlaps Among Long Reads. In 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 610-613). IEEE Computer Society [10.1109/bibm66473.2025.11356736].
Minimizer Sketches of Lyndon-Fingerprints: A Novel Approach to Compute Overlaps Among Long Reads
Pirola, Yuri; Bonizzoni, Paola; Rizzi, Raffaella
2025
| File | Size | Format | Access |
|---|---|---|---|
| Masri-2025-BIBM-VoR.pdf (Publisher’s Version, Version of Record; all rights reserved) | 354.41 kB | Adobe PDF | Repository managers only |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


