Masri, O., Pirola, Y., Borghi, C., Bonizzoni, P., Rizzi, R. (2025). Minimizer Sketches of Lyndon-Fingerprints: A Novel Approach to Compute Overlaps Among Long Reads. In 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp.610-613). IEEE Computer Society [10.1109/bibm66473.2025.11356736].

Minimizer Sketches of Lyndon-Fingerprints: A Novel Approach to Compute Overlaps Among Long Reads

Pirola, Yuri;Bonizzoni, Paola;Rizzi, Raffaella
2025

Abstract

Minimizer sketches summarize sequences with the main purpose of keeping the footprint used for comparing multiple reads as small as possible; they are based on the notion of the lexicographically smallest k-mer in a window. Building on Lyndon factorization and on fingerprints, a compact representation of a sequence given by the lengths of the factors in its factorization, we extend the notion of minimizer sketches to read fingerprints. By leveraging the conservation property of Lyndon factorization, we propose a novel approach for the fast comparison of long reads to detect overlapping read pairs. An experimental evaluation of assemblies produced from the overlaps computed by our approach shows that it is competitive with the state-of-the-art tool minimap2 in terms of quality, while being up to 5 times faster at higher coverage levels.
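
The three notions named in the abstract (Lyndon factorization, fingerprint, minimizer) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the textbook definitions only, using Duval's algorithm for the factorization, the list of factor lengths as the fingerprint, and a plain window minimizer over k-mers of that list. The function names, the parameters `k` and `w`, the tie-breaking rule (leftmost) and the ordering on fingerprint k-mers (tuple comparison) are all hypothetical choices made for illustration; this record does not state the ones used in the paper.

```python
from typing import List, Sequence, Tuple


def lyndon_factorization(s: str) -> List[str]:
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors = []
    n = len(s)
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repeated Lyndon words as far as possible.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit each full repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def fingerprint(s: str) -> List[int]:
    """Fingerprint as described in the abstract: the lengths of the Lyndon factors."""
    return [len(f) for f in lyndon_factorization(s)]


def minimizers(seq: Sequence, k: int, w: int) -> List[Tuple[int, tuple]]:
    """For every window of w consecutive k-mers, keep the smallest k-mer.

    Ties are broken by taking the leftmost k-mer (an assumption of this sketch).
    Returns the distinct (position, k-mer) pairs sorted by position.
    """
    kmers = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda t: window[t])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    read = "GATTACA"
    print(lyndon_factorization(read))
    fp = fingerprint(read)
    print(fp)
    # Minimizers computed on the fingerprint instead of on the nucleotides.
    print(minimizers(fp, k=2, w=2))
```

Because the sketch is computed over the integer fingerprint rather than over the read itself, the sequence being scanned is at most as long as the read and usually shorter. How the paper selects and compares fingerprint k-mers between reads is not described in this record.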
Type: paper
Keywords: Lyndon factorization; overlap; Bioinformatics
Language: English
Conference: 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 15-18 December 2025
Published in: 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
ISBN: 9798331515577
Year: 2025
Pages: 610-613
Access: reserved
Files in this item:
File: Masri-2025-BIBM-VoR.pdf
Archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 354.41 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/591523