Barcode demultiplexing of nanopore sequencing raw signals by unsupervised machine learning

Papetti, Daniele M.; Spolaor, Simone; Besozzi, Daniela
2023

Abstract

Introduction: Oxford Nanopore Technologies (ONT) is a third-generation sequencing approach that allows the analysis of individual, full-length nucleic acids. ONT records the alterations of an ionic current flowing across a nanoscale pore while a DNA or RNA strand threads through the pore. Basecalling methods are then used to translate the recorded signal back into the nucleic acid sequence. However, basecalling generally introduces errors that hinder barcode demultiplexing, a pivotal task in single-cell RNA sequencing that separates the sequenced transcripts according to their cell of origin. Methods: To address this issue, we present a novel framework, called UNPLEX, designed to tackle the barcode demultiplexing problem by operating directly on the recorded signals. UNPLEX combines two unsupervised machine learning methods: autoencoders and self-organizing maps (SOM). The autoencoders extract compact, latent representations of the recorded signals, which are then clustered by the SOM. Results and Discussion: Our results, obtained on two datasets composed of in silico generated ONT-like signals, show that UNPLEX represents a promising starting point for the development of effective tools to cluster the signals corresponding to the same cell.
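The two-stage idea described in the Methods (compress each signal into a latent code, then cluster the codes on a self-organizing map) can be illustrated with a minimal sketch. This is not the UNPLEX implementation: the toy signals, all function names, and all parameter values below are assumptions made for illustration. In place of a trained autoencoder, the sketch uses PCA computed via SVD, chosen because an optimal linear autoencoder spans the same subspace; a real pipeline would presumably use a nonlinear network.

```python
import numpy as np

def simulate_signals(n_barcodes=3, per_barcode=50, length=64, noise=0.2, seed=0):
    """Toy stand-in for ONT-like signals: one random template per barcode plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    templates = rng.normal(size=(n_barcodes, length))
    labels = np.repeat(np.arange(n_barcodes), per_barcode)
    signals = templates[labels] + noise * rng.normal(size=(labels.size, length))
    return signals, labels

def encode_pca(signals, latent_dim=2):
    """Project centered signals onto the top principal directions (linear encoder stand-in)."""
    centered = signals - signals.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:latent_dim].T

def train_som(codes, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training with a Gaussian neighborhood and linearly decaying rate and radius."""
    rng = np.random.default_rng(seed)
    n_units = rows * cols
    # Grid coordinates of each unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize unit weights from randomly chosen latent codes.
    weights = codes[rng.choice(len(codes), size=n_units, replace=False)].copy()
    total_steps = epochs * len(codes)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(codes)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 + (0.3 - sigma0) * frac  # shrink radius from sigma0 towards 0.3
            # Best-matching unit: the unit whose weight vector is closest to the sample.
            bmu = np.argmin(((weights - codes[i]) ** 2).sum(axis=1))
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbors towards the sample.
            weights += lr * h[:, None] * (codes[i] - weights)
            step += 1
    return weights

def assign_units(codes, weights):
    """Return, for each latent code, the index of its best-matching SOM unit."""
    d2 = ((codes[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

signals, labels = simulate_signals()
codes = encode_pca(signals)
weights = train_som(codes)
units = assign_units(codes, weights)
```

In this sketch, signals that share a SOM unit in `units` are treated as candidates for the same barcode; the `labels` array exists only because the data are simulated and would not be available for real reads.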
Type: Journal article - Scientific article
Keywords: artificial intelligence; autoencoder; complexity reduction; nanopore; RNA barcoding; scRNA-seq; self-organising map; unsupervised learning
Language: English
Year: 2023
Volume: 3
Article number: 1067113
Access: open
Papetti, D., Spolaor, S., Nazari, I., Tirelli, A., Leonardi, T., Caprioli, C., et al. (2023). Barcode demultiplexing of nanopore sequencing raw signals by unsupervised machine learning. FRONTIERS IN BIOINFORMATICS, 3 [10.3389/fbinf.2023.1067113].
Files in this item:
File: 10281-414276_VoR.pdf
Access: open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 4.17 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/414276
Citations
  • Scopus 4
  • Web of Science (ISI) 3