Marteletto, G., Galuzzi, B., Damiani, C. (2025). Identifying Damage-Related Features in scRNA-seq Data. In Computational Intelligence Methods for Bioinformatics and Biostatistics 18th International Meeting, CIBB 2023, Padova, Italy, September 6–8, 2023, Revised Selected Papers (pp.192-201). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-90714-2_14].

Identifying Damage-Related Features in scRNA-seq Data

Galuzzi B. G.; Damiani C. (last author)
2025

Abstract

Quality control (QC) is fundamental in single-cell RNA sequencing (scRNA-seq) data analysis pipelines to ensure data reliability. A critical QC step involves identifying damaged cells using quality metrics such as the percentage of mitochondrial genes or the total number of reads. However, automatically determining thresholds on these metrics for filtering damaged cells can be challenging. Moreover, relying on these metrics alone may result in the removal of biologically meaningful cells. This study aims to find alternative biomarkers to improve the identification of damaged cells, focusing on gene lists other than mitochondrial genes. We hypothesized that genes localized within other organelles, particularly the nucleus, would exhibit enrichment patterns similar to those of mitochondrial genes. To test this hypothesis, we used a public scRNA-seq dataset in which damaged cells were labelled via optical inspection. As potential descriptors, we considered the percentage of genes from various lists, in particular lists of transcripts detected within the nucleus. We built a binary logistic regression model to differentiate damaged cells from good cells and evaluated its performance. Our results showed that the traditional criteria, such as mitochondrial genes, number of genes, and total counts, successfully identified damaged cells but tended to overestimate damage. Our findings suggest that although standard features are effective, their poor precision can be problematic. Incorporating other gene lists, particularly those related to nuclear transcripts, into classification models can improve the prediction of damaged cells. Further investigation is needed to understand the underlying mechanisms driving these relationships.
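The workflow outlined in the abstract (per-cell gene-list percentages used as descriptors, followed by a binary logistic regression scored on precision) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the gene lists, the toy count profiles, the labels, and the hand-written gradient-descent fit are not the data, the lists, or the implementation used in the paper, and the descriptor is computed here as a share of total counts.

```python
import math

# Illustrative gene lists (assumptions, not the lists used in the paper).
MITO_GENES = {"MT-CO1", "MT-ND1"}
NUCLEAR_GENES = {"MALAT1", "NEAT1"}


def pct_counts_in_list(cell_counts, gene_list):
    """Percentage of a cell's total counts that come from genes in gene_list."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    in_list = sum(c for gene, c in cell_counts.items() if gene in gene_list)
    return 100.0 * in_list / total


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(xi, w, b):
    """Predicted probability that a cell with features xi is damaged."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


def fit_logistic(X, y, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = score(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (damaged) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy count profiles (hypothetical); label 1 = damaged, 0 = good.
cells = [
    {"MT-CO1": 2, "MALAT1": 3, "ACTB": 60, "GAPDH": 35},
    {"MT-CO1": 4, "MT-ND1": 1, "NEAT1": 2, "ACTB": 50, "GAPDH": 43},
    {"MT-ND1": 3, "MALAT1": 5, "ACTB": 52, "GAPDH": 40},
    {"MT-CO1": 30, "MT-ND1": 10, "MALAT1": 20, "ACTB": 25, "GAPDH": 15},
    {"MT-CO1": 25, "MALAT1": 15, "NEAT1": 15, "ACTB": 30, "GAPDH": 15},
    {"MT-CO1": 20, "MT-ND1": 15, "NEAT1": 25, "ACTB": 20, "GAPDH": 20},
]
labels = [0, 0, 0, 1, 1, 1]

# Descriptors: fraction of counts in each gene list (percentage / 100).
X = [
    [pct_counts_in_list(c, MITO_GENES) / 100.0,
     pct_counts_in_list(c, NUCLEAR_GENES) / 100.0]
    for c in cells
]

w, b = fit_logistic(X, labels)
y_pred = [int(score(xi, w, b) >= 0.5) for xi in X]
precision, recall = precision_recall(labels, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is reported alongside recall because the abstract's concern is overestimated damage: a classifier that flags many good cells as damaged keeps recall high while precision drops. On real data the model would be evaluated on held-out cells, not on the cells used for fitting as in this toy example.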
Type: paper
Keywords: Damaged cells; Quality control; scRNA-seq data analysis
Language: English
Conference: 18th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2023 - 6 September 2023 through 8 September 2023
Conference year: 2023
Editors: Vettoretti, M; Tavazzi, E; Longato, E; Baruzzo, G; Bellato, M
Book title: Computational Intelligence Methods for Bioinformatics and Biostatistics 18th International Meeting, CIBB 2023, Padova, Italy, September 6–8, 2023, Revised Selected Papers
ISBN: 9783031907135
Publication year: 2025
Series volume: 14513 LNBI
First page: 192
Last page: 201
Access: reserved
Files in this item:
File: Marteletto et al-2025-CIBB-VoR.pdf

Access: archive managers only

Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 3.05 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/559077