Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets

Daniele Ramazzotti
2025

Abstract

Molecular subtypes, such as those defined by The Cancer Genome Atlas (TCGA), delineate a cancer's underlying biology and hold promise for informing a patient's prognosis and treatment plan. However, most approaches used to discover subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples, comprising 106 subtypes from 26 different cancer cohorts. The resulting models use small numbers of features to classify new samples into previously defined TCGA molecular subtypes, a step toward applying molecular subtypes in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine learning approaches and genomic data platforms. For each cancer and data type, we provide containerized versions of the top-performing models as a public resource.
Journal article - Scientific article
TCGA; artificial intelligence; biomarkers; cancer; classification; epigenomic; genomic; machine learning; molecular; pathology
English
2 January 2025
2025
open
Kyle, E., Christopher K., W., Christina, Y., Mauro A. A., C., Jordan A., L., Brian J., K., et al. (2025). Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets. CANCER CELL [10.1016/j.ccell.2024.12.002].
Files in this item:
Ellrott-2025-Cancer Cell-VoR.pdf (open access)

Description: This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 14.19 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/530921