Chicco, D., Dora, S., Oneto, L. (2026). DBSCAN applied to EHRs data from patients with glioblastoma clusters patients based on cytosolic Hsp70 protein, sex, and brain subventricular zone. BioData Mining, 19(1). doi:10.1186/s13040-026-00549-x
DBSCAN applied to EHRs data from patients with glioblastoma clusters patients based on cytosolic Hsp70 protein, sex, and brain subventricular zone
Chicco D.
2026
Abstract
Glioblastoma is an aggressive brain cancer that kills approximately one hundred thousand people worldwide every year. Unfortunately, treatment and therapy for patients with this disease are complicated and have limited efficacy in improving individuals' chances of survival. Electronic health records (EHRs) contain patient information collected routinely at hospitals through medical visits and laboratory tests, providing an interesting source of data for computational analyses. Clustering is an area of unsupervised machine learning where an algorithm partitions data according to certain statistical properties or rules, thereby identifying hidden patterns and correlations that would otherwise be difficult to notice. In this study, considering two possible clusters, we applied several clustering techniques to three open datasets (Munich2019, Tainan2020, and Utrecht2019) derived from electronic health records, which included clinical, genetic, and administrative features of patients diagnosed with glioblastoma. We evaluated our clustering results with the Density-Based Clustering Validation (DBCV) index, a relatively new score capable of accurately assessing both convex-shaped and concave-shaped clusters. Among the methods tested, Density-based Spatial Clustering of Applications with Noise (DBSCAN) yielded the best results across all three datasets. We then analyzed the features of the clusters identified by DBSCAN and found that cytosolic Hsp70 protein in the Munich2019 dataset, sex in the Tainan2020 dataset, and brain subventricular zone in the Utrecht2019 dataset significantly distinguished the two clusters.

| File | Size | Format |
|---|---|---|
| Chicco et al-2026-BioData Mining-VoR.pdf (open access). Attachment type: Publisher’s Version (Version of Record, VoR). License: Creative Commons | 1.87 MB | Adobe PDF |
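
To make the density-based clustering named in the abstract concrete, here is a minimal pure-Python sketch of the general DBSCAN procedure. It is an illustration under stated assumptions, not the authors' pipeline: the toy 2D points, the `eps` and `min_samples` values, and the helper names `region_query` and `dbscan` are all hypothetical, and neither the study's datasets nor the DBCV scoring step is reproduced.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # marker for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Assign a cluster id (0, 1, ...) or NOISE to every point."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
group_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
outlier = [(10.0, 0.0)]
points = group_a + group_b + outlier

# Illustrative parameters only, not the values used in the study.
labels = dbscan(points, eps=0.2, min_samples=3)
print(labels)
```

On this toy input the two dense groups receive separate cluster ids and the isolated point is labeled as noise. Unlike methods that force every point into a cluster, DBSCAN can leave such points unassigned.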
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


