An ultrametric Gaussian mixture model is a powerful tool for modeling hierarchical relationships among latent concepts, making it ideal for studying complex phenomena in diverse and potentially heterogeneous populations. However, in many cases, only an incomplete set of observations is available on the phenomenon under study. To address this issue, we propose MissUGMM, an ultrametric Gaussian mixture model which takes into account the missing at random mechanism for the unobserved values. Our approach is estimated using the expectation-maximization algorithm and achieves favorable results in comparison to other existing mixture models in simulations conducted with synthetic and benchmark data sets, even without a theorized ultrametric structure underlying the data. Furthermore, MissUGMM is applied to a real-world problem for exploring the sustainable development of cities across countries starting from incomplete information provided by municipalities. Overall, our results demonstrate that MissUGMM is a powerful and versatile model in dealing with missing data and is applicable to a broader range of real-world problems.
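As a rough illustration of how an expectation-maximization algorithm can accommodate values that are missing at random in a Gaussian mixture, the standard conditional-Gaussian quantities used in the E-step are sketched below. The notation (superscripts o and m for the observed and missing coordinates of observation i, mixing weights \pi_g, responsibilities \tau_{ig}) is an assumption made for this sketch and is not taken from the article; the ultrametric constraints that MissUGMM places on the component covariance matrices are not shown.

```latex
% Generic E-step quantities for a Gaussian mixture with MAR data.
% Assumed notation: x_i = (x_i^o, x_i^m); component g has mean \mu_g,
% covariance \Sigma_g, and mixing weight \pi_g, partitioned conformably.
\begin{align*}
\mathbf{x}_i^{m} \mid \mathbf{x}_i^{o}, z_{ig}=1
  &\sim \mathcal{N}\!\left(\tilde{\boldsymbol{\mu}}_{ig}^{m},\,
        \tilde{\boldsymbol{\Sigma}}_{ig}^{mm}\right), \\
\tilde{\boldsymbol{\mu}}_{ig}^{m}
  &= \boldsymbol{\mu}_g^{m}
   + \boldsymbol{\Sigma}_g^{mo}\left(\boldsymbol{\Sigma}_g^{oo}\right)^{-1}
     \left(\mathbf{x}_i^{o} - \boldsymbol{\mu}_g^{o}\right), \\
\tilde{\boldsymbol{\Sigma}}_{ig}^{mm}
  &= \boldsymbol{\Sigma}_g^{mm}
   - \boldsymbol{\Sigma}_g^{mo}\left(\boldsymbol{\Sigma}_g^{oo}\right)^{-1}
     \boldsymbol{\Sigma}_g^{om}, \\
% Responsibilities depend only on the observed coordinates.
\tau_{ig}
  &= \frac{\pi_g\,\phi\!\left(\mathbf{x}_i^{o};\,
           \boldsymbol{\mu}_g^{o}, \boldsymbol{\Sigma}_g^{oo}\right)}
          {\sum_{h}\pi_h\,\phi\!\left(\mathbf{x}_i^{o};\,
           \boldsymbol{\mu}_h^{o}, \boldsymbol{\Sigma}_h^{oo}\right)}.
\end{align*}
```

In a scheme of this kind, the M-step would update the means and covariances from the responsibility-weighted conditional moments rather than from fully observed data; how the article combines this with the ultrametric covariance structure is described in the paper itself.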

Greselin, F., Zaccaria, G. (2025). Studying hierarchical latent structures in heterogeneous populations with missing information. Journal of Classification, 42(2), 284-310. DOI: 10.1007/s00357-024-09492-0.

Studying hierarchical latent structures in heterogeneous populations with missing information

Greselin, F.; Zaccaria, G.
2025

Type: Journal article - Scientific article
Keywords: Cities’ sustainable development; Gaussian mixture model; Hierarchy of latent concepts; Missing data; Ultrametricity
Language: English
Date: 4 Oct 2024
Year: 2025
Volume: 42
Issue: 2 (July 2025)
First page: 284
Last page: 310
Access: open
Files in this record:
Greselin-Zaccaria-2025-Journal of Classification-VoR.pdf (Adobe PDF, 1.24 MB, open access)
Description: This article is licensed under a Creative Commons Attribution 4.0 International License
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/517623