The rapid expansion of digital data has intensified the need for computational methods capable of analyzing complex latent structures across a variety of domains, including textual data. Latent topic models, particularly latent Dirichlet allocation (LDA), are widely used to uncover latent structures in large text corpora. However, the Dirichlet prior on topic proportions imposes structural limitations that reduce the model’s ability to capture complex dependencies among topics. In this paper, we introduce the extended flexible latent Dirichlet allocation (EFLDA), a probabilistic model that extends LDA by allowing richer patterns of dependence among topics. The enriched parametrization of EFLDA improves the model’s ability to represent complex thematic structures, leading to greater interpretability in real-world settings. Furthermore, we introduce the concept of sub-topics, defined as specific combinations of topics that provide a deeper understanding of corpora. We develop a collapsed Gibbs sampler for efficient inference and conduct an extensive evaluation on both synthetic data and multiple real-world applications, including mental health discourse, news articles, and microbiome data. Empirical results show that EFLDA outperforms classical LDA and recent alternative approaches in terms of topic coherence, sub-topic detection, and interpretability, while remaining robust across heterogeneous data settings characterized by complex and overlapping latent structures.

Ascari, R., Giampino, A., Migliorati, S. (2026). Sub-topics detection with extended flexible latent Dirichlet allocation. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION [10.1007/s11634-026-00690-9].

Sub-topics detection with extended flexible latent Dirichlet allocation

Ascari, Roberto; Giampino, Alice; Migliorati, Sonia
2026

Journal article - Scientific article
Collapsed Gibbs sampling; Finite mixture; Latent variables; Probabilistic modeling; Topic models
English
1 Jul 2026
2026
open
Files in this record:
File: Ascari et al-2026-Adv Data Anal Classif-VoR.pdf
Access: open access
Description: EFLDA_ADAC
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 4.31 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/614401