Bicocca Open Archive

Sub-topics detection with extended flexible latent Dirichlet allocation

Ascari, Roberto; Giampino, Alice; Migliorati, Sonia

2026

Abstract

The rapid expansion of digital data has intensified the need for computational methods capable of analyzing complex latent structures across a variety of domains, including textual data. Latent topic models, particularly latent Dirichlet allocation (LDA), are widely used to uncover latent structures in large text corpora. However, the Dirichlet prior on topic proportions imposes structural limitations that reduce the model’s ability to capture complex dependencies among topics. In this paper, we introduce the extended flexible latent Dirichlet allocation (EFLDA), a probabilistic model that extends LDA by allowing richer patterns of dependence among topics. The enriched parametrization of EFLDA improves the model’s ability to represent complex thematic structures, leading to greater interpretability in real-world settings. Furthermore, we introduce the concept of sub-topics, defined as specific combinations of topics that provide a deeper understanding of corpora. We develop a collapsed Gibbs sampler for efficient inference and conduct an extensive evaluation on both synthetic data and multiple real-world applications, including mental health discourse, news articles, and microbiome data. Empirical results show that EFLDA outperforms classical LDA and recent alternative approaches in terms of topic coherence, sub-topic detection, and interpretability, while remaining robust across heterogeneous data settings characterized by complex and overlapping latent structures.

Subtype: Journal article - Scientific article
Keywords: Collapsed Gibbs sampling; Finite mixture; Latent variables; Probabilistic modeling; Topic models
Content language: English
Ahead-of-print or first online publication date: 1 Jul 2026
Publication date: 2026
Journal: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION
Article DOI: https://dx.doi.org/10.1007/s11634-026-00690-9
Full text: open
Citation: Ascari, R., Giampino, A., Migliorati, S. (2026). Sub-topics detection with extended flexible latent Dirichlet allocation. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION [10.1007/s11634-026-00690-9].
Appears in categories: 01 - Journal article

Files in this item:

File	Size	Format
Ascari et al-2026-Adv Data Anal Classif-VoR.pdf (open access; description: EFLDA_ADAC; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	4.31 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/614401
