Giampino, A., Ascari, R., Migliorati, S. (2022). LEFDA: An extension of the classical LDA. In 24th International Conference on Computational Statistics (COMPSTAT 2022) and CSDA & EcoSta Workshop on Statistical Data Science (SDS 2022).

LEFDA: An extension of the classical LDA

Giampino, Alice; Ascari, Roberto; Migliorati, Sonia
2022

Abstract

Latent Dirichlet Allocation (LDA) is a popular statistical tool for analyzing text documents when the goal is to detect latent topics. A well-known limitation of LDA is its inability to model positive correlations between topics. This limitation is attributable to the rigidity of the Dirichlet distribution, the standard prior for the topic distributions. This work presents a preliminary study of the extended flexible Dirichlet (EFD) distribution as an alternative prior. The EFD is a generalization of the Dirichlet distribution, defined as a particular structured mixture that allows for positive correlations between its elements. It retains many useful theoretical properties of the Dirichlet, such as identifiability, explicit expressions for joint moments, and closure under many relevant operations on the simplex. Furthermore, its additional parameters provide more flexibility while preserving both the interpretability of the model and conjugacy with respect to the multinomial model. The generalization of LDA based on the EFD distribution is illustrated via an application to real data using Markov Chain Monte Carlo (MCMC) methods.
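
The limitation mentioned in the abstract can be checked numerically. The following is a minimal sketch, not taken from the paper: it uses the standard closed-form Dirichlet covariance, Cov(X_i, X_j) = -m_i m_j / (a0 + 1) for i != j, with m = alpha / a0 and a0 = sum(alpha), and compares its sign with the empirical correlations of simulated draws. The parameter vector `alpha`, the sample size, and the function names are illustrative assumptions. The EFD itself is not implemented here.

```python
import numpy as np


def dirichlet_covariance(alpha):
    """Closed-form covariance matrix of a Dirichlet(alpha) vector."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    # Diagonal: m_i (1 - m_i) / (a0 + 1); off-diagonal: -m_i m_j / (a0 + 1).
    return (np.diag(mean) - np.outer(mean, mean)) / (a0 + 1.0)


def dirichlet_correlations(alpha, n_samples=20000, seed=0):
    """Empirical correlation matrix of simulated Dirichlet(alpha) draws."""
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet(alpha, size=n_samples)  # shape (n_samples, K)
    return np.corrcoef(samples, rowvar=False)


# Hypothetical parameter vector for three "topics" (an assumption for illustration).
alpha = [2.0, 3.0, 5.0]
cov = dirichlet_covariance(alpha)
corr = dirichlet_correlations(alpha)

# Mask selecting the off-diagonal entries, i.e. pairs of distinct components.
off_diag = ~np.eye(len(alpha), dtype=bool)

print(corr.round(3))
print("all theoretical covariances negative:", bool((cov[off_diag] < 0).all()))
print("all empirical correlations negative:", bool((corr[off_diag] < 0).all()))
```

Whatever positive values are chosen for `alpha`, the off-diagonal covariances stay negative, which is why a Dirichlet prior on topic proportions cannot express topics that tend to co-occur.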
Type: abstract + slide
Keywords: topic modeling; textual data; mixture model; simplex distribution; LDA
Language: English
Conference: The 24th International Conference on Computational Statistics (COMPSTAT 2022), 23-26 August 2022
Proceedings: 24th International Conference on Computational Statistics (COMPSTAT 2022) and CSDA & EcoSta Workshop on Statistical Data Science (SDS 2022)
ISBN: 9781713870173
Year: 2022
URL: https://www.proceedings.com/68141.html
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/445618