Text modeling techniques have been used in a variety of applications, among them the analysis of documents to detect latent topics. One of the most commonly used tools for topic modeling is the latent Dirichlet allocation (LDA), which represents a document in terms of its latent topic structure. This chapter proposes a collapsed Gibbs sampling algorithm for estimating the parameters of the flexible LDA (FLDA) and for gaining insight into topic distributions over documents. It presents the distributions on the simplex that are considered, such as the Dirichlet and the flexible Dirichlet. The chapter introduces latent topic models with a focus on the LDA and the FLDA, and describes the main sampling schemes and estimation procedures used for both models. Finally, it shows the performance of the new FLDA model through extensive simulation studies.
Giampino, A., Ascari, R., Migliorati, S. (2024). A Flexible Generalization of the Latent Dirichlet Allocation. In Y. Dimotikalis, C.H. Skiadas (Eds.), Data Analysis and Related Applications 4: New Approaches: Volume 12 (pp. 109-122). Wiley [10.1002/9781394316915.ch8].
A Flexible Generalization of the Latent Dirichlet Allocation
Giampino A.; Ascari R.; Migliorati S.
2024