Abstract: Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often consists of a few dominating clusters and a large number of sparsely populated ones. As a result, the clustering is hard to interpret unless one is willing to ignore a substantial number of observations and clusters. Here, we explain this phenomenon by studying the cost functions involved in the estimation of the partition. Moreover, we propose a post-processing procedure to reduce the number of sparsely populated clusters. The procedure takes the form of entropy regularization of posterior cluster allocations. Besides being computationally convenient compared with alternative strategies, it is theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution of clusters, regardless of the specific Bayesian model used.
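To make the idea of an entropy-regularized point estimate concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not state the exact loss. The sketch assumes that the base loss is the Binder loss with equal costs, that the penalty is the Shannon entropy of the cluster proportions weighted by a hypothetical tuning parameter lam, and that the search is restricted to the partitions visited by the sampler. All function names are illustrative.

```python
import numpy as np


def partition_entropy(labels):
    """Shannon entropy of the cluster proportions of one partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def posterior_similarity(samples):
    """Fraction of posterior draws in which items i and j share a cluster."""
    samples = np.asarray(samples)
    # co[s, i, j] is True when items i and j are co-clustered in draw s
    co = samples[:, :, None] == samples[:, None, :]
    return co.mean(axis=0)


def expected_binder_loss(labels, psm):
    """Posterior expected Binder loss (equal costs) of a candidate partition."""
    labels = np.asarray(labels)
    together = labels[:, None] == labels[None, :]
    upper = np.triu_indices(len(labels), k=1)  # each pair i < j counted once
    return float(np.abs(together[upper].astype(float) - psm[upper]).sum())


def entropy_regularized_estimate(samples, lam):
    """Pick the sampled partition minimising expected loss + lam * entropy.

    Assumption: candidates are the posterior draws themselves, and lam >= 0
    is a user-chosen penalty weight (lam = 0 gives the unpenalised estimate).
    """
    samples = np.asarray(samples)
    psm = posterior_similarity(samples)
    scores = [
        expected_binder_loss(c, psm) + lam * partition_entropy(c)
        for c in samples
    ]
    return samples[int(np.argmin(scores))]
```

Because the function only needs an array of sampled allocations (one row per posterior draw, one column per observation), it can be run on the output of any clustering model, in line with the model-agnostic claim in the abstract. Larger values of lam push the estimate toward partitions with lower entropy.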

Franzolini, B., Rebaudo, G. (2022). A regularized-entropy estimator to enhance cluster interpretability in Bayesian nonparametrics. In Book of Short Papers SIS 2022 (pp. 387-398). Springer.

A regularized-entropy estimator to enhance cluster interpretability in Bayesian nonparametrics

Beatrice Franzolini; G. Rebaudo
2022

Type: Book chapter or essay
Keywords: Bayesian nonparametrics, Exchangeable partition probability function, Entropy, Clustering, Dirichlet process mixture
Language: English
Book title: Book of Short Papers SIS 2022
Year: 2022
ISBN: 9788891932310
Publisher: Springer
Pages: 387-398

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/582149