Bicocca Open Archive

Abstract: Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often contains only a few dominant clusters and a large number of sparsely populated ones. As a result, the clustering is hard to interpret unless one is willing to ignore a substantial number of observations and clusters. Here, we explain this phenomenon by studying the cost functions involved in estimating the partition. Moreover, we propose a post-processing procedure that reduces the number of sparsely populated clusters. The procedure takes the form of an entropy regularization of posterior cluster allocations. Besides being computationally convenient compared with alternative strategies, it is theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution of clusters, regardless of the specific Bayesian model used.
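To make the idea of an entropy-regularized point estimate concrete, the following is a minimal sketch and not the authors' estimator. It assumes that posterior draws of cluster labels are already available, that the baseline loss is the Binder loss with equal costs, that the penalty is `lam` times the Shannon entropy of the cluster-size proportions, and that the search is restricted to the sampled partitions. The function names, the parameter `lam`, and the toy `draws` array are all hypothetical.

```python
import numpy as np


def partition_entropy(labels):
    """Shannon entropy of the cluster-size proportions of one partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def expected_binder_loss(labels, psm):
    """Posterior expected Binder loss (equal costs), up to a constant factor.

    psm[i, j] is the posterior probability that i and j share a cluster.
    """
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(labels), k=1)  # each pair i < j once
    return float(np.abs(same[iu].astype(float) - psm[iu]).sum())


def entropy_regularized_estimate(samples, lam):
    """Pick the sampled partition minimizing Binder loss + lam * entropy.

    Assumption: the penalty form and the restriction to sampled partitions
    are illustrative choices, not taken from the paper.
    """
    samples = np.asarray(samples)
    # Posterior similarity matrix estimated from the draws.
    psm = (samples[:, :, None] == samples[:, None, :]).mean(axis=0)
    scores = [
        expected_binder_loss(s, psm) + lam * partition_entropy(s)
        for s in samples
    ]
    return samples[int(np.argmin(scores))]


# Toy posterior draws for six observations (made-up labels, illustration only).
draws = np.array([
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

unpenalized = entropy_regularized_estimate(draws, lam=0.0)
penalized = entropy_regularized_estimate(draws, lam=2.0)
```

In this toy example the unpenalized estimate keeps the singleton cluster, while a moderate penalty selects the draw in which that observation is merged into a larger cluster. Because the procedure only needs the posterior draws, it is agnostic to the model that produced them, which matches the model-independence claimed in the abstract.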

Franzolini, B., Rebaudo, G. (2022). A regularized-entropy estimator to enhance cluster interpretability in Bayesian nonparametrics. In Book of Short Papers SIS 2022 (pp. 387-398). Springer.

A regularized-entropy estimator to enhance cluster interpretability in Bayesian nonparametrics

Beatrice Franzolini; Giovanni Rebaudo

2022



Subtype: Chapter or essay
Keywords: Bayesian nonparametrics, Exchangeable partition probability function, Entropy, Clustering, Dirichlet process mixture
Content language: English
Volume title: Book of Short Papers SIS 2022
Publication date: 2022
Volume ISBN: 9788891932310
Publisher: Springer
First page: 387
Last page: 398
Alternative URL: https://it.pearson.com/content/dam/region-core/italy/pearson-italy/pdf/Docenti/Università/Sis-2022-4c-low.pdf
Citation: Franzolini, B., Rebaudo, G. (2022). A regularized-entropy estimator to enhance cluster interpretability in Bayesian nonparametrics. In Book of Short Papers SIS 2022 (pp. 387-398). Springer.
Full text: none
Appears in types: 03 - Book contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/582149
