Bicocca Open Archive

Bordogna, G., Pasi, G. (2012). A quality driven hierarchical data divisive soft clustering for information retrieval. KNOWLEDGE-BASED SYSTEMS, 26, 9-19 [10.1016/j.knosys.2011.06.012].

A quality driven hierarchical data divisive soft clustering for information retrieval

Bordogna, G.; Pasi, Gabriella

2012

Abstract

This paper presents an adaptive hierarchical fuzzy clustering algorithm named Hierarchical Data Divisive Soft Clustering (H2D-SC). Its main novelty is that it is quality driven: it dynamically evaluates a multi-dimensional quality measure of the clusters to drive the generation of the soft hierarchy. Specifically, it generates a hierarchy in which each node is split into a variable number of sub-nodes. This number is determined by a novel quality assessment of soft clusters that evaluates multiple dimensions: the cluster's cohesion, cardinality, mass, and fuzziness, as well as the partition's entropy. Clusters at the same hierarchical level share a minimum quality value, and clusters at lower levels of the hierarchy have higher quality, so more specific (lower-level) clusters are of higher quality than more general (upper-level) clusters. Further, since the algorithm generates a soft partition, a document can belong to several sub-clusters with distinct membership degrees. The proposed algorithm is divisive: it combines a modified bisecting K-Means algorithm with a flat soft clustering algorithm used to partition each node. The paper describes the algorithm and its evaluation on two standard collections.
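The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea, not the authors' H2D-SC. It uses standard Fuzzy C-Means (listed among the keywords) as the flat soft clustering step that splits a node. As an assumed stand-in for the paper's multi-dimensional quality measure, it uses only Bezdek's partition entropy, normalized by log(c), to choose how many sub-nodes a node is split into. The function names `fuzzy_c_means`, `partition_entropy`, and `split_node` and the `max_children` parameter are hypothetical. The sketch does not implement the modified bisecting K-Means, the cohesion, cardinality, mass, and fuzziness dimensions, or the per-level minimum quality threshold.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means. Returns (centroids, U), where U[i][j] is the
    membership degree of point i in cluster j and each row of U sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exp = 2.0 / (m - 1.0)
    # Random initial memberships, normalized so each point's row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centroids = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by u_ij ** m.
        centroids = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            tw = sum(w)
            centroids.append(
                [sum(w[i] * points[i][d] for i in range(n)) / tw for d in range(dim)]
            )
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_U = []
        for i in range(n):
            dists = [math.dist(points[i], centroids[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a centroid: give it crisp membership there.
                hits = [1.0 if dist == 0.0 else 0.0 for dist in dists]
                new_U.append([h / sum(hits) for h in hits])
                continue
            new_U.append(
                [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c)) for j in range(c)]
            )
        shift = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if shift < tol:
            break
    return centroids, U


def partition_entropy(U):
    """Bezdek's partition entropy: 0 for a crisp partition, log(c) when every
    point is spread uniformly over the c clusters."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)


def split_node(points, max_children=4, m=2.0):
    """Hypothetical quality-driven split: try c = 2..max_children sub-nodes and
    keep the soft partition with the lowest entropy normalized by log(c)."""
    best = None
    for c in range(2, min(max_children, len(points)) + 1):
        centroids, U = fuzzy_c_means(points, c, m=m)
        score = partition_entropy(U) / math.log(c)
        if best is None or score < best[0]:
            best = (score, c, centroids, U)
    return best
```

For example, `split_node([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)])` returns a tuple `(score, c, centroids, U)`. Each row of `U` gives one point's membership degrees over the `c` sub-nodes, so a point can belong to several sub-nodes at once. In a full divisive scheme, the same split would be applied recursively to each sub-node.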

	Subtype
	
				Journal article - Scientific article
			
	Keywords
	
				Cluster's quality; Document clustering; Fuzzy C-Means; Quality measures; Soft Hierarchical Clustering;
			
	Keywords
	
				Text Clustering, Information Retrieval, Fuzzy Set Theory
			
	Content language
	
				English
			
	Publication date
	
				2012
			
	Journal
	
				KNOWLEDGE-BASED SYSTEMS
			
	Volume number
	
				26
			
	First page
	
				9
			
	Last page
	
				19
			
	Article DOI
	
				https://dx.doi.org/10.1016/j.knosys.2011.06.012
			
	Fulltext
	
				none
			
	Citation
	
				Bordogna, G., Pasi, G. (2012). A quality driven hierarchical data divisive soft clustering for information retrieval. KNOWLEDGE-BASED SYSTEMS, 26, 9-19 [10.1016/j.knosys.2011.06.012].
			
	Appears in types:
	
				01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/43320