Bicocca Open Archive

Liu, F., Zhang, B., Ciucci, D., Wu, W., Min, F. (2018). A comparison study of similarity measures for covering-based neighborhood classifiers. INFORMATION SCIENCES, 448-449, 1-17 [10.1016/j.ins.2018.03.030].

A comparison study of similarity measures for covering-based neighborhood classifiers

Liu, Fu-Lun; Zhang, Ben-Wen; Ciucci, Davide; Wu, Wei-Zhi; Min, Fan

2018

Abstract

In data mining, neighborhood classifiers apply not only to numeric data but also to symbolic data. The key issue for a neighborhood classifier is how to measure the similarity between two instances. In this paper, we compare six similarity measures for symbolic data, namely Overlap, Eskin, occurrence frequency (OF), inverse OF (IOF), Goodall3, and Goodall4, within the framework of a covering-based neighborhood classifier. In the training stage, a covering of the universe is built from the given similarity measure. A covering reduction algorithm then removes some of the covering blocks and determines the representatives. In the testing stage, the similarities between each unlabeled instance and the representatives are computed, and the closest representative, or a few representatives, determine the predicted class label of the unlabeled instance. We compared the six similarity measures in experiments on 15 datasets from the University of California, Irvine (UCI) repository. The results demonstrate that although no measure dominated the others in all scenarios, some measures performed consistently well. The covering-based neighborhood classifier with appropriate similarity measures, such as Overlap, IOF, and OF, outperformed the ID3, C4.5, and Naïve Bayes classifiers.
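The abstract names the measures and the training/testing stages but gives no formulas. As a rough illustration only, the Python sketch below assumes the per-attribute definitions these six measure names usually carry in the categorical-similarity literature (match and mismatch scores built from the value frequency f_k(x), the number of distinct values n_k, and the training-set size N), averages them over attributes, and uses a hypothetical, simplified covering step: each training instance's block holds the instances more similar to it than its most similar instance of another class, and a greedy set-cover pass picks the representatives. The function names `attr_sim`, `similarity`, `fit` and `predict`, the unweighted averaging, and the handling of unseen test values are all assumptions; the paper's actual covering construction, reduction algorithm, and multi-representative voting may differ.

```python
import math
from collections import Counter

# Hypothetical sketch, not the authors' implementation. The measure formulas follow
# common literature definitions; the covering and reduction steps are simplified assumptions.
MEASURES = ("overlap", "eskin", "of", "iof", "goodall3", "goodall4")


def attr_sim(measure, x, y, f, n_values, N):
    """Per-attribute similarity; f counts this attribute's values in the training data."""
    if x == y:
        if measure in ("goodall3", "goodall4"):
            # Sample estimate p^2(x) = f(x)(f(x) - 1) / (N(N - 1)); assumes N >= 2.
            p2 = f[x] * (f[x] - 1) / (N * (N - 1))
            return 1.0 - p2 if measure == "goodall3" else p2
        return 1.0  # Overlap, Eskin, OF and IOF score every match as 1.
    fx, fy = max(f[x], 1), max(f[y], 1)  # Assumption: unseen test values count as 1.
    if measure == "eskin":
        return n_values ** 2 / (n_values ** 2 + 2)
    if measure == "iof":
        return 1.0 / (1.0 + math.log(fx) * math.log(fy))
    if measure == "of":
        return 1.0 / (1.0 + math.log(N / fx) * math.log(N / fy))
    return 0.0  # Overlap, Goodall3 and Goodall4 score every mismatch as 0.


def similarity(measure, a, b, freqs, N):
    """Unweighted mean of the per-attribute similarities (assumed weights 1/d)."""
    if measure not in MEASURES:
        raise ValueError(f"unknown measure: {measure}")
    return sum(attr_sim(measure, a[k], b[k], freqs[k], len(freqs[k]), N)
               for k in range(len(a))) / len(a)


def fit(data, labels, measure):
    """Hypothetical training: neighborhood blocks, then greedy covering reduction."""
    N = len(data)
    freqs = [Counter(row[k] for row in data) for k in range(len(data[0]))]
    S = [[similarity(measure, a, b, freqs, N) for b in data] for a in data]
    blocks = []
    for i in range(N):
        # Assumed radius rule: keep instances strictly more similar than the nearest
        # instance of another class, so every block is label-pure; i is always kept.
        enemy = max((S[i][j] for j in range(N) if labels[j] != labels[i]), default=-1.0)
        blocks.append({j for j in range(N) if S[i][j] > enemy} | {i})
    uncovered, reps = set(range(N)), []
    while uncovered:
        # Greedy reduction: keep the block covering the most still-uncovered instances;
        # its centre becomes a representative.
        best = max(range(N), key=lambda c: len(blocks[c] & uncovered))
        reps.append(best)
        uncovered -= blocks[best]
    return [(data[i], labels[i]) for i in reps], freqs, N


def predict(model, x, measure):
    """Label of the single most similar representative (ties go to the first one)."""
    reps, freqs, N = model
    return max(reps, key=lambda r: similarity(measure, x, r[0], freqs, N))[1]


train = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "cool"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]
model = fit(train, labels, "overlap")
print(len(model[0]), predict(model, ("rainy", "cool"), "overlap"))  # 2 yes
```

On this toy data, Overlap keeps two representatives, one per class. Greedy selection is the usual approximation for set cover and only stands in for the paper's covering reduction algorithm; voting among "a few representatives" would replace the single `max` in `predict`.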

	Subtype
	
				Journal article - Scientific article
			
	Keywords
	
				Classifier; Covering-based rough set; Representative; Similarity measure
			
	Content language
	
				English
			
	Publication date
	
				2018
			
	Journal
	
				INFORMATION SCIENCES
			
	Volume
	
				448-449
			
	First page
	
				1
			
	Last page
	
				17
			
	Article DOI
	
				https://dx.doi.org/10.1016/j.ins.2018.03.030
			
	Alternative URL
	
				http://www.journals.elsevier.com/information-sciences/
			
	Full text
	
				Partially open
			
	Appears in types:
	
				01 - Journal article

Files in this item:

File	Access	Description	Attachment type	License	Size	Format
Liu-2018-Informat Sci-VoR.pdf	Archive managers only	Research Article	Publisher’s Version (Version of Record, VoR)	All rights reserved	1.78 MB	Adobe PDF
Liu-2018-Informat Sci-AAM.pdf	Open access	Research Article	Author’s Accepted Manuscript, AAM (Post-print)	Creative Commons	990.44 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/196357
