Liu, F., Zhang, B., Ciucci, D., Wu, W., Min, F. (2018). A comparison study of similarity measures for covering-based neighborhood classifiers. INFORMATION SCIENCES, 448-449, 1-17 [10.1016/j.ins.2018.03.030].

A comparison study of similarity measures for covering-based neighborhood classifiers

Ciucci, Davide;
2018

Abstract

In data mining, neighborhood classifiers are applicable not only to numeric data but also to symbolic data. The key issue for a neighborhood classifier is how to measure the similarity between two instances. In this paper, we compare six similarity measures for symbolic data, namely Overlap, Eskin, occurrence frequency (OF), inverse OF (IOF), Goodall3, and Goodall4, under the framework of a covering-based neighborhood classifier. In the training stage, a covering of the universe is built based on the given similarity measure. A covering reduction algorithm then removes some of the covering blocks and determines the representatives. In the testing stage, the similarities between each unlabeled instance and the representatives are computed. The closest representative, or a few of the closest representatives, determines the predicted class label of the unlabeled instance. We compared the six similarity measures in experiments on 15 University of California-Irvine (UCI) datasets. The results demonstrate that although no measure dominated the others in all scenarios, some measures had consistently high performance. The covering-based neighborhood classifier with appropriate similarity measures, such as Overlap, IOF, and OF, performed better than the ID3, C4.5, and Naïve Bayes classifiers.
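The testing stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Overlap and OF measures are written in their commonly cited per-attribute forms, which may differ in detail from the variants used in the paper, and the toy data, the hand-picked representatives, and the names `overlap`, `make_of`, and `predict` are all hypothetical. The covering construction and covering reduction steps are not shown, since the abstract does not specify them.

```python
import math
from collections import Counter


def overlap(x, y):
    """Overlap similarity: fraction of attributes on which x and y agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def make_of(data):
    """Build an occurrence-frequency (OF) similarity from training data.

    Assumed form: matching values score 1; mismatching values a, b score
    1 / (1 + log(N / f(a)) * log(N / f(b))), averaged over attributes.
    """
    n = len(data)
    freqs = [Counter(column) for column in zip(*data)]

    def of(x, y):
        total = 0.0
        for k, (a, b) in enumerate(zip(x, y)):
            if a == b:
                total += 1.0
            else:
                # Assumption: an unseen value is treated as having frequency 1.
                fa = max(freqs[k][a], 1)
                fb = max(freqs[k][b], 1)
                total += 1.0 / (1.0 + math.log(n / fa) * math.log(n / fb))
        return total / len(x)

    return of


def predict(x, representatives, sim):
    """Return the label of the representative most similar to x."""
    _, label = max(representatives, key=lambda rep: sim(x, rep[0]))
    return label


# Hypothetical toy data: (color, shape, taste) instances.
train = [
    ("red", "round", "sweet"),
    ("red", "round", "sour"),
    ("yellow", "long", "sweet"),
    ("green", "long", "sweet"),
]
# Hand-picked representatives; in the paper these come from covering reduction.
representatives = [
    (("red", "round", "sweet"), "apple"),
    (("yellow", "long", "sweet"), "banana"),
]

query = ("green", "long", "sweet")
of = make_of(train)
print(predict(query, representatives, overlap))
print(predict(query, representatives, of))
```

Because `predict` takes the similarity function as an argument, other measures can be swapped in without changing the prediction step, which mirrors the comparison setting the abstract describes. Ties between representatives are resolved here by list order, which is a simplification of this sketch.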
Type: Journal article - Scientific article
Keywords: Classifier; Covering-based rough set; Representative; Similarity measure
Language: English
Year: 2018
Volume: 448-449
First page: 1
Last page: 17
Access: partially open
Files in this record:

Liu-2018-Informat Sci-VoR.pdf
Access: archive managers only
Description: Research Article
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 1.78 MB
Format: Adobe PDF

Liu-2018-Informat Sci-AAM.pdf
Access: open access
Description: Research Article
Attachment type: Author’s Accepted Manuscript, AAM (Post-print)
License: Creative Commons
Size: 990.44 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/196357
Citations
  • Scopus 18
  • ISI (Web of Science) 17