In this article, we propose a general framework for the development of external evaluation measures for soft clustering. Our proposal is based on the interpretation of soft clustering as representing uncertain information about an underlying, unknown hard clustering. We present a general construction, based on optimal transport theory, by which any evaluation measure can be naturally extended to soft clustering. The proposed “transport-based measure” provides an objective, interval-valued comparison index that represents the range of compatibility between two soft clusterings. We study the metric and complexity properties of the proposed approach, as well as its relationship with other existing proposals. We also propose approximation and bounding algorithms that make the approach practical for large datasets. Finally, we illustrate the application of the proposed method through two computational experiments.
Campagner, A., Ciucci, D., Denoeux, T. (2023). A general framework for evaluating and comparing soft clusterings. Information Sciences, 623 (April 2023), 70-93. doi:10.1016/j.ins.2022.11.114
A general framework for evaluating and comparing soft clusterings
Campagner A.; Ciucci D.; Denoeux T.
2023
File | Size | Format | |
---|---|---|---|
Campagner-2022-Information Scie-VoR.pdf (Open access; Description: Research Article; Attachment type: Author’s Accepted Manuscript, AAM (Post-print); License: Creative Commons) | 1.07 MB | Adobe PDF | |