The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy these properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real-case scenario. Our results indicate that Cross Entropy can yield fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit more intuitive behavior, at least for the case of binary classification.
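To make the three measures named above concrete, here is a minimal Python sketch that computes them for a binary soft label. It is an illustration only and is not taken from the paper: the function names, the `eps` clipping constant, and the example distributions are all assumptions chosen for this sketch.

```python
import math


def cross_entropy(target, pred, eps=1e-12):
    """Cross entropy H(target, pred) in nats between two discrete distributions.

    `eps` is an assumed clipping constant that avoids log(0).
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def manhattan(target, pred):
    """Manhattan (L1) distance between two distributions."""
    return sum(abs(t - p) for t, p in zip(target, pred))


def euclidean(target, pred):
    """Euclidean (L2) distance between two distributions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)))


# Hypothetical soft label: 60% of annotators chose class 0, 40% chose class 1.
soft_label = [0.6, 0.4]

# Two hypothetical predictions: one matching the soft label exactly,
# one collapsing onto the majority class.
predictions = {
    "matches soft label": [0.6, 0.4],
    "majority class only": [1.0, 0.0],
}

for name, pred in predictions.items():
    print(
        f"{name}: "
        f"CE={cross_entropy(soft_label, pred):.4f}, "
        f"L1={manhattan(soft_label, pred):.4f}, "
        f"L2={euclidean(soft_label, pred):.4f}"
    )
```

One property that is easy to check with this sketch: when the prediction equals the soft label, both distances are exactly zero, while cross entropy equals the entropy of the soft label and is therefore strictly positive for any label that is not one-hot. Whether this corresponds to the specific behaviors analysed in the paper is not stated in this record.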

Rizzi, G., Leonardelli, E., Poesio, M., Uma, A., Pavlovic, M., Paun, S., et al. (2024). Soft metrics for evaluation with disagreements: an assessment. In 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings (pp.84-94). European Language Resources Association (ELRA).

Soft metrics for evaluation with disagreements: an assessment

Rizzi G. (first author); Fersini E. (last author)
2024

Type: paper
Keywords: Binary classification; Cross entropy; Euclidean distance; Evaluation metrics; Manhattan distance; Property; Real case scenarios; Soft evaluations; Soft-metric; Theoretical framework
Language: English
Conference: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024
Conference year: 2024
Editors: Abercrombie, G; Basile, V; Bernardi, D; Dudy, S; Frenda, S; Havens, L; Tonelli, S
Proceedings: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings
ISBN: 9782493814234
Publication year: 2024
First page: 84
Last page: 94
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/590161