Bicocca Open Archive

Soft metrics for evaluation with disagreements: an assessment

Rizzi G.; Leonardelli E.; Poesio M.; Uma A.; Pavlovic M.; Paun S.; Rosso P.; Fersini E.

2024

Abstract

The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy these properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real-case scenario. Our results indicate that Cross Entropy can yield fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit more intuitive behavior, at least for the case of binary classification.
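
As a reading aid, the three measures named in the abstract can be sketched for a single binary item. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the `eps` clipping constant, and the example distributions are all hypothetical, and the natural logarithm is assumed for Cross Entropy.

```python
import math

def cross_entropy(target, pred, eps=1e-12):
    # H(target, pred) = -sum_i target_i * log(pred_i), natural log.
    # eps is an assumed clipping constant that avoids log(0).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))

def manhattan(target, pred):
    # L1 distance between the two distributions.
    return sum(abs(t - p) for t, p in zip(target, pred))

def euclidean(target, pred):
    # L2 distance between the two distributions.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)))

# Hypothetical soft label: 70% of annotators chose the positive class.
target = [0.7, 0.3]
# Hypothetical model output for the same item.
pred = [0.6, 0.4]

scores = {
    "cross_entropy": cross_entropy(target, pred),
    "manhattan": manhattan(target, pred),
    "euclidean": euclidean(target, pred),
}
```

One general mathematical fact is visible with these definitions: for a prediction identical to a non-degenerate soft label, both distances are zero, while Cross Entropy equals the entropy of the label and is therefore strictly positive.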

Contribution type: paper
Keywords: Binary classification; Cross entropy; Euclidean distance; Evaluation metrics; Manhattan distance; Property; Real case scenarios; Soft evaluations; Soft-metric; Theoretical framework
Content language: English
Conference name: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024
Conference year: 2024
Volume editors: Abercrombie, G; Basile, V; Bernardi, D; Dudy, S; Frenda, S; Havens, L; Tonelli, S
Proceedings title: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings
Proceedings ISBN: 9782493814234
Publication date: 2024
First page: 84
Last page: 94
Full text: none
Citation: Rizzi, G., Leonardelli, E., Poesio, M., Uma, A., Pavlovic, M., Paun, S., et al. (2024). Soft metrics for evaluation with disagreements: an assessment. In 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings (pp. 84-94). European Language Resources Association (ELRA).
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/590161
