Campagner, A., Sternini, F., & Cabitza, F. (2022). Decisions are not all equal—Introducing a utility metric based on case-wise raters' perceptions. Computer Methods and Programs in Biomedicine, 221. https://doi.org/10.1016/j.cmpb.2022.106930
Decisions are not all equal—Introducing a utility metric based on case-wise raters’ perceptions
Campagner A.; Cabitza F.
2022
Abstract
Background and Objective: Evaluation of AI-based decision support systems (AI-DSS) is of critical importance in practical applications; nevertheless, common evaluation metrics fail to properly account for relevant contextual information. In this article we discuss a novel utility metric, the weighted Utility (wU), for the evaluation of AI-DSS, which is based on the raters' perceptions of their annotation hesitation and of the relevance of the training cases.

Methods: We discuss the relationship between the proposed metric and previous proposals, and we describe the application of the metric to both model evaluation and model optimization through three realistic case studies.

Results: We show that our metric generalizes the well-known Net Benefit, as well as other common error-based and utility-based metrics. Through the empirical studies, we show that our metric provides a more flexible tool for the evaluation of AI models. We also show that, compared to other optimization metrics, model optimization based on the wU can provide significantly better performance (AUC 0.862 vs 0.895, p-value <0.05), especially on cases judged to be more complex by the human annotators (AUC 0.85 vs 0.92, p-value <0.05).

Conclusions: We make the case for treating utility as a primary concern in the evaluation and optimization of machine learning models in critical domains, such as medicine, and for the importance of a human-centred approach that assesses the potential impact of AI models on human decision making also on the basis of further information collected during the ground-truthing process.
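The abstract describes, but does not reproduce, the definition of the wU. The sketch below is only a minimal illustration of a case-weighted utility score of the kind described, assuming per-case weights derived from the raters' reported hesitation or perceived case relevance; the function name `weighted_utility`, its parameters, and the example weights are hypothetical, not the paper's API or exact formula. It is consistent with the generalization claim in the abstract in one simple sense: with uniform weights and outcome utilities u_tp = 1, u_fp = -pt/(1-pt), u_tn = u_fn = 0, the score reduces to the familiar Net Benefit, TP/n - (pt/(1-pt)) * FP/n.

```python
import numpy as np

def weighted_utility(y_true, y_pred, case_weights,
                     u_tp=1.0, u_tn=1.0, u_fp=0.0, u_fn=0.0):
    """Case-weighted utility score (illustrative sketch, not the paper's exact wU).

    Each case contributes the utility of its (prediction, label) outcome,
    scaled by a per-case weight w_i, e.g. derived from the raters' reported
    hesitation or the perceived relevance/complexity of the case.
    The result is normalized by the total weight.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    w = np.asarray(case_weights, dtype=float)

    # Assign each case the utility of its confusion-matrix outcome (TP, TN, FP, FN)
    per_case = np.where(y_pred & y_true, u_tp,
               np.where(~y_pred & ~y_true, u_tn,
               np.where(y_pred & ~y_true, u_fp, u_fn)))

    return float(np.sum(w * per_case) / np.sum(w))


# Hypothetical usage: cases the raters found harder carry larger weights,
# so errors on those cases penalize the score more heavily.
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 0, 1]
weights = [0.2, 0.2, 1.0, 0.2, 0.4]  # e.g. inversely related to annotation confidence
print(weighted_utility(y_true, y_pred, weights,
                       u_tp=1.0, u_tn=0.5, u_fp=-0.5, u_fn=-1.0))
```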