Bicocca Open Archive

Background: Despite the vagueness and uncertainty intrinsic to any medical act, interpretation, and decision (including data reporting and the representation of relevant medical conditions), little research has focused on how to take this uncertainty into account explicitly. In this paper, we focus on the representation of a general and widespread medical terminology, grounded in a traditional and well-established convention, that expresses the severity of health conditions (for instance, pain or visible signs) on a scale ranging from Absent to Extreme. Specifically, we study how both potential patients and doctors perceive the different levels of the terminology in quantitative and qualitative terms, and whether the embedded user knowledge can improve the representation of ordinal values in the construction of machine learning models.
Methods: To this aim, we conducted a questionnaire-based study involving a relatively large sample of 1,152 potential patients and 31 clinicians to represent numerically the perceived meaning of standard, widely applied labels that describe health conditions. Using the collected values, we then present and discuss different possible fuzzy-set-based representations that address the vagueness of medical interpretation by taking into account the perceptions of domain experts. We also apply the findings of this user study to evaluate the impact of different encodings on the predictive performance of common machine learning models on a real-world medical prognostic task.
Results: We found significant differences in the perception of pain levels between the two user groups. We also show that the proposed encodings can improve the performance of specific classes of models, and we discuss when this is the case.
Conclusions: We hope that the proposed techniques for ordinal scale representation and ordinal encoding will be useful to the research community, and that our methodology will be applied to other widely used ordinal scales to improve the validity of datasets and the results of machine learning tasks.
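
As a rough illustration of the kind of encodings the abstract describes, the Python sketch below contrasts a plain rank encoding with an encoding based on perceived values, and builds a simple triangular fuzzy set for each label. Everything specific in it is an assumption made for the example: the three intermediate labels (only Absent and Extreme are named above), the 0-100 rating scale, the rating values themselves, and the triangular shape of the membership function. None of it is taken from the study's data or from the authors' implementation.

```python
from statistics import mean

# Ordered labels. Only "Absent" and "Extreme" are named in the abstract;
# the three middle labels are ASSUMED for illustration.
LEVELS = ["Absent", "Mild", "Moderate", "Severe", "Extreme"]

# HYPOTHETICAL respondent ratings on a 0-100 scale (not data from the study).
RATINGS = {
    "Absent": [0, 0, 5],
    "Mild": [20, 25, 30],
    "Moderate": [45, 50, 55],
    "Severe": [70, 80, 75],
    "Extreme": [95, 100, 100],
}


def rank_encoding(label):
    """Plain ordinal encoding: the position of the label in the scale."""
    return LEVELS.index(label)


def perceived_encoding(label):
    """Encode a label as its mean perceived value, rescaled to [0, 1]."""
    return mean(RATINGS[label]) / 100


def triangular_membership(x, label):
    """Degree to which x (0-100) belongs to an assumed triangular fuzzy set.

    The support is [min, max] of the ratings for `label` and the peak is
    their mean. This shape is an illustrative choice, not the paper's.
    """
    lo, hi = min(RATINGS[label]), max(RATINGS[label])
    peak = mean(RATINGS[label])
    if x == peak:
        return 1.0
    if x <= lo or x >= hi:
        return 0.0
    if x < peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)


if __name__ == "__main__":
    for label in LEVELS:
        print(label, rank_encoding(label), round(perceived_encoding(label), 3))
```

With rank encoding the gaps between adjacent labels are always one unit, whereas the perceived encoding lets the spacing follow what respondents report, which is the property a model sensitive to feature scale could exploit.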

Seveso, A., Campagner, A., Ciucci, D., Cabitza, F. (2020). Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings. BMC MEDICAL INFORMATICS AND DECISION MAKING, 20(S5) [10.1186/s12911-020-01152-8].

Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings

Seveso, Andrea (first author); Campagner, Andrea (second author); Ciucci, Davide (penultimate author); Cabitza, Federico (last author)

2020

Subtype: Journal article - Scientific article
Keywords: Fuzzy sets; Ground truth; Machine learning; Ordinal scales
Content language: English
Ahead-of-print or first online publication date: 20 Aug 2020
Publication date: 2020
Journal: BMC MEDICAL INFORMATICS AND DECISION MAKING
Volume: 20
Issue: S5
Article number: 142
Article DOI: https://dx.doi.org/10.1186/s12911-020-01152-8
Full text: open
Appears in categories: 01 - Journal article

Files in this record:

File	Size	Format
BMC2020.pdf (open access; description: Main manuscript; attachment type: Author’s Accepted Manuscript, AAM (post-print))	2.26 MB	Adobe PDF
10281-282520_VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	2.26 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/282520
