Calibration is paramount in developing and validating Machine Learning models, particularly in sensitive domains such as medicine. Despite its significance, existing metrics for assessing calibration have been found to have shortcomings in their interpretation and theoretical properties. This article introduces a novel and comprehensive framework for assessing the calibration of Machine and Deep Learning models that addresses these limitations. The proposed framework is based on a modification of the Expected Calibration Error (ECE), called the Estimated Calibration Index (ECI), which builds on and extends prior research. ECI was initially formulated for binary settings, and we adapted it to multiclass settings. ECI offers a more nuanced and informative measure, both locally and globally, of a model's tendency towards over- or underconfidence. The paper first outlines the issues related to the prevalent definitions of ECE, including potential biases that may arise when the corresponding measures are evaluated. Then, we present the results of a series of experiments conducted to demonstrate the effectiveness of the proposed framework in supporting a more accurate understanding of a model's calibration level. Additionally, we discuss how to address and potentially mitigate some biases in calibration assessment.
Famiglini, L., Campagner, A., Cabitza, F. (2023). Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use. In ECAI 2023. 26th European Conference on Artificial Intelligence. September 30–October 4, 2023, Kraków, Poland. Including 12th Conference on Prestigious Applications of Intelligent Systems (PAIS 2023). Proceedings (pp. 645-652). IOS Press BV [10.3233/FAIA230327].
Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use
Famiglini L.; Campagner A.; Cabitza F.
2023
File | Size | Format
---|---|---
Famiglini-2023-ECAI-VoR.pdf (open access; attachment type: Publisher's Version (Version of Record, VoR); license: Creative Commons) | 473.13 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.