Calibration is key to the development and validation of Machine Learning models, especially in sensitive domains such as medicine. However, existing calibration metrics can be difficult to interpret and are affected by theoretical limitations. In this paper, we present a new metric, the Global Interpretable Calibration Index (GICI), which is local and defined only in terms of simple geometrical primitives. These properties make it both simpler to interpret and more general than other commonly used metrics, as it can also be used in recalibration procedures. Compared to traditional metrics, the GICI also allows for a more comprehensive evaluation, as it provides three levels of information: a bin-level local estimate, a global estimate, and an estimate of the extent to which confidence scores are over- or under-confident with respect to the actual error rate. We also report the results of experiments aimed at testing these claims and at giving insights into the practical utility of this metric, including for improving discriminative accuracy.
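The abstract does not give the GICI formula, so the following is not the authors' metric. As a rough illustration of the three kinds of information it mentions (a bin-level estimate, a global estimate, and the direction of miscalibration), here is a minimal Python sketch of a conventional binned calibration check in the style of the Expected Calibration Error. The function name `binned_calibration`, the equal-width binning, and the toy data are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def binned_calibration(
    confidences: Sequence[float],
    correct: Sequence[int],
    n_bins: int = 10,
) -> Tuple[List[Optional[float]], float]:
    """Illustrative binned calibration check (NOT the GICI).

    Returns (gaps, ece), where gaps[i] is the signed gap
    'mean confidence - accuracy' in equal-width bin i (None if the bin
    is empty) and ece is the count-weighted mean of the absolute gaps.
    A positive gap means over-confidence, a negative one under-confidence.
    """
    # Per bin: [sum of confidences, number of correct predictions, count]
    stats = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to last bin
        stats[idx][0] += conf
        stats[idx][1] += is_correct
        stats[idx][2] += 1

    total = len(confidences)
    gaps: List[Optional[float]] = []
    ece = 0.0
    for conf_sum, correct_sum, count in stats:
        if count == 0:
            gaps.append(None)  # no predictions fell in this bin
            continue
        gap = conf_sum / count - correct_sum / count  # bin-level, signed
        gaps.append(gap)
        ece += (count / total) * abs(gap)  # global, unsigned
    return gaps, ece


if __name__ == "__main__":
    # Toy data (assumed): four predictions at 0.95 confidence, three correct;
    # two predictions at 0.65 confidence, both correct.
    gaps, ece = binned_calibration(
        [0.95, 0.95, 0.95, 0.95, 0.65, 0.65],
        [1, 1, 1, 0, 1, 1],
    )
    for i, gap in enumerate(gaps):
        if gap is not None:
            label = "over-confident" if gap > 0 else "under-confident"
            print(f"bin {i}: gap={gap:+.2f} ({label})")
    print(f"global (ECE-style) estimate: {ece:.2f}")
```

In this toy case the high-confidence bin is over-confident and the mid-confidence bin is under-confident, which a single global number would not reveal on its own.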
Cabitza, F., Campagner, A., Famiglini, L. (2022). Global Interpretable Calibration Index, a New Metric to Estimate Machine Learning Models’ Calibration. In 6th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference for Machine Learning and Knowledge Extraction, CD-MAKE 2022, held in conjunction with the 17th International Conference on Availability, Reliability and Security, ARES 2022 (pp. 82-99). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-14463-9_6].
Global Interpretable Calibration Index, a New Metric to Estimate Machine Learning Models’ Calibration
Cabitza F.; Campagner A.; Famiglini L.
2022