This study explores the concept of similarity in machine learning (ML) and its congruence with human judgment in medical contexts, focusing primarily on radiology. We conducted a user study involving two radiologists and two orthopedic and spine surgeons. These experts evaluated the similarity of 72 cases to 18 representative base cases of vertebral fractures; the 72 cases were selected from a larger dataset by an ML model on the basis of cosine and Euclidean distances. Our analysis focused on correlating these ML-derived distances with the experts' assessments. The findings reveal that: (1) both cosine and Euclidean distances had limited correlation with human judgments; and (2) cosine distances showed a marginally higher correlation than Euclidean distances. Despite the limitations due to the small number of evaluations and evaluators, our findings emphasize the need for ongoing research to make AI similarity metrics more human-centric and relevant, particularly given their critical role in ML training and inference. The implications extend well beyond medical imaging: the study advocates a comprehensive reevaluation of similarity assessment in AI to bring it into closer alignment with human cognitive processes.
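The comparison described in the abstract, distances from a base case correlated with expert similarity ratings, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not state which correlation coefficient was used (Spearman rank correlation is assumed), the function names are hypothetical, and the vectors and ratings are synthetic, not data from the study.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def euclidean_distance(a, b):
    """Euclidean (L2) distance between a and b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Synthetic feature vectors (assumption: cases are represented as vectors).
base_case = [1.0, 0.0]
candidates = [[2.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]

# Hypothetical expert ratings, higher meaning "more similar to the base case".
ratings = [5, 4, 2, 1]

cos_d = [cosine_distance(base_case, c) for c in candidates]
euc_d = [euclidean_distance(base_case, c) for c in candidates]

# A strongly negative value means that a smaller distance goes with a higher rating.
print("cosine vs ratings:   ", round(spearman(cos_d, ratings), 3))
print("euclidean vs ratings:", round(spearman(euc_d, ratings), 3))
```

In this toy setup the two metrics rank the candidates differently because Euclidean distance is sensitive to vector magnitude while cosine distance is not, which is one reason the two can agree with human ratings to different degrees.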
Cabitza, F., Famiglini, L., Campagner, A., Sconfienza, L., Fusco, S., Caccavella, V., et al. (2024). Dissimilar Similarities: Comparing Human and Statistical Similarity Evaluation in Medical AI. In Modeling Decisions for Artificial Intelligence: 21st International Conference, MDAI 2024, Tokyo, Japan, August 27–31, 2024, Proceedings (pp. 187–198). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-68208-7_16].
Dissimilar Similarities: Comparing Human and Statistical Similarity Evaluation in Medical AI
Cabitza F.; Famiglini L.; Campagner A.; et al.
2024
File | Size | Format |
---|---|---|
Cabitza-2024-MDAI-preprint.pdf (open access; attachment type: Submitted Version (Pre-print); license: Other) | 1.86 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.