Robust Learning Methods for Imprecise Data and Cautious Inference

Campagner, A

The representation, quantification and proper management of uncertainty is one of the central problems in Artificial Intelligence, and particularly in Machine Learning, where uncertainty is intrinsically tied to the inductive nature of the learning problem. Among the different forms of uncertainty, the modeling of imprecision, that is, the problem of dealing with data or knowledge that are imperfect or incomplete, has recently attracted interest in the research community because of its theoretical and practical implications for the use of Machine Learning-based tools and methods. This work addresses imprecision in Machine Learning from two perspectives. On the one hand, imprecision may affect the input data to a Machine Learning pipeline, leading to the problem of learning from imprecise data. On the other hand, imprecision may be used as a way to implement uncertainty quantification in Machine Learning methods, by allowing them to provide set-valued predictions, leading to so-called cautious inference methods. The aim of this work is to investigate theoretical as well as empirical issues related to these two settings. Within the context of learning from imprecise data, the focus is on the learning from fuzzy labels setting, from both a learning-theoretical and an algorithmic point of view.
Main contributions in this respect include: a learning-theoretical characterization of the hardness of the learning from fuzzy labels problem; a novel ensemble learning algorithm based on pseudo-labels, together with its theoretical study and an empirical analysis showing promising results in comparison with the state of the art; the application of this algorithm to three relevant real-world medical problems, in which imprecision arises, respectively, from conflicting expert opinions, the use of vague technical vocabulary, and individual variability in biochemical parameters; and feature selection algorithms that may help reduce the computational complexity of this task or limit the curse of dimensionality. Within the context of cautious inference, the focus is on the theoretical study of three popular cautious inference frameworks, as well as on the development of novel algorithms and approaches that extend the application of cautious inference to relevant settings. Main contributions in this respect include: the study of the theoretical properties of, and relationships among, decision-theoretic, selective prediction and conformal prediction methods; novel cautious inference techniques drawing on the interaction between decision-theoretic and conformal prediction methods, together with their evaluation in medical settings; and the study of ensembles of cautious inference models, from both an empirical and a theoretical point of view, showing that such ensembles can improve robustness and generalization, and can facilitate the application of cautious inference methods to multi-source and multi-modal data.
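
As a generic illustration of what a set-valued (cautious) prediction looks like, the following is a minimal sketch of standard split conformal prediction for classification. It is not the thesis's own method: it assumes an already trained classifier that outputs class probabilities, and the function names (`conformal_quantile`, `calibrate`, `predict_set`), the nonconformity score 1 - p(y | x) and the toy calibration data are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Set


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, there are too few calibration points for the
    requested level, so +inf is returned and every label will be kept.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(cal_probs: Sequence[Sequence[float]],
              cal_labels: Sequence[int], alpha: float) -> float:
    # Nonconformity score (an assumed, common choice): 1 minus the
    # probability the classifier assigns to the true label.
    scores = [1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)


def predict_set(probs: Sequence[float], qhat: float) -> Set[int]:
    # Keep every label whose nonconformity score is within the threshold.
    return {label for label, p in enumerate(probs) if 1.0 - p <= qhat}


# Hypothetical held-out calibration data: class probabilities from an
# already trained 3-class model, plus the true label of each example.
cal_probs: List[List[float]] = [
    [0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7],
    [0.6, 0.3, 0.1], [0.3, 0.5, 0.2], [0.3, 0.3, 0.4],
    [0.3, 0.4, 0.3], [0.2, 0.2, 0.6], [0.45, 0.45, 0.1],
]
cal_labels: List[int] = [0, 1, 2, 0, 1, 2, 0, 1, 2]

qhat = calibrate(cal_probs, cal_labels, alpha=0.25)  # rank ceil(10 * 0.75) = 8
print(predict_set([0.95, 0.03, 0.02], qhat))  # confident case: {0}
print(predict_set([0.7, 0.25, 0.05], qhat))   # ambiguous case: {0, 1}
print(predict_set([0.34, 0.33, 0.33], qhat))  # near-uniform case: {0, 1, 2}
```

When calibration and test examples are exchangeable, sets built this way contain the true label with probability at least 1 - alpha (a marginal guarantee). The cautious behavior shows up as set size: the prediction stays a single label when the model is confident and widens as its probabilities become more ambiguous.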

(2023). Robust Learning Methods for Imprecise Data and Cautious Inference. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2023).