Supervised learning is an important branch of machine learning (ML), which requires a complete annotation (labeling) of the involved training data. This assumption is relaxed in the settings of weakly supervised learning, where labels are allowed to be imprecise or partial. In this article, we study the setting of superset learning, in which instances are assumed to be labeled with a set of possible annotations containing the correct one. We tackle the problem of learning from such data in the context of rough set theory (RST). More specifically, we consider the problem of RST-based feature reduction as a suitable means for data disambiguation, i.e., for the purpose of figuring out the most plausible precise instantiation of the imprecise training data. To this end, we define appropriate generalizations of decision tables and reducts, using tools from generalized information theory and belief function theory. Moreover, we analyze the computational complexity and theoretical properties of the associated computational problems. Finally, we present results of a series of experiments, in which we analyze the proposed concepts empirically and compare our methods with a state-of-the-art dimensionality reduction algorithm, reporting a statistically significant improvement in predictive accuracy.
Campagner, A., Ciucci, D., Hullermeier, E. (2021). Rough set-based feature selection for weakly labeled data. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 136(September 2021), 150-167 [10.1016/j.ijar.2021.06.005].
Rough set-based feature selection for weakly labeled data
Campagner A.;Ciucci D.;
2021
Abstract
Supervised learning is an important branch of machine learning (ML), which requires a complete annotation (labeling) of the involved training data. This assumption is relaxed in the settings of weakly supervised learning, where labels are allowed to be imprecise or partial. In this article, we study the setting of superset learning, in which instances are assumed to be labeled with a set of possible annotations containing the correct one. We tackle the problem of learning from such data in the context of rough set theory (RST). More specifically, we consider the problem of RST-based feature reduction as a suitable means for data disambiguation, i.e., for the purpose of figuring out the most plausible precise instantiation of the imprecise training data. To this end, we define appropriate generalizations of decision tables and reducts, using tools from generalized information theory and belief function theory. Moreover, we analyze the computational complexity and theoretical properties of the associated computational problems. Finally, we present results of a series of experiments, in which we analyze the proposed concepts empirically and compare our methods with a state-of-the-art dimensionality reduction algorithm, reporting a statistically significant improvement in predictive accuracy.File | Dimensione | Formato | |
---|---|---|---|
Superset_Learning_and_Rough_Sets___Extended.pdf
accesso aperto
Tipologia di allegato:
Submitted Version (Pre-print)
Dimensione
370.27 kB
Formato
Adobe PDF
|
370.27 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.