Campagner, A., Ciucci, D., Hüllermeier, E. (2021). Rough set-based feature selection for weakly labeled data. International Journal of Approximate Reasoning, 136 (September 2021), 150-167. doi:10.1016/j.ijar.2021.06.005.

Rough set-based feature selection for weakly labeled data

Campagner A.; Ciucci D.; Hüllermeier E.
2021

Abstract

Supervised learning is an important branch of machine learning (ML) that requires complete annotation (labeling) of the training data. This assumption is relaxed in weakly supervised learning, where labels may be imprecise or partial. In this article, we study the setting of superset learning, in which each instance is assumed to be labeled with a set of candidate annotations that contains the correct one. We tackle the problem of learning from such data in the context of rough set theory (RST). More specifically, we consider RST-based feature reduction as a means of data disambiguation, i.e., of identifying the most plausible precise instantiation of the imprecise training data. To this end, we define appropriate generalizations of decision tables and reducts, using tools from generalized information theory and belief function theory. Moreover, we analyze the computational complexity and theoretical properties of the resulting computational problems. Finally, we report a series of experiments in which we analyze the proposed concepts empirically and compare our methods with a state-of-the-art dimensionality reduction algorithm, observing a statistically significant improvement in predictive accuracy.
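The following minimal sketch illustrates the general idea of treating attribute reduction as disambiguation on superset-labeled data. It is not the authors' method. The paper defines its generalized decision tables and reducts using generalized information theory and belief functions, and this sketch implements neither.

Instead, it assumes a simpler, hypothetical criterion: an attribute subset is acceptable if every one of its indiscernibility classes shares at least one candidate label. Under that criterion, choosing a shared label per class yields a consistent precise instantiation. The toy table, the feature names `a`, `b`, `c`, and all function names are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy superset decision table (invented data, not from the paper):
# each row pairs its feature values with a set of candidate labels.
FEATURES = ["a", "b", "c"]
TABLE = [
    ({"a": 0, "b": 1, "c": 0}, {"x"}),
    ({"a": 0, "b": 1, "c": 1}, {"x", "y"}),
    ({"a": 1, "b": 0, "c": 0}, {"y"}),
    ({"a": 1, "b": 1, "c": 1}, {"y", "z"}),
]


def indiscernibility_classes(table, subset):
    """Group row indices by their values on `subset` (the classical RST partition)."""
    classes = {}
    for i, (row, _) in enumerate(table):
        key = tuple(row[f] for f in subset)
        classes.setdefault(key, []).append(i)
    return list(classes.values())


def common_labels(table, block):
    """Labels contained in the candidate set of every row of one equivalence class."""
    labels = set(table[block[0]][1])
    for i in block[1:]:
        labels &= table[i][1]
    return labels


def admits_consistent_disambiguation(table, subset):
    """Hypothetical criterion: True iff every class of `subset` shares a candidate label,
    i.e. one label per row can be chosen so the table is consistent on `subset`."""
    return all(common_labels(table, b) for b in indiscernibility_classes(table, subset))


def minimal_consistent_subsets(table, features):
    """Brute-force search (exponential in len(features)) for minimal acceptable subsets."""
    found = []
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            if any(set(m) <= set(subset) for m in found):
                continue  # a smaller acceptable subset exists, so this one is not minimal
            if admits_consistent_disambiguation(table, subset):
                found.append(subset)
    return found


def disambiguate(table, subset):
    """Assign each row the smallest shared label of its class (an arbitrary tie-break)."""
    labels = [None] * len(table)
    for block in indiscernibility_classes(table, subset):
        shared = common_labels(table, block)
        if not shared:
            raise ValueError("subset does not admit a consistent disambiguation")
        choice = min(shared)
        for i in block:
            labels[i] = choice
    return labels


if __name__ == "__main__":
    for subset in minimal_consistent_subsets(TABLE, FEATURES):
        print(subset, disambiguate(TABLE, subset))
```

On this toy table, both `("a",)` and `("b", "c")` pass the criterion, and they induce different precise instantiations. This shows why a principled choice among candidate reducts (for example, the entropy- and evidence-based measures the abstract mentions) matters.

The criterion is monotone: adding attributes only splits classes, so any superset of an acceptable subset is also acceptable. That is why the search can skip supersets of subsets already found.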
Type: Journal article - Scientific article
Keywords: Entropy; Evidence Theory; Feature Selection; Rough Sets; Superset Learning
Language: English
Date: 18 June 2021
Year: 2021
Volume: 136
Issue: September 2021
Pages: 150-167
Access: open
Files in this item:
Superset_Learning_and_Rough_Sets___Extended.pdf (open access; attachment type: Submitted Version (Pre-print); 370.27 kB; Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/324845
Citations
  • Scopus 31
  • Web of Science (ISI) 28