Feature selection, defined as the automatic selection of the most relevant features from a machine learning dataset, has three objectives: to improve the prediction performance of the predictors, to provide faster and more cost-effective predictors, and to offer a better understanding of the underlying process that generated the data. In the case of dimensional (e.g., temporal or spatial) data, feature selection is often approached using standard methods that neglect or denature the dimensional component. This paper provides a first step towards systematic and general dimensional feature selection, with a portfolio of supervised and unsupervised, filter-based selectors that can be naturally combined into an end-to-end methodology. In a hypothesis-testing setting, our experiments show that our approach can extract provably relevant features in both the temporal and spatial cases.
Cavina, P., Manzella, F., Pagliarini, G., Sciavicco, G., Stan, I. (2023). (Un)supervised Univariate Feature Extraction and Selection for Dimensional Data. In Proceedings of the 2nd Italian Conference on Big Data and Data Science (ITADATA 2023). CEUR-WS.
(Un)supervised Univariate Feature Extraction and Selection for Dimensional Data
Stan, IE
2023