Riso, L., Guerzoni, M. (2022). Concept drift estimation with graphical models. INFORMATION SCIENCES, 606(August 2022), 786-804 [10.1016/j.ins.2022.05.056].

Concept drift estimation with graphical models

Guerzoni, M
2022

Abstract

This paper addresses concept drift in machine learning in the context of high-dimensional problems. In contrast to previous concept drift detection methods, the proposed method does not depend on the machine learning model in use for a specific target variable; rather, it attempts to assess concept drift as an independent characteristic of the evolution of a dataset. This enables data to be tested for the presence of drift independently of the specific problem at hand, which is especially useful when the same dataset is used for several classification tasks simultaneously, as is often the case in a business environment. Moreover, unlike previous approaches, the method does not require re-testing each new model, a strategy that could prove computationally expensive. The core idea of this work is to use graphical models to elicit the visible structure of the data and represent it as a network. Specifically, we investigate how a graphical model evolves by looking at the creation of new links, and the disappearance of existing ones, across time periods. We perform this task in four steps. First, we compute the adjacency matrix of a graph in each period. Second, we apply a function that maps each possible state of the adjacency matrix over time into a transition matrix. Third, we use the information in the transition matrix to produce a metric that estimates the presence of drift in the data. Finally, we evaluate the method on three real-world datasets and a synthetic one.
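The first three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the graph is estimated or how the metric is defined, so the thresholded-correlation graph, the function names, the `threshold` parameter, and the "share of changed links" score below are all assumptions chosen for illustration only.

```python
import numpy as np

def adjacency_from_correlation(X, threshold=0.3):
    """Stand-in graph estimate (assumption): link two variables when the
    absolute Pearson correlation of their columns exceeds `threshold`."""
    corr = np.corrcoef(X, rowvar=False)
    A = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(A, 0)  # no self-links
    return A

def edge_transition_counts(A_prev, A_next):
    """Return a 2x2 count matrix T where T[i, j] is the number of variable
    pairs whose link state went from i to j between the two periods
    (0 = no link, 1 = link). Only the upper triangle is used because the
    graph is undirected."""
    iu = np.triu_indices_from(A_prev, k=1)
    prev, nxt = A_prev[iu], A_next[iu]
    T = np.zeros((2, 2), dtype=int)
    np.add.at(T, (prev, nxt), 1)
    return T

def drift_score(T):
    """Hypothetical metric (assumption): share of variable pairs whose link
    appeared (0 -> 1) or disappeared (1 -> 0)."""
    total = T.sum()
    return (T[0, 1] + T[1, 0]) / total if total else 0.0

# Usage with two hand-written adjacency matrices for consecutive periods.
A_t0 = np.array([[0, 1, 0],
                 [1, 0, 0],
                 [0, 0, 0]])
A_t1 = np.array([[0, 0, 1],
                 [0, 0, 0],
                 [1, 0, 0]])
T = edge_transition_counts(A_t0, A_t1)
print(T)
print(drift_score(T))
```

A score near 0 means the network structure is stable between periods, while a score near 1 means most links changed. Because no target variable or predictive model appears anywhere in the computation, the sketch reflects the model-independent idea in the abstract, though the paper's actual metric may differ.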
Journal article - Scientific article
Bayesian logistic regression; Drift estimation; Graphical models; Unsupervised learning;
English
17 May 2022
2022
606
August 2022
786
804
open
Files in this item:
Riso-2022-Info Sci-preprint.pdf

open access

Description: Research Article
Attachment type: Submitted Version (Pre-print)
Size: 600.76 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/396372
Citations
  • Scopus 6
  • Web of Science (ISI) 5