Fagiuoli, E., Omerino, S., Stella, F. (2008). Bayesian Belief Networks for Data Cleaning. In G. Felici, C. Vercelli (Eds.), Mathematical Methods for Knowledge Discovery and Data Mining (pp. 204-219). Hershey, New York: Information Science Reference.

Bayesian Belief Networks for Data Cleaning

FAGIUOLI, ENRICO RENZO CESARE; STELLA, FABIO ANTONIO
2008

Abstract

The importance of data cleaning and data quality is becoming increasingly clear, as evidenced by the surge in software, tools, consulting companies, and seminars addressing data quality issues. In this contribution, the authors describe how Bayesian computational techniques can be exploited for data cleaning in order to reduce the time needed to clean and understand the data. The proposed approach relies on a computational device known as the Bayesian belief network, a general statistical model that allows the efficient description and treatment of joint probability distributions. This work describes a conceptual framework that maps the Bayesian belief network to some of the most difficult tasks in data cleaning, namely imputing missing values, completing truncated datasets, and detecting outliers. The framework is supported by a set of numerical experiments performed with the Bayesian belief network programming suite HUGIN.
Type: Book chapter or essay
Keywords: Data Cleaning, Bayesian Belief Networks, Imputing Missing Values, Truncated Datasets, Outliers Detection
Language: English
Book title: Mathematical Methods for Knowledge Discovery and Data Mining
Editors: Felici, G; Vercelli, C
Year: 2008
ISBN: 1599045281
Publisher: Information Science Reference
Pages: 204-219

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/8366