Data cleansing is growing in importance among both public and private organisations, mainly due to the large amount of data exploited to support decision-making processes. This paper aims to show how model-based verification algorithms (namely, model checking) can contribute to addressing data cleansing issues; furthermore, a new benchmark problem focusing on labour market dynamics is introduced. The consistent evolution of the data is checked using a model defined on the basis of domain knowledge. We then formally introduce the concept of a universal cleanser, i.e. an object which summarises the set of all cleansing actions for each feasible data inconsistency (according to a given consistency model), and provide an algorithm which synthesises it. The universal cleanser can be seen as a repository of corrective interventions useful for developing cleansing routines. We applied our approach to a dataset derived from Italian labour market data and made the whole dataset and outcomes publicly available to the community, so that the results we present can be shared and compared with other techniques.
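The idea of checking data against a consistency model and collecting corrective actions per inconsistency can be illustrated with a small sketch. This is not the paper's algorithm: it replaces model checking with a plain breadth-first search over a hand-written finite-state model. The two-state labour model, the event names `start` and `cease`, and all function names are assumptions made up for illustration.

```python
from itertools import product

# Hypothetical toy consistency model: (state, event) -> next state.
# Any (state, event) pair missing from the table counts as inconsistent.
TRANSITIONS = {
    ("unemployed", "start"): "employed",
    ("employed", "cease"): "unemployed",
}
STATES = ("unemployed", "employed")
EVENTS = ("start", "cease")
INITIAL = "unemployed"


def first_inconsistency(events):
    """Replay events on the model; return (index, state) of the first
    inconsistent event, or None if the whole sequence is consistent."""
    state = INITIAL
    for i, event in enumerate(events):
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            return i, state
        state = nxt
    return None


def shortest_repairs(state, event):
    """Level-by-level search for all shortest event sequences that, when
    inserted before `event`, lead from `state` to a state accepting it."""
    frontier = [(state, ())]
    seen = {state}
    while frontier:
        repairs, next_frontier, reached = [], [], set()
        for s, path in frontier:
            for e in EVENTS:
                nxt = TRANSITIONS.get((s, e))
                if nxt is None:
                    continue  # e is not allowed in s
                if (nxt, event) in TRANSITIONS:
                    repairs.append(path + (e,))
                elif nxt not in seen:
                    reached.add(nxt)
                    next_frontier.append((nxt, path + (e,)))
        if repairs:
            return repairs
        seen |= reached
        frontier = next_frontier
    return []  # no repair exists in this model


def synthesise_cleanser():
    """Map every inconsistent (state, event) pair to its shortest repairs."""
    return {
        (s, e): shortest_repairs(s, e)
        for s, e in product(STATES, EVENTS)
        if (s, e) not in TRANSITIONS
    }


cleanser = synthesise_cleanser()
problem = first_inconsistency(["start", "start", "cease"])
```

In this toy setting the resulting dictionary plays the role of a lookup table: once an inconsistency is located in a sequence, the pair of current state and offending event indexes the candidate corrections. Choosing among several candidates is left open here.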

Mezzanzanica, M., Boselli, R., Cesarini, M., & Mercorio, F. (2013). Automatic Synthesis of Data Cleansing Activities. In Proceedings of the 2nd International Conference on Data Technologies and Applications (pp. 138-149). Markus Helfert and Chiara Francalanci [10.5220/0004491101380149].

Automatic Synthesis of Data Cleansing Activities

MEZZANZANICA, MARIO; BOSELLI, ROBERTO; CESARINI, MIRKO; MERCORIO, FABIO
2013

No
paper
Scientific
Data Quality, Data Management, Cleansing Algorithms, Model-based Reasoning
English
The International Conference on Data Management Technologies and Applications (DATA) - July 29 - 31
978-989-8565-67-9
Mezzanzanica, M; Boselli, R; Cesarini, M; Mercorio, F
Files in this item:
File: paper_36.pdf (open access)
Size: 210.54 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/44493
Citations
  • Scopus 15