
Mezzanzanica, M., Boselli, R., Cesarini, M., Mercorio, F. (2015). A model-based approach for developing data cleansing solutions. ACM Journal of Data and Information Quality, 5(4), 13-28 [10.1145/2641575].

A model-based approach for developing data cleansing solutions

Mezzanzanica, Mario; Boselli, Roberto; Cesarini, Mirko; Mercorio, Fabio
2015

Abstract

Data extracted from electronic archives is a valuable asset; however, poor data quality must be addressed before data analysis and decision-making activities can be performed. Low-quality data is frequently cleansed using business rules derived from domain knowledge. Unfortunately, designing and implementing cleansing activities based on business rules requires considerable effort. In this article, we illustrate a model-based approach for identifying inconsistencies and performing corrective interventions, thus simplifying the development of cleansing activities. The article shows how the cleansing activities required to perform a sensitivity analysis can be easily developed using the proposed model-based approach. The sensitivity analysis provides insights into how the cleansing activities can affect the computation of indicators. The approach has been successfully applied to a database describing the working histories of the population of an Italian area. A model formalizing how data should evolve over time in this domain (i.e., a data consistency model) was created by means of formal methods and used to perform the cleansing and sensitivity analysis activities.
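The abstract describes the approach only at a high level, and the article builds its consistency model with formal methods, which this sketch does not reproduce. As a rough illustration of the idea only, the Python below replays one worker's event history through a hand-written consistency check, applies one of two corrective policies, and recomputes an indicator under each policy. Everything in it is a hypothetical assumption rather than material from the article: the `Event` record with `start`/`end` events, the full-time/part-time overlap rules and `MAX_PART_TIME`, the `drop` and `close` policies, and the `employed_days` indicator.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical event vocabulary: each record starts or ends one contract.
@dataclass(frozen=True)
class Event:
    day: int                 # day index; histories are assumed sorted by day
    kind: str                # "start" or "end"
    contract: str            # contract identifier
    full_time: bool = True

MAX_PART_TIME = 2  # assumed rule: at most two concurrent part-time contracts


def conflicts(active: Dict[str, bool], ev: Event) -> List[str]:
    """Ids of active contracts that a 'start' event would clash with."""
    if ev.full_time or any(active.values()):
        return list(active)  # a full-time contract tolerates no overlap
    if len(active) >= MAX_PART_TIME:
        return list(active)  # too many concurrent part-time contracts
    return []


def check(history: List[Event]) -> List[Tuple[int, str]]:
    """Replay events against the assumed rules; violations are reported and skipped."""
    active: Dict[str, bool] = {}
    errors: List[Tuple[int, str]] = []
    for i, ev in enumerate(history):
        if ev.kind == "end":
            if ev.contract not in active:
                errors.append((i, "end without matching start"))
            else:
                del active[ev.contract]
        elif ev.contract in active or conflicts(active, ev):
            errors.append((i, "start overlaps active contracts"))
        else:
            active[ev.contract] = ev.full_time
    return errors


def cleanse(history: List[Event], policy: str) -> List[Event]:
    """Hypothetical corrective policies producing a consistent history.

    'drop'  discards any start that clashes with active contracts;
    'close' assumes the clashing contracts' end events are missing and
            synthesizes them on the day the new contract starts.
    Unmatched ends and duplicate starts are discarded under both policies.
    """
    if policy not in ("drop", "close"):
        raise ValueError(f"unknown policy: {policy}")
    active: Dict[str, bool] = {}
    out: List[Event] = []
    for ev in history:
        if ev.kind == "end":
            if ev.contract in active:
                del active[ev.contract]
                out.append(ev)
            continue
        if ev.contract in active:
            continue
        clash = conflicts(active, ev)
        if clash and policy == "drop":
            continue
        for cid in clash:  # only non-empty under 'close'
            out.append(Event(ev.day, "end", cid, active.pop(cid)))
        active[ev.contract] = ev.full_time
        out.append(ev)
    return out


def employed_days(history: List[Event], horizon: int) -> int:
    """Assumed indicator: days in [0, horizon) with at least one active contract."""
    active, total, last = set(), 0, 0
    for ev in history:
        now = min(ev.day, horizon)
        if active:
            total += now - last
        last = now
        if ev.kind == "start":
            active.add(ev.contract)
        else:
            active.discard(ev.contract)
    if active:
        total += horizon - last
    return total


raw = [
    Event(0, "start", "A"),
    Event(100, "start", "B"),  # full-time start while A is still open
    Event(200, "end", "B"),
    Event(300, "end", "C"),    # end of a contract that never started
]
print(check(raw))
for policy in ("drop", "close"):
    fixed = cleanse(raw, policy)
    print(policy, check(fixed), employed_days(fixed, horizon=365))
    # drop [] 365
    # close [] 200
```

Both policies yield histories that pass the check, yet the indicator moves from 365 to 200 days depending on which correction is assumed. That spread is the kind of effect a sensitivity analysis over cleansing choices is meant to expose.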
Journal article - Scientific article
Keywords: Data quality, ETL, data believability, data consistency, data verification
Language: English
Year: 2015
Volume: 5
Issue: 4
First page: 13
Last page: 28
Article number: 13
Access: reserved
Files in this record:
2641575.pdf (Publisher's Version, Version of Record (VoR); Adobe PDF, 743.11 kB; access restricted to archive managers)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/79861
Citations
  • Scopus 19
  • Web of Science (ISI) 12