In the last years, the growing diffusion of IT-based services has given a rise to the use of huge masses of data. However, using data for analytical and decision making purposes requires to perform several tasks, e.g. data cleansing, data filtering, data aggregation and synthesis, etc. Tools and methodologies empowering people are required to appropriately manage the (high) complexity of large datasets. This paper proposes the multidimensional RDQA, an enhanced version of an existing model-based data verification technique, that can be used to identify, extract, and classify data inconsistencies on longitudinal data. Specifically, it discovers fine grained information about the data inconsistencies and it uses a multidimensional visualisation technique for showing them. The enhanced RDQA supports and empowers the users in the task of assessing and improving algorithms and solutions for data analysis, especially when large datasets are considered. The proposed technique has been applied on a real-world dataset derived from the Italian labour market domain, which we made publicly available to the community.

Boselli, R., Cesarini, M., Mercorio, F., & Mezzanzanica, M. (2013). Inconsistency Knowledge Discovery for Longitudinal Data Management: A Model-Based Approach. In Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data (pp. 183-194). Springer Berlin Heidelberg [10.1007/978-3-642-39146-0_17].

Inconsistency Knowledge Discovery for Longitudinal Data Management: A Model-Based Approach

BOSELLI, ROBERTO;CESARINI, MIRKO;MERCORIO, FABIO;MEZZANZANICA, MARIO
2013

Abstract

In the last years, the growing diffusion of IT-based services has given a rise to the use of huge masses of data. However, using data for analytical and decision making purposes requires to perform several tasks, e.g. data cleansing, data filtering, data aggregation and synthesis, etc. Tools and methodologies empowering people are required to appropriately manage the (high) complexity of large datasets. This paper proposes the multidimensional RDQA, an enhanced version of an existing model-based data verification technique, that can be used to identify, extract, and classify data inconsistencies on longitudinal data. Specifically, it discovers fine grained information about the data inconsistencies and it uses a multidimensional visualisation technique for showing them. The enhanced RDQA supports and empowers the users in the task of assessing and improving algorithms and solutions for data analysis, especially when large datasets are considered. The proposed technique has been applied on a real-world dataset derived from the Italian labour market domain, which we made publicly available to the community.
Capitolo o saggio
Data Cleansing,Data Quality,Model Checking,Model based approach,Data Visualisation,
English
Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data
978-3-642-39145-3
Boselli, R., Cesarini, M., Mercorio, F., & Mezzanzanica, M. (2013). Inconsistency Knowledge Discovery for Longitudinal Data Management: A Model-Based Approach. In Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data (pp. 183-194). Springer Berlin Heidelberg [10.1007/978-3-642-39146-0_17].
Boselli, R; Cesarini, M; Mercorio, F; Mezzanzanica, M
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/45007
Citazioni
  • Scopus 15
  • ???jsp.display-item.citation.isi??? ND
Social impact