Data preparation is crucial for achieving good data management following the four foundational FAIR principles: Findability, Accessibility, Interoperability, and Reusability. Processing datasets to achieve high data (and metadata) quality is mandatory in modern applications. However, the data preparation activities that are needed to reach such levels may easily become unsustainable due to, for example, resource intensity or scalability challenges. Moreover, some preparation efforts may become unnecessary if they result in negligible improvements or duplicate actions. This article examines the sustainability aspects of data preparation through the lens of a circular economy. Within the data landscape, this perspective encourages practices that minimize waste, extend the data life cycle, and maximize reuse in alignment with the FAIR principles. We explore these practices and their impact on selecting and configuring effective data preparation strategies to design sustainable, high-quality pipelines. To this end, we propose an evaluation model that integrates data quality metrics with sustainability parameters for human and computational tasks. Finally, we apply the model in a comparative analysis of key data preparation methods, demonstrating its effectiveness in assessing sustainability and quality tradeoffs.

Pernici, B., Cappiello, C., Bono, C., Sancricca, C., Catarci, T., Angelini, M., et al. (2025). Sustainable Quality in Data Preparation. ACM JOURNAL OF DATA AND INFORMATION QUALITY, 17(4), 1-33 [10.1145/3769120].

Sustainable Quality in Data Preparation

Palmonari M.; De Paoli F.
2025

Journal article - Scientific article
Data preparation; data quality; sustainability;
English
5 Dec 2025
Year: 2025
Volume: 17
Issue: 4
First page: 1
Last page: 33
23
open
Files in this item:
File: Pernici et al-2025-Journal of Data and Information Quality-VoR.pdf
Access: open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 688.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/583201
Citations
  • Scopus 3
  • Web of Science 3