Data enrichment is a critical task in the data preparation process, in which a dataset is extended with additional information from various sources to perform analyses or add meaningful context. Existing solutions offer only limited support for helping data workers design the enrichment process and for executing it on large datasets. Harnessing semantics at scale can be a crucial factor in effectively addressing this challenge. This chapter presents a comprehensive approach covering both design- and run-time aspects of tabular data enrichment and discusses our experience in making this process scalable. We illustrate how the data enrichment steps of a Big Data pipeline can be implemented via tabular transformations that exploit semantic table annotation methods, and we discuss techniques devised to support the enactment of the resulting process on large tabular datasets. Furthermore, we present results from experimental evaluations in which we tested the scalability and run-time efficiency of the proposed cloud-based approach, enriching massive datasets with promising performance.

Ciavotta, M., Cutrona, V., De Paoli, F., Nikolov, N., Palmonari, M., Roman, D. (2022). Supporting Semantic Data Enrichment at Scale. In E. Curry, S. Auer, A.J. Berre, A. Metzger, M.S. Perez, S. Zillner (Eds.), Technologies and Applications for Big Data Value (pp. 19-39). Cham: Springer [10.1007/978-3-030-78307-5_2].

Supporting Semantic Data Enrichment at Scale

Ciavotta, Michele; Cutrona, Vincenzo; De Paoli, Flavio; Palmonari, Matteo
2022

Book chapter or essay
Big data processing; Data enrichment; Data extension; Data integration; Linked data; Scalability
English
Technologies and Applications for Big Data Value
Curry, E; Auer, S; Berre, AJ; Metzger, A; Perez, MS; Zillner, S
2022
9783030783068
Springer
19
39


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/372742