Bicocca Open Archive

Ciavotta, M., Cutrona, V., De Paoli, F., Nikolov, N., Palmonari, M., Roman, D. (2022). Supporting Semantic Data Enrichment at Scale. In E. Curry, S. Auer, A.J. Berre, A. Metzger, M.S. Perez, S. Zillner (Eds.), Technologies and Applications for Big Data Value (pp. 19-39). Cham: Springer [10.1007/978-3-030-78307-5_2].

Supporting Semantic Data Enrichment at Scale

Ciavotta, Michele; Cutrona, Vincenzo; De Paoli, Flavio; Nikolov, Nikolay; Palmonari, Matteo; Roman, Dumitru

2022

Abstract

Data enrichment is a critical task in the data preparation process, in which a dataset is extended with additional information from various sources to perform analyses or add meaningful context. Existing solutions offer only limited support for helping data workers design the enrichment process and for executing it on large datasets. Harnessing semantics at scale can be a crucial factor in effectively addressing this challenge. This chapter presents a comprehensive approach covering both design- and run-time aspects of tabular data enrichment and discusses our experience in making this process scalable. We illustrate how data enrichment steps of a Big Data pipeline can be implemented via tabular transformations that exploit semantic table annotation methods, and we discuss techniques devised to support the enactment of the resulting process on large tabular datasets. Furthermore, we present results from experimental evaluations in which we tested the scalability and run-time efficiency of the proposed cloud-based approach, enriching massive datasets with promising performance.


	Subtype
	
				Book chapter or essay
			
	Keywords
	
				Big data processing; Data enrichment; Data extension; Data integration; Linked data; Scalability
			
	Content language
	
				English
			
	Volume title
	
				Technologies and Applications for Big Data Value
			
	Volume editors
	
				Curry, E; Auer, S; Berre, AJ; Metzger, A; Perez, MS; Zillner, S
			
	Publication date
	
				2022
			
	Volume ISBN
	
				9783030783068
			
	Publisher
	
				Springer
			
	First page
	
				19
			
	Last page
	
				39
			
	Contribution DOI
	
				https://dx.doi.org/10.1007/978-3-030-78307-5_2
			
	Full text
	
				open
			
	Appears in types:
	
				03 - Book contribution

Files in this item:

File	Size	Format
unpaywall-bitstream--2039812707.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	1.77 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/372742
