Bicocca Open Archive



A geometrical deep learning model for the lexicalisation of 'unseen' RDF triples

Cremaschi M.; Saleri S.; Maurino A.

2022

Abstract

A considerable amount of structured data is now available on the Web. To make the informational content of such data accessible and understandable to users, it is preferable to translate it into text. In the literature, this task is called 'data-to-text generation', and it is an instance of Natural Language Generation. To generate valuable text from data, a process also known as lexicalisation, some approaches have begun to exploit the Resource Description Framework (RDF) data contained in Knowledge Graphs. In this context, neural lexicalisation approaches fall into two main categories: pipeline and end-to-end. Pipeline systems perform better but are more complex to adapt; end-to-end systems have much simpler architectures but are less precise. To get the best of both categories, we propose TripleEnc, a new hybrid approach that uses vector similarity between RDF triples to identify the best approach for lexicalisation. Empirical comparisons show that the new approach improves the quality of the generated text.
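The abstract does not detail how TripleEnc works. As an illustration only, the general idea of choosing between a pipeline and an end-to-end lexicaliser according to vector similarity between RDF triples can be sketched in Python. Everything in this sketch is a hypothetical assumption rather than the authors' method: the toy bag-of-tokens `embed` function stands in for a learned triple embedding, and `choose_lexicaliser`, its 0.5 `threshold`, and the rule that sends triples similar to those seen in training to the end-to-end model and 'unseen' ones to the pipeline are illustrative choices.

```python
import math
import re
from collections import Counter

# An RDF triple as plain strings: (subject, predicate, object).
Triple = tuple[str, str, str]


def embed(triple: Triple) -> Counter:
    """Hypothetical toy embedding: a bag-of-tokens count vector for a triple."""
    text = " ".join(triple)
    # Split camelCase predicates such as 'birthPlace' into 'birth Place'.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    # Lower-case and keep runs of letters/digits ('Alan_Turing' -> 'alan', 'turing').
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Missing keys in a Counter read as 0, so only shared tokens contribute.
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_lexicaliser(triple: Triple, seen: list[Triple], threshold: float = 0.5) -> str:
    """Route a triple to 'end-to-end' or 'pipeline' (hypothetical rule, not TripleEnc)."""
    query = embed(triple)
    # Highest similarity to any triple seen during training; 0.0 if none are known.
    best = max((cosine(query, embed(t)) for t in seen), default=0.0)
    # Assumption: familiar triples go to the simpler end-to-end model,
    # while 'unseen' triples fall back to the pipeline.
    return "end-to-end" if best >= threshold else "pipeline"
```

For example, with `seen = [("Alan_Turing", "birthPlace", "London")]`, the triple `("Ada_Lovelace", "birthPlace", "London")` shares three of its five tokens with the seen triple (cosine 0.6) and is routed to the end-to-end branch, while `("Ada_Lovelace", "fieldOfStudy", "Mathematics")` shares none and is routed to the pipeline. A realistic system would more likely use learned embeddings and a threshold tuned on held-out data.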


	Type of contribution
	
				paper
			
	Keywords
	
				Lexicalisation; Natural Language Generation; RDF triples
			
	Content language
	
				English
			
	Conference name
	
				23rd IEEE International Conference on High Performance Computing and Communications, 7th IEEE International Conference on Data Science and Systems, 19th IEEE International Conference on Smart City and 7th IEEE International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC-DSS-SmartCity-DependSys 2021 - 20-22 December 2021
			
	Conference year
	
				2021
			
	Proceedings title
	
				2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)
			
	ISBN of the proceedings volume
	
				9781665494571
			
	Publication date
	
				2022
			
	First page
	
				2233
			
	Last page
	
				2240
			
	DOI of the paper
	
				https://dx.doi.org/10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00334
			
	Fulltext
	
				reserved
			
	Citation
	
				Cremaschi, M., Saleri, S., Maurino, A. (2022). A geometrical deep learning model for the lexicalisation of 'unseen' RDF triples. In 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (pp.2233-2240). Institute of Electrical and Electronics Engineers Inc. [10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00334].
			
	Appears in categories:
	
				02 - Conference paper

Files in this item:

File	Size	Format
Cremaschi et al-2022-HPCC-DSS-SmartCity-DependSys-VoR.pdf (access: archive administrators only; attachment type: Publisher’s Version (Version of Record, VoR); licence: all rights reserved)	169.84 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/559282
