A considerable amount of structured data is now available on the Web. To make the informational content of such data accessible and understandable to users, it is preferable to translate it into text. This task is known in the literature as 'data-to-text generation', and it is an instance of Natural Language Generation. To generate useful text from data, a process also known as lexicalisation, some approaches have begun to consider the Resource Description Framework (RDF) data contained in Knowledge Graphs. In this context, two main categories of neural lexicalisation approaches can be identified: pipeline and end-to-end. Pipeline systems perform better but are more complex to adapt. End-to-end systems have much simpler architectures but are less precise. In this work, to get the best of both categories, we propose a new hybrid approach, TripleEnc, which uses vector similarity between RDF triples to identify the best approach for lexicalisation. Empirical comparisons demonstrate that the novel approach improves the quality of the generated text.
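The abstract does not spell out how vector similarity between RDF triples selects a lexicalisation approach, so the following is only a minimal sketch of one way such routing could work, not the TripleEnc method itself. It assumes that each triple has already been encoded as a numeric vector, that an unseen triple is compared with the vectors of known triples by cosine similarity, and that a high similarity routes the triple to the end-to-end system while a low one routes it to the pipeline. The function names, the threshold of 0.8, and the routing direction are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def choose_approach(triple_vec, known_vecs, threshold=0.8):
    """Pick a lexicalisation approach for one triple vector.

    Assumption (not from the paper): a triple that is close to at least one
    known triple goes to the end-to-end system, otherwise to the pipeline.
    Returns the chosen label and the best similarity found.
    """
    best = max((cosine_similarity(triple_vec, k) for k in known_vecs), default=0.0)
    label = "end-to-end" if best >= threshold else "pipeline"
    return label, best


if __name__ == "__main__":
    # Toy two-dimensional vectors standing in for encoded RDF triples.
    known = [[1.0, 0.0], [0.0, 1.0]]
    print(choose_approach([1.0, 0.0], known))
    print(choose_approach([1.0, 1.0], [[1.0, 0.0]]))
```

In a real system the vectors would come from a learned triple encoder and the threshold would be tuned on held-out data; both are outside the scope of this sketch.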
Cremaschi, M., Saleri, S., Maurino, A. (2022). A geometrical deep learning model for the lexicalisation of 'unseen' RDF triples. In 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (pp. 2233-2240). Institute of Electrical and Electronics Engineers Inc. [10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00334].
A geometrical deep learning model for the lexicalisation of 'unseen' RDF triples
Cremaschi M.; Maurino A.
2022
| File | Size | Format |
|---|---|---|
| Cremaschi et al-2022-HPCC-DSS-SmartCity-DependSys-VoR.pdf (attachment type: Publisher’s Version (Version of Record, VoR); licence: all rights reserved; access: archive managers only) | 169.84 kB | Adobe PDF |


