Camacho-Collados, J., Delli Bovi, C., Raganato, A., Navigli, R. (2019). SenseDefs: a multilingual corpus of semantically annotated textual definitions: exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking. LANGUAGE RESOURCES AND EVALUATION, 53(2), 251-278 [10.1007/s10579-018-9421-3].

SenseDefs: a multilingual corpus of semantically annotated textual definitions: exploiting multiple languages and resources jointly for high-quality Word Sense Disambiguation and Entity Linking

Raganato A.
2019

Abstract

Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale. In this paper we present SenseDefs, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory. Our approach for the construction and disambiguation of this corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system: first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation; then we refine the disambiguation output with a distributional approach based on semantic similarity. As a result, we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we publicly release it to the research community. We assess the quality of SenseDefs’s sense annotations both intrinsically and extrinsically on Open Information Extraction and Sense Clustering tasks.
Type: Journal article - Scientific article
Keywords: Entity linking; Glosses; Lexical resources; Multilinguality; Textual definitions; Word Sense Disambiguation
Language: English
Date: 23 Jul 2018
Year: 2019
Volume: 53
Issue: 2
Pages: 251-278
Access: reserved
Files in this item:
Camacho_Sense_2019.pdf (Adobe PDF, 976.03 kB; access restricted to archive managers)

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361567