EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text

Raganato, A;
2017

Abstract

Parallel corpora are widely used in a variety of Natural Language Processing tasks, from Machine Translation to cross-lingual Word Sense Disambiguation, where parallel sentences can be exploited to automatically generate high-quality sense annotations on a large scale. In this paper we present EUROSENSE, a multilingual sense-annotated resource based on the joint disambiguation of the Europarl parallel corpus, with almost 123 million sense annotations for over 155 thousand distinct concepts and entities from a language-independent unified sense inventory. We evaluate the quality of our sense annotations intrinsically and extrinsically, showing their effectiveness as training data for Word Sense Disambiguation.
Type: paper
Keywords: europarl; sense annotations; word sense disambiguation; entity linking; multilinguality; lexical semantics; semantic similarity
Language: English
Conference: 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - 30 July-4 August 2017
Proceedings: Proceedings of 55th annual meeting of the Association for Computational Linguistics
ISBN: 978-194562676-0
Year: 2017
Volume: 2
Pages: 594-600
URL: http://lcl.uniroma1.it/eurosense/papers/ACL17.pdf
Access: reserved
Delli Bovi, C., Camacho Collados, J., Raganato, A., Navigli, R. (2017). EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. In Proceedings of 55th annual meeting of the Association for Computational Linguistics (pp.594-600). USA : Association for Computational Linguistics (ACL) [10.18653/v1/P17-2094].

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361557
Citations
  • Scopus 31
  • Web of Science (ISI) 21