Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N. (2020). MEET: A Method for Embeddings Evaluation for Taxonomic Data. In 2020 International Conference on Data Mining Workshops (ICDMW) (pp. 31-38). doi:10.1109/ICDMW51313.2020.00014.
MEET: A Method for Embeddings Evaluation for Taxonomic Data
Malandri, Lorenzo; Mercorio, Fabio; Mezzanzanica, Mario; Nobani, Navid
2020
Abstract
Taxonomies are the mainstay of a wide range of applications in the semantic web and provide formal support for syntactic and semantic exchanges. Hence, they are pivotal for machine understanding in natural language processing and business-oriented tasks. However, keeping such hierarchies up to date and representative of the domain from which they are drawn is still a time-consuming and error-prone activity. Word embeddings have proven effective at capturing lexical and semantic similarities that can be used to enrich taxonomies from text data. This, in turn, requires evaluating the generated embeddings to estimate the extent to which they encode the semantic similarity derived from the hierarchy itself. In this paper, we propose and implement MEET, a methodology for generating and evaluating embeddings from a text corpus that preserve the semantic similarity relations synthesised from a taxonomy. To this end, we develop a new measure, Hierarchical Semantic Similarity (HSS), to compute the semantic similarity between taxonomic elements. Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in taxonomies and (ii) the embedding selected through MEET clearly outperforms embeddings selected in the literature on benchmark tasks. We make available an open-source repository (an anonymised GitLab project) with all the material employed in this research, including the HSS for 35,000 word pairs.
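The idea of checking how well an embedding encodes taxonomy-derived similarity can be illustrated with a small sketch. The abstract does not define HSS or the exact selection criterion used by MEET, so everything here is an assumption for illustration: the toy taxonomy, the two candidate embeddings, the Wu-Palmer-style stand-in similarity (used in place of HSS, not as an approximation of it), and the use of Pearson correlation as the score.

```python
from itertools import combinations

import numpy as np

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "occupation": None,
    "ict": "occupation",
    "health": "occupation",
    "developer": "ict",
    "data_scientist": "ict",
    "nurse": "health",
}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(node))


def taxonomy_similarity(a, b):
    """Wu-Palmer-style stand-in (NOT HSS): 2*depth(lca) / (depth(a)+depth(b))."""
    path_b = set(ancestors(b))
    # First node on a's path to the root that is also an ancestor of b.
    lca = next(n for n in ancestors(a) if n in path_b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def score_embedding(vectors, terms):
    """Pearson correlation between taxonomy and cosine similarity over all pairs."""
    pairs = list(combinations(terms, 2))
    tax = [taxonomy_similarity(a, b) for a, b in pairs]
    emb = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return float(np.corrcoef(tax, emb)[0, 1])


TERMS = ["developer", "data_scientist", "nurse"]

# Two hypothetical candidate embeddings (2-d vectors, made up for the example).
CANDIDATES = {
    "emb_good": {
        "developer": np.array([1.0, 0.1]),
        "data_scientist": np.array([0.9, 0.2]),
        "nurse": np.array([0.1, 1.0]),
    },
    "emb_bad": {
        "developer": np.array([1.0, 0.0]),
        "data_scientist": np.array([0.0, 1.0]),
        "nurse": np.array([1.0, 0.1]),
    },
}

# Keep the candidate whose cosine similarities agree best with the taxonomy.
best = max(CANDIDATES, key=lambda name: score_embedding(CANDIDATES[name], TERMS))
```

In this toy setup, `emb_good` places the two ICT occupations close together and the nurse far away, so its cosine similarities track the taxonomy and it is the candidate retained in `best`. A rank correlation could be swapped in for Pearson if only the ordering of pairs matters.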