Bicocca Open Archive

MEET: A Method for Embeddings Evaluation for Taxonomic Data

Malandri, Lorenzo; Mercorio, Fabio; Mezzanzanica, Mario; Nobani, Navid

2020

Abstract

Taxonomies are the mainstay of a wide range of semantic web applications and provide formal support for syntactic and semantic exchanges. Hence, they are pivotal for machine understanding in natural language processing and business-oriented tasks. However, keeping such hierarchies up to date and representative of the domain from which they are drawn is still a time-consuming and error-prone activity. Here, word embeddings have proven effective in capturing lexical and semantic similarities to enrich taxonomies from text data. This, in turn, requires evaluating the generated embeddings to estimate the extent to which they encode the semantic similarity derived from the hierarchy itself. In this paper, we propose and implement MEET, a methodology for generating and evaluating embeddings from a text corpus that preserve the semantic similarity relations synthesised from a taxonomy. To this end, we develop a new measure, Hierarchical Semantic Similarity (HSS), to compute the semantic similarity between taxonomic elements. Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in taxonomies and (ii) the embedding selected through MEET clearly outperforms embeddings chosen in the literature on benchmark tasks. We have made available an open-source repository (an anonymised GitLab project) with all the material employed in this research, including the HSS for 35,000 word pairs.

Contribution type: paper
Keywords: Embeddings Evaluation; Semantic Similarity; Taxonomies
Content language: English
Conference name: 20th IEEE International Conference on Data Mining Workshops, ICDMW 2020
Conference year: 2020
Proceedings title: 2020 International Conference on Data Mining Workshops (ICDMW)
Proceedings volume ISBN: 9781728190129
Publication date: 2020
Volume number: 2020-
First page: 31
Last page: 38
Article number: 9346357
DOI: https://dx.doi.org/10.1109/ICDMW51313.2020.00014
Fulltext: none
Citation: Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N. (2020). MEET: A Method for Embeddings Evaluation for Taxonomic Data. In 2020 International Conference on Data Mining Workshops (ICDMW) (pp. 31-38). IEEE Computer Society [10.1109/ICDMW51313.2020.00014].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/305244
