Bicocca Open Archive

A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework

Cecchini, Flavio Massimiliano; Riedl, Martin; Fersini, Elisabetta; Biemann, Chris

2018

Abstract

This article compares Word Sense Induction (WSI) clustering algorithms on two novel pseudoword data sets of semantic-similarity and co-occurrence-based word graphs, with a special focus on the detection of homonymic polysemy. We follow the original definition of a pseudoword as the combination of two monosemous terms and their contexts to simulate a polysemous word. The evaluation compares each algorithm’s output on a pseudoword’s ego word graph (i.e., a graph that represents the pseudoword’s context in the corpus) with the known subdivision given by the components corresponding to the monosemous source words that form the pseudoword. The main contribution of this article is a self-sufficient pseudoword-based evaluation framework for graph-based WSI clustering algorithms, for which we define a new evaluation measure (top2) and a secondary clustering process (hyperclustering). To our knowledge, we are the first to conduct and discuss a large-scale systematic pseudoword evaluation targeting the induction of coarse-grained homonymous word senses across a large number of graph clustering algorithms.

Subtype: Journal article - Scientific article
Keywords: Evaluation; Graph clustering; Pseudowords; Word sense induction; Language and Linguistics; 3304; Linguistics and Language; Library and Information Sciences
Content language: English
Publication date: 2018
Journal: LANGUAGE RESOURCES AND EVALUATION
Volume: 52
Issue: 3
First page: 733
Last page: 770
Article DOI: https://dx.doi.org/10.1007/s10579-018-9415-1
Fulltext: none
Citation: Cecchini, F., Riedl, M., Fersini, E., Biemann, C. (2018). A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework. LANGUAGE RESOURCES AND EVALUATION, 52(3), 733-770 [10.1007/s10579-018-9415-1].
Appears in types: 01 - Journal article

Files in this record:

There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/196386
