Bicocca Open Archive

In tag-word disambiguation, a word is assigned to one specific context chosen among the different contexts to which it is related. Relatedness to a context is often defined by the co-occurrence of the target word with other words (context words) in the sentences of a given corpus. The overall disambiguation process can therefore be viewed as a classification process in which the context words act as features for the target. A problem with this approach is that the large number of possible context words can degrade classification performance, both in computational effort and in the quality of the outcome. Feature selection can improve the process in both respects by reducing the feature space to a manageable size with high information content. In this work we propose a feature selection approach for disambiguation based on the Shapley Value (SV), a metric from coalitional game theory that measures the importance of a component within a coalition. By including in the feature set only the words with the highest Shapley Value, we obtain notable improvements in both quality and performance. The exponential complexity of computing the SV exactly is avoided by a sampling-based approximation. We demonstrate the effectiveness of the method, and of the sampling approximation, on both a synthetic language corpus and a real-world linguistic corpus.
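The abstract names Shapley-value feature selection with a sampling-based approximation but gives no implementation details. The following is only a minimal sketch under our own assumptions, not the authors' method: the toy corpus `EXAMPLES`, the co-occurrence scoring classifier, the accuracy-based value function v(S), and all function names are hypothetical. The exact Shapley value of word i among n candidate words, φ_i = Σ_{S ⊆ N∖{i}} |S|!(n−|S|−1)!/n! · [v(S ∪ {i}) − v(S)], requires evaluating exponentially many coalitions; permutation sampling instead averages the marginal contribution v(P ∪ {i}) − v(P) over randomly drawn word orderings, where P is the set of words preceding i.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: each example is the set of context words that
# co-occur with one ambiguous target word, labelled with its correct sense.
EXAMPLES = [
    ({"the", "money", "loan"}, "finance"),
    ({"the", "money", "account"}, "finance"),
    ({"the", "loan", "interest"}, "finance"),
    ({"the", "water", "shore"}, "river"),
    ({"the", "water", "fish"}, "river"),
    ({"the", "shore", "fish"}, "river"),
]


def cooccurrence_counts(examples):
    """Count how often each context word co-occurs with each sense."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, sense in examples:
        for w in words:
            counts[sense][w] += 1
    return counts


def accuracy(feature_set, examples, counts):
    """Assumed value function v(S): fraction of examples disambiguated
    correctly using only the context words in feature_set.
    Ties and all-zero scores count as errors (no decision)."""
    correct = 0
    for words, sense in examples:
        scores = {
            s: sum(c.get(w, 0) for w in words & feature_set)
            for s, c in counts.items()
        }
        best = max(scores.values())
        winners = [s for s, v in scores.items() if v == best]
        if best > 0 and len(winners) == 1 and winners[0] == sense:
            correct += 1
    return correct / len(examples)


def shapley_permutation_sampling(features, value, n_samples, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution v(P | {f}) - v(P) over random orderings of the features."""
    rng = random.Random(seed)
    phi = {f: 0.0 for f in features}
    order = list(features)
    for _ in range(n_samples):
        rng.shuffle(order)
        prefix = set()
        v_prev = value(prefix)
        for f in order:
            prefix.add(f)
            v_new = value(prefix)
            phi[f] += v_new - v_prev  # marginal contribution of f
            v_prev = v_new            # reuse v(prefix) for the next feature
    return {f: total / n_samples for f, total in phi.items()}


def select_top_k(phi, k):
    """Keep only the k context words with the highest estimated Shapley value."""
    return sorted(phi, key=phi.get, reverse=True)[:k]


if __name__ == "__main__":
    counts = cooccurrence_counts(EXAMPLES)
    # Sorted so that, together with the seed, results are reproducible.
    features = sorted(set().union(*(words for words, _ in EXAMPLES)))
    phi = shapley_permutation_sampling(
        features, lambda s: accuracy(s, EXAMPLES, counts), n_samples=2000
    )
    for word in select_top_k(phi, len(features)):
        print(f"{word:10s} {phi[word]:.3f}")
```

In this toy game the exact Shapley values are 1/6 for money, loan, water, shore and fish, 1/12 for account and interest, and 0 for "the", which co-occurs equally often with both senses and therefore never changes a decision; the sampled estimates should land close to these, so a top-5 cut keeps the five most informative words. Because the marginal contributions within one ordering telescope, the estimates always sum to v(all words) − v(∅).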

Legesse, M., Gianini, G., Teferi, D. (2017). Selecting Feature-Words in Tag Sense Disambiguation Based on Their Shapley Value. In Proceedings - 12th International Conference on Signal Image Technology and Internet-Based Systems, SITIS 2016 (pp.236-240). IEEE [10.1109/SITIS.2016.45].

Selecting Feature-Words in Tag Sense Disambiguation Based on Their Shapley Value

Legesse, M; Gianini, G; Teferi, D

2017



	Contribution type
	
				paper
			
	Keywords
	
				Dimensional reduction; Disambiguation; Feature selection; semantic relatedness; Shapley Value; tagging
			
	Content language
	
				English
			
	Conference name
	
				12th International Conference on Signal Image Technology and Internet-Based Systems, SITIS 2016 - 28 November 2016 through 1 December 2016
			
	Conference year
	
				2016
			
	Volume editors
	
				De Pietro, G; Dipanda, A; Chbeir, R; Gallo, L; Yetongnon, K
			
	Proceedings title
	
				Proceedings - 12th International Conference on Signal Image Technology and Internet-Based Systems, SITIS 2016
			
	Proceedings ISBN
	
				9781509056989
			
	Publication date
	
				2017
			
	First page
	
				236
			
	Last page
	
				240
			
	Article number
	
				7907472
			
	DOI of the contribution
	
				https://dx.doi.org/10.1109/SITIS.2016.45
			
	Fulltext
	
				reserved
			
	Appears in collection types:
	
				02 - Conference contribution

Files in this item:

File	Size	Format
Legesse-2017-SITIS-.pdf (access restricted to archive managers; attachment type: Publisher’s Version (Version of Record, VoR); license: all rights reserved)	256.46 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/454964
