Bicocca Open Archive

Word Embedding-Based Topic Similarity Measures

Terragni, S.; Fersini, E.; Messina, E.

2021

Abstract

Topic models aim to discover a set of hidden themes in a text corpus. A user might be interested in identifying the topics most similar to a given theme of interest. Several similarity and distance metrics can be adopted for this task. In this paper, we compare state-of-the-art topic similarity measures and propose novel metrics based on word embeddings. The proposed measures can overcome some limitations of the existing approaches and perform well according to several topic performance measures on benchmark datasets.
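The abstract does not spell out how the proposed measures are defined, so the following is only a generic sketch of one way a word-embedding-based topic similarity could be computed: each topic is represented by the centroid of its top words' vectors, and two topics are compared with cosine similarity. The embedding table, the topic word lists, and all function names are assumptions made for illustration; this is not presented as the authors' definition.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# In practice these would come from a pretrained embedding model.
TOY_EMBEDDINGS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "team": [0.7, 0.3, 0.0],
    "player": [0.8, 0.1, 0.1],
    "vote": [0.1, 0.9, 0.1],
    "election": [0.0, 0.9, 0.2],
}


def centroid(words, embeddings):
    """Average the vectors of the topic words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("no topic word has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid_topic_similarity(topic_a, topic_b, embeddings):
    """Hypothetical helper: cosine similarity of the two topic centroids."""
    return cosine(centroid(topic_a, embeddings), centroid(topic_b, embeddings))


# Topics are given as lists of their top words (assumed example data).
sports_1 = ["goal", "match", "team"]
sports_2 = ["player", "team", "match"]
politics = ["vote", "election"]

sim_sports = centroid_topic_similarity(sports_1, sports_2, TOY_EMBEDDINGS)
sim_cross = centroid_topic_similarity(sports_1, politics, TOY_EMBEDDINGS)
print(f"sports vs sports:   {sim_sports:.3f}")
print(f"sports vs politics: {sim_cross:.3f}")
```

With these toy vectors, the two sports topics score higher than the sports and politics pair even though they share only some of their words, which is the kind of behavior that pure word-overlap measures cannot capture.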

Type of contribution: paper
Keywords: Topic modeling; Topic similarity; Word embeddings
Content language: English
Conference name: 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021
Conference year: 2021
Volume editors: Elisabeth Métais, Farid Meziane, Helmut Horacek, Epaminondas Kapetanios
Proceedings title: Natural Language Processing and Information Systems. 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, Saarbrücken, Germany, June 23–25, 2021, Proceedings
Proceedings ISBN: 978-3-030-80598-2
Series: Lecture Notes in Artificial Intelligence
Publication date: 2021
Volume number: 12801
First page: 33
Last page: 45
DOI: https://dx.doi.org/10.1007/978-3-030-80599-9_4
Full text: none
Citation: Terragni, S., Fersini, E., Messina, E. (2021). Word Embedding-Based Topic Similarity Measures. In Natural Language Processing and Information Systems. 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, Saarbrücken, Germany, June 23–25, 2021, Proceedings (pp. 33-45). Cham: Springer Nature [10.1007/978-3-030-80599-9_4].
Appears in types: 02 - Conference contribution

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/363098
