Topic models aim to discover a set of hidden themes in a text corpus. A user might be interested in identifying the topics most similar to a given theme of interest. Several similarity and distance metrics can be adopted for this task. In this paper, we compare state-of-the-art topic similarity measures and propose novel metrics based on word embeddings. The proposed measures overcome some limitations of the existing approaches and show good results on several topic performance measures on benchmark datasets.

Terragni, S., Fersini, E., Messina, E. (2021). Word Embedding-Based Topic Similarity Measures. In Natural Language Processing and Information Systems. 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, Saarbrücken, Germany, June 23–25, 2021, Proceedings (pp. 33–45). Cham: Springer Nature [10.1007/978-3-030-80599-9_4].

Word Embedding-Based Topic Similarity Measures

Terragni, S.; Fersini, E.; Messina, E.
2021

Type: paper
Keywords: Topic modeling; Topic similarity; Word embeddings
Language: English
Conference: 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021
Conference year: 2021
Editors: Elisabeth Métais, Farid Meziane, Helmut Horacek, Epaminondas Kapetanios
Book title: Natural Language Processing and Information Systems. 26th International Conference on Applications of Natural Language to Information Systems, NLDB 2021, Saarbrücken, Germany, June 23–25, 2021, Proceedings
ISBN: 978-3-030-80598-2
Publication year: 2021
Volume: 12801
First page: 33
Last page: 45
none

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/363098
Citations
  • Scopus 23
  • ISI (Web of Science) 9