Semantic Query Labeling Through Synthetic Query Generation

Bassani, E.; Pasi, G.
2021

Abstract

Searching in a domain-specific corpus of structured documents (e.g., e-commerce, media streaming services, job-seeking platforms) is often managed as a traditional retrieval task or through faceted search. Semantic Query Labeling, the task of locating the constituent parts of a query and assigning a predefined domain-specific semantic label to each of them, makes it possible to leverage the structure of documents during retrieval while leaving the keyword-based query formulation unaltered. Because no dataset is publicly available and producing one is costly, few works have been published on this task. In this paper, based on the assumption that a corpus already contains the information users search for, we propose a method for the automatic generation of semantically labeled queries and show that a semantic tagger (based on BERT, gazetteer-based features, and Conditional Random Fields) trained on our synthetic queries achieves results comparable to those obtained by the same model trained on real-world data. We also provide a large dataset of manually annotated queries in the movie domain suitable for studying Semantic Query Labeling. We hope that the public availability of this dataset will stimulate future research in this area.
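
The abstract does not describe the generation procedure in detail. The following is only a minimal sketch of the general idea, under stated assumptions: field values are sampled from a structured document, concatenated into a keyword query, and each token inherits the label of the field it came from. The `MOVIE` record, the `generate_labeled_query` helper, and the BIO tagging scheme are hypothetical choices made for illustration, not the authors' method.

```python
import random

# Hypothetical structured document from a movie corpus (illustrative values only).
MOVIE = {
    "TITLE": "The Matrix",
    "ACTOR": "Keanu Reeves",
    "GENRE": "science fiction",
    "YEAR": "1999",
}


def generate_labeled_query(doc, n_fields, rng):
    """Sample n_fields fields from doc and return (tokens, BIO tags)."""
    fields = rng.sample(sorted(doc), n_fields)
    tokens, tags = [], []
    for field in fields:
        words = doc[field].lower().split()
        tokens.extend(words)
        # The first word of a span gets B-<label>, the remaining words get I-<label>.
        tags.extend([f"B-{field}"] + [f"I-{field}"] * (len(words) - 1))
    return tokens, tags


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tokens, tags = generate_labeled_query(MOVIE, 2, rng)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Pairs of token and tag sequences produced this way have the shape a sequence tagger is typically trained on. A realistic generator would also need to model how users abbreviate, reorder, or partially quote field values, which this sketch ignores.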
Type: poster + paper
Keywords: Semantic query labeling, Query generation, Vertical search
Language: English
Conference: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21)
Conference year: 2021
Proceedings: SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 11-15 Jul 2021, Virtual conference
ISBN: 9781450380379
Publication year: 2021
Pages: 2278-2282
Bassani, E., Pasi, G. (2021). Semantic Query Labeling Through Synthetic Query Generation. In SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 11-15 Jul 2021, Virtual conference (pp. 2278-2282) [10.1145/3404835.3463071].

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/328432