We propose a method for automatic optimization of pseudo relevance feedback (PRF) in information retrieval. Based on the conjecture that the initial query’s contribution to the final query may not be necessary once a good model is built from pseudo-relevant documents, we set out to optimize, per query, only the number of top-retrieved documents to be used for feedback. The optimization is based on several query performance predictors for the initial query, by building a linear regression model discovering the optimal machine learning pipeline via genetic programming. Even when using only 50–100 training queries, the method yields statistically significant improvements in MAP of 18–35% over the initial query, 7–11% over the feedback model with the best fixed number of pseudo-relevant documents, and up to 10% (5.5% on median) over the standard method of optimizing both the balance coefficient and the number of feedback documents by grid search on the training set. Compared to state-of-the-art PRF methods from the recent literature, our method outperforms them by up to 21%, with an average of 10%. Further analysis shows that we are still far from the method’s effectiveness ceiling (in contrast to the standard method), leaving ample room for further improvements.
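The per-query idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each query is represented by a small vector of query performance predictor scores, fits an ordinary least-squares linear regression from those scores to the best number of feedback documents observed on training queries, and then predicts a clipped, rounded value for new queries. The function names, the bounds `k_min` and `k_max`, and the toy data are all hypothetical, and the genetic programming search over machine learning pipelines is not shown.

```python
import numpy as np


def fit_k_regressor(features, best_k):
    """Least-squares fit of best_k ~ features @ w + b (hypothetical helper)."""
    # Append a column of ones so the last coefficient acts as the intercept.
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, best_k, rcond=None)
    return coef


def predict_k(coef, features, k_min=1, k_max=50):
    """Predict a per-query number of feedback documents, rounded and clipped."""
    design = np.column_stack([features, np.ones(len(features))])
    raw = design @ coef
    # k_min and k_max are assumed bounds, not values taken from the paper.
    return np.clip(np.rint(raw), k_min, k_max).astype(int)


# Toy training data (invented): two predictor scores per training query and
# the number of feedback documents that worked best for that query.
train_features = np.array([[0.2, 1.0], [0.4, 2.0], [0.6, 1.5], [0.8, 3.0]])
train_best_k = np.array([5.0, 10.0, 15.0, 20.0])

coef = fit_k_regressor(train_features, train_best_k)

# Predict for two unseen queries; the second falls outside the allowed range
# and is clipped to k_max.
predicted = predict_k(coef, np.array([[0.48, 2.0], [5.0, 0.0]]))
print(predicted)
```

In a real setting the training targets would come from evaluating feedback runs with different numbers of documents on each training query, which is outside the scope of this sketch.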

Arampatzis, A., Peikos, G., Symeonidis, S. (2021). Pseudo relevance feedback optimization. INFORMATION RETRIEVAL, 24(4-5), 269-297 [10.1007/s10791-021-09393-5].

Pseudo relevance feedback optimization

Peikos G. (second author)
2021

Type: Journal article - Scientific article
Keywords: Blind relevance feedback; Optimization; Pseudo relevance feedback; Query difficulty; Query performance predictors; Regression
Language: English
Year: 2021
Volume: 24
Issue: 4-5
First page: 269
Last page: 297
Access: reserved
Files in this item:
Arampatzis-2021-Information Retrieval-VoR.pdf
Access: Archive administrators only
Attachment type: Publisher’s Version (Version of Record, VoR)
License: All rights reserved
Size: 2.37 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/484379
Citations
  • Scopus 2
  • ISI (Web of Science) 2