Pseudo relevance feedback optimization

Arampatzis, A; Peikos, G; Symeonidis, S

doi:10.1007/s10791-021-09393-5

We propose a method for automatic optimization of pseudo relevance feedback (PRF) in information retrieval. Based on the conjecture that the initial query’s contribution to the final query may not be necessary once a good model is built from pseudo-relevant documents, we set out to optimize per query only the number of top-retrieved documents to be used for feedback. The optimization is based on several query performance predictors for the initial query, by building a linear regression model and discovering the optimal machine learning pipeline via genetic programming. Even by using only 50–100 training queries, the method yields statistically significant improvements in MAP of 18–35% over the initial query, 7–11% over the feedback model with the best fixed number of pseudo-relevant documents, and up to 10% (5.5% on median) over the standard method of optimizing both the balance coefficient and the number of feedback documents by grid search in the training set. Compared to state-of-the-art PRF methods from the recent literature, our method outperforms them by up to 21%, with an average of 10%. Further analysis shows that we are still far from the method’s effectiveness ceiling (in contrast to the standard method), leaving ample room for further improvements.
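The core idea in the abstract, predicting a per-query number of feedback documents from query performance predictors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names `fit_linear` and `predict_k`, the two toy predictor features, the training targets (taken here to be the best number of feedback documents found for each training query), and the clamping range are all hypothetical. The sketch uses plain ordinary least squares and does not attempt the genetic-programming pipeline search mentioned in the abstract.

```python
from typing import List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations.

    Returns the feature weights followed by the bias term (last element).
    Assumes the design matrix has full column rank.
    """
    rows = [list(x) + [1.0] for x in X]  # append a constant column for the bias
    n = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def predict_k(w: Sequence[float], features: Sequence[float],
              k_min: int = 1, k_max: int = 50) -> int:
    """Predict the number of feedback documents for one query.

    The raw regression output is rounded and clamped to [k_min, k_max];
    the range is a hypothetical choice, not taken from the paper.
    """
    raw = sum(wi * fi for wi, fi in zip(w, features)) + w[-1]
    return max(k_min, min(k_max, round(raw)))


# Toy training set (hypothetical): two predictor values per training query,
# and the number of feedback documents that worked best for that query.
train_features = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 5.0)]
train_best_k = [7.0, 8.0, 10.0, 12.0, 26.0]

weights = fit_linear(train_features, train_best_k)
k_for_new_query = predict_k(weights, (2.0, 2.0))
```

In a real system, the predicted value would determine how many top-retrieved documents feed the feedback model for that query, and the predictor features would come from actual query performance predictors computed on the initial query.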

Arampatzis, A., Peikos, G., Symeonidis, S. (2021). Pseudo relevance feedback optimization. INFORMATION RETRIEVAL, 24(4-5), 269-297 [10.1007/s10791-021-09393-5].

Subtype: Journal article - Scientific article
Keywords: Blind relevance feedback; Optimization; Pseudo relevance feedback; Query difficulty; Query performance predictors; Regression
Content language: English
Publication date: 2021
Journal: INFORMATION RETRIEVAL
Volume: 24
Issue: 4-5
First page: 269
Last page: 297
Article DOI: https://dx.doi.org/10.1007/s10791-021-09393-5
Full text: reserved
Citation: Arampatzis, A., Peikos, G., Symeonidis, S. (2021). Pseudo relevance feedback optimization. INFORMATION RETRIEVAL, 24(4-5), 269-297 [10.1007/s10791-021-09393-5].
Appears in types: 01 - Journal article

Files in this item:

File	Access	Attachment type	License	Size	Format
Arampatzis-2021-Information Retrieval-VoR.pdf	Archive managers only	Publisher’s Version (Version of Record, VoR)	All rights reserved	2.37 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/484379

Bicocca Open Archive
