Sokli, E., Kasela, P., Peikos, G., Pasi, G. (2025). Investigating Mixture of Experts in Dense Retrieval. In Proceedings of the 15th Italian Information Retrieval Workshop (IIR 2025) (pp. 101-107). CEUR-WS.
Investigating Mixture of Experts in Dense Retrieval
Sokli E.; Peikos G.; Pasi G.
2025
Abstract
While Dense Retrieval Models (DRMs) have advanced Information Retrieval (IR), they often suffer from limited generalizability and robustness. Various studies address these limitations with representation learning techniques that leverage the Mixture-of-Experts (MoE) architecture. Unlike prior works in IR that integrate MoE within the Transformer layers of DRMs, we add a single MoE block (SB-MoE) after the output of the final Transformer layer. Our empirical evaluation investigates how SB-MoE compares, in terms of retrieval effectiveness, to standard model fine-tuning. Given the MoE's sensitivity to its hyperparameters (i.e., the number of experts), we also investigate our model's performance under different expert configurations. Results show that SB-MoE is particularly effective for lightweight DRMs, consistently outperforming their fine-tuned counterparts. For larger DRMs, SB-MoE requires more training data to deliver improved retrieval performance. Our code is available online at: https://anonymous.4open.science/r/DenseRetrievalMoE.
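Since the abstract's central idea is architectural (a single MoE block placed after the output of the final Transformer layer of a DRM), a minimal PyTorch sketch of that idea is given below. It assumes a softmax-gated mixture of small feed-forward experts applied to the encoder's pooled embedding; the names (`SingleBlockMoE`, `num_experts`, `encoder`) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumed, not the paper's exact code) of a single MoE block
# applied to the pooled output of a dense retriever's final Transformer layer.
import torch
import torch.nn as nn


class SingleBlockMoE(nn.Module):
    """One MoE block: a softmax gate mixes the outputs of small expert FFNs."""

    def __init__(self, hidden_dim: int, num_experts: int = 6):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, hidden_dim) pooled embedding from the final Transformer layer
        weights = torch.softmax(self.gate(x), dim=-1)                   # (batch, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, E, H)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)          # (batch, H)


# Hypothetical usage: route the encoder's final-layer [CLS] embedding through the
# block before computing query-document similarity.
# pooled = encoder(**tokens).last_hidden_state[:, 0]
# vec = SingleBlockMoE(hidden_dim=pooled.size(-1))(pooled)
```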
| File | Attachment type | License | Size | Format |
|---|---|---|---|---|
| Sokli et al-2025-Italian Information Retrieval Workshop-CEUR-VoR.pdf (open access) | Publisher's Version (Version of Record, VoR) | Creative Commons | 402.19 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


