Gallipoli, G., Cagliero, L., Mosca, A., Miola, A., Borghi, D. (2025). Retrieval Augmented Generation of Summarized Answers on Visually-Rich Documents for Trend and Risk Analysis. In Proceedings of the Workshops of the EDBT/ICDT 2025 Joint Conference co-located with the EDBT/ICDT 2025 Joint Conference (pp. 1-7). CEUR-WS.
Retrieval Augmented Generation of Summarized Answers on Visually-Rich Documents for Trend and Risk Analysis
Miola, A;
2025
Abstract
The analysis of Visually-Rich Documents (VRDs) is crucial in the banking sector to support Trend and Risk Analysis (TRA), as financial TRA documents are largely multimodal. Recently, Retrieval Augmented Generation (RAG) systems have enabled the effective use of Large Language Models (LLMs) to answer questions related to multimodal content. However, the inherent verbosity and complexity of financial documents could degrade the quality of the generated answers. In this work, we explore the use of text summarization techniques to condense the information retrieved from TRA-related VRDs. We analyze the level of synthesis of the original RAG answers, both with and without cascading an ad hoc summarization step. We apply summarization performance measures to compare standard RAG answers with the summarization outputs obtained directly on the retrieved passages. The results show that proprietary LLMs (GPT-4o) significantly improve the RAG's ability to sum up the retrieved passages, whereas integrating open-source LLMs or traditional summarizers turns out not to be beneficial, even when the summarization step is applied on top of the RAG answer.

| File | Size | Format |
|---|---|---|
| Gallipoli et al-2025-EDBT/ICDT-CEUR-VoR.pdf (open access). Description: This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Attachment type: Publisher’s Version (Version of Record, VoR). License: Creative Commons. | 1.34 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


