RE-FIN: Retrieval-based Enrichment for Financial data

Malandri L.;Mercorio F.;Mezzanzanica M.;Pallucchini F.
2025

Abstract

Enriching sentences with knowledge from qualitative sources benefits various NLP tasks and enhances the use of labelled data in model training. This is crucial for Financial Sentiment Analysis (FSA), where texts are often brief and contain implied information. We introduce RE-FIN (Retrieval-based Enrichment for FINancial data), an automated system designed to retrieve information from a knowledge base to enrich financial sentences, making them more knowledge-dense and explicit. RE-FIN generates propositions from the knowledge base and employs Retrieval-Augmented Generation (RAG) to augment the original text with relevant information. A large language model (LLM) rewrites the original sentence, incorporating this data. Since the LLM does not create new content, the risk of hallucinations is significantly reduced. The LLM generates multiple new sentences using different relevant information from the knowledge base; we developed an algorithm to select one that best preserves the meaning of the original sentence while avoiding excessive syntactic similarity. Results show that enhanced sentences present lower perplexity than the original ones and improve performance on FSA.
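The abstract describes a selection step: the LLM produces several enriched rewrites of a sentence, and one is kept that preserves the original meaning without merely copying its surface form. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' algorithm: a bag-of-words cosine stands in for a real sentence encoder, word n-gram overlap serves as the syntactic-similarity proxy, and the max_overlap threshold, helper names, and example sentences are invented for demonstration.

from collections import Counter
from math import sqrt


def semantic_similarity(a: str, b: str) -> float:
    # Cosine over bag-of-words counts: a stand-in for a real sentence encoder.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def syntactic_overlap(a: str, b: str, n: int = 3) -> float:
    # Jaccard overlap of word n-grams: a rough proxy for surface similarity.
    def ngrams(s: str) -> set:
        toks = s.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0


def select_enriched(original: str, candidates: list[str], max_overlap: float = 0.6) -> str:
    # Discard near-copies of the original, then keep the rewrite whose meaning
    # is closest to it; if every rewrite is too similar on the surface,
    # fall back to the least-copying one.
    kept = [c for c in candidates if syntactic_overlap(original, c) < max_overlap]
    if not kept:
        return min(candidates, key=lambda c: syntactic_overlap(original, c))
    return max(kept, key=lambda c: semantic_similarity(original, c))


if __name__ == "__main__":
    original = "Company X beat earnings expectations this quarter."
    candidates = [
        "Company X beat earnings expectations this quarter.",            # trivial copy
        "Company X exceeded analyst earnings forecasts for the quarter,"
        " continuing the revenue growth reported in its latest filing.",  # enriched
        "Analysts expect interest rates to fall next year.",              # off-topic drift
    ]
    print(select_enriched(original, candidates))

In a full pipeline the semantic score would be computed on sentence embeddings rather than word counts; the thresholded filter mirrors the "avoid excessive syntactic similarity" constraint, while the argmax over the remaining candidates mirrors "best preserves the meaning".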
paper
machine learning; word embedding; artificial intelligence; NLP
English
31st International Conference on Computational Linguistics, COLING 2025 - 19 January 2025 through 24 January 2025
Rambow O., Wanner L., Apidianaki M., Al-Khalifa H., Di Eugenio B., Schockaert S., Darwish K., Agarwal A.
Proceedings - International Conference on Computational Linguistics, COLING
9798891761971
2025
751 - 759
none
Malandri, L., Mercorio, F., Mezzanzanica, M., Pallucchini, F. (2025). RE-FIN: Retrieval-based Enrichment for Financial data. In Proceedings - International Conference on Computational Linguistics, COLING (pp.751-759). Association for Computational Linguistics (ACL).
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/555041