RE-FIN: Retrieval-based Enrichment for Financial data

Malandri L.;Mercorio F.;Mezzanzanica M.;Pallucchini F.
2025

Abstract

Enriching sentences with knowledge from qualitative sources benefits various NLP tasks and enhances the use of labelled data in model training. This is crucial for Financial Sentiment Analysis (FSA), where texts are often brief and contain implied information. We introduce RE-FIN (Retrieval-based Enrichment for FINancial data), an automated system designed to retrieve information from a knowledge base to enrich financial sentences, making them more knowledge-dense and explicit. RE-FIN generates propositions from the knowledge base and employs Retrieval-Augmented Generation (RAG) to augment the original text with relevant information. A large language model (LLM) rewrites the original sentence, incorporating this data. Since the LLM does not create new content, the risk of hallucinations is significantly reduced. The LLM generates multiple new sentences using different relevant information from the knowledge base; we developed an algorithm to select one that best preserves the meaning of the original sentence while avoiding excessive syntactic similarity. Results show that enhanced sentences present lower perplexity than the original ones and improve performance on FSA.
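The abstract describes a selection step: the LLM produces several enriched rewrites of a sentence, and one is kept that preserves the original meaning without merely copying its surface form. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' algorithm: a bag-of-words cosine stands in for a real sentence encoder, word n-gram overlap serves as the syntactic-similarity proxy, and the max_overlap threshold, helper names, and example sentences are invented for demonstration.

from collections import Counter
from math import sqrt


def semantic_similarity(a: str, b: str) -> float:
    # Cosine over bag-of-words counts: a stand-in for a real sentence encoder.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def syntactic_overlap(a: str, b: str, n: int = 3) -> float:
    # Jaccard overlap of word n-grams: a rough proxy for surface similarity.
    def ngrams(s: str) -> set:
        toks = s.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0


def select_enriched(original: str, candidates: list[str], max_overlap: float = 0.6) -> str:
    # Discard near-copies of the original, then keep the rewrite whose meaning
    # is closest to it; if every rewrite is too similar on the surface,
    # fall back to the least-copying one.
    kept = [c for c in candidates if syntactic_overlap(original, c) < max_overlap]
    if not kept:
        return min(candidates, key=lambda c: syntactic_overlap(original, c))
    return max(kept, key=lambda c: semantic_similarity(original, c))


if __name__ == "__main__":
    original = "Company X beat earnings expectations this quarter."
    candidates = [
        "Company X beat earnings expectations this quarter.",            # trivial copy
        "Company X exceeded analyst earnings forecasts for the quarter,"
        " continuing the revenue growth reported in its latest filing.",  # enriched
        "Analysts expect interest rates to fall next year.",              # off-topic drift
    ]
    print(select_enriched(original, candidates))

In a full pipeline the semantic score would be computed on sentence embeddings rather than word counts; the thresholded filter mirrors the "avoid excessive syntactic similarity" constraint, while the argmax over the remaining candidates mirrors "best preserves the meaning".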
paper
machine learning; word embedding; artificial intelligence; NLP
English
31st International Conference on Computational Linguistics, COLING 2025 - 19 January 2025 through 24 January 2025
Rambow O., Wanner L., Apidianaki M., Al-Khalifa H., Di Eugenio B., Schockaert S., Darwish K., Agarwal A.
Proceedings - International Conference on Computational Linguistics, COLING
9798891761971
2025
751 - 759
none
Malandri, L., Mercorio, F., Mezzanzanica, M., Pallucchini, F. (2025). RE-FIN: Retrieval-based Enrichment for Financial data. In Proceedings - International Conference on Computational Linguistics, COLING (pp.751-759). Association for Computational Linguistics (ACL).
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/555041