Milanese, G., Peikos, G., Pasi, G., Viviani, M. (2025). Fact-Driven Health Information Retrieval: Integrating LLMs and Knowledge Graphs to Combat Misinformation. In Advances in Information Retrieval. ECIR 2025 (pp. 192–200) [10.1007/978-3-031-88714-7_17].
Fact-Driven Health Information Retrieval: Integrating LLMs and Knowledge Graphs to Combat Misinformation
Milanese, Gian Carlo; Peikos, Georgios; Pasi, Gabriella; Viviani, Marco
2025
Abstract
Aiming to reduce health misinformation in Web search, we present a system for Health Information Retrieval (HIR) that ranks documents according to both their topical relevance and correctness (i.e., factuality). The system first segments documents into passages and then employs a Large Language Model (LLM) to identify and extract claims from each passage. For each claim, we formulate corresponding SPARQL queries and execute them against a Knowledge Graph (KG) extracted from a subset of DBpedia, allowing us to estimate the correctness of claims and, hence, a correctness score for documents. Topical relevance is estimated with the BM25 algorithm, which is used to produce the initial ranking of documents. To generate the final ranking, the system combines each document’s pre-computed correctness score with its topical relevance score. While existing approaches rely on machine learning or LLMs to verify correctness, our KG-based methodology enables transparent fact-checking by grounding its assessments in structured knowledge. Our approach is empirically evaluated using three TREC Health Misinformation collections (2020–2022).
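The final re-ranking step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how claim-level verdicts are aggregated into a document score or how the two scores are combined, so everything here is an assumption made for illustration: the names `ScoredDoc`, `document_correctness`, and `rerank` are hypothetical, the correctness score is taken to be the fraction of KG-supported claims, BM25 scores are min-max normalised, and the combination is a linear interpolation with a hypothetical weight `alpha`.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: str
    bm25: float         # topical relevance score from the initial BM25 ranking
    correctness: float  # pre-computed correctness score, assumed to lie in [0, 1]


def document_correctness(claim_verdicts: list[bool]) -> float:
    """Assumed aggregation: fraction of extracted claims supported by the KG."""
    if not claim_verdicts:
        return 0.5  # assumption: neutral score when no claim could be checked
    return sum(claim_verdicts) / len(claim_verdicts)


def rerank(docs: list[ScoredDoc], alpha: float = 0.5) -> list[ScoredDoc]:
    """Re-rank by alpha * normalised BM25 + (1 - alpha) * correctness.

    The linear interpolation and the weight alpha are illustrative
    assumptions, not the combination function used in the paper.
    """
    if not docs:
        return []
    lo = min(d.bm25 for d in docs)
    hi = max(d.bm25 for d in docs)
    span = hi - lo

    def combined(d: ScoredDoc) -> float:
        # Min-max normalise BM25 so both scores share the [0, 1] range.
        rel = (d.bm25 - lo) / span if span > 0 else 1.0
        return alpha * rel + (1 - alpha) * d.correctness

    return sorted(docs, key=combined, reverse=True)
```

With `alpha=0.5`, a document with a slightly lower BM25 score but fully supported claims can overtake a more topically relevant document whose claims are mostly unsupported; with `alpha=1.0` the sketch reduces to the initial BM25 ordering.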
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.