Borrego, A., Dessì, D., Ayala, D., Hernández, I., Osborne, F., Reforgiato Recupero, D., et al. (2025). Research hypothesis generation over scientific knowledge graphs. KNOWLEDGE-BASED SYSTEMS, 315(22 April 2025) [10.1016/j.knosys.2025.113280].

Research hypothesis generation over scientific knowledge graphs

Osborne F.;
2025

Abstract

Generating research hypotheses is a crucial step in scientific investigation that involves the creation of precise, verifiable, and logically valid statements that can be empirically examined. Therefore, many efforts have been made to automate or assist this process through the use of various Artificial Intelligence solutions. However, most existing methods are tailored to very specific domains, particularly within the biomedical field. There have been recent attempts to formalize hypothesis generation as a link prediction task over knowledge graphs. This solution is potentially domain-independent and applicable across diverse disciplines. Nevertheless, current approaches for link prediction, which typically rely on embedding models or path-based methods, have shown limited success in accurately predicting new hypotheses. To address these limitations, this paper introduces ResearchLink, an innovative and domain-independent methodology for hypothesis generation over knowledge graphs. ResearchLink combines path-based features and knowledge graph embeddings with text embeddings, capturing the semantic context of entities within a given corpus, and integrates additional information from bibliometric databases to improve research collaboration predictions. To conduct a rigorous evaluation of ResearchLink, we constructed CSKG-600, a new dataset for hypothesis generation, consisting of 600 statements that were manually labeled by domain experts. ResearchLink achieved outstanding performance (78.7% P@20), significantly outperforming alternative approaches such as TransH (71.8%), TransD (71.7%), and RotatE (70.7%).
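The P@20 figures above are precision at rank 20: the share of the 20 highest-ranked candidate hypotheses that are judged valid. The following is a minimal sketch of that metric in Python, not the authors' evaluation code. The `Candidate` tuple layout, the function name `precision_at_k`, and the toy data are all assumptions made for illustration; the paper's exact evaluation protocol is not described in this record.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (hypothesis text, model score, expert label).
Candidate = Tuple[str, float, bool]


def precision_at_k(candidates: List[Candidate], k: int) -> float:
    """Return the fraction of the k highest-scored candidates labeled valid."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Rank candidates from highest to lowest model score.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_k = ranked[:k]
    # Count how many of the top-k candidates were marked as valid.
    hits = sum(1 for _, _, is_valid in top_k if is_valid)
    return hits / k


# Toy data, invented purely for illustration (not from CSKG-600).
toy = [
    ("hypothesis A", 0.91, True),
    ("hypothesis B", 0.85, False),
    ("hypothesis C", 0.77, True),
    ("hypothesis D", 0.40, True),
    ("hypothesis E", 0.12, False),
]

print(precision_at_k(toy, 2))  # top 2 are A (valid) and B (invalid)
```

Under this definition, a reported P@20 of 78.7% would correspond to an average of roughly 15.7 valid hypotheses among the top 20, assuming the score is averaged over several ranked lists.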
Journal article - Scientific article
Artificial Intelligence; Hypothesis generation; Knowledge graphs; Link prediction; Scholarly domain; Scientific facts
English
8 March 2025
2025
315
22 April 2025
113280
open
Files in this item:
File Size Format
Borrego-2025-Knowledge-Based Systems-VoR.pdf

open access

Description: This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 1.55 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/547826
Citations
  • Scopus 3
  • ISI (Web of Science) 0