Federated causal discovery with missing data in a multicentric study on endometrial cancer

Zanga, Alessio (first author); Bernasconi, Alice (second author)
2025

Abstract

Objectives: Establishing causal dependencies is crucial in applied domains, such as medicine and healthcare, where decision-making must be explainable. In these settings, small sample sizes and missing data call for federated approaches to maximise the amount of information we can use. Methods: We propose a novel federated causal discovery algorithm capable of pooling information from multiple sources with heterogeneous missing data to learn a graph representing cause–effect relationships. In particular, we learn a causal graph on a centralised server while taking into account both the prior knowledge and the missingness mechanism specific to each client. Results: We applied the proposed algorithm to synthetic data and to real-world data from a multicentric study on endometrial cancer, validating the obtained causal graph through quantitative analyses and a clinical literature review. Conclusion: Our approach learns an accurate model even when data are missing not at random.
Type: Journal article - Scientific article
Keywords: Federated learning; Missing data; Multiple sources
Language: English
Date: 22 July 2025
Year: 2025
Volume: 169
Issue: September 2025
Article number: 104877
Access: open
Zanga, A., Bernasconi, A., Lucas, P., Pijnenborg, H., Reijnen, C., Scutari, M., et al. (2025). Federated causal discovery with missing data in a multicentric study on endometrial cancer. Journal of Biomedical Informatics, 169 (September 2025), 104877. doi:10.1016/j.jbi.2025.104877
Files in this record:
File: Zanga-2025-J Biomedical Informatics-AAM.pdf
Access: open access
Description: the VoR PDF is not yet available on the publisher's website
Attachment type: Author’s Accepted Manuscript, AAM (Post-print)
License: Creative Commons
Size: 2.26 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/562249