
Nobani, N., Officioso, G., Pallucchini, F., Sperlì, G., Mercorio, F. (2026). Synthetic data generation: A tertiary study. Information Processing & Management, 63(6), 104715 (September 2026). https://doi.org/10.1016/j.ipm.2026.104715

Synthetic data generation: A tertiary study

Nobani, N (first author); Officioso, G (second author); Pallucchini, F; Mercorio, F (last author)
2026

Abstract

Synthetic Data Generation (SDG) is expanding rapidly, yet existing surveys differ widely in scope and methodological quality. This tertiary study systematically searched four major scholarly databases (2015–2025) and, after PRISMA screening and DARE-4 appraisal, identified 17 eligible secondary studies. The evidence reveals a strong concentration in healthcare (10 of 17 surveys, 58.8%), limited coverage of non-health domains, and inconsistent reporting of evaluation protocols (e.g., incomplete specification of metrics, data splits, baselines, or evaluation scripts). Fidelity and downstream utility dominate assessment practices, whereas privacy and diversity remain under-examined. Only 4 of 17 surveys provide any reproducibility artefacts. By consolidating these findings, we propose a compact, domain-agnostic evaluation baseline and highlight structural gaps in transparency, domain breadth, and methodological consistency. The study offers actionable guidance for strengthening reproducibility and broadening the evidential foundations of SDG research.
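The abstract names the protocol elements that surveyed studies often leave unspecified (metrics, data splits, baselines, evaluation scripts) and the four assessment dimensions (fidelity, utility, privacy, diversity). As a minimal sketch, assuming these items are what a complete evaluation report should state, the following Python checklist flags the ones that are missing. It is a hypothetical illustration, not the evaluation baseline proposed in the paper: the class `EvaluationProtocol`, its fields, and the example metric names ("KS statistic", "TSTR accuracy") are assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# The four assessment dimensions named in the abstract.
DIMENSIONS = ("fidelity", "utility", "privacy", "diversity")


@dataclass
class EvaluationProtocol:
    """Hypothetical record of what an SDG evaluation reports (illustration only)."""

    # Metric names reported per dimension, e.g. {"fidelity": ["KS statistic"]}.
    metrics: dict[str, list[str]] = field(default_factory=dict)
    # Named data splits, e.g. {"train": "80%", "test": "20%"}.
    data_splits: dict[str, str] = field(default_factory=dict)
    # Baseline generators the synthetic data is compared against.
    baselines: list[str] = field(default_factory=list)
    # Where the evaluation scripts are published; None if they are not released.
    script_url: Optional[str] = None

    def missing_items(self) -> list[str]:
        """List the protocol elements that the report leaves unspecified."""
        # A dimension counts as covered only if at least one metric is named for it.
        missing = [f"metrics:{d}" for d in DIMENSIONS if not self.metrics.get(d)]
        if not self.data_splits:
            missing.append("data_splits")
        if not self.baselines:
            missing.append("baselines")
        if not self.script_url:
            missing.append("evaluation_scripts")
        return missing

    def is_complete(self) -> bool:
        """True when every dimension and protocol element is specified."""
        return not self.missing_items()


# Hypothetical report covering only fidelity and utility, with no baselines or scripts.
report = EvaluationProtocol(
    metrics={"fidelity": ["KS statistic"], "utility": ["TSTR accuracy"]},
    data_splits={"train": "80%", "test": "20%"},
)
print(report.missing_items())
# prints ['metrics:privacy', 'metrics:diversity', 'baselines', 'evaluation_scripts']
```

Keeping the checklist as plain data makes it easy to store next to reported results; an actual baseline would also need agreed metric definitions, which this sketch does not attempt.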
Type: Journal article - Scientific article
Keywords: Synthetic data generation, Tertiary study, Survey of surveys, Machine learning, Data privacy, Evaluation metrics
Language: English
Date: 16 March 2026
Year: 2026
Volume: 63
Issue: 6 (September 2026)
Article number: 104715
Access: open
Files in this item:
Nobani et al-2026-Information Processing & Management-VoR.pdf (open access; Publisher’s Version, Version of Record (VoR); Creative Commons licence; Adobe PDF, 4.34 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/597341