Big data, observational research and P-value: a recipe for false-positive findings? A study of simulated and real prospective cohorts

Veronesi, G; Grassi, G; Savelli, G; Quatto, P; Zambon, A

doi:10.1093/ije/dyz206

BACKGROUND: An increasing number of observational studies combine large sample sizes with low participation rates, which could lead to standard inference failing to control the false-discovery rate. We investigated if the 'empirical calibration of P-value' method (EPCV), reliant on negative controls, can preserve type I error in the context of survival analysis. METHODS: We used simulated cohort studies with 50% participation rate and two different selection bias mechanisms, and a real-life application on predictors of cancer mortality using data from four population-based cohorts in Northern Italy (n = 6976 men and women aged 25-74 years at baseline and 17 years of median follow-up). RESULTS: Type I error for the standard Cox model was above the 5% nominal level in 15 out of 16 simulated settings; for n = 10 000, the chances of a null association with hazard ratio = 1.05 having a P-value < 0.05 were 42.5%. Conversely, EPCV with 10 negative controls preserved the 5% nominal level in all the simulation settings, reducing bias in the point estimate by 80-90% when its main assumption was verified. In the real case, 15 out of 21 (71%) blood markers with no association with cancer mortality according to literature had a P-value < 0.05 in age- and gender-adjusted Cox models. After calibration, only 1 (4.8%) remained statistically significant. CONCLUSIONS: In the analyses of large observational studies prone to selection bias, the use of empirical distribution to calibrate P-values can substantially reduce the number of trivial results needing further screening for relevance and external validity.
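The calibration idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified moment-based version of empirical calibration, in which the log hazard ratios estimated for negative controls define an empirical null distribution (mean `mu`, excess variance `tau2`), and the exposure estimate is tested against that null instead of against zero. All function names and all numbers below are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standard_p_value(log_hr, se):
    """Two-sided Wald P-value against the theoretical null (log HR = 0)."""
    z = log_hr / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def fit_empirical_null(nc_log_hrs, nc_ses):
    """Moment-based empirical null from negative-control estimates.

    Assumption: systematic error is normally distributed. Its mean is the
    average negative-control log HR; its variance is the observed variance
    minus the average sampling variance, floored at zero.
    """
    mu = mean(nc_log_hrs)
    sampling_var = mean([se ** 2 for se in nc_ses])
    tau2 = max(0.0, variance(nc_log_hrs) - sampling_var)
    return mu, tau2


def calibrated_p_value(log_hr, se, mu, tau2):
    """Two-sided P-value against the empirical null N(mu, tau2 + se^2)."""
    z = (log_hr - mu) / math.sqrt(tau2 + se ** 2)
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical log HRs and standard errors for 10 negative controls,
# all shifted away from zero by a common bias.
nc_log_hrs = [0.03, 0.06, 0.05, 0.08, 0.02, 0.07, 0.04, 0.05, 0.06, 0.04]
nc_ses = [0.02] * 10

# Hypothetical exposure of interest with an estimated HR of 1.05.
log_hr, se = math.log(1.05), 0.02

mu, tau2 = fit_empirical_null(nc_log_hrs, nc_ses)
print("standard P-value:  ", standard_p_value(log_hr, se))
print("calibrated P-value:", calibrated_p_value(log_hr, se, mu, tau2))
```

In this made-up setting the exposure estimate is no further from zero than the negative controls are, so the calibrated P-value is much larger than the standard one. A full implementation would typically fit the empirical null by maximum likelihood, weighting each negative control by its own standard error.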

Veronesi, G., Grassi, G., Savelli, G., Quatto, P., Zambon, A. (2020). Big data, observational research and P-value: a recipe for false-positive findings? A study of simulated and real prospective cohorts. INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 49(3 (June 2020)), 876-884 [10.1093/ije/dyz206].

Veronesi, Giovanni;Grassi, Guido;Savelli, Giordano;Quatto, Piero;Zambon, Antonella

2020


Subtype	Journal article - Scientific article
Keywords	Big data; calibration of P-value; cohort studies; observational studies; selection bias; survival analysis
Content language	English
Date ahead of print / first online publication	16 Oct 2019
Publication date	2020
Journal	INTERNATIONAL JOURNAL OF EPIDEMIOLOGY
Volume	49
Issue	3 (June 2020)
First page	876
Last page	884
Article DOI	https://dx.doi.org/10.1093/ije/dyz206
Full text	open
			
Appears in types	01 - Journal article

Files in this item:

File	Size	Format
10281-259407_VoR.pdf (open access; attachment type: Publisher's Version (Version of Record, VoR); license: Creative Commons)	542.19 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/259407


Bicocca Open Archive
