Mecatti, F. (2025). Bridging BigData and sampling methodology: What is big and where is the bridge? Survey Methodology, 51(1), 145-168.

Bridging BigData and sampling methodology: What is big and where is the bridge?

Mecatti, F.
First author
2025

Abstract

BigData users and the BigData research community are expanding rapidly, while statisticians at large are seemingly becoming divided between those who are enthusiastic and those who are concerned, if not downright hostile. Is BigData also a big step ahead, truly advancing our ability to extract meaningful information and actual knowledge from data? Is BigData underplaying traditional statistical inference as we know it, supplanting survey methodology as a low-cost futuristic option? In this paper I will attempt to unravel the multifaceted relationship bridging BigData to sampling methodology. Starting by reasoning why it should be interesting to look at BigData from a sampling statistician’s perspective, I will delve deeper into the somewhat ambiguous definition of BigData and share some very personal considerations and views on the matter. In the process, several open questions will arise while discussing a personal selection of insights that are traceable through the vast body of statistical literature around BigData and sampling methodology. The discussion will take various angles explored across nine key points, and it will conclude with a forward-looking perspective on a main challenge for future research: addressing the strong assumptions needed to manage deviations from purely randomized data collection.
Type: Journal article - Scientific article
Keywords: Bayesian network models; Causal inference; Data quality; Digital data and sources; Non-probability samples; Observational data
Language: English
Date: 30 Jun 2025
Year: 2025
Volume: 51
Issue: 1
First page: 145
Last page: 168
Access: open
Files in this item:
File: Mecatti-2025-Survey Methodology-VoR.pdf
Access: open access
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Publisher-specific open access license
Size: 516.78 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/561301
Citations
  • Scopus 1