In recent years, big data has proliferated remarkably across many areas of society. A significant portion of this data, which includes open administrative records as well as unstructured sources such as crowdsourcing, web scraping, and social media platforms, has become publicly available. Although harnessing vast quantities of data may seem promising, volume alone is not always sufficient to answer the questions posed by scientific research or practical endeavours. Because of their inherent imperfections and biases, web and open data should not be fed directly into statistical models or analytical tools such as machine learning or AI, as doing so can severely distort empirical findings and skew the conclusions drawn from them. Instead, the methodologies developed so far must be extended to handle the uncertainty and biases inherent in this type of data.
Nardelli, V. (2024). Methods for Extracting Valuable Information from Spatial Web and Open Reliable Data. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2024).
Methods for Extracting Valuable Information from Spatial Web and Open Reliable Data
NARDELLI, VINCENZO
2024
File | Description | Attachment type | Access | Size | Format |
---|---|---|---|---|---|
phd_unimib_854324.pdf | Methods for Extracting Valuable Information from Spatial Web and Open Reliable Data | Doctoral thesis | Open access | 2.76 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.