In recent years, there has been a surge of interest in the statistical analysis of spatial data lying on or alongside networks. Car crashes, vehicle thefts, bicycle incidents, roadside kiosks, neuroanatomical features, and ambulance interventions are just a few typical examples of such events, while the edges of the network represent an abstraction of roads, rivers, railways, cargo-ship routes or nerve fibres. This type of data is interesting for several reasons. First, the statistical analysis of the events is challenging because the complex and non-homogeneous nature of the network creates unique methodological problems. Several authors have discussed and illustrated the common pitfalls of adapting classical planar spatial models to network data. Second, the rapid development of open-source spatial databases (such as Open Street Map) provides the starting point for creating road networks at a wide range of spatial scales. The size and volume of the data raise complex computational problems, while common geometrical errors in the network’s software representations create another source of complexity. Third, at the time of writing, the most important software routines and functions (mainly implemented in R) are still being rewritten and adapted for the new spatial support. This manuscript collects four articles presenting data structures and statistical models to analyse spatial data lying on road networks using point-pattern and network-lattice approaches. The first paper reviews classes, vital pre-processing steps and software representations for manipulating road network data. In particular, it focuses on the R packages stplanr and dodgr, highlighting their main functionalities, such as shortest paths and centrality measures, using a range of datasets, from a single roundabout to a complete network covering an entire city.
The second paper proposes two indices for assessing the risk of car crashes on the street network of a metropolitan area, estimated via a dynamic zero-inflated Poisson model. The elementary statistical units are the road segments of the network. The model employs a set of open-source spatial covariates representing the network’s structural and demographic characteristics (such as population density, traffic lights or crossings), extracted from Open Street Map and the 2011 Italian Census. The third paper demonstrates a Bayesian hierarchical model for identifying road segments of particular concern using a network-lattice approach. It is based on a case study of a major city (Leeds, UK), in which car crashes of different severities were recorded over several years. The model includes spatially structured and unstructured random effects to capture the spatial nature of the events and the dependencies between the severity levels. The paper also proposes a novel procedure for analysing the Modifiable Areal Unit Problem (MAUP) for network-lattice data. Finally, the fourth paper summarises a set of preliminary results on the analysis of spatio-temporal point patterns lying on road networks using non-homogeneous Poisson processes. It focuses on the ambulance interventions that occurred in the municipality of Milan from 2015 to 2017, developing two distinct models, one for the spatial component and one for the temporal component. The spatial intensity function is estimated using a network adaptation of the classical non-parametric kernel estimator. The manuscript also includes three appendices: the first briefly reviews the basics of the INLA methodology and the corresponding R package, the second presents the supplementary materials related to the fourth chapter, and the third briefly introduces an R package, named osmextract, that was developed during the PhD and focuses on Open Street Map data. The fifth chapter concludes the manuscript, summarising the main contributions and outlining future research developments.

In recent years, interest in the statistical analysis of spatial data with network support has grown steadily. Classic examples of this type of event include road accidents, car thefts, crimes and ambulance interventions, while the lines that make up the network typically represent roads, rivers, railway tracks or nerve endings. The analysis of these phenomena is interesting from several points of view. First, the statistical models raise several problems related to the spatial support. For this reason, several papers published in recent years illustrate the main difficulties linked to the very nature of the network. Moreover, the recent development of open-source spatial databases (such as Open Street Map) has made it possible to download and create datasets covering the road networks of almost the entire world. The enormous volume of data and the (inevitable) geometrical errors in the Open Street Map databases represent two further problems. Finally, since most R packages for the analysis of network data are currently still under development, there are also several computational difficulties and problems in implementing new methodologies. This thesis summarises four articles presenting data structures and statistical methodologies for the analysis of spatial data with network support, considering both a network-lattice approach and a point-pattern approach. The first paper presents a literature review of the R packages that implement classes and functions for the analysis of road networks, focusing in particular on stplanr and dodgr. The main routines for computing shortest paths and centrality measures are introduced using progressively more complex datasets.
The second paper presents a dynamic zero-inflated Poisson model for estimating two risk indices for the road accidents that occurred on the Milan network from 2015 to 2017. The elementary statistical unit is the individual road segment, while the response variable measures the number of accidents that occurred in each of the three years. The model employs a set of demographic and structural covariates extracted from Open Street Map and from the 2011 Italian Census. The third paper introduces a multivariate Bayesian hierarchical model for estimating road risk through a network-lattice approach. It focuses on the road network of the city of Leeds and on two different types of accidents. The spatial component is modelled through a Multivariate CAR random effect, while the residual correlations are captured through an unstructured random effect. A new methodology for the analysis of the MAUP on network-lattice data is also developed. Finally, the fourth article presents a set of preliminary results on the spatio-temporal analysis of point patterns on networks using non-homogeneous Poisson processes. In particular, it analyses the distribution of ambulance interventions in the municipality of Milan between 2015 and 2017, developing a latent factor model for the temporal component and a non-parametric kernel estimator for the spatial intensity, adapted to the case of network data. The thesis also includes three appendices. The first summarises the basic features of the INLA software and methodology, the second presents the additional materials related to the fourth chapter, and the third introduces an R package named osmextract, used to manipulate Open Street Map data. The fifth chapter concludes the thesis, summarising the main results and introducing some future developments.

Gilardi, A. (2021). Statistical Models and Data Structures for Spatial Data on Road Networks. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021).

Statistical Models and Data Structures for Spatial Data on Road Networks

GILARDI, ANDREA
2021

BORGONI, RICCARDO
Bayesian Models; Network Lattice; Point Pattern; Spatial Statistics; Road Networks
SECS-S/01 - STATISTICA
English
29-apr-2021
STATISTICA E FINANZA MATEMATICA
33
2019/2020
open
Files in this item:
phd_unimib_762781.pdf
Description: Thesis of Gilardi Andrea - 762781
Attachment type: Doctoral thesis
Access: open access
Size: 15.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/314016