This thesis is about the Forward Search, an approach to robust data analysis proposed and developed over the last 15 years, mainly by Atkinson, Riani and Cerioli. It is a general method for detecting unidentified subsets and masked outliers in complex data and for determining their effect on models fitted to the data. The thesis approaches the Forward Search in the regression context from different perspectives, motivated by issues encountered in concrete applications related to the analysis of international trade data. The main contributions can be summarised as follows. Firstly, the thesis investigates how the Forward Search achieves its nominal size and how it deals with the multiple testing problem. Secondly, the Forward Search algorithm is relaxed so that outliers can be identified at arbitrary significance levels, rather than only at the standard 1% level inherent to the method. Thirdly, the thesis reports the results of a rigorous and extensive assessment of the actual size and power of the Forward Search in comparison with the current reference methods in robust regression (LMS and LTS). The results provide empirical evidence that the Forward Search can achieve high power and small size at the same time. Fourthly, the Forward Search is extended to identify and validate homogeneous sub-populations in the data which, in regression, manifest themselves as mixtures of linear components. Finally, the Forward Search is considered from the perspective of exploratory data analysis, with a discussion of new dynamic and interactive graphical tools aimed at extracting information from the numerous plots it produces.
Torti, F. (2010). Advances in the forward search: methodological and applied contributions. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2010).
Advances in the forward search: methodological and applied contributions
TORTI, FRANCESCA
2010
File | Size | Format |
---|---|---|
phd_unimib_707887.pdf (open access; attachment type: doctoral thesis) | 9.5 MB | Adobe PDF |