Solaro, N., Barbiero, A., Manzi, G., Ferrari, P. (2015). A comprehensive simulation study on the Forward Imputation [Department working paper].

A comprehensive simulation study on the Forward Imputation

SOLARO, NADIA (first author); 2015

Abstract

The Nearest Neighbour Imputation (NNI) method has a long history in missing data imputation. Likewise, multivariate dimension reduction techniques allow for preserving the maximum information from the data. Recently, the combined use of these methodologies has been proposed to solve data imputation problems and to exploit as much information as possible from the complete part of the data. In this paper we perform an extensive simulation study to test the performance of this new imputation approach, called “Forward Imputation” (ForImp). We compare the two ForImp methods developed for missing quantitative data with two other imputation techniques regarded as benchmarks. The first ForImp method, ForImpPCA, combines the NNI method with Principal Component Analysis (PCA) as the multivariate data analysis technique; the second, ForImpMahalanobis, uses the Mahalanobis distance for NNI. The benchmarks are Stekhoven and Buehlmann’s missForest method, a nonparametric imputation technique for continuous and/or categorical data based on a random forest, and the Iterative PCA, an algorithmic-type technique that imputes missing values simultaneously by an iterative use of PCA. The simulation study is based on simulated data with different levels of kurtosis or skewness and different strengths of linear relationship among the variables, so that the performance of the four methods can be compared on various data patterns. The distributions used for these simulated data belong to the families of Multivariate Exponential Power and Multivariate Skew-Normal distributions, respectively. Results tend to favour ForImpMahalanobis, especially in the presence of skewed data with small or negative correlations of the same magnitude, or with a mix of negative and positive correlations of low level, whereas ForImpPCA performs better when a slightly higher level of correlation is present in the data.
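To make the idea of nearest-neighbour imputation with the Mahalanobis distance concrete, here is a minimal Python sketch. It is not the ForImpMahalanobis algorithm described in the paper: the function name `nn_impute_mahalanobis`, the single-donor rule, and the estimation of the covariance matrix from the complete rows only are all assumptions made for illustration.

```python
import numpy as np

def nn_impute_mahalanobis(X):
    """Fill the NaNs of each incomplete row from its nearest complete row (the donor).

    Illustrative sketch only, not the authors' ForImp procedure.
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    # Assumption: the covariance matrix is estimated from the complete rows only.
    cov = np.cov(donors, rowvar=False)
    out = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        if not obs.any():
            continue  # nothing observed, so no distance can be computed
        # Restrict the covariance to the variables observed in row i.
        S_inv = np.linalg.pinv(cov[np.ix_(obs, obs)])
        diff = donors[:, obs] - X[i, obs]
        # Squared Mahalanobis distance from row i to every donor.
        d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
        donor = donors[np.argmin(d2)]
        out[i, ~obs] = donor[~obs]
    return out

# Usage on hypothetical data: 50 correlated trivariate normal rows, two values removed.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(
    [0, 0, 0], [[1, 0.5, 0.3], [0.5, 1, 0.4], [0.3, 0.4, 1]], size=50
)
X[0, 1] = np.nan
X[3, 2] = np.nan
X_imp = nn_impute_mahalanobis(X)
```

Each incomplete row takes its missing values from the complete row that is closest on the variables the two rows share. Observed values are never modified.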
Department working paper
Working Paper no. 2015-04, February 2015, Dipartimento di Economia, Management e Metodi Quantitativi, Università degli Studi di Milano
Keywords: Correlation, Data patterns, Kurtosis, Mahalanobis distance, MissForest, Nearest Neighbour Imputation, Principal Component Analysis, Skewness
Language: English
Year: 2015
Working paper number: 2015-04
Pages: 1-28
https://ideas.repec.org/p/mil/wpdepa/2015-04.html


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/80405