Todeschini, R., Ballabio, D., Consonni, V., Sahigara, F., Filzmoser, P. (2013). Locally centred Mahalanobis distance: a new distance measure with salient features towards outlier detection. Analytica Chimica Acta, 787, 1-9. doi:10.1016/j.aca.2013.04.034

Locally centred Mahalanobis distance: a new distance measure with salient features towards outlier detection

Todeschini, Roberto; Ballabio, Davide; Consonni, Viviana; Sahigara, Faizan Abdulrazak; Filzmoser, Peter
2013

Abstract

Outlier detection is a prerequisite for identifying aberrant samples in a data set. Identifying such atypical samples is particularly important in multivariate data analysis. There, increasing data dimensionality can easily hinder data exploration, and outliers often go undetected. This paper introduces a novel Mahalanobis distance measure (strictly, a pseudo-distance) termed the locally centred Mahalanobis distance. It is derived by centring the covariance matrix at each data sample rather than at the data centroid, as is done for the classical covariance matrix. Two parameters, called Remoteness and Isolation degree, were derived from the resulting pairwise distance matrix. Their salient features allowed a better identification of atypical samples isolated from the rest of the data, which indicates their potential for outlier detection. The Isolation degree proved able to detect a new kind of outlier, namely samples isolated within the data domain. It is therefore a useful diagnostic tool for evaluating the reliability of predictions obtained by local models (e.g. k-NN models). To better understand the role of Remoteness and Isolation degree in identifying such aberrant samples, several simulated and published data sets from the literature were considered as case studies. The results were compared with those obtained using the Euclidean distance and the classical Mahalanobis distance.
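The abstract describes the distance only in words, so the following is a minimal sketch under one assumed reading, not the paper's exact formulation. For each reference sample x_j, a covariance matrix is built from the deviations x_i - x_j (divided by n) instead of x_i - x̄. D[i, j] is then the Mahalanobis-type distance of x_i from x_j under that matrix. Expanding the outer products shows that this locally centred matrix equals the classical (biased) covariance matrix plus the rank-one term (x̄ - x_j)(x̄ - x_j)^T. As a result, a different matrix is used for each reference sample, and D is in general not symmetric. This is consistent with the measure being described as a pseudo-distance. Several choices are illustrative assumptions rather than formulas taken from the paper:

- Remoteness is taken as the mean of row i, i.e. the mean distance from all other samples.
- Isolation degree is taken as the minimum of row i, i.e. the distance from the nearest neighbour.
- The covariance uses the n denominator.
- The helper names `locally_centred_mahalanobis`, `remoteness` and `isolation_degree` are hypothetical.

```python
import numpy as np


def locally_centred_mahalanobis(X):
    """Pairwise matrix D with D[i, j] = distance of sample i from sample j,
    using a covariance matrix centred on sample j (assumed convention)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for j in range(n):
        diff = X - X[j]                # deviations of every sample from x_j
        S_j = diff.T @ diff / n        # covariance centred on x_j, not on the centroid
        S_inv = np.linalg.pinv(S_j)    # pseudo-inverse in case S_j is singular
        # quadratic forms (x_i - x_j)^T S_j^{-1} (x_i - x_j) for every i at once
        q = np.einsum("ik,kl,il->i", diff, S_inv, diff)
        D[:, j] = np.sqrt(np.clip(q, 0.0, None))  # clip tiny negative round-off
    return D


def remoteness(D):
    """Hypothetical definition: mean distance of each sample from all the others."""
    n = D.shape[0]
    return D.sum(axis=1) / (n - 1)     # the diagonal is zero, so it adds nothing


def isolation_degree(D):
    """Hypothetical definition: distance of each sample from its nearest neighbour."""
    off_diag = D.copy()
    np.fill_diagonal(off_diag, np.inf) # ignore self-distances
    return off_diag.min(axis=1)


# Usage: a 3 x 3 grid of samples plus one far-away sample (index 9)
grid = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
D = locally_centred_mahalanobis(X)
print(np.argmax(remoteness(D)), np.argmax(isolation_degree(D)))  # prints: 9 9
```

The sketch uses `np.linalg.pinv` instead of `np.linalg.inv`, so rank-deficient data (for example, more variables than samples) do not raise an error. The abstract does not say how the original method handles singular covariance matrices.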
Type: Journal article - Scientific article
Keywords: Mahalanobis distance, Outlier detection, Similarity, Isolation degree, Remoteness, Covariance matrix, Data mining
Language: English
Year: 2013
Volume: 787
First page: 1
Last page: 9
Access: reserved

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/45149
Citations
  • Scopus 61
  • ISI Web of Science 50