Misspecification Resistant Model Selection Using Information Complexity With Applications

LIBERATI, CATERINA
2013

Abstract

In this paper, we address two issues that have long plagued researchers in statistical modeling and data mining. The first is well known as the “curse of dimensionality”. Very large datasets are becoming increasingly common, as more and more quantities are measured as frequently as possible. Statistical analysis techniques developed even 50 years ago can founder in this volume of data. The second issue is model misspecification, specifically an incorrectly assumed functional form. We address both issues in the context of multivariate regression modeling. To drive dimension reduction and model selection, we use the newly developed form of Bozdogan’s ICOMP, introduced in Bozdogan and Howe (Misspecification resistant multivariate regression models using the genetic algorithm and information complexity as the fitness function, Technical report 1, (2012)), which penalizes models with a complexity measure of the “sandwich” model covariance matrix. The genetic algorithm uses this information criterion as its objective function in a two-step hybrid dimension reduction process. First, we use probabilistic principal component analysis to independently reduce the number of response and predictor variables. Then, we use the genetic algorithm with the multivariate Gaussian regression model to identify the best subset regression model. We apply these methods to identify a substantially reduced multivariate regression relationship for a dataset on Italian high school students: the 29 response variables are reduced to 4, and the 46 regressors to 1.
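The subset-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain AIC-style score as a stand-in for the misspecification-resistant ICOMP, whose exact form is not given here, and the names `neg2_loglik`, `score`, and `ga_select`, as well as all tuning parameters, are hypothetical.

```python
import numpy as np

def neg2_loglik(X, Y):
    """-2 * maximized log-likelihood of a multivariate Gaussian regression Y = XB + E."""
    n, p = Y.shape
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least-squares coefficient matrix
    E = Y - X @ B                               # residuals
    sigma = E.T @ E / n                         # ML estimate of the error covariance
    _, logdet = np.linalg.slogdet(sigma)
    return n * p * np.log(2 * np.pi) + n * logdet + n * p

def score(mask, X, Y):
    """AIC-style fitness (ASSUMPTION: stand-in for ICOMP); lower is better."""
    n, p = Y.shape
    Xs = np.column_stack([np.ones(n), X[:, mask]])   # intercept is always included
    n_params = Xs.shape[1] * p + p * (p + 1) / 2     # coefficients + covariance terms
    return neg2_loglik(Xs, Y) + 2 * n_params

def ga_select(X, Y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Toy genetic algorithm over boolean predictor masks (needs >= 2 predictors)."""
    rng = np.random.default_rng(seed)
    q = X.shape[1]
    pop = rng.random((pop_size, q)) < 0.5            # random initial subsets
    for _ in range(generations):
        fit = np.array([score(m, X, Y) for m in pop])
        best = pop[fit.argmin()].copy()
        # Binary tournament selection: the better of two random individuals survives.
        a, b = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((fit[a] < fit[b])[:, None], pop[a], pop[b])
        # One-point crossover between consecutive pairs of parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, q)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation, then elitism: carry the best mask over unchanged.
        children ^= rng.random(children.shape) < p_mut
        children[0] = best
        pop = children
    fit = np.array([score(m, X, Y) for m in pop])
    return pop[fit.argmin()], fit.min()
```

Calling `ga_select(X, Y)` with an `n × q` predictor matrix and an `n × p` response matrix returns a boolean mask of the selected predictors together with its score. Swapping a different information criterion into `score` leaves the search loop unchanged.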
Type: Book chapter or essay
Keywords: Robust Regression, Misspecification, Complexity, Dimension Reduction, Genetic Algorithms
Language: English
Book title: Classification and Data Mining
Editors: Giusti, Antonio; Ritter, Gunter; Vichi, Maurizio
Year: 2013
ISBN: 978-3-642-28893-7
Publisher: Springer-Verlag
First page: 165
Last page: 172
Bozdogan, H., Howe, J., Katragadda, S., Liberati, C. (2013). Misspecification Resistant Model Selection Using Information Complexity With Applications. In A. Giusti, G. Ritter, M. Vichi (Eds.), Classification and Data Mining (pp. 165-172). Berlin Heidelberg: Springer-Verlag [10.1007/978-3-642-28894-4_20].

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/18872