Regression models with good fitting but no predictive ability are sometimes chance correlations and often show some pathological features such as multicollinearity, overfitting, and inclusion of noisy/spurious variables. This problem is well known and of the utmost importance. The present paper proposes some criteria that are to be fulfilled as conditions for model acceptability, the aim being to recognize linear regression models with pathology. These criteria have been thought of in order to face the following problems: model instability due to outliers and influential objects; predictor multicollinearity; redundancy in explanatory variables; overfitting due to chance factors. A multicriteria fitness function based on the maximization of the Q² statistics under a set of tests is proposed here. This new fitness function can also be used in model searching by variable selection approaches in order to obtain a final optimal population of models. Computations on the Selwood data set are reported to illustrate the use of this multicriteria fitness function in model searching. © 2003 Elsevier B.V. All rights reserved.
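
As a rough illustration of what "maximizing Q² under a set of tests" can look like in practice, the following Python sketch computes the leave-one-out Q² of an ordinary least squares model and returns it as the fitness only if every acceptance test passes. It is a minimal sketch, not the authors' implementation: the function names `q2_loo`, `fitness`, and `low_collinearity`, the 0.95 correlation threshold, and the choice of a pairwise-correlation check are all assumptions made for this example and are not the criteria defined in the paper.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS for OLS with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])       # design matrix
    # Diagonal of the hat matrix H = A @ pinv(A) (leverages h_ii).
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    # LOO residual of object i equals e_i / (1 - h_ii) for OLS.
    press = np.sum((resid / (1.0 - h)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss

def low_collinearity(X, y, threshold=0.95):
    """Hypothetical test: reject if any predictor pair is too correlated."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] < 2:
        return True
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)                     # ignore self-correlation
    return bool(corr.max() <= threshold)

def fitness(X, y, tests):
    """Q^2 if all tests accept the model, otherwise -inf (model rejected)."""
    if not all(test(X, y) for test in tests):
        return float("-inf")
    return q2_loo(X, y)
```

In a variable selection search, each candidate subset of predictors would be scored with `fitness(X[:, subset], y, tests)`, so that models failing any test can never outrank models that pass.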

Todeschini, R., Consonni, V., Mauri, A., Pavan, M. (2004). Detecting "bad" regression models: multicriteria fitness functions in regression analysis. ANALYTICA CHIMICA ACTA, 515(1), 199-208 [10.1016/j.aca.2003.12.010].

Detecting "bad" regression models: multicriteria fitness functions in regression analysis

TODESCHINI, ROBERTO; CONSONNI, VIVIANA
2004

Journal article - Scientific article
regression methods, genetic algorithms, variable subset selection
English
Todeschini, R; Consonni, V; Mauri, A; Pavan, M

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/4524
Citations
  • Scopus 164
  • ISI 158