Todeschini, R., Consonni, V., Maiocchi, A. (1999). The K correlation index: theory development and its applications in chemometrics. Chemometrics and Intelligent Laboratory Systems, 46(1), 13-29. doi:10.1016/S0169-7439(98)00124-5.
The K correlation index: theory development and its applications in chemometrics
TODESCHINI, ROBERTO; CONSONNI, VIVIANA
1999
Abstract
A previous paper introduced a new correlation index, K, and tested it as a measure of the correlation content of a set of multivariate data. This paper presents an extension of the K index theory together with some applications in several fields where chemometrics is commonly encountered. Starting from a correlation measurement evaluated by the K index theory, it becomes possible (a) to calculate the information content within a set of multivariate data, (b) to estimate the entropy of the data set, and (c) to reduce the number of variables while preserving the correlation structure of the original data. Moreover, (d) the effect of common scaling procedures on the structure of the original data can be measured; (e) the minimum number of cross-validation groups can be estimated without losing relevant but not predictable information; and finally (f) the best subset models in regression analysis can be searched for, excluding models without predictive power.
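The abstract does not restate how K itself is computed. As a minimal illustrative sketch, and assuming the definition from the earlier paper as we recall it (the sum of absolute deviations of each eigenvalue fraction of the correlation matrix from 1/p, divided by its maximum value 2(p-1)/p), the index could be evaluated as follows. The function names `k_index` and `k_index_two_variables` are hypothetical and are not taken from the paper.

```python
def k_index(eigenvalues):
    """Sketch of a K-type correlation index (assumed definition, see lead-in).

    `eigenvalues` are the eigenvalues of the correlation matrix of p variables.
    """
    p = len(eigenvalues)
    if p < 2:
        raise ValueError("at least two variables are needed")
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    # Deviation of each eigenvalue fraction from the uncorrelated case (1/p).
    deviation = sum(abs(ev / total - 1.0 / p) for ev in eigenvalues)
    # Normalise by the largest possible deviation, 2(p - 1)/p.
    return deviation / (2.0 * (p - 1) / p)


def k_index_two_variables(r):
    """K for two variables with correlation coefficient r.

    The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r.
    """
    return k_index([1.0 + r, 1.0 - r])
```

Under this assumed definition, K is 0 when all eigenvalues are equal (no correlation), 1 when a single eigenvalue carries all the variance (complete correlation), and reduces to |r| for two variables.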