The similarity/diversity of objects has been widely studied in different research fields, and a number of distance measures for estimating the diversity between objects have been proposed. However, little attention has been paid to analysing how diverse the configurations of objects are in two different multivariate spaces. Because computerisation and automation now make large amounts of information available, a system can be described in different ways and, consequently, methods for comparing these different viewpoints are required. Such methods may be usefully applied, for instance, to Quantitative Structure-Activity Relationship (QSAR) studies. In this field, several thousand molecular descriptors have been proposed in the literature, and different selections of descriptors define different chemical spaces that need to be compared. Moreover, variable selection techniques such as Genetic Algorithms, Simulated Annealing, and Tabu Search are widely used to process the available information and select optimal QSAR models. When more than one optimal model is found, the problem is how to compare these models to determine whether they are really diverse or are based on descriptors that explain almost the same information. In this paper, novel indices are proposed to measure the similarity/diversity between pairs of data sets by means of the variable cross-correlation matrix. © 2009 Elsevier B.V. All rights reserved.

Todeschini, R., Consonni, V., Manganaro, A., Ballabio, D., Mauri, A. (2009). Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 1. Theory and simple chemometric applications. Analytica Chimica Acta, 648, 45-51 [10.1016/j.aca.2009.06.032].

Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 1. Theory and simple chemometric applications

Todeschini, Roberto; Consonni, Viviana; Ballabio, Davide
2009

Type: Journal article - Scientific article
Keywords: similarity/diversity; Hamming distance; correlation; distance measure
Language: English
Pages: 45-51

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/6941