Consider a two-way contingency table displaying the joint distribution of two categorical variables X and Y. A frequent need in the analysis of such data is to group and collapse some rows of the table so that the collapsed table reproduces the association between X and Y as closely as possible. Solving this problem exactly requires computing a measure of association (for instance the likelihood-ratio statistic G²) for every possible grouping of the rows of the original table. Unfortunately, this is hardly feasible when the number of categories to be grouped is large, so approximate solutions must be sought (see Siatkowski and Krajewsky, 1989). We propose a new method, based on the logic of genetic algorithms, to determine such approximations. The proposed algorithm can be applied whether the number of groups in the collapsed table is fixed or variable. The performance of the algorithm is tested on a dataset of rose varieties, and the results are compared with those obtained by Siatkowski and Krajewsky (1989).
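To make the problem concrete, the following is a minimal sketch of the exhaustive search that the abstract describes as impractical for large tables. It is not the authors' genetic algorithm. The function names (`g2`, `collapse`, `best_grouping`), the choice to maximise G² of the collapsed table, and the toy counts are assumptions made for illustration; the toy table is not the rose dataset.

```python
import math
from itertools import product


def g2(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)) for a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # empty cells contribute nothing to the sum
                stat += obs * math.log(obs * n / (row_tot[i] * col_tot[j]))
    return 2.0 * stat


def collapse(table, labels, k):
    """Sum the rows of `table` into k groups; labels[i] is the group of row i."""
    out = [[0] * len(table[0]) for _ in range(k)]
    for row, group in zip(table, labels):
        for j, value in enumerate(row):
            out[group][j] += value
    return out


def best_grouping(table, k):
    """Try every assignment of rows to k non-empty groups (assumed criterion: max G^2)."""
    best = None
    for labels in product(range(k), repeat=len(table)):
        if len(set(labels)) < k:  # skip assignments that leave a group empty
            continue
        score = g2(collapse(table, labels, k))
        if best is None or score > best[0]:
            best = (score, labels)
    return best


# Hypothetical toy counts: rows 0-1 share one profile, rows 2-3 another.
toy = [[10, 20], [20, 40], [30, 10], [60, 20]]
score, labels = best_grouping(toy, 2)
```

The loop visits k to the power r label vectors for r rows, which is why exhaustive evaluation stops being practical as the number of categories grows and why a heuristic search is attractive.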
Borroni, C., Piccarreta, R. (2000). Genetic Algorithms for Optimal Grouping of Categories in Two-way Contingency Tables. STATISTICA APPLICATA, 12(4), 435-444.
Genetic Algorithms for Optimal Grouping of Categories in Two-way Contingency Tables
BORRONI, CLAUDIO GIOVANNI
2000