Generalization is a central issue in Machine Learning. In this paper, we present a new idea for improving the generalization ability of Genetic Programming. The idea is based on a dynamic two-layer selection algorithm, and it is tested on a real-life drug discovery regression application. The algorithm begins by using root mean squared error as fitness, together with the usual tournament selection. A list of individuals called "repulsors" is also kept in memory and is initially empty. When an individual is found to overfit the training set, it is inserted into the list of repulsors. When the list of repulsors is not empty, selection becomes a two-layer algorithm: the individuals participating in the tournament are not chosen at random from the population but are themselves selected, using the average dissimilarity to the repulsors as a criterion to be maximized. Two kinds of similarity/dissimilarity measures are tested for this purpose: the well-known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea seems to improve the generalization ability of Genetic Programming, and the experimental results show that Genetic Programming generalizes better when the subtree crossover based similarity measure is used, at least for the test problems studied in this paper. Copyright 2009 ACM.
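The two-layer selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the first layer picks tournament participants, so the sketch assumes each participant is the most dissimilar individual in a small random pool. The names `two_layer_tournament`, `avg_dissimilarity`, and `pool_size` are hypothetical, and `distance` stands in for either of the two measures (edit distance or the subtree crossover based measure). How overfitting is detected is also not specified here, so the repulsor list is simply passed in.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # type of an individual (e.g. a GP tree)


def avg_dissimilarity(ind: T, repulsors: Sequence[T],
                      distance: Callable[[T, T], float]) -> float:
    """Average distance from one individual to every repulsor."""
    return sum(distance(ind, r) for r in repulsors) / len(repulsors)


def two_layer_tournament(population: List[T],
                         fitness: Callable[[T], float],
                         repulsors: Sequence[T],
                         distance: Callable[[T, T], float],
                         tournament_size: int,
                         pool_size: int,
                         rng: random.Random) -> T:
    """Return one selected parent; lower fitness (e.g. RMSE) is better."""
    if not repulsors:
        # Empty repulsor list: plain tournament over random individuals.
        contestants = rng.sample(population, tournament_size)
    else:
        # First layer (assumed form): each contestant is the individual
        # in a random pool that maximizes average dissimilarity to the
        # repulsors.
        contestants = []
        for _ in range(tournament_size):
            pool = rng.sample(population, pool_size)
            contestants.append(
                max(pool,
                    key=lambda ind: avg_dissimilarity(ind, repulsors, distance))
            )
    # Second layer: the usual tournament on the error-based fitness.
    return min(contestants, key=fitness)
```

With integers standing in for individuals and absolute difference as a toy distance, a call such as `two_layer_tournament([0, 1, 2, 10], float, [0], lambda a, b: abs(a - b), 2, 4, random.Random(0))` selects the individual farthest from the repulsor, because every pool spans the whole population.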
Gustafson, S., Vanneschi, L. (2009). Using crossover based similarity measure to improve genetic programming generalization ability. In Proceedings of the 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009 (pp. 1139-1146). New York: ACM [10.1145/1569901.1570054].
Using crossover based similarity measure to improve genetic programming generalization ability
VANNESCHI, LEONARDO
2009