Using crossover based similarity measure to improve genetic programming generalization ability

VANNESCHI, LEONARDO
2009

Abstract

Generalization is a central issue in Machine Learning. In this paper, we present a new idea for improving the generalization ability of Genetic Programming. The idea is based on a dynamic two-layer selection algorithm, and it is tested on a real-life drug discovery regression application. The algorithm begins by using root mean squared error as fitness and the usual tournament selection. A list of individuals called "repulsors" is also kept in memory and is initially empty. Whenever an individual is found to overfit the training set, it is inserted into the list of repulsors. When the list of repulsors is not empty, selection becomes a two-layer algorithm: the individuals participating in the tournament are not chosen at random from the population but are themselves selected, using the average dissimilarity to the repulsors as a criterion to be maximized. Two kinds of similarity/dissimilarity measures are tested for this purpose: the well-known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea seems to improve the generalization ability of Genetic Programming, and the presented experimental results show that Genetic Programming generalizes better when the subtree crossover based similarity measure is used, at least for the test problems studied in this paper. Copyright 2009 ACM.
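
The selection scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how tournament participants are picked by dissimilarity, so the sketch assumes a small random pool from which the member with the highest average dissimilarity to the repulsors is kept. The names `two_layer_select`, `avg_dissimilarity`, `pool_size`, and `tournament_size` are hypothetical. The `distance` argument stands in for either measure mentioned above (edit distance or the subtree crossover based measure), neither of which is implemented here, and the overfitting test that fills the repulsor list is left out because the abstract does not specify it.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def rmse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def avg_dissimilarity(ind: T, repulsors: Sequence[T],
                      distance: Callable[[T, T], float]) -> float:
    """Mean distance from `ind` to every repulsor (list must be non-empty)."""
    return sum(distance(ind, r) for r in repulsors) / len(repulsors)


def two_layer_select(population: Sequence[T],
                     fitness: Callable[[T], float],
                     repulsors: Sequence[T],
                     distance: Callable[[T, T], float],
                     tournament_size: int = 4,
                     pool_size: int = 4,
                     rng: Optional[random.Random] = None) -> T:
    """Return one parent; `fitness` is an error (e.g. RMSE), lower is better."""
    rng = rng or random.Random()
    contestants: List[T] = []
    for _ in range(tournament_size):
        if repulsors:
            # Layer 1 (assumed form): sample a small pool and keep the
            # member that is, on average, most dissimilar to the repulsors.
            pool = [rng.choice(population) for _ in range(pool_size)]
            contestants.append(
                max(pool, key=lambda c: avg_dissimilarity(c, repulsors, distance))
            )
        else:
            # Empty repulsor list: plain random sampling, as in the
            # usual tournament selection.
            contestants.append(rng.choice(population))
    # Layer 2: ordinary tournament, the contestant with the lowest error wins.
    return min(contestants, key=fitness)


# Toy demo: plain numbers stand in for GP trees.
population = [0.0, 1.0, 2.0, 10.0]
repulsors = [0.0]  # pretend 0.0 was flagged as overfitting
winner = two_layer_select(
    population,
    fitness=lambda x: abs(x - 3.0),
    repulsors=repulsors,
    distance=lambda a, b: abs(a - b),
    rng=random.Random(0),
)
print(winner)
```

With an empty `repulsors` list the function reduces to the usual tournament selection, which matches the starting behaviour described in the abstract. Once repulsors exist, the first layer biases the tournament toward individuals far from them before fitness is compared.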
Type: paper
Keywords: crossover, based, similarity, measure, improve, genetic, programming, generalization, ability
Language: English
Conference: 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009 (2009)
Proceedings: Proceedings of the 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009
ISBN: 9781605583259
Year: 2009
Pages: 1139-1146
Gustafson, S., Vanneschi, L. (2009). Using crossover based similarity measure to improve genetic programming generalization ability. In Proceedings of the 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009 (pp. 1139-1146). New York: ACM [10.1145/1569901.1570054].

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/16082
Citations
  • Scopus 15