Gustafson, S., Vanneschi, L. (2009). Using crossover based similarity measure to improve genetic programming generalization ability. In Proceedings of the 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009 (pp. 1139-1146). New York: ACM [10.1145/1569901.1570054].

Using crossover based similarity measure to improve genetic programming generalization ability

Gustafson, S.; Vanneschi, Leonardo

2009

Abstract

Generalization is a central issue in Machine Learning. In this paper, we present a new idea for improving the generalization ability of Genetic Programming. The idea is based on a dynamic two-layered selection algorithm, and it is tested on a real-life drug discovery regression application. The algorithm begins by using root mean squared error as fitness together with the usual tournament selection. A list of individuals called "repulsors" is also kept in memory and is initially empty. When an individual is found to overfit the training set, it is inserted into the list of repulsors. When the list of repulsors is not empty, selection becomes a two-layer algorithm: the individuals participating in the tournament are not chosen at random from the population but are themselves selected, using the average dissimilarity to the repulsors as a criterion to be maximized. Two kinds of similarity/dissimilarity measures are tested for this purpose: the well-known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea seems to improve the generalization ability of Genetic Programming, and the experimental results show that Genetic Programming generalizes better when the subtree crossover based similarity measure is used, at least for the test problems studied in this paper. Copyright 2009 ACM.
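The selection scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the first layer picks tournament participants, so the sketch assumes a small random pool per tournament slot from which the individual with the largest average dissimilarity to the repulsors is kept. The names `rmse`, `record_repulsor`, `two_layer_select`, `pool_size`, and the caller-supplied `overfits` and `dissimilarity` functions are all hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def rmse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error, used as the fitness to be minimized."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def record_repulsor(individual: T, repulsors: List[T],
                    overfits: Callable[[T], bool]) -> None:
    """Add an individual to the repulsor list if it is judged to overfit.

    The overfitting test itself is not given in the abstract, so it is
    passed in by the caller (assumption).
    """
    if overfits(individual):
        repulsors.append(individual)


def two_layer_select(population: Sequence[T],
                     fitness: Callable[[T], float],
                     repulsors: Sequence[T],
                     dissimilarity: Callable[[T, T], float],
                     tournament_size: int = 4,
                     pool_size: int = 4,
                     rng: Optional[random.Random] = None) -> T:
    """Pick one parent; lower fitness (error) is better."""
    rng = rng or random.Random()

    if not repulsors:
        # No repulsors yet: participants are drawn at random, as in the
        # usual tournament selection.
        contestants = rng.sample(population, tournament_size)
    else:
        def avg_dissimilarity(ind: T) -> float:
            return sum(dissimilarity(ind, r) for r in repulsors) / len(repulsors)

        # First layer (assumed form): each tournament slot is filled by the
        # member of a random pool that is, on average, farthest from the
        # repulsors.
        contestants = [
            max(rng.sample(population, pool_size), key=avg_dissimilarity)
            for _ in range(tournament_size)
        ]

    # Second layer: ordinary tournament on fitness among the participants.
    return min(contestants, key=fitness)
```

As a toy usage, individuals could be plain integers with `dissimilarity = lambda a, b: abs(a - b)`. In a Genetic Programming setting the individuals would be program trees, and `dissimilarity` would be one of the two measures named in the abstract, whose definitions are not reproduced in this record.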

Contribution type: paper
Keywords: crossover, based, similarity, measure, improve, genetic, programming, generalization, ability
Content language: English
Conference name: 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009
Conference year: 2009
Proceedings title: Proceedings of the 11th Annual Genetic and Evolutionary Computation Conference, GECCO-2009
Proceedings ISBN: 9781605583259
Publication date: 2009
First page: 1139
Last page: 1146
DOI: https://dx.doi.org/10.1145/1569901.1570054
Full text: none
Appears in types: 02 - Conference contribution

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/16082
