Coevolutionary Gradient Algorithms and their Application to Othello

Marcin Szubert, Krzysztof Krawiec

Institute of Computing Science, Poznan University of Technology

May 17, 2011


Outline

1 Starting Point – Previous Research
    Coevolution and Reinforcement Learning
    Coevolutionary Temporal Difference Learning

2 Flexible Learner Architecture
    Topology and Weight Evolving ANNs
    N-tuple Networks

3 Coevolutionary Gradient Algorithms

4 Experimental Results

5 Summary






Coevolutionary Algorithms


Bio-inspired methods that attempt to harness Darwinian notions of heredity and survival of the fittest but, in contrast to traditional evolutionary algorithms, do not attempt to objectively measure the fitness of individuals. Instead, individuals are compared on the basis of their outcomes from interactions with other individuals.

Natural evolution is coevolution, where the fitness of an individual is defined with respect to its competitors and collaborators, as well as to the environment.

Simon M. Lucas




The outcome of evaluating an individual in a coevolutionary algorithm depends upon the context of whom the individual interacts with. This context sensitivity is characteristic of coevolutionary systems and responsible for the complex dynamics for which coevolution is (in)famous.

Sevan G. Ficici

Single-population coevolutionary algorithm

Algorithm 1: Basic scheme of a generational evolutionary algorithm

P ← createRandomPopulation()
A ← initializeArchive()
evaluatePopulation(A, P)
while ¬terminationCondition() do
    S ← selectParents(P)
    P ← recombineAndMutate(S)
    evaluatePopulation(A, P)
    updateArchive(A, P)
end while
return getFittestIndividual(A, P)

The family of EA is composed of a few methods that differ slightly in technical details, but all match the basic scheme presented in Algorithm 1. The most important difference between these methods concerns the so-called representation, which defines a mapping from phenotypes onto a set of genotypes and specifies what data structures are employed in this encoding. Phenotypes are objects forming solutions to the original problem, i.e., points of the problem space of possible solutions. Genotypes, on the other hand, are used to denote points in the evolutionary search space which are subject to genetic operations. The process of genotype-phenotype decoding is intended to model the natural phenomenon of embryogenesis. A more detailed description of these terms can be found in [Weise 09].

Returning to different dialects of EA, candidate solutions are typically represented by strings over a finite (usually binary) alphabet in Genetic Algorithms (GA) [Holland 62], real-valued vectors in Evolution Strategies (ES) [Rechenberg 73], finite state machines in classical Evolutionary Programming (EP) [Fogel 95] and trees in Genetic Programming (GP) [Koza 92]. A certain representation might be preferable if it makes encoding solutions to a given problem more natural. Obviously, the genetic operations of recombination and mutation must be adapted to the chosen representation. For example, crossover in GP is usually based on exchanging subtrees between combined individuals.

The most significant advantage of EA lies in their flexibility and adaptability to the given task. This may be explained by their metaheuristic "black box" character, which makes only few assumptions about the underlying objective function that is the subject of optimization. EA are claimed to be robust problem solvers showing roughly good performance over a wide range of problems, as reported by Goldberg [Goldberg 89]. In particular, the combination of EA with problem-specific heuristics, including local-search-based techniques, often makes highly efficient optimization algorithms possible for many areas of application. Such hybridization of EA is getting popular due to their capabilities in handling real-world problems involving noisy environments, imprecision or uncertainty. The latest state-of-the-art methodologies in Hybrid Evolutionary Algorithms are reviewed in [Grosan 07].

Procedure evaluatePopulation(A, P)

E ← selectEvaluators(A, P)
performInteractions(P, E)
aggregateInteractionOutcomes(P, E)
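The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: individuals are plain numbers, the toy interaction `play` (the larger number wins) is hypothetical, selection is simple truncation, and the archive just stores the best individual of each generation.

```python
import random

def play(a, b):
    """Toy interaction (assumption): larger number wins. 1 = win, 0.5 = draw, 0 = loss."""
    return 1.0 if a > b else 0.5 if a == b else 0.0

def evaluate_population(archive, population, rng, n_archive_opponents=2):
    """Fitness = summed outcomes against the other population members
    plus a few randomly chosen archive members (the evaluators)."""
    archive_evaluators = rng.sample(archive, min(n_archive_opponents, len(archive)))
    fitness = []
    for i, individual in enumerate(population):
        opponents = [p for j, p in enumerate(population) if j != i] + archive_evaluators
        fitness.append(sum(play(individual, opp) for opp in opponents))
    return fitness

def coevolve(generations=30, pop_size=10, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 1.0) for _ in range(pop_size)]
    archive = []
    fitness = evaluate_population(archive, population, rng)
    for _ in range(generations):
        # selectParents: keep the better half (truncation selection, an assumption)
        ranked = [p for _, p in sorted(zip(fitness, population), reverse=True)]
        parents = ranked[: pop_size // 2]
        # recombineAndMutate: Gaussian mutation only, no recombination in this sketch
        population = [rng.choice(parents) + rng.gauss(0.0, 0.1) for _ in range(pop_size)]
        fitness = evaluate_population(archive, population, rng)
        # updateArchive: store the best-of-generation individual
        archive.append(max(zip(fitness, population))[1])
    return max(zip(fitness, population))[1], archive
```

Note that fitness here is purely relative: it is computed only from interaction outcomes, never from an objective measure of an individual's quality.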






Reinforcement Learning

Reinforcement Learning (RL)

Machine learning paradigm focused on solving problems in which an agent interacts with an environment by taking actions and receiving rewards at discrete time steps. The objective is to find a decision policy that maximizes cumulative reward.

Agent–Environment loop:

1. state s_t (environment → agent)

2. action a_t (agent → environment)

3. reward r_t (environment → agent)

4. learn on the basis of ⟨s_t, a_t, r_t, s_{t+1}⟩

In board games:

agent ⇒ player

environment ⇒ game

state ⇒ board state

action ⇒ legal move

reward ⇒ game result



Temporal Difference Learning

Temporal Difference Learning (TDL)

RL method which attempts to estimate a value function by observing the progression of states – the learner adjusts it to make the value of the current state more like the value of the next state.

Value function V(b) can be represented as a neural network with a modifiable weight vector w. The adjustment is based on a gradient-descent update, e.g.

\Delta w_i := \eta \, e \, b_i, \qquad e = v' - v

where v = V(b) is the value of the current state b, v' is the value of the next state, and η is the learning rate.
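As a minimal sketch of this update, assume a purely linear value function V(b) = Σ w_i b_i, with b_i taken to be the i-th input describing board b. The slide mentions a neural network, so the linear form, the function name `td_update` and the default learning rate are assumptions made for illustration. Reward handling at the end of a game is omitted.

```python
def td_update(w, b, b_next, eta=0.1):
    """One temporal-difference step for a linear value function (assumed form).

    w      -- current weight vector
    b      -- inputs describing the current state
    b_next -- inputs describing the next state
    """
    v = sum(wi * bi for wi, bi in zip(w, b))            # value of the current state
    v_next = sum(wi * bi for wi, bi in zip(w, b_next))  # value of the next state
    e = v_next - v                                      # error: next value minus current value
    return [wi + eta * e * bi for wi, bi in zip(w, b)]  # delta w_i = eta * e * b_i

# Usage: the current state's value moves towards the next state's value.
w = td_update([0.0, 1.0], b=[1.0, 0.0], b_next=[0.0, 1.0], eta=0.1)
```

In the usage line the current state is valued at 0 and the next state at 1, so the weight attached to the active input of the current state is increased.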






Coevolutionary Temporal Difference Learning

Coevolutionary Temporal Difference Learning (CTDL)

A hybrid of coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning.

Algorithm 1: Basic scheme of a generational evolutionary algorithm

P ← createRandomPopulation()
A ← initializeArchive()
evaluatePopulation(A, P)
while ¬terminationCondition() do
    S ← selectParents(P)
    P ← recombineAndMutate(S)
    individualReinforcementLearning(P)
    evaluatePopulation(A, P)
    updateArchive(A, P)
end while
return getFittestIndividual(A, P)






Relative Methods Performance Over Time

[Plot: points in tournaments vs. games played (x 100 000); series: CTDL + HoF, CTDL, TDL, CEL + HoF, CEL]



Observations and Motivation

Observations on learning Othello strategies

Temporal Difference Learning is much faster and under most experimental settings it is able to learn better strategies.

Coevolution can eventually produce better strategies if it is supported by an archive which sustains progress.

CTDL benefits from these complementary characteristics.

Motivation for further research on CTDL

No need for human expertise – useful when the knowledge of the problem domain is unavailable or expensive to obtain.

Potential for employing more complex learner architecture.

Interesting biological interpretation.






Evolution of Artificial Neural Networks

Typically, a network topology is chosen before the experiment and evolution searches the space of connection weights.

Can evolving topologies along with weights provide an advantage over evolving weights on a fixed topology?

Any continuous function can be approximated by a fully connected neural network having only one internal hidden layer and with an arbitrary sigmoidal nonlinearity.

George V. Cybenko

Challenges of Topology and Weight Evolving ANNs (TWEANNs)

How to cross over disparate topologies in a meaningful way?

How can topological innovation that needs a few generations to be optimized be protected so that it does not disappear prematurely?

How can topologies be minimized throughout evolution?



Evolvability and Neural Interference

Evolvability is an organism’s capacity to generate heritable phenotypic variation.

Marc Kirschner & John Gerhart

Evolvability of neural networks allows evolutionary algorithms to find weight settings that produce a desired behavior or approximate a given function.

Julian Togelius

Topology of a neural network largely influences its evolvability – it can be increased by removing single inputs or connections.

The availability of certain information at certain points in the network can lead evolution into local optima.

Neural interference appears in nonmodular neural networks that learn complex behavior consisting of multiple tasks.



Neuroevolution of Augmenting Topologies (NEAT)

Matching Topologies using Innovation Numbers

Different network structures (size and connection order) – Competing Conventions Problem

NEAT performs artificial synapsis based on historical markings

Figure 1: The competing conventions problem. The two networks compute the same exact function even though their hidden units appear in a different order and are represented by different chromosomes, making them incompatible for crossover. The figure shows that the two single-point recombinations are both missing one of the 3 main components of each solution. The depicted networks are only 2 of the 6 possible permutations of hidden unit orderings.

We now turn to several specific problems with TWEANNs and address each in turn.

2.2 Competing Conventions

One of the main problems for NE is the Competing Conventions Problem (Montana and Davis, 1989; Schaffer et al., 1992), also known as the Permutations Problem (Radcliffe, 1993). Competing conventions means having more than one way to express a solution to a weight optimization problem with a neural network. When genomes representing the same solution do not have the same encoding, crossover is likely to produce damaged offspring.

Figure 1 depicts the problem for a simple 3-hidden-unit network. The three hidden neurons A, B, and C, can represent the same general solution in 3! = 6 different permutations. When one of these permutations crosses over with another, critical information is likely to be lost. For example, crossing [A,B,C] and [C,B,A] can result in [C,B,C], a representation that has lost one third of the information that both of the parents had. In general, for n hidden units, there are n! functionally equivalent solutions. The problem can be further complicated with differing conventions, i.e., [A,B,C] and [D,B,E], which share functional interdependence on B.

An even more difficult form of competing conventions is present in TWEANNs, because TWEANN networks can represent similar solutions using entirely different topologies, or even genomes of different sizes. Because TWEANNs do not satisfy strict constraints on the kinds of topologies they produce, proposed solutions to the competing conventions problem for fixed or constrained topology networks such as nonredundant genetic encoding (Thierens, 1996) do not apply. Radcliffe (1993) goes as far as calling an integrated scheme combining connectivity and weights the “Holy Grail in


[Figure: two networks with hidden units ordered [A,B,C] and [C,B,A]; crossovers [A,B,C] x [C,B,A] yield [A,B,A] and [C,B,C] (both are missing information)]

Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
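The loss of information under competing conventions is easy to reproduce. The snippet below is a small illustration, with hidden units reduced to labels and a hypothetical helper `single_point_crossover`; it is not taken from the NEAT paper.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length genomes at the given cut point."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

# Two encodings of the same network: identical hidden units, different order.
p1 = ["A", "B", "C"]
p2 = ["C", "B", "A"]

child_1, child_2 = single_point_crossover(p1, p2, point=2)
# child_1 is ["A", "B", "A"] and child_2 is ["C", "B", "C"]:
# each child has lost one of the three hidden units its parents shared.
```

Both parents contain all three units, yet neither child does, which is why position-based crossover is unreliable when the same solution has several encodings.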




Two types of structural mutations in NEAT:

2 NEUROEVOLUTION OF AUGMENTING TOPOLOGIES (NEAT)

The NEAT method of evolving artificial neural networks combines the usual search for appropriate network weights with complexification of the network structure. This approach is highly effective: NEAT outperforms other neuroevolution (NE) methods, e.g. on the benchmark double pole balancing task by a factor of five (Stanley and Miikkulainen 2001, 2002b,c). The NEAT method consists of solutions to three fundamental challenges in evolving neural network topology: (1) What kind of genetic representation would allow disparate topologies to crossover in a meaningful way? (2) How can topological innovation that needs a few generations to optimize be protected so that it does not disappear from the population prematurely? (3) How can topologies be minimized throughout evolution so the most efficient solutions will be discovered? In this section, we explain how NEAT addresses each challenge.¹

¹ A more comprehensive description of the NEAT method is given in Stanley and Miikkulainen (2001, 2002c).

2.1 GENETIC ENCODING

Evolving structure requires a flexible genetic encoding. In order to allow structures to complexify, their representations must be dynamic and expandable. Each genome in NEAT includes a list of connection genes, each of which refers to two node genes being connected. Each connection gene specifies the in-node, the out-node, the weight of the connection, whether or not the connection gene is expressed (an enable bit), and an innovation number, which allows finding corresponding genes during crossover.

Mutation in NEAT can change both connection weights and network structures. Connection weights mutate as in any NE system, with each connection either perturbed or not. Structural mutations, which form the basis of complexification, occur in two ways (figure 1). In the add connection mutation, a single new connection gene is added connecting two previously unconnected nodes. In the add node mutation an existing connection is split and the new node placed where the old connection used to be. The old connection is disabled and two new connections are added to the genome. This method of adding nodes was chosen in order to integrate new nodes immediately into the network. Through mutation, genomes of varying sizes are created, sometimes with completely different connections specified at the same positions.

[Figure 1 panels: Mutate Add Connection; Mutate Add Node]

Figure 1: The two types of structural mutation in NEAT. Both types, adding a connection and adding a node, are illustrated with the genes above their phenotypes. The top number in each genome is the innovation number of that gene. The bottom two numbers denote the two nodes connected by that gene. The weight of the connection, also encoded in the gene, is not shown. The symbol DIS means that the gene is disabled, and therefore not expressed in the network. The figure shows how connection genes are appended to the genome when a new connection is added to the network and when a new node is added. Assuming the depicted mutations occurred one after the other, the genes would be assigned increasing innovation numbers as the figure illustrates, thereby allowing NEAT to keep an implicit history of the origin of every gene in the population.

In order to perform crossover, the system must be able to tell which genes match up between any individuals in the population. The key observation is that two genes that have the same historical origin represent the same structure (although possibly with different weights), since they were both derived from the same ancestral gene from some point in the past. Thus, all a system needs to do to know which genes line up with which is to keep track of the historical origin of every gene in the system.

Tracking the historical origins requires very little computation. Whenever a new gene appears (through structural mutation), a global innovation number is incremented and assigned to that gene. The innovation numbers thus represent a chronology of every gene in the system. As an example, let us say the two mutations in figure 1 occurred one after another in the system. The new connection gene created in the first mutation is assigned the number 7, and the two new connection genes added during the new node mutation are assigned the numbers 8 and 9. In the future, whenever these genomes crossover, the offspring will inherit the same innovation numbers on each gene; innovation numbers are never changed. Thus, the historical origin of every gene in the system is known throughout evolution.

Through innovation numbers, the system now knows exactly which genes match up with which. Genes that do not match are either disjoint or excess, depending on whether they occur within or outside the range of the other parent’s innovation numbers. When crossing over, the genes in both

Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
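The genetic encoding and the two structural mutations described in the excerpt can be sketched as follows. This is an illustrative reading of the text, not NEAT's reference implementation. The class and function names are hypothetical, and the weights given to the two connections created by the add node mutation (1.0 into the new node, the old weight out of it) are an assumption not stated in the excerpt.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool      # the "enable bit"
    innovation: int    # historical marking, never changed afterwards

class InnovationCounter:
    """Global counter incremented whenever a new gene appears."""
    def __init__(self, start=1):
        self._counter = count(start)

    def next(self):
        return next(self._counter)

def add_connection(genome, in_node, out_node, weight, innovations):
    """Add connection mutation: append one new connection gene."""
    genome.append(ConnectionGene(in_node, out_node, weight, True, innovations.next()))

def add_node(genome, gene_index, new_node, innovations):
    """Add node mutation: disable an existing connection and replace it by two new ones."""
    old = genome[gene_index]
    old.enabled = False
    # Weight choices below are an assumption made for this sketch.
    genome.append(ConnectionGene(old.in_node, new_node, 1.0, True, innovations.next()))
    genome.append(ConnectionGene(new_node, old.out_node, old.weight, True, innovations.next()))

def matching_innovations(genome_a, genome_b):
    """Genes line up during crossover when their innovation numbers are equal."""
    return sorted({g.innovation for g in genome_a} & {g.innovation for g in genome_b})

# Usage: six innovations are assumed to have happened already, so new genes get 7, 8, 9.
genome = [ConnectionGene(1, 4, 0.7, True, 1), ConnectionGene(3, 5, 0.2, True, 5)]
innovations = InnovationCounter(start=7)
add_connection(genome, 3, 4, 0.1, innovations)         # new gene 3->4
add_node(genome, 1, new_node=6, innovations=innovations)  # splits 3->5 into 3->6 and 6->5
```

Because genes are matched by innovation number rather than by position, two genomes of different sizes can still be aligned gene by gene.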




Protecting Innovation through Speciation

Changing the topology of a network is often very disruptive.

Structural innovation is unlikely to survive in the population.

NEAT divides the population into species that compete primarily within their own niches.

Minimizing Dimensionality

Forcing minimal topologies could be achieved by incorporating network size into the fitness function.

NEAT biases the search towards minimal-dimensional spaces by starting with a population with no hidden nodes.






N-tuple Network Architecture

Type of ANN that operates on a compound object (matrix, image) x whose elements can be easily indexed and retrieved.

Formed by a set of m tuples – each created by (randomly) sampling n locations of the input object.

[Figure: n-tuples sampling locations of an input image]

Figure comes from “Face Recognition with the Continuous N-tuple Classifier” by S. M. Lucas



N-tuple Network Output Value

Each input location has v possible values – a single n-tuple represents an n-digit number in base-v numeral system.

Each n-tuple has an associated look-up table (LUT) which contains parameters equivalent to weights in standard ANN.

Locations a_ij, where j = 0..n−1, specified by each n-tuple t_i are used to identify an address in a look-up table.

The output of the network is calculated by summing LUT values indexed by particular n-tuples:

f(x) = \sum_{i=0}^{m} f_i(x) = \sum_{i=0}^{m} \mathrm{LUT}_i\left[ \sum_{j=0}^{n-1} x(a_{ij})\, v^{j} \right]
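The output computation can be written directly from the formula. The sketch below assumes the input x is a flat sequence of integers in the range 0..v−1 and that each LUT is a list of length v^n; the function name `ntuple_output` and the example numbers are made up for illustration.

```python
def ntuple_output(x, tuples, luts, v):
    """Sum the LUT entries addressed by each n-tuple.

    x      -- flat input, each element an integer in range(v)
    tuples -- list of n-tuples, each a list of locations a_ij into x
    luts   -- one look-up table per tuple, each of length v ** n
    v      -- number of possible values per input location
    """
    total = 0.0
    for locations, lut in zip(tuples, luts):
        # Read the sampled values as an n-digit base-v number: sum_j x(a_ij) * v^j
        address = sum(x[a] * v ** j for j, a in enumerate(locations))
        total += lut[address]
    return total

# Usage with two 2-tuples over a 4-element input and v = 3 values per location.
x = [2, 0, 1, 1]
tuples = [[0, 1], [2, 3]]
lut_0 = [0.0] * 9
lut_1 = [0.0] * 9
lut_0[2] = 0.5     # address 2*3^0 + 0*3^1 = 2
lut_1[4] = -0.25   # address 1*3^0 + 1*3^1 = 4
output = ntuple_output(x, tuples, [lut_0, lut_1], v=3)
```

Evaluation therefore costs one address computation and one table read per tuple, regardless of how large the tables are.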



N-tuple Network for Othello

In the context of Othello, an n-tuple network acts as a state evaluation function – computes utility of a given board state.

[Figure: an Othello board with two n-tuples and fragments of their look-up tables LUT1 and LUT2]

Snake-shaped inputs are randomly assigned and stay fixed while learning affects weights in the look-up table.



N-tuple Network as TWEANN

Structural Genetic Operators

Mutation consists in changing the input assignment of a single element of a tuple to one of its neighbouring locations.

Size of tuples remains constant throughout the evolution.

Crossover is restricted to exchanging whole tuples.

Each tuple represents an independent module that can be easily combined with other modules.

Innovations are protected by applying intensive individual learning to newly created structures.

Size of the representation does not grow.
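The two structural operators listed above can be sketched as below. Several details are assumptions made for illustration: board locations are (row, column) pairs on an 8 x 8 board, "neighbouring" means the 8-neighbourhood, a tuple never samples the same square twice, and crossover picks each whole tuple from one parent at random. Look-up tables are left out of the sketch.

```python
import random

BOARD_SIZE = 8  # Othello board

def neighbours(location):
    """On-board squares adjacent to a location (8-neighbourhood, an assumption)."""
    row, col = location
    return [(row + dr, col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= row + dr < BOARD_SIZE and 0 <= col + dc < BOARD_SIZE]

def mutate_topology(tuples, rng):
    """Move a single element of a single tuple to a neighbouring square.
    Tuple sizes and the number of tuples stay constant."""
    mutated = [list(t) for t in tuples]
    t = rng.randrange(len(mutated))
    k = rng.randrange(len(mutated[t]))
    candidates = [n for n in neighbours(mutated[t][k]) if n not in mutated[t]]
    if candidates:
        mutated[t][k] = rng.choice(candidates)
    return mutated

def crossover(parent_a, parent_b, rng):
    """Exchange whole tuples: each child tuple comes intact from one parent."""
    return [list(rng.choice(pair)) for pair in zip(parent_a, parent_b)]

# Usage with two parents, each holding two 2-tuples.
rng = random.Random(1)
parent_a = [[(0, 0), (0, 1)], [(7, 7), (6, 7)]]
parent_b = [[(3, 3), (3, 4)], [(4, 4), (4, 5)]]
mutant = mutate_topology(parent_a, rng)
child = crossover(parent_a, parent_b, rng)
```

Since both operators preserve the number and size of tuples, the representation does not grow, in line with the last point of the slide.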






Coevolutionary Gradient Search Process

Our approach is to analyse characteristics of the problem search space and thence to identify the algorithms (within the class considered) which exploit these characteristics – we pay for our lunch, one might say.

Lionel Barnett

We aim to search both spaces in parallel – discrete network topology space and continuous weight space.

How to move in these spaces to gain from their character?

Coevolutionary Gradient Search

Directed gradient search – numerically estimates direction of change in the vicinity of the current candidate solution.

Undirected coevolutionary search – stochastically jumps over the search space starting from the fittest configurations.



Search Operators

Genetic Operators

The following genetic operators operate on the fittest individuals:

Weight mutation (mw)
Topology mutation (mt)
Topology crossover (x)

Gradient Operators

Gradient-based search operators work in the weight space and consist in a single gradient-descent TDL learning scenario.

How to create a competitive learning environment?

self-play scenario (s)
population opponent (p)
archival opponent (a)



Guiding the Search Process

Interactions between candidate solutions are the only source of information that guides the search process.

P ← createRandomPopulation()
evaluatePopulation(P)
while ¬terminationCondition() do
    S ← selectParents(P)
    P ← recombineAndMutate(S)
    evaluatePopulation(P)
end while
return getFittestIndividual(P)




1. Play round robin tournament between population members

2. Randomly select archival individuals to act as opponents

3. Select the best-of-generation individual and add it to the archive

Search operators use different types of interaction feedback.






Learning 7 x 4 N-tuple Networks

[Plot 1: average percentage score vs. games played (x 1,000), x axis 0 to 2000; series: CTDL-sxmw + HoF, CTDL-sxmw, TDL, CEL + HoF, CEL]

[Plot 2: average percentage score, x axis 0 to 2000; series: TDL, CTDL-sxmw + HoF, CTDL-sxmw, CEL + HoF, CEL]

[Plot 3: average percentage score, x axis 0 to 2000; series: ETDL-sxmt, CTDL-sxmt + HoF, TDL, CTDL-sxmw, CEL]



Relative Performance of Self-play Methods

[Plot: points in tournaments, x axis 0 to 2000; series: TDL, PTDL, CTDL-s, CTDL-sx, CTDL-sxmt, CTDL-sxmt + HoF]



Relative Performance of Mutual-play Methods

[Plot: points in tournaments, x axis 0 to 2000; series: CTDL-p, CTDL-px, CTDL-pxmt, CTDL-ax + HoF, CTDL-asxmt + HoF]



Relative Performance of All Methods

[Plot: points in tournaments, x axis 0 to 2000; series: CTDL-px, ETDL-sxmt, CTDL-sxmt, CTDL-sxmt + HoF, CTDL-asxmt + HoF]



Evolutionary Player in the Othello League






Summary

Learning models that can be identified from observing nature:

population learning by genetic means
life-time learning at an individual level
cultural learning by social interactions

We have implemented these models as different search procedures that incrementally improve candidate solutions.

A properly balanced combination of these models can result in obtaining the best performance in the long-term perspective.

The efficiency of n-tuple networks and our hybrid CTDL algorithm has been confirmed in the Othello League.



Future Work

I am an enthusiastic Darwinian, but I think Darwinism is too big a theory to be confined to the narrow context of the gene.

Richard Dawkins

Sociobiological inspirations:

Gene-culture coevolution (Dual Inheritance Theory) – two types of replicators: genes and memes.

Epigenetic transmission mechanisms – niche construction.

Improvement of selection procedures in noisy environments.

Designing more complex structural mutations for n-tuples.

Comparison between CTDL and NEAT algorithms.



Thank You

