Representation in Evolutionary Computation

Daniel Ashlock, Senior Member IEEE

Motivation

The most critical choice when designing an evolutionary algorithm is the representation. This incorporates both the data structure and variation operators used to represent members of an evolving population. The goals of this tutorial are:

1 To convince you that representation has an impact on performance and behavior in evolutionary computation.

2 To give examples of several types of representations used in evolutionary computation.

3 To give you tools for thinking about how to represent a problem.

Ashlock (Guelph) Representation in EC 1 / 93

Scope

Topics

Test problems in evolutionary computation.

Ordered gene problems.

Real parameter optimization.

Representations for graph evolution.

Automatic content generation.

Data manipulation representations.

Representations

Direct representations with the same genotype and phenotype.

Generative representations, both command-based and fractal.

Representations based on group theory.

State conditioned and hybrid representations.

Impact

This presentation shows that representation impacts performance and behavior. It can even open up new capabilities using representation-dependent algorithms.


Self-Avoiding Walks

Definition: A walk on a grid is a sequence of moves between cells of the grid that share a side. If no square is visited twice, then the walk is self-avoiding.

RRUULDLUURRULLURRRRRRRDLLLLDDDDRUUURRRDLLDDRURD

Shown above left is a self-avoiding walk that covers the squares of a 6×8 grid. The moves are given by U=up, D=down, L=left, and R=right. The walk is shaded from red to blue as it proceeds.
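The definition above can be checked mechanically. The following is a minimal sketch, assuming the walk starts in the lower left corner, that a move leaving the grid is simply wasted, and that fitness is the number of distinct cells visited; the function names and these conventions are illustrative assumptions, not taken from the slides.

```python
# Offsets for the four moves; (0, 0) is assumed to be the lower left cell.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def trace_walk(moves, width, height, start=(0, 0)):
    """Return the cells visited in order; off-grid moves are skipped (assumption)."""
    x, y = start
    cells = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            x, y = nx, ny
            cells.append((x, y))
    return cells

def saw_fitness(moves, width, height):
    """Assumed fitness: number of distinct cells visited."""
    return len(set(trace_walk(moves, width, height)))

def is_self_avoiding(moves, width, height):
    """True when every move stays on the grid and no cell is visited twice."""
    cells = trace_walk(moves, width, height)
    return len(cells) == len(moves) + 1 and len(set(cells)) == len(cells)

# A 15-move walk that covers the 4x4 grid.
print(saw_fitness("RRRUUULLLDDRURD", 4, 4))
```

Under these assumptions a walk is optimal exactly when its fitness equals the number of cells in the grid.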


Examples of Self-Avoiding Walks

RRUULDLUURRRDRURDDDLULD RRRUUULLLDDRURD URDRURDRRULURULLDLLLURR

Finding a self-avoiding walk (SAW) is an evolutionary computation test problem. It has a large number of optima; results are easy to visualize; and the optimization character of the SAW problem changes when the grid shape is changed.

The SAW problem also exhibits epistasis. The fitness contribution of a particular value at a particular position depends strongly on the earlier positions.


Representations for the SAW Problem

The standard representation uses character strings over the alphabet Up, Down, Left, and Right.

The state-conditioned representation uses strings over the alphabet Forward, Left then advance, and Right then advance. This representation must remember which way it is facing.

The gene expression representation is derived from the others by using upper and lower case letters. Lower case letters are ignored and the size of the string is increased by 50%.

These four representations (the standard and state-conditioned representations, plus a gene expression version of each) have alphabets of size 4, 3, 8, and 6 and use strings of different lengths. A larger alphabet and a longer string both increase the size of the search space exponentially.
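A small sketch of how the derived representations can be decoded into the standard U/D/L/R alphabet. The initial facing direction is not given in the slides, so facing right is an assumption here, as are the function names.

```python
HEADINGS = "URDL"  # clockwise order: +1 turns right, -1 turns left

def decode_state_conditioned(gene, facing="R"):
    """Translate F (forward), L (left then advance), R (right then advance)
    into absolute moves. The starting direction `facing` is an assumption."""
    h = HEADINGS.index(facing)
    out = []
    for c in gene:
        if c == "L":
            h = (h - 1) % 4
        elif c == "R":
            h = (h + 1) % 4
        out.append(HEADINGS[h])  # every command advances one cell
    return "".join(out)

def express(gene):
    """Gene expression decoding: lower case letters are ignored."""
    return "".join(c for c in gene if c.isupper())

print(decode_state_conditioned("FFLF"))  # two steps right, turn left, two steps up
print(express("RrUuLD"))
```

Because the state-conditioned alphabet has no "reverse" command, a decoded walk can never step straight back onto the cell it just left.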


Contrast: Population Size (4×4 SAW)

The effect of changing population size is very different for different representations.


Contrast: Population Size (5×5 SAW)

The effect of changing population size is different for the 4×4 and 5×5 versions of the SAW problem.


Contrast: Mutation Rate (4×4 SAW)

The best rate of mutation is also different for different representations.


Contrast: Crossover (4×4 SAW)

The effect of changing the crossover operator can be explained by the epistatic character of the SAW problem. Notice the different impact on different representations.


Contrast: Selection Pressure (4×4 SAW)

The tournament size is the number of creatures compared to select parents for reproduction. Increasing tournament size increases selection pressure. Note that small tournaments compensate for lack of exploration.
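Tournament selection as described above can be sketched as follows. This version returns a single winner per tournament, which is one common variant and an assumption here; the slides do not say how many parents each tournament yields.

```python
import random

def tournament_select(population, fitness, size, rng=random):
    """Draw `size` distinct members at random and return the fittest of them."""
    contestants = rng.sample(population, size)
    return max(contestants, key=fitness)

rng = random.Random(1)
population = list(range(10))          # toy creatures; fitness is the value itself
parent = tournament_select(population, lambda c: c, 3, rng)
```

With a tournament as large as the population the best creature always wins, while a tournament of size one is a uniform random draw, which is why larger tournaments mean stronger selection pressure.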


A Representation from Australia: James Montgomery

Here is a very different representation. Instead of specific moves we use a real number 0 ≤ r ≤ 1 to specify each move. When a move must be made, the interval [0, 1] is partitioned into equal parts, one per possible move, and r selects the move whose part it falls in.

When no move is possible, the real number is instead partitioned to select one of the three moves that do not exactly reverse the previous move. This representation is called the adaptive representation for SAW.


A Representation from Australia: James Montgomery

Suppose we have the chromosome (0.23, 0.56, 0.81, 0.95, 0.17, 0.22, 0.32, 0.41) for the 3×3 SAW. Then, starting in the lower left corner and assuming the moves are examined in the order UDLR, this chromosome specifies the walk as follows:

Position  Step  Move   Choices  Value
(0,0)     0     Start  2        n/a
(0,1)     1     UP     2        0.23 (bottom half)
(1,1)     2     RIGHT  3        0.56 (top half)
(1,0)     3     DOWN   1        0.81 (top third)
(2,0)     4     RIGHT  1        0.95 (no choice)
(2,1)     5     UP     1        0.17 (no choice)
(2,2)     6     UP     1        0.22 (no choice)
(1,2)     7     LEFT   1        0.32 (no choice)
(0,2)     8     LEFT   0        0.41 (no choice)

Yielding the walk:

Notice the number of “no choice” moves and moves with fewer than three choices. This suggests that the new representation can radically reduce the effective search space, retaining and extending the advantages of the state-conditioned representation.
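The decoding rule of the adaptive representation can be sketched as below. Several details are assumptions made for illustration: the interval is split into equal parts, the order in which moves are examined is a parameter (`order`), and a fallback move that would leave the grid leaves the position unchanged. The decoded walk depends on the examination order.

```python
DELTA = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
REVERSE = {"U": "D", "D": "U", "L": "R", "R": "L"}

def pick(options, r):
    """Split [0, 1] into len(options) equal parts; return the option r falls in."""
    return options[min(int(r * len(options)), len(options) - 1)]

def decode_adaptive(chromosome, width, height, order="UDLR", start=(0, 0)):
    """Decode an array of reals in [0, 1] into a string of U/D/L/R moves."""
    pos = start
    visited = {pos}
    walk, prev = [], None

    def step(p, m):
        return (p[0] + DELTA[m][0], p[1] + DELTA[m][1])

    def on_grid(p):
        return 0 <= p[0] < width and 0 <= p[1] < height

    for r in chromosome:
        # Moves that stay on the grid and reach an unvisited cell.
        free = [m for m in order
                if on_grid(step(pos, m)) and step(pos, m) not in visited]
        if free:
            move = pick(free, r)
        else:
            # Stuck: choose among the moves that do not reverse the previous one.
            move = pick([m for m in order
                         if prev is None or m != REVERSE[prev]], r)
        if on_grid(step(pos, move)):  # assumption: off-grid moves do nothing
            pos = step(pos, move)
            visited.add(pos)
        walk.append(move)
        prev = move
    return "".join(walk)

print(decode_adaptive((0.9, 0.1, 0.5), 2, 2))
```

In the 2×2 example only the first gene makes a real choice; the remaining two moves are forced, which illustrates how the effective search space shrinks.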


Results for the new representation

Remember, 6×6 is hard

Number of optimal solutions located in 30 trials

N                         4   5   6   7   8   9   10
Optimal solutions found   30  30  30  30  30  17  14

These are some of the 8×8 solutions located. The new representation has an incredible increase in search power; roughly ten orders of magnitude.


Comments on Adaptive Representations

When can you have an adaptive representation?

When the structure being evolved is built from a serial collection of choices that constrain one another.

The constraints placed on the next choice by the previous choices must be transparent.

Conservation of the number of choices of a given type might well provide such constraints.

Generative representations are prime candidates for transformation into an adaptive representation.

Reference

J. Montgomery, M. Randall, and A. Lewis, Differential Evolution for RFID Antenna Design: A Comparison with Ant Colony Optimisation, Genetic and Evolutionary Computation Conference (GECCO-2011), pp. 673-680, 2011.


Conclusions: SAW Problems

The state-conditioned representation reduces the size of the search space by encoding the idea that backing up is bad, and yet it often performs badly; it over-exploits and becomes trapped in local optima.

The gene expression representation can insert and delete characters in the expressed SAW genome when mutation exchanges an upper and lower case character. This favors exploration over exploitation.

The impact of changing representation was much larger than the impact of any of the other choices, especially with the adaptive representation, which annihilated the others.

Excluding the adaptive representation, the two representations with larger search spaces, the gene expression representations, exhibited the best performance. They compensate more effectively for a rough fitness landscape, enough to overcome their larger search space.


Next: Ordered gene representations

When you need each object exactly once.


Ordered gene representations

It is not uncommon to need to optimize an order of a collection of objects. The best known ordered gene problem is the Traveling Salesman Problem.

City  Position   City  Position
A     (1,93)     G     (12,39)
B     (45,75)    H     (47,38)
C     (29,18)    I     (8,27)
D     (87,18)    J     (88,73)
E     (50,5)     K     (50,75)
F     (23,98)    L     (98,75)

City  Position   City  Position
A     (10,90)    H     (32,33)
B     (48,35)    I     (28,60)
C     (76,50)    J     (98,85)
D     (56,35)    K     (10,10)
E     (34,20)    L     (34,52)
F     (68,52)    M     (1,92)
G     (42,28)

Shown are coordinates for two Traveling Salesman problem instances (left) and pictures of their corresponding optimal tours (above). The goal is to find an order on the cities that minimizes total distance.


Comments on ordered gene representations

The typical representation for an ordered gene problem is a list of the objects in some order, once each. This turns out to be a terrible idea, on average.

Ordered gene problems differ from other evolutionary computation problems in their variation operators. The largest difference is that a standard crossover of two ordered genes often fails to yield a correct ordered gene.

The awkwardness of the natural representation suggests that ordered genes are a good place to look for additional representations.

We will look at:

Random key representations.

Adjacent transposition representations.

Adaptive transposition representations.

The additional representations are all simple, linear representations for ordered genes that can use standard crossover operators.


The problem: an example of ordered gene crossover

1 2 | 3 4 5 6 | 7 8

7 1 | 3 6 5 8 | 2 4

(crossover takes place after locations 2 and 6)

3 4 5 6 | 7 1 8 2

3 6 5 8 | 1 2 4 7

This crossover operator is a modification of a standard string crossover operator. The contributions from one parent are preserved intact and moved to the front of the permutation. The elements of the ordered list not in this preserved segment are taken from the other parent in the order they appear there and placed at the end.

There are many crossover operators for ordered lists. Each preserves different aspects of the ordered list: position within the list, proximity of elements, order of blocks of elements. Which is appropriate depends on what qualities most need to be heritable. This is often best assessed by testing.
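The crossover in the example above can be written in a few lines. This is a sketch of that one operator only; the function name is an illustrative choice.

```python
def segment_first_crossover(keep_from, fill_from, start, end):
    """Move keep_from[start:end] to the front, then append the remaining
    elements in the order they appear in fill_from."""
    segment = list(keep_from[start:end])
    kept = set(segment)
    rest = [g for g in fill_from if g not in kept]
    return segment + rest

p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [7, 1, 3, 6, 5, 8, 2, 4]
print(segment_first_crossover(p1, p2, 2, 6))  # -> [3, 4, 5, 6, 7, 1, 8, 2]
print(segment_first_crossover(p2, p1, 2, 6))  # -> [3, 6, 5, 8, 1, 2, 4, 7]
```

Each child is guaranteed to be a valid ordered list because every element comes either from the preserved segment or from the filtered remainder, never both.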


Random keys: a different way to store ordered genes

The random key encoding for ordered genes operates by storing an ordered gene implicitly as the sorting order of an array of numbers. The array of numbers is the structure on which crossover and mutation operate.

Random Key    0.7  1.2  0.6  4.5  -0.6  2.3  3.2  1.1
Permutation   2    4    1    7    0     5    6    3

Random key encoding permits standard evolutionary algorithms to store ordered lists by decoding arrays of reals into permutations.
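Decoding a random key is a sort. The sketch below reads the example above as assigning each key its rank in sorted order; the function name is illustrative.

```python
def decode_random_key(keys):
    """Return, for each position, the rank of its key in sorted order."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ranks = [0] * len(keys)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks

keys = [0.7, 1.2, 0.6, 4.5, -0.6, 2.3, 3.2, 1.1]
print(decode_random_key(keys))  # -> [2, 4, 1, 7, 0, 5, 6, 3]

# A standard one-point crossover on the keys still decodes to a permutation.
other = [3.0, 0.1, 2.2, 0.4, 1.9, 0.8, 5.0, 2.7]
child = keys[:3] + other[3:]
print(sorted(decode_random_key(child)) == list(range(8)))  # -> True
```

The second part shows why the encoding is convenient: any array of reals decodes to a valid ordered list, so no repair step is needed after crossover or mutation.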

Reference

James C. Bean, Genetic Algorithms and Random Keys for Sequencing and Optimization, ORSA Journal on Computing 6(2), 1994, pp. 154-160.


Comment on the random key representation

The random key representation permits you to feed ordered gene problems into a standard real parameter optimizer. This may or may not be a good idea but it is convenient.

The random key representation is many-to-one, with permutations corresponding to volumes of Euclidean space. The distance between numbers in the sorted list represents a binding strength: the closer together the numbers are, the more likely the list elements they represent are to remain together. This means that evolved lists contain information about which elements of the list belong together.

It is likely (no test yet) that the random key representation can evolve to defeat a bad choice of crossover operator, in a way that direct representations of ordered genes cannot, by exploiting closeness.

The potential to encode binding strengths permits post-evolution data mining and use of estimation of distribution algorithms.


The adjacent transposition representation

Theorem

Adjacent transpositions generate the symmetric group.

Algorithmic Version

Any ordered list may be obtained from any other by specifying a series of swaps of adjacent elements.

The adjacent transposition representation (ATR) uses a list of integers in the range 0 . . . n − 1 to represent an ordering of a list of n elements. To express the representation, the sorted order of the list is used as a starting point. Reading the chromosome in order, a number k < n − 1 means swap the items in positions k and k + 1, while if k = n − 1 nothing is changed.

Why do we need the “do nothing”? Some permutations are the product of an even number of swaps, others an odd number. Do nothing permits the representation to encode both.


An example of expressing the ATR

Gene: 2010645

Step            Expression
Start           01234567
2 - swap 2,3    01324567
0 - swap 0,1    10324567
1 - swap 1,2    13024567
0 - swap 0,1    31024567
6 - swap 6,7    31024576
4 - swap 4,5    31025476
5 - swap 5,6    31025746 (final result)

The gene 2010645 encodes the ordered list 31025746. Notice that this representation is many-to-one and must be long enough.
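The expression step traced above is a short loop. A minimal sketch, with an illustrative function name:

```python
def express_atr(gene, n):
    """Apply the adjacent swaps in `gene` to the sorted list 0..n-1.
    A value k < n-1 swaps positions k and k+1; k = n-1 does nothing."""
    perm = list(range(n))
    for k in gene:
        if k < n - 1:
            perm[k], perm[k + 1] = perm[k + 1], perm[k]
    return perm

print(express_atr([2, 0, 1, 0, 6, 4, 5], 8))  # -> [3, 1, 0, 2, 5, 7, 4, 6]
```

Any list of integers in the range 0 to n − 1 expresses to a valid ordering, so standard string crossover and mutation can be applied to the gene directly.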


Comments on the ATR

It is not too hard to see that a string of $\frac{1}{2}(n^2 - n)$ adjacent transpositions and do-nothings can encode any permutation.

A representation longer than the minimum length becomes many-to-one for all permutations.

The number of different encodings for a particular ordered list varies depending on the minimum number of adjacent transpositions needed to produce it.

If the representation is too short to search the entire space of permutations it becomes a type of local search.

A generative representation is one that specifies how to construct the desired object. The ATR is a fairly simple generative representation.

Adjacent transpositions are an example of a generating set of the space of all ordered genes. There is an astronomical space of generating sets - each can be used as an alphabet for an ordered-gene representation.
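One way to see the n(n − 1)/2 bound on gene length is bubble sort: sorting a permutation never needs more than that many adjacent swaps, and replaying the swaps in reverse rebuilds the permutation from the sorted list. The sketch below uses this argument; it is an illustration, not a construction taken from the slides.

```python
from itertools import permutations

def express_atr(gene, n):
    """Apply adjacent swaps to 0..n-1; the value n-1 means do nothing."""
    perm = list(range(n))
    for k in gene:
        if k < n - 1:
            perm[k], perm[k + 1] = perm[k + 1], perm[k]
    return perm

def encode_atr(target):
    """Return an ATR gene of length n(n-1)/2 that expresses to `target`."""
    n = len(target)
    work = list(target)
    swaps = []
    for _ in range(n):                      # bubble sort, recording each swap
        for k in range(n - 1):
            if work[k] > work[k + 1]:
                work[k], work[k + 1] = work[k + 1], work[k]
                swaps.append(k)
    swaps.reverse()                         # undo the sort, starting from sorted order
    length = n * (n - 1) // 2
    return swaps + [n - 1] * (length - len(swaps))  # pad with do-nothings

# Every ordering of four items is reachable with a gene of length six.
print(all(express_atr(encode_atr(p), 4) == list(p)
          for p in permutations(range(4))))
```

The padding step is where the "do nothing" symbol earns its place: it lets permutations that need few swaps share a fixed gene length with those that need many.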


Additional techniques related to the ATR

Any generative representation that uses a starting point can be restarted and recentered by taking the best object located so far and using it as a new starting point.

The ATR can use transpositions that are relative to the starting point permutation, whatever it is.

Short gene lengths permit local search near a good example, which permits finishing or polishing a good solution.

Notice that the gene length becomes an important algorithmic parameter. In the following reference it is dynamically adapted.
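Recentering with the ATR can be sketched as follows. To keep the example short, each epoch is a batch of random genes rather than a full evolutionary algorithm, and the cost function (number of inversions) is a toy; both are assumptions made for illustration, as are all names.

```python
import random

def express_atr(gene, start):
    """Apply adjacent swaps relative to the starting ordering `start`."""
    perm = list(start)
    n = len(perm)
    for k in gene:
        if k < n - 1:
            perm[k], perm[k + 1] = perm[k + 1], perm[k]
    return perm

def inversions(perm):
    """Toy cost: number of out-of-order pairs (0 for a sorted list)."""
    return sum(1 for i in range(len(perm))
               for j in range(i + 1, len(perm)) if perm[i] > perm[j])

def recentering_search(start, cost, gene_length, epochs, samples, rng):
    center = list(start)
    n = len(center)
    for _ in range(epochs):
        best = center
        for _ in range(samples):            # stand-in for one epoch of evolution
            gene = [rng.randrange(n) for _ in range(gene_length)]
            candidate = express_atr(gene, center)
            if cost(candidate) < cost(best):
                best = candidate
        center = best                        # recenter on the best ordering so far
    return center

rng = random.Random(0)
result = recentering_search([5, 4, 3, 2, 1, 0], inversions, 3, 20, 50, rng)
```

With a gene length of three, each epoch can only reach orderings a few swaps away from the current center, so the search is local; recentering is what lets it travel.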

Reference

H. James, S. Houghten, and D. Ashlock, Recentering, reanchoring and restarting an evolutionary algorithm, in Proceedings of the 2013 World Congress on Nature and Biologically Inspired Computing (NaBIC), pp. 76-83, 2013.


Ordered genes control greedy algorithms

Simple greedy algorithms can be used to vertex or edge color graphs or to pack cargo into bins. These and other greedy algorithms have the following properties:

The quality of the solution depends on the order in which the greedy algorithm considers the objects it is acting on.

It is possible to prove that an optimal result exists for at least one (and usually many) orders.

A greedy ordered evolutionary algorithm uses an evolutionary algorithm that evolves orders to control a greedy algorithm. Such an algorithm can use any of the representations for ordered genes included in this tutorial. The greedy algorithm encodes domain knowledge about the problem.
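Greedy vertex coloring shows the dependence on order clearly. In this sketch the graph is a path on four vertices and the function names are illustrative; an evolutionary algorithm would evolve the `order` argument.

```python
def greedy_coloring(adjacency, order):
    """Give each vertex, in the given order, the smallest color not used
    by an already colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def colors_used(adjacency, order):
    return max(greedy_coloring(adjacency, order).values()) + 1

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # the path 0 - 1 - 2 - 3
print(colors_used(path, [0, 1, 2, 3]))  # -> 2
print(colors_used(path, [0, 3, 1, 2]))  # -> 3
```

The same graph needs two colors under one order and three under another, so the number of colors used is a natural fitness for the evolved order.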


RNApredict: a greedy ordered representation

RNApredict is a secondary structure prediction program for RNA, developed and maintained by the Wiese lab. It incorporates a beautiful greedy representation. RNApredict uses an ordered gene representation that orders the stems in an RNA sequence.

Thanks: Rfam database on wikimedia for RNA images.


The representation from RNApredict

A stem is a pair of complementary stretches of RNA that can base pair. RNApredict enumerates all possible stems in an RNA sequence. The stems are then selected for incorporation into a secondary structure with a greedy algorithm: a stem is used unless some of its bases have already been used by another stem.

The evolutionary algorithm in RNApredict evolves the order in which the stems are considered by the greedy algorithm. This technique, together with very carefully designed energetic models, yields performance competitive with the best current secondary structure predictions.
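The greedy rule described above can be sketched with stems modeled as sets of base positions. This is an illustration of the rule as stated on the slide, not RNApredict's actual code; the data layout and names are hypothetical.

```python
def build_structure(stems, order):
    """Consider stems in the given order; use a stem unless one of its
    bases has already been used by an earlier stem."""
    used_bases = set()
    chosen = []
    for i in order:
        if used_bases.isdisjoint(stems[i]):
            chosen.append(i)
            used_bases |= stems[i]
    return chosen

# Hypothetical stems, each given as the set of base positions it pairs up.
stems = [{0, 1, 8, 9}, {1, 2, 6, 7}, {3, 4, 10, 11}]
print(build_structure(stems, [0, 1, 2]))  # -> [0, 2]
print(build_structure(stems, [1, 0, 2]))  # -> [1, 2]
```

Stems 0 and 1 compete for base 1, so whichever comes first in the evolved order is the one that enters the structure.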

Reference

Kay C. Wiese, Alain Deschênes, Andrew Hendriks, RnaPredict: An Evolutionary Algorithm for RNA Secondary Structure Prediction, IEEE/ACM Trans. Comput. Biology Bioinform. 5(1): 25-41 (2008)


Summary for Ordered Gene Representations

The direct representation of an ordered list requires the use of cumbersome, disruptive crossover operators.

The random key representation re-represents ordered gene problems as real parameter optimization problems.

The many-to-one character of the random key representation can encode information about which list items belong close to one another.

The adjacent transposition representation introduces the additional algorithmic parameter gene length that permits control of the degree of locality of the ordered gene search.

The random key, ATR, and other representations based on generating sets of the space of ordered lists are all linear, able to use standard crossover operators.


New Topic: Representation in Real Optimization

In part, a tale of two triangles.


Well Known Representations

Strings of bits that are chunked into blocks that code for the value of real parameters. Mutation consists of flipping bits.

Arrays of real values that directly specify parameter values. Mutations are probability distributions which need to be chosen with some care. This choice gives the designer substantial control over the type of evolutionary search performed.

Arrays of real values that encode both the real parameters being optimized and parameters for mutation operators, e.g. the variance-covariance matrix for a multivariate Gaussian distribution. This is called self adaptation.
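A minimal sketch of self adaptation, using a single evolved step size in place of the full variance-covariance matrix mentioned above. The log-normal update of the step size and the default learning rate `tau` are a common convention from evolution strategies and are assumptions here.

```python
import math
import random

def self_adaptive_mutation(values, sigma, rng, tau=None):
    """Mutate the step size first, then use the new step size to mutate
    the parameters. The chromosome is the pair (values, sigma)."""
    if tau is None:
        tau = 1.0 / math.sqrt(len(values))   # assumed learning rate
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_values = [v + new_sigma * rng.gauss(0.0, 1.0) for v in values]
    return new_values, new_sigma

rng = random.Random(0)
child_values, child_sigma = self_adaptive_mutation([1.0, 2.0, 3.0], 0.5, rng)
```

Because the step size is inherited along with the parameters, selection acts on it indirectly: offspring produced with a useful step size tend to survive and pass it on.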

Citation

D. B. Fogel, Phenotypes, Genotypes, and Operators in Evolutionary Computation, Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, pp. 193-198, 1995.


An Excellent General-Purpose Modification

Baldwinian optimization is a hybrid of evolutionary computation with a local optimizer. Evolutionary algorithms are typically good at global search but labor somewhat to find the top of a hill once they locate it. Local optimizers can find the top of a hill quickly, but may lack the ability to find another hill. A Baldwinian real optimizer evolves starting points for the local optimizer. The fitness of a chromosome is the value that results from running the local optimizer starting with the chromosome. There are many different possible local optimizers, and a good choice is problem dependent.
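The Baldwinian fitness evaluation can be sketched in one dimension. The local optimizer here is a simple step-halving hill climber chosen for brevity; it is an assumption, and any local optimizer could take its place.

```python
def hill_climb(f, x, step=1.0, tol=1e-6, max_steps=10_000):
    """Move uphill by `step`; halve the step when neither direction improves."""
    for _ in range(max_steps):
        if step <= tol:
            break
        if f(x + step) > f(x):
            x += step
        elif f(x - step) > f(x):
            x -= step
        else:
            step /= 2.0
    return x

def baldwinian_fitness(f, chromosome):
    """Fitness is the value reached by the local optimizer; the chromosome
    itself (the starting point) is left unchanged."""
    return f(hill_climb(f, chromosome))

def hill(x):
    return -(x - 3.0) ** 2      # a single hill with its top at x = 3

print(hill(0.0))                       # raw value of the starting point
print(baldwinian_fitness(hill, 0.0))   # value after local optimization
```

Every starting point on the same hill receives the same fitness, so the evolutionary algorithm is left with the global task of choosing which hill to start on.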

Citation

D. Ackley and M. Littman, Interactions Between Learning and Evolution, Proceedings of the Second Conference on Artificial Life, SFI Studies in the Sciences of Complexity, vol. X, C. G. Langton et al., Eds., pp. 487-509, 1991.

Question: Do different representations interact differently with local optimizers?


The Gene-Expression Representation

For real optimization, this representation contains a regulatory and a value layer.

Mutations can change the value or the regulatory layers. Regulatory mutations enable insertion or deletion mutations. This is disruptive, encouraging exploration. Note that values are remembered even when not expressed.


Optimize This!

This function:

$$g_n(\vec{x}) = \frac{1}{20n}\sum_{k=1}^{n} x_k + \sum_{k=1}^{n} \sin\left(\sqrt{k}\,x_k\right)$$

has no upper bound and so is good for testing if an algorithm favors exploration.

The graphic displays a portion of the two dimensional version of the function as a heat map. The function is made of hyperelliptical hills with a different diameter along each coordinate axis. We will compare a representation using an array of D values with gene-expression representations with 2D positions in D = 2, 3, . . . , 7 dimensions.


The Gene-Regulatory Representation Favors Exploration

100,000 fitness evaluations (n=400)

     Regulatory             Control
D    Mean          Best     Mean          Best
2    2.77 ± 0.02   3.25     2.59 ± 0.02   2.98
3    3.74 ± 0.01   4.11     3.54 ± 0.02   3.97
4    4.71 ± 0.01   5.05     4.54 ± 0.01   4.97
5    5.68 ± 0.02   5.98     5.51 ± 0.01   5.85
6    6.58 ± 0.02   6.95     6.49 ± 0.01   6.82
7    7.54 ± 0.04   7.93     7.50 ± 0.01   7.85

1,000,000 fitness evaluations (n=400)

     Regulatory                     Control
D    Mean fitness   Best fitness    Mean fitness   Best fitness
2    3.04 ± 0.03    4.17            2.58 ± 0.02    2.98
3    3.94 ± 0.02    4.45            3.55 ± 0.02    3.97
4    4.88 ± 0.01    5.33            4.52 ± 0.01    4.97
5    5.80 ± 0.02    6.31            5.52 ± 0.01    5.87
6    6.73 ± 0.01    7.10            6.51 ± 0.01    6.90
7    7.68 ± 0.02    8.08            7.50 ± 0.01    7.80

The advantage of the gene-regulatory representation decreases with dimension, but it makes better use of additional time.


A Fractal Representation for Real Optimization

All of the fractals above are generated by starting at a vertex of the bounding polygon and iterating the operation “average your position with a vertex selected uniformly at random”. When the figure fills in, you can use a string of averaging commands as a representation.

This works for any sheared hypercube in D dimensions, but, here’s the rub, you need an “alphabet” of $2^D$ averaging commands. The Sierpinski representation evolves sets of directions for how to construct points.
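Decoding a string of averaging commands into a point can be sketched as follows. The triangle, the starting vertex, and the function name are illustrative assumptions.

```python
def decode_sierpinski(commands, vertices, start_index=0):
    """Start at one vertex and repeatedly average the current position
    with the vertex named by each command."""
    point = [float(c) for c in vertices[start_index]]
    for c in commands:
        target = vertices[c]
        point = [(p + t) / 2.0 for p, t in zip(point, target)]
    return point

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(decode_sierpinski([1], triangle))       # -> [0.5, 0.0]
print(decode_sierpinski([1, 2], triangle))    # -> [0.25, 0.5]

# A hypercube in D dimensions has 2**D vertices, hence 2**D commands.
print([2 ** d for d in range(2, 8)])
```

Later commands in the string have the largest effect on the decoded point, since each averaging step halves the influence of everything before it.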


The Poladian-Sierpinski Representation

Instead of iteratively averaging toward a vertex, we can iteratively select a sub-sector. The area covered is a simplex (triangle, tetrahedron, etc). The number of generators needed in D dimensions is D + 1 (!!!).

This representation is still a string of commands that select a point (the center of the final simplex), but it scales far better as dimension increases.


Exploiting the String Representation for Points

A clear advantage of storing points in $\mathbb{R}^n$ as strings is that this permits us to save the strings in a dictionary. Then we can:

1 Find an optimum.

2 Use it to initialize a dictionary.

3 Iteratively re-run the EA to locate more optima awarding worst fitness to anyone that shares a prefix of a certain length with a member of the dictionary.

4 Place each new optimum in the dictionary.

This is the Multi-Optima Sierpinski Searcher (MOSS). By using it we can enumerate the optima present in the area being searched.

Notice that the effect is similar to that achieved with niche specialization but the computational cost to exclude all optima already located is constant.
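The dictionary step of MOSS can be sketched with a hash set of prefixes, which gives the constant-cost exclusion test mentioned above. The class and function names, and the use of negative infinity as the worst fitness, are assumptions for illustration.

```python
class PrefixDictionary:
    """Stores the first `prefix_length` commands of each optimum found."""

    def __init__(self, prefix_length):
        self.prefix_length = prefix_length
        self.prefixes = set()

    def add(self, gene):
        self.prefixes.add(tuple(gene[:self.prefix_length]))

    def excludes(self, gene):
        # One hash lookup, however many optima are already stored.
        return tuple(gene[:self.prefix_length]) in self.prefixes

def moss_fitness(gene, raw_fitness, dictionary, worst=float("-inf")):
    """Award worst fitness to genes that share a prefix with a known optimum."""
    if dictionary.excludes(gene):
        return worst
    return raw_fitness(gene)

found = PrefixDictionary(prefix_length=3)
found.add([0, 2, 1, 1, 0, 2])
print(moss_fitness([0, 2, 1, 0, 0, 0], sum, found))  # excluded: same prefix
print(moss_fitness([1, 2, 1, 0, 0, 0], sum, found))  # evaluated normally
```

Since a shared prefix pins the decoded point inside the same sub-region, excluding a prefix removes a whole neighborhood of an optimum from later runs.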


A MOSS Application: Searching the Mandelbrot Set

Searching for interesting views in the Mandelbrot set is a three-parameter problem: x + iy as the corner of the view and the side length S of the view. The picture at the left shows the location of 1000 optima located using the MOSS. The fitness function is an RMS-error measured with a desired appearance mask.


The First 300 Optima.


The Twenty-Four Best Optima.


The Middle Twenty-Four Optima.


The Twenty-Four Worst Optima.


Conclusions and Observations

The gene-regulatory trick can be applied far beyond real optimization. It works with any linear representation.

The Poladian-Sierpinski representation extends the usefulness of the Sierpinski representation from D < 10 real parameters to fairly high dimensions.

Domains where enumerating optima might be useful: protein folding, fitting complex model parameters, or evolved art. In these cases the fitness function starts wrong and so desirable answers are often local optima.

Representations do not function in a vacuum. There is a need to compare different representations in different types of algorithms. Among the algorithms to test are evolutionary programming, evolution strategies, genetic algorithms, particle swarm optimization, artificial ant algorithms, and others.


The Walking Triangle Representation


Context

The walking triangle representations (WTRs) are a collection of generative representations for points in $\mathbb{R}^n$. The focus of the representation is a simplex (n-dimensional triangle) whose center of mass is the point specified by the representation. A series of commands or moves that modify a starting triangle form the chromosome structure of the representation.

A vertex of a d-dimensional simplex is one of the d + 1 points that define its boundaries.

A face is a simplex one dimension lower consisting of all but one vertex. The remaining vertex is opposite the face.

The center of mass of a simplex is the average of all the vertices.

Vertex A is opposite face BC.

Center of mass is: (A + B + C)/3.


The Walking Triangle Moves

The walking triangle representations generalize earlier fractal representations. Five moves are used in the versions presented here, each of which can be applied at any vertex. The representation is used to select a point in $\mathbb{R}^n$ which is represented by the center of mass of the triangle.


The Walking Triangle Representation

The walking triangle representation is a linear one - we start with a standard simplex consisting of the standard basis together with the origin and then apply a string of commands.

The moves are defined for any number of dimensions. In the sequel we will use triangle to mean simplex.
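The starting simplex and the point it specifies are easy to compute. This sketch covers only the standard simplex and the center of mass; the walking triangle moves themselves are not implemented here, and the function names are illustrative.

```python
def standard_simplex(n):
    """The origin together with the n standard basis vectors."""
    origin = [0.0] * n
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [origin] + basis

def center_of_mass(simplex):
    """Average of all the vertices, coordinate by coordinate."""
    count = len(simplex)
    dims = len(simplex[0])
    return [sum(v[k] for v in simplex) / count for k in range(dims)]

triangle = standard_simplex(2)      # vertices (0,0), (1,0), (0,1)
print(center_of_mass(triangle))     # both coordinates equal 1/3
```

A move in the representation would map one list of vertices to another, and the chromosome's phenotype is the center of mass of the final list.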

Important Properties

Theorem: for any ε > 0, if p is a point in general position and d is the distance from the center of mass of a simplex S to p, then there exists a finite sequence of walking triangle moves $q_1 q_2 \cdots q_m$ that places the center of mass of $S q_1 q_2 \cdots q_m$ within ε of p. This is true for the sets of moves: WUC, WGS, and WUCGS.

Observation: The number of moves m in the sequence $q_1 q_2 \cdots q_m$ is proportional to the logarithm of the distance from the center of mass of S to p.


Variation Operators

Part of specifying a representation is to give the variation operators. Mutation is simple - we replace a command with another command at some point in the chromosome. The choice of crossover operator is informed by the group theory of the WTRs.

The walking triangle operations do not commute with one another. An example of failure to commute appears at the left. This means that the representation has a high degree of epistasis, which in turn makes one point crossover a natural choice.


Experiments.

Initialize a standard evolutionary optimizer, using Gaussian mutation, with points in the unit square, 0 ≤ x, y ≤ 1, and a walking triangle optimizer with WUC gene length 30 starting with the standard simplex. Use the function

$$f(x, y) = \frac{1}{1 + (x - R)^2 + (y - R)^2}$$

Successes in 30 trials

                   R=0   R=40   R=100
Standard           30    30     3
Walking Triangle   30    30     30

The walking triangle representation is more robust to bad initialization. This is important when you don’t know where the optima are.


Parameter Setting: $f(x, y, z) = \frac{1}{(x-10)^2 + (y-10)^2 + (z-10)^2 + 1}$ with WUC

Default settings are population size 178, maximum number of mutations 3, gene length 30. Notice that once gene length is long enough it becomes a soft parameter.


What is a WTR chromosome?

So far, we have defined the WTR chromosomes as strings of commands. The commands w1u0c1 applied to a simplex S are, in more familiar form,

c1(u0(w1(S)))

The point is that WTR chromosomes specify functions from the set of all simplices to itself. This means that the point specified by a chromosome depends completely on the starting simplex. So far we have used the same starting simplex for everything, but there is a potential gain from changing the starting simplex.

So where do we get better starting simplices?


Recentering

Once a population of WTR genes has been evolving for a while, they have typically found a simplex with better fitness than the standard starting simplex. Recentering consists of periodically replacing the starting simplex with the best simplex found so far.

The simplex C1 is the initial starting simplex. The first recentering replaces it with simplex C2; the second recentering replaces C2 with C3. The amount of time between recenterings is called an epoch. In addition to replacing the starting simplex, a new random population is generated.


Recentering Results: $f(x, y, z) = \frac{1}{(x-10)^2 + (y-10)^2 + (z-10)^2 + 1}$ using WUC

Note the large change in the number of failures. The slightly longer time to solution is the result of far more runs finding a solution.


Recentering Results

There is no large change in the number of failures this time. The best maximum number of mutations is now clearly three with recentering. Overall, the time to solution is better with recentering for three or more maximum mutations.


Recentering Results

The large change in the number of failures appears only at the shortest gene lengths, but overall reliability is up. Gene length remains a soft parameter, but the results are more variable with recentering.


Walk This Way?

Here are the best walks for three different displaced hill functions.Highlighted are inverse pairs that, in effect, shorten the gene.

Function  Walk                              Fitness
F1        U0U0W0C2C1C3C3C3W3U0W2C3U3C2C0    1.0
          C1U2W0W2C1W3U0C3C0C0U3C1C3W3U1
F2        U0U0W0W0W0C2U2U0U0U2C2C3C1C1C2    1.0
          W3W1C2C0U0C3W2C3C2C3W1U0W3U0W0
F3        U0U0C2U3W2W2W0W0C0W2W0C1C0C0W1    0.999992
          U2U3C1C0C2C2C0W2U3W1C0U0C2C0C1

$$F_1(x_1, x_2, x_3) = \frac{1}{1 + \sum_{i=1}^{3}(x_i - 10)^2},$$

$$F_2(x_1, x_2, x_3) = \frac{1}{1 + \sum_{i=1}^{3}(x_i - 100)^2},$$

$$F_3(x_1, x_2, x_3) = \frac{1}{1 + \sum_{i=1}^{3}(x_i - 5 - 10i)^2}$$


Open-ended Functions : Trendwave

An open-ended function has no global maximum and, for any value v, an optimum with a value larger than v. These functions are not interesting in their own right; rather, they are used to test an algorithm’s exploratory power.

Trendwave: $$F_1(\vec{x}) = \sum_{x_i} \left( \frac{x_i}{20} + \cos(x_i) \right)$$

Trendwave is a field of cosine waves in every direction added to a shallow plane that rises in the direction (1, 1, . . . , 1).
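A direct transcription of trendwave, reading the formula as a sum of x_i/20 + cos(x_i) over the coordinates; the function name is illustrative.

```python
import math

def trendwave(x):
    """Shallow rising plane plus a cosine wave along every coordinate."""
    return sum(xi / 20.0 + math.cos(xi) for xi in x)

origin = [0.0, 0.0]
shifted = [2.0 * math.pi, 2.0 * math.pi]   # one full wave along the diagonal
print(trendwave(origin))
print(trendwave(shifted) - trendwave(origin))
```

Stepping one full period along the diagonal leaves the cosine terms unchanged and raises the value by 2π/20 per coordinate, which is why every optimum has a better one further out.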


Results for the trendwave function.

The walking triangle representation finds the trend and ignores the cosinewaves.


Example Genes for WTR, Length 30.

Let’s look at the solutions that the algorithm found.

Trendwave U0U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2

Bent Trendwave U0U0U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1U1

Triple Hat U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2U2

The genes show that the WTRs are purely discovering the trend and completely ignoring the cosine “noise”. The U0 in the trendwave is not too helpful, but it doesn’t hurt too much. It is likely that it remained in the best-of-run gene by epistasis.


Results for standard evolutionary optimizer.

A walking triangle representation found fitness values of roughly $10^{21}$ for each of the three open-ended fitness functions. The Gaussian evolutionary optimizer was run 100 times on these functions, initialized in the unit square. The sorted results are shown below.

The log of the WTR fitness exceeded the plain fitness of the standard optimizer.


New Topic: Representations for Evolving Networks


Networks Defined

Networks or graphs are specified by a set of nodes and a set of edges. Edges are pairs of nodes that are connected. The graph below has ten vertices and fifteen edges.

To specify a graph we must know how many vertices there are and which pairs of vertices are at the end of edges. The shortest path between vertices places a distance structure on the set of vertices; when no path exists this distance is infinite.
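The distance structure can be computed with a breadth-first search. In this sketch the graph is stored as a dictionary from each vertex to a list of its neighbors, which is one convenient choice among several.

```python
import math
from collections import deque

def distances_from(adjacency, source):
    """Shortest path length from `source` to every vertex;
    infinite when no path exists."""
    dist = {v: math.inf for v in adjacency}
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if dist[u] == math.inf:       # first visit gives the shortest distance
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

# The path 0 - 1 - 2 plus an isolated vertex 3.
graph = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(distances_from(graph, 0))
```

Running the search from every vertex yields the full distance structure of the graph.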


Graphs are used to...

Form a computationally tractable representation of essential features of a geography. This application arises in the famous map coloring problems.

Diagram networks of human beings or computers such as social networks, contact networks for epidemiology, LANs, the Internet, or co-author networks among researchers.

Diagram and resolve conflicts, e.g. the vertices can be meetings and edges form when two meetings have a common participant.

Planar graphs are graphs that can be drawn without edges crossing. These are used to lay out electric circuit diagrams for manufacturing.

Graphs are also used in many mathematical enterprises such as finding error correcting codes.


Finding Contact Networks for Epidemics

Our first graph induction problem was: given the number of people infected in each time period of an epidemic, find a plausible contact network. For this we search the space of networks for those likely to yield the given behavior.

Reference

D. Ashlock, E. Shiller, and C. Lee, Comparison of Evolved Epidemic Networks with Diffusion Characters, Proceedings of IEEE Congress on Evolutionary Computation, pp. 781-788, 2011.


Representations for Evolving Graphs

The TADS representation for evolving networks is encoded as a list of large integers that are interpreted as specifying commands:

Add: for two vertices, force an edge to exist between them.

Delete: for two vertices, make sure no edge exists between them.

Toggle: for two vertices, toggle the existence of an edge between them.

Swap: for two vertices and two of their neighbors, perform the edge swap shown above.

The toggle-add-delete-swap (TADS) representation uses these four commands, assigning a probability that each will be generated when creating or mutating strings of the commands. A starting graph is also required to complete the specification of an instance of the TADS representation.
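Expressing a TADS chromosome can be sketched as below. Two things are assumptions: commands are written as explicit tuples rather than decoded from large integers, and the swap is taken to be the usual degree-preserving exchange in which edges a-b and c-d become a-d and c-b, skipped when it cannot be performed.

```python
def edge(a, b):
    return frozenset((a, b))

def apply_tads(commands, start_edges):
    """Apply Toggle / Add / Delete / Swap commands to a starting edge set."""
    edges = set(start_edges)
    for op, *v in commands:
        if op == "S":
            a, b, c, d = v
            # Assumed swap: a-b, c-d  ->  a-d, c-b, only when it is well defined.
            if (len({a, b, c, d}) == 4
                    and edge(a, b) in edges and edge(c, d) in edges
                    and edge(a, d) not in edges and edge(c, b) not in edges):
                edges -= {edge(a, b), edge(c, d)}
                edges |= {edge(a, d), edge(c, b)}
            continue
        a, b = v
        if a == b:
            continue                      # no loops
        if op == "T":
            edges ^= {edge(a, b)}
        elif op == "A":
            edges.add(edge(a, b))
        elif op == "D":
            edges.discard(edge(a, b))
    return edges

gene = [("A", 0, 1), ("A", 2, 3), ("S", 0, 1, 2, 3)]
print(sorted(sorted(e) for e in apply_tads(gene, set())))  # -> [[0, 3], [1, 2]]
```

The starting edge set passed as `start_edges` plays the role of the starting graph, and the swap leaves the number of edges and every vertex degree unchanged.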


Comment on the TADS representation

The starting graph is a place to embed some types of domain knowledge.

The swap operation is edge-density neutral and searches connectivity.

The TADS representation is actually a parametrized family of representations. From the four probabilities for toggle, add, delete, and swap it is possible to compute (and thus choose) the expected edge density of random population members.

It is possible for TADS to self-adapt its representation on the fly.

Representation length is an important parameter that controls locality of search.

Reference: D. Ashlock, J. Schonfeld, L. Barlow, and C. Lee, Test Problems and Representations for Graph Evolution, Proceedings of the IEEE Symposium on the Foundations of Computational Intelligence, PP 38-45, 2014.

Reference: D. Ashlock and L. Barlow, A Class of Representations for Evolving Graphs, accepted to the 2015 Congress on Evolutionary Computation, 2015.


Lessons learned while evolving graphs.

The obvious representation, a binary string specifying the presence or absence of each edge, performs very badly for most of the fitness functions we have examined. This is because almost all graphs have 50 ± ε% edge density and most interesting graphs have low edge density.
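The density claim is easy to check empirically. The sketch below draws one random bit string for a 100-vertex graph and measures the fraction of edges present; the vertex count and the seed are arbitrary choices for the illustration.

```python
import random

def edge_density(bits):
    """Fraction of possible edges that are present in a binary edge string."""
    return sum(bits) / len(bits)

rng = random.Random(1)            # fixed seed, arbitrary
n = 100                           # number of vertices (illustrative)
m = n * (n - 1) // 2              # one bit per possible edge
bits = [rng.randint(0, 1) for _ in range(m)]
density = edge_density(bits)
print(f"{m} possible edges, density = {density:.3f}")
```

With thousands of independent bits the density concentrates tightly around one half, so uniformly initialized binary strings almost never sample the sparse graphs that are usually of interest.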

Evolving neural net topologies is a domain where graph evolution has been done for years. Neural net training algorithms are so powerful that finding good topologies is usually not hard.

The TADS representation incorporates, for various parameter values, several published representations. This has been very convenient.

TADS can solve distance structure, network robustness, epidemic fitting, cooperation-induction, and locating difficult-to-color graph problems. While doing this we learned that specializing the representation to the problem pays large bonuses in graph evolution.

Having a parametrized representation enables the performance of parameter tuning studies across a space of representations. This has enabled the solution of some fairly hard problems.


And now for something completely different.


Automatic Content Generation

Automatic content generation (ACG) consists of using an algorithm to create content. A common task for ACG is to generate maps for levels of a video game. We will look at a variety of representations for maps situated on a rectangular grid created via evolutionary computation. The representations we will look at:

A simple binary representation,

A constructive representation that specifies walls,

A constructive representation that digs out rooms,

A more complex representation,

A height-based representation, and

A cellular automata representation.

Examine the character of the level maps that are generated.


A Direct Binary Representation

This representation uses the complex encoding 1=full, 0=empty.

It only works if you start mostly empty and let evolution fill in more full squares later. We call this technique sparse initialization.

The fitness function is based on dynamic programming that measures distances between checkpoints, shown as green dots.

A great deal may be done by playing with the fitness function; the new goal is to look at the impact of representation.
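A minimal sketch of the two ingredients named above: sparse initialization of the bit grid and a checkpoint-to-checkpoint distance. Breadth-first search is swapped in here as a simple stand-in for the dynamic programming used in the original work; the function names and the fill probability parameter are hypothetical.

```python
from collections import deque
import random

def sparse_grid(rows, cols, fill_prob, rng):
    """Direct binary map: 1 = full, 0 = empty, with few full cells at the start."""
    return [[1 if rng.random() < fill_prob else 0 for _ in range(cols)]
            for _ in range(rows)]

def grid_distance(grid, start, goal):
    """Length of the shortest path through empty cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] or grid[goal[0]][goal[1]]:
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not grid[nr][nc] and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None
```

A map whose checkpoints are mutually unreachable returns `None`, which a fitness function would typically treat as a fatal defect.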


A Positive Generative Representation

This representation is an array of wall building commands.

Sparse initialization is used here as well, but in the form of short initial wall lengths.

The representation is additionally constrained by a limit on total number of squares that may be filled by walls.

The results here look more planned and less cave-like.

This representation uses walls with a starting point, direction, and length. There are eight available directions; it is trivial to modify the representation to use only some directions, e.g. horizontal and vertical.
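A sketch of how such a chromosome might be expressed as a map. The tuple layout `(row, col, direction, length)`, the ordering of the eight directions, and the choice to stop building once the fill limit is reached are assumptions for the illustration.

```python
# Eight compass directions as (row step, column step); the order is arbitrary.
DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def build_walls(rows, cols, walls, max_filled):
    """Express a list of wall commands on an initially empty grid."""
    grid = [[0] * cols for _ in range(rows)]
    filled = 0
    for r, c, d, length in walls:
        dr, dc = DIRS[d % 8]
        for _ in range(length):
            if not (0 <= r < rows and 0 <= c < cols):
                break                      # wall ran off the map
            if grid[r][c] == 0:
                if filled == max_filled:
                    return grid            # fill limit reached: stop building
                grid[r][c] = 1
                filled += 1
            r, c = r + dr, c + dc
    return grid
```

Restricting the representation to horizontal and vertical walls amounts to replacing `DIRS` with its four axis-aligned entries.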


A Negative Generative Representation

This representation is an array of room and corridor specifications to be removed from a matrix that starts full.

Sparse initialization is not needed. In this representation it is critical to have enough rooms and a “skip me” bit.

The entrance and exit are always placed in rooms that are part of the starting configuration to reduce the probability of death from non-traversability.

This representation specifies rooms and corridors with an upper left corner and horizontal and vertical dimensions. Corridors have one dimension set to one, rooms are at least two-by-two.
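The expression step for this representation can be sketched as follows. The field order `(skip, top, left, height, width)` is an assumption for the example; clipping rectangles at the map boundary is likewise one possible choice.

```python
def dig_rooms(rows, cols, specs):
    """Start from a full grid and carve out each room or corridor."""
    grid = [[1] * cols for _ in range(rows)]
    for skip, top, left, height, width in specs:
        if skip:
            continue                       # the "skip me" bit silences this gene
        for r in range(max(0, top), min(top + height, rows)):
            for c in range(max(0, left), min(left + width, cols)):
                grid[r][c] = 0
    return grid
```

The skip bit lets mutation switch a room on or off without destroying its coordinates, so the number of expressed rooms can vary while the chromosome length stays fixed.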


A Positive Generative Representation with Multiple Wall Types

This representation is another version of the positive representation with three types of walls that make two mazes.

The stone-fire and stone-water mazes are evaluated with different fitness functions to create a tactical situation for two agent types.

A key factor is how to combine the two fitnesses; here a geometric average was used.

Citations

D. Ashlock, C. Lee, and C. McGuinness, Search-Based Procedural Generation of Maze-Like Levels, IEEE Transactions on Computational Intelligence and AI in Games, 3(3), PP 260-273, 2011.

D. Ashlock, C. Lee, and C. McGuinness, Simultaneous Dual Level Creation for Games, IEEE Computational Intelligence Magazine, 6(2), PP 26-37, 2011.


Do They Look Different?

Here are mazes evolved with different representations. The choice of representation gives substantial control over the character and appearance of the maps that evolve. With the exception of the fire-and-ice maze, these maps were evolved with the same fitness function.


Hacking the Representation: Symmetry

These mazes all use a version of the direct, binary representation. Each is modified to have a different type of symmetry. The bits are used twice or four times to specify full or empty squares. Many other types of symmetry, including periodicity, could be imposed.
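Reusing bits to impose symmetry can be sketched as an expansion step between the chromosome and the map. The two functions below are hypothetical examples: one uses each bit four times (mirror symmetry in both axes), the other uses each bit twice (180 degree rotational symmetry).

```python
def expand_fourfold(bits, half_rows, half_cols):
    """Each bit is used four times: mirror a quarter map left-right and top-bottom."""
    quarter = [bits[r * half_cols:(r + 1) * half_cols] for r in range(half_rows)]
    top = [row + row[::-1] for row in quarter]
    return top + [row[:] for row in top[::-1]]

def expand_rotational(bits, half_rows, cols):
    """Each bit is used twice: the bottom half is the top half rotated 180 degrees."""
    top = [bits[r * cols:(r + 1) * cols] for r in range(half_rows)]
    bottom = [row[::-1] for row in top[::-1]]
    return top + bottom
```

Because the symmetry lives in the expansion step, the evolutionary algorithm itself is unchanged; it simply searches a shorter bit string.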


Hacking the Representation: close-ups

Note that while this technique is implemented with the direct, binary representation it can be used with any representation.


Hacking the Representation: close-ups

These examples show reflective and rotational symmetry. It is also possible to use translational symmetry to create a map with a large number of repeating subunits.


An easier fitness function with required content.

These rooms have checkpoints in the center of all four sides and in each of the planned structures. For each pair we ask the algorithm to maximize, minimize, or ignore the distance between that pair of checkpoints. If M is the sum of the long distances and m is the sum of the short ones, then

$\frac{M}{m+1}$

is a good fitness function. It gives variable, replayable content in “tiles”.
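As a small worked example of this ratio, the sketch below assumes the pairwise distances have already been computed and that each pair carries a label `"max"`, `"min"`, or nothing; the dictionary layout and the function name are illustrative choices.

```python
def checkpoint_fitness(distances, goals):
    """M / (m + 1): M sums distances to maximize, m sums distances to minimize."""
    M = sum(d for pair, d in distances.items() if goals.get(pair) == "max")
    m = sum(d for pair, d in distances.items() if goals.get(pair) == "min")
    return M / (m + 1)

distances = {("a", "b"): 10, ("a", "c"): 4, ("b", "c"): 7}
goals = {("a", "b"): "max", ("a", "c"): "min"}   # ("b", "c") is ignored
print(checkpoint_fitness(distances, goals))      # 10 / (4 + 1) = 2.0
```

The `+ 1` in the denominator keeps the fitness finite when every short distance is zero or no pair is marked for minimization.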


Avoiding quadratic scaling by putting together pieces

Huge mazes can be assembled out of evolved tiles. Note the user-designed features present in some of the tiles. With controlled tile properties, we can achieve replayability. The assembly plan is also an evolved maze with controlled properties. It is a 6x6 chromatic maze.


Putting Together Other Pieces

When assembling large mazes, tiles evolved with different representations may be mixed and matched. These examples do not begin to approach the limits of the technique; they were chosen to fit on the screen.


Another Representation: Height Maps

The height map above, if you place a wall between squares that have a height difference more than a threshold, generates the level map at the left. The height map is generated with an evolvable state-conditioned fractal quadrature.
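The thresholding step from height map to level map can be sketched directly. Here the height map is taken as a given list of lists (the fractal generator is not reproduced), and a wall is recorded as the pair of adjacent cells it separates; that encoding is a choice made for the example.

```python
def walls_from_heights(heights, threshold):
    """Place a wall between side-adjacent cells whose heights differ by more than threshold."""
    rows, cols = len(heights), len(heights[0])
    walls = set()
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r + 1, c), (r, c + 1)):   # neighbor below, neighbor to the right
                if nr < rows and nc < cols:
                    if abs(heights[r][c] - heights[nr][nc]) > threshold:
                        walls.add(((r, c), (nr, nc)))
    return walls
```

Raising the threshold removes walls monotonically, so a single evolved height map yields a family of related level maps.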

Reference: D. Ashlock and C. McGuinness, Landscape Automata for Search Based Procedural Content Generation, in Proceedings of the 2013 IEEE Conference on Computational Intelligence in Games, PP 9-16.


Another Representation: Cellular Automata Rules

The maps above are coded as a competition matrix that says what score each cell state gets from others. The automata run is for a short time. Fitness is the size of the largest connected component, penalized by the deviation from half full.

Reference: First CA-level map paper

Lawrence Johnson, Georgios N. Yannakakis, Julian Togelius, Cellular automata for real-time generation of infinite cave levels, Proceedings of PCGames, 2010.


Cellular Automata are Scalable

Cellular automata rules are applied locally. Once a rule is evolved, much larger maps can be made by simply applying the rule to a larger grid. Changing the initial conditions creates maps with the same character but different details.

The initial conditions are generated uniformly at random. The grid where the automata are tested for fitness is toroidal so the maps also tile correctly.
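The scalability and tiling properties follow from the rule being local and the grid wrapping around. The sketch below shows this with a toroidal update step; `cave_rule` is a placeholder majority-style rule chosen for illustration, not the evolved competition-matrix rule from the slides.

```python
import random

def step(grid, rule):
    """One synchronous update on a torus: indices wrap with the modulo operator."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            full = sum(grid[(r + dr) % rows][(c + dc) % cols]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))
            new[r][c] = rule(grid[r][c], full)
    return new

def cave_rule(state, full_neighbors):
    """Placeholder rule (an assumption): a cell is full if most neighbors are full."""
    return 1 if full_neighbors >= 5 else 0

def make_map(rows, cols, rule, steps, seed):
    """Random initial conditions, then a short run of the automaton."""
    rng = random.Random(seed)
    grid = [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]
    for _ in range(steps):
        grid = step(grid, rule)
    return grid
```

The same `rule` can be passed to `make_map` with any grid size, and a different `seed` gives a different map produced by the same rule.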


A Representation for Data Manipulation


Group Theory in Representation for Evolutionary Computation

Earlier in this presentation, we have seen the adjacent transposition representation for ordered genes and walking triangle representations. Both of these representations arise from group theory, a field of abstract algebra. While they may be many other things, all groups can be thought of as closed sets of invertible functions - this is Cayley’s famous theorem on permutation groups, rephrased into engineering. Consider the invertible, increasing functions from the unit interval 0 ≤ x ≤ 1 to itself.

$f(x) = \frac{11x}{10x+1}$ and $f(x) = \frac{x + \frac{1}{6}\sin(6x)}{1 + \frac{1}{6}\sin(6)}$


Distorting the y -axis of an image

$D(y) = \frac{2y}{y+1}$, $D(y) = y$, $D(y) = \frac{y}{2-y}$

The images are indefinite resolution, defined by a series of comparisons, and effectively 17 billion pixels on a side, meaning that if we apply the warp function and then ask the picture its color we avoid undesired distortion. We can map warps of the unit interval to any interval by an affine transformation. The distortion can be applied in any direction or even in alternate coordinate systems, e.g. radially. Because the warps are drawn from a group the composition of warps is a warp.
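A minimal sketch of warping a picture that is defined as a function rather than as pixels. The picture `stripes`, the helper names, and the particular warps are illustrative assumptions; the point is that the warp is applied to the coordinate first and the picture is asked for its color afterwards.

```python
def mobius(w):
    """Warp of the form (w + 1) x / (w x + 1); w = 1 gives D(y) = 2y / (y + 1)."""
    return lambda x: (w + 1) * x / (w * x + 1)

def power(a):
    """Warp of the form x ** a for a > 0."""
    return lambda x: x ** a

def compose(*warps):
    """Composition of warps, applied left to right; the result is again a warp."""
    def composed(x):
        for w in warps:
            x = w(x)
        return x
    return composed

def warp_y(picture, warp):
    """Distort the y-axis: warp the coordinate, then ask the picture its color."""
    return lambda x, y: picture(x, warp(y))

def stripes(x, y):
    """A toy indefinite-resolution picture defined by a comparison."""
    return "red" if y < 0.5 else "blue"

warped = warp_y(stripes, mobius(1))
print(warped(0.0, 0.3), warped(0.0, 0.4))
```

Under this warp the color boundary moves from y = 1/2 to y = 1/3, since 2y / (y + 1) = 1/2 exactly when y = 1/3.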


A Representation for Evolving Space Warps

Suppose we have a collection of space warp functions. Then a simple string representation saying in what order to compose them gives us an evolvable representation. We have already seen the image manipulation application of warp functions. Another is inverse CDF discovery.

Take a set of data and sort it into increasing order.

Normalize the data to the unit square.

Evolve a space warp that, when applied to the data points, minimizes the error of the data points with f (x) = x .

This system performs evolutionary regression to find the inverse of the cumulative distribution function of the data, after you reverse the normalization. There are already packages for doing this, but this method, again because of the group theoretic nature of the representation, yields invertible, infinitely differentiable functions, something the standard packages do not do. This means that the approximate CDF and PDF of the data may also be retrieved.
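The three steps above can be sketched as follows. Several details are assumptions for the illustration: the data point for the i-th sorted value is taken to be (i/(n-1), normalized value), the warp is applied to the normalized value and compared with the position on the diagonal, and a simple mutate-and-keep-if-no-worse loop stands in for a full evolutionary algorithm. The data must contain at least two distinct values.

```python
import random

def unit_square_points(data):
    """Sort the data and normalize positions and values to [0, 1]."""
    ys = sorted(data)
    lo, hi, n = ys[0], ys[-1], len(ys)
    return [(i / (n - 1), (y - lo) / (hi - lo)) for i, y in enumerate(ys)]

def apply_string(genes, library, x):
    """A chromosome is a string of indices saying which warps to compose, in order."""
    for g in genes:
        x = library[g](x)
    return x

def diagonal_error(genes, library, points):
    """Mean squared distance of the warped points from the line f(x) = x."""
    return sum((apply_string(genes, library, y) - p) ** 2
               for p, y in points) / len(points)

def evolve(points, library, length, generations, rng):
    """Minimal search loop: mutate one gene, keep the child if it is no worse."""
    best = [rng.randrange(len(library)) for _ in range(length)]
    best_err = diagonal_error(best, library, points)
    for _ in range(generations):
        child = best[:]
        child[rng.randrange(length)] = rng.randrange(len(library))
        err = diagonal_error(child, library, points)
        if err <= best_err:
            best, best_err = child, err
    return best, best_err

library = [lambda x: x, lambda x: x ** 0.5, lambda x: x * x]   # toy warp library
points = unit_square_points([(i / 10) ** 2 for i in range(11)])
best, err = evolve(points, library, 3, 200, random.Random(0))
```

In this toy data set the values are the squares of their positions, so the single warp `x ** 0.5` already places every point on the diagonal.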


Another Representation

It turns out that if f (x) and g(x) are both increasing invertible functions from the unit interval to itself then so is λf (x) + (1− λ)g(x) for 0 ≤ λ ≤ 1. If we have n basis functions, then we can evolve coefficients and find space warps via simple real parameter optimization.
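A sketch of the resulting real-parameter representation: the chromosome is a vector of non-negative weights, normalized to sum to one so that the blend is a convex combination of the basis warps. The normalization step and the function name are assumptions made for the example.

```python
def blend(warps, weights):
    """Convex combination of warps; weights are non-negative and not all zero."""
    total = sum(weights)
    coeffs = [w / total for w in weights]
    return lambda x: sum(c * f(x) for c, f in zip(coeffs, warps))

basis = [lambda x: x, lambda x: x * x]      # two illustrative basis warps
mix = blend(basis, [1, 1])                  # equal weights: (x + x^2) / 2
print(mix(0.5))                             # 0.5 * 0.5 + 0.5 * 0.25 = 0.375
```

Every weight vector decodes to a valid warp, so standard real-valued mutation and crossover can be used without a repair step.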

The shapes above were created with human-in-the-loop evolution in which a human was selecting the real parameters. These shapes are another example of use of indefinite resolution techniques.


Where are you getting the original warp functions?

Here is a starter set:

$f(x) = x^{\alpha}$ for $0 < \alpha < \infty$,

$g(x) = \frac{(\omega+1)x}{\omega x+1}$, $0 < \omega < \infty$,

$h(x) = \frac{\int_0^x q(t)^2 \, dt}{\int_0^1 q(s)^2 \, ds}$, where q(x) is bounded on the unit interval and integrable.

The last example is a design-your-own plan. The example function:

$f(x) = \frac{x + \frac{1}{6}\sin(6x)}{1 + \frac{1}{6}\sin(6)}$

from earlier in the presentation was created by integrating $\cos^2(3x)$. In general it is easy to find warp functions and even choose them to be application specific.
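For readers who want the intermediate step, the integration for $q(x) = \cos(3x)$ can be worked out with the half-angle identity $\cos^2\theta = \frac{1}{2}(1 + \cos 2\theta)$:

```latex
\int_0^x \cos^2(3t)\,dt
  = \int_0^x \tfrac{1}{2}\bigl(1 + \cos(6t)\bigr)\,dt
  = \frac{x}{2} + \frac{\sin(6x)}{12},
\qquad
h(x) = \frac{\frac{x}{2} + \frac{\sin(6x)}{12}}{\frac{1}{2} + \frac{\sin(6)}{12}}
     = \frac{x + \frac{1}{6}\sin(6x)}{1 + \frac{1}{6}\sin(6)} .
```

The result satisfies $h(0) = 0$ and $h(1) = 1$, and its derivative is a positive multiple of $\cos^2(3x)$, which is non-negative with only isolated zeros, so $h$ is increasing on the unit interval.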


Data Manipulation Representation Summary

Applications include image modification and fitting distributions to data. Many other applications are possible.

The representation based on invertible, increasing maps is one example of a representation that arises from group theory. Others include the walking triangle and adjacent transposition representations. We have others in development.

The fact that we designed the representation using groups gave us added power - the ability to recover PDFs and CDFs from inverse CDFs, which was what was being approximated.

The theory of groups is a branch of mathematics in which one does something to something and then compares the result with the results of doing the same thing to something else, or something else to the same thing. - James Newman


It Is Too Much. Let Me Sum Up...

Representation is a place to incorporate domain knowledge.

Careful design of representation can affect solution time more than parameter tuning. You should, of course, do both.

Adaptive representations, an emerging technology, can have an incredible impact on performance. We saw this with the Australian representation for the SAW problem.

It is possible to make parameterized families of representations such as the TADS representation for network or graph evolution.

In real optimization, different representations granted different capabilities. The Sierpinski-Poladian representation permits enumeration of optima. The gene expression representation strongly favors exploration.

The automatic map generation example showed that representations have a large impact on style of the evolved maps.

Algebraic groups, like bowties, are cool.


Many Thanks

Thanks to the Natural Sciences and Engineering Research Council of Canada for support of this work. I am indebted to Eun-Young Kim, James Huges, Sherridan Houghton, Leon Poladian, Wendy Ashlock, Colin Lee, Cameron McGuinness, and Jeremy Gilbert for their help and inspiration.

