BMC Evolutionary Biology - Universiteit Utrechtbioinformatics.bio.uu.nl/pdf/DeBoer.beb10-10.pdf ·...

This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formattedPDF and full text (HTML) versions will be made available soon.

Eco-evolutionary Dynamics, Coding Structure and the Information Threshold

BMC Evolutionary Biology 2010, 10:361 doi:10.1186/1471-2148-10-361

Folkert K de Boer ([email protected])Paulien Hogeweg ([email protected])

ISSN 1471-2148

Article type Research article

Submission date 19 March 2010

Acceptance date 24 November 2010

Publication date 24 November 2010

Article URL http://www.biomedcentral.com/1471-2148/10/361

Like all articles in BMC journals, this peer-reviewed article was published immediately uponacceptance. It can be downloaded, printed and distributed freely for any purposes (see copyright

notice below).

Articles in BMC journals are listed in PubMed and archived at PubMed Central.

For information about publishing your research in BMC journals or any BioMed Central journal, go to

http://www.biomedcentral.com/info/authors/

BMC Evolutionary Biology

© 2010 de Boer and Hogeweg , licensee BioMed Central Ltd.This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),

which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

mailto:[email protected]

mailto:[email protected]

http://www.biomedcentral.com/1471-2148/10/361

http://www.biomedcentral.com/info/authors/

http://creativecommons.org/licenses/by/2.0

Eco-evolutionary Dynamics, Coding Structure and theInformation Threshold

Folkert K de Boer∗and Paulien Hogeweg

Theoretical Biology and Bioinformatics, Utrecht University,Padualaan 8, 3584 CH, Utrecht, Netherlands

Email: FK de Boer∗- [email protected]; P Hogeweg - [email protected];

∗Corresponding author

Abstract

Background: The amount of information that can be maintained in an evolutionary system of replicators islimited by genome length, the number of errors during replication (mutation rate) and various external factorsthat influence the selection pressure. To date, this phenomenon, known as the information threshold, has beenstudied (both genotypically and phenotypically) in a constant environment and with respect to maintenance (asopposed to accumulation) of information. Here we take a broader perspective on this problem by studying theaccumulation of information in an ecosystem, given an evolvable coding structure. Moreover, our setup allowsfor individual based as well as ecosystem based solutions. That is, all functions can be performed by individualreplicators, or complementing functions can be performed by different replicators. In this setup, where boththe ecosystem and the individual genomes can evolve their structure, we study how populations cope with highmutation rates and accordingly how the information threshold might be alleviated.

Results: We observe that the first response to increased mutation rates is a change in coding structure. Atmoderate mutation rates evolution leads to longer genomes with a higher diversity than at high mutation rates.Thus, counter-intuitively, at higher mutation rates diversity is reduced and the efficacy of the evolutionary processis decreased. Therefore, moderate mutation rates allow for more degrees of freedom in exploring genotype spaceduring the evolutionary trajectory, facilitating the emergence of solutions. When an individual based solutioncannot be attained due to high mutation rates, spatial structuring of the ecosystem can accommodate theevolution of ecosystem based solutions.

Conclusions: We conclude that the evolutionary freedom (eg. the number of genotypes that can be reached byevolution) is increasingly restricted by higher mutation rates. In the case of such severe mutation rates that anindividual based solution cannot be evolved, the ecosystem can take over and still process the required informationforming ecosystem based solutions. We provide a proof of principle for species fulfilling the different roles in anecosystem when single replicators can no longer cope with all functions simultaneously. This could be a first stepin crossing the information threshold.

1

BackgroundThe information threshold [1] puts a limit on themaximum amount of information that can be evo-lutionarily maintained by a single population ofreplicators. For an evolutionary process this im-plies that the length of genomes of replicators andthe number of errors during replication (mutationrate) is limited. It raises the question of how a’simple’ prebiotic system can evolve towards a morecomplex living system. To increase the complex-ity of a prebiotic system, replicators which shouldboth store information and act as an enzyme, musthave been able to accumulate and pass on infor-mation correctly. To correctly transfer more andmore genetic information between generations, thefidelity of replication has to improve as well. Thiscould be done for example with specific replicase orproofreading enzymes. However, this requires an in-creased coding length, which cannot be maintainedwithout these same enzymes. Thus it is not possibleto have a (large) genome without enzymes, but theevolution of enzymes would not be possible withoutlarge genomes. This is referred to as Eigen’s para-dox [2, 3].

This paper aims to extend the context in whichthe information threshold is studied and its possi-ble role in the early evolution of life. Traditionallythe information threshold is formulated in termsof a master and quasispecies of genotypes with astatic, single-peaked fitness landscape [1]. Thesestudies have been extended by taking a nonlineargenotype-phenotype mapping into account, usingRNA folding as a prototype example. In such sys-tems neutral mutations play an important role, anddue to this neutrality larger sequences can be main-tained, but this increase is limited [4, 5]. We extendthese previous studies in three different directions:(1) Replicators have a flexible coding structure,leading to a variable genome length and variablegenotype-phenotype mapping, (2) the acquisition,rather than the maintenance of information is stud-ied, and (3) replicators evolve within a non staticenvironment.

The first attempt that has been proposed tocross the information threshold involved the intro-duction of multiple replicators [6]. Such hypothet-ical replicators, later on mostly considered to beRNA (or another catalyst), could form the basis ofa prebiotic ecosystem, sometimes referred to as ’the

RNA-world’ [7]. Most likely, the only viable solu-tion to Eigen’s paradox lies in the co-existence ofseveral different replicators, such that the informa-tion necessary for coding enzymes can be stored andtransmitted by a population of co-existing smallerreplicators. Due to the fact that the co-existence ofdifferent species is typically considered to be an eco-logical problem, these approaches have been called’the ecological solution’ [8]. The two main modelsthat attempt to formulate such ecological solutionsto Eigen’s paradox are the hypercycle model [6,9,10]and the metabolic system model [8, 11, 12]. How-ever, these studies address the maintenance of areplicator-ecosystem despite mutations (or stabilityagainst invasion of parasite mutants), rather thanthe generation of an ecosystem as a consequenceof high mutation rates. Moreover, although theoriginal question involved the mechanism of obtain-ing functionality despite high mutation rates, nofunction beyond reproduction was incorporated inthese models. Here we study how a system cancope with externally defined requirements undervarious mutation rates (per base). We consider thecase that viable replicators can evolve functional-ity through either individual or population baseddiversity [13–15]- eg. all replicators perform allfunctions by themselves (like a Swiss army pocketknife) or different functions are divided over differ-ent replicators, the latter being an ecosystem basedsolution. In other words, the ecosystem as a wholecan provide a solution for the posed ’problems’ inthe environment. In our system these problems inthe environment change over time, co-evolving withthe replicators, resulting in a dynamic fitness land-scape.Regarding this environment, almost all theoreticalstudies published so far have demonstrated thatsome kind of spatial structure is indispensable forthe persistence and/or the parasite resistance of anyfeasible replicator system [8, 9, 11]. Through spatialpattern formation, selection is extended from purelyindividual-level selection to multi-level selection.Multi-level selection is considered to be a definingproperty of ecosystems [9, 10] and the success ofevolution strongly depends on how the ecosystem isable to structure itself [16, 17]. As such, our modelsystem allows for an emergent structure on two lev-els: both the coding of replicators and the spatialdistribution within the ecosystem.

To summarize, we study the problem of the infor-

2

mation threshold, bearing in mind that informationhas to be obtained, rather than merely maintainedunder high mutation rates. We use a flexible codingstructure at the level of individuals and in additionwe allow for the evolution of ecosystem based so-lutions using a spatial co-evolutionary setup. Westudy how such a system copes with high mutationrates, ie. whether an ecosystem based solution canreplace an individual based solution when the latteris not attained.

ResultsWe study the information threshold assuming thatall replicators are under physiological constraintsthat require the acquisition of mechanisms (or path-ways) to process food for survival and reproduc-tion. Replicators have to cope with their varying(co-evolving) environment, having different solutionsfor different situations (prey). To this end we useco-evolutionary function approximation as a tool formodeling eco-evolutionary dynamics [15,17,18]. Wedevelop a model-ecosystem of predators, prey andscavengers. Predators and scavengers are the stud-ied replicators and the consumption of prey acts asan analogy for coping with the environment. Theenvironment takes the form of co-evolving numeri-cal examples or problems (prey). Prey consists ofnumerical values, here (x,y). During prey replica-tion, these values can change by mutation. Copingwith the environment (eating prey) is defined as pro-ducing a numerical value defined by a global exter-nal function on the values of a prey (for example ifthe global function would be simply addition, thena predator producing ’5’ for prey (x=2,y=3) wouldget maximum fitness).

The genotypes of predators and scavengers arebased on LISP-constructs, which allow for a flexibleintegration of numerical functions in a genome (seeFigure 1 and methods section).Predators and scavengers observe the state vari-ables (x,y) of prey and should respond by producingthe numeric value in accordance with an externallydefined function (see Figure 2). For predators this isthe exact value of the target function; for scavengersthis is the value of what is left over by the predator.Note that scavengers do not observe what has beeneaten by predators. Each generation a replicator isconfronted with several prey competing with sur-

rounding replicators (only of the same kind) to eatit and fitness is defined proportional to the fractionof prey consumed.

The evolutionary targets used are chosen suchthat a solution is evolved easily by predators aloneunder a wide range of parameters. They differ inthe absolute minimum amount of coding needed andin the composition of x, y and mixed (x,y)-terms.The amount of coding is expressed in the numberof elements (i.e. operator, variable or constant) ona genome, referred to as length. Table 1 lists theevolutionary targets used, the corresponding mini-mal coding length m, and some examples of genomeswith minimal coding length for a solution. Note thatthis minimum can only be reached by ’smart’ cod-ing. That is, coding used for different terms has tooverlap. An example of overlapping terms is (* 2 (+x y)) (coding length 5), which is shorter than (+ (*2 x) (* 2 y)), which uses 7 elements to code for thesame function.The coding and the setup of our model-ecosystemenables the possibility to find two types of solutions:individual based solutions where all possible preycan be fully consumed by a single predator and anecosystem based solution where a solution is formedby an ecosystem of multiple replicators, namely apredator and a scavenger. Simulations can be clas-sified into three main classes: an individual basedsolution whereby the majority of prey are fully con-sumed by single predators coding for the full targetfunction, an ecosystem based solution whereby themajority of prey are fully consumed by complemen-tary predator-scavenger pairs which together codefor exactly the target function, or no solution at allwhen none or only a small minority of prey is fullyconsumed (by predators or predator-scavenger pairswhich do not code for the whole target function).Note that an individual based solution excludes anecosystem based solution. However, it is possiblethat an individual based solution replaces an ecosys-tem based solution over evolutionary time.In Figure 3 we see a clear transition between the typeof solutions found under various mutation rates. Asimilar pattern is seen for all functions in table 1.The shift to the right occurs because of the differentlength of coding needed. That is, under lower mu-tation rates an individual based solution is evolvedin almost all cases (blue and green) and under in-creased mutation rates it becomes increasingly dif-ficult to reach an individual based solution. At the

3

same time the number of ecosystem based solutionsincrease (orange and red). These ecosystem basedsolutions can be found up until quite severe mutationrates, clearly beyond the range of individual basedsolutions. The exact transition is also influenced bythe nature of coding needed for a target, as exempli-fied by the difference in transition for targets withminimal coding length 15.The remainder of this section is divided in two parts.First we will discuss the role of mutation rates on theinformation accumulation of individuals by focusingon basic genomic characteristics such as length andstructure. Secondly, we extend our scope by look-ing at the role of co-evolutionary and ecosystem dy-namics and how an ecosystem based solution canarise under circumstances where individual basedsolutions cannot. With these results we show howflexibility in a (co-)evolutionary system can help inovercoming the information threshold.

Information Accumulation and Individual BasedSolutions

First we focus on the full solution of individual repli-cators, and therefore we study the evolutionary tra-jectory from the point in time where such a solutionarises in the predator population. Figure 4 shows theinfluence of mutation rates on the length of evolvedsolutions for the shortest and longest evolutionarytargets, respectively. For each mutation rate 25 dif-ferent simulations are run. The distribution of thelength for the full solutions, evolved under differentmutation rates is shown. Note that in some cases thecorrect individual based solution first reached can-not be maintained due to the information thresholdand is lost again. The initial genome length of thepredators first to reach the evolutionary target de-creases under increasing mutation rates. Secondly,it is evident that 250 generations after the arrival ofthis first solution, the length of coding used by thesolution becoming dominant in the population hasdecreased. The length of coding approaches the ab-solute minimum for the corresponding targets. Thusthe predators in the ecosystem have restructuredtheir coding such that the same phenotype is codedfor by a shorter genome. Such shorter genomes tendto be more robust because of less mutations pergeneration. In table 2 an example of this stream-lining of genotypes and altered genotype-phenotypemapping is shown for µ = 0.03. After arrival of thefirst solution, multiple ’mutant’ strains (not neces-

sarily ancestors) with the correct individual solutionarise in the ecosystem, all with a shorter coding forthe same phenotype. These different genotypes co-exist in the population, however in the long term,the most compact coded solutions will out-competethose with a longer genome.These observations strongly suggest that, despite astrong preference for having a coding structure asshort as possible, predators initially exploit moreinformation than strictly necessary to evolve a so-lution. Under increased mutation rates these longersolutions cannot arise or maintained anymore. Thuswe observe that high mutation rates decrease thedegrees of freedom and thereby restrict the chanceof finding an individual based solution. Under severemutation rates an individual based solution cannotbe found anymore at all.These conclusions are corroborated by results onthe average time to reach the evolutionary target.Figure 5 illustrates that under increased mutationrates it takes longer to evolve a full solution. Thiscontradicts the expectation that a higher rate ofchange and a smaller search space because of smallergenomes would lead to a faster coverage of geno-type space. As shown in Figure 4 only a subsetof genotypes coding for the full solution is reached(depending on coding length). Moreover, on averageit takes predators longer to evolve a solution, againsuggesting mutational restrictions in usable codinglength.To disentangle the role of mutation rate and genomelength, and establish that it is indeed the genomelength rather than mutation rates which determineefficacy, we perform experiments with an externalrestriction on the available genome length for repli-cators (simulating a lethal mutation for replicatorswhich exceed a certain genome length). As shownin table 3 both under low and high mutation ratesit becomes increasingly difficult to find individualbased solutions. The median time needed for evolv-ing individual based solutions increases and successrate drops dramatically. Assuming ’optimal’ muta-tion rates [19,20] would predict that predators witha restriction on length would perform better underhigher mutation rates in search for an optimal rateof change. However, although the influence of bothlength and mutation rate cannot be disentangledcompletely, this is clearly not the case with presentresults. Taking into account the decreased multi-plicity of reachable genotypes coding for individualsolutions as observed in Figure 4, we can only con-

4

clude that reachable genotype space (and solutions)are restricted by high mutation rates via the genomelength of replicators.

Population Based Diversity and Ecosystem BasedSolutions

Before an individual based solution has beenreached, the composition of the predator populationis heterogeneous. Due to co-evolutionary dynam-ics between predator and prey, both populationsbecome speciated. Prey maximize the genotypicdistance between the different sub-populations anddifferent sub-populations of predators specialize to-wards each of these (for an extensive analysis ofthese dynamics, see [15]). Because of the spatialembedding, predators and prey structure them-selves such that wave-like patterns arise as shownin Figure 6. In table 4 an example of co-existingsub-populations is shown, taken from a simulationwith the longest evolutionary target at µ = 0.075.Within the population of prey, one sub-populationevolves a high x-value and a low y-value, while inthe other sub-population this is reversed. Predatorsspeciate in ’eating’ a different part of prey. One sub-population feeds mostly on y (i.e. contains mostlyy-terms), prospering on prey with a high y-value,and a predator feeding best on the x-value of preyhas the opposite preference. Note that this does notnecessitate strict partitioning in x and y-terms ofthe evolutionary target.Scavengers, feeding on the remains of prey, also spe-ciate during evolution, trying to have a preferenceopposite to the dominant predator in their neigh-borhood. Under moderate mutation rates, predatorskeep evolving towards the full evolutionary target,possibly diminishing the remains of prey more andmore. Scavengers can keep up in such a case only byfeeding on smaller parts. Note that they can keepfitness, because fitness is assigned as a fraction ofthe remains. However, when an individual basedsolution evolves, scavengers loose all their function-ality because there is nothing left to feed on.

Only under higher mutation rates, the ecosys-tem based solutions become a stable evolutionaryattractor. Under high mutation rates the system isno longer able to evolve individual based solutionsdue to mutation rates and the constraints in genomelength as shown above. Due to the high mutation

rates, predators can obtain only enough informa-tion to code just for feeding “sufficiently enough”on local prey. Robustness in solving only a subsetof possible prey with high local fitness is ’chosen’above high, but unstable global fitness (meaningthe hypothetical fitness they could acquire on allprey). Due to spatial pattern formation several suchpartial solutions can co-exist in a stable ecosystem.Scavengers are able to feed on the remains of prey,resulting in a structured ecosystem based solution,as shown in Figure 6. Despite the sub-populationsizes possibly being small due to a large amount ofmutants, predators and scavengers forming the cor-rect complementary partial solutions can stabilizeover time, even under high mutation rates, as shownin Figure 3.

Finally we compare our results for acquiring in-formation with the threshold for maintaining infor-mation [1, 21]. When seeding the population withindividual based solutions of the minimal codinglength for the evolutionary target, these full solu-tions can only be sustained for mutation rates upuntil µ = 0.126 ,µ = 0.110 and µ = 0.088, respec-tively for the targets with a length of 13, 15 and 19elements (shown as stars in Figure 3). Therefore wecan conclude that the limits posed by the informa-tion threshold are even more severe for obtaining in-formation than they are for maintaining information.Moreover, individual based solutions with a longergenome (coding for the solution) cannot persist un-der such severe mutation rates and the individualbased solutions will be lost completely or recoded toa shorter solution. Thus, by allowing for variablecoding, the information threshold for obtaining cer-tain functionality can already be alleviated by usinga more compact coding.As for the actual crossing of the information thresh-old (for maintenance of information), ecosystembased solutions should be able to fully consume preyunder mutation rates under which individual basedsolutions with minimal coding cannot even persist.In Figure 7 we show an example for the shortesttarget with µ = 0.13, i.e. above µ = 0.126, iden-tified as the maximum mutation rate under whichan individual based solution can be maintained. Weobserve thus a case where an individual based so-lution with a most compact coding (13 elements) isindeed out-competed completely, and the ecosystemtakes over, processing the required information andstill consuming a considerable amount of prey fully

5

each generation.

Discussion

First we studied the influence of mutation rateson the evolutionary trajectory by observing howevolved individual based solutions are coded for un-der different mutation rates. We also looked at thestructure and length of predators first reaching theevolutionary target. It has previously been observedthat the information threshold restricts the amountof information that replicators can maintain underincreased mutation rates [1, 21]. We showed thatunder high mutation rates a more severe thresholdrestricts the required increase of information beforereplicators are fully functional.For genomes with a fixed length it has been shownanalytically that ’optimal’ mutation rates exist [19,20], that mutation rate itself is a selectable trait [22],and that the time to reach a target increases givenhigher mutation rates [23]. Moreover, for evolvingtowards a static target with variable genome lengthit has been shown that high mutation rates lead tocompact coding for functionality on a genome [24].Here we see that in the case of flexible coding, repli-cators use different lengths under different mutationrates. For a fixed mutation rate per base, morelength leads to more mutations per replicator andthus by adjusting the length of coding, the mutationrates per generation are altered. However, a restric-tion of genome length does not result in a more ef-ficient covering of genotype-space under higher mu-tation rates nor in an increase in efficacy. In con-trast, under lower mutation rates without lengthrestrictions, an increase in the success rate can berelated to an higher diversity of reachable genomescoding for individual based solutions. That is, an in-creased multiplicity of available solutions has a clearpositive effect on efficacy. The possibility to gen-erate longer genomes increases efficacy. Similar re-sults have been shown experimentally with the isola-tion of novel ribozymes from random-sequence RNApools, where longer random sequences increase theprobability of finding complex structures [25]. Withvariable genome length it is not maximum mutationrates which increase genetic diversity as one wouldintuitively expect. Instead, moderate mutation ratesincrease genetic diversity through increasing genomelength and therewith enlarges evolutionary searchspace which in turn maximizes evolutionary efficacy.

The diversity of attainable genomes, coding for anindividual based solution, decreases under highermutation rates. However, we have shown that highmutation rates lead to the persistence of more nichesfor replicators which are only partially functional.In contrast, recent work based on RNA-like repli-cators has shown that a lowering in mutation ratecan lead to an increase of niches [26]. The differencebetween these systems is that in our case there is apredefined set of “tasks”, whereas in a RNA-systemthe only “task” is (catalytic) reproduction. Whenselection is solely acting on replication through in-teraction, high mutation rates disrupt the interac-tion strength within an ecosystem by dilution of thefittest sequences, preventing the formation of newspecies. This differs from our current system wherehigh mutation rates maintain more niches by pre-venting the out-competition of multiple partial so-lutions (the ecosystem) by the full solution. Whennot fully functional replicators are viable (i.e. selec-tion is based on functionality instead of replication),replicators can suffice with lower functionality un-der increased mutation rates. This shows a new sideof the information threshold: the impossibility ofevolving replicators with full functionality leads toan increase in diversity because of the multiplicityof partly functional replicators in the system. Undermoderate mutation rates, spatial structuring of co-evolving populations benefits the information inte-gration over evolutionary time in replicators [16–18].If, however, the necessary information cannot be in-tegrated in a single replicator, the diversity of par-tial solutions can be kept in the ecosystem becauseof this co-evolutionary nature and spatial distribu-tion of the system.We showed that under high mutation rates our sys-tem does switch from individual based solutions to-wards the generation of an ecosystem based solu-tion. Thus we conclude that ecosystem structuringenables the increase of the complexity despite thepresence of an information threshold.

Conclusions

The coding structure of evolved replicators revealsthe influence and severity of mutation rates. Theinformation threshold not only influences the main-tenance of information in the genome, but it alsoconstrains the degrees of freedom of the evolution-ary trajectory by restricting the permissible genome

6

length. In our system, multiple ’solutions’ are possi-ble due to a complex genotype-phenotype mappingand freely evolving coding structures. However, thenumber of genotypes coding for a full solution whichcan be reached is increasingly restricted by highermutation rates. If the length of maintainable infor-mation is limited by the information threshold, repli-cators can adapt their coding structure. In this way,for a given functionality the information thresholdcan partly be alleviated by using a different, morecompact, coding. This aspect of the informationthreshold is of great importance for questions aboutthe evolution of complexity. We show that in a sys-tem with such severe mutation rates that an indi-vidual based solution cannot evolve, the ecosystemcan take over and still process the required informa-tion, forming ecosystem based solutions. Therefore,we conclude that, when taking eco-evolutionary dy-namics and flexible coding structures into account,the integration of information within the ecosystemunder circumstances where individual based solu-tions cannot evolve, can be a feasible solution toEigen’s paradox and a possible option for crossingthe threshold for obtaining information.

Methods

We use a stochastic Cellular Automata (CA) model,which is a spatially extended, synchronously up-dated individual-based simulation model. It consistsof (vertically stacked) two-dimensional square gridson which individuals are located. The grids are madeup of 75 x 75 cells, and the boundaries are toroidal.Simulations not displayed here show that the qual-itative behavior of the system does not depend onthe size of the grid if larger than 50x50, which islarge enough for spatial patterns to develop. Onesquare in a grid, hereafter called a ’cell’, holds atmost one individual. The different grids are locatedexactly on top of each other, as if each location (i, j)holds three individuals. Interactions between indi-viduals within the same grid as well as interactionsbetween individuals from different grids are all localin a 3x3 neighborhood. That is, a neighborhood con-sists of the eight cells adjacent to a cell and the cellitself (Moore neighbors). Our model distinguishesthree types of individuals, with each type locatedon a separate grid. The state of the model systemis fully specified by the type, the state and loca-tion of all individuals. The three different types are

called prey, predators and scavengers. The state ofpredators and scavengers is the numerical functionthey encode and the state of prey is a numerical(x,y)-value. Fitness of predators and prey dependon their co-evolutionary relationship and scavengersfeed on the remains of prey, only after the predatorshave finished. Thus, scavengers have no influenceon the evolutionary pressures of predators and prey.Scavengers are only implicitly evaluated on how wellthey complement predators and only see the origi-nal values of prey. Therefore scavengers cannot seehow much of a prey has already been consumed bya predator.

We use a system-wide defined evolutionary tar-get, as carried out in function approximation meth-ods [15,17,18]. In our case this is a numerical func-tion considered in a limited domain only, namelyx = 0.0, 5.0 and y = 0.0, 5.0. Table 1 lists the evo-lutionary targets used, the corresponding minimalcoding length m, and some examples of a ’genome’coding for a solution.

Prey consist of genotypes which are instances of(x,y)-value pairs within the function domain. Thatis, each prey represents a certain problem whichhas to be solved by the predators. This value-pairmaps via the predefined target function to an uniquevalue, f(x, y), which can be considered as the solu-tion to the particular problem presented by a prey.This unique value has to be matched by predators,which determines how much of this prey is consumed(see Figure 2). Every time step each of the valuesof the prey population is subject to a 40% chanceon mutation per value. Both the values of (x, y) ofa prey can change randomly between the definedmutational boundaries (that is, current value plusor minus 0.2). The genotype space of prey (domainof function landscape) is not toroidal. If an x ory-value of a prey is on the border of the domain, itcan only mutate in one direction (to keep mutationrate constant, mutations over the border will not beneglected, but reflected).

Predators and scavengers have the same genomicarchitecture. The genetic representation is in theform of a program (i.e., a functional representation,as in genetic programming). Many different pro-grams can code for the same numerical function.The genome consists of a limited set of terminalsand operators, based on LISP-programming, codingfor a function (for an example see Figure 1). The set

7

of operators is {+,-,x,$}, where $ is a save divisionoperator, often used in genetic programming, whichgives 1 when dividing by zero. The set of terminalsis {x,y, C}; x and y are the variables and C is aconstant defined at declaration either as an integerbetween 0 and 10, or as a float between 0 and 1.

Both predators and scavengers selected afterevaluation are subject to point mutation, using amutation rate µ per element and a gross chromoso-mal rearrangements(GCR) rate per genome. Due tothe tree-like representation of genomes, it is impor-tant to realize that mutating elements high in thehierarchy can possibly affect underlying elements.When a mutation leads to the change of an oper-ator into a terminal (either a variable or constant),underlying elements are discarded. In the reversedcase (terminal mutating into an operator) a randomsub-tree of maximal 3 elements is added. In all othercases only the element itself mutates and the under-lying elements remain untouched. A gross chromo-somal rearrangement-event( chance of µGCR) meansthat a randomly chosen part of a genome is overwrit-ten with another randomly chosen (possibly overlap-ping) part of this genome. Where point mutationscan only lead to a gradual increase or decrease inlength, GCR can possibly lead to a sudden large in-crease in length. However, although GCR speeds upthe process, even without GCR qualitatively simi-lar results are obtained. Results are only shown forµGCR = 0.1 and µprey = 0.4, however test simula-tions have shown that qualitative results do not de-pend on these parameters. In most simulations weare interested in µ, which is varied between 0.02 and0.15. Note that under neutral expectations there isa small bias for predators and scavengers to becomesmaller, due to the combination of mutational opera-tors and tree-like representation (it is easier to loosea large sub-tree, than to gain it). However, this biasis the same for all different µ and of no influence forthe results shown.

Simulations start with a population of 5625 (oneindividual per cell) for each type of individual. Preystart with a random (x,y)-value pair within the do-main. Predators and scavengers start with a randomgenerated genome with an average length of 13.8.The consecutive application of a simple algorithmon all positions of the spatial grids (of which eachposition i,j contains a prey, a predator and a scav-enger respectively) defines the temporal dynamics ofthe model. All individuals are replaced every time

step (leaving no empty grid cells), keeping popula-tion sizes constant. Asynchronous simulations withoverlapping generations for predators and scavengersgive qualitatively the same results. The synchronousalgorithm used runs as follows:

• evaluation

1. check for the prey at position (i,j), whichlocal predator approximates f(x,y) bestand may feed on this prey.

2. if there is some of the prey left → checkwhich local scavenger is most suited tofeed on the remains.

3. define fitness for all types of individuals.In order to keep a clear evolutionary sig-nal, individuals whose approximation areidentical within its neighborhood, acquirethe same fitness for this prey. (see Figure2 for an schematic representation of thefitness evaluation).

– Prey fitness is the fraction of it whichhas not been eaten by the predator.

– Predator fitness consists of the sumof the fraction of prey it eats in a localneighborhood. Each consumed preyadds e−(1−fractioneaten) to the fitnessof the predator. This makes f = 9.0the maximum fitness when all prey inthe neighborhood are fully eaten.

– Scavenger fitness consists of the sumof the fraction of remains of theprey it eats in its local neighbor-hood. Each consumed prey addse−(1−fractioneaten) to the fitness ofthe scavenger. Note that the approx-imation of scavengers is based on theoriginal (x,y) of prey and that whenthere are no remains of a prey aftera predator, scavengers can get no fit-ness on this prey.

• selection

– apply to the prey, predator and scavengerpresent at position (i,j):

∗ add all fitness in local neighborhoodto determine competition

∗ select replicator from neighborhoodwith a chance proportional to theirfitness.

8

• reproduction

1. Apply mutational operators on selectedindividuals with chance:

– µprey per value for prey– µ per element and µGCR for preda-

tors and scavengers

2. let new individual inhabit cell

Simulations are stopped when either an individ-ual solution (a predator coding for exactly the targetfunction) has evolved, spread through the popula-tion and stayed in the population for 250 time steps,or if the maximum of 1500 generations is reached.An ecosystem based solution (the combination of apredator and scavenger exactly coding for the targetfunction) is classified as either stable, when the solu-tion stays in the population for the rest of the simu-lation or unstable if the solution is lost before the endof the simulation. A last possibility is an ecosystembased solution preceding an individual based solu-tion as a transient state.

Authors contributionsBoth authors conceived of the study and were in-volved in writing the manuscript. FK carried outmost of the model work. PH supervised all aspectsof the research.All authors read and approved the final manuscript.

AcknowledgmentsWe thank Thomas D Cuypers, Sally Vincent-Smithand Daniel J van der Post for useful comments onthe manuscript. This research is funded by theNetherlands Science Organization (NWO) undergrant number 612.060.522.

References1. Eigen M: Selforganization of matter and the

evolution of biological macromolecules. Naturwis-senschaften 1971, 58(10):465–523.

2. Maynard Smith J, Brookfield JFY: Models of Evolu-tion. Proceedings of the Royal Society of London. SeriesB, Biological Sciences 1983, 219:315–325.

3. Szathmary E: The integration of the earliest ge-netic information. Trends in Ecology & Evolution1989, 4(7):200–204.

4. Huynen MA, Stadler PF, Fontana W: Smoothnesswithin ruggedness: the role of neutrality in adap-tation. Proceedings of the National Academy of Sciencesof the United States of America 1996, 93:397 –401.

5. Takeuchi N, Poorthuis P, Hogeweg P: Phenotypic errorthreshold; additivity and epistasis in RNA evolu-tion. BMC Evolutionary Biology 2005, 5:9.

6. Eigen M, Schuster P: The Hypercycle. Naturwis-senschaften 1978, 65:7–41.

7. Gilbert W: Origin of life: The RNA world. Nature1986, 319(6055):618–618.

8. Scheuring I, Czaran T, Szabo P, Karolyi G, Toroczkai Z:Spatial Models of Prebiotic Evolution: Soup Be-fore Pizza? Origins of Life and Evolution of Biospheres2003, 33(4):319–355.

9. Boerlijst M, Hogeweg P: Spiral wave structure in pre-biotic evolution: Hypercycles stable against para-sites. Physica D: Nonlinear Phenomena 1991, 48:17–28.

10. Hogeweg P, Takeuchi N: Multilevel Selection in Mod-els of Prebiotic Evolution: Compartments andSpatial Self-organization. Origins of Life and Evo-lution of Biospheres 2003, 33(4):375–403.

11. Szathmary E, Demeter L: Group selection of earlyreplicators and the origin of life. Journal of Theo-retical Biology 1987, 128(4):463–486.

12. Konnyu B, Czaran T, Szathmary E: Prebiotic repli-case evolution in a surface-bound metabolic sys-tem: parasites as a source of adaptive evolution.BMC Evolutionary Biology 2008, 8:267.

13. Pagie L, Hogeweg P: Individual-and population-based diversity in restriction-modification sys-tems. Bulletin of Mathematical Biology 2000, 62(4):759–774.

14. Crombach A, Hogeweg P: Evolution of resource cy-cling in ecosystems and individuals. BMC Evolu-tionary Biology 2009, 9:122.

15. de Boer F, Hogeweg P: Coevolution and EcosystemBased Problem Solving. Under Review 2010.

16. Williams N, Mitchell M: Investigating the success ofspatial coevolution. In Proceedings of the 2005 confer-ence on Genetic and evolutionary computation, Wash-ington DC, USA: ACM 2005:523–530.

17. de Boer FK, Hogeweg P: The Role of Speciation inSpatial Coevolutionary Function Approximation.In Proceedings of the 2007 Genetic and EvolutionaryComputation Conference 2007:2437–2441.

18. Pagie L, Hogeweg P: Evolutionary consequences ofcoevolving targets. Evolutionary Computation 1997,5(4):401–418.

19. van Nimwegen E, Crutchfield JP: Optimizing epochalevolutionary search: population-size independenttheory. Computer Methods in Applied Mechanics andEngineering 2000, 186(2-4):171–194.

20. Nimwegen EV, Crutchfield JP: Optimizing EpochalEvolutionary Search: Population-Size DependentTheory. Machine Learning 2001, 45:77–114.

21. Takeuchi N, Hogeweg P: Error-threshold exists in fit-ness landscapes with lethal mutants. BMC Evolu-tionary Biology 2007, 7:15.

9

22. Earl DJ, Deem MW: Evolvability is a selectable trait.Proceedings of the National Academy of Sciences of theUnited States of America 2004, 101(32):11531 –11536.

23. Stich M, Briones C, Manrubia S: Collective propertiesof evolving molecular quasispecies. BMC Evolution-ary Biology 2007, 7:110.

24. Knibbe C, Mazet O, Chaudier F, Fayard J, Beslon G:Evolutionary coupling between the deleterious-ness of gene mutations and the amount of non-

coding sequences. Journal of Theoretical Biology 2007,244(4):621–630.

25. Ekland E, Szostak J, Bartel D: Structurally complexand highly active RNA ligases derived from ran-dom RNA sequences. Science 1995, 269(5222):364–370.

26. Takeuchi N, Hogeweg P: Evolution of complexity inRNA-like replicator systems. Biology Direct 2008,3:11.

FiguresFigure 1 - Coding StructureThe genetic representation of predators and scavengers is a tree-like coding structure. This genotype definesthe phenotypic reaction to a prey, based on the (x,y)-value of this prey. That is, how does one process preyconsisting of a certain amount of nutrients, x and y. The functional representation of this replicator wouldbe (+ ($ x x) (* (+ y 3) y)).

Figure 2 - Fitness EvaluationSchematic representation of fitness evaluation of predators, scavengers and prey. Dashed lines denote onbasis of which value a response is. That is, predators produce a value based on the (x,y)-values of a prey(colored red and green respectively). This value, relative to the value which the prey produces (based onthe evolutionary target), defines the fraction of prey which is eaten by the predator. A scavenger feeds onthe remains of prey, based on the same (x,y)-values of prey. Fitness is based on the fraction of prey whichis eaten. In this particular example, the fitness of this prey would then be 0.2 (1. − 0.8) and the predatorand scavenger would get respectively 0.82 (e−0.2) and 0.63 (e−0.47) added to their fitness.

Figure 3 - Evolved Solutions under Different Mutation RatesType of solutions (classified as described in the methods section) under different mutation rates (per base)for the evolutionary targets of table 1 with minimal coding length of (a) 13, (b) 15, (c) 15, (d) 19. Notethat the exact nature of the target does also make a difference as shown by the difference in shift for bothtargets with a minimal coding length of 15. Blue and green represent simulations which evolve an individualsolution, where blue has a transient state of ecosystem based solutions. Red and orange represent ecosystembased solutions, where in the orange cases this solution is lost again later in evolution. For each targetthe information threshold for maintaining the target is indicated with a star. Above this mutation ratethe shortest solution for this target cannot be maintained as described in the last paragraph of the resultssection.

Figure 4 - Initial and Final Coding Length for Different Mutation RatesA decrease in initial coding length under higher mutation rates is observed. Restructuring of initial solutionsafter prolonged evolution also decreases length. For each mutation rate 25 simulations are run. The lengthdistributions under different mutation rates are shown for (a) first evolved individual base solutions and (b)most compact individual based solution 250 generations later, for the target with minimal coding length 13;(c) first evolved individual based solutions and (d) most compact individual based solution 250 generationslater, for the target with minimal coding length 19.Note that some first evolved solutions are lost from the population after prolonged evolution due to theinformation threshold (for example the solution found for the longest target with µ =0.095 in (c) is lost in(d)).

10

Figure 5 - Generations Needed to Evolve a Full SolutionFor evolutionary targets with minimal coding length (a) 13, (b) 15, (c) 15 and (d) 19, the median numberof generations needed for the evolution of a full solution is shown. Error bars depict the minimum andmaximum number of generations. On the right (in red) the actual number of solutions out of 25 simulationsper mutation rate is plotted. Among the solutions shown are some which cannot be maintained and are lostfrom the population. Leaving out these solutions even strengthens our conclusions. Prolonged experiments(maximum generations=20000) with high mutation rates give comparable results. That is, results do notqualitatively depend on the amount of time provided for information accumulation.

Figure 6 - Spatial Ecosystem DistributionThis figure shows the spatial structure of an ecosystem based solution under high mutation rates. The shadeof green denotes the fitness of prey, or rather: how much of the prey is eaten. Prey depicted as yellow arefully eaten by an ecosystem based solution. Red denotes single prey which are fully eaten by a predatoralone (not being an individual based solution). In this case the pattern is governed by the prey which arefully eaten by a predator-scavenger pair. Such a pattern, with comparable numbers of ’yellow’ prey, can onlybe met when a correct ecosystem based solution is present in the population.

Figure 7 - Passing the Information thresholdWhen seeding a population under mutation rates above the information threshold(µ =0.13), with correctindividual based solutions, these solutions are quickly lost from the population. This is shown by the decliningnumber of prey which are eaten by correct individual based solutions(black line). The loss of these individualbased solutions creates a niche for ecosystem based solutions, which indeed arise as can be observed by theincrease of prey consumed by a correct ecosystem based solution (red line).

TablesTable 1 - Evolutionary TargetsEvolutionary targets with corresponding minimal coding length m needed to code for them. In the lastcolumn an example of a genome with the minimal length coding for such a full solution.

m Evolutionary Target Minimal Coding Example(a) 13 f(x, y) = x3 + y3 + 5x2

(+ (* (* (+ x 5) x) x) (* (* y y) y))

(b) 15 f(x, y) = x3 + y3 + 5x2 + xy (+ (* (+ (* y y) x) y) (* (* (+ 5 x) x) x))

(c) 15 f(x, y) = x3 + y3 + 5x2 + 2y2(+ (* x (* (+ 5 x) x)) (* (* y (+ 2 y)) y))

(d) 19 f(x, y) = y4 + x3 + y3 + yx2 + y2(+ (* (* x x) (+ x y)) (* (* (+ (+ y 1) (* y y)) y) y))

Table 2 - Streamlining of Individual Based SolutionObserved streamlining of genotypes for the phenotype of an individual based solution found in a simulationwith µ=0.03. After the arrival of a first individual based solution of length 25, the length of consecutivemutants is decreased after prolonged evolution. Examples are shown of two strains leading to a solutionwith length 15 and 17 respectively. Note that intermediate mutants are not necessarily shown.

25: (- (* (* y y) y) (* (- (- x x) x) (+ x (* (+ 3 (- 1 (- (- x x) x))) x))))

21: (- (* (* y y) y) (* (- (- x x) x) (+ x (* (+ 3 (+ 1 x)) x)))) 19: (- (* (* y y) y) (* (- (- x x) x) (+ x (* x (+ 4 x)))))

17: (+ (* (* y y) y) (* x (+ x (* x (+ 3 (+ x 1)))))) 15: (+ (* (* y y) y) (* x (+ x (* x (+ 4 x)))))

11

Table 3 - Length RestrictionEvolution of replicators with a restriction on their genome length of 13 and 15 elements respectively, comparedwith no restriction for µ=0.04 and µ = 0.08. With restriction, replicators exceeding the maximum numberof elements allowed, are considered as lethal mutants. The number of simulations which established anindividual based solution are shown and in parentheses the median time to find these solutions. For eachset 25 simulations are run with the shortest evolutionary target (minimal coding length=13).

∞ 13 15µ solutions (median time)

0.04 25(90) 5(150) 12(168)0.08 18(355) 8(806) 5(730)

Table 4 - Coding Example of Ecosystem Based SolutionExample of evolved ecosystem based solution for the longest evolutionary target used: f(x, y) = y4 + x3 +y3 + y ∗ x2 + y2. These predator-scavenger combinations can feed perfectly on all possible prey in the modeluniverse.

predator scavenger((* x (* x (+ y x)))) ((* y (+ (* y (+ (* y y) y)) y)))

((* (+ (* (+ (* y y) y) y) y) y)) ((* (+ y x) (* x x)))

12

x

Y

*

3

x

$

+ y

+

{{

x yFigure 1

Figure 2

0.02 0.04 0.06 0.08 0.1 0.12 0.150

10

20

30

10

20

30

10

20

30

10

20

30

mutation rate

number of sim

ulations

ecosystem based:

individual based:

after ecosystem

direct

unstable

stable

total simulations

*

*

*

*

Threshold for

maintaining

information*

13

15

19

15

(a)

(b)

(c)

(d)

Figure 3

Figure 4

0.02 0.04 0.06 0.08 0.1 0.120

500

1000

1500

0.02 0.04 0.06 0.08 0.1 0.120

5

10

15

20

25

30

mutationrate

Tim

e in generations

Corre

ct S

olutio

ns

0.02 0.04 0.06 0.08 0.1 0.120

500

1000

1500

0.02 0.04 0.06 0.08 0.1 0.120

5

10

15

20

25

30

mutationrate

Tim

e in generations

Corre

ct S

olutio

ns

0.02 0.04 0.06 0.08 0.1 0.120

500

1000

1500

0.02 0.04 0.06 0.08 0.1 0.120

5

10

15

20

25

30

mutationrate

Tim

e in generations

Corre

ct S

olutio

ns

0.02 0.04 0.06 0.08 0.1 0.120

500

1000

1500

0.02 0.04 0.06 0.08 0.1 0.120

5

10

15

20

25

30

mutationrate

Tim

e in generations

Corre

ct S

olutio

ns

a b

c d

Figure 5

prey fitness

Figure 6

Time in generations

Fully consumed prey

0 500 10000

1000

2000

Ecosystem Based SolutionIndividual Based Solution

Figure 7

BMC Evolutionary Biology - Universiteit Utrechtbioinformatics.bio.uu.nl/pdf/DeBoer.beb10-10.pdf ·...

Documents

Transcript of BMC Evolutionary Biology - Universiteit Utrechtbioinformatics.bio.uu.nl/pdf/DeBoer.beb10-10.pdf ·...