Kimura Journal

    Diffusion Models in Population Genetics

    Motoo Kimura

    Journal of Applied Probability, Vol. 1, No. 2. (Dec., 1964), pp. 177-232.

    Stable URL:

    http://links.jstor.org/sici?sici=0021-9002%28196412%291%3A2%3C177%3ADMIPG%3E2.0.CO%3B2-4

    Journal of Applied Probability is currently published by Applied Probability Trust.

J. Appl. Prob. 1, 177-232 (1964)
Printed in Israel

DIFFUSION MODELS IN POPULATION GENETICS

MOTOO KIMURA, National Institute of Genetics, Mishima, Japan

CONTENTS

I. PROBLEMS AND METHODS OF POPULATION GENETICS
1. Introduction
2. Changes of gene frequencies as stochastic processes
3. The partial differential equation method

II. RANDOM DRIFT IN THE NARROW SENSE
4. Random drift in a small finite population
5. An approximate treatment by the angular transformation
6. A population under linear pressure and random sampling of gametes
7. Change of gene frequency under selection and random sampling of gametes
7.1 Genic selection (Case of no dominance)
7.2 Case of overdominance
8. Random fluctuation of selection intensities

IV. GENETIC EQUILIBRIUM; STATIONARY DISTRIBUTIONS AND GENE FIXATION
9. Gene frequency distribution at equilibrium
9.1 Stationary distribution
9.2 Distribution under irreversible mutation
10. Probability of fixation of mutant genes in a population
10.1 Introductory remarks
10.2 Single locus
10.3 Multiple loci

References

Received in revised form 10 June 1964. Contribution No. 453 from the National Institute of Genetics.

I. PROBLEMS AND METHODS OF POPULATION GENETICS

1. Introduction

Population genetics is that branch of genetics whose object is the study of the genetical make-up of natural populations. By investigating the laws which govern the genetic structure of natural populations, we intend to clarify the mechanism of evolution.

In a natural population of sexually reproducing species, with only a hundred loci segregating, the number of possible genotypes may be practically infinite, and the genotype of each individual is quite likely to be unique in the entire history of the species. Thus, as an aggregate of individual genotypes, a population is an enormously complicated system, sometimes too complicated to be treated theoretically. On the other hand, in any reasonably large population, the relative proportion of an allele (a particular form of a gene) within the population changes almost continuously with time. This is because, unlike genotypes, each gene reproduces its own kind with complete fidelity except for the very rare event of mutation. As pointed out by Fisher (1953), "the frequencies with which the different genotypes occur define the gene ratios characteristic of the population, so that it is often convenient to consider a natural population not so much as an aggregate of living individuals as an aggregate of gene ratios. Such a change of viewpoint is similar to that familiar in the theory of gases, where the specification of the population of velocities is often more useful than that of a population of particles." This line of investigation was initiated by Fisher (1922) and later elaborated by him (Fisher, 1930), Haldane (cf. 1932) and especially by Wright (1931, and later publications).

In the present paper, I shall review the theoretical works on population genetics, treating the changes of gene frequencies as stochastic processes, and describing these especially by the use of diffusion equations. Since I started my work in this field as a geneticist, the mathematical sophistication of my approach has been rather limited; I cannot escape from this limitation in the present paper, but I hope it will stimulate mathematicians to work in this fascinating field. Indeed, there is much to be done in the refinement and extension of the mathematical methods involved, as is shown by the works of Feller (1951, 1952) and Moran (cf. 1962).

2. Changes of gene frequencies as stochastic processes

From the standpoint of population genetics, the most elementary step in evolution is the change of gene frequencies. Here, gene frequencies mean the proportions of genes in a population. The simplest mathematical approach to this problem is to regard the process of change as deterministic. Such an approach was first used extensively by Haldane in his series of papers starting in 1924 (Haldane, 1924). Strictly speaking, it applies only if a population is infinitely large and is placed in an environment which remains constant or changes in a deterministic way. There are many circumstances in which this is sufficiently realistic as a first approximation. Furthermore, because of its simplicity, this approach is still the most useful, and is often the only manageable one for many problems. In nature, however, the process of change may not be quite deterministic, because of the existence of factors which produce random fluctuation in gene frequencies, of which two different types may be recognized (Wright, 1949). One is the random sampling of gametes in reproduction. The process of change in gene frequency which is due solely to this factor is often called drift. However, the term drift in this context is hardly adequate unless the prefix random is also attached, and it may be called the random drift in the narrow sense. This factor becomes prominent in a small population. The other type consists of random fluctuations in what Wright called the systematic evolutionary pressures, of which random fluctuation in selection intensity may be especially important. These two types of factors introduce a random element into the process of change in gene frequencies (random drift in the wide sense). Thus, in the present review, we will regard the process of change as a stochastic process, where this means the mathematical formulation of a chance event evolving in time.

So far as I know, this line of investigation was initiated by Fisher (1922). In his paper, Fisher considered the random sampling of gametes as the factor causing random fluctuation in gene frequency and, assuming no selection, he investigated its effect (Hagedoorn Effect) on the decrease of variability in a species. Though his treatment was restricted to a quite simple situation, the paper was important in that he introduced the method of partial differential equations in the study of gene frequency distributions in a population. This method, if properly extended, is equivalent to the approach which makes use of the Fokker-Planck equation, later introduced by Wright (1945). Here Fisher used the transformed gene frequency rather than the gene frequency itself. Also he suggested the method of functional equations to study the probability of fixation of an individual mutant gene. As in many other pioneering works, this 1922 paper was not finished and contained some minor errors and ambiguity. Later a more complete treatment was presented (Fisher, 1930) in which the errors were amended and the results were greatly extended. In my opinion this is one of the most beautiful papers ever written on the mathematical theory of population genetics.

In 1931, Wright published his now classical paper "Evolution in Mendelian populations" in which he studied similar problems by his method of integral equations (Wright, 1931). Since then, Wright has published a number of important papers on the probability distribution of gene frequencies and on the role of random processes in evolution. He has emphasized the importance of proper balance between the directed and the random processes in relation to population structure in evolution (cf. Wright, 1950).

These two authors, as Feller (1951) has remarked, have studied individual problems with great ingenuity, with the result that many limiting probability distributions have been worked out. However, the problem of constructing a model for the entire process of change in the gene frequencies has not been dealt with by these authors.

In the field of the mathematical theory of probability, progress in our knowledge of stochastic processes has been quite extensive since Kolmogorov's fundamental paper (Kolmogorov, 1931). This is doubtless a result of the growing need for the stochastic treatment of problems in diverse fields of modern science. It is not surprising therefore that pioneering attempts to construct a model for the entire process of change in gene frequencies were made by mathematicians. Malecot (1948), who considered "evolution of the probability law in the course of time", especially for the case of mutation pressure and random sampling of gametes, sketched a method by which the solution might be obtained. Goldberg (1950) in his unpublished thesis studied the same case and succeeded in obtaining two solutions of the diffusion equation involved.

From the mathematical standpoint, various types of stochastic processes arise in population genetics. Of special importance is the Markov process which Kolmogorov called stochastically definite. The process of change in gene frequencies in a very small population consisting of a few individuals with non-overlapping generations will most appropriately be treated as a finite Markov chain. The fate of an individual mutant gene in a large population can be treated by the theory of branching processes.

Generally, however, the processes of organic evolution in nature are very slow and the number of individuals involved per generation is very large, so that they may be treated with advantage as a continuous Markov process in space (gene frequency) and time, as will be explained in the next section. Here the Fokker-Planck equation (Kolmogorov forward equation) plays a fundamental role. Using this approach, the problem of constructing a model for the entire process of change in gene frequencies starting from an arbitrary initial frequency was solved for several genetically interesting cases by the present author (Kimura, 1954, 1955 a, b, c, 1956 a, b, 1957, Crow and Kimura, 1956). Also, it has been shown by the author (Kimura, 1957, 1962) that the Kolmogorov backward equation may be used to obtain the probability of fixation of mutant genes in a population.

Recently, the stochastic theory of gene frequency change was used by Robertson (1960) in his theory of limits in artificial selection. He also studied the problem of selection for heterozygotes in small populations (Robertson, 1962) based on the mathematical work of Miller (1962) who developed a powerful method for evaluating eigenvalues of the Fokker-Planck equation involved.

Usually, the Fokker-Planck equations which appear in population genetics have singularities at the boundaries and a deep mathematical investigation of these was carried out by Feller (1952) who clarified the nature of boundaries by using semi-group theory.

The diffusion equation approach has been used extensively in the study of gene frequency change because of its extreme usefulness. But it is an approximation based on rather intuitive arguments. Therefore, to investigate the conditions under which such approximation may be valid is an important task for mathematicians; Moran (1958 a and b) introduced two population models, with overlapping and non-overlapping generations, for the study of gene frequency distribution in populations. Watterson (1962) obtained sufficient conditions, concerning the change of gene frequency per unit length of time, under which the diffusion approximation is valid, even if the gene frequency does not necessarily form a Markov process. Recently he applied these conditions to unify Moran's two models (Watterson, 1964). Moran's models have also been investigated by Karlin and McGregor (1962). An important contribution was made by Moran (1961) to the problem of gene fixation in a finite population. Though he was able to treat only a very simple genetical situation, his rigorous treatment is important in giving a case where the diffusion approximation introduced by the author (Kimura, 1957) can be checked.

Accuracy of the diffusion approximation method may also be checked numerically by high-speed computer when the population number is very small; a recent study of Ewens (1963) seems to indicate that the approximation is quite good for populations of reasonable size. Furthermore (Ewens, 1964), formulae for the leading terms of the corrections to diffusion approximations can be found.

3. The partial differential equation method

A natural population which plays a significant role as an evolutionary unit should consist of a large number of individuals, and the gene frequencies for these behave practically as continuous variables. Also any change of gene frequencies must in general be very slow by our ordinary time scale. There are certain cases in which relatively rapid changes were observed in polymorphic characters, such as the spread of industrial melanism in moths. However, the typical rate of evolution shown in fossil records is of the order of one-tenth of a darwin, one darwin standing for the rate of change with a factor of $e$ (= 2.71...) per million years (Haldane, 1949); this suggests that the change in gene frequencies involved must be correspondingly slow.

For these reasons, the process of change in gene frequency may be treated as a continuous stochastic process; this means roughly that as the time interval becomes smaller, so also does the amount of change in gene frequency $x$ during that interval. More strictly, the process is called a continuous stochastic process if for any given positive value $\varepsilon$, the probability that the change in $x$ during the time interval $(t, t + \delta t)$ exceeds $\varepsilon$ is $o(\delta t)$, i.e. an infinitesimal of higher order than $\delta t$. We will assume also that change in gene frequencies is Markovian, that is, the probability distribution of gene frequencies at a given moment $t$ depends on the gene frequencies at a preceding time $t_0$ ($t_0 < t$) but not on the previous history which has led to the gene frequencies at $t_0$.
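The darwin quoted above is a logarithmic rate, so a measured change can be converted to darwins with a short calculation. The sketch below assumes the definition as stated in the text (a change by a factor of e per million years is one darwin); the function name `rate_in_darwins` and the sample measurements are hypothetical and serve only as an illustration.

```python
import math


def rate_in_darwins(x_start, x_end, million_years):
    """Rate of change in darwins: natural-log change per million years.

    Assumes the definition quoted in the text (a factor of e per million
    years equals one darwin). x_start and x_end are positive measurements
    of the same character; the names are illustrative only.
    """
    return (math.log(x_end) - math.log(x_start)) / million_years


# A character that changes by a factor of e in one million years.
one_darwin = rate_in_darwins(1.0, math.e, 1.0)

# Hypothetical measurements: a 10.5% increase spread over one million years,
# which is close to the one-tenth of a darwin mentioned in the text.
slow_rate = rate_in_darwins(100.0, 110.5, 1.0)

print(one_darwin, slow_rate)
```

A rate of this order corresponds to a change of roughly ten per cent per million years, which is the sense in which the text calls the underlying change in gene frequencies slow.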

For the study of this continuous Markov process, one of the most powerful methods available makes use of the Kolmogorov equations (Kolmogorov, 1931). We will first derive the Kolmogorov forward equation as applied to population genetics.

Throughout this article, we will assume, unless otherwise stated, a diploid population consisting of a fixed number $N$ of individuals in each generation. Thus, there are $2N$ genes at each locus.

Consider a pair of alleles $A_1$ and $A_2$ with respective frequencies $x$ and $1 - x$. Let $\phi(p,x;t)$ be the conditional probability density that the gene frequency is $x$ at time $t$, given that the initial frequency is $p$ at time $t = 0$. This gives the transition probability that the gene frequency moves from $p$ to $x$ after time $t$. With $p$ fixed, $\phi(p,x;t)$ determines a frequency distribution such that when $1/(2N)$ is substituted for $dx$, $\phi(p,x;t)\,dx$ gives an approximation to the frequency of the class with gene frequency $x$ ($0 < x < 1$) at time $t$, which, when expressed in terms of generations, we may also roughly refer to as the $t$th generation; this frequency distribution may be denoted by

(3.1)  $f(x,t) = \phi(p,x;t)\,\dfrac{1}{2N}, \qquad 0 < x < 1.$

When $p$ is fixed, it will often be omitted so that $\phi(p,x;t)$ is written as $\phi(x,t)$. Also it should be noted that the above relation (3.1) holds only for unfixed classes, i.e. for $0 < x < 1$. Frequencies of classes with $x = 0$ or $1$ have to be treated separately.

Let $g(\delta x, x; \delta t, t)$ be the probability density that the gene frequency changes from $x$ to $x + \delta x$ during the time interval $(t, t + \delta t)$. Using this probability density, the assumption of a continuous stochastic process may be expressed as

(3.2)  $\displaystyle\int_{|\delta x| > \varepsilon} g(\delta x, x; \delta t, t)\,d(\delta x) = o(\delta t), \qquad (\delta t \to 0),$

where $\varepsilon$ is some arbitrary preassigned positive value.

Furthermore, for the process of change in gene frequency, we have

(3.3)  $\displaystyle\phi(p,x;t+\delta t) = \int \phi(p,x-\delta x;t)\,g(\delta x, x-\delta x; \delta t, t)\,d(\delta x),$

where the integral on the right is taken over all possible values of $\delta x$. The above relation is a natural consequence of the assumption that the process is Markovian. The probability that the gene frequency is $x$ at time $t + \delta t$ is the sum total of the probabilities of cases in which the gene frequency is $x - \delta x$ at time $t$, and the gene frequency increases by $\delta x$ during the subsequent time interval $(t, t + \delta t)$, with $\delta x$ taking all possible values. Actually, the above relation is a special form of the Kolmogorov-Chapman equation in the theory of stochastic processes. In this expression, $\delta x$ may take any value such that $x - \delta x$ lies between 0 and 1, exclusive of the end points. However, because of (3.2), only values in the range $|\delta x| < \varepsilon$ are of any significance.

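Expanding the integrand on the right side of (3.3) in terms of $\delta x$, we have

$\displaystyle\phi(p,x-\delta x;t)\,g(\delta x,x-\delta x;\delta t,t) = \phi g - \delta x\,\frac{\partial(\phi g)}{\partial x} + \frac{(\delta x)^2}{2!}\,\frac{\partial^2(\phi g)}{\partial x^2} - \frac{(\delta x)^3}{3!}\,\frac{\partial^3(\phi g)}{\partial x^3} + \cdots,$

where $\phi$ and $g$ stand for $\phi(p,x;t)$ and $g(\delta x,x;\delta t,t)$ respectively. Thus (3.3) may be expressed as

(3.4)  $\displaystyle\phi(p,x;t+\delta t) = \phi\int g\,d(\delta x) - \frac{\partial}{\partial x}\left\{\phi\int(\delta x)\,g\,d(\delta x)\right\} + \frac{1}{2!}\,\frac{\partial^2}{\partial x^2}\left\{\phi\int(\delta x)^2 g\,d(\delta x)\right\} - \frac{1}{3!}\,\frac{\partial^3}{\partial x^3}\left\{\phi\int(\delta x)^3 g\,d(\delta x)\right\} + \cdots.$

Here we have assumed that the orders of summation, integration and differentiation may be interchanged freely.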

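Noting that

$\displaystyle\int g\,d(\delta x) = 1,$

and transferring the first term on the right side of (3.4) to the left, we have, after dividing both sides by $\delta t$,

(3.5)  $\displaystyle\frac{\phi(p,x;t+\delta t) - \phi(p,x;t)}{\delta t} = -\frac{\partial}{\partial x}\left\{\phi\,\frac{1}{\delta t}\int(\delta x)\,g\,d(\delta x)\right\} + \frac{1}{2!}\,\frac{\partial^2}{\partial x^2}\left\{\phi\,\frac{1}{\delta t}\int(\delta x)^2 g\,d(\delta x)\right\} - \frac{1}{3!}\,\frac{\partial^3}{\partial x^3}\left\{\phi\,\frac{1}{\delta t}\int(\delta x)^3 g(\delta x,x;\delta t,t)\,d(\delta x)\right\} + \cdots.$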

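Let

(3.6)  $\displaystyle\lim_{\delta t \to 0}\frac{1}{\delta t}\int(\delta x)\,g(\delta x,x;\delta t,t)\,d(\delta x) = M(x,t),$

(3.7)  $\displaystyle\lim_{\delta t \to 0}\frac{1}{\delta t}\int(\delta x)^2 g(\delta x,x;\delta t,t)\,d(\delta x) = V(x,t),$

and assume that

(3.8)  $\displaystyle\lim_{\delta t \to 0}\frac{1}{\delta t}\int(\delta x)^n g(\delta x,x;\delta t,t)\,d(\delta x) = 0$

for $n \geq 3$. Then we have

(3.9)  $\displaystyle\frac{\partial\phi(p,x;t)}{\partial t} = \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\{V(x,t)\,\phi(p,x;t)\} - \frac{\partial}{\partial x}\{M(x,t)\,\phi(p,x;t)\},$

where $M(x,t)$ and $V(x,t)$ refer to the first and second moments of $\delta x$ during the infinitesimal time interval $(t, t + \delta t)$.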

In practice, however, quantities such as mutation rates, rate of migration, intensity of selection, and effect of random sampling of gametes which determine $\delta x$ are all measured with one generation as a time unit and the limiting rate with $\delta t \to 0$ can only be obtained by extrapolation. So we will replace $M(x,t)$ and $V(x,t)$ in the above equation by $M_{\delta x}$ and $V_{\delta x}$, the mean and variance of the change in gene frequency per generation ($\delta t$ corresponding to one generation). Thus we obtain

(3.10)  $\displaystyle\frac{\partial\phi}{\partial t} = \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\{V_{\delta x}\,\phi\} - \frac{\partial}{\partial x}\{M_{\delta x}\,\phi\}.$

Such an equation, as given in (3.9), is called the Kolmogorov forward equation by mathematicians. It is also called the Fokker-Planck equation by physicists. Actually, Fokker derived the steady state form in 1914 and Planck (1917) later extended it to a quite general form, though rigorous mathematical formulations were first given by Kolmogorov (1931).
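To make the forward equation concrete, the following is a minimal numerical sketch, not the author's method. It assumes the per-generation form with the second-derivative term written first, and it takes for illustration $M_{\delta x} = 0$ and $V_{\delta x} = x(1-x)/(2N)$, the pure random-sampling case treated later in the paper. The grid spacing, time step, and the function names `drift_forward_step` and `unfixed_mass` are hypothetical choices; the scheme is a plain explicit finite-difference step.

```python
def drift_forward_step(phi, N, dx, dt):
    """One explicit Euler step of d(phi)/dt = (1/2) d^2(V phi)/dx^2.

    Assumes M = 0 and V = x(1 - x)/(2N). phi holds density values at
    x = 0, dx, 2*dx, ..., 1; the end points are left unchanged because
    V vanishes there, so they never feed back into the interior.
    """
    n = len(phi) - 1
    # g = V * phi evaluated on the grid.
    g = [(j * dx) * (1 - j * dx) / (2 * N) * phi[j] for j in range(n + 1)]
    new = phi[:]
    for j in range(1, n):
        new[j] = phi[j] + dt * 0.5 * (g[j + 1] - 2 * g[j] + g[j - 1]) / dx**2
    return new


def unfixed_mass(phi, dx):
    """Approximate probability that the gene frequency is still in (0, 1)."""
    return sum(phi[1:-1]) * dx


N = 10            # population size (2N = 20 genes), illustrative value
n = 50            # number of grid intervals on [0, 1]
dx = 1.0 / n
dt = 0.01         # in generations; keeps dt * V / dx**2 well below 1

phi = [0.0] * (n + 1)
phi[n // 2] = 1.0 / dx      # narrow spike approximating initial frequency p = 0.5

for _ in range(2000):       # 2000 steps of 0.01 = 20 generations
    phi = drift_forward_step(phi, N, dx, dt)

mass = unfixed_mass(phi, dx)
print(mass)
```

The printed mass is below one because probability leaves the open interval through both ends; the equation by itself does not track where that probability goes, which is why the terminal classes are treated separately in the text.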

The above derivation leading to equation (3.9) is rather formal. More rigorous derivations may be found in the mathematical literature, such as Kolmogorov's paper (1931). On the other hand, the above derivation may be too formal for most biologists to see the physical meaning of the terms involved. A less rigorous but very elementary derivation of the equation was devised by the author, based on the geometrical interpretation of the process involved (Kimura, 1955 c). It was shown that the first and second terms on the right side of (3.10) give the rates of change in the probability distribution due respectively to random fluctuation and systematic pressures. It was also pointed out that the variance, $E\{(\delta x)^2\} - \{E(\delta x)\}^2$, rather than the second moment, i.e. $E\{(\delta x)^2\}$, should be used for $V_{\delta x}$ in (3.10), when the equation is applied to actual population genetics problems. This is based on the consideration that (3.10) should give the deterministic process correctly when there is no random fluctuation, i.e. in the limit when $V_{\delta x} = 0$.

Since the gene frequency $x$ lies between 0 and 1 in general, the process of change in gene frequency in a population through time is represented as the stochastic movement of a point $x$ on the closed real interval $[0, 1]$. The equation (3.10) can describe this movement at least on the open interval $(0,1)$. We will now show that

(3.11)  $\displaystyle P(x,t) = -\frac{1}{2}\,\frac{\partial}{\partial x}\{V_{\delta x}\,\phi\} + M_{\delta x}\,\phi,$

which satisfies

$\displaystyle\frac{\partial\phi}{\partial t} = -\frac{\partial P(x,t)}{\partial x},$

represents the rate (per generation) of net flow of probability across the point $x$. First, let us consider the amount of probability which flows over the point $x$ in a positive direction during the time interval of length $\delta t$. The contribution of the class with gene frequency $\xi$ to this is (see Figure 1):

Figure 1. Probability flux across the point $x$.

and the total amount of probability flow in the positive direction, denoted by $P_+(x)\,\delta t$, is the sum of all contributions from the classes at the left of $x$:

Let $\xi = x + \eta$ and expand the integrand in terms of $\eta$, or $\xi - x$; we have

where $\phi$ and $g$ respectively denote $\phi(p,x;t)$ and $g(\delta\xi,x;\delta t,t)$.

Thus

Similarly, the total amount of probability which flows over the point $x$ in the opposite direction is

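Thus the net amount of probability which flows past the point $x$ during the time interval $(t, t + \delta t)$ is

$\displaystyle P(x,t)\,\delta t = \phi\int\delta\xi\,g(\delta\xi,x;\delta t,t)\,d(\delta\xi) - \frac{1}{2}\,\frac{\partial}{\partial x}\left\{\phi\int(\delta\xi)^2 g(\delta\xi,x;\delta t,t)\,d(\delta\xi)\right\} + \cdots,$

and if we note (3.6), (3.7) and (3.8), we obtain, in the limit as $\delta t \to 0$,

$\displaystyle P(x,t) = M(x,t)\,\phi - \frac{1}{2}\,\frac{\partial}{\partial x}\{V(x,t)\,\phi\}.$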

In terms of a generation as the unit of time, the above equation becomes (3.11), as was to be shown.

In equations (3.10) and (3.11), $M_{\delta x}$ and $V_{\delta x}$ are in general functions of both $x$ and $t$, but in most of our applications to population genetics, they are functions of $x$ only, and independent of the time parameter $t$. Actually in the present paper, we shall be concerned only with the cases where $M_{\delta x}$ and $V_{\delta x}$ are functions of $x$ but independent of $t$, namely where the process is time homogeneous. However, except for the special case of being identically zero, they can never take constant values, as might be the case in many diffusion problems in physics.

As stated already, our fundamental equation (3.10) can describe the "movement" of the point $x$ representing the gene frequency of a population on the open interval $(0,1)$ and, as noted in (3.1), $\phi(x,t)\,dx$ with $dx = 1/(2N)$ gives the approximate frequency of the class with gene frequency $x$ for $0 < x < 1$. The equation by itself cannot give the rates of change in the relative frequencies of terminal classes. However, these rates can be obtained by utilizing the established relation (3.11); we use the fact that $-P(0,t)$ and $P(1,t)$ respectively represent the rates at which the probability flows into the classes $x = 0$ and $x = 1$ from the open interval $(0,1)$. In the special but important case in which the change in frequencies of these terminal classes ($x = 0$ and $x = 1$) is solely due to such an inflow of the probability, i.e. when boundaries act as "absorbing barriers", we have

$\displaystyle\frac{df(0,t)}{dt} = -P(0,t), \qquad \frac{df(1,t)}{dt} = P(1,t),$

where $f(0,t)$ and $f(1,t)$ are respectively the frequencies of classes with $x = 0$ and $x = 1$ at the $t$th generation.

Now, if $\phi(x,t)$ and its first derivative with respect to $x$ are finite at $x = 0$ and if $V_{\delta x}$ and $M_{\delta x}$ vanish there, then

$\displaystyle\frac{df(0,t)}{dt} = \frac{1}{2}\left[\frac{dV_{\delta x}}{dx}\right]_{x=0}\phi(0,t).$

In particular, if the random fluctuation is due solely to random sampling of gametes, $V_{\delta x} = x(1 - x)/(2N)$ and $[dV_{\delta x}/dx]_{x=0} = 1/(2N)$. Therefore

(3.18)  $\displaystyle\frac{df(0,t)}{dt} = \frac{\phi(0,t)}{4N}.$

The right-hand side of the above equation is approximately equal to half the relative frequency of the subterminal class with $x = 1/(2N)$. This is because, for a large value of $N$, $\phi(0,t)/(2N)$ must be very near to $f(1/2N, t)$ unless $|\partial\phi/\partial x|$ is very large in the neighborhood of $x = 0$. It should be noted here that if the effective size ($N_e$) is different from the actual size ($N$) of the population, we must put $V_{\delta x} = x(1-x)/(2N_e)$ and therefore the right-hand side of (3.18) must be multiplied by the factor of $N/N_e$. Similarly, we have

$\displaystyle\frac{df(1,t)}{dt} = \frac{\phi(1,t)}{4N}.$

The relation (3.11) is also useful in deriving the probability distribution of gene frequencies in the steady state when the distribution curve reaches constancy in form. The distribution in this state may either be obtained from

(3.20)  $P(x,t) = 0$  (stable distribution)

or from

(3.21)  $P(x,t) = \text{constant}$  (steady flux)

depending on the circumstances.
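As a sketch of how condition (3.20) can be used, suppose the flux has the form $P = M_{\delta x}\phi - \tfrac{1}{2}\,d(V_{\delta x}\phi)/dx$, which is the form implied by (3.10) together with $\partial\phi/\partial t = -\partial P/\partial x$, and suppose $M_{\delta x}$ and $V_{\delta x}$ do not depend on $t$. Both are assumptions stated here for the illustration; the constant $C$ is introduced only for normalization. Setting the flux to zero then gives a first-order equation for $V_{\delta x}\phi$:

```latex
% Assumption: P(x) = M_{\delta x}\phi - \tfrac{1}{2}\,\frac{d}{dx}(V_{\delta x}\phi),
% with M_{\delta x} and V_{\delta x} independent of t.
\begin{align*}
P(x) = 0
  &\;\Longrightarrow\;
  \frac{d}{dx}\bigl(V_{\delta x}\,\phi\bigr)
  = \frac{2 M_{\delta x}}{V_{\delta x}}\,\bigl(V_{\delta x}\,\phi\bigr) \\
  &\;\Longrightarrow\;
  \phi(x) = \frac{C}{V_{\delta x}}
  \exp\!\left( 2 \int \frac{M_{\delta x}}{V_{\delta x}}\,dx \right).
\end{align*}
```

Here $C$ would be fixed by requiring that the distribution integrates to one, provided the integral converges for the particular $M_{\delta x}$ and $V_{\delta x}$ in question.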

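The above arguments may readily be extended to the cases of two or more random variables. For the case of two random variables such as appear in the tri-allelic system having three alleles, $A_1$, $A_2$ and $A_3$ with respective frequencies $x_1$, $x_2$ and $x_3$ ($= 1 - x_1 - x_2$), the corresponding differential equation becomes

(3.22)  $\displaystyle\frac{\partial\phi}{\partial t} = \frac{1}{2}\,\frac{\partial^2}{\partial x_1^2}\{V_{\delta x_1}\phi\} + \frac{\partial^2}{\partial x_1\,\partial x_2}\{W_{\delta x_1\delta x_2}\phi\} + \frac{1}{2}\,\frac{\partial^2}{\partial x_2^2}\{V_{\delta x_2}\phi\} - \frac{\partial}{\partial x_1}\{M_{\delta x_1}\phi\} - \frac{\partial}{\partial x_2}\{M_{\delta x_2}\phi\},$

where $\phi = \phi(p_1,p_2,x_1,x_2;t)$ gives the probability density that the frequencies of $A_1$ and $A_2$ become $x_1$ and $x_2$ at the $t$th generation given that their frequencies are $p_1$ and $p_2$ at $t = 0$. In the above equation, $W_{\delta x_1\delta x_2}$ stands for the covariance between $\delta x_1$ and $\delta x_2$, where $\delta x_1$ and $\delta x_2$ stand respectively for the rates of change of $x_1$ and $x_2$ per generation. Also their mean and variance $M_{\delta x_i}$ and $V_{\delta x_i}$ ($i = 1, 2$) are in general functions of $x_1$ and $x_2$, as well as of $t$. More generally, for the case of $n$ segregating loci each having a pair of alleles, if $x^{(i)}$ is the frequency of an allele at the $i$th locus ($i = 1, 2, \ldots, n$) then the Kolmogorov forward equation becomes

$\displaystyle\frac{\partial\phi}{\partial t} = \frac{1}{2}\sum_{i=1}^{n}\frac{\partial^2}{\partial x^{(i)2}}\{V_{\delta x^{(i)}}\phi\} + \sum_{i<j}\frac{\partial^2}{\partial x^{(i)}\,\partial x^{(j)}}\{W_{\delta x^{(i)}\delta x^{(j)}}\phi\} - \sum_{i=1}^{n}\frac{\partial}{\partial x^{(i)}}\{M_{\delta x^{(i)}}\phi\},$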

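where $\phi = \phi(x^{(1)}, \ldots, x^{(n)}; t)$ is the probability density that gene frequencies are $x^{(1)} \sim x^{(1)} + dx^{(1)}, \ldots, x^{(n)} \sim x^{(n)} + dx^{(n)}$ in the $t$th generation. In the above equation $\delta x^{(i)}$ is the rate of change of $x^{(i)}$ per generation and $M$, $V$ and $W$ respectively denote the mean, variance and covariance of the $\delta x^{(i)}$'s.

The equations for the probability flux in the multivariate case may be derived as in the case of a single variable. Here, I will merely present the equations for the case of two independent random variables, $x_1$ and $x_2$:

$\displaystyle P(x_1,x_2;t) = -\frac{1}{2}\,\frac{\partial}{\partial x_1}\{V_{\delta x_1}\phi\} - \frac{1}{2}\,\frac{\partial}{\partial x_2}\{W_{\delta x_1\delta x_2}\phi\} + M_{\delta x_1}\phi,$

$\displaystyle Q(x_1,x_2;t) = -\frac{1}{2}\,\frac{\partial}{\partial x_2}\{V_{\delta x_2}\phi\} - \frac{1}{2}\,\frac{\partial}{\partial x_1}\{W_{\delta x_1\delta x_2}\phi\} + M_{\delta x_2}\phi,$

where $P(x_1,x_2;t)$ and $Q(x_1,x_2;t)$ are respectively the fluxes at point $(x_1,x_2)$ along the $x_1$ and $x_2$ axes. In terms of these quantities, (3.22) is expressed in the form:

$\displaystyle\frac{\partial\phi}{\partial t} = -\frac{\partial P}{\partial x_1} - \frac{\partial Q}{\partial x_2}.$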

It should be noted here that the existence of a stable gene frequency distribution at equilibrium does not necessarily mean that

$P(x_1,x_2;t) = Q(x_1,x_2;t) = 0.$

For example, in a locus with three alleles, $A_1$, $A_2$ and $A_3$, if genes mutate only in the sequence $A_1 \to A_2 \to A_3 \to A_1 \ldots$, a stable non-trivial distribution may be realized under a cyclic flow of probability.

So far, we have treated gene frequencies after $t$ generations as random variables and initial gene frequencies as fixed. For example, in the expression $\phi(p,x;t)$, $x$ is considered as a random variable and $p$ is assumed fixed. This means that we have considered the process of change in gene frequency in the forward direction in time.

On the other hand, we may regard $x$ as fixed and consider $p$ as a random variable. Namely, we reverse the time sequence and view the process retrospectively. In order to make our argument simpler, we will assume in what follows that the process is time homogeneous. That is, if $x(t_1)$ and $x(t_2)$ are respectively frequencies of a gene at times $t_1$ and $t_2$ ($t_1 < t_2$) then the probability distribution of $x(t_2)$ given $x(t_1)$, which in general should be a function of $t_1$ and $t_2$ separately, depends only on the difference $t_2 - t_1$. Then, we have

$\displaystyle\phi(p,x;t+\delta t) = \int g(\delta p, p; \delta t)\,\phi(p+\delta p, x; t)\,d(\delta p).$

The above equation, which is a counterpart of (3.3), contains $g$ as a function of three variables only, i.e. $\delta p$, $p$ and $\delta t$. This is because the probability that the gene frequency changes from $p$ to $p + \delta p$ during the time interval of length $\delta t$ is the same for any $t$ (generation) due to the assumption of time homogeneity.

Expanding $\phi(p+\delta p, x; t)$ on the right-hand side of the above equation in terms of $\delta p$ and using relations (3.6), (3.7) and (3.8) we obtain

(3.27)  $\displaystyle\frac{\partial\phi(p,x;t)}{\partial t} = \frac{V(p)}{2}\,\frac{\partial^2\phi(p,x;t)}{\partial p^2} + M(p)\,\frac{\partial\phi(p,x;t)}{\partial p},$

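or in terms of one generation as a unit of time, we have

(3.28)  $\displaystyle\frac{\partial\phi(p,x;t)}{\partial t} = \frac{V_{\delta p}}{2}\,\frac{\partial^2\phi(p,x;t)}{\partial p^2} + M_{\delta p}\,\frac{\partial\phi(p,x;t)}{\partial p}.$

Note here that the initial gene frequency $p$ is the variable and $x$ is assumed to be constant. Mathematically, equation (3.27) is the Kolmogorov backward equation as applied to the time homogeneous case, and it is the adjoint form of (3.9).

When $x = 1$, $\phi$ in (3.28) gives the probability that the gene whose initial frequency was $p$ becomes fixed in the population by the $t$th generation. We will denote this probability by $u(p,t)$, for which we have

(3.29)  $\displaystyle\frac{\partial u(p,t)}{\partial t} = \frac{V_{\delta p}}{2}\,\frac{\partial^2 u(p,t)}{\partial p^2} + M_{\delta p}\,\frac{\partial u(p,t)}{\partial p}.$

The probability of fixation by a given time $t$ will then be obtained by solving the above equation with boundary conditions

$u(0,t) = 0, \qquad u(1,t) = 1.$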

In the present paper we will be especially interested in the ultimate probability of fixation defined by

$\displaystyle u(p) = \lim_{t \to \infty} u(p,t).$

For this probability,

$\displaystyle\frac{\partial u}{\partial t} = 0$

and $u(p)$ satisfies the ordinary differential equation

$\displaystyle\frac{V_{\delta p}}{2}\,\frac{d^2 u(p)}{dp^2} + M_{\delta p}\,\frac{du(p)}{dp} = 0$

with boundary conditions

$u(0) = 0, \qquad u(1) = 1.$

Equation (3.29) may readily be extended to multivariate cases: consider $n$ independent loci each with a pair of alleles, a normal and a mutant allele. We will denote by $p^{(i)}$ the initial frequency of the mutant gene at the $i$th locus ($i = 1, 2, \ldots, n$). Let $u(p^{(1)}, p^{(2)}, \ldots, p^{(n)}; t)$ be the probability that all the $n$ mutant genes become fixed in the population by the $t$th generation, given that their frequencies are $p^{(1)}, \ldots, p^{(n)}$ at $t = 0$. Then $u(p^{(1)}, \ldots, p^{(n)}; t)$ satisfies

In what follows, we will apply the method of partial differential equations to solve concrete problems arising in the theory of population genetics.
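The ultimate probability of fixation discussed above can be checked on a small example. Assuming the ordinary differential equation for $u(p)$ is $(V_{\delta p}/2)\,u'' + M_{\delta p}\,u' = 0$ with $u(0) = 0$ and $u(1) = 1$, the case $M_{\delta p} = 0$ reduces to $u'' = 0$ and hence $u(p) = p$. The sketch below compares this with an exact finite Markov chain in which the $2N$ genes of each generation are a binomial sample from the previous one. That sampling model, the chosen values of $N$ and the initial count, and the function names are assumptions made for the illustration.

```python
from math import comb


def transition_matrix(N):
    """Binomial sampling matrix: entry [i][j] is the probability that a
    population with i copies of the gene (out of 2N) has j copies in the
    next generation. Assumed model: 2N gametes drawn at random."""
    n = 2 * N
    return [[comb(n, j) * (i / n) ** j * (1 - i / n) ** (n - j)
             for j in range(n + 1)]
            for i in range(n + 1)]


def fixation_probability(N, i0, generations=500):
    """Probability mass in the fixed class (2N copies) after many generations,
    starting from i0 copies. The number of generations is an arbitrary
    cutoff chosen so that almost no mass remains in the unfixed classes."""
    n = 2 * N
    P = transition_matrix(N)
    dist = [0.0] * (n + 1)
    dist[i0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * P[i][j] for i in range(n + 1))
                for j in range(n + 1)]
    return dist[n]


N, i0 = 5, 3                      # 2N = 10 genes, initial frequency p = 0.3
u_chain = fixation_probability(N, i0)
u_diffusion = i0 / (2 * N)        # u(p) = p when M = 0
print(u_chain, u_diffusion)
```

For this neutral case the two values agree closely because the mean gene frequency does not change under pure random sampling; with selection or mutation the comparison would require the corresponding $M_{\delta p}$.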

4. Random drift in a small finite population

We will start our discussion from the simplest situation where mutation, migration and selection are absent, but the gene frequency fluctuates from generation to generation because of the random sampling of gametes in a finite population. The process of change in gene frequency in this simplified form has attracted considerable attention among evolutionary geneticists, and various names have been given to it. Fisher (1922) called it the "Hagedoorn effect". Since Wright's work (Wright, 1931), the term drift has become quite popular among biologists, and terms such as the Wright drift or the Sewall Wright effect have been coined. However, in the mathematical theory of Brownian motion, the term drift originally connotes directional movement of the particle; therefore, to use this term to denote the random process in our context, the adjective random should be attached to it.

    Let us consider an isolated population of N breeding diploid individuals. Let $A_1$ and $A_2$ be a pair of alleles with respective frequencies x and $1 - x$. We assume that mating is at random and that the mode of reproduction is such that N male and N female gametes are drawn as a random sample from the population to form the next generation. The mean and variance in the change of gene frequency x per generation are: $M_{\delta x} = 0$ and $V_{\delta x} = x(1-x)/(2N)$, the latter being the binomial variance corresponding to 2N genes. If mating is not random, or the distribution of the number of offspring does not follow a Poisson distribution, the effective number $N_e$ may be substituted for the actual number N (cf. Kimura and Crow, 1963).
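    The two moments quoted above can be checked directly from the binomial distribution of 2N sampled gametes. The sketch below is illustrative only; the function name `sampling_moments` and the example values x = 0.3, N = 50 are assumptions made for the demonstration.

```python
import math


def sampling_moments(x, n_diploid):
    """Exact mean and variance of the one-generation change in gene frequency
    when 2N gametes are drawn binomially from a population at frequency x."""
    genes = 2 * n_diploid
    mean = 0.0
    second = 0.0
    for k in range(genes + 1):
        prob = math.comb(genes, k) * x**k * (1.0 - x) ** (genes - k)
        dx = k / genes - x          # change in gene frequency
        mean += prob * dx
        second += prob * dx * dx
    return mean, second - mean * mean


mean, var = sampling_moments(0.3, 50)
# Compare with M = 0 and V = x(1-x)/(2N).
print(mean, var, 0.3 * 0.7 / (2 * 50))
```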

    Substituting the above expressions for $M_{\delta x}$ and $V_{\delta x}$ into (3.10), we obtain the partial differential equation

    $$\frac{\partial \phi}{\partial t} = \frac{1}{4N}\,\frac{\partial^2}{\partial x^2}\{x(1-x)\phi\}, \tag{4.1}$$

    where $\phi = \phi(p,x;t)$ is the probability density that the gene frequency becomes x in the tth generation, given that it is p at t = 0, i.e.

    $$\phi(p,x;0) = \delta(x - p), \tag{4.2}$$

    in which $\delta(\cdot)$ represents the Dirac delta function.

    To solve (4.1) we try a solution of the form

    $$\phi = TX,$$

    where T is a function of t only and X is a function of x only. Substituting this into (4.1) and dividing both sides of the equation by TX, we have

    $$\frac{1}{T}\,\frac{dT}{dt} = \frac{1}{4NX}\,\frac{d^2}{dx^2}\{x(1-x)X\}.$$

    By assumption, T is a function of t only and hence the left-hand side of the above equation depends only on t, while X is a function of x only, and hence the right-hand side of the equation depends only on x. It follows then that both sides of the equation must equal a constant which we shall designate by $-\lambda$. Thus the above equation can be separated into two ordinary differential equations

    $$\frac{dT}{dt} = -\lambda T \tag{4.3}$$

    and

    $$x(1-x)\,\frac{d^2X}{dx^2} + 2(1-2x)\,\frac{dX}{dx} - (2 - 4N\lambda)X = 0. \tag{4.4}$$

    From the first equation (4.3) we have

    $$T \propto e^{-\lambda t}.$$

    The second equation (4.4) is the hypergeometric equation

    $$x(1-x)X'' + [\gamma - (\alpha + \beta + 1)x]X' - \alpha\beta X = 0 \tag{4.5}$$

    in which $\gamma = 2$, $\alpha + \beta = 3$ and $\alpha\beta = 2 - 4N\lambda$. Thus we have

    $$\alpha = \frac{3 + \sqrt{1 + 16N\lambda}}{2} \quad \text{and} \quad \beta = \frac{3 - \sqrt{1 + 16N\lambda}}{2}.$$

    Though we cannot impose arbitrary conditions at the boundaries, we require a solution which is finite at the singular points (x = 0 and 1). Among the two independent solutions of (4.5), only one, i.e. $F(\alpha, \beta, \gamma, x)$, is finite at x = 0 in this case. In order to find the condition which makes $F(\alpha, \beta, 2, x)$ finite at the other singular point (x = 1), we make use of the following relation:

    If we note that $\alpha + \beta = 3$, we see that in order that $\lim_{x \to 1} F(\alpha, \beta, 2, x)$ be finite, $2 - \alpha$ must be a negative integer and $\beta$ must be 0 or a negative integer. Thus the only possible values of $\lambda$ are represented by

    $$\lambda_i = \frac{i(i+1)}{4N},$$

    where the i's are positive integers (i = 1, 2, 3, ...). Corresponding to this eigenvalue $\lambda_i$, we have $\alpha_i = 2 + i$ and $\beta_i = 1 - i$. Thus we can write

    $$X = F(2+i,\,1-i,\,2,\,x)$$

    except that it may be multiplied by a constant. Here it may be convenient to use the Gegenbauer polynomial defined by

    $$T^1_{i-1}(z) = \frac{i(i+1)}{2}\,F\!\left(i+2,\,1-i,\,2,\,\frac{1-z}{2}\right), \tag{4.7}$$

    so that we can put

    $$X = T^1_{i-1}(z),$$

    where $z = 1 - 2x$. The properties of the polynomial have been thoroughly studied (see for example, Morse and Feshbach, 1953, pp. 782-783).

    The complete solution of (4.1) may then be written in the form

    $$\phi = \sum_{i=1}^{\infty} C_i\,T^1_{i-1}(z)\,e^{-i(i+1)t/(4N)}.$$
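    The eigenvalue condition can be checked symbolically: with $4N\lambda_i = i(i+1)$ the hypergeometric series F(2+i, 1-i, 2, x) terminates, and the resulting polynomial should satisfy $x(1-x)X'' + 2(1-2x)X' - (2 - 4N\lambda)X = 0$ exactly. The following Python sketch, using exact rational arithmetic, is an illustration under that assumption; the function names are hypothetical.

```python
from fractions import Fraction


def hypergeometric_polynomial(i):
    """Coefficients (lowest power first) of F(2+i, 1-i, 2, x), which
    terminates because 1-i is zero or a negative integer."""
    a, b, c = 2 + i, 1 - i, 2
    coeffs = [Fraction(1)]
    for k in range(i - 1):
        coeffs.append(coeffs[-1] * (a + k) * (b + k) / ((c + k) * (k + 1)))
    return coeffs


def ode_residual(coeffs, i):
    """Coefficients of x(1-x)X'' + 2(1-2x)X' - (2 - i(i+1))X for polynomial X,
    i.e. the separated equation with 4*N*lambda set to i(i+1)."""
    res = [Fraction(0)] * (len(coeffs) + 1)
    for k, ck in enumerate(coeffs):
        if k >= 2:
            # x(1-x) * k(k-1) c_k x^(k-2) = k(k-1) c_k (x^(k-1) - x^k)
            res[k - 1] += k * (k - 1) * ck
            res[k] -= k * (k - 1) * ck
        if k >= 1:
            # 2(1-2x) * k c_k x^(k-1) = k c_k (2 x^(k-1) - 4 x^k)
            res[k - 1] += 2 * k * ck
            res[k] -= 4 * k * ck
        res[k] -= (2 - i * (i + 1)) * ck
    return res


for i in range(1, 5):
    poly = hypergeometric_polynomial(i)
    print(i, [str(c) for c in poly], all(r == 0 for r in ode_residual(poly, i)))
```

    For i = 3 the polynomial is $1 - 5x + 5x^2$, which equals $T^1_2(z)/6$ with $z = 1 - 2x$.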

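    The coefficients $C_i$ can be determined by applying the initial condition that the population starts from the gene frequency p, namely from (4.2), that

    $$\sum_{i=1}^{\infty} C_i\,T^1_{i-1}(z) = \delta(x - p).$$

    Multiplying by $(1 - z^2)\,T^1_m(z)$ on both sides of the above equation, and using the orthogonality property

    $$\int_{-1}^{1} (1 - z^2)\,T^1_m(z)\,T^1_{i-1}(z)\,dz = \delta_{m,i-1}\,\frac{2(m+1)(m+2)}{2m+3},$$

    where m in the Kronecker $\delta_{m,i-1}$ represents zero or a positive integer, we obtain

    $$C_i = \frac{(2i+1)(1-r^2)}{i(i+1)}\,T^1_{i-1}(r).$$

    Therefore, the required solution of (4.1) is

    $$\phi(p,x;t) = \sum_{i=1}^{\infty} \frac{(2i+1)(1-r^2)}{i(i+1)}\,T^1_{i-1}(r)\,T^1_{i-1}(z)\,e^{-i(i+1)t/(4N)}, \tag{4.9}$$

    where $r = 1 - 2p$ and $T^1_0(r) = 1$, $T^1_1(r) = 3r$, $T^1_2(r) = \frac{3}{2}(5r^2 - 1)$, $T^1_3(r) = \frac{5}{2}(7r^3 - 3r)$, etc.

    In terms of the hypergeometric function $F(\cdot,\cdot,\cdot,\cdot)$, equation (4.9) may be expressed in the form

    $$\phi(p,x;t) = \sum_{i=1}^{\infty} p(1-p)\,i(i+1)(2i+1)\,F(1-i,\,i+2,\,2,\,p)\,F(1-i,\,i+2,\,2,\,x)\,e^{-i(i+1)t/(4N)}. \tag{4.10}$$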

    For t > 0, the series is uniformly convergent in x and p. This may be easily seen if we note that the exponential term approaches zero rapidly.

    Based on this solution, the process of change in the probability distribution of gene frequency when the population starts from p = 0.5 and 0.1 is illustrated in Figures 2a and 2b. In these figures, the abscissa represents the gene frequency x and the ordinate the probability density $\phi$. In discussing such a distribution, it is often convenient to adopt the "frequency interpretation" of probability, regarding the distribution curve as representing relative frequencies of various gene frequency classes in the infinite collection of populations having the same size and subjected to the same conditions. The area under each curve represents the probability that $A_1$ and $A_2$ coexist in the population. It may be seen from the figures that this probability gradually decreases with time. In other words, the frequency of unfixed classes decreases with increasing numbers of generations. This is because if, by random change, the gene frequency becomes either x = 1 (fixation of $A_1$ or loss of $A_2$ from the population) or x = 0 (loss of $A_1$), it cannot return to intermediate values because of our assumption that no mutation occurs. Namely, genes go irreversibly into fixation (or loss). In the language of probability theory, boundaries at x = 0 and x = 1 act as absorbing barriers. From (4.10) it can be seen that the probability distribution finally becomes flat and decreases its height at the rate of 1/(2N) per generation. This is known as the state of steady decay, and mathematically 1/(2N) corresponds to the smallest eigenvalue of the partial differential equation involved. This rate of decay is the most important single quantity used to describe the process of random drift in the narrow sense, and it was first determined by Wright (1931). Thus we have

    $$\phi(p,x;t) \sim 6p(1-p)\,e^{-t/(2N)}. \tag{4.11}$$

    Figures 2a and 2b The process of change in the probability distribution of gene frequency, due to random sampling of gametes in reproduction. It is assumed that the population starts from the gene frequency 0.5 in Figure 2a and 0.1 in Figure 2b. t indicates time (in generations), and N the effective size of the population. The abscissa is gene frequency, the ordinate is the probability density. (From Kimura, 1955a)

    The number of generations after which this asymptotic formula becomes useful depends on the initial frequency p. For example, with p = 0.5, it will be seen from Figure 2a that the distribution curve becomes almost flat after 2N generations, and the genes are still unfixed in about 50 per cent of the cases. On the other hand, with p = 0.1 (see Figure 2b) it takes 4N or 5N generations before the distribution curves become practically flat. By that time, however, the genes are fixed in more than 90 per cent of the cases and the asymptotic formula (4.11) may not be as useful as in the case of p = 0.5.

    The probability that $A_1$ and $A_2$ coexist in the population of the tth generation is given by

    $$\Omega_t = \int_0^1 \phi(p,x;t)\,dx = \sum_{j=0}^{\infty} \{P_{2j}(r) - P_{2j+2}(r)\}\,e^{-(2j+1)(2j+2)t/(4N)}, \tag{4.12}$$

    where $r = 1 - 2p$ and $P(\cdot)$ represents the Legendre polynomials $P_0(r) = 1$, $P_1(r) = r$, $P_2(r) = \frac{1}{2}(3r^2 - 1)$, $P_3(r) = \frac{1}{2}(5r^3 - 3r)$, etc. Thus we have the asymptotic formula

    $$\Omega_t \sim 6p(1-p)\,e^{-t/(2N)}. \tag{4.13}$$

    The frequency of heterozygotes or the probability that an individual in a population is heterozygous can also be calculated by using (4.10) to obtain

    $$H_t = 2p(1-p)\,e^{-t/(2N)}. \tag{4.14}$$

    This shows that heterozygosity decreases exactly at the rate of 1/(2N) per generation. Actually, this holds also for multiallelic cases and is independent of the number of alleles involved.

    At this point a few remarks are in order. First, it should be noted that homozygosity or heterozygosity of an individual within a population is a distinct concept from the genetic homogeneity or heterogeneity of the population itself. Wright used the term homallelic or heterallelic; a population is homallelic if it contains only one kind of allele, and is heterallelic if it contains two or more. Secondly, the probability of heterozygosity decreases at the rate of exactly 1/(2N) per generation under random mating as shown by (4.14), while as shown in (4.12) the probability of coexistence of both alleles within a population, though continuously diminishing in each generation, does not generally decrease at a constant rate even for a population of constant size N. Its rate of decrease approaches the final value of 1/(2N) only asymptotically.
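    The contrast between the two rates can be illustrated with the discrete Wright-Fisher chain that the diffusion approximates. The sketch below is an assumption-laden illustration rather than the paper's method: it uses a small population (2N = 20 genes), the per-generation factor $1 - 1/(2N)$ as the discrete counterpart of the rate 1/(2N), and hypothetical helper names.

```python
import math


def transition_matrix(genes):
    """Wright-Fisher transition probabilities among gene counts 0..genes."""
    rows = []
    for j in range(genes + 1):
        x = j / genes
        rows.append([math.comb(genes, k) * x**k * (1.0 - x) ** (genes - k)
                     for k in range(genes + 1)])
    return rows


def step(dist, matrix):
    """Advance the distribution of gene counts by one generation."""
    size = len(dist)
    return [sum(dist[j] * matrix[j][k] for j in range(size)) for k in range(size)]


def heterozygosity(dist):
    """Expected value of 2x(1-x) over the distribution of gene counts."""
    genes = len(dist) - 1
    return sum(pr * 2.0 * (k / genes) * (1.0 - k / genes) for k, pr in enumerate(dist))


def coexistence(dist):
    """Probability that both alleles are still present."""
    return sum(dist[1:-1])


genes = 20                          # 2N with N = 10
matrix = transition_matrix(genes)
dist = [0.0] * (genes + 1)
dist[genes // 2] = 1.0              # start from p = 0.5
h_ratios, omega_ratios = [], []
for _ in range(60):
    new = step(dist, matrix)
    h_ratios.append(heterozygosity(new) / heterozygosity(dist))
    omega_ratios.append(coexistence(new) / coexistence(dist))
    dist = new

print(h_ratios[0], h_ratios[-1])          # constant factor 1 - 1/(2N)
print(omega_ratios[0], omega_ratios[-1])  # starts near 1, approaches 1 - 1/(2N)
```

    In this example the heterozygosity ratio is the same in every generation, whereas the coexistence ratio only settles to the same value after the distribution has flattened.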

    The above treatments do not directly give the probability of absorption, namely the probability of reaching fixation by a given generation t, starting from an intermediate gene frequency p. This may be obtained by the use of the backward equation (3.28) assuming $M_{\delta p} = 0$ and $V_{\delta p} = p(1-p)/(2N)$. It turns out to be

    $$u(p,t) = p + \sum_{i=1}^{\infty} (2i+1)\,p(1-p)\,(-1)^i\,F(1-i,\,i+2,\,2,\,p)\,e^{-i(i+1)t/(4N)}. \tag{4.15}$$

    This was first obtained from the study of the moments of the distribution (Kimura, 1955a). In the present case, u(p,t) is equivalent to f(1,t). In terms of Legendre polynomials, (4.15) can be expressed also as

    $$f(1,t) = p + \sum_{i=1}^{\infty} \frac{(-1)^i}{2}\,\{P_{i-1}(r) - P_{i+1}(r)\}\,e^{-i(i+1)t/(4N)}, \tag{4.16}$$

    where $r = 1 - 2p$. Using relation (3.19), it can also be obtained by integrating $\phi(p,1;\tau)/(4N)$ with respect to $\tau$ from $\tau = 0$ to $\tau = t$. The probability f(0,t) of $A_1$ being lost or $A_2$ being fixed by the tth generation is obtained simply by replacing p with $1 - p$ and r with $-r$ in the above expressions. It is then possible to show that

    So far, we have assumed that the population contains a pair of alleles at the start. If a population contains more than two alleles, the problem becomes much more difficult. For example, suppose that the population contains three alleles $A_1$, $A_2$ and $A_3$ with respective frequencies of $x_1$, $x_2$ and $1 - x_1 - x_2$. Then, the probability density $\phi(p_1, p_2; x_1, x_2; t)$ that the frequencies of $A_1$ and $A_2$ become respectively $x_1$ and $x_2$ at time t, given that they are $p_1$ and $p_2$ at t = 0, satisfies the following equation (cf. equation 3.22)

    $$\frac{\partial \phi}{\partial t} = \frac{1}{4N}\left\{\frac{\partial^2}{\partial x_1^2}[x_1(1-x_1)\phi] - 2\,\frac{\partial^2}{\partial x_1\,\partial x_2}[x_1 x_2 \phi] + \frac{\partial^2}{\partial x_2^2}[x_2(1-x_2)\phi]\right\},$$

    where $0 < x_1 < x_1 + x_2 < 1$. The solution of this equation with the initial condition

    $$\phi(p_1, p_2; x_1, x_2; 0) = \delta(x_1 - p_1)\,\delta(x_2 - p_2)$$

    is (cf. Kimura, 1956a)

    where

    Here $x_3 = 1 - x_1 - x_2$, $p_3 = 1 - p_1 - p_2$, and $T(\cdot)$ and $J_j(\cdot,\cdot,\cdot)$ denote respectively the Gegenbauer and Jacobi polynomials. The latter is expressed in terms of the hypergeometric function as follows:

    In particular $J_0(a, c, p) = 1$, $J_1(5, 4, p) = 1 - \frac{3}{2}p$, etc.

    For the general case of an arbitrary number of alleles, the exact solution has not been obtained. Nevertheless, the asymptotic behavior of the processes has been successfully analysed and we have the following result (cf. Kimura, 1955b).

    If we start from a population which contains n alleles, say $A_1, A_2, \ldots, A_n$ with frequencies $p_1, p_2, \ldots, p_n$ respectively ($\sum_i p_i = 1$), the probability density that it contains k of them, say $A_1, A_2, \ldots, A_k$ with respective frequencies $x_1, x_2, \ldots, x_k$ ($\sum_i x_i = 1$) in the tth generation is given asymptotically by

    where $k \le n$. The validity of this formula depends on the assumption that the population size N is sufficiently large as compared with n, the number of alleles in question.

    The above result indicates that as the number of coexisting alleles increases, the rate at which a particular state is eliminated by random drift increases rapidly. In this sense, random drift may be effective in keeping down the number of coexisting alleles in the population.

    5. An approximate treatment by the angular transformation

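    It has been noted by Fisher (1922) that if we transform the gene frequency from x to $\theta$ by $\cos\theta = 1 - 2x$, the sampling variance becomes independent of gene frequency. Here $\theta$ changes from 0 to $\pi$ as x changes from 0 to 1. The rate of change of $\theta$ per generation, i.e. $\delta\theta$, is related to $\delta x$ as follows

    $$\delta\theta = \frac{\delta x}{\sqrt{x(1-x)}} - \frac{1-2x}{4\,[x(1-x)]^{3/2}}\,(\delta x)^2 + \cdots. \tag{5.1}$$

    Thus, taking $M_{\delta x} = 0$ and $V_{\delta x} = x(1-x)/(2N)$, and neglecting the higher order terms, we get

    $$M_{\delta\theta} = -\frac{\cot\theta}{4N}, \qquad V_{\delta\theta} = \frac{1}{2N}. \tag{5.2}$$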
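    Both properties of the angular transformation, a sampling variance close to 1/(2N) regardless of x and a mean change that is small but not zero, can be checked against exact binomial sampling. The following sketch is illustrative only; the function name `angular_moments` and the example values (N = 500, x = 0.3 and 0.1) are assumptions, and the comparison value $-\cot\theta/(4N)$ for the mean is the second-order Taylor approximation.

```python
import math


def angular_moments(x, n_diploid):
    """Exact mean and variance of the one-generation change in
    theta = arccos(1 - 2x) under binomial sampling of 2N gametes."""
    genes = 2 * n_diploid
    theta0 = math.acos(1.0 - 2.0 * x)
    log_x, log_y = math.log(x), math.log(1.0 - x)
    mean = second = 0.0
    for k in range(genes + 1):
        # Binomial probability evaluated in log space to avoid overflow.
        log_prob = (math.lgamma(genes + 1) - math.lgamma(k + 1)
                    - math.lgamma(genes - k + 1)
                    + k * log_x + (genes - k) * log_y)
        d_theta = math.acos(1.0 - 2.0 * k / genes) - theta0
        prob = math.exp(log_prob)
        mean += prob * d_theta
        second += prob * d_theta**2
    return mean, second - mean**2


n = 500
for x in (0.3, 0.1):
    mean, var = angular_moments(x, n)
    theta = math.acos(1.0 - 2.0 * x)
    # Variance times 2N should be close to 1; mean close to -cot(theta)/(4N).
    print(x, var * 2 * n, mean, -1.0 / (math.tan(theta) * 4 * n))
```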

    If we start from a fixed gene frequency, the variance in $\theta$ after t generations may be given approximately by $V_\theta(t) = t/(2N)$, if t is much smaller than N. It should be noted here, however, that the expected value of $\delta\theta$ is not strictly zero, i.e. the expression $M_{\delta\theta} = 0$ which might be obtained by neglecting the second and following terms on the right-hand side of (5.1) is incorrect. This may not produce any trouble in the treatment of variance for a short period, but will cause a serious error in the gene frequency distribution after a large number of generations. Let $\psi(\theta,t)$ be the probability density of $\theta$ at the tth generation. We obtain

    $$\frac{\partial \psi}{\partial t} = \frac{1}{4N}\,\frac{\partial^2 \psi}{\partial \theta^2} + \frac{1}{4N}\,\frac{\partial}{\partial \theta}(\psi \cot\theta),$$

    which is the Kolmogorov forward equation in terms of $\theta$ for this case (cf. 3.10 and 5.2). The above equation was given by Fisher (1930) as the correct equation to replace the erroneous one which he had given earlier (Fisher, 1922), i.e.

    $$\frac{\partial \psi}{\partial t} = \frac{1}{4N}\,\frac{\partial^2 \psi}{\partial \theta^2}.$$

    The latter equation was obtained by taking $M_{\delta\theta} = 0$ and gave the incorrect value of 1/(4N) as the rate of steady decay, rather than the correct value of 1/(2N) obtained by Wright (1931). The fact that the sampling variance becomes constant by the angular transformation is nevertheless convenient for treating data on random drift over a relatively short period (cf. Bodmer, 1960).

    One of the most interesting applications of this type of transformation was given by Cavalli and Conterio (1960), who analysed the distribution of blood group genes in the Parma River Valley. Their method is based on the concept of "distance" as suggested by Fisher. Consider a locus with multiple alleles $A_1, A_2, \ldots, A_n$. In order to characterize the genetic constitution of a population, we use n-dimensional Cartesian coordinates with each axis representing the square root of one of the allelic frequencies. A population which contains these alleles with respective frequencies of $x_1, x_2, \ldots, x_n$ may be located on a hypersphere with radius 1. Figure 3 illustrates the case with three alleles. Let $p_1, p_2, \ldots, p_n$ be the corresponding allelic frequencies in some other population. Then the coefficient of genetic distance $\theta$ between these two populations may be defined by

    $$\cos\theta = \sum_{i=1}^{n} \sqrt{x_i\,p_i}.$$

    Figure 3 This illustrates the concept of the genetic distance for n = 3. (Redrawn from Cavalli and Conterio, 1960, with a slight modification)

    Geometrically, $\theta$ is the angle made by the two vectors $(\sqrt{x_1}, \ldots, \sqrt{x_n})$ and $(\sqrt{p_1}, \ldots, \sqrt{p_n})$.

    $$V = \frac{1}{8N} = \frac{1}{4\,(\text{Number of genes})}, \tag{5.6}$$

    where N is the effective size of the population. As in the angular transformation $\theta = \cos^{-1}(1 - 2x)$, the expected value of $\delta\theta$ in this case is not zero, but this may not cause any serious error in treating the variance of $\theta$, as long as the number of generations involved is much smaller than N. Furthermore, if $\theta_a$ is the genetic distance of a population from the general population with respect to the first locus consisting of alleles $A_1, A_2, \ldots, A_n$, and if $\theta_b$ is the corresponding distance with respect to the second locus consisting of alleles $B_1, B_2, \ldots, B_m$, then the distance $\theta_{ab}$ with respect to these two loci combined is given by $\cos\theta_{ab} = \cos\theta_a \cos\theta_b$.

    Using these relations, Cavalli and Conterio studied the regression of $\theta$ on population density, village size and "dimensionality". For details, their original paper (Cavalli and Conterio, 1960) should be referred to.
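    The distance measure is easy to compute. The sketch below assumes the definition of $\theta$ as the angle between the square-root frequency vectors, $\cos\theta = \sum_i \sqrt{x_i p_i}$, together with the product rule for two loci; the function names and the example frequencies are illustrative and are not taken from Cavalli and Conterio's data.

```python
import math


def genetic_distance(freqs_a, freqs_b):
    """Angle between the square-root allele frequency vectors of two populations."""
    cos_theta = sum(math.sqrt(x * p) for x, p in zip(freqs_a, freqs_b))
    return math.acos(min(1.0, cos_theta))   # guard against rounding above 1


def combined_distance(theta_a, theta_b):
    """Distance for two loci combined: cos(theta_ab) = cos(theta_a) cos(theta_b)."""
    return math.acos(math.cos(theta_a) * math.cos(theta_b))


# Two populations scored at a three-allele locus and a two-allele locus.
theta_a = genetic_distance([0.2, 0.3, 0.5], [0.4, 0.4, 0.2])
theta_b = genetic_distance([0.2, 0.8], [0.7, 0.3])
print(theta_a, theta_b, combined_distance(theta_a, theta_b))
```

    For a pair of alleles this angle is half the difference between the two populations' values of $\cos^{-1}(1 - 2x)$, which is consistent with the sampling variance in (5.6) being one quarter of 1/(2N).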

    6. A population under linear pressure and random sampling of gametes

    Under the term linear pressure, we include the pressures of gene mutations and of migration. Usually the rate of mutation is so low that although supplying the raw material for evolution, it can hardly determine the course of change in gene frequency. On the other hand, migration between sub-populations may be of considerable significance in determining the gene frequency, as will be found in Wright's theory.

    Consider a random mating population of effective size N in which the frequencies of a pair of alleles $A_1$ and $A_2$ are x and $1 - x$ respectively. Let us suppose that this population exchanges individuals with a random sample taken from the total species at the rate of m per generation. Then the mean and variance of the rate of change in x are given by

    $$M_{\delta x} = m(\bar{x} - x), \qquad V_{\delta x} = x(1-x)/(2N), \tag{6.1}$$

    where $\bar{x}$ is the frequency of $A_1$ in the immigrants. If mutation rates are not negligible, m may be replaced by $m + \mu + \nu$ and $m\bar{x}$ by $m\bar{x} + \nu$, where $\mu$ and $\nu$ are respectively the mutation rates of $A_1$ to and from its allele $A_2$. Though the pressure of selection is intrinsically non-linear, in certain cases, like that of selection acting at the neighborhood of the equilibrium gene frequency, it may be treated as if it were linear with good approximation. However, the range of applicability is quite restricted. The solution of (3.10) when $M_{\delta x}$ and $V_{\delta x}$ are given by (6.1) was obtained by the author through the study of the moments of the distribution (Crow and Kimura, 1956), and it was found that it agrees with the "fundamental solution with flux zero boundary condition" derived by Goldberg (1950). The solution is given by

    where

    $$X_i(x) = x^{B-1}(1-x)^{A-B-1}\,F(A+i-1,\,-i,\,A-B,\,1-x),$$

    in which A = 4Nm and $B = 4Nm\bar{x}$.

    Figures 4a, 4b and 4c Asymptotic behavior of the distribution curve for a finite population with migration or other linear pressure. In all three drawings, the gene frequency of the immigrants is assumed to be 0.5 and the initial frequency in the population 0.2. The abscissa is the gene frequency x, the ordinate is the probability density $\phi$. N represents population number, and m the rate of migration. (From Crow and Kimura, 1956)

    Figures 4a, 4b and 4c show the asymptotic behavior of the distribution curve for three different cases: 4Nm = 0.2, 4Nm = 2 and 4Nm = 6. In all three cases illustrated, the gene frequency $\bar{x}$ of the immigrants is 0.5 and the initial gene frequency p of the population is assumed to be 0.2.

    As $t \to \infty$, our formula (6.2) converges to Wright's well known formula for the steady state gene frequency distribution with migration

    $$\phi(x) = \frac{\Gamma(4Nm)}{\Gamma(4Nm\bar{x})\,\Gamma(4Nm(1-\bar{x}))}\,x^{4Nm\bar{x}-1}(1-x)^{4Nm(1-\bar{x})-1}.$$
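    Wright's steady state distribution under migration is commonly written as a beta density with parameters $4Nm\bar{x}$ and $4Nm(1-\bar{x})$. The sketch below evaluates it under that assumption; the function name `wright_migration_density` and the parameter values, chosen to give 4Nm = 6 and 4Nm = 0.2, are illustrative.

```python
import math


def wright_migration_density(x, n_eff, m, x_bar):
    """Steady-state density of gene frequency x under migration and drift,
    written as a beta density with parameters 4Nm*x_bar and 4Nm*(1 - x_bar)."""
    a = 4.0 * n_eff * m * x_bar
    b = 4.0 * n_eff * m * (1.0 - x_bar)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


# 4Nm = 6: unimodal around the immigrant frequency 0.5.
print([round(wright_migration_density(x, 150, 0.01, 0.5), 4) for x in (0.1, 0.5, 0.9)])
# 4Nm = 0.2: U-shaped, with most populations near loss or fixation.
print([round(wright_migration_density(x, 5, 0.01, 0.5), 4) for x in (0.1, 0.5, 0.9)])
```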

    7. Change of gene frequency under selection and random sampling of gametes

    7.1. Genic selection (Case of no dominance). Since selection, either natural or artificial, is always at work in the process of evolution or breeding, it is extremely important to study the effect of selection under random sampling of gametes. We will start from the simplest case of genic selection and consider a random mating population of size N, in which $A_1$ and $A_2$ occur with respective frequencies x and $1 - x$. Let s be the selective advantage of $A_1$ over $A_2$ such that the average rate of change in x per generation is $M_{\delta x} = sx(1-x)$. We take, as before, $V_{\delta x} = x(1-x)/(2N)$, which is the variance due to random sampling of gametes. With these expressions, (3.10) becomes

    $$\frac{\partial \phi}{\partial t} = \frac{1}{4N}\,\frac{\partial^2}{\partial x^2}\{x(1-x)\phi\} - s\,\frac{\partial}{\partial x}\{x(1-x)\phi\}. \tag{7.1}$$

    The process of change in gene frequency is analogous to the one studied in Section 4, but here selection is superimposed. The boundaries x = 0 and x = 1 act as absorbing barriers and the probability that a population contains both alleles $A_1$ and $A_2$ gradually decreases with time. Finally, it decreases at a constant rate which is given by the smallest eigenvalue ($\lambda_0$) of the above equation. In this state of steady decay, the distribution curve retains constancy of form, but its height decreases at the rate of $\lambda_0$ per generation. Probably, the smallest eigenvalue $\lambda_0$ is the most important single quantity in the representation of this stochastic process. The above equation has been used to analyse the gene frequency change in a very small experimental population of Drosophila melanogaster (Wright and Kerr, 1954). In this paper, Wright devised an ingenious method for analysing the process of steady decay; the complete solution of the above equation has been obtained by the present author (Kimura, 1955c and Crow and Kimura, 1956). In order to solve the equation (7.1), let us put

    $$\phi = V(x)\,e^{2cx - \lambda t},$$

    where c = Ns and V(x) is a function of x only. If we substitute this in (7.1), we have

    $$x(1-x)\,\frac{d^2V}{dx^2} + 2(1-2x)\,\frac{dV}{dx} - \{2 - 4N\lambda + 4c^2x(1-x)\}V = 0. \tag{7.2}$$

    Then by the substitution

    $$x = (1 - z)/2,$$

    the above equation (7.2) becomes

    $$(1 - z^2)\,\frac{d^2V}{dz^2} - 4z\,\frac{dV}{dz} + \{4N\lambda - 2 - c^2(1 - z^2)\}V = 0, \tag{7.3}$$

    where $z = 1 - 2x$ ($-1 < z < 1$). This type of differential equation is known as the oblate spheroidal equation. We want here the solutions which are finite at the singularities, $z = \pm 1$, and reduce to the Gegenbauer polynomials if there is no selection (Ns = c = 0). Such a solution has been studied by Stratton and others (1941) and is expressed in the form

    $$V^{(1)}_{1k}(z) = {\sum_{n}}'\, f^k_n\,T^1_n(z), \tag{7.4}$$

    where k = 0, 1, 2, ... (k here corresponds to $l$ in the notations of Stratton et al.). In the above expression, the $f^k_n$'s are constants and the $T^1_n(z)$'s are the Gegenbauer polynomials defined by (4.7). The primed summation is over even values of n if k is even, odd values of n if k is odd.

    The desired solution of (7.2) is given by summing the $V^{(1)}_{1k}(z)$ for all possible values of k, after having multiplied through by $e^{2cx - \lambda_k t}$, where $\lambda_k$ is the kth eigenvalue; then

    $$\phi(p,x;t) = \sum_{k=0}^{\infty} C_k\,e^{-\lambda_k t + 2cx}\,V^{(1)}_{1k}(z). \tag{7.5}$$

    The coefficient $C_k$ can be determined by the initial condition

    $$\phi(p,x;0) = \delta(x - p),$$

    using the orthogonal relation

    Thus we have

    where $r = 1 - 2p$, c = Ns and the primed summation is over even values of n if k is even, over odd values of n if k is odd. The solution (7.5) with coefficients defined by (7.6) gives the probability distribution of the gene frequency among unfixed classes.

    As t increases, the exponential terms in (7.5) decrease in absolute value very rapidly, and for large t only the first few terms are important. The numerical values of the first few eigenvalues $\lambda_0$, $\lambda_1$ and $\lambda_2$ can be obtained from the tables of the separation constants ($B_{1k}$) in the book by Stratton et al. (1941), by using the relation

    Among them, the smallest eigenvalue $\lambda_0$ gives the final rate of decay and has special importance. For small values of c, $\lambda_0$ can be expanded into a power series in c,

    In the new table of spheroidal wave functions by Stratton et al. (1956), "t" is tabulated for c (denoted by g in the table) up to c = 8.0 (pp. 506-508), from which values of $2N\lambda_0$ may be obtained by the relation

    In Figure 5, the relation between $2N\lambda_0$ and Ns is plotted for Ns from 0 to 8.0.
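    The dependence of the rate of steady decay on the intensity of selection can be explored numerically without the spheroidal function tables. The sketch below is not the method of the paper: it replaces the diffusion by a discrete Wright-Fisher chain with genic selection, assumes the update $x \to x(1+s)/(1+sx)$ before sampling, and estimates the leading eigenvalue of the unfixed classes by power iteration. The function name `decay_rate` and the population size are illustrative.

```python
import math


def decay_rate(n_diploid, s, iterations=400):
    """Estimate 2N*lambda_0 for a Wright-Fisher population with genic selection.

    lambda_0 is taken as 1 - rho, where rho is the leading eigenvalue of the
    transition matrix restricted to the unfixed classes 1..2N-1.
    """
    genes = 2 * n_diploid
    rows = []
    for j in range(1, genes):
        x = j / genes
        x_sel = x * (1.0 + s) / (1.0 + s * x)      # frequency after selection
        rows.append([math.comb(genes, k) * x_sel**k * (1.0 - x_sel) ** (genes - k)
                     for k in range(1, genes)])
    size = genes - 1
    v = [1.0 / size] * size
    rho = 1.0
    for _ in range(iterations):
        w = [sum(v[j] * rows[j][k] for j in range(size)) for k in range(size)]
        total = sum(w)
        rho = total / sum(v)            # fraction of unfixed classes retained
        v = [wk / total for wk in w]    # renormalise the unfixed distribution
    return genes * (1.0 - rho)


for c in (0.0, 0.5, 1.0):
    n = 10
    print(c, decay_rate(n, c / n))      # c = Ns
```

    With no selection the estimate equals 1, the neutral value of $2N\lambda_0$; the values for c > 0 are only rough counterparts of the diffusion results because N is small here.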

    Figure 5 Relation between the rate of steady decay and intensity of selection as illustrated in terms of the relation between $2N\lambda_0$ and Ns, where N is the effective size of the population, $\lambda_0$ is the rate of steady decay and s is the selection coefficient relating to genic selection between a pair of alleles. (From Kimura, 1955c)

    The eigenfunctions $V^{(1)}_{1k}(z)$ corresponding to the $\lambda_k$'s are given by (7.4). The coefficients $f^k_n$ corresponding to the first three eigenvalues are found in the table of Stratton et al. For c = 0, all the formulae given above reduce to the ones for the case of pure random drift studied in Section 4.

    The first eigenfunction $V^{(1)}_{10}(z)$ which corresponds to $\lambda_0$ is of special significance, since it gives the frequency distribution of unfixed classes in the state of steady decay, when it is multiplied by $e^{c(1-z)}$. It is expressed by

    Figure 6 illustrates the shape of the distribution curve in the state of steady decay for three different cases: Ns = 0, Ns = 1.0 and Ns = 1.7. The area under each curve is adjusted so that it is unity.

    The frequencies of both terminal classes can be obtained by using the relations (3.19) and (3.18):

    Figure 6 Frequency distribution of unfixed classes at the stage of steady decay is illustrated for three values of Ns. The area under each curve is adjusted so that it is unity. Numbers beside the arrows indicate rates of steady decay. N is the effective size of population, and s is the selection coefficient. (From Kimura, 1955c)

    and

    (7.10)

    where

    and

    So far we have considered genic selection in which the gene is additive (i.e. no dominance) with respect to fitness. The case of complete dominance is more difficult to treat, but the process of steady decay has been worked out for weak selection (Kimura, 1957). Also, for the more general case of an arbitrary degree of dominance, the smallest eigenvalue can be given as a power series for weak selection. Namely, if the selective advantages of $A_1A_1$ and $A_1A_2$ over $A_2A_2$ are s and sh respectively, such that $M_{\delta x} = s[h + (1 - 2h)x]\,x(1-x)$, then

    $$2N\lambda_0 = 1 + K_1 c + K_2 c^2 + K_3 c^3 + \cdots, \tag{7.11}$$

    where

    $$K_1 = -\frac{1}{5}D, \qquad K_2 = \frac{1}{2 \cdot 5} + \frac{2^2 \cdot 3}{5^3 \cdot 7}D^2, \qquad K_3 = \frac{1}{2 \cdot 5^3 \cdot 7}D - \frac{2^2}{5^5 \cdot 7}D^3, \quad \text{etc.},$$

    in which c = Ns and $D = 2h - 1$. It may be noted that for the case of no dominance (D = 0), the above series (7.11) agrees with (7.7) provided that 2s is used instead of s to express the selective advantage of the homozygous mutants.

    7.2. Case of overdominance. It has been known since the early work of Fisher (1922) that, in an infinite population, heterozygote superiority in fitness for a pair of alleles leads to a stable polymorphism. Furthermore, a considerable number of claims have been made in recent years stating in effect that overdominance is the major factor for maintaining genetic variability in natural populations. Therefore, investigation of the overdominant case assuming a finite population number will be of interest. Let us assume a pair of overdominant alleles $A_1$ and $A_2$ and designate by $s_1$ and $s_2$ (both positive) the selection coefficients against the homozygotes $A_1A_1$ and $A_2A_2$ respectively, such that the average rate of change in the frequency of $A_2$ is $M_{\delta x} = [s_1 - (s_1 + s_2)x]\,x(1-x)$. The variance of $\delta x$ is again given by $V_{\delta x} = x(1-x)/(2N)$. The partial differential equation corresponding to (3.10) with the $M_{\delta x}$ and $V_{\delta x}$ given above is

    $$\frac{\partial \phi}{\partial t} = \frac{1}{4N}\,\frac{\partial^2}{\partial x^2}\{x(1-x)\phi\} - \frac{\partial}{\partial x}\{[s_1 - (s_1 + s_2)x]\,x(1-x)\phi\},$$

    where x is the frequency of $A_2$.

    Let $\bar{s}$ be the average of the two selection coefficients, i.e. $\bar{s} = (s_1 + s_2)/2$, and let $\hat{x}$ be the equilibrium frequency of $A_1$ that may be expected in an infinitely large population, i.e. $\hat{x} = s_2/(s_1 + s_2)$; then the above equation may be expressed in the form,

    or denoting $2\hat{x} - 1 = \hat{z}$,

    The smallest eigenvalue $\lambda_0$ of the above equation has been worked out by Miller (1962). Without loss in generality, we can take $\hat{x} \ge 0.5$ or $\hat{z} \ge 0$. For a large value of $c = N\bar{s} = N(s_1 + s_2)/2$ and for the range $1 > \hat{z} \ge 0$, Miller obtained the asymptotic expansion

    where

    in which the $C_i$'s are given by the recurrence relation

    $$(1 - \hat{z}^2)\,C_{i+1} = 2\hat{z}\,C_i + C_{i-1},$$

    starting from $C_0 = 1/(1 - \hat{z}^2)$ and $C_1 = 2\hat{z}/(1 - \hat{z}^2)^2$.

    In particular, when $\hat{z} = 0$, that is when $s_1 = s_2$, (7.15) reduces to

    Also, when $2N(s_1 + s_2)(1 - \hat{x})^2$ is large, (7.15) may be replaced by

    Miller has also obtained $\lambda_0$ for various values of c ($\ge 0$) up to c = 12 by numerical analysis. It may be noted here that for small values of $N\bar{s}$, the eigenvalue may be calculated from the power series (7.11) by putting $c = N(s_1 - s_2)$ and $cD = N(s_1 + s_2) = 2N\bar{s}$ in it. Figure 7 is constructed on the basis of his numerical results, giving $2N\lambda_0$ as a function of c for the cases of $\hat{x}$ = 0.5, 0.7, 0.8 and 0.9. One of the most remarkable features disclosed in the figure seems to be that if $s_1$ and $s_2$ differ to such an extent that the equilibrium frequency $\hat{x}$ is higher than 0.8 (or, because of symmetry, less than 0.2), overdominance tends to accelerate fixation as compared with the neutral case, rather than retard fixation. This was first pointed out by Robertson (1962) who presented this fact in the form of a graph shown in Figure 8, where the term retardation factor is defined as the reciprocal of $2N\lambda_0$. According to him, selection for a heterozygote is a factor retarding fixation only if the equilibrium frequency lies within the range of 0.2-0.8. For equilibrium gene frequencies outside this range, there is a range of values of $N(s_1 + s_2)$ for which heterozygote advantage accelerates fixation, and the more extreme the equilibrium frequency the wider the range. However, for all values of $\hat{x}$ except 0 or 1, an increase in the values of $N(s_1 + s_2)$ eventually leads to retardation.

    Figure 7 Relation between $2N\lambda_0$ and c (= $N(s_1 + s_2)/2$) is plotted for various equilibrium frequencies $\hat{x}$, when there is overdominance between a pair of alleles.

    Figure 8 Graphs showing retardation factor as a function of equilibrium gene frequency for various values of $N(s_1 + s_2)$, where N is the effective size of the population, and $s_1$ and $s_2$ are the selection coefficients against both homozygotes. (From Robertson, 1962)

    8. Random fluctuation of selection intensities

    Among the factors which cause random fluctuation in gene frequencies, random fluctuation of selection intensities may often be as important as random sampling of gametes. To single out this factor, we will here assume that the population is infinitely large so that the effect of random sampling may be neglected. Also we will consider the simplest case of genic selection in which a pair of alleles, $A_1$ and $A_2$, are involved. Let s be the selective advantage of $A_1$ over $A_2$ such that the rate of change in the frequency of $A_1$ for a fixed value of s is $sx(1-x)$. Let us assume also that $A_1$ is selectively neutral on the average so that the mean value of s over a long period is zero, and its variance $V_s$ is constant. Then $M_{\delta x} = 0$, $V_{\delta x} = V_s x^2(1-x)^2$ and the partial differential equation (3.10) becomes

    $$\frac{\partial \phi}{\partial t} = \frac{V_s}{2}\,\frac{\partial^2}{\partial x^2}\{x^2(1-x)^2\phi\}. \tag{8.1}$$

    To solve this equation, let us put

    and

    Then we obtain the heat conduction equation

    It is known that this equation has a unique solution which is continuous over $-\infty$ to $+\infty$ when $t \ge 0$, and reduces to $u(\xi, 0)$ when t = 0. The solution is given by

    Therefore, the required solution of (8.1), when the initial distribution of gene frequency is $\phi(x, 0)$, is given by

  • 8/3/2019 Kimura Journal

    37/57

    M.KIMURA

    0 0 0 5 10Figure 9

    The process o f change in the gene frequency distribution u nder ran dom fluctuation of selection intensities. In this illustration i t is assumed th at th e gene is selectively ne utra l when averaged over a very long period, tha t there is no dominance, a n d p = 0.5. V = 0.0483. The abscissa is the gene frequency x, the ord inatethe probability density 4 . (From Kimura, 1954)

    If the initial condition is not a con tinuo us distribution, but a fixed gene frequencyp , the above form ula reduces to

    The process of change in the distribution curve with time is illustrated in Figure 9 assuming $p = 0.5$ and $V_s = 0.0483$. As will be seen in the figure the distribution curve is unimodal for a considerable number of generations (in the case illustrated, 27 generations), after which it becomes bimodal. In the 100th generation, gene frequencies in our example giving maximum probability (peaks in the curve) are approximately 0.0007 and 0.9993. As time goes on the distribution curve becomes more and more U-shaped, though it is not a true U-shaped curve, since its value at either terminal is always 0. This means that as time elapses the gene frequency shifts towards either terminal of the distribution ($x = 0$ or 1) indefinitely and accumulates in the neighborhood just short of fixation or loss, but never becomes fixed or lost completely (at least theoretically). To distinguish this from the fixation or loss in the case of random drift in small populations, the terms quasi-fixation and quasi-loss were proposed (Kimura, 1954).

    If the genes are not neutral on the average, then $M_{\delta x} = 0$ should be replaced with $M_{\delta x} = \bar{s}x(1-x)$ in the partial differential equation, where $\bar{s}$ is the long term average of $s$. Unfortunately, the solution of the corresponding partial differential equation has not so far been found.

    However, the following approximate treatment may be helpful in obtaining a rough picture of the process involved. If we transform the gene frequency $x$ into its logit $\xi$ by the relation $\xi = \log[x/(1-x)]$, $\xi$ changes continuously from $-\infty$ to $+\infty$ as $x$ changes from 0 to 1. For a small change of $\xi$, we have

    Neglecting terms of higher order than the first, and noting $M_{\delta x} = \bar{s}x(1-x)$ and $V_{\delta x} = V_s x^2(1-x)^2$, we obtain

    These expressions indicate that on the logit scale the mean and variance of the gene frequency distribution increase approximately linearly with time. Namely the probability distribution of $\xi$ in the $t$th generation is given by the normal distribution with mean $\xi_0 + \bar{s}t$ and variance $V_s t$, where $\xi_0$ is the logit of $p$, i.e. $\xi_0 = \log[p/(1-p)]$. Figure 10 illustrates the process of change obtained by this method for the case of a pair of alleles with $\bar{s} = 0.1$ and $V_s = 0.0025$ ($\sigma_s = 0.05$).

    Actually, the case of genic selection in an infinite population with random fluctuation in selection intensities can most easily be treated by the following discrete model. Consider the multiple alleles, $A_1, A_2, \ldots, A_n$ at a locus and let $w_i$ be the fitness of $A_i$ measured in selective values. If $x_i(t)$ is the frequency of $A_i$ in the $t$th generation, then

    where W is the average selective value of the population in the tth generation,

    Here $t$ takes on discrete values, 0, 1, 2, etc. From (8.6), it follows that, for any pair of alleles, say for $A_i$ and $A_j$, we have

    so that


    Figure 10. The process of change in gene frequency distribution due to genic selection with random fluctuation of selection intensity. Gene frequency is measured on a logit scale $\xi$, and $\bar{s}$ and $V_s$ are respectively the mean and variance of the selection intensity. In this figure the initial gene frequency is 0 on the logit scale, i.e. $p = 0.5$, and $t$ stands for time (in generations). (From Kimura, 1955c)

    where $z_{i/j}(t)$ is the logarithmic gene ratio at the $t$th generation, i.e.

    and $s_{i/j}(\tau)$ is the value of $\log(w_i/w_j)$ in the $\tau$th generation. Thus, if $s_{i/j}(\tau)$ is distributed normally with mean $\bar{s}_{i/j}$ and variance $\sigma^2_{i/j}$, then $z_{i/j}(t)$ is distributed normally with mean $z_{i/j}(0) + \bar{s}_{i/j}t$ and variance $\sigma^2_{i/j}t$. Furthermore, even if $s_{i/j}$ is not distributed normally, the distribution of $z_{i/j}(t)$ for a large value of $t$ will approach the normal distribution by the central limit theorem. Since (8.7) holds for any pair of $i$ and $j$, and since

    in each generation, the joint distribution of gene frequencies for an arbitrary generation can easily be worked out assuming normality of $z_{i/j}(t)$.

    Besides the effect of random fluctuation of selection intensity discussed in this section, that of random fluctuation in migration rate has also been studied. If $m$ is the migration rate which fluctuates randomly from generation to generation with mean $\bar{m}$ and variance $V_m$, then we have

    $$\frac{\partial \phi}{\partial t} = \frac{V_m}{2}\frac{\partial^2}{\partial x^2}\{(x-\bar{x})^2\phi\} + \bar{m}\frac{\partial}{\partial x}\{(x-\bar{x})\phi\},$$


    where $\bar{x}$ is the frequency of $A_1$ in the migrants. Assuming that the initial gene frequency $p$ is higher than that of the migrants, the solution of this equation is

    It is then possible to show that lim φ(p, x; t) = δ(x −


    In the above formula $C$ is a constant which is usually adjusted in such a way that

    $$\int_0^1 \phi(x)\,dx = 1. \tag{9.2}$$

    It is remarkable that Wright (1938a) first derived formula (9.1) from the simple consideration that at equilibrium the mean and variance of the frequency distribution are unchanged in successive generations.

    From our standpoint it is more natural to derive the formula from (3.11) by imposing condition (3.20), namely that the probability flux must be zero at every point in the open interval $(0, 1)$.

    This leads to

    which, upon integration, gives (9.1). One of the tacit assumptions involved here is that the process is time homogeneous, so that $M_{\delta x}$ and $V_{\delta x}$ are independent of $t$.

    As an example, let us consider the case discussed in Section 6, where $M_{\delta x} = m(\bar{x} - x)$ and $V_{\delta x} = x(1-x)/(2N)$.

    Substituting these in (9.1), we obtain

    The coefficient C as determined from (9.2) is

    Thus, the equilibrium distribution is given by

    It is important to note that the above equation agrees with (6.3), which is obtained from (6.2) by taking the limit as $t \to \infty$.

    For the application of (9.1) to various other cases, the reader may refer to a series of papers by Wright, especially Wright (1938a, b, 1939, 1948), and also to Kimura (1955c).
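    The migration example can be checked numerically. The sketch below reads the mean change in that example as $m(\bar{x} - x)$, with $\bar{x}$ the gene frequency among migrants (an interpretation of the garbled transcript), and takes Wright's formula in the form $\phi(x) = (C/V_{\delta x}) \exp\{2\int (M_{\delta x}/V_{\delta x})\,dx\}$, which is what integrating the zero-flux condition yields. Carrying out the integral by hand gives a beta-type density proportional to $x^{4Nm\bar{x}-1}(1-x)^{4Nm(1-\bar{x})-1}$. That closed form is derived here rather than copied from the illegible equation, and the parameter values and function names are illustrative.

```python
import math

N = 100        # population size (illustrative)
M_RATE = 0.01  # migration rate m (illustrative)
X_BAR = 0.3    # gene frequency among migrants (illustrative)

def mean_change(x):
    """Mean change per generation, read here as m * (x_bar - x)."""
    return M_RATE * (X_BAR - x)

def var_change(x):
    """Variance of the change per generation due to random sampling of gametes."""
    return x * (1.0 - x) / (2.0 * N)

def wright_log_density(x, x_ref=0.5, steps=2000):
    """Unnormalised log phi(x) = 2 * int_{x_ref}^{x} (M/V) dx' - log V(x), midpoint rule."""
    h = (x - x_ref) / steps
    integral = 0.0
    for i in range(steps):
        mid = x_ref + (i + 0.5) * h
        integral += 2.0 * mean_change(mid) / var_change(mid) * h
    return integral - math.log(var_change(x))

def beta_log_density(x):
    """Log of the beta-type closed form, without its normalising constant."""
    a = 4.0 * N * M_RATE * X_BAR
    b = 4.0 * N * M_RATE * (1.0 - X_BAR)
    return (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)

# The two log densities should differ only by a constant, which C absorbs.
offsets = [wright_log_density(x) - beta_log_density(x) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

    Because the reference point `x_ref` only shifts the log density by a constant, any interior value works; the constant is fixed afterwards by condition (9.2).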

    More recently, an interesting model for a haploid organism with overlapping generations was used by Moran (1958) to derive an exact distribution under reversible mutation.

    So far, we have considered gene frequency distributions containing only one random variable $x$. Here, Wright's formula (9.1) is fundamental, and it has sufficient generality to be useful for most purposes. However, in order to study the effect of gene interaction on the frequency distribution involving multiple alleles or multiple loci, formulae for the distribution of more than one random variable are required. Unfortunately, no formulae of comparable generality to (9.1) have been obtained for such cases, but Wright (1937) has obtained an important formula which deals with epistasis under the assumption of random mating and constant fitness of individual genotypes. For the case of two loci each with a pair of alleles, $A_1$ and $A_2$ at the first locus and $B_1$ and $B_2$ at the second, his formula can be expressed as follows:

    where $x$ and $y$ are respectively the frequencies of $A_1$ and $B_1$, and $\bar{W}$ is the average fitness of a population measured in selective values with respect to these two loci. In the above formula, $\mu_1$ is the mutation rate from $A_1$ to $A_2$ and $v_1$ is the rate in the reverse direction, while $\mu_2$ and $v_2$ are corresponding values for $B_1$ and $B_2$. If the fitness is measured in Malthusian parameters (cf. Fisher, 1958, Kimura, 1958), $\bar{W}$ in the above formula should be replaced by $e^{\bar{a}}$, where $\bar{a}$ is the average fitness measured in Malthusian parameters.

    Wright (1937) derived the above equation through an ingenious but intuitive argument. From our standpoint, however, it is more natural, as in the case of a single variable, to derive the equation from a consideration of the probability flux. Actually, this enables us to view the problem in a much wider perspective.

    Consider a region in two-dimensional Cartesian co-ordinates with $0 \leq x \leq 1$ and $0 \leq y \leq 1$. The probability flux which passes through the point $(x, y)$ along the $x$-axis is

    Similarly, the flux which passes through the same point along the $y$-axis is

    At equilibrium, when the flux is zero at every point, $P(x|y; t) = P(y|x; t) = 0$, and we have

    where $\psi = \log \phi(x, y)$. Thus, if the above simultaneous equation in $\partial\psi/\partial x$ and $\partial\psi/\partial y$ has a unique solution $(\psi_1, \psi_2)$ and if


    is the total differential, which we will denote by $d\psi$, then the simultaneous distribution at equilibrium may be given by

    where the constant C is determined by the condition

    $$\int_0^1\!\!\int_0^1 \phi(x, y)\,dx\,dy = 1. \tag{9.7}$$

    In the special case in which random sampling of gametes is the sole factor for producing random fluctuation in gene frequencies,

    $$V_{\delta x} = \frac{x(1-x)}{2N}, \quad V_{\delta y} = \frac{y(1-y)}{2N} \quad \text{and} \quad W_{\delta x \delta y} = 0. \tag{9.8}$$

    Under the assumption of random mating and constant (but not necessarily equal) fitness of individual genotypes, the mean rates of change in gene frequencies are expressed by

    where $\bar{a} = \log \bar{W}$. Substituting (9.8) and (9.9) in (9.5), we get

    and since $\partial\psi_1/\partial y = \partial\psi_2/\partial x = 2N\,\partial^2\bar{a}/\partial x\,\partial y$, it is evident that $d\psi$ exists. Indeed it is given by

    $$d\psi = d\{2N\bar{a} + (4N\mu_1 - 1)\log(1-x) + (4Nv_1 - 1)\log x + (4N\mu_2 - 1)\log(1-y) + (4Nv_2 - 1)\log y\}.$$

    Thus, from (9.6), we obtain

    $$\phi(x, y) = C e^{2N\bar{a}}\, x^{4Nv_1 - 1}(1-x)^{4N\mu_1 - 1}\, y^{4Nv_2 - 1}(1-y)^{4N\mu_2 - 1} \tag{9.10}$$

    where $\bar{a} = \log \bar{W}$. This completes the derivation of (9.4).
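    The zero-flux property of the two-locus distribution can be checked numerically. The sketch below assumes that the mean change along $x$ has the form $M_{\delta x} = \tfrac{1}{2}x(1-x)\,\partial \log\bar{W}/\partial x - \mu_1 x + v_1(1-x)$ (equation (9.9) is not legible in this transcript, so this form is an assumption), that the flux along $x$ is $M_{\delta x}\phi - \tfrac{1}{2}\partial(V_{\delta x}\phi)/\partial x$ when the covariance is zero, and that genotype frequencies are products of binomial terms. The fitness table, parameter values and function names are illustrative.

```python
import math

N = 50                    # population size (illustrative)
MU1, V1 = 0.004, 0.006    # mutation rates A1 -> A2 and A2 -> A1 (illustrative)
MU2, V2 = 0.005, 0.007    # mutation rates B1 -> B2 and B2 -> B1 (illustrative)
# W[i][j]: fitness of the genotype with i copies of A1 and j copies of B1
# (illustrative values chosen to include epistasis)
W = [[1.00, 1.01, 0.98],
     [1.02, 1.05, 1.00],
     [0.99, 1.03, 1.06]]

def mean_fitness(x, y):
    """Average fitness under random mating, treating the two loci as independent."""
    fx = [(1 - x) ** 2, 2 * x * (1 - x), x ** 2]
    fy = [(1 - y) ** 2, 2 * y * (1 - y), y ** 2]
    return sum(fx[i] * fy[j] * W[i][j] for i in range(3) for j in range(3))

def log_phi(x, y):
    """Log of the two-locus equilibrium density with C = 1."""
    return (2 * N * math.log(mean_fitness(x, y))
            + (4 * N * V1 - 1) * math.log(x) + (4 * N * MU1 - 1) * math.log(1 - x)
            + (4 * N * V2 - 1) * math.log(y) + (4 * N * MU2 - 1) * math.log(1 - y))

def flux_x(x, y, h=1e-5):
    """Flux along x, M*phi - (1/2) d(V*phi)/dx, using central differences."""
    dlogw = (math.log(mean_fitness(x + h, y)) - math.log(mean_fitness(x - h, y))) / (2 * h)
    m = 0.5 * x * (1 - x) * dlogw - MU1 * x + V1 * (1 - x)   # assumed form of the mean change

    def v_phi(u):
        return u * (1 - u) / (2 * N) * math.exp(log_phi(u, y))

    return m * math.exp(log_phi(x, y)) - 0.5 * (v_phi(x + h) - v_phi(x - h)) / (2 * h)
```

    The flux along $y$ follows by exchanging the roles of the two loci. A flux that is negligible relative to the density at interior points is what the derivation predicts; it is a numerical check under the stated assumptions, not a proof.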

    9.2. Distribution under irreversible mutation. Our formula (3.11) may also be used to obtain the frequency distribution under steady flux. In this case we assume that the steady state is reached with respect to the distribution of intermediate gene frequencies ($0 < x < 1$), but that there is a constant flow of probability from one terminal class to the other. Such an assumption may be justified if the loss of probability by the donor terminal class is negligible, as in the case of a deleterious mutation steadily reaching fixation in a finite population at an exceedingly low rate under the pressure of irreversible mutation against the strong action of selection.

    The steady flux solution may be obtained from (3.11) by applying condition (3.21) as follows. Let $D$ be the probability flux, then

    The solution of this equation, i.e.

    (9.11)

    where

    (9.12)

    gives the steady flux distribution, in which $C$ is a constant. The formula (9.11) above was first obtained by Wright (1945).

    In what follows I shall discuss the application of this formula to a more concrete genetical problem, and will also present some extensions of Wright's results on irreversible mutation.

    Let us suppose that mutation is irreversible and occurs at a constant rate only in the direction $A_2 \to A_1$. In a finite population, $A_2$ will eventually be lost from the population even if $A_1$ is disadvantageous, because random drift may carry $A_1$ into fixation, and once this occurs $A_2$ is permanently lost from the population.

    Let $x$ be the frequency of $A_1$ and suppose that mutation is occurring from the class $x = 0$ (i.e. from $A_2$) at an exceedingly minute rate $v$, with irreversible fixation in the class $x = 1$. Following Wright (1942), we assume two conditions,

    and

    Both of these are approximations. The first condition means that in the neighborhood of $x = 0$, mutation and random extinction balance each other so that the number of new mutations $2Nv$ is half the frequency of the subterminal class (see equation 3.18). The second condition means that the height of the distribution curve is so low in the neighborhood of $x = 1$ as compared with the neighborhood of $x = 0$, that the frequency of the subterminal class with $x = 1 - 1/(2N)$ may be set equal to zero. Here we assume that the majority of populations contain only $A_2$, i.e. $f(0) = 1$ approximately.

    The selective advantages of $A_1A_1$ and $A_1A_2$ over $A_2A_2$ may be designated by $s$ and $sh$, both of which may be negative if $A_1$ is unconditionally deleterious. The mean rate of change in $x$ per generation by selection may be given by

    $$M_{\delta x} = s\{h + (1-2h)x\}\,x(1-x).$$

    It is more convenient, however, for the subsequent treatment to express $M_{\delta x}$ in the form

    $$M_{\delta x} = (s_1 + s_2 x)\,x(1-x),$$

    where $s_1 = sh$ and $s_2 = s(1-2h)$. If we combine this with $V_{\delta x} = x(1-x)/(2N)$ and substitute them in (9.11), we obtain

    The relative frequency, $f(x)$, of the class with gene frequency $x$ (discrete) will be given by $\phi(x)/(2N)$ for $x$ between $1/(2N)$ and $1 - 1/(2N)$. In the above formula, $C$ and $D$ are constants to be determined by the two conditions (9.13) and (9.14).

    In the following calculation, we will assume that $N$ becomes infinitely large and $|s_1|$ and $|s_2|$ infinitely small, while $2Ns_1$ and $2Ns_2$ remain finite.

    First, from condition (9.13), we get, neglecting higher order terms,

    Secondly, from (9.14),

    From the assumption that $|s_1|$ and $|s_2|$ are very small, it turns out that $D/N$ may be neglected as compared with $C$ in (9.16). Then, we obtain


    and

    For the special case of $s_2 = 0$ (no dominance), the above formula reduces to

    Fisher (1930) gave the frequency distribution of mutant genes when there is a supply of one mutation in each generation. His formula is

    where $df$ is the frequency element, i.e. $f(x)$ in our notation, $a$ is the selective advantage (our $s_1$) assumed to be very small, $n$ is the number of breeding individuals in a population (our $N$) and $z$ is the logit of the mutant gene frequency $x$. It is not difficult to show that (9.21) agrees with Fisher's formula (9.22) if we note that $n = N$, $a = s_1$, $z = \log_e\{x/(1-x)\}$, $dz = dx/\{x(1-x)\} = 1/\{2Nx(1-x)\}$ and $2Nv = 1$.

    The net probability flux $D$ will give us the probability of ultimate fixation of an individual mutant gene if we divide $D$ by the number of mutations per generation, i.e. $2Nv$.

    Thus we obtain, from (9.19),

    or, putting $s_1 = sh$ and $s_2 = s(1-2h)$,

    A more general expression for the distribution under irreversible mutation may be obtained directly from (9.11) by imposing conditions similar to (9.13) and (9.14). In the following treatment, we will take a general form of selection in $M_{\delta x}$ so that selection coefficients may be gene-frequency dependent. Also, $V_{\delta x}$ may include the effect of random fluctuation in selection intensities. Since the rate of change in gene frequency by selection contains the factor $x(1-x)$, $V_{\delta x}$ may be expressed in the form


    where $P(x)$ is a polynomial in $x$ and $N_e$ is the effective size of the population. $N_e$ may differ from the actual number $N$ of the individuals.

    First, consider the exchange of class frequencies in the neighborhood of $x = 0$. The flux due to the new production of the mutant genes from the terminal class ($x = 0$) is $2Nv$, while the flux towards the opposite direction due to the loss of the mutant genes is

    where the higher order terms are neglected. Equating these two opposite fluxes,

    Note that the above relation reduces to (9.13) if $N_e = N$. When applied to (9.11) it leads to

    if higher order terms are neglected.

    Next, we assume as before the condition (9.14). It leads to

    Substituting (9.25) and (9.26) in (9.11), we obtain a general formula for the gene frequency distribution under irreversible mutation

    where

    Note here that in calculating $M_{\delta x}$ only the effect of selection is assumed. The effect of mutation should not be included in this term for the present calculation.

    The probability of fixation of an individual mutant gene may be obtained from $D$ by dividing by $2Nv$, so that


    As pointed out by Wright (1938a), the present treatment should have a bearing on the possible evolutionary modification caused by mutation pressure, the eye degeneration and loss of pigment of cave animals being especially suited examples.

    10. Probability of fixation of mutant genes in a population

    10.1. Introductory remarks. In the study of evolutionary genetics, it is important to know the probability of ultimate success (i.e. fixation) of mutant genes, because fixation of an advantageous gene is the key factor in the evolution of the species. Pioneering work has been done by Fisher (1922, 1930) and Haldane (1927) who obtained the approximate (but sufficiently accurate) probability of fixation of an individual mutant gene for the case of genic selection (i.e. no dominance). They made use of the method which is now standard in the treatment of branching processes. Recently, Moran (1961) was able to construct a rigorous theory for the probability of survival of a mutant gene in a finite population of a haploid organism, where complication by dominance is not involved.

    Results equivalent to those of Fisher and Haldane have been obtained by Wright (1931) from the study of the frequency distribution under irreversible mutation. Also the probability for a recessive gene was estimated by Haldane (1927) and Wright (1942). Later, a more general result was obtained by the present author (Kimura, 1957) based on a diffusion model which covers any degree of dominance. The probability of eventual fixation $u(p)$ was expressed in terms of the initial frequency $p$, selection coefficients and the effective population number. This function was used by Robertson (1960) in his theory of selection limits in plant and animal breeding. A still more general, but quite simple expression for $u(p)$ was obtained by the author in terms of the initial frequency, and the mean and variance of the rate of change of gene frequency per generation (Kimura, 1962). It was applied to solve problems where there is a random fluctuation in selection intensity. These results by the author have been obtained by using the method of the Kolmogorov backward equation.

    The method is quite far-reaching, and even enables us to obtain the probability of joint fixation of mutant genes at multiple loci under the assumption of random mating and constant (but not necessarily equal) selective values of individual genotypes. In the present article, the result for multiple loci will be presented for the first time. It enables us to study the effect of epistasis on the fixation of genes in a finite population.

    10.2. Single locus. It was stated in Section 3 that if $u(p, t)$ is the probability of a mutant gene's reaching fixation by the $t$th generation, given that its frequency is $p$ at $t = 0$, then $u(p, t)$ satisfies equation (3.29). It was also indicated that the required probability would be obtained by solving this equation with boundary conditions (3.30). In the simplest case of random drift in a finite population of size $N$ with no mutation and selection, the solution of the equation was given by (4.15), but in a more general case the exact solution is rather difficult to obtain.

    Now let us consider the ultimate probability of fixation defined by

    $$u(p) = \lim_{t \to \infty} u(p, t).$$

    From the standpoint of long-term evolution, this may be the most important quantity relating to the fixation of mutant genes. Since $\partial u(p)/\partial t = 0$ for this quantity, equation (3.29) reduces to the ordinary differential equation

    with boundary conditions

    Fortunately, the pertinent solution of this equation can easily be found and is expressed as follows (Kimura, 1962),

    where

    in which $M_{\delta x}$ and $V_{\delta x}$ are the mean and variance of the change in gene frequency $x$ per generation.

    The above formula for $u(p)$ is the steady state solution of the Kolmogorov backward equation, and is the counterpart of Wright's formula for $\phi(x)$ (cf. 9.1), which is the steady state solution of the Kolmogorov forward equation. Both formulae have a pleasing simplicity, and are yet of sufficient generality to cover the cases of sexually reproducing haploid, diploid and polyploid organisms, as well as asexually reproducing plants.
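    The formula for $u(p)$ lends itself to direct numerical evaluation. Since the displayed equations did not survive in this transcript, the sketch below assumes the form given in Kimura (1962), $u(p) = \int_0^p G(x)\,dx \big/ \int_0^1 G(x)\,dx$ with $G(x) = \exp\{-\int_0^x (2M_{\delta x}/V_{\delta x})\,dx\}$. The function names and parameter values are illustrative. For genic selection, where $2M_{\delta x}/V_{\delta x} = 4N_e s$ is constant, the integrals can be done by hand and give $u(p) = (1 - e^{-4N_e s p})/(1 - e^{-4N_e s})$, which serves as a check on the numerics.

```python
import math

def integral_of_G(upper, ratio, steps=2000):
    """Midpoint-rule value of int_0^upper G(x) dx, with G(x) = exp(-int_0^x ratio(y) dy)."""
    h = upper / steps
    inner = 0.0      # running value of int_0^x ratio(y) dy at the left edge of the cell
    total = 0.0
    for i in range(steps):
        r = ratio((i + 0.5) * h)
        total += math.exp(-(inner + 0.5 * r * h)) * h   # G at the cell midpoint
        inner += r * h                                  # advance to the right edge
    return total

def fixation_probability(p, ratio):
    """u(p) under the assumed formula; ratio(x) must return 2 * M(x) / V(x)."""
    return integral_of_G(p, ratio) / integral_of_G(1.0, ratio)

N_E = 500     # effective population size, taken equal to N (illustrative)
S = 0.01      # selective advantage of the mutant (illustrative)

def genic_ratio(x):
    """2M/V for genic selection: M = s x (1 - x), V = x (1 - x) / (2 N_e)."""
    return 4.0 * N_E * S

# Fixation probability of a single new mutant, p = 1/(2N)
u_new_mutant = fixation_probability(1.0 / (2 * N_E), genic_ratio)
closed_form = (1.0 - math.exp(-2.0 * S)) / (1.0 - math.exp(-4.0 * N_E * S))
```

    With these values `u_new_mutant` lies within about one per cent of $2s$. Passing a `ratio` that depends on $x$, for example $4N_e s\{h + (1-2h)x\}$, extends the same routine to an arbitrary degree of dominance.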

    The probability of fixation of individual mutant genes in a population of $N$ diploid individuals may then be obtained by taking $p = 1/(2N)$, so that

    However, caution is necessary when applying the above method to a dioecious population where the number of males, $N^*$, can be different from that of females, $N^{**}$. In such a case, either the initial condition $p = 1/(4N^*)$ or $p = 1/(4N^{**})$ should be used depending on whether the mutant gene occurred in a male or in a female, as was pointed out by Moran (1961) and Watterson (1962).

    In what follows I will discuss a few simple cases, assuming a population of sexually reproducing diploid individuals. We will denote by $A_1$ the mutant gene whose initial frequency is $p$.

    The simplest case is that of genic selection, in which $A_1$ has a constant selective advantage $s$ over its alleles in a population of effective size $N_e$. In this case $M_{\delta x} = sx(1-x)$, $V_{\delta x} = x(1-x)/(2N_e)$ so that $2M_{\delta x}/V_{\delta x} = 4N_e s$, $G(x) = e^{-4N_e s x}$ and we obtain from (10.4)

    For $|2N_e s| < \pi$, the right-hand side of the above equation may be expanded in terms of $4N_e s$ as follows:

    where the $\phi_i(p)$'s are Bernoulli polynomials. Thus for a small value of $2N_e s$, $u(p) - p$ is $2N_e$ times $sp(1-p)$. In other words, the total advance is $2N_e$ times the change in the first generation, as was pointed out by Robertson (1960).

    If the effective size of the population is equal to the actual size, $N$ may be substituted for $N_e$. Assuming the sex ratio is unity, the probability of fixation of an individual mutant gene is obtained from formula (10.6) by putting $p = 1/(2N)$. If $|s|$ is small, we obtain

    as a good approximation. This agrees with the result obtained by Fisher (1930) and by Wright (1931) using different methods. For a positive $s$ and very large $N$, we obtain the well known result that the probability of ultimate survival of an advantageous mutant gene is approximately twice the selection coefficient (Haldane, 1927). If $N_e$ differs from $N$, this value should be modified by a factor of $N_e/N$ so that

    $$u = 2s(N_e/N). \tag{10.9}$$

    According to Crow (1954) and also Crow and Morton (1955), estimated values of $N_e/N$ for a few cases are: Drosophila 0.48-0.9, Lymnaea 0.75, Man 0.69-0.95.

    The above results were obtained on the basis of the method of diffusion approximation, and it is desirable to check some of them by a rigorous treatment. This was done by Moran (1961) who used a population model consisting of haploid individuals, with offspring number following a negative binomial distribution. He assumed that in each generation, the population consists of exactly $M$ individuals of which a random number $k$ are of one type (say $A_1$) and $M - k$ of the other type (say $A_2$). Suppose that the generating function for the probability distribution of the number of offspring is

    for gene $A_1$ and

    for gene $A_2$, where $b = 1 - a$ and $\lambda = 1 + s$ with $s = O(M^{-1})$. The mean and variance of the distribution generated by $P_2(z)$ are 1 and $a^{-1}$ respectively. The mean of the distribution generated by $P_1(z)$ is $\lambda = 1 + s$, so that $s$ represents the selective advantage of $A_1$ over $A_2$. It was then shown by Moran that the probability of ultimate fixation of $A_1$, say $P_I$, satisfies the relation

    where $k_0$ is the initial number of $A_1$-genes, and that for large $M$ both $\theta_1$ and $\theta_2$ become asymptotically equal to $2sv^{-1}$, where $v = a^{-1}$ is the variance generated by $P_2(z)$. If we denote by $X$ the proportion of $A_1$ genes in the haploid population ($X = k/M$) and if $A_1$ were selectively neutral, the variance of $X$ per generation would be $vX(1-X)/(M-1)$. The effective size of the population, say $M_e$, may then be defined by equating this variance with the binomial variance $X(1-X)/M_e$, so that asymptotically $v^{-1} = M_e M^{-1}$. Since both