Classical Population Genetics Genetic Variation at one Locus with 2 Alleles Source: Theory of...

Post on 13-Dec-2015



Classical Population Genetics

Genetic Variation at one Locus with 2 Alleles

Source: Theory of Population Genetics and Evolutionary Ecology, Jonathan Roughgarden, Prentice Hall, Upper Saddle River, NJ, 1996 reprint of 1979 edition, Part One, pp17-100

Consider a population with two alleles, A and a, at a single locus.

Possible Genotypes: AA, Aa, and aa

Suppose that we have a population of size N (usually a large number).

Distribution of Genotypes:

NAA = Number of AA Homozygotes

NAa = Number of Heterozygotes

Naa = Number of aa Homozygotes

N = NAA + NAa + Naa

Two important sets of frequencies for us to consider:

Genotype Frequencies:

$$D = \frac{N_{AA}}{N}, \qquad H = \frac{N_{Aa}}{N}, \qquad R = \frac{N_{aa}}{N}$$

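Gene Frequencies:

$$p = \frac{2N_{AA} + N_{Aa}}{2N} = D + \frac{H}{2}, \qquad q = \frac{2N_{aa} + N_{Aa}}{2N} = R + \frac{H}{2}, \qquad p + q = 1$$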

These are important relationships; be sure that you understand them.
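Both sets of frequencies can be computed directly from the genotype counts. The sketch below is a minimal Python illustration; the function names and the counts (30 AA, 50 Aa, 20 aa) are hypothetical. Exact fractions are used so that the identities p = D + H/2 and q = R + H/2 can be checked without rounding.

```python
from fractions import Fraction

def genotype_frequencies(n_AA, n_Aa, n_aa):
    """Return (D, H, R), the fractions of the N individuals that are AA, Aa, aa."""
    n = n_AA + n_Aa + n_aa
    return Fraction(n_AA, n), Fraction(n_Aa, n), Fraction(n_aa, n)

def gene_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the fractions of the 2N gene copies that are A and a."""
    n = n_AA + n_Aa + n_aa
    p = Fraction(2 * n_AA + n_Aa, 2 * n)  # each AA carries two A's, each Aa one
    return p, 1 - p

# Hypothetical counts, for illustration only
D, H, R = genotype_frequencies(30, 50, 20)
p, q = gene_frequencies(30, 50, 20)
print(D, H, R, p, q)  # 3/10 1/2 1/5 11/20 9/20
assert p == D + H / 2 and q == R + H / 2
```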

Hardy–Weinberg Law:

If we assume no external forces or processes, then within one generation

$$D \to p^2, \qquad H \to 2pq, \qquad R \to q^2$$

and these frequencies remain stable for all future generations.

What assumptions are being made:

1. Individuals of different genotypes do not differ in fertility.

2. Random union of gametes.

3. All individuals, regardless of genotype, have an equal likelihood of survival from gamete to adulthood.

An example to illustrate what is being said by the law:

Suppose an aquarium owner purchases a variety of fish with two alleles that determine their fin color.

A = red fin

a = blue fin

In the shipment the owner receives, 75% of the fish have red fins, 25% have blue fins, and none have purple fins. What will be the eventual distribution of fin colors in the aquarium?

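Note: Aa = purple fin (so the red-finned fish are AA and the blue-finned fish are aa).

Initially D = 3/4, H = 0 and R = 1/4, so

$$p_0 = \frac{3}{4}, \qquad q_0 = \frac{1}{4}$$

After one generation:

$$D = p_0^2 = \frac{9}{16}, \qquad H = 2p_0q_0 = \frac{6}{16} = \frac{3}{8}, \qquad R = q_0^2 = \frac{1}{16}$$

By the Hardy–Weinberg law these proportions (9/16 red, 3/8 purple, 1/16 blue) are also the eventual distribution of fin colors.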
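The fish example can be checked with exact arithmetic. This is a minimal sketch, assuming (as in the example) that red fish are AA, purple fish Aa and blue fish aa; the function name `next_generation` is hypothetical. It applies one round of random union of gametes and then a second round, to show that the proportions no longer change.

```python
from fractions import Fraction

def next_generation(D, H, R):
    """One round of random union of gametes: returns the new (D, H, R)."""
    p = D + H / 2  # frequency of A among the gametes
    q = R + H / 2  # frequency of a among the gametes
    return p * p, 2 * p * q, q * q

gen0 = (Fraction(3, 4), Fraction(0), Fraction(1, 4))  # 75% red, 0% purple, 25% blue
gen1 = next_generation(*gen0)
gen2 = next_generation(*gen1)
print(*gen1)  # 9/16 3/8 1/16
assert gen2 == gen1  # no further change after the first generation
```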

Proof of the Law:

Because of random union of gametes:

$\text{prob}(AA) = p \cdot p = p^2$

$\text{prob}(Aa) = \text{prob}(aA) = p \cdot q$

or $\text{prob}(\text{Heterozygote}) = 2pq$

$\text{prob}(aa) = q \cdot q = q^2$

Note: gamete frequencies at start are p and q.*

At this point we use the third assumption: gametes of all types survive and unite in equal proportions, and the resulting zygotes of every genotype survive equally well to the adult stage and produce gametes for the next generation.

Thus, $D = p^2$, $H = 2pq$, $R = q^2$.

* Gametes are haploid, so any information about the genotype distribution of the previous diploid population is lost.

What is missing?

1. Natural selection

2. Differential fertility and/or survival

3. Mutation

4. Immigration from other populations

5. Genetic drift

Are any assumptions unnecessary?

1. Random mating of individuals also produces the same result; it is just slightly more complex to show than the random-union-of-gametes case.

2. Distinct (non-overlapping) generations are not necessary. However, this assumption makes the algebra easier.

3. If the distribution of genotypes differs between the sexes, the stable proportions do not emerge until the second generation (assuming that all other assumptions hold, in particular equal survival).

Enter Natural Selection:

Consider

Survival Rates: lAA, lAa, laa

Fertility Rates: mAA, mAa, maa

Let WAA = lAA*mAA, WAa = lAa*mAa, Waa = laa*maa

Going back to slides 2 and 3, we can derive the number of gametes in the population at time t + 1:

$$\text{number from AA adults} = 2\,W_{AA}\,p_t^2\,N_t$$

$$\text{number from Aa adults} = 2\,W_{Aa}\,2p_tq_t\,N_t$$

$$\text{number from aa adults} = 2\,W_{aa}\,q_t^2\,N_t$$

Because each new individual is formed from two gametes, the total population size at time t + 1 is one half the sum of these three quantities:

$$N_{t+1} = \left(W_{AA}\,p_t^2 + 2W_{Aa}\,p_tq_t + W_{aa}\,q_t^2\right)N_t$$

An equation such as this is called a difference equation.

This is an example of a “fast” evolutionary change (less than 40 years), caused by industrial pollution in the area of Birmingham, England. Before the pollution, most of these moths had a light coloration that was difficult to see against the lichen on the trees growing in the area. After the pollution, the bark became black and the lichen died, so the light-colored moths became easy prey. “Selection pressure” therefore favored the dark-colored moths.

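The difference equation for the population size leads to these two absolutely essential difference equations for the gene frequencies. Counting the A-bearing gametes (all of those from AA adults and half of those from Aa adults) and dividing by the total number of gametes gives

$$p_{t+1} = \frac{p_t\left(W_{AA}\,p_t + W_{Aa}\,q_t\right)}{W_{AA}\,p_t^2 + 2W_{Aa}\,p_tq_t + W_{aa}\,q_t^2}$$

$$q_{t+1} = 1 - p_{t+1}$$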

So what?

These equations, coupled with the difference equation for the population size, allow us to assign different fertility and survival rates to the three existing genotypes and to model how the gene pool and the population size change as a result.
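The gamete-counting argument can be written out as a short function. This is a sketch under the assumptions stated above (each adult of genotype G contributes 2·W_G gametes, and two gametes form one new individual); the names `step` and `p_closed_form` and the fitness values 3, 2, 1 are hypothetical. It also checks that counting A-bearing gametes agrees with the closed-form difference equation for the frequency of A.

```python
import math

def step(N, p, W_AA, W_Aa, W_aa):
    """Advance (N_t, p_t) one generation by counting gametes."""
    q = 1 - p
    from_AA = 2 * W_AA * p**2 * N       # gametes from AA adults (all carry A)
    from_Aa = 2 * W_Aa * 2 * p * q * N  # gametes from Aa adults (half carry A)
    from_aa = 2 * W_aa * q**2 * N       # gametes from aa adults (none carry A)
    total = from_AA + from_Aa + from_aa
    N_next = total / 2                  # two gametes per new individual
    p_next = (from_AA + from_Aa / 2) / total
    return N_next, p_next

def p_closed_form(p, W_AA, W_Aa, W_aa):
    """The difference equation for p_{t+1} written out directly."""
    q = 1 - p
    return p * (W_AA * p + W_Aa * q) / (W_AA * p**2 + 2 * W_Aa * p * q + W_aa * q**2)

N1, p1 = step(100, 0.5, 3, 2, 1)  # hypothetical fitness values
print(N1, p1)                     # 200.0 0.625
assert math.isclose(p1, p_closed_form(0.5, 3, 2, 1))
```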

Question: Is this absolutely the way things will turn out?

One last notational adjustment makes matters a little simpler.

We will eliminate the preponderance of W’s from the equation by multiplying all of them by a suitable constant; this leaves $p_{t+1}$ unchanged, because the constant cancels between the numerator and the denominator. We “normalize” by selecting one of the W’s to become 1, say WAA. Then we must divide all three W’s by WAA. Thus,

wAA = 1 (=WAA/WAA)

wAa = WAa/WAA

waa = Waa/WAA

Note that we denoted these normalized values with a small, italicized w.

And, FINALLY, we define the selectivity coefficients:

sAA = 1 – wAA

sAa = 1 – wAa

saa = 1 – waa

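Notice that, in general, these coefficients measure selection against a genotype: a value of 0 means no selection against it, and a positive value means that the genotype contributes proportionally less to the gene pool of the next generation.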

Example:

mAA = 100 mAa = 50 maa = 25

lAA = ¾ lAa = ½ laa = 1/5

Then,

WAA = (100)(3/4) = 75 WAa = (50)(1/2) = 25 Waa = (25)(1/5) = 5

wAA = 75/75 = 1 wAa = 25/75 = 1/3 waa = 5/75 = 1/15

sAA = 0 sAa = 2/3 saa = 14/15
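The arithmetic of this example can be reproduced exactly with Python fractions. This is only an illustrative sketch; the dictionary names are our own.

```python
from fractions import Fraction as F

fertility = {"AA": F(100), "Aa": F(50), "aa": F(25)}      # m
survival = {"AA": F(3, 4), "Aa": F(1, 2), "aa": F(1, 5)}  # l

W = {g: survival[g] * fertility[g] for g in fertility}  # W = l * m
w = {g: W[g] / W["AA"] for g in W}                       # normalized: w_AA = 1
s = {g: 1 - w[g] for g in w}                             # s = 1 - w

for g in ("AA", "Aa", "aa"):
    print(g, W[g], w[g], s[g])
# AA 75 1 0
# Aa 25 1/3 2/3
# aa 5 1/15 14/15
```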

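With all of these substitutions we finally have an expression for $p_{t+1}$ that is “manageable”:

$$p_{t+1} = \frac{p_t\left(p_t + w_{Aa}\,q_t\right)}{p_t^2 + 2w_{Aa}\,p_tq_t + w_{aa}\,q_t^2} \qquad (1)$$

or

$$p_{t+1} = \frac{p_t\left(1 - s_{Aa}\,q_t\right)}{1 - 2s_{Aa}\,p_tq_t - s_{aa}\,q_t^2} \qquad (2)$$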

The simulations that follow all used the first form of the difference equation.
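Because normalizing only rescales every W by the same constant, the unnormalized W form, the normalized w form and the s form should all give the same value of the next-generation frequency. The sketch below checks this numerically for the W's of the earlier example (75, 25, 5); the function names are hypothetical.

```python
import math

def p_next_W(p, W_AA, W_Aa, W_aa):
    """Unnormalized form, with the absolute fitnesses W."""
    q = 1 - p
    return p * (W_AA * p + W_Aa * q) / (W_AA * p**2 + 2 * W_Aa * p * q + W_aa * q**2)

def p_next_w(p, w_Aa, w_aa):
    """Normalized form, with w_AA = 1."""
    q = 1 - p
    return p * (p + w_Aa * q) / (p**2 + 2 * w_Aa * p * q + w_aa * q**2)

def p_next_s(p, s_Aa, s_aa):
    """Selectivity form, with s = 1 - w (so s_AA = 0)."""
    q = 1 - p
    return p * (1 - s_Aa * q) / (1 - 2 * s_Aa * p * q - s_aa * q**2)

W_AA, W_Aa, W_aa = 75, 25, 5           # W's from the example above
w_Aa, w_aa = W_Aa / W_AA, W_aa / W_AA  # 1/3 and 1/15
for p in (0.1, 0.5, 0.9):
    a = p_next_W(p, W_AA, W_Aa, W_aa)
    b = p_next_w(p, w_Aa, w_aa)
    c = p_next_s(p, 1 - w_Aa, 1 - w_aa)
    print(p, a, b, c)
    assert math.isclose(a, b) and math.isclose(b, c)
```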

We will consider:

1. Selection against a dominant allele

2. Selection against a recessive allele

3. Heterozygote superiority

    HWApprox(p, wdd, wdr, wrr, n, q, i, pp, pn, qp, qn, hw) :=
      Prog
        q := 1 - p
        i := 0
        pp := p
        qp := q
        pn := p
        qn := q
        hw := []
        Loop
          If i > n
             RETURN hw
          hw := APPEND(hw, [[i, pn]])
          pn := dp(pp, qp, wdd, wdr, wrr)
          qn := 1 - pn
          pp := pn
          qp := qn
          i := i + 1

    dp(p, q, wdd, wdr, wrr) := p·(p·wdd + q·wdr)/(p^2·wdd + 2·p·q·wdr + q^2·wrr)

Writing a program to implement this model is quite straightforward. The program above is written in the functional programming language used in the Derive® computer algebra system.
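For readers without Derive, here is a minimal Python translation of the program above. It is a sketch that keeps the original parameter names (wdd, wdr, wrr, presumably the AA, Aa and aa fitnesses) and returns the same list of [generation, p] pairs for generations 0 through n; the name `hw_approx` is our own.

```python
def dp(p, q, wdd, wdr, wrr):
    """Next-generation frequency of A (same formula as dp above)."""
    return p * (p * wdd + q * wdr) / (p**2 * wdd + 2 * p * q * wdr + q**2 * wrr)

def hw_approx(p, wdd, wdr, wrr, n):
    """List of [i, p_i] for generations i = 0, 1, ..., n, as HWApprox returns."""
    hw = []
    pn = p
    for i in range(n + 1):
        hw.append([i, pn])                  # record, then advance (as in the Loop)
        pn = dp(pn, 1 - pn, wdd, wdr, wrr)
    return hw

# Example: selection against the dominant allele, printed every 10 generations
for i, p in hw_approx(0.9, 0.8, 0.8, 1.0, 70)[::10]:
    print(i, round(p, 4))
```

Using a `for` loop over `range(n + 1)` replaces the explicit counter and the `If i > n RETURN` exit of the Derive version.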

Selection Against the Dominant Allele

p0 = .9 wAA = .8 wAa = .8 waa = 1

Note that even though the recessive allele initially made up only 10% of the gene pool, in approximately 70 generations it makes up essentially the entire gene pool.

Selection Against the Recessive Allele

p0 = .1 wAA = 1 wAa = 1 waa = .8

The end result is expected, but there is a qualitative difference. In the former case the decline of the majority gene started slowly and then accelerated. Here the initial decline is rapid and then the rate slows down.

Selection in Favor of Heterozygote

Selection against the recessive homozygote (aa) is four times that against the dominant homozygote (AA)

p0 = .9; .5 wAA = .9 wAa = 1 waa = .6

Note that in each of the cases (in fact, in all cases except p0 = 0 or 1) the dominant allele will eventually make up 80% of the gene pool and the recessive will make up 20%. This result is called a stable equilibrium.

Finally a Highly Unusual Result

Selection against the Heterozygote

p0 = .55; .5; .45 wAA = 1 wAa = .8 waa = 1

Notice that if both alleles start out with 50% of the gene pool, that percentage will persist. However, if the frequency wanders away from 50%, the majority allele will take over the entire gene pool and the other will become extinct. Thus 50% is called an unstable equilibrium.

Of the four scenarios that we considered, three resulted in the elimination of one of the alleles. Only the case of selection in favor of the heterozygote resulted in a mixed gene pool.

Thus, in the presence of natural selection (we will see later in this lecture what a powerful force this can be), this is the only case where genetic variation is maintained: a polymorphism.

Other cases fix on one or the other of the alleles.
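The four scenarios can be rerun with a few lines of Python. This sketch iterates the difference equation with general (not necessarily normalized) w's, as the Derive program does, using the parameter values from the slides; the 500-generation horizon and the function names are our own choices. It also compares the first-generation drop of the majority allele in the two one-sided cases, which reflects the slow-start versus fast-start difference described earlier.

```python
def dp(p, wAA, wAa, waa):
    """One generation of selection, with general w's."""
    q = 1 - p
    return p * (p * wAA + q * wAa) / (p**2 * wAA + 2 * p * q * wAa + q**2 * waa)

def run(p0, wAA, wAa, waa, generations=500):
    """Frequency of A after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = dp(p, wAA, wAa, waa)
    return p

scenarios = {
    "against dominant": [(0.9, 0.8, 0.8, 1.0)],
    "against recessive": [(0.1, 1.0, 1.0, 0.8)],
    "heterozygote favored": [(0.9, 0.9, 1.0, 0.6), (0.5, 0.9, 1.0, 0.6)],
    "heterozygote selected against": [(0.55, 1.0, 0.8, 1.0), (0.45, 1.0, 0.8, 1.0)],
}
for name, runs in scenarios.items():
    print(name, [round(run(*args), 4) for args in runs])

# First-generation drop of the majority allele (A in the first case, a in the second)
drop_dom = 0.9 - dp(0.9, 0.8, 0.8, 1.0)
drop_rec = (1 - 0.1) - (1 - dp(0.1, 1.0, 1.0, 0.8))
print(drop_dom, drop_rec)  # the recessive case starts much faster
```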

In the heterozygote-superiority simulation, the dominant allele eventually made up 80% of the gene pool and the recessive 20%. Can we determine what this equilibrium will be?

More notation (Mathematicians love it!!)

$\hat{p}$ = equilibrium value for the frequency of allele A

What do we mean by equilibrium?

When equilibrium is achieved, the frequencies of the alleles stay constant.

$$p_{t+1} = p_t \quad \text{for all } t > \text{some } t_0$$

And of course,

$$q_{t+1} = 1 - p_{t+1} = 1 - p_t = q_t$$

In the heterozygote-superiority simulation this happens around generation 50. So, $t_0 \approx 50$.

Let’s see if we can predict $\hat{p}$. Recall that $\hat{p} \neq 0, 1$.

We start with the definition of equilibrium:

$$p_{t+1} = p_t$$

Earlier we saw that in the presence of natural selection,

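$$p_{t+1} = \frac{p_t\left(w_{AA}\,p_t + w_{Aa}\,q_t\right)}{w_{AA}\,p_t^2 + 2w_{Aa}\,p_tq_t + w_{aa}\,q_t^2}$$

Since $p_t \neq 0$, this means that

$$w_{AA}\,p_t + w_{Aa}\,q_t = w_{AA}\,p_t^2 + 2w_{Aa}\,p_tq_t + w_{aa}\,q_t^2$$

for all $t > t_0$. Or, at the equilibrium value,

$$w_{AA}\,\hat{p} + w_{Aa}(1-\hat{p}) = w_{AA}\,\hat{p}^2 + 2w_{Aa}\,\hat{p}(1-\hat{p}) + w_{aa}(1-\hat{p})^2$$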

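Some simple, but messy, algebra (move everything to one side, divide through by $\hat{q} = 1 - \hat{p} \neq 0$, and collect the terms in $\hat{p}$) gives us the following result:

$$\hat{p} = \frac{w_{Aa} - w_{aa}}{(w_{Aa} - w_{AA}) + (w_{Aa} - w_{aa})}$$

Or, when the fitnesses are normalized so that the heterozygote has $w_{Aa} = 1$ (so that $s_{AA} = 1 - w_{AA}$ and $s_{aa} = 1 - w_{aa}$),

$$\hat{p} = \frac{s_{aa}}{s_{AA} + s_{aa}}$$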

In our example:

wAA = .9 wAa = 1 waa = .6

So,

$$\hat{p} = \frac{1 - .6}{(1 - .9) + (1 - .6)} = \frac{.4}{.1 + .4} = \frac{.4}{.5} = .8, \qquad \hat{q} = 1 - \hat{p} = 1 - .8 = .2$$
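A quick numerical check of the formula for p̂: the sketch below (function names hypothetical) evaluates it for the heterozygote-superiority fitnesses and compares the result with the value reached by iterating the difference equation for 200 generations. Applied to the selection-against-the-heterozygote fitnesses (wAA = 1, wAa = .8, waa = 1), the same formula gives the unstable equilibrium at 0.5.

```python
def p_hat(wAA, wAa, waa):
    """Interior equilibrium of A (requires a nonzero denominator)."""
    return (wAa - waa) / ((wAa - wAA) + (wAa - waa))

def iterate(p, wAA, wAa, waa, generations):
    """Apply the selection difference equation repeatedly."""
    for _ in range(generations):
        q = 1 - p
        p = p * (wAA * p + wAa * q) / (wAA * p**2 + 2 * wAa * p * q + waa * q**2)
    return p

print(round(p_hat(0.9, 1.0, 0.6), 6))              # 0.8
print(round(iterate(0.5, 0.9, 1.0, 0.6, 200), 6))  # 0.8
print(p_hat(1.0, 0.8, 1.0))                        # 0.5 (the unstable equilibrium)
```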

Experimental evidence:

ST and CH are names of blocks of genes in Drosophila pseudoobscura. Because of a chromosomal feature called an inversion, the genes in each block are held together, so the two blocks function as two alleles at a single locus.

Solid line simulated path for p.

Dashed lines are 95% confidence limits

Vertical bars: experimental data

Results correctly predicted the equilibrium and the dynamics of the approach to equilibrium.

But, what about mutation?

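Ordinarily it works this way: mutation runs in both directions,

$$A \xrightarrow{\;u\;} a, \qquad a \xrightarrow{\;v\;} A$$

We are going to “stack the deck” in favor of mutation and assume that only

$$A \xrightarrow{\;u\;} a$$

occurs, i.e. we assume v = 0.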

In the absence of any selection our difference equation becomes

$$p_{t+1} = (1 - u)\,p_t$$

This is just the difference equation for exponential decay
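A minimal sketch of the one-way mutation model, assuming v = 0 and no selection; the function name `mutate` and the values u = 10^-5 and 10,000 generations are illustrative. It iterates the difference equation and compares the result with (1 - u)^t p0, the closed form of this exponential decay, and with the continuous approximation p0 e^(-ut).

```python
import math

def mutate(p0, u, generations):
    """Iterate p_{t+1} = (1 - u) p_t: one-way mutation A -> a, no selection."""
    p = p0
    for _ in range(generations):
        p = (1 - u) * p
    return p

p0, u, t = 0.9, 1e-5, 10_000  # illustrative values
print(mutate(p0, u, t))        # iterated
print(p0 * (1 - u) ** t)       # closed form (1 - u)^t p0
print(p0 * math.exp(-u * t))   # exponential-decay approximation
```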


Look at the time axis! This process is much slower than our simulations of natural selection, which took anywhere from 1 generation (pure Hardy-Weinberg) to about 15,000 generations to drop from p=.9 to p=.1.

To actually calculate the predicted time to move from $p_0$ to $p_t$, begin with:

$$\begin{aligned}
p_1 &= (1-u)\,p_0 \\
p_2 &= (1-u)\,p_1 = (1-u)^2\,p_0 \\
p_3 &= (1-u)\,p_2 = (1-u)^3\,p_0 \\
&\;\;\vdots \\
p_t &= (1-u)^t\,p_0
\end{aligned}$$

Rearrange the bottom line as:

$$\frac{p_t}{p_0} = (1-u)^t$$

Take the log of both sides and solve for t. This yields

$$\log\left(\frac{p_t}{p_0}\right) = t\,\log(1-u), \qquad t = \frac{\log(p_t/p_0)}{\log(1-u)}$$

Let’s calculate the time to move from $p_0 = .9$ to $p_t = .1$ for the first curve on the graph shown two slides previously, i.e. $u = 10^{-5} = .00001$:

$$t = \frac{\log(.1/.9)}{\log(1 - 10^{-5})} = \frac{\log(.11111)}{\log(.99999)} \approx 219{,}721 \text{ generations}$$
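The same calculation in Python, as a sketch (the function name is hypothetical). It uses `math.log1p(-u)` rather than `math.log(1 - u)`, a standard way to keep log(1 - u) accurate when u is tiny, and then repeats the quotient with base-10 and base-2 logarithms to show that the base does not matter.

```python
import math

def generations_needed(p0, pt, u):
    """Solve p_t = (1 - u)**t * p0 for t."""
    # log1p(-u) computes log(1 - u) accurately when u is tiny
    return math.log(pt / p0) / math.log1p(-u)

t = generations_needed(0.9, 0.1, 1e-5)
print(round(t))  # 219721

# Any base works, because the base cancels in the quotient of logarithms
t10 = math.log10(0.1 / 0.9) / math.log10(1 - 1e-5)
t2 = math.log2(0.1 / 0.9) / math.log2(1 - 1e-5)
print(round(t10), round(t2))
```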

Mathematical note: Since this quantity is the quotient of two logarithms, logarithms to any base give the same numerical result, i.e. we can use either the log10 or the ln button on our calculator, or even log2 if we care to.

Extra Credit Project: Use a spreadsheet or write a computer program to generate the graphs that were shown two slides previously.

In general, mutation has little effect if selection is at work.

If selection is virtually neutral, say s < .001, then mutation can have an effect, but it is slow.

However, recurrent mutation cannot be totally disregarded.

• Recurrent mutation tends to maintain a supply of genetic variation for selection to act upon.

• Even if selection is tending to eliminate one allele, recurrent mutation tends to maintain its presence in the gene pool. Thus, if the environment changes to a situation that is more favorable to the allele that was being selected against, that allele is still available.

• Mutation is the ultimate source of genetic variation.

Sometimes mutation may oppose selection.

Suppose selection is against A

wAA = 1 – s wAa = 1 – s (A is dominant) waa = 1

However, also assume v > 0, i.e. there is recurrent mutation of a to A at a rate v. Then it can be shown that

$$\hat{p} \approx \frac{v}{s}$$

On the other hand, if A is recessive,

wAA = 1 – s wAa = 1 waa = 1

we have

$$\hat{p} \approx \sqrt{\frac{v}{s}}$$

If A is recessive, mutation maintains a much higher frequency than if it is dominant: when v is much smaller than s, $\sqrt{v/s}$ is much larger than $v/s$.
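The two approximations can be checked by simulation. This sketch assumes a particular order of events each generation (selection first, then mutation of a to A at rate v) and uses hypothetical values s = 0.1 and v = 10^-5; the function name `balance` is our own. Starting from p = 0, the iteration settles close to v/s when A is dominant and close to √(v/s) when A is recessive.

```python
import math

def balance(s, v, A_dominant, generations=20_000):
    """Selection against A, then mutation a -> A at rate v, starting from p = 0."""
    wAA = 1 - s
    wAa = 1 - s if A_dominant else 1.0
    waa = 1.0
    p = 0.0
    for _ in range(generations):
        q = 1 - p
        p = p * (wAA * p + wAa * q) / (wAA * p**2 + 2 * wAa * p * q + waa * q**2)
        p = p + v * (1 - p)  # a fraction v of the a copies mutate to A
    return p

s, v = 0.1, 1e-5  # hypothetical values with v much smaller than s
print(balance(s, v, A_dominant=True), v / s)              # both close to 1e-4
print(balance(s, v, A_dominant=False), math.sqrt(v / s))  # both close to 0.01
```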

Genetic Drift

So far every model we have considered has been a deterministic model, i.e. everything is set in motion on a predetermined path. Chance has been ignored.

But, chance does play a role!

• In the sea urchin model, gametes can wash out to sea.

• By chance, some individuals may produce more offspring than others

• Survival rates may vary

A theory involving chance is called a stochastic theory.

Instead of getting a single number, we get a probability distribution over several possible states.

Two sources of chance occurrences:

1. Changing environment

2. Internal to the population: these would occur even in a fixed environment.

“Genetic Drift” refers to all chance events internal to the population.

Example:

Suppose we start with a large population and p = ½.

From the gamete pool, draw 4 gametes (a small sample).

Could be 2 & 2 relative to the alleles

Could also be 3 & 1, or 1 & 3, or 0 & 4, or 4 & 0.

Suppose 3 & 1 is the distribution in our sample, then p has moved from ½ to ¾ without any selective pressures. This is called “sampling error.”

NOTE: Sampling error is more likely to occur as the population size decreases.
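The sampling-error example can be made exact with the binomial distribution. This is a minimal sketch, assuming each of the n sampled gametes independently carries A with probability p; the function names are hypothetical. For n = 4 and p = 1/2 it gives the chances of the 0 & 4, 1 & 3, 2 & 2, 3 & 1 and 4 & 0 samples, and it shows that the chance of a large sampling error shrinks as the sample grows.

```python
from fractions import Fraction
from math import comb

def sample_distribution(n, p):
    """P(k of the n sampled gametes carry A), for k = 0..n (binomial)."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def miss_probability(n, p, margin):
    """Chance that the sample frequency k/n differs from p by at least margin."""
    return sum(prob for k, prob in sample_distribution(n, p).items()
               if abs(Fraction(k, n) - p) >= margin)

half, quarter = Fraction(1, 2), Fraction(1, 4)
print(sample_distribution(4, half))
# {0: Fraction(1, 16), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4), 4: Fraction(1, 16)}
for n in (4, 16, 64):
    print(n, float(miss_probability(n, half, quarter)))  # shrinks as n grows
```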

Experimental evidence of Genetic Drift

Kerr and Wright (1954) sampled a population of Drosophila melanogaster heterozygotes. They constructed 96 groups of 4 males and 4 females. In each generation they randomly chose 4 males and 4 females from the offspring of each group to be the parents of the next generation, and so on. The following are their data.

Note the “U” shape of the later histograms of the frequency distributions. This shape is characteristic of genetic drift in small populations: as the generations pass, more and more groups lose one allele or the other.
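A rough simulation of this kind of experiment, under idealized assumptions that are ours rather than Kerr and Wright's: every line keeps exactly 16 gene copies per generation (4 males + 4 females = 8 diploid flies), each drawn independently from the line's current allele frequency, with no selection or mutation. The function name and the seed are hypothetical. Lines that reach 0 or 16 copies stay there, so over the generations the histogram of allele counts across the 96 lines tends to pile up at the two ends.

```python
import random

def drift_lines(lines=96, copies=16, p0=0.5, generations=20, seed=1):
    """Simulate independent small populations under pure genetic drift.

    Each generation, every line draws `copies` gene copies at random, each
    copy carrying the allele with probability equal to the line's current
    frequency (binomial sampling, no selection, no mutation).
    """
    rng = random.Random(seed)
    counts = [round(p0 * copies)] * lines
    history = [counts]
    for _ in range(generations):
        counts = [sum(rng.random() < c / copies for _ in range(copies)) for c in counts]
        history.append(counts)
    return history

history = drift_lines()
for gen in (1, 5, 10, 20):
    # number of lines carrying k copies of the allele, for k = 0..16
    print(gen, [history[gen].count(k) for k in range(17)])
```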