Population Genetics 1: Introduction and Hardy...


Page 1: Population Genetics 1: Introduction and Hardy …awarnach.mathstat.dal.ca/~joeb/biol3046/PDFs/slides/...(or Hardy-Weinberg equilibrium; G.H. Hardy and W. Weinberg, 1908) Assumption


Population Genetics 1:

Introduction and Hardy-Weinberg equilibrium

Population genetics: the discipline devoted to the study of the genetic basis of microevolution.

Population genetics: the practice of using the information obtained from a sample of a natural population to make inferences about the evolutionary processes affecting that population. Its aims are:

(i) to measure the extent of genetic variation in natural populations
(ii) to explain natural genetic variation in terms of its origin, maintenance and evolutionary significance


Population genetics is based on statistical models.

Parametric inference: the use of models to test hypotheses about the evolutionary processes that generated the sample of data, and to estimate the values of model parameters.

Some important questions:

•  What is the relationship between genetic variation and a disease phenotype?

•  What can gene sequences tell us about the evolutionary history of a species (e.g., humans)?

•  What has been the role of natural selection during the evolution of a species?

“A model is an intentional simplification of a complex situation designed to eliminate extraneous detail in order to focus attention on the essentials of the situation” (Daniel L. Hartl).


Statistical modeling and inference (flowchart): each step is paired with its main concerns.

1.  Define a model: rules / parameters / quantities
2.  Explore properties: summary stats / graphical data exploration / simulation
3.  Estimate model parameters from the data: moments / maximum likelihood / Bayesian methods
4.  Test goodness of fit: compare estimators / heterogeneity / outliers
5.  Refine model: update parameters

[Figure: the forces of evolution (mutation, migration, recombination, selection, genetic drift) act on natural populations through a stochastic evolutionary process; a stochastic sampling process then yields a sample of sequences (ACTTAGGACTTATAA, ACAAAGGACTTATAA, ACTTAGCACTTATAA, ACTTAGGACAAATAA, ACCCAGGACTTATAA); inference works back from the sample.]


Population: a subgroup of individuals of the same species living within some set of restrictions, usually in a restricted geographic area.

•  also called a local population or deme
•  the practical implementation of the definition will vary among researchers
•  the evolving unit of the species
•  the unit within which the evolution of adaptive characteristics occurs

In this section: Evolution: the change in the genetic constitution of a population over time. More simply, the change in allele frequencies in a population over time. [microevolution]

In all sections of this course: Evolution ≠ Natural selection [alone]

Allele frequencies in populations

Eskimo MN blood group data:

Genotype    MM     MN     NN     Total (n)
Count       233    385    129    747

Frequency of M = p; frequency of N = q; p + q = 1

$p = \frac{2(\mathrm{MM}) + 1(\mathrm{MN})}{2n}$, $q = 1 - p$

$p = \frac{2(233) + 385}{2(747)} = 0.57$, $q = 1 - 0.57 = 0.43$


Allele frequencies in populations

•  p and q are parameters of a population
•  p and q were estimated
•  p and q have error

$\mathrm{Var}(p) = \frac{p(1 - p)}{2n}$

Two samples with the same genotype proportions but different sizes:

                    Small sample                  Large sample
Genotype counts     AA = 8 / Aa = 6 / aa = 2      AA = 80 / Aa = 60 / aa = 20
p                   0.6875                        0.6875
var(p)              0.0067                        0.00067
StdDev              0.0820 (×2 = 0.164)           0.0259 (×2 = 0.052)
95% CI              (0.523, 0.851)                (0.636, 0.739)
Width               0.3277                        0.1036

Assuming that repeated estimates would be normally distributed we can use the variance to predict how close our estimate of p is to the population value.
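The numbers in this comparison can be reproduced with a short script. This is a minimal sketch, assuming the binomial variance p(1 − p)/(2n) and an interval of ± 2 standard deviations as on the slide; the function name `allele_freq_ci` and the parameter `z` are illustrative, not part of the original material.

```python
import math

def allele_freq_ci(n_AA, n_Aa, n_aa, z=2.0):
    """Estimate p (frequency of allele A), its variance, and an approximate 95% interval.

    Assumes a normal approximation; z=2.0 mirrors the slide's "x2" rule of thumb.
    """
    n = n_AA + n_Aa + n_aa            # number of individuals sampled
    p = (2 * n_AA + n_Aa) / (2 * n)   # each diploid individual carries two alleles
    var = p * (1 - p) / (2 * n)       # binomial sampling variance of p
    sd = math.sqrt(var)
    return p, var, (p - z * sd, p + z * sd)

for counts in [(8, 6, 2), (80, 60, 20)]:
    p, var, (lo, hi) = allele_freq_ci(*counts)
    print(f"{counts}: p={p:.4f} var={var:.5f} "
          f"CI=({lo:.3f}, {hi:.3f}) width={hi - lo:.4f}")
```

A tenfold larger sample leaves p unchanged but shrinks the interval width by a factor of about √10, which is the point of the comparison.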

Allele frequencies differ among populations

Genotype frequencies:

Population    MM       MN       NN       Total (n)
Iceland       0.312    0.515    0.173    747
Greenland     0.835    0.156    0.009    569

Evolution has occurred!

What is the origin and evolutionary significance of such change?


Allele frequencies differ among populations

What are the possible causes of microevolution?

1.  Finite population size
2.  Mutation
3.  Non-random mating
4.  Natural selection
5.  Migration / gene flow

Our null model:

Nothing interesting ever happens in the population

(or Hardy-Weinberg equilibrium; G.H. Hardy and W. Weinberg, 1908)

Assumptions of the HW model:

1.  The organism is diploid
2.  Reproduction is sexual
3.  Mating is random
4.  Generations are discrete
5.  Population size is infinite (or very large)
6.  No migration
7.  No mutation
8.  No natural selection

Idealized population


Hardy-Weinberg equilibrium

Each offspring receives M or N from one parent and M or N from the other:

$(p + q) \times (p + q) = 1$

$p^2 + 2pq + q^2 = 1$

So, under HW conditions, the frequencies of the blood group genotypes in the next generation are:

$f_{MM} = p^2$, $f_{MN} = 2pq$, $f_{NN} = q^2$

Same thing, but using the traditional cross-multiplication table:

                          Male gametes
                          M (p)          N (q)
Female gametes   M (p)    MM ($p^2$)     MN ($pq$)
                 N (q)    MN ($pq$)      NN ($q^2$)

Note 1: here we are mixing gametes at random among all members of the population! (Not, as in transmission genetics, mixing gametes of just two parents at random.)

Note 2: these are the expected frequencies of alleles at the same locus when they are randomly associated with each other.
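As a quick illustration of the expected frequencies, the sketch below computes them from a single allele frequency. The function name `hw_genotype_freqs` is hypothetical; the input p = 0.57 is the estimate for M computed earlier from the MN blood group counts.

```python
def hw_genotype_freqs(p):
    """Expected genotype frequencies under HW for a two-allele locus.

    p is the frequency of allele M; q = 1 - p is the frequency of allele N.
    """
    q = 1.0 - p
    return {"MM": p * p, "MN": 2 * p * q, "NN": q * q}

freqs = hw_genotype_freqs(0.57)
for genotype, f in freqs.items():
    print(f"{genotype}: {f:.4f}")
print("sum:", round(sum(freqs.values()), 10))  # the three frequencies sum to 1
```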


Hardy-Weinberg equilibrium

Keynotes of the HW model:

•  The HW model specifies the relationship between allele frequencies and genotype frequencies
•  Natural populations can be tested for HW
•  Mendelian inheritance means that frequencies do not change unless some external pressure is acting
•  No matter what the initial frequencies, just one generation of random mating will result in HW frequencies

Note:

•  HW is not very sensitive to certain kinds of violations
•  power issue (a test for HW may fail to detect such violations)

Rare recessive alleles can “hide” in the heterozygotes:

q         $2pq : q^2$
0.1       18 : 1
0.01      198 : 1
0.001     1998 : 1

Cystic Fibrosis (CF) example:

CF: about 1 in 1700 newborn Caucasians

ASSUMING HW:

$q^2 = 1/1700$

$q = (1/1700)^{1/2} = 0.024$

$2pq = 2 \times 0.024 \times (1 - 0.024) = 0.047$

Note that we assumed HW without testing the assumption. The CF locus is clearly subject to natural selection, so these values are approximations.

But with just one generation of random mating, we see that about 1 in 21 individuals are carriers (2pq = 0.047), although only 1 in 1700 exhibit the disease.
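The carrier calculation can be sketched in a few lines. This assumes HW proportions, exactly as the worked example does; the function name `carrier_stats` is illustrative only.

```python
import math

def carrier_stats(incidence):
    """Given the frequency of the recessive phenotype (assumed equal to q^2 under HW),
    return q, the carrier frequency 2pq, and the ratio 2pq : q^2."""
    q = math.sqrt(incidence)
    p = 1.0 - q
    carriers = 2 * p * q
    return q, carriers, carriers / incidence

q, carriers, ratio = carrier_stats(1 / 1700)
print(f"q = {q:.3f}, 2pq = {carriers:.3f}, about 1 in {round(1 / carriers)} are carriers")

# The same function gives the heterozygote-to-homozygote ratios for rare alleles
for q_val in (0.1, 0.01, 0.001):
    print(q_val, round(carrier_stats(q_val ** 2)[2]))
```

The ratio simplifies to 2p/q, so it grows roughly as 2/q when the recessive allele becomes rare.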


A proof of the HW model

Genotypes                                   MM       MN       NN
Genotype frequencies, present generation    $P_1$    $P_2$    $P_3$
Genotype frequencies, next generation       $P_1'$   $P_2'$   $P_3'$

If in HW equilibrium:

$P_1 = P_1' = p^2$

$P_2 = P_2' = 2pq$

$P_3 = P_3' = q^2$

A reminder of allele frequencies in populations

Eskimo MN blood group data:

Genotype    MM     MN     NN     Total (n)
Count       233    385    129    747

$p = \frac{2(\mathrm{MM}) + 1(\mathrm{MN})}{2n} = \frac{2(\mathrm{MM})}{2n} + \frac{1(\mathrm{MN})}{2n}$

$p = P_1 + (P_2 \times 1/2)$, $q = 1 - p$


A proof of the HW model

Genotype frequencies of offspring:

Mating       Frequency                    MM     MN     NN
MM × MM      $P_1^2$                      1      0      0
MM × MN      $2 \times P_1 \times P_2$    1/2    1/2    0
MM × NN      $2 \times P_1 \times P_3$    0      1      0
MN × MN      $P_2^2$                      1/4    1/2    1/4
MN × NN      $2 \times P_2 \times P_3$    0      1/2    1/2
NN × NN      $P_3^2$                      0      0      1

Total in next generation:                 $P_1'$  $P_2'$  $P_3'$

Note: $p = P_1 + (P_2 \times 1/2)$ and $q = P_3 + (P_2 \times 1/2)$

$P_1' = P_1^2 + P_1 P_2 + \tfrac{1}{4} P_2^2 = \left(P_1 + \tfrac{P_2}{2}\right)^2 = p^2$

$P_2' = P_1 P_2 + 2 P_1 P_3 + \tfrac{1}{2} P_2^2 + P_2 P_3 = 2\left(P_1 + \tfrac{P_2}{2}\right)\left(P_3 + \tfrac{P_2}{2}\right) = 2pq$

$P_3' = P_3^2 + P_2 P_3 + \tfrac{1}{4} P_2^2 = \left(P_3 + \tfrac{P_2}{2}\right)^2 = q^2$

If HWE: $P_1' = P_1''$, $P_2' = P_2''$, $P_3' = P_3''$

Nice proof: HW in 1 generation; hence, no changes once in HW
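The proof can also be checked numerically by summing over the six mating types. The sketch below is an illustration under the same random-mating assumptions; the function name `next_generation` and the starting frequencies are arbitrary choices, not data from the slides.

```python
def next_generation(P1, P2, P3):
    """Genotype frequencies (MM, MN, NN) after one round of random mating."""
    # Each entry: (mating frequency, offspring proportions for MM, MN, NN)
    matings = [
        (P1 * P1,     (1.0, 0.0, 0.0)),    # MM x MM
        (2 * P1 * P2, (0.5, 0.5, 0.0)),    # MM x MN
        (2 * P1 * P3, (0.0, 1.0, 0.0)),    # MM x NN
        (P2 * P2,     (0.25, 0.5, 0.25)),  # MN x MN
        (2 * P2 * P3, (0.0, 0.5, 0.5)),    # MN x NN
        (P3 * P3,     (0.0, 0.0, 1.0)),    # NN x NN
    ]
    return tuple(sum(freq * off[i] for freq, off in matings) for i in range(3))

start = (0.5, 0.1, 0.4)          # hypothetical starting point, far from HW
gen1 = next_generation(*start)   # reaches p^2, 2pq, q^2 with p = 0.55
gen2 = next_generation(*gen1)    # unchanged from gen1
print(gen1)
print(gen2)
```

Starting from any frequencies that sum to 1, the first generation lands on HW proportions and the second generation repeats them.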

Testing for HW equilibrium

3 steps:

1.  Compute observed genotype frequencies
2.  Compute expected genotype frequencies
3.  Test goodness of fit

Let’s use some real data as an example: the following data are for the MN blood genotypes in Pueblo Indians:

Genotypes          MM    MN    NN
Observed counts    83    46    11

Total = n = 140


Testing for HW equilibrium

PART 1:

Observed genotype frequencies:

MM = 83/140 = 0.59
MN = 46/140 = 0.33
NN = 11/140 = 0.08

Observed allele frequencies:

M = p = 0.59 + 0.33(1/2) = 0.755
N = q = (1 − 0.755) = 0.245

Do NOT compute the allele frequencies at this step by assuming HW (i.e., $p^2 \neq 0.59$). Anyone who does this will automatically get an F in the class!

Testing for HW equilibrium

PART 2:

Expected genotype frequencies:

$p^2 = (0.755)^2 = 0.57$
$2pq = 2 \times 0.755 \times 0.245 = 0.37$
$q^2 = (0.245)^2 = 0.06$

Expected genotype counts:

$p^2 \times n = 0.57 \times 140 = 79.8$
$2pq \times n = 0.37 \times 140 = 51.8$
$q^2 \times n = 0.06 \times 140 = 8.4$

Compare these counts to the observed counts in the table above.


Testing for HW equilibrium

PART 3:

$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(83 - 79.8)^2}{79.8} + \frac{(46 - 51.8)^2}{51.8} + \frac{(11 - 8.4)^2}{8.4}$

$\chi^2 = 0.128 + 0.649 + 0.805$

$\chi^2 = 1.58$

d.f. = [(number of categories tested) – (non-independent categories) – (calculate p from data)] = [3 – 1 – 1] = 1

P = 0.20; i.e., there is a 20% chance that we would have observed a test statistic this large (or larger) under HW. Because P is well above 0.05, we do not reject HW for these data.
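The three parts of the test can be combined into one short function. This is a sketch rather than a reference implementation: the name `hw_chi_square` is illustrative, and the P value uses the identity that the upper tail of a chi-square distribution with 1 d.f. equals erfc(√(χ²/2)), so it applies only to the two-allele case.

```python
import math

def hw_chi_square(n_MM, n_MN, n_NN):
    """Chi-square goodness-of-fit test for HW at a two-allele locus (1 d.f.)."""
    n = n_MM + n_MN + n_NN
    # Estimate p by allele counting, NOT by taking the square root of a genotype frequency
    p = (2 * n_MM + n_MN) / (2 * n)
    q = 1.0 - p
    observed = (n_MM, n_MN, n_NN)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = hw_chi_square(83, 46, 11)
print(f"chi2 = {chi2:.2f}, P = {p_value:.2f}")
```

Because the script carries full precision instead of rounding the frequencies to two decimals, its statistic differs slightly from the hand calculation, but it leads to the same conclusion.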

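Testing for HW equilibrium

Testing HW is NOT possible under dominance

Genotype    Phenotype
DD          Rh+
Dd          Rh+
dd          Rh-

Example: Rh+ phenotype (DD or Dd) in North America:

Rh+ = 0.858 (DD or Dd)

Rh- = 0.142 (dd)

$q = (0.142)^{1/2} = 0.3768$
$p = (1 - 0.3768) = 0.6232$

d.f. = 2 (for two classes of data) – 1 – 1 (for estimating q) = 0

With zero degrees of freedom there is nothing left to test: q was estimated by assuming HW, so the expected phenotype frequencies match the observed ones exactly.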


HW equilibrium with three alleles

Alleles: $A_1$, $A_2$ and $A_3$
Frequencies: $p_1$, $p_2$ and $p_3$

$p_1 + p_2 + p_3 = 1$

The traditional cross-multiplication square:

                                    Male gametes
                                    $A_1$ ($p_1$)          $A_2$ ($p_2$)          $A_3$ ($p_3$)
Female gametes   $A_1$ ($p_1$)      $A_1A_1$: $p_1^2$      $A_1A_2$: $p_1 p_2$    $A_1A_3$: $p_1 p_3$
                 $A_2$ ($p_2$)      $A_1A_2$: $p_1 p_2$    $A_2A_2$: $p_2^2$      $A_2A_3$: $p_2 p_3$
                 $A_3$ ($p_3$)      $A_1A_3$: $p_1 p_3$    $A_2A_3$: $p_2 p_3$    $A_3A_3$: $p_3^2$

Genotype frequencies in the next generation:

$A_1A_1$: $p_1^2$
$A_1A_2$: $2 p_1 p_2$
$A_1A_3$: $2 p_1 p_3$
$A_2A_2$: $p_2^2$
$A_2A_3$: $2 p_2 p_3$
$A_3A_3$: $p_3^2$
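The same rule extends to any number of alleles: homozygotes have frequency $p_i^2$ and heterozygotes $2 p_i p_j$. The sketch below illustrates this; the function name `hw_multiallele` and the allele frequencies 0.5, 0.3 and 0.2 are made-up values for demonstration, not data from the slides.

```python
from itertools import combinations_with_replacement

def hw_multiallele(freqs):
    """Expected HW genotype frequencies for any number of alleles.

    freqs maps allele name -> frequency; the frequencies are assumed to sum to 1.
    """
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            genotypes[a + b] = freqs[a] ** 2           # homozygote: p_i^2
        else:
            genotypes[a + b] = 2 * freqs[a] * freqs[b]  # heterozygote: 2 p_i p_j
    return genotypes

geno = hw_multiallele({"A1": 0.5, "A2": 0.3, "A3": 0.2})  # hypothetical frequencies
for name, f in geno.items():
    print(f"{name}: {f:.2f}")
print("sum:", round(sum(geno.values()), 10))
```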


Change in frequencies

                      Change in frequencies
Agency                Genotype    Allele      Notes
Linkage               no          no          Creates disequilibrium among loci
Inbreeding            yes         no          Acts on all loci in the genome; results in loss of heterozygosity
Assortative mating    yes         no          Acts only on the locus subject to assortment, and those loci linked to it
Migration             yes         yes         Depends on migration rate and frequency differences between populations
Mutation              yes         yes         Very, very, very slow
Natural selection     yes         yes         Acts on the locus subject to selection, and those loci linked to it
Genetic drift         yes         yes         Acts on all loci in the genome; results in loss of heterozygosity and loss of alleles

HW model: no change in frequencies

Alt model: change in frequencies (molecular evolution)