Summary DNA evolves leading to unique sequences that may be used to identify species, biological...

Summary

• DNA evolves, leading to unique sequences that may be used to identify species, biological species, provenances of strains, genotypes, genetic or allelic richness, and genetic structure

• Mutations and recombinations drive evolution of DNA sequences. Isolation, drift, and selection lead to unique sequences associated with different species or isolated populations

• Isolation: allopatric vs. sympatric. In both cases there is no gene flow between species

• DNA sequences can be used to identify species. They need to be aligned and compared. If each species is unequivocally found within a statistically supported clade, then that sequence can be used to identify species and provenance for that group of organisms

• Diagnostic sequence (a narrower concept): needs to be from a locus that is less variable within species and more variable between species. Alternatively, fixed alleles may be the most powerful. Rare alleles or private alleles are also important in defining populations (individuals that are freely mating): allele frequencies are used by assignment tests such as STRUCTURE

Summary

• Sequences can be used to identify species either by comparison of the actual sequence or by use of taxon-specific PCR primers that will only amplify the target organism. A control is needed, i.e. primers that will amplify any organism, to make sure the reaction is working.

• If sequences are obtained and compared, they can be:
– Aligned with sequences of similar organisms to determine the presence of statistically significant clades
– Compared with sequences present in public databases such as GenBank, using the BLAST engine
– Beware that a single locus may be deceiving, because the history of a locus (gene genealogy) is not necessarily the history of the organism

Summary

• If more than just species identification is needed, multiple genetic markers will be needed. These should be unlinked as much as possible. Multiple markers can be used to:
– identify genotypes and study their distribution to understand the epidemiology of a disease, or perform paternity tests
– determine allelic richness, considered an important issue in conservation biology (small or isolated populations normally tend to lose alleles)
– study the genetic structure of a species, i.e. are populations genetically different (are their allelic frequencies significantly different), and if so, at what scale does the difference become significant
– understand whether a species is reproducing sexually or not, which is important for understanding epidemiology

• Genetic information can be supported by other types of information. For fungi for instance the use of somatic compatibility and of mating allele richness can be used to make inferences on genotypic composition, and relatedness of genotypes.

• Mitochondrial analysis can also be used to make inferences on genetic relatedness

Recognition of self vs. non self

• Intersterility genes: maintain species gene pool. Homogenic system

• Mating genes: recognition of “other” to allow for recombination. Heterogenic system

• Somatic compatibility: protection of the individual.


• It is possible to have different genotypes with the same vc alleles

• VC grouping and genotyping is not the same

• It allows for genotyping without genetic tests

• Reasons behind VC system: protection of resources/avoidance of viral contagion

Somatic incompatibility




More on somatic compatibility

• Perform calculation on power of approach

• Temporary compatibility allows for cytoplasmic contact that then is interrupted: this temporary contact may be enough for viral contagion

SOMATIC COMPATIBILITY

• Fungi are territorial for two reasons
– Selfish
– Do not want to become infected

• For haploids it is a benefit to mate with another individual, but then the n+n wants to keep all other genotypes out

• Only if all alleles are the same there will be fusion of hyphae

• If most alleles are the same, but not all, fusion only temporary

SOMATIC COMPATIBILITY

• SC can be used to identify genotypes

• SC is regulated by multiple loci

• Individuals that are compatible recognize one another as self and are within the same SC group

• SC group is used as a proxy for genotype, but in reality you may have some different genotypes that by chance fall in the same SC group

• Happens often among sibs, but can happen by chance too among unrelated individuals


• What are the chances two different individuals will have the same set of VC alleles?

• Probability calculation (multiply frequency of each allele)

• More powerful the larger the number of loci

• …and the larger the number of alleles per locus

Recognition of self vs. non self: probability of identity (PID)

• 4 loci
• 3 biallelic
• 1 penta-allelic

• P = 0.5 x 0.5 x 0.5 x 0.2 = 0.025

• In humans 99.9%, 1000, 1 in one million
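The PID arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming equally frequent alleles at each locus and unlinked loci (as the slide's numbers imply); the function name `probability_of_identity` is hypothetical.

```python
from math import prod


def probability_of_identity(alleles_per_locus):
    """Chance that two unrelated individuals carry the same allele at every locus.

    Assumes equally frequent alleles and independent (unlinked) loci, so the
    per-locus match probability is simply 1 / number of alleles.
    """
    return prod(1.0 / k for k in alleles_per_locus)


# Three biallelic loci and one penta-allelic locus, as on the slide.
pid = probability_of_identity([2, 2, 2, 5])
print(round(pid, 3))  # 0.025
```

Adding loci, or loci with more alleles, shrinks the product, which is why the approach becomes more powerful with more loci and more alleles per locus.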

INTERSTERILITY

• If a species has arisen, it must have some adaptive advantages that should not be watered down by mixing with other species

• Will allow mating to happen only if individuals recognized as belonging to the same species

• Plus alleles at one of 5 loci (S P V1 V2 V3)

INTERSTERILITY

• Basis for speciation

• These alleles are selected for more strongly in sympatry

• You can have different species in allopatry that have not been selected for different IS alleles

MATING

• Two haploids need to fuse to form n+n

• Sex needs to increase diversity: need different alleles for mating to occur

• Selection for equal representation of many different mating alleles

MATING

• If one individual is the source of inoculum, then the same 2 mating alleles will be found in the local population

• If inoculum is of broad provenance then multiple mating alleles should be found

MATING

• How do you test for mating?

• Place two homokaryons in same plate and check for formation of dikaryon (microscopic clamp connections at septa)

Clamp connections







MATING ALLELES

• All heterokaryons will have two mating alleles, for instance a, b

• There is an advantage in having more mating alleles (easier mating, higher chances of finding a mate)

• A mating allele that is rare may belong to a migrant that has just arrived

• If a parent is important source, genotypes should all be of one or two mating types


Two scenarios:

• A, A, B, C, D, D, E, H, I, L

• Multiple source of infections (at least 4 genotypes)

• A, A, A,B, B, A, A

• Siblings as source of infection (1 genotype)
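The reasoning behind the two scenarios can be sketched as a count: since each heterokaryon carries two different mating alleles, the number of distinct alleles divided by two gives a lower bound on the number of genotypes. The helper name `min_genotypes` is hypothetical, and the lower bound is an assumption about how the slide's numbers were obtained.

```python
from math import ceil


def min_genotypes(mating_alleles):
    """Lower bound on the number of heterokaryotic genotypes.

    Each heterokaryon carries two different mating alleles, so n distinct
    alleles require at least ceil(n / 2) genotypes.
    """
    return ceil(len(set(mating_alleles)) / 2)


scenario_1 = ["A", "A", "B", "C", "D", "D", "E", "H", "I", "L"]
scenario_2 = ["A", "A", "A", "B", "B", "A", "A"]

print(min_genotypes(scenario_1))  # 4
print(min_genotypes(scenario_2))  # 1
```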

SEX

• Ability to recombine and adapt

• Definition of population and metapopulation

• Different evolutionary model

• Why sex? Clonal reproductive approach can be very effective among pathogens

Long branches between groups suggest no sex is occurring between groups

[Figure: NJ tree, scale 0.05 substitutions/site; tips: Het INSULARE, True Fir EUROPE, Spruce EUROPE, True Fir NAMERICA, Pine EUROPE, Pine NAMERICA; clades: Fir-Spruce, Pine Europe, Pine N.Am.]

Small branches within a clade indicate sexual reproduction is ongoing within that group of individuals

[Figure: NJ tree of isolates from CA, WA, MEX and EU (890 bp, CI > 0.9), scale 0.0005 substitutions/site; groups: NA S, NA P, EU S, EU F]

Index of association

Ia: if the same alleles are associated more often than expected at random, it means sex is not occurring

Association among alleles is calculated and compared to a simulated random distribution
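A minimal sketch of this comparison, assuming haploid multilocus genotypes and the common formulation Ia = V_O / V_E - 1 (observed variance of pairwise differences over the sum of per-locus variances). The slides do not give the formula, so this formulation, the function names, and the example data are assumptions for illustration only.

```python
import random
from itertools import combinations
from statistics import pvariance


def index_of_association(genotypes):
    """Ia = V_O / V_E - 1 for haploid genotypes (one allele per locus)."""
    pairs = list(combinations(genotypes, 2))
    n_loci = len(genotypes[0])
    # For each locus, 0/1 mismatch for every pair of individuals.
    per_locus = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    # Number of loci at which each pair differs.
    totals = [sum(col) for col in zip(*per_locus)]
    v_observed = pvariance(totals)
    v_expected = sum(pvariance(d) for d in per_locus)
    return v_observed / v_expected - 1


def permutation_p_value(genotypes, n_perm=999, seed=0):
    """Share of random reshufflings (alleles shuffled within each locus)
    whose Ia is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = index_of_association(genotypes)
    columns = [list(col) for col in zip(*genotypes)]
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if index_of_association(list(zip(*shuffled))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical clonal sample: two genotypes, alleles fully associated.
clonal = ["AAAA"] * 3 + ["BBBB"] * 3
print(index_of_association(clonal))
print(permutation_p_value(clonal))
```

A small p-value means the observed association is stronger than in the simulated random distribution, which is the pattern expected without sex.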

Evolution and Population genetics

• Positively selected genes…
• Negatively selected genes…
• Neutral genes: normally population genetics demands that the loci used are neutral
• Loci under balancing selection…

Evolutionary history

• Darwinian vertical evolutionary models

• Horizontal, reticulated models..

Are my haplotypes sensitive enough?

• To validate the power of the tool used, one needs to be able to differentiate among closely related individuals

• Generate progeny

• Make sure each meiospore has different haplotype

• Calculate P

RAPD combination 1

• 1010101010
• 1010101010
• 1010101010
• 1010101010
• 1010000000

RAPD combination 2

• 1011101010
• 1010111010
• 1010001010
• 1011001010
• 1011110101

Conclusions

• Only one RAPD combo is sensitive enough to differentiate 4 half-sibs (in white)

• Mendelian inheritance?

• By analysis of all haplotypes it is apparent that two markers are always cosegregating, one of the two should be removed
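The two checks in these conclusions (are all progeny haplotypes distinct, and do any markers always cosegregate) can be sketched as follows. The progeny data below are hypothetical and are not the haplotypes from the slide; the function names are also illustrative.

```python
from itertools import combinations


def distinct_haplotypes(haplotypes):
    """Number of different presence/absence profiles among the progeny."""
    return len(set(haplotypes))


def cosegregating_markers(haplotypes):
    """Pairs of polymorphic markers (0-based positions) that show the same
    presence/absence pattern in every individual."""
    n_markers = len(haplotypes[0])
    columns = ["".join(h[j] for h in haplotypes) for j in range(n_markers)]
    polymorphic = [j for j, col in enumerate(columns) if len(set(col)) > 1]
    return [(i, j) for i, j in combinations(polymorphic, 2)
            if columns[i] == columns[j]]


# Hypothetical progeny scored at five dominant markers (1 = band present).
progeny = ["11001", "10000", "01101", "00100"]

print(distinct_haplotypes(progeny))    # 4
print(cosegregating_markers(progeny))  # [(1, 4)]
```

Markers 1 and 4 carry the same information here, so one of the two would be dropped before computing distances.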

If we have codominant markers how many do I need

• IDENTITY tests = probability calculation based on allele frequency: multiplication of the frequencies of alleles

• 10 alleles at locus 1: P1 = 0.1

• 5 alleles at locus 2: P2 = 0.2

• Total P = P1*P2 = 0.02
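To answer "how many do I need", the same multiplication can be run until the identity probability falls below a chosen cutoff. This is a sketch assuming equally frequent alleles and unlinked loci; the cutoff of 0.001 and the function names are hypothetical choices, not values from the slides.

```python
def identity_probability(alleles_per_locus):
    """Product of per-locus match probabilities (1 / number of alleles)."""
    p = 1.0
    for k in alleles_per_locus:
        p *= 1.0 / k
    return p


def loci_needed(alleles_per_locus, threshold):
    """Smallest number of loci, each with the same number of equally frequent
    alleles, for which the identity probability is at or below the threshold."""
    n = 1
    while (1.0 / alleles_per_locus) ** n > threshold:
        n += 1
    return n


print(round(identity_probability([10, 5]), 2))  # 0.02
print(loci_needed(5, 0.001))                    # 5
```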

Have we sampled enough?

• Resampling approaches

• Saturation curves

– A total of 30 polymorphic alleles
– Our sample is either 10 or 20
– Calculate whether each new sample is characterized by new alleles
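A resampling version of this calculation might look like the sketch below: samples are added in random order many times, and the mean number of distinct alleles seen so far is recorded. The data are simulated (10 hypothetical samples drawn from 30 alleles), and the function name and number of resamples are assumptions.

```python
import random


def accumulation_curve(samples, n_resamples=200, seed=0):
    """Mean cumulative number of distinct alleles after 1, 2, ..., n samples,
    averaged over random orderings of the samples."""
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    for _ in range(n_resamples):
        seen = set()
        for i, alleles in enumerate(rng.sample(samples, n)):
            seen |= alleles
            totals[i] += len(seen)
    return [t / n_resamples for t in totals]


# Hypothetical data: 10 samples, each carrying 6 of 30 polymorphic alleles.
data_rng = random.Random(1)
samples = [set(data_rng.sample(range(30), 6)) for _ in range(10)]

curve = accumulation_curve(samples)
# New alleles contributed, on average, by each additional sample.
new_alleles = [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]
print([round(x, 1) for x in new_alleles])
```

When the last values of `new_alleles` approach zero the curve is saturating, which suggests sampling has been sufficient.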

Saturation (rarefaction) curves

[Figure: saturation curve; x-axis: number of samples (1 to 20); y-axis: number of new alleles]

Dealing with dominant anonymous multilocus markers

• Need to use large numbers (linkage)

• Repeatability

• Graph distribution of distances

• Calculate distance using Jaccard’s similarity index


Jaccard’s

• Only 1-1 and 1-0 count, 0-0 do not count

A: 1010011
B: 1001011
C: 1001000

Pair   Similarity   Distance (1 - similarity)
AB     0.6          0.4
BC     0.5          0.5
AC     0.2          0.8
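The three values above can be reproduced with a short function. This is a minimal sketch; the function name is hypothetical, and it assumes each pair shares at least one scored band (otherwise the denominator would be zero).

```python
def jaccard_similarity(a, b):
    """Shared bands (1-1) divided by shared plus mismatched bands (1-0, 0-1).

    Positions where both profiles are 0 are ignored.
    """
    shared = sum(x == "1" and y == "1" for x, y in zip(a, b))
    mismatched = sum(x != y for x, y in zip(a, b))
    return shared / (shared + mismatched)


profiles = {"A": "1010011", "B": "1001011", "C": "1001000"}

for first, second in [("A", "B"), ("B", "C"), ("A", "C")]:
    s = jaccard_similarity(profiles[first], profiles[second])
    print(first + second, round(s, 2), round(1 - s, 2))
# AB 0.6 0.4
# BC 0.5 0.5
# AC 0.2 0.8
```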

Now that we have distances….

• Plot their distribution (clonal vs. sexual)






• Analysis:
– Similarity (cluster analysis); a variety of algorithms. Most common are NJ and UPGMA
– AMOVA; requires a priori grouping

AMOVA groupings

• Individual

• Population

• Region

AMOVA: partitions molecular variance amongst a priori defined groupings

Example

• SPECIES X: 50%blue, 50% yellow

AMOVA: example

[Figure: POP 1 and POP 2 under Scenario 1 and Scenario 2]

Expectations for fungi

• Sexually reproducing fungi characterized by high percentage of variance explained by individual populations

• Amount of variance between populations and regions will depend on ability of organism to move, availability of host, and

• NOTE: if genotyping is not sensitive enough, so that you are calling “the same” things that are different, you may get unreliable results such as 100% of variance within populations and none among populations

Plotting distances

• Pairwise genetic distances can be plotted: the distribution of distances can be informative of biology of organism

Results: Jaccard similarity coefficients

[Figure: frequency histograms of pairwise Jaccard coefficients of similarity (x-axis: coefficient, 0.90 to 1.00; y-axis: frequency, 0 to 0.7); panels: P. nemorosa; P. pseudosyringae: U.S. and E.U.; Pp U.S. vs. Pp E.U.]

P. pseudosyringae genetic similarity patterns are different in U.S. and E.U.

Results: P. nemorosa

[Figure: dendrogram of isolates, scale 0.1; groups: P. nemorosa, P. ilicis, P. pseudosyringae]

Results: P. pseudosyringae

[Figure: dendrogram of isolates, scale 0.1; groups: P. nemorosa, P. ilicis, P. pseudosyringae; E.U. isolates marked]

The “scale” of disease

• Dispersal gradients depend on propagule size, resilience, and ability to desiccate. NOTE: not linear

• Important interaction with environment, habitat, and niche availability. Examples: Heterobasidion in the Western Alps, and Matsutake mushrooms, which offer an example of habitat tracking

• Scale of dispersal (implicitly correlated to metapopulation structure)



RAPDs: not used often now





RAPD DATA W/O COSEGREGATING MARKERS

PCA



AFLP

• Amplified Fragment Length Polymorphisms

• Dominant marker
• Scans the entire genome like RAPDs
• More reliable because it uses longer PCR primers, which are less likely to mismatch
• Priming sites are a construct of the sequence in the organism and a piece of synthesized DNA

How are AFLPs generated?

• AGGTCGCTAAAATTTT (restriction site in red)
• AGGTCG CTAAATTT
• Synthetic DNA piece ligated
– NNNNNNNNNNNNNNCTAAATTTTT
• Created a new PCR priming site
– NNNNNNNNNNNNNNCTAAATTTTT
• Every time two PCR priming sites are within 400-1600 bp you obtain amplification

White mangroves: Coriolopsis caperata

Distances between study sites

            Coco Solo   Mananti   Ponsok   David
Coco Solo   0
Mananti     237         0
Ponsok      273         60        0
David       307         89        113      0

Coriolopsis caperata on Laguncularia racemosa

Forest fragmentation can lead to loss of gene flow among previously contiguous populations. The negative repercussions of such genetic isolation should most severely affect highly specialized organisms such as some plant-parasitic fungi.

AFLP study on single spores

Site # of isolates # of loci % fixed alleles

Coco Solo 11 113 2.6

David 14 104 3.7

Bocas 18 92 15.04

Distances = PhiST between pairs of populations. Above the diagonal is the probability that random distance > observed distance (1000 iterations).

Coco Solo Bocas David

Coco Solo 0.000 0.000 0.000

Bocas 0.2083 0.000 0.000

David 0.1109 0.2533 0.000

Using DNA sequences

• Obtain sequence
• Align sequences, number of parsimony informative sites
• Gap handling
• Picking sequences (order)
• Analyze sequences (similarity/parsimony/exhaustive/Bayesian)
• Analyze output; CI, HI, bootstrap/decay indices

Using DNA sequences

• Testing alternative trees: Kishino-Hasegawa
• Molecular clock
• Outgroup
• Spatial correlation (Mantel)
• Networks and coalescence approaches

From Garbelotto and Chapela, Evolution and biogeography of matsutakes

Biodiversity within species as significant as between species

Microsatellites or SSRs

• AGTTTCATGCGTAGGT CG CG CG CG CG AAAATTTTAGGTAAATTT

• Number of CG is variable
• Design primers on FLANKING region, amplify DNA
• Electrophoresis on gel, or capillary
• Size the allele (different by one or more repeats; if the number does not match there may be polymorphisms in the flanking region)
• Stepwise mutational process (2 to 3 to 4 to 3 to 2 repeats)

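MS18: (AC)38 = 218 bp, (AC)39 = 220 bp, (AC)40 = 222 bp

MS43a: (CAGA)70 = 373 bp, (CAGA)71 = 377 bp, (CAGA)72 = 381 bp

Squared differences in fragment size (bp):
(220-218)^2 = 2^2
(222-218)^2 = 4^2
(377-373)^2 = 4^2
(381-373)^2 = 8^2

Squared differences in number of repeats:
(39-38)^2 = 1^2
(40-38)^2 = 2^2
(71-70)^2 = 1^2
(72-70)^2 = 2^2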
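The contrast between the two sets of squared differences can be sketched as below: measured in base pairs, the tetranucleotide locus looks four times more divergent, but measured in repeats (consistent with a stepwise mutational process) the two loci differ by the same amount. The function names are hypothetical.

```python
def squared_size_difference(bp_a, bp_b):
    """Squared difference between two allele sizes, in base pairs."""
    return (bp_a - bp_b) ** 2


def squared_repeat_difference(bp_a, bp_b, motif_length):
    """Squared difference between two alleles, in number of repeats."""
    return ((bp_a - bp_b) / motif_length) ** 2


# MS18 has a 2 bp motif (AC); MS43a has a 4 bp motif (CAGA).
print(squared_size_difference(222, 218))       # 16
print(squared_repeat_difference(222, 218, 2))  # 4.0
print(squared_size_difference(381, 373))       # 64
print(squared_repeat_difference(381, 373, 4))  # 4.0
```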

ACACACACACACACACAC

AMOVA Analysis of Molecular Variance


Example 1: Origins of the Sudden Oak Death Epidemic in California

(Mascheretti et al., Molecular Ecology (2008) 17: 2755-2768)

Photo: UC Davis

Photo: www.membranetransport.org

Photo: Northeast Plant Diagnostic Network

http://www.membranetransport.org/

Spatial autocorrelation

[Figure: Moran’s I plotted against geographical distance (m), log scale from 10 to 1000]

Within approx. 100 meters the genetic structure correlates with the geographical distance

Spatial autocorrelation

[Figure: Moran’s I (y-axis, -0.2 to 0.6) plotted against mean geographical distance in m (x-axis, log scale from 1 to 1000000)]

Moran’s I (coefficient of departure from spatial randomness) correlates with distance up to

Distribution of genotypes (6 microsatellite markers) in different populations of P. ramorum in California

NJ tree of P. ramorum populations in California

[Figure: NJ tree; tips: SC-1, MA-4, NURSERY, SC-3, MA-3, SO-1, SO-2, MA-5, SC-2, MO-1, MO-2, MA-2, MA-1, HU-1, HU-2]

• Phytophthora ramorum (Oomycete)
– causal agent of Sudden Oak Death (SOD), first reported in California in 1994
– SOD affects tanoak (Lithocarpus densiflora), coast live oak (Quercus agrifolia), California black oak (Quercus kelloggii), and canyon live oak (Quercus chrysolepis)
– P. ramorum also causes a disease characterized mostly by leaf blight and/or branch dieback in over 100 species of both wild and ornamental plants, including California bay laurel (Umbellularia californica), California redwood (Sequoia sempervirens), Camellia and Rhododendron species

Example: microsatellites genotyping of P. ramorum isolates

Collection of infected bay leaves from several forests in Sonoma, Monterey, Marin, Napa, Alameda, San Mateo

Microsatellites (I): mating type A1 (EU) and mating type A2 (US)

A2 (US) A1 (EU)

Locus 29 325/ - 325/337

-/337

Locus 33 315/337 325/337

Locus 65 234/252 236/244

220/222


Ind. MS39a MS39b MS43a MS43b MS45 MS18 MS64 Mating type

1 129-129 246-246 369-369 486-486 167-187 220-278 342-374 A1

2 129-129 246-246 369-369 486-486 167-187 220-278 342-374 A1

3 129-129 246-246 373-373 486-486 167-187 220-274 342-374 A1

4 129-129 246-246 373-373 486-486 167-187 220-274 342-378 A1

5 129-129 246-246 373-373 486-486 167-187 220-274 342-378 A1

6 129-129 246-246 373-373 486-486 167-187 220-274 342-378 A1

7 129-129 246-246 373-373 486-486 167-187 220-278 342-378 A1

8 129-129 246-246 373-373 486-486 167-187 220-278 342-374 A1

9 129-129 250-250 369-369 486-486 167-187 220-278 342-374 A1

10 129-129 250-250 369-369 486-486 167-187 220-278 342-374 A1

11 129-129 250-250 369-369 486-486 167-187 220-278 342-374 A1

12 129-129 250-250 377-377 490-490 167-187 220-278 342-374 A1

13 129-129 250-250 377-377 490-490 167-187 220-278 342-381 A1

14 129-129 250-250 377-377 490-490 167-187 220-278 342-381 A1

15 129-129 250-250 377-377 490-490 167-187 220-278 342-381 A1

16 129-129 246-246 377-377 490-490 167-187 220-278 342-374 A1

17 129-129 246-246 377-377 486-486 167-187 220-278 342-374 A1

18 129-129 246-246 369-369 486-486 167-187 220-278 342-374 A1

19 129-129 246-246 381-381 486-486 167-187 222-null 342-374 A2

20 129-129 246-246 381-381 494-494 167-187 222-null 342-374 A2
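Counting multilocus genotypes in a table like this one can be sketched as follows. Only the rows for individuals 1 to 4, 19 and 20 are copied from the table, to keep the example short; the variable names are illustrative.

```python
from collections import defaultdict

# Loci in order: MS39a, MS39b, MS43a, MS43b, MS45, MS18, MS64.
# Rows copied from the table for individuals 1-4, 19 and 20 only.
genotypes = {
    1: ("129-129", "246-246", "369-369", "486-486", "167-187", "220-278", "342-374"),
    2: ("129-129", "246-246", "369-369", "486-486", "167-187", "220-278", "342-374"),
    3: ("129-129", "246-246", "373-373", "486-486", "167-187", "220-274", "342-374"),
    4: ("129-129", "246-246", "373-373", "486-486", "167-187", "220-274", "342-378"),
    19: ("129-129", "246-246", "381-381", "486-486", "167-187", "222-null", "342-374"),
    20: ("129-129", "246-246", "381-381", "494-494", "167-187", "222-null", "342-374"),
}

# Group individuals that share an identical multilocus genotype.
groups = defaultdict(list)
for individual, mlg in genotypes.items():
    groups[mlg].append(individual)

repeated = [inds for inds in groups.values() if len(inds) > 1]
print(len(groups))  # 5
print(repeated)     # [[1, 2]]
```

Individuals sharing a multilocus genotype are candidates for clonal spread; the markers must be variable enough for that inference to hold.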

Genetic analysis requires variation at loci, variation of markers (polymorphisms)

• How the variation is structured will tell us
– Does the microbe reproduce sexually or clonally
– Is infection primary or secondary
– Is contagion caused by local infectious spreaders or by long-distance moving spreaders
– How far can individuals move: how large are populations
– Is there inbreeding or are individuals freely outcrossing


CASE STUDY

A stand of adjacent trees is infected by a disease:

How can we determine the way trees are infected?

BY ANALYSING THE GENOTYPE OF THE MICROBES: if the genotype is the same then we have local secondary tree-to-tree contagion. If all genotypes are different then primary infection caused by airborne spores is the likely cause of Contagion.

CASE STUDY

WE HAVE DETERMINED AIRBORNE SPORES (PRIMARY INFECTION ) IS THE MOST COMMON FORM OF INFECTION

QUESTION: Are the infectious spores produced by a local spreader, or is there a general airborne population of spores that may come from far away?

HOW CAN WE ANSWER THIS QUESTION?

If spores are produced by a local spreader..

• Even if each tree is infected by different genotypes (each representing the result of meiosis like us here in this class)….these genotypes will be related

• HOW CAN WE DETERMINE IF THEY ARE RELATED?

HOW CAN WE DETERMINE IF THEY ARE RELATED?

• By using random genetic markers we find out the genetic similarity among these genotypes infecting adjacent trees is high

• If all spores are generated by one individual
– They should have the same mitochondrial genome
– They should have one of two mating alleles

WE DETERMINE INFECTIOUS SPORES ARE NOT RELATED

• QUESTION: HOW FAR ARE THEY COMING FROM?

….or……

• HOW LARGE IS A POPULATION?

Very important question: if we decide we want to wipe out an infectious disease we need to wipe out at least the areas corresponding to the population size, otherwise we will achieve no result.

HOW TO DETERMINE WHETHER DIFFERENT SITES BELONG TO THE SAME POP OR NOT?

• Sample the sites and run the genetic markers

• If sites are very different:

– All individuals from each site will be in their own exclusive clade; if two sites are in the same clade, maybe those two populations are actually linked (within reach)

– In AMOVA analysis, the amount of genetic variance among populations will be significant (if the organism is sexual, the portion of variance among individuals will also be significant)

– F statistics: Fst will be over 0.10 (suggesting strong structuring)

– There will be isolation by distance

Levels of Analyses

Individual

• identifying parents & offspring
– very important in zoological circles
– identify patterns of mating between individuals (polyandry, etc.)

In fungi, it is important to identify the "individual" -- determining clonal individuals from unique individuals that resulted from a single mating event.

Levels of Analyses cont…

• Families – looking at relatedness within colonies (ants, bees, etc.)

• Population – level of variation within a population.
– Dispersal = indirectly estimated by calculating migration
– Conservation & Management = looking for founder effects (little allelic variation), bottlenecks (reduction in population size leads to little allelic variation)

• Species – variation among species = what are the relationships between species.

• Family, Order, ETC. = higher level phylogenies

What is Population Genetics?

About microevolution (evolution of species)

The study of the change of allele frequencies, genotype frequencies, and phenotype frequencies

Goals of population genetics

To provide an explanatory framework to describe the evolution of species, organisms, and their genome, due to:
• Natural selection (adaptation)
• Chance (random events)
• Mutations
• Climatic changes (population expansions and contractions)
• …

Assumes that:
• the same evolutionary forces acting within species (populations) should enable us to explain the differences we see between species
• evolution leads to change in gene frequencies within populations

Pathogen Population Genetics

• must constantly adapt to changing environmental conditions to survive
– High genetic diversity = easily adapted
– Low genetic diversity = difficult to adapt to changing environmental conditions
– important for determining evolutionary potential of a pathogen

• If we are to control a disease, must target a population rather than individual

• Exhibit a diverse array of reproductive strategies that impact population biology

Analytical Techniques

– Hardy-Weinberg Equilibrium
• p^2 + 2pq + q^2 = 1
• Departures from random mating

– F-Statistics
• measures of genetic differentiation in populations

– Genetic Distances – degree of similarity between OTUs
• Nei’s
• Reynolds
• Jaccard’s
• Cavalli-Sforza

– Tree Algorithms – visualization of similarity
• UPGMA
• Neighbor Joining

Allele Frequencies

• Allele frequencies (gene frequencies) = proportion of all alleles, across all individuals in the group in question, that are of a particular type

• Allele frequencies: p + q = 1

• Expected genotype frequencies: p^2 + 2pq + q^2

Evolutionary principles: Factors causing changes in genotype frequency

• Selection = variation in fitness; heritable
• Mutation = change in DNA of genes
• Migration = movement of genes across populations
– Vectors = Pollen, Spores
• Recombination = exchange of gene segments
• Non-random Mating = mating between neighbors rather than by chance
• Random Genetic Drift = if populations are small enough, by chance, sampling will result in a different allele frequency from one generation to the next.

The smaller the sample, the greater the chance of deviation from an ideal population.

Genetic drift at small population sizes often occurs as a result of two situations: the bottleneck effect or the founder effect.
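The effect of sample size on drift can be illustrated with a small simulation. This is a sketch under a simple Wright-Fisher-style assumption (each generation, 2N gene copies are drawn at random from the previous generation, with no selection, mutation, or migration); the model choice, function name, and population sizes are assumptions, not taken from the slides.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=0):
    """Allele frequency trajectory under drift alone in a diploid population.

    Each generation, 2 * pop_size gene copies are sampled at random given the
    allele frequency of the previous generation.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
        trajectory.append(p)
    return trajectory


small = wright_fisher(0.5, pop_size=10, generations=50)
large = wright_fisher(0.5, pop_size=1000, generations=50)
print(small[-1], large[-1])
```

In runs like this the small population typically wanders far from 0.5, and may lose or fix the allele, while the large population stays close to its starting frequency.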

Founder Effects; typical of exotic diseases

• Establishment of a population by a few individuals can profoundly affect genetic variation
– Consequences of founder effects

• Fewer alleles

• Fixed alleles

• Modified allele frequencies compared to source pop

• GREATER THAN EXPECTED DIFFERENCES AMONG POPULATIONS BECAUSE POPULATIONS ARE NOT IN EQUILIBRIUM (IF A BLONDE FOUNDS TOWN A AND A BRUNETTE FOUNDS TOWN B AND THERE IS NO MOVEMENT BETWEEN TOWNS, WE WILL INSTANTANEOUSLY OBSERVE POPULATION DIFFERENTIATION)

Bottleneck Effect

• The bottleneck effect occurs when the numbers of individuals in a larger population are drastically reduced

• By chance, some alleles may be overrepresented and others underrepresented among the survivors
• Some alleles may be eliminated altogether
• Genetic drift will continue to impact the gene pool until the population is large enough

Founder vs Bottleneck

Northern Elephant Seal: Example of Bottleneck

Hunted down to 20 individuals in 1890’s

Population has recovered to over 30,000

No genetic diversity at 20 loci

Hardy Weinberg Equilibrium and F-Stats

• In general, requires co-dominant marker system
• Codominant = expression of heterozygote phenotypes that differ from either homozygote phenotype.
• AA, Aa, aa

Hardy-Weinberg Equilibrium

• Null Model = population is in HW Equilibrium
– Useful
– Often predicts genotype frequencies well

If only random mating occurs, then allele frequencies remain unchanged over time.

After one generation of random mating, genotype frequencies are given by

AA    Aa    aa
p^2   2pq   q^2

p = freq (A)
q = freq (a)
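As a quick check of these expressions, the sketch below computes the expected genotype frequencies for a given p and then recovers the allele frequency from them, showing that random mating alone leaves it unchanged. The value p = 0.6 and the function names are illustrative.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for a two-allele locus under random mating."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency_A(genotype_freqs):
    """Frequency of allele A: all of AA plus half of the heterozygotes."""
    return genotype_freqs["AA"] + 0.5 * genotype_freqs["Aa"]


expected = hardy_weinberg(0.6)
print({g: round(f, 2) for g, f in expected.items()})
# {'AA': 0.36, 'Aa': 0.48, 'aa': 0.16}
print(round(allele_frequency_A(expected), 2))  # 0.6
```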

Hardy-Weinberg Theorem

• The possible range for an allele frequency or genotype frequency therefore lies between ( 0 – 1)

• with 0 meaning complete absence of that allele or genotype from the population (no individual in the population carries that allele or genotype)

• 1 means complete fixation of the allele or genotype (fixation means that every individual in the population is homozygous for the allele -- i.e., has the same genotype at that locus).

Expected Genotype Frequencies

ASSUMPTIONS

1) diploid organism
2) sexual reproduction
3) discrete generations (no overlap)
4) mating occurs at random
5) large population size (infinite)
6) no migration (closed population)
7) mutations can be ignored
8) no selection on alleles

IMPORTANCE OF HW THEOREM

If the only force acting on the population is random mating, allele frequencies remain unchanged and genotypic frequencies are constant.

Mendelian genetics implies that genetic variability can persist indefinitely, unless other evolutionary forces act to remove it

Departures from HW Equilibrium

• Check Gene Diversity = Heterozygosity
– If high gene diversity = different genetic sources due to high levels of migration

• Inbreeding - mating system “leaky” or breaks down, allowing mating between siblings

• Asexual reproduction = check for clones
– Risk of overemphasizing particular individuals

• Restricted dispersal = local differentiation leads to non-random mating

[Figure: four populations (Pop 1 to Pop 4) illustrating FST = 0.02 and FST = 0.30]

              Pop1   Pop2   Pop3
Sample size   20     20     20
AA            10     5      0
Aa            4      10     8
aa            6      5      12

Freq   Pop1                       Pop2                        Pop3
p      (20 + 1/2*8)/40 = 0.60     (10 + 1/2*20)/40 = 0.50     (0 + 1/2*16)/40 = 0.20
q      (12 + 1/2*8)/40 = 0.40     (10 + 1/2*20)/40 = 0.50     (24 + 1/2*16)/40 = 0.80

• Calculate HOBS
– Pop1: 4/20 = 0.20
– Pop2: 10/20 = 0.50
– Pop3: 8/20 = 0.40

• Calculate HEXP (2pq)
– Pop1: 2*0.60*0.40 = 0.48
– Pop2: 2*0.50*0.50 = 0.50
– Pop3: 2*0.20*0.80 = 0.32

• Calculate F = (HEXP – HOBS)/HEXP
– Pop1 = (0.48 – 0.20)/(0.48) = 0.583
– Pop2 = (0.50 – 0.50)/(0.50) = 0.000
– Pop3 = (0.32 – 0.40)/(0.32) = -0.250
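The worked example can be reproduced from the genotype counts alone. This is a minimal sketch for a two-allele codominant locus; the function name `summarize` is hypothetical.

```python
def summarize(n_AA, n_Aa, n_aa):
    """Allele frequencies, observed and expected heterozygosity, and
    F = (HEXP - HOBS) / HEXP from genotype counts at a two-allele locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    h_obs = n_Aa / n
    h_exp = 2 * p * q
    f = (h_exp - h_obs) / h_exp
    return p, q, h_obs, h_exp, f


counts = {"Pop1": (10, 4, 6), "Pop2": (5, 10, 5), "Pop3": (0, 8, 12)}

for name, (n_AA, n_Aa, n_aa) in counts.items():
    p, q, h_obs, h_exp, f = summarize(n_AA, n_Aa, n_aa)
    print(name, round(p, 2), round(h_obs, 2), round(h_exp, 2), round(f, 3))
# Pop1 0.6 0.2 0.48 0.583
# Pop2 0.5 0.5 0.5 0.0
# Pop3 0.2 0.4 0.32 -0.25
```

A positive F (Pop1) indicates a deficit of heterozygotes relative to Hardy-Weinberg expectations; a negative F (Pop3) indicates an excess.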

Local Inbreeding Coefficient

F Stats: Proportions of Variance

• FIS = (HS – HI)/(HS)

• FST = (HT – HS)/(HT)

• FIT = (HT – HI)/(HT)

Pop    HS     HI     p      q      HT     FIS    FST    FIT
1      0.48   0.20   0.60   0.40
2      0.50   0.50   0.50   0.50
3      0.32   0.40   0.20   0.80
Mean   0.43   0.37   0.43   0.57   0.49   0.14   0.12   0.24
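The three formulas can be applied to the per-population values as sketched below. The function name is hypothetical, and the code assumes equal weighting of the three populations and a two-allele locus. Because it works from unrounded means, FIS and FIT can differ in the second decimal from a table computed from means rounded to two places.

```python
def f_statistics(pops):
    """FIS, FST and FIT from a list of (p, observed heterozygosity) pairs,
    one pair per population, for a two-allele locus."""
    n = len(pops)
    h_i = sum(h_obs for _, h_obs in pops) / n         # mean observed heterozygosity
    h_s = sum(2 * p * (1 - p) for p, _ in pops) / n   # mean expected within populations
    p_bar = sum(p for p, _ in pops) / n
    h_t = 2 * p_bar * (1 - p_bar)                     # expected in the pooled population
    return {
        "HI": h_i,
        "HS": h_s,
        "HT": h_t,
        "FIS": (h_s - h_i) / h_s,
        "FST": (h_t - h_s) / h_t,
        "FIT": (h_t - h_i) / h_t,
    }


stats = f_statistics([(0.60, 0.20), (0.50, 0.50), (0.20, 0.40)])
print({name: round(value, 2) for name, value in stats.items()})
# {'HI': 0.37, 'HS': 0.43, 'HT': 0.49, 'FIS': 0.15, 'FST': 0.12, 'FIT': 0.25}
```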

Important point

• Fst values are significant or not depending on the organism you are studying or reading about:

– Fst =0.10 would be outrageous for humans, for fungi means modest substructuring

Research article

Isolation by landscape in populations of a prized edible mushroom Tricholoma matsutake. Anthony Amend, Matteo Garbelotto, Zhendong Fang, Sterling Keeley. Conserv Genet, DOI 10.1007/s10592-009-9894-0






Rhizopogon vulgaris

Rhizopogon occidentalis

Host islands within the California Northern Channel Islands create fine-scale genetic structure in two sympatric species of the symbiotic ectomycorrhizal fungus Rhizopogon

Rhizopogon sampling & study area

• Santa Rosa, Santa Cruz
– R. occidentalis
– R. vulgaris

• Overlapping ranges
– Sympatric
– Independent evolutionary histories

Sampling

Bioassay – Mycorrhizal pine roots

Local Scale Population Structure: Rhizopogon occidentalis

[Figure: map of sampling sites with pairwise FST = 0.26, 0.33, 0.24 and 0.17; annotations: “Populations are different”, “Populations are similar”, 8-19 km, 5 km]

Grubisha LC, Bergemann SE, Bruns TD. Molecular Ecology, in press.

Local Scale Population Structure: Rhizopogon vulgaris

[Figure: map of sampling sites with pairwise FST = 0.21, 0.25 and 0.20; annotation: “Populations are different”]

Grubisha LC, Bergemann SE, Bruns TD. Molecular Ecology, in press.

B.

                    Santa Cruz Island (SCI)            Santa Rosa Island (SRI)
Locus      Allele   SCI East   SCI North   SCI West    SRI
Rvu24.9    234      0.267      0.458       0.576       -
           237      0.467      0.479       0.424       1.000
           240      0.267      0.063       -           -
Rvu20.80   144      0.033      -           0.033       -
           153      0.383      0.156       0.076       0.833
           156      0.133      0.323       0.065       -
           159      0.400      0.281       0.739       0.167
           162      -          0.104       0.087       -
           165      0.033      0.135       -           -
           168      0.017      -           -           -
Rvu19.80   195      0.050      0.167       0.054       -
           198      -          0.042       0.033       -
           201      0.100      0.125       0.663       -
           204      0.017      0.010       -           -
           207      0.817      0.615       0.228       1.000
           210      0.017      0.042       0.022       -
Rvu20.46   144      0.017      0.042       0.478       0.417
           147      0.983      0.958       0.522       0.583
Rvu21.83   291      -          0.021       -           -
           294      0.433      0.646       0.587       1.000
           297      0.300      0.125       0.043       -
           300      0.050      0.010       0.370       -
           303      0.200      0.115       -           -
           306      0.017      0.073       -           -
           309      -          0.010       -           -
Rvu21.13   261      0.983      0.865       0.989       1.000
           264      0.017      0.135       0.011       -

How do we know that we are sampling a population?

• We actually do not know

• Mostly we tend to identify samples from a discrete location as a population, obviously that’s tautological

• Assignment tests will use the data to define population, that is what Grubisha et al. did using the program STRUCTURE

Four phases of INVASION

• TRANSPORT

• SURVIVAL AND ESTABLISHMENT (LAG PHASE)

• INVASION

• POST-INVASION

TRANSPORT

• Biology will determine how

• Normally very few organisms will make it

• Use phylogeographic approach to determine origin ( Armillaria, Heterobasidion)

• Use population genetic approach (Cryphonectria, Ceratocystis fimbriata)

TRANSPORT-2

• Need to sample source pop or a pop that is close enough

• Need markers that are polymorphic and will differentiate genotypes/haplotypes

• Need analysis that will discriminate amongst individuals and identify relationships (similarity clustering, parsimony, Fst & N, coalescent)

ESTABLISHMENT

• LAG PHASE: normally effects are not noticed because mortality is masked by normal background mortality

• By the time the introduction is discovered, it is normally too late to eradicate

• Short lag phase = aggressive pathogen
• Long lag phase = less aggressive pathogen

ESTABLISHMENT

• NORMALLY REDUCED GENETIC VARIABILITY

INVASION

• Because of lack of equilibrium, high Fst values, i.e. strong genetic structuring among populations

• Normally dominance of a few genotypes

• Spatial autocorrelation analyses tell us the extent of spread

INVASION-2

• Later phase: genetic differentiation

• Higher genetic difference in areas of older establishment
