Phylogenetics and Coalescence

Phylogenetics and Coalescence Lab 9 October 24, 2012


Page 1: Phylogenetics and Coalescence

Phylogenetics and Coalescence

Lab 9, October 24, 2012

Page 2: Phylogenetics and Coalescence

Goals

• Construct phylogenetic trees using the UPGMA method

• Use nucleotide sequences to construct phylogenetic trees using UPGMA, NJ, and Maximum Parsimony methods

• Use coalescent simulation to determine historical change in Ne

• Interpret coalescent trees to draw inferences about human migrations

Page 3: Phylogenetics and Coalescence

Phylogenetic Methods

• Scope of the problem

– Number of possible unrooted trees for n OTUs: $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$

– For 10 taxa -> 2,027,025 possible unrooted trees.

– Need an optimality criterion
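The size of the search space can be checked directly. The sketch below assumes the standard count of unrooted bifurcating trees for n labelled OTUs, (2n-5)! / (2^(n-3) (n-3)!); the function name is only illustrative.

```python
from math import factorial

def unrooted_tree_count(n: int) -> int:
    """Number of distinct unrooted bifurcating trees for n >= 3 labelled OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    # (2n-5)! / (2^(n-3) * (n-3)!) is always an integer, so // is exact.
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

for n in (4, 5, 10):
    print(n, unrooted_tree_count(n))
```

For n = 10 this gives the 2,027,025 trees quoted above, which is why an exhaustive search quickly becomes impractical.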

Page 4: Phylogenetics and Coalescence

Phylogenetic methods

A. Distance methods
1. Unweighted Pair Group Method using Arithmetic averages (UPGMA)
2. Neighbor Joining (NJ)
3. Minimum Evolution (ME)

B. Character-based methods
1. Maximum Parsimony (MP)
2. Maximum Likelihood (ML)
3. Bayesian Method (BA)

Page 5: Phylogenetics and Coalescence

UPGMA

Taxa        1  2  3  4  5  6  7
Human       T  G  C  G  T  A  T
Chimpanzee  T  G  G  G  T  A  T
Gorilla     T  G  C  G  C  T  T
Orangutan   T  G  C  T  G  T  G
Gibbon      T  A  G  T  A  G  C

Step 1: Generate data (Sequence/ Genotype/ Morphological) for each OTU.

Page 6: Phylogenetics and Coalescence

Distance can be calculated by using different substitution models:
1. # of nucleotide differences
2. p-distance
3. JC distance
4. K2P distance
5. F81
6. HKY85
7. GTR, etc.

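Step 2: Calculate p-distance for all pairs of taxa.

Taxa        1  2  3*  4  5  6  7
Human       T  G  C   G  T  A  T
Chimpanzee  T  G  G   G  T  A  T

p-distance (Human, Chimpanzee) = number of differing sites / total number of sites = 1/7 = 0.142857143 (* marks the one differing site)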
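The p-distance is simply the proportion of aligned sites that differ. A minimal sketch, assuming equal-length aligned sequences with no gaps (the function name is illustrative):

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)

human = "TGCGTAT"
chimpanzee = "TGGGTAT"
print(round(p_distance(human, chimpanzee), 9))  # 1 difference in 7 sites -> 0.142857143
```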

Page 7: Phylogenetics and Coalescence

Step 3: Calculate distance matrix for all pairs of taxa and select pair of taxa with minimum distance as new OTU.

Taxa  Hu      Ch      Go      Or      Gi
Hu    0
Ch    0.1428  0
Go    0.2857  0.4285  0
Or    0.5714  0.7142  0.4285  0
Gi    0.8571  0.7142  0.8571  0.7142  0

Human and Chimpanzee have the minimum distance (0.1428) and are joined first; each branch has length 0.1428 / 2 = 0.071.
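Step 3 can be sketched as follows: compute the p-distance for every pair of taxa, then pick the pair with the smallest value. The dictionary layout (pairs stored as `frozenset` keys) is an assumption made for this illustration.

```python
from itertools import combinations

SEQS = {
    "Human": "TGCGTAT",
    "Chimpanzee": "TGGGTAT",
    "Gorilla": "TGCGCTT",
    "Orangutan": "TGCTGTG",
    "Gibbon": "TAGTAGC",
}

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def distance_matrix(seqs):
    """Map each unordered pair of taxa to its p-distance."""
    return {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }

matrix = distance_matrix(SEQS)
closest = min(matrix, key=matrix.get)   # pair with the minimum distance
print(sorted(closest), round(matrix[closest], 4))
```

Note that the slides truncate values (0.1428) where rounding gives 0.1429; the underlying fraction is 1/7 either way.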

Page 8: Phylogenetics and Coalescence

Step 4: Recalculate new distance matrix, assuming human and chimpanzee as one OTU.

d(Hu+Ch, Go) = (d(Hu, Go) + d(Ch, Go)) / 2 = (0.2857 + 0.4285) / 2 = 0.3571

taxa   Hu+Ch    Go      Or      Gi
Hu+Ch  0
Go     0.35714  0
Or     0.64285  0.4285  0
Gi     0.78571  0.8571  0.7142  0

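The recalculation in Step 4 averages the original pairwise distances between every member of one cluster and every member of the other. A minimal sketch, using the p-distances from the worked example stored as exact sevenths (the abbreviations and the helper name are illustrative):

```python
# Original pairwise p-distances from the seven-site example (k differences / 7 sites).
PAIRWISE = {
    frozenset(("Hu", "Ch")): 1 / 7,
    frozenset(("Hu", "Go")): 2 / 7,
    frozenset(("Ch", "Go")): 3 / 7,
    frozenset(("Hu", "Or")): 4 / 7,
    frozenset(("Ch", "Or")): 5 / 7,
    frozenset(("Go", "Or")): 3 / 7,
    frozenset(("Hu", "Gi")): 6 / 7,
    frozenset(("Ch", "Gi")): 5 / 7,
    frozenset(("Go", "Gi")): 6 / 7,
    frozenset(("Or", "Gi")): 5 / 7,
}

def cluster_distance(cluster_a, cluster_b, pairwise):
    """UPGMA distance: mean of all between-cluster pairwise distances."""
    total = sum(pairwise[frozenset((a, b))] for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

hu_ch = ("Hu", "Ch")
for other in ("Go", "Or", "Gi"):
    print(other, round(cluster_distance(hu_ch, (other,), PAIRWISE), 4))
```

Averaging over the original taxa (rather than over the previous matrix entries) is what makes the method "unweighted": every taxon contributes equally no matter when it joined a cluster.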

Page 9: Phylogenetics and Coalescence

Step 5: Select pair of taxa with minimum distance as new OTU.

Tree so far (Newick, with branch lengths): ((Human:0.071,Chimpanzee:0.071):0.107,Gorilla:0.179)

Page 10: Phylogenetics and Coalescence

Step 6: Again select the pair of OTUs with minimum distance as a new OTU and recalculate the distance matrix.

d((Hu+Ch)Go, Or) = (d(Hu, Or) + d(Ch, Or) + d(Go, Or)) / 3 = (0.5714 + 0.7142 + 0.4285) / 3 = 0.5714

taxa       (Hu+Ch)Go  Or      Gi
(Hu+Ch)Go  0
Or         0.5714     0
Gi         0.8095     0.7142  0


Page 11: Phylogenetics and Coalescence

Step 7: Again select pair of taxa with minimum distance as new OTU.

Tree so far (Newick, with branch lengths): (((Human:0.071,Chimpanzee:0.071):0.107,Gorilla:0.179):0.107,Orangutan:0.286)

Page 12: Phylogenetics and Coalescence

Step 8: Again select the pair of OTUs with minimum distance as a new OTU and recalculate the distance matrix.

d(((Hu+Ch)Go)Or, Gi) = (d(Hu, Gi) + d(Ch, Gi) + d(Go, Gi) + d(Or, Gi)) / 4 = (0.8571 + 0.7142 + 0.8571 + 0.7142) / 4 = 0.7857

taxa           ((Hu+Ch)Go)Or  Gi
((Hu+Ch)Go)Or  0
Gi             0.7857         0


Page 13: Phylogenetics and Coalescence

Step 9: Again select the pair of OTUs with minimum distance as a new OTU and make the final rooted tree.

Final rooted tree (Newick, with branch lengths): ((((Human:0.071,Chimpanzee:0.071):0.107,Gorilla:0.179):0.107,Orangutan:0.286):0.107,Gibbon:0.393)
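The whole procedure (Steps 1 to 9) can be sketched in a few lines. This is a minimal illustration rather than a production implementation: it assumes gap-free aligned sequences, uses p-distance, breaks ties arbitrarily, and writes the tree as a Newick string; all function names are illustrative.

```python
from itertools import combinations

SEQS = {
    "Human": "TGCGTAT",
    "Chimpanzee": "TGGGTAT",
    "Gorilla": "TGCGCTT",
    "Orangutan": "TGCTGTG",
    "Gibbon": "TAGTAGC",
}

def p_distance(seq_a, seq_b):
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def upgma(seqs):
    """Return (newick_string, root_height) for a UPGMA tree."""
    # Distances between the original taxa never change.
    d = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
         for p in combinations(seqs, 2)}
    # Each cluster: tuple of member taxa -> (Newick subtree, node height).
    clusters = {(name,): (name, 0.0) for name in seqs}

    def dist(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(d[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        height = dist(a, b) / 2          # new node sits halfway between clusters
        (sub_a, h_a), (sub_b, h_b) = clusters.pop(a), clusters.pop(b)
        newick = f"({sub_a}:{height - h_a:.3f},{sub_b}:{height - h_b:.3f})"
        clusters[a + b] = (newick, height)

    (newick, height), = clusters.values()
    return newick + ";", height

tree, root_height = upgma(SEQS)
print(tree)
print(round(root_height, 3))
```

On the five example sequences the branch lengths agree with the worked example (0.071 to Human and Chimpanzee, 0.179 to Gorilla, 0.286 to Orangutan, 0.393 to Gibbon, and 0.107 for each internal branch); the order of siblings inside the parentheses may differ, which does not change the tree.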

Page 14: Phylogenetics and Coalescence

Branch Supports

1. Bootstrap support

2. Jack-knife support

3. Bremer support

4. Posterior probability support

Page 15: Phylogenetics and Coalescence

Bootstrap support

Step 1: Make n pseudo-replicates of the data by sampling sites (columns) at random with replacement, and make a tree from each replicate.

Pseudo-replicate (sites 2 2 3 4 6 7 7):

Taxa        2  2  3  4  6  7  7
Human       G  G  C  G  A  T  T
Chimpanzee  G  G  G  G  A  T  T
Gorilla     G  G  C  G  T  T  T

Pseudo-replicate (sites 1 3 5 6 7 2 4):

Taxa        1  3  5  6  7  2  4
Human       T  C  T  A  T  G  G
Chimpanzee  T  G  T  A  T  G  G
Gorilla     T  C  C  T  T  G  G

Original data:

Taxa        1  2  3  4  5  6  7
Human       T  G  C  G  T  A  T
Chimpanzee  T  G  G  G  T  A  T
Gorilla     T  G  C  G  C  T  T
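Drawing one pseudo-replicate amounts to sampling column indices with replacement and rebuilding every sequence from those columns. A minimal sketch (the function name and the seed are arbitrary choices for illustration):

```python
import random

SEQS = {
    "Human": "TGCGTAT",
    "Chimpanzee": "TGGGTAT",
    "Gorilla": "TGCGCTT",
}

def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement; return (columns, new alignment)."""
    n_sites = len(next(iter(seqs.values())))
    # The same sampled columns are applied to every taxon, so sites stay aligned.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    resampled = {name: "".join(seq[c] for c in cols) for name, seq in seqs.items()}
    return cols, resampled

rng = random.Random(2012)
cols, replicate = bootstrap_replicate(SEQS, rng)
print("sites:", [c + 1 for c in cols])      # 1-based, as on the slide
for name, seq in replicate.items():
    print(f"{name:<11}{seq}")
```

Each replicate has the same number of sites as the original alignment, but some sites appear more than once and others not at all.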

Page 16: Phylogenetics and Coalescence

Bootstrap support

Step 2: Make a consensus tree of the trees obtained from all pseudo-replicates.
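One common way to summarise the replicate trees is to report, for each clade of the original tree, the fraction of replicates in which that clade reappears. The sketch below does this with UPGMA on p-distances; it is an illustration under simplifying assumptions (ties between equal distances are broken arbitrarily, which matters with only seven sites), not the procedure used by any particular package.

```python
import random
from collections import Counter
from itertools import combinations

SEQS = {
    "Human": "TGCGTAT",
    "Chimpanzee": "TGGGTAT",
    "Gorilla": "TGCGCTT",
    "Orangutan": "TGCTGTG",
    "Gibbon": "TAGTAGC",
}

def p_distance(seq_a, seq_b):
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def upgma_clades(seqs):
    """Return the non-root clades (frozensets of taxa) of the UPGMA tree."""
    d = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
         for p in combinations(seqs, 2)}
    clusters = [frozenset([name]) for name in seqs]
    clades = set()

    def dist(a, b):
        return sum(d[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        if len(merged) < len(seqs):      # the root clade is always present
            clades.add(merged)
    return clades

def bootstrap_support(seqs, n_reps, rng):
    """Fraction of pseudo-replicates that recover each clade of the original tree."""
    n_sites = len(next(iter(seqs.values())))
    counts = Counter()
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        rep = {name: "".join(s[c] for c in cols) for name, s in seqs.items()}
        counts.update(upgma_clades(rep))
    return {clade: counts[clade] / n_reps for clade in upgma_clades(seqs)}

support = bootstrap_support(SEQS, 200, random.Random(1))
for clade, value in sorted(support.items(), key=lambda kv: len(kv[0])):
    print(sorted(clade), value)
```

A majority-rule consensus tree would keep only the clades found in more than half of the replicates.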

Page 17: Phylogenetics and Coalescence

Phylogenetic Software available

1. PAUP

2. PHYLIP

3. MrBayes

4. MEGA

Page 18: Phylogenetics and Coalescence

Problem 1. File mt_primates.meg contains the sequence data used to calculate the genetic distances in Example 1. Use MEGA to build phylogenetic trees based on:

1. UPGMA
2. The NJ Method
3. Maximum Parsimony

Compute bootstrap confidence in the internal nodes of each tree.

Compare the trees derived using each of these methods. Which do you think is the most informative? Does the computational efficiency of the UPGMA method result in misleading results in this case?

Page 19: Phylogenetics and Coalescence

Problem 2. File pdha1_human.meg contains haplotypes detected by sequencing a 4.2-kb region of the X-linked Pyruvate Dehydrogenase E1 α Subunit (PDHA1) in 16 African and 19 non-African males. Use MEGA to build a phylogenetic tree based on the NJ Method and interpret the results in the light of hypotheses about the origin of modern humans (see Example 11.4, pp. 620-621, as well as p. 618 in Hedrick 2005).

Page 20: Phylogenetics and Coalescence

Coalescence

Wright-Fisher Model

• Until now we have implicitly used the Wright-Fisher Model

• Computationally expensive

Page 21: Phylogenetics and Coalescence

Wright Fisher

Page 22: Phylogenetics and Coalescence

The Discrete Coalescent

• Probability that two genes have MRCA j generations ago

$$P(T = j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$$

First factor: probability of no coalescence for j - 1 generations

Second factor: probability of coalescence in the jth generation

• Probability that 2 genes out of k have a common ancestor j generations ago

$$P(T = j) = \left(1 - \binom{k}{2}\frac{1}{2N}\right)^{j-1} \binom{k}{2}\frac{1}{2N}$$

First factor: probability of no coalescence in k lineages for j - 1 generations

Second factor: probability of coalescence in the jth generation
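Both probabilities are geometric distributions with per-generation coalescence probability C(k, 2) / 2N (the two-gene case is k = 2). A minimal sketch, assuming a diploid population of constant size N and k much smaller than N so that multiple coalescences in one generation can be ignored; the function name is illustrative.

```python
from math import comb

def p_coalescence_at(j: int, k: int, N: int) -> float:
    """P(first coalescence among k lineages is exactly j generations ago)."""
    p = comb(k, 2) / (2 * N)            # chance that some pair coalesces in one generation
    return (1 - p) ** (j - 1) * p       # no coalescence for j-1 generations, then one

N = 50
print(p_coalescence_at(1, 2, N))        # two genes, previous generation: 1/(2N)
# Mean waiting time for two genes, summed far enough out to converge; close to 2N.
mean_wait = sum(j * p_coalescence_at(j, 2, N) for j in range(1, 10_000))
print(round(mean_wait, 3))
```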

Page 23: Phylogenetics and Coalescence

The Continuous Coalescent

• Can derive continuous exponential function from discrete geometric representation

• Waiting time (T) for k genes to have k-1 ancestors (See math box 3.2 in Hamilton, 2009)

Let $t = \frac{j}{2N}$

$$P(T \le t) = 1 - e^{-\binom{k}{2} t}$$
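The quality of the exponential approximation can be checked numerically. The sketch below compares the discrete (geometric) probability that k lineages have coalesced within j generations with the continuous form 1 - exp(-C(k, 2) t), where t = j / 2N; the values of N and k are arbitrary choices for illustration.

```python
from math import comb, exp

def discrete_cdf(j: int, k: int, N: int) -> float:
    """P(first coalescence among k lineages within j generations), geometric model."""
    p = comb(k, 2) / (2 * N)
    return 1 - (1 - p) ** j

def continuous_cdf(t: float, k: int) -> float:
    """Same probability under the exponential approximation, t in units of 2N generations."""
    return 1 - exp(-comb(k, 2) * t)

N, k = 10_000, 5
for j in (500, 2_000, 8_000):
    t = j / (2 * N)
    print(j, round(discrete_cdf(j, k, N), 5), round(continuous_cdf(t, k), 5))
```

The two columns agree closely when N is large relative to k, which is the condition under which the continuous coalescent is used.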

Page 24: Phylogenetics and Coalescence

The Continuous Coalescent

• Model of population growth underlies coalescence

[Figure: coalescent trees under Exponential Growth and under a Bottleneck; Hein et al. 2005]

Page 25: Phylogenetics and Coalescence

Coalescent Applications

• Coalescent topologies can depend on a convolution of Ne and μ, migration rate, selection, and recombination rate.

• Applications
– Estimating recombination rates
– Estimating historical migration rates between populations
– Estimating tMRCA
– Estimating historical effective population size
– Estimating strength of selection

Page 26: Phylogenetics and Coalescence

From Data to coalescence

• Suppose we observe n genes with k mutations

• We want to get θ=4Neμ but do not know its true value

• Can calculate the likelihood of θ for a range of candidate values and find the one with the highest likelihood

$$L(\theta \mid D) = P(D \mid \theta) = \int_H P(D \mid H, \theta)\, P(H \mid \theta)\, dH$$
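The integral over histories can be approximated by averaging P(D | H, θ) over histories sampled from the coalescent. The sketch below makes strong simplifying assumptions that are not in the slides: the data are reduced to the number of mutations k, mutations follow an infinite-sites Poisson process with mean θL/2 on a tree of total length L (time in units of 2N generations), and the population size is constant. The values of n and k, the grid, and all names are illustrative.

```python
import math
import random

def sample_tree_length(n, rng):
    """Total branch length of one random coalescent history (units of 2N generations)."""
    total = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2                  # C(k, 2) pairs can coalesce
        total += k * rng.expovariate(rate)      # k lineages persist for the waiting time
    return total

def poisson_pmf(count, mean):
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))

def likelihood(theta, n_mut, tree_lengths):
    """Monte Carlo estimate of P(D | theta): mean of P(D | H, theta) over sampled H."""
    return sum(poisson_pmf(n_mut, theta * length / 2)
               for length in tree_lengths) / len(tree_lengths)

rng = random.Random(9)
n_genes, n_mut = 10, 12
lengths = [sample_tree_length(n_genes, rng) for _ in range(5000)]

grid = [0.5 * i for i in range(1, 31)]          # candidate theta values 0.5 .. 15
best = max(grid, key=lambda theta: likelihood(theta, n_mut, lengths))
print("theta with highest likelihood on the grid:", best)
```

Sampling histories blindly like this is inefficient for real sequence data, because most random histories explain the data poorly; that inefficiency is the motivation for MCMC.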

Page 27: Phylogenetics and Coalescence

MCMC

1. Sample a new history from a distribution of histories (topologies + waiting times)

2. Divide the likelihood of this new history by the likelihood of the last history sampled

3. With probability proportional to this likelihood ratio (capped at 1), move to the new history; otherwise keep the current one.

4. Repeat steps 1-3.

$$L(\theta \mid D) = P(D \mid \theta) = \int_H P(D \mid H, \theta)\, P(H \mid \theta)\, dH$$
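The steps above can be sketched as an independence Metropolis sampler: new histories are proposed from the coalescent itself, so the acceptance probability reduces to the likelihood ratio capped at 1. As a toy stand-in for real sequence data, the sketch assumes the data are just a mutation count with a Poisson likelihood of mean θL/2 on a tree of total length L, with θ held fixed; the names and parameter values are illustrative, and no burn-in is discarded.

```python
import math
import random

def sample_history(n, rng):
    """Waiting times T_n .. T_2 of a random coalescent history (units of 2N generations)."""
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]

def tree_length(times, n):
    return sum(k * t for k, t in zip(range(n, 1, -1), times))

def log_likelihood(times, n, n_mut, theta):
    """log P(D | H, theta) for a mutation count under a Poisson model."""
    mean = theta * tree_length(times, n) / 2
    return n_mut * math.log(mean) - mean - math.lgamma(n_mut + 1)

def mcmc(n, n_mut, theta, steps, rng):
    current = sample_history(n, rng)
    current_ll = log_likelihood(current, n, n_mut, theta)
    lengths = []
    for _ in range(steps):
        proposal = sample_history(n, rng)                         # step 1
        proposal_ll = log_likelihood(proposal, n, n_mut, theta)
        ratio = math.exp(min(0.0, proposal_ll - current_ll))      # step 2, capped at 1
        if rng.random() < ratio:                                  # step 3
            current, current_ll = proposal, proposal_ll
        lengths.append(tree_length(current, n))                   # step 4: repeat
    return lengths

n_genes, n_mut, theta = 10, 30, 4.0
chain = mcmc(n_genes, n_mut, theta, 5000, random.Random(3))
prior_mean = 2 * sum(1 / i for i in range(1, n_genes))
print("prior mean tree length:    ", round(prior_mean, 3))
print("posterior mean tree length:", round(sum(chain) / len(chain), 3))
```

With many observed mutations, the sampled histories are longer on average than histories drawn without regard to the data, which is the sense in which the chain concentrates on histories that explain the data well.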

Page 28: Phylogenetics and Coalescence

Problem

• Fossil and molecular evidence both strongly support the divergence of the human and chimpanzee lineages approximately 6 MYA. However, the timings and locations of human expansions beyond Africa have proved controversial. Use the Bayesian MCMC software BEAST to derive coalescent trees for sequences from the X-linked Pyruvate Dehydrogenase E1-alpha subunit gene that you also analyzed in Problem 2.