Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)


Page 1: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Lecture 7

Difficult problems….and solutions

Platypus (Ornithorhynchus anatinus)

Page 2: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Non-homogeneous evolution

Taxon1 ACGTAAGTCATCGTAGC
Taxon2 ATGGAAATTATCGCGGT
Taxon3 ACATAAATCATCGTAGA
Taxon4 ACGCAAGTCATCGAAGT

[Figure: trees for Taxon1 to Taxon4 with numeric branch labels, one for each of the two models described below]

Assuming equal substitution rates across sites

Allowing some sites to be invariant – reveals more parallel evolution among the variant sites

Mutations at some sites are lethal, so they are invariant

Page 3: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Rates can also differ among the variable sites due to fitness effects, differential mutability and codon bias, again leading homogeneous models to underestimate parallel change

Such rate variation can often be accommodated by assuming a gamma distribution of rates across sites in the likelihood (or distance) model
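The effect of ignoring rate variation can be sketched with a simple distance correction. The example below is only an illustration: it assumes the Jukes-Cantor model, and the observed difference (0.30) and gamma shape (alpha = 0.5) are hypothetical values, not numbers from the lecture.

```python
import math

def jc_distance(p):
    """Jukes-Cantor distance assuming equal rates across sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

p_observed = 0.30  # hypothetical proportion of sites that differ
d_equal = jc_distance(p_observed)
d_gamma = jc_gamma_distance(p_observed, alpha=0.5)

# The gamma-corrected distance is larger: the equal-rates model
# misses multiple changes concentrated at the fast sites.
print(f"equal rates: {d_equal:.3f}  gamma (alpha=0.5): {d_gamma:.3f}")
```

As alpha grows large the gamma distance approaches the equal-rates distance, which is one way to check the two functions against each other.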

Page 4: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Non-homogeneous data partitions

[Figure: trees for Partition 1 and Partition 2, and the tree reconstructed under a single likelihood model]

Kolaczkowski and Thornton (Nature, 2004)

Rifleman    GTAACACTAGCC
Broadbill   GTCACACTAGCC
Flycatcher  GTTACATTAGCC
Lyrebird    GTTACTTTAGCA
Indigobird  GTAACCCTAGCC
ZebraFinch  GTAACCTTAGCA
Rook        GTAACTCTAGCA
Codon pos.  123123123123

Variable sites are shown in red on the original slide; most change occurs at 3rd codon positions
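The claim about 3rd positions can be checked directly on the seven sequences above. This is a minimal sketch; the function name is hypothetical, and it assumes the reading frame starts at the first column, as the codon position line indicates.

```python
alignment = {
    "Rifleman":   "GTAACACTAGCC",
    "Broadbill":  "GTCACACTAGCC",
    "Flycatcher": "GTTACATTAGCC",
    "Lyrebird":   "GTTACTTTAGCA",
    "Indigobird": "GTAACCCTAGCC",
    "ZebraFinch": "GTAACCTTAGCA",
    "Rook":       "GTAACTCTAGCA",
}

def variable_sites_by_codon_position(sequences):
    """Count alignment columns with more than one state, per codon position."""
    counts = {1: 0, 2: 0, 3: 0}
    for index, column in enumerate(zip(*sequences)):
        if len(set(column)) > 1:
            counts[index % 3 + 1] += 1
    return counts

result = variable_sites_by_codon_position(alignment.values())
print(result)  # {1: 1, 2: 0, 3: 3}
```

Three of the four variable columns fall at 3rd codon positions, so a single model fitted to all sites averages over very different rate classes.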

Rifleman

Page 5: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

[Figure: two trees relating reptiles, monotremes, marsupials and placentals: Marsupionta (monotremes + marsupials) and Theria (marsupials + placentals)]

Competing hypotheses for the interrelations of the mammalian sub-classes

Page 6: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Janke et al. (PNAS, 1997)

ML analysis of complete mitochondrial genome protein-coding sequences

Marsupionta

Page 7: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

[Figure: proportion of constant sites plotted against purine base frequency for the gene groups listed in the caption]

Model                     df    AIC
TN93+I+Γ (concatenated)   40    162260.5
TN93+I+Γ (partitioned)    480   158054.3

Grouping of protein-coding and RNA-coding genes based on observed constant site proportions and purine base frequency. Groups: RNA loops; RNA stems; COI; NADH6; ATPase8, NADH2, NADH4L; ATPase6, NADH1, NADH3, NADH4, NADH5; COII, COIII, Cytb.
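The model comparison can be worked through with the AIC values from the slide. The sketch below assumes the usual definition AIC = 2k - 2lnL and that the "df" column gives the number of free parameters k; both are assumptions, since the slide does not state them.

```python
# AIC values and parameter counts as given on the slide.
models = {
    "concatenated": {"k": 40, "aic": 162260.5},
    "partitioned": {"k": 480, "aic": 158054.3},
}

def implied_log_likelihood(aic, k):
    """Back out lnL from AIC = 2k - 2lnL (assumed definition)."""
    return (2 * k - aic) / 2

best = min(models, key=lambda name: models[name]["aic"])
delta_aic = models["concatenated"]["aic"] - models["partitioned"]["aic"]

print(f"preferred model: {best}, delta AIC = {delta_aic:.1f}")
for name, m in models.items():
    print(name, "implied lnL =", round(implied_log_likelihood(m["aic"], m["k"]), 2))
```

The partitioned model is preferred despite its 440 extra parameters, because the improvement in fit far outweighs the penalty.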

Page 8: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Partitioned ML: Theria is favoured

KH-test p-value - Phillips et al. (MPE, 2003)

[Figure: tree of Marsupials, Monotremes, Placentals and Reptiles, with Theria marked]

Page 9: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Compositional heterogeneity

Stationarity: A standard assumption of most phylogeny reconstruction methods is that underlying substitution processes are the same across the tree

When violated, biases arise that provide signals in the data that can overwhelm the “true” phylogenetic signal

Shifting substitution processes (e.g. A→G being favoured in some branches but G→A in others) can result in signals for relationships arising due to similar DNA or protein sequence composition, rather than shared ancestry.

Page 10: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

[Figure: NJ tree of Elephant, Platypus, Opossum, Bandicoot, Aardvark, Rook, Hippopotamus, Rhea, Vidua, Wallaroo, Brushtail Possum, Fin Whale, Mole, Armadillo, Green Turtle, Painted Turtle and Ostrich, with numeric support values on some nodes]

Extreme example: NJ tree - mt 3rd codon positions, transitions only

Branch thickness proportional to T:C ratio

Page 11: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Composition χ² test (stochastic test)

Taxon         A        C        G        T
-----------------------------------------------
Rifleman      165      154      82       95
Broadbill     203      142      48       103
Flycatcher    195      115      60       126
Lyrebird      138      142      127      89
Indigobird    137      144      128      87
Zebra Finch   141      143      124      88
Rook          145      144      118      89
Expected      160.57   140.57   98.14    96.71

$\chi^2 = \sum \frac{(\mathrm{Exp} - \mathrm{Obs})^2}{\mathrm{Exp}} = 119.211273$, df = (n-1)(t-1) = 18, P < 0.0001

Tells only of the presence of a bias and is unreliable when most of the variation occurs among a small number of character states
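The statistic on this slide can be reproduced from the composition table. The sketch below is an illustration with hypothetical names, not the software used for the lecture. Expected counts are computed as row total × column total / grand total, which for these equal-length sequences reduces to the column means shown in the Expected row.

```python
counts = {  # base counts per taxon in the order A, C, G, T (from the slide)
    "Rifleman":    [165, 154, 82, 95],
    "Broadbill":   [203, 142, 48, 103],
    "Flycatcher":  [195, 115, 60, 126],
    "Lyrebird":    [138, 142, 127, 89],
    "Indigobird":  [137, 144, 128, 87],
    "Zebra Finch": [141, 143, 124, 88],
    "Rook":        [145, 144, 118, 89],
}

def composition_chi_square(table):
    """Chi-square test statistic and degrees of freedom for base composition."""
    rows = list(table.values())
    n_states = len(rows[0])
    column_totals = [sum(row[j] for row in rows) for j in range(n_states)]
    grand_total = sum(column_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * column_totals[j] / grand_total
            chi2 += (expected - observed) ** 2 / expected
    df = (n_states - 1) * (len(rows) - 1)
    return chi2, df

chi2, df = composition_chi_square(counts)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

With 4 character states and 7 taxa the degrees of freedom are (4 - 1)(7 - 1) = 18, matching the slide.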

Page 12: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Relative compositional variability (magnitude metric)

Allows the magnitude of compositional heterogeneity to be compared between sequences or coding regimes (for the same taxa)

$\mathrm{RCV} = \sum_{i=1}^{n} \frac{|A_i - A^*| + |T_i - T^*| + |C_i - C^*| + |G_i - G^*|}{n \cdot t}$

Where $A_i$ is the observed frequency of adenine for taxon i, $A^*$ is the average frequency of adenine across all taxa, n is the number of taxa and t is the number of sites
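As a worked example, RCV can be computed for the seven-bird composition table used for the χ² test. This sketch assumes that "frequency" means the count of each base per taxon and that t is the sequence length (496 sites here); the function name is hypothetical.

```python
counts = {  # base counts per taxon in the order A, C, G, T
    "Rifleman":    [165, 154, 82, 95],
    "Broadbill":   [203, 142, 48, 103],
    "Flycatcher":  [195, 115, 60, 126],
    "Lyrebird":    [138, 142, 127, 89],
    "Indigobird":  [137, 144, 128, 87],
    "Zebra Finch": [141, 143, 124, 88],
    "Rook":        [145, 144, 118, 89],
}

def relative_compositional_variability(table):
    """Sum of absolute deviations from mean base counts, divided by n * t."""
    rows = list(table.values())
    n_taxa = len(rows)
    n_sites = sum(rows[0])  # assumes every taxon has the same number of sites
    means = [sum(row[j] for row in rows) / n_taxa for j in range(len(rows[0]))]
    total_deviation = sum(abs(row[j] - means[j]) for row in rows for j in range(len(means)))
    return total_deviation / (n_taxa * n_sites)

rcv = relative_compositional_variability(counts)
print(f"RCV = {rcv:.4f}")
```

Because the result is scaled by both the number of taxa and the number of sites, values can be compared between data sets or coding regimes for the same taxa.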

Page 13: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Accounting for compositional heterogeneity

1. LogDet distances - recover additive distances between sequences when base composition varies

For each pair of DNA sequences x and y, a 4 × 4 matrix counts each possible pair of states at aligned sites

Olithodiscus (x) versus Euglena (y), with states A, C, G, T on both axes:

      A     C     G     T
A     224   5     24    8
C     3     149   1     16
G     24    5     230   4
T     5     19    8     175

Dividing each count by the total number of sites (900) gives Fxy =

      A       C       G       T
A     0.249   0.006   0.027   0.009
C     0.003   0.166   0.001   0.018
G     0.027   0.006   0.256   0.004
T     0.006   0.021   0.009   0.194

Dxy = -ln[det Fxy] = 6.216
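The distance on this slide can be recomputed from the count matrix. The sketch below uses only the simple form given here, Dxy = -ln[det Fxy], and a hand-written determinant so that it needs no third-party library; the function names are hypothetical.

```python
import math

pair_counts = [  # counts of aligned state pairs (A, C, G, T on both axes)
    [224, 5, 24, 8],
    [3, 149, 1, 16],
    [24, 5, 230, 4],
    [5, 19, 8, 175],
]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det

def logdet_distance(counts):
    """-ln of the determinant of the matrix of pair proportions."""
    total = sum(sum(row) for row in counts)
    proportions = [[v / total for v in row] for row in counts]
    return -math.log(determinant(proportions))

d_xy = logdet_distance(pair_counts)
print(f"Dxy = {d_xy:.3f}")
```

Working from the raw counts rather than the rounded proportions avoids rounding error in the determinant.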

Page 14: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Rates-across-sites LogDet has yet to be developed, so this method is often inconsistent due to poor branch-length estimation

[Figure: trees of Euglena, Liverwort, Chlamydomonas, Rice, Tobacco, Anacystis, Chlorella and Olithodiscus built from (a) Jukes-Cantor distances and (b) LogDet distances; taxa are marked by pigment type: chlorophyll a/b, chlorophyll a/c, phycobilin, uncertain]

Lockhart et al. (MBE, 1994)

Page 15: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

2. Non-homogeneous base composition maximum likelihood

Galtier and Gouy (MBE, 1998)

[Figure: rooted tree with root G+C content ω, root location Φ, branch lengths λ1 to λ7 and equilibrium G+C contents θ1 to θ7]

Parameters          symbol   number
root G+C%           ω        1
branch-length       λ        2n-3
root location       Φ        1
Ts/Tv ratio         κ        1
equilibrium G+C%    θ        2n-2

Limitations
1. restricted to GC vs. AT bias
2. computer time intensive

Page 16: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

3. Character state re-coding

• Often much of the compositional heterogeneity arises within specific classes of character state

e.g. purine and pyrimidine transitions

These can be re-coded: RY-coding involves A,G → R and C,T → Y

• Similarly, amino acids can be lumped into functionally similar groups, e.g. valine, leucine and isoleucine as a single category of mid-sized aliphatic amino acids.
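RY-coding is a simple character substitution. The sketch below is a minimal illustration with a hypothetical function name; leaving gaps and ambiguity codes unchanged is a design choice made here, not something specified in the lecture.

```python
# Purines (A, G) become R; pyrimidines (C, T) become Y.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})

def ry_recode(sequence):
    """Recode a nucleotide sequence into purine/pyrimidine states."""
    return sequence.upper().translate(RY_TABLE)

rifleman = ry_recode("GTAACACTAGCC")
broadbill = ry_recode("GTCACACTAGCC")
print(rifleman)   # RYRRYRYYRRYY
print(broadbill)  # RYYRYRYYRRYY
```

After recoding, transitions (A↔G, C↔T) become invisible and only transversions are scored, which removes the compositional signal carried by transitions.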

Page 17: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Nardi et al. (Science, 2003) found Hexapoda to be paraphyletic

Page 18: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Delsuc et al. (Science, 2003)

1st and 3rd codon positions RY-coded

RCVnt = 0.1064 RCVry = 0.0413

[Figure: tree with Hexapoda marked]

Page 19: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Mistaking precision for accuracy

106 nuclear genes: Different methods provide conflicting Yeast topologies, each with 100% bootstrap support

The results underline the importance of understanding how non-phylogenetic signals will bias inference under the model used

Phillips et al. (MBE, 2004)

Page 20: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Not enough phylogenetic signal to resolve the tree

Branch lengths too short. Ans.: Increase gene sequencing

Signal erosion with time. Ans.: Use high-value (often slower evolving) characters

Long unbroken branches make for “noisier” data. Ans.: Increase taxon sampling

Page 21: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Stemminess (Fiala and Sokal: Evol., 1985) on uncorrected distance trees indicates the relative extent of phylogenetic signal erosion among alternative sequences (or coding regimes) for the same taxa

$\mathrm{Stemminess} = 1 - \frac{\sum \text{external branch-lengths}}{\text{total tree-length}}$

i.e. the proportion of the total tree-length contributed by internal branches

Greater phylogenetic signal retention for slower evolving genes results in higher stemminess
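A small numerical sketch may help here. It takes stemminess to be the share of total tree length carried by internal branches, an assumption chosen to agree with the statement that slower genes give higher stemminess; the branch lengths are hypothetical.

```python
def stemminess(internal_lengths, external_lengths):
    """Proportion of total tree length contributed by internal branches."""
    internal = sum(internal_lengths)
    total = internal + sum(external_lengths)
    return internal / total

# Hypothetical tree: short internal branches relative to the tips,
# as expected when deep signal has been eroded by saturation.
eroded = stemminess([0.01, 0.02], [0.20, 0.25, 0.22, 0.30])
# Hypothetical tree with relatively long internal branches.
retained = stemminess([0.10, 0.12], [0.08, 0.07, 0.09, 0.06])

print(f"eroded: {eroded:.3f}  retained: {retained:.3f}")
```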

Page 22: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

[Figure: two uncorrected distance trees for Tigercat, Dunnart, Wombat, Brushtail, Wallaroo, Monodelphis, Opossum, Spiny Bandicoot and Northern Brown Bandicoot]

12 mitochondrial protein-coding genes: Stemminess = 0.086

5 nuclear protein-coding genes: Stemminess = 0.440

Page 23: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Saturation – the problem of multiple changes at the same sites

• Theory, simulations, and practical experience all indicate that the sequences must eventually lose information about events that were long ago.

• Part of the problem with using DNA sequence alignments to infer deep events is that the state space is small {A,C,G,T}

Page 24: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Other sorts of characters

• In an idealised situation where each site had an infinite state space there would be no parallel changes or reversals and our character matrices would be homoplasy free.

• Obviously it is interesting to try and find characters that are closer to this ideal than DNA sequences.

Page 25: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

SINEs and LINEs

• SINEs (and LINEs) are Short (or Long) interspersed nuclear elements.

• Retrotransposed DNA elements that are copied into the genome.

• Low expectations for the same retrotransposon sequence to insert in exactly the same position independently (low homoplasy markers)

Page 26: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Taxon1 ATGCT-------//-------GTCTAGT
Taxon2 AGGCTGTTATGT//TCTCTAGGTCAAGT
Taxon3 ATGCTGCTATGT//TCTCTAGGTCTATT
Taxon4 ATACT-------//-------GTATAGT

Insertion event 1 into chromosome A

The SINE/LINE is copied from locus 1 on chromosome A to locus 2 on chromosome B

Locus 2 sequence

Taxon3 (present at loci 1 and 2)

Taxon2 (present at loci 1 and 2)

Taxon4 (only present at locus 1)

Taxon1 (not present at locus 1 or locus 2)
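Scoring the insertion as a presence/absence character can be sketched from the locus 2 alignment. The function below is a hypothetical illustration: it treats any column containing a gap as part of the insertion and scores a taxon as 1 when it has bases in all of those columns.

```python
locus2 = {
    "Taxon1": "ATGCT-------//-------GTCTAGT",
    "Taxon2": "AGGCTGTTATGT//TCTCTAGGTCAAGT",
    "Taxon3": "ATGCTGCTATGT//TCTCTAGGTCTATT",
    "Taxon4": "ATACT-------//-------GTATAGT",
}

def insertion_presence(alignment):
    """Score each taxon 1 if the inserted element is present, else 0."""
    names = list(alignment)
    length = len(alignment[names[0]])
    # Columns where at least one taxon has a gap mark the inserted element.
    gap_columns = [i for i in range(length)
                   if any(alignment[name][i] == "-" for name in names)]
    return {name: int(all(alignment[name][i] != "-" for i in gap_columns))
            for name in names}

presence = insertion_presence(locus2)
print(presence)  # {'Taxon1': 0, 'Taxon2': 1, 'Taxon3': 1, 'Taxon4': 0}
```

The shared insertion at locus 2 groups Taxon2 with Taxon3, whatever the point substitutions in the flanking sequence suggest.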

Page 27: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Competing hypotheses for the position of the whales

Page 28: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

SINEs and LINEs provide homoplasy free support for the position of the whales as sister group to the hippos.

Page 29: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Genome-order based phylogeny

Large state-space

• DNA sequences: 4 states per site

• Signed circular genomes with n genes: $2^{n-1}(n-1)!$ states, 1 site

• Circular genomes (1 site)

– with 37 genes: $2.56 \times 10^{52}$ states

– with 120 genes: $3.70 \times 10^{232}$ states
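The two state counts follow directly from the formula for signed circular genomes, 2^(n-1) (n-1)!. A short check, with a hypothetical function name:

```python
import math

def signed_circular_states(n_genes):
    """Number of distinct signed circular gene orders: 2^(n-1) * (n-1)!"""
    return 2 ** (n_genes - 1) * math.factorial(n_genes - 1)

for n in (37, 120):
    print(f"{n} genes: {signed_circular_states(n):.2e} states")
```

Python integers are arbitrary precision, so the exact counts are available if needed; the scientific notation is only for display.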

Page 30: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Genome rearrangements

[Figure: a reference sequence and three rearranged versions: inversion (of orange and blue), transposition (of grey) and inverted transposition (of grey); arrows indicate sequence read direction]

Page 31: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Breakpoint Distance

• Breakpoint distance=5

1 2 3 4 5 6 7 8 9 10

1 –3 –2 4 5 9 6 7 8 10
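The breakpoint count for these two gene orders can be verified with a short function. This is a sketch with hypothetical names; it treats the genomes as linear by default and counts an adjacency as conserved if it appears in the reference in either orientation.

```python
def breakpoint_distance(reference, genome, circular=False):
    """Number of adjacencies in `genome` that are absent from `reference`."""
    def adjacencies(order):
        pairs = list(zip(order, order[1:]))
        if circular:
            pairs.append((order[-1], order[0]))
        return pairs

    conserved = set()
    for a, b in adjacencies(reference):
        conserved.add((a, b))
        conserved.add((-b, -a))  # same adjacency read on the other strand
    return sum(pair not in conserved for pair in adjacencies(genome))

reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rearranged = [1, -3, -2, 4, 5, 9, 6, 7, 8, 10]
distance = breakpoint_distance(reference, rearranged)
print(distance)  # 5
```

The five breakpoints fall between 1 and -3, -2 and 4, 5 and 9, 9 and 6, and 8 and 10.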

Page 32: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Minimum Inversion Distance

1 2 3 4 5 6 7 8 9 10

1 2 3 –8 –7 –6 –5 –4 9 10

1 8 –3 –2 –7 –6 –5 –4 9 10

1 8 –3 7 2 –6 –5 –4 9 10

• Inversion distance=3
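The three inversions on this slide can be replayed step by step. The sketch below only applies the inversions shown and confirms the final gene order; it does not prove that three is the minimum. The function name and slice indices are hypothetical.

```python
def invert(genome, start, end):
    """Reverse genome[start:end] and flip the sign of each gene in it."""
    segment = [-gene for gene in reversed(genome[start:end])]
    return genome[:start] + segment + genome[end:]

reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
step1 = invert(reference, 3, 8)  # invert genes 4..8
step2 = invert(step1, 1, 4)      # invert 2, 3, -8
step3 = invert(step2, 3, 5)      # invert -2, -7

for step in (step1, step2, step3):
    print(step)
```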

Page 33: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Distance-based methods

Tandy Warnow, UT-Austin

Page 34: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Maximum Parsimony on Rearranged Genomes (MPRG)

• The leaves are rearranged genomes.

• Find the tree that minimizes the total number of rearrangement events

[Figure: tree with nodes A, B, C, D, E and F and branch lengths 3, 6, 2, 3 and 4]

Total length = 18

Tandy Warnow, UT-Austin

Page 35: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Mitochondrial genome rearrangement maximum parsimony

Fritzsch et al. (J.Theor. Biol., 2006)

Data choice and analytical methods are in their infancy

Note non-monophyly of Nematoda and Mollusca; Well resolved sequence and morphology clades

?

Page 36: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

An additional possibility is that there are multiple signals: 1. Biases in the data (e.g. compositional heterogeneity), 2. genes have different histories (e.g. lineage sorting or hybridization)

If a gene has a long coalescent time, then its relationships among taxa may differ from the species tree

[Figure: gene tree and species tree for taxa A, B, C and D]

Page 37: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Molecular dating

e.g. Zuckerkandl and Pauling (J. Theor. Biol., 1965)

The molecular clock

[Figure: genetic change plotted against time since divergence]

[Figure: genetic divergence plotted against time since divergence, observed and corrected for saturation, for Human – Chimpanzee, Human – Mouse and Human – Bird]

Page 38: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Is the data clock-like?

Can the deviation from an ultrametric tree be explained by the stochastic nature of substitution (sampling error), or do substitution rates differ across the tree?

Page 39: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Relative rates tests

H0: Two sister taxa are evolving at the same rate (by comparison with an outgroup)

Hebsgaard et al. (TIM, 2005)

Page 40: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Molecular clock likelihood ratio test

H0: That a clock model explains the data as well as a non-clock model

1. Optimise the likelihood of the (unrooted) tree under a non-clock model (lnLn)

2. Optimise the likelihood of the (rooted) tree under a clock model (lnLc)

3. Calculate the test statistic = 2(lnLn − lnLc), which is non-negative because the non-clock model fits at least as well as the clock model

4. This is compared to a χ² distribution critical value (where the degrees of freedom are the difference in the number of free parameters being estimated between the two models = n − 2)
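The four steps can be sketched as below. The log-likelihoods are hypothetical values, the statistic is taken as 2(lnLn - lnLc) so that it is non-negative, and the χ² tail probability uses a closed-form series for integer degrees of freedom, written out here to avoid a third-party dependency. Function names are assumptions for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total

def clock_lrt(lnl_nonclock, lnl_clock, n_taxa):
    """Likelihood ratio test of a clock model against a non-clock model."""
    statistic = 2 * (lnl_nonclock - lnl_clock)
    df = n_taxa - 2
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical log-likelihoods for a 6-taxon data set.
statistic, df, p_value = clock_lrt(-1000.0, -1004.744, n_taxa=6)
print(f"statistic = {statistic:.3f}, df = {df}, p = {p_value:.3f}")
```

A small p-value rejects the clock: the extra branch-length parameters of the non-clock model improve the fit more than sampling error would explain.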

Page 41: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Linearized trees: Takezaki et al. (MBE, 1995)

Prune the taxa that are the most non-clock-like until the molecular clock likelihood ratio test is passed

Concerns: 1. removing any branches reduces the power of the test (so increases the probability of passing) and 2. remaining branches may hide complementary rate shifts that cancel out

Page 42: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Relaxing the molecular clock

1. Local clocks

[Figure: tree with rates r1, r2 and r3 assigned to groups of branches]

Relies on the identification of rate classes with respect to clades

2. Autocorrelated rate evolution

[Figure: tree with a separate rate, r1 to r10, on each branch]

Each rate ri is a function of the rate of its parent branch. Many different models of rate change have been applied including: quadratic, lognormal, exponential, gamma, Ornstein-Uhlenbeck

Page 43: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

3. Uncorrelated rate evolution

[Figure: tree with a separate rate, r1 to r10, on each branch]

Method of Drummond et al. (PLoS Biol., 2006)

Rates ri do not depend on the rate of their parent branch, but are drawn from a lognormal or exponential distribution that maximises the posterior probability of the tree

Page 44: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Performance of correlated rates methods on trees simulated under uncorrelated rates among branches

Page 45: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Calibrating molecular clocks

Biogeographical divergences

e.g. New Zealand split from Gondwana about 80 million years ago and so did some of New Zealand’s endemic fauna

Fossils that post-date divergences

[Figure: tree of Penguins, Albatross and Ducks showing a 61 Ma calibration and a 90 Ma estimate]

Slack et al. (MBE, 2006)

Page 46: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

[Figure: calibration types drawn along a time axis: point calibration; calibration bounds (upper and lower); flat prior; normal prior]

Page 47: Lecture 7 Difficult problems….and solutions Platypus (Ornithorhynchus anatinus)

Using a lognormal (19Ma-25Ma upper 95%, mean=21Ma) calibration for cats/hyaenas

Barnett et al. (Curr. Biol., 2005)

[Figure: dated tree with a time axis running from 25 to 0 millions of years ago]