Molecular Phylogenetics

Dan Graur

description

Molecular Phylogenetics. Dan Graur. Objectives of molecular phylogenetics: reconstruct the correct evolutionary relationships among biological entities; estimate the time of divergence between biological entities; chronicle the sequence of events along evolutionary lineages.

Transcript of Molecular Phylogenetics

Page 1: Molecular Phylogenetics

1

Dan Graur

Molecular Phylogenetics

Page 2: Molecular Phylogenetics

2

Objectives of molecular phylogenetics

• Reconstruct the correct evolutionary relationships among biological entities

• Estimate the time of divergence between biological entities

• Chronicle the sequence of events along evolutionary lineages

Page 3: Molecular Phylogenetics

3

Evolutionary relationships are illustrated by means of a phylogenetic tree or a dendrogram.

Page 4: Molecular Phylogenetics

4

Ernst Heinrich Haeckel 1834-1919

Page 5: Molecular Phylogenetics

5

July 1837

July 2007

Page 6: Molecular Phylogenetics

6

November 1859

Page 7: Molecular Phylogenetics

7

The routes of inheritance represent the passage of genes from parents to offspring, and the branching pattern depicts a gene tree.

Page 8: Molecular Phylogenetics

8

Different genes, however, may have different evolutionary histories, i.e., different routes of inheritance.

Page 9: Molecular Phylogenetics

9

The routes of inheritance are confined by reproductive barriers, i.e., gene flow occurs only within a species. A species tree is a representation of splitting of species lineages.

Page 10: Molecular Phylogenetics

10

Terminology

Page 11: Molecular Phylogenetics

11

A phylogenetic tree or dendrogram is a graph composed of nodes and branches, in which only one branch connects any two adjacent nodes.

Page 12: Molecular Phylogenetics

12

[Figure: tree with labeled branches. Branch types: internal; external or peripheral]

Page 13: Molecular Phylogenetics

13

Page 14: Molecular Phylogenetics

14

Assumptions:

Bifurcation = Real speciation event

Multifurcation = Lack of resolution

Page 15: Molecular Phylogenetics

15

Binary tree

Page 16: Molecular Phylogenetics

16

Rooted and unrooted trees

Page 17: Molecular Phylogenetics

17

How many unrooted topologies are here?

[Figure: four trees, numbered 1 to 4, each with external nodes a, b, c, d, and e]

Page 18: Molecular Phylogenetics

18

In an unrooted tree with four external nodes, the internal branch is referred to as the central branch.

Page 19: Molecular Phylogenetics

19

Cladograms & Phylograms (collectively Dendrograms)

[Figure: the same seven taxa (Bacterium 1 to 3, Eukaryote 1 to 4) drawn as a cladogram and as a phylogram]

Phylograms show branching order and branch lengths.

Cladograms show branching order only; branch lengths are meaningless.

Page 20: Molecular Phylogenetics

20

Unscaled phylogram

Scaled phylogram

Page 21: Molecular Phylogenetics

21

Page 22: Molecular Phylogenetics
Page 23: Molecular Phylogenetics

23

The Newick format

In computer programs, trees are represented in a linear form by a string of nested parentheses enclosing taxon names (and possibly also branch lengths and bootstrap values), separated by commas. This type of representation is called the Newick format. The originator of this format in mathematics was Arthur Cayley.

Page 24: Molecular Phylogenetics

24

The Newick format

The Newick format for phylogenetic trees was adopted on June 26, 1986, at an informal meeting at Newick's Lobster House in Dover, New Hampshire. The Newick format currently serves as the de facto standard for representing phylogenetic trees and is employed by almost all phylogenetic software tools. Unfortunately, it has never been described in a formal publication; it was first mentioned in a publication in 1992.

Page 25: Molecular Phylogenetics

25

The Newick format

In the Newick format, the pattern of the parentheses indicates the topology of the tree by having each pair of parentheses enclose all members of a monophyletic group. A phylogenetic tree in the Newick format always ends in a semicolon (;).

Page 26: Molecular Phylogenetics

26

The Newick format 

One can use the Newick format to write down rooted trees, unrooted trees, multifurcations, branch lengths, and bootstrap values.
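The features listed above can be illustrated with a small reader for Newick strings. This is a minimal sketch, not a full implementation of the format: the function names `parse_newick` and `leaves` are hypothetical, whitespace is simply discarded, and quoted labels and comments are not handled. A label that follows a closing parenthesis is stored as the node name, which is where bootstrap values are commonly written.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with keys name, length, children."""
    text = "".join(text.split())          # assumption: labels contain no whitespace
    pos = 0

    def peek():
        # Current character, or "" when the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def read_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:;":
            pos += 1
        return text[start:pos]

    def read_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if peek() == "(":
            pos += 1                      # skip "("
            node["children"].append(read_node())
            while peek() == ",":
                pos += 1                  # skip ","
                node["children"].append(read_node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # skip ")"
        node["name"] = read_label()       # taxon name, or a support value on internal nodes
        if peek() == ":":
            pos += 1                      # skip ":"
            node["length"] = float(read_label())
        return node

    root = read_node()
    if peek() != ";":
        raise ValueError("a Newick tree must end in a semicolon")
    return root


def leaves(node):
    """Return the taxon names below a node, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]


rooted = parse_newick("((a,b),(c,d));")              # two branches at the base
multifurcating = parse_newick("(a,b,(c,d));")        # three branches at the base
annotated = parse_newick("((a:0.1,b:0.2)95:0.05,c:0.3);")  # lengths and a support value
```

A tree written with three branches at its base, like the second example, is often read as an unrooted tree; that reading is a convention of the software consuming the string, not something the parser can decide.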

Page 27: Molecular Phylogenetics

27

3 OTUs

1 unrooted tree = 3 rooted trees

Page 28: Molecular Phylogenetics

28

4 OTUs

3 unrooted trees = 15 rooted trees
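A short counting argument connects these numbers; it is a standard result added here as a worked step, not taken from the slides. An unrooted bifurcating tree with n OTUs has 2n - 3 branches, and a root can be placed on any one of them, so each unrooted tree corresponds to 2n - 3 rooted trees. Writing N_R(n) and N_U(n) for the numbers of rooted and unrooted trees:

$$N_R(n) = (2n-3)\,N_U(n), \qquad N_R(3) = 3 \times 1 = 3, \qquad N_R(4) = 5 \times 3 = 15$$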

Page 29: Molecular Phylogenetics

29

The number of possible bifurcating rooted trees ($N_R$) for $n \geq 2$ OTUs

$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

The number of possible bifurcating unrooted trees ($N_U$) for $n \geq 3$ OTUs

$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

Page 30: Molecular Phylogenetics

30

Number of OTUs    Number of possible rooted trees

 2                1
 3                3
 4                15
 5                105
 6                945
 7                10,395
 8                135,135
 9                2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375
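The entries in this table can be recomputed directly from the two formulas for bifurcating trees. The sketch below is only an illustration; the function names `n_rooted` and `n_unrooted` are hypothetical, and Python's arbitrary-precision integers keep the large values exact.

```python
from math import factorial


def n_rooted(n):
    """Number of bifurcating rooted trees for n >= 2 OTUs: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("rooted trees need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of bifurcating unrooted trees for n >= 3 OTUs: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("unrooted trees need at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20):
    print(f"{n:>2}  {n_rooted(n):,}")
```

The number of unrooted trees for n OTUs equals the number of rooted trees for n - 1 OTUs, which is a quick consistency check on both functions.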

Page 31: Molecular Phylogenetics

31

Evolution is an historical process.

Only one historical narrative is true.

From 8,200,794,532,637,891,559,375 possibilities, 1 possibility is true and 8,200,794,532,637,891,559,374 are false.

Truth is one, falsehoods are many.

Page 32: Molecular Phylogenetics

32

How do we know which of the 8,200,794,532,637,891,559,375 trees is true?

Page 33: Molecular Phylogenetics

33

We don’t. We infer by using decision criteria.

Page 34: Molecular Phylogenetics

34

True and inferred trees

The sequence of speciation events that has led to the formation of a group of OTUs is historically unique. A tree representing the true evolutionary history is called the true tree.

A tree that is obtained by using a certain set of data and a certain method of tree reconstruction is called an inferred tree.

An inferred tree may or may NOT be the true tree.

Page 35: Molecular Phylogenetics

35

ancestor

descendant 1 descendant 2

Cladogenesis = the splitting of an evolutionary lineage into two genetically independent lineages.

Page 36: Molecular Phylogenetics

36

ancestor

descendant 1 descendant 2

Anagenesis = changes occurring along an evolutionary lineage.

Page 37: Molecular Phylogenetics

37

In molecular phylogenetics, we assume that species are only created by cladogenesis.

Page 38: Molecular Phylogenetics

38

A gene tree may differ from a species tree

Page 39: Molecular Phylogenetics

39

Gene trees and species trees

It is often assumed that gene trees always equal species trees. This may not be true.

[Figure: a gene tree and a species tree drawn side by side]

Page 40: Molecular Phylogenetics

40

Orthologs and paralogs

[Figure labels: a, A*, b*, c, B, C*]

Ancestral gene

Duplication yields 2 copies (paralogs) on the same genome

orthologous

paralogous

A mixture of orthologs and paralogs is sampled: A*, C*, b*

Page 41: Molecular Phylogenetics

41

Page 42: Molecular Phylogenetics

42

A taxon is a species or a group of species that has been given a name, e.g., Homo sapiens (modern humans), or Lepidoptera (butterflies), or herbs.

There are codes of biological nomenclature which seek to ensure that every taxon has a single and stable name, and that every name is used for only one taxon.

Taxon (singular); Taxa (plural)

Page 43: Molecular Phylogenetics

43

• Strictly: A clade is a group of all the taxa that have been derived from a common ancestor plus the common ancestor itself.

• In molecular phylogenetics: A clade is a group of taxa under study that share a common ancestor, which is not shared by any other species outside the group.

Clades*

*also: monophyletic groups, natural clades
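The molecular-phylogenetics definition of a clade can be turned into a simple test on a rooted tree: a group is monophyletic when some node has exactly that group as its set of descendants. The sketch below assumes trees are written as nested tuples of taxon names; the function names and the four-taxon example tree are illustrative assumptions, not taken from the slides.

```python
def leaf_set(tree):
    """Return the frozenset of taxon names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Collect the taxon set below every node (tips included) of a rooted tree."""
    found = {leaf_set(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= all_clades(child)
    return found


def is_monophyletic(tree, group):
    """True if the group is exactly the set of taxa descending from one node."""
    return frozenset(group) in all_clades(tree)


# Illustrative rooted tree: birds and crocodiles are sister taxa.
tree = ((("birds", "crocodiles"), "lizards"), "mammals")

print(is_monophyletic(tree, {"birds", "crocodiles"}))     # True
print(is_monophyletic(tree, {"crocodiles", "lizards"}))   # False: birds are left out
```

On this example tree, a group containing crocodiles and lizards but not birds fails the test, which is the sense in which a group can share its common ancestor with a taxon outside it.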

Page 44: Molecular Phylogenetics

44

• A taxon whose common ancestor is shared by any other taxon is called a paraphyletic taxon or an invalid taxon.

Paraphyletic Taxa

Reptiles are paraphyletic.


Page 45: Molecular Phylogenetics

45

• A named taxon that lacks phylogenetic validity, but is nonetheless used, is called a convenience taxon.

“a convenience fish”

Fish (Pisces)

Page 46: Molecular Phylogenetics

46

• If a clade is composed of two taxa, these are referred to as sister taxa.

Sister Taxa

Birds and crocodiles are sister taxa.

Page 47: Molecular Phylogenetics

47

Phenotypic distance

= clades

Page 48: Molecular Phylogenetics

48

Which of the following groups are not monophyletic?

E. coli rat mouse baboon chimp human

a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse

Page 49: Molecular Phylogenetics

49

Which of the following groups are not monophyletic?

E. coli rat mouse baboon chimp human

a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse

Page 50: Molecular Phylogenetics

50

Page 51: Molecular Phylogenetics

51

A character provides information about an individual OTU.

A distance represents a quantitative statement concerning the dissimilarity between two OTUs.

Page 52: Molecular Phylogenetics

52

A character is a well-defined feature that in a taxonomic unit can assume one out of two or more mutually exclusive character states.

Mutually exclusive: If David is tall, David cannot be short.

Page 53: Molecular Phylogenetics

53

Page 54: Molecular Phylogenetics

54

Page 55: Molecular Phylogenetics

55

[Figure: classification of characters. Labels: Character; Continuous, Discrete; Binary, Multistate; Unordered, Ordered; Unpolar, Polar (shown twice)]

Page 56: Molecular Phylogenetics

56

A character is unordered if a change from one character state to any other character state can occur in one step.

Page 57: Molecular Phylogenetics

57

A character is ordered if there exists a unique symmetrical path of change from one character state to another.

Page 58: Molecular Phylogenetics

58

Polar

A character is polar if there exists a unique asymmetrical (irreversible) path of change from one character state to another.
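The three definitions (unordered, ordered, polar) differ only in how many steps a change between two states costs, so each can be written as a table of step counts. The sketch below rests on assumptions that go beyond the slides: ordered states are taken to lie on a single linear path, so the cost is the number of positions between them, and polar changes are allowed in one direction only, with the reverse direction given an infinite cost. All function names are hypothetical.

```python
import math


def unordered_steps(states):
    """Any state can change into any other state in one step."""
    return {a: {b: 0 if a == b else 1 for b in states} for a in states}


def ordered_steps(states):
    """States lie on a linear path; the cost is symmetrical and counts positions apart."""
    return {a: {b: abs(i - j) for j, b in enumerate(states)}
            for i, a in enumerate(states)}


def polar_steps(states):
    """Same path, but changes run only from earlier to later states (irreversible)."""
    return {a: {b: (j - i if j >= i else math.inf) for j, b in enumerate(states)}
            for i, a in enumerate(states)}


nucleotides = unordered_steps("ACGT")          # e.g. a nucleotide site
ordered = ordered_steps(["0", "1", "2"])       # e.g. a hypothetical three-state character
polar = polar_steps(["0", "1", "2"])

print(nucleotides["A"]["T"])              # 1
print(ordered["0"]["2"], ordered["2"]["0"])  # 2 2
print(polar["0"]["2"], polar["2"]["0"])      # 2 inf
```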

Page 59: Molecular Phylogenetics

59

In partially ordered characters the number of steps varies for the different pairwise combinations of character states, but no definite relationship exists between the number of steps and the character-state.

Amino-acid sites are partially ordered characters. An amino acid cannot change into every other amino acid in a single step; sometimes 2 or 3 steps are required. For example, a tyrosine can change into a leucine only through an intermediate state, i.e., phenylalanine or histidine.

Page 60: Molecular Phylogenetics

60

The number of steps in partially ordered characters is specified by a step matrix, the elements of which indicate the number of steps required between any two character states.
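For amino acids, one way to fill in such a step matrix is to take, for each pair, the smallest number of nucleotide differences between any of their codons. The sketch below does this with the standard genetic code; the minimum-codon-difference rule and the function name `min_steps` are assumptions made for illustration, and the slides may have used a different matrix.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks the stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of its codons.
CODONS = {}
for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(amino_acid, []).append("".join(codon))


def min_steps(aa1, aa2):
    """Smallest number of nucleotide differences between codons of two amino acids."""
    return min(
        sum(x != y for x, y in zip(codon1, codon2))
        for codon1 in CODONS[aa1]
        for codon2 in CODONS[aa2]
    )


# One row of the step matrix: steps from tyrosine (Y) to a few other amino acids.
for target in "FHLM":
    print("Y ->", target, min_steps("Y", target))
```

Under this rule tyrosine is one step from phenylalanine (F) and from histidine (H), each of which is one step from leucine (L), while tyrosine to leucine takes two steps and tyrosine to methionine (M) takes three.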

Page 61: Molecular Phylogenetics

61

Page 62: Molecular Phylogenetics

62

Assumptions about character evolution

Methods of phylogenetic reconstruction require that we make explicit assumptions about:

(1) the number of discrete steps required for one character state to change into another.

(2) the probability with which such a change may occur.

Page 63: Molecular Phylogenetics

63

Temporal Polarity of Character States

Character states may be ranked by relative antiquity into:

(1) primitive or ancestral (plesiomorphy)

(2) derived or novel (apomorphy)

Page 64: Molecular Phylogenetics

64

Taxonomic Distribution of Character States

A primitive state that is shared by several taxa is a symplesiomorphy.

A derived state that is shared by several taxa is a synapomorphy.

A derived character state unique to a particular taxon is an autapomorphy.

A character state that is shared by several taxa due to convergence, parallelism and reversals, rather than due to common descent, is a homoplasy.

sympathy, synapse, syllable, system

Page 65: Molecular Phylogenetics

65

[Figure: tree with character states A, B, C, and D at its nodes, labeled with: plesiomorphy, apomorphy (autapomorphy), synapomorphy, symplesiomorphy, homoplasy]

Page 66: Molecular Phylogenetics

66

Page 67: Molecular Phylogenetics

67

Distance Data

Page 68: Molecular Phylogenetics

68

Page 69: Molecular Phylogenetics

69

Most molecular data yield character states that are subsequently converted into distances.
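As an illustration of this conversion, the sketch below turns aligned sequences (character states) into pairwise distances. It computes the proportion of differing sites and, optionally, the Jukes-Cantor correction; the choice of these two measures, the function names, and the toy alignment are assumptions made for the example and are not specified on the slide.

```python
import math
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) ln(1 - 4p/3), defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping OTU name to aligned sequence."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment of three OTUs, ten sites each.
alignment = {
    "otu1": "ACGTACGTAC",
    "otu2": "ACGTACGTAT",
    "otu3": "ACGAACGTTT",
}

for pair, p in distance_matrix(alignment).items():
    print(pair, p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the observed proportion, because it allows for multiple changes at the same site.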

Page 70: Molecular Phylogenetics

70

Some molecular data can only be expressed as distances.

Page 71: Molecular Phylogenetics

71

Page 72: Molecular Phylogenetics

72

Page 73: Molecular Phylogenetics

73

Page 74: Molecular Phylogenetics

74
