Molecular Phylogenetics
Transcript of Molecular Phylogenetics
1
Dan Graur
Molecular Phylogenetics
2
Objectives of molecular phylogenetics
• Reconstruct the correct evolutionary relationships among biological entities
• Estimate the time of divergence between biological entities
• Chronicle the sequence of events along evolutionary lineages
3
Evolutionary relationships are illustrated by means of a phylogenetic tree or a dendrogram.
4
Ernst Heinrich Haeckel 1834-1919
5
July 1837
July 2007
6
November 1859
7
The routes of inheritance represent the passage of genes from parents to offspring, and the branching pattern depicts a gene tree.
8
Different genes, however, may have different evolutionary histories, i.e., different routes of inheritance.
9
The routes of inheritance are confined by reproductive barriers, i.e., gene flow occurs only within a species. A species tree is a representation of splitting of species lineages.
10
Terminology
11
A phylogenetic tree or dendrogram is a graph composed of nodes and branches, in which only one branch connects any two adjacent nodes.
12
Internal
External or Peripheral
Branch
13
14
Assumptions:
Bifurcation = Real speciation event
Multifurcation = Lack of resolution
15
Binary tree
16
Rooted and unrooted trees
17
How many unrooted topologies are here?
[Figure: four tree drawings, numbered 1 to 4, each connecting the external nodes a, b, c, d, and e]
18
In an unrooted tree with four external nodes, the internal branch is referred to as the central branch.
19
Cladograms & Phylograms (collectively Dendrograms)
[Figure: the same seven taxa (Bacterium 1 to 3, Eukaryote 1 to 4) drawn twice, once as a cladogram and once as a phylogram]
Phylograms show branch order and branch lengths
Cladograms show branching order - branch lengths are meaningless
20
Unscaled phylogram
Scaled phylogram
21
23
The Newick format
In computer programs, trees are represented in a linear form by a string of nested parentheses, enclosing taxon names (and possibly also branch lengths and bootstrap values), and separated by commas. This type of representation is called the Newick format. The originator of this format in mathematics was Arthur Cayley.
24
The Newick format
The Newick format for phylogenetic trees was adopted on June 26, 1986 at an informal meeting at Newick's Lobster House in Dover, New Hampshire. The Newick format currently serves as the de facto standard for representing phylogenetic trees and is employed by almost all phylogenetic software tools. Unfortunately, it has never been described in a formal publication; the first time it is mentioned in a publication is in 1992.
25
The Newick format
In the Newick format, the pattern of the parentheses indicates the topology of the tree by having each pair of parentheses enclose all members of a monophyletic group. A phylogenetic tree in the Newick format always ends in a semicolon (;).
26
The Newick format
One can use the Newick format to write down rooted trees, unrooted trees, multifurcations, branch lengths, and bootstrap values.
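As a minimal sketch of how such a string maps onto a tree, the Python parser below reads a Newick string into nested dictionaries. It assumes unquoted labels, no whitespace, and no comments, so it covers only part of what real software accepts. The names `parse_newick` and `leaf_names`, and the example tree with its branch lengths, are invented for illustration.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    Assumes unquoted labels, no whitespace, and no comments.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end in a semicolon")
    pos = 0

    def read_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(read_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # skip ")"
        # Label: a taxon name, or a support value on an internal node.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1  # skip ":"
            start = pos
            while text[pos] not in ",():;":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = read_node()
    if pos != len(text) - 1:
        raise ValueError("unexpected text after the tree")
    return root


def leaf_names(node):
    """Return the taxon names below a node, in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in leaf_names(child)]


# Hypothetical example tree with made-up branch lengths.
tree = parse_newick("((human:0.1,chimp:0.1):0.3,(rat:0.2,mouse:0.2):0.4);")
print(leaf_names(tree))
```

A multifurcation is simply a pair of parentheses enclosing more than two items, for example `(a,b,c);`, which this sketch parses into a node with three children.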
27
3 OTUs
1 unrooted tree = 3 rooted trees
28
4 OTUs
3 unrooted trees = 15 rooted trees
29
The number of possible bifurcating rooted trees ($N_R$) for $n \ge 2$ OTUs:
$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$
The number of possible bifurcating unrooted trees ($N_U$) for $n \ge 3$ OTUs:
$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
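The two formulas above can be evaluated directly with exact integer arithmetic. The following Python sketch does so; the function names `n_rooted` and `n_unrooted` are illustrative choices.

```python
from math import factorial


def n_rooted(n):
    """Number of bifurcating rooted trees for n >= 2 OTUs."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted(n):
    """Number of bifurcating unrooted trees for n >= 3 OTUs."""
    if n < 3:
        raise ValueError("n must be at least 3")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10, 20):
    print(n, n_unrooted(n), n_rooted(n))
```

The two counts are linked: an unrooted bifurcating tree with n OTUs has 2n - 3 branches, and placing the root on any one of them gives a distinct rooted tree, so N_R = (2n - 3) N_U. This is why 1 unrooted tree corresponds to 3 rooted trees for 3 OTUs, and 3 unrooted trees to 15 rooted trees for 4 OTUs.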
30
Number of OTUs    Number of possible rooted trees
2                 1
3                 3
4                 15
5                 105
6                 945
7                 10,395
8                 135,135
9                 2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375
31
Evolution is an historical process.
Only one historical narrative is true.
From 8,200,794,532,637,891,559,375 possibilities, 1 possibility is true and 8,200,794,532,637,891,559,374 are false.
Truth is one, falsehoods are many.
32
How do we know which of the 8,200,794,532,637,891,559,375 trees is true?
33
We don't. We infer by using decision criteria.
34
True and inferred trees
The sequence of speciation events that has led to the formation of a group of OTUs is historically unique. A tree representing the true evolutionary history is called the true tree.
A tree that is obtained by using a certain set of data and a certain method of tree reconstruction is called an inferred tree.
An inferred tree may or may NOT be the true tree.
35
ancestor
descendant 1 descendant 2
Cladogenesis = the splitting of an evolutionary lineage into two genetically independent lineages.
36
ancestor
descendant 1 descendant 2
Anagenesis = changes occurring along an evolutionary lineage.
37
In molecular phylogenetics, we assume that species are only created by cladogenesis.
38
A gene tree may differ from a species tree
39
Gene trees and species trees
It is often assumed that gene trees always equal species trees. This may not be true.
[Figure: a gene tree (tips a, b, c) shown alongside a species tree (tips A, B, D)]
40
Orthologs and paralogs
Ancestral gene
Duplication yields 2 copies (paralogs) in the same genome
[Figure labels: gene copies a, b, c and A, B, C; relationships marked as orthologous or paralogous; sampled copies A*, b*, C*]
A mixture of orthologs and paralogs is sampled
41
42
A taxon is a species or a group of species that has been given a name, e.g., Homo sapiens (modern humans), or Lepidoptera (butterflies and moths), or herbs.
There are codes of biological nomenclature which seek to ensure that every taxon has a single and stable name, and that every name is used for only one taxon.
Taxon (singular); Taxa (plural)
43
• Strictly: A clade is a group of all the taxa that have been derived from a common ancestor plus the common ancestor itself.
• In molecular phylogenetics: A clade is a group of taxa under study that share a common ancestor, which is not shared by any other species outside the group.
Clades*
*also: monophyletic groups, natural clades
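The molecular-phylogenetics definition can be checked mechanically on a rooted tree: a group is a clade if some node has exactly that group as its set of tips. The Python sketch below assumes a tree written as nested tuples. The topology used is an assumed example chosen for illustration, not a result taken from this text, and `leaves` and `is_monophyletic` are illustrative names.

```python
def leaves(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its tips."""
    group = set(group)
    tips = leaves(tree)
    if tips == group:
        return True
    # A tip cannot be split further, and a subtree that lacks some
    # member of the group cannot contain the node we are looking for.
    if isinstance(tree, str) or not group <= tips:
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Assumed rooted topology, for illustration only.
example_tree = (
    "E. coli",
    (("rat", "mouse"), ("baboon", ("chimpanzee", "human"))),
)

print(is_monophyletic(example_tree, {"rat", "mouse"}))
print(is_monophyletic(example_tree, {"mouse", "chimpanzee", "baboon"}))
```

On this assumed tree, rat and mouse form a clade, whereas mouse, chimpanzee, and baboon do not: the smallest node containing all three also contains rat and human.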
44
• A taxon whose common ancestor is also shared by one or more taxa outside it is called a paraphyletic taxon or an invalid taxon.
Paraphyletic Taxa
Reptiles are paraphyletic.
45
• A named taxon that lacks phylogenetic validity, but is nonetheless used, is called a convenience taxon.
“a convenience fish”
Fish (Pisces)
46
• If a clade is composed of two taxa, these are referred to as sister taxa.
Sister Taxa
Birds and crocodiles are sister taxa.
47
Phenotypic distance
= clades
48
Which of the following groups are not monophyletic?
E. coli rat mouse baboon chimp human
a. human, chimpanzee, baboon
b. mouse, chimpanzee, baboon
c. rat, mouse
d. human, chimpanzee, baboon, rat, mouse
e. E. coli, human, chimpanzee, baboon, rat, mouse
49
50
51
A character provides information about an individual OTU.
A distance represents a quantitative statement concerning the dissimilarity between two OTUs.
52
A character is a well-defined feature that in a taxonomic unit can assume one out of two or more mutually exclusive character states.
Mutually exclusive: If David is tall, David cannot be short.
53
54
55
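[Diagram: classification of characters. A character is either Continuous or Discrete; a discrete character is either Binary or Multistate. Further labels in the diagram: Unordered, Ordered, Unpolar, Polar.]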
56
A character is unordered if a change from one character state to any other character state can occur in one step.
57
A character is ordered if there exists a unique symmetrical path of change from one character state to another.
58
Polar
A character is polar if there exists a unique asymmetrical (irreversible) path of change from one character state to another.
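One way to make the unordered, ordered, and polar definitions concrete is to write each as a matrix whose entry (i, j) is the number of steps needed to change state i into state j. The Python sketch below assumes the states are numbered 0, 1, 2, ... along the path of change and marks a forbidden change with infinity. The function names are illustrative, and the numbering convention is an assumption made for the example.

```python
import math


def unordered_steps(k):
    """Any state changes into any other state in one step."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]


def ordered_steps(k):
    """States lie on a single path; the cost is symmetrical: |i - j| steps."""
    return [[abs(i - j) for j in range(k)] for i in range(k)]


def polar_steps(k):
    """States lie on a single path that can only be followed in one direction."""
    return [[j - i if j >= i else math.inf for j in range(k)] for i in range(k)]


for row in polar_steps(3):
    print(row)
```

The ordered matrix equals its own transpose (the path is symmetrical), while the polar matrix does not (the path is irreversible).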
59
In partially ordered characters the number of steps varies for the different pairwise combinations of character states, but no definite relationship exists between the number of steps and the character state.
Amino-acid sites are partially ordered characters. An amino acid cannot change into all other amino acids in a single step, as sometimes 2 or 3 steps are required. For example, a tyrosine may only change into a leucine through an intermediate state, i.e., phenylalanine or histidine.
60
The number of steps in partially ordered characters is specified by a step matrix, the elements of which indicate the number of steps required between any two character states.
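For amino acids, such a step matrix can be derived from the genetic code. The Python sketch below assumes the standard genetic code and takes the number of steps between two amino acids to be the smallest number of nucleotide differences between any of their codons. That counting rule is one possible convention, adopted here as an assumption; it does not check whether an intermediate codon is a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def min_steps(aa1, aa2):
    """Fewest nucleotide differences between any codon of aa1 and any of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


amino_acids = sorted(set(AMINO) - {"*"})
step_matrix = {(a, b): min_steps(a, b) for a in amino_acids for b in amino_acids}

# Tyrosine (Y) to leucine (L), and the two routes through F and H.
print(step_matrix["Y", "L"])
print(step_matrix["Y", "F"], step_matrix["F", "L"])
print(step_matrix["Y", "H"], step_matrix["H", "L"])
```

Under this rule tyrosine and leucine are two steps apart, while each leg of the route through phenylalanine or histidine is a single step, in line with the example above.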
61
62
Assumptions about character evolution
Methods of phylogenetic reconstruction require that we make explicit assumptions about:
(1) the number of discrete steps required for one character state to change into another.
(2) the probability with which such a change may occur.
63
Temporal Polarity of Character States
Character states may be ranked by relative antiquity into:
(1) primitive or ancestral (plesiomorphy)
(2) derived or novel (apomorphy)
64
Taxonomic Distribution of Character States
A primitive state that is shared by several taxa is a symplesiomorphy.
A derived state that is shared by several taxa is a synapomorphy.
A derived character state unique to a particular taxon is an autapomorphy.
A character state that is shared by several taxa due to convergence, parallelism and reversals, rather than due to common descent, is a homoplasy.
sympathy, synapse, syllable, system
65
[Figure: tree with character states (A, B, C, D) mapped onto it, labeled: plesiomorphy, apomorphy (autapomorphy), synapomorphy, symplesiomorphy, homoplasy]
66
67
Distance Data
68
69
Most molecular data yield character states that are subsequently converted into distances.
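A simple illustration of such a conversion is the proportion of differing sites between two aligned sequences, often called the p-distance. The Python sketch below computes it for every pair of OTUs. The sequences are hypothetical and invented for illustration; in practice the raw proportion is usually corrected for multiple substitutions at the same site, which this sketch does not attempt.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Skip sites where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical aligned sequences, invented for illustration only.
alignment = {
    "OTU1": "ACGTACGTAC",
    "OTU2": "ACGTACGTTC",
    "OTU3": "ACTTACGATC",
}

names = sorted(alignment)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, p_distance(alignment[a], alignment[b]))
```

Each site is a character with four possible states (A, C, G, T); the pairwise comparison collapses all sites into a single number per pair of OTUs.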
70
Some molecular data can only be expressed as distances.
71
72
73
74