A (short) introduction to...
Transcript of A (short) introduction to...
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
A (short) introduction to phylogenetics
Thibaut Jombart, Marie-Pauline Beugin
MRC Centre for Outbreak Analysis and ModellingImperial College London
Genetic data analysis withPR∼Statistics, Millport Field Station
16 Aug 2016
1/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
2/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
3/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Phylogenetics: from the origins...
C. Darwin, Notebook, 1837.
’From the first growth of the tree, manya limb and branch has decayed anddropped off; and these fallen branches ofvarious sizes may represent those wholeorders, families, and genera which havenow no living representatives, and whichare known to us only in a fossil state.’
4/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Phylogenetics: ...to the present
Bininda-Emonds et al., 2007,Nature.
• phylogenetic trees are part of thestandard toolbox of genetic dataanalysis
• represent the evolutionary historyof a set of (sampled) taxa
5/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
And the main difference is...
Current trees look better!
(and some other minor differences)
6/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
And the main difference is...
Current trees look better!
(and some other minor differences)
6/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
And the main difference is...
Current trees look better!
(and some other minor differences)
6/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
About the minor differences...
• DNA sequencing revolution
• huge data banks freely available(e.g. GenBank)
• easier, cheaper, faster to obtainDNA sequences
• increasing number of full genomesavailable
Different ways to exploit this information.
7/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
About the minor differences...
• DNA sequencing revolution
• huge data banks freely available(e.g. GenBank)
• easier, cheaper, faster to obtainDNA sequences
• increasing number of full genomesavailable
Different ways to exploit this information.
7/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
8/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Genetic changes (substitution) accumulate over time
Substitution: replacement of a nucleotide (e.g. a → t)
...attgtacgtaggatctagg...
...attgaacgtaggatctagg......attgtacgtaccatctagg...
...atcgtacgtaccatctggg...
...attgaacgtagttgctagg...
Tim
e
Substitution patterns reflect the evolutionary history
9/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Genetic changes (substitution) accumulate over time
Substitution: replacement of a nucleotide (e.g. a → t)
...attgaacgtagttgctagg...
...atcgtacgtaccatctggg...
...attgaacgtaggatctagg......attgtacgtaccatctagg...
Tim
e...attgtacgtaggatctagg...
Substitution patterns reflect the evolutionary history
9/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Genetic changes (substitution) accumulate over time
Substitution: replacement of a nucleotide (e.g. a → t)
...attgaacgtagttgctagg...
...atcgtacgtaccatctggg...Tim
e...attgtacgtaggatctagg...
...attgaacgtaggatctagg......attgtacgtaccatctagg...
Substitution patterns reflect the evolutionary history
9/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Genetic changes (substitution) accumulate over time
Substitution: replacement of a nucleotide (e.g. a → t)
Tim
e...attgtacgtaggatctagg...
...attgaacgtaggatctagg......attgtacgtaccatctagg...
...atcgtacgtaccatctggg...
...attgaacgtagttgctagg...
Substitution patterns reflect the evolutionary history
9/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Genetic changes (substitution) accumulate over time
Substitution: replacement of a nucleotide (e.g. a → t)
Tim
e...attgtacgtaggatctagg...
...attgaacgtaggatctagg......attgtacgtaccatctagg...
...atcgtacgtaccatctggg...
...attgaacgtagttgctagg...
Substitution patterns reflect the evolutionary history
9/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Genetic changes (substitution) accumulate over time
Substitution: replacement of a nucleotide (e.g. a → t)
Tim
e...attgtacgtaggatctagg...
...attgaacgtaggatctagg......attgtacgtaccatctagg...
...atcgtacgtaccatctggg...
...attgaacgtagttgctagg...
Substitution patterns reflect the evolutionary history
9/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using substitution patterns to reconstruct the evolutionaryhistory
Reconstructed phylogeny
...attaaacgtaggatctagg...
...attaaacgtaggatctagg...
...attcatacgtaggatcagg...
...attgtacgtaggatctttt...
...attgtacgtaggatctttt...
...attgcatgtaggatctttt...
Phylogenetics aim to reconstruct evolutionary trees (phylogenies)from genetic sequence data.
10/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using substitution patterns to reconstruct the evolutionaryhistory
...attaaacgtaggatctagg...
...attaaacgtaggatctagg...
...attcatacgtaggatcagg...
...attgtacgtaggatctttt...
...attgtacgtaggatctttt...
...attgcatgtaggatctttt... Reconstructed phylogeny
Phylogenetics aim to reconstruct evolutionary trees (phylogenies)from genetic sequence data.
10/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using substitution patterns to reconstruct the evolutionaryhistory
...attaaacgtaggatctagg...
...attaaacgtaggatctagg...
...attcatacgtaggatcagg...
...attgtacgtaggatctttt...
...attgtacgtaggatctttt...
...attgcatgtaggatctttt... Reconstructed phylogeny
Phylogenetics aim to reconstruct evolutionary trees (phylogenies)from genetic sequence data.
10/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary history
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary history
- "taxa"- "tips"- "leaves"- "Operational Taxonomic Units (OTUs)"
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary historyMost Recent Common Ancestors
(MRCA)
- "nodes"- "Hypothetical Taxonomic Units (HTUs)"
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary history
- "edge"- length = amount of evolution (not time, as a rule)- length is optional
branch
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary history
- "patristic" distance: sum of branch lengths- other measures of distance/dissimilarity- vertical axis meaningless
distances between tips
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary history
Root
- oldest part of the tree- defined using an outgroup- optional
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Using trees to represent the evolutionary history
Root
- oldest part of the tree- defined using an outgroup- optional
How to we build them?
11/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
WorkflowPrepare data
• align sequences: alignment software + manual refinement
Build the tree
• distance-based methods
• maximum parsimony
• likelihood-based methods (ML, Bayesian)
Analyse the tree
• assess uncertainty
• test phylogenetic signal
• model trait evolution
• ... 12/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
WorkflowPrepare data
• align sequences: alignment software + manual refinement
Build the tree
• distance-based methods
• maximum parsimony
• likelihood-based methods (ML, Bayesian)
Analyse the tree
• assess uncertainty
• test phylogenetic signal
• model trait evolution
• ... 12/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
WorkflowPrepare data
• align sequences: alignment software + manual refinement
Build the tree
• distance-based methods
• maximum parsimony
• likelihood-based methods (ML, Bayesian)
Analyse the tree
• assess uncertainty
• test phylogenetic signal
• model trait evolution
• ... 12/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
WorkflowPrepare data
• align sequences: alignment software + manual refinement
Build the tree
• distance-based methods
• maximum parsimony
• likelihood-based methods (ML, Bayesian)
Analyse the tree
• assess uncertainty
• test phylogenetic signal
• model trait evolution
• ... 12/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
13/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
Approaches relying on agglomerative clustering algorithms(e.g. Single linkage, UPGMA, Neighbor-Joining)
Rationale
1. compute pairwise genetic distances D
2. group closest sequences
3. update D
4. go back to 2) until all sequences are grouped
14/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
15/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
15/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
15/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
15/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
15/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
15/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
What is the distance between a node and tips?
Hierarchical clustering:
k
i jDi,jg
D =...k,g
• single linkage: Dk,g = min(Dk,i, Dk,j)
• complete linkage: Dk,g = max(Dk,i, Dk,j)
• UPGMA: Dk,g =Dk,i+Dk,j
2
Neighbor joining:Transforms original distances to account for heterogeneous rates ofevolution.
16/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
What is the distance between a node and tips?
Hierarchical clustering:
k
i jDi,jg
D =...k,g
• single linkage: Dk,g = min(Dk,i, Dk,j)
• complete linkage: Dk,g = max(Dk,i, Dk,j)
• UPGMA: Dk,g =Dk,i+Dk,j
2
Neighbor joining:Transforms original distances to account for heterogeneous rates ofevolution.
16/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
What is the distance between a node and tips?
Hierarchical clustering:
k
i jDi,jg
D =...k,g
• single linkage: Dk,g = min(Dk,i, Dk,j)
• complete linkage: Dk,g = max(Dk,i, Dk,j)
• UPGMA: Dk,g =Dk,i+Dk,j
2
Neighbor joining:Transforms original distances to account for heterogeneous rates ofevolution.
16/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
What is the distance between a node and tips?
Hierarchical clustering:
k
i jDi,jg
D =...k,g
• single linkage: Dk,g = min(Dk,i, Dk,j)
• complete linkage: Dk,g = max(Dk,i, Dk,j)
• UPGMA: Dk,g =Dk,i+Dk,j
2
Neighbor joining:Transforms original distances to account for heterogeneous rates ofevolution.
16/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
What is the distance between a node and tips?
Hierarchical clustering:
k
i jDi,jg
D =...k,g
• single linkage: Dk,g = min(Dk,i, Dk,j)
• complete linkage: Dk,g = max(Dk,i, Dk,j)
• UPGMA: Dk,g =Dk,i+Dk,j
2
Neighbor joining:Transforms original distances to account for heterogeneous rates ofevolution.
16/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
Advantages
• simple
• flexible (many distances and clustering algorithms)
• fast and scalable (applicable to large datasets)
Limitations
• sensitive to distance/clustering chosen
• evolutionary rates are not estimated
• no measure of uncertainty for the tree obtained
17/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Distance-based phylogenetic reconstruction
Advantages
• simple
• flexible (many distances and clustering algorithms)
• fast and scalable (applicable to large datasets)
Limitations
• sensitive to distance/clustering chosen
• evolutionary rates are not estimated
• no measure of uncertainty for the tree obtained
17/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
18/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
Approaches relying on finding the tree with the smallest numberof character changes (substitutions)
Rationale
1. start from a pre-defined tree
2. compute initial parsimony score
3. permute branches and compute parsimony score
4. accept new tree if the parsimony score is improved
5. go back to 3) until convergence
19/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
score: 8
score: 5
score: 6
Initial tree
20/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
score: 5
score: 6
Initial tree
score: 8
20/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
score: 5
score: 6
Initial tree
score: 8
20/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
score: 6
Initial tree
score: 5
score: 8
20/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
score: 6
Initial tree
score: 5
score: 8
20/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
Advantages
• applicable to any discontinuous characters (not just DNA)
• intuitive explanation: ‘simplest’ evolutionary scenario
Limitations
• evolutionary rates are not estimated
• no measure of uncertainty for the tree obtained
• computer-intensive
• different types of substitutions ignored
• evolution not necessarily parsimonious
• sensitive to heterogeneous rates of evolution (long branchattraction)
21/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Maximum parsimony phylogenies
Advantages
• applicable to any discontinuous characters (not just DNA)
• intuitive explanation: ‘simplest’ evolutionary scenario
Limitations
• evolutionary rates are not estimated
• no measure of uncertainty for the tree obtained
• computer-intensive
• different types of substitutions ignored
• evolution not necessarily parsimonious
• sensitive to heterogeneous rates of evolution (long branchattraction)
21/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
22/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Likelihood-based phylogenies (ML / Bayesian)
Approaches relying on a model of sequence evolution:
• ML: find tree and evolutionary rates with highest likelihood
• Bayesian: find tree and evolutionary rates to posteriorprobability
Rationale
1. start from a pre-defined tree
2. compute initial likelihood/posterior
3. permute branches, sample new parameters and computelikelihood/posterior
4. accept new tree and parameters based on likelihood/posteriorimprovement
5. go back to 3) until convergence
23/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Likelihood-based phylogenies (ML / Bayesian)
Approaches relying on a model of sequence evolution:
• ML: find tree and evolutionary rates with highest likelihood
• Bayesian: find tree and evolutionary rates to posteriorprobability
Rationale
1. start from a pre-defined tree
2. compute initial likelihood/posterior
3. permute branches, sample new parameters and computelikelihood/posterior
4. accept new tree and parameters based on likelihood/posteriorimprovement
5. go back to 3) until convergence
23/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Likelihood-based phylogenies (ML / Bayesian)
Den
sity
Likelihood / Posterior
24/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Likelihood-based phylogenies (ML / Bayesian)
Advantages
• very flexible
• consistent with a model of evolution
• statistically consistent (model comparison)
• parameter estimation
• (Bayesian) several trees → measure of uncertainty
Limitations
• computer-intensive
• choice of the model of evolution
• (ML) no measure of uncertainty for the tree obtained
• (Bayesian) need to find a consensus tree
25/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Likelihood-based phylogenies (ML / Bayesian)
Advantages
• very flexible
• consistent with a model of evolution
• statistically consistent (model comparison)
• parameter estimation
• (Bayesian) several trees → measure of uncertainty
Limitations
• computer-intensive
• choice of the model of evolution
• (ML) no measure of uncertainty for the tree obtained
• (Bayesian) need to find a consensus tree
25/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
26/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
How do we know the tree is robust?
Main issue: assess the uncertainty of the tree topology /individual nodes
Approaches
• ML: model selection to comparetrees (whole tree)
• Bayesian methods:between-samples variability(individual nodes)
• any method:bootstrap (individual nodes)
27/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
How do we know the tree is robust?
Main issue: assess the uncertainty of the tree topology /individual nodes
Approaches
• ML: model selection to comparetrees (whole tree)
• Bayesian methods:between-samples variability(individual nodes)
• any method:bootstrap (individual nodes)
27/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
How do we know the tree is robust?
Main issue: assess the uncertainty of the tree topology /individual nodes
Approaches
• ML: model selection to comparetrees (whole tree)
• Bayesian methods:between-samples variability(individual nodes)
• any method:bootstrap (individual nodes)
27/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
How do we know the tree is robust?
Main issue: assess the uncertainty of the tree topology /individual nodes
Approaches
• ML: model selection to comparetrees (whole tree)
• Bayesian methods:between-samples variability(individual nodes)
• any method:bootstrap (individual nodes)
27/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
How do we know the tree is robust?
Main issue: assess the uncertainty of the tree topology /individual nodes
Approaches
• ML: model selection to comparetrees (whole tree)
• Bayesian methods:between-samples variability(individual nodes)
• any method:bootstrap (individual nodes)
27/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Bootstrapping phylogenies
• assess variability due to sampling the genome andconflicting signals
• relies on analysing resampled datasets
Rationale
1. obtain a reference tree
2. resample the sites with replacement
3. obtain a tree for the resampled dataset
4. go back to 2) until the desired number of bootstrapped treesis attained
5. compute the frequency of each bifurcation of the referencetree occuring in bootstrapped trees
28/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Bootstrapping phylogenies
• assess variability due to sampling the genome andconflicting signals
• relies on analysing resampled datasets
Rationale
1. obtain a reference tree
2. resample the sites with replacement
3. obtain a tree for the resampled dataset
4. go back to 2) until the desired number of bootstrapped treesis attained
5. compute the frequency of each bifurcation of the referencetree occuring in bootstrapped trees
28/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Bootstrapping phylogenies
...ttttaaaccatggatctagg...
...ttttaaaccatggatctagg...
...atttcatacgtaggatcagg...
...aaatgtaccatggatcttgt...
...aaatgtaccatggatcttgt...
...aaatgcatcatggatcttgt...
...attaaacgtaggatctagg...
...attaaacgtaggatctagg...
...attcatacgtaggatcagg...
...attgtacgtaggatctttt...
...attgtacgtaggatctttt...
...attgcatgtaggatctttt... Reconstructed phylogeny
...ttttaaaccatggatctagg...
...ttttaaaccatggatctagg...
...atttcatacgtaggatcagg...
...aaatgtaccatggatcttgt...
...aaatgtaccatggatcttgt...
...aaatgcatcatggatcttgt...Reconstructed phylogeny
Reconstructed phylogenyReconstructed phylogeny
Reconstructed phylogenyReconstructed phylogeny
sampling sites withreplacement
compare topologies
...ttttaaaccatggatctagg...
...ttttaaaccatggatctagg...
...atttcatacgtaggatcagg...
...aaatgtaccatggatcttgt...
...aaatgtaccatggatcttgt...
...aaatgcatcatggatcttgt...
...ttttaaaccatggatctagg...
...ttttaaaccatggatctagg...
...atttcatacgtaggatcagg...
...aaatgtaccatggatcttgt...
...aaatgtaccatggatcttgt...
...aaatgcatcatggatcttgt...
...ttttaaaccatggatctagg...
...ttttaaaccatggatctagg...
...atttcatacgtaggatcagg...
...aaatgtaccatggatcttgt...
...aaatgtaccatggatcttgt...
...aaatgcatcatggatcttgt...
29/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Bootstrapping phylogenies
Advantages
• standard
• simple to implement
Limitations
• possibly computer-intensive
• assumes that the genome has been sampled randomly (oftenwrong)
30/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Bootstrapping phylogenies
Advantages
• standard
• simple to implement
Limitations
• possibly computer-intensive
• assumes that the genome has been sampled randomly (oftenwrong)
30/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
31/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Plotting trees as rooted
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
23
4
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
23
4
5
6
7
89
10
11
12
13
14
15
1
234
5
6
7
8
9
10
11
12
13
14
15
Never plot an unrooted tree as rooted.
32/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Plotting trees as rooted
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
23
4
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
23
4
5
6
7
89
10
11
12
13
14
15
1
234
5
6
7
8
9
10
11
12
13
14
15
Never plot an unrooted tree as rooted.
32/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Interpreting distances
33/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Interpreting distances
33/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Interpreting distances
meaningful distance = sum of branch lengths
meanin
gle
ss
33/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
The paradox of divergent clusters
MRCA and genetic distances may give different information.
34/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
The paradox of divergent clusters
D(red, red)>D(red,blue)
MRCA and genetic distances may give different information.
34/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
The paradox of divergent clusters
D(red, red)>D(red,blue)
MRCA and genetic distances may give different information.
34/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Taking uncertainty into account
193 HIV−1 sequences from DRC(Strimmer & Pybus 2001)
0.2 0.15 0.1 0.05 0
At best, the tree is an estimate of the likely evolutionary history ofthe taxa studied.
35/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Taking uncertainty into account
193 HIV−1 sequences from DRC(Strimmer & Pybus 2001)
0.2 0.15 0.1 0.05 0
Collapsed tree (threshold length 0.01)
0.2 0.15 0.1 0.05 0
At best, the tree is an estimate of the likely evolutionary history ofthe taxa studied.
35/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Taking uncertainty into account
193 HIV−1 sequences from DRC(Strimmer & Pybus 2001)
0.2 0.15 0.1 0.05 0
Collapsed tree (threshold length 0.01)
0.2 0.15 0.1 0.05 0
At best, the tree is an estimate of the likely evolutionary history ofthe taxa studied.
35/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
(Over, Mis)Interpreting temporal trends
t30t46t33t64t25
t99 t73t8t81t31
t76t91t100
t42t77
t74t98 t65t79t36t58
t32t26t59t16t54t10
t18t11t94t5t45
t60t53t67t88
t27t52t38t40t22
t90t80t37t12t93t78
t49t82
t50 t57t71t21t7t1t84
t70
t20t97t41t43t19t47t28t17t66t55 t34t61t35
t3t2t89t75t86
t15t13t23
t56
t96t24t44
t39t63t69
t83t72t29t14t62t6
t48t4
t87t92t9t95 t68t51t85
12 10 8 6 4 2 0
“Time trees” only make sense under a near-perfect molecular clock.
36/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
(Over, Mis)Interpreting temporal trends
t30t46t33t64t25t99 t73t8t81
t31
t76 t91t100t42t77
t74t98 t65t79t36t58
t32t26t59t16t54t10
t18t11t94t5t45t60 t53t67t88 t27t52t38t40t22
t90t80t37t12t93t78
t49t82
t50 t57t71t21t7t1t84
t70
t20t97t41t43t19t47t28t17 t66t55 t34t61t35t3
t2t89t75t86t15t13t23
t56
t96 t24t44t39 t63t69t83t72t29t14
t62t6t48t4
t87t92t9t95 t68t51t85
12 10 8 6 4 2 0 0 20 40 60 80
02
46
810
12
Time to the root
Mut
atio
ns to
the
root
> anova(lm(d.root~-1+t.root))
anova(lm(d.root~-1+t.root))Analysis of Variance Table
Response: d.root Df Sum Sq Mean Sq F value Pr(>F) t.root 1 5434.7 5434.7 214.66 < 2.2e-16 ***Residuals 99 2506.4 25.3
“Time trees” only make sense under a near-perfect molecular clock.
36/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
(Over, Mis)Interpreting temporal trends
t30t46t33t64t25t99 t73t8t81
t31
t76 t91t100t42t77
t74t98 t65t79t36t58
t32t26t59t16t54t10
t18t11t94t5t45t60 t53t67t88 t27t52t38t40t22
t90t80t37t12t93t78
t49t82
t50 t57t71t21t7t1t84
t70
t20t97t41t43t19t47t28t17 t66t55 t34t61t35t3
t2t89t75t86t15t13t23
t56
t96 t24t44t39 t63t69t83t72t29t14
t62t6t48t4
t87t92t9t95 t68t51t85
12 10 8 6 4 2 0 0 20 40 60 80
02
46
810
12
Time to the root
Mut
atio
ns to
the
root
> anova(lm(d.root~-1+t.root))
anova(lm(d.root~-1+t.root))Analysis of Variance Table
Response: d.root Df Sum Sq Mean Sq F value Pr(>F) t.root 1 5434.7 5434.7 214.66 < 2.2e-16 ***Residuals 99 2506.4 25.3
“Time trees” only make sense under a near-perfect molecular clock.
36/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
OutlineContext
Phylogenies...
Distance trees
Parsimony
Likelihood/Bayesian
Uncertainty
Pitfalls & best practices
And more...
37/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
This is only the beginning
Many things can be done with trees
• estimate divergence time
• model trait evolution (phylogeneticcomparative method)
• reconstruct ancestral states
• measure diversity
• infer past demographics/effectivepopulation size (coalescence)
• ...
• and also, other approaches thanphylogenetics to analyse geneticdata
38/39
Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...
Time to get your hands dirty!
The pdf of the practical is online:
http://adegenet.r-forge.r-project.org/
or
Google → adegenet → documents → “GDAR August 2016”
39/39