Rasmus Group Presentation
Transcript of Rasmus Group Presentation
-
8/14/2019 Rasmus Group Presentation
A phylogeographic distance metric for
Project leader: Dr Rasmus Hovmoller
Group members:
John Christensen, Shishi Luo, Jacob Porter,
-
Talk outline
Project background and goals
Theory and calculations
Results
Conclusions
-
Influenza
H5N1 Endemic in bird population
Bird-to-human transmission possible
H3N2 Seasonal flu
Human-to-human transmission only
-
Influenza
Wild aquatic birds, reservoir for all influenza subtypes
Domestic fowl, e.g.
Humans, e.g. H3N2
-
Global distribution of H5N1 and H3N2
-
Goal
To compare and contrast the patterns of geographic spread of two types of influenza.
To calculate a statistic for the correlation between the distance between virus isolates in the phylogenetic tree (patristic distance) and their actual geographical distance.
-
Goal
Create datasets
Make trees
Calculate patristic distances between all pairs of sequences
Collect geographic metadata
Calculate geographic distances between all pairs of sequences
Calculate correlation coefficient and determine significance
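The last steps of this list can be sketched as follows. This is a minimal illustration, assuming both distance tables are symmetric matrices over the same ordered set of isolates and that a plain Pearson correlation over the distinct pairs stands in for the statistic used in the project. The function name and the toy values are hypothetical.

```python
import math

def matrix_correlation(a, b):
    """Pearson correlation over the upper triangles of two symmetric matrices."""
    n = len(a)
    # Each unordered pair of isolates contributes one (x, y) observation.
    xs = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data (made up): patristic and geographic distances for three isolates.
patristic = [[0, 2, 5], [2, 0, 4], [5, 4, 0]]
geographic = [[0, 100, 900], [100, 0, 700], [900, 700, 0]]
r = matrix_correlation(patristic, geographic)
```

Because the pairwise distances are not independent observations, the usual significance tables for r do not apply to a value computed this way.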
-
Databases and software
Genetic data from GenBank
Phylogenetic trees generated in RAxML, TNT, MrBayes
Interactive Tree of Life (iTOL) to visualize trees
Excel, Unix to manipulate data
Patristic distances calculated in Matlab
Geographic data (longitude, latitude) from GenBank data
-
Types of phylogenetic trees
Parsimony tree is the tree that requires the least evolutionary change to explain given data
Maximum likelihood tree is the tree which has the maximum likelihood over all possible topologies under the specified evolution model
-
Types of phylogenetic trees
Parsimony
Optimality criterion: search for the simplest tree
Equal branch lengths
Doesn't work for horizontal gene transfer
Doesn't show genes lost during the evolution process
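The optimality criterion can be made concrete with Fitch's algorithm, which counts the minimum number of changes a fixed tree needs to explain one character. This is a minimal sketch, assuming rooted binary trees written as nested tuples. The function name, the tree encoding, and the toy alignment column are illustrative assumptions, and a full parsimony search (as in TNT) would also explore tree topologies.

```python
def fitch_score(tree, states):
    """Minimum number of state changes on a fixed rooted binary tree.

    tree: nested 2-tuples with leaf names (str) at the tips.
    states: dict mapping leaf name -> observed character state.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):       # leaf: state set is the observation
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        common = left & right
        if common:                      # children can agree: no change here
            return common
        changes += 1                    # children disagree: count one change
        return left | right

    visit(tree)
    return changes

# One alignment column for four hypothetical isolates.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))   # groups identical states together
tree_2 = (("A", "C"), ("B", "D"))   # splits them apart
score_1 = fitch_score(tree_1, site)
score_2 = fitch_score(tree_2, site)
```

Under the parsimony criterion, `tree_1` is preferred for this column because it needs fewer changes than `tree_2`.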
-
Types of phylogenetic trees
Maximum Likelihood
- Evolution is characterized by a continuous-time Markov chain
- Evolution model is a substitution rate matrix
- Branch lengths show genetic distances
- Doesn't work with big datasets
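To illustrate how a substitution rate matrix defines a continuous-time Markov chain, the sketch below uses the Jukes-Cantor model, in which every base changes to each other base at the same rate. The slides do not say which substitution model was used, so the model choice, the function name, and the parameter values are assumptions made for illustration only.

```python
import math

def jc69_transition_matrix(alpha, t):
    """Transition probabilities P(t) = exp(Qt) for the Jukes-Cantor model.

    Q has off-diagonal entries alpha and diagonal entries -3 * alpha,
    for which the matrix exponential has a simple closed form.
    """
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay   # probability the base is unchanged after time t
    diff = 0.25 - 0.25 * decay   # probability of ending at one specific other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

p_short = jc69_transition_matrix(alpha=1.0, t=0.01)   # short branch
p_long = jc69_transition_matrix(alpha=1.0, t=100.0)   # very long branch
```

On a short branch the matrix stays close to the identity, while on a very long branch every entry approaches 1/4. The likelihood of a tree is built from such matrices, one per branch.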
-
Types of phylogenetic trees
Improving maximum likelihood (mixed model):
- Applying maximum likelihood on reasonable randomized parsimony starting trees
- Using loop-level parallelism in the likelihood functions
-
H3N2 ML vs. parsimony
-
H5N1 ML vs. parsimony
-
Calculating correlation
Patristic distance vs. geographic distance
[Figure: example tree and map with points labeled B and C. (Assume all branch ...)]
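Patristic distance is the sum of branch lengths along the path joining two tips of the tree. The sketch below computes it on a small hypothetical tree, assuming the tree is stored as child-to-parent links and that every branch has length 1. The tree, the node names, and the function name are made up for illustration.

```python
def patristic_distance(parent, length, a, b):
    """Sum of branch lengths on the path between tips a and b.

    parent: dict child -> parent (the root maps to None).
    length: dict child -> length of the branch above that child.
    """
    # Distance from a to each of its ancestors (including a itself).
    dist_from_a = {}
    node, d = a, 0.0
    while node is not None:
        dist_from_a[node] = d
        d += length.get(node, 0.0)
        node = parent[node]
    # Walk up from b until reaching a node that is also an ancestor of a.
    node, d = b, 0.0
    while node not in dist_from_a:
        d += length[node]
        node = parent[node]
    return d + dist_from_a[node]

# Hypothetical tree: root R with children X and C; X has children A and B.
parent = {"R": None, "X": "R", "A": "X", "B": "X", "C": "R"}
length = {name: 1.0 for name in ("X", "A", "B", "C")}   # all branches equal to 1
d_ab = patristic_distance(parent, length, "A", "B")
d_ac = patristic_distance(parent, length, "A", "C")
```

Sister tips A and B are two branches apart, while A and C are separated by three branches because the path passes through the root.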
-
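Geographic distance
$d = \cos^{-1}\left(\sin\phi_1 \sin\phi_2 + \cos\phi_1 \cos\phi_2 \cos(\lambda_1 - \lambda_2)\right)$
d is the shortest distance between two points along the surface of the sphere.
$\phi$ corresponds to latitude
$\lambda$ corresponds to longitude and is the angle between the prime meridian and a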
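A great-circle distance of this kind can be computed from latitude and longitude with the spherical law of cosines. The sketch below is one way to do it, assuming coordinates in decimal degrees and a spherical Earth. The function name and the mean-radius constant are assumptions, not values taken from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest surface distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)

quarter_turn = great_circle_km(0.0, 0.0, 0.0, 90.0)   # a quarter of the equator
```

For points that are very close together, the haversine form of the same distance is numerically more stable than the arccosine form used here.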
-
Testing the significance of the GCC
The GCC does not have the same null-hypothesis distribution as the usual Pearson's correlation coefficient, r
Use the permutation distribution instead
Since the data sets are large, we used a random sample of permutations
2774! > 10^8000
1646! >
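The size of the full permutation distribution can be checked without building the huge integers, because the number of orderings of n items is n! and its order of magnitude follows from the log-gamma function. This is a small sketch, assuming 2774 and 1646 are the numbers of isolates being permuted. The function name is hypothetical.

```python
import math

def log10_factorial(n):
    """log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)

magnitude_2774 = log10_factorial(2774)   # exponent of 10 for 2774!
magnitude_1646 = log10_factorial(1646)   # exponent of 10 for 1646!
```

The value for 2774 exceeds 8000, so enumerating every permutation is out of reach and a random sample of permutations is the practical option.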
-
Testing the significance of the GCC
H0: no significant correlation
H1: significant positive correlation
Reject H0 for sufficiently small p-value.
P-value: proportion of permutation GCCs greater than the observed GCC
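The p-value definition above translates directly into a count. The sketch below assumes the permuted statistics have already been computed and uses a strict "greater than" comparison as stated on the slide. The function name and the example values are made up.

```python
def permutation_p_value(observed, permuted):
    """Proportion of permuted statistics strictly greater than the observed one."""
    if not permuted:
        raise ValueError("need at least one permuted statistic")
    exceed = sum(1 for g in permuted if g > observed)
    return exceed / len(permuted)

# Hypothetical GCC values from eight random permutations.
permuted_gcc = [0.05, -0.10, 0.32, 0.12, 0.01, -0.02, 0.20, 0.08]
p = permutation_p_value(0.30, permuted_gcc)
```

Only one of the eight permuted values exceeds the observed 0.30 here, so the estimate is 1/8. With a sampled rather than complete set of permutations, the result is an estimate whose resolution is limited by the number of permutations drawn.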
-
Small n: easy to calculate exact p-value
Medium n: computationally intensive to
Large n: p-value should be estimated
[Figure: three panels, each showing the permutation distribution f of the GCC with the observed GCC marked]
-
P Value Computation:
Non-standardized GCC
Permuted non-
Phillip Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. 3rd Edition. Springer: 2005
Critchlow, et al. Some Statistical Methods for Phylogenetic Trees with Application to HIV Disease. Mathematical and Computer Modelling 32
-
P Value Computation: R
# X, Y are the matrices to correlate
# perm is a permutation of row indices computed with the R sample function
# n is the number of rows of the matrix
covPerm
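The body of the R function is not preserved in the transcript. As a rough Python analogue of the idea described in the comments, the sketch below permutes the row and column labels of one matrix, recomputes a non-standardized cross-product statistic, and estimates the p-value from a random sample of permutations. All names, the exact form of the statistic, and the default number of permutations are assumptions and should not be read as the group's actual code.

```python
import random

def perm_statistic(x, y, perm):
    """Cross-product of x with y after relabelling y's rows and columns by perm."""
    n = len(x)
    return sum(x[i][j] * y[perm[i]][perm[j]]
               for i in range(n) for j in range(n) if i != j)

def monte_carlo_p_value(x, y, n_perm=999, seed=0):
    """Observed statistic and estimated p-value from random permutations."""
    rng = random.Random(seed)
    n = len(x)
    identity = list(range(n))
    observed = perm_statistic(x, y, identity)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.sample(identity, n)   # random permutation, like sample() in R
        if perm_statistic(x, y, perm) > observed:
            exceed += 1
    return observed, exceed / n_perm

# Toy symmetric distance matrix (made up) correlated with itself.
toy = [[0, 1, 4, 6], [1, 0, 3, 5], [4, 3, 0, 2], [6, 5, 2, 0]]
observed, p_value = monte_carlo_p_value(toy, toy, n_perm=200)
```

Permuting rows and columns together keeps each matrix a valid distance matrix while breaking the pairing between the two matrices, which is what the null hypothesis of no association requires.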
-
P Value Computation:
-
Asia Scatter
-
Asia Scatter
-
H3N2 phylogeny colored by
-
H5N1 Phylogeny with Geographic Location
-
H5N1 Phylogeny with Geographic Location
-
H3N2 Phylogeny with Geographic Location
-
H3N2 Phylogeny with Geographic Location
-
Conclusions
-
1. The Asian subset of data
H5N1
Conclusion:
There is no significant relationship between patristic distance and geographical distance.
Explanation:
(1) It's bird flu. It's much easier for birds to migrate among countries of Asia.
(2) The H5N1 strain is fast-mutating.
-
H3N2
Conclusion:
There is some relationship between the two kinds of distance.
Explanation:
It's a human influenza virus. The migration of humans is not frequent among countries of Asia.
-
2. The global set of data
Confusing
H5N1
Maximum Likelihood tree: no relationship.
Parsimony tree: some relationship.
H3N2
Maximum Likelihood tree: no
-
3. The European subset of data
H5N1: no significant relationship
H3N2: some relationship
The result of the European subset of data is
-
Tree algorithms:
1) Parsimony tree: set all branch lengths equal to 1.
2) Maximum likelihood tree: It's computationally intensive; we have to stop it before finding the best tree.
Hypothesis test: We sampled a very small proportion of the
-
Conclusion
Our results are consistent in small data sets:
H5N1: no significant relationship
H3N2: some relationship
Thus, they are persuasive.
On the other hand, the result for the global data set is confusing; we need to do further research.
-
Questions?