Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College [email protected] BI420 –...
Phylogenetic Analysis
Gabor T. Marth
Department of Biology, Boston College
[email protected]
BI420 – Introduction to Bioinformatics
Figures from Higgs & Attwood
The goals of phylogenetics
To understand the evolutionary relationships among species, e.g.
- the order in which they diverged
- the time since divergence
The assumptions in phylogenetics
1. The organisms in any group are related to each other by descent from a common ancestor
2. The relationships between organisms are described by a bifurcating tree
3. Change in characteristics between organisms occurs over time
Phylogenetic “objects”
taxon
clade
node
branch
Phylogenetic tree
Constructing an evolutionary tree
Step 1. Selection of appropriate sequences
Step 2. Construction of multiple sequence alignment
Step 3. Calculation of pair-wise evolutionary distances
Step 4. Tree construction
Step 5. Tree evaluation
1. Sequence selection
• find sequences with an appropriate amount of divergence: there can be too little or too much divergence (e.g. genes identical across taxa, or non-conserved genomic sequence)
• try to select orthologous sequences, so that the genes used for tree construction are likely to have preserved functions
2. Multiple alignment
(mitochondrial small subunit RNA gene)
• informative sites
• alignment editing
• mechanics of multiple alignment construction covered in earlier classes in the course
3. Pair-wise distance
• measures how diverged two sequences are:
ACGCGTTATTACAGTTGACT
ACACGTTATGACAGTTGACT
2 differences in 20 bp: D = 2/20 = 0.1 (10% divergence)
• how evolutionarily distant two sequences are:
Jukes-Cantor (JC): d = -3/4 ln(1 - 4/3 D) = 0.10732 (evolutionary distance)
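The arithmetic for the two 20 bp sequences on this slide can be reproduced with a short Python sketch. The function names `p_distance` and `jukes_cantor` are illustrative choices, not part of the lecture, and the code assumes the sequences are already aligned and contain no gaps.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two equal-length sequences differ (D)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)

def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 ln(1 - 4/3 D)."""
    if p >= 0.75:
        # The logarithm is undefined once D reaches 3/4 (saturation).
        raise ValueError("JC distance is undefined for D >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The two sequences from the slide.
seq1 = "ACGCGTTATTACAGTTGACT"
seq2 = "ACACGTTATGACAGTTGACT"

D = p_distance(seq1, seq2)   # observed divergence
d = jukes_cantor(D)          # corrected evolutionary distance
print(f"D = {D:.3f}, JC d = {d:.4f}")
```

The JC distance comes out slightly larger than D because the correction accounts for sites that may have changed more than once.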
Pair-wise distances
Pair-wise JC distance matrix
More complex substitution models
• substitutions between less similar residues indicate more divergence than between more similar residues (hydrophobic vs. hydrophilic)
    A  C  G  T
A   -  2  1  2
C   2  -  2  1
G   1  2  -  2
T   2  1  2  -

ACGCGTTATTACAGTTGACT
ACACGTTATGACAGTTGACT
A/G (1) + T/G (2): diff = 3
• amino acid substitution matrices (e.g. PAM, BLOSUM)
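As a small illustration of a weighted substitution model, the following Python sketch scores the example sequences with the nucleotide cost table given above. The names `COST` and `weighted_difference` are hypothetical; the table values are copied from the slide (cost 1 for A/G and C/T, cost 2 for every other mismatch).

```python
# Cost table from the slide: 1 for A<->G and C<->T, 2 for all other mismatches.
COST = {
    "A": {"C": 2, "G": 1, "T": 2},
    "C": {"A": 2, "G": 2, "T": 1},
    "G": {"A": 1, "C": 2, "T": 2},
    "T": {"A": 2, "C": 1, "G": 2},
}

def weighted_difference(seq_a, seq_b):
    """Sum the substitution costs over all mismatched sites of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a != b:
            total += COST[a][b]
    return total

seq1 = "ACGCGTTATTACAGTTGACT"
seq2 = "ACACGTTATGACAGTTGACT"
score = weighted_difference(seq1, seq2)
print(score)
```

For protein sequences the same idea applies, with the lookup table replaced by an amino acid matrix such as PAM or BLOSUM.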
4. Tree construction
• goal is to group (cluster) sequences in a hierarchical fashion
• each step creates a “node” that represents the common ancestor of all the species/sequences within the group
CA of group containing (A,B)
CA of group containing (A,B,C,D)
CA of group containing (A,B)
UPGMA method for phylogeny construction
UPGMA (unweighted pair-group method with arithmetic mean) is conceptually very simple
Step #1. Cluster the two nodes separated by the shortest distance: e.g. if d(C,D) is lower than d(A,B), d(A,C), etc., then group C and D together. CD is now a new “node”
Step #2. Re-calculate the distance between the new node CD and all other current nodes, e.g.: d(CD, A) = ½ * (d(C,A) + d(D,A))
Repeat from Step #1 until every node is clustered into a single group
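The two steps can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the function name `upgma`, the nested-tuple tree representation, and the four-taxon distance matrix are all assumptions made for the example. One detail is worth stating: when the clusters being merged contain different numbers of leaves, the sketch averages over leaves (a size-weighted mean). For two single sequences this reduces to the ½ * (d(C,A) + d(D,A)) formula above.

```python
def upgma(names, matrix):
    """Cluster leaves with UPGMA.

    names:  list of leaf labels
    matrix: symmetric list of lists of pairwise distances
    Returns (tree as nested tuples, list of merges as (left, right, distance)).
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    merges = []
    while len(clusters) > 1:
        # Step 1: find the closest pair of clusters and merge them.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merges.append((tree_i, tree_j, d_ij))
        # Step 2: distance from the new node to every remaining cluster,
        # averaged over the leaves of the two merged clusters.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Forget distances that involve the two clusters just merged.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges

# Hypothetical distances for four taxa A, B, C, D (not taken from the lecture).
names = ["A", "B", "C", "D"]
matrix = [
    [0, 9, 10, 12],
    [9, 0, 8, 8],
    [10, 8, 0, 2],
    [12, 8, 2, 0],
]
tree, merges = upgma(names, matrix)
print(tree)
for left, right, distance in merges:
    print(left, right, distance)
```

In this toy matrix C and D are joined first, then B joins CD, and A joins last. The final distance between A and the cluster (B, C, D) is the mean of d(A,B), d(A,C) and d(A,D).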
CD
Example
UPGMA phylogeny from a given distance matrix
First cluster: Chimp + Pygmy chimp
Example (cont’d)
After performing the complete clustering with UPGMA, we get the following rooted tree:
There are many other tree-building methods (see Higgs & Attwood)
Branch lengths
ultra-metricity
additivity
Rooted vs. un-rooted trees
Tree rooted with an outgroup (rodents)
5. Tree evaluation
• Goal: to evaluate the strength of the phylogenetic signal in the data and the robustness of the tree
• Bootstrapping: re-sample the original columns of the alignment with replacement, and produce a random, artificial alignment
Bootstrap support
• Report: for each node, the percentage of resampled alignments that produced the same tree topology (from that node down to the leaves)
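The column resampling can be sketched in Python as below. This is a simplified illustration under stated assumptions: the toy alignment, the function names, and the number of replicates are invented for the example. Instead of rebuilding a full tree for every replicate, the sketch only asks how often a given pair of sequences remains the closest pair (the first pair UPGMA would join), which is a stand-in for the support of that one node. A full analysis would rebuild the complete tree for every resampled alignment.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; the result has the same shape."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]

def closest_pair(alignment):
    """Indices (i, j) of the two sequences with the fewest mismatches.

    Ties are resolved in favour of the first pair encountered.
    """
    best = None
    for i in range(len(alignment)):
        for j in range(i + 1, len(alignment)):
            diff = sum(a != b for a, b in zip(alignment[i], alignment[j]))
            if best is None or diff < best[0]:
                best = (diff, (i, j))
    return best[1]

def pair_support(alignment, pair, replicates=200, seed=0):
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        if closest_pair(bootstrap_alignment(alignment, rng)) == pair:
            hits += 1
    return 100.0 * hits / replicates

# Toy alignment (hypothetical data, four sequences of ten columns).
alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACTTAT",
    "TCGAACTTGT",
]
original_pair = closest_pair(alignment)
support = pair_support(alignment, original_pair)
print(original_pair, f"{support:.1f}%")
```

A pair that stays closest in nearly all replicates corresponds to strong bootstrap support; a pair that is often displaced corresponds to weak support.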
strong bootstrap support
weak bootstrap support