Multiple Sequence Alignment
Sequences
> Yeast YOR020c
mstllksaksivplmdrvlvqrikaqaktasglylpeknveklnqaevvavgpgftdangnkvvpqvkvgdqvlipqfggstiklgnddevilfrdaeilakiakd
> Neurospora crassa
mattvrsvksliplldrvlvqrvkaeaktasgiflpessvkdlneakvlavgpgaldkdgkrlpmgvnagdrvlipqyggspvkvgeeeytlfrdseilakiae
> Aspergillus nidulans
msllrnvknlaplldrvlvqrvkpeaktasgiflpessvkeqneakvlavgpgavdrngqripmgvaagdrvlvpqfggsplkigeeeyhlfrdseilakine
> Schizosaccharomyces pombe (fission yeast)
matklksaksivplldrilvqrikadtktasgiflpeksveklsegrvisvgkggynkegklaqpsvavgdrvllpayggsnikvgeeeyslyrdhellaiike
> Mortierella alpina
masritkfsktivpmmdrvlvqrikpqqktasgiyipekaqealnegyvvavgkglttqegkvvpselaegdkvllppyggsvvkvdneelilfreseilakiq
> Crypthecodinium cohnii
matgiakrftplldrvlvqrlkpeaktasglflpesaakapnyatvlavgpggrtrdgdilpmnvkvgdkvvvpeyggmtlkfedeefqvfrdadimgilne
> Drosophila melanogaster
maaaikkiipmldriliqraealtktkggivlpekavgkvlegtvlavgpgtrnastgnhipigvkegdrvllpefggtkvnlegdqkelflfresdilakle
> Homo sapiens
agqafrkflplfdrvlversaaetvtkggimlpeksqgkvlqatvvavgsgskgkggeiqpvsvkvgdkvllpeyggtkvvlddkdyflfrdgxilgky
> Geobacillus stearothermophilus
vlkplgdrvvievieteektasgivlpdtakekpqegrvvavgkgrvldsgervapevevgdriifskyagtevkydgkeylilresdilavig
> Mycobacterium tuberculosis
makvnikpledkilvqaneaetttasglvipdtakekpqegtvvavgpgrwdedgekripldvaegdtviyskyggteikyngeeylilsardvlavvsk
> Mus musculus (house mouse)
magqafrkflllfdrvlversaaetvtkggimlpeksqgkvlqatvvavgsggkgksgeiepvsvkvgdkvllpeyggtkvvlddkdyflfrdsdilgkyvn
Multiple Sequence Alignment (MSA)
Why MSA?
• Proteins are often related to a larger group (i.e., a family) of proteins
• Multiple sequence alignment is more sensitive than pairwise alignment for detecting homologs
• MSAs can elucidate conserved residues, motifs, or other functional regions in a protein
• MSA is critical for phylogenetic analysis
– Selection of sequences
– Multiple sequence alignment of the selected sequences
– Tree building
– Tree evaluation
Pairwise Alignment
[Figure: two-dimensional score matrix for aligning two short sequences]
3-sequence Alignment
[Figure: three-dimensional score matrix; sequences shown: AGA, AGT, TCC]
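A minimal sketch of how a pairwise score matrix is filled, assuming a simple global (Needleman-Wunsch style) scheme. The function names and the match, mismatch, and gap values are illustrative assumptions and are not the scores shown in the figure. Two sequences need a two-dimensional table; aligning three sequences the same way needs a three-dimensional one, so the table grows quickly with each added sequence.

```python
from math import prod


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (scoring values are assumptions)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


def table_cells(lengths):
    """Number of cells in the score table for sequences of the given lengths."""
    return prod(n + 1 for n in lengths)


print(global_alignment_score("AGA", "AGT"))
print(table_cells([3, 3]), table_cells([3, 3, 3]))
```

With eleven sequences of roughly 100 residues each, a full table of this kind is out of reach, which is one motivation for building the alignment progressively along a guide tree.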
Pairwise Alignment Scores
                     Yea Neu Asp Sch Mor Cry Dro Hom Geo Myc Mus
Yeast                  -  49  46  78  45  55  54  44  38  37  42
Neurospora             -   -  52  41  40  43  46  44  41  39  43
Aspergillus            -   -   -  43  48  45  45  40  40  38  39
Schizosaccharomyces    -   -   -   -  42  53  55  41  41  40  40
Mortierella            -   -   -   -   -  43  46  40  43  38  39
Crypthecodinium        -   -   -   -   -   -  61  43  34  36  45
Drosophila             -   -   -   -   -   -   -  49  42  36  49
Homo                   -   -   -   -   -   -   -   -  37  32  93
Geobacillus            -   -   -   -   -   -   -   -   -  59  38
Mycobacterium          -   -   -   -   -   -   -   -   -   -  32
Mus                    -   -   -   -   -   -   -   -   -   -   -
Guide Tree
[Figure: guide tree; leaf order: Neurospora, Aspergillus, Yeast, Schizosaccharomyces, Crypthecodinium, Drosophila, Geobacillus, Mycobacterium, Mortierella, Homo, Mus]
Constructing a Guide Tree
• Unweighted pair group method with arithmetic mean (UPGMA)
• Neighbor joining (NJ)
Unweighted Pair Group Method with Arithmetic mean (UPGMA)
• Assume each organism is its own group
• Repeat the following step until a single group remains
– Merge together the two closest groups
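The UPGMA steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the slides: the numbers are the Pairwise Alignment Scores read as an upper-triangular table in the listed organism order (an assumed reading), scores are turned into distances as 100 minus the score (also an assumption), and the function names are hypothetical. The distance between two groups is the arithmetic mean over all pairs of their members.

```python
from itertools import combinations

NAMES = ["Yeast", "Neurospora", "Aspergillus", "Schizosaccharomyces",
         "Mortierella", "Crypthecodinium", "Drosophila", "Homo",
         "Geobacillus", "Mycobacterium", "Mus"]

# Upper triangle of the score table, row by row (assumed reading of the slide).
UPPER = [
    [49, 46, 78, 45, 55, 54, 44, 38, 37, 42],
    [52, 41, 40, 43, 46, 44, 41, 39, 43],
    [43, 48, 45, 45, 40, 40, 38, 39],
    [42, 53, 55, 41, 41, 40, 40],
    [43, 46, 40, 43, 38, 39],
    [61, 43, 34, 36, 45],
    [49, 42, 36, 49],
    [37, 32, 93],
    [59, 38],
    [32],
]


def build_distances(names, upper):
    """Map each pair of names to a distance (assumption: 100 - score)."""
    dist = {}
    for i, row in enumerate(upper):
        for offset, score in enumerate(row):
            j = i + 1 + offset
            dist[frozenset((names[i], names[j]))] = 100 - score
    return dist


def mean_distance(dist, group_a, group_b):
    """Arithmetic mean of all member-to-member distances between two groups."""
    total = sum(dist[frozenset((x, y))] for x in group_a for y in group_b)
    return total / (len(group_a) * len(group_b))


def upgma(names, dist):
    """Return the merges in order as (group_a, group_b, mean_distance)."""
    groups = [(name,) for name in names]      # each organism is its own group
    merges = []
    while len(groups) > 1:
        # Merge together the two closest groups.
        a, b = min(combinations(groups, 2),
                   key=lambda pair: mean_distance(dist, *pair))
        merges.append((a, b, mean_distance(dist, a, b)))
        groups = [g for g in groups if g not in (a, b)] + [a + b]
    return merges


for a, b, d in upgma(NAMES, build_distances(NAMES, UPPER)):
    print("+".join(a), "|", "+".join(b), "|", round(d, 2))
```

Under these assumptions the first merges are the highest-scoring pairs in the table: Homo with Mus, then Yeast with Schizosaccharomyces.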
[Figures: UPGMA applied step by step to the pairwise alignment scores of the eleven organisms, ending with the guide tree]
Neighbor Joining (NJ)
• Generate a full tree with a star-like structure
• Repeat the following step
– Connect the two closest groups (i.e., neighbors) through a single node
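A minimal sketch of the joining step, assuming the standard neighbor-joining criterion: the pair to connect is the one that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the sum of distances from a node to all others. The five-taxon distances and the function names are hypothetical, chosen only for illustration, and the sketch tracks topology only (no branch lengths). Note that "closest" in this sense need not be the pair with the smallest raw distance.

```python
from itertools import combinations

# Hypothetical toy distances between five taxa a..e.
TOY = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
DIST = {frozenset(pair): d for pair, d in TOY.items()}


def nj_step(dist, nodes):
    """Return (i, j, q) for the pair of nodes with the smallest Q value."""
    n = len(nodes)
    row_sum = {i: sum(dist[frozenset((i, k))] for k in nodes if k != i)
               for i in nodes}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset((i, j))] - row_sum[i] - row_sum[j]

    i, j = min(combinations(nodes, 2), key=q)
    return i, j, q((i, j))


def neighbor_joining(dist, names):
    """Return an unrooted topology as nested tuples (three groups at the top)."""
    dist = dict(dist)
    nodes = list(names)                 # star-like start: all taxa at one node
    while len(nodes) > 3:
        i, j, _ = nj_step(dist, nodes)
        new = (i, j)                    # connect the neighbors through one node
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((new, k))] = 0.5 * (
                    dist[frozenset((i, k))] + dist[frozenset((j, k))]
                    - dist[frozenset((i, j))])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return tuple(nodes)


print(nj_step(DIST, list("abcde")))
print(neighbor_joining(DIST, list("abcde")))
```

In this toy example d and e have the smallest raw distance (3), yet the first pair joined is a and b, because the Q criterion also accounts for how far each taxon is from everything else.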
[Figures: neighbor joining applied step by step to the eleven organisms, starting from a star-like tree]
Multiple Sequence Alignment
[Figures: multiple sequence alignment slides]
What can phylogeny do for you?
Why do we care about evolution and the evolutionary history of organisms?
OR
How do we benefit from phylogeny?
AND
How is bioinformatics related to any of this?
What are the goals of phylogeny?
1) Deduce correct trees of life for all species
2) Infer or estimate divergence times
All life forms share a common origin and are part of the Tree of Life
How can we use phylogenetic analyses?
Revolutionizing the Tree of Life
Carl Woese: rRNA identifies Archaea as a separate branch of the Tree of Life
Discovering new life forms
Developing effective snakebite antivenins
Identifying emergent diseases
Protecting ecosystems from invasive species
Caulerpa taxifolia
Purple loosestrife
Eurasian water milfoil
Common phylogenetic tree terminology
[Figure: tree with terminal nodes A, B, C, D, and E]
• Ancestral node, or ROOT of the tree
• Internal nodes: hypothetical taxonomic units (HTUs)
• Branches, or lineages
• Terminal nodes: operational taxonomic units (OTUs); they represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
Phylogenetic trees can be drawn many ways
[Figure: tree of A, B, C, D, and E]
Clade: a group consisting of a single common ancestor and all of its descendants
“B-C clade”
“D-E clade”
“A-B-C clade”
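A small sketch of how clades can be read off a tree. The nested-tuple tree below is an assumption chosen to be consistent with the three clades named above; the function name and representation are illustrative only. Every internal node defines one clade: the set of all leaves that descend from it.

```python
# Assumed topology consistent with the named clades: ((A,(B,C)),(D,E)).
TREE = (("A", ("B", "C")), ("D", "E"))


def clades(tree):
    """Return (leaves under this node, list of clades found in this subtree)."""
    if isinstance(tree, str):           # a terminal node is a single taxon
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)                # this internal node defines a clade
    return leaves, found


labels = ["-".join(sorted(c)) for c in clades(TREE)[1]]
print(labels)
```

A group such as A and B alone is not a clade in this tree, because their most recent common ancestor also has C as a descendant.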
Phylogenetic trees can be rooted or unrooted
Rooted
[Figure: rooted tree of A, B, C, and D]
• Specifies evolutionary path
• Root node is the most recent common ancestor of all taxonomic units (TUs); specifies time flow
Unrooted
[Figure: unrooted tree of A, B, C, and D]
• Shows degree of kinship
• Doesn’t make assumptions or require knowledge of a common ancestor
Phylogenetic trees can be scaled or unscaled
Unscaled (cladogram)
• Branch length not proportional to number of changes/distance
Scaled (phylogram)
• Branch length proportional to number of changes/distance
Phylogenetic trees diagram evolutionary relationships
Branch lengths may be drawn:
1) With no scale (cladograms)
2) Proportional to genetic distance (phylograms)
3) Proportional to time (ultrametric trees)
No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
Rotating clades: same meanings
[Figure: two drawings of the tree of A, B, C, D, and E with clades rotated, marked as equal]
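One way to check that rotating clades leaves the meaning unchanged is to compare trees after putting the children of every node into a fixed order. This is a minimal sketch with hypothetical example trees written as nested tuples; the function name is illustrative.

```python
def canonical(tree):
    """Sort the children of every node so that rotations compare equal."""
    if isinstance(tree, str):
        return tree
    # key=repr gives a consistent order for mixed leaves and subtrees.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


original = (("A", ("B", "C")), ("D", "E"))
rotated = (("E", "D"), (("C", "B"), "A"))       # same tree, clades rotated
different = ((("A", "B"), "C"), ("D", "E"))     # A now groups with B, not with B-C

print(canonical(original) == canonical(rotated))
print(canonical(original) == canonical(different))
```

The first comparison is true because only the drawing order changed; the second is false because the grouping itself changed.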
Interpreting phylogenetic trees
Is the frog more closely related to the fish or the human?
How are phylogenetic trees built?
Traditionally: use homologous structures
Caveats:
- Closely related organisms don’t always look similar
- Similar-looking organisms are not always closely related
- How do you decide importance of traits?
Structural analogy can result from convergent evolution
Classification based on traits can be tricky
cell number
organelles
Molecular phylogenetic trees
Large molecular data sets: Bioinformatics!
Eliminates analogy and trait selection issues
Result: great improvement on classical phylogenies
Caveat:
Gene divergence may not correlate with species divergence
Molecular clock vs. punctuated equilibrium
Molecular phylogenies can be constructed using different elements
- Nuclear genes
- Mitochondrial DNA
- Genome structure
Reasonably well conserved, present in common ancestors
Usually integrate analyses of multiple different genes
Molecular comparisons vs. body plans
Which species are the closest living relatives of modern humans?
Mitochondrial DNA, most nuclear genes, and DNA hybridization
Bonobos and chimpanzees are related more closely to humans than either is to gorillas.
[Figure: tree of chimpanzees, orangutans, humans, bonobos, and gorillas; time axis from 14 to 0 million years ago (MYA)]
Pre-molecular view
Great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans.
[Figure: tree of humans, bonobos, gorillas, orangutans, and chimpanzees; time axis from 15-30 to 0 MYA]
What is the closest living relative of whales?
Phylogenetic trees are hypotheses
How do you construct phylogenetic trees?
How do you test the robustness of hypotheses?
What computational strategies are used?