1 Summary on similarity search or Why do we care about far homologies ? A protein from a new...
![Page 1: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/1.jpg)
Summary on similarity search, or
Why do we care about far homologies?
• A protein from a new pathogenic bacterium: we have no idea what it does.
• A protein from a model organism: we know what it does, but we do not know which protein does the same in human.
• A protein related to a disease: we have no idea what it does in relation to the disease.
![Page 2: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/2.jpg)
retinol-binding protein
odorant-binding protein
apolipoprotein D
![Page 3: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/3.jpg)
RBP4 and obesity
retinol-binding protein
odorant-binding protein
apolipoprotein D
![Page 4: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/4.jpg)
Scoring matrices let you focus on the big (or small) picture
retinol-binding protein
PAM250
PAM30
BLOSUM45
BLOSUM80
![Page 5: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/5.jpg)
PSI-BLAST generates scoring matrices more powerful than PAM or BLOSUM
retinol-binding protein
retinol-binding protein
![Page 6: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/6.jpg)
Phylogenetic trees
![Page 7: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/7.jpg)
Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are mainly used for phylogenetic analyses.
One tree of life: A sketch Darwin made soon after returning from his voyage on HMS Beagle (1831–36) showed his thinking about the diversification of species from a single stock (see Figure, overleaf). This branching, extended by the concept of common descent,
Phylogeny (from Greek) = the origin of the tribe
![Page 8: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/8.jpg)
Haeckel (1879) Pace (2001)
![Page 9: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/9.jpg)
Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data.
[Figure: two trees of Human, Chimpanzee, Gorilla and Orangutan]
Pre-molecular analysis: the great apes (chimpanzee, gorilla and orangutan) are separate from the human.
Molecular analysis: the chimpanzee is more closely related to the human than to the gorilla.
![Page 10: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/10.jpg)
What can we learn from a phylogenetic tree?
![Page 11: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/11.jpg)
Determine the closest relatives of an organism we are interested in
• Was the extinct quagga more like a zebra or a horse?
![Page 12: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/12.jpg)
Which species is closest to human?
[Figure: two trees of Human, Chimpanzee, Gorilla and Orangutan]
![Page 13: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/13.jpg)
Human Evolution
Modern Man
Neanderthals
![Page 14: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/14.jpg)
Example: Metagenomics
A new field in genomics that aims to study the genomes recovered from environmental samples.
A powerful tool for accessing the rich biodiversity of native environmental samples.
Helps to find the relationships between species and to identify new species.
![Page 15: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/15.jpg)
10^6 cells/ml seawater
10^7 virus particles/ml seawater
>99% uncultivated microbes
How can we discover new species in the ocean?
![Page 16: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/16.jpg)
Relationships can be represented by a phylogenetic tree or dendrogram
A B C D
E
F
![Page 17: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/17.jpg)
Phylogenetic Tree Terminology
• Graph composed of nodes & branches
• Each branch connects two adjacent nodes
A B C D
E
F
R
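The terminology above can be made concrete with a small sketch. The topology below is hypothetical: the slide only gives the labels (leaves A to D, internal nodes E and F, root R), so the way they are connected here is an assumption made for illustration.

```python
# Hypothetical topology for the slide's labels: leaves A-D,
# internal nodes E and F, root R. Each branch joins two adjacent nodes.
branches = [("R", "F"), ("R", "D"), ("F", "E"), ("F", "C"), ("E", "A"), ("E", "B")]


def adjacency(branch_list):
    """Map every node to the set of nodes it shares a branch with."""
    neighbours = {}
    for u, v in branch_list:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def leaves(branch_list, root="R"):
    """Leaves are the non-root nodes that touch exactly one branch."""
    adj = adjacency(branch_list)
    return sorted(n for n, nb in adj.items() if len(nb) == 1 and n != root)


print(len(adjacency(branches)), "nodes,", len(branches), "branches")
print("leaves:", leaves(branches))
```

In any tree the number of branches is one less than the number of nodes, which is a quick sanity check on a hand-written branch list.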
![Page 18: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/18.jpg)
Phylogenetic Tree Terminology
Rooted tree: Human, Chimp, Chicken, Gorilla
Un-rooted tree: Human, Chimp, Chicken, Gorilla
![Page 19: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/19.jpg)
Rooted vs. unrooted trees
[Figure: a rooted and an unrooted tree, each with three leaves labeled 1, 2 and 3]
![Page 20: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/20.jpg)
How can we build a tree with molecular data?
- Trees based on DNA sequence (rRNA)
- Trees based on protein sequences
![Page 21: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/21.jpg)
Basic algorithm for constructing a rooted tree
Unweighted Pair Group Method using Arithmetic Averages (UPGMA)
Assumption: Divergence of sequences is assumed to occur at a constant rate, so the distance from every sequence to the root is equal.
Sequence a ACGCGTTGGGCGATGGCAAC
Sequence b ACGCGTTGGGCGACGGTAAT
Sequence c ACGCATTGAATGATGATAAT
Sequence d ACACATTGAGTGTGATAATA
[Figure: rooted tree with leaves a, b, c, d]
![Page 22: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/22.jpg)
Moving from Similarity to Distance
Sequences
Sequence a ACGCGTTGGGCGATGGCAAC
Sequence b ACACATTGAGTGTGATCAAC
Sequence c ACACATTGAGTGAGGACAAC
Sequence d ACGCGTTGGGCGACGGTAAT
Distances *
Dab = 8, Dac = 7, Dad = 5, Dbc = 3, Dbd = 9, Dcd = 8
Distance Table

| | a | b | c | d |
|---|---|---|---|---|
| a | 0 | 8 | 7 | 5 |
| b | 8 | 0 | 3 | 9 |
| c | 7 | 3 | 0 | 8 |
| d | 5 | 9 | 8 | 0 |

* Can be calculated using different distance metrics
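One of the simplest distance metrics is the number of mismatching positions between two aligned sequences. The sketch below applies it to the four sequences on the slide; the function names are made up for illustration. Since the slide notes that different metrics can be used, counts obtained this way need not reproduce every entry of the slide's table.

```python
from itertools import combinations

# Sequences copied from the slide (already aligned, equal length).
sequences = {
    "a": "ACGCGTTGGGCGATGGCAAC",
    "b": "ACACATTGAGTGTGATCAAC",
    "c": "ACACATTGAGTGAGGACAAC",
    "d": "ACGCGTTGGGCGACGGTAAT",
}


def mismatch_count(s, t):
    """Number of aligned positions at which the two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))


def distance_table(seqs):
    """Symmetric table of pairwise mismatch counts, with a zero diagonal."""
    table = {name: {name: 0} for name in seqs}
    for p, q in combinations(seqs, 2):
        table[p][q] = table[q][p] = mismatch_count(seqs[p], seqs[q])
    return table


table = distance_table(sequences)
for name, row in table.items():
    print(name, [row[other] for other in sequences])
```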
![Page 23: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/23.jpg)
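Constructing a tree starting from a STAR model

| | a | b | c | d |
|---|---|---|---|---|
| a | 0 | 8 | 7 | 5 |
| b | 8 | 0 | 3 | 9 |
| c | 7 | 3 | 0 | 8 |
| d | 5 | 9 | 8 | 0 |

[Figure: star with leaves a, d, c, b]
Step 1: Choose the nodes with the shortest distance and fuse them.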
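Step 1 amounts to scanning the distance table for its smallest off-diagonal entry. A minimal sketch, using the table from the slide stored as a nested dictionary (the function name `closest_pair` is an illustrative choice):

```python
# Distance table from the slide.
D = {
    "a": {"a": 0, "b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "b": 0, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "c": 0, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8, "d": 0},
}


def closest_pair(dist):
    """Return (p, q, distance) for the two distinct nodes that are closest."""
    names = sorted(dist)
    best = None
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if best is None or dist[p][q] < best[2]:
                best = (p, q, dist[p][q])
    return best


print(closest_pair(D))
```

For this table the pair to fuse is b and c, at distance 3.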
![Page 24: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/24.jpg)
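Step 2: Recalculate the distance from each of the remaining sequences (a and d) to the new node (e), and remove the fused nodes from the table.
D(ea) = (D(ac) + D(ab) - D(cb))/2 = (7 + 8 - 3)/2 = 6
D(ed) = (D(dc) + D(db) - D(cb))/2 = (8 + 9 - 3)/2 = 7
[Figure: star with branches a, d and e, where e stands for the fused pair c,b]
Table after fusing c and b into e:

| | a | d | e |
|---|---|---|---|
| a | 0 | 5 | 6 |
| d | 5 | 0 | 7 |
| e | 6 | 7 | 0 |

Original table:

| | a | b | c | d |
|---|---|---|---|---|
| a | 0 | 8 | 7 | 5 |
| b | 8 | 0 | 3 | 9 |
| c | 7 | 3 | 0 | 8 |
| d | 5 | 9 | 8 | 0 |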
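The recalculation in Step 2 can be sketched as a function that replaces the fused pair with the new node, using the slide's formula D(e,x) = (D(x,c) + D(x,b) - D(c,b))/2. The function name `fuse` is an assumption for illustration. Note that this follows the formula as written on the slide; the plain arithmetic-average update usually associated with UPGMA would give different values.

```python
# Distance table from the slide.
D = {
    "a": {"a": 0, "b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "b": 0, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "c": 0, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8, "d": 0},
}


def fuse(dist, p, q, new):
    """Return a new table in which nodes p and q are replaced by node `new`."""
    rest = [n for n in dist if n not in (p, q)]
    # Copy the rows and columns of the nodes that are not being fused.
    out = {n: {m: dist[n][m] for m in rest} for n in rest}
    out[new] = {new: 0}
    for n in rest:
        # Slide's formula: D(new, n) = (D(n, p) + D(n, q) - D(p, q)) / 2
        d = (dist[n][p] + dist[n][q] - dist[p][q]) / 2
        out[new][n] = out[n][new] = d
    return out


reduced = fuse(D, "b", "c", "e")
for name, row in reduced.items():
    print(name, row)
```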
![Page 25: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/25.jpg)
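Step 3: In order to get a tree, un-fuse c and b by calculating their distance to the new node (e).
!!! The distances Dce and Dde are calculated assuming constant-rate evolution.

| | a | d | e |
|---|---|---|---|
| a | 0 | 5 | 6 |
| d | 5 | 0 | 7 |
| e | 6 | 7 | 0 |

[Figure: star with branches a, d and e; c and b are attached to e, with branch lengths labeled Dce and Dde]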
![Page 26: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/26.jpg)
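Next…
We want to fuse the next closest nodes (a and d) into a new node (f).

| | a | d | e |
|---|---|---|---|
| a | 0 | 5 | 6 |
| d | 5 | 0 | 7 |
| e | 6 | 7 | 0 |

[Figure: a and d joined at the new node f; c and b attached to e, with branch lengths labeled Dce and Dde]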
![Page 27: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/27.jpg)
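Finally
We need to calculate the distance between e and f
D(ef) = (D(ea) + D(ed) - D(ad))/2 = (6 + 7 - 5)/2 = 4

| | f | e |
|---|---|---|
| f | 0 | 4 |
| e | 4 | 0 |

[Figure: tree with leaves a, c, b, d and internal nodes e and f; branch lengths labeled Daf, Dde, Dce, Dbf]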
![Page 28: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/28.jpg)
From a Star to a tree
[Figure: the star with leaves a, d, c, b next to the resolved tree with leaves a, c, b, d and internal nodes f and e]
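The whole star-to-tree walk-through can be sketched as one loop: pick the closest pair, fuse it into a new node, recalculate distances with the formula used in the slides, and repeat until two nodes remain. The function name `star_to_tree` and the returned structure are assumptions made for illustration, not part of the original slides.

```python
from itertools import combinations

# Distance table from the slides.
D = {
    "a": {"a": 0, "b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "b": 0, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "c": 0, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8, "d": 0},
}


def star_to_tree(dist, new_names=("e", "f")):
    """Fuse closest pairs until two nodes remain.

    Returns the list of merges as (new node, child, child, distance)
    together with the final two-node distance table.
    """
    dist = {p: dict(row) for p, row in dist.items()}  # work on a copy
    merges = []
    for new in new_names:
        if len(dist) <= 2:
            break
        # Closest pair of distinct nodes in the current table.
        p, q = min(combinations(sorted(dist), 2),
                   key=lambda pair: dist[pair[0]][pair[1]])
        merges.append((new, p, q, dist[p][q]))
        rest = [n for n in dist if n not in (p, q)]
        reduced = {n: {m: dist[n][m] for m in rest} for n in rest}
        reduced[new] = {new: 0}
        for n in rest:
            # Formula from the slides: (D(n,p) + D(n,q) - D(p,q)) / 2
            d = (dist[n][p] + dist[n][q] - dist[p][q]) / 2
            reduced[new][n] = reduced[n][new] = d
        dist = reduced
    return merges, dist


merges, final = star_to_tree(D)
print(merges)
print(final)
```

With the slide's table this fuses b and c into e first, then a and d into f, and ends with D(ef) = 4, matching the tables above.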
![Page 29: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/29.jpg)
IMPORTANT!
• Usually we don’t assume a constant mutation rate, and in order to choose the nodes to fuse we have to calculate the relative distance of each node to all other nodes.
Neighbor Joining (NJ) is an algorithm suited to cases where the rate of evolution varies.
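One way to make "relative distance" concrete is the selection criterion of the standard Neighbor Joining algorithm, Q(i,j) = (n - 2)·D(i,j) - sum_k D(i,k) - sum_k D(j,k), where the pair with the lowest Q is fused. The sketch below evaluates it on the distance table from the earlier slides; it is offered as an illustration of the standard criterion, not as the exact computation the slides had in mind.

```python
from itertools import combinations

# Distance table from the earlier slides.
D = {
    "a": {"a": 0, "b": 8, "c": 7, "d": 5},
    "b": {"a": 8, "b": 0, "c": 3, "d": 9},
    "c": {"a": 7, "b": 3, "c": 0, "d": 8},
    "d": {"a": 5, "b": 9, "c": 8, "d": 0},
}


def q_matrix(dist):
    """Neighbor Joining criterion for every pair of distinct nodes."""
    n = len(dist)
    # Total distance from each node to all other nodes (diagonal is 0).
    totals = {p: sum(row.values()) for p, row in dist.items()}
    return {
        (p, q): (n - 2) * dist[p][q] - totals[p] - totals[q]
        for p, q in combinations(sorted(dist), 2)
    }


Q = q_matrix(D)
lowest = min(Q.values())
candidates = [pair for pair, value in Q.items() if value == lowest]
print(Q)
print("pairs with the lowest Q:", candidates)
```

For this small table the pairs (a, d) and (b, c) tie for the lowest Q, so either could be joined first.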
![Page 30: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/30.jpg)
Human Evolution Tree
Neighbor Joining
UPGMA
![Page 31: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/31.jpg)
The downside of phylogenetic trees
- Using different regions of the same alignment may produce different trees.
![Page 32: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/32.jpg)
Problems with phylogenetic trees
[Figure: tree of seven taxa numbered 1 to 7 (Bacillus, E.coli, Pseudomonas, Salmonella, Aeromonas, Lechevaliera, Burkholderias); scale bar 0.2]
![Page 33: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/33.jpg)
Problems with phylogenetic trees
[Figure: four alternative trees of the same seven taxa (Bacillus, E.coli, Pseudomonas, Salmonella, Aeromonas, Lechevaliera, Burkholderias), with the taxa numbered 1 to 7 in a different order in each tree; scale bar 0.2]
![Page 34: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/34.jpg)
Problems with phylogenetic trees
• What to do ?
![Page 35: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/35.jpg)
Bootstrapping
A. We create new data sets by sampling N positions with replacement.
B. We generate 100 - 1000 such pseudo-data sets.
C. For each such data set we reconstruct a tree, using the same method.
D. We note the agreement between the tree reconstructed from the pseudo-data set and the original tree.
Note: we do not change the number of sequences!
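Steps A and B can be sketched as follows: alignment columns (positions) are drawn with replacement, while the set of sequences stays the same. The toy alignment reuses the four sequences from the distance slide, and the function names are illustrative assumptions; the tree reconstruction of step C is left out.

```python
import random

# Toy alignment: the four aligned sequences from the distance slide.
alignment = {
    "a": "ACGCGTTGGGCGATGGCAAC",
    "b": "ACACATTGAGTGTGATCAAC",
    "c": "ACACATTGAGTGAGGACAAC",
    "d": "ACGCGTTGGGCGACGGTAAT",
}


def bootstrap_alignment(aln, rng):
    """Sample N columns with replacement, where N is the alignment length."""
    length = len(next(iter(aln.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same sampled columns are used for every sequence.
    return {name: "".join(seq[i] for i in columns) for name, seq in aln.items()}


def pseudo_datasets(aln, replicates=100, seed=0):
    """Generate the requested number of pseudo-data sets reproducibly."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aln, rng) for _ in range(replicates)]


replicate_sets = pseudo_datasets(alignment, replicates=100, seed=1)
print(len(replicate_sets), "pseudo-data sets")
print(replicate_sets[0])
```

Each pseudo-data set has the same number of sequences and the same length as the original; only the mix of columns changes.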
![Page 36: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/36.jpg)
Bootstrapped tree
[Figure: tree of seven taxa numbered 1 to 7 (Pseudomonas, Burkholderias, E.coli, Salmonella, Lechevaliera, Aeromonas, Bacillus); scale bar 0.2; bootstrap values on the branches: 77, 100, 83, 58]
Highly reliable branch
Less reliable branch
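The numbers on a bootstrapped tree can be read as the percentage of pseudo-data-set trees that contain a given branch. A minimal sketch of that count follows, assuming each replicate tree is represented simply as a set of clades (groups of taxa below a branch). The replicate trees here are made-up toy data, not the trees from the slide.

```python
def clade_support(clade, replicate_trees):
    """Percentage of replicate trees that contain the given clade."""
    clade = frozenset(clade)
    hits = sum(clade in tree for tree in replicate_trees)
    return 100 * hits / len(replicate_trees)


# Made-up replicate trees, each written as a set of clades.
replicate_trees = [
    {frozenset({"E.coli", "Salmonella"}), frozenset({"Pseudomonas", "Burkholderias"})},
    {frozenset({"E.coli", "Salmonella"}), frozenset({"Pseudomonas", "Aeromonas"})},
    {frozenset({"E.coli", "Salmonella"}), frozenset({"Pseudomonas", "Burkholderias"})},
    {frozenset({"E.coli", "Salmonella"}), frozenset({"Aeromonas", "Burkholderias"})},
]

print(clade_support({"E.coli", "Salmonella"}, replicate_trees))
print(clade_support({"Pseudomonas", "Burkholderias"}, replicate_trees))
```

In this toy data the first clade appears in every replicate (a highly reliable branch), while the second appears in only half of them (a less reliable branch).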
![Page 37: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/37.jpg)
Open Questions
• Do DNA and proteins from the same gene produce different trees ?
• Can different genes have different evolutionary history ?
![Page 38: 1 Summary on similarity search or Why do we care about far homologies ? A protein from a new pathogenic bacteria. We have no idea what it does A protein.](https://reader031.fdocuments.net/reader031/viewer/2022032205/56649eb55503460f94bbdf11/html5/thumbnails/38.jpg)