CS 598SS Probabilistic Methods in Biological Sequence Analysis Saurabh Sinha.
Sequence Analysis Methods
CZ5225: Modeling and Simulation in Biology
Lecture 3: Sequence analysis methods
Prof. Chen Yu Zong
Tel: 6874-6877
Email: [email protected]
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore

Gene and Protein Sequence Alignment as a Mathematical Problem

Example:
Sequence a: ATTCTTGC
Sequence b: ATCCTATTCTAGC

Best alignment (one gap): ATTCTTGC against ATCCTATTCTAGC
Bad alignment (two gaps): AT TCTT GC against ATCCTATTCTAGC

What is a good alignment?

How to rate an alignment?
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
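
The scoring scheme above can be written as a small function. The following is a minimal sketch; the names `w` and `alignment_score` and the use of `-` strings for gap symbols are illustrative choices, not something the slides prescribe.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # values taken from the slide


def w(x: str, y: str) -> int:
    """Score one alignment column; '-' denotes a gap symbol."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def alignment_score(row_a: str, row_b: str) -> int:
    """Sum the column scores of two gapped rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))
```

For example, `alignment_score("AC-T", "AGGT")` adds one match, one mismatch, one gap and one match.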

Pairwise Alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT

An alignment of a and b:
C---TTAACT
CGGATCA--T

[Figure labels: insertion gap, match, mismatch, deletion gap]

Alignment Graph
Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure: grid with sequence b (C G G A T C A T) along the top and sequence a (C T T A A C T) down the side; the alignment below is drawn on the grid, with its insertion gap and deletion gap marked.]
C---TTAACT
CGGATCA--T

Graphic representation of an alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure, built up over five slides: the alignment below is traced step by step through the grid of sequence b (columns) against sequence a (rows).]
C---TTAACT
CGGATCA--T

Pathway of an alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure: the complete pathway of the alignment below through the grid.]
C---TTAACT
CGGATCA--T

Graphic representation and pathway of a second alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure: a different pathway through the same grid, corresponding to the alignment below.]
CTTAACT-
CGGATCAT

Use of graph to generate alignments
Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure, three slides: each pathway through the grid generates a different alignment.]

-CTTAACT
CGGATCAT

-C--TTAACT
CGGATC-AT-

CTTAACT---
--CGGATCAT

Which pathway is better?
Sequence a: CTTAACT
Sequence b: CGGATCAT

• There are multiple pathways through the grid.
• Each pathway has its own score.

Alignment Score
Sequence a: CTTAACT
Sequence b: CGGATCAT

C---TTAACT
CGGATCA--T

Running score along the pathway, one column at a time:
Column 1 (C/C, match): 8
Column 2 (-/G, gap): 8 - 3 = 5
Column 3 (-/G, gap): 5 - 3 = 2
Column 4 (-/A, gap): 2 - 3 = -1
Column 5 (T/T, match): -1 + 8 = 7
Column 6 (T/C, mismatch): 7 - 5 = 2
Column 7 (A/A, match): 2 + 8 = 10
Column 8 (A/-, gap): 10 - 3 = 7
Column 9 (C/-, gap): 7 - 3 = 4
Column 10 (T/T, match): 4 + 8 = 12

Alignment score: 12
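
The running score can be recomputed mechanically from the two gapped rows. This is a small sketch under the +8 / -5 / -3 scheme introduced earlier; the helper name `w` and the variable names are illustrative. Note that column 6 pairs T with C, which this scheme scores as a mismatch (-5) rather than as a gap.

```python
from itertools import accumulate


def w(x, y, match=8, mismatch=-5, gap=-3):
    """Column score: gap if either symbol is '-', otherwise match or mismatch."""
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


row_a = "C---TTAACT"
row_b = "CGGATCA--T"

# One score per alignment column, then the cumulative totals along the pathway.
column_scores = [w(x, y) for x, y in zip(row_a, row_b)]
running_totals = list(accumulate(column_scores))
total = running_totals[-1]
```

Here `running_totals` holds the score after each column and `total` is the score of the whole alignment.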

An optimal alignment: the alignment of maximum score
• Let A = a1a2…am and B = b1b2…bn.
• Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj.
• With proper initializations, Si,j can be computed as follows:

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j-1} + w(a_i, b_j)
\end{cases}
$$

Computing Si,j
[Figure: grid cell (i, j) with its three incoming moves, labelled w(ai, -), w(-, bj) and w(ai, bj); the final score is Sm,n.]

Initializations
Gap symbol: -3

S0,0 = 0
S0,1 = -3, S0,2 = -6, S0,3 = -9, S0,4 = -12, S0,5 = -15, S0,6 = -18, S0,7 = -21, S0,8 = -24
S1,0 = -3, S2,0 = -6, S3,0 = -9, S4,0 = -12, S5,0 = -15, S6,0 = -18, S7,0 = -21

Row 0 (above b = C G G A T C A T): 0 -3 -6 -9 -12 -15 -18 -21 -24
Column 0 (beside a = C T T A A C T): 0 -3 -6 -9 -12 -15 -18 -21
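
With these initializations, the recurrence for Si,j fills the table row by row. The sketch below assumes the +8 / -5 / -3 scores from the earlier slides; the function name `global_table` is illustrative. This dynamic program is commonly known as the Needleman-Wunsch algorithm, a name the slides themselves do not use.

```python
def global_table(a, b, match=8, mismatch=-5, gap=-3):
    """Return the (len(a)+1) x (len(b)+1) table S of optimal global alignment scores."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]

    # Initializations: aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap

    # Recurrence: best of the diagonal move, the move from above, and the move from the left.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diagonal, S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S
```

Calling `global_table("CTTAACT", "CGGATCAT")` returns a table whose last entry, `S[7][8]`, is the optimal global alignment score.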

S1,1 = ?
Match: 8, Mismatch: -5, Gap symbol: -3

Option 1: S1,1 = S0,0 + w(a1, b1) = 0 + 8 = 8
Option 2: S1,1 = S0,1 + w(a1, -) = -3 - 3 = -6
Option 3: S1,1 = S1,0 + w(-, b1) = -3 - 3 = -6
Optimal: S1,1 = 8

[Table: the initialized grid, with cell (1, 1) marked "?".]

S1,2 = ?
Match: 8, Mismatch: -5, Gap symbol: -3

Option 1: S1,2 = S0,1 + w(a1, b2) = -3 - 5 = -8
Option 2: S1,2 = S0,2 + w(a1, -) = -6 - 3 = -9
Option 3: S1,2 = S1,1 + w(-, b2) = 8 - 3 = 5
Optimal: S1,2 = 5

[Table: row 1 so far reads -3, 8, ?]

S2,1 = ?
Match: 8, Mismatch: -5, Gap symbol: -3

Option 1: S2,1 = S1,0 + w(a2, b1) = -3 - 5 = -8
Option 2: S2,1 = S1,1 + w(a2, -) = 8 - 3 = 5
Option 3: S2,1 = S2,0 + w(-, b1) = -6 - 3 = -9
Optimal: S2,1 = 5

[Table: row 1 so far reads -3, 8, 5; row 2 reads -6, ?]

S2,2 = ?
Match: 8, Mismatch: -5, Gap symbol: -3

Option 1: S2,2 = S1,1 + w(a2, b2) = 8 - 5 = 3
Option 2: S2,2 = S1,2 + w(a2, -) = 5 - 3 = 2
Option 3: S2,2 = S2,1 + w(-, b2) = 5 - 3 = 2
Optimal: S2,2 = 3

[Table: row 1 so far reads -3, 8, 5; row 2 reads -6, 5, ?]
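
S3,5 = ?

|   |    | C  | G  | G  | A   | T   | C   | A   | T   |
|---|----|----|----|----|-----|-----|-----|-----|-----|
|   | 0  | -3 | -6 | -9 | -12 | -15 | -18 | -21 | -24 |
| C | -3 | 8  | 5  | 2  | -1  | -4  | -7  | -10 | -13 |
| T | -6 | 5  | 3  | 0  | -3  | 7   | 4   | 1   | -2  |
| T | -9 | 2  | 0  | -2 | -5  | ?   |     |     |     |

The remaining rows (A, A, C, T) are not yet filled in.

Option 1: S3,5 = S2,4 + w(a3, b5) = -3 + 8 = 5
Option 2: S3,5 = S2,5 + w(a3, -) = 7 - 3 = 4
Option 3: S3,5 = S3,4 + w(-, b5) = -5 - 3 = -8
Optimal: S3,5 = 5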

S3,5 = 5 (completed table)

|   |     | C   | G   | G   | A   | T   | C   | A   | T   |
|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|
|   | 0   | -3  | -6  | -9  | -12 | -15 | -18 | -21 | -24 |
| C | -3  | 8   | 5   | 2   | -1  | -4  | -7  | -10 | -13 |
| T | -6  | 5   | 3   | 0   | -3  | 7   | 4   | 1   | -2  |
| T | -9  | 2   | 0   | -2  | -5  | 5   | 2   | -1  | 9   |
| A | -12 | -1  | -3  | -5  | 6   | 3   | 0   | 10  | 7   |
| A | -15 | -4  | -6  | -8  | 3   | 1   | -2  | 8   | 5   |
| C | -18 | -7  | -9  | -11 | 0   | -2  | 9   | 6   | 3   |
| T | -21 | -10 | -12 | -14 | -3  | 8   | 6   | 4   | 14  |

Optimal score: S7,8 = 14

Optimal alignment, traced back through the completed table from S7,8:
C T T A A C - T
C G G A T C A T

8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
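
The alignment itself is recovered by walking back from Sm,n to S0,0, at each cell following a move that explains the cell's value. The sketch below is one way to do this, assuming the same +8 / -5 / -3 scores; the function name `global_align` and the tie-breaking order (diagonal first, then up, then left) are choices made here, and a different order may return a different alignment of the same score.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, gapped_a, gapped_b) for one optimal global alignment."""
    def w(x, y):
        return match if x == y else mismatch

    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the top-left corner.
    i, j = m, n
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))
```

For the two example sequences, `global_align("CTTAACT", "CGGATCAT")` returns the score together with the two gapped rows.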

Local vs. Global Sequence Alignment

Example:
DNA sequence a: ATTCTTGC
DNA sequence b: ATCCTATTCTAGC

Local alignment: ATTCTTGC against ATCCTATTCTAGC, with one gap. Gaps flanking the aligned segment are ignored in local alignments.
Global alignment: AT TCTT GC against ATCCTATTCTAGC, with two gaps. Gaps are counted in global alignments.

Global Alignment vs. Local Alignment
• global alignment: All sections are counted
• local alignment: Only local sections (normally separated by gaps) are counted

An optimal local alignment
• Si,j: the score of an optimal local alignment ending at ai and bj.
• With proper initializations, Si,j can be computed as follows:

$$
S_{i,j} = \max
\begin{cases}
0 \\
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j-1} + w(a_i, b_j)
\end{cases}
$$

Initializations
Match: 8, Mismatch: -5, Gap symbol: -3

Row 0 (above b = C G G A T C A T): 0 0 0 0 0 0 0 0 0
Column 0 (beside a = C T T A A C T): 0 0 0 0 0 0 0 0

S1,1 = ?
Match: 8, Mismatch: -5, Gap symbol: -3

Option 1: S1,1 = S0,0 + w(a1, b1) = 0 + 8 = 8
Option 2: S1,1 = S0,1 + w(a1, -) = 0 - 3 = -3
Option 3: S1,1 = S1,0 + w(-, b1) = 0 - 3 = -3
Option 4: S1,1 = 0
Optimal: S1,1 = 8

[Table: the initialized grid of zeros, with cell (1, 1) marked "?".]

local alignment
Match: 8, Mismatch: -5, Gap symbol: -3

|   |   | C | G | G | A | T | C | A | T  |
|---|---|---|---|---|---|---|---|---|----|
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0  |
| C | 0 | 8 | 5 | 2 | 0 | 0 | 8 | 5 | 2  |
| T | 0 | 5 | 3 | 0 | 0 | 8 | 5 | 3 | 13 |
| T | 0 | 2 | 0 | 0 | 0 | 8 | 5 | 2 | 11 |
| A | 0 | 0 | 0 | 0 | 8 | 5 | 3 | ? |    |

The remaining rows (A, C, T) are not yet filled in.

local alignment

|   |   | C | G | G | A | T  | C  | A  | T  |
|---|---|---|---|---|---|----|----|----|----|
|   | 0 | 0 | 0 | 0 | 0 | 0  | 0  | 0  | 0  |
| C | 0 | 8 | 5 | 2 | 0 | 0  | 8  | 5  | 2  |
| T | 0 | 5 | 3 | 0 | 0 | 8  | 5  | 3  | 13 |
| T | 0 | 2 | 0 | 0 | 0 | 8  | 5  | 2  | 11 |
| A | 0 | 0 | 0 | 0 | 8 | 5  | 3  | 13 | 10 |
| A | 0 | 0 | 0 | 0 | 8 | 5  | 2  | 11 | 8  |
| C | 0 | 8 | 5 | 2 | 5 | 3  | 13 | 10 | 7  |
| T | 0 | 5 | 3 | 0 | 2 | 13 | 10 | 8  | 18 |

The best score: S7,8 = 18

A - C - T
A T C A T
8 - 3 + 8 - 3 + 8 = 18
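
The local variant differs from the global one in three places: the borders are zero, every cell is floored at zero, and the traceback starts at the best cell and stops at the first zero. The sketch below assumes the same +8 / -5 / -3 scores; the name `local_align` and the tie-breaking order are choices made here. This dynamic program is commonly known as the Smith-Waterman algorithm, a name the slides do not use.

```python
def local_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (best_score, gapped_a, gapped_b) for one optimal local alignment."""
    def w(x, y):
        return match if x == y else mismatch

    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay at 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(0,
                          S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_i, best_j = S[i][j], i, j

    # Traceback from the best cell until a cell with score 0 is reached.
    i, j = best_i, best_j
    row_a, row_b = [], []
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))
```

For the example sequences, `local_align("CTTAACT", "CGGATCAT")` returns the best local score together with the two gapped segments.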

BLAST: Basic Local Alignment Search Tool

Procedure:
• Divide all sequences into overlapping constituent words (size k).
• Build the hash table for Sequence a.
• Scan Sequence b for hits.
• Extend hits.
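
The first three steps of the procedure can be illustrated with exact word matching. This is a toy sketch only, not the BLAST implementation: the function names are hypothetical, and it looks for identical words, whereas the later slides mark word pairs by a similarity score threshold.

```python
from collections import defaultdict


def build_word_table(seq_a, k):
    """Map every overlapping word of length k in seq_a to its start positions."""
    table = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        table[seq_a[i:i + k]].append(i)
    return table


def find_hits(table, seq_b, k):
    """Scan seq_b and return (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in table.get(seq_b[j:j + k], []):
            hits.append((i, j))
    return hits
```

With the DNA example from the earlier slides, `find_hits(build_word_table("ATTCTTGC", 3), "ATCCTATTCTAGC", 3)` yields hits that all lie on the same diagonal (j - i = 5), which is the region a later extension step would grow.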

BLAST: Basic Local Alignment Search Tool
Step 1: Hash table for Sequence a

Amino acid similarity matrix PAM 120
Instead of the simple values +8 and -5 for matches and mismatches, this statistically derived score matrix is used to rank the level of similarity between two amino acids.

Amino acid similarity matrix PAM 250
This is a more widely used score matrix for ranking the level of similarity of two amino acids. It is derived by considering more diverse sets of data and a larger number of statistical steps.

Amino acid similarity matrix Blosum 45
The Blosum matrices were calculated using data from the BLOCKS database, which contains alignments of more distantly related proteins. In principle, Blosum matrices should be more realistic for comparing distantly related proteins, but may introduce error for conventional proteins.

BLAST: Basic Local Alignment Search Tool

Step 2:
Use all of the 2-letter words in the query sequence to scan against the database sequence and mark those with score > 8.

Example word-pair scores: LN:LN = 9, NF:NY = 8, GW:PW = 10

Note:
Marked points can be on the diagonal and off-diagonal.

BLAST
Step 2: Scan sequence b for hits.
Step 3: Extend hits.
Terminate if the score of the extension fades away.
BLAST 2.0 saves the time spent in extension, and considers gapped alignments.
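
One way to read "terminate if the score of the extension fades away" is a drop-off rule: keep extending along the diagonal, remember the best score seen, and stop once the running score falls too far below it. The sketch below is an ungapped illustration under that assumption, not the BLAST implementation. The function name, the `x_drop` parameter and its default of 10 are hypothetical, and the +8 / -5 scores are borrowed from the earlier DNA slides.

```python
def extend_hit(a, b, i, j, k, match=8, mismatch=-5, x_drop=10):
    """Extend the exact word hit a[i:i+k] == b[j:j+k] in both directions, without gaps.

    Returns (score, start, end), where a[start:end] is the extended segment of a.
    """
    def extend(pa, pb, step):
        score = best = 0
        length = best_len = 0
        while 0 <= pa < len(a) and 0 <= pb < len(b):
            score += match if a[pa] == b[pb] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # the extension score has faded away
            pa += step
            pb += step
        return best, best_len

    right_score, right_len = extend(i + k, j + k, +1)
    left_score, left_len = extend(i - 1, j - 1, -1)
    return k * match + left_score + right_score, i - left_len, i + k + right_len
```

For the hit at positions (0, 5) with k = 3 in the sequences ATTCTTGC and ATCCTATTCTAGC, the extension runs through one mismatch and covers all of sequence a; with a stricter `x_drop=3` it stops before the mismatch.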

Multiple sequence alignment (MSA)
• The multiple sequence alignment problem is to simultaneously align more than two sequences.

Seq1: GCTC
Seq2: AC
Seq3: GATC

GC-TC
A---C
G-ATC

How to score an MSA?
• Sum-of-Pairs (SP-score): the score of the MSA is the sum of the scores of all pairwise alignments it contains.

GC-TC
A---C
G-ATC

Score(GC-TC, A---C) = -5 - 3 + 8 - 3 + 8 = 5
Score(GC-TC, G-ATC) = 8 - 3 - 3 + 8 + 8 = 18
Score(A---C, G-ATC) = -5 + 8 - 3 - 3 + 8 = 5

SP-score = 5 + 18 + 5 = 28

These sums score a column that pairs two gap symbols as a match (+8). With w(-, -) = 0 instead, the three pair scores would be -3, 18 and -3.
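
The SP-score is a sum over all pairs of rows and all columns. The sketch below assumes the +8 / -5 / -3 scores; the function names and the `gap_gap` parameter are illustrative. Its default of 8 reproduces the arithmetic on the slide, where a column pairing two gap symbols is counted as a match; setting it to 0 is a common alternative.

```python
from itertools import combinations


def column_score(x, y, match=8, mismatch=-5, gap=-3, gap_gap=8):
    """Score one column of a pairwise alignment taken from an MSA."""
    if x == "-" and y == "-":
        return gap_gap
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sp_score(rows, **scores):
    """Sum-of-pairs score: add up column scores over every pair of rows."""
    return sum(column_score(x, y, **scores)
               for row1, row2 in combinations(rows, 2)
               for x, y in zip(row1, row2))
```

For the three-row alignment above, `sp_score(["GC-TC", "A---C", "G-ATC"])` gives the slide's value, and passing `gap_gap=0` shows how much of it comes from the gap-gap columns.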

Position Specific Iterated BLAST (PSI-BLAST)
• PSI-BLAST is a rather permissive alignment tool, and it can find more distantly related sequences than FASTA or BLAST.
• In particular, in many cases it is much more sensitive to weak but biologically relevant sequence similarities.

Position Specific Iterated BLAST (PSI-BLAST)
PSI-BLAST is used for:
• Distant homology detection
• Fold assignment: profile-profile comparison
• Domain identification
• Evolutionary analysis (e.g. tree building)
• Sequence annotation / function assignment
• Profile export to other programs
• Sequence clustering
• Structural genomics target selection

Position Specific Iterated BLAST (PSI-BLAST)
• Collect all database sequence segments that have been aligned with the query sequence with an E-value below a set threshold (default 0.001).
• Construct a position specific scoring matrix for the collected sequences. Rough idea:
  – Align all sequences to the query sequence as the template.
  – Assign weights to the sequences.
  – Construct the position specific scoring matrix.
• Iterate.
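
The matrix-construction step can be sketched as per-column residue counts turned into log-odds scores. This is a simplified illustration, not PSI-BLAST's actual procedure: it gives every sequence the same weight, uses a uniform background over the 20 amino acids, and adds a fixed pseudocount, all of which are assumptions made here. The function name `build_pssm` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2 odds score.

    Assumes equal sequence weights and a uniform background (simplifications).
    Gap symbols and any characters outside the alphabet are ignored.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            r: math.log2(((counts[r] + pseudocount) / total) / background)
            for r in alphabet
        })
    return pssm
```

Applied to a set of aligned rows such as MGLLTREIF--ILQQ, FGLGRT-I-T-YMTN, -GLVRT-I---LGLE and FGLLRT-I---YMTQ, a fully conserved column (here the G in the second position) receives a positive score for the conserved residue and negative scores for the others.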

How PSI-BLAST works
• Take a sequence:
  MGLLTREIF--ILQQ
• Search for similar sequences in a full sequence database.
• Sequences are multiply aligned:
  MGLLTREIF--ILQQ
  FGLGRT-I-T-YMTN
  -GLVRT-I---LGLE
  FGLLRT-I---YMTQ
• Construct a profile, and represent conservation in each position numerically:
  A 029001100003200
  C 000070000000000
  ..
  Y 002000080202000
• The profile holds more information than a single sequence: use the profile to retrieve additional sequences.
• New sequences in the multiple alignment:
  FGLLRT-I-T-YMTN
  -RLTRD-I---LGLY
  FGLLRT-I---FMTS
• Construct a new profile:
  A 027005101003200
  C 000070000000000
  ..
  Y 202000060202000

After several iterations of this procedure we have:
• Sequence information, including links to annotation
• Several sets of multiple alignments
• Profiles, derived by us or by PSI-BLAST
• Threshold information (alignment statistics)

Consensus sequence
• A sequence where each position is defined by majority vote based on a multiple sequence alignment. Use the consensus sequence for database search.

PEAINYGRFTPFS I KSDVW
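
A majority vote per column is straightforward to write down. The sketch below makes two assumptions that the slide does not specify: gap symbols do not vote, and ties go to the residue seen first in the column. The function name `consensus` is illustrative.

```python
from collections import Counter


def consensus(aligned):
    """Return the majority-vote residue for each column of an alignment.

    Gap symbols are ignored (an assumption); an all-gap column yields '-'.
    """
    result = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if residues:
            result.append(Counter(residues).most_common(1)[0][0])
        else:
            result.append("-")
    return "".join(result)
```

For the three-row MSA from the earlier slide, `consensus(["GC-TC", "A---C", "G-ATC"])` takes G from the first column (two votes against one) and the only available residue from each of the others.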

Flow chart of PSI-BLAST
• Take a sequence (MGLLTREIF--ILQQ).
• Search for similar sequences in a full sequence database.
• Sequences are multiply aligned.
• Construct a profile, and represent conservation in each position numerically.
• The profile holds more information than a single sequence: use the profile to search for similar sequences in a full sequence database.
• New sequences are added to the multiple alignment.
• Construct a new profile and start a new iteration.

PSI-BLAST
NCBI PSI-BLAST tutorial:
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html

Use of PSI-BLAST to probe the function of a viral protein

PEAINYGRFTPFS I KSDVW

Summary of Today's lecture
• Sequence alignment methods revisited:
  – Pair-wise alignment
  – Multiple sequence alignment
  – BLAST
  – PSI-BLAST
• Use of PSI-BLAST to probe protein function