Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.
![Page 1: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/1.jpg)
Gapped BLAST and PSI-BLAST
Altschul et al.
Presenters: 張耿豪, 莊凱翔
![Page 2: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/2.jpg)
Outline
• BLAST 1.0 background (from lecture slides)
• BLAST 2.0
• Gapped BLAST
• PSI-BLAST
• Demonstration
![Page 3: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/3.jpg)
Statistical preliminaries
Pi: background probability with which amino acid i occurs at random at any position
E: expected number of distinct HSPs with normalized score at least S
sij: substitution score for aligning the pair of letters (i, j)
qij: target frequency of the aligned pair of letters (i, j) within HSPs (high-scoring segment pairs)
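As a rough illustration of how E relates to a score, the sketch below uses the standard Karlin-Altschul relations: the normalized (bit) score is S' = (λS - ln K) / ln 2, and E = N / 2^S' with N the product of the query and database lengths. The function names and the λ and K values in the usage line are illustrative assumptions, not values taken from these slides.

```python
import math

def normalized_score(raw_score, lam, k):
    """Convert a raw alignment score S into a normalized (bit) score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expected_hsps(raw_score, query_len, db_len, lam, k):
    """Expected number of distinct HSPs with normalized score at least S'."""
    search_space = query_len * db_len  # N = m * n
    return search_space / 2 ** normalized_score(raw_score, lam, k)

# Placeholder parameters: lam and k depend on the scoring system in use.
e_value = expected_hsps(raw_score=80, query_len=250, db_len=50_000_000,
                        lam=0.267, k=0.041)
```

Because 2^S' = e^(λS) / K, this is the same quantity as K·m·n·e^(-λS); the bit-score form simply hides λ and K from the user.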
![Page 4: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/4.jpg)
![Page 5: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/5.jpg)
BLAST
Basic Local Alignment Search Tool (by Altschul, Gish, Miller, Myers and Lipman)
The central idea of the BLAST algorithm is that a statistically significant alignment is likely to contain a high-scoring pair of aligned words.
![Page 6: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/6.jpg)
The maximal segment pair measure
A maximal segment pair (MSP) is defined to be the highest-scoring pair of identical-length segments chosen from two sequences. (For DNA: identities +5; mismatches -4.)
• The MSP score can be computed exactly in time proportional to the product of the sequence lengths. (How?) For database searches, such an exact procedure is too time-consuming.
• BLAST heuristically attempts to calculate the MSP score.
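One possible answer to the "How?" above is a simple dynamic program over diagonals: the best ungapped segment pair ending at positions (i, j) either extends the best one ending at (i-1, j-1) or starts fresh. The sketch below assumes the DNA scores from the slide (+5 / -4) and treats the empty segment as scoring 0; the function name is illustrative.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Exact MSP score of sequences a and b in O(len(a) * len(b)) time."""
    best = 0
    prev = [0] * (len(b) + 1)  # best scores for segment pairs ending in the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Extend along the diagonal, or restart if the running score is negative.
            cur[j] = max(0, prev[j - 1] + s)
            best = max(best, cur[j])
        prev = cur
    return best

print(msp_score("GATTACA", "GATCACA"))  # 26: six identities and one mismatch
```

Every cell of the table is visited once, which is exactly the cost that becomes prohibitive when the second sequence is an entire database.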
![Page 7: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/7.jpg)
BLAST
1) Build the hash table for Sequence A.
2) Scan Sequence B for hits.
3) Extend hits.
![Page 8: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/8.jpg)
BLAST
Step 1: Build the hash table for Sequence A. (3-tuple example)
For DNA sequences:
Seq. A = AGATCGAT (positions 1 to 8)
Word   Positions
AAA
AAC
...
AGA    1
...
ATC    3
...
CGA    5
...
GAT    2, 6
...
TCG    4
...
TTT
For protein sequences:
Seq. A = ELVIS
Add xyz to the hash table if Score(xyz, ELV) ≥ T;
Add xyz to the hash table if Score(xyz, LVI) ≥ T;
Add xyz to the hash table if Score(xyz, VIS) ≥ T;
The higher T is, the lower the sensitivity, but the faster the search
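A minimal sketch of Step 1, assuming a Python dict stands in for the hash table. `build_word_index` covers the exact-word DNA case and `build_neighborhood_index` the thresholded protein case. The function names and `toy_score` are illustrative assumptions; a real protein search would use a substitution matrix such as BLOSUM62 in place of `toy_score`.

```python
from itertools import product

def build_word_index(seq, w=3):
    """Map each w-letter word of seq to its 1-based start positions."""
    index = {}
    for start in range(len(seq) - w + 1):
        index.setdefault(seq[start:start + w], []).append(start + 1)
    return index

def build_neighborhood_index(seq, alphabet, score, threshold, w=3):
    """Index every word xyz whose score against some word of seq is >= threshold."""
    index = {}
    for start in range(len(seq) - w + 1):
        word = seq[start:start + w]
        for candidate in product(alphabet, repeat=w):
            total = sum(score(x, y) for x, y in zip(candidate, word))
            if total >= threshold:
                index.setdefault("".join(candidate), []).append(start + 1)
    return index

def toy_score(x, y):
    """Stand-in pair score (assumption): +5 for identity, -4 otherwise."""
    return 5 if x == y else -4

dna_index = build_word_index("AGATCGAT")
print(dna_index["GAT"])  # [2, 6]
```

Raising `threshold` shrinks the neighborhood of each query word, which is the sensitivity-for-speed trade described on the slide.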
![Page 9: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/9.jpg)
BLAST
Step 2: Scan sequence B for hits.
![Page 10: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/10.jpg)
BLAST
Step 2: Scan sequence B for hits.
Step 3: Extend hits.
Terminate when the score of the extension fades away. (That is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions.)
BLAST 2.0 saves the time spent in extension, and considers gapped alignments.
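The termination rule in Step 3 can be sketched as follows. This is a simplified illustration, not the BLAST source: `x_drop` is an assumed name for the "certain distance", positions are 0-based, and `score` is any pair-scoring function supplied by the caller.

```python
def extend_one_way(a, b, i, j, step, x_drop, score):
    """Extend from (i, j) in direction step; return (best gain, letters kept)."""
    best = cur = 0
    best_len = n = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        cur += score(a[i], b[j])
        n += 1
        if cur > best:
            best, best_len = cur, n
        elif best - cur > x_drop:
            break  # the score has faded too far below the best seen so far
        i += step
        j += step
    return best, best_len

def extend_hit(a, b, i, j, w, x_drop, score):
    """Ungapped extension of a length-w hit at a[i:i+w], b[j:j+w].

    Returns (score, start in a, end in a), with the end exclusive.
    """
    seed = sum(score(a[i + k], b[j + k]) for k in range(w))
    right, r_len = extend_one_way(a, b, i + w, j + w, +1, x_drop, score)
    left, l_len = extend_one_way(a, b, i - 1, j - 1, -1, x_drop, score)
    return seed + left + right, i - l_len, i + w + r_len

dna = lambda x, y: 5 if x == y else -4
result = extend_hit("CCAGATCGATCC", "TTAGATCGATGG", 4, 4, 3, 10, dna)
print(result)  # (40, 2, 10)
```

The extension keeps only the prefix that reached the best score, so trailing mismatches explored before the drop-off are not part of the reported segment pair.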
![Page 11: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/11.jpg)
![Page 12: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/12.jpg)
Two-Hit Method
BLAST 1.0: the extension step accounts for 90% of total time
Observations:
• An HSP of interest is much longer than a single word pair
• It may therefore entail multiple hits on the same diagonal, within a short distance of one another
Two-hit method: invoke an extension only when two non-overlapping hits are found within distance A on the same diagonal
![Page 13: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/13.jpg)
Demonstration
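Recent[i]: position of the most recent hit found on the ith diagonal (always increasing)
New hit at distance > A from Recent[i]: no extension
New hit overlapping Recent[i]: ignored
New hit at distance < A from Recent[i]: extend!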
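The bookkeeping with Recent[i] can be sketched like this, assuming hits arrive as (query position, subject position) pairs and a dict keyed by diagonal plays the role of the Recent array. The names `two_hit_triggers`, `window` (for A) and `w` (word length) are illustrative.

```python
def two_hit_triggers(hits, w, window):
    """Return the hits that would trigger an extension under the two-hit rule."""
    recent = {}  # diagonal -> subject position of the most recent hit
    triggers = []
    for q_pos, s_pos in sorted(hits, key=lambda h: h[1]):
        diagonal = s_pos - q_pos
        previous = recent.get(diagonal)
        if previous is not None:
            distance = s_pos - previous
            if distance < w:
                continue  # overlaps the previous hit: ignore it
            if distance < window:
                triggers.append((q_pos, s_pos))  # second hit close enough: extend
        recent[diagonal] = s_pos
    return triggers

hits = [(0, 10), (1, 11), (5, 15), (50, 60), (3, 40)]
print(two_hit_triggers(hits, w=3, window=40))  # [(5, 15)]
```

Only one value per diagonal is stored, so the memory cost stays proportional to the number of diagonals rather than the number of hits.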
![Page 14: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/14.jpg)
Discussion
T must be lowered to keep sensitivity: more single hits are found, but the majority are dismissed without extension
Speed: twice as fast as the one-hit method
Sensitivity: almost the same
![Page 15: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/15.jpg)
![Page 16: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/16.jpg)
Gapped BLAST
Original BLAST: find several distinct HSPs
All HSPs related to one alignment should be found
Now:
Find only one HSP (the seed), using the two-hit method; T can be raised, so the search is faster
Find all HSPs vs. find one HSP for one optimal alignment
For example, suppose a result should be found with probability > 0.95, and p is the probability of missing any one HSP
Original, with 2 HSPs: (1-p)(1-p) > 0.95, so p < 0.025
Now: p^2 < 0.05, so p can be as high as 0.22
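The arithmetic in this example can be checked with a few lines. The function names are illustrative; the only inputs are the 0.95 target and the two HSPs from the slide, and independence of the two misses is assumed.

```python
def miss_bound_all_needed(target, n_hsps):
    """Largest p with (1 - p)^n >= target: every HSP must be found."""
    return 1 - target ** (1 / n_hsps)

def miss_bound_any_needed(target, n_hsps):
    """Largest p with 1 - p^n >= target: finding a single HSP is enough."""
    return (1 - target) ** (1 / n_hsps)

original = miss_bound_all_needed(0.95, 2)  # about 0.025
gapped = miss_bound_any_needed(0.95, 2)    # about 0.22
```

The tolerated miss probability per HSP grows by roughly a factor of nine, which is the room that allows T to be raised.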
![Page 17: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/17.jpg)
Gapped BLAST (contd)
A gapped extension takes much longer to execute than an ungapped extension, but by performing very few of them the fraction of the total time can be kept low.
Trigger a gapped extension for any HSP exceeding score Sg
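To give a feel for why a gapped extension costs more, here is a deliberately simplified sketch: a one-directional dynamic program with a linear gap penalty that stops exploring cells whose score falls more than `x_drop` below the best seen. Gapped BLAST itself uses affine gap costs and extends in both directions from a seed point inside the HSP; the function name, the linear penalty and all parameter values here are assumptions for illustration.

```python
NEG = float("-inf")

def gapped_extend(a, b, gap, x_drop, score):
    """Best score of a gapped alignment of a prefix of a with a prefix of b."""
    best = 0
    prev = [0] + [NEG] * len(b)
    for j in range(1, len(b) + 1):  # first row: leading gaps only
        v = prev[j - 1] - gap
        prev[j] = v if best - v <= x_drop else NEG
    for i in range(1, len(a) + 1):
        cur = [NEG] * (len(b) + 1)
        v = prev[0] - gap
        cur[0] = v if best - v <= x_drop else NEG
        for j in range(1, len(b) + 1):
            v = max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two letters
                    prev[j] - gap,                            # gap in b
                    cur[j - 1] - gap)                         # gap in a
            best = max(best, v)
            cur[j] = v if best - v <= x_drop else NEG  # prune faded cells
        if all(c == NEG for c in cur):
            break  # every path has dropped too far: stop early
        prev = cur
    return best

dna = lambda x, y: 5 if x == y else -4
print(gapped_extend("ACGTACGT", "ACGTTACGT", gap=6, x_drop=20, score=dna))  # 34
```

A driver would call this only for HSPs whose ungapped score exceeds Sg, which is how the number of expensive extensions is kept small.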
![Page 18: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/18.jpg)
Example
Original BLAST locates only the first and the last ungapped alignments; their combined E-value is more than 50 times that of the gapped alignment
![Page 19: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/19.jpg)
![Page 20: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/20.jpg)
PSI-BLAST: position-specific score matrices
Position-specific score matrices vs. substitution matrices: searched with in the same way as an ordinary query and matrix
Iterated search using position-specific score matrices:
• For a BLAST run, the matrix is constructed automatically from the output
• Use this matrix in place of the query for the next run
For proteins with |query| = L, the position-specific matrix is L * 20
Benefits: better at detecting weak relationships
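The iteration can be outlined as a short driver loop. This is only a skeleton under stated assumptions: `search` and `build_pssm` are hypothetical callables supplied by the caller (they are not functions of any real BLAST library), `search` returns a collection of hit identifiers, and the loop stops when a round finds no change in the hit set or after `max_iter` rounds.

```python
def psi_blast(query, search, build_pssm, max_iter=5):
    """Iterate search -> build matrix -> search until the hit set is stable."""
    profile = None  # first round: plain query with an ordinary substitution matrix
    seen = set()
    for _ in range(max_iter):
        hits = set(search(query, profile))
        if hits == seen:
            break  # converged: this round found nothing new
        seen = hits
        profile = build_pssm(query, hits)  # L x 20 matrix for the next round
    return profile, seen

# Stub callables, purely to show the control flow.
stub_search = lambda q, profile: {"a"} if profile is None else {"a", "b"}
stub_build = lambda q, hits: len(hits)
final_profile, final_hits = psi_blast("ELVIS", stub_search, stub_build)
```

Passing the two steps in as callables keeps the sketch independent of how the search and the matrix construction are actually implemented.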
![Page 21: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/21.jpg)
Construct Position-specific matrix
1. Construct multiple alignment M from the output
2. For every column C of M:
1) Find the reduced alignment Mc of column C
2) Calculate scores in column C of the position-specific matrix
![Page 22: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/22.jpg)
Construct multiple alignment M
Collect the sequence segments in the output with E-value below a threshold (why?)
Identical sequences are dropped
Pairwise-alignment columns in which a gap is inserted into the query are ignored
The multiple alignment M has the same length (number of columns) as the query
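A minimal sketch of this construction, assuming each pairwise alignment is given as a tuple (E-value, 0-based query start, aligned query string, aligned subject string) with "-" marking gaps. The function name, the tuple layout and the use of a blank to mean "sequence not represented in this column" are all illustrative choices.

```python
def build_query_anchored_msa(query, pairwise, e_threshold=0.01):
    """Stack pairwise alignments into rows that all have the query's length."""
    rows = [query]
    seen = {query}
    for e_value, q_start, q_aln, s_aln in pairwise:
        if e_value >= e_threshold:
            continue  # keep only alignments below the E-value threshold
        row = [" "] * len(query)  # blank: this sequence does not cover the column
        pos = q_start
        for q_ch, s_ch in zip(q_aln, s_aln):
            if q_ch == "-":
                continue  # gap inserted into the query: column is ignored
            row[pos] = s_ch  # may be "-" where the subject has a gap
            pos += 1
        row = "".join(row)
        if row not in seen:  # drop identical rows
            seen.add(row)
            rows.append(row)
    return rows

alignments = [
    (1e-5, 0, "EL-VIS", "ELAVIT"),
    (1e-3, 1, "LVI", "L-I"),
    (1e-9, 0, "ELVIS", "ELVIS"),
    (1.0, 0, "ELVIS", "QQQQQ"),
]
msa = build_query_anchored_msa("ELVIS", alignments)
print(msa)  # ['ELVIS', 'ELVIT', ' L-I ']
```

Because insertions relative to the query are discarded, every row has exactly one character per query position, which is what lets the matrix be indexed by query position.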
![Page 23: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/23.jpg)
Construct multiple alignment M
![Page 24: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/24.jpg)
Calculate position-specific matrix score
The scores for a given alignment column should depend not only on the residues appearing in that column, but on those in other columns as well
![Page 25: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/25.jpg)
Find reduced Mc of column C
R: the set of sequences that contribute a residue in column C
Mc: those columns of M in which all the sequences in R are represented
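The two definitions translate directly into a pair of filters. The sketch assumes M is a list of equal-length strings in which a blank marks a position the sequence does not cover, and it treats a gap character "-" as "represented"; both conventions are assumptions of this example.

```python
def reduced_alignment(rows, c, blank=" "):
    """Return (R, columns of Mc) for column index c of the alignment rows."""
    # R: sequences that contribute something in column c.
    r = [row for row in rows if row[c] != blank]
    # Mc: columns in which every sequence of R is represented.
    columns = [k for k in range(len(rows[0]))
               if all(row[k] != blank for row in r)]
    return r, columns

m = ["ELVIS", "ELVIT", " L-I ", "  VIS"]
r, columns = reduced_alignment(m, 1)
print(columns)  # [1, 2, 3]
```

Restricting to Mc gives a block with no ragged edges, so sequence weights and Nc can be computed from fully populated columns.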
![Page 26: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/26.jpg)
Calculate scores in column C of the position-specific matrix
Scores depend on the observed residue frequencies fi and on the number of independent observations in column C (Nc)
Score for residue i: log(Qi/Pi)
Qi: estimated probability for residue i to be found in column C
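One way Qi can be estimated is the pseudocount scheme described in the Altschul et al. paper: Qi = (α·fi + β·gi) / (α + β), with gi = Σj fj·qij / Pj, α = Nc - 1 and β an empirical constant. The sketch below follows that scheme under stated assumptions: the function name, the dict-based inputs, the default β = 10 and the scale `lam` are illustrative, and the toy two-letter alphabet is used only so the numbers are easy to check by hand.

```python
import math

def pssm_column(f, p, q, n_c, beta=10.0, lam=1.0):
    """Scores log(Qi / Pi) / lam for one column.

    f: observed (weighted) frequencies, p: background probabilities,
    q: target frequencies q[i][j], n_c: independent observations in the column.
    """
    alpha = n_c - 1
    scores = {}
    for i in p:
        # Pseudocount frequency implied by the substitution matrix.
        g_i = sum(f[j] * q[j][i] / p[j] for j in p)
        q_i = (alpha * f[i] + beta * g_i) / (alpha + beta)
        scores[i] = math.log(q_i / p[i]) / lam
    return scores

p = {"A": 0.5, "B": 0.5}
q = {"A": {"A": 0.4, "B": 0.1}, "B": {"A": 0.1, "B": 0.4}}
f = {"A": 1.0, "B": 0.0}
few = pssm_column(f, p, q, n_c=1)    # pseudocounts dominate
many = pssm_column(f, p, q, n_c=11)  # observed data carry more weight
```

With few independent observations the scores stay close to those of the ordinary substitution matrix; as Nc grows, the observed frequencies take over.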
![Page 27: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/27.jpg)
Thank you
Any questions?
![Page 28: Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.](https://reader033.fdocuments.net/reader033/viewer/2022061518/56649c995503460f94955f71/html5/thumbnails/28.jpg)
Outline
BLAST 1.0 background (from lecture slides)
BLAST 2.0 Gapped BLAST PSI-BLAST Demonstration