Copyright (c) 2002 by SNU CSE Biointelligence Lab 1
Chap. 4 Pairwise alignment using HMMs
Biointelligence Laboratory
School of Computer Sci. & Eng.
Seoul National University
Seoul 151-742, Korea
This slide file is available online at
http://bi.snu.ac.kr/
Contents
• FSA → HMM
• Pair HMMs
• The full probability of x and y
• Suboptimal alignment
• Posterior probability that x_i is aligned to y_j
• Pair HMMs vs FSAs for searching
Figure 4.1 A finite state machine diagram for affine gap alignment on the left, and the corresponding probabilistic model on the right.
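[Diagram. Left (FSA): states X (+1,+0), M (+1,+1) and Y (+0,+1). Entering M scores s(x_i, y_j), opening a gap scores −d and extending it scores −e. Right (HMM): M emits the pair x_i, y_j with probability p_{x_i y_j}, X emits x_i with q_{x_i}, and Y emits y_j with q_{y_j}. Transitions M→X and M→Y have probability δ, X→X and Y→Y have ε, X→M and Y→M have 1−ε, and M→M has 1−2δ.]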
Recurrence Relation
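$$
\begin{aligned}
V^M(i,j) &= s(x_i,y_j) + \max\begin{cases} V^M(i-1,j-1)\\ V^X(i-1,j-1)\\ V^Y(i-1,j-1) \end{cases}\\
V^X(i,j) &= \max\begin{cases} V^M(i-1,j)-d\\ V^X(i-1,j)-e \end{cases}\\
V^Y(i,j) &= \max\begin{cases} V^M(i,j-1)-d\\ V^Y(i,j-1)-e \end{cases}
\end{aligned}
$$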
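The three-state recurrence can be sketched in Python as below. This is a minimal illustration, not the lab's code. The slide gives only the recurrence, so several details are assumptions: the function name `affine_gap_score`, the 0-based string indexing, and the boundary values V^M(0,0) = 0 with every other undefined cell at −∞.

```python
import math

NEG_INF = -math.inf


def affine_gap_score(x, y, s, d, e):
    """Best global alignment score with affine gaps (three-state recurrence).

    s(a, b) is the substitution score, d the gap-open and e the gap-extend
    penalty. Strings are 0-based, so x_i on the slide is x[i - 1] here.
    """
    n, m = len(x), len(y)
    VM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = 0.0  # assumption: start as if a match had just been made
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # x_i aligned to y_j
                VM[i][j] = s(x[i - 1], y[j - 1]) + max(
                    VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1])
            if i > 0:  # x_i against a gap: open from M or extend from X
                VX[i][j] = max(VM[i - 1][j] - d, VX[i - 1][j] - e)
            if j > 0:  # y_j against a gap: open from M or extend from Y
                VY[i][j] = max(VM[i][j - 1] - d, VY[i][j - 1] - e)
    return max(VM[n][m], VX[n][m], VY[n][m])
```

Example: with a +1/−1 substitution score, d = 3 and e = 1, aligning `"AT"` with `"ACT"` scores −1 (two matches, one gap opening).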
Pair HMMs (1)
FSA → HMM: how? By specifying emission and transition probabilities.
[Diagram: Figure 4.1 repeated, the affine-gap FSA and its probabilistic counterpart.]
Pair HMMs (2)
Define a begin state and an end state, so that the model provides a probability distribution over all possible sequence pairs.
A pair HMM is identical to an ordinary HMM, except that it emits a pairwise alignment rather than a single sequence.
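Figure 4.2 The full probabilistic version of Figure 4.1
[Diagram with states Begin, M, X, Y and End. M emits p_{x_i y_j}, X emits q_{x_i}, and Y emits q_{y_j}. Begin and M each go to M with probability 1−2δ−τ, to X and to Y with δ each, and to End with τ. X and Y each loop with ε, return to M with 1−ε−τ, and go to End with τ.]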
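Because the model in Figure 4.2 defines a full probability distribution, it can also be run generatively. The Python sketch below shows this by emitting one aligned sequence pair per call. The function name `sample_pair` is hypothetical, and the caller supplies the emission functions `p(a, b)` and `q(a)`. As in the figure, Begin is treated as having the same outgoing transitions as M.

```python
import random


def sample_pair(p, q, alphabet, delta, eps, tau, rng):
    """Run the pair HMM of Figure 4.2 generatively until it reaches End.

    Returns (x, y, states). p(a, b) is the match emission probability and
    q(a) the emission probability of a residue aligned to a gap.
    """
    pairs = [(a, b) for a in alphabet for b in alphabet]
    pair_weights = [p(a, b) for a, b in pairs]
    single_weights = [q(a) for a in alphabet]
    x, y, states = [], [], []
    state = "M"  # Begin has the same outgoing transitions as M
    while True:
        if state == "M":
            nxt = rng.choices("MXYE", weights=[1 - 2 * delta - tau, delta, delta, tau])[0]
        else:  # X or Y: stay (eps), back to M (1 - eps - tau) or End (tau)
            nxt = rng.choices([state, "M", "E"], weights=[eps, 1 - eps - tau, tau])[0]
        if nxt == "E":
            return "".join(x), "".join(y), states
        if nxt == "M":  # emit an aligned residue pair
            a, b = rng.choices(pairs, weights=pair_weights)[0]
            x.append(a)
            y.append(b)
        elif nxt == "X":  # emit a residue of x against a gap
            x.append(rng.choices(alphabet, weights=single_weights)[0])
        else:  # emit a residue of y against a gap
            y.append(rng.choices(alphabet, weights=single_weights)[0])
        states.append(nxt)
        state = nxt
```

Each M step extends both sequences, X extends only x, and Y extends only y. No X state is ever directly followed by a Y state, because the figure has no X→Y transition.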
Pair HMMs (3)
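Algorithm: Viterbi algorithm for pair HMMs
Initialization:
• v^M(0,0) = 1, v^X(0,0) = v^Y(0,0) = 0. All v^•(−1,j) and v^•(i,−1) are set to 0.
Recurrence: i = 0,…,n, j = 0,…,m, except for (0,0):
$$
\begin{aligned}
v^M(i,j) &= p_{x_i y_j}\max\begin{cases} (1-2\delta-\tau)\,v^M(i-1,j-1)\\ (1-\varepsilon-\tau)\,v^X(i-1,j-1)\\ (1-\varepsilon-\tau)\,v^Y(i-1,j-1) \end{cases}\\
v^X(i,j) &= q_{x_i}\max\begin{cases} \delta\,v^M(i-1,j)\\ \varepsilon\,v^X(i-1,j) \end{cases}\\
v^Y(i,j) &= q_{y_j}\max\begin{cases} \delta\,v^M(i,j-1)\\ \varepsilon\,v^Y(i,j-1) \end{cases}
\end{aligned}
$$
Termination:
$$
v^E = \tau\max\big(v^M(n,m),\,v^X(n,m),\,v^Y(n,m)\big)
$$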
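A direct Python transcription of the algorithm above might look like this. It is a minimal sketch: the name `pair_viterbi` and the 0-based indexing are assumptions. It works in plain probabilities as the slide does, so it will underflow on long sequences; the log-odds version later in the chapter avoids that.

```python
def pair_viterbi(x, y, p, q, delta, eps, tau):
    """Probability v^E of the most probable path through the pair HMM."""
    n, m = len(x), len(y)
    # Cells outside the recurrence stay 0, matching v(-1, j) = v(i, -1) = 0.
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0  # Begin behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                vM[i][j] = p(x[i - 1], y[j - 1]) * max(
                    (1 - 2 * delta - tau) * vM[i - 1][j - 1],
                    (1 - eps - tau) * vX[i - 1][j - 1],
                    (1 - eps - tau) * vY[i - 1][j - 1])
            if i > 0:
                vX[i][j] = q(x[i - 1]) * max(delta * vM[i - 1][j], eps * vX[i - 1][j])
            if j > 0:
                vY[i][j] = q(y[j - 1]) * max(delta * vM[i][j - 1], eps * vY[i][j - 1])
    return tau * max(vM[n][m], vX[n][m], vY[n][m])
```

For a hand check, take the hypothetical toy parameters p_aa = 0.2, p_ab = 1/60 (a ≠ b), q_a = 0.25, δ = 0.2, ε = 0.1 and τ = 0.1. Aligning x = A with y = A has a single possible path (one match), so v^E = 0.5 · 0.2 · 0.1 = 0.01.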
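Pair HMMs (4)
Random model
[Diagram: states Begin, X (emits q_{x_i}), Y (emits q_{y_j}) and End. Each emitting state loops with probability 1−η and is left with probability η.]
Probability of a pair of sequences x and y:
$$
P(x,y\mid R) = \eta(1-\eta)^n\prod_{i=1}^{n} q_{x_i}\;\eta(1-\eta)^m\prod_{j=1}^{m} q_{y_j} = \eta^2(1-\eta)^{n+m}\prod_{i=1}^{n} q_{x_i}\prod_{j=1}^{m} q_{y_j}
$$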
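In practice P(x,y|R) is best evaluated in log space. The sketch below transcribes the closed form above. The function name and the callable `q` are illustrative assumptions.

```python
import math


def log_prob_random(x, y, q, eta):
    """log P(x, y | R) = 2 log eta + (n + m) log(1 - eta) + sum of log q."""
    n, m = len(x), len(y)
    return (2 * math.log(eta) + (n + m) * math.log(1 - eta)
            + sum(math.log(q(a)) for a in x)
            + sum(math.log(q(b)) for b in y))
```

For example, with q_a = 0.25 and η = 0.2, the pair x = A, y = C has probability 0.2² · 0.8² · 0.25² = 0.0016.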
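Pair HMMs (5)
Correspondence with the FSA: convert the probability terms into log-odds terms, that is, the ratio of the Viterbi path probability to the random-model probability (Viterbi match / random match).
Tricks: the (1−η) factors of the random model are distributed over the emitted residues, two per match and one per gap residue. This gives
$$
s(a,b) = \log\frac{p_{ab}}{q_a q_b} + \log\frac{1-2\delta-\tau}{(1-\eta)^2}
$$
$$
d = -\log\frac{\delta(1-\varepsilon-\tau)}{(1-\eta)(1-2\delta-\tau)}
$$
$$
e = -\log\frac{\varepsilon}{1-\eta}
$$
Compensating term: the factor (1−ε−τ)/(1−2δ−τ) in d corrects for the fact that s(a,b) charges every match the M→M transition 1−2δ−τ. The match that closes a gap is actually reached by the transition 1−ε−τ.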
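The three formulas translate directly into code. In the sketch below, the function name `log_odds_params` and the callable parameters `p` and `q` are assumptions. It returns the substitution score as a function together with the two gap penalties.

```python
import math


def log_odds_params(p, q, delta, eps, tau, eta):
    """Return (s, d, e) of the FSA that corresponds to the pair HMM."""
    match_term = math.log((1 - 2 * delta - tau) / (1 - eta) ** 2)

    def s(a, b):
        # Log-odds emission score plus the M->M transition, per (1 - eta)^2.
        return math.log(p(a, b) / (q(a) * q(b))) + match_term

    # Gap open includes the compensating term (1-eps-tau)/(1-2delta-tau).
    d = -math.log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
    e = -math.log(eps / (1 - eta))
    return s, d, e
```

With the hypothetical toy values p_AA = 0.2, q = 0.25, δ = 0.2, ε = 0.1, τ = 0.1 and η = 0.2, this gives s(A,A) = log 2.5, d = −log 0.4 and e = log 8.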
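Pair HMMs (6)
Algorithm: Optimal log-odds alignment
Initialization:
• V^M(0,0) = −2 log η, V^X(0,0) = V^Y(0,0) = −∞. All V^•(i,−1) and V^•(−1,j) are set to −∞.
Recurrence: i = 0,…,n, j = 0,…,m, except (0,0):
$$
\begin{aligned}
V^M(i,j) &= s(x_i,y_j) + \max\begin{cases} V^M(i-1,j-1)\\ V^X(i-1,j-1)\\ V^Y(i-1,j-1) \end{cases}\\
V^X(i,j) &= \max\begin{cases} V^M(i-1,j)-d\\ V^X(i-1,j)-e \end{cases}\\
V^Y(i,j) &= \max\begin{cases} V^M(i,j-1)-d\\ V^Y(i,j-1)-e \end{cases}
\end{aligned}
$$
Termination:
$$
V = \max\big(V^M(n,m),\,V^X(n,m)+c,\,V^Y(n,m)+c\big), \qquad c = \log(1-2\delta-\tau) - \log(1-\varepsilon-\tau)
$$
The last compensating term c undoes the correction built into d when the alignment ends in a gap, because no match follows that gap.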
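Putting the pieces together gives the sketch below. The name `log_odds_alignment_score` is hypothetical, and the code is written from the slide's formulas rather than taken from any reference implementation. Cells never reached stay at −∞, which covers the −∞ boundary conditions.

```python
import math

NEG_INF = -math.inf


def log_odds_alignment_score(x, y, p, q, delta, eps, tau, eta):
    """Optimal log-odds alignment score of the pair HMM against the random model."""
    match_term = math.log((1 - 2 * delta - tau) / (1 - eta) ** 2)
    d = -math.log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
    e = -math.log(eps / (1 - eta))
    c = math.log(1 - 2 * delta - tau) - math.log(1 - eps - tau)

    def s(a, b):
        return math.log(p(a, b) / (q(a) * q(b))) + match_term

    n, m = len(x), len(y)
    VM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = -2 * math.log(eta)  # the eta^2 factor of the random model
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                VM[i][j] = s(x[i - 1], y[j - 1]) + max(
                    VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1])
            if i > 0:
                VX[i][j] = max(VM[i - 1][j] - d, VX[i - 1][j] - e)
            if j > 0:
                VY[i][j] = max(VM[i][j - 1] - d, VY[i][j - 1] - e)
    # Last compensating term: an alignment ending in a gap gets c back.
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)
```

As written, the recurrence contains no factor for the final transition τ to End. The score therefore equals log(v^E/τ) − log P(x,y|R). The constant log τ does not change which alignment is optimal. For the toy example x = AC, y = A (the same hypothetical parameters as above), the best path is a match followed by an X state, and the score is log 15.625.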
Pair HMMs (7)
Pair HMM for local alignment (Figure 4.3)
The full probability of x and y (1)
Sum over all alignments:
$$
P(x,y) = \sum_{\text{alignments }\pi} P(x,y,\pi)
$$
The forward algorithm computes this sum: P(x,y) = f^E(n,m).
The posterior distribution over alignments then follows:
$$
P(\pi\mid x,y) = \frac{P(x,y,\pi)}{P(x,y)}
$$
The full probability of x and y (2)
Algorithm: Forward calculation for pair HMMs
Initialization:
• f^M(0,0) = 1, f^X(0,0) = f^Y(0,0) = 0. All f^•(i,−1) and f^•(−1,j) are set to 0.
Recurrence: i = 0,…,n, j = 0,…,m, except (0,0):
$$
\begin{aligned}
f^M(i,j) &= p_{x_i y_j}\big[(1-2\delta-\tau)\,f^M(i-1,j-1) + (1-\varepsilon-\tau)\big(f^X(i-1,j-1)+f^Y(i-1,j-1)\big)\big]\\
f^X(i,j) &= q_{x_i}\big[\delta\,f^M(i-1,j) + \varepsilon\,f^X(i-1,j)\big]\\
f^Y(i,j) &= q_{y_j}\big[\delta\,f^M(i,j-1) + \varepsilon\,f^Y(i,j-1)\big]
\end{aligned}
$$
Termination:
$$
f^E(n,m) = \tau\big(f^M(n,m)+f^X(n,m)+f^Y(n,m)\big)
$$
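The forward algorithm is the Viterbi sketch with every max replaced by a sum. Below is a minimal Python version; the name `pair_forward` is an assumption. It uses plain probabilities, so real use on long sequences would need scaling or log-space sums.

```python
def pair_forward(x, y, p, q, delta, eps, tau):
    """Full probability P(x, y) = f^E(n, m), summed over all alignments."""
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # Begin behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q(x[i - 1]) * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q(y[j - 1]) * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return tau * (fM[n][m] + fX[n][m] + fY[n][m])
```

Take x = AC, y = A with the earlier hypothetical toy parameters. Exactly two paths are possible, a match then an X, or an X then a match. The forward value is the sum of their probabilities, whereas Viterbi keeps only the larger one.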
Suboptimal alignment (1)
Types of suboptimal alignment:
• Slightly different from the optimal alignment in a few positions
• Substantially or completely different, e.g. because of repeats in one or both of the sequences
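Suboptimal alignment (2)
Probabilistic sampling of alignments:
• Sample alignments from the posterior distribution
$$
P(\pi\mid x,y) = \frac{P(x,y,\pi)}{P(x,y)}
$$
• Trace back stochastically through the forward matrices f^k(i,j). Because
$$
f^M(i,j) = p_{x_i y_j}\big[(1-2\delta-\tau)\,f^M(i-1,j-1) + (1-\varepsilon-\tau)\big(f^X(i-1,j-1)+f^Y(i-1,j-1)\big)\big],
$$
the predecessor of state M at (i,j) is chosen as
$$
\begin{aligned}
&M(i-1,j-1) \text{ with prob. } \frac{p_{x_i y_j}(1-2\delta-\tau)\,f^M(i-1,j-1)}{f^M(i,j)}\\
&X(i-1,j-1) \text{ with prob. } \frac{p_{x_i y_j}(1-\varepsilon-\tau)\,f^X(i-1,j-1)}{f^M(i,j)}\\
&Y(i-1,j-1) \text{ with prob. } \frac{p_{x_i y_j}(1-\varepsilon-\tau)\,f^Y(i-1,j-1)}{f^M(i,j)}
\end{aligned}
$$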
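The stochastic traceback can be sketched as follows. The predecessor rule for M is the one given above; the rules for X and Y are the analogous ones read off the forward recurrence. Function names are hypothetical, and `random.Random.choices` does the weighted draw. The emission factor of the current state is common to all candidates, so it is dropped before normalizing.

```python
import random


def forward_matrices(x, y, p, q, delta, eps, tau):
    """Forward matrices f^M, f^X, f^Y of the pair HMM (0-based strings)."""
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q(x[i - 1]) * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q(y[j - 1]) * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return fM, fX, fY


def sample_alignment(x, y, p, q, delta, eps, tau, rng):
    """Draw one state path (string over M, X, Y) from P(pi | x, y)."""
    fM, fX, fY = forward_matrices(x, y, p, q, delta, eps, tau)
    i, j = len(x), len(y)
    # All three states reach End with the same tau, so weight by f^k(n, m).
    state = rng.choices("MXY", weights=[fM[i][j], fX[i][j], fY[i][j]])[0]
    path = []
    while (i, j) != (0, 0):
        path.append(state)
        if state == "M":
            i, j = i - 1, j - 1
            cands = "MXY"
            weights = [(1 - 2 * delta - tau) * fM[i][j],
                       (1 - eps - tau) * fX[i][j], (1 - eps - tau) * fY[i][j]]
        elif state == "X":
            i -= 1
            cands, weights = "MX", [delta * fM[i][j], eps * fX[i][j]]
        else:
            j -= 1
            cands, weights = "MY", [delta * fM[i][j], eps * fY[i][j]]
        state = rng.choices(cands, weights=weights)[0]  # at (0, 0) this is Begin
    return "".join(reversed(path))
```

Repeated calls return different alignments in proportion to their posterior probabilities. This is one way to explore suboptimal alignments of the first kind, those differing from the optimum in a few positions.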
Suboptimal alignment (3)
Finding distinct suboptimal alignments (Waterman & Eggert [1987]):
• Find the next best alignment that has no aligned residue pairs in common with any previously determined alignment.
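A naive sketch of the idea above follows; it is not Waterman & Eggert's published algorithm. For brevity it uses local (Smith-Waterman) alignment with linear gap penalties and hypothetical scores (match +2, mismatch −1, gap 2). After each alignment is found, its aligned pairs are forbidden and the whole matrix is recomputed. A practical implementation would avoid recomputing the full matrix each time. All names are illustrative.

```python
import math


def local_align_excluding(x, y, forbidden, match=2, mismatch=-1, gap=2):
    """Smith-Waterman in which the 1-based pairs in `forbidden` may not be aligned."""
    n, m = len(x), len(y)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]

    def diag(i, j):
        if (i, j) in forbidden:
            return -math.inf  # x_i may not be aligned to y_j again
        return H[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)

    best, best_ij = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0, diag(i, j), H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    pairs, (i, j) = [], best_ij
    while H[i][j] > 0:  # trace back to the start of the local alignment
        if H[i][j] == diag(i, j):
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


def distinct_alignments(x, y, k):
    """Up to k alignments, none sharing an aligned pair with an earlier one."""
    forbidden, results = set(), []
    for _ in range(k):
        score, pairs = local_align_excluding(x, y, forbidden)
        if score <= 0:
            break
        results.append((score, pairs))
        forbidden.update(pairs)
    return results
```

On a repeat such as x = ACGTACGT and y = ACGT, the first call finds the first copy. Once its pairs are forbidden, the second call finds the other copy, which is the repeat case listed under "Substantially or completely different".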
[Figure 4.5]
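Posterior probability that x_i is aligned to y_j (1)
Goal: a reliability measure for each part of an alignment. The quantity of interest, with x_i◇y_j meaning "x_i is aligned to y_j", is
$$
P(x_i\diamond y_j\mid x,y) = \frac{P(x,y,x_i\diamond y_j)}{P(x,y)}
$$
The numerator factorizes; the second step holds because the model is Markov, so the rest of the sequences depends only on being in state M at (i,j):
$$
\begin{aligned}
P(x,y,x_i\diamond y_j) &= P(x_{1\ldots i},y_{1\ldots j},x_i\diamond y_j)\,P(x_{i+1\ldots n},y_{j+1\ldots m}\mid x_{1\ldots i},y_{1\ldots j},x_i\diamond y_j)\\
&= P(x_{1\ldots i},y_{1\ldots j},x_i\diamond y_j)\,P(x_{i+1\ldots n},y_{j+1\ldots m}\mid x_i\diamond y_j)\\
&= f^M(i,j)\,b^M(i,j)
\end{aligned}
$$
The first factor comes from the forward algorithm, the second from the backward algorithm.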
Posterior probability that x_i is aligned to y_j (2)
Algorithm: Backward calculation for pair HMMs
Initialization:
• b^M(n,m) = b^X(n,m) = b^Y(n,m) = τ. All b^•(i,m+1) and b^•(n+1,j) are set to 0.
Recurrence: i = n,…,1, j = m,…,1, except (n,m):
$$
\begin{aligned}
b^M(i,j) &= (1-2\delta-\tau)\,p_{x_{i+1}y_{j+1}}\,b^M(i+1,j+1) + \delta\big[q_{x_{i+1}}\,b^X(i+1,j) + q_{y_{j+1}}\,b^Y(i,j+1)\big]\\
b^X(i,j) &= (1-\varepsilon-\tau)\,p_{x_{i+1}y_{j+1}}\,b^M(i+1,j+1) + \varepsilon\,q_{x_{i+1}}\,b^X(i+1,j)\\
b^Y(i,j) &= (1-\varepsilon-\tau)\,p_{x_{i+1}y_{j+1}}\,b^M(i+1,j+1) + \varepsilon\,q_{y_{j+1}}\,b^Y(i,j+1)
\end{aligned}
$$
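Combining the backward recurrence with the forward matrices gives the posterior match probabilities P(x_i◇y_j|x,y) = f^M(i,j) b^M(i,j) / P(x,y). The Python sketch below does this; all function names are hypothetical. It also fills row and column 0 of the backward matrices, which the slide's recurrence does not need. This allows a consistency check: because Begin behaves like M, b^M(0,0) should equal P(x,y).

```python
def forward_matrices(x, y, p, q, delta, eps, tau):
    """Forward matrices f^M, f^X, f^Y (0-based strings)."""
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q(x[i - 1]) * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q(y[j - 1]) * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return fM, fX, fY


def backward_matrices(x, y, p, q, delta, eps, tau):
    """Backward matrices; row n+1 and column m+1 hold the zero boundary."""
    n, m = len(x), len(y)
    bM = [[0.0] * (m + 2) for _ in range(n + 2)]
    bX = [[0.0] * (m + 2) for _ in range(n + 2)]
    bY = [[0.0] * (m + 2) for _ in range(n + 2)]
    bM[n][m] = bX[n][m] = bY[n][m] = tau
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if (i, j) == (n, m):
                continue
            # Emission probabilities of x_{i+1} and y_{j+1}; zero past the ends.
            pxy = p(x[i], y[j]) if i < n and j < m else 0.0
            qx = q(x[i]) if i < n else 0.0
            qy = q(y[j]) if j < m else 0.0
            bM[i][j] = ((1 - 2 * delta - tau) * pxy * bM[i + 1][j + 1]
                        + delta * (qx * bX[i + 1][j] + qy * bY[i][j + 1]))
            bX[i][j] = (1 - eps - tau) * pxy * bM[i + 1][j + 1] + eps * qx * bX[i + 1][j]
            bY[i][j] = (1 - eps - tau) * pxy * bM[i + 1][j + 1] + eps * qy * bY[i][j + 1]
    return bM, bX, bY


def match_posteriors(x, y, p, q, delta, eps, tau):
    """post[i-1][j-1] = P(x_i aligned to y_j | x, y)."""
    fM, fX, fY = forward_matrices(x, y, p, q, delta, eps, tau)
    bM, _, _ = backward_matrices(x, y, p, q, delta, eps, tau)
    n, m = len(x), len(y)
    total = tau * (fM[n][m] + fX[n][m] + fY[n][m])  # P(x, y) = f^E(n, m)
    return [[fM[i][j] * bM[i][j] / total for j in range(1, m + 1)]
            for i in range(1, n + 1)]
```

Each row of the result sums to at most 1, because x_i is aligned to at most one y_j.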
Posterior probability that x_i is aligned to y_j (3)
The expected accuracy of an alignment π is the expected overlap between π and paths sampled from the posterior distribution:
$$
A(\pi) = \sum_{(i,j)\in\pi} P(x_i\diamond y_j)
$$
Dynamic programming finds the alignment that maximizes A:
$$
A(i,j) = \max\begin{cases} A(i-1,j-1) + P(x_i\diamond y_j)\\ A(i-1,j)\\ A(i,j-1) \end{cases}
$$
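Given a matrix of posterior match probabilities, the recurrence above, together with a traceback, yields the maximum expected accuracy alignment. The sketch below assumes the boundary values A(i,0) = A(0,j) = 0, which the slide does not state. The function name is hypothetical.

```python
def max_expected_accuracy(post):
    """Maximise A(pi) = sum of P(x_i aligned to y_j) over the pairs in pi.

    post[i - 1][j - 1] holds P(x_i aligned to y_j | x, y). Returns
    (A(n, m), pairs) with 1-based (i, j) pairs.
    """
    n = len(post)
    m = len(post[0]) if n else 0
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(A[i - 1][j - 1] + post[i - 1][j - 1], A[i - 1][j], A[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back, preferring aligned pairs
        if post[i - 1][j - 1] > 0 and A[i][j] == A[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif A[i][j] == A[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return A[n][m], pairs[::-1]
```

For instance, the toy posterior matrix [[0.9, 0.05], [0.1, 0.8]] gives A = 1.7 with the pairs (1,1) and (2,2). Such a matrix could come from the forward and backward computations.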
Pair HMMs vs FSAs for searching (1)
Two difficulties of conventional methods in searching:
• They are not probabilistic models for searching.
• The full probability P(x, y|M) cannot be computed.
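[Figure: toy example for the sequence abac, with states labelled S and B, emission probabilities q_a, …, and transition probabilities α, 1−α and 1.]
$$
P_S(abac) = \alpha^4 q_a q_b q_a q_c, \qquad P_B(abac) = 1-\alpha
$$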
Model comparison using the best match rather than the total probability
Pair HMMs vs FSAs for searching (2)
Converting an FSA into a probabilistic model:
Probabilistic models may underperform standard alignment methods if Viterbi is used for database searching.
But if the forward algorithm is used, the probabilistic model should perform better than standard methods.