Copyright (c) 2002 by SNU CSE Biointelligence Lab

Chap. 4 Pairwise alignment using HMMs

Biointelligence Laboratory

School of Computer Sci. & Eng.

Seoul National University

Seoul 151-742, Korea 

This slide file is available online at

http://bi.snu.ac.kr/

Contents

• FSA → HMM
• Pair HMMs
• The full probability of x & y
• Suboptimal alignment
• Posterior that x_i is aligned to y_j
• Pair HMMs vs FSAs for searching

Figure 4.1 A finite state machine diagram for affine gap alignment on the left, and the corresponding probabilistic model on the right.

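[Diagram labels. Left (finite state machine): states M (+1,+1), X (+1,+0), Y (+0,+1); transitions into M score s(x_i, y_j), gap opening scores -d, gap extension scores -e. Right (probabilistic model): M emits the pair (x_i, y_j) with probability p_{x_i y_j}, X emits x_i with probability q_{x_i}, Y emits y_j with probability q_{y_j}; transition probabilities are δ (M to X, M to Y), ε (X to X, Y to Y), 1-ε (X to M, Y to M) and 1-2δ (M to M).]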

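Recurrence Relation

$$
V^M(i,j) = s(x_i, y_j) + \max \begin{cases} V^M(i-1,j-1), \\ V^X(i-1,j-1), \\ V^Y(i-1,j-1); \end{cases}
$$

$$
V^X(i,j) = \max \begin{cases} V^M(i-1,j) - d, \\ V^X(i-1,j) - e; \end{cases}
$$

$$
V^Y(i,j) = \max \begin{cases} V^M(i,j-1) - d, \\ V^Y(i,j-1) - e. \end{cases}
$$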
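The three-state recurrence on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `affine_gap_score`, the start value V^M(0,0) = 0, and the toy scoring function are assumptions made for the example.

```python
NEG_INF = float("-inf")


def affine_gap_score(x, y, s, d, e):
    """Best global alignment score with gap-open penalty d and gap-extend penalty e.

    s is a function s(a, b) returning the substitution score (assumed interface).
    """
    n, m = len(x), len(y)
    # One table per state of the finite state machine.
    VM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # x_i aligned to y_j
    VX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # x_i aligned to a gap
    VY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # y_j aligned to a gap
    VM[0][0] = 0.0  # assumed start: the empty alignment sits in state M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                VM[i][j] = s(x[i - 1], y[j - 1]) + max(
                    VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1]
                )
            if i > 0:
                VX[i][j] = max(VM[i - 1][j] - d, VX[i - 1][j] - e)
            if j > 0:
                VY[i][j] = max(VM[i][j - 1] - d, VY[i][j - 1] - e)
    return max(VM[n][m], VX[n][m], VY[n][m])


def toy_score(a, b):
    """Toy scoring scheme, invented for illustration: +1 match, -1 mismatch."""
    return 1.0 if a == b else -1.0


print(affine_gap_score("ACGT", "AT", toy_score, d=2.0, e=1.0))
```

As on the slide, there is no direct transition between X and Y, so a gap in one sequence can only be followed by a match or by the end of the alignment.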

Pair HMMs (1)

FSA → HMMs: how?
• Specification of emission & transition probabilities

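[Figure 4.1 is shown again on this slide: the finite state machine with scores s(x_i, y_j), -d, -e next to the probabilistic model with emissions p_{x_i y_j}, q_{x_i}, q_{y_j} and transitions δ, ε, 1-ε, 1-2δ.]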

Pair HMMs (2)

Definition of begin state & end state
• Provides a probability distribution over all possible sequences

Pair HMM
• Works like an ordinary HMM
• Emits a pairwise alignment instead of a single sequence

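Figure 4.2 The full probabilistic version of Figure 4.1

[Diagram labels: states Begin, M (emits p_{x_i y_j}), X (emits q_{x_i}), Y (emits q_{y_j}), End. Transitions: M to M 1-2δ-τ; M to X and M to Y δ; X to X and Y to Y ε; X to M and Y to M 1-ε-τ; M, X, Y to End τ. Begin has the same outgoing transitions as M: 1-2δ-τ to M, δ to X, δ to Y, τ to End.]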

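Pair HMMs (3)

Algorithm: Viterbi algorithm for pair HMMs

Initialization:
• $v^M(0,0) = 1$, $v^X(0,0) = v^Y(0,0) = 0$.
• All $v^\bullet(-1, j)$, $v^\bullet(i, -1)$ are set to 0.

Recurrence: i = 0,…,n, j = 0,…,m, except for (0,0);

$$
v^M(i,j) = p_{x_i y_j} \max \begin{cases} (1-2\delta-\tau)\, v^M(i-1,j-1), \\ (1-\epsilon-\tau)\, v^X(i-1,j-1), \\ (1-\epsilon-\tau)\, v^Y(i-1,j-1); \end{cases}
$$

$$
v^X(i,j) = q_{x_i} \max \begin{cases} \delta\, v^M(i-1,j), \\ \epsilon\, v^X(i-1,j); \end{cases}
$$

$$
v^Y(i,j) = q_{y_j} \max \begin{cases} \delta\, v^M(i,j-1), \\ \epsilon\, v^Y(i,j-1). \end{cases}
$$

Termination:

$$
v^E = \tau \max\big(v^M(n,m),\, v^X(n,m),\, v^Y(n,m)\big)
$$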
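A minimal Python sketch of the Viterbi recurrence for the pair HMM of Figure 4.2 is given below. The function name, the dictionary layout of `p` and `q`, and the toy parameter values are assumptions for illustration; the sketch returns only the probability v^E and omits the traceback.

```python
def pair_hmm_viterbi(x, y, p, q, delta, eps, tau):
    """Probability of the most probable alignment path, v^E.

    p[(a, b)]: pair emission probabilities of state M (assumed dict layout)
    q[a]:      single-residue emission probabilities of states X and Y
    """
    n, m = len(x), len(y)
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0  # Begin behaves like M at position (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:  # M emits the pair (x_i, y_j)
                vM[i][j] = p[(x[i - 1], y[j - 1])] * max(
                    (1 - 2 * delta - tau) * vM[i - 1][j - 1],
                    (1 - eps - tau) * vX[i - 1][j - 1],
                    (1 - eps - tau) * vY[i - 1][j - 1],
                )
            if i > 0:  # X emits x_i against a gap
                vX[i][j] = q[x[i - 1]] * max(delta * vM[i - 1][j], eps * vX[i - 1][j])
            if j > 0:  # Y emits y_j against a gap
                vY[i][j] = q[y[j - 1]] * max(delta * vM[i][j - 1], eps * vY[i][j - 1])
    return tau * max(vM[n][m], vX[n][m], vY[n][m])


# Toy parameters over a two-letter alphabet, invented for illustration only.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
print(pair_hmm_viterbi("AB", "A", p, q, delta=0.1, eps=0.3, tau=0.1))
```

For realistic sequence lengths the products underflow, so an implementation would normally work with log probabilities; plain probabilities are kept here to stay close to the slide.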

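Pair HMMs (4)

Random model

[Diagram labels: states Begin, X (emits q_{x_i}), Y (emits q_{y_j}), End. X and Y each loop on themselves with probability 1-η and are left with probability η.]

Probability of a pair of sequences x and y

$$
P(x, y \mid R) = \eta (1-\eta)^n \prod_{i=1}^{n} q_{x_i} \; \eta (1-\eta)^m \prod_{j=1}^{m} q_{y_j}
= \eta^2 (1-\eta)^{n+m} \prod_{i=1}^{n} q_{x_i} \prod_{j=1}^{m} q_{y_j}
$$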
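The random-model probability is a plain product, so it can be checked with a few lines of Python. The function name and the toy values of `q` and η are assumptions for illustration.

```python
import math


def random_model_prob(x, y, q, eta):
    """P(x, y | R): each sequence is emitted independently by the random model."""
    px = eta * (1 - eta) ** len(x) * math.prod(q[a] for a in x)
    py = eta * (1 - eta) ** len(y) * math.prod(q[b] for b in y)
    return px * py


# Toy background frequencies, invented for illustration only.
q = {"A": 0.5, "B": 0.5}
print(random_model_prob("AB", "A", q, eta=0.1))
```

The factor η appears twice (once per sequence), which is where the η² in the second form of the formula comes from.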

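Pair HMMs (5)

Correspondence with FSA
• Probability terms to log-odds terms
• Viterbi match / random match

Tricks

$$
s(a,b) = \log\frac{p_{ab}}{q_a q_b} + \log\frac{1-2\delta-\tau}{(1-\eta)^2}
$$

$$
d = -\log\frac{\delta\,(1-\epsilon-\tau)}{(1-\eta)(1-2\delta-\tau)}
$$

$$
e = -\log\frac{\epsilon}{1-\eta}
$$

Compensating term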
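The conversion from pair-HMM probabilities to the FSA scores s(a,b), d and e can be sketched as below. The function name, the dictionary layout and the toy parameter values are assumptions for illustration; natural logarithms are used, although any base works as long as it is used consistently.

```python
import math


def log_odds_parameters(p, q, delta, eps, tau, eta):
    """Return (s, d, e): substitution scores and gap-open / gap-extend penalties."""
    match_tr = 1 - 2 * delta - tau  # M -> M transition probability
    s = {
        (a, b): math.log(p_ab / (q[a] * q[b])) + math.log(match_tr / (1 - eta) ** 2)
        for (a, b), p_ab in p.items()
    }
    # d also carries the factor (1 - eps - tau) / (1 - 2 delta - tau): the match
    # after a gap is entered from X or Y, while s(a, b) assumes an M -> M step.
    d = -math.log(delta * (1 - eps - tau) / ((1 - eta) * match_tr))
    e = -math.log(eps / (1 - eta))
    return s, d, e


# Toy parameters, invented for illustration only.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
s, d, e = log_odds_parameters(p, q, delta=0.1, eps=0.3, tau=0.1, eta=0.1)
print(s[("A", "A")], s[("A", "B")], d, e)
```

With these toy values, matches get a positive score and mismatches a negative one, and opening a gap costs more than extending it, which is the usual shape of an affine-gap scoring scheme.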

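Pair HMMs (6)

Algorithm: Optimal log-odds alignment

Initialization:
• $V^M(0,0) = -2\log\eta$, $V^X(0,0) = V^Y(0,0) = -\infty$.
• All $V^\bullet(i,-1)$, $V^\bullet(-1,j)$ are set to $-\infty$.

Recurrence: i = 0,…,n, j = 0,…,m except (0,0);

$$
V^M(i,j) = s(x_i, y_j) + \max \begin{cases} V^M(i-1,j-1), \\ V^X(i-1,j-1), \\ V^Y(i-1,j-1); \end{cases}
$$

$$
V^X(i,j) = \max \begin{cases} V^M(i-1,j) - d, \\ V^X(i-1,j) - e; \end{cases}
$$

$$
V^Y(i,j) = \max \begin{cases} V^M(i,j-1) - d, \\ V^Y(i,j-1) - e. \end{cases}
$$

Termination:

$$
V = \max\big(V^M(n,m),\; V^X(n,m) + c,\; V^Y(n,m) + c\big)
$$

Last compensating term:

$$
c = \log(1-2\delta-\tau) - \log(1-\epsilon-\tau)
$$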
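The log-odds algorithm uses the same recurrence as the plain FSA and differs only in its start value and termination, which the sketch below makes explicit. Two assumptions are made here: the start value is taken as -2 log η (the random model contributes η² to the denominator of the odds ratio, so the minus sign is assumed to have been lost in the slide text), and the scores `s`, `d`, `e`, `c` are passed in ready-made. All names and toy values are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_odds_alignment(x, y, s, d, e, eta, c):
    """Optimal log-odds alignment score V (score only, no traceback).

    s[(a, b)] holds substitution scores (assumed dict layout).
    """
    n, m = len(x), len(y)
    VM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = -2 * math.log(eta)  # start value; sign is an assumption, see lead-in
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                VM[i][j] = s[(x[i - 1], y[j - 1])] + max(
                    VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1]
                )
            if i > 0:
                VX[i][j] = max(VM[i - 1][j] - d, VX[i - 1][j] - e)
            if j > 0:
                VY[i][j] = max(VM[i][j - 1] - d, VY[i][j - 1] - e)
    # Paths ending in a gap state get the compensating term c.
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)


# Toy scores, invented for illustration only.
s = {("A", "A"): 1.5, ("A", "B"): -1.0, ("B", "A"): -1.0, ("B", "B"): 1.5}
print(log_odds_alignment("AB", "A", s, d=2.0, e=1.0, eta=0.1, c=0.2))
```

The term c is added only when the alignment ends in X or Y: the penalty d already includes the correction for a later return to M, and that return never happens if the gap runs to the end.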

Pair HMMs (7)

Pair HMM for local alignment (Figure 4.3)

The full probability of x and y (1)

Summation over all alignments

$$
P(x, y) = \sum_{\text{alignments } \pi} P(x, y, \pi)
$$

The forward algorithm does it: $P(x, y) = f^E(n, m)$

The posterior distribution $P(\pi \mid x, y)$ can be acquired.

$$
P(\pi \mid x, y) = \frac{P(x, y, \pi)}{P(x, y)}
$$

The full probability of x and y (2)

Algorithm: Forward calculation for pair HMMs

Initialization:
• $f^M(0,0) = 1$, $f^X(0,0) = f^Y(0,0) = 0$.
• All $f^\bullet(i,-1)$, $f^\bullet(-1,j)$ are set to 0.

Recurrence: i = 0,…,n, j = 0,…,m except (0,0);

$$
f^M(i,j) = p_{x_i y_j} \big[ (1-2\delta-\tau)\, f^M(i-1,j-1) + (1-\epsilon-\tau)\big( f^X(i-1,j-1) + f^Y(i-1,j-1) \big) \big];
$$

$$
f^X(i,j) = q_{x_i} \big[ \delta\, f^M(i-1,j) + \epsilon\, f^X(i-1,j) \big];
$$

$$
f^Y(i,j) = q_{y_j} \big[ \delta\, f^M(i,j-1) + \epsilon\, f^Y(i,j-1) \big].
$$

Termination:

$$
f^E(n,m) = \tau \big[ f^M(n,m) + f^X(n,m) + f^Y(n,m) \big]
$$
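The forward calculation replaces each max of the Viterbi recurrence by a sum. A minimal Python sketch follows; the function name, the dictionary layout of `p` and `q`, and the toy values are assumptions for illustration.

```python
def pair_hmm_forward(x, y, p, q, delta, eps, tau):
    """Full probability P(x, y) = f^E(n, m), summed over all alignments."""
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # Begin behaves like M at position (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p[(x[i - 1], y[j - 1])] * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                fX[i][j] = q[x[i - 1]] * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q[y[j - 1]] * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return tau * (fM[n][m] + fX[n][m] + fY[n][m])


# Toy parameters, invented for illustration only.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
print(pair_hmm_forward("AB", "A", p, q, delta=0.1, eps=0.3, tau=0.1))
```

For x = AB and y = A there are exactly two alignments under this model (match then gap, or gap then match), and the value returned is the sum of their two path probabilities.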

Suboptimal alignment (1)

Types of suboptimal alignment
• Slightly different from the optimal alignment in a few positions
• Substantially or completely different
  • Repeats in one or both of the sequences

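Suboptimal alignment (2)

Probabilistic sampling of alignments
• Sampling from the posterior distribution

$$
P(\pi \mid x, y) = \frac{P(x, y, \pi)}{P(x, y)}
$$

• Trace back through $f^k(i, j)$

$$
f^M(i,j) = p_{x_i y_j} \big[ (1-2\delta-\tau)\, f^M(i-1,j-1) + (1-\epsilon-\tau)\big( f^X(i-1,j-1) + f^Y(i-1,j-1) \big) \big]
$$

From M(i, j), step back to

$$
\begin{cases}
M(i-1,j-1) & \text{with prob. } \dfrac{p_{x_i y_j}\,(1-2\delta-\tau)\, f^M(i-1,j-1)}{f^M(i,j)}, \\[2ex]
X(i-1,j-1) & \text{with prob. } \dfrac{p_{x_i y_j}\,(1-\epsilon-\tau)\, f^X(i-1,j-1)}{f^M(i,j)}, \\[2ex]
Y(i-1,j-1) & \text{with prob. } \dfrac{p_{x_i y_j}\,(1-\epsilon-\tau)\, f^Y(i-1,j-1)}{f^M(i,j)}.
\end{cases}
$$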
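A possible sketch of the sampling traceback is shown below. The slide gives the step probabilities only for state M; the corresponding steps for X and Y, and the choice of the last state in proportion to f^k(n, m), are filled in by analogy and should be read as assumptions. Function names, dictionary layouts and toy values are hypothetical.

```python
import random


def forward_tables(x, y, p, q, delta, eps, tau):
    """Forward tables f['M'], f['X'], f['Y'] for the pair HMM."""
    n, m = len(x), len(y)
    f = {k: [[0.0] * (m + 1) for _ in range(n + 1)] for k in "MXY"}
    f["M"][0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f["M"][i][j] = p[(x[i - 1], y[j - 1])] * (
                    (1 - 2 * delta - tau) * f["M"][i - 1][j - 1]
                    + (1 - eps - tau) * (f["X"][i - 1][j - 1] + f["Y"][i - 1][j - 1])
                )
            if i > 0:
                f["X"][i][j] = q[x[i - 1]] * (delta * f["M"][i - 1][j] + eps * f["X"][i - 1][j])
            if j > 0:
                f["Y"][i][j] = q[y[j - 1]] * (delta * f["M"][i][j - 1] + eps * f["Y"][i][j - 1])
    return f


def sample_alignment(x, y, p, q, delta, eps, tau, rng):
    """Sample a state path (string over 'M', 'X', 'Y') from P(path | x, y)."""
    f = forward_tables(x, y, p, q, delta, eps, tau)
    i, j = len(x), len(y)
    # Last state before End: every f^k(n, m) is followed by the same factor tau.
    weights = [f[k][i][j] for k in "MXY"]
    path = []
    while i > 0 or j > 0:
        # random.choices normalises the weights, so common factors can be dropped.
        state = rng.choices("MXY", weights=weights)[0]
        path.append(state)
        if state == "M":
            i, j = i - 1, j - 1
            weights = [(1 - 2 * delta - tau) * f["M"][i][j],
                       (1 - eps - tau) * f["X"][i][j],
                       (1 - eps - tau) * f["Y"][i][j]]
        elif state == "X":
            i -= 1
            weights = [delta * f["M"][i][j], eps * f["X"][i][j], 0.0]
        else:
            j -= 1
            weights = [delta * f["M"][i][j], 0.0, eps * f["Y"][i][j]]
    return "".join(reversed(path))


# Toy parameters, invented for illustration only.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
print(sample_alignment("ABBA", "ABA", p, q, 0.1, 0.3, 0.1, random.Random(0)))
```

Drawing many paths this way gives a picture of which parts of the alignment are stable and which vary between samples.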

Suboptimal alignment (3)

Finding distinct suboptimal alignments
• Waterman & Eggert [1987]
• Finding the next best alignment
• No aligned residue pairs in common with any previously determined alignment

[Figure 4.5]

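Posterior that x_i is aligned to y_j (1)

Reliability measure for each part of an alignment

Interest: the posterior probability that x_i is aligned to y_j, written $x_i \diamond y_j$

$$
P(x_i \diamond y_j \mid x, y) = \frac{P(x, y, x_i \diamond y_j)}{P(x, y)}
$$

$$
\begin{aligned}
P(x, y, x_i \diamond y_j)
&= P(x_{1\ldots i}, y_{1\ldots j}, x_i \diamond y_j)\; P(x_{i+1\ldots n}, y_{j+1\ldots m} \mid x_{1\ldots i}, y_{1\ldots j}, x_i \diamond y_j) \\
&= P(x_{1\ldots i}, y_{1\ldots j}, x_i \diamond y_j)\; P(x_{i+1\ldots n}, y_{j+1\ldots m} \mid x_i \diamond y_j)
\end{aligned}
$$

The first factor is given by the forward algorithm, $f^M(i,j)$; the second by the backward algorithm, $b^M(i,j)$.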

Posterior that x_i is aligned to y_j (2)

Algorithm: Backward calculation for pair HMMs

Initialization:
• $b^M(n,m) = b^X(n,m) = b^Y(n,m) = \tau$.
• All $b^\bullet(i, m+1)$, $b^\bullet(n+1, j)$ are set to 0.

Recurrence: i = n,…,1, j = m,…,1 except (n, m);

$$
b^M(i,j) = (1-2\delta-\tau)\, p_{x_{i+1} y_{j+1}}\, b^M(i+1,j+1) + \delta \big[ q_{x_{i+1}}\, b^X(i+1,j) + q_{y_{j+1}}\, b^Y(i,j+1) \big];
$$

$$
b^X(i,j) = (1-\epsilon-\tau)\, p_{x_{i+1} y_{j+1}}\, b^M(i+1,j+1) + \epsilon\, q_{x_{i+1}}\, b^X(i+1,j);
$$

$$
b^Y(i,j) = (1-\epsilon-\tau)\, p_{x_{i+1} y_{j+1}}\, b^M(i+1,j+1) + \epsilon\, q_{y_{j+1}}\, b^Y(i,j+1).
$$
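The backward recurrence and the resulting match posteriors can be sketched as follows. The backward loops run downward from (n, m), since each cell depends on cells with larger indices; extending them to (0, 0) is an addition made here so that b^M(0, 0) can be compared with the forward result P(x, y). Function names, dictionary layouts and toy values are assumptions for illustration.

```python
def forward_match(x, y, p, q, delta, eps, tau):
    """Return (fM, P(x, y)) from the forward algorithm."""
    n, m = len(x), len(y)
    fM, fX, fY = ([[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(3))
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p[(x[i - 1], y[j - 1])] * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - eps - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1])
                )
            if i > 0:
                fX[i][j] = q[x[i - 1]] * (delta * fM[i - 1][j] + eps * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q[y[j - 1]] * (delta * fM[i][j - 1] + eps * fY[i][j - 1])
    return fM, tau * (fM[n][m] + fX[n][m] + fY[n][m])


def backward_match(x, y, p, q, delta, eps, tau):
    """Return the table bM of the backward algorithm."""
    n, m = len(x), len(y)
    bM, bX, bY = ([[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(3))
    bM[n][m] = bX[n][m] = bY[n][m] = tau
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            # Terms that would read b(i, m+1) or b(n+1, j) are zero.
            pm = p[(x[i], y[j])] * bM[i + 1][j + 1] if i < n and j < m else 0.0
            gx = q[x[i]] * bX[i + 1][j] if i < n else 0.0
            gy = q[y[j]] * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = (1 - 2 * delta - tau) * pm + delta * (gx + gy)
            bX[i][j] = (1 - eps - tau) * pm + eps * gx
            bY[i][j] = (1 - eps - tau) * pm + eps * gy
    return bM


def match_posteriors(x, y, p, q, delta, eps, tau):
    """post[i][j] = P(x_i aligned to y_j | x, y) for i, j >= 1 (row/column 0: Begin)."""
    fM, total = forward_match(x, y, p, q, delta, eps, tau)
    bM = backward_match(x, y, p, q, delta, eps, tau)
    return [[fM[i][j] * bM[i][j] / total for j in range(len(y) + 1)]
            for i in range(len(x) + 1)]


# Toy parameters, invented for illustration only.
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
post = match_posteriors("AB", "A", p, q, delta=0.1, eps=0.3, tau=0.1)
print(post[1][1], post[2][1])
```

In this toy case y has a single residue that must be matched to either x_1 or x_2, so the two printed posteriors add up to 1.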

Posterior that x_i is aligned to y_j (3)

The expected accuracy of an alignment
• Expected overlap between π and paths sampled from the posterior distribution

$$
A(\pi) = \sum_{(i,j) \in \pi} P(x_i \diamond y_j)
$$

Dynamic programming

$$
A(i,j) = \max \begin{cases} A(i-1,j-1) + P(x_i \diamond y_j), \\ A(i-1,j), \\ A(i,j-1). \end{cases}
$$
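Given a matrix of match posteriors, the dynamic programme for the maximum expected accuracy is short enough to sketch directly. The function name, the 1-based matrix layout and the toy posterior values are assumptions for illustration; only the score A(n, m) is returned, without the traceback that would recover the alignment itself.

```python
def max_expected_accuracy(post):
    """Return A(n, m) for post[i][j] = P(x_i aligned to y_j), 1-based i and j.

    Row 0 and column 0 of post are placeholders and are never read.
    """
    n, m = len(post) - 1, len(post[0]) - 1
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(
                A[i - 1][j - 1] + post[i][j],  # align x_i with y_j
                A[i - 1][j],                   # leave x_i unaligned
                A[i][j - 1],                   # leave y_j unaligned
            )
    return A[n][m]


# Toy posterior matrix for two sequences of length 2, invented for illustration.
post = [
    [0.0, 0.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.2, 0.7],
]
print(max_expected_accuracy(post))
```

Gaps carry no cost here: the recurrence only collects posterior mass from the aligned pairs it keeps.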

Pair HMMs vs FSAs for searching (1)

Two difficulties of conventional methods in searching
• They are not probabilistic models for searching
• They do not compute the full probability P(x, y|M)

[Diagram labels: the sequence a b a c; state S (emission q_a, transition α); state B (transition 1-α, followed by transitions of probability 1).]

P_S(abac) = α^4 q_a q_b q_a q_c

P_B(abac) = 1-α

Model comparison using the best match rather than the total probability
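The two path probabilities on this slide are simple enough to evaluate numerically. The sketch below only restates the slide's two formulas; the function name and the values of α and q are invented, and reading S and B as the only two ways to produce the sequence is an assumption.

```python
import math


def path_probabilities(seq, q, alpha):
    """Return (P_S, P_B) for a sequence, following the two formulas on the slide."""
    p_s = alpha ** len(seq) * math.prod(q[c] for c in seq)  # through state S
    p_b = 1 - alpha                                         # through path B
    return p_s, p_b


# Toy values, invented for illustration only.
q = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
p_s, p_b = path_probabilities("abac", q, alpha=0.9)
print("best match:", max(p_s, p_b))   # what a Viterbi-style comparison uses
print("total     :", p_s + p_b)       # what a full-probability comparison uses
```

A best-match comparison keeps only the larger of the two numbers, while a full-probability comparison adds them; the two can rank models differently when several paths contribute comparable amounts.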

Pair HMMs vs FSAs for searching (2)

Conversion of an FSA into a probabilistic model
• Probabilistic models may underperform standard alignment methods if Viterbi is used for database searching.
• But if the forward algorithm is used, they would be better than standard methods.