Traceback and local alignment

Traceback and local alignment Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington [email protected]


Page 1: Traceback  and local alignment

Traceback and local alignment

Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
[email protected]

Page 2: Traceback  and local alignment

Outline

• Responses from last class
• Sequence alignment
  – Motivation
  – Scoring alignments
• Python

Page 3: Traceback  and local alignment

One-minute responses

• Thank you for doing the one-minute responses and the revisions.
• Liked that you let us solve the DP.
• Liked going to lab in second half of lecture.
• More slowly with Python programming.
• Please explain more about string handling in Python, especially sys.argv.
• sys.argv is now more clear.
• Need more practical problems using sys.
• I know how to compute the DP score but not how to get the alignment.
• I still do not get how sequence alignment works.
• Can we go over DP again?
• Please don’t give us a test because our essay phase has begun already.
• Please limit the number of questions in class.
• Moving too slow with previous work and too fast with current work.
• I understood about 95% of the lecture.
• Can you give us some Python documentation or some interesting web site to improve and learn?

Page 4: Traceback  and local alignment

Revision

• What two things are needed to score a pairwise alignment?
  – A substitution matrix and a gap penalty.
• What does entry (i,j) in the DP matrix store?
  – The score of the best-scoring alignment up to those positions.
• What are the three valid moves when filling in the DP matrix?
  – Horizontal and vertical, corresponding to gaps.
  – Diagonal, corresponding to a substitution.
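As a rough illustration of the first revision point, the sketch below scores an alignment that is already written out with gaps. It assumes the DNA substitution matrix and the gap penalty d = -5 from the worked example in these slides; the name `score_alignment` is made up for this sketch.

```python
# Substitution scores from the small DNA example in these slides.
LETTERS = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUBST = {(a, b): ROWS[i][j]
         for i, a in enumerate(LETTERS) for j, b in enumerate(LETTERS)}

def score_alignment(top, bottom, subst, gap):
    """Sum the column scores of two equal-length gapped strings ('-' is a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap              # linear gap penalty (negative here)
        else:
            total += subst[(a, b)]    # match or substitution score
    return total

print(score_alignment("AAG-", "-AGC", SUBST, -5))  # -5 + 2 + 2 - 5 = -6
```

The substitution matrix supplies the score for every column that pairs two letters, and the gap penalty supplies the score for every column that contains a gap.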

Page 5: Traceback  and local alignment

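A small example

Substitution matrix:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

DP matrix (empty), with AAG along the top and AGC down the left side:

            A    A    G

  A
  G
  C

Each cell F(i,j) is computed from its three neighbours:

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.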

Page 6: Traceback  and local alignment

A simple example

DP matrix with the first row and column initialized:

            A    A    G
       0   -5  -10  -15
  A   -5
  G  -10
  C  -15

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 7: Traceback  and local alignment

A simple example

Completed DP matrix:

            A    A    G
       0   -5  -10  -15
  A   -5    2   -3   -8
  G  -10   -3   -3   -1
  C  -15   -8   -8   -6

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.
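The filled-in matrix can be reproduced with a few lines of Python. This is only a sketch under the assumptions of the example (the DNA substitution matrix shown earlier and d = -5); `fill_global` and its parameter names `top` and `left` are hypothetical.

```python
LETTERS = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUBST = {(a, b): ROWS[i][j]
         for i, a in enumerate(LETTERS) for j, b in enumerate(LETTERS)}

def fill_global(top, left, subst, gap):
    """Return the DP matrix F; rows follow `left`, columns follow `top`."""
    n_rows, n_cols = len(left) + 1, len(top) + 1
    F = [[0] * n_cols for _ in range(n_rows)]
    for c in range(1, n_cols):          # first row: only horizontal (gap) moves
        F[0][c] = F[0][c - 1] + gap
    for r in range(1, n_rows):          # first column: only vertical (gap) moves
        F[r][0] = F[r - 1][0] + gap
    for r in range(1, n_rows):
        for c in range(1, n_cols):
            F[r][c] = max(
                F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])],  # diagonal
                F[r - 1][c] + gap,                                   # vertical
                F[r][c - 1] + gap,                                   # horizontal
            )
    return F

for row in fill_global("AAG", "AGC", SUBST, -5):
    print(row)
```

The lower right entry, -6, is the score of the best global alignment of AAG and AGC.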

Page 8: Traceback  and local alignment

Traceback

• Start from the lower right corner and trace back to the upper left.

• Each arrow introduces one character at the end of each aligned sequence.

• A horizontal move puts a gap in the left sequence.
• A vertical move puts a gap in the top sequence.
• A diagonal move uses one character from each sequence.
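The traceback rules above might be coded as follows. This is a minimal sketch, not the course's reference solution: it recomputes at each cell which move produced the score instead of storing arrows, it returns only one optimal alignment, and the names `fill_global` and `traceback` are invented for illustration.

```python
LETTERS = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUBST = {(a, b): ROWS[i][j]
         for i, a in enumerate(LETTERS) for j, b in enumerate(LETTERS)}

def fill_global(top, left, subst, gap):
    """Global DP matrix; rows follow `left`, columns follow `top`."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for c in range(1, len(top) + 1):
        F[0][c] = F[0][c - 1] + gap
    for r in range(1, len(left) + 1):
        F[r][0] = F[r - 1][0] + gap
        for c in range(1, len(top) + 1):
            F[r][c] = max(F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])],
                          F[r - 1][c] + gap,
                          F[r][c - 1] + gap)
    return F

def traceback(F, top, left, subst, gap):
    """Walk from the lower right corner to the upper left, building one alignment."""
    r, c = len(left), len(top)
    top_aln, left_aln = [], []
    while r > 0 or c > 0:
        if r > 0 and c > 0 and F[r][c] == F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])]:
            # Diagonal move: one character from each sequence.
            top_aln.append(top[c - 1])
            left_aln.append(left[r - 1])
            r, c = r - 1, c - 1
        elif r > 0 and F[r][c] == F[r - 1][c] + gap:
            # Vertical move: gap in the top sequence.
            top_aln.append("-")
            left_aln.append(left[r - 1])
            r -= 1
        else:
            # Horizontal move: gap in the left sequence.
            top_aln.append(top[c - 1])
            left_aln.append("-")
            c -= 1
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(top_aln)), "".join(reversed(left_aln))

F = fill_global("AAG", "AGC", SUBST, -5)
print(traceback(F, "AAG", "AGC", SUBST, -5))  # ('AAG-', '-AGC')
```

Because the diagonal move is tested first, ties are always broken the same way; a different order of tests could return a different, equally good alignment.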

Page 9: Traceback  and local alignment


A simple example

Cells on the traceback path:

            A    A    G
       0   -5
  A         2   -3
  G                  -1
  C                  -6

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 10: Traceback  and local alignment


A simple example

Cells on the traceback path:

            A    A    G
       0   -5
  A         2   -3
  G                  -1
  C                  -6

Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Two optimal alignments:

AAG-      AAG-
-AGC      A-GC

Page 11: Traceback  and local alignment

DP matrix

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GA-ATC
CATA-C

Page 12: Traceback  and local alignment

DP matrix

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GAAT-C
CA-TAC

Page 13: Traceback  and local alignment

DP matrix

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GAAT-C
C-ATAC

Page 14: Traceback  and local alignment

DP matrix

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GAAT-C
-CATAC

Page 15: Traceback  and local alignment

Multiple solutions

• When a program returns a sequence alignment, it may not be the only best alignment.

GA-ATC
CATA-C

GAAT-C
CA-TAC

GAAT-C
C-ATAC

GAAT-C
-CATAC
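To see where several equally good alignments come from, a traceback can follow every move that reproduces a cell's score instead of only the first one. The sketch below does this for the smaller AAG/AGC example, since the substitution matrix behind the GAATC/CATAC matrix is not shown in these slides; `all_optimal_alignments` is a hypothetical name, and the number of co-optimal alignments can grow very quickly for longer sequences.

```python
LETTERS = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUBST = {(a, b): ROWS[i][j]
         for i, a in enumerate(LETTERS) for j, b in enumerate(LETTERS)}

def all_optimal_alignments(top, left, subst, gap):
    """Return every optimal global alignment as (top_aligned, left_aligned) pairs."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for c in range(1, len(top) + 1):
        F[0][c] = F[0][c - 1] + gap
    for r in range(1, len(left) + 1):
        F[r][0] = F[r - 1][0] + gap
        for c in range(1, len(top) + 1):
            F[r][c] = max(F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])],
                          F[r - 1][c] + gap,
                          F[r][c - 1] + gap)

    results = []

    def walk(r, c, top_aln, left_aln):
        if r == 0 and c == 0:
            results.append((top_aln, left_aln))
            return
        # Follow every move that explains the score in this cell.
        if r > 0 and c > 0 and F[r][c] == F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])]:
            walk(r - 1, c - 1, top[c - 1] + top_aln, left[r - 1] + left_aln)
        if r > 0 and F[r][c] == F[r - 1][c] + gap:
            walk(r - 1, c, "-" + top_aln, left[r - 1] + left_aln)
        if c > 0 and F[r][c] == F[r][c - 1] + gap:
            walk(r, c - 1, top[c - 1] + top_aln, "-" + left_aln)

    walk(len(left), len(top), "", "")
    return results

for top_aln, left_aln in all_optimal_alignments("AAG", "AGC", SUBST, -5):
    print(top_aln)
    print(left_aln)
    print()
```

For AAG and AGC this finds the same two alignments shown earlier, both with score -6.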

Page 16: Traceback  and local alignment

Traceback problem #1

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

Write down the alignment corresponding to the circled score (the 5).

Page 17: Traceback  and local alignment

Solution #1

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

Write down the alignment corresponding to the circled score (the 5).

GA
CA

Page 18: Traceback  and local alignment

Traceback problem #2

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

Write down three alignments corresponding to the circled score (the -7).

Page 19: Traceback  and local alignment

Solution #2

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GAATC
CA---

Write down three alignments corresponding to the circled score.

Page 20: Traceback  and local alignment

Solution #2

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GAATC
CA---

GAATC
C-A--

Write down three alignments corresponding to the circled score.

Page 21: Traceback  and local alignment

Solution #2

            G    A    A    T    C
       0   -4   -8  -12  -16  -20
  C   -4   -5   -9  -13  -12   -6
  A   -8   -4    5    1   -3   -7
  T  -12   -8    1    0   11    7
  A  -16  -12    2   11    7    6
  C  -20  -16   -2    7   11   17

GAATC
CA---

GAATC
C-A--

GAATC
-CA--

Write down three alignments corresponding to the circled score.

Page 22: Traceback  and local alignment

Local alignment

• A protein may be homologous to a region within a second protein.

• Usually, an alignment that spans the complete length of both sequences is not required.

Page 23: Traceback  and local alignment

BLAST allows local alignments

Global alignment

Local alignment

Page 24: Traceback  and local alignment

Global alignment DP

• Align sequences x and y.
• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$

Page 25: Traceback  and local alignment

Local alignment DP

• Align sequences x and y.
• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \\ 0 \end{cases}$$

Page 26: Traceback  and local alignment

Local DP in equation form

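F(i,j) is the largest of four candidates:

$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) & \text{(diagonal move)} \\ F(i-1,j) + d & \text{(gap move)} \\ F(i,j-1) + d & \text{(gap move)} \\ 0 \end{cases}$$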

Page 27: Traceback  and local alignment

A simple example

Substitution matrix:

       A    C    G    T
  A    2   -7   -5   -7
  C   -7    2   -7   -5
  G   -5   -7    2   -7
  T   -7   -5   -7    2

DP matrix (empty), with AAG along the top and AGC down the left side:

            A    A    G

  A
  G
  C

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 28: Traceback  and local alignment

A simple example

DP matrix with the first row and column initialized to 0:

            A    A    G
       0    0    0    0
  A    0
  G    0
  C    0

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 29: Traceback  and local alignment

A simple example

            A    A    G
       0    0    0    0
  A    0
  G    0
  C    0

First cell: the candidates are 0, 0 + 2 = 2 (diagonal), 0 - 5 = -5 (horizontal) and 0 - 5 = -5 (vertical), so the entry is 2.

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 30: Traceback  and local alignment

A simple example

            A    A    G
       0    0    0    0
  A    0    2
  G    0    ?
  C    0    ?

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 31: Traceback  and local alignment

A simple example

            A    A    G
       0    0    0    0
  A    0    2    ?    ?
  G    0    0    ?    ?
  C    0    0    ?    ?

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 32: Traceback  and local alignment

A simple example

Completed local DP matrix:

            A    A    G
       0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

Page 33: Traceback  and local alignment

Local alignment

• Two differences with respect to global alignment:
  – No score is negative.
  – Traceback begins at the highest score in the matrix and continues until you reach 0.
• Global alignment algorithm: Needleman-Wunsch.
• Local alignment algorithm: Smith-Waterman.
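Both differences show up directly in code. The following is a minimal sketch of Smith-Waterman with a linear gap penalty, assuming the DNA substitution matrix and d = -5 used in the examples; `local_align` is a made-up name, and when several cells share the highest score it simply keeps the first one it meets.

```python
LETTERS = "ACGT"
ROWS = [[2, -7, -5, -7], [-7, 2, -7, -5], [-5, -7, 2, -7], [-7, -5, -7, 2]]
SUBST = {(a, b): ROWS[i][j]
         for i, a in enumerate(LETTERS) for j, b in enumerate(LETTERS)}

def local_align(top, left, subst, gap):
    """Return (score, top_aligned, left_aligned) for one best local alignment."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    best, best_pos = 0, (0, 0)
    for r in range(1, len(left) + 1):
        for c in range(1, len(top) + 1):
            F[r][c] = max(0,                                         # never negative
                          F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])],
                          F[r - 1][c] + gap,
                          F[r][c - 1] + gap)
            if F[r][c] > best:
                best, best_pos = F[r][c], (r, c)

    # Traceback starts at the highest score and stops at the first 0.
    r, c = best_pos
    top_aln, left_aln = [], []
    while F[r][c] > 0:
        if F[r][c] == F[r - 1][c - 1] + subst[(left[r - 1], top[c - 1])]:
            top_aln.append(top[c - 1])
            left_aln.append(left[r - 1])
            r, c = r - 1, c - 1
        elif F[r][c] == F[r - 1][c] + gap:
            top_aln.append("-")
            left_aln.append(left[r - 1])
            r -= 1
        else:
            top_aln.append(top[c - 1])
            left_aln.append("-")
            c -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(left_aln))

print(local_align("AAG", "AGC", SUBST, -5))     # (4, 'AG', 'AG')
print(local_align("AAG", "GAAGGC", SUBST, -5))  # (6, 'AAG', 'AAG')
```

The first row and column stay at 0, so the traceback loop always stops before it could step outside the matrix.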

Page 34: Traceback  and local alignment

A simple example

            A    A    G
       0    0    0    0
  A    0    2    2    0
  G    0    0    0    4
  C    0    0    0    0

Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.

AG
AG

Page 35: Traceback  and local alignment

Local alignment

DP matrix with the first row and column initialized to 0:

            A    A    G
       0    0    0    0
  G    0
  A    0
  A    0
  G    0
  G    0
  C    0

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Page 36: Traceback  and local alignment

Local alignment

Completed local DP matrix:

            A    A    G
       0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

Page 37: Traceback  and local alignment

Local alignment

            A    A    G
       0    0    0    0
  G    0    0    0    2
  A    0    2    2    0
  A    0    2    4    0
  G    0    0    0    6
  G    0    0    0    2
  C    0    0    0    0

Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

AAG
AAG

Page 38: Traceback  and local alignment

Summary

• Local alignment finds the best match between subsequences.

• Smith-Waterman local alignment algorithm:
  – No score is negative.
  – Trace back from the largest score in the matrix.

Page 39: Traceback  and local alignment

Sample problem #1

• Given:
  – Two letters
  – A substitution matrix written as three columns (letter, letter, value)
      A C 0
      A D -2
• Return:
  – The value associated with the two letters
• You must store the substitution matrix in memory.
• You must account for the fact that the order of the letters doesn’t matter.

Page 40: Traceback  and local alignment

Solution outline

• Read the substitution matrix into a dictionary.
  – Each line of the file has three values.
      A C 4
      A D -2
  – Each line is stored in a dictionary with the first two entries as a tuple key and the last entry as the value.

substitutionMatrix[(letter1, letter2)] = value
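One possible way to follow this outline is sketched below. The function names `read_substitution_matrix` and `lookup` are invented for illustration, the values are assumed to be integers, and the two input lines are the ones shown on the slide rather than a real matrix file.

```python
def read_substitution_matrix(lines):
    """Build {(letter1, letter2): value} from 'letter letter value' lines."""
    matrix = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip blank or malformed lines
        letter1, letter2, value = fields
        matrix[(letter1, letter2)] = int(value)   # assumption: integer scores
    return matrix

def lookup(matrix, letter1, letter2):
    """Return the score for a pair of letters, given in either order."""
    if (letter1, letter2) in matrix:
        return matrix[(letter1, letter2)]
    return matrix[(letter2, letter1)]     # raises KeyError if the pair is absent

example_lines = ["A C 4", "A D -2"]
matrix = read_substitution_matrix(example_lines)
print(lookup(matrix, "C", "A"))  # 4
print(lookup(matrix, "A", "D"))  # -2
```

In a script, an open file can be passed instead of the list, for example `read_substitution_matrix(open(sys.argv[1]))` after `import sys`, because iterating over a file yields its lines.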

Page 41: Traceback  and local alignment

Sample problem #2

• Given:
  – A file containing two sequences of equal length (one per line)
  – A substitution matrix written as three columns (letter, letter, value)
• Return:
  – The score of the ungapped alignment between the sequences
• Test using input files from class web page.
  – Solutions: 1 = 69, 2 = 104, 3 = 153

Page 42: Traceback  and local alignment

Solution outline

• Store the substitution matrix in memory, as before.

• Check to be sure the sequences are the same length, and report an error if they are not.

• Use a for loop to traverse both sequences at once.
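The three steps might look like this in Python. It is a sketch only: `ungapped_score` is a hypothetical name, and the small matrix below uses made-up values for illustration, not the class input files.

```python
def ungapped_score(seq1, seq2, matrix):
    """Score two equal-length sequences position by position, without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    total = 0
    for a, b in zip(seq1, seq2):          # traverse both sequences at once
        if (a, b) in matrix:
            total += matrix[(a, b)]
        else:
            total += matrix[(b, a)]       # the order of the letters does not matter
    return total

# Made-up scores for illustration only.
toy_matrix = {("A", "A"): 4, ("A", "C"): 0, ("C", "C"): 9}
print(ungapped_score("AAC", "ACC", toy_matrix))  # 4 + 0 + 9 = 13
```

`zip` pairs up the letters at the same position; a loop over `range(len(seq1))` with indexing would work equally well.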

Page 43: Traceback  and local alignment

One-minute response

At the end of each class

• Write for about one minute.
• Provide feedback about the class.
• Was part of the lecture unclear?
• What did you like about the class?
• Do you have unanswered questions?

I will begin the next class by responding to the one-minute responses.