Segment Alignment (SEA) Yuzhen Ye Adam Godzik The Burnham Institute.

Page 1

Segment Alignment (SEA)

Yuzhen Ye, Adam Godzik

The Burnham Institute

Page 2

Outline

• A new look at the local structure prediction

• Network matching problem

• Practical issues

• Applications

Page 3

GSDKKGNGVALMTTLFADN

EEEEEEHHHHHHHH HHHHHH

EEEEEE LLHHHHHHHHLLL

LHHHHHLLLLLLLEEEEEEEEE

LLLLL

Description of local structure: one or many answers?

GSDKKGNGVALMTTLFADN

LLHHHHHHHHLLLEEEEEE A prediction

HHHHHHHHLLLLLHHHHHH Real structure

Page 4

Motivation

• A natural description of local structures: keep the segment information of local structures

• Keep uncertainties in local structure predictions: drawbacks of prediction programs and intrinsic uncertainties of local structures in the absence of global interactions

Incorporating protein local structure into protein sequence comparison may help detect distant homologies and improve their alignments (for homology modeling).

Page 5

Proteins are described as a network of PLSSs (predicted local structure segments)
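A minimal sketch of such a network, under simplifying assumptions that are not taken from the slides: each PLSS is an edge between residue positions, and a valid description of the protein is a chain of segments that covers the sequence from source (position 0) to sink (the sequence length) with no gaps or overlaps. The class name `PLSS`, the helpers `enumerate_paths` and `render`, and the segment boundaries are all hypothetical, chosen so that the paths reproduce two of the local structure strings shown for GSDKKGNGVALMTTLFADN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PLSS:
    start: int   # first residue covered (0-based, inclusive)
    end: int     # one past the last residue covered (exclusive)
    state: str   # local structure label: "H", "E" or "L"


def enumerate_paths(segments, length):
    """Yield every chain of segments covering residues 0..length-1 exactly once."""
    by_start = {}
    for seg in segments:
        by_start.setdefault(seg.start, []).append(seg)

    def extend(pos, chain):
        if pos == length:              # reached the sink
            yield tuple(chain)
            return
        for seg in by_start.get(pos, []):
            yield from extend(seg.end, chain + [seg])

    yield from extend(0, [])           # start at the source


def render(chain):
    """Expand a chain of segments into a per-residue structure string."""
    return "".join(seg.state * (seg.end - seg.start) for seg in chain)


SEQUENCE = "GSDKKGNGVALMTTLFADN"
# Hypothetical segment boundaries, for illustration only.
SEGMENTS = [
    PLSS(0, 6, "E"), PLSS(6, 14, "H"), PLSS(14, 19, "L"),
    PLSS(0, 2, "L"), PLSS(2, 10, "H"), PLSS(10, 13, "L"), PLSS(13, 19, "E"),
]

descriptions = sorted(render(c) for c in enumerate_paths(SEGMENTS, len(SEQUENCE)))
for d in descriptions:
    print(d)
```

The seven overlapping and mutually contradictory segments yield two complete descriptions of the same sequence, which is the sense in which the network keeps the uncertainty instead of committing to one prediction.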

Page 6

The protein comparison problem is equivalent to a network matching problem

Given two networks of PLSSs, find an optimal pair of paths, one from the source to the sink in each network, whose corresponding PLSSs are most similar to each other.

This does not follow the typical position-by-position alignment mode.
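To make the problem statement concrete, the following sketch solves a tiny instance by exhaustive search: it enumerates every source-to-sink path in each network and scores every pair of paths with a segment-by-segment alignment. Everything here is an assumption for illustration: the tuple layout `(start, end, state)`, the toy `seg_score` similarity, the `GAP` penalty, and the two example networks. The slides do not specify SEA's scoring, and exhaustive search is only feasible for very small networks.

```python
from itertools import product

GAP = -2  # toy penalty for leaving a whole segment unaligned (assumption)


def paths(segments, length):
    """Yield every chain of (start, end, state) segments tiling 0..length."""
    def extend(pos, chain):
        if pos == length:
            yield chain
            return
        for seg in segments:
            if seg[0] == pos:
                yield from extend(seg[1], chain + [seg])
    yield from extend(0, [])


def seg_score(a, b):
    """Toy similarity: shared length if the states agree, its negative otherwise."""
    shared = min(a[1] - a[0], b[1] - b[0])
    return shared if a[2] == b[2] else -shared


def align_chains(p, q):
    """Global alignment score of two chains, aligning segment to segment."""
    rows, cols = len(p) + 1, len(q) + 1
    score = [[0] * cols for _ in range(rows)]
    for r in range(1, rows):
        score[r][0] = score[r - 1][0] + GAP
    for c in range(1, cols):
        score[0][c] = score[0][c - 1] + GAP
    for r in range(1, rows):
        for c in range(1, cols):
            score[r][c] = max(
                score[r - 1][c - 1] + seg_score(p[r - 1], q[c - 1]),
                score[r - 1][c] + GAP,
                score[r][c - 1] + GAP,
            )
    return score[-1][-1]


def best_pair(net_a, len_a, net_b, len_b):
    """Return (score, path_a, path_b) for the best-scoring pair of paths."""
    candidates = product(list(paths(net_a, len_a)), list(paths(net_b, len_b)))
    return max(((align_chains(p, q), p, q) for p, q in candidates),
               key=lambda item: item[0])


# Two small hypothetical networks with contradictory segments.
NET_A = [(0, 4, "E"), (0, 4, "H"), (4, 10, "H"), (4, 6, "L"), (6, 10, "E")]
NET_B = [(0, 3, "H"), (3, 5, "L"), (5, 9, "E"), (0, 5, "E"), (5, 9, "H")]

best_score, best_a, best_b = best_pair(NET_A, 10, NET_B, 9)
print(best_score, [s[2] for s in best_a], [s[2] for s in best_b])
```

Note that the search selects a path in each network and the alignment between them at the same time: neither path is fixed in advance.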

Page 7

Solving the network matching problem: dynamic programming

$$V(i, j) = \max_{\text{all combinations of } E(i),\, E(j)} V(i_k, j_l)$$

[Figure: dynamic programming matrix with cell V(i, j) and entries V(i1, j1), V(i1, j2), V(i3, j1), V(i3, j2); network edges labeled (i-1)1, (i-1)3, (i-1)4, i1, i2.]
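The dynamic programming idea can be sketched as follows, under stated assumptions rather than as the SEA implementation. Here E(i) is read as the set of segments ending at position i, V(i, j) as the best score for matching the two networks up to positions i and j, and each combination of a segment from E(i) with a segment from E(j) contributes one candidate. The tuple layout, the toy `seg_score`, the `gap` penalty, and the example networks are all hypothetical, and the sketch returns only the optimal score, without the traceback that would recover the two paths.

```python
from functools import lru_cache


def seg_score(a, b):
    """Toy similarity: shared length if the states agree, its negative otherwise."""
    shared = min(a[1] - a[0], b[1] - b[0])
    return shared if a[2] == b[2] else -shared


def sea_dp(net_a, len_a, net_b, len_b, gap=-2):
    """Best score over all pairs of source-to-sink paths and their alignments."""
    ends_a, ends_b = {}, {}            # position -> segments ending there, E(i)
    for seg in net_a:
        ends_a.setdefault(seg[1], []).append(seg)
    for seg in net_b:
        ends_b.setdefault(seg[1], []).append(seg)

    @lru_cache(maxsize=None)
    def V(i, j):
        if i == 0 and j == 0:          # both paths are at their source
            return 0
        best = float("-inf")           # stays -inf for unreachable positions
        for a in ends_a.get(i, []):
            best = max(best, V(a[0], j) + gap)              # a left unaligned
            for b in ends_b.get(j, []):
                best = max(best, V(a[0], b[0]) + seg_score(a, b))  # a matched to b
        for b in ends_b.get(j, []):
            best = max(best, V(i, b[0]) + gap)              # b left unaligned
        return best

    return V(len_a, len_b)


# Two small hypothetical networks with contradictory segments.
NET_A = [(0, 4, "E"), (0, 4, "H"), (4, 10, "H"), (4, 6, "L"), (6, 10, "E")]
NET_B = [(0, 3, "H"), (3, 5, "L"), (5, 9, "E"), (0, 5, "E"), (5, 9, "H")]

optimal = sea_dp(NET_A, 10, NET_B, 9)
print(optimal)
```

Because every segment moves strictly forward along the sequence, each V(i, j) depends only on cells with smaller indices, so memoization visits each pair of positions once instead of enumerating all pairs of paths.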

Page 8

Example: (1e68A,1nkl)

Each protein is represented as a collection of potentially overlapping and contradictory PLSSs (a network).

SEA finds an optimal alignment between these two proteins.

Simultaneously, SEA identifies the optimal subset of PLSSs (a path in the network) describing each protein.

1e68A: bacteriocin AS-48

1nkl: NK-lysin

Page 9

Subset                   Measure        CE   SEA_true  SEA_c30  SEA_c10  SEA_c5  SEA_1d  BLAST  ALIGN  FFAS
Family (409 pairs)       average-shift  -    0.61      0.56     0.56     0.54    0.49    0.44   0.48   0.49
                         shift>0.9      -    73        69       63       56      47      51     60     43
                         shift>0.7      -    207       199      192      183     152     146    165    161
                         shift>0.5      -    282       260      259      251     215     197    228    227
                         RMSD3.0        257  95        82       82       76      63      77     54     40
                         RMSD5.0        397  237       184      171      177     147     157    138    118
                         RMSD8.0        408  294       248      249      249     231     196    206    194
                         all            409  345       404      398      368     366     232    372    409
Superfamily (225 pairs)  average-shift  -    0.27      0.12     0.12     0.12    0.08    0.09   0.06   0.07
                         shift>0.9      -    3         3        3        2       0       1      2      1
                         shift>0.7      -    17        8        9        7       4       10     9      7
                         shift>0.5      -    54        26       23       21      17      18     18     17
                         RMSD3.0        55   12        6        6        7       6       8      3      1
                         RMSD5.0        160  44        16       18       18      11      18     11     1
                         RMSD8.0        163  69        37       34       41      28      23     22     15
                         all            166  128       217      204      181     177     41     149    225

General performance of SEA incorporating different local structure diversities

Page 10

Keeping local structure diversity helps improve alignment quality

Alignment between λ-repressor from E. coli (1lliA) and 434 repressor (1r69)

Page 11

Stable region

Variable region

Local structure information is crucial for improving alignments, especially in the more divergent regions

1esfA: staphylococcal enterotoxin

2tssA: toxic shock syndrome toxin-1

Page 12

Practical issue: local structure prediction

• Searching the I-sites database (web server or standalone program)

• Our solution: FragLib

– uses the sensitive profile-profile alignment program FFAS to predict local structures

Page 13

Applications

• Distant homology detection

• Local structure prediction

• Improving alignments for protein modeling

Page 14

Reference: A segment alignment approach to protein comparison (Bioinformatics, April issue)

Web server: http://ffas.ljcrf.edu/sea

Page 15

Related work

• Spliced sequence alignment

– Gelfand et al., 1996, PNAS; Novichkov et al., 2001

– Assembling genes from alternative exons

• Jumping alignment

– Spang R, Rehmsmeier M, Stoye J. JCB, 2002

– Computes a local alignment of a single sequence and a multiple alignment

– At each position, the sequence is aligned to one sequence of the multiple alignment (the reference sequence) instead of a profile

• Partial order alignment

– Lee C, Grasso C, Sharlow MF. Bioinformatics, 2002

– Multiple alignment

Page 16

Acknowledgements

• Dariusz Plewczyński

• Iddo Friedberg

• Łukasz Jaroszewski

• Weizhong Li

• This project is supported by SPAM grant GM63208