Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne

Why Align Structures?

• Additional measure of protein similarity

• Structure generally preserved better than sequence over the course of evolution

• May help in protein fold identification

• Interesting combinatorial problem

The Structural Alignment Problem

• We know how to optimally superimpose two proteins of the same length so as to minimize RMSD (Hendrickson, 1979)

• However, no obvious way to compare objects of different length, or to optimally add or remove gaps

• Heuristic methods for structural alignment are the best we can do at the moment
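For reference, the RMSD being minimized is the usual root-mean-square deviation over matched atom positions. The symbols N, x_i and y_i are notation introduced here, not taken from the slides: N is the number of matched positions, and x_i, y_i are the coordinates of the i-th matched atoms after superposition.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert x_i - y_i \rVert^{2}}
```

The formula needs a one-to-one pairing of positions, which is why it does not directly apply to structures of different length or to alignments with gaps.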

Alignment Fragment Pairs

For a pair of proteins A and B, an alignment fragment pair (AFP) is defined as a continuous segment of A aligned against a continuous segment of B of the same size (without gaps).

If n1 and n2 are the lengths of A and B, and AFP length is set to m, then there is a total of (n1 − m)(n2 − m) AFPs.
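As a rough illustration of the definition, the sketch below enumerates AFPs as pairs of 0-based start positions. The function name `enumerate_afps` and the example sizes are assumptions made for this example only. It counts every window that fits, so it yields (n1 − m + 1)(n2 − m + 1) pairs, one more start position per protein than the count quoted on the slide.

```python
def enumerate_afps(n1, n2, m):
    """Yield (start_a, start_b) for every gap-free fragment pair of length m.

    Starts are 0-based; a fragment starting at s covers residues s .. s+m-1.
    """
    for start_a in range(n1 - m + 1):
        for start_b in range(n2 - m + 1):
            yield (start_a, start_b)


# Illustrative sizes only: proteins of length 10 and 12, fragments of length 8.
afps = list(enumerate_afps(n1=10, n2=12, m=8))
print(len(afps))  # 3 start positions in A times 5 in B = 15
```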

Defining an Alignment

• An alignment is defined as a continuous path of AFPs of fixed length m such that for every two consecutive AFPs there may be gaps inserted into either A or B, but not into both. That is, for every two consecutive AFPs i and i+1, we have

(1) $p^A_{i+1} = p^A_i + m$ and $p^B_{i+1} = p^B_i + m$, or

(2) $p^A_{i+1} > p^A_i + m$ and $p^B_{i+1} = p^B_i + m$, or

(3) $p^A_{i+1} = p^A_i + m$ and $p^B_{i+1} > p^B_i + m$,

where $p^A_i$ represents the starting position of AFP i in protein A, and $p^B_i$ its starting position in protein B.
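The three cases can be checked mechanically. The sketch below assumes they read as follows: both start positions advance by exactly m (case 1), only the position in A advances further (case 2), or only the position in B advances further (case 3). The name `valid_successor` and the tuple representation are illustrative choices, not part of the original presentation.

```python
def valid_successor(prev, nxt, m):
    """Return True if AFP `nxt` may directly follow AFP `prev` in a path.

    prev, nxt: (start_a, start_b) start positions of the two AFPs.
    m: the fixed AFP length.
    """
    skip_a = nxt[0] - (prev[0] + m)  # residues of A skipped between fragments
    skip_b = nxt[1] - (prev[1] + m)  # residues of B skipped between fragments
    case_1 = skip_a == 0 and skip_b == 0  # fragments are contiguous in both
    case_2 = skip_a > 0 and skip_b == 0   # residues skipped in A only
    case_3 = skip_a == 0 and skip_b > 0   # residues skipped in B only
    return case_1 or case_2 or case_3


print(valid_successor((0, 0), (8, 8), 8))    # True: contiguous in both
print(valid_successor((0, 0), (10, 11), 8))  # False: skips in both proteins
```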

The CE Algorithm

• Goal: Find a “good” local alignment for structures of proteins A and B.

• Basic idea:

1. Select some initial AFP.

2. Build an alignment path by incrementally adding AFPs in a way that satisfies the conditions on the previous slide.

3. Repeat step (2) until the length of each protein is traversed, or until no “good” AFPs remain.
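The three steps can be sketched as a greedy loop. This is only a skeleton under stated assumptions: `is_good` and `score` are hypothetical callables standing in for whatever acceptance and ranking heuristics are chosen, and picking the lowest-scoring candidate at each step is a simplification rather than the authors' exact search strategy.

```python
def build_path(start, candidates, m, is_good, score):
    """Greedily extend an alignment path starting from the AFP `start`.

    candidates: iterable of (start_a, start_b) AFPs of length m (m >= 1).
    is_good(path, afp): hypothetical acceptance test for a candidate.
    score(path, afp): hypothetical ranking; the lowest score is added.
    """
    def may_follow(prev, nxt):
        # Residues skipped in A and in B between the two fragments.
        skip_a = nxt[0] - (prev[0] + m)
        skip_b = nxt[1] - (prev[1] + m)
        # Neither may be negative, and at least one must be zero.
        return min(skip_a, skip_b) == 0

    candidates = list(candidates)
    path = [start]  # step 1: the initial AFP
    while True:
        options = [afp for afp in candidates
                   if may_follow(path[-1], afp) and is_good(path, afp)]
        if not options:  # step 3: no acceptable AFP remains
            return path
        # step 2: append the best acceptable AFP
        path.append(min(options, key=lambda afp: score(path, afp)))


# Toy run: fragments of length 2, every candidate accepted, and the
# candidate with the smallest start positions preferred.
m = 2
candidates = [(a, b) for a in range(5) for b in range(5)]
path = build_path((0, 0), candidates, m,
                  is_good=lambda path, afp: True,
                  score=lambda path, afp: afp[0] + afp[1])
print(path)  # [(0, 0), (2, 2), (4, 4)]
```

The loop always terminates because each appended AFP starts at least m positions after the previous one in both proteins.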

Algorithm Specifics

• How do we choose the starting AFP?

• What are the criteria for adding AFPs to our alignment path?

• How do we know when to stop? That is, at what point do we know that there are no “good” AFPs left?

There are various heuristics that could be used to supply answers to the above questions.

Sample Heuristics: AFP Distances

• We can define the distance between two different AFPs i and j as:

$$D_{ij} = \frac{1}{m} \sum_{k=1}^{m} \left| d_A(p^A_i + k - 1,\; p^A_j + m - k) - d_B(p^B_i + k - 1,\; p^B_j + m - k) \right|$$

Here, $d_A(p,q)$ represents the distance between the alpha carbon atoms at positions p and q in protein A, and $d_B(p,q)$ is the same quantity in protein B. Setting i=j, and using the same formula, we can define the distance $D_{ii}$ between two fragments of the same AFP.
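A minimal sketch of this distance, assuming it is the average over k of the absolute difference between an intra-A distance and the corresponding intra-B distance, with residue k of fragment i paired against residue m−1−k of fragment j (0-based). The function name, the coordinate lists, and the toy coordinates are all assumptions for illustration.

```python
import math


def afp_distance(ca_a, ca_b, afp_i, afp_j, m):
    """Distance D_ij between two AFPs, using 0-based start positions.

    ca_a, ca_b: lists of (x, y, z) alpha-carbon coordinates of A and B.
    afp_i, afp_j: (start_a, start_b) start positions of AFPs i and j.
    m: the fixed AFP length.
    """
    total = 0.0
    for k in range(m):
        # Distance inside protein A between the paired residues.
        d_a = math.dist(ca_a[afp_i[0] + k], ca_a[afp_j[0] + m - 1 - k])
        # The corresponding distance inside protein B.
        d_b = math.dist(ca_b[afp_i[1] + k], ca_b[afp_j[1] + m - 1 - k])
        total += abs(d_a - d_b)
    return total / m


# Toy coordinates: A is a straight line with unit spacing, B is the same
# line stretched by a factor of two.
ca_a = [(float(i), 0.0, 0.0) for i in range(6)]
ca_b = [(2.0 * i, 0.0, 0.0) for i in range(6)]
print(afp_distance(ca_a, ca_a, (0, 0), (3, 3), 3))  # 0.0 for identical shapes
print(afp_distance(ca_a, ca_b, (0, 0), (3, 3), 3))  # (5 + 3 + 1) / 3 = 3.0
```

Passing the same AFP as both `afp_i` and `afp_j` gives the within-AFP distance described above. `math.dist` requires Python 3.8 or later.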

Sample Heuristics: Extending the Alignment Path

• Suppose our alignment path already consists of AFPs 0…n−1, and we are trying to decide whether to add AFP n to the path. We will do so only if:

(4) $D_{nn} < D_0$

(5) $\frac{1}{n} \sum_{i=0}^{n-1} D_{in} < D_1$

(6) $\frac{1}{n^2} \sum_{i=0}^{n} \sum_{j=0}^{n} D_{ij} < D_1$

Extending Alignment Path (Cont)

Where:

• D0 and D1 are specified cut-off distances.

• The decision whether AFP n is “fit” is based on condition (4).

• The decision whether AFP n “works” with all the other AFPs in the path is based on condition (5).

• The decision whether we should extend the alignment path at all is based on condition (6).
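The three tests can be combined into one check over a precomputed distance matrix. The sketch assumes condition (4) is D_nn < D0, condition (5) is the mean of D_in over the n AFPs already in the path compared against D1, and condition (6) is the double sum of D_ij over i, j = 0…n divided by n², also compared against D1. The function name and the cut-off values in the example are illustrative only.

```python
def should_extend(D, n, d0, d1):
    """Decide whether candidate AFP n may be appended to the path 0..n-1.

    D: square matrix (list of lists) with D[i][j] for all AFPs 0..n.
    d0, d1: cut-off distances.
    """
    if n < 1:
        raise ValueError("the path must already contain at least one AFP")
    # Condition (4): the candidate AFP is itself a good fragment match.
    fit = D[n][n] < d0
    # Condition (5): the candidate agrees with every AFP already in the path.
    works = sum(D[i][n] for i in range(n)) / n < d1
    # Condition (6): the path as a whole, candidate included, is still good.
    whole = sum(D[i][j] for i in range(n + 1)
                for j in range(n + 1)) / n ** 2 < d1
    return fit and works and whole


# Toy matrix for a path holding AFP 0 plus candidate AFP 1.
D = [[1.0, 2.0],
     [2.0, 1.0]]
print(should_extend(D, n=1, d0=3.0, d1=4.0))  # False: condition (6) gives 6.0
print(should_extend(D, n=1, d0=3.0, d1=7.0))  # True with a looser D1
```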

Alignment Assessment and Post-alignment Optimization

• To assess how good the alignment produced by CE is, we can compare it to the alignment of a random pair of structures, and compute a Z-score based on the RMSD and the number of gaps in the final alignment.

• Since CE does not penalize gaps, we can perform additional optimization after CE completes, using dynamic programming to remove excess gaps.
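The slides do not spell out the statistical model behind the Z-score, so the sketch below only shows the generic definition: how many standard deviations an observed value lies from the mean of a background sample. The function name and the numbers are made up for illustration and are not CE results.

```python
import statistics


def z_score(observed, background):
    """Standard z-score of `observed` against a sample of background values."""
    mean = statistics.mean(background)
    spread = statistics.stdev(background)  # sample standard deviation
    return (observed - mean) / spread


# Made-up RMSD values standing in for alignments of random structure pairs.
random_rmsds = [4.0, 5.0, 6.0]
print(z_score(2.0, random_rmsds))  # -3.0: well below the random background
```

A strongly negative value here means the observed RMSD is much lower than what random pairs produce. Combining RMSD with the gap count would need a model that the slides do not describe.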

Results and Conclusion

• The CE method is highly configurable, which is at once its strength and weakness. Adjusting multiple parameters, such as AFP length m, cutoff distances D0 and D1, and definitions for AFP distances, can result in varying alignments and execution speeds.

Results and Conclusion

• In general, CE does not outperform previously existing structural alignment methods, such as Dali and VAST: it does better for some pairs of structures, and worse for others.

• Since it is fairly straightforward and easy to implement, CE provides an interesting addition to the toolbox of structural alignment algorithms.