Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps

Frank DiMaio, Jude Shavlik, George N. Phillips, Jr.
ICML Bioinformatics Workshop, 21 August 2003


Task Overview

Given
• Electron density for a region in a protein
• The protein’s topology

Find
• Positions of individual atoms in the density map


Pictorial Structures

A pictorial structure is a collection of image parts together with a deformable configuration of these parts


Pictorial Structures

Formally, a model consists of

Set of parts V = {v1, …, vn}

Configuration L = (l1, …, ln), where li is the placement of part vi

Edges eij ∈ E connect neighboring parts vi, vj

– Explicit dependency between li and lj

– G = (V, E) forms a Markov Random Field

Appearance parameters Ai for each part

Connection parameters Cij for each edge

[Figure: example tree-structured model with parts v1 to v6 and edges e13, e23, e34, e35, e46]
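The formal definition can be made concrete with a small container. The sketch below assumes Python 3.9+ and uses a hypothetical `PictorialStructure` class (not from the source) that holds parts, edges, and per-part (Ai) and per-edge (Cij) parameters. Its `is_tree` check confirms that the example graph with edges e13, e23, e34, e35, e46 is a tree, which the matching algorithms require.

```python
from dataclasses import dataclass, field


@dataclass
class PictorialStructure:
    """Hypothetical container for a pictorial-structure model."""
    parts: list[str]                                   # V = {v1, ..., vn}
    edges: list[tuple[str, str]]                       # E: neighboring part pairs
    appearance: dict[str, object] = field(default_factory=dict)              # Ai per part
    connection: dict[tuple[str, str], object] = field(default_factory=dict)  # Cij per edge

    def is_tree(self) -> bool:
        """A graph is a tree if it has n - 1 edges and is connected."""
        if len(self.edges) != len(self.parts) - 1:
            return False
        neighbors = {p: [] for p in self.parts}
        for a, b in self.edges:
            neighbors[a].append(b)
            neighbors[b].append(a)
        seen, stack = set(), [self.parts[0]]
        while stack:  # depth-first search from the first part
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbors[node])
        return len(seen) == len(self.parts)


model = PictorialStructure(
    parts=["v1", "v2", "v3", "v4", "v5", "v6"],
    edges=[("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5"), ("v4", "v6")],
)
print(model.is_tree())  # True
```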


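Matching Algorithm Overview

Want the configuration L of model Θ maximizing

$$P(L \mid I, \Theta) \propto P(I \mid L, \Theta) \cdot P(L \mid \Theta)$$

$$P(I \mid L, \Theta) = \prod_i P(I \mid l_i, \Theta) = \frac{1}{Z_1} e^{-\sum_i \mathrm{match}_i(l_i)}$$

$$P(L \mid \Theta) = \prod_{(v_i, v_j) \in E} P(l_i, l_j \mid C_{ij}) = \frac{1}{Z_2} e^{-\sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)}$$

Equivalent to minimizing

$$\sum_i \mathrm{match}_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$$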
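The minimization above can be sketched as dynamic programming over a tree. The sketch assumes that each part takes one of h discrete locations and that edges are listed as (parent, child) pairs. `match_tree` and `pair_cost` are hypothetical names. The inner step compares every parent location with every child location, which is the O(h²) per-edge cost behind the quadratic-time DP.

```python
import numpy as np


def match_tree(match, edges, pair_cost, root=0):
    """Minimize sum_i match_i(l_i) + sum_(i,j) d_ij(l_i, l_j) exactly on a tree.

    match:     array of shape (n_parts, h); match[i, l] = match_i(l)
    edges:     list of (parent, child) index pairs forming a tree rooted at `root`
    pair_cost: callable (parent, child) -> array (h, h); entry [lp, lc] = d_ij(lp, lc)
    Returns (minimum total cost, list of optimal location indices per part).
    """
    n_parts = match.shape[0]
    children = {i: [] for i in range(n_parts)}
    for parent, child in edges:
        children[parent].append(child)

    best_child_loc = {}  # child -> best child location for each parent location

    def subtree_cost(i):
        cost = match[i].astype(float)  # astype returns a copy
        for c in children[i]:
            # (h, h) table over (parent location, child location): the O(h^2) step
            total = pair_cost(i, c) + subtree_cost(c)[None, :]
            best_child_loc[c] = total.argmin(axis=1)
            cost += total.min(axis=1)
        return cost

    root_cost = subtree_cost(root)
    config = {root: int(root_cost.argmin())}
    stack = [root]
    while stack:  # backtrack from the root to recover the optimal configuration
        i = stack.pop()
        for c in children[i]:
            config[c] = int(best_child_loc[c][config[i]])
            stack.append(c)
    return float(root_cost.min()), [config[i] for i in range(n_parts)]


# Usage: three parts on a 1D grid of 5 locations; each child ideally sits one step past its parent.
rng = np.random.default_rng(0)
locs = np.arange(5)
cost, layout = match_tree(
    rng.random((3, 5)),
    edges=[(0, 1), (0, 2)],
    pair_cost=lambda p, c: (locs[None, :] - locs[:, None] - 1.0) ** 2,
)
```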


Linear-Time Matching Algorithm

A dynamic programming implementation runs in time quadratic in the number of possible part locations

Requires a tree-structured graph of parts

Felzenszwalb & Huttenlocher (2000) developed a linear-time matching algorithm

Places an additional constraint on the part-to-part cost function dij (it must be a distance between transformed part locations)

Basic “trick”: parallelize the minimization over the entire grid using a generalized distance transform
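The sketch below shows the 1D case of the generalized distance transform with a squared-distance cost. For every parent location p, it computes the minimum over q of (p - q)² + f(q) in linear time by building the lower envelope of parabolas, where f would hold the child's accumulated cost. Higher-dimensional grids can be handled by applying the transform along each axis in turn. The function name is hypothetical, and f is assumed to be non-empty and finite.

```python
import numpy as np


def distance_transform_1d(f):
    """Return D with D[p] = min_q ((p - q)**2 + f[q]) for p, q in range(len(f)).

    Runs in O(n) by building the lower envelope of the parabolas (x - q)**2 + f[q].
    Assumes f is non-empty and contains finite values.
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    v = np.zeros(n, dtype=int)  # grid positions of the parabolas in the envelope
    z = np.empty(n + 1)         # boundaries between consecutive envelope parabolas
    k = 0
    z[0], z[1] = -np.inf, np.inf
    for q in range(1, n):
        while True:
            # Intersection of the parabola rooted at q with the one rooted at v[k]
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
            if s > z[k]:
                break
            k -= 1  # parabola v[k] is hidden by the new one; drop it
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = np.inf
    d = np.empty(n)
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        d[p] = (p - v[k]) ** 2 + f[v[k]]
    return d
```

A brute-force version of the same minimum costs O(n²), so the transform replaces the quadratic inner step of the tree DP with a linear one whenever dij has this squared-distance form.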


Pictorial Structures for Map Interpretation

Basic idea: build a pictorial structure that can model all configurations of a molecule

Each part in the “collection of parts” corresponds to an atom

The model has low-cost configurations for low-energy states of the molecule


The Screw-Joint Model

Ideally, the cost function would equal the atomic energy

Problem: the atomic energy function cannot be represented with pairwise potentials while maintaining a tree structure (non-bonded interactions would add edges between atoms that are not bonded, creating cycles)

Solution: the screw-joint model

Ignore non-bonded interactions

Edges correspond to covalent bonds

Allow free rotation around bonds
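Free rotation about a bond can be illustrated with Rodrigues' rotation formula. Rotating a downstream atom about the bond axis changes only the dihedral, leaving the atom's distance to every point on the axis unchanged. The function name and coordinates below are hypothetical.

```python
import numpy as np


def rotate_about_bond(point, bond_start, bond_end, alpha):
    """Rotate `point` by `alpha` radians about the axis through bond_start and bond_end
    (Rodrigues' rotation formula)."""
    origin = np.asarray(bond_start, dtype=float)
    p = np.asarray(point, dtype=float) - origin
    k = np.asarray(bond_end, dtype=float) - origin
    k = k / np.linalg.norm(k)  # unit vector along the bond
    rotated = (p * np.cos(alpha)
               + np.cross(k, p) * np.sin(alpha)
               + k * np.dot(k, p) * (1.0 - np.cos(alpha)))
    return rotated + origin


# Hypothetical atoms: a bond along z and a downstream atom off the axis.
start, end, atom = [0.0, 0.0, 0.0], [0.0, 0.0, 1.5], [1.0, 0.0, 2.0]
moved = rotate_about_bond(atom, start, end, np.pi / 2)  # approximately [0, 1, 2]
```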


Screw-Joint Model Details

Each part’s configuration has six parameters (x, y, z, α, β, γ), where

(x, y, z) is the part’s position

α is the part’s rotation about the bond connecting vi and vj

(β, γ) is the part’s orientation

[Figure: parts vi and vj with positions (xi, yi, zi) and (xj, yj, zj), orientations (βi, γi) and (βj, γj), rotations αi and αj, and offset (xij, yij, zij)]

The part-to-part cost function dij is based on the child’s deviation from its ideal configuration

The matching cost function matchi is based on a 3×3×3 template match
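The slide does not give the exact functional forms of matchi and dij. The sketch below makes two assumptions. First, matchi is a sum-of-squared-differences template match over a 3×3×3 voxel neighbourhood. Second, dij is a weighted squared distance between the child's position and its ideal position relative to the parent. Both forms and all names are assumptions. A squared-distance dij is also the kind of cost the generalized distance transform supports.

```python
import numpy as np


def match_cost(density, template, center):
    """Hypothetical match_i: sum of squared differences between a 3x3x3 template and
    the 3x3x3 block of the density map centred on grid index `center`.
    Assumes `center` is at least one voxel away from every edge of the map."""
    x, y, z = center
    patch = density[x - 1:x + 2, y - 1:y + 2, z - 1:z + 2]
    return float(np.sum((patch - template) ** 2))


def deviation_cost(child_pos, parent_pos, ideal_offset, weight=1.0):
    """Hypothetical d_ij: weighted squared distance between the child's position and
    its ideal position (parent position plus an ideal bond offset already expressed
    in map coordinates, i.e. rotated by the parent's orientation)."""
    ideal = np.asarray(parent_pos, dtype=float) + np.asarray(ideal_offset, dtype=float)
    diff = np.asarray(child_pos, dtype=float) - ideal
    return weight * float(np.dot(diff, diff))


density = np.random.default_rng(0).random((5, 5, 5))
template = density[1:4, 1:4, 1:4].copy()
print(match_cost(density, template, (2, 2, 2)))  # 0.0, a perfect match
```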


Pictorial Structures for Map Interpretation

Ideally, we would build a pictorial structure for the entire protein and run the matching algorithm to get the best layout

However, this is computationally infeasible

Instead, we use a two-phase algorithm that

a) computes the best backbone trace

b) computes the best sidechain conformation (current focus)


Sidechain Refinement

Assume we have a rough Cα trace of the protein

Next, use pictorial structure matching to place sidechains

Walk along the chain one residue at a time, placing individual atoms

[Figure: Cα trace with residues MET_80, ARG_81, ALA_82, PRO_83 labeled]


Sidechain Refinement

Given: residue type and approximate Cα locations

Find: the most likely locations of the sidechain atoms in the residue

Example: Alanine

[Figure: alanine example showing atoms N, Cα, C, O, Cβ and neighboring atoms C-1, Cα-1, O-1, N+1, Cα+1, placed by the matching algorithm]


Learning Model Parameters

[Figure: learned alanine model. Panels: Averaged 3D Template; Averaged Bond Geometry; Canonic Orientation (atoms N, C-1, Cα, C, Cβ, O, N+1)]

Example learned bond geometry: r = 1.53 Å, θ = 0.0°, φ = -19.3°; r = 1.51 Å, θ = 118.4°, φ = -19.7°
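The slide does not state the angle convention for (r, θ, φ). The sketch below assumes that θ is the azimuth in the canonic frame's xy-plane and φ is the elevation above that plane. Under that reading, the two listed bonds meet at about 108°, close to the ideal tetrahedral angle of roughly 109.5°. That makes the reading plausible, though it is not confirmed by the source. Function names are hypothetical.

```python
import numpy as np


def bond_vector(r, theta_deg, phi_deg):
    """Hypothetical convention: theta is the azimuth in the canonic frame's xy-plane,
    phi is the elevation above that plane, r is the bond length in angstroms."""
    theta, phi = np.radians(theta_deg), np.radians(phi_deg)
    return r * np.array([np.cos(phi) * np.cos(theta),
                         np.cos(phi) * np.sin(theta),
                         np.sin(phi)])


def bond_angle_deg(b1, b2):
    """Angle between two bond vectors sharing the same central atom."""
    cos_angle = np.dot(b1, b2) / (np.linalg.norm(b1) * np.linalg.norm(b2))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


b1 = bond_vector(1.53, 0.0, -19.3)
b2 = bond_vector(1.51, 118.4, -19.7)
print(round(bond_angle_deg(b1, b2), 1))  # 108.1 under this assumed convention
```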


Soft Maximums

Sometimes the optimal match is an illegal configuration, like the one shown in the figure

When this occurs, explore the space of non-optimal solutions via soft maximums in the DP

Basic idea: take a path with probability inversely proportional to its cost

[Figure: actual structure vs. predicted match 1]
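One way to make a "soft" choice in the DP backtracking step is to sample a path instead of taking the argmin. The slide states that the probability is inversely proportional to cost. The sketch below instead uses Boltzmann weighting, p ∝ exp(-cost / T). This is an assumption, chosen because the temperature T gives a natural "softness" knob. `soft_choice` is a hypothetical name.

```python
import numpy as np


def soft_choice(costs, temperature, rng):
    """Sample an index with probability proportional to exp(-cost / temperature).

    As temperature -> 0 this approaches the hard argmin used by ordinary DP;
    larger temperatures make non-optimal choices more likely ("softer").
    """
    costs = np.asarray(costs, dtype=float)
    weights = np.exp(-(costs - costs.min()) / temperature)  # shift avoids overflow
    return int(rng.choice(len(costs), p=weights / weights.sum()))


rng = np.random.default_rng(0)
picks = [soft_choice([0.0, 1.0, 3.0], temperature=1.0, rng=rng) for _ in range(5)]
```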


Soft Maximums

The figure shows solutions found with soft maximums

The red molecule is eventually found

Annealing increases the “softness” until a legal structure is found

A legal structure may not be the “right” one

[Figure: actual structure vs. predicted matches 1 and 2]
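The annealing step can be sketched as a loop that raises the temperature until a sampled structure passes a legality check. The clash test (no non-bonded atoms closer than 2.0 Å), the doubling schedule, and all names are assumptions for illustration. In the toy example, one atom's cheapest position clashes with another atom.

```python
import numpy as np


def no_clashes(coords, bonded, min_dist=2.0):
    """Hypothetical legality test: no non-bonded atom pair closer than min_dist angstroms."""
    coords = np.asarray(coords, dtype=float)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) in bonded or (j, i) in bonded:
                continue
            if np.linalg.norm(coords[i] - coords[j]) < min_dist:
                return False
    return True


def anneal_until_legal(sample_structure, is_legal, rng, t0=0.01, growth=2.0, max_rounds=50):
    """Raise the temperature ("softness") until a sampled structure is legal."""
    temperature = t0
    for _ in range(max_rounds):
        structure = sample_structure(temperature, rng)
        if is_legal(structure):
            return structure, temperature
        temperature *= growth
    return None, temperature


# Toy example: atom 2 has two candidate positions; the cheaper one clashes with atom 0.
fixed = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
candidates = np.array([[0.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
costs = np.array([0.0, 5.0])
bonds = {(0, 1), (1, 2)}


def sample(temperature, rng):
    weights = np.exp(-(costs - costs.min()) / temperature)
    pick = rng.choice(len(costs), p=weights / weights.sum())
    return np.vstack([fixed, candidates[pick]])


structure, temp = anneal_until_legal(sample, lambda s: no_clashes(s, bonds), np.random.default_rng(0))
```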


Results

Only sidechain refinement implemented and tested

Experimental Methodology

Assume Cα positions are known to within 2 Å

Trained on a 1.7 Å resolution protein, tested on a 1.9 Å resolution protein

Templates built for ALA, VAL, TYR, LYS

Model Parameters

Grid spacing of 0.5 Å within a sphere of diameter 10 Å

Rotational discretization: 12 rotational steps, 84 orientations
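The sketch below counts the candidate placements for one part, to show why the linear-time algorithm matters. It assumes a cubic 0.5 Å grid inside a sphere of radius 5 Å. It also assumes that the 12 rotation steps and 84 orientations combine independently. That gives roughly (4/3)π·5³/0.5³ ≈ 4,200 grid points and 1,008 rotations, or about 4.2 million configurations per part. A naive quadratic DP would therefore compare on the order of 10^13 location pairs per edge.

```python
import itertools

spacing = 0.5          # grid spacing in angstroms (from the slide)
radius = 10.0 / 2      # sphere of diameter 10 angstroms
steps = int(radius / spacing)

# Count lattice points (i, j, k) * spacing lying inside the sphere.
grid_points = sum(
    1
    for i, j, k in itertools.product(range(-steps, steps + 1), repeat=3)
    if (i * i + j * j + k * k) * spacing ** 2 <= radius ** 2
)
rotations = 12 * 84    # assumed: 12 steps about the bond x 84 (beta, gamma) orientations
configurations = grid_points * rotations
print(grid_points, rotations, configurations)
```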


Sidechain Placement

Compared predicted vs. actual locations for 599 atoms in the test-set protein

29.9% of atoms within 0.5 Å

72.3% of atoms within 1.0 Å

93.0% of atoms within 2.0 Å

Recall the 0.5 Å grid spacing

[Figure: cumulative fraction of atoms (% atoms, 0 to 1) placed within a given distance of their actual position (Accuracy (angstroms), 0 to 8)]
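The placement statistics can be computed as below, assuming predicted and actual coordinates are paired arrays in angstroms. `fraction_within` is a hypothetical helper. For the slide's 599 atoms, 29.9% within 0.5 Å corresponds to about 179 atoms.

```python
import numpy as np


def fraction_within(predicted, actual, thresholds):
    """Fraction of atoms whose predicted position lies within each distance threshold."""
    errors = np.linalg.norm(np.asarray(predicted, float) - np.asarray(actual, float), axis=1)
    return {t: float(np.mean(errors <= t)) for t in thresholds}


# Hypothetical coordinates (angstroms) for three atoms, with errors of 0.0, 1.0 and 3.0 angstroms.
predicted = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
actual = [[0.0, 0.0, 0.0]] * 3
print(fraction_within(predicted, actual, [0.5, 1.0, 2.0]))
```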


Predictive Accuracy Task

We used the DP matching score as a predictor of amino acid type

Tested on 49 ALA, LYS, TYR, and VAL residues

The highest-scoring normalized template determined the type

61.2% accuracy (majority-class baseline = 33%)

Confusion matrix (rows: actual type; columns: predicted type)

        ala  lys  tyr  val
ala       9    2    3    2
lys       0    8    1    0
tyr       1    7    6    0
val       0    2    1    7
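Both reported numbers can be computed from a confusion matrix whose rows are actual types and whose columns are predicted types. For the slide's 49 residues, 61.2% corresponds to 30 correct predictions. The 33% baseline corresponds to a most common actual type with 16 residues (16/49 ≈ 32.7%). The helper name and the 2-class example below are hypothetical.

```python
import numpy as np


def accuracy_and_baseline(confusion):
    """confusion[a, p] = number of residues of actual type a predicted as type p.

    Returns (accuracy, majority-class baseline), where the baseline always
    predicts the most common actual type.
    """
    confusion = np.asarray(confusion)
    total = confusion.sum()
    accuracy = np.trace(confusion) / total
    majority_baseline = confusion.sum(axis=1).max() / total
    return float(accuracy), float(majority_baseline)


# Hypothetical 2-class example: 7 of 10 correct; the larger actual class has 6 members.
print(accuracy_and_baseline([[3, 1], [2, 4]]))  # (0.7, 0.6)
```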


The Good… PREDICTED vs. ACTUAL

[Figure: predicted vs. actual sidechain placements for LYSINE, VALINE, and TYROSINE]


… and the Bad: PREDICTED vs. ACTUAL

[Figure: predicted vs. actual sidechain placements for LYSINE, ALANINE, TYROSINE, and VALINE]


Future Work

Implement and integrate the backbone-tracing algorithm to create a complete two-tiered solution

Better strategies to handle illegal molecule configurations

perturbation of branches involved in collisions

a more accurate representation of the atomic energy function, e.g., torsion angles

Better match function … make use of previous work?

More tests (larger training set, higher resolution)


Acknowledgements

NLM grant 1T15 LM007359-01

NLM grant 1R01 LM07050-01

NIH grant P50 GM64598.