Anna Yershova Department of Computer Science Duke University February 5, 2010
Anna Yershova
Department of Computer Science
Duke University
February 5, 2010
Automated High-Resolution Protein Structure Determination using
Residual Dipolar Couplings
Feb 5 2010, NC State University
Introduction: Motivation
Protein Structure Determination is Important
High-resolution structures are needed for:
• Determining protein functions
• Protein redesign
[Figure: amino acid sequences, structures, functions, protein redesign]
What is Protein Structure: Primary Structure
The sequence of amino acids forms the backbone. Residues are side chains attached to the backbone.
[Figure: backbone of residues 1 to 4, labeling an amino acid, a side chain, and a dihedral angle]
What is Protein Structure: Secondary Structure Elements
Local folding is maintained by short distance interactions.
What is Protein Structure: 3D Fold
Global 3D folding is maintained by more distant interactions.
[Figure: 3D fold with labeled side chain, beta-strands, alpha-helix, and loop]
High-Throughput Structure Determination Is Important
Source: http://www.metabolomics.ca/News/lectures/CPI2008-short.pdf
The gap between sequences and structures
Current Approaches for Structure Determination
• X-ray crystallography. Difficulty: growing good-quality crystals.
• Nuclear Magnetic Resonance (NMR) spectroscopy. Difficulty: lengthy (and expensive) processing and analysis of experimental data.
Both require expressing and purifying proteins.
Bruce Donald’s Lab
Bruce Donald
Cheng-Yu Chen
John MacMaster
Michael Zeng
Chittu Tripathy
Lincong Wang
Pei Zhou
Types of NMR Spectroscopy Data
• Chemical shift (CS): unique resonance frequency, serves as an ID
• Nuclear Overhauser effect (NOE): local distance restraint between two protons
• Residual dipolar coupling (RDC): global orientational restraint for bond vectors
[Figure: residue with side chain R annotated with example chemical shifts 133.1, 8.9, 4.2 (Ha), and 172.1; an NOE between two protons; the magnetic field B0]
Bailey-Kellogg et al., 2000, 2004
http://www.pnas.org/content/102/52/18890/suppl/DC1
Resonance Assignment Problem
Assigning chemical shifts to each atom
NOE Assignment Problem
Obtain local distance restraints between protons
Bailey-Kellogg et al., 2000, 2004
A famous bottleneck
[Figure: NOESY spectrum and resonance assignments as inputs to NOE assignment]
Structure Determination from NOEs
[Figure: distance matrix over atoms a1, a2, a3, ..., an with known entries (3, 4) and unknown entries (?)]
Distance geometry; assignment ambiguity; NP-hard
[Saxe ’79; Hendrickson ’92, ’95]
Protein Structure Determination is Hard
A famous bottleneck
Traditional Structure Determination Protocol
[Figure: flowchart with boxes for resonance assignments, NOESY spectra, NOE assignments, initial fold, SA/MD, RDCs, XPLOR-NIH, structure refinement, and 3D structures]
Protein Structure Determination is Hard
• Error propagation
• Local minima
• Manual intervention for the initial fold and for evaluation of NOE assignments
A famous bottleneck
Can we have a poly-time algorithm using orientational restraints?
Yes: Wang and Donald, 2004; Wang et al., 2006
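Background: RDCs
RDC Equation for a Single Bond
$$D = \frac{\mu_0 \hbar \gamma_a \gamma_b}{4\pi^2 r_{a,b}^3} \left\langle \frac{3\cos^2\theta - 1}{2} \right\rangle$$
$$D = D_{\max}\, v^{T} S\, v$$
Here v is the unit bond vector between nuclei a and b, θ is the angle between v and the magnetic field B0, and D_max is the dipolar interaction constant (the prefactor of the first equation).
S is the Saupe matrix. S is traceless and symmetric, so it contains 5 degrees of freedom.
[Figure: bond vector v between nuclei a and b relative to B0 in an alignment medium; alignment tensor with principal components Sxx, Syy, Szz]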
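The tensor form of the single-bond RDC equation can be illustrated with a minimal Python sketch. It assumes the form D = D_max vᵀ S v with a unit bond vector v; the function names (`saupe_matrix`, `rdc`) and the default `d_max=1.0` are illustrative choices, not part of RDC-PANDA.

```python
import math

def saupe_matrix(s_yy, s_zz, s_xy=0.0, s_xz=0.0, s_yz=0.0):
    """Build a symmetric, traceless 3x3 Saupe matrix from its 5 free entries."""
    s_xx = -(s_yy + s_zz)  # tracelessness fixes the third diagonal entry
    return [[s_xx, s_xy, s_xz],
            [s_xy, s_yy, s_yz],
            [s_xz, s_yz, s_zz]]

def unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def rdc(S, v, d_max=1.0):
    """Evaluate D = d_max * v^T S v for the (normalized) bond vector v."""
    v = unit(v)
    return d_max * sum(v[i] * S[i][j] * v[j]
                       for i in range(3) for j in range(3))
```

In the principal frame of S (all off-diagonal entries zero), a bond vector along the z axis simply returns `d_max * s_zz`, which is a quick sanity check on the sign and normalization conventions assumed here.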
Traditional Structure Determination vs. RDC-PANDA
[Figure: flowchart of the traditional protocol (resonance assignments, NOESY spectra, NOE assignments, initial fold, SA/MD, RDCs, XPLOR-NIH, structure refinement, 3D structures) next to the RDC-PANDA protocol (RDCs, constant number of NOEs, global fold, RDC-ANALYTIC, PACKER, sidechain placement, NOE assignments, XPLOR-NIH, 3D structures)]
Traditional protocol: protein structure determination is hard
• Error propagation
• Local minima
• Manual intervention for the initial fold and for evaluation of NOE assignments
RDC-PANDA protocol: Zeng et al. (Jour. Biomolecular NMR, 2009)
• Global orientational restraints from RDCs
• Compute the initial fold using exact solutions to the RDC equations
• Resolve NOE assignment ambiguity
• Sparse data (high-throughput, large proteins, membrane proteins)
• Automated side-chain resonance assignment
• Avoid the NP-hard problem of structure determination from NOEs
Importance of Backbone Structure Determination
Current Limitations of RDC-PANDA
Because it requires only 2 RDCs per residue:
• Only secondary structure elements (SSEs) can be reliably determined; NOEs are needed to determine the structure of loops
• Difficulty in handling missing data
My Current Project
• Improve current protein structure determination techniques from our lab
• Design new algorithms for protein backbone structure determination using orientational restraints from RDCs
Literature Overview
• Distance geometry based structure determination
– Braun, 1987
– Crippen and Havel, 1988
– More and Wu, 1999
• Heuristic based structure determination
– Brünger, 1992
– Nilges et al., 1997
– Güntert, 2003
– Rieping et al., 2005
• RDC-based structure determination
– Tolman et al., 1995
– Tjandra and Bax, 1997
– Hus et al., 2001
– Tian et al., 2001
– Prestegard et al., 2004
– Wang and Donald (CSB 2004)
– Wang and Donald (Jour. Biomolecular NMR, 2004)
– Wang, Mettu and Donald (JCB 2005)
– Donald and Martin (Progress in NMR Spectroscopy, 2009)
– Ruan et al., 2008
– Zeng et al. (Jour. Biomolecular NMR, 2009)
• Heuristic based automated NOE assignment
– Mumenthaler et al., 1997
– Nilges et al., 1997, 2003
– Herrmann et al., 2002
– Schwieters et al., 2003
– Kuszewski et al., 2004
– Huang et al., 2006
• Automated NOE assignment starting with initial fold computed from RDCs
– Wang and Donald (CSB 2005)
– Zeng et al. (CSB 2008)
– Zeng et al. (Jour. Biomolecular NMR, 2009)
• Automated side-chain resonance assignment
– Li and Sanctuary, 1996, 1997
– Marin et al., 2004
– Masse et al., 2006
– Zeng et al. (In submission, 2009)
RDC Equation for a Single Bond
$$D = D_{\max}\, v^{T} S\, v$$
• Linear in S: for a fixed v, the equation defines a hyperplane.
• Quadratic in v: for a fixed S, the equation defines a hyperboloid.
[Figure: alignment tensor S with principal components Sxx, Syy, Szz]
1 RDC equation defines a collection of hyperplanes in 7 variables (5 in S, 2 in the unit vector v).
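The linearity in S can be made concrete with a short Python sketch. It assumes the tensor form D = D_max vᵀ S v, a unit vector v = (x, y, z), and the parameterization s = (Syy, Szz, Sxy, Sxz, Syz) with Sxx = -(Syy + Szz); the names `rdc_row` and `rdc_from_row` are hypothetical.

```python
def rdc_row(v):
    """Coefficients of (Syy, Szz, Sxy, Sxz, Syz) in v^T S v.

    v must be a unit vector. Sxx has been eliminated using
    Sxx = -(Syy + Szz), so each bond vector gives one row of 5 numbers.
    """
    x, y, z = v
    return [y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z]

def rdc_from_row(v, s, d_max=1.0):
    """Evaluate the RDC as a dot product, which is linear in s."""
    return d_max * sum(a * b for a, b in zip(rdc_row(v), s))
```

For a fixed v the row is a constant vector, so the set of Saupe parameters s consistent with one measured RDC is a hyperplane in the 5-dimensional space of s. The 2 remaining variables are the two angles that specify the unit vector v.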
RDC Equations for a Protein Portion
[Figure: backbone of residues 1 to 4 with bond vectors v1, u1, v2]
Too few equations, too many variables!
[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, [Epub ahead of print] PMID:19711185, 2009.
Forward Kinematics Reduces the Number of Variables
[Figure: bond vectors v1, u1, v2 along the backbone]
Fix coordinate system.
RDC Equations for a Protein Portion
Recursive representation is possible!
One Equation Per Dihedral Angle is Not Enough!
Each equation is linear in S, and quartic in either tan(φ) or tan(ψ).
To solve this system, additional information is needed.
Possible scenarios:
1. Additional RDC measurement(s) for each dihedral angle.
2. Additional alignment media.
3. Additional NOE data.
4. Modeling (Ramachandran regions, steric clashes, energy function).
5. Sampling (for alignment tensors).
Background: RDC-PANDA
The RDC-PANDA Structure Determination Package
Current requirements
• 2 RDCs per residue to obtain SSE structures
• Sparse NOEs to pack the SSEs
Current bottlenecks
• Missing data (even in long SSEs)
• Long loops
• Sampling for computing alignment tensor(s)
• Sampling for the orientation of the first peptide plane (pp)
When the Saupe Matrix is Known, the Solution Can Be Found Exactly!
Ellipse equations for the CH bond vector
Wang & Donald, 2004; Donald & Martin, 2009.
Solution Structure Deposited Using RDC-PANDA
Solution structure of FF Domain 2 of human transcription elongation factor CA150 (FF2), determined using RDC-PANDA
PDB ID: 2KIQ
In collaboration with Dr. Zhou’s Lab
Current Project
Problem Formulation: NH, CH RDCs in 2 Media
We require measurements for at least 9 consecutive bond vectors (4.5 residues) in 2 media. The goal is to handle more equations and errors.
Relationship to Minimization and SVD
[Figure: vector b and its projection sA onto the A hyperplane]
Solving an overconstrained system of linear equations is equivalent to finding the projection of the vector b onto the A hyperplane. This is also equivalent to minimizing the least-squares function of the terms.
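As a rough illustration of the projection picture, the following Python sketch fits the 5 Saupe parameters to a set of RDCs by least squares using the SVD. It assumes NumPy is available, known bond vectors, the equation D = D_max vᵀ S v, and the parameterization s = (Syy, Szz, Sxy, Sxz, Syz); `rdc_row` and `fit_saupe` are hypothetical names, not functions from RDC-PANDA.

```python
import numpy as np

def rdc_row(v):
    """Row of the linear system for one bond vector (normalized here)."""
    x, y, z = np.asarray(v, dtype=float) / np.linalg.norm(v)
    return np.array([y * y - x * x, z * z - x * x,
                     2 * x * y, 2 * x * z, 2 * y * z])

def fit_saupe(bond_vectors, rdcs, d_max=1.0):
    """Least-squares estimate of s = (Syy, Szz, Sxy, Sxz, Syz) via SVD.

    Returns s and the projection A @ s of the data vector b onto the
    column space of A.
    """
    A = d_max * np.array([rdc_row(v) for v in bond_vectors])
    b = np.asarray(rdcs, dtype=float)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    # Pseudo-inverse solution; tiny singular values are dropped so that a
    # rank-deficient system does not cause a division by zero.
    keep = sigma > 1e-12 * sigma[0]
    coeffs = np.zeros_like(sigma)
    coeffs[keep] = (U.T @ b)[keep] / sigma[keep]
    s = Vt.T @ coeffs
    return s, A @ s
```

With more than 5 independent bond vectors the system is overconstrained, and the residual `b - A @ s` is orthogonal to the columns of A, which is the projection statement above in matrix form.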
[Figure: vector b and sA(φi, ψi), where A now depends on the dihedral angles]
Solving such a system of non-linear equations is not trivial!
There are multiple local minima in the corresponding minimization problem.
Advantages
If the minimization problem is solved, then:
• Computation of packed SSEs and loops is possible without additional NOE data.
• Saupe matrices for each alignment medium can be computed without sampling.
• Missing values are handled robustly.
The Algorithm: Initialization Using Helix
1. Compute an initial approximation for Si using SVD
2. Initialize (φi, ψi) for a helix
3. Compute (φi, ψi) using tree search and minimization
4. Update Si using SVD
The Algorithm: Protein Portion
1. Initialize Si to the computed approximations
2. Compute (φi, ψi) using tree search and minimization
3. Update Si using SVD
The Algorithm: Computing Dihedrals
[Figure: search tree over the dihedral angles φ1, ψ1, ..., φn, ψn, with some branches marked x]
• Minimize each of the RMSD terms as a univariate function.
• Compute the list of best solutions.
• Iteratively minimize the RMSD function.
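The idea of minimizing one angle at a time can be sketched in Python as a cyclic coordinate descent. This is a simplified stand-in under stated assumptions, not the tree search from the talk: it keeps only the single best value per angle rather than a list of best solutions, uses a repeated grid scan as the univariate minimizer, and takes an arbitrary user-supplied `objective` in place of the RDC RMSD function. All names are hypothetical.

```python
import math

def minimize_1d(f, lo, hi, samples=72, rounds=6):
    """Repeated grid scan: pick the best sample, then zoom in around it."""
    best = lo
    for _ in range(rounds):
        step = (hi - lo) / samples
        grid = [lo + k * step for k in range(samples + 1)]
        best = min(grid, key=f)
        lo, hi = best - step, best + step
    return best

def coordinate_descent(objective, angles, sweeps=20, tol=1e-10):
    """Minimize objective(angles) one angle at a time.

    A new angle is accepted only if it does not increase the objective,
    so the objective value never goes up from one step to the next.
    """
    angles = list(angles)
    value = objective(angles)
    for _ in range(sweeps):
        previous = value
        for k in range(len(angles)):
            def along_k(t, k=k):
                # Objective as a univariate function of the k-th angle.
                return objective(angles[:k] + [t] + angles[k + 1:])
            candidate = minimize_1d(along_k, -math.pi, math.pi)
            candidate_value = along_k(candidate)
            if candidate_value <= value:
                angles[k], value = candidate, candidate_value
        if previous - value < tol:
            break
    return angles, value
```

Because each accepted step cannot increase the objective, the sequence of objective values is monotone and bounded below, which is the convergence argument in its simplest form. It says nothing about reaching the global minimum, so a list of several starting candidates per angle would still be needed for that.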
Advantages
• The algorithm converges, since every step minimizes the RMSD function.
• If the data were “perfect”, then the solutions to the minimization problem would be the roots of the polynomials in the RMSD terms, and the algorithm would find ALL of them.
• The minima of the RMSD terms give a good collection of initial structures for finding local and global minima
• Robust handling of missing values
Preliminary Results
Preliminary Results: Ubiquitin Helix
Protein        RMSD (Hz)    Alignment Tensor (Syy, Szz)
Ubq: 25-31     CH: 0.32     (23.66, 16.48)
               NH: 0.24     (53.25, 7.65)
[Figure: experimental RDCs vs. back-computed RDCs for NH RDCs and CH RDCs; both axes range from -60 to 60]
Conformation of the portion [25-31] of the helix of human ubiquitin, computed using NH and CH RDCs in two media (red), superimposed on the same portion from the high-resolution X-ray structure (PDB ID: 1UBQ) (green). The backbone RMSD is 0.58 Å.
Preliminary Results: Ubiquitin Strand
Protein           RMSD (Hz)    Alignment Tensor (Syy, Szz)
Ubq: beta 2-7     CH:          (53.32, 4.83)
                  NH:          (48.03, 14.32)
[Figure: experimental RDCs vs. back-computed RDCs for NH RDCs and CH RDCs; both axes range from -60 to 40]
Conformation of the portion [2-7] of the beta-strand of human ubiquitin, computed using NH and CH RDCs in two media, superimposed on the same portion from the high-resolution X-ray structure (PDB ID: 1UBQ). The backbone RMSD is 1.151 Å.
Conclusions
• A complete and exhaustive search over the space of all structures minimizing the RDC fit function seems feasible, thanks to an understanding of the structure of the solution.
• Possible and exciting extensions to more/different data
Funding: NIH
Thank you!
Comparison
Data requirements vs. Accuracy (Ubiquitin):
Accuracy: Sparse