RECOORD
REcalculated COORdinates Database
Aart Nederveen, Bijvoet Center for Biomolecular Research, Utrecht University
Jurgen Doreleijers, Center for Eukaryotic Structural Genomics, University of Wisconsin
Wim Vranken, Macromolecular Structure Database, European Bioinformatics Institute
Aim
• Recalculation of protein structures from deposited NMR restraints using state-of-the-art methods
• Goals:
  • decrease user- and software-dependent biases
  • allow a better comparison between structures
  • compare different structure calculation programs
  • provide a database for the development and assessment of validation tools and calculation protocols
Overview of the recalculation project (design of RECOORD)
• Restraint manipulation:
  • PDB: coordinates, restraints
  • BMRB: STAR files (Doreleijers et al. 2003)
  • EBI/UU: generation of consistent STAR files
• Recalculation:
  • CYANA: sequence, MD SA, …
  • CNS: topology, MD SA, refinement
• Analysis: improvement? correlations? …
Databases now publicly available
• DOCR/FRED (BMRB): databases containing converted and filtered restraints, http://www.bmrb.wisc.edu/servlets/MRGridServlet
• RECOORD (EBI): database containing recalculated coordinates, http://www.ebi.ac.uk/msd/recoord
Selection
• Restraint formats (for entries with distance restraints):
  • CNS/XPLOR
  • DIANA/DYANA/CYANA
  • DISCOVER/MSI
• PDB entries selected:
  • only proteins
  • no HET atoms
  • multimers allowed (not yet recalculated)
  • at least 20 residues
• In total, 545 monomers were selected
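The selection criteria above can be written as a single filter. This is a minimal sketch, not the procedure actually used for RECOORD: the `Entry` record and its field names are hypothetical, and the format labels are plain strings copied from the list above.

```python
from dataclasses import dataclass, field

# Restraint formats accepted for selection (labels copied from the list above).
SUPPORTED_FORMATS = {"CNS/XPLOR", "DIANA/DYANA/CYANA", "DISCOVER/MSI"}


@dataclass
class Entry:
    """Hypothetical summary of one PDB entry; fields are illustrative only."""
    pdb_id: str
    is_protein: bool
    has_het_atoms: bool
    n_chains: int
    n_residues: int
    has_distance_restraints: bool
    restraint_formats: set = field(default_factory=set)


def is_selected(entry: Entry, monomers_only: bool = True) -> bool:
    """Return True if the entry passes all selection criteria."""
    return (
        entry.has_distance_restraints
        and bool(entry.restraint_formats & SUPPORTED_FORMATS)
        and entry.is_protein
        and not entry.has_het_atoms
        and entry.n_residues >= 20
        # Multimers are allowed, but only monomers were recalculated so far.
        and (entry.n_chains == 1 or not monomers_only)
    )
```

Passing `monomers_only=False` admits multimers as well, mirroring that they are allowed but not yet recalculated.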
Conversion issues
• Data are converted into formats readable by the calculation software (e.g. XPLOR/CNS and CYANA) with the FormatConverter, available within the CCPN software (Wim Vranken, EBI).
Problems:
• Differences between coordinate and restraint data, e.g.:
  • 1 chain in the PDB entry, 2 chains in the restraint list
  • residue numbering that differs between the PDB entry and the restraint list
  • restraints for residues not present in the PDB entry…
• Nomenclature in the restraint list
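Two of the numbering problems above (a constant offset between numbering schemes, and restraints that refer to residues absent from the coordinates) can be detected with a simple check. The sketch below is a hypothetical illustration, not the FormatConverter's algorithm: `DistanceRestraint`, `best_offset` and `renumber_and_check` are made-up names, and a single constant offset per chain is assumed.

```python
from typing import Iterable, NamedTuple


class DistanceRestraint(NamedTuple):
    """Hypothetical minimal distance restraint between two atoms."""
    res_i: int
    atom_i: str
    res_j: int
    atom_j: str
    upper: float  # upper distance bound in Å


def best_offset(restraints, pdb_residues: set,
                candidates: Iterable[int] = range(-50, 51)) -> int:
    """Offset that maps the most restraint residue numbers onto residues
    present in the PDB entry; ties go to the offset closest to zero."""
    def hits(off: int) -> int:
        return sum((r.res_i + off in pdb_residues) + (r.res_j + off in pdb_residues)
                   for r in restraints)
    return max(candidates, key=lambda off: (hits(off), -abs(off)))


def renumber_and_check(restraints, pdb_residues: set, offset: int = 0):
    """Apply `offset`, then split restraints into those whose residues exist in
    the PDB entry and orphans that refer to missing residues."""
    kept, orphans = [], []
    for r in restraints:
        shifted = r._replace(res_i=r.res_i + offset, res_j=r.res_j + offset)
        both_present = shifted.res_i in pdb_residues and shifted.res_j in pdb_residues
        (kept if both_present else orphans).append(shifted)
    return kept, orphans
```

Chain splitting and atom nomenclature need separate handling and are not covered here.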
Building topology
• Starting script: generate_easy.inp from CNS
• Automated detection in the original ensemble of:
  • Disulfide bridges (S-S distance < 3 Å in the first models of the original ensemble)
  • Cis peptides (|ω| < 25° in the first models of the original ensemble)
  • Protonation state of histidines (using the CNS patches HISD, HISE)
• CYANA: sequence based on the CNS topology
  • CYSS, HIST, HIST+ and cPRO added to the sequence
• Automated generation of disulfide restraints
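The two geometric criteria above (S-S distance below 3 Å for a disulfide bridge, |ω| below 25° for a cis peptide) can be sketched for a single model as follows. This is an illustration under assumptions, not the CNS-based implementation used in the project: coordinates are plain (x, y, z) tuples in Å and the function names are hypothetical. The torsion uses the standard atan2 formula with the IUPAC sign convention, and ω is assumed to be measured over CA(i), C(i), N(i+1), CA(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def find_disulfides(sg_coords: dict, cutoff: float = 3.0) -> list:
    """Pairs of cysteine residue numbers whose SG atoms lie within `cutoff` Å."""
    ids = sorted(sg_coords)
    return [(i, j) for k, i in enumerate(ids) for j in ids[k + 1:]
            if math.dist(sg_coords[i], sg_coords[j]) < cutoff]


def is_cis(omega_deg: float, cutoff: float = 25.0) -> bool:
    """Cis peptide if |omega| < cutoff (omega over CA(i), C(i), N(i+1), CA(i+1))."""
    return abs(omega_deg) < cutoff
```

Applying these checks to several models and combining the results is left open, since the slides only mention "first models".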
CONDOR computer cluster, Computer Sciences, University of Wisconsin-Madison
• More than 800 processors used
• Total CPU time: 31,169 hours (about 3.5 years on a single workstation)
• Example: 2EZM (101 residues), calculation of 1 model on a 2.2 GHz Pentium 4 computer:
  • CYANA: 31 seconds
  • CNS: 340 seconds
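A quick check of the timing figures above, as simple arithmetic (the helper names are illustrative): 31,169 CPU hours come to roughly 3.5 to 3.6 years on one processor, and for 2EZM one CNS model takes about 11 times as long as one CYANA model.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap days


def cpu_hours_to_years(hours: float) -> float:
    """Years needed if all CPU hours ran one after another on a single processor."""
    return hours / HOURS_PER_YEAR


def time_ratio(slow_seconds: float, fast_seconds: float) -> float:
    """How many times longer the slower calculation takes."""
    return slow_seconds / fast_seconds


print(f"{cpu_hours_to_years(31_169):.2f} years")  # 3.56 years
print(f"CNS/CYANA: {time_ratio(340, 31):.1f}x")   # CNS/CYANA: 11.0x
```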
Evaluation of structure quality
• Agreement with experimental restraints
• Improvement?
• Comparison of CNS and CYANA
• Relation between NMR data quality and structural quality
Distance restraint violations
ORG (original entries): 0.08 ± 0.14 Å
CNW (recalculated in CNS and refined in water): 0.04 ± 0.05 Å
[Figure: frequency distribution of RMS distance restraint violations (Å)]
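For reference, the RMS distance restraint violation of one model can be computed from the restraint bounds as sketched below. This assumes simple two-atom restraints with explicit lower and upper bounds, counts satisfied restraints as zero violation, and ignores ambiguous restraints and r⁻⁶ averaging; the exact definition behind the statistics above may differ. Function names are hypothetical.

```python
import math


def distance_violation(d: float, lower: float, upper: float) -> float:
    """Amount (Å) by which distance d lies outside [lower, upper]; 0 if satisfied."""
    return max(0.0, d - upper, lower - d)


def rms_distance_violation(distances, bounds) -> float:
    """RMS violation over all restraints of one model.

    distances: model distances (Å), one per restraint
    bounds: (lower, upper) pairs (Å), in the same order
    """
    if len(distances) != len(bounds) or not distances:
        raise ValueError("need one distance per restraint")
    v = [distance_violation(d, lo, up) for d, (lo, up) in zip(distances, bounds)]
    return math.sqrt(sum(x * x for x in v) / len(v))


def ensemble_rms(models, bounds) -> float:
    """Average of the per-model RMS violations over an ensemble."""
    return sum(rms_distance_violation(m, bounds) for m in models) / len(models)
```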
Dihedral restraint violations
ORG (original entries): 1.6 ± 4.6°
CNW (recalculated in CNS and refined in water): 0.5 ± 0.5°
[Figure: frequency distribution of RMS dihedral restraint violations (degrees)]
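Dihedral restraints need one extra step compared with distance restraints: angles are periodic, so a violation must be measured the short way around the circle. A minimal sketch, assuming each restraint is an allowed range [lower, upper] in degrees (narrower than 360°) that may wrap through ±180°; the function names are hypothetical.

```python
import math


def dihedral_violation(angle: float, lower: float, upper: float) -> float:
    """Angular distance (degrees) from `angle` to the allowed range [lower, upper].

    The range runs from `lower` to `upper` in the direction of increasing angle
    and may wrap through 180/-180 degrees. Returns 0 if the angle lies inside it.
    """
    width = (upper - lower) % 360.0
    delta = (angle - lower) % 360.0
    if delta <= width:
        return 0.0
    # Outside: distance past `upper`, or distance back round to `lower`.
    return min(delta - width, 360.0 - delta)


def rms_dihedral_violation(angles, ranges) -> float:
    """RMS dihedral violation over all restraints of one model."""
    if len(angles) != len(ranges) or not angles:
        raise ValueError("need one angle per restraint")
    v = [dihedral_violation(a, lo, up) for a, (lo, up) in zip(angles, ranges)]
    return math.sqrt(sum(x * x for x in v) / len(v))
```

For example, an angle of 150° against the wrapped range [160°, -160°] is 10° out, not 310°.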
Results: quality indicators
Performance of CNS vs. CYANA (no water refinement yet)
Average values over 545 entries (Original PDB | CNS recalculation | CYANA recalculation):
• RMS distance restraint violations (Å): 0.08 ± 0.14 | 0.04 ± 0.06 | 0.04 ± 0.05
• RMS dihedral restraint violations (degrees): 1.6 ± 4.6 | 0.5 ± 0.7 | 0.5 ± 0.7
• Packing quality, Z-score (WHATCHECK): -3.5 ± 1.9 | -4.1 ± 1.9 | -4.3 ± 1.8
• Bumps per 100 residues: 73 ± 63 | 11 ± 9 | 86 ± 37
• % residues in most favoured Ramachandran regions (PROCHECK): 69 ± 14 | 69 ± 13 | 61 ± 14
Results: quality indicators
Performance of CNS before and after water refinement
Average values over 545 entries (Original PDB | CNS recalculation | CNS + water refinement):
• RMS distance restraint violations (Å): 0.08 ± 0.14 | 0.04 ± 0.06 | 0.04 ± 0.05
• RMS dihedral restraint violations (degrees): 1.6 ± 4.6 | 0.5 ± 0.7 | 0.5 ± 0.5
• Packing quality, Z-score (WHATCHECK): -3.5 ± 1.9 | -4.1 ± 1.9 | -2.5 ± 2.0
• Bumps per 100 residues: 73 ± 63 | 11 ± 9 | 10 ± 7
• % residues in most favoured Ramachandran regions (PROCHECK): 69 ± 14 | 69 ± 13 | 76 ± 11
Improvement: packing and Ramachandran Z-scores
For ~5% of the entries no improvement was possible, because NMR data available to the original authors were missing.
[Figure: improvement in Ramachandran Z-score vs. improvement in packing Z-score; entries with missing data marked]
Improvement in Z-score: ΔZ = Z_refined - Z_original
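Applied per entry, the definition of ΔZ gives a positive value when the refined structure scores better, since for these WHATCHECK Z-scores higher values are better. A small sketch with made-up numbers (not RECOORD data); the function names are hypothetical.

```python
def z_improvement(z_original, z_refined):
    """Per-entry improvement dZ = Z_refined - Z_original (positive = better)."""
    if len(z_original) != len(z_refined):
        raise ValueError("need paired values per entry")
    return [zr - zo for zo, zr in zip(z_original, z_refined)]


def fraction_improved(z_original, z_refined) -> float:
    """Fraction of entries whose Z-score increased after refinement."""
    dz = z_improvement(z_original, z_refined)
    return sum(d > 0 for d in dz) / len(dz)


# Made-up packing Z-scores for three entries (illustration only).
orig = [-3.5, -4.0, -2.0]
refined = [-2.5, -3.0, -2.0]
print(z_improvement(orig, refined))      # [1.0, 1.0, 0.0]
print(fraction_improved(orig, refined))  # 0.6666666666666666
```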
In search of correlations (Pearson coefficient)
                           dens    viol    circ    pack    Rama   bumps
data density                  .   -0.23   -0.46    0.35    0.31   -0.03
RMS violations            -0.11       .    0.22   -0.25   -0.37    0.58
circular variance         -0.32    0.00       .   -0.60   -0.67    0.25
packing (Z-score)          0.32   -0.06   -0.49       .    0.69   -0.39
Ramachandran (Z-score)     0.16   -0.11   -0.48    0.48       .   -0.51
bumps                      0.04    0.04    0.07   -0.21   -0.47       .
dens = data density, viol = RMS violations, circ = circular variance, pack = packing Z-score, Rama = Ramachandran Z-score; diagonal left empty (.)
Lower triangle: original structures (correlations lower); upper triangle: recalculated and refined structures (correlations higher).
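The correlation table combines two Pearson correlation matrices in one grid, one computed on the original structures and one on the refined structures, with the diagonal left empty. A minimal sketch of that construction, assuming the per-entry values of each indicator are available as equal-length, non-constant lists; the function names are hypothetical, and placing the original set in the lower triangle is a choice of this illustration.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combined_matrix(original: dict, refined: dict):
    """Correlation grid: lower triangle from `original`, upper from `refined`.

    Both dicts map an indicator name to its per-entry values, with the same
    names in the same order. The diagonal is left as None.
    """
    names = list(original)
    grid = [[None] * len(names) for _ in names]
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i > j:
                grid[i][j] = pearson(original[a], original[b])
            elif i < j:
                grid[i][j] = pearson(refined[a], refined[b])
    return names, grid
```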
In search of correlations (Bumps)
[Same correlation matrix as above; this slide focuses on bumps]
In search of correlations (NMR data density)
[Same correlation matrix as above; this slide focuses on NMR data density]
Correlation between NMR data density and Ramachandran Z-score
[Figure: Ramachandran Z-score vs. NMR data density, r = 0.31]
Correlation between NOE completeness and packing Z-score
[Figure: packing Z-score vs. NOE completeness, r = 0.20]
NMR data-based indicators do not give a reliable indication of the normality of the structures.
In search of correlations (Precision)
[Same correlation matrix as above; this slide focuses on precision (circular variance)]
Correlation between precision and data density
[Figure: circular variance vs. NMR data density, r = -0.46]
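Precision is expressed here as circular variance. For N dihedral angles θ taken from the models of an ensemble, a common definition is CV = 1 - R, where R is the length of the mean unit vector (1/N)Σ(cos θ, sin θ); CV is 0 when all models agree and approaches 1 when the angles are spread out. A sketch of that definition follows; how it was averaged over residues and dihedral types for the plots above is not stated, so the averaging helper is an assumption.

```python
import math


def circular_variance(angles_deg) -> float:
    """Circular variance 1 - R of a set of angles (degrees), R = mean resultant length."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one angle")
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return 1.0 - math.hypot(c, s)


def mean_circular_variance(per_residue_angles) -> float:
    """Hypothetical summary: average circular variance over residues, where each
    item holds one dihedral angle per model of the ensemble."""
    values = [circular_variance(angles) for angles in per_residue_angles]
    return sum(values) / len(values)
```

Unlike an ordinary variance, this treats 170° and -170° as 20° apart, not 340°.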
Correlation between precision and Ramachandran Z-score
[Figure: circular variance vs. Ramachandran plot appearance (Z-score), r = -0.67; entry 1SUT marked]
Proteins with a high Ramachandran normality tend to have a small circular variance.
Correlation between RMSD and structural uncertainty (QUEEN)
[Figure: backbone RMSD (Å) vs. structural uncertainty, r = -0.69]
Structural uncertainty imposes a lower limit on the RMSD.
Conclusions I
• NMR-STAR files made consistent for 545 out of about 1700 entries
• Protocols and scripts available for recalculation in CYANA and CNS
• Validation database available for testing of new protocols
• Improvement compared to the original data: quality Z-scores on average 1 standard deviation closer to the X-ray database
  • violations in the original data do not limit the recalculation effort
  • refinement in water is required
  • 5% show no improvement: data missing
Conclusions II
• Correlations are higher after recalculation and refinement, though most of them are still weak
• Highest correlations: precision vs. Ramachandran Z-score, and precision vs. structural uncertainty (QUEEN)
Acknowledgements
• Utrecht University: Alexandre Bonvin, Rob Kaptein
• EBI Cambridge: Wim Vranken
• CESG/BMRB: Jurgen Doreleijers, Zachary Miller, Eldon Ulrich, John Markley
• Radboud University Nijmegen: Chris Spronk, Sander Nabuurs
• RIKEN Japan: Peter Güntert
• Institut Pasteur Paris: Michael Nilges