Post on 18-Dec-2021
CCP4 @ APS 2014
CCP4 School Chicago 2014Experimental Phasing with shelx c/d/e
Tim GrüneGeorg-August-UniversitätInstitut für Strukturchemie
http://shelx.uni-ac.gwdg.de/˜tgtg@shelx.uni-ac.gwdg.de
Tim Grüne Phasing with Shelx C/D/E 1/34
CCP4 @ APS 2014
Overview
• shelx c/d/e
• Anode: Analysis by anomalous signal
• MrTailor: Improved external restraints
Tim Grüne Phasing with Shelx C/D/E 2/34
CCP4 @ APS 2014
Where we are
Data Integration AnomalousDifferences
SubstructureSolution
Phasing & DensityModification
Building &Refinement
MolecularReplacement
shelxc shelxd shelxe −→
Tim Grüne Phasing with Shelx C/D/E 3/34
CCP4 @ APS 2014
shelx c/d/e [1]
shelxc shelxd shelxe
• Extracts anomalous signal• Prepares native data• Analyses data sets
• Determines substructure • Density modification• Poly-ALA autobuilding
Tim Grüne Phasing with Shelx C/D/E 4/34
CCP4 @ APS 2014
hkl2map
http://webapps.embl-hamburg.de/hkl2map/
Tim Grüne Phasing with Shelx C/D/E 5/34
CCP4 @ APS 2014
Documentation
Most shelx programs issue “short” usage instruction when called without an argument.tg@slartibartfast:~$ shelxc
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ SHELXC - Create input files for SHELXD and SHELXE - Version 2013/2 ++ Copyright (c) George M. Sheldrick 2003-13 ++ Started at 13:59:57 on 10 Jun 2014 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
SHELXC reads a filename stem (denoted here by 'xx') on the command lineplus some instructions from 'standard input'. It writes some statistics to'standard output' and prepares the three files needed to run SHELXD andSHELXE. SHELXC can be called from a GUI by a command line such as:
shelxc xx <t
which would read the instructions from the file t, or (under most UNIXsystems) by a simple shell script that includes the instructions, e.g.
shelxc xx <<EOFCELL 49.70 57.90 74.17 90 90 90SPAG P212121SAD elastase.scaFIND 12<<EOFshelxd xx_fashelxe xx xx_fa -s0.37 -m20 -h -bshelxe xx xx_fa -s0.37 -m20 -h -b -i
More information including tutorials available at http://shelx.uni-ac.gwdg.de/SHELX/.
Tim Grüne Phasing with Shelx C/D/E 6/34
CCP4 @ APS 2014
Shelxc Data Preparation: Keywords
shelxc can be used for six different phasing scenarios:
SAD • SAD
• (NAT)
SIRAS • SIRA
• NAT
MAD• (NAT)• PEAK
• INFL
• LREM
• HREMSIR • SIR
• NAT
RIP • BEFORE
• AFTER
• (NAT)
Each keyword takes the filename of the corresponding integrated dataset.
Tim Grüne Phasing with Shelx C/D/E 7/34
CCP4 @ APS 2014
The Substructure
• Coordinates of anomalous scatterers• Anomalous difference∣∣∣|F+(hkl)| − |F−(hkl)|
∣∣∣ ≈ |Fsub(hkl)|
corresponds to small molecule data set• Shelxd: solve substructure with direct methods
Tim Grüne Phasing with Shelx C/D/E 8/34
CCP4 @ APS 2014
Shelxc Output Files
“shelxc mymad” creates three files:
mymad fa.ins Text file with instructions for shelxd
mymad fa.hkl Artificial substructure data set from which shelxd determines substructure coordinates. Eachline contains
h, k, l,∣∣∣|F+(hkl)| − |F−(hkl)|
∣∣∣ , αα is not used by shelxd, but by shelxe to calculate an initial phase estimate for the protein structure as
φT (hkl) = φA(hkl) + α(hkl)
MAD/SIRAS: exact α; SIR or SAD: rough estimate of αφA is the phase angle calculated from the substructure coordinates determined by shelxd.
mymad.hkl native data used by shelxe for phasing and density modification
Tim Grüne Phasing with Shelx C/D/E 9/34
CCP4 @ APS 2014
Shelxc: Resolution Cut-off for Anomalous Signal85349 Reflections read from SAD file XDS_pk1pk2.HKL
12186 Unique reflections, highest resolution 7.199 Angstroms141.7 Friedel pairs used on average for local scaling
Resl. Inf. 16.01 12.71 11.10 10.09 9.36 8.81 8.37 8.01 7.70 7.43 7.20N(data) 1149 1124 1115 1099 1108 1102 1126 1078 1091 1122 1072Chi-sq 1.30 1.22 1.24 1.27 1.37 1.55 1.58 1.62 1.49 1.35 1.43<I/sig> 41.7 28.7 24.4 22.1 16.5 13.3 9.2 7.4 5.3 3.8 2.2%Complete 98.9 99.9 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 98.2<d"/sig> 4.82 2.79 1.90 1.53 1.26 1.04 1.00 0.89 0.85 0.80 0.71CC(1/2) 94.8 81.8 65.2 53.8 39.6 25.0 23.6 9.3 5.3 -6.6 -14.4
• anomalous signal where CC(1/2) > 30%
• anomalous signal where <d"/sig> > 1.3
• CC > 30% usually more reliable than <d"/sig> > 1.3
Tim Grüne Phasing with Shelx C/D/E 10/34
CCP4 @ APS 2014
Shelxd — Finding the Substructure
shelxd mymad_fa
reads the “substructure data” mymad fa.hkl and its instructions from mymad fa.ins. The most important entriesin mymad fa.ins:
SFAC SE atom type to look forFIND 12 expected number of substructure atoms, should be within 20 % of the actual
number (try several for e.g. a soak where the number is not known)SHEL 999 3.3 resolution limits of the anomalous signal. High resolution limit can be
critical, but the default of dmin +0.5 Å works well in many cases.NTRY 10000 number of trials.ESEL 1.5 Minimum |E|-value used in peak picking during dual-space recycling. At low
resolution, try e.g. ESEL 1.2
Tim Grüne Phasing with Shelx C/D/E 11/34
CCP4 @ APS 2014
Shelxd Output
While shelxd runs, the best solution is written to mymad fa.res which contains the substructure coordinates infractional coordinates and which is later read by shelxe.
REM Best SHELXD solution: CC 60.74 CC(weak) 49.22 CFOM 109.96TITL mymad_fa.ins MAD in C2CELL 0.98000 109.02 61.75 71.74 90.00 97.08 90.00LATT -7SYMM -X, Y, -ZSFAC SEUNIT 192SE01 1 0.758774 0.508636 0.246391 1.0000 0.2SE02 1 0.792908 0.398262 0.138903 0.8845 0.2
[...]SE10 1 0.925819 0.231575 0.191291 0.5569 0.2SE11 1 0.495239 0.183609 0.416278 0.5352 0.2SE12 1 0.643097 0.029221 0.210653 0.4897 0.2 <---SE13 1 0.811539 0.048553 0.227752 0.1453 0.2 <---SE14 1 0.600281 0.156860 0.149628 0.0764 0.2HKLF 3END
The sixth column contains the occupancy of thecorresponding atom. A sharp drop (here betweenSE12 and SE13) is a promising sign of a correctsolution.The correlation coefficient (CC and CCweak) inthe first line measures the reliability of the solution
For SAD, a CC of more than 30 % is a safe sign of a correct solution, for MAD the limit is about 40 %.
Tim Grüne Phasing with Shelx C/D/E 12/34
CCP4 @ APS 2014
Shelxe: Phasing, Density Modification, Model Building
No .ins-tructions file. All parameters provided as command line options after data file names.
A typical and one of the most simple command line could be
shelxe mymad mymad_fa -s0.65 -h -a
mymad read native data mymad.hkl
mymad fa read angle estimate for α from mymad fa.hkl, substructure coordinates from mymad fa.res (theshelxd output)
-s0.65 Assume a solvent content of 65%. The solvent content is one of the most critical parameters for shelxeand it is worth testing various settings
-h substructure atoms present in native data mymad.hkl
-a run 5 (default) cycles of poly-ALA autotracing.
Tim Grüne Phasing with Shelx C/D/E 13/34
CCP4 @ APS 2014
Shelxe -i: Inverted Substructure
It is impossible to distinguish the substructure from its enantiomer with the anomalous data and there is a 50 %chance that the coordinates in mymad fa.res are inverted w.r.t. the correct substructure.
Therefore shelxe must always be run twice
• with the direct substructure• with the inverted substructure, i.e. with the same options as the direct hand plus
the switch -i. This inverts the hand and takes care of everything necessary (e.g. in-version of screw axes, P41 to P43). The output files are amended by i to distinguishthe two runs.
N.B. if the inverted hand turns out to be the correct hand, your space group may change - e.g. in the presenceof screw axes. Keep this in mind when you convert your native data to e.g. mtz-format!
Tim Grüne Phasing with Shelx C/D/E 14/34
CCP4 @ APS 2014
Solvent Content
• With autobuilding (-a): correct solvent content less important
• Without autobuilding (e.g. RNA or DNA only): solvent content (-s) one of the most critical parameters
– Average volume per amino acid: 140 Å3
– Average volume per nucleic acid: 380 Å3
– or use matthews coeff in CCP4i
• Take disordered regions into account: increase solvent content
Tim Grüne Phasing with Shelx C/D/E 15/34
CCP4 @ APS 2014
Caveat: Substructure Resolution
• “Normal” macromolecular structure: Determine atom positions even at e.g. 5 Å resolution because ofrestraints.
• Substructure unrestrained
⇒ coordinates only know within resolution of anomalous signal, often much worse than 3 Å
Way out:
1. e.g. Sharp improves substructure coordinates before density modification
2. “substructure recycling” with shelxe
Tim Grüne Phasing with Shelx C/D/E 16/34
CCP4 @ APS 2014
Shelxe: Substructure Recycling
mymad_fa.res
input
mymad.hat
renam
e
shelxd
shelxe
finds
impro
ves
• Better substructure = better starting phases = better map• Caveat: If the inverted structure turns out to be the correct
hand (i.e. mymad i.hat from the -i-run of shelxe), the secondrun of shelxe must be run without the -i switch:
Tim Grüne Phasing with Shelx C/D/E 17/34
CCP4 @ APS 2014
Shelxe: Did it work?
Criteria to tell if phasing worked:
1. Correct hand shows better Contrast, especially at early cycles of density modification.
2. Correct hand has higher map correlation coefficient throughout resolution range:
d inf - 4.66 - 3.70 - 3.23 - 2.93 - 2.72 - 2.56 - 2.43 - 2.33 - 2.24 - 2.15<mapCC> 0.626 0.795 0.775 0.754 0.819 0.804 0.756 0.694 0.620 0.582 direct<mapCC> 0.810 0.877 0.845 0.844 0.874 0.856 0.840 0.830 0.839 0.809 inverse
3. A reasonable poly-ALA trace (average 10 residues per chain) and a CC > 25%
When using the auto-tracing option (-a) in shelxe, the first two figures (contrast/ mapCC) become meaningless,but in this case the poly-ALA trace is much more conclusive.
Tim Grüne Phasing with Shelx C/D/E 18/34
CCP4 @ APS 2014
Shelxe: Structure Solved?
Indicators from shelxe:
TITLE elastase.pdb Cycle 1 CC = 4.28% 48 residues in 6 chains
TITLE elastase_i.pdb Cycle 3 CC = 41.28% 225 residues in 6 chains
1. CC>25%
2. average chain length > 10
3. jump in CC over many cycles (e.g. with -a50)
Tim Grüne Phasing with Shelx C/D/E 19/34
CCP4 @ APS 2014
Shelxe: Structure Solved?
Coot reads mymad.pdb (poly-ALA trace) and mymad.phs (map).
Elastase SAD tutorial
Direct hand Inverse hand
Tim Grüne Phasing with Shelx C/D/E 20/34
CCP4 @ APS 2014
Shelxe: MR-SAD [5]
MR-SAD: Combination of
• poor MR solution
• weak anomalous phase information
often neither of the two alone are sufficient for structure solution
Tim Grüne Phasing with Shelx C/D/E 21/34
CCP4 @ APS 2014
Shelxe: MR-SAD howto
• MR-Solution: phaser-test 1.1.pdb
• Peak Dataset from mosflm/scala: ctruncate 1234.mtz
#> cp phaser-test_1.1.pdb mrsad_test.pda
#> mtz2sca ctruncate_1234.mtz -o shelxc_input.sca
#> shelxc mrsad_test < myshelxc_SAD.inp > mrsad_test_shelxc.log
→ Creates mrsad test fa.hkl containing angle α
#> shelxe mrsad_test.pda mrsad_test_fa -s0.67 -a -h -o -z
• -o optimize input coordinates myproject.pda
• -z optimize substructure coordinates mrsad test fa.res
Tim Grüne Phasing with Shelx C/D/E 22/34
CCP4 @ APS 2014
Shelxe: MR-SAD
• Phases from model only used to find substructure in anomalous data
• Phases improved from substructure coordinates and data
• Building of poly-ALA trace with map from φT = φA(substructure coordinates) + α(anomalous signal).
poly-ALA trace free of model bias from MR-solution
Tim Grüne Phasing with Shelx C/D/E 23/34
CCP4 @ APS 2014
Anode [4]
Tim Grüne Phasing with Shelx C/D/E 24/34
CCP4 @ APS 2014
Anode
Anomalous Difference Map optimised for working with shelx c/d/e phasing.
• “post-mortem” analysis of anomalous scatterers: positions and signal strength
• cross-confirmation for MR:
1. anomalous data not sufficient for data solution
2. MR solution available, but e.g. weak or risk of wrong solution
• Distinction between unknown metal types (anomalous vs. non-anomalous)
Tim Grüne Phasing with Shelx C/D/E 25/34
CCP4 @ APS 2014
Using Anode
Input :
1. my-project.pdb (putative) MR-solution; source for φT
2. my-project fa.hkl anomalous data prepared by shelxc; source for α
#> anode my-project
Output :
1. my-project.pha Anomalous difference map calculated from |FA| and φA = φT − α
2. my-project fa.res Peaks from anomalous map
3. my-project.lsa Logfile listing atoms in model PDB closest to anomalous peaks.
Tim Grüne Phasing with Shelx C/D/E 26/34
CCP4 @ APS 2014
Anode: Tracing RNA Polymerase I [3]
Tim Grüne Phasing with Shelx C/D/E 27/34
CCP4 @ APS 2014
MrTailor: Improved External Restraints [2]
Tim Grüne Phasing with Shelx C/D/E 28/34
CCP4 @ APS 2014
External Restraints
• Complexes composed of several subunits• Often low resolution data from Molecular Replacement
• Example: RNA polymerase I @ 4.2Å• MR solution from RNA polymerase II• R = 34%, Rfree=45%• 67% model completeness
Tim Grüne Phasing with Shelx C/D/E 29/34
CCP4 @ APS 2014
External Restraints [6]
brown : no external restraints (R = 32.5%, Rfree = 47.1%)cyan : external restraints to PDB ID 1I3Q (RNA PolII) (R = 37.3%,
Rfree = 46.6%)grey : deposited RNA Pol Iblack : Reference (readily published) structure 1I3Q
Tim Grüne Phasing with Shelx C/D/E 30/34
CCP4 @ APS 2014
Mrtailor: Cut out Low Score Alignment Regions
.: . :: : :* : :. : * *:* * :* . * : : lcl|10706 GYLGNKLSVSVEQVSIAKPMSNDGVSSAVERKVYPSESRQRLTSYRGKLLLKLKWSV-2WAQ_B ---IPGLKVRLGKIRIGKPRVRE--SDRGEREISPMEARLRNLTYAAPLWLTMIPV--2PMZ_B ---IPGLKVRLGKIRIGKPRVRE--SDRGEREISPMEARLRNLTYAAPLWLTMIPV--1I3Q_B DNISRKYEISFGKIYVTKPMVNE--SDGVTHALYPQEARLRNLTYSSGLFVDVKKRTY3H0G_B GDVTRRYEINFGQIYLSRPTMTE--ADGSTTTMFPQEARLRNLTYSSPLYVDMRKKVM
←− 1I3Q.pdb←− BLAST/Cobalt Alignment
←− Clustalx Scores−→ tailored 1i3q mrtailor.pdb
Tim Grüne Phasing with Shelx C/D/E 31/34
CCP4 @ APS 2014
External Restraints [6]
brown : no external restraints (R = 32.5%, Rfree = 47.1%)cyan : external restraints to PDB ID 1I3Q (RNA PolII) (R = 37.3%,
Rfree = 46.6%)red : external restraints after MRTAILOR (R = 32.8%, Rfree =
45.6%)grey : deposited RNA Pol Iblack : Reference (readily published) structure 1I3Q
Tim Grüne Phasing with Shelx C/D/E 32/34
CCP4 @ APS 2014
References
1. SHELXCDE: G. M. Sheldrick, Acta Cryst. (2010), D66
2. MRTAILOR: T. Gruene, Acta Cryst. (2013), D69, 1861-1863
3. Pol I: C. Fernández-Tornero et al., Nature (2013) 502, 644–649
4. ANODE: A. Thorn & G. M. Sheldrick, J. Appl. Cryst. (2011) 44, 1285-1287
5. MRSAD: S. Panjikar et al. Acta Cryst. (2009). D65, 1089–1097
6. PROSMART/REFMAC5: Nicholls et al., Acta Cryst. (2012), D68, 404–417
Tim Grüne Phasing with Shelx C/D/E 33/34
CCP4 @ APS 2014
Acknowledgment
Rob Nicholls, MRC London
Nicholas Taylor, EPFL Lausanne
Tim Grüne Phasing with Shelx C/D/E 34/34