Course Xtallo Delarue 2016 - Institut...
Solving structures: phasing diffraction data and refining models
Marc Delarue, Institut Pasteur
The phase problem: definition
We measure |F(h)|. However, we need F(h)=|F(h)|exp(iφ(h))
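A minimal sketch of why the phases matter, assuming a hypothetical 1D cell with three point atoms (the names `ATOMS`, `H_MAX`, `structure_factor` and `fourier_map` are illustrative, not from the course). The same amplitudes |F(h)| are used twice: once with the true phases and once with all phases set to zero.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for point atoms (f_j, x_j) in a 1D cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def fourier_map(coeffs, n_grid=100):
    """rho(x_k) = sum_h C(h) * exp(-2*pi*i*h*x_k), sampled on n_grid points."""
    rho = []
    for k in range(n_grid):
        x = k / n_grid
        total = sum(c * cmath.exp(-2j * math.pi * h * x) for h, c in coeffs.items())
        rho.append(total.real)
    return rho

def peak_index(rho):
    """Grid index of the highest point of the map."""
    return max(range(len(rho)), key=lambda k: rho[k])

# Hypothetical toy structure: (scattering weight, fractional coordinate).
ATOMS = [(6.0, 0.10), (8.0, 0.35), (16.0, 0.70)]
H_MAX = 10
F = {h: structure_factor(h, ATOMS) for h in range(-H_MAX, H_MAX + 1)}

# Same amplitudes, two different sets of phases.
true_map = fourier_map(F)
zero_phase_map = fourier_map({h: abs(c) for h, c in F.items()})

print("highest peak, true phases: x =", peak_index(true_map) / 100)
print("highest peak, zero phases: x =", peak_index(zero_phase_map) / 100)
```

With the true phases the highest peak falls on the heaviest atom; with zero phases it falls on the origin, where there is no atom at all.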
Phasing
• If you have a (plausible) PDB model: Molecular Replacement (one native data set)
• If you don’t: Isomorphous Replacement, SAD or MAD (one heavy-atom derivative, and data sets of the same sample at different λ)
• Improve the phases by imposing physical constraints on the electron density
• Direct methods work if you have very high resolution data (about 1 Å; possibly down to 2 Å?)
0. Patterson functions
• Instead of the usual Fourier synthesis, we can calculate a Fourier map with squared amplitudes and zero phases. This gives a peak for every interatomic vector
• For small molecules, it is possible to solve the phase problem using Patterson methods only
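The Patterson synthesis can be sketched in 1D as follows. This is an illustration under stated assumptions: the toy atoms, the Gaussian fall-off `b` (added so that the peaks are smooth) and all function names are hypothetical. Only the intensities |F(h)|² enter the calculation, never the phases.

```python
import cmath
import math

def structure_factor(h, atoms, b=0.002):
    """F(h) for a 1D cell; exp(-b*h*h) is an assumed Gaussian fall-off with resolution."""
    shape = math.exp(-b * h * h)
    return sum(f * shape * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def patterson(intensities, n_grid=100):
    """P(u_k) = sum_h |F(h)|^2 * cos(2*pi*h*u_k): squared amplitudes, zero phases."""
    return [
        sum(i_h * math.cos(2 * math.pi * h * k / n_grid) for h, i_h in intensities.items())
        for k in range(n_grid)
    ]

def non_origin_peaks(p, threshold_fraction=0.1):
    """Grid indices of local maxima in the half cell (origin excluded), strongest first."""
    n = len(p)
    peaks = [
        k for k in range(1, n // 2 + 1)
        if p[k] > p[k - 1] and p[k] > p[(k + 1) % n] and p[k] > threshold_fraction * p[0]
    ]
    return sorted(peaks, key=lambda k: -p[k])

# Hypothetical toy structure: (scattering weight, fractional coordinate).
ATOMS = [(6.0, 0.10), (8.0, 0.25), (16.0, 0.55)]
H_MAX = 40
intensities = {h: abs(structure_factor(h, ATOMS)) ** 2 for h in range(-H_MAX, H_MAX + 1)}

P = patterson(intensities)
peaks = non_origin_peaks(P)
print("Patterson peaks at u =", [k / 100 for k in peaks])
```

The peaks sit at the differences between atomic coordinates (0.55-0.25, 0.55-0.10 and 0.25-0.10), with heights that scale with the product of the two weights.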
1. Molecular replacement
http://www.phaser.cimr.cam.ac.uk/index.php/Phaser_Crystallographic_Software
Needs a model with >30% sequence identity, but not necessarily complete
Molecular Replacement
• 6-D problem: you need to find the orientation and position of the model in the asymmetric unit (a.u.)
• Patterson-based methods
• Usually broken down into two successive 3D searches:
  – Rotation function
  – Translation function
• Usually the R-factor is very high even for the true solution: the model needs a lot of rebuilding
Rotation function
Fast rotation function (R.A. Crowther) => accessible to FFT (rapid)
Low signal-to-noise output: submit every peak above 3 σ to the translation function (TF)
Parameters: resolution of P(u) and integration radii (in Å). Rossmann and Blow, 1962
Translation function
Use Patterson vectors from symmetry-related molecules (Crowther and Blow, 1967)
Molecular Replacement: questions
• Play with the model (truncate side-chains?)
• How many molecules in the a.u.?
• Which program: Phaser, MolRep?
• Parameters of the program: which resolution?
• How do I know I have found the solution?
• How do I know I am not missing another molecule in the a.u.?
Molecular Replacement: answers
• The model (truncate side-chains to Ala or Ser)
• Try Phaser first (in CCP4)
• Phaser gives an estimate of the number of molecules in the a.u., with a probability
• Resolution: 12-4 Å
• Z-scores in the TF should be high (>8 sigmas)
• Phaser looks for other molecules in the a.u., if appropriate
Molecular Replacement: variants
• How to place a model in a poorly phased map? Phased rotation function
• Best recipe: scan all the rotations and perform a TF for each of them
• Be careful with the a.u. of the rotation space group
• Very powerful method, equivalent to a correlation coefficient (CC) search in real space
Molecular Replacement: References
• T. Blundell and L. Johnson, Protein Crystallography
• J. Drenth, Principles of Protein Crystallography
• M. Rossmann, Acta Cryst, Self-Rotation function
• M. Rossmann and D. Blow, Acta Cryst, 1962
• R.A. Crowther and D. Blow, Acta Cryst, 1967
• R. Read, Phaser, Log-likelihood, Acta Cryst, 2007
• A. Vagin, Molrep (CCP4)
• J. Navaza, AMoRe (CCP4)
• Almn, TFsgen (CCP4)
• M. Delarue in Macromolecular Crystallography, M. Sanderson and J. Skelly, Editors, OUP, 2007
2. Experimental Phasing: Principle
• FPH=FP+FH
• FPH’=FP+FH’
• …
[Figure: vector diagram of FP, FH and FPH]
Experimental Phasing: what kind of heavy atoms?
• Soft heavy-atom complexes: Ir, Pt, Au, Hg
• Lanthanides: Eu, Gd, Dy, Sm, Ho, Tb, Yb
• Organic complexes of lanthanides (Gd+++)
• Transition metal ions: Mn++, Co++, Zn++ (AS)
• Hard divalent cations: Ba++, Sr++
• Heavy monovalent ions: Rb+, Cs+
• Heavy monovalent anions: Br- (AS), I-
• Web site: http://skuld.bmsc.washington.edu/scatter/
Experimental Phasing: how many heavy atoms?
• You need at least one heavy-atom derivative
• In principle you actually need two, with good isomorphism
• Or you need several data sets of the same derivative at different wavelengths
• Better if you can tune the X-ray wavelength to reach the absorption edge (synchrotron)
• Try soaking (with different soaking times) or co-crystallization
Exp. Phasing: Anomalous signal
• FPH(+)=FP(+)+FH(+)
• FPH(-)=FP(-)+FH(-)
• FH(-) depends on lambda (Å)
[Figure: vector diagram for the Friedel pair, showing FP(-)]
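The breakdown of Friedel's law can be sketched in a few lines. This is only an illustration: the 1D toy atoms and the scattering values `F0`, `F_PRIME` and f'' = 4 are assumptions chosen for the example, not tabulated values. The heavy atom scatters with a complex factor f0 + f' + i·f''.

```python
import cmath
import math

# Hypothetical 1D toy structure: light atoms scatter with a real factor only.
LIGHT_ATOMS = [(6.0, 0.10), (8.0, 0.35), (7.0, 0.80)]
HEAVY_X = 0.55
F0, F_PRIME = 34.0, -8.0  # illustrative normal and dispersive terms of the heavy atom

def friedel_pair(h, f_double_prime):
    """Return (|FPH(+h)|, |FPH(-h)|) for a given anomalous term f''."""
    def total(hh):
        f_p = sum(f * cmath.exp(2j * math.pi * hh * x) for f, x in LIGHT_ATOMS)
        f_heavy = complex(F0 + F_PRIME, f_double_prime)
        f_h = f_heavy * cmath.exp(2j * math.pi * hh * HEAVY_X)
        return f_p + f_h  # FPH = FP + FH
    return abs(total(h)), abs(total(-h))

for f_dp in (0.0, 4.0):
    plus, minus = friedel_pair(1, f_dp)
    print(f"f''={f_dp}: |F(+)|={plus:.3f}  |F(-)|={minus:.3f}  difference={plus - minus:.3f}")
```

Without f'' the two amplitudes are identical; with f'' they differ, and this difference is the anomalous signal used in SAD and MAD.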
Experimental Phasing: the normal route
• Do a fluorescence spectrum of your crystals: find out if there is any unusual metal ion
• If not: set out to produce Se-Met derivatized protein
  – How many Met? About 1 per 100 residues?
  – Add some more by site-directed mutagenesis?
  – Will they all be fixed?
• Will the new protein be soluble? Will it give crystals?
• Collect data at several wavelengths (do a fluorescence spectrum at the synchrotron site)
Isomorphous Replacement
• Principle
• Data
• Programs
  – SHARP (G. Bricogne, Global Phasing)
  – Solve/Resolve (T. Terwilliger, now in Phenix)
• Find the heavy-atom sites (Patterson searches)
• Check the consistency of the sites of the different data sets (common origin, cross difference Fouriers)
• Figures to monitor the progress of the phasing: figure-of-merit, phasing power and all that
Monitoring progress during phasing
• How many sites?
• What is their occupancy? B-factors?
• Do they “find” themselves in cross Fourier (FFT) maps?
• Global quality indicator: FOM = <cos Δφ> = (1/N) ∑H ∫dφH cos(ΔφH) P(φH); > 0.33?
• For each derivative: phasing power = <|FH|> / rms(lack-of-closure error); > 1.0?
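For a single reflection, the figure of merit is the length of the centroid of the phase probability distribution, and the centroid direction is the "best" phase. The sketch below assumes simple model distributions of the form exp(κ·cos(φ - φ0)); the function name `centroid_phase` and the example values are illustrative.

```python
import cmath
import math

def centroid_phase(prob, n_steps=360):
    """Return (fom, phi_best in degrees) for an unnormalised phase probability prob(phi)."""
    total = 0.0
    centroid = 0j
    for k in range(n_steps):
        phi = 2 * math.pi * k / n_steps
        weight = prob(phi)
        total += weight
        centroid += weight * cmath.exp(1j * phi)
    centroid /= total
    # |centroid| equals the integral of cos(phi - phi_best) * P(phi), i.e. <cos delta_phi>.
    return abs(centroid), math.degrees(cmath.phase(centroid)) % 360.0

def peak(center_deg, kappa=20.0):
    """Assumed unimodal distribution centred on center_deg (sharper for larger kappa)."""
    return lambda phi: math.exp(kappa * math.cos(phi - math.radians(center_deg)))

sharp = peak(40.0)                                      # well-determined phase
bimodal = lambda phi: peak(10.0)(phi) + peak(70.0)(phi) # two-fold ambiguity, as in SIR
flat = lambda phi: 1.0                                  # no phase information

for name, dist in (("sharp", sharp), ("bimodal", bimodal), ("flat", flat)):
    fom, phi_best = centroid_phase(dist)
    print(f"{name}: FOM={fom:.3f}")
```

A sharp distribution gives a FOM close to 1, a flat one gives 0, and a bimodal one gives an intermediate value with the best phase lying between the two maxima.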
MIRAS, SIRAS, MAD, SAD and all that…
• MIRAS: same as MIR but with anomalous scattering information
• SIRAS: single heavy-atom derivative, but with anomalous scattering information
• MAD: same as above, but with several wavelengths (no lack-of-isomorphism problem)
• SAD: same as above but with only one wavelength (+ solvent flattening)
3. DM: Getting better phases
• Solvent flattening
• Non-crystallographic symmetry averaging
• Histogram matching
• All methods can be expressed either in direct space (x,y,z) or in reciprocal space (h,k,l)
4. Direct methods
• Imposing (known) physical constraints on the electron density
  – Positivity: rho(r)>0
  – Atomicity: rho²(r) resembles rho(r) (equal, resolved atoms)
• Probabilistic methods try to get the most out of these constraints (triplet relationships…) (ShelX, George Sheldrick)
• New methods try to find small fragments of protein-like structures and proceed from there (Arcimboldo, Isabel Uson)
Direct Methods
• Normally, the limit is 1000 atoms, 1.2 Å data
• ShelX: ShelX, ShelXD, ShelXE: http://shelx.uni-ac.gwdg.de/SHELX/
• Arcimboldo limit is now 2.1 Å, 400 aa: http://chango.ibmb.csic.es/ARCIMBOLDO
• Uses PHASER and SHELXE, massively parallel computers, and fragments representative of proteins
Phasing References
• M. Perutz, J.C. Kendrew: the original MIR method
• D. Blow and F. Crick, Acta Cryst, 1959: the statistical treatment of errors
• T. Blundell and L. Johnson, Protein Crystallography, 1976
• J. Drenth, Principles of Protein Crystallography, 1999
• W.A. Hendrickson (MAD Phasing)
• G. Bricogne (Sharp, Maximum Likelihood, NCS…)
• T. Terwilliger (Phasing, DM, Solve, Phenix)
• A.T. Brunger (MR, R-free, Simulated Annealing…)
Model Building and refinement
• Effect of resolution
• Automatic building
• Manual building
• Refinement
• Validation
Model Building: Maps
• Maps: mix measured structure factors and experimental phases
• Influence of resolution
• No model: m·Fobs·exp(iφbest)
• With model: (2Fobs - Fcalc)·exp(iφcalc)
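The two kinds of map coefficients above can be written as complex numbers per reflection. A minimal sketch, with hypothetical function names and made-up numbers for a single reflection:

```python
import cmath
import math

def best_map_coefficient(f_obs, fom, phi_best_deg):
    """No model yet: m * Fobs * exp(i * phi_best), with m the figure of merit."""
    return cmath.rect(fom * f_obs, math.radians(phi_best_deg))

def two_fo_fc_coefficient(f_obs, f_calc):
    """With a model: (2*Fobs - |Fcalc|) * exp(i * phi_calc); f_calc is complex."""
    return cmath.rect(2.0 * f_obs - abs(f_calc), cmath.phase(f_calc))

# Illustrative reflection: observed amplitude 100, model gives 80 at 30 degrees.
f_calc = cmath.rect(80.0, math.radians(30.0))
coeff = two_fo_fc_coefficient(100.0, f_calc)
print("2Fo-Fc amplitude:", round(abs(coeff), 3))
print("2Fo-Fc phase (deg):", round(math.degrees(cmath.phase(coeff)), 3))
```

The map itself is then the Fourier synthesis of these coefficients over all reflections.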
What kind of maps can one get?
Effect of resolution
Model Building: Automatic methods
• Programs that find the Cα trace (Alwyn Jones)
• Edit the trace to get the longest possible polypeptide:
  – Solve/Resolve (Phenix)
  – Kevin Cowtan (Buccaneer)
  – Lamzin, Perrakis and coll. (ARP/wARP)
• Iterative density modification
• Calculate 2Fo - Fc maps with model phases
• Phase combination
• Highly iterative process
First C-alpha trace
C-alpha trace: manual intervention
Alpha-‐helices
Beta sheets
Side-chain fitting: use rotamers
Water molecules and all that
Phase combination
• Combine different sorts of information: calculated (from the current model) and experimental phases
• Refine the model against which energy?
• E_xray = ∑H ||Fobs(H)| - |Fcalc(H)|| / ∑H |Fobs(H)|
• E_tot = E_geom + E_xray
• Bayes theorem
• When does one get rid of the experimental phases?
• When does one start rebuilding manually?
Manual reconstruction
• O (Alwyn Jones), now Coot (Paul Emsley)
• Add peptide bonds
• Add side-chains, mutate, scan rotamers…
• Stereochemically correct deformations of parts of the model to fit the electron density
• Rebuild loops
• Multiple conformations
• B-factors
Coot: manual reconstruction
Model Refinement
• We can describe the fit of the model to the data in reciprocal space (R-factor)
• R = ∑H ||Fobs(H)| - |Fcalc(H)|| / ∑H |Fobs(H)|
• Possible to get the gradient (forces) and apply minimisation algorithms => new model, new phases, new maps
• Problem: many local minima
• Possible solution: simulated annealing
• Or use fewer variables (dihedral angles, CNS)
• Log-likelihood should be used (Buster, Refmac5)
• Definition of R-free (CNS, Refmac5)
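The R-factor can be computed in a few lines. In this sketch the overall scale factor between observed and calculated amplitudes (here simply the ratio of the sums) is an assumption added for the example, as are the function name and the three made-up reflections.

```python
def r_factor(f_obs, f_calc):
    """R = sum | |Fobs| - k*|Fcalc| | / sum |Fobs|, with an assumed overall scale k."""
    scale = sum(f_obs) / sum(f_calc)  # assumption: simplest possible scaling
    mismatch = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return mismatch / sum(f_obs)

# Illustrative amplitudes for three reflections.
f_obs = [100.0, 50.0, 25.0]
f_calc = [90.0, 55.0, 30.0]
print("R =", round(r_factor(f_obs, f_calc), 4))
```

Because of the scale factor, multiplying all calculated amplitudes by a constant leaves R unchanged.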
Simulated Annealing
• Explore the energy landscape more effectively
• Statistical physics (Sherrington & Kirkpatrick)
• Gradient-descent minimisation can get stuck in a local minimum
• P ∝ exp(-E/T)
• Begin at high T: even unfavourable E can be accepted
• Protocol: gradually decrease T…
• Until T=0, where the system is frozen
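A toy illustration of the protocol, assuming a hypothetical 1D double-well energy instead of a real crystallographic target. Moves are accepted with the Metropolis criterion exp(-ΔE/T), the temperature is lowered geometrically, and the lowest-energy point visited is returned; these choices, the step size and the schedule are all assumptions of this sketch.

```python
import math
import random

def energy(x):
    """Hypothetical energy: local minimum near x=1.13, global minimum near x=-1.30."""
    return x**4 - 3.0 * x**2 + x

def gradient(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def gradient_descent(x, rate=0.01, n_steps=2000):
    """Plain downhill minimisation: ends in the nearest minimum."""
    for _ in range(n_steps):
        x -= rate * gradient(x)
    return x

def simulated_annealing(x, t_start=2.0, t_end=0.01, n_steps=20000, seed=0):
    """Metropolis moves with a geometric cooling schedule; returns the best point seen."""
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    best = x
    for _ in range(n_steps):
        trial = x + rng.uniform(-0.5, 0.5)
        delta = energy(trial) - energy(x)
        # Downhill moves are always accepted, uphill moves with probability exp(-delta/T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / t):
            x = trial
            if energy(x) < energy(best):
                best = x
        t *= cooling
    return best

x_gd = gradient_descent(1.5)
x_sa = simulated_annealing(1.5)
print("gradient descent ends at x =", round(x_gd, 3))
print("simulated annealing ends at x =", round(x_sa, 3))
```

Starting from the same point, gradient descent stays in the local well, whereas the annealed walker can cross the barrier while T is still high.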
The R-free concept
• Minimizing the R-factor is not a guarantee of finding the true optimum
• Use statistical analysis concepts to validate the refinement
• Select a random set of 10% of the reflections and leave them out of the R-factor minimisation
• The R-factor on the remaining 90% of the reflections is R-work
• Calculate R-free on the set of flagged reflections
• R-work and R-free should both go down during a well-conducted refinement, and remain close to each other…
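The bookkeeping behind R-work and R-free can be sketched as follows. The synthetic amplitudes, the function names and the fixed random seed are assumptions for illustration; no refinement is performed here, the code only shows how the flagged set is kept apart.

```python
import random

def assign_free_flags(n_reflections, fraction=0.10, seed=0):
    """Flag a random subset of reflections (True = free set, excluded from refinement)."""
    rng = random.Random(seed)
    n_free = round(fraction * n_reflections)
    free = set(rng.sample(range(n_reflections), n_free))
    return [i in free for i in range(n_reflections)]

def r_factor(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| (amplitudes assumed already scaled)."""
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)

def r_work_and_r_free(f_obs, f_calc, free_flags):
    """Compute the R-factor separately on the working set and on the flagged set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))

# Synthetic amplitudes, for illustration only.
f_obs = [50.0 + (i % 17) for i in range(1000)]
f_calc = [fo + ((i % 3) - 1) * 5.0 for i, fo in enumerate(f_obs)]

flags = assign_free_flags(len(f_obs))
r_work, r_free = r_work_and_r_free(f_obs, f_calc, flags)
print("free reflections:", sum(flags))
print("R-work =", round(r_work, 4), " R-free =", round(r_free, 4))
```

In a real refinement only the working set drives the minimisation, so a large gap between R-free and R-work signals over-fitting.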
Increase the data/parameter ratio
• Reduce the number of parameters
  – Dihedral angles instead of positions
  – B-factors given by TLS (TLS server)
  – Collective degrees of freedom (Normal Modes)
• Or collect new data sets at higher resolution (high pressure, another crystal form…)
Model Building and Refinement: References
• Paul Emsley (Coot)
• T. Terwilliger (Solve/Resolve: Phenix)
• Kevin Cowtan (Buccaneer)
• Lamzin and coll. (ARP/wARP)
• Phase combination (Sim weighting scheme)
• R-free concept (CNS, A. Brünger)
• Log-likelihood (Buster, G. Bricogne; Phaser, R. Read)
• P.A. Karplus and K. Diederichs (CC1/2): get the most out of your data (define the maximum usable resolution)
Validation of the final model
• Procheck (J. Thornton, EBI)
• Whatcheck (G. Vriend)
• Ramachandran outliers
• Rotamer outliers
• PDB deposition (including the data sets)
• PDB_Redo (Perrakis and Vriend, JMB, 2016)
Table of refinement statistics
Analysis of structures
• Sequence analysis from multiple alignments (Muscle, MAFFT…)
• Project sequence conservation onto the surface (Consurf)
• Phylogeny: rate of divergence, position by position, mapped onto the structure (Diverge)
• Electrostatics (N. Baker, APBS): solve the Poisson-Boltzmann equation (assign charges…)
The interpretation of structures
• Collect other data sets
  – At different temperatures
  – In different crystal forms
  – With different substrates
  – Mutants of known function
• Any existing structure related to the “new” one?
  – Use the DALI server
  – Visit EBI, UK
  – http://ekhidna.biocenter.helsinki.fi/dali_server/start
Multiple alignments: Consurf http://consurf.tau.ac.il/2016/
Phylogeny and Structure
• Project rates of divergence, position by position, onto the 3D structure
Electrostatics (APBS, N. Baker)
• http://www.poissonboltzmann.org/
Some useful links
• Lorentz.dynstr.pasteur.fr: this presentation
• Lorentz.dynstr.pasteur.fr/website/index.html
• Structural Medicine on-line course (R. Read, Cambridge, UK): http://www-structmed.cimr.cam.ac.uk/course.html
• Bernhard Rupp web site: http://www.ruppweb.org/Xray/101index.html