Reusing phenix.refine for powder data? Ralf W. Grosse-Kunstleve Computational Crystallography...

Reusing phenix.refine for powder data? Ralf W. Grosse-Kunstleve Computational Crystallography Initiative Lawrence Berkeley National Laboratory Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007


Page 1

Reusing phenix.refine for powder data?

Ralf W. Grosse-Kunstleve

Computational Crystallography Initiative

Lawrence Berkeley National Laboratory

Workshop on developments and directions of powder diffraction on proteins, June 22/23, 2007

Page 2

My two lives

• Life 1 (PhD project):
  – Zeolite structure determination from powder data using extracted intensities
• Life 2:
  – Contributions to Xplor/CNS
    • Single-crystal protein crystallography
    • About 80% of all PDB entries refined with Xplor/CNS
  – Phenix project
    • Fresh start after losing a legal battle

Page 3

Funding: NIH Program Project (NIGMS, PSI), Director - Paul Adams

CCI APPS

SOLVE / RESOLVE

PHASER

TEXTAL

MolProbity / REDUCE

Computational Crystallography Initiative (LBNL) - Paul Adams, Ralf Grosse-Kunstleve, Pavel Afonine, Nigel Moriarty, Nicholas Sauter, Peter Zwart

Los Alamos National Lab (LANL) - Tom Terwilliger, Li-Wei Hung

Cambridge University - Randy Read, Airlie McCoy

Texas A&M University - Tom Ioerger, Jim Sacchettini, Erik McKee

Duke University - Jane Richardson, David Richardson, Ian Davis

Phenix Collaboration

Page 4

Spectrum of phenix components

• Automated analysis of data quality: phenix.xtriage

• Rapid substructure determination: phenix.hyss

• Phasing: Maximum likelihood – SOLVE, PHASER for SAD

• Density modification: Statistical density modification (RESOLVE)

• Automated model building:
  – Pattern matching methods (RESOLVE or TEXTAL)

• Structure refinement: phenix.refine (likelihood, annealing, TLS)

• Advanced automation: AutoSol – hkl to map

• Ligand building and fitting: eLBOW, AutoLigand

• Validation and Hydrogens: MolProbity + Reduce

Page 5

phenix.refine

- Group ADP refinement

- Rigid body refinement

- Restrained refinement (xyz, iso/aniso ADP)

- Automatic water picking

- Bond density

- Unrestrained refinement

- FFT or direct summation

- Hydrogens

- Automatic NCS restraints

- Simulated Annealing

- Occupancies (individual, group)

- TLS refinement

- Twinned data

- X-ray, Neutron, joint X-ray + Neutron refinement

Page 6

Refinement flowchart

Input: PDB model, any data format (CNS, Shelx, MTZ, …)

Input data and model processing

Refinement strategy selection

Repeated several times:
• Bulk-solvent, anisotropic scaling, twinning parameters refinement
• Ordered solvent (add / remove)
• Target weights calculation
• Coordinate refinement (rigid body, individual; minimization or simulated annealing)
• ADP refinement (TLS, group, individual iso / aniso)
• Occupancy refinement (individual, group)

Output: Refined model, various maps, structure factors, complete statistics; files for COOT, O, PyMol
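The repeated part of the flowchart can be sketched as a plain loop. This is only an illustration of the control flow, not the phenix.refine implementation: the step names, the `run_macro_cycles` function, and the idea of passing steps in as a dictionary are all assumptions made for this sketch.

```python
# Hypothetical step names, listed in the order the flowchart shows them.
MACRO_CYCLE_STEPS = [
    "bulk_solvent_and_scaling",
    "ordered_solvent",
    "target_weights",
    "coordinate_refinement",
    "adp_refinement",
    "occupancy_refinement",
]


def run_macro_cycles(model, enabled_steps, number_of_macro_cycles=3):
    """Apply the enabled steps in flowchart order, once per macro-cycle.

    `enabled_steps` maps a step name to a callable taking and returning a
    model; steps that are absent are skipped (the "strategy selection").
    Returns the final model and a log of (cycle, step name) pairs.
    """
    log = []
    for cycle in range(1, number_of_macro_cycles + 1):
        for name in MACRO_CYCLE_STEPS:
            step = enabled_steps.get(name)
            if step is None:
                continue  # this step was not selected in the strategy
            model = step(model)
            log.append((cycle, name))
    return model, log
```

Each step sees the model as left by the previous step, which is why the order within a macro-cycle matters.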

Page 7

Designed to be very easy to use

Refinement of individual coordinates and B-factors:

% phenix.refine model.pdb data.hkl

Same as above plus water picking:

% phenix.refine model.pdb data.hkl ordered_solvent=true

Run with parameter file:

% phenix.refine model.pdb data.hkl parameter_file

refinement.main {
  high_resolution = 2.0
  simulated_annealing = True
  ordered_solvent = True
  number_of_macro_cycles = 5
}
refinement.refine.adp {
  tls = chain A
  tls = chain B
}
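For scripted runs, a parameter file in the syntax shown above could be generated from Python. The following is a minimal sketch: `render_params` and `build_command` are hypothetical helpers written for this example, not part of Phenix, and they only reproduce the scope/assignment layout and the command-line form shown on this slide.

```python
def render_params(scopes):
    """Render (scope_name, [(key, value), ...]) pairs as parameter-file text.

    A list of pairs is used instead of a dict because a key such as `tls`
    may appear more than once within one scope.
    """
    lines = []
    for scope_name, assignments in scopes:
        lines.append(f"{scope_name} {{")
        for key, value in assignments:
            lines.append(f"  {key} = {value}")
        lines.append("}")
    return "\n".join(lines) + "\n"


def build_command(model_file, data_file, parameter_file=None):
    """Build the argument list for the command line shown on the slide."""
    command = ["phenix.refine", model_file, data_file]
    if parameter_file is not None:
        command.append(parameter_file)
    return command


if __name__ == "__main__":
    text = render_params([
        ("refinement.main", [("high_resolution", 2.0),
                             ("simulated_annealing", True)]),
        ("refinement.refine.adp", [("tls", "chain A"), ("tls", "chain B")]),
    ])
    print(text)
    print(" ".join(build_command("model.pdb", "data.hkl", "refine.params")))
```

The command list can be handed to `subprocess.run` on a machine where Phenix is installed; the sketch itself does not execute anything.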

Page 8

How to best make ends meet?

• GSAS & proteins
  – Extending a small-molecule powder program to deal with proteins
  – Advantage: program designed for the field
    • Community used to inputs, outputs, idiosyncrasies
  – Disadvantage: some approaches suitable for small molecules don’t scale
    • Direct-summation structure factor calculation
    • Neighborhood calculations (nonbonded interactions, a.k.a. anti-bumping restraints)
• phenix.refine
  – Extending a single-crystal protein program to deal with powders
  – Advantage: program designed to deal with large structures
    • Protein, RNA/DNA restraint libraries, optimized algorithms
  – Disadvantage: new data formats, differences in terminology
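To see why direct summation does not scale, it helps to write it out. The sketch below is a deliberately simplified illustration, assuming space group P1, constant (resolution-independent) scattering factors, and no ADPs; the function name is made up for this example and it is not the GSAS or cctbx code. The two nested loops make the cost proportional to (number of reflections) × (number of atoms), and both numbers are large for proteins.

```python
import cmath
import math


def structure_factors_direct(miller_indices, sites_frac, scattering_factors):
    """F(hkl) = sum_j f_j * exp(2*pi*i * (h*x_j + k*y_j + l*z_j)).

    Simplifications (assumptions of this sketch): P1 symmetry, constant
    scattering factors f_j, no atomic displacement parameters.
    """
    two_pi_i = 2j * math.pi
    result = []
    for h, k, l in miller_indices:              # loop over reflections
        f_hkl = 0j
        for (x, y, z), f_j in zip(sites_frac, scattering_factors):  # atoms
            f_hkl += f_j * cmath.exp(two_pi_i * (h * x + k * y + l * z))
        result.append(f_hkl)
    return result
```

An FFT-based calculation avoids the inner loop over atoms for every reflection by first sampling the model electron density on a grid and transforming it once.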

Page 9

Two main challenges

• Challenge 1:
  – Input/output of powder-specific format
    • Fundamentally trivial but potentially tedious
    • New command?
      – No interference with existing, non-trivial algorithms for automatic recognition, processing, and consolidation of already very heterogeneous inputs
    • Extend the existing input algorithms?
      – Nicer, but requires higher degree of collaboration
• Challenge 2:
  – Development of a powder-specific target function
    • Based on extracted intensities or primary pattern + pre-fitted profile parameters?
    • Maximum likelihood with or without cross-validation?
    • Will probably require some refactoring of the refinement engine
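One possible shape for a target based on extracted intensities is sketched below. It is an assumption made for illustration, not an existing Phenix or cctbx target: reflections that overlap in the powder pattern are treated as a group, the calculated group intensity is the multiplicity-weighted sum of |Fcalc|², and a least-squares residual with an analytically optimal overall scale is formed against the extracted group intensities. All names are hypothetical.

```python
def powder_ls_target(groups, i_obs, f_calc, multiplicities, weights=None):
    """Least-squares residual on grouped (overlapping) powder intensities.

    groups         : list of lists of reflection indices that overlap
    i_obs          : extracted intensity for each group
    f_calc         : complex calculated structure factor per reflection
    multiplicities : powder multiplicity per reflection
    Returns (normalized residual, overall scale factor).
    """
    i_calc = [
        sum(multiplicities[i] * abs(f_calc[i]) ** 2 for i in group)
        for group in groups
    ]
    if weights is None:
        weights = [1.0] * len(groups)
    # Overall scale that minimizes sum w * (i_obs - scale * i_calc)^2.
    numerator = sum(w * o * c for w, o, c in zip(weights, i_obs, i_calc))
    denominator = sum(w * c * c for w, c in zip(weights, i_calc))
    scale = numerator / denominator if denominator > 0 else 0.0
    residual = sum(
        w * (o - scale * c) ** 2 for w, o, c in zip(weights, i_obs, i_calc)
    )
    norm = sum(w * o * o for w, o in zip(weights, i_obs))
    return (residual / norm if norm > 0 else 0.0), scale
```

Because only the group sums are compared, the target does not depend on how an extraction program partitioned intensity among the overlapping reflections within a group.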

Page 10

Modular design

• Application level
  – phenix wizards (data in, structure out)
  – phenix.refine
  – phenix.hyss (hybrid substructure search)
  – Visible source
• Library level
  – cctbx project, organized in modules
    • libtbx, scitbx, cctbx, iotbx, mmtbx
  – cctbx is intended to cover small-molecule work
    • But nothing yet specific to powders
  – Unrestricted open source

Page 11

Existing target functions

• Least-squares (variety)
• Maximum likelihood on amplitudes
• Maximum likelihood with experimental phases
• Least-squares twin target
• SAD-specific maximum likelihood target implemented in Phaser
  – Reusing target from external application!
• Dirty laundry
  – Severe code duplication in implementation of twin target
    • Needs to be consolidated
  – Some friction integrating the Phaser ML-SAD target
    • Phaser target relatively slow: we need better bookkeeping to avoid repeated calculations with exactly the same input
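The kind of bookkeeping meant here can be as simple as remembering the last input and its result. The wrapper below is a generic sketch, assuming the expensive target can be called with a flat sequence of parameters; the class name is hypothetical and this is not how the Phaser integration is actually written.

```python
class CachedTarget:
    """Wrap an expensive target so that an identical input is not recomputed.

    Only the most recent input is remembered, which covers the common case
    of a minimizer asking for the value at the same point more than once.
    """

    def __init__(self, compute):
        self._compute = compute
        self._last_key = None
        self._last_value = None
        self.n_evaluations = 0  # number of real (uncached) evaluations

    def __call__(self, parameters):
        key = tuple(parameters)  # hashable, order-preserving snapshot
        if key != self._last_key:
            self._last_value = self._compute(key)
            self._last_key = key
            self.n_evaluations += 1
        return self._last_value


if __name__ == "__main__":
    target = CachedTarget(lambda p: sum(x * x for x in p))
    target([1.0, 2.0])
    target([1.0, 2.0])  # served from the cache
    print(target.n_evaluations)
```

The comparison is exact, matching the slide's "exactly the same input"; parameters that differ only by rounding noise would still trigger a recalculation.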

Page 12

Precedent for reusing cctbx?

• cctbx used heavily by all phenix collaborators
• Phaser uses cctbx -> cctbx supported by CCP4 6.0 and up
• smtbx: small-molecule toolbox
  – Group at Durham University, U.K. collaborating with David Watkin at Oxford University, U.K.
  – Long-term goal: highly integrated single-crystal structure determination (direct methods), automatic model building and refinement
  – Initial focus: iterative model building and refinement
  – Initial approach: reuse + adjust cctbx core libraries directly combined with copying sub-modules to smtbx where they are modified
  – Long term: consolidate duplications as much as possible
    • half the code = half the bugs, reuse of optimizations

Page 13

Summary of ideas

• Implement powder-specific target function(s) that plug into the refinement engine in the open source cctbx libraries
  – Can be done stand-alone using ad-hoc input/output methods
  – Collaborate in making the necessary adjustments to the existing libraries
• Figure out the best way to handle input/output at the application level
  – Learn and re-evaluate as we go
• If the powder field joins in there will be the potential for direct cross-fertilization between three specializations in crystallography
  – Single-crystal protein
  – Single-crystal small-molecule
  – Powder diffraction protein
  – More? (powder diffraction small-molecule)
• cctbx libraries are very general
• Ever increasing integration is the secret behind the stunning successes in the development of computing technology
  – Can we make this idea work in crystallography?
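What "plugging in" a target could look like at the code level is sketched below: the engine looks up a target by name and calls it through one common signature, so a powder-specific target can be added without touching the engine. The registry, the decorator, and the two example targets are all hypothetical and are not the cctbx or mmtbx interfaces.

```python
TARGETS = {}  # hypothetical registry: target name -> callable


def register_target(name):
    """Decorator that makes a target function available under `name`."""
    def decorator(func):
        TARGETS[name] = func
        return func
    return decorator


@register_target("ls_amplitudes")
def ls_amplitudes(observed, f_calc_amplitudes):
    """Unweighted least-squares residual on amplitudes."""
    return sum((o - c) ** 2 for o, c in zip(observed, f_calc_amplitudes))


@register_target("ls_intensities")
def ls_intensities(observed, f_calc_amplitudes):
    """Unweighted least-squares residual on intensities (|F|^2)."""
    return sum((o - c * c) ** 2 for o, c in zip(observed, f_calc_amplitudes))


def evaluate(target_name, observed, f_calc_amplitudes):
    """Engine-side call: the engine only needs to know the target's name."""
    return TARGETS[target_name](observed, f_calc_amplitudes)
```

A real refinement target would also have to return gradients with respect to the model parameters; that part is left out to keep the sketch short.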

Page 14

Availability

• Phenix incl. Graphical User Interface
  – http://www.phenix-online.org/
  – Freely available to academic (non-profit) groups
• Core libraries (cctbx)
  – http://cctbx.sourceforge.net/
  – Freely available to all

Page 15

Acknowledgments

• Phenix developers
  – P.D. Adams
  – P. Afonine
  – T.R. Ioerger
  – A.J. McCoy
  – E.W. McKee
  – N.W. Moriarty
  – R.J. Read
  – N.K. Sauter
  – J.N. Smith
  – L.C. Storoni
  – T.C. Terwilliger
  – P.H. Zwart
• Funding:
  – LBNL (DE-AC03-76SF00098)
  – NIH/NIGMS (1P01GM063210)
  – PHENIX Industrial Consortium