What is the Phase Problem? Overview of the Phase...
14 Feb 2008 Biology 555 Crystallographic Phasing I p. 1 of 42
[Figure: Protein, Data, Crystal, Structure, Phases]
Overview of the Phase Problem
John Rose, ACA Summer School 2006
Reorganized by Andy Howard, Biology 555, Spring 2008
Remember:
• We can measure reflection intensities.
• We can calculate structure-factor amplitudes from the intensities.
• We can calculate the structure factors from atomic positions.
• We need phase information to generate the image.
[Figure: X-ray diffraction experiment. Atomic positions (x, y, z) in real space give structure factors Fhkl in reciprocal space; all phase information is lost.]
What is the Phase Problem?
In the X-ray diffraction experiment, photons are reflected from the crystal lattice (planes) in different directions, giving rise to the diffraction pattern.
Using a variety of detectors (film, image plates, CCD area detectors) we can estimate intensities, but we lose any information about the relative phase for different reflections.
Phases
• Let’s define a phase φj associated with a specific plane [hkl] for an individual atom j:
φj = 2π(hxj + kyj + lzj)
• Atom at xj = 0.40, yj = 0.05, zj = 0.10 for plane [213]:
φj = 2π(2*0.40 + 1*0.05 + 3*0.10) = 2π(1.15)
• If we examine a 2-dimensional case like k = 0, then
φj = 2π(hxj + lzj)
• Thus for [201] (a two-dimensional case):
φj = 2π(2*0.40 + 0*0.05 + 1*0.10) = 2π(0.90)
• Now, to understand what this means:
[Figure: 201 Phases. The (201) planes drawn in the a–c projection, with labelled points A to I; successive planes are marked 0°, 360° (2π), 720° (4π) and 1080° (6π). Atom D, at (0.4, y, 0.1), has φD = 2π[2•(0.40) + 1•(0.10)] = 2π(0.90).]
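The arithmetic on the phase slides can be checked with a few lines of Python. This is a minimal sketch; `phase_fraction` is a hypothetical helper name, and it reports φj/2π, so whole turns (the integer part) can be dropped without changing the phase angle.

```python
def phase_fraction(hkl, xyz):
    """Return phi_j / (2*pi) = h*x_j + k*y_j + l*z_j for one atom and one plane."""
    h, k, l = hkl
    x, y, z = xyz
    return h * x + k * y + l * z

atom = (0.40, 0.05, 0.10)            # fractional coordinates from the example above
for hkl in [(2, 1, 3), (2, 0, 1)]:
    turns = phase_fraction(hkl, atom)
    # Only the fractional part matters: 2*pi*(1.15) is the same angle as 2*pi*(0.15)
    print(hkl, f"{turns:.2f} turns -> {turns % 1.0:.2f} of a cycle")
# (2, 1, 3) 1.15 turns -> 0.15 of a cycle
# (2, 0, 1) 0.90 turns -> 0.90 of a cycle
```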
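[Figure: atom j at (x, y, z) relative to the family of planes hkl, spaced dhkl apart in the a–c projection and marked 2π, 4π and 6π; φ is the phase of atom j.]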
In General for Any Atom (x, y, z)
Remember: we express
(1) any position in the cell in fractional coordinates: pj = xja + yjb + zjc
(2) any diffraction (reciprocal-lattice) vector as the sum of integral multiples of the reciprocal axes: σhkl = ha* + kb* + lc*
Diffraction vector for a Bragg spot
• We set up the diffraction vector σhkl associated with a specific diffraction direction hkl: σhkl = ha* + kb* + lc*
• The magnitude of this diffraction vector is the reciprocal of our Bragg-law plane spacing dhkl:
|σhkl| = 1/dhkl
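For the special case of an orthorhombic cell (all angles 90°), a* is simply 1/a along a, and likewise for b* and c*, so the relation |σhkl| = 1/dhkl can be checked numerically. This is a sketch assuming NumPy; the cell dimensions below are hypothetical.

```python
import numpy as np

def d_spacing_orthorhombic(hkl, cell):
    """d_hkl = 1/|sigma_hkl| for an orthorhombic cell (all angles 90 degrees)."""
    h, k, l = hkl
    a, b, c = cell
    a_star = np.array([1.0 / a, 0.0, 0.0])   # reciprocal axes, in 1/Angstrom
    b_star = np.array([0.0, 1.0 / b, 0.0])
    c_star = np.array([0.0, 0.0, 1.0 / c])
    sigma = h * a_star + k * b_star + l * c_star   # diffraction vector sigma_hkl
    return 1.0 / np.linalg.norm(sigma)

d = d_spacing_orthorhombic((2, 0, 1), (10.0, 20.0, 30.0))   # hypothetical cell
print(f"d(201) = {d:.3f} Angstrom")   # d(201) = 4.932 Angstrom
```

For this cell the same number follows from the familiar orthorhombic formula 1/d² = h²/a² + k²/b² + l²/c².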
Phase angle for a spot
• The phase angle φj associated with our atom is 2π times the scalar (dot) product of σhkl with the displacement vector pj: φj = 2π σhkl • pj
• But that displacement vector pj is related to the real-space coordinates of the atom at position j: pj = xja + yjb + zjc, where the fractional coordinates of our atom within the unit cell are (xj, yj, zj)
• Thus φj = 2π (ha* + kb* + lc*) • (xja + yjb + zjc)
Real-space and reciprocal space
• But these real-space and reciprocal-space unit cell vectors (a, b, c) and (a*, b*, c*) are duals of one another; that is, they obey:
a•a* = 1, a•b* = 0, a•c* = 0
b•a* = 0, b•b* = 1, b•c* = 0
c•a* = 0, c•b* = 0, c•c* = 1
• … even when the unit cell isn’t all full of 90-degree angles!
Matrix formulation of this duality
• If we construct the 3x3 reciprocal-space unit cell matrix A = (a* b* c*), with a*, b*, c* as its rows,
• and the 3x3 real-space unit cell matrix R = (a b c), with a, b, c as its columns, for a specific position of the sample, then
• A and R obey the simple relationship A = R⁻¹, i.e. AR = I,
• where I is the 3x3 identity matrix.
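The duality can be verified numerically for a cell with no 90° angles. This is a sketch assuming NumPy; the cell parameters are hypothetical, and placing a along x with b in the xy-plane is one common convention (an assumption here; any orientation gives the same AR = I result).

```python
import numpy as np

def real_space_matrix(a, b, c, alpha, beta, gamma):
    """Columns are the real-space axes a, b, c in Cartesian coordinates (Angstrom).

    Convention (assumed): a along x, b in the xy-plane; angles in degrees.
    """
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = np.cos(be)
    cy = (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(1.0 - cx**2 - cy**2)
    return np.array([
        [a, b * np.cos(ga), c * cx],
        [0.0, b * np.sin(ga), c * cy],
        [0.0, 0.0, c * cz],
    ])

R = real_space_matrix(40.0, 50.0, 60.0, 80.0, 95.0, 105.0)  # hypothetical triclinic cell
A = np.linalg.inv(R)                  # rows of A are a*, b*, c*
print(np.allclose(A @ R, np.eye(3)))  # True: a*.a = 1, a*.b = 0, ... for every pair
```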
How to use this in getting phases
• φj = 2π (ha* + kb* + lc*) • (xja + yjb + zjc)
• But using those dual relationships, e.g. a*•a = 1, b*•c = 0, we get φj = 2π (hxj + kyj + lzj)
• Note that this is true even if our unit cell angles aren’t 90º!
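The same result can be seen numerically: computing φj from the Cartesian vectors σhkl and pj in an oblique cell gives the same number as the fractional-coordinate formula. A sketch assuming NumPy; the matrix R below is a hypothetical non-orthogonal cell, and the atom and plane are those of the [213] example.

```python
import numpy as np

# Hypothetical oblique cell: columns are a, b, c in Cartesian Angstrom
R = np.array([[40.0, -12.9, -5.2],
              [0.0, 48.3, 9.4],
              [0.0, 0.0, 59.0]])
A = np.linalg.inv(R)                  # rows are a*, b*, c*

hkl = np.array([2, 1, 3])
xyz = np.array([0.40, 0.05, 0.10])    # fractional coordinates of atom j

p_j = R @ xyz                         # p_j = x_j a + y_j b + z_j c
sigma = hkl @ A                       # sigma_hkl = h a* + k b* + l c*
phi_vector = 2 * np.pi * sigma @ p_j
phi_fractional = 2 * np.pi * hkl @ xyz
print(phi_vector / (2 * np.pi), phi_fractional / (2 * np.pi))   # both ~1.15
```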
[Figure: structure factors and electron density, related by the Fourier transform and the inverse Fourier transform.]
Why Do We Need the Phase?
• In order to reconstruct the molecular image (electron density) from its diffraction pattern, both the intensity and the phase, which can assume any value from 0 to 2π, of each of the thousands of measured reflections must be known.
Importance of Phases
[Figure: four images labelled Hauptman amplitudes with Hauptman phases; Hauptman amplitudes with Karle phases; Karle amplitudes with Karle phases; Karle amplitudes with Hauptman phases.]
Phases dominate the image! Phase estimates need to be accurate.
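The amplitude/phase swap in the figure can be imitated in one dimension. This sketch, assuming NumPy, uses two random signals as hypothetical stand-ins for the two photographs (not the actual images): the amplitudes of one are combined with the phases of the other, and the hybrid is compared with both sources.

```python
import numpy as np

def swap_amplitudes_and_phases(amp_source, phase_source):
    """Combine |F| of one 1-D 'image' with the phases of another, then transform back."""
    f_amp = np.fft.rfft(amp_source)
    f_phase = np.fft.rfft(phase_source)
    hybrid = np.abs(f_amp) * np.exp(1j * np.angle(f_phase))
    return np.fft.irfft(hybrid, n=len(amp_source))

def correlation(u, v):
    """Pearson correlation coefficient between two signals."""
    return float(np.corrcoef(u, v)[0, 1])

rng = np.random.default_rng(0)
image_a = rng.standard_normal(256)   # hypothetical stand-in for the first image
image_b = rng.standard_normal(256)   # hypothetical stand-in for the second image

mixed = swap_amplitudes_and_phases(image_a, image_b)   # a's amplitudes, b's phases
print("corr with amplitude source:", round(correlation(mixed, image_a), 2))
print("corr with phase source:    ", round(correlation(mixed, image_b), 2))
```

The hybrid should correlate strongly with the phase source and only weakly with the amplitude source, which is the one-dimensional analogue of "phases dominate the image".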
Understanding the Phase Problem
• The phase problem can be best understood from a simple mathematical construct.
• The structure factors (Fhkl) are treated in diffraction theory as complex quantities, i.e., they consist of a real part (Ahkl) and an imaginary part (Bhkl).
• If the phases, Φhkl, were available, the values of Ahkl and Bhkl could be calculated from very simple trigonometry:
• Ahkl = |Fhkl| cos(Φhkl)
• Bhkl = |Fhkl| sin(Φhkl)
• This leads to the relationship: (Ahkl)² + (Bhkl)² = |Fhkl|² = Ihkl
Argand Diagram
[Figure 3. An Argand diagram of structure factor Fhkl with phase Φhkl, drawn on real and imaginary axes. The real (Ahkl) and imaginary (Bhkl) components are also shown.]

$$F_{hkl} = A_{hkl} + iB_{hkl}$$
$$(A_{hkl})^2 + (B_{hkl})^2 = |F_{hkl}|^2 = I_{hkl}$$
$$\Phi_{hkl} = \tan^{-1}\left(\frac{B_{hkl}}{A_{hkl}}\right)$$

where the signs of Ahkl and Bhkl determine which quadrant Φhkl lies in.
The above relationships are often illustrated using an Argand diagram (Figure 3).
From the Argand diagram, it is obvious that Ahkl and Bhkl may be either positive or negative, depending on the value of the phase angle, Φhkl.
Note: the units of Ahkl, Bhkl and Fhkl are electrons.
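A small sketch of the Argand-diagram relationships, using only the Python standard library. The reflection (|F| = 120 electrons, Φ = 200°) is hypothetical; the point is that a two-argument arctangent (`math.atan2`) recovers the phase in the correct quadrant, whereas a plain arctan of B/A does not.

```python
import math

def components(F_abs, phi):
    """Real (A) and imaginary (B) parts of a structure factor from |F| and phase (radians)."""
    return F_abs * math.cos(phi), F_abs * math.sin(phi)

def phase_from_components(A, B):
    """Phase in [0, 2*pi); atan2 uses the signs of A and B to pick the quadrant."""
    return math.atan2(B, A) % (2 * math.pi)

F_abs, phi = 120.0, math.radians(200.0)   # hypothetical reflection, |F| in electrons
A, B = components(F_abs, phi)
print(round(A, 1), round(B, 1))                            # both negative (third quadrant)
print(math.isclose(A**2 + B**2, F_abs**2))                 # True: A^2 + B^2 = |F|^2
print(round(math.degrees(phase_from_components(A, B)), 6)) # 200.0
print(round(math.degrees(math.atan(B / A)), 6))            # 20.0: plain arctan loses the quadrant
```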
The Structure Factor
[Figure: atomic scattering factors f0 plotted against sinθ/λ.]

$$F_{hkl} = \sum_{j=1}^{N} f_j\, e^{2\pi i (hx_j + ky_j + lz_j)}$$

Here fj is the atomic scattering factor.
• The scattering factor for each atom type in the structure is evaluated at the correct sinθ/λ. That value is the scattering ability for that atom.
• Remember sinθ/λ = 1/(2dhkl).
• We now have an atomic scattering factor with magnitude f0 and direction φj.
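The Structure Factor: Sum of All Individual Atom Contributions

$$\varphi_j = 2\pi(hx_j + ky_j + lz_j)$$
$$F_{hkl} = \sum_{j=1}^{N} f_j\, e^{i\varphi_j} = \sum_{j=1}^{N} f_j\, e^{2\pi i (hx_j + ky_j + lz_j)}$$

[Figure: Argand diagram (real and imaginary axes) in which the individual atomic contributions fj add head to tail to give the resultant Fhkl, with components Ahkl and Bhkl.]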
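The sum over atoms translates directly into code. This is a sketch assuming NumPy; the three-atom cell is hypothetical, and as a simplifying assumption each fj is set to the atom's electron count (its value at sinθ/λ = 0) rather than looked up at the reflection's actual sinθ/λ.

```python
import numpy as np

def structure_factor(hkl, frac_coords, f):
    """F_hkl = sum_j f_j exp(2*pi*i*(h x_j + k y_j + l z_j)).

    frac_coords: (N, 3) fractional coordinates; f: N scattering factors,
    assumed already evaluated at this reflection's sin(theta)/lambda.
    """
    phases = 2 * np.pi * np.asarray(frac_coords) @ np.asarray(hkl)   # phi_j for every atom
    return np.sum(np.asarray(f) * np.exp(1j * phases))

# Hypothetical three-atom cell; f_j taken as Z (the sin(theta)/lambda = 0 limit)
coords = [[0.40, 0.05, 0.10], [0.12, 0.33, 0.71], [0.80, 0.60, 0.25]]
f = [6.0, 7.0, 8.0]   # C, N, O electron counts
F = structure_factor((2, 1, 3), coords, f)
print(f"|F| = {abs(F):.2f} e, phase = {np.degrees(np.angle(F)) % 360:.1f} deg")
```

Two quick sanity checks follow from the formula: F000 equals the total electron count, and F(−h,−k,−l) is the complex conjugate of F(h,k,l).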
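Electron Density
• Remember the electron density (image of the molecule) is the Fourier transform of the structure factor Fhkl. Thus

$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} F_{hkl}\, e^{-2\pi i (hx+ky+lz)} = \frac{1}{V}\sum_{hkl} F_{hkl}\, e^{-i\alpha_{hkl}}, \qquad \alpha_{hkl} = 2\pi(hx+ky+lz)$$

With $e^{-i\alpha} = \cos\alpha - i\sin\alpha$ and $F_{hkl} = A_{hkl} + iB_{hkl}$, each term has real part $A_{hkl}\cos\alpha_{hkl} + B_{hkl}\sin\alpha_{hkl}$; the imaginary parts cancel between Friedel mates hkl and $\bar{h}\bar{k}\bar{l}$ (since $F_{\bar{h}\bar{k}\bar{l}} = F_{hkl}^{*}$ in the absence of anomalous scattering), leaving

$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl}\left[A_{hkl}\cos\alpha_{hkl} + B_{hkl}\sin\alpha_{hkl}\right] = \frac{1}{V}\sum_{hkl}\left[A_{hkl}\cos 2\pi(hx+ky+lz) + B_{hkl}\sin 2\pi(hx+ky+lz)\right]$$

Here V is the volume of the unit cell.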
How to calculate ρ(x,y,z)
• In practice, the electron density for one three-dimensional unit cell is calculated by starting at x, y, z = (0, 0, 0) and stepping incrementally along each axis, summing the terms as shown in the equation above for all hkl (as limited by the resolution of the data) at each point in space.
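The stepping procedure can be sketched in one dimension with NumPy. Everything here is a hypothetical toy: a 1-D cell of length a (standing in for V), three point atoms with constant fj, and a resolution cutoff |h| ≤ 15. The structure factors are computed from the atoms, then summed back onto a grid of x values.

```python
import numpy as np

# Hypothetical 1-D "crystal": cell length a (Angstrom), point atoms at fractional x
a = 30.0
atoms_x = np.array([0.15, 0.40, 0.72])
atoms_f = np.array([6.0, 8.0, 16.0])
h_max = 15                                   # resolution limit: |h| <= h_max

hs = np.arange(-h_max, h_max + 1)
# F_h = sum_j f_j exp(+2*pi*i*h*x_j)  (point atoms, constant f_j)
F = np.array([np.sum(atoms_f * np.exp(2j * np.pi * h * atoms_x)) for h in hs])

# Step along x, summing F_h exp(-2*pi*i*h*x) over all h at each grid point;
# Friedel pairs make the sum real, so .real only discards rounding noise
x_grid = np.arange(0.0, 1.0, 0.005)
rho = np.array([np.sum(F * np.exp(-2j * np.pi * hs * x)).real for x in x_grid]) / a
print("highest peak at x =", round(float(x_grid[np.argmax(rho)]), 3))   # the f = 16 atom
```

Raising `h_max` (higher resolution) sharpens the peaks; lowering it blurs neighbouring atoms together.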
Solving the Phase Problem
• Small molecules
– Direct Methods
– Patterson Methods
– Molecular Replacement
• Macromolecules
– Multiple Isomorphous Replacement (MIR)
– Multi-wavelength Anomalous Dispersion (MAD)
– Single Isomorphous Replacement (SIR)
– Single-wavelength Anomalous Scattering (SAS)
– Molecular Replacement
– Direct Methods (special cases)
Solving the Phase Problem
SMALL MOLECULES:
• The use of Direct Methods has essentially solved the phase problem for well-diffracting small-molecule crystals.
MACROMOLECULES:
• Today, anomalous scattering techniques such as MAD or SAS are the most common techniques used for de novo structure determination of macromolecules. Both techniques require the presence of one or more anomalous scatterers in the crystal.
Direct methods
• Karle, Hauptman, David Sayre, and others determined algebraic relationships among phase angles of groups of reflections.
• The simplest are triplet relationships: for three reflections h1 = (h1, k1, l1), h2 = (h2, k2, l2), h3 = (h3, k3, l3), they showed that if h3 = −h1 − h2, then
• Φ1 + Φ2 + Φ3 ≈ 0 (modulo 2π)
• Thus if Φ1 and Φ2 are known, then we can estimate that Φ3 ≈ −Φ1 − Φ2
[Photo: David Sayre]
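The triplet relationship can be explored on a toy structure. This sketch assumes NumPy and a hypothetical 1-D structure of 12 equal point atoms; it uses normalized amplitudes |E| = |F|/√(Σ fj²), a standard direct-methods quantity not introduced on these slides, to rank triplets by strength. The helper names are illustrative only.

```python
import numpy as np

def wrap(phi):
    """Map an angle onto [0, 2*pi)."""
    return np.mod(phi, 2 * np.pi)

def predict_phi3(phi1, phi2):
    """Triplet estimate for h3 = -h1 - h2:  phi3 ~ -phi1 - phi2 (mod 2*pi)."""
    return wrap(-phi1 - phi2)

def normalized_F(h, x, f):
    """E_h = F_h / sqrt(sum f_j^2) for a 1-D toy structure of point atoms."""
    F = np.sum(f * np.exp(2j * np.pi * h * x))
    return F / np.sqrt(np.sum(f**2))

rng = np.random.default_rng(7)
x = rng.random(12)                  # hypothetical 1-D structure: 12 equal atoms
f = np.ones_like(x)

results = []                        # (|E1 E2 E3|, cos(phi1 + phi2 + phi3))
for h1 in range(1, 25):
    for h2 in range(1, 25):
        E1, E2, E3 = (normalized_F(h, x, f) for h in (h1, h2, -h1 - h2))
        triplet_sum = np.angle(E1) + np.angle(E2) + np.angle(E3)
        results.append((abs(E1 * E2 * E3), np.cos(triplet_sum)))

results.sort(reverse=True)          # strongest triplets first
strong = np.mean([c for _, c in results[:20]])
weak = np.mean([c for _, c in results[-20:]])
print(f"mean cos(triplet sum): strongest 20 = {strong:.2f}, weakest 20 = {weak:.2f}")
```

A mean cosine near 1 means the triplet sums cluster near 0 (mod 2π); the strongest triplets are expected to get much closer to that than the weakest, in line with the next slide.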
When do triplet relations hold?
• Note the approximately zero value in that relationship Φ1 + Φ2 + Φ3 ≈ 0.
• The stronger the Bragg reflections are, the closer this condition is to being exact.
• For very strong Bragg reflections that sum will be very close to zero
• For weaker ones it may differ significantly from zero
Phase probabilities
• This notion of relationships among phases obliges us to think of phases probabilistically rather than deterministically. This is a key to the direct-methods approach and has a huge influence on how we think about phase determination.
• I’m introducing all of this mostly to get you accustomed to the notion of phase probability distributions!
Phase probabilities
• Any phase has a value between 0 and 2π (or 0 and 360°, if we’re using degrees)
• If we know it’s close to 2π*0.42, then:
– If it’s 2π*(0.42 ± 0.01), it’s a sharp phase probability distribution
– If it’s 2π*(0.42 ± 0.32), it’s a much broader phase probability distribution
Plots of phase probability
• The integral of the probability must be 1, since every phase has to have some value.
[Plot: P(φ) versus φ from 0 to 2π, comparing a sharp distribution with a broad distribution.]
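The slides do not give a functional form for P(φ), so this sketch assumes a von Mises (circular normal) curve centred on 2π*0.42, with the rough width-to-concentration conversion κ ≈ 1/σ² (σ in radians; this approximation is only good for sharp peaks). It uses NumPy and checks that both the sharp (±0.01) and broad (±0.32) distributions integrate to 1.

```python
import numpy as np

def von_mises_pdf(phi, mu, kappa):
    """Von Mises density on [0, 2*pi), normalised numerically so it integrates to 1."""
    p = np.exp(kappa * (np.cos(phi - mu) - 1.0))   # subtract 1 to avoid overflow
    dphi = phi[1] - phi[0]
    return p / (p.sum() * dphi)

phi = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
mu = 2 * np.pi * 0.42
# Assumed conversion from the quoted +/- width to kappa: kappa ~ 1 / sigma^2
sharp = von_mises_pdf(phi, mu, 1.0 / (2 * np.pi * 0.01) ** 2)
broad = von_mises_pdf(phi, mu, 1.0 / (2 * np.pi * 0.32) ** 2)

dphi = phi[1] - phi[0]
print(round(sharp.sum() * dphi, 6), round(broad.sum() * dphi, 6))   # 1.0 1.0
print(sharp.max() > broad.max())   # True: the sharp peak is much taller
```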
How can we use this?
• Obviously if we don’t know φ1 + φ2, we can’t use this to calculate φ3, even if the intensities of all three are large.
• But we could guess what φ1 and φ2 are and use this to compute φ3.
• Then we guess φ4 and use the triplet relationship to compute φ5 and φ6, where h5 = −h1 − h4 and h6 = −h2 − h4 … assuming that reflections 5 and 6 are strong, too!
Can we make this work?
• We start with guessed phases for 10-100 strong reflections and use the triplet relationships to determine the phases for another 1000 reflections
• Any particular calculated phase can be determined by several different triplet relationships, so if they’re self-consistent, the initial 10-100 guessed phases are correct; if they aren’t self-consistent, the guess was wrong!
• In the latter case, we try a different set of guesses for our 10-100 starting phases and keep going
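One way to score the self-consistency check is to treat the several triplet estimates of one phase as angles and compute their circular mean and mean resultant length R. This is an illustrative sketch assuming NumPy, not the procedure of any particular direct-methods program; `circular_consensus` and all estimate values are hypothetical. Note that phases must be averaged on the circle: 0.03 and 0.99 of a cycle agree closely, although their arithmetic mean (0.51) is far from both.

```python
import numpy as np

def circular_consensus(estimates):
    """Combine several triplet estimates of one phase (radians).

    Returns (mean phase in [0, 2*pi), R) where R = |mean of exp(i*phi)|:
    R near 1 means the estimates agree, R near 0 means they scatter.
    """
    z = np.mean(np.exp(1j * np.asarray(estimates)))
    return np.mod(np.angle(z), 2 * np.pi), np.abs(z)

two_pi = 2 * np.pi
consistent = two_pi * np.array([0.31, 0.33, 0.30, 0.32])     # hypothetical estimates
inconsistent = two_pi * np.array([0.05, 0.40, 0.70, 0.90])
wrapped = two_pi * np.array([0.03, 0.99])                     # agree across 0 / 2*pi

for name, est in [("consistent", consistent), ("inconsistent", inconsistent), ("wrapped", wrapped)]:
    mean, R = circular_consensus(est)
    print(f"{name:13s} mean/2pi = {mean / two_pi:.3f}  R = {R:.2f}")
```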
This actually works, provided:
• The data are correctly measured
• The data are strong enough that we can pick 1000 strong reflections to use in this process
• The data extend to high enough resolution that atomicity (separable atoms) is really found
• There are ways to do direct methods without assuming atomicity, but they’re more complicated
Is this relevant to macromolecules?
• Not directly:
– Atomicity rarely present
– Systematic errors in data
• Indirectly yes, because it can be used in conjunction with other methods for locating heavy atoms in the SIR, MIR, and SAS methods
• It also helps introduce the notion of phase probability distributions (sneaky!)
SIR and SAS Methods
1. Need a heavy atom (lots of electrons) or an anomalous scatterer (large anomalous scattering signal) in the crystal.
• SIR - heavy atoms usually soaked in.
• SAS - anomalous scatterers usually engineered in as selenomethionine labels. Can also be soaked.
2. SIR: collect a native and a derivative data set (2 sets total). SAS: collect one highly redundant data set and keep anomalous pairs separate during processing.
• SAS - may want to choose a scatterer or wavelength that enhances the anomalous signal.
3. Must find the heavy atoms or anomalous scatterers.
• Can use Patterson analysis or direct methods.
4. Must resolve the bimodal ambiguity.
• Use solvent flattening or a similar technique.
What’s the bimodal ambiguity?
• As we’ll show next time, a single isomorphous derivative or anomalous scatterer enables us to measure each phase only up to a twofold ambiguity
• That is, for each phase we get two answers (e.g. 2π*0.12 and 2π*0.55), and we can’t tell which one is correct
• A second derivative or scatterer will resolve that
Phase probabilities with no error
• A single derivative with no error gives a phase probability like this:
[Plot: P(φ) versus φ, from 0 to 2π.]
2 derivatives, no error
• The two distributions overlap at the correct answer, not at the wrong answer.
[Plot: P(φ) versus φ from 0 to 2π, marking the correct phase, the wrong estimate derived from derivative 1 and the wrong estimate derived from derivative 2.]
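For error-free data, resolving the ambiguity amounts to finding the candidate that both derivatives share. A minimal standard-library sketch; phases are expressed as fractions of 2π, derivative 1 uses the example pair from the bimodal-ambiguity slide, and derivative 2's pair is hypothetical. `resolve_bimodal` is an illustrative name, not an established routine.

```python
def circular_distance(p, q):
    """Smallest separation between two phases expressed as fractions of 2*pi."""
    d = abs(p - q) % 1.0
    return min(d, 1.0 - d)

def resolve_bimodal(pair1, pair2):
    """Return the candidate from derivative 1 that best matches one from derivative 2.

    With error-free data the correct phase appears in both pairs; the wrong
    estimates generally do not coincide.
    """
    best_p, _ = min(((p, q) for p in pair1 for q in pair2),
                    key=lambda pq: circular_distance(pq[0], pq[1]))
    return best_p

derivative_1 = (0.12, 0.55)   # two answers from one derivative (example values)
derivative_2 = (0.12, 0.80)   # hypothetical second derivative
print(resolve_bimodal(derivative_1, derivative_2))   # 0.12
```

Measuring the distance around the circle matters: 0.99 and 0.01 of a cycle are only 0.02 apart.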
Errors spread this out
• Each phase estimate is not really that sharp
• Lack of isomorphism (see below) makes each distribution spread out
• The joint probability distribution from 2 or more experiments is the product of the probability distributions of the individual experiments
Realistic probability distributions
• Joint probability distribution = product of individual ones
[Plot: P(φ) versus φ, from 0 to 2π.]
Joint probability distribution
[Plot: phase probability P(phase) versus phase/2π from 0 to 1, with curves norm(P1), norm(P2) and norm(P1*P2). P1(φ) for the first derivative has peaks at 0.32 and 0.558; P2(φ) for the 2nd derivative has peaks at 0.315 and 0.815; the joint probability distribution = P1(φ) * P2(φ).]
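The plotted example can be approximated numerically. This sketch assumes NumPy and models each derivative's bimodal P(φ) as two equal von Mises peaks with an arbitrary concentration κ = 20 (the slide gives only the peak positions, not the shapes); the product, renormalised, should peak between 0.315 and 0.32, where the two derivatives agree.

```python
import numpy as np

def bimodal(phi, peaks, kappa=20.0):
    """Hypothetical model: one derivative's P(phi) as two equal von Mises peaks."""
    p = sum(np.exp(kappa * (np.cos(phi - 2 * np.pi * m) - 1.0)) for m in peaks)
    return p / (p.sum() * (phi[1] - phi[0]))      # integrate to 1 over [0, 2*pi)

phi = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)
dphi = phi[1] - phi[0]
P1 = bimodal(phi, [0.32, 0.558])   # first derivative (peak positions from the plot)
P2 = bimodal(phi, [0.315, 0.815])  # second derivative
joint = P1 * P2
joint /= joint.sum() * dphi        # renormalise the product

print("joint peak at phase/2pi =", round(float(phi[np.argmax(joint)] / (2 * np.pi)), 3))
```

The wrong peaks (0.558 and 0.815) sit far from each other and from the other derivative's peaks, so they are almost entirely suppressed in the product.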
Heavy Atom Derivatives
Heavy atom derivatives MUST be isomorphous
• Heavy atom derivatives are generally prepared by soaking crystals in dilute (2 - 20 mM) solutions of heavy atom salts (see Table II below for some examples).
• Crystal cracking is generally a good indication that the heavy atom is interacting with the crystal lattice, and suggests that a good derivative can be obtained by soaking the crystal in a more dilute solution.
Is the derivative worth using?
• Once derivative data has been collected, the merging R factor (Rmerge) between the native and derivative data sets can be used to check for heavy atom incorporation and isomorphism. Rmerge values for isomorphous derivatives range from 0.05 to 0.15. Values below 0.05 indicate that there is little heavy atom incorporation. Values above 0.15 indicate a lack of isomorphism between the two crystals.
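The slide does not spell out the formula, so this sketch assumes a common form of the native-derivative R factor, R = Σ| |FPH| − |FP| | / Σ|FP| over matched reflections, and applies the 0.05 to 0.15 rule of thumb quoted above. It uses NumPy; all amplitudes are hypothetical, and `isomorphous_r` and `judge` are illustrative names.

```python
import numpy as np

def isomorphous_r(f_native, f_derivative):
    """R = sum | |F_PH| - |F_P| | / sum |F_P| over matched reflections.

    Assumed definition; the slide only names the statistic.
    """
    fp = np.abs(np.asarray(f_native, dtype=float))
    fph = np.abs(np.asarray(f_derivative, dtype=float))
    return np.sum(np.abs(fph - fp)) / np.sum(fp)

def judge(r):
    """Apply the 0.05 - 0.15 rule of thumb from the slide."""
    if r < 0.05:
        return "little heavy-atom incorporation"
    if r > 0.15:
        return "probably non-isomorphous"
    return "plausible isomorphous derivative"

native = [100.0, 250.0, 80.0, 40.0, 310.0]       # hypothetical |F_P|
derivative = [110.0, 235.0, 92.0, 35.0, 330.0]   # hypothetical |F_PH|, same reflections
r = isomorphous_r(native, derivative)
print(f"R = {r:.3f}: {judge(r)}")   # R = 0.079: plausible isomorphous derivative
```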
What is isomorphism?
• Isomorphism for derivatives means that the structure of the derivatized macromolecule is identical to the structure of the underivatized molecule except at the site where the derivative compound has been introduced.
What is lack of isomorphism?
• A derivative may be nonisomorphous if:
– It alters the unit cell lengths or angles significantly (>0.2%?)
– It rotates or translates the entire macromolecule within the unit cell
– It alters significantly the conformation of a large segment (> 8 amino acids or 4 nucleotides?) of the macromolecule
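The first criterion is easy to check once both cells are known. A standard-library sketch; the default tolerance of 0.2% is the slide's tentative threshold (note its question mark), and both cells are hypothetical. The function names are illustrative.

```python
def cell_changes(native_cell, derivative_cell):
    """Fractional change of each cell parameter (a, b, c, alpha, beta, gamma)."""
    return [abs(d - n) / n for n, d in zip(native_cell, derivative_cell)]

def cell_looks_isomorphous(native_cell, derivative_cell, tolerance=0.002):
    """True if no parameter changes by more than `tolerance` (0.2% by default)."""
    return max(cell_changes(native_cell, derivative_cell)) <= tolerance

native_cell = (62.10, 75.30, 98.40, 90.0, 90.0, 90.0)   # hypothetical (Angstrom, degrees)
soaked_cell = (62.15, 75.28, 98.90, 90.0, 90.0, 90.0)
print([round(100 * c, 2) for c in cell_changes(native_cell, soaked_cell)])  # percent changes
print(cell_looks_isomorphous(native_cell, soaked_cell))   # False: c grew by about 0.5%
```

Passing this check is necessary but not sufficient: the other two criteria (rigid-body shifts, local conformational change) need a map or a model to assess.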
Derivative compounds
Table II. Protein Residues and Their Affinities for Heavy Metals
Residue | Affinity for | Conditions
Histidine | K2PtCl4, NaAuCl4, EtHgPO4H2 | pH > 6
Tryptophan | Hg(OAc)2, EtHgPO4H2 |
Glutamic, Aspartic Acids | UO2(NO3)2, rare earth cations | pH > 5
Cysteine | Hg, Ir, Pt, Pd, Au cations | pH > 7
Methionine | PtCl4²⁻ anion |
From Glusker, Lewis and Rossi
Finding the Heavy Atoms or Anomalous Scatterers

$$P_{uvw} = \frac{1}{V}\sum_{hkl} |F_{hkl}|^2 \cos 2\pi(hu + kv + lw)$$

The Patterson function:
– an F² Fourier transform with φ = 0
– a vector map (u, v, w instead of x, y, z)
– maps all interatomic vectors
– gives N² vectors!! (where N = number of atoms)
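A 1-D Patterson map shows both properties listed above: it needs only |F|², and its peaks sit at interatomic vectors. This sketch assumes NumPy and a hypothetical three-atom 1-D cell with one heavy atom; the 1/V scale factor is omitted because it does not move the peaks.

```python
import numpy as np
from itertools import product

# Hypothetical 1-D cell with three atoms (fractional x); the third is "heavy"
x = np.array([0.10, 0.35, 0.60])
f = np.array([8.0, 8.0, 30.0])
hs = np.arange(-20, 21)

# Only |F|^2 is needed: phases drop out of the Patterson function
F = np.array([np.sum(f * np.exp(2j * np.pi * h * x)) for h in hs])
u = np.arange(0.0, 1.0, 0.005)
P = np.array([np.sum(np.abs(F) ** 2 * np.cos(2 * np.pi * hs * ui)) for ui in u])  # 1/V omitted

# Every ordered pair of atoms (i, j) gives a vector x_i - x_j: N^2 in total
vectors = [round((xi - xj) % 1.0, 3) for xi, xj in product(x, x)]
print(len(vectors), sorted(set(vectors)))   # 9 [0.0, 0.25, 0.5, 0.75]

# Away from the origin, the largest peak involves the heavy atom (0.60 - 0.10)
away = (u > 0.1) & (u < 0.9)
print("largest non-origin peak at u =", round(float(u[away][np.argmax(P[away])]), 3))
```

The N vectors from each atom to itself all pile up at the origin, which is why the origin peak dominates the map.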