What is the Phase Problem? Overview of the Phase...



14 Feb 2008 Biology 555 Crystallographic Phasing I p. 1 of 42

Overview of the Phase Problem

John Rose, ACA Summer School 2006

Reorganized by Andy Howard, Biology 555, Spring 2008

[Diagram labels: Protein, Data, Crystal, Structure, Phases]

Remember
• We can measure reflection intensities
• We can calculate structure factor amplitudes from the intensities
• We can calculate the structure factors from atomic positions
• We need phase information to generate the image

[Diagram: x,y,z (real space) and $F_{hkl}$ (reciprocal space), linked by the X-ray diffraction experiment, in which all phase information is lost]

What is the Phase Problem?

In the X-ray diffraction experiment photons are reflected from the crystal lattice (planes) in different directions, giving rise to the diffraction pattern.

Using a variety of detectors (film, image plates, CCD area detectors) we can estimate intensities, but we lose any information about the relative phase for different reflections.

Phases
• Let's define a phase $\phi_j$ associated with a specific plane [hkl] for an individual atom:
  $\phi_j = 2\pi(hx_j + ky_j + lz_j)$
• Atom at $x_j = 0.40$, $y_j = 0.05$, $z_j = 0.10$ for plane [213]:
  $\phi_j = 2\pi(2 \cdot 0.40 + 1 \cdot 0.05 + 3 \cdot 0.10) = 2\pi(1.15)$
• If we examine a 2-dimensional case like k = 0, then
  $\phi_j = 2\pi(hx_j + lz_j)$
• Thus for [201] (a two-dimensional case):
  $\phi_j = 2\pi(2 \cdot 0.40 + 0 \cdot 0.05 + 1 \cdot 0.10) = 2\pi(0.90)$
• Now, to understand what this means:

201 Phases

[Figure: the a-c face of the unit cell (origin 0) crossed by the 201 planes, with points A through I marked. Successive planes correspond to phases of 360° (2π), 720° (4π) and 1080° (6π). Atom D sits at (0.4, y, 0.1).]

$\phi_D = 2\pi[2 \cdot (0.40) + 1 \cdot (0.10)] = 2\pi(0.90)$
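The phase arithmetic on these slides is easy to check numerically. The sketch below is illustrative only: the function name `atom_phase` is an assumption, and the atom coordinates are the ones used in the worked example (0.40, 0.05, 0.10). It reports each phase as a number of cycles (phase / 2π), so 2(0.40) + 1(0.05) + 3(0.10) comes out as 1.15 cycles for (213) and 2(0.40) + 1(0.10) as 0.90 cycles for (201).

```python
import math


def atom_phase(hkl, xyz):
    """Phase in radians of an atom at fractional coordinates xyz for planes hkl."""
    h, k, l = hkl
    x, y, z = xyz
    return 2 * math.pi * (h * x + k * y + l * z)


atom = (0.40, 0.05, 0.10)  # fractional coordinates from the worked example

phi_213 = atom_phase((2, 1, 3), atom)
phi_201 = atom_phase((2, 0, 1), atom)

# Express each phase as a number of full cycles (phase / 2π).
cycles_213 = phi_213 / (2 * math.pi)
cycles_201 = phi_201 / (2 * math.pi)

# Phases repeat every 2π, so only the fractional part of the cycle count matters.
wrapped_213 = cycles_213 % 1.0

print(f"(213): {cycles_213:.2f} cycles, wrapped to {wrapped_213:.2f}")
print(f"(201): {cycles_201:.2f} cycles")
```

Because only the fractional part of the cycle count is physically meaningful, a phase of 1.15 cycles is equivalent to 0.15 cycles.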

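In General for Any Atom (x, y, z)

[Figure: unit cell (origin 0, axes a and c) crossed by planes hkl spaced $d_{hkl}$ apart, with phases 2π, 4π and 6π marked; atom j at (x, y, z) has phase φ relative to the planes hkl.]

Remember: we express
(1) any position in the cell in fractional coordinates: $\mathbf{p}_{xyz} = x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c}$
(2) the diffraction vector as the sum of integral multiples of the reciprocal axes: $\boldsymbol{\sigma}_{hkl} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$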

Diffraction vector for a Bragg spot

• We set up the diffraction vector $\boldsymbol{\sigma}_{hkl}$ associated with a specific diffraction direction hkl:
  $\boldsymbol{\sigma}_{hkl} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$
• The magnitude of this diffraction vector is the reciprocal of our Bragg-law plane spacing $d_{hkl}$:
  $|\boldsymbol{\sigma}_{hkl}| = 1/d_{hkl}$
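As a small illustration of the relation between the diffraction vector and the plane spacing, the sketch below computes $d_{hkl}$ for the simplest case, a cell with all angles equal to 90°, where the reciprocal axes are mutually perpendicular with lengths 1/a, 1/b and 1/c. The restriction to such a cell, the function name and the cell edges are all assumptions made for the example; a general cell needs the full reciprocal vectors.

```python
import math


def d_spacing_orthorhombic(hkl, cell):
    """Plane spacing d_hkl (same length unit as the cell) for a 90-degree cell.

    With perpendicular reciprocal axes of lengths 1/a, 1/b, 1/c,
    |sigma_hkl|^2 = (h/a)^2 + (k/b)^2 + (l/c)^2 and d_hkl = 1/|sigma_hkl|.
    """
    h, k, l = hkl
    a, b, c = cell
    sigma_length = math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)
    return 1.0 / sigma_length


cell = (50.0, 60.0, 70.0)  # hypothetical cell edges in angstroms

d_100 = d_spacing_orthorhombic((1, 0, 0), cell)
d_201 = d_spacing_orthorhombic((2, 0, 1), cell)

print(f"d(100) = {d_100:.2f} A")
print(f"d(201) = {d_201:.2f} A")
```

Higher indices give longer diffraction vectors and therefore finer plane spacings, which is why high-index reflections carry the high-resolution information.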

Phase angle for a spot

• The phase angle $\phi_j$ associated with our atom is 2π times the dot product of the diffraction vector $\boldsymbol{\sigma}_{hkl}$ with the displacement vector $\mathbf{p}_j$ (that is, the projection of $\mathbf{p}_j$ onto $\boldsymbol{\sigma}_{hkl}$, measured in units of $d_{hkl}$):
  $\phi_j = 2\pi\,\boldsymbol{\sigma}_{hkl} \cdot \mathbf{p}_j$
• But that displacement vector $\mathbf{p}_j$ is related to the real-space coordinates of the atom at position j:
  $\mathbf{p}_j = x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c}$
  where the fractional coordinates of our atom within the unit cell are $(x_j, y_j, z_j)$
• Thus $\phi_j = 2\pi\,(h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*) \cdot (x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c})$

Real-space and reciprocal space

• But these real-space and reciprocal-space unit cell vectors (a, b, c) and (a*, b*, c*) are duals of one another; that is, they obey:
  $\mathbf{a}\cdot\mathbf{a}^* = 1,\ \mathbf{a}\cdot\mathbf{b}^* = 0,\ \mathbf{a}\cdot\mathbf{c}^* = 0$
  $\mathbf{b}\cdot\mathbf{a}^* = 0,\ \mathbf{b}\cdot\mathbf{b}^* = 1,\ \mathbf{b}\cdot\mathbf{c}^* = 0$
  $\mathbf{c}\cdot\mathbf{a}^* = 0,\ \mathbf{c}\cdot\mathbf{b}^* = 0,\ \mathbf{c}\cdot\mathbf{c}^* = 1$
• … even when the unit cell isn't all full of 90-degree angles!

Matrix formulation of this duality

• If we construct the 3x3 reciprocal-space unit cell matrix A = (a* b* c*), with a*, b*, c* as its rows
• And the 3x3 real-space unit cell matrix R = (a b c), with a, b, c as its columns, for a specific position of the sample, then
• A and R obey the simple relationship $A = R^{-1}$, i.e. $AR = I$
• Where I is a 3x3 identity matrix

How to use this in getting phases

• $\phi_j = 2\pi\,(h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*) \cdot (x_j\mathbf{a} + y_j\mathbf{b} + z_j\mathbf{c})$
• But using those dual relationships, e.g. $\mathbf{a}^*\cdot\mathbf{a} = 1$, $\mathbf{b}^*\cdot\mathbf{c} = 0$, we get
  $\phi_j = 2\pi(hx_j + ky_j + lz_j)$
• Note that this is true even if our unit cell angles aren't 90°!
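The claim that the phase reduces to 2π(hx + ky + lz) for any cell shape can be checked numerically. The sketch below is one way to do it under stated assumptions: the monoclinic cell (a = 40, b = 50, c = 60 Å, β = 110°) is hypothetical, the helper names are invented for the example, and the reciprocal axes are built from the standard cross-product construction a* = (b × c)/V rather than from a matrix inverse.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def combine(coeffs, axes):
    """Linear combination coeffs[0]*axes[0] + coeffs[1]*axes[1] + coeffs[2]*axes[2]."""
    return tuple(sum(c * axis[i] for c, axis in zip(coeffs, axes)) for i in range(3))


def reciprocal_axes(a, b, c):
    """Reciprocal axes a*, b*, c* dual to the real-space axes a, b, c."""
    volume = dot(a, cross(b, c))
    return (
        tuple(x / volume for x in cross(b, c)),
        tuple(x / volume for x in cross(c, a)),
        tuple(x / volume for x in cross(a, b)),
    )


# Hypothetical monoclinic cell in Cartesian coordinates (angstroms).
beta = math.radians(110.0)
a = (40.0, 0.0, 0.0)
b = (0.0, 50.0, 0.0)
c = (60.0 * math.cos(beta), 0.0, 60.0 * math.sin(beta))
a_star, b_star, c_star = reciprocal_axes(a, b, c)

hkl = (2, 1, 3)
xyz = (0.40, 0.05, 0.10)

sigma = combine(hkl, (a_star, b_star, c_star))  # diffraction vector
p = combine(xyz, (a, b, c))                     # atom position in angstroms

phase_from_vectors = 2 * math.pi * dot(sigma, p)
phase_from_indices = 2 * math.pi * sum(i * x for i, x in zip(hkl, xyz))

print(phase_from_vectors, phase_from_indices)
```

The two numbers agree to rounding error even though the angle between a and c is not 90°, which is exactly what the dual relationships guarantee.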

Why Do We Need the Phase?

[Diagram: the structure factor and the electron density, related by a Fourier transform and an inverse Fourier transform.]

• In order to reconstruct the molecular image (electron density) from its diffraction pattern, both the intensity and the phase of each of the thousands of measured reflections must be known. The phase can assume any value from 0 to 2π.

Importance of Phases

[Figure: four panels labeled "Hauptman amplitudes with Hauptman phases", "Hauptman amplitudes with Karle phases", "Karle amplitudes with Karle phases" and "Karle amplitudes with Hauptman phases".]

• Phases dominate the image!
• Phase estimates need to be accurate

Understanding the Phase Problem
• The phase problem can be best understood from a simple mathematical construct.
• The structure factors ($F_{hkl}$) are treated in diffraction theory as complex quantities, i.e., they consist of a real part ($A_{hkl}$) and an imaginary part ($B_{hkl}$).
• If the phases, $\Phi_{hkl}$, were available, the values of $A_{hkl}$ and $B_{hkl}$ could be calculated from very simple trigonometry:
  $A_{hkl} = |F_{hkl}| \cos(\Phi_{hkl})$
  $B_{hkl} = |F_{hkl}| \sin(\Phi_{hkl})$
• This leads to the relationship: $(A_{hkl})^2 + (B_{hkl})^2 = |F_{hkl}|^2 = I_{hkl}$

Argand Diagram

$F_{hkl} = A_{hkl} + iB_{hkl}$

$(A_{hkl})^2 + (B_{hkl})^2 = |F_{hkl}|^2 = I_{hkl}$

$\Phi_{hkl} = \tan^{-1}\left(\dfrac{B_{hkl}}{A_{hkl}}\right)$

The above relationships are often illustrated using an Argand diagram (Figure 3).

Figure 3. An Argand diagram of structure factor $F_{hkl}$ with phase $\Phi_{hkl}$. The real ($A_{hkl}$) and imaginary ($B_{hkl}$) components are also shown.

From the Argand diagram, it is obvious that $A_{hkl}$ and $B_{hkl}$ may be either positive or negative, depending on the value of the phase angle, $\Phi_{hkl}$.

Note: the units of $A_{hkl}$, $B_{hkl}$ and $F_{hkl}$ are in electrons.
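The relationships between amplitude, phase and the real and imaginary components can be sketched with Python's standard `cmath` and `math` modules. The function names and the example values (amplitude 10 electrons, phase 120°) are assumptions for illustration. One practical point the sketch makes is that recovering the phase needs the signs of both components, so it uses `atan2(B, A)` rather than a plain inverse tangent of B/A, which cannot tell opposite quadrants apart.

```python
import cmath
import math


def components(amplitude, phase):
    """Real part A and imaginary part B of a structure factor |F| exp(i*phase)."""
    f = cmath.rect(amplitude, phase)
    return f.real, f.imag


def amplitude_and_phase(a_part, b_part):
    """Amplitude |F| and phase in [0, 2π) recovered from A and B."""
    amplitude = math.hypot(a_part, b_part)
    phase = math.atan2(b_part, a_part) % (2 * math.pi)
    return amplitude, phase


a_part, b_part = components(10.0, math.radians(120.0))
amplitude, phase = amplitude_and_phase(a_part, b_part)
intensity = a_part ** 2 + b_part ** 2

# A plain arctangent of B/A loses the quadrant: here it lands 180 degrees away.
naive_phase = math.atan(b_part / a_part)

print(f"A = {a_part:.3f}, B = {b_part:.3f}, I = {intensity:.1f}")
print(f"phase = {math.degrees(phase):.1f} deg, naive = {math.degrees(naive_phase):.1f} deg")
```

In this example A is negative and B is positive, so the phase lies in the second quadrant; the naive arctangent returns −60° instead of 120°.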

The Structure Factor

$F_{hkl} = \sum_{j=1}^{N} f_j\, e^{2\pi i(hx_j + ky_j + lz_j)}$

Here $f_j$ is the atomic scattering factor.

[Figure: atomic scattering factors, $f_0$ plotted against sinθ/λ.]

• The scattering factor for each atom type in the structure is evaluated at the correct sinθ/λ. That value is the scattering ability for that atom.
• Remember sinθ/λ = 1/(2$d_{hkl}$)
• We now have an atomic scattering factor with magnitude $f_0$ and direction $\phi_j$

The Structure Factor
Sum of all individual atom contributions

$\phi_j = 2\pi(hx_j + ky_j + lz_j)$

$F_{hkl} = \sum_{j=1}^{N} f_j\, e^{2\pi i(hx_j + ky_j + lz_j)} = \sum_{j=1}^{N} f_j\, e^{i\phi_j}$

[Figure: Argand diagram (real and imaginary axes) in which the individual atom $f_j$s add up to the resultant $F_{hkl}$, with components $A_{hkl}$ and $B_{hkl}$.]
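The structure factor sum translates almost directly into code. The sketch below is a simplified illustration: it treats each $f_j$ as a constant (6 and 8 electrons, roughly carbon and oxygen at sinθ/λ = 0) instead of evaluating it at the sinθ/λ of each reflection, it ignores thermal motion, and the second atom's position is made up for the example.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum f_j * exp(i*phi_j) over atoms given as (f_j, (x, y, z)) tuples."""
    h, k, l = hkl
    total = 0j
    for f_j, (x, y, z) in atoms:
        phi_j = 2 * math.pi * (h * x + k * y + l * z)
        total += f_j * cmath.exp(1j * phi_j)
    return total


# Assumed constant scattering factors and hypothetical fractional coordinates.
atoms = [
    (6.0, (0.40, 0.05, 0.10)),
    (8.0, (0.10, 0.25, 0.30)),
]

F_000 = structure_factor((0, 0, 0), atoms)
F_201 = structure_factor((2, 0, 1), atoms)

amplitude_201 = abs(F_201)
phase_201 = cmath.phase(F_201) % (2 * math.pi)

print(f"F(000) = {F_000.real:.1f} electrons")
print(f"|F(201)| = {amplitude_201:.3f}, phase = {math.degrees(phase_201):.1f} deg")
```

For the 000 reflection every atom has phase zero, so the contributions add in a straight line and F(000) is just the total number of electrons; for any other reflection the vectors point in different directions and the resultant is shorter.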

Electron Density
• Remember the electron density (image of the molecule) is the Fourier transform of the structure factor $F_{hkl}$. Thus

$\rho_{x,y,z} = \dfrac{1}{V}\sum_{hkl} F_{hkl}\, e^{-2\pi i[hx + ky + lz]} = \dfrac{1}{V}\sum_{hkl} F_{hkl}\, e^{-i\phi}$, where $\phi = 2\pi(hx + ky + lz)$

$e^{-i\phi} = \cos\phi - i\sin\phi$

$F_{hkl} = A_{hkl} + iB_{hkl}$

$\rho_{x,y,z} = \dfrac{1}{V}\sum_{hkl} \left(A_{hkl}\cos\phi + B_{hkl}\sin\phi\right)$

(The imaginary terms cancel between each reflection hkl and its Friedel mate, leaving a real electron density.)

$\rho_{x,y,z} = \dfrac{1}{V}\sum_{hkl} \left(A_{hkl}\cos[2\pi(hx + ky + lz)] + B_{hkl}\sin[2\pi(hx + ky + lz)]\right)$

Here V is the volume of the unit cell

How to calculate ρ(x,y,z)
• In practice, the electron density for one three-dimensional unit cell is calculated by starting at x, y, z = (0, 0, 0) and stepping incrementally along each axis, summing the terms as shown in the equation above for all hkl (as limited by the resolution of the data) at each point in space.
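The stepping procedure is easiest to see in one dimension. The following sketch is a toy under stated assumptions, not how production software does it (real programs use fast Fourier transforms on three-dimensional grids): it places two hypothetical point atoms in a 1D cell of unit length, computes $F_h$ with known phases for h from −15 to 15, and then sums the Fourier series at 100 evenly spaced points.

```python
import cmath
import math


def structure_factor_1d(h, atoms):
    """F_h for point atoms given as (f_j, x_j) pairs in a 1D cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)


def density_1d(x, factors, length=1.0):
    """rho(x) = (1/length) * sum_h F_h exp(-2*pi*i*h*x).

    The imaginary parts cancel between h and -h, so only the real part is kept.
    """
    total = 0j
    for h, F_h in factors.items():
        total += F_h * cmath.exp(-2j * math.pi * h * x)
    return total.real / length


atoms = [(6.0, 0.20), (8.0, 0.65)]  # hypothetical (electrons, fractional x)
h_max = 15                          # resolution limit of the toy data set
factors = {h: structure_factor_1d(h, atoms) for h in range(-h_max, h_max + 1)}

# Step along the cell from x = 0 and sum the series at every grid point.
n_steps = 100
density = [density_1d(i / n_steps, factors) for i in range(n_steps)]

peak_index = max(range(n_steps), key=lambda i: density[i])
print(f"highest density at x = {peak_index / n_steps:.2f}")
```

With correct phases the map peaks at the atom positions, and the heavier atom gives the higher peak. Lowering `h_max` broadens the peaks, which is the one-dimensional analogue of losing resolution.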

Solving the Phase Problem

• Small molecules
  – Direct Methods
  – Patterson Methods
  – Molecular Replacement
• Macromolecules
  – Multiple Isomorphous Replacement (MIR)
  – Multi-Wavelength Anomalous Dispersion (MAD)
  – Single Isomorphous Replacement (SIR)
  – Single-Wavelength Anomalous Scattering (SAS)
  – Molecular Replacement
  – Direct Methods (special cases)

Solving the Phase Problem

SMALL MOLECULES:
• The use of Direct Methods has essentially solved the phase problem for well diffracting small molecule crystals.

MACROMOLECULES:
• Today, anomalous scattering techniques such as MAD or SAS are the most common techniques used for de novo structure determination of macromolecules. Both techniques require the presence of one or more anomalous scatterers in the crystal.

Direct methods
• Karle, Hauptman, David Sayre, and others determined algebraic relationships among phase angles of groups of reflections.
• The simplest are triplet relationships: for three reflections $\mathbf{h}_1 = (h_1, k_1, l_1)$, $\mathbf{h}_2 = (h_2, k_2, l_2)$, $\mathbf{h}_3 = (h_3, k_3, l_3)$, they showed that if $\mathbf{h}_3 = -\mathbf{h}_1 - \mathbf{h}_2$, then
  $\Phi_1 + \Phi_2 + \Phi_3 \approx 0$
• Thus if $\Phi_1$ and $\Phi_2$ are known then we can estimate that $\Phi_3 \approx -\Phi_1 - \Phi_2$

[Photo: David Sayre]

When do triplet relations hold?

• Note the approximately zero value in that relationship $\Phi_1 + \Phi_2 + \Phi_3 \approx 0$.
• The stronger the Bragg reflections are, the closer this condition is to being exact.
• For very strong Bragg reflections that sum will be very close to zero
• For weaker ones it may differ significantly from zero
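The bookkeeping behind a triplet relationship is simple enough to sketch. The helper names and the example indices and phases below are assumptions for illustration; the code only encodes the index rule h3 = −h1 − h2 and the estimate Φ3 ≈ −Φ1 − Φ2 taken modulo 2π, and says nothing about how reliable the estimate is for a given set of reflection strengths.

```python
import math

TWO_PI = 2 * math.pi


def third_index(h1, h2):
    """Index of the reflection that completes a triplet: h3 = -h1 - h2."""
    return tuple(-(i + j) for i, j in zip(h1, h2))


def estimate_third_phase(phi1, phi2):
    """Triplet estimate phi3 = -phi1 - phi2, wrapped into [0, 2π)."""
    return (-phi1 - phi2) % TWO_PI


h1, h2 = (2, 1, 3), (1, -4, 0)          # hypothetical strong reflections
h3 = third_index(h1, h2)

phi1, phi2 = TWO_PI * 0.42, TWO_PI * 0.25  # hypothetical known phases
phi3 = estimate_third_phase(phi1, phi2)

print(f"h3 = {h3}, estimated phase = 2π * {phi3 / TWO_PI:.2f}")
```

Wrapping into the range 0 to 2π matters because the raw sum −Φ1 − Φ2 is negative here; −0.67 cycles and 0.33 cycles are the same phase.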

Phase probabilities
• This notion of relationships among phases obliges us to think of phases probabilistically rather than deterministically. This is a key to the direct-methods approach and has a huge influence on how we think about phase determination.
• I'm introducing all of this mostly to get you accustomed to the notion of phase probability distributions!

Phase probabilities

• Any phase has a value between 0 and 2π (or 0 and 360°, if we're using degrees)
• If we know it's close to 2π*0.42, then:
  – If it's 2π*(0.42 ± 0.01), it's a sharp phase probability distribution
  – If it's 2π*(0.42 ± 0.32), it's a much broader phase probability distribution

Plots of phase probability
• Integral of probability must be 1, since every phase has to have some value.

[Figure: P(φ) plotted against φ from 0 to 2π, showing a sharp distribution and a broad distribution.]

How can we use this?

• Obviously if we don't know $\phi_1 + \phi_2$, we can't use this to calculate $\phi_3$, even if the intensities of all three are large.
• But we could guess what $\phi_1$ and $\phi_2$ are and use this to compute $\phi_3$.
• Then we guess $\phi_4$ and use the triplet relationship to compute $\phi_5$ and $\phi_6$, where $\mathbf{h}_5 = -\mathbf{h}_1 - \mathbf{h}_4$ and $\mathbf{h}_6 = -\mathbf{h}_2 - \mathbf{h}_4$ … assuming that reflections 5 and 6 are strong, too!

Can we make this work?

• We start with guessed phases for 10-100 strong reflections and use the triplet relationships to determine the phases for another 1000 reflections
• Any particular calculated phase can be determined by several different triplet relationships, so if they're self-consistent, the initial guessed 10-100 are correct; if they aren't self-consistent, the guess was wrong!
• In the latter case, we try a different set of guesses for our 10-100 starting phases and keep going
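One simple way to put a number on "self-consistent" is to treat the several triplet estimates of one phase as unit vectors and look at the length of their average. The sketch below does that with an unweighted circular mean; this is an illustrative stand-in chosen for the example, not the figure of merit used by any particular direct-methods program, and the estimate values are made up.

```python
import cmath
import math

TWO_PI = 2 * math.pi


def combine_estimates(phases):
    """Circular mean of several estimates of one phase, plus a concordance score.

    The concordance is the length of the averaged unit vector: close to 1 when
    the estimates agree, close to 0 when they scatter around the circle.
    """
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    mean_phase = cmath.phase(resultant) % TWO_PI
    return mean_phase, abs(resultant)


# Hypothetical estimates of the same phase from three different triplets.
consistent = [TWO_PI * 0.40, TWO_PI * 0.42, TWO_PI * 0.44]
scattered = [TWO_PI * 0.10, TWO_PI * 0.45, TWO_PI * 0.80]

mean_good, concordance_good = combine_estimates(consistent)
mean_bad, concordance_bad = combine_estimates(scattered)

print(f"consistent: 2π * {mean_good / TWO_PI:.2f}, concordance {concordance_good:.3f}")
print(f"scattered:  concordance {concordance_bad:.3f}")
```

A starting set of guessed phases that leaves most derived phases with low concordance would be discarded in favor of a new set of guesses.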

This actually works, provided:

• The data are correctly measured
• The data are strong enough that we can pick 1000 strong reflections to use in this process
• The data extend to high enough resolution that atomicity (separable atoms) is really found
• There are ways to do direct methods without assuming atomicity, but they're more complicated

Is this relevant to macromolecules?

• Not directly:
  – Atomicity rarely present
  – Systematic errors in data
• Indirectly yes, because it can be used in conjunction with other methods for locating heavy atoms in the SIR, MIR, and SAS methods
• It also helps introduce the notion of phase probability distributions (sneaky!)

SIR and SAS Methods
1. Need a heavy atom (lots of electrons) or an anomalous scatterer (large anomalous scattering signal) in the crystal.
  • SIR - heavy atoms usually soaked in.
  • SAS - anomalous scatterers usually engineered in as selenomethionine labels. Can also be soaked.
2. SIR: collect a native and a derivative data set (2 sets total). SAS: collect one highly redundant data set and keep anomalous pairs separate during processing.
  • SAS - may want to choose a scatterer or wavelength that enhances the anomalous signal.
3. Must find the heavy atoms or anomalous scatterers
  • can use Patterson analysis or direct methods.
4. Must resolve the bimodal ambiguity.
  • use solvent flattening or similar technique

What's the bimodal ambiguity?

• As we'll show next time, a single isomorphous derivative or anomalous scatterer enables us to measure each phase apart from an ambiguity
• That is, for each phase we get two answers (e.g. 2π*0.12 and 2π*0.55), and we can't pick one out
• A second scatterer will resolve that

Phase probabilities with no error
• A single derivative with no error gives a phase probability like this:

[Figure: P(φ) plotted against φ from 0 to 2π.]

2 derivatives, no error
• The two distributions overlap at the correct answer, not at the wrong answer

[Figure: P(φ) plotted against φ from 0 to 2π, marking the correct phase, the wrong estimate derived from derivative 1 and the wrong estimate derived from derivative 2.]

Errors spread this out

• Each phase estimate is not really that sharp
• Lack of isomorphism (see below) makes each distribution spread out
• Joint probability distribution from 2 or more experiments is the product of the probability distributions of the individual experiments

Realistic probability distributions
• Joint probability distribution = product of individual ones

[Figure: P(φ) plotted against φ from 0 to 2π.]

Joint probability distribution

[Figure: "Phase probability" plot of P(phase), from 0 to 0.35, against phase/2π, from 0 to 1, with curves norm(P1), norm(P2) and norm(P1*P2).]

P1(φ) for first derivative, with peaks at 0.32 and 0.558
P2(φ) for 2nd derivative, with peaks at 0.315 and 0.815
Joint probability distribution = P1(φ) * P2(φ)
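The way a product of two bimodal distributions picks out the shared peak can be reproduced with a short calculation. The sketch below uses the peak positions quoted for the plot (0.32 and 0.558 for the first derivative, 0.315 and 0.815 for the second, in units of phase/2π). Everything else is an assumption made for illustration: each peak is modeled as a circular bell shape exp(κ cos Δφ) with κ = 20, and the distributions are sampled on a 1000-point grid; the real curves come from the measured data and their error estimates.

```python
import math


def bimodal(x, peaks, kappa=20.0):
    """Unnormalized probability at x = phase/2π with a bell-shaped peak at each entry of peaks."""
    return sum(math.exp(kappa * math.cos(2 * math.pi * (x - p))) for p in peaks)


def normalize(values, step):
    """Scale sampled values so that their integral over the grid is 1."""
    area = sum(values) * step
    return [v / area for v in values]


n = 1000
step = 1.0 / n                      # grid spacing in units of phase/2π
grid = [i * step for i in range(n)]

p1 = normalize([bimodal(x, (0.32, 0.558)) for x in grid], step)
p2 = normalize([bimodal(x, (0.315, 0.815)) for x in grid], step)

# Joint distribution: pointwise product, renormalized to unit integral.
joint = normalize([u * v for u, v in zip(p1, p2)], step)

best = grid[max(range(n), key=lambda i: joint[i])]
print(f"most probable phase = 2π * {best:.3f}")
```

The joint distribution is large only where both individual distributions are large, so the peaks near 0.558 and 0.815 are suppressed and a single peak survives between 0.315 and 0.32.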

Heavy Atom Derivatives

Heavy atom derivatives MUST be isomorphous

• Heavy atom derivatives are generally prepared by soaking crystals in dilute (2 - 20 mM) solutions of heavy atom salts (see Table II below for some examples).
• Crystal cracking is generally a good indication that the heavy atom is interacting with the crystal lattice, and suggests that a good derivative can be obtained by soaking the crystal in a more dilute solution.

Is the derivative worth using?

• Once derivative data has been collected, the merging R factor ($R_{merge}$) between the native and derivative data sets can be used to check for heavy atom incorporation and isomorphism. $R_{merge}$ values for isomorphous derivatives range from 0.05 to 0.15. Values below 0.05 indicate that there is little heavy atom incorporation. Values above 0.15 indicate a lack of isomorphism between the two crystals.
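The thresholds above lend themselves to a quick screening function. The slide does not spell out the formula for the R factor between native and derivative data, so the sketch below assumes a common form, the sum of absolute amplitude differences divided by the sum of native amplitudes over the reflections present in both sets, and it assumes the two data sets have already been placed on a common scale. The function names and the amplitudes are hypothetical.

```python
def r_between(native, derivative):
    """Assumed form: sum | |F_deriv| - |F_native| | / sum |F_native| over common reflections.

    Both arguments map (h, k, l) tuples to scaled structure factor amplitudes.
    """
    common = native.keys() & derivative.keys()
    numerator = sum(abs(derivative[hkl] - native[hkl]) for hkl in common)
    denominator = sum(native[hkl] for hkl in common)
    return numerator / denominator


def assess(r_value):
    """Classify a derivative using the 0.05 and 0.15 thresholds quoted above."""
    if r_value < 0.05:
        return "little heavy atom incorporation"
    if r_value > 0.15:
        return "lack of isomorphism"
    return "usable isomorphous derivative"


# Hypothetical, already-scaled amplitudes for four reflections.
native = {(1, 0, 0): 100.0, (0, 1, 0): 80.0, (0, 0, 1): 60.0, (1, 1, 0): 40.0}
derivative = {(1, 0, 0): 110.0, (0, 1, 0): 72.0, (0, 0, 1): 66.0, (1, 1, 0): 36.0}

r_value = r_between(native, derivative)
print(f"R = {r_value:.3f}: {assess(r_value)}")
```

In practice the comparison would run over thousands of reflections, and a change in unit cell dimensions would be checked alongside the R value.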

What is isomorphism?

• Isomorphism for derivatives means that the structure of the derivatized macromolecule is identical to the structure of the underivatized molecule except at the site where the derivative compound has been introduced.

What is lack of isomorphism?

• A derivative may be nonisomorphous if:
  – It alters the unit cell lengths or angles significantly (>0.2%?)
  – It rotates or translates the entire macromolecule within the unit cell
  – It alters significantly the conformation of a large segment (> 8 amino acids or 4 nucleotides?) of the macromolecule

Derivative compounds

Table II. Protein Residues and Their Affinities for Heavy Metals

Residue                    Affinity for                      Conditions
Histidine                  K2PtCl4, NaAuCl4, EtHgPO4H2       pH>6
Tryptophan                 Hg(OAc)2, EtHgPO4H2
Glutamic, Aspartic Acids   UO2(NO3)2, rare earth cations     pH>5
Cysteine                   Hg, Ir, Pt, Pd, Au cations        pH>7
Methionine                 PtCl4^2- anion

From Glusker, Lewis and Rossi

Finding the Heavy Atoms or Anomalous Scatterers

The Patterson function
– an $|F|^2$ Fourier transform with φ = 0
– vector map (u,v,w instead of x,y,z)
– maps all inter-atomic vectors
– get $N^2$ vectors!! (where N = number of atoms)

$P_{uvw} = \dfrac{1}{V}\sum_{hkl} |F_{hkl}|^2 \cos 2\pi(hu + kv + lw)$
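A one-dimensional toy shows why the Patterson function is a map of interatomic vectors. The sketch below makes several simplifying assumptions: two equal point atoms at hypothetical positions x = 0.10 and x = 0.40 in a cell of unit length, reflections limited to h from −10 to 10, and intensities computed from the model rather than measured. No phases are used anywhere in the Patterson sum.

```python
import cmath
import math


def patterson_1d(u, intensities, length=1.0):
    """P(u) = (1/length) * sum_h |F_h|^2 cos(2*pi*h*u); needs no phase information."""
    return sum(i_h * math.cos(2 * math.pi * h * u) for h, i_h in intensities.items()) / length


atoms = [0.10, 0.40]  # hypothetical positions of two equal point atoms
h_max = 10

# Intensities |F_h|^2 are all that the diffraction experiment provides.
intensities = {}
for h in range(-h_max, h_max + 1):
    f_h = sum(cmath.exp(2j * math.pi * h * x) for x in atoms)
    intensities[h] = abs(f_h) ** 2

p_origin = patterson_1d(0.00, intensities)   # every atom to itself
p_vector = patterson_1d(0.30, intensities)   # vector from atom 1 to atom 2
p_reverse = patterson_1d(0.70, intensities)  # vector from atom 2 to atom 1 (-0.30)
p_empty = patterson_1d(0.50, intensities)    # no interatomic vector here

print(p_origin, p_vector, p_reverse, p_empty)
```

Two atoms give N² = 4 vectors: two of them pile up at the origin, and the other two appear at u = 0.30 and u = 0.70 (that is, −0.30). With hundreds of atoms the N² peaks overlap badly, which is why the method is applied to locating a few heavy atoms or anomalous scatterers.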