Recommendations for the presentation of NMR structures of ...determine the structures of peptides,...

23
Journal of Biomolecular NMR, 12: 1–23, 1998. KLUWER/ESCOM © 1998 Kluwer Academic Publishers. Printed in Belgium. 1 Recommendations for the presentation of NMR structures of proteins and nucleic acids IUPAC-IUBMB-IUPAB Inter-Union Task Group on the Standardization of Data Bases of Protein and Nucleic Acid Structures Determined by NMR Spectroscopy John L. Markley a , Ad Bax b , Yoji Arata c , C. W. Hilbers d , Robert Kaptein e , Brian D. Sykes f , Peter E. Wright g and Kurt Wüthrich h,* a Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706-1569, U.S.A.; b Laboratory of Chemical Physics, NIDDK, National Institutes of Health, Bethesda, MD 20892-0520, U.S.A.; c Water Research Institute, Tsukuba, Japan; d Laboratory of Biophysical Chemistry, University of Nijmegen, 6525 ED Nijmegen, The Netherlands; e Department of Chemistry, University of Utrecht, 3584 CH Utrecht, The Netherlands; f Department of Biochemistry, University of Alberta, Edmonton, Alberta, Canada T6G 2H7; g Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, U.S.A.; h Institut für Molekularbiologie und Biophysik, ETH- Hönggerberg, CH-8093 Zürich, Switzerland Abstract The recommendations presented here are designed to support easier communication of NMR data and NMR struc- tures of proteins and nucleic acids through unified nomenclature and reporting standards. Much of this document pertains to the reporting of data in journal articles; however, in the interest of the future development of structural biology, it is desirable that the bulk of the reported information be stored in computer-accessible form and be freely accessible to the scientific community in standardized formats for data exchange. These recommendations stem from an IUPAC-IUBMB-IUPAB inter-union venture with the direct involvement of ICSU and CODATA. The Task Group has reviewed previous formal recommendations and has extended them in the light of more recent developments in the field of biomolecular NMR spectroscopy. Drafts of the recommendations presented here have been examined critically by more than 50 specialists in the field and have gone through two rounds of extensive modification to incorporate suggestions and criticisms. Table of contents Introduction 3 1. Definition of the system studied 3 1.1. Names of the molecules used 3 1.2. Source 3 1.3. Evidence for homogeneity and chemical identity 3 1.4. Sequence or chemical structure 3 1.5. Solution conditions 3 1.6. Isotope-labeled proteins and nucleic acids 4 2. Atom identifiers for reporting chemical shift assignments 4 * Convenor of the Task Group to whom correspondence should be addressed.

Transcript of Recommendations for the presentation of NMR structures of ...determine the structures of peptides,...

Page 1: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

Journal of Biomolecular NMR, 12: 1–23, 1998.KLUWER/ESCOM© 1998Kluwer Academic Publishers. Printed in Belgium.

1

Recommendations for the presentation of NMR structures of proteins andnucleic acidsIUPAC-IUBMB-IUPAB Inter-Union Task Group on the Standardization of Data Bases ofProtein and Nucleic Acid Structures Determined by NMR Spectroscopy

John L. Markleya, Ad Baxb, Yoji Aratac, C. W. Hilbersd, Robert Kapteine, Brian D. Sykesf,Peter E. Wrightg and Kurt Wüthrichh,∗aDepartment of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706-1569, U.S.A.;bLaboratoryof Chemical Physics, NIDDK, National Institutes of Health, Bethesda, MD 20892-0520, U.S.A.;cWater ResearchInstitute, Tsukuba, Japan;dLaboratory of Biophysical Chemistry, University of Nijmegen, 6525 ED Nijmegen, TheNetherlands;eDepartment of Chemistry, University of Utrecht, 3584 CH Utrecht, The Netherlands;fDepartmentof Biochemistry, University of Alberta, Edmonton, Alberta, Canada T6G 2H7;gDepartment of Molecular Biology,The Scripps Research Institute, La Jolla, CA 92037, U.S.A.;hInstitut für Molekularbiologie und Biophysik, ETH-Hönggerberg, CH-8093 Zürich, Switzerland

Abstract

The recommendations presented here are designed to support easier communication of NMR data and NMR struc-tures of proteins and nucleic acids through unified nomenclature and reporting standards. Much of this documentpertains to the reporting of data in journal articles; however, in the interest of the future development of structuralbiology, it is desirable that the bulk of the reported information be stored in computer-accessible form and befreely accessible to the scientific community in standardized formats for data exchange. These recommendationsstem from an IUPAC-IUBMB-IUPAB inter-union venture with the direct involvement of ICSU and CODATA. TheTask Group has reviewed previous formal recommendations and has extended them in the light of more recentdevelopments in the field of biomolecular NMR spectroscopy. Drafts of the recommendations presented here havebeen examined critically by more than 50 specialists in the field and have gone through two rounds of extensivemodification to incorporate suggestions and criticisms.

Table of contents

Introduction 31. Definition of the system studied 3

1.1. Names of the molecules used 31.2. Source 31.3. Evidence for homogeneity and chemical identity 31.4. Sequence or chemical structure 31.5. Solution conditions 31.6. Isotope-labeled proteins and nucleic acids 4

2. Atom identifiers for reporting chemical shift assignments 4

∗ Convenor of the Task Group to whom correspondence shouldbe addressed.

Page 2: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

2

2.1. Proteins and peptides 42.1.1. Standard nomenclature 42.1.2. Notation for assignment ambiguities 5

2.1.2.1. Non-stereochemical ambiguity 52.1.2.2. Stereochemical ambiguity involving prochiral centers 62.1.2.3. Other stereochemical ambiguity 6

2.2. Nucleic acids 62.2.1. Standard nomenclature 62.2.2. Notation for assignment ambiguities 6

3. Additional atom identifiers for reporting conformational constraints and structures 63.1. Pseudoatom nomenclature 63.2. NOE constraints that can be identified by chemical shifts 7

4. Nomenclature for specifying conformation 74.1. Polypeptides 9

4.1.1. Backbone torsion angles:φi , ψi , andωi 9

4.1.2. Side-chain torsion angles:χji 9

4.1.3. Specification of turns, disulfide bonds, and proline rings 94.2. Nucleic acids 9

4.2.1. Backbone and glycosidic torsion angles 94.2.2. Sugar pucker: torsion angles and angle of pseudorotation 94.2.3. Value ranges for torsion angles specifying commonly observed conformations 10

5. NMR data acquisition, processing, and referencing 105.1. Data acquisition 105.2. Data processing 115.3. Referencing of chemical shifts 12

5.3.1.1H chemical shifts 135.3.2. Chemical shifts of other nuclei such as2H, 13C, 15N, and31P 13

6. Resonance assignments 136.1. Sequence-specific assignment of polypeptide chains 13

6.1.1. Sequential backbone assignment 136.1.2. Amino acid side chain assignment 14

6.2. Sequence-specific assignment of nucleic acids 136.3. Stereospecific assignment of diastereotopic substituents 146.4. Conformational equilibria 14

7. Conformational features derived from diagnostic NMR parameters 147.1. Polypeptide secondary structure 157.2. Polynucleotides 15

7.2.1. Mononucleosides 157.2.2. Backbone torsion angles 167.2.3. Secondary structure 16

8. Collection of input constraints and structure calculation 168.1. Conformational constraints 17

8.1.1. Distance constraints from NOE data 178.1.2. Torsion angle constraints from J-coupling data 178.1.3. Torsion angle constraints from chemical shifts 178.1.4. Constraints representing disulfide bonds, hydrogen bonding, and metal coordination 178.1.5. Supplementary constraints for structure refinement 178.1.6. Input for the final structure calculation 17

8.2. Methods used for structure calculation and refinement 188.2.1. Structure calculation 188.2.2. Preliminary structures to assign additional constraints 188.2.3. Structure refinement 18

9. Reporting three-dimensional structures 189.1. Presentation of structures 189.2. Agreement of structures with constraints 189.3. Precision of structures 199.4. Validation of structures 19

10. Data bank deposition of NMR structures and supporting data 19Conclusion 20Acknowledgement 20Editorial comments 20References and notes 21

Page 3: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

3

Introduction

Solution-state NMR spectroscopy is used widely todetermine the structures of peptides, proteins, andprotein–ligand complexes, as well as those of nucleicacids and their complexes with proteins, drugs, andother molecules. As the field has developed, a certainconsensus has evolved on the presentation of NMRsolution structures. This has been helped indirectly byguidelines established for depositing primary experi-mental data and resulting structures in data banks suchas the Protein Data Bank (PDB; ref. 1), BioMagRes-Bank (BMRB; ref. 2), Nucleic Acid Database (ref.3), and by conventions used for inclusion in abstract-ing services, for example,Macromolecular Structures(ref. 4). In consideration of the accumulated expe-rience over the past few years in presenting NMRstructures, the time appeared to be appropriate for aformal examination of reporting conventions used inthe past and for the development of a set of generallyaccepted guidelines for the future. With these goalsin mind, the present Task Group was convened as anIUPAC/IUBMB/IUPAB Inter-Union venture with fi-nancial support from ICSU and CODATA. The presentrecommendations build upon earlier recommendationsfor biochemical nomenclature (ref. 5), the presentationof proton (ref. 6) and non-proton (ref. 7) NMR datafor publication, and parameters and symbols for use inNMR (ref. 8).

1. Definition of the system studied

1.1. Names of the molecules used

It is helpful to use a notation that specifies the mole-cule type, natural species of origin, and fragment(if applicable): for example, ‘fragment consisting ofresidues 10–218 of murine Blk, a B-cell specific pro-tein tyrosine kinase (PTC)’. Note that recommenda-tions for the naming of a variety of biological macro-molecules have been published (refs. 9–14). Authorsare encouraged to include identifiers used by publiclyaccessible databases (ref. 15), as these may be help-ful for cross-referencing purposes. Such identifiersinclude the accession codes used by the Protein Iden-tification Resource (ref. 16), SWISS-PROT (ref. 17),the Enzyme Commission of IUBMB (ref. 18), and theChemical Abstracts Service (ref. 19).

1.2. Source

• Genus, species, strain or variant of gene (cloned orsynthetic).

• Expression vector and host.• If chemically synthesized, a description of the

methods used.

1.3. Evidence for homogeneity and chemicalidentity

• Confirmation of chemical identity (e.g., from massspectrometry or chemical sequencing).

• Analytical method used to establish chemical ho-mogeneity.

1.4. Sequence or chemical structure

• Full sequence of the molecule (or reference to it).• Unambiguous definition of the sequence numera-

tion of the molecule studied.• Description of additional covalent linkages and

their locations: e.g., disulfide bridges, covalentlyattached cofactors, or metal ions.

• Description of any differences between the systemstudied and the naturally occurring biomolecule(e.g., fragment with additional or fewer residues ateither end of the biopolymer, post-transcriptionalmodifications in nucleic acids or post-translationalmodifications in proteins, or lack thereof).

• Specification of noncovalent cofactors or pros-thetic groups.

• Quaternary structure (number and kind of sub-units; symmetry, if known).

1.5. Solution conditions

• Specification of the solvent constituents and theirisotopic compositions.

• Concentration of each solute component, includ-ing buffers, salts, antibacterial agents, etc.

• Temperature and pressure along with methodsused for their measurement.

• Value of the pH, or pH∗ for uncorrected pH meterreadings in2H2O. Note that p2H (or ‘pD’) appliesonly to measurements made with electrodes filledwith 2H2O.

• Comment on special measures taken to minimizeself-aggregation (if applicable).

• Type of sample cell used.

Page 4: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

4

1.6. Isotope-labeled proteins and nucleic acids

Essential information would include descriptions ofthe methods used to prepare the labeled moleculesand to determine the positions and levels of isotopiclabeling in the system used. Percent incorporationshould be indicated when known, as illustrated in thefollowing examples.• Uniform (random) labeling with15N (extent of

incorporation unknown): [U-15N]-ribonuclease.• Uniform (random) labeling with13C, 15N, and2H

(labeling levels known): [U-98%13C; U-90%15N;U-65%2H]-d(GCGCAATTGCGC).

• Residue-selective labeling: [98%15N]-Cys rubre-doxin.

• Site-specific labeling: [95%13Cα]-Trp28 lysozyme.• Insertion of residues with natural isotopic com-

position into otherwise uniformly labeled proteinsor nucleic acids is designated byNA for naturalabundance: [U-98%13C; NA-F,Y,W]-lysozyme.

2. Atom identifiers for reporting chemical shiftassignments

The 1983 IUPAC-IUB recommendations for peptidesand proteins (ref. 21) were never widely adopted bythe protein science community. Thus, the present rec-ommendations for peptides and proteins follow theearlier IUPAC-IUB ‘tentative rules’ of 1969 (ref. 20)currently employed by biomolecular databases, withextensions and clarifications as required for NMRdata. These rules designate all atoms by Greek letters(or Roman counterparts) and employ the main-chainprecedence rule for numbering prochiral sites. Thepresent recommendations for nucleic acids follow theIUPAC-IUB nomenclature (ref. 22), with three ex-ceptions. First, notation for the hydrogens at the C5′position in ribose and deoxyribose follows the wide-spread use of H5′ (for pro-S) and H5′′ (for pro-R),rather than H5′1 and H5′2 (the latter notation (ref. 22)is inconsistent with the numbering convention usedwith amino acids, in which such atoms would be des-ignated as 2 and 3). Second, H2′ (for pro-S) and H2′′(for pro-R) are used to designate the hydrogens at theC2′-position in deoxyribose. Third, the notation pro-posed for hydroxyl groups of ribose and deoxyriboserings in ref. 22 is adapted from that used with aminoacids, so that the hydrogen atoms are denoted by HO2′ ,HO3′ , and HO5′ , as appropriate, whereas the hydroxyloxygens are denoted by O5′ , etc.

In cases not covered by the present recommendations(e.g., unusual or modified amino acid residues or nu-cleic acid bases), authors are encouraged to refer to no-tation in use by the databases for such groups. If newatom notation needs to be introduced, it will be help-ful if authors define this by providing stereospecificdiagrams as appropriate.

2.1. Proteins and peptides

2.1.1. Standard nomenclatureFigure 1 defines the atom-naming conventions rec-ommended here for proteins. Coordinate files shouldcontain all atoms, including hydrogen atoms. The1969 recommendations (ref. 20) did not deal explicitlywith nomenclature for hydrogen atoms, and the rulesare subject to different interpretations. The interpre-tation recommended here (Figure 1) is that at any posi-tion of the side chain, the attached atom leading to themain chain always has the highest priority. For cyclicamino acids such as proline, the priority leads out fromthe main chain Cα atom rather than from the N atom(i.e., the priority is given by Cα > Cβ > Cγ > . . .>N). In the peptide backbone, the nitrogen is denotedby N, its attached hydrogen by H, and the carbonylcarbon and oxygen by C and O, respectively (ref. 20).In cases where the single letter designations are am-biguous, they may be primed (e.g., N′, C′, O′) (ref.20); C′ is used quite widely to denote the peptide back-bone carbonyl carbon. Although H alone is used bythe biomolecular data banks (PDB and BMRB), it isrecommended here that the symbol HN (in widespreaduse in the NMR community) be used as the unambigu-ous designator for the backbone amide hydrogen. Thehydrogens of an N-terminal amine are designated asH1, H2, and H3 (when protonated) or H1 and H2 (whenunprotonated). The oxygens of a C-terminal carboxylor carboxylate are O′ and O′′, and the hydrogen of thecarboxyl is H′′. IUPAC rules (ref. 18) label the hydro-gens on methyl groups and protonated amines by ‘1’,‘2’, and ‘3’, in the conventional way. Thus, for exam-ple, the atom designations in amino acid side chainswould be: Hβ1, Hβ2, and Hβ3 for the three methylhydrogens of alanine; Hζ1, Hζ2, and Hζ3 for the threeamine hydrogens of a protonated lysine; and Hζ1 andHζ2 for the two amine hydrogens of a neutral lysine.

In situations where Greek letters are unavailable(for example, on some computer displays), they maybe replaced by upper case Roman letters (α = A, β =B, γ = G, δ = D, ε = E, ζ = Z, η = H). When

Page 5: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

5

Gly

Hα2(R)

(R) (R)

(E)(Z)

(Z)

(E)

(Z)

(E)+

(R)

(R)

(R)

(R) (R)

(R)

(R) (R) (R)

(R)(R)

(R)

(R)

(R)

(R)

(R) (R)

(R)

(Z)

(E)

(R) (R)

(R)

(R)

(R)

(R)(R)

(R)

– —

—––

– —

—––

Hα3Cα

N

H

C

O

Pro

– –

Hβ2

Hβ3Cγ

Hδ2

Hδ3

Hγ3Hγ2

N

C

O

Arg

Hβ3

Hβ2

Hγ2

Hγ3

Cδ Nε

Hδ3Hε

CζHδ2

Ala

Hα –

Asn

Cβ Cγ

Hβ3Oδ1

Nδ2

Hδ21

Hδ22Hβ2

––

– –

Gln

Cγ Cδ

Hγ2Oε1

Nε2

Hε21

Hε22Hγ3

Hβ3

Hβ2

IIe

Cγ1 Cδ1(Hδ1)3

Hγ12

Hγ13

Cγ2

– –

– –

Asp

Cβ Cγ

Hβ3Oδ1

Oδ2 Hδ2Hβ2 –

Nη2–

Hη21

Hη22

(Hγ2)3

Leu

Cγ Hγ

Cδ1(Hδ1)3

Cδ2(Hδ2)3

Hβ3

Hβ2

Thr

Cγ2(Hγ2)3Cβ

Oγ1 –– Hγ1

Trp

CγCβ

Hβ3

Hβ2 Cδ1

Lys

Cγ Cδ

Hγ2

Hγ3

Hβ3

Hβ2

Cε Nζ(Hζ)3

+Hε2

Hε3

Hδ3

Hδ2

Met

Cγ Sδ –– Cε(Hε)3

Hγ2

Hγ3

Hβ3

Hβ2

Phe

CγCβ

Hβ3

Hβ2

Cδ1 ––

Cδ2

Cζ –– HζCε1

–– ––

Hδ2

Hδ1

Hε2

Hε1

Ser

Oγ –– HγCβ

Hβ3

Hβ2

Hη12Hη11– –

Nδ1Hε1

Hε2Hδ2

Hδ1

Cε1+

Nε2C

δ2

––

––

–––

––

Cβ(Hβ)3Cα

N

H

C

O

Glu

Cγ Cδ

Hγ2Oε1

Oε2 Hε2Hγ3

Hβ3

Hβ2 ––

Cys

Cβ Sγ – Hγ

Hβ3

Hβ2

His

Cβ Cγ

Hβ3

Hβ2

Cε2

Tyr

CγCβ

Hβ3

Hβ2

Cδ1 –

Cζ –– Oη –– HηCε1

– –

Hδ2

Hδ1

Hε2

Hε1Val

HβCβ

Cγ2(Hγ2)3

Cγ1(Hγ1)3

––

––

–––

– – –

–––

––

––

Cε2 –

Cε2

Cδ2

Hε3 Hζ3

Hη2

Hζ2

Hε1Hδ1

Cε3 Cζ3

Cη2

Cζ2

Nε1

Nη1

–––

(Z)

(E)

Cδ2

Figure 1. Recommended atom identifiers for the 20 common amino acids follow the 1969 IUPAC-IUB guidelines (ref. 20). Backbone atomsare shown for Pro, Gly, and Ala but not for the other L-amino acids (where they correspond to those bounded by the dashed line in the Alastructure). Greek letters are used as atom identifiers. The Cα or the substituent closer to Cα (in the order Cα, Cβ, Cγ, . . .) takes precedenceover atoms in branches in defining stereochemical relationships. For example, if tetrahedral carbonC has four substituentsX, Y, Z, andZ′ (withpriority X > Y > Z = Z′; i.e., Z and Z′ are diastereotopic substituents designated provisionally as unprimed and primed), their numberingis derived as follows: if one sights down the X—C axis (with the X atom toward the viewer), the equivalent atoms,Z andZ′, are designatedZ2 andZ3, such that Y,Z2, andZ3 follow a clockwise orientation. The side-chain-NH2 nitrogens of Arg are designated as Nη1 and Nη2 bytheir relationship (cis or trans, respectively) to Cδ. The hydrogen atoms of the side-chain -NH2 groups of Asn, Gln, and Arg are distinguishedby numbers (1 or 2) on the basis of their relationship (cis or trans, respectively) to the heavy atom three bonds closer to the main chain (Cβ forAsn, Cγ for Gln, Nε for Arg). Thus, each -NH2 hydrogen of Arg is distinguished by two numbers, the first indicating the nitrogen to which itis attached and the second indicating the stereochemistry of the hydrogen itself. Numbering of Phe and Tyr rings gives higher priority to theatom with the smaller absolute value of theχ2 torsion angle (ref. 20). For example, the ring carbons of Phe and Tyr lying in the plane with thesmallerχ2 torsion angle are designated as Cδ1 and Cε1. Indicated for reference in parentheses are thepro-R/pro-Sdesignations for prochiraltetrahedral groups (with only thepro-R indicated as ‘R’) (refs. 23 and 24) and theE/Z designations for planar groups (refs. 27 and 28).

superscripts are not feasible, the full descriptor can beplaced on a single line. When a full descriptor is usedin a subscript, its components are displayed on a sin-gle line: e.g.,3JH′Hα for a coupling constant ordHαHβ

(commonly abbreviateddαβ) for a distance detectedby a nuclear Overhauser effect (NOE). The computerrepresentation for′ is a single quote and for′′ is twosingle quote symbols.

2.1.2. Notation for assignment ambiguitiesIt is important to distinguish clearly between res-onances that have complete stereochemical assign-ments and those that do not. Resonances with com-

plete stereospecific assignments are denoted by thesymbols shown in Figure 1. Resonances having am-biguous assignments are denoted by special sym-bols, as defined below. Knowledge about particularkinds of assignment ambiguity (stereospecific andnon-stereospecific) can be important in structure de-terminations. Such information defines, for example,what kind of pseudoatoms (see 3.1. below) need to beinvoked in a structure determination.

2.1.2.1. Non-stereochemical ambiguityA slash (/) should be used to indicate ambiguity inassignments. The slash is used to join the ambigu-

Page 6: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

6

ous atom designators and chemical shifts. For exam-ple, four chemical shifts (6.84, 6.90, 7.35, and 7.42ppm) associated ambiguously with four hydrogens ofa given tryptophan ring (Hε3, Hζ2, Hζ3, Hη2) are repre-sented as: Hε3/ζ2/ζ3/η2 = 6.84/6.90/7.35/7.42 ppm.A single resonance at 4.30 ppm that is known to arisefrom Hα of either Thr 10 or Thr 28 would be indicatedas T10 Hα/T28 Hα = 4.30 ppm. Because signals fromindividual hydrogens on methyl groups and aminesare ordinarily not resolved by NMR spectroscopy, thesignals could be represented by the slash rule (for ex-ample, Hβ1/Hβ2/Hβ3, for the hydrogens of an alaninemethyl group), but it is simpler when designating spec-tral features just to omit the numeration (Hβ, for thealanine methyl).

2.1.2.2. Stereochemical ambiguity involvingprochiral centers

The nomenclature described in 2.1.2.1. also appliesto situations in which two diastereotopic substituentshave not been assigned stereospecifically. For ex-ample, resonances at 2.44 ppm and 3.12 ppm thatwere ambiguously assigned to the Hβ2 and Hβ3

atoms of a given residue are reported as Hβ2/β3 =3.12/2.44 ppm. A single peak at 2.44 ppm that couldbe assigned to Hβ2, Hβ3, or both is reported asHβ2/β3 = 2.44 ppm. If it has been established thatboth have the same chemical shift, they are reportedas: Hβ2 = 2.44 ppm, Hβ3 = 2.44 ppm.

2.1.2.3. Other stereochemical ambiguityWhen flips of the symmetrical ring of a phenylala-nine or tyrosine are slow on the chemical shift timescale, signals from theδ- andε-atoms can have non-equivalent shifts. In such cases, information aboutassignments relative to theχ2 torsion angle is neededto define the specific atom designators 1 and 2. Thisinformation often is not known in advance of a fullstructure determination, so it is useful to specify thislevel of ambiguity again with the slash rule. For ex-ample, if proton signals at 6.60, 6.95, 7.12, and7.22 ppm have been identified with a particular, slowlyrotating, tyrosine ring at position 12, with no furtherinformation, they are reported as Y12 Hδ1/ε1/δ2/ε2 =7.12/6.95/7.22/6.60ppm. If, in addition to the above,it is known that the signals at 7.12 and 7.22 ppmarise fromδ-hydrogens, then the signals are reportedas Y12 Hδ1/δ2 = 7.12/7.22 ppm and Y12 Hε1/ε2 =6.60/6.95 ppm. If it has been further determined thatthe signals at 6.95 and 7.12 ppm correspond to hydro-gens on the same side of the ring, they are reported as

Hδx = 7.22 ppm, Hδy = 7.12 ppm, Hεx = 6.60 ppm,Hεy = 6.95 ppm.

In cases where the signals from the side-chainamide hydrogens of Asn or Gln, or the guanidinoNH2 nitrogens or hydrogens of Arg are known tobe coincident, the same chemical shift is assigned toboth atom designators. In cases where information forunambiguous, individual assignments is lacking, theambiguity is represented by the ‘slash rule’. For ex-ample, ambiguous assignments would be indicated byGln Hε21/ε22, Arg Hη11/η12, or Arg Hη11/η12/η21/η22.

2.2. Nucleic acids

2.2.1. Standard nomenclatureFigure 2 shows the recommended designators for theatoms of ribose, deoxyribose, and the common bases(ref. 22). The IUPAC-IUB notation for the methylgroup of T is C7(H7)3. The phosphorus-bound oxy-gens are named O3′, O5′, OP1, and OP2. In the CIPrules (Cahn–Ingold–Prelog, refs. 25 and 26), OP1corresponds topro-Rand OP2 topro-S.

2.2.2. Notation for assignment ambiguitiesThe slash rule is used to designate diastereotopicsubstituents of prochiral centers that have not been as-signed stereospecifically. For example, if H5′ and H5′′of a given ribose are identified at 4.12 ppm and 4.53ppm but have not been assigned individually, they aredesignated as H5′/H5′′ = 4.53/4.12 ppm.

3. Additional atom identifiers for reportingconformational constraints and structures

Structural constraints involving atoms for which theNMR signals have not been assigned individually arefrequently employed in the determination of macro-molecular structures from NMR data. In such cases,special symbols are used to identify the structural con-straints. These may take the form of pseudoatoms thatrepresent groups of atoms, or of symbols that linkspecific constraints with NMR chemical shifts.

3.1. Pseudoatom nomenclature

NOE constraints to groups of atoms for which the res-onances have not been assigned individually are com-monly specified by pseudo-structures in which certainatoms have been joined together into pseudoatoms.Relevant pseudoatoms for the standard amino acid

Page 7: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

7

ribose

A G

H4'

H5''(R)

(R)

(R)H5'

O4'

base H2'

H1' O2'-R

H3'

O3'-R

R-O5'

C1' C2'

C3'

2' deoxyribose

H4'

H5''H5'

O4'

base H2'

H1' H2''

H3'

O3'-R

R-O5'

C1' C2'

C3'

––

––

––

–C6

N6

H62H61

N3

N1 C5N7

N9

C8 H8C2 C4

––––

–––

––

–––

–H2

––

––

–C6

O6

N3

H1

N2

H22

N1 C5N7

N9

C8 H8C2H21 C4

–– ––

–– ––

–– ––

C––

C4

N4

H42H41

N1

H5

H6O2

N3 C5

C2 C6

––––

––

––

––

– –– ––

U

C4

O4

N1

H5H3

H6O2

N3 C5

C2 C6

–––

––

––

–– ––

– –– –– ––

T

C4

O4

N1

H3

H6O2

N3 C5

C2 C6

–––

––

––

–– ––

– –– –– ––

––

––

––

––

C7(H7)3

Figure 2. Nomenclature, structures, and atom numbering for thesugars and the bases contained in common nucleotides (ref. 22).The identifiers shown here for the hydrogen atoms largely followprevious recommendations as discussed in section 2 of the text. ‘R’indicatespro-R.

residues have been defined (ref. 30) and are in com-mon use in the NMR literature. An NOE to one or allprotons in the group is measured relative to a locationcentral to the group (ref. 31). All pseudoatoms neededto describe proteins and nucleic acids can be specifiedin terms of two letters, Q and M, in conjunction withthe usual atom identifiers (Roman letters are used in-stead of Greek letters for atom positions within aminoacids, and R is used to denote a ring). M describesthe location of methyl groups, and Q is used in allother situations. Table 1 lists all pseudoatoms in thecommon amino acids and nucleotides.

3.2. NOE constraints that can be identified bychemical shifts

NOEs used for structural constraints that are attributedto pairs of diastereotopic or symmetry-related protonswhich have non-degenerate chemical shifts and havenot been assigned individually, can be distinguishedunambiguously on the basis of the chemical shifts.For example, an NOE observed at 3.12 ppm to one

residues withHβ2 and Hβ3

Hβ2

N

Rχ1 = –60°

C

Hβ3Hα

Ile

N

Cγ1

χ1 = –60°

C

Cγ2Hα

Thr

N

Oγ1

χ1 = –60°

C

Cγ2Hα

Val

N

Cγ2

χ1 = ±180°

C

Cγ1Hα

Figure 3. Definition of theχ1 angle for the common amino acids.Note that although the Hα–Cα–Cβ–Hβ(β2) torsion angles are equiv-alent in all the examples shown, the value ofχ1 for Val differs fromthose of the other amino acids as a consequence of the definition ofthis torsion angle.

of a pair of methylene proton resonances at 2.44 ppmand 3.12 ppm, which were assigned ambiguously tothe Hβ2/β3 atoms of Tyr57, is reported as an NOE toTyr57 Hβ2/β3 (3.12 ppm). Alternatively, the symbols‘L’ and ‘H’ can be used to represent the resonances atlower frequency (historically referred to as ‘upfield’)and higher frequency (historically ‘downfield’), re-spectively. In the above example, the NOE would thenbe to Tyr57 HβL.

4. Nomenclature for specifying conformation

Torsion angles are defined in terms of the substituentwith highest preference on each of the two atoms thatspan the rotatable bond (refs. 20 and 35). The IUPAC-IUB convention (refs. 20 and 22) specifies that themain-chain atoms of a peptide or nucleic acid takeprecedence over others. For example, in a peptide, thebackbone carbon and nitrogen atoms take precedenceover other heavier atoms such as the carbonyl oxygenor cysteine Cβ (ref. 20). Otherwise atom preferencesfollow standard conventions (refs. 25 and 26). A giventorsion angleθ about the B–C bond of a molecule A–B–C–D (where A is the atom with highest preferenceattached to B, and D is the atom with highest prefer-ence attached to C) is the angle between the planescontaining A–B–C and B–C–D, or alternatively theangle between the projections of B–A and C–D onto

Page 8: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

8

Table 1. Pseudoatoms for the comon amino acids and nucleotides used in thedetermination of structures of Proteins and nucleic acids from NMR dataa

Residue Pseudoatom 1H atoms representedb

Gly QA α-methylene

Ala MB β-methyl

Val MG1, MG2 γ1-, γ2-methyl

QG All six γ-methyl

Ile MG, MD γ2-, δ1-methyl

QG γ1-methylene

Leu MD1, MD2 δ1-, δ2-methyl

QB β-methylene

QD All six δ-methyl

Pro QB, QG, QD β-, γ-, δ-methylene

Ser, Asp, Cys, His, Trp QB β-methylene

Thr MG γ2-methyl

Asn QB β-methylene

QD δ2-amino

Glu QB, QG β-, γ-methylene

Gln QB, QG β-, γ-methylene

QE ε2-amido

Lys QB, QG, QD, QE β-, γ-, δ-, ε-methylene

QZ ζ-amino

Arg QB, QG, QD β-, γ-, δ-methylene

QH1, QH2 η11 andη12,η21 andη22

QH All four η-guanidino

Met QB, QG β-, γ-methylene

ME ε-methyl

Phe, Tyr QB β-methylene

QD, QE δ1- andδ2-ring, ε1- andε2-ring

QR All ring

β-D-Ribose Q5′ 5′-methylene

2′-β-D-Deoxyribose Q2′ 2′-methylene

Q5′ 5′-methylene

A Q6 6-amino

C Q4 4-amino

G Q2 2-amino

T M7 7-methyl

aRef. 30.bSee Figures 1 and 2.

a plane normal to B–C. The torsion angle is writtenin full as θ (A,B,C,D). When the projections of thetwo bonds B–A and C–D coincide (eclipsed orcisconformation),θ = 0◦; when the projections of thetwo bonds B–A and C–D are opposed (trans confor-mation),θ = 180◦. If for a given torsion angleθ thatis neither 0◦ nor 180◦, when looking (in either direc-tion) along the central bond B–C, the minimal rotationof the front bond that would be required to achievethe eclipsed conformation is clockwise,θ is consid-ered to be positive; if the required minimal rotation

is counter-clockwise,θ is considered to be negative.As defined below, various abbreviations are used todenote specific torsion angles in proteins and nucleicacids; in other situations (or to avoid any ambiguity)the four atoms that determine the torsion angle shouldbe specified.

Page 9: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

9

4.1. Polypeptides

4.1.1. Backbone torsion angles:φi , ψi , andωi

The φi torsion angle describes rotations about theNi–Cα

i bond (relevant four atoms C′i−1–Ni–Cαi –C′i ),

and theψi torsion angle describes rotations about theCαi –C′i bond (relevant atoms Ni–Cα

i –C′i–Ni+1). Theωi torsion angle describes rotations about the peptidebond, C′i−1–Ni (relevant atoms Cαi−1–C′i−1–Ni–Cα

i ).

4.1.2. Side-chain torsion angles:χji

Side-chain torsion angles for residuei are designatedby, wherej represents the rotatable bond:j = 1 forthe Cα–Cβ bond (where the relevant atoms are N–Cα–Cβ–X); j = 2 for the Cβ–Cγ bond (where the relevantatoms are Cα–Cβ–Cγ–X), etc. Figure 3 shows how theχ1 angle is related to the angle between the projectionsof the bonds Cα–Hα and Cβ–Hβ (or Cβ–Hβ2) for thecommon amino acids. In amino acids with branchedside chains, two superscripts are used, the first to indi-cate the position of the bond and the second to indicatethe branch (e.g, the Cα–Cβ–Cγ1–Cδ1 torsion angle ofisoleucine at residue positioni is denoted byχ2,1

i ) (ref.20).

4.1.3. Specification of turns, disulfide bonds, andproline rings

Turns can be defined unambiguously by the torsionangles involved, i.e.,φi+1, ψi+1, φi+2, andψi+2 (ref.36). Disulfide bridges between residuesi andj can bedescribed by the torsion anglesχ1

i , χ2i , χ

3i , χ

2j andχ1

j ;in addition, it is recommended that the handedness beindicated byR or S (ref. 37). Proline ring puckers canbe defined by the torsion anglesχ1

i , χ2i , χ

3i andχ4

i (seecaption to Figure 4); alternatively, use can be made ofthe fact that the value ofχ1

i alone is indicative of thetwo major pucker forms (ref. 38) (Figure 4).

4.2. Nucleic acids

Notation in current use in the nucleic acid field ispresented here. In line with current usage, multiplenotations are provided in certain instances. Older con-ventions for defining torsion angles can be found inreference 22.

4.2.1. Backbone and glycosidic torsion angles

For residuei the backbone (α, β, γ, δ, ε, ζ) and glyco-sidic (χ) torsion angles for purine (Pu) and pyrimidine(Py) bases are defined as follows (ref. 22) (see alsoFigure 5):α O3′i−1– Pi – O5′i – C5′iβ Pi– O5′i– C5′i – C4′iγ O5′i– C5′i– C4′i – C3′iδ C5′i– C4′i– C3′i – O3′iε C4′i– C3′i– O3′i – Pi+1

ζ C3′i– O3′i– Pi+1– O5′i+1

χ(Py) O4′i– C1′i– N1i – C2iχ(Pu) O4′i– C1′i– N9i – C4i

4.2.2. Sugar pucker: Torsion angles and angleof pseudorotation

Each of the sugar torsion anglesν0, ν1, ν2, ν3 andν4is defined by four atoms as follows (ref. 22) (see alsoFigure 5):ν0 C4′– O4′– C1′– C2′

ν1 O4′– C1′– C2′– C3′

ν2 C1′– C2′– C3′– C4′

ν3 C2′– C3′– C4′– O4′

ν4 C3′– C4′– O4′– C1′Note that ν3 and δ, which characterize torsion

angles about the same bond (C3′– C4′), will havedifferent values for any given conformation.

The sugar is generally non-planar, and it is recom-mended that the sugar ring conformation (pucker) bedescribed by specifying two parameters: the phase an-gle of pseudorotation (P) and the puckering amplitudeψm. In practice, the pseudorotation parameters can bedetermined from measurements of the three-bondJ-couplings between the protons attached to C1′, C2′,C3′, and C4′. The following equations characterize re-lationships among the torsion angles defined by theseprotons and the pseudorotation parameters (refs. 39and 40):φ1′2′ = 121.4+ 1.03ψm cos(P − 144◦)φ1′2′′ = 0.9+ 1.02ψm cos(P − 144◦)φ2′3′ = 2.4+ 1.06ψm cos(P )

φ2′′3′ = 122.9+ 1.06ψm cos(P )

φ3′4′ = −124.0+ 1.09ψm cos(P + 144◦)In these equations the non-equilateral character of

the sugar ring is accounted for. Theφ’s can be derived

Page 10: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

10

Hγ 2

Hγ 2

Hγ 3

Hγ 3Hδ3

Hδ3

Hδ2 Hδ2C

O

C

O

– –– –Hα

Hβ3

Hβ3

Hβ2

Hβ2

DOWN UP

NNN

Figure 4. Ring pucker in the proline ring (ref. 38). In the DOWN conformation, the torsion angleχ1 andχ3 are positive andχ2 andχ4 arenegative; in the UP conformation, the torsion anglesχ1 andχ3 are negative andχ2 andχ4 are positive. Typical values for the torsion anglesalso depend on whether the Xxx—Pro peptide bond iscis or trans. Average values (in degrees) found from a number of X-ray structures ofproteins forχ1, χ2, χ3, χ4 were (ref. 38);transDOWN, 18,−29, 17,−4; cis DOWN, 26,−38, 18,−7; transUP,−27, 35,−34, 20;cis UP,−21, 36,−36, 17.

base

– O3'(i–1) – P(i) – O5' – C5' – C4' –– C3' – O3' – P(i+1) –

5' 3'

α β γ δ/ν3 ε ζ

ν2

ν1

χ

ν4

O4' C2'

C1'–– ––

–– ––

––

ν0

2

1

3

4

0

Figure 5. Designation of the torsion angles in the sugar-phosphate backbone (α, β, γ, δ, ε, ζ), the glycosidic bond (χ), and the endocyclictorsion angles in the sugar ring (ν0 — ν4) (ref. 22).

from theJ-couplings by means of appropriate Karplusequations.

Relationships among the phase angle of pseudoro-tation and the alternativeE/T andendo/exonotationsare illustrated by the pseudorotation wheel in Fig-ure 6. In theE/Tnomenclature, the puckered forms aredesignated byE (envelope form) andT (twist form)(ref. 22). For example,P = +18◦ is equivalent toC3′-endoor 3E, andP = −162◦ is equivalent to C3′-exoor 3E. Symmetrical twist conformations, with twoatoms at equal distance with respect to the plane de-fined by the three remaining ring atoms, can also berepresented; for example,P = 0◦ is equivalent toC3′-endo/C2′-exo, or to3

2T.

4.2.3. Value ranges for torsion angles specifyingcommonly observed conformations

In practice, torsion angles are often not known ex-actly, but can be assigned to particular conformationalregions. The notation used most frequently by spec-

troscopists and X-ray crystallographers for ranges oftorsion angles iscis, trans, −gauche,and+gaucheas shown in Figure 7. The IUPAC-IUB Commis-sion on Nucleotide Nomenclature recommended theKlyne-Prelog notation (ref. 22), with±synperiplanar(±sp), ±synclinal (±sc), ±anticlinal (±ac), and±antiperiplanar (±ap) as defined in Figure 7. Thetermssyn(0◦ ± 90◦) andanti (180◦ ± 90◦) have spe-cial meanings in nucleotide chemistry in that they areused to define the orientation of the base with respectto the sugar. Table 2 lists the predominant confor-mations of nucleic acids and the associated ranges oftorsion angles.

5. NMR data acquisition, processing, andreferencing

This section gives recommendations on how to presentinformation needed to provide a comprehensive de-

Page 11: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

11

Table 2. Notation commonly used to describe the major conformational formsof polynucleotides in terms of torsion angle ranges obtained from fiber diffrac-tion measurementsa

α β γ δ ε ζ χ

A-RNA g− t g+ g+ t g− anti

−sc +ap +sc +sc −ap −sc −ap

A-DNA g− t g+ g+ [t] g− anti

−sc +ap +sc +sc −ac −sc −ap

B-DNA g− [t] g+ t t [g−] anti

−sc −ac +sc +ap +ap −ac −ac

Z-DNA Gb g+ t t g+ g− [[g+]] syn

−sc −ap +ap +sc −sc +ac +sc

Z-DNA Cc [[g−]] t g+ [t] [[g−]] g− anti

−ac −ap +sc +ac −ac −ac −ap

aFor reference, the alternative Klyne–Prelog notation (ref. 22) is given beloweach recommended symbol. Thecis (c), trans (t), +gauche(g+), −gauche(g−) notation does not provide a definition for the torsion angle ranges 90◦to 150◦ and−90◦ to −150◦ (Figure 7). Several torsion angles in standardpolynucleotide conformations fall outside the ranges defined by these sym-bols. This information is also given in the table in that angle values outsidethe range by less than 10◦ are inclosed by brackets and those outside the rangeby more than 10◦ are enclosed by double brackets. In these cases, the Klyne–Prelog notation provides a more accurate representation of the torsion anglerange.

bGuanine residue in Z-DNA.cCytosine residue in Z-DNA.

scription of the experiments used in a structure deter-mination.

5.1. Data acquisition

The following information describes the data collec-tion process. In particular, a complete account isneeded for experiments employed in the collection ofconformational constraints.• NMR hardware used. Customarily the magnetic

field is specified by the resonance frequency of aparticular nucleus, usually1H; other minimal in-formation includes the type of console, type ofprobe head and the size and nature of the sampletube/container. Additional details might be de-sirable for the description of novel experimentalprotocols or home-built systems.

• Protocols for the acquisition of raw data. This nor-mally includes a diagram of the pulse sequenceand/or an ASCII file with the code for the pulseprogram, or a reference if previously publishedroutines were used. It is important to include in-formation on details of solvent suppression, timingparameters, r.f. power levels (or equivalent 90◦

pulse duration), phase cycling, pulse shapes, anymagnetic field gradients (magnitudes and dura-tions), the modes of acquisition for each dimension(including method for quadrature detection), therecycle delay, and the total time required for dataacquisition.

• Description of time domain data. This usually in-cludes acquisition times, spectral widths, possibleuse of non-linear sampling, and the number of realor complex data points in each time domain.

5.2. Data processing

A comprehensive description of data processing nor-mally includes the following information:• Software packages used.• Enhancement of the time-domain data. This may

include digital filtering, solvent filtering, convolu-tion with a window function, zero-filling, or linearprediction.

• Time-domain to frequency-domain conversion.The Fourier transform is the most commonmethod, but alternatives may be employed, suchas maximum entropy, maximum likelihood, or

Page 12: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

12

P=0°

dr

r d

–36°–7

2°36°

72°

±180°144°

108°

–144°

–108°

04'–endo

C1'–exo

C2'–

endo

C4'–exo

C3'

–en

do

04'–exo

C4'–endo

C3'

–ex

o

C1'–endo

C2'–

exo

32T

23T

0E0E

1E 4E

1E4E

40T

01T

10T

04T

21T

43T

34T

12T

3E2E

2E3E

Figure 6. Pseudorotation cycle of the furanose ring showing therelationships among the phase angle of pseudorotation (P), the en-velope (E) and twist (T) notation, and theendoandexonotation.Conformations corresponding to the northern half are designated asN-type, and those corresponding to the southern half are designatedasS-type. Ranges of theP values usually observed experimentallyfor theN andSconformations are represented by ‘r’ for ribo- and ‘d’for 2′-deoxyribofuranose rings ofβ-D-nucleosides and nucleotides(ref. 22). Note that the symmetrical twist (T) conformations arefound at even multiples of 18◦ and that the symmetrical envelopeconformations are found at odd multiples of 18◦.

Bayesian analysis. The final data size and digitalresolution in each dimension are part of a completedocumentation.

• Enhancement of frequency-domain data. Com-mon methods include symmetrization, baselineflattening, and ridge or solvent suppression.

• Methods used to extract spectral parameters suchas resonance frequencies, peak volumes, and cou-pling constants.

5.3. Referencing of chemical shifts

According to general convention, NMR spectra aredisplayed with positive frequencies to the left and neg-ative frequencies to the right of the zero-frequencyreference (refs. 6 and 7). The IUPAC Commissionon Molecular Structure and Spectroscopy (ref. 6) hasrecommended that the1H signal of tetramethylsilane(TMS) be used as the primary reference for the res-onance frequencies (and hence chemical shifts) ofprotons, and they propose to use the same signal asan indirect reference for other nuclei (ref. 41). The

syn

±180°

Frontbond

anti

+60° (+gauche)

–60°(–gauche)

+120°–120°

–150° +150°

–30°0°

–90° +90°

+30°

+sp–sp

+ap–ap

+ac–ac

+sc–sc

(cis)

(trans)

Figure 7. Relationships among different terminologies used for theapproximate descritpion of a torsion angle in terms of conforma-tional regions. Torsion angles are defined by holding the front bondat zero position, as indicated in the figure, and by measuring theangle between the front bond and the back bond while looking alongthe central bond. The Klyne–Prelog convention (whose ranges areabbreviated as:± sp, ± synperiplanar; ± sc, ± synclinal; ± ap,± antiperiplanar; ± ac, ± anticlinal) has been recommended bythe IUPAC-IUB Commission on Nucleotide Nomenclature (ref. 22).However, spectroscopists commonly use a different nomenclaturewhich specifies four 60◦ regions:cis (0◦ ± 30◦), trans (180◦ ±30◦), + gauche(60◦ ± 30◦), and− gauche(−60◦ ± 30◦). Thetermssyn (0◦ ± 90◦) andanti (180◦ ± 90◦) are discussed in thetext.

traditional primary chemical shift standard, however,for aqueous solutions has been the methyl signal ofa water-soluble derivative of TMS. Following recentliterature (ref. 42), it is recommended that the primarychemical shift standard for all nuclei in aqueous bio-logical investigations be the methyl signal of internal2,2-dimethylsilapentane-5-sulfonic acid (DSS) at lowconcentration. When it is inappropriate to use DSS asan internal standard, e.g., in cases where DSS binds tothe system under investigation, it is still desirable touse the position of DSS as the 0 ppm point. In suchcases, ref. 42 lists alternative reference compoundsand corresponding referencing protocols. Precise de-tails should be given for the reference used and theconversion factors employed. In the interest of clearlydifferentiating such chemical shifts from the defaultIUPAC recommendation for TMS as the reference(ref. 41), it is recommended that the notationδDSS be

Page 13: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

13

used. The chemical shifts of the methyl peaks in thetwo references are quite similar (ref. 43).

5.3.1.1H chemical shiftsThe methyl resonance of internal DSS at low concen-tration directly provides the preferred reference signalfor 0 ppm: notation,δDSSor δ (DSS).

5.3.2. Chemical shifts of other nuclei such as2H,13C, 15N, and 31P

Whereas a number of direct and indirect approacheshave been described for the referencing of chemicalshifts of nuclei other than1H, the indirect referencingmethod (ref. 44) has become the preferred approach.Non-proton chemical shifts are referenced indirectlyto the 1H standard using conversion factors derivedfrom ratios of NMR frequencies. The relative frequen-cies are designated by the symbol4, with 1H conven-tionally at exactly 100 MHz.4 values for the nucleimost commonly utilized in studies of proteins and nu-cleic acids are listed in Table 3. Thus, for example,the zero frequency for13C chemical shifts is obtainedfrom the experimentally determined1H frequency forDSS by multiplying the latter frequency by the13C/1H4 ratio, 25.1449530/100.000000= 0.251449530. Inorder to tie in with the chemical shift scales used inmuch of the previous literature, these conversion fac-tors were derived from the1H/2H frequency ratio ofwater,13C frequency of the methyl groups of internalDSS, the15N frequency of external liquid ammonia,and the31P frequency of internal trimethylphosphate,respectively (see ref. 42 for details). It is recom-mended here that the relative frequencies (Table 3) beused as fixed constants not subject to further changeand that they be used for data collected at all tem-peratures (with no applied temperature correction) andwith the primary reference as either internal TMS (δ)or DSS (δDSS). It is anticipated that standardized4values for additional nuclei of interest to biomolecu-lar spectroscopists will be developed and published asrecommendations (ref. 41).

6. Resonance assignments

Resonance assignments constitute a prerequisite forthe commonly used structure determination proce-dures. Because the final quality of the structure de-pends to a large extent on the completeness of the un-derlying resonance assignments, their full descriptionis important. Deposition of chemical shift assignments

Table 3. Relative frequencies (4) of nuclei of biomolecular interestto be used for indirect referencinga

Nucleus Secondary 4/MHz Reference

1H 100.000000 by definition2H DSS (internal) 15.3506088 ref. 4513C DSS (internal) 25.1449530 ref. 4215N liquid NH3 (external) 10.1329118 ref. 4231P (CH3O)3PO (internal) 40.4808636 ref. 46

aThese4 values were derived originally (as indicated in the table)from the ratio of the signal frequency of a secondary reference tothat of internal DSS in D2O as the primary reference.

is imperative (see section 10 below). Quantitativeinformation on the completeness of assignments ob-tained, including a listing of the missing assignments,is of prime interest. If, as part of automated assign-ment procedures, parameters are available that providea measure for the reliability of a particular assignment,it is desirable that such parameters be presented alongwith the assignments themselves.

6.1. Sequence-specific assignment of polypeptidechains

The assignment process for polypeptides consists ofsequence-specific assignments of backbone and sidechain resonances.

6.1.1. Sequential backbone assignmentIn most cases, a brief description should suffice, sincepresent assignment strategies employ one or both ofthe two following approaches.

Assignments derived from sequential NOEs and1H-1H J-correlation. If samples with solution condi-tions (e.g., pH, temperature, ionic strength) differentfrom those reported in the assignment table were usedin determining or validating assignments, this infor-mation needs to be noted. It is useful to report otherinformation relevant to the assignment process, suchas improvement of the spectral resolution by isotopiclabeling, including1H/2H exchange of amide groups.It is recommended that this include description ofthe types ofJ-correlation experiments used in thesequential assignment process.

Sequential assignments derived from homonuclearand heteronuclear J-correlations across peptidebonds. Although uniform double labeling with15Nand 13C is most commonly employed with this ap-

Page 14: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

14

proach, other labeling patterns may be used. If non-uniform labeling was employed, important informa-tion includes a description of the labeling strategy anda list of the labeled proteins used. In cases whereNOEs were used in parallel to establish or validateassignments, this information is of interest.

6.1.2. Amino acid side chain assignmentSide chain assignments, although conceptually simple,are often complicated by spectral overlap, strong cou-pling, and rapid transverse relaxation. Various isotopelabeling strategies and NMR techniques have beendevised to overcome these problems. Unless novelmethods have been employed, it should suffice to ref-erence the approaches used. It is recommended thatthe report contain a list of the experiments used, thefraction of the side chain resonances that have beencorrelated throughJ-couplings to the backbone, andthe extent to which NOEs have been used in theassignment or validation process.

6.2. Sequence-specific assignment of nucleic acids

The assignment process for nucleic acids consists oftwo phases. In one part, the sugar-phosphate backboneresonances are assigned, together with the purine H8and the pyrimidine H5 and H6 resonances. This as-signment usually is based on combined use of NOEs,1H-1H J-correlation, and1H–31P J-correlation, and,in cases where isotopic labeling is used, on1H–13C,13C–13C, 13C–15N, and13C–31PJ-correlations. In thesecond part, the remaining base protons (adenine H2and all exchange labile protons) are assigned, usu-ally by NOE methods, but possibly also by isotopiclabeling andJ-correlation. For both parts, it is im-portant to report how the resonance assignments wereestablished and what assumptions regarding the con-formation of the oligonucleotide were used in makingassignments that involve NOEs. Further pertinent in-formation includes temperature and solvent conditionsand, if applicable, the level and pattern of isotopicenrichment.

6.3. Stereospecific assignment of diastereotopicsubstituents

It is useful to indicate in the assignment table whichmethod was used to make each stereospecific assign-ment. Unless a novel approach was used, the descrip-tion may consist of a brief reference to one of themethods in common use.

Analysis of J-couplings and short-range NOEs.Itis useful to report theJ-couplings and the experi-ments used for their measurement along with anyNOEs utilized in making the stereospecific assign-ments. If computational methods, such as systematicgrid searches, were used, they should be referenced ordescribed briefly.

Stereoassignment by isotopic labeling.It is custom-ary to report how the labeling was achieved (e.g.,isotopic composition of cell growth medium) andwhat the observed levels of enrichment were for thepertinent sites in the biomolecule.

Stereospecific assignments by reference to preliminarystructures. Indicate the software package and anystatistical criteria (confidence level) employed.

6.4. Conformational equilibria

The criteria used to deduce the existence of multiplestates (as may result, for example, from partial folding,oligomeric heterogeneity, cofactor or ligand bindingheterogeneity, peptide bondcis/trans isomerization,or hairpin-duplex equilibrium) are of prime interest.A separate designator is supplied for each individualstate and associated with its NMR parameters. Frac-tional occupancies and interconversion rates for thestates (or bounds on these) may be included if known.

7. Conformational features derived fromdiagnostic NMR parameters

7.1. Polypeptide secondary structure

It is a special feature of protein structure determinationby NMR that the secondary polypeptide structure, in-cluding the connections between individual segmentsof regular secondary structure, can be determinedearly on in connection with the resonance assign-ments, before the complete structure calculation iseven started. Reports on such identification of regularsecondary structures and tight turns will be of interestalso in the foreseeable future, for example, as a prelim-inary structural characterization of a novel protein orin connection with studies on protein folding. The in-formation used for secondary structure identification isconcisely documented in a survey diagram of the typeshown in Figure 8. The data in Figure 8 include spin-spin coupling constants3JHNHα, sequential NOEsdαN, dNN, anddβN, medium-range NOEsdαN(i, i +

Page 15: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

15

Figure 8. Example showing methods recommended for presenting NMR data supporting the identification of regular secondary structure inproteins. The 40-residue protein, pheromone Er-2, is used as an illustration (ref. 51). Above the amino acid sequence, black squares identifyresidues with observably slow hydrogen exchange rates,kex, at the backbone amide (the conditions of the exchange experiment should bespecified). Below the amino acid sequence, filled circles identify residues with3JHNHα < 6.0 Hz, indicative of localα-type conformation;open circles correspond to3JHNHα > 8.0 Hz, indicative of residues in extended chain conformation; crosses identify residues with3JHNHα

values 6.0–8.0 Hz. For the sequential proton-proton NOE connectivities,dαN, dNN, anddβN (dαδ, dNδ anddβδ for Xxx-Pro dipeptides,dαN,dδN anddβN for Pro–Xxx dipeptides), thick and thin bars indicate strong and weak NOE intensities, respectively. The observed medium-rangeNOEsdαN(i, i + 3), dαβ(i, i + 3), dαN(i, i + 4), dNN(i, i + 2), anddαN(i, i + 2) are indicated by lines connecting the two residues that are

related by the NOE.13Cα chemicals shifts relative to the random coil values,1δ(13Cα), are plotted at the bottom of the figure, where positivevalues are shifts to lower field. The sequence locations of three helices are indicated at the bottom; broken lines are used to indicate that theidentification of helix 2 from these data is uncertain.

3), dαβ(i, i + 3), dαN(i, i + 4), dNN(i, i + 2), anddαN(i, i + 2) (ref. 47), and conformation-dependentchemical shifts for theα-carbons,1δ (13Cα). Thediagram can be expanded readily for inclusion of ad-ditional parameters of diagnostic value for secondarystructure determination. Figure 8 gives complete datafor identification of helical structure (ref. 33) andN-caps in helices (ref. 49). Although segments ofextended chain also can be recognized from presen-tations of this type (Figure 8), additional informationon long-range NOEs is needed for the identification ofβ-sheets. A two-dimensional drawing of theβ-sheetsenables a concise presentation of these data, whichmay even include some details on the experimentsused (Figure 9).

7.2. Polynucleotides

7.2.1. MononucleosidesThe conformation of a mononucleoside within a nu-cleic acid structure is determined by the sugar confor-mation and the glycosidic torsion angle. Diagnostic

characterization of these structural features often ispossible from simple inspection of the NMR spectra.• In a ribose ring,3JH1′H2′ ≈ 1 Hz is diagnostic for

theN-puckered conformation (P = 0◦ to 18◦), and3JH1′H2′ ≈ 7.9 Hz is diagnostic for theS-puckeredconformation (P = 144◦ to 162◦).

• In a deoxyribose ring,3JH1′H2′ ≈ 1.8 Hz is diag-nostic for theN-puckered conformation (P = 0◦to 18◦), and3JH1′H2′ ≈ 10 Hz is diagnostic for theS-puckered conformation (P = 144◦ to 162◦).

• The distance between H6 (in pyrimidines) or H8(in purines) and the sugar ring proton H1′ variesfrom∼2.5 Å (0.25 nm) in thesyn(χ ≈ 60◦) orien-tation to∼3.7 Å (0.37 nm) in theanti (χ ≈ 240◦)orientation. Theanti orientation, which is found inA- and B-type helices, therefore, is characterizedby cross-relaxation cross peaks of low intensity.Thesynconformation, which is found, for exam-ple, in the G-residues in Z-DNA, is characterizedby cross-relaxation cross peaks of high intensity.Pyrimidine residues are rarely found in thesynorientation.

Page 16: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

16

Figure 9. Presentation of NMR evidence for the presence ofβ-sheet in a protein (the example shown is Tendamistat (ref. 33)). Slowlyexchanging amide protons are shown in bold. Interstrand, long-range1H–1H NOEs dαα(i, j), dαN(i, j), anddNN(i, j) are indicated bydouble-headed arrows. Similar representations can be drawn for parallelβ-sheets.

• The distance between H6 (in pyrimidines) or H8(in purines) and the sugar ring proton H2′ variesfrom ∼2 Å (0.2 nm) in theanti orientation to∼4 Å (0.4 nm) in thesyn orientation. Becausethese distances are nearly independent of the sugarpucker, the magnitudes of their cross-relaxationcross peaks are particularly useful for estimatingthe magnitude of the glycosidic angleχ.

• Until a specific format for the presentation of thesediagnostic data has been agreed upon, descriptionin the text is recommended.

7.2.2. Backbone torsion anglesIn favorable situations, heteronuclear1H–31P couplingconstants can be used to recognize preferred backboneconformations. The preferred orientation of the back-bone angleβ (P–O5′–C5′–C4′) in helix-type structuresis in the trans range, which is characterized by rela-tively small values for3JH5′P and3JH5′′P (about 2 to 5Hz). Unusual folding patterns (e.g., in loop structures)can lead toβ values in theg+ range, characterized by3JH5′′P ≈ 24 Hz, or theg− range, characterized by3JH5′P ≈ 24 Hz. In B-type DNA helices, the torsionanglesβ andγ are in the preferredβt andγ+ confor-mations, and the fragment P–O5′–C5′–C4′–H4′ formsan all-trans, planar,W-type geometry. This results in along-range4JH4′P coupling constant of 2–3 Hz, whichis large enough to lead to discernible1H–31P hetero-COSY cross peaks. In A-type RNA helices,β andγ

adopt similar preferred conformations, but the result-ing geometry is not of the ‘ideal’, all-trans, planar,

W-type, and thus no1H–31P cross peaks are visible forthe H4′–P combination. The backbone torsion angleε, which has the preferred orientationεt, is charac-terized by a similar effect: whenε switches to theε− orientation (ε+ is forbidden), the fragment H2′–C2′–C3′–O3′–P can form a planar,W-type geometry,which is characterized by the presence of1H2′–31Phetero-COSY cross peaks. Until a specific format forthe presentation of these data has been proposed, it isrecommended that they be presented either in a tableor as part of the text.

7.2.3. Secondary structureThe observation of imino proton resonances between11.5 and 14.5 ppm is indicative of the presence ofbase-paired helical regions in a nucleic acid. The reso-nance intensity in this spectral range provides a lowerlimit to the number of helical base pairs. The occur-rence of imino proton resonances atδDSS> 14.5 ppmis indicative of the presence of protonated cytosineinvolved in base-pair hydrogen bonding.

8. Collection of input constraints and structurecalculation

The collection of constraints used in calculating thestructure of a protein or nucleic acid usually is an it-erative process, where an initial fold, determined onthe basis of a small number of constraints, supports

Page 17: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

17

the identification of additional constraints, which sup-plement the original constraints in the next round ofrefinement. Thus, the collection of input constraintscan be considered an integral part of the structure cal-culation process, and both aspects are dealt with in thissection.

8.1. Conformational constraints

8.1.1. Distance constraints from NOE dataIt is customary to describe the methods used to as-sign NOE cross peaks and those employed to convertNOE cross peak intensities into interproton distanceconstraints. It is essential that the procedure followedin determining the constraints be described in suffi-cient detail that a worker could repeat their derivationgiven equivalent NOE data. It is important to specifywhether all types of protons were treated equally, orwhether special corrections were made, for example,to account for motional averaging or interactions in-volving methyl groups. For oligomeric proteins, it isimportant to describe if and how intermolecular NOEswere distinguished from intramolecular interactions.

8.1.2. Torsion angle constraints fromJ-couplingdata

It is usual to specify the methods used to relateJ-couplings to structural constraints. This includes thespecific Karplus-type relationships employed, theirrepresentation by an energetic penalty function duringthe structure calculation (see for example, ref. 50), andpossibly additional criteria used to resolve the up tofourfold degeneracy of the Karplus relation betweentorsion angle andJ-coupling.

8.1.3. Torsion angle constraints from chemicalshifts

If chemical-shift-derived torsion angle constraints areused in the structure refinement procedure, it is impor-tant to describe the energetic penalty function used,the method by which the predicted shift was calcu-lated, and the refinement stages in which chemicalshift information was included.

8.1.4. Constraints representing disulfide bonds,hydrogen bonding, and metal coordination

Standard upper and lower distance constraints areavailable to represent disulfide bonds that have beenidentified either by chemical methods or from NMRresults (ref. 51). Similarly, standard upper and lower

distance constraints are available to represent hydro-gen bonds inferred from diagnostic NMR data (seesection 7) (ref. 51). Hydrogen bonding distances forthe Watson–Crick and Hoogsteen base pairs are usu-ally taken from Saenger (ref. 52) with error estimatesof ± 0.2 Å (0.02 nm). Justification is needed, in allcases, for the selection of the ligating atoms and, inthe case of metal coordination, for the coordinationgeometry used in the structure calculation. In all cases,documentation of how these constraints were incor-porated in the energetic penalty function, and at whatstage of the analysis, is of interest.

8.1.5. Supplementary constraints for structurerefinement

Structural information may be derived from a varietyof additional sources, including evidence of hydration,effects of paramagnetic shift and relaxation reagents,isotope shifts, nuclear quadrupole couplings, photo-chemical chemically induced dynamic nuclear polar-ization (photo CIDNP) experiments, field dependenceof J-couplings, anisotropy of relaxation times, etc. It isimportant to describe in some detail the way in whichthese data were collected and the means by which theywere applied to the structure calculation. If an ener-getic penalty function was used, its functional formshould be specified. It is important to document all as-sumptions and additional sources of information usedin calculating the structure from the experimental data.

8.1.6. Input for the final structure calculationData bank deposition of the input for the final struc-ture calculation is considered to be as important asthe deposition of the final atomic coordinates (section10). Standard formats for the deposition of these con-straints (sections 8.1.1.–8.1.5.) are available from thedata banks: BMRB (ref. 2) and PDB (ref. 1).

In publications of structures of proteins and nu-cleic acids, it is customary to provide the followingas a concise overview of the types of input constraintsdetermined. (Prior to this count, NOEs that cannotresult in a constraint due to the distance limits im-posed by the covalent geometry, for example, upperlimit distance constraints> 3.3 Å (0.33 nm) for vicinalprotons, should be eliminated. Upper and lower limitsbelonging to the same NOE should be counted as oneconstraint.):• The number and type of NOE (or rotating-frame

Overhauser effect (ROE)) constraints, classifiedas, intraresidue NOEs (i − j = 0), sequen-tial NOEs (|i − j | = 1), medium-range NOEs

Page 18: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

18

(|i − j | < 5), long-range NOEs (all other in-tramolecular), and intermolecular constraints (ifapplicable).

• The number of torsion angle orJ-coupling con-straints used, classified by the type of angle (φ, ψ,χ1, etc., in polypeptides;β, γ, δ, ε, χ, the sugarpucker constraints, and, if used, the staggeredconformations of theα and ζ angles in nucleicacids).

• The number of hydrogen bond constraints (witha distinction made between the number of con-strained hydrogen bonds and the number of con-straints used to describe these bonds).

• If applicable, other types of constraints.

8.2. Methods used for structure calculationand refinement

The computational methods used for structure deter-mination and refinement (including names and ver-sions of software packages used) need to be specifiedin adequate detail, and it is important to provide refer-ences to parameter sets, potential functions and valuesof force constants used, or to include these values ex-plicitly in the report if they have not been publishedelsewhere.

8.2.1. Structure calculationIt is desirable that the process for deriving the three-dimensional structure from the NMR data be de-scribed clearly, including the starting conformation(s),numbers and types of constraints used, and the cri-teria used to reject unfavorable folds. It is importantto describe the protocol used in minimizing the en-ergetic penalty function during the various stages ofrefinement (for example, the initial and final tempera-tures together with the number of steps, the step size,and the force constants used in simulated annealingprotocols).

8.2.2. Preliminary structures to assign additionalconstraints

Currently, most structure determination protocols re-move ambiguities from NOE cross-peak assignmentsby reference to an initial fold calculated with an in-complete set of (ideally unambiguous) constraints. It isimportant to document any ambiguity in the initial setof constraints along with the methods used for assign-ing additional constraints, for example, by referenceto the software used and by reporting adequate detailson the procedures (number of iterations, criteria and

cut-off values) employed for the rejection of NOEs atvarious stages in the refinement.

8.2.3. Structure refinementStructure refinement procedures which rely on compu-tationally intensive programs for relaxation matrix re-finement or chemical shift calculations are frequentlyonly incorporated during the final stages of the struc-ture determination. It is important to provide sufficientdetails regarding the use of these procedures, includ-ing their effects on the average structure used as inputfor the refinement, and their effect on the root-mean-square deviation of the ensemble of structures.

9. Reporting three-dimensional structures

9.1. Presentation of structures

In reporting three-dimensional NMR structures it iscommon practice to display the entire ensemble ofconformers used for statistical analysis. In addition,a representative conformer, suitable for detailed struc-tural interpretation, often is presented. The represen-tative conformer has been identified variously as thestructure that best satisfies the NMR constraints, as theenergy-minimized structure derived from the averagedcoordinates of the ensemble, or as the structure thatis closest to the averaged coordinates of the ensemble(ref. 53). Subtleties in the averaging of NMR prop-erties (ref. 54), which differ from averaging in X-raycrystallography, need to be considered. It is importantto specify the total number of conformers calculatedand the criteria for selection of the subset of conform-ers, including the representative conformer used fordisplay and analysis.

9.2. Agreement of structures with constraints

• It is important to report complete statistics as tohow well the structures satisfy the constraints. Themaximum violations of distance bounds, torsionangle constraints, coupling constants, chemicalshifts, or other experimental constraints used inthe structure calculation, are normally reported,together with the average violation per constraint(± standard deviation).

• When computational methods are used that pro-vide a measure of the agreement of the structureswith the constraints or with direct spectroscopicdata such as NOE intensities, the final value of

Page 19: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

19

the target function or figure of merit (R-factor)normally is reported. In the particular case of re-laxation matrix calculation of NOE intensities, itis appropriate to specify the functional form of theR-factor (e.g., direct,r6 weighted, etc.).

• It is usual to specify the deviations from ideal-ized covalent geometry for bond lengths, bondangles, and impropers and to indicate which ide-alized geometry was used. Since covalent bondlengths and angles are restrained to ‘ideal’ valuesin all methods currently used to compute NMRstructures, deviations from ideality do not pro-vide an independent assessment of the overallquality of the structures but can identify problemregions where constraints might be too tight orwhere constraints may be in conflict as a result ofconformational averaging.

9.3. Precision of structures

An ensemble of conformers calculated from the sameset of input data is widely used to represent an NMR-derived structure. The precision of the structure de-termination commonly is expressed in terms of astatistical analysis of the variation of the atomic co-ordinates and torsion angles among these conformers.The precision may then be reported either as averagepairwise root mean square (rms) deviation or the rmsdeviation relative to the mean coordinates. A mean-ingful description of the precision of local structureis provided by the circular variance (ref. 55) or theangular order parameter (ref. 56) for torsion angles. Itis useful to specify the method of superposition used inobtaining the mean coordinates. For proteins, the aver-age rms deviation for Cartesian coordinates usually isreported for a set of backbone heavy atoms, i.e., sets of(Cα), (N, Cα, C), or (N, Cα, C, O) atoms, all side-chainheavy atoms, and all heavy atoms (backbone plus sidechain). It is important to indicate whether the reportedrms deviations apply to all amino acid residues or onlyto a selected subset. For nucleic acids, the atoms usedfor calculations of rms deviations should be specified.

Useful supplementary information includes eval-uation of the rms deviations along the sequence inlight of the density of constraints per residue, in or-der to better evaluate the significance of the apparentprecision.

9.4. Validation of structures

In common with other methods for three-dimensionalstructure analysis, for example X-ray diffraction in

single crystals, there is no direct method for ab-solute validation of the result of an NMR structuredetermination. Nonetheless, a number of criteria areavailable for investigating whether the result of anNMR structure determination is ‘reasonable’. Suchapproaches have been developed only recently and arebeing used to evaluate the results of both NMR andX-ray structure determinations. Additional techniquesare currently in development in different laborato-ries. These commonly are based on the database ofcurrently available three-dimensional structures (refs.55–61). Criteria used for structure validation includethe following:• Conformational energy (either Lennard-Jones or

total energy) with associated force field used.• For proteins, a Ramachandran (φ, ψ) plot for the

backbone torsion angles in the family of conform-ers.

• For more detailed assessment of the stereochemi-cal quality of a protein structure, several computerprograms are available (reviewed in ref. 60). Thesemay identify regions of the structure in which po-tential problems require further evaluation. Mostof these programs have been written primarily forchecking X-ray coordinates, rather than an ensem-ble of conformers obtained by NMR, but a recentlypublished suite of programs (ref. 61) has beendesigned for the validation of NMR structures.

10. Data bank deposition of NMR structures andsupporting data

Deposition in public data banks of the quantitative andsemi-quantitative data and references to specific pro-cedures used to manipulate the data, as described inthese recommendations, is strongly encouraged, evenin cases where this duplicates information tabulated injournal articles. The key elements in a data depositionfor an NMR structure include:• Representation of the covalent structure(s) of the

molecule(s) in the system reported (generally a se-quence plus indications of cross links and specialmodifications) plus information on bound cofac-tors or other ligands and oligomeric structure.

• Tabulation of assigned chemical shifts.• Tabulation of assigned coupling constants.• Tabulation of constraints used in the structure

calculation: distance constraints from NOEs (orthe spectroscopic data from which the constraints

Page 20: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

20

were derived), torsion angle constraints fromJ-couplings and chemical shifts, H-bond constraints,disulfide bridge constraints, and supplementaryconstraints: it is important to indicate how theconstraints were calibrated and the nature of anypseudoatom or other corrections (e.g., to accountfor spin diffusion or spin multiplicity) includedin the constraint values and to specify what typeof averaging protocol (e.g., center averaging,r−6

averaging, etc.) has been used (refs. 54 and 62).• Cartesian coordinates for the family of conformers

that represent the result of the structure deter-mination as well as for a single representativeconformer: it is important to include all availableatoms in the deposition, including the protons.

• Concise description of the solution conditions usedfor the structure determination (temperature, pH,ionic composition, etc.).

• Literature citations for the studies that originatedthe deposited data.Compilers of databases are encouraged to accom-

modate the deposition of supplementary informationof the kind described in the preceding sections eitheras free text or in a more organized format. This mayinclude a brief description of the computational tech-niques (including specification of the software) used toprocess the input data, derive and refine the structures,and validate the results.

Additional information usefully available from adatabase includes tables of NMR relaxation rates, hy-drogen exchange rates or protection factors, and ther-modynamic parameters characterizing conformationalequilibria and ligand binding. Some authors may wishto deposit electronic files containing the free induc-tion decays from a representative cross-relaxation ex-periment and/or lists of peaks and intensities of theprimary data sets.

In analogy with the reporting of structures deter-mined by X-ray crystallography, it is desirable thatjournal editors require, as a condition for acceptanceof a publication, database deposition of at least (1)the atomic coordinates, (2) the assigned chemicalshifts, (3) the assignedJ-couplings, and (4) a com-pilation of all input constraints used for the structuredetermination.

Conclusions

The recommendations presented here are designedto support easier communication of NMR data and

NMR structures for proteins and nucleic acids throughunified nomenclature and reporting standards. Muchof this document pertains to the reporting of datain journal articles. However, in the best interest ofthe future development of structural biology, it isdesirable that the bulk of the reported informationbe stored in computer-accessible form and be freelyaccessible to the scientific community. For such pur-pose, the macromolecular crystallographic communityis advocating use of the mmCIF format (refs. 63 and64), which is compatible with the STAR/CIF for-mat (refs. 65 and 66); mmCIF has been implementedin the Nucleic Acid Database. In recognition of thedesirability of developing full compatibility betweenmachine-readable crystal diffraction and NMR data,it is recommended that a compatible format be usedfor NMR data from proteins and nucleic acids. It mayprove desirable, in addition, for the databases to de-velop full compatibility with the ASN.1 data exchangeformat (refs. 67–69), which has been adopted by theU.S. National Center for Biotechnology Informationand is being used in the chemical exchange format de-veloped by the Chemical Abstracts Service (ref. 70).With these goals in mind, international committees,in association with the Protein Data Bank and Bio-MagResBank, are in the process of using the presentdocument to develop data dictionaries and author-input protocols for the deposition of macromolecularNMR data (ref. 71).

Acknowledgement

We are debted to the International Union of Pureand Applied Chemistry for permission to reprint theseRecommendationsthat were previously published inPure and Applied Chemistry (1998) 70, 117–142.©IUPAC, 1988.

Editorial comment

TheseRecommendationsare being reprinted as a ser-vice to our readers. Publication does not imply en-dorsement of the recommendations by the journal butrather presents this information for consideration.

Page 21: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

21

References and notes

1. Bernstein, F.C., Koetzle, T.F., Williams, G. J.B., Meyer Jr.,E.F., Brice, M.D., Rogers, J.R., Kennard, O., Shimanouchi, T.and Tasumi, M. (1977) The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures,J. Mol.Biol., 112, 535–542. The Protein Data Bank can be accessedon the Internet at http://www.pdb.bnl.gov/.

2. Seavey, B.R., Farr, E.A., Westler, W.M. and Markley, J.L.(1991) A Relational Database for Sequence-Specific ProteinNMR Data, J. Biomol. NMR, 1, 217–230; Markley, J. L.,Farr, E.A. and Seavey, B.R. (1991) Appendix: Check Listfor Protein NMR Data to be Submitted for Publication or forDatabase Inclusion,J. Biomol. NMR, 1, 231–236. The Bio-molecular NMR Data Bank (BioMagResBank or BMRB) canbe reached on the Internet at http://www.bmrb.wisc.edu/.

3. Berman, H.M., Olson, W.K., Beveridge, D.L., Westbrook,J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A.R.and Schneider, B. (1992) The Nucleic Acid Database: AComprehensive Relational Database of Three-DimensionalStructures of Nucleic Acids,Biophys. J., 63, 751–759. TheNucleic Acid Data Bank can be reached on the Internet athttp://www.ndbserver.rutgers.edu:80/.

4. Hendrickson, W.A. and Wüthrich, K. (1991)MacromolecularStructures, Current Biology, London, UK.

5. Nomenclature Committee of IUBMB (NC-IUBMB) andIUPAC-IUBMB Joint Commission on Biochemical Nomen-clature (JCBN) (1992) Newsletter 1992,Biochem. Int., 26,567–575; and references therein.

6. Commission on Molecular Structure and Spectroscopy (1972)Recommendations for Presentation of NMR Data for Publica-tion in Chemical Journals,Pure Appl. Chem., 29, 627–628.

7. Commission on Molecular Structure and Spectroscopy (1976)Presentation of NMR Data for Publication in ChemicalJournals–B. Conventions Relating to Spectra from Nucleiother than Protons (Recommendations 1975),Pure Appl.Chem., 45, 219.

8. Harris, R.K., Kowalewski, J., and Cabral de Menezes, S.(1996) Parameters and Symbols for use in Nuclear MagneticResonance, draft IUPAC document to be published inPureAppl. Chem., 69, 2489–2495.

9. Liébecq, C. (Ed.) (1992) Biochemical Nomenclature and Re-lated Documents – A Compendium, 2nd ed., Portland Presson behalf of IUBMB, London, U.K.

10. Bonnett, R. (1978) In The Porphyrins (Ed., Dolphin, D.), Vol.1, Part A, Academic Press, New York, NY, pp. 1–27.

11. IUPAC-IUB Commission on Biochemical Nomenclature(CBN) (1975) Nomenclature of Iron-Sulfur Proteins. Recom-mendations 1973,Eur. J. Biochem., 35, 1–2.

12. IUPAC-IUB Commission on Biochemical Nomenclature(1975) The Nomenclature of Peptide Hormones, Recommen-dations 1974,Biochemistry, 14, 2559–2560; reprinted in theCompendium (ref. 9).

13. IUPAC-IUB Commission on Biochemical Nomenclature(1972) Abbreviated Nomenclature of Synthetic Polypep-tides (Polymerized Amino Acids), Revised Recommendations1971, Biochemistry, 11, 942–944; reprinted in the Com-pendium (ref. 9).

14. Nomenclature Committee of the International Union of Bio-chemistry (NC-IUB) Nomenclature for Incompletely Speci-fied Bases in Nucleic Acids (1985),Eur. J. Biochem., 150,1–5; reprinted in the Compendium (ref. 9).

15. Access to various databases (e.g., GenBank, DDBJ,EBI (EMBL), EC Enzyme Data Bank, PIR, SWISS-

PROT, PRF, PDB, NDB, BMRB) is provided over theInternet. Useful compilations of Internet addresses arefound, for example, on the home pages of the Pro-tein Data Bank (http://www.pdb.bnl.gov/), BioMagResBank(http://www.bmrb.wisc.edu) and the NCSA Biology Work-bench (http://biology.ncsa.uiuc.edu).

16. Protein Identification Resource, Georgetown University,Washington, D.C., U.S.A. The Protein Identifica-tion Resource can be searched with tools located at:http://www.gdb.org/Dan/proteins/pir.html.

17. SWISS-PROT Annotated Protein Sequence Database(http://www.expasy.hcuge.ch/sprot/sprot-top.ht ml).

18. International Union of Biochemistry and Molecular Biol-ogy (1992) Enzyme Nomenclature, Recommendations 1992,Academic Press, Orlando, FL, U.S.A.

19. Chemical Abstracts Service, Columbus, OH, U.S.A.(http://info.cas.org/).

20. IUPAC-IUB Commission on Biochemical Nomenclature(CBN) (1970) Abbreviations and Symbols for the Descrip-tion of the Conformation of Polypeptide Chains, TentativeRules 1969,Biochemistry, 9, 3471–3479; reprinted in theCompendium (ref. 9).

21. IUPAC-IUB Joint Commission on Biochemical Nomen-clature (JCBN) (1984) Nomenclature and Symbolism forAmino Acids and Peptides, Recommendations 1983,Eur. J.Biochem., 138, 9–37; reprinted in the Compendium (ref. 9).

22. IUPAC-IUB Joint Commission on Biochemical Nomenclature(JCBN) (1983) Abbreviations and Symbols for the Descriptionof Conformations of Polynucleotide Chains, Recommenda-tions 1982, Eur. J. Biochem., 131, 9–15; reprinted in theCompendium (ref. 9).

23. Hanson, K.R. (1966) Applications of the Sequence Rule. I.Naming the Paired Ligandsg,g at a Tetrahedral AtomXggij.II. Naming the Two Faces of a Trigonal AtomYghi, J. Am.Chem. Soc., 88, 2731–2742.

24. The 1983 recommendations (ref. 21) specified the use of num-bers to designate atoms and the use of the R/S rules of Cahn,Ingold and Prelog (refs. 25 and 26, see also refs. 27–29) to de-fine stereochemistry. According to theR/Srules, a tetrahedralcarbonX with prochiral substituentsA > B > C = C′ (wherethe priority is according to decreasing atomic mass andC andC′ have equivalent mass and are labeled arbitrarily) is labeledas follows: one sights down theA–X axis and assumes thatC′takes precedence overC (as it would if it were replaced, forexample, with a heavier isotope). Then, ifB, C′, andC (pro-jecting away from the viewer) are clockwise,C′ is labeled aspro-RandC is labeled aspro-S; if they are counterclockwise,C′ is labeled aspro-SandC is labeled aspro-R.

25. Cahn, R.S., Ingold, C.K. and Prelog, V. (1966) Specification ofMolecular Chirality,Angew. Chem. Int. Ed. Engl., 5, 385–415;511.

26. Prelog, V. and Helmchen, G. (1982) Basic Principles of theCIP-System and Proposals for a Revision,Angew. Chem. Int.Ed. Engl., 21, 567–583.

27. Blackwood, J.E., Gladys, C.L., Loening, K.L., Petrarca,A.E. and Rush, J.E. (1968) Unambiguous Specification ofStereoisomerism about a Double Bond,J. Am. Chem. Soc.,90, 509–510.

28. Blackwood, J.E., Gladys, C.L., Petrarca, A.E., Powell, W.H.and Rush, J.E. (1968) Unique and Unambiguous Specificationof Stereoisomerism about a Double Bond in Nomenclature andOther Notation Systems,J. Chem. Doc., 8, 30–32.

29. IUPAC Commission on Nomenclature of Organic Chemistry(CNOC) (1976) Nomenclature of Organic Chemistry, Sec-

Page 22: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

22

tion E: Stereochemistry, Recommendations 1974,Pure Appl.Chem., 45, 11–30 (1976); reprinted in the Compendium (ref.9).

30. Wüthrich, K., Billeter, M. and Braun, W. (1983) Pseudo-Structures for the 20 Common Amino Acids for Use in Studiesof Protein Conformations by Measurements of IntramolecularProton–Proton Distance Constraints with Nuclear MagneticResonance,J. Mol. Biol., 169, 949–961.

31. Since the experimental NOEs involve distances to the in-dividual protons rather than to the reference point of thepseudoatom, NOE distance constraints involving pseudoatomsmust contain a correction factor. A conservative and straight-forward approach is to set these corrections equal to thedistance between the reference point of the pseudoatom andthe protons that it replaces. Tables listing these corrections forthe common amino acid residues have been published (refs. 30and 33) as well as more refined corrections (for example, seerefs. 32 and 34).

32. Koning, T.M.G., Boelens, R. and Kaptein, R. (1990) Calcu-lation of the Nuclear Overhauser Effect and Determination ofProton–Proton Distances in the Presence of Internal Motions,J. Magn. Reson., 90, 111–123.

33. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids,Wiley Interscience, New York, NY.

34. Güntert, P., Braun, W. and Wüthrich, K. (1991) Efficient Com-putation of Three-Dimensional Protein Structures in Solutionfrom Nuclear Magnetic Resonance Data Using the ProgramDIANA and the Supporting Programs CALIBA, HABAS andGLOMSA, J. Mol. Biol., 217, 517–530.

35. Historical changes in the definition of these angles have con-tributed to confusion. Their use in numerous textbooks doesnot conform to the current IUPAC-IUBMB nomenclature:Clauwaert, J. and Xia, J.-Z. (1993) Straightening out theDihedral Angles,Trends Biochem. Sci., 18, 317–318; 464.

36. Wilmott, C.M. and Thornton, J.M. (1990)β-Turns and theirDistortions: A Proposed New Nomenclature,Protein Eng., 3,479–493.

37. Richardson, J.S. (1981) The Anatomy and Taxonomy ofProtein Structure,Adv. Protein Chem., 34, 167–339.

38. Milner-White, E.J., Bell, L.H. and Maccallum, P.H. (1992)Pyrrolidine Ring Puckering incis- andtrans-Proline Residuesin Proteins and Polypeptides. Different Puckers are Favouredin Certain Situations,J. Mol. Biol., 228, 725–734.

39. De Leeuw, H.P.M., Haasnoot, C.A.G. and Altona, C. (1980)Empirical Correlations between Conformational Parametersin β-D-Furanoside Fragments Derived from a Statistical Sur-vey of Crystal Structures of Nucleic Acid Constituents,Isr. J.Chem., 20, 108–126.

40. Haasnoot, C.A.G., De Leeuw, F.A.A.M., de Leeuw, H.P.M.and Altona, C. (1981) The Relationship between Proton-Proton NMR Coupling Constants and Substituent Electroneg-ativities,Org. Magn. Reson., 15, 43–52.

41. Harris, R.K., Becker, E.D., Bremser, W., Cabral de Menezes,S., Goodfellow, R. and Granger, P. (1997) IUPAC (Commis-sion on Molecular Structure and Spectroscopy, I.5) Recom-mendations for NMR Nomenclature. A. Nuclear Spin Proper-ties and Conventions for Chemical Shifts (in preparation).

42. Wishart, D.S., Bigam, C.G., Yao, J., Abildgaard, F., Dyson,H.J., Oldfield, E., Markley, J.L. and Sykes, B.D. (1995)1H,13C and 15N Chemical Shift Referencing in BiomolecularNMR, J. Biomol. NMR., 6, 135–140.

43. In a dilute aqueous solution at 298 K (0.3 mM DSS and0.03 mM TMS in D2O at pH∗ 7), the signal of TMS is atδDSS = −0.0173 ppm (i.e., at low frequency, historically

referred to as ‘upfield’, from that of DSS); in a dilute or-ganic solution at 298 K (0.03 mM TMS and 0.01 mM DSSin perdeuterated dimethyl sulfoxide), the signal of TMS is atδDSS= 0.0246 ppm (i.e., to high frequency from that of DSS).

44. Live, D.H., Davis, D.G., Agosta, W.C. and Cowburn, D.(1984) Long Range Hydrogen Bond Mediated Effects in Pep-tides:15N NMR Study of Gramicidin S in Water and OrganicSolvents,J. Am. Chem. Soc., 106, 1939–1943.

45. The4 value recommended here for2H was determined fromthe ratios of1H and 2H signals of a water sample at 298 Kas measured by Dr. Frits Abildgaard and David M. LeMas-ter at the National Magnetic Resonance Facility at Madison.This value is a surrogate for the isotopomers of DSS, and itsderivation assumes that there is no primary isotope effect onthe ratio.

46. The 4 value recommended here for31P is the averageof values determined independently, one at the NationalMagnetic Resonance Facility at Madison by F. Abildgaard(40.4808651), and the other at the National Institutes of Healthby A. Bax (40.480862). Measurements were made as 298 Kin D2O solution containing 10% trimethylphosphate as theindirect reference and DSS as the direct reference.

47. NOE distances may be represented rigorously by notationanalogous to that used forJ-couplings. However, since protonsare always implied, a shorthand notation is widely used: forexample,dαN for dHαHN, dNN for dHNHN, dβN for dHβHN,anddαβ for dHαHβ.

48. Williamson, M.P., Havel, T.F. and Wüthrich, K. (1985) So-lution Conformation of Proteinase Inhibitor IIA from BullSeminal Plasma by1H NMR Spectroscopy and DistanceGeometry,J. Mol. Biol., 182, 295–315.

49. Gronenborn, A.M. and Clore, G.M. (1994) Identification ofN-Terminal Helix Capping Boxes by Means of13C ChemicalShifts,J. Biomol. NMR, 4, 455–458.

50. Eberstadt, M., Gemmecker, G., Mierke, D.F. and Kessler, H.(1995) Scalar Coupling Constants–Their Analysis and theirApplication for the Elucidation of Structures,Angew. Chem.Int. Ed. Engl., 34, 1671–1695.

51. Ottiger, M., Szyperski, T., Luginbühl, L., Ortenzi, C., Lu-porini, P., Bradshaw, R.A. and Wüthrich, K. (1994) The NMRSolution Structure of the Pheromone Er-2 from the CiliatedProtozoanEuplotes raikovi, Protein Sci., 3, 1515–1526.

52. Saenger, W., (1983) Principles of Nucleic Acid Structure,Springer Verlag, New York, NY, pp. 123–124.

53. Sutcliffe, M.J. (1993) Representing an Ensemble of NMR-Derived Protein Structures by a Single Structure,Protein Sci.,2, 936–944.

54. Fletcher, C.M., Jones, D.N.M., Diamond, R. and Neuhaus, D.(1996) Treatment of NOE Constraints Involving Equivalent orNonstereoassigned Protons in Calculations of Biomacromole-cular Structures,J. Biomol. NMR, 8, 292–310.

55. MacArthur, M.W. and Thornton, J.M. (1993) Conforma-tional Analysis of Protein Structures Derived from NMR Data,Proteins, 17, 232–251.

56. Hyberts, S.G., Goldberg, M.S., Havel, T.F. and Wagner, G.(1992) The Solution Structure of Eglin c Based on Mea-surements of Many NOEs and Coupling Constants and itsComparison with X-ray Structures,Protein Sci., 1, 736–751.

57. Colovos, C. and Yeates, T.O. (1993) Verification of Pro-tein Structures: Patterns of Nonbonded Atomic Interactions,Protein Sci., 2, 1511–1519.

58. Sippl, M.J. (1993) Recognition of Errors in 3-DimensionalStructures of Proteins,Proteins, 17, 355–362.

Page 23: Recommendations for the presentation of NMR structures of ...determine the structures of peptides, proteins, and protein–ligand complexes, as well as those of nucleic acids and their

23

59. Vriend, G. and Sander, C. (1993) Quality Control of Pro-tein Models: Directional Atomic Contact Analysis,J. Appl.Crystallogr., 26, 47–60.

60. MacArthur, M.W., Laskowski, R.A and Thornton, J.M.(1994) Knowledge-Based Validation of Protein-Structure Co-ordinates Derived by X-ray Crystallography and NMR Spec-troscopy,Curr. Opin. Struct. Biol., 4, 731–737.

61. Laskowski, R.A., Rullmann, J.A.C., MacArthur, M.W.,Kaptein, R. and Thornton, J.M. (1996) AQUA andPROCHECK–NMR: Programs for Checking the Quality ofProtein Structures Solved by NMR.J. Biomol. NMR, 8, 477–486.

62. Jardetzky, O. and Roberts, G.C.K. (1981) NMR in MolecularBiology, Academic Press, New York, NY, pp. 126–142.

63. Watenpaugh, K.D., Berman, H.M., Bourne, P.E. and Fitzger-ald, P.M.D. (1993) The Macromolecular CIF Dictionary,Acta Crystallogr. Ser. A, 49 (Supplement), IVI InternationalCongress (IUCr), Abstract OCM–18.02.07.

64. Fitzgerald, P.M.D., Berman, H., Bourne, P. and Watenpaugh,K. (1993) The Macromolecular CIF Dictionary, AbstractNo. D008, American Crystallographic Association AnnualMeeting, Albuquerque, NM, May 23–28 (available from:http://ndbserver.rutgers.edu:80/mmcif/).

65. Hall, S.R. and Spadaccini, N. (1994) The STAR File: DetailedSpecifications,J. Chem. Inf. Comput. Sci., 34, 505–508.

66. Hall, S.R., Allen, F.H. and Brown, I.D. [International Unionof Crystallography, Commission on Crystallographic Data,

Commission on Journals, Working Party on CrystallographicInformation] (1991) The Crystallographic Information File(CIF): a New Standard Archive File for Crystallography,ActaCrystallogr. Ser. A, 47, 655–685.

67. International Organization for Standardization (1987) Infor-mation Processing Systems–Open Systems Interconnection–Specification of Abstract Syntax Notation One (ASN.1), Tech-nical Report ISO8824.

68. International Organization for Standardization (1987) Infor-mation Processing Systems–Open Systems Interconnection–Specification of Basic Encoding Rules for Abstract SyntaxNotation One (ASN.1), Technical Report ISO8825.

69. Ostell, J. (1990) GenInfo ASN.1 Syntax: Sequences, Tech-nical Report 1 & 2, National Center for BiotechnologyInformation, National Library of Medicine.

70. Mockus, J. and Steckert, T.D. (1994) CXF Chemical eX-change Format Version 1.0, Chemical Abstracts Service:Columbus, OH, USA.

71. Ulrich, E.L. and Markley, J.L. (1994) A Data Dictionaryand Information File for the Exchange and Archiving ofNMR Experimental Results, 35th Experimental Nuclear Mag-netic Resonance Conference, April 10–15, Asilomar, CA,Abstract WP 112, p. 264. The BioMagResBank web pages(http://www.bmrb.wisc.edu/) contain complete informationabout this effort including user entry protocols and software.