The Protein Databank: Working with protein data-files

  • date post

    21-Dec-2015


The Protein Databank

Working with protein data-files

Determining Biomolecule Structures

● X-ray crystallography

● Nuclear magnetic resonance

The Protein Databank

The PDB Growth Chart

figGROWTH.eps

Maxim 10.1

Beware of anything in the PDB Header Section

The PDB Data-File Formats

Example PDB structure 1LQT

fig1LQT.eps

Example PDB structure 1M7T

fig1M7T.eps

http://www.rcsb.org/pdb/

http://www.ebi.ac.uk/services/

Downloading PDB data-files

Accessing Data In PDB Entries

● Accessing PDB Annotation Data

● Free R and resolution

REMARK   2
REMARK   2 RESOLUTION. 1.05 ANGSTROMS.

REMARK 215 NMR STUDY
REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
REMARK 215 THESE RECORDS ARE MEANINGLESS.

Example PDB data-file

. . .
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM
REMARK   3   R VALUE     (WORKING + TEST SET) : 0.134
REMARK   3   R VALUE            (WORKING SET) : 0.134
REMARK   3   FREE R VALUE                     : 0.153
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL
REMARK   3   FREE R VALUE TEST SET COUNT      : 2200
. . .

Example PDB data-file, cont.
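As a rough sketch of how the two quality indicators shown in these header extracts might be collected, the Perl fragment below scans REMARK 2 and REMARK 3 lines. The sub name parse_quality and the embedded sample records are illustrative assumptions. Header layouts vary between entries (see Maxim 10.1), so the patterns are deliberately loose and either value may come back undefined.

```perl
#! /usr/bin/perl -w

use strict;

# parse_quality: hypothetical helper that takes header lines and returns
# ( resolution, free R value ); either may be undef if it is not found.
sub parse_quality {
    my @lines = @_;
    my ( $resolution, $free_r );

    foreach my $line ( @lines ) {
        if ( $line =~ /^REMARK\s+2\s+RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS/ ) {
            $resolution = $1;
        }
        # Only the plain "FREE R VALUE :" line is wanted, not the
        # "FREE R VALUE TEST SET ..." lines that follow it.
        elsif ( $line =~ /^REMARK\s+3\s+FREE R VALUE\s*:\s*(\d*\.\d+)/ ) {
            $free_r = $1;
        }
    }
    return ( $resolution, $free_r );
}

# Sample records copied from the header extracts above.
my @header = (
    "REMARK   2 RESOLUTION. 1.05 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.134",
    "REMARK   3   FREE R VALUE                     : 0.153",
    "REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL",
    "REMARK   3   FREE R VALUE TEST SET COUNT      : 2200",
);

my ( $res, $free_r ) = parse_quality( @header );
print "Resolution: $res, Free R: $free_r\n";
```

An NMR entry has no REMARK 2 resolution value, so a caller plotting free R against resolution would need to skip entries where either value is undefined.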

Plotting Free R Values against Resolution

figFREER.eps

DBREF  1LQT A    1   456  GB     13882996 AAK47528         1    456
DBREF  1LQT B    1   456  GB     13882996 AAK47528         1    456

DBREF  1AFI      1    72  SWS    P04129   MERP_SHIFL      20     91

DBREF  1M7T A    1    66  SWS    P10599   THIO_HUMAN       0     65
DBREF  1M7T A   67   106  SWS    P00274   THIO_ECOLI      68    107

Database cross references
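The 1AFI record above has no chain identifier, so splitting DBREF lines on whitespace would shift every later field by one. A minimal sketch of a column-based alternative follows. The sub name parse_dbref and the hash keys are assumptions made for illustration, and the offsets assume the fixed-column DBREF layout of the PDB format (idCode in columns 8-11, chainID in column 13, and so on).

```perl
#! /usr/bin/perl -w

use strict;

# parse_dbref: hypothetical helper that cuts a DBREF record into named
# fields by column position rather than by splitting on whitespace.
sub parse_dbref {
    my $line = shift;
    return unless $line =~ /^DBREF /;

    my %field = (
        id_code      => substr( $line,  7,  4 ),
        chain        => substr( $line, 12,  1 ),
        seq_begin    => substr( $line, 14,  4 ),
        seq_end      => substr( $line, 20,  4 ),
        database     => substr( $line, 26,  6 ),
        accession    => substr( $line, 33,  8 ),
        db_id_code   => substr( $line, 42, 12 ),
        db_seq_begin => substr( $line, 55,  5 ),
        db_seq_end   => substr( $line, 62,  5 ),
    );
    s/\s+//g foreach values %field;    # strip padding from every field
    return \%field;
}

# Two of the records shown above: one with a chain ID, one without.
my @records = (
    "DBREF  1LQT A    1   456  GB     13882996 AAK47528         1    456",
    "DBREF  1AFI      1    72  SWS    P04129   MERP_SHIFL      20     91",
);

foreach my $record ( @records ) {
    my $ref   = parse_dbref( $record );
    my $chain = $ref->{chain} eq "" ? "(none)" : $ref->{chain};
    print "$ref->{id_code} chain $chain -> ",
          "$ref->{database}:$ref->{accession} ",
          "residues $ref->{db_seq_begin}-$ref->{db_seq_end}\n";
}
```

The same approach extends to the chimeric 1M7T entry, where two DBREF records for chain A map different residue ranges to different database sequences.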

REMARK 210
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
REMARK 210

Coordinates section

ATOM      1  N   ARG A   2      26.318  -8.010  39.090  1.00 20.71           N
ANISOU    1  N   ARG A   2     2040   3071   2755    114   -339   -393       N
ATOM      2  CA  ARG A   2      25.150  -8.702  38.505  1.00 18.85           C
ANISOU    2  CA  ARG A   2     2029   2677   2455     67   -321   -209       C
ATOM      3  C   ARG A   2      24.846  -8.176  37.123  1.00 17.23           C
ANISOU    3  C   ARG A   2     1689   2429   2429    143   -282   -258       C
ATOM      4  O   ARG A   2      25.151  -7.048  36.775  1.00 18.14           O
. .
TER    7215      GLY A 456
ATOM   7216  N   ARG B   2     -19.423  25.709   6.980  1.00 21.57           N
ANISOU 7216  N   ARG B   2     2476   3012   2707   -165   -370     95       N
ATOM   7217  CA  ARG B   2     -18.718  26.510   8.024  1.00 19.01           C
ANISOU 7217  CA  ARG B   2     2127   2672   2424    -63   -285     91       C
ATOM   7218  C   ARG B   2     -17.250  26.207   8.002  1.00 17.22           C
ANISOU 7218  C   ARG B   2     1955   2392   2196    -91   -299    121       C
ATOM   7219  O   ARG B   2     -16.851  25.158   7.535  1.00 18.15           O

Data section

TER   14289      GLY B 456
HETATM14290  C   ACT  1866     -13.075   1.733  10.218  1.00 27.25           C
ANISOU14290  C   ACT  1866     3493   3560   3299    -39    -36    -44       C
. .
CONECT14290142911429214293
CONECT1429114290
CONECT1429214290
TER
. .
CONECT1469014663
MASTER      389    0   15   46   38    0    0    620280    2  401   72
END

Data section, cont.

MODEL        1
ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N
ATOM      2  CA  MET A   1       2.546  -3.712  -2.053  1.00  0.00           C
ATOM      3  C   MET A   1       1.134  -3.295  -2.450  1.00  0.00           C
ATOM      4  O   MET A   1       0.882  -2.130  -2.758  1.00  0.00           O
ATOM      5  CB  MET A   1       3.466  -2.491  -2.002  1.00  0.00           C
ATOM      6  CG  MET A   1       3.781  -1.903  -3.370  1.00  0.00           C
ATOM      7  SD  MET A   1       4.256  -0.166  -3.285  1.00  0.00           S
ATOM      8  CE  MET A   1       6.004  -0.307  -2.920  1.00  0.00           C
ATOM      9 1H   MET A   1       2.906  -4.327  -3.980  1.00  0.00           H
ATOM     10 2H   MET A   1       2.650  -5.601  -2.859  1.00  0.00           H
ATOM     11 3H   MET A   1       4.134  -4.738  -2.858  1.00  0.00           H
ATOM     12  HA  MET A   1       2.517  -4.178  -1.079  1.00  0.00           H
ATOM     13 1HB  MET A   1       2.996  -1.724  -1.405  1.00  0.00           H
ATOM     14 2HB  MET A   1       4.397  -2.778  -1.536  1.00  0.00           H
ATOM     15 1HG  MET A   1       4.596  -2.461  -3.807  1.00  0.00           H
ATOM     16 2HG  MET A   1       2.907  -1.993  -3.998  1.00  0.00           H
ATOM     17 1HE  MET A   1       6.344  -1.302  -3.167  1.00  0.00           H
ATOM     18 2HE  MET A   1       6.169  -0.120  -1.869  1.00  0.00           H
ATOM     19 3HE  MET A   1       6.553   0.416  -3.505  1.00  0.00           H
ATOM     20  N   VAL A   2       0.215  -4.256  -2.446  1.00  0.00           N

Data section, cont.

TER    1659      VAL A 107
ENDMDL
MODEL        2
ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N
ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C
. . .
TER    1660      VAL A 107
ENDMDL

Data section, cont.
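An NMR entry repeats the whole coordinate set once per MODEL ... ENDMDL pair, so code that reads every ATOM line will see each atom many times. One possible way to pull out a single conformer (for instance the one named in REMARK 210) is sketched below. The sub name extract_model is an assumption, and the sample data is a heavily abbreviated two-model file built from the records shown above.

```perl
#! /usr/bin/perl -w

use strict;

# extract_model: hypothetical helper returning the ATOM/HETATM lines that
# sit between "MODEL n" and the ENDMDL record that closes it.
sub extract_model {
    my ( $wanted, @lines ) = @_;
    my ( $in_model, @atoms ) = ( 0 );

    foreach my $line ( @lines ) {
        if ( $line =~ /^MODEL\s+(\d+)/ ) {
            $in_model = ( $1 == $wanted );
        }
        elsif ( $line =~ /^ENDMDL/ ) {
            last if $in_model;    # finished the model we wanted
        }
        elsif ( $in_model && $line =~ /^(?:ATOM|HETATM)/ ) {
            push @atoms, $line;
        }
    }
    return @atoms;
}

# Abbreviated sample: two atoms from each of the first two models.
my @file = (
    "MODEL        1",
    "ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N",
    "ATOM      2  CA  MET A   1       2.546  -3.712  -2.053  1.00  0.00           C",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N",
    "ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C",
    "ENDMDL",
);

my @model = extract_model( 2, @file );
print scalar( @model ), " atom records in model 2\n";
print "$_\n" foreach @model;
```

Asking for a model number that is not present simply returns an empty list, which a caller should check for before going further.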

my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                      substr( $_, 38, 8 ),
                      substr( $_, 46, 8 ) );

Extracting 3D co-ordinate data

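#! /usr/bin/perl -w

# simple_coord_extract <PDB File> - Demonstrates the extraction of
#                                   C-Alpha co-ordinates from a PDB
#                                   data-file.

use strict;

while ( <> )
{
    # Columns 14-17 hold the rest of the atom name plus the alternate
    # location flag, so "CA" must be followed by two spaces here.
    if ( /^ATOM/ && substr( $_, 13, 4 ) eq "CA  " )
    {
        my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                              substr( $_, 38, 8 ),
                              substr( $_, 46, 8 ) );

        # Remove the padding that right-justifies each co-ordinate.
        $X =~ s/ //g;
        $Y =~ s/ //g;
        $Z =~ s/ //g;

        print "X, Y & Z: $X, $Y, $Z\n";
    }
}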

The simple_coord_extract program

X, Y & Z: 25.150, -8.702, 38.505
X, Y & Z: 23.675, -8.497, 35.069
X, Y & Z: 20.747, -6.252, 34.332
X, Y & Z: 17.545, -8.297, 34.292
X, Y & Z: 15.182, -7.484, 31.454
X, Y & Z: 11.736, -8.952, 30.942
X, Y & Z: 10.261, -9.014, 27.451
X, Y & Z: 6.507, -9.548, 27.173

Results from simple_coord_extract ...

The graphic image contact map

figCONTACTMAP.eps
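A contact map can be derived from C-alpha co-ordinates by marking every pair of residues whose atoms lie within some distance of each other. The fragment below is a minimal text-only sketch of that idea, not the program behind the figure. The sub names and the 8.0 Angstrom cutoff are assumptions chosen for illustration, and the input is the eight co-ordinates printed by simple_coord_extract.

```perl
#! /usr/bin/perl -w

use strict;

# distance: Euclidean distance between two [ x, y, z ] array references.
sub distance {
    my ( $p, $q ) = @_;
    return sqrt(   ( $p->[0] - $q->[0] ) ** 2
                 + ( $p->[1] - $q->[1] ) ** 2
                 + ( $p->[2] - $q->[2] ) ** 2 );
}

# contact_map: hypothetical helper returning a square matrix holding 1
# where two C-alpha atoms lie within $cutoff Angstroms, 0 otherwise.
sub contact_map {
    my ( $cutoff, @ca ) = @_;
    my @map;
    foreach my $i ( 0 .. $#ca ) {
        foreach my $j ( 0 .. $#ca ) {
            $map[$i][$j] = distance( $ca[$i], $ca[$j] ) <= $cutoff ? 1 : 0;
        }
    }
    return @map;
}

# The eight C-alpha positions printed by simple_coord_extract.
my @ca = (
    [ 25.150, -8.702, 38.505 ], [ 23.675, -8.497, 35.069 ],
    [ 20.747, -6.252, 34.332 ], [ 17.545, -8.297, 34.292 ],
    [ 15.182, -7.484, 31.454 ], [ 11.736, -8.952, 30.942 ],
    [ 10.261, -9.014, 27.451 ], [  6.507, -9.548, 27.173 ],
);

# 8.0 Angstroms is an assumed cutoff, chosen only for illustration.
my @map = contact_map( 8.0, @ca );
foreach my $row ( @map ) {
    print join( "", map { $_ ? "#" : "." } @$row ), "\n";
}
```

The matrix is symmetric with a filled diagonal, so a graphical version only needs to draw one triangle. For a whole chain the same loops apply, at a cost that grows with the square of the number of residues.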

STRIDE: Secondary Structure Assignment

Maxim 10.2

It is often easier and desirable to regenerate database annotation than to trawl through entries reconstituting the annotation using custom code.

$ tar -zxvf stride.tar.gz

$ cd stride

$ make

$ ./stride

Installation of STRIDE

Assigning Secondary Structures

Simplified definition of a Hydrogen Bond

figSIMPLIFIED.eps

Example of Secondary Structure Elements in Proteins

figSSDEMO.eps

Definition of Dihedral angles in the backbone of protein structures

figPSIPSI.eps
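A backbone dihedral angle is the angle between two planes, each defined by three consecutive atoms: phi uses C of the previous residue, N, CA and C, while psi uses N, CA, C and N of the next residue. The sketch below computes such an angle for any four points using the usual cross-product and atan2 formulation. The sub names and the four test points are made up for illustration; this is not how STRIDE itself is implemented.

```perl
#! /usr/bin/perl -w

use strict;

# Small vector helpers working on [ x, y, z ] array references.
sub vec_sub {
    my ( $u, $v ) = @_;
    return [ $u->[0] - $v->[0], $u->[1] - $v->[1], $u->[2] - $v->[2] ];
}

sub dot {
    my ( $u, $v ) = @_;
    return $u->[0] * $v->[0] + $u->[1] * $v->[1] + $u->[2] * $v->[2];
}

sub cross {
    my ( $u, $v ) = @_;
    return [ $u->[1] * $v->[2] - $u->[2] * $v->[1],
             $u->[2] * $v->[0] - $u->[0] * $v->[2],
             $u->[0] * $v->[1] - $u->[1] * $v->[0] ];
}

# dihedral: hypothetical helper returning the signed angle, in degrees,
# between the plane through p1,p2,p3 and the plane through p2,p3,p4.
sub dihedral {
    my ( $p1, $p2, $p3, $p4 ) = @_;

    my $b1 = vec_sub( $p2, $p1 );
    my $b2 = vec_sub( $p3, $p2 );
    my $b3 = vec_sub( $p4, $p3 );

    my $n1 = cross( $b1, $b2 );    # normal to the first plane
    my $n2 = cross( $b2, $b3 );    # normal to the second plane

    my $x = dot( $n1, $n2 );
    my $y = sqrt( dot( $b2, $b2 ) ) * dot( $b1, $n2 );

    my $pi = 4 * atan2( 1, 1 );
    return atan2( $y, $x ) * 180 / $pi;
}

# Four made-up points placed so that the two planes are at right angles.
my $angle = dihedral( [ 1, 0, 0 ], [ 0, 0, 0 ], [ 0, 0, 1 ], [ 0, 1, 1 ] );
printf "Dihedral angle: %.1f degrees\n", $angle;
```

Using atan2 rather than acos keeps the sign of the angle, which is what separates the left and right halves of a Ramachandran plot.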

$ ./stride

You must specify input file

Action: secondary structure assignment
Usage: stride [Options] InputFile [ > file ]
Options:
  -f File     Output file
  -mFile      MolScript file
  -o          Report secondary structure summary Only
  -h          Report Hydrogen bonds
  -rId1Id2..  Read only chains Id1, Id2 ...
  -cId1Id2..  Process only Chains Id1, Id2 ...
  -q[File]    Generate SeQuence file in FASTA format and die

Options are position and case insensitive

$ stride -cA 1lqt.pdb

Using STRIDE and parsing the output

$ gawk '/^ASG/ {print $8 " " $9}' 1lqt.A.stride

360.00 156.52
-75.72 161.36
-71.26 145.24
-111.08 119.10
-118.65 131.78
. .

$ gawk '(/^ASG/ && /Strand/) {print $8 " " $9}' 1lqt.A.stride

$ gawk '(/^ASG/ && /AlphaHelix/) {print $8 " " $9}' 1lqt.A.stride

Using gawk ...
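The gawk one-liners above filter one secondary structure state at a time. A Perl sketch that collects the phi and psi angles for every state in a single pass might look like the following. It assumes, as the gawk commands do, that ASG lines are whitespace-separated with the state name in the seventh field and phi and psi in the eighth and ninth. The sub name angles_by_state is hypothetical, and the three sample lines are hand-made: only the residue names and angles come from the slides, while the state and area columns are invented placeholders, not real STRIDE output.

```perl
#! /usr/bin/perl -w

use strict;

# angles_by_state: hypothetical helper that groups [ phi, psi ] pairs by
# the full secondary structure name found on each ASG line.
sub angles_by_state {
    my %angles;
    foreach my $line ( @_ ) {
        next unless $line =~ /^ASG/;
        my ( $state, $phi, $psi ) = ( split ' ', $line )[ 6, 7, 8 ];
        push @{ $angles{$state} }, [ $phi, $psi ];
    }
    return %angles;
}

# Hand-made sample lines; state and area columns are invented.
my @stride = (
    "ASG  ARG A    2    1    C          Coil    360.00    156.52     100.0      1LQT",
    "ASG  PRO A    3    2    C          Coil    -75.72    161.36     100.0      1LQT",
    "ASG  TYR A    4    3    E        Strand    -71.26    145.24     100.0      1LQT",
);

my %angles = angles_by_state( @stride );
foreach my $state ( sort keys %angles ) {
    foreach my $pair ( @{ $angles{$state} } ) {
        print "$state $pair->[0] $pair->[1]\n";
    }
}
```

Writing each state to its own file would then give one data series per state for a Ramachandran plot.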

Ramachandran Plot of dihedral angles of chain A from 1LQT

fig1LQTPHIPSI.eps

$ stride -q 1lqt.pdb

>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
. .
>1lqt.pdb B 454 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
. .

$ stride -cA -q 1lqt.pdb

>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
. . .

Extracting amino acid sequences using STRIDE

Introducing The mmCIF Protein Format

Converting mmCIF

● Converting mmCIF to PDB

● Converting mmCIFs to PDB with CIFTr

$ cd
$ tar -zxvf ciftr-v2.0-linux.tar.gz
$ cd ciftr-v2.0-linux/
$ setenv RCSBROOT ~/ciftr-v2.0-linux     # csh/tcsh users
$ export RCSBROOT=~/ciftr-v2.0-linux     # sh/bash users

$ ./CIFTr -i 1lqt.cif

The CIFTr program

More on mmCIF

● Problems with the CIFTr conversion

● Some advice on using mmCIF

● Automated conversion of mmCIF to PDB

Where To From Here