Post on 21-Dec-2015
REMARK   2
REMARK   2 RESOLUTION. 1.05 ANGSTROMS.

REMARK 215 NMR STUDY
REMARK 215 THE COORDINATES IN THIS ENTRY WERE GENERATED FROM SOLUTION
REMARK 215 NMR DATA. PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK 215 CRYST1 AND SCALE RECORDS BE INCLUDED, BUT THE VALUES ON
REMARK 215 THESE RECORDS ARE MEANINGLESS.
Example PDB data-file
. . .
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM
REMARK   3   R VALUE     (WORKING + TEST SET) : 0.134
REMARK   3   R VALUE            (WORKING SET) : 0.134
REMARK   3   FREE R VALUE                     : 0.153
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : NULL
REMARK   3   FREE R VALUE TEST SET COUNT      : 2200
. . .
Example PDB data-file, cont.
DBREF  1LQT A    1   456  GB     13882996 AAK47528         1    456
DBREF  1LQT B    1   456  GB     13882996 AAK47528         1    456

DBREF  1AFI      1    72  SWS    P04129   MERP_SHIFL      20     91

DBREF  1M7T A    1    66  SWS    P10599   THIO_HUMAN       0     65
DBREF  1M7T A   67   106  SWS    P00274   THIO_ECOLI      68    107
Database cross references
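Because the chain identifier can be blank (as in the 1AFI record), splitting DBREF lines on whitespace is unreliable. The following is a minimal sketch of a column-based reader, assuming the fixed-column DBREF layout described in the PDB format documentation. The subroutine name `parse_dbref` and the hash keys are hypothetical, chosen only for illustration, and insertion codes are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one DBREF record into named fields.
# Assumed columns: 8-11 ID code, 13 chain, 15-18 first residue,
# 21-24 last residue, 27-32 database, 34-41 accession,
# 43-54 database ID code, 56-60 and 63-67 database residue range.
sub parse_dbref {
    my ($line) = @_;
    return unless defined $line && $line =~ /^DBREF / && length($line) >= 67;

    my @f = unpack 'x7 A4 x1 A1 x1 A4 x2 A4 x2 A6 x1 A8 x1 A12 x1 A5 x2 A5',
        $line;
    s/^\s+// for @f;    # 'A' strips trailing blanks only

    my %ref;
    @ref{qw(idcode chain seq_begin seq_end database
            accession db_idcode db_begin db_end)} = @f;
    return \%ref;
}

# The two 1M7T records: one chain mapped onto two database entries.
my @dbref_lines = (
    'DBREF  1M7T A    1    66  SWS    P10599   THIO_HUMAN       0     65',
    'DBREF  1M7T A   67   106  SWS    P00274   THIO_ECOLI      68    107',
);

for my $line (@dbref_lines) {
    my $ref = parse_dbref($line) or next;
    printf "%s chain %s %s-%s -> %s %s (%s) %s-%s\n",
        @{$ref}{qw(idcode chain seq_begin seq_end database
                   accession db_idcode db_begin db_end)};
}
```

A blank chain field comes back as an empty string, so a caller can test for it explicitly instead of guessing which whitespace-separated token is missing.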
REMARK 210
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 21
REMARK 210
Coordinates section
ATOM      1  N   ARG A   2      26.318  -8.010  39.090  1.00 20.71           N
ANISOU    1  N   ARG A   2     2040   3071   2755    114   -339   -393       N
ATOM      2  CA  ARG A   2      25.150  -8.702  38.505  1.00 18.85           C
ANISOU    2  CA  ARG A   2     2029   2677   2455     67   -321   -209       C
ATOM      3  C   ARG A   2      24.846  -8.176  37.123  1.00 17.23           C
ANISOU    3  C   ARG A   2     1689   2429   2429    143   -282   -258       C
ATOM      4  O   ARG A   2      25.151  -7.048  36.775  1.00 18.14           O
. .
TER    7215      GLY A 456
ATOM   7216  N   ARG B   2     -19.423  25.709   6.980  1.00 21.57           N
ANISOU 7216  N   ARG B   2     2476   3012   2707   -165   -370     95       N
ATOM   7217  CA  ARG B   2     -18.718  26.510   8.024  1.00 19.01           C
ANISOU 7217  CA  ARG B   2     2127   2672   2424    -63   -285     91       C
ATOM   7218  C   ARG B   2     -17.250  26.207   8.002  1.00 17.22           C
ANISOU 7218  C   ARG B   2     1955   2392   2196    -91   -299    121       C
ATOM   7219  O   ARG B   2     -16.851  25.158   7.535  1.00 18.15           O
Data section
TER   14289      GLY B 456
HETATM14290  C   ACT  1866     -13.075   1.733  10.218  1.00 27.25           C
ANISOU14290  C   ACT  1866     3493   3560   3299    -39    -36    -44       C
. .
CONECT14290142911429214293
CONECT1429114290
CONECT1429214290
TER
. .
CONECT1469014663
MASTER      389    0   15   46   38    0    0    620280    2  401   72
END
Data section, cont.
MODEL        1
ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N
ATOM      2  CA  MET A   1       2.546  -3.712  -2.053  1.00  0.00           C
ATOM      3  C   MET A   1       1.134  -3.295  -2.450  1.00  0.00           C
ATOM      4  O   MET A   1       0.882  -2.130  -2.758  1.00  0.00           O
ATOM      5  CB  MET A   1       3.466  -2.491  -2.002  1.00  0.00           C
ATOM      6  CG  MET A   1       3.781  -1.903  -3.370  1.00  0.00           C
ATOM      7  SD  MET A   1       4.256  -0.166  -3.285  1.00  0.00           S
ATOM      8  CE  MET A   1       6.004  -0.307  -2.920  1.00  0.00           C
ATOM      9 1H   MET A   1       2.906  -4.327  -3.980  1.00  0.00           H
ATOM     10 2H   MET A   1       2.650  -5.601  -2.859  1.00  0.00           H
ATOM     11 3H   MET A   1       4.134  -4.738  -2.858  1.00  0.00           H
ATOM     12  HA  MET A   1       2.517  -4.178  -1.079  1.00  0.00           H
ATOM     13 1HB  MET A   1       2.996  -1.724  -1.405  1.00  0.00           H
ATOM     14 2HB  MET A   1       4.397  -2.778  -1.536  1.00  0.00           H
ATOM     15 1HG  MET A   1       4.596  -2.461  -3.807  1.00  0.00           H
ATOM     16 2HG  MET A   1       2.907  -1.993  -3.998  1.00  0.00           H
ATOM     17 1HE  MET A   1       6.344  -1.302  -3.167  1.00  0.00           H
ATOM     18 2HE  MET A   1       6.169  -0.120  -1.869  1.00  0.00           H
ATOM     19 3HE  MET A   1       6.553   0.416  -3.505  1.00  0.00           H
ATOM     20  N   VAL A   2       0.215  -4.256  -2.446  1.00  0.00           N
Data section, cont.
TER    1659      VAL A 107
ENDMDL
MODEL        2
ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N
ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C
. . .
TER    1660      VAL A 107
ENDMDL
Data section, cont.
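NMR entries bracket each conformer with MODEL and ENDMDL records, so a program that wants a single conformer (for example, the one named in REMARK 210) has to keep track of which model it is reading. The following is a minimal sketch of that bookkeeping. The subroutine name `atoms_by_model` is hypothetical, and the sample lines are a shortened copy of the two models shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group ATOM/HETATM records by MODEL number.
# Returns a reference to a hash: model number => array of record lines.
sub atoms_by_model {
    my (@lines) = @_;
    my ( %models, $current );

    for my $line (@lines) {
        if ( $line =~ /^MODEL\s+(\d+)/ ) {
            $current = $1;                 # a new conformer starts here
            $models{$current} = [];
        }
        elsif ( $line =~ /^ENDMDL/ ) {
            undef $current;                # conformer finished
        }
        elsif ( defined $current && $line =~ /^(?:ATOM  |HETATM)/ ) {
            push @{ $models{$current} }, $line;
        }
    }
    return \%models;
}

my @ensemble = (
    'MODEL        1',
    'ATOM      1  N   MET A   1       3.110  -4.682  -3.025  1.00  0.00           N',
    'ATOM      2  CA  MET A   1       2.546  -3.712  -2.053  1.00  0.00           C',
    'ENDMDL',
    'MODEL        2',
    'ATOM      1  N   MET A   1       2.750  -6.779  -1.627  1.00  0.00           N',
    'ATOM      2  CA  MET A   1       2.487  -5.475  -2.290  1.00  0.00           C',
    'ENDMDL',
);

my $models = atoms_by_model(@ensemble);
for my $n ( sort { $a <=> $b } keys %$models ) {
    printf "Model %d: %d atom records\n", $n, scalar @{ $models->{$n} };
}
```

In this sketch, records outside any MODEL/ENDMDL pair are skipped, so a single-model X-ray entry without MODEL records yields an empty hash and needs separate handling.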
my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ), substr( $_, 38, 8 ),
substr( $_, 46, 8 ) );
Extracting 3D co-ordinate data
#! /usr/bin/perl -w
# simple_coord_extract <PDB File> - Demonstrates the extraction of
#                                   C-Alpha co-ordinates from a PDB
#                                   data-file.

use strict;

while ( <> )
{
    # Columns 14-17 hold "CA" followed by two blanks for an alpha
    # carbon that has no alternate-location code.
    if ( /^ATOM/ && substr( $_, 13, 4 ) eq "CA  " )
    {
        my ( $X, $Y, $Z ) = ( substr( $_, 30, 8 ),
                              substr( $_, 38, 8 ),
                              substr( $_, 46, 8 ) );

        $X =~ s/ //g;
        $Y =~ s/ //g;
        $Z =~ s/ //g;

        print "X, Y & Z: $X, $Y, $Z\n";
    }
}
The simple_coord_extract program
X, Y & Z: 25.150, -8.702, 38.505
X, Y & Z: 23.675, -8.497, 35.069
X, Y & Z: 20.747, -6.252, 34.332
X, Y & Z: 17.545, -8.297, 34.292
X, Y & Z: 15.182, -7.484, 31.454
X, Y & Z: 11.736, -8.952, 30.942
X, Y & Z: 10.261, -9.014, 27.451
X, Y & Z: 6.507, -9.548, 27.173
Results from simple_coord_extract ...
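The same fixed-column idea extends to the other fields of an ATOM record. The following is a minimal sketch using Perl's built-in `unpack`, assuming the standard PDB column layout. The subroutine name `parse_atom_record` and the hash keys are hypothetical and are not part of simple_coord_extract.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one ATOM/HETATM record into named fields.
# Assumed columns: 7-11 serial, 13-16 atom name, 17 alternate location,
# 18-20 residue name, 22 chain, 23-26 residue number,
# 31-38 X, 39-46 Y, 47-54 Z.
sub parse_atom_record {
    my ($line) = @_;
    return unless defined $line && $line =~ /^(?:ATOM  |HETATM)/;
    return unless length($line) >= 54;    # must reach the end of the Z field

    my @f = unpack 'x6 A5 x1 A4 A1 A3 x1 A1 A4 x4 A8 A8 A8', $line;
    s/^\s+// for @f;                      # 'A' strips trailing blanks only

    my %rec;
    @rec{qw(serial name altloc resname chain resseq x y z)} = @f;
    $rec{$_} += 0 for qw(x y z);          # treat co-ordinates as numbers
    return \%rec;
}

my @records = (
    'ATOM      1  N   ARG A   2      26.318  -8.010  39.090  1.00 20.71           N',
    'ATOM      2  CA  ARG A   2      25.150  -8.702  38.505  1.00 18.85           C',
);

for my $line (@records) {
    my $atom = parse_atom_record($line) or next;
    next unless $atom->{name} eq 'CA';
    printf "X, Y & Z: %.3f, %.3f, %.3f\n", @{$atom}{qw(x y z)};
}
```

Keeping the alternate-location code as its own field means an alpha carbon is still recognised by name when that column is filled in; the caller then decides which alternate to keep.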
Maxim 10.2
It is often easier and desirable to regenerate database annotation than trawl through entries reconstituting the annotation using custom code.
$ ./stride
You must specify input file
Action: secondary structure assignment
Usage: stride [Options] InputFile [ > file ]
Options:
  -f File        Output file
  -mFile         MolScript file
  -o             Report secondary structure summary Only
  -h             Report Hydrogen bonds
  -rId1Id2..     Read only chains Id1, Id2 ...
  -cId1Id2..     Process only Chains Id1, Id2 ...
  -q[File]       Generate SeQuence file in FASTA format and die
Options are position and case insensitive
$ stride -cA 1lqt.pdb
Using STRIDE and parsing the output
$ gawk '/^ASG/ {print $8 " " $9}' 1lqt.A.stride
360.00 156.52
-75.72 161.36
-71.26 145.24
-111.08 119.10
-118.65 131.78
. .
$ gawk '(/^ASG/ && /Strand/) {print $8 " " $9}' 1lqt.A.stride
$ gawk '(/^ASG/ && /AlphaHelix/) {print $8 " " $9}' 1lqt.A.stride
Using gawk ...
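The gawk filters can also be written in Perl, which is convenient when the angles feed into a larger script. The following is a minimal sketch that assumes, as the gawk commands do, that ASG lines are whitespace-separated with the secondary-structure name in field 7 and the phi and psi angles in fields 8 and 9. The subroutine name `parse_asg` is hypothetical. In the sample lines only the phi/psi values are taken from the gawk output shown earlier; the residue names past the second, the structure assignments and the final columns are placeholders for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the residue, secondary-structure name and
# phi/psi angles out of one STRIDE ASG line (fields counted from 1,
# as in gawk: $7 = structure name, $8 = phi, $9 = psi).
sub parse_asg {
    my ($line) = @_;
    return unless defined $line && $line =~ /^ASG\s/;

    my @f = split ' ', $line;
    return unless @f >= 9;

    return {
        resname => $f[1],
        chain   => $f[2],
        resnum  => $f[3],
        ss      => $f[6],
        phi     => $f[7] + 0,
        psi     => $f[8] + 0,
    };
}

# Placeholder ASG lines (see the note above about which values are real).
my @asg_lines = (
    'ASG  ARG A    2    1    C          Coil    360.00    156.52     0.0      1LQT',
    'ASG  PRO A    3    2    C          Coil    -75.72    161.36     0.0      1LQT',
    'ASG  TYR A    4    3    E        Strand    -71.26    145.24     0.0      1LQT',
);

# Equivalent of: gawk '(/^ASG/ && /Strand/) {print $8 " " $9}'
for my $line (@asg_lines) {
    my $res = parse_asg($line) or next;
    print "$res->{phi} $res->{psi}\n" if $res->{ss} eq 'Strand';
}
```

Comparing the structure name as a whole field avoids accidental matches elsewhere on the line, which the plain `/Strand/` pattern in gawk would accept.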
$ stride -q 1lqt.pdb
>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
. .
>1lqt.pdb B 454 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
. .
$ stride -cA -q 1lqt.pdb
>1lqt.pdb A 452 1.050
RPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIK
. . .
Extracting amino acid sequences using STRIDE
$ cd
$ tar -zxvf ciftr-v2.0-linux.tar.gz
$ cd ciftr-v2.0-linux/
$ setenv RCSBROOT ~/ciftr-v2.0-linux      # csh/tcsh
$ export RCSBROOT=~/ciftr-v2.0-linux      # sh/bash (no spaces around =)
$ ./CIFTr -i 1lqt.cif
The CIFTr program
More on mmCIF
● Problems with the CIFTr conversion
● Some advice on using mmCIF
● Automated conversion of mmCIF to PDB