日本結晶学会年会2011年、北大学術交流会館 PDBjランチョン ... · atom 4 o o...
Transcript of 日本結晶学会年会2011年、北大学術交流会館 PDBjランチョン ... · atom 4 o o...
PDBj (PDB j )日本結晶学会 年会2011年、北大学術交流会館PDBjランチョンセミナー 2011年11月25日
PDBj (PDB japan)日本蛋白質構造デ タバンク日本蛋白質構造データバンク
Haruki NakamuraHaruki Nakamura中村 春木中村 春木
Institute for Protein Research大阪大学蛋白質研究所
http://pdbj.org/ttp://pdbj.o g/http://wwpdb.org/
EMBL‐EBI,
NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK
Wellcome Trust, BBSRC, NIGMS, EU
NLM
NBDC‐JST
Kleywegt, G Berman, HM
Markley, JL Nakamura, H
wwPDBwwPDB and and wwPDBACwwPDBAC members members at EBI, at EBI, HinxtonHinxton, on Sept. 30 2011, on Sept. 30 2011
Collaboration b t PDB d CSDbetween wwPDB and CSD
G. J. Kleywegt J. L. Markley C. Gloom H. M. Berman H. Nakamuraat EBI on September 29, 2011
wwPDBにおける国際協力 RutgersUniv.
UCSD NIST
1) データ編集・登録作業を、wwPDBのメンバ で協力しながら実施する
UCSD NISTRCSB
BMRBのメンバーで協力しながら実施する。
2) 唯 のデ タア カイブを持ち
PDBjEBI
2) 唯一のデータアーカイブを持ち、米国のRCSB-PDBがアーカイブ・キーパーとして書き込み権限をもつ権限をもつ。
3) データ・フォーマットや新たな記述法については3) デ タ・フォ マットや新たな記述法については、各メンバー間内の討議により決定する。(V3.2, 4.0)
4) 各メンバーは、各々独自のデータ・ブラウザ、ビューア、検索ツール、Web サービスを開発することが期待される。
(Berman, Henrick & Nakamura (2003) Nat. Struct. Biol. 10, 980)
検 、 開発す 期待 。
日本蛋白質構造データバンク:PDBj日本蛋白質構造デ タバンク:PDBj
1 国際蛋白質構造データバンク(wwPDB)の運営1.国際蛋白質構造データバンク(wwPDB)の運営
2.蛋白質立体構造データベース登録作業
3.蛋白質構造情報の記述フォーマット(v3.2, 4.0, mmCIF PDBML PDB/RDF)の開発とその応用mmCIF, PDBML, PDB/RDF)の開発とその応用
4.蛋白質構造解析実験および蛋白質機能に関する文献情報 付加る文献情報の付加
5 蛋白質立体構造に関する新規二次データベー5.蛋白質立体構造に関する新規二次デ タベスの構築と解析ツールの開発
PDBj: Protein Data Bank Japan (2011)
Data Processed at PDBj and wwPDBData Processed at PDBj and wwPDBr
num
ber Processed total at PDBj (17,852, Nov 2011)
All available PDB data (77,394, Nov 2011)
Dat
a n
cess
edPr
o
PDBj curates and processes about a Quarter of the
year
PDBj curates and processes about a Quarter of the deposited data, mainly from Asian and Oceania regions.
Get Entry Data from our browserAccess to http://pdbj.org/
12as
PDBID (e.g. 12as) or Key-word is input Summary for each PDBID is in a box and GO displayed.
Get Entry Data from our browser, PDBj MineK d h i JKeyword search in Japanese
アスパラギン合成酵素アスパラギン合成酵素
htt // dbj / A i id (FASTA)
Summary for each PDBIDhttp://pdbj.org/ Amino acid sequence (FASTA)
Data viewer at PDBj
Molecular surface DB: eF-sitehttp://ef-site hgc jp/eF-site/
Graphic viewer: jVhttp://pdbj.org/jV/
http://ef site.hgc.jp/eF site/
Annotation of Protein Function from MolecularAnnotation of Protein Function from Molecular Surface Similarity: eF-site / eF-seek
S f S f S S fProtein Molecular Surface DB Search for Similar Surface
Vi i F ld d D i @ PDBjViewing Folds and Dynamics @ PDBj
Protein Globe: Protein Folds Browser ProMode: Protein Dynamics Database by NMA
Vi i b th EM I d At i St tViewing both EM Image and Atomic Strcuture
EM Navigator: Viewer of Images of EM-DB
Yorodumi: Viewer of both Image and Atomic Structure
PDB/RDF f S ti W bPDB/RDF for Semantic Web (Recently developed by PDBj: Akira R. Kinjo et al.)
http://pdbj org/rdfhttp://pdbj.org/rdf
PDB data in the Resource Description FrameworkPDB data in the Resource Description Framework (RDF) format for the Semantic Web.
PDB/RDF exampleBy accessing http://pdbj.org/rdf/1GOF, a list of category holders for the PDBa list of category holders for the PDB entry 1GOF can be retrieved in the RDF/XML format. Then, a list of category elements canThen, a list of category elements can be retrieved (again in the RDF/XML format). Finally, the a particular category y p g yelement, the list of properties of that element is retrieved.
Example of an RDF graph
A subgraph of the left networkThe network of RDF resources for g paugmented with literal objectsthe PDB entry 1GOF.
Development of other Databases and Services
Homolog protein search, Similar fold search, Function Annotation from Sequence Navigator(Standley)
Structure Navigator(Standley & Toh)
Folds and Sequences, SeSAW (Standley)
Encyclopedia of Protein Structures, eProtS(Kinjyo, Kudo, & Ito)
Alignment of Sequence and Structures. MAFFTash(Kato. Toh & Standley)
Hybrid-template modeling Spanner (Lis, Standley, & Nakamura)
http://biosciencedbc jp/PDBj is a member of NBDC, Japan
http://biosciencedbc.jp/
Roadmap for Foundation of National Bioscience Database Center (NBDC)
Preparation 1st Stage 2nd StageFY2008 FY2011 FY2014
Task Oth Lif S i DB fForce in CO
CSTP Other Life Science DBs fromMETI, MHLW, and MAFF
IntegratedDatabase Project
National Bioscience Database Center
National Bioscience Database Centerj
BIRD-JST
Database Center(as an organization of JST governed by MEXT)
Database Center (as an organization of Cabinet Office)
PDBj PDBj
Organization of National Bioscience gDatabase Center (NBDC)
CSTP in Cabinet Office
Life Science Project Team in CSTP
Headquarter of NBDC in CSTP
AdvisoryCommitteeTeam in CSTP in CSTP Committee
Director of National Bioscience Database Center JST
National Bioscience Database CenterNational Bioscience Database Center DBs
DB DB DB DBDBDBDB
Issue about PDB formatIssue about PDB format1) PDB format is still the most widely used ) y
format
2) H h l li i i2) However, there are several limitations, which become problematic more and more
・For Large and Complex moleculesFor Large and Complex molecules・For Meta-data description・・ …..3) PDBx (mmCIF) or PDBML are for 3) ( C ) o a e omachines, and not very human readable.
Size Matters• PDB column format limitations
1 character for polymer chain labels– 1-character for polymer chain labels – 5-characters for atom serial numbers– 3-characters for monomers and ligand identifiers
5 characters for atom names– 5-characters for atom names – F8.3 for model coordinates
• Implications –– Maximum of 62 chains (upper and lower case!)– Maximum of 99,999 atoms
• Requires splitting structures across multiple entries (5Requires splitting structures across multiple entries (5 ribosomes in ASU stored in 10 PDB entries!)
• Map and experimental validation are difficult for split entries– Cannot use standard monomer & ligand nomenclaturesCannot use standard monomer & ligand nomenclatures
(e.g. carbohydrates & protonation variants)– Cannot use conventional atom names in large ligands
Limits molecular dimension (< 9999 999 Angstroms)– Limits molecular dimension (< 9999.999 Angstroms)
Example: Vault by Kato et al. (2zuo, 2zv4, 2zv5)
Example: Vault by Kato et al. (2zuo, 2zv4, 2zv5)
Example: Vault by Kato et al. (2zuo, 2zv4, 2zv5)
Refinement Details ExampleREMARK 3 DATA USED IN REFINEMENT. REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.57 REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 23.00 REMARK 3 DATA CUTOFF (SIGMA(F)) : 0.000
PDB( ( ))
REMARK 3 COMPLETENESS FOR RANGE (%) : NULL REMARK 3 NUMBER OF REFLECTIONS : 43316 REMARK 3 REMARK 3 FIT TO DATA USED IN REFINEMENT. REMARK 3 CROSS-VALIDATION METHOD : NULL REMARK 3 FREE R VALUE TEST SET SELECTION : NULLREMARK 3 FREE R VALUE TEST SET SELECTION : NULL REMARK 3 R VALUE (WORKING + TEST SET) : NULL REMARK 3 R VALUE (WORKING SET) : 0.191 REMARK 3 FREE R VALUE : 0.221 REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL REMARK 3 FREE R VALUE TEST SET COUNT : 2189 REMARK 3
_refine.entry_id 1XBB _refine.ls_d_res_high 1.57 refine.ls d res low 23.00
mmCIF_refine.ls_d_res_low 23.00 _refine.pdbx_ls_sigma_F 0.0_refine.ls_percent_reflns_obs ? _refine.ls_number_reflns_obs 43316
_refine.pdbx_ls_cross_valid_method ? fi db R F l ti d t il ?_refine.pdbx_R_Free_selection_details ?
_refine.ls_R_factor_obs ? _refine.ls_R_factor_R_work 0.191 _refine.ls_R_factor_R_free 0.221 _refine.ls_percent_reflns_R_free ? _refine.ls_number_reflns_R_free 2189 _ _ _ _ _
New PDB format• Internal data architecture of the wwPDB
New PDB formatInternal data architecture of the PDB is built on mmCIF dictionary technology
wwPDBAnnotation
• There will be continued support for mmCIF/PDBx and PDBML
PDBx/mmCIF
archival formats• There is a critical need to adopt
i l lt ti t th t
SimplifiedFormat
NewFormat PDBML
Current PDB Format
a simple alternative to the current PDB format
Format
Method Specific Atom Records Atom Records Retain Familiar Look and Feel!!BEGIN TABLE DATA atom site xray (record name atom id atom name element residue name chain id residue sequence number ¥!!BEGIN TABLE DATA atom site xray (record name atom id atom name element residue name chain id residue sequence number ¥!!BEGIN_TABLE_DATA atom_site_xray (record_name,atom_id,atom_name,element,residue_name,chain_id,residue_sequence_number,¥insertion_code,Cartn_x,Cartn_y,Cartn_z,formal_charge,model_number,group_id,occupancy,alternate_location,B_isotropic)
ATOM 1 N N ARG A 4 _ 16.757 8.703 13.450 0 1 1 0.620 _ 21.740ATOM 2 CA C ARG A 4 _ 17.014 7.699 12.351 0 1 1 0.620 _ 18.890ATOM 3 C C ARG A 4 _ 16.929 8.224 10.937 0 1 1 0.620 _ 18.590ATOM 4 O O ARG A 4 17.177 7.506 9.922 0 1 1 0.620 16.720
!!BEGIN_TABLE_DATA atom_site_xray (record_name,atom_id,atom_name,element,residue_name,chain_id,residue_sequence_number,¥insertion_code,Cartn_x,Cartn_y,Cartn_z,formal_charge,model_number,group_id,occupancy,alternate_location,B_isotropic)
ATOM 1 N N ARG A 4 _ 16.757 8.703 13.450 0 1 1 0.620 _ 21.740ATOM 2 CA C ARG A 4 _ 17.014 7.699 12.351 0 1 1 0.620 _ 18.890ATOM 3 C C ARG A 4 _ 16.929 8.224 10.937 0 1 1 0.620 _ 18.590ATOM 4 O O ARG A 4 17.177 7.506 9.922 0 1 1 0.620 16.720_ _ATOM 5 CB C ARG A 4 _ 16.060 6.546 12.613 0 1 1 0.620 _ 20.010ATOM 6 CG C ARG A 4 _ 16.429 5.815 13.916 0 1 1 0.620 _ 22.740ATOM 7 CD C ARG A 4 _ 15.282 4.910 14.340 0 1 1 0.620 _ 24.130ATOM 8 NE N ARG A 4 _ 15.103 3.825 13.351 0 1 1 0.620 _ 22.450ATOM 9 CZ C ARG A 4 _ 14.137 2.914 13.579 0 1 1 0.620 _ 24.510
_ _ATOM 5 CB C ARG A 4 _ 16.060 6.546 12.613 0 1 1 0.620 _ 20.010ATOM 6 CG C ARG A 4 _ 16.429 5.815 13.916 0 1 1 0.620 _ 22.740ATOM 7 CD C ARG A 4 _ 15.282 4.910 14.340 0 1 1 0.620 _ 24.130ATOM 8 NE N ARG A 4 _ 15.103 3.825 13.351 0 1 1 0.620 _ 22.450ATOM 9 CZ C ARG A 4 _ 14.137 2.914 13.579 0 1 1 0.620 _ 24.510ATOM 10 NH1 N ARG A 4 _ 13.382 3.040 14.677 0 1 1 0.620 _ 24.220ATOM 11 NH2 N ARG A 4 _ 13.920 1.913 12.748 0 1 1 0.620 _ 24.130ATOM 12 N N LYS A 5 _ 16.357 9.426 10.835 0 1 1 0.620 _ 19.660ATOM 13 CA C LYS A 5 _ 16.206 10.152 9.575 0 1 1 0.620 _ 21.670ATOM 14 C C LYS A 5 _ 15.607 9.229 8.538 0 1 1 0.620 _ 20.880
ATOM 10 NH1 N ARG A 4 _ 13.382 3.040 14.677 0 1 1 0.620 _ 24.220ATOM 11 NH2 N ARG A 4 _ 13.920 1.913 12.748 0 1 1 0.620 _ 24.130ATOM 12 N N LYS A 5 _ 16.357 9.426 10.835 0 1 1 0.620 _ 19.660ATOM 13 CA C LYS A 5 _ 16.206 10.152 9.575 0 1 1 0.620 _ 21.670ATOM 14 C C LYS A 5 _ 15.607 9.229 8.538 0 1 1 0.620 _ 20.880ATOM 15 O O LYS A 5 _ 16.057 8.995 7.416 0 1 1 0.620 _ 20.640ATOM 16 CB C LYS A 5 _ 17.506 10.742 9.045 0 1 1 0.620 _ 23.450ATOM 17 CG C LYS A 5 _ 17.959 11.915 9.911 0 1 1 0.620 _ 26.960ATOM 18 CD C LYS A 5 _ 19.253 12.533 9.415 0 1 1 0.620 _ 30.300ATOM 19 CE C LYS A 5 _ 20.441 11.595 9.492 0 1 1 0.620 _ 31.990ATOM 20 NZ N LYS A 5 21 676 12 193 8 914 0 1 1 0 620 33 340
ATOM 15 O O LYS A 5 _ 16.057 8.995 7.416 0 1 1 0.620 _ 20.640ATOM 16 CB C LYS A 5 _ 17.506 10.742 9.045 0 1 1 0.620 _ 23.450ATOM 17 CG C LYS A 5 _ 17.959 11.915 9.911 0 1 1 0.620 _ 26.960ATOM 18 CD C LYS A 5 _ 19.253 12.533 9.415 0 1 1 0.620 _ 30.300ATOM 19 CE C LYS A 5 _ 20.441 11.595 9.492 0 1 1 0.620 _ 31.990ATOM 20 NZ N LYS A 5 21 676 12 193 8 914 0 1 1 0 620 33 340ATOM 20 NZ N LYS A 5 _ 21.676 12.193 8.914 0 1 1 0.620 _ 33.340!!END_TABLE_DATAATOM 20 NZ N LYS A 5 _ 21.676 12.193 8.914 0 1 1 0.620 _ 33.340!!END_TABLE_DATA
StandardMethod Specific
We would appreciate any comments to the new PDB formatcomments to the new PDB format
Pl d t tPlease send your comments to
http://pdbj.org