Macromolecular Structure Database Project EMSD Infra-structure Services for Europe To develop an autonomous structural database capability in Europe http://www.ebi.ac.uk/msd

Page 1

Macromolecular Structure Database Project EMSD

Infra-structure Services for Europe

To develop an autonomous structural database capability in Europe

http://www.ebi.ac.uk/msd

Page 2

[Diagram: EMSD collaborations and funding]

Projects and resources: Temblor, Spine, Autostruct, NMRQual, CCPN, EHTPX, CCP4, IIMS, SCOP, CATH, Pfam, BMRB, RCSB

Sites: EBI-MSD, Oxford, York, Utrecht, Cambridge, Daresbury, Sanger Institute, USA

Organisations and funders: EMBL, Wellcome Trust, BBSRC, EU, MRC, CLRC

Activities: Integration, Harvesting, E-science, Advanced search, Data Exchange, Validation, Structural Genomics, Electron Microscopy

Legend: Grant & co-ordinator, Grant Funding, Core Funding, Data Exchange

Page 3

E-MSD Provides

clean biological data

integrated data

a single web access point

query interfaces for different users

interconnected views of the data relating structure, sequence, text & experimental details

Page 4

                                             

Web Interface

For Biologist, Chemist, Structural Biologist, Teacher

[Diagram: search flow from data sources to results]

Data sources: SwissProt, Medline, Active Sites, Ligands, Folds (Scop/Dali), Secondary Structure, PDB

Methods: SSM, FastA

Search Query: Ligand, Active site, Structure, Sequence, Keyword

Sorted Hit List

Atlas page: Structure, Sequence, Active Site, Expt data

Query Results and Interactive viewer

Page 5

Web services

Data APIs

Methods - as web services

Data sources: SwissProt, Medline, Active Sites, Ligands, Folds (Scop/Dali), Secondary Structure, PDB

Methods: SSM, FastA

Page 6

Web based pages

Search interfaces

Interactive Visualisation

Page 7

DATA INTEGRATION

Page 8

A Database for all?

MSD SEARCH DATABASE

Page 9

Data integration

We want to include all types of biological data

Structure, Sequence, Textual

Observed biochemistry (BRENDA)

Sequence annotation (PRINTS)

DNA - ORFs, SNPs

But we can’t do everything! So can the Grid allow the integration of data from other sources?

[Diagram: data sources] SwissProt, Medline, Active Sites, Ligands, Folds (Scop/Dali), Secondary Structure, PDB

Page 10

Problems for Grid (1 - Provenance)

We are a funded institute. We have to be seen to be useful or we do not get funded!

Industry needs to be seen - shareholders

Origin of the distributed information: user and funding body need to see who provided the information.

How do we retain and present detail of this?

Page 11

Problem for Grid (2)

We do not know “best practice” in much of biology

Methods: structure alignment, secondary structure...

Data: multiple coordinates, multiple sequence data...

There will be conflict of information

Data/methods have associated validity information - the different data/methods may be only inconsistent in part.

How is conflicting information going to be presented to and filtered for a user?

Who is going to assign data validity?

Page 12

Grid problem (3 - Data access control)

Bioinformatics is fashionable at the moment.

There is a “problem” when something is perceived to be useful, e.g.:

There are about 60,000 patents in the US for the ~30,000 human genes - not a problem yet, but...

This is more than data security:

Will Grid employ some good lawyers?

Will Grid hide information on request - cf. PDB has “hold” status

Will Grid “modify” information on request - cf. Google search result order has been “updated”

Page 13

Summary

We want to be able to provide a scientific service

Web pages and Web services

We would like to be able to expand the results to include information from other data resources.

The 3 issues raised here are only a few among many, but they represent fundamental problems

Page 14

CLEAN DATA: Quaternary structure

[Diagram labels: Chains, Residues, Atoms, Xray Experiment, Assembly, Sub-Assembly, Biology]

Page 15

CLEAN DATA: Example of experimental result

Authors would know structure, we have to derive it at submission

M. Bochtler et al., Nature, 403, 800 (2000)

Asymmetric unit

Page 16

Contains 3 separate molecules - 2 copies of a dodecamer and 1 hexamer

Hexamer Dodecamer

http://pqs.ebi.ac.uk

Assembly
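The asymmetric-unit example above can be sketched as a small data structure. This is only an illustration of the idea of grouping chains into assemblies, not the MSD schema: the class names, the chain labels `C1`..`C30`, and the assumption of one chain per subunit are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Names for the oligomeric states mentioned on the slide.
OLIGOMER_NAMES = {6: "hexamer", 12: "dodecamer"}


@dataclass
class Assembly:
    """One biological molecule, described as a group of chains (hypothetical model)."""
    name: str
    chain_ids: list[str]

    @property
    def oligomeric_state(self) -> str:
        n = len(self.chain_ids)
        return OLIGOMER_NAMES.get(n, f"{n}-mer")


@dataclass
class AsymmetricUnit:
    """What the experiment deposits; assemblies have to be derived from it."""
    assemblies: list[Assembly] = field(default_factory=list)

    def summary(self) -> Counter:
        # Count assemblies by oligomeric state.
        return Counter(a.oligomeric_state for a in self.assemblies)

    def chain_count(self) -> int:
        return sum(len(a.chain_ids) for a in self.assemblies)


def make_chains(start: int, count: int) -> list[str]:
    """Generate placeholder chain labels (assumed, not real PDB chain IDs)."""
    return [f"C{i}" for i in range(start, start + count)]


# 2 copies of a dodecamer and 1 hexamer, assuming one chain per subunit.
unit = AsymmetricUnit([
    Assembly("dodecamer-1", make_chains(1, 12)),
    Assembly("dodecamer-2", make_chains(13, 12)),
    Assembly("hexamer-1", make_chains(25, 6)),
])
```

Under these assumptions `unit.summary()` reports two dodecamers and one hexamer, and `unit.chain_count()` gives the total number of chains that would have to be assigned to assemblies at submission.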

Page 17

RESOLUTION

SLIDING SCALE FOR RULES

Electron density at different resolutions - phenylalanine

The ring is correctly placed into the 1.2 Å data.

This can still be done with confidence in the 2 Å case.

But at 3 Å we already observe a deviation of the centroid of the ring from the correct model.

Clean data

Page 18

1qi3

1rmg

Zscore = (Fit - <Fit>) / sigma

A large positive spike is indicative of a residue which is worse than the average for that residue type in structures of similar resolutions.
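A minimal sketch of the Z-score formula, assuming a lookup of `<Fit>` and sigma per residue type for one resolution range. The reference values, residue identifiers, and the threshold of 2.0 are illustrative assumptions, not values from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceStats:
    """Mean and spread of the fit measure for one residue type (hypothetical values)."""
    mean_fit: float  # <Fit> in the slide's formula
    sigma: float     # standard deviation of Fit


def fit_zscore(fit: float, ref: ReferenceStats) -> float:
    """Zscore = (Fit - <Fit>) / sigma."""
    if ref.sigma <= 0:
        raise ValueError("sigma must be positive")
    return (fit - ref.mean_fit) / ref.sigma


def flag_outliers(residue_fits, reference, threshold=2.0):
    """Return (residue_id, z) for residues whose Z-score exceeds the threshold.

    A large positive Z means the residue fits worse than average for its type.
    The threshold is an assumed cut-off for illustration.
    """
    flagged = []
    for residue_id, residue_type, fit in residue_fits:
        z = fit_zscore(fit, reference[residue_type])
        if z > threshold:
            flagged.append((residue_id, z))
    return flagged


# Illustrative numbers only; not taken from any real structure.
reference = {"PHE": ReferenceStats(mean_fit=0.20, sigma=0.05)}
residue_fits = [("A:12", "PHE", 0.22), ("A:47", "PHE", 0.35)]
outliers = flag_outliers(residue_fits, reference)
```

With these made-up numbers only residue `A:47` is flagged, since its fit value lies about three sigma above the reference mean.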

1f83

Good

Terrible

Page 19

PHENYLALANINE

Geometric outliers

Page 20

Loader

LIGAND DB

Page 21

Site environment DB

Covalent bonds

Coordinate bonds

Hydrogen bonds

Planes

Non-bonding

Electrostatics

Di-sulphide bonds

[Diagram labels: PHE, PHE, O, N, S, ASP, VAL]