BioSimGRID: A GRID Database of Biomolecular Simulations

Mark S.P. Sansom

http://indigo1.biop.ox.ac.uk

mark@biop.ox.ac.uk


Overview

Introduction to biomolecular simulations

Why? Case study – added value from comparisons

How? Progress towards a prototype of BioSimGRID

The future? Towards computational systems biology

www.biosimgrid.org


MD Simulations: from Structure to Dynamics

Molecular simulations as a tool for protein structure analysis

MD – Newtonian simulation of molecular dynamics using an empirical forcefield

Why? - Proteins move

X-ray structure: average structure at 100 K in crystal

MD simulations: dynamics at 300 K in water (& membrane)

Challenge: to relate structural dynamics to biological function


Molecular Dynamics

Describe the forces on all atoms:

bonded (bonds, angles, dihedrals)

non-bonded (van der Waals, electrostatics)

Describe the initial atom positions

Integrate: F = ma (a few million times…)

Result: positions and energies of all atoms during a few nanoseconds

Applications: liquids … peptides … proteins … membranes

Membrane + protein + water = ca. 50,000 atoms

Need for comparative analysis of simulations – GRID data and collaboration

Need for efficient parallelisation – clusters and/or HPC
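The integrate-and-repeat loop described on this slide can be illustrated with a toy example. The sketch below is hypothetical and far simpler than a production force field: three atoms on a line, one harmonic bond standing in for the bonded terms, and Lennard-Jones pairs standing in for the non-bonded terms, integrated with the velocity Verlet scheme (a common MD integrator, assumed here rather than taken from the slide). All parameters are arbitrary reduced units chosen for illustration.

```python
import math

# Toy parameters in arbitrary reduced units (hypothetical, not from any real force field).
K_BOND, R0 = 100.0, 1.0        # harmonic bond: E = 0.5 * K_BOND * (d - R0)**2
EPS, SIGMA = 1.0, 1.0          # Lennard-Jones: E = 4*EPS*((SIGMA/d)**12 - (SIGMA/d)**6)
BONDS = [(0, 1)]               # bonded pairs
NONBONDED = [(0, 2), (1, 2)]   # non-bonded pairs (bonded pairs are excluded)


def _add_pair_force(f, i, j, r, fmag):
    """Apply a central pair force of magnitude fmag (= -dE/dd) along the i-j line."""
    s = math.copysign(1.0, r)  # direction from atom i to atom j in 1D
    f[j] += fmag * s
    f[i] -= fmag * s


def forces_and_energy(x):
    """Return (forces, potential energy) for atoms at 1D positions x."""
    f = [0.0] * len(x)
    energy = 0.0
    for i, j in BONDS:                       # bonded term
        r = x[j] - x[i]
        d = abs(r)
        energy += 0.5 * K_BOND * (d - R0) ** 2
        _add_pair_force(f, i, j, r, -K_BOND * (d - R0))
    for i, j in NONBONDED:                   # non-bonded term
        r = x[j] - x[i]
        d = abs(r)
        sr6 = (SIGMA / d) ** 6
        energy += 4.0 * EPS * (sr6 ** 2 - sr6)
        _add_pair_force(f, i, j, r, 24.0 * EPS * (2.0 * sr6 ** 2 - sr6) / d)
    return f, energy


def velocity_verlet(x, v, m, dt, n_steps):
    """Integrate F = ma; return final positions, velocities and total energy per step."""
    f, pot = forces_and_energy(x)
    energies = []
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]                    # drift
        f, pot = forces_and_energy(x)                                 # new forces
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]  # half kick
        kin = sum(0.5 * mi * vi ** 2 for mi, vi in zip(m, v))
        energies.append(kin + pot)
    return x, v, energies
```

Because velocity Verlet is time-reversible and symplectic, the total energy in `energies` should only fluctuate slightly about its initial value when `dt` is small enough; a steady drift usually means the time step is too large.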


Current Paradigm for MD Simulations

Target selection: literature based; interesting protein/problem

System preparation: highly interactive; slow; idiosyncratic

Simulation: diversity of protocols

Analysis: highly interactive; slow; idiosyncratic

Dissemination: traditional – papers, posters, talks

Archival: ‘archive’ data … and then mislay the tape!


Integrating Simulations and Structural Biology of Proteins

Novel structure (RCSB)

Sequence alignment

Biomedically relevant homologue(s)

Homology model(s)

MD simulations

Biomolecular simulation database

Comparative analysis

Evaluation/refinement of model

Biological and pharmacological simulation & modelling, e.g. drug discovery

[Figure: worked example with panels labelled bacterial K channel, mammalian K channel, dynamics in membrane, drug docking calculations and interaction site dynamics; stages grouped as bioinformatics & structural biology, BioSimGRID and drug discovery]


Comparative Simulations: Drug Receptors

Why? – increase significance of results

Sampling – long simulations and multiple simulations

Sampling via biology – exploiting evolution

Biology emerges from comparisons…

e.g. mammalian receptor vs. bacterial binding protein
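The point that multiple simulations increase the significance of results can be made concrete with elementary statistics. A minimal sketch, assuming each independent simulation yields one summary value of some observable (for example a mean RMSD) and that these values are roughly normally distributed: the standard error of the mean shrinks as 1/√n with the number of runs n. The helper names are hypothetical, and the `k = 2` threshold is a crude rule of thumb, not a formal significance test.

```python
import math
import statistics


def replica_mean_and_sem(values):
    """Mean and standard error of the mean of one observable over independent runs."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent simulations")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)   # SEM shrinks as 1/sqrt(n)
    return mean, sem


def conditions_differ(values_a, values_b, k=2.0):
    """Crude check: do two conditions differ by more than k combined standard errors?"""
    mean_a, sem_a = replica_mean_and_sem(values_a)
    mean_b, sem_b = replica_mean_and_sem(values_b)
    return abs(mean_a - mean_b) > k * math.hypot(sem_a, sem_b)
```

With only one run per condition no standard error can be estimated at all, which is why the function refuses fewer than two values.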

Rat GluR2 EC fragment: major receptor in mammalian brains – drug target

MD simulations with/without bound ligands

Analyse inter-domain motions

[Figure: GluR2 EC fragment showing domains S1 and S2 and bound glutamate]


GluR2 – Flexibility & Gating…

Flexibility depends on ligand occupancy & species

Gating mechanism – decrease in flexibility on channel activation

But … incomplete sampling. Need: longer simulations & comparative simulations

Flexibility: empty >> kainate > glutamate, i.e. from “OFF” to “ON”

[Figure: RMSD vs. time (ns, 0 to 2.0) for the empty, +Kai and +Glu simulations]
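RMSD traces of the kind plotted on this slide are usually computed frame by frame after least-squares superposition onto a reference structure. The sketch below assumes the Kabsch algorithm implemented with NumPy, and that each frame is an (N, 3) array of matching atoms (for example Cα atoms); the slide does not say which atoms or fitting procedure were used, and `kabsch_rmsd` and `rmsd_series` are hypothetical helper names.

```python
import numpy as np


def kabsch_rmsd(frame, ref):
    """RMSD (same units as input) between two (N, 3) arrays after optimal superposition."""
    P = frame - frame.mean(axis=0)            # remove translation
    Q = ref - ref.mean(axis=0)
    H = P.T @ Q                               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def rmsd_series(trajectory, ref):
    """RMSD of every frame in a trajectory relative to a reference structure."""
    ref = np.asarray(ref, dtype=float)
    return [kabsch_rmsd(np.asarray(f, dtype=float), ref) for f in trajectory]
```

Fitting first matters: without it, rigid-body drift and rotation of the whole protein in the water box would inflate the RMSD and mask the internal flexibility being compared.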


GlnBP – A Bacterial Binding Protein

GlnBP – bacterial 2-domain periplasmic binding protein

Similar fold to mammalian GluR2

X-ray shows ligand binding induces domain closure

MD shows ligand binding reduces inter-domain motions - cf. GluR2 simulations

[Figure: X-ray structures of empty and Gln-bound GlnBP (+ Gln), and MD simulations of the empty and Gln-bound protein]
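The slide does not say how inter-domain motion was measured. One simple, illustrative measure (an assumption, not necessarily the method used in the study) is the distance between the centroids of the two domains in each frame, with its standard deviation over the trajectory as a single flexibility number. The helper names and the domain index lists are hypothetical.

```python
import math
import statistics


def centroid(coords, indices):
    """Geometric centre of the atoms listed in indices; coords is a list of (x, y, z)."""
    n = len(indices)
    return tuple(sum(coords[i][k] for i in indices) / n for k in range(3))


def interdomain_distance_series(frames, domain1, domain2):
    """Distance between the two domain centroids in every frame."""
    return [math.dist(centroid(f, domain1), centroid(f, domain2)) for f in frames]


def interdomain_fluctuation(frames, domain1, domain2):
    """Population standard deviation of the inter-domain distance (smaller = less motion)."""
    return statistics.pstdev(interdomain_distance_series(frames, domain1, domain2))
```

Comparing `interdomain_fluctuation` for empty and ligand-bound trajectories gives one number per simulation, which is easy to store and compare across many deposited runs.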


Main Initial Tasks

To establish a distributed database environment

To develop Grid/Web services using GT3/OGSA infrastructure

To develop software tools for interrogation and data-mining

To develop generic analysis tools

To annotate simulation data with biological and structural data from other databases

Collaborating groups: York, Nottingham, Birmingham, Oxford, RAL, Southampton, London


Dividing up the Tasks

• Oxford
– database management system (Bing Wu)
– (meta)data curatorship & integration (Kaihsu Tai)

• Southampton
– application programming interface & data retrieval (Muan Hong Ng)
– generic analysis tools (Stuart Murdock)


Database Design: Simplified

table trajectory: one entry for each trajectory

table coordinate: {x, y, z}, one entry for each atom in each residue in each frame in each trajectory

table atom: one entry for each atom in each residue in each trajectory

table residue: one entry for each residue in each trajectory

table frame: one entry for each frame in each trajectory

dictionary tables, metadata tables
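The five core tables above can be sketched as a relational schema. The following uses Python's built-in `sqlite3` module purely for illustration; the column names (`trajectory_id`, `seq_num`, `time_ps`, and so on) and the helper `atom_track` are hypothetical and not taken from the actual BioSimGRID schema. The query shows the kind of join needed to pull one atom's coordinates out of a trajectory frame by frame.

```python
import sqlite3

# Hypothetical, simplified schema mirroring the five core tables on this slide.
# Column names are illustrative assumptions, not the actual BioSimGRID schema.
SCHEMA = """
CREATE TABLE trajectory (
    trajectory_id INTEGER PRIMARY KEY,
    description   TEXT
);
CREATE TABLE frame (                    -- one row per frame in each trajectory
    frame_id      INTEGER PRIMARY KEY,
    trajectory_id INTEGER NOT NULL REFERENCES trajectory(trajectory_id),
    frame_num     INTEGER NOT NULL,
    time_ps       REAL
);
CREATE TABLE residue (                  -- one row per residue in each trajectory
    residue_id    INTEGER PRIMARY KEY,
    trajectory_id INTEGER NOT NULL REFERENCES trajectory(trajectory_id),
    name          TEXT NOT NULL,
    seq_num       INTEGER NOT NULL
);
CREATE TABLE atom (                     -- one row per atom in each residue
    atom_id       INTEGER PRIMARY KEY,
    residue_id    INTEGER NOT NULL REFERENCES residue(residue_id),
    name          TEXT NOT NULL
);
CREATE TABLE coordinate (               -- one row per atom per frame
    frame_id INTEGER NOT NULL REFERENCES frame(frame_id),
    atom_id  INTEGER NOT NULL REFERENCES atom(atom_id),
    x REAL NOT NULL, y REAL NOT NULL, z REAL NOT NULL,
    PRIMARY KEY (frame_id, atom_id)
);
"""


def atom_track(conn, trajectory_id, residue_seq, atom_name):
    """Return [(time_ps, x, y, z), ...] for one atom, in frame order."""
    sql = """
        SELECT f.time_ps, c.x, c.y, c.z
        FROM coordinate AS c
        JOIN frame   AS f ON f.frame_id   = c.frame_id
        JOIN atom    AS a ON a.atom_id    = c.atom_id
        JOIN residue AS r ON r.residue_id = a.residue_id
        WHERE f.trajectory_id = ? AND r.trajectory_id = ?
          AND r.seq_num = ? AND a.name = ?
        ORDER BY f.frame_num
    """
    params = (trajectory_id, trajectory_id, residue_seq, atom_name)
    return conn.execute(sql, params).fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")`, `conn.executescript(SCHEMA)` and inserting rows into the five tables, `atom_track(conn, 1, 1, "CA")` returns the time and {x, y, z} of that atom in frame order. The coordinate table holds one row per atom per frame, so it grows quickly: a 50,000-atom system stored for, say, 1,000 frames already gives 5 × 10^7 rows.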


Database Design: A More Complete Version


Simulation Metadata

Difficult to extract from published literature

This is a prototype: a needs analysis with users/depositors must be conducted

Annotation/links to other biological databases essential

metadata table fields: id, molecules, author, depositors, affiliations, publications, method, src_stru, ref_stru, prog, ver, hardware, num_of_proc, timestep, num_of_frame, ens_type, thermostat, solvent, forcefield, ele_stat, equ_prot, hyd_atom, unit_shape, …
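Structured metadata is what makes it possible to select comparable simulations across depositors. As a hedged illustration, the sketch below models a subset of the fields listed above as a Python dataclass and stores them in an SQLite table. The field types, the assumed time-step unit, and the helpers `create_table`, `deposit` and `find` are assumptions for this example, not BioSimGRID's interface.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class SimulationMetadata:
    """A subset of the slide's metadata fields (types and units are assumptions)."""
    id: int
    molecules: str
    author: str
    prog: str            # simulation program
    ver: str             # program version
    forcefield: str
    solvent: str
    thermostat: str
    ens_type: str        # ensemble type, e.g. "NPT"
    timestep: float      # assumed fs; the unit is not stated on the slide
    num_of_frame: int
    num_of_proc: int


COLUMNS = [f.name for f in fields(SimulationMetadata)]


def create_table(conn):
    conn.execute(f"CREATE TABLE metadata ({', '.join(COLUMNS)}, PRIMARY KEY (id))")


def deposit(conn, record):
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"INSERT INTO metadata VALUES ({placeholders})", astuple(record))


def find(conn, **criteria):
    """Return ids of simulations whose metadata match every field=value pair given."""
    unknown = set(criteria) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    where = " AND ".join(f"{name} = ?" for name in criteria) or "1"
    sql = f"SELECT id FROM metadata WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]
```

A call such as `find(conn, forcefield=..., solvent=...)` then returns every deposited simulation sharing those settings. Column names are checked against the dataclass fields before being placed in the SQL string, while values are always passed as bound parameters.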


Database Editor & SQL Query Capability


BioSimGRID Prototype

Target date for prototype: July 2003


Deliverables to Date…

• Database schema
• Sample database (with test trajectories)
• Prototype shared between 2 sites
• Analysis tools – preliminary versions
• Interface to database for data retrieval
• Python hosting environment


Roadmap

December 2002 – project started

July 2003 – (internal) prototype

September 2003 – working prototype (All Hands meeting)

November 2003 – test ‘real world’ applications

December 2003 – multi-site prototype

2004 – multi-site deposition of data

2005 – open up to additional groups for deposition/testing


Future Directions

HTMD – simulations coupled to structural genomics

Diamond Light Source

Computational systems biology – virtual outer membrane

HPCx

Multiscale biomolecular simulations – from QM/MM to meso-scale modelling

GRID-enabled simulations

Combine all of these with BioSimGRID…


Structural Genomics & HTMD

Overall vision – simulation as an integral component of structural genomics

Needs capacity computing – GRID?

MD database (distributed) – BioSimGRID

[Figure: schematic linking synchrotron, compute GRID, MD database and novel biology…]


Towards a Virtual Outer Membrane (vOM)

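[Figure: schematic of the bacterial outer membrane (OM) showing OmpT, OmpX, OmpA, OmpF, PhoE, FhuA, TolC, LamB, OMPLA, OpcA and TonB, together with FhuD, MalE and PiBP; labels also include Pi]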

First step towards computational systems biology – a suitable system

Bacterial OMs – 5 or 6 proteins = 90% of protein content

Structures or good homology models of proteins are available

Complex lipid – outer leaflet is lipopolysaccharide (LPS)

Minimum system size ca. 2.5 × 10^6 atoms; simulation times ca. 50 ns

cf. current FhuA – 80,000 atoms & 10 ns – need HPCx
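A rough sense of the jump from the current FhuA simulation to a minimal vOM system comes from comparing system sizes and simulated times. This assumes (our simplification, not the slide's) that cost scales linearly with both; in practice long-range electrostatics makes the scaling somewhat worse than linear in atom count.

```latex
\frac{N_{\mathrm{vOM}}}{N_{\mathrm{FhuA}}} = \frac{2.5\times10^{6}}{8\times10^{4}} \approx 31, \qquad
\frac{t_{\mathrm{vOM}}}{t_{\mathrm{FhuA}}} = \frac{50\ \mathrm{ns}}{10\ \mathrm{ns}} = 5, \qquad
\frac{\mathrm{cost}_{\mathrm{vOM}}}{\mathrm{cost}_{\mathrm{FhuA}}} \approx 31 \times 5 \approx 1.6\times10^{2}
```

That is, roughly two orders of magnitude more computation than a simulation that already needs HPCx.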


Multiscale Biomolecular Simulations

Membrane-bound enzymes – major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)

Complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases

Need for GRID-based integrated simulations

Acknowledgements

Oxford

Dr Phil Biggin

Dr Carmen Domene

Dr Alessandro Grottesi

Dr Andrew Hung

Dr Daniele Bemporad

Dr Shozeb Haider

Dr Kaihsu Tai

Dr Bing Wu

George Patargias

Oliver Beckstein

Yalini Pathy

Pete Bond

Jonathan Cuthbertson

Sundeep Deol

Jeff Campbell

Loredana Vaccaro

Jennifer Johnston

Katherine Cox

Robert d’Rozario

John Holyoake

Andrew Pang

BBSRC

DTI

The Wellcome Trust

GSK

EC (TMR)

OeSC (EPSRC & DTI)

EPSRC

OSC (JIF)

MRC

BioSimGRID

Leo Caves (York)

Simon Cox (Southampton)

Jon Essex (Southampton)

Paul Jeffreys (Oxford)

Charles Laughton (Nottingham)

David Moss (Birkbeck)

Oliver Smart (Birmingham)

Southampton

Dr Stuart Murdock

Dr Muan Hong Ng

Dr Richard Maurer

Dr Hans Fangohr

Steve Johnston