BioSimGRID and BioSimGRID ’lite’ - Towards a worldwide repository for biomolecular simulation ...

BioSimGRID and BioSimGRID ’lite’ - Towards a worldwide repository for biomolecular simulation www.biosimgrid.org Philip C Biggin http:// indigo1.biop.ox.ac.uk [email protected]


BioSimGRID and BioSimGRID ’lite’

-Towards a worldwide repository for biomolecular simulation

www.biosimgrid.org

Philip C Biggin

http://indigo1.biop.ox.ac.uk

[email protected]

Overview

• Introduction
- Motivation
- Consortium
- Case studies – added value from comparisons

• Design
- Architecture
- Data schema

• How to use
- Deposition
- Analysis
- Worldwide application

• The Future
- Towards computational systems biology

Current Paradigm for MD Simulations

Target selection: literature based; interesting protein/problem

System preparation: highly interactive; slow; idiosyncratic

Simulation: diversity of protocols

Analysis: highly interactive; slow; idiosyncratic

Dissemination: traditional – papers, posters, talks

Archival: ‘archive’ data … and then mislay the tape!

No third party involvement

Integrating Simulations and Structural Biology of Proteins

Novel structure (RCSB)

Sequence alignment

Biomedically relevant homologue(s)

Homology model(s)

MD simulations

Biomolecular simulation database

Comparative analysis

Evaluation/refinement of model

Biological and pharmacological simulation & modelling, e.g. drug discovery

bacterial K channel

mammalian K channel

dynamics in membrane

drug docking calculations

Interaction site dynamics

Diagram labels: bioinformatics & structural biology; BioSimGRID; drug discovery

Consortium

Sites on map: York, Nottingham, Oxford, RAL, Southampton, London, Bristol

• Oxford: Mark Sansom, Paul Jeffreys, Bing Wu, Kaihsu Tai

• Southampton: Jon Essex, Simon Cox, Stuart Murdock, Muan Hong Ng, Hans Fangohr, Steven Johnston

• London: David Moss

• Nottingham: Charlie Laughton

• York: Leo Caves

• Bristol: Adrian Mulholland

Comparative Simulations: Drug Receptors

Why? – increase significance of results

Sampling – long simulations and multiple simulations

Sampling via biology – exploiting evolution

Biology emerges from comparisons…

e.g. mammalian receptor vs. bacterial binding protein

Rat GluR2 EC fragment

Major receptor in mammalian brains – drug target

MD simulations with/without bound ligands

Analyse inter-domain motions

Figure labels: glutamate; D1; D2

GluR2 – Flexibility & Gating…

Flexibility depends on ligand occupancy & species

Gating mechanism – decrease in flexibility on channel activation

But … incomplete sampling

Need: longer simulations & comparative simulations

empty >> Kainate > Glutamate (“OFF” … “ON”)

Figure: RMSD against time (ns) for the empty, +Kai and +Glu simulations

GlnBP – A Bacterial Binding Protein

GlnBP – bacterial 2-domain periplasmic binding protein

Similar fold to mammalian GluR2

X-ray shows ligand binding induces domain closure

MD shows ligand binding reduces inter-domain motions - cf. GluR2 simulations

Figure panels: X-ray structures (empty; Gln bound, + Gln); MD simulation (empty; Gln bound)

Case Study 2

Acetylcholinesterase (AChE)

Outer-membrane phospholipase (OMPLA)

So how do we compare…

Similar active sites or similar motions

Different structures

Simulated with different MD packages (analysis difficult if not visualization)

On different hard drives/tapes/CDs/DVDs.

Under different graduate students’ desks

Under different postdocs’ beds

In different rubbish bins!

BioSimGrid = BioSimDB + Toolkits + Integration

Answer…

Create a worldwide repository of molecular simulations…

BioSimGrid Architecture…

Layers: GUI; Service; DB/Data

GUI: Web Application; Python Application

Apache / Tomcat / SSL / Python

Authentication, Authorisation, Accounting

Service tools: DataRetrievalTool; AnalysisTool; HTML Generator; DataDepositionTool; SQLEditor; Trajectory Query Tool; Video/Img Engine

BioSim Data Engine / Storage Resource Broker

Connections: HTTP(S); SSH; TCP/IP

Middle-ware

DB/Data: Database; Flat Files

                     DB       Flat File
Size / GB            7.5      3.0
Random Access /s     560.8    18.6
Sequential Access    389.0    5.5
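The ratios implied by these figures can be worked out directly. The short sketch below only restates the numbers from the slide; it makes no claim about which direction is "better", since the slide does not define the units beyond "/s".

```python
# Figures copied from the slide: (database, flat file).
size_gb = (7.5, 3.0)
random_access = (560.8, 18.6)
sequential_access = (389.0, 5.5)


def ratio(pair):
    """Return the database value divided by the flat-file value."""
    db, flat = pair
    return db / flat


for label, pair in [
    ("size", size_gb),
    ("random access", random_access),
    ("sequential access", sequential_access),
]:
    print(f"{label}: DB / flat file = {ratio(pair):.1f}")
```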

• BioSimDB = PDB (or NDB) for MD

Enables discovery of new science (cf. genomics/proteomics initiatives)

Cross-software Analysis…

BioSimDB

CHARMM

AMBER

NAMD

LAMMPS

TINKER

GROMACS

It’s a Distributed Database

Nobody has enough disk space in one place anyway

Distributed and duplicate

Any piece of information is stored in at least two sites

…for resilience
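The "at least two sites" rule can be pictured as a placement function. This is a minimal sketch, not the BioSimGrid implementation: the hashing scheme, the function name `choose_sites` and the item identifier are assumptions made for illustration; only the two host names appear on the slides.

```python
import hashlib


def choose_sites(item_id, sites, copies=2):
    """Pick `copies` distinct sites for an item, deterministically.

    Hypothetical helper: hashes the item id to choose a starting site,
    then takes consecutive sites so that no site is chosen twice.
    """
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    ordered = sorted(sites)
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(ordered)
    # Walk forward from the starting position, wrapping around the list.
    return [ordered[(start + i) % len(ordered)] for i in range(copies)]


sites = ["oxford.biosimgrid.org", "soton.biosimgrid.org"]
placement = choose_sites("trajectory-0001", sites)
print(placement)
```

With only two sites, every item lands on both. The function matters once further sites join.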

Current Architecture

Two sites: oxford.biosimgrid.org and soton.biosimgrid.org

Components shown at each site:

BioSim Data Engine Services

DB Interface; DB Engine; Database

F/F Interface; F/F Engine; Flat Files

Cache

SRB Agent

SRB Server; MCAT; IDA

Data Schema

The hierarchy is like that in the PDB: chain > residue > atom > coordinate

…but also extended in the time dimension: frames
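A hedged sketch of that hierarchy in Python follows. The class and field names are hypothetical, chosen only to illustrate the nesting; the actual BioSimGrid schema is a database schema and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coordinate = Tuple[float, float, float]


@dataclass
class Atom:
    name: str


@dataclass
class Residue:
    name: str
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)


@dataclass
class Frame:
    time_ps: float
    # One coordinate per atom, in the order the atoms appear in the chains.
    coordinates: List[Coordinate] = field(default_factory=list)


@dataclass
class Trajectory:
    chains: List[Chain]
    frames: List[Frame] = field(default_factory=list)

    def atom_count(self) -> int:
        """Total number of atoms across all chains and residues."""
        return sum(len(r.atoms) for c in self.chains for r in c.residues)


chain = Chain("A", [Residue("GLY", [Atom("N"), Atom("CA")])])
traj = Trajectory(chains=[chain])
traj.frames.append(Frame(time_ps=0.0, coordinates=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]))
print(traj.atom_count(), len(traj.frames))
```

The topology (chain, residue, atom) is stored once. Only the coordinates repeat per frame, which is what extends the PDB-style hierarchy in time.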

Metadata

…is the data about data

MD setup, parameters, instantaneous properties, etc.

People currently write this in papers

People forget something

The disciplined way:-

…structured schema
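One way to picture a structured schema is a record with required fields, so that an omission is caught at deposition time rather than discovered after publication. The field names below are hypothetical and are not taken from the BioSimGrid schema.

```python
from dataclasses import dataclass, fields


@dataclass
class SimulationMetadata:
    """Illustrative metadata record; every field is required."""
    package: str          # name of the MD package used
    force_field: str
    temperature_k: float
    timestep_fs: float
    length_ns: float


def missing_fields(record):
    """Return the names of required fields that are absent from `record`."""
    required = [f.name for f in fields(SimulationMetadata)]
    return [name for name in required if name not in record]


# A draft deposition in which two parameters were forgotten.
draft = {"package": "GROMACS", "temperature_k": 300.0, "length_ns": 2.0}
print(missing_fields(draft))
```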

Deposition…

Unified deposition for trajectories from any package.

Analysis

• Analysis tools

BioSimDB Toolkit

Radius of Gyration

Surface and Volume

RMSD/RMSF

Centre of Mass

Inter-atomic distances

Distance matrix

Internal angles

Principal Component Analysis

Average structure
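Several of these quantities follow from short formulas. The sketch below is a plain-Python illustration of centre of mass, radius of gyration and RMSD. It is not the BioSimGrid toolkit code, the function names are assumptions, and the RMSD here is computed without any prior superposition of the two frames.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for c, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from the centre of mass."""
    com = centre_of_mass(coords, masses)
    weighted = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for c, m in zip(coords, masses)
    )
    return math.sqrt(weighted / sum(masses))


def rmsd(frame_a, frame_b):
    """RMSD between two frames with identical atom order (no fitting)."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(
        sum((a[axis] - b[axis]) ** 2 for axis in range(3))
        for a, b in zip(frame_a, frame_b)
    )
    return math.sqrt(squared / len(frame_a))


# Two-atom toy system with equal masses; frame1 is frame0 shifted along z.
frame0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
masses = [1.0, 1.0]
print(centre_of_mass(frame0, masses))
print(radius_of_gyration(frame0, masses))
print(rmsd(frame0, frame1))
```

In trajectory analysis the frames are normally least-squares fitted to a reference before RMSD is taken. That step is left out here to keep the example short.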

Current Implementation

New workflow with BioSimGrid

Target selection: literature based; interesting protein/problem

Perform simulation (or use someone else’s)

Protocols more systematically recorded/checked/confirmed

Archive data to BioSimGrid

Analyse shared data (either locally or distributed)

Dissemination: traditional – papers, posters, talks

Store results in BioSimGrid

Third parties can analyse data you deposit

That’s dandy - but who is this aimed at?

• Novice and Expert

Novice (web/GUI)
- Makes selections
- Guided through the options
- Can only do specific things
- Difficult to make mistakes

Expert (employ scripting)
- Python interpreter
- Much available
- Reasonably unrestricted

Example sessions


Even in script mode the syntax is quite informative:-

FC = FrameCollection('2, 100-200')

myRMSD = RMSD(FC)

myRMSD.createPNG()

Provides biochemists who have little computational experience with a means of analysing computational data and obtaining meaningful results.
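The slide does not say how the selection string passed to `FrameCollection` is interpreted. Assuming, purely for illustration, that such a string lists comma-separated indices and inclusive ranges, a parser might look like the sketch below. `parse_selection` is a hypothetical helper, not a BioSimGrid function, and the string could equally denote a trajectory identifier followed by a frame range.

```python
def parse_selection(spec):
    """Expand a string such as '2, 100-200' into a list of integers.

    Assumed grammar (not confirmed by the slides): comma-separated items,
    each either a single non-negative integer or an inclusive 'start-end' range.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if end < start:
                raise ValueError(f"descending range: {part!r}")
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    return indices


selected = parse_selection("2, 100-200")
print(len(selected), selected[:3])
```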

Example sessions

Viewlet of a session; Demo4.html

BioSimGrid ‘Lite’

Light version before final rollout

Provides equilibrated lipid bilayer boxes

Also provides ontogeny: How the box came about…

…metadata

…equilibration process (all the frames)

Deliverables to Date…

• Database schema

• Sample database (with test trajectories)

• Prototype shared between 2 sites

• Analysis tools – preliminary versions (about 14 tools)

• Interface to database for data retrieval

• Python hosting environment

Roadmap

Dec 2002 – project started

July 2003 – (internal) prototype

September 2003 – working prototype (All Hands meeting)

November 2003 – test ‘real world’ applications

December 2003 – multi-site prototype

2004 – multi-site deposition of data

2005 – open up to additional groups for deposition/testing

If you are interested…

The team would like to hear from interested parties, especially those with new ideas.

Benefits to you

- New directions are implemented
- Toolkit suits your needs
- Shared development of code
- Faster and more thorough development

BioSimGrid Benefits

- Larger user community
- More work gets done
- Code is efficient

BioSimGrid and the community are successful

Future Directions in the GRID context

1. HTMD – simulations coupled to structural genomics

Diamond light source

2. Computational system biology – virtual outer membrane

HPCx

3. Multiscale biomolecular simulations – from QM/MM to meso-scale modelling

GRID-enabled simulations


BioSimGrid

Structural Genomics & HTMD

Overall vision – simulation as an integral component of structural genomics

Needs capacity computation – GRID?

MD database (distributed) – BioSimGRID

synchrotron

MD database

novel biology…

compute GRID

Towards a Virtual Outer Membrane (vOM)

Figure labels: OmpT, OmpX, OmpA, OmpF, PhoE, FhuA, TolC, LamB, FhuD, MalE, PiBP, OMPLA, OpcA, TonB, Pi

First step towards computational systems biology – a suitable system

Bacterial OMs – 5 or 6 proteins = 90% of protein content

Structures or good homology models of proteins are available

Complex lipid – outer leaflet is lipopolysaccharide (LPS)

Minimum system size ca. 2.5 x 10^6 atoms; simulation times ca. 50 ns

cf. current FhuA – 80,000 atoms & 10 ns – need HPCx
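The gap between the current FhuA simulation and the target system can be made concrete. The estimate below assumes, as a simplification, that cost is proportional to atom count multiplied by simulated time; real MD scaling depends on the algorithm and the hardware.

```python
# Figures from the slide.
target_atoms = 2.5e6      # minimum vOM system size
target_ns = 50.0          # target simulation time
current_atoms = 80_000    # current FhuA system
current_ns = 10.0         # current FhuA simulation time


def cost_factor(atoms_a, ns_a, atoms_b, ns_b):
    """Ratio of (atoms x simulated time) between system A and system B.

    Assumes cost grows linearly in both quantities, which is a
    simplification rather than a measured scaling law.
    """
    return (atoms_a * ns_a) / (atoms_b * ns_b)


factor = cost_factor(target_atoms, target_ns, current_atoms, current_ns)
print(f"atoms: x{target_atoms / current_atoms:.2f}")
print(f"time: x{target_ns / current_ns:.0f}")
print(f"combined: x{factor:.2f}")
```

Under that assumption the target is roughly 156 times the size of the current calculation (31.25 times the atoms, 5 times the duration).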

Multiscale Biomolecular Simulations

Membrane bound enzymes – major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)

Complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases

Need for GRID-based integrated simulations

QM (Bristol)

Drug-binding (Southampton)

Protein Motions (Oxford)

Drug Diffusion (London)

References…

1. K. Tai, S. Murdock, B. Wu, M.H. Ng, S. Johnston, H. Fangohr, S. Cox, P. Jeffreys, J. Essex, M.S.P. Sansom. Org. Biomol. Chem. Under review.

2. M.H. Ng, S. Johnston, S. Murdock, B. Wu, K. Tai, H. Fangohr, S. Cox, J. Essex, M.S.P. Sansom, P. Jeffreys. UK E-Science Programme All Hands Meeting 2004. Accepted.

3. Python Website – www.python.org

4. BioSimGrid – www.biosimgrid.org

Elsewhere

Leo Caves (York)

Charles Laughton (Nottingham)

David Moss (Birkbeck)

Oliver Smart (Birmingham)

Adrian Mulholland (Bristol)

Marc Baaden (Paris)

Southampton

Dr Stuart Murdock (generic analysis tools)

Dr Muan Hong Ng (data retrieval)

Dr Hans Fangohr

Steven Johnston

Prof Simon Cox

Dr Jon Essex

Oxford

Professor Mark Sansom

Dr Carmen Domene

Dr Alessandro Grottesi

Dr Andrew Hung

Dr Daniele Bemporad

Dr Shozeb Haider

Dr Kaihsu Tai (curation and integration)

Dr George Patargias

Oliver Beckstein Jennifer Johnston

Syma Khalid Jorge Pikunic

Pete Bond Zara Sands

Jonathan Cuthbertson Sundeep Deol

Jeff Campbell Yalini Pathy

Loredana Vaccaro Shiva Amiri

Katherine Cox Robert d’Rozario

John Holyoake Samantha Kaye

Anthony Ivetac Sylvanna Ho

Oxford e-Science Center

Professor Paul Jeffreys

Dr Bing Wu (database management)

Matthew Dovey

Ivaylo Kostadinov

BBSRC DTI The Wellcome Trust GSK

EC (TMR) OeSC (EPSRC & DTI) EPSRC OSC (JIF)

MRC

Acknowledgements

More information…

[email protected]

www.biosimgrid.org