Standards and software: practical aids for reproducibility of computational research in systems...

Standards and software: practical aids for reproducibility of computational research in systems biology
Michael Hucka, Ph.D. (on behalf of many)
Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
ICSB 2014, Melbourne, Australia, September 2014
Email: [email protected] Twitter: @mhucka

description

My presentation during the session titled "Reproducibility of computational research: methods to avoid madness" on Wednesday, 17 September 2014, during ICSB 2014, held in Melbourne, Australia.

Transcript of Standards and software: practical aids for reproducibility of computational research in systems...

Page 1: Standards and software: practical aids for reproducibility of computational research in systems biology

Standards and software: practical aids for reproducibility of computational research in systems biology

Michael Hucka, Ph.D. (on behalf of many)

Department of Computing and Mathematical Sciences California Institute of Technology

Pasadena, CA, USA

ICSB 2014, Melbourne, Australia, September 2014

Email: [email protected] Twitter: @mhucka

Page 2

Outline

SBML as enabler

The COMBINE family of standardization efforts

Examples of growing software capabilities

Page 3

SBML as enabler

Page 4

Background context: formal models


Page 5

Communication of models in the olden days…

Describe the models and equations in a paper, and you’re done, right?

Problems:

• Errors in printing

• Missing information

• Dependencies on implementation

• Outright errors

• Can be a huge effort to recreate

Page 6

ABC-SysBio CellNetAnalyzer Karyote* PaVESy SBW: Auto Layout acslXtreme CellNOpt KEGGconverter PAYAO sbw: javasim ALC Cellware KEGGtranslator PET sbw: stochastic simulator AMIGO CLEML Kineticon PhysioLab Modeler SCIpath Antimony CL-SBML Kinsolver PINT SED-ML Web Tools APMonitor COBRA libAnnotationSBML PK-Sim / MoBi semanticSBML Arcadia CompuCell3D libRoadRunner PNK SensSB Asmparts ConsensusPathDB libSBML PottersWheel SGMP Athena COPASI libSBMLSim PRISM Sigmoid* AutoSBW CRdata libStruct ProcessDB SIGNALIGN AVIS CycSim MASS Toolbox ProMoT SignaLink BALSA CySBML MatCont PROTON SigPath BASIS Cytoscape MathSBML pybrn SigTran BetaWB Cyto-Sim Medicel PyDSTool SIMBA Bifurcation Discovery Tool DBSolve MEMOSys PySB SimBiology BiGG DEDiscover MesoRD PySCeS Simpathica BiNoM Dizzy Meta-All RANGE SimPheny* BiNoM Cytoscape Plugin DOTcvpSB Metaboflux RAVEN Simulate3D Bio Sketch Pad E-CELL MetaCrop Reactome Simulation Core Library BioBayes ecellJ MetaFluxNet ReMatch Simulation Tool BIOCHAM EPE Metannogen RMBNToolbox SimWiz BioCharon ESS Metatool roadRunner SloppyCell BioCyc Facile MetExplore RSBML SmartCell BioGRID FAME MetNetMaker SABIO-RK Snoopy Biological Networks FASIMU MIRIAM Resources Saint SOSlib BioMet Toolbox FBASBW MMT2 SBFC SPDBS BioModels Database FERN modelMaGe SBML Harvester SRS BioModels Importer FluxBalance ModeRator SBML Layout STEPS BioNessie Fluxor Modesto SBML Reaction Finder StochKit BioNetGen Genetdes Moleculizer SBML Translators StochPy BioPARKIN Genetic Network Analyzer MonaLisa SBML2APM StochSim BioPathwise Gepasi Monod SBML2BioPax STOCKS BioPAX2SBML Gillespie2 MOOSE SBML2LaTeX SurreyFBA BioRica GINsim MuVal (Multi-valued logic) SBML2NEURON SyBiL BioSens GNAT Narrator SBML2Octave SYCAMORE BioSPICE Dashboard GNU MCSim nemo SBML2SMW SynBioSS BioSpreadsheet GRENDEL NetBuilder' SBML2TikZ Systrip BioSyS HSMB NetPath SBML2XPP TERANODE Suite BioTapestry HybridSBML NetPro SBMLEditor The Cell Collective BioUML iBioSim Odefy SBML-PET-MPI Tide BoolNet IBRENA Omix SBMLR 
TinkerCell braincirc Insilico Discovery ONDEX SBML-SAT Trelis BRENDA insilicoIDE optflux SBML-shorthand UTKornTools BSTLab iPathways Oscill8 SBMLSim VANTED ByoDyn JACOBIAN PANTHER Pathway SBMLsqueezer Vcell CADLIVE Jacobian Viewer PathArt sbmltidy WebCell Cain Jarnac Pathway Access SBMLToolbox WinSCAMP CARMEN JarnacLite Pathway Analyser SBMM assistant Wolfram SystemModeler Cell Illustrator JDesigner Pathway Builder SBO xCellerator CellDesigner JigCell Pathway Solver SBSI Xholon Cellerator JSBML Pathway Tools SBToolbox2 XPPAUT CellMC JSim PathwayLab sbtranslate CellML2SBML JWS Online PATIKAweb SBW

Many software tools for modeling and simulation are available

Page 7

https://www.behance.net/gallery/d/7465033

Projects often involve the use of more than one tool

Page 8

Page 9

SBML = Systems Biology Markup Language

Open format for representing models of biological processes

• Data structures + principles + serialization to XML

• (Mostly) declarative description, not a scripting language

(Mostly) neutral with respect to modeling framework

• E.g., ODE, stochastic systems, etc.

Does not store experimental data or simulation descriptions

Meant for software to read/write, not humans

Page 10

SBML is a file format based on XML

Page 11

SBML is a file format based on XML

Software shields you from working with this directly
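
To give a sense of what the XML looks like, here is a minimal sketch in Python that uses only the standard library. The embedded model and its identifiers (`example`, `cell`, `A`, `B`, `k1`, `conversion`) are made up for illustration and do not come from the talk. In practice a dedicated library such as libSBML would normally be used instead of a generic XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal SBML Level 3 Version 1 document (illustrative only):
# one compartment, two species, one parameter, one reaction A -> B.
SBML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="example">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="10"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="B" compartment="cell" initialConcentration="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.1" constant="true"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="conversion" reversible="false" fast="false">
        <listOfReactants>
          <speciesReference species="A" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="B" stoichiometry="1" constant="true"/>
        </listOfProducts>
        <kineticLaw>
          <math xmlns="http://www.w3.org/1998/Math/MathML">
            <apply><times/><ci>k1</ci><ci>A</ci><ci>cell</ci></apply>
          </math>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}


def summarize(sbml_text):
    """Return the species ids and, per reaction, its reactant and product ids."""
    root = ET.fromstring(sbml_text.encode("utf-8"))
    model = root.find("sbml:model", NS)
    species = [s.get("id") for s in model.findall("sbml:listOfSpecies/sbml:species", NS)]
    reactions = {}
    for rxn in model.findall("sbml:listOfReactions/sbml:reaction", NS):
        reactants = [r.get("species") for r in
                     rxn.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [p.get("species") for p in
                    rxn.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[rxn.get("id")] = (reactants, products)
    return species, reactions


species, reactions = summarize(SBML_TEXT)
print(species)
print(reactions)
```

Even for a one-reaction model the file is verbose, which is why it is meant for software rather than for people to read and write.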

Page 12

Core SBML concepts are fairly simple

The process is central

• Literally called “reaction” (not necessarily biochemical)

• Participants are pools of entities of the same kind (“species”)

\[
\begin{aligned}
n_{a1}\,A &\xrightarrow{f_1(\ldots)} n_{b1}\,B + n_{c1}\,C \\
n_{a2}\,A &\xrightarrow{f_2(\ldots)} n_{d2}\,D + n_{e1}\,E \\
&\;\;\vdots \\
n_{c3}\,C &\xrightarrow{f_3(\ldots)} n_{f3}\,F + n_{g3}\,G
\end{aligned}
\]

• Species are located in containers (“compartments”)

- Core SBML assumes well-mixed compartments

Models can further include:

• Other constants & variables

• Discontinuous events

• Unit definitions

• Annotations

• Other, explicit math
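
One way to read the reaction scheme on this slide is that each reaction contributes its rate, weighted by stoichiometry, to the rate of change of every participating species. The following is a rough Python sketch of that idea, not how any particular SBML tool is implemented. The `Reaction` class, the two-reaction network, and the rate constants are all hypothetical, and a single well-mixed compartment is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reaction:
    reactants: Dict[str, float]                 # species id -> stoichiometry consumed
    products: Dict[str, float]                  # species id -> stoichiometry produced
    rate: Callable[[Dict[str, float]], float]   # f(...) evaluated on current amounts


def derivatives(amounts: Dict[str, float], reactions: List[Reaction]) -> Dict[str, float]:
    """Sum stoichiometry-weighted reaction rates into d(amount)/dt per species."""
    dxdt = {sid: 0.0 for sid in amounts}
    for rxn in reactions:
        v = rxn.rate(amounts)
        for sid, n in rxn.reactants.items():
            dxdt[sid] -= n * v
        for sid, n in rxn.products.items():
            dxdt[sid] += n * v
    return dxdt


# Hypothetical network: A -> B + C, then C -> 2 F (mass-action rates, made-up constants).
reactions = [
    Reaction({"A": 1}, {"B": 1, "C": 1}, lambda x: 0.5 * x["A"]),
    Reaction({"C": 1}, {"F": 2}, lambda x: 0.1 * x["C"]),
]
amounts = {"A": 10.0, "B": 0.0, "C": 4.0, "F": 0.0}
rates = derivatives(amounts, reactions)
print(rates)
```

Because the rate function is arbitrary, the same structure covers mass-action kinetics, Michaelis-Menten kinetics, or any other explicit formula.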

Page 13

Core SBML constructs support many types of models

Example of model type | Example model
ODE (e.g., cell differentiation) | BioModels Database model #BIOMD0000000451
Conductance-based (e.g., Hodgkin-Huxley) | BioModels Database model #BIOMD0000000020
  - Typically do not use SBML “reaction” construct, but instead use “rate rules” construct
Neural (e.g., spiking neurons) | BioModels Database model #BIOMD0000000127
  - Typically use “events” for discontinuous changes
Pharmacokinetic/dynamics | BioModels Database model #BIOMD0000000234
  - “Species” are not required to be biochemical entities
Infectious diseases | BioModels Database model #MODEL1008060001

List originally by Nicolas Le Novère
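
To make the "rate rules" and "events" entries more concrete, here is a small Python sketch of a leaky integrate-and-fire neuron. It only mimics the two ideas: a variable governed directly by a differential equation, and a discontinuous reset when a trigger condition becomes true. The model, parameter names, and values are illustrative assumptions, and the fixed-step Euler loop is a simplification that does not follow SBML's event semantics in detail.

```python
def simulate(t_end=100.0, dt=0.1, v_rest=-65.0, v_threshold=-50.0,
             v_reset=-65.0, tau=10.0, drive=20.0):
    """Return the spike times of a toy leaky integrate-and-fire neuron."""
    v = v_rest
    spikes = []
    steps = int(round(t_end / dt))
    for i in range(steps):
        # "Rate rule": dv/dt = (v_rest - v + drive) / tau, integrated with Euler steps.
        v += dt * (v_rest - v + drive) / tau
        # "Event": when the trigger v >= v_threshold fires, assign v = v_reset.
        if v >= v_threshold:
            spikes.append((i + 1) * dt)
            v = v_reset
    return spikes


spikes = simulate()
print(len(spikes), "spikes; first at t =", spikes[0])
```

With no input drive the membrane potential stays at rest and the event never fires, which is a quick sanity check on the trigger logic.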

Page 14

SBML Level 3 package | What it enables | Status
Hierarchical model composition | Models containing submodels | ✔
Flux balance constraints | Constraint-based models | ✔
Qualitative models | Petri net models, Boolean models | ✔
Graph layout | Diagrams of models | ✔
Multicomponent/state species | Entities w/ structure; also rule-based models | draft
Spatial | Nonhomogeneous spatial models | draft
Graph rendering | Diagrams of models | draft
Groups | Arbitrary grouping of components | draft
Arrays | Arrays of entities | draft
Dynamic structures | Creation & destruction of components | draft
Distributions | Numerical values as statistical distributions | draft
Annotations | Richer annotation syntax |

Page 15

Many models and software resources are available today

Accepted by dozens of journals *

100’s of software tools available today

• 260+ listed in SBML Software Guide †

1000’s of models available

• ... in public databases

• ... as supplementary data to papers

• ... in private repositories

* http://sbml.org/Documents/Publications_known_to_accept_submissions_in_SBML_format

† http://sbml.org/SBML_Software_Guide

http://sbml.org

Page 16

SBML enables better reproducibility of models

Anecdotal report from BioModels Database curators:

• Models encoded from papers used to fail almost 100% of the time

• Models submitted in SBML format:

- 60% work right away

- 40% work “most of the time” with help from authors

BioModels Database @ EBI

• 530 manually-curated models

• 650 non-curated models

• 143,000 auto-generated models

http://biomodels.net

Page 17

SBML enables integration of information

SBML supports integration in multiple ways:

• SBML itself is an integrative framework

- Many different types of models all use one core framework

• Hierarchical Model Composition: integrate multiple models into one

- Permits hierarchically-structured models

- Permits libraries of reusable components

• Annotations within SBML:

- Incorporate data, notes, software-specific extensions

- Link to external data, other models, etc.—anything with a URI
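
As an illustration of linking a model element to external data by URI, the sketch below parses an annotated species fragment with the Python standard library. The species `glc`, its `metaid`, and the choice of the ChEBI entry are assumptions made for the example. The RDF layout follows the general pattern used for annotations in SBML files, but the fragment is not taken from any real model.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Hypothetical species fragment: "glc" is declared to be the ChEBI entry for glucose.
SPECIES_XML = """
<species xmlns="http://www.sbml.org/sbml/level3/version1/core"
         id="glc" metaid="meta_glc" compartment="cell"
         hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#meta_glc">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/chebi/CHEBI:17234"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def annotation_uris(element_xml):
    """Return (qualifier, URI) pairs found in the element's RDF annotation."""
    root = ET.fromstring(element_xml)
    pairs = []
    for node in root.iter():
        # Biology qualifiers (e.g. "is") live in the bqbiol namespace.
        if node.tag.startswith("{" + BQBIOL + "}"):
            qualifier = node.tag.split("}", 1)[1]
            for li in node.iter("{" + RDF + "}li"):
                pairs.append((qualifier, li.get("{" + RDF + "}resource")))
    return pairs


uris = annotation_uris(SPECIES_XML)
print(uris)
```

Because the link is just a URI, software that does not understand a given annotation can ignore it without affecting the mathematical meaning of the model.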

Page 18

SBML enables large, collaborative, model-building efforts

Page 19

The COMBINE family of standardization efforts

Page 20

Many things not addressed by SBML alone

?

BIOMD0000000319 in BioModels Database

Decroly & Goldbeter, PNAS, 1982

Page 21

Efforts like SED-ML improve reproducibility

Waltemath et al., BMC Sys Bio 5, 2011.
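
The gap that a simulation description fills can be sketched in plain Python: besides the model, one has to record which simulation was run and which variables were observed. The classes below are a stand-in for that kind of information and are not SED-ML syntax. The field names, the time-course settings, and the variable identifier `species_1` are hypothetical, and only the model identifier comes from the earlier slide.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class UniformTimeCourse:
    initial_time: float
    output_start_time: float
    output_end_time: float
    number_of_points: int


@dataclass
class Experiment:
    model_source: str                      # where the model comes from
    simulation: UniformTimeCourse          # how it is simulated
    observed_variables: List[str] = field(default_factory=list)  # what is reported

    def output_times(self) -> List[float]:
        """Evenly spaced output times; this sketch includes both end points."""
        sim = self.simulation
        step = (sim.output_end_time - sim.output_start_time) / sim.number_of_points
        return [sim.output_start_time + i * step
                for i in range(sim.number_of_points + 1)]


experiment = Experiment(
    model_source="BIOMD0000000319",                      # model shown on the earlier slide
    simulation=UniformTimeCourse(0.0, 0.0, 100.0, 4),    # hypothetical settings
    observed_variables=["species_1"],                    # hypothetical identifier
)
print(json.dumps(asdict(experiment), indent=2))
print(experiment.output_times())
```

Writing these settings down in a machine-readable form, rather than in the methods section of a paper, is what lets another tool rerun the same experiment on the same model.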

Page 22

Motivations for the creation of COMBINE

Realizations about the state of affairs in the late 2000s

• Many efforts overlapped, but lacked coordination

• Each invented its own processes from scratch

• Many separate meetings meant more travel for many people

• Limited and fragile funding didn’t support a solid base

COMBINE = Computational Modeling in Biology Network

• Coordinate meetings

• Coordinate standards development

• Develop common procedures & tools

• Provide a recognized voice

Page 23

Standardization efforts represented in COMBINE today

BioPAX

Qualifiers

GPML

COMBINE Standards

Associated Standardization Efforts

Related Standardization Efforts

COMBINE Archive

MAMO

Page 24

Examples of growing software capabilities

Page 25

Virtual Cell

Example: standardizing descriptions of spatial models in SBML

SBML Level 3 Spatial package draft specification

COPASI

MCell and CellBlender

Page 26

Example: synthetic biology and simulations

SBOL = Synthetic Biology Open Language

• Format + visual notation for exchange of genetic designs

• SBOL Developers Group includes 29 organizations

SBOL is leveraging other COMBINE standards

• Using SBML/CellML/etc. + SED-ML to describe module behavior

• Connecting to BioPAX and SBGN

• Using COMBINE Archive format

Page 27

Example: Path2Models

Automatically generates mathematical models from pathway resources

Leverages many open resources – e.g., SABIO-RK, BioCarta, ChEBI, GO, …

Outputs SBML (with MIRIAM annotations) and SBGN

Starts with KEGG, MetaCyc and BioPAX pathway sources

Büchel et al., BMC Systems Biology 7, 2013.

Page 28

Acknowledgments

Page 29

Huge thanks to everyone in the COMBINE community

Attendees of COMBINE 2013, Paris, France

Attendees of COMBINE 2014, Los Angeles, California, USA

Page 30

SBML funding sources over the past 14 years

• National Institute of General Medical Sciences (USA)
• Air Force Office of Scientific Research (USA)
• BBSRC (UK)
• Beckman Institute, Caltech (USA)
• DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
• Drug Disease Model Resources (EU-EFPIA Innovative Medicines Initiative)
• ELIXIR (UK)
• European Molecular Biology Laboratory (EMBL)
• Google Summer of Code
• International Joint Research Program of NEDO (Japan)
• Japanese Ministry of Agriculture
• Japanese Ministry of Education, Culture, Sports, Science and Technology
• JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
• JST ERATO-SORST Program (Japan)
• Keio University (Japan)
• Molecular Sciences Institute (USA)
• National Science Foundation (USA)
• STRI, University of Hertfordshire (UK)

Page 31

URLs

SBML http://sbml.org

COMBINE http://co.mbine.org

SED-ML http://sed-ml.org

SBOL http://sbolstandard.org

SBGN http://sbgn.org

BioPAX http://biopax.org

BioModels Database http://biomodels.net

Path2Models https://code.google.com/p/path2models/