SysMO-DB: Supporting Data Access and Integration
Carole Goble, University of Manchester, UK
Jacky Snoep, University of Manchester / Stellenbosch, South Africa
Isabel Rojas, EML Research gGmbH, Germany
Goal of SysMO
Eleven individual projects
Different research outcomes
A cross-section of microorganisms, including bacteria, archaea and yeast.
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models.
Pool research capacities and know-how.
The crunch
No one concept of experimentation or modelling
No planned, shared infrastructure for pooling
SysMO-DB
Retrofit a data access, model handling and data integration platform:
To support and manage the diversity of data and models, and of competencies
That promotes shared understanding
Using a common platform and common technologies
SysMO-DB
Web-based solution to facilitate:
exchange of data, models and processes (intra- and inter-consortia)
search for data, models and processes across the initiative
maximisation of the "shelf life" and utility of the data, models and processes generated
dissemination of results
Our experimental conditions…
Progressive and incremental
Something in it for me all the way along
Low hanging fruit immediately
Return that matches investment
Realistic
Eases pressure points and concerns of the groups
Lower barriers of engagement
Sustainable
Flexible, extensible and open
SysMO-DB Concept
[Figure: SysMO-DB concept diagram. Labels: Experimental data, Models, Processes, SysMO-DB, SysMO-HUB web interface]
SysMO-DB Team
Prof Jacky Snoep, University of Stellenbosch, South Africa and University of Manchester, UK: Models
Prof Isabel Rojas, EML Research gGmbH, Germany: Data, Metadata
Prof Carole Goble, University of Manchester, UK: Processes (Workflow), Portal, Infrastructure
Stuart Owen, University of Manchester, UK: Software Engineer (Workflow, Portal, Infrastructure)
Katy Wolstencroft, University of Manchester, UK: Bioinformatician (Workflow, Metadata)
Isabel Rojas and Olga Krebs, EML Research gGmbH, Germany: Databases, Metadata
Backed up by the Rest
…..and more
…and more
[Figure: modelling cycle. Steps: Construct pathway model in SBML, Model analysis, New hypothesis, Experimental validation, Data analysis & integration, Model update. Inputs and outputs: New data, Predict. Supporting resources: Workflows, JWS Online, COPASI, SysMO Data, SABIO-RK, External Data and Applications]
SysMO-Hub Portal
My Stuff
Data
Models
Workflows
External Resources
Private Access
Controlled publication
Metadata
SysMO-SEEK
Access Control
Stitching it together
Metadata on everything: recommendations, MIBBI, our own controlled vocabularies that incrementally evolve
Web services: simple interfaces that incrementally evolve
Web 2.0 style: Atom feeds, blogs, wikis, mash ups, REST
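As an illustration of the "Web 2.0 style" point, here is a minimal sketch of reading an Atom feed that announces newly registered assets. The feed content, titles and identifiers are invented for the example; the slides do not describe an actual SysMO-DB feed, and a real client would fetch the XML over HTTP rather than use an inline string.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, inlined so the sketch runs without a network.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>New assets (illustrative)</title>
  <entry>
    <title>Glucose pulse time series</title>
    <id>urn:example:asset:1</id>
    <updated>2008-09-18T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Glycolysis model draft</title>
    <id>urn:example:asset:2</id>
    <updated>2008-09-19T09:30:00Z</updated>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return (title, updated) pairs for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        entries.append((title, updated))
    return entries


for title, updated in list_entries(SAMPLE_FEED):
    print(updated, title)
```

Because Atom is a plain XML format, a group could consume such a feed from its own groupware without adopting any particular client library.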
[Figure: SysMO-DB architecture diagram. Labels: JERM Web Service Access Interface, JERM Extractor, Metadata, SysMO Data, Models, Workflows, External Resources, Web Service Access Interface, Taverna Workflows, SysMO HUB Portal (Liferay), SysMO-SEEK, Repositories & Resources, Service Interface, Integration, Discovery, Access, Annotation & Collaboration, Results Cache, myExperiment, JWS Online, SABIO-RK, BioCatalogue, Access Control]
Web Access - SysMO-Hub
Customised web portal
Unified access to SysMO resources, and integrated queries across data, workflow and model catalogues, and repositories
A common entry to the information created by the SysMO partners.
Pre-cooked queries and processes
Umbrella for eGroupWare, OpenWetWare, wikis and other solutions
Liferay (http://www.liferay.com) portal framework
Data Exchanges
Use existing community standards, e.g.:
MIRIAM: Minimum Information Requested In the Annotation of biochemical Models
MIAME: Minimum Information About a Microarray Experiment
MIAPE: Minimum Information About a Proteomics Experiment
SBML: Systems Biology Markup Language
Definition of minimal sets for information exchange within the consortia
Data and Metadata
"Just Enough Results Model": minimum metadata for exchange
Where storage solutions exist
Expose through JERM
Where storage solutions do not exist
SABIO-RK, iChiP, Brenda, MeMo and many more
JWS Online, BioModels
COPASI
myExperiment
Ontologies, catalogues and controlled vocabularies for annotation
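To make the idea of "minimum metadata for exchange" concrete, the following sketch checks a metadata record against a required field set. The field names are purely hypothetical: the slides do not define the JERM schema, and the real minimum set was still to be agreed with the projects.

```python
from dataclasses import dataclass, field

# Hypothetical minimum field set, chosen only for illustration.
REQUIRED_FIELDS = ("project", "creator", "organism", "data_type", "location")


@dataclass
class ResultRecord:
    """Metadata describing a result; the result itself stays where it is stored."""
    project: str = ""
    creator: str = ""
    organism: str = ""
    data_type: str = ""
    location: str = ""  # pointer to the file or database holding the data
    extra: dict = field(default_factory=dict)  # optional, project-specific annotations


def missing_fields(record):
    """Return the names of required fields that are empty or whitespace."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name).strip()]


record = ResultRecord(
    project="ExampleProject",
    creator="A. Researcher",
    data_type="kinetic",
    location="file://lab-share/example.xls",
)
print(missing_fields(record))  # organism was not supplied
```

Keeping the required set small and the `extra` dictionary open-ended mirrors the "just enough" intent: a low barrier to entry, with room for vocabularies to evolve incrementally.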
SysMO SEEK: Registry
[Figure: registry diagram. Labels: JERM Web Service Access Interface, Metadata, SysMO Data, JERM Extractor, SABIO-RK, Access Control]
Discovery: SysMO-SEEK
Self-curated, access-controlled catalogue of assets to promote cooperation
Metadata database (who has what)
Progressive refinement
Projects, Group, Provenance, Files
It will NOT hold results.
Meta catalogue
Search over other catalogues: BioCatalogue, myExperiment, JWS Online, BioModels
Is itself a web service
Incorporate in your own groupware environments and applications
SysMO SEEK
Is there any group generating kinetic data?
Is this data available?
Who is working with which organism?
What methods are being used to determine enzyme activity?
Under which experimental conditions are my partners measuring glucose concentration?
and many more
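Questions like these amount to filtered searches over a metadata catalogue, subject to access control. The sketch below shows one way such a query could look, using an in-memory list. The entries, field names and the two-level visibility rule ("project" or "sysmo") are assumptions made for the example, not the SysMO-SEEK data model.

```python
# Hypothetical catalogue entries: metadata only, no results.
CATALOGUE = [
    {"project": "ProjectA", "group": "Lab 1", "organism": "Bacillus subtilis",
     "data_type": "kinetic", "visibility": "sysmo"},
    {"project": "ProjectB", "group": "Lab 2", "organism": "Saccharomyces cerevisiae",
     "data_type": "transcriptomic", "visibility": "project"},
    {"project": "ProjectB", "group": "Lab 3", "organism": "Saccharomyces cerevisiae",
     "data_type": "kinetic", "visibility": "sysmo"},
]


def visible_to(entry, user_project):
    """An entry is visible if shared with all of SysMO or owned by the user's project."""
    return entry["visibility"] == "sysmo" or entry["project"] == user_project


def search(catalogue, user_project, **criteria):
    """Return visible entries whose fields match every given criterion."""
    return [
        entry for entry in catalogue
        if visible_to(entry, user_project)
        and all(entry.get(key) == value for key, value in criteria.items())
    ]


# "Is there any group generating kinetic data?" asked by a ProjectA member.
for hit in search(CATALOGUE, "ProjectA", data_type="kinetic"):
    print(hit["group"], hit["organism"])
```

The access check runs before any matching, so a project that is not yet ready to share with all of SysMO stays invisible to outside queries.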
Models
Publish, manage, run, validate
JWS Online
Database of curated models and a simulator
Web service enabled
Each SysMO project will have a separate password-protected website.
Processes - Workflows
Applications and services become accessible to the workflow machinery as Web services or Java applications.
Data and application integration and analysis
Model construction and population
Repeatable and shareable plan
Transparent provenance log
Taverna Workflow Management System
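The notions of a "repeatable and shareable plan" and a "transparent provenance log" can be illustrated with a toy pipeline runner. This is a generic sketch in plain Python, not Taverna and not its API; the step names and data are invented.

```python
def run_workflow(steps, data):
    """Run named steps in order, recording each step's input and output."""
    log = []
    for name, func in steps:
        result = func(data)
        log.append({"step": name, "input": data, "output": result})
        data = result
    return data, log


# The plan is just data: an ordered list of (name, function) pairs,
# so it can be stored, shared and re-run unchanged.
STEPS = [
    ("subtract_blank", lambda values: [v - 1 for v in values]),
    ("mean", lambda values: sum(values) / len(values)),
]

final, provenance = run_workflow(STEPS, [2, 4, 6])
print(final)
for record in provenance:
    print(record["step"], record["input"], "->", record["output"])
```

Recording inputs and outputs per step is what lets someone else trace how a value was derived, which is the point of the provenance log named on the slide.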
Processes Technology
Taverna
myExperiment.org
Example - Manipulation of SBML models in workflows
Using libSBML
For data integration
For constructing and annotating SBML models
libSBML written in C then wrapped with a Java API
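To show what constructing and reading an SBML model involves at the file level, here is a minimal sketch that builds a skeleton SBML Level 2 Version 4 document and reads the species back. It deliberately uses only Python's standard XML library rather than libSBML, so it performs none of libSBML's validation; the model and species identifiers are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def _tag(name):
    """Qualify an element name with the SBML namespace."""
    return f"{{{SBML_NS}}}{name}"


def build_model(model_id, species_ids):
    """Return a skeleton SBML document with one compartment and the given species."""
    ET.register_namespace("", SBML_NS)
    sbml = ET.Element(_tag("sbml"), {"level": "2", "version": "4"})
    model = ET.SubElement(sbml, _tag("model"), {"id": model_id})
    compartments = ET.SubElement(model, _tag("listOfCompartments"))
    ET.SubElement(compartments, _tag("compartment"), {"id": "cell"})
    species_list = ET.SubElement(model, _tag("listOfSpecies"))
    for species_id in species_ids:
        ET.SubElement(species_list, _tag("species"),
                      {"id": species_id, "compartment": "cell"})
    return ET.tostring(sbml, encoding="unicode")


def species_in(sbml_text):
    """List the species identifiers found in an SBML document."""
    root = ET.fromstring(sbml_text)
    return [s.get("id") for s in root.iter(_tag("species"))]


document = build_model("glycolysis_sketch", ["glucose", "g6p"])
print(species_in(document))
```

In a real workflow, libSBML would be the better choice because it checks consistency and handles annotation; the sketch only shows that an SBML model is structured XML that a workflow step can create and inspect.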
Related Activities
BioCatalogue
Community and Expert Curated Catalogue of Life Science Web Services
Started June 2008.
Target Practice
Informatic and metabolomic assessment of biological network changes and of drug-cell interactions
Utopia, Taverna workflows
Solutions held by SysMO partners
eGroupWare, PHProjekt, Basecamp, wikis etc
Training, Consultancy, Know-how
Us:
Training on databases, models, workflow systems and web services, and best practice for the annotation of resources by metadata.
Kick-starting, toolkits, templates
You:
Social networking for shared content, know-how and best practice
Contribution
Best of breed solutions in place already
User Focus Group of PALS
PhD Students and Post Docs
Pals
No. | Name | Project | Role | Location
1 | Falko Krause | TRANSLUCENT | Bioinformatics | Berlin, Germany
2 | Leif Steil | BaCell-SysMO | Experimentalist, databases | ??
3 | Walter Glaser | MOSES | Bioinformatician | Vienna, Austria
4 | Malkhey Verma | MOSES + SulfoSys | Experiment/modeller interface | Manchester, UK
5 | Femke Mensonides | MOSES + SulfoSys | Experiment/modeller interface | Vrije University, Netherlands
6 | Hanan Messiha Girgis | MOSES | Experimentalist | Manchester, UK
7 | Pawel Sierocinski | SulfoSys | Experimentalist | Wageningen, Netherlands
8 | Maria Rodrigues | KOSMOBAC | Modeller | Vigo, Spain
9 | Afsaneh Maleki-Dizaji | SUMO | Bioinformatician | Sheffield, UK
10 | John Heap | COSMIC | Experimentalist | Nottingham, UK
11 | Walid Omar | STREAM | Experimentalist | Warwick, UK
12 | Elon Correa | Valla | Modeller | Manchester, UK
13 | Renate Kania | SysMO-LAB | Database | EML, Germany
14 | Mark Musters | SysMO-LAB | Modeller | Wageningen, Netherlands
15 | Terry McGenity's postdoc? | PSysMO | Experimentalist | Essex, UK
16 | Maksim Zakhartsev | MOSES | Experimentalist | Stuttgart, Germany
Hands On
Which data do you need to exchange? i.e. What do you need, what can you give?
What are the minimal exchange formats?
How best to annotate your data (giving semantics to your data)?
How to cross-relate different types of data (e.g. Genomic, Transcriptomic, Proteomic, Metabolomic, Kinetic, and modelling data)?
What should be in the SysMO SEEK?
What should the portal look like?
Steps so far…
Questionnaire
Current situation in each project
Contribute to design of work packages
Responses from:
Project 1: BaCell-SysMO
Project 2: COSMIC
Project 3: SUMO
Project 4: KOSMOBAC
Project 6: Psysmo
Project 7: Pseudomonas fluorescens
Project 9: Translucent
Project 10: Streptomyces coelicolor
Project 11: Silicon cell model
Results
A spectrum of resources and data management and integration expertise
Each project is concerned with data, models and processes, but each partner may not do all
All projects are concerned with sharing between their sites.
Some are not yet ready to share with all of SysMO. Respect privacy. Governance.
Data
Storage | Access | Annotation | Discovery
Produced on site and stored in files or Excel spreadsheets | Not consolidated between group members or project members. No common database solutions | Only who and when produced. Does not conform to existing minimum metadata 'omics standards | Google search over basic indexing
Common format or in a database or repository | Group members and project partners, but not the rest of SysMO or outside | Annotation of data, may be free-text, but may not conform to existing standards | Google search over basic indexing and annotations
Stored and indexed in relational databases from consortium or other formats | Project partners & SysMO but not outside. Some web service interface access to data resources | Minimum metadata standards | Fully searchable
Stored and indexed in relational databases, using databases from consortium or using other formats | Project partners, SysMO & the Systems Biology community via web services and data services. Some data exported to public repositories | Minimum metadata standards | Fully searchable
Model
Representation | Annotation | Access
Data but no models
Models are developed in a non-SBML format and are not converted to SBML | None | Models are submitted to JWS Online in their native format
Models are developed in SBML, or in another format and converted into SBML | Little or no annotation of the models using current standards, such as MIRIAM | Models are submitted to JWS Online in SBML
Models are developed in SBML | Fully annotated using MIRIAM | Models are submitted to JWS Online
Processes
All processes are manual with no scripted pipelines or workflows
Some data may also be gathered from external sources
Data is produced and stored locally
No automation of routine processes
No reference to external resources
Data used for models by other groups in same project or locally
Some of the data gathering or model population is automated by workflows
Some web service interfaces to locally generated data and tools
Some of the data gathering or model population is mediated by workflows
Verifying simulation results against experimental data is mediated by workflows.
Web service interfaces to all locally generated data and tools
Workflows are annotated and published on myExperiment for SysMO consortium members.
Silver or Gold Pilots
Project 1 BaCell-SysMO: Produce datasets, use models, workflow ready
Project 7 Pseudomonas fluorescens and Project 6 Psysmo: Pseudomonas organisms, use third party data sets and produce their own, model ready, workflow ready
Project 10 Streptomyces coelicolor: Omics and standards compliant, use third party standard data, workflow ready, model standards skeptic but use models
Project 3 SUMO: Produce own data and own models, have their own wiki for sharing data, workflow ready and model ready, using COPASI
MOSES (though no questionnaire): Local, using models, produce their own data, similar work in Target Practice using UTOPIA and Taverna workflows already
SulfoSYS: Data solutions, eGroupWare
Project 9 TRANSLUCENT: Our first Pal! Protein-protein interaction data. PHProjekt
SysMOLab and MeMo (though no questionnaire): Wikis, SABIO-RK, etc
Bronze Data Pilot
Data storage solutions for project partners who need it
Many work mainly with Excel or flat files
Need data storage first to disseminate to others and start collaborating
KOSMOBAC (Booth) Group
Development Approach
You already have something; we will not reinvent it.
Development and deployment of all components will be incremental
Metadata specs
SW rapid prototyping
Leverage
Limited -> Sophisticated
Cater for different levels of readiness
Customised for each project
First Steps – end October 2008
Comprehensive up to date audit and list of meetings.
First cut Hub and SEEK
Project areas set up & access control scheme
SysMO-SEEK of data assets & projects with interface
Collection of queries/use cases for SEEK and Hub
Data
With Gold and Silver pals define the first cut JERM
With Bronze pal identify storage solution
Establish best practices on data annotation
Prepare two or three SysMO datasets for workflow readiness
Models and Workflows
Access to JWS Online and myExperiment
Seed with SysMO-specific workflows and models
Identify useful workflow packs
Engagement
Project web site and wiki
Build up our PALS team
Visits and training timetable
JERM and SEEK workshop
JERM and SEEK Workshop
First Pals face 2 face
18-19 September 2008
EML, Heidelberg, Germany
Facilitated
Preparation: Audit; Sweet spots & pains
In Meeting: SysMO-SEEK; Just Enough Results Model for Exchange
Audit
The repositories you use now and plan to use to store your experimental data: home grown; standard; public; private
The other repositories you use now and plan to use
The data formats you use now and plan to use
The SOPs you have in place or plan to have
The software you use for data management, groupware & project management, model simulation etc: e.g. Rosetta, Oracle, Matlab, R, Mathematica, eGroupware, PHProjekt, wikis
Software you have that would be of benefit to all, and are willing to share, e.g. Falko's Semantic SBML tool, MCISB SBML annotation tool
The programming and software environments you use, e.g. Java, Python, C++, Ruby on Rails, Perl
Your local expertise available for data management, e.g. full time bioinformatician, database manager, commercial outsourcing, none
What facilities do you have for coping with external access? How do you export data now?
Design a systematic collection mechanism with two of the pals
Wiki mining
Sweet spots and Pains
Confidentially… in your humble opinion…
What would be the first three low hanging fruit for your project?
And what are three obstacles / barriers?
Tell us about your experimentalists, modellers and bioinformaticians
What doesn't work right now? What does?
SysMO-SEEK - not the results themselves
What schemas or metadata do you have for groupware, projects, SOPs, procedures we can use as a basis for the SEEK model and for sourcing the content?
How do you know who has what and what they are doing?
What controlled vocabularies do you use for this, if any?
Which data do you need / would like to know from others in SysMO and outside SysMO?
What data would you be willing to give?
What is the data release policy of your project?
Availability, conditions of use, permissions, credit etc
What is the lifecycle of your data?
Versioning policy,
Just Enough Results Model Exchange
Which data do you need / would like to know from others?
What data would you be willing to give?
How do you annotate your data?
How do you cross relate different types of data?
What standards for data do you already use and know about?
First Steps – end March 2009
Revised Hub and SEEK
Enhanced SysMO-SEEK of data assets & projects
Data
Gold and Silver - the first JERM interface
Access to a few data sets through Hub using JERM
Bronze - established a storage solution
JERM-based SysMO datasets for workflow readiness
Disseminate best practices on data annotation
Models and Workflows
Models and Workflows on JWS Online and myExperiment
Demoed workflow using data sets through JERM interface
Useful workflow packs & launch workflows from portal
Engagement
Devising next steps with PALS team
Visits and training timetable
Back up Teams
PALS and DMG
Data Management Group
SysMO-DB Delivery Team
Back up Technical Teams
SysMO-Pals
Funders, Steering, Review, Governance
Hands on engagement
SysMO Projects
Realism
• Light touch
• Incremental
• One size will not fit all
• Use what is already there