Advances in Scientific Workflow Environments
Transcript of Advances in Scientific Workflow Environments
![Page 1: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/1.jpg)
2016-09-04 BioExcel SIG, ECCB, Amsterdam
Advances in Scientific Workflow Environments
Carole Goble, Stian Soiland-Reyes
The University of Manchester
http://esciencelab.org.uk/
![Page 2: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/2.jpg)
What is a Workflow?
• Orchestrating multiple computational tasks
• Managing the control and data flow between them
• In a world that is homogeneous or heterogeneous
• Tasks
– Local / remote
– Local / third party
– White, grey or black boxes
– Reliable / fragile
– Reserved / dynamic
– Various underpinning infrastructure
– Various access controls
BioExcel: Biomolecular recognition
![Page 3: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/3.jpg)
What is a Workflow?
Automation
– Automate computational aspects
– Repetitive pipelines, sweep campaigns
Scaling – compute cycles
– Make use of computational infrastructure & handle large data
Abstraction – people cycles
– Shield complexity and incompatibilities
– Report, re-use, evolve, share, compare
– Repeat – Tweak – Repeat
– First class commodities
Provenance – reporting
– Capture, report and utilize log and data lineage auto-documentation
– Traceable evolution, audit, transparency
– Compare
With thanks to Bertram Ludascher: WORKS 2015 Keynote
Findable, Accessible, Interoperable, Reusable (Reproducible)
![Page 4: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/4.jpg)
https://pegasus.isi.edu/2016/02/11/pegasus-powers-ligo-gravitational-waves-detection-analysis/
Laser Interferometer Gravitational-Wave Observatory – first detection of gravitational waves from colliding black holes
![Page 5: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/5.jpg)
Morphological, hemodynamic and structural analyses linked to aneurysm genesis, growth and rupture.
[Susheel Varma] http://www.vph-share.eu/
http://taverna.org.uk
![Page 6: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/6.jpg)
Galaxy https://usegalaxy.org/
![Page 7: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/7.jpg)
Marine metagenomics
Workflow Driven
+ Bespoke Scripts
[Rob Finn]
![Page 8: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/8.jpg)
Open PHACTS
https://www.knime.org/
BioExcel workflow
https://www.openphacts.org/
Targets
Pharmacological queries: target, compound and pathway data
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115460
![Page 9: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/9.jpg)
Scripts, Ensemble toolkit, execution patterns
http://www.extasy-project.org/
![Page 10: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/10.jpg)
http://www.myexperiment.org
WF Zoo
![Page 11: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/11.jpg)
![Page 12: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/12.jpg)
Workflow Patterns, templates
Data wrangling & analytics
Simulations
Instrument pipelines ++
http://tpeterka.github.io/maui-project/
The Future of Scientific Workflows, Report of DOE Workshop 2015, http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
![Page 13: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/13.jpg)
Workflow Patterns, templates
Data wrangling & analytics
Simulations
Instrument pipelines ++
Garijo et al., Common Motifs in Scientific Workflows: An Empirical Analysis, FGCS, 36, July 2014, 338–351
![Page 14: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/14.jpg)
Workflow Patterns, templates
• Long running and complex code
• Tunable parameters and input sets
• Simulation sweeps / iterations
• Ensembles, comparisons
• Tricky set-ups, human-in-the-loop interaction
• Computational steering
• In situ workflows – multiple tasks, same box, within fixed time
– data locality
– human-in-the-loop
– capture provenance
Data wrangling & analytics
Simulations
Instrument pipelines ++
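The simulation sweeps and ensemble comparisons listed on this slide can be illustrated with a minimal sketch. It assumes a toy stand-in for a long-running simulation code; the function names (`sweep`, `toy_simulation`, `best_run`) and the parameters are hypothetical and do not come from any of the workflow systems mentioned in the talk.

```python
from itertools import product


def sweep(simulate, grid):
    """Run simulate once per combination of the parameter values in grid."""
    names = sorted(grid)
    runs = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Keep the parameters next to the result so runs can be compared later.
        runs.append((params, simulate(**params)))
    return runs


def toy_simulation(temperature, steps):
    """Stand-in for a long-running code; returns a made-up score."""
    return temperature * steps


def best_run(runs):
    """Compare the ensemble and pick the run with the highest score."""
    return max(runs, key=lambda run: run[1])


if __name__ == "__main__":
    ensemble = sweep(toy_simulation, {"temperature": [280, 300], "steps": [10, 100]})
    print(len(ensemble), best_run(ensemble))
```

A workflow system would typically add what this sketch leaves out: scheduling each run on remote resources, retrying failures, and recording provenance for every run.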
![Page 15: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/15.jpg)
Traction + Examples
Reuse behaviours
Exploratory vs Production
Different kinds of user / deployment
Developer – User Ratios
Biologist / Developer / Computational Scientist
Embed in Application
Embed in platform
Embed in infrastructure
![Page 16: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/16.jpg)
Existing computational research workflow systems
https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
WFMS Zoo
![Page 17: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/17.jpg)
![Page 18: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/18.jpg)
![Page 19: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/19.jpg)
“Multi-scale” WFMS
• Workflow Management System
– Its design and reporting environment
– Its execution environment
• The tasks
– tools, codes and services and their execution environments
• Stack layer
– App level, infrastructure level
![Page 20: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/20.jpg)
Component making
Tasks loosely coupled through files
• execute on geographically distributed clusters, clouds, grids across systems
• execute on multiple facilities
• call host services (web / grid services)
DAIC: Distributed Area/Instrument Computing
“Multi-scale” WFMS
Tasks tightly coupled
• exchanging info over memory/storage
• network of supercomputers
• In situ workflows – multiple tasks, same box, within fixed time
HPC
Interoperability, Portability, Granularity, Maintenance
![Page 21: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/21.jpg)
Workflow Environment Ecosystem
![Page 22: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/22.jpg)
Copernicus workflow engine for parallel adaptive molecular dynamics
• Peer-to-peer distributed computing platform
– high-level parallelization of statistical sampling problems
• Consolidation of heterogeneous compute resources
• Automatic resource matching of jobs against compute resources
• Automatic fault tolerance of distributed work
• Workflow execution engine to define a problem (reporting) and trace its results live (provenance)
• Flexible plugin facilities
– programs to be integrated to the workflow execution engine
Free Energy Workflow using GROMACS
http://copernicus-computing.org/
![Page 23: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/23.jpg)
COMPSs/PyCOMPSs: Programmer Productivity framework
• Sequential programming
– Parallelisation and distribution heavy-lifting
– Dependency detection
• Infrastructure unaware
– Abstract application from underlying infrastructure
– Portability
• Standard Programming Languages
– Java, Python, C/C++
• No (or few!) APIs
– Standard Java
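The idea of writing sequential-looking code and letting a runtime detect dependencies can be sketched with the Python standard library. This is an illustration of the concept only, not the framework's API: the `task` decorator, the thread pool and the example functions are assumptions made for this sketch.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_POOL = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Turn a plain function into an asynchronous task.

    Calling the wrapped function returns a Future at once. Any argument
    that is itself a Future is treated as a data dependency and is
    resolved inside the worker before func runs.
    """
    def submit(*args):
        def run():
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _POOL.submit(run)
    return submit


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


def main():
    # Reads like sequential code: the two squares are independent and may
    # run in parallel, while add waits for both of them.
    left = square(3)
    right = square(4)
    return add(left, right).result()


if __name__ == "__main__":
    print(main())
```

Because a dependent task can only be submitted after the futures it consumes exist, the pool's first-in, first-out queue always starts producers before their consumers.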
![Page 24: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/24.jpg)
Shield the user/programmer
Exposure to the infrastructure
System Design
Resource provisioning
Adaptive/dynamic workflows
Manage/minimize data transfers
Smart parallelism
Code staging
Data staging
Fail-over
Human in the loop
OS/R Guarantees
Service Guarantees
![Page 25: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/25.jpg)
Stop Press!
GUIs not essential!
• Canvas, drag-drop blocks, arrows, run button
• Command-line & embedding in developer or user applications
Scripts can be workflows!
• WMS <-> Scripts
• Script vs Workflows/ASAP:
– Automation: *****
– Scaling: **
– Abstraction: *
– Provenance: **
![Page 26: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/26.jpg)
Stop Press!
GUIs not essential!
• Canvas, drag-drop blocks, arrows, run button
• Command-line & embedding in developer or user applications
Scripts can be workflows!
• WMS <-> Scripts
• Script vs Workflows/ASAP:
– Automation: *****
– Scaling: **
– Abstraction: *
– Provenance: **
Work close to a problem-specific ad-hoc data model
Domain Specific Language "programming-lite" scripts
• wire with declarative "makefile"-like DAG
Plus
• procedural scripting and expressions in languages like JavaScript and Python
Nextflow, Snakemake, Common Workflow Language
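The combination of a declarative "makefile"-like DAG with procedural scripting can be sketched in a few lines of Python. This is a minimal illustration and does not reproduce the syntax of Nextflow, Snakemake or the Common Workflow Language; the step names and actions are hypothetical. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Declarative part: each key is a step, each value the set of steps whose
# outputs it needs (hypothetical pipeline).
DEPENDENCIES = {
    "align": {"fetch"},
    "count": {"align"},
    "report": {"align", "count"},
}

# Procedural part: plain Python callables attached to each step.
ACTIONS = {
    "fetch": lambda inputs: "reads",
    "align": lambda inputs: f"aligned({inputs['fetch']})",
    "count": lambda inputs: f"counts({inputs['align']})",
    "report": lambda inputs: f"report({inputs['align']}, {inputs['count']})",
}


def run(dependencies, actions):
    """Run every step once, after all the steps it depends on."""
    results = {}
    for step in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies.get(step, ())}
        results[step] = actions[step](inputs)
    return results


if __name__ == "__main__":
    print(run(DEPENDENCIES, ACTIONS)["report"])
```

Keeping the wiring separate from the step bodies is what lets a workflow engine reorder, parallelise or resume steps without reading the procedural code.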
![Page 27: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/27.jpg)
GUIs Are Essential: take-up by the user base
![Page 28: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/28.jpg)
Workflowising script software eco-systems
prime example: provenance
ASAP
• common, interoperable provenance recording
– W3C PROV
ASAP
• YesWorkflow.org
– Annotations in script yield workflow view
ASAP
• Library profilers
– noWorkflow
• runtime provenance recorders
– Sumatra, RDataTracker
![Page 29: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/29.jpg)
Provenance: the link between computation and results
W3C PROV model standard
• record for reporting
• compare diffs/discrepancies
• provenance analytics
• track changes, adapt
• partial repeat/reproduce
• carry attributions
• compute credits
• compute data quality/trust
• select data to keep/release
• optimisation and debugging
Metadata propagation – where was the physical sample collected, and who should be attributed?
Task-based abstractions: simplifying provenance using motifs and tool annotations
“Free energy calculation” rather than 5 steps including preparation of PDB files and GROMACS execution
![Page 30: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/30.jpg)
Provenance: the link between workflow variants and workflow reuse and repurpose
W3C PROV model standard?
• record for reporting
• compare diffs/discrepancies
• provenance analytics
• track changes, adapt
• carry attributions
• compute design credits
• versioning, forking, cloning
Nested workflows functions by stealth
Copy and paste fragmentation
Designing for reuse
Find and Go
Software practices
Systematic reuse
Guidelines for persistently identifying software using DataCite
https://epubs.stfc.ac.uk/work/24058274
https://www.force11.org/software-citation-principles
![Page 31: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/31.jpg)
ASAP Wfms for FAIR Science
Automate: workflows, programs and services folks already use or want to use
Scale: Enable computational productivity
Abstract: Enable human productivity
Provenance: Record and use
Provenance
Reproducibility
Portability
Reuse
Usability
Understanding
Validation
Workflow Plugged in Code
Reporting Comparison
Interoperability
Thanks to Bertram Ludascher
![Page 32: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/32.jpg)
Dependency Management
Codes Behaviours & Reliability
![Page 33: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/33.jpg)
Application Building Blocks
BioExcel Virtualised Software Library
“transversal workflow units”, higher level operations
● Task-specific “mini-workflow” fragments
– e.g. using GROMACS, CPMD, HADDOCK
● Packaged
– EGI VM images and Docker containers
● Backed by existing registries
– ELIXIR’s bio.tools and EGI App DB
● Instantiated as cloud instances
– private (OpenNebula, OpenStack)
– public (e.g. Amazon AWS)
![Page 34: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/34.jpg)
BioExcel Use cases
● Genomics
● Ensembl Molecular simulations
● Free Energy simulations
● Multiscale modelling of molecular basis for odor and taste
● Biomolecular recognition
● Pharmacological queries
● Virtual Screening
![Page 35: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/35.jpg)
Finding valid pathways through free-energy landscapes: implementation of the “string of swarms” method using Copernicus as a workflow manager, and GROMACS as a compute engine.
![Page 36: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/36.jpg)
Workflow Interoperability
• Common format for bioinformatics tool & workflow execution
• Community based standards effort
• Designed for clusters & clouds
• Supports the use of containers (e.g. Docker)
• Specify data dependencies between steps
• Scatter/gather on steps
• Nest workflows in steps
• Develop your pipeline on your local computer (optionally with Docker)
• Execute on your research cluster or in the cloud
• Deliver to users via workbenches
• EDAM ontology (ELIXIR-DK) to specify file formats and reason about them: “FASTQ Sanger” encoding is a type of FASTQ file
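The scatter/gather pattern named on this slide can be sketched in plain Python. This shows the execution pattern only, not the syntax of any workflow format; the step functions and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(step, inputs, max_workers=4):
    """Apply one step to every input independently, then collect in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so the gathered list lines up with inputs.
        return list(pool.map(step, inputs))


def count_gc(sequence):
    """Hypothetical per-sample step: count G and C characters."""
    return sum(1 for base in sequence if base in "GC")


def summarise(counts):
    """Hypothetical downstream step that consumes the gathered list."""
    return {"samples": len(counts), "total_gc": sum(counts)}


if __name__ == "__main__":
    samples = ["GATTACA", "CCGG", ""]
    print(summarise(scatter_gather(count_gc, samples)))
```

In a workflow system the scattered step would usually run as separate jobs on a cluster or cloud rather than as threads in one process.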
![Page 37: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/37.jpg)
Workflow Research Object Bundle
researchobject.org
Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, J Web Semantics doi:10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
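A bundle with this media type is a ZIP archive, and a rough sketch of building one with the Python standard library is given below. It assumes the layout described in the Research Object Bundle specification: a `mimetype` entry stored first and uncompressed, and a JSON manifest at `.ro/manifest.json`. The manifest fields shown and the aggregated file name are illustrative and should be checked against the specification.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(resources):
    """Return the bytes of a ZIP that aggregates the given resources.

    resources maps a path inside the bundle to its content as bytes.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The media type goes first and uncompressed so that tools can
        # recognise the bundle from its leading bytes.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path in sorted(resources):
            bundle.writestr(path, resources[path])
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_bundle({"workflow.txt": b"placeholder workflow definition"})
    print(zipfile.ZipFile(io.BytesIO(data)).namelist())
```

The manifest is where provenance and annotations about the aggregated workflow, inputs and outputs would be attached.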
![Page 38: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/38.jpg)
Generic Grid middleware
Workflow bus: provide services for 1) Interoperability and integration, 2) composition, 3) provenance, 4) Enactment, 5) Human in the loop computing
Taverna Kepler Triana VLAMG
Sub workflow 1
Sub workflow 2
Sub workflow 3
Scientific experiment: a meta workflow
Sub workflow 4
Z. Zhao et al., “Workflow bus for e-Science”, in IEEE e-Science 2006, Amsterdam
![Page 39: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/39.jpg)
2007
2015
![Page 40: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/40.jpg)
http://bioexcel.eu/events/bioexcel-workflow-training-for-computational-biomolecular-research/
Adam Hospital (IRB), Anna Montras (IRB), Stian Soiland-Reyes (UNIMAN), Alexandre Bonvin (UU), Adrien Melquiond (UU), Josep Lluís Gelpí (BSC), Daniele Lezzi (BSC), Steven Newhouse (EBI), Jose A. Dianes (EBI), Mark Abraham (KTH), Rossen Apostolov (KTH), Emiliano Ippoliti (Jülich), Adam Carter (UEDIN), Darren J. White (UEDIN)
Slides: Bertram Ludascher, Ewa Deelman, Vasa Curcin, Paolo Missier, Pinar Alper, Susheel Varma, Rob Finn, Michael Crusoe, Rizos Sakellariou
Sign up ASAP!
![Page 41: Advances in Scientific Workflow Environments](https://reader033.fdocuments.net/reader033/viewer/2022052705/58a6c12d1a28ab661f8b6d3f/html5/thumbnails/41.jpg)
Bonus Slides