1
Scientific Workflow Interchanging Through Patterns:
Reversals and Lessons Learned
Bruno Fernandes Bastos
Regina Maria Maciel Braga
Antônio Tadeu Azevedo Gomes
2
Agenda
• Introduction
  – Problem Formulation and Initial Hypothesis
• Envisioned Solution
• Preliminary Experiments
• Reformulated Hypothesis
• Qualitative Analysis of the Research Material
  – The myExperiment Repository
• Related Work
• Conclusions
3
Introduction
• Scientific workflows are used for tackling complex problems in different e-science domains
  – They may be described as a directed graph where the vertices represent the tasks and the edges represent the data relationships between the tasks
• Several Scientific Workflow Management Systems (SWfMSs) have been developed for
  – Specifying scientific workflows with abstractions at a higher level than scripts (Workflow Specification Languages, WfSLs),
  – Orchestrating the execution of the tasks, and
  – Managing the data consumed and produced by these workflows.
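
The directed-graph view described on this slide can be sketched in a few lines of Python. This is only an illustrative model, not the representation used by any particular SWfMS: the `Workflow` class, its method names, and the task names in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A workflow as a directed graph: tasks are vertices, data links are edges."""

    tasks: set[str] = field(default_factory=set)
    links: set[tuple[str, str]] = field(default_factory=set)  # (producer, consumer)

    def add_link(self, producer: str, consumer: str) -> None:
        """Record that data produced by `producer` is consumed by `consumer`."""
        self.tasks.update((producer, consumer))
        self.links.add((producer, consumer))

    def successors(self, task: str) -> set[str]:
        """Tasks that directly consume data from `task`."""
        return {dst for src, dst in self.links if src == task}

    def execution_order(self) -> list[str]:
        """One valid execution order (Kahn's algorithm); raises on a cycle."""
        indegree = {t: 0 for t in self.tasks}
        for _, dst in self.links:
            indegree[dst] += 1
        ready = sorted(t for t, d in indegree.items() if d == 0)
        order: list[str] = []
        while ready:
            task = ready.pop(0)
            order.append(task)
            for nxt in sorted(self.successors(task)):
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.tasks):
            raise ValueError("workflow graph contains a cycle")
        return order
```

For example, linking a hypothetical `fetch` task to `align` and `filter`, and both of those to `report`, gives a graph whose `execution_order()` starts with `fetch` and ends with `report`.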
4
Problem Formulation
• We formulated our research problem as follows:
  – The state of the art in SWfMSs does not allow a scientist to easily reuse workflow specifications that were modeled in SWfMSs other than the ones the scientist normally works with.
5
Initial Hypothesis
• The use of workflow patterns could help in keeping the semantics of a workflow
  – The use of workflow patterns combined with software architecture concepts to capture the key semantics expressed in workflow specifications enables the establishment of automated processes that transform these specifications across different SWfMSs.
• These processes allow for a reduction in the effort scientists would make to reuse workflow specifications developed by other research groups in SWfMSs that are not part of the usual tooling these scientists employ in their daily work.
6
Envisioned Solution
• A novel language for interchanging workflow specifications
  – Using the Acme architecture description interchange language
• It was based on the specification of a single architectural style where the components were the tasks and the connectors were the patterns
  – Definition of an interchangeable workflow: a workflow composed of a set of “interchangeable elements”
    • Constants, subworkflows and web service tasks
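
The definition of an interchangeable workflow on this slide can be read as a simple membership test over task kinds. The sketch below assumes each task carries a kind label; the labels (`"constant"`, `"subworkflow"`, `"webservice"`) and the function names are hypothetical, and treating an empty workflow as not interchangeable is also an assumption of this sketch.

```python
# Hypothetical kind labels for the three interchangeable element types
# named on the slide: constants, subworkflows and web service tasks.
INTERCHANGEABLE_KINDS = frozenset({"constant", "subworkflow", "webservice"})


def interchangeable_fraction(task_kinds: list[str]) -> float:
    """Fraction of tasks whose kind is an interchangeable element."""
    if not task_kinds:
        return 0.0
    hits = sum(1 for kind in task_kinds if kind in INTERCHANGEABLE_KINDS)
    return hits / len(task_kinds)


def is_interchangeable(task_kinds: list[str]) -> bool:
    """True when every task in a non-empty workflow is an interchangeable element."""
    return bool(task_kinds) and all(
        kind in INTERCHANGEABLE_KINDS for kind in task_kinds
    )
```

A workflow with kinds `["constant", "webservice"]` passes the check, while adding a tool-specific kind such as a hypothetical `"local_script"` makes it fail.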
7
Envisioned Solution: Patterns
Structural
• Sequence: binds a single output port to a single input port;
• Parallel Split: binds a single output port to two or more input ports, replicating the same data from the output port to all input ports;
• Simple Merge: binds two or more output ports to a single input port, feeding the input port with data received from each output port in an interleaved way;
Behavioral
• Synchronization: similar in structure to the Simple Merge pattern, but the task with the input port may be executed only when data coming from all the output ports has been received and grouped according to some criteria;
• Exclusive Choice: similar in structure to the Parallel Split pattern, but only one of the input ports may receive data from the output port, according to some condition.
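
The three structural patterns can be told apart from port bindings alone, which the following Python sketch illustrates. It is not the identification procedure used in this work: the function name, the tuple layout of the result, and the port names are hypothetical. Because Synchronization and Exclusive Choice share their structure with Simple Merge and Parallel Split, a structural pass like this one cannot distinguish them; that would need behavioral information the sketch does not model.

```python
from collections import defaultdict


def classify_structural_patterns(links):
    """Classify port bindings into Sequence, Parallel Split and Simple Merge.

    `links` is an iterable of (output_port, input_port) pairs.
    """
    links = sorted(set(links))
    fan_out = defaultdict(set)  # output port -> input ports it feeds
    fan_in = defaultdict(set)   # input port -> output ports feeding it
    for out_port, in_port in links:
        fan_out[out_port].add(in_port)
        fan_in[in_port].add(out_port)

    patterns = []
    # Parallel Split: one output port bound to two or more input ports.
    for out_port in sorted(fan_out):
        if len(fan_out[out_port]) >= 2:
            patterns.append(
                ("parallel_split", out_port, tuple(sorted(fan_out[out_port])))
            )
    # Simple Merge: two or more output ports bound to one input port.
    for in_port in sorted(fan_in):
        if len(fan_in[in_port]) >= 2:
            patterns.append(
                ("simple_merge", tuple(sorted(fan_in[in_port])), in_port)
            )
    # Sequence: a single output port bound to a single input port.
    for out_port, in_port in links:
        if len(fan_out[out_port]) == 1 and len(fan_in[in_port]) == 1:
            patterns.append(("sequence", out_port, in_port))
    return patterns
```

Given a diamond-shaped workflow where `b.out` feeds `c.in` and `d.in`, and `c.out` and `d.out` both feed `e.in`, the function reports one Parallel Split at `b.out` and one Simple Merge at `e.in`.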
8
Workflow Pattern Identification
• Patterns may be implemented in different ways
  – Depending on the features each SWfMS supports
  – E.g., the Exclusive Choice pattern
9
Preliminary Experiments
• Experiment Planning
  – 4 VisTrails, 46 Kepler and 1452 Taverna specifications
• For the 1st hypothesis the task type matters
  – VisTrails has only one Web Service and it is not available
  – Kepler has 45 types of tasks but none of them is a Web Service
  – Taverna has more than 100 types and many Web Services
10
Preliminary Experiments
• Analysis of the workflow transformations
  – 53% of the Taverna tasks were interchangeable
[Figures: Quantity of Tasks; Quantity of Interchangeable Workflows]
11
Reformulated Hypothesis
The use of workflow patterns and software architecture concepts to capture the key structural semantics expressed in workflow specifications enables the establishment of semi-automated processes that transform these specifications across different WfSLs. These processes allow for a reduction in the effort scientists would make to reuse structurally complex workflow specifications (in the sense of having a large number of tasks and dependency relationships between these tasks) developed by other research groups in SWfMSs that are not part of the usual tooling these scientists employ in their daily work.
12
Further Experiments
• After interchanging the workflow structures we could interchange almost all workflows (98.28%)
  – Problems related to pattern identification
13
Qualitative Analysis of the Research Material
• The myExperiment repository
  – Web service tasks implemented as either local, inaccessible, or authenticated, which made it impossible to execute these workflows, even in their source specifications
  – Lack of documentation: most of the analyzed workflows have little or no metadata
• Similar problems reported in the Wf4Ever project
  – Proposal of a new myExperiment repository
14
Qualitative Analysis of the Research Material
• The studied systems
  – Once a task has its type defined and its input and output ports linked to other tasks, its type cannot be changed, so the task needs to be removed
    • Once it is removed, the relations are gone!
    • This reduces the utility of our approach
  – Some SWfMSs have limitations
    • VisTrails does not export subworkflows
15
Related Work
• Taverna 2-Galaxy and Tavaxy
  – Limited to two SWfMSs, and their adaptability to a broader range of SWfMSs would depend on a complete reformulation of their architectures
    • Although Tavaxy brings the patterns approach
• IWIR
  – Most similar to ours
  – Syntactical structures that are quite similar to those defined for the SWfMSs
• Other works
16
Conclusions
• This research endeavor started with exploratory studies aiming at identifying whether it would be possible to establish “future-proof” automated processes for transforming workflows between different SWfMSs.
• It was unclear whether the perceived problem actually exists, and the experimental data we employed may point in a different direction.
• The fact that the myExperiment repository is full of “toy” workflows made it harder to execute a proof of concept.
17
Questions