Starting Work ow Tasks Before They’re...
Transcript of Starting Work ow Tasks Before They’re...
Starting Workflow TasksBefore They’re Ready
Wladislaw Gusew, Bjorn Scheuermann
Computer Engineering Group, Humboldt University of Berlin
Agenda
I Introduction
I Execution semantics
I Methods and tools
I Simulation results
I Experimental results
I Conclusion
1 / 21
Big data in research
2 / 21
Scientific workflow example
I Directed Acyclic Graph(DAG)
I Executed on distributedsystems
I Aggregation and broadcasttypes of tasks
I Demanding for networkresources
3 / 21
Execution semantics
4 / 21
Execution semantics
4 / 21
Execution semantics
I But in reality resources are limited
I Execute only a subset of parent tasks concurrently(insufficient number of workers)
I Congestion of network (all parent tasks have the same priority)
4 / 21
Example execution
5 / 21
Example execution
5 / 21
Example execution
5 / 21
Example execution
I Network congestion can slow down processing even further(effects of data losses at the transport protocol layer)
I High delay to the start of the aggregation task
I Low performance andhigh execution costs (e.g., in computation clouds)
5 / 21
What can we do to improve this?
6 / 21
What can we do to improve this?
6 / 21
What can we do to improve this?
6 / 21
What can we do to improve this?
6 / 21
What can we do to improve this?
6 / 21
What can we do to improve this?
6 / 21
What can we do to improve this?
List of actions:
1. Obtain information on task’s input characteristics
2. Refine the workflow and inform the execution engine
3. Let the aggregation task ”feel comfortable” in changed setting
6 / 21
What can we do to improve this?
List of actions:
1. Obtain information on task’s input characteristics
2. Refine the workflow and inform the execution engine
3. Let the aggregation task ”feel comfortable” in changed setting
6 / 21
Obtaining input characteristics
1. Annotations to workflows
2. Manual code review
3. Automated profiling
7 / 21
Automated profiling
I Operating system instrumentation tool
I Enables interception of system calls(file open, read/write, file close)
I Record and evaluate logfiles withtraces of conducted file accesses.
0
0.5
1
1.5
2
2.5
3
0 0.5 1 1.5 2 2.5 3
Read accesses [M
B]
Execution progress [108 CPU cycles]
Reads by mAdd in a small workflow
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
0 2 4 6 8 10 12 14 16 18
Read accesses [M
B]
Execution progress [108 CPU cycles]
Reads by mAdd in a medium sized workflow
8 / 21
Automated profiling
I Operating system instrumentation tool
I Enables interception of system calls(file open, read/write, file close)
I Record and evaluate logfiles withtraces of conducted file accesses.
0
0.5
1
1.5
2
2.5
3
0 0.5 1 1.5 2 2.5 3
Read accesses [M
B]
Execution progress [108 CPU cycles]
Reads by mAdd in a small workflow
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
0 2 4 6 8 10 12 14 16 18
Read accesses [M
B]
Execution progress [108 CPU cycles]
Reads by mAdd in a medium sized workflow
8 / 21
Refining workflow by transforming DAG
9 / 21
Refining workflow by transforming DAG
9 / 21
Refining workflow by transforming DAG
9 / 21
Refining workflow by transforming DAG
9 / 21
Realizing virtual task split
I Real task is transparently wrapped
I FUSE enables the setup of a virtualFile system in USEr space
I Access to input files is performedthrough our wrapper
I Wrapper is responsible for maintainingthe correct execution logic
10 / 21
Evaluation with the Montage workflow
11 / 21
Simulating workflow execution
I Java-based simulation framework for scientific workflows
I Simulates an execution on a Pegasus/HTCondor stack
I Use provided Montage workflows with 25, 50, 100, 1000 tasks
I Python script conducted DAG transformation of DAX files
I Network configured as bottleneck (by bandwidth limitation)
W. Chen and E. Deelman, ”WorkflowSim: A toolkit for simulating scientific workflowsin distributed environments,” in eScience’12.
12 / 21
Simulation results
13 / 21
Simulation results
13 / 21
Variation of number of tasks
1
10
100
1000
25 50 100 1000
Total workflow runtime (log.) [s]
Number of tasks
Simulation results for 50 workers and max-min
Normal Split
15% 19%25%
31%
14 / 21
Variation of workers
15 / 21
Variation of workers
100
150
200
250
300
350
400
450
5 10 50 100
Total workflow runtime [s]
Number of workers
Simulation results for Montage100 and min-min
Normal Split
10%
14%
26% 25%
16 / 21
Variation of scheduling algorithms
17 / 21
Variation of scheduling algorithms
0
50
100
150
200
250
300
350
Min-minMax-min
Round-robin
HEFTDHEFT
Random
Total workflow runtime [s]
Scheduling algorithm
Simulation results for Montage100 on 100 workers
Normal Split
25% 27% 28% 25%
17% 34%
18 / 21
Evaluation in a computing cluster
I Small cluster of up to 10 compute nodes
I Intel i7 CPU@ 2.5GHz, 8GB RAM, connected to commonnetwork switch with 1Gbit/s
I Execute Montage 133 workflow in Pegasus/HTCondor
I Network bandwidth was limited on application layer to10Mbit/s
I 10 repetitions, mean values with 95% confidence intervals
19 / 21
Measurement results
0
20
40
60
80
100
120
140
160
180
200
1 2 3 4 5 6 7 8 9 10
Total workflow runtime [s]
Number of computing nodes
Computing cluster results for 1...10 workers
Original Montage133Transformed Montage133
20 / 21
Conclusion
I Many ”legacy” workflows exist which are executed with classicsemantics
I Our approach is applicable to aggregation tasks that are oftenthe most time intensive tasks in a workflow
I By using DAG transformation, no changes to taskimplementations and execution engines are required
I Simulation and real experiment show that performance can beimproved by up to 15%
I Potential of outperforming the original workflow grows withincreasing #workers and #tasks
21 / 21
Conclusion
I Many ”legacy” workflows exist which are executed with classicsemantics
I Our approach is applicable to aggregation tasks that are oftenthe most time intensive tasks in a workflow
I By using DAG transformation, no changes to taskimplementations and execution engines are required
I Simulation and real experiment show that performance can beimproved by up to 15%
I Potential of outperforming the original workflow grows withincreasing #workers and #tasks
21 / 21