Push: a Dataflow Shell



1. Observation (Made by Streamline etc...)

This ... [figure not reproduced] ... is just a large combination of these: [figure: functions f1 to f5 linked by numbered pipes] ... which becomes this dataflow pipelined command set.

See also: http://www.research.ibm.com/hare

http://code.google.com/p/push/

2. If everything’s a pipe in Dataflow programming, why not use a shell?

References

• Willem de Bruijn. Adaptive Operating System Design for High Throughput I/O. PhD thesis, Vrije Universiteit Amsterdam, 2010.

• M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Proceedings of the 2007 conference on EuroSys, pages 59–72. ACM Press, New York, NY, USA, 2007.

This command...

f1 |< f3 >| f5

... transforms to this syntax tree ...

[Figure: syntax tree; recoverable node labels: >|, cmd |< cmd, $ irf, cmd cmd cmd, $ orf, f1, f3, f5]

Job Distribution

XCPU3

laptop → GPU task → Cell task → BG/P task

See the XCPU3 poster for more details on job distribution.

3. How?

• Shell should be the orchestrator
• Need a way to do Pipe → Fork → Exec over a large number of machines
• Need a way of moving from byte streams to records
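As a point of reference for the Pipe → Fork → Exec sequence named above, here is a minimal single-machine sketch in Python. It is not Push's implementation: the function name `pipe_fork_exec` is hypothetical, it assumes a POSIX system, and it says nothing about how the sequence would be distributed across machines.

```python
import os
import sys


def pipe_fork_exec(argv, data: bytes) -> bytes:
    """Run argv as a child process, feed `data` to its stdin, return its stdout.

    Hypothetical helper for illustration only (POSIX, one machine).
    """
    in_r, in_w = os.pipe()    # parent writes -> child stdin
    out_r, out_w = os.pipe()  # child stdout -> parent reads
    pid = os.fork()
    if pid == 0:
        # Child: wire the pipe ends to stdin/stdout, then replace the process image.
        os.dup2(in_r, 0)
        os.dup2(out_w, 1)
        for fd in (in_r, in_w, out_r, out_w):
            os.close(fd)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    # Parent: close the ends that belong to the child.
    os.close(in_r)
    os.close(out_w)
    # Note: writing everything before reading can block if the input exceeds
    # the pipe buffer while the child is also blocked on output; fine for small inputs.
    view = memoryview(data)
    while view:
        view = view[os.write(in_w, view):]
    os.close(in_w)  # signals EOF to the child

    chunks = []
    while chunk := os.read(out_r, 4096):
        chunks.append(chunk)
    os.close(out_r)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(pipe_fork_exec(upper, b"push\n"))
```

The child here is just a second Python interpreter that upper-cases its input; any command that reads stdin and writes stdout could stand in for it.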

4. Dataflow pipes

cmd1 |< cmd2 >| cmd3

• |< Fanout: one to many
• >| Fanin: many to one
• Must be paired
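The "must be paired" rule can be illustrated with a small checker. This is only a sketch: the function name `fan_ops_paired` is hypothetical, tokens are assumed to be whitespace separated, and nested fanout/fanin pairs are assumed to be allowed, which the poster does not state.

```python
def fan_ops_paired(pipeline: str) -> bool:
    """Return True if every fanout `|<` is later closed by a fanin `>|`.

    Assumptions (not from the poster): whitespace-separated tokens,
    and nesting of fanout/fanin pairs is permitted.
    """
    depth = 0  # number of fanouts still waiting for their fanin
    for token in pipeline.split():
        if token == "|<":
            depth += 1
        elif token == ">|":
            if depth == 0:
                return False  # fanin with no preceding fanout
            depth -= 1
    return depth == 0


if __name__ == "__main__":
    print(fan_ops_paired("cmd1 |< cmd2 >| cmd3"))  # paired
    print(fan_ops_paired("cmd1 |< cmd2"))          # fanout never closed
```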

5. Record handling in pipes

• User defined: implicit or explicit
• ORF (Output Record Filter)
  • default hashes 1 to many
  • newline separated
• IRF (Input Record Filter)
  • default merges buffers on newlines
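One way to read the default filters described above is sketched below. The function names `orf` and `irf`, the use of CRC32 as the hash, and the `(source id, bytes)` chunk format are all assumptions made for illustration; the poster only says that the default ORF hashes newline-separated records one to many and that the default IRF merges buffers on newlines.

```python
import zlib
from typing import Iterable, Iterator


def orf(stream: bytes, n: int) -> list[list[bytes]]:
    """Fanout sketch: cut a byte stream into newline-terminated records
    and hash each record to one of n outputs (CRC32 is an assumption)."""
    *complete, rest = stream.split(b"\n")
    records = [r + b"\n" for r in complete]
    if rest:
        records.append(rest)  # trailing record without a newline
    outputs: list[list[bytes]] = [[] for _ in range(n)]
    for record in records:
        outputs[zlib.crc32(record) % n].append(record)
    return outputs


def irf(chunks: Iterable[tuple[int, bytes]]) -> Iterator[bytes]:
    """Fanin sketch: `chunks` are (source id, bytes) pairs in arrival order.
    Partial lines are buffered per source so only whole records are emitted."""
    pending: dict[int, bytes] = {}
    for src, chunk in chunks:
        buf = pending.get(src, b"") + chunk
        *complete, rest = buf.split(b"\n")
        pending[src] = rest
        for record in complete:
            yield record + b"\n"
    for rest in pending.values():
        if rest:
            yield rest  # flush unterminated leftovers at end of input


if __name__ == "__main__":
    print(orf(b"a\nb\nc\nd\n", 3))
    print(list(irf([(0, b"he"), (1, b"wor"), (0, b"llo\n"), (1, b"ld\n")])))
```

Buffering per source is what keeps interleaved writes from different workers from splicing into each other mid-record, which is the practical reason to move from byte streams to records.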

6. Research Challenges + Future Work

• Exascale Pipe → Fork → Exec
• Graph optimization at XCPU3
• Cloud integration
• Work-stealing

7. Conclusions

• Systems level, not language level
• Easy to change record handling
• Configurable degree of parallelism
• Cross platform (Win32, Linux, OSX)
• Not batch, interactive

This work has been supported by the Department of Energy Office of Science Operating and Runtime Systems for Extreme Scale Scientific Computation project under contract #DE-FG02-08ER25851.

Noah Evans, Eric Van Hensbergen