DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines

Mihai Budiu, Daniel Delling, Renato Werneck
Microsoft Research - Silicon Valley
IEEE International Parallel & Distributed Processing Symposium, IPDPS 2011


Page 1: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines

Mihai Budiu, Daniel Delling, Renato Werneck

Microsoft Research - Silicon Valley

IEEE International Parallel & Distributed Processing Symposium

IPDPS 2011

Page 2: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

DDPEEs

• Application: DryadOpt, your problem
• Language: FlumeJava; DryadLINQ, Scope; Pig, Hive
• Execution: Map-Reduce; Dryad; Hadoop
• Storage: GFS, BigTable; Cosmos, Azure, HPC; HDFS

Page 3: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Branch-And-Bound (BB)

• Solve optimization problems
• Explore the tree of potential solutions
• Bound solution cost
• Prune search
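The bullets on this slide can be made concrete with a minimal sequential sketch. It solves a 0/1 knapsack instance, chosen here purely for illustration (the talk itself benchmarks a Steiner tree solver). The function name `knapsack_bb` and the fractional-relaxation bound are assumptions of this sketch, not part of DryadOpt.

```python
from typing import List


def knapsack_bb(values: List[int], weights: List[int], capacity: int) -> int:
    """Maximize total value within `capacity` by depth-first branch-and-bound.

    Assumes every weight is strictly positive.
    """
    # Sort by value density so the greedy fractional bound below is valid.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0

    def upper_bound(i: int, value: int, room: int) -> float:
        # Bound: fill the remaining room greedily, allowing one fractional item.
        bound = float(value)
        for v, w in items[i:]:
            if w <= room:
                room -= w
                bound += v
            else:
                bound += v * room / w
                break
        return bound

    def explore(i: int, value: int, room: int) -> None:
        nonlocal best
        best = max(best, value)  # every node is a feasible partial selection
        if i == len(items):
            return
        if upper_bound(i, value, room) <= best:
            return  # prune: this subtree cannot beat the incumbent
        v, w = items[i]
        if w <= room:
            explore(i + 1, value + v, room - w)  # branch: take item i
        explore(i + 1, value, room)  # branch: skip item i

    explore(0, 0, capacity)
    return best
```

For example, `knapsack_bb([60, 100, 120], [10, 20, 30], 50)` returns 220, and the subtree that skips the two densest items is pruned without being explored.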

Page 4: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Optimization Problems

• Minimize/maximize cost
• Many are NP-hard
• Arise frequently in practice
• Parallelism = linear speedup for an exponential algorithm
– may make a solution practical
– e.g., one CPU-year of work done in a day
– real-world instances are not always hard
– relatively small problems
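A back-of-the-envelope calculation shows what linear speedup buys against exponential growth. The sketch assumes ideal (perfectly linear) speedup and a full search tree with no pruning; both are simplifications, and the function names are made up for this illustration.

```python
import math

HOURS_PER_CPU_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def wall_clock_hours(cpu_years: float, cores: int) -> float:
    """Elapsed hours if the work splits perfectly across `cores`."""
    return cpu_years * HOURS_PER_CPU_YEAR / cores


def extra_levels(cores: int, branching: int = 2) -> float:
    """Extra depth of a full `branching`-ary tree explorable in the same time."""
    return math.log(cores, branching)
```

Under these assumptions, one CPU-year of work finishes in 24 hours on 365 cores, while 512 cores extend the reachable depth of a full binary search tree by only about 9 levels.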

Page 5: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Why Is This Work Interesting?

• Generic distributed BB implementation
– Separate sequential and parallel components
– Parallelism hidden from user
• DDPEEs offer a restricted computation model
– Communication is expensive
– DDPEEs require idempotent computations (DryadOpt uses any sequential solver)
• DryadOpt exploits parallelism well (CPU/core)

Page 6: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Generic Solution Search

Figure: the user supplies the sequential solver, we supply DryadOpt, and the two meet at the Solver API.

Page 7: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Concern Separation

Optimization problem

Specialized sequential solvers: Steiner tree, travelling salesman

Solver interface

Solver engines: sequential engine, multi-core engine, distributed engine (DryadOpt)

Page 8: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Outline

• Introduction
• Mapping BB to DDPEEs
• Running the algorithm
• Parallelization details
• Performance results
• Conclusions

Page 9: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

DDPEE Computation Structure

Figure: a computation graph with input, computations, communication, and output.

Computation graph is statically constructed

Page 10: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Unbalanced Search Trees

No static tree partition will work well

Page 11: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Algorithm structure

• Dynamic load-balancing
• Iterative computation

Expand tree

Load-balance

Iterate
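The expand / load-balance / iterate loop can be sketched as a single-process simulation. Everything here is an assumption made for illustration: the workers run one after another rather than on separate machines, nodes are plain integers, the load-balance step is a simple round-robin deal from a central pool, and no bounding or pruning is done.

```python
from typing import Callable, List, Tuple

Node = int  # toy node type; real sub-problems would be solver-specific


def expand_partition(frontier: List[Node], expand: Callable[[Node], List[Node]],
                     budget: int) -> Tuple[List[Node], int]:
    """Expand at most `budget` nodes depth-first; return open nodes and step count."""
    stack = list(frontier)
    steps = 0
    while stack and steps < budget:
        stack.extend(expand(stack.pop()))
        steps += 1
    return stack, steps


def rebalance(partitions: List[List[Node]]) -> List[List[Node]]:
    """Pool all open nodes and deal them out round-robin."""
    pool = [node for part in partitions for node in part]
    k = len(partitions)
    return [pool[i::k] for i in range(k)]


def run(root: Node, expand: Callable[[Node], List[Node]],
        workers: int, budget: int) -> Tuple[int, int]:
    """Iterate expand + rebalance until no open nodes remain."""
    partitions: List[List[Node]] = [[root]] + [[] for _ in range(workers - 1)]
    rounds = expanded = 0
    while any(partitions):
        results = [expand_partition(p, expand, budget) for p in partitions]
        expanded += sum(steps for _, steps in results)
        partitions = rebalance([still_open for still_open, _ in results])
        rounds += 1
    return rounds, expanded


def toy_expand(n: Node) -> List[Node]:
    """Deliberately unbalanced tree: one child shrinks slowly, the other quickly."""
    return [] if n <= 1 else [n - 1, n // 3]
```

Calling `run(12, toy_expand, workers=4, budget=5)` expands every node of the toy tree exactly once, spread over several rounds. The per-round budget is what bounds how long a worker runs before the next load-balancing step.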

Page 12: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Distributing Search Trees

Page 13: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Outline

• Introduction
• Mapping BB to DDPEEs
• Running the algorithm
• Parallelization details
• Performance results
• Conclusions

Page 14: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

1. Start tree on a single machine

Page 15: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

2. Split the open problems randomly

3. Distribute open problems

Page 16: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

4. Proceed independently

Page 17: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

5. Split Independently, Randomly

Page 18: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

6. Redistribute

Page 19: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

7. Merge

Page 20: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

8. Iterate
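The random split, redistribute, and merge steps (2, 3, and 5 to 7) amount to a shuffle in which no machine needs to know the others' load. The following is a minimal sketch under that reading. The function names are hypothetical, machines are modelled as list indices, and a seeded `random.Random` stands in for each machine's independent random choices.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_randomly(open_nodes: Sequence[T], machines: int,
                   rng: random.Random) -> List[List[T]]:
    """Scatter one machine's open problems into one bucket per destination."""
    buckets: List[List[T]] = [[] for _ in range(machines)]
    for node in open_nodes:
        buckets[rng.randrange(machines)].append(node)
    return buckets


def redistribute_and_merge(per_source: List[List[List[T]]]) -> List[List[T]]:
    """Destination d receives and concatenates bucket d from every source."""
    machines = len(per_source[0])
    return [[node for buckets in per_source for node in buckets[d]]
            for d in range(machines)]


def shuffle_round(frontiers: List[List[T]], rng: random.Random) -> List[List[T]]:
    """One split / redistribute / merge round over all machines."""
    machines = len(frontiers)
    per_source = [split_randomly(f, machines, rng) for f in frontiers]
    return redistribute_and_merge(per_source)
```

Starting from a fully skewed state such as `[list(range(1000)), [], [], []]`, one call to `shuffle_round` leaves each of the four machines with roughly a quarter of the open problems, without any central coordinator inspecting the frontier.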

Page 21: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Final Tree

Page 22: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Outline

• Introduction
• Mapping BB to DDPEEs
• Running the algorithm
• Parallelization details
• Performance results
• Conclusions

Page 23: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Bird’s Eye View

Current frontier

New frontier

Sequential solver

Load-balancing

Aggregate state

Termination test

New frontier

instance

global state

Broadcast

computation

Repeat if not done
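The end of each round (aggregate state, broadcast, termination test) can be sketched for a minimization problem. This is an illustration under strong simplifying assumptions: the global state is reduced to a single best-known cost, and each open node is represented only by its lower bound. All names are hypothetical.

```python
import math
from typing import List, Tuple


def aggregate(best_costs: List[float]) -> float:
    """Combine the per-worker incumbents into one global best cost."""
    return min(best_costs, default=math.inf)


def prune(frontier: List[float], best: float) -> List[float]:
    """Keep only open nodes whose lower bound can still beat the incumbent."""
    return [lower_bound for lower_bound in frontier if lower_bound < best]


def end_of_round(frontiers: List[List[float]],
                 best_costs: List[float]) -> Tuple[List[List[float]], float, bool]:
    """Aggregate, broadcast the result to every frontier, and test termination."""
    best = aggregate(best_costs)
    pruned = [prune(f, best) for f in frontiers]
    done = not any(pruned)  # finished once no worker has open nodes left
    return pruned, best, done
```

With frontiers `[[8.0, 10.0], [9.0, 11.5]]` and per-worker incumbents `[12.0, 9.0]`, the aggregated best cost is 9.0, only the node with bound 8.0 survives, and the computation repeats because the frontier is not yet empty.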

Page 24: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Nested Parallelism

Inter-machine parallelism

Inter-core parallelism

Partition

Merge
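The two levels of partition and merge can be sketched as nested functions. This is a stand-in only: machines are simulated by a plain loop, cores by a thread pool (which in CPython does not give true CPU parallelism), and `solve` is any function from a list of open nodes to a list of new open nodes. None of these names come from DryadOpt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def partition(items: List[T], parts: int) -> List[List[T]]:
    """Deal items round-robin into `parts` lists (some may be empty)."""
    return [items[i::parts] for i in range(parts)]


def merge(parts: List[List[T]]) -> List[T]:
    """Concatenate partial results."""
    return [item for part in parts for item in part]


def run_on_machine(frontier: List[T], cores: int,
                   solve: Callable[[List[T]], List[T]]) -> List[T]:
    """Inner level: split one machine's share across cores, then merge."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        per_core = list(pool.map(solve, partition(frontier, cores)))
    return merge(per_core)


def run_on_cluster(frontier: List[T], machines: int, cores: int,
                   solve: Callable[[List[T]], List[T]]) -> List[T]:
    """Outer level: split across machines (simulated sequentially), then merge."""
    per_machine = [run_on_machine(share, cores, solve)
                   for share in partition(frontier, machines)]
    return merge(per_machine)
```

The same `partition` and `merge` pair is reused at both levels, which is the point of the nesting: the solver only ever sees a flat list of open nodes.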

Page 25: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Other Details in Paper

• Cluster resources are unpredictable
– Outliers can lead to low cluster utilization
– Solution: use real-time scheduling
• Sequential solver is not idempotent
– Fault tolerance-triggered re-executions can lead to incorrect results
– Solution: checkpoint frontier at suitable execution points
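One way to read the checkpointing point is that a step's output frontier is saved the first time it is computed, so that a re-execution replays the saved frontier instead of calling the solver again. The sketch below illustrates that idea only. The in-memory dictionary stands in for durable storage, and `CheckpointStore`, `run_once`, and `noisy_step` are hypothetical names, not DryadOpt's mechanism.

```python
import pickle
import random
from typing import Callable, Dict, Hashable, List


class CheckpointStore:
    """In-memory stand-in for durable storage of serialized frontiers."""

    def __init__(self) -> None:
        self._blobs: Dict[Hashable, bytes] = {}

    def run_once(self, key: Hashable, step: Callable[[], List[int]]) -> List[int]:
        """Run `step` the first time `key` is seen; replay the saved result afterwards."""
        if key not in self._blobs:
            self._blobs[key] = pickle.dumps(step())
        return pickle.loads(self._blobs[key])


def noisy_step(frontier: List[int]) -> List[int]:
    """Stand-in for a non-idempotent solver: output order depends on unseeded randomness."""
    out = [child for n in frontier for child in (2 * n, 2 * n + 1)]
    random.shuffle(out)
    return out
```

Calling `store.run_once(("round-0", "part-0"), lambda: noisy_step([1, 2, 3]))` twice returns the same frontier both times, because the second call never reaches the solver.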

Page 26: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Other Details in Paper

• Trade-off memory/load balancing
– The frontier can grow very large
– Solution: dynamically adjust the tree traversal strategy (BFS/DFS)
• Sub-problems may differ little from the problem
– Many sub-problems can cause memory pressure
– Solution: use an incremental sub-problem representation
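An incremental representation can be sketched as a chain of branching decisions, where each sub-problem stores only its difference from its parent. This is an illustrative data structure, not the paper's encoding. The `SubProblem` class and the variable names in the example are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SubProblem:
    """A node that stores only the one decision separating it from its parent."""
    parent: Optional["SubProblem"] = None
    decision: Optional[Tuple[str, bool]] = None  # (variable, value); None at the root

    def child(self, variable: str, value: bool) -> "SubProblem":
        """Branch by fixing one more variable; the parent chain is shared, not copied."""
        return SubProblem(self, (variable, value))

    def decisions(self) -> Dict[str, bool]:
        """Materialize all decisions by walking back to the root."""
        out: Dict[str, bool] = {}
        node: Optional[SubProblem] = self
        while node is not None and node.decision is not None:
            variable, value = node.decision
            out.setdefault(variable, value)  # the deepest decision wins
            node = node.parent
        return out
```

Two siblings created with `root.child("x1", True).child("x2", False)` and `root.child("x1", True).child("x2", True)` each add one small object per branching step, rather than each holding a full copy of the instance.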

Page 27: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Outline

• Introduction
• Mapping BB to DDPEEs
• Running the algorithm
• Parallelization details
• Performance results
• Conclusions

Page 28: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Benchmark: Steiner Tree Solver

Page 29: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Cluster

• Machines
– 2 dual-core AMD Opteron 2.6 GHz
– 16 GB RAM
– Windows Server 2003
• DryadLINQ
• 128 machines (512 cores)

Page 30: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Scalability

Page 31: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Conclusions

• Generic parallelization (problem-independent)
• Nested machine/core parallelization
• Careful scheduling needed for good performance
• Solvers are not idempotent: interference with fault-tolerance mechanisms
• Search tree exploration is efficiently parallelizable in the DDPEE model

Page 32: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Backup Slides

Page 33: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Real-Time Scheduling

Page 34: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Figure: per-machine timelines (cluster machine vs. time) under relative-time scheduling and real-time scheduling. Legend: preempted, completed. Annotations: real-time deadlines, 61m.

Page 35: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Load-Balancing

Page 36: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Tree Traversal Strategies

• BFS
– Large frontier
– Efficient load-balancing
– Memory pressure
• DFS
– Reduces # of open subproblems
• Solution: dynamically switch between BFS and DFS
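Dynamic switching between the two traversal orders can be sketched with a double-ended queue: take the oldest node (BFS) while the frontier is small, and the newest node (DFS) once it exceeds a memory threshold. The class name and the fixed `max_open` threshold are assumptions of this sketch. The slides do not say how the switch is triggered.

```python
from collections import deque
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class AdaptiveFrontier(Generic[T]):
    """Open-node container that switches traversal order based on its size."""

    def __init__(self, max_open: int) -> None:
        self._open: Deque[T] = deque()
        self._max_open = max_open

    def push(self, node: T) -> None:
        self._open.append(node)

    def pop(self) -> T:
        if len(self._open) > self._max_open:
            return self._open.pop()  # DFS: newest first, keeps the frontier from growing
        return self._open.popleft()  # BFS: oldest first, widens the frontier

    def __len__(self) -> int:
        return len(self._open)
```

While the frontier is below the threshold, breadth-first order produces many open nodes that can be spread across machines. Above it, depth-first order works on the most recently created nodes and so limits how many sub-problems are held in memory.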

Page 37: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

The Solver API

[Serializable]
interface IBBInstance {}

[Serializable]
interface IBBGlobalState {
    void Merge(IBBGlobalState s);
    void Copy(IBBGlobalState s);
}

List<IBBInstance> Solve(List<IBBInstance> incrementalSteps, IBBGlobalState state, BBConfig c)
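To show how a solver might plug into an interface of this shape, here is a rough Python analogue. It is a sketch under assumptions: the slide gives only the signatures, so the semantics of `merge` (fold another state into this one) and `copy` (overwrite this state from another) are guesses, `BBConfig.max_steps` is an invented field, and `BestCost` is a toy state for a minimization problem.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class BBInstance(Protocol):
    """Marker for a serializable sub-problem (mirrors IBBInstance)."""


class BBGlobalState(Protocol):
    """State shared across workers (mirrors IBBGlobalState)."""

    def merge(self, other: "BBGlobalState") -> None: ...

    def copy(self, other: "BBGlobalState") -> None: ...


@dataclass
class BBConfig:
    max_steps: int = 1000  # hypothetical field; the slide does not list BBConfig's members


# Shape of the user-supplied sequential solver: open instances in, open instances out.
Solve = Callable[[List[BBInstance], BBGlobalState, BBConfig], List[BBInstance]]


@dataclass
class BestCost:
    """Toy global state: the best cost found so far."""
    value: float = float("inf")

    def merge(self, other: "BestCost") -> None:
        # Assumed semantics: combine another worker's state into this one.
        self.value = min(self.value, other.value)

    def copy(self, other: "BestCost") -> None:
        # Assumed semantics: overwrite this state with a broadcast copy.
        self.value = other.value
```

Under these assumed semantics, merging is what an aggregation step would call on the per-worker states, and copying is what each worker would call when the merged state is broadcast back.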

Page 38: DryadOpt: Branch-and-Bound on Distributed Data-Parallel  Execution Engines

Re-execution & Idempotence

Figure: re-execution scenarios for two computations, X and Y.