CS 484
• Load Balancing
Load Balancing
• Goal: all processors working all the time
  • Efficiency of 1
  • Distribute the load (work) to meet the goal
• Two types of load balancing
  • Static
  • Dynamic
Load Balancing
• The load balancing problem is essentially the bin-packing problem
  • NP-complete (in its decision form)
• For simple cases, we can do well, but …
  • Heterogeneity
  • Different types of resources
    • Processor
    • Network, etc.
Evaluation of load balancing
• Efficiency
  • Are the processors always working?
  • How much processing overhead is associated with the load balance algorithm?
• Communication
  • Does load balance introduce or affect the communication pattern?
  • How much communication overhead is associated with the load balance algorithm?
  • How many edges are cut in the communication graph?
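The two questions above can be turned into numbers. The following is a minimal sketch, not part of the original slides: the function names, the small 2 x 3 grid, and the timing figures are all made up for illustration.

```python
def efficiency(busy_times, wall_time):
    """Fraction of total processor time spent on useful work (1.0 is ideal)."""
    return sum(busy_times) / (len(busy_times) * wall_time)


def edge_cut(edges, part):
    """Number of communication-graph edges whose endpoints are in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


# Hypothetical example: a 2 x 3 grid of work items, numbered row by row:
#   0 1 2
#   3 4 5
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
# Processor 0 owns the left four nodes, processor 1 the right column.
part = {0: 0, 1: 0, 3: 0, 4: 0, 2: 1, 5: 1}

cut = edge_cut(edges, part)                   # edges (1, 2) and (4, 5) are cut -> 2
eff = efficiency([4.0, 2.0], wall_time=4.0)   # processor 1 idles half the time -> 0.75
```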
Partitioning Techniques
• Regular grids (-: Easy :-)
  • striping
  • blocking
  • use processing power to divide load more fairly
• Generalized graphs
  • Levelization
  • Scattered decomposition
  • Recursive bisection
Example
• Consider a set of twelve independent tasks with the following execution times: {10, 6, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1}
• How would you distribute these tasks among 4 processors?
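Consecutive block assignment
• Assigning consecutive blocks of three tasks to each processor, the execution time for these twelve tasks would be 20 time units (the first processor receives 10 + 6 + 4)
• A better schedule would take only 10 time units, the minimum possible because the longest task alone takes 10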
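The two numbers can be checked with a short script. The slides do not say how the 10-unit schedule was obtained; the sketch below uses the longest-processing-time-first (LPT) greedy heuristic as one assumed way to reach it. The function names are illustrative, and `block_assignment` assumes the task count is divisible by the processor count.

```python
import heapq


def block_assignment(times, p):
    """Give each processor a consecutive block of len(times) // p tasks."""
    size = len(times) // p
    return [sum(times[i * size:(i + 1) * size]) for i in range(p)]


def lpt_assignment(times, p):
    """LPT heuristic: take tasks longest first, place each on the least-loaded processor."""
    heap = [(0, proc) for proc in range(p)]   # (current load, processor id)
    loads = [0] * p
    for t in sorted(times, reverse=True):
        load, proc = heapq.heappop(heap)
        loads[proc] = load + t
        heapq.heappush(heap, (loads[proc], proc))
    return loads


times = [10, 6, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1]
block_loads = block_assignment(times, 4)   # [20, 8, 5, 3] -> finishes at time 20
lpt_loads = lpt_assignment(times, 4)       # [10, 9, 9, 8] -> finishes at time 10
```

The finishing time is the maximum entry of each list, since all processors run in parallel.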
Evaluation of Load Balancing
• Goal: Find a good mapping from the application graph G = (V,E) onto the processor graph H = (U,F)
• Consider
  • Load: max number of nodes from G assigned to any single node of H
  • Dilation: max distance of any route of a single edge from G in H
  • Congestion: max number of edges from G that have to be routed via any single edge in H
Overall Goal: Find a mapping π that minimizes all three measures: load, dilation, and congestion
Note: Today’s networks make dilation inconsequential to some extent
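As a rough illustration, the three measures can be computed for a given mapping π. This sketch is not from the slides. It assumes each edge of G is routed along a breadth-first shortest path in H (congestion depends on the routing chosen) and that H is connected. All names and the tiny example graphs are hypothetical.

```python
from collections import Counter, deque


def shortest_path(adj, src, dst):
    """Breadth-first search in H; returns the list of nodes from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def mapping_metrics(g_edges, h_adj, pi):
    """Return (load, dilation, congestion) of the mapping pi from G's nodes to H's nodes."""
    load = max(Counter(pi.values()).values())
    dilation = 0
    edge_use = Counter()
    for u, v in g_edges:
        path = shortest_path(h_adj, pi[u], pi[v])
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):
            edge_use[frozenset((a, b))] += 1   # undirected link in H
    congestion = max(edge_use.values(), default=0)
    return load, dilation, congestion


# Hypothetical example: a 4-cycle a-b-c-d mapped onto two linked processors.
g_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
h_adj = {0: [1], 1: [0]}
pi = {"a": 0, "b": 0, "c": 1, "d": 1}
metrics = mapping_metrics(g_edges, h_adj, pi)   # (2, 1, 2)
```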
Levelization
• Begin with a boundary
  • Number these nodes 1
• All nodes connected to a level 1 node are labeled 2, etc.
• Partitioning is performed
  • determine the number of nodes per processor
  • count off the nodes of a level until exhausted
  • proceed to the next level
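The steps above can be sketched as a breadth-first labeling followed by counting off. This is an illustrative reading of the slide, not a reference implementation: ties within a level are broken by node id, the graph is assumed connected, and the helper `grid_graph` exists only to build an example.

```python
from collections import deque


def levelize(adj, boundary):
    """Label boundary nodes 1, their unlabeled neighbors 2, and so on."""
    level = {node: 1 for node in boundary}
    queue = deque(boundary)
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level


def partition_by_level(adj, boundary, p):
    """Count nodes off level by level, filling one processor before the next."""
    level = levelize(adj, boundary)
    order = sorted(level, key=lambda n: (level[n], n))
    per_proc = -(-len(order) // p)            # ceiling of n / p
    return {node: i // per_proc for i, node in enumerate(order)}


def grid_graph(rows, cols):
    """Adjacency lists of a rows x cols grid, nodes numbered row by row."""
    adj = {r * cols + c: [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            here = r * cols + c
            if c + 1 < cols:
                adj[here].append(here + 1)
                adj[here + 1].append(here)
            if r + 1 < rows:
                adj[here].append(here + cols)
                adj[here + cols].append(here)
    return adj


# Hypothetical example: 3 x 3 grid, left column as the boundary, 3 processors.
grid = grid_graph(3, 3)
part = partition_by_level(grid, boundary=[0, 3, 6], p=3)
```

Here each processor ends up with one column of the grid, because each level is exactly one column.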
Recursive Coordinate Bisection
• Divide the domain based on the physical coordinates of the nodes.
• Pick a dimension and divide in half.
• RCB uses no connectivity information
  • lots of edges crossing boundaries
  • partitions may be disconnected
• Some new research based on graph separators overcomes some problems.
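A minimal sketch of RCB follows. The slide only says "pick a dimension"; choosing the dimension with the largest extent is an assumption made here, as are the function name and the example points. Note that the code never looks at edges, which is exactly the weakness listed above.

```python
def rcb(points, ids, p):
    """Recursively split ids along a coordinate axis into p parts.

    points maps an id to its coordinate tuple. Returns a list of p lists of ids.
    When p is odd the split is proportional rather than exactly in half.
    """
    if p == 1:
        return [ids]
    dims = range(len(points[ids[0]]))

    def extent(d):
        values = [points[i][d] for i in ids]
        return max(values) - min(values)

    axis = max(dims, key=extent)              # assumed rule: widest dimension
    ordered = sorted(ids, key=lambda i: points[i][axis])
    left_p = p // 2
    split = len(ordered) * left_p // p
    return (rcb(points, ordered[:split], left_p)
            + rcb(points, ordered[split:], p - left_p))


# Hypothetical example: 8 nodes on a 4 x 2 lattice, 4 processors.
points = {i: (i % 4, i // 4) for i in range(8)}
parts = rcb(points, list(range(8)), 4)
# parts == [[0, 4], [1, 5], [2, 6], [3, 7]]: one lattice column per processor
```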
Unbalanced Recursive Bisection
• An attempt at reducing communication costs
• Create subgrids that have better aspect ratios
• Instead of dividing the grid in half, consider unbalanced subgrids of size:
  • 1/p and (p-1)/p
  • 2/p and (p-2)/p
  • etc.
• Choose the partition size that minimizes the subgrid aspect ratio
Graph Theory Based Algorithms
• Geometric algorithms are generally low quality
  • they don’t take into account connectivity
• Graph theory algorithms apply what we know about generalized graphs to the partitioning problem
  • Hopefully, they reduce the cut size
Greedy Bisection
• Start with a vertex of the smallest degree
  • least number of edges
• Mark all its neighbors
• Mark all its neighbors' neighbors, etc.
• The first n/p marked vertices form one subdomain
• Apply the algorithm on the remaining vertices
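One possible reading of these steps in code is shown below. Details the slide leaves open are assumptions: degree is counted among the still-unassigned vertices, ties go to the lowest vertex id, and if the marking runs out of neighbors before reaching n/p vertices it restarts from another low-degree vertex.

```python
from collections import deque


def greedy_partition(adj, p):
    """Grow subdomains of about n/p vertices each by breadth-first marking."""
    remaining = set(adj)
    target = -(-len(adj) // p)                # ceiling of n / p

    def degree(v):
        return sum(1 for w in adj[v] if w in remaining)

    parts = []
    while remaining:
        marked, queue = [], deque()
        while remaining and len(marked) < target:
            if not queue:
                # (Re)start from the unassigned vertex of smallest remaining degree.
                seed = min(sorted(remaining), key=degree)
                remaining.discard(seed)
                marked.append(seed)
                queue.append(seed)
                continue
            v = queue.popleft()
            for w in adj[v]:
                if w in remaining and len(marked) < target:
                    remaining.discard(w)
                    marked.append(w)
                    queue.append(w)
        parts.append(marked)
    return parts


# Hypothetical example: a 2 x 3 grid, numbered row by row:
#   0 1 2
#   3 4 5
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
parts = greedy_partition(adj, 2)   # [[0, 1, 3], [2, 5, 4]]
```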
Recursive Graph Bisection
• Based on graph distance rather than coordinate distance.
• Determine the two furthest separated nodes
• Organize and partition nodes according to their distance from extremities.
• Computationally expensive
  • Can use approximation methods.
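A single bisection step might look like the sketch below. It uses one of the approximation methods hinted at above: instead of computing all-pairs distances, two breadth-first sweeps find a pair of far-apart vertices. The ordering key (distance to one extremity minus distance to the other) is an assumption, and the sketch assumes the vertex set is connected. Applying the function again to each half would give the recursive scheme.

```python
from collections import deque


def bfs_dist(adj, src, nodes):
    """Graph distance from src to every reachable vertex inside nodes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in nodes and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def graph_bisect(adj, nodes):
    """Split nodes in half by graph distance from two approximate extremities."""
    nodes = set(nodes)
    d0 = bfs_dist(adj, min(nodes), nodes)
    a = max(sorted(d0), key=d0.get)           # far from an arbitrary start
    da = bfs_dist(adj, a, nodes)
    b = max(sorted(da), key=da.get)           # far from a: approximate diameter pair
    db = bfs_dist(adj, b, nodes)
    # Vertices closer to a than to b come first; ties are broken by vertex id.
    order = sorted(nodes, key=lambda v: (da[v] - db[v], v))
    half = len(order) // 2
    return order[:half], order[half:]


# Hypothetical example: a 2 x 3 grid, numbered row by row.
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
near_a, near_b = graph_bisect(adj, adj)   # ([5, 2, 4], [1, 3, 0])
```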
Recursive Spectral Bisection
• Minimize the number of edges cut with the partition
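The transcript gives no algorithmic detail for this slide. In the usual formulation of spectral bisection, vertices are ordered by their entry in the Fiedler vector (the eigenvector of the graph Laplacian for its second-smallest eigenvalue) and split at the median. The sketch below follows that formulation as an assumption. To stay dependency-free it approximates the vector with plain power iteration, which is slow and only suitable for small connected graphs.

```python
import random


def fiedler_vector(adj, iters=500, seed=0):
    """Approximate the Fiedler vector by power iteration on (c*I - L),
    repeatedly projecting out the constant vector."""
    nodes = sorted(adj)
    c = 2 * max(len(adj[v]) for v in nodes)   # upper bound on eigenvalues of L
    rng = random.Random(seed)
    x = {v: rng.random() for v in nodes}
    for _ in range(iters):
        mean = sum(x.values()) / len(nodes)
        x = {v: x[v] - mean for v in nodes}   # stay orthogonal to the constant vector
        # (L x)[v] = deg(v) * x[v] - sum of x over the neighbors of v
        y = {v: c * x[v] - (len(adj[v]) * x[v] - sum(x[w] for w in adj[v]))
             for v in nodes}
        norm = sum(val * val for val in y.values()) ** 0.5
        x = {v: y[v] / norm for v in nodes}
    return x


def spectral_bisect(adj):
    """Split the vertices at the median of their Fiedler-vector entries."""
    x = fiedler_vector(adj)
    order = sorted(adj, key=lambda v: (x[v], v))
    half = len(order) // 2
    return order[:half], order[half:]


# Hypothetical example: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
side_a, side_b = spectral_bisect(adj)
cut = sum(1 for v in side_a for w in adj[v] if w in side_b)
```

For this graph the split separates the two triangles, so only the bridge edge is cut. Which triangle lands in `side_a` depends on the sign of the computed vector.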
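• RCB (Recursive Coordinate Bisection): 529 edges cut
• RGB (Recursive Graph Bisection): 618 edges cut
• RSB (Recursive Spectral Bisection): 299 edges cut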
Dynamic Load Balancing
• Load is statically partitioned initially
• Adjust load when an imbalance is detected.
• Objectives
  • rebalance the load
  • keep edge cut minimized (communication)
  • avoid having too much overhead
Dynamic Load Balancing
• Consider adaptive algorithms
• After an interval of computation
  • mesh is adjusted according to an estimate of the discretization error
    • coarsened in areas
    • refined in others
• Mesh adjustment causes load imbalance
Centralized DLB
• Control of the load is centralized
• Two approaches
  • Master-worker (task scheduling)
    • Tasks are kept in a central location
    • Workers ask for tasks
    • Requires that you have lots of tasks with weak locality requirements
    • No major communication between workers
  • Load monitor
    • Periodically, monitor load on the processors
    • Adjust load to keep optimal balance
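The master-worker approach can be illustrated with a central queue that workers pull from. This is a sketch of the pattern only, using threads within one process. The function name and the squaring "work" are placeholders, and in standard CPython threads will not speed up CPU-bound work.

```python
import queue
import threading


def run_master_worker(tasks, num_workers, work):
    """Workers repeatedly ask a central queue for the next task until it is empty."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()   # "ask for a task"
            except queue.Empty:
                return                        # no work left
            value = work(task)
            with lock:                        # results list is shared
                results.append((task, value))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical example: 12 independent tasks, 4 workers.
results = run_master_worker(range(12), num_workers=4, work=lambda n: n * n)
```

A worker that draws short tasks simply comes back for more, so the load balances itself without any worker-to-worker communication. The order of `results` depends on scheduling.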
Decentralizing DLB
• Generally focused on work pool
• Two approaches
  • Hierarchy
  • Fully distributed
Fully Distributed DLB
• Lower overhead than centralized schemes.
• No global information
• Load is locally optimized
• Propagation is slow
• Load balance may not be as good as centralized load balance scheme
• Three steps
  • Flow calculation (how much to move)
  • Mesh node selection (which work to move)
  • Actual mesh node migration
Flow calculation
• View as a network flow problem
• Add source and sink nodes
• Connect source to all nodes
  • edge value is current load
• Connect sink to all nodes
  • edge value is mean load
• processor communication graph
Flow calculation
• Many network flow algorithms
  • more intense than necessary
  • not parallel
• Use simpler, more scalable algorithms
• Random Matchings
  • pick random neighboring processes
  • exchange some load
  • eventually you may get there
Diffusion
• Each processor balances its load with all its neighbors
• How much work should I have? (… is a weighting factor)
• How much to send on an edge?
• Repeat until all load is balanced
  • … steps
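The formulas on this slide did not survive in the transcript. The sketch below therefore uses the standard first-order diffusion scheme as an assumption: processor i sends alpha * (l_i - l_j) along each edge (i, j), so its new load is l_i + alpha * sum over neighbors j of (l_j - l_i). The symbol `alpha`, the tolerance, and the ring example are all illustrative.

```python
def diffusion_step(loads, adj, alpha):
    """One synchronous step: each edge (i, j) moves alpha * (l_i - l_j) from i to j."""
    return {i: loads[i] + alpha * sum(loads[j] - loads[i] for j in adj[i])
            for i in adj}


def diffuse(loads, adj, alpha, tol=1e-9, max_steps=10_000):
    """Repeat diffusion steps until every load is within tol of the mean."""
    mean = sum(loads.values()) / len(loads)
    steps = 0
    while max(abs(l - mean) for l in loads.values()) > tol and steps < max_steps:
        loads = diffusion_step(loads, adj, alpha)
        steps += 1
    return loads, steps


# Hypothetical example: four processors in a ring, all work starts on processor 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
balanced, steps = diffuse({0: 8.0, 1: 0.0, 2: 0.0, 3: 0.0}, ring, alpha=0.25)
```

Each step only needs the loads of direct neighbors, so it runs without global information. For a processor graph with maximum degree d, any alpha up to 1/(d + 1) keeps the iteration convergent on a connected graph.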
Diffusion
• Convergence to load balance can be slow
• Can be improved with over-relaxation
  • Monitor what is sent in each step
  • Determine how much to send based on current imbalance and how much was sent in previous steps
• Diffuses load in … steps
Dimension Exchange
• Rather than communicate with all neighbors each round, only communicate with one
  • synchronous algorithm
• Comes from dimensions of hypercube
  • Use edge coloring for general graphs
• Exchange load with neighbor along a dimension
  • l = (li + lj)/2
• Will converge in d steps if hypercube
• Some graphs may need different factor to converge faster
  • l = li * a + lj * (1 – a)
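The hypercube case can be sketched in a few lines. Processor ids are taken to be the integers 0 to 2^d - 1, so the neighbor across dimension k is found by flipping bit k. This numbering, the function name, and the example loads are assumptions made for illustration.

```python
def dimension_exchange(loads, d, a=0.5):
    """Balance loads on a d-dimensional hypercube, one dimension per step.

    loads[i] is the load of processor i (0 <= i < 2**d). With a = 0.5 each
    pair ends up with the average, as in l = (li + lj) / 2.
    """
    loads = list(loads)
    for k in range(d):
        new = loads[:]
        for i in range(len(loads)):
            j = i ^ (1 << k)                  # neighbor across dimension k
            new[i] = loads[i] * a + loads[j] * (1 - a)
        loads = new
    return loads


# Hypothetical example: 3-dimensional hypercube, all work starts on processor 0.
balanced = dimension_exchange([8.0, 0, 0, 0, 0, 0, 0, 0], d=3)
# After dimension 0: [4, 4, 0, 0, 0, 0, 0, 0]
# After dimension 1: [2, 2, 2, 2, 0, 0, 0, 0]
# After dimension 2: every processor holds 1.0
```

Each exchange conserves the pair's total load for any factor `a`, since the two partners receive li * a + lj * (1 - a) and lj * a + li * (1 - a).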