Distributed Load Balancing for Key-Value Storage Systems
Imranul Hoque, Michael Spreitzer, Malgorzata Steinder
2
3
Key-Value Storage Systems
• Usage:
  – Session state, tags, comments, etc.
• Requirements:
  – Scalability
  – Fast response time
  – High availability & fault tolerance
  – Relaxed consistency guarantee
• Examples: Cassandra, Dynamo, PNUTS, etc.
4
Load Balancing in K-V Storage
• Hash partitioned vs. range partitioned
  – Range-partitioned data ensures efficient range scan/search
  – Hash-partitioned data helps even distribution
[Figure: a table keyed by day of week (MON through SUN) is split into tablets, which are placed on Servers 1 to 4.]
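The contrast between the two partitioning schemes can be sketched in a few lines. This is only an illustration, not the system described in the talk: the day-of-week keys come from the slide, while the function names, the split points, and the use of CRC32 as the hash are assumptions.

```python
import bisect
import zlib

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
NUM_SERVERS = 4

def hash_partition(key, num_servers=NUM_SERVERS):
    """Hash partitioning: keys spread evenly, but neighbouring keys scatter."""
    return zlib.crc32(key.encode("utf-8")) % num_servers

# Range partitioning: each server owns a contiguous slice of the key order.
# Hypothetical split points over positions in DAYS: [0,2) [2,4) [4,6) [6,7).
SPLIT_POINTS = [2, 4, 6]

def range_partition(key):
    """Return the server that owns the range containing `key`."""
    return bisect.bisect_right(SPLIT_POINTS, DAYS.index(key))

def servers_for_scan(first, last, partition):
    """Set of servers a scan over [first, last] has to contact."""
    lo, hi = DAYS.index(first), DAYS.index(last)
    return {partition(day) for day in DAYS[lo:hi + 1]}
```

With range partitioning, a scan from MON to TUE touches a single server; with hash partitioning the same scan may touch several, which is the trade-off the slide names.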
5
Issues with Load Balancing
• Uneven space distribution due to range partitioning
  – Solution: partition the tablets and move them around
• A small number of very popular records
[Figure: day-of-week tablets placed on Servers 1 to 4.]
6
Contribution
• Algorithms for solving the load balancing problem
  – Load = space, bandwidth
  – Evenly distribute the spare capacity
  – Distributed algorithm, not a centralized one
  – Reduce the number of moves
• Previous solutions:
  – One-dimensional / key-space redistribution / bulk loading
7
Outline
• Motivation
• System modeling and assumptions
• Algorithms
  – One-to-one
  – One-to-n
  – Move suppression
• Design decisions
• Experimental results
  – Emulation of proposed distributed algorithms
• Future work
8
System Modeling and Assumptions
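[Figure: a table is split into tablets, each with a load pair (B1, S1) through (B5, S5); the tablets are hosted on Servers A, B, and C, with per-server pairs (BA, SA), (BB, SB), (BC, SC).]
Assumptions:
1. [quantity missing in source] <= 0.01 in both dimensions
2. # of tablets >> # of nodes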
9
System State
[Figure: servers plotted in the (B, S) plane, with a target point and a target zone around it.]
• Target zone: helps achieve convergence
Goal: Move tablets around so that every server is within the target zone
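The model and the goal can be written down as a small data structure plus a membership test. This is a sketch under stated assumptions: B and S are read as bandwidth and space, the target point is taken to be the per-server average load, and the target zone is taken to be a square of a given half-length around it. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Tablet:
    bandwidth: float  # B_i (assumed meaning of "B")
    space: float      # S_i (assumed meaning of "S")

@dataclass
class Server:
    name: str
    tablets: list = field(default_factory=list)

    def load(self):
        """Server load is the sum of its tablets' (B, S) pairs."""
        return (sum(t.bandwidth for t in self.tablets),
                sum(t.space for t in self.tablets))

def target_point(servers):
    """Assumed target: the average load over all servers."""
    loads = [s.load() for s in servers]
    n = len(loads)
    return (sum(b for b, _ in loads) / n, sum(s for _, s in loads) / n)

def in_target_zone(server, target, half_length):
    """True if the server lies in the square zone around the target."""
    b, s = server.load()
    return (abs(b - target[0]) <= half_length
            and abs(s - target[1]) <= half_length)
```

Balancing is finished, in this reading, once `in_target_zone` holds for every server.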
10
Load Balancing Algorithms
• Phase 1:
  – Global averaging scheme
  – Variance of the approximation of the average decreases exponentially fast
• Phase 2:
  – One-to-one gossip
  – One-to-n gossip
  – Move suppression
[Figure: timeline t alternating Phase 1, Phase 2, Phase 1, Phase 2.]
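Phase 1 can be illustrated with a small simulation. The slide does not say which averaging protocol is used, so the sketch below assumes plain pairwise (push-pull) averaging: each node repeatedly averages its estimate with one random peer. The function name and the seeded generator are illustrative only.

```python
import random
import statistics

def gossip_average(values, rounds, seed=0):
    """Simulate pairwise gossip averaging and return the final estimates."""
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # pick a peer other than i
            # Both nodes adopt the pairwise mean; the global sum is preserved.
            est[i] = est[j] = (est[i] + est[j]) / 2.0
    return est

loads = [0.0, 1.0, 0.0, 1.0, 0.5, 0.25, 0.75, 0.5]
estimates = gossip_average(loads, rounds=10)
spread_before = statistics.pvariance(loads)
spread_after = statistics.pvariance(estimates)
```

Each exchange keeps the global mean unchanged and never increases the variance of the estimates, so every node's estimate drifts toward the true average.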
11
One-to-One Gossip
• Point selection strategy
  – Midpoint strategy
  – Greedy strategy
• Tablet transfer strategy
  – Move to the selected point with minimum cost (space transferred)
12
Tablet Transfer Strategy
[Figure: Server 1 and Server 2 in the (B, S) plane, with the target for Server 1 marked.]
13
Tablet Transfer Strategy (2)
[Figure: Server 1's tablet vectors, with Left and Right pointers.]
• Start with an empty bag
• Goal: take vectors from the servers so that they add up to the target vector
• If slope(bag + left + right) < slope(target):
  – Add right to bag, move right
  – Otherwise, add left to bag, move left
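One possible reading of the bag procedure is sketched below. The slide gives only the slope test, so two things are assumptions here: tablets are ordered by slope with `left` at the flattest and `right` at the steepest, and the loop stops as soon as the next pick would not bring the bag closer to the target. The function name is hypothetical, and the sketch does not model the minimum-cost criterion.

```python
import math

def pick_tablets(tablets, target):
    """Choose (b, s) tablet vectors whose sum approximates `target`."""
    # Assumption: flattest tablet first, steepest last.
    order = sorted(tablets, key=lambda t: math.atan2(t[1], t[0]))
    left, right = 0, len(order) - 1
    bag, bag_b, bag_s = [], 0, 0

    def dist2(b, s):
        return (target[0] - b) ** 2 + (target[1] - s) ** 2

    while left <= right:
        lb, ls = order[left]
        rb, rs = order[right]
        cand_b, cand_s = bag_b + lb + rb, bag_s + ls + rs
        # slope(bag + left + right) < slope(target), compared by
        # cross-multiplication so a zero bandwidth cannot divide by zero.
        if cand_s * target[0] < target[1] * cand_b:
            pick, right = order[right], right - 1  # too flat: add a steep one
        else:
            pick, left = order[left], left + 1     # otherwise add a flat one
        # Assumed stopping rule: quit when the pick no longer helps.
        if dist2(bag_b + pick[0], bag_s + pick[1]) >= dist2(bag_b, bag_s):
            break
        bag.append(pick)
        bag_b, bag_s = bag_b + pick[0], bag_s + pick[1]
    return bag

chosen = pick_tablets([(5, 1)] * 4 + [(1, 5)] * 4, target=(15, 15))
```

In the usage line, flat and steep tablets are taken alternately, which keeps the bag's direction close to that of the target vector while it grows.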
14
Initial Configurations
[Figure: three initial configurations: Uniform, Two Extreme, Mid Quadrant.]
15
Point Selection Strategy
• Midpoint Strategy
  + Guaranteed convergence
  + No need to run phase 1
  – Lots of extra movement
• Visualization Demo
  – Uniform
  – Two extreme
  – Mid quadrant
[Figure: Server 1 and Server 2 plotted in the (B, S) plane, both axes from 0 to 1.]
16
Point Selection Strategy (2)
• Greedy Strategy
  – Take the point closer to the target
  – Move it to the target, if
    • it improves the position of the other point
    • it does not worsen it by more than δ
• Reduces movement
• Takes a long time to converge in some cases
[Figure: Server 1 and Server 2 plotted in the (B, S) plane, both axes from 0 to 1.]
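The greedy rule can be sketched as a single decision function. Two assumptions are made: the two conditions are read as "the other point's distance to the target grows by at most δ" (which also covers the case where it improves), and load is conserved, so whatever the nearer server sheds or gains is absorbed by the other one. Returning `None` for a rejected exchange is an illustrative convention, not something the slides specify.

```python
import math

def greedy_select(p1, p2, target, delta):
    """Return (near_new, far_new) load points, or None if the move is rejected."""
    # Take the point closer to the target.
    near, far = sorted((p1, p2), key=lambda p: math.dist(p, target))
    # Moving `near` onto the target shifts the difference to `far`.
    far_new = (near[0] + far[0] - target[0],
               near[1] + far[1] - target[1])
    worsening = math.dist(far_new, target) - math.dist(far, target)
    if worsening <= delta:
        return tuple(target), far_new
    return None

accepted = greedy_select((5, 4), (1, 4), target=(4, 4), delta=0.5)
rejected = greedy_select((5, 4), (6, 4), target=(4, 4), delta=0.5)
```

In the first call the far server moves from distance 3 to distance 2, so the exchange goes ahead. In the second it would move from distance 2 to distance 3, which exceeds δ, so nothing is transferred.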
17
DHT-based Location Directory
18
DHT + Midpoint
• Greedy + fallback to DHT:
  – Convergence problem exists for some configurations
  – Visualization Demo
• Solution:
  – Greedy + fallback to DHT with Midpoint
  – Demo: uniform, two extreme, mid quadrant
• Alternate approach:
  – Greedy + fallback to Midpoint
  – Trade-off: movement cost vs. DHT overhead
19
Experimental Evaluation
• Uniform configuration
  – Greedy + DHT (Midpoint)
  – Midpoint
  – Greedy + Midpoint (No DHT)
• Effect of varying target zone
• Effect of failed gossip count
• Metrics
  – Amount of space moved
  – # of gossip rounds
  – Multiple tablet move
20
Uniform Configuration: Results
[Figure: three charts comparing greedy, midpoint, and greedy+midpoint: (a) space moved; (b) % of moved tablets vs. # of movements (1 to 9, > 10); (c) # of rounds, split into failed and successful gossip.]
21
Effect of Varying Target Zone
[Figure: two charts against the half-length of the target zone (0.01 to 0.05): (a) space moved, Greedy vs. Midpoint; (b) # of rounds, showing average failed and successful gossip for Midpoint and for Greedy.]
Larger target zone = fast convergence, less accuracy
Target zone width should depend on the target point value
22
Effect of Failed Gossip Count (Greedy)
[Figure: two charts against the failed gossip count (5 to 25): (a) # of rounds, split into failed and successful gossip; (b) space moved.]
Large failed gossip count = More time in greedy mode, more unproductive gossip at the end
23
One-to-N Gossip
• Contact a few random nodes
  – Locked/unlocked mode
• Pick the most profitable one
  – Distance from the target is minimized
• Advantage
  – Better choices
• Initial results
  – Locked mode: may lead to deadlock
  – Unlocked mode: in most cases, the other nodes start a transfer
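Partner selection in one-to-n gossip can be sketched as a sample-then-minimize step. The slide does not define "profitable" beyond minimizing distance from the target, so the sketch assumes the caller scores each contacted peer by the distance it would have from the target after a midpoint exchange with that peer. Locking is not modeled, and all names are hypothetical.

```python
import math
import random

def pick_partner(my_point, peers, target, fanout, rng):
    """Contact up to `fanout` random peers and return the best one's name."""
    contacted = rng.sample(sorted(peers), min(fanout, len(peers)))

    def distance_after(name):
        # Assumed profit measure: distance to target after a midpoint exchange.
        peer = peers[name]
        mid = ((my_point[0] + peer[0]) / 2, (my_point[1] + peer[1]) / 2)
        return math.dist(mid, target)

    return min(contacted, key=distance_after)

peers = {"a": (9, 9), "b": (2, 2), "c": (6, 0)}
best = pick_partner((8, 8), peers, target=(5, 5), fanout=3,
                    rng=random.Random(0))
```

With all three peers contacted, peer "b" wins because exchanging with it puts the caller exactly on the target.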
24
Move Suppression
• Two global stages
• Stage 1:
  – One-to-One gossip, but moves are hypothetical
• Stage 2:
  – Change to chosen placement
• Advantage
  – Tablet not moved multiple times
• Challenges
  – When to switch to Stage 2 from Stage 1
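The two stages can be sketched as bookkeeping over a placement map. The representation is an assumption: Stage 1 is modeled as a log of hypothetical (tablet, source, destination) moves, and Stage 2 as the net difference between the initial and final placement. The function name is illustrative.

```python
def plan_moves(initial, hypothetical_moves):
    """Collapse Stage 1's hypothetical moves into one physical move per tablet.

    initial: dict mapping tablet -> server
    hypothetical_moves: list of (tablet, src, dst) in the order gossip chose them
    """
    placement = dict(initial)
    for tablet, src, dst in hypothetical_moves:
        if placement[tablet] != src:
            raise ValueError(f"{tablet} is not on {src}")
        placement[tablet] = dst  # Stage 1: only the bookkeeping changes
    # Stage 2: move each tablet at most once, straight to its final server.
    return {t: (initial[t], placement[t])
            for t in initial if placement[t] != initial[t]}

moves = plan_moves(
    {"t1": "A", "t2": "A", "t3": "B"},
    [("t1", "A", "B"), ("t1", "B", "C"), ("t2", "A", "B"), ("t2", "B", "A")],
)
```

Here four hypothetical moves collapse into a single physical transfer of `t1` from A to C; `t2` ends up where it started and is never moved.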
25
Future Work
• Handling initial placement
• Frequency of running the placement algorithm
• Considering the network hierarchy
• Handling failures
• Extending to heterogeneous resources
Questions?