PWL Denver: Copysets
Copysets:
Reducing the Frequency of Data Loss in Cloud Storage
Aysylu Greenberg
Papers We Love Denver
April 27, 2017
Welcome Papers We Love Denver!
Aysylu Greenberg
@aysylu22
paperswelove.org
Today
• Random replication
• Copyset Replication
• Copyset Replication with Scatter Width
• Pragmatic aspects
RANDOM REPLICATION
Overview & Tradeoffs
Random Replication
R = 3, N = 9
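As a rough illustration of this slide, the sketch below places each chunk on R = 3 of N = 9 nodes and counts how many distinct replica sets show up. It assumes the simplest possible policy (every replica set drawn uniformly at random); the function name `random_replication` and the chunk count of 1000 are made up for the example and do not come from the talk.

```python
import random

def random_replication(num_nodes, replication_factor, num_chunks, seed=0):
    """Place each chunk on `replication_factor` distinct nodes chosen uniformly at random.

    Assumption: purely uniform choice, a simplification of real random replication schemes.
    """
    rng = random.Random(seed)
    nodes = list(range(1, num_nodes + 1))
    placement = {}
    for chunk in range(num_chunks):
        # rng.sample returns distinct nodes, so a chunk never lands twice on one node
        placement[chunk] = frozenset(rng.sample(nodes, replication_factor))
    return placement

placement = random_replication(num_nodes=9, replication_factor=3, num_chunks=1000)
distinct_replica_sets = set(placement.values())
# With 9 nodes and R = 3 there are C(9, 3) = 84 possible replica sets.
print(len(distinct_replica_sets), "distinct replica sets in use (84 possible)")
```

With many chunks, most of the possible replica sets end up holding some chunk, which is why a simultaneous failure of almost any R nodes loses something.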
Random Replication: Correlated Failures
Recovery from Data Loss
Fixed cost of restoring lost data is high
Lose more data but less often
Increase in R is expensive
Random Replication: Tradeoff
{small amount & high frequency} data loss
{large amount & low frequency} data loss
COPYSET REPLICATION
Intuition
Copyset Replication
R = 3, N = 9
Copyset Replication
R = 3, N = 9
S = 2
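A minimal sketch of the S = 2 case, assuming N is a multiple of R and that nodes are grouped in numeric order: the 9 nodes are split into 3 disjoint copysets, and a chunk is stored on the copyset that contains its randomly chosen primary. The helper names `partition_into_copysets` and `place_chunk` are hypothetical.

```python
import random

def partition_into_copysets(nodes, replication_factor):
    """Split the node list into disjoint groups of `replication_factor` nodes."""
    if len(nodes) % replication_factor != 0:
        raise ValueError("this sketch assumes N is a multiple of R")
    return [frozenset(nodes[i:i + replication_factor])
            for i in range(0, len(nodes), replication_factor)]

def place_chunk(copysets, rng):
    """Pick a random primary node, then replicate on the copyset that contains it."""
    all_nodes = sorted(set().union(*copysets))
    primary = rng.choice(all_nodes)
    copyset = next(cs for cs in copysets if primary in cs)
    return primary, copyset

copysets = partition_into_copysets(list(range(1, 10)), replication_factor=3)
rng = random.Random(0)
primary, replicas = place_chunk(copysets, rng)
print("copysets:", [sorted(cs) for cs in copysets])
print("primary", primary, "stores its chunk on", sorted(replicas))
```

Each node shares data with only the R - 1 = 2 other members of its copyset, which is what S = 2 means here.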
Recovery from Node Failure
Simpler recovery than random replication:
R – 1 nodes with data
Higher load on small number of nodes
Copyset Replication with S = 2: Tradeoff
{small amount & high frequency} data loss
{large amount & low frequency} data loss
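To put rough numbers on this tradeoff, the sketch below computes the fraction of R-node subsets that are copysets. Under the assumption that exactly R nodes fail at once, chosen uniformly at random, and that every copyset holds at least one chunk, that fraction is the chance the failure loses data. The function name is hypothetical, and the figure of 84 copysets for random replication is the worst case where every possible replica set is in use.

```python
from math import comb

def loss_probability_given_r_failures(num_copysets, num_nodes, replication_factor):
    """Fraction of R-node subsets that are copysets.

    Assumption: exactly R nodes fail together, uniformly at random,
    and every copyset stores at least one chunk.
    """
    return num_copysets / comb(num_nodes, replication_factor)

# Copyset replication with S = 2 on N = 9, R = 3: three disjoint copysets.
print(f"{loss_probability_given_r_failures(3, 9, 3):.3f}")   # 0.036
# Random replication in the worst case: all C(9, 3) = 84 replica sets in use.
print(f"{loss_probability_given_r_failures(84, 9, 3):.3f}")  # 1.000
```

Fewer copysets make a loss event rarer, but each copyset then holds a larger share of the data, so each event loses more.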
SCATTER WIDTH
Tuning choices
Copyset Replication with S = 2
R = 3, N = 9
Copyset Replication with S = 4
R = 3, N = 9
Copyset Replication with S = 4
R = 3, N = 9
[Figure: nodes 1-9 laid out in a 3 x 3 grid (rows 1 2 3, 6 5 4, 7 8 9), shown three times]
Copyset Replication with S = 4: Permutation Phase
[Figure: the same 3 x 3 node grid, repeated across the build steps of this slide]
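The permutation phase can be sketched as follows, based on the description in the Copysets paper: generate P = S / (R - 1) random permutations of the nodes and cut each one into consecutive groups of R. The function name `build_copysets`, the fixed seed, and the choice to drop leftover nodes when N is not a multiple of R are assumptions for this example.

```python
import math
import random

def build_copysets(nodes, replication_factor, scatter_width, seed=0):
    """Build copysets from random permutations of the node list."""
    rng = random.Random(seed)
    # Each permutation contributes R - 1 peers per node, so S / (R - 1) are needed.
    num_permutations = math.ceil(scatter_width / (replication_factor - 1))
    copysets = set()
    for _ in range(num_permutations):
        permutation = list(nodes)
        rng.shuffle(permutation)
        # Cut the permutation into consecutive groups of R nodes;
        # leftover nodes (if N is not a multiple of R) are skipped in this sketch.
        last_start = len(permutation) - replication_factor
        for i in range(0, last_start + 1, replication_factor):
            copysets.add(frozenset(permutation[i:i + replication_factor]))
    return copysets

copysets = build_copysets(list(range(1, 10)), replication_factor=3, scatter_width=4)
for cs in sorted(sorted(cs) for cs in copysets):
    print(cs)
```

With R = 3 and S = 4 this uses two permutations and yields at most six copysets. Two permutations can happen to pair the same nodes twice, so the scatter width actually achieved may be lower than the target.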
Tuning Scatter Width
Set by system designer to control parallelism of data recovery
Control load on each individual node during recovery
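As a small worked example of what scatter width measures, the sketch below counts, for each node, how many other nodes share a copyset with it. The function name `scatter_widths` is hypothetical; the two copyset lists are the disjoint S = 2 grouping and a six-copyset layout for N = 9, R = 3.

```python
def scatter_widths(copysets):
    """Map each node to the number of distinct nodes it shares a copyset with."""
    peers = {}
    for copyset in copysets:
        members = set(copyset)
        for node in members:
            peers.setdefault(node, set()).update(members - {node})
    return {node: len(others) for node, others in sorted(peers.items())}

disjoint = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
two_permutations = disjoint + [{1, 4, 7}, {2, 5, 8}, {3, 6, 9}]

print(scatter_widths(disjoint))          # every node has scatter width 2
print(scatter_widths(two_permutations))  # every node has scatter width 4
```

A failed node's data can be re-replicated from as many peers as its scatter width, so a larger S spreads the recovery work across more nodes.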
Copyset Replication Scatter Width: Tradeoffs
{small amount & high frequency} data loss
{large amount & low frequency} data loss
Scatter Width: Tuning Choices
Random replication: scatter width of N-1, lots of replica sets
S << N
To reduce frequency of data loss, minimize:
FROM IDEAS TO PRACTICE
Pragmatic aspects
Pragmatic Aspects
• Move randomization to permutation stage
• Low overhead on operations
• Near optimal and fast
• Support for dynamic systems while maintaining guarantees is tricky -> chainsets (http://hackingdistributed.com/2014/02/14/chainsets/)
• Tiered replication: https://www.usenix.org/conference/atc15/technical-session/presentation/cidon