Distributed Cluster Repair for OceanStore
Irena Nadjakova and Arindam Chakrabarti
Acknowledgements: Hakim Weatherspoon, John Kubiatowicz
Dec 9, 2003 Distributed Cluster Repair for OceanStore 2
OceanStore Overview
Data Storage Utility
• Robustness
• Security
• Durability
• High availability
• Global-scale
Where our project fits in
Durability
• Automatic version management
• Highly redundant erasure coding
• Massive dissemination of fragments on machines with highly uncorrelated availability
The internet
Choosing locations for storing a fragment
Clustering
OceanStore solution
• Availability of each machine tracked over time.
• Machines that have very little availability are not used for fragment storage.
• Distance between each pair of machines computed (high mutual information ⇒ close).
• Cluster the machines into chunks based on this distance using normalized cuts.
• All the computation is done on one central computer (Cluster Server).
OceanStore solution
• Machines that are highly correlated in availability are in same cluster.
• Machines in separate clusters have low correlation in availability.
• When a node needs to store replica fragments, it requests cluster information from the cluster server and uses it to send each fragment to k nodes: one from each of k different clusters.
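The placement step in the last bullet can be sketched as follows. This is a minimal illustration, not OceanStore code: the `clusters` mapping (cluster id to machine names) is an assumed shape for whatever the cluster server returns, and `choose_fragment_hosts` is a hypothetical helper name.

```python
import random

def choose_fragment_hosts(clusters, k, rng=None):
    """Pick k hosts for one fragment, one from each of k different clusters.

    clusters: dict mapping a cluster id to a list of machine names
              (assumed format of the cluster server's reply).
    """
    rng = rng or random.Random()
    usable = [cid for cid, members in clusters.items() if members]
    if k > len(usable):
        raise ValueError("need at least k non-empty clusters")
    chosen = rng.sample(usable, k)  # k distinct clusters
    # One randomly chosen machine per selected cluster.
    return [(cid, rng.choice(clusters[cid])) for cid in chosen]

# Hypothetical cluster information for eight machines.
clusters = {
    "c0": ["m1", "m2", "m3"],
    "c1": ["m4", "m5"],
    "c2": ["m6"],
    "c3": ["m7", "m8"],
}
hosts = choose_fragment_hosts(clusters, k=3, rng=random.Random(0))
```

Because no two chosen hosts share a cluster, their availability should be only weakly correlated, provided the clustering itself is good.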
Cluster creation
• Needs centralized computation.
• Can we do it in a distributed manner?
• NCuts is one stumbling block. It seems to need the entire graph.
• Having to pull the cluster info from one central cluster server: single point of failure.
• Can we have a “Distributed NCuts” algorithm that looks at subgraphs? How do we make the subgraphs? Do we need to know the entire graph to decide how to divide it into pieces?
Distributed clustering
Initial idea
• We run the centralized algorithm once over some time period (we chose 73 days) to generate an initial clustering (expensive!)
• We distribute the machines among some f cluster servers
– Each has a smaller subset of size num of the initial machines
– Keeping the initial clustering proportions for each node
– Each machine occurs on approximately the same number of cluster servers
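One way to meet the three constraints above is sketched below. It is an assumption about how the distribution could be done, not the authors' procedure: each initial cluster contributes a share of every server's subset proportional to its size, and its machines are dealt round-robin so that their occurrence counts differ by at most one. The name `assign_to_servers` is hypothetical, and subset sizes equal `num` only up to rounding.

```python
import random
from itertools import cycle, islice

def assign_to_servers(clustering, f, num, rng=None):
    """Build f subsets of about `num` machines that mirror the cluster proportions.

    clustering: dict mapping an initial cluster id to its list of machines.
    """
    rng = rng or random.Random()
    total = sum(len(members) for members in clustering.values())
    if num > total:
        raise ValueError("num cannot exceed the number of machines")
    servers = [[] for _ in range(f)]
    for members in clustering.values():
        order = list(members)
        rng.shuffle(order)
        quota = round(num * len(order) / total)  # this cluster's share per server
        dealer = cycle(order)                    # round-robin over the cluster
        for subset in servers:
            # quota <= len(order), so a server never gets a machine twice.
            subset.extend(islice(dealer, quota))
    return servers

clustering = {
    "A": ["a0", "a1", "a2", "a3", "a4", "a5"],
    "B": ["b0", "b1", "b2"],
    "C": ["c0", "c1", "c2"],
}
servers = assign_to_servers(clustering, f=4, num=4, rng=random.Random(1))
```

In this toy run cluster A holds half of the machines, so every server receives two A machines and one each from B and C.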
Initial idea (cont)
• Now we can afford to recluster the machines on each server frequently to keep up with the network changes.
– We chose to do it once every 30 days for simulation purposes, but it can easily be done a lot more often
Evaluation
• To see how well this does, we want to compare it with the original global algorithm, run over the same time period.
• Metric: the average mutual information, where $I(X;Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$
– Average MI for a single server is just the average of the mutual information between pairs of machines in different clusters on the server
– On multiple servers, we compute the above on every server, then average among servers
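The metric can be sketched in a few lines. The assumptions here are mine: availability is sampled as equal-length 0/1 traces, probabilities are estimated by counting, and the logarithm is base 2 (the slide does not state a base). All function names are hypothetical.

```python
from itertools import combinations
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits for two equal-length 0/1 availability traces."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for pair in zip(x, y):
        joint[pair] = joint.get(pair, 0) + 1
    for (a, b), c in joint.items():
        px[a] = px.get(a, 0) + c
        py[b] = py.get(b, 0) + c
    # P(x,y) / (P(x) P(y)) equals c * n / (px * py) in terms of counts.
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def average_cross_cluster_mi(traces, cluster_of):
    """Average MI over pairs of machines that sit in different clusters."""
    pairs = [(u, v) for u, v in combinations(sorted(traces), 2)
             if cluster_of[u] != cluster_of[v]]
    if not pairs:
        return 0.0
    return sum(mutual_information(traces[u], traces[v]) for u, v in pairs) / len(pairs)

def average_over_servers(servers):
    """servers: list of (traces, cluster_of) tuples, one per cluster server."""
    return sum(average_cross_cluster_mi(t, c) for t, c in servers) / len(servers)

traces = {
    "a": [1, 1, 0, 0, 1, 1, 0, 0],
    "b": [1, 1, 0, 0, 1, 1, 0, 0],  # fails together with "a"
    "c": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of "a" and "b"
}
good = average_cross_cluster_mi(traces, {"a": 0, "b": 0, "c": 1})
bad = average_cross_cluster_mi(traces, {"a": 0, "b": 1, "c": 1})
overall = average_over_servers([(traces, {"a": 0, "b": 0, "c": 1}),
                                (traces, {"a": 0, "b": 1, "c": 1})])
```

A lower value is read as better here: keeping the two correlated machines in one cluster gives a cross-cluster average of 0 bits, while splitting them raises it to 0.5 bits.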
Simulating Network Evolution Dynamics
• We have availability data for 1000 machines for a period of 73 days.
• We use it to simulate the behavior of a network with 1000 machines over a period of 730 days = 2 years.
• We simulate networks with varying evolution characteristics to evaluate the robustness of our distributed cluster repair algorithm.
Simulating Network Evolution Dynamics
Qualities of a good network:
• Maybe server availability (AV) should not vary drastically in the future?
• Maybe average server repair time (MTTR) should not vary drastically?
• Maybe mean time to failure (MTTF) should not vary drastically?
• Maybe failure correlations (FCOR) should also not vary drastically?
NS Algo 1: Sanity Check 1
Global déjà vu
• Maintains AV, MTTF, MTTR, FCOR
• Simulates a well-behaved network.
• Our distributed update algorithm should do very well on this.
NS Algo 2: Acid Test 1
Local déjà vu
• Maintains AV, MTTF, MTTR, but not FCOR.
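The slides do not spell out how these two simulators are built, so the following is only one plausible reading, labeled as an assumption. The global variant replays the recorded trace of every machine in lockstep, which keeps cross-machine correlations. The local variant replays each machine's trace with an independent random circular shift per period, which keeps per-machine availability exactly, keeps MTTF and MTTR only approximately (a shift can split or merge one run at the seam), and scrambles FCOR. Function names are hypothetical.

```python
import random

def global_deja_vu(traces, repeats):
    """Replay all traces in lockstep; cross-machine correlations survive."""
    return {m: list(t) * repeats for m, t in traces.items()}

def local_deja_vu(traces, repeats, rng=None):
    """Replay each trace with its own random circular shift per period.

    Assumed construction: per-machine availability is unchanged, but
    failures of different machines no longer line up in time.
    """
    rng = rng or random.Random()
    out = {}
    for m, t in traces.items():
        extended = []
        for _ in range(repeats):
            shift = rng.randrange(len(t))
            extended.extend(t[shift:] + t[:shift])
        out[m] = extended
    return out

# Toy 7-sample traces standing in for the 73-day measurements.
traces = {
    "m1": [1, 1, 0, 1, 0, 0, 1],
    "m2": [1, 1, 0, 1, 0, 0, 1],
    "m3": [0, 1, 1, 1, 1, 0, 1],
}
glob = global_deja_vu(traces, repeats=10)  # 73 days -> 730 days uses 10 repeats
loc = local_deja_vu(traces, repeats=10, rng=random.Random(7))
```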
NS Algo 3: Acid Test 2
Births and Deaths
• Maintains AV, MTTF, MTTR, and FCOR, but only for some nodes, and for some time.
• Nodes are taken off the network (die) or added to it (are born) at certain times. While they are on the network, they maintain their AV, MTTF, MTTR, and FCOR.
NS Algo 4: Acid Test 3
Noisy Global déjà vu
• Maintains AV, MTTF, MTTR, FCOR to a large extent, but adds some Gaussian noise, representing the variations that may be observed in a real network.
NS Algo 5: Acid Test 4
Noisy Local déjà vu
• Maintains AV, MTTF, MTTR, but not FCOR, and also adds some Gaussian noise representing the variations that may be observed in a real network.
• Does our algorithm do well in this situation? If yes, how robust is it to noise?
Still problems
• Initial clustering is expensive
• What happens if we don’t use it?
How to fix this?
• Randomly distribute machines to the servers
• Perform local clustering
• Find the ‘unwanted’ elements (those with the highest mutual information with the rest on this node)
• Exchange them with ‘unwanted’ elements of another cluster to which the first ones are least correlated
• Communication overhead is low; most computation can proceed without a lot of communication
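The exchange step can be sketched as below, under stated assumptions. The slide leaves partner selection open to interpretation; this reading picks the server whose members have the lowest mean mutual information with the outgoing machines, then swaps the same number of unwanted machines in each direction. The `mi(u, v)` callable and all function names are hypothetical, `count` is assumed to be at least 1, and the toy values in `table` are made up for illustration.

```python
def mean_mi(machine, members, mi):
    """Average MI between one machine and the other members of a server."""
    others = [o for o in members if o != machine]
    return sum(mi(machine, o) for o in others) / len(others) if others else 0.0

def unwanted(members, mi, count):
    """The `count` machines with the highest mean MI to the rest of the server."""
    return sorted(members, key=lambda m: mean_mi(m, members, mi), reverse=True)[:count]

def exchange(servers, src, mi, count=1):
    """Swap the unwanted machines of `src` with those of the least correlated server."""
    out_src = unwanted(servers[src], mi, count)

    def affinity(dst):
        return sum(mean_mi(m, servers[dst], mi) for m in out_src) / len(out_src)

    dst = min((s for s in servers if s != src), key=affinity)
    out_dst = unwanted(servers[dst], mi, count)
    servers[src] = [m for m in servers[src] if m not in out_src] + out_dst
    servers[dst] = [m for m in servers[dst] if m not in out_dst] + out_src
    return dst, out_src, out_dst

# Made-up pairwise MI values; unlisted pairs default to 0.
table = {
    frozenset(("a1", "a2")): 0.9,
    frozenset(("a1", "b1")): 0.1,
    frozenset(("c1", "c2")): 0.8,
    frozenset(("a1", "c1")): 0.5,
    frozenset(("d1", "d2")): 0.7,
    frozenset(("d1", "e1")): 0.3,
}

def mi(u, v):
    return table.get(frozenset((u, v)), 0.0)

servers = {
    "s0": ["a1", "a2", "b1"],
    "s1": ["c1", "c2", "b2"],
    "s2": ["d1", "d2", "e1"],
}
dst, gave, got = exchange(servers, "s0", mi, count=1)
```

Only the identities of the swapped machines and the relevant MI values have to cross the network, which matches the low communication overhead noted in the last bullet.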
Under development…
• What we have so far is a scheme that
– picks a server at random
– finds a few unwanted elements
– exchanges those with the same number of unwanted elements of another server, picked at random or having the best correlation with the unwanted elements of the first server
• The percentage improvement is small so far: 0.4%-1.5% for the first 5 or so runs. It falls off afterwards.
How to improve
• Exchange more than one machine?
• Run for several generations?
• It may be even better to just randomly exchange machines, as long as the overall average mutual information of the distributed cluster decreases.
Summary of achievements
• Towards getting rid of expensive centralized cluster creation
• Scalable distributed cluster management scheme
Thanks for listening!
Acknowledgements:
Hakim Weatherspoon
John Kubiatowicz