A Dynamic Data Grid Replication Strategy to
Minimize the Data Missed
Ming Lei, Susan Vrbsky, Xiaoyan Hong
University of Alabama
Agenda
Background & Previous Work
Motivation
System Models
Result
Conclusion
Future Work
Background
Large-scale, geographically distributed systems are becoming more and more popular
Replication of data is the most common solution to improve file access time
Dynamic behavior of Grid users makes it difficult to make decisions concerning data replication to meet the system availability goal
Previous work:
Several replica schemes compared for saving access latency and bandwidth, assuming unlimited storage [Ranganathan, et al. 2002]
HotZone algorithm to minimize the client-to-replica latency [Szymaniak et al. 2005]
HBR: dynamic replication strategy to reduce data access time by avoiding network congestion [Park et al. 2003]
Motivation:
As bandwidth and computing capacity have become relatively cheaper, data access latency can drop dramatically
System reliability and availability become the focus
Any data file access failure can lead to an incorrect result or a job crash
People can tolerate a small delay but not any system unreliability
Motivation:
Replicate data to:
Maximize system data availability
Assume limited storage resources
Without sacrificing data access latency
Architecture:
System Model:
Note that system-level data availability is more important than an individual file's availability
Two new measurements proposed:
System File Missing Rate (SFMR)
SFMR = (number of files potentially unavailable) / (number of all the files requested by all the jobs)
System Bytes Missing Rate (SBMR)
SBMR = (number of bytes potentially unavailable) / (total number of bytes requested by all jobs)
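The two ratios can be sketched in a few lines of Python. This is only an illustration, assuming that each requested file comes with a known availability probability and that "potentially unavailable" is counted in expectation, that is, as 1 minus that probability. The names `FileRequest`, `sfmr` and `sbmr` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class FileRequest:
    """One file requested by some job (hypothetical helper type)."""
    size_bytes: float
    availability: float  # assumed probability that the file can be read


def sfmr(requests):
    """Expected fraction of requested files that are unavailable."""
    if not requests:
        return 0.0
    missing_files = sum(1.0 - r.availability for r in requests)
    return missing_files / len(requests)


def sbmr(requests):
    """Expected fraction of requested bytes that are unavailable."""
    total_bytes = sum(r.size_bytes for r in requests)
    if total_bytes == 0:
        return 0.0
    missing_bytes = sum((1.0 - r.availability) * r.size_bytes for r in requests)
    return missing_bytes / total_bytes
```

With this formulation the two rates coincide whenever all requested files have the same size, because the common size cancels out of the SBMR ratio.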
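System Model:
Given a set of jobs J = (j1, j2, j3, …, jn), each job accesses one file set F = (f1, f2, …, fk)
A file must be stored at a Storage Element (SE)
File availability depends on the SE availability
For any file fi, its availability is:
$P_i = 1 - \prod_{j=1}^{k} (1 - p_{se_j})$, where se_1, …, se_k are the SEs that hold a copy of fi
1. $\mathrm{SFMR} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{k_i} (1 - P_j)}{\sum_{i=1}^{n} k_i}$
2. $\mathrm{SBMR} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{k_i} (1 - P_j) \cdot S_j}{\sum_{i=1}^{n} \sum_{j=1}^{k_i} S_j}$
Here k_i is the number of files requested by job j_i, and P_j and S_j are the availability and size of the j-th file it requests.
Job requests can be converted to a series of file access operations
System Model:
$\mathrm{SFMR} = \frac{\sum_{o_i \in O} (1 - P_i)}{|O|}$
$\mathrm{SBMR} = \frac{\sum_{o_i \in O} (1 - P_i) \cdot S_i}{\sum_{o_i \in O} S_i}$
The set O is the set of file access operations. We assume the total storage limit of the whole grid system is S, so we have:
$\sum_{i=1}^{m} C_i \cdot S_i \le S$, where Ci denotes the number of copies of fi, Si is the size of fi, and S is the total storage available.
System Model: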
For each file access operation ri, at instant T, we associate a value Vi, which is set to the number of times the file will be accessed in the future.
How to estimate Vi (4 ways):
1. No Prediction: Vi = 1 at any time.
2. Bio Prediction: Vi is predicted from the file access history using a binomial distribution.
3. Zipf Prediction: Vi is predicted from the file access history using a Zipf distribution.
4. Queue Prediction: Vi is predicted from the current job queue. If the queue is empty, Queue Prediction works the same as No Prediction.
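Two of the four prediction functions are specified closely enough to sketch. The Python below is a rough illustration, assuming that the job queue is a list of jobs and that each job is simply a list of file identifiers. The function names are hypothetical. Counting occurrences in the queue is one plausible reading of "number of times this file will be accessed in the future", not a detail confirmed by the slides. The binomial and Zipf predictors are left out because the slides do not give their parameters.

```python
from collections import Counter


def no_prediction(file_id):
    """No Prediction: every file has value 1 at any time."""
    return 1


def queue_prediction(file_id, job_queue):
    """Queue Prediction: count how often file_id appears in the queued jobs.

    job_queue is assumed to be a list of jobs, each a list of file ids.
    An empty queue falls back to No Prediction, as the slide states.
    """
    if not job_queue:
        return no_prediction(file_id)
    counts = Counter(f for job in job_queue for f in job)
    # Assumption: a file absent from a non-empty queue gets value 0.
    return counts[file_id]
```

For example, `queue_prediction("f1", [["f1", "f2"], ["f1"]])` counts two pending accesses to `f1`.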
System Model:
To achieve the optimal SFMR and SBMR, we have to maximize the following values:
$\sum_{o_i \in O} P_i \cdot V_i$
and
$\sum_{o_i \in O} P_i \cdot S_i \cdot V_i$
If the file sizes are the same, SFMR = SBMR.
To better describe our scheme and algorithm, we introduce a weight value:
$W_i = (P_i \cdot V_i) / (C_i \cdot S_i)$
System Model:
Algorithm: MinDmr Optimizer():
1. If the requested file fi exists at the site, then continue.
2. If the requested file fi does not exist at the site and the site has enough free space, then retrieve fi from the remote site and store it.
3. If the requested file fi does not exist at the site and the site does not have enough free space, then sort the files in the current SE by file weight Wi in ascending order. Fetch files from the sorted list in order and add them to the candidate list until the accumulated size of the candidate files is greater than or equal to the size of the requested file.
4. Replicate the file if the value gained by replicating fi is greater than the accumulated value lost by deleting the candidate files fj from the SE:
$\Delta P_i \cdot V_i > \sum_{j \in \mathrm{Candidates}} \Delta P_j \cdot V_j$
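The decision logic in steps 2 to 4 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation. The slides do not define ΔP, so here it is taken to be the change in a file's availability when the local copy is added or removed, computed from the availabilities of the SEs that hold the file. The names `GridFile`, `file_availability` and `min_dmr_decide` are hypothetical, and step 1 (the file is already local) is assumed to be handled by the caller.

```python
from dataclasses import dataclass
from math import prod


def file_availability(se_availabilities):
    """P = 1 - product of (1 - p_se) over the SEs holding a copy."""
    return 1.0 - prod(1.0 - p for p in se_availabilities)


@dataclass
class GridFile:
    name: str
    size: float
    value: float          # V: predicted number of future accesses
    other_holders: list   # availabilities of the other SEs holding a copy

    def delta_p(self, local_avail):
        """Assumed meaning of delta P: availability with minus without the local copy."""
        with_local = file_availability(self.other_holders + [local_avail])
        return with_local - file_availability(self.other_holders)

    def weight(self, local_avail):
        """W = (P * V) / (C * S) for a file that is stored locally."""
        holders = self.other_holders + [local_avail]
        return file_availability(holders) * self.value / (len(holders) * self.size)


def min_dmr_decide(requested, stored, free_space, local_avail):
    """Return (replicate, victims) for a file that is not yet stored locally."""
    if requested.size <= free_space:                  # step 2: enough room
        return True, []
    ranked = sorted(stored, key=lambda f: f.weight(local_avail))  # step 3
    victims, freed = [], 0.0
    for f in ranked:
        if freed >= requested.size:
            break
        victims.append(f)
        freed += f.size
    if freed < requested.size:                        # cannot free enough space
        return False, []
    gain = requested.delta_p(local_avail) * requested.value       # step 4
    loss = sum(f.delta_p(local_avail) * f.value for f in victims)
    return (True, victims) if gain > loss else (False, [])
```

Following the wording of step 3, the sketch accumulates candidate sizes only and does not add the remaining free space to the total. A real optimizer might count both.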
Simulation Setting
OptorSim: developed by the EU DataGrid Project to test dynamic replica schemes
Eco optimizer (economic model: a file is replicated if doing so maximizes the profit of the SE)
Simulation Configuration:
File Set Size: 200
Job Set Size: 10000
File set per job: 3~20
File Size: 1G
Network Topology Setting:
Results - System File Missing Rate
[Figure: SFMR with varying replica optimizers. x-axis: Replica Schemes; y-axis: SFMR (0.000 to 0.005); series: Sequential, Random, RandomWalkGaussian, Random Zipf]
Results - Total Job Time
[Figure: The total job time with sequential access. x-axis: Replica Schemes; y-axis: Job Time (in secs)]
File Missing Rate
[Figure: SFMR with varying job schedulers. x-axis: Replica Schemes; y-axis: SFMR (0 to 0.005); series: Random, Shortest Queue, Access Cost, Queue Access Cost]
Results – System File Missing Rate
[Figure: SFMR with varying job queue length. x-axis: Job Queue Length (4 to 256); y-axis: SFMR (*0.00001); series: SEQ AccessPattern, Zipf AccessPattern]
[Figure: Total Job Time with varying job queue length. x-axis: Job Queue Length (4 to 256); y-axis: Total Job Time; series: SEQ AccessPattern, Zipf AccessPattern]
Results – File Missing Rate
[Figure: File Missing Rate. x-axis: File Space (200 to 600); y-axis: SFMR; series: LFU, EcoBio, EcoZipf, BioMinDmr, ZipfMinDmr, MinDmrNoPrediction, MinDmrQueuePred]
[Figure: Missing Rate Gap (SBMR-SFMR). x-axis: Replica Scheme; y-axis: Missing Rate (*0.00001); bar labels as extracted: 1.54, 50.47, 13.14, 0.13, 24.13, 0.24, 0.17]
SFMR with sequential access pattern
Conclusion
Proposed two metrics of data availability to evaluate the reliability of the system data in the Data Grid system
Discussed how we model the system availability problem
Developed four prediction-based replica optimizers under the assumption that the Grid storage space is limited
Presented our greedy replica algorithm, which treats hot and cold data files differently and uses a weighting factor for the replacement scheme
Simulation results indicate that our new strategies outperform the others overall in terms of data availability
Future Work:
When file sizes are not uniform, enhance our scheme to differentiate between the system file missing rate and the system bytes missing rate
Work on new measurements to evaluate the job missing rate
Design new schemes and prediction functions to minimize the new measurements