A Hybrid Caching Strategy for Streaming Media Files
Jussara M. Almeida Derek L. Eager Mary K. Vernon
University of Wisconsin-Madison, University of Saskatchewan
November 2001
Outline
• Characteristics of Streaming Media (SM) files
• Delivery of SM files
• Hypothesis and Assumptions
• Previous Caching Policies
• New Policy Performance Comparison
• New Caching Policies
• Conclusions and Future Work
Characteristics of SM Files
• Large file size
– cache on disk
• Sustained I/O bandwidth
– inserting and reading new content
• Clients access partial files
– initial portion
– favored segment
– base + variable number of layers of layered encoding
Delivery of SM Files
• Unicast streaming:
– server bandwidth is linear in client request rate
– goal: maximize byte hit ratio
• Multicast streaming
– save bandwidth
– cost sharing introduces new tradeoffs
Multicast
[Figure: required server bandwidth (0 to 20) vs. client request rate (1 to 1000, log scale)]
• Example: 10 distributed proxy servers, each serving a local region;
100 requests (on average) arrive per region during a given popular video.
Delivery needs 7 streams per region, or 12 streams at the remote server.
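The arithmetic behind this example can be written out as a short Python sketch. It uses only the numbers given above (10 regions, 7 streams per region, 12 streams at the remote server). The variable names are illustrative, and the stream counts are taken as given rather than derived from a multicast model.

```python
# Numbers from the example above; variable names are illustrative.
regions = 10
streams_per_region = 7   # multicast streams each proxy needs for its own region
remote_streams = 12      # multicast streams if the remote server serves all regions

# Total streams when every proxy serves its own region from its cache.
total_proxy_streams = regions * streams_per_region

print(f"streams if every proxy serves its region: {total_proxy_streams}")
print(f"streams if the remote server serves everyone: {remote_streams}")
print(f"ratio: {total_proxy_streams / remote_streams:.1f}x")
```

Serving from the remote server needs fewer streams in total because each stream is shared by more clients, at the price of load on the remote server and the network.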
Caching for Multicast Streams: Tradeoffs
• caching popular content reduces the load on the remote server and network
• delivering popular content from the remote server amortizes the cost of a stream over more clients
• earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions
New Caching Policies Research
• Hypothesis: popularity-based strategy will outperform replacement-based strategy
– significant fraction of requests to uncached files may be for files that are accessed very sporadically
• Assumptions:
– limited disk space implies limited disk bandwidth
– proxy bandwidth for delivering cached streams is equal to min of proxy disk bw and proxy network bw
(call this proxy disk bandwidth)
Current Web Caching Policies
• Replacement based (cache on each miss)
• Top replacement candidate is an ad-hoc combination of:
– large files
– least recently accessed or lower access frequency
– miss penalty (server latency, bandwidth)
• Cache whole file or none
• Unicast
• Ignore limited disk bandwidth
Previous SM Caching Policies
• Interval Caching [DaSi93, KaRT95]
• Resource Based Caching (RBC) [TVDS98]
• Least Frequently Used (LFU)
• Block-based insertion and deletion [AcSm00]
• Popularity-based caching for layered encoding [RYHE00]
• Prefix and Segment Caching for smoothing [SeRT99, WZDS98]
Interval Caching
• Cache smallest intervals
• Target: memory caches (lots of insertions)
[Figure: file f on a time axis from 0 to T, with streams S1, S2, and S3 at successive positions]
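A minimal Python sketch of the "cache smallest intervals" rule follows. It assumes a uniform delivery rate, so the space needed to cache an interval is proportional to the time gap between two consecutive streams of the same file. The function name `select_intervals`, the input layout, and the stop-at-first-misfit rule are illustrative assumptions, not details from the slides.

```python
from typing import Dict, List, Tuple


def select_intervals(arrivals: Dict[str, List[float]],
                     capacity: float) -> List[Tuple[str, float, float]]:
    """Pick the smallest intervals between consecutive streams of a file.

    arrivals maps a file name to the start times of its active streams.
    capacity is expressed in the same time units as the arrivals
    (assumption: uniform delivery rate, so space is proportional to time).
    """
    intervals = []
    for name, times in arrivals.items():
        ordered = sorted(times)
        # Each pair of consecutive streams forms one candidate interval.
        for earlier, later in zip(ordered, ordered[1:]):
            intervals.append((later - earlier, name, earlier, later))

    intervals.sort()  # smallest interval first

    chosen = []
    used = 0.0
    for length, name, earlier, later in intervals:
        if used + length > capacity:
            break  # assumption: stop at the first interval that does not fit
        chosen.append((name, earlier, later))
        used += length
    return chosen


demo = select_intervals({"f": [0.0, 2.0, 3.0], "g": [1.0, 1.5]}, capacity=2.0)
print(demo)
```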
Resource Based Caching
• Cache entire files and intervals/runs
• Goal: efficiently utilize the limited resource
– limited space: cache smallest space requirement
– limited bandwidth: cache smallest write overhead
• Pre-allocate bandwidth to each cached entity
• Complex algorithm
– complex implementation
– high time complexity
RBC Algorithm
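[Equations not recoverable from the extracted text: two selection metrics for entity x of file i, expressed in terms of W_{i,x}, R_{i,x}, and S_{i,x}]
Step 1: Select entity x (interval, run, or file) of file i
1) If U_bw > U_space + ε:
Choose the entity with the lowest [metric lost in extraction]
2) If U_space > U_bw + ε:
Choose the entity with the minimum space requirement S_{i,x}
3) If U_space - ε < U_bw < U_space + ε:
Choose the entity with the largest [metric lost in extraction]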
Step 2: Caching decision for entity x
1) If enough unallocated space and unallocated bandwidth:
Cache entity x
2) If enough unallocated space but bandwidth constrained:
Use bandwidth goodness list to select candidates for eviction
3) If enough unallocated bandwidth but space constrained:
Use space goodness list to select candidates for eviction
4) If both bandwidth and space constrained:
Walk both lists: at each step, remove an entity from either the bandwidth goodness list or the space goodness list.
Step 3: Allocate space and bandwidth for entity x
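The case analysis in Steps 1 and 2 can be sketched in Python as below. This covers only the branching logic: it does not compute the selection metrics or the goodness lists. The names `Regime`, `classify_regime`, and `eviction_lists`, the threshold parameter `eps`, and its default value are all assumptions made for illustration.

```python
from enum import Enum
from typing import List


class Regime(Enum):
    # Labels are illustrative; the slides only give the three conditions.
    BANDWIDTH_CONSTRAINED = "bandwidth"
    SPACE_CONSTRAINED = "space"
    BALANCED = "balanced"


def classify_regime(u_bw: float, u_space: float, eps: float = 0.05) -> Regime:
    """Step 1 branching on bandwidth vs. space utilization.

    eps is a hypothetical threshold value. Exact ties at the boundary are
    treated as balanced, which the slides do not specify.
    """
    if u_bw > u_space + eps:
        return Regime.BANDWIDTH_CONSTRAINED
    if u_space > u_bw + eps:
        return Regime.SPACE_CONSTRAINED
    return Regime.BALANCED


def eviction_lists(need_space: float, need_bw: float,
                   free_space: float, free_bw: float) -> List[str]:
    """Step 2 branching: which goodness lists to consult for eviction."""
    space_ok = need_space <= free_space
    bw_ok = need_bw <= free_bw
    if space_ok and bw_ok:
        return []                      # case 1: cache the entity directly
    if space_ok:
        return ["bandwidth"]           # case 2: bandwidth constrained
    if bw_ok:
        return ["space"]               # case 3: space constrained
    return ["bandwidth", "space"]      # case 4: walk both lists


print(classify_regime(0.9, 0.5))
print(eviction_lists(need_space=2.0, need_bw=1.0, free_space=1.0, free_bw=0.5))
```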
Least Frequently Used
• Different implementation options:
– What to do on the first access to an object?
– How to estimate frequency?
• Version studied: Currently Most Popular (CMP)
– Insert only the most frequently accessed entities (files or segments)
– On-line popularity estimate: future research
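As a rough illustration of CMP with known popularities, the Python sketch below fills the cache with whole files in decreasing order of access frequency. The function name `cmp_cache_contents`, the whole-file granularity, and the stop-at-first-misfit rule are assumptions for this sketch, not details taken from the slides.

```python
from typing import Dict, List


def cmp_cache_contents(frequency: Dict[str, float],
                       size: Dict[str, float],
                       capacity: float) -> List[str]:
    """Return the files CMP would hold, most popular first.

    Assumption: whole files only, and filling stops at the first file
    that does not fit, so the cache holds a strict popularity prefix.
    """
    cached = []
    used = 0.0
    for name in sorted(frequency, key=frequency.get, reverse=True):
        if used + size[name] > capacity:
            break
        cached.append(name)
        used += size[name]
    return cached


freq = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 5.0}
sizes = {name: 1.0 for name in freq}   # uniform file size
print(cmp_cache_contents(freq, sizes, capacity=2.5))
```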
Previous Comparison: RBC vs. CMP [TVDS98]
• Fixed file access frequencies
• RBC outperforms CMP for all parameter values studied
• Limited design space
– e.g., total cache size 16 GB
• Inconsistent results
New Performance Comparison
• Re-evaluate byte hit ratio of CMP and RBC
– Simulation with synthetic workload
– Broad design space
• New Pooled RBC
• New simple hybrid CMP/interval caching (CMP/IC) policy
System Assumptions
• Arrivals: Poisson(λ)
– extra experiments with Pareto(α, k)
• File access frequency: Zipf(θ)
• Perfect knowledge of file popularity
– extra experiments with approximate file popularity
• Uniform file size and delivery rate
– extra experiments with variable file size and delivery rate
• Load balanced across multiple disks
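A synthetic workload under these assumptions could be generated along the following lines. This is a sketch, not the simulator used in the study. It assumes the common Zipf parameterization in which the file of rank i is requested with probability proportional to 1/i^(1-θ). The function names, the seed handling, and the use of rank numbers as file identifiers are illustrative.

```python
import random
from typing import List, Tuple


def zipf_probabilities(n: int, theta: float = 0.0) -> List[float]:
    """Request probability per popularity rank (1 = most popular).

    Assumption: p_i is proportional to 1 / i**(1 - theta).
    """
    weights = [1.0 / (rank ** (1.0 - theta)) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_requests(n_files: int, rate: float, horizon: float,
                      theta: float = 0.0, seed: int = 0) -> List[Tuple[float, int]]:
    """Poisson arrivals at the given rate, each for a Zipf-chosen file rank."""
    rng = random.Random(seed)
    probs = zipf_probabilities(n_files, theta)
    ranks = list(range(1, n_files + 1))
    requests = []
    t = 0.0
    while True:
        # Exponential inter-arrival times give a Poisson arrival process.
        t += rng.expovariate(rate)
        if t >= horizon:
            break
        requests.append((t, rng.choices(ranks, weights=probs, k=1)[0]))
    return requests


# With T as the time unit, rate=450 corresponds to N = 450 requests per T.
trace = generate_requests(n_files=100, rate=450.0, horizon=1.0)
print(len(trace), trace[:3])
```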
System Parameters
• n : number of files
• θ : Zipf parameter
• N : arrival rate (avg. number of requests per avg. file duration T)
N = λT
• C : cache size (fraction of media data accessed)
• B: normalized disk bandwidth
(fraction of the average number of simultaneous streams needed to deliver data that is cached by CMP)
• B depends on N, θ, n, C, and disk technology
• Relative performance of policies depends mainly on B
• B = 1.0 : CMP system is bandwidth balanced
• B < 1.0 : CMP system is bandwidth deficient
• B > 1.0 : CMP system is bandwidth abundant
Normalized Disk Bandwidth (B): Example
• Ultrastar 72ZX disk:
– disk space: 116.76 hours of MPEG-1 video (73.4 GB)
– disk bandwidth: 108 MPEG-1 streams (22-37 MB/s)
• Assume: 100 requests/hour for cached files
• If cache contains 2-hour movies:
– need 200 streams
– B = 108/200 = 0.54
• If cache contains 30-minute TV shows:
– need 50 streams for cache content
– B = 108/50 = 2.16
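The two B values in this example follow from multiplying the request rate by the stream duration to get the average number of simultaneous streams, then dividing the disk's stream capacity by that number. A small Python sketch, with illustrative function names:

```python
def streams_needed(requests_per_hour: float, duration_hours: float) -> float:
    """Average number of simultaneous streams: arrival rate x stream duration."""
    return requests_per_hour * duration_hours


def normalized_disk_bandwidth(disk_streams: float,
                              requests_per_hour: float,
                              duration_hours: float) -> float:
    """B = streams the disk can deliver / streams needed for cached content."""
    return disk_streams / streams_needed(requests_per_hour, duration_hours)


# Numbers from the example: 108 streams per disk, 100 requests/hour.
for label, hours in [("2-hour movies", 2.0), ("30-minute TV shows", 0.5)]:
    b = normalized_disk_bandwidth(108, 100, hours)
    print(f"{label}: B = {b:.2f}")
# 2-hour movies: B = 0.54
# 30-minute TV shows: B = 2.16
```

The same disk is bandwidth deficient for long movies and bandwidth abundant for short shows, which is why B, rather than cache size alone, drives the policy comparison.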
RBC vs. CMP
• CMP outperforms RBC if B ≤ 1.0
• RBC slightly outperforms CMP if B > 1.0 and the cache is small
[Figure: byte hit ratio vs. cache size for RBC and CMP; plots for B = 0.75, B = 1.0, and B = 2.0; N = 450, n = 100, θ = 0]
Files Cached by RBC
• Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)
[Figure: fraction cached vs. file; plots for B = 0.75, B = 1.0, and B = 2.0]
Space and Bandwidth Utilization
[Figure: utilization vs. cache size; plots for B = 0.75, B = 1.0, and B = 2.0; curves: CMP bandwidth utilization, RBC bandwidth utilization, RBC space utilization, RBC write bandwidth]
Pooled RBC
• Three improvements over RBC
– simpler rule to select entity to cache
– can keep cached intervals when deleting a full file
– pool of pre-allocated bandwidth
• Complexity similar to RBC
Pooled RBC, RBC and LFU
• Pooled RBC ≥ CMP
• BUT, Pooled RBC is much more complex than CMP
[Figure: byte hit ratio vs. cache size; curves labeled "CMP / Pooled RBC" and "RBC" for B = 0.75 and B = 1.0; curves labeled "RBC / Pooled RBC" and "CMP" for B = 2.0; N = 450, n = 100, θ = 0]
Hybrid CMP/IC Policies
• Do interval caching on a separate (small) cache
– Interval Cache in Main Memory: CMP/ICmem and Pooled RBC/ICmem
– Interval Cache on Disk: CMP/ICdisk
• e.g. 5% of disk cache
CMP/ICmem vs. Pooled RBC/ICmem
• Memory cache improves CMP and Pooled RBC
• B 1.0 : greater improvement for CMP
[Figure: byte hit ratio vs. cache size for CMP/ICmem and Pooled RBC/ICmem; plots for B = 0.75, B = 1.0, and B = 2.0; N = 450, n = 100, θ = 0]
CMP/ICdisk vs. Pooled RBC
• CMP/ICdisk ≥ Pooled RBC ≥ CMP
[Figure: byte hit ratio vs. cache size; curves labeled "CMP/ICdisk / CMP" and "Pooled RBC" for B = 0.75 and B = 1.0; curves labeled "CMP" and "CMP/ICdisk / Pooled RBC" for B = 2.0; N = 450, n = 100, θ = 0]
Conclusions
• Simple CMP
– simple to implement
– performance similar to Pooled RBC, CMP/ICdisk (static file popularities)
• Hybrid CMP/IC policy
– performance ≥ Pooled RBC
– simple to implement
– possibly more robust (imperfect and dynamic popularity measures)
Future Work
• Develop on-line estimate of file popularity
• Server log analysis
– client behavior and workloads (NOSSDAV'01 paper)
– More logs!
• Caching Policies for Multicast Streams
– popular file has greater cost-sharing if not cached
– determine cache content that minimizes per-client cost
– caching principles / on-line policy
– (coming up soon)
• Prototype, experimental (live) workloads