The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea...
-
date post
22-Dec-2015 -
Category
Documents
-
view
217 -
download
0
Transcript of The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea...
![Page 1: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/1.jpg)
The Hardness of Cache Conscious Data Placement
Erez Petrank, TechnionDror Rawitz, Caesarea Rothschild Institute
Appeared in29th ACM Conference on Principles of Programming
Languages Portland, Oregon, January 16, 2002
![Page 2: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/2.jpg)
2
Agenda Background & motivation The problem of cache conscious
data / code placement is: extremely difficult in various models
Positive matching results (weak…). Some proof techniques and details Conclusion
![Page 3: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/3.jpg)
4
Cache Structure Large memory divided
into blocks. Small cache - k blocks. Mapping of memory
blocks to cache blocks. (e.g., modulus function)
Cache hit - accessed block is in the cache.
Cache miss - required block must be read to cache from memory.
Direct mapping
![Page 4: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/4.jpg)
5
What can we do to improve program cache behavior?
Arrange code / data to minimize cache misses
Write cache-conscious programs
In this work we concentrate on the first.
![Page 5: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/5.jpg)
6
How do we place data (or code) optimally?
Step 1: Discover future accesses to data. Step 2: Find placement of data that
minimizes the number of cache misses.
Step 3: Rearranged the data in memory. Step 4: Run program.
Some “minor” problems: In Step 1: We cannot tell the future In Step 2: We don’t know how to do that
![Page 6: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/6.jpg)
7
Step 1: Discover future accesses to data Static analysis. Profiling. Runtime monitoring.
This work:Even if future accesses are known exactly,Step 2 (placing data optimally) is extremely
difficult.
![Page 7: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/7.jpg)
8
The Problem Input: a set of objects O={o1,…,om}, and a
sequence of accesses =(1,…,n).
E.g. = (o1,o3,o7,o1,o2,o1,o3,o4,o1).
Solution: a placement, f:ON. Measure: number of misses.
We want: placement of o1,…,om in memory that obtains minimum number of cache misses (over all possible placements).
![Page 8: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/8.jpg)
9
Our Results
Can we (efficiently) find an optimal placement?
No! Unless, P=NP.
![Page 9: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/9.jpg)
10
Our Results
Can we (efficiently) find an “almost” optimal placement? Almost = # misses twice the optimum
No! Unless, P=NP.
Can we (eff.) find “fairly” optimal placement? Fairly = # misses 100 times the optimum No! Unless, P=NP.
![Page 10: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/10.jpg)
11
Our Results
Can we (eff.) find a “reasonable” placement? reasonable = # misses log(n) the optimum No! Unless, P=NP.
Can we (eff.) find an “acceptable” placement? Acceptable = # misses n0.99 times the optimum No! Unless, P=NP.
![Page 11: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/11.jpg)
12
The Main Theorem
Let ε be any real number, 0<ε<1. If there is a polynomial time algorithm that finds a placement which is within a factor of n(1-) from the optimum, then P=NP.
(Theorem holds for caches with > 2 blocks)
![Page 12: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/12.jpg)
13
Extend to t-way Associative Caches
t-way Associative Caches:
t·k blocks in cache, k sets, t blocks in a set.
memory block mapped to a set.
Inside a set: a replacement protocol.
Theorem 2: same hardness holds for t-way associative cache systems.
.
.
.
.
.
.
1 2 3
![Page 13: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/13.jpg)
14
Result is “robust” Holds for a variety of models. E.g.,
Mapping of memory block to cache is not by modulus,
Replacement policy is not standard, Object sizes are fixed, (or they are
not), Objects must be aligned to cache
blocks, (or not), Etc…
![Page 14: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/14.jpg)
15
More Difficulties:Pairwise Information
A practical problem: sequence of accesses is long. Processing it is costly.
Solution in previous work: keep relations between pairs of objects. E.g., for each pair how beneficial is putting
them in the same memory block. E.g., for each pair how many misses would
be caused by mapping the two to the same cache block
![Page 15: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/15.jpg)
16
Pairwise Information is Lossy
Conclusion:Even when given unrestricted time, finding an optimal pairwise placement is a bad idea (worst case).
Theorem 3:There exists a sequence such that
#misses(f) (k-3) #misses(f*)f - optimal pairwise placemetf* - optimal placement
![Page 16: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/16.jpg)
17
Pairwise Information:Hardness Result
Theorem 4: Let ε be any real number, 0<ε<1. If there is a polynomial time algorithm that finds a placement which is within a factor of n(1- ε) from the optimum with respect to pairwise information, then P=NP.
Proof is similar to the direct mapping case.
![Page 17: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/17.jpg)
18
A Simple Observation Input:
Objects O={o1,…,om}, and access sequence =(1,…,n).
Any placement yields at most n cache misses.
Any placement yields at least 1 cache miss. Therefore, any placement is within a factor of
n from the optimum. (Recall: a solution within n(1-ε) is not
possible.)
![Page 18: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/18.jpg)
19
What about positive results ?
In light of the lower bound not much can be done in general. Yet…
Theorem 5: There exists a polynomial time approximation algorithm that outputs a placement (always) within a factor of from the optimal placement for any c. Compare: impossible: n(1-), possible: n/c
logn
ncn
log
![Page 19: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/19.jpg)
20
Other Problems Our Problem:
Inapproximable n/c logn -approximation algorithm
Famous problems with similar results: Minimum graph coloring Maximum clique
![Page 20: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/20.jpg)
21
Implications: We cannot hope to find an algorithm
that will always give a good placement.We must use heuristics.
We cannot estimate the potential benefit of rearranging data in memory to the cache behavior. We can only check what a proposed heuristic does for common benchmarks.
![Page 21: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/21.jpg)
22
Some Proof Ideas(simplest – direct mapping)
Proof: We show that if the above algorithm exists, then we can decide for any given graph G, if G is k-colorable.
Theorem 1: Let be any real number, 0<<1. If there is a polynomial time algorithm that finds a placement which is within a factor of n(1-) from the optimum, then P=NP.
![Page 22: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/22.jpg)
23
The k-colorability Problem Problem: Given G=(V,E), is G k-
colorable?
Known to be NP-complete for k>2.
![Page 23: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/23.jpg)
24
Translating a Graph G into a Cache Question
GraphCache Question
ColorCache line
Vertex viObject ov
Edge e=(vi,vj)Subsequence e=(oi,oj)M
(i.e., M times (oi,oj).)
Coloring Placement
![Page 24: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/24.jpg)
25
Translating a Graph G into a Cache Question
A vertex vi is represented by an object oi:OG = { oi : vi V }
Let =O(1/).
Each edge (vi,vj) is represented by |E| repetitions of the two objects oi,oj:
E
jiEvv
G ooji
,),(
![Page 25: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/25.jpg)
26
Examples G1 (not 3-colorable)
G2 (3-colorable)
1 2
43
1 2
43
542
532
541
531
5212 ,,,,, oooooooooo
643
642
632
641
631
6211 ,,,,,, oooooooooooo
![Page 26: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/26.jpg)
27
Properties of the Translation Length of : n = O(|E|+1) Case I: G is k-colorable.
Then, Opt(OG,G) = O(|E|) = O(n1/(+1)) = O(n/2) Case II: G is not k-colorable.
Then, Opt(OG,G) = Ω(|E|) = Ω(n/(+1)) = Ω(n1-/2)
Thus, an algorithm that provides a placement within n(1-) from the optimum can distinguish between the two cases!
![Page 27: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/27.jpg)
28
Replacement Protocol Relevant in t-way caches Which object is removed
when set is full? We need a replacement
policy or protocol. Examples:
LRU FIFO …
.
.
.
.
.
.
1 2 3
![Page 28: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/28.jpg)
29
Replacement Protocol
Solution: Only reasonable caches are considered.
Problem: replacement protocol may behave badly.
Recently used objects are being removed (e.g., most recently used).
Any placement is good.
Problem: How do we define reasonable?
![Page 29: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/29.jpg)
30
Sensible Caches =(o1,…,oq), s.t. ij, oi oj No more than t objects are mapped to
the same cache set
causes at most q+C misses, when accessed after any ’.
O(1) misses after first access to .E.g., LRU is 0-sensible.
![Page 30: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/30.jpg)
31
Pseudo-LRU Used in Intel® Pentium® 4-way associative Each set:
Two pairs LRU block flag (for each pair) LRU pair flag
Replacement Protocol: LRU block in LRU pair is removed
Pseudo-LRU is 0-sensible.
![Page 31: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/31.jpg)
32
t-way associative caches:Some Proof Ideas
Proof: If the above algorithm exists, then we can decide for any given graph G, if G is k-colorable.
Theorem 2: Let be any real number, 0<<1. If there is a polynomial time algorithm that finds a placement which is within a factor of n(1- ) from the optimum, then P=NP.
Main idea: a sensible replacement protocol is “forced” to behave as a direct mapping cache.
We do this by using dummy objects
![Page 32: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/32.jpg)
33
How do we construct the approximation algorithm? If there are c logn objects
Opt c logn. Any placement is -approximate.
Otherwise, There are < c logn objects in . In this case, we can find an optimal
placement by examining possible placements.
ncn
log
kcnc nk loglog
![Page 33: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/33.jpg)
34
Unrestricted Alignments and non Uniform Size Objects Two simplifying assumptions:
Uniform size objects Aligned objects
Can we get the same results without them?
Lower bound? Upper bound?
![Page 34: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/34.jpg)
35
Problem with Unrestricted Alignments Cache with 3 blocks. 2 objects o1, o2 each of size 1.5 blocks. The sequence: (o1,o2)M
Restricted alignment: O(M) misses.
Unrestricted alignment: 3 misses. (If they are placed consecutively in memory.)
![Page 35: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/35.jpg)
36
Unrestricted Alignments (uniform instances) Aligned version of a
placement:
f g
Direct mapping: #misses(g) #misses(f)
Lower bounds
Associative caches: #misses(g) = O(#misses(f)) + O(|E|) when (O,) = H(G)
H
![Page 36: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/36.jpg)
37
Conclusion Computing the best placement of
data in memory (w.r.t. reducing cache misses) is extremely difficult.
We cannot even get close (if PNP). There exists a matching (weak)
positive result. Implications:
using heuristics cannot be avoided. We cannot hope to evaluate potential
benefit.
![Page 37: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/37.jpg)
38
An Open Question:
Can we classify programs for which the problem becomes simpler?
![Page 38: The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea Rothschild Institute Appeared in 29 th ACM Conference on Principles.](https://reader030.fdocuments.net/reader030/viewer/2022032523/56649d7a5503460f94a5dc1d/html5/thumbnails/38.jpg)
39
The end