Prime-Cache / Pro-Cache / Power-Cache Archive Appliance User
Doppelgänger: A Cache for Approximate Computing
Transcript of Doppelgänger: A Cache for Approximate Computing
Doppelgänger: A Cache for Approximate Computing
Joshua San Miguel
Jorge Albericio
Andreas Moshovos
Natalie Enright Jerger
Cache Hierarchy
2
main memory
processor core
private caches
shared last-level cache
Cache Hierarchy
3
processor core
private caches
main memory
shared last-level cache
Cache Hierarchy
4
processor core
private caches
main memory
shared last-level cache
Accessing memory is 10x – 100x greater latency and energy than accessing private cache!
Cache Hierarchy
5
processor core
private caches
main memory
shared last-level cache
Accessing memory is 10x – 100x greater latency and energy than accessing private cache!
Need hierarchy of large caches…
Cache Hierarchy
6
main memory
processor core
private caches
shared last-level cache
Cache Hierarchy
7
main memory
processor core
private caches
shared last-level cache
Cache Hierarchy
8
main memory
processor core
private caches
shared last-level cache
But last-level cache consumes substantial energy and takes up 30%-50% of chip area!
Cache Hierarchy
9
main memory
processor core
private caches
shared last-level cache
But last-level cache consumes substantial energy and takes up 30%-50% of chip area!
Higher efficiency via Approximate Computing…
Summary
10
Doppelgänger Cache: Identifies approximate similarity in data block values.
77% cache storage savings of approximable data.
Summary
11
Doppelgänger Cache: Identifies approximate similarity in data block values.
77% cache storage savings of approximable data.
Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.
Summary
12
Doppelgänger Cache: Identifies approximate similarity in data block values.
77% cache storage savings of approximable data.
Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.
Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.
Outline
Approximate Computing
Approximate Similarity
Doppelgänger Cache
Cache Architecture
Similarity Mapping
Evaluation
13
Approximate Computing
Not all data/computations need to be precise.
14
http://www.zentut.com/
http://www.businessweek.com/
http://www.cc.gatech.edu/~cnieto6/
http://www.analyticbridge.com/
http://themusicparlour.blogspot.ca/
http://www.scientific-computing.com/
Data mining
Computer vision Audio and video processing
Gaming Machine learning Dynamical simulation
Approximate Similarity
15
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Approximate Similarity
16
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Approximate Similarity
17
1
2
3
92 131 183 91 132 186
90 131 185 93 133 184
35 31 29 43 38 37
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Approximate Similarity
18
1
2
3
92 131 183 91 132 186
90 131 185 93 133 184
35 31 29 43 38 37
approximately
similar
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Approximate Similarity
19
1
2
3
92 131 183 91 132 186
90 131 185 93 133 184
35 31 29 43 38 37
approximately
similar
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Approximate Similarity
20
1
2
3
92 131 183 91 132 186
90 131 185 93 133 184
35 31 29 43 38 37
approximately
similar
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Approximate Similarity
21
1
2
3
92 131 183 91 132 186
90 131 185 93 133 184
35 31 29 43 38 37
approximately
similar
Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.
Allows for 77% cache storage savings of approximable data!
Outline
Approximate Computing
Approximate Similarity
Doppelgänger Cache
Cache Architecture
Similarity Mapping
Evaluation
22
Doppelgänger Cache
23
main memory
processor core
private caches
shared last-level cache
Doppelgänger Cache
24
main memory
processor core
private caches
shared last-level cache
How can we exploit approximate similarity to save area and energy in the last-level cache?
Doppelgänger Cache
25
main memory
processor core
private caches
shared last-level cache
Doppelgänger Cache
26
main memory
processor core
private caches
shared LLC precise LLC Doppelgänger LLC
Conventional Cache
27
tag array data array
address
from L2
data from
memory
Conventional Cache
28
tag array data array
address
from L2
data from
memory
One-to-one mapping of data values to memory locations.
Conventional Cache
29
tag array data array
address
from L2
data from
memory
One-to-one mapping of data values to memory locations.
But the fundamental goal of a processor is to process data values, not memory locations…
Conventional Cache
30
tag array data array
address
from L2
data from
memory
Conventional Cache
31
tag array data array
address
from L2
data from
memory
Conventional Cache
32
tag array data array
address
from L2
data from
memory
Conventional Cache
33
tag array data array
address
from L2
data from
memory
Conventional Cache
34
tag array data array
address
from L2
data from
memory
Multiple copies of approximately similar blocks.
Conventional Cache
35
tag array data array
address
from L2
data from
memory
Doppelgänger Cache
36
tag array
approximate data array
address
from L2
data from
memory
Doppelgänger Cache
37
tag array
approximate data array
address
from L2
data from
memory
Smaller data array allows for substantial area and energy savings.
Doppelgänger Cache
38
tag array
approximate data array
address
from L2
data from
memory
Doppelgänger Cache
39
tag 0 map X
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address
from L2
data from
memory
Doppelgänger Cache - Lookups
40
tag 0 map X
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
Doppelgänger Cache - Lookups
41
tag 0 map X
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address 0
from L2
Doppelgänger Cache - Lookups
42
tag 0 map X
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address 0
from L2
map X from
tag array
Doppelgänger Cache - Lookups
43
tag 0 map X
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address 0
from L2
data A to L2
map X from
tag array
Doppelgänger Cache - Insertions
44
tag 0 map X
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
Doppelgänger Cache - Insertions
45
tag 0 map X
tag 5
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address 5
from L2
Doppelgänger Cache - Insertions
46
tag 0 map X
tag 5
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address 5
from L2
data B from
memory
data B to L2
Doppelgänger Cache - Insertions
47
tag 0 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
tag array
approximate data array
address 5
from L2
data B from
memory generate map Y from data B
data B to L2
Doppelgänger Cache - Insertions
48
tag 0 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y
tag array
approximate data array
address 5
from L2
data B from
memory generate map Y from data B
data B to L2
Doppelgänger Cache - Insertions
49
tag 0 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y
tag array
approximate data array
address 5
from L2
data B from
memory generate map Y from data B
data B to L2
Miss!
Doppelgänger Cache - Insertions (Miss)
50
tag 0 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y
tag array
approximate data array
address 5
from L2
data B from
memory generate map Y from data B
data B to L2
Doppelgänger Cache - Insertions (Miss)
51
tag 0 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 5
from L2
data B from
memory generate map Y from data B
data B to L2
Doppelgänger Cache - Insertions
52
tag 0 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
Doppelgänger Cache - Insertions
53
tag 0 map X
tag 6
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
Doppelgänger Cache - Insertions
54
tag 0 map X
tag 6
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory
data C to L2
Doppelgänger Cache - Insertions
55
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2
Doppelgänger Cache - Insertions
56
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2
Doppelgänger Cache - Insertions
57
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2
Hit!
Doppelgänger Cache - Insertions (Hit)
58
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2
Doppelgänger Cache - Insertions (Hit)
59
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2
Doppelgänger Cache - Insertions (Hit)
60
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2 Data block A serves as an acceptable approximation of data block C.
Doppelgänger Cache - Insertions (Hit)
61
tag 0 map X
tag 6 map X
tag 5 map Y
tag 1 map X
tag 2 map X
tag 3 map X
map X data block A
map Y data block B
tag array
approximate data array
address 6
from L2
data C from
memory generate map X from data C
data C to L2
Doppelgänger Cache - Similarity Mapping
62
The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
Doppelgänger Cache - Similarity Mapping
63
data block A
A[0] A[1] A[n]
hash function
mapping
hash
map
The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
Doppelgänger Cache - Similarity Mapping
64
data block A
A[0] A[1] A[n]
mapping
hash
map
hash function Aggregates values in block:
hash = AVG(A*0+, …, A*n+)
The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
Doppelgänger Cache - Similarity Mapping
65
data block A
A[0] A[1] A[n]
hash function
hash
map
Discretizes hash value:
mapping
map
(M-bit)
All possible
hash values
The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
Doppelgänger Cache - Similarity Mapping
66
data block A
A[0] A[1] A[n]
hash function
hash
map
Discretizes hash value:
mapping
map
(M-bit)
All possible
hash values
approximately
similar
The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.
Doppelgänger Cache
67
main memory
processor core
private caches
shared LLC precise LLC Doppelgänger LLC
uniDoppelgänger Cache
68
main memory
processor core
private caches
shared LLC uniDoppelgänger LLC
uniDoppelgänger Cache
69
main memory
processor core
private caches
shared LLC uniDoppelgänger LLC
Precise blocks simply use physical address as the map value.
70
More details in paper: Cache writes, replacements and coherence.
Details on hash functions and mapping.
Sensitivity to size of map space and data array.
Evaluation of uniDoppelgänger.
Doppelgänger Cache
Outline
Approximate Computing
Approximate Similarity
Doppelgänger Cache
Cache Architecture
Similarity Mapping
Evaluation
71
Evaluation
72
Applications: PARSEC and AxBench
Performance: Full-system cycle-level simulation
Error: Pin simulation
Area and Energy: CACTI
Configuration: 4 cores, private L1 and L2
2MB shared LLC (1MB precise, 1MB Doppelgänger)
Doppelgänger: 14-bit similarity map, 1/4 data array
73
Evaluation - Compression Ratio
0x
1x
2x
3x
4x
5x
6x
BΔI exact deduplication doppelganger doppelganger + BΔI
com
pre
ssio
n r
atio
bet
ter
74
Evaluation - Compression Ratio
0x
1x
2x
3x
4x
5x
6x
BΔI exact deduplication doppelganger doppelganger + BΔI
com
pre
ssio
n r
atio
bet
ter
75
Evaluation - Compression Ratio
0x
1x
2x
3x
4x
5x
6x
BΔI exact deduplication doppelganger doppelganger + BΔI
com
pre
ssio
n r
atio
bet
ter
76
Evaluation
0.8x
0.9x
1.0x
1.1x
1.2x
1.3x
1.4x
applicationoutput accuracy
applicationperformance
total cachedynamic energy
reduction
total cacheleakage energy
reduction
total cache areareduction
bet
ter
77
Evaluation
0.8x
0.9x
1.0x
1.1x
1.2x
1.3x
1.4x
applicationoutput accuracy
applicationperformance
total cachedynamic energy
reduction
total cacheleakage energy
reduction
total cache areareduction
bet
ter
Conclusion
78
Doppelgänger Cache: Identifies approximate similarity in data block values.
77% cache storage savings of approximable data.
Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.
Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.
Thank you
Doppelgänger: A Cache for Approximate Computing
Joshua San Miguel
Jorge Albericio
Andreas Moshovos
Natalie Enright Jerger