Transcript. Speaker: Wei Zeng
1
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders
ISCA 2006, IEEE. By Chuanjun Zhang
Speaker: Wei Zeng
2
Outline: Introduction, The B-Cache Organization, Experimental Results and Analysis, Related Work, Conclusion
3
Outline: Introduction, The B-Cache Organization, Experimental Results and Analysis, Related Work, Conclusion
4
Background
Bottleneck to achieving high performance: the increasing gap between memory latency and processor speed.
Multilevel memory hierarchy: the cache acts as an intermediary between the very fast processor and the much slower main memory.
Two cache mapping schemes: direct-mapped cache and set-associative cache.
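As a concrete sketch of direct-mapped indexing (illustrative code, not from the paper, assuming the 16kB / 32-byte-line configuration used later as the baseline):

```python
# Illustrative sketch: how a direct-mapped cache splits an address
# into tag, set index, and byte offset.
# Assumes 16 kB capacity and 32-byte lines, i.e. 512 sets.
LINE_SIZE = 32                        # bytes per line -> 5 offset bits
NUM_SETS = 16 * 1024 // LINE_SIZE     # 512 sets -> 9 index bits
OFFSET_BITS = 5
INDEX_BITS = 9

def split_address(addr):
    """Return (tag, set_index, offset) for a direct-mapped lookup."""
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset

# Two addresses exactly 16 kB apart land in the same set with
# different tags, so in a direct-mapped cache they evict each other.
a = split_address(0x0000)
b = split_address(0x4000)
```

Because each set holds exactly one line, any two addresses that share a set index but differ in tag produce the conflict misses the talk targets.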
5
Comparison
Direct-mapped cache: faster access time, less power per access, less area, easy to implement, simple to design, but a higher miss rate.
Set-associative cache: longer access time, more power per access, more area, but reduces conflict misses and has a replacement policy.
Desirable cache: the access time of a direct-mapped cache plus the low miss rate of a set-associative cache.
6
What is the B-Cache?
Balanced Cache (B-Cache): a mechanism that provides the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.
7
New features of the B-Cache
The decoder length of the direct-mapped cache is increased by 3 bits, so accesses to heavily used sets can be reduced to 1/8th of the original design.
A replacement policy is added.
A programmable decoder (PD) is used.
8
The problem (an example)
Figure: with 8-bit addresses, the access pattern 0, 1, 8, 9 repeats; in a direct-mapped cache these addresses map to the same sets and keep evicting each other.
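The slide's example can be sketched as a tiny miss-count simulation (illustrative code, not the paper's simulator); for simplicity the caches below hold 8 one-byte lines:

```python
# Tiny simulation: the access pattern 0, 1, 8, 9 repeated, run against
# a direct-mapped cache and a 2-way set-associative cache of equal
# capacity (8 one-byte lines).
from collections import OrderedDict

def simulate(trace, num_sets, ways):
    """Count misses; LRU replacement within each set."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        idx, tag = addr % num_sets, addr // num_sets
        s = sets[idx]
        if tag in s:
            s.move_to_end(tag)            # hit: refresh LRU order
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)     # evict least recently used
            s[tag] = True
    return misses

trace = [0, 1, 8, 9] * 4                  # 16 accesses
dm_misses = simulate(trace, num_sets=8, ways=1)   # direct-mapped: 16 misses
sa_misses = simulate(trace, num_sets=4, ways=2)   # 2-way: only 4 cold misses
```

In the direct-mapped cache, 0 and 8 share a set (as do 1 and 9), so every access misses; the 2-way cache keeps both lines per set and misses only on the first pass.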
9
B-Cache solution
Figure: with the same 8-bit addresses, the B-Cache's miss behavior on this pattern is the same as in a 2-way cache; X marks an invalid PD entry.
10
Outline: Introduction, The B-Cache Organization, Experimental Results and Analysis, Related Work, Conclusion
11
Terminology
Memory address mapping factor (MF)
B-Cache associativity (BAS)
PI: index length of the PD; NPI: index length of the NPD; OI: index length of the original direct-mapped cache
MF = 2^(PI+NPI) / 2^OI, where MF ≥ 1
BAS = 2^OI / 2^NPI, where BAS ≥ 1
12
B-Cache organization
MF = 2^(PI+NPI) / 2^OI = 2^(6+6) / 2^9 = 2^3 = 8; BAS = 2^OI / 2^NPI = 2^9 / 2^6 = 2^3 = 8
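A quick arithmetic check of these numbers (a sketch assuming PI = NPI = 6 and OI = 9, as in the slide):

```python
# Plugging the slide's values into the Terminology-slide formulas.
PI, NPI, OI = 6, 6, 9
MF = 2 ** (PI + NPI) // 2 ** OI    # mapping factor: 2^12 / 2^9
BAS = 2 ** OI // 2 ** NPI          # B-Cache associativity: 2^9 / 2^6
# MF == 8 and BAS == 8, matching the slide
```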
13
Replacement policy
Random policy: simple to design and requires very little extra hardware.
Least Recently Used (LRU): better hit rate but more area overhead.
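The trade-off above can be sketched in code (hypothetical helper functions, not from the paper): random victim selection needs no per-way state, while LRU needs recency information for every way, hence the extra area:

```python
import random

def random_victim(num_ways):
    """Random policy: no per-way state, minimal hardware."""
    return random.randrange(num_ways)

def lru_victim(last_used):
    """LRU policy: evict the way touched longest ago.
    Needs a recency value per way -> more area overhead."""
    return min(range(len(last_used)), key=lambda w: last_used[w])

victim = lru_victim([30, 12, 45, 7])   # way 3 was used longest ago
```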
14
Outline: Introduction, The B-Cache Organization, Experimental Results and Analysis, Related Work, Conclusion
15
Experimental Methodology
Primary metric: miss rate. Other metrics: latency, storage, power costs, overall performance, overall energy.
Baseline: level-one cache (direct-mapped 16kB caches with 32-byte line size for instruction and data).
26 SPEC2K benchmarks run using the SimpleScalar tool set.
16
Data miss-rate reductions
Figure: compares a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs.
17
Latency
18
Storage Overhead
The additional hardware for the B-Cache is the CAM-based PD.
Storage overhead: 4.3% higher than the baseline.
19
Power Overhead
Extra power consumption: the PD of each subarray.
Power reductions: 3-bit data length reduction; removal of 3-input NAND gates.
Power overhead: 10.5% higher than the baseline.
20
Overall Performance
Outperforms the baseline by an average of 5.9%; only 0.3% less than an 8-way cache, but 3.7% higher than a victim buffer.
21
Overall Energy
The B-Cache consumes the least energy (2% less than the baseline).
The B-Cache reduces the miss rate and hence accesses to the second-level cache, which are more power-costly.
On a cache miss, the B-Cache also reduces cache memory accesses through the PD's miss prediction, which keeps the power overhead much lower.
22
Outline: Introduction, The B-Cache Organization, Experimental Results and Analysis, Related Work, Conclusion
23
Related work
Reducing the miss rate of direct-mapped caches: page allocation, column-associative cache, adaptive group-associative cache, skewed-associative cache.
Reducing the access time of set-associative caches: partial address matching (predicting the hit way), difference-bit cache.
24
Compared with previous techniques
The B-Cache applies to both high-performance and low-power embedded systems, is balanced without software intervention, and is feasible and easy to implement.
25
Outline: Introduction, The B-Cache Organization, Experimental Results and Analysis, Related Work, Conclusion
26
Conclusion
The B-Cache allows accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design.
Programmable decoders dynamically determine which memory addresses map to each cache set.
A 16kB level-one B-Cache outperforms a direct-mapped cache with 64.5% and 37.8% miss-rate reductions for the instruction and data caches, respectively.
Average IPC improvement: 5.9%. Energy reduction: 2%. Access time: same as a direct-mapped cache.
27
Thanks!