Page 1

Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders

ISCA 2006, IEEE. By Chuanjun Zhang

Speaker: WeiZeng

Page 2

Outline
- Introduction
- The B-Cache Organization
- Experimental Results and Analysis
- Related Work
- Conclusion

Page 3

Outline
- Introduction
- The B-Cache Organization
- Experimental Results and Analysis
- Related Work
- Conclusion

Page 4

Background
Bottleneck to achieving high performance: the widening gap between memory latency and processor speed.
Multilevel memory hierarchy: the cache acts as an intermediary between the very fast processor and the much slower main memory.
Two cache mapping schemes: the direct-mapped cache and the set-associative cache (a sketch of both follows).
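To make the two schemes concrete, here is a minimal sketch (mine, not from the paper) of where a block address may land in each organization; the geometry (8 sets total, 2 ways in the set-associative case) is assumed purely for illustration.

```python
# Minimal sketch of the two mapping schemes; the geometry (8 sets total,
# 2 ways for the set-associative case) is assumed purely for illustration.
NUM_SETS = 8

def direct_mapped_slot(block_addr):
    # Exactly one candidate location: the index picks the set,
    # the tag disambiguates which block currently lives there.
    index = block_addr % NUM_SETS
    tag = block_addr // NUM_SETS
    return index, tag

def set_associative_slots(block_addr, ways=2):
    # Fewer sets, but the block may live in any of 'ways' locations
    # within its set, so a replacement policy must choose among them.
    num_sets = NUM_SETS // ways
    index = block_addr % num_sets
    tag = block_addr // num_sets
    return [(index, way, tag) for way in range(ways)]
```

The key contrast: the direct-mapped lookup has exactly one candidate location (fast, no choice to make), while the set-associative lookup checks several candidates and needs a replacement policy to pick a victim on a miss.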

Page 5

Comparison
Direct-mapped cache:
- faster access time
- consumes less power per access
- consumes less area
- easy to implement, simple to design
- higher miss rate

Set-associative cache:
- longer access time
- consumes more power per access
- consumes more area
- reduces conflict misses
- has a replacement policy

Desirable cache: the access time of a direct-mapped cache + the low miss rate of a set-associative cache.

Page 6

What is the B-Cache?

Balanced Cache (B-Cache): a mechanism to provide the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.

Page 7

New features of the B-Cache
- The decoder length of the direct-mapped cache is increased by 3 bits, so accesses to heavily used sets can be reduced to 1/8th of the original design (3 extra index bits give 2^3 = 8 times as many decodable sets).
- A replacement policy is added.
- A programmable decoder is used.

Page 8

The problem (an example)
With 8-bit addresses, the repeating access sequence 0, 1, 8, 9, 0, 1, 8, 9, ... keeps hitting the same few sets of a small direct-mapped cache, so those sets thrash while the others sit idle (simulated below).
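A tiny simulation makes the thrashing concrete. This is my sketch, not the paper's code, and the geometry (4 sets, one block per set) is assumed so that block addresses 0 and 8, and likewise 1 and 9, collide.

```python
def simulate_direct_mapped(accesses, num_sets=4):
    """Direct-mapped cache with one block per set; returns the miss count."""
    sets = [None] * num_sets            # tag currently held by each set
    misses = 0
    for addr in accesses:
        index, tag = addr % num_sets, addr // num_sets
        if sets[index] != tag:          # cold or conflict miss: evict and fill
            sets[index] = tag
            misses += 1
    return misses

pattern = [0, 1, 8, 9] * 4              # the repeating sequence from the slide
print(simulate_direct_mapped(pattern))  # 16 -- every single access misses
```

Sets 2 and 3 are never used while sets 0 and 1 thrash; this imbalance is exactly what the B-Cache sets out to fix.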

Page 9

B-Cache solution
(Figure: the B-Cache mapping for the same 8-bit addresses; the conflicting blocks now coexist, the same as in a 2-way cache; X marks an invalid PD entry.)
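As a rough illustration of the programmable-decoder idea, here is a hedged sketch: each cache set owns a small CAM entry holding the partial index it currently responds to, and an unprogrammed entry corresponds to the X (invalid) entries above. The class and method names are mine, and this models only the matching logic, not the paper's circuit.

```python
import random

class ProgrammableDecoder:
    """Toy model of a CAM-based PD: one programmable entry per cache set."""

    def __init__(self, num_sets):
        self.entries = [None] * num_sets    # None = invalid ("X") entry

    def lookup(self, index_bits):
        # Models the parallel CAM match: return the set programmed for
        # these index bits, or None on a PD miss.
        for set_id, entry in enumerate(self.entries):
            if entry == index_bits:
                return set_id
        return None

    def remap(self, index_bits):
        # On a PD miss, pick a victim set (random policy here) and
        # reprogram its entry so future accesses map to it directly.
        victim = random.randrange(len(self.entries))
        self.entries[victim] = index_bits
        return victim
```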

Page 10

Outline
- Introduction
- The B-Cache Organization
- Experimental Results and Analysis
- Related Work
- Conclusion

Page 11

Terminology
PI: index length of the PD (programmable decoder)
NPI: index length of the NPD (non-programmable decoder)
OI: index length of the original direct-mapped cache

Memory address mapping factor (MF): MF = 2^(PI+NPI) / 2^OI, where MF ≥ 1
B-Cache associativity (BAS): BAS = 2^OI / 2^NPI, where BAS ≥ 1

Page 12

B-Cache organization
MF = 2^(PI+NPI) / 2^OI = 2^(6+6) / 2^9 = 2^3 = 8
BAS = 2^OI / 2^NPI = 2^9 / 2^6 = 2^3 = 8
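The worked numbers check out; a short computation (assuming PI = NPI = 6 and OI = 9, as on the slide) confirms both factors:

```python
PI, NPI, OI = 6, 6, 9              # index lengths from the slide
MF = 2 ** (PI + NPI) // 2 ** OI    # memory address mapping factor
BAS = 2 ** OI // 2 ** NPI          # B-Cache associativity
print(MF, BAS)                     # 8 8
```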

Page 13

Replacement policy
Random policy: simple to design and requires very little extra hardware.
Least Recently Used (LRU): better hit rate, but more area overhead.
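To show why the two policies differ in hardware cost, here is a minimal sketch (names and structure are mine, not the paper's): random replacement needs only a pseudo-random pick, while LRU must maintain recency state for every group, which is where the extra area goes.

```python
import random
from collections import OrderedDict

def random_victim(num_ways):
    # Random policy: a single PRNG/LFSR draw, essentially no stored state.
    return random.randrange(num_ways)

class LRUGroup:
    """LRU for one group of ways; the recency ordering is the area overhead."""

    def __init__(self, num_ways):
        self.order = OrderedDict((w, None) for w in range(num_ways))

    def touch(self, way):
        self.order.move_to_end(way)     # mark 'way' as most recently used

    def victim(self):
        return next(iter(self.order))   # evict the least recently used way
```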

Page 14

Outline
- Introduction
- The B-Cache Organization
- Experimental Results and Analysis
- Related Work
- Conclusion

Page 15

Experimental Methodology
Primary metric: miss rate.
Other metrics: latency, storage, power costs, overall performance, overall energy.
Baseline: level-one cache (direct-mapped 16 kB cache with a 32-byte line size for the instruction and data caches).
26 SPEC2K benchmarks run using the SimpleScalar tool set.

Page 16

Data miss-rate reductions
(Figure: data miss-rate reductions for a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs.)

Page 17

Latency

Page 18

Storage Overhead
The additional hardware for the B-Cache is the CAM-based PD.
Storage is 4.3% higher than the baseline.

Page 19

Power Overhead
Extra power consumption: the PD of each subarray.
Power reductions: a 3-bit data length reduction and the removal of 3-input NAND gates.
Net power per access: 10.5% higher than the baseline.

Page 20

Overall Performance
Outperforms the baseline by an average of 5.9%. Only 0.3% less than an 8-way cache, but 3.7% higher than a victim buffer.

Page 21

Overall Energy
The B-Cache consumes the least energy (2% less than the baseline).
The B-Cache reduces the miss rate, and hence accesses to the second-level cache, which are more power costly.
On a cache miss, the B-Cache also reduces cache memory accesses through the miss prediction of the PD, which makes the power overhead much smaller.

Page 22

Outline
- Introduction
- The B-Cache Organization
- Experimental Results and Analysis
- Related Work
- Conclusion

Page 23

Related Work
Reducing the miss rate of direct-mapped caches:
- page allocation
- column-associative cache
- adaptive group-associative cache
- skewed-associative cache

Reducing the access time of set-associative caches:
- partial address matching (predicting the hit way)
- difference-bit cache

Page 24

Compared with previous techniques
The B-Cache:
- applies to both high-performance and low-power embedded systems
- is balanced without software intervention
- is feasible and easy to implement

Page 25

Outline
- Introduction
- The B-Cache Organization
- Experimental Results and Analysis
- Related Work
- Conclusion

Page 26

Conclusion
The B-Cache allows accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design.
Programmable decoders dynamically determine which memory addresses map to each cache set.
A 16 kB level-one B-Cache outperforms a direct-mapped cache, with 64.5% and 37.8% miss-rate reductions for the instruction and data caches, respectively.
Average IPC improvement: 5.9%. Energy reduction: 2%. Access time: the same as a direct-mapped cache.

Page 27

Thanks!