2013/10/21 Yun-Chung Yang An Energy-Efficient Adaptive Hybrid Cache Jason Cong, Karthik Gururaj, Hui...

Paper Presentation

2013/10/21 Yun-Chung Yang

An Energy-Efficient Adaptive Hybrid Cache

Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou
Computer Science Department, University of California, Los Angeles
International Symposium on Low Power Electronics and Design (ISLPED), 2011, pages 67–72


Outline

Abstract
Related Work
What’s the Problem
  Run-time behavior
  Set balancing
Proposed Method – Adaptive Hybrid Cache
Experiment Result
Conclusion


Abstract

By reconfiguring part of the cache as software-managed scratchpad memory (SPM), hybrid caches manage to handle both unknown and predictable memory access patterns. However, existing hybrid caches provide a flexible partitioning of cache and SPM without considering adaptation to the run-time cache behavior. Previous cache set balancing techniques are either energy-inefficient or require serial tag and data array access. In this paper an adaptive hybrid cache is proposed to dynamically remap SPM blocks from high-demand cache sets to low-demand cache sets. This achieves 19%, 25%, 18% and 18% energy-runtime-product reductions over four previous representative techniques on a wide range of benchmarks.


Related Work

[Diagram: classification of related work. Labels: Software Controlled; SPM; Partition cache from way to blocks; Energy (set utilization); Serial tag/data access; Not need tag/data access; Use CAM memory; Column caching, FlexCache, Reconfigurable cache, Virtual Local Store; Victim cache [11]; Balanced Cache [12]; This paper. Citations shown: [2], [3], [4], [5], [8]-[10].]


What’s the problem?

Previous hybrid cache designs partition the cache and SPM without adapting to run-time cache behavior.
Because SPM allocation is uniform while cache behavior is non-uniform, some cache sets become hot (the hot cache set problem).

The Proposed Method

Adaptive Hybrid Cache
(a) Original Code
(b) Transformed Code for AH-cache
Compiler’s job
(c) Memory space for AH-cache
(d) SPM blocks
Adaptive Mapping to cache
(e) SPM mapping in cache
(f) SPM mapping look-up table (SMLT)


Hardware Configuration

Hardware for AH-cache: the green part is for accessing the SPM.
Cache addressing and the SMLT look-up are performed in parallel with the virtual address calculation in the pipeline.
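The SMLT look-up can be pictured with a small software model. This is only a sketch: the class and method names (`Smlt`, `map_block`, `lookup`) are hypothetical, and the entry fields simply follow the valid/index/way split listed on the storage slide. It assumes an SPM byte address is split into a block number and a byte offset using the 64B block size.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_SIZE = 64  # bytes per block, as listed on the storage slide


@dataclass
class SmltEntry:
    valid: bool = False  # True once the SPM block is mapped into the cache
    set_index: int = 0   # cache set that currently holds the SPM block
    way: int = 0         # way inside that set


class Smlt:
    """Hypothetical software model of the SPM mapping look-up table."""

    def __init__(self, num_spm_blocks: int) -> None:
        self.entries: List[SmltEntry] = [SmltEntry() for _ in range(num_spm_blocks)]

    def map_block(self, spm_block: int, set_index: int, way: int) -> None:
        # Record (or update, after a remap) where an SPM block lives in the cache.
        self.entries[spm_block] = SmltEntry(True, set_index, way)

    def lookup(self, spm_addr: int) -> Optional[Tuple[int, int, int]]:
        """Translate an SPM byte address to (set, way, byte offset)."""
        block, offset = divmod(spm_addr, BLOCK_SIZE)
        entry = self.entries[block]
        if not entry.valid:
            return None
        return entry.set_index, entry.way, offset
```

Because the table yields the set and way directly, an SPM access in this model needs no tag comparison; remapping a block only means rewriting its entry with `map_block`.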


Adaptive Mapping

Dynamically remap (migrate) SPM blocks from high-demand cache sets to low-demand cache sets.
The initial mapping of SPM blocks to cache sets is random.


Adaptive Mapping – Victim Tag Buffer

Goal: When the application requires P SPM blocks while the AH-cache can provide at most Q SPM blocks, there are S = P - Q blocks that can adaptively satisfy the high-demand cache sets.

Solution:
Use a victim tag buffer (VTB) to capture the demand of each set.
Floating block holder (FBH): records the cache sets that hold the floating blocks.
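One way a victim tag buffer could capture per-set demand is sketched below. The slides do not spell out the mechanism, so everything here is an assumption: one remembered victim tag per set, a 4-bit saturating counter per set (the width comes from the storage slide), and a caller-chosen threshold. The class and method names are hypothetical.

```python
class VictimTagBuffer:
    """Hypothetical per-set demand tracker: one victim tag plus a saturating counter."""

    COUNTER_MAX = 15  # 4-bit counter, width taken from the storage slide

    def __init__(self, num_sets: int) -> None:
        self.victim_tag = [None] * num_sets  # tag of the last block evicted per set
        self.counter = [0] * num_sets        # demand estimate per set

    def on_evict(self, set_index: int, tag: int) -> None:
        # Remember which block was just pushed out of this set.
        self.victim_tag[set_index] = tag

    def on_miss(self, set_index: int, tag: int) -> bool:
        """Return True when the miss matches the remembered victim tag.

        Such a miss would have been a hit if the set had one more block,
        so it is counted as evidence of high demand (assumed policy).
        """
        if self.victim_tag[set_index] == tag:
            self.counter[set_index] = min(self.counter[set_index] + 1, self.COUNTER_MAX)
            return True
        return False

    def is_high_demand(self, set_index: int, threshold: int) -> bool:
        return self.counter[set_index] >= threshold
```

Only tags are stored, not data, which is what keeps this kind of tracker cheap compared with a real victim cache.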


Floating Block Holder

A re-insertion bit of 1 means the set is in high demand, so it is re-inserted into the FBH queue.
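The re-insertion rule resembles a second-chance queue, and the sketch below models it that way. This is an interpretation rather than the paper's exact policy: it assumes a set popped from the head with its bit set has the bit cleared and goes back to the tail, while the first set found with bit 0 gives up its floating block. The function name `pick_donor_set` is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional


def pick_donor_set(fbh_queue: Deque[int], reinsertion_bit: Dict[int, int]) -> Optional[int]:
    """Pop sets from the head until one with re-insertion bit 0 is found.

    Sets whose bit is 1 are treated as high-demand: the bit is cleared and
    the set is re-inserted at the tail. Returns None for an empty queue.
    """
    while fbh_queue:
        set_index = fbh_queue.popleft()
        if reinsertion_bit.get(set_index, 0):
            reinsertion_bit[set_index] = 0  # use up the second chance
            fbh_queue.append(set_index)     # re-insert at the tail
        else:
            return set_index                # this set donates its floating block
    return None
```

For example, with the queue `[3, 7, 9]` and bits `{3: 1, 7: 0, 9: 1}`, the call returns set 7 and leaves the queue as `[9, 3]` with the bit of set 3 cleared. The loop always terminates because every bit it meets is cleared before the set is re-queued.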


Improvement of FBH queue

Problem: Searching the FBH queue has a worst-case delay of S cycles, where S is the maximum size of the SPM.

Solution: Store the re-insertion bits in a separate table, the re-insertion bit table (RIBT), and search 16 re-insertion bits in parallel.
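The benefit of grouping re-insertion bits into 16-bit rows can be illustrated in software. The sketch below assumes 8 rows of 16 bits (the RIBT size on the storage slide), assumes that bit position corresponds to queue slot, and uses a lowest-set-bit trick as a stand-in for the parallel hardware search. The function names are hypothetical.

```python
from typing import List, Optional

WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def first_clear_bit(word: int) -> Optional[int]:
    """Index of the lowest 0 bit in a 16-bit word, or None if all bits are 1."""
    inverted = ~word & WORD_MASK          # 0 bits become 1 bits
    if inverted == 0:
        return None
    lowest = inverted & -inverted         # isolate the lowest set bit
    return lowest.bit_length() - 1


def find_first_clear(ribt: List[int]) -> Optional[int]:
    """Scan the table one row (16 bits) at a time instead of one bit at a time."""
    for row, word in enumerate(ribt):
        bit = first_clear_bit(word)
        if bit is not None:
            return row * WORD_BITS + bit
    return None
```

With 128 bits stored as 8 rows, the search in this model inspects at most 8 rows rather than up to 128 individual entries.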


What would you like to know

Storage overhead
Critical path of the SMLT in the pipeline stage
Comparison with other designs (performance, miss rate, energy):
  Non-adaptive hybrid cache (N)
  Non-adaptive hybrid cache + balanced cache (B)
  Non-adaptive hybrid cache + victim cache (Vp, Vs)
  Phase-reconfigurable hybrid cache (R)
  Adaptive hybrid cache (AH)
  Static optimized hybrid cache (S)


Storage

Cache: 16KB, 2-way set-associative, 128 sets, 64B data block, 4B tag entry size
SPM: 128 64B SPM blocks
SMLT: 128 9-bit entries (1 valid bit + 6-bit index + 2-bit way)
Insertion flag + 4-bit counter
FBH queue: 128 7-bit entries
RIBT: 8 16-bit entries
Total: 0.4KB, 3% of the hybrid cache size
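The storage figures can be partly tallied from the numbers on this slide. The sketch below is only a partial check: it covers the three tables whose entry counts and widths are both given, and it leaves out the insertion flag, the 4-bit counter and any victim tag storage because the slide does not say how many of those there are. The variable names are hypothetical.

```python
# Table sizes as listed on the slide: (number of entries, bits per entry).
TABLES = {
    "SMLT": (128, 9),
    "FBH queue": (128, 7),
    "RIBT": (8, 16),
}


def table_bytes(entries: int, bits_per_entry: int) -> float:
    """Size of one table in bytes."""
    return entries * bits_per_entry / 8


sized_bytes = {name: table_bytes(n, w) for name, (n, w) in TABLES.items()}
sized_total = sum(sized_bytes.values())  # bytes covered by the three sized tables

CACHE_BYTES = 16 * 1024                  # 16KB hybrid cache
REPORTED_OVERHEAD_BYTES = 0.4 * 1024     # 0.4KB total reported on the slide
overhead_fraction = REPORTED_OVERHEAD_BYTES / CACHE_BYTES

print(sized_bytes, sized_total, f"{overhead_fraction:.1%}")
```

The three sized tables come to 144 + 112 + 16 = 272 bytes, and 0.4KB out of 16KB is 2.5%, which the slide reports as 3%.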


Critical Path of SMLT

32nm technology (cache block size is 64B).
The 0.2 ns critical path fits within the 0.25 ns cycle of a 4 GHz core.


Experiment Result – Miss Rate

R reduces cache misses by 34%.
AH-cache reduces cache misses by 52%.
AH-cache outperforms B because B allocates SPM uniformly without considering cache set demand.
The effectiveness of the victim cache depends on its size.


Experiment Result – Performance

AH-cache outperforms B, Vp, Vs and R by 3%, 4%, 8% and 12%, respectively.


Experiment Result – Energy

Despite the additional hardware (SMLT, VTB and adaptive mapping unit), AH-cache still reduces energy by 16%, 22%, 10% and 7% compared to designs B, Vp, Vs and R, respectively.


Conclusion & Comment

AH-cache dynamically remaps SPM blocks to cache blocks based on run-time behavior.
AH-cache achieves energy-runtime-product reductions of 19%, 25%, 18% and 18% over designs B, Vp, Vs and R.

My comment
Explained in detail
Mention the usage of the tag while in SPM mode
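The combined figure can be related to the separate energy and performance results with a short calculation. This is a sketch under one assumption: "outperforms by x%" is read as an x% reduction in runtime, which the slides do not state. Under that reading, the energy-runtime product shrinks by 1 - (1 - energy reduction) * (1 - runtime reduction). The function and dictionary names are hypothetical.

```python
def erp_reduction(energy_reduction: float, runtime_reduction: float) -> float:
    """Reduction in energy x runtime, given the reduction in each factor."""
    return 1.0 - (1.0 - energy_reduction) * (1.0 - runtime_reduction)


# (energy reduction, assumed runtime reduction) of AH-cache versus each design,
# taken from the energy and performance slides.
RESULTS = {
    "B": (0.16, 0.03),
    "Vp": (0.22, 0.04),
    "Vs": (0.10, 0.08),
    "R": (0.07, 0.12),
}

for design, (energy, runtime) in RESULTS.items():
    print(f"{design}: {erp_reduction(energy, runtime):.1%}")
```

Under this reading the computed reductions are about 18.5%, 25.1%, 17.2% and 18.2%, close to the reported 19%, 25%, 18% and 18%. The small gaps are what one would expect if the per-slide percentages were rounded before being combined.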
