2013/10/21 Yun-Chung Yang An Energy-Efficient Adaptive Hybrid Cache Jason Cong, Karthik Gururaj, Hui...
Paper Presentation
2013/10/21 Yun-Chung Yang
An Energy-Efficient Adaptive Hybrid Cache
Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou
Computer Science Department, University of California, Los Angeles
2011 International Symposium on Low Power Electronics and Design (ISLPED), pages 67–72
Outline
Abstract
Related Work
What's the Problem
    Run-time behavior
    Set balancing
Proposed Method – Adaptive Hybrid Cache
Experiment Result
Conclusion
Abstract
By reconfiguring part of the cache as software-managed scratchpad memory (SPM), hybrid caches can handle both unknown and predictable memory access patterns. However, existing hybrid caches provide a flexible partitioning of cache and SPM without adapting to the run-time cache behavior. Previous cache set balancing techniques are either energy-inefficient or require serial tag and data array access. This paper proposes an adaptive hybrid cache that dynamically remaps SPM blocks from high-demand cache sets to low-demand cache sets. It achieves 19%, 25%, 18% and 18% energy-runtime-product reductions over four previous representative techniques on a wide range of benchmarks.
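Related Work
[Diagram: classification of prior work. Only the labels survive in this transcript.]
Branches: Software Controlled SPM; Energy (set utilization)
Hybrid cache designs: Column caching, FlexCache, Reconfigurable cache, Virtual Local Store ([2], [3], [4], [5])
Set balancing designs: [8]-[10], Victim cache [11], Balanced Cache [12]
Annotations: Partition cache from way to blocks; Serial tag/data access; Use CAM memory; Not need tag/data access; This paper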
What's the problem?
Previous hybrid cache designs partition the cache and SPM without adapting to the run-time cache behavior. SPM allocation is uniform while cache behavior is non-uniform, which causes the hot cache set problem.
The Proposed Method
Adaptive Hybrid Cache
[Figure panels]
(a) Original code
(b) Transformed code for AH-cache (the compiler's job)
(c) Memory space for AH-cache
(d) SPM blocks
(e) SPM mapping in cache
(f) SPM mapping look-up table (SMLT)
[Figure label: Adaptive mapping to cache]
Hardware Configuration
Hardware for AH-cache: the green part of the slide's figure is used for accessing the SPM. Cache addressing and the SMLT look-up are performed in parallel with the virtual address calculation in the pipeline.
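The SMLT look-up described above can be pictured as a small table indexed by SPM block number. The following is a minimal software sketch, not the hardware from the paper: the names SmltEntry and spm_lookup are hypothetical, and the entry fields (valid, set index, way) are borrowed from the storage slide of this presentation. The 64B block size is the one stated in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_SIZE = 64  # bytes per SPM block, as stated in the slides


@dataclass
class SmltEntry:
    """One SMLT entry (hypothetical layout: valid bit, set index, way)."""
    valid: bool = False
    set_index: int = 0
    way: int = 0


def spm_lookup(smlt: List[SmltEntry], spm_addr: int) -> Optional[Tuple[int, int, int]]:
    """Translate an SPM address into (set_index, way, byte_offset).

    Returns None when the address is out of range or the block is unmapped.
    """
    if spm_addr < 0:
        return None
    block, offset = divmod(spm_addr, BLOCK_SIZE)
    if block >= len(smlt) or not smlt[block].valid:
        return None
    entry = smlt[block]
    return entry.set_index, entry.way, offset


# Usage: map SPM block 3 to set 17, way 1, then translate an address inside it.
smlt = [SmltEntry() for _ in range(128)]
smlt[3] = SmltEntry(valid=True, set_index=17, way=1)
location = spm_lookup(smlt, 3 * BLOCK_SIZE + 5)
```

In hardware this look-up would run alongside the address calculation rather than as a sequential function call; the sketch only shows the data that the table holds and returns.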
Adaptive Mapping
Dynamically remap (migrate) SPM blocks from high-demand cache sets to low-demand cache sets.
The initial mapping of SPM blocks to cache sets is random.
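To make the idea of remapping concrete, here is a minimal sketch under stated assumptions. The functions initial_mapping and remap_once are hypothetical names, the per-set demand values are supplied by the caller, and the limit max_per_set on SPM blocks per set is an assumption of this sketch rather than something given in the slides.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


def initial_mapping(num_blocks: int, num_sets: int, max_per_set: int,
                    seed: int = 0) -> Dict[int, int]:
    """Randomly place SPM blocks into cache sets (block number -> set index)."""
    rng = random.Random(seed)
    slots = [s for s in range(num_sets) for _ in range(max_per_set)]
    return dict(enumerate(rng.sample(slots, num_blocks)))


def remap_once(mapping: Dict[int, int], demand: List[int],
               max_per_set: int) -> Optional[Tuple[int, int, int]]:
    """Move one SPM block from the hottest set holding one to the coldest set with room.

    Returns (block, from_set, to_set), or None if no move would help.
    """
    load = Counter(mapping.values())
    donors = list(load)
    if not donors:
        return None
    hot = max(donors, key=lambda s: demand[s])
    receivers = [s for s in range(len(demand))
                 if s != hot and load[s] < max_per_set]
    if not receivers:
        return None
    cold = min(receivers, key=lambda s: demand[s])
    if demand[cold] >= demand[hot]:
        return None  # the move would not relieve a hotter set
    block = next(b for b, s in sorted(mapping.items()) if s == hot)
    mapping[block] = cold
    return block, hot, cold


# Usage: block 0 sits in the hottest set (set 0) and is moved to the idle set 2.
mapping = {0: 0, 1: 1}
demand = [9, 1, 0, 5]
first_move = remap_once(mapping, demand, max_per_set=1)
second_move = remap_once(mapping, demand, max_per_set=1)
```

The sketch recomputes everything on each call for clarity; the presentation describes dedicated hardware structures for tracking demand and candidate sets.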
Adaptive Mapping – Victim Tag Buffer
Goal: if the application requires P SPM blocks while the AH-cache can provide at most Q SPM blocks, there are S = P - Q blocks available to adaptively satisfy the high-demand cache sets.
Solution: use a victim tag buffer (VTB) to capture the demand of each set.
Floating block holder (FBH): records the cache sets that hold the floating blocks.
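One way a victim tag buffer can capture per-set demand is sketched below. This is an illustration under assumptions, not the paper's exact policy: the class VictimTagBuffer, the single victim tag per set, and the rule "count a miss whose tag matches the last evicted tag" are all hypothetical. The 4-bit counter width is borrowed from the storage slide, where it is listed without naming the structure it belongs to.

```python
class VictimTagBuffer:
    """Per-set victim tag plus a saturating demand counter (hypothetical policy)."""

    COUNTER_MAX = 15  # 4-bit saturating counter, width taken from the storage slide

    def __init__(self, num_sets: int) -> None:
        self.victim_tag = [None] * num_sets
        self.counter = [0] * num_sets

    def on_evict(self, set_index: int, tag: int) -> None:
        """Remember the tag of the block just evicted from this set."""
        self.victim_tag[set_index] = tag

    def on_miss(self, set_index: int, tag: int) -> bool:
        """Count a miss that one more block in this set would have avoided."""
        if self.victim_tag[set_index] == tag:
            self.counter[set_index] = min(self.counter[set_index] + 1,
                                          self.COUNTER_MAX)
            return True
        return False

    def is_high_demand(self, set_index: int, threshold: int) -> bool:
        """Assumed criterion: counter at or above a caller-chosen threshold."""
        return self.counter[set_index] >= threshold


# Usage: set 2 keeps missing on the block it just evicted, so its counter saturates.
vtb = VictimTagBuffer(num_sets=4)
vtb.on_evict(2, 0xAB)
for _ in range(20):
    vtb.on_miss(2, 0xAB)
```

Only tags are stored, not data, which is what keeps such a buffer small compared with a full victim cache.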
Floating Block Holder
A re-insertion bit of 1 means the set is in high demand, so it is re-inserted into the FBH queue.
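The re-insertion rule reads like a second-chance queue. The sketch below assumes that interpretation: pick_donor_set is a hypothetical name for "find a set that can give up its floating block", and clearing the bit when a set is re-inserted is an assumption of this sketch, since the slide does not say when the bit is reset.

```python
from collections import deque
from typing import Deque, List, Optional


def pick_donor_set(fbh_queue: Deque[int], reinsertion_bit: List[int]) -> Optional[int]:
    """Pop sets from the head until one with re-insertion bit 0 is found.

    Sets whose bit is 1 (high demand) are re-inserted at the tail.
    Assumption: the bit is cleared on re-insertion, so the loop terminates.
    """
    while fbh_queue:
        set_index = fbh_queue.popleft()
        if reinsertion_bit[set_index]:
            reinsertion_bit[set_index] = 0  # second chance used up (assumed)
            fbh_queue.append(set_index)
        else:
            return set_index
    return None


# Usage: set 3 is in high demand and is skipped; set 1 is chosen.
queue = deque([3, 1, 2])
bits = [0, 0, 1, 1]  # indexed by set: sets 2 and 3 are in high demand
donor = pick_donor_set(queue, bits)
```

Each skipped set costs one step, which is why a long run of high-demand sets at the head of the queue makes the search slow.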
Improvement of FBH queue
Problem: searching the FBH queue has a worst-case delay of S cycles, where S is the maximum size of the SPM.
Solution: store the re-insertion bits in a separate table, the re-insertion bit table (RIBT), and search 16 re-insertion bits in parallel.
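A software analogue of searching 16 re-insertion bits at once is to treat each RIBT row as a 16-bit word and locate its first zero bit with bit operations. In the sketch below, first_clear_bit and find_candidate are hypothetical names, and interpreting a bit's position as an index into the FBH queue is an assumption. The 8 rows of 16 bits match the RIBT size on the storage slide.

```python
from typing import List, Optional

WORD_BITS = 16  # re-insertion bits examined together, per the slide
WORD_MASK = (1 << WORD_BITS) - 1


def first_clear_bit(word: int) -> Optional[int]:
    """Return the position of the lowest 0 bit in a 16-bit word, or None."""
    free = ~word & WORD_MASK
    if free == 0:
        return None
    return (free & -free).bit_length() - 1  # isolate the lowest set bit of 'free'


def find_candidate(ribt: List[int]) -> Optional[int]:
    """Scan RIBT rows in order; each row is checked in a single step."""
    for row, word in enumerate(ribt):
        pos = first_clear_bit(word)
        if pos is not None:
            return row * WORD_BITS + pos
    return None


# Usage: every bit is 1 except bit 5 of row 2, so the search needs 3 row checks.
ribt = [WORD_MASK] * 8
ribt[2] &= ~(1 << 5)
candidate = find_candidate(ribt)
```

With 128 bits stored as 8 rows, the worst case drops from 128 single-bit checks to 8 row checks in this model.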
What would you like to know
Storage overhead
Critical path of the SMLT in the pipeline stage
Comparison with other designs (performance, miss rate, energy):
    Non-adaptive hybrid cache (N)
    Non-adaptive hybrid cache + balanced cache (B)
    Non-adaptive hybrid cache + victim cache (Vp, Vs)
    Phase-reconfigurable hybrid cache (R)
    Adaptive hybrid cache (AH)
    Static optimized hybrid cache (S)
Storage
Cache: 16KB, 2-way set-associative, 128 sets, 64B data block, 4B tag entry size
SPM: 128 64B SPM blocks
SMLT: 128 9-bit entries (1 valid bit + 6-bit index + 2-bit way)
Insertion flag + 4-bit counter
FBH queue: 128 7-bit entries
RIBT: 8 16-bit entries
Total: 0.4KB, 3% of the hybrid cache size
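The sizes of the tables whose entry count and entry width are both listed can be checked with a few lines of arithmetic. The helper table_bytes is a hypothetical name. The sketch covers only the SMLT, FBH queue and RIBT; the slide gives no entry count for the "insertion flag + 4-bit counter" item, so it is left out, and the result is not expected to reproduce the 0.4KB total.

```python
def table_bytes(entries: int, bits_per_entry: int) -> float:
    """Storage of a table in bytes."""
    return entries * bits_per_entry / 8


# Entry counts and widths as listed on the slide.
sizes = {
    "SMLT": table_bytes(128, 9),
    "FBH queue": table_bytes(128, 7),
    "RIBT": table_bytes(8, 16),
}
listed_total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size:.0f} B")
print(f"Listed tables together: {listed_total:.0f} B")
```

The three listed tables account for 272 bytes; the rest of the reported overhead presumably comes from the structures whose sizes are not fully specified on the slide.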
Critical Path of SMLT
32nm technology (cache block size is 64B)
The 0.2 ns critical path fits within the 0.25 ns cycle time of a 4 GHz core.
Experiment Result – Miss Rate
R reduces cache misses by 34%.
AH-cache reduces cache misses by 52%.
AH-cache outperforms B because B allocates SPM uniformly, without considering cache set demand.
The benefit of the victim cache depends on its size.
Experiment Result – Performance
AH-cache outperforms B, Vp, Vs and R by 3%, 4%, 8% and 12%, respectively.
Experiment Result – Energy
Despite the additional hardware (SMLT, VTB and adaptive mapping unit), AH-cache still reduces energy by 16%, 22%, 10% and 7% compared to designs B, Vp, Vs and R, respectively.