Performance Optimization for Low-Leakage Caches based on Sleep-Line Access Density
Reiko Komiya†, Koji Inoue‡, and Kazuaki Murakami‡
†Fukuoka University, Japan
‡Kyushu University, Japan
ODES-4, 26 March 2006

Outline
• Introduction
  – Leakage energy of cache memory
  – Conventional low-leakage cache: cache decay
• Problem of the cache decay approach
• Solution: Always-Active approach
• Evaluation
• Conclusions

Introduction
Energy consumption = Dynamic energy (consumed by charging and discharging) + Static energy (consumed by leakage current)
Leakage energy increases as process technology advances.
[Figure: the breakdown of energy consumption into dynamic power and static power in a processor family *1]
[Figure: power analysis of the ARM920T; cache energy is 44%]
Cache leakage reduction is very important!
*1 Fred Pollack (Intel Fellow), "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies" [Micro32]
*2 Simon Segars, "Low Power Design Techniques for Microprocessors," ISSCC 2001

Conventional Low-Leakage Cache
A conventional cache does not support any leakage reduction technique.
Conventional low-leakage cache: cache decay
• Active mode: high leakage, preserves the data
• Sleep mode: low leakage, destroys the data
• Sleep-miss: a miss caused by accessing a line in sleep mode; it degrades processor performance
The mode of each line changes according to this state transition diagram:
– initial state → active mode
– active mode → sleep mode: when no-access time ≧ decay interval
– sleep mode → active mode: on an access (miss)
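
The state transitions of cache decay can be sketched as a small model of one cache line. This is only an illustration under simplifying assumptions, not the authors' simulator: the class name `DecayLine`, its fields, and the cycle values are hypothetical, and every access is assumed to target the block that the line last held.

```python
from dataclasses import dataclass

ACTIVE, SLEEP = "active", "sleep"


@dataclass
class DecayLine:
    """One cache line under cache decay (hypothetical model)."""
    decay_interval: int      # cycles without access before the line sleeps
    mode: str = ACTIVE       # the initial state leads to active mode
    last_access: int = 0
    sleep_misses: int = 0

    def tick(self, now: int) -> None:
        """Enter sleep mode once the no-access time reaches the decay interval."""
        if self.mode == ACTIVE and now - self.last_access >= self.decay_interval:
            self.mode = SLEEP    # the data is destroyed, leakage drops

    def access(self, now: int) -> bool:
        """Access the line; return True if the access is a sleep-miss."""
        self.tick(now)
        sleep_miss = self.mode == SLEEP
        if sleep_miss:
            self.sleep_misses += 1
        self.mode = ACTIVE       # the refilled line is active again
        self.last_access = now
        return sleep_miss


line = DecayLine(decay_interval=100)
results = [line.access(t) for t in (10, 50, 149, 300)]
```

Only the last access follows an idle period of at least 100 cycles, so it is the only one that finds the line asleep.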

Performance Impact of Sleep-misses
[Figure: normalized execution time and normalized DL1 misses (axes 0.50 to 3.50; one value labeled 11.7 exceeds the axis range) for the benchmark programs 177.mesa, 179.art, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 256.bzip2, and Average]
Many sleep-misses cause large performance degradation!

Our Goal
High-performance, low-leakage cache!
• Problem of conventional low-leakage cache
  – Performance degradation caused by sleep-misses
• Our approach
  – To improve performance, reduce sleep-misses
  – Prohibit some cache lines from going to sleep mode

Analysis of Sleep-misses
• Sleep-Miss Density (SMD): shows the amount of sleep-misses in each line
$$\mathrm{SMD}_i = \frac{\text{the number of sleep-misses at the cache line } i}{\text{the average number of sleep-misses for all cache lines}}$$
• Example
The number of sleep-misses at each cache line (lines 0 to 8, row by row):
6 5 1
2 4 1
60 1 10
• The total number of sleep-misses: 90
• The number of lines: 9
⇒ The average number of sleep-misses: 10
SMD6 = 6, SMD7 = 0.1, SMD8 = 1
Cache lines which often cause sleep-misses have high SMD!
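
The SMD definition can be checked against the nine-line example with a few lines of Python. This is a minimal sketch; the function name `sleep_miss_density` is hypothetical, and returning zeros when no sleep-miss has occurred is an assumption, since the slide does not cover that case.

```python
def sleep_miss_density(sleep_misses):
    """Return SMD_i = misses_i / (average misses over all lines) for every line."""
    average = sum(sleep_misses) / len(sleep_misses)
    if average == 0:
        # Assumption: with no sleep-misses at all, treat every SMD as zero.
        return [0.0] * len(sleep_misses)
    return [m / average for m in sleep_misses]


# Sleep-miss counts from the example, lines 0 to 8 in row order.
counts = [6, 5, 1, 2, 4, 1, 60, 1, 10]
smd = sleep_miss_density(counts)
```

With these counts the total is 90 and the average is 10, so `smd[6]`, `smd[7]`, and `smd[8]` reproduce the values on the slide.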

Characteristics of Sleep-misses
[Figure: two stacked-bar panels, "The breakdown of cache lines in terms of SMD" and "The breakdown of sleep-misses in terms of SMD", for f179.art, f183.equake, i164.gzip, and Average on a 0% to 100% axis, with the categories SMD < 1, 1 ≦ SMD < 2, 2 ≦ SMD < 4, and 4 ≦ SMD]
A small number of high-SMD lines often produce sleep-misses:
3.1% of lines cause 94.4% of sleep-misses

Always-Active Approach
• Support “Always-Active mode (AA mode)”
• AA mode prohibits the corresponding line from going to sleep mode
• Cache lines which frequently cause sleep-misses should operate in AA mode
• Such lines are called “Always-Active lines (AA lines)”

How to Decide AA Lines
A line which frequently causes sleep-misses ⇒ AA line
The number of sleep-misses at each cache line:
6 5 1
2 4 1
60 1 10
SMD at each cache line:
0.6 0.5 0.1
0.2 0.4 0.1
6 0.1 1
State transitions:
– initial state → active mode
– active mode → sleep mode: when no-access time ≧ decay interval
– sleep mode → active mode: on an access, if SMD ≦ Threshold
– sleep mode → always-active mode: on an access, if SMD > Threshold
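
The rule "SMD > Threshold ⇒ AA line" can be sketched for the example counts as follows. The function name `always_active_lines` is hypothetical, and the threshold of 0.5 in the second call is not used in the slides; it only illustrates how a smaller threshold selects more lines.

```python
def always_active_lines(sleep_misses, threshold):
    """Return the indices of the lines whose SMD exceeds the threshold."""
    average = sum(sleep_misses) / len(sleep_misses)
    # SMD_i > threshold is the same test as misses_i > threshold * average.
    return [i for i, m in enumerate(sleep_misses) if m > threshold * average]


counts = [6, 5, 1, 2, 4, 1, 60, 1, 10]
aa_threshold_1 = always_active_lines(counts, threshold=1)       # AA1 setting
aa_threshold_half = always_active_lines(counts, threshold=0.5)  # illustrative only
```

With a threshold of 1 only line 6 qualifies, because line 8 has an SMD of exactly 1 and the comparison is strict.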

How to Measure SMD Dynamically
$$\mathrm{SMD}_i = \frac{\text{①}}{\text{②}} > \text{③} \;\Longleftrightarrow\; \text{①} > \text{②} \times \text{③}$$
① the number of sleep-misses at the cache line i
② the average number of sleep-misses for all cache lines
③ Threshold
Example) The number of cache lines = 1024 (= 2^10), Threshold = 2 (= 2^1)
– ② = the total number of sleep-misses, shifted right by 10 bits
– ② × ③ = ②, shifted left by 1 bit
– ① > ② × ③ ? yes ⇒ AA mode; no ⇒ active mode
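
Because both the number of lines and the threshold are powers of two in the example, the comparison needs no divider or multiplier. The following is a minimal sketch of that integer arithmetic; the function name and parameter names are hypothetical, and the defaults mirror the example (1024 lines, threshold 2).

```python
def should_be_always_active(line_misses, total_misses,
                            log2_lines=10, log2_threshold=1):
    """Evaluate (1) > (2) x (3) using shifts only."""
    average = total_misses >> log2_lines   # (2): total / number of lines
    limit = average << log2_threshold      # (2) x (3): average * threshold
    return line_misses > limit             # (1) > (2) x (3) ?


# Hypothetical counter values: 4096 total sleep-misses over 1024 lines.
decision_high = should_be_always_active(line_misses=9, total_misses=4096)
decision_equal = should_be_always_active(line_misses=8, total_misses=4096)
```

One consequence of this sketch is that the right shift truncates: while the total is below 1024 the average is zero, so any line with at least one sleep-miss passes the test.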

Hardware Implementation
[Figure: hardware organization. Each of the cache lines 0 to 1023 has a sleep-miss counter, an always-active flag, a decay flag, a 2-bit local counter, a tag, and data, with voltage control that supplies gated Vdd or 0 V. Shared logic consists of a total sleep-miss counter, a shifter, comparators (>?, =?), and a global counter, with ¼ decay interval as an input]
If a line is in sleep mode:
– Cache decay ⇒ the tag is in sleep mode
– AA approach ⇒ the tag is in active mode
The line is in sleep mode && tag match ⇒ a sleep-miss occurs!
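
Keeping the tag powered is what lets the hardware tell a sleep-miss from an ordinary miss. The check can be sketched as below; the names `CacheLine` and `classify_access` are hypothetical, and waking the line and refilling its data after a sleep-miss are left out.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    """Hypothetical model of one line whose tag stays powered (AA approach)."""
    tag: int
    asleep: bool = False          # True when the data array is powered down
    sleep_miss_counter: int = 0   # per-line sleep-miss counter


def classify_access(line: CacheLine, tag: int) -> str:
    """Return 'hit', 'sleep-miss', or 'miss' for an access to this line."""
    if line.tag != tag:
        return "miss"             # ordinary miss: a different block is cached
    if line.asleep:
        line.sleep_miss_counter += 1
        return "sleep-miss"       # the tag matches but the data was destroyed
    return "hit"


line = CacheLine(tag=0x1A, asleep=True)
outcomes = [classify_access(line, 0x1A), classify_access(line, 0x2B)]
```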

Experimental Setup
• Evaluation model
  – Cache decay: conventional low-leakage cache
  – AA1: Cache decay with AA approach (threshold value = 1)
• Cache configuration
  – L1 data cache
    • Cache size: 32 KB
    • Associativity: 2-way
    • Hit latency: 1 clock cycle
    • Miss penalty: 32 clock cycles
• Evaluation items
  – Performance improvement
  – Energy reduction

Results
[Figure: normalized execution time (axis 0.9 to 1.3) and normalized energy (axis 0.0 to 1.2) of Cache decay and AA1 for f183.equake, i164.gzip, i175.vpr, i197.parser, and Average]
Callouts on the figure:
– Higher performance and lower energy consumption
– Performance is improved at the cost of increased energy consumption

Conclusions
• We have proposed a high-performance, low-leakage cache: AA approach
  – Detect lines which frequently cause sleep-misses at run time
  – The performance is improved by operating such lines in AA mode
• Evaluation results
  – Higher performance and lower energy consumption
  – The best case (f183.equake):
    • Performance degradation: 19% → 4.2%
    • Energy consumption: 20% reduction
• Future work
  – Compare AA approach with an adaptive decay technique (Kaxiras ISCA’00)

Impact of Threshold
[Figure: normalized execution time (axis 0.9 to 1.3) and normalized energy (axis 0.0 to 1.2) of Cache decay, AA4, AA2, and AA1 for f183.equake, i164.gzip, and Average]
A smaller threshold ⇒ higher performance, because the number of AA lines increases!

Breakdown of Energy Consumption
[Figure: breakdown of energy (J, axis 0.0 to 0.9) into LE_L1, DE_L1, and DE_memory for Cache decay and AA1 on f183.equake and i164.gzip]
With AA1:
・Leakage energy increases
・The accompanying dynamic energy decreases, because the number of sleep-misses decreases
Energy reduction is a tradeoff between DE_memory and LE_L1

Performance Impact of Decay Interval
[Figure: normalized execution time (axis 0.8 to 2.0) of Decay-1K, Decay-8K, Decay-64K, Decay-512K, AA1-4K, and AA2-8K for the benchmark programs 177.mesa, 179.art, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 256.bzip2, and Average]
Cache decay: performance improves as the decay interval is extended
AA approach: performance improves fully even with a short decay interval

Energy Impact of Decay Interval
[Figure: breakdown of energy (J, axis 0.0 to 4.0) into LE_L1, DE_L1, and DE_memory of Decay-1K, Decay-8K, Decay-64K, Decay-512K, AA1-4K, and AA2-8K for the benchmark programs 177.mesa, 179.art, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 256.bzip2, and Average]
Cache decay: leakage energy increases as the decay interval is extended
AA approach: leakage reduction is larger than that of cache decay using a long decay interval
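
Energy Model (1/3)
$$E_{total} = LE_{L1} + DE_{L1} + DE_{memory}$$
$$LE_{L1} = \sum_{i=1}^{CC} \left\{ LE_{bit} \times N_{active}(i) \right\}$$
CC: program execution time (in clock cycles)
LE_bit: average leakage energy consumed by one SRAM bit cell in one clock cycle
N_active(i): number of SRAM bits in the active state at clock cycle i
LE_L1: leakage energy of the L1 cache
DE_L1: dynamic energy of the L1 cache
DE_memory: energy consumed by main memory accesses
Comparison (columns in slide order: conventional low-leakage cache, always-active block approach, conventional cache):
CC: long → short
N_active(i): few → many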
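
The leakage term of the model is a per-cycle sum, which can be sketched directly. This is an illustration only: the function names and the four-cycle trace of active bits are hypothetical, while the value of LE_bit (0.13 pJ) is the one listed in the parameter table of Energy Model (3/3).

```python
LE_BIT_PJ = 0.13  # pJ per active SRAM bit per clock cycle (parameter table value)


def l1_leakage_energy_pj(active_bits_per_cycle, le_bit_pj=LE_BIT_PJ):
    """LE_L1: sum over all clock cycles of LE_bit x N_active(i)."""
    return sum(le_bit_pj * n for n in active_bits_per_cycle)


def total_energy_pj(le_l1, de_l1, de_memory):
    """E_total = LE_L1 + DE_L1 + DE_memory (all terms in pJ)."""
    return le_l1 + de_l1 + de_memory


# Hypothetical trace: N_active(i) for a program that runs for CC = 4 cycles.
trace = [1000, 1000, 500, 500]
le_l1 = l1_leakage_energy_pj(trace)
e_total = total_energy_pj(le_l1, de_l1=0.0, de_memory=0.0)
```

The sketch makes the tradeoff visible: a longer run (larger CC) adds terms to the sum, while putting more lines to sleep shrinks each term.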

Energy Model (2/3)
$$DE_{L1} = DE_{AA} + DE_{low\text{-}leak} + DE_{conv}$$
DE_AA: dynamic energy overhead caused by applying the always-active block approach
DE_low-leak: dynamic energy overhead caused by applying the conventional low-leakage cache
DE_conv: access energy of the conventional cache

              Conventional low-leakage   Always-active approach
DE_AA         -                          overhead
DE_low-leak   overhead                   overhead

[Figure: hardware organization. Each of the cache lines 0 to 1023 has a local sleep-miss counter, an always-active flag, a sleep-state flag, a local counter, a tag, and data, with supply-voltage control (state-destroying, Vdd / 0). Shared logic consists of a total sleep-miss counter, a set value, a shifter, comparators (>?, =?), and a global counter]

Energy Model (3/3)
Parameter   Average energy per access   Basis of the estimate
LE_bit      0.13 pJ                     Based on [1]
DE_org      1.90 nJ                     Measured using CACTI 3.0
DE_conv     0.1 pJ + 0.5 pJ             Based on [2]
DE_AA       4.20 pJ                     Estimated from the table size and DE_org
DE_memory   38.0 nJ                     Estimated as DE_org × 20
[1] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge, “Drowsy Caches: Simple Techniques for Reducing Leakage Power,” Proc. of the 29th Int. Symp. on Computer Architecture, pp. 148-157, May 2002.
[2] S. Kaxiras, Z. Hu, and M. Martonosi, “Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power,” Proc. of the 28th Int. Symp. on Computer Architecture, pp. 240-251, June 2001.