Energy Reduction for STT-RAM Using Early Write Termination
Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang
Electrical and Computer Engineering Department
*Department of Computer Science
University of Pittsburgh
ICCAD 2009
Introduction
• Traditional SRAM Cache
– Limited by density, leakage and scalability
• STT-RAM Cache?
– High density (~4x that of SRAM)
– High speed (same read speed as SRAM)
– Non-volatile
– No write endurance problem
STT-RAM: Cell
• Magnetic Tunnel Junction (MTJ)
• Relative magnetization direction
– Different resistances → Logic 0 or 1
• Write: spin-polarized current
– Much less write current than conventional MRAM
[Figure: MTJ cell with reference layer, MgO barrier and free layer, shown in the high-resistance (logic 1) and low-resistance (logic 0) states]
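STT-RAM: Cell Array
• Similar array structure as SRAM
• Bidirectional write current
[Figure: cell array of MTJ cells connected to word lines (WL), bit lines (BL) and source lines (SL), showing the write 0 and write 1 current directions]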
STT-RAM Cache: Challenge
• High dynamic energy
– 6~14x more energy per write access [Dong et al. DAC 2008, Sun et al. HPCA 2009]
– Write contributes >74% of total dynamic energy
[Figure: breakdown of dynamic energy; writes account for 74.2%]
Need to reduce write energy in STT-RAM cache!
Opportunity
• Many bits are unchanged in a write access
– Redundant bit-writes [Zhou et al. ISCA 2009]
• Redundant bit-writes in 16MB STT-RAM cache
[Figure: fraction of redundant bit-writes; annotated value: 88%]
How to exploit this opportunity?
Exploiting Redundant Bit-Writes
• Need to know the old value…
• Read & compare before write [Zhou et al. ISCA 2009]
• Can we do better?
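The read-and-compare approach mentioned above can be sketched in a few lines. This is only an illustration, not the cited design: the function names, the list-backed `array`, and the word width are assumptions made for the example.

```python
def changed_bit_mask(old_word: int, new_word: int, width: int = 64) -> int:
    """Return a mask with 1s at the bit positions that actually differ."""
    return (old_word ^ new_word) & ((1 << width) - 1)


def read_compare_write(array: list, index: int, new_word: int, width: int = 64):
    """Hypothetical read-before-write: only changed bits are written.

    Returns (bits_written, bits_skipped). The extra read on the first
    line is the cost this scheme pays on every write access.
    """
    old_word = array[index]                       # preceding read
    mask = changed_bit_mask(old_word, new_word, width)
    array[index] = (old_word & ~mask) | (new_word & mask)
    written = bin(mask).count("1")
    return written, width - written


cells = [0b1010]
written, skipped = read_compare_write(cells, 0, 0b1000, width=4)
```

In this toy case only one of the four bits differs, so three bit-writes are redundant and can be skipped.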
Observation
• MTJ resistance changes abruptly at the end of the write cycle [Y. Chen et al. ISQED 2008]
– Cell still holds old value at early stage of write cycle
• Read is much faster than write
Possible to sense the old value at early stage of write cycle
Early Write Termination: Idea
• On a write access…
– Start write cycle as normal
– Sense the old value at early stage
– Terminate the write cycle if old value is same as new value
• Does not require a preceding read & compare!
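The steps above can be modeled per bit as a rough behavioral sketch. The timing values (`sense_time_ns`, `write_time_ns`) and all names are placeholders chosen for illustration, not figures from the slides.

```python
from dataclasses import dataclass


@dataclass
class BitWriteResult:
    final_value: int        # value held by the cell after the access
    terminated_early: bool  # True if the write cycle was cut short
    active_time_ns: float   # how long write current flowed (assumed model)


def ewt_write_bit(old_value: int, new_value: int,
                  sense_time_ns: float = 0.5,
                  write_time_ns: float = 10.0) -> BitWriteResult:
    """Behavioral model of one bit under early write termination."""
    # 1. The write cycle starts as normal; the cell still holds old_value.
    # 2. The old value is sensed at an early stage of the cycle.
    sensed = old_value
    # 3. If it already equals the new value, terminate the write.
    if sensed == new_value:
        return BitWriteResult(old_value, True, sense_time_ns)
    # Otherwise the cycle runs to completion and the cell flips.
    return BitWriteResult(new_value, False, write_time_ns)


def ewt_write_word(old_bits, new_bits, **timing):
    """Apply the per-bit model to a whole word (lists of 0/1)."""
    return [ewt_write_bit(o, n, **timing) for o, n in zip(old_bits, new_bits)]


results = ewt_write_word([0, 1, 1, 0], [0, 0, 1, 1])
```

Note that no separate read access appears anywhere in the model: the comparison happens inside the write cycle itself.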
EWT Circuit
[Figure: EWT circuit schematic. Labels: MTJ, WL, BL, SL, pass gates, Rwire, write 0 / write 1 paths, Vin1 / Vin0, conversion circuits, Vsense1 / Vsense0, Vref1 / Vref0, Sense-Amp, New value, Terminate?]
Conversion circuit
– Basic differential amplifier
– Input lower → Output higher
– Input higher → Output lower
How EWT Works?
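[Figure: EWT circuit during a write 0. Labels: MTJ, WL, BL, SL, pass gates, Rwire, conversion circuits, Vin1 / Vin0, Vsense1 / Vsense0, low / high drive levels; timing annotation: 0.536ns]

Old Value | New Value | Vin0   | Vsense0 | SA output | Action
0         | 0         | lower  | higher  | 1         | Terminate
1         | 0         | higher | lower   | 0         | Continue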
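The write-0 decision summarized above can be mimicked with a toy model. It assumes, as the figure labels suggest, that Vin0 is lower when the old value is 0 and higher when it is 1. The voltage numbers, the `VDD - vin` inversion, and the reference level are invented for illustration; only their ordering matters.

```python
VDD = 1.0                                  # hypothetical supply voltage
VIN0_BY_OLD_VALUE = {0: 0.3, 1: 0.6}       # hypothetical: lower for old 0
VREF0 = 0.55                               # hypothetical reference level


def conversion(vin: float, vdd: float = VDD) -> float:
    """Inverting stage: a lower input gives a higher output."""
    return vdd - vin


def write0_decision(old_value: int):
    """Return (sa_output, action) for a write-0 access."""
    vsense0 = conversion(VIN0_BY_OLD_VALUE[old_value])
    sa_output = 1 if vsense0 > VREF0 else 0    # sense-amp comparison
    action = "Terminate" if sa_output == 1 else "Continue"
    return sa_output, action


decisions = {old: write0_decision(old) for old in (0, 1)}
```

With these placeholder values the model terminates when the cell already holds 0 and continues when it holds 1.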
Advantages of EWT
• No performance penalty!
– Carried within a write cycle
– No need to read & compare before a write
– Write access may finish early → Slight speedup
• Low energy overhead (3.23%)
• Low complexity
• Easy to integrate with existing designs
MODELING STT-RAM AND EWT
Latency Modeling
• Cell
– Derived from recent works [Dong et al. DAC 2008]
• Peripheral
– Derived from CACTI [Thoziyoor et al. ISCA 2008, Dong et al. DAC 2008]
Dynamic Energy Modeling
• Baseline: Derived from recent works [Dong et al. DAC 2008]
• EWT
– Read energy: same as baseline
– Write energy: variable

$E_{write}^{EWT} = E_{peripheral} + E_{overhead} + E_{cells}$

– $E_{peripheral}$: Peripheral (derived from CACTI)
– $E_{overhead}$: Extra energy introduced by EWT circuits (HSPICE)
– $E_{cells} = N_{changed} \times E_{changed} + N_{unchanged} \times E_{unchanged}$
– $E_{changed}$: energy of a cell change; $E_{unchanged}$: energy of a terminated cell change
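The write-energy decomposition above (peripheral + overhead + cells) translates directly into a small function. The numeric inputs below are made-up placeholders in arbitrary energy units, chosen only to exercise the formula; they are not values from the slides.

```python
def ewt_write_energy(n_changed: int, n_unchanged: int,
                     e_changed: float, e_unchanged: float,
                     e_peripheral: float, e_overhead: float) -> float:
    """Energy of one EWT write access.

    e_changed   : energy of a cell that completes its write
    e_unchanged : energy of a cell whose write is terminated early
    """
    e_cells = n_changed * e_changed + n_unchanged * e_unchanged
    return e_peripheral + e_overhead + e_cells


# Hypothetical 512-bit line where 64 bits change and 448 are redundant.
energy = ewt_write_energy(n_changed=64, n_unchanged=448,
                          e_changed=1.0, e_unchanged=0.125,
                          e_peripheral=50.0, e_overhead=5.0)
```

Because the cell term depends on how many bits actually change, the write energy varies per access, which is why the slide lists it as "variable".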
Leakage Energy Modeling
• STT-RAM is non-volatile
– Power gate the idle banks
– Assume 1ns delay to “wake up”
– Used in both baseline and EWT
Experimental Setup
• Simics-based simulator
– 4-core CMP, 1GHz
– 32KB private L1 cache
– 16MB shared L2 cache using STT-RAM, 16 banks
– 4GB main memory
– Enhanced cache model: STT-RAM & EWT
Results: Performance
• Normalized cycles per instruction (CPI)
1% speedup
Slight performance improvement
Results: Write Energy
• Normalized write energy
Up to 80% write energy reduction
70% saving
Results: Dynamic Energy
• Normalized dynamic energy
52% reduction
[Figure legend: Base, EWT]
Results: Total Energy
• Normalized total energy
33% reduction
Results: Energy-Delay Product
• Normalized ED² (energy × delay²)
34% reduction
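As a rough consistency check, the energy-delay-squared metric can be recomputed from the headline numbers. This assumes the 33% total energy reduction and the 1% speedup can simply be combined as normalized energy 0.67 and normalized delay 0.99, which is a simplification; the slides may average over benchmarks differently.

```python
def normalized_ed2(normalized_energy: float, normalized_delay: float) -> float:
    """Energy-delay-squared product relative to a baseline of 1.0."""
    return normalized_energy * normalized_delay ** 2


# Assumed inputs: 33% less total energy, 1% shorter execution time.
ed2 = normalized_ed2(1 - 0.33, 1 - 0.01)
reduction_percent = (1 - ed2) * 100
```

Under these assumptions the result is a reduction of roughly 34%, in line with the figure reported on this slide.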
Conclusion
• Address a key challenge to STT-RAM cache: dynamic energy
• EWT: Exploit redundant bit-writes without performance penalty
– Low overhead and complexity
• Modeling and evaluation
– Up to 80% write energy reduction
– 34% ED² reduction
THANK YOU!