Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin...
-
Upload
virgil-cobb -
Category
Documents
-
view
220 -
download
0
Transcript of Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin...
![Page 1: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/1.jpg)
1
Data Mapping forHigher Performance and Energy Efficiency
in Multi-Level Phase Change Memory
HanBin Yoon*, Naveen Muralimanoharǂ, Justin Meza*, Onur Mutlu*, Norm Jouppiǂ
*Carnegie Mellon University, ǂHP Labs
![Page 2: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/2.jpg)
2
Overview
• MLC PCM: Strengths and weaknesses• Data mapping scheme for MLC PCM– Exploits PCM characteristics for lower latency– Improves data integrity
• Row buffer management for MLC PCM– Increases row buffer hit rate
• Performance and energy efficiency improvements
![Page 3: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/3.jpg)
3
Why MLC PCM?
• Emerging high density memory technology– Projected 3-12 denser than DRAM1
• Scalable DRAM alternative on the horizon– Access latency comparable to DRAM
• Multi-Level Cell: 1 of key strengths over DRAM– Further increases memory density (by 2–4)
• But MLC also has drawbacks
[1Lee+ ISCA’09]
![Page 4: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/4.jpg)
4
Higher MLC Latencies and Energy
• MLC program/read operation is more complex– Finer control/detection of cell resistances
• Generally leads to higher latencies and energy– ~2 for reads, ~4 for writes (depending on tech. & impl.)
11 10 01 00
Number of cells
Resistance
1 0
![Page 5: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/5.jpg)
5
• In MLC, single cell failure can lead to multi-bit faults
MLC Multi-bit Faults
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
cell
bit line
word line
LSB
MSB row
column
![Page 6: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/6.jpg)
6
Motivation
• MLC PCM strength:– Scalable, dense memory
• MLC PCM weaknesses:– Higher latencies– Higher energy– Multi-bit faults– Endurance
Mitigate throughbit mapping schemes
androw buffer managementbased on the following
observations
![Page 7: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/7.jpg)
7
• Read latency depends on cell state– Higher cell resistance higher read latency
Observation #1: Read Asymmetry
[2Qureshi+ ISCA’10]
![Page 8: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/8.jpg)
8
• MSB can be determined before read completes• Quicker MSB read group LSB & MSB separately
Observation #1: Read Asymmetry
![Page 9: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/9.jpg)
9
Observation #2: Program Asymmetry
00 01MSB LSB MSB LSB
10 11MSB LSB MSB LSB
210ns75ns
75ns
75ns 210ns
210ns
200ns
200ns
200ns250ns250ns
50ns
[3Joshi+ HPCA’11]
• Program latency depends on cell state
![Page 10: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/10.jpg)
10
• Single-bit change reduces LSB program latency• Quicker LSB prog. group LSB & MSB
separately
Observation #2: Program Asymmetry
00 01MSB LSB MSB LSB
10 11MSB LSB MSB LSB
210ns75ns
75ns
75ns 210ns
210ns
200ns
200ns
200ns250ns250ns
50ns
Not allowed
![Page 11: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/11.jpg)
11
• Bit mapping affects distribution of bit faults– 1 cell failure: 2 faults in 1 block vs. 1 fault each in 2
blocks (ECC-wise better)• Distributed faults group LSB & MSB separately
Observation #3: Distributed Bit Faults
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
MSB bits
LSB bits
MSB bits
LSB bits
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
![Page 12: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/12.jpg)
12
• Decoupled bit mapping scheme– Reduced read latency for MSB pages (read asym.)– Reduced program latency for LSB pages (prog asym.)– Distributed bit faults between LSB and MSB blocks– Worse endurance
Idea #1: Bit-Decoupled Mapping
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
01
23
45
67
89
1011
1213
1415
1617
1819 Row
= Page
0256
1257
2258
3259
4260
5261
6262
7263
8264
9265 MSB page
LSB page
Coupled
Decoupled
![Page 13: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/13.jpg)
13
Coalescing Writes
• Assuming spatial locality in writebacks• Interleaving blocks facilitates write coalescing• Improved endurance interleave blocks between LSB &
MSB
LSB blocks
MSB blocks
LSB blocks
MSB blocks
32 33 34 35 36 37 38 39 40 41 44 4742 43 45 460 3 8 9 10 11 12 13 14 151 2 4 5 6 7
3 9 11 13 15 17 19 21 23 25 27 29 311 5 70 8 10 12 14 16 18 20 22 24 26 28 302 4 6
PCM row: Decoupled bit mapping
PCM row: Decoupled bit mapping + block interleaving
dirty cache blockscache blocks
1 2 4 5 6 7
1 5 72 4 6
![Page 14: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/14.jpg)
14
• LM-Interleaved (LMI) bit mapping scheme– Mitigates cell wear
Idea #2: LSB-MSB Block Interleaving
MSB page
LSB page0
2561
2572
2583
2594
2605
2616
2627
2638
2649
265
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit
bitbit MSB page
LSB page0
81
92
103
114
125
136
147
1516
2417
25
Decoupled
Decoupled and LM-Interleaved (LMI)
![Page 15: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/15.jpg)
15
• Opportunity: Two latches per cell in row buffer– Use single row buffer as two “page buffers”
Row Buffer Management
01
23
45
67
89
1011
1213
1415
1617
1819
0256
1257
2258
3259
4260
5261
6262
7263
8264
9265
Row buffer
Row= Page
MSB page
LSB page
MSB bits
LSB bits
Coupled
Decoupled
![Page 16: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/16.jpg)
16
• Increased row buffer hit rate
Idea #3: Split Page Buffering (SPB)MSB page
LSB page512
768513
769514
770515
771516
772517
773518
774519
775520
776521
777
LSB/MSB
LSB/MSB
0256
1257
2258
3259
4260
5261
6262
7263
8264
9265 MSB page
LSB page
0 1 2 3 4 5 6 7 8 9512 513 514 515 516 517 518 519 520 521
Row buffer
![Page 17: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/17.jpg)
17
Evaluation Methodology
• Cycle-level x86 CPU-memory simulator– CPU: 8 out-of-order cores, 32 KB private L1 per
core– L2: 512 KB shared per core, DRAM-Aware LLC
Writeback4,5
– Dual channel DDR3 1066 MT/s, 2 ranks, aggregate PCM capacity 16 GB (2 bits per cell)
• Multi-programmed SPEC CPU2006 workloads– Misses per kilo-instructions > 10
[4Lee+ UTA-TechReport’10; 5Stuecheli+ ISCA’10]
![Page 18: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/18.jpg)
18
Comparison Points and Metrics
• Baseline: Coupled bit mapping• Decoupled: Decoupled bit mapping• LMI-4: LSB-MSB interleaving every 4 blocks• LMI-16: LSB-MSB interleaving every 16 blocks• Weighted speedup (performance) = sum of
thread speedups versus when run alone• Max slowdown (fairness) = highest slowdown
experienced by any thread
![Page 19: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/19.jpg)
190
0.2
0.4
0.6
0.8
1
1.2
Baseline Decoupled LMI-4 LMI-16
Performance
Wei
ghte
d Sp
eedu
p (n
orm
.)
Decoupled schemes benefit from reduced read latency (MSB) & program latency (LSB)
![Page 20: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/20.jpg)
200
0.2
0.4
0.6
0.8
1
1.2
Baseline Decoupled LMI-4 LMI-16
Fairness
Max
imum
Slo
wdo
wn
(nor
m.)
Individual thread speedups and increased row buffer hit rate
![Page 21: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/21.jpg)
210
0.2
0.4
0.6
0.8
1
1.2
Baseline Decoupled LMI-4 LMI-16
Energy Efficiency
Perf
orm
ance
per
Watt
(nor
m.)
Lower read energy (dominant case) due to exploiting read asymmetry
![Page 22: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/22.jpg)
220
0.2
0.4
0.6
0.8
1
1.2
Baseline Decoupled LMI-4 LMI-16
Memory Lifetime
Mem
ory
Life
time
(nor
m.)
5-year lifespan feasible for system design?Point of on-going research…
![Page 23: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/23.jpg)
23
Conclusion
• MLC PCM is a scalable, dense memory tech.– Exhibits higher latency and energy compared to SLC
1. LSB-MSB decoupled bit mapping– Exploits read asymmetry & program asymmetry– Distributes multi-bit faults
2. LSB-MSB block interleaving– Mitigates cell wear
3. Split page buffering– Increases row buffer hit rate
• Enhances perf. and energy eff. of MLC PCM
![Page 24: Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,](https://reader036.fdocuments.net/reader036/viewer/2022062407/56649cec5503460f949b8d04/html5/thumbnails/24.jpg)
24
Thank you! Questions?