Embedded System Lab.
Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework
Gennady Pekhimenko†, Vivek Seshadri†, Yoongu Kim†, Hongyi Xin†, Onur Mutlu†, Phillip B. Gibbons⋆, Michael A. Kozuch⋆, Todd C. Mowry†
†Carnegie Mellon University  ⋆Intel Labs Pittsburgh
Presenter: 김해천
Abstract
Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient. By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression, Linearly Compressed Pages (LCP), that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression. Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).
Introduction
Main memory, commonly implemented using DRAM technology, is a critical resource in modern systems.
Main memory capacity must be sufficiently provisioned to prevent devastating performance loss from frequent page faults when the working set overflows.
Unfortunately, the required minimum memory capacity is expected to increase in the future: applications are generally becoming more data-intensive with growing working set sizes, and with many cores integrated onto the same chip, more applications run concurrently on the system.
Simply scaling up main memory is unattractive: DRAM already constitutes a significant portion of the system's cost and power budget, and scaling requires expensive off-chip signaling buffers.
Data compression is therefore a very attractive approach to effectively increase main memory capacity.
Potential for Data Compression
There is significant redundancy in in-memory data; for example, neighboring words often hold small or similar values such as 0x00000000 0x0000000B 0x00000003 0x00000004 …
How can we exploit this redundancy? Main memory compression helps: it provides the effect of a larger memory without making it physically larger.
In an uncompressed virtual page (4KB), the cache lines L0, L1, L2, …, LN-1 (64B each) sit at fixed address offsets 0, 64, 128, …, (N-1)*64, so locating a line is trivial. In a compressed page, the address offsets of the lines are no longer known in advance, and mapping virtual pages to physical addresses leaves fragmented space.
Challenge 1: address computation (how does the memory controller locate a cache line within a compressed page?)
Challenge 2: mapping and fragmentation (how are compressed pages of varying size placed in physical memory?)
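To make challenge 1 concrete, here is a minimal C sketch (names hypothetical, not from the paper) contrasting the offset computation in an uncompressed page with what a memory controller would need when each cache line compresses to a different size:

#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE       64   /* uncompressed cache line size in bytes */
#define LINES_PER_PAGE  64   /* 4KB page / 64B lines */

/* Uncompressed page: the offset of cache line `idx` is a single multiply. */
static inline size_t uncompressed_offset(size_t idx)
{
    return idx * LINE_SIZE;
}

/* Conventionally compressed page: every line has its own compressed size,
 * so locating line `idx` means summing the sizes of all preceding lines
 * (or keeping a large per-line offset table), adding latency to every
 * main memory access. */
static size_t variable_size_offset(const uint8_t line_size[LINES_PER_PAGE],
                                   size_t idx)
{
    size_t offset = 0;
    for (size_t i = 0; i < idx; i++)
        offset += line_size[i];
    return offset;
}

LCP removes this loop (and the offset table) by forcing every line in a page to the same compressed size, as the next slide shows.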
Linearly Compressed Pages (LCP): Key Idea
Compress all cache lines within a page to the same, fixed size.
With 4:1 compression, an uncompressed page (4KB: 64 * 64B lines) becomes 1KB of compressed data: each 64B cache line is stored in a fixed-size slot, so the line at uncompressed offset 128 is found at compressed offset 32.
Lines that cannot be compressed to the fixed size are kept uncompressed in an exception storage region; a per-page metadata region (64B) marks each such line and records its index (idx) into the exception storage (E0, …).
LCP effectively solves challenge 1: address computation becomes a simple multiply and add.
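A minimal sketch of the resulting address computation, assuming a simplified per-page metadata layout (the field names below are illustrative, not the paper's exact metadata encoding):

#include <stdbool.h>
#include <stdint.h>

struct lcp_page {
    uint64_t base;              /* physical address of the compressed data region  */
    uint32_t slot_size;         /* fixed compressed line size, e.g. 16B for 4:1    */
    uint64_t exception_base;    /* start of the exception storage region           */
    bool     is_exception[64];  /* metadata: is this line stored uncompressed?     */
    uint8_t  exception_idx[64]; /* metadata: slot index within exception storage   */
};

/* Physical address of cache line `line` (0..63) within an LCP page:
 * a single multiply-and-add in the common (compressed) case. */
static uint64_t lcp_line_address(const struct lcp_page *p, unsigned line)
{
    if (!p->is_exception[line])
        return p->base + (uint64_t)line * p->slot_size;               /* fixed slot   */
    return p->exception_base + (uint64_t)p->exception_idx[line] * 64; /* uncompressed */
}

With a 16B slot (4:1), for example, the line at uncompressed offset 128 (line 2) lands at compressed offset 2 * 16 = 32, matching the figure on this slide.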
Base-Delta Encoding [PACT'12]
Example: a 32-byte uncompressed cache line holding eight 4-byte values 0xC04039C0 0xC04039C8 0xC04039D0 … 0xC04039F8.
Store a single 4-byte base (0xC04039C0) plus eight 1-byte deltas 0x00, 0x08, 0x10, …, 0x38, giving a 12-byte compressed cache line: 20 bytes saved.
Fast decompression: a vector addition.
Simple hardware: arithmetic and comparison.
Effective: good compression ratio.
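A minimal C sketch of this 4-byte-base / 1-byte-delta configuration (the real Base-Delta-Immediate hardware tries several base and delta sizes in parallel and also uses an implicit zero base; the function names here are illustrative and cover only the example above):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compress a 32-byte line (eight 32-bit words) into a 4-byte base plus
 * eight 1-byte deltas (12 bytes total). Returns false if any delta does
 * not fit in one byte, in which case the line stays uncompressed. */
static bool bd_compress_4_1(const uint32_t words[8], uint8_t out[12])
{
    uint32_t base = words[0];
    for (int i = 0; i < 8; i++) {
        uint32_t delta = words[i] - base;
        if (delta > 0xFF)
            return false;
        out[4 + i] = (uint8_t)delta;
    }
    memcpy(out, &base, sizeof base);   /* store the base first */
    return true;
}

/* Decompression is a (vectorizable) addition of each delta to the base. */
static void bd_decompress_4_1(const uint8_t in[12], uint32_t words[8])
{
    uint32_t base;
    memcpy(&base, in, sizeof base);
    for (int i = 0; i < 8; i++)
        words[i] = base + in[4 + i];
}

When bd_compress_4_1 returns false, the line cannot be shrunk to the target size; in LCP such a line would go to the exception storage region.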
Results
Workloads: 32 SPEC CPU2006, database, and web workloads; 2MB L2 cache.
Effect on Memory Capacity: LCP-based designs achieve average compression ratios competitive with prior work.
[Chart: compression ratio. Baseline 1.00, RMC 1.59, LCP-FPC 1.52, LCP-BDI 1.62, LZ 2.60]
Effect on Bus Bandwidth: LCP-based designs significantly reduce bandwidth (by 24%) due to data compression.
[Chart: normalized BPKI, lower is better. Baseline 1.00, RMC 0.79, LCP-FPC 0.80, LCP-BDI 0.76]
Effect on Performance: LCP-based designs significantly improve performance over RMC.
[Chart: performance improvement (0% to 16%) of RMC, LCP-FPC, and LCP-BDI for 1-core, 2-core, and 4-core workloads]
Effect on Page Faults: the LCP framework significantly decreases the number of page faults (by up to 23% on average, at 768MB).
[Chart: number of page faults with LCP-BDI normalized to Baseline for 256MB/512MB/768MB/1GB DRAM; reductions of 8%/14%/23%/21%]
Conclusion
Old idea: compress data in main memory.
Problem: how to avoid inefficiency in address computation?
Solution: a new main memory compression framework called LCP (Linearly Compressed Pages).
Key idea: a fixed size for compressed cache lines within a page.
Evaluation:
1. Increases memory capacity (62% on average)
2. Decreases bandwidth consumption (24%)
3. Improves overall performance (13.9%)
http://slideplayer.com/slide/3542154/
http://users.ece.cmu.edu/~omutlu/pub/linearly-compressed-pages_micro13.pdf
Memory Request Flow
Components: the processor (core, TLB, last-level cache), the memory controller (with compress/decompress logic and a metadata (MD) cache), DRAM, and the disk.
1. Initial page compression (1/3): when a 4KB page is brought in from disk, it is compressed (for example to 1KB) before being written to DRAM.
2. Cache line read (2/3): on a load (LD), the memory controller uses the page's metadata, cached in the MD cache, to locate the requested cache line inside the 1KB compressed page, reads it from DRAM, decompresses it, and returns it to the last-level cache.
3. Cache line writeback (3/3): when a dirty cache line ($Line) is written back, the memory controller compresses it; if it no longer fits in its fixed-size slot, the line goes to the exception storage or the page is recompressed with a larger size (for example 2KB).
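As a rough illustration of the read path (step 2), here is a hypothetical sketch of what the memory controller does, with stub functions standing in for hardware structures (these names are not from the paper):

#include <stdbool.h>
#include <stdint.h>

/* Opaque per-page LCP metadata; layout as in the earlier address-computation sketch. */
struct lcp_meta;

/* Stubs standing in for hardware structures (illustrative only). */
extern struct lcp_meta *md_cache_lookup(uint64_t page_frame);      /* NULL on miss */
extern struct lcp_meta *md_fetch_from_dram(uint64_t page_frame);   /* extra access */
extern bool     lcp_is_exception(const struct lcp_meta *m, unsigned line);
extern uint64_t lcp_line_address(const struct lcp_meta *m, unsigned line);
extern uint32_t lcp_slot_size(const struct lcp_meta *m);
extern void dram_read(uint64_t addr, void *buf, uint32_t len);
extern void decompress_line(const void *in, uint32_t in_len, void *out64B);

/* Read one 64B cache line from an LCP-compressed page. */
void lcp_read_line(uint64_t page_frame, unsigned line, void *out64B)
{
    struct lcp_meta *m = md_cache_lookup(page_frame);
    if (m == NULL)
        m = md_fetch_from_dram(page_frame);        /* MD cache miss costs a DRAM access */

    uint64_t addr = lcp_line_address(m, line);     /* simple fixed-slot computation */
    if (lcp_is_exception(m, line)) {
        dram_read(addr, out64B, 64);               /* exception: stored uncompressed */
    } else {
        uint8_t slot[64];
        dram_read(addr, slot, lcp_slot_size(m));
        decompress_line(slot, lcp_slot_size(m), out64B);
    }
}

Caching the per-page metadata in the controller's MD cache avoids a second DRAM access on most requests, which is why it appears in the flow above.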
Physically Tagged Caches
The core issues a virtual address; the TLB translates it into a physical address, and only then can that physical address be compared against the tags of the L2 cache lines to select the matching data.
Address translation is therefore on the critical path of cache access.
Frequent Pattern Compression
Idea: encode cache lines based on frequently occurring patterns, e.g., the first half of a word being zero.
Frequent patterns (3-bit prefix codes):
000 – All zeros
001 – First half zeros
010 – Second half zeros
011 – Repeated bytes
100 – All ones
…
111 – Not a frequent pattern
Example: the uncompressed words 0x00000001, 0x00000000, 0xFFFFFFFF, 0xABCDEFFF are encoded as 001 0x0001 (first half zeros), 000 (all zeros), 011 0xFF (repeated bytes), and 111 0xABCDEFFF (not a frequent pattern).
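A minimal C sketch of this per-word encoding, covering only the prefix codes used in the example above (the bit-level packing done by real FPC hardware is omitted, and the names are illustrative):

#include <stdint.h>

struct fpc_word {
    uint8_t  prefix;     /* 3-bit pattern code        */
    uint32_t data;       /* pattern-dependent payload */
    uint8_t  data_bits;  /* payload size in bits      */
};

static struct fpc_word fpc_encode_word(uint32_t w)
{
    uint8_t b = (uint8_t)w;

    if (w == 0x00000000u)                         /* 000: all zeros         */
        return (struct fpc_word){ 0x0, 0, 0 };
    if ((w & 0xFFFF0000u) == 0)                   /* 001: first half zeros  */
        return (struct fpc_word){ 0x1, w & 0xFFFFu, 16 };
    if ((w & 0x0000FFFFu) == 0)                   /* 010: second half zeros */
        return (struct fpc_word){ 0x2, w >> 16, 16 };
    if (w == (uint32_t)b * 0x01010101u)           /* 011: repeated bytes    */
        return (struct fpc_word){ 0x3, b, 8 };
    return (struct fpc_word){ 0x7, w, 32 };       /* 111: not a frequent pattern */
}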