Embedded System Lab.
Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework
Gennady Pekhimenko†, Vivek Seshadri†, Yoongu Kim†, Hongyi Xin†, Onur Mutlu†, Phillip B. Gibbons⋆, Michael A. Kozuch⋆, Todd C. Mowry†
†Carnegie Mellon University  ⋆Intel Labs Pittsburgh
Presenter: 김해천
Abstract
Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient. By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression, Linearly Compressed Pages (LCP), that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression. Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).
Introduction
Main memory, commonly implemented using DRAM technology, is a critical resource in modern systems.
Main memory capacity must be sufficiently provisioned to prevent devastating performance loss from frequent page faults when the working set overflows.
Unfortunately, the required minimum memory capacity is expected to increase in the future: applications are generally becoming more data-intensive with growing working set sizes, and with many cores integrated onto the same chip, more applications run concurrently on the system.
Simply scaling up main memory is unattractive: DRAM already constitutes a significant portion of the system's cost and power budget, and scaling requires expensive off-chip signaling buffers.
Data compression is therefore a very attractive approach to effectively increase main memory capacity.
Potential for Data Compression
There is significant redundancy in in-memory data; for example, neighboring words often hold small or similar values such as 0x00000000 0x0000000B 0x00000003 0x00000004 …
How can we exploit this redundancy? Main memory compression helps: it provides the effect of a larger memory without making it physically larger.
In an uncompressed virtual page (4KB), the cache lines L0, L1, L2, …, LN-1 (64B each) sit at fixed address offsets 0, 64, 128, …, (N-1)*64, so locating a line is trivial. In a compressed page, the address offsets of the lines are no longer known in advance, and mapping virtual pages to physical addresses leaves fragmented space.
Challenge 1: address computation (how does the memory controller locate a cache line within a compressed page?)
Challenge 2: mapping and fragmentation (how are compressed pages of varying size placed in physical memory?)
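To make challenge 1 concrete, here is a minimal C sketch (names hypothetical, not from the paper) contrasting the offset computation in an uncompressed page with what a memory controller would need when each cache line compresses to a different size:

#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE       64   /* uncompressed cache line size in bytes */
#define LINES_PER_PAGE  64   /* 4KB page / 64B lines */

/* Uncompressed page: the offset of cache line `idx` is a single multiply. */
static inline size_t uncompressed_offset(size_t idx)
{
    return idx * LINE_SIZE;
}

/* Conventionally compressed page: every line has its own compressed size,
 * so locating line `idx` means summing the sizes of all preceding lines
 * (or keeping a large per-line offset table), adding latency to every
 * main memory access. */
static size_t variable_size_offset(const uint8_t line_size[LINES_PER_PAGE],
                                   size_t idx)
{
    size_t offset = 0;
    for (size_t i = 0; i < idx; i++)
        offset += line_size[i];
    return offset;
}

LCP removes this loop (and the offset table) by forcing every line in a page to the same compressed size, as the next slide shows.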
Linearly Compressed Pages (LCP): Key Idea
Compress all cache lines within a page to the same, fixed size.
With 4:1 compression, an uncompressed page (4KB: 64 * 64B lines) becomes 1KB of compressed data: each 64B cache line is stored in a fixed-size slot, so the line at uncompressed offset 128 is found at compressed offset 32.
Lines that cannot be compressed to the fixed size are kept uncompressed in an exception storage region; a per-page metadata region (64B) marks each such line and records its index (idx) into the exception storage (E0, …).
LCP effectively solves challenge 1: address computation becomes a simple multiply and add.
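A minimal sketch of the resulting address computation, assuming a simplified per-page metadata layout (the field names below are illustrative, not the paper's exact metadata encoding):

#include <stdbool.h>
#include <stdint.h>

struct lcp_page {
    uint64_t base;              /* physical address of the compressed data region  */
    uint32_t slot_size;         /* fixed compressed line size, e.g. 16B for 4:1    */
    uint64_t exception_base;    /* start of the exception storage region           */
    bool     is_exception[64];  /* metadata: is this line stored uncompressed?     */
    uint8_t  exception_idx[64]; /* metadata: slot index within exception storage   */
};

/* Physical address of cache line `line` (0..63) within an LCP page:
 * a single multiply-and-add in the common (compressed) case. */
static uint64_t lcp_line_address(const struct lcp_page *p, unsigned line)
{
    if (!p->is_exception[line])
        return p->base + (uint64_t)line * p->slot_size;               /* fixed slot   */
    return p->exception_base + (uint64_t)p->exception_idx[line] * 64; /* uncompressed */
}

With a 16B slot (4:1), for example, the line at uncompressed offset 128 (line 2) lands at compressed offset 2 * 16 = 32, matching the figure on this slide.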
Base-Delta Encoding [PACT'12]
Example: a 32-byte uncompressed cache line holding eight 4-byte values 0xC04039C0 0xC04039C8 0xC04039D0 … 0xC04039F8.
Store a single 4-byte base (0xC04039C0) plus eight 1-byte deltas 0x00, 0x08, 0x10, …, 0x38, giving a 12-byte compressed cache line: 20 bytes saved.
Fast decompression: a vector addition.
Simple hardware: arithmetic and comparison.
Effective: good compression ratio.
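A minimal C sketch of this 4-byte-base / 1-byte-delta configuration (the real Base-Delta-Immediate hardware tries several base and delta sizes in parallel and also uses an implicit zero base; the function names here are illustrative and cover only the example above):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compress a 32-byte line (eight 32-bit words) into a 4-byte base plus
 * eight 1-byte deltas (12 bytes total). Returns false if any delta does
 * not fit in one byte, in which case the line stays uncompressed. */
static bool bd_compress_4_1(const uint32_t words[8], uint8_t out[12])
{
    uint32_t base = words[0];
    for (int i = 0; i < 8; i++) {
        uint32_t delta = words[i] - base;
        if (delta > 0xFF)
            return false;
        out[4 + i] = (uint8_t)delta;
    }
    memcpy(out, &base, sizeof base);   /* store the base first */
    return true;
}

/* Decompression is a (vectorizable) addition of each delta to the base. */
static void bd_decompress_4_1(const uint8_t in[12], uint32_t words[8])
{
    uint32_t base;
    memcpy(&base, in, sizeof base);
    for (int i = 0; i < 8; i++)
        words[i] = base + in[4 + i];
}

When bd_compress_4_1 returns false, the line cannot be shrunk to the target size; in LCP such a line would go to the exception storage region.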
Results
Workloads: 32 SPEC CPU2006, database, and web workloads; 2MB L2 cache.
Effect on Memory Capacity: LCP-based designs achieve average compression ratios competitive with prior work.
[Chart: compression ratio. Baseline 1.00, RMC 1.59, LCP-FPC 1.52, LCP-BDI 1.62, LZ 2.60]
Effect on Bus Bandwidth: LCP-based designs significantly reduce bandwidth (by 24%) due to data compression.
[Chart: normalized BPKI, lower is better. Baseline 1.00, RMC 0.79, LCP-FPC 0.80, LCP-BDI 0.76]
Effect on Performance: LCP-based designs significantly improve performance over RMC.
[Chart: performance improvement (0% to 16%) of RMC, LCP-FPC, and LCP-BDI for 1-core, 2-core, and 4-core workloads]
Effect on Page Faults: the LCP framework significantly decreases the number of page faults (by up to 23% on average, at 768MB).
[Chart: number of page faults with LCP-BDI normalized to Baseline for 256MB/512MB/768MB/1GB DRAM; reductions of 8%/14%/23%/21%]
Conclusion
Old idea: compress data in main memory.
Problem: how to avoid inefficiency in address computation?
Solution: a new main memory compression framework called LCP (Linearly Compressed Pages).
Key idea: a fixed size for compressed cache lines within a page.
Evaluation:
1. Increases memory capacity (62% on average)
2. Decreases bandwidth consumption (24%)
3. Improves overall performance (13.9%)
http://slideplayer.com/slide/3542154/
http://users.ece.cmu.edu/~omutlu/pub/linearly-compressed-pages_micro13.pdf
Memory Request Flow
Components: the processor (core, TLB, last-level cache), the memory controller (with compress/decompress logic and a metadata (MD) cache), DRAM, and the disk.
1. Initial page compression (1/3): when a 4KB page is brought in from disk, it is compressed (for example to 1KB) before being written to DRAM.
2. Cache line read (2/3): on a load (LD), the memory controller uses the page's metadata, cached in the MD cache, to locate the requested cache line inside the 1KB compressed page, reads it from DRAM, decompresses it, and returns it to the last-level cache.
3. Cache line writeback (3/3): when a dirty cache line ($Line) is written back, the memory controller compresses it; if it no longer fits in its fixed-size slot, the line goes to the exception storage or the page is recompressed with a larger size (for example 2KB).
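As a rough illustration of the read path (step 2), here is a hypothetical sketch of what the memory controller does, with stub functions standing in for hardware structures (these names are not from the paper):

#include <stdbool.h>
#include <stdint.h>

/* Opaque per-page LCP metadata; layout as in the earlier address-computation sketch. */
struct lcp_meta;

/* Stubs standing in for hardware structures (illustrative only). */
extern struct lcp_meta *md_cache_lookup(uint64_t page_frame);      /* NULL on miss */
extern struct lcp_meta *md_fetch_from_dram(uint64_t page_frame);   /* extra access */
extern bool     lcp_is_exception(const struct lcp_meta *m, unsigned line);
extern uint64_t lcp_line_address(const struct lcp_meta *m, unsigned line);
extern uint32_t lcp_slot_size(const struct lcp_meta *m);
extern void dram_read(uint64_t addr, void *buf, uint32_t len);
extern void decompress_line(const void *in, uint32_t in_len, void *out64B);

/* Read one 64B cache line from an LCP-compressed page. */
void lcp_read_line(uint64_t page_frame, unsigned line, void *out64B)
{
    struct lcp_meta *m = md_cache_lookup(page_frame);
    if (m == NULL)
        m = md_fetch_from_dram(page_frame);        /* MD cache miss costs a DRAM access */

    uint64_t addr = lcp_line_address(m, line);     /* simple fixed-slot computation */
    if (lcp_is_exception(m, line)) {
        dram_read(addr, out64B, 64);               /* exception: stored uncompressed */
    } else {
        uint8_t slot[64];
        dram_read(addr, slot, lcp_slot_size(m));
        decompress_line(slot, lcp_slot_size(m), out64B);
    }
}

Caching the per-page metadata in the controller's MD cache avoids a second DRAM access on most requests, which is why it appears in the flow above.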
Physically Tagged Caches
The core issues a virtual address; the TLB translates it into a physical address, and only then can that physical address be compared against the tags of the L2 cache lines to select the matching data.
Address translation is therefore on the critical path of cache access.
Frequent Pattern Compression
Idea: encode cache lines based on frequently occurring patterns, e.g., the first half of a word being zero.
Frequent patterns (3-bit prefix codes):
000 – All zeros
001 – First half zeros
010 – Second half zeros
011 – Repeated bytes
100 – All ones
…
111 – Not a frequent pattern
Example: the uncompressed words 0x00000001, 0x00000000, 0xFFFFFFFF, 0xABCDEFFF are encoded as 001 0x0001 (first half zeros), 000 (all zeros), 011 0xFF (repeated bytes), and 111 0xABCDEFFF (not a frequent pattern).
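A minimal C sketch of this per-word encoding, covering only the prefix codes used in the example above (the bit-level packing done by real FPC hardware is omitted, and the names are illustrative):

#include <stdint.h>

struct fpc_word {
    uint8_t  prefix;     /* 3-bit pattern code        */
    uint32_t data;       /* pattern-dependent payload */
    uint8_t  data_bits;  /* payload size in bits      */
};

static struct fpc_word fpc_encode_word(uint32_t w)
{
    uint8_t b = (uint8_t)w;

    if (w == 0x00000000u)                         /* 000: all zeros         */
        return (struct fpc_word){ 0x0, 0, 0 };
    if ((w & 0xFFFF0000u) == 0)                   /* 001: first half zeros  */
        return (struct fpc_word){ 0x1, w & 0xFFFFu, 16 };
    if ((w & 0x0000FFFFu) == 0)                   /* 010: second half zeros */
        return (struct fpc_word){ 0x2, w >> 16, 16 };
    if (w == (uint32_t)b * 0x01010101u)           /* 011: repeated bytes    */
        return (struct fpc_word){ 0x3, b, 8 };
    return (struct fpc_word){ 0x7, w, 32 };       /* 111: not a frequent pattern */
}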