CRAMES: Compressed RAM for Embedded...

37
CRAMES: Compressed RAM for Embedded Systems Robert Dick http://robertdick.org/ or http://ziyang.eecs.northwestern.edu/ dickrp/ Department of Electrical Engineering and Computer Science Northwestern University Application Swap Virtual Memory CRAMES Application Page Decomp Page Comp Application

Transcript of CRAMES: Compressed RAM for Embedded...

Page 1: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

CRAMES: Compressed RAM for Embedded Systems

Robert Dick

http://robertdick.org/ or http://ziyang.eecs.northwestern.edu/∼dickrp/Department of Electrical Engineering and Computer Science

Northwestern University

Application

Swap

VirtualMemory

CRAMES

Application

PageDecomp

PageComp

Application

Page 2: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Collaborators on project

NEC

Northwestern UniversityLei Yang

Lan S. BaiRobert P. Dick

NEC Labs AmericaDr. Haris Lekatsas

Dr. Srimat Chakradhar

2 Robert Dick CRAMES

Page 3: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Outline

1. IntroductionMotivationPast work

2. CRAMES designDesign overviewPattern-based partial match compression

3. Experimental evaluationExperimental setupCommercialization and future directions

3 Robert Dick CRAMES

Page 4: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Problem background

RAM quantity limits application functionality

RAM price dropping but usage growing faster

Secure Internet access, email, music, and games

How much RAM?

Functionality

Cost

Power consumption

Size

4 Robert Dick CRAMES

Page 5: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Ideal hardware–software design process

Ideal case

Hardware and software engineers collaborate on system-level designfrom start to finish

We teach the advantages of this in our classes

It doesn’t always happen

5 Robert Dick CRAMES

Page 6: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Real hardware–software design process

HW engineers SW engineers

6 Robert Dick CRAMES

Page 7: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Real hardware–software design process

HW engineers SW engineers

6 Robert Dick CRAMES

Page 8: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Options

Option 1: Add more memory

Implications: Hardware redesign, miss shipping target, get fired

Option 2: Rip out memory-hungry application features

Implications: Lose market to competitors, fail to recoup design andproduction costs, get fired

Option 3: Make it seem as if memory increased without changinghardware, without changing applications, and without performance orpower consumption penalties

Nobody knew how to do this

7 Robert Dick CRAMES

Page 9: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Goals

Allow application RAM requirements to overrun initial estimates evenafter hardware design

Reduce physical RAM, negligible performance and energy cost

Improve functionality or performance with same physical RAM

8 Robert Dick CRAMES

Page 10: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

HW RAM compression

Tremaine 2001, Benini 2002, Moore 2003

Hardware (de)compression unit between cache and RAM

Hardware redesign and application-specific compression hardware

Past work claimed application-specific hardware essential to keeppower and performance overhead low

9 Robert Dick CRAMES

Page 11: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Code compression

Lekatsas 2000, Xu 2004

Store code compressed, decompress during execution

Compress off-line, decompress on-line

For RAM, less important than on-line data compression

10 Robert Dick CRAMES

Page 12: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

MotivationPast work

Compression for swap performance

Compressed caching

Douglis 1993, Russinovich 1996, Wilson 1999, Kjelso 1999

Add compressed software cache to VM

Swap compression

RamDoubler, Cortez 2000, Roy 2001, Chihaia 2005

Compress swapped-out pages and store them in software cache

Both techniques

Target: general purpose system with disks

Goal: improve system performance

Interface to backing store (disk)

11 Robert Dick CRAMES

Page 13: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Outline

1. IntroductionMotivationPast work

2. CRAMES designDesign overviewPattern-based partial match compression

3. Experimental evaluationExperimental setupCommercialization and future directions

12 Robert Dick CRAMES

Page 14: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Design principles

Page selection

Scheduling compression and decompression

Organizing compressed and uncompressed regions

Dynamically adjust compressed region size

Compression scheme

High performance

Energy efficient

Good compression ratio

Low memory requirement

13 Robert Dick CRAMES

Page 15: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

CRAMES design

Kernel Swapping System

CRAMES Device Driver

BlockDecompressor

MemoryManager

BlockCompressor

Virtual Memory

RAM

MMU

Page Fault

8

7

3

45

1

2

6

Memory Low

8

7

3

4

5

6

2

1

Memory Low

8

7

3

4

5

6

2

1 8

7

3

4

5

6

2

1

14 Robert Dick CRAMES

Page 16: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

System configuration

Slightly slowermain memoryworking area

20 MB

Main memoryworking area

10 MB

User view

Slightly slowerRamdisk with

file system20 MB

Compressedswap device

10 MB

Main memoryworking area

10 MB

Compressed RAM device with file system

12 MB

System view

CRAMES

Main memoryworking area

20 MB

Ramdiskwith file system

12 MB

System/User view

Original

15 Robert Dick CRAMES

Page 17: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Compression algorithm

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

LZObest existingcompressionalgorithm for

CRAMES

16 Robert Dick CRAMES

Page 18: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Compression algorithm

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

Compression Ratio

0.14

0.19

0.24

0.29

0.34

0.39

512 1024 2048 4096 8192

Block Size (bytes)

Co

mp

ressio

n R

ati

o bzip2

zlib default

zilb level 1

zlib level 9

RLE

LZRW-1A

LZO

Compression Time

0

0.0005

0.001

0.0015

0.002

0.0025

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zilb default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Decompression Time

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

512 1024 2048 4096 8192

Block Size (bytes)

Tim

e (

sec)

bzip2

zlib default

zlib level 1

zlib level 9

RLE

LZRW-1A

LZO

Working Memory

1

10

100

1000

10000

bzip2 zlib default LZO LZRW-1A RLE

Compression Algorithms

log

2 (

byte

s)

Compression

Decompression

7600kB

3700kB

256KB

44KB64KB

0

16KB 16KB

0 0

LZObest existingcompressionalgorithm for

CRAMES

16 Robert Dick CRAMES

Page 19: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Weak link: Compression algorithm

LZO average performance penalty 9.5% when RAM reduced to 40%

Developed simulation environment to permit profiling

Compression and decompression were taking most time

Needed a better compression algorithm

17 Robert Dick CRAMES

Page 20: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Weak link: Compression algorithm

LZO average performance penalty 9.5% when RAM reduced to 40%

Developed simulation environment to permit profiling

Compression and decompression were taking most time

Needed a better compression algorithm

17 Robert Dick CRAMES

Page 21: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Pattern-based partial dictionary match coding

Is something similar

in dictionary?

Input 32-bit word

Output dictionary

location and

difference

Update dictionary

entry

Move entry to

front of

dictionary

Eliminate LRU

dictionary entry

Place entry

in dictionary

Output value

YN

18 Robert Dick CRAMES

Page 22: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Data regularity

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

(16,

1)(8

,2)

(4,4

)

(32,

1)

(16,

2)(8

,4)

(64,

1)

(32,

2)

(16,

4)

(128

,1)

(64,

2)

(32,

4)

(256

,1)

(128

,2)

(64,

4)

Dictionary layout (size, level)

Perc

en

tag

e %

xmxx

xzxz

zxxz

xzzx

zxzz

zzxz

xzzz

xxzz

xxzx

xzxx

xxxz

zxxx

zzxx

zxzx

mmmx

mmxx

mmmm

zzzx

xxxx

zzzz

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

(16,

1)(8

,2)

(4,4

)

(32,

1)

(16,

2)(8

,4)

(64,

1)

(32,

2)

(16,

4)

(128

,1)

(64,

2)

(32,

4)

(256

,1)

(128

,2)

(64,

4)

Dictionary layout (size, level)

Perc

en

tag

e %

xmxx

xzxz

zxxz

xzzx

zxzz

zzxz

xzzz

xxzz

xxzx

xzxx

xxxz

zxxx

zzxx

zxzx

mmmx

mmxx

mmmm

zzzx

xxxx

zzzz

Zeros: 38%

Small values: 10%

Nearby values similar(data, pointers): 27%

19 Robert Dick CRAMES

Page 23: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Data regularity

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

(16,

1)(8

,2)

(4,4

)

(32,

1)

(16,

2)(8

,4)

(64,

1)

(32,

2)

(16,

4)

(128

,1)

(64,

2)

(32,

4)

(256

,1)

(128

,2)

(64,

4)

Dictionary layout (size, level)

Perc

en

tag

e %

xmxx

xzxz

zxxz

xzzx

zxzz

zzxz

xzzz

xxzz

xxzx

xzxx

xxxz

zxxx

zzxx

zxzx

mmmx

mmxx

mmmm

zzzx

xxxx

zzzz

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

(16,

1)(8

,2)

(4,4

)

(32,

1)

(16,

2)(8

,4)

(64,

1)

(32,

2)

(16,

4)

(128

,1)

(64,

2)

(32,

4)

(256

,1)

(128

,2)

(64,

4)

Dictionary layout (size, level)

Perc

en

tag

e %

xmxx

xzxz

zxxz

xzzx

zxzz

zzxz

xzzz

xxzz

xxzx

xzxx

xxxz

zxxx

zzxx

zxzx

mmmx

mmxx

mmmm

zzzx

xxxx

zzzz

Zeros: 38%

Small values: 10%

Nearby values similar(data, pointers): 27%

19 Robert Dick CRAMES

Page 24: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

Pattern-based partial match

Algorithm

Consider each 32-bit word as an input

Allow partial dictionary match

Use most frequent patterns based on statistical analysis

Optimizations

Two-way associative LRU 16-entry hash-mapped dictionary

Optimized coding scheme

Early termination for uncompressable data

Fine-grained operation parallelization

20 Robert Dick CRAMES

Page 25: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

PBPM evaluations

PBPM twice as fast as LZO

Compression ratio degrades by 10%

Improves performance and still doubles usable memory

21 Robert Dick CRAMES

Page 26: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Design overviewPattern-based partial match compression

CRAMES and in-RAM filesystem compression

Compressed

RAM disk reserved

Kernel

Main

memory

7%

2.2 MB

55.6%

17.8 MB

37.5%

12 MB

22 Robert Dick CRAMES

Page 27: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Outline

1. IntroductionMotivationPast work

2. CRAMES designDesign overviewPattern-based partial match compression

3. Experimental evaluationExperimental setupCommercialization and future directions

23 Robert Dick CRAMES

Page 28: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Experimental setup

Sharp Zaurus SL-5600 PDA

Intel XScale PXA250

32 MB flash memory

32 MB RAM

Embedix (Linux 2.4.18 kernel)

Qt/Qtopia PDA edition

24 Robert Dick CRAMES

Page 29: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

CRAMES for compressed filesystems

CRAMES was used to create a compressed RAM device onZaurus SL-5600

EXT2 file system

LZO compression

Resource map memory allocation

Benchmarks: common file system operations

Increase capacity to 1.6×Execution time increases by 8.4%

Energy consumption increases by 5.2%

25 Robert Dick CRAMES

Page 30: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Benchmarks

Impact of using CRAMES to reduce physical RAM

Results for

ADPCM: Speech compression application from MediaBench

JPEG: Image encoding application from MediaBench

MPEG2: Video CODEC application from MediaBench

Straight-forward matrix multiplication

Intentionally difficult for CRAMES

Also tested on

10 GUI applications that came with Qtopia

Next-generation cellphone prototype

26 Robert Dick CRAMES

Page 31: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Results

Reduced RAM from 20 MB to 8MB

Base case: 20 MB RAM, no compression

Without CRAMES

All suffered significant performance penalties

Matrix multiplication cannot execute

With CRAMES

LZO average case 9% overhead, worst case 29%

PBPM average case 2.5% overhead, worst case 9%

Also works on arbitrary in-RAM filesystems

27 Robert Dick CRAMES

Page 32: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Status

Patent pending: Northwestern University and NEC

Millions of cellphones using CRAMES started shipping in June2007

Technology won Computerwold Horizon Award in 2007

28 Robert Dick CRAMES

Page 33: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

What did this require?

Bunny suit army?

29 Robert Dick CRAMES

Page 34: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

What did this require?

Bunny suit army?No! One good Ph.D. student.

30 Robert Dick CRAMES

Page 35: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Another directionMemory expansion for MMU-less embedded systems

user application

byte code

modified byte code

executable

LLVM compiler

MEMMU compiler

Calls to

MEMMU

library

LLVM compiler

Instruction

replacement,

compiler-time

optimizations

MEMMU

runtime

library

Observations and Results

Application: Sensor networks

Implemented in LLVM,tested on TelosB nodes

Increases usable memory by50%, unchanged applications

Little overhead aftercompiler optimizations

CASES’06

With Lan Bai and Lei Yang

31 Robert Dick CRAMES

Page 36: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Acknowledgments

This work was supported by NEC Labs America and NSF awardCCR-0347941.

We would like to acknowledge Hui Ding of NorthwesternUniversity for his suggestions on efficiently implementing PBPMand his help testing CRAMES with GUI applications.

Professors Peter Dinda, Gokhan Memik, and Lawrence Henschenprovided suggestions and feedback.

32 Robert Dick CRAMES

Page 37: CRAMES: Compressed RAM for Embedded Systemsziyang.eecs.umich.edu/~dickrp/esds-two-week/lectures/memcomp.pdf · CRAMES: Compressed RAM for Embedded Systems ... xzxx xxxz zxxx zzxx

IntroductionCRAMES design

Experimental evaluation

Experimental setupCommercialization and future directions

Thank you for attending!

For additional information about this work

CRAMES and MEMMU athttp://ziyang.eecs.northwestern.edu/∼dickrp/tools.html

For additional information about the team that created it

http://ziyang.eecs.northwestern.edu/∼dickrp/people/students.html

33 Robert Dick CRAMES