PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES ›...

80
PageForge: A Near-Memory Content- Aware Page-Merging Architecture Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas University of Illinois at Urbana-Champaign MICRO-50 @ Boston

Transcript of PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES ›...

Page 1: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge: A Near-Memory Content-Aware Page-Merging Architecture

Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas

University of Illinois at Urbana-Champaign

MICRO-50 @ Boston

Page 2: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Motivation: Server Consolidation in the Cloud

2

Page 3: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Motivation: Server Consolidation in the Cloud

3

How to consolidate Main Memory?

Page 4: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Content-Aware Page Merging or Page Deduplication

4

Page 5: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Content-Aware Page Merging or Page Deduplication

• Hypervisors search the address space and merge identical pages

5

Page 6: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Content-Aware Page Merging or Page Deduplication

• Hypervisors search the address space and merge identical pages

6

Host Physical

Guest Virtual

Guest Physical

Page 0 Page 1

Page 0 Page 1

Page 0 Page 1

VM-0 VM-1

Page 7: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Content-Aware Page Merging or Page Deduplication

• Hypervisors search the address space and merge identical pages

7

Host Physical

Guest Virtual

Guest Physical

Page 0 Page 1

Page 0 Page 1

Page 0 Page 1

VM-0 VM-1

Page 0 Page 1

Page 0

Page 0 Page 1

VM-0 VM-1

Page 8: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Problem: Overhead of Software Page Merging

• Search hundreds of millions of pages

• Latency sensitive applications get disrupted

• RedHat’s SW page merging (KSM) has execution overhead:

• Average mean latency overhead: 68%

• Average tail latency overhead: 136%

8

Page 9: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Contribution: PageForge• First solution for hardware-assisted content-aware page merging

• General, effective, minimal hypervisor involvement & hardware mods

• Reduced overhead vs state-of-the-art software:

• Mean latency 68% à 10%

• Tail latency 136% à 11%

• Same memory savings as software: 48% à Twice #VMs

• Novel use of ECC for page content characterization

MICRO-50 @ Boston

Page 10: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Content Duplication in the Cloud

Page 11: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

11

Infrastructure

Page 12: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

12

Infrastructure

Host Operating System

Page 13: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

13

Infrastructure

Host Operating System

Hypervisor

Page 14: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

14

Infrastructure

Host Operating System

Hypervisor

Guest OS

Page 15: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

15

Infrastructure

Host Operating System

Hypervisor

Guest OS

Libs

Page 16: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

16

Infrastructure

Host Operating System

Hypervisor

Guest OS

Libs

Data

Page 17: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Cloud

17

Infrastructure

Host Operating System

Hypervisor

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Page 18: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Content Duplication

18

Infrastructure

Host Operating System

Hypervisor

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Page 19: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Content Duplication

19

Infrastructure

Host Operating System

Hypervisor

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Page 20: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Content Duplication

20

Infrastructure

Host Operating System

Hypervisor

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Page 21: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Content Duplication

21

Infrastructure

Host Operating System

Hypervisor

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Page 22: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Content Duplication

22

Infrastructure

Host Operating System

Hypervisor

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Page 23: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Background: Content Duplication

23

Infrastructure

Host Operating SystemPage Merging

ContentDuplication

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Hypervisor

Page 24: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Our Proposal: Content Deduplication in HW

24

Infrastructure

Host Operating SystemPage Merging

ContentDuplication

Guest OS Guest OS Guest OS

Libs

Data

Libs

Data

Libs

Data

Hypervisor

Page 25: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-based Content-Aware Page Merging

Page 26: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-Based Content-Aware Page Merging

Pool of Pages to Scan

4KB4KB

Pool of Stable Pages

4KB4KB

4KB

4KB

Page 27: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-Based Content-Aware Page Merging

Pool of Pages to Scan

4KB4KB

Pool of Stable Pages

4KB4KB

4KB

4KB

Candidate Page

ComparePage

Page 28: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-Based Content-Aware Page Merging

4KB

4KB

MemoryCtrl L3 $ L1 $ Core

Main Memory

L2 $

Candidate Page

ComparePage

Page 29: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Core1B

1BL1 $

Software-Based Content-Aware Page Merging

4KB

4KB

MemoryCtrl L3 $ L2 $

Main Memory

==

64B

64B

Candidate Page

ComparePage

Page 30: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-Based Content-Aware Page Merging

Pool of Pages to Scan

4KB4KB

Pool of Stable Pages

4KB4KB

4KB

4KB

Candidate Page

ComparePage

Page 31: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-Based Content-Aware Page Merging

Pool of Pages to Scan

4KB4KB

Pool of Stable Pages

4KB4KB

4KB

4KB

Candidate Page

ComparePage

Page 32: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Why Page-Merging is expensive?

• Search hundreds of millions of pages

• Many core cycles consumed

• Caches get polluted

• Latency sensitive applications get disrupted

32

Page 33: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Software-Based Content-Aware Page Merging

• Optimization: Identify if the candidate page has recently changed

• If a page changes too often à Not a good candidate

Candidate Page

4KB

1KB #Hash Key

JHash

Page 34: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge: Hardware Assisted Page-Merging

Page 35: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Main Idea

• Hardware engine in the memory controller

• Compares pages

• Generates hashes

• Advantages:

ü No core utilization

ü No cache pollution

35

L2 L2 L2

CPU CPU CPU

Interconnect

L3 L3 L3

Mem

Con

trolle

r

L2

CPU

L3

Page

Forg

e

Memory

Memory

L3 L3 L3 L3

L2 L2 L2 L2

CPU CPU CPU CPU

L2

CPU

L3

L3

L2

CPU

Mem

Con

trolle

r

Memory

Memory

Page 36: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Main Idea

• Hardware engine in the memory controller

• Compares pages

• Generates hashes

• Advantages:

ü No core utilization

ü No cache pollution

36

L2 L2 L2

CPU CPU CPU

Interconnect

L3 L3 L3

Mem

Con

trolle

r

L2

CPU

L3

Page

Forg

e

Memory

Memory

L3 L3 L3 L3

L2 L2 L2 L2

CPU CPU CPU CPU

L2

CPU

L3

L3

L2

CPU

Mem

Con

trolle

r

Memory

Memory

Page 37: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge @ The Memory Controller

Read Req Buffer

Write Req Buffer

Scheduler(FR-FCFS)

CommandGeneration

Memory Interface

On-C

hip Netw

ork

Control

Comparator

Write Data BufferMemory Controller

Off-C

hip Mem

ory Interface

PageForge

Scan Table

ECC Enc

ECC DecRead Data Buffer

37

Page 38: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

38

Page 39: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

39

Operating System

• Loads in Scan Table:

• Candidate page PPN

• PPNs of pages to compare to candidate

Page 40: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

40

Operating System

• Loads in Scan Table:

• Candidate page PPN

• PPNs of pages to compare to candidate

PageForgeHardware

• Autonomously performs :

• Sequence of comparisons

• Generation of the hash key for candidate page

Page 41: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

41

In Hardware

Page 42: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

42

SearchPool of

Stable Pages

PickCandidate

Page

In Hardware

Page 43: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

43

Match Found?

OS MergesPages

SearchPool of

Stable Pages

In Hardware

Yes

Page 44: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

44

Match Found?

Old Key =New Key?

No

In Hardware

SearchPool of

Stable Pages

Old Key =New Key?

No

PickCandidate

Page

Page 45: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

45

SearchPool of

Unprotected Pages

Match Found?

Yes

Yes

In Hardware

Match Found?

Old Key =New Key?

SearchPool of

Stable Pages

OS MergesPages

Page 46: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

46

SearchPool of

Unprotected Pages

Match Found?

Insert Candidate in Unprotected Pool

No

Yes

In Hardware

Match Found?

Old Key =New Key?

SearchPool of

Stable Pages

Page 47: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge Operation

47

SearchPool of

Unprotected Pages

Match Found?

Insert Candidate in Unprotected Pool

No

Yes

Yes

In Hardware

Match Found?

OS MergesPages

Old Key =New Key?

No

No

Yes

SearchPool of

Stable Pages

PickCandidate

Page

Page 48: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Eliminating The Cost of Hash Keys

Page 49: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Hash Key for Page Content Characterization

49

Page 50: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Hash Key for Page Content Characterization

• If page changed recently, key may be different

50

Page 51: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Hash Key for Page Content Characterization

• If page changed recently, key may be different

• KSM: JHash à Serial computation + 1KB of data

51

Page 52: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Hash Key for Page Content Characterization

• If page changed recently, key may be different

• KSM: JHash à Serial computation + 1KB of data

• PageForge: ECC à Parallel computation + 256B of data

52

Page 53: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge @ The Memory Controller

Read Req Buffer

Write Req Buffer

Scheduler(FR-FCFS)

CommandGeneration

Memory Interface

On-C

hip Netw

ork

Control

Comparator

Write Data BufferMemory Controller

Off-C

hip Mem

ory Interface

PageForge

Scan Table

ECC Enc

ECC DecRead Data Buffer

53

Page 54: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

54

Page 55: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

55

16

Data 64B

16 16 16

4K Page

Page 56: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

56

16

Data 64B

16 16 16

4K Page

Page 57: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

• Select the 8-bit ECC of the least significant QWORD

57

16

Data 64B

16 16 16

ECC 8B

4K Page

Page 58: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

• Select the 8-bit ECC of the least significant QWORD

58

16

Data 64B

16 16 16

ECC 8B

Hash Key 4B

4K Page

Page 59: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

• Select the 8-bit ECC of the least significant QWORD

59

16

Data 64B

16 16 16

ECC 8B

Hash Key 4B

Software JHash = 1KB

Page 60: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

• Select the 8-bit ECC of the least significant QWORD

60

16

Data 64B

16 16 16

ECC 8B

Hash Key 4B

Software Jhash = 1KB

Page 61: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

• Select the 8-bit ECC of the least significant QWORD

61

16

Data 64B

16 16 16

ECC 8B

Hash Key 4B

ECC Hash = 256 Bytes

Page 62: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Proposal: ECC for Page Content Characterization

• Break a 4KB page into 4 segments

• Pick a random cache line within each segment

• Select the 8-bit ECC of the least significant QWORD

62

16

Data 64B

16 16 16

ECC 8B

Hash Key 4B75% Memory Footprint Reduction for Hash Key

Generation!

Page 63: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Evaluation

Page 64: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Simulation Setup

• 2GHz, 10-core, 16GB DRAM

• One VM per core

• Ubuntu 16.04 Host, Ubuntu Cloud Guest

• InHouse simulator: Simics + SST + DRAMSim2

• TailBench Suite benchmarks

64

Page 65: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Configurations

• Baseline: Page-merging is disabled

• KSM: RedHat’s implementations shipped with the current Linux

• PageForge: Hardware-assisted page-merging

65

Page 66: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Memory Savings – w/o vs w/ Page Merging

66

Img-Dnn Masstree Moses Silo Sphinx Average

48%

W/O

Pa

ge-M

ergi

ng

0

20

40

60

80

100

Pa

ges

%

w/o Page Merging w/ Page Merging

Page 67: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Memory Savings – w/o vs w/ Page Merging

67

Img-Dnn Masstree Moses Silo Sphinx Average

48%

W/O

Pa

ge-M

ergi

ng

0

20

40

60

80

100

Pa

ges

%

w/o Page Merging w/ Page Merging

PageForge AchievesMemory Savings of 48%

Page 68: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Memory Savings – w/o vs w/ Page Merging

68

Img-Dnn Masstree Moses Silo Sphinx Average

48%

W/O

Pa

ge-M

ergi

ng

0

20

40

60

80

100

Pa

ges

%

w/o Page Merging w/ Page Merging

PageForge AchievesMemory Savings of 48%

Twice #VMs!!

Page 69: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Memory Savings – w/o vs w/ Page Merging

0

20

40

60

80

100

Pa

ges

%

Unmergeable Mergeable Zero Mergeable Non-Zero

69

Img-Dnn Masstree Moses Silo Sphinx Average

W/O

Pa

ge-M

ergi

ng

W/ P

age

-Mer

gin

g

Page 70: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Memory Savings – w/o vs w/ Page Merging

0

20

40

60

80

100

Pa

ges

%

Unmergeable Mergeable Zero Mergeable Non-Zero

70

Img-Dnn Masstree Moses Silo Sphinx Average

W/O

Pa

ge-M

ergi

ng

W/ P

age

-Mer

gin

g

Page 71: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Memory Savings – w/o vs w/ Page Merging

0

20

40

60

80

100

Pa

ges

%

Unmergeable Mergeable Zero Mergeable Non-Zero

71

Img-Dnn Masstree Moses Silo Sphinx Average

W/O

Pa

ge-M

ergi

ng

W/ P

age

-Mer

gin

g48%

Page 72: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Mean Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

No

rma

lize

d M

ean

La

ten

cy

KSM PageForge

72

Page 73: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Mean Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

No

rma

lize

d M

ean

La

ten

cy

Baseline KSM PageForge

73

Page 74: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Mean Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

No

rma

lize

d M

ean

La

ten

cy

Baseline KSM PageForge

74

Page 75: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Mean Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

No

rma

lize

d M

ean

La

ten

cy

Baseline KSM PageForge

75

68%10%

Page 76: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Mean Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

No

rma

lize

d M

ean

La

ten

cy

Baseline KSM PageForge

76

PageForge Reduces Mean Latency Overhead

From 68% Down to 10%!

68%10%

Page 77: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Tail Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

95th

Per

cen

tile

La

ten

cy

Baseline KSM PageForge5.18

77

5.18

136%

11%

Page 78: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Tail Latency of Requests

0

0.5

1

1.5

2

2.5

3

Img-Dnn Masstree Moses Silo Sphinx Average

95th

Per

cen

tile

La

ten

cy

Baseline KSM PageForge

78

PageForge Reduces Tail Latency Overhead

From 136% Down to 11%!

5.18

136%

11%

Page 79: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Also in the Paper

• Software interface to PageForge

• Interaction with the cache coherence protocol

• Alternative designs

• Bandwidth analysis

• ECC vs Jhash à ECC keys add negligible collisions

• Power and area at 22nm is negligible

79

Page 80: PageForge: A Near-Memory Content- Aware Page …iacoma.cs.uiuc.edu › iacoma-papers › PRES › present_micro17_1.pdfPageForge: A Near-Memory Content- Aware Page-Merging Architecture

Takeaway: PageForge• First solution for hardware-assisted content-aware page merging

• General, effective, minimal hypervisor involvement & hardware mods

• Reduced overhead vs state-of-the-art software:

• Mean latency 68% à 10%

• Tail latency 136% à 11%

• Same memory savings as software: 48% à Twice #VMs

• Novel use of ECC for content characterization à 75% less memory footprint

MICRO-50 @ Boston