CROW - safari.ethz.ch · 21 CROW-ref Problem: Refresh has high overheads. Weak rows lead to high...

Post on 27-May-2020

1 views 0 download

Transcript of CROW - safari.ethz.ch · 21 CROW-ref Problem: Refresh has high overheads. Weak rows lead to high...

Hasan HassanMinesh Patel Jeremie S. Kim A. Giray Yaglikci Nandita Vijaykumar

Nika Mansouri Ghiasi Saugata Ghose Onur Mutlu

CROWA Low-Cost Substrate for Improving

DRAM Performance, Energy Efficiency, and Reliability

Presented at ISCA 2019

2

SummaryChallenges of DRAM scaling:

• High access latency → bottleneck for improving system performance/energy

• Refresh overhead → reduces performance and consume high energy

• Exposure to vulnerabilities (e.g., RowHammer)

Copy-Row DRAM (CROW)• Introduces copy rows into a subarray

• The benefits of a copy row:

• Efficiently duplicating data from regular row to a copy row

• Quick access to a duplicated row

• Remapping a regular row to a copy row

CROW is a flexible substrate with many use cases:• CROW-cache & CROW-ref

• Mitigating RowHammer

• We hope CROW enables many other use cases going forward

(20% speedup and consumes 22% less DRAM energy)

regular rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

Source code available in July:github.com/CMU-SAFARI/CROW

3

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

4

DRAM Organization

CPU

Memory Controller

DRAM Row

Sense Amplifier

Memory Bus

DRAM Cell

DRAM Subarray

5

Accessing DRAM

DRAM Row

Sense Amplifier

DRAM Cell

DRAM Subarray

Activate

Precharge

Read

6

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

7

Challenges of DRAM Scaling

DRAM

access latency

refresh overhead

exposure to vulnerabilities

1

2

3

8

Our Goal

We want a substrate that enables the duplication and remapping of data within a subarray

9

Memory Controller

The Components of CROW

CROW-table

DRAM Subarray

DRAMregular rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

SA SA SA SA SASA

row

dec

od

er

10

DRAMregular

rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

Memory Controller

CROW Operation 1: Row Copy

DRAM Subarray

ACT-c (copy)S

A

SA SA

SA SA

SA

11

Row Copy: Steps

SenseAmplifier

source row:

destination row:

1 Activation of the source row

2 Charge sharing

3 Beginning of restoration

4 Activation of the destination row

5Restoration of both rows to source data

12

Row Copy: Steps

SenseAmplifier

source row:

destination row:

1 Activation of the source row

2 Charge sharing

3 Beginning of restoration

4 Activation of the destination row

5Restoration of both rows to source data

Enables quickly copying a regular row into a copy row

13

Memory Controller

CROW Operation 2: Two-Row Activation

DRAM Subarray

DRAM

ACT-t (two row)

regular rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

SA

SA

SA

SA

SA

SA

14

Two-Row Activation: Steps

SenseAmplifier

1 Activation of two rows

2 Charge sharing

3 Restoration

fast

both charged or discharged

15

Two-Row Activation: Steps

SenseAmplifier

1 Activation of two rows

2 Charge sharing

3 Restoration

fast

Enables fast access to data that is duplicated across a regular row and a copy row

both charged or discharged

16

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

17

CROW-cacheProblem: High access latency

Key idea: Use copy rows to enable low-latency access to most-recently-activated regular rows in a subarray

CROW-cache combines:• row copy → copy a newly activated regular row into a copy row• two-row activation → activate the regular row and copy row

together on the next access

Reduces activation latency by 38%

18

Memory Controller

CROW-cache Operationload row X

DRAM

ACT-c

ACT-t

load row X

RequestQueue

1 CROW-table miss

2 Allocate a copy row

3 Issue ACT-c (copy)

1 CROW-table hit

2 Issue ACT-t (two row)

DRAM Subarray

regular rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

SA

SA SA

SA SA

SA

SA

SA

SA

SA

SA

SA

CROW-table

copy row 0 row X

[bank conflict]

19

Memory Controller

CROW-cache Operation

DRAM

ACT-t

1 CROW-table miss

2 Allocate a copy row

3 Issue ACT-c

1 CROW-table hit

2 Issue ACT-t

DRAM Subarray

regular rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

SA

SA SA

SA SA

SA

SA

SA

SA

SA

SA

SA

CROW-table

copy row 0 row XSecond activation of row X is faster

load row X

load row X

RequestQueue

[bank conflict]

20

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

21

CROW-refProblem: Refresh has high overheads. Weak rows lead to high refresh rate

• weak row: at least one of the row’s cells cannot retain data correctly when refresh rate is decreased

Key idea: Safely reduce refresh rate by remapping a weak regular row to a strong copy row

CROW-ref uses:• row copy → copy a weak regular row to a strong copy row

CROW-ref eliminates more than half of the refresh requests

22

CROW-ref Operation

SA SA SA SA SASA SA

SA SA

SA SA

SA

weak

strongRetention Time Profiler

strongstrong

strongstrong

1Perform retention time profiling

3 On ACT, check the CROW-table

4If remapped, activate a copy row

2Remap weak rows to strong copy rows

23

CROW-ref Operation

SA SA SA SA SASA SA

SA SA

SA SA

SA

weak

strongRetention Time Profiler

strongstrong

strongstrong

1Perform retention time profiling

3 On ACT, check the CROW-table

4If remapped, activate a copy row

How many weak rows exist in a DRAM chip?

2Remap weak rows to strong copy rows

24

Identifying Weak RowsWeak cells are rare [Liu+, ISCA’13]

weak cell: retention < 256ms

~1000/238 (32 GiB) failing cells

DRAM Retention Time Profiler• REAPER [Patel+, ISCA’17]

PARBOR [Khan+, DSN’16]AVATAR [Qureshi+, DSN’15]

• At system boot or during runtime

3.30E-043.30E-11

0%

20%

40%

60%

80%

100%

1 2 4 8

Pro

ba

bil

ity

Weak rows in a subarray

25

Identifying Weak RowsWeak cells are rare [Liu+, ISCA’13]

weak cell: retention < 256ms

~1000/238 (32 GiB) failing cells

DRAM Retention Time Profiler• REAPER [Patel+, ISCA’17]

PARBOR [Khan+, DSN’16]AVATAR [Qureshi+, DSN’15]

• At system boot or during runtime

3.30E-043.30E-11

0%

20%

40%

60%

80%

100%

1 2 4 8

Pro

ba

bil

ity

Weak rows in a subarrayA few copy rows are sufficient to halve the refresh rate

26

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

27

activate

Mitigating RowHammer

SA SA SA SA SASA

victimaggressor

victim

precharge

Key idea: remap victim rows to copy rows

28

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

29

Methodology• Simulator

• DRAM Simulator (Ramulator [Kim+, CAL’15])https://github.com/CMU-SAFARI/ramulator

• Workloads• 44 single-core workloads

• SPEC CPU2006, TPC, STREAM, MediaBench• 160 multi-programmed four-core workloads

• By randomly choosing from single-core workloads• Execute at least 200 million representative instructions per core

• System Parameters• 1/4 core system with 8 MiB LLC• LPDDR4 main memory• 8 copy rows per 512-row subarray

Source code available in July:github.com/CMU-SAFARI/CROW

30

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

single-core four-core

No

rmal

ized

D

RA

M E

ner

gy

1.00

1.02

1.04

1.06

1.08

1.10

single-core four-core

Spee

du

pCROW-cache Results

7.1%7.5%8.2% 6.9%

* with 8 copy rows and a 64Gb DRAM chip (sensitivity in paper)

31

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

single-core four-core

No

rmal

ized

D

RA

M E

ner

gy

1.00

1.02

1.04

1.06

1.08

1.10

single-core four-core

Spee

du

pCROW-cache Results

7.1%7.5%8.2% 6.9%

* with 8 copy rows a 64Gb DRAM chip (sensitivity in paper)

CROW-cache improves single-/four-core performance and energy

32

0.70

0.75

0.80

0.85

0.90

0.95

1.00

single-core four-core

No

rmal

ized

D

RA

M E

ner

gy

1.041.051.061.071.081.09

1.11.111.121.13

single-core four-core

Spee

du

pCROW-ref Results

17.2%

7.8%

7.1%

11.9%

* with 8 copy rows and a 64Gb DRAM chip (sensitivity in paper)

33

0.70

0.75

0.80

0.85

0.90

0.95

1.00

single-core four-core

No

rmal

ized

D

RA

M E

ner

gy

1.041.051.061.071.081.09

1.11.111.121.13

single-core four-core

Spee

du

pCROW-ref Results

17.2%

7.8%

7.1%

11.9%

* with 8 copy rows a 64Gb DRAM chip (sensitivity in paper)

CROW-ref significantly reduces the performance and energy overhead of DRAM refresh

34

Combining CROW-cache and CROW-ref

0.70

0.80

0.90

1.00

1.10

1.20

1.30

single-core four-core

Spee

du

p

CROW-(cache+ref)

Ideal CROW-cache + no refresh

0.70

0.72

0.74

0.76

0.78

0.80

single-core four-core

No

rmal

ized

D

RA

M E

ner

gy

CROW-(cache+ref)

Ideal CROW-cache + no refresh

17% 20% 23% 22%

35

Combining CROW-cache and CROW-ref

0.70

0.80

0.90

1.00

1.10

1.20

1.30

single-core four-core

Spee

du

p

CROW-(cache+ref)

Ideal CROW-cache + no refresh

0.70

0.72

0.74

0.76

0.78

0.80

single-core four-core

No

rmal

ized

D

RA

M E

ner

gy

CROW-(cache+ref)

Ideal CROW-cache + no refresh

17% 20% 23% 22%

CROW-(cache+ref) provides more performance and DRAM energy benefits than each mechanism alone

36

Hardware Overhead

For 8 copy rows and 16 GiB DRAM:•0.5% DRAM chip area•1.6% DRAM capacity•11.3 KiB memory controller storage

CROW is a low-cost substrate

37

Other Results in the Paper• Performance and energy sensitivity to:

• Number of copy-rows per subarray

• DRAM chip density

• Last-level cache capacity

• CROW-cache with prefetching

• CROW-cache compared to other in-DRAM caching mechanisms:• TL-DRAM [Lee+, HPCA’13]

• SALP [Kim+, ISCA’12]

38

Outline1. DRAM Operation Basics

2. The CROW Substrate

3. Evaluation

4. Conclusion

CROW-cache: Reducing DRAM Latency

CROW-ref: Reducing DRAM Refresh

Mitigating RowHammer

39

ConclusionChallenges of DRAM scaling:

• High access latency → bottleneck for improving system performance/energy

• Refresh overhead → reduces performance and consume high energy

• Exposure to vulnerabilities (e.g., RowHammer)

Copy-Row DRAM (CROW)• Introduces copy rows into a subarray

• The benefits of a copy row:

• Efficiently duplicating data from regular row to a copy row

• Quick access to a duplicated row

• Remapping a regular row to a copy row

CROW is a flexible substrate with many use cases:• CROW-cache & CROW-ref

• Mitigating RowHammer

• We hope CROW enables many other use cases going forward

(20% speedup and consumes 22% less DRAM energy)

regular rows

SA SA SA SA SASA

CR

OW

d

eco

der copy rows

reg

ula

r ro

w

de

cod

er

Source code available in July:github.com/CMU-SAFARI/CROW

Hasan HassanMinesh Patel Jeremie S. Kim A. Giray Yaglikci Nandita Vijaykumar

Nika Mansouri Ghiasi Saugata Ghose Onur Mutlu

CROWA Low-Cost Substrate for Improving

DRAM Performance, Energy Efficiency, and Reliability

Backup Slides

42

Latency Reduction with MRA

43

activate

Mitigating RowHammer

SA SA SA SA SASA

victimaggressor

victim

precharge

Key idea: remap victim rows to copy rows

44

0.900.951.001.051.101.151.20

lesl

ie3

d

tpch

2

zeu

s

lbm

mcf

stre

am-c

p

lib

q

h2

64

-dec

AV

ER

AG

E(1

-co

re)

HH

HH

Spee

du

pCROW-1 CROW-8

CROW-64 CROW-128

Ideal CROW-cache (100% Hit Rate)

CROW-cache Performance

6.6%

0.7%7.1%

single-core four-core

7.5%

45

0.90

0.95

1.00

1.05

1.10

1.15

1.20

Spee

du

p8 Gbit 16 Gbit 32 Gbit 64 Gbit

CROW-ref Performance

7.1%

single-core

11.9%

four-coresingle-core four-core

46

0.70

0.75

0.80

0.85

0.90

0.95

1.00

No

rmal

ized

D

RA

M E

ner

gy8 Gbit 16 Gbit 32 Gbit 64 Gbit

CROW-ref Energy Savings

17.2%

single-core four-core

7.8%

47

Speedup - CROW-cache

Single-core

48

Speedup - CROW-cache

Four-core

49

Energy – CROW-cache

50

Comparison to TL-DRAM and SALP

CROW-1

CROW-8

TL-DRAM-1

TL-DRAM-8

SALP-512

SALP-256

SALP-128

SALP-512-O

SALP-256-O

SALP-128-O

0.60.81.01.21.41.61.82.02.2

4% 6% 8% 10% 12% 14% 16%

No

rmal

ized

D

RA

M E

ner

gy

Speedup

CROW-1CROW-8

TL-DRAM-1

TL-DRAM-8

SALP-256

SALP-128

SALP-256-O

SALP-128-O

0%

5%

10%

15%

20%

25%

30%

4% 6% 8% 10% 12% 14% 16%

Ch

ip A

rea

Ove

rhea

d

Speedup

51

Slide on RLTL

52

Speedup – CROW-ref

53

Energy – CROW-ref

54

CROW-cache + ref

55

CROW-table Organization

56

tRCD vs tRAS

57

MRA Area Overhead

58

Data 0

Data 1

Cell

time

cha

rge

Sense-Amplifier

Sensing Restore

Cell

SenseAmplifier

Precharge

R/WACT PRE

Ready to AccessCharge Level

tRCD

tRAS

Ready to AccessReady to

Precharge

DRAM Charge over Time