Moinuddin K. Qureshi
ECE, Georgia Tech
Memory Scaling is Dead, Long Live Memory Scaling
Presented at Yale Patt's "Mid-Career" Celebration, University of Texas at Austin, September 19, 2014
The Gap in Memory Hierarchy
Main memory system must scale to maintain performance growth
[Figure: typical access latency in processor cycles (@ 4 GHz), log-scale axis with ticks from 2^1 to 2^23, showing L1 (SRAM), eDRAM, DRAM, Flash, and HDD, with a large unfilled gap ("?????") between DRAM and Flash]
Misses in main memory (page faults) degrade performance severely
The Memory Capacity Gap
Trends: core count doubling every 2 years; DRAM DIMM capacity doubling every 3 years [Lim+, ISCA'09]
Memory capacity per core expected to drop by 30% every two years
Challenges for DRAM: Scaling Wall
Scaling wall: DRAM does not scale well to small feature sizes (sub-1x nm)
Increasing error rates can render DRAM scaling infeasible
Two Roads Diverged …
Facing the DRAM scaling challenges, there are two paths forward:
1. Architectural support for DRAM scaling and for reducing refresh overheads
2. Find an alternative technology that avoids the problems of DRAM
Important to investigate both approaches
Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reducing Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary
Reasons for DRAM Faults
• Unreliability of ultra-thin dielectric material
• In addition, DRAM cell failures also arise from:
– Permanently leaky cells
– Mechanically unstable cells
– Broken links in the DRAM array
[Figure: a DRAM cell capacitor leaking charge (permanently leaky cell); a capacitor tilting towards ground (mechanically unstable cell); broken links between DRAM cells]
Permanent faults for future DRAMs expected to be much higher
Row and Column Sparing
• DRAM chips (organized into rows and columns) have spares
• Laser fuses enable spare rows/columns
• An entire row/column must be sacrificed for a few faulty cells
[Figure: DRAM chip before and after row/column sparing, showing spare rows, spare columns, fault locations, replaced rows/columns, and deactivated rows and columns]
Row and column sparing incurs large area overheads
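A back-of-the-envelope check makes the overhead concrete. The chip geometry in the sketch below is an assumption for illustration (the slide does not give one), and the bit error rate anticipates the 100 ppm figure used later in the talk:

    # Why row/column sparing breaks down at high error rates.
    # Assumed geometry (illustrative only): 8Gb chip with 8K-bit rows.
    CHIP_BITS = 8 * 2**30          # 8 Gbit DRAM chip
    ROW_BITS = 8 * 2**10           # 8K bits per row
    NUM_ROWS = CHIP_BITS // ROW_BITS
    BER = 1e-4                     # 100 ppm, the rate studied on later slides

    faulty_cells = CHIP_BITS * BER               # expected faulty cells (~860K)
    p_row_faulty = 1 - (1 - BER) ** ROW_BITS     # P(row has >= 1 fault), ~0.56
    rows_with_faults = NUM_ROWS * p_row_faulty

    print(f"expected faulty cells:   {faulty_cells:,.0f}")
    print(f"rows containing a fault: {rows_with_faults:,.0f} of {NUM_ROWS:,}")
    # Under these assumptions over half the rows contain a fault;
    # sacrificing a whole row per faulty cell cannot scale to such rates.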
Commodity ECC-DIMM
• Commodity ECC DIMMs provide SECDED at 8-byte granularity (72,64)
• Mainly used for soft-error protection
• For hard errors, high chance of two errors in the same word (birthday paradox)
For an 8GB DIMM (~1 billion words), the expected number of errors until some word holds a double error = 1.25·sqrt(N) ≈ 40K errors, i.e., a bit-error rate of only ~0.5 ppm
SECDED is not enough at high error rates (and what about soft errors?)
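The 40K figure is the classic birthday-paradox bound: with N words as "bins", the first double-error word is expected after about 1.25·sqrt(N) random bit errors. A minimal sketch (the Monte Carlo part is just a sanity check, not from the slide):

    import math
    import random

    N_WORDS = 2**30    # ~1 billion 8B words in an 8GB DIMM

    # Birthday-paradox estimate: expected number of single-bit errors
    # thrown into N words before some word collects two of them.
    print(f"analytical: ~{1.25 * math.sqrt(N_WORDS):,.0f} errors")   # ~41K

    def errors_until_collision(n_words):
        """Drop errors into random words until one word is hit twice."""
        hit = set()
        while True:
            w = random.randrange(n_words)
            if w in hit:
                return len(hit) + 1
            hit.add(w)

    trials = [errors_until_collision(N_WORDS) for _ in range(20)]
    print(f"simulated:  ~{sum(trials) / len(trials):,.0f} errors")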
Dissecting Fault Probabilities
At a bit error rate of 10^-4 (100 ppm) for an 8GB DIMM (~1 billion words):

Faulty bits per word (8B)   Probability    Words in 8GB DIMM
0                           99.3%          0.99 billion
1                           0.7%           7.7 million
2                           26 x 10^-6     28 K
3                           62 x 10^-9     67
4                           ~10^-10        0.1

Most faulty words have a 1-bit error; the skew in fault probability can be leveraged for low-cost resilience.
Goal: tolerate high error rates with a commodity ECC DIMM while retaining soft-error resilience
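The table is simply a binomial distribution over the bits of a word. The sketch below recomputes it, assuming 72-bit words (64 data + 8 SECDED check bits); small differences from the slide's counts come from rounding and from whether "one billion words" is taken as 10^9 or 2^30:

    from math import comb

    BER = 1e-4         # bit error rate (100 ppm)
    WORD_BITS = 72     # 64 data bits + 8 SECDED check bits per 8B word
    N_WORDS = 2**30    # ~1 billion words in an 8GB DIMM

    for k in range(5):
        p = comb(WORD_BITS, k) * BER**k * (1 - BER)**(WORD_BITS - k)
        print(f"{k}-bit-faulty words: p = {p:.1e}  count = {p * N_WORDS:,.1f}")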
ArchShield: Overview
Inspired by Solid State Drives (SSDs), which tolerate high bit-error rates
• Expose faulty-cell information to the architecture layer via runtime testing
• ArchShield stores the error-mitigation information in memory itself
[Figure: main memory hosting an ArchShield region with a Fault Map (cached on chip) and a Replication Area]
• Most words will be error-free
• 1-bit errors are handled with SECDED
• Multi-bit errors are handled with replication
ArchShield: Yield-Aware Design
When the DIMM is configured, runtime testing is performed, and each 8B word is classified into one of three types (the classification of faulty words can be stored on the hard drive for future use):
• No Error: replication not needed; SECDED can correct a soft error
• 1-bit Error: SECDED can correct the hard error; replication is needed to cover a soft error
• Multi-bit Error: the word gets decommissioned; only the replica is used
Tolerates a 100 ppm fault rate with 1% slowdown and 4% capacity loss
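A minimal sketch of this three-way classification and the resulting read path; the data structures, names, and the double_error_detected flag are illustrative simplifications, not ArchShield's actual mechanism:

    # Sketch of ArchShield-style classification and access flow.
    NO_ERROR, ONE_BIT, MULTI_BIT = 0, 1, 2

    fault_map = {}      # word address -> classification (cached on chip)
    replica_of = {}     # faulty word -> its copy in the replication area
    memory = {}         # stand-in for the DIMM contents

    def classify(addr, faulty_bits):
        """Run once per word during configuration-time testing."""
        if faulty_bits == 0:
            fault_map[addr] = NO_ERROR      # SECDED reserved for soft errors
        elif faulty_bits == 1:
            fault_map[addr] = ONE_BIT       # SECDED absorbs the hard error,
            replica_of[addr] = ("replica", addr)  # replica covers a soft error
        else:
            fault_map[addr] = MULTI_BIT     # decommission the word
            replica_of[addr] = ("replica", addr)  # only the replica is used

    def read(addr, double_error_detected=False):
        kind = fault_map.get(addr, NO_ERROR)
        if kind == MULTI_BIT:
            return memory.get(replica_of[addr])   # word itself is dead
        if kind == ONE_BIT and double_error_detected:
            return memory.get(replica_of[addr])   # hard + soft error collided
        return memory.get(addr)                   # common case: normal access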
Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reducing Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary
Emerging Technology to Aid Scaling
Phase Change Memory (PCM): scalable to sub-10nm
Resistive memory: high resistance (0), low resistance (1)
Advantages: scalable, has MLC capability, non-volatile (no leakage)
PCM is attractive for designing scalable memory systems. But …
Challenges for PCM
Key problems:
1. Higher read latency (compared to DRAM)
2. Limited write endurance (~10-100 million writes per cell)
3. Writes are much slower and more power-hungry
Naively replacing DRAM with PCM causes high read latency, high power, and high energy consumption
How do we design a scalable PCM-based memory without these disadvantages?
Hybrid Memory: Best of DRAM and PCM
Hybrid memory system:
1. DRAM as a cache to tolerate PCM read/write latency and write bandwidth
2. PCM as main memory to provide large capacity at good cost/power
3. Write filtering techniques to reduce wasteful writes to PCM
[Figure: processor backed by a DRAM buffer (DATA + T, where T = tag-store) in front of PCM main memory with a PCM write queue, with Flash or HDD below]
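A rough sketch of that access flow: a small LRU-managed DRAM buffer with tags in front of a large PCM store, where only dirty evictions reach PCM. Sizes, names, and the write-queue-as-counter simplification are assumptions for illustration:

    from collections import OrderedDict

    class HybridMemory:
        """Sketch: DRAM buffer with tags in front of PCM main memory."""

        def __init__(self, dram_lines=1024):
            self.dram = OrderedDict()   # line address -> (data, dirty), LRU order
            self.dram_lines = dram_lines
            self.pcm = {}               # large-capacity backing store
            self.pcm_writes = 0         # writes that actually reach PCM

        def _evict_if_full(self):
            if len(self.dram) >= self.dram_lines:
                addr, (data, dirty) = self.dram.popitem(last=False)  # LRU victim
                if dirty:                    # write filtering: clean lines are
                    self.pcm[addr] = data    # dropped, only dirty lines enter
                    self.pcm_writes += 1     # the PCM write queue

        def read(self, addr):
            if addr in self.dram:            # DRAM hit: PCM latency hidden
                self.dram.move_to_end(addr)
                return self.dram[addr][0]
            data = self.pcm.get(addr, 0)     # miss: slow PCM read, then fill
            self._evict_if_full()
            self.dram[addr] = (data, False)
            return data

        def write(self, addr, data):
            if addr not in self.dram:
                self._evict_if_full()
            self.dram[addr] = (data, True)   # writes absorbed in DRAM; a hot
            self.dram.move_to_end(addr)      # line reaches PCM once, on eviction

In this sketch, writing the same line a thousand times costs at most one PCM write, when the line is finally evicted; that is the essence of write filtering.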
Latency, Energy, Power: Lowered
[Figure: normalized execution time (y-axis, 0 to 1.1) for db1, db2, qsort, bsearch, kmeans, gauss, daxpy, vdotp, and gmean under four configurations: 8GB DRAM, 32GB PCM, 32GB DRAM, and 32GB PCM + 1GB DRAM]
Hybrid memory provides performance similar to iso-capacity DRAM, and also avoids the energy/power overheads from frequent writes
Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reducing Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary
Workload Adaptive Systems
Different policies work well for different workloads:
1. No single cache replacement policy works well for all workloads
2. Nor does a single prefetch algorithm
3. Or memory scheduling algorithm
4. Or coherence algorithm
5. Or any other policy (write allocate / no allocate?)
Unfortunately, systems are designed to cater to the average case (a policy that works well enough for all workloads)
Ideally, each workload would run with the policy that works best for it
Adaptive Tuning via Runtime Testing
Say we want to select between two policies, P0 and P1. Divide the cache sets into three groups:
– Dedicated P0 sets
– Dedicated P1 sets
– Follower sets (use the winner of P0 vs. P1)
An n-bit saturating counter monitors the duel:
– Misses to P0-sets: counter++
– Misses to P1-sets: counter--
The counter decides the policy for the followers:
– MSB = 0: use P0
– MSB = 1: use P1
Monitor, choose, apply (Set Dueling: using a single counter)
Adaptive tuning allows dynamic policy selection at low cost (see the sketch below)
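A compact sketch of the mechanism. The modulo-based choice of dedicated sets, the counter width, and the sampling ratio are illustrative assumptions; real designs pick dedicated sets more carefully:

    class SetDueling:
        """Choose between policies P0 and P1 with one saturating counter."""

        def __init__(self, num_sets, n_bits=10, sample_every=32):
            self.num_sets = num_sets
            self.max_val = (1 << n_bits) - 1
            self.psel = self.max_val // 2      # n-bit saturating counter, midpoint
            self.msb = 1 << (n_bits - 1)
            self.sample_every = sample_every   # 1 in 32 sets dedicated per policy

        def group(self, set_idx):
            if set_idx % self.sample_every == 0:
                return "P0"        # dedicated P0 set: always runs P0
            if set_idx % self.sample_every == 1:
                return "P1"        # dedicated P1 set: always runs P1
            return "follower"

        def on_miss(self, set_idx):
            g = self.group(set_idx)
            if g == "P0":          # P0 missed: evidence against P0
                self.psel = min(self.psel + 1, self.max_val)
            elif g == "P1":        # P1 missed: evidence against P1
                self.psel = max(self.psel - 1, 0)

        def policy_for(self, set_idx):
            g = self.group(set_idx)
            if g != "follower":
                return g
            return "P1" if self.psel & self.msb else "P0"   # MSB picks winner

    # Example: a miss in a dedicated set nudges the counter; followers read it.
    duel = SetDueling(num_sets=2048)
    duel.on_miss(0)                  # set 0 is a dedicated P0 set: counter++
    print(duel.policy_for(7))        # follower; counter MSB now 1 -> "P1"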
Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: Reducing Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary
Challenges for Computer Architects
End of: technology scaling, frequency scaling, Moore's Law, ????
How do we address these challenges?
• Yield awareness (ArchShield)
• Hybrid memory: latency, energy, and power reduction for PCM
• Workload adaptive systems: low cost
The solution for all computer architecture problems is: "Adaptivity Through Testing"
Happy 75th, Yale!