
Leti Devices Workshop | Elisa Vianello | December 4, 2016

REVISITING THE MEMORY HIERARCHY FOR TOMORROW COMPUTING SYSTEMS


OUTLINE

• Ever Increasing Need for More Memory

• Rethinking the Memory Hierarchy with New NV Memory Technologies

• Rethinking the System Architecture?

EVER INCREASING NEED FOR MORE MEMORY

[Figure: analog vs. digital data, in billions of GB. M. Hilbert et al., Science, vol. 332, 2011]

2002: turning point between analog and digital formats

The amount of data being created is growing 40% a year into the next decade

[Figure labels: IoT, Mobile, Server]

By 2020 the digital universe is expected to contain nearly as many digital bits as there are stars in the universe…
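As a rough illustration of what 40% annual growth implies, the sketch below computes the doubling time and the cumulative growth factor under constant compound growth. The function names and the 10-year horizon are illustrative choices, not taken from the slides.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth: float, years: float) -> float:
    """Cumulative multiplication factor after `years` of compound growth."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    rate = 0.40  # 40% a year, the figure quoted on the slide
    print(f"doubling time: {doubling_time_years(rate):.2f} years")
    print(f"growth over 10 years (assumed horizon): x{growth_factor(rate, 10):.1f}")
```

At this rate the volume of data roughly doubles every two years.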

EVER INCREASING NEED FOR MORE MEMORY

Core count doubling ~ every 2 years

DRAM capacity doubling ~ every 3 years

Compute costs dropping faster than memory costs

[Figure labels: logic, memory]

Sources: Lim et al., ISCA 2009; IBM Deep Computing

The growth in data produced is outpacing the improvements in the density and cost of storage technologies

New memory technologies and a rethinking of system architecture focused on data storage and management are needed!
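The two doubling periods quoted above can be combined to see how DRAM capacity per core evolves. This is a minimal sketch that assumes both trends are clean exponentials; the function name and the sampled years are illustrative.

```python
def capacity_per_core(years: float,
                      core_doubling_years: float = 2.0,
                      dram_doubling_years: float = 3.0) -> float:
    """DRAM capacity per core relative to year 0 (1.0 means unchanged)."""
    cores = 2 ** (years / core_doubling_years)   # core count growth factor
    dram = 2 ** (years / dram_doubling_years)    # DRAM capacity growth factor
    return dram / cores


if __name__ == "__main__":
    for y in (0, 6, 12):
        print(f"year {y:2d}: relative capacity per core = {capacity_per_core(y):.2f}")
```

Under these assumptions the memory available per core halves every six years.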

CONVENTIONAL TYPES OF MEMORIES AND TECHNOLOGIES

[Figure: memory hierarchy from Processor to Storage. Axes: Cost/bit & Speed; Memory volume.]

• CPU: Registers (register files), Caches (L1-L4 SRAM)

• Main Memory: DRAM (MIM CAP, 1T 1R)

• Latency gap

• Storage: SSD (NAND Flash), HDD (MAGNETIC)

Data moves along all levels of the storage hierarchy before and after being processed at the processor

Can new NVMs (RRAM, PCM, MRAM…) help to reduce the latency gap and limit data movements across the memory hierarchy?

RRAM MEMORIES: PROGRAMMING WINDOW VS. ENDURANCE

[Figure: programming window Roff/Ron [#] (10^0 to 10^8) vs. endurance, cycle number Nc [#] (10^1 to 10^11), for CBRAM, OxRAM, PCRAM and MRAM. Labeled stacks: GST-Sb, Ta2O5/TaOx, GST-N, GST-O, Cu-GST/CuO/SiO2/BE, WOx, CNT, TaON, TaN/GeTe/Cu, Ti/TaSiO/Ru, Cu/PSE/Ru, Cu/Ta2O5/Pt, Ag/GeS2/HfO2, Golden, GaSbGe. E. Vianello, IEDM 2014; L. Perniola, IMW 2016]

• CBRAM: largest Roff/Ron for chalcogenide-based and/or bilayer stacks; best endurance for oxide-based stacks

• OxRAM: largest Roff/Ron for non-polar devices; best endurance for bipolar devices

• PCRAM: best endurance for GST-based devices

• MRAM: outlier

Universal memory does not exist!

Paper 4.5 on Monday afternoon: RRAM Endurance, Retention and Window Margin Trade-off

STORAGE CLASS MEMORIES

[Figure: memory hierarchy with storage class memory (SCM, PCM/RRAM) added: CPU (register files, L1-L4 SRAM), DRAM (MIM CAP, 1T 1R), SCM (PCM/RRAM), SSD (NAND Flash), HDD (MAGNETIC). Axis label: computing.]

Latency of access, from the processor down to storage: <0.001 µs (registers), <0.01 µs (cache), <0.03 µs, 0.05 µs-100 µs?, >100 µs, >10^3 µs

Processing @ SCM: from data compression to query processing

Requirements:

• Fast access speed, approaching DRAM

• Nonvolatile: retains data at power off

• High endurance (program/erase cycles)

• Solid state (no moving parts)

• Low cost/bit (approaching HDD)
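To make the latency gap concrete, the sketch below expresses the distance between two tiers in decades and checks whether a candidate latency falls inside it. The mapping of the unlabeled bounds to DRAM, SSD and HDD is an assumption made here for illustration, as are all names in the code.

```python
import math

# Latency bounds in microseconds, as read from the slide.
# Assumption: the three unlabeled bounds belong to DRAM, SSD and HDD.
LATENCY_US = {
    "registers": 0.001,
    "cache": 0.01,
    "DRAM": 0.03,    # assumed tier for the "<0.03 µs" label
    "SSD": 100.0,    # assumed tier for the ">100 µs" label
    "HDD": 1000.0,   # assumed tier for the ">10^3 µs" label
}


def gap_in_decades(fast_us: float, slow_us: float) -> float:
    """Orders of magnitude separating two access latencies."""
    return math.log10(slow_us / fast_us)


def fills_gap(candidate_us: float, fast: str = "DRAM", slow: str = "SSD") -> bool:
    """True if the candidate latency lies strictly between the two tiers."""
    return LATENCY_US[fast] < candidate_us < LATENCY_US[slow]


if __name__ == "__main__":
    gap = gap_in_decades(LATENCY_US["DRAM"], LATENCY_US["SSD"])
    print(f"DRAM to SSD gap: {gap:.1f} decades")
    print("1 µs device fills the gap:", fills_gap(1.0))
```

With these assumed assignments the gap spans about three and a half decades, which is the range the 0.05 µs-100 µs label targets.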

« TRANSFORMATIONAL » ABILITY OF PCM

[Figure: SET and RESET defect density (10^-5 to 10^-1) vs. current [µA] (0 to 20)]

[Figure: SET and RESET resistance [Ω] (10^4 to 10^7) vs. cycles (10^1 to 10^9)]

Annotations: 12 Mb 90 nm test chip; 2h bake at 230°C; RESET 30 ns - SET 800 ns

H. Y. Cheng et al., IEDM 2012; G. Navarro et al., IEDM 2013; H. Y. Cheng et al., JAP 2014; P. Zuliani et al., SSE 2015; V. Sousa et al., VLSI 2015

HYBRID MAIN MEMORY

[Figure: memory hierarchy with hybrid RRAM/PCM main memory: CPU (register files, L1-L4 SRAM), DRAM with hybrid RRAM/PCM, SCM (PCM/RRAM), SSD (NAND Flash), HDD (MAGNETIC)]

High capacity working memory

Hybrid memory: DRAM as a cache to RRAM or PCM technology, to achieve the best of multiple technologies

• RRAM/PCM: non-volatile, high-density, low-cost

• DRAM: fast, durable
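The benefit of using DRAM as a cache in front of RRAM or PCM can be estimated with the usual average-access-time relation (hit time plus miss rate times miss penalty). This is a simplified sketch: the 30 ns and 500 ns latencies and the hit rates are hypothetical values chosen for illustration, not measurements from the talk.

```python
def effective_latency_ns(hit_rate: float, dram_ns: float, nvm_ns: float) -> float:
    """Average access latency when DRAM acts as a cache for a slower NVM tier.

    Simplified model: every access pays the DRAM lookup, and a miss
    additionally pays the NVM access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return dram_ns + (1.0 - hit_rate) * nvm_ns


if __name__ == "__main__":
    DRAM_NS = 30.0   # hypothetical DRAM latency
    NVM_NS = 500.0   # hypothetical RRAM/PCM latency
    for hit_rate in (0.50, 0.90, 0.95, 0.99):
        avg = effective_latency_ns(hit_rate, DRAM_NS, NVM_NS)
        print(f"hit rate {hit_rate:.2f}: {avg:.1f} ns average")
```

The higher the DRAM hit rate, the closer the hybrid memory gets to DRAM speed while keeping the capacity of the non-volatile tier.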

Ti/HfO2 BASED-OxRAM

[Figure: LRS and HRS on 4 kbit array, MAD Leti OxRAM]

28nm CMOS

<100 ns programming time at 1 V with up to 100M cycles

Paper 4.7 on Monday afternoon: Filament-based RRAM

E. Vianello, IEDM 2014; A. Benoist, IRPS 2014 (ST/Leti)
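To put an endurance figure such as 100M cycles in perspective, a common back-of-the-envelope estimate divides the total writable volume by the sustained write bandwidth. The sketch below assumes ideal wear leveling by default; the 16 GiB capacity and the 1 GB/s write rate are hypothetical, and only the 1e8 cycle count comes from the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def lifetime_years(endurance_cycles: float,
                   capacity_bytes: float,
                   write_bandwidth_bytes_per_s: float,
                   wear_leveling_efficiency: float = 1.0) -> float:
    """Estimated lifetime of a memory under a constant write load.

    Assumes writes are spread over the array; an efficiency of 1.0
    corresponds to ideal wear leveling.
    """
    if write_bandwidth_bytes_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    writable = endurance_cycles * capacity_bytes * wear_leveling_efficiency
    return writable / write_bandwidth_bytes_per_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = lifetime_years(
        endurance_cycles=1e8,               # 100M cycles, from the slide
        capacity_bytes=16 * 2**30,          # hypothetical 16 GiB array
        write_bandwidth_bytes_per_s=1e9,    # hypothetical 1 GB/s write load
    )
    print(f"estimated lifetime: {years:.1f} years")
```

Real lifetimes are lower when writes concentrate on a few hot addresses, which is why the efficiency parameter is exposed.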

L2-L3 CACHE

[Figure: memory hierarchy with hybrid PCM/MRAM: CPU, cache L2-L3, DRAM, SCM (PCM/RRAM), SSD (NAND Flash), HDD (MAGNETIC)]

Replacing SRAM (L2-L3) with STT-MRAM

• Best candidate is STT-MRAM (fast and high endurance)

• Reduction of stand-by power consumption

4 Mbit STT-MRAM, read 3.3 ns @ 1.25 V, 65nm CMOS [Noguchi, Toshiba, ISSCC 2016]

REPLACING SRAM WITH STT-MRAM

[Figure: MAD Leti pSTT-MRAM, 4 kbit array]

STT-MRAM has demonstrated fast writing speed and high endurance; however, retention is still to be fully characterized:

• different retention extraction methods are compared

• a trade-off exists between thermal stability and programming speed

Paper 27.3 on Wednesday morning: Data Retention Extraction Methodology pSTT-MRAM
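The link between thermal stability and retention is often approximated with the Néel-Arrhenius relation, retention = tau_0 * exp(delta), where delta is the thermal stability factor. The sketch below uses that single-bit textbook relation with an assumed attempt time of 1 ns; it is not the extraction methodology of paper 27.3, and array-level targets require a larger delta than the single-bit value it returns.

```python
import math

ATTEMPT_TIME_S = 1e-9  # assumed attempt time tau_0 (about 1 ns is a common choice)
SECONDS_PER_YEAR = 365 * 24 * 3600


def retention_time_s(delta: float) -> float:
    """Mean retention time of a single bit for thermal stability factor delta."""
    return ATTEMPT_TIME_S * math.exp(delta)


def required_delta(retention_s: float) -> float:
    """Thermal stability factor needed for a target single-bit retention time."""
    return math.log(retention_s / ATTEMPT_TIME_S)


if __name__ == "__main__":
    ten_years = 10 * SECONDS_PER_YEAR
    print(f"delta for 10 years (single bit): {required_delta(ten_years):.1f}")
    print(f"retention at delta = 30: {retention_time_s(30):.0f} s")
```

A larger delta lengthens retention exponentially, but it also raises the energy barrier that the write current has to overcome, which is one way to read the trade-off with programming speed.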

RETHINKING THE SYSTEM ARCHITECTURE?

Neuromorphic systems

• Detect and predict patterns in complex data

• Visual or auditory data analysis

RRAM has been promoted by Leti (and many others!) to emulate synaptic plasticity: the ability of synapses to strengthen or weaken over time

• PCM: GST, GeTe, GST/HfO2

• CBRAM: GeS2/Ag

• OxRAM: HfO2/Ti (planar + VRAM)

RRAM TO IMPLEMENT ON-LINE LEARNING

In nature, synapses have a volatile component: is it useful for the learning process?

[Figure: conductance [S] vs. time [s], RRAM data compared with biological data; synaptic change has a volatile component]

[Figure: decoding of neural signals: highly noisy neurological data, ANN with RRAM based synapses]

The volatile component improves detection in highly noisy input data

Paper 16.6 on Tuesday morning: Short and Long-term Synaptic Plasticity Using OxRAM
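A synaptic weight with a volatile component can be pictured as a persistent conductance plus a part that decays after each programming event. The sketch below is a toy model with an assumed single-exponential decay and hypothetical parameter values; it is not the measured OxRAM behaviour from the talk.

```python
import math


def conductance(t_s: float, g_long: float, g_short: float, tau_s: float) -> float:
    """Conductance t_s seconds after a potentiation event.

    g_long  : persistent (long-term) component, in siemens
    g_short : volatile (short-term) component right after the event
    tau_s   : assumed decay time constant of the volatile part
    """
    if tau_s <= 0:
        raise ValueError("tau_s must be positive")
    return g_long + g_short * math.exp(-t_s / tau_s)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    G_LONG, G_SHORT, TAU = 1e-4, 5e-5, 10.0
    for t in (0.0, 10.0, 100.0):
        print(f"t = {t:5.1f} s: G = {conductance(t, G_LONG, G_SHORT, TAU):.3e} S")
```

In such a model an isolated, noisy input leaves only a transient trace, while repeated inputs would be needed to build up the persistent part.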

CONCLUSION

The data explosion is leading to a corresponding growth in data-centric applications (capture, classify, archive…). The adoption of new NVMs enables a rethinking of system architecture based on data storage and management.

However, universal memory does not exist: different NVM technologies have to be introduced in the storage hierarchy.

Challenges:

Design matched to the specific memory technology features

Non-volatility does not come for free

• static power vs. active power

• memory window vs. endurance

• memory window vs. data retention

CONCLUSION

Thanks to the new NVM technologies (PCM, RRAM, MRAM), persistent memory is getting closer to the compute center, avoiding wasting energy in data movement.

Leti, technology research institute
Commissariat à l’énergie atomique et aux énergies alternatives
Minatec Campus | 17 rue des Martyrs | 38054 Grenoble Cedex | France
www.leti.fr

EMBEDDED SYSTEMS: TOWARD DISTRIBUTED MEMORY

MCU power challenge: need for an ultra-low stand-by power design solution

[Figure labels: MCU (stand-by), Sensor, Radio, MCU (active). Source: Renesas, ASP-DAC 2014]

NVM merges with SRAM and logic: shut down SRAM & registers thanks to distributed NVM

[Figure labels: eFlash (NVM), logic, SRAM, nvSRAM, NV logic]

NV Flip-Flop [N. Jovanović, ISCAS 2016]: 28nm FDSOI + HfO2 based OxRAM

• thermal stability for smart card & automotive

• easy integration with advanced CMOS
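Why stand-by power dominates in a duty-cycled MCU can be seen from a simple weighted average of active and stand-by power. The sketch below uses hypothetical power values and duty cycle, and it ignores the energy needed to save and restore state in the NVM at each wake-up.

```python
def average_power_uw(duty_cycle: float, active_uw: float, standby_uw: float) -> float:
    """Average power of a device that is active for a fraction duty_cycle of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_uw + (1.0 - duty_cycle) * standby_uw


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    DUTY = 0.001              # active 0.1% of the time
    ACTIVE_UW = 10_000.0      # 10 mW when active
    STANDBY_SRAM_UW = 10.0    # SRAM and registers kept powered
    STANDBY_NVM_UW = 0.1      # SRAM and registers shut down, state held in NVM

    with_sram = average_power_uw(DUTY, ACTIVE_UW, STANDBY_SRAM_UW)
    with_nvm = average_power_uw(DUTY, ACTIVE_UW, STANDBY_NVM_UW)
    print(f"retention in SRAM: {with_sram:.2f} µW average")
    print(f"retention in NVM:  {with_nvm:.2f} µW average")
```

With these made-up numbers, stand-by accounts for about half of the average power until the volatile blocks can be shut down.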

X. Dong et al., A circuit-architecture co-optimization framework for evaluating emerging memory hierarchies, ISPASS 2013