EE271 Introduction to VLSI Design

Transcript of EE271 Introduction to VLSI Design

Page 1: EE271 Introduction to VLSI Design

Computing Performance:

The N3XT 1,000X

Department of Electrical Engineering

Stanford University

H.-S. Philip Wong

Collaborator: Subhasish Mitra

0 1 0 1

Page 2: EE271 Introduction to VLSI Design

World Relies on Electronics

2

Page 3: EE271 Introduction to VLSI Design

Abundant-data

World Relies on Electronics

3

Page 4: EE271 Introduction to VLSI Design

World Relies on Electronics

Internet of Everything

4

Page 5: EE271 Introduction to VLSI Design

Genomics

Smart Cities

Military Science

Finance

Security

Health Care

Government

World Relies on Electronics

5

Page 6: EE271 Introduction to VLSI Design

Genomics

Smart Cities

Military Science

Finance

Security

Health Care

Government

Computational demands exceed processing capability

World Relies on Electronics

6

Page 7: EE271 Introduction to VLSI Design

7

Many Walls Simultaneously

Power wall: [Plot: clock frequency vs. year, 10 MHz to 5 GHz, 1980 to 2013. Source: cpudb.stanford.edu]

Complexity wall: [Plot: hardware bugs vs. year. Source: Intel]

Memory wall: [Chart: execution time split between compute and memory access; 96% label]

Also: resilience wall, interconnect wall, cooling wall, …

Page 8: EE271 Introduction to VLSI Design

US National Academy of Sciences (2011)

8

Page 9: EE271 Introduction to VLSI Design

System integration

Device performance

Improve Computing Performance

9

Page 10: EE271 Introduction to VLSI Design

Option 1: Better Transistors

Few experimental demos

Transistors ≠ system

System integration

Device performance

10

Page 11: EE271 Introduction to VLSI Design

Option 2: Design Tricks

Limited “tricks”

Complexity → design bugs

Multi-cores

Power management

System integration

Device performance

11

Page 12: EE271 Introduction to VLSI Design

Improve Computing Performance

Multi-cores

Power management

Target: 1,000X performance

Business as usual insufficient

System integration

Device performance

12

Page 13: EE271 Introduction to VLSI Design

Solution: Nanosystems

Translate new nanotech

New devices

New fabrication

New sensors

imperfections?

large-scale fabrication?

13

Page 14: EE271 Introduction to VLSI Design

Solution: Nanosystems

Translate new nanotech into new systems

New devices

New fabrication

New sensors

global ILV

local ILVs

New architectures

14

Page 15: EE271 Introduction to VLSI Design

Solution: Nanosystems

Translate new nanotech into new systems

enable new applications

New devices

New fabrication

New sensors

global ILV

local ILVs

New architectures

15

Page 16: EE271 Introduction to VLSI Design

Abundant-Data Applications

16

Huge memory wall

[Chart: application execution time split between compute and memory access; 96% label]
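To see why a 96% share matters, here is a minimal Amdahl's-law sketch. It assumes the slide's 96% is the memory-access share of execution time (the "memory wall" label suggests this, but the transcript does not state it outright). The function name and the 1,000X factor applied to each part are illustrative choices, not taken from the slides.

```python
def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when a fraction of run time becomes `factor` times faster."""
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be within [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - fraction_accelerated
    return 1.0 / (untouched + fraction_accelerated / factor)


# Assumption: 96% of execution time is memory access, 4% is compute.
MEMORY_SHARE = 0.96

# Make only the compute part 1,000X faster (better transistors alone).
compute_only = amdahl_speedup(1.0 - MEMORY_SHARE, 1000.0)

# Make only the memory-access part 1,000X faster.
memory_only = amdahl_speedup(MEMORY_SHARE, 1000.0)

print(f"compute-only speedup: {compute_only:.2f}X")
print(f"memory-only speedup:  {memory_only:.2f}X")
```

Under that assumption, speeding up compute alone leaves the overall run time almost unchanged, which is the argument for attacking memory access and compute together.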

Page 17: EE271 Introduction to VLSI Design

Limited to 2-Dimensional Circuits

Computer Chips Today

17

Page 18: EE271 Introduction to VLSI Design

N3XT Nanosystems

Computation immersed in memory

18

Page 19: EE271 Introduction to VLSI Design

Memory

N3XT Nanosystems

Computation immersed in memory

Computing logic

Ultra-dense vertical connections

19

Page 20: EE271 Introduction to VLSI Design

Memory

N3XT Nanosystems

Computation immersed in memory

Impossible with today’s technologies

Computing logic

Ultra-dense vertical connections

20

Page 21: EE271 Introduction to VLSI Design

Nano-Engineered Computing Systems Technology

21

0 1 0 1

Page 22: EE271 Introduction to VLSI Design

Unique N3XT Technology

22

End-to-end

Isolated improvements inadequate

Chip stacking

New apps

Memories

Nanoscale cooling

Abundant-data apps

1D / 2D FETs, RRAM, MRAM

Architecture & software

Yield, reliability

New 3D fabrication

Existing efforts

N3XT

Page 23: EE271 Introduction to VLSI Design

Memory

Computing Logic

N3XT Nanosystems

Computation immersed in memory

Ultra-dense vertical connections

23

Page 24: EE271 Introduction to VLSI Design

Carbon Nanotube FET (CNFET)

24

CNT: d = 1.2 nm

Sub-litho

[Figure: CNFET images with gate labeled; scale bars 2 µm]

Energy Delay Product: ~ 10X benefit (IBM Power 7 model)

Page 25: EE271 Introduction to VLSI Design

CNFET Inverter

25

P+ Doped

N+ Doped

INPUT

Page 26: EE271 Introduction to VLSI Design

Big Promise, Major Obstacles

26

Mis-positioned CNTs

Metallic CNTs

Process advances alone inadequate

[Zhang IEEE TCAD 12]

Solution: Imperfection-immune design

Page 27: EE271 Introduction to VLSI Design

CNT Growth circa 2005

27

Highly mis-positioned

10 µm

Page 28: EE271 Introduction to VLSI Design

First Wafer-Scale Aligned CNT Growth

28

Quartz wafer with catalyst

Aligned CNT growth

Quartz wafer with CNTs

20 mm

99.5% aligned CNTs

Stanford Nanofabrication Facility

[Patil VLSI Tech. 08, IEEE TNANO 09]

Page 29: EE271 Introduction to VLSI Design

Wafer-Scale CNT Transfer

29

[Patil VLSI Tech. 08, IEEE TNANO 09]

High-temperature CNT growth (900 °C)

CNT transfer (120 °C)

Low-temperature circuit fabrication

[Images: CNTs before transfer (quartz) and after transfer (SiO2/Si); scale bars 2 µm]

Page 30: EE271 Introduction to VLSI Design

Mis-Positioned CNT-Immune NAND

30

[Layout figure: inputs A and B, output Out, Vdd, Gnd]

1. Grow CNTs

2. Extended gate, contacts

3. Etch gate & CNTs

4. Dope P & N regions

Etched region essential

Arbitrary logic functions

Graph algorithms

Page 31: EE271 Introduction to VLSI Design

VLSI Metallic CNT Removal

31

Chip-scale electrical breakdown

Universally effective

[Patil IEDM 09, IEEE TNANO 10, Shulaker ACS Nano 14]

[Figure: source and drain labeled; scale bars 2 μm and 5 μm]

99.99% m-CNT removal, 4% s-CNT removal

Page 32: EE271 Introduction to VLSI Design

New VMR

32

Arbitrary technology nodes: 10nm & beyond

[Shulaker IEDM 15]

Relaxed node → m-CNTs erased → scaled circuits

Record selectivity

99.99% m-CNTs erased, 1% s-CNTs erased
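The two selectivity figures (99.99% m-CNT removal with 4% s-CNT loss on the earlier slide, 1% s-CNT loss here) are easier to compare as absolute counts. The sketch below is a rough illustration only: the one-million CNT population and the one-third metallic fraction are assumptions brought in for the arithmetic, not numbers from the slides, and the function name is hypothetical.

```python
def surviving_cnts(total, metallic_fraction, m_removed, s_removed):
    """Return (metallic, semiconducting) CNT counts left after removal."""
    metallic = total * metallic_fraction
    semiconducting = total - metallic
    return metallic * (1.0 - m_removed), semiconducting * (1.0 - s_removed)


TOTAL = 1_000_000            # assumed population, for illustration only
METALLIC_FRACTION = 1.0 / 3  # assumption: roughly one third grow metallic

# Chip-scale electrical breakdown: 99.99% m-CNT, 4% s-CNT removal.
earlier = surviving_cnts(TOTAL, METALLIC_FRACTION, 0.9999, 0.04)

# New VMR: 99.99% m-CNT, 1% s-CNT removal.
new_vmr = surviving_cnts(TOTAL, METALLIC_FRACTION, 0.9999, 0.01)

for name, (m_left, s_left) in (("earlier", earlier), ("new VMR", new_vmr)):
    print(f"{name}: {m_left:.1f} metallic left, {s_left:.0f} semiconducting left")
```

With these assumptions, both approaches leave the same small number of metallic CNTs, while the newer one sacrifices fewer of the useful semiconducting CNTs.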

Page 33: EE271 Introduction to VLSI Design

Most Importantly

33

VLSI processing

No per-unit customization

VLSI design

Immune CNT library

Page 34: EE271 Introduction to VLSI Design

First Sub-system: ISSCC Demo

34

[Shulaker ISSCC 13, IEEE JSSC 14] Collaborator: Prof. G. Gielen, KU Leuven

Page 35: EE271 Introduction to VLSI Design

First Sub-system: ISSCC Demo

35

[Shulaker ISSCC 13, IEEE JSSC 14] Collaborator: Prof. G. Gielen, KU Leuven

Wafer with CNFET circuits

Robot

ISSCC Jack Raper Outstanding Technology Directions Paper

Sacha: CNT Controlled Hand-shaking Robot

Page 36: EE271 Introduction to VLSI Design

CNT Computer

36

[Shulaker Nature 13]

Page 37: EE271 Introduction to VLSI Design

CNT Computer

37

[Shulaker Nature 13]

Turing-complete processor: entirely CNFETs

Instruction Fetch, Data Fetch, ALU, Write-back

Page 39: EE271 Introduction to VLSI Design

Reproducible Results

39

80 ALUs

200 D-Latches

~ 1,800 CNFETs

~ 1,600 CNFETs

Waveforms overlaid

Page 40: EE271 Introduction to VLSI Design

High-Performance CNFETs

40

Doping

Current Drive

Contact Resistance

Scaling

Dielectric interactions

Page 41: EE271 Introduction to VLSI Design

High-Performance CNFETs

41

Major challenge: > 100 CNTs/µm

New result: > 100 CNTs/µm

High-density CNTs

Record ION density

Controlled variations

[Plot: ION (µA/µm), CNFET (Stanford lab) vs. Si FET (foundries)]

[Shulaker IEDM 14]

Page 42: EE271 Introduction to VLSI Design

42

High Performance Obstacles

Doping

Current Drive

Contact Resistance

Scaling

Dielectric interactions

Page 43: EE271 Introduction to VLSI Design

43

Complementary CNFET Logic

SS = 97 mV/dec

n-CNFET, p-CNFET

[Inverter schematic: VDD, GND, In, Out]

[Plot: VOUT (V) vs. VIN (V), VDD from 1.0 V, 0.8 V, 0.6 V, 0.4 V, 0.2 V]
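For context on the SS = 97 mV/dec figure, the sketch below computes the room-temperature thermal limit on subthreshold swing, ln(10)·kT/q, and compares it to the reported value. The 300 K temperature and the function name are assumptions chosen for illustration; the slide does not state a measurement temperature.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value


def thermal_ss_limit_mv_per_dec(temperature_k: float) -> float:
    """Thermionic limit on subthreshold swing, ln(10) * kT / q, in mV/decade."""
    thermal_voltage = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.log(10.0) * thermal_voltage * 1e3


REPORTED_SS = 97.0  # mV/dec, from the slide
limit = thermal_ss_limit_mv_per_dec(300.0)  # assumed temperature: 300 K
ratio = REPORTED_SS / limit

print(f"thermal limit at 300 K: {limit:.1f} mV/dec")
print(f"reported SS is {ratio:.2f}X the limit")
```

The ratio gives a rough sense of how much gate-control headroom remains between the demonstrated devices and an ideal thermionic switch.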

Page 44: EE271 Introduction to VLSI Design

44

High Performance Obstacles

Doping

Current Drive

Contact Resistance

Scaling

Dielectric interactions

Page 45: EE271 Introduction to VLSI Design

45

Recent Progress

Top-contact

Edge-contact

[Cao, Science 15 (IBM)]

Page 46: EE271 Introduction to VLSI Design

Memory

Computing Logic

N3XT Nanosystems

Computation immersed in memory

Ultra-dense vertical connections

46

Page 47: EE271 Introduction to VLSI Design

Many Nano-scale Innovations

47

Memory & logic devices

30 µm thick

Vertical metal nanowire arrays

Phase change: hotspots suppressed

Embedded cooling

3D Resistive RAM (RRAM)

<1 nm

MoS2

2D FETs: large-area monolayer MoS2

Page 48: EE271 Introduction to VLSI Design

New Memories

STT-MRAM: Spin torque transfer magnetic random access memory [Diagram labels: soft magnet, pinned magnet, tunnel barrier (oxide), current]

PCM: Phase change memory [Diagram labels: top electrode, bottom electrode, oxide isolation, switching region, phase change material]

RRAM: Resistive switching random access memory [Diagram labels: top electrode, bottom electrode, metal oxide, filament, oxygen ion, oxygen vacancy]

CBRAM: Conductive bridge random access memory [Diagram labels: active top electrode, bottom electrode, solid electrolyte, filament, metal atoms]

Random access, non-volatile, no erase before write

48

Page 49: EE271 Introduction to VLSI Design

Scalable Embedded Memory

Bi-layer TiOx (2.5nm) / HfOx (1.5nm)

Y. Wu, H. Yi, Z. Zhang, Z. Jiang, J. Sohn, S. Wong, H.-S. P. Wong, IEDM 2013 (Stanford)

B. Govoreanu et al., IEDM 2011 (IMEC)

Scalable: 10 nm

Scalable: 12 nm

49

Page 50: EE271 Introduction to VLSI Design

High Density 3D Memory

Stanford: IEDM ’12, ’13, VLSI ’13, ’14, DATE ’15, Nature Comm ’15

RRAM memory cells

G-RRAM (graphene thickness: 3 Å)

Pt-RRAM (Pt thickness: 25 nm)

Pt-RRAM (Pt thickness: 5 nm)

[Cross-section images: 1st and 2nd layers; materials labeled Al2O3, graphene, SiO2, TiN, Pt, HfOx; scale bars 40 nm and 5 nm]

50

Page 51: EE271 Introduction to VLSI Design

High Density 3D Memory

< 1 μA

1 – 2 V

5 ns

> 1G cycles

F = 5 nm

128 layers

64 Tb per chip

Stanford: IEDM ’12, ’13, VLSI ’13, ’14, DATE ’15, Nature Comm ’15
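The slide's F = 5 nm, 128 layers, and 64 Tb per chip can be tied together with a short density estimate. This is a back-of-the-envelope sketch: the 4F² cell footprint (a common crosspoint assumption), one bit per cell, decimal terabits, and the function name are all assumptions, not values stated on the slide.

```python
NM2_PER_MM2 = 1e12  # 1 mm = 1e6 nm, so 1 mm^2 = 1e12 nm^2


def array_area_mm2(capacity_bits, feature_nm, layers, cell_factor=4.0):
    """Array footprint for stacked memory with cell area cell_factor * F^2."""
    cell_area_nm2 = cell_factor * feature_nm ** 2
    bits_per_mm2 = layers * NM2_PER_MM2 / cell_area_nm2
    return capacity_bits / bits_per_mm2


# Slide values: F = 5 nm, 128 layers, 64 Tb per chip.
# Assumptions: 4F^2 cell, one bit per cell, 1 Tb = 1e12 bits.
area = array_area_mm2(capacity_bits=64e12, feature_nm=5.0, layers=128)
density_tb_per_mm2 = 64.0 / area

print(f"array footprint: {area:.1f} mm^2")
print(f"density: {density_tb_per_mm2:.2f} Tb/mm^2")
```

Under these assumptions the memory array alone fits in a few tens of square millimetres; peripheral circuitry and array efficiency are ignored.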

51

Page 52: EE271 Introduction to VLSI Design

Memory

Computing Logic

N3XT Nanosystems

Computation immersed in memory

Ultra-dense vertical connections

52

Page 53: EE271 Introduction to VLSI Design

First Logic + Memory Monolithic 3D

[Shulaker IEDM 14]

The “High-rise” chip

[Layer stack figure: Si-FET, RRAM, CNFET; layers labeled Logic, RAM, RAM, Logic]

Circuit demos

[Schematic: routing elements; word lines WL[0] to WL[3]; bit lines BL_1, BL_2; in1, in2, out1, out2; scale bar 200 µm]

53

Page 54: EE271 Introduction to VLSI Design

Millions of sensors

Memory

CNT computing logic

Ultra-dense vertical connections

Terabytes / second

Abundant sensor data: extensive, accurate classification

Interwoven Compute + Memory + Sensing

[M. Shulaker, Stanford. Unpublished]

61

To be published. Please keep in confidence

Page 55: EE271 Introduction to VLSI Design

Complement with Software Solutions

62

DSL = Domain-Specific Language

Co-optimized s/w + h/w

Runtime optimization

Learning: key architectural concept

Yield, reliability

Cross-Layer Resilience

DSL compiler

Page 56: EE271 Introduction to VLSI Design

Quantifying N3XT System Benefits

63

Heterogeneous nanotechnologies

Architecture design space

Physical design

Integrated thermal analysis

Yield, reliability

Page 57: EE271 Introduction to VLSI Design

Sweet Spot: Abundant-Data Apps.

64

PageRank app.

851X benefits

Energy: 37X

Exec. Time: 23X

[Bar charts: 2D vs. N3XT, 0% to 100%; N3XT values 2.7% and 4.3%]

IBM graph analytics

Data-intensive computing
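The numbers on this slide are mutually consistent, which a few lines of arithmetic make visible. The sketch treats the 851X figure as an energy-delay-product gain, meaning the product of the energy and execution-time gains; that reading is an interpretation of the slide rather than something it states, and the variable names are illustrative.

```python
ENERGY_GAIN = 37.0  # "Energy: 37X" from the slide
TIME_GAIN = 23.0    # "Exec. Time: 23X" from the slide

# Energy-delay product gain: both factors multiply.
edp_gain = ENERGY_GAIN * TIME_GAIN

# N3XT energy and execution time as a percentage of the 2D baseline.
energy_pct_of_2d = 100.0 / ENERGY_GAIN
time_pct_of_2d = 100.0 / TIME_GAIN

print(f"EDP gain: {edp_gain:.0f}X")
print(f"N3XT energy: {energy_pct_of_2d:.1f}% of 2D")
print(f"N3XT exec. time: {time_pct_of_2d:.1f}% of 2D")
```

The product matches the 851X headline, and the two percentages match the 2.7% and 4.3% values on the charts.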

Page 58: EE271 Introduction to VLSI Design

Sweet Spot: Abundant-Data Apps.

65

IBM graph analytics

Data-intensive computing

PageRank app.

851X benefits

Energy: 37X

Exec. Time: 23X

[Bar charts: 2D vs. N3XT, 0% to 100%, with zoomed N3XT insets (0% to 3% and 0% to 5%); legend: processor active, processor idle, memory access]

Page 59: EE271 Introduction to VLSI Design

Massive Benefits Require

- Not a logic device

- Not a memory device

- Not 3D integration

- Not thermal management

- Not new architectures

- Not yield and reliability management

66

Page 60: EE271 Introduction to VLSI Design

Massive Benefits Require

- Not a logic device

- Not a memory device

- Not 3D integration

- Not thermal management

- Not new architectures

- Not yield and reliability management

N3XT

67

Page 62: EE271 Introduction to VLSI Design

Conclusion

69

Nanosystems today

Game ON, to the era

N3XT 1,000X

Compute + memory densely interwoven

0 1 0 1

Page 63: EE271 Introduction to VLSI Design

Memory

N3XT Nanosystems

Computation immersed in memory

Computing logic

Ultra-dense vertical connections

70