Page 1

INVITED PLENARY TALK FOR VLSI TECHNOLOGY SYMPOSIUM 2011

TECHNOLOGY IMPACTS FROM THE NEW WAVE OF ARCHITECTURES FOR MEDIA-RICH WORKLOADS

Samuel Naffziger
AMD Corporate Fellow

August 26th, 2011 (Original presentation June 14th, 2011)

VLSI Technology Symposium | June 2011 | Public

Page 2

Outline

• Introduction
• The new workloads and demands on computation
• Characteristics of serial and parallel computation
• The Accelerated Processing Unit (APU) architecture
• APU architecture implications for technology
• Summary

Page 3

The Big Experience/Small Form Factor Paradox

Technology | Mid 1990s | Mid 2000s | Now: Parallel/Data-Dense
Display | 4:3 @ 0.5 megapixel | 4:3 @ 1.2 megapixels | 16:9 @ 7 megapixels
Content | Email, film & scanners | Digital cameras, SD webcams (1-5 MB files) | HD video flipcams, phones, webcams (1GB)
Online | Text and low res photos | WWW and streaming SD video | 3D Internet apps and HD video online, social networking w/HD files
Multimedia | CD-ROM | DVDs | 3D Blu-ray HD
Interface | Mouse & keyboard | Mouse & keyboard | Multi-touch, facial/gesture/voice recognition + mouse & keyboard
Battery Life* | 1-2 Hours | 3-4 Hours | All day computing (8+ Hours)
Workloads | Early Internet and Multimedia Experiences | Standard-definition Internet | Immersive and interactive performance

[Form Factors row: device images on the slide, no text recoverable]

*Resting battery life as measured with industry standard tests.

Page 4

Focusing on the experiences that matter

[Figure: Consumer PC Usage, horizontal bar chart, axis 0% to 100%. Activities listed, top to bottom:]

• Email
• Web browsing
• Office productivity
• Listen to music
• Online chat
• Watching online video
• Photo editing
• Personal finances
• Taking notes
• Online web-based games
• Social networking
• Calendar management
• Locally installed games
• Educational apps
• Video editing
• Internet phone

New Experiences

• Accelerated Internet and HD Video
• Simplified Content Management
• Immersive Gaming

Source: IDC's 2009 Consumer PC Buyer Survey

Page 5

People Prefer Visual Communications

Verbal Perception: Words are processed at only 150 words per minute

Visual Perception: Pictures and video are processed 400 to 2000 times faster

Augmenting Today's Content:

• Rich visual experiences
• Multiple content sources
• Multi-Display
• Stereo 3D

Page 6

The Emerging World of New Data Rich Applications

The Ultimate Visual Experience™
Fast Rich Web content, favorite HD Movies, games with realistic graphics

Communicating
• IM, Email, Facebook
• Video Chat, NetMeeting

Using photos
• Viewing & Sharing
• Search, Recognition, Labeling
• Advanced Editing

Using video
• DVD, BLU-RAY™, HD
• Search, Recognition, Labeling
• Advanced Editing & Mixing

Gaming
• Mainstream Games
• 3D games

Music
• Listening and Sharing
• Editing and Mixing
• Composing and compositing

Applications shown on the slide:
• ArcSoft TotalMedia® Theatre 5
• ArcSoft MediaConverter® 7
• CyberLink MediaEspresso 6
• CyberLink PowerDirector 9
• Corel VideoStudio Pro
• Corel Digital Studio 2010
• Internet Explorer 9
• Microsoft® PowerPoint® 2010
• Windows Live Essentials
• Codemasters F1 2010
• Nuvixa Be Present
• ViVu Desktop Telepresence
• Viewdle Uploader

Page 7

New Workload Examples: Changing Consumer Behavior

• 24 hours of video uploaded to YouTube every minute
• Approximately 9 billion video files owned are high-definition
• 50 million + digital media files added to personal content libraries every day
• 1000 images are uploaded to Facebook every second

Page 8

What Are the Implications for Computation?

• Insatiable demand for high bandwidth processing
  – Visual image processing
  – Natural user interfaces
  – Massive data mining for associative searches, recognition
• Some of these compute needs can be offloaded to servers, some must be done on the mobile device
  – Similar compute needs and massive growth in both spaces

How must CPU architecture change to deal with these trends?

Page 9

Serial Computation

Serial Code (loops, branches and conditional evaluation):

    i=0
    i++
    load x(i)
    fmul
    store
    cmp i (16)
    bc
    …

Conditional branches

[Figure: 35 Years of Microprocessor Trend Data. Series plotted: Transistors (thousands), Single-thread Performance (SpecINT), Frequency (MHz), Typical Power (Watts), Number of Cores.]

Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond and C. Batten

Page 10

Parallel Computation

Data Parallel Code

Loop 1M times for 1M pieces of data:

    i=0
    i++
    load x(i)
    fmul
    store
    cmp i (1000000)
    bc

2D array representing very large dataset:

    i,j=0
    i++
    j++
    load x(i,j)
    fmul
    store
    cmp j (100000)
    bc
    cmp i (100000)
    bc
    …

[Figure: GFLOPs Trend. Peak GFlops (SPFP), 0 to 8000, versus Years, 2005 to 2014, for GPU and CPU. Later years are AMD projections.]
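
The contrast between the serial loop on the previous slide and the data-parallel loops here can be sketched in a few lines of Python. This is an illustration only, not code from the talk: the function names, the `lanes` parameter, and the choice of a scalar multiply standing in for `fmul` are all assumptions. The point it shows is that when no element depends on any other, the same operation can be split across independent lanes and still give the serial answer.

```python
# Illustrative sketch; names and the "lanes" parameter are hypothetical.

def scale_serial(x, k):
    """Serial form: one element per trip, with a counter and a
    compare-and-branch each time around the loop."""
    out = [0.0] * len(x)
    i = 0
    while i < len(x):        # cmp i (N) / bc
        out[i] = x[i] * k    # load x(i) / fmul / store
        i += 1               # i++
    return out


def scale_data_parallel(x, k, lanes=4):
    """Data-parallel form: the same multiply applied to independent
    strided chunks. Each chunk could run on its own SIMD lane or thread
    because no element depends on another."""
    chunks = [x[start::lanes] for start in range(lanes)]
    partial = [[v * k for v in chunk] for chunk in chunks]
    out = [0.0] * len(x)
    for lane, values in enumerate(partial):
        out[lane::lanes] = values   # scatter results back in order
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(scale_serial(data, 2.0))
    print(scale_data_parallel(data, 2.0))
```

Here the chunks are still processed one after another; on real hardware each lane would execute concurrently, which is where the throughput gain comes from.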

Page 11

GPU/CPU Design Differences

CPU (Serial compute)

Lots of instructions, little data
• Out of order exec, Branch prediction
• Few hardware threads

Weak performance gains through density

Maximize speed with fast devices

GPU (parallel compute)

Few instructions, lots of data
• Single Instruction Multiple Data
• Extensive fine-threading capability

Nearly linear performance gains with density

Maximize density with cool devices

Page 12

Three Eras of Processor Performance

Single-Core Era
Enabled by:
• Moore's Law
• Voltage & Process Scaling
• Micro Architecture
Constrained by:
• Power
• Complexity
[Figure: Single-thread Performance vs. Time, marked "we are here"]

Multi-Core Era
Enabled by:
• Moore's Law
• Desire for Throughput
• 20 years of SMP arch
Constrained by:
• Power
• Parallel SW availability
• Scalability
[Figure: Throughput Performance vs. Time (# of Processors), marked "we are here"]

Heterogeneous Systems Era
Enabled by:
• Moore's Law
• Abundant data parallelism
• Power efficient GPUs
Temporarily constrained by:
• Programming models
• Communication overheads
• Workloads
[Figure: Targeted Application Performance vs. Time (Data-parallel exploitation), marked "we are here"]

Page 13

Heterogeneous Computing with an APU Architecture

2010 IGP-based ("Danube") Platform
[Diagram: CPU Chip (CPU Cores, UNB, MC) with DDR3 DIMM Memory; separate chip with GPU, UVD and SB Functions; optional PCIe® GPU.]
Bandwidth pinch points and latency hold back the GPU capabilities

2011 APU-based ("Llano") Platform
[Diagram: APU Chip (CPU Cores, UNB / MC, GPU, UVD) with DDR3 DIMM Memory; FCH Chip (SB Functions); optional PCIe® GPU.]
Graphics requires memory BW to bring full capabilities to life

[Bandwidth labels shown in the diagrams: ~17 GB/sec, ~7 GB/sec, ~27 GB/sec]

Integration Provides Improvement
• Eliminate power and latency of extra chip crossing
• 3X bandwidth between GPU and Memory!
• Same sized GPU is substantially more effective
• Power efficient, advanced technology for both CPU and GPU

Page 14

The Challenges of Integration

[Figure: Performance vs. Density]

CPU: Thick, fast metal; Big devices
GPU: Dense, thin metal; small devices

CPU flop area = 2.14
GPU flop area = 1.0

Flop count for 4 Llano CPU cores = 0.66M
Flop count for Llano GPU = 3.5M
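
The four numbers on this slide can be combined with simple arithmetic. The sketch below is an added illustration, not part of the talk: it assumes the flop areas are relative areas per flop and that total flop area is count times area, and the function and constant names are hypothetical.

```python
# Illustrative arithmetic on the slide's figures; names are hypothetical.

CPU_FLOP_AREA = 2.14     # relative area per flop (slide value)
GPU_FLOP_AREA = 1.0      # relative area per flop (slide value)
CPU_FLOP_COUNT = 0.66e6  # four Llano CPU cores (slide value)
GPU_FLOP_COUNT = 3.5e6   # Llano GPU (slide value)


def flop_summary():
    """Return (GPU/CPU flop-count ratio, GPU/CPU total flop-area ratio),
    assuming total flop area = count * relative area per flop."""
    count_ratio = GPU_FLOP_COUNT / CPU_FLOP_COUNT
    cpu_total_area = CPU_FLOP_COUNT * CPU_FLOP_AREA
    gpu_total_area = GPU_FLOP_COUNT * GPU_FLOP_AREA
    return count_ratio, gpu_total_area / cpu_total_area


if __name__ == "__main__":
    count_ratio, area_ratio = flop_summary()
    print(f"GPU has {count_ratio:.1f}x the flops of the CPU cores")
    print(f"GPU flops occupy {area_ratio:.1f}x the total flop area")
```

Under these assumptions the GPU carries roughly 5.3 times as many flops as the four CPU cores in only about 2.5 times the total flop area, which is the density advantage the slide is pointing at.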

Page 15

How to Balance the Metal Stack?

[Figure: Cu Resistivity. Resistivity (uohm-cm), 1.5 to 2.5, versus Line Width (um), 0 to 1, with barrier and without barrier.]

[Figure: Performance vs. Density]

With the 20nm node, even local metal will be seeing a large RC increase, making compromises more difficult

• Add metal layers?
  – Thin, dense layers for the GPU
  – Thick, low resistance layers for the CPU
  – Cost issues?
  – Via resistance?

Technology improvements in BEOL are required

Page 16

R vs C?

Given the grim RC prognosis, should we be re-shaping either the aspect ratio or stack composition? Maybe.

However, there are times when RC is important, but there are also many times when only C matters

Moreover, metal stack aspect ratio is more or less maxed out, so that leaves stack composition

Different products will emphasize different metal stacks

[Diagram labels: M1 – M1, M1 – M2, M5 – M6, M5 – M1]

[Figure: Track Availability vs Distance. Available tracks/100um, 0 to 9000, versus Norm Dist/ps, 0 to 140.]
GPU Stack: 8-1X; 1-2X
CPU Stack: 2-1X; 2-1.3X; 4-2X; 1-4X

Page 17

The growth in metal layer count

[Figure: Number of CPU Metal Levels vs Technology Node. Metal Levels, 0 to 14, versus Technology Node, 10 to 1000 (log scale).]

Page 18

Factors driving growth in Metal Layers

• Interconnect requirements from basic scaling
  – Transistor count N scales as S^2 (with fixed die size)
  – Total interconnect length (in lambda) scales as N to a power >1 because of semi-global and global routes. Therefore, interconnect length (in mm) increases at a rate <1/S
• Non-scaling design rules
  – In order to achieve tight pitch, more restrictive design rules are imposed that significantly reduce the routeability of metal layers: unidirectional metal, increased overlap requirements, restrictive T2T and T2L rules
  – Each metal layer is "worth less" in terms of routeability: need more metal layers
• Reverse scaling
  – Long distance routes require lower RC than can be accommodated by scaled metal
  – So, move routes to thicker layers, but fewer tracks available, so pressure on layer count

Page 19

Factors driving increase in Metal Layers

• Electromigration/Power Supply Grid
  – As cross section scales with S^2, and current increases as Vdd drops, current densities increase dramatically
  – Higher Via R, Metal resistances significantly degrade
  – Drives improved E-M sophistication, process techniques (alloys/barriers), denser power networks
  – Power Gating and Power Islands may drive the need for multiple supply grids
  – More metal consumed by power supply grid
• All of the above can have the effect of increasing the number of metal layers
  – But it can be a tradeoff of Metal layers vs die size and/or route time

Page 20

Device Optimization

[Figure: Performance vs. Density, CPU and GPU]

[Figure: APU Vt Mix. Stacked bars, 0 to 1.2, for Llano CPU and Llano GPU, split by device type: LVT, LC-RVT, RVT, HVT, LC-HVT.]

To achieve breakthrough APU performance, the Llano GPU has ~5X the flops and ~5X the device count of the CPUs

[Figure: FPG vs. Ioff (Speed vs. Leakage). FPG (RO speed), 0 to 250, versus Ioff (nA/um room temp) at 175, 50, 20, 4.3, 2.7, 0.5, 0.4, for devices LVT, LC-RVT, RVT, HVT, LC-HVT. Annotations: "device range" and "desired device range".]

Broader span of devices required

A broader device suite is required

Page 21

Power Transfers

[Figure: two panels, "Balanced workload" and "GPU-centric data parallel workload", each with a scale from 85.0 to 110.0]

Voltage range is critical to enabling the efficient power transfers that make for compelling APU performance

Page 22

Power Transfers

[Figure: two panels, "CPU-centric serial workload" and "Balanced workload", each with a scale from 85.0 to 110.0]

Voltage range is critical to enabling the efficient power transfers that make for compelling APU performance

Page 23

Operating Voltage Range

[Figure: E/op vs. V. Energy per operation, 0 to 2.5, versus voltage, 0.7V to 1.3V.]

Operating voltage requirements:
• Low voltage necessary for power efficiency
• High voltage necessary for a snappy user experience enabled by turbo mode
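
To get a feel for why low voltage matters for efficiency, the sketch below evaluates the textbook dynamic switching relation, energy proportional to C·V², across the voltage range on the slide's axis. This is an assumption added for illustration: the slide's actual E/op curve is not recoverable from the transcript and presumably also reflects leakage and other effects that this model ignores. The function names and the 0.7 V reference point are hypothetical choices.

```python
# Illustrative model only: assumes switching energy scales as C * V**2
# and ignores leakage, so it will not match the slide's curve exactly.

VOLTAGES = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]  # axis range from the slide


def relative_dynamic_energy(v, v_ref=0.7):
    """Energy per operation relative to v_ref under the C * V**2 assumption."""
    return (v / v_ref) ** 2


def energy_table(voltages=VOLTAGES):
    """Pair each voltage with its relative dynamic energy per operation."""
    return [(v, relative_dynamic_energy(v)) for v in voltages]


if __name__ == "__main__":
    for v, e in energy_table():
        print(f"{v:.1f} V -> {e:.2f}x energy/op relative to 0.7 V")
```

Under this simple model, running at 1.3 V costs roughly 3.4 times the switching energy per operation of running at 0.7 V, which is why the wide voltage range is framed as a trade between efficiency and turbo-mode responsiveness.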

Page 24

Operating Voltage Challenges

Power Density Limited GPU

To maintain cost effective performance growth with technology node, the GPU must:
  – Hold power density constant
  – Exploit density gains to add compute units

This necessarily drives operating voltage down

This would be good for energy efficiency except …
  – Variation impacts are much greater at low voltage

[Figure: 40nm to 14nm. Frequency, Power Density and Nominal Voltage versus node (40nm, 28nm, 20nm, 14nm). Voltage axis 0.700V to 1.000V; voltage labels 0.952V, 0.915V, 0.886V, 0.805V.]

[Figure: 40nm GPU Frequency Data (Juniper Frequency Data). Frequency, 0MHz to 1000MHz, versus voltage, 0.85V to 1.15V, for parts 1 to 18. Annotation: frequency spread increases at low voltage.]

Page 25

The Operating Voltage Challenge

• Many barriers to maintaining both high and low voltage as technology scales
  – TDDB vs. SCE control
  – ULK breakdown vs. denser pitches
  – Variation control
• FD devices should enable maintaining the functional range for a generation or two
  – Will turbo modes be too compromised?
  – What's next?

[Figure: device cross-sections labeled Poly, Fin, BOX; scale from 85.0 to 110.0]

Page 26

Cost issues

Page 27

Lithography evolution

R = k1·λ/NA

• λ is saturating
• NA is saturating
• k1 limit is about 0.25

[Figure: log-log plot, both axes 10 to 1000, against Technology Node or min Feat. size, with a "1:1" reference line. Exposure sources marked: g-line 430nm, i-line 365nm, KrF 248nm, ArF 193 nm, ArF 193 nm immersion. Annotations: NA < 0.8 and NA ~1.35.]

Page 28

Scaling implications

R = k1·λ/NA
  – λ stuck at 193nm for now, NA at 1.35, and k1 limit at 0.25
  – Reducing k1 to <0.3 has very considerable cost:
  – Much OPC and RDR needed to achieve tight pitches

[Figure label: K1=0.36]

  – Net, a significant erosion of pitch-based scaling entitlement
  – Scale factors are proprietary … but block area scaling > pitch scaling^2!

Fundamental pitch limitation for 193nm lithography is ~80nm
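
The Rayleigh relation on this slide can be evaluated directly. The sketch below is an added illustration: it assumes R is the minimum half-pitch (so the printable pitch is 2R), which is the usual reading of the criterion but is not stated on the slide, and the function names are hypothetical. The k1 value of 0.28 is chosen here only because it lands near the slide's ~80nm figure; the talk does not say which k1 that figure corresponds to.

```python
# Illustrative evaluation of R = k1 * wavelength / NA.
# Assumes R is a half-pitch, so minimum pitch = 2 * R.


def min_half_pitch_nm(k1, wavelength_nm, na):
    """Rayleigh resolution: smallest printable half-pitch in nm."""
    return k1 * wavelength_nm / na


def min_pitch_nm(k1, wavelength_nm, na):
    """Smallest printable full pitch in nm under the half-pitch assumption."""
    return 2.0 * min_half_pitch_nm(k1, wavelength_nm, na)


if __name__ == "__main__":
    wavelength_nm, na = 193.0, 1.35   # values quoted on the slide
    for k1 in (0.36, 0.30, 0.28, 0.25):
        pitch = min_pitch_nm(k1, wavelength_nm, na)
        print(f"k1={k1:.2f}: minimum pitch ~{pitch:.1f} nm")
```

With these inputs, the theoretical k1 limit of 0.25 gives a pitch of about 71.5 nm, while a k1 just under 0.3 gives about 80 nm, which is consistent with the slide's remark that pushing k1 below 0.3 is costly.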

Page 29

Pitch splitting

Decomposing a layer into two effectively doubles pitch, resolving k1 issue and allowing complex shapes

[Figure labels: 72nm, 144nm]

Decomposition requires significant CAD effort to break the patterns into two printable layers

Page 30

Pitch splitting

Decomposing a layer into two effectively doubles pitch, resolving k1 issue and allowing complex shapes

[Figure labels: 72nm, 144nm]

Decomposition requires significant CAD effort to break the patterns into two printable layers

However, now have within-layer overlay issues, and min space can be a Vmax issue, or a Cap issue

• 4σ min space ~16.5nm (PS @ 72nm) versus ~28nm (SE @ 80nm)
• 4σ max space ~44.8nm (PS @ 72nm) versus ~41.3nm (SE @ 80nm)
• ~Ccap variation: +85% / -30% over nominal for PS @ 72nm
• ~Ccap variation: +25% / -15% over nominal for SE @ 80nm

Page 31

Why do we care?

Foundries have settled on a 28nm node with a ~4:3 M1X:Poly pitch ratio
  – Typical design rules assuming 0.7x scaling:

Design Rule | 28nm | Desired 20nm
Contacted Poly Pitch | ~113nm | ~80nm
M1X Pitch | ~90nm | ~64nm

• 20nm node CPP is doable, but probably want >80nm for margin and gate oversize capability
• Desired 1X metal scaling to 20nm is below pitch split limit
• Can get "true" scaling and pitch split 1X metals
  – GPUs have up to 8 1X metals
  – CPUs have 2-5 1X metals
• Choice: significant cost adder for "true" scaling, or reduced cost and reduced scaling
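
The "Desired 20nm" column follows from the 28nm values and the 0.7x scale factor. The sketch below is an added illustration with hypothetical names: it applies the 0.7x factor and then compares the slide's rounded 20nm targets against the ~80nm single-exposure pitch limit quoted on the "Scaling implications" slide. Treating that limit as a hard threshold is a simplification.

```python
# Illustrative check of the slide's table; names are hypothetical.

SCALE = 0.7                              # per-node linear scale factor (slide)
SINGLE_EXPOSURE_PITCH_LIMIT_NM = 80.0    # ~80nm limit quoted for 193nm litho

RULES_28NM = {"Contacted Poly Pitch": 113.0, "M1X Pitch": 90.0}
DESIRED_20NM = {"Contacted Poly Pitch": 80.0, "M1X Pitch": 64.0}  # slide values


def scale_rules(rules, scale=SCALE):
    """Apply a linear scale factor to every pitch in the rule set."""
    return {name: pitch * scale for name, pitch in rules.items()}


def below_single_exposure_limit(pitch_nm, limit_nm=SINGLE_EXPOSURE_PITCH_LIMIT_NM):
    """True if the pitch is tighter than a single 193nm exposure can print."""
    return pitch_nm < limit_nm


if __name__ == "__main__":
    for name, pitch in scale_rules(RULES_28NM).items():
        target = DESIRED_20NM[name]
        flag = "needs pitch split" if below_single_exposure_limit(target) else "single exposure"
        print(f"{name}: 0.7x gives {pitch:.1f} nm, slide target ~{target:.0f} nm ({flag})")
```

The scaled values come out at 79.1 nm and 63.0 nm, matching the slide's rounded ~80nm and ~64nm. Only the 1X metal pitch falls clearly below the single-exposure limit, which is why the cost discussion centers on pitch splitting the 1X metals.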

Page 32

Other cost considerations

• MOL: Conventional contacts at <90nm CPP don't work, and a more complex scheme is required, analogous to LI used by Intel at 32nm (+2 masks)
• BEOL Options:
  – Only scale 1X metals to ~80nm pitch, get reduced scaling but lower cost
  – Add metal layers at 80nm pitch to recover scaling; increased cost and cycle time
  – Use some combination of pitch split and non-pitch split layers to obtain greater scaling at higher cost
• Key questions to resolve:
  – Additional cost of pitch split layers
  – Additional defectivity of pitch split layers (~64 vs ~80nm pitch)
  – Whether or not to pitch split vias

Page 33

Relative cost experiment

Page 34

What about EUV?

At λ = 13.5nm, EUV should make lithography simple, and eliminate the need for pitch splitting, as well as most OPC. Right?

Maybe:
  – Very expensive capital equipment
  – Complex, expensive reflective masks
  – Very low throughput due to illuminator output >10X below requirements
  – Very high power requirements

These issues may be solvable, but are unlikely to be solved by the leading edge of 14nm

Other forms of advanced lithography such as MEBL look attractive, but are even further behind EUV.

Page 35

3D Integration to the Rescue?

[Diagram: 3D die stack. Labels: Heat Sink; TIM (Thermal Interface Material); DRAM; Micro-bumps; Through Silicon Vias (TSVs); GPU Die with Metal Layers; CPU Die with Metal Layers; Analog Die (SB, Power) with Metal Layers; Package Substrate; South Bridge.]

Page 36

3D Integration to the Rescue?

Stacking offers many attractive benefits
• Higher bandwidth to local memory
• Enables parallel and serial compute die to be in their own separate optimized technology: interconnect speed vs. density, device optimization etc.
• Allows IO and southbridge content to remain in older, more analog-friendly technology

Page 37

3D Integration Challenges

Economical 3D stacking in high volume manufacturing presents many challenges
• Benefits must exceed the additional costs of TSVs, and yield fallout
• Logistics of testing and assembling die from multiple sources can be immense
• Countless mechanical and thermal issues to solve in high volume mfg

Clearly 3D provides compelling solutions to many problems, but the barriers to entry mean heavy R&D $$ and partnerships required

[Diagram: 3D die stack. Labels: Heat Sink; TIM (Thermal Interface Material); DRAM; Die to Die Vias; Through Silicon Vias (TSVs); CPU Die with Metal Layers; GPU Die with Metal Layers; Analog Die (SB, Power) with Metal Layers; Package Substrate; South Bridge.]

Page 38

Summary

• Insatiable demand for high bandwidth computation
  – Visual image processing
  – Natural user interfaces
  – Massive data mining for associative searches, recognition
• Some of these compute needs can be offloaded to servers, some must be done on the mobile device
  – Similar compute needs and massive growth in both spaces
  – Combined serial and parallel computation architectures are key in both spaces
• Huge technology challenges to meeting this opportunity
  – Interconnect scaling is hitting a wall that must be overcome
  – A broad device suite is necessary that operates efficiently at low voltage while enabling high speed for response time
  – Cost issues present a very real barrier to further scaling
  – 3D integration offers a promising long term solution