Thousand Core Chips: A Technology Perspective. Shekhar Borkar, Intel Corp. June 7, 2007.

Page 2:

Thousand Core Chips

A Technology Perspective

Shekhar Borkar

Intel Corp.

June 7, 2007

Page 3:

Outline

Technology outlook

Evolution of multi-core: thousands of cores?

How do you feed thousands of cores?

Future challenges: variations and reliability

Resiliency

Summary

Page 4:

Technology Outlook

High Volume Manufacturing (year): 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018

Technology Node (nm): 90, 65, 45, 32, 22, 16, 11, 8

Integration Capacity (BT): 2, 4, 8, 16, 32, 64, 128, 256

Delay = CV/I scaling: 0.7, ~0.7, >0.7; delay scaling will slow down

Energy/Logic Op scaling: >0.35, >0.5, >0.5; energy scaling will slow down

Bulk Planar CMOS: moves from high probability to low probability over the period

Alternate, 3G etc.: moves from low probability to high probability over the period

Variability: Medium, High, Very High

ILD (K): ~3, <3; reduce slowly towards 2-2.5

RC Delay: 1, 1, 1, 1, 1, 1, 1, 1

Metal Layers: 6-7, 7-8, 8-9; 0.5 to 1 layer per generation
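
The node and capacity rows follow a simple compounding pattern, which the short sketch below tries to reproduce. It assumes (this is not stated on the slide) that each two-year generation shrinks the node by about 0.7x and doubles integration capacity; the names `project`, `SLIDE_NODES_NM`, and `SLIDE_CAPACITY_BT` are illustrative only.

```python
# Values copied from the slide.
YEARS = list(range(2004, 2020, 2))
SLIDE_NODES_NM = [90, 65, 45, 32, 22, 16, 11, 8]
SLIDE_CAPACITY_BT = [2, 4, 8, 16, 32, 64, 128, 256]


def project(start, factor, generations):
    """Compound `start` by `factor` once per generation."""
    values = [start]
    for _ in range(generations - 1):
        values.append(values[-1] * factor)
    return values


# Assumption: capacity doubles and the node shrinks ~0.7x per generation.
capacity = project(2, 2, len(YEARS))
ideal_nodes = project(90.0, 0.7, len(YEARS))

for year, node, ideal, cap in zip(YEARS, SLIDE_NODES_NM, ideal_nodes, capacity):
    print(f"{year}: slide {node} nm, 0.7x model {ideal:.1f} nm, {cap} BT")
```

Under these assumptions the doubling model matches the capacity row exactly, and the 0.7x model stays within a few percent of the rounded node names.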

Page 5:

Terascale Integration Capacity

[Figure: Transistors (millions), log scale from 1.E+00 to 1.E+06, versus year, 2001 to 2017. Labels: total transistors, 300mm2 die; ~1.5B logic transistors; ~100MB cache.]

100+B transistor integration capacity

Page 6:

Scaling Projections

[Figure, panel 1: Frequency (GHz), 0 to 50, versus year, 2001 to 2017. Curves: 1.5X ideal, 1.25X realistic.]

Freq scaling will slow down

[Figure, panel 2: Vdd (Volts), 0.0 to 1.2, versus year, 2001 to 2017. Curves: 0.7X ideal, realistic.]

Vdd scaling will slow down

[Figure, panel 3: Power (Watts), 0 to 1,400, versus year, 2001 to 2017, for a 300mm2 die. Annotation: power too high.]

Power will be too high

Page 7:

Why Multi-core? - Performance

[Figure, left: Performance (X) versus Area (X) or Power (X), both axes from 1 to 10, slope ~ 0.5.]

Pollack's Rule: 2X Power = 1.4X Performance

[Figure, right: Relative performance, log scale from 1 to 1,000, versus year, 2001 to 2017. Curves: Single Core, Multi-Core (potential); the gap is labeled > 10X.]

Ever increasing single cores yield diminishing performance in a power envelope

Multi-cores provide potential for near-linear performance speedup
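
The slope of ~0.5 on a log-log plot means performance grows roughly as the square root of area or power. The sketch below works that out and compares it with the ideal multi-core case. It assumes a perfectly parallel workload and equal-sized cores; the function names are illustrative, not from the talk.

```python
import math


def pollack_performance(area: float) -> float:
    """Relative single-core performance for a core of the given relative area."""
    return math.sqrt(area)


def multicore_potential(total_area: float, core_area: float = 1.0) -> float:
    """Ideal throughput when total_area is split into equal cores.

    Assumption: the workload is perfectly parallel, so throughput is
    simply (number of cores) x (per-core performance).
    """
    cores = total_area / core_area
    return cores * pollack_performance(core_area)


for budget in (2, 4, 16):
    single = pollack_performance(budget)
    multi = multicore_potential(budget)
    print(f"{budget}X area: single core {single:.2f}X, multi-core potential {multi:.2f}X")
```

Doubling the area of one core gives about 1.4X, as on the slide, while spending the same area on more unit-sized cores scales linearly in this idealized model.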

Page 8:

Why Dual-core? - Power

Rule of thumb: a 1% change in voltage gives a 1% change in frequency, a 3% change in power, and a 0.66% change in performance

[Figure: a single-core die with cache beside a dual-core die with cache.]

Single core: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1

Dual core: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8

In the same process technology…
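
The dual-core numbers can be approximated from the rule of thumb. The sketch below assumes the rule extrapolates linearly to a 15% voltage change and that both cores are kept fully busy; neither assumption is stated on the slide, and the names are illustrative.

```python
# Rule-of-thumb sensitivities from the slide: percent change per 1% voltage change.
POWER_PER_PCT = 3.0
PERF_PER_PCT = 0.66


def scaled_core(voltage_change_pct: float) -> tuple[float, float]:
    """Return (power, performance) of one core relative to the baseline core.

    Assumption: the rule of thumb is applied linearly over the whole change.
    """
    power = 1.0 + POWER_PER_PCT * voltage_change_pct / 100.0
    perf = 1.0 + PERF_PER_PCT * voltage_change_pct / 100.0
    return power, perf


core_power, core_perf = scaled_core(-15.0)
dual_power = 2 * core_power  # two cores at reduced voltage and frequency
dual_perf = 2 * core_perf    # assumes both cores are fully utilized

print(f"dual-core power = {dual_power:.2f}, performance = {dual_perf:.2f}")
```

With these assumptions the two slower cores together draw about 1.1 times the baseline power and deliver about 1.8 times the performance, close to the slide's Power = 1, Perf = ~1.8.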

Page 9:

From Dual to Multi-core

[Figure: a large core with cache beside four small cores (C1 to C4) with cache, with bar charts of power and performance on a scale of 1 to 4 for each core type.]

Small core (relative to the large core): Power = 1/4, Performance = 1/2

Multi-Core: power efficient

Better power and thermal management

Page 10:

Future Multi-core Platform

[Figure: a grid of general purpose cores (GP), special purpose HW blocks (SP), and blocks labeled C, connected by an interconnect fabric.]

Heterogeneous Multi-Core Platform: SOC

Page 11:

Fine Grain Power Management

[Figure: a grid of cores in three groups, marked f, f/2, and 0.]

Cores with critical tasks: Freq = f, at Vdd; TPT = 1, Power = 1

Non-critical cores: Freq = f/2, at 0.7xVdd; TPT = 0.5, Power = 0.25

Cores shut down: TPT = 0, Power = 0
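
The Power = 0.25 figure for the non-critical cores is consistent with the usual dynamic power relation, P proportional to C x V^2 x f. The sketch below is a minimal check under that assumption; it also assumes TPT stands for throughput and that throughput tracks frequency, and it ignores leakage. The names are illustrative.

```python
def relative_dynamic_power(voltage_ratio: float, freq_ratio: float) -> float:
    """Dynamic power relative to a core at full Vdd and f, assuming P ~ C * V^2 * f."""
    return voltage_ratio ** 2 * freq_ratio


# (voltage ratio, frequency ratio) for the three core states on the slide.
CORE_STATES = {
    "critical": (1.0, 1.0),      # Vdd, f
    "non-critical": (0.7, 0.5),  # 0.7 x Vdd, f/2
    "shut down": (0.0, 0.0),
}

for name, (v, f) in CORE_STATES.items():
    throughput = f  # assumption: throughput scales with frequency
    power = relative_dynamic_power(v, f)
    print(f"{name}: TPT = {throughput:.2f}, Power = {power:.3f}")
```

For the non-critical cores this gives 0.7^2 x 0.5 = 0.245, which the slide rounds to 0.25: half the throughput for a quarter of the power.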

Page 12:

Performance Scaling

[Figure: Performance, 0 to 10, versus number of cores, 0 to 30, with one curve per serial fraction.]

Amdahl’s Law: Parallel Speedup = 1/(Serial% + (1-Serial%)/N)

Serial% = 6.7%: N = 16, N1/2 = 8 (16 cores, Perf = 8)

Serial% = 20%: N = 6, N1/2 = 3 (6 cores, Perf = 3)

Parallel software key to Multi-core success
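
The two worked points on this slide can be checked directly against the formula. The sketch below is a plain transcription of Amdahl's Law; the function name and the printed upper bound (1/Serial%, the limit as N grows) are additions for illustration.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Parallel speedup = 1 / (serial + (1 - serial) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# The two cases quoted on the slide.
for serial, cores in ((0.067, 16), (0.20, 6)):
    speedup = amdahl_speedup(serial, cores)
    limit = 1.0 / serial  # speedup can never exceed 1 / serial fraction
    print(f"serial = {serial:.1%}, N = {cores}: speedup = {speedup:.2f} (limit {limit:.1f})")
```

Both cases land at about half the core count (roughly 8 on 16 cores and 3 on 6 cores), which is why the slide stresses parallel software.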

Page 13:

From Multi to Many…

Configuration: 13mm, 100W, 48MB Cache, 4B Transistors, in 22nm

Large cores: 12 cores, single core performance 1

Medium cores: 48 cores, single core performance 0.5

Small cores: 144 cores, single core performance 0.3

[Figure, left: System performance, 0 to 30, for TPT, One App, Two App, Four App, and Eight App, with bars for Large, Med, and Small cores.]

[Figure, right: Relative single core performance, 0 to 1.2, for Large, Med, and Small cores.]

Page 14:

From Many to Too Many…

Configuration: 13mm, 100W, 96MB Cache, 8B Transistors, in 16nm

Large cores: 24 cores, single core performance 1

Medium cores: 96 cores, single core performance 0.5

Small cores: 288 cores, single core performance 0.3

[Figure, left: Relative single core performance, 0 to 1.2, for Large, Med, and Small cores.]

[Figure, right: System performance, 0 to 30, for TPT, One App, Two App, Four App, and Eight App, with bars for Large, Med, and Small cores.]

Page 15:

On Die Network Power

[Figure, left: Throughput (relative), log scale from 1 to 10,000, versus year, 2001 to 2017. Curves: small, 1.5MT core, ~1000 cores; large, 15MT core, ~100 cores.]

[Figure, right: Network power (W), log scale from 1 to 1,000, versus year, 2001 to 2017, with 4B wide links and 4 links/core. Annotations: ~150W, ~15W.]

300mm2 die

A careful balance of:

1. Throughput performance

2. Single thread performance (core size)

3. Core and network power
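
The core counts on this slide follow from dividing a logic transistor budget by the core size. The sketch below assumes the ~1.5B logic transistor budget quoted on the Terascale Integration Capacity slide, and it assumes the ~150W annotation belongs to the ~1000-core case and ~15W to the ~100-core case, which the transcript implies but does not state. The names are illustrative.

```python
LOGIC_TRANSISTORS = 1.5e9  # ~1.5B logic transistors (assumed budget, from the earlier slide)

# name: (transistors per core, network power in watts as read off the slide)
CORE_OPTIONS = {
    "small": (1.5e6, 150.0),
    "large": (15e6, 15.0),
}


def core_count(budget: float, transistors_per_core: float) -> int:
    """Number of cores that fit in the logic transistor budget."""
    return round(budget / transistors_per_core)


for name, (per_core, network_watts) in CORE_OPTIONS.items():
    cores = core_count(LOGIC_TRANSISTORS, per_core)
    watts_per_core = network_watts / cores
    print(f"{name}: {cores} cores, {network_watts:.0f} W network, {watts_per_core:.2f} W per core")
```

Under these assumptions the network costs about the same per core in both cases, so total network power grows with the number of cores, which is the tension behind the three-way balance listed above.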

Page 16:

Observations

Scaling multi-core demands more parallelism every generation

• Thread level, task level, application level

Many (or too many) cores do not always mean…

• The highest performance

• The highest MIPS/Watt

• The lowest power

If on-die network power is significant, then power is even worse

Now software, too, must follow Moore’s Law

Page 17:

Memory BW Gap

Busses have become wider to deliver necessary memory BW (10 to 30 GB/sec)

Yet, memory BW is not enough

A many-core system will demand 100 GB/sec memory BW

[Figure: Core clock and bus clock (MHz), 0 to 6,000, versus year, 1985 to 2010; the difference between the two curves is labeled GAP.]

How do you feed the beast?

Page 18:

IO Pins and Power

[Figure: Power (mW/Gbps), 0 to 30, versus signaling rate (Gbit/sec), 0 to 20. Series: state of the art, research.]

State of the art:

• 100 GB/sec ~ 1 Tb/sec = 1,000 Gb/sec

• 1,000 Gb/sec at 25 mW/Gb/sec = 25 Watts

• Bus-width = 1,000/5 = 200, about 400 pins (differential)

Too many signal pins, too much power

Page 19:

Solution

[Figure: two chips more than 5mm apart connected by a bus, and two chips less than 2mm apart.]

High speed busses:

• Busses are transmission lines

• L-R-C effects

• Need signal termination

• Signal processing consumes power

Solutions:

• Reduce distance to << 5mm

• R-C bus

• Reduce signaling speed (~1 Gb/sec)

• Increase pins to deliver BW

• 1-2 mW/Gbps

At < 2mm:

• 100 GB/sec ~ 1 Tb/sec = 1,000 Gb/sec

• 1,000 Gb/sec at 2 mW/Gb/sec = 2 Watts

• Bus-width = 1,000/1 = 1,000 pins
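
The pin and power arithmetic for the two signaling options can be put side by side. The sketch below uses the numbers quoted on the IO Pins and Power slide and on this one. The class and field names are illustrative, and treating the short bus as one pin per signal is an assumption read off the "1,000 pins" figure.

```python
from dataclasses import dataclass


@dataclass
class IoOption:
    name: str
    mw_per_gbps: float      # signaling energy cost in mW per Gb/sec
    gbps_per_signal: float  # signaling rate of each signal
    pins_per_signal: int    # 2 for differential signaling

    def power_watts(self, bandwidth_gbps: float) -> float:
        """Total IO power for the requested bandwidth."""
        return bandwidth_gbps * self.mw_per_gbps / 1000.0

    def pins(self, bandwidth_gbps: float) -> int:
        """Total signal pins needed for the requested bandwidth."""
        signals = bandwidth_gbps / self.gbps_per_signal
        return round(signals * self.pins_per_signal)


BANDWIDTH_GBPS = 1000.0  # the slides round 100 GB/sec up to ~1 Tb/sec

options = [
    IoOption("state of the art, > 5mm", mw_per_gbps=25.0, gbps_per_signal=5.0, pins_per_signal=2),
    # Assumption: the short R-C bus is single-ended (1,000 pins for 1,000 signals).
    IoOption("short R-C bus, < 2mm", mw_per_gbps=2.0, gbps_per_signal=1.0, pins_per_signal=1),
]

for option in options:
    watts = option.power_watts(BANDWIDTH_GBPS)
    pins = option.pins(BANDWIDTH_GBPS)
    print(f"{option.name}: {watts:.0f} W, {pins} pins")
```

The short bus trades pins for power: 2 W instead of 25 W, at the cost of 1,000 pins instead of about 400, which is only practical when the chips sit very close together.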

Page 20:

Anatomy of a Silicon Chip

[Figure: cross-section of a Si chip on a package under a heat-sink, with arrows labeled Heat, Power, and Signals.]

Page 21:

System in a Package

[Figure: two Si chips side by side on one package.]

Limited pins: 10mm / 50 micron = 200 pins

Signal distance is large, ~10 mm: higher power

Complex package

Page 22:

DRAM on Top

[Figure: package carrying a CPU with DRAM stacked on it and a heat-sink above. Labels: Temp = 85°C, Junction Temp = 100+°C.]

High temp, hot spots

Not good for DRAM

Page 23:

DRAM at the Bottom

[Figure: package carrying a DRAM die with the CPU stacked on it and a heat-sink above.]

Power and IO signals go through DRAM to CPU

Thin DRAM die

Through DRAM vias

The most promising solution to feed the beast

Page 24:

Reliability

[Figure, panel 1: Soft Error FIT/Chip (Logic & Mem), relative scale 0 to 150.]

[Figure, panel 2: Time dependent device degradation. Relative Ion, 0 to 1, versus time, 1 to 10.]

[Figure, panel 3: Burn-in may phase out…? Relative Jox, log scale from 1 to 10,000, at 180, 90, 45, and 22; annotation: Hi-K?]

[Figure, panel 4: Extreme device variations. Relative scale 0 to 100 versus Vt (mV), 100 to 200; annotation: wider.]

Page 25:

Implications to Reliability

Extreme variations (Static & Dynamic) will result in unreliable components

Impossible to design reliable systems as we know them today

• Transient errors (Soft Errors)

• Gradual errors (Variations)

• Time dependent (Degradation)

Reliable systems with unreliable components: Resilient Architectures

Page 26:

Implications to Test

One-time-factory testing will be out

Burn-in to catch chip infant-mortality will not be practical

Test HW will be part of the design

Dynamically self-test, detect errors, reconfigure, & adapt

Page 27:

In a Nut-shell…

[Figure: a chip labeled 100 Billion Transistors.]

100 BT integration capacity

Billions unusable (variations)

Some will fail over time

Intermittent failures

Yet, deliver high performance in the power & cost envelope

Page 28:

Resiliency with Many-Core

[Figure: a grid of cores, each labeled C.]

Dynamic on-chip testing

Performance profiling

Cores in reserve (spares)

Binning strategy

Dynamic, fine grain, performance and power management

Coarse-grain redundancy checking

Dynamic error detection & reconfiguration

Decommission aging cores, swap with spares

Dynamically…

1. Self test & detect

2. Isolate errors

3. Confine

4. Reconfigure, and

5. Adapt
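
The five dynamic steps can be pictured as a loop over a pool of cores with spares in reserve. The sketch below is a toy model only: `CorePool`, its fields, and the `is_healthy` callback are hypothetical names invented for illustration, not a mechanism described in the talk.

```python
from typing import Callable, List


class CorePool:
    """Toy model of a many-core chip that keeps spare cores in reserve."""

    def __init__(self, active: List[int], spares: List[int]) -> None:
        self.active = list(active)
        self.spares = list(spares)
        self.decommissioned: List[int] = []

    def self_test_and_reconfigure(self, is_healthy: Callable[[int], bool]) -> None:
        """Run one pass of test, isolate, confine, reconfigure, adapt."""
        still_active: List[int] = []
        for core in self.active:
            if is_healthy(core):                  # 1. self test & detect
                still_active.append(core)
                continue
            self.decommissioned.append(core)      # 2-3. isolate and confine the bad core
            while self.spares:                    # 4. reconfigure: swap in a healthy spare
                spare = self.spares.pop(0)
                if is_healthy(spare):
                    still_active.append(spare)
                    break
                self.decommissioned.append(spare)
        self.active = still_active                # 5. adapt: continue with what is left


# Hypothetical example: cores 1 and 4 have aged out.
aging = {1, 4}
pool = CorePool(active=[0, 1, 2, 3], spares=[4, 5])
pool.self_test_and_reconfigure(lambda core: core not in aging)
print(pool.active, pool.decommissioned, pool.spares)
```

In this example core 1 fails its test, the first spare (core 4) is also rejected, and core 5 takes its place, so the chip keeps four working cores with no spares left.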

Page 29:

Summary

Moore’s Law with Terascale integration capacity will allow integration of thousands of cores

Power continues to be the challenge

On-die network power could be significant

Optimize for power with the size of the core and the number of cores

3D Memory technology needed to feed the beast

Many-cores will deliver the highest performance in the power envelope with resiliency