CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

53
CS 61C: Great Ideas in Computer Architecture Lecture 12: Control & Operating Speed Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c/fa17

Transcript of CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Page 1: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

CS61C:GreatIdeasinComputerArchitecture

Lecture12:Control&OperatingSpeed

KrsteAsanović &RandyKatz

http://inst.eecs.berkeley.edu/~cs61c/fa17

Page 2: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 2

Page 3: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Recap:Addingbranchestodatapath

CS61c 3

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

01

10 pc

01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Page 4: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ImplementingJALR Instruction(I-Format)

• JALRrd,rs,immediate−WritesPC+4toReg[rd](returnaddress)− SetsPC=Reg[rs1]+immediate− Usessameimmediates asarithmeticandloads

§ nomultiplicationby2bytes

4

Page 5: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Addingjalr todatapath

CS61c 5

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Page 6: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Addingjalr todatapath

CS61c 6

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=I RegWEn=1

BrUn=* BrEq=* BrLT=*

Asel=0Bsel=1

ALUSel=Add

MemRW=Read WBSel=2PCSel

wb

Page 7: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Implementingjal Instruction

• JALsavesPC+4inReg[rd](thereturnaddress)• SetPC=PC+offset(PC-relativejump)• Targetsomewherewithin±219 locations,2bytesapart

− ±218 32-bitinstructions• Immediateencodingoptimizedsimilarlytobranchinstructiontoreducehardwarecost

7

Page 8: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Addingjal todatapath

CS61c 8

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Page 9: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Addingjal todatapath

CS61c 9

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=J RegWEn=1

BrUn=* BrEq=* BrLT=*

Asel=1Bsel=1

ALUSel=Add

MemRW=Read WBSel=2PCSel

wb

Page 10: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

“UpperImmediate”instructions

• Has20-bitimmediateinupper20bitsof32-bitinstructionword• Onedestinationregister,rd• Usedfortwoinstructions

− LUI– LoadUpperImmediate(addtozero)− AUIPC– AddUpperImmediatetoPC

10

Page 11: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Implementinglui

CS61c 11

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

001

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=U RegWEn=1

BrUn=* BrE=* BrLT=*

Asel=*Bsel=1 ALUSel=B MemRW=Read WBSel=1PCSel=pc+4

wb

pc

Page 12: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Implementingauipc

CS61c 12

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

001

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel=U RegWEn=1

BrUn=* BrE=* BrLT=*

Asel=1Bsel=1 ALUSel=Add MemRW=0 WBSel=1PCSel=pc+4

wb

pc

Page 13: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Recap:CompleteRV32IISA

13

NotinCS61C

RV32Ihas47instructionstotal37instructionscoveredinCS61C

Page 14: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Single-CycleRISC-VRV32IDatapath

CS61c 14

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Page 15: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 15

Page 16: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Processor

CS61c Lecture12:Control&Performance 16

Processor

Control

DatapathPC

RegistersArithmetic&LogicUnit

(ALU)

Memory

Bytes

Enable?Read/Write

Address

WriteDataReadData

Processor-MemoryInterface

Program

Data

Page 17: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Single-CycleRISC-VRV32IDatapath

CS61c 17

IMEMALU

Imm.Gen

+4

DMEMBranchComp.

Reg[]

AddrAAddrB

DataAAddrD

DataB

DataD

Addr

DataWDataR

10

0121

0 pc01

inst[11:7]

inst[19:15]

inst[24:20]

inst[31:7]

pc+4alu

mem

wb

alu

pc+4

Reg[rs1]

pc

imm[31:0]

Reg[rs2]

ControlLogic

inst[31:0] ImmSel RegWEn BrUnBrEq BrLT ASelBSel ALUSel MemRW WBSelPCSel

wb

Page 18: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ControlLogicTruthTable(incomplete)

CS61c Lecture12:Control&Performance 18

Inst[31:0] BrEq BrLT PCSel ImmSel BrUn ASel BSel ALUSel MemRW RegWEn WBSel

add * * +4 * * Reg Reg Add Read 1 ALU

sub * * +4 * * Reg Reg Sub Read 1 ALU

(R-R Op) * * +4 * * Reg Reg (Op) Read 1 ALU

addi * * +4 I * Reg Imm Add Read 1 ALU

lw * * +4 I * Reg Imm Add Read 1 Mem

sw * * +4 S * Reg Imm Add Write 0 *

beq 0 * +4 B * PC Imm Add Read 0 *

beq 1 * ALU B * PC Imm Add Read 0 *

bne 0 * ALU B * PC Imm Add Read 0 *

bne 1 * +4 B * PC Imm Add Read 0 *

blt * 1 ALU B 0 PC Imm Add Read 0 *

bltu * 1 ALU B 1 PC Imm Add Read 0 *

jalr * * ALU I * Reg Imm Add Read 1 PC+4

jal * * ALU J * PC Imm Add Read 1 PC+4

auipc * * +4 U * PC Imm Add Read 1 ALU

Page 19: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ControlRealizationOptions• ROM

−“Read-OnlyMemory”−Regularstructure−Canbeeasilyreprogrammed

§ fixerrors§ addinstructions

−Popularwhendesigningcontrollogicmanually• CombinatorialLogic

−Today,chipdesignersuselogicsynthesistoolstoconverttruthtablestonetworksofgates

CS61c Lecture12:Control&Performance 19

Page 20: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

RV32I,anine-bitISA!

20

NotinCS61C

Instructiontypeencodedusingonly9bitsinst[30],inst[14:12],inst[6:2]

inst[30] inst[14:12] inst[6:2]

Page 21: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ROM-basedControl

CS61c Lecture12:Control&Performance 21

ROM

Inst[30,14:12,6:2] BrEq9

PCSel

ALUSel[3:0]4

11-bitaddress(inputs)

15databits(outputs)

BrLT

ImmSel[2:0]3BrUnASelBSel

MemRWRegWEnWBSel[1:0]2

Page 22: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ROMControllerImplementation

CS61c Lecture12:Control&Performance 22

Control Word foraddControl Word forsubControl Word foror

.

.

.

Address

Decode

r...

Inst[]BrEQBrLT

Controlleroutput(PCSel,ImmSel,…)

addsubor

jal

11

Page 23: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Administrivia• Homework2Duetomorrow11:59pm• Project1Part1DueMondayOct.9

− Part2dueMondayOct.16

• Midterm1RegradesduenextTuesday− TalktoaTAifyoudon’tunderstandamidtermquestionorareunsureofaregrade

CS61c Lecture12:Control&Performance 23

Page 24: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Break!

10/5/17 24

Page 25: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 25

Page 26: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

InstructionTiming

IF ID EX MEM WB Total

I-MEM Reg Read ALU D-MEM Reg W

200 ps 100ps 200 ps 200ps 100ps 800psCS61c Lecture12:Control&Performance 26

Page 27: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

InstructionTiming

• Maximumclockfrequency− fmax =1/800ps=1.25GHz

• Mostblocksidlemostofthetime− E.g.fmax,ALU =1/200ps=5GHz!− HowcanwekeepALUbusyallthetime?− 5billionadds/sec,ratherthanjust1.25billion?− Idea:Factories usethreeemployeeshifts- equipmentisalwaysbusy!

Instr IF=200ps ID=100ps ALU=200ps MEM=200ps WB =100ps Totaladd X X X X 600psbeq X X X 500psjal X X X 500pslw X X X X X 800pssw X X X X 700ps

Page 28: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 28

Page 29: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

PerformanceMeasures• “Our”RISC-Vexecutesinstructionsat1.25GHz

−1instructionevery800ps

• Canweimproveitsperformance?−Whatdowemeanwiththisstatement?−Notsoobvious:

§ Quickerresponsetime,soonejobfinishesfaster?§ Morejobsperunittime(e.g.webserverreturningpages)?§ Longerbatterylife?

CS61c Lecture12:Control&Performance 29

Page 30: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

TransportationAnalogySportsCar Bus

Passenger Capacity 2 50Travel Speed 200mph 50mphGasMileage 5mpg 2mpg

CS61c Lecture12:Control&Performance 30

SportsCar BusTravelTime 15min 60minTimefor100 passengers 750min 120minGallonsperpassenger 5gallons 0.5 gallons

50Miletrip:

Page 31: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ComputerAnalogyTransportation Computer

TripTime Program executiontime:e.g.timetoupdatedisplay

Timefor100passengers Throughput:e.g.number ofserverrequestshandledperhour

Gallonsper passenger Energypertask*:e.g.how manymoviesyoucanwatchperbatterychargeorenergybillfordatacenter

31

*Note: powerisnotagoodmeasure,sincelow-powerCPUmightrunforalongtimetocompleteonetaskconsumingmoreenergythanfastercomputerrunningathigherpowerforashortertime

Page 32: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

“IronLaw”ofProcessorPerformance

32

Time =Instructions Cycles TimeProgramProgram*Instruction*Cycle

Page 33: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

InstructionsperProgram

Determinedby• Task• Algorithm,e.g.O(N2) vsO(N)• Programminglanguage• Compiler• InstructionSetArchitecture(ISA)

CS61c Lecture12:Control&Performance 33

Page 34: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

(Average)ClockcyclesperInstruction

Determinedby• ISA• Processorimplementation(ormicroarchitecture)• E.g.for“our”single-cycleRISC-Vdesign,CPI=1• Complexinstructions(e.g.strcpy),CPI>>1• Superscalarprocessors,CPI<1(nextlecture)

CS61c Lecture12:Control&Performance 34

Page 35: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

TimeperCycle(1/Frequency)

Determinedby• Processormicroarchitecture(determinescriticalpath

throughlogicgates)• Technology(e.g.14nmversus28nm)• Powerbudget(lowervoltagesreducetransistorspeed)

CS61c Lecture12:Control&Performance 35

Page 36: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

SpeedTradeoffExample• Forsometask(e.g.imagecompression)…

CS61c Lecture12:Control&Performance 36

ProcessorA Processor B#Instructions 1Million 1.5MillionAverageCPI 2.5 1Clockrate f 2.5 GHz 2GHzExecutiontime 1ms 0.75ms

ProcessorBisfasterforthistask,despiteexecutingmoreinstructionsandhavingalowerclockrate!

Page 37: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

EnergyperTask

37

Energy =Instructions EnergyProgramProgram*Instruction

Energy αInstructions *CV2

ProgramProgram

“Capacitance” dependsontechnology,processorfeaturese.g.#ofcores

Supplyvoltage,e.g.1V

Wanttoreducecapacitanceandvoltagetoreduceenergy/task

Page 38: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

EnergyTradeoffExample• “Next-generation”processor

− C(Moore’sLaw): -15%− Supplyvoltage,Vsup: -15%− Energyconsumption: 1- (1-0.85)3 =-39%

• Significantlyimprovedenergyefficiencythanksto−Moore’sLawAND− Reducedsupplyvoltage

CS61c Lecture12:Control&Performance 38

Page 39: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Energy“IronLaw”

39

Performance=Power*EnergyEfficiency(Tasks/Second)(Joules/Second)(Tasks/Joule)

• Energyefficiency(e.g.,instructions/Joule)iskeymetricinallcomputingdevices

• Forpower-constrainedsystems(e.g.,20MWdatacenter),needbetterenergyefficiencytogetmoreperformanceatsamepower

• Forenergy-constrainedsystems(e.g.,1Wphone),needbetterenergyefficiencytoprolongbatterylife

Page 40: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

EndofScaling• Inrecentyears,industryhasnotbeenabletoreducesupplyvoltagemuch,asreducingitfurtherwouldmeanincreasing“leakagepower”wheretransistorswitchesdon’tfullyturnoff(morelikedimmerswitchthanon-offswitch)• Also,sizeoftransistorsandhencecapacitance,notshrinkingasmuchasbeforebetweentransistorgenerations• Powerbecomesagrowingconcern– the“powerwall”• Cost-effectiveair-cooledchiplimitaround~150W

CS61c Lecture12:Control&Performance 40

Page 41: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

0

1

10

100

1000

10000

100000

1000000

10000000

1970 1975 1980 1985 1990 1995 2000 2005 2010

Transistors (Thousands)Frequency (MHz)Power (W)Cores

ProcessorTrends

41[Olukotun, Hammond,Sutter,Smith,Batten]

Page 42: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Break!

10/5/17 42

Page 43: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 43

Page 44: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Pipelining• Afamiliarexample:

− Gettingauniversitydegree

• ShortageofComputerscientists(yourstartupisgrowing):− Howlongdoesittaketoeducate16,000?

CS61c Lecture12:Control&Performance 44

Year1 Year2 Year3 Year4

Page 45: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

ComputerScientistEducation

CS61c Lecture12:Control&Performance 45

• Option1:serial

• Option2:pipelining

7years

16,000in16years,averagethroughputis1000/year

• 16,000in7years• Steadystatethroughputis4000/year• Resourcesusedefficiently• 4-foldimprovementoverserialeducation

4000graduate

4years

4000graduate

4years

4000graduate

4years

4000

4years

4000graduate4000graduate

4000graduate4000graduate

4000enter

1year

Page 46: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

LatencyversusThroughput• Latency

− Timefromenteringcollegetograduation− Serial 4years− Pipelining 4years

• Throughput− Averagenumberofstudentsgraduatingeachyear− Serial 1000− Pipelining 4000

• Pipelining− Increasesthroughput(4xinthisexample)− Butdoesnothingtolatency

§ sometimesworse(additionaloverheade.g.forshifttransition)CS61c Lecture12:Control&Performance 46

Page 47: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

SimultaneousversusSequential

CS61c Lecture12:Control&Performance 47

• Whathappenssequentially?• Whathappenssimultaneously?

4years

4000graduate4000graduate

4000graduate4000graduate

4000graduate4000graduate

4000graduate4000graduate

4000graduate4000graduate

Page 48: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 48

Page 49: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

PipeliningwithRISC-V

CS61c 49

Phase Pictogram tstep SerialInstruction Fetch 200 psReg Read 100psALU 200psMemory 200psRegisterWrite 100pstinstruction 800ps

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3tcycle

instructionsequence

tinstruction

tcycle Pipelined200 ps200ps200ps200ps200ps1000ps

Page 50: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

PipeliningwithRISC-V

CS61c 50

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3tcycle

instructionsequence

tinstruction

Single Cycle PipeliningTiming tstep =100…200ps tcycle =200ps

Register accessonly100ps AllcyclessamelengthInstruction time,tinstruction =tcycle =800ps 1000psClockrate,fs 1/800ps =1.25GHz 1/200 ps =5GHzRelativespeed 1x 4x

Page 51: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

SequentialvsSimultaneous

addt0,t1,t2

ort3,t4,t5

sll t6,t0,t3

tcycle=200ps

instructionsequence

tinstruction =1000ps

sw t0,4(t3)

lw t0,8(t3)

addi t2,t2,1

Whathappenssequentially,whathappenssimultaneously?

Page 52: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

Agenda• FinishSingle-CycleRISC-VDatapath• Controller• InstructionTiming• PerformanceMeasures• IntroductiontoPipelining• PipelinedRISC-V Datapath• AndinConclusion,...CS61c Lecture12:Control&Performance 52

Page 53: CS 61C: Great Ideas in Computer Architecture Lecture 12 ...

AndinConclusion,…• Controller

− Tellsuniversaldatapathhowtoexecuteeachinstruction• Instructiontiming

− Setbyinstructioncomplexity,architecture,technology− Pipeliningincreasesclockfrequency,“instructionspersecond”

§ Butdoesnotreducetimetocompleteinstruction

• Performancemeasures− Differentmeasuresdependingonobjective

§ Responsetime§ Jobs/second§ Energypertask

CS61c Lecture12:Control&Performance 53