Pipelining in the Pentium 4
http://www.hardwaresecrets.com/inside-pentium-4-architecture/2/
https://commons.wikimedia.org/wiki/File:Pentium_4,_3.0GHz_(4).jpg
20-stage pipeline in the Northwood (2002) microarchitecture
Pentium 4 die photo
The Prescott (2004) microarchitecture had 31 stages, and the original design had a target clock frequency of 10 GHz
CS 2630 Computer Organization
Meeting 24: Pipelined MIPS processor
Brandon Myers
University of Iowa
minute paper: identify one situation where pipelining happens (except for digital logic and laundry)
Let’s review the steps of how lw gets executed
http://courses.cs.washington.edu/courses/cse378
IF: Instruction Fetch
read the instruction out of the instruction memory
ID: Instruction Decode
read values from the register file
EX: Execute
ALU computes a result
MEM: Access memory
Read the data memory
WB: Write back
Write the data back to the register file
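The five steps above can be sketched as one function that does each stage's work in order for a single lw. This is a toy functional model, not the hardware: the register file and data memory are hypothetical Python dicts, and the instruction memory holds already-decoded tuples instead of 32-bit words.

```python
def run_lw(instr_mem, pc, regs, data_mem):
    # IF: read the instruction out of the instruction memory (word-addressed list)
    op, rt, offset, rs = instr_mem[pc // 4]      # lw rt, offset(rs)
    if op != "lw":
        raise ValueError("this sketch only handles lw")
    # ID: read values from the register file
    base = regs[rs]
    # EX: the ALU computes a result, here the effective address
    addr = base + offset
    # MEM: read the data memory
    value = data_mem[addr]
    # WB: write the data back to the register file
    regs[rt] = value
    return pc + 4                                # address of the next instruction

regs = {"$t0": 0, "$t1": 100}
data_mem = {104: 42}
instr_mem = [("lw", "$t0", 4, "$t1")]            # lw $t0, 4($t1)
next_pc = run_lw(instr_mem, 0, regs, data_mem)
print(regs["$t0"], next_pc)
```

In the single-cycle datapath all five steps happen within one clock cycle; pipelining splits them so that each step can work on a different instruction.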
Peer instruction
• Match the stages to what happens in them during the branch instruction
1. IF (instruction fetch)
2. ID (instruction decode)
3. EX (execute)
4. MEM (memory)
5. WB (write back)
a) compare two operands
b) read two registers
c) read the branch instruction bits
d) write a register
e) write memory
f) read memory
g) nothing or none of the above
Pipelined execution of a program
Performance of pipelined datapath
[Figure: timing diagrams of the two-instruction program below through the stages IF, ID, EX, MEM, WB, with a time axis in ps. Non-pipelined: clock period = 1600 ps (clock frequency = 0.625 GHz). Pipelined: clock period = 400 ps (clock frequency = 2.5 GHz).]
lw $t0, 4($t1)
lw $t1, 0($t2)
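The comparison in this figure can be turned into arithmetic. A minimal sketch, assuming the 1600 ps period belongs to the non-pipelined datapath (one instruction per long cycle), the 400 ps period to the 5-stage pipeline, and no stalls: the pipeline needs 5 cycles to finish the first instruction and then completes one instruction per cycle.

```python
def single_cycle_time_ps(n, period_ps=1600):
    """Every instruction occupies one long clock cycle."""
    return n * period_ps

def pipelined_time_ps(n, stages=5, period_ps=400):
    """Fill the pipeline (`stages` cycles for the first instruction), then one
    instruction completes per cycle. Assumes no stalls."""
    return 0 if n == 0 else (stages + n - 1) * period_ps

for n in (2, 1000):
    t1, t2 = single_cycle_time_ps(n), pipelined_time_ps(n)
    print(f"n={n}: single-cycle {t1} ps, pipelined {t2} ps, speedup {t1 / t2:.2f}")
```

For two instructions the fill time dominates, while for long programs the speedup approaches the ratio of the clock periods, 1600 / 400 = 4.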
Peer instruction
• Suppose
  • clk-to-q delay of our registers is 100 ps
  • setup time of our registers is 50 ps
  • delay of combinational logic in each stage is 75 ps, 200 ps, 150 ps, 250 ps, and 200 ps, respectively
What is the maximum clock frequency we can run our processor at?
a) 2.5 GHz
b) 0.976 GHz
c) 4.44 GHz
d) 2.86 GHz
e) 3.33 GHz
f) 6.67 GHz
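One way to set up this calculation: every stage sits between two registers, so a clock period must cover the launching register's clk-to-q delay, that stage's combinational logic, and the capturing register's setup time. The slowest stage sets the period. The function name below is illustrative.

```python
def max_clock_ghz(clk_to_q_ps, setup_ps, stage_delays_ps):
    # Worst-case register-to-register path: clk-to-q + slowest stage + setup
    period_ps = clk_to_q_ps + max(stage_delays_ps) + setup_ps
    return 1000 / period_ps          # 1 / (period in ns) gives GHz

stage_delays = [75, 200, 150, 250, 200]  # ps, from the question
print(max_clock_ghz(100, 50, stage_delays))
```

For comparison, a non-pipelined design would have all five logic delays in one path between registers, which corresponds to calling the function with the single stage `[sum(stage_delays)]`.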
Peer instruction
• How many instructions per cycle (IPC) will the MIPS processor achieve when pipelining and using each of the 5 stages (IF, ID, EX, MEM, WB) as a pipeline stage?
a) $IPC = 5$
b) $IPC = \frac{1}{5}$
c) $IPC = 1$
d) $IPC = 2$
e) $IPC \le 1$
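A short sketch of how IPC behaves for a stall-free 5-stage pipeline: $n$ instructions finish in $n + 4$ cycles, so $IPC = \frac{n}{n+4}$. This ignores hazards, which can only add cycles. Exact fractions are used so the values are easy to read.

```python
from fractions import Fraction

def ideal_ipc(n_instructions, stages=5):
    # n instructions complete in (stages + n - 1) cycles when nothing stalls
    return Fraction(n_instructions, stages + n_instructions - 1)

for n in (1, 5, 100, 10_000):
    print(n, ideal_ipc(n), float(ideal_ipc(n)))
```

A single instruction alone gives 1/5, and long programs get arbitrarily close to, but never above, one instruction per cycle.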
Pipelined datapath
grey boxes represent the pipeline registers, which hold all signals passed from one stage to the next
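The grey-box registers behave like a shift register: at every clock edge each one captures what the stage before it produced. The sketch below models only *which instruction* each stage holds (not the data and control signals), assumes no stalls, and uses hypothetical names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def occupancy(program):
    """Yield (cycle, {stage: instruction}) for a program with no stalls."""
    latches = [None] * len(STAGES)    # latches[k]: instruction currently in stage k
    pending = list(program)
    cycle = 0
    while True:
        # Clock edge: each pipeline register takes what the previous stage held,
        # and IF fetches the next instruction (if any are left).
        latches = [pending.pop(0) if pending else None] + latches[:-1]
        if all(x is None for x in latches):
            return                    # pipeline has drained
        yield cycle, dict(zip(STAGES, latches))
        cycle += 1

prog = ["lw $t0, 4($t1)", "lw $t1, 0($t2)"]
for cycle, stages in occupancy(prog):
    print(f"cycle{cycle}:", stages)
```

Each printed row shows up to one instruction per stage, which is why the pipeline can have several instructions in flight at once.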
Peer Instruction
• How should we change the control unit to handle a pipelined processor (stages IF, ID, EX, MEM, WB)?
• single-cycle control unit was some combinational logic
a) no change
b) implement as a finite state machine (FSM)
c) calculate the control signals and pass them down the pipeline
d) a different control unit for each stage; pass the instruction bits down the pipeline
Control in the pipelined processor
Compute all the control signals during the ID stage. Some signals are not needed until a later stage, so they propagate through the pipeline registers to the stage that uses them.
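A sketch of "decode once in ID, then carry the signals forward." The signal names (RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, RegWrite, MemtoReg) and their grouping by consuming stage follow the common textbook MIPS datapath. That the slide's figure uses exactly these names is an assumption. The opcodes are the standard MIPS ones, and don't-care bits are written as 0.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExCtrl:          # consumed in EX
    reg_dst: int
    alu_src: int
    alu_op: int        # 2-bit ALUOp

@dataclass(frozen=True)
class MemCtrl:         # consumed in MEM
    branch: int
    mem_read: int
    mem_write: int

@dataclass(frozen=True)
class WbCtrl:          # consumed in WB
    reg_write: int
    mem_to_reg: int

@dataclass(frozen=True)
class Control:
    ex: ExCtrl
    mem: MemCtrl
    wb: WbCtrl

# Main control table, evaluated once in ID (don't-care bits written as 0).
CONTROL = {
    0x00: Control(ExCtrl(1, 0, 0b10), MemCtrl(0, 0, 0), WbCtrl(1, 0)),  # R-type
    0x23: Control(ExCtrl(0, 1, 0b00), MemCtrl(0, 1, 0), WbCtrl(1, 1)),  # lw
    0x2B: Control(ExCtrl(0, 1, 0b00), MemCtrl(0, 0, 1), WbCtrl(0, 0)),  # sw
    0x04: Control(ExCtrl(0, 0, 0b01), MemCtrl(1, 0, 0), WbCtrl(0, 0)),  # beq
}

def id_stage(opcode):
    c = CONTROL[opcode]
    return {"ex": c.ex, "mem": c.mem, "wb": c.wb}    # written into ID/EX

def ex_stage(id_ex):
    # EX uses id_ex["ex"]; only the later groups move on to EX/MEM
    return {"mem": id_ex["mem"], "wb": id_ex["wb"]}

def mem_stage(ex_mem):
    # MEM uses ex_mem["mem"]; only the WB group moves on to MEM/WB
    return {"wb": ex_mem["wb"]}

mem_wb = mem_stage(ex_stage(id_stage(0x23)))    # follow an lw down the pipeline
print(mem_wb["wb"])
```

Each pipeline register only needs to carry the signals that some later stage still uses, so the control bundle shrinks as the instruction moves toward WB.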
Pipelining… what could go wrong?
Consider the following programs…
add $t0, $t1, $t2
add $t4, $t0, $t3
=======================================
lw $s0, 4($t0)
sll $s1, $s2, 3
=======================================
beq $zero, $zero, gadget
addi $t1, $zero, 1
gadget: addi $t1, $zero, 2
(see handout)
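A rough sketch of scanning these three programs for two kinds of trouble: an instruction that reads a register written by a recent earlier instruction (data), and a branch, after which the pipeline keeps fetching before the outcome is known (control). The parser only understands the simple forms above. The default `window=2` is an assumption (no forwarding, with the register file written in the first half of a cycle and read in the second half). This check looks only at register dependences.

```python
import re

REG = re.compile(r"\$\w+")
BRANCHES = {"beq", "bne"}
NO_DEST = BRANCHES | {"sw"}        # instructions that write no register

def parse(line):
    """Return (op, dest, sources) for the simple MIPS forms used above."""
    text = line.split(":", 1)[-1].strip()    # drop a leading label like "gadget:"
    op, _, operands = text.partition(" ")
    regs = REG.findall(operands)
    if op in NO_DEST or not regs:
        return op, None, regs
    return op, regs[0], regs[1:]

def scan(program, window=2):
    """Flag branches (control) and reads of a register written by one of the
    previous `window` instructions (data)."""
    parsed = [parse(line) for line in program]
    found = []
    for i, (op, _, srcs) in enumerate(parsed):
        if op in BRANCHES:
            found.append((i, "control"))
        for j in range(max(0, i - window), i):
            dest = parsed[j][1]
            if dest is not None and dest != "$zero" and dest in srcs:
                found.append((i, f"data: reads {dest} written by instruction {j}"))
    return found

programs = [
    ["add $t0, $t1, $t2", "add $t4, $t0, $t3"],
    ["lw $s0, 4($t0)", "sll $s1, $s2, 3"],
    ["beq $zero, $zero, gadget", "addi $t1, $zero, 1", "gadget: addi $t1, $zero, 2"],
]
for prog in programs:
    print(scan(prog))
```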
A notation for studying hazards
Note that there are not two register files (Reg). Rather, this notation shows the active stage for an instruction during each cycle. The register file (Reg) is involved during ID and WB.
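[Figure: multi-cycle pipeline diagram in which Reg is marked R where it is read (ID) and W where it is written (WB)]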
dotted blue line: we need to write $2 before we read $2
solid red line: points from where the value is actually produced to where it is actually used
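The dotted-blue-line condition can be checked with cycle arithmetic in this notation. The sketch assumes instructions are numbered from 0, one starts per cycle with no stalls, instruction i is in ID at cycle i+1 and in WB at cycle i+4, and (an assumption, not stated on the slide) the register file is written in the first half of a cycle and read in the second half, so a same-cycle write and read is fine.

```python
def wb_cycle(i):
    # instruction i writes the register file in WB
    return i + 4

def id_cycle(j):
    # instruction j reads the register file in ID
    return j + 1

def write_before_read(producer, consumer):
    # "<=" relies on write-first-half / read-second-half; use "<" otherwise
    return wb_cycle(producer) <= id_cycle(consumer)

for distance in range(1, 5):
    print(distance, write_before_read(0, distance))
```

Under these assumptions, a consumer one or two instructions after the producer would read $2 before it is written, while a consumer three or more instructions later is safe.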
Summary
• The pipelined MIPS processor takes multiple cycles to finish a given instruction, but it can execute multiple instructions simultaneously (up to 1 per stage)
• The stage with the longest delay determines the clock period
• Registers separate each stage; control and data signals are sent to the next stage through the registers
• Executing multiple instructions simultaneously can result in hazards
• Next:
  • thinking about hazards systematically
  • modifying the processor to cope with hazards