Recap (Pipelining)


Transcript of Recap (Pipelining)

  • Recap (Pipelining)

  • What is Pipelining?

    A way of speeding up the execution of tasks.

    Key idea: overlap the execution of multiple tasks.

  • Automobile Manufacturing

    1. Build frame: 60 min
    2. Add engine: 50 min
    3. Build body: 80 min
    4. Paint: 40 min
    5. Finish: 45 min
    Total: 275 min

    Latency: time from start to finish for one car, here 275 minutes per car (smaller is better).
    Throughput: number of finished cars per time unit, here 1 car/275 min = 0.218 cars/hour (larger is better).
    Issues: How can we make the process better by adding?
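As a rough illustration of the two metrics above, the following Python sketch recomputes them from the slide's stage times. The function names are illustrative and not part of the original slides, and the sketch assumes only one car is in the shop at a time.

```python
# Stage times in minutes, taken from the slide.
STAGE_MINUTES = {
    "Build frame": 60,
    "Add engine": 50,
    "Build body": 80,
    "Paint": 40,
    "Finish": 45,
}


def latency_minutes(stage_times):
    """Time for one car to pass through every stage, one stage after another."""
    return sum(stage_times)


def throughput_per_hour(stage_times):
    """Cars finished per hour when a new car starts only after the previous one is done."""
    return 60 / latency_minutes(stage_times)


times = list(STAGE_MINUTES.values())
print(latency_minutes(times))                # 275
print(round(throughput_per_hour(times), 3))  # 0.218
```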

  • An Assembly Line

    Stage times: 60, 50, 80, 40, and 45 min.
    The first two stages can't produce faster than one car/80 min, or a backlog will occur at the third stage.
    The last two stages only receive one car/80 min to work on.
    Effective time per stage: 80 min for every stage.
    Latency: 400 min/car
    Throughput: 4 cars/640 min (1 car/160 min); it will approach 1 car/80 min as time goes on.
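A minimal sketch of the assembly-line numbers, assuming every station advances on a common tick set by the slowest stage (80 minutes here). The function names are hypothetical and only serve the illustration.

```python
def finish_times(stage_times, n_cars):
    """Finish time in minutes of each car on a synchronous assembly line.

    Assumption: all stations move in lockstep on a tick equal to the
    slowest stage, and a new car enters the line on every tick.
    """
    tick = max(stage_times)
    depth = len(stage_times)
    return [(depth + car) * tick for car in range(n_cars)]


def minutes_per_car(stage_times, n_cars):
    """Average minutes per finished car, counted from time zero."""
    return finish_times(stage_times, n_cars)[-1] / n_cars


stages = [60, 50, 80, 40, 45]
print(finish_times(stages, 4))        # [400, 480, 560, 640]
print(minutes_per_car(stages, 4))     # 160.0
print(minutes_per_car(stages, 1000))  # close to 80 as the line stays full
```

The first car still needs 5 * 80 = 400 minutes, but once the line is full a car rolls off every 80 minutes, which is why the average drifts toward 80.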

  • Pipelining a Digital System

    Key idea: break a big computation up into pieces. Separate each piece with a pipeline register.

  • Pipelining a Digital System

    Why do this? Because it's faster for repeated computations.

  • Comments about Pipelining

    Pipelining increases throughput, but it does not improve latency: an answer is available every 200 ps, BUT a single computation still takes 1 ns.
    Limitations: computations must be divisible into stages of equal size, and pipeline registers add overhead.

  • Another Example (Unpipelined System)

    One operation must complete before the next can begin.
    Operations are spaced 33 ns apart.
    Delay = 33 ns
    Throughput = 30 MHz
    [Timing diagram: Op1, Op2, Op3 run one after another along the time axis]

  • 3-Stage Pipelining

    Space operations 13 ns apart.
    3 operations occur simultaneously.
    Delay = 39 ns
    Throughput = 77 MHz
    [Timing diagram: Op1, Op2, Op3, Op4 overlap along the time axis]
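The unpipelined and 3-stage figures can be reproduced with a simple timing model. The 30 ns of logic delay and 3 ns of register overhead used below are assumptions, not values stated on the slides; they were chosen because they match the 33 ns and 13 ns spacings quoted there.

```python
def pipeline_timing(logic_ns, reg_ns, stages):
    """Return (clock period ns, latency ns, throughput MHz) for an evenly split pipeline.

    Assumes the logic divides into equal stages and that each stage
    pays one register delay.
    """
    period = logic_ns / stages + reg_ns
    latency = period * stages
    throughput_mhz = 1000 / period  # 1 / ns expressed in MHz
    return period, latency, throughput_mhz


# Assumed values: 30 ns of combinational logic, 3 ns per pipeline register.
print(pipeline_timing(30, 3, 1))  # period 33 ns, latency 33 ns, about 30 MHz
print(pipeline_timing(30, 3, 3))  # period 13 ns, latency 39 ns, about 77 MHz
```

Under these assumptions the latency grows from 33 ns to 39 ns while the throughput rises from about 30 MHz to about 77 MHz.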

  • Limitation: Nonuniform Pipelining

    Throughput is limited by the slowest stage.
    Delay is determined by clock period * number of stages.
    Must attempt to balance stages.
    Delay = 18 * 3 = 54 ns
    Throughput = 55 MHz
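A sketch of why unbalanced stages hurt, assuming a hypothetical split of the logic into 5 ns, 15 ns, and 10 ns stages with 3 ns of register overhead. These stage delays are not given in the transcript; they are one choice that yields the 18 ns clock on the slide.

```python
def nonuniform_timing(stage_logic_ns, reg_ns):
    """Return (clock period ns, latency ns, throughput MHz) for unequal stages.

    The clock must accommodate the slowest stage plus its register,
    and every stage then takes one full clock period.
    """
    period = max(stage_logic_ns) + reg_ns
    latency = period * len(stage_logic_ns)
    return period, latency, 1000 / period


# Hypothetical stage delays; only the 18 ns period and 3 stages come from the slide.
period, latency, mhz = nonuniform_timing([5, 15, 10], reg_ns=3)
print(period, latency)  # 18 54
print(mhz)              # a little over 55 MHz
```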

  • Limitation: Deep Pipelines

    Diminishing returns as we add more pipeline stages.
    Register delays become the limiting factor: increased latency, small throughput gains.
    More hazards.
    Delay = 48 ns, Throughput = 128 MHz
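The diminishing returns can be seen by sweeping the pipeline depth in a simple model. The 30 ns logic delay and 3 ns register overhead are assumptions rather than slide values. With six equal stages the model reproduces the 48 ns delay on the slide and gives 125 MHz, close to the 128 MHz quoted there.

```python
LOGIC_NS = 30.0  # assumed total combinational delay
REG_NS = 3.0     # assumed overhead of one pipeline register


def period_ns(stages):
    """Clock period when the logic is split evenly across `stages`."""
    return LOGIC_NS / stages + REG_NS


def throughput_mhz(stages):
    """Operations per microsecond once the pipeline is full."""
    return 1000 / period_ns(stages)


for n in (1, 3, 6, 12, 30):
    print(f"{n:2d} stages: latency {period_ns(n) * n:6.1f} ns, "
          f"throughput {throughput_mhz(n):6.1f} MHz")

# Register overhead alone caps the throughput at 1000 / REG_NS MHz.
print(f"upper bound: {1000 / REG_NS:.1f} MHz")
```

Doubling the depth from 3 to 6 stages raises throughput by well under a factor of two, while latency keeps growing because every extra stage adds another register delay.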

  • MIPS Pipelining

  • MIPS 5-Stage Pipeline

    The MIPS processor needs 5 stages to execute instructions.
    Pipelining stages:
    IF - Instruction Fetch
    ID - Instruction Decode
    EX - Execute / Address Calculation
    MEM - Memory Access (read / write)
    WB - Write Back (results into register file)
    Not all instructions need all the stages (e.g., the add instruction does not need the MEM stage).

  • Basic MIPS Pipelined Processor

    [Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

  • Pipelined Example - Executing Multiple Instructions

    Consider the following instruction sequence:

        lw  $r0, 10($r1)
        sw  $r3, 20($r4)
        add $r5, $r6, $r7
        sub $r8, $r9, $r10

  • Executing Multiple Instructions, Clock Cycle 1: LW

  • Executing Multiple Instructions, Clock Cycle 2: LW, SW

  • Executing Multiple Instructions, Clock Cycle 3: LW, SW, ADD

  • Executing Multiple Instructions, Clock Cycle 4: LW, SW, ADD, SUB

  • Executing Multiple Instructions, Clock Cycle 5: LW, SW, ADD, SUB

  • Executing Multiple Instructions, Clock Cycle 6: SW, ADD, SUB

  • Executing Multiple Instructions, Clock Cycle 7: ADD, SUB

  • Executing Multiple Instructions, Clock Cycle 8: SUB
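The cycle-by-cycle slides above can be regenerated with a small simulation. This is a sketch of an ideal pipeline only: it assumes one instruction enters IF per cycle and that there are no stalls or hazards. The function name `in_flight` is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = ["LW", "SW", "ADD", "SUB"]


def in_flight(cycle, program=PROGRAM, stages=STAGES):
    """Map each instruction active in `cycle` (1-based) to the stage it occupies.

    Assumes an ideal pipeline: instruction i enters IF in cycle i + 1
    and advances one stage per cycle with no stalls.
    """
    active = {}
    for i, instr in enumerate(program):
        stage_index = cycle - 1 - i
        if 0 <= stage_index < len(stages):
            active[instr] = stages[stage_index]
    return active


# The last instruction leaves WB after depth + count - 1 cycles.
total_cycles = len(STAGES) + len(PROGRAM) - 1
for cycle in range(1, total_cycles + 1):
    print(f"cycle {cycle}: {in_flight(cycle)}")
```

For this four-instruction program the loop runs for 8 cycles, with all four instructions in flight during cycles 4 and 5, matching the slides.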

  • Alternative View - Multicycle Diagram

  • Processor Pipelining

    There are two ways that pipelining can help:
    1. Reduce the clock cycle time, and keep the same CPI
    2. Reduce the CPI, and keep the same clock cycle time

    CPU time = Instruction count * CPI * Clock cycle time

  • Reduce the clock cycle time, and keep the same CPI

    CPI = 1
    Clock = X Hz

  • Reduce the clock cycle time, and keep the same CPI

    [Figure: datapath with instruction input, register file (RN1, RN2, WN, WD, RD1, RD2), ALU, and data memory (ADDR, RD, WD)]

  • Reduce the CPI, and keep the same cycle time

    CPI = 5
    Clock = X * 5 Hz

  • Reduce the CPI, and keep the same cycle time

    CPI = 1
    Clock = X * 5 Hz
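To see how both views lead to the same gain, the CPU time formula can be applied to the CPI and clock values on these slides. The base clock rate and instruction count below are hypothetical. Reading "CPI = 1, X Hz" as a single-cycle design and "CPI = 5, X * 5 Hz" as a multicycle design is an interpretation, not something the transcript states.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI * clock cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_hz


X = 100e6      # hypothetical base clock rate in Hz
N = 1_000_000  # hypothetical instruction count

single_cycle = cpu_time(N, cpi=1, clock_hz=X)      # CPI = 1, Clock = X Hz
multi_cycle = cpu_time(N, cpi=5, clock_hz=5 * X)   # CPI = 5, Clock = X * 5 Hz
pipelined = cpu_time(N, cpi=1, clock_hz=5 * X)     # CPI = 1, Clock = X * 5 Hz

print(round(single_cycle / pipelined, 6))  # 5.0
print(round(multi_cycle / pipelined, 6))   # 5.0
```

The sketch ignores the few cycles needed to fill the pipeline, which is negligible for a long instruction stream.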

  • Pipeline Performance

    Ideally we get a speedup (by reducing the clock cycle or reducing the CPI) equal to the number of stages.
    In practice we do not achieve that, but we get close. The gap comes from additional pipelining overhead (e.g., pipeline registers) and from pipeline hazards.

  • Pipeline Hazards

    Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during its designated clock cycle.
    Hazards reduce the ideal speedup gained from pipelining (e.g., CPI = 1) and are classified into three classes:
    1. Structural hazards
    2. Data hazards
    3. Control hazards
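The effect of hazards on the ideal speedup can be sketched with the usual textbook approximation: speedup = pipeline depth / (1 + stall cycles per instruction). It assumes an ideal CPI of 1, balanced stages, and no register overhead. The stall rate used below is a made-up number for illustration.

```python
def pipeline_speedup(depth, stall_cycles_per_instr):
    """Approximate speedup over an unpipelined design.

    Assumes an ideal CPI of 1, perfectly balanced stages, and no
    register overhead, so hazards are the only source of loss.
    """
    effective_cpi = 1 + stall_cycles_per_instr
    return depth / effective_cpi


print(pipeline_speedup(5, 0.0))   # 5.0, the ideal case with no hazards
print(pipeline_speedup(5, 0.25))  # 4.0, with a hypothetical 0.25 stall cycles per instruction
```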

    If anyone has additional comments, please speak up.