MIPS Pipelining


MIPS Pipelining. Chapter 4, Sections 4.5 – 4.8. Dr. Iyad F. Jafar. Outline: Introduction; Why Pipelining?; MIPS Pipelined Datapath; MIPS Pipelined Control; Pipelining Hazards (Structural Hazards, Data Hazards, Control Hazards); Exceptions and Interrupts; Fallacies and Pitfalls.


Outline

  Introduction
  Why Pipelining?
  MIPS Pipelined Datapath
  MIPS Pipelined Control
  Pipelining Hazards
    Structural Hazards
    Data Hazards
    Control Hazards
  Exceptions and Interrupts
  Fallacies and Pitfalls
  Reading Assignment

Introduction

Single-cycle datapath
  Simple!
  Hardware replication?
  Cycle time?
Multi-cycle datapath
  More involved
  Less hardware replication of major units
  Better performance if the delays of the major functional units are balanced!
Can we do any better? Pipelining!

Introduction: Pipelining

In the multi-cycle datapath, only one major unit is used in each cycle while the other units are idle! Why not use them to do something else? Basically, start the next instruction before the current one is finished!

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
LW        IFetch   Dec      Exec     Mem      WB
SW                 IFetch   Dec      Exec     Mem      WB
R-type                      IFetch   Dec      Exec     Mem      WB

The time required to execute one instruction (instruction latency) is not affected! However, the number of instructions finished per unit time (throughput) is increased. Thus, pipelining improves throughput, not latency! Most modern processors are pipelined!

Notes
  As in multi-cycle, the cycle time is determined by the slowest unit!
  However, similar to single-cycle, we can get one instruction done every cycle!
  It is assumed that all instructions take the same number of cycles!
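The overlap described above can be sketched in a few lines of Python. This is only an illustration under the ideal assumptions stated in the notes (every instruction uses all five stages, no hazards or stalls); the function names `pipeline_schedule`, `total_cycles`, and `render` are made up for this sketch and do not come from the slides.

```python
# Stage names as used on the slides.
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def pipeline_schedule(instructions, stages=STAGES):
    """Return [(name, {cycle: stage})] for an ideal, hazard-free pipeline."""
    schedule = []
    for i, name in enumerate(instructions):
        # Instruction i enters the first stage in cycle i + 1 and then
        # advances exactly one stage per cycle.
        row = {i + 1 + s: stage for s, stage in enumerate(stages)}
        schedule.append((name, row))
    return schedule


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions: fill the pipeline, then one per cycle."""
    return n_stages + n_instructions - 1


def render(schedule):
    """Format the schedule as a plain-text timing table."""
    last = max(max(row) for _, row in schedule)
    header = "".join(f"{'C' + str(c):<8}" for c in range(1, last + 1))
    lines = [f"{'':<8}" + header]
    for name, row in schedule:
        cells = "".join(f"{row.get(c, ''):<8}" for c in range(1, last + 1))
        lines.append(f"{name:<8}" + cells)
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(render(pipeline_schedule(["lw", "sw", "R-type"])))
```

Each instruction still occupies five cycles (latency is unchanged), but a new one starts every cycle, so three instructions finish in 7 cycles rather than 15.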

Introduction

[Figure: clock-cycle timing of lw, sw, and R-type instructions under the single-cycle, multiple-cycle, and pipeline implementations. Labels: Clk, Cycle 1 to Cycle 10, IFetch, Dec, Exec, Mem, WB, Waste.]

Why Pipelining?

For performance!

[Figure: instructions Inst 1 to Inst 5 in program order versus time in clock cycles, each passing through IM, Reg, ALU, DM, Reg, with the time to fill the pipeline marked.]

Once the pipeline is full, one instruction is completed every cycle, so CPI = 1 (similar to single-cycle).

Why Pipelining?

Example 1. Comparing pipelining to single-cycle

Consider a program that consists of a large number of LOAD instructions only and is executed on a single-cycle CPU and on a 5-stage pipelined CPU. In both cases, the operation time of each major unit (memory, ALU, and register file) is 200 ps.

1) Determine the time required to finish executing 1,000,000 LOAD instructions and compute the speedup of pipelining.
2) Determine the time required to finish executing the first 3 LOAD instructions.
3) Repeat (1) and (2) if the delay of the register file is 100 ps instead of 200 ps.

Cycle times for the two implementations:
  CC_SC = 200 + 200 + 200 + 200 + 200 = 1000 ps
  CC_PP = 200 ps
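As a quick check of these cycle times, here is a minimal Python sketch. It assumes a LOAD passes through five steps (instruction memory, register read, ALU, data memory, register write) and that the single-cycle clock must cover their sum while the pipelined clock only has to cover the slowest one. The helper names and the list layout are illustrative, not taken from the slides.

```python
def single_cycle_clock(stage_delays_ps):
    """Single-cycle: one long clock period covering every step of the instruction."""
    return sum(stage_delays_ps)


def pipelined_clock(stage_delays_ps):
    """Pipelined: the clock period is set by the slowest stage."""
    return max(stage_delays_ps)


# Delays in ps for a LOAD, in the order: instruction memory, register read,
# ALU, data memory, register write (assumed stage order for this sketch).
all_200 = [200, 200, 200, 200, 200]
fast_regfile = [200, 100, 200, 200, 100]  # register file at 100 ps, as in part 3

if __name__ == "__main__":
    for label, delays in [("all units 200 ps", all_200),
                          ("register file 100 ps", fast_regfile)]:
        print(label, single_cycle_clock(delays), pipelined_clock(delays))
```

Note that the faster register file shortens the single-cycle clock but leaves the pipelined clock at 200 ps, because the slowest stage is unchanged.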


1) Determine the time required to finish executing 1,000,000 LOAD instructions and compute the speedup of pipelining.

  Time_SC = 1000 ps × 1,000,000 = 1,000,000,000 ps
  Time_PP = 1000 ps + 200 ps × 999,999 = 200,000,800 ps
  Speedup = 1,000,000,000 / 200,000,800 = 4.99998 (very close to the number of stages)

After 200 × 5 = 1000 ps, the pipeline is full and we get one instruction per cycle afterwards.

2) Determine the time required to finish executing the first 3 LOAD instructions and compute the speedup of pipelining.

  Time_SC = 1000 × 3 = 3000 ps
  Time_PP = 200 × 5 + 200 + 200 = 1400 ps
  Speedup = 3000 / 1400 = 2.14 (less than the number of stages)
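The two calculations above follow one pattern: the pipeline needs as many cycles as it has stages to complete the first instruction, then one more cycle for each remaining instruction. A small Python sketch of that pattern is given below, assuming an ideal pipeline with no stalls; the function names are hypothetical and chosen only for this illustration.

```python
def time_single_cycle(n_instructions, clock_ps):
    """Single-cycle: every instruction takes one full (long) clock period."""
    return n_instructions * clock_ps


def time_pipelined(n_instructions, clock_ps, n_stages=5):
    """Ideal pipeline: n_stages cycles for the first instruction,
    then one additional cycle per remaining instruction."""
    return (n_stages + n_instructions - 1) * clock_ps


def speedup(n_instructions, clock_sc_ps, clock_pp_ps, n_stages=5):
    """Ratio of single-cycle time to pipelined time for the same program."""
    return (time_single_cycle(n_instructions, clock_sc_ps)
            / time_pipelined(n_instructions, clock_pp_ps, n_stages))


if __name__ == "__main__":
    for n in (3, 1_000_000):
        print(n,
              time_single_cycle(n, 1000),
              time_pipelined(n, 200),
              round(speedup(n, 1000, 200), 5))
```

As the instruction count grows, the fill time becomes negligible and the speedup approaches the ratio of the two clock periods, which equals the number of stages only when the stage delays are balanced.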

3) Repeat (1) and (2) if the delay of the register file is 100 ps.

  CC_SC = 200 + 100 + 200 + 200 + 100 = 800 ps
  CC_PP = 200 ps

For 1,000,000 instructions:

  Time_SC = 800 × 1,000,000 = 800,000,000 ps
  Time_PP = 1000 + 200 × 999,999 = 200,000,800 ps
  Speedup = 800,000,000 / 200,000,800 = 3.99998 (