Basic MIPS Architecture: Single-Cycle Datapath and Control


Description: Basic MIPS Architecture: Single-Cycle Datapath and Control. Chapter 4, Sections 4.1 – 4.4; Appendix D.1 and D.2. Dr. Iyad F. Jafar. Outline: Introduction, Clocking, Single-cycle Datapath, Single-cycle Control, Performance Analysis.

Transcript of Basic MIPS Architecture: Single-Cycle Datapath and Control

Basic MIPS Architecture: Single-Cycle Datapath and Control
Chapter 4, Sections 4.1 – 4.4; Appendix D.1 and D.2
Dr. Iyad F. Jafar

Outline
- Introduction
- Clocking
- Single-cycle Datapath
- Single-cycle Control
- Performance Analysis

Introduction
So far, we have built a small ALU: ADD, SUB, SLT, AND, OR, ...
What about:
- Memory and registers?
- Control operations?
- Interpreting (decoding) instructions?

The big picture
- The CPU's datapath deals with moving data around
- The CPU's control manages the data

Generic implementation
- Fetch (PC = PC + 4)
- Decode
- Execute

Clocking
The clocking methodology defines when signals can be read and when they are written.
- An edge-triggered methodology
- Typical execution: read the contents of state elements, send the values through combinational logic, and write the results to one or more state elements
- This assumes state elements are written on every clock cycle; if not, an explicit write control signal is needed. The write then occurs only when both the write control is asserted and the clock edge occurs.
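The read, compute, write pattern described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original slides: the class `StateElement`, its methods, and the `increment` function standing in for combinational logic are all hypothetical names.

```python
class StateElement:
    """Hypothetical model of an edge-triggered state element with write enable."""

    def __init__(self, value=0):
        self.value = value      # output visible during the current cycle
        self._next = value      # data waiting at the input
        self._write = False     # write control sampled at the clock edge

    def read(self):
        return self.value

    def set_input(self, data, write=True):
        # Drive the data input and the write control for this cycle.
        self._next = data
        self._write = write

    def clock_edge(self):
        # The stored value changes only if write is asserted at the edge.
        if self._write:
            self.value = self._next
        self._write = False


def increment(x):
    """Stand-in for a block of combinational logic (an assumption)."""
    return (x + 1) & 0xFFFFFFFF


reg = StateElement(7)
reg.set_input(increment(reg.read()), write=True)
before_edge = reg.read()        # old value is still visible before the edge
reg.clock_edge()
after_edge = reg.read()         # new value appears after the edge

reg.set_input(99, write=False)  # write control deasserted
reg.clock_edge()
unchanged = reg.read()          # the edge alone does not update the element
```

The last three lines show why a write signal is needed for elements that are not updated every cycle: without it, every clock edge would overwrite the stored value.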

[Figure (Clocking): a state element feeds combinational logic, which feeds a second state element; the path between them takes one clock cycle.]

Single-Cycle Datapath
The first implementation considered

All instructions start and finish execution in one cycle!

This includes the time required to fetch, decode, and execute the instruction.

In the following, we will consider the datapath of each of these steps.

Fetch Datapath
Fetching the instruction from memory requires:
- Sending the PC to memory to read the instruction
- Updating the PC to point to the next instruction

Do we need an explicit write signal for writing the PC? Do we need an explicit read signal for reading the instruction memory?
No, they are read and written on every cycle!

[Figure: the PC drives the Read Address input of the Instruction Memory, which outputs the instruction; an adder computes PC + 4.]

Decode Datapath
Regardless of the instruction:
- Send the opcode (bits 31-26) and the function (bits 5-0) fields of the instruction to the control unit
- Read two registers: rs (bits 25-21) and rt (bits 20-16). Reading is not harmful: a value that the instruction does not need is simply ignored.
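As a small sketch of the decode step, the bit positions listed above can be extracted with shifts and masks. The function name `decode_fields` is an assumption made for illustration; the example word is the standard encoding of `add $8, $9, $10`.

```python
def decode_fields(instr):
    """Extract the fields that decode always reads from a 32-bit instruction."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26, sent to the control unit
        "rs": (instr >> 21) & 0x1F,      # bits 25-21, first register to read
        "rt": (instr >> 16) & 0x1F,      # bits 20-16, second register to read
        "funct": instr & 0x3F,           # bits 5-0, sent to the control unit
    }


# add $8, $9, $10: opcode 0, rs 9, rt 10, rd 8, shamt 0, funct 0x20
ADD_WORD = 0x012A4020
fields = decode_fields(ADD_WORD)
```

Every field is extracted unconditionally, mirroring the hardware: the datapath reads rs and rt before it knows whether the instruction needs them.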

[Figure: the instruction supplies Read Addr 1 and Read Addr 2 to the Register File, which outputs Read Data 1 = R[rs] and Read Data 2 = R[rt]; the opcode and function fields go to the Control Unit.]

Inside the Register File
How can we read a register out of 32 registers?
[Figure: Register 0 through Register 31 feed two 32-to-1 multiplexors; Read Register 1 and Read Register 2 select Read Data 1 and Read Data 2.]

How can we write a register out of 32 registers?
[Figure: a 5-to-32 decoder on the Register Number selects one of Register 0 through Register 31; the figure labels the Write and Clock signals, the Write Data input, and the D and C inputs of each register.]

Execution Datapath: R-type Instructions (ADD, SUB, SLT, AND, OR)
- The two registers have already been read!
- Perform the operation based on the opcode and function fields
- Store the result back into the register file; the destination register is specified in the rd field of the instruction (bits 15-11)

The register file is not written on every cycle, so it needs an explicit write signal!
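A minimal Python sketch of the register file and the R-type execute step follows. It is a behavioral model only: the class `RegisterFile`, the helper `to_signed`, and the string operation names passed to `alu` are illustrative assumptions, and the hard-wired zero register of MIPS is not modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Behavioral ALU for the five operations named on the slide."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unsupported operation: {op}")


class RegisterFile:
    """32 registers, two read ports, one write port gated by a write signal."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, addr1, addr2):
        # Each read port behaves like a 32-to-1 multiplexor.
        return self.regs[addr1], self.regs[addr2]

    def clock_edge(self, write_addr, write_data, reg_write):
        # Only the selected register changes, and only when the write
        # signal is asserted at the clock edge.
        if reg_write:
            self.regs[write_addr] = write_data & MASK32


rf = RegisterFile()
rf.regs[9], rf.regs[10] = 5, 7
a, b = rf.read(9, 10)                                # R[rs], R[rt]
rf.clock_edge(8, alu("add", a, b), reg_write=True)   # add: rd = 8
sum_result = rf.regs[8]
rf.clock_edge(8, 0, reg_write=False)                 # no write this cycle
kept_result = rf.regs[8]
```

The second `clock_edge` call leaves register 8 untouched, which is the behavior the explicit write signal exists to guarantee for instructions that do not write a register.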

[Figure: R-type datapath. Read Data 1 (R[rs]) and Read Data 2 (R[rt]) feed the ALU, which is driven by ALU Control; the ALU result returns to the Register File Write Data input, and RegWrite enables the write.]

Execution Datapath: Load Instruction
- Compute the load address
- Store the loaded data in the register file; the destination register is the rt field of the instruction (bits 20-16)

[Figure: load/store datapath. The R-type datapath extended with a Sign Ext. unit and a Data Memory with Address, Write Data, and Data ports, controlled by MemRead and MemWrite.]

Execution Datapath: Store Instruction
- Compute the store address
- Store the register value in memory

[Figure: the same load/store datapath as for the load instruction.]

Execution Datapath: Branch Instruction
- Compare the two registers
- Compute the branch address
- Change the PC if the comparison is true!

[Figure: branch datapath. The ALU compares the two register values and produces Zero; the sign-extended offset is multiplied by 4 and added to PC+4 to form the branch address; Zero selects between PC+4 and the branch address.]

Execution Datapath: Jump Instruction
- Compute the jump address
- Store it in the PC
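The branch and jump address computations can be sketched as a single next-PC function. This is an illustrative model under stated assumptions: the function names `sign_extend16` and `next_pc` and the boolean parameters are hypothetical, while the arithmetic follows the standard MIPS rules (offset times 4 added to PC+4 for a branch; the 26-bit target times 4 combined with the upper four bits of PC+4 for a jump).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python integer."""
    imm = value & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc, instr, branch=False, zero=False, jump=False):
    """Select the next PC the way the PC multiplexors would."""
    pc_plus_4 = (pc + 4) & MASK32
    if jump:
        # Upper 4 bits of PC+4 joined with the 26-bit target shifted left by 2.
        return (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    if branch and zero:
        # Branch taken: PC+4 plus the sign-extended offset times 4.
        return (pc_plus_4 + (sign_extend16(instr) << 2)) & MASK32
    return pc_plus_4


PC = 0x00400000
BEQ_FORWARD = 0x11090003   # beq $8, $9, +3 words
BEQ_BACKWARD = 0x1109FFFF  # beq $8, $9, -1 word
JUMP = 0x08100008          # j 0x00400020

sequential = next_pc(PC, 0)
taken = next_pc(PC, BEQ_FORWARD, branch=True, zero=True)
not_taken = next_pc(PC, BEQ_FORWARD, branch=True, zero=False)
backward = next_pc(PC, BEQ_BACKWARD, branch=True, zero=True)
jumped = next_pc(PC, JUMP, jump=True)
```

A backward branch with offset -1 lands on the branch instruction itself, because the offset is relative to PC+4 rather than to the branch's own address.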

[Figure: jump datapath. The fetch datapath extended with a x4 unit that forms the jump address from the instruction; the Jump signal selects between PC+4 and the jump address.]

Creating the Single Datapath
Assemble the datapath segments and add control lines and multiplexors as needed.
Single-cycle design:
- Fetch, decode, and execute each instruction in one clock cycle
- No datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders)
- Multiplexors are needed at the input of shared elements, with control lines to do the selection
- Write signals control writing to the Register File and Data Memory
Cycle time is determined by the length of the longest path.

[Figure: the complete single-cycle datapath. It contains the PC, the Instruction Memory (output Instr[31-0]), the Register File, the Sign Extend unit (16 to 32 bits), the ALU (ovf and zero outputs) with its ALU control, the Data Memory, the PC + 4 adder, the branch adder fed by Shift left 2, and the jump path built from Instr[25-0] shifted left by 2 and PC[31-28]. The Control Unit takes Instr[31-26] and produces RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, Jump, and ALUOp; PCSrc selects the next PC. Instr[25-21], Instr[20-16], and Instr[15-11] address the Register File, Instr[15-0] feeds Sign Extend, and Instr[5-0] feeds the ALU control.]

Single-Cycle Control
We need to design the control that generates the appropriate control signals based on the opcode and function fields to:
- Specify the operation of the ALU
- Control the data flow by selecting the appropriate input of the multiplexors

Observations across different instructions:
- The op field is always in bits 31-26 of the instruction
- The addresses of the registers to be read are always specified by the rs field (bits 25-21) and the rt field (bits 20-16)
- For LW and SW, the rs field is the base register
- The address of the register to be written is in one of two places: the rt field (bits 20-16) for LW, and the rd field (bits 15-11) for R-type
- The offset for BEQ, LW, and SW is always in bits 15-0 of the instruction
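Two of these observations translate directly into small selection functions: choosing the write register between rt and rd, and forming the LW/SW address from the base register and the 16-bit offset. The sketch below is illustrative only; `write_register`, `offset`, and `effective_address` are hypothetical names, and the `reg_dst` argument plays the role of the RegDst multiplexor select.

```python
MASK32 = 0xFFFFFFFF


def write_register(instr, reg_dst):
    """Pick the destination register: rd when reg_dst is 1, rt when it is 0."""
    rt = (instr >> 16) & 0x1F   # bits 20-16, used by LW
    rd = (instr >> 11) & 0x1F   # bits 15-11, used by R-type
    return rd if reg_dst else rt


def offset(instr):
    """Sign-extended 16-bit offset from bits 15-0."""
    imm = instr & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(instr, regs):
    """LW/SW address: base register rs plus the sign-extended offset."""
    rs = (instr >> 21) & 0x1F
    return (regs[rs] + offset(instr)) & MASK32


ADD_WORD = 0x012A4020   # add $8, $9, $10
LW_WORD = 0x8D28FFFC    # lw $8, -4($9)

regs = [0] * 32
regs[9] = 0x10010010

r_type_dest = write_register(ADD_WORD, reg_dst=1)
lw_dest = write_register(LW_WORD, reg_dst=0)
lw_address = effective_address(LW_WORD, regs)
```

Because the fields sit at fixed positions, both candidate register numbers can be extracted in parallel and a single control bit chooses between them.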

Single-Cycle Control

| Signal name | Effect when deasserted (0) | Effect when asserted (1) |
|---|---|---|
| RegDst | The destination register is from the rt field | The destination register is from the rd field |
| RegWrite | None | Enable writing to the register selected by the Write register port |
| ALUSrc | The second ALU operand comes from the second register file output | The second ALU operand is the sign-extended offset |
| PCSrc | PC value is PC + 4 | PC is the branch address |
| MemRead | None | Contents of the memory address are put on the Read data output |
| MemWrite | None | Data on the Write data input is placed in the specified address |
| MemtoReg | The data fed to the register file Write data input comes from the ALU | The data fed to the register file Write data input comes from memory |

ALUOp: used with the function field of the instruction to generate the ALU control signals that specify the ALU operation.

[Figures: the complete single-cycle datapath repeated under four titles: R-type Instruction Data/Control Flow, Load Word Instruction Data/Control Flow, Branch Equal Instruction Data/Control Flow, and Jump Instruction Data/Control Flow.]

Single-Cycle Control: The Main Control Unit
- The input is the Op field (6 bits) from the instruction
- The output is nine control signals
The truth table:

| Instruction | Op5-Op0 | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0 |
|---|---|---|---|---|---|---|---|---|---|---|
| R-type | 000000 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| LW | 100011 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| SW | 101011 | X | 1 | X | 0 | 0 | 1 | 0 | 0 | 0 |
| BEQ | 000100 | X | 0 | X | 0 | 0 | 0 | 1 | 0 | 1 |

Why split the control into main control and ALU control? If we use one unit, it will have (6+6) inputs and (7+3) outputs. If split, the main control is 6x9 and the ALU control is (6+2)x3. This reduces the complexity and cost!

ALUOp selects among three types of operations:
- 00: LW and SW, addition
- 01: BEQ, subtraction
- 10: R-type, depending on the Func field
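The main control truth table above can be written as a lookup keyed by opcode. This is a behavioral sketch rather than the hardware realization; the `Control` tuple, the `MAIN_CONTROL` dictionary, and the use of `None` for don't-care (X) entries are illustrative choices.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]      # None marks a don't-care (X) entry
    alu_src: int
    mem_to_reg: Optional[int]   # None marks a don't-care (X) entry
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int                 # two bits: ALUOp1, ALUOp0


# One row per instruction class, copied from the truth table.
MAIN_CONTROL = {
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-type
    0b100011: Control(0, 1, 1, 1, 1, 0, 0, 0b00),        # LW
    0b101011: Control(None, 1, None, 0, 0, 1, 0, 0b00),  # SW
    0b000100: Control(None, 0, None, 0, 0, 0, 1, 0b01),  # BEQ
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode from the table."""
    return MAIN_CONTROL[opcode]


lw_signals = main_control(0b100011)
beq_signals = main_control(0b000100)
```

Only the two instructions that write a register (R-type and LW) have `reg_write` set, and only for those two do `reg_dst` and `mem_to_reg` carry a defined value.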

Single-Cycle Control: The Main Control Unit
To design the logic circuit, generate the appropriate minterms for each output signal. Simply, use a PLA!

Single-Cycle Control: The ALU Control Unit
It has two inputs:
- ALUOp (2 bits) from the main control
- Func (6 bits) from the instruction
It has two outputs:
- Bnegate (1 bit)
- Operation (2 bits)
Supported operations:

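| Function | Bnegate | Operation |
|---|---|---|
| and | 0 | 00 |
| or | 0 | 01 |
| add | 0 | 10 |
| sub | 1 | 10 |
| slt | 1 | 11 |

[Figure: the ALU control block takes ALUOp and Func as inputs and produces Bnegate and Operation.]

Single-Cycle Control: The ALU Control Unit
Truth table:

| Instruction | ALUOp1 ALUOp0 | F5-F0 | Bnegate | Operation1 | Operation0 |
|---|---|---|---|---|---|
| AND | 10 | 100100 | 0 | 0 | 0 |
| OR | 10 | 100101 | 0 | 0 | 1 |
| ADD | 10 | 100000 | 0 | 1 | 0 |
| SUB | 10 | 100010 | 1 | 1 | 0 |
| SLT | 10 | 101010 | 1 | 1 | 1 |
| LW | 00 | n/a | 0 | 1 | 0 |
| SW | 00 | n/a | 0 | 1 | 0 |
| BEQ | 01 | n/a | 1 | 1 | 0 |

Single-Cycle Control: The ALU Control Unit
Hardware implementation:
- Generating minterms!
- Minimization!
- By inspection!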
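The ALU control truth table can likewise be sketched as a function of ALUOp and Func. The names `FUNCT_TO_ALU` and `alu_control` are assumptions for illustration, and the function returns a `(bnegate, operation)` pair rather than individual wires.

```python
# (Bnegate, Operation) for each R-type function code in the truth table.
FUNCT_TO_ALU = {
    0b100100: (0, 0b00),  # AND
    0b100101: (0, 0b01),  # OR
    0b100000: (0, 0b10),  # ADD
    0b100010: (1, 0b10),  # SUB
    0b101010: (1, 0b11),  # SLT
}


def alu_control(alu_op, funct=0):
    """Map the 2-bit ALUOp and the 6-bit Func field to (Bnegate, Operation)."""
    if alu_op == 0b00:          # LW and SW: add, Func is ignored
        return (0, 0b10)
    if alu_op == 0b01:          # BEQ: subtract, Func is ignored
        return (1, 0b10)
    if alu_op == 0b10:          # R-type: decided by Func
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


lw_alu = alu_control(0b00)
beq_alu = alu_control(0b01)
slt_alu = alu_control(0b10, 0b101010)
```

BEQ and SUB produce the same pair, since both subtract; only the main control decides whether the result is written to a register or used through Zero.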

Performance Analysis
All instructions have to finish in one cycle! How long is the cycle time?
- Different units are used in different instructions
- Each unit has its own delay
- Need to find the longest path!

Assume the following times:

| Unit | Delay |
|---|---|
| ALU | 2 ns |
| Memory | 2 ns |
| Register File | 1 ns |

| Instruction | Units used | Total |
|---|---|---|
| R-type | Instr. Fetch, Register Read, ALU, Register Write | 6 ns |
| LW | Instr. Fetch, Register Read, ALU, Memory Read, Register Write | 8 ns |
| SW | Instr. Fetch, Register Read, ALU, Memory Write | 7 ns |
| Branch | Instr. Fetch, Register Read, ALU | 5 ns |
| Jump | Instr. Fetch | 2 ns |

Thus, the cycle time should be at least 8 ns.

The cycle time is fixed! However, not all instructions require the same time, so there is wasted time for some instructions.
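The longest-path calculation above can be reproduced with a short script. The unit delays and the units used by each instruction come from the slide; the dictionary names and the `wasted_ns` figure (cycle time minus the instruction's own latency) are added here for illustration.

```python
# Unit delays in nanoseconds, as assumed on the slide.
DELAY_NS = {"memory": 2, "register_file": 1, "alu": 2}

# Units each instruction passes through, in order.
PATHS = {
    "R-type": ["memory", "register_file", "alu", "register_file"],
    "LW": ["memory", "register_file", "alu", "memory", "register_file"],
    "SW": ["memory", "register_file", "alu", "memory"],
    "Branch": ["memory", "register_file", "alu"],
    "Jump": ["memory"],
}

# Latency of each instruction is the sum of the delays along its path.
latency_ns = {name: sum(DELAY_NS[u] for u in units) for name, units in PATHS.items()}

# The single-cycle clock must cover the slowest instruction.
cycle_time_ns = max(latency_ns.values())

# Idle time per instruction when every cycle is that long.
wasted_ns = {name: cycle_time_ns - t for name, t in latency_ns.items()}
```

With these numbers the load sets the cycle time, and a jump spends 6 of its 8 ns idle.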

Possible solution?
[Figure: timing diagram showing the clock, LW in Cycle 1, SW in Cycle 2, and a region marked as waste.]

Performance Analysis
Example 1. Consider the following two implementations of a single-cycle machine:
- Machine A: all instructions execute in one cycle of fixed length
- Machine B: all instructions execute in one cycle; however, the cycle time adapts to the instruction type

Use the information given in the tables to compare the two machines.

| Unit | Time (ps) |
|---|---|
| Memory | 200 |
| ALU and adders | 100 |
| Register File | 50 |

| Instruction type | Percentage (%) |
|---|---|
| ALU | 45 |
| Load | 25 |
| Store | 10 |
| Branch | 15 |
| Jump | 5 |

Example 1, continued.
CPU Execution Time = IC x CPI x Clock cycle time
CPI_A = CPI_B = 1 and IC_A = IC_B, so only the clock cycle times differ.
CC_A = 600 ps (the time of the slowest instruction, the load)
CC_B = 600 x 0.25 + 550 x 0.10 + 400 x 0.45 + 350 x 0.15 + 200 x 0.05 = 447.5 ps
performance_B / performance_A = CC_A / CC_B = 600 / 447.5 = 1.34
So the adaptive clock cycle is faster; however, it is hard to implement!
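The arithmetic of Example 1 can be checked with a few lines of Python. The unit times and instruction mix are the ones given in the example; the variable names are assumptions, and the mix is kept in integer percent so the weighted sum is exact.

```python
# Unit times in picoseconds, from the example.
UNIT_PS = {"memory": 200, "alu": 100, "register_file": 50}

# Instruction mix in percent, from the example.
MIX_PERCENT = {"ALU": 45, "Load": 25, "Store": 10, "Branch": 15, "Jump": 5}

# Units used by each instruction type.
PATHS = {
    "ALU": ["memory", "register_file", "alu", "register_file"],
    "Load": ["memory", "register_file", "alu", "memory", "register_file"],
    "Store": ["memory", "register_file", "alu", "memory"],
    "Branch": ["memory", "register_file", "alu"],
    "Jump": ["memory"],
}

latency_ps = {name: sum(UNIT_PS[u] for u in units) for name, units in PATHS.items()}

# Machine A: fixed cycle, long enough for the slowest instruction.
cc_a = max(latency_ps.values())

# Machine B: average cycle, weighted by how often each type occurs.
cc_b = sum(MIX_PERCENT[name] * latency_ps[name] for name in PATHS) / 100

# With equal IC and CPI, the performance ratio equals the cycle-time ratio.
speedup = cc_a / cc_b
```

Changing `MIX_PERCENT` shows how sensitive the result is to the workload: a program dominated by loads gains almost nothing from the adaptive clock.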

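Time used by each instruction type (ps):

| Instruction type | Inst. Memory | Register Read | ALU | Data Memory | Register Write | Total |
|---|---|---|---|---|---|---|
| R-type | 200 | 50 | 100 | 0 | 50 | 400 |
| Load | 200 | 50 | 100 | 200 | 50 | 600 |
| Store | 200 | 50 | 100 | 200 | | 550 |
| Branch | 200 | 50 | 100 | 0 | | 350 |
| Jump | 200 | | | | | 200 |

Single Cycle Disadvantages & Advantages
The single-cycle implementation assumes that all instructions can execute in one cycle.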

Advantages
- Simple and easy to understand

Disadvantages
- Hardware duplication!
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction (especially problematic for more complex instructions like floating-point multiply)
