Major CPU Design Steps


Transcript of Major CPU Design Steps

Page 1: Major CPU Design Steps

EECC550 - Shaaban, #1, Lec # 5, Winter 2008, 1-6-2009

Major CPU Design Steps

1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
   – This provides the required datapath components and how they are connected to meet ISA requirements.

2. Select required datapath components and connections, and establish the clock methodology (e.g. clock edge-triggered).

3. Assemble a datapath meeting the requirements.

4. Identify and define the function of all control points or signals needed by the datapath.
   – Analyze the implementation of each instruction to determine the setting of the control points that affect its operations and register transfers.

5. Design and assemble the control logic.
   – Hard-wired: finite-state machine implementation.
   – Microprogrammed (i.e. using a control program).

(Chapter 5.5)

Datapath: steps 1-3. Control: steps 4-5.

Determine the number of cycles per instruction and the operations in each cycle.

Page 2: Major CPU Design Steps

Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle

[Figure: single-cycle MIPS datapath. Components: PC, instruction memory, PC+4 adder and branch-target adder, 32 x 32-bit register file (Rw, Ra, Rb; busA, busB, busW), 16-to-32-bit extender, main ALU with ALU control (function field), data memory, multiplexors. Instruction fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Control signals: RegDst, RegWr, ALUSrc, ExtOp, ALUop (2 bits), MemWr, MemtoReg, Branch, PCSrc. The ALU Zero output is used for branches.]

The main ALU includes ORI (not in the book version).

Jump is not included.

T = I x CPI x C

Page 3: Major CPU Design Steps

Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added

Figure 5.24 page 314

[Figure: single-cycle MIPS datapath with control unit and jump support. The control unit takes the opcode, Instruction [31–26], and drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. The ALU control takes the function field, Instruction [5–0]. Register numbers come from Instruction [25–21] (rs), [20–16] (rt), and [15–11] (rd); imm16 is Instruction [15–0], sign-extended from 16 to 32 bits. The jump address [31–0] is formed from Instruction [25–0] shifted left 2 (26 to 28 bits) and PC + 4 [31–28].]

ALUOp (2 bits): 00 = add, 01 = subtract, 10 = R-Type.

In this book version, ORI is not supported, so no zero extension of the immediate is needed.

Book figure may have an error!

Page 4: Major CPU Design Steps

Drawbacks of Single-Cycle Processor (CPI = 1)

1. Long cycle time:
   – All instructions must take as much time as the slowest:
     • Cycle time for load is longer than needed for all other instructions.
   – Real memory is not as well-behaved as idealized memory:
     • Cannot always complete data access in one (short) cycle.

2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle.
   • e.g. indirect memory addressing.

3. High and duplicate hardware resource requirements:
   – No hardware functional unit can be used more than once in a single cycle (e.g. ALUs).

4. Cannot pipeline (overlap) the processing of one instruction with the previous instructions.
   – (Instruction pipelining, Chapter 6.)

Page 5: Major CPU Design Steps

Abstract View of Single Cycle CPU

[Figure: abstract single-cycle CPU. Stages: Next PC / PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt (Result Store). Main control takes op and fun, and together with ALU control drives ALUctr, RegDst, ALUSrc, ExtOp, MemRd, MemWr, RegWr, Branch, Jump; Equal comes back from the datapath. Stage delays shown: 2 ns, 1 ns, 2 ns, 2 ns, 1 ns.]

One CPU clock cycle, duration C = 8 ns.

One instruction per cycle, CPI = 1.

Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns

Page 6: Major CPU Design Steps

Single Cycle Instruction Timing

Arithmetic & Logical: PC | Inst Memory | Reg File | mux | ALU | mux | setup
Load:                 PC | Inst Memory | Reg File | mux | ALU | Data Mem | mux | setup
Store:                PC | Inst Memory | Reg File | mux | ALU | Data Mem
Branch:               PC | Inst Memory | Reg File | cmp | mux

Critical Path: Load (determines CPU clock cycle, C)
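The per-instruction timing can be checked with a small calculation. The sketch below is only an illustration: it uses the component delays assumed on the neighbouring slides (memory 2 ns, ALU and adders 2 ns, register file 1 ns), ignores mux, setup, and control delays, and assumes the branch comparison costs one ALU delay. The names `DELAY_NS`, `PATHS`, `path_delay_ns`, and `critical_path` are hypothetical.

```python
# Component delays in ns (from the slides' stated assumptions).
DELAY_NS = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Major components on each instruction class's path.
# Assumption: mux, setup, and control delays are ignored, and the
# branch comparison ("cmp") is charged one ALU delay.
PATHS = {
    "arith_logic": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "alu"],
}


def path_delay_ns(kind):
    """Sum the component delays along one instruction class's path."""
    return sum(DELAY_NS[c] for c in PATHS[kind])


def critical_path():
    """Return the slowest instruction class and its delay in ns."""
    kind = max(PATHS, key=path_delay_ns)
    return kind, path_delay_ns(kind)


if __name__ == "__main__":
    for kind in PATHS:
        print(kind, path_delay_ns(kind), "ns")
    print("critical path:", critical_path())
```

Under these assumptions the load path is the longest, which matches the 8 ns single-cycle clock quoted on the slides.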

Page 7: Major CPU Design Steps

Clock Cycle Time & Critical Path

• Critical path: the slowest path between any two storage devices (i.e. the longest delay; LW in this case).

• Clock cycle time is a function of the critical path, and must be greater than:
  – Clock-to-Q + Longest Delay Path through the Combinational Logic + Setup + Clock Skew

One CPU clock cycle, duration C = 8 ns here.

Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns

Page 8: Major CPU Design Steps

Reducing Cycle Time: Multi-Cycle Design

• Cut the combinational dependency graph by inserting registers / latches.
• The same work is done in two or more shorter cycles, rather than one long cycle.

One long cycle (e.g. CPI = 1):
storage element -> Acyclic Combinational Logic -> storage element

=>

Two shorter cycles (e.g. CPI = 2):
storage element -> Acyclic Combinational Logic (A), Cycle 1 -> storage element -> Acyclic Combinational Logic (B), Cycle 2 -> storage element

Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Storage element: register or memory.

Page 9: Major CPU Design Steps

Basic MIPS Instruction Processing Steps

Instruction Fetch: Obtain instruction from program storage (instruction memory).
    Instruction ← Mem[PC]

Instruction Decode: Determine instruction type (done by the control unit). Obtain operands from registers.

Execute: Compute result value or status.

Result Store: Store result in register/memory if needed (usually called Write Back).

Next Instruction: Update program counter to address of next instruction.
    PC ← PC + 4

Instruction fetch and the program counter update are common steps for all instructions.

Page 10: Major CPU Design Steps

Partitioning The Single Cycle Datapath

Add registers between steps to break into cycles.

[Figure: the abstract single-cycle datapath (Next PC / PC, Instruction Fetch, Operand Fetch, Exec, Mem Access / Data Mem, Result Store / Reg. File) split by registers into five cycles. Control signals shown: ALUctr, RegDst, ALUSrc, ExtOp, MemRd, MemWr, RegWr, Branch, Jump. The fetched instruction also goes to the control unit.]

1. Instruction Fetch Cycle (IF): 2 ns
2. Instruction Decode Cycle (ID): 1 ns
3. Execution Cycle (EX): 2 ns
4. Data Memory Access Cycle (MEM): 2 ns
5. Write Back Cycle (WB): 1 ns

Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Page 11: Major CPU Design Steps

Example Multi-cycle Datapath

[Figure: multi-cycle datapath with registers IR, A, B, R, and M inserted between Instruction Fetch, the register file, Ext / ALU, Mem Access (Data Mem), and register file write-back. Control signals shown: ALUctr, RegDst, ALUSrc, ExtOp, MemRd, MemWr, MemToReg, RegWr, Branch, Jump; Equal comes back from the datapath. The IR contents go to the control unit.]

Registers added (all clock-edge triggered; register write enable control lines not shown):
IR: Instruction register.
A, B: Two registers to hold operands read from the register file.
R (or ALUOut): Holds the output of the main ALU.
M (or Memory Data Register, MDR): Holds data read from data memory.

1. Instruction Fetch (IF): 2 ns
2. Instruction Decode (ID): 1 ns
3. Execution (EX): 2 ns
4. Memory (MEM): 2 ns
5. Write Back (WB): 1 ns

Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns

CPU clock cycle time: worst cycle delay = C = 2 ns (ignoring MUX and CLK-Q delays).

Thus clock rate: f = 1 / 2 ns = 500 MHz
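As a quick check of the clock-rate figure, the sketch below takes the five cycle delays listed on this slide, picks the worst one as the clock period, and converts it to a frequency. It is a minimal illustration that ignores MUX and CLK-Q delays, as the slide does; the names `STAGE_DELAY_NS`, `clock_cycle_ns`, and `clock_rate_mhz` are hypothetical.

```python
# Per-cycle delays in ns for the multi-cycle datapath (from the slide).
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def clock_cycle_ns(stage_delays):
    """The clock period must cover the slowest cycle."""
    return max(stage_delays.values())


def clock_rate_mhz(cycle_ns):
    """Convert a period in ns to a frequency in MHz (1 / 1 ns = 1000 MHz)."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    c = clock_cycle_ns(STAGE_DELAY_NS)
    print(f"C = {c} ns, f = {clock_rate_mhz(c)} MHz")
```

The 1 ns cycles (ID and WB) leave half of each clock period unused, which is why the slides ask for a balanced clock cycle length when placing registers.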

Page 12: Major CPU Design Steps

Operations (Dependent RTN) for Each Cycle

Instruction Fetch (IF) and Instruction Decode (ID) cycles are common for all instructions:
  IF:  IR ← Mem[PC]
  ID:  A ← R[rs]; B ← R[rt]

R-Type:
  EX:  R ← A funct B
  WB:  R[rd] ← R; PC ← PC + 4

Logic Immediate:
  EX:  R ← A OR ZeroExt[imm16]
  WB:  R[rt] ← R; PC ← PC + 4

Load:
  EX:  R ← A + SignEx(Im16)
  MEM: M ← Mem[R]
  WB:  R[rt] ← M; PC ← PC + 4

Store:
  EX:  R ← A + SignEx(Im16)
  MEM: Mem[R] ← B; PC ← PC + 4

Branch:
  EX:  Zero ← A - B
       If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x 4)
       else (i.e. Zero = 0): PC ← PC + 4

Page 13: Major CPU Design Steps

MIPS Multi-Cycle Datapath: Five Cycles of Load

Load: Cycle 1 = IF, Cycle 2 = ID, Cycle 3 = EX, Cycle 4 = MEM, Cycle 5 = WB

1- Instruction Fetch (IF): Fetch the instruction from instruction memory.

2- Instruction Decode (ID): Operand register fetch and instruction decode.

3- Execute (EX): Calculate the effective memory address.

4- Memory (MEM): Read the data from the data memory.

5- Write Back (WB): Write the loaded data to the register file. Update PC.

CPI = 5

Page 14: Major CPU Design Steps

Multi-cycle Datapath Instruction CPI

• R-Type/Immediate: Require four cycles, CPI = 4
  – IF, ID, EX, WB

• Loads: Require five cycles, CPI = 5
  – IF, ID, EX, MEM, WB

• Stores: Require four cycles, CPI = 4
  – IF, ID, EX, MEM

• Branches/Jumps: Require three cycles, CPI = 3
  – IF, ID, EX

• Average or effective program CPI: 3 ≤ CPI ≤ 5, depending on program profile (instruction mix).

Page 15: Major CPU Design Steps

Single Cycle Vs. Multi-Cycle CPU

Single cycle implementation (C = 8 ns, f = 125 MHz):
Cycle 1: Load. Cycle 2: Store, with the unused part of the cycle wasted.

Multiple cycle implementation (C = 2 ns, f = 500 MHz):
Cycles 1-5: Load (IF ID EX MEM WB). Cycles 6-9: Store (IF ID EX MEM). Cycle 10: R-type (IF).

Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns

Single-Cycle CPU: CPI = 1, C = 8 ns
One million instructions take I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec.

Multi-Cycle CPU: CPI = 3 to 5, C = 2 ns
One million instructions take from 10^6 x 3 x 2x10^-9 = 6 msec to 10^6 x 5 x 2x10^-9 = 10 msec, depending on the instruction mix used.

T = I x CPI x C
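The comparison above is a direct application of T = I x CPI x C. The following sketch reproduces the three figures for one million instructions; the function name `exec_time_ms` and its parameter names are hypothetical and chosen only for this illustration.

```python
def exec_time_ms(instr_count, cpi, cycle_ns):
    """Execution time T = I x CPI x C, with C given in ns and T returned in ms."""
    return instr_count * cpi * cycle_ns / 1e6


if __name__ == "__main__":
    million = 10**6
    print("single-cycle:", exec_time_ms(million, 1, 8), "ms")
    print("multi-cycle best case:", exec_time_ms(million, 3, 2), "ms")
    print("multi-cycle worst case:", exec_time_ms(million, 5, 2), "ms")
```

The multi-cycle design is faster than the single-cycle one only when the instruction mix gives an average CPI below 4, since 4 x 2 ns equals the 8 ns single cycle.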

Page 16: Major CPU Design Steps

Finite State Machine (FSM) Control Model

• State specifies control points (outputs) for register transfer.
• Control points (outputs) are assumed to depend only on the current state and not on inputs (i.e. Moore finite state machine).
• Transfer (register/memory writes) and state transition occur upon exiting the state, on the falling edge of the clock.

State X: register transfer control points. The state transition (last state -> State X -> next state) depends on inputs.

Control unit design (Moore finite state machine):
• Control state register (e.g. flip-flops): holds the current state.
• Next-state logic: takes the inputs (opcode, conditions) and the current state, and produces the next state.
• Output logic: takes the current state and produces the outputs (control points) sent to the datapath.

Page 17: Major CPU Design Steps

Control Specification For Multi-cycle CPU
Finite State Machine (FSM) - State Transition Diagram

“instruction fetch” (start state): IR ← MEM[PC]

“decode / operand fetch”: A ← R[rs]; B ← R[rt]

R-type:
  Execute: R ← A fun B
  Write-back: R[rd] ← R; PC ← PC + 4 (to instruction fetch)

ORi:
  Execute: R ← A or ZX
  Write-back: R[rt] ← R; PC ← PC + 4 (to instruction fetch)

LW:
  Execute: R ← A + SX
  Memory: M ← MEM[R]
  Write-back: R[rt] ← M; PC ← PC + 4 (to instruction fetch)

SW:
  Execute: R ← A + SX
  Memory: MEM[R] ← B; PC ← PC + 4 (to instruction fetch)

BEQ & ~Zero:
  Execute: PC ← PC + 4 (to instruction fetch)

BEQ & Zero:
  Execute: PC ← PC + 4 + SX || 00 (to instruction fetch)

13 states: 4 state flip-flops needed.

Page 18: Major CPU Design Steps

Traditional FSM Controller

[Figure: traditional FSM controller. The state register (4 flip-flops) holds the current state. The next-state logic takes as inputs the current state (4 bits), the opcode (op, 6 bits), and the Equal condition from the datapath, and produces the next state. The output logic produces the outputs (control points, 11) that go to the datapath.]

State transition table columns: state, op, cond, next state, control points.

Page 19: Major CPU Design Steps

Traditional FSM Controller

datapath + state diagram => control

• Translate RTN statements into control points.

• Assign states.

• Implement the controller.

More on FSM controller implementation in Appendix C

Page 20: Major CPU Design Steps

Mapping RTNs To Control Points Examples & State Assignments

State  Code  Cycle                      RTN                           Control points (examples)
0      0000  “instruction fetch”        IR ← MEM[PC]                  imem_rd, IRen
1      0001  “decode / operand fetch”   A ← R[rs]; B ← R[rt]          Aen, Ben
2      0010  Execute (BEQ & Zero)       PC ← PC + 4 + SX || 00
3      0011  Execute (BEQ & ~Zero)      PC ← PC + 4
4      0100  Execute (R-type)           R ← A fun B                   ALUfun, Sen
5      0101  Write-back (R-type)        R[rd] ← R; PC ← PC + 4        RegDst, RegWr, PCen
6      0110  Execute (ORi)              R ← A or ZX
7      0111  Write-back (ORi)           R[rt] ← R; PC ← PC + 4
8      1000  Execute (LW)               R ← A + SX
9      1001  Memory (LW)                M ← MEM[R]
10     1010  Write-back (LW)            R[rt] ← M; PC ← PC + 4
11     1011  Execute (SW)               R ← A + SX
12     1100  Memory (SW)                MEM[R] ← B; PC ← PC + 4

The last state of each instruction returns to instruction fetch (state 0000).

13 states: 4 state flip-flops needed.

Page 21: Major CPU Design Steps

Detailed Control Specification - State Transition Table

Column groups: Current State | Op field | Z | Next State | IR en | PC sel | Ops: A, B | Exec: Ex, Sr, ALU, S | Mem: R, W, M | Write-Back: M-R, Wr, Dst

Current  Op field  Z  Next  Control point values (left to right)
0000     ??????    ?  0001  1                 (IF)
0001     BEQ       0  0011  1 1               (ID)
0001     BEQ       1  0010  1 1
0001     R-type    x  0100  1 1
0001     orI       x  0110  1 1
0001     LW        x  1000  1 1
0001     SW        x  1011  1 1
0010     xxxxxx    x  0000  1 1               (BEQ)
0011     xxxxxx    x  0000  1 0
0100     xxxxxx    x  0101  0 1 fun 1         (R)
0101     xxxxxx    x  0000  1 0 0 1 1
0110     xxxxxx    x  0111  0 0 or 1          (ORI)
0111     xxxxxx    x  0000  1 0 0 1 0
1000     xxxxxx    x  1001  1 0 add 1         (LW)
1001     xxxxxx    x  1010  1 0 1
1010     xxxxxx    x  0000  1 0 1 1 0
1011     xxxxxx    x  1100  1 0 add 1         (SW)
1100     xxxxxx    x  0000  1 0 0 1

Can be combined in one state

More on FSM controller implementation in Appendix C
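The next-state columns of the table can be expressed as a lookup. The sketch below is an illustration only: it encodes the Current State, Op field, Z, and Next State columns from the table, walks the states for one instruction to count its cycles, and computes the number of state flip-flops for 13 states. The names `DISPATCH`, `FIXED_NEXT`, `next_state`, `cycles`, and `state_bits` are hypothetical, and the control point outputs are deliberately left out.

```python
import math

DECODE = 0b0001

# Opcode-dependent transitions out of the decode state (BEQ handled separately).
DISPATCH = {"R-type": 0b0100, "ORi": 0b0110, "LW": 0b1000, "SW": 0b1011}

# Transitions that do not depend on the opcode or the Zero input.
FIXED_NEXT = {
    0b0000: 0b0001,
    0b0010: 0b0000, 0b0011: 0b0000,
    0b0100: 0b0101, 0b0101: 0b0000,
    0b0110: 0b0111, 0b0111: 0b0000,
    0b1000: 0b1001, 0b1001: 0b1010, 0b1010: 0b0000,
    0b1011: 0b1100, 0b1100: 0b0000,
}


def next_state(state, op=None, zero=0):
    """Next-state logic: only the decode state looks at the inputs."""
    if state == DECODE:
        if op == "BEQ":
            return 0b0010 if zero else 0b0011
        return DISPATCH[op]
    return FIXED_NEXT[state]


def cycles(op, zero=0):
    """Count the states visited from instruction fetch until control returns to it."""
    state, count = 0b0000, 0
    while True:
        count += 1
        state = next_state(state, op, zero)
        if state == 0b0000:
            return count


def state_bits(num_states):
    """Minimum number of flip-flops needed to encode num_states states."""
    return math.ceil(math.log2(num_states))


if __name__ == "__main__":
    for op in ("R-type", "ORi", "LW", "SW", "BEQ"):
        print(op, cycles(op), "cycles")
    print("flip-flops for 13 states:", state_bits(13))
```

Because only the decode state examines the opcode and Zero, every other row of the table has "xxxxxx" and "x" in those columns.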

Page 22: Major CPU Design Steps

Alternative Multiple Cycle Datapath (In Textbook)

• Minimizes Hardware: 1 memory, 1 ALU

[Figure: textbook multi-cycle datapath. Components: PC, a single ideal memory (Address, Din, Dout) shared by instructions and data, Instruction Reg, Mem Data Reg, register file (Ra, Rb, Rw; busA, busB, busW), A and B registers, Extend (Imm 16 to 32), << 2, one ALU with ALU Control, ALU Out register, and multiplexors. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemRd, MemWr, IRWr, RegDst, RegWr, MemtoReg, ALUSrcA, ALUSrcB, ALUOp. The ALU Zero output is used together with PCWrCond.]

Page 23: Major CPU Design Steps

Alternative Multiple Cycle Datapath (In Textbook)

• Shared instruction/data memory unit
• A single ALU shared among instructions
• Shared units require additional or widened multiplexors
• Temporary registers to hold data between clock cycles of the instruction:
  • Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut

(Figure 5.27 page 322)

[Figure: multi-cycle datapath with the instruction fields rs, rt, rd, and imm16 labelled, and the memory data register marked as MDR.]

Page 24: Major CPU Design Steps

Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook)

(Figure 5.28 page 323)

(ORI not supported, Jump supported)

[Figure: multi-cycle datapath with control lines. Labelled values: PC, PC + 4, branch target, rs, rt, rd, imm16; 32-bit data paths; three 2-bit control signals.]

Page 25: Major CPU Design Steps

The Effect of The 1-bit Control Signals

(Figure 5.29 page 324)

RegDst
  Deasserted (=0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
  Asserted (=1): The register destination number for the write register comes from the rd field (instruction bits 15:11).

RegWrite
  Deasserted (=0): None.
  Asserted (=1): The register on the write register input is written with the value on the Write data input.

ALUSrcA
  Deasserted (=0): The first ALU operand is the PC.
  Asserted (=1): The first ALU operand is register A (i.e. R[rs]).

MemRead
  Deasserted (=0): None.
  Asserted (=1): Contents of memory specified by the address input are put on the memory data output.

MemWrite
  Deasserted (=0): None.
  Asserted (=1): Memory contents specified by the address input are replaced by the value on the Write data input.

MemtoReg
  Deasserted (=0): The value fed to the register write data input comes from the ALUOut register.
  Asserted (=1): The value fed to the register write data input comes from the memory data register (MDR).

IorD
  Deasserted (=0): The PC is used to supply the address to the memory unit.
  Asserted (=1): The ALUOut register is used to supply the address to the memory unit.

IRWrite
  Deasserted (=0): None.
  Asserted (=1): The output of the memory is written into the Instruction Register (IR).

PCWrite
  Deasserted (=0): None.
  Asserted (=1): The PC is written; the source is controlled by PCSource.

PCWriteCond
  Deasserted (=0): None.
  Asserted (=1): The PC is written if the Zero output of the ALU is also active.

Page 26: Major CPU Design Steps

The Effect of The 2-bit Control Signals

(Figure 5.29 page 324)

ALUOp
  00: The ALU performs an add operation.
  01: The ALU performs a subtract operation.
  10: The funct field of the instruction determines the ALU operation (R-Type).

ALUSrcB
  00: The second input of the ALU comes from register B.
  01: The second input of the ALU is the constant 4.
  10: The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR.
  11: The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits (for branches).

PCSource
  00: Output of the ALU (PC+4) is sent to the PC for writing.
  01: The content of ALUOut (the branch target address) is sent to the PC for writing.
  10: The jump target address (i.e. jump address: IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing.

(i.e R[rs])
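The PCSource = 10 case combines bit fields, which is easy to get wrong by hand. The sketch below forms the jump target from IR[25:0] and PC+4[31:28] as described in the table, and models the PCSource multiplexor as a lookup. It is an illustration under the assumption of 32-bit values held in Python integers; the names `jump_target` and `select_next_pc` are hypothetical.

```python
def jump_target(pc_plus_4, ir):
    """Jump address: PC+4[31:28] concatenated with IR[25:0] shifted left 2 bits."""
    upper = pc_plus_4 & 0xF0000000        # PC+4[31:28]
    lower = (ir & 0x03FFFFFF) << 2        # IR[25:0] << 2 gives a 28-bit field
    return upper | lower


def select_next_pc(pc_source, alu_result, alu_out, jump_addr):
    """PCSource multiplexor: 00 = ALU output, 01 = ALUOut register, 10 = jump address."""
    return {0b00: alu_result, 0b01: alu_out, 0b10: jump_addr}[pc_source]


if __name__ == "__main__":
    # Hypothetical instruction word with opcode 000010 and a 26-bit target field of 0x10.
    ir = 0x08000010
    target = jump_target(0x40000004, ir)
    print(hex(target))
    print(hex(select_next_pc(0b10, 0x40000008, 0x40000100, target)))
```

Only three of the four PCSource encodings are defined in the table, so the lookup raises a `KeyError` for 11 rather than guessing a behaviour.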

Page 27: Major CPU Design Steps

Operations (Dependent RTN) for Each Cycle

Instruction Fetch (IF) and Instruction Decode (ID) cycles are common for all instructions:
  IF:  IR ← Mem[PC]; PC ← PC + 4
  ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)

R-Type:
  EX:  ALUout ← A funct B
  WB:  R[rd] ← ALUout

Load:
  EX:  ALUout ← A + SignEx(Imm16)
  MEM: MDR ← Mem[ALUout]
  WB:  R[rt] ← MDR

Store:
  EX:  ALUout ← A + SignEx(Imm16)
  MEM: Mem[ALUout] ← B

Branch:
  EX:  Zero ← A - B
       Zero: PC ← ALUout

Jump:
  EX:  PC ← Jump Address

Page 28: Major CPU Design Steps

High-Level View of Finite State Machine Control

(Figure 5.31 page 332)

Start: instruction fetch and decode (Figure 5.32), states 0-1.
Then, depending on the opcode, one of:
  Load/store instructions (Figure 5.33), states 2-5
  R-type instructions (Figure 5.34), states 6-7
  Branch instruction (Figure 5.35), state 8
  Jump instruction (Figure 5.36), state 9

• First steps are independent of the instruction class.
• Then a series of sequences that depend on the instruction opcode.
• Then the control returns to fetch a new instruction.
• Each box above represents one or several states.

Page 29: Major CPU Design Steps

FSM State Transition Diagram (From Book)

(Figure 5.38 page 339)

State 0 (IF): IR ← Mem[PC]; PC ← PC + 4
State 1 (ID): A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
State 2 (EX, load/store): ALUout ← A + SignEx(Imm16)
State 3 (MEM, load): MDR ← Mem[ALUout]
State 4 (WB, load): R[rt] ← MDR
State 5 (MEM, store): Mem[ALUout] ← B
State 6 (EX, R-type): ALUout ← A func B
State 7 (WB, R-type): R[rd] ← ALUout
State 8 (EX, branch): Zero ← A - B; Zero: PC ← ALUout
State 9 (EX, jump): PC ← Jump Address

Total 10 states

More on FSM controller implementation in Appendix C

Page 30: Major CPU Design Steps

Instruction Fetch (IF) and Decode (ID) FSM States

(Figure 5.32 page 333)

IF: IR ← Mem[PC]; PC ← PC + 4

ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)

After ID, control continues with the states of Figure 5.33, Figure 5.34, Figure 5.35, or Figure 5.36.

Page 31: Major CPU Design Steps

Instruction Fetch (IF) Cycle (State 0)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

IR ← Mem[PC]
PC ← PC + 4

MemRead = 1, ALUSrcA = 0, IorD = 0, IRWrite = 1, ALUSrcB = 01, ALUOp = 00 (add), PCWrite = 1, PCSource = 00

Page 32: Major CPU Design Steps

Instruction Decode (ID) Cycle (State 1)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x 4) (calculate branch target)

ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)
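The branch target computed in this state depends on sign-extending a 16-bit field, so a small numeric sketch may help. It assumes, as the RTN above implies, that the PC has already been updated to PC + 4 in state 0, and it wraps results to 32 bits. The names `sign_extend16` and `branch_target` are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc_plus_4, imm16):
    """ALUout <- PC + (SignExt(imm16) x 4), wrapped to 32 bits."""
    return (pc_plus_4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(branch_target(0x1004, 0x0003)))   # forward branch by 3 words
    print(hex(branch_target(0x1004, 0xFFFF)))   # backward branch by 1 word
```

Computing the target in ID, before the instruction is known to be a branch, costs nothing extra because the ALU is otherwise idle in this cycle.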

Page 33: Major CPU Design Steps

Load/Store Instructions FSM States

(Figure 5.33 page 334)

(From Instruction Decode)

EX: ALUout ← A + SignEx(Imm16) (i.e. effective address calculation)

MEM, load: MDR ← Mem[ALUout]
MEM, store: Mem[ALUout] ← B

WB, load: R[rt] ← MDR

To Instruction Fetch (Figure 5.32)

Page 34: Major CPU Design Steps

Load/Store Execution (EX) Cycle (State 2)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

ALUout ← A + SignEx(Imm16) (effective address calculation)

ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 (add)

Page 35: Major CPU Design Steps

Load Memory (MEM) Cycle (State 3)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

MDR ← Mem[ALUout]

MemRead = 1, IorD = 1

Page 36: Major CPU Design Steps

Load Write Back (WB) Cycle (State 4)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

R[rt] ← MDR

RegWrite = 1, MemtoReg = 1, RegDst = 0

Page 37: Major CPU Design Steps

Store Memory (MEM) Cycle (State 5)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

Mem[ALUout] ← B

MemWrite = 1, IorD = 1

Page 38: Major CPU Design Steps

R-Type Instructions FSM States

(Figure 5.34 page 335)

(From Instruction Decode)

EX: ALUout ← A funct B

WB: R[rd] ← ALUout

To State 0 (Instruction Fetch) (Figure 5.32)

Page 39: Major CPU Design Steps

R-Type Execution (EX) Cycle (State 6)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

ALUout ← A funct B

ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 (R-Type)

Page 40: Major CPU Design Steps

R-Type Write Back (WB) Cycle (State 7)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

R[rd] ← ALUout

RegWrite = 1, MemtoReg = 0, RegDst = 1

Page 41: Major CPU Design Steps

Jump Instruction Single EX State
Branch Instruction Single EX State

(Figures 5.35, 5.36 page 337)

Branch, EX (from Instruction Decode): Zero ← A - B; Zero: PC ← ALUout. To State 0 (Instruction Fetch) (Figure 5.32).

Jump, EX (from Instruction Decode): PC ← Jump Address. To State 0 (Instruction Fetch) (Figure 5.32).

Page 42: Major CPU Design Steps

Branch Execution (EX) Cycle (State 8)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

Zero ← A - B
Zero: PC ← ALUout

ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (Subtract), PCWriteCond = 1, PCSource = 01

Page 43: Major CPU Design Steps

Jump Execution (EX) Cycle (State 9)

(Figure 5.28 page 323) (ORI not supported, Jump supported)

PC ← Jump Address

PCWrite = 1, PCSource = 10
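The control settings given for states 0 through 9 can be collected into one table. The sketch below does that as a Python dictionary and expands a state into a full control word. It is an illustration only: treating every signal that a slide does not list as 0 is an assumption made here (for multiplexor selects the value is really a don't-care), and the names `STATE_SIGNALS`, `STATE_SEQUENCE`, and `control_word` are hypothetical.

```python
# Control signal values listed on the slides for each FSM state.
STATE_SIGNALS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegWrite": 1, "MemtoReg": 0, "RegDst": 1},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# States visited by each instruction class, starting from instruction fetch.
STATE_SEQUENCE = {
    "lw": [0, 1, 2, 3, 4],
    "sw": [0, 1, 2, 5],
    "r-type": [0, 1, 6, 7],
    "beq": [0, 1, 8],
    "j": [0, 1, 9],
}

ALL_SIGNALS = sorted({name for sig in STATE_SIGNALS.values() for name in sig})


def control_word(state):
    """Full control word for a state; unlisted signals default to 0 (assumption)."""
    word = dict.fromkeys(ALL_SIGNALS, 0)
    word.update(STATE_SIGNALS[state])
    return word


if __name__ == "__main__":
    for state in STATE_SEQUENCE["lw"]:
        active = {k: v for k, v in control_word(state).items() if v}
        print(state, active)
```

Defaulting the write enables (RegWrite, MemWrite, IRWrite, PCWrite, PCWriteCond) to 0 matters for correctness, since an unintended write would corrupt state; the defaults for select signals are harmless.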

Page 44: Major CPU Design Steps

MIPS Multi-cycle Datapath Performance Evaluation

• What is the average CPI?
  – State diagram gives CPI for each instruction type.
  – Workload (program) below gives frequency of each type.

Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI:                              4.1

Better than CPI = 5 if all instructions took the same number of clock cycles (5).

T = I x CPI x C
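The average CPI is a frequency-weighted sum, which the following sketch recomputes from the workload table. Frequencies are kept as integer percentages to avoid floating-point rounding in the sum; the names `MIX` and `average_cpi` are hypothetical.

```python
# (CPI, frequency in percent) for each instruction type, from the workload table.
MIX = {
    "arith_logic": (4, 40),
    "load": (5, 30),
    "store": (4, 10),
    "branch": (3, 20),
}


def average_cpi(mix):
    """Average CPI = sum over types of CPI_i x frequency_i."""
    total_percent = sum(freq for _, freq in mix.values())
    if total_percent != 100:
        raise ValueError("frequencies must sum to 100%")
    return sum(cpi * freq for cpi, freq in mix.values()) / 100


if __name__ == "__main__":
    print("average CPI:", average_cpi(MIX))
```

With C = 2 ns, an average CPI of 4.1 gives 8.2 ns per instruction, slightly more than the 8 ns single-cycle design for this particular mix.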

Page 45: Major CPU Design Steps

Adding Support for swap to Multi Cycle Datapath

(For More Practice Exercise 5.42)

• You are to add support for a new instruction, swap, that exchanges the values of two registers, to the MIPS multicycle datapath of Figure 5.28 on page 323:

swap $rs, $rt
R[rt] ← R[rs]
R[rs] ← R[rt]

• Swap uses the R-Type format with: the value of field rs = the value of field rd.

• Add any necessary datapaths and control signals to the multicycle datapath. Find a solution that minimizes the number of clock cycles required for the new instruction without modifying the register file (i.e. no additional register write ports). Justify the need for the modifications, if any.

• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the swap instruction. For each new state added, provide the dependent RTN and active control signal values.

Page 46: Major CPU Design Steps

Adding swap Instruction Support to Multi Cycle Datapath

(For More Practice Exercise 5.42)

swap $rs, $rt
R[rt] ← R[rs]
R[rs] ← R[rt]

We assume here rs = rd in the instruction encoding:

op        rs        rt        rd
[31-26]   [25-21]   [20-16]   [15-11]

The outputs of A and B should be connected to the multiplexor controlled by MemtoReg if one of the two fields (rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt. The MemtoReg control signal becomes two bits.

[Figure: multi-cycle datapath of Figure 5.28 with A (R[rs]) and B (R[rt]) added as inputs to the MemtoReg multiplexor. Labelled values: rs, rt, rd, imm16, PC + 4, branch target; 2-bit control signals.]

Page 47: Major CPU Design Steps

Adding swap Instruction Support to Multi Cycle Datapath

(For More Practice Exercise 5.42)

FSM with the swap states added:

IF: IR ← Mem[PC]; PC ← PC + 4
ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)

Existing states:
  EX (load/store): ALUout ← A + SignEx(Imm16), followed by MEM and WB
  EX (R-type): ALUout ← A func B; WB: R[rd] ← ALUout
  EX (branch): Zero ← A - B; Zero: PC ← ALUout

New states for swap:
  WB1: R[rd] ← B
  WB2: R[rt] ← A

Swap takes 4 cycles
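The reason two write-back states are enough is that A and B already hold both source values after ID, so neither register-file write destroys data that is still needed. The sketch below steps through that RTN on a list standing in for the register file; it is an illustration of the register transfers only, not of the control signals, and the name `swap_rtn` is hypothetical.

```python
def swap_rtn(regs, rs, rt):
    """Apply the swap RTN to a register-file list, assuming rd = rs in the encoding."""
    rd = rs                      # encoding assumption stated on the slide
    a, b = regs[rs], regs[rt]    # ID:  A <- R[rs]; B <- R[rt]
    regs[rd] = b                 # WB1: R[rd] <- B
    regs[rt] = a                 # WB2: R[rt] <- A
    return regs


if __name__ == "__main__":
    registers = [0, 10, 20, 30]
    print(swap_rtn(registers, 1, 2))
```

Each write-back state performs a single register write, so the register file needs no additional write port.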

Page 48: Major CPU Design Steps

Adding Support for add3 to Multi Cycle Datapath

(For More Practice Exercise 5.45)

• You are to add support for a new instruction, add3, that adds the values of three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323. For example:

add3 $s0, $s1, $s2, $s3

Register $s0 gets the sum of $s1, $s2 and $s3.

The instruction encoding uses a modified R-format, with an additional register specifier rx replacing the five low bits of the “funct” field:

OP        rs        rt        rd        Not used   rx
6 bits    5 bits    5 bits    5 bits    6 bits     5 bits
[31-26]   [25-21]   [20-16]   [15-11]   [10-5]     [4-0]
add3      $s1       $s2       $s0                  $s3

• Add necessary datapath components, connections, and control signals to the multicycle datapath without modifying the register bank or adding additional ALUs. Find a solution that minimizes the number of clock cycles required for the new instruction. Justify the need for the modifications, if any.

• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the add3 instruction. For each new state added, provide the dependent RTN and active control signal values.

Page 49: Major CPU Design Steps

Exercise 5.45: add3 instruction support to Multi Cycle Datapath

add3 $rd, $rs, $rt, $rx
R[rd] ← R[rs] + R[rt] + R[rx]

rx is a new register specifier in field [4-0] of the instruction.
No additional register read ports or ALUs allowed.

Modified R-Format:

op        rs        rt        rd        rx
[31-26]   [25-21]   [20-16]   [15-11]   [4-0]

1. ALUout is added as an extra input to the first ALU operand MUX, to use the previous ALU result as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, which contains the register number of the 3rd operand (bits 4-0 of the instruction), as the input for Read Register 2. This multiplexor is controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.

[Figure: multi-cycle datapath of Figure 5.28 with the ReadSrc multiplexor, the WriteB control line, and the widened first ALU operand multiplexor added. Labelled values: rs, rt, rd, rx, imm16, PC + 4, branch target; 2-bit control signals.]

Page 50: Major CPU Design Steps

Exercise 5.45: add3 instruction support to Multi Cycle Datapath

(For More Practice Exercise 5.45)

FSM with the add3 states added:

IF: IR ← Mem[PC]; PC ← PC + 4
ID: A ← R[rs]; B ← R[rt] (WriteB); ALUout ← PC + (SignExt(imm16) x 4)

Existing states:
  EX (load/store): ALUout ← A + SignEx(Im16), followed by MEM and WB
  EX (R-type): ALUout ← A func B; WB: R[rd] ← ALUout
  EX (branch): Zero ← A - B; Zero: PC ← ALUout

New states for add3:
  EX1: ALUout ← A + B; B ← R[rx] (WriteB)
  EX2: ALUout ← ALUout + B
  WB:  R[rd] ← ALUout

Add3 takes 5 cycles
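The add3 sequence reuses the single ALU twice and reloads B in between, which the following sketch traces cycle by cycle. It models only the register transfers (values wrapped to 32 bits), not the ReadSrc or WriteB control signals themselves; the name `add3_rtn` and the use of a dictionary as a register file are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add3_rtn(regs, rd, rs, rt, rx):
    """Trace the add3 RTN: two ALU additions with B reloaded between them."""
    a, b = regs[rs], regs[rt]            # ID:  A <- R[rs]; B <- R[rt]
    alu_out = (a + b) & MASK32           # EX1: ALUout <- A + B
    b = regs[rx]                         # EX1: B <- R[rx] (latched at the end of the cycle)
    alu_out = (alu_out + b) & MASK32     # EX2: ALUout <- ALUout + B
    regs[rd] = alu_out                   # WB:  R[rd] <- ALUout
    return regs


if __name__ == "__main__":
    registers = {"s0": 0, "s1": 1, "s2": 2, "s3": 3}
    print(add3_rtn(registers, "s0", "s1", "s2", "s3"))
```

In EX1 the addition uses the old value of B while the new value R[rx] is being read, which works because the registers are clock-edge triggered.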