EECC550 - ShaabanEECC550 - Shaaban#1 Lec # 5 Winter 2003 1-6-2004
CPU Design StepsCPU Design Steps1. Analyze instruction set operations using independent ISA
=> RTN => datapath requirements.
2. Select required datapath components, connections & establish clock methodology.
3. Assemble datapath meeting the requirements.
4. Analyze the implementation of each instruction to determine setting of control points that effects the register transfer.
5. Design & assemble the control logic.
(Chapter 5.4)
EECC550 - ShaabanEECC550 - Shaaban#2 Lec # 5 Winter 2003 1-6-2004
imm
16
32
ALUctr
Clk
busW
RegWr
32
32
busA
32busB
55 5
Rw Ra Rb32 32-bitRegisters
Rs
Rt
Rt
RdRegDst
Exten
der
Mu
x
3216imm16
ALUSrcExtOp
Mu
x
MemtoReg
Clk
Data InWrEn32 Adr
DataMemory
MemWrA
LU
Zero
Instruction<31:0>
0
1
0
1
01
<21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRtRs
=
Ad
der
Ad
der
PC
Clk
00
Mu
x
4
PCSrc
PC
Ext
Adr
InstMemory
BranchZero
0
1
PC+4
BranchTarget
R[rs]
R[rt]
(Includes ORI)
MainALU
Single Cycle MIPS Datapath: Single Cycle MIPS Datapath: CPI = 1, Long Clock CycleCPI = 1, Long Clock Cycle
Jump Not Included
EECC550 - ShaabanEECC550 - Shaaban#3 Lec # 5 Winter 2003 1-6-2004
Drawbacks of Single-Cycle ProcessorDrawbacks of Single-Cycle Processor
• Long cycle time.
• All instructions must take as much time as the slowest:– Cycle time for load is longer than needed for all other instructions.
• Real memory is not as well-behaved as idealized memory– Cannot always complete data access in one (short) cycle.
• Cannot pipeline (overlap) the processing of one instruction with the previous instructions.– (instruction pipelining, chapter 6).
EECC550 - ShaabanEECC550 - Shaaban#4 Lec # 5 Winter 2003 1-6-2004
Abstract View of Single Cycle CPUAbstract View of Single Cycle CPU
PC
Nex
t P
C
Reg
iste
rF
etch ALU Reg
. W
rt
Mem
Acc
ess
Dat
aM
emInst
ruct
ion
Fet
ch
Res
ult
Sto
re
AL
Uct
r
Reg
Dst
AL
US
rc
Ext
Op
Mem
Wr
Eq
ual
Bra
nch,
Jum
p
Reg
Wr
Mem
Wr
Mem
Rd
MainControl
ALUcontrol
op
fun
Ext
One CPU Clock CycleDuration C = 8ns
One instruction per cycle CPI = 1
EECC550 - ShaabanEECC550 - Shaaban#5 Lec # 5 Winter 2003 1-6-2004
Single Cycle Instruction TimingSingle Cycle Instruction Timing
PC Inst Memory mux ALU Data Mem mux
PC Reg FileInst Memory mux ALU mux
PC Inst Memory mux ALU Data Mem
PC Inst Memory cmp mux
Reg File
Reg File
Reg File
Arithmetic & Logical
Load
Store
Branch
Critical Path
setup
setup
(Determines CPU clock cycle, C)
EECC550 - ShaabanEECC550 - Shaaban#6 Lec # 5 Winter 2003 1-6-2004
Clock Cycle Time & Critical PathClock Cycle Time & Critical Path
• Critical path: the slowest path between any two storage devices
• Clock Cycle time is a function of the critical path, and must be greater than:
– Clock-to-Q + Longest Path through the Combination Logic + Setup
Clk
.
.
.
.
.
.
.
.
.
.
.
.
One CPU Clock CycleDuration C = 8ns here
EECC550 - ShaabanEECC550 - Shaaban#7 Lec # 5 Winter 2003 1-6-2004
Reducing Cycle Time: Multi-Cycle DesignReducing Cycle Time: Multi-Cycle Design• Cut combinational dependency graph by inserting registers / latches.• The same work is done in two or more shorter cycles, rather than one long
cycle.
storage element
Acyclic CombinationalLogic
storage element
storage element
Acyclic CombinationalLogic (A)
storage element
storage element
Acyclic CombinationalLogic (B)
=>
EECC550 - ShaabanEECC550 - Shaaban#8 Lec # 5 Winter 2003 1-6-2004
Instruction Processing StepsInstruction Processing Steps
Obtain instruction from program storage
Determine instruction type
Obtain operands from registers
Compute result value or status
Store result in register/memory if needed
(usually called Write Back).
Update program counter to address
of next instruction } Commonsteps for all instructions
Instruction
Fetch
Instruction
Decode
Execute
Result
Store
Next
Instruction
Instruction Mem[PC]
PC PC + 4 (For MIPS)
EECC550 - ShaabanEECC550 - Shaaban#9 Lec # 5 Winter 2003 1-6-2004
Partitioning The Single Cycle DatapathPartitioning The Single Cycle Datapath Add registers between steps to break into cycles
PC
Nex
t P
C
Ope
rand
Fet
ch Exec Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
Inst
ruct
ion
Fet
ch
Res
ult
Sto
re
AL
Uct
r
Reg
Dst
AL
US
rc
Ext
Op
Mem
Wr
Bra
nch,
Ju
mp
Reg
Wr
Mem
Wr
Mem
Rd
Instruction Fetch Cycle (IF)
Instruction Decode Cycle (ID)
Execution Cycle (EX)
Data Memory Access Cycle (MEM)
Write back Cycle (WB)
1 2 3 4 5
EECC550 - ShaabanEECC550 - Shaaban#10 Lec # 5 Winter 2003 1-6-2004
Example Multi-cycle DatapathExample Multi-cycle Datapath
PC
Nex
t P
C
Ext
ALU Reg
. F
ile
Mem
Acc
ess
Dat
aM
em
AL
Uct
r
Reg
Dst
AL
US
rc
Ext
Op
Bra
nch,
Jum
p
Reg
Wr
Mem
Wr
Mem
Rd
IR
A
B
R
M
RegFile
Mem
ToR
eg
Equ
al
Registers added: (not shown register write enable control lines)
IR: Instruction registerA, B: Two registers to hold operands read from register file.R: or ALUOut, holds the output of the main ALUM: or Memory data register (MDR) to hold data read from data memoryCPU Clock Cycle Time: Worst cycle delay = C = 2ns (ignoring MUX, CLK-Q delays)
Instruction Fetch (IF) 2ns
Instruction Decode (ID) 1ns
Execution (EX) 2ns
Memory (MEM) 2ns
Write Back (WB) 1ns
To Control Unit
EECC550 - ShaabanEECC550 - Shaaban#11 Lec # 5 Winter 2003 1-6-2004
Operations (Dependant RTN) for Each CycleOperations (Dependant RTN) for Each Cycle
Instruction Fetch
Instruction Decode
Execution
Memory
WriteBack
R-Type
IR Mem[PC]
A R[rs]
B R[rt]
R A + B
R[rd] R
PC PC + 4
Logic Immediate
IR Mem[PC]
A R[rs]
R A OR ZeroExt[imm16]
R[rt] R
PC PC + 4
Load
IR Mem[PC]
A R[rs]
R A + SignEx(Im16)
M Mem[R]
R[rt] M
PC PC + 4
Store
IR Mem[PC]
A R[rs]
B R[rt]
R A + SignEx(Im16)
Mem[R] B
PC PC + 4
Branch
IR Mem[PC]
A R[rs]
B R[rt]
Zero R[rs] - R[rt]
If Zero = 1:
PC PC + 4 +
(SignExt(imm16) x4)
else (i.e Zero =0):
PC PC + 4
IF
ID
EX
MEM
WB
EECC550 - ShaabanEECC550 - Shaaban#12 Lec # 5 Winter 2003 1-6-2004
MIPS Multi-Cycle Datapath:MIPS Multi-Cycle Datapath: Five Cycles of LoadFive Cycles of Load
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
IF ID EX MEM WBLoad
1- Instruction Fetch (IF): Fetch the instruction from instruction Memory.
2- Instruction Decode (ID): Operand Register Fetch and Instruction Decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the Data Memory.
5- Write Back (WB): Write the loaded data to the register file. Update PC.
EECC550 - ShaabanEECC550 - Shaaban#13 Lec # 5 Winter 2003 1-6-2004
Multi-cycle Datapath Instruction CPIMulti-cycle Datapath Instruction CPI• R-Type/Immediate: Require four cycles, CPI = 4
– IF, ID, EX, WB
• Loads: Require five cycles, CPI = 5– IF, ID, EX, MEM, WB
• Stores: Require four cycles, CPI = 4– IF, ID, EX, MEM
• Branches/Jumps: Require three cycles, CPI = 3– IF, ID, EX
• Average program CPI: 3 CPI 5 depending on program profile (instruction mix).
EECC550 - ShaabanEECC550 - Shaaban#14 Lec # 5 Winter 2003 1-6-2004
Single Cycle Vs. Multi-Cycle CPUSingle Cycle Vs. Multi-Cycle CPU
Clk
Cycle 1
Multiple Cycle Implementation:
IF ID EX MEM WB
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
IF ID EX MEM
Load Store
Clk
Single Cycle Implementation:
Load Store Waste
IF
R-type
Cycle 1 Cycle 2
8 ns
2ns (500 MHz)
Single-Cycle CPU:CPI = 1 C = 8nsOne million instructions take = I x CPI x C = 106 x 1 x 8x10-9 = 8 msec
Multi-Cycle CPU:CPI = 3 to 5 C = 2nsOne million instructions take from 106 x 3 x 2x10-9 = 6 msecto 106 x 5 x 2x10-9 = 10 msecdepending on instruction mix used.
8ns (125 MHz)
EECC550 - ShaabanEECC550 - Shaaban#15 Lec # 5 Winter 2003 1-6-2004
Finite State Machine (FSM) Control ModelFinite State Machine (FSM) Control Model
• State specifies control points for Register Transfer.
• Transfer occurs upon exiting state (falling edge).
State X
Register TransferControl Points
Depends on Input
Control State
Next StateLogic
Output Logic
inputs (conditions)
outputs (control points)
Next State
Last State
EECC550 - ShaabanEECC550 - Shaaban#16 Lec # 5 Winter 2003 1-6-2004
Control Specification For Multi-cycle CPUControl Specification For Multi-cycle CPUFinite State Machine (FSM) - State Transition DiagramFinite State Machine (FSM) - State Transition Diagram
IR MEM[PC]
R-type
A R[rs]B R[rt]
R A fun B
R[rd] RPC PC + 4
R A or ZX
R[rt] RPC PC + 4
ORi
R A + SX
R[rt] MPC PC + 4
M MEM[R]
LW
R A + SX
MEM[R] BPC PC + 4
BEQ & Zero
BEQ & ~Zero
PC PC + 4 PC PC + 4+ SX || 00
SW
“instruction fetch”
“decode / operand fetch”
Execute
Memory
Write-back
To instruction fetch
To instruction fetchTo instruction fetch
13 states:4 State Flip-Flops needed
EECC550 - ShaabanEECC550 - Shaaban#17 Lec # 5 Winter 2003 1-6-2004
Traditional FSM ControllerTraditional FSM Controller
State
6
4
11nextState
op
Equal
control points
state op condnextstate control points
Truth or Transition Table
datapath State
To datapath
EECC550 - ShaabanEECC550 - Shaaban#18 Lec # 5 Winter 2003 1-6-2004
Traditional FSM ControllerTraditional FSM Controller
datapath + state diagram => controldatapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.
EECC550 - ShaabanEECC550 - Shaaban#19 Lec # 5 Winter 2003 1-6-2004
Mapping RTNs To Control Points ExamplesMapping RTNs To Control Points Examples& State Assignments& State Assignments
IR MEM[PC]
0000
R-type
A R[rs]B R[rt] 0001
R A fun B 0100
R[rd] RPC PC + 4
0101
R A or ZX 0110
R[rt] RPC PC + 4
0111
ORi
R A + SX 1000
R[rt] MPC PC + 4
1010
M MEM[R] 1001
LW
R A + SX 1011
MEM[R] BPC PC + 4 1100
BEQ & Zero
BEQ & ~Zero
PC PC + 4 0011
PC PC + 4+SX || 00 0010
SW
“instruction fetch”
“decode / operand fetch”
Execute
Memory
Write-back
imem_rd, IRen
Aen, Ben
ALUfun, Sen
RegDst,RegWr,PCen To instruction fetch
state 0000
To instruction fetch state 0000To instruction fetch state 0000
0
1
2
3
4
5 7
8
9
10
116
12
EECC550 - ShaabanEECC550 - Shaaban#20 Lec # 5 Winter 2003 1-6-2004
Detailed Control Specification - State Transition TableCurrent Op field Z Next IR PC Ops Exec Mem Write-BackState en sel A B Ex Sr ALU S R W M M-R Wr
Dst0000 ?????? ? 0001 10001 BEQ 0 0011 1 10001 BEQ 1 0010 1 10001 R-type x 0100 1 10001 orI x 0110 1 10001 LW x 1000 1 10001 SW x 1011 1 10010 xxxxxx x 0000 1 10011 xxxxxx x 0000 1 00100 xxxxxx x 0101 0 1 fun 10101 xxxxxx x 0000 1 0 0 1 10110 xxxxxx x 0111 0 0 or 10111 xxxxxx x 0000 1 0 0 1 01000 xxxxxx x 1001 1 0 add 11001 xxxxxx x 1010 1 0 11010 xxxxxx x 0000 1 0 1 1 01011 xxxxxx x 1100 1 0 add 11100 xxxxxx x 0000 1 0 0 1
R
ORI
LW
SW
BEQ
IF
ID
Can be combines in one state
EECC550 - ShaabanEECC550 - Shaaban#21 Lec # 5 Winter 2003 1-6-2004
Alternative Multiple Cycle Datapath (In Textbook)• Miminizes Hardware: 1 memory, 1 ALU
IdealMemoryWrAdrDin
RAdr
32
32
32Dout
MemWr
32
AL
U
3232
ALUOp
ALUControl
Instru
ction R
eg
32
IRWr
32
Reg File
Ra
Rw
busW
Rb5
5
32busA
32busB
RegWr
Rs
Rt
Mu
x
0
1
Rt
Rd
PCWr
ALUSelA
Mux 01
RegDst
Mu
x
0
1
32
PC
MemtoReg
Extend
ExtOp
Mu
x0
132
0
1
23
4
16Imm 32
<< 2
ALUSelB
Mu
x1
0
Target32
Zero
ZeroPCWrCond PCSrc BrWr
32
IorD
AL
U O
ut
EECC550 - ShaabanEECC550 - Shaaban#22 Lec # 5 Winter 2003 1-6-2004
Alternative Multiple Cycle Datapath (In Textbook)
•Shared instruction/data memory unit• A single ALU shared among instructions• Shared units require additional or widened multiplexors• Temporary registers to hold data between clock cycles of the instruction:
• Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
EECC550 - ShaabanEECC550 - Shaaban#23 Lec # 5 Winter 2003 1-6-2004
Alternative Multiple Cycle Datapath With Control Lines (Fig 5.33 In Textbook)
(ORI not supported, Jump supported)
PC+ 4
BranchTarget
EECC550 - ShaabanEECC550 - Shaaban#24 Lec # 5 Winter 2003 1-6-2004
Operations In Each CycleOperations In Each Cycle
Instruction Fetch
Instruction Decode
Execution
Memory
WriteBack
R-Type
IR Mem[PC]PC PC + 4
A R[rs]
B R[rt]
ALUout PC + (SignExt(imm16) x4)
ALUout A + B
R[rd] ALUout
Logic Immediate
IR Mem[PC]PC PC + 4
A R[rs]
B R[rt]
ALUout PC +
(SignExt(imm16) x4)
ALUout
A OR ZeroExt[imm16]
R[rt] ALUout
Load
IR Mem[PC]PC PC + 4
A R[rs]
B R[rt]
ALUout PC +
(SignExt(imm16) x4)
ALUout
A + SignEx(Im16)
M Mem[ALUout]
R[rt] Mem
Store
IR Mem[PC]PC PC + 4
A R[rs]
B R[rt]
ALUout PC +
(SignExt(imm16) x4)
ALUout
A + SignEx(Im16)
Mem[ALUout] B
Branch
IR Mem[PC]PC PC + 4
A R[rs]
B R[rt]
ALUout PC +
(SignExt(imm16) x4)
If Equal = 1
PC ALUout
IF
ID
EX
MEM
WB
EECC550 - ShaabanEECC550 - Shaaban#25 Lec # 5 Winter 2003 1-6-2004
High-Level View of Finite State High-Level View of Finite State Machine ControlMachine Control
• First steps are independent of the instruction class• Then a series of sequences that depend on the instruction opcode• Then the control returns to fetch a new instruction.• Each box above represents one or several state.
EECC550 - ShaabanEECC550 - Shaaban#26 Lec # 5 Winter 2003 1-6-2004
Instruction Fetch (IF) and Decode (ID) Instruction Fetch (IF) and Decode (ID) FSM StatesFSM States
IF ID
EECC550 - ShaabanEECC550 - Shaaban#27 Lec # 5 Winter 2003 1-6-2004
Load/Store Instructions FSM StatesLoad/Store Instructions FSM States
EX
MEM
WB
EECC550 - ShaabanEECC550 - Shaaban#28 Lec # 5 Winter 2003 1-6-2004
R-Type Instructions R-Type Instructions FSM StatesFSM States
EX
WB
EECC550 - ShaabanEECC550 - Shaaban#29 Lec # 5 Winter 2003 1-6-2004
Jump Instruction Jump Instruction Single EX StateSingle EX State
Branch Instruction Branch Instruction Single EX StateSingle EX State
EX EX
EECC550 - ShaabanEECC550 - Shaaban#30 Lec # 5 Winter 2003 1-6-2004
FSM State TransitionDiagram (From Book) IF ID
EX
MEM WB
WB
EECC550 - ShaabanEECC550 - Shaaban#31 Lec # 5 Winter 2003 1-6-2004
Finite State Machine (FSM) SpecificationFinite State Machine (FSM) SpecificationIR MEM[PC]
PC PC + 4
R-type
ALUout A fun B
R[rd] ALUout
ALUout A op ZX
R[rt] ALUout
ORiALUout
A + SX
R[rt] M
M MEM[ALUout]
LW
ALUout A + SX
MEM[ALUout] B
SW
“instruction fetch”
“decode”
Exe
cute
Mem
ory
Writ
e-ba
ck
0000
0001
0100
0101
0110
0111
1000
1001
1010
1011
1100
BEQ
0010
If A = B then PC ALUout
A R[rs]B R[rt]
ALUout PC +SX
To instruction fetch
To instruction fetchTo instruction fetch
EECC550 - ShaabanEECC550 - Shaaban#32 Lec # 5 Winter 2003 1-6-2004
MIPS Multi-cycle Datapath MIPS Multi-cycle Datapath Performance EvaluationPerformance Evaluation
• What is the average CPI?– State diagram gives CPI for each instruction type.
– Workload (program) below gives frequency of each type.
Type CPIi for type Frequency CPIi x freqIi
Arith/Logic 4 40% 1.6
Load 5 30% 1.5
Store 4 10% 0.4
branch 3 20% 0.6
Average CPI: 4.1
Better than CPI = 5 if all instructions took the same number of clock cycles (5).
Top Related