Systems Architecture I - College of Computing & Informatics
Transcript of Systems Architecture I - College of Computing & Informatics
Lec 8 Systems Architecture I 1
Systems Architecture I
TopicsA Simple Implementation of MIPS*
A Multicycle Implementation of MIPS**
*This lecture was derived from material in the text (sec. 5.1-5.3). **This lecture was derived from material in the text (sec. 5.4-5.5).
All figures from Computer Organization and Design: The Hardware/Software Approach, Second Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).
Notes Courtesy of Jeremy R. Johnson
Lec 8 Systems Architecture I 2
Systems Architecture I
Topic 1: A Simple Implementation of MIPS
Lec 8 Systems Architecture I 3
Introduction
• Objective: To understand how to implement the MIPS instruction set.
• Combine components (registers, memory, ALU) and add control
• Fetch-Execute cycle• Topics
– Sequential logic (elements with state) and timing (edge triggered)• Memory• Registers
– Datapath components: Instruction memory, PC, Add, Register File, ALU, Data Memory
– Implement a subset of MIPS in a single cycle computer– Shortcomings of a single cycle computer
Lec 8 Systems Architecture I 4
The Processor: Datapath & Control
• Implementation of MIPS• Simplified to contain only:
– memory-reference instructions: lw, sw– arithmetic-logical instructions: add, sub, and, or, slt– control flow instructions: beq, j
• Generic Implementation:
– use the program counter (PC) to supply instruction address– get the instruction from memory– read registers– use the instruction to decide exactly what to do
Lec 8 Systems Architecture I 5
Abstract View
• Two types of functional units:– elements that operate on data values (combinational)– elements that contain state (sequential)
Registers
Register #
Data
Register #
Data
�
memory
Address
Data
Register #
PC Instruction ALU
Instruction
�
memory
Address
Lec 8 Systems Architecture I 6
Timing
• Clocks used in synchronous logic – when should an element that contains state be updated?
• Edge-triggered timing
cycle time
rising edge
falling edge
Lec 8 Systems Architecture I 7
Edge Triggered Timing
• State updated at clock edge• read contents of some state elements, • send values through some combinational logic• write results to one or more state elements
Clock cycle
State
�
element
�
1Combinational logic
State
�
element
�
2
Lec 8 Systems Architecture I 8
Components for Simple Implementation
• Functional Units needed for each instruction
PC
Instruction
�
memory
Instruction
�
address
Instruction
a. Instruction memory b. Program counter
Add Sum
c. Adder16 32
Sign
�
extend
b. Sign-extension unit
MemRead
MemWrite
Data
�
memoryWrite
�
data
Read
�
data
a. Data memory unit
Address
ALU control
RegWrite
RegistersWrite
�
register
Read
�
data 1
Read
�
data 2
Read
�
register 1
Read
�
register 2
Write
�
data
ALU
�
resultALU
Data
Data
Register
�
numbers
a. Registers b. ALU
Zero5
5
5 3
Lec 8 Systems Architecture I 9
Building the Datapath
• Use Multiplexors to put components together
PC
Instruction
�
memory
Read
�
address
Instruction
16 32
Add ALU�
result
M
�
u
�
x
Registers
Write
�
registerWrite
�
data
Read
�
data 1
Read
�
data 2
Read
�
register 1Read
�
register 2
Shift
�
left 2
4
M
�
u
�
x
ALU operation3
RegWrite
MemRead
MemWrite
PCSrc
ALUSrc
MemtoReg
ALU
�
result
ZeroALU
Data
�
memory
Address
�
�
Write
�
data
Read
�
data M
�
u
�
x
Sign
�
extend
Add
Lec 8 Systems Architecture I 10
Adding Control
• Selecting the operations to perform (ALU, read/write, etc.)
• Controlling the flow of data (multiplexor inputs)
• Information comes from the 32 bits of the instruction
Lec 8 Systems Architecture I 11
MIPS Instructions
• add $t0,$s1,$s2
• lw $t0,256($t1)
000000 10001 10010 01000 00000 100000
op rs rt rd shamt funct
100011 01001 01000 0000 0001 0000 0000
op rs rt offset
Lec 8 Systems Architecture I 12
MIPS Instructions Continued
• beq $s1,$s2,100
• j 4096
000010 00 0000 0000 0000 0100 0000 0000
op address
000100 10001 10010 0000 0000 0001 1001
op rs rt offset
Lec 8 Systems Architecture I 13
ALU
• Control Lines000 and001 or010 add110 sub111 slt
Seta31
0
Result0a0
Result1a1
0
Result2a2
0
Operation
b31
b0
b1
b2
Result31
Overflow
Bnegate
Zero
ALU0Less
CarryIn
CarryOut
ALU1Less
CarryIn
CarryOut
ALU2Less
CarryIn
CarryOut
ALU31Less
CarryIn
Lec 8 Systems Architecture I 14
Determining ALU Control Bits
• ALUOp determined by instruction
Instruction ALUOp Instruction funct ALU ALUopcode operation action controlLW 00 load word xxxxxx add 010SW 00 store word xxxxxx add 010BEQ 01 branch eq xxxxxx sub 110R-type 10 add 100000 add 010R-type 10 sub 100010 sub 110R-type 10 and 100100 and 000R-type 10 or 100101 or 001R-type 10 slt 101010 slt 111
Lec 8 Systems Architecture I 15
• Must describe hardware to compute 3-bit ALU control input– given instruction type
00 = lw, sw01 = beq, 10 = arithmetic
– function code for arithmetic
• Describe it using a truth table (can turn into gates):
ALUOpcomputed from instruction type
ALU Control
ALUOp Funct field OperationALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0
0 0 X X X X X X 010X 1 X X X X X X 1101 X X X 0 0 0 0 0101 X X X 0 0 1 0 1101 X X X 0 1 0 0 0001 X X X 0 1 0 1 0011 X X X 1 0 1 0 111
Lec 8 Systems Architecture I 16
Datapath with Control
PC
Instruction
�
memory
Read
�
address
Instruction
�
[31– 0]
Instruction [20– 16]
Instruction [25– 21]
Add
Instruction [5– 0]
MemtoReg
ALUOp
MemWrite
RegWrite
MemRead
BranchRegDst
ALUSrc
Instruction [31– 26]
4
16 32Instruction [15– 0]
0
0M
�
u
�
x
0
1
Control
Add ALU�
result
M
�
u
�
x
0
1
RegistersWrite
�
register
Write�
data
Read�
data 1
Read
�
data 2
Read
�
register 1
Read
�
register 2
Sign
�
extend
Shift
�
left 2
M
�
u
�
x1
ALU
�
result
Zero
Data
�
memoryWrite
�
data
Read
�
dataM
�
u
�
x
1
Instruction [15– 11]
ALU
�
control
ALUAddress
Lec 8 Systems Architecture I 17
Control Line Settings
• 8 control lines (control read/write and multiplexors)
Instruction RegDst ALUSrcMemto-
RegReg
WriteMem Read
Mem Write Branch ALUOp1 ALUp0
R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0sw X 1 X 0 0 1 0 0 0beq X 0 X 0 0 0 1 0 1
Lec 8 Systems Architecture I 18
Shortcomings of a Single Cycle Implementation
• Limits reuse of hardware components– each functional unit can be used only once per cycle– e.g. instruction and data memory required
• Inefficient– clock cycle determined by longest possible path in the machine– E.G. Assume time for:
• Memory units = 2 ns• ALU and adders = 2 ns• Register file (read or write) = 1 ns
– R-format = Inst Mem + reg read + ALU + reg write = 6 ns– Load = Inst Mem + reg read + ALU + Data Mem + reg write = 8 ns– Store = Inst Mem + reg read + ALU + Data Mem = 7 ns– Branch = Inst Mem + reg read + ALU = 5ns
Lec 8 Systems Architecture I 19
Loss of Efficiency Due to Single Cycle
• Assume 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps (2 ns)
CPU perf [var]/CPU perf [single] = CPU Time [single]/CPU Time[var]= (IC * CPU clock[single]/(IC * CPU clock[var])= CPU clock[single]/CPU clock[var]= 8/(8 x 0.24 + 7 x 0.12 + 6 x 0.44 + 5 x 0.18 + 2 x 0.02)= 8/6.34 = 1.26
• Next class, we examine a multi-cycle implementation (5.4)
Lec 8 Systems Architecture I 20
Systems Architecture I
Topic 2: A Multicycle Implementation of MIPS
Lec 8 Systems Architecture I 21
Introduction
• Objective: To re-implement the MIPS instruction set using a multicycle implementation. The benefits are shared hardware and that instructions can take a different number of cycles (reduced computing time).
• How: break instructions into steps, each step taking a clock cycle.
• What’s New: – control depends on step– Need to store temporary values in internal registers
• Topics– Steps in multicycle implementation– Control using finite state machine– Control using microprogramming
Lec 8 Systems Architecture I 22
Multicycle Approach
• Break up the instructions into steps, each step takes a cycle– balance the amount of work to be done– restrict each cycle to use only one major functional unit– Functional units: memory, register file, and ALU
• At the end of a cycle– Use internal registers to store results between steps
Shift
�
left 2
PC
Memory
�
MemData
Write
�
data
M
�
u
�
x
0
1
RegistersWrite
�
register
Write
�
data
Read
�
data 1
Read
�
data 2
Read
�
register 1
Read
�
register 2
M�
u
�
x
0
1
M
�
u
�
x
0
1
4
Instruction�
[15– 0]
Sign
�
extend
3216
Instruction
�
[25– 21]
Instruction
�
[20– 16]
Instruction
�
[15– 0]
Instruction
�
register1 M
�
u
�
x
0
3
2
M
�
u
�
x
ALU
�
resultALU
Zero
Memory
�
data
�
register
Instruction�
[15– 11]
�
A
B
ALUOut
0
1
Address
Lec 8 Systems Architecture I 23
Execution Steps
1 Instruction fetch– IR = Memory[PC];– PC = PC + 4;
2 Instruction decode and register fetch– A = Reg[IR[25-21]];– B = Reg[IR[20-16]];– ALUOut = PC + (sign-extend (IR[15-0]) << 2);
Lec 8 Systems Architecture I 24
Execution Steps
3 Execution, memory address computation, or branch completion
– Memory reference• ALUOut = A + sign-extend (IR[15-0]);
– Arithmetic-logical instruction (R-type)• ALUOut = A op B;
– Branch• if (A == B) PC = ALUOut;
– Jump• PC = PC [31-28] || (IR[25-0]<<2)
Lec 8 Systems Architecture I 25
Execution Steps
4 Memory access or R-type instruction completion– memory reference
• MDR = Memory [ALUOut];
– or• Memory [ALUOut] = B;
– R-type completion• Reg [IR[15-11]] = ALUOut;
5 Memory read completion– Reg [IR[20-16]] = MDR;
Lec 8 Systems Architecture I 26
Summary
• 3 - 5 clock cycles• Uses “ optimistic” actions
Step nameAction for R-type
instructionsAction for memory-reference
instructionsAction for branches
Action for jumps
Instruction fetch IR = Memory[PC]PC = PC + 4
Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]
ALUOut = PC + (sign-extend (IR[15-0]) << 2)
Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)jump completion
Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or
Store: Memory [ALUOut] = B
Memory read completion Load: Reg[IR[20-16]] = MDR
Lec 8 Systems Architecture I 27
CPI in a Multicycle CPU
• Using the previous multicycle implementation determine the average cycles per instruction (from gcc)– 22% loads– 11% stores– 49% R-format– 16% branches– 2% jumps
CPI = 0.22 ×××× 5 + 0.11 ×××× 4 + 0.49 ×××× 4 + 0.16 ×××× 3 + 0.02 ×××× 3= 4.04
Lec 8 Systems Architecture I 28
Control Signals
• 1-bit control signals– RegDst– RegWrite– ALUSrcA– MemRead– MemWrite– MemtoReg– IorD– IRWrite– PCWrite– PCWriteCond
• 2-bit control signals– ALUOp
• 00 (add)• 01 (sub)• 10 (funct field)
– ALUSrcB• 00 (B register)• 01 (constant 4)• 10 (lower 16 bits from IR)• 11 (lower 16 bits from IR
shifted left 2 bits)
– PCSource• 00 (output of ALU = PC+4)• 01 (ALUOut = branch addr)• 10 (jump target)
Lec 8 Systems Architecture I 29
Complete Datapath
Shift
�
left 2
PCM
�
u
�
x
0
1
RegistersWrite
�
register
Write
�data
Read
�
data 1
Read�
data 2
Read
�
register 1
Read
�
register 2
Instruction
�
[15– 11]
M
�
u
�
x
0
1
M
�
u
�
x
0
1
4
Instruction
�
[15– 0]
Sign
�
extend
3216
Instruction
�
[25– 21]
Instruction
�
[20– 16]
Instruction
�
[15– 0]
Instruction
�
register
ALU
�
control
ALU
�
resultALU
Zero
Memory
�
data
�
register
�
A
B
IorD
MemRead
MemWrite
MemtoReg
PCWriteCond
PCWrite
IRWrite
ALUOp
ALUSrcB
ALUSrcA
RegDst
PCSource
RegWrite
Control
Outputs
Op
�
[5– 0]
Instruction
�
[31-26]
Instruction [5– 0]
M
�
u
�
x
0
2
Jump
�
address [31-0]Instruction [25– 0] 26 28Shift
�
left 2
PC [31-28]
1
1 M
�
u
�
x
0
3
2
M
�
u
�
x
0
1ALUOut
Memory
MemData
Write
�
data
Address
Lec 8 Systems Architecture I 30
Finite State Machine
• Set of states• Next function determined by input and current state• Output determined by current state and possibly input
• Moore machine (output determined only by current state)
Next-state
�
functionCurrent state
Clock
Output
�
function
Next
�
state
Outputs
Inputs
Lec 8 Systems Architecture I 31
Control Using FSM
PCWrite
�
PCSource = 10
ALUSrcA = 1�
ALUSrcB = 00
�
ALUOp = 01
�
PCWriteCond
�
PCSource = 01
ALUSrcA =1
�
ALUSrcB = 00
�
ALUOp= 10
RegDst = 1
�
RegWrite
�
MemtoReg = 0
MemWrite
�
IorD = 1MemRead
�
IorD = 1
ALUSrcA = 1
�
ALUSrcB = 10
�
ALUOp = 00
RegDst = 0
�
RegWrite
�
MemtoReg = 1�
�
ALUSrcA = 0
�ALUSrcB = 11
�
ALUOp = 00
MemRead
�
ALUSrcA = 0
�
IorD = 0
�
IRWrite
�
ALUSrcB = 01
�
ALUOp = 00
�
PCWrite
�
PCSource = 00
Instruction fetchInstruction decode/
�
register fetch
Jump
�
completionBranch
�
completionExecution
Memory address
�
computation
Memory
�
accessMemory
�
access R-type completion
Write-back step
(Op = 'LW') or (Op = 'SW') (Op = R-type)
(Op
= 'BEQ
')
(Op
= 'J
')
(Op = 'SW
')
(Op
= 'L
W')
4
01
9862
753
Start
Lec 8 Systems Architecture I 32
Implementation of FSM
PCWrite
PCWriteCond
IorD
MemtoReg
PCSource
ALUOp
ALUSrcB
ALUSrcA
RegWrite
RegDst
NS3NS2NS1NS0
Op5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
State register
IRWrite
MemRead
MemWrite
Instruction register
�
opcode field
Outputs
Control logic
Inputs
Lec 8 Systems Architecture I 33
Control using Microprogramming
• Represent asserted values on control lines symbolically (microinstructions)– Fields: Label, ALU control, SRC1, SRC2, Register control, Memory,
PCWrite control, Sequencing– specifies non-overlapping set of control signals
• Placed in ROM or PLA (provides address for microinstructions)
• Sequencing mechanism– next microinstruction– branch to microinstruction (FETCH) for next MIPS instruction– choose next microinstruction based on control unit input (dispatch).
Implemented using a table of addresses (dispatch table).
Lec 8 Systems Architecture I 34
Microinstruction FieldsField name Value Signals act ive Comm ent
Add ALUOp = 00 Cause the ALU to add.ALU control Subt ALUOp = 01 Cause the ALU to subtract; this implements the compare for
branches.Func code ALUOp = 10 Use the instruction's function code to determine ALU control.
SRC1 PC ALUSrcA = 0 Use the PC as the first ALU input.A ALUSrcA = 1 Register A is the first ALU input.B ALUSrcB = 00 Register B is the second ALU input.
SRC2 4 ALUSrcB = 01 Use 4 as the second ALU input.Extend ALUSrcB = 10 Use output of the sign extension unit as the second ALU input.Extshft ALUSrcB = 11 Use the output of the shift-by-two unit as the second ALU input.Read Read two registers using the rs and rt fields of the IR as the register
numbers and putting the data into registers A and B.Write ALU RegWrite, Write a register using the rd field of the IR as the register number and
Register RegDst = 1, the contents of the ALUOut as the data.control MemtoReg = 0
Write MDR RegWrite, Write a register using the rt field of the IR as the register number andRegDst = 0, the contents of the MDR as the data.MemtoReg = 1
Read PC MemRead, Read memory using the PC as address; write result into IR (and lorD = 0 the MDR).
Memory Read ALU MemRead, Read memory using the ALUOut as address; write result into MDR.lorD = 1
Write ALU MemWrite, Write memory using the ALUOut as address, contents of B as thelorD = 1 data.
ALU PCSource = 00 Write the output of the ALU into the PC.PCWrite
PC write control ALUOut-cond PCSource = 01, If the Zero output of the ALU is active, write the PC with the contentsPCWriteCond of the register ALUOut.
jump address PCSource = 10, Write the PC with the jump address from the instruction.PCWrite
Seq AddrCtl = 11 Choose the next microinstruction sequentially.Sequencing Fetch AddrCtl = 00 Go to the first microinstruction to begin a new instruction.
Dispatch 1 AddrCtl = 01 Dispatch using the ROM 1.Dispatch 2 AddrCtl = 10 Dispatch using the ROM 2.
Lec 8 Systems Architecture I 35
Microprogram
LabelALU
control SRC1 SRC2Register control Memory
PCWrite control Sequencing
Fetch Add PC 4 Read PC ALU SeqAdd PC Extshft Read Dispatch 1
Mem1 Add A Extend Dispatch 2LW2 Read ALU Seq
Write MDR FetchSW2 Write ALU FetchRformat1 Func code A B Seq
Write ALU FetchBEQ1 Subt A B ALUOut-cond FetchJUMP1 Jump address Fetch
Lec 8 Systems Architecture I 36
Implementation
PCWritePCWriteCondIorD
MemtoRegPCSourceALUOpALUSrcBALUSrcARegWrite
AddrCtl
Outputs
Microcode memory
IRWrite
MemReadMemWrite
RegDst
Control unit
Input
Microprogram counter
Address select logic
Op[
5–0]
Adder
1
Datapath
Instruction register
�
opcode field
BWrite