Systems Architecture I - College of Computing & Informatics

36
Lec 8 Systems Architecture I 1 Systems Architecture I Topics A Simple Implementation of MIPS * A Multicycle Implementation of MIPS ** *This lecture was derived from material in the text (sec. 5.1-5.3). **This lecture was derived from material in the text (sec. 5.4-5.5). All figures from Computer Organization and Design: The Hardware/Software Approach, Second Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED). Notes Courtesy of Jeremy R. Johnson

Transcript of Systems Architecture I - College of Computing & Informatics

Page 1: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 1

Systems Architecture I

TopicsA Simple Implementation of MIPS*

A Multicycle Implementation of MIPS**

*This lecture was derived from material in the text (sec. 5.1-5.3). **This lecture was derived from material in the text (sec. 5.4-5.5).

All figures from Computer Organization and Design: The Hardware/Software Approach, Second Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).

Notes Courtesy of Jeremy R. Johnson

Page 2: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 2

Systems Architecture I

Topic 1: A Simple Implementation of MIPS

Page 3: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 3

Introduction

• Objective: To understand how to implement the MIPS instruction set.

• Combine components (registers, memory, ALU) and add control

• Fetch-Execute cycle• Topics

– Sequential logic (elements with state) and timing (edge triggered)• Memory• Registers

– Datapath components: Instruction memory, PC, Add, Register File, ALU, Data Memory

– Implement a subset of MIPS in a single cycle computer– Shortcomings of a single cycle computer

Page 4: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 4

The Processor: Datapath & Control

• Implementation of MIPS• Simplified to contain only:

– memory-reference instructions: lw, sw– arithmetic-logical instructions: add, sub, and, or, slt– control flow instructions: beq, j

• Generic Implementation:

– use the program counter (PC) to supply instruction address– get the instruction from memory– read registers– use the instruction to decide exactly what to do

Page 5: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 5

Abstract View

• Two types of functional units:– elements that operate on data values (combinational)– elements that contain state (sequential)

Registers

Register #

Data

Register #

Data

memory

Address

Data

Register #

PC Instruction ALU

Instruction

memory

Address

Page 6: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 6

Timing

• Clocks used in synchronous logic – when should an element that contains state be updated?

• Edge-triggered timing

cycle time

rising edge

falling edge

Page 7: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 7

Edge Triggered Timing

• State updated at clock edge• read contents of some state elements, • send values through some combinational logic• write results to one or more state elements

Clock cycle

State

element

1Combinational logic

State

element

2

Page 8: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 8

Components for Simple Implementation

• Functional Units needed for each instruction

PC

Instruction

memory

Instruction

address

Instruction

a. Instruction memory b. Program counter

Add Sum

c. Adder16 32

Sign

extend

b. Sign-extension unit

MemRead

MemWrite

Data

memoryWrite

data

Read

data

a. Data memory unit

Address

ALU control

RegWrite

RegistersWrite

register

Read

data 1

Read

data 2

Read

register 1

Read

register 2

Write

data

ALU

resultALU

Data

Data

Register

numbers

a. Registers b. ALU

Zero5

5

5 3

Page 9: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 9

Building the Datapath

• Use Multiplexors to put components together

PC

Instruction

memory

Read

address

Instruction

16 32

Add ALU�

result

M

u

x

Registers

Write

registerWrite

data

Read

data 1

Read

data 2

Read

register 1Read

register 2

Shift

left 2

4

M

u

x

ALU operation3

RegWrite

MemRead

MemWrite

PCSrc

ALUSrc

MemtoReg

ALU

result

ZeroALU

Data

memory

Address

Write

data

Read

data M

u

x

Sign

extend

Add

Page 10: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 10

Adding Control

• Selecting the operations to perform (ALU, read/write, etc.)

• Controlling the flow of data (multiplexor inputs)

• Information comes from the 32 bits of the instruction

Page 11: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 11

MIPS Instructions

• add $t0,$s1,$s2

• lw $t0,256($t1)

000000 10001 10010 01000 00000 100000

op rs rt rd shamt funct

100011 01001 01000 0000 0001 0000 0000

op rs rt offset

Page 12: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 12

MIPS Instructions Continued

• beq $s1,$s2,100

• j 4096

000010 00 0000 0000 0000 0100 0000 0000

op address

000100 10001 10010 0000 0000 0001 1001

op rs rt offset

Page 13: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 13

ALU

• Control Lines000 and001 or010 add110 sub111 slt

Seta31

0

Result0a0

Result1a1

0

Result2a2

0

Operation

b31

b0

b1

b2

Result31

Overflow

Bnegate

Zero

ALU0Less

CarryIn

CarryOut

ALU1Less

CarryIn

CarryOut

ALU2Less

CarryIn

CarryOut

ALU31Less

CarryIn

Page 14: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 14

Determining ALU Control Bits

• ALUOp determined by instruction

Instruction ALUOp Instruction funct ALU ALUopcode operation action controlLW 00 load word xxxxxx add 010SW 00 store word xxxxxx add 010BEQ 01 branch eq xxxxxx sub 110R-type 10 add 100000 add 010R-type 10 sub 100010 sub 110R-type 10 and 100100 and 000R-type 10 or 100101 or 001R-type 10 slt 101010 slt 111

Page 15: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 15

• Must describe hardware to compute 3-bit ALU control input– given instruction type

00 = lw, sw01 = beq, 10 = arithmetic

– function code for arithmetic

• Describe it using a truth table (can turn into gates):

ALUOpcomputed from instruction type

ALU Control

ALUOp Funct field OperationALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0

0 0 X X X X X X 010X 1 X X X X X X 1101 X X X 0 0 0 0 0101 X X X 0 0 1 0 1101 X X X 0 1 0 0 0001 X X X 0 1 0 1 0011 X X X 1 0 1 0 111

Page 16: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 16

Datapath with Control

PC

Instruction

memory

Read

address

Instruction

[31– 0]

Instruction [20– 16]

Instruction [25– 21]

Add

Instruction [5– 0]

MemtoReg

ALUOp

MemWrite

RegWrite

MemRead

BranchRegDst

ALUSrc

Instruction [31– 26]

4

16 32Instruction [15– 0]

0

0M

u

x

0

1

Control

Add ALU�

result

M

u

x

0

1

RegistersWrite

register

Write�

data

Read�

data 1

Read

data 2

Read

register 1

Read

register 2

Sign

extend

Shift

left 2

M

u

x1

ALU

result

Zero

Data

memoryWrite

data

Read

dataM

u

x

1

Instruction [15– 11]

ALU

control

ALUAddress

Page 17: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 17

Control Line Settings

• 8 control lines (control read/write and multiplexors)

Instruction RegDst ALUSrcMemto-

RegReg

WriteMem Read

Mem Write Branch ALUOp1 ALUp0

R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0sw X 1 X 0 0 1 0 0 0beq X 0 X 0 0 0 1 0 1

Page 18: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 18

Shortcomings of a Single Cycle Implementation

• Limits reuse of hardware components– each functional unit can be used only once per cycle– e.g. instruction and data memory required

• Inefficient– clock cycle determined by longest possible path in the machine– E.G. Assume time for:

• Memory units = 2 ns• ALU and adders = 2 ns• Register file (read or write) = 1 ns

– R-format = Inst Mem + reg read + ALU + reg write = 6 ns– Load = Inst Mem + reg read + ALU + Data Mem + reg write = 8 ns– Store = Inst Mem + reg read + ALU + Data Mem = 7 ns– Branch = Inst Mem + reg read + ALU = 5ns

Page 19: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 19

Loss of Efficiency Due to Single Cycle

• Assume 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps (2 ns)

CPU perf [var]/CPU perf [single] = CPU Time [single]/CPU Time[var]= (IC * CPU clock[single]/(IC * CPU clock[var])= CPU clock[single]/CPU clock[var]= 8/(8 x 0.24 + 7 x 0.12 + 6 x 0.44 + 5 x 0.18 + 2 x 0.02)= 8/6.34 = 1.26

• Next class, we examine a multi-cycle implementation (5.4)

Page 20: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 20

Systems Architecture I

Topic 2: A Multicycle Implementation of MIPS

Page 21: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 21

Introduction

• Objective: To re-implement the MIPS instruction set using a multicycle implementation. The benefits are shared hardware and that instructions can take a different number of cycles (reduced computing time).

• How: break instructions into steps, each step taking a clock cycle.

• What’s New: – control depends on step– Need to store temporary values in internal registers

• Topics– Steps in multicycle implementation– Control using finite state machine– Control using microprogramming

Page 22: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 22

Multicycle Approach

• Break up the instructions into steps, each step takes a cycle– balance the amount of work to be done– restrict each cycle to use only one major functional unit– Functional units: memory, register file, and ALU

• At the end of a cycle– Use internal registers to store results between steps

Shift

left 2

PC

Memory

MemData

Write

data

M

u

x

0

1

RegistersWrite

register

Write

data

Read

data 1

Read

data 2

Read

register 1

Read

register 2

M�

u

x

0

1

M

u

x

0

1

4

Instruction�

[15– 0]

Sign

extend

3216

Instruction

[25– 21]

Instruction

[20– 16]

Instruction

[15– 0]

Instruction

register1 M

u

x

0

3

2

M

u

x

ALU

resultALU

Zero

Memory

data

register

Instruction�

[15– 11]

A

B

ALUOut

0

1

Address

Page 23: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 23

Execution Steps

1 Instruction fetch– IR = Memory[PC];– PC = PC + 4;

2 Instruction decode and register fetch– A = Reg[IR[25-21]];– B = Reg[IR[20-16]];– ALUOut = PC + (sign-extend (IR[15-0]) << 2);

Page 24: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 24

Execution Steps

3 Execution, memory address computation, or branch completion

– Memory reference• ALUOut = A + sign-extend (IR[15-0]);

– Arithmetic-logical instruction (R-type)• ALUOut = A op B;

– Branch• if (A == B) PC = ALUOut;

– Jump• PC = PC [31-28] || (IR[25-0]<<2)

Page 25: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 25

Execution Steps

4 Memory access or R-type instruction completion– memory reference

• MDR = Memory [ALUOut];

– or• Memory [ALUOut] = B;

– R-type completion• Reg [IR[15-11]] = ALUOut;

5 Memory read completion– Reg [IR[20-16]] = MDR;

Page 26: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 26

Summary

• 3 - 5 clock cycles• Uses “ optimistic” actions

Step nameAction for R-type

instructionsAction for memory-reference

instructionsAction for branches

Action for jumps

Instruction fetch IR = Memory[PC]PC = PC + 4

Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]

ALUOut = PC + (sign-extend (IR[15-0]) << 2)

Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)jump completion

Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or

Store: Memory [ALUOut] = B

Memory read completion Load: Reg[IR[20-16]] = MDR

Page 27: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 27

CPI in a Multicycle CPU

• Using the previous multicycle implementation determine the average cycles per instruction (from gcc)– 22% loads– 11% stores– 49% R-format– 16% branches– 2% jumps

CPI = 0.22 ×××× 5 + 0.11 ×××× 4 + 0.49 ×××× 4 + 0.16 ×××× 3 + 0.02 ×××× 3= 4.04

Page 28: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 28

Control Signals

• 1-bit control signals– RegDst– RegWrite– ALUSrcA– MemRead– MemWrite– MemtoReg– IorD– IRWrite– PCWrite– PCWriteCond

• 2-bit control signals– ALUOp

• 00 (add)• 01 (sub)• 10 (funct field)

– ALUSrcB• 00 (B register)• 01 (constant 4)• 10 (lower 16 bits from IR)• 11 (lower 16 bits from IR

shifted left 2 bits)

– PCSource• 00 (output of ALU = PC+4)• 01 (ALUOut = branch addr)• 10 (jump target)

Page 29: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 29

Complete Datapath

Shift

left 2

PCM

u

x

0

1

RegistersWrite

register

Write

�data

Read

data 1

Read�

data 2

Read

register 1

Read

register 2

Instruction

[15– 11]

M

u

x

0

1

M

u

x

0

1

4

Instruction

[15– 0]

Sign

extend

3216

Instruction

[25– 21]

Instruction

[20– 16]

Instruction

[15– 0]

Instruction

register

ALU

control

ALU

resultALU

Zero

Memory

data

register

A

B

IorD

MemRead

MemWrite

MemtoReg

PCWriteCond

PCWrite

IRWrite

ALUOp

ALUSrcB

ALUSrcA

RegDst

PCSource

RegWrite

Control

Outputs

Op

[5– 0]

Instruction

[31-26]

Instruction [5– 0]

M

u

x

0

2

Jump

address [31-0]Instruction [25– 0] 26 28Shift

left 2

PC [31-28]

1

1 M

u

x

0

3

2

M

u

x

0

1ALUOut

Memory

MemData

Write

data

Address

Page 30: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 30

Finite State Machine

• Set of states• Next function determined by input and current state• Output determined by current state and possibly input

• Moore machine (output determined only by current state)

Next-state

functionCurrent state

Clock

Output

function

Next

state

Outputs

Inputs

Page 31: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 31

Control Using FSM

PCWrite

PCSource = 10

ALUSrcA = 1�

ALUSrcB = 00

ALUOp = 01

PCWriteCond

PCSource = 01

ALUSrcA =1

ALUSrcB = 00

ALUOp= 10

RegDst = 1

RegWrite

MemtoReg = 0

MemWrite

IorD = 1MemRead

IorD = 1

ALUSrcA = 1

ALUSrcB = 10

ALUOp = 00

RegDst = 0

RegWrite

MemtoReg = 1�

ALUSrcA = 0

�ALUSrcB = 11

ALUOp = 00

MemRead

ALUSrcA = 0

IorD = 0

IRWrite

ALUSrcB = 01

ALUOp = 00

PCWrite

PCSource = 00

Instruction fetchInstruction decode/

register fetch

Jump

completionBranch

completionExecution

Memory address

computation

Memory

accessMemory

access R-type completion

Write-back step

(Op = 'LW') or (Op = 'SW') (Op = R-type)

(Op

= 'BEQ

')

(Op

= 'J

')

(Op = 'SW

')

(Op

= 'L

W')

4

01

9862

753

Start

Page 32: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 32

Implementation of FSM

PCWrite

PCWriteCond

IorD

MemtoReg

PCSource

ALUOp

ALUSrcB

ALUSrcA

RegWrite

RegDst

NS3NS2NS1NS0

Op5

Op4

Op3

Op2

Op1

Op0

S3

S2

S1

S0

State register

IRWrite

MemRead

MemWrite

Instruction register

opcode field

Outputs

Control logic

Inputs

Page 33: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 33

Control using Microprogramming

• Represent asserted values on control lines symbolically (microinstructions)– Fields: Label, ALU control, SRC1, SRC2, Register control, Memory,

PCWrite control, Sequencing– specifies non-overlapping set of control signals

• Placed in ROM or PLA (provides address for microinstructions)

• Sequencing mechanism– next microinstruction– branch to microinstruction (FETCH) for next MIPS instruction– choose next microinstruction based on control unit input (dispatch).

Implemented using a table of addresses (dispatch table).

Page 34: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 34

Microinstruction FieldsField name Value Signals act ive Comm ent

Add ALUOp = 00 Cause the ALU to add.ALU control Subt ALUOp = 01 Cause the ALU to subtract; this implements the compare for

branches.Func code ALUOp = 10 Use the instruction's function code to determine ALU control.

SRC1 PC ALUSrcA = 0 Use the PC as the first ALU input.A ALUSrcA = 1 Register A is the first ALU input.B ALUSrcB = 00 Register B is the second ALU input.

SRC2 4 ALUSrcB = 01 Use 4 as the second ALU input.Extend ALUSrcB = 10 Use output of the sign extension unit as the second ALU input.Extshft ALUSrcB = 11 Use the output of the shift-by-two unit as the second ALU input.Read Read two registers using the rs and rt fields of the IR as the register

numbers and putting the data into registers A and B.Write ALU RegWrite, Write a register using the rd field of the IR as the register number and

Register RegDst = 1, the contents of the ALUOut as the data.control MemtoReg = 0

Write MDR RegWrite, Write a register using the rt field of the IR as the register number andRegDst = 0, the contents of the MDR as the data.MemtoReg = 1

Read PC MemRead, Read memory using the PC as address; write result into IR (and lorD = 0 the MDR).

Memory Read ALU MemRead, Read memory using the ALUOut as address; write result into MDR.lorD = 1

Write ALU MemWrite, Write memory using the ALUOut as address, contents of B as thelorD = 1 data.

ALU PCSource = 00 Write the output of the ALU into the PC.PCWrite

PC write control ALUOut-cond PCSource = 01, If the Zero output of the ALU is active, write the PC with the contentsPCWriteCond of the register ALUOut.

jump address PCSource = 10, Write the PC with the jump address from the instruction.PCWrite

Seq AddrCtl = 11 Choose the next microinstruction sequentially.Sequencing Fetch AddrCtl = 00 Go to the first microinstruction to begin a new instruction.

Dispatch 1 AddrCtl = 01 Dispatch using the ROM 1.Dispatch 2 AddrCtl = 10 Dispatch using the ROM 2.

Page 35: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 35

Microprogram

LabelALU

control SRC1 SRC2Register control Memory

PCWrite control Sequencing

Fetch Add PC 4 Read PC ALU SeqAdd PC Extshft Read Dispatch 1

Mem1 Add A Extend Dispatch 2LW2 Read ALU Seq

Write MDR FetchSW2 Write ALU FetchRformat1 Func code A B Seq

Write ALU FetchBEQ1 Subt A B ALUOut-cond FetchJUMP1 Jump address Fetch

Page 36: Systems Architecture I - College of Computing & Informatics

Lec 8 Systems Architecture I 36

Implementation

PCWritePCWriteCondIorD

MemtoRegPCSourceALUOpALUSrcBALUSrcARegWrite

AddrCtl

Outputs

Microcode memory

IRWrite

MemReadMemWrite

RegDst

Control unit

Input

Microprogram counter

Address select logic

Op[

5–0]

Adder

1

Datapath

Instruction register

opcode field

BWrite