Post on 26-Apr-2018
CS 35101 Ch 5.1 Steinfadt, SP08 KSU
CS 35101Computer Architecture
Spring 2008
Week 10: Chapter 5.1-5.3
Materials adapated from Mary Jane Irwin(www.cse.psu.edu/~mji) and Kevin Schaffer
[adapted from D. Patterson slides]
CS 35101 Ch 5.2 Steinfadt, SP08 KSU
Head’s Up Last course week’s material
Understanding performance, Ch. 4.1-4.6 This week’s material
Designing a MIPS single cycle datapath- Reading assignment – PH 5.1-5.3
Next week’s material More on single and multi-cycle datapath design
- Reading assignment – PH: 5.4-5.6
Reminders HW 3 is due Thursday, 3/27 by the start of class Project 2 is posted and due on 4/22 Exam #2 is Tuesday, April 15
CS 35101 Ch 5.3 Steinfadt, SP08 KSU
Datapath design tended to just work … Control pathsare where the system complexity lives. Bugsspawned from control path design errors reside inthe microcode flow, the finite-state machines, and allthe special exceptions that inevitably spring up in amachine design like thistles in a flower garden.
The Pentium Chronicles, Colwell, pg. 64
CS 35101 Ch 5.4 Steinfadt, SP08 KSU
Review: Design Principles Simplicity favors regularity
fixed size instructions – 32-bits only three instruction formats
Good design demands good compromises three instruction formats
Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes
Make the common case fast arithmetic operands from the register file (load-store
machine) allow instructions to contain immediate operands
CS 35101 Ch 5.5 Steinfadt, SP08 KSU
We're ready to look at an implementation of the MIPS Simplified to contain only:
memory-reference instructions: lw, sw arithmetic-logical instructions: add, addu, sub, subu,and, or, xor, nor, slt, sltu
arithmetic-logical immediate instructions: addi, addiu,andi, ori, xori, slti, sltiu
control flow instructions: beq, j Generic implementation:
use the program counter (PC) to supplythe instruction address and fetch theinstruction from memory(and update the PC)
decode the instruction (and read registers) execute the instruction
The Processor: Datapath & Control
FetchPC = PC+4
DecodeExec
CS 35101 Ch 5.6 Steinfadt, SP08 KSU
Abstract Implementation View Two types of functional units:
elements that operate on data values (combinational) elements that contain state (sequential)
Single cycle operation Split memory model - one memory for instructions
and one for data
Address Instruction
InstructionMemory
Write Data
Reg Addr
Reg Addr
Reg Addr
Register
File ALUData
Memory
Address
Write Data
Read DataPC
Read Data
Read Data
CS 35101 Ch 5.7 Steinfadt, SP08 KSU
Clocking Methodologies Clocking methodology defines when signals can
be read and when they can be writtenfalling (negative) edge
rising (positive) edgeclock cycle
clock rate = 1/(clock cycle) e.g., 10 nsec clock cycle = 100 MHz clock rate 1 nsec clock cycle = 1 GHz clock rate
State element design choices level sensitive latch master-slave and edge-triggered flipflops
CS 35101 Ch 5.8 Steinfadt, SP08 KSU
State Elements Set-reset latch
R
S
Q
!Q 0!Q(t)
01
!Q(t+1)
011Q(t)00
110001
Q(t+1)SR
clock
D
Q
!Q
clock
D
Q
Level sensitive D latch
latch is transparent when clock is high (copies input tooutput)
CS 35101 Ch 5.9 Steinfadt, SP08 KSU
Two-Sided Clock Constraint Race problem with latch based design …
D
clock
Q
!Q
D-latch0D
clock
Q
!Q
D-latch1
clock
Consider the case when D-latch0 holds a 0 and D-latch1 holds a 1 and you want to transfer thecontents of D-latch0 to D-latch1 and vica versa must have the clock high long enough for the transfer to
take place must not leave the clock high so long that the
transferred data is copied back into the original latch Two-sided clock constraint
CS 35101 Ch 5.10 Steinfadt, SP08 KSU
State Elements, con’t Solution is to use flipflops that change state (Q)
only on clock edge (master-slave)
master (first D-latch) copies the input when the clock ishigh (the slave (second D-latch) is locked in its memorystate and the output does not change)
slave copies the master when the clock goes low (themaster is now locked in its memory state so changes atthe input are not loaded into the master D-latch)
D
clock
Q
!Q
D-latchD
clock
Q
!Q
D-latchQ
!Q
D
clockclock
D
Q
CS 35101 Ch 5.11 Steinfadt, SP08 KSU
One-Slided Clock Constraint Master-slave (edge-triggered) flipflops removes
one of the clock constraints
D
clock
Q
!Q
MS-ff0D
clock
Q
!Q
MS-ff1
clock
Consider the case when MS-ff0 holds a 0 andMS-ff1 holds a 1 and you want to transfer thecontents of MS-ff0 to MS-ff1 and vica versa must have the clock cycle time long enough to
accommodate the worst case delay path One-sided clock constraint
CS 35101 Ch 5.12 Steinfadt, SP08 KSU
Latches vs Flipflops Output is equal to the stored value inside the
element Change of state (value) is based on the clock
Latches: output changes whenever the inputs changeand the clock is asserted (level sensitive methodology)
- Two-sided timing constraint Flip-flop: output changes only on a clock edge (edge-
triggered methodology)- One-sided timing constraint
A clocking methodology defines when signals canbe read and written – wouldn’t want to read asignal at the same time it was being written
CS 35101 Ch 5.13 Steinfadt, SP08 KSU
Our Implementation An edge-triggered methodology, typical execution
read contents of some state elements (combinationalactivity, so no clock control signal needed)
send values through some combinational logic write results to one or more state elements on clock
edge
Stateelement
1
Stateelement
2
Combinationallogic
clock
one clock cycle
Assumes state elements are written on every clockcycle; if not, need explicit write control signal write occurs only when both the write control is asserted
and the clock edge occurs
CS 35101 Ch 5.14 Steinfadt, SP08 KSU
Fetching Instructions Fetching instructions involves
reading the instruction from the Instruction Memory updating the PC value to be the address of the next
(sequential) instruction
ReadAddress
Instruction
InstructionMemory
Add
PC
4
PC is updated every clock cycle, so it does not need anexplicit write control signal just a clock signal
Reading from the Instruction Memory is a combinationalactivity, so it doesn’t need an explicit read control signal
FetchPC = PC+4
DecodeExec
clock
CS 35101 Ch 5.15 Steinfadt, SP08 KSU
Instruction Formats Review
5:010:615:1120:1625:2131:26
functshamtrdrtrsop
15:020:1625:2131:26
immedrtrsop
25:031:26
addressop
CS 35101 Ch 5.16 Steinfadt, SP08 KSU
Decoding Instructions Decoding instructions involves
sending the fetched instruction’s opcode and functionfield bits to the control unit
and
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ControlUnit
reading two values from the Register File- Register File addresses are contained in the instruction
FetchPC = PC+4
DecodeExec
CS 35101 Ch 5.17 Steinfadt, SP08 KSU
Note that both RegFile read ports are active for allinstructions during the Decode cycle using the rs andrt instruction field addresses Since haven’t decoded the instruction yet, don’t know what
the instruction is ! Just in case the instruction uses values from the RegFile do
“work ahead” by reading the two source operands
Which instructions do make use of the RegFile values?
Also, all instructions (except j) use the ALU afterreading the registers
Why? memory-reference? arithmetic? control flow?
Reading Registers “Just in Case”
CS 35101 Ch 5.18 Steinfadt, SP08 KSU
Executing R Format Operations R format operations (add, sub, slt, and, or)
perform operation (op and funct) on values in rs and rt store the result back into the Register File (into location rd)
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
overflowzero
ALU controlRegWrite
R-type:31 25 20 15 5 0
op rs rt rd functshamt
10
Note that Register File is not written every cycle (e.g. sw), sowe need an explicit write control signal for the Register File
FetchPC = PC+4
DecodeExec
CS 35101 Ch 5.19 Steinfadt, SP08 KSU
Consider slt Instruction R format operations (add, sub, slt, and, or)
perform operation (op and funct) on values in rs and rt store the result back into the Register File (into location rd)
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
overflowzero
ALU controlRegWrite
R-type:31 25 20 15 5 0
op rs rt rd functshamt
10
Note that Register File is not written every cycle (e.g. sw), sowe need an explicit write control signal for the Register File
FetchPC = PC+4
DecodeExec
CS 35101 Ch 5.20 Steinfadt, SP08 KSU
Remember the R format instruction slt slt $t0, $s0, $s1 # if $s0 < $s1
# then $t0 = 1# else $t0 = 0
Consider the slt Instruction
Where does the 1 (or 0) come from to store into $t0 in theRegister File at the end of the execute cycle?
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
overflowzero
ALU controlRegWrite
CS 35101 Ch 5.21 Steinfadt, SP08 KSU
CS 35101 Ch 5.22 Steinfadt, SP08 KSU
Executing Load and Store Operations Load and store operations have to
compute a memory address by adding the baseregister (in rs) to the 16-bit signed offset field in theinstruction
- base register was read from the Register File duringdecode
- offset value in the low order 16 bits of the instructionmust be sign extended to create a 32-bit signed value
store value, read from the Register File duringdecode, must be written to the Data Memory
load value, read from the Data Memory, must bestored in the Register File
I-Type: op rs rt address offset31 25 20 15 0
CS 35101 Ch 5.23 Steinfadt, SP08 KSU
Executing Load and Store Operations, con’t
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
overflowzero
ALU controlRegWrite
DataMemory
Address
Write Data
Read Data
SignExtend
MemWrite
MemRead
CS 35101 Ch 5.25 Steinfadt, SP08 KSU
Executing Branch Operations Branch operations have to
compare the operands read from the Register Fileduring decode (rs and rt values) for equality (zeroALU output)
compute the branch target address by adding theupdated PC to the sign extended16-bit signedoffset field in the instruction
- “base register” is the updated PC- offset value in the low order 16 bits of the instruction
must be sign extended to create a 32-bit signedvalue and then shifted left 2 bits to turn it into a wordaddress
I-Type: op rs rt address offset31 25 20 15 0
CS 35101 Ch 5.26 Steinfadt, SP08 KSU
Executing Branch Operations, con’t
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
zero
ALU control
SignExtend16 32
Shiftleft 2
Add
4 Add
PC
Branchtargetaddress
(to branchcontrol logic)
CS 35101 Ch 5.28 Steinfadt, SP08 KSU
Executing Jump Operations Jump operations have to
replace the lower 28 bits of the PC with the lower 26 bitsof the fetched instruction shifted left by 2 bits
ReadAddress
Instruction
InstructionMemory
Add
PC
4
Shiftleft 2
Jumpaddress
26
4
28
J-Type: op
31 25 0
jump target address
CS 35101 Ch 5.29 Steinfadt, SP08 KSU
Creating a Single Datapath from the Parts Assemble the datapath elements, add control lines
as needed, and design the control path Fetch, decode and execute each instructions in
one clock cycle – single cycle design no datapath resource can be used more than once per
instruction, so some must be duplicated (e.g., why wehave a separate Instruction Memory and Data Memory)
to share datapath elements between two differentinstruction classes will need multiplexors at the input ofthe shared elements with control lines to do theselection
Cycle time is determined by length of the longestpath
CS 35101 Ch 5.30 Steinfadt, SP08 KSU
Fetch, R, and Memory Access Portions
ReadAddress
Instruction
InstructionMemory
Add
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovfzero
ALU controlRegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemReadSign
Extend16 32
CS 35101 Ch 5.31 Steinfadt, SP08 KSU
Multiplexor Insertion
MemtoReg
ReadAddress
Instruction
InstructionMemory
Add
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovfzero
ALU controlRegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemReadSign
Extend16 32
ALUSrc
CS 35101 Ch 5.32 Steinfadt, SP08 KSU
Clock Distribution
MemtoReg
ReadAddress
Instruction
InstructionMemory
Add
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovfzero
ALU control
RegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemReadSign
Extend16 32
ALUSrc
System Clock
clock cycle
CS 35101 Ch 5.33 Steinfadt, SP08 KSU
Adding the Branch Portion
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovfzero
ALU controlRegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemReadSign
Extend16 32
MemtoRegALUSrc
ReadAddress
Instruction
InstructionMemory
Add
PC
4
CS 35101 Ch 5.35 Steinfadt, SP08 KSU
We wait for everything to settle down
ALU might not produce “right answer” right away Memory and RegFile reads are combinational (as are
ALU, adders, muxes, shifter, signextender) Use write signals along with the clock edge to determine
when to write to the sequential elements (to the PC, tothe Register File and to the Data Memory)
The clock cycle time is determined by the logicdelay through the longest path
Our Simple Control Structure
We are ignoring some details like registersetup and hold times