4-bit Carry look-ahead adder - Virginia...
MIPS64
• Registers
– 32 64-bit GPRs (int registers)
– 32 64-bit fp registers (dual use)
– R0=0
– Additional special purpose registers
• Data types
– 8-bit byte
– 2 bytes = half word
– 4 bytes = word
– 8 bytes = dword
• Addressing modes
– Immediate and displacement
• register indirect and absolute are easily represented
– Byte addressable 64-bit address
• Big or little endian
– Load/store architecture
I-type instructions
• I-Type instruction:
• Encodes: Loads and stores of bytes, half words, words, dwords
– All immediates (rt ← rs op immediate)
– Ex: Add base register rs to 16 bit offset
Op code rs rt immediate
6 bits 5 bits 5 bits 16 bits
R-type instructions
• R-Type instruction:
• Register-Register ALU operations: rd ← rs func rt
– Function encodes data path operation: add, sub, slt, and, or
– Read/write special registers and moves
Op code rs rt rd func
6 bits 5 bits 5 bits 5 bits 11 bits
Source registers Destination register Op code variant
J-type instructions
• J-Type instruction:
• Encodes:
– Jump and jump & link
– Trap and return from exception
Op code immediate
6 bits 26 bits
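The three formats above share one 32-bit word, so a decoder can pull out every field with shifts and masks. The following is a minimal sketch of that idea, not part of the original slides: the names `decode` and `encode_i` are hypothetical, and the 11-bit `func` field follows the R-type layout as drawn above. Opcode 35 is the load opcode used later in these slides.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit instruction word into the fields of all three formats."""
    instr &= 0xFFFFFFFF
    return {
        "op": (instr >> 26) & 0x3F,      # 6-bit op code, bits [31-26]
        "rs": (instr >> 21) & 0x1F,      # 5 bits, [25-21]
        "rt": (instr >> 16) & 0x1F,      # 5 bits, [20-16]
        "rd": (instr >> 11) & 0x1F,      # 5 bits, [15-11] (R-type only)
        "func": instr & 0x7FF,           # 11 bits, [10-0] (R-type only)
        "imm16": instr & 0xFFFF,         # 16-bit immediate (I-type only)
        "imm26": instr & 0x3FFFFFF,      # 26-bit immediate (J-type only)
    }


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type instruction: op | rs | rt | 16-bit immediate."""
    return ((op & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)


# Hypothetical example: a load with base register 2, target register 1, offset 100.
word = encode_i(35, 2, 1, 100)
fields = decode(word)
```

Which of the decoded fields are meaningful depends on the op code; the decoder itself does not need to know the format in advance.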
In summary…
• Pitfalls
– Designing “high-level” language instructions
– Not considering compiler design when targeting code size
• Fallacies
– There are “typical” programs
– An architecture with flaws cannot be successful
– You can design a flawless architecture
Data Path and Control
Review
Kirk W. Cameron, Ph.D.
Associate Professor
Department of Computer Science
Virginia Tech
ALU components
[Figure: 1-bit logical unit for AND and OR (inputs a, b; op selects the output through a 0/1 mux) and a 3/2 adder (inputs a, b, Carry_in; outputs sum, Carry_out)]
1-bit Simplified ALU
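[Figure: 1-bit simplified ALU: inputs a, b, Carry_in; op selects result from mux inputs 0, 1, 2 (one of them fed by the adder); outputs result, Carry_out]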
1-bit Simplified ALU w/ subtraction
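[Figure: 1-bit simplified ALU with a 0/1 mux, controlled by Binvert, that passes b or its complement to the logic and the adder; op selects result from mux inputs 0, 1, 2; outputs result, Carry_out]
$a - b = a + (-b) = a + (\bar{b} + 1) = a + \bar{b} + 1$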
32 bit ALU (ripple carry)
[Figure: 32 one-bit ALUs (ALU0, ALU1, ..., ALU31) in a chain; each Carry_out feeds the Carry_in of the next bit; op and Binvert are shared by every bit; bit i takes a_i and b_i and produces result_i]
Edge-triggered Design
• Clocks: necessary to update logic that holds state
– Free running signal with a fixed cycle time (clock period)
• Frequency = inverse of cycle time
• Clock period = high clock followed by a low clock
• Edge-triggered clocking: all state changes occur on clock edge
• Active clock edge = edge of clock that causes state to change
[Figure: clock waveform marking the rising edge, the falling edge, and one clock period]
Why use edge-triggering?
[Figure: state of input element → combinational logic → state of output element, all within one clock period]
*input values must be stable when active clock edge returns
[Figure: clock waveform (high/low) marking setup time and hold time around the active clock edge]
6 bit register using D flip flops
Register Files
• Set of registers that can be read and written by supplying
a register number to be accessed
– One set of registers operated on by a port
• Multiple-read ports: not too difficult
• Multiple-write ports: problems arise
Read ports
[Figure, single read port: a mux selects among Register0, Register1, ..., Register n-1 using the Read port 1 register number and drives Readdata1]
[Figure, multi read port: the same registers feed a second mux, selected by Read port 2, which drives Readdata2]
Write Ports
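[Figure: a decoder on the Write register number selects one of Register0, Register1, ..., Register n-1; the selected decoder output together with Write (clock) drives that register's clock input C; Write data feeds the D input of every register]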
Register Files
• Example: 2 read ports, 1 write port, high-level view
[Figure: register file block with inputs Read Port 1, Read Port 2, Write Register, Write Data, Write, and outputs Readdata1, Readdata2]
Why not multiple writes?
What if read and write at same time? Who goes first?
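A behavioural model makes the two questions concrete. The sketch below assumes edge-triggered writes, so a read issued before the active edge returns the old value and the write becomes visible afterwards; that ordering is an assumption of this model, not something the slide states. The class name and method names are hypothetical, and R0 is held at zero as in the MIPS64 register summary.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """2 read ports, 1 write port; writes take effect only on the clock edge."""

    def __init__(self, n: int = 32):
        self.regs = [0] * n

    def read(self, read_reg1: int, read_reg2: int):
        """Combinational read: returns (Readdata1, Readdata2)."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write: int, write_reg: int, write_data: int):
        """Active clock edge: commit the write if the Write enable is asserted."""
        if write and write_reg != 0:          # R0 stays 0
            self.regs[write_reg] = write_data & MASK64


rf = RegisterFile()
rf.clock_edge(1, 1, 42)
before = rf.read(1, 1)      # read during the cycle, before the next edge
rf.clock_edge(1, 1, 7)      # write to the same register at the edge
after = rf.read(1, 0)
```

With a single write port there is never more than one candidate value per register per edge; a second write port would need a rule for two writes that name the same register.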
Data Path for I-type instruction
• Assuming 32, 64-bit registers (R0..R31)
• Use same register file as R-type
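[Figure: register file (inputs ReadReg1 = rs, ReadReg2 = rt, WriteReg, Write Data, Write; outputs Readdata1, Readdata2) feeding the ALU (ALUop, zero, result); the immediate passes through Sign extend to the ALU; the ALU result is the address input of Data Memory (memread, memwrite, Write data)]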
Op code rs rt immediate
6 bits 5 bits 5 bits 16 bits
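For loads and stores, the data path adds the base register to the sign-extended 16-bit offset. A minimal sketch of that calculation follows; the function names are hypothetical and the 64-bit wrap-around matches the 64-bit registers assumed on this slide.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, imm: int) -> int:
    """ALU add of base register (rs) and sign-extended immediate."""
    return (base + sign_extend16(imm)) & MASK64
```

A negative offset such as 0xFFFC (that is, -4) therefore addresses memory just below the base register.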
Data Path for R-type instruction
• Assuming 32, 64-bit registers (R0..R31)
• Using 2 read ports and 1 write port
– Inputs: 3 register #s (each 5 bits wide), data for write (64-bits)
– Outputs: readdata1 and readdata2 (both 64 bits)
[Figure: register file (inputs ReadReg1 = rs, ReadReg2 = rt, WriteReg = rd, Write Data, Write; outputs Readdata1, Readdata2) feeding the ALU (ALUop, zero); the ALU result returns to the register file]
Op code rs rt rd func
6 bits 5 bits 5 bits 5 bits 11 bits
Single clock implementation
• Execute all instructions in one clock cycle
• No data path resource can be used more than once
• Memory for instructions separate from data
• Anywhere we want to choose a signal from two paths we
use mux + selector
Combined R-type + Load/Store
ALUSrc: 1 bit selector of
a) data from rt (R-type)
b) sign extended immediate (Load/Store)
MemtoReg: 1 bit selector of
a) data found at calculated address in memory (Load)
b) ALU result from two register ALU operation
Adding instruction fetch
Adding branches
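[Figure: branch hardware added to the data path: a Shift left 2 block feeding an adder (sum), the main ALU, and a mux controlled by PCSrc]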
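The branch hardware can be summarised as a next-PC computation. The sketch below assumes the standard MIPS rule that the target is PC+4 plus the sign-extended offset shifted left by 2, and it takes PCSrc to be the AND of the Branch control signal and the ALU's zero output, as the BEQ control slide later in this transcript states. The function names and the 32-bit PC width are assumptions made for illustration.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc: int, imm16: int, branch: int, zero: int) -> int:
    """Select PC+4 or the branch target, as the PCSrc mux does."""
    pc_plus_4 = pc + 4
    target = pc_plus_4 + (sign_extend16(imm16) << 2)   # "Shift left 2" then "sum"
    pc_src = branch & zero                             # taken only if equal
    return (target if pc_src else pc_plus_4) & 0xFFFFFFFF
```

The shift by 2 turns a word offset into a byte offset, so a 16-bit field reaches four times as far as it would as a raw byte count.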
ALU control
• Our latest version of ALU needs 3-bits to control Mux’s
– Binvert: 1-bit control to operate on B or B-inverse (sub)
– Operation: 2-bit control to perform AND, OR, add, slt
[Figure: ALU block with inputs a, b and ALU operation; outputs result, zero, overflow, carryout]
Combine Binvert and Operation into a set of 3 control lines
ALU operation Function
000 AND
001 OR
010 ADD
110 SUB
111 Set on less than
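The table can be read as the specification of a small function from a 3-bit code to an operation. The following behavioural sketch is an assumption-laden illustration: the name `alu` is hypothetical, operands are treated as 64-bit values, and set-on-less-than is modelled with a signed comparison rather than the subtract-and-take-sign-bit wiring used in hardware.

```python
MASK64 = (1 << 64) - 1


def to_signed(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def alu(operation: int, a: int, b: int):
    """Return (result, zero) for the 3-bit ALU operation codes in the table."""
    a &= MASK64
    b &= MASK64
    if operation == 0b000:
        result = a & b
    elif operation == 0b001:
        result = a | b
    elif operation == 0b010:
        result = (a + b) & MASK64
    elif operation == 0b110:
        result = (a - b) & MASK64
    elif operation == 0b111:
        result = int(to_signed(a) < to_signed(b))
    else:
        raise ValueError("unused ALU operation code")
    return result, int(result == 0)
```

The leading bit of each code is Binvert: it is 1 exactly for SUB and set on less than, the two operations that need the complement of b.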
ALU control input is 3 bits: AND (000), OR (001), ADD (010), SUB (110), SLT (111)
Instruction class determines ALU operation:
lw/sw ALU computes mem address (ADD)
branch ALU computes comparison (SUB)
R-type ALU control determined by 6-bit func field
What function should ALU perform?
• Determined by instruction
• Op-code + function code => ALUop
• Observation: There are 3 possibilities => 2 bits
– Load/store uses ALU to add (always)
– BEQ uses ALU to subtract (always)
– R-type uses ALU to perform 1 of 5 operations
• Use the function-field and a 2-bit control field (ALUOp) as input to
produce 3-bit ALU (operation) control
Determining ALU control bits
I-type instructions
Based on op-code: generated by main control unit
Input signal to ALU
Obtaining 3-bit ALU control from ALUOp + Funct
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 Operation Action
0 0 X X X X X X 010 Add
X 1 X X X X X X 110 Sub
1 X X X 0 0 0 0 010 Add
1 X X X 0 0 1 0 110 Sub
1 X X X 0 1 0 0 000 And
1 X X X 0 1 0 1 001 Or
1 X X X 1 0 1 0 111 Set on less than
Resulting PLA is ALU control logic
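The truth table translates directly into a lookup. The sketch below is one possible software rendering, with the hypothetical name `alu_control`; the don't-care bits F5 and F4 are simply masked off, and the row order of the table decides what happens for inputs it does not list.

```python
# Funct bits F3..F0 -> 3-bit ALU operation, for the R-type rows of the table.
R_TYPE_OPERATION = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # sub
    0b0100: 0b000,  # and
    0b0101: 0b001,  # or
    0b1010: 0b111,  # set on less than
}


def alu_control(aluop1: int, aluop0: int, funct: int) -> int:
    """Produce the 3-bit ALU operation from the 2-bit ALUOp and 6-bit funct."""
    if aluop1 == 0 and aluop0 == 0:
        return 0b010                         # load/store: always add
    if aluop0 == 1:
        return 0b110                         # BEQ: always subtract
    return R_TYPE_OPERATION[funct & 0b1111]  # R-type: F5 and F4 are don't cares
```

For example, `alu_control(1, 0, 0b101010)` follows the last row of the table and yields 111.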
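[Figure: ALU Control Unit. Instruction fields [25-21] rs, [20-16] rt, [15-11] rd, [15-0] addr and [5-0] funct leave the instruction; ALUOp and funct feed the ALU control block; the branch path (Shift left 2, sum, mux, PCSrc) is unchanged]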
Instruction bits
All R-type:
0 rs rt rd shamt func
[31-26] [25-21] [20-16] [15-11] [10-6] [5-0]
I-type Load/Store:
35 or 43 rs rt immediate
[31-26] [25-21] [20-16] [15-0]
I-type Branch:
4 rs rt immediate
[31-26] [25-21] [20-16] [15-0]
Jump:
2 immediate
[31-26] [25-0]
Op Code Mapping
OpCode in binary: Op5..Op0 = Instr[31]..Instr[26]
Instr type OpCode (decimal) Op5 Op4 Op3 Op2 Op1 Op0
R-type 0ten 0 0 0 0 0 0
Load 35ten 1 0 0 0 1 1
Store 43ten 1 0 1 0 1 1
BEQ 4ten 0 0 0 1 0 0
Jump 2ten 0 0 0 0 1 0
What about write register?
For R-type use bits [15-11]
For I-type use bits [20-16]
[Figure: a mux controlled by RegDst selects the write register number from [20-16] rt or [15-11] rd]
Single-cycle Instruction execution
• R-type
– Fetch instruction and increment PC
– Read two source registers from register file (set control)
– Perform ALU operation on register operands
• Store/Load
– Fetch instruction and increment PC
– Read one register value from register file (set control)
– ALU computes sum of immediate and register value
– ALU result used as address to memory
– Data from memory written back to register (Load only)
• Branch
– Fetch instruction and increment PC
– Read two source registers from register file (set control)
– ALU performs subtract on register operands (target addr computed)
– Zero result from ALU determines PC
Main Control Signals
• PCSrc (0/1 mux)
– Deasserted (0): PC = PC+4
– Asserted (1): PC = branch target
• RegDst (0/1 mux)
– Deasserted (0): Destination register for write uses [20-16]
– Asserted (1): Destination register for write uses [15-11]
• ALUSrc (0/1 mux)
– Deasserted (0): Second ALU operand is data from register file (readdata2)
– Asserted (1): Second ALU operand is sign-extended lower 16 bits [15-0] of instruction
• MemToReg (1/0 mux)
– Deasserted (0): Data to be written to local register comes from result of ALU
– Asserted (1): Data to be written to local register comes from data memory
• RegWrite (enable)
– Deasserted (0): No write
– Asserted (1): Register determined by RegDst is written with value on Write data input
• MemRead (enable)
– Deasserted (0): No read
– Asserted (1): Data at address is fetched and put on Readdata output
• MemWrite (enable)
– Deasserted (0): No write
– Asserted (1): Write data written to location specified by address
Data path and control
Read/write Enabler
Data path control Mux’s
Designing Main control logic (R-type)
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type 1 0 0 1 0 0 0 1 0
RegDst = 1: Destination register is determined by [15-11]
ALUSrc = 0: ALU input 2 is from register file
MemtoReg = 0: ALU result written to register
RegWrite = 1: Write to register allowed
MemRead = 0: Read from mem not needed (no effect)
MemWrite = 0: Write to mem not needed (no effect)
Branch = 0: This is not a branch (PCSrc = 0)
ALUOp1 = 1, ALUOp0 = 0: ALU operation determined by function code (sets ALU control)
Designing Main control logic (Load)
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type 1 0 0 1 0 0 0 1 0
Load 0 1 1 1 1 0 0 0 0
RegDst = 0: Destination register is determined by [20-16]
ALUSrc = 1: ALU input 2 is from sign-extended immediate
MemtoReg = 1: Data found at address written to register
RegWrite = 1: Write to register allowed
MemRead = 1: Allow mem to be read
MemWrite = 0: Write to mem not needed (no effect)
Branch = 0: This is not a branch (PCSrc = 0)
ALUOp1 = 0, ALUOp0 = 0: ALU adds to compute the memory address; the function code is not used (sets ALU control)
Designing Main control logic (Store)
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type 1 0 0 1 0 0 0 1 0
Load 0 1 1 1 1 0 0 0 0
Store X 1 X 0 0 1 0 0 0
RegDst = X: Don’t care b/c no storage to regs (RegWrite=0)
ALUSrc = 1: ALU input 2 is from sign-extended immediate
MemtoReg = X: Don’t care b/c no storage to regs (RegWrite=0)
RegWrite = 0: Write to register not allowed
MemRead = 0: Read from mem not needed (no effect)
MemWrite = 1: Allow mem to be written
Branch = 0: This is not a branch (PCSrc = 0)
ALUOp1 = 0, ALUOp0 = 0: ALU adds to compute the memory address; the function code is not used (sets ALU control)
Designing Main control logic (BEQ)
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type 1 0 0 1 0 0 0 1 0
Load 0 1 1 1 1 0 0 0 0
Store X 1 X 0 0 1 0 0 0
BEQ X 0 X 0 0 0 1 0 1
RegDst = X: Don’t care b/c no storage to regs (RegWrite=0)
ALUSrc = 0: ALU input 2 is from register file
MemtoReg = X: Don’t care b/c no storage to regs (RegWrite=0)
RegWrite = 0: Write to register not allowed
MemRead = 0: Read from mem not needed (no effect)
MemWrite = 0: Write to mem not needed (no effect)
Branch = 1: This is a branch (PCSrc = (zero AND Branch))
ALUOp1 = 0, ALUOp0 = 1: ALU subtracts to compare the two registers; the function code is not used (sets ALU control)
What about Jump?
Hierarchical Control Logic
6-bit op code 9-bit control
2-bit ALUOp
Active Units for R-type
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type 1 0 0 1 0 0 0 1 0
Active Units for Load
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
Load 0 1 1 1 1 0 0 0 0
Active Units for Branch
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
BEQ X 0 X 0 0 0 1 0 1
Design of Main Control
OpCode in binary: Op5..Op0 = Instr[31]..Instr[26]
Instr type OpCode (decimal) Op5 Op4 Op3 Op2 Op1 Op0
R-type 0ten 0 0 0 0 0 0
Load 35ten 1 0 0 0 1 1
Store 43ten 1 0 1 0 1 1
BEQ 4ten 0 0 0 1 0 0
Instr type RegDst ALUSrc MemToReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type 1 0 0 1 0 0 0 1 0
Load 0 1 1 1 1 0 0 0 0
Store X 1 X 0 0 1 0 0 0
BEQ X 0 X 0 0 0 1 0 1
Resulting Truth Table
Signal R-type Load Store BEQ
Inputs
Op5 0 1 1 0
Op4 0 0 0 0
Op3 0 0 1 0
Op2 0 0 0 1
Op1 0 1 1 0
Op0 0 1 1 0
Outputs
RegDst 1 0 X X
ALUSrc 0 1 1 0
MemToReg 0 1 X X
RegWrite 1 1 0 0
MemRead 0 1 0 0
MemWrite 0 0 1 0
Branch 0 0 0 1
ALUOp1 1 0 0 0
ALUOp0 0 0 0 1
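The truth table has the shape of a PLA: an AND plane that recognises each op code and an OR plane that combines those terms into control outputs. The sketch below follows that shape. It is an illustration, not the slides' circuit: the name `main_control` is hypothetical, and every don't-care (X) output is arbitrarily resolved to 0.

```python
def main_control(opcode: int) -> dict:
    """Map a 6-bit op code to the nine main control signals."""
    op = [(opcode >> i) & 1 for i in range(6)]   # op[0] = Op0 ... op[5] = Op5
    n = [1 - bit for bit in op]                  # complemented inputs

    # AND plane: one product term per instruction class.
    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]       # 000000
    load = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]      # 100011 (35)
    store = op[5] & n[4] & op[3] & n[2] & op[1] & op[0]    # 101011 (43)
    beq = n[5] & n[4] & n[3] & op[2] & n[1] & n[0]         # 000100 (4)

    # OR plane: each output is the OR of the terms that assert it.
    return {
        "RegDst": r_type,            # X for store/BEQ, taken as 0 here
        "ALUSrc": load | store,
        "MemToReg": load,            # X for store/BEQ, taken as 0 here
        "RegWrite": r_type | load,
        "MemRead": load,
        "MemWrite": store,
        "Branch": beq,
        "ALUOp1": r_type,
        "ALUOp0": beq,
    }
```

Any op code outside these four classes yields all zeros in this model, which at least writes nothing to registers or memory.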
Jumps
• Unconditional change to PC
• 26-bit immediate is word-aligned
• Pseudo-direct rather than fully PC-relative: the upper 4 bits of PC+4 give the upper bits of the target address
• Resulting 32-bit target address:
Jump instruction:
2 immediate
[31-26] [25-0]
Jump target address:
(PC+4)[31-28] immediate 00
[31-28] [27-2] [1-0]
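The target address is a concatenation of three pieces, which is easy to check numerically. A minimal sketch follows, with the hypothetical name `jump_target`; it assumes a 32-bit PC as in the layout above.

```python
def jump_target(pc: int, imm26: int) -> int:
    """Concatenate (PC+4)[31-28], the 26-bit immediate, and two zero bits."""
    upper = (pc + 4) & 0xF0000000            # bits [31-28] come from PC+4
    middle = (imm26 & 0x3FFFFFF) << 2        # immediate fills bits [27-2]
    return upper | middle                    # bits [1-0] are 00
```

Because only 28 bits come from the instruction, a jump cannot leave the 256 MB region selected by the upper four bits of PC+4.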
Jumps
Single-cycle performance
• Assume
– No delay from mux’s, control unit, PC access, sign extend, wiring
– Delays: memory, 2ns; ALU 2ns; Registerfile 1ns
– 24% loads, 12% stores, 44% R-type, 18% branch, 2% jumps
• Compare
– 1 fixed-length clock cycle implementation
– 1 variable-length clock cycle implementation (impractical)
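Delay per functional unit used (ns):
Instruction type | Instruction memory | Register read | ALU operation | Data memory | Register write | Total
R-type | 2 | 1 | 2 | 0 | 1 | 6
Load | 2 | 1 | 2 | 2 | 1 | 8
Store | 2 | 1 | 2 | 2 | 0 | 7
Branch | 2 | 1 | 2 | 0 | 0 | 5
Jump | 2 | 0 | 0 | 0 | 0 | 2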
Single-cycle performance
Fixed length clock cycle: 8 ns clock length
avg time / instr = 8 ns
Variable length clock cycle:
avg time / instr = 8(.24)+7(.12)+6(.44)+5(.18)+2(.02) = 6.34 ≈ 6.3 ns
Speedup: 8/6.3=1.27
Variable clock implementation is 1.27 times faster than fixed.
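The comparison can be reproduced from the stated unit delays and instruction mix. The sketch below is only a restatement of that arithmetic; the dictionary names are hypothetical. Carried without rounding, the weighted average comes to 6.34 ns, so the ratio lands slightly under the 1.27 obtained from the rounded 6.3 ns.

```python
# Unit delays in ns, from the assumptions on the previous slide.
DELAY = {"imem": 2, "reg_read": 1, "alu": 2, "dmem": 2, "reg_write": 1}

# Functional units on each instruction class's path.
PATH = {
    "R-type": ("imem", "reg_read", "alu", "reg_write"),
    "Load": ("imem", "reg_read", "alu", "dmem", "reg_write"),
    "Store": ("imem", "reg_read", "alu", "dmem"),
    "Branch": ("imem", "reg_read", "alu"),
    "Jump": ("imem",),
}

# Instruction mix as fractions of all instructions executed.
MIX = {"Load": 0.24, "Store": 0.12, "R-type": 0.44, "Branch": 0.18, "Jump": 0.02}

total = {kind: sum(DELAY[unit] for unit in units) for kind, units in PATH.items()}

fixed_cycle = max(total.values())                        # clock set by slowest path
variable_avg = sum(MIX[kind] * total[kind] for kind in MIX)
speedup = fixed_cycle / variable_avg

print(f"fixed: {fixed_cycle} ns, variable: {variable_avg:.2f} ns, speedup: {speedup:.2f}")
```

The fixed clock is set entirely by the load path, which is why the mix of faster instructions cannot help a single-cycle design.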
How can we gain efficiency without added
complexity of variable clock length?
Drawbacks to single-cycle implementation
• Clock cycle length depends on critical path (loads)
• CPI=1 is good provided clock rate very fast
• Problem 1: many instructions execute faster than loads, thus inefficient (worse for fp)
– Solution 1: use shorter clock cycle and require multiple clocks per instruction
• Problem 2: each functional unit can be used only once per clock
– Solution 2: reuse functional units in data path through multi-cycle implementation
Multicycle Implementation
• Each step in instruction execution takes 1 clock cycle
• Advantages:
– Functional units like ALU can be used more than once per instruction
minimizing hardware
– Instructions may take varying numbers of clock cycles to complete
(efficiency)
• Major differences from single-cycle design:
– Single memory for instructions and data
– Single ALU
– Additional registers (state elements) for output between steps