Pipelining the MIPS Datapath
Today’s lecture
Pipeline implementation
  Single-cycle datapath
  Pipelining performance
  Pipelined datapath
  Example
Refresher: The Full Single-cycle Datapath
[Figure: the full single-cycle datapath, showing the PC register, instruction memory, MIPS instruction decoder, register file, sign extender, ALU, branch and jump address logic (control_type), lui and slt multiplexers, and data memory with word and byte access (word_we, byte_we, mem_read, byte_load).]
We will use a simplified implementation of MIPS to create a pipelined version
  Arithmetic: add, sub, and, or, slt
  Data transfer: lw, sw
  Control: beq
[Figure: simplified single-cycle datapath (PC, instruction memory, MIPS decoder, register file, sign extend, ALU, data memory) with control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, and MemToReg; component delays of 1 ns or 2 ns are marked on the diagram.]
Worst-case delay from register-read to register-write determines clock speed
[Timing diagram: two instructions run back to back on the single-cycle datapath]
  Stage         IF     ID      EX      MEM     WB
  Inst 1 (ns)   0-2    2-3     3-5     5-7     7-8
  Inst 2 (ns)   8-10   10-11   11-13   13-15   15-16
Single-cycle datapath completes one instruction per clock cycle
Add pipeline registers in between stages to increase clock speed
Approximate as one big pipeline register between each pair of adjacent stages. The registers are named for the stages they connect:
  IF/ID, ID/EX, EX/MEM, MEM/WB
No pipeline register is needed after the WB stage, because WB writes its result to the register file.
Paths from register-read to register-write are shorter, so the clock cycle can be shorter.
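The effect on the clock period can be sketched as follows. This is a minimal illustration, assuming the per-stage delays read off the single-cycle timing diagram (IF 2 ns, ID 1 ns, EX 2 ns, MEM 2 ns, WB 1 ns) and ignoring any delay added by the pipeline registers themselves. The names are made up for this example.

```python
# Per-stage delays in ns (assumed from the single-cycle timing diagram).
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def single_cycle_period_ns(delays):
    """The whole instruction must finish in one cycle, so the delays add up."""
    return sum(delays.values())


def pipelined_period_ns(delays):
    """Each stage gets its own cycle, so the slowest stage sets the clock."""
    return max(delays.values())


print(single_cycle_period_ns(STAGE_DELAY_NS))  # 8
print(pipelined_period_ns(STAGE_DELAY_NS))     # 2
```

Under these assumptions the pipelined clock is limited by the 2 ns stages, even though ID and WB need only 1 ns each.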
[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages.]
[Timing diagram: four instructions on the pipelined datapath, 2 ns clock period; each entry marks the time at which the stage starts]
  Time (ns)   0    2    4    6    8    10   12   14   16
  Inst 1      IF   ID   EX   MEM  WB
  Inst 2           IF   ID   EX   MEM  WB
  Inst 3                IF   ID   EX   MEM  WB
  Inst 4                     IF   ID   EX   MEM  WB
Pipeline datapath completes one stage per clock cycle
How long does it take to run 1000 instructions if the clock period is 8 ns?
Throughput: Time to run N instructions on the single-cycle datapath is N x clock period
$$\text{Single-cycle throughput performance} = \frac{N \text{ instructions}}{N \times 8\,\text{ns}}$$
$$\text{Pipeline throughput performance} = \frac{N \text{ instructions}}{N \times 2\,\text{ns}}$$
$$\text{Speedup} = \frac{\text{Pipeline performance}}{\text{Single-cycle performance}} = \frac{8}{2} = 4$$
For large N, this 5-stage pipeline quadruples performance
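To see how close a finite program gets to the factor of 4, the arithmetic can be sketched as below. This is a rough model, not part of the lecture: it assumes the pipeline needs 4 extra cycles to fill (one per stage beyond the first), which the N x 2 ns approximation above leaves out. The function names are illustrative.

```python
def single_cycle_time_ns(n_instructions, period_ns=8):
    """One instruction completes per 8 ns cycle."""
    return n_instructions * period_ns


def pipelined_time_ns(n_instructions, period_ns=2, n_stages=5):
    """The first instruction takes n_stages cycles; each later one adds a cycle."""
    return (n_instructions + n_stages - 1) * period_ns


n = 1000
t_single = single_cycle_time_ns(n)   # 8000 ns
t_pipe = pipelined_time_ns(n)        # 2008 ns
print(t_single, t_pipe, round(t_single / t_pipe, 3))  # 8000 2008 3.984
```

For 1000 instructions the fill time costs only 8 ns, so the speedup is already within half a percent of 4.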
Data values required in later stages must be propagated forward through the pipeline registers.
[Figure: pipelined datapath highlighting the values carried forward between stages: the sign-extended immediate, ReadData2, and the rt and rd register numbers used to select WriteAddr.]
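Note: we cannot keep values like the destination register in the IF/ID register, because IF/ID is overwritten by the next instruction on every clock cycle. Such values must travel with their instruction through the later pipeline registers.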
[Figure: pipelined datapath with control. The MIPS decoder generates the control signals in the ID stage; the ID/EX register carries the WB, M, and EX control groups, EX/MEM carries WB and M, and MEM/WB carries WB. The ALU zero output feeds the branch decision.]
Control signals are generated in the decode stage and are propagated across stages
Categorize control signals by the pipeline stage that uses them
  Stage   Control signals needed
  EX      ALUSrc  ALUOp  RegDst  PCSrc
  MEM     MemRead  MemWrite
  WB      RegWrite  MemToReg
The pipeline registers and program counter update every clock cycle, so they do not have write enable controls
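The grouping of control signals by stage can be sketched as a small lookup. This is only an illustration: it assumes that each pipeline register carries the control groups for all stages still ahead of the instruction (the WB/M/EX labels on the datapath figure), and the dictionary and function names are invented for this example.

```python
# Control signals grouped by the stage that consumes them (from the table above).
SIGNALS_BY_STAGE = {
    "EX": ["ALUSrc", "ALUOp", "RegDst", "PCSrc"],
    "MEM": ["MemRead", "MemWrite"],
    "WB": ["RegWrite", "MemToReg"],
}

# Control groups each pipeline register still has to carry (assumed layout).
GROUPS_IN_REGISTER = {
    "ID/EX": ["EX", "MEM", "WB"],
    "EX/MEM": ["MEM", "WB"],
    "MEM/WB": ["WB"],
}


def signals_carried(register):
    """Return the control signals stored in the given pipeline register."""
    return [
        signal
        for group in GROUPS_IN_REGISTER[register]
        for signal in SIGNALS_BY_STAGE[group]
    ]


for name in GROUPS_IN_REGISTER:
    print(name, signals_carried(name))
```

Each register drops the group that the previous stage has just used, so the control bundle shrinks as the instruction moves to the right.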
An example execution sequence
(addresses in decimal)
  1000: lw $8, 4($29)
  1004: sub $2, $4, $5
  1008: and $9, $10, $11
  1012: or $16, $17, $18
  1016: add $13, $14, $0
ASSUMPTIONS
  Each register contains its number plus 100. Example: R[8] == 108, R[29] == 129
  Every data memory location contains 99. Example: M[8] == 99, M[29] == 99
CONVENTIONS
  X indicates values that are not important. Example: Imm16 for R-type.
  Question marks ??? indicate values we do not know, usually resulting from instructions coming before and after the ones in our example.
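As a cross-check for the cycle-by-cycle slides, the values this program should produce can be worked out directly. The sketch below runs the five instructions one after another under the stated assumptions; because no instruction reads a register written by an earlier one, the pipelined result is the same. One assumption is added here and is not on the slide: $0 is treated as hardwired to zero, as in MIPS, rather than holding 100.

```python
def run_example():
    """Execute the five example instructions sequentially."""
    regs = {n: n + 100 for n in range(32)}  # R[n] == n + 100
    regs[0] = 0          # assumption: $0 is hardwired to zero
    memory_value = 99    # every data memory location contains 99

    lw_address = regs[29] + 4          # lw  $8, 4($29): address computed in EX
    regs[8] = memory_value             # value loaded in MEM, written in WB
    regs[2] = regs[4] - regs[5]        # sub $2, $4, $5
    regs[9] = regs[10] & regs[11]      # and $9, $10, $11
    regs[16] = regs[17] | regs[18]     # or  $16, $17, $18
    regs[13] = regs[14] + regs[0]      # add $13, $14, $0
    return lw_address, regs


address, regs = run_example()
print(address)                                        # 133
print(regs[8], regs[2], regs[9], regs[16], regs[13])  # 99 -1 110 119 114
```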
Cycle 1 (filling)
IF: lw $8, 4($29)   ID: ???   EX: ???   MEM: ???   WB: ???
[Figure: pipelined datapath for cycle 1, with PC = 1000 and all values in later stages unknown (?).]
Cycle 2
IF: sub $2, $4, $5   ID: lw $8, 4($29)   EX: ???   MEM: ???   WB: ???
[Figure: pipelined datapath for cycle 2, with PC = 1004; the ID-stage fields are left blank (__) and later stages are unknown (?).]
Cycle 3
IF: and $9, $10, $11   ID: sub $2, $4, $5   EX: lw $8, 4($29)   MEM: ???   WB: ???
[Figure: pipelined datapath for cycle 3, with PC = 1008; the ID stage reads 104 and 105 from the register file; the EX-stage fields are left blank (__) and later stages are unknown (?).]
Cycle 4
IF: or $16, $17, $18   ID: and $9, $10, $11   EX: sub $2, $4, $5   MEM: lw $8, 4($29)   WB: ???
[Figure: pipelined datapath for cycle 4, with PC = 1012; the ID stage reads 110 and 111; the EX stage performs SUB on 104 and 105, giving -1; the MEM-stage fields are left blank (__) and WB is unknown (?).]
Cycle 5 (full)
IF: add $13, $14, $0   ID: or $16, $17, $18   EX: and $9, $10, $11   MEM: sub $2, $4, $5   WB: lw $8, 4($29)
[Figure: pipelined datapath for cycle 5, with PC = 1016; the ID stage reads 117 and 118; the EX stage performs AND on 110 and 111, giving 110; the MEM stage holds the sub result -1; the remaining fields are left blank (__).]
Cycle 6 (emptying)
IF: ???   ID: add $13, $14, $0   EX: or $16, $17, $18   MEM: and $9, $10, $11   WB: sub $2, $4, $5
[Figure: pipelined datapath for cycle 6.]
Cycle 7
IF: ???   ID: ???   EX: add $13, $14, $0   MEM: or $16, $17, $18   WB: and $9, $10, $11
[Figure: pipelined datapath for cycle 7.]
Cycle 8
IF: ???   ID: ???   EX: ???   MEM: add $13, $14, $0   WB: or $16, $17, $18
[Figure: pipelined datapath for cycle 8.]
Cycle 9
IF: ???   ID: ???   EX: ???   MEM: ???   WB: add $13, $14, $0
[Figure: pipelined datapath for cycle 9.]
Things to notice from the last nine slides
Instruction executions overlap.
Each functional unit is used by a different instruction in each cycle.
In clock cycle 5, all of the hardware units are used (the pipeline is full).
  This is the ideal situation, and what makes pipelined processors so fast.
  A similar example is available in the book at the end of Section 6.3.
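The overlap seen across the nine cycles can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes one instruction is fetched per cycle with no stalls, and it prints ??? for stages holding instructions outside the example, following the convention used on the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $8, 4($29)",
    "sub $2, $4, $5",
    "and $9, $10, $11",
    "or $16, $17, $18",
    "add $13, $14, $0",
]


def occupancy(program):
    """Return one dict per clock cycle mapping each stage to its instruction."""
    n_cycles = len(program) + len(STAGES) - 1
    table = []
    for cycle in range(1, n_cycles + 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - 1 - depth  # instruction fetched `depth` cycles ago
            row[stage] = program[index] if 0 <= index < len(program) else "???"
        table.append(row)
    return table


for cycle, row in enumerate(occupancy(PROGRAM), start=1):
    print(cycle, row)
```

Cycle 5 is the only row with no ??? entries, which is the point at which the pipeline is full.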
  Clock cycle          1    2    3    4    5    6    7    8    9
  add $sp, $sp, -4     IF   ID   EX   NOP  WB
  sub $v0, $a0, $a1         IF   ID   EX   NOP  WB
  lw $t0, 4($sp)                 IF   ID   EX   MEM  WB
  or $s0, $s1, $s2                    IF   ID   EX   NOP  WB
  lw $t1, 8($sp)                           IF   ID   EX   MEM  WB
MIPS ISA makes pipelining “easy”
  Instruction formats are the same length and uniform
  Addressing modes are simple
  Each instruction takes only one cycle
Everything goes left to right, except …
Next time: We will discuss Data Hazards
[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers and their control groups.]
Some instructions do not require all five stages. Can we skip stages?
  Clock cycle          1    2    3    4    5    6    7    8    9
  add $sp, $sp, -4     IF   ID   EX   WB
  sub $v0, $a0, $a1         IF   ID   EX   WB
  lw $t0, 4($sp)                 IF   ID   EX   MEM  WB
  or $s0, $s1, $s2                    IF   ID   EX   WB
  lw $t1, 8($sp)                           IF   ID   EX   MEM  WB
Trying to use a single stage for multiple instructions in the same clock cycle creates a structural hazard.
Each functional unit can only be used once per instruction.
Insert NOP (No OPeration) stages to avoid structural hazards
Stores and Branches have NOP stages, too…
  Clock cycle          1    2    3    4    5    6    7    8    9
  add $sp, $sp, -4     IF   ID   EX   NOP  WB
  sub $v0, $a0, $a1         IF   ID   EX   NOP  WB
  lw $t0, 4($sp)                 IF   ID   EX   MEM  WB
  or $s0, $s1, $s2                    IF   ID   EX   NOP  WB
  lw $t1, 8($sp)                           IF   ID   EX   MEM  WB

  R-type   IF   ID   EX   NOP  WB
  store    IF   ID   EX   MEM  NOP
  branch   IF   ID   EX   NOP  NOP
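The structural hazard, and the way NOP stages remove it, can be checked mechanically. The sketch below is a simplified model, not the lecture's method: it assumes one instruction is fetched per cycle starting at cycle 1, that each stage name stands for one hardware unit, and that a NOP slot uses no unit. The function and variable names are hypothetical.

```python
from collections import defaultdict


def find_conflicts(program):
    """Return {(cycle, stage): [instructions]} where a unit is claimed twice.

    program is a list of (name, stage_list); instruction i enters IF in cycle i + 1.
    """
    usage = defaultdict(list)
    for i, (name, stages) in enumerate(program):
        for offset, stage in enumerate(stages):
            if stage != "NOP":  # a NOP slot leaves the unit idle
                usage[(i + 1 + offset, stage)].append(name)
    return {key: names for key, names in usage.items() if len(names) > 1}


R_SKIP = ["IF", "ID", "EX", "WB"]          # R-type with the MEM stage skipped
R_NOP = ["IF", "ID", "EX", "NOP", "WB"]    # R-type padded with a NOP stage
LOAD = ["IF", "ID", "EX", "MEM", "WB"]

names = ["add $sp, $sp, -4", "sub $v0, $a0, $a1", "lw $t0, 4($sp)",
         "or $s0, $s1, $s2", "lw $t1, 8($sp)"]

skipping = list(zip(names, [R_SKIP, R_SKIP, LOAD, R_SKIP, LOAD]))
padded = list(zip(names, [R_NOP, R_NOP, LOAD, R_NOP, LOAD]))

print(find_conflicts(skipping))  # lw $t0 and or $s0 both need WB in cycle 7
print(find_conflicts(padded))    # {}
```

With skipped stages, the first lw and the or both reach WB in cycle 7; padding the R-type instructions to five stages makes every unit's use fall in a distinct cycle.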