1 Seoul National University Pipelined Implementation : Part I.
-
Upload
tracy-griffin -
Category
Documents
-
view
216 -
download
0
Transcript of 1 Seoul National University Pipelined Implementation : Part I.
1
Seoul National University
Pipelined Implementation : Part I
2
Overview
Seoul National University
General Principles of Pipelining Goal Difficulties
Creating a Pipelined Y86 Processor Rearranging SEQ Inserting pipeline registers Problems with data and control hazards
3
Real-World Pipelines: Car Washes
Seoul National University
Idea Divide process into independent stages Move objects through stages in
sequence At any given times, multiple objects
being processed
Sequential Parallel
Pipelined
4
Computational Example
Seoul National University
System Computation requires total of 300 picoseconds Additional 20 picoseconds to save result in register Must have clock cycle of at least 320 ps
Combinationallogic
Reg
300 ps 20 ps
Clock
Delay = 320 psThroughput = 3.12 GIPS
5
3-Way Pipelined Version
Seoul National University
System Divide combinational logic into 3 blocks of 100 ps each Can begin new operation as soon as previous one passes through stage A.
Begin new operation every 120 ps Overall latency increases
360 ps from start to finish
Reg
Clock
Comb.logic
A
Reg
Comb.logic
B
Reg
Comb.logic
C
100 ps 20 ps 100 ps 20 ps 100 ps 20 ps
Delay = 360 psThroughput = 8.33 GIPS
6
Pipeline Diagrams
Seoul National University
Unpipelined
Cannot start new operation until previous one completes
3-Way Pipelined
Up to 3 operations in process simultaneously
OP1
Time
OP2
OP3
OP1
Time
A B C
A B C
A B COP2
OP3
7
Operating a Pipeline
Seoul National University
Time
OP1
OP2
OP3
A B C
A B C
A B C
0 120 240 360 480 640
Clock
Reg
Clock
Comb.logic
A
Reg
Comb.logic
B
Reg
Comb.logic
C
100 ps 20 ps 100 ps 20 ps 100 ps 20 ps
239
Reg
Clock
Comb.logic
A
Reg
Comb.logic
B
Reg
Comb.logic
C
100 ps 20 ps 100 ps 20 ps 100 ps 20 ps
241
Reg
Reg
Reg
100 ps 20 ps 100 ps 20 ps 100 ps 20 ps
Comb.logic
A
Comb.logic
B
Comb.logic
C
Clock
300
Reg
Clock
Comb.logic
A
Reg
Comb.logic
B
Reg
Comb.logic
C
100 ps 20 ps 100 ps 20 ps 100 ps 20 ps
359
8
Limitations: Nonuniform Delays
Seoul National University
Throughput limited by slowest stage Other stages sit idle for much of the time Challenging to partition system into balanced stages
Reg
Clock
Reg
Comb.logic
B
Reg
Comb.logic
C
50 ps
20 ps 150 ps 20 ps 100 ps 20 ps
Delay = 510 psThroughput = 5.88 GIPS
Comb.logic
A
Time
OP1
OP2
OP3
A B C
A B C
A B C
9
Limitations: Register Overhead
Seoul National University
As try to deepen pipeline, overhead of loading registers becomes more significant
Percentage of clock cycle spent loading register: 1-stage pipeline: 6.25% 3-stage pipeline: 16.67% 6-stage pipeline: 28.57%
High speeds of modern processor designs obtained through very deep pipelining
Delay = 420 ps, Throughput = 14.29 GIPSClock
Reg
Comb.logic
50 ps 20 ps
Reg
Comb.logic
50 ps 20 ps
Reg
Comb.logic
50 ps 20 ps
Reg
Comb.logic
50 ps 20 ps
Reg
Comb.logic
50 ps 20 ps
Reg
Comb.logic
50 ps 20 ps
10
Data Hazards (Dependencies) in Processors
Seoul National University
Result from one instruction used as operand for another Read-after-write (RAW) dependency
Very common in actual programs Must make sure our pipeline handles these properly
Get correct results Minimize performance impact
1 irmovl $50, %eax
2 addl %eax , %ebx
3 mrmovl 100( %ebx ), %edx
11
Adding Pipeline Registers
Seoul National University
Instructionmemory
Instructionmemory
PCincrement
PCincrement
CCCCALUALU
Datamemory
Datamemory
Fetch
Decode
Execute
Memory
Write back
icode ifunrA, rB
valC
Registerfile
Registerfile
A BM
E
Registerfile
Registerfile
A BM
E
PC
valP
srcA, srcBdstA, dstB
valA, valB
aluA, aluB
Cnd
valE
Addr, Data
valM
PCvalE, valM
newPC
,
12
Pipeline Stages
Seoul National University
Fetch Select current PC Read instruction Compute incremented PC
Decode Read program registers
Execute Operate ALU
Memory Read or write data memory
Write Back Update register file
13
PIPE- Hardware
Seoul National University
Pipeline registers hold intermediate values from instruction execution
Forward (Upward) Paths Same values passed from one
stage to next e.g., icode, valC, etc
Cannot jump over other stages
14
Signal Naming Conventions
Seoul National University
S_Field Value of Field held in stage S pipeline register
s_Field Value of Field computed in stage S
15
Feedback Paths
Seoul National University
Predicted PC Guess value of next PC
Branch information Jump taken/not-taken Fall-through or target address
Return point Read from memory
Register updates To register file write ports
16
Predicting the PC
Seoul National University
Start fetch of new instruction after current one has completed fetch stage Not possible to determine the next instruction 100% correctly
Guess which instruction will follow Recover if prediction was incorrect
17
Our Prediction Strategy
Seoul National University
Instructions that Don’t Transfer Control Predict next PC to be valP Always reliable
Call and Unconditional Jumps Predict next PC to be valC (destination) Always reliable
Conditional Jumps Predict next PC to be valC (destination) Only correct if branch is taken
Typically right 60% of time Return Instruction
Don’t try to predict
18
Recovering from PC Misprediction
Seoul National University
Mispredicted Jump Will see branch condition flag once instruction reaches memory stage Can get fall-through PC from valA (value M_valA)
Return Instruction Will get return PC when ret reaches write-back stage (W_valM)
19
Pipeline Demonstration
Seoul National University
irmovl $1,%eax #I1
1 2 3 4 5 6 7 8 9
F D E M
Wirmovl $2,%ecx #I2 F D E M
W
irmovl $3,%edx #I3 F D E M Wirmovl $4,%ebx #I4 F D E M Whalt #I5 F D E M W
Cycle 5WI1
MI2
EI3
DI4
FI5
20
Data Dependencies: No Nop
Seoul National University
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt
1 2 3 4 5 6 7 8
F D E MWF D E M
W
F D E M WF D E M W
E
D
valA f R[%edx] = 0valB f R[%eax] = 0
D
valA f R[%edx] = 0valB f R[%eax] = 0
Cycle 4
Error
MM_valE= 10M_dstE= %edx
e_valE f0 + 3 = 3 E_dstE= %eax
21
Data Dependencies: 1 Nop
Seoul National University
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: nop
0x00d: addl %edx,%eax
0x00f: halt
1 2 3 4 5 6 7 8 9
F D E MWF D E M
W
F D E M WF D E M WF D E M WF D E M W
F D E M WF D E M W
W
R[%edx] f 10
W
R[%edx] f 10
D
valA f R[%edx] = 0valB f R[%eax] = 0
D
valA f R[%edx] = 0valB f R[%eax] = 0
•••
Cycle 5
Error
MM_valE= 3M_dstE= %eax
22
Data Dependencies: 2 Nop’s
Seoul National University
1 2 3 4 5 6 7 8 9
F D E M WF D E M WF D E M WF D E M W
F D E M WF D E M WF D E M WF D E M W
F D E M WF D E M WF D E M WF D E M W
10
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: nop
0x00d: nop
0x00e: addl %edx,%eax
0x010: halt
W
R[%eax] f 3
D
valA f R[%edx] = 10valB f R[%eax] = 0
•••
W
R[%eax] f 3
W
R[%eax] f 3
D
valA f R[%edx] = 10valB f R[%eax] = 0
D
valA f R[%edx] = 10valB f R[%eax] = 0
•••
Cycle 6
Error
23
Data Dependencies: 3 Nop’s
Seoul National University
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: nop
0x00d: nop
0x00e: nop
0x00f: addl %edx,%eax
0x011: halt
1 2 3 4 5 6 7 8 9
F D E M WF D E M WF D E M WF D E M W
F D E M WF D E M WF D E M WF D E M W
F D E M WF D E M WF D E M WF D E M W
10
W
R[%eax] f 3
W
R[%eax] f 3
D
valA f R[%edx] = 10valB f R[%eax] = 3
D
valA f R[%edx] = 10valB f R[%eax] = 3
Cycle 6
11
F D E M WF D E M W
Cycle 7
24
Branch Misprediction Example
Seoul National University
0x000: xorl %eax,%eax 0x002: jne t # Not taken 0x007: irmovl $1, %eax # Fall through 0x00d: nop 0x00e: nop 0x00f: nop 0x010: halt 0x011: t: irmovl $3, %edx # Target (Should not execute) 0x017: irmovl $4, %ecx # Should not execute 0x01d: irmovl $5, %edx # Should not execute
25
Branch Misprediction Trace
Seoul National University
0x000: xorl %eax,%eax
0x002: jne t # Not taken
0x011: t: irmovl $3, %edx # Target
0x017: irmovl $4, %ecx # Target+1
0x007: irmovl $1, %eax # Fall Through
1 2 3 4 5 6 7 8 9
F D E M
WF D E M
W
F D E M W
F D E M W
F D E M WF D E M W
Cycle 5
E
valE f 3dstE = %edx
E
valE f 3dstE = %edx
MM_Cnd = 0M_valA= 0x007
D
valC= 4dstE= %ecx
D
valC= 4dstE= %ecx
F
valC f 1rB f %eax
F
valC f 1rB f %eax
Incorrectly execute two instructions at branch target
...
...
26
Quick & Dirty Solution
Seoul National University
0x000: xorl %eax,%eax 0x002: jne t # Not taken 0x007: irmovl $1, %eax # Fall through 0x00d: nop 0x00e: nop 0x00f: nop 0x010: halt 0x011: t: nop 0x012 nop 0x013 nop 0x014: irmovl $3, %edx # Target (Should not execute) 0x01a: irmovl $4, %ecx # Should not execute 0x020: irmovl $5, %edx # Should not execute
27
Return Example
Seoul National University
0x000: irmovl Stack,%esp # Initialize stack pointer 0x006: nop # Avoid hazard on %esp 0x007: nop 0x008: nop 0x009: call p # Procedure call 0x00e: irmovl $5,%esi # Return point 0x014: halt 0x020: .pos 0x20 0x020: p: nop # procedure 0x021: nop 0x022: nop 0x023: ret 0x024: irmovl $1,%eax # Should not be executed 0x02a: irmovl $2,%ecx # Should not be executed 0x030: irmovl $3,%edx # Should not be executed 0x036: irmovl $4,%ebx # Should not be executed 0x100: .pos 0x100 0x100: Stack: # Stack: Stack pointer
28
Incorrect Return Example
Seoul National University
F D E M
WF D E M
W
F D E M W
F D E M W
F D E M WF D E M W
EvalE f 2dstE = %ecx
MvalE = 1dstE = %eax
DvalC = 3dstE = %edx
FvalC f 5rB f %esi
W
valM = 0x0e
F D E M
WF D E M
W
F D E M W
F D E M W
F D E M W
0x024: irmovl $1,%eax # Oops!
0x02a: irmovl $2,%ecx # Oops!
0x030: irmovl $3,%edx # Oops!
0x00e: irmovl $5,%esi
0x023: ret
0x024: irmovl $1,%eax # Oops!
0x02a: irmovl $2,%ecx # Oops!
0x030: irmovl $3,%edx # Oops!
0x00e: irmovl $5,%esi # Return F D E M W
EvalE f 2dstE = %ecx
EvalE f 2dstE = %ecx
MvalE = 1dstE = %eax
MvalE = 1dstE = %eax
DvalC = 3dstE = %edx
DvalC = 3dstE = %edx
FvalC f 5rB f %esi
FvalC f 5rB f %esi
W
valM = 0x0e
W
valM = 0x0e
Incorrectly execute 3 instructions following ret
29
Another Quick & Dirty Solution
Seoul National University
0x000: irmovl Stack,%esp # Initialize stack pointer 0x006: nop # Avoid hazard on %esp 0x007: nop 0x008: nop 0x009: call p # Procedure call 0x00e: irmovl $5,%esi # Return point 0x014: halt 0x020: .pos 0x20 0x020: p: nop # procedure 0x021: nop 0x022: nop 0x023: ret 0x024 nop 0x025 nop 0x026 nop 0x027 nop 0x028: irmovl $1,%eax # Should not be executed 0x02e: irmovl $2,%ecx # Should not be executed 0x034: irmovl $3,%edx # Should not be executed 0x03a: irmovl $4,%ebx # Should not be executed 0x100: .pos 0x100 0x100: Stack: # Stack: Stack pointer
30
Pipeline Summary
Seoul National University
Concept Break instruction execution into 5 stages Run instructions through in pipelined mode
Limitations Can’t handle dependencies between instructions when instructions follow
too closely Data dependencies
One instruction writes register, later one reads it Control dependency
Instruction sets PC in way that pipeline did not predict correctly Mispredicted branch and return
Fixing the Pipeline We’ll do that next time