Chapter 3 Pipelining


Page 1: ppt

Chapter 3 Pipelining

Page 2: ppt

3.1 Pipeline Model

Terminology
- task
- subtask
- stage
- staging register

Total processing time for each task:
- $T_{pl} = \sum_{i=1}^{k} (t_i + d_i)$, where $t_i$ is the processing time, $d_i$ is the delay by the staging register, and $k$ is the number of stages

Page 3: ppt

3.1 Pipeline Model (continued)

Total processing time for each task (sequential execution):
- $T_{seq} = \sum_{i=1}^{k} t_i$

Pipeline cycle time: $t_{max} = \max(t_i + d_i)$, $1 \le i \le k$

Clock frequency = $1 / t_{max}$

The pipeline cycle time $t_{cyc}$ can be denoted by $T_{seq}/k + d$

Speedup: $S = \frac{N \cdot T_{seq}}{(k + N - 1) \, t_{cyc}}$, where $N$ is the number of tasks.
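As a rough numeric illustration of the timing quantities on these two slides, the sketch below evaluates T_pl, T_seq, t_max, the clock frequency, and the speedup for one set of stage times. The function name `pipeline_times`, the nanosecond values, and the choice to use t_max as the cycle time t_cyc are assumptions made for this example, not figures from the chapter.

```python
def pipeline_times(t, d, n_tasks):
    """Evaluate the pipeline timing formulas for stage times t and register delays d.

    t[i] is the processing time of stage i, d[i] the delay of its staging
    register, and n_tasks the number of tasks N. All times share one unit.
    """
    k = len(t)                                       # number of stages
    t_pl = sum(ti + di for ti, di in zip(t, d))      # T_pl  = sum(t_i + d_i)
    t_seq = sum(t)                                   # T_seq = sum(t_i)
    t_max = max(ti + di for ti, di in zip(t, d))     # slowest stage plus its register
    # Assumption: the pipeline is clocked at t_max, so t_cyc = t_max here.
    speedup = n_tasks * t_seq / ((k + n_tasks - 1) * t_max)
    return {
        "T_pl": t_pl,
        "T_seq": t_seq,
        "t_max": t_max,
        "clock_frequency": 1 / t_max,
        "speedup": speedup,
    }


# Hypothetical 4-stage pipeline: 10 ns per stage, 2 ns per staging register.
print(pipeline_times([10, 10, 10, 10], [2, 2, 2, 2], n_tasks=100))
```

With these made-up values the speedup comes out at about 3.24, below the stage count of 4 because of the register delay and the cycles spent filling the pipeline.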

Page 4: ppt

3.1 Pipeline Model (continued)

If the staging register delay is ignored and the processing times of the stages are the same, $t_{cyc} = T_{seq} / k$. Therefore, $S_{ideal}$ becomes $\frac{kN}{k + N - 1}$. If $N \gg k$, $S_{ideal} \approx k$.
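The ideal speedup follows from substituting $t_{cyc} = T_{seq}/k$ into the general speedup expression $S = N \, T_{seq} / ((k + N - 1) \, t_{cyc})$. The values k = 5 and N = 100 in the comment are illustrative only.

```latex
\[
S_{ideal} = \frac{N \, T_{seq}}{(k + N - 1) \, \dfrac{T_{seq}}{k}}
          = \frac{kN}{k + N - 1},
\qquad
\lim_{N \to \infty} S_{ideal} = k
\]
% Illustrative values: k = 5, N = 100 gives S_{ideal} = 500/104 \approx 4.81
```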

Page 5: ppt

3.1 Pipeline Model (continued)

The total cost of the pipeline is given by $C = L \cdot k + C_p$, where $C_p = \sum_{i=1}^{k} c_i$ and $L$ is the cost of each staging register.

To minimize the composite cost per computation rate, $k = \sqrt{\frac{T_{seq} \cdot C_p}{d \cdot L}}$
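One way to arrive at this value of k is sketched below. It assumes that "composite cost per computation rate" means the product of the total cost C and the cycle time $t_{cyc} = T_{seq}/k + d$, and that $C_p$ does not change with k; both are interpretations, not statements from the slide.

```latex
\begin{align*}
C \cdot t_{cyc} &= (L k + C_p)\left(\frac{T_{seq}}{k} + d\right)
                 = L \, T_{seq} + L \, d \, k + \frac{C_p \, T_{seq}}{k} + C_p \, d \\
\frac{\partial}{\partial k}\left(C \cdot t_{cyc}\right)
                &= L \, d - \frac{C_p \, T_{seq}}{k^{2}} = 0
\quad\Longrightarrow\quad
k = \sqrt{\frac{T_{seq} \, C_p}{d \, L}}
\end{align*}
```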

Page 6: ppt

3.1 Pipeline Model (continued)

In practice, making the delays of the pipeline stages equal is a complicated and time-consuming process.
- For maximum performance, it is essential that the stages be close to balanced.
- It is done for commercial processors, although it is neither easy nor cheap to do.

Another problem with pipelines is the overhead of handling exceptions or interrupts.
- A deep pipeline increases the interrupt-handling overhead.

Page 7: ppt

Pipeline Types

Pipeline types (Handler's classification)
- instruction pipelines: FI, DI, CA, FO, EX, ST
- arithmetic pipelines
- processor pipelines: a cascade of processors, each executing a specific module in the application program

Page 8: ppt

Instruction pipeline

Reservation table
- rows: stages
- columns: pipeline cycles

The cycle time of an instruction pipeline is often determined by the stages that require memory access.
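To make the row/column layout concrete, here is a minimal sketch that builds a reservation table for several overlapped instructions. It assumes a linear pipeline in which every instruction spends exactly one cycle in each stage and a new instruction enters every cycle; the function name `reservation_table` and the labels I1, I2, ... are made up for the example.

```python
def reservation_table(stages, n_instructions):
    """Return {stage: [cell per cycle]} for a linear pipeline.

    Rows are stages, columns are pipeline cycles. Instruction j (0-based)
    occupies stage number s during cycle j + s.
    """
    k = len(stages)
    n_cycles = k + n_instructions - 1        # cycles until the last one leaves
    table = {stage: [""] * n_cycles for stage in stages}
    for j in range(n_instructions):
        for s, stage in enumerate(stages):
            table[stage][j + s] = f"I{j + 1}"
    return table


stages = ["FI", "DI", "CA", "FO", "EX", "ST"]
for stage, row in reservation_table(stages, 3).items():
    cells = " ".join(cell.rjust(2) if cell else " ." for cell in row)
    print(f"{stage}: {cells}")
```

Each row shows when a stage is busy, and the table is k + N - 1 columns wide, the same term that appears in the denominator of the speedup formula.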

Page 9: ppt

Control Hazard

Conditional branch instructions
- The target address of the branch is known only after the condition has been evaluated.

Ways to resolve control hazards
- The pipeline is frozen.
- The pipeline predicts that the branch will not be taken.
- The target instruction sequence is fetched into a buffer while the non-branch sequence is being fed into the pipeline.

Page 10: ppt

Arithmetic pipelines

Floating-point addition
- Consider S = A + B, where A = (Ea, Ma), B = (Eb, Mb), and S = (Es, Ms)
- Addition steps (Figure 3.5)
  - Equalize the exponents
  - Add the mantissas
  - Normalize Ms and adjust Es for the sum normalization
  - Round Ms
  - Renormalize Ms and adjust Es
- Modified floating-point add pipeline (Figures 3.6 and 3.7)
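The five addition steps can be traced with a toy model. The sketch below assumes a number is a pair (E, M) with value M * 2**E and a normalized mantissa satisfying 0.5 <= |M| < 1; this representation, the 8-bit rounding precision, and the names `fp_add` and `normalize` are illustrative choices, not the format used in Figures 3.5 to 3.7.

```python
def normalize(e, m):
    """Scale m into [0.5, 1) by powers of two, adjusting the exponent e."""
    if m == 0:
        return 0, 0.0
    while abs(m) >= 1:
        m, e = m / 2, e + 1
    while abs(m) < 0.5:
        m, e = m * 2, e - 1
    return e, m


def fp_add(a, b, precision=8):
    """Add two toy floating-point numbers given as (exponent, mantissa)."""
    (ea, ma), (eb, mb) = a, b
    # Step 1: equalize the exponents by shifting the smaller operand right.
    if ea < eb:
        ma, ea = ma / 2 ** (eb - ea), eb
    else:
        mb, eb = mb / 2 ** (ea - eb), ea
    es = ea
    # Step 2: add the mantissas.
    ms = ma + mb
    # Step 3: normalize Ms and adjust Es.
    es, ms = normalize(es, ms)
    # Step 4: round Ms to `precision` fractional bits (Python rounds ties to even).
    ms = round(ms * 2 ** precision) / 2 ** precision
    # Step 5: renormalize, since rounding may have pushed |Ms| up to 1.
    return normalize(es, ms)


print(fp_add((3, 0.75), (1, 0.5)))    # 6 + 1
print(fp_add((3, 0.75), (3, 0.75)))   # 6 + 6
```

In a pipelined adder each commented step would be a separate stage, so a new pair of operands could enter step 1 while earlier pairs are still in steps 2 to 5.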

Page 11: ppt

Arithmetic pipelines(cont.)

Floating-point multiplication
- Consider P = A x B, where A = (Ea, Ma), B = (Eb, Mb), and P = (Ep, Mp)
- Multiplication steps (Figure 3.8)
  - Add the exponents
  - Multiply the mantissas
  - Normalize Mp and adjust Ep
  - Round Mp
  - Renormalize Mp and adjust Ep
- Modified floating point add pipeline (Figure 3.9)
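The multiplication steps differ from addition only in the first two stages. The sketch below uses a toy (E, M) representation with value M * 2**E and 0.5 <= |M| < 1; that format, the 8-bit precision, and the name `fp_mul` are assumptions for illustration, not taken from Figure 3.8.

```python
def normalize(e, m):
    """Scale m into [0.5, 1) by powers of two, adjusting the exponent e."""
    if m == 0:
        return 0, 0.0
    while abs(m) >= 1:
        m, e = m / 2, e + 1
    while abs(m) < 0.5:
        m, e = m * 2, e - 1
    return e, m


def fp_mul(a, b, precision=8):
    """Multiply two toy floating-point numbers given as (exponent, mantissa)."""
    (ea, ma), (eb, mb) = a, b
    ep = ea + eb                                       # add exponents
    mp = ma * mb                                       # multiply mantissas
    ep, mp = normalize(ep, mp)                         # normalize Mp, adjust Ep
    mp = round(mp * 2 ** precision) / 2 ** precision   # round Mp
    return normalize(ep, mp)                           # renormalize Mp, adjust Ep


print(fp_mul((3, 0.75), (1, 0.5)))   # 6 * 1
```

Because the last three steps match those of addition, an adder and a multiplier built this way could plausibly share their normalize and round stages.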

Page 12: ppt

Arithmetic pipelines(cont.)

Multifunction pipeline
- performs more than one operation
- A control input is needed for proper operation of the multifunction pipeline.
- Figure 3.10: floating-point adder/multiplier

Page 13: ppt

Classification scheme by Ramamoorthy and Li

Functionality
- unifunctional
- multifunctional

Configuration
- static
- dynamic

Mode of operation
- scalar
- vector

Page 14: ppt

3.2 Pipeline control and Performance

To provide the maximum possible throughput, the pipeline must be kept full and flowing smoothly.

Two conditions for the smooth flow of a pipeline:
- the rate of input of data
- data interlocks between the stages

Example 3.1: the pipeline completes one operation per cycle (once it is full)

Example 3.2: non-linear pipeline

Page 15: ppt

Structural hazard

A structural hazard is due to the non-availability of appropriate hardware.

One obvious way of avoiding structural hazards is to insert additional hardware into the pipeline.

Page 16: ppt

Example 3.3

Figure 3.12 depicts the operation of the pipeline.
- In cycles 3, 4, 5, and 6, simultaneous accesses are needed.
- If we assume that the machine has separate data and instruction caches, the problems in cycles 5 and 6 are solved.
- One way to solve the problem in cycle 4 is to stall the ADD instruction (Figure 3.13).

The stalling process results in a degradation of pipeline performance.

Page 17: ppt

Collision vectors

Initiation: the launching of an operation into the pipeline

Latency: the number of cycles that elapse between two initiations

Latency sequence: the latencies between successive initiations

Collision: occurs if a stage in the pipeline is required to perform more than one task at any time

Page 18: ppt

Collision vectors(cont.)

Forbidden set: the set of all possible column distances between two entries on some row of the reservation table (RT).

The collision vector can be derived from the forbidden set F and can be used to control the initiation of operations in the pipeline.
- CV = $(v_{n-1}, v_{n-2}, \ldots, v_2, v_1)$
- $v_i = 1$ if $i$ is in the forbidden set; otherwise $v_i = 0$
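The definitions of the forbidden set and the collision vector can be sketched as follows. The reservation table here is a hypothetical 3-stage, 4-cycle example (not one of the chapter's figures), n is assumed to be the number of columns of the table, and the function names are made up for the illustration.

```python
from itertools import combinations


def forbidden_set(table):
    """Collect every column distance between two marks on the same row.

    `table` maps a stage name to the list of cycles in which that stage
    is used by a single operation.
    """
    return {
        abs(a - b)
        for cycles in table.values()
        for a, b in combinations(cycles, 2)
    }


def collision_vector(forbidden, n):
    """Return CV = (v_{n-1}, ..., v_1) as a bit string, v_i = 1 iff i is forbidden."""
    return "".join("1" if i in forbidden else "0" for i in range(n - 1, 0, -1))


# Hypothetical non-linear pipeline: S1 is used in cycles 0 and 3, S2 in 1 and 2.
table = {"S1": [0, 3], "S2": [1, 2], "S3": [2]}
f = forbidden_set(table)
print(sorted(f), collision_vector(f, n=4))
```

For this table the forbidden latencies are 1 and 3, so a second operation started 1 or 3 cycles after the first would collide with it in S2 or S1 respectively.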

Page 19: ppt

Examples

Example 3.4
(a) Overlapped RT
(b) Collision vector (CV)

Examples 3.5 and 3.6
Collision case and no-collision case

Page 20: ppt

Control

How to control the initiation of operations in the pipeline using the CV:
- Place the CV in a shift register.
- If the LSB of the shift register is 1, do not initiate an operation at that cycle; shift the register right once, inserting 0 at the vacant MSB position.
- If the LSB of the shift register is 0, initiate a new operation at that cycle; shift the register right once, inserting 0 at the vacant MSB position. Then, to reflect the superposition of the new initiation on the original one, perform a bit-by-bit OR of the original CV with the content of the shift register.
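The shift-register procedure can be simulated in a few lines. This is a sketch under stated assumptions: the CV is held as an integer whose least significant bit is v_1, the register is loaded with the CV right after a first initiation, and a new operation is always waiting, so one is initiated whenever the LSB is 0. The name `greedy_initiations` is hypothetical.

```python
def greedy_initiations(cv, bits, count):
    """Simulate the CV shift register and return `count` successive latencies.

    cv    -- collision vector as an integer (LSB = v_1)
    bits  -- width of the shift register
    count -- number of further initiations to record
    """
    mask = (1 << bits) - 1
    reg = cv & mask            # register contents right after the first initiation
    latencies = []
    elapsed = 0                # cycles since the most recent initiation
    while len(latencies) < count:
        elapsed += 1
        lsb = reg & 1
        reg >>= 1              # shift right once; a 0 enters at the MSB
        if lsb == 0:
            reg |= cv & mask   # initiate: OR the original CV into the register
            latencies.append(elapsed)
            elapsed = 0
    return latencies


print(greedy_initiations(0b00111, bits=5, count=3))
print(greedy_initiations(0b101, bits=3, count=3))
```

The loop always terminates because, without a new initiation, at most `bits` shifts clear the register and expose a 0 at the LSB.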

Page 21: ppt

3.2.3 Performance

Figure 3.15(a)
- The CV of Figure 3.11: (00111)
- Figure 3.15(a) shows the state transitions.

Page 22: ppt

3.2.3 Performance

- average latency
- simple cycle
- greedy cycle
- MAL (minimum average latency)
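These terms can be illustrated on the state diagram generated by a collision vector. The sketch below assumes states are integers whose bit i-1 holds v_i, that a latency i is permissible when that bit is 0, and that the next state is the current state shifted right by i and ORed with the CV. It follows the smallest permissible latency from each state (the greedy choice) until a state repeats. The function names and the example CVs are hypothetical, not taken from Figure 3.15.

```python
def greedy_cycle(cv, bits):
    """Return the latency cycle reached by always taking the smallest latency."""
    state = cv
    seen = {}                  # state -> position in `path` where it was first met
    path = []
    while state not in seen:
        seen[state] = len(path)
        # Bit i-1 holds v_i; latency bits+1 is always free because the
        # state fits in `bits` bits, so next() always finds a value.
        latency = next(
            i for i in range(1, bits + 2) if not (state >> (i - 1)) & 1
        )
        path.append(latency)
        state = (state >> latency) | cv
    return path[seen[state]:]


def average_latency(cycle):
    """Average latency of a cycle: the sum of its latencies over their count."""
    return sum(cycle) / len(cycle)


for cv, bits in [(0b101, 3), (0b1011010, 7)]:
    cycle = greedy_cycle(cv, bits)
    print(format(cv, f"0{bits}b"), cycle, average_latency(cycle))
```

For the second CV the greedy cycle (1, 8) has an average latency of 4.5, while the constant cycle (3) is also permissible and averages 3. A greedy cycle is therefore not guaranteed to achieve the MAL; finding the MAL in general means comparing the simple cycles of the state diagram.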

Page 23: ppt

3.2.4 Multifunction Pipelines

Figure 3.17
Vxx, Vxy, Vyx, Vyy

Page 24: ppt

3.3 Other Pipeline Problems

Data interlock: due to the sharing of resources
Data hazard
Data forwarding
Internal forwarding
- write-read forwarding
- read-read forwarding
- write-write forwarding
Load/store architectures versus memory/memory architectures

Page 25: ppt

3.3 Other Pipeline Problems (continued)

Conditional branches
- branch prediction
- delayed branch
- branch-prediction buffer
- branch history
- multiple instruction buffers

Interrupts
- precise interrupt scheme

Page 26: ppt

3.4 Dynamic Pipelines

Instruction deferral
- scoreboard
Tomasulo's algorithm
Performance evaluation
- maximizing the total number of initiations per unit time
- minimizing the total time required to handle a specific sequence of initiation table types

Page 27: ppt

3.5 Example systems

- CDC Star-100
- CDC 6600
- MIPS R-4000

Page 28: ppt

3.6 Summaries

Three approaches have been tried to improve performance beyond the ideal CPI case:
- superpipeline
- superscalar
- VLIW (Very Long Instruction Word)

Page 29: ppt

End of Chapter 3