
Chapter 3 Pipelining

3.1 Pipeline Model

Terminology
– task
– subtask
– stage
– staging register

Total processing time for each task:
– $T_{pl} = \sum_{i=1}^{k} (t_i + d_i)$, where $t_i$ is the processing time of stage $i$, $d_i$ is the delay introduced by the staging register, and $k$ is the number of stages.

3.1 Pipeline Model (continued)

Total processing time for each task without pipelining:
– $T_{seq} = \sum_{i=1}^{k} t_i$

Pipeline cycle time: $t_{max} = \max_{1 \le i \le k} (t_i + d_i)$

Clock frequency $= 1/t_{max}$

If the stages are balanced and each staging register adds a delay $d$, the pipeline cycle time $t_{cyc}$ can be written as $T_{seq}/k + d$.

Speedup: $S = \dfrac{N \cdot T_{seq}}{(k + N - 1)\, t_{cyc}}$, where $N$ is the number of tasks.


If the staging register delay is ignored and all stages have the same processing time, $t_{cyc} = T_{seq}/k$. Therefore, the ideal speedup becomes

$S_{ideal} = \dfrac{N k}{k + N - 1}$

If $N \gg k$, $S_{ideal} \approx k$.
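The timing formulas above can be checked numerically. This is a minimal sketch. It assumes the pipeline is clocked at $t_{cyc} = t_{max}$, the slowest stage plus its register delay, which reduces to $T_{seq}/k + d$ when the stages are balanced. The function names `pipeline_metrics` and `ideal_speedup` and the stage times are hypothetical.

```python
# Sketch of the Section 3.1 timing model (names and numbers are illustrative).

def pipeline_metrics(t, d, n_tasks):
    """Return (t_max, clock_freq, speedup) for stage times t[i] and register delays d[i].

    Assumes the pipeline is clocked at t_cyc = t_max = max(t_i + d_i).
    """
    k = len(t)
    t_seq = sum(t)                                  # T_seq: one task through all stages, no pipelining
    t_max = max(ti + di for ti, di in zip(t, d))    # pipeline cycle time
    clock_freq = 1.0 / t_max
    # N tasks take N * T_seq sequentially, but (k + N - 1) cycles when pipelined
    speedup = (n_tasks * t_seq) / ((k + n_tasks - 1) * t_max)
    return t_max, clock_freq, speedup


def ideal_speedup(k, n_tasks):
    """Balanced stages, no register delay: S_ideal = N * k / (k + N - 1)."""
    return n_tasks * k / (k + n_tasks - 1)


if __name__ == "__main__":
    # Unbalanced 3-stage pipeline with 1-unit register delays (hypothetical numbers).
    print(pipeline_metrics([5, 8, 6], [1, 1, 1], n_tasks=10))
    # Balanced, delay-free 4-stage pipeline: agrees with the ideal formula.
    print(pipeline_metrics([10] * 4, [0] * 4, n_tasks=100)[2], ideal_speedup(4, 100))
```

In the unbalanced case the 8-unit stage sets $t_{max} = 9$. The speedup is then well below 3 even though there are three stages, which is why balancing the stages matters.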


The total cost of the pipeline is given by $C = L \cdot k + C_p$, where $C_p = \sum_{i=1}^{k} c_i$ is the total cost of the stage logic ($c_i$ being the cost of stage $i$) and $L$ is the cost of each staging register.

To minimize the composite cost per unit of computation rate, choose $k = \sqrt{\dfrac{T_{seq} \, C_p}{d \, L}}$.
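The optimal stage count follows from minimizing cost divided by computation rate. The sketch below assumes balanced stages, so the rate is $1/(T_{seq}/k + d)$. It checks the closed form against a brute-force search over integer $k$. The function names and parameter values are hypothetical.

```python
import math


def cost_per_rate(k, L, c_p, t_seq, d):
    """Composite cost (L*k + C_p) divided by the rate 1 / (T_seq/k + d)."""
    return (L * k + c_p) * (t_seq / k + d)


def optimal_stages(L, c_p, t_seq, d):
    """Closed form: d/dk[(L*k + C_p)(T_seq/k + d)] = L*d - C_p*T_seq/k**2 = 0."""
    return math.sqrt(t_seq * c_p / (d * L))


if __name__ == "__main__":
    L, c_p, t_seq, d = 16, 50, 32, 1          # hypothetical costs and delays
    k_closed = optimal_stages(L, c_p, t_seq, d)
    k_search = min(range(1, 101), key=lambda k: cost_per_rate(k, L, c_p, t_seq, d))
    print(k_closed, k_search)
```

Expanding the product gives $L T_{seq} + L d k + C_p T_{seq}/k + C_p d$. Only the middle two terms depend on $k$, and they balance at the square-root value.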


In practice, making the delays of the pipeline stages equal is a complicated and time-consuming process.
– For maximum performance, the stages must be close to balanced.
– Balancing is done for commercial processors, although it is neither easy nor cheap.

Another problem with pipelines is the overhead of handling exceptions or interrupts.
– A deep pipeline increases the interrupt-handling overhead.

Pipeline Types (Händler’s classification)

– Instruction pipelines: FI, DI, CA, FO, EX, ST (fetch instruction, decode instruction, calculate address, fetch operands, execute, store)
– Arithmetic pipelines
– Processor pipelines: a cascade of processors, each executing a specific module of the application program

Instruction pipeline

Reservation table
– Row: stages
– Column: pipeline cycles

The cycle time of an instruction pipeline is often determined by the stages that require memory access.
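Overlapping the reservation tables of successive instructions in a linear instruction pipeline shows the $k + N - 1$ cycles needed for $N$ instructions. This is a minimal sketch. It assumes each of the six stages FI, DI, CA, FO, EX, ST takes exactly one cycle and that no hazards occur. `space_time` is a hypothetical helper.

```python
STAGES = ["FI", "DI", "CA", "FO", "EX", "ST"]


def space_time(n_instr, stages=STAGES):
    """Rows = stages, columns = pipeline cycles; each cell names the instruction in that stage."""
    n_cycles = len(stages) + n_instr - 1
    table = {s: ["."] * n_cycles for s in stages}
    for i in range(n_instr):                 # instruction i+1 enters FI in cycle i
        for j, s in enumerate(stages):       # ...and reaches stage j in cycle i + j
            table[s][i + j] = f"I{i + 1}"
    return table


if __name__ == "__main__":
    for stage, row in space_time(3).items():
        print(f"{stage:>2} " + " ".join(f"{cell:>2}" for cell in row))
```

Each diagonal of the printed table is one instruction. Each column shows which instructions share the pipeline in that cycle.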

Control Hazard

Conditional branch instructions
– The target address of a branch is known only after the condition has been evaluated.

Ways to solve control hazards:
– Freeze the pipeline until the branch is resolved.
– Predict that the branch will not be taken.
– Start fetching the target instruction sequence into a buffer while the non-branch sequence is being fed into the pipeline.

Arithmetic pipelines

Floating point addition
– Consider S = A + B, where A = (Ea, Ma), B = (Eb, Mb), and S = (Es, Ms).
– Addition steps (Figure 3.5):
   – Equalize the exponents
   – Add the mantissas
   – Normalize Ms and adjust Es for the sum normalization
   – Round Ms
   – Renormalize Ms and adjust Es
– Modified floating point add pipeline (Figures 3.6 & 3.7)
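The five addition steps can be traced on a toy decimal format. This is a sketch, not the hardware of Figures 3.5 to 3.7. It assumes the following:
– A pair (E, M) stands for $M \cdot 10^{E-P}$, with a signed, normalized $P$-digit mantissa and $P = 4$ (`DIGITS` in the code).
– Two guard digits are kept during alignment.
– Rounding is half away from zero.
– There is no sticky bit.
All names are hypothetical.

```python
DIGITS = 4   # mantissa digits P (assumed for illustration)
GUARD = 2    # guard digits kept during alignment (assumed)


def _shr(m, n):
    """Shift a signed decimal mantissa right by n digits, truncating toward zero."""
    q = abs(m) // 10**n
    return q if m >= 0 else -q


def _round_off(m, n):
    """Drop the n low-order digits of m, rounding half away from zero."""
    q, r = divmod(abs(m), 10**n)
    if 2 * r >= 10**n:
        q += 1
    return q if m >= 0 else -q


def fp_add(a, b):
    """S = A + B for A = (Ea, Ma), B = (Eb, Mb); (E, M) stands for M * 10**(E - DIGITS)."""
    (ea, ma), (eb, mb) = a, b
    # 1. Equalize the exponents: right-shift the mantissa with the smaller exponent.
    if ea < eb:
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    ma, mb = ma * 10**GUARD, _shr(mb * 10**GUARD, ea - eb)
    # 2. Add the mantissas.
    ms, es = ma + mb, ea
    if ms == 0:
        return (0, 0)
    # 3. Normalize Ms and adjust Es: a carry-out shifts right, cancellation shifts left.
    while abs(ms) >= 10**(DIGITS + GUARD):
        ms, es = _shr(ms, 1), es + 1
    while abs(ms) < 10**(DIGITS + GUARD - 1):
        ms, es = ms * 10, es - 1
    # 4. Round Ms back to DIGITS digits.
    ms = _round_off(ms, GUARD)
    # 5. Renormalize: rounding can carry into a new digit (e.g. 9999.5 -> 10000).
    if abs(ms) >= 10**DIGITS:
        ms, es = _shr(ms, 1), es + 1
    return (es, ms)


if __name__ == "__main__":
    print(fp_add((1, 9876), (0, 5432)))   # 9.876 + 0.5432
```

For example, 9.876 + 0.5432 = 10.4192 needs a right shift in step 3 and rounds to (2, 1042), i.e. 0.1042 × 10². Adding (0, 9999) and (-4, 5000) exercises step 5. The rounded sum 0.99995 carries into a fifth digit and is renormalized to (1, 1000).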

Arithmetic pipelines (cont.)

Floating point multiplication
– Consider P = A x B, where A = (Ea, Ma), B = (Eb, Mb), and P = (Ep, Mp).
– Multiplication steps (Figure 3.8):
   – Add the exponents
   – Multiply the mantissas
   – Normalize Mp and adjust Ep
   – Round Mp
   – Renormalize Mp and adjust Ep
– Modified floating point multiply pipeline (Figure 3.9)
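The multiplication steps can be traced with the same kind of toy decimal format. This sketch assumes (E, M) stands for $M \cdot 10^{E-4}$ with a signed, normalized 4-digit mantissa and round-half-away-from-zero. It does not model the pipeline of Figure 3.9, and the names are hypothetical.

```python
DIGITS = 4   # mantissa digits (assumed for illustration)


def _shr(m, n):
    """Shift a signed decimal mantissa right by n digits, truncating toward zero."""
    q = abs(m) // 10**n
    return q if m >= 0 else -q


def _round_off(m, n):
    """Drop the n low-order digits of m, rounding half away from zero."""
    q, r = divmod(abs(m), 10**n)
    if 2 * r >= 10**n:
        q += 1
    return q if m >= 0 else -q


def fp_mul(a, b):
    """P = A x B for A = (Ea, Ma), B = (Eb, Mb); (E, M) stands for M * 10**(E - DIGITS)."""
    (ea, ma), (eb, mb) = a, b
    # 1. Add the exponents.
    ep = ea + eb
    # 2. Multiply the mantissas: the exact product is worth mp * 10**(ep - 2*DIGITS).
    mp = ma * mb
    if mp == 0:
        return (0, 0)
    # 3. Normalize Mp and adjust Ep: two normalized factors give 2*DIGITS or 2*DIGITS-1 digits.
    if abs(mp) < 10**(2 * DIGITS - 1):
        mp, ep = mp * 10, ep - 1
    # 4. Round Mp back to DIGITS digits.
    mp = _round_off(mp, DIGITS)
    # 5. Renormalize if rounding carried into a new digit.
    if abs(mp) >= 10**DIGITS:
        mp, ep = _shr(mp, 1), ep + 1
    return (ep, mp)


if __name__ == "__main__":
    print(fp_mul((0, 2000), (0, 3000)))   # 0.2 x 0.3
```

Unlike addition, no alignment step is needed. The exponents simply add, and at most one left shift restores normalization. For example, 0.2 × 0.3 = 0.06 becomes (-1, 6000).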

Arithmetic pipelines (cont.)

Multifunction pipeline
– Performs more than one operation.
– A control input is needed for proper operation of the multifunction pipeline.
– Figure 3.10: floating point adder/multiplier

Classification scheme by Ramamoorthy and Li

Functionality
– unifunctional
– multifunctional

Configuration
– static
– dynamic

Mode of operation
– scalar
– vector

3.2 Pipeline Control and Performance

To provide the maximum possible throughput, a pipeline must be kept full and flowing smoothly.

Two conditions for the smooth flow of a pipeline:
– the rate of data input
– data interlocks between the stages

Example 3.1: the pipeline completes one operation per cycle (once it is full).

Example 3.2: non-linear pipeline

Structural hazard

Caused by the unavailability of appropriate hardware.

One obvious way of avoiding structural hazards is to add hardware to the pipeline.

Example 3.3

Figure 3.12 depicts the operation of the pipeline.
– In cycles 3, 4, 5, and 6, simultaneous accesses are needed.
– If we assume that the machine has separate data and instruction caches, the problems in cycles 5 and 6 are solved.
– One way to solve the problem in cycle 4 is to stall the ADD instruction (Figure 3.13).

Stalling degrades pipeline performance.

Collision vectors

Initiation: launching an operation into the pipeline.

Latency: the number of cycles that elapse between two initiations.

Latency sequence: the latencies between successive initiations.

Collision: occurs when a stage in the pipeline is required to perform more than one task at the same time.

Collision vectors (cont.)

Forbidden set: the set of all possible column distances between two entries on the same row of the reservation table (RT).

A collision vector (CV) can be derived from the forbidden set F and used to control the initiation of operations in the pipeline.

– $CV = (v_{n-1}, v_{n-2}, \ldots, v_2, v_1)$

– $v_i = 1$ if $i$ is in the forbidden set, and $v_i = 0$ otherwise
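Deriving F and the CV from a reservation table can be sketched as follows. The sketch assumes:
– The RT is a list of rows of 0/1 marks, one row per stage.
– The CV has one bit per latency from 1 up to (number of columns − 1), since no two marks can be farther apart than that.
The 3-stage table in the usage is hypothetical, not one of the chapter's figures.

```python
def forbidden_set(rt):
    """All column distances between two marks on the same row of the reservation table."""
    f = set()
    for row in rt:
        cols = [c for c, mark in enumerate(row) if mark]
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                f.add(b - a)
    return f


def collision_vector(rt):
    """CV = (v_{n-1}, ..., v_1) as a bit string, with v_i = 1 iff latency i is forbidden."""
    n = len(rt[0])                  # number of columns (cycles) in the RT
    f = forbidden_set(rt)
    return "".join("1" if i in f else "0" for i in range(n - 1, 0, -1))


if __name__ == "__main__":
    rt = [
        [1, 0, 0, 0, 1],   # stage 1 used in cycles 0 and 4
        [0, 1, 0, 1, 0],   # stage 2 used in cycles 1 and 3
        [0, 0, 1, 0, 0],   # stage 3 used in cycle 2
    ]
    print(sorted(forbidden_set(rt)), collision_vector(rt))   # [2, 4] 1010
```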

Examples

Example 3.4
(a) Overlapped RT
(b) Collision vector (CV)

Examples 3.5 & 3.6: collision case and no-collision case

Control

How to control the initiation of operations in a pipeline using the CV:
– Place the CV in a shift register.
– If the LSB of the shift register is 1, do not initiate an operation in that cycle; shift the register right once, inserting 0 at the vacant MSB position.

– If the LSB of the shift register is 0, initiate a new operation in that cycle; shift the register right once, inserting 0 at the vacant MSB position. To reflect the superposed status of the new initiation over the earlier ones, perform a bit-by-bit OR of the original CV with the content of the shift register.
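The shift-register procedure can be simulated directly. The sketch makes three assumptions:
– The CV is an integer whose bit 0 holds $v_1$.
– The register starts empty, so the first initiation simply loads the CV.
– A new operation is requested every cycle.
`initiation_cycles` is a hypothetical helper that returns the cycles at which operations enter the pipeline.

```python
def initiation_cycles(cv, n_ops):
    """Cycles at which n_ops back-to-back requests are initiated under CV control."""
    sr, cycles, t = 0, [], 0        # empty pipeline: nothing is forbidden yet
    while len(cycles) < n_ops:
        if sr & 1:                  # LSB = 1: initiating now would collide
            sr >>= 1                # wait; shift right, 0 enters at the MSB
        else:                       # LSB = 0: initiate now
            cycles.append(t)
            sr = (sr >> 1) | cv     # shift, then OR in the CV of the new operation
        t += 1
    return cycles


if __name__ == "__main__":
    # CV (1010): latencies 2 and 4 are forbidden (hypothetical pipeline).
    print(initiation_cycles(0b1010, 4))   # [0, 1, 6, 7]
```

After the second initiation the register holds 1111, so the controller waits until every forbidden latency has shifted out. No two initiation times then differ by 2 or 4.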

3.2.3 Performance

Figure 3.15(a)
– The CV of Figure 3.11: (00111)
– Figure 3.15(a) shows the state transitions.

3.2.3 Performance (continued)

Average latency
– simple cycle
– greedy cycle
– MAL (minimum average latency)
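The state-transition view can be explored programmatically. In this sketch:
– A state is the shift-register content just after an initiation.
– A latency p is permissible from a state if bit p of the state is 0.
– Any latency of n or more, where n is the CV width plus 1, returns to the initial CV.
`greedy_cycle` always takes the smallest permissible latency. `mal` enumerates the simple cycles of the state diagram (no state repeated) and returns the smallest average latency among them. All names are hypothetical.

```python
def permissible(state, n):
    """Latencies 1..n-1 whose state bit is 0, plus n standing for 'any latency >= n'."""
    return [p for p in range(1, n) if not (state >> (p - 1)) & 1] + [n]


def next_state(state, p, cv, n):
    """Shift the state right p times and superpose the CV of the new initiation."""
    return cv if p >= n else (state >> p) | cv


def greedy_cycle(cv, n):
    """Always take the smallest permissible latency until a state repeats."""
    state, seen, lats = cv, {}, []
    while state not in seen:
        seen[state] = len(lats)
        p = min(permissible(state, n))
        lats.append(p)
        state = next_state(state, p, cv, n)
    return lats[seen[state]:]


def mal(cv, n):
    """Minimum average latency over all simple cycles of the state diagram."""
    states, stack = {cv}, [cv]
    while stack:                                  # collect reachable states
        s = stack.pop()
        for p in permissible(s, n):
            t = next_state(s, p, cv, n)
            if t not in states:
                states.add(t)
                stack.append(t)
    best = float("inf")

    def dfs(start, s, on_path, total, count):
        nonlocal best
        for p in permissible(s, n):
            t = next_state(s, p, cv, n)
            if t == start:                        # closed a simple cycle
                best = min(best, (total + p) / (count + 1))
            elif t not in on_path:
                dfs(start, t, on_path | {t}, total + p, count + 1)

    for s in states:
        dfs(s, s, {s}, 0, 0)
    return best


if __name__ == "__main__":
    g = greedy_cycle(0b1010, 5)                   # hypothetical CV (1010)
    print(g, sum(g) / len(g), mal(0b1010, 5))     # [1, 5] 3.0 3.0
    print(greedy_cycle(0b00111, 6), mal(0b00111, 6))   # [4] 4.0
```

Under this model, the CV (00111) quoted for Figure 3.11, read as 5 bits wide, gives a greedy cycle of (4) with average latency 4. The greedy cycle is not guaranteed to reach the MAL in general, which is why `mal` searches all simple cycles.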

3.2.4 Multifunction Pipelines

Figure 3.17
– Vxx, Vxy, Vyx, Vyy
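In a multifunction pipeline, each ordered pair of functions has its own collision vector. This sketch assumes the convention that $V_{xy}$ collects the latencies p at which starting function y p cycles after function x causes a collision (texts differ on this ordering). It uses two small hypothetical reservation tables X and Y with the same stages and width.

```python
def cross_forbidden(rt_first, rt_second):
    """Latencies p >= 1 at which starting `second` p cycles after `first` causes a collision."""
    f = set()
    for row_a, row_b in zip(rt_first, rt_second):       # same stage in both tables
        for a, mark_a in enumerate(row_a):
            for b, mark_b in enumerate(row_b):
                # first uses the stage at cycle a, second at cycle p + b: clash if p = a - b
                if mark_a and mark_b and a > b:
                    f.add(a - b)
    return f


def to_cv(f, width):
    """Bit string (v_width, ..., v_1) with v_i = 1 iff i is in f."""
    return "".join("1" if i in f else "0" for i in range(width, 0, -1))


if __name__ == "__main__":
    X = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]    # hypothetical function x
    Y = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]    # hypothetical function y
    pairs = {"Vxx": (X, X), "Vxy": (X, Y), "Vyx": (Y, X), "Vyy": (Y, Y)}
    for name, (first, second) in pairs.items():
        print(name, to_cv(cross_forbidden(first, second), width=3))
    # Vxx 100, Vxy 011, Vyx 001, Vyy 010
```

With identical tables, `cross_forbidden` reduces to the ordinary forbidden set, so Vxx and Vyy are the single-function collision vectors.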

3.3 Other Pipeline Problems

Data interlock: due to the sharing of resources
Data hazard
Data forwarding
Internal forwarding
– write-read forwarding
– read-read forwarding
– write-write forwarding

Load/store architectures versus memory/memory architectures
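The three internal-forwarding cases can be illustrated as a peephole pass over adjacent instruction pairs. Here M is a memory location and R1, R2 are registers. The slide only names the cases, so the usual textbook definitions below are stated as an assumption:
– Write-read turns M ← (R1), R2 ← (M) into M ← (R1), R2 ← (R1).
– Read-read turns R1 ← (M), R2 ← (M) into R1 ← (M), R2 ← (R1).
– Write-write turns M ← (R1), M ← (R2) into M ← (R2).
The tuple encoding and the function `forward` are hypothetical, and only directly adjacent pairs are examined.

```python
# Hypothetical encoding: ("store", M, R) means M <- (R); ("load", R, M) means R <- (M);
# ("move", Rd, Rs) means Rd <- (Rs).

def forward(prog):
    """Apply write-read, read-read and write-write forwarding to adjacent instruction pairs."""
    out = []
    for ins in prog:
        prev = out[-1] if out else None
        if prev and prev[0] == "store" and ins[0] == "load" and prev[1] == ins[2]:
            # write-read: M <- (R1); R2 <- (M)  ==>  M <- (R1); R2 <- (R1)
            out.append(("move", ins[1], prev[2]))
        elif prev and prev[0] == "load" and ins[0] == "load" and prev[2] == ins[2]:
            # read-read: R1 <- (M); R2 <- (M)  ==>  R1 <- (M); R2 <- (R1)
            out.append(("move", ins[1], prev[1]))
        elif prev and prev[0] == "store" and ins[0] == "store" and prev[1] == ins[1]:
            # write-write: M <- (R1); M <- (R2)  ==>  M <- (R2)
            out[-1] = ins
        else:
            out.append(ins)
    return out


if __name__ == "__main__":
    print(forward([("store", "M", "R1"), ("load", "R2", "M")]))
```

Each rewrite replaces a memory access with a register transfer or removes it outright, which shortens the memory-bound stages that tend to set the cycle time.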

3.3 Other Pipeline Problems (continued)

Conditional branches
– branch prediction
– delayed branch
– branch-prediction buffer
– branch history
– multiple instruction buffers

Interrupts
– precise interrupt scheme
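One common way to organize a branch-prediction buffer is a small table of 2-bit saturating counters, indexed by the low-order bits of the branch address. The sketch below shows that scheme only as an illustration; it is not necessarily the organization the chapter has in mind. The table size, the initial counter value, and the class and function names are hypothetical.

```python
class BranchPredictionBuffer:
    """Table of 2-bit saturating counters indexed by low-order branch-address bits.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries          # start in "weakly not taken"

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(outcomes, pc=0x40):
    """Run one branch's outcome history through a fresh buffer and count wrong guesses."""
    bpb = BranchPredictionBuffer()
    misses = 0
    for taken in outcomes:
        if bpb.predict(pc) != taken:
            misses += 1
        bpb.update(pc, taken)
    return misses


if __name__ == "__main__":
    loop = [True] * 9 + [False]                # a loop-closing branch, 10 iterations
    print(count_mispredictions(loop * 2))     # 3
```

For the loop branch, only the first iteration and each loop exit are mispredicted. A single not-taken outcome moves the counter from 3 to 2, so the prediction stays "taken" when the loop is re-entered.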

3.4 Dynamic Pipelines

Instruction deferral
– scoreboard

Tomasulo’s algorithm

Performance evaluation
– maximizing the total number of initiations per unit time
– minimizing the total time required to handle a specific sequence of initiation table types

3.5 Example systems

CDC Star-100
CDC 6600
MIPS R4000

3.6 Summaries

Three approaches have been tried to improve performance beyond the ideal case of one instruction per cycle (CPI = 1):
– superpipelining
– superscalar
– VLIW (Very Long Instruction Word)

End of Chapter 3
