Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o...

14
1 Chapter 3 Pipelining and parallel Processing Mokhtar Aboelaze CSE4210 Winter 2012 YORK UNIVERSITY CSE4210 Pipelining -- Introduction Pipelining can be used to reduce the the critical path. That can lead to either increasing the clock speed, or decreasing the power consumption Multiprocessing can be also used to increase speed or reduce power.

Transcript of Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o...

Page 1: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

1

YORK UNIVERSITY CSE4210

Chapter 3Pipelining and parallel Processing

Mokhtar AboelazeCSE4210 Winter 2012

YORK UNIVERSITY CSE4210

Pipelining -- Introduction• Pipelining can be used to reduce the the

critical path.• That can lead to either increasing the

clock speed, or decreasing the power consumption

• Multiprocessing can be also used to increase speed or reduce power.

Page 2: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

2

YORK UNIVERSITY CSE4210

Pipelining

D D

⊕⊗ ⊗ ⊗

⊕a b c

y(n)

x(n-2)x(n-1)x(n)

)2(1,2

tionmultiplica one and additions twois herepath critical The

muladdsmuladds TTfTTT +=+=

YORK UNIVERSITY CSE4210

⊕ ⊕

⊕ ⊕

⊕ ⊕⊕ ⊕

D

x(n)

x(n)

y(n)

y(n-1)

b(n)a(n)

b(n-1)a(n)

b(2k)a(2k)

b(2k+1)a(2k+1)y(2k)

y(2k+1)x(2k+1)

x(2k)Parallel processing

Pipelining

Page 3: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

3

YORK UNIVERSITY CSE4210

Pipelining• Advantages

– Could be used to reduce power and/or to increase clock rate (speed)

• Disadvantages– Increases number of delay elements (latches

or flip-flops)– Increases latency

YORK UNIVERSITY CSE4210

Pipelining• Cutset: is a set of edges of a graph if

removed, the graph becomes partitioned• Feed forward cutset: a cutset where the

data moves in the forward direction on all the edges in the cutset

• We can place latches on a feed-forward cutset without affecting the functionality of the graph.

Page 4: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

4

YORK UNIVERSITY CSE4210

Pipelining

D D

⊕⊗ ⊗ ⊗

⊕a b c

y(n)

x(n-2)x(n-1)x(n)

D

D

YORK UNIVERSITY CSE4210

Example

D

D

A1

A2 A4

A6

A3A5

D

D

A1

A2 A4

A6

A3A5

D

D

A1

A2 A4

A6

A3A5

Critical Path?

Page 5: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

5

YORK UNIVERSITY CSE4210

Data Broadcast Structures• Reversing the direction of all the edges in

a given SFG, and interchanging the ,input and output preserves the functionality of the system.

YORK UNIVERSITY CSE4210

Data Broadcast

y(n)

o o

o

o

o o

Z-1 Z-1x(n)

ab c

x(n)

o o

o

o

o o

Z-1 Z-1y(n)

ab c

Page 6: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

6

YORK UNIVERSITY CSE4210

Fine-Grain Pipelining

D D⊕⊗ ⊗ ⊗

⊕c b a

y(n)

x(n)

Multiplication time = 10

Addition time = 2

Critical path = ?

YORK UNIVERSITY CSE4210

Fine grain Pipelining

Page 7: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

7

YORK UNIVERSITY CSE4210

Parallel Processing

)3()13()23()23()13()3()13()13()23()13()3()3(

)2()1()()(

kcxkbxkaxkykcxkbxkaxkykcxkbxkaxky

ncxnbxnaxny

++++=+−+++=+−+−+=

−+−+=

YORK UNIVERSITY CSE4210

X(3k) or x(3k-2)?

Page 8: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

8

YORK UNIVERSITY CSE4210

Complete Parallel System

Serial to Parallel

Converter

MIMO

SYSTEM

Parallel to Serial

Converter

x(n)

y(n)Clock period

T

Sampling period T/4

Clock period T/4

x(4k+3) x(4k+1)

x(4k+2) x(4k)

y(4k)y(4k+1)y(4k+2)y(4k+3)

YORK UNIVERSITY CSE4210

S/P and P/S ConverterD D D

T/4 T/4 T/4

TT T T

x(4k+3) x(4k+2) x(4k+1) x(4k)

D D D y(n)TTTT

T/4 T/4 T/4

Page 9: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

9

YORK UNIVERSITY CSE4210

Parallel Processing• Why use parallel processing? It increases

the hardware.• There is a limit for the use of pipelining,

you may not be able to pipeline a functional unit beyond a certain limie

• Also, I/O usually imposes a bound on the cycle time (communication bound)

YORK UNIVERSITY CSE4210

Combining pipelining and parallel processing

Page 10: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

10

YORK UNIVERSITY CSE4210

Low Power

2charge

2

)( to

opd

ototal

VVk

VCT

fVCP

−=

=

Simple approximation for CMOS

Ctotal is the total capacitance of the circuit, Vo is the supply voltage. Ccharge is the capacitance to be charged/discharged in a single clock cycle.

Pipelining and parallel processing could be used to minimize power or execution time.

YORK UNIVERSITY CSE4210

Low Power• What happens in case of M –pipelining?• Critical path is reduced by M, so is Ccharge

• If we keep the same f, then we have more time to charge Ccharge, which means we can reduce the supply voltage

?,)()( 2

charge

2charge =

−==

−= P

VVk

VM

C

TVVk

VCT

to

opip

to

oseq

β

β

Page 11: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

11

YORK UNIVERSITY CSE4210

Low Power —Example

D D

m1c b a

y(n)

x(n)

m1 m1

a1 a1

YORK UNIVERSITY CSE4210

Example

Page 12: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

12

YORK UNIVERSITY CSE4210

Parallel processing• What happen for L parallel System?• Total capacitance increased by L• Same performance, increase T by L• More time to charge Ccharge, can decrease

V

?,)()( 2

arg2

charge =−

==−

= PVVk

VCT

VVk

VCLLT

to

oechpar

to

oseq

β

β

YORK UNIVERSITY CSE4210

Parallel Processing Example• Consider a 4-tap FIR filter shown in Fig. 3.18(a)

and its 2-parallel version in 3.18(b). The parallel filter has exactly 2 copies of the original filter. The dashed line donates the critical path.

• Assume Also that TM=8, TA=1, Vt=0.45V, Vo=3.3V, CM=8CA– What is the supply voltage of the 2-parallel filter? – What is the power consumption of the 2-parallel filter

as a percentage of the original filter?

Page 13: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

13

YORK UNIVERSITY CSE4210

YORK UNIVERSITY CSE4210

Example

Page 14: Chapter 3 Pipelining and parallel Processing · 10 YORK UNIVERSITY CSE4210 Low Power 2 charge 2 ( o t) o pd total o k V V C V T P C V f − = = Simple approximation for CMOS Ctotal

14

YORK UNIVERSITY CSE4210

Different Architecture