Nov. 29, 2005 ELEC6970-001 1 Power Minimization Using Voltage Reduction and Parallel Processing By Sudheer Vemula

Page 1:

Nov. 29, 2005 ELEC6970-001 1

Power Minimization Using Voltage Reduction and Parallel Processing

By

Sudheer Vemula

Page 2:

Outline:-

Goal of the project

Introduction to parallel processing

Delay of the critical path in the given circuit, a 32x32 array multiplier

Methods to introduce parallelism in the given circuit

Reduction in delay of the critical path due to the introduced parallelism

Calculations estimating area and delay

Conclusion

Page 3:

Goal of the Project

To reduce the power consumption of the circuit by reducing the power supply voltage.

Consequence: the delay of the critical path increases.

To compensate for the increase in delay by introducing parallelism.

To calculate the reduction in power.

Page 4:

Parallel Processing

Definition:- Concurrent execution of several programs, or of several blocks of a program, is known as parallel processing [1].

Types of parallelism: data parallelism and control parallelism.

Data parallelism is the parallel execution of a single expression on data distributed over multiple processors [2].

Control parallelism is parallelism achieved by the simultaneous execution of multiple threads [3].

Page 5:

Voltage Scaling and Delay:-

Since a transistor is a voltage-controlled current device, its resistance depends on the voltage and current.

$$t_d = \frac{t_f + t_r}{2} = 0.5\,(0.5\,R_p C + 0.5\,R_n C) \approx 0.5\,RC$$

$$t_d = \frac{C V_{dd}}{4}\left(\frac{1}{I_{dsatp}} + \frac{1}{I_{dsatn}}\right) \approx \frac{k V_{dd}}{(V_{dd} - V_t)^{\alpha}}$$

$\alpha$ = 2 for low $V_{dd}$
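
As a minimal sketch of how supply voltage affects delay, the Python snippet below evaluates the model delay ~ k Vdd / (Vdd - Vt)^alpha with alpha = 2, leaving out the constant k, and compares two supply voltages. The function names, the threshold voltage of 0.5 V and the supply voltages of 2.5 V and 1.5 V are assumptions chosen for illustration; they do not come from the slides.

```python
def relative_delay(vdd, vt, alpha=2.0):
    """Gate delay up to the constant k: Vdd / (Vdd - Vt)**alpha."""
    if vdd <= vt:
        raise ValueError("vdd must be greater than vt")
    return vdd / (vdd - vt) ** alpha


def delay_ratio(vdd_new, vdd_ref, vt, alpha=2.0):
    """How many times slower the circuit is at vdd_new than at vdd_ref."""
    return relative_delay(vdd_new, vt, alpha) / relative_delay(vdd_ref, vt, alpha)


# Hypothetical operating points (not from the slides).
ratio = delay_ratio(vdd_new=1.5, vdd_ref=2.5, vt=0.5)
print(f"delay at 1.5 V is {ratio:.2f} times the delay at 2.5 V")
```

Because k cancels in the ratio, the comparison needs only Vdd, Vt and alpha.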

Page 6:

Critical Path:-

[Figure: 4 x 4 array multiplier with inputs A0-A3 and B0-B3 and outputs Y0-Y7]

Delay of the critical path for a multiplier of order n x m = (2m+n-2) FA delays

Delay of the critical path for a multiplier of order 32 x 32 = (2x32+32-2) = 94 FA delays

Approximate area of the 32 x 32 multiplier = 1024 FAs + 128 FAs (due to AND gates) = 1152 FAs
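
The two estimates above can be sketched in Python as follows. The function names are hypothetical, and the `and_gates_per_fa=8` default is an assumption inferred from the slide's figures (1024 AND gates counted as 128 FAs); the slide does not state this ratio explicitly.

```python
def critical_path_delay(n, m):
    """Critical path of an n x m array multiplier, in full-adder delays."""
    return 2 * m + n - 2


def array_multiplier_area(n, m, and_gates_per_fa=8):
    """Approximate area in full-adder equivalents.

    One FA per partial-product bit, plus the AND gates that form the
    partial products. and_gates_per_fa is an assumed conversion factor.
    """
    full_adders = n * m
    and_gate_area = (n * m) / and_gates_per_fa
    return full_adders + and_gate_area


print(critical_path_delay(32, 32))    # 32 x 32 multiplier
print(array_multiplier_area(32, 32))  # FA equivalents
```

The same delay function gives 62 for a 32 x 16 multiplier and 78 for a 16 x 32 multiplier, the values used on the partitioning slides.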

Page 7:

Horizontal Partition:-

[Figure: two 32 x 16 multipliers, each producing a 48-bit product, combined by a 32-bit adder and a 16-bit half adder (driven by Cout) into the 64-bit result. Inset: 4-bit example in which one multiplier takes A0, A1 and the other takes A2, A3, both with B0-B3.]

Critical path delay for a multiplier of order 32x16 = (2*16+32-2) + delay of the 32-bit full adder (FA) + delay of the 16-bit half adder (HA) = 62 + delay of the 32-bit FA + delay of the 16-bit HA

Ex.: A=98 and B=76

AB = (90x76) + (8x76)

= (9x76)x10 + (8x76)
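
The binary counterpart of this decimal example can be sketched as below: A is split into two 16-bit halves, each half is multiplied by the full 32-bit B, and the upper product is shifted by 16 bits before the final addition. The function name `multiply_split_a` is hypothetical, and ordinary integer arithmetic stands in for the two hardware multipliers and the adder.

```python
MASK16 = (1 << 16) - 1


def multiply_split_a(a, b):
    """32 x 32 multiply built from two products that each use 16 bits of A."""
    a_low = a & MASK16            # lower 16 bits of A
    a_high = (a >> 16) & MASK16   # upper 16 bits of A
    p_low = a_low * b             # first multiplier, up to 48 bits
    p_high = a_high * b           # second multiplier, up to 48 bits
    return (p_high << 16) + p_low # align by 16 bits and add


print(multiply_split_a(98, 76))
```

The two products do not depend on each other, which is what lets the two multipliers run in parallel.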

Page 8:

Vertical Partition

[Figure: two 16 x 32 multipliers, each producing a 48-bit product, combined by a 32-bit full adder and a 16-bit half adder (driven by Cout) into the 64-bit result. Inset: 4-bit example in which one multiplier takes B0, B1 and the other takes the upper bits of B, both with A0-A3.]

Ex.: A=98 and B=76

AB = (98x70) + (98x6) = (98x7)x10 + (98x6)

Critical path delay for a multiplier of order 16x32 = (2x32+16-2) + delay of the 32-bit FA + delay of the 16-bit HA = 78 + delay of the 32-bit FA + delay of the 16-bit HA

Page 9:

Delay of the 32 bit FA:-

The computation of the products and of the sum is done simultaneously.

The FA introduces a delay of only 1 unit.

The remaining delay is due to the HA.

The delay due to the 16-bit HA is approximately equal to 8 FA units.

Let A=1010, B=1011

1010 x 10 = 10100

1010 x 11 = 11110

Product1:-       1 1 1 1 0

Product2:-   1 0 1 0 0

Sum:-      0 1 1 0 1 1 1 0
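
A rough sketch of how the two partial products are merged is given below, assuming the split described above: the lowest k bits of the first product pass straight to the output, the overlapping n bits go through the full adder, and the carry out is added to the top k bits of the second product by the half-adder chain. The function and parameter names are hypothetical; for the 32 x 32 case, n = 32 and k = 16.

```python
def combine_partial_products(p_low, p_high, n, k):
    """Merge A*B_low (p_low) and A*B_high (p_high) into the full product.

    n is the width of A, k is the width of each half of B.
    """
    low_mask = (1 << k) - 1
    mid_mask = (1 << n) - 1

    low = p_low & low_mask                        # no addition needed
    mid_sum = (p_low >> k) + (p_high & mid_mask)  # n-bit full adder
    carry_out = mid_sum >> n                      # Cout of the full adder
    mid = mid_sum & mid_mask
    top = (p_high >> n) + carry_out               # k-bit half-adder chain

    return (top << (n + k)) | (mid << k) | low


# The 4-bit example: A = 1010, B = 1011, split into B_high = 10, B_low = 11.
a = 0b1010
result = combine_partial_products(a * 0b11, a * 0b10, n=4, k=2)
print(format(result, "08b"))
```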

Page 10:

Eliminating the Delay due to Half Adder:-

[Figure: two 16 x 32 multipliers, each producing a 48-bit product, feeding a 32-bit full adder; a 16-bit half adder with constant input '1' and a multiplexer controlled by Cout produce the upper 16 bits of the 64-bit result]

Here we introduce a 16-bit multiplexer to eliminate the delay due to the 16-bit half adder.

The additional delay is due only to the multiplexer.

Delay of this circuit = 78 + 1 + 0.5 (approximate delay due to the mux) = 79.5

Additional no. of gates = 32 FAs + 16 HAs + multiplexers ~ 32 + 8 + 5 = 45 FAs

The same procedure can be applied to the circuit with horizontal partitioning.
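
One way to read this scheme is as a carry-select arrangement: the half-adder chain adds the constant '1' to the upper 16 bits while the full adder is still working, and Cout then only has to pick between the original and the incremented value. The sketch below models that behaviour with ordinary integers; the function name and the default widths n = 32 and k = 16 are assumptions for illustration.

```python
def multiply_carry_select(a, b, n=32, k=16):
    """n x 2k multiply using two n x k products and a mux on the top k bits."""
    low_mask = (1 << k) - 1
    mid_mask = (1 << n) - 1

    p_low = a * (b & low_mask)   # A times the lower half of B
    p_high = a * (b >> k)        # A times the upper half of B

    upper = p_high >> n
    # Precomputed by the half-adder chain with constant '1', in parallel
    # with the full adder below.
    upper_plus_one = (upper + 1) & low_mask

    mid_sum = (p_low >> k) + (p_high & mid_mask)  # n-bit full adder
    carry_out = mid_sum >> n

    top = upper_plus_one if carry_out else upper  # multiplexer driven by Cout
    return (top << (n + k)) | ((mid_sum & mid_mask) << k) | (p_low & low_mask)


print(multiply_carry_select(98, 76))
```

Since both candidates for the top bits are ready before Cout arrives, only the multiplexer delay is added after the full adder.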

Page 11:

[Figure: 32 x 32 multiplier built from four 16 x 16 multipliers. Each pair of 16 x 16 multipliers is combined by a 16-bit full adder, a 16-bit HA with constant input '1' and Cout into a 48-bit result; the two 48-bit results are combined by a 32-bit FA and a 16-bit HA with constant input '1' and Cout into the 64-bit result.]

Ex.: A=98 and B=76

AB = (90x76) + (8x76)

= (9x76)x10 + (8x76)

= (9x7)x100 + (9x6)x10 + (8x7)x10 + (8x6)
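
The same decomposition in binary can be sketched as follows, with both operands split into 16-bit halves and four 16 x 16 products grouped the way the decimal example groups them. The function name `multiply_four_way` is hypothetical, and plain integer arithmetic stands in for the adders.

```python
def multiply_four_way(a, b, half=16):
    """2*half x 2*half multiply from four half x half products."""
    mask = (1 << half) - 1
    a_high, a_low = a >> half, a & mask
    b_high, b_low = b >> half, b & mask

    # Each 48-bit intermediate result combines two 16 x 16 products,
    # like (9x7)x10 + (9x6) and (8x7)x10 + (8x6) in the decimal example.
    upper = ((a_high * b_high) << half) + a_high * b_low
    lower = ((a_low * b_high) << half) + a_low * b_low

    # Final stage: align the upper result and add, like (9x76)x10 + (8x76).
    return (upper << half) + lower


print(multiply_four_way(98, 76))
```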

Page 12:

Delay and Area Calculations:-

Delay of the circuit = (2x16+16-2) + 1.5 + (delay due to 32-bit FA) + 1.5

The delay due to the 32-bit FA is 16 units, because the 16 LSBs of the FA are computed simultaneously with the previous stage, whereas the 16 MSBs are computed without any overlap.

Therefore, delay = 49 + 16 = 65

Area overhead = 2 x 16-bit FAs + 32-bit FA + 3 x 16-bit HAs + 3 x 16-bit multiplexers ~ 64 + 24 + 3 x 8 = 112 FAs

Percentage reduction in delay = (94-65) x 100 / 94 = 30.8%

Percentage increase in area = (112/1152) x 100 = 9.7%
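
The arithmetic above can be reproduced with a short script. The per-bit area weights are assumptions inferred from the slide's own totals (a 16-bit HA and a 16-bit multiplexer are each counted as 8 FAs here), and the variable names are hypothetical.

```python
FA_AREA = 1.0    # area unit: one full adder
HA_AREA = 0.5    # per bit, since a 16-bit HA is counted as 8 FAs
MUX_AREA = 0.5   # per bit, since a 16-bit multiplexer is counted as 8 FAs

BASE_DELAY = 94   # 32 x 32 array multiplier, in FA delays
BASE_AREA = 1152  # 32 x 32 array multiplier, in FA equivalents


def percent(part, whole):
    return part * 100 / whole


multiplier_delay = 2 * 16 + 16 - 2            # 16 x 16 multiplier
delay = multiplier_delay + 1.5 + 16 + 1.5     # 1.5 terms as given in the slide
area_overhead = (2 * 16 + 32) * FA_AREA + 3 * 16 * HA_AREA + 3 * 16 * MUX_AREA

delay_reduction = percent(BASE_DELAY - delay, BASE_DELAY)
area_increase = percent(area_overhead, BASE_AREA)
print(delay, area_overhead, round(delay_reduction, 2), round(area_increase, 2))
```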

Page 13:

Circuit with improved Delay:-

[Figure: the four 16 x 16 multiplier circuit, with the final 32-bit FA replaced by a 16-bit CLA and a 16-bit FA; a 16-bit HA with constant input '1' and Cout produce the upper bits of the 64-bit result]

Page 14:

Delay and Area Calculations:-

Delay of the circuit = (2x16+16-2) + 1.5 + (delay due to 16-bit CLA) + 1.5

Therefore, delay = 49 + (16/3.6) = 53.5 [4]

Area overhead = 2 x 16-bit FAs + 16-bit FA + 16-bit carry look-ahead adder (CLA) + 3 x 16-bit HAs + 3 x 16-bit multiplexers ~ 32 + 16 + 16 x (10/7.2) + 24 + 24 [4] = 48 + 22 + 48 = 118 FAs

Percentage reduction in delay = (94-53.5) x 100 / 94 = 43.08%

Percentage increase in area = (118/1152) x 100 = 10.24%
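
The effect of swapping in the carry look-ahead adder can be sketched as below. The factors 3.6 and 10/7.2 are the ones the slide takes from [4]; treating them as a delay divisor and an area multiplier relative to a ripple-carry adder of the same width is my reading of the slide, and the function names are hypothetical.

```python
CLA_DELAY_DIVISOR = 3.6       # slide uses 16/3.6 for the 16-bit CLA delay
CLA_AREA_FACTOR = 10 / 7.2    # slide uses 16 x (10/7.2) for the CLA area


def cla_delay(bits):
    """CLA delay in FA delays, scaled from a ripple-carry adder."""
    return bits / CLA_DELAY_DIVISOR


def cla_area(bits):
    """CLA area in FA equivalents, scaled from a ripple-carry adder."""
    return bits * CLA_AREA_FACTOR


delay = (2 * 16 + 16 - 2) + 1.5 + cla_delay(16) + 1.5
area_overhead = (
    2 * 16          # two 16-bit FAs
    + 16            # one 16-bit FA
    + cla_area(16)  # one 16-bit CLA
    + 3 * 16 * 0.5  # three 16-bit HAs
    + 3 * 16 * 0.5  # three 16-bit multiplexers
)
print(round(delay, 2), round(area_overhead, 2))
```

The unrounded results differ slightly from the slide's 53.5 and 118 because the slide rounds the CLA terms before adding.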

Page 15:

[Figure: the improved circuit with a 16-bit CLA and a 16-bit FA in the final stage, in which one 16-bit HA stage is replaced by a single FA (outputs S and C) followed by a 15-bit HA with constant input '1']

Page 16:

Delay and Area Calculations:-

Delay of the circuit = (2x16+16-2) + 1.5 + (delay due to 16-bit CLA) + 1.5 + 1 (added delay due to one FA)

Therefore, delay = 49 + (16/3.6) + 1 = 54.5 [4]

Area overhead = 2 x 16-bit FAs + 16-bit FA + 16-bit carry look-ahead adder (CLA) + 16-bit HA + 1-bit FA + 15-bit HA + 3 x 16-bit multiplexers ~ 32 + 16 + 16 x (10/7.2) + 8 + 1 + 8.5 + 24 [4] = 48 + 22 + 41.5 = 111.5 FAs

Percentage reduction in delay = (94-54.5) x 100 / 94 = 42.02%

Percentage increase in area = (111.5/1152) x 100 = 9.7%

Page 17:

32x32 Multiplier with 4x4 Multipliers:-

New delay of the circuit = (2x4+4-2) + 1.5 + 1.5 + 10 (CLAs) + 3 + 4.5 (both from previous circuit values) = 30.5

New area overhead = 8 x 4-bit FAs + 8 x 4-bit HAs + 4 x 4-bit CLA + 4 x 4-bit FA + overhead of previous circuit = 32 + 16 + 16 x (10/7.2) + 16 + 111.5 ~ 198 FAs

Percentage reduction in delay = (94 - 30.5) x 100 / 94 ~ 68%

Percentage increase in area = (198/1152) x 100 = 17%

Page 18:

Conclusion:- The percentage reduction in delay is much higher than the percentage increase in area. So, there is a very high possibility that the final power consumed after voltage scaling is much lower than the original value.
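
To make this argument concrete, the sketch below estimates the power ratio under several stated assumptions that are not in the slides: dynamic power scales as C x Vdd^2 x f, the clock frequency is kept at its original value, switched capacitance grows in proportion to area, delay follows k Vdd / (Vdd - Vt)^alpha with alpha = 2, and the reference supply and threshold voltages are 2.5 V and 0.5 V. The delay figures (94 and about 30) and the area overhead (198/1152) are the ones from the 4x4 multiplier slide. All function names are hypothetical.

```python
def relative_delay(vdd, vt, alpha=2.0):
    """Gate delay up to the constant k."""
    return vdd / (vdd - vt) ** alpha


def scaled_supply(vdd_ref, vt, slowdown, alpha=2.0, tol=1e-9):
    """Supply voltage at which delay is `slowdown` (>= 1) times the delay at vdd_ref.

    Found by bisection; delay falls monotonically as Vdd rises above Vt.
    """
    target = slowdown * relative_delay(vdd_ref, vt, alpha)
    low, high = vt + 1e-6, vdd_ref
    while high - low > tol:
        mid = (low + high) / 2
        if relative_delay(mid, vt, alpha) > target:
            low = mid    # still too slow, raise the supply
        else:
            high = mid   # fast enough, try a lower supply
    return high


def power_ratio(delay_old, delay_new, area_increase, vdd_ref, vt):
    """New power / old power at unchanged throughput."""
    vdd_new = scaled_supply(vdd_ref, vt, delay_old / delay_new)
    return (1 + area_increase) * (vdd_new / vdd_ref) ** 2


# Assumed operating point: Vdd = 2.5 V, Vt = 0.5 V (not from the slides).
ratio = power_ratio(94, 30, 198 / 1152, vdd_ref=2.5, vt=0.5)
print(f"power after scaling / original power = {ratio:.2f}")
```

The result depends strongly on the assumed threshold voltage, so it should be read as an illustration of the trend rather than as a measured saving.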

Page 19:

References

[1] dspvillage.ti.com/docs/catalog/dspplatform/details.jhtml

[2] www.llnl.gov/CASC/Overture/henshaw/documentation/App/manual/node160.html

[3] books.nap.edu/html/up_to_spedd/appD.html

[4] J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Kluwer Academic Publishers, Boston, MA, 1996.