Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees

Hadi P. Afshar, Philip Brisk, Paolo Ienne


Transcript of Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees

Page 1: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees

Hadi P. Afshar, Philip Brisk, Paolo Ienne

Page 2: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Multi-input Additions are Fundamental

DSP and Multimedia Applications
– FIR filters, Motion Estimation, …

Parallel Multipliers

Flow Graph Transformation

[Figure: FIR filter built from delay elements (D) feeding a summation (Σ)]

Page 3: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Flow Graph Transformation

[Figure: ADPCM flow graph, BEFORE and AFTER the transformation. Both versions compute vpdiff from shifted copies of step (>>) selected by the bits of delta (&, = 0 tests). In the AFTER version the separate additions are merged into a single compressor tree.]

Page 4: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Compressor vs. Adder Tree

[Figure: a compressor tree (layers of CSAs followed by one CPA) next to an adder tree (CPAs only)]

Compressors are better than Adder Trees in VLSI

But Adder Trees are better than Compressors in FPGA!
– Slow intra LUT routing
– Poor LUT utilization
– Low logic density

Page 5: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

But Compressor Trees can be faster and smaller if Properly Designed

Page 6: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Better Compressors on FPGA

Generalized Parallel Counter (GPC) is the basic block
– More logic density
– Fewer logic levels
– Less pressure on the routing

[Figure: compressor tree built from layers of GPCs followed by one CPA]

Page 7: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Overview

Arithmetic Concepts
Hybrid Design Approach
– Bottom-up
– Top-down
Experiments
Conclusion

Page 8: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Parallel Counters

Parallel Counter
– Counts the number of input bits set to 1
– Output is a binary value
– 3:2 counter = Full Adder
– 2:2 counter = Half Adder
– An m:n counter has m inputs and n = ⌈log2(m+1)⌉ outputs

Generalized Parallel Counter (GPC)
– Input bits can have different bit positions
– E.g. (3, 3; 4) GPC
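The counter definitions above can be checked with a small Python sketch. It assumes the usual reading of the GPC tuple, which the slide only shows by example: (k_r-1, ..., k_0; n) means k_i input bits of weight 2^i and n output bits. The function names are illustrative and are not taken from the talk.

```python
def counter_outputs(m):
    """Outputs n of an m:n parallel counter, n = ceil(log2(m + 1)).

    For m >= 1 this equals m.bit_length(), which avoids floating point.
    """
    return m.bit_length()


def gpc_max_value(shape):
    """Largest weighted count a GPC can produce.

    shape lists the number of input bits per rank, most significant
    rank first, e.g. (3, 3) for the (3, 3; 4) GPC.
    """
    ranks = len(shape)
    return sum(k << (ranks - 1 - pos) for pos, k in enumerate(shape))


def gpc_outputs(shape):
    """Number of output bits needed to hold the largest weighted count."""
    return gpc_max_value(shape).bit_length()


def gpc_eval(shape, inputs):
    """Weighted count of the input bits; inputs follows the order of shape."""
    ranks = len(shape)
    total = 0
    for pos, (k, bits) in enumerate(zip(shape, inputs)):
        if len(bits) != k:
            raise ValueError("rank %d expects %d bits" % (ranks - 1 - pos, k))
        total += sum(bits) << (ranks - 1 - pos)
    return total
```

With this reading, a (3, 3; 4) GPC can reach at most 3·2 + 3 = 9, which fits in the 4 outputs given on the slide.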

Page 9: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Compressor Trees on FPGAs

We propose GPCs as the basic blocks for compressor trees
– Why?
  1. GPCs map well onto FPGA logic cells
  2. GPCs are flexible

Page 10: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

GPC Mapping Example

[Figure: dot diagram covered with 5 counters versus 3 GPCs: (3,5;4), (3,4;4) and (0,5;3)]

Page 11: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Overview

Arithmetic Concepts
Hybrid Design Approach
– Bottom-up
– Top-down
Experiments
Conclusion

Page 12: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Hybrid Design Approach

[Figure: design flow. Bottom-up: FPGA architectural characteristics, atom-level GPC HDL library. Top-down: compressor tree specification, GPC-mapped HDL netlist. Then place and route, and result.]

Page 13: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

FPGA Logic Cell

Altera Stratix-II/III/V

[Figure: Logic Array Block (LAB) made of Adaptive Logic Modules (ALMs); each ALM contains combinational logic, two adders (+) and two registers (Reg); figure labels 1 to 8]

Page 14: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

FPGA Logic Cell

ALM Configuration Modes
– Normal
– Extended
– Arithmetic
– Shared Arithmetic

[Figure: two ALM configurations, each with four 4-LUTs feeding two adders (+)]

Page 15: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Bottom-up Design

[Figure: 6:3 GPC with outputs F0, F1, F2; labels LAB0 and LAB1]

What if we have bigger GPCs like 7:3 GPC?

Can we exploit the carry chain and dedicated adders for building GPCs?

Page 16: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

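GPC Design Example

(0, 6; 3) GPC

[Figure: inputs a0 to a5. A full adder (FA) on a1, a2, a3 gives s0 = S(a1,a2,a3) and c0 = C(a1,a2,a3); a half adder (HA) on a4, a5 gives s1 = S(a4,a5) and c1 = C(a4,a5). The adders (+) of ALM0 and ALM1 combine a0, s0, s1, c0, c1 and constant 0 inputs into the outputs z0, z1, z2.]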
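The decomposition named in the figure labels can be checked with a short Python sketch. The first level (FA on a1, a2, a3 and HA on a4, a5) follows the labels. The way the second level is wired, with a0 entering the carry chain as carry-in, is an assumption made for illustration. The constant 0 inputs and the exact split across ALM0 and ALM1 shown on the slide are not modelled.

```python
def full_adder(x, y, z):
    """Return (sum, carry) of three bits."""
    total = x + y + z
    return total & 1, total >> 1


def half_adder(x, y):
    """Return (sum, carry) of two bits."""
    return full_adder(x, y, 0)


def gpc_0_6_3(a0, a1, a2, a3, a4, a5):
    """(0, 6; 3) GPC: count the ones among six bits of equal weight."""
    # First level, as labelled on the slide.
    s0, c0 = full_adder(a1, a2, a3)   # S(a1,a2,a3), C(a1,a2,a3)
    s1, c1 = half_adder(a4, a5)       # S(a4,a5),    C(a4,a5)
    # Second level: two chained adder cells (assumed wiring).
    z0, carry = full_adder(s0, s1, a0)   # weight-1 bits, a0 as carry-in
    z1, z2 = full_adder(c0, c1, carry)   # weight-2 bits plus chain carry
    return z0, z1, z2
```

Enumerating all 64 input combinations and comparing z0 + 2·z1 + 4·z2 with the number of ones is a quick way to confirm that this wiring counts correctly.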

Page 17: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

GPC Placement

[Figure: GPCs placed back to back on one chain of adders (+), with GPC boundaries marked; adder cell with inputs a, a, cin and outputs cout, s]

Zero value on the carry: 0

Logic separation between carry and sum

{cout, s} = cin + a + a = cin + 2a, so cout = a and s = cin
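The identity on this slide is easy to confirm exhaustively. The sketch below models one adder cell in Python and feeds the same bit a to both operand inputs. The function names are hypothetical and not from the talk.

```python
def adder_cell(x, y, cin):
    """One carry-chain adder cell: return (cout, s) for x + y + cin."""
    total = x + y + cin
    return total >> 1, total & 1


def boundary_cell(a, cin):
    """Adder cell with both operand inputs tied to the same bit a.

    Since cin + 2a has a in the weight-2 position and cin in the
    weight-1 position, the cell passes cin to s and a to cout.
    """
    return adder_cell(a, a, cin)
```

With a = 0 the outgoing carry is 0, which matches the "zero value on the carry" note: the incoming carry leaves through s and nothing propagates into the next GPC.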

Page 18: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

[Figure: chain of LUT and adder (+) cells shared by GPCi and GPCi+1]

Page 19: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Top-down Heuristic

Mapping_algorithm(Integer : M, Integer : W, Array of Integers : columns)
{
    Build_GPC_library();
    repeat {
        while (col_indx < max_col_indx) {
            if (columns[col_indx] > H)
                Map_by_GPC();
            else
                col_indx++;
        }
        lsb_to_msb_covering();
        Connect_GPCs_IOs();
        Propagate_comb_delay();
        Generate_next_stage_dots();
    } until three rows of dots remain;
}

[Slide annotations beside the pseudocode: Step1, Step2, Step3, and the GPC shape (0, H; log2H)]
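To make the loop structure concrete, here is a deliberately simplified Python sketch of stage-by-stage column compression. It is not the authors' heuristic: it uses only single-column counters of at most H inputs, has no GPC library, and ignores delay propagation. It assumes each counter on k bits emits k.bit_length() output bits, which is ⌈log2(k+1)⌉ (the slide writes the shape as (0, H; log2H)). All names are illustrative.

```python
def value(columns):
    """Integer represented by a dot diagram; columns[i] holds bits of weight 2**i."""
    return sum(bit << i for i, col in enumerate(columns) for bit in col)


def compress(columns, H=6, target_rows=3):
    """Reduce a dot diagram until no column is taller than target_rows.

    Returns (final_columns, number_of_stages). Columns are processed
    from LSB to MSB inside each stage.
    """
    if H < 3:
        raise ValueError("counters with fewer than 3 inputs do not compress")
    cols = [list(c) for c in columns]
    stages = 0
    while max(len(c) for c in cols) > target_rows:
        nxt = [[] for _ in cols]           # dots of the next stage
        for i, col in enumerate(cols):
            remaining = list(col)
            while len(remaining) > target_rows:
                k = min(H, len(remaining))
                taken, remaining = remaining[:k], remaining[k:]
                count = sum(taken)         # what a k-input counter outputs
                for j in range(k.bit_length()):
                    while i + j >= len(nxt):
                        nxt.append([])     # result may grow past the MSB
                    nxt[i + j].append((count >> j) & 1)
            nxt[i].extend(remaining)       # short leftovers pass through
        cols = nxt
        stages += 1
    return cols, stages
```

Each counter consumes at least as many dots as it produces, and strictly more whenever it has three or more inputs, so the total number of dots shrinks in every stage and the loop terminates. The represented value is unchanged, which `value` can confirm before and after.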

Page 20: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Major Step of Heuristic

Process columns from LSB to MSB

[Figure: dot diagram in which tall columns are mapped to (0, H; log2H) GPCs and the remaining columns have height < H]

Page 21: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

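Delay Balancing

[Figure: GPC outputs z0 to z8 of one stage connected to the inputs a0, a2, a5 of a GPC in the next stage]

Output delays: z1d > z4d > z6d
Input delays: a0d > a2d > a5d

Slowest output drives slowest input: CP1 = z1d + a0d
Slowest output drives fastest input: CP2 = max(z1d + a5d, z4d + a2d, z6d + a0d)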
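The pairing behind CP2 can be sketched in Python as follows: sort the driving outputs from slowest to fastest, sort the receiving inputs from fastest to slowest, and connect them in that order. The delay values in the usage note are made-up numbers for illustration only, and the function names are not from the talk.

```python
def balanced_pairing(out_delays, in_delays):
    """Pair the slowest output with the fastest input, and so on.

    out_delays and in_delays map signal names to delays.
    Returns a list of (output_name, input_name) pairs.
    """
    outs = sorted(out_delays, key=out_delays.get, reverse=True)
    ins = sorted(in_delays, key=in_delays.get)
    return list(zip(outs, ins))


def critical_path(pairs, out_delays, in_delays):
    """Longest output-delay plus input-delay over all connections."""
    return max(out_delays[o] + in_delays[i] for o, i in pairs)
```

For example, with hypothetical delays z1 = 3.0, z4 = 2.0, z6 = 1.0 and a0 = 1.5, a2 = 1.0, a5 = 0.5, the balanced pairing connects z1 to a5, z4 to a2 and z6 to a0, while the unbalanced connection of z1 to a0 alone already costs 4.5.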

Page 22: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Overview

Arithmetic Constructs
Hybrid Design Approach
– Bottom-up
– Top-down
Experiments
Conclusion

Page 23: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Experiments

Bottom-up design
– Atom-level design in Verilog Quartus Mapping (VQM) format

Top-down
– Heuristic: C++
– Output: Structural VHDL

Altera Quartus-II tool

Benchmarks
– DCT, FIR, ME, G721
– Multiplier
– Horner Polynomial
– Video Mixer

Page 24: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Experiments

Mapping methods
– Ternary
– LUT Only
– Arith1: Arithmetic mode, without delay balancing
– Arith2: Arithmetic mode, with delay balancing

Page 25: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Delay (ns)

[Chart: delay (ns); annotated differences: -27%, +2%]

Page 26: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Area (ALM)

[Chart: area (ALMs); annotated differences: +47%, +18%]

Page 27: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Area (LAB)

[Chart: area (LABs); annotated difference: -4.5%]

Page 28: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Overview

Arithmetic Concepts
Hybrid Design Approach
– Bottom-up
– Top-down
Experiments
Conclusion

Page 29: Exploiting Fast Carry Chains of  FPGAs  for Designing Compressor Trees

Conclusion

Conventional wisdom has held that adder trees outperform compressor trees on FPGAs
– Ternary adder trees were a major selling point

Conventional wisdom is wrong!
– GPCs map nicely onto FPGA logic cells
– Carry-chain
– Compressor trees on FPGAs are faster than adder trees when built from GPCs