Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees

Hadi P. Afshar, Philip Brisk, Paolo Ienne

Multi-input additions are fundamental to DSP and multimedia applications

– FIR filters, Motion Estimation,…

– Parallel multipliers

Flow Graph Transformation

[Figure: FIR filter (delay elements D) and a flow graph with steps 0 to 3 and vpdiff, shown BEFORE and AFTER the transformation; the AFTER version contains a compressor tree.]

Compressor vs. Adder Tree

[Figure: compressor tree and adder tree side by side, drawn from CSA and CPA blocks.]
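The difference between the two structures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `carry_save_add` and `compressor_tree_sum` are hypothetical, operands are assumed to be non-negative integers, and the reduction is done sequentially rather than level by level. The point is that every CSA step avoids carry propagation, and only one carry-propagate addition is left at the end.

```python
def carry_save_add(x, y, z):
    """3:2 compression: three operands in, a sum word and a carry word out."""
    s = x ^ y ^ z                             # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, moved to its weight
    return s, c


def compressor_tree_sum(operands):
    """Reduce with CSAs until two words remain, then do one final CPA."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    return sum(ops)                           # the single carry-propagate add


print(compressor_tree_sum([13, 7, 22, 5, 9]))
```

An adder tree would instead perform a full carry-propagate addition at every node.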

Compressors are better than adder trees in VLSI

But adder trees are better than compressors on FPGAs!
– Slow intra LUT routing
– Poor LUT utilization
– Low logic density

But compressor trees can be faster and smaller if properly designed

Better Compressors on FPGA

Generalized Parallel Counter (GPC) is the basic block
– More logic density
– Fewer logic levels
– Less pressure on the routing

[Figure: compressor tree drawn from GPC blocks and a CPA.]

Overview

Arithmetic Concepts

Hybrid Design Approach
– Bottom-up
– Top-down

Experiments

Conclusion

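Parallel Counters

Parallel counter (m:n counter)
– Counts the number of input bits set to 1
– Output is a binary value of n = ⌈log2(m+1)⌉ bits
– 3:2 counter: full adder
– 2:2 counter: half adder

Generalized Parallel Counter (GPC)
– Input bits can have different bit positions
– E.g. the (3, 3; 4) GPC: three input bits of weight 2, three input bits of weight 1, four output bits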
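As a rough behavioral model of these definitions (a sketch only; the function name `gpc` and the LSB-first column layout are assumptions made for illustration), a GPC adds up its input bits, each scaled by the weight of its column, and emits the total in binary. An m:n counter is the special case with a single column.

```python
def gpc(columns_lsb_first):
    """Behavioral model of a GPC.

    columns_lsb_first[j] is the list of input bits of weight 2**j.
    Returns the output bits, LSB first.
    """
    total = sum(sum(bits) << j for j, bits in enumerate(columns_lsb_first))
    # Largest value the inputs can encode decides the output width.
    max_total = sum(len(bits) << j for j, bits in enumerate(columns_lsb_first))
    n = max_total.bit_length()
    return [(total >> k) & 1 for k in range(n)]


# (3, 3; 4) GPC with all inputs set: 3*2 + 3*1 = 9, four output bits.
print(gpc([[1, 1, 1], [1, 1, 1]]))
# 3:2 counter (full adder) with two inputs set.
print(gpc([[1, 1, 0]]))
```

For a single column of m bits, `max_total.bit_length()` equals ⌈log2(m+1)⌉, which matches the output width of an m:n counter.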

Compressor Trees on FPGAs

We propose GPCs as the basic blocks for compressor trees
– Why?
  1. GPCs map well onto FPGA logic cells
  2. GPCs are flexible

GPC Mapping Example

[Figure: the same dot diagram covered by 5 counters or by 3 GPCs: (3, 5; 4), (3, 4; 4) and (0, 5; 3).]


Hybrid Design Approach

[Figure: hybrid design flow. Boxes: FPGA architectural characteristics, compressor tree specification, atom-level GPC HDL library, GPC-mapped HDL netlist, place and route, result. The bottom-up and top-down paths are labeled.]

FPGA Logic Cell

Altera Stratix-II/III/V
– Logic Array Block (LAB)
– Adaptive Logic Module (ALM)

[Figure: ALM with its combinational logic block.]

ALM Configuration Modes
– Normal
– Extended
– Arithmetic
– Shared Arithmetic

Bottom-up Design

[Figure: 6:3 GPC with outputs F0, F1, F2; LAB0 and LAB1.]

What if we have bigger GPCs, like a 7:3 GPC?

Can we exploit the carry chain and dedicated adders for building GPCs?

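GPC Design Example

[Figure: (0, 6; 3) GPC built from full adders (FA) and half adders (HA). Labels: inputs a1 to a4; intermediate signals S(a1,a2,a3), C(a1,a2,a3), S(a4,a5), C(a4,a5); signals s0, c0, s1, c1.]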

GPC Placement

GPC boundary
– Zero value on the carry
– Logic separation between carry and sum

{cout, s} = cin + a + a = cin + 2a, so cout = a and s = cin

[Figure: carry chain crossing the boundary into GPC i+1.]
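The boundary identity can be checked exhaustively. The sketch below assumes one reading of the slide: a one-bit adder cell on the carry chain has both of its operand inputs tied to the same signal a, so the cell computes cin + 2a. The names `adder_cell` and `boundary_cell` are hypothetical.

```python
def adder_cell(x, y, cin):
    """One-bit adder on the carry chain: returns (cout, s)."""
    total = x + y + cin
    return total >> 1, total & 1


def boundary_cell(a, cin):
    """Boundary cell: both operand inputs are driven by the same bit a."""
    return adder_cell(a, a, cin)


for a in (0, 1):
    for cin in (0, 1):
        cout, s = boundary_cell(a, cin)
        print(f"a={a} cin={cin} -> cout={cout} s={s}")
```

In every row cout equals a and s equals cin, so the incoming carry never reaches the outgoing carry. With a = 0 the next GPC on the chain sees a zero carry-in.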

Top-down Heuristic

Mapping_algorithm(Integer : M, Integer : W, Array of Integers : columns)
{
    Build_GPC_library();
    repeat {
        while (col_indx < max_col_indx) {
            if (columns[col_indx] > H)
                Map_by_GPC();        // (0, H; log2H) GPCs
            else
                col_indx++;
        }
        lsb_to_msb_covering();
        Connect_GPCs_IOs();
        Propagate_comb_delay();
        Generate_next_stage_dots();
    } until three rows of dots remain;
}

[Slide callouts: Step 1, Step 2, Step 3]
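The overall loop can be imitated with a much simpler stand-in. The sketch below is not the authors' heuristic: it uses only single-column counters of up to H inputs, ignores the GPC library, the LSB-to-MSB covering step, and delay propagation, and assumes H is at least 4. The names `compress` and `value` are hypothetical. It does show the stage-by-stage structure and the stopping condition of three rows of dots.

```python
def compress(columns, H=6, target=3):
    """columns[i] is the list of bits of weight 2**i. Returns (columns, stages)."""
    cols = [list(c) for c in columns]
    stages = 0
    while max(len(c) for c in cols) > target:
        nxt = [[] for _ in range(len(cols) + H.bit_length())]
        for i, col in enumerate(cols):               # walk from LSB to MSB
            bits = list(col)
            while len(bits) > target:                # column still too tall
                k = min(H, len(bits))
                group, bits = bits[:k], bits[k:]
                count = sum(group)                   # what a k-input counter outputs
                for j in range(k.bit_length()):      # ceil(log2(k+1)) output bits
                    nxt[i + j].append((count >> j) & 1)
            nxt[i].extend(bits)                      # short columns pass through
        while len(nxt) > 1 and not nxt[-1]:
            nxt.pop()                                # drop unused high columns
        cols = nxt
        stages += 1
    return cols, stages


def value(cols):
    """Integer represented by a dot diagram."""
    return sum(sum(c) << i for i, c in enumerate(cols))


dots = [[1] * 9 for _ in range(8)]                   # nine 8-bit operands, all ones
out, stages = compress(dots)
print(value(dots), value(out), max(len(c) for c in out), stages)
```

Each counter replaces at least four dots with three, so the total number of dots shrinks at every stage and the loop terminates.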

Major Step of Heuristic

[Figure: dot diagram with two numbered steps. Columns taller than H are mapped to (0, H; log2H) GPCs; the remaining columns (height < H) are processed from LSB to MSB.]

Delay Balancing

z1d > z4d > z6d
a0d > a2d > a5d

[Figure: output groups (z2 z1 z0), (z5 z4 z3), (z8 z7 z6) and inputs a5, a2, a0.]

CP1 = z1d + a0d
CP2 = max(z1d + a5d, z4d + a2d, z6d + a0d)
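A small numeric example makes the two critical-path expressions concrete. The delay values below are invented for illustration (arbitrary units), and `critical_path` is a hypothetical helper. It assumes the z delays are signal arrival times and the a delays are input-pin delays of the next GPC, so balancing means wiring the latest signal to the fastest pin.

```python
def critical_path(arrivals, pin_delays, balanced):
    """Worst arrival + pin delay over a one-to-one signal-to-pin assignment."""
    late_first = sorted(arrivals, reverse=True)
    # Balanced: fastest pin gets the latest signal. Unbalanced: worst-case pairing.
    pins = sorted(pin_delays, reverse=not balanced)
    return max(z + a for z, a in zip(late_first, pins))


z = [30, 20, 10]   # z1d > z4d > z6d (made-up values)
a = [9, 6, 3]      # a0d > a2d > a5d (made-up values)

cp1 = critical_path(z, a, balanced=False)   # z1d + a0d
cp2 = critical_path(z, a, balanced=True)    # max(z1d+a5d, z4d+a2d, z6d+a0d)
print(cp1, cp2)
```

With these numbers CP1 is 39 and CP2 is 33, so the balanced assignment shortens the critical path.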


Experiments

Bottom-up design
– Atom-level design in Verilog Quartus Module (VQM) format

Top-down
– Heuristic: C++
– Output: structural VHDL

Altera Quartus-II tool

Benchmarks
– DCT, FIR, ME, G721
– Multiplier
– Horner polynomial
– Video mixer

Mapping methods
– Ternary
– LUT only
– Arith1: arithmetic mode, without delay balancing
– Arith2: arithmetic mode, with delay balancing

[Charts: Delay (ns), annotated -27% and +2%; Area (ALM), annotated +47% and +18%; Area (LAB).]

Conclusion

Conventional wisdom has held that adder trees outperform compressor trees on FPGAs
– Ternary adder trees were a major selling point

Conventional wisdom is wrong!
– GPCs map nicely onto FPGA logic cells (carry chain)
– Compressor trees on FPGAs are faster than adder trees when built from GPCs
