Clustering of Large Designs for Channel-Width Constrained FPGAs

Marvin Tom, Guy Lemieux, University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, BC, Canada


Transcript of Clustering of Large Designs for Channel-Width Constrained FPGAs

Page 1: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Clustering of Large Designs for Channel-Width Constrained FPGAs

Marvin Tom, Guy Lemieux

University of British Columbia
Department of Electrical and Computer Engineering

Vancouver, BC, Canada

Page 2: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Overview

• Introduction, Goals and Motivation
– Reduce channel width, lower cost, make circuits “routable”

• Reducing Channel Width By Depopulation

• Large Benchmark Circuits

• New Clustering Technique
– Selective Depopulation

• Conclusions and Future Work

Page 3: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Mesh-Based FPGA Architecture

• Channel width
– Number of routing tracks per channel

[Figure: mesh of logic tiles (L) separated by routing channels]

• Larger FPGA devices: more tiles
– Channel width is fixed

Page 4: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Motivation: Area of FPGA Devices

[Figure: MCNC Circuits Mapped onto an FPGA. Scatter plot of Routed Channel Width (10 to 90) vs. CLB Count (0 to 300). Circuits: alu4, apex2, apex4, bigkey, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584, seq, spla, tseng. Annotations: Number of Layout Tiles (on the CLB Count axis), SIZE of Layout Tile (on the Routed Channel Width axis).]

Total Layout AREA = SIZE * Number

Page 5: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Motivation: Channel Width Demand

[Figure: MCNC Circuits Mapped onto an FPGA. Scatter plot of Routed Channel Width (10 to 90) vs. CLB Count (0 to 300). Circuits: alu4, apex2, apex4, bigkey, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584, seq, spla, tseng.]

• Logic Range: user buys bigger device.

• Interconnect Range: user has no choice!

• Devices built for worst-case channel width (fixed width)

• Interconnect cost dominates (>70%)

Page 6: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Goal: Reduce Channel Width

[Figure: Scatter plot of Routed Channel Width (10 to 90) vs. CLB Count (0 to 300). Circuits: alu4, apex2, apex4, bigkey, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584, seq, spla, tseng.]

Altera Cyclone
• Channel width constraint of 80 routing tracks

Constrained FPGA
• Channel width constraint of 60 routing tracks
• Smaller area, lower cost for low-channel-width circuits

But { apex4, elliptic, frisc, ex1010, spla, pdc } are unroutable...

Can we make them routable in a Constrained FPGA?

Page 7: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Possible Solution

• Trade-off logic utilization for channel width
– User can always buy more logic... (not more wires)

[Figure: Scatter plot of Routed Channel Width (10 to 90) vs. CLB Count (0 to 700). Circuits: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584, seq, spla, tseng. Labelled points: pdc, ex1010, frisc, spla, apex4, elliptic.]

[Figure: FPGA 1 and FPGA 2 shown as grids of logic tiles (L). Trade-off: CLB count for Channel width.]

But... can we achieve lower Total Area? ( = SIZE * CLB Count)
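The area question above can be sketched numerically. The snippet below is only an illustration: the linear tile-size model, its constants (`logic_area`, `area_per_track`) and the CLB counts are hypothetical values chosen for this example, not figures from the talk. Only the relation Total Area = SIZE * CLB Count comes from the slide.

```python
def tile_size(channel_width, logic_area=30.0, area_per_track=1.0):
    """Hypothetical tile area: a fixed logic part plus a routing part
    that grows linearly with the channel width (assumed model)."""
    return logic_area + area_per_track * channel_width


def total_area(clb_count, channel_width):
    """Total Area = SIZE * CLB Count, as stated on the slide."""
    return clb_count * tile_size(channel_width)


# Hypothetical design: fully packed CLBs need a wide channel,
# depopulated CLBs need more tiles but a narrower channel.
fully_packed = total_area(clb_count=100, channel_width=90)
depopulated = total_area(clb_count=130, channel_width=60)

print(fully_packed, depopulated)
```

Under these assumed numbers the depopulated mapping uses 30% more CLBs yet occupies less total area, because every tile shrinks. Whether that holds for a real device depends on the actual tile-size model.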

Page 8: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Logic Element: BLE and CLB

• Basic Logic Element (BLE)
– ‘k’-input LUT + FF

• Clustered Logic Block (CLB)
– ‘N’ BLEs, ‘N’ outputs
– ‘I’ shared inputs

[Figure: CLB with ‘I’ Inputs and ‘N’ Outputs containing BLE #1 to BLE #5; CLBs appear as the logic tiles (L) of the mesh]

Note: I < k*N

Page 9: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

CLB Depopulation

• Normally: CLBs fully packed
– Reduces total # of CLBs needed for circuit

• CLB Depopulation: Tessier, DeHon
– Do not use all BLEs
– Increase # CLBs used
– Decrease channel width
– Decrease overall area

• Problem
– Increase in # CLBs high for large circuits
– Our work: limits # CLB increase

[Figure: CLB with ‘I’ Inputs and ‘N’ Outputs containing BLE #1 to BLE #5]

Page 10: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Uniform Depopulation

• Previous work
– Depopulate each CLB by equal amount

• But... circuit observations
– regions of high routing demand
– regions of low routing demand

• Depopulate in low congestion areas ??
– Unnecessary increase in area

Page 11: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Non-Uniform Depopulation

• Our depopulation method:
– Assume congestion is localized
– Depopulate only congested areas

• We show non-uniform depopulation
– Effective method of channel width reduction
– Graceful tradeoff between channel width and area
– Makes unroutable circuits routable

Page 12: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Depopulation Methods to Reduce Channel Width

Page 13: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

CLB Depopulation

• General Approach
– Use existing clustering tools
– Do not fill CLB while clustering

1. Input-Limited
• E.g. maximum 67% input utilization per CLB
• Might use all BLEs

2. BLE-Limited
• E.g. maximum 60% BLE utilization per CLB
• Might use all Inputs

[Figure: CLB with ‘I’ Inputs and ‘N’ Outputs containing BLE #1 to BLE #5]
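The two depopulation modes above can be illustrated with a toy packer. This is a minimal sketch, not the clustering tool used in the talk: `pack` is a hypothetical first-fit routine, the CLB size (N = 5 BLEs, I = 12 inputs) is assumed for the example, and the model counts every BLE input as a CLB input (it ignores signals generated inside the same CLB).

```python
def pack(bles, max_bles, max_inputs):
    """Greedy first-fit packing (toy model, assumed for illustration).

    bles       -- list of sets; each set holds the input nets of one BLE
    max_bles   -- BLE limit per CLB
    max_inputs -- limit on distinct input nets per CLB
    """
    clusters = []  # each cluster: {"bles": [indices], "inputs": set of nets}
    for idx, nets in enumerate(bles):
        for cluster in clusters:
            merged = cluster["inputs"] | nets
            if len(cluster["bles"]) < max_bles and len(merged) <= max_inputs:
                cluster["bles"].append(idx)
                cluster["inputs"] = merged
                break
        else:
            # No existing CLB can take this BLE: open a new one.
            clusters.append({"bles": [idx], "inputs": set(nets)})
    return clusters


# Ten BLEs, each with two private input nets (hypothetical netlist).
bles = [{f"n{2 * i}", f"n{2 * i + 1}"} for i in range(10)]

full = pack(bles, max_bles=5, max_inputs=12)          # fully packed CLBs
ble_limited = pack(bles, max_bles=3, max_inputs=12)   # 60% of 5 BLEs
input_limited = pack(bles, max_bles=5, max_inputs=8)  # about 67% of 12 inputs

print(len(full), len(ble_limited), len(input_limited))
```

On this toy netlist both limits raise the CLB count relative to full packing, which is the cost that depopulation pays for a lower routing demand per CLB.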

Page 14: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Reducing Channel Width Results (max cluster size 16)

• Input-Limited
– No channel width control

• BLE-Limited
– (almost) monotonically increasing: good channel width control

[Figure: Routed Channel Width (30 to 90) for clma. Bottom axis: Cluster Size (BLE-Limit), 1 to 17. Top axis: Number of Inputs (Input-Limit), 6 to 54. Series: Input-limited clma, BLE-Limited clma.]

Page 15: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Benchmark Circuit Creation

(We want BIG circuits!)

(What do REALLY BIG circuits look like?)

Page 16: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Benchmarking Circuits: Some Observations

• Altera has bigger benchmarks than academics
– We noted similar characteristics:
• Some LARGE circuits routable with NARROW routing channels
• Some SMALL circuits need WIDE routing channels

                    | 20 Largest MCNC Benchmarks | Altera Cyclone Benchmarks [CICC 2003]
LUT Range           | 10:1 (1,000..10,000 LUTs)  | 10:1 (2,500..25,000 LUTs)
Channel Width Range | 4:1 (20..80 tracks)        | 3:1 (40..120 tracks)

• What if each circuit is IP Block in larger system... ??

Page 17: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Benchmark Creation – IP Blocks

• Real-life large designs: System-on-Chip Methodology
– IP blocks (own, 3rd party)
• Re-use improves productivity
– Primarily integration and verification effort

• Mimic process of creating large designs
– “IP Blocks” <==> MCNC Circuits
– SoC <==> Randomly integrate/stitch together “IP Blocks”
– IP Blocks have varied interconnect needs

Page 18: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Benchmark Creation – Large Designs

• Considered 3 stitching schemes...

– Independent
• IP Blocks are not connected to each other

– Pipeline
• Outputs of one IP block connected to inputs of next IP block

– Clique
• Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
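The three stitching schemes can be written down as block-level connection lists. The sketch below is an assumed illustration: `stitch` is a hypothetical helper, and it only records which block feeds which, not how individual output pins are assigned to input pins.

```python
def stitch(blocks, scheme):
    """Return (source, destination) block pairs for a stitching scheme."""
    if scheme == "independent":
        # IP blocks are not connected to each other.
        return []
    if scheme == "pipeline":
        # Outputs of one block feed the inputs of the next block.
        return [(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]
    if scheme == "clique":
        # Outputs of each block are spread over all other blocks.
        return [(src, dst) for src in blocks for dst in blocks if src != dst]
    raise ValueError(f"unknown stitching scheme: {scheme}")


ip_blocks = ["alu4", "tseng", "clma"]
for scheme in ("independent", "pipeline", "clique"):
    print(scheme, stitch(ip_blocks, scheme))
```

For three blocks this gives no connections, two connections and six connections respectively, so the clique scheme puts the most inter-block routing demand on the MetaCircuit.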

Page 19: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

MetaCircuit: Reducing Routed Channel Width?

• Observations

– IP blocks are tightly-connected internally
– IP blocks have varied channel width needs

• Hypotheses

1. Placement keeps each “IP block” together

2. IP block has large routed channel width ==> MetaCircuit has large routed channel width

Page 20: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Hypothesis Testing: MetaCircuit P&R Results

• Use VPR FPGA tools from University of Toronto

• Hypothesis 1
– VPR placer successfully groups IP blocks from random initial placement

• Hypothesis 2
– VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

Page 21: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Consequences of Hypothesis 2

• Question
– Shrink channel width of a few IP blocks ==> shrink channel width of MetaCircuit?

• How to shrink channel widths?
– Selective CLB Depopulation !!
– Depopulate hard-to-route IP blocks the most

• How much to depopulate?
– Channel width profiling of IP block...

Page 22: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Meeting Channel Width Constraints: Selective Depopulation

• Step 1: Channel Width Profiling of IP Blocks (Congestion Estimation)

• Step 2: Re-cluster Only Congested IP Blocks (Selective Depopulation)

Page 23: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

IP Block Properties

• Cluster IP Blocks into N=16, k=6
• VPR: determine minimum channel width for each IP Block
• Sort IP Blocks based on channel width

[Figure: bar chart of Channel Width (0 to 90) for IP Blocks, sorted by Channel Width, from Easy-to-Route Circuits to Hard-to-Route Circuits: alu4, s298, tseng, misex3, s38417, ex5p, s38584, apex2, seq, diffeq, apex4, dsip, bigkey, des, spla, frisc, clma, ex1010, pdc, elliptic]

Page 24: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Channel Width Profiling of IP Block

• Cluster sizes
– NA = FPGA Architecture Cluster Size (fixed)
– NC = BLE-Limit Size (variable)

• Sweep NC for each IP block

[Figure: Routed Channel Width (10 to 70) vs. BLE-Limit Size, NC (1 to 17), for clma and tseng]

Page 25: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Analysis with Constraint

• Given channel-width constraint of 60 tracks
– tseng routable (easy)
– clma routable for NC <= 10
– clma not routable for NC > 10

[Figure: Routed Channel Width (10 to 70) vs. BLE-Limit Size, NC (1 to 17), for clma and tseng]
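The analysis above amounts to reading the largest routable NC off each profile. The sketch below assumes a profile is a mapping from NC to routed channel width. The numbers in `profiles` are hypothetical placeholders, chosen only to agree with the statements on this slide (clma routable up to NC = 10 at 60 tracks, tseng routable everywhere); they are not measured results.

```python
def max_routable_nc(profile, constraint):
    """Largest BLE-limit NC whose routed channel width meets the constraint.

    profile    -- dict mapping NC to routed channel width (tracks)
    constraint -- channel width available in the FPGA (tracks)
    Returns None when no profiled NC is routable.
    """
    routable = [nc for nc, width in profile.items() if width <= constraint]
    return max(routable) if routable else None


# Hypothetical profiles (illustrative values only).
profiles = {
    "clma": {8: 52, 9: 55, 10: 59, 11: 62, 12: 64, 16: 68},
    "tseng": {8: 30, 10: 32, 12: 34, 16: 38},
}

for name, profile in profiles.items():
    print(name, max_routable_nc(profile, constraint=60))
```

The function scans every profiled NC rather than stopping at the first failure, since the BLE-limited curve is described as only almost monotonic.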

Page 26: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Our Technique: Selective Depopulation

• Step 1: Channel Width Profiling of IP Blocks (Congestion Estimation)

• Step 2: Re-cluster Only Congested IP Blocks (Selective Depopulation)

Page 27: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Uniform Depopulation

• Minimum NC Cluster Size
– Depopulate all clusters equally
– E.g. use NC=10 for both IP Blocks

[Figure: Routed Channel Width (10 to 70) vs. Cluster Size, NC (1 to 17), for clma and tseng]

Page 28: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Non-Uniform Depopulation

• Maximal NC Cluster Size
– Depopulate each IP block according to maximal cluster size
– E.g. clma NC=10, tseng NC=16

[Figure: Routed Channel Width (10 to 70) vs. Cluster Size, NC (1 to 17), for clma and tseng]
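The difference between the two policies can be sketched as a CLB count. The example is a rough model under stated assumptions: the BLE counts per block are hypothetical round numbers, each block is assumed to pack into exactly ceil(BLEs / NC) CLBs, and NA = 16 is the architecture cluster size. Only the NC choices (clma NC=10, tseng NC=16) come from the slides.

```python
import math

NA = 16  # architecture cluster size (BLEs physically present per CLB)


def clbs_needed(ble_counts, nc_per_block):
    """CLBs used when each block is packed with its own BLE limit NC
    (idealised: assumes every CLB of a block holds exactly NC BLEs)."""
    return sum(math.ceil(ble_counts[b] / nc_per_block[b]) for b in ble_counts)


def lut_utilization(ble_counts, nc_per_block):
    """Fraction of the BLEs in the occupied CLBs that are actually used."""
    used = sum(ble_counts.values())
    return used / (clbs_needed(ble_counts, nc_per_block) * NA)


ble_counts = {"clma": 8000, "tseng": 1000}  # hypothetical block sizes
max_nc = {"clma": 10, "tseng": 16}          # largest routable NC per block

uniform = {b: min(max_nc.values()) for b in max_nc}  # NC=10 everywhere
non_uniform = dict(max_nc)                           # NC chosen per block

print(clbs_needed(ble_counts, uniform), clbs_needed(ble_counts, non_uniform))
print(lut_utilization(ble_counts, uniform), lut_utilization(ble_counts, non_uniform))
```

With these assumed sizes the non-uniform policy needs fewer CLBs and reaches a higher LUT utilization, because only the hard-to-route block pays the depopulation cost.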

Page 29: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Uniform vs. Non-Uniform

• Non-Uniform depopulation better than Uniform
– Lower CLB count
– Higher LUT utilization

[Figure: Total CLBs Needed (x 1,000; 2 to 20) vs. Channel Width Constraint (40 to 100); series: Uniform, Non-Uniform]

[Figure: LUT Utilization (0 to 1) vs. Channel Width Constraint (40 to 100); series: Uniform, Non-Uniform]

Page 30: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

MetaCircuit Clustering Results

• Depopulate the most-congested IP blocks

– BLE-Limit of each IP block shown (max = 16)

– Some IP blocks are depopulated more than others

Page 31: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

MetaCircuit P&R Results

• Clique MetaCircuit
– P&R channel width results closely match “constraints”

• Shrink Channel Width
– by ~20% (from 95 to 75), NO AREA INCREASE
– by ~50% (from 95 to 50), 1.7x area increase

[Figure: Normalized Area (0.8 to 2) vs. Channel Width Constraint (40 to 100)]

[Figure: Channel Width (40 to 100) vs. Channel Width Constraint (40 to 100); series: Constraint, Routed]
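The rounded percentages on this slide can be checked from the quoted track counts. The helper below is a small illustrative calculation; `percent_reduction` is a name introduced here, not something from the talk.

```python
def percent_reduction(before, after):
    """Relative reduction, in percent, when going from `before` to `after`."""
    return 100.0 * (before - after) / before


small_cut = percent_reduction(95, 75)  # channel width 95 -> 75 tracks
large_cut = percent_reduction(95, 50)  # channel width 95 -> 50 tracks

print(round(small_cut, 1), round(large_cut, 1))
```

The exact values are about 21% and about 47%, which the slide rounds to ~20% and ~50%.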

Page 32: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Other MetaCircuit Results

Channel Width Decreases

Circuit      | Clustering Tool | ( < 1.05 x Area ) | ( 1.7 x – 3.5 x Area )
Clique       | T-VPack         | 20%               | 50%
Clique       | iRAC Rep.       | 7%                | 29%
Independent* | T-VPack         | 24%               | 42%
Independent* | iRAC Rep.       | 27%               | 30%
Pipeline*    | T-VPack         | 25%               | 55%
Pipeline*    | iRAC Rep.       | 11%               | 27%

* These latest results are better than those given in paper

Page 33: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Critical Path Delay and Average Wirelength

• Expect critical path delay to increase under tighter constraints
– Delay “noise” due to instability of floorplan locations

• Average wirelength / net increases under tighter constraints

[Figure: Critical Path (ns), 23 to 30, and Avg. Routed Wirelength per Net, 13 to 20, vs. Channel Width Constraint (40 to 100); series: Critical Path, Avg. WL/Net]

Page 34: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Conclusion

• System-level technique to map large System-on-Chip (SoC) designs to channel-width constrained FPGAs using fewer routing resources

• Depopulating CLBs effective at reducing channel width

• Non-uniform depopulation important to limit area inflation

• Channel width reduced
– by 0-20% with < 5% area increase
– by up to 50% with 3.3 X area increase

• Effective solution to trade-off CLBs for Interconnect !!!
– UNROUTABLE circuits (channel width TOO LARGE) can be made ROUTABLE (reduced channel width) by buying an FPGA with MORE LOGIC!!!

Page 35: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

End of Talk

Page 36: Clustering of  Large Designs  for Channel-Width Constrained  FPGAs

Future Work

• Real-Life SoC Benchmark
– Licensed IP: Bluetooth baseband processor
– 325,000 ASIC gates
– Numerous IP blocks of varying complexity
– Needed to authenticate “Synthetic” results

• Automated technique to find “hard” IP blocks
– Granularity is based on design hierarchy (?)
– Replaces time-consuming Step 1 of process