Exploration of Pipelined FPGA Interconnect Structures Scott Hauck Akshay Sharma, Carl Ebeling...

Exploration of Pipelined FPGA Interconnect Structures

Scott Hauck, Akshay Sharma, Carl Ebeling, University of Washington

Katherine Compton, University of Wisconsin - Madison


PipeRoute

• FPGA’2003: Pipelining-aware Router for FPGAs

• Architecture-adaptive, based on Pathfinder

• Uses optimal 2-terminal, 1-delay router

• Greedy formulation for multi-delay, multi-terminal routing

[Figure: routing example with source S and terminals T1, T2]

RaPiD

• Coarse-grained, 1D, 16-bit, w/DSP Units

• Carl Ebeling @ UW-CSE

• Pipelined interconnect via Bus Connectors (BCs)

[Figure: RaPiD cell with GPR, RAM, MULT, and ALU units]

Pipelined Routing Results

• Area expansion due to pipelining

• Normalized to unpipelined circuit area

[Figure: normalized area (0 to 3) vs. % pipelined signals (0% to 70%)]

Ave: 75% cost

Contributions

• Optimized PipeRoute

• Support multiple delays per BC (greedy preprocessor)

• Timing-driven: Pathfinder’s formulation, using the worst-case criticality across the signal

• RouteCost = criticality * delay_cost + (1 - criticality) * area_cost

• Architecture Exploration of RaPiD Pipelined Interconnects

• Registered logic block (input/output/none)

• BC track length

• Delays per register/BC

• BC/non-BC routing mix

• Register-only logic blocks

• Goal: More efficient support of pipelined interconnects
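The RouteCost expression in the list above can be written out as a small function. This is a minimal sketch: the function names are hypothetical, and taking the maximum over per-sink criticalities is an assumed reading of "worst-case criticality across the signal", not something the slides spell out.

```python
def signal_criticality(sink_criticalities):
    """Assumed interpretation: a signal takes the largest criticality of its sinks."""
    return max(sink_criticalities)


def route_cost(criticality, delay_cost, area_cost):
    """RouteCost = criticality * delay_cost + (1 - criticality) * area_cost."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


# A fully critical signal pays only for delay; a non-critical one only for area.
crit = signal_criticality([0.25, 0.5])
print(route_cost(crit, delay_cost=4.0, area_cost=2.0))  # 3.0
```

With criticality near 1 the router favors fast routes, and with criticality near 0 it favors routes that use few or uncongested resources.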


Methodology

• Benchmarks

• Retimed, not C-slowed

• Graphs

• Increase arch to fit (cells, tracks/cell)

• Variation around local minima

[Figure: AREA, DELAY, and AREA*DELAY vs. delays per BC/Reg (1 to 7)]

[Figure: % pipelined signals (0% to 80%) per netlist]

Registers in Logic Blocks

• Output Registers

• No Registers

• Input Registers

[Figure: AREA and DELAY vs. regs in functional units (Out, None, In)]

[Figure: routing example with source S and terminals T1, T2]

5% 20% 23%

Delays per Register/BC

• 1 Delay/BC

• 2 Delays/BC

[Figure: AREA and DELAY vs. delays per BC/Reg (1 to 7)]

15% 20% 30%

BC Track Length

• Length 16 BC wires

• Length 8 BC wires

[Figure: AREA and DELAY vs. BC track length (32, 16, 8, 4)]

17% 64% 69%

Routing Resource Mix (BC vs. non-BC)

• 5/7

• 7/7

[Figure: AREA and DELAY vs. proportion of BC tracks (7/7, 6/7, 5/7, 4/7, 3/7)]

19% 17% 18%

GPRs per Cell

• GPR roles:

• Registers from computation

• Passthrough for changing tracks

• 6 per cell

• 9 per cell

[Figure: AREA and DELAY vs. GPRs per cell (5 to 10)]

6% 23% 22%

Overall – vs. RaPiD-I

• RaPiD-I

• 1 BC / cell (13 LBs long)

• 5/7 BC tracks

• 3 registers / BC

• 6 GPRs / cell

• registered outputs

• Post-Explore

• 1 BC / cell (16 LBs long)

• 5/7 BC tracks

• 3 registers / BC

• 9 GPRs / cell

• registered inputs

[Figure: ratio Post-Explore/RaPiD-I (0 to 1.4) per benchmark: firtm, fft16, cascade, matmult4, sobel, imagerapid, firsymeven, sort_g, sort_rb]

Ave: 1% 18% 19%

Overall – Pipelining Cost

[Figure: normalized area (0 to 1.6) vs. % pipelined signals (0% to 80%)]

Ave: 18% cost

Conclusions

• Router for arbitrary pipelined architectures

• Timing-driven

• Supports multiple delays at each register site

• Good quality: area within 18% of the pseudo-lower bound (non-pipelined area) on average

• Architecture Exploration of RaPiD

• Parameters:

• Registered inputs on functional units

• Length 16 wires

• 3 delays per BC/register

• 2/7 non-registered, 5/7 registered wires

• 9 GPRs/cell to improve flexibility

• Delay: spacing of registers CRITICAL, too close better than too far

• 19% area*delay improvement over RaPiD-I (primarily delay)


*** End of Talk Marker ***


1-Delay Two Terminal

• Can do optimal routing for 1-delay routes via BFS
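One way to picture a BFS-based 1-delay search is a breadth-first search over (node, delays-so-far) states, so each node can be reached once before and once after a register. The sketch below is an illustration under stated assumptions, not PipeRoute's actual implementation: the graph model, the `registers` set, and the function name are hypothetical, every edge has unit cost, and it does not prevent the same node from appearing both before and after the register.

```python
from collections import deque


def one_delay_route(graph, registers, source, sink):
    """Return a fewest-hop path from source to sink that crosses exactly one register.

    graph     -- dict mapping node -> list of successor nodes (hypothetical model)
    registers -- set of nodes that add one delay when a route enters them
    Assumes source and sink are not register nodes.
    """
    start = (source, 0)               # state = (node, delays picked up so far)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node, delays = queue.popleft()
        if node == sink and delays == 1:
            # Walk parent links back to the source to rebuild the path.
            path, state = [], (node, delays)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt in graph.get(node, []):
            new_delays = delays + (1 if nxt in registers else 0)
            if new_delays > 1:
                continue              # a 1-delay route may cross only one register
            state = (nxt, new_delays)
            if state not in parent:
                parent[state] = (node, delays)
                queue.append(state)
    return None


example_graph = {"S": ["a", "r"], "a": ["T"], "r": ["b"], "b": ["T"], "T": []}
print(one_delay_route(example_graph, {"r"}, "S", "T"))  # ['S', 'r', 'b', 'T']
```

The shorter path S, a, T is rejected because it picks up no delay; the search returns the shortest path that passes through the register node r.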

[Figure sequence: 1-delay routing example between S and T]

N-Delay Two Terminal

N-Delay Two Terminal

• Greedy Approximation via 1-Delay Router

• Find 1-delay route

• While not enough delay on route

• Replace any 0-delay segment with cheapest 1-delay replacement
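The greedy steps above can be sketched as follows. This is a simplified illustration, not the authors' code: all names are hypothetical, cost is measured in hops rather than Pathfinder costs, the graph is assumed to be directed and acyclic, and the source and sink are assumed not to be register nodes.

```python
from collections import deque


def one_delay_segment(graph, registers, src, dst, blocked=frozenset()):
    """Fewest-hop path src -> dst with exactly one register strictly inside it."""
    start = (src, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node, delays = queue.popleft()
        if node == dst:
            if delays == 1:
                path, state = [], (node, delays)
                while state is not None:
                    path.append(state[0])
                    state = parent[state]
                return path[::-1]
            continue                      # never route through the endpoint
        for nxt in graph.get(node, []):
            if nxt in blocked:
                continue                  # resource already used elsewhere on the route
            new_delays = delays + (1 if nxt in registers and nxt != dst else 0)
            state = (nxt, new_delays)
            if new_delays <= 1 and state not in parent:
                parent[state] = (node, delays)
                queue.append(state)
    return None


def n_delay_route(graph, registers, source, sink, n):
    """Greedy approximation: start from a 1-delay route, then add delays one by one."""
    route = one_delay_segment(graph, registers, source, sink)
    while route is not None and sum(v in registers for v in route) < n:
        # Cut the route at its registers; each piece is a 0-delay segment.
        cuts = [0] + [i for i, v in enumerate(route) if v in registers] + [len(route) - 1]
        best = None
        for a, b in zip(cuts, cuts[1:]):
            blocked = frozenset(route) - frozenset(route[a:b + 1])
            repl = one_delay_segment(graph, registers, route[a], route[b], blocked)
            if repl is not None:
                growth = len(repl) - (b - a + 1)   # extra hops this replacement costs
                if best is None or growth < best[0]:
                    best = (growth, a, b, repl)
        if best is None:
            return None                   # no segment can absorb another delay
        _, a, b, repl = best
        route = route[:a] + repl + route[b + 1:]
    return route


g = {"S": ["a", "r1"], "r1": ["a"], "a": ["T", "r2"], "r2": ["T"], "T": []}
print(n_delay_route(g, {"r1", "r2"}, "S", "T", 2))  # ['S', 'r1', 'a', 'r2', 'T']
```

Because each pass commits to the locally cheapest replacement, the result is an approximation and may miss a cheaper overall N-delay route.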

[Figure sequence: N-delay routing example between S and T]
