Exploration of Pipelined FPGA Interconnect Structures
Scott Hauck, Akshay Sharma, Carl Ebeling
University of Washington
Katherine Compton
University of Wisconsin - Madison
PipeRoute
• FPGA’2003: Pipelining-aware Router for FPGAs
• Architecture-adaptive, based on Pathfinder
• Uses optimal 2-terminal, 1-delay router
• Greedy formulation for multi-delay, multi-terminal routing
[Figure: routing example with nodes S, T1, and T2]
RaPiD
• Coarse-grained, 1D, 16-bit, w/DSP Units
• Carl Ebeling @ UW-CSE
• Pipelined interconnect via Bus Connectors (BCs)
[Figure: RaPiD cell with GPR, RAM, MULT, and ALU units]
Pipelined Routing Results
• Area expansion due to pipelining
• Normalized to unpipelined circuit area
[Figure: normalized area (0 to 3) vs. % pipelined signals (0% to 70%)]
Ave: 75% cost
Contributions
• Optimized PipeRoute
• Support multiple delays per BC (greedy preprocessor)
• Timing driven – Pathfinder’s, worst-case criticality across signal
• RouteCost = Criticality * delay_cost + (1-criticality) * area_cost
• Arch. Exploration of RaPiD Pipelined Interconnects
• Registered logic block (input/output/none)
• BC track length
• Delays per register/BC
• BC/non-BC routing mix
• Register-only logic blocks
• Goal: More efficient support of pipelined interconnects
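The RouteCost bullet above can be sketched in a few lines. This is only an illustration of the formula as written on the slide: the function names are hypothetical, and taking the maximum over per-sink criticalities is an assumed reading of "worst-case criticality across signal", not a confirmed detail of the router.

```python
def route_cost(criticality: float, delay_cost: float, area_cost: float) -> float:
    """Blend delay and area cost by criticality, following the RouteCost bullet."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


def signal_criticality(sink_criticalities):
    """Assumed reading of 'worst-case criticality across signal':
    one criticality per signal, taken as the largest over its sinks."""
    return max(sink_criticalities)


# A fully critical signal pays only for delay; a non-critical one only for area.
crit = signal_criticality([0.2, 0.9, 0.5])
cost = route_cost(crit, delay_cost=4.0, area_cost=2.0)
```

With criticality near 1 the router is steered toward fast resources; with criticality near 0 it is steered toward cheap (uncongested) ones.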
Methodology
• Benchmarks
• Retimed, not C-slowed
• Graphs
• Increase arch to fit (cells, tracks/cell)
• Variation around local minima
[Figure: AREA, DELAY, and AREA*DELAY vs. delays per BC/Reg (1 to 7)]
[Figure: % pipelined signals per netlist (0% to 80%)]
Registers in Logic Blocks
• Output Registers
• No Registers
• Input Registers
[Figure: AREA, DELAY, and AREA*DELAY vs. regs in functional units (Out, None, In)]
AREA   DELAY   AREA*DELAY
5%     20%     23%
Delays per Register/BC
• 1 Delay/BC
• 2 Delays/BC
[Figure: AREA, DELAY, and AREA*DELAY vs. delays per BC/Reg (1 to 7)]
AREA   DELAY   AREA*DELAY
15%    20%     30%
BC Track Length
• Length 16 BC wires
• Length 8 BC wires
[Figure: AREA, DELAY, and AREA*DELAY vs. BC track length (32, 16, 8, 4)]
AREA   DELAY   AREA*DELAY
17%    64%     69%
Routing Resource Mix (BC vs. non-BC)
• 5/7
• 7/7
[Figure: AREA, DELAY, and AREA*DELAY vs. proportion of BC tracks (7/7, 6/7, 5/7, 4/7, 3/7)]
AREA   DELAY   AREA*DELAY
19%    17%     18%
GPRs per Cell
• GPR roles:
• Registers from computation
• Passthrough for changing tracks
• 6 per cell
• 9 per cell
[Figure: AREA, DELAY, and AREA*DELAY vs. GPRs per cell (5 to 10)]
AREA   DELAY   AREA*DELAY
6%     23%     22%
Overall – vs. RaPiD-I
• RaPiD-I
• 1 BC / cell (13 LBs long)
• 5/7 BC tracks
• 3 registers / BC
• 6 GPRs / cell
• registered outputs
• Post-Explore
• 1 BC / cell (16 LBs long)
• 5/7 BC tracks
• 3 registers / BC
• 9 GPRs / cell
• registered inputs
[Figure: ratio Post-Explore / RaPiD-I (0 to 1.4) for AREA, DELAY, and AREA*DELAY, per benchmark: firtm, fft16, cascade, matmult4, sobel, imagerapid, firsymeven, sort_g, sort_rb]
       AREA   DELAY   AREA*DELAY
Ave:   1%     18%     19%
Overall – Pipelining Cost
[Figure: normalized area (0 to 1.6) vs. % pipelined signals (0% to 80%)]
Ave: 18% cost
Conclusions
• Router for arbitrary pipelined architectures
• Timing-driven
• Supports multiple delays at each register site
• Good quality: area less than 18% above the pseudo-lower bound (non-pipelined area)
• Architecture Exploration of RaPiD
• Parameters:
• Registered inputs on functional units
• Length 16 wires
• 3 delays per BC/register
• 2/7 non-registered, 5/7 registered wires
• 9 GPRs/cell to improve flexibility
• Delay: spacing of registers CRITICAL, too close better than too far
• 19% area*delay improvement over RaPiD-I (primarily delay)
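As a rough consistency check on the last bullet, the averages reported against RaPiD-I (1% area, 18% delay) can be combined into an area*delay figure. The sketch below assumes the two improvements multiply as ratios; since the slide reports averages over benchmarks, the average of per-benchmark products need not equal the product of the averages, so this is an approximation only. The function name is hypothetical.

```python
def area_delay_improvement(area_gain: float, delay_gain: float) -> float:
    """Combine fractional area and delay improvements into an area*delay improvement.

    Assumes the improvements act as independent ratios, which is only
    approximately true for averages taken across benchmarks.
    """
    ratio = (1.0 - area_gain) * (1.0 - delay_gain)
    return 1.0 - ratio


improvement = area_delay_improvement(0.01, 0.18)
print(f"{improvement:.1%}")  # about 18.8%, in line with the reported 19%
```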
*** End of Talk Marker ***
1-Delay Two Terminal
• Can do optimal routing for 1-delay routes via BFS
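One way to picture a 1-delay search is a cost-ordered breadth-first search over (node, phase) states, where phase 0 means no register has been picked up yet and phase 1 means exactly one has. The sketch below is an illustration under stated assumptions, not the PipeRoute implementation: the graph encoding (`adj`, `cost`, `registers`) and the function name are hypothetical, register sites are assumed to be bypassable, and the search does not stop a route from visiting the same physical node in both phases, which a real router would have to prevent.

```python
import heapq


def one_delay_route(adj, cost, registers, source, sink):
    """Cheapest source->sink path that picks up exactly one register.

    adj:       dict mapping node -> list of successor nodes
    cost:      dict mapping node -> non-negative cost of using that node
    registers: set of nodes that can contribute one delay (assumed bypassable)
    Returns (total_cost, path) or None if no 1-delay route exists.
    Nodes must be mutually comparable (e.g. all strings) for heap tie-breaking.
    """
    start = (source, 0)
    best = {start: cost[source]}
    parent = {start: None}
    heap = [(cost[source], source, 0)]
    while heap:
        c, node, phase = heapq.heappop(heap)
        if c > best[(node, phase)]:
            continue  # stale heap entry
        if node == sink and phase == 1:
            path, state = [], (node, phase)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return c, path[::-1]
        for nxt in adj.get(node, ()):
            # In phase 0, a register node may be passed through (stay in
            # phase 0) or used as the single delay (move to phase 1).
            phases = [0, 1] if phase == 0 and nxt in registers else [phase]
            for p in phases:
                nc = c + cost[nxt]
                if nc < best.get((nxt, p), float("inf")):
                    best[(nxt, p)] = nc
                    parent[(nxt, p)] = (node, phase)
                    heapq.heappush(heap, (nc, nxt, p))
    return None


# The shortest plain path S-a-T has no register, so the 1-delay route detours via r.
adj = {"S": ["a", "r"], "a": ["T"], "r": ["b"], "b": ["T"], "T": []}
cost = {n: 1 for n in adj}
print(one_delay_route(adj, cost, {"r"}, "S", "T"))  # (4, ['S', 'r', 'b', 'T'])
```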
N-Delay Two Terminal
• Greedy Approximation via 1-Delay Router
• Find 1-delay route
• While not enough delay on route
• Replace any 0-delay segment with cheapest 1-delay replacement
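The greedy steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `one_delay_path`, the graph encoding, and the rule of picking the segment whose replacement adds the least cost are all assumptions made for the example. It also assumes n >= 1, bypassable register sites, and routes in which each node appears once.

```python
import heapq


def one_delay_path(adj, cost, registers, src, dst, blocked=frozenset()):
    """Cheapest src->dst path using exactly one register from `registers`.
    Returns (cost, path, register_used) or None."""
    best = {(src, 0): cost[src]}
    parent = {(src, 0): None}
    heap = [(cost[src], src, 0)]
    while heap:
        c, node, phase = heapq.heappop(heap)
        if c > best[(node, phase)]:
            continue
        if node == dst and phase == 1:
            states, s = [], (node, phase)
            while s is not None:
                states.append(s)
                s = parent[s]
            states.reverse()
            reg = next(n for n, p in states if p == 1)
            return c, [n for n, _ in states], reg
        for nxt in adj.get(node, ()):
            if nxt in blocked:
                continue
            for p in ([0, 1] if phase == 0 and nxt in registers else [phase]):
                nc = c + cost[nxt]
                if nc < best.get((nxt, p), float("inf")):
                    best[(nxt, p)] = nc
                    parent[(nxt, p)] = (node, phase)
                    heapq.heappush(heap, (nc, nxt, p))
    return None


def n_delay_route(adj, cost, registers, src, dst, n):
    """Greedy n-delay route: start from a 1-delay route, then repeatedly swap
    one 0-delay segment for its cheapest 1-delay replacement."""
    first = one_delay_path(adj, cost, registers, src, dst)
    if first is None:
        return None
    _, path, reg = first
    used = [reg]
    while len(used) < n:
        # Segment boundaries: source, every register already used, sink.
        anchors = [0] + sorted(path.index(r) for r in used) + [len(path) - 1]
        best = None
        for i, j in zip(anchors, anchors[1:]):
            seg = path[i:j + 1]
            blocked = set(path) - set(seg)  # keep the rest of the route intact
            found = one_delay_path(adj, cost, registers - set(used),
                                   seg[0], seg[-1], blocked)
            if found is None:
                continue
            new_cost, new_seg, new_reg = found
            delta = new_cost - sum(cost[v] for v in seg)
            if best is None or delta < best[0]:
                best = (delta, i, j, new_seg, new_reg)
        if best is None:
            return None  # no segment can take another delay
        _, i, j, new_seg, new_reg = best
        path = path[:i] + new_seg + path[j + 1:]
        used.append(new_reg)
    return path, used


adj = {"S": ["a", "r1"], "a": ["T"], "r1": ["T", "r2"], "r2": ["T"], "T": []}
cost = {v: 1 for v in adj}
print(n_delay_route(adj, cost, {"r1", "r2"}, "S", "T", 2))
# (['S', 'r1', 'r2', 'T'], ['r1', 'r2'])
```

Because each step commits to the current route and only patches one segment, the result is an approximation: a different initial 1-delay route could lead to a cheaper n-delay route overall.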