T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve...
Transcript of T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve...
![Page 1: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/1.jpg)
Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth
Funded by Altera and NSERC
Joydip Das1, Steven J.E. Wilton1, Philip Leong2, Wayne Luk3
1The University of British Columbia, 2The Chinese University of Hong Kong,
3Imperial College London
2
FPGA Architecture Design
Architectures are usually evaluated using experimental methods
- Using tools like VPR
Problems:- Multi-dimensional optimization space – too much time- Require CAD tools for each architecture
- Or “tuning” of a generic tool like VPR- No insight into what makes a good architecture
![Page 2: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/2.jpg)
3
Can we supplement the experimental approach with analytical techniques?
This talk: A model that makes it possible
4
This Talk1. Motivation: Speeding up architecture design
2. The model: - What makes a good model- What makes it hard- Overview
3. Details on Depth/Delay Model
4. Example of our Model’s Application
![Page 3: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/3.jpg)
5
Accelerating FPGA Architecture Investigation
Early Architecture Evaluation
Insight to Guide Experimentation
6
Analytical ModelThe key is an analytical model that relates architecture parameters to efficiency of the FPGA:
Delay of FPGA Implementation
Depth of Critical Path in 2-LUTs
Lookup-table size,Routing parameters, etc
Area= fA (K,N,Fc ....)Delay = fD (K,N,Fc,....)Power = fP (K,N,Fc ....)
![Page 4: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/4.jpg)
7
Challenge: Capturing the “essence” of programmable logicin a set of simple equations
8
What makes a good model?
1. Analytical Model– No curve-fitting or expensive experimental techniques
2. Balancing Complexity and Accuracy– Simpler equations provide significantly more insight
into architectural tradeoffs
3. Architectural Relationships– Should be as independent of user circuit as possible
![Page 5: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/5.jpg)
9
Estimation vs. Modeling
Estimation: What is the performance or density of a given user circuit on an FPGA?
- Useful in CAD tools to predict long paths, congestedregions, etc.
Modeling: On average, how does an architecture parameter affect the expected speed or density of an FPGA?
- Can we answer this independent of the user circuit?
10
What makes it hard:- Many parameters that interact in complex ways
How we make it feasible:- Break the model into stages, analogous to CAD flow
![Page 6: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/6.jpg)
11
Breaking it up: Delay
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
K, p
12
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
K, p N, I, p
Breaking it up: Delay
![Page 7: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/7.jpg)
13
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
K, p N, I, p
Breaking it up: Delay
14
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
K, p N, I, p
Fc, Fs, etc
Breaking it up: Delay
![Page 8: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/8.jpg)
15
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
K, p N, I, p
Fc, Fs, etcAll arch. params
Breaking it up: Delay
16
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
Each part is simple, but together, theyrelate delay-efficiency to architectural parameters
K, p N, I, p
Fc, Fs, etcAll arch. params
Breaking it up: Delay
![Page 9: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/9.jpg)
17
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
Tech MappingClustering
RoutingPhysical Models
This Paper:
18
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
K, p
Technology Mapping Model
![Page 10: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/10.jpg)
19
Review: Technology Mapping
Most algorithms give implementations of minimum depth
Map logic gates to lookup-tables:
20
Modeling Technology Mapping:Intuitively, Bigger LUT Size Means Smaller Depth
2-LUT / Depth=4 4-LUT / Depth=2Circuit
So, Larger LUT Size Fewer Nets Lower Depth
![Page 11: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/11.jpg)
21
Mapping with K=4 : Two Extremes
Depth = (K – 1) = 3
[Maximum Possible]
Depth = log2(K) = 2
[Minimum Possible]
Simple Approach: Take the average
22
Technology Mapping Model
)(log12
22 γγ −+−−=
KKdd k
Depth afterTechmapping
Depth beforeTechmapping
LUT Size Average Unused Inputs in Each LUT
![Page 12: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/12.jpg)
23
Techmapping Model : ValidationOver-estimation
24
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
N, I, p
Clustering Model
![Page 13: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/13.jpg)
Review: Clustering / Packing
25
FPGA logic blocks usually contain several LUTs:Altera: LABs Xilinx: CLBs
Goal of Clustering Algorithms: Group LUTs into LAB-sized clusters
- Connections between LUTs within a cluster are fast
26
Clustering does not eliminate nets:- It just makes some nets local (intra-cluster) and some global (inter-cluster)
Intuitively: the larger the cluster size, the more nets are made local.
Goal: derive an equation for the proportion of nets alongthe critical path that are made local
Clustering Model
![Page 14: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/14.jpg)
27
Sketch of derivation:1. Some nets are made local “on purpose”
2. Some nets are made local “by chance”- These are not nets that are specifically targeted
by the cluster algorithm
Work out proportion of each and combine them
Clustering Model
28
Connections on Critical Path – Primary Goal
LUT-1LUT-1
LUT-3
LUT-2
LUT-4
LUT-5
Cluster Size, c = 5
LUT-2
LUT-3
LUT-4
LUT-5
LUT-6
LUT-7
![Page 15: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/15.jpg)
29
Connections on Critical Path – Primary Goal
LUT-1LUT-1
LUT-3
LUT-2
LUT-4
LUT-5
Cluster Size, c = 5
LUT-2
LUT-3
LUT-4
LUT-5
LUT-6
LUT-7
)1( /
−==
cLocalAbsorbedcSizeCluster
30
Connections Absorbed: Not on Critical Path – by chance
LUT-1 LUT-1
LUT-3
LUT-2
LUT-4
LUT-5
Cluster Size, c = 5
LUT-2
LUT-3
LUT-4
LUT-5
LUT-6
LUT-7
![Page 16: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/16.jpg)
31
Connections Absorbed: Not on Critical Path – by chance
LUT-1 LUT-1
LUT-3
LUT-2
LUT-4
LUT-5
Cluster Size, c = 5
LUT-2
LUT-3
LUT-4
LUT-5
LUT-6
LUT-7
[ ]1)(
:
+−− cKcknc
chancebyAbsorbed
γ
Details of derivation can be found in our paper
32
Which leads to
[ ]
c
kk
k
c
nncwhere
Kc
cKcncc
dd
=
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
−
+−−+−= ,
)(
1)()1(
γ
γ
Clustering Model:
Details of derivation can be found in our paper
![Page 17: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/17.jpg)
33
Clustering Model : Validation
LUT Size, K=4 LUT Size, K=6
34
Is our Model Actually Useful?
![Page 18: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/18.jpg)
35
We considered two flows:
The “shape” of the results is more important than the actual values
Example of Model’s Application
36
Critical Path Delay from Analytical Model
Analytical Flow
![Page 19: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/19.jpg)
37
Intra-Cluster & Inter-Cluster Delay:
Intra-Cluster Delay(t_intra)
Inter-Cluster Delay(t_inter)
38
Critical Path Delay from Analytical Model
Analytical Flow
![Page 20: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/20.jpg)
39
Important caveat: We do not yet have a model for delay routing
For now, we use experimental results for this part
Depth of c.p.in 2-LUTs
Depth of c.p.In k-LUTs
# Inter-clusterconnections
on c.p.
# Intra-clusterconnections
on c.p.
Post-RoutedWirelengthalong c.p.
Average Post-PlacementWirelength
DelayPost-PlacementWirelength along
c.p.
40
Overall Results: Delay
![Page 21: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM](https://reader036.fdocuments.net/reader036/viewer/2022071507/6127fd06b6908f13717e8b49/html5/thumbnails/21.jpg)
41Same Conclusion in both cases: K=4.
Overall Results: Delay
42
Key Result: It is possible to describe an FPGA architecturesusing a set of simple equations
This Talk:Analytical model for techmapped & clustered depth
Example of model’s application for early stage architecture evaluation
Ongoing Works:Analytical model for post-routing delay
Investigation of "Discrete Effects"
Analytical model for the whole design flow
Summary