T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve...

21
Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth Funded by Altera and NSERC Joydip Das 1 , Steven J.E. Wilton 1 , Philip Leong 2 , Wayne Luk 3 1 The University of British Columbia, 2 The Chinese University of Hong Kong, 3 Imperial College London 2 FPGA Architecture Design Architectures are usually evaluated using experimental methods - Using tools like VPR Problems: - Multi-dimensional optimization space – too much time - Require CAD tools for each architecture - Or “tuning” of a generic tool like VPR - No insight into what makes a good architecture

Transcript of T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve...

Page 1: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth

Funded by Altera and NSERC

Joydip Das1, Steven J.E. Wilton1, Philip Leong2, Wayne Luk3

1The University of British Columbia, 2The Chinese University of Hong Kong,

3Imperial College London

2

FPGA Architecture Design

Architectures are usually evaluated using experimental methods

- Using tools like VPR

Problems:- Multi-dimensional optimization space – too much time- Require CAD tools for each architecture

- Or “tuning” of a generic tool like VPR- No insight into what makes a good architecture

Page 2: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

3

Can we supplement the experimental approach with analytical techniques?

This talk: A model that makes it possible

4

This Talk1. Motivation: Speeding up architecture design

2. The model: - What makes a good model- What makes it hard- Overview

3. Details on Depth/Delay Model

4. Example of our Model’s Application

Page 3: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

5

Accelerating FPGA Architecture Investigation

Early Architecture Evaluation

Insight to Guide Experimentation

6

Analytical ModelThe key is an analytical model that relates architecture parameters to efficiency of the FPGA:

Delay of FPGA Implementation

Depth of Critical Path in 2-LUTs

Lookup-table size,Routing parameters, etc

Area= fA (K,N,Fc ....)Delay = fD (K,N,Fc,....)Power = fP (K,N,Fc ....)

Page 4: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

7

Challenge: Capturing the “essence” of programmable logicin a set of simple equations

8

What makes a good model?

1. Analytical Model– No curve-fitting or expensive experimental techniques

2. Balancing Complexity and Accuracy– Simpler equations provide significantly more insight

into architectural tradeoffs

3. Architectural Relationships– Should be as independent of user circuit as possible

Page 5: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

9

Estimation vs. Modeling

Estimation: What is the performance or density of a given user circuit on an FPGA?

- Useful in CAD tools to predict long paths, congestedregions, etc.

Modeling: On average, how does an architecture parameter affect the expected speed or density of an FPGA?

- Can we answer this independent of the user circuit?

10

What makes it hard:- Many parameters that interact in complex ways

How we make it feasible:- Break the model into stages, analogous to CAD flow

Page 6: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

11

Breaking it up: Delay

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

K, p

12

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

K, p N, I, p

Breaking it up: Delay

Page 7: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

13

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

K, p N, I, p

Breaking it up: Delay

14

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

K, p N, I, p

Fc, Fs, etc

Breaking it up: Delay

Page 8: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

15

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

K, p N, I, p

Fc, Fs, etcAll arch. params

Breaking it up: Delay

16

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

Each part is simple, but together, theyrelate delay-efficiency to architectural parameters

K, p N, I, p

Fc, Fs, etcAll arch. params

Breaking it up: Delay

Page 9: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

17

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

Tech MappingClustering

RoutingPhysical Models

This Paper:

18

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

K, p

Technology Mapping Model

Page 10: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

19

Review: Technology Mapping

Most algorithms give implementations of minimum depth

Map logic gates to lookup-tables:

20

Modeling Technology Mapping:Intuitively, Bigger LUT Size Means Smaller Depth

2-LUT / Depth=4 4-LUT / Depth=2Circuit

So, Larger LUT Size Fewer Nets Lower Depth

Page 11: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

21

Mapping with K=4 : Two Extremes

Depth = (K – 1) = 3

[Maximum Possible]

Depth = log2(K) = 2

[Minimum Possible]

Simple Approach: Take the average

22

Technology Mapping Model

)(log12

22 γγ −+−−=

KKdd k

Depth afterTechmapping

Depth beforeTechmapping

LUT Size Average Unused Inputs in Each LUT

Page 12: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

23

Techmapping Model : ValidationOver-estimation

24

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

N, I, p

Clustering Model

Page 13: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

Review: Clustering / Packing

25

FPGA logic blocks usually contain several LUTs:Altera: LABs Xilinx: CLBs

Goal of Clustering Algorithms: Group LUTs into LAB-sized clusters

- Connections between LUTs within a cluster are fast

26

Clustering does not eliminate nets:- It just makes some nets local (intra-cluster) and some global (inter-cluster)

Intuitively: the larger the cluster size, the more nets are made local.

Goal: derive an equation for the proportion of nets alongthe critical path that are made local

Clustering Model

Page 14: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

27

Sketch of derivation:1. Some nets are made local “on purpose”

2. Some nets are made local “by chance”- These are not nets that are specifically targeted

by the cluster algorithm

Work out proportion of each and combine them

Clustering Model

28

Connections on Critical Path – Primary Goal

LUT-1LUT-1

LUT-3

LUT-2

LUT-4

LUT-5

Cluster Size, c = 5

LUT-2

LUT-3

LUT-4

LUT-5

LUT-6

LUT-7

Page 15: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

29

Connections on Critical Path – Primary Goal

LUT-1LUT-1

LUT-3

LUT-2

LUT-4

LUT-5

Cluster Size, c = 5

LUT-2

LUT-3

LUT-4

LUT-5

LUT-6

LUT-7

)1( /

−==

cLocalAbsorbedcSizeCluster

30

Connections Absorbed: Not on Critical Path – by chance

LUT-1 LUT-1

LUT-3

LUT-2

LUT-4

LUT-5

Cluster Size, c = 5

LUT-2

LUT-3

LUT-4

LUT-5

LUT-6

LUT-7

Page 16: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

31

Connections Absorbed: Not on Critical Path – by chance

LUT-1 LUT-1

LUT-3

LUT-2

LUT-4

LUT-5

Cluster Size, c = 5

LUT-2

LUT-3

LUT-4

LUT-5

LUT-6

LUT-7

[ ]1)(

:

+−− cKcknc

chancebyAbsorbed

γ

Details of derivation can be found in our paper

32

Which leads to

[ ]

c

kk

k

c

nncwhere

Kc

cKcncc

dd

=

⎥⎥⎥⎥

⎢⎢⎢⎢

+−−+−= ,

)(

1)()1(

γ

γ

Clustering Model:

Details of derivation can be found in our paper

Page 17: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

33

Clustering Model : Validation

LUT Size, K=4 LUT Size, K=6

34

Is our Model Actually Useful?

Page 18: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

35

We considered two flows:

The “shape” of the results is more important than the actual values

Example of Model’s Application

36

Critical Path Delay from Analytical Model

Analytical Flow

Page 19: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

37

Intra-Cluster & Inter-Cluster Delay:

Intra-Cluster Delay(t_intra)

Inter-Cluster Delay(t_inter)

38

Critical Path Delay from Analytical Model

Analytical Flow

Page 20: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

39

Important caveat: We do not yet have a model for delay routing

For now, we use experimental results for this part

Depth of c.p.in 2-LUTs

Depth of c.p.In k-LUTs

# Inter-clusterconnections

on c.p.

# Intra-clusterconnections

on c.p.

Post-RoutedWirelengthalong c.p.

Average Post-PlacementWirelength

DelayPost-PlacementWirelength along

c.p.

40

Overall Results: Delay

Page 21: T1B1 Das (1)stevew/papers/slides/fpl09a.pdfTitle Microsoft PowerPoint - T1B1_Das (1) Author Steve Wilton Created Date 9/7/2009 11:33:50 AM

41Same Conclusion in both cases: K=4.

Overall Results: Delay

42

Key Result: It is possible to describe an FPGA architecturesusing a set of simple equations

This Talk:Analytical model for techmapped & clustered depth

Example of model’s application for early stage architecture evaluation

Ongoing Works:Analytical model for post-routing delay

Investigation of "Discrete Effects"

Analytical model for the whole design flow

Summary