3rd 3DDRESD: Floorplacer

Post on 13-Jun-2015

267 views 1 download

Tags:

Transcript of 3rd 3DDRESD: Floorplacer

Time-driven reconfiguration-aware

floorplacer

BY

Alessio Montone

alessio.montone@dresd.org

NDDRESD Baceno, Goglio – Jul 29, 2008

2

Rationale and InnovationRationale and Innovation

Problem statementGiven a reconfigurable architecture, find an on-chip position for each functional unit

Innovative contribution: taking into accountTarget Device HeterogeneityTarget Device reconfiguation capabilities Inter-FU Communication

3

AimsAims

Considering the area assignment problem tailored for reconfigurable architectures, providea formalization of the problem, andan approach (in 3 algorithms) for solving

4

OutlineOutline

IntroductionFloorplacementThe Proposed ApproachExperimental ResultsComparison with the state of the artConclusions and Future WorksQuestions

INTRODUCTIONINTRODUCTION

5

Introduction: Simulated Introduction: Simulated AnnealingAnnealing

Metropolis PhysicsAn improving move is always acceptedA peggiorative move accepted with a decreasing probability

6

•N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". Journal of Chemical Physics, 1953

•Sëma Kachalo, Hsiao-Mei Lu, and Jie Liang, Protein Folding Dynamics via Quantification of Kinematic Energy Landscape, Physical Review Letters, 2006

Reconfigurable Architectures - IReconfigurable Architectures - I

On FPGAsReconfigurable DevicesHeterogeneousReconfiguration Limits

Different types of Reconfigurable Architectures:

TotalPartial (Static)Partial (Dynamic)

7

Reconfigurable Architectures - IIReconfigurable Architectures - II

Partial Dynamic

8

Area Assignment ProblemArea Assignment Problem

Let consider a Reconfigurable ArchitectureGiven a scheduled task graph (TG) of the application

Node: Reconfigurable Functional Unit (RFU) [*], A netlist obtained after post synthesis and technology mapping (i.e., before placement and routing)

Aim: find an area assignment for each RFU

9

[*] K. Bazargan, R. Kastner, M.S.: 3-d floorplanning: Simulated annealing and greedy placement methods for reconfigurable computing systems. IEEE Rapid Systems Prototyping (1999)

Related Works - IRelated Works - I

[*] introduced the concept of 3D floorplanning for reconfigurable systems

SA in order to solve HW/SW codesign problemFor each task choose between

HW implementationSW implementation

LimitsNo device limits consideredNo communicationinfrastructure

10

[*] K. Bazargan, R. Kastner, M.S.: 3-d floorplanning: Simulated annealing and greedy placement methods for reconfigurable computing systems. IEEE Rapid Systems Prototyping (1999)

Related Works - IIRelated Works - II

[*] is the state of art in 3D floorplanningSimulated Annealing over Transitive Closure GraphTakes into account device reconfiguration limits

LimitsNo heterogeneity consideredHigh overhead communicationinfrastructure solution [**]

11

[*] Ping-Hung Yuh, Chia-Lin Yang, Yao-Wen Chang, Hsin-Lung Chen: Temporal Floorplanning Using 3D-subTCG, Design Automation Conference, 2004

[**] S. P. Fekete, E. Kohler, and J. Teich: Optimal FPGA Module Placement with Temporal Precedence Constraints, Proc. DATE, 2001.

FLOORPLACEMENTFLOORPLACEMENT

12

Floorplanning vs. PlacementFloorplanning vs. PlacementCharacteristic Floorplanning Placement

# items <100 >10.000

Items (for FPGAs) IP-Core Slice, CLB

Aim Find a position for each item

obj. function depends

mainly on Area mainly on Wirelength

Constraints Items can be positioned everywhere

There is a set of possible positions

13

PlacementFloor plan

Floorplacement - IFloorplacement - I

Hierarchical Approach (Floorplanning + Partitioning)

14

S. N. Adya, I. L. Markov, Fixed-outline Floorplanning: Enabling Hierarchical Design, IEEE Transaction on VLSI System, 2002

S. N. Adya, S. Chaturvedi, J. A. Roy, David A. Papa, I. L. Markov: Unification of Partitioning, Placement and Floorplanning , IEEE Intl. Conf. on CAD, 2004

Floorplacement - IIFloorplacement - II

Reconfigurable Functional Unit (RFU)A netlist obtained after post synthesis and technology mapping (i.e., before placement and routing)

Reconfigurable Region (RR)

15

Floorplacement - IIIFloorplacement - III

Resource Aware (i.e., not all positions are feasible)

Device heterogeneity

Device Reconfigurationcapabilities

16

THE PROPOSED THE PROPOSED APPROACHAPPROACH

17

Proposed Problem DefinitionProposed Problem Definition

18

AimDefine RRsFor each task find

find a RRA position inside RR

Objective FunctionMin. Fragmentation

ConstraintsCommunication issuesDevice limits

Target Devices: Xilinx Virtex 4 - 5Target architecture based on EAPR design flow

Target Architecture and DevicesTarget Architecture and Devices

19

InputInput

20

Proposed Approach: overviewProposed Approach: overview

21

11stst Algorithm: Partitioning into Algorithm: Partitioning into RRRR

Aim: identify the RRs and associate each RFU to one RRHow: partitioning the TG minimizing resource requirement variance of the RRs (moving and swapping nodes)

22

Resource of type t required by RFU n, at static photo p

22ndnd Algorithm:TFiRR - I Algorithm:TFiRR - I

23

Temporal Floorplacement inside RR (TFiRR)Aim: for each RR find a set of feasible width-height pairsHow: floorplacing RFUs inside corresponding RR Assumption: RFUs’ height = height of the RR they belong to

Pseudo Code:

22ndnd Algorithm:TFiRR - II Algorithm:TFiRR - II

24

Let consider an iteration:

22ndnd Algorithm:TFiRR - III Algorithm:TFiRR - III

25

Let consider an iteration:

22ndnd Algorithm:TFiRR - IV Algorithm:TFiRR - IV

26

Let consider an iteration:

22ndnd Algorithm:TFiRR - V Algorithm:TFiRR - V

27

Let consider an iteration:

22ndnd Algorithm:TFiRR - VI Algorithm:TFiRR - VI

28

22ndnd Algorithm: TFiRR – Example Algorithm: TFiRR – Example

29

33rdrd Algorithm: RR floorplacement - I Algorithm: RR floorplacement - I

Simulated AnnealingObjective Function

Data Structure4 Constraint Lists (one per row)

30

33rdrd Algorithm: RR floorplacement - II Algorithm: RR floorplacement - II

Simulated Annealing: movesSwap two RRsMove one RRsSpan over one more rowUn-Span over one less row

After each move packing is performed(i.e., the floorplacement is compressed)

31

33rdrd Algorithm: RR floorplacement – Example - I Algorithm: RR floorplacement – Example - I

32

33rdrd Algorithm: RR floorplacement – Example - II Algorithm: RR floorplacement – Example - II

33

ImplementationImplementation

Three simulated annealers written in C++ STL

34

Output ExamplesOutput Examples

TFiRR

RR Floorplacement

35

EXPERIMENTAL RESULTSEXPERIMENTAL RESULTS

36

Identification of the number of RRs Identification of the number of RRs

37

Partitioning’s impact on TFiRRPartitioning’s impact on TFiRR

TFiRR on Partitioned

TG

TFiRR on TG

Execution Time 125ms 114ms 4m54s

Width (normalized)

1.00 1.19 1.04

38

Increasing the number of RFUs decreases the possibility to pick up the right one

Partitioning is a precondition of the 3rd algorithm in order to better exploit FPGA’s area (2D Floorplacement)

Tests performed directly floorplacing RFUsExecution time about 100 ms (100K iterations)

Floorplacement – Success Floorplacement – Success RateRate

39

Floorplacement – Aspect RatioFloorplacement – Aspect Ratio

40

Tests performed directly floorplacing RFUs

COMPARISON WITH THE COMPARISON WITH THE STATE OF THE ARTSTATE OF THE ART

41

State of the artState of the art

42

Authors Comm. Infrastructure

ResourceAware

Reconfiguration Aware

Device LimitsAware

Bazargan et al. No No Yes No

Yuh et al. Limited, w/ High Overhead

No Yes Yes

Singhal et al. No No Yes No

Feng et al. No Yes No No

NotesNotes

The comparsion is performed with respect to the description given by Yuh et al in [*]

Yuh’s approach does not supportMultiple ResourcesExistence of a Static side

In order perform the comparisonthe case study has been chosen in order to avoid multiple resource limitationYuh’s approach has been extended to support a static side

43

[*] Ping-Hung Yuh, Chia-Lin Yang, Yao-Wen Chang, Hsin-Lung Chen: Temporal Floorplanning Using 3D-subTCG, Design Automation Conference, 2004

The Case StudyThe Case Study

A Reconfigurable Architecture (for Biomedical Purpose) on XC5VLX30T 1. Collecting data from sensor2. Elaborating them3. Sending to a host computer thorough the net

44

The Proposed SolutionThe Proposed Solution

It is a reconfigurable architecture with 2 static photos

45

Area assignments comparisonArea assignments comparison

46

Proposedsolution

Yuh’s solution

Communication performances comparisonCommunication performances comparison

Considering 100 M samples, 32 bit each, at 75 MHz.

The entire data are transferred byThe Proposed Approach in 2.6 secondsYuh’s approach in 416.0 seconds

47

Conclusions - IConclusions - I

An algorithm for the identification of area constraint for reconfigurable architectures has been introduced

Novelties: taking into accountTarget device heterogeneityTarget device reconfiguration capabilitiesCommunication issues

48

Conclusions - IIConclusions - II

Results have been publishedA. Montone, M.D. Santambrogio, D. Sciuto, A Design Workflow for the Identification of Area Constraints in Dynamic Reconfigurable Systems, IEEE International Symposium on Electronic Design, Test and Applications (DELTA), 2008A. Montone, M.D. Santambrogio, Area Constraint Evaluation for FPGAs, The Syndicated Q1-2008, A technical newsletter for FPGA, ASIC Verification and DSP Designers, Synplicity Incorporation

49

Future WorksFuture Works

Take into consideration IOBs and inter modules communications

Partitioning considering clock regions

50

QuestionsQuestions

51