Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area...

26
Temporal Placement

Transcript of Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area...

Page 1: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

Temporal Placement

Page 2: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

2

Wasted Resources• Wasted resource wr(vi) of a node vi:

Unused area occupied by the node vi during the computation of a partition.

wr(vi) = (t(Pi)−ti)×ai, t(Pi): run-time of partition Pi.ti: run-time of the component vi

ai: area of vi

• Wasted resource avoidance: A component is placed on the chip only when its computation is

required and remains on the device only for time it is active.

− Idle components can be replaced by new ones.• Assumptions:

There is partial reconfiguration support. Blocks are hard

− For soft library modules, temporal partitioning + 2d placement may suffice.

Page 3: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

3

• Temporal placement: Time-dependent placement Management of task execution at run-time

• Graphical Representation: Z axis: life time of a configuration. Some overlaps (resource sharing) in 2-D is

allowed:− If not at the same time.

Temporal Placement

A horizontal cut: RFU configuration at a given time

Page 4: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

4

Temporal Placement

• Off-line (compile time) Pre-defined placement sequence

− In DSP applications, program flow can be predicted.

• On-line (during execution) Computation sequence is not known in advance.

− Dynamically at run time. Needs to solve computational intensive problems in a

fraction of millisecond:− Efficient management of the free space− Selection of the best site for a new component − Management of communication

Page 5: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

5

Off-Line Temporal Placement

• Off-line temporal placement: Given a DFG G = (V,E) and a reconfigurable device H

(with length Hx and width Hy), a temporal placement is a 3-dimensional vector function:

p = (px, py, pt) : V → R3,

pt defines a feasible schedule (pt(vi): Starting time of vi.)

px(vi), py(vi): Coordinates of vi on H,

Page 6: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

6

Off-Line Temporal Placement

[Bazargan]

Page 7: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

7

Off-Line Temporal Placement

• Algorithms:

1. First fit

2. Best fit

3. Packing

4. Simulated annealing (KAMER)

Page 8: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

8

First/Best Fit Algorithm

• First fit:

1. Select a node among ready nodes.− i.e. nodes whose predecessors have been placed

2. Place it in the first free location.

• Advantages: Fast (linear w.r.t. number of free locations)

• Disadvantages: Unused resources very high Fragmentation

Page 9: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

9

First/Best Fit Algorithm

• Best fit:1. Select a node among ready nodes.2. Place it in the best free location.

• Advantages: Better area utilization

• Disadvantages: Much slower:

− Complete list of free locations must be searched. Fragmentation

• Simplifying the problem: Restricting the modules to columns (Virtex II) or part

of a column (Virtex 4 / Virtex 5 / Virtex 6).

Frames

CLBs

Page 10: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

10

First Fit Algorithm• Algorithm: First-fit temporal placement of clusters

1. While all the clusters are not placed do

1.1. Select one cluster Cact from the list of ready-to-run clusters

1.2. From the clusters already placed, select the cluster Ctop with the smallest finishing-time such that

Ctop ≤ Cact

1.3. Place the cluster Cact on top

of Ctop

2. end while

Page 11: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

11

First Fit Algorithm• Configuration sequence

produced:

a) {0, 1, 2, 3, 4}

b) {5, 1, 2, 3, 4}

c) {5, 1 , 6, 7, 4}

d) {5, 1 , 6, 7, 8}

e) {5, 9 , 6, 7, 8}

f) {10, 9 , 6, 7, 8}

Page 12: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

12

ILP

• ILP: Can construct a constraint set An objective function

• Advantage: Exact solution

• Limitation: Only for small size problems.

Page 13: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

13

Packing Approach

• Variations of Packing Problem:

1. Base Minimization Problem (BMP): Given a set of boxes B and a height H, find a container with

minimal size (x, y, H) that can accommodate the set of boxes B i.e. in temporal placement:

− Find the device with minimum size (x, y) on which a set of components can be implemented given an overall run-time t = H.

2. Strip Packing Problem (SPP): Given a set of boxes and a base (X, Y), find the minimum height h

container with size (X, Y, h) that can hold all the boxes. i.e. in temporal

− Find the minimum run-time t = h for a set of components given a reconfigurable device with size (X, Y).

• The device is usually fixed SPP is more common.

Page 14: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

KAMER

Page 15: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

15

KAMER Model of an RCS

• Larger picture: An RFU operation ri can be

either accepted or rejected− based on availability of

RFU real-estate If rejected, run on CPU

− running time penalty

• Assumption: No communications among

RFUs After running an operation

on RFU, results are saved in CPU registers

Page 16: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

16

KAMER Model of an RCS

• On-line temporal placement

• RFUOPS = {r1, …, rn| ri = (wi, hi, si, ei)}

ei - si : time span the operation is resident in the system

• Placement of RFUOPS: Modeled as 3D template placement

• Empty Rectangle (ER): A rectangle that does not overlap a placed module on the

chip• Maximum Empty Rectangle (MER):

An ER not included in any other rectangle than the device bounding box

Page 17: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

17

Example

• Non-ER: (E,F)

• ER: (E,D): MER (A,D): MER (E,C): not MER

• KAMER method: keeping all maximum empty rectangles:

− Permanently keeps track of all MERs

− Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v.

Possible to have many MERs in which v can fit Strategies: first-fit or best-fit

Page 18: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

18

KAMER

Once a rectangle is chosen:− Candidate points are those that do not allow an overlap

with the external part of the rectangle.

• Problem: Number of empty rectangles does not grow linear with

the number of components included

Page 19: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

19

KAMER

Run-time of free rectangle placement: O(n2)

• Heuristic: Keep only the non-overlapping empty rectangles

− Linear time, lower quality

• Problem 1: Several non-overlapping rectangle representations

Page 20: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

20

Keeping Non-Overlapping ERs

• Problem 2: Non-overlapping empty rectangles are not necessarily

maximal− A module may exists that could fit onto the device, but

cannot be placed

Can fitCan’t fit

(bad decomposition)

Page 21: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

21

Keeping Non-Overlapping ERs

• Problem 2:

Can fitCan’t fit

Page 22: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

22

Keeping Non-Overlapping ERs

• Incremental update: Whenever a new component v1 is placed in a rectangle,

two possibilities:− Horizontal splitting− Vertical splitting

Negative impact on next components

Page 23: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

23

Keeping Non-Overlapping ERs

• Horizontal

Can’t fit

Can fit

• Vertical

•Solution:Delaying the split decision for a number of steps later

Page 24: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

24

Keeping Non-Overlapping ERs

KAMER always finds a rectangle to place a new component, if one exists.

But position to place the component within the rectangle must be selected from a set of points

− whose number is the area of the rectangle in worst case• In [Bazargan00]:

Bottom-left position− Note: No communication is assumed between the rectangle to be

placed and those already placed.• If communication exists:

An optimal algorithm should consider any single point• Solution:

[Ahmadinia04]

Page 25: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

25

KAMER Off-Line Placement

• Uses simulated annealing in a 3D space:

1. Places X% largest modules− X is a parameter of the algorithm− Idea: Remove small modules

− Open space for larger ones− Small modules fragment more

Initial placement: from KAMER- best fit Perturb:

− Can displace a module on the RFU− No change in start or end time

− Can move an operation from CPU to RFU and vice versa

2. Use zero-temperature SA for fine tuning Place as many (100-X)% modules as it can

Page 26: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i :  Unused area occupied by the node v i during the computation of a partition.

35

References [Bobda07] C. Bobda, “Introduction to Reconfigurable Computing:

Architectures, Algorithms and Applications,” Springer, 2007. [Hauck08] S. Hauck, A. DeHon, "Reconfigurable Computing: The

Theory and Practice of FPGA-Based Computation" Morgan-Kaufmann, 2007

[Mehdipour06] F. Mehdipour*, M. Saheb Zamani, M. Sedighi, “An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems,” Journal of Microprocessors and Microsystems, Elsevier, v30, 2006, pp. 52–62.

[Bazargan00] K. Bazargan, R. Kastner and M. Sarrafzadeh, "Fast Template Placement for Reconfigurable Computing Systems," IEEE Design and Test - Special Issue on Reconfigurable Computing, January-March 2000.

[Bazargan] Bazargan, “Fast Placement Methods for Reconfigurable Computing Systems,” http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/

[Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, “A new approach for on-line placement on reconfigurable devices,” in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004.