High-level Synthesis: Scheduling, Allocation, Assignment


Mani Srivastava, UCLA EE Department
Room: 6731-H Boelter Hall
Email: mbs@ee.ucla.edu
Tel: 310-267-2098
WWW: http://www.ee.ucla.edu/~mbs

Copyright 2003 Mani Srivastava

Note: Several slides in this lecture are from Prof. Miodrag Potkonjak, UCLA CS.


Overview

High Level Synthesis

Scheduling, Allocation and Assignment

Estimations

Transformations


Allocation, Assignment, and Scheduling

[Figure: a data-flow graph of adders (+), subtracters (-), and shifters (>>), annotated with allocation, assignment, and scheduling decisions]

Allocation: how much? (e.g., 2 adders, 1 shifter, 4 registers)
Assignment: where? (e.g., shifter 1)
Schedule: when? (e.g., time slot 4)

Techniques Well Understood and Mature


Scheduling and Assignment

[Figure: a data-flow graph of three additions (+1, +2, +3) and three multiplications (*1, *2, *3), scheduled two ways over 4 control steps]

Schedule 1:
Control step 1: +1
Control step 2: +2
Control step 3: +3, *1
Control step 4: *2, *3

Schedule 2:
Control step 1: +3
Control step 2: +1, *2
Control step 3: +2, *3
Control step 4: *1

ASAP Scheduling Algorithm


ASAP Scheduling Example


ASAP: Another Example

Sequence Graph ASAP Schedule


ALAP Scheduling Algorithm


ALAP Scheduling Example


ALAP: Another Example

Sequence Graph, ALAP Schedule (latency constraint = 4)


Observation about ALAP & ASAP

No priority is given to nodes on the critical path. As a result, less critical nodes may be scheduled ahead of critical nodes. This is no problem with unlimited hardware; however, if the resources are limited, the less critical nodes may block the critical nodes and thus produce inferior schedules.

List scheduling techniques overcome this problem by utilizing a more global node selection criterion.

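The ASAP and ALAP rules can be sketched in a few lines. The graph encoding below (a dict of successor lists over unit-delay operations numbered 0..n-1 in topological order) is an illustrative assumption, not the slides' notation:

```python
# A sketch of ASAP and ALAP scheduling for a DAG of unit-delay operations.
# Assumptions (not from the slides): ops are numbered 0..n-1 in topological
# order, and the graph is a dict mapping an op to its successor list.

def asap(succs, n_ops):
    """ASAP: schedule each op as soon as all its predecessors finish."""
    preds = {v: [] for v in range(n_ops)}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    start = {}
    for v in range(n_ops):  # topological order
        start[v] = 1 + max((start[u] for u in preds[v]), default=0)
    return start

def alap(succs, n_ops, latency):
    """ALAP: schedule each op as late as the latency constraint allows."""
    start = {}
    for v in reversed(range(n_ops)):  # reverse topological order
        start[v] = min((start[w] for w in succs.get(v, [])),
                       default=latency + 1) - 1
    return start
```

The mobility of an operation is alap[v] - asap[v]; operations with zero mobility lie on the critical path.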

List Scheduling and Assignment

List_Scheduling() {
  Create_Candidate_List();
  while (Candidate_List != NULL) {
    Select_Candidate();
    Schedule_Candidate();
  }
}

[Figure: list scheduling of the earlier data-flow graph in 4 control steps; at each step the candidate operations are listed and the most critical ones are scheduled, yielding Schedule 1]

List Scheduling Algorithm using Decreasing Criticalness Criterion
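A minimal sketch of list scheduling with the decreasing-criticalness criterion (criticalness = length of the longest path to a sink), assuming unit-delay operations and a single resource type; all names and the graph encoding are illustrative assumptions:

```python
# Resource-constrained list scheduling with a decreasing-criticalness
# priority. Assumptions (illustrative, not the slides' notation):
# unit-delay ops numbered 0..n-1 in topological order, one resource
# type with n_units instances.

def list_schedule(succs, n_ops, n_units):
    preds = {v: set() for v in range(n_ops)}
    for u, vs in succs.items():
        for v in vs:
            preds[v].add(u)
    # criticalness = length of the longest path from the op to a sink
    crit = {}
    for v in reversed(range(n_ops)):          # reverse topological order
        crit[v] = 1 + max((crit[w] for w in succs.get(v, [])), default=0)
    done, schedule, step = set(), {}, 0
    while len(done) < n_ops:
        step += 1
        # candidate list: unscheduled ops whose predecessors are all done
        ready = sorted((v for v in range(n_ops)
                        if v not in done and preds[v] <= done),
                       key=lambda v: -crit[v])  # most critical first
        schedule.update((v, step) for v in ready[:n_units])
        done.update(ready[:n_units])
    return schedule
```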


Scheduling

Scheduling is an NP-complete problem.
Approaches: optimal techniques; heuristics (iterative improvement); heuristics (constructive).

Versions of the problem:
Unconstrained minimum latency
Resource-constrained minimum latency
Timing constrained

If all resources are identical, the problem reduces to multiprocessor scheduling; the minimum-latency multiprocessor problem is intractable.

Scheduling - Optimal Techniques

Integer Linear Programming

Branch and Bound


Integer Linear Programming

Given: integer-valued matrix Amxn,

vectors B = ( b1, b2, … , bm ), C = ( c1, c2, … , cn )

Minimize: CTX

Subject to:

AX ≥ B

X = ( x1, x2, … , xn ) is an integer-valued vector


Integer Linear Programming

Problem: for a set of (dependent) computations {t1, t2, ..., tn}, find the minimum number of units needed to complete the execution in k control steps.

Integer linear programming formulation: let y0 be an integer variable. For each control step i (1 ≤ i ≤ k):
define variable xij as xij = 1 if computation tj is executed in the i-th control step, and xij = 0 otherwise;
define variable yi = xi1 + xi2 + ... + xin.

Integer Linear Programming

For each computation dependency "ti has to be done before tj", introduce the constraint:

k·x1i + (k-1)·x2i + ... + xki ≥ k·x1j + (k-1)·x2j + ... + xkj + 1   (*)

Minimize: y0

Subject to:
x1i + x2i + ... + xki = 1 for all 1 ≤ i ≤ n
yi ≤ y0 for all 1 ≤ i ≤ k
all computation dependencies of type (*)

An Example

[Figure: dependency graph over computations c1, c2, c3 (top) and c4, c5, c6]

6 computations, 3 control steps

An Example

Introduce variables: xij for 1 ≤ i ≤ 3, 1 ≤ j ≤ 6

yi = xi1 + xi2 + xi3 + xi4 + xi5 + xi6 for 1 ≤ i ≤ 3

y0

Dependency constraints, e.g. to execute c1 before c4:

3x11 + 2x21 + x31 ≥ 3x14 + 2x24 + x34 + 1

Execution constraints:

x1j + x2j + x3j = 1 for 1 ≤ j ≤ 6

An Example

Minimize: y0

Subject to: yi ≤ y0 for all 1 ≤ i ≤ 3
dependency constraints
execution constraints

One solution: y0 = 2
x11 = 1, x12 = 1, x23 = 1, x24 = 1, x35 = 1, x36 = 1; all other xij = 0.

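Because the example is tiny, the ILP's answer can be cross-checked by brute force. The dependency edges below are an assumption read off the figure (c1, c2 before c4; c4 before c5; c3 before c6); the slides do not list them explicitly:

```python
# Brute-force cross-check of the small ILP: enumerate all assignments of
# the six computations to three control steps and minimize the maximum
# number executed in any step. The dependency set is assumed, not stated
# explicitly in the slides.
from itertools import product

deps = [(1, 4), (2, 4), (4, 5), (3, 6)]      # (earlier, later), assumed

best = None
for steps in product((1, 2, 3), repeat=6):   # steps[j-1] = control step of cj
    if all(steps[a - 1] < steps[b - 1] for a, b in deps):
        units = max(sum(1 for s in steps if s == t) for t in (1, 2, 3))
        best = units if best is None else min(best, units)

print(best)  # -> 2, matching the ILP solution y0 = 2
```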

ILP Model of Scheduling

Binary decision variables xil

i = 0, 1, ..., n; l = 1, 2, ..., λ + 1 (λ is the latency bound)

Start time is unique

ILP Model of Scheduling (contd.)

Sequencing relationships must be satisfied.

Resource bounds must be met: let the upper bound on the number of resources of type k be ak.

Minimum-latency Scheduling Under Resource-constraints

Let t be the vector whose entries are start times.

Formal ILP model:

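The formal model's equations were images in the original deck; a standard statement of the minimum-latency, resource-constrained scheduling ILP (the usual textbook formulation, an assumption about what the slide showed, with d_i the delay of operation i) is:

```latex
\begin{align*}
\min\ & \mathbf{c}^{T}\mathbf{t} \\
\text{s.t.}\quad
& \sum_{l} x_{il} = 1 \quad \forall i,
  \qquad t_i = \sum_{l} l \cdot x_{il} && \text{(unique start time)} \\
& t_i \ge t_j + d_j \quad \forall (v_j, v_i) \in E && \text{(sequencing)} \\
& \sum_{i :\, \mathrm{type}(v_i) = k} \ \sum_{m = l - d_i + 1}^{l} x_{im} \le a_k
  \quad \forall k, l && \text{(resource bounds)}
\end{align*}
```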

Example

Two types of resources:
Multiplier
ALU (adder, subtracter, comparator)

Both take 1 cycle of execution time.

Example (contd.)

Heuristic (list scheduling) gives latency = 4 steps.
Use ALAP and ASAP (with no resource constraints) to get bounds on start times.
ASAP matches the latency of the heuristic, so the heuristic is optimal; but let us ignore that!
Constraints?

Example (contd.)

Start time is unique


Example (contd.)

Sequencing constraints. Note: only the non-trivial ones are listed, i.e. those with more than one possible start time for at least one operation.

Example (contd.)

Resource constraints


Example (contd.)

Consider c = [0, 0, ..., 1]ᵀ: minimum-latency schedule. Since the sink has no mobility (xn,5 = 1), any feasible schedule is optimal.

Consider c = [1, 1, ..., 1]ᵀ: finds the earliest start times for all operations (equivalently, minimizes the sum of the start times).

Example Solution: Optimum Schedule Under Resource Constraint

Example (contd.)

Assume a multiplier costs 5 units of area and an ALU costs 1 unit of area.

Same uniqueness and sequencing constraints as before.

Resource constraints are in terms of unknown variables a1 and a2:
a1 = number of multipliers
a2 = number of ALUs

Example (contd.)

Resource constraints

Example Solution

Minimize cᵀa = 5·a1 + 1·a2

Solution with cost 12

Precedence-constrained Multiprocessor Scheduling

All operations are done by the same type of resource; the problem is intractable, even if all operations have unit delay.

Scheduling - Iterative Improvement

Kernighan-Lin (deterministic)
Simulated Annealing
Lottery Iterative Improvement
Neural Networks
Genetic Algorithms
Tabu Search

Scheduling - Constructive Techniques

Most Constrained

Least Constraining


Force Directed Scheduling

Goal is to reduce hardware by balancing concurrency.

Iterative algorithm: one operation is scheduled per iteration.

Information (i.e., speed and area) is fed back into the scheduler.

The Force Directed Scheduling Algorithm


Step 1

Determine ASAP and ALAP schedules

[Figure: ASAP and ALAP schedules of the example data-flow graph (multiply, add, subtract, and compare operations)]

Step 2

Determine the time frame of each operation:
Length of box ~ possible execution cycles
Width of box ~ probability of assignment
Uniform distribution; area assigned = 1

[Figure: time frames of the operations across C-steps 1-4; operations with mobility get fractional assignment probabilities such as 1/2 and 1/3]

Step 3

Create distribution graphs:
Sum the probabilities of each operation type in each C-step; this indicates the concurrency of similar operations.

DG(i) = Σ Prob(Op, i), summed over all operations of the given type

[Figure: DG for multiply; DG for add, sub, comp]

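Step 3 can be sketched directly: sum, for each C-step, the uniform assignment probabilities of every operation of one type. The time-frame encoding (op -> (ASAP step, ALAP step)) is an illustrative assumption:

```python
# A distribution graph sums, per C-step, the uniform assignment
# probabilities of all operations of one type over their time frames.

def distribution_graph(frames, n_steps):
    dg = [0.0] * (n_steps + 1)           # dg[i] is DG at C-step i (1-based)
    for lo, hi in frames.values():
        p = 1.0 / (hi - lo + 1)          # uniform over the time frame
        for i in range(lo, hi + 1):
            dg[i] += p
    return dg
```

For example, two multiplies with time frames (1, 1) and (1, 2) give DG(1) = 1.5 and DG(2) = 0.5.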

Diff Eq Example: Precedence Graph Recalled


Diff Eq Example: Time Frame & Probability Calculation


Diff Eq Example: DG Calculation


Conditional Statements

Operations in different branches are mutually exclusive.
Operations of the same type can be overlapped onto the DG.
The probability of the most likely operation is added to the DG.

[Figure: a fork/join construct with + and - operations in each branch, and the resulting DG for add]

Self Forces

Scheduling an operation will affect the overall concurrency.
Every operation has a 'self force' for every C-step of its time frame.
Analogous to the effect of a spring: f = Kx.
A desirable scheduling will have negative self force, achieving better concurrency (lower potential energy).

Force(i) = DG(i) * x(i)
DG(i) ~ current distribution-graph value
x(i) ~ change in the operation's probability

Self Force(j) = Σ (i = t to b) [Force(i)], summed over the C-steps t..b of the operation's time frame

Example

Attempt to schedule the multiply in C-step 1:

Self Force(1) = Force(1) + Force(2)
= (DG(1) * x(1)) + (DG(2) * x(2))
= 2.833 * (+0.5) + 2.333 * (-0.5) = +0.25

This is positive, so scheduling the multiply in the first C-step would be bad.
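The slide's arithmetic, reproduced as a check (the DG values 2.833 and 2.333 are taken from the example):

```python
# Scheduling the multiply into C-step 1 raises its probability there
# from 0.5 to 1 (x(1) = +0.5) and drops it in C-step 2 from 0.5 to 0
# (x(2) = -0.5).
dg = {1: 2.833, 2: 2.333}     # multiply DG values from the example
x = {1: +0.5, 2: -0.5}        # change in assignment probability

self_force = sum(dg[i] * x[i] for i in (1, 2))
print(round(self_force, 2))   # -> 0.25: positive, so a bad move
```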

[Figure: DG for multiply, and the operation time frames across C-steps 1-4 with probabilities 1/2 and 1/3]

Diff Eq Example: Self Force for Node 4


Predecessor & Successor Forces

Scheduling an operation may affect the time frames of other linked operations; this may negate the benefits of the desired assignment.

Predecessor/successor forces = sum of the self forces of any implicitly scheduled operations.

[Figure: the data-flow graph with linked predecessor and successor operations highlighted]

Diff Eq Example: Successor Force on Node 4

If node 4 is scheduled in step 1, there is no effect on the time frame of its successor node 8:
Total force = Force4(1) = +0.25

If node 4 is scheduled in step 2, it forces node 8 into step 3, so the successor force must be calculated.

Diff Eq Example: Final Time Frame and Schedule


Diff Eq Example: Final DG


Lookahead

Temporarily modify the constant DG(i) to include the effect of the assignment being considered:

Force(i) = temp_DG(i) * x(i), where temp_DG(i) = DG(i) + x(i)/3

Consider the previous example:

Self Force(1) = (DG(1) + x(1)/3)·x(1) + (DG(2) + x(2)/3)·x(2)
= 0.5·(2.833 + 0.5/3) - 0.5·(2.333 - 0.5/3) = +0.41667

This is even worse than before.

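The lookahead variant on the same numbers:

```python
# The DG is temporarily adjusted by x(i)/3 before the force is computed.
dg = {1: 2.833, 2: 2.333}
x = {1: +0.5, 2: -0.5}

force = sum((dg[i] + x[i] / 3) * x[i] for i in (1, 2))
print(round(force, 5))        # -> 0.41667, even worse than +0.25
```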

Minimization of Bus Costs

Basic algorithm suitable for narrow class of problems Algorithm can be refined to consider “cost” factors Number of buses ~ number of concurrent data transfers Number of buses = maximum transfers in any C-step Create modified DG to include transfers: Transfer DG

Trans DG(i) = [Prob (op,i) * Opn_No_InOuts]

Opn_No_InOuts ~ combined distinct in/outputs for Op

Calculate Force with this DG and add to Self Force

Copyright 2003 Mani Srivastava56

Minimization of Register Costs

The minimum number of registers required is given by the largest number of data arcs crossing a C-step boundary.
Create storage operations at the output of any operation that transfers a value to a destination in a later C-step.
Generate a storage DG for these "operations"; the length of a storage operation depends on the final schedule.

[Figure: a source s feeding destinations d, with the storage distribution for s shown for the ASAP, MAX, and ALAP lifetimes]

Minimization of Register Costs (contd.)

[avg life] = ([ASAP life] + [MAX life] + [ALAP life]) / 3

storage DG(i) = [avg life] / [max life]   (no overlap between ASAP & ALAP)

storage DG(i) = ([avg life] - [overlap]) / ([max life] - [overlap])   (if overlap)

Calculate and add the "storage" force to the self force.

[Figure: the ASAP schedule needs a minimum of 7 registers; the force-directed schedule needs a minimum of 5]

Pipelining

[Figure: functional pipelining overlaps successive instances of the data-flow graph (steps 1-4 with 1'-4'), changing the DG for multiply; structural pipelining overlaps stages within a single multiplier]

Functional pipelining: pipelining across multiple operations. Must balance the distribution across groups of concurrent C-steps: cut the DG horizontally and superimpose, then perform regular force-directed scheduling.

Structural pipelining: pipelining within an operation. For non-data-dependent operations, only the first C-step need be considered.

Other Optimizations

Local timing constraints: insert dummy timing operations -> restricted time frames

Multiclass FUs: create a multiclass DG by summing the probabilities of the relevant operations

Multistep/chained operations: carry propagation-delay information with the operation; extend time frames into other C-steps as required

Hardware constraints: use force as the priority function in list-scheduling algorithms

Scheduling using Simulated Annealing

Reference: S. Devadas and A. R. Newton, "Algorithms for hardware allocation in data path synthesis," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, no. 7, pp. 768-781, July 1989.

Simulated Annealing

Local Search

[Figure: a cost function over the solution space; local search can get trapped in local minima]

Statistical Mechanics

Combinatorial Optimization

State {ri} (configuration: a set of atomic positions)

Weight e^(-E({ri})/kB·T) (the Boltzmann distribution)

E({ri}): energy of the configuration
kB: Boltzmann constant
T: temperature

Low temperature limit?

Analogy

Physical System <-> Optimization Problem
State (configuration) <-> Solution
Energy <-> Cost function
Ground state <-> Optimal solution
Rapid quenching <-> Iterative improvement
Careful annealing <-> Simulated annealing

Copyright 2003 Mani Srivastava64

Generic Simulated Annealing Algorithm

1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet 'frozen' do the following:
   3.1 For 1 ≤ i ≤ L, do the following:
       3.1.1 Pick a random neighbor S' of S
       3.1.2 Let Δ = cost(S') - cost(S)
       3.1.3 If Δ ≤ 0 (downhill move), set S = S'
       3.1.4 If Δ > 0 (uphill move), set S = S' with probability e^(-Δ/T)
   3.2 Set T = rT (reduce temperature)
4. Return S

Copyright 2003 Mani Srivastava65

Basic Ingredients for S.A.

Solution Space

Neighborhood Structure

Cost Function

Annealing Schedule


Observation

All scheduling algorithms we have discussed so far are critical-path schedulers:

They can only generate schedules with an iteration period larger than or equal to the critical path.

They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints.

Example

Can one do better than an iteration period of 4?
Pipelining + retiming can reduce the critical path to 3, and also the number of functional units.

Approaches:
Transformations followed by scheduling
Transformations integrated with scheduling

Conclusions

High-level synthesis connects a behavioral description to a structural description.

Key tasks: scheduling, estimations, transformations.

High level of abstraction, high impact on the final design.