


About the Final Exam

Due to scheduling conflicts, the final exam will be a take-home exam
• Available by Friday, April 20 (this Friday)
♦ May be ready in time for class on Thursday

• Three hour exam
♦ Closed book, closed notes, closed devices
♦ Covered by the Rice Honor Code

• Due back to me by May 2
♦ It would help me if you return them earlier than May 2
♦ Return by signing it in and sliding it under my door
→ clipboard outside the door to sign

I will post a list of topics by Thursday

COMP 506, Rice University 1

Instruction Scheduling

Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 506 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

COMP 506, Rice University, Spring 2018

Chapter 12 in EaC2e.

[Figure: compiler structure — source code → Front End → IR → Optimizer → IR → Back End → target code]

Order of Operations Matters

The order in which operations issue affects total runtime
• Many operations have non-zero latencies
• Modern machines can issue several operations in each cycle
• Execution time is order-dependent (and has been so since the 1960's)

Instruction Scheduling reorders operations
• Tries to minimize wasted cycles (cover latencies with useful work)
• Tries to maximize functional unit utilization

Opportunities
• Non-blocking operations
♦ Can issue new operations while waiting for one that is executing
• Operations that are independent of each other
♦ Can issue independent operations in parallel with each other

COMP 506, Rice University 3

ALU Characteristics

To schedule operations, the compiler needs accurate data on latencies. This data is surprisingly hard to obtain.

COMP 506, Rice University 4


Intel E5530 operation latencies

Instruction                  Cost
64 bit integer subtract         1
64 bit integer multiply         3
64 bit integer divide          41
Double precision add            3
Double precision subtract       3
Double precision multiply       5
Double precision divide        22
Single precision add            3
Single precision subtract       3
Single precision multiply       4
Single precision divide        14

• Value-dependent behavior

• Context-dependent behavior

• Compiler-dependent behavior

♦ Have seen gcc underallocate registers & inflate operation costs with spills

♦ Have seen commercial compiler generate 3 extra ops per divide (raising cost by 3)

• Difficult to reconcile measured reality with data from manufacturer’s docs (e.g., integer divide cost)


A Simple Example

COMP 506, Rice University 6

Naïve Schedule

start  end
  1     3   loadAI  rARP, @a ⇒ r1
  4     4   add     r1, r1   ⇒ r1
  5     7   loadAI  rARP, @b ⇒ r2
  8     9   mult    r1, r2   ⇒ r1
 10    12   loadAI  rARP, @c ⇒ r2
 13    14   mult    r1, r2   ⇒ r1
 15    17   loadAI  rARP, @d ⇒ r2
 18    19   mult    r1, r2   ⇒ r1
 20    22   storeAI r1       ⇒ rARP, @a

Op               Latency
loads & stores   3 cycles
multiplies       2 cycles
all others       1 cycle

• Schedule for a single functional unit
• Naïve schedule takes ops in treewalk order

a ← a * 2 * b * c * d

A Simple Example

COMP 506, Rice University 7


Schedule Loads Early

start  end
  1     3   loadAI  rARP, @a ⇒ r1
  2     4   loadAI  rARP, @b ⇒ r2
  3     5   loadAI  rARP, @c ⇒ r3
  4     4   add     r1, r1   ⇒ r1
  5     6   mult    r1, r2   ⇒ r1
  6     8   loadAI  rARP, @d ⇒ r2
  7     8   mult    r1, r3   ⇒ r1
  9    10   mult    r1, r2   ⇒ r1
 11    13   storeAI r1       ⇒ rARP, @a

• Scheduled for a single functional unit
• Naïve schedule takes ops in treewalk order
• Scheduled code overlaps latencies in the load operations
• Shows potential of non-blocking operations

A Simple Example

COMP 506, Rice University 8


• Original code reused r2, which constrained the scheduler
• New schedule uses an extra register
• We assume, wlog, that scheduling precedes allocation
→ Scheduler can increase use of virtual registers

Instruction Scheduling (Operational View)

1. Rename registers into values to eliminate artificial constraints
• New name at each definition, similar to names used in LVN and SSA
• In practice, can only rename "local" names — not LIVE on exit

COMP 506, Rice University 9

The Original Code

a: loadAI  rARP, @a ⇒ r1
b: add     r1, r1   ⇒ r1
c: loadAI  rARP, @b ⇒ r2
d: mult    r1, r2   ⇒ r1
e: loadAI  rARP, @c ⇒ r2
f: mult    r1, r2   ⇒ r1
g: loadAI  rARP, @d ⇒ r2
h: mult    r1, r2   ⇒ r1
i: storeAI r1       ⇒ r0, @a

Instruction Scheduling (Operational View)


COMP 506, Rice University 10


After Renaming

a: loadAI  rARP, @a ⇒ r1
b: add     r1, r1   ⇒ r2
c: loadAI  rARP, @b ⇒ r3
d: mult    r2, r3   ⇒ r4
e: loadAI  rARP, @c ⇒ r5
f: mult    r4, r5   ⇒ r6
g: loadAI  rARP, @d ⇒ r7
h: mult    r6, r7   ⇒ r8
i: storeAI r8       ⇒ r0, @a
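The renaming step can be sketched as a single forward pass over the block. This is a hypothetical illustration, not the course's implementation — the tuple encoding of operations and the name `rename_block` are assumptions:

```python
def rename_block(block):
    """Give each definition a fresh register name, LVN/SSA-style, and
    rewrite later uses to the defining op's new name.  Local renaming
    only: names LIVE on exit would have to keep their original register.
    block: list of (opcode, sources, destination-or-None) tuples."""
    current = {}          # original register -> its most recent new name
    fresh = 0
    out = []
    for opcode, srcs, dst in block:
        srcs = tuple(current.get(s, s) for s in srcs)  # rewrite uses first
        if dst is not None:
            fresh += 1
            current[dst] = f"r{fresh}"                 # fresh name per def
            dst = current[dst]
        out.append((opcode, srcs, dst))
    return out

# the first four ops of the running example
code = [("loadAI", ("rARP",), "r1"),
        ("add",    ("r1", "r1"), "r1"),
        ("loadAI", ("rARP",), "r2"),
        ("mult",   ("r1", "r2"), "r1")]
renamed = rename_block(code)
```

On these four ops the add now defines r2 and the mult reads r2 and r3 and defines r4, matching the renamed code above; rARP is never renamed because the block never defines it.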

2. Build a dependence graph to capture the flow of values in the code
• Nodes n ∈ D represent operations with type(n) and delay(n)
• An edge e = (n1,n2) ∈ D iff n1 uses the result of n2 (runs from use to def)


Instruction Scheduling (Operational View)

COMP 506, Rice University 11

The Dependence Graph

[Graph: nodes a–i, with edges running from each use to its definition — b→a; d→b, d→c; f→d, f→e; h→f, h→g; i→h. These are true dependences, or flow dependences.]

Some authors prefer the term "precedence graph" over "dependence graph." In the context of scheduling, they are interchangeable.

2. Build a dependence graph to capture the flow of values in the code
• The final schedule must honor each dependence in the input code
♦ Values computed before they are used & used before they are overwritten
• Renaming lets the scheduler ignore some anti-dependences


Instruction Scheduling (Operational View)

COMP 506, Rice University 12

The Dependence Graph

[Graph: the same nodes and flow dependences, now also showing the anti-dependences that arise because the original code reuses r1 and r2.]

Instruction Scheduling (Operational View)


COMP 506, Rice University 13

The Dependence Graph

[Graph: after renaming, only the flow dependences remain — the anti-dependences are gone.]

The Renamed Code is the nine-operation block shown earlier, in which each definition targets a fresh register (r1–r8).

The Scheduler Needs Some Additional Edges

To ensure the correct flow of values, the scheduler needs edges that specify the relative ordering of loads, stores, and outputs

COMP 506, Rice University 14

First op         Second op        Same address   Distinct addresses   Unknown address(es)   Acronym
load             load              — no conflict (after renaming) —                            —
load or output   store            conflict       no conflict          conflict              WAR
store            store            conflict       no conflict          conflict              WAW
store            load or output   conflict       no conflict          conflict              RAW
output           output            — need an edge to serialize the outputs —                   —

"conflict" implies the 1st op must complete before the 2nd op issues ⇒ an edge in the dependence graph

Computer architects also worry about these conflicts. The acronyms arise in discussions about memory system design.
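The table can be read as a small classifier. A sketch — the function name and the `None`-for-unknown encoding are assumptions, not part of the lecture:

```python
def hazard(first, second, same_address):
    """Classify a pair of memory ops per the table above.
    same_address: True, False, or None for unknown address(es).
    Returns the hazard acronym, 'serialize' for output/output, or None."""
    if (first, second) == ("output", "output"):
        return "serialize"        # outputs always need a serializing edge
    if same_address is False:
        return None               # distinct addresses never conflict
    if first in ("load", "output") and second == "store":
        return "WAR"
    if first == "store" and second == "store":
        return "WAW"
    if first == "store" and second in ("load", "output"):
        return "RAW"
    return None                   # load after load: no conflict once renamed
```

Any pair that classifies as a conflict becomes an edge in the dependence graph, forcing the first op to complete before the second issues.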

Dependence

Dependence relations among memory operations are more complex

COMP 412, Fall 2017 15

[Dependence graph, one node per op with its priority tuple; edges carry the value name and its latency:]

1: loadI 8      ⇒ vr3   prio: <12,12,4>
2: loadI 12     ⇒ vr4   prio: <12,12,4>
3: add vr3, vr4 ⇒ vr0   prio: <11,11,3>
4: load vr0     ⇒ vr1   prio: <10,10,2>
5: load vr3     ⇒ vr2   prio: <10,10,2>
6: store vr1    ⇒ vr0   prio: <5,5,1>
7: output 12            prio: <0,0,0>

[The graph also contains I/O edges (weights 1 and 5) that serialize the memory operations — "What is this edge?"]

Scheduled Code (2 Functional Units)

1  loadI 8     ⇒ r1
2  loadI 12    ⇒ r2
3  add r1, r2  ⇒ r3
4  load r3     ⇒ r4
5  load r1     ⇒ r5
6  store r4    ⇒ r3
7  output 12

Instruction Scheduling (Operational View)

2. Build a dependence graph to capture the flow of values in the code

COMP 506, Rice University 16

Building the Graph

Create an empty map, M
walk the block, top to bottom
  at each operation o that defines VRi: (1)
    create a node for o
    set M(VRi) to the node o
    for each VRj used in o, add an edge from node o to the node in M(VRj)
    if o is a load, store, or output operation, add edges to ensure
      serialization of memory ops (2)

Explanatory Notes
1. 'o' refers to both the operation in the original block and the node that represents it in the graph. The meaning should be clear by context.
2. I/O operations need additional edges for synchronization. At a minimum:
• load & output need an edge to the most recent store (anti-dependence)
• output needs an edge to the most recent output (serialization)
• store needs an edge to the most recent store, as well as each previous load & output (serialization)

Complexity: O(n + m²), where m is |memory ops|

(Reminds me of LVN)
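The construction might look like the sketch below, under the same hypothetical tuple IR as before. The serialization rules follow the explanatory notes, slightly simplified: each store is ordered after all prior loads, not just those since the last store.

```python
LOADS, STORES, OUTPUTS = {"load", "loadAI"}, {"store", "storeAI"}, {"output"}

def build_graph(block):
    """Map-based construction: M tracks the node that currently defines
    each register; edges run from use to def; memory ops get the extra
    serializing edges described in the notes above.
    block: list of (label, opcode, srcs, dst-or-None)."""
    M, edges = {}, set()
    loads, stores, outputs = [], [], []
    for label, opcode, srcs, dst in block:
        for r in srcs:                          # edge from this use to its def
            if r in M:
                edges.add((label, M[r]))
        if dst is not None:
            M[dst] = label
        if opcode in LOADS | OUTPUTS and stores:
            edges.add((label, stores[-1]))      # after the most recent store
        if opcode in OUTPUTS:
            if outputs:
                edges.add((label, outputs[-1])) # serialize the outputs
            outputs.append(label)
        if opcode in STORES:
            for prev in stores[-1:] + loads + outputs:
                edges.add((label, prev))        # after prior store, loads, outputs
            stores.append(label)
        if opcode in LOADS:
            loads.append(label)
    return edges

# the renamed block from the running example
block = [("a", "loadAI", ("rARP",), "r1"),
         ("b", "add",    ("r1", "r1"), "r2"),
         ("c", "loadAI", ("rARP",), "r3"),
         ("d", "mult",   ("r2", "r3"), "r4"),
         ("e", "loadAI", ("rARP",), "r5"),
         ("f", "mult",   ("r4", "r5"), "r6"),
         ("g", "loadAI", ("rARP",), "r7"),
         ("h", "mult",   ("r6", "r7"), "r8"),
         ("i", "storeAI", ("r8",), None)]
edges = build_graph(block)
```

On the example this yields the flow edges of the earlier graph plus serializing edges from the final store back to each earlier load.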

Instruction Scheduling (Operational View)

3. Compute a priority function on the nodes of the graph
• The priority function should reflect the importance of the operation to the final schedule (whatever that means)
♦ Latency-weighted distance to a root is a classic priority function

COMP 506, Rice University 17


The Dependence Graph

[Graph: a, c, e, and g are leaves; i is the root.]

Instruction Scheduling (Operational View)


COMP 506, Rice University 18


The Dependence Graph

Latency-weighted distance to root, for each node:

a: 13   b: 10   c: 12   d: 9   e: 10   f: 7   g: 8   h: 5   i: 3

Op               Latency
loads & stores   3 cycles
multiplies       2 cycles
all others       1 cycle
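Latency-weighted distance to a root can be computed with a memoized backward walk over the graph. A sketch over the flow edges of the running example — edges run use→def as earlier, and the names are illustrative:

```python
def priorities(nodes, edges, delay):
    """prio(n) = delay(n) if nothing uses n's result (a root), else
    delay(n) + max prio(u) over the ops u that use n (edges run u -> n)."""
    users = {}
    for u, d in edges:
        users.setdefault(d, []).append(u)
    prio = {}
    def p(n):
        if n not in prio:
            prio[n] = delay[n] + max((p(u) for u in users.get(n, ())), default=0)
        return prio[n]
    for n in nodes:
        p(n)
    return prio

# delays per the latency table; flow edges of the running example
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3, "f": 2, "g": 3, "h": 2, "i": 3}
flow = {("b", "a"), ("d", "b"), ("d", "c"), ("f", "d"), ("f", "e"),
        ("h", "f"), ("h", "g"), ("i", "h")}
prio = priorities("abcdefghi", flow, delay)
```

This reproduces the numbers on the slide: a gets 13, c gets 12, the root i gets its own latency, 3.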

Instruction Scheduling (Operational View)

4. Schedule the operations to reflect dependences and priorities

COMP 506, Rice University 19

Cycle ← 1
Ready ← leaves of D
Active ← ∅

while (Ready ∪ Active ≠ ∅)
  for each functional unit, f, do
    if ∃ 1 or more ops in Ready for f then
      remove highest priority op for f from the Ready queue
      S(op) ← Cycle
      Active ← Active ∪ {op}
  Cycle ← Cycle + 1
  for each op in Active
    if (S(op) + delay(op) ≤ Cycle) then        // op has completed execution
      remove op from Active
      for each successor s of op in D
        if (s is ready) then                   // successor's operands are "ready"
          Ready ← Ready ∪ {s}

Notes:
• This is a greedy, heuristic, list-scheduling algorithm
• Implement Ready as a priority queue; remove in priority order (break ties)
• Need a sliding window to see the constrained operations
• If we add edges to synchronize loads and stores, the readiness test must also check successors along those edges
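The pseudocode might be realized as follows — a simplified sketch assuming identical functional units, one heap of ready ops, flow edges only, and the data encodings from the earlier sketches:

```python
import heapq

def list_schedule(nodes, edges, delay, prio, units=1):
    """Greedy forward list scheduling.  edges run from use to def, so an
    op becomes ready once every op it depends on has completed."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for u, d in edges:
        preds[u].add(d)
        succs[d].add(u)
    ready = [(-prio[n], n) for n in nodes if not preds[n]]  # the leaves
    heapq.heapify(ready)                 # Ready as a priority queue
    active, done, S = [], set(), {}
    cycle = 1
    while ready or active:
        for _ in range(units):           # fill each unit, highest prio first
            if ready:
                _, op = heapq.heappop(ready)
                S[op] = cycle
                active.append(op)
        cycle += 1
        for op in list(active):          # has op completed execution?
            if S[op] + delay[op] <= cycle:
                active.remove(op)
                done.add(op)
                for s in succs[op]:      # operands ready -> onto Ready
                    if s not in S and preds[s] <= done:
                        heapq.heappush(ready, (-prio[s], s))
    return S

delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3, "f": 2, "g": 3, "h": 2, "i": 3}
prio  = {"a": 13, "b": 10, "c": 12, "d": 9, "e": 10,
         "f": 7, "g": 8, "h": 5, "i": 3}
flow  = {("b", "a"), ("d", "b"), ("d", "c"), ("f", "d"), ("f", "e"),
         ("h", "f"), ("h", "g"), ("i", "h")}
S = list_schedule("abcdefghi", flow, delay, prio)
```

With one unit this issues a, c, e, b, d, g, f in cycles 1–7, h in cycle 9, and i in cycle 11 — the trace the next slides walk through.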

Instruction Scheduling (Operational View)

4. Schedule the operations to reflect dependences and priorities
• Use a greedy heuristic technique, such as list scheduling

COMP 506, Rice University 20

The Schedule / Dependence Graph

[Dependence graph with its latency-weighted priorities, as on the previous slide.]

Schedule: cycles 1–11, all empty
Ready:  <a,13>, <c,12>, <e,10>, <g,8>
Active: —

Start with the operations that are ready to schedule (the leaves)
— take the highest priority op in Ready & schedule it into the next slot
— place the newly scheduled op in Active
— remove completed ops from Active and visit ops that depend on them; if they are ready, move them onto the Ready queue
repeat until done

COMP 506, Rice University 21

Schedule: 1 a
Ready:  <c,12>, <e,10>, <g,8>
Active: <a,#4>

COMP 506, Rice University 22

Schedule: 1 a, 2 c
Ready:  <e,10>, <g,8>
Active: <a,#4>, <c,#5>

COMP 506, Rice University 23

Schedule: 1 a, 2 c, 3 e
Ready:  <g,8>, <b,10>
Active: <a,#4>, <c,#5>, <e,#6>


COMP 506, Rice University 25

Schedule: 1 a, 2 c, 3 e, 4 b
Ready:  <g,8>
Active: <c,#5>, <e,#6>, <b,#5>

COMP 506, Rice University 26

Schedule: 1 a, 2 c, 3 e, 4 b
Ready:  <g,8>, <d,9>
Active: <c,#5>, <e,#6>, <b,#5>

COMP 506, Rice University 27

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d
Ready:  <g,8>
Active: <e,#6>, <d,#7>


COMP 506, Rice University 29

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g
Ready:  —
Active: <g,#9>, <d,#7>

COMP 506, Rice University 30

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g
Ready:  <f,7>
Active: <g,#9>, <d,#7>

COMP 506, Rice University 31

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f
Ready:  —
Active: <g,#9>, <f,#9>


COMP 506, Rice University 33

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop   (nothing ready)
Ready:  —
Active: <g,#9>, <f,#9>

COMP 506, Rice University 34

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop
Ready:  <h,5>
Active: <g,#9>, <f,#9>

COMP 506, Rice University 35

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h
Ready:  —
Active: <h,#11>


COMP 506, Rice University 37

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop   (nothing ready)
Ready:  —
Active: <h,#11>

COMP 506, Rice University 38

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop
Ready:  <i,3>
Active: <h,#11>

COMP 506, Rice University 39

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop, 11 i
Ready:  —
Active: —

COMP 506, Rice University 40

Final Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop, 11 i
Ready:  —
Active: —

And, the ready queue and the active queue are empty, at last … so, it halts.

Same schedule as on slide 5.

Improvements

COMP 506, Rice University 41


Priorities

The problem is NP-Complete
• No deterministic technique wins
• Different priority functions can have different effects
♦ Latency-weighted depth
♦ Most descendants in the graph
♦ Breadth-first numbering
♦ Depth-first numbering
♦ Boost priority of any op that can only run on one of the units
♦ Boost priority of loads over stores

Cannot make up your mind?
• Try a linear combination of two priority functions (α × prio1 + β × prio2)
• Can try α/β tuning — restrict α + β = 1 and perform a parameter sweep over values of α & β
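With α + β = 1, the tuning reduces to a one-dimensional search. A minimal sketch — the `evaluate` callback, which would run the scheduler and return the schedule length for a given priority map, is hypothetical:

```python
def sweep(prio1, prio2, evaluate, steps=10):
    """Restrict alpha + beta = 1 and sweep alpha over [0, 1], keeping
    the blend that yields the shortest schedule."""
    best = None
    for i in range(steps + 1):
        alpha = i / steps
        blend = {n: alpha * prio1[n] + (1 - alpha) * prio2[n] for n in prio1}
        length = evaluate(blend)     # e.g., cycles used by list scheduling
        if best is None or length < best[0]:
            best = (length, alpha)
    return best

# toy check: if the evaluator just reads one op's blended priority,
# the sweep finds the alpha that minimizes it
best = sweep({"x": 2.0}, {"x": 1.0}, lambda p: p["x"])
```

In practice `evaluate` would invoke the list scheduler with the blended priorities and return the cycle count of the resulting schedule.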

Improvements

The problem is NP-Complete
• No deterministic technique wins
• When 2 or more ops have the same priority, use another criterion to break the tie
♦ Any of the suggested priority functions might work
♦ Prefer multi-cycle operations
♦ Prefer more successors
♦ Balance progress on all paths
♦ Prefer operations on critical paths

Can also try random tie breaking, although you need to run the scheduler multiple times & keep the best schedule to see good results.

COMP 506, Rice University 42


Tie breaking

Classic reference is “Efficient Instruction Scheduling for a Pipelined Architecture,” P.B. Gibbons and S.S. Muchnick, ACM SIGPLAN 86 Conference on Compiler Construction.

More List Scheduling

List scheduling algorithms fall into two distinct classes

The algorithm presented in this lecture is a forward scheduling algorithm
• Schedules the first cycle in the block before the second, before the third, …
• A backward scheduling algorithm starts at the last cycle, then schedules the second last, then the third last, ...
• An op only comes off of the active queue onto the ready queue when all of its successors have been scheduled and its own latency has been covered.

COMP 506, Rice University 43

Forward list scheduling
• Start with available operations
• Work forward in time
• Ready ⇒ all operands available

Backward list scheduling
• Start with no successors
• Work backward in time
• Ready ⇒ latency covers uses

Again, classic reference is “Efficient Instruction Scheduling for a Pipelined Architecture,” P.B. Gibbons and S.S. Muchnick, ACM SIGPLAN 86 Conference on Compiler Construction.

More List Scheduling

Forward and backward scheduling can produce different results

COMP 506, Rice University 44

Block from SPEC benchmark “go”

Operation   load   loadI   add   addI   store   cmp
Latency       1      1      2      1      4      1

[Dependence graph: cbr is the root; above it cmp and store1–store5; above those add1–add4 and addI; loadI1, lshift, and loadI2–loadI4 are the leaves. The number on each node is its latency to the cbr: cbr 1, cmp 2, each store 5, add1–add4 7, addI 6, each leaf 8. Subscripts identify the distinct ops.]

More List Scheduling

COMP 506, Rice University 45

Using "latency to root" as the priority function; assume 2 identical ALUs & 1 load/store unit

Forward Schedule

      Int      Int      Mem
 1   loadI1   lshift
 2   loadI2   loadI3
 3   loadI4   add1
 4   add2     add3
 5   add4     addI    store1
 6   cmp              store2
 7                    store3
 8                    store4
 9                    store5
10
11
12
13   cbr

Backward Schedule

      Int      Int      Mem
 1   loadI4
 2   addI     lshift
 3   add4     loadI3
 4   add3     loadI2   store5
 5   add2     loadI1   store4
 6   add1              store3
 7                     store2
 8                     store1
 9
10
11   cmp
12   cbr

1 fewer cycle in the backward schedule