Spring 2014Jim Hogg - UW - CSE - P501R-1 CSE P501 – Compiler Construction Available Expressions...


Spring 2014 Jim Hogg - UW - CSE - P501 R-1

CSE P501 – Compiler Construction

Available Expressions

Dataflow Analysis

Aliasing


The Story So Far…

Redundant expression elimination:
• Local Value Numbering (LVN)
• Super-local Value Numbering (SVN)
  • Extends LVN to EBBs
  • SSA-like namespace
• Dominator Value Numbering (DVN)

All of these propagate along forward edges

None are global
• In particular, none can handle back edges and loops


Dominator Value Numbering

A:  m0 = a0 + b0
    n0 = a0 + b0
B:  p0 = c0 + d0
    r0 = c0 + d0
C:  q0 = a0 + b0
    r1 = c0 + d0
D:  e0 = b0 + 18
    s0 = a0 + b0
    u0 = e0 + f0
E:  e1 = a0 + 17
    t0 = c0 + d0
    u1 = e1 + f0
F:  e2 = Φ(e0,e1)
    u2 = Φ(u0,u1)
    v0 = a0 + b0
    w0 = c0 + d0
    x0 = e2 + f0
G:  r2 = Φ(r0,r1)
    y0 = a0 + b0
    z0 = c0 + d0

• Most sophisticated algorithm so far
• Still misses some opportunities
• Can't handle loops


Available Expressions

Goal: use dataflow analysis to find CSEs that span basic blocks

Idea: calculate the available expressions at the beginning of each block (rather than just the Value-Numbers for variables)

Having found an expression that is already available, there's no need to re-evaluate it: use a copy instead


Available Expressions: It's Simple!


Block 1: a=b+c; d=e+f; f=a+c
Block 2: g=a+c
Block 3: g=a+d; h=b+c
Block 4: j=a+b+c+d

b+c is available at h=b+c:
• b+c was calculated earlier, by a=b+c
• neither b nor c has been assigned-to since
• so replace h=b+c with h=a

• No Value Numbers (superscripts)
  • ie: no trying to work out whether two variables hold the same value
• No SSA Numbers (subscripts)
  • ie: no recording of the life or instantiation of each variable


“Available” and Other Terms

An expression e is defined at point p in the flowgraph if its value is computed at p
• Sometimes called a definition site, or simply "def"
• eg: x = a+b ; expression a+b is defined here

An expression e is killed at point p if one of its operands is defined at p
• Sometimes called a kill site, or simply "kill"
• eg:
    x = a+b ; def site
    b = 7   ; kill site: kills every expression involving b!

An expression e is available at point p if every path leading to p contains a prior definition of e, and e is not killed between that definition and p

Simply: an available expression is one you don't need to re-calculate


Available Expressions - Intuition


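(Figure: several flowgraph fragments, each asking whether a+b is available at the point marked "a+b?"; some paths contain a definition "=a+b", and in one case an assignment "a=" follows the definition)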

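For a+b to be available at "a+b?", a definition of a+b must reach that point: every path from the start to "a+b?" must include a def of a+b, with no later assignment to a or b. Any assignment to a or b kills that available expression, wherever it appears in the procedure!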


Available Expressions: Flowgraph


Block 1: a=b+c; d=e+f; f=a+c
Block 2: g=a+c
Block 3: g=a+d; h=b+c
Block 4: j=a+b


Number the Expressions


Block 1: a=b+c (1); d=e+f (2); f=a+c (3)
Block 2: g=a+c (4)
Block 3: g=a+d (5); h=b+c (6)
Block 4: j=a+b (7)

• Start by assigning (arbitrary) numbers to every expression in the function

• Pay no attention to what each expression is, just number it!

• Implementation: a map between expression number and location - eg, expression #6 = instruction #3 in basic-block #4
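A minimal Python sketch of the numbering step. It assumes a hypothetical IR in which a function is a list of basic blocks (in the order the slide lists them) and each instruction is a (target, expression) pair of strings; number_expressions is an illustrative name, not part of any real compiler.

```python
# Hypothetical IR: a function is a list of basic blocks, and each block is a
# list of (target, expression) pairs, eg ("a", "b+c") for a=b+c.
Function = list[list[tuple[str, str]]]


def number_expressions(function: Function) -> dict[int, tuple[int, int]]:
    """Number every expression 1, 2, ... and map each number to its location.

    The location is (basic-block number, instruction number), both counted
    from 1. What the expression computes is deliberately ignored.
    """
    locations: dict[int, tuple[int, int]] = {}
    number = 1
    for block_no, block in enumerate(function, start=1):
        for instr_no, _instr in enumerate(block, start=1):
            locations[number] = (block_no, instr_no)
            number += 1
    return locations


# The four basic blocks of the example flowgraph.
example: Function = [
    [("a", "b+c"), ("d", "e+f"), ("f", "a+c")],
    [("g", "a+c")],
    [("g", "a+d"), ("h", "b+c")],
    [("j", "a+b")],
]
locations = number_expressions(example)
print(locations[6])  # -> (3, 2): expression #6, h=b+c, is instruction 2 of block 3
```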


def & kill for each Instruction


Block 1: a=b+c (1); d=e+f (2); f=a+c (3)
Block 2: g=a+c (4)
Block 3: g=a+d (5); h=b+c (6)
Block 4: j=a+b (7)

instruction   def    kill
a=b+c         {1}    {3,4,5,7}
d=e+f         {2}    {5}
f=a+c         {3}    {2}

Eg: a = b + c
• defs the expression b+c
• kills every expression that uses a

Likewise, f = a + c kills expression 2 (e+f), because it assigns to f

Summarize DEF & KILL for Basic Block

DEF = {}; foreach instruction i: DEF = (DEF ∪ def_i) - kill_i
KILL = {}; foreach instruction i: KILL = KILL ∪ kill_i

def and kill ~ instruction; DEF and KILL ~ basic block

For Block 1, DEF grows from {1} to {1,2}, and then to ({1,2} ∪ {3}) - {2} = {1,3}: expression 2 (e+f) is killed by the later f=a+c, so only 1 and 3 survive to the end of the block. KILL = {3,4,5,7} ∪ {5} ∪ {2} = {2,3,4,5,7}.

Summarize DEF & KILL for Flowgraph

block     def          kill              DEF      KILL
Block 1   {1}{2}{3}    {3,4,5,7}{5}{2}   {1,3}    {2,3,4,5,7}
Block 2   {4}          {}                {4}      {}
Block 3   {5}{6}       {}{}              {5,6}    {}
Block 4   {7}          {}                {7}      {}
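A sketch of the per-instruction def and kill sets, and of folding them into a block's DEF and KILL with DEF = (DEF ∪ def_i) - kill_i and KILL = KILL ∪ kill_i. It assumes each numbered expression is encoded as a hypothetical (number, target, left, right) tuple; the helper names are illustrative. Note that f=a+c kills expression 2 (e+f), since it assigns to f.

```python
# Hypothetical encoding of each numbered expression:
# (expression number, target, left operand, right operand), eg a=b+c is (1, "a", "b", "c").
Instr = tuple[int, str, str, str]

FUNCTION: list[list[Instr]] = [
    [(1, "a", "b", "c"), (2, "d", "e", "f"), (3, "f", "a", "c")],
    [(4, "g", "a", "c")],
    [(5, "g", "a", "d"), (6, "h", "b", "c")],
    [(7, "j", "a", "b")],
]
ALL_INSTRS = [instr for block in FUNCTION for instr in block]


def def_of(instr: Instr) -> set[int]:
    """An instruction defs exactly the expression it computes."""
    return {instr[0]}


def kill_of(instr: Instr) -> set[int]:
    """An instruction kills every expression in the function that uses its target."""
    target = instr[1]
    return {num for num, _, left, right in ALL_INSTRS if target in (left, right)}


def summarize(block: list[Instr]) -> tuple[set[int], set[int]]:
    """Fold per-instruction def/kill sets into the block's DEF and KILL."""
    defs: set[int] = set()
    kills: set[int] = set()
    for instr in block:
        defs = (defs | def_of(instr)) - kill_of(instr)  # DEF = (DEF ∪ def_i) - kill_i
        kills |= kill_of(instr)                         # KILL = KILL ∪ kill_i
    return defs, kills


for block in FUNCTION:
    defs, kills = summarize(block)
    print(sorted(defs), sorted(kills))
# [1, 3] [2, 3, 4, 5, 7]
# [4] []
# [5, 6] []
# [7] []
```

Processing the instructions in order matters: an expression killed early in the block and recomputed later (such as a+c here) still ends up in DEF.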


Available Expression Sets

For each block b, define:
• DEF(b) – the set of expressions defined in b and not subsequently killed in b
  • ie: defined in b and survives to its end
  • can construct this by inspecting b in isolation - never changes
• KILL(b) – the set of all expressions in the entire procedure that are killed in b
  • can construct this by inspecting b in isolation - never changes
• AVAIL(b) – the set of expressions available on entry to b
  • find by solving a set of equations

Implementation: assign a number to each expression and track its availability via one bit in a (large) bit-vector representing the set of all expressions in the function
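A small sketch of the bit-vector idea, using a Python int as the (large) bit-vector: bit k stands for expression k+1, so union, intersection and difference become OR, AND and AND-NOT. The sets below are illustrative values, not taken from the slides, and to_bits/from_bits are hypothetical helper names.

```python
# A minimal bit-vector set over expression numbers 1..N, stored in one Python int:
# bit k (counting from 0) is set when expression k+1 is in the set.


def to_bits(exprs: set[int]) -> int:
    bits = 0
    for n in exprs:
        bits |= 1 << (n - 1)
    return bits


def from_bits(bits: int) -> set[int]:
    return {k + 1 for k in range(bits.bit_length()) if (bits >> k) & 1}


N = 7                         # number of expressions in the function
U = (1 << N) - 1              # universal set {1..7}: all N bits set
DEF = to_bits({1, 6})         # illustrative sets, not taken from the slides
KILL = to_bits({3, 4, 5, 7})
AVAIL = U

# DEF ∪ (AVAIL - KILL) becomes OR, AND and NOT on whole machine words.
out = DEF | (AVAIL & ~KILL)
print(sorted(from_bits(out)))  # -> [1, 2, 6]
```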


Computing Available Expressions

• preds(b) is the set of b’s predecessors in the flowgraph
• works for all flows, including loops
• defines a system of simultaneous equations – a dataflow problem

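$$\mathrm{AVAIL}(b) = \bigcap_{p \in \mathrm{preds}(b)} \Big( \mathrm{DEF}(p) \cup \big( \mathrm{AVAIL}(p) - \mathrm{KILL}(p) \big) \Big)$$

(Figure: block b with predecessors p1, p2, p3; each predecessor p contributes DEF(p) ∪ (AVAIL(p) - KILL(p)) to AVAIL(b))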


Computing Available Expressions

Given DEFb and KILLb for each basic block, b, in the procedure:

foreach block b { set AVAILb = {all expressions in function} = U }
set AVAILentry = {}            // nothing is available on entry; the entry block keeps this value
worklist = {all blocks in function}
while (worklist ≠ {}) {
    remove a block b from worklist
    recompute AVAILb
    if AVAILb changed {
        worklist = worklist ∪ successorsb
    }
}
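A runnable sketch of the worklist algorithm above. The flowgraph edges (Block 1 branching to Blocks 2 and 3, which join at Block 4) are an assumption, since the slide's arrows did not survive extraction; the DEF/KILL sets follow from applying the per-instruction rule to the four blocks (f=a+c kills e+f). available_expressions is a hypothetical name.

```python
from collections import deque


def available_expressions(blocks, succs, entry, defs, kills, universe):
    """Worklist solver for AVAIL(b) = ∩ over p in preds(b) of DEF(p) ∪ (AVAIL(p) - KILL(p))."""
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}
    avail = {b: set(universe) for b in blocks}   # optimistic start: U everywhere...
    avail[entry] = set()                         # ...except entry: nothing available yet
    worklist = deque(blocks)
    queued = set(blocks)
    while worklist:
        b = worklist.popleft()
        queued.discard(b)
        if b == entry:
            continue                             # AVAIL(entry) stays {}
        new = set(universe)
        for p in preds[b]:
            new &= defs[p] | (avail[p] - kills[p])
        if new != avail[b]:
            avail[b] = new
            for s in succs[b]:                   # only successors can be affected
                if s not in queued:
                    worklist.append(s)
                    queued.add(s)
    return avail


# Example flowgraph; the edges are an assumption (B1 branches to B2/B3, which join at B4).
blocks = ["B1", "B2", "B3", "B4"]
succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
DEF = {"B1": {1, 3}, "B2": {4}, "B3": {5, 6}, "B4": {7}}
KILL = {"B1": {2, 3, 4, 5, 7}, "B2": set(), "B3": set(), "B4": set()}
avail = available_expressions(blocks, succs, "B1", DEF, KILL, range(1, 8))
print({b: sorted(s) for b, s in avail.items()})
# {'B1': [], 'B2': [1, 3], 'B3': [1, 3], 'B4': [1, 3]}
```

Expression 1 (b+c) is available on entry to B3, which is what justifies rewriting h=b+c as h=a.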


Name Space?

In previous value-numbering algorithms, we used an SSA-like renaming to keep track of versions

In global dataflow problems, we use the original namespace
• we require that a+b have the same value along all paths to its use
• if a or b is updated along any path to its use, then a+b has the 'wrong' value, so we must recalculate its value
• so original names are exactly what we want

KILL captures when an expression becomes no longer "available"


Global CSE using Available Expressions

Phase I
• Number each expression in the procedure
• For each block b, compute DEFb and KILLb - once off
• Initialize AVAILb = {all expressions in procedure} = U (except the entry block, where AVAIL = {})
• For each block b, compute AVAILb by iterating until a fixed point is reached

Phase II
• For each block b, value-number the block starting with AVAILb
• Replace expressions in AVAILb with references to the previously computed values

Also called "Global Redundancy Elimination" or GRE


Comparing Algorithms

A:  m = a + b
    n = a + b
B:  p = c + d
    r = c + d
C:  q = a + b
    r = c + d
D:  e = b + 18
    s = a + b
    u = e + f
E:  e = a + 17
    t = c + d
    u = e + f
F:  v = a + b
    w = c + d
    x = e + f
G:  y = a + b
    z = c + d

LVN – Local Value Numbering
SVN – Superlocal Value Numbering
DVN – Dominator-based Value Numbering
GRE – Global Redundancy Elimination


Comparing Algorithms (2)

• LVN <= SVN <= DVN form a strict hierarchy
  • later algorithms find a superset of previous information
• Global Redundancy Elimination, via Available Expressions, finds a different set
  • Discovers e+f in F (computed in both D and E)
  • Misses identical values if they have different names
    • eg: a+b and c+d when a=c and b=d
    • Value Numbering catches this

D:  e = b + 18
    s = a + b
    u = e + f
E:  e = a + 17
    t = c + d
    u = e + f
F:  v = a + b
    w = c + d
    x = e + f


Scope of Analysis

• Larger context (EBBs, regions, global, inter-procedural) may help
  • More opportunities for optimizations
• But not always
  • Introduces uncertainties about flow of control
  • Usually only allows weaker analysis
  • Sometimes has unwanted side effects
    • Can create additional pressure on registers, for example


Code Replication

• Sometimes replicating code increases opportunities
  • modify code to create larger regions with simple control flow
• Two examples
  • Cloning
  • Inline substitution


Cloning: Before & After

Original (blocks laid out as: A; B C; D E; F; G):
A:  m = a + b; n = a + b
B:  p = c + d; r = c + d
C:  q = a + b; r = c + d
D:  e = b + 18; s = a + b; u = e + f
E:  e = a + 17; t = c + d; u = e + f
F:  v = a + b; w = c + d; x = e + f
G:  y = a + b; z = c + d

Cloned: the same code, with extra copies of F (v = a + b; w = c + d; x = e + f) and G (y = a + b; z = c + d), so that the cloned blocks extend the straight-line paths that reach them

• Even LVN can optimize these larger basic blocks
• Larger code size => increased I-cache pressure


Inline Substitution ("inlining")

Calling a function can be expensive!
• Global optimizer must assume the callee can modify all reachable data:
  • In MiniJava, all fields of all objects
  • In C/C++, additionally all "global" data
• So the call kills many available expressions
• Must save/restore caller-saved registers across the call
• Calling the function imposes its own overhead

Solution
• Inlining: replace each call with a copy of the callee
• Introduces more opportunities for optimization


Inline Substitution - "inlining"

• Eliminates call overhead
• Opens opportunities for more optimizations
• Can be applied to large method bodies too
• Aggressive optimizer will inline 2 or more deep
• Increases total code size
• With care, is a huge win for OO code
• Recompile if caller or callee changes!

Class with trivial getter:
    class C { int x; int getx() { return x; } }

Method f calls getx:
    class X { void f() { C c = new C(); int total = c.getx() + 42; } }

Compiler inlines body of getx into f:
    class X { void f() { C c = new C(); int total = c.x + 42; } }

Dataflow Analysis

• "Available Expressions" is a first example of dataflow analysis
  • It supports the optimization called "Global Redundancy Elimination", or GRE
• Many similar problems can be expressed in the same framework
• No limit to the number of execution paths thru a function
• No limit to the length of an execution path
• And yet, Dataflow Analysis infers a finite number of facts about the function
• Dataflow Analysis does not distinguish among the paths taken to any point
  • eg: it assumes both arms of an IF-THEN-ELSE can be taken
• We then use these facts to transform and optimize the IR
• Example facts about a single function:
  • Variable x has the constant value 42 at every point
  • At point p, variable x has the same value as variable y
  • At point p, the value of x could have been defined only at point q
  • At point p, the value of x is no longer required


Dataflow Equations - Overview


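• Available Expressions
  $$\mathrm{AVAILIN}_b = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}_p \cup (\mathrm{AVAILIN}_p - \mathrm{KILL}_p) \big)$$
• Live Variables
  $$\mathrm{LIVEIN}_b = \mathrm{USE}_b \cup (\mathrm{LIVEOUT}_b - \mathrm{DEF}_b)$$
• Reaching Defs
  $$\mathrm{REACH}_b = \bigcup_{p \in \mathrm{preds}(b)} \big( \mathrm{DEFOUT}_p \cup (\mathrm{REACH}_p \cap \mathrm{SURVIVED}_p) \big)$$
• Anticipatable (Very Busy) Expressions
  $$\mathrm{ANTIC}_b = \bigcap_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}_s \cup (\mathrm{ANTIC}_s - \mathrm{KILLED}_s) \big)$$
• Generic
  $$\mathrm{OUT}_b = \mathrm{GEN}_b \cup (\mathrm{IN}_b - \mathrm{KILL}_b)$$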


Dataflow Analysis

Set of techniques for compile-time reasoning about runtime values

• Need to build a graph
  • Trivial for basic blocks
  • Flowgraph for whole-function (global) analysis
  • Callgraph for whole-program analysis
• Limitations
  • Assumes all paths are taken (eg: both arms of IF-THEN-ELSE)
  • Infers facts about a function, rather than actual runtime values
    • eg: x+y is redundant
  • Arrays – treats array as one variable
    • eg: don't know, in general, whether a[i] == a[j]
  • Pointers – difficult, expensive to analyze
    • eg: *p = 1; *q = 2; return *p; // same as return 1?

Recap: Available Expressions

Same analysis we did earlier to eliminate redundant expressions

(Figure: block b with predecessors p1, p2, p3; each predecessor p contributes DEFp ∪ (AVAILp - KILLp))

$$\mathrm{AVAIL}_b = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}_p \cup (\mathrm{AVAIL}_p - \mathrm{KILL}_p) \big)$$


Characterizing Dataflow Analysis

All dataflow algorithms involve sets of facts about each block b:
• INb – facts true on entry to b
• OUTb – facts true on exit from b
• GENb – facts created and not killed in b
• KILLb – facts killed in b

These are related by the equation
$$\mathrm{OUT}_b = \mathrm{GEN}_b \cup (\mathrm{IN}_b - \mathrm{KILL}_b)$$

Solve this iteratively for all blocks
• Sometimes facts propagate forward (eg: available expressions)
• Sometimes facts propagate backward (eg: live variables)

(Figure: block b, with INb entering, GENb and KILLb applied inside it, and OUTb leaving)
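A round-robin sketch of this generic framework: one solver handles forward and backward problems and either meet (union for facts that hold on some path, intersection for facts that must hold on all paths). The function name solve and its parameters are hypothetical, and it assumes the boundary blocks (entry for a forward problem, exits for a backward one) are exactly those with no incoming flow edges.

```python
def solve(blocks, succs, gen, kill, *, forward, meet, universe=()):
    """Iterate after_b = gen_b ∪ (before_b - kill_b) to a fixed point.

    'before' and 'after' follow the direction of flow: for a forward problem
    they are IN_b and OUT_b; for a backward problem they are OUT_b and IN_b.
    meet is "union" (some path) or "intersect" (all paths; pass universe).
    """
    preds = {b: [p for p in blocks if b in succs[p]] for b in blocks}
    sources = preds if forward else succs          # where each block's facts come from
    start = set() if meet == "union" else set(universe)
    before = {b: set() for b in blocks}
    after = {b: set(start) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            incoming = [after[n] for n in sources[b]]
            if not incoming:
                new_before = set()                 # boundary block: entry or exit
            elif meet == "union":
                new_before = set().union(*incoming)
            else:
                new_before = set.intersection(*incoming)
            new_after = gen[b] | (new_before - kill[b])
            if (new_before, new_after) != (before[b], after[b]):
                before[b], after[b] = new_before, new_after
                changed = True
    return before, after


# Usage: liveness for the six-block example in the following slides (backward, union,
# gen = USE, kill = DEF). The edges, including 5 -> 2 for "a < N", are an assumption.
succs = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
live_out, live_in = solve(list(succs), succs, use, defs, forward=False, meet="union")
print(sorted(live_in[1]), sorted(live_out[5]))  # -> ['c'] ['a', 'c']
```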


Live Variables (or "liveness")

A variable v is live at point p if there is any path from p to a use of v along which v is not redefined
• ie: a variable is live at a point if some later code may use the value it holds there

Some uses:
• Register allocation – only live variables need a register
• Only live variables need be stored back to memory
• Detect use of uninitialized variables - how?
• Improve SSA construction – only need Φ-function for variables that are live in a block (later)


Liveness Analysis Sets

For each block b, define the sets:

USEb = variables used (ie, read-from) in b before any def

DEFb = variables defined (ie, assigned-to) in b

INb = variables live on entry to b

OUTb = variables live on exit from b



Liveness - Intuition


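(Figure: three small cases)
• B contains a use "=x": x is "livein" to B; USEB = {x}
• B contains "x=" and its successor C contains "=x": x is "liveout" of B; DEFB = {x}, USEC = {x}
• B contains "x=" and its successor C re-defines "x=" before any use: x is not "liveout" of B; DEFB = {x}, DEFC = {x}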


Liveness Equations

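• Set INb = OUTb = {}
• Update IN and OUT until no change
• "backwards" dataflow analysis

$$\mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$$
$$\mathrm{IN}_b = \mathrm{USE}_b \cup (\mathrm{OUT}_b - \mathrm{DEF}_b)$$

(Figure: block b with successors s1 and s2; OUTb is the union of INs1 and INs2, and INb is computed from USEb, DEFb and OUTb)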


Liveness Calculation


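$$\mathrm{IN}_b = \mathrm{USE}_b \cup (\mathrm{OUT}_b - \mathrm{DEF}_b) \qquad \mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$$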

1: a = 0

2: b = a + 1

3: c = c + b

4: a = b * 2

5: a < N

6: return c

block   USE    DEF
1       -      a
2       a      b
3       b c    c
4       b      a
5       a      -
6       c      -

• Work backwards from 6 to 1


Liveness Calculation


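$$\mathrm{IN}_b = \mathrm{USE}_b \cup (\mathrm{OUT}_b - \mathrm{DEF}_b) \qquad \mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$$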

1: a = 0

2: b = a + 1

3: c = c + b

4: a = b * 2

5: a < N

6: return c

block   USE    DEF    OUT    IN
1       -      a      a c    c
2       a      b      b c    a c
3       b c    c      b c    b c
4       b      a      a c    b c
5       a      -      c      a c
6       c      -      -      c

• Work backwards from 6 to 1
• Note c is livein for block 1 - uninitialized!


Liveness Calculation


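$$\mathrm{IN}_b = \mathrm{USE}_b \cup (\mathrm{OUT}_b - \mathrm{DEF}_b) \qquad \mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$$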

1: a = 0

2: b = a + 1

3: c = c + b

4: a = b * 2

5: a < N

6: return c

                      iteration 1     iteration 2
block   USE    DEF    OUT    IN       OUT    IN
1       -      a      a c    c        a c    c
2       a      b      b c    a c      b c    a c
3       b c    c      b c    b c      b c    b c
4       b      a      a c    b c      a c    b c
5       a      -      c      a c      a c    a c
6       c      -      -      c        -      c

• Work backwards from 6 to 1
• Only change in iteration 2: a becomes liveout for block 5 (because a is livein for block 2, a successor of block 5)
• Stops changing after 2 iterations


Liveness Calculation


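$$\mathrm{IN}_b = \mathrm{USE}_b \cup (\mathrm{OUT}_b - \mathrm{DEF}_b) \qquad \mathrm{OUT}_b = \bigcup_{s \in \mathrm{succ}(b)} \mathrm{IN}_s$$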

1: a = 0

2: b = a + 1

3: c = c + b

4: a = b * 2

5: a < N

6: return c

                      iteration 1     iteration 2
block   USE    DEF    OUT    IN       OUT    IN
1       -      a      a c    c        a c    c
2       a      b      b c    a c      b c    a c
3       b c    c      b c    b c      b c    b c
4       b      a      a c    b c      a c    b c
5       a      -      c      a c      a c    a c
6       c      -      -      c        -      c

• Work backwards from 6 to 1
• Stops changing after 2 iterations
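The hand calculation above can be checked with a short sketch that visits blocks 6 down to 1, applies OUT = ∪ IN(succ) and IN = USE ∪ (OUT - DEF), and counts the passes that change something. The successor edges (1→2→3→4→5, then 5→2 for "a < N" and 5→6 to the return) are an assumption, since the slide's arrows did not survive; liveness is a hypothetical name.

```python
# The example program, one statement per block. The edges (including 5 -> 2 for
# "a < N" and 5 -> 6 for the exit) are an assumption about the slide's figure.
USE = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}
DEF = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
SUCCS = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}


def liveness(order):
    """Update OUT and IN in the given block order until a full pass changes nothing.

    Returns (IN, OUT, number of passes that changed at least one set).
    """
    live_in = {b: set() for b in USE}
    live_out = {b: set() for b in USE}
    passes = 0
    while True:
        changed = False
        for b in order:
            new_out = set().union(*(live_in[s] for s in SUCCS[b]))  # OUT_b = ∪ IN_s
            new_in = USE[b] | (new_out - DEF[b])                     # IN_b = USE_b ∪ (OUT_b - DEF_b)
            if (new_out, new_in) != (live_out[b], live_in[b]):
                live_out[b], live_in[b] = new_out, new_in
                changed = True
        if not changed:
            return live_in, live_out, passes
        passes += 1


live_in, live_out, passes = liveness([6, 5, 4, 3, 2, 1])   # work backwards from 6 to 1
for b in range(1, 7):
    print(b, "OUT", sorted(live_out[b]), "IN", sorted(live_in[b]))
print("passes that changed something:", passes)  # -> passes that changed something: 2
```

A third pass is needed only to confirm that nothing changes, which matches "stops changing after 2 iterations".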


Alternate Liveness Equations

Many problems have more than one formulation
• Different books use different sets:
  • USED[b] – variables used in b before being defined in b
  • NOTDEF[b] – variables not defined in b
  • LIVE[b] – variables live on exit from b

Equation
$$\mathrm{LIVE}[b] = \bigcup_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}[s] \cup (\mathrm{LIVE}[s] \cap \mathrm{NOTDEF}[s]) \big)$$


Reaching Defs

A definition of variable v at L1 reaches the instruction at L2 if that instruction uses v and there is a path from L1 to L2 that does not re-define v

Use:
• Find all possible defs for a variable in an expression - a great debugger plugin when looking for the 'culprit'


Equations for Reaching Defs

Sets
• DEFOUTb – set of defs in b that reach the end of b
• SURVIVEDb – set of all defs not killed by a re-def in b
• REACHb – set of defs that reach b

Equation
$$\mathrm{REACH}_b = \bigcup_{p \in \mathrm{preds}(b)} \big( \mathrm{DEFOUT}_p \cup (\mathrm{REACH}_p \cap \mathrm{SURVIVED}_p) \big)$$
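A sketch that applies the reaching-defs equation to the six-block liveness example. The definition labels d1..d4 are hypothetical, and the edges are the same assumption as for liveness (1→2→3→4→5, 5→2 for the loop, 5→6 for the exit); the solver iterates in block order until nothing changes.

```python
# The liveness example again, naming each definition (d1..d4 are hypothetical labels).
# Edges are assumed as before: 1->2->3->4->5, 5->2 (loop) and 5->6 (exit).
BLOCK_DEFS = {
    1: [("d1", "a")],   # a = 0
    2: [("d2", "b")],   # b = a + 1
    3: [("d3", "c")],   # c = c + b
    4: [("d4", "a")],   # a = b * 2
    5: [],              # a < N
    6: [],              # return c
}
SUCCS = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
PREDS = {b: [p for p in SUCCS if b in SUCCS[p]] for b in SUCCS}
ALL_DEFS = {(d, v) for defs in BLOCK_DEFS.values() for d, v in defs}


def defout(b):
    """DEFOUT_b: the last def of each variable assigned in b."""
    last = {}
    for d, v in BLOCK_DEFS[b]:
        last[v] = d
    return set(last.values())


def survived(b):
    """SURVIVED_b: every def in the procedure whose variable b does not re-define."""
    redefined = {v for _, v in BLOCK_DEFS[b]}
    return {d for d, v in ALL_DEFS if v not in redefined}


def reaching_defs():
    reach = {b: set() for b in SUCCS}
    changed = True
    while changed:
        changed = False
        for b in SUCCS:
            new = set()
            for p in PREDS[b]:
                new |= defout(p) | (reach[p] & survived(p))
            if new != reach[b]:
                reach[b] = new
                changed = True
    return reach


reach = reaching_defs()
print(sorted(reach[6]))  # -> ['d2', 'd3', 'd4']: only d3 defines c, so it alone feeds "return c"
```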


Anticipated Expressions

Also known as "Very Busy" Expressions

Expression x+y is anticipated at point p if all paths from p eventually compute x+y, using the values of x and y as they exist at p

Use: Code hoisting – moving x+y to p reduces code size; no effect on execution time


Equations for: Anticipated Expressions

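Sets
• USEDb – expressions used in b before they are killed
• KILLEDb – expressions whose operands are def'd in b before the expression is used
• ANTICb – anticipated expressions on exit from b

Equation
$$\mathrm{ANTIC}_b = \bigcap_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}_s \cup (\mathrm{ANTIC}_s - \mathrm{KILLED}_s) \big)$$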


Dataflow Equations - Recap


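• Available Expressions
  $$\mathrm{AVAILIN}_b = \bigcap_{p \in \mathrm{preds}(b)} \big( \mathrm{DEF}_p \cup (\mathrm{AVAILIN}_p - \mathrm{KILL}_p) \big)$$
• Live Variables
  $$\mathrm{LIVEIN}_b = \mathrm{USE}_b \cup (\mathrm{LIVEOUT}_b - \mathrm{DEF}_b)$$
• Reaching Defs
  $$\mathrm{REACH}_b = \bigcup_{p \in \mathrm{preds}(b)} \big( \mathrm{DEFOUT}_p \cup (\mathrm{REACH}_p \cap \mathrm{SURVIVED}_p) \big)$$
• Anticipated Expressions
  $$\mathrm{ANTIC}_b = \bigcap_{s \in \mathrm{succ}(b)} \big( \mathrm{USED}_s \cup (\mathrm{ANTIC}_s - \mathrm{KILLED}_s) \big)$$
• Generic
  $$\mathrm{OUT}_b = \mathrm{GEN}_b \cup (\mathrm{IN}_b - \mathrm{KILL}_b)$$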


Efficiency of Dataflow Analysis

• The algorithms eventually terminate
• But we can reduce the time needed by picking a good order in which to visit nodes in the flowgraph
  • the best order depends on how information flows
  • Forward problems – reverse post-order
  • Backward problems – post-order
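A sketch of computing both visit orders with a depth-first search, shown on the liveness example's flowgraph (edges assumed, including the back edge 5→2). In reverse post-order each block is visited before its successors except along back edges, which suits forward problems; post-order is the mirror image and matches the "work backwards from 6 to 1" order used for liveness. The recursive DFS is for brevity; a large flowgraph may need an explicit stack to stay under Python's recursion limit.

```python
def postorder(succs, entry):
    """Depth-first postorder: each block is emitted after the blocks DFS reaches from it."""
    seen, order = set(), []

    def visit(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                visit(s)
        order.append(n)

    visit(entry)
    return order


# The liveness example's flowgraph (edges assumed, including the back edge 5 -> 2).
succs = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
post = postorder(succs, 1)
rpo = post[::-1]
print(post)  # -> [6, 5, 4, 3, 2, 1]  (backward problems, eg liveness)
print(rpo)   # -> [1, 2, 3, 4, 5, 6]  (forward problems, eg available expressions)
```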


Using Dataflow Facts

Some possible Transformations/Optimizations ...


CSE Elimination

x+y is defined at L1 and available at L2
• ie: neither x nor y is re-defined between L1 and L2

before:
    L1: ... = x+y
    L2: ... = x+y

after:
    L1: t = x+y      ; save calculation into temp t
        ... = t
    L2: ... = t      ; use t, rather than re-calculate

• Analysis: Available Expressions
• Code runs faster by not re-calculating x+y
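A sketch of the rewrite, restricted to a single basic block so that availability can be tracked in one forward scan. Rather than introducing a fresh temp t as on the slide, it reuses the variable that already holds the value and forgets that fact once the variable is overwritten; a global version would seed the map from AVAIL(b) and would need real temps. The tuple encoding and the name local_cse are hypothetical.

```python
# Hypothetical three-address encoding: (target, op, left, right) for target = left op right,
# or (target, None, source, None) for a copy target = source.


def local_cse(block):
    """Rewrite recomputations of an available expression into copies, within one block."""
    avail = {}   # (op, left, right) -> a variable currently holding that value
    out = []
    for target, op, left, right in block:
        key = (op, left, right)
        if op is not None and key in avail:
            out.append((target, None, avail[key], None))   # reuse instead of re-calculate
        else:
            out.append((target, op, left, right))
        # target is (re)defined: kill expressions that use it, or whose holder it was
        avail = {k: v for k, v in avail.items() if target not in (k[1], k[2]) and v != target}
        if op is not None and target not in (left, right):
            avail[key] = target                               # target now holds left op right
    return out


before = [
    ("a", "+", "x", "y"),    # a = x + y
    ("b", "+", "x", "y"),    # b = x + y   (x+y available: becomes b = a)
    ("x", None, "z", None),  # x = z       (kills x+y)
    ("c", "+", "x", "y"),    # c = x + y   (not available: must recompute)
]
for instr in local_cse(before):
    print(instr)
```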


Constant Prop.

• c is a constant
• the def a = c at L1 reaches L2 (a is not re-defined between L1 and L2), and is the only def of a to reach L2

before:
    L1: a = c
    L2: ... = a

after:
    L1: a = c
    L2: ... = c      ; propagate c, not a

• Analysis: Reaching Defs
• Code runs faster because c is embedded into the instruction


Copy Prop.

• x is a variable
• the copy a = x at L1 reaches L2 (a is not re-defined between L1 and L2), and is the only def of a to reach L2
• x is not re-defined between L1 and L2 either

before:
    L1: a = x
    L2: ... = a

after:
    L1: a = x
    L2: ... = x      ; propagate x, not a

• Analysis: Reaching Defs
• If every use of a is replaced, the copy a = x becomes dead and can be deleted


Copy Prop. Tradeoffs

Downside: can lengthen the lifetime of variable x
• => more register pressure, or memory traffic thru spilling
• not worth doing if the only reason is to eliminate copies – let the register allocator deal with that

Upside: may expose other optimizations. Eg:

before:
    a = y + z
    u = y
    c = u + z

after:
    a = y + z
    u = y
    c = y + z      ; now reveals CSE y+z


Dead Code Elimination (DCE)

• a is dead after L1
• statement at L1 has no side-effects (output, exceptions, etc)

before:
    L1: a = x

after:
    (statement at L1 deleted)

• Analysis: Liveness
• Code runs faster because there is one less statement
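A sketch of dead code elimination within one basic block: scanning backwards from the block's live-out set, a statement is deleted when its target is dead and it has no side effects, which also removes chains of dead statements in a single pass. The (target, used_vars, has_side_effect) encoding and the name dead_code_elim are hypothetical.

```python
# Hypothetical encoding: (target, used_vars, has_side_effect); target is None for
# statements such as print(c) that assign nothing.


def dead_code_elim(block, live_out):
    """Delete pure statements whose target is dead, scanning the block backwards."""
    live = set(live_out)          # variables live on exit from the block (from liveness)
    kept = []
    for target, used, side_effect in reversed(block):
        if not side_effect and target not in live:
            continue              # dead and free of side effects: delete it
        live.discard(target)      # above this statement, target's old value is not needed
        live.update(used)         # its operands are live before it
        kept.append((target, used, side_effect))
    kept.reverse()
    return kept


block = [
    ("a", ["x"], False),   # a = x      (a is never used: dead)
    ("b", ["y"], False),   # b = y
    ("c", ["b"], False),   # c = b
    (None, ["c"], True),   # print(c)   (output is a side effect: always kept)
]
print([t for t, _, _ in dead_code_elim(block, live_out=set())])  # -> ['b', 'c', None]
```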


Aliasing

Aliasing = more than one name can refer to the same memory location
• Call-by-reference parameters
  • eg: calc(C c1, C c2) - c1 might point to the same object as c2
• Address-taken variables
  • eg: int* p = &x - so x and *p refer to the same memory location
• Expressions involving subscripts
  • Without knowing the specific value of i, a[i] might refer to any element of a

Aliasing is the enemy of optimization


Aliases vs Optimizations

Example:
    p.x = 5;
    q.x = 7;
    a = p.x;

Does reaching-defs show that the def p.x = 5 reaches a = p.x?

(Or: do p and q refer to the same variable/object?)

(Or: can p and q refer to the same thing?)


Aliases vs Optimizations

int f(int *p, int *q) {
    *p = 1;
    *q = 2;
    return *p;
}

Is it safe for this function to return the value 1?

What if p and q refer to the same int?

Conservative/safe: since it's possible, the compiler must assume it can happen!

C (since C99) provides the "restrict" keyword, with which the programmer asserts there is no aliasing


Types and Aliases (1)

In Java, ML, MiniJava, and others, if two variables have incompatible types they cannot be names for the same location

Also helps that programmer cannot create arbitrary pointers to storage in these languages



Types and Aliases (2)

Strategy: Divide memory locations into alias classes based on type information (every type, array, record field is a class)

Implication: need to propagate type information from the semantics pass to optimizer

Not normally true of a minimally typed IR

Items in different alias classes cannot refer to the same location


Aliases and Flow Analysis

Idea: Base alias classes on points where a value is created

Every new/malloc and each local or global variable whose address is taken is an alias class

Pointers can refer to values in multiple alias classes (so each memory reference is to a set of alias classes)

Use to calculate “may alias” information (eg: p “may alias” q at program point s)



Using “may-alias” information

Treat each alias class as a “variable” in dataflow analysis problems

Example: framework for available expressions
Given statement s: M[a] := b,
    gen[s] = { }
    kill[s] = { M[x] | a may alias x at s }


May-Alias Analysis

1: u = M[t]
2: M[x] = r
3: w = M[t]
4: b = u+w

Without alias analysis, #2 kills M[t], since x and t might be related

If analysis determines that "x may-alias t" is false, M[t] is still available at #3: we can eliminate the common sub-expression and use copy propagation
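A sketch of available loads in one basic block, using the kill rule kill[s] = { M[x] | a may alias x at s } for a store M[a] := b (with gen[s] = { } as on the previous slide). The instruction encoding and available_loads are hypothetical; the second may_alias predicate stands in for an analysis that has proved distinct address variables never alias, which is an assumption made only for this example.

```python
# Hypothetical encoding: ("load", dst, addr) for dst = M[addr], ("store", addr, src) for
# M[addr] = src, and ("other", dst, None) for any other assignment to dst.


def available_loads(block, may_alias):
    """Map each redundant load's index to a variable that already holds M[addr]."""
    avail = {}   # address variable -> variable known to hold M[address]
    reuse = {}
    for i, (kind, a, b) in enumerate(block):
        if kind == "store":          # M[a] := b: gen = {}, kill = {M[x] | a may alias x}
            avail = {x: v for x, v in avail.items() if not may_alias(a, x)}
            continue
        dst = a
        if kind == "load" and b in avail:
            reuse[i] = avail[b]      # M[b] is still available in avail[b]
        # dst is re-defined: forget facts that mention it
        avail = {x: v for x, v in avail.items() if dst not in (x, v)}
        if kind == "load" and b != dst:
            avail[b] = dst
    return reuse


block = [
    ("load", "u", "t"),     # 1: u = M[t]
    ("store", "x", "r"),    # 2: M[x] = r
    ("load", "w", "t"),     # 3: w = M[t]
    ("other", "b", None),   # 4: b = u+w
]
print(available_loads(block, may_alias=lambda p, q: True))    # no alias info -> {}
print(available_loads(block, may_alias=lambda p, q: p == q))  # -> {2: 'u'}: #3 can become w = u
```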



Next

• Dataflow analysis is the core of classical optimizations
• Still to explore:
  • Discovering and optimizing loops
  • SSA – Static Single Assignment form