06.Intermediate Code Generation

41
Intermediate Code Generation dr. Hristo Nikolov LIACS Leiden University Leiden, Fall 2011

Transcript of 06.Intermediate Code Generation

Intermediate Code Generation

dr. Hristo Nikolov

LIACS

Leiden University

Leiden, Fall 2011

Why Intermediate Code?

Front-end: generates intermediate representation

Back-end: generates target code

Why Intermediate code?

Retargeting is facilitated

A machine-independent code optimizer can be applied

Intermediate

Code Parser

Static Checker

Intermediate Code

Generator

Code Generator

Intermediate Languages Syntax trees, Postfix notation,

Three-address code (x := y op z)

Example: a := b * -c + b * -c

assign

+

*

a

b uniminus

c

*

b uniminus

c

Syntax tree:

assign

+

*

a

b uniminus

c

DAG:

Postfix notation: a b c uniminus * b c uniminus * + assign

Three-Address Code Sequence of statements:

x := y op z

x + y * z => t1 := y * z

t2 := x + t1

Example: a := b * -c + b * -c

t1 := - c

t2 := b *t1

t3 := - c

t4 := b * t3

t5 := t2 + t4

a := t5

Linearized representation of a syntax tree

t1 := - c

t2 := b *t1

t5 := t2 + t2

a := t5

Linearized representation of a syntax DAG

Types of Three-Address Statements Assignment statements: x := y op z and x := op y

Indexed assignments: x := y[ i ] and x[ i ] := y

Pointer assignments: x := & y, x := * y, * x := y

Copy statements: x := y

Unconditional jumps: goto L

Conditional jumps: if x relop y goto L

Function calls: param x, call p, n, and return y

p(x1,..., xn) => param x1

. . .

param xn

call p, n

Syntax-directed Translation into Three-Address Code Temporary names are created

E → E1 + E2 => t := E1 + E2,

e.g. t5 := t2 + t4

a := t5

When expression is an identifier – no new temporary

Nonterminal E has two attributes:

E.place – the name which holds the value of E

E.code – three-address code sequence

Syntax-directed Translation

Production Semantic Rule

S → id := E S.code := E.code || gen( id.place ‘:=‘ E.place )

E → E 1 + E2 E.place := newtemp(); E.code := E1.code || E2.code || gen( E.place ‘:=’ E1.place ‘+’ E2.place )

E → E1 * E2 E.place := newtemp(); E.code := E1.code || E2.code || gen( E.place ‘:=’ E1.place ‘*’ E2 .place )

E → - E1 E.place := newtemp(); E.code := E1.code || gen( E.place ‘:=’ ‘uminus ’ E1.place )

E → ( E1 ) E.place := E1.place E.code := E.code

E → id E.place := id.place E .code :=‘’

Function newtemp() returns distinct names (t1, t2…)

gen(x ‘:=‘ y ‘op’ z) => x := y op z

To produce three-address code for assignments

Flow-of-control Statements

Production Semantic Rule

S → while E do S1 S.begin := newlabel() S.after := newlabel() S.code := gen( S.begin ‘:’ ) || E.code || gen ( ‘if ’ E.place ‘=‘ ‘0’ ‘goto’ S.after ) || S1.code || gen ( ‘goto’ S.begin) || gen( S.after ‘:’ )

While statements

E.code

...

if E.place = 0 goto S.after

S1.code

...

goto S.begin

...

S.begin :

S.after :

Flow-of-control Statements Example

i := 2 * n + k

while

i := 2 * n + k

while i do

i := i - k

Source program fragment

t1 := 2

L1: if i = 0 goto L2

L2:

t1 := 2

t2 := t1 * n

t3 := t2 + k

i := t3

L1: if i = 0 goto L2

t4 := i - k

i := t4

goto L1

L2:

Three-address code sequence

Implementation … (1)

Quadruple is record: op, arg 1, arg 2, result

Quadruples, triples, and idirect triples

op arg 1 arg 2 result

(0) uniminus c t1

(1) * b t1 t2

(2) uniminus c t3

(3) * b t3 t4

(4) + t2 t4 t5

(5) := t5 a

Pro: easy to rearrange code for global optimization Cons: lots of temporaries

t1 := - c

t2 := b *t1

t3 :=

t4 := b * t3

t5 := t2 + t4

a := t5

t1 := - c

t2 := b *t1

t3 := - c

t4 := b * t3

t5 := t2 + t4

a := t5

a := b * -c + b * -c

Three-address code Quads (quadruples)

Implementation … (2)

Triple is record: op, arg 1, arg 2

Quadruples, triples, and idirect triples

op arg 1 arg 2

(0) uniminus c

(1) * b (0)

(2) uniminus c

(3) * b (2)

(4) + (1) (3)

(5) := a (4)

Pro: temporaries are implicit Cons: difficult to rearrange code

t1 := - c

t2 := b *t1

t3 :=

t4 := b * t3

t5 := t2 + t4

a := t5

t1 := - c

t2 := b *t1

t3 := - c

t4 := b * t3

t5 := t2 + t4

a := t5

a := b * -c + b * -c

Three-address code Triples

Implementation … (3)

Implicit Triples: pointers to triples

Quadruples, triples, and idirect triples

op arg 1 arg 2

(14) uniminus c

(15) * b (14)

(16) uniminus c

(17) * b (16)

(18) + (15) (17)

(19) := a (18)

Pro: temporaries are implicit and it is easier to rearrange code

t1 := - c

t2 := b *t1

t3 :=

t4 := b * t3

t5 := t2 + t4

a := t5

t1 := - c

t2 := b *t1

t3 := - c

t4 := b * t3

t5 := t2 + t4

a := t5

a := b * -c + b * -c

Three-address code Implicit Triples Triple Container

statement

(0) (14)

(1) (15)

(2) (16)

(3) (17)

(4) (18)

(5) (19)

Names and Scope The three-address code is simplistic

it assumes that the names of variables can be easily resolved by the back end in global or local variables

We need local symbol tables to

record global and local declarations in procedures, blocks, and structs to resolve names

Declarations in a Procedure

P → M D

M → ε { offset := 0 }

D → D ; D

D → id : T { enter( id.name, T.type, offset); offset := offset + T.width }

T → integer { T.type := integer; T.width := 4 }

T → real { T.type := real; T.width := 8 }

T → array [ num ] of T1 { T.type := array( num.val, T1.type ); T.width := num.val x T1.width }

T → ^ T1 { T.type := pointer( T1.type ); T.width := 4 }

Keeping Track of Scope Information

Nested procedures

P → D

D → D ; D | id : T | proc id ; D ; S

D → proc id ; D1 ; S: then a new symbol table is created

The new table points to the enclosing procedure’s table

Procedure enter() ‘knows’ in which table to make an entry

Symbol Table Operations

mktable( previous ): returns a pointer to a new table which is linked to a previous table in the outer scope

enter( table, name, type, offset ): creates a new entry in table

addwidth( table, width ): accumulates the total width of all entries in table

enterproc( table, name, newtable ): creates a new entry in table for procedure with local scope newtable

Processing Declarations Nested Procedures

P → M D { addwidth( top( tblptr ), top( offset ) ); pop( tblptr ); pop( offset ) }

M → ε { t := mktable( nil ); push( t, tblptr ); push( 0, offset ) }

D → D1 ; D2

D → proc id ; N D1 ; S

{ t := top( tblptr ); addwidth( t, top( offset ) ); pop( tblptr ); pop( offset ); enterproc( top( tblptr ), id.name, t ) }

D → id : T { enter( top ( tblptr ), id.name, T.type, top( offset ) ); top( offset ) := top( offset ) + T.width }

N → ε { t := mktable( top( tblptr ) ); push( t, tblptr ); push( 0, offset ) }

Assignment Statements

S → id := E { p := lookup( id.name ); if p <> nil then emit( p ‘:=‘ E.place ) else error }

E → E1 + E2 { E.place := newtemp; emit( E.place ‘:=‘ E1.place ‘+’ E2.place ) }

E → E1 * E2 { E.place := newtemp; emit( E.place ‘:=‘ E1.place ‘*’ E2.place ) }

E → - E1

{ E.place := newtemp; emit( E.place ‘:=‘ ‘uniminus’ E1.place )}

E → ( E1 ) { E.place := E1.place }

E → id { p := lookup( id.name ); if p <> nil then E.place := p else error }

A Translation Scheme

Reusing Temporary Names

Each time newtemp() generates a new temporary name

Can we reuse temporaries?

E → E1 + E2 results in:

Modify newtemp() to use a “stack ”: Keep a counter c, initialized to 0

newtemp() increments c and returns temporary $c

Decrement counter on each use of a $i in a three-address statement

evaluate E1 into t1

evaluate E2 into t2

t3 := t1 + t2

evaluate E1 into t1

evaluate E2 into t2

t3 := t1 + t2

If t1 is no longer used, can reuse t1 instead of using new temp t3

Reusing Temporary Names

Statement Value of c

0

$0 := a * b 1

$1 := c * d 2

$0 := $0 + $1 1

$1 := e * f 2

$0 := $0 - $1 1

X := $0 0

Example: x := a * b + c * d – e * f

Addressing Array Elements

A : array [low .. high] of type;

ith element of A begins in location

base + (i – low) x W

base = A[low], W = sizeof( type )

Compile time evaluation if rewritten as

i x W + ( base – low x W )

c = base – low x W is stored in symbol-table entry for A

A[i] = i x W + c

One-Dimensional Arrays

t1 := c

t2 := i * W

t3 := t1[t2]

… := t3

t1 := c

t2 := i * W

t3 := t1[t2]

… := t3

Addressing Array Elements

In Row-major form A[i,j] is equivalent to A[i][j]

Two-Dimensional Arrays... (1)

A[1,1]

A[1,2]

A[1,3]

A[2,1]

A[2,2]

A[2,3]

A[1,1]

A[2,1]

A[1,2]

A[2,2]

A[1,3]

A[2,3]

Row-major Column-major

First Column

First Row

A : array [1..2, 1..3] of integer;

Addressing Array Elements

A[i,j]= base+((i–lowi)x Nj+j-lowj)x W

Nj = highj-lowj+1

((i x Nj) + j) x W + c, where

c = base - ((lowi x Nj) + lowj) x W

Two-Dimensional Arrays... (2)

A : array [1..2, 1..3] of integer; // Row-major form

t1 := i * 3

t1 := t1 + j

t2 := c

t3 := t1 * 4

t4 := t2[t3]

… := t4

t1 := i * 3

t1 := t1 + j

t2 := c

t3 := t1 * 4

t4 := t2[t3]

… := t4

Addressing Array Elements Grammar

Synthesized Attributes:

E.place name of temp holding value of E

Elist.array array name

Elist.place name of temp holding index value

Elist.ndim number of array dimensions

L.place lvalue (=name of temp)

L.offset index into array (=name of temp)

null indicates non-array simple id

S → L := E E L Elist

S → L := E E → E + E | ( E ) | L L → Elist ] | id Elist → Elist , E | id [ E

Addressing Array Elements A Translation Scheme ... (1)

S → L := E { if L.offset = null then /* L is a simple id */ emit( L.place ‘:=‘ E.place ) else emit( L.place’ [‘ L.offset ‘]’ ‘:=‘ E.place ) }

E → E1 + E2 { E.place := newtemp; emit( E.place ‘:=‘ E1.place ‘+’ E2.place ) }

E → ( E1 ) { E.place := E1.place }

E → L

{ if L.offset = null then /* L is a simple id */ E.place := L.place else begin E.place := newtemp; emit( E.place ‘:=‘L.place’ [‘ L.offset ‘]’ ) end }

Addressing Array Elements A Translation Scheme... (2)

L → Elist ] { L.place := newtemp; L.offset := newtemp emit( L.place ‘:=‘ c( Elist.array ) ) emit( L.offset ‘:=‘ Elist.place ‘*’ width( Elist.array )) }

L → id { L.place := id.place; L.offset := null }

Elist → Elist 1, E { t := newtemp; m := Elist 1.ndim +1; emit( t ‘:=‘ Elist 1.place ‘*’ limit( Elist 1.array, m )); emit( t ‘:=‘ t ‘+’ E.place ); Elist.array := Elist 1.array; Elist.place := t; Elist.ndim := m }

Elist → id [ E

{ Elist.array := id.place; Elist.place := E.place; Elist.ndim := 1 }

Translating Boolean Expressions

Composed by boolean operators and, or, not

Applied to boolean variables or relational expressions

Relational expression: E1 relop E2, where

E1 and E2 are arithmetic expressions

The grammar:

Attribute ‘op’: <, ≤, =, ≠, >, ≥ represented by relop

E → E or E | E and E | not E | ( E ) | id relop id | true | false E → E or E | E and E | not E | ( E ) | id relop id | true | false

Translating Boolean Expressions

True = 1, false = 0

Example:

a or b and not c

a < b: “if a<b then 1 else 0”

Numerical representation

t1 := not c

t2 := b and t1

t3 := a or t2

t1 := not c

t2 := b and t1

t3 := a or t2

100: if a < b goto 103

101: t := 0

102: goto 104

103: t := 1

104:

100: if a < b goto 103

101: t := 0

102: goto 104

103: t := 1

104:

Translating Boolean Expressions A Translation Scheme

E → E1 or E2 { E.place := newtemp; emit( E.place ‘:=‘ E1.place ‘or’ E2.place ) }

E → E1 and E2 { E.place := newtemp; emit( E.place ‘:=‘ E1.place ‘and’ E2.place ) }

E → not E1 { E.place := newtemp; emit( E.place ‘:=‘ ‘not’ E1.place ) }

E → ( E1 ) { E.place := E1.place }

E → id1 relop id2 { E.place := newtemp; emit( ‘if ’ id1.place relop.op id2.place ‘goto’ nextstat + 3 ); emit( E.place ‘:=‘ ‘0’ ); emit( ‘goto’ nextstat + 2 ); emit( E.place ‘:=‘ ‘1’ ) }

E → true { E.place := newtemp; emit( E.place ‘:=‘ ‘1’ ) }

E → false { E.place := newtemp; emit( E.place ‘:=‘ ‘0’ ) }

Backpatching

Code generation problem:

Labels (addresses) that control must go to may not be known at the time the jump statements are generated

A solution - backpatching:

Generate branching statements with empty jumps

Put the statements in a ‘goto’ list

Fill the labels of these statements when the proper label is determined

Backpatching Quadruples in a quadruple array

Labels are indexes into the array

Manipulate the list of lables:

makelist( i ) creates a new list containing three-address location i, returns a pointer to the list

merge( p1 , p2 ) concatenates lists pointed to by p1 and p2, returns a pointer to the concatenates list

backpatch( p, i ) inserts i as the target label for each of the statements in the list pointed to by p

Backpatching Grammar for boolean expressions

Synthesized Attributes:

E.code three-address code

E.truelist backpatch list for jumps on true

E.falselist backpatch list for jumps on false

M.quad location of current three-address quad

E → E1 or M E2 M E2 id1 relop id2 M

E → E1 or M E2 | E1 and M E2 | not E1 | ( E1 ) | id1 relop id2 | true | false M → ε

Backpatching A Translation Scheme... (1)

E → E1 or M E2 { backpatch( E1.falselist, M.quad ); E.truelist := merge( E1.truelist, E2.truelist ); E.falselist := E2.falselist }

E → E1 and M E2 { backpatch( E1.truelist, M.quad ); E.truelist := E2.truelist; E.falselist := merge( E1.falselist, E2.falselist ) }

E → not E1 { E.truelist := E1.falselist; E.falselist := E1.truelist }

E → ( E1 ) { E.truelist := E1.truelist; E.falselist := E1.falselist }

Backpatching A Translation Scheme... (2)

E → id1 relop id2 { E.truelist := makelist( nextquad ); E.falselist := makelist( nextquad + 1 ); emit( ‘if ’ id1.place relop.op id2.place ‘goto_’ ); emit( ‘goto_’ ) }

E → true { E.truelist := makelist( nextquad ); emit( ‘goto_’ ) }

E → false { E.falselist := makelist( nextquad ); emit( ‘goto_’ ) }

M → ε { M.quad := nextquad }

Backpatching Example: a<b or c<d and e<f

E → id1 relop id2

E → E1 or M E2

(M.quad := 102)

E → E1 and M E2

(M.quad := 104)

backpatch( {102}, 104 )

backpatch( {101}, 102 )

100: if a < b goto_

101: goto_

102: if c < d

103: goto_

104: if e < f

105: goto_

100: if a < b goto_

101: goto_

102: if c < d goto_

103: goto_

104: if e < f goto_

105: goto_

100: if a < b goto TRUE

101: goto 102

102: if c < d

103: goto FALSE

104: if e < f

105: goto FALSE

100: if a < b goto TRUE

101: goto 102

102: if c < d goto 104

103: goto FALSE

104: if e < f goto TRUE

105: goto FALSE

backpatch( {102}, 104 )

backpatch( {101}, 102 )

Flow-of-control Statements Grammar

Synthesized Attributes:

E.code three-address code

E.truelist backpatch list for jumps on true

E.falselist backpatch list for jumps on false

M.quad location of current three-address quad

S.nextlist backpatch list for jumps to the next statement after S (or nil)

L.nextlist backpatch list for jumps to the next statement after L (or nil)

S → if E then S L

S → if E then S | if E then S else S | while E do S | begin L end | A L → L ; S | S

Flow-of-Control Statements A Translation Scheme using backpatching... (1)

S → if E then M1 S1 N else M2 S2

{ backpatch( E.truelist, M1.quad ); backpatch( E.falselist, M2.quad ); S.nextlist := merge(S1.nextlist, merge(N.nextlist,S2.nextlist) }

N → ε { N.nextlist := makelist( nextquad ); emit( ‘goto_’) }

M → ε { M.quad := nextquad }

S → if E then M S1 { backpatch( E.truelist, M.quad ); S.nextlist := merge( E.falselist, S1.nextlist) }

S → wihle M1 E do M2 S1

{ backpatch( S1.nextlist, M1.quad ); backpatch( E.truelist, M2.quad ); S.nextlist := E.falselist; emit( ‘goto_’ M1.quad ) }

Flow-of-Control Statements A Translation Scheme using backpatching... (2)

S → begin L end { S.nextlist := L.nextlist }

S → A { S.nextlist := nil }

L → L1 ; M S { backpatch( L1.nextlist, M.quad ); L.nextlist := S.nextlist }

L → S { L.nextlist := S.nextlist }

Intermediate Code Generation

Chapters for reading:

8.1 to 8.6