Compiler Construction - Urząd Miasta Łodzimath.uni.lodz.pl/~robpleb/presentation_1.pdfIntermediate...


Compiler Construction

Intermediate Code Generation


Intermediate code

Intermediate code is often the link between the compiler’s front end and back end. The front end translates a source program into an intermediate representation from which the back end generates target code.

There are some benefits of using a machine-independent intermediate form:
1. Retargeting is facilitated.
2. A machine-independent code optimizer can be applied.


Intermediate Languages

Syntax trees and postfix notation are two kinds of intermediate representation. There is another one called three-address code.

Example for the assignment a := b * - c + b * - c:

Syntax tree: [figure]

Postfix notation: a b c uminus * b c uminus * + assign


Example syntax tree generation

Syntax trees for assignment statements are produced by the syntax-directed definition below. The two binary operators + and * stand for the full operator set of a typical language.

Production          Semantic Rule

S -> id := E        S.nptr := mknode( 'assign', mkleaf( id, id.place ), E.nptr )
E -> E1 + E2        E.nptr := mknode( '+', E1.nptr, E2.nptr )
E -> E1 * E2        E.nptr := mknode( '*', E1.nptr, E2.nptr )
E -> - E1           E.nptr := mknode( 'uminus', E1.nptr )
E -> ( E1 )         E.nptr := E1.nptr
E -> id             E.nptr := mkleaf( id, id.place )
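A minimal sketch of mknode and mkleaf in Python may help to see what the semantic rules build. The Node record and its field names are assumptions for illustration; the slides only name the two functions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical node layout: a label plus up to two children;
    # leaves carry the place of the identifier instead of children.
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    place: Optional[str] = None

def mkleaf(label, place):
    """Create a leaf for an identifier."""
    return Node(label, place=place)

def mknode(label, left, right=None):
    """Create an interior node with one or two children."""
    return Node(label, left, right)

# Rules applied bottom-up for a := b * - c + b * - c
def term():
    return mknode("*", mkleaf("id", "b"), mknode("uminus", mkleaf("id", "c")))

tree = mknode("assign", mkleaf("id", "a"), mknode("+", term(), term()))
```

Each call to term() builds fresh nodes, so the result is a tree and not a DAG: the repeated subexpression b * - c appears twice.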


Three-address code

The most common representation is 3AC (three-address code).

It makes machine code generation easier. It is a sequence of statements of the general form:

x := y op z

Here, x, y and z are names, constants, or compiler-generated temporaries and op is an operator.

To translate an expression like x + y * z, we introduce temporaries:

t1 := y * z
t2 := x + t1

3AC is easy to generate from syntax trees. We associate a temporary with each interior tree node.


Three-address code

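Code for a := b * - c + b * - c generated from the syntax tree:

t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Code with the common subexpression b * - c computed only once:

t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5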


Three-address code

The reason for the term “three-address code” is that each statement usually contains three addresses, two for the operands and one for the result.


Types of 3AC statements

Three-address statements are akin to assembly code. They can have symbolic labels that represent the index of a three-address statement.

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.

2. Assignment statements of the form x := op y, where op is a unary operator.

3. Copy statements of the form x := y, which assign the value of y to x.

4. Unconditional jumps of the form goto L, which mean that the statement with label L is the next to be executed.

5. Conditional jumps, such as if x relop y goto L, where relop is a relational operator (<, =, >=, etc.) and L is a label. If the condition x relop y is true, the statement with label L is executed next; otherwise execution continues with the following statement.


Types of 3AC statements

6. param x and call p, n for procedure calls, and return y, where y represents the (optional) returned value. They are typically used as the following sequence of three-address statements:

param x1
param x2
…
param xn
call p, n

7. Indexed assignments of the form x := y[i] and x[i] := y. The first sets x to the value in the location i memory units beyond location y. The second sets the contents of the location i units beyond x to the value of y.

8. Address and pointer assignments:

x := &y
x := *y
*x := y


Syntax-directed translation into Three-Address Code

When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The value of nonterminal E on the left side of E → E1 + E2 will be computed into a temporary t.

The nonterminal E has two attributes:
− E.place: a name to hold the value of E at run time
− E.code: the sequence of 3AC statements implementing E

We associate temporary names with the interior nodes of the syntax tree.
− The function newtemp() returns a fresh temporary name on each invocation, that is, it returns a sequence of distinct names t1, t2, t3, ... in response to successive calls.


Syntax-directed translation

Syntax-directed definition to produce three-address code for assignments.

Production        Semantic Rules

S -> id := E      S.code := E.code || gen( id.place ':=' E.place )

E -> E1 + E2      E.place := newtemp();
                  E.code := E1.code || E2.code || gen( E.place ':=' E1.place '+' E2.place )

E -> E1 * E2      E.place := newtemp();
                  E.code := E1.code || E2.code || gen( E.place ':=' E1.place '*' E2.place )

E -> - E1         E.place := newtemp();
                  E.code := E1.code || gen( E.place ':=' 'uminus' E1.place )

E -> ( E1 )       E.place := E1.place; E.code := E1.code

E -> id           E.place := id.place; E.code := ''
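The definition above can be sketched as a recursive Python function that returns the pair (E.place, E.code) for each subexpression. The tuple encoding of expressions and the function names translate and translate_assign are assumptions for illustration; newtemp follows the description on the slides.

```python
import itertools

_counter = itertools.count(1)

def newtemp():
    """Return a fresh temporary name t1, t2, t3, ... on each call."""
    return f"t{next(_counter)}"

def translate(node):
    """Return (place, code) for an expression.

    Hypothetical encoding: a string is an identifier; a tuple is
    (operator, operand) or (operator, left, right).
    """
    if isinstance(node, str):                      # E -> id
        return node, []
    op, *kids = node
    if op == "uminus":                             # E -> - E1
        p1, c1 = translate(kids[0])
        place = newtemp()
        return place, c1 + [f"{place} := uminus {p1}"]
    p1, c1 = translate(kids[0])                    # E -> E1 op E2
    p2, c2 = translate(kids[1])
    place = newtemp()
    return place, c1 + c2 + [f"{place} := {p1} {op} {p2}"]

def translate_assign(target, expr):
    """S -> id := E: the code of E followed by the final copy."""
    place, code = translate(expr)
    return code + [f"{target} := {place}"]

term = ("*", "b", ("uminus", "c"))
code = translate_assign("a", ("+", term, term))
print("\n".join(code))
```

Parentheses need no case of their own in this sketch, because the tuple nesting already records the grouping, which mirrors the rule E -> ( E1 ) passing place and code through unchanged.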


Adding flow-of-control statements

Flow-of-control statements can be added to the language of assignments by adding productions and semantic rules. Here are the semantic rules that generate code for a while statement.

Production              Semantic Rules

S -> while E do S1      S.begin := newlabel();
                        S.after := newlabel();
                        S.code := gen( S.begin ':' ) || E.code ||
                                  gen( 'if' E.place '=' '0' 'goto' S.after ) ||
                                  S1.code || gen( 'goto' S.begin ) ||
                                  gen( S.after ':' )
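As a rough illustration of the while rule, the following Python sketch assembles S.code from the code of E and S1. The function name gen_while and the representation of code as a list of strings are assumptions; the condition code and body code are supplied by hand here instead of being generated.

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    """Return a fresh label L1, L2, ... on each call."""
    return f"L{next(_labels)}"

def gen_while(e_code, e_place, s1_code):
    """Build S.code for: while E do S1."""
    begin, after = newlabel(), newlabel()
    return ([f"{begin}:"]
            + e_code                                  # evaluate the condition
            + [f"if {e_place} = 0 goto {after}"]      # leave the loop when E is 0
            + s1_code                                 # loop body
            + [f"goto {begin}", f"{after}:"])         # jump back, then exit label

# while i do i := i - 1
# (the condition is already held in the name i, so E.code is empty)
code = gen_while([], "i", ["t1 := i - 1", "i := t1"])
print("\n".join(code))
```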


Implementation of Three-Address Statements

A three-address statement is an abstract form of intermediate code. One way to implement it is as records with fields for the operator and the operands. This representation is called quadruples.

A quadruple is a record structure with four fields:
− OP: an internal code for the operator
− ARG1: the first operand
− ARG2: the second operand
− RESULT: the destination

The contents of the fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields.


Quadruples

The quadruples in the table are for the following assignment:

a := b * - c + b * - c

Three-Address Code:

t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Quadruples:

       op       arg1   arg2   result
(0)    uminus   c             t1
(1)    *        b      t1     t2
(2)    uminus   c             t3
(3)    *        b      t3     t4
(4)    +        t2     t4     t5
(5)    :=       t5            a
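The table can be mirrored by a small record type in Python. This is only a sketch: the fields hold plain strings, whereas the slides describe pointers to symbol-table entries, and the helper show is a hypothetical name used to print a quadruple back as a three-address statement.

```python
from typing import NamedTuple, Optional

class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # None when the operator has a single operand
    result: str

# Quadruples for a := b * - c + b * - c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]

def show(q):
    """Render a quadruple as a three-address statement."""
    if q.op == ":=":                       # copy statement
        return f"{q.result} := {q.arg1}"
    if q.arg2 is None:                     # unary operator
        return f"{q.result} := {q.op} {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"

for index, q in enumerate(quads):
    print(f"({index}) {show(q)}")
```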


Declarations

As the sequence of declarations in a procedure or block is examined, we can lay out storage for names local to the procedure. For each local name, we create a symbol-table entry with information such as:
− The type of the name
− How much storage the name requires
− A relative offset from the beginning of the static data area or the beginning of the activation record


Declarations

The syntax of languages such as C, Pascal, and Fortran allows all the declarations in a single procedure to be processed as a group. A global variable, called offset, keeps track of the next available relative address.


Declarations in a procedure

Production                      Semantic Action

P -> D                          { offset := 0 }
D -> D ; D
D -> id : T                     { enter( id.name, T.type, offset );
                                  offset := offset + T.width }
T -> integer                    { T.type := integer; T.width := 4 }
T -> real                       { T.type := real; T.width := 8 }
T -> array [ num ] of T1        { T.type := array( num.val, T1.type );
                                  T.width := num.val * T1.width }
T -> ^ T1                       { T.type := pointer( T1.type ); T.width := 4 }

Computing the types and relative addresses of declared names.

The action { offset := 0 } is performed before the declarations D are processed. Integers here have width 4 and reals have width 8. The width of an array is obtained by multiplying the width of each element by the number of elements. The width of each pointer is assumed to be 4 too.
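The offset bookkeeping can be sketched in Python as follows. The tuple encoding of array and pointer types and the function names width and layout are assumptions for illustration; the widths are the ones stated above.

```python
def width(t):
    """Width of a type: 'integer', 'real', ('array', n, elem) or ('pointer', target)."""
    if t == "integer":
        return 4
    if t == "real":
        return 8
    kind = t[0]
    if kind == "array":                 # T.width := num.val * T1.width
        return t[1] * width(t[2])
    if kind == "pointer":               # pointers are assumed to have width 4
        return 4
    raise ValueError(f"unknown type: {t!r}")

def layout(decls):
    """Assign a relative offset to each declared name, in declaration order."""
    table, offset = {}, 0               # { offset := 0 }
    for name, t in decls:
        table[name] = (t, offset)       # enter( id.name, T.type, offset )
        offset += width(t)              # offset := offset + T.width
    return table, offset

table, total = layout([
    ("i", "integer"),
    ("x", "real"),
    ("a", ("array", 10, "integer")),
    ("p", ("pointer", "real")),
])
for name, (t, off) in table.items():
    print(name, off)
```

With these four declarations the names receive offsets 0, 4, 12 and 52, and the total width of the block is 56.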


Keeping track of scope

In a language with nested procedures, names local to each procedure can be assigned relative addresses using the approach explained before.

When a nested procedure or block is entered, we need to suspend the processing of declarations in the enclosing scope. To allow this we change the grammar by adding the following rules to the language.

P -> D

D -> D ; D | id : T | proc id ; D ; S


Keeping track of scope

− Suppose we have a separate symbol table (ST) for each procedure.
− When we enter a procedure declaration, we create a new ST.
− The new ST points back to the ST of the enclosing procedure.
− The name of the procedure is a local name of the enclosing procedure.


This is a symbol table for nested procedures.


Operations defined in terms of semantic rules

mktable(previous) creates a new symbol table pointing to previous, and returns a pointer to the new table.

enter(table,name,type,offset) creates a new entry for name in a symbol table with the given type and offset.

addwidth(table,width) records the cumulative width of all the entries in table.

enterproc(table,name,newtable) creates a new entry for procedure name in ST table, and links it to newtable.
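The four operations can be sketched in Python with one small class per symbol table. The SymTable class, its fields and the lookup helper are assumptions added for illustration; only the four operation names come from the slides.

```python
class SymTable:
    """Hypothetical symbol table: entries plus a link to the enclosing table."""
    def __init__(self, previous=None):
        self.previous = previous   # table of the enclosing procedure
        self.entries = {}
        self.width = 0             # cumulative width of all entries

def mktable(previous):
    return SymTable(previous)

def enter(table, name, type_, offset):
    table.entries[name] = {"type": type_, "offset": offset}

def addwidth(table, width):
    table.width = width

def enterproc(table, name, newtable):
    table.entries[name] = {"type": "proc", "table": newtable}

def lookup(table, name):
    """Hypothetical helper: search outwards through the enclosing scopes."""
    while table is not None:
        if name in table.entries:
            return table.entries[name]
        table = table.previous
    return None

outer = mktable(None)
enter(outer, "x", "integer", 0)
inner = mktable(outer)             # entering the declaration of procedure p
enter(inner, "y", "real", 0)
addwidth(inner, 8)
enterproc(outer, "p", inner)       # p is a local name of the enclosing procedure
addwidth(outer, 4)
```

The back pointer is what lets code inside p resolve x, while y stays invisible from the outer scope.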


Addressing array elements

If an array element has width w, then the ith element of array A begins at address

base + ( i - low ) * w

where low is the lower bound on the subscript and base is the address of the first element of A. We can rewrite the expression as

i * w + ( base - low * w )

The first term depends on i (a program variable). The second term can be precomputed at compile time.
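A short Python check of the rewriting, with made-up values for base, low and w: the constant c is computed once, as a compiler would do, and only i * w + c remains to be evaluated at run time. The function names are hypothetical.

```python
def address_direct(base, low, w, i):
    """Address of A[i] computed from the definition."""
    return base + (i - low) * w

def make_addresser(base, low, w):
    """Precompute the static part once and return the run-time computation."""
    c = base - low * w              # known at compile time
    return lambda i: i * w + c      # only this part depends on i

# Illustrative values: array A[1..10] of 4-byte elements starting at address 1000
addr = make_addresser(base=1000, low=1, w=4)
for i in range(1, 11):
    assert addr(i) == address_direct(1000, 1, 4, i)
print(addr(3))
```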


Two-dimensional arrays

In a 2D array stored row by row, the address of A[i1,i2] is

base + ( (i1-low1)*n2 + (i2-low2) ) * w

where n2 is the number of values that i2 can take. This can be rewritten as

((i1*n2)+i2)*w + (base-((low1*n2)+low2)*w)

where the first term is dynamic and the second term is static (precomputable at compile time).

This generalizes to N dimensions.
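One way to see the generalization is the following Python sketch, which assumes row-major storage and accumulates both the dynamic and the static part with the same recurrence, one dimension at a time. The function name address_nd and the example values are assumptions for illustration.

```python
def address_nd(base, lows, sizes, w, idx):
    """Row-major address of A[idx].

    lows[k] is the lower bound and sizes[k] the number of elements along
    dimension k (the size of the first dimension is never used).
    """
    dynamic = 0    # ( ... (i1*n2 + i2)*n3 + i3 ... ), depends on the subscripts
    static = 0     # same recurrence over the lower bounds, known at compile time
    for i, low, n in zip(idx, lows, sizes):
        dynamic = dynamic * n + i
        static = static * n + low
    return dynamic * w + (base - static * w)

def address_2d_direct(base, low1, low2, n2, w, i1, i2):
    """The 2D formula from the slide, for comparison."""
    return base + ((i1 - low1) * n2 + (i2 - low2)) * w

# Illustrative array A[1..10, 1..20] of 4-byte elements at address 0
print(address_nd(0, (1, 1), (10, 20), 4, (2, 3)))
```

For two dimensions the loop reduces to exactly the two terms of the rewritten formula.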


Code generation for array references

We replace plain “id” as an expression with a nonterminal L:

S -> L := E
E -> E + E
E -> ( E )
E -> L
L -> Elist ]
L -> id
Elist -> Elist, E
Elist -> id [ E


Code generation for array references

S -> L := E   { if L.offset = null then   /* L is a simple id */
                    emit( L.place ':=' E.place );
                else
                    emit( L.place '[' L.offset ']' ':=' E.place ) }

E -> E + E    { … (no change) }

E -> ( E )    { … (no change) }

E -> L        { if L.offset = null then   /* L is a simple id */
                    E.place := L.place
                else begin
                    E.place := newtemp;
                    emit( E.place ':=' L.place '[' L.offset ']' )
                end }

L.offset, when it is not null, is a temp var containing a calculated array offset.


Code generation for array references

L -> Elist ]         { L.place := newtemp;
                       L.offset := newtemp;
                       emit( L.place ':=' c( Elist.array ) );
                       emit( L.offset ':=' Elist.place '*' width( Elist.array ) ) }

L -> id              { L.place := id.place; L.offset := null }

Elist -> Elist1, E   { t := newtemp();
                       m := Elist1.ndim + 1;
                       emit( t ':=' Elist1.place '*' limit( Elist1.array, m ) );
                       emit( t ':=' t '+' E.place );
                       Elist.array := Elist1.array;
                       Elist.place := t;
                       Elist.ndim := m }

Elist -> id [ E      { Elist.array := id.place;
                       Elist.place := E.place;
                       Elist.ndim := 1 }

c( Elist.array ) is the static part of the array reference.


Type conversions within Assignments

In practice, there would be many different types of variables and constants, so the compiler must either reject certain mixed-type operations or generate appropriate coercion (type conversion) instructions.


Type conversions within Assignments

Let's suppose there are two types (real and integer), with integers converted to reals when necessary. We introduce another attribute E.type, whose value is either real or integer. The semantic rule for E.type associated with the production E -> E + E is:


Type conversions within Assignments

The entire semantic rule must be modified to generate three-address statements of the form x := inttoreal y, whose effect is to convert the integer y to a real of equal value, called x.

For example, for the input x := y + i + j, where x and y have type real and i and j have type integer, the output would be:
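The rule and the output did not survive in this transcript, so the following Python sketch only illustrates the idea under stated assumptions: the result of + is integer when both operands are integer and real otherwise, + associates to the left, and the typed operators are written int+ and real+ (a naming convention assumed here, not taken from the slides). The function names translate and coerce are hypothetical.

```python
import itertools

def coerce(place, code, temps):
    """Emit x := inttoreal y and return the new temporary x."""
    u = f"t{next(temps)}"
    code.append(f"{u} := inttoreal {place}")
    return u

def translate(expr, types, code, temps):
    """Return (place, type) for expr, appending three-address statements to code."""
    if isinstance(expr, str):                       # E -> id
        return expr, types[expr]
    op, left, right = expr                          # E -> E1 op E2
    p1, ty1 = translate(left, types, code, temps)
    p2, ty2 = translate(right, types, code, temps)
    both_int = ty1 == "integer" and ty2 == "integer"
    if not both_int:                                # convert the integer operand, if any
        if ty1 == "integer":
            p1 = coerce(p1, code, temps)
        if ty2 == "integer":
            p2 = coerce(p2, code, temps)
    place = f"t{next(temps)}"
    kind = "int" if both_int else "real"
    code.append(f"{place} := {p1} {kind}{op} {p2}")
    return place, ("integer" if both_int else "real")

# x := y + i + j, read as (y + i) + j
types = {"x": "real", "y": "real", "i": "integer", "j": "integer"}
code, temps = [], itertools.count(1)
place, _ = translate(("+", ("+", "y", "i"), "j"), types, code, temps)
code.append(f"x := {place}")
print("\n".join(code))
```

Under these assumptions both i and j are converted separately, because each is added to a value that is already real.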


Type conversions within Assignments

As the number of types subject to conversion increases, the number of cases that arise increases quadratically (or worse, if there are operators with more than two arguments). Therefore, with a large number of types, careful organization of the semantic actions becomes more important.


Definition and Representation of Boolean Expressions (BE)

Definition: boolean expressions are used to compute logical values and as conditions in statements that alter the flow of control.

There are two main ways to represent boolean expressions:

● Numerically
● By flow of control (position in the program)


Numerical Representation

The whole expression is evaluated, like an arithmetic expression.

Relational expression    Conditional statement       Three-address code

a > b                    if a > b then 1 else 0


Short-Circuit Statements

It is not always necessary to evaluate the whole expression to determine its value. Sometimes the position reached in the code tells the result of the expression.

Relational expression Three-address code

a < b or c < d and e < f


Control-Flow Statements

We consider boolean expressions in the context of the statements generated by the grammar:

S -> if E then S1 | if E then S1 else S2 | while E do S1


Control-Flow Translation of BE

We now generate the code corresponding to E in the previous grammar (see Control-Flow Statements).

Relational Expression Three-address code

a < b or c < d and e < f


Case Statements

The “switch-case” statement is included in many programming languages, so this section describes how to translate it. The necessary steps are:

1. Evaluate the expression.
2. Find which value in the list of cases matches the result of step 1 (the default case if none matches).
3. Execute the statement associated with that value.


Syntax-Directed Translation of Case Statements


Backpatching

Backpatching means generating a series of branching statements with the targets of the jumps temporarily left unspecified. Each of these statements is put on a list of goto statements whose labels will be filled in when the proper label can be determined.

This way backpatching can be used to generate code for boolean expressions and flow-of-control statements in one pass.


Backpatching

We will generate quadruples into a quadruple array. Labels will be indices into this array. To manipulate lists of labels, we use three functions.

- Makelist(i): creates a new list containing only i, an index into the array of quadruples; makelist returns a pointer to the list it has made.

- Merge(p1,p2): concatenates the lists pointed to by p1 and p2, and returns a pointer to the concatenated list.

- Backpatch(p,i): inserts i as the target label for each of the statements on the list pointed to by p.
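The three functions can be sketched in Python as follows. Python lists stand in for the pointers to lists, and the emit helper, the quadruple layout and the names truelist and falselist are assumptions made for this illustration.

```python
quads = []   # quadruple array; each entry is [op, arg1, arg2, target]

def emit(op, arg1=None, arg2=None, target=None):
    """Append a quadruple and return its index in the array."""
    quads.append([op, arg1, arg2, target])
    return len(quads) - 1

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, i):
    """Fill in i as the jump target of every quadruple listed in p."""
    for index in p:
        quads[index][3] = i

# a < b or c < d, with jump targets left open at first
true1 = makelist(emit("if<", "a", "b"))
false1 = makelist(emit("goto"))
backpatch(false1, len(quads))        # if a < b fails, test c < d next
true2 = makelist(emit("if<", "c", "d"))
false2 = makelist(emit("goto"))
truelist = merge(true1, true2)       # either test succeeding makes the whole true
falselist = false2

# Later, once the surrounding statement knows its labels (values made up here):
backpatch(truelist, 100)
backpatch(falselist, 200)
for index, q in enumerate(quads):
    print(index, q)
```

The jumps are generated in a single pass; only their target fields are revisited once the labels are known.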


Procedure calls

Procedures are such important and frequently used programming constructs that it is imperative for a compiler to generate good code for procedure calls and returns. The run-time routines that handle procedure argument passing, calls and returns are part of the run-time support package.


When a procedure call occurs

- Space must be allocated.

- Arguments must be evaluated and made available to the called procedure in a known place.

- Environment pointers must be established to enable the called procedure to access data.

- The state of the calling procedure and the return address must be saved.

- Finally, a jump to the beginning of the code must be generated.


When a procedure call returns.

- If the called procedure is a function, the result must be stored in a known place.

- The activation record of the calling procedure must be restored.

- A jump to the calling procedure's return address must be generated.