
Chapter 8 Intermediate Code

Zhang Jing, Wang HaiLing

College of Computer Science & Technology

Harbin Engineering University


Intermediate code generation sits in the middle of a compiler. It is a bridge that translates the source program into an intermediate representation, which is then translated into target code. The position of intermediate code generation in the compiler is shown in Figure 8.1.


There are two advantages of using intermediate code.

The first is that code generators for different target machines can be attached to the same front end after the intermediate code generation phase.

The second is that a machine-independent code optimizer can be applied to the intermediate representation.


Intermediate code is machine independent, but it is close to machine instructions. The intermediate code generator converts a given program in the source language into an equivalent program in an intermediate language.


Many different languages can serve as the intermediate language, and the compiler designer decides which one to use. Postfix notation, four-address code (quadruples), three-address code, portable code and assembly code can all be used as intermediate languages. This chapter introduces them in detail.


8.1 Postfix Notation

If the source program can be represented in postfix notation, it is easy to translate into target code, because the order of the target instructions is the same as the order of the operators in postfix notation.


8.1.1 The definition of postfix notation

The postfix notation for the expression a+b*c is abc*+. The steps for translating an expression into postfix notation are as follows:

1 The operands appear in postfix notation in the same order as in the original expression.

2 Each operator follows its operands, and postfix notation contains no parentheses.

3 The operators appear in the order in which they are calculated.


For example, the postfix notation for the expression a*(b+c/d) is abcd/+*. The translation follows the steps above.

First, by step 1, the operands keep their original order: abcd.

Second, by step 2, the first operator is /, because it immediately follows its operands c and d; by step 3 it is also the operator calculated first. The second operator is +, because the parentheses in the original expression require + to be calculated before *. The last operator is *, because it is calculated last.
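One way to check a postfix string is to evaluate it with a single stack. The following is a minimal Python sketch, not part of the original slides; the function name eval_postfix and the sample values in env are assumptions made for illustration, and operands are restricted to one-letter names.

```python
import operator

# Binary operators that appear in the postfix strings of this section.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_postfix(postfix, env):
    """Evaluate a postfix string of one-letter operands using one stack."""
    stack = []
    for symbol in postfix:
        if symbol in OPS:
            right = stack.pop()          # most recently pushed operand
            left = stack.pop()
            stack.append(OPS[symbol](left, right))
        else:
            stack.append(env[symbol])    # operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

env = {"a": 2, "b": 3, "c": 8, "d": 4}   # sample values, chosen arbitrarily
result = eval_postfix("abcd/+*", env)
direct = env["a"] * (env["b"] + env["c"] / env["d"])
print(result, direct)                     # both values are 10.0
```

Each operator is applied as soon as it is read, so no parentheses or priority rules are needed during evaluation.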


As another example, the postfix notation for the expression a*b+(c-d)/e is ab*cd-e/+. These examples show that translating an expression into postfix notation by hand is somewhat difficult. The Dutch computer scientist E. W. Dijkstra devised a method to solve this problem.


8.1.2 E. W. Dijkstra's Method

E. W. Dijkstra's method uses two stacks: one stores operands and the other stores operators. The procedure is shown in Figure 8.2.

The expression is scanned from left to right. Before scanning begins, the identifier # is pushed onto the bottom of the operator stack, and another # is appended to the end of the expression to mark its end. When the two # identifiers meet, scanning is finished. The steps of scanning are:

1 If the symbol is an operand, push it onto the operand stack.

2 If the symbol is an operator, compare it with the operator on top of the operator stack. If the priority of the operator on top of the stack is greater than or equal to that of the scanned operator, pop the operator on top of the stack and move it to the left side (the output), then compare again with the new top of the stack. If the priority of the operator on top of the stack is lower than that of the scanned operator, push the scanned operator onto the operator stack.

3 If the symbol is a left parenthesis, push it onto the operator stack, and then compare the operators inside the parentheses as in step 2. If the symbol is a right parenthesis, pop all the operators inside the parentheses; the parentheses themselves are discarded and do not appear in the postfix notation.

4 Return to step 1 until the two # identifiers meet.
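The scanning steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: Figure 8.2 is not reproduced here, so the operand stack is assumed to hold postfix fragments that are combined whenever an operator is popped. The names to_postfix, PRIORITY and reduce_top are hypothetical, and operands are limited to single characters.

```python
# '#' marks the stack bottom and the end of input; '(' stays on the stack
# until its ')' arrives, so both are given the lowest priorities.
PRIORITY = {"#": 0, "(": 1, "+": 2, "-": 2, "*": 3, "/": 3}

def to_postfix(expression):
    """Translate an infix expression with one-character operands to postfix."""
    operands = []        # operand stack: holds postfix fragments
    operators = ["#"]    # operator stack with '#' at the bottom

    def reduce_top():
        # Pop one operator and its two operands, push the combined fragment.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(left + right + op)

    for symbol in expression + "#":      # '#' appended as the end marker
        if symbol == " ":
            continue
        if symbol == "(":
            operators.append(symbol)
        elif symbol == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()              # discard the '('
        elif symbol in PRIORITY:         # an operator or the final '#'
            while (operators[-1] != "#"
                   and PRIORITY[operators[-1]] >= PRIORITY[symbol]):
                reduce_top()
            if symbol != "#":
                operators.append(symbol)
        else:
            operands.append(symbol)      # operand
    return operands.pop()

print(to_postfix("a+b*c"))        # abc*+
print(to_postfix("a*(b+c/d)"))    # abcd/+*
```

The final # has the lowest priority, so it forces every remaining operator off the stack until it meets the # at the bottom.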


Example 8.1

The postfix notation of the expression a+b*c is abc*+. The translation procedure in Figure 8.3 shows that the operators are output in the order *+, which is both the order in which they are popped from the operator stack and the order in which they are calculated.


8.1.3 Extended postfix notation

If the expression E has the form E1:=E2, then the postfix notation for E is E1' E2' :=, where E1' and E2' are the postfix notations for E1 and E2, respectively.

Example 8.2

For the expression a:=b, the definition above gives the postfix notation: ab:=

Example 8.3

For the expression a:=5*(b+8), the postfix notation is: a5b8+*:=

Consider a program of the form:

IF u THEN S1
ELSE
    BEGIN S2
    END

Here u is a condition with two possible results, true or false; S1 and S2 are two parts of the program; l1 is the start number of S2 and l2 is the end number of S2.

The postfix notation of the program is as follows:

u l1 BZ S1 l2 BR S2

where u, S1 and S2 stand for their own postfix notations. BZ means that if the value of u is not true, control jumps to l1; BR means an unconditional jump to l2. The following example illustrates the translation.


Example 8.4

S:=0; IF i<10 THEN S:=S+i ELSE BEGIN i:=i+1 END;

The postfix notation of Example 8.4 is as follows; in this program the value of l1 is 7 and the value of l2 is 9.

S0:= i10< 7 BZ SSi+:= 9 BR (7) ii1+:= (9)
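To see how BZ and BR behave, the postfix notation of Example 8.4 can be run by a small stack interpreter. The Python sketch below is only an illustration: the token list is a hand-made split of the postfix string, the markers (7) and (9) are treated as jump targets, and the name run_postfix is hypothetical.

```python
TOKENS = ["S", "0", ":=", "i", "10", "<", "7", "BZ",
          "S", "S", "i", "+", ":=", "9", "BR",
          "(7)", "i", "i", "1", "+", ":=", "(9)"]

def run_postfix(tokens, env):
    """Interpret extended postfix containing :=, +, <, BZ and BR."""
    # Map each label marker such as "(7)" to its index in the token list.
    labels = {t[1:-1]: i for i, t in enumerate(tokens) if t.startswith("(")}

    def value(x):
        # Computed results are stored as they are; names are looked up in
        # env, and any other string is read as an integer literal.
        if not isinstance(x, str):
            return x
        return env[x] if x in env else int(x)

    stack, pc = [], 0
    while pc < len(tokens):
        t = tokens[pc]
        pc += 1
        if t.startswith("("):          # label marker: nothing to execute
            continue
        if t == ":=":
            source, target = stack.pop(), stack.pop()
            env[target] = value(source)
        elif t in ("+", "<"):
            right, left = value(stack.pop()), value(stack.pop())
            stack.append(left + right if t == "+" else left < right)
        elif t == "BZ":                # jump when the condition is not true
            label, cond = stack.pop(), stack.pop()
            if not value(cond):
                pc = labels[label]
        elif t == "BR":                # unconditional jump
            pc = labels[stack.pop()]
        else:
            stack.append(t)            # operand or label number
    return env

print(run_postfix(TOKENS, {"i": 3}))    # THEN branch: S becomes 3
print(run_postfix(TOKENS, {"i": 12}))   # ELSE branch: i becomes 13
```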


8.2 Four-Address Code

Another representation of intermediate code is four-address code, defined as:

op, y, z, x

Apply the operator op to y and z, and store the result in x. Here x, y and z are names, constants or compiler-generated temporaries, and op is any operator.


Example 8.5

The expression a+(-b*c+d)*e might be translated into the following four-address code sequence:

(1) (-, b, , T1)
(2) (*, T1, c, T2)
(3) (+, T2, d, T3)
(4) (*, T3, e, T4)
(5) (+, a, T4, T5)

Four-address code can be divided into different types according to the operator.


Binary Operator:

op, y, z, result

Here op is a binary arithmetic or logical operator. It is applied to y and z, and the result of the operation is stored in result. For example, the four-address code for a+b*c is:

(*, b, c, T1)
(+, a, T1, T2)


Unary Operator:

op, y, , result

Here op is a unary arithmetic or logical operator. It is applied to y, and the result of the operation is stored in result. For example, the four-address code for the expression S:=0 is:

(:=, 0, , S)
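The binary and unary forms can be generated mechanically from a postfix token sequence by keeping a stack of operand names and creating a new temporary for each operator. The Python sketch below reproduces the sequence of Example 8.5 under a few assumptions: the function name postfix_to_quads is hypothetical, and the token "neg" is an invented marker for unary minus, which the slides do not name.

```python
def postfix_to_quads(tokens):
    """Turn postfix tokens into (op, y, z, x) quadruples with temporaries."""
    binary = {"+", "-", "*", "/"}
    unary = {"neg": "-"}          # hypothetical token for unary minus
    stack, quads = [], []

    def new_temp():
        # Every quadruple here defines exactly one temporary: T1, T2, ...
        return f"T{len(quads) + 1}"

    for t in tokens:
        if t in binary:
            z, y = stack.pop(), stack.pop()
            x = new_temp()
            quads.append((t, y, z, x))
            stack.append(x)
        elif t in unary:
            y = stack.pop()
            x = new_temp()
            quads.append((unary[t], y, "", x))   # third field left empty
            stack.append(x)
        else:
            stack.append(t)                      # operand name
    return quads

# a+(-b*c+d)*e written as postfix tokens
for number, quad in enumerate(
        postfix_to_quads(["a", "b", "neg", "c", "*", "d", "+", "e", "*", "+"]),
        start=1):
    print(number, quad)
```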


Unconditional Jumps:

op, L, ,

Here op is BR. Control jumps to the four-address code statement with the label L, and execution continues from that statement. For example:

(BR, 9, , )


Conditional Jumps:

op, L, , x

Here op is BZ. If the value of x is not true, control jumps to the four-address code statement with the label L, and execution continues from that statement. If the value of x is true, execution continues with the statement following the conditional jump. For example:

(BZ, 7, , T1)


The four-address code for Example 8.4 is:

(1) (:=, 0, , S)
(2) (<, i, 10, T1)
(3) (BZ, 7, , T1)
(4) (+, S, i, T2)
(5) (:=, T2, , S)
(6) (BR, 9, , )
(7) (+, i, 1, T3)
(8) (:=, T3, , i)

Four-address code is produced in much the same way as postfix notation: both rely on E. W. Dijkstra's method.
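The quadruples for Example 8.4 can be checked by executing them. The Python sketch below is illustrative only and rests on two assumptions about the listing: quadruple (2) is the comparison i<10 from the source statement, and quadruple (7) adds the constant 1. The name run_quads is hypothetical, and jump targets are read as 1-based quadruple numbers.

```python
QUADS = [
    (":=", "0", "", "S"),      # (1)
    ("<", "i", "10", "T1"),    # (2) assumed to be the comparison i<10
    ("BZ", "7", "", "T1"),     # (3)
    ("+", "S", "i", "T2"),     # (4)
    (":=", "T2", "", "S"),     # (5)
    ("BR", "9", "", ""),       # (6)
    ("+", "i", "1", "T3"),     # (7)
    (":=", "T3", "", "i"),     # (8)
]

def run_quads(quads, env):
    """Execute quadruples; jump targets are 1-based quadruple numbers."""
    def value(name):
        # Names are looked up in env; anything else is an integer literal.
        return env[name] if name in env else int(name)

    pc = 1
    while pc <= len(quads):
        op, y, z, x = quads[pc - 1]
        pc += 1
        if op == ":=":
            env[x] = value(y)
        elif op == "+":
            env[x] = value(y) + value(z)
        elif op == "<":
            env[x] = value(y) < value(z)
        elif op == "BZ":               # jump when x is not true
            if not env[x]:
                pc = int(y)
        elif op == "BR":               # unconditional jump
            pc = int(y)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env

then_branch = run_quads(QUADS, {"i": 3})
else_branch = run_quads(QUADS, {"i": 12})
print(then_branch["S"], else_branch["i"])   # 3 13
```

Label 9 lies one past the last quadruple, so jumping to it ends execution.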


8.3 Three-Address Code

The difference between three-address code and four-address code lies in the memory they occupy. When target code is produced, all data are assigned run-time memory, and the memory location of each datum is recorded in the symbol table. Compared with three-address code, the symbol table for four-address code carries an extra field that stores the result part. To use the result of a calculation in four-address code, we only need to look at the fourth part; in three-address code, we must define a temporary value that refers to the result part. This makes three-address code more difficult to handle in an optimizing compiler.
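The contrast can be illustrated by dropping the result field from a list of quadruples. The Python sketch below assumes that three-address code here means the form (op, y, z), in which a later instruction refers to an earlier result by the position of the instruction that computed it; this reading, and the name quads_to_triples, are assumptions rather than definitions from the slides. The sketch only handles quadruples whose result is a temporary.

```python
def quads_to_triples(quads):
    """Rewrite (op, y, z, x) quadruples as (op, y, z) triples.

    A use of a temporary is replaced by the index (an int) of the
    triple that computes it.
    """
    position = {}      # temporary name -> index of its defining triple
    triples = []
    for op, y, z, x in quads:
        y = position.get(y, y)     # replace a temporary by a reference
        z = position.get(z, z)
        position[x] = len(triples)
        triples.append((op, y, z))
    return triples

# a+b*c as quadruples, then as triples
quads = [("*", "b", "c", "T1"), ("+", "a", "T1", "T2")]
print(quads_to_triples(quads))   # [('*', 'b', 'c'), ('+', 'a', 0)]
```

Because operands refer to positions, moving or deleting an instruction changes the references of the instructions that use its result, which is one reason such a form is harder to rearrange during optimization.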


8.5 Portable Code

Portable code is a kind of intermediate code, and it can be written in many programming languages. This section explains portable code written as PASCAL subprograms.

Portable code involves two procedures: PROCEDURE BLOCK, which forms the intermediate code, and PROCEDURE GEN, which generates the intermediate code and stores it in CODE for PROCEDURE INTERPRET.


We will introduce PROCEDURE GEN in detail, using the PASCAL source program shown below.

PROGRAM main;
PROCEDURE 1;
PROCEDURE 2;
BEGIN
    READ(i);
    WHILE i>1 DO
    BEGIN
        IF i>10 THEN CALL 1
        ELSE BEGIN
            CALL 2
        END;
    END;
END


The portable code of the source program is as follows.


As the portable code above shows, each instruction has three parts:

the first part is the operation code, such as INT, STO, OPR, LOD, JPC and CAL;

the second part is the level value, which is 0 here;

the third part is a value A, such as a relative address, a number of units, a procedure entry address, the value of a constant or the code of a special operator.


INT allocates data space on the stack. A is the number of stack units for the procedure, for example, 5 in line 11.

CAL calls a procedure. A is the entry address of the procedure.

LIT pushes a constant onto the top of the stack. A is the value of the constant.

LOD pushes a variable onto the top of the stack. A is the relative address of the variable.

STO pops the top of the stack into a storage unit. A is the relative address of the unit.

JMP jumps to address A unconditionally.

JPC jumps to address A if the value on the top of the stack is false; otherwise execution moves on to the next instruction.

OPR performs an operation. When A=2, it is the calculation “+”. When A=12, it is “>”. When A=16, it is the “read” operation, which reads input data onto the top of the stack. When A=0, it fetches the return address.
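The instructions described above can be tried out on a small stack machine. The Python sketch below is an illustration, not PROCEDURE INTERPRET itself: it ignores the level field, numbers instructions from 0, treats OPR with A=0 as the end of the program, and leaves out CAL and the read operation. The program shown is a made-up example (x := 0; while 5 > x, x := x + 1), and the names interpret and PROGRAM are hypothetical.

```python
PROGRAM = [
    ("INT", 0, 1),    # 0: reserve one unit (x at relative address 0)
    ("LIT", 0, 0),    # 1: push constant 0
    ("STO", 0, 0),    # 2: x := 0
    ("LIT", 0, 5),    # 3: push constant 5
    ("LOD", 0, 0),    # 4: push x
    ("OPR", 0, 12),   # 5: 5 > x ?
    ("JPC", 0, 12),   # 6: if false, jump to instruction 12
    ("LOD", 0, 0),    # 7: push x
    ("LIT", 0, 1),    # 8: push constant 1
    ("OPR", 0, 2),    # 9: x + 1
    ("STO", 0, 0),    # 10: x := x + 1
    ("JMP", 0, 3),    # 11: back to the test
    ("OPR", 0, 0),    # 12: return
]

def interpret(code, memory_size=8):
    """Run (op, level, a) instructions; the level field is ignored."""
    memory = [0] * memory_size    # variables, indexed by relative address
    stack = []                    # evaluation stack
    pc = 0
    while pc < len(code):
        op, _level, a = code[pc]
        pc += 1
        if op == "INT":
            pass                  # space is preallocated in this sketch
        elif op == "LIT":
            stack.append(a)
        elif op == "LOD":
            stack.append(memory[a])
        elif op == "STO":
            memory[a] = stack.pop()
        elif op == "JMP":
            pc = a
        elif op == "JPC":         # jump when the top of the stack is false
            if not stack.pop():
                pc = a
        elif op == "OPR" and a == 2:      # "+"
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "OPR" and a == 12:     # ">"
            right, left = stack.pop(), stack.pop()
            stack.append(left > right)
        elif op == "OPR" and a == 0:      # return: stop in this sketch
            break
        else:
            raise ValueError(f"unsupported instruction {code[pc - 1]}")
    return memory

final_memory = interpret(PROGRAM)
print(final_memory[0])   # 5
```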


8.6 Assembly code

Assembly code can also serve as intermediate code. Compared with three-address code and four-address code, it has the following advantages:

1 It is easier to translate into machine code, because its instructions map to machine instructions one to one.

2 Jump addresses need not be calculated, because assembly code usually uses symbols to represent addresses.

3 It can represent data with various numbers of bits, and no conversion is needed.

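Example 8.5

The expression a+(-b*c+d)*e might be translated into the following assembly code:

Mov ax,b     ; ax := b
Neg ax       ; ax := -b
Mov bx,c
Imul bx      ; ax := -b*c
Mov bx,d
Add ax,bx    ; ax := -b*c+d
Mov bx,e
Imul bx      ; ax := (-b*c+d)*e
Mov bx,a
Add ax,bx    ; ax := a+(-b*c+d)*e
Mov t,ax     ; t := ax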


The assembly code above is explained as follows:

Mov ax,b loads the value of b into the register ax.

Neg ax negates the value in ax.

Imul bx multiplies the value in ax by the value in bx and stores the result in ax.

Add ax,bx adds the value in bx to the value in ax and stores the result in ax.

t is a temporary variable.
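The listing can be checked against the source expression by simulating the four instructions it uses. The Python sketch below is a simplification made for illustration: registers hold unbounded Python integers, so 16-bit overflow and the dx half of the Imul result are ignored, and the names run_assembly and LISTING are hypothetical.

```python
LISTING = [
    "Mov ax,b", "Neg ax", "Mov bx,c", "Imul bx", "Mov bx,d", "Add ax,bx",
    "Mov bx,e", "Imul bx", "Mov bx,a", "Add ax,bx", "Mov t,ax",
]

def run_assembly(lines, memory):
    """Simulate the Mov/Neg/Imul/Add subset used in the listing."""
    regs = {"ax": 0, "bx": 0}

    def read(name):
        return regs[name] if name in regs else memory[name]

    def write(name, value):
        if name in regs:
            regs[name] = value
        else:
            memory[name] = value

    for line in lines:
        mnemonic, _, operands = line.strip().partition(" ")
        args = [a.strip() for a in operands.split(",")]
        op = mnemonic.lower()
        if op == "mov":
            write(args[0], read(args[1]))
        elif op == "neg":
            write(args[0], -read(args[0]))
        elif op == "add":
            write(args[0], read(args[0]) + read(args[1]))
        elif op == "imul":        # one-operand form: ax := ax * operand
            regs["ax"] = regs["ax"] * read(args[0])
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return memory

m = run_assembly(LISTING, {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
expected = m["a"] + (-m["b"] * m["c"] + m["d"]) * m["e"]
print(m["t"], expected)           # -9 -9
```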
