Chapter 8 Intermediate Code

Zhang Jing, Wang HaiLing

College of Computer Science & Technology

Harbin Engineering University

Intermediate code generation sits in the middle of the compiler. It acts as a bridge: the source program is first translated into an intermediate representation, which is then translated into target code. The position of intermediate code generation in the compiler is shown in Figure 8.1.

There are two advantages of using intermediate code. The first is that back ends for different target machines can be attached to the same front end, which ends with intermediate code generation.

The second is that a machine-independent code optimizer can be applied to the intermediate representation.

Intermediate code is machine-independent, but it is close to machine instructions. The intermediate code generator converts a program in the source language into an equivalent program in the intermediate language.

The intermediate language can be any of many different languages, and the designer of the compiler chooses it. Postfix notation, four-address code (quadruples), three-address code, portable code and assembly code can all be used as intermediate languages. In this chapter, we introduce them in detail.

8.1 Postfix Notation

If the source program is represented in postfix notation, it is easy to translate into target code, because the order of the target instructions is the same as the order of the operators in postfix notation.
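As a minimal sketch of why this holds, the Python function below translates a postfix string into instructions for a hypothetical stack machine in one left-to-right pass: each operand becomes a push and each operator becomes exactly one instruction, so the instructions come out in the same order as the operators. The opcode names PUSH, ADD, SUB, MUL and DIV are assumptions, and operands are assumed to be single characters.

```python
# Hypothetical opcodes of a simple stack machine (assumed, not from the text).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def postfix_to_stack_code(postfix: str) -> list[str]:
    """Translate postfix with single-character operands into stack-machine code."""
    code = []
    for symbol in postfix:
        if symbol in OPCODES:
            # Both operands are already on the stack, so the operator
            # becomes one instruction, emitted in postfix order.
            code.append(OPCODES[symbol])
        else:
            code.append(f"PUSH {symbol}")
    return code


print(postfix_to_stack_code("abc*+"))
# ['PUSH a', 'PUSH b', 'PUSH c', 'MUL', 'ADD']
```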

8.1.1 The definition of postfix notation

The postfix notation for the expression a+b*c is abc*+. The rules for postfix notation are as follows:

1 The operands appear in the same order as in the original expression.

2 Each operator follows its operands, and there are no parentheses in postfix notation.

3 The operators appear in the order in which they are calculated.

For example, the postfix notation for the expression a*(b+c/d) is abcd/+*. The translation simply follows the steps above.

First, by step 1, the operands keep their original order: abcd.

Second, by steps 2 and 3, the first operator is /, because it is calculated first and directly follows its operands c and d. The second operator is +: because of the parentheses in the original expression, + must be calculated before *. The last operator is *, because it is calculated last.
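A quick way to check a hand translation such as abcd/+* is to evaluate it with a stack and compare the result with the infix expression. The Python sketch below assumes single-letter operands and the four arithmetic operators; the sample values are arbitrary.

```python
import operator

# The four arithmetic operators, mapped to Python's functions.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_postfix(postfix: str, env: dict[str, float]) -> float:
    """Evaluate a postfix string whose operands are single-letter variables."""
    stack = []
    for symbol in postfix:
        if symbol in OPS:
            right = stack.pop()   # the operand pushed last is the right operand
            left = stack.pop()
            stack.append(OPS[symbol](left, right))
        else:
            stack.append(env[symbol])
    (result,) = stack             # a well-formed expression leaves exactly one value
    return result


env = {"a": 2, "b": 3, "c": 8, "d": 4}
print(eval_postfix("abcd/+*", env), env["a"] * (env["b"] + env["c"] / env["d"]))
# 10.0 10.0
```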

Another example: the postfix notation for the expression a*b+(c-d)/e is ab*cd-e/+. These examples show that translating an expression into postfix notation by hand is somewhat difficult, so the Dutch computer scientist E.W.DIJKSTRA devised a method to solve the problem.

8.1.2 E.W.DIJKSTRA Method

There are two stacks in the E.W.DIJKSTRA method: one stores operands and the other stores operators. The procedure is shown in Figure 8.2, and the steps of the E.W.DIJKSTRA method are as follows:

The expression is scanned from left to right. Before scanning begins, the marker # is pushed onto the bottom of the operator stack, and another # is appended to the end of the expression to mark where it ends. When the two # markers meet, scanning is finished. The steps of scanning are:

1 If the scanned symbol is an operand, push it onto the operand stack.

2 If it is an operator, compare it with the operator on the top of the operator stack. If the priority of the operator on the top of the stack is higher than or equal to that of the scanned operator, pop the top operator and move it to the left side, that is, onto the operand stack; then compare the scanned operator with the new top of the stack. If the priority of the operator on the top of the stack is lower than that of the scanned operator, push the scanned operator onto the operator stack.

3 If it is a left parenthesis, push it onto the operator stack; the operators inside the parentheses are then compared as usual (a left parenthesis on the stack has lower priority than any operator).

If it is a right parenthesis, pop all the operators inside the parentheses onto the operand stack. The parentheses themselves are discarded and do not appear in the postfix notation.

4 Return to step 1 until the two # markers meet.
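The steps above can be sketched in Python as follows. This minimal version assumes single-character operands, the operators + - * /, no spaces, and the priority table shown in the code, in which # and ( rank below every operator. The operand stack is the left side on which the postfix result builds up.

```python
# Assumed priorities: '#' and '(' are lowest, so no ordinary operator pops them.
PRIORITY = {"#": 0, "(": 0, "+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr: str) -> str:
    """E.W.DIJKSTRA method for expressions with single-character operands."""
    operand_stack = []        # the left side, where the postfix result builds up
    operator_stack = ["#"]    # '#' at the bottom of the operator stack
    for symbol in expr + "#":                 # '#' also marks the end of the expression
        if symbol.isalnum():                  # step 1: operand
            operand_stack.append(symbol)
        elif symbol == "(":                   # step 3: left parenthesis
            operator_stack.append(symbol)
        elif symbol == ")":                   # step 3: right parenthesis
            while operator_stack[-1] != "(":
                operand_stack.append(operator_stack.pop())
            operator_stack.pop()              # the parentheses are discarded
        elif symbol == "#":                   # step 4: the two '#' markers meet
            while operator_stack[-1] != "#":
                operand_stack.append(operator_stack.pop())
        else:                                 # step 2: an operator
            while PRIORITY[operator_stack[-1]] >= PRIORITY[symbol]:
                operand_stack.append(operator_stack.pop())
            operator_stack.append(symbol)
    return "".join(operand_stack)


for e in ["a+b*c", "a*(b+c/d)", "a*b+(c-d)/e"]:
    print(e, "->", to_postfix(e))
# a+b*c -> abc*+
# a*(b+c/d) -> abcd/+*
# a*b+(c-d)/e -> ab*cd-e/+
```

Popping on "greater than or equal" makes operators of equal priority left-associative, so a-b-c becomes ab-c-.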

Example 8.1

For the expression a+b*c, the postfix notation is abc*+. The translation procedure shown in Figure 8.3 shows that the operator order is *+, which is both the order in which the operators are popped from the operator stack and the order in which they are calculated.

8.1.3 Extended postfix notation

If an expression E has the form E1:=E2, then the postfix notation for E is E1'E2':=, where E1' and E2' are the postfix notations for E1 and E2, respectively.

Example 8.2

For the expression a:=b, the definition above gives the postfix notation ab:=.

Example 8.3

For the expression a:=5*(b+8), the postfix notation is a5b8+*:=.
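The rule for E1:=E2 can also be applied directly by a recursive-descent translator that emits each operator after the postfix of its operands. Note that this is a swapped-in technique (syntax-directed translation by recursive descent), not the E.W.DIJKSTRA method. The Python sketch assumes that the left side is a single variable, that operands are single letters or integer constants, and that only + - * / and parentheses occur on the right side; the class and method names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # ':=' is a single token; operands are single letters or integer constants.
    return re.findall(r":=|[A-Za-z]|\d+|[-+*/()]", text)


class PostfixTranslator:
    """Recursive descent: postfix(E1:=E2) = postfix(E1) postfix(E2) :=."""

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def assignment(self) -> str:
        target = self.operand()          # E1 is a single variable here
        self.take(":=")
        result = target + self.expr() + ":="
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return result

    def expr(self) -> str:               # E -> T { (+|-) T }
        out = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            out += self.term() + op      # the operator follows both operands
        return out

    def term(self) -> str:               # T -> F { (*|/) F }
        out = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            out += self.factor() + op
        return out

    def factor(self) -> str:             # F -> ( E ) | operand
        if self.peek() == "(":
            self.take("(")
            out = self.expr()
            self.take(")")
            return out                   # parentheses produce no output
        return self.operand()

    def operand(self) -> str:
        token = self.take()
        if not (token.isalpha() or token.isdigit()):
            raise SyntaxError(f"unexpected {token!r}")
        return token


print(PostfixTranslator("a:=b").assignment())          # ab:=
print(PostfixTranslator("a := 5*(b+8)").assignment())  # a5b8+*:=
```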

Suppose a program has the form:

IF u THEN S1
ELSE
    BEGIN S2
    END

Here u is a condition whose value is either true or false, S1 and S2 are parts of the program, l1 is the number of the position where S2 starts, and l2 is the number of the position at the end of S2.

The postfix notation of the program is:

u' l1 BZ S1' l2 BR (l1) S2' (l2)

where u', S1' and S2' are the postfix notations of u, S1 and S2, and (l1) and (l2) mark the positions l1 and l2. BZ means that if the value of u is not true, control turns to l1; BR means an unconditional jump to l2. Now we give an example to explain the translation.

Example 8.4

S:=0; IF i<10 THEN S:=S+i ELSE BEGIN i:=i+1 END;

The postfix notation of Example 8.4 is as follows; in this program the value of l1 is 7 and the value of l2 is 9.

S0:= i10< 7 BZ SSi+:= 9 BR (7) ii1+:= (9)
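Executing the postfix notation of Example 8.4 shows how BZ and BR work. This Python sketch is an assumption rather than the book's implementation: tokens are separated by spaces, the token (n) marks the position of label n, the number pushed just before BZ or BR is the label to jump to, and BZ pops the label and then the condition. Variables are pushed as names so that := can store into its left operand.

```python
import operator

# Operators needed for Example 8.4 (an assumed subset).
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "<": operator.lt, ">": operator.gt}


def run_postfix(program: str, env: dict) -> dict:
    """Execute extended postfix notation with :=, BZ and BR."""
    tokens = program.split()
    # A token "(n)" marks the position of label n.
    labels = {int(t[1:-1]): i for i, t in enumerate(tokens) if t.startswith("(")}
    stack = []

    def value(x):
        # Strings are names or numerals; computed results are already values.
        if isinstance(x, str):
            return int(x) if x.isdigit() else env[x]
        return x

    pc = 0
    while pc < len(tokens):
        token = tokens[pc]
        pc += 1
        if token.startswith("("):
            continue                          # label marker: nothing to execute
        if token in BINARY:
            right = value(stack.pop())
            left = value(stack.pop())
            stack.append(BINARY[token](left, right))
        elif token == ":=":
            right = value(stack.pop())
            env[stack.pop()] = right          # the left operand is a variable name
        elif token == "BZ":
            target = value(stack.pop())
            if not value(stack.pop()):        # u is not true: turn to l1
                pc = labels[target]
        elif token == "BR":
            pc = labels[value(stack.pop())]   # unconditional jump to l2
        else:
            stack.append(token)               # variable, constant or label number
    return env


EXAMPLE_8_4 = "S 0 := i 10 < 7 BZ S S i + := 9 BR (7) i i 1 + := (9)"
print(run_postfix(EXAMPLE_8_4, {"i": 3}))   # {'i': 3, 'S': 3}
print(run_postfix(EXAMPLE_8_4, {"i": 12}))  # {'i': 13, 'S': 0}
```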

8.2 Four-Address Code

Another representation of intermediate code is four-address code, defined as:

op, y, z, x   Apply operator op to y and z, and store the result in x.

Here x, y and z are names, constants or compiler-generated temporaries, and op is any operator.

Example 8.5 The expression a+(-b*c+d)*e might be translated into the following four-address code sequence:

( 1 )( - , b , , T1 )
( 2 )( * , T1 , c , T2 )
( 3 )( + , T2 , d , T3 )
( 4 )( * , T3 , e , T4 )
( 5 )( + , a , T4 , T5 )

Four-address code can be divided into different types according to their operators.
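The sequence in Example 8.5 can be produced mechanically by a post-order walk of the expression tree that allocates a new temporary for each operator. The Python sketch below assumes the tree is already built (written by hand here as nested tuples, with ("-", y) for unary minus) and uses an empty string for the unused field; the name gen_quads is hypothetical.

```python
from itertools import count


def gen_quads(tree):
    """Return (op, y, z, x) quadruples for an expression tree.
    A leaf is a name; ("-", y) is unary minus; (op, y, z) is a binary operation."""
    quads = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):
            return node                           # a name needs no code
        op, *args = node
        operands = [walk(arg) for arg in args]    # operands first (post-order)
        result = f"T{next(temps)}"                # new compiler-generated temporary
        z = operands[1] if len(operands) == 2 else ""
        quads.append((op, operands[0], z, result))
        return result

    walk(tree)
    return quads


# a+(-b*c+d)*e
TREE = ("+", "a", ("*", ("+", ("*", ("-", "b"), "c"), "d"), "e"))
for n, quad in enumerate(gen_quads(TREE), start=1):
    print(f"({n})", quad)
# (1) ('-', 'b', '', 'T1')
# (2) ('*', 'T1', 'c', 'T2')
# (3) ('+', 'T2', 'd', 'T3')
# (4) ('*', 'T3', 'e', 'T4')
# (5) ('+', 'a', 'T4', 'T5')
```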

Binary Operator:

op, y, z, result   Here op is a binary arithmetic or logical operator. It is applied to y and z, and the result of the operation is stored in result. For example, the four-address code for a+b*c is:

( * , b , c , T1 )
( + , a , T1 , T2 )

Unary Operator:

op, y, , result   Here op is a unary arithmetic or logical operator. It is applied to y, and the result of the operation is stored in result. For example, the four-address code for the assignment S:=0 is:

( := , 0 , , S )

Unconditional Jumps:

op, L, ,   Here op is BR. Control jumps to the four-address code with the label L, and execution continues from that statement. For example:

( BR , 9 , , )

Conditional Jumps:

op, L, , x   Here op is BZ. If the value of x is not true, control jumps to the four-address code with the label L, and execution continues from that statement. If the value is true, execution continues with the statement following this conditional jump. For example:

( BZ , 7 , , T1 )

The four-address code for Example 8.4:

( 1 )( := , 0 , , S )
( 2 )( < , i , 10 , T1 )
( 3 )( BZ , 7 , , T1 )
( 4 )( + , S , i , T2 )
( 5 )( := , T2 , , S )
( 6 )( BR , 9 , , )
( 7 )( + , i , 1 , T3 )
( 8 )( := , T3 , , i )

Four-address code is handled in much the same way as postfix notation: both can be produced with the E.W.DIJKSTRA method.
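Executing the quadruples of Example 8.4 makes the jump semantics concrete. In this Python sketch, each quadruple is a tuple (op, y, z, x), integers are constants, strings are variables or temporaries, None marks an unused field, and a jump past the last instruction (to 9 here) ends the program. These encodings and the name run_quads are assumptions for illustration.

```python
import operator

# Operators needed for Example 8.4 (an assumed subset).
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "<": operator.lt}


def run_quads(quads: list[tuple], env: dict) -> dict:
    """Execute four-address code; quads[0] is instruction (1)."""
    def val(arg):
        # Integers are constants; strings are variables or temporaries.
        return arg if isinstance(arg, int) else env[arg]

    pc = 1
    while 1 <= pc <= len(quads):
        op, y, z, x = quads[pc - 1]
        pc += 1
        if op == ":=":
            env[x] = val(y)
        elif op == "BR":
            pc = y                       # unconditional jump to instruction y
        elif op == "BZ":
            if not val(x):               # x is not true: jump to instruction y
                pc = y
        else:
            env[x] = ARITH[op](val(y), val(z))
    return env


EXAMPLE_8_4 = [
    (":=", 0, None, "S"),     # (1)
    ("<", "i", 10, "T1"),     # (2)
    ("BZ", 7, None, "T1"),    # (3)
    ("+", "S", "i", "T2"),    # (4)
    (":=", "T2", None, "S"),  # (5)
    ("BR", 9, None, None),    # (6)
    ("+", "i", 1, "T3"),      # (7)
    (":=", "T3", None, "i"),  # (8)
]

for start in (3, 12):
    env = run_quads(EXAMPLE_8_4, {"i": start})
    print(start, "->", {"S": env["S"], "i": env["i"]})
# 3 -> {'S': 3, 'i': 3}
# 12 -> {'S': 0, 'i': 13}
```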

8.3 Three-Address Code

Three-address code and four-address code differ in the memory they occupy. When target code is produced, all data are assigned run-time memory, and the memory location of each datum is recorded in the symbol table. Compared with three-address code, four-address code uses an extra field to store the result. To use a result of four-address code, we simply look at its fourth field; three-address code has no result field, so a later instruction must refer to a result through a reference to the instruction that computes it. This makes three-address code harder to use in an optimizing compiler, because moving or deleting an instruction changes the references to its result.
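The difference is easiest to see by converting the four-address code of Example 8.5. The Python sketch below assumes that three-address code here means triples (op, y, z) whose results are referred to by the number of the triple that computes them, written (k); it drops the result field and rewrites each later use of a temporary. If an optimizer moved triple (2), every (2) elsewhere would have to be renumbered, whereas a quadruple carries its result name T2 with it.

```python
def quads_to_triples(quads):
    """Turn (op, y, z, x) quadruples into (op, y, z) triples.
    Each use of a temporary x is replaced by "(k)", the number of the
    triple that computes it. Assumes every temporary is assigned once."""
    position = {}                          # temporary -> number of its triple

    def ref(arg):
        return f"({position[arg]})" if arg in position else arg

    triples = []
    for k, (op, y, z, x) in enumerate(quads, start=1):
        triples.append((op, ref(y), ref(z)))
        position[x] = k
    return triples


# Four-address code of Example 8.5; "" marks the unused field.
EXAMPLE_8_5 = [("-", "b", "", "T1"), ("*", "T1", "c", "T2"),
               ("+", "T2", "d", "T3"), ("*", "T3", "e", "T4"),
               ("+", "a", "T4", "T5")]

for k, triple in enumerate(quads_to_triples(EXAMPLE_8_5), start=1):
    print(f"({k})", triple)
# (1) ('-', 'b', '')
# (2) ('*', '(1)', 'c')
# (3) ('+', '(2)', 'd')
# (4) ('*', '(3)', 'e')
# (5) ('+', 'a', '(4)')
```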

8.5 Portable Code

Portable code is a kind of intermediate code that can be implemented in many programming languages. In this section, we explain portable code produced by PASCAL subprograms.

The portable code is handled by two procedures: PROCEDURE BLOCK, which forms the intermediate code, and PROCEDURE GEN, which generates each intermediate code instruction and stores it in CODE. The stored code is then executed by PROCEDURE INTERPRET.

We will introduce PROCEDURE GEN in detail, using the PASCAL source program shown below.

PROGRAM main;
PROCEDURE 1;
PROCEDURE 2;
BEGIN
    READ(i);
    WHILE i>1 DO
    BEGIN
        IF i>10 THEN CALL 1
        ELSE BEGIN CALL 2 END;
    END;
END

The portable code of the source program is as follows.

From the portable code above, we can see that each instruction has three parts:

the first is the operation code, such as INT, STO, OPR, LOD, JPC and CAL;

the second is the level value, which is always 0 in this example;

the third is a value A, such as a relative address, a number of units, a procedure entry address, the value of a constant or the number of a special operator.
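A minimal sketch of what PROCEDURE GEN does, written in Python rather than PASCAL: it appends one instruction with the three parts just described to the array CODE and returns the address where it was stored. The names Instr, CODE and gen, and the relative address 3 used for the variable i, are assumptions for illustration. The example emits code for i:=i+1 using LOD, LIT, OPR 2 ("+") and STO, whose meanings are listed in the text that follows.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    f: str   # operation code, e.g. "LIT", "LOD", "OPR"
    l: int   # level value (0 throughout the example above)
    a: int   # relative address, number of units, entry address, constant or operator number


CODE: list[Instr] = []   # the generated portable code


def gen(f: str, l: int, a: int) -> int:
    """Append one instruction to CODE and return the address it was stored at."""
    CODE.append(Instr(f, l, a))
    return len(CODE) - 1


# Hypothetical: code for i := i + 1, assuming i has relative address 3.
gen("LOD", 0, 3)   # push the value of i
gen("LIT", 0, 1)   # push the constant 1
gen("OPR", 0, 2)   # "+"
gen("STO", 0, 3)   # pop the sum into i
for address, instr in enumerate(CODE):
    print(address, instr.f, instr.l, instr.a)
```

Returning the address lets a caller remember where a JMP or JPC was emitted, so its target can be filled in once the target address is known.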

INT allocates data space on the stack. A is the number of stack units for the procedure, for example, 5 in line 11.

CAL calls a procedure. A is the address of the procedure.

LIT pushes a constant onto the top of the stack. A is the value of the constant.

LOD pushes a variable onto the top of the stack. A is the relative address of the variable.

STO pops the top of the stack into a variable. A is the relative address of the variable.

JMP jumps to address A unconditionally. JPC jumps to address A if the value on the top of the stack is false; otherwise execution moves on to the next instruction.

OPR performs an operation. When A=2, it calculates “+”. When A=12, it calculates “>”. When A=16, it is the “read” operator, which reads a value and pushes it onto the top of the stack. When A=0, it returns from the procedure by fetching the return address.
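To make these meanings concrete, here is a small interpreter sketch in Python in the spirit of PROCEDURE INTERPRET. It assumes the conventions of Wirth's PL/0 machine, whose instruction set matches the one listed above: each activation frame begins with a static link, a dynamic link and a return address; the level value l counts static links to follow; and returning to address 0 stops the program. Only OPR 0, 2, 12 and 16 are supported. The test program is hypothetical and, unlike the example above, uses level value 1 so that procedure P can reach a variable of the main program.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    f: str  # operation code
    l: int  # level value
    a: int  # address, constant or operator number


def interpret(code: list[Instr], inputs: list[int]) -> list[int]:
    """Run portable code and return the run-time stack s.
    Frame cells 0..2 hold static link, dynamic link and return address,
    so variables start at relative address 3 (PL/0 convention)."""
    s = [0] * 500              # s[0] unused; the main frame starts at s[1]
    pending = iter(inputs)
    p, b, t = 0, 1, 0          # next instruction, frame base, stack top

    def base(l: int) -> int:
        b1 = b                 # follow l static links outward
        for _ in range(l):
            b1 = s[b1]
        return b1

    while True:
        f, l, a = code[p]
        p += 1
        if f == "LIT":
            t += 1
            s[t] = a
        elif f == "LOD":
            t += 1
            s[t] = s[base(l) + a]
        elif f == "STO":
            s[base(l) + a] = s[t]
            t -= 1
        elif f == "CAL":
            s[t + 1], s[t + 2], s[t + 3] = base(l), b, p
            b, p = t + 1, a
        elif f == "INT":
            t += a
        elif f == "JMP":
            p = a
        elif f == "JPC":
            if s[t] == 0:      # top of stack is false: jump
                p = a
            t -= 1
        elif f == "OPR" and a == 0:      # return to the caller
            t = b - 1
            p, b = s[t + 3], s[t + 2]
        elif f == "OPR" and a == 2:      # "+"
            t -= 1
            s[t] = s[t] + s[t + 1]
        elif f == "OPR" and a == 12:     # ">"
            t -= 1
            s[t] = int(s[t] > s[t + 1])
        elif f == "OPR" and a == 16:     # read a value onto the top of stack
            t += 1
            s[t] = next(pending)
        else:
            raise ValueError(f"{f} {l} {a} is not supported by this sketch")
        if p == 0:             # returning to address 0 ends the program
            return s


# Hypothetical program: read i; if i>10 then begin s:=i; call P end,
# where procedure P doubles s. i and s have relative addresses 3 and 4.
PROGRAM = [
    Instr("JMP", 0, 7),    # 0: skip over the body of P
    Instr("INT", 0, 3),    # 1: P: space for the three link cells
    Instr("LOD", 1, 4),    # 2: push s from the enclosing frame
    Instr("LOD", 1, 4),    # 3
    Instr("OPR", 0, 2),    # 4: s + s
    Instr("STO", 1, 4),    # 5: s := s + s
    Instr("OPR", 0, 0),    # 6: return from P
    Instr("INT", 0, 5),    # 7: main: link cells, i, s
    Instr("OPR", 0, 16),   # 8: read
    Instr("STO", 0, 3),    # 9: i := value read
    Instr("LOD", 0, 3),    # 10
    Instr("LIT", 0, 10),   # 11
    Instr("OPR", 0, 12),   # 12: i > 10
    Instr("JPC", 0, 17),   # 13: false: skip to the end
    Instr("LOD", 0, 3),    # 14
    Instr("STO", 0, 4),    # 15: s := i
    Instr("CAL", 0, 1),    # 16: call P
    Instr("OPR", 0, 0),    # 17: end of main
]

stack = interpret(PROGRAM, [12])
print("i =", stack[1 + 3], "s =", stack[1 + 4])   # i = 12 s = 24
```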

8.6 Assembly code

Assembly code is a kind of intermediate code. Compared with three-address code and four-address code, it has the following advantages:

1 It is easier to translate into machine code, because assembly instructions map one-to-one onto machine instructions.

2 Jump addresses need not be calculated, because symbols are often used to represent addresses.

3 It can represent data of all kinds of byte sizes, and the data need not be converted.

Example 8.5 The expression a+(-b*c+d)*e might be translated into the following assembly code:

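Mov ax,b     ; ax := b
Neg ax       ; ax := -b
Mov bx,c
Imul bx      ; ax := -b*c (signed; high half of the product in dx)
Mov bx,d
Add ax,bx    ; ax := -b*c+d
Mov bx,e
Imul bx      ; ax := (-b*c+d)*e
Mov bx,a
Add ax,bx    ; ax := a+(-b*c+d)*e
Mov t,ax     ; store the result in the temporary t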

The assembly code above can be explained instruction by instruction. Mov ax,b copies the value of b into register ax. Neg ax negates the value in ax. Imul bx multiplies the value in ax by the value in bx and stores the result in ax (the high half of the product goes to dx). Add ax,bx adds the value in bx to the value in ax and stores the result in ax. t is a temporary variable.