CHAPTER 5 Compiler

5.1 Basic Compiler Concepts

Source program
  -> Lexical analysis -> Token
  -> Syntax analysis -> Parse tree
  -> Intermediate code generation -> Intermediate code
  -> Code optimization -> Intermediate code
  -> Code generation -> Machine code

Table management and error handling interact with all of these phases.

Figure: Functions performed by a compiler

1. Lexical Analysis (Lexical Analyzer or Scanner)

Read the source program one character at a time, carving the source program into a sequence of atomic units called tokens.

Token: (token type, token value)


    PROGRAM MAIN;
    VARIABLE INTEGER: U, V, M;
    U = 5;
    V = 7;
    CALL S1(U, V, M);
    ENP;
    SUBROUTINE S1(INTEGER: X, Y, M);
    M = X + Y + 2.7;
    ENS;

A program written in the FRANCIS language


PROGRAM MAIN ;
    (2,21) (5,3) (1,1)

VARIABLE INTEGER : U , V , M ;
    (2,25) (2,14) (1,12) (5,1) (1,11) (5,5) (1,11) (5,6) (1,1)

U = 5 ;
    (5,1) (1,4) (3,1) (1,1)

V = 7 ;
    (5,5) (1,4) (3,2) (1,1)

CALL S1 ( U , V , M ) ;
    (2,3) (5,10) (1,2) (5,1) (1,11) (5,5) (1,11) (5,6) (1,3) (1,1)

ENP ;
    (2,6) (1,1)

SUBROUTINE S1 ( INTEGER : X , Y , M ) ;
    (2,23) (5,10) (1,2) (2,14) (1,12) (5,8) (1,11) (5,4) (1,11) (5,9) (1,3) (1,1)

M = X + Y + 2.7 ;
    (5,9) (1,4) (5,8) (1,5) (5,4) (1,5) (4,1) (1,1)

ENS ;
    (2,7) (1,1)

The FRANCIS program converted to token format


2. Syntax Analysis (Syntax Analyzer or Parser)

The grammar specifies the form, or syntax, of legal statements in the language.

<id-list> ::= id | <id-list>,id

<assign> ::= id:=<exp>

<exp> ::= <term> | <exp>+<term> | <exp>-<term>

<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>

<factor> ::= id | int | (<exp>)

<read> ::= READ(<id-list>)

<write> ::= WRITE(<id-list>)

Partial grammar of the Pascal language






Parse Tree

<read>
  READ
  (
  <id-list>
    id {VALUE}
  )

Parse tree for the statement READ(VALUE)

<assign>
  id {VARIANCE}
  :=
  <exp>
    <exp>
      <term>
        <term>
          <factor>
            id {SUMSQ}
        DIV
        <factor>
          int {100}
    -
    <term>
      <term>
        <factor>
          id {MEAN}
      *
      <factor>
        id {MEAN}

Parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN


Syntax Error

<term>
  <factor>
    id {A}
+
/
<factor>
  id {B}

Partial parse tree for the statement A + / B: no rule of the grammar allows + to be followed directly by /, so the tree cannot be completed.


3. Intermediate Code Generation

Three Address Code

(operator, operand1, operand2, result)

A = B + C       (+, B, C, A)

SUM := A/B*C can be decomposed into

T1 = A/B        (/, A, B, T1)
T2 = T1*C       (*, T1, C, T2)
SUM = T2        (=, T2, , SUM)

<assign>
  id {SUM}
  :=
  <exp>
    <term>
      <term>
        <term>
          <factor>
            id {A}
        DIV
        <factor>
          id {B}
      *
      <factor>
        id {C}

Parse tree for the statement SUM := A/B*C (division is written DIV in the grammar)


4. Code Optimization

Improve the intermediate code (or machine code) so that the ultimate object program runs faster and/or takes less space.

Before optimization:

    FOR I := 1 TO 10 DO
    begin
      A := 10;
      B[I+1] := C[I+1] + A;
    end

After optimization:

    A := 10;
    FOR I := 1 TO 10 DO
    begin
      J := I + 1;
      B[J] := C[J] + A;
    end


5. Code Generation

* Allocate memory locations
* Select machine code for each intermediate code
* Register allocation: utilize registers as efficiently as possible

From (+, B, C, A) we obtain

    MOV AX,B
    ADD AX,C
    MOV A,AX


SUM := A/B*C

(/, A, B, T1)       MOV AX,A
                    DIV B
                    MOV T1,AX

(*, T1, C, T2)      MOV AX,T1
                    MUL C
                    MOV T2,AX

(=, T2, , SUM)      MOV AX,T2
                    MOV SUM,AX
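The mapping from quadruples to instructions above can be sketched as a small template table. This is only an illustration: the names TEMPLATES and gen_code are assumptions introduced here, and the templates simply mirror the instruction sequences shown in the text.

```python
# Sketch: translate quadruples into the 8086-style instructions shown above.
# TEMPLATES and gen_code are illustrative names, not part of the original slides.
TEMPLATES = {
    "+": ["MOV AX,{a}", "ADD AX,{b}", "MOV {r},AX"],
    "-": ["MOV AX,{a}", "SUB AX,{b}", "MOV {r},AX"],
    "*": ["MOV AX,{a}", "MUL {b}", "MOV {r},AX"],
    "/": ["MOV AX,{a}", "DIV {b}", "MOV {r},AX"],
    "=": ["MOV AX,{a}", "MOV {r},AX"],
}


def gen_code(quads):
    """Expand each (operator, operand1, operand2, result) into instructions."""
    code = []
    for op, a, b, r in quads:
        for template in TEMPLATES[op]:
            code.append(template.format(a=a, b=b, r=r))
    return code


quads = [("/", "A", "B", "T1"), ("*", "T1", "C", "T2"), ("=", "T2", "", "SUM")]
for line in gen_code(quads):
    print(line)
```

Each quadruple is translated on its own, which is why the output still contains loads of values that AX already holds.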


Code optimization is then performed once more on the generated code; for example, MOV AX,T1 immediately after MOV T1,AX reloads a value that AX already holds.
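One simple way to optimize the generated code again is a peephole pass. The sketch below assumes the instruction strings are formatted exactly as in the listing above; the function name peephole is hypothetical. It removes only a reload that directly follows a store of the same name, and leaves the stores to T1 and T2 in place, since deleting them would require knowing that the temporaries are never used later.

```python
def peephole(code):
    """Drop 'MOV AX,X' when it directly follows 'MOV X,AX' (AX already holds X)."""
    out = []
    for line in code:
        if out and line.startswith("MOV AX,"):
            name = line[len("MOV AX,"):]
            if out[-1] == f"MOV {name},AX":
                continue  # redundant reload, skip it
        out.append(line)
    return out


code = [
    "MOV AX,A", "DIV B", "MOV T1,AX",
    "MOV AX,T1", "MUL C", "MOV T2,AX",
    "MOV AX,T2", "MOV SUM,AX",
]
optimized = peephole(code)
for line in optimized:
    print(line)
```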


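6. Table Management and Error Handling

Token, symbol table, reserved word table, delimiter table, constant table, etc.

* If each of the five major functions is carried out in its own pass, the compiler makes five passes.
* Several functions can also be combined into a single pass.
* At least two passes are needed.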

5.2 Grammar

1. Grammar

Backus-Naur Form (BNF)

A grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.

Each rule is written with non-terminal symbols (such as <exp>) and terminal symbols (tokens such as id or READ).

2. Parse Tree (Syntax Tree)

It is often convenient to display the analysis of a source statement in terms of a grammar as a tree.

<read>
  READ
  (
  <id-list>
    id
  )


3. Precedence and Associativity

Precedence: * and / bind more tightly than + and -.

Associativity:

* Left associativity: a + b + c is grouped as ((a + b) + c)
* Right associativity: a + b + c is grouped as (a + (b + c))

4. Ambiguous Grammar

A grammar is ambiguous if there is more than one possible parse tree for a given statement.

<start> ::= <term>

<term> ::= id | <term>+<term> | <term>-<term>

Under this grammar, id + id - id has two parse trees.

Parse tree 1, grouping (id + id) - id:

<start>
  <term>
    <term>
      <term>
        id
      +
      <term>
        id
    -
    <term>
      id

Parse tree 2, grouping id + (id - id):

<start>
  <term>
    <term>
      id
    +
    <term>
      <term>
        id
      -
      <term>
        id
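The ambiguity can be checked mechanically by enumerating every parse tree that the rule <term> ::= id | <term>+<term> | <term>-<term> allows for a token sequence. The sketch below is a brute-force enumeration written for illustration only; the function name parse_trees and the tuple representation of trees are assumptions, not part of the original material.

```python
from functools import lru_cache


def parse_trees(tokens):
    """List every parse tree of tokens under
    <term> ::= id | <term>+<term> | <term>-<term>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All trees that derive exactly tokens[lo:hi].
        if hi - lo == 1:
            return ("id",) if tokens[lo] == "id" else ()
        found = []
        for k in range(lo + 1, hi - 1):  # position of the top-level operator
            if tokens[k] in ("+", "-"):
                for left in trees(lo, k):
                    for right in trees(k + 1, hi):
                        found.append((left, tokens[k], right))
        return tuple(found)

    return list(trees(0, len(tokens)))


for tree in parse_trees("id + id - id".split()):
    print(tree)
```

More than one tree in the result means the statement is ambiguous under this grammar.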

5.3 Lexical Analysis

A program contains the following kinds of tokens:

a. Identifier

b. Delimiter

c. Reserved word

d. Constant: integer, float, string

1. Identifier

<ident> ::= <letter> | <ident> <letter> | <ident> <digit>

<letter> ::= A | B | C | ...

<digit> ::= 0 | 1 | 2 | ...

An identifier is a multiple-character token.

2. Token and Tables

Table 1 (Delimiter Table)

     1  ;
     2  (
     3  )
     4  =
     5  +
     6  -
     7  *
     8  /
     9
    10  '
    11  ,
    12  :

Table 2 (Reserved Word Table)

     1. AND         2. BOOLEAN     3. CALL        4. DIMENSION   5. ELSE
     6. ENP         7. ENS         8. EQ          9. GE         10. GT
    11. GTO        12. IF         13. INPUT      14. INTEGER    15. LABEL
    16. LE         17. LT         18. NE         19. OR         20. OUTPUT
    21. PROGRAM    22. REAL       23. SUBROUTINE 24. THEN       25. VARIABLE

Table 3 (Integer Table)

     1  5
     2  7

Table 4 (Real Number Table)

     1  2.7

Table 5 (Identifier Table)

    Entry  Identifier  Subroutine  Type  Pointer
     1     U           3
     2
     3     MAIN
     4     Y           10
     5     V           3
     6     M           3
     7
     8     X           10
     9     M           10
    10     S1


Token Specifier: (Token Type, Token Value), where the token type is the table number and the token value is the entry in that table.

For example, the statement U = 5 ; becomes (5,1) (1,4) (3,1) (1,1): U is entry 1 of Table 5, = is entry 4 of Table 1, 5 is entry 1 of Table 3, and ; is entry 1 of Table 1.
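A scanner that produces (table, entry) pairs can be sketched as follows. This is a minimal illustration under several assumptions: new identifiers and constants are appended to their tables in order of first appearance (the identifier entries in Table 5 appear to come from a hash function, which is not reproduced here), subroutine scope is ignored, and the names tokenize and entry are hypothetical. Entries 9 and 10 of Table 1 are left out.

```python
import re

# Tables 1 and 2 from the text (entries 9 and 10 of Table 1 are omitted).
DELIMITERS = {";": 1, "(": 2, ")": 3, "=": 4, "+": 5, "-": 6,
              "*": 7, "/": 8, ",": 11, ":": 12}
RESERVED = ["AND", "BOOLEAN", "CALL", "DIMENSION", "ELSE", "ENP", "ENS",
            "EQ", "GE", "GT", "GTO", "IF", "INPUT", "INTEGER", "LABEL",
            "LE", "LT", "NE", "OR", "OUTPUT", "PROGRAM", "REAL",
            "SUBROUTINE", "THEN", "VARIABLE"]

# real constant | integer constant | identifier or reserved word | delimiter
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z][A-Za-z0-9]*|[;()=+\-*/,:])")


def entry(table, text):
    """Return the 1-based entry of text in table, appending it if it is new."""
    if text not in table:
        table.append(text)
    return table.index(text) + 1


def tokenize(source, integers, reals, identifiers):
    """Convert source text into a list of (token type, token value) pairs."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        text, pos = match.group(1), match.end()
        if text in DELIMITERS:
            tokens.append((1, DELIMITERS[text]))
        elif text in RESERVED:
            tokens.append((2, RESERVED.index(text) + 1))
        elif "." in text:
            tokens.append((4, entry(reals, text)))
        elif text[0].isdigit():
            tokens.append((3, entry(integers, text)))
        else:
            tokens.append((5, entry(identifiers, text)))
    return tokens


integers, reals, identifiers = [], [], []
print(tokenize("U = 5 ;", integers, reals, identifiers))
```

Because identifiers are numbered sequentially here, a full program would get different Table 5 entries than the ones listed in the text; the delimiter and reserved-word tokens are unaffected.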

5.4 Syntax Analysis

1. Building the Parse Tree

a. Top-down method

Begin with the starting rule of the grammar and attempt to construct the tree so that the terminal nodes match the statement being analyzed.

b. Bottom-up method

Begin with the terminal nodes of the tree and attempt to combine these into successively higher-level nodes until the root is reached.

[Figure: top-down construction of the parse tree for id + id - id, growing from <start> down to the terminal nodes]

[Figure: bottom-up construction of the parse tree for id + id - id, combining the terminal nodes into <term> nodes]
2. Operator Precedence Parser

A bottom-up parser.

Precedence Matrix

Columns, in order: READ ; := + - ( ) id. Each row lists its non-blank entries from left to right.

    READ   =
    ;      <  >  <
    :=     >  <  <  <  <
    +      >  >  >  <  >  <
    -      >  >  >  <  >  <
    (      <  <  <  =  <
    )      >  >  >
    id     >  >  >  >

Parsing READ(id); (stack and input at each step):

    < READ ( id ) ;
    <READ ( id )
    <READ = ( id )
    <READ = ( <id )
    <READ = ( <id> )
    <READ = ( = id-list )
    <READ = ( = id-list ) >
    read

Parsing id + id - id (stack and input at each step):

    < id + id - id
    <id + id - id
    <id> + id - id
    <term + id - id
    <term + < id > - id
    <term + term > - id
    <term - < id
    <term - <id>
    <term - term>
    term

Generally, a stack is used to save tokens that have been scanned but not yet parsed.

<start> ::= <term>

<term> ::= id | <term>+<term> | <term>-<term>
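The stack discipline of an operator precedence parser can be sketched for the statement id + id - id. The relation table below is an assumption made for this example: it covers only id, + and -, and uses a hypothetical end marker $ in place of the statement boundary. The sketch stops once the input has been reduced to a single <term>.

```python
TERMINALS = {"$", "id", "+", "-"}

# Assumed precedence relations between terminals ("$" marks both ends).
PREC = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "-"): "<",
    ("+", "id"): "<", ("-", "id"): "<",
    ("id", "+"): ">", ("id", "-"): ">", ("id", "$"): ">",
    ("+", "+"): ">", ("+", "-"): ">", ("+", "$"): ">",
    ("-", "+"): ">", ("-", "-"): ">", ("-", "$"): ">",
}

# Right-hand sides that may be reduced to <term>.
RULES = {("id",), ("<term>", "+", "<term>"), ("<term>", "-", "<term>")}


def top_terminal(stack):
    """Return the terminal nearest the top of the stack."""
    return next(sym for sym in reversed(stack) if sym in TERMINALS)


def parse(tokens):
    """Return the list of reductions performed, in order."""
    stack, reductions = ["$"], []
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        top, look = top_terminal(stack), tokens[i]
        if top == "$" and look == "$":
            break
        relation = PREC.get((top, look))
        if relation is None:
            raise ValueError(f"syntax error near {look!r}")
        if relation == "<":  # shift the token onto the stack
            stack.append(look)
            i += 1
            continue
        # relation is ">": pop the handle and reduce it to <term>
        handle = []
        while True:
            sym = stack.pop()
            handle.insert(0, sym)
            if sym in TERMINALS and PREC.get((top_terminal(stack), sym)) == "<":
                break
        if stack[-1] not in TERMINALS:  # a nonterminal on the left belongs to the handle
            handle.insert(0, stack.pop())
        if tuple(handle) not in RULES:
            raise ValueError(f"syntax error: cannot reduce {handle}")
        reductions.append(tuple(handle))
        stack.append("<term>")
    if stack != ["$", "<term>"]:
        raise ValueError("incomplete statement")
    return reductions


for step in parse("id + id - id".split()):
    print(step)
```

The order of reductions follows the trace in the text: term + term is reduced before the - is shifted, so the result groups as (id + id) - id.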


3. Recursive Descent Parser

A top-down method.

a. Leftmost derivation

It must be possible to decide which alternative to use by examining the next input token.

<stmt> ::= <assign> | <read> | <write>

For <stmt>, the next token is id, READ, or WRITE, which selects <assign>, <read>, or <write> respectively.







b. Left recursion

A top-down parser cannot be used with a grammar that contains left recursion, because the parser is unable to decide between the alternatives by examining the next token. For example, in <id-list> ::= id | <id-list>,id both id and <id-list> can begin with id.








Grammar modified for a recursive descent parser ({ } encloses a part that may repeat zero or more times):

<id-list> ::= id {, id}

<exp> ::= <term> { +<term> | -<term> }

<term> ::= <factor> { *<factor> | DIV<factor> }
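With the left recursion replaced by repetition, each rule maps onto one procedure. The following is a minimal sketch for <exp>, <term> and <factor> only; the function names, the list-of-strings token format and the tuple-based tree are assumptions made for illustration.

```python
def parse_exp(tokens, pos):
    """<exp> ::= <term> { +<term> | -<term> }"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)  # the loop builds a left-associative tree
    return node, pos


def parse_term(tokens, pos):
    """<term> ::= <factor> { *<factor> | DIV<factor> }"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "DIV"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    """<factor> ::= id | int | (<exp>)"""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_exp(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("expected )")
        return node, pos + 1
    if tok.isdigit() or (tok.isidentifier() and tok != "DIV"):
        return tok, pos + 1
    raise ValueError(f"unexpected token {tok!r}")


def parse(tokens):
    """Parse a complete expression and return its tree."""
    node, pos = parse_exp(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return node


print(parse("SUMSQ DIV 100 - MEAN * MEAN".split()))
```

Every procedure chooses what to do by looking only at the next token, which is the property required in item a.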




5.5 Code Generation

When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding routine is executed. These routines are called semantic routines or code generation routines.

1. Operator precedence parser: the routine is executed when a substring is reduced to a nonterminal.

2. Recursive descent parser: the routine is executed when a procedure returns to its caller, indicating success.

<start> ::= <term>

Statement: id + id - id

<term> ::= <term>1 + <term>2        MOV AX, <term>1
                                    ADD AX, <term>2
                                    MOV <term>, AX

<term> ::= <term>1 - <term>2        MOV AX, <term>1
                                    SUB AX, <term>2
                                    MOV <term>, AX

<term> ::= id                       add id to <term>

Generating assembly instructions or machine code directly works at too fine a level of detail, so the program is first translated into an intermediate form.

5.6 Intermediate Form

Three Address Code (Quadruple Form)

(operator, operand1, operand2, result)

<term> ::= <term>1 + <term>2        (+, <term>1, <term>2, <term>)

<term> ::= <term>1 - <term>2        (-, <term>1, <term>2, <term>)

<term> ::= id                       add id to <term>

variance := sumsq DIV 100 - mean * mean

(DIV, sumsq, #100, i1)

(*, mean, mean, i2)

(-, i1, i2, i3)

(:=, i3, , variance)
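The quadruples above can be produced by walking the expression tree from the bottom up and allocating a new temporary for each operator. The sketch below assumes the tree is given as nested (operator, left, right) tuples; the function name gen_quads and that tree format are illustrative assumptions.

```python
def gen_quads(target, exp):
    """Return the quadruples for the assignment target := exp."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):
            # Leaf: integer constants are written with a leading '#'.
            return f"#{node}" if node.isdigit() else node
        op, left, right = node
        a = walk(left)
        b = walk(right)
        counter += 1
        temp = f"i{counter}"  # new intermediate result
        quads.append((op, a, b, temp))
        return temp

    result = walk(exp)
    quads.append((":=", result, "", target))
    return quads


tree = ("-", ("DIV", "sumsq", "100"), ("*", "mean", "mean"))
for quad in gen_quads("variance", tree):
    print(quad)
```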

5.7 Machine Independent Compiler Features

1. Storage Allocation

* Static allocation: storage is allocated at compile time.

* Dynamic allocation: storage is allocated at run time.

    Auto: function call (STACK)

    Controlled: malloc( ), free( ) (HEAP)

2. Activation Record

Each function call creates an activation record that contains storage for all the variables used by the function, the return address, etc.

An activation record on the stack holds: Variables, Return Address, Next, Previous.

[Figure: stack after MAIN starts: one activation record for MAIN; its return address leads back to the OS]

[Figure: stack after MAIN calls SUB: a record for SUB is pushed on top of MAIN's record; its return address leads back to MAIN]

[Figure: stack after SUB calls SUB recursively: a second record for SUB is pushed; its return address leads back to SUB]


3. Prologue and Epilogue

The compiler must generate additional code to manage the activation records themselves.

a. Prologue

The code to create a new activation record

b. Epilogue

The code to delete the current activation record
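The effect of the prologue and epilogue on the stack of activation records can be modeled in a few lines. This is a conceptual sketch only: real prologue and epilogue code consists of machine instructions emitted by the compiler, and the class and method names here are assumptions.

```python
class ActivationRecord:
    """One record: variables, return address, and links to its neighbours."""

    def __init__(self, name, return_address, previous):
        self.name = name
        self.variables = {}
        self.return_address = return_address
        self.previous = previous  # record of the caller
        self.next = None          # record of the callee, if any


class RuntimeStack:
    def __init__(self):
        self.top = None

    def prologue(self, name, return_address):
        """Create a new activation record and make it the current one."""
        record = ActivationRecord(name, return_address, self.top)
        if self.top is not None:
            self.top.next = record
        self.top = record
        return record

    def epilogue(self):
        """Delete the current activation record and return its return address."""
        record = self.top
        self.top = record.previous
        if self.top is not None:
            self.top.next = None
        return record.return_address

    def frames(self):
        """Names of the active records, oldest first."""
        names, record = [], self.top
        while record is not None:
            names.append(record.name)
            record = record.previous
        return names[::-1]


stack = RuntimeStack()
stack.prologue("MAIN", "OS")   # MAIN is started by the OS
stack.prologue("SUB", "MAIN")  # MAIN calls SUB
stack.prologue("SUB", "SUB")   # SUB calls itself
print(stack.frames())
```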


4. Structure Variables

Array, Record, String, Set, ...

B: array[0..3, 0..1] of integer

    B[0][0]  B[0][1]
    B[1][0]  B[1][1]
    B[2][0]  B[2][1]
    B[3][0]  B[3][1]

Row-major order:

    B[0][0] B[0][1] B[1][0] B[1][1] B[2][0] B[2][1] B[3][0] B[3][1]

Column-major order:

    B[0][0] B[1][0] B[2][0] B[3][0] B[0][1] B[1][1] B[2][1] B[3][1]

For an array declared as Type B[a..b][c..d], the address of B[s][t] is

Row major:

    [(s - a) * (d - c + 1) + (t - c)] * sizeof(Type) + Base address

Column major:

    [(t - c) * (b - a + 1) + (s - a)] * sizeof(Type) + Base address
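The two address formulas can be checked against the array B[0..3][0..1] with a short sketch. The function names and the base address and element size used in the example are assumptions chosen for illustration.

```python
def row_major_address(base, size, a, b, c, d, s, t):
    """Address of B[s][t] for Type B[a..b][c..d] stored row by row."""
    return base + ((s - a) * (d - c + 1) + (t - c)) * size


def column_major_address(base, size, a, b, c, d, s, t):
    """Address of B[s][t] for Type B[a..b][c..d] stored column by column."""
    return base + ((t - c) * (b - a + 1) + (s - a)) * size


# B: array[0..3, 0..1]; assume base address 1000 and 4-byte elements.
print(row_major_address(1000, 4, 0, 3, 0, 1, 2, 1))
print(column_major_address(1000, 4, 0, 3, 0, 1, 2, 1))
```

B[2][1] is the sixth element in row-major order and the seventh in column-major order, so the two offsets are 5 and 6 elements from the base.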


5. Code Optimization

Before optimization:

    For I := 1 to 10
    Begin
      x[I, 2*J-1] := T[I, 2*J];
      Table[I] := 2**I;
    END

After optimization:

    T1 := 2*J;
    T2 := T1 - 1;
    K := 1;
    For I := 1 to 10
    Begin
      x[I, T2] := T[I, T1];
      K := K * 2;
      Table[I] := K;
    END

a. Common sub-expression

b. Loop invariants

c. Reduction in strength
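An optimization is only valid if the transformed code computes the same results. The sketch below transcribes both versions of the loop into Python and compares their outputs; the dictionaries standing in for the arrays x, T and Table, and the sample values of J and T, are assumptions made for the test.

```python
def original(J, T):
    x, table = {}, {}
    for I in range(1, 11):
        x[(I, 2 * J - 1)] = T[(I, 2 * J)]  # 2*J is evaluated twice per iteration
        table[I] = 2 ** I                  # exponentiation on every iteration
    return x, table


def optimized(J, T):
    x, table = {}, {}
    t1 = 2 * J   # common sub-expression, computed once outside the loop
    t2 = t1 - 1  # loop invariant, moved out of the loop
    k = 1
    for I in range(1, 11):
        x[(I, t2)] = T[(I, t1)]
        k = k * 2  # reduction in strength: a multiplication replaces 2**I
        table[I] = k
    return x, table


J = 3
T = {(i, 2 * J): 100 * i for i in range(1, 11)}
print(original(J, T) == optimized(J, T))
```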

5.8 Compiler Design Options

1. Interpreter

An interpreter processes a source program written in a high-level language, just as a compiler does. The main difference is that an interpreter executes a version of the source program directly.

An interpreter can be viewed as a set of functions whose execution is driven by the internal form of the program.
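The idea of functions driven by an internal form can be sketched with quadruples as that internal form: a table maps each operator to a function, and the interpreter walks the quadruple list and calls them. The names OPS, value and interpret, and the sample input values, are assumptions; DIV is modeled with floor division, which agrees with Pascal's DIV for non-negative operands.

```python
import operator

# One function per operator; the quadruple list decides which ones run.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "DIV": operator.floordiv,  # matches Pascal DIV for non-negative operands
}


def value(operand, env):
    """Evaluate an operand: '#n' is the integer constant n, otherwise a variable."""
    if operand.startswith("#"):
        return int(operand[1:])
    return env[operand]


def interpret(quads, env):
    """Execute (operator, operand1, operand2, result) quadruples in order."""
    for op, a, b, result in quads:
        if op == ":=":
            env[result] = value(a, env)
        else:
            env[result] = OPS[op](value(a, env), value(b, env))
    return env


quads = [
    ("DIV", "sumsq", "#100", "i1"),
    ("*", "mean", "mean", "i2"),
    ("-", "i1", "i2", "i3"),
    (":=", "i3", "", "variance"),
]
env = interpret(quads, {"sumsq": 5000, "mean": 5})
print(env["variance"])
```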


2. P Code Compiler

* P-code is byte code, a machine-independent language.

* It can run across platforms on different kinds of computers.

Source Program -> Java Interpreter -> Byte Code

Byte Code -> Java Run Module -> Run

3. Compiler-Compiler

A software tool that can be used to help in the task of compiler construction.

Uses finite state automata.

Unix tools:

* YACC: parser generator

* LEX: scanner generator


4. Cross Compiler

Source Program -> Cross Compiler -> 80XX Machine Code (on a workstation)

80XX Machine Code -> Run (on a personal computer)
