
CHAPTER 5 Compiler

5.1 Basic Compiler Concepts

Figure: Functions performed by the compiler

Source program -> Lexical analysis -> Token -> Syntax analysis -> Parse tree -> Intermediate code generation -> Intermediate code -> Code optimization -> Intermediate code -> Code generation -> Machine code

Table management and error handling work alongside these phases.

1. Lexical Analysis (Lexical Analyzer or Scanner)

Read the source program one character at a time, carving the source program into a sequence of atomic units called tokens.

Each token is represented as a pair: (token type, token value)

PROGRAM MAIN;
VARIABLE INTEGER: U, V, M;
U = 5;
V = 7;
CALL S1(U, V, M);
ENP;
SUBROUTINE S1(INTEGER: X, Y, M);
M = X + Y + 2.7;
ENS;

A program written in the FRANCIS language

PROGRAM MAIN ;
(2,21) (5,3) (1,1)

VARIABLE INTEGER : U , V , M ;
(2,25) (2,14) (1,12) (5,1) (1,11) (5,5) (1,11) (5,6) (1,1)

U = 5 ;
(5,1) (1,4) (3,1) (1,1)

V = 7 ;
(5,5) (1,4) (3,2) (1,1)

CALL S1 ( U , V , M ) ;
(2,3) (5,10) (1,2) (5,1) (1,11) (5,5) (1,11) (5,6) (1,3) (1,1)

ENP ;
(2,6) (1,1)

SUBROUTINE S1 ( INTEGER : X , Y , M ) ;
(2,23) (5,10) (1,2) (2,14) (1,12) (5,8) (1,11) (5,4) (1,11) (5,9) (1,3) (1,1)

M = X + Y + 2.7 ;
(5,9) (1,4) (5,8) (1,5) (5,4) (1,5) (4,1) (1,1)

ENS ;
(2,7) (1,1)

The FRANCIS program converted into token format
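The conversion from source text to (token type, token value) pairs can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the FRANCIS scanner itself: the regular expression, the function names `scan` and `index_in`, and the policy of giving new integers, reals, and identifiers sequential table indices are all assumptions made for illustration. The identifier indices in the listing above (for example MAIN at entry 3) are not sequential, so the identifier numbers produced here can differ from the listing.

```python
import re

# Codes copied from the delimiter and reserved-word tables of this chapter
# (only the entries used by the sample program are listed).
DELIMITERS = {";": 1, "(": 2, ")": 3, "=": 4, "+": 5, ",": 11, ":": 12}
RESERVED = {"CALL": 3, "ENP": 6, "ENS": 7, "INTEGER": 14,
            "PROGRAM": 21, "SUBROUTINE": 23, "VARIABLE": 25}

# Assumption: real = digits '.' digits, integer = digits,
# identifier = a letter followed by letters or digits.
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z][A-Za-z0-9]*|\S)")


def index_in(table, value):
    """Return the 1-based index of value in table, appending it if new."""
    if value not in table:
        table.append(value)
    return table.index(value) + 1


def scan(source, integers, reals, identifiers):
    """Convert one source line into a list of (token type, token value)."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in DELIMITERS:
            tokens.append((1, DELIMITERS[lexeme]))
        elif lexeme.upper() in RESERVED:
            tokens.append((2, RESERVED[lexeme.upper()]))
        elif lexeme[0].isdigit() and "." in lexeme:
            tokens.append((4, index_in(reals, lexeme)))
        elif lexeme.isdigit():
            tokens.append((3, index_in(integers, lexeme)))
        elif lexeme[0].isalpha():
            tokens.append((5, index_in(identifiers, lexeme)))
        else:
            raise ValueError(f"unknown symbol: {lexeme!r}")
    return tokens


integers, reals, identifiers = [], [], []
print(scan("U = 5;", integers, reals, identifiers))
# prints [(5, 1), (1, 4), (3, 1), (1, 1)]
```

The token type is simply the number of the table the lexeme belongs to, which is why the scanner needs the tables and nothing else to build a token.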

2. Syntax Analysis (Syntax Analyzer or Parser)

The grammar specifies the form, or syntax, of legal statements in the language.

<id-list> ::= id | <id-list>,id

<assign> ::= id:=<exp>

<exp> ::= <term> | <exp>+<term> | <exp>-<term>

<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>

<factor> ::= id | int | (<exp>)

<read> ::= READ(<id-list>)

<write> ::= WRITE(<id-list>)

Partial grammar of the PASCAL language

Parse Tree

Parse tree for the statement READ(VALUE):

<read>
    READ
    (
    <id-list>
        id {VALUE}
    )

Parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN:

<assign>
    id {VARIANCE}
    :=
    <exp>
        <exp>
            <term>
                <term>
                    <factor>
                        id {SUMSQ}
                DIV
                <factor>
                    int {100}
        -
        <term>
            <term>
                <factor>
                    id {MEAN}
            *
            <factor>
                id {MEAN}

Syntax Error

The statement A + / B has no parse tree. After A is recognized as a <term> and + is read, the grammar requires another <term>, which must begin with id, int, or (. The next token, /, cannot begin a <term>, so the parser reports a syntax error.

3. Intermediate Code Generation

Three Address Code

(operator, operand1, operand2, result)

A = B + C        (+, B, C, A)

SUM := A / B * C can be decomposed into:

T1 = A / B       (/, A, B, T1)
T2 = T1 * C      (*, T1, C, T2)
SUM = T2         (=, T2, , SUM)

Parse tree for the statement SUM := A / B * C:

<assign>
    id {SUM}
    :=
    <exp>
        <term>
            <term>
                <term>
                    <factor>
                        id {A}
                DIV
                <factor>
                    id {B}
            *
            <factor>
                id {C}

4. Code Optimization

Improve the intermediate code (or machine code) so that the final object program runs faster and/or takes less space.

Before optimization:

FOR I := 1 TO 10 DO
begin
    A := 10;
    B[I+1] := C[I+1] + A;
end

After optimization:

A := 10;
FOR I := 1 TO 10 DO
begin
    J := I + 1;
    B[J] := C[J] + A;
end

5. Code Generation

* Allocate memory locations
* Select machine code for each intermediate code
* Register allocation: utilize registers as efficiently as possible

From (+, B, C, A) we obtain:

MOV AX,B
ADD AX,C
MOV A,AX

SUM := A / B * C

(/, A, B, T1)
    MOV AX,A
    DIV B
    MOV T1,AX

(*, T1, C, T2)
    MOV AX,T1
    MUL C
    MOV T2,AX

(=, T2, , SUM)
    MOV AX,T2
    MOV SUM,AX

The generated code can then be optimized once more. For example, reloading AX with MOV AX,T1 immediately after MOV T1,AX is redundant, because AX already holds that value.
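The translation from quadruples to instructions can be sketched as a small Python function. This is an illustrative sketch, not the chapter's code generator: the names `generate` and `remove_redundant_loads` are hypothetical, the instruction templates are copied from the examples above, and the second pass shows only one possible meaning of "optimizing the code once more", namely dropping a reload of AX that directly follows a store of AX to the same location.

```python
# Instruction templates taken from the examples above; SUB is assumed
# to follow the same pattern as ADD.
TEMPLATES = {"+": "ADD AX,{}", "-": "SUB AX,{}", "*": "MUL {}", "/": "DIV {}"}


def generate(quad):
    """Translate one quadruple (op, operand1, operand2, result)."""
    op, a, b, result = quad
    if op == "=":
        return [f"MOV AX,{a}", f"MOV {result},AX"]
    return [f"MOV AX,{a}", TEMPLATES[op].format(b), f"MOV {result},AX"]


def remove_redundant_loads(code):
    """Drop 'MOV AX,X' when the previous instruction was 'MOV X,AX'."""
    out = []
    for instruction in code:
        if (out and instruction.startswith("MOV AX,")
                and out[-1] == f"MOV {instruction[7:]},AX"):
            continue  # AX already holds the value of X
        out.append(instruction)
    return out


quads = [("/", "A", "B", "T1"), ("*", "T1", "C", "T2"), ("=", "T2", "", "SUM")]
code = [line for quad in quads for line in generate(quad)]
for line in remove_redundant_loads(code):
    print(line)
```

For SUM := A / B * C the first pass yields the eight instructions listed above, and the second pass removes the two reloads of AX, leaving six.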

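6. Table Management and Error Handling

Token, symbol table, reserved word table, delimiter table, constant table, etc.

* If each of the five major functions is carried out in its own pass, the compiler makes five passes.
* Several functions can also be combined into a single pass.
* At least two passes are needed.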

5.2 Grammar

1. Grammar

Backus Naur Form

A grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.

<id-list> ::= id | <id-list>,id

<assign> ::= id:=<exp>

<exp> ::= <term> | <exp>+<term> | <exp>-<term>

<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>

<factor> ::= id | int | (<exp>)

<read> ::= READ(<id-list>)

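<write> ::= WRITE(<id-list>)

Partial grammar of the PASCAL language

Names enclosed in angle brackets, such as <exp>, are non-terminal symbols. Tokens such as id, int, READ, and := are terminal symbols.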

2. Parse Tree (Syntax Tree)

It is often convenient to display the analysis of a source statement in terms of a grammar as a tree. The parse tree for the statement READ(VALUE) in Section 5.1 is an example: the root is <read>, and its leaves are READ, (, id, and ).

3. Precedence and Associativity

Precedence: *, / > +, -

Associativity:

Left associativity: a + b + c is grouped as ((a + b) + c)

Right associativity: a + b + c would be grouped as (a + (b + c))

4. Ambiguous Grammar

There is more than one possible parse tree for a given statement.

<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>

With this grammar, the statement id + id - id has two parse trees.

Tree 1, grouping (id + id) - id:

<start>
    <term>
        <term>
            <term>
                id
            +
            <term>
                id
        -
        <term>
            id

Tree 2, grouping id + (id - id):

<start>
    <term>
        <term>
            id
        +
        <term>
            <term>
                id
            -
            <term>
                id
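The ambiguity of the grammar <term> ::= id | <term>+<term> | <term>-<term> can be checked by counting parse trees. The Python sketch below is only an illustration: the function name `count_parse_trees` is hypothetical, and the approach (try every operator as the top-level split point and multiply the counts of the two sides) is a direct reading of the two recursive alternatives, not an algorithm taken from the chapter.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of tokens under the ambiguous <term> grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from <term>.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "-"):
                # <term> ::= <term> op <term>, split at operator k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "-", "id"]))  # prints 2
```

A count greater than 1 for any statement shows that the grammar is ambiguous.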

5.3 Lexical Analysis

A program contains the following classes of tokens:

a. Identifier

b. Delimiter

c. Reserved Word

d. Constant: integer, float, string

1. Identifier

<ident> ::= <letter> | <ident> <letter> | <ident> <digit>

<letter> ::= A | B | C | ...

<digit> ::= 0 | 1 | 2 | ...

An identifier is a multiple-character token.

2. Token and Tables

Table 1 (Delimiter Table)

 1  ;
 2  (
 3  )
 4  =
 5  +
 6  -
 7  *
 8  /
 9
10  ‘
11  ,
12  :

Table 2 (Reserved Word Table)

 1  AND           14  INTEGER
 2  BOOLEAN       15  LABEL
 3  CALL          16  LE
 4  DIMENSION     17  LT
 5  ELSE          18  NE
 6  ENP           19  OR
 7  ENS           20  OUTPUT
 8  EQ            21  PROGRAM
 9  GE            22  REAL
10  GT            23  SUBROUTINE
11  GTO           24  THEN
12  IF            25  VARIABLE
13  INPUT

Table 3 (Integer Table)

1  5
2  7

Table 4 (Real Number Table)

1  2.7

Table 5 (Identifier Table)

Entry  Identifier  Subroutine  Type  Pointer
 1     U           3
 2
 3     MAIN
 4     Y           10
 5     V           3
 6     M           3
 7
 8     X           10
 9     M           10
10     S1

Each token in the token listing of Section 5.1 is a token specifier of the form (token type, token value). The token type is the number of the table (1 to 5), and the token value is the entry within that table. For example, (2,21) is entry 21 of the reserved word table, PROGRAM, and (3,1) is entry 1 of the integer table, 5.

5.4 Syntax Analysis

1. Building the Parse Tree

a. Top down method

Begin with the starting rule of the grammar (the root of the tree), and attempt to construct the tree so that the terminal nodes match the statement being analyzed.

b. Bottom up method

Begin with the terminal nodes of the tree, and attempt to combine these into successively higher-level nodes until the root is reached.

* Top down example: for id + id - id, the parser starts at <start>, expands it to <term>, and keeps expanding <term> nodes downward until the leaves match id + id - id.

* Bottom up example: for id + id - id, the parser first reduces each id to <term>, then combines <term> + <term> into a single <term>, and continues until the root is reached.

2. Operator Precedence Parser

A bottom-up parser.

Precedence Matrix: the rows and columns are the symbols READ, ;, :=, +, -, (, ), and id. Each entry gives the precedence relation (<, =, or >) between the row symbol and the column symbol. For example, READ = ( and ( = ).

Parsing READ(id); with the precedence matrix:

Stack                        Input
<                            READ(id);
<READ                        (id)
<READ = (                    id)
<READ = ( < id               )
<READ = ( < id >             )
<READ = ( = id-list          )
<READ = ( = id-list ) >
read

In the fifth step id is reduced to <id-list>. In the last step READ ( <id-list> ) is reduced to <read>, which completes the parse tree for READ(VALUE).

Parsing id + id - id with the precedence matrix:

Stack                Input
<                    id + id - id
<id                  + id - id
<id>                 + id - id
<term                + id - id
<term + <id>         - id
<term + term>        - id
<term - <id
<term - <id>
<term - term>
term

A stack is generally used to save tokens that have been scanned but not yet parsed.

<start> ::= <term>

<term> ::= id | <term>+<term> | <term>-<term>
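The shift and reduce steps traced for id + id - id can be sketched in Python. This is a simplified illustration under several assumptions: the relation table `PRECEDENCE` covers only id, +, and -, with `$` used as a hypothetical end marker on both the stack and the input; the function name `parse` and the wording of the error messages are also hypothetical. The entries are chosen so that + and - are left associative.

```python
# Precedence relations: (symbol on stack, next input symbol) -> relation.
# Assumption: "$" marks the bottom of the stack and the end of the input.
PRECEDENCE = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "-"): "<",
    ("id", "+"): ">", ("id", "-"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "-"): ">", ("+", "$"): ">",
    ("-", "id"): "<", ("-", "+"): ">", ("-", "-"): ">", ("-", "$"): ">",
}
OPERATORS = ("+", "-")


def parse(tokens):
    """Return the list of reductions performed, or raise SyntaxError."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        # Only terminals take part in precedence comparisons.
        top = next(s for s in reversed(stack) if s != "term")
        look = tokens[pos]
        if top == "$" and look == "$":
            if stack != ["$", "term"]:
                raise SyntaxError("incomplete statement")
            return reductions
        relation = PRECEDENCE.get((top, look))
        if relation is None:
            raise SyntaxError(f"{top!r} cannot be followed by {look!r}")
        if relation == "<":
            stack.append(look)  # shift
            pos += 1
        elif stack[-1] == "id":
            stack[-1] = "term"  # reduce <term> ::= id
            reductions.append("<term> ::= id")
        elif (len(stack) >= 4 and stack[-3] == "term"
              and stack[-2] in OPERATORS and stack[-1] == "term"):
            op = stack[-2]
            stack[-3:] = ["term"]  # reduce <term> ::= <term> op <term>
            reductions.append(f"<term> ::= <term>{op}<term>")
        else:
            raise SyntaxError("no rule matches the top of the stack")


for step in parse(["id", "+", "id", "-", "id"]):
    print(step)
```

Because the table says + > -, the parser reduces term + term before it shifts -, which selects the (id + id) - id grouping even though the grammar itself is ambiguous.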

3. Recursive Descent Parser

A top-down method.

a. Leftmost derivation

It must be possible to decide which alternative to use by examining the next input token.

<stmt> ::= <assign> | <read> | <write>

The three alternatives of <stmt> begin with id, READ, and WRITE, respectively, so the next token selects the alternative.

b. Left recursion

A top-down parser cannot be used with a grammar that contains left recursion, because the parser is unable to decide between the alternatives by examining the next token.

<id-list> ::= id | <id-list>,id

Both alternatives, id and <id-list>,id, can begin with id.

Grammar modified for a recursive descent parser ({ } encloses items that may be repeated zero or more times):

<id-list> ::= id {, id}

<assign> ::= id:=<exp>

<exp> ::= <term> { +<term> | -<term> }

<term> ::= <factor> { *<factor> | DIV<factor> }

<factor> ::= id | int | (<exp>)

<read> ::= READ(<id-list>)

<write> ::= WRITE(<id-list>)

Partial grammar of the PASCAL language
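A recursive descent parser for the modified grammar can be sketched with one Python method per nonterminal. This is a minimal recognizer under stated assumptions, not the chapter's implementation: the class name `Parser`, the helper `is_valid`, and the convention that the input is already a list of token names (`id`, `int`, `:=`, `DIV`, and so on) are all hypothetical. It only accepts or rejects a statement and builds no tree.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def stmt(self):  # <stmt> ::= <assign> | <read> | <write>
        if self.peek() == "id":
            self.assign()
        elif self.peek() in ("READ", "WRITE"):
            self.pos += 1
            self.expect("(")
            self.id_list()
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")

    def id_list(self):  # <id-list> ::= id {, id}
        self.expect("id")
        while self.peek() == ",":
            self.pos += 1
            self.expect("id")

    def assign(self):  # <assign> ::= id := <exp>
        self.expect("id")
        self.expect(":=")
        self.exp()

    def exp(self):  # <exp> ::= <term> { +<term> | -<term> }
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):  # <term> ::= <factor> { *<factor> | DIV<factor> }
        self.factor()
        while self.peek() in ("*", "DIV"):
            self.pos += 1
            self.factor()

    def factor(self):  # <factor> ::= id | int | (<exp>)
        if self.peek() in ("id", "int"):
            self.pos += 1
        elif self.peek() == "(":
            self.pos += 1
            self.exp()
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")


def is_valid(tokens):
    """True if tokens form exactly one legal statement."""
    parser = Parser(tokens)
    try:
        parser.stmt()
    except SyntaxError:
        return False
    return parser.peek() is None


print(is_valid("id := id DIV int - id * id".split()))  # prints True
print(is_valid("id := id + DIV id".split()))           # prints False
```

Each `while` loop corresponds to one pair of braces in the modified grammar, which is how the left recursion of the original rules is avoided.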

5.5 Code Generation

When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding routine is executed. These routines are called semantic routines or code generation routines.

1. Operator precedence parser: the routine is executed when a sub-string is reduced to a nonterminal.

2. Recursive descent parser: the routine is executed when a procedure returns to its caller, indicating success.

<start> ::= <term>

<term> ::= id | <term>+<term> | <term>-<term>

Code generation routines for the rules used in parsing id + id - id:

<term> ::= <term>1 + <term>2
    MOV AX, <term>1
    ADD AX, <term>2
    MOV <term>, AX

<term> ::= <term>1 - <term>2
    MOV AX, <term>1
    SUB AX, <term>2
    MOV <term>, AX

<term> ::= id
    add id to <term>

Generating assembly instructions or machine code directly is too fine-grained, so the program is first translated into an intermediate form.

5.6 Intermediate Form

Three Address Code (Quadruple Form)

(operator, operand1, operand2, result)

<term> ::= <term>1 + <term>2

(+, <term>1, <term>2, <term>)

<term> ::= <term>1 - <term>2

(-, <term>1, <term>2, <term>)

<term> ::= id

add id to <term>

variance := sumsq DIV 100 - mean * mean

(DIV, sumsq, #100, i1)

(*, mean, mean, i2)

(-, i1, i2, i3)

(:=, i3, , variance)
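The four quadruples above can be produced by a post-order walk of the expression tree. The Python sketch below is an illustration only: the function name `gen_quads` and the representation of the tree as nested tuples `(operator, left, right)` are assumptions, while the temporary names i1, i2, ... and the `#` prefix for constants follow the example.

```python
def gen_quads(target, expr):
    """Emit quadruples for the assignment target := expr."""
    quads = []

    def walk(node):
        if isinstance(node, str):  # identifier or constant: already a value
            return node
        op, left, right = node
        a = walk(left)             # operands first (post-order)
        b = walk(right)
        temp = f"i{len(quads) + 1}"
        quads.append((op, a, b, temp))
        return temp

    quads.append((":=", walk(expr), "", target))
    return quads


tree = ("-", ("DIV", "sumsq", "#100"), ("*", "mean", "mean"))
for quad in gen_quads("variance", tree):
    print(quad)
# prints
# ('DIV', 'sumsq', '#100', 'i1')
# ('*', 'mean', 'mean', 'i2')
# ('-', 'i1', 'i2', 'i3')
# (':=', 'i3', '', 'variance')
```

Each interior node of the tree yields exactly one quadruple, and the temporary it defines stands for that node in its parent's quadruple.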

5.7 Machine Independent Compiler Features

1. Storage Allocation

* Static allocation: storage is allocated at compile time.

* Dynamic allocation: storage is allocated at run time.

    Auto: allocated on the STACK at a function call.

    Controlled: allocated on the HEAP with malloc( ) and released with free( ).

2. Activation Record

Each function call creates an activation record that contains storage for all the variables used by the function, the return address, etc.

Figure: activation records on the stack. Each record holds the fields Variables, Return Address, Next, and Previous.

Figure: MAIN is running. The stack holds one activation record for MAIN (Variables, Return Address, Next, Previous); its return address leads back to the OS.

Figure: MAIN calls SUB. An activation record for SUB is added to the stack next to the record for MAIN; its return address leads back to MAIN.

Figure: SUB calls SUB. A second activation record for SUB is added to the stack; its return address leads back to SUB.


3. Prologue and Epilogue

The compiler must generate additional code to manage the activation records themselves.

a. Prologue

The code to create a new activation record

b. Epilogue

The code to delete the current activation record
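The effect of the prologue and epilogue on the stack of activation records can be sketched in Python. This is a simulation for illustration, not generated compiler code: the names `ActivationRecord`, `CallStack`, `prologue`, and `epilogue` are hypothetical, and the Next and Previous links of each record are represented implicitly by the order of a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class ActivationRecord:
    function: str
    return_address: str          # where execution resumes after the call
    variables: dict = field(default_factory=dict)


class CallStack:
    def __init__(self):
        self.records = []        # last element is the current record

    def prologue(self, function, return_address, **variables):
        """Create a new activation record when a function is entered."""
        record = ActivationRecord(function, return_address, dict(variables))
        self.records.append(record)
        return record

    def epilogue(self):
        """Delete the current record and return its return address."""
        return self.records.pop().return_address


stack = CallStack()
stack.prologue("MAIN", "OS")
stack.prologue("SUB", "MAIN", x=1)
stack.prologue("SUB", "SUB", x=2)      # recursive call: a second SUB record
print([r.function for r in stack.records])  # prints ['MAIN', 'SUB', 'SUB']
print(stack.epilogue())                     # prints SUB
print(stack.epilogue())                     # prints MAIN
print(stack.epilogue())                     # prints OS
```

Because every call gets its own record, the two activations of SUB keep separate copies of `x`, which is what makes recursion possible.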

4. Structured Variables

Array, record, string, set, ...

B: array[0..3, 0..1] of integer

B[0][0] B[0][1]
B[1][0] B[1][1]
B[2][0] B[2][1]
B[3][0] B[3][1]

Row-major order:

B[0][0] B[0][1] B[1][0] B[1][1] B[2][0] B[2][1] B[3][0] B[3][1]

Column-major order:

B[0][0] B[1][0] B[2][0] B[3][0] B[0][1] B[1][1] B[2][1] B[3][1]

For an array declared as Type B[a..b][c..d], the address of B[s][t] is:

Row major:

[(s - a) * (d - c + 1) + (t - c)] * sizeof(Type) + base address

Column major:

[(t - c) * (b - a + 1) + (s - a)] * sizeof(Type) + base address
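The two address formulas can be checked with a short Python sketch. The function names are hypothetical, and the element size of 4 bytes and the base address of 1000 used in the example are assumptions chosen only to make the arithmetic concrete.

```python
def row_major_address(s, t, a, b, c, d, size, base):
    """Address of B[s][t] for Type B[a..b][c..d] stored row by row."""
    return ((s - a) * (d - c + 1) + (t - c)) * size + base


def column_major_address(s, t, a, b, c, d, size, base):
    """Address of B[s][t] for Type B[a..b][c..d] stored column by column."""
    return ((t - c) * (b - a + 1) + (s - a)) * size + base


# B: array[0..3, 0..1] of integer, assuming 4-byte integers at base 1000.
bounds = dict(a=0, b=3, c=0, d=1, size=4, base=1000)
print(row_major_address(2, 1, **bounds))     # prints 1020
print(column_major_address(2, 1, **bounds))  # prints 1024
```

B[2][1] is the sixth element in row-major order and the seventh in column-major order, so its offsets are 5 and 6 elements respectively, which matches the two results.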

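5. Code Optimization

Before optimization:

For I := 1 to 10
Begin
    x[I, 2*J-1] := T[I, 2*J];
    Table[I] := 2**I;
END

After optimization:

T1 := 2 * J;
T2 := T1 - 1;
K := 1;
For I := 1 to 10
Begin
    x[I, T2] := T[I, T1];
    K := K * 2;
    Table[I] := K;
END

a. Common sub-expression: 2*J is computed once and kept in T1.

b. Loop invariants: T1 and T2 do not change inside the loop, so they are computed before it.

c. Reduction in strength: the exponentiation 2**I is replaced by the multiplication K := K * 2.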
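That the optimized loop computes the same values as the original can be checked with a direct Python transcription of both versions. This is a sketch for illustration: the function names are hypothetical, the arrays are modelled as dictionaries keyed by index tuples, and the contents of `T` in the example are made-up test data.

```python
def unoptimized(T, J):
    x, table = {}, {}
    for i in range(1, 11):
        x[i, 2 * J - 1] = T[i, 2 * J]   # 2*J evaluated twice per iteration
        table[i] = 2 ** i               # exponentiation on every iteration
    return x, table


def optimized(T, J):
    x, table = {}, {}
    t1 = 2 * J          # common sub-expression, hoisted out of the loop
    t2 = t1 - 1         # loop invariant
    k = 1
    for i in range(1, 11):
        x[i, t2] = T[i, t1]
        k = k * 2       # strength reduction: 2**i becomes repeated doubling
        table[i] = k
    return x, table


J = 3
T = {(i, 2 * J): 100 * i for i in range(1, 11)}  # made-up input data
print(unoptimized(T, J) == optimized(T, J))  # prints True
```

The comparison holds because, at iteration i, the running product `k` equals 2**i.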

5.8 Compiler Design Option

1. Interpreter

An interpreter processes a source program written in a high level language, just as a compiler does. The main difference is that an interpreter executes a version of the source program directly.

An interpreter can be viewed as a set of functions; the execution of these functions is driven by the internal form of the program.
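The view of an interpreter as a set of functions driven by the internal form can be sketched in Python, using quadruples as the internal form. This is a minimal illustration under stated assumptions: the names `OPERATIONS`, `value_of`, and `interpret` are hypothetical, constants are written with a `#` prefix as in Section 5.6, and DIV is mapped to Python floor division, which agrees with integer division for non-negative operands.

```python
import operator

# One function per operator; the quadruple's operator field selects it.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "DIV": operator.floordiv,
}


def value_of(operand, memory):
    """A '#' prefix marks a constant; anything else names a variable."""
    if operand.startswith("#"):
        return int(operand[1:])
    return memory[operand]


def interpret(quads, memory):
    """Execute quadruples (operator, operand1, operand2, result) in order."""
    for op, a, b, result in quads:
        if op == ":=":
            memory[result] = value_of(a, memory)
        else:
            memory[result] = OPERATIONS[op](value_of(a, memory),
                                            value_of(b, memory))
    return memory


quads = [
    ("DIV", "sumsq", "#100", "i1"),
    ("*", "mean", "mean", "i2"),
    ("-", "i1", "i2", "i3"),
    (":=", "i3", "", "variance"),
]
memory = interpret(quads, {"sumsq": 5000, "mean": 5})
print(memory["variance"])  # prints 25
```

No machine code is produced: the loop reads each quadruple and calls the matching function immediately, which is the sense in which the internal form drives execution.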

2. P Code Compiler

* P-code is byte code, a machine-independent language.

* It can be run across platforms, on different kinds of computers.

Source program -> Java compiler -> Byte code

Byte code -> Java run module -> Run

3. Compiler-Compiler

A software tool that can be used to help in the task of compiler construction.

Uses finite state automata.

Unix examples:

YACC: parser generator

LEX: scanner generator

4. Cross Compiler

On the workstation: Source program -> Cross compiler -> 80XX machine code

On the personal computer: 80XX machine code -> Run