Ch02

25
2 형식 언어 컴파일러 입문

Transcript of Ch02

Page 1: Ch02

제 2 장

형식 언어

컴파일러 입문

Page 2: Ch02

목 차

2.1 언어(Language)

2.2 문법(Grammar)

2.3 문법의 분류

Inroduction to FL theory Page 2

꼭 기억해야 할 세 가지 개념

1. 언어의 정의

2. 문법의 정의 및 개념

3. 인식기의 의미

Page 3: Ch02

Language Basic definitions

(1) alphabet

a finite set of symbols.

ex) T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ}

T2 = {A,B,C, … ,Z, a, b, c, …, z}

T3 = {main, int, char, …, while}

(2) string(or sentence, word)

a sequence of symbols from some alphabet T.

(3) length

the number of symbols in the string.

denoted by |ω|

Inroduction to FL theory Page 3

Page 4: Ch02

(4) empty string

a string consisting of no symbols.

denoted by ε or λ.

(5) T* denotes the set of all strings of symbols over the

alphabet T, including the empty string. T+ = T* - {ε}

T* : T star

T+ : T dagger

(6) Language is any set of strings over an alphabet.(Text p.40)

(or A Language L over the alphabet T is a subset of T*.)

L ⊆ T*

Inroduction to FL theory Page 4

Page 5: Ch02

Two problems (1) How do we represent a language ?

If the language is finite, the answer is easy.

If the language is infinite, we are faced with the problem

of finding a finite representation for the language.

Set description

Grammar : Generating Scheme

Recognizer : Recognition Scheme

(2) Does there exist a finite representation for every language ?

No !

This is not always possible.

Inroduction to FL theory Page 5

L1 = {a, ab, ba, aba}

L2 = T* = {, a, b, aa, ab, ….}

L3 = {anbn | n1}

L4 = {wwR | wT*}

Page 6: Ch02

More definitions (1) concatenation

u = a1a2a3...an, v = b1b2b3...bm , u • v = a1a2a3...anb1b2b3...bm

u • v를 보통 uv로 표기. uε= u = εu ∀u,v ∈ T*, uv ∈ T*. |uv| = |u| + |v|

(2) an represents n a's.

a0 = ε

(3) the reversal of a string ω, denoted ωR is the string ω

written in reverse order:

i.e., if ω = a1a2...an then ωR = anan-1...a1.

Inroduction to FL theory Page 6

(ωR)R=ω

Page 7: Ch02

(4) language product

LL' = {xy| x ∈ L and y ∈ L'}

(5) The powers of a language L are defined recursively by:

L0 = {ε}

Ln = LLn-1 for n 1.

(6) L* : reflexive transitive closure

= L0 ∪ L1 ∪ L2 ∪ ...∪ Ln ∪… =

(7) L+ : transitive closure

= L1 ∪ L2 ∪... ∪ Ln ∪ ...

= L* - L0

0i

iL

Inroduction to FL theory Page 7

L = {a, ab, ba, aba}

L0 = {}

L2 = {aa, aab, aba, aaba, …}

Page 8: Ch02

Grammar

Language

문장(sentence)들을 원소로 갖는 집합

언어를 어떻게 표현할 것인가 ?

Grammar

terminal : 정의된 언어의 알파벳

nonterminal :

스트링을 생성하는 데 사용되는 중간 과정의 심볼

언어의 구조를 정의하는데 사용

grammar symbol (V)

Inroduction to FL theory Page 8

Page 9: Ch02

Inroduction to FL theory Page 9

G = (VN, VT, P, S)

VN : a finite set of nonterminal symbols

VT : a finite set of terminal symbols

VN ∩ VT = , VN∪ VT = V

P : a finite set of production rules

α → β, α∈ V+, β∈ V*

lhs rhs

S : start symbol(sentence symbol)

Page 10: Ch02

[예] G = ( {S, A}, {a, b}, P, S ) Text p.47 [예 8]

P : S → aAS S → a

A → SbA A → ba A → SS

⇒ S → aAS | a

A → SbA | ba | SS

Inroduction to FL theory Page 10

Page 11: Ch02

Derivation 1. ⇒ : “directly produce” or “directly derive”

if α → β∈ P and , δ∈ V* then

αδ ⇒ βδ

2. ⇒ : Suppose α1,α2,...,αn ∈ V* and α1 ⇒α2 ⇒ … ⇒αn,

then α1 ⇒ αn

(zero or more derivations)

3. ⇒ : one or more derivations.

cf) → : production rule에서 사용.

“may be replaced by”

⇒ : derivation할 때 사용한다.

Inroduction to FL theory Page 11

*

+

*

Page 12: Ch02

Ex. P : S → aA | bB | ε

A → bS

B → aS

S ⇒ abba 유도 과정

Inroduction to FL theory Page 12

S ⇒ aA (생성규칙 S → aA)

⇒ abS (생성규칙 A → bS)

⇒ abbB (생성규칙 S → bB)

⇒ abbaS (생성규칙 B → aS)

⇒ abba (생성규칙 S → )

Page 13: Ch02

L(G) : Language generated by grammar G L(G) = {ω | S ⇒ ω, ω ∈ VT

*}

☞ ω is a sentential form of G if S ⇒ ω and ω ∈ V*.

ω is a sentence of G if S ⇒ ω and ω ∈ VT*.

Inroduction to FL theory Page 13

*

*

*

S ⇒ aA ( 생성규칙 S → aA를 적용 )

⇒ abS ( 생성규칙 A → bS를 적용 )

⇒ abbB ( 생성규칙 S → bB를 적용 )

⇒ abbaS ( 생성규칙 B → aS를 적용 )

⇒ abba ( 생성규칙 S → 를 적용 )

S ⇒ abba

sentential form : aA, abS, abbB, abbaS, abba

sentence : abba

Page 14: Ch02

G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1)

P : S → a | aS

L (G1) = { an | n 1 }

Language design

Grammar Language

generation

design

Inroduction to FL theory Page 14

Page 15: Ch02

주어진 문법으로부터 생성되는 언어 발견 과정

1. start symbol로부터 길이가 짧은 순으로 생성규칙 적용

2. 생성된 문장들의 형태를 고려하여 일정한 규칙 발견

G = ( {A, B, C}, {a, b, c}, P, A)

P : A → abc A → aBbc

Bb → bB Bc → Cbcc

bC → Cb aC → aaB

aC → aa

Inroduction to FL theory Page 15

L(G) = { anbncn | n 1 }

Page 16: Ch02

(===>) ex1) S → 0S1 | 01 ex2) S → aSb | c

ex3) A → aB

B → bB | b

ex4) O → a

O → aE

E → aO ex5) A → abc A → aBbc

Bb → bB Bc → Cbcc

bC → Cb aC → aaB

aC → aa

Inroduction to FL theory

Page 16

Page 17: Ch02

문법 기술 방법

embedded rule : S → aSb

right-recursive rule : A → aA

left-recursive rule : A → Aa

L = { an | n 0 } 일 때 문법 : A → aA | ε

L = { an | n 1 } 일 때 문법 : A → aA | a

Embedded production A → aAb

ex1) L1 = { anbn | n 0 }

ex2) L2 = { 0i1j | i j, i,j 1 }

ex3) Constructs of Conventional PL

Inroduction to FL theory Page 17

Page 18: Ch02

ex) C 언어의 정수 선언 부분 :

정수선언 부분은 여러 개의 정수선언으로 구성되며 하나의 선

언은 int a,a,a;와 같은 형태를 갖는다. 여기서 a는 임의의

identifier를 나타낸다.

그리고 ; 으로 각각의 선언을 구분한다. 예를 들어, int i,j; int sum;

과 같다.

※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를

대변할 수 있는 명칭으로 쓰는 것이 바람직하다.

Inroduction to FL theory Page 18

Page 19: Ch02

In order to prove that a grammar generates a language L

i) Every sentence generated by the grammar is in L. ii) Every string in L can be generated by the grammar. 교과서 55쪽 [예16]

proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S.

Inroduction to FL theory Page 19

G = ( { S }, { ( , ) }, {S → (S)S |ε}, S )

⇔ All strings of balanced parentheses.

Page 20: Ch02

(=>) Every sentence derivable from S is balanced.

(i.e., S ⇒ ω, ω: balanced)

By induction on the number of steps in a derivation.

i) n = 1 일 때, S ⇒ ε, ε is surely balanced.

ii) Suppose that all derivations of fewer than n steps

produce balanced sentences.

iii) Consider a leftmost derivation of exactly n steps.

S ⇒ (S)S ⇒ (x)S ⇒ (x)y

By the hypothesis x, y : balanced.

Thus (x)y balanced.

Inroduction to FL theory Page 20

*

* *

*

Page 21: Ch02

(<=) Every balanced string is derivable from S.

By induction on the length of a string.

i) |ω| = 0, S ⇒ ε

(the empty string is derivable from S.)

ii) Assume that every balanced string of length less than 2n is derived from S.

iii) Consider a balanced string ω of length 2n.

Let (x) : shortest prefix of ω being balanced.

Thus ω = (x)y, where x, y : balanced.

Since |x|, |y | < 2n, they are derivable from S by inductive hypothesis.

Thus S ⇒ (S)S ⇒ (x)S ⇒ (x)y = ω

Therefore, (x)y is also derivable from S.

Inroduction to FL theory Page 21

* *

Page 22: Ch02

Chomsky Hierarchy

Noam Chomsky

According to the form of the productions.

α → β ∈ P Type 0 : No restrictions(unrestricted grammar)

Type 1 : Context-sensitive grammar(CSG).

→ β, | | | β|

Type 2 : Context-free grammar(CFG).

A → , where A : nonterminal, ∈ V*.

Type 3 : Regular grammar(RG).

A → tB or A → t, (right-linear)

A → Bt or A → t, (left-linear)

where, A, B : nonterminal, t ∈ VT*.

Inroduction to FL theory Page 22

Page 23: Ch02

REL (Recursively Enumerable Language)

CSL (Context Sensitive Language)

CFL (Context Free Language)

RL (Regular Language)

Examples of Formal Language

simple matching language : Lm = {anbn | n ≥ 0} CFL

double matching language : Ldm = {anbncn | n ≥ 0} CSL

mirror image language : Lmi = {ωωR | ω ∈ VT*} CFL

palindrome language : Lr = {ω | ω = ωR } CFL

parenthesis language : Lp = {ω | ω: balanced parenthesis} CFL

Inroduction to FL theory Page 23

Page 24: Ch02

The Chomsky Hierarchy of Languages

unrestricted language

context-sensitive language

context-free language

regular language

Inroduction to FL theory Page 24

Page 25: Ch02

Languages & Recognizers

Grammar Language Recognizer

type 0

(unrestricted)

recursively

enumerable set

Turing

machine

type 1

(context-sensitive)

context-sensitive

language

Linear Bounded

Automata

type 2

(context-free)

context-free

language

Pushdown

Automata

type 3

(regular)

regular

language Finite Automata

Inroduction to FL theory Page 25