Ch02
![Page 1: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/1.jpg)
Chapter 2
Formal Languages
Introduction to Compilers
![Page 2: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/2.jpg)
Contents
2.1 Language
2.2 Grammar
2.3 Classification of Grammars
Introduction to FL theory
Three concepts to remember
1. The definition of a language
2. The definition and concept of a grammar
3. The meaning of a recognizer
![Page 3: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/3.jpg)
Language: Basic definitions
(1) alphabet
a finite set of symbols.
ex) T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ}
T2 = {A,B,C, … ,Z, a, b, c, …, z}
T3 = {main, int, char, …, while}
(2) string(or sentence, word)
a sequence of symbols from some alphabet T.
(3) length
the number of symbols in the string.
denoted by |ω|
![Page 4: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/4.jpg)
(4) empty string
a string consisting of no symbols.
denoted by ε or λ.
(5) T* denotes the set of all strings of symbols over the
alphabet T, including the empty string. T+ = T* - {ε}
T* : T star
T+ : T dagger
(6) A language is any set of strings over an alphabet. (Text p.40)
(Equivalently, a language L over the alphabet T is a subset of T*.)
L ⊆ T*
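As a small illustration of definitions (5) and (6), the sketch below lists a finite slice of T* in order of length. The function name `strings_up_to` and the choice T = {a, b} are assumptions made for this example; the empty Python string stands for ε.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, which becomes ε ("")
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result

T = {"a", "b"}
t_star_slice = strings_up_to(T, 2)            # finite part of T*
t_plus_slice = [w for w in t_star_slice if w] # T+ = T* - {ε}
L = {"a", "ab", "ba", "aba"}                  # a language is just a subset of T*

print(t_star_slice)
print(t_plus_slice)
print(all(set(w) <= T for w in L))
```

T* itself is infinite, so any program can only enumerate it up to a chosen length bound.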
![Page 5: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/5.jpg)
Two problems
(1) How do we represent a language?
If the language is finite, the answer is easy: list its strings.
If the language is infinite, we are faced with the problem of finding a finite representation for it:
Set description
Grammar : generating scheme
Recognizer : recognition scheme
(2) Does there exist a finite representation for every language?
No. This is not always possible.
L1 = {a, ab, ba, aba}
L2 = T* = {ε, a, b, aa, ab, …}
L3 = {a^n b^n | n ≥ 1}
L4 = {ww^R | w ∈ T*}
![Page 6: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/6.jpg)
More definitions
(1) concatenation
u = a_1 a_2 a_3 ... a_n, v = b_1 b_2 b_3 ... b_m, u • v = a_1 a_2 a_3 ... a_n b_1 b_2 b_3 ... b_m
u • v is usually written uv.
uε = u = εu
∀u, v ∈ T*, uv ∈ T*
|uv| = |u| + |v|
(2) a^n represents n a's.
a^0 = ε
(3) the reversal of a string ω, denoted ω^R, is the string ω written in reverse order:
i.e., if ω = a_1 a_2 ... a_n then ω^R = a_n a_(n-1) ... a_1.
(ω^R)^R = ω
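The string operations above map directly onto ordinary string handling. The following is a minimal sketch; the helper names `power` and `reversal` are made up for this illustration, and the empty Python string plays the role of ε.

```python
EPSILON = ""  # the empty string ε

def power(symbol, n):
    """a^n: the string made of n copies of `symbol`; a^0 is ε."""
    return symbol * n

def reversal(w):
    """ω^R: the string ω written in reverse order."""
    return w[::-1]

u, v = "ab", "bba"
uv = u + v                                   # concatenation u • v
print(uv, len(uv) == len(u) + len(v))        # |uv| = |u| + |v|
print(u + EPSILON == u == EPSILON + u)       # uε = u = εu
print(power("a", 3), repr(power("a", 0)))    # a^3 and a^0 = ε
print(reversal("abb"), reversal(reversal("abb")) == "abb")
```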
![Page 7: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/7.jpg)
(4) language product
LL' = {xy | x ∈ L and y ∈ L'}
(5) The powers of a language L are defined recursively by:
L^0 = {ε}
L^n = L L^(n-1) for n ≥ 1.
(6) L* : reflexive transitive closure
= L^0 ∪ L^1 ∪ L^2 ∪ ... ∪ L^n ∪ … = ⋃_{i ≥ 0} L^i
(7) L+ : transitive closure
= L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ...
= L* - L^0 (when ε ∉ L)
ex) L = {a, ab, ba, aba}
L^0 = {ε}
L^2 = {aa, aab, aba, aaba, …}
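The language operations can be tried out on the example L = {a, ab, ba, aba}. This is only a sketch: the function names are assumptions, and because L* is infinite the closure is computed only up to a chosen power k.

```python
def language_product(L1, L2):
    """LL' = {xy | x in L and y in L'}."""
    return {x + y for x in L1 for y in L2}

def language_power(L, n):
    """L^0 = {ε}; L^n = L L^(n-1) for n >= 1."""
    result = {""}
    for _ in range(n):
        result = language_product(L, result)
    return result

def closure_up_to(L, k):
    """L^0 ∪ L^1 ∪ ... ∪ L^k, a finite approximation of L*."""
    result = set()
    for i in range(k + 1):
        result |= language_power(L, i)
    return result

L = {"a", "ab", "ba", "aba"}
print(language_power(L, 0))
print(sorted(language_power(L, 2), key=lambda w: (len(w), w)))
print(len(closure_up_to(L, 2)))
```

L^2 has fewer than 16 elements because some products coincide, for example a • ba = ab • a = aba.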
![Page 8: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/8.jpg)
Grammar
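Language
a set whose elements are sentences
How do we represent a language?
Grammar
terminal : a symbol of the alphabet of the language being defined
nonterminal :
an intermediate symbol used in the course of generating strings
used to define the structure of the language
grammar symbol (V)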
![Page 9: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/9.jpg)
G = (VN, VT, P, S)
VN : a finite set of nonterminal symbols
VT : a finite set of terminal symbols
VN ∩ VT = ∅, VN ∪ VT = V
P : a finite set of production rules
α → β, α ∈ V+, β ∈ V*
(α is the left-hand side, lhs; β is the right-hand side, rhs)
S : start symbol(sentence symbol)
![Page 10: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/10.jpg)
[Example] G = ({S, A}, {a, b}, P, S) (Text p.47, Example 8)
P : S → aAS
S → a
A → SbA
A → ba
A → SS
Productions with the same left-hand side are abbreviated using | :
S → aAS | a
A → SbA | ba | SS
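One possible way to hold the 4-tuple G = (VN, VT, P, S) in a program is sketched below. The class name `Grammar`, the method `is_well_formed`, and the restriction to single-character symbols are assumptions of this example, not part of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # VN
    terminals: frozenset      # VT
    productions: tuple        # pairs (lhs, rhs) of strings; "" stands for ε
    start: str                # S

    def is_well_formed(self):
        """Check VN ∩ VT = ∅, S ∈ VN, and α ∈ V+, β ∈ V* for every α → β."""
        v = self.nonterminals | self.terminals
        if self.nonterminals & self.terminals:
            return False
        if self.start not in self.nonterminals:
            return False
        for lhs, rhs in self.productions:
            if lhs == "" or any(symbol not in v for symbol in lhs + rhs):
                return False
        return True

# The example grammar: S → aAS | a, A → SbA | ba | SS
G = Grammar(
    nonterminals=frozenset("SA"),
    terminals=frozenset("ab"),
    productions=(("S", "aAS"), ("S", "a"), ("A", "SbA"), ("A", "ba"), ("A", "SS")),
    start="S",
)
print(G.is_well_formed())
```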
![Page 11: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/11.jpg)
Derivation
1. ⇒ : "directly produce" or "directly derive"
if α → β ∈ P and γ, δ ∈ V* then
γαδ ⇒ γβδ
2. ⇒* : Suppose α_1, α_2, ..., α_n ∈ V* and α_1 ⇒ α_2 ⇒ … ⇒ α_n,
then α_1 ⇒* α_n
(zero or more derivations)
3. ⇒+ : one or more derivations.
cf) → : used in production rules.
"may be replaced by"
⇒ : used when carrying out a derivation.
![Page 12: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/12.jpg)
Ex. P : S → aA | bB | ε
A → bS
B → aS
Derivation showing S ⇒* abba:
S ⇒ aA (production S → aA)
⇒ abS (production A → bS)
⇒ abbB (production S → bB)
⇒ abbaS (production B → aS)
⇒ abba (production S → ε)
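The "directly derives" relation can be checked mechanically: a form γαδ yields γβδ for every production α → β and every position where α occurs. The sketch below uses a hypothetical helper `derive_one_step` and single-character symbols, and verifies each step of the derivation of abba.

```python
def derive_one_step(form, productions):
    """All strings obtainable from `form` by applying one production once."""
    results = set()
    for lhs, rhs in productions:
        position = form.find(lhs)
        while position != -1:
            # form = γ lhs δ  becomes  γ rhs δ
            results.add(form[:position] + rhs + form[position + len(lhs):])
            position = form.find(lhs, position + 1)
    return results

# P : S → aA | bB | ε, A → bS, B → aS   ("" stands for ε)
P = [("S", "aA"), ("S", "bB"), ("S", ""), ("A", "bS"), ("B", "aS")]

steps = ["S", "aA", "abS", "abbB", "abbaS", "abba"]
valid = all(after in derive_one_step(before, P)
            for before, after in zip(steps, steps[1:]))
print(valid)
```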
![Page 13: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/13.jpg)
L(G) : Language generated by grammar G
L(G) = {ω | S ⇒* ω, ω ∈ VT*}
☞ ω is a sentential form of G if S ⇒* ω and ω ∈ V*.
ω is a sentence of G if S ⇒* ω and ω ∈ VT*.
ex) S ⇒ aA (apply production S → aA)
⇒ abS (apply production A → bS)
⇒ abbB (apply production S → bB)
⇒ abbaS (apply production B → aS)
⇒ abba (apply production S → ε)
Hence S ⇒* abba.
sentential forms : aA, abS, abbB, abbaS, abba
sentence : abba
![Page 14: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/14.jpg)
Finding L(G1) for G1 = ({S}, {a}, P, S)
P : S → a | aS
L(G1) = {a^n | n ≥ 1}
Language design
Grammar → Language : generation
Language → Grammar : design
![Page 15: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/15.jpg)
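Procedure for finding the language generated by a given grammar
1. Starting from the start symbol, apply the production rules, generating sentences in order of increasing length.
2. Examine the form of the generated sentences and identify the general pattern.
G = ({A, B, C}, {a, b, c}, P, A)
P : A → abc
A → aBbc
Bb → bB
Bc → Cbcc
bC → Cb
aC → aaB
aC → aa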
L(G) = {a^n b^n c^n | n ≥ 1}
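Step 1 of the procedure (generate short sentences first) can be mechanized with a breadth-first search over sentential forms. The sketch below is one way to do it under stated assumptions: symbols are single characters, no production shortens a sentential form (so forms longer than the bound can be discarded), and every left-hand side contains a nonterminal. The function name `sentences` is made up for this example.

```python
from collections import deque

def sentences(productions, start, nonterminals, max_len):
    """Sentences of length <= max_len derivable from `start`, shortest first."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(symbol in nonterminals for symbol in form):
            found.add(form)          # only terminals left: a sentence
            continue
        for lhs, rhs in productions:
            position = form.find(lhs)
            while position != -1:
                new_form = form[:position] + rhs + form[position + len(lhs):]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                position = form.find(lhs, position + 1)
    return sorted(found, key=lambda w: (len(w), w))

P = [("A", "abc"), ("A", "aBbc"), ("Bb", "bB"), ("Bc", "Cbcc"),
     ("bC", "Cb"), ("aC", "aaB"), ("aC", "aa")]
print(sentences(P, "A", {"A", "B", "C"}, 9))
print(sentences([("S", "a"), ("S", "aS")], "S", {"S"}, 4))
```

The listed sentences suggest the pattern (step 2), but they do not prove it; a proof still needs an argument over all derivations.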
![Page 16: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/16.jpg)
(===>)
ex1) S → 0S1 | 01
ex2) S → aSb | c
ex3) A → aB
B → bB | b
ex4) O → a
O → aE
E → aO
ex5) A → abc
A → aBbc
Bb → bB
Bc → Cbcc
bC → Cb
aC → aaB
aC → aa
![Page 17: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/17.jpg)
Techniques for writing grammars
embedded rule : S → aSb
right-recursive rule : A → aA
left-recursive rule : A → Aa
Grammar for L = {a^n | n ≥ 0} : A → aA | ε
Grammar for L = {a^n | n ≥ 1} : A → aA | a
Embedded production A → aAb
ex1) L1 = {a^n b^n | n ≥ 0}
ex2) L2 = {0^i 1^j | i j, i, j ≥ 1}
ex3) Constructs of Conventional PL
![Page 18: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/18.jpg)
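ex) The integer declaration part of the C language:
The integer declaration part consists of several integer declarations, and a single declaration has a form such as int a,a,a; where a stands for an arbitrary identifier.
A ; delimits each declaration. For example: int i,j; int sum;
※ When designing a grammar, it is advisable to give each nonterminal a name that reflects the syntactic construct it represents.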
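One possible grammar for this exercise, offered only as a sketch and not as the textbook's answer, is int_decl_list → int_decl int_decl_list | int_decl, int_decl → int id_list ;, id_list → id , id_list | id. The nonterminal names and the recognizer below are assumptions of this example; as a simplification, every name other than `int` is treated as an identifier.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[,;])")

def tokenize(text):
    """Split the input into names, commas, and semicolons."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def is_identifier(token):
    return token not in {"int", ",", ";"}

def parse_id_list(tokens, i):
    """id_list → id , id_list | id. Return the index after the list, or -1."""
    if i >= len(tokens) or not is_identifier(tokens[i]):
        return -1
    i += 1
    while i < len(tokens) and tokens[i] == ",":
        if i + 1 >= len(tokens) or not is_identifier(tokens[i + 1]):
            return -1
        i += 2
    return i

def parse_int_decl(tokens, i):
    """int_decl → int id_list ;. Return the index after the ';', or -1."""
    if i >= len(tokens) or tokens[i] != "int":
        return -1
    i = parse_id_list(tokens, i + 1)
    if i == -1 or i >= len(tokens) or tokens[i] != ";":
        return -1
    return i + 1

def is_int_decl_list(text):
    """int_decl_list → int_decl int_decl_list | int_decl."""
    tokens = tokenize(text)
    i = parse_int_decl(tokens, 0)
    while i != -1 and i < len(tokens):
        i = parse_int_decl(tokens, i)
    return i == len(tokens)

print(is_int_decl_list("int i,j; int sum;"))
print(is_int_decl_list("int i,;"))
```

Each parsing function corresponds to one nonterminal, which is why descriptive nonterminal names carry over naturally into the code.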
![Page 19: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/19.jpg)
In order to prove that a grammar generates a language L, show that:
i) Every sentence generated by the grammar is in L.
ii) Every string in L can be generated by the grammar.
Textbook p.55, Example 16:
G = ({S}, {(, )}, {S → (S)S | ε}, S)
L(G) ⇔ all strings of balanced parentheses.
proof) (=>) Every sentence derivable from S is balanced.
(<=) Every balanced string is derivable from S.
![Page 20: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/20.jpg)
(=>) Every sentence derivable from S is balanced.
(i.e., if S ⇒* ω, then ω is balanced)
By induction on the number of steps n in a derivation.
i) When n = 1, S ⇒ ε, and ε is surely balanced.
ii) Suppose that all derivations of fewer than n steps produce balanced sentences.
iii) Consider a leftmost derivation of exactly n steps (n > 1). It must have the form
S ⇒ (S)S ⇒* (x)S ⇒* (x)y
The derivations of x and y from S each take fewer than n steps, so by the hypothesis x and y are balanced.
Thus (x)y is balanced.
![Page 21: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/21.jpg)
(<=) Every balanced string is derivable from S.
By induction on the length of a string.
i) If |ω| = 0, then ω = ε and S ⇒ ε.
(the empty string is derivable from S.)
ii) Assume that every balanced string of length less than 2n is derivable from S.
iii) Consider a balanced string ω of length 2n (n ≥ 1).
Let (x) be the shortest nonempty balanced prefix of ω.
Thus ω = (x)y, where x and y are balanced.
Since |x|, |y| < 2n, they are derivable from S by the inductive hypothesis.
Thus S ⇒ (S)S ⇒* (x)S ⇒* (x)y = ω
Therefore, (x)y is also derivable from S.
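Both directions of the proof can be sanity-checked on short strings. The sketch below is an empirical check, not a replacement for the induction: `generated` follows the productions S → (S)S | ε directly, `is_balanced` uses a depth counter as the definition of balance, and the bound of 4 pairs is arbitrary. All three names are assumptions of this example.

```python
from itertools import product

def generated(pairs):
    """Sentences of S → (S)S | ε containing exactly `pairs` pairs of parentheses."""
    if pairs == 0:
        return {""}                            # S → ε
    result = set()
    for inner in range(pairs):                 # S → (S)S: split the remaining pairs
        for x in generated(inner):
            for y in generated(pairs - 1 - inner):
                result.add("(" + x + ")" + y)
    return result

def is_balanced(w):
    """Balanced: depth never goes negative and ends at zero."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

max_pairs = 4
from_grammar = set().union(*(generated(n) for n in range(max_pairs + 1)))
by_definition = {
    "".join(symbols)
    for length in range(2 * max_pairs + 1)
    for symbols in product("()", repeat=length)
    if is_balanced("".join(symbols))
}
print(from_grammar <= by_definition)   # (=>) every generated sentence is balanced
print(by_definition <= from_grammar)   # (<=) every balanced string is generated
```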
![Page 22: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/22.jpg)
Chomsky Hierarchy
Noam Chomsky
Grammars are classified according to the form of their productions α → β ∈ P.
Type 0 : No restrictions (unrestricted grammar)
Type 1 : Context-sensitive grammar (CSG).
α → β, |α| ≤ |β|
Type 2 : Context-free grammar (CFG).
A → α, where A : nonterminal, α ∈ V*.
Type 3 : Regular grammar (RG).
A → tB or A → t (right-linear)
A → Bt or A → t (left-linear)
where A, B : nonterminal, t ∈ VT*.
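The production forms above can be turned into a small classifier. This sketch applies the definitions exactly as stated on the slide and reports the most restrictive type whose form all productions satisfy; the function names and the single-character symbol encoding are assumptions of this example.

```python
def is_right_linear(lhs, rhs, nonterminals):
    """A → tB or A → t, with t a (possibly empty) string of terminals."""
    if len(lhs) != 1 or lhs not in nonterminals:
        return False
    body = rhs[:-1] if rhs and rhs[-1] in nonterminals else rhs
    return all(symbol not in nonterminals for symbol in body)

def is_left_linear(lhs, rhs, nonterminals):
    """A → Bt or A → t, with t a (possibly empty) string of terminals."""
    if len(lhs) != 1 or lhs not in nonterminals:
        return False
    body = rhs[1:] if rhs and rhs[0] in nonterminals else rhs
    return all(symbol not in nonterminals for symbol in body)

def chomsky_type(productions, nonterminals):
    """Return 3, 2, 1, or 0: the most restrictive form met by all productions."""
    if all(is_right_linear(l, r, nonterminals) for l, r in productions):
        return 3
    if all(is_left_linear(l, r, nonterminals) for l, r in productions):
        return 3
    if all(len(l) == 1 and l in nonterminals for l, _ in productions):
        return 2
    if all(len(l) <= len(r) for l, r in productions):
        return 1
    return 0

regular = [("S", "aA"), ("S", "bB"), ("S", ""), ("A", "bS"), ("B", "aS")]
parens = [("S", "(S)S"), ("S", "")]
abc = [("A", "abc"), ("A", "aBbc"), ("Bb", "bB"), ("Bc", "Cbcc"),
       ("bC", "Cb"), ("aC", "aaB"), ("aC", "aa")]
print(chomsky_type(regular, {"S", "A", "B"}))
print(chomsky_type(parens, {"S"}))
print(chomsky_type(abc, {"A", "B", "C"}))
```

The classifier describes the grammar, not the language: a language generated by a type 2 grammar may still be regular if some other grammar for it is type 3.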
![Page 23: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/23.jpg)
REL (Recursively Enumerable Language)
CSL (Context Sensitive Language)
CFL (Context Free Language)
RL (Regular Language)
Examples of Formal Language
simple matching language : Lm = {a^n b^n | n ≥ 0} CFL
double matching language : Ldm = {a^n b^n c^n | n ≥ 0} CSL
mirror image language : Lmi = {ωω^R | ω ∈ VT*} CFL
palindrome language : Lr = {ω | ω = ω^R} CFL
parenthesis language : Lp = {ω | ω is a string of balanced parentheses} CFL
![Page 24: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/24.jpg)
The Chomsky Hierarchy of Languages
regular language ⊂ context-free language ⊂ context-sensitive language ⊂ unrestricted language
![Page 25: Ch02](https://reader031.fdocuments.net/reader031/viewer/2022030120/58a1dd571a28abb6678b68ab/html5/thumbnails/25.jpg)
Languages & Recognizers
| Grammar | Language | Recognizer |
|---|---|---|
| type 0 (unrestricted) | recursively enumerable set | Turing machine |
| type 1 (context-sensitive) | context-sensitive language | linear bounded automata |
| type 2 (context-free) | context-free language | pushdown automata |
| type 3 (regular) | regular language | finite automata |
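To make the difference between two of these recognizers concrete, the sketch below contrasts a finite automaton for the regular language {a^n | n ≥ 1} with a pushdown-style recognizer for the context-free language {a^n b^n | n ≥ 0}. Both functions, their state names, and the use of a Python list as the stack are assumptions of this illustration.

```python
def accepts_a_plus(w):
    """Finite automaton for {a^n | n >= 1}: a fixed set of states, no memory."""
    transitions = {("start", "a"): "seen_a", ("seen_a", "a"): "seen_a"}
    state = "start"
    for ch in w:
        # any missing transition leads to a non-accepting dead state
        state = transitions.get((state, ch), "dead")
    return state == "seen_a"

def accepts_anbn(w):
    """Pushdown-style recognizer for {a^n b^n | n >= 0} using an explicit stack."""
    stack = []
    seen_b = False
    for ch in w:
        if ch == "a" and not seen_b:
            stack.append("a")        # remember each a
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()              # match one b against one remembered a
        else:
            return False
    return not stack                 # accept only if every a was matched

print(accepts_a_plus("aaa"), accepts_a_plus("ab"))
print(accepts_anbn("aabb"), accepts_anbn("aab"))
```

The stack is what lets the second recognizer count without bound, which a machine with a fixed number of states cannot do.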