Ch03

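Chapter 3 Regular Languages

Introduction to Compilers

Contents

3.1 Regular Grammars and Regular Languages

3.2 Regular Expressions

3.3 Finite Automata

3.4 Properties of Regular Languages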

Regular Language

Regular Grammars and Regular Languages

A study of the theory of regular languages is often justified by the fact that they model the lexical analysis stage of a compiler.

Type 3 Grammar(N. Chomsky)

RLG (Right-Linear Grammar): A → tB, A → t

LLG (Left-Linear Grammar) : A → Bt, A → t

where A, B ∈ VN and t ∈ VT*.

ex) G1 : S → 000S | 000

    G2 : S → S000 | 000

It is important to note that grammars in which left-linear productions are intermixed with right-linear productions are not regular.

For example,

G : S → aR, S → c, R → Sb

L(G) = {a^n c b^n | n ≥ 0} is a context-free language (cfl).

Definition

(1) A grammar is regular if each rule has one of the following forms:

i) A → aB or A → a, where a ∈ VT and A, B ∈ VN.

ii) If S → ε ∈ P, then S does not appear on the right-hand side of any production.

This is the right-linear form A → tB, A → t restricted to the case where t is a single terminal. It is the preferred form for developing the properties of regular grammars systematically.

(2) A language is said to be a regular language (rl) if it can be generated by a regular grammar.

ex) L = {a^n b^m | n, m ≥ 1} is rl.

S → aS | aA

A → bA | b

[Theorem] The production forms of a regular grammar can be derived from those of an RLG (RLG ⇒ RG). (Text p.69)

(proof) Consider A → tB, where t ∈ VT*.

Let t = a1a2...an, ai ∈ VT. Then A → tB is replaced by

A → a1A1, A1 → a2A2, ..., An-1 → anB.

ex) S → abcA ⇒ S → aS1, S1 → bS2, S2 → cA

    A → bcA ⇒ A → bA1, A1 → cA

    A → cd ⇒ A → cA1', A1' → d

If t = ε, then A → B (single production) or A → ε (epsilon production).

⇒ These forms of productions can be easily removed. (Text pp.175-181)

Right-linear grammar :

A → tB or A → t,

where A, B ∈ VN and t ∈ VT*.
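The splitting step used in the proof can be sketched in Python. This is only an illustration under assumptions of this example, not the textbook's algorithm: a production is modeled as a (left side, terminal, next nonterminal) triple, and the function name `split_production` and the naming scheme for fresh nonterminals (`S1`, `S2`, ...) are hypothetical.

```python
def split_production(lhs, terminals, target=None):
    """Split A -> tB (or A -> t) with t = a1 a2 ... an, n >= 1, into
    productions that each carry exactly one terminal.

    `target` is the trailing nonterminal B, or None for A -> t.
    Fresh nonterminals are named lhs + "1", lhs + "2", ... (an assumed
    scheme; a real tool would have to avoid clashes with existing names).
    """
    if not terminals:
        raise ValueError("t = ε gives a single or ε-production; handle it separately")
    result = []
    current = lhs
    for i, symbol in enumerate(terminals, start=1):
        is_last = i == len(terminals)
        nxt = target if is_last else f"{lhs}{i}"
        result.append((current, symbol, nxt))
        current = nxt
    return result

# S -> abcA becomes S -> aS1, S1 -> bS2, S2 -> cA
print(split_production("S", "abc", "A"))
# A -> cd becomes A -> cA1, A1 -> d (None marks "no trailing nonterminal")
print(split_production("A", "cd"))
```

The case t = ε is rejected on purpose, because the slides treat single and ε-productions by a separate removal step.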

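Equivalence

1. The language L is generated by a right-linear grammar.

2. The language L is generated by a left-linear grammar.

3. The language L is generated by a regular grammar.

In each case, L is a regular language.

[Example] L = {a^n b^m | n, m ≥ 1} : rl

S → aS | aA

A → bA | b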

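Reasons for using regular languages to define the structure of tokens

(1) The structure of tokens is simple enough to be expressed by a regular grammar.

(2) A more efficient recognizer can be built from a regular grammar than from a context-free grammar.

(3) The front end of a compiler can be organized modularly (Scanner + Parser).

If a grammar is a regular grammar, the form of the language it denotes can be derived systematically and written as a regular expression.

G ⇒ L (derivation) : if G is an rg, then L can be described by an re.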

Regular Expression

A notation that allows us to describe the structures of

sentences in regular language.

The methods for specifying the regular languages

(1) regular grammar(rg)

(2) regular expression(re)

(3) finite automata(fa)

[Figure: rg, re, and fa shown as three equivalent descriptions of a regular language.]

Definition : A regular expression over the alphabet T and the language denoted by that expression are defined recursively as follows :

I. Basis : ∅, ε, a ∈ T.

(1) ∅ is a regular expression denoting the empty set.

(2) ε is a regular expression denoting {ε}.

(3) a, where a ∈ T, is a regular expression denoting {a}.

II. Recursion : +, •, *

If P and Q are regular expressions denoting Lp and Lq respectively, then

(1) (P + Q) is a regular expression denoting Lp ∪ Lq. (union)

(2) (P • Q) is a regular expression denoting Lp•Lq. (concatenation)

(3) (P*) is a regular expression denoting {ε} ∪ Lp ∪ Lp^2 ∪ ... ∪ Lp^n ∪ ... (closure)

Note : precedence : + < • < *

III. Nothing else is a regular expression.

Other notations

(e1)•(e2) = (e1)(e2)

(e1)^+ = (e1)(e1)*

(e1)+(e2) = (e1)|(e2)

Regular expression examples

ab* denotes {ab^n | n ≥ 0}.

(0+1)* denotes {0,1}*.

(0+1)*011 denotes the set of all strings of 0s and 1s ending in 011.

Regular expression for an identifier : letter(letter+digit)*
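These examples can be tried with Python's standard `re` module. The sketch below is an illustration only: the textbook union operator + is written `|` in Python, and the character classes chosen for letter and digit (ASCII only) as well as the helper name `matches` are assumptions of this example.

```python
import re

# Textbook notation -> Python `re` syntax (union "+" becomes "|").
ENDS_IN_011 = re.compile(r"(0|1)*011")
AB_STAR = re.compile(r"ab*")
# letter(letter+digit)*, assuming ASCII letters and digits only
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def matches(pattern, text):
    """True if the whole string belongs to the language of the pattern."""
    return pattern.fullmatch(text) is not None

print(matches(ENDS_IN_011, "1010011"))  # True: the string ends in 011
print(matches(ENDS_IN_011, "0110"))     # False
print(matches(AB_STAR, "abbb"))         # True
print(matches(IDENTIFIER, "x25"))       # True
print(matches(IDENTIFIER, "2x"))        # False: must start with a letter
```

`fullmatch` is used rather than `search` because a regular expression in the textbook sense describes whole strings, not substrings.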

Definition : If α is a regular expression, L(α) denotes the language associated with α. (Text p.77)

Let α and β be regular expressions. Then,

(1) L(α + β) = L(α) ∪ L(β)

(2) L(αβ) = L(α) L(β)

(3) L(α*) = L(α)*

examples :

(1) L(a*) = {ε, a, aa, aaa, … } = {a^n | n ≥ 0}

(2) L((aa)*(bb)*b) = {a^(2n) b^(2m+1) | n, m ≥ 0}

(3) L((a+b)*b(a+ab)*) = { b, ba, bab, ab, bb, aab, bbb, … }   (Exercise 3.2 (3), Text p.115)

Definition : Two regular expressions are equal if and only if they denote the same language: α = β if L(α) = L(β).

Axioms : Some algebraic properties of regular expressions. Let α, β and γ be regular expressions. Then, (Text p.73)

A1. α + β = β + α
A2. (α + β) + γ = α + (β + γ)
A3. (αβ)γ = α(βγ)
A4. α(β + γ) = αβ + αγ
A5. (β + γ)α = βα + γα
A6. α + α = α
A7. α + ∅ = α
A8. α∅ = ∅ = ∅α
A9. εα = α = αε
A10. α* = ε + α•α*
A11. α* = (ε + α)*
A12. (α*)* = α*
A13. α* + α = α*
A14. α* + α^+ = α*
A15. (α + β)* = (α*β*)*
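An identity of this kind can be spot-checked by comparing both sides on every short string. The sketch below uses Python's `re` module and is only a bounded experiment, not a proof; the function name `same_language`, the choice α = a, β = b, and the length bound are assumptions of this example.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=6):
    """Compare two Python regexes on every string over `alphabet`
    of length 0..max_len. Agreement is evidence only, not a proof."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True

# A15 with α = a, β = b: (a + b)* = (a* b*)*
print(same_language(r"(a|b)*", r"(a*b*)*"))
# A13 with α = ab: (ab)* + ab = (ab)*
print(same_language(r"(ab)*|ab", r"(ab)*"))
# A non-identity for contrast: a* and aa* differ on the empty string
print(same_language(r"a*", r"aa*"))
```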

All of these identities (axioms) are easily proved from the definition of regular expressions.

A8. α∅ = ∅ = ∅α

(proof) α∅ = { xy | x ∈ Lα and y ∈ L∅ }.

Since L∅ is empty, y ∈ L∅ is false, so (x ∈ Lα and y ∈ L∅) is false.

Thus α∅ = ∅.

Definition : regular expression equations

::= a set of equations whose coefficients are regular expressions.

ex) If α and β are regular expressions, then X = αX + β is a regular expression equation. Here X stands for a nonterminal symbol, and the right-hand side describes the form of the language generated by that nonterminal.

▶ The solution of the regular expression equation X = αX + β.

When we substitute X = α*β into both sides of the equation, each side represents the same language:

αX + β = α(α*β) + β = αα*β + β = (αα* + ε)β = α*β.

Fixed-point iteration:

X = αX + β
  = α(αX + β) + β = α^2 X + αβ + β = α^2 X + (ε + α)β
  . . .
  = α^(k+1) X + (ε + α + α^2 + ... + α^k)β
  = (ε + α + α^2 + ... + α^k + ...)β
  = α*β.

Not all regular expression equations have a unique solution.

X = αX + β

(a) If ε is not in L(α), then X = α*β is the unique solution.

(b) If ε is in L(α), then X = α*(β + L) is a solution for any language L, so the equation has infinitely many solutions.

⇒ Smallest solution : X = α*β.

ex) X = X + a : no unique solution

⇒ X = a + b or X = b*a or X = (a + b)* etc.

Check X = a + b :  X + a = (a + b) + a = a + a + b = a + b.

Check X = b*a :    X + a = b*a + a = (b* + ε)a = b*a.

Finding a regular expression denoting L(G) for a given rg G.

L(A), where A ∈ VN, denotes the language generated by A.

By definition, if S is the start symbol, then L(G) = L(S).

Two steps :

1. Construct a set of simultaneous equations from G.

A → aB, A → a

L(A) = {a}·L(B) ∪ {a}, i.e. A = aB + a

In general, X → α | β | γ ⇒ X = α + β + γ.

2. Solve these equations.

X = αX + β ⇒ X = α*β.

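G ⇒ L (derivation) : if G is an rg, then L can be described by an re. (Text p.80)

step 1) Construct regular expression equations from the regular grammar:

X → α | β | γ ⇒ X = α + β + γ

step 2) Among these equations, replace each equation of the form X = αX + β by X = α*β.

step 3) Substitute the expression obtained for X in step 2 into the other equations. Whenever the form Y = αY + β appears again, replace it by Y = α*β.

step 4) Rewrite the equation for the start symbol in the form S = αS + β and solve it as S = α*β. Then α*β denotes the regular language L(G) generated by the regular grammar G.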

ex1) S → aS, S → bR, S → ε, R → aS

L(S) = {a}L(S) ∪ {b}L(R) ∪ {ε}

L(R) = {a}L(S)

ree: S = aS + bR + ε

     R = aS

S = aS + baS + ε

  = (a + ba)S + ε

  = (a + ba)*ε = (a + ba)*

ex2) S → aA | bB | b, A → bA | ε, B → bS

ree: S = aA + bB + b

     A = bA + ε ⇒ A = b*ε = b*

     B = bS

S = ab* + bbS + b

  = bbS + ab* + b

  = (bb)*(ab* + b)
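The two results can be cross-checked by comparing each grammar with its regular expression on all short strings. The following Python sketch is a bounded experiment under assumptions of this example: the grammar encoding (right-hand sides "aB", "a", or "" for ε, with single uppercase letters as nonterminals) and the helper names `derives` and `agrees` are hypothetical, and the expressions are rewritten in Python `re` syntax with `|` for union.

```python
import re
from itertools import product

def derives(grammar, start, word):
    """True if the regular grammar derives `word`.

    `grammar` maps a nonterminal to right-hand sides of the forms
    "aB", "a", or "" (for ε).
    """
    DONE = "$"                      # marks a derivation that ended with A -> a
    current = {start}
    for ch in word:
        nxt = set()
        for nt in current:
            for rhs in grammar.get(nt, ()):
                if len(rhs) == 2 and rhs[0] == ch:
                    nxt.add(rhs[1])
                elif rhs == ch:
                    nxt.add(DONE)
        current = nxt
    return any(nt == DONE or "" in grammar.get(nt, ()) for nt in current)

def agrees(grammar, start, pattern, alphabet="ab", max_len=7):
    """Compare the grammar with a Python regex on all strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if derives(grammar, start, s) != bool(re.fullmatch(pattern, s)):
                return False
    return True

EX1 = {"S": ["aS", "bR", ""], "R": ["aS"]}
EX2 = {"S": ["aA", "bB", "b"], "A": ["bA", ""], "B": ["bS"]}

print(agrees(EX1, "S", r"(a|ba)*"))        # ex1: (a + ba)*
print(agrees(EX2, "S", r"(bb)*(ab*|b)"))   # ex2: (bb)*(ab* + b)
```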


Recognizer

☞ A recognizer for a language L is a program that takes as input a string x and answers "yes" if x is a sentence of L and "no" otherwise.

[Figure: a recognizer, consisting of the input a0a1a2 … aiai+1ai+2 … an, an input head, a finite state control, and auxiliary storage.]

• Turing Machine

• Linear Bounded Automata

• Pushdown Automata

• Finite Automata

Finite Automata

rg : G = (VN, VT, P, S)

re : ∅, ε, a, +, •, *

fa : M = (Q, Σ, δ, q0, F)

Definition : fa

A finite automaton M over an alphabet Σ is a system (Q, Σ, δ, q0, F), where

Q : finite, non-empty set of states.

Σ : finite input alphabet.

δ : mapping function.

q0 ∈ Q : start (or initial) state.

F ⊆ Q : set of final states.

mapping δ : Q × Σ → 2^Q,

i.e. δ(q,a) = {p1, p2, ... , pn}

⇒ DFA, NFA.

Contents - FA

1. DFA

2. NFA

3. Converting NFA into DFA

4. Minimization of FA

5. Closure Properties of FA

1. Deterministic Finite Automata (DFA)

M is deterministic if δ(q,a) consists of one state.

We shall write "δ(q,a) = p" instead of δ(q,a) = {p} if M is deterministic.

If δ(q,a) always has exactly one member, we say that M is completely specified.

ex) DFA M = ({q0, q1, q2}, {a, b}, δ, q0, {q2})

δ(q0, a) = q1   δ(q0, b) = q2
δ(q1, a) = q2   δ(q1, b) = q0
δ(q2, a) = q0   δ(q2, b) = q1

The transition function written as a matrix is called the state-transition table:

δ    a    b
q0   q1   q2
q1   q2   q0
q2   q0   q1

When the input string aba is read starting from state q0:

δ(q0, aba) = δ(δ(q0, ab), a) = δ(δ(δ(q0, a), b), a) = δ(δ(q1, b), a) = δ(q0, a) = q1

Extension of δ : Q × Σ ⇒ Q × Σ*

δ(q, ε) = q

δ(q, xa) = δ(δ(q,x), a), where x ∈ Σ* and a ∈ Σ.

A sentence x is said to be accepted by M if δ(q0, x) = p for some p ∈ F.

The language accepted by M :

L(M) = { x | δ(q0,x) ∈ F }

ex) M = ( {p, q, r}, {0, 1}, δ, p, {r} )

δ : δ(p,0) = q   δ(p,1) = p
    δ(q,0) = r   δ(q,1) = p
    δ(r,0) = r   δ(r,1) = r

1001 ∈ L(M) ?

δ(p,1001) = δ(p,001) = δ(q,01) = δ(r,1) = r ∈ F.

∴ 1001 ∈ L(M).

1010 ∈ L(M) ?

δ(p,1010) = δ(p,010) = δ(q,10) = δ(p,0) = q ∉ F.

∴ 1010 ∉ L(M).

δ written in matrix form is the transition table. ex)

     Input symbols
δ    0    1
p    q    p
q    r    p
r    r    r

Definition : State (or Transition) diagram for an automaton.

The state diagram consists of a node for every state and a directed arc from state q to state p with label a if δ(q,a) = p.

Final states are indicated by a double circle and the initial state is marked by an arrow labeled start.

[Figure: state diagram of M. The start state p loops on 1 and moves to q on 0; q returns to p on 1 and moves to r on 0; the final state r loops on 0 and 1. The accepted language is (1+01)*00(0+1)*.]

[Figure: state diagram for an identifier. The start state S moves to the final state A on letter; A loops on letter and digit.]

Algorithm : w ∈ L(M) ?

assume M = (Q, Σ, δ, q0, F);

begin
  currentstate := q0; (* start state *)
  get(nextsymbol);
  while not eof do
  begin
    (* one move of M: apply δ to the current state and symbol *)
    currentstate := δ(currentstate, nextsymbol);
    get(nextsymbol)
  end;
  if currentstate in F then write('Valid String')
  else write('Invalid String');
end.
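The algorithm above translates almost line by line into Python. The sketch below is an illustration under assumptions of this example: the transition function is stored as a dictionary keyed by (state, symbol), the function name `dfa_accepts` is hypothetical, and the automaton is the three-state machine M = ({p, q, r}, {0, 1}, δ, p, {r}) from the earlier example.

```python
def dfa_accepts(delta, start, finals, word):
    """Run a completely specified DFA on `word` and report acceptance."""
    state = start
    for symbol in word:
        # exactly one next state exists for every (state, symbol) pair
        state = delta[(state, symbol)]
    return state in finals

# Transition table of M = ({p, q, r}, {0, 1}, δ, p, {r})
DELTA = {
    ("p", "0"): "q", ("p", "1"): "p",
    ("q", "0"): "r", ("q", "1"): "p",
    ("r", "0"): "r", ("r", "1"): "r",
}

print(dfa_accepts(DELTA, "p", {"r"}, "1001"))  # True
print(dfa_accepts(DELTA, "p", {"r"}, "1010"))  # False
```

The running time is proportional to the length of the input, since each symbol causes exactly one table lookup.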

2. Nondeterministic Finite Automata (NFA)

M is nondeterministic if δ(q,a) = {p1, p2, ..., pn}.

In state q, scanning input symbol a, M moves the input head one symbol to the right and chooses any one of p1, p2, ..., pn as the next state.

ex) NFA M = ( {q0,q1,q2,q3,qf}, {0,1}, δ, q0, {qf} )

δ     0          1
q0    {q1, q2}   {q1, q3}
q1    {q1, q2}   {q1, q3}
q2    {qf}       ∅
q3    ∅          {qf}
qf    {qf}       {qf}

If δ(q,a) = ∅, then δ(q,a) is undefined.

δ(q0,1001) = {q1,q3,qf}

To define the language recognized by an NFA, we must extend δ.

(i) δ : Q × Σ* → 2^Q

δ(q, ε) = {q}

δ(q, xa) = ∪ δ(p,a) over all p ∈ δ(q, x), where a ∈ Σ and x ∈ Σ*.

(ii) δ : 2^Q × Σ* → 2^Q

δ({p1, p2, ..., pk}, x) = δ(p1, x) ∪ δ(p2, x) ∪ ... ∪ δ(pk, x)

Definition : A sentence x is accepted by M if there is a state p in both F and δ(q0, x).

ex) 1011 ∈ L(M) ?

δ({q0}, 1011) = δ({q1,q3}, 011) = δ({q1,q2}, 11) = δ({q1,q3}, 1) = {q1,q3,qf}

∴ 1011 ∈ L(M) ( ∵ {q1,q3,qf} ∩ {qf} ≠ ∅ )

ex) 0100 ∈ L(M) ?
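The extended transition function on sets of states can be sketched directly in Python. This is a minimal illustration, assuming the NFA is stored as a dictionary from (state, symbol) to a set of states, with missing entries standing for ∅; the function name `nfa_accepts` is hypothetical.

```python
def nfa_accepts(delta, start, finals, word):
    """Simulate an NFA by tracking the set of states reachable so far."""
    current = {start}
    for symbol in word:
        # union of δ(q, symbol) over every state q reached so far
        current = {p for q in current for p in delta.get((q, symbol), set())}
    return bool(current & finals)

# The NFA M = ({q0,q1,q2,q3,qf}, {0,1}, δ, q0, {qf}) from the table
NFA = {
    ("q0", "0"): {"q1", "q2"}, ("q0", "1"): {"q1", "q3"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q1", "q3"},
    ("q2", "0"): {"qf"},
    ("q3", "1"): {"qf"},
    ("qf", "0"): {"qf"}, ("qf", "1"): {"qf"},
}

print(nfa_accepts(NFA, "q0", {"qf"}, "1011"))  # True
print(nfa_accepts(NFA, "q0", {"qf"}, "0100"))  # True
print(nfa_accepts(NFA, "q0", {"qf"}, "10"))    # False
```

Tracking a set of states instead of exploring one choice at a time avoids backtracking; it is the same idea that underlies the conversion of an NFA into a DFA.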

Nondeterministic behavior

If the number of states is |Q| = m and the input length is |x| = n, then there can be up to m^n nodes.

Exponential time → computationally intractable.

In general, an NFA cannot be easily simulated by a simple program, but a DFA can be simulated easily.

We shall see that a DFA can be constructed from the NFA.

Tree of moves of the NFA on input 1011:

(q0, 1011)
    (q1, 011)
        (q1, 11)
            (q1, 1) → q1, q3
            (q3, 1) → qf
        (q2, 11) → ∅
    (q3, 011) → ∅

3. Converting NFA into DFA

NFA : easily describes the real world.

DFA : easily simulated by a simple program.

⇒ Fortunately, for each NFA we can find a DFA accepting the same language.

Accepting sequence (NFA)

δ(q0, a1a2...an) = δ({q1,q2, …, qi}, a2a3...an)
                 = ...
                 = δ({p1,p2, …, pj}, ai...an)
                 = ...
                 = {r1,r2, ..., rk}

Since the states of the DFA represent subsets of the set of all states of the NFA, this algorithm is often called the subset construction.

[Theorem] Let L be a language accepted by an NFA. Then there exists a DFA which accepts L. (Text p.86)

(proof) Let M = (Q, Σ, δ, q0, F) be an NFA accepting L.

Define the DFA M' = (Q', Σ, δ', q0', F') such that

(1) Q' = 2^Q, i.e. {q1, q2, ..., qi} ∈ Q', where each qi ∈ Q. We denote an element of Q' by [q1, q2, ..., qi].

(2) q0' = {q0} = [q0]

(3) F' = {[r1, r2, ..., rk] | some ri ∈ F}

(4) δ' : δ'([q1, q2, ..., qi], a) = [p1, p2, ..., pj] if δ({q1, q2, ..., qi}, a) = {p1, p2, ..., pj}.

Now we must prove that L(M) = L(M'), i.e.

δ'(q0', x) ∈ F' ⟺ δ(q0, x) ∩ F ≠ ∅.

This can be shown easily by induction on the length of the input string x.

ex1) NFA M = ({q0,q1}, {0,1}, δ, q0, {q1}), where δ is

δ     0          1
q0    {q0, q1}   {q0}
q1    ∅          {q0, q1}

DFA M' = (Q', Σ, δ', q0', F'),

where Q' = 2^Q = {[q0], [q1], [q0,q1]}  (omitting ∅)

q0' = [q0]

F' = {[q1], [q0,q1]}

δ' : δ'([q0],0) = δ({q0},0) = {q0,q1} = [q0,q1]
     δ'([q0],1) = δ({q0},1) = {q0} = [q0]
     δ'([q1],0) = δ(q1,0) = ∅
     δ'([q1],1) = δ(q1,1) = {q0,q1} = [q0,q1]
     δ'([q0,q1],0) = δ({q0,q1},0) = {q0,q1} = [q0,q1]
     δ'([q0,q1],1) = δ({q0,q1},1) = {q0,q1} = [q0,q1]

State renaming : [q0] = A, [q1] = B, [q0,q1] = C.

δ'   0    1
A    C    A
B    ∅    C
C    C    C

[Figure: state diagram of M'. The start state A loops on 1 and moves to C on 0; C loops on 0 and 1; B moves to C on 1.]

Since B is an inaccessible state, it can be removed.

[Figure: the same diagram with states A and C only.]

Definition : we call a state p accessible if there is a string w such that (q0, w) ⊢* (p, ε), where q0 is the initial state.

ex2) NFA ⇒ DFA

NFA :
δ     0          1
q0    {q1,q2}    {q1,q3}
q1    {q1,q2}    {q1,q3}
q2    {qf}       ∅
q3    ∅          {qf}
qf    {qf}       {qf}

DFA :
δ'            0             1
[q0]          [q1,q2]       [q1,q3]
[q1,q2]       [q1,q2,qf]    [q1,q3]
[q1,q3]       [q1,q2]       [q1,q3,qf]
[q1,q2,qf]    [q1,q2,qf]    [q1,q3,qf]
[q1,q3,qf]    [q1,q2,qf]    [q1,q3,qf]
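The DFA table of ex2 can be reproduced mechanically. The following Python sketch of the subset construction is an illustration under assumptions of this example: it generates only the accessible subsets (rather than all of 2^Q), represents DFA states as frozensets, leaves out the empty set, and uses the hypothetical name `subset_construction`.

```python
from collections import deque

def subset_construction(delta, start, alphabet):
    """Build the accessible part of the DFA for an NFA without ε-moves.

    DFA states are frozensets of NFA states; the empty set is left out,
    so a missing table entry means "undefined".
    """
    start_set = frozenset([start])
    table = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            target = frozenset(p for q in current for p in delta.get((q, a), ()))
            if not target:
                continue
            table[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, table

NFA = {
    ("q0", "0"): {"q1", "q2"}, ("q0", "1"): {"q1", "q3"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q1", "q3"},
    ("q2", "0"): {"qf"},
    ("q3", "1"): {"qf"},
    ("qf", "0"): {"qf"}, ("qf", "1"): {"qf"},
}

states, table = subset_construction(NFA, "q0", "01")
for s in sorted(states, key=lambda s: (len(s), sorted(s))):
    row = {a: sorted(table[(s, a)]) for a in "01" if (s, a) in table}
    print(sorted(s), row)
```

Generating only accessible subsets keeps the result small here: five DFA states instead of the 2^5 subsets of Q.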

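Definition : ε-NFA M = (Q, Σ, δ, q0, F), where δ : Q × (Σ ∪ {ε}) → 2^Q.

ε-CLOSURE : the set of states that can be reached by following ε-arcs only.

When s is a single state:

ε-CLOSURE(s) = {s} ∪ {q | q ∈ δ(p, ε), p ∈ ε-CLOSURE(s)}

When T is a set of one or more states:

ε-CLOSURE(T) = ∪ ε-CLOSURE(q) over all q ∈ T

ex) Computing ε-CLOSURE in an ε-NFA:

ε-CLOSURE(A) = {A, B, D}

ε-CLOSURE({A,C}) = ε-CLOSURE(A) ∪ ε-CLOSURE(C) = {A, B, C, D}

[Figure: ε-NFA with states A (start), B, C, D and arcs labeled a, b, and ε.]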

ex) ε-NFA ⇒ DFA

[Figure: ε-NFA with states 1 (start), 2, 3, 4 and arcs labeled a, b, c, and ε.]

Start state : ε-CLOSURE(1) = {1,3,4}

δ'         a                     b                     c
[1,3,4]    ε-CLOSURE(2) = [2]    ∅                     ε-CLOSURE(3) = [3,4]
[2]        ∅                     ε-CLOSURE(4) = [4]    ∅
[3,4]      ∅                     ∅                     ε-CLOSURE(3) = [3,4]
[4]        ∅                     ∅                     ∅

State renaming : A = [1,3,4], B = [2], C = [3,4], D = [4]

[Figure: resulting DFA with states A (start), B, C, D and arcs labeled a, b, and c.]
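The closures used in this example can be computed with a small worklist routine. The Python sketch below is an illustration only: the ε-arcs (1 to 3 and 3 to 4) are an assumption inferred from the closures listed in the example, since the figure itself is not reproduced, and the function name `eps_closure` is hypothetical.

```python
def eps_closure(states, eps):
    """All states reachable from `states` through ε-arcs only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

# Assumed ε-arcs for the four-state example: 1 -ε-> 3 and 3 -ε-> 4
EPS = {1: {3}, 3: {4}}

print(sorted(eps_closure({1}, EPS)))     # [1, 3, 4]
print(sorted(eps_closure({2}, EPS)))     # [2]
print(sorted(eps_closure({3}, EPS)))     # [3, 4]
print(sorted(eps_closure({2, 3}, EPS)))  # [2, 3, 4]
```

Passing a set of states covers both definitions at once: the closure of a set is the union of the closures of its members.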

4. Minimization of FA

State minimization ⇒ state merging

Definition : ω ∈ Σ* distinguishes q1 from q2 if δ(q1,ω) = q3, δ(q2,ω) = q4 and exactly one of q3, q4 is in F.

Algorithm : equivalence relation (≡) ⇒ partition.

(1) Partition the states into final and non-final states.

(2) For each input symbol, check whether states of the same class move to different equivalence classes. If they do, the states are said to be distinguished by that symbol.

  :

(3) Repeat until no further partitioning occurs.

The states that cannot be distinguished are merged into a single state.

ex)

[Figure: state diagram of an fa with states A, B, C, D, E, F and arcs labeled a and b.]

{A,F}, {B,C,D,E} : first, split into final and non-final states.

{A,F}, {B,E}, {C,D} : {B,C,D,E} is partitioned by an input symbol.

{A,F}, {B,E}, {C,D} : no further partitioning occurs.

δ'     a      b
[AF]   [AF]   [BE]
[BE]   [BE]   [CD]
[CD]   [CD]   [AF]

How to minimize the number of states in an fa.

<step 1> Delete all inaccessible states;

<step 2> Construct the equivalence relation ≡;

<step 3> Construct the fa M' = (Q', Σ, δ', q0', F'), where

(a) Q' : set of equivalence classes under ≡. Let [p] be the equivalence class of state p under ≡.

(b) δ'([p],a) = [q] if δ(p,a) = q.

(c) q0' is [q0].

(d) F' = {[q] | q ∈ F}.

Definition : M is said to be reduced if

(1) no state in Q is inaccessible and

(2) no two distinct states of Q are indistinguishable.

ex) Find the minimum-state finite automaton for the language specified by the finite automaton M = ({A,B,C,D,E,F}, {0,1}, δ, A, {E,F}), where δ is given by

δ    0    1
A    B    C
B    E    F
C    A    A
D    F    E
E    D    F
F    D    E

Initial partition : {A, B, C, D}, {E, F}

Final partition : {A}, {C}, {B, D}, {E, F}

δ'            0     1
[A]  = S1     S3    S2
[C]  = S2     S1    S1
[BD] = S3     S4    S4
[EF] = S4     S3    S4
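The partitioning procedure can be sketched in Python as repeated refinement. This is a minimal illustration, not the textbook's code: it assumes a completely specified DFA whose inaccessible states have already been removed, the function name `minimize` is hypothetical, and the transition table below is a reading of the six-state example (states A to F, final states E and F) that is consistent with the equivalence classes given above.

```python
def minimize(states, alphabet, delta, finals):
    """Refine the partition {finals, non-finals} until it is stable."""
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    while True:
        def block_index(state):
            return next(i for i, block in enumerate(partition) if state in block)
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # states stay together only if every symbol leads to the same class
                signature = tuple(block_index(delta[(s, a)]) for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):
            return refined
        partition = refined

# Assumed transition table of the example (columns 0 and 1)
DELTA = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "E", ("B", "1"): "F",
    ("C", "0"): "A", ("C", "1"): "A",
    ("D", "0"): "F", ("D", "1"): "E",
    ("E", "0"): "D", ("E", "1"): "F",
    ("F", "0"): "D", ("F", "1"): "E",
}

blocks = minimize("ABCDEF", "01", DELTA, "EF")
print(sorted("".join(sorted(b)) for b in blocks))  # ['A', 'BD', 'C', 'EF']
```

Because refinement only ever splits classes, the loop can stop as soon as a pass produces the same number of classes as before.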

5. Closure properties of FA

[Theorem] If L1 and L2 are finite automaton languages (FAL), then so are (i) L1 ∪ L2, (ii) L1 • L2, and (iii) L1*.

(proof) Let M1 = (Q1, Σ, δ1, q1, F1) and M2 = (Q2, Σ, δ2, q2, F2), with Q1 ∩ Q2 = ∅ (∵ renaming).

(i) M = (Q1 ∪ Q2 ∪ {q0}, Σ, δ, q0, F)

where (1) q0 is a new state.

(2) F = F1 ∪ F2 if ε ∉ L1 ∪ L2,
      = F1 ∪ F2 ∪ {q0} if ε ∈ L1 ∪ L2.

(3) (a) δ(q0,a) = δ1(q1,a) ∪ δ2(q2,a) for all a ∈ Σ.
    (b) δ(q,a) = δ1(q,a) for all q ∈ Q1, a ∈ Σ.
    (c) δ(q,a) = δ2(q,a) for all q ∈ Q2, a ∈ Σ.

A new start state is created and connected to each fa as if its arcs came from that fa's own start state. If ε is accepted, the new start state is also made a final state.

ex) Text p.105 [Example 28]

(ii) M = (Q1 ∪ Q2, Σ, δ, q0, F), where q0 = q1 (the start state of M1).

(1) F = F2 if q2 ∉ F2,
      = F1 ∪ F2 if q2 ∈ F2.

(2) (a) δ(q,a) = δ1(q,a) for all q ∈ Q1 - F1.
    (b) δ(q,a) = δ1(q,a) ∪ δ2(q2,a) for all q ∈ F1.
    (c) δ(q,a) = δ2(q,a) for all q ∈ Q2.

The final states of M1 are connected as if their arcs came from the start state of M2, and the start state of M1 becomes the start state of the combined automaton.

[Figure: M1 with states A and B accepts 01*; M2 with states X and Y accepts 10*; M1•M2 with states A, B, and Y accepts 01*10*.]

Properties of Regular Languages

[Figure: regular grammar (rg), finite automata (fa), and regular expression (re) as equivalent descriptions.]

※ re ⇒ fa : scanner generator

Contents

1. RG & FA

2. FA & RE

3. Closure Properties of Regular Language

4. The Pumping Lemma for Regular Language

1. RG & FA

Given an rg, there exists an fa that accepts the language generated by the rg, and vice versa.

rg ⇒ fa

Given rg G = (VN, VT, P, S), construct M = (Q, Σ, δ, q0, F).

(1) Q = VN ∪ {f}, where f is a new final state.

(2) Σ = VT.

(3) q0 = S.

(4) F = {f} if ε ∉ L(G),
      = {S, f} otherwise.

(5) δ : if A → aB ∈ P then δ(A,a) ∋ B.
        if A → a ∈ P then δ(A,a) ∋ f.

(proof)

If ω is accepted by the fa, then it is accepted in some sequence of moves through states, ending in f.

But if δ(A,a) ∋ B and B ≠ f, then A → aB is a production.

Also, if δ(A,a) ∋ f, then A → a is a production.

So we can use the same series of productions to generate ω in G.

Thus S ⇒* ω.

ex) Text p.107 [Example 29]

rg G = ({S, B}, {0, 1}, P, S)

P : S → 0S
    S → 1B
    S → 0
    S → 1
    B → 0S
    B → 0

fa M = (Q, Σ, δ, q0, F)

Q : VN ∪ {f} = {S, B, f}

Σ : VT = {0, 1}

q0 : S

F : {f}

δ :
δ    0         1
S    {S, f}    {B, f}
B    {S, f}    ∅
f    ∅         ∅
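The rg ⇒ fa construction can be sketched in a few lines of Python. This is an illustration under assumptions of this example: productions are given as (left side, right side) pairs with right sides "aB", "a", or "" for ε, the new final state is named `f` as in the slides, and the function name `grammar_to_nfa` is hypothetical.

```python
def grammar_to_nfa(productions, start, final="f"):
    """Build (delta, start, finals) from productions A -> aB, A -> a, A -> ε.

    `final` names the new final state; it must not clash with a nonterminal.
    """
    delta = {}
    finals = {final}
    for lhs, rhs in productions:
        if rhs == "":
            finals.add(lhs)          # ε-production: its left side also becomes final
        else:
            target = rhs[1] if len(rhs) == 2 else final
            delta.setdefault((lhs, rhs[0]), set()).add(target)
    return delta, start, finals

# Productions of the grammar G = ({S, B}, {0, 1}, P, S)
P = [("S", "0S"), ("S", "1B"), ("S", "0"), ("S", "1"), ("B", "0S"), ("B", "0")]

delta, q0, F = grammar_to_nfa(P, "S")
for key in sorted(delta):
    print(key, sorted(delta[key]))
```

The result is in general an NFA, because a nonterminal may have both A → aB and A → a for the same terminal a.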

fa ⇒ rg

Given M = (Q, Σ, δ, q0, F), construct G = (VN, VT, P, S).

(1) VN = Q

(2) VT = Σ

(3) S = q0

(4) P : if δ(q,a) = r then q → ar.
        if p ∈ F then p → ε.

ex)

[Figure: state diagram with start state p, state q, and final state r; L(p) = (1+01)*00(0+1)*.]

p → 1p | 0q

q → 1p | 0r

r → 0r | 1r | ε
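The reverse direction is equally short. The Python sketch below is an illustration only, assuming a DFA stored as a dictionary from (state, symbol) to state and single-character state names; the function name `dfa_to_grammar` is hypothetical, and the empty string stands for ε.

```python
def dfa_to_grammar(delta, finals):
    """Turn each move δ(q, a) = r into q -> ar and each final state p into p -> ε."""
    grammar = {}
    for (q, a), r in delta.items():
        grammar.setdefault(q, []).append(a + r)
    for p in finals:
        grammar.setdefault(p, []).append("")   # "" stands for ε
    return grammar

# The DFA with states p, q, r and final state r
DELTA = {
    ("p", "0"): "q", ("p", "1"): "p",
    ("q", "0"): "r", ("q", "1"): "p",
    ("r", "0"): "r", ("r", "1"): "r",
}

grammar = dfa_to_grammar(DELTA, {"r"})
for lhs in "pqr":
    print(lhs, "->", " | ".join(rhs or "ε" for rhs in grammar[lhs]))
```

The order of the alternatives in the printed productions follows the order of the dictionary entries, so it may differ from the order used on the slide.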

2. FA & RE

fa ⇒ rg ⇒ re

ex) Text p.126, 3.10 (1)

[Figure: state diagram with states A (start), B, C, D and arcs labeled a and b.]

A = bA + aB
B = aB + bC
C = aB + bD
D = aB + bA + ε = A + ε

A = (a+b)*abb

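re ⇒ fa (※ scanner generator)

For each component, we construct an fa inductively :

1. basis

   ε : [Figure: start state i and final state f joined by an arc labeled ε.]

   a : [Figure: start state i and final state f joined by an arc labeled a.]

2. induction - combine the components.

   (1) N1 + N2 : [Figure: a new start state i with ε-arcs into N1 and N2, and ε-arcs from N1 and N2 into a new final state f.]

   (2) N1 • N2 : [Figure: N1 and N2 connected in sequence between the start state i and the final state f.]

   (3) N* : [Figure: a new start state i and a new final state f connected to N by ε-arcs, with further ε-arcs that allow N to be skipped or repeated.]

ex) Text p.112 [Example 31]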

Definition : The size of a regular expression is the number of operations and operands in the expression.

ex) size(ab + c*) = 6

decomposition :

R1 = a, R2 = b, R3 = R1 • R2, R4 = c, R5 = R4*, R6 = R3 + R5

The number of states is at most twice the size of the expression.

(∵ each operand introduces two states and each operator introduces at most two states.)

The number of arcs is at most four times the size of the expression.

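Simplifications : Text p.113

※ Two states connected by an ε-arc can be treated as a single state if no other arc leaves the source state.

[Figure: states A, B, C with an ε-arc from A to B and an arc labeled a from B to C, simplified to states A and B joined by an arc labeled a.]

※ When two paths lead to the same place, they are simplified as follows.

[Figure: states S, A, B, C, D, F with ε-arcs and arcs labeled a and b, simplified to S and F joined by an arc labeled a, b.]

※ The case that recognizes a* is simplified as follows.

[Figure: states A, B, C with ε-arcs and an arc labeled a, simplified to a single state A with a self-loop labeled a.]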

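re ⇒ ε-NFA (simplification) ⇒ DFA

ex) Text p.115 [Example 33] : (ab)*(ba)*

[Figure: ε-NFA with states 1, 2, 3, 4. Arcs: 1 to 2 on a, 2 to 1 on b, 1 to 3 on ε, 3 to 4 on b, 4 to 3 on a.]

δ'       a      b
[1,3]    [2]    [4]
[2]      ∅      [1,3]
[3]      ∅      [4]
[4]      [3]    ∅

[Figure: the resulting DFA with the same four states and arcs labeled a and b.]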

The following statements are equivalent :

1. L is generated by some regular grammar.

2. L is recognized by some finite automata.

3. L is described by some regular expression.

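Text p.127, 3.14

(3) a(a+b)*b(a+b)*a(a+b)*b(a+b)*

[Figure: state diagram with states S, W, X, Y, Z and arcs labeled a, b, and "a, b".]

(1) (b + a(aa*b)*b)*

[Figure: state diagram with states X, Y, Z and arcs labeled a and b.]

(2) (b + aa + ac + aaa + aac)*

[Figure: state diagram with states X, Y, Z and arcs labeled a, b, and "a, c".]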

3. Closure Properties of Regular Language

[Theorem] If L1 and L2 are regular languages,

then so are

(i) L1 U L2 ,

(ii) L1L2, and

(iii) L1*.

(proof) (ii) Since L1 and L2 are rl, there exist rg G1 = (VN1, VT1, P1, S1) and rg G2 = (VN2, VT2, P2, S2) such that L(G1) = L1 and L(G2) = L2.

Construct G = (VN1 ∪ VN2, VT1 ∪ VT2, P, S1), in which P is defined as follows :

(1) If A → aB ∈ P1, then A → aB ∈ P.

(2) If A → a ∈ P1, then A → aS2 ∈ P.

(3) All productions in P2 are in P.

We must prove that L(G) = L(G1) • L(G2).

Since G is an rg, L(G) is rl. Therefore L(G1) • L(G2) is rl.

ex) P1 : S → aS | bA, A → aA | a

    P2 : X → 0X | 1Y, Y → 0Y | 1

    P : S → aS | bA, A → aA | aX, X → 0X | 1Y, Y → 0Y | 1

(iii) Since L is rl, there exists an rg G = (VN, VT, P, S) such that L(G) = L.

Let G' = (VN ∪ {S'}, VT, P', S'), where

P' : (1) If A → aB ∈ P, then A → aB ∈ P'.

     (2) If A → a ∈ P, then A → a ∈ P' and A → aS' ∈ P'.

     (3) S' → S | ε ∈ P'.

We must prove that L(G') = (L(G))*.

For ω ∈ L(G), S ⇒* ω. Then S' ⇒ S ⇒* ωS' ⇒* ω*S' ⇒ ω*.

∴ (L(G))* = L(G').

ex) P : S → aS, S → b

    P' : S → aS, S → b, S → bS', S' → S, S' → ε

note P : S = aS + b = a*b

     P' : S = aS + b + bS' = a*(b + bS') = a*b + a*bS'

     ∴ S' = S + ε
          = a*bS' + a*b + ε
          = (a*b)*(a*b + ε)
          = (a*b)*(a*b) + (a*b)* = (a*b)*

4. The Pumping Lemma for Regular Language

It is useful in proving certain languages not to be regular.

[Theorem] Let L be a regular language. There exists a constant p such that if a string ω is in L and |ω| ≥ p, then ω can be written as xyz, where 0 < |y| ≤ p and xy^i z ∈ L for all i ≥ 0.

(proof) Let M = (Q, Σ, δ, q0, F) be an fa with n states such that L(M) = L.

Let p = n. If ω ∈ L and |ω| ≥ n, then consider the sequence of configurations entered by M in accepting ω. Since there are at least n+1 configurations in the sequence, there must be two with the same state among the first n+1 configurations.

Thus we have a sequence of moves such that

δ(q0, xyz) = δ(q1, yz) = δ(q1, z) = qf ∈ F for some q1.

But then, δ(q0, xy^i z) = δ(q1, y^i z) = δ(q1, y^(i-1) z) = ... = δ(q1, z) = qf ∈ F.

Since ω = xyz ∈ L, xy^i z ∈ L for all i ≥ 0.

[Figure: path from q0 to q1 labeled x, a loop at q1 labeled y, and a path from q1 to qf labeled z.]

Consequently, we say that "finite automata cannot count", meaning that they cannot accept a language which requires them to count a number exactly.

ex) L = {0^n 1^n | n ≥ 1} is not type 3.

(Proof)

Suppose that L is regular.

Then for a sufficiently large n, 0^n 1^n can be written as xyz such that y ≠ ε and xy^i z ∈ L for all i ≥ 0.

If y ∈ 0+ or y ∈ 1+, then xz = xy^0 z ∉ L.

If y ∈ 0+1+, then xy^i z ∉ L for i ≥ 2.

We have a contradiction, so L cannot be regular.
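The case analysis in this proof can be checked by brute force for small n. The Python sketch below is a bounded experiment, not a proof: it tries every split of 0^n 1^n into xyz with y ≠ ε and confirms that some pumped string falls outside L. The function names `in_language` and `every_split_fails` and the bound on i are assumptions of this example.

```python
def in_language(s):
    """Membership test for L = {0^n 1^n | n >= 1}."""
    n = len(s) // 2
    return n >= 1 and s == "0" * n + "1" * n

def every_split_fails(n, max_i=3):
    """Check that no split w = xyz of w = 0^n 1^n with y != ε can be pumped.

    For each split, some i in 0..max_i must give x y^i z outside L.
    """
    w = "0" * n + "1" * n
    for start in range(len(w)):
        for end in range(start + 1, len(w) + 1):
            x, y, z = w[:start], w[start:end], w[end:]
            if all(in_language(x + y * i + z) for i in range(max_i + 1)):
                return False      # this split would survive pumping
    return True

print(all(every_split_fails(n) for n in range(1, 8)))
```

Trying i = 0 covers the cases where y lies entirely within the 0s or the 1s, and i = 2 covers the case where y spans the boundary, which mirrors the two cases of the proof.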

a^n c b^n : not rl

a^n c b^m : rl

Solution to Exercise 3.5 (4) (Text p.123)

A = aB + bA ……… (1)
B = aB + bC ……… (2)
C = bD + aB ……… (3)
D = bA + aB + ε ……… (4)

In equation (4), bA + aB = aB + bA = A, so

D = A + ε ……… (5)

Substituting (5) into (3):

C = b(A + ε) + aB = bA + aB + b = A + b ……… (6)

Substituting (6) into (2):

B = aB + b(A + b) = aB + bA + bb = A + bb ……… (7)

Substituting (7) into (1):

A = aB + bA = a(A + bb) + bA = aA + abb + bA = (a + b)A + abb = (a+b)*abb

L(G) = (a+b)*abb
