Automata 4

CS138, Wim van Dam, UCSB Automata and Formal Languages CS138, Winter 2006 Wim van Dam Room 5109, Engr. I [email protected] http://www.cs.ucsb.edu/~vandam/


automata and formal languages slides


Page 1: Automata 4

CS138, Wim van Dam, UCSB

Automata and Formal Languages

CS138, Winter 2006

Wim van Dam

Room 5109, Engr. I

[email protected]

http://www.cs.ucsb.edu/~vandam/

Page 2: Automata 4


Formalities

New homework has been announced and is due Monday January 30, 11:30 in CS 138 homework box.

Questions?

Page 3: Automata 4


Transitions of (N)FA

For deterministic Finite Automata, each input has a unique path of Q-states that the automaton goes through.

For nondeterministic Finite Automata, each input has a set of paths of Q-states that the automaton could go through.

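For an FA, the function δ: Q×Σ → Q is defined for all x∈Σ.

For an NFA, the function δ: Q×Σ → P(Q) (with P(Q) the power set of Q) might have δ(q,x) = ∅. Hence:

[Figure: two example state diagrams with arrows labeled 0 and 1, one captioned "Deterministic" and one captioned "Nondeterministic"]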

Page 4: Automata 4


FA = NFA

Theorem 1.39: For every language L that is accepted by a nondeterministic finite automaton, there is a (deterministic) finite automaton that accepts L as well.

FA and NFA are equivalent computational models.

Proof idea: When keeping track of a nondeterministic computation of an NFA N we use many ‘fingers’ to point at the subset ⊆ Q of states of N that can be reached on a given input string. We can simulate this computation with a deterministic automaton M with state space P(Q), the power set of Q.

Page 5: Automata 4


FA=NFA Proof

More formal proof of Theorem 1.39: Let A be the language recognized by the NFA N = (Q,Σ,δ,q0,F). Define the deterministic finite automaton M = (Q’,Σ,δ’,q’0,F’) by

1. Q’ = P(Q)
2. δ’(R,a) = { q∈Q | q∈δ(r,a) for an r∈R }
3. q’0 = {q0}
4. F’ = { R∈Q’ | R contains an ‘accept state’ of N }

This almost works, except for the ε-arrows. Define:
- E(R) = { q | q reachable from R using ε* steps }
- δ’(R,a) = { q∈Q | q∈E(δ(r,a)) for an r∈R }
- q’0 = E({q0})
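The construction above can be sketched in Python. This is a minimal illustration, not part of the original slides: the dictionary encoding of δ, the use of the empty string for ε-arrows, and the example NFA are all assumptions made here. For brevity it builds only the subsets reachable from q’0 rather than all of P(Q).

```python
EPS = ""  # label used in this sketch for ε-arrows (an assumption)

def eps_closure(states, delta):
    """E(R): all states reachable from R using zero or more ε-arrows."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def nfa_to_dfa(alphabet, delta, q0, accept):
    """Subset construction, restricted to the subsets reachable from E({q0})."""
    start = eps_closure({q0}, delta)
    dfa_delta, todo, seen = {}, [start], {start}
    while todo:
        R = todo.pop()
        for a in alphabet:
            moved = set()
            for r in R:
                moved |= delta.get((r, a), set())
            S = eps_closure(moved, delta)       # δ'(R,a)
            dfa_delta[(R, a)] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    dfa_accept = {R for R in seen if R & accept}  # R contains an accept state of N
    return seen, dfa_delta, start, dfa_accept

def dfa_accepts(dfa, word):
    """Run the deterministic automaton on word."""
    _, dfa_delta, state, dfa_accept = dfa
    for a in word:
        state = dfa_delta[(state, a)]
    return state in dfa_accept

# Hypothetical NFA: bit strings whose second-to-last symbol is 1.
nfa_delta = {("a", "0"): {"a"}, ("a", "1"): {"a", "b"},
             ("b", "0"): {"c"}, ("b", "1"): {"c"}}
dfa = nfa_to_dfa("01", nfa_delta, "a", {"c"})

# Hypothetical NFA with an ε-arrow from s to t, accepting a*.
eps_delta = {("s", EPS): {"t"}, ("t", "a"): {"t"}}
eps_dfa = nfa_to_dfa("a", eps_delta, "s", {"t"})
```

Each DFA state is a frozenset of NFA states, which is the ‘many fingers’ picture made literal.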

Page 6: Automata 4


FA=NFA Proof (cont.)

It is easy to see that the previously described deterministic finite automaton M accepts the same language as N.

See Example 1.41 for the construction in action.

Because FA are a subset of NFA, we have proven:

Corollary 1.40: A language is regular if and only if it is accepted by a nondeterministic finite automaton.

Page 7: Automata 4


Closure under Regular Operations

With NFA it is much simpler to prove the various closure properties of the regular languages.

We will show the closure of regular languages under the union, concatenation and star * operation by construction.

Theorem 1.45: RLs are closed under the union operation.
Given NFAs N1 and N2 that accept L1 and L2, make an NFA N (using N1 and N2) that accepts L1 ∪ L2.

Theorem 1.47: RLs are closed under concatenation.
… make an NFA N that accepts L1 • L2.

Theorem 1.49: RLs are closed under the star * operation.
… make an NFA N that accepts (L1)*.

Page 8: Automata 4


Union Closure

Construction of Theorem 1.45: Given two NFAs N1 and N2, put them ‘in parallel’ to recognize the language L(N1) ∪ L(N2):

[Figure: N1 and N2 placed in parallel, joined by two ε-arrows]
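A minimal Python sketch of the ‘in parallel’ construction follows. It assumes the usual reading of the figure (a fresh start state with one ε-arrow into each machine); the tuple encoding of an NFA, the state name "start", and the two example machines are hypothetical choices made here.

```python
def nfa_union(n1, n2):
    """Each NFA is (states, delta, start, accept); delta maps
    (state, symbol) -> set of states, with "" standing for ε."""
    def tag(nfa, t):
        # Rename every state q to (t, q) so the two machines cannot clash.
        states, delta, start, accept = nfa
        return ({(t, q) for q in states},
                {((t, q), a): {(t, r) for r in rs} for (q, a), rs in delta.items()},
                (t, start),
                {(t, q) for q in accept})
    s1, d1, q1, f1 = tag(n1, 1)
    s2, d2, q2, f2 = tag(n2, 2)
    new_start = "start"
    # The two ε-arrows from the new start state into N1 and N2.
    delta = {**d1, **d2, (new_start, ""): {q1, q2}}
    return s1 | s2 | {new_start}, delta, new_start, f1 | f2

def nfa_accepts(nfa, word):
    """Simulate the NFA by tracking the set of currently reachable states."""
    _, delta, start, accept = nfa
    def closure(S):
        S, stack = set(S), list(S)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), set()):
                if r not in S:
                    S.add(r)
                    stack.append(r)
        return S
    current = closure({start})
    for a in word:
        current = closure({r for q in current for r in delta.get((q, a), set())})
    return bool(current & accept)

# Hypothetical examples: N1 accepts 0*, N2 accepts exactly the string "1".
n1 = ({"p"}, {("p", "0"): {"p"}}, "p", {"p"})
n2 = ({"u", "v"}, {("u", "1"): {"v"}}, "u", {"v"})
n = nfa_union(n1, n2)
```

The concatenation and star constructions can be written the same way by adding ε-arrows from accept states instead of from a new start state.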

Page 9: Automata 4


Concatenation Closure

Theorem 1.47: Given two NFAs N1 and N2, put them ‘in sequence’ to recognize the concatenation L(N1)•L(N2):

[Figure: N1 followed by N2, connected by ε-arrows]

Page 10: Automata 4


Star Operation Closure

Construction of Theorem 1.49: Given an NFA N1, make a ‘loop’ to recognize the language L(N1)*:

[Figure: N1 with additional ε-arrows forming a loop]

Page 11: Automata 4


Question Time

What about complements?

Page 12: Automata 4


Regular Expressions (Def. 1.52)

Given an alphabet Σ, R is a regular expression if:

1. R = a, with a∈Σ
2. R = ε
3. R = ∅
4. R = (R1∪R2), with R1 and R2 regular expressions
5. R = (R1•R2), with R1 and R2 regular expressions
6. R = (R1*), with R1 a regular expression

Page 13: Automata 4


Reading Regular Expressions

Assume for the moment that Σ={a,b,c}.

- We allow ourselves to write Σ instead of ((a∪b)∪c). So, ((Σ*)•b) stands for the set of strings ending with a “b”.
- R+ is a shorthand for RR*.
- Just as with multiplication, you can drop the concatenation symbol “•”: “a•b” equals “ab”.
- Just as with arithmetic you can drop some parentheses. We have the precedence order: star, concatenation, union.

Hence: aa* equals a(a*), which does not equal (aa)*.

Also, 01∪10 equals (01)∪(10), which does not equal 0(1∪1)0.

And 01* equals 0(1*) and not (01)*.
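These precedence rules can be checked with Python's `re` module, which follows the same order (star, then concatenation, then union) but writes union as `|`. This is only an illustration; the helper name `matches` is introduced here.

```python
import re

def matches(pattern, s):
    """True if the whole string s is described by the pattern."""
    return re.fullmatch(pattern, s) is not None

checks = {
    "aa* is a(a*)":       matches(r"aa*", "aaa") and not matches(r"aa*", ""),
    "(aa)* differs":      matches(r"(aa)*", "") and not matches(r"(aa)*", "aaa"),
    "01|10 is (01)|(10)": matches(r"01|10", "10") and not matches(r"01|10", "010"),
    "0(1|1)0 differs":    matches(r"0(1|1)0", "010") and not matches(r"0(1|1)0", "10"),
    "01* is 0(1*)":       matches(r"01*", "0111") and not matches(r"01*", "0101"),
    "(01)* differs":      matches(r"(01)*", "0101") and not matches(r"(01)*", "0111"),
}
for claim, ok in checks.items():
    print(claim, ok)
```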

Page 14: Automata 4


Last Monday

• We proved that each Nondeterministic Finite Automaton can be transformed into a deterministic one: NFA=FA
• We proved that NFA recognized languages are closed under union, concatenation and star operation *.
• Hence the set of Regular Languages is closed under these regular operations. [Reader, pp. 29–38]
• Another way of expressing simple languages is done by Regular Expressions (Def. 1.52) like aΣ*(b∪c) for strings that have to start with an “a” and end with a “b” or a “c”.

Page 15: Automata 4


Languages and RE

A RE R describes a language L(R) in the obvious way:

1. If R = a, then L(R) = {a}
2. If R = ε, then L(R) = {ε}
3. If R = ∅, then L(R) = {}
4. If R = (R1∪R2), then L(R) = L(R1)∪L(R2)
5. If R = (R1•R2), then L(R) = L(R1)•L(R2)
6. If R = (R1*), then L(R) = (L(R1))*

Note that formally there is a difference between the expression R and the language L(R) that it describes: 0Σ*∪1Σ* and (0∪1)Σ* are different expressions, but they describe the same language.

Page 16: Automata 4


Some Examples

• Bit string with at least two “1”s: {0,1}* 1 {0,1}* 1 {0,1}*
• Bit string with at most two “1”s: 0* ∪ 0*10* ∪ 0*10*10*
• Alternatively: 0*(ε∪1)0*(ε∪1)0*
• a∅ = ∅
• aε = a (note the difference between ∅ and ε)
• (ΣΣ)* (strings of even length)
• (ΣΣ)*∪(ΣΣΣ)* (strings of even length or multiple of 3)
• (ΣΣ∪ΣΣΣ)* (strings of length 0 or of length greater than 1)
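A brute-force check of some of these examples, as a sketch: it compares expressions on every bit string up to a small length, which gives evidence of equivalence but is not a proof. The patterns are transcribed into Python `re` syntax (`|` for union, an empty alternative for ε); the names and the bound of 8 are choices made here.

```python
import re
from itertools import product

def bit_strings(max_len):
    """All strings over {0,1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def describes(pattern, s):
    return re.fullmatch(pattern, s) is not None

AT_MOST_TWO_A = r"0*|0*10*|0*10*10*"
AT_MOST_TWO_B = r"0*(|1)0*(|1)0*"          # (|1) plays the role of (ε ∪ 1)
SIGMA = "[01]"
LEN_NOT_ONE = f"({SIGMA}{SIGMA}|{SIGMA}{SIGMA}{SIGMA})*"

def same_language(p1, p2, max_len=8):
    """Do p1 and p2 agree on every bit string up to max_len?"""
    return all(describes(p1, s) == describes(p2, s) for s in bit_strings(max_len))

two_forms_agree = same_language(AT_MOST_TWO_A, AT_MOST_TWO_B)
counts_ones = all(describes(AT_MOST_TWO_A, s) == (s.count("1") <= 2)
                  for s in bit_strings(8))
length_claim = all(describes(LEN_NOT_ONE, s) == (len(s) != 1)
                   for s in bit_strings(8))
```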

Page 17: Automata 4


Applications of REs

Regular expressions are commonly used when analyzing or editing text strings. Two common examples:
• The grep command in UNIX and LINUX. Use man grep to see how it works.
• String processing in PERL. See Wikipedia’s “Perl regular expression examples”.

There is a good reason why we use regular expressions for this kind of pattern matching that we want to do fast…

Page 18: Automata 4


Thm 1.54: RL = RE

As the names suggest, the following result holds:
Theorem 1.54: A language is regular if and only if some regular expression describes it.

Lemma 1.55: If a language is described by a regular expression, then it is regular. This is relatively easy, using the closure properties of RLs that we proved.

Lemma 1.60: If a language is regular, then it can be described by a regular expression. This is harder to prove and requires the definition of Generalized Nondeterministic Finite Automata (GNFA).

Page 19: Automata 4


Proof of Lemma 1.55

Given a regular expression R, construct (by structural induction on R) a NFA N such that L(R) = L(N):

1. If R = a with L(R) = {a}, then [Figure: NFA with an arrow labeled a]

2. If R = ε with L(R) = {ε}, then [Figure: NFA labeled ε]

3. If R = ∅ with L(R) = {}, then [Figure: NFA]

4. If R = (R1∪R2), then L(R) = L(R1)∪L(R2)
5. If R = (R1•R2), then L(R) = L(R1)•L(R2)
6. If R = (R1*), then L(R) = (L(R1))*

(Cases 4–6: described last Monday.)

Page 20: Automata 4


Proof Lemma 1.60

If a language is regular, then it is described by a RE.

Proof Idea: Use generalized nondeterministic finite automata where the labels on the transition arrows are allowed to be REs (elements of ℛ, the set of regular expressions over Σ).

For an R∈ℛ this GNFA recognizes L(R):

[Figure: two-state GNFA with a single arrow labeled R]

We have to prove that for each regular language L, the corresponding FA M can be transformed into a GNFA with only two states like the one above.

We do this by removing the internal states of a GNFA one-by-one until we are left with a GNFA that has only one start state, one accept state and no ‘loops’.

Page 21: Automata 4


Example GNFA

[Figure: example GNFA from qS to qA; arrow labels as extracted: 01*, 0, 0* 11, 0110, ε]

Page 22: Automata 4


Example GNFA

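[Figure: the same example GNFA from qS to qA (arrow labels as extracted: 01*, 0, 0* 11, 0110, ε), now with an additional label R]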

Page 23: Automata 4


Generalized NFA

Def. 1.64: A generalized nondeterministic finite automaton (GNFA) is defined by M = (Q, Σ, δ, qstart, qaccept) with
• Q a finite set of states
• Σ the input alphabet
• qstart the start state
• qaccept the accept state
• δ: (Q\{qaccept}) × (Q\{qstart}) → ℛ the transition function
  (ℛ is the set of regular expressions over Σ)

Page 24: Automata 4


Characteristics of GNFA’s

• δ: (Q\{qaccept}) × (Q\{qstart}) → ℛ, with ℛ the set of regular expressions over Σ

The interior Q\{qaccept,qstart} is fully connected by δ.
From qstart we have only ‘outgoing transitions’.
To qaccept we have only ‘ingoing transitions’.
Impossible qi→qj transitions are “δ(qi,qj) = ∅”.

[Figure: two-state GNFA, qS -R-> qA]

Observation: This GNFA recognizes the language L(R).

Page 25: Automata 4


Proof Idea of Lemma 1.60

Proof idea (given a DFA M):

Construct an equivalent GNFA M’ with k≥2 states

Reduce one-by-one the internal states until k=2

This GNFA M’ will be of the form

[Figure: two-state GNFA, qS -R-> qA]

This regular expression R will be such that L(R) = L(M)

Page 26: Automata 4


DFA M → Equivalent GNFA M’

Let M have k states Q={q1,…,qk}

- Add two states qaccept and qstart

- Connect qstart to earlier q1: qS -ε-> q1

- Complete missing transitions by qi -∅-> qj

- Connect old accepting states to qaccept: qj -ε-> qA

- Join multiple transitions: qi -0-> qj together with qi -1-> qj becomes qi -0∪1-> qj

Page 27: Automata 4


Remove Internal state of GNFA

If the GNFA M has more than 2 states, ‘rip’ an internal state qrip to get an equivalent GNFA M’ by:
- Removing state qrip: Q’ = Q\{qrip}
- Changing the transition function δ by
  δ’(qi,qj) = δ(qi,qj) ∪ (δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj))
  for every qi∈Q’\{qaccept} and qj∈Q’\{qstart}

[Figure: qi -R1-> qrip (with self-loop R2) -R3-> qj, together with the direct arrow qi -R4-> qj, equals the single arrow qi -R4∪(R1R2*R3)-> qj]
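The rip step can be sketched in Python on regular expressions stored as strings. Everything in the encoding is an assumption of this sketch: ∅ is represented by `None` (or a missing dictionary entry), ε by the empty string, union is written `|` so the result can be tested with Python's `re`, and the example DFA (even number of 1s) is hypothetical.

```python
import re
from itertools import product

Q_START, Q_ACCEPT = "qS", "qA"   # state names chosen for this sketch

def union(r, s):
    """R ∪ S, with None playing the role of ∅."""
    if r is None:
        return s
    if s is None:
        return r
    return f"({r}|{s})"

def concat(*parts):
    """R1 R2 ...; anything concatenated with ∅ is ∅."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def star(r):
    """R*; both ∅* and ε* equal ε (the empty string here)."""
    return "" if not r else f"({r})*"

def rip(states, delta, q_rip):
    """δ'(qi,qj) = δ(qi,qj) ∪ δ(qi,qrip) (δ(qrip,qrip))* δ(qrip,qj)."""
    rest = [q for q in states if q != q_rip]
    loop = star(delta.get((q_rip, q_rip)))
    new_delta = {}
    for qi in rest:
        for qj in rest:
            if qi == Q_ACCEPT or qj == Q_START:
                continue
            via = concat(delta.get((qi, q_rip)), loop, delta.get((q_rip, qj)))
            label = union(delta.get((qi, qj)), via)
            if label is not None:
                new_delta[(qi, qj)] = label
    return rest, new_delta

# GNFA for a hypothetical DFA accepting bit strings with an even number of 1s.
states = [Q_START, "even", "odd", Q_ACCEPT]
delta = {(Q_START, "even"): "", ("even", Q_ACCEPT): "",
         ("even", "even"): "0", ("even", "odd"): "1",
         ("odd", "odd"): "0", ("odd", "even"): "1"}
for q in ("odd", "even"):
    states, delta = rip(states, delta, q)
regex = delta[(Q_START, Q_ACCEPT)]
```

After both internal states are ripped, only the arrow from qS to qA remains, and its label is a regular expression for the language of the original DFA.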

Page 28: Automata 4


Recap Proof Lemma 1.60

1. Let M be a DFA with k states.
2. Create an equivalent GNFA M’ with k+2 states.
3. Reduce M’ in k steps to M’’ with 2 states.
4. The resulting GNFA describes a RE R.
5. The language L(M) equals L(R).

Ingredients of Theorem 1.54 (RL = RE):
Lemma 1.55: Let R be a regular expression, then there exists an NFA M such that L(R) = L(M).
Lemma 1.60: The language L(M) of a DFA M is equivalent to a language L(M’) of a GNFA M’, which equals a two-state M’’ (qstart -R-> qaccept) such that L(R) = L(M’’).
Hence: RE ⊆ NFA = DFA ⊆ GNFA ⊆ RE

Page 29: Automata 4


Usefulness of RE

The fact that regular expressions can be recognized (‘matched’) by deterministic FA is very useful as this is fast.

Consider a RE like ((011)*Σ*00 (ΣΣ)*)* that you want to search for in a very long binary file…

There seem to be too many options to do this efficiently (where in the file? plus the nondeterministic operations ∪ and *).

Solution: Create a deterministic FA that accepts the corresponding language L and use it on the file?

Actually, create a FA that accepts Σ*LΣ* and use it.

Complexity ~ length(file) + 2^length(regular expression).

Page 30: Automata 4


Formalities

Next Friday: Midterm on “Automata: The Methods and the Madness” and “Regular Languages” [pp. 1–56, Reader]

Note that this week’s material will be part of the Midterm, although you will not have had it as homework.

Questions?

Page 31: Automata 4


§1.4: Nonregular Languages

What languages can not be recognized by finite automata? How to prove that a language is nonregular?

Example: L = { 0^n1^n | n∈N }
• Because DFA = NFA = GNFA, it is sufficient to prove that the language cannot be accepted by a DFA.
• ‘Playing around’ with DFA convinces you that the ‘finiteness’ of DFA is problematic for “all n∈N”.
• The problem occurs between the 0^n and the 1^n.
• Informal observation: the memory of a FA is limited by the number of states |Q|.

Page 32: Automata 4


Repeating DFA Paths

[Figure: a path from q1 to qk that passes through qj twice]

Consider an accepting DFA M with size |Q|.
On a string of length p, p+1 states get visited.
For p ≥ |Q|, there must be a j such that the deterministic computational path looks like: q1,…,qj,…,qj,…,qk.

Page 33: Automata 4


Repeating DFA Paths

The action of the DFA in qj is always the same.
If we repeat (or ignore) the qj,…,qj part, the new path will again be an accepting path:

[Figure: the path from q1 to qk with the loop at qj repeated]

Page 34: Automata 4


Line of Reasoning

If we want to prove that a language L is nonregular, we can use the following proof by contradiction technique:
• Assume that L is regular.
• Hence, there is a DFA M that recognizes L.
• For strings of length ≥ |Q| the DFA M has to ‘repeat itself’.
• Show that M will accept strings outside L.
• Conclude that the assumption was wrong.

Note that we use the simple DFA, not the more elaborate (but equivalent) NFA or GNFA.

Page 35: Automata 4


Thm 1.70: Pumping Lemma

For every regular language L, there is a finite pumping length p, such that for any string s∈L with |s| ≥ p, we can write s = xyz with:

1) xy^iz ∈ L for every i∈{0,1,2,…}
2) |y| ≥ 1
3) |xy| ≤ p

Note that: (1) implies that xz ∈ L, (2) says that y cannot be the empty string ε, (3) is not always used.

This is a lemma about regular languages

Page 36: Automata 4


Formal Proof of Pumping Lemma

Let M = (Q,Σ,δ,q_1,F) with Q = {q_1,…,q_p}.
Let s = s_1…s_n ∈ L(M) with |s| = n ≥ p.
The computational path of M on s is the sequence r_1…r_{n+1} ∈ Q^{n+1} with r_1 = q_1, r_{n+1}∈F and r_{t+1} = δ(r_t,s_t) for 1≤t≤n.
Because n+1 ≥ p+1 and there are only p states, there have to be two states r_j and r_k such that r_j = q_i = r_k (with 1 ≤ j < k ≤ p+1).
Let x = s_1…s_{j-1}, y = s_j…s_{k-1}, and z = s_k…s_n.
The string x takes M from q_1 = r_1 to r_j, the string y takes M from r_j to r_j, and the string z takes M from r_j to r_{n+1}∈F.
As a result: xy^iz takes M from q_1 to r_{n+1}∈F (for all i ≥ 0).
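The proof is constructive, and the split s = xyz can be computed by running the DFA and stopping at the first repeated state. The sketch below uses 0-based positions (so the path entry at index t is the state after reading t symbols), a dictionary for δ, and a hypothetical three-state DFA for "contains 11"; these are choices made here, not part of the slides.

```python
def pump_split(delta, start, s):
    """Return (x, y, z) where y is the part of s read between the first
    two visits to the first repeated state; None if no state repeats."""
    path = [start]
    first_visit = {start: 0}
    for t, symbol in enumerate(s, start=1):
        state = delta[(path[-1], symbol)]
        path.append(state)
        if state in first_visit:
            j, k = first_visit[state], t      # path[j] == path[k], j < k
            return s[:j], s[j:k], s[k:]
        first_visit[state] = t
    return None  # only possible when |s| < |Q|

def run(delta, start, s):
    """State reached by the DFA after reading s."""
    q = start
    for a in s:
        q = delta[(q, a)]
    return q

# Hypothetical DFA with p = 3 states accepting bit strings that contain "11".
delta = {("a", "0"): "a", ("a", "1"): "b",
         ("b", "0"): "a", ("b", "1"): "c",
         ("c", "0"): "c", ("c", "1"): "c"}
x, y, z = pump_split(delta, "a", "1011")
pumped_states = [run(delta, "a", x + y * i + z) for i in range(5)]
```

Because the split stops at the first repetition, at most p symbols have been read by then, which is condition 3) |xy| ≤ p.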


Page 38: Automata 4


Pumping 0^n1^n (Ex. 1.73)

Assume that B = { 0^n1^n | n≥0 } is regular.
Let p be the pumping length, and s = 0^p1^p ∈ B.
Pumping Lemma: s = xyz = 0^p1^p with xy^iz ∈ B for all i≥0.
Three options for y:

1) y = 0^k, hence xyyz = 0^(p+k)1^p ∉ B
2) y = 1^k, hence xyyz = 0^p1^(k+p) ∉ B
3) y = 0^k1^l, hence xyyz = 0^p1^l0^k1^p ∉ B

Conclusion: The pumping lemma does not hold, hence the language B is not regular.
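The three cases can be checked mechanically for small p. The sketch below tries every split s = xyz with |y| ≥ 1 (without using condition 3, matching the case analysis above) and asks whether any split survives pumping for i = 0..3. The function names and the bounds are assumptions of this sketch; a finite check like this illustrates the argument and does not replace it.

```python
def in_B(w):
    """Membership in B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def splits(s):
    """All ways to write s = xyz with |y| >= 1."""
    for j in range(len(s)):
        for k in range(j + 1, len(s) + 1):
            yield s[:j], s[j:k], s[k:]

def survives_pumping(s, in_lang, max_i=3):
    """True if some split keeps x y^i z in the language for i = 0..max_i."""
    return any(all(in_lang(x + y * i + z) for i in range(max_i + 1))
               for x, y, z in splits(s))

# For each candidate pumping length p, no split of 0^p 1^p can be pumped.
results = {p: survives_pumping("0" * p + "1" * p, in_B) for p in range(1, 7)}
```

For comparison, a string of the regular language 0*1* such as 0011 does have a split that survives, for instance y = 0 at the front.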

Page 39: Automata 4


F = { ww | w∈{0,1}* } (Ex. 1.75)

Let p be the pumping length, and take s = 0^p10^p1.
Let s = xyz = 0^p10^p1 with condition 3) |xy| ≤ p.
Only one option: y = 0^k, with xyyz = 0^(p+k)10^p1 ∉ F.

Without 3) this would have been a pain.

Page 40: Automata 4


Intersecting Regular Languages

Example 1.74: Let C = { w | # of 0s in w = # of 1s in w }.
Problem: If xyz∈C with y∈C, then xy^iz∈C.
Subversive Idea: If C is regular and F is regular, then the intersection C∩F has to be regular as well.

Proof by Contradiction: Assume that C is regular.
Take the regular language F = { 0^n1^m | n,m∈N } such that the intersection C∩F = { 0^n1^n | n∈N } has to be regular as well.
But we know that C∩F is not regular.
Conclusion: C is not regular.
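The key identity in this argument, that intersecting C with 0*1* leaves exactly the strings 0^n1^n, can be sanity-checked on all short bit strings. This is a bounded check under assumed names (`in_C`, `in_F`, `in_B`) and an arbitrary length bound of 10.

```python
import re
from itertools import product

def in_C(w):
    """Equal number of 0s and 1s."""
    return w.count("0") == w.count("1")

def in_F(w):
    """The regular language 0*1*."""
    return re.fullmatch(r"0*1*", w) is not None

def in_B(w):
    """The language { 0^n 1^n }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def bit_strings(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# C ∩ F should coincide with { 0^n 1^n } on every string tested.
agree = all((in_C(w) and in_F(w)) == in_B(w) for w in bit_strings(10))
```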

Page 41: Automata 4


Pumping Down E = { 0^i1^j | i≥j }

Problem: ‘pumping up’ s = 0^p1^p with y = 0^k gives xyyz = 0^(p+k)1^p, xy^3z = 0^(p+2k)1^p, which are all in E (hence do not give contradictions).
Solution: pump down to xz = 0^(p-k)1^p.
Overall for s = xyz = 0^p1^p (with |xy| ≤ p): we have y = 0^k with k>0, hence xz = 0^(p-k)1^p ∉ E.

Contradiction: E is not regular.
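A small Python check of the difference between pumping up and pumping down for E. The value p = 5 is only a stand-in for the unknown pumping length, and the helper names are introduced here.

```python
def in_E(w):
    """Membership in E = { 0^i 1^j | i >= j }."""
    zeros = len(w) - len(w.lstrip("0"))
    ones = w[zeros:]
    return set(ones) <= {"1"} and zeros >= len(ones)

def splits_within(s, p):
    """All ways to write s = xyz with |y| >= 1 and |xy| <= p."""
    for k in range(1, min(p, len(s)) + 1):
        for j in range(k):
            yield s[:j], s[j:k], s[k:]

p = 5                      # stand-in value for the pumping length
s = "0" * p + "1" * p

# Pumping up (i = 2, 3) never leaves E, so it gives no contradiction ...
pumped_up_ok = all(in_E(x + y * i + z)
                   for x, y, z in splits_within(s, p) for i in (2, 3))
# ... but pumping down (i = 0) leaves E for every allowed split.
pumped_down_fails = all(not in_E(x + z) for x, y, z in splits_within(s, p))
```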

End of “Regular Languages”