Automata 4
CS138, Wim van Dam, UCSB
Automata and Formal Languages
CS138, Winter 2006
Wim van Dam
Room 5109, Engr. I
http://www.cs.ucsb.edu/~vandam/
Formalities
New homework has been announced and is due Monday, January 30, at 11:30 in the CS 138 homework box.
Questions?
Transitions of (N)FA
For deterministic Finite Automata, each input has a unique path of Q-states that the automaton goes through.
For nondeterministic Finite Automata, each input has a set of paths of Q-states that the automaton could go through.
For an FA, the function δ: Q×Σ → Q is defined for every q ∈ Q and x ∈ Σ.
For an NFA, the function δ: Q×Σ → P(Q) might have δ(q,x) = ∅.
[Figure: two example state diagrams with arrows labeled 0 and 1, one captioned "Deterministic" and one captioned "Nondeterministic".]
FA = NFA
Theorem 1.39: For every language L that is accepted by a nondeterministic finite automaton, there is a (deterministic) finite automaton that accepts L as well. FA and NFA are equivalent computational models.
Proof idea: When keeping track of a nondeterministic computation of an NFA N, we use many ‘fingers’ to point at the subset ⊆ Q of states of N that can be reached on a given input string. We can simulate this computation with a deterministic automaton M with state space P(Q).
FA=NFA Proof
More formal proof of Theorem 1.39: Let A be the language recognized by the NFA N = (Q,Σ,δ,q0,F). Define the deterministic finite automaton M = (Q’,Σ,δ’,q’0,F’) by
1. Q’ = P(Q)
2. δ’(R,a) = { q ∈ Q | q ∈ δ(r,a) for an r ∈ R }
3. q’0 = {q0}
4. F’ = { R ∈ Q’ | R contains an ‘accept state’ of N }
This almost works, except for the ε-arrows. To handle them, define
- E(R) = { q | q reachable from R using ε* steps }
- δ’(R,a) = { q ∈ Q | q ∈ E(δ(r,a)) for an r ∈ R }
- q’0 = E({q0})
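The subset construction can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the dictionary encoding of δ, the use of "" as the label for ε-arrows, the function names and the example NFA are all assumptions made here. Only the subsets reachable from E({q0}) are built, rather than all of P(Q).

```python
from itertools import chain

EPS = ""  # label used in this sketch for ε-arrows (an assumption)

def eps_closure(states, delta):
    """E(R): all states reachable from R using zero or more ε-steps."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def nfa_to_dfa(alphabet, delta, q0, accepting):
    """Subset construction restricted to the subsets reachable from E({q0})."""
    start = eps_closure({q0}, delta)
    dfa_delta, todo, seen = {}, [start], {start}
    while todo:
        R = todo.pop()
        for a in alphabet:
            # Union of δ(r, a) over all r in R, followed by the ε-closure.
            moved = set(chain.from_iterable(delta.get((r, a), ()) for r in R))
            S = eps_closure(moved, delta)
            dfa_delta[(R, a)] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    # F': the subsets that contain an accept state of the NFA.
    dfa_accepting = {R for R in seen if R & set(accepting)}
    return seen, dfa_delta, start, dfa_accepting

def dfa_accepts(dfa_delta, start, dfa_accepting, word):
    """Run the constructed DFA on a word."""
    R = start
    for a in word:
        R = dfa_delta[(R, a)]
    return R in dfa_accepting

# Hypothetical NFA: strings over {0,1} whose second-to-last symbol is 1.
nfa_delta = {("a", "0"): {"a"}, ("a", "1"): {"a", "b"},
             ("b", "0"): {"c"}, ("b", "1"): {"c"}}
states, d, s, f = nfa_to_dfa("01", nfa_delta, "a", {"c"})
print(len(states), dfa_accepts(d, s, f, "10"), dfa_accepts(d, s, f, "01"))
```

The DFA states are frozensets of NFA states, which matches the ‘fingers’ picture: each DFA state records where all the fingers currently point.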
FA=NFA Proof (cont.)
It is easy to see that the previously described deterministic finite automaton M accepts the same language as N.
See Example 1.41 for the construction in action.
Because FA are a subset of NFA, we have proven:
Corollary 1.40: A language is regular if and only if it is accepted by a nondeterministic finite automaton.
Closure under Regular Operations
With NFA it is much simpler to prove the various closure properties of the regular languages.
We will show the closure of regular languages under the union, concatenation and star * operations by construction.
Theorem 1.45: RLs are closed under the union operation.
Given NFAs N1 and N2 that accept L1 and L2, make an NFA N (using N1 and N2) that accepts L1 ∪ L2.
Theorem 1.47: RLs are closed under concatenation.
… make an NFA N that accepts L1 • L2.
Theorem 1.49: RLs are closed under the star * operation.
… make an NFA N that accepts (L1)*.
Union Closure
Construction of Theorem 1.45: Given two NFAs N1 and N2, put them ‘in parallel’ to recognize the language L(N1) ∪ L(N2):
[Figure: N1 and N2 side by side, joined by two ε-arrows.]
Concatenation Closure
Theorem 1.47: Given two NFAs N1 and N2, put them ‘in sequence’ to recognize the concatenation L(N1) • L(N2):
[Figure: N1 followed by N2, joined by ε-arrows.]
Star Operation Closure
Construction of Theorem 1.49: Given an NFA N1, make a ‘loop’ to recognize the language L(N1)*:
[Figure: N1 with added ε-arrows that form a loop.]
Question Time
What about complements?
Regular Expressions (Def. 1.52)
Given an alphabet Σ, R is a regular expression if:
1. R = a, with a ∈ Σ
2. R = ε
3. R = ∅
4. R = (R1 ∪ R2), with R1 and R2 regular expressions
5. R = (R1 • R2), with R1 and R2 regular expressions
6. R = (R1*), with R1 a regular expression
Reading Regular Expressions
Assume for the moment that Σ={a,b,c}.
- We allow ourselves to write Σ instead of ((a∪b)∪c). So ((Σ*)•b) stands for the set of strings ending with a “b”.
- R+ is a shorthand for RR*.
- Just as with multiplication, you can drop the concatenation symbol “•”: “a•b” equals “ab”.
- Just as with arithmetic, you can drop some parentheses. We have the precedence order: star, concatenation, union.
Hence: aa* equals a(a*), which does not equal (aa)*.
Also, 01∪10 equals (01)∪(10), which does not equal 0(1∪1)0.
And 01* equals 0(1*) and not (01)*.
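These precedence rules can be tried out with Python's re module, which writes union as | and concatenation by juxtaposition. The helper name matches is introduced here for illustration. Python patterns support more than the six rules of Def. 1.52, but for star, concatenation and union the precedence is the same.

```python
import re

def matches(pattern, word):
    """True if the whole word is in the language of the pattern."""
    return re.fullmatch(pattern, word) is not None

# aa* is a(a*): one or more a's; (aa)* is an even number of a's.
print(matches("aa*", "aaa"), matches("(aa)*", "aaa"))        # True False
# 01|10 is (01)|(10), not 0(1|1)0.
print(matches("01|10", "10"), matches("0(1|1)0", "10"))      # True False
# 01* is 0(1*), not (01)*.
print(matches("01*", "0111"), matches("(01)*", "0111"))      # True False
```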
Last Monday
• We proved that each Nondeterministic Finite Automaton can be transformed into a deterministic one: NFA=FA
• We proved that the languages recognized by NFAs are closed under the union, concatenation and star * operations.
• Hence the set of Regular Languages is closed under these regular operations. [Reader, pp. 29–38]
• Another way of expressing simple languages is by Regular Expressions (Def. 1.52) like aΣ*(b∪c) for strings that have to start with an “a” and end with a “b” or a “c”.
Languages and RE
A RE R describes a language L(R) in the obvious way:
1. If R = a, then L(R) = {a}
2. If R = ε, then L(R) = {ε}
3. If R = ∅, then L(R) = {}
4. If R = (R1 ∪ R2), then L(R) = L(R1) ∪ L(R2)
5. If R = (R1 • R2), then L(R) = L(R1) • L(R2)
6. If R = (R1*), then L(R) = (L(R1))*
Note that formally there is a difference between the expression R and the language L(R) that it describes: 0Σ* ∪ 1Σ* and (0∪1)Σ* are different expressions, but they describe the same language.
Some Examples
• Bit strings with at least two “1”s: {0,1}* 1 {0,1}* 1 {0,1}*
• Bit strings with at most two “1”s: 0* ∪ 0*10* ∪ 0*10*10*
• Alternatively: 0*(ε∪1)0*(ε∪1)0*
• a∅ = ∅
• aε = a (note the difference between ∅ and ε)
• (ΣΣ)* (strings of even length)
• (ΣΣ)* ∪ (ΣΣΣ)* (strings of even length or of a length that is a multiple of 3)
• (ΣΣ ∪ ΣΣΣ)* (strings of length 0 or of length greater than 1)
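Some of these examples can be checked by brute force on all short bit strings. The sketch below assumes Σ = {0,1} and translates the expressions into Python re syntax (| for ∪, an empty alternative for ε, and . for Σ). The helper name language and the length bound 6 are choices made here, so this is a sanity check on short strings and not a proof.

```python
import re
from itertools import product

def language(pattern, alphabet="01", max_len=6):
    """All strings up to max_len over the alphabet that the pattern matches."""
    words = ("".join(w) for n in range(max_len + 1)
             for w in product(alphabet, repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}

# The two expressions for "at most two 1s" describe the same strings.
at_most_two_a = language("0*|0*10*|0*10*10*")
at_most_two_b = language("0*(|1)0*(|1)0*")
print(at_most_two_a == at_most_two_b)

# (ΣΣ)* ∪ (ΣΣΣ)* versus (ΣΣ ∪ ΣΣΣ)*: compare the lengths that occur.
even_or_triple = language("(..)*|(...)*")
not_length_one = language("(..|...)*")
print(sorted({len(w) for w in even_or_triple}))
print(sorted({len(w) for w in not_length_one}))
```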
Applications of REs
Regular expressions are commonly used when analyzing or editing text strings. Two common examples:
• The grep command in UNIX and LINUX. Use man grep to see how it works.
• String processing in PERL. See Wikipedia’s “Perl regular expression examples”.
There is a good reason why we use regular expressions for this kind of pattern matching that we want to do fast…
Thm 1.54: RL = RE
As the names suggest, the following result holds:
Theorem 1.54: A language is regular if and only if some regular expression describes it.
Lemma 1.55: If a language is described by a regular expression, then it is regular. This is relatively easy, using the closure properties of RLs that we proved.
Lemma 1.60: If a language is regular, then it can be described by a regular expression. This is harder to prove and requires the definition of Generalized Nondeterministic Finite Automata (GNFA).
Proof of Lemma 1.55
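Given a regular expression R, construct (by structural induction on R) an NFA N such that L(R) = L(N):
1. If R = a with L(R) = {a}, then N is: [Figure: state diagram with an arrow labeled a.]
2. If R = ε with L(R) = {ε}, then N is: [Figure: state diagram labeled ε.]
3. If R = ∅ with L(R) = {}, then N is: [Figure: state diagram.]
4. If R = (R1 ∪ R2), then L(R) = L(R1) ∪ L(R2)
5. If R = (R1 • R2), then L(R) = L(R1) • L(R2)
6. If R = (R1*), then L(R) = (L(R1))*
For cases 4 to 6, use the NFA constructions described last Monday.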
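The structural induction can be sketched as a recursive Python function. Everything about the encoding is an assumption of this sketch: regular expressions are nested tuples such as ("union", r1, r2), states are fresh integers, and "" labels an ε-arrow. The union, concatenation and star cases follow the parallel, sequential and loop constructions from the closure theorems.

```python
from itertools import count

_fresh = count()  # supplies fresh state names 0, 1, 2, ...

def build(r, delta):
    """Return (start, accept_states) of an NFA for r, adding its arrows to delta.
    delta[(state, label)] is a set of states; the label "" is an ε-arrow."""
    def arrow(p, label, q):
        delta.setdefault((p, label), set()).add(q)
    kind = r[0]
    if kind == "sym":                      # R = a
        s, t = next(_fresh), next(_fresh)
        arrow(s, r[1], t)
        return s, {t}
    if kind == "eps":                      # R = ε
        s = next(_fresh)
        return s, {s}
    if kind == "empty":                    # R = ∅
        return next(_fresh), set()
    if kind == "union":                    # new start, ε-arrows to both parts
        s = next(_fresh)
        s1, f1 = build(r[1], delta)
        s2, f2 = build(r[2], delta)
        arrow(s, "", s1)
        arrow(s, "", s2)
        return s, f1 | f2
    if kind == "concat":                   # ε-arrows from accept of N1 to start of N2
        s1, f1 = build(r[1], delta)
        s2, f2 = build(r[2], delta)
        for f in f1:
            arrow(f, "", s2)
        return s1, f2
    if kind == "star":                     # new accepting start plus loop-back arrows
        s = next(_fresh)
        s1, f1 = build(r[1], delta)
        arrow(s, "", s1)
        for f in f1:
            arrow(f, "", s1)
        return s, f1 | {s}
    raise ValueError(kind)

def closure(states, delta):
    """All states reachable from the given ones via ε-arrows."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, ""), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(r, word):
    """Build the NFA for r and simulate it on word."""
    delta = {}
    start, final = build(r, delta)
    current = closure({start}, delta)
    for a in word:
        step = set()
        for q in current:
            step |= delta.get((q, a), set())
        current = closure(step, delta)
    return bool(current & final)

# (a ∪ b)* • b : strings over {a,b} ending with "b"
ends_with_b = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "b"))
print(accepts(ends_with_b, "aab"), accepts(ends_with_b, "aba"))
```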
Proof Lemma 1.60
If a language is regular, then it is described by a RE.
Proof Idea: Use generalized nondeterministic finite automata where the labels on the transition arrows are allowed to be REs (elements of ℛ).
For an R ∈ ℛ this GNFA recognizes L(R):
[Figure: two-state GNFA with a single arrow labeled R.]
We have to prove that for each regular language L, the corresponding FA M can be transformed into a GNFA with only two states like the one above.
We do this by removing the internal states of a GNFA one-by-one until we are left with a GNFA that has only one start state, one accept state and no ‘loops’.
Example GNFA
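[Figure: example GNFA with start state qS, accept state qA, and arrows labeled by regular expressions; the extracted labels read 01*, 0, 0* 11, 0110, ε.]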
Generalized NFA
Def. 1.64: A generalized nondeterministic finite automaton (GNFA) is defined by M = (Q, Σ, δ, qstart, qaccept) with
• Q a finite set of states
• Σ the input alphabet
• qstart the start state
• qaccept the accept state
• δ: (Q\{qaccept}) × (Q\{qstart}) → ℛ the transition function
(ℛ is the set of regular expressions over Σ)
Characteristics of GNFA’s
• δ: (Q\{qaccept}) × (Q\{qstart}) → ℛ
• The interior Q\{qaccept,qstart} is fully connected by δ
• From qstart we have only ‘outgoing transitions’
• To qaccept we have only ‘ingoing transitions’
• Impossible qi→qj transitions are “δ(qi,qj) = ∅”
Observation: The two-state GNFA with a single arrow labeled R from qS to qA recognizes the language L(R).
Proof Idea of Lemma 1.60
Proof idea (given a DFA M):
Construct an equivalent GNFA M’ with k ≥ 2 states.
Reduce the internal states one-by-one until k = 2.
This GNFA will be of the form: [Figure: qS with a single arrow labeled R to qA.]
This regular expression R will be such that L(R) = L(M).
DFA M → Equivalent GNFA M’
Let M have k states Q={q1,…,qk}
- Add two states qaccept and qstart
- Connect qstart to the earlier q1 with an ε-arrow
- Complete missing transitions qi→qj by arrows labeled ∅
- Connect old accepting states qj to qaccept with ε-arrows
- Join multiple transitions: arrows qi→qj labeled 0 and 1 become a single arrow labeled 0∪1
Remove Internal state of GNFA
If the GNFA M has more than 2 states, ‘rip’ an internal state qrip to get an equivalent GNFA M’ by:
- Removing state qrip: Q’ = Q\{qrip}
- Changing the transition function δ to δ’(qi,qj) = δ(qi,qj) ∪ (δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj))
for every qi ∈ Q’\{qaccept} and qj ∈ Q’\{qstart}
[Figure: arrows qi→qrip labeled R1, a loop on qrip labeled R2, qrip→qj labeled R3 and qi→qj labeled R4 are replaced by a single arrow qi→qj labeled R4 ∪ (R1R2*R3).]
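A rough Python sketch of the whole DFA-to-RE procedure, built around the rip rule δ’(qi,qj) = δ(qi,qj) ∪ (δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj)). The encoding is an assumption of this sketch: labels are pattern strings in Python re syntax, a missing label (None) stands for ∅, the empty pattern stands for ε, and the names qstart and qaccept must not clash with the DFA's own state names. The function agrees is a brute-force check on short strings, not a proof.

```python
import re
from itertools import product

def union(r, s):
    """R ∪ S on pattern strings; None stands for ∅."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(*parts):
    """Concatenation; anything concatenated with ∅ is ∅."""
    return None if any(p is None for p in parts) else "".join(parts)

def star(r):
    """R*; both ∅* and ε* equal ε (the empty pattern)."""
    return "" if not r else f"(?:{r})*"

def dfa_to_regex(states, alphabet, delta, start, accepting):
    S, A = "qstart", "qaccept"            # the two added GNFA states
    label = {}                            # label[(qi, qj)]; a missing key means ∅
    def add(p, q, r):
        label[(p, q)] = union(label.get((p, q)), r)
    add(S, start, "")                     # ε-arrow from qstart to the old start
    for q in accepting:
        add(q, A, "")                     # ε-arrows from old accept states
    for q in states:
        for a in alphabet:                # parallel arrows are joined by union
            add(q, delta[(q, a)], re.escape(a))
    remaining = list(states)
    while remaining:
        rip = remaining.pop()
        loop = star(label.get((rip, rip)))
        for qi in [S] + remaining:
            for qj in remaining + [A]:
                detour = concat(label.get((qi, rip)), loop, label.get((rip, qj)))
                new = union(label.get((qi, qj)), detour)
                if new is not None:
                    label[(qi, qj)] = new
    return label.get((S, A))              # None if the DFA accepts nothing

def agrees(regex, delta, start, accepting, alphabet, max_len=7):
    """Compare the pattern with the DFA on every string up to max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w, q = "".join(letters), start
            for a in w:
                q = delta[(q, a)]
            if (re.fullmatch(regex, w) is not None) != (q in accepting):
                return False
    return True

# Hypothetical DFA: bit strings with an even number of 1s.
delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
regex = dfa_to_regex(["e", "o"], "01", delta, "e", {"e"})
print(regex, agrees(regex, delta, "e", {"e"}, "01"))
```

The resulting pattern is equivalent to (0 ∪ 10*1)*, although the printed string carries extra grouping. The order in which states are ripped changes the shape of the expression but not its language.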
Recap Proof Lemma 1.60
1. Let M be a DFA with k states.
2. Create an equivalent GNFA M’ with k+2 states.
3. Reduce M’ in k steps to M’’ with 2 states.
4. The resulting GNFA describes a RE R.
5. The language L(M) equals L(R).
Ingredients of Theorem 1.54 (RL=RE):
Lemma 1.55: Let R be a regular expression, then there exists an NFA M such that L(R) = L(M).
Lemma 1.60: The language L(M) of a DFA M equals the language L(M’) of a GNFA M’, which equals that of a two-state GNFA M’’ (qstart → qaccept with label R) such that L(R) = L(M’’).
Hence: RE ⊆ NFA = DFA ⊆ GNFA ⊆ RE
Usefulness of RE
The fact that regular expressions can be recognized (‘matched’) by deterministic FA is very useful as this is fast.
Consider a RE like ((011)*Σ*00 (ΣΣ)*)* that you want to search for in a very long binary file…
There seem to be too many options to do this efficiently (where in the file? plus the nondeterministic operations ∪, *).
Solution: Create a deterministic FA that accepts the corresponding language L and use it on the file?
Actually, create a FA that accepts Σ*LΣ* and use it.
Complexity ~ length(file) + 2^length(regular expression).
Formalities
Next Friday: Midterm on “Automata: The Methods and the Madness” and “Regular Languages” [pp. 1–56, Reader]
Note that this week’s material will be part of the Midterm, although you will not have had it as homework.
Questions?
§1.4: Nonregular Languages
What languages cannot be recognized by finite automata? How to prove that a language is nonregular?
Example: L = { 0^n 1^n | n ∈ N }
• Because DFA = NFA = GNFA, it is sufficient to prove that the language cannot be accepted by a DFA.
• ‘Playing around’ with DFA convinces you that the ‘finiteness’ of DFA is problematic for “all n ∈ N”.
• The problem occurs between the 0^n and the 1^n.
• Informal observation: the memory of a FA is limited by the number of states |Q|.
Repeating DFA Paths
[Figure: a DFA path from q1 to qk that passes through a state qj.]
Consider an accepting DFA M with size |Q|.
On a string of length p, p+1 states get visited.
For p ≥ |Q|, there must be a j such that the deterministic computational path looks like: q1,…,qj,…,qj,…,qk.
Repeating DFA Paths
[Figure: the path from q1 via qj to qk.]
The action of the DFA in qj is always the same.
If we repeat (or ignore) the qj,…,qj part, the new path will again be an accepting path.
Line of Reasoning
If we want to prove that a language L is nonregular, we can use the following proof by contradiction technique:
• Assume that L is regular.
• Hence, there is a DFA M that recognizes L.
• For strings of length ≥ |Q| the DFA M has to ‘repeat itself’.
• Show that M will accept strings outside L.
• Conclude that the assumption was wrong.
Note that we use the simple DFA, not the more elaborate (but equivalent) NFA or GNFA.
Thm 1.70: Pumping Lemma
For every regular language L, there is a finite pumping length p, such that for any string s ∈ L with |s| ≥ p, we can write s = xyz with:
1) x y^i z ∈ L for every i ∈ {0,1,2,…}
2) |y| ≥ 1
3) |xy| ≤ p
Note that: (1) implies that xz ∈ L, (2) says that y cannot be the empty string ε, (3) is not always used.
This is a lemma about regular languages.
Formal Proof of Pumping Lemma
Let M = (Q,Σ,δ,q1,F) with Q = {q1,…,qp}.
Let s = s1…sn ∈ L(M) with |s| = n ≥ p.
The computational path of M on s is the sequence r1…rn+1 ∈ Q^(n+1) with r1 = q1, rn+1 ∈ F and rt+1 = δ(rt,st) for 1 ≤ t ≤ n.
Because n+1 ≥ p+1, there have to be two states rj and rk such that rj = qi = rk (with 1 ≤ j < k ≤ p+1).
Let x = s1…sj–1, y = sj…sk–1, and z = sk…sn.
The string x takes M from q1 = r1 to rj, the string y takes M from rj to rk = rj, and the string z takes M from rj to rn+1 ∈ F.
As a result: x y^i z takes M from q1 to rn+1 ∈ F (for all i ≥ 0).
Moreover |y| = k–j ≥ 1 and |xy| = k–1 ≤ p, which gives conditions 2) and 3).
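The proof is constructive, which a short Python sketch can show: follow the DFA's path, stop at the first repeated state, and cut s there. The function names and the example DFA are assumptions of this sketch. Positions are 0-based here, while the proof counts from 1.

```python
def pump_split(delta, start, s):
    """Split s = x y z at the first repeated state on the DFA's path.
    Returns None if no state repeats (only possible when |s| < |Q|)."""
    first_seen = {start: 0}   # state -> number of symbols read when first visited
    q = start
    for t, a in enumerate(s, start=1):
        q = delta[(q, a)]
        if q in first_seen:
            j = first_seen[q]
            return s[:j], s[j:t], s[t:]
        first_seen[q] = t
    return None

def accepts(delta, start, accepting, s):
    """Run the DFA on s."""
    q = start
    for a in s:
        q = delta[(q, a)]
    return q in accepting

# Hypothetical 3-state DFA: the number of 1s is divisible by 3 (so p = 3).
delta = {(q, a): (q + (a == "1")) % 3 for q in range(3) for a in "01"}
x, y, z = pump_split(delta, 0, "1101")
print((x, y, z))                                              # ('11', '0', '1')
print(all(accepts(delta, 0, {0}, x + y * i + z) for i in range(6)))
```

Because the split happens at the first repetition, at most |Q| symbols have been read by then, which is condition 3).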
Pumping 0^n 1^n (Ex. 1.73)
Assume that B = { 0^n 1^n | n ≥ 0 } is regular.
Let p be the pumping length, and s = 0^p 1^p ∈ B.
Pumping Lemma: s = xyz = 0^p 1^p with x y^i z ∈ B for all i ≥ 0.
Three options for y (with k, l ≥ 1):
1) y = 0^k, hence xyyz = 0^(p+k) 1^p ∉ B
2) y = 1^k, hence xyyz = 0^p 1^(k+p) ∉ B
3) y = 0^k 1^l, hence xyyz = 0^p 1^l 0^k 1^p ∉ B
Conclusion: Every option contradicts the pumping lemma, hence the language B is not regular.
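The case analysis can be checked by brute force for one concrete value of p. In the sketch below, p = 5 is a stand-in value and the function names are choices made here. It illustrates the argument for a single p and does not replace the proof, which must hold for every p.

```python
def in_B(w):
    """Membership in B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def splits(s, bound):
    """All ways to write s = xyz with |y| >= 1 and |xy| <= bound."""
    for j in range(bound):                 # j = |x|
        for k in range(j + 1, bound + 1):  # k = |xy|
            yield s[:j], s[j:k], s[k:]

def every_split_fails(member, s, bound, exponents=(0, 2)):
    """True if each split has some exponent i with x y^i z outside the language."""
    return all(any(not member(x + y * i + z) for i in exponents)
               for x, y, z in splits(s, bound))

p = 5                      # stand-in value for the pumping length
s = "0" * p + "1" * p
print(every_split_fails(in_B, s, p))        # splits allowed by condition 3)
print(every_split_fails(in_B, s, len(s)))   # all splits: the three options for y
```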
F = { ww | w ∈ {0,1}* } (Ex. 1.75)
Let p be the pumping length, and take s = 0^p 1 0^p 1.
Let s = xyz = 0^p 1 0^p 1 with condition 3) |xy| ≤ p.
Only one option: y = 0^k (k ≥ 1), with xyyz = 0^(p+k) 1 0^p 1 ∉ F.
Without 3) this would have been a pain.
Intersecting Regular Languages
Example 1.74: Let C = { w | # of 0s in w = # of 1s in w }.
Problem: If xyz ∈ C with y ∈ C, then x y^i z ∈ C.
Subversive Idea: If C is regular and F is regular, then the intersection C ∩ F has to be regular as well.
Proof by Contradiction: Assume that C is regular.
Take the regular language F = { 0^n 1^m | n,m ∈ N } such that the intersection C ∩ F = { 0^n 1^n | n ∈ N } has to be regular as well.
But we know that C ∩ F is not regular.
Conclusion: C is not regular.
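The step "the intersection of two regular languages is regular" is used here without a construction. One standard way to see it is the product automaton, sketched below in Python: run both DFAs in lockstep and accept when both accept. Since C itself has no DFA, the example uses a hypothetical regular stand-in (an even number of 0s) together with F = 0*1*. All names in the sketch are assumptions made here.

```python
def product_dfa(d1, s1, f1, d2, s2, f2, alphabet):
    """DFA for L1 ∩ L2: run both DFAs in lockstep on pairs of states."""
    start = (s1, s2)
    delta, todo, seen = {}, [start], {start}
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    # A pair is accepting exactly when both components are accepting.
    accepting = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return delta, start, accepting

# Stand-in for a regular language: an even number of 0s.
even0 = {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"}
# F = { 0^n 1^m }: zeros first, then ones; "dead" is a trap state.
zeros_then_ones = {("z", "0"): "z", ("z", "1"): "n",
                   ("n", "0"): "dead", ("n", "1"): "n",
                   ("dead", "0"): "dead", ("dead", "1"): "dead"}
delta, start, accepting = product_dfa(even0, "e", {"e"},
                                      zeros_then_ones, "z", {"z", "n"}, "01")

def accepts(word):
    """Run the product DFA on word."""
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting

print(accepts("0011"), accepts("011"), accepts("0101"))
```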
Pumping Down E = { 0^i 1^j | i ≥ j }
Problem: ‘pumping up’ s = 0^p 1^p with y = 0^k gives xyyz = 0^(p+k) 1^p, xy^3z = 0^(p+2k) 1^p, which are all in E (hence do not give contradictions).
Solution: pump down to xz = 0^(p–k) 1^p.
Overall, for s = xyz = 0^p 1^p (with |xy| ≤ p) we have y = 0^k with k > 0, hence xz = 0^(p–k) 1^p ∉ E.
Contradiction: E is not regular.
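The difference between pumping up and pumping down can be seen on a small instance. The Python sketch below assumes E = { 0^i 1^j | i ≥ j } and uses p = 4 as a stand-in pumping length; the names are choices made here. For every allowed split it records whether pumping up stays in E and whether pumping down stays in E.

```python
def in_E(w):
    """Membership in E = { 0^i 1^j | i >= j }."""
    zeros = len(w) - len(w.lstrip("0"))   # length of the leading block of 0s
    ones = len(w) - zeros
    return w == "0" * zeros + "1" * ones and zeros >= ones

p = 4                       # stand-in value for the pumping length
s = "0" * p + "1" * p
results = []
for j in range(p):                  # j = |x|
    for k in range(j + 1, p + 1):   # k = |xy|, so |y| >= 1 and |xy| <= p
        x, y, z = s[:j], s[j:k], s[k:]
        pumped_up = all(in_E(x + y * i + z) for i in range(1, 5))
        pumped_down = in_E(x + z)
        results.append((pumped_up, pumped_down))

# Every split survives pumping up but fails when pumped down (i = 0).
print(all(up and not down for up, down in results))
```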
End of “Regular Languages”