
Formal Languages

Chapter 4 Properties of Regular Languages

Wuu Yang

National Chiao-Tung University, Taiwan, R.O.C.

August 30, 2010


Chapter Outline

1. Closure Properties of Regular Languages

2. Elementary Questions about Regular Languages

3. Identifying Non-Regular Languages


First Question.

Formal languages study how we may define a set. There are several common set notations, such as

• {a, b, c}.
• {1, 2, 3, . . .}.
• {x ∈ N | x/2 ∈ N}.

Notations for finite sets are easy. We may simply list all the elements one by one. There would be no confusion.

For infinite sets, things are trickier. We need to state a rule for generating all the elements, preferably without repetition (but sometimes this requirement is relaxed); for instance, “x/2 ∈ N” (even numbers) is a rule. The notation “. . .” also implies an implicit rule, which is assumed to be easily understood by the reader.

For instance, the rule “2, 3, 5, 7, 11, . . .” is understood to be the set of all prime numbers. On the other hand, the rule “9, 20, 34, . . .” cannot be understood by the general public.

Regular expressions, regular grammars, and dfa’s are all methods for defining rules of sets.

We are concerned with the questions:

What subsets of Σ∗ can be defined with common rules?

Could it be that every formal language can be defined with an fa (i.e., every subset of Σ∗ is a regular language)?

The answer is NO. We could define many sets that are not regular languages, for instance, the set of nested brackets $\{[^n\,]^n \mid n \in \mathbb{N}\}$. The following set is also not regular: $\{a^p \mid p \text{ is a prime}\}$.

We may define various (set) operations on regular languages.

A related question is whether an element (that is, a string) belongs to a set. This is concerned with the effectiveness of a rule.

Second Question.

Since formal languages are sets, we wish to determine certain properties of a language, such as whether it is a finite set.

Third Question.

Finally, we may ask if a language is regular. If so, we may find a dfa (or a regular expression or a regular grammar) for it. If not, what can we do? We can make use of the properties of regular languages.

4.1 Closure Properties of Regular Languages

Theorem 4.1. Let $L_1$ and $L_2$ be two regular languages. Then $L_1 \cup L_2$, $L_1 \cap L_2$, $L_1 \cdot L_2$, $\overline{L_1}$, and $L_1^*$ are all regular languages.

We say the family of regular languages is closed under union, intersection, concatenation, complement, and Kleene closure.

Proof. We may construct the nfa’s for the two regular languages. The nfa’s for $L_1 \cup L_2$, $L_1 \cdot L_2$, and $L_1^*$ are shown in the following figure.

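[Figure: the nfa’s for $L_1 \cup L_2$, $L_1 \cdot L_2$, and $L_1^*$, each built from the nfa for $L_1$ (and the nfa for $L_2$) between a new initial state S and a new final state F.]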

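The dfa for $\overline{L_1}$ is simply a dfa for $L_1$ with the roles of accepting and non-accepting states exchanged. This step needs a complete dfa rather than an nfa: we must draw the trap state explicitly, because strings that lead to the trap state belong to $\overline{L_1}$.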

The nfa for $L_1 \cap L_2$ is constructed as follows. A state of the nfa has the form $(s_1, s_2)$, where $s_1$ and $s_2$ are states of the nfa’s for $L_1$ and $L_2$, respectively. There is a transition $(s_1, s_2) \xrightarrow{a} (t_1, t_2)$ if and only if there are transitions $s_1 \xrightarrow{a} t_1$ and $s_2 \xrightarrow{a} t_2$, respectively. The initial state is $(q_1, q_2)$, where $q_1$ and $q_2$ are the initial states of the nfa’s for $L_1$ and $L_2$, respectively. A state $(s_1, s_2)$ is an accepting state if and only if $s_1$ and $s_2$ are accepting states of the nfa’s for $L_1$ and $L_2$, respectively.

Note that a path p0 in the constructed fa corresponds to a path p1 in the fa for L1 and a path p2 in the fa for L2. Furthermore, p0, p1, and p2 carry the same path label.

It can be shown that a string accepted by the constructed fa must also be accepted by the fa’s for both L1 and L2, and vice versa.
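The product construction can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the author's code: it uses complete dfa’s (a special case of the nfa’s above, with an edge for every state and symbol), each represented as a tuple (states, alphabet, delta, start, finals) where delta is a dict from (state, symbol) to state. The names `intersect` and `accepts` and the two sample machines are hypothetical.

```python
from itertools import product

def intersect(m1, m2):
    """Product construction: a pair state is accepting iff both parts are."""
    states1, sigma, delta1, q1, finals1 = m1
    states2, _, delta2, q2, finals2 = m2
    states = set(product(states1, states2))
    # (s1, s2) --a--> (t1, t2) exactly when s1 --a--> t1 and s2 --a--> t2
    delta = {((s1, s2), a): (delta1[(s1, a)], delta2[(s2, a)])
             for (s1, s2) in states for a in sigma}
    finals = {(s1, s2) for (s1, s2) in states if s1 in finals1 and s2 in finals2}
    return states, sigma, delta, (q1, q2), finals

def accepts(m, w):
    """Run a complete dfa on w and report whether it halts in a final state."""
    _, _, delta, q, finals = m
    for a in w:
        q = delta[(q, a)]
    return q in finals

# Hypothetical machines over {a, b}: even number of a's; strings ending in b.
even_a = ({0, 1}, {"a", "b"},
          {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({0, 1}, {"a", "b"},
          {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
both = intersect(even_a, ends_b)
```

For example, `accepts(both, "aab")` is True, while `accepts(both, "ab")` is False because "ab" has an odd number of a’s.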

A second proof uses regular expressions. Let $r_1$ and $r_2$ denote the regular expressions for the two regular languages, respectively. Then $r_1 + r_2$, $r_1 \cdot r_2$, and $r_1^*$ are the regular expressions for $L_1 \cup L_2$, $L_1 \cdot L_2$, and $L_1^*$, respectively. (It is not easy to construct the regular expressions for $L_1 \cap L_2$ and $\overline{L_1}$ directly.)

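A third proof. By De Morgan’s law, $L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$. Hence the fa for $L_1 \cap L_2$ can be constructed from those for $L_1$ and $L_2$ using only the complement and union constructions. □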

A proof that constructs the required nfa’s is called a constructive proof.

Theorem. Let $L_1$ and $L_2$ be two regular languages. Then $L_1 - L_2$ (set difference) is also a regular language.

Proof. Note that $L_1 - L_2 = L_1 \cap \overline{L_2}$, and regular languages are closed under complement and intersection. □


The reversal of a regular language is also regular.

Theorem 4.2. Let L be a regular language. Then $L^R$ is also a regular language.

Proof. In Chapter 3, we used left-linear and right-linear grammars to prove this theorem. Here we give an alternative proof.

Since L is regular, there is an nfa that accepts L. We may slightly modify the nfa so that there is exactly one accepting state. Now reverse the directions of all the transition edges of the nfa, and exchange the roles of the initial state and the (single) accepting state. The resulting nfa accepts $L^R$. □
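The reversal construction can be sketched in Python. This is a sketch under stated assumptions: an nfa is a set of edges (p, symbol, q), a start state, and a single accepting state, with the symbol None standing for λ. The names `closure`, `nfa_accepts`, and `reverse` and the sample machine are hypothetical.

```python
def closure(states, edges):
    """All states reachable from `states` by λ-edges (symbol None)."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (s, a, t) in edges:
            if s == p and a is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(nfa, w):
    """Simulate an nfa (edges, start, single final state) on w."""
    edges, start, final = nfa
    current = closure({start}, edges)
    for c in w:
        moved = {t for (s, a, t) in edges if s in current and a == c}
        current = closure(moved, edges)
    return final in current

def reverse(nfa):
    """Flip every edge and swap the initial and the single accepting state."""
    edges, start, final = nfa
    return {(t, a, s) for (s, a, t) in edges}, final, start

# Hypothetical nfa for L = {ab, aab}; λ-edges merge its two accepting states into 6.
nfa = ({(0, "a", 1), (1, "b", 2), (0, "a", 3), (3, "a", 4), (4, "b", 5),
        (2, None, 6), (5, None, 6)}, 0, 6)
rev = reverse(nfa)
```

Here `rev` accepts "ba" and "baa", the reversals of the strings in L, and rejects "ab".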

The regular expression for $L^R$ can be constructed based on the following equations:

$\lambda^R = \lambda$; $a^R = a$; $(\alpha + \beta)^R = \alpha^R + \beta^R$; $(\alpha\beta)^R = \beta^R\alpha^R$; $(\alpha^*)^R = (\alpha^R)^*$,

where α and β are regular expressions.

We have seen several common closure operations. More can be defined.

Definition. Let Σ and Γ be alphabets. Then a substitution function h : Σ → Γ∗ is called a homomorphism.

A homomorphism is a coding of Σ in terms of Γ, such as the ASCII encoding.

We may extend h to Σ∗, as follows:

h(λ) =def λ

h(αβ) =def h(α)h(β)

Let L be a language over Σ. The homomorphic image of L is defined as h(L) =def {h(w) | w ∈ L}. Note that h(L) is a language over Γ.

Example. Let Σ =def {a, b} and Γ =def {a, b, c}. Define h by h(a) =def ab and h(b) =def bbc. Then h(aba) = abbbcab and h(aa) = abab. □

Theorem. Let r be a regular expression for the regular language L. Then h(L) is also a regular language, and h(r) is a regular expression for h(L).

Proof. Since L is regular, let M be a dfa for it. For each symbol a ∈ Σ, replace the label a on each transition edge with h(a). This results in a generalized transition graph, which can be made a standard transition graph by applying the transformation shown in Figure 1. Hence, h(L) is a regular language. It is obvious that h(r) is a regular expression for h(L), since we can simulate each derivation in r with one in h(r). □

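[Figure 1: A transformation. An edge from state P to state Q labeled abcd is replaced by a path from P to Q whose edges are labeled a, b, c, d, passing through new intermediate states.]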

Hence, the family of regular languages is closed under homomorphisms.
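The proof above is itself a construction. A rough Python sketch of it follows, assuming a dfa given as (delta, start, finals) in which a missing edge means rejection. `image_nfa` applies the Figure 1 transformation to every edge p --a--> q, turning the label h(a) into a chain of single-symbol edges, or into a λ-edge (symbol None) when h(a) is empty. All helper names and the sample dfa are hypothetical; the homomorphism is the one from the earlier example.

```python
from itertools import count

def image_nfa(dfa, h):
    """Replace every edge p --a--> q by a chain of edges spelling h(a)."""
    delta, start, finals = dfa
    fresh = count()                      # numbers for new intermediate states
    edges = set()
    for (p, a), q in delta.items():
        word = h[a]
        if word == "":
            edges.add((p, None, q))      # h(a) = λ becomes a λ-edge
            continue
        prev = p
        for c in word[:-1]:
            mid = ("new", next(fresh))
            edges.add((prev, c, mid))
            prev = mid
        edges.add((prev, word[-1], q))
    return edges, start, finals

def closure(states, edges):
    """All states reachable from `states` by λ-edges."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (s, a, t) in edges:
            if s == p and a is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(nfa, w):
    """Simulate the nfa on w by tracking the set of current states."""
    edges, start, finals = nfa
    current = closure({start}, edges)
    for c in w:
        current = closure({t for (s, a, t) in edges if s in current and a == c}, edges)
    return bool(current & finals)

h = {"a": "ab", "b": "bbc"}                  # the homomorphism from the example
dfa = ({(0, "a"): 0, (0, "b"): 1}, 0, {1})   # hypothetical partial dfa for a*b
img = image_nfa(dfa, h)                      # nfa for h(L(a*b)) = (ab)*bbc
```

For instance, `img` accepts "bbc" and "ababbbc" but rejects "abbc".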

Example. Let Σ =def {a, b} and Γ =def {b, c, d}. Define h by h(a) =def dbcc and h(b) =def bdc. Let L be the language defined by the regular expression r =def (a + b∗)(aa)∗. Then h(L) is defined by the regular expression h(r) = (dbcc + (bdc)∗)(dbccdbcc)∗. □

Note that if L is not regular, h(L) may or may not be regular.

Example. Let L =def $\{(ab)^nb^k \mid n > k, k \ge 0\}$ (see Example 4.10 on slide 4-37). Let h(x) =def a for every x ∈ Σ. Then L is not regular, but $h(L) = \{a^{2n+k} \mid n > k \ge 0\}$ is.

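We may use the following right-linear grammar for h(L):

S → aaT
T → aaT
T → U
U → λ
U → aaaU

Here T derives $(aa)^pU$ for some p ≥ 0, and U derives $(aa)^ka^k = a^{3k}$ for some k ≥ 0. Hence the grammar generates exactly the strings $a^{2+2p+3k}$; setting n = k + 1 + p (so that n > k) gives 2 + 2p + 3k = 2n + k. □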


Definition. Let L1 and L2 be languages on the same alphabet. The right quotient of L1 with respect to L2 is defined as

L1/L2 =def {x | ∃y ∈ L2 such that xy ∈ L1}

We may define left quotients similarly.

[Figure: an L1 machine reading a string xy, where the suffix y is in L2.]

Quotients feel like the inverse of concatenation (or product).

Lemma. (Distributive law) (L1 ∪ L2)/L3 = L1/L3 ∪ L2/L3.

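In this lemma, L1, L2, and L3 need not be regular. The analogous law for the second argument, L3/(L1 ∪ L2) = L3/L1 ∪ L3/L2, also holds, since “there is a y ∈ L1 ∪ L2 with xy ∈ L3” means “there is such a y in L1 or there is such a y in L2”.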

Example. Let L1 =def $\{a^nb^m \mid n \ge 1, m \ge 0\} \cup \{ba\}$ and L2 =def $\{b^m \mid m \ge 1\}$. Then L1/L2 = $\{a^nb^m \mid n \ge 1, m \ge 0\}$. For the above example,

L1 = {ba, a, aa, aaa, . . . , ab, aab, aaab, . . . , abb, aabb, aaabb, . . .}.
L2 = {b, bb, bbb, . . .}.
L1/L2 = {a, aa, aaa, . . . , ab, aab, aaab, . . . , abb, aabb, aaabb, . . .} (just consider the case b ∈ L2; the string ba contributes nothing, since it does not end in b).

Note that L1, L2, and L1/L2 are all regular. We may generalize this result: we next prove that the family of regular languages is closed under right quotient.

Lemma. Let L1 and L2 be two regular languages. Then their right quotient, L1/L2, is also a regular language.

Proof. We construct a dfa for L1/L2. This dfa is identical to a dfa for L1, except that the set of accepting states may be different.

Let M =def (Q, Σ, δ, q0, F) be a dfa for L1. For each state p ∈ Q, we determine whether there is an input string w ∈ L2 that moves M from state p to an accepting state in F.

We define Mp =def (Q, Σ, δ, p, F), which is identical to M except that the initial state is p. We check whether some string belongs to both L2 and L(Mp), that is, whether L2 ∩ L(Mp) ≠ ∅. Let F′ be the set of all states p such that L2 ∩ L(Mp) ≠ ∅. Let M′ =def (Q, Σ, δ, q0, F′). Then M′ is a dfa for L1/L2: a string x is accepted by M′ if and only if p = δ∗(q0, x) has some y ∈ L2 with δ∗(p, y) ∈ F, that is, if and only if xy ∈ L1 for some y ∈ L2. □

Note that it is effectively decidable whether L2 ∩ L(Mp) ≠ ∅. We construct the dfa for the intersection of the two regular languages and check whether there is a path from the initial state to an accepting state.

(We can use a shortest-path algorithm, dfs, or bfs to determine whether there is such a path.)
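The whole construction of F′ can be sketched in Python, using bfs over the product automaton as the emptiness test. The assumptions are these: a dfa is (states, delta, start, finals), and a missing edge means rejection. The transition table for L1 is our own reconstruction for the example below, with states numbered so that the results agree with the verification given there. The helper names are hypothetical.

```python
from collections import deque

def lang_meets(d1, p, f1, d2, q2, f2, sigma):
    """Is L(M_p) ∩ L2 nonempty?  BFS in the product automaton from (p, q2)."""
    start = (p, q2)
    seen, queue = {start}, deque([start])
    while queue:
        s1, s2 = queue.popleft()
        if s1 in f1 and s2 in f2:
            return True
        for a in sigma:
            if (s1, a) in d1 and (s2, a) in d2:   # a missing edge means reject
                nxt = (d1[(s1, a)], d2[(s2, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

def quotient_finals(m1, m2, sigma):
    """F' = {p | L2 ∩ L(M_p) ≠ ∅}; M' keeps M's states, edges, and start state."""
    states1, d1, _q1, f1 = m1
    _states2, d2, q2, f2 = m2
    return {p for p in states1 if lang_meets(d1, p, f1, d2, q2, f2, sigma)}

# Reconstructed dfa for L1 = L(a*baa*); state 3 is the trap state.
d1 = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 3,
      (2, "a"): 2, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}
m1 = ({0, 1, 2, 3}, d1, 0, {2})
# Partial dfa for L2 = L(ab*).
m2 = ({0, 1}, {(0, "a"): 1, (1, "b"): 1}, 0, {1})
```

With these machines, `quotient_finals(m1, m2, "ab")` returns {1, 2}, the accepting states found in the example below.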

Example. Let L1 =def L(a∗baa∗) and L2 =def L(ab∗). Find L1/L2.

Solution. We first construct a dfa for L1, which is shown in (a) below.

[Figure: (a) dfa for L1, with states 0, 1, 2, 3; (b) dfa for L1/L2, the same dfa with a different set of accepting states.]

Then we can verify that L(M0) ∩ L2 = ∅, L(M1) ∩ L2 = {a} ≠ ∅, L(M2) ∩ L2 = {a} ≠ ∅, and L(M3) ∩ L2 = ∅. Hence, in the dfa for L1/L2, states 1 and 2 are accepting states, as shown in (b). This dfa accepts L(a∗b + a∗baa∗) = L(a∗ba∗).

Note that in this example, L1 ⊂ L1/L2. In general, if L1 ⊆ L1/L2 for two regular languages L1 and L2 with L1 nonempty, then L1 must be an infinite set or λ ∈ L2. (Prove this.) □


Example. Given two languages L1 and L2, is L0 defined below a regular language?

L0 =def {xy | xy ∈ L1, y ∈ L2}

Define suffixIn(L) =def {xy | x ∈ Σ∗, y ∈ L}. Then L0 = L1 ∩ suffixIn(L2).

Note that if L is regular, so is suffixIn(L). (Draw its fa.)

Therefore, L0 is also regular, given that L1 and L2 are regular.

To construct an nfa for L1 ∩ suffixIn(L2), we may first construct an nfa for suffixIn(L2) and then construct an nfa for L1 ∩ suffixIn(L2).

Proof. Let M1 and M2 be the nfa’s for L1 and L2, respectively. Then an nfa for suffixIn(L2) is shown below:

[Figure: nfa for suffixIn(L2): a new initial state with a loop on every symbol, followed by the nfa for L2.]

To construct an nfa for L1 ∩ suffixIn(L2), we may use the construction in Theorem 4.1. □

We need a definition in the following example.

Definition. The string x is a prefix of the string xy, where x, y ∈ Σ∗.

Example. Let L be a language. Define prefix(L) =def {x | xy ∈ L, y ∈ Σ∗}, which is the set of all prefixes of all strings in L.

Lemma. prefix(prefix(L)) = prefix(L).

Is prefix(L) regular (assuming L is regular)?

Solution. The answer is YES. Let M be a dfa for L. Let M′ be identical to M, except that every state from which an accepting state can be reached (that is, every state except the trap states) is marked as an accepting state. Then M′ accepts exactly prefix(L). □
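The marking step amounts to a backward search from the accepting states. A minimal Python sketch follows, assuming a complete dfa given as a transition dict; the names `live_states` and `accepts` and the sample dfa for L(a∗baa∗) are hypothetical.

```python
def live_states(states, delta, finals):
    """States from which some accepting state is reachable (all but trap states).

    Marking exactly these states as accepting gives a dfa for prefix(L).
    """
    preds = {q: set() for q in states}      # one-step predecessors of each state
    for (p, _a), q in delta.items():
        preds[q].add(p)
    live, stack = set(finals), list(finals)
    while stack:                            # backward search from the finals
        q = stack.pop()
        for p in preds[q]:
            if p not in live:
                live.add(p)
                stack.append(p)
    return live

def accepts(delta, start, finals, w):
    """Run a complete dfa on w."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical complete dfa for L = L(a*baa*); state 3 is the trap state.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 3,
         (2, "a"): 2, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}
prefix_finals = live_states({0, 1, 2, 3}, delta, {2})
```

Here `prefix_finals` is {0, 1, 2}, so the new dfa accepts "aab" (a prefix of "aabaa") but rejects "abb", which leads to the trap state.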

suffix(L) is defined similarly. It is obvious that suffix(L) is also regular (assuming L is regular), because $\mathrm{suffix}(L) = (\mathrm{prefix}(L^R))^R$.

§4.2 Elementary Questions About Regular Languages

Membership algorithm. Given a language L and a string w, we want to determine whether w ∈ L.

We first face the question of how the regular language is defined. Finite automata, regular expressions, and regular grammars are the standard representations of a regular language; they are precise enough for the membership question. Other definitions, such as a natural-language description, are usually not precise enough.

Theorem 4.5. The membership question for regular languages is decidable. (Decidable means there is an algorithm that answers the question correctly in a finite number of steps for every input.)

Proof. Convert the standard representation into a dfa and run the dfa on w; w ∈ L if and only if the dfa ends in an accepting state after reading all of w. □
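Running the dfa takes exactly |w| steps, so the procedure always halts. A minimal Python sketch follows, assuming the dfa is given as (delta, start, finals), where a missing edge (including one for a symbol outside the alphabet) means rejection. The name `member` is hypothetical, and the sample dfa is the mod-5 language from a later example.

```python
def member(dfa, w):
    """Decide w ∈ L(M) by running the dfa; a missing edge means rejection."""
    delta, start, finals = dfa
    q = start
    for c in w:
        if (q, c) not in delta:
            return False
        q = delta[(q, c)]
    return q in finals

# Hypothetical dfa for {w ∈ {a, b}* | |w| mod 5 = 0}: states count length mod 5.
mod5 = ({(i, c): (i + 1) % 5 for i in range(5) for c in "ab"}, 0, {0})
```

For example, `member(mod5, "ababa")` is True and `member(mod5, "abab")` is False.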

Other important questions include (1) whether a language is empty, (2) whether a language is finite, (3) whether a language is a subset of another, and (4) whether two languages are the same. For regular languages, these questions are simple.

Theorem 4.6. There is an algorithm for determining whether a given regular language, in a standard representation, is empty, finite, or infinite.

Proof. We use the transition graph of a dfa for the regular language. The regular language is not empty if and only if there is a path from the initial state to an accepting state.

The regular language is infinite if and only if there is a path from the initial state to an accepting state that includes a vertex belonging to a cycle. (We can identify all states that belong to a cycle with standard graph algorithms.) □
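Both tests in this proof can be sketched in Python. The assumptions are these: a dfa is given as a transition dict (missing edges reject), a start state, and a set of finals, and the names `reach` and `analyze` and the sample machines are hypothetical. Rather than searching for the cycle directly, the sketch keeps only the "useful" states, which are reachable from the start and can reach a final state. It then runs Kahn’s topological-sort algorithm on them, and the language is infinite exactly when this subgraph has a cycle.

```python
from collections import deque

def reach(sources, adj):
    """All vertices reachable from `sources` in the graph `adj` (incl. sources)."""
    seen, stack = set(sources), list(sources)
    while stack:
        p = stack.pop()
        for q in adj.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def analyze(delta, start, finals):
    """Classify L(M) as "empty", "finite", or "infinite" from the dfa's graph."""
    succ, pred = {}, {}
    for (p, _a), q in delta.items():
        succ.setdefault(p, set()).add(q)
        pred.setdefault(q, set()).add(p)
    reachable = reach([start], succ)
    if not (reachable & set(finals)):
        return "empty"                    # no path from start to a final state
    # useful states lie on some path from the start state to a final state
    useful = reachable & reach(list(finals), pred)
    # Kahn's algorithm: if some useful state is never removed, there is a cycle
    indeg = {q: 0 for q in useful}
    for p in useful:
        for q in succ.get(p, ()):
            if q in useful:
                indeg[q] += 1
    queue = deque(q for q in useful if indeg[q] == 0)
    removed = 0
    while queue:
        p = queue.popleft()
        removed += 1
        for q in succ.get(p, ()):
            if q in useful:
                indeg[q] -= 1
                if indeg[q] == 0:
                    queue.append(q)
    return "infinite" if removed < len(useful) else "finite"

# Hypothetical complete dfa's over {a, b}; state 3 is a trap state in both.
star = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 3,
        (2, "a"): 2, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}   # L(a*baa*)
only_ab = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 3, (1, "b"): 2,
           (2, "a"): 3, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}   # {ab}
```

`analyze(star, 0, {2})` returns "infinite". `analyze(only_ab, 0, {2})` returns "finite" even though the trap state has a self-loop, because the trap state is not useful.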

To determine whether a regular language L1 is a subset of L2, let M1 and M2 be dfa’s for L1 and L2, respectively. We may draw an nfa as shown below and transform it into a dfa.

[Figure: an nfa with a new initial state S and a new final state F, built from the nfa for L1 and the nfa for L2.]

In the resulting dfa (each of whose states is a set of nfa states), if every state that contains an accepting state of M1 also contains an accepting state of M2, then L1 ⊆ L2, and vice versa.

Theorem 4.7. Given the standard representation of two regularlanguages L1 and L2, there is an algorithm to determine if L1 = L2.

Proof. 1st proof. Determine whether L1 ⊆ L2 and L2 ⊆ L1.

2nd proof. Let L3 =def $(L_1 \cap \overline{L_2}) \cup (\overline{L_1} \cap L_2)$. By the closure properties, L3 is regular. L1 = L2 if and only if L3 is empty, which can be decided by Theorem 4.6. □
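The second proof can be sketched in Python without building L3 explicitly. It explores the reachable part of the product automaton and looks for a pair state that is accepting in exactly one machine, since such pairs are the accepting states of a dfa for L3. The assumptions are complete dfa’s given as (delta, start, finals) over a shared alphabet; the name `equivalent` and the sample machines are hypothetical.

```python
from collections import deque

def equivalent(m1, m2, sigma):
    """Decide L(m1) = L(m2) for complete dfa's over the alphabet sigma.

    A reachable pair state that is accepting in exactly one machine is an
    accepting state of the product dfa for the symmetric difference L3.
    """
    d1, q1, f1 = m1
    d2, q2, f2 = m2
    start = (q1, q2)
    seen, queue = {start}, deque([start])
    while queue:
        s1, s2 = queue.popleft()
        if (s1 in f1) != (s2 in f2):
            return False              # some string lies in L3, so L3 is nonempty
        for a in sigma:
            nxt = (d1[(s1, a)], d2[(s2, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True                       # L3 is empty

# Hypothetical dfa's over {a, b}: two for "even number of a's", one for "odd".
even2 = ({(q, c): (1 - q if c == "a" else q) for q in range(2) for c in "ab"}, 0, {0})
even4 = ({(q, c): ((q + 1) % 4 if c == "a" else q) for q in range(4) for c in "ab"}, 0, {0, 2})
odd2 = (even2[0], 0, {1})
```

The two machines for "even number of a’s" have different numbers of states, yet `equivalent(even2, even4, "ab")` is True, while `equivalent(even2, odd2, "ab")` is False.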

These fundamental questions are easy for regular languages, but they become very difficult or even impossible to answer for other families of languages. We will encounter these questions in later chapters.

§4.3 Identifying Non-Regular Languages

Consider dfa’s. A dfa is an automaton with only a finite amount of memory (states are a kind of memory). This means that when processing a string, a dfa can remember only a finite amount of the characteristics of the part of the string that it has already scanned. Examples of such characteristics include whether the number of a’s is even, whether the difference between the numbers of a’s and b’s has so far stayed below 5 in absolute value, etc. This is a severe restriction.

On the other hand, a language is regular if, when we process a string of the language, only a finite amount of the characteristics of the string, however long it might be, needs to be remembered at any stage. This is not necessarily so for other, more complex languages.

In this sense, we say a regular language has a simple structure, while other languages might have more complex structures.

The most detailed characteristic of the part of the string that has already been scanned by a dfa is that part itself.

Review: pigeon-hole principle

If we put n balls into m boxes, and n > m, then at least one box contains two or more balls.

Example 4.6. Is the language L =def $\{a^nb^n \mid n \ge 0\}$ regular?

Solution. The answer is NO. We prove this by contradiction. Suppose L is regular. Let M be a dfa for L. Let n be the number of states in M. Note that n is a finite number.

Let $q_0$ be the initial state of M. Consider the following n + 1 states: $\delta^*(q_0, a), \delta^*(q_0, aa), \delta^*(q_0, aaa), \ldots, \delta^*(q_0, a^{n+1})$. By the pigeon-hole principle, at least two of these n + 1 states must be the same, say $\delta^*(q_0, a^i) = \delta^*(q_0, a^j)$, where $i \ne j$. Therefore, $\delta^*(q_0, a^ib^i) = \delta^*(q_0, a^jb^i)$. Since M accepts $a^ib^i$, it must also accept $a^jb^i$. But $a^jb^i \notin L$, that is, M must not accept it. This is a contradiction.

A more verbose explanation. When scanning a string aaa . . . a△a . . . abbb . . . b, where the △ sign indicates the position of the read head, the dfa needs to remember the number of a’s that have already been scanned (that is, the number of a’s to the left of △). Since there are only n states, M can distinguish at most n different numbers. Hence, by the pigeon-hole principle, M cannot distinguish at least two of the following n + 1 situations:

a; aa; . . .; aaa . . . a (there are n + 1 a’s)

Suppose that M cannot distinguish a and aa, that is, M ends up in the same state after scanning a and after scanning aa. Since M accepts ab, M must also accept aab. However, aab ∉ L, so M cannot accept aab. This is a contradiction. □

□

Corollary. $\{a^nb^nc^n \mid n \ge 0\}$ is not regular.

Corollary. $\{a^nb^m \mid n \ge m\}$ is not regular.

Corollary. $\{a^nb^m \mid n \le m\}$ is not regular.

Basic question. Given a language we ask if it is regular.

If we can draw the fa of a language, then it is regular.

Example. Is the language {w ∈ Σ∗ | |w| mod 5 = 0} regular?

Solution.

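[Figure: a dfa with states 0, 1, 2, 3, 4 arranged in a cycle, every transition labeled a,b; state 0 is both the initial and the accepting state.] □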

Lemma. Every finite set is regular.

From this point of view, regular languages are extensions of finite languages.

Example. Draw the fa for the finite language {abb, bbb, caa}.

Based on this lemma, we only need to worry about infinite languages.

Lemma. (Pumping Lemma for Regular Languages) Let L be an infinite regular language. Then there exists a number m such that every w ∈ L with |w| ≥ m can be written as xyz satisfying the following three requirements:

1. |y| ≥ 1

2. |xy| ≤ m

3. $xy^iz \in L$ for every i = 0, 1, 2, . . ..

Intuitively, every sufficiently long sentence w can be decomposed into three parts (xyz) such that the middle part (y) can be repeated (i.e., pumped) as many times as we wish.

Though y is guaranteed to be non-empty, x and z may be empty.

This pumping lemma is a form of the pigeon-hole principle.

Proof. Since L is regular, let M be a dfa that accepts L.

Let m be the number of states of M. Choose any w ∈ L satisfying |w| ≥ m (since L is infinite, such a sentence w ∈ L must exist). We may write w as $a_1a_2 \ldots a_k$. Let $q_0, q_1, \ldots, q_k$ be the sequence of states of M when scanning the symbols $a_1a_2 \ldots a_k$. We may write this as

$q_0\, a_1\, q_1\, a_2\, q_2 \ldots a_m\, q_m\, a_{m+1}\, q_{m+1} \ldots a_k\, q_k$

This notation means that M is in state q0 initially, enters state q1 after reading a1, enters state q2 after reading a2, and so on.

Note that qk ∈ F since w ∈ L.

Now consider q0, q1, . . . , qm. There are m + 1 of them, but M contains only m different states. Hence at least two of q0, q1, . . . , qm must be the same state, say qi and qj. Without loss of generality, we may assume that 0 ≤ i < j ≤ m.

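Let x =def $a_1a_2 \ldots a_i$, y =def $a_{i+1}a_{i+2} \ldots a_j$, and z =def $a_{j+1}a_{j+2} \ldots a_k$. Then |y| = j − i ≥ 1 and |xy| = j ≤ m, and we have

$q_0 \; a_1a_2 \ldots a_i \; q_i \; a_{i+1}a_{i+2} \ldots a_j \; q_i \; a_{j+1}a_{j+2} \ldots a_k \; q_k$

Since y leads M from state qi back to state qi, this loop can be skipped or repeated any number of times without changing the state in which M ends. Hence xz ∈ L, xyyz ∈ L, xyyyz ∈ L, etc. That is, $xy^iz \in L$ for i = 0, 1, 2, . . .. □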
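The proof is constructive: given a dfa and a long enough accepted string, it finds x, y, z by watching for the first repeated state among q0, . . . , qm. A minimal Python sketch follows, assuming a complete dfa given as a transition dict. The names `run` and `pump_split` and the sample dfa for L(a∗baa∗) are hypothetical.

```python
def run(delta, start, w):
    """Return the state a complete dfa is in after reading w."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q

def pump_split(delta, start, m, w):
    """Split w (|w| >= m, m = number of states) at the first i < j with q_i = q_j."""
    first_index = {start: 0}          # state -> first index i with q_i equal to it
    q = start
    for j, c in enumerate(w[:m], start=1):
        q = delta[(q, c)]             # q is now q_j
        if q in first_index:
            i = first_index[q]
            return w[:i], w[i:j], w[j:]
        first_index[q] = j
    raise ValueError("need |w| >= m, the number of states")

# Hypothetical complete dfa for L(a*baa*) with m = 4 states; accepting state 2.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 3,
         (2, "a"): 2, (2, "b"): 3, (3, "a"): 3, (3, "b"): 3}
x, y, z = pump_split(delta, 0, 4, "baaaa")
```

For w = baaaa, the states visited are 0, 1, 2, 2, so the split is x = "ba", y = "a", z = "aa". Every $xy^iz$ then ends in the accepting state 2.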

The pumping lemma is usually used to prove that a language is not regular. The proof is always by contradiction.

Example 4.7. Use the pumping lemma to show that L =def $\{a^nb^n \mid n \ge 0\}$ is not regular.

Proof. Assume L is regular. Hence it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose w =def $a^mb^m$. According to the pumping lemma, w can be written as xyz with |xy| ≤ m, so x is a string of a’s, y is a non-empty string of a’s, and z consists of the rest of w. According to the pumping lemma, xyyz ∈ L. However, the string xyyz contains more a’s than b’s. Hence, xyyz ∉ L. This is a contradiction. Hence, L is not regular. □

Example 4.8. Let Σ =def {a, b}. Show that L =def $\{ww^R \mid w \in \Sigma^*\}$ is not regular.

Proof. Assume L is regular. Hence it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose $a^mb^mb^ma^m$. According to the pumping lemma, $a^mb^mb^ma^m$ can be written as xyz with |xy| ≤ m, where x is a string of a’s, y is a non-empty string of a’s, and z consists of the remainder. According to the pumping lemma, xyyz ∈ L. However, xyyz = $a^{m+|y|}b^{2m}a^m$ is not a palindrome, so it cannot be written in the form $ww^R$. Hence, xyyz ∉ L. This is a contradiction. Therefore, L is not regular. □

How to use the Pumping Lemma to prove that a language L is not regular

1. Assume L is regular (so it satisfies Pumping Lemma).

2. Let m be the number supplied by Pumping Lemma.

3. Choose a sentence s in L whose length is at least m. Usually the sentence is somehow related to m, such as $a^mb^mc^m$, $a^{m!}$, $b^{2m}$, etc.

4. Consider all possible divisions of the sentence s into xyz (with |y| ≥ 1 and |xy| ≤ m).

5. Show that every possible division eventually leads to a contradiction, that is, $xy^iz \notin L$ for some i.

6. We conclude that the language L cannot be regular.

7. Note that we do not assume the exact value of m.


Example 4.9. Show that the language L =def $\{w \in \Sigma^* \mid n_a(w) < n_b(w)\}$ is not regular.

Proof. Assume L is regular. Hence, it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose w =def $a^mb^{m+1}$. According to the pumping lemma, w can be written as xyz with |xy| ≤ m, where x is a string of a’s, y is a non-empty string of a’s, and z consists of the remaining part of w. According to the pumping lemma, xyyz ∈ L. However, $n_a(xyyz) = m + |y| \ge m + 1 = n_b(xyyz)$. Hence, xyyz ∉ L. This is a contradiction. Therefore, L is not regular. □

Example 4.10. Show that the language L =def $\{(ab)^nb^k \mid n > k, k \ge 0\}$ is not regular.

Proof. Assume L is regular. Hence, it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose w =def $(ab)^{m+1}b^m$. According to the pumping lemma, w can be written as xyz with |xy| ≤ m, so x and y fall within the $(ab)^{m+1}$ part. According to the pumping lemma, xz ∈ L and xyyz ∈ L. Consider y. There are two cases:

• $y = (ab)^ha$ or $y = b(ab)^h$, where h ≥ 0. In this case, xz ∉ L or xyyz ∉ L.

• $y = (ab)^h$ or $y = (ba)^h$, where h ≥ 1. In this case, xz = $(ab)^{m+1-h}b^m$, and since m + 1 − h ≤ m, xz ∉ L.

There is a contradiction in either case. Therefore, L is not regular. □

Example 4.11. Show that the language L =def $\{a^{n!} \mid n \ge 0\}$ is not regular.

Proof. Assume L is regular. Hence, it satisfies the pumping lemma. Let m be the number required by the pumping lemma. If m < 3, we use m = 3 instead (the pumping lemma still holds for any larger m). Choose w =def $a^{m!}$. According to the pumping lemma, w can be written as xyz, where y is a non-empty string of a’s. Note that 0 < |y| ≤ m. According to the pumping lemma, xz ∈ L.

We may compute the length of xz: |xz| = m! − |y|. Since m ≥ 3 and 0 < |y| ≤ m, we have (m − 1)! < m! − |y| < m!. Thus |xz| lies strictly between two consecutive factorials, which means that xz ∉ L.

This is a contradiction. Therefore, L is not regular. □
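The inequality (m − 1)! < m! − |y| < m! can be spot-checked numerically. The small Python sketch below is only a sanity check for a finite range of m, not a replacement for the argument; the name `strictly_between` is hypothetical.

```python
from math import factorial

def strictly_between(m):
    """Check (m-1)! < m! - k < m! for every possible |y| = k with 1 <= k <= m."""
    return all(factorial(m - 1) < factorial(m) - k < factorial(m)
               for k in range(1, m + 1))
```

`strictly_between(m)` holds for each m from 3 upward in the tested range. `strictly_between(2)` is False, because 2! − 1 = 1 = 1!, which is why the proof takes m ≥ 3.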

Example 4.12. Show that the language L =def $\{a^nb^kc^{n+k} \mid n \ge 0, k \ge 0\}$ is not regular.

Proof. Assume L is regular. Hence, it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose w =def $a^mb^mc^{2m}$. According to the pumping lemma, w can be written as xyz with |xy| ≤ m, so y is a non-empty string of a’s. According to the pumping lemma, xz ∈ L.

However, xz = $a^{m-|y|}b^mc^{2m}$, and (m − |y|) + m < 2m. This means that xz ∉ L.

This is a contradiction. Therefore, L is not regular. □

Example 4.13. Show that the language L =def $\{a^nb^l \mid n \ne l\}$ is not regular.

Proof. (1st proof) Assume L is regular. Hence, it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose w =def $a^{m!}b^{(m+1)!}$. According to the pumping lemma, w can be written as xyz with |xy| ≤ m, so y is a non-empty string of a’s. According to the pumping lemma, $xy^iz \in L$ for every i ≥ 0.

Next we show that it is always possible to find an appropriate i so that $xy^iz \notin L$. Let k =def |y|. Note that 0 < k ≤ m. Then $xy^iz$ contains (m! − k + ik) a’s and (m + 1)! b’s. Solve the equation

$m! - k + ik = (m+1)!$

We obtain $i = 1 + \frac{(m+1)! - m!}{k} = 1 + \frac{m \cdot m!}{k}$. Since 1 ≤ k ≤ m, k divides m!, so i is always a positive integer. For this i, $xy^iz$ has equally many a’s and b’s, so $xy^iz \notin L$.

This is a contradiction. Therefore, L is not regular.

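(2nd proof) Suppose L is regular. Then, by closure under complement and intersection, $\overline{L} \cap L(a^*b^*) = \{a^nb^n \mid n \ge 0\}$ must also be regular. But we have already shown that $\{a^nb^n \mid n \ge 0\}$ is not regular. Hence, L is not regular. □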
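The pumping exponent in the first proof can be computed and checked directly. In the Python sketch below, the name `pumping_exponent` is hypothetical, and the final check covers only a finite range of m and k.

```python
from math import factorial

def pumping_exponent(m, k):
    """The i solving m! - k + i*k = (m+1)!, i.e. i = 1 + m*m!/k (needs 1 <= k <= m)."""
    extra = factorial(m + 1) - factorial(m)     # equals m * m!
    assert extra % k == 0                       # k <= m, so k divides m!
    return 1 + extra // k
```

For example, m = 3 and |y| = 2 give i = 1 + 18/2 = 10, and then $xy^{10}z$ has 6 − 2 + 20 = 24 = 4! a’s, the same as its number of b’s.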

Review

1. nfa = dfa

2. regular languages = regular expressions = finite automata = regular grammars

3. closure properties

4. pumping lemma


A wrong proof based on pumping lemma.

Example 4.11. Show that the language L =def $\{a^{3n} \mid n \ge 0\}$ is not regular.

Proof. Assume L is regular. Hence, it satisfies the pumping lemma. Let m be the number required by the pumping lemma. Choose w =def $a^{3m}$. w may be written as xyz. Let y =def a. Then xyyz = $a^{3m+1} \notin L$. This is a contradiction. Therefore, L is not regular. □

What is wrong with the above proof?


Index

equality problem

membership algorithm
