

Page 1: chap5

Formal Languages

Chapter 5 Context-Free Languages

Wuu Yang

National Chiao-Tung University, Taiwan, R.O.C.

September 15, 2008


Page 2: chap5

Chapter Outline

1. Context-Free Grammars

2. Parsing and Ambiguity

3. Context-Free Grammars and Programming Languages


Page 3: chap5

We have seen many languages that are not regular, for instance, {(^n )^n | n ≥ 0}, which is a special case of the properly nested parentheses widely used in conventional programming languages.

Context-free languages are mostly used in the specification of high-level computer programming languages, such as Java and Perl.

Deciding the membership problem (whether a string belongs to a context-free language) is called parsing, which is performed in the front end of a compiler.


Page 4: chap5

§5.1 Context-Free Grammars

Definition. A grammar G =def (V, T, S, P) is a context-free grammar if all production rules in P have the form A → α, where A ∈ V and α ∈ (V ∪ T)∗. A language L is context-free if and only if L = L(G) for some context-free grammar G.

Note that a regular grammar satisfies the above definition and, hence, it is also a context-free grammar. Consequently, a regular language is also a context-free language.

Example 5.1. The following grammar is context-free, but is not regular.

S → aSa

S → bSb

S → λ

Here is a sample derivation: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa. The language generated by this grammar is {ww^R | w ∈ {a, b}∗}, which is context-free, but not regular. □

Page 5: chap5

Note that this grammar is linear (see slide 3-20) in that the right-hand side of each production rule contains at most one nonterminal. But it is neither right-linear nor left-linear.

From this example, we conclude that the family of regular languages is a proper subclass of the family of context-free languages.


Page 6: chap5

Example 5.2. The following grammar is context-free, but is not regular.

S → abB

A → aaBb

B → bbAa

A → λ

The language generated by this grammar is {ab(bbaa)^n bba(ba)^n | n ≥ 0}. This language, which is similar to {e^n f^n | n ≥ 0}, is not regular.

Note that although the right-hand side of each production rule contains at most one nonterminal, as in a right-linear grammar, the grammar is not right-linear (hence, not regular). □
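The closed form of the language in Example 5.2 can be checked mechanically for short strings. The following is a minimal sketch, assuming single-character nonterminals and the empty string "" in place of λ; the names RULES, generate, and closed_form are illustrative and do not come from the slides.

```python
from collections import deque

# Grammar of Example 5.2; "" stands for the empty string λ.
RULES = {
    "S": ["abB"],
    "A": ["aaBb", ""],
    "B": ["bbAa"],
}

def generate(max_len):
    """Return all sentences of length <= max_len by breadth-first derivation."""
    sentences = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        pos = next((i for i, c in enumerate(form) if c in RULES), None)
        if pos is None:
            sentences.add(form)
            continue
        for rhs in RULES[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Terminals are never erased, so overly long forms can be dropped.
            terminals = sum(1 for c in new_form if c not in RULES)
            if terminals <= max_len:
                queue.append(new_form)
    return sentences

def closed_form(n):
    """The string ab(bbaa)^n bba(ba)^n."""
    return "ab" + "bbaa" * n + "bba" + "ba" * n

print(sorted(generate(20), key=len))
```

Comparing generate(20) with the closed form for n = 0, 1, 2 is a sanity check on short strings only, not a proof that the two descriptions agree for all n.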


Page 7: chap5

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) = n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

V → VaVbV

V → VbVaV

V → λ

There are at least three derivations of the sentence abab that correspond to different derivation trees:

V ⇒ VaVbV ⇒ aVbV ⇒ abV ⇒ abVaVbV ⇒ abaVbV ⇒ ababV ⇒ abab

V ⇒ VaVbV ⇒ VaVb ⇒ Vab ⇒ VaVbVab ⇒ aVbVab ⇒ abVab ⇒ abab

V ⇒ VaVbV ⇒ VaVb ⇒ aVb ⇒ aVbVaVb ⇒ aVbVab ⇒ abVab ⇒ abab

We say this grammar is ambiguous. □


Page 8: chap5

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) ≥ n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

T → TaT

T → V

V → VaVbV | VbVaV | λ

This grammar is also ambiguous. □

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) > n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

S → TaT

T → TaT | V

V → VaVbV | VbVaV | λ

This grammar is also ambiguous. □


Page 9: chap5

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) ≠ n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

S → TaT | UbU

T → TaT | V

U → UbU | V

V → VaVbV | VbVaV | λ

This grammar is also ambiguous.

This language is the complement of a previous context-free language, {w ∈ {a, b}∗ | n_a(w) = n_b(w)}. □


Page 10: chap5

Example. The language L =def {a^n b^m | n = m} is context-free, but is not regular. We can derive a grammar for L:

V → aVb

V → λ

This grammar is unambiguous. □

Example. The language L =def {a^n b^m | n ≥ m} is context-free, but is not regular. We can derive a grammar for L:

T → aT

T → V

V → aVb | λ

The strings derived from T contain at least as many a’s as b’s. This grammar is unambiguous. □


Page 11: chap5

Example. The language L =def {a^n b^m | n > m} is context-free, but is not regular. We can derive a grammar for L:

S → aT

T → aT | V

V → aVb | λ

The strings derived from S contain strictly more a’s than b’s. This grammar is unambiguous. □


Page 12: chap5

Example 5.3. The language L =def {a^n b^m | n ≠ m} is context-free, but is not regular. We can derive a grammar for L:

S → aT | Ub

T → aT | V

U → Ub | V

V → aVb | λ

The strings derived from S contain either (1) strictly more a’s than b’s (if we take S → aT in the first derivation step) or (2) strictly more b’s than a’s (if we take S → Ub in the first derivation step). This grammar is unambiguous.

(2nd solution). Here is the grammar from the textbook:

S → AV | VB

A → aA | a

B → Bb | b

V → aVb | λ

This grammar is unambiguous. □

Page 13: chap5

How can we show that the two grammars generate the same language?
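A proof would argue by induction on derivations, but a bounded comparison is a quick sanity check before attempting one. The sketch below enumerates every sentence of length at most max_len from both grammars of Example 5.3 and compares the two sets with the defining condition n ≠ m. The helper names (language_up_to, G1, G2, expected) are illustrative assumptions, and agreement on short strings is evidence only, not a proof.

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Set of sentences of length <= max_len derivable from start.

    rules maps each nonterminal (one character) to its right-hand sides;
    "" stands for λ. The leftmost nonterminal is always expanded.
    """
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        pos = next((i for i, c in enumerate(form) if c in rules), None)
        if pos is None:
            sentences.add(form)
            continue
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new_form if c not in rules)
            # Assumption: bounding the terminal count keeps the search finite
            # here, because every cycle of rules in G1 and G2 adds a terminal.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences

# First solution and textbook solution of Example 5.3.
G1 = {"S": ["aT", "Ub"], "T": ["aT", "V"], "U": ["Ub", "V"], "V": ["aVb", ""]}
G2 = {"S": ["AV", "VB"], "A": ["aA", "a"], "B": ["Bb", "b"], "V": ["aVb", ""]}

def expected(max_len):
    """All strings a^n b^m with n != m and n + m <= max_len."""
    return {"a" * n + "b" * m
            for n in range(max_len + 1) for m in range(max_len + 1)
            if n != m and n + m <= max_len}

print(language_up_to(G1, "S", 8) == language_up_to(G2, "S", 8))
```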

Exercise 25. Find a linear grammar for this language.


Page 14: chap5

Example 5.4. Consider the following grammar:

S → aSb | SS | λ

The language generated by this grammar is {w ∈ {a, b}∗ | n_a(w) = n_b(w) and n_a(v) ≥ n_b(v) for any prefix v of w}. This is the language of properly nested parentheses commonly used in computer programming languages and mathematical expressions. □

This language is not regular.
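The set description of Example 5.4 translates directly into a one-pass check: read a as an opening parenthesis and b as a closing one, and track the difference n_a(v) − n_b(v) over the prefixes v. A minimal sketch follows; the function name is illustrative.

```python
def is_properly_nested(w):
    """True if n_a(w) = n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    depth = 0                      # n_a(v) - n_b(v) for the prefix read so far
    for symbol in w:
        if symbol == "a":
            depth += 1
        elif symbol == "b":
            depth -= 1
        else:
            return False           # not a string over {a, b}
        if depth < 0:
            return False           # some prefix has more b's than a's
    return depth == 0

for w in ["", "ab", "aabb", "abab", "abba", "aab"]:
    print(repr(w), is_properly_nested(w))
```

The unbounded counter is what a finite automaton lacks, which is consistent with the language not being regular.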

Question. Is there a linear grammar for this language? (See chapter 8.)


Page 15: chap5

Leftmost and Rightmost Derivations

A derivation is a sequence of steps. In each step we expand a nonterminal A by replacing A with the right-hand side of an A-production rule. For example, consider the following grammar:

S → AB

A → aaA

A → λ

B → Bb

B → λ

The language generated by this grammar is {a^{2n} b^m | n ≥ 0, m ≥ 0}. The string aab is a sentence (or an element) of this language. Here are two derivations of this sentence:

S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab

S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab


Page 16: chap5

The result of each derivation step is called a sentential form. The derivation stops when no nonterminal is left. A sentential form without nonterminals is called a sentence.

The first derivation is a leftmost derivation, in which the leftmost nonterminal is expanded in every step. Similarly, the second derivation is a rightmost derivation, in which the rightmost nonterminal is expanded in every step.
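The two derivations of aab can be replayed step by step. The sketch below applies a given sequence of rules, always to the leftmost or always to the rightmost nonterminal, and records the sentential forms. The function derive and the rule lists are illustrative; "" stands for λ.

```python
NONTERMINALS = {"S", "A", "B"}

def derive(rule_sequence, leftmost=True, start="S"):
    """Apply rules in order to the leftmost (or rightmost) nonterminal.

    Each rule is a pair (lhs, rhs). Returns the list of sentential forms,
    beginning with the start symbol.
    """
    form = start
    forms = [form]
    for lhs, rhs in rule_sequence:
        positions = [i for i, c in enumerate(form) if c in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"expected {lhs} at position {pos}, found {form[pos]}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms

leftmost_steps = [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")]
rightmost_steps = [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")]

print(" => ".join(derive(leftmost_steps)))
print(" => ".join(derive(rightmost_steps, leftmost=False)))
```

Both runs use the same five rules in a different order, so they describe the same derivation tree.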

A derivation can be drawn as a derivation tree. A derivation tree is also called a syntax tree. For example:


Page 17: chap5

[Fig 5.1: (a) a derivation tree; (b) a partial derivation tree]

A derivation tree is an ordered tree, which means that there is an ordering among siblings. The root of a derivation tree is labelled with the start symbol of the grammar. Each leaf is labelled with an element of T ∪ {λ}. Each internal node is labelled with a nonterminal (or a variable, which is an element of V). A subtree of the derivation tree with some sub-subtrees removed is called a partial derivation tree.


Page 18: chap5

Example 5.6. Consider the following grammar:

S → aAB

A → bBb

B → A | λ

The language generated by this grammar is {a(bb)^m | m ≥ 1}. The string abbbb is a sentence of this language. Here is a leftmost derivation of this sentence:

S ⇒ aAB ⇒ abBbB ⇒ abbB ⇒ abbA ⇒ abbbBb ⇒ abbbb


Page 19: chap5

[Fig 5.2: (a) a derivation tree; (b) a partial derivation tree]


Page 20: chap5

Theorem 5.1. There is an obvious correspondence between a derivation of a sentence w ∈ L(G) and its derivation tree.

Page 21: chap5

§5.2 Parsing and Ambiguity

There are two sides of a (context-free) grammar:

• We may use a grammar to generate sentences (derivation).

• We may ask whether a string can be generated by a grammar (parsing).

A simple parsing method is to try all possible derivations and see if the string could be derived.

We use a top-down, breadth-first, left-to-right approach.


Page 22: chap5

0. exhaustive search
1. Input is a string w and a grammar G.
2. T = {S} (the start symbol of the grammar)
3. repeat
4.   for each sentential form f in T do
5.     locate the leftmost nonterminal, say A,
6.     expand A with every A-rule
7.     T := (T − {f}) ∪ { new sentential forms }
8.     delete those sentential forms that cannot generate the required string.
9.   end for
10. until we find a leftmost derivation of the string or the collection of sentential forms becomes empty.

If w ∈ L(G), then this algorithm always terminates and returns a leftmost derivation of w. If w ∉ L(G), this algorithm may not terminate.
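The numbered steps above can be sketched in Python as follows. This is one possible reading of the procedure, not the slides' own implementation. The pruning test in viable is an assumption (the slides do not say how step 8 decides that a form "cannot generate the required string"), and the max_rounds cutoff is added only so that the sketch always returns. Nonterminals are single characters and "" stands for λ.

```python
def exhaustive_parse(rules, start, w, max_rounds=10):
    """Breadth-first search for a leftmost derivation of w.

    Returns the derivation as a list of sentential forms, or None if no
    derivation is found within max_rounds rounds.
    """
    def viable(form):
        # Step 8 (assumed test): terminals are never erased, so the terminal
        # prefix and suffix of the form must match w, and the form may not
        # contain more than |w| terminals.
        indices = [i for i, c in enumerate(form) if c in rules]
        if not indices:
            return False              # a sentence other than w
        terminals = len(form) - len(indices)
        return (terminals <= len(w)
                and w.startswith(form[:indices[0]])
                and w.endswith(form[indices[-1] + 1:]))

    frontier = [[start]]              # each entry is a derivation so far
    for _ in range(max_rounds):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            pos = next(i for i, c in enumerate(form) if c in rules)
            for rhs in rules[form[pos]]:
                new_form = form[:pos] + rhs + form[pos + 1:]
                if new_form == w:
                    return derivation + [new_form]
                if viable(new_form):
                    next_frontier.append(derivation + [new_form])
        if not next_frontier:
            return None               # no sentential form left
        frontier = next_frontier
    return None                       # round limit reached without an answer

GRAMMAR = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(GRAMMAR, "S", "aabb"))
```

For the grammar S → SS | aSb | bSa | λ and the string aabb, the search finds the derivation S ⇒ aSb ⇒ aaSbb ⇒ aabb in the third round. For a string outside the language, such as aab, the rule S → SS keeps producing new forms, so only the round limit stops the search.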


Page 23: chap5

An alternative strategy of exhaustive search. We may do this by following the leftmost derivation. When expanding a nonterminal A, we try each A-rule in turn. Deriving a sentence stops whenever it is possible to decide whether the result is the required string.

This exhaustive search method may not terminate, even if w ∈ L(G), due to left-recursive rules (that is, rules of the form L → Lα). This same problem occurs if we follow the rightmost derivation, due to right-recursive rules.

This is a top-down, depth-first approach. □


Page 24: chap5

Recall the reverse of a grammar, G^R, defined in §3.3. A leftmost derivation in G corresponds to a rightmost derivation in G^R.

Page 25: chap5

Example 5.7. Consider the string aabb and the grammar

S → SS | aSb | bSa | λ

In the 1st round, we will try the following derivations in turn:

S ⇒ SS

S ⇒ aSb

S ⇒ bSa

S ⇒ λ

The last two derivations cannot lead to the string aabb. In the 2nd round, we have 8 sentential forms:

S ⇒ SS ⇒ SSS

S ⇒ SS ⇒ aSbS

S ⇒ SS ⇒ bSaS


Page 26: chap5

S ⇒ SS ⇒ λS

S ⇒ aSb ⇒ aSSb

S ⇒ aSb ⇒ aaSbb

S ⇒ aSb ⇒ abSab

S ⇒ aSb ⇒ aλb

The 3rd, 7th, and 8th derivations cannot lead to the required string aabb. There are 5 sentential forms left. We may conduct the 3rd round and will find a leftmost derivation:

S ⇒ aSb ⇒ aaSbb ⇒ aaλbb = aabb


Page 27: chap5

Problems with exhaustive search:

• It is inefficient.

• It may not terminate if w ∉ L(G). If we impose the additional constraint that there are no λ-rules (that is, rules of the form A → λ) and no rules of the form A → B, then the above exhaustive search method always terminates with a correct, definite answer whether or not w ∈ L(G). We will see later that this constraint does not affect the power of context-free grammars in any significant way.


Page 28: chap5

Example 5.8. The grammar in Example 5.7 is equivalent to the following grammar (except for the empty sentence), which satisfies the above constraint (no λ-rules and no rules of the form A → B):

T → TT | aTb | bTa | ab | ba

□

Corollary. Let G be a context-free grammar which does not include rules of the forms A → λ and A → B where A, B ∈ V. Then the derivation of a sentence w ∈ L(G) takes at most 2|w| − 1 steps.

Proof. Note that in such grammars, every derivation step either increases the length of the derived sentential form by at least 1 or changes a nonterminal to a terminal (with a rule A → a). The length grows from 1 to |w| and never decreases, so at most |w| − 1 steps are of the first kind. Terminals are never erased, so at most |w| steps are of the second kind. □


Page 29: chap5

Theorem 5.2. Let G be a context-free grammar which does not include rules of the forms A → λ and A → B where A, B ∈ V. Then the exhaustive search method always terminates with a correct answer.

Proof. Due to the above corollary, we can limit our search to at most 2|w| − 1 rounds (there is one derivation step per round), where w is the given string. If w ∈ L(G) we will find a (leftmost) derivation. Otherwise, the search will terminate with a NO answer. □


Page 30: chap5

Next we will consider the time complexity of exhaustive search.

Initially, there is a single sentential form (which consists of the single start symbol S). In each round, a sentential form is expanded into at most |P| new sentential forms. There are at most 2|w| − 1 rounds. Hence an upper bound on the number of sentential forms is

|P| + |P|^2 + |P|^3 + ... + |P|^{2|w|−1} = (|P|^{2|w|} − |P|) / (|P| − 1) = O(|P|^{2|w|})

This is exponential in the length |w| of the input string. There are more efficient general parsers, such as the CYK and Earley parsers.


Page 31: chap5

Theorem 5.3. Every context-free grammar has an O(n^3)-time parser, where n is the length of the input string.
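The slides name the CYK parser without describing it. As an illustration of where a cubic bound can come from, here is a minimal sketch of the standard CYK table-filling recognizer, which requires a grammar in Chomsky normal form. The CNF grammar below is a hand conversion of T → TT | aTb | bTa | ab | ba made for this example; the helper nonterminals A, B, X, Y and all Python names are assumptions and do not appear in the slides.

```python
# Hand-made CNF version of T → TT | aTb | bTa | ab | ba:
#   T → TT | AX | BY | AB | BA,  X → TB,  Y → TA,  A → a,  B → b
BINARY = {                         # (B, C) -> set of A with a rule A → BC
    ("T", "T"): {"T"},
    ("A", "X"): {"T"},
    ("B", "Y"): {"T"},
    ("A", "B"): {"T"},
    ("B", "A"): {"T"},
    ("T", "B"): {"X"},
    ("T", "A"): {"Y"},
}
UNARY = {"a": {"A"}, "b": {"B"}}   # terminal -> set of A with a rule A → a

def cyk(w, start="T"):
    """Return True if w is generated by the CNF grammar above."""
    n = len(w)
    if n == 0:
        return False               # this grammar has no λ-rule
    # table[i][l] holds the nonterminals that derive w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][1] = set(UNARY.get(c, ()))
    for length in range(2, n + 1):             # substring length
        for i in range(n - length + 1):        # substring start
            for split in range(1, length):     # length of the left part
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= BINARY.get((left, right), set())
    return start in table[0][n]

for w in ["aabb", "abba", "aab", "ba"]:
    print(w, cyk(w))
```

The three nested loops over length, start, and split give the n^3 factor; the inner loops depend only on the size of the grammar.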

Context-free grammars and parsing are used mostly in programming languages and compilers.

In practice we usually require a linear-time parser.

Not all context-free grammars have a linear-time parser.


Page 32: chap5

Definition. A (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more leftmost derivations.

Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more rightmost derivations.

Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more derivation trees.

Example 5.10. The grammar S → aSb | SS | λ is ambiguous since the sentence aabb has the following two leftmost derivations:

S ⇒ SS ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aabb

S ⇒ aSb ⇒ aaSbb ⇒ aabb


Page 33: chap5

[Fig 5.4: two derivation trees for the sentence aabb]

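Sometimes it is possible to transform an ambiguous grammar into an unambiguous one. For instance, the above grammar is equivalent to the following unambiguous grammar:

S → T | λ

T → U | UT

U → ab | aTb

Here U derives a single properly nested block that opens with a and closes with its matching b, and T derives a nonempty sequence of such blocks.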


Page 34: chap5

It is very difficult to determine if a context-free grammar is ambiguous. (We will discuss this later.)

Example 5.11. The following grammar

E → E + E | E ∗ E | (E) | a

is ambiguous. This grammar is used to model the usual arithmetic expressions.
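The ambiguity of the grammar in Example 5.11 can be made concrete by counting derivation trees. The sketch below counts, for every substring, the number of trees that derive it from E, trying each operator as the top-level one. It is written for this particular grammar only; the function name is illustrative, input is written without spaces, and the ASCII character * stands for ∗.

```python
from functools import lru_cache

def count_trees(s):
    """Number of derivation trees of s in E → E + E | E * E | (E) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees deriving s[i:j] from E; always called with j > i.
        if j - i == 1:
            return 1 if s[i] == "a" else 0
        total = 0
        # Rule E → (E)
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)
        # Rules E → E + E and E → E * E, with the operator at position k
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s)) if s else 0

for s in ["a", "a+a*a", "(a+a)*a", "a+a+a"]:
    print(s, count_trees(s))
```

The sentence a+a∗a has two trees, one grouping it as (a+a)∗a and one as a+(a∗a), so the grammar is ambiguous. Parentheses in the input remove the choice.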

Usually, we impose the additional stipulation that ∗ is performed before + (that is, ∗ has a higher precedence than +). We may use the following (unambiguous) grammar to show this precedence:

Example 5.12.

E → E + T | T

T → T ∗ F | F

F → (E) | a
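To see how the layered grammar of Example 5.12 fixes precedence, the following sketch parses a string and returns a fully parenthesized copy that mirrors the derivation tree. It is a hand-written recursive-descent parser in which the left-recursive rules E → E + T and T → T ∗ F are realized as loops; that is a common implementation technique chosen for this example, not something prescribed by the slides. All function names are illustrative and the ASCII character * stands for ∗.

```python
def parenthesize(s):
    """Parse s with E → E + T | T, T → T * F | F, F → (E) | a.

    Returns a fully parenthesized string showing the tree structure.
    Raises SyntaxError if s is not a sentence of the grammar.
    """
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def parse_e():
        # E → E + T | T: a T followed by zero or more "+ T", grouped to the left
        tree = parse_t()
        while peek() == "+":
            expect("+")
            right = parse_t()
            tree = f"({tree}+{right})"
        return tree

    def parse_t():
        # T → T * F | F: an F followed by zero or more "* F", grouped to the left
        tree = parse_f()
        while peek() == "*":
            expect("*")
            right = parse_f()
            tree = f"({tree}*{right})"
        return tree

    def parse_f():
        # F → (E) | a
        if peek() == "(":
            expect("(")
            tree = parse_e()
            expect(")")
            return tree
        expect("a")
        return "a"

    result = parse_e()
    if pos != len(s):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return result

for s in ["a+a*a", "a*a+a", "a+a+a", "(a+a)*a"]:
    print(s, "->", parenthesize(s))
```

Because ∗ is introduced one level below +, a product is always completed before it becomes an operand of a sum.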


Page 35: chap5

The above examples show that a context-free grammar can be used to impose precedence. Similarly, associativity can also be enforced by context-free grammars.

For left-associative operations, such as +:

L → L + E | E

For right-associative operations, such as ∗∗:

R → E ∗∗ R | E


Page 36: chap5

We have shown that ambiguity can sometimes be removed by properly transforming the grammar. However, this is not always possible.

Certain context-free languages have only ambiguous grammars. They are called inherently ambiguous languages.

Definition. Let L be a context-free language. If L has an unambiguous grammar, it is unambiguous. Otherwise, it is inherently ambiguous.

Example 5.13. Consider the following language:

L =def {a^n b^n c^m} ∪ {a^n b^m c^m}, where n, m ≥ 0.

The left part {a^n b^n c^m} can be generated by a grammar:

S → Sc | A

A → aAb | λ

Similarly, the right part {a^n b^m c^m} can be generated by a grammar:


Page 37: chap5

T → aT | B

B → bBc | λ

Their union is described by one additional rule:

Q → S | T

The string a^n b^n c^n, which belongs to both parts, has two derivations.

Although this does not show that L is inherently ambiguous, it suggests that the two parts can never be combined in a single unambiguous grammar.


Page 38: chap5

§5.3 Context-Free Grammars and Programming Languages

The syntax of a programming language is usually specified by a context-free grammar. For the sake of parsing efficiency, we are usually restricted to the subclass of LL(1) or LR(1) grammars.

The following page contains C’s LALR(1) grammar.


Page 39: chap5


Page 40: chap5

Index

ambiguous grammar, 7, 32

sentence, 16

sentential form, 16
