chap5

Formal Languages

Chapter 5 Context-Free Languages

Wuu Yang

National Chiao-Tung University, Taiwan, R.O.C.

September 15, 2008

1

Chapter Outline

1. Context-Free Grammars

2. Parsing and Ambiguity

3. Context-Free Grammars and Programming Languages

2

We have seen many languages that are not regular, for instance {(^n )^n | n ≥ 0}, which is a special case of the properly nested parentheses widely used in conventional programming languages.

Context-free languages are mostly used in the specification of high-level computer programming languages, such as Java and Perl.

Deciding the membership problem (whether a given string belongs to a context-free language) is called parsing. Parsing is performed in the front end of a compiler.

3

§5.1 Context-Free Grammars

Definition. A grammar G =def (V, T, S, P) is a context-free grammar if all production rules in P have the form A → α, where A ∈ V and α ∈ (V ∪ T)∗. A language L is context-free if and only if L = L(G) for some context-free grammar G.

Note that a regular grammar satisfies the above definition and, hence, is also a context-free grammar. Consequently, a regular language is also a context-free language.

Example 5.1. The following grammar is context-free, but is not regular.

S → aSa

S → bSb

S → λ

Here is a sample derivation: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa. The language generated by this grammar is {ww^R | w ∈ {a, b}∗}, which is context-free, but not regular. □

4

Note that this grammar is linear (see slide 3-20) in that the right-hand side of each production rule contains at most one nonterminal. But it is neither right-linear nor left-linear.
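The distinction between linear, right-linear and left-linear grammars can be checked mechanically. The following is a minimal sketch, assuming single-character symbols, uppercase letters for nonterminals, and the empty string for λ. The function name `classify` and this encoding are illustrative assumptions, not part of the slides.

```python
def classify(rules):
    """Return the set of shape properties that hold for every production.

    rules maps a nonterminal (an uppercase letter) to a list of right-hand
    sides; the empty string stands for λ.
    """
    props = {"linear", "right-linear", "left-linear"}
    for rhs_list in rules.values():
        for rhs in rhs_list:
            positions = [i for i, sym in enumerate(rhs) if sym.isupper()]
            if len(positions) > 1:
                return set()  # two or more nonterminals: not even linear
            if positions and positions[0] != len(rhs) - 1:
                props.discard("right-linear")  # nonterminal is not last
            if positions and positions[0] != 0:
                props.discard("left-linear")   # nonterminal is not first
    return props


example_5_1 = {"S": ["aSa", "bSb", ""]}
print(classify(example_5_1))  # {'linear'}
```

For Example 5.1 only the property "linear" survives, because the rule S → aSa places the nonterminal in the middle of the right-hand side.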

From this example, we conclude that the family of regular languages is a proper subclass of the family of context-free languages.

5

Example 5.2. The following grammar is context-free, but is not regular.

S → abB

A → aaBb

B → bbAa

A → λ

The language generated by this grammar is {ab(bbaa)^n bba(ba)^n | n ≥ 0}. This language, which is similar to {e^n f^n | n ≥ 0}, is not regular.

Note that although, as in a right-linear grammar, the right-hand side of each production rule contains at most one nonterminal, the grammar is not right-linear (hence, not regular). □
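The closed form of the language in Example 5.2 can be spot-checked by replaying the rules. This is a small sketch, assuming the same string encoding as the grammar itself (uppercase nonterminals, lowercase terminals). The helper names `derive` and `closed_form` are hypothetical, and a check for a few values of n is evidence, not a proof.

```python
def derive(n):
    """Apply the Example 5.2 rules, using A → aaBb exactly n times."""
    form = "abB"                           # S ⇒ abB
    for _ in range(n):
        form = form.replace("B", "bbAa")   # B → bbAa
        form = form.replace("A", "aaBb")   # A → aaBb
    form = form.replace("B", "bbAa")       # B → bbAa
    return form.replace("A", "")           # A → λ


def closed_form(n):
    """The string ab(bbaa)^n bba(ba)^n."""
    return "ab" + "bbaa" * n + "bba" + "ba" * n


for n in range(4):
    assert derive(n) == closed_form(n)
print(derive(1))  # abbbaabbaba
```

Each sentential form contains exactly one nonterminal, so `str.replace` always rewrites a single occurrence.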

6

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) = n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

V → VaVbV

V → VbVaV

V → λ

The sentence abab has several essentially different derivations, for example:

V ⇒ VaVbV ⇒ aVbV ⇒ abV ⇒ abVaVbV ⇒ abaVbV ⇒ ababV ⇒ abab

V ⇒ VaVbV ⇒ VaVb ⇒ Vab ⇒ VaVbVab ⇒ aVbVab ⇒ abVab ⇒ abab

V ⇒ VaVbV ⇒ VaVb ⇒ aVb ⇒ aVbVaVb ⇒ aVbVab ⇒ abVab ⇒ abab

We say this grammar is ambiguous (the precise definition is given in §5.2). □
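One way to make the ambiguity concrete is to count derivation trees. The sketch below assumes the grammar V → VaVbV | VbVaV | λ of this example; the function name `count_trees` is hypothetical. It picks, for each of the two non-λ rules, which occurrences of the two terminals are produced by the rule at the root, and recurses on the three remaining substrings.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(w):
    """Number of derivation trees for w under V → VaVbV | VbVaV | λ."""
    total = 1 if w == "" else 0            # V → λ
    for first, second in (("a", "b"), ("b", "a")):
        # Choose which occurrences of the two terminals come from the root rule.
        for i, x in enumerate(w):
            if x != first:
                continue
            for j in range(i + 1, len(w)):
                if w[j] == second:
                    total += (count_trees(w[:i])
                              * count_trees(w[i + 1:j])
                              * count_trees(w[j + 1:]))
    return total


print(count_trees("abab"))  # 3
```

A count greater than 1 for some sentence means the grammar is ambiguous; here abab has three trees.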

7

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) ≥ n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

T → TaT

T → V

V → VaVbV | VbVaV | λ

This grammar is also ambiguous. □

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) > n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

S → TaT

T → TaT | V

V → VaVbV | VbVaV | λ

This grammar is also ambiguous. □

8

Example. The language L =def {w ∈ {a, b}∗ | n_a(w) ≠ n_b(w)} is context-free, but is not regular. We can derive a grammar for L:

S → TaT | UbU

T → TaT | V

U → UbU | V

V → VaVbV | VbVaV | λ

This grammar is also ambiguous.

This language is the complement (with respect to {a, b}∗) of the context-free language {w | n_a(w) = n_b(w)} from an earlier example. □

9

Example. The language L =def {a^n b^m | n = m} is context-free, but is not regular. We can derive a grammar for L:

V → aVb

V → λ

This grammar is unambiguous. □

Example. The language L =def {a^n b^m | n ≥ m} is context-free, but is not regular. We can derive a grammar for L:

T → aT

T → V

V → aVb | λ

The strings derived from T contain zero or more a's in excess of the b's. This grammar is unambiguous. □

10

Example. The language L =def {a^n b^m | n > m} is context-free, but is not regular. We can derive a grammar for L:

S → aT

T → aT | V

V → aVb | λ

The strings derived from S contain one or more a's in excess of the b's. This grammar is unambiguous. □

11

Example 5.3. The language L =def {a^n b^m | n ≠ m} is context-free, but is not regular. We can derive a grammar for L:

S → aT | Ub

T → aT | V

U → Ub | V

V → aVb | λ

Either (1) the strings derived from S contain one or more a's in excess of the b's (if we take S → aT in the first derivation step) or (2) they contain one or more b's in excess of the a's (if we take S → Ub in the first derivation step). This grammar is unambiguous.

(2nd solution). Here is the grammar from the textbook:

S → AV | VB

A → aA | a

B → Bb | b

V → aVb | λ

This grammar is unambiguous. □

12

How can we show that the two grammars generate the same language?
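A proof needs an argument about all strings, but a bounded comparison is a useful sanity check before attempting one. The sketch below enumerates every sentence with at most 6 terminals for both grammars of Example 5.3 and compares the results with {a^n b^m | n ≠ m}. The function `sentences` and the dictionary encoding (empty string for λ) are assumptions made for illustration. The search terminates for these two grammars because no rule can lengthen a sentential form without adding a terminal.

```python
def sentences(rules, start, max_len):
    """All sentences with at most max_len terminals, by exhaustive
    leftmost derivation. rules maps nonterminals to right-hand sides."""
    found, seen, stack = set(), {start}, [start]
    while stack:
        form = stack.pop()
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:
            found.add(form)                  # no nonterminal left: a sentence
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for sym in new if sym not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return found


g1 = {"S": ["aT", "Ub"], "T": ["aT", "V"], "U": ["Ub", "V"], "V": ["aVb", ""]}
g2 = {"S": ["AV", "VB"], "A": ["aA", "a"], "B": ["Bb", "b"], "V": ["aVb", ""]}
expected = {"a" * n + "b" * m
            for n in range(7) for m in range(7) if n != m and n + m <= 6}

print(sentences(g1, "S", 6) == sentences(g2, "S", 6))  # True
print(sentences(g1, "S", 6) == expected)               # True
```

Agreement up to a length bound does not prove equivalence; it only fails to find a counterexample.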

Exercise 25. Find a linear grammar for this language.

13

Example 5.4. Consider the following grammar:

S → aSb | SS | λ

The language generated by this grammar is {w ∈ {a, b}∗ | n_a(w) = n_b(w), and n_a(v) ≥ n_b(v) for every prefix v of w}. This is the language of properly nested parentheses commonly used in computer programming languages and mathematical expressions. □

This language is not regular.
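The prefix condition in Example 5.4 translates directly into a one-pass membership test. The following is a minimal sketch, with a reading as "(" and b as ")"; the function name `properly_nested` is hypothetical.

```python
def properly_nested(w):
    """Check n_a(w) = n_b(w) and n_a(v) ≥ n_b(v) for every prefix v of w."""
    depth = 0                      # n_a(v) − n_b(v) for the prefix read so far
    for sym in w:
        if sym == "a":
            depth += 1
        elif sym == "b":
            depth -= 1
        else:
            return False           # not a string over {a, b}
        if depth < 0:
            return False           # some prefix has more b's than a's
    return depth == 0


print(properly_nested("aababb"))  # True
print(properly_nested("abba"))    # False
```

The counter may grow without bound, which is consistent with the language not being regular.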

Question. Is there a linear grammar for this language? (See chapter 8.)

14

Leftmost and Rightmost Derivations

A derivation is a sequence of steps. In each step we expand a nonterminal A by replacing A with the right-hand side of an A-production rule. For example, consider the following grammar:

S → AB

A → aaA

A → λ

B → Bb

B → λ

The language generated by this grammar is {a^{2n} b^m | n ≥ 0, m ≥ 0}. The string aab is a sentence (or an element) of this language. Here are two derivations of this sentence:

S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab

S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab

15

The result of each derivation step is called a sentential form. The derivation stops when no nonterminals are left. A sentential form without nonterminals is called a sentence.

The first derivation is a leftmost derivation, in which the leftmost nonterminal is expanded at each step. Similarly, the second derivation is a rightmost derivation, in which the rightmost nonterminal is expanded at each step.
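The two derivations of aab can be replayed step by step. In this sketch a derivation is given as a list of (nonterminal, right-hand side) pairs; the function `replay` is a hypothetical helper that always expands the leftmost (or rightmost) nonterminal and rejects a step that names a different one.

```python
def replay(start, steps, nonterminals, leftmost=True):
    """Apply (nonterminal, rhs) steps, always expanding the leftmost
    (or rightmost) nonterminal, and return the list of sentential forms."""
    forms = [start]
    form = start
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"{lhs} is not the nonterminal to expand in {form}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


V = {"S", "A", "B"}
left = replay("S", [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")], V)
right = replay("S", [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")], V,
               leftmost=False)
print(" ⇒ ".join(left))   # S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
print(" ⇒ ".join(right))  # S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
```

Both derivations use the same five rule applications; only the order differs.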

A derivation can be drawn as a derivation tree. A derivation tree is also called a syntax tree. For example:

16

[Fig 5.1: (a) a derivation tree; (b) a partial derivation tree]

A derivation tree is an ordered tree, which means that there is an ordering among siblings. The root of a derivation tree is labelled with the start symbol of the grammar. Each leaf is labelled with an element of T ∪ {λ}. Each internal node is labelled with a nonterminal (or variable, that is, an element of V). A subtree of the derivation tree with some sub-subtrees removed is called a partial derivation tree.
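Because a derivation tree is ordered, reading its leaves from left to right gives back the derived sentence. The sketch below uses a hypothetical encoding (a node is a pair of a label and a list of children) and builds one derivation tree for aab under the grammar S → AB, A → aaA | λ, B → Bb | λ. It is not claimed to reproduce the figure exactly.

```python
def tree_yield(node):
    """Concatenate the leaf labels of a derivation tree from left to right.

    A node is (label, children); a leaf has an empty child list, and the
    leaf label "" stands for λ.
    """
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


def leaf(label):
    """Build a leaf node."""
    return (label, [])


# One derivation tree for aab under S → AB, A → aaA | λ, B → Bb | λ.
tree = ("S", [
    ("A", [leaf("a"), leaf("a"), ("A", [leaf("")])]),
    ("B", [("B", [leaf("")]), leaf("b")]),
])
print(tree_yield(tree))  # aab
```

The leftmost and rightmost derivations of aab shown earlier both correspond to this single tree.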

17

Example 5.6. Consider the following grammar:

S → aAB

A → bBb

B → A | λ

The language generated by this grammar is {a(bb)^m | m ≥ 1}. The string abbbb is a sentence of this language. Here is a leftmost derivation of this sentence:

S ⇒ aAB ⇒ abBbB ⇒ abbB ⇒ abbA ⇒ abbbBb ⇒ abbbb

18

[Fig 5.2: (a) a derivation tree; (b) a partial derivation tree]

19

Theorem 5.1. There is an obvious correspondence between a derivation of a sentence w ∈ L(G) and its derivation tree.

20

§5.2 Parsing and Ambiguity

There are two sides of a (context-free) grammar:

• We may use a grammar to generate sentences (derivation).

• We may ask whether a string can be generated by a grammar (parsing).

A simple parsing method is to try all possible derivations and see if the string can be derived.

We use a top-down, breadth-first, left-to-right approach.

21

0. exhaustive search
1. Input is a string w and a grammar G.
2. T = {S} (the start symbol of the grammar)
3. repeat
4.   for each sentential form f in T do
5.     locate the leftmost nonterminal, say A,
6.     expand A with every A-rule
7.     T := (T − {f}) ∪ { new sentential forms }
8.     delete those sentential forms that cannot generate the required string.
9.   end for
10. until we find a leftmost derivation of the string or the collection of sentential forms becomes empty.

If w ∈ L(G), then this algorithm always terminates and returns a leftmost derivation of w. If w ∉ L(G), this algorithm may not terminate.
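The exhaustive search can be sketched in a few lines. Two choices below are assumptions that the slides leave open: the pruning test `hopeless` (too many terminals, a finished sentence other than w, or a terminal prefix that does not match w), and the `max_rounds` cut-off, which is added so that the sketch always returns even when w ∉ L(G).

```python
def exhaustive_parse(rules, start, w, max_rounds):
    """Breadth-first search over leftmost derivations.

    rules maps each nonterminal to its right-hand sides ("" stands for λ).
    Returns a leftmost derivation of w as a list of sentential forms, or
    None if no derivation is found within max_rounds rounds.
    """
    def hopeless(form):
        terminals = [sym for sym in form if sym not in rules]
        if len(terminals) > len(w):
            return True                        # already too many terminals
        if len(terminals) == len(form):
            return form != w                   # a sentence other than w
        first = next(i for i, sym in enumerate(form) if sym in rules)
        return not w.startswith(form[:first])  # terminal prefix must match

    frontier = [[start]]                       # each entry: a derivation so far
    for _ in range(max_rounds):
        next_frontier, seen = [], set()
        for derivation in frontier:
            form = derivation[-1]
            pos = next(i for i, sym in enumerate(form) if sym in rules)
            for rhs in rules[form[pos]]:       # expand with every rule
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return derivation + [new]
                if new not in seen and not hopeless(new):
                    seen.add(new)
                    next_frontier.append(derivation + [new])
        frontier = next_frontier
        if not frontier:
            return None
    return None


grammar = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(grammar, "S", "aabb", 6))
# ['S', 'aSb', 'aaSbb', 'aabb']
```

For a string such as aab, which is not in the language, the call returns None only because of the round limit; without it the set of sentential forms S, SS, SSS, ... would never become empty.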

22

An alternative strategy of exhaustive search. We may instead follow a single leftmost derivation at a time. When expanding a nonterminal A, we try each A-rule in turn. A derivation attempt stops as soon as it is possible to decide whether the result is the required string.

This exhaustive search method may not terminate, even if w ∈ L(G), due to left-recursive rules (that is, rules of the form L → Lα). The same problem occurs if we follow the rightmost derivation, due to right-recursive rules.

This is a top-down, depth-first approach. □

23

Recall the reverse G^R of a grammar G defined in §3.3. A leftmost derivation in G corresponds to a rightmost derivation in G^R.

24

Example 5.7. Consider the string aabb and the grammar

S → SS | aSb | bSa | λ

In the 1st round, we will try the following derivations in turn:

S ⇒ SS

S ⇒ aSb

S ⇒ bSa

S ⇒ λ

The last two derivations cannot lead to the string aabb. In the 2nd round, we have 8 sentential forms:

S ⇒ SS ⇒ SSS

S ⇒ SS ⇒ aSbS

S ⇒ SS ⇒ bSaS

S ⇒ SS ⇒ λS

S ⇒ aSb ⇒ aSSb

S ⇒ aSb ⇒ aaSbb

S ⇒ aSb ⇒ abSab

S ⇒ aSb ⇒ aλb

25

The 3rd, 7th, and 8th derivations cannot lead to the required string aabb. There are 5 sentential forms left. We may conduct the 3rd round and will find a leftmost derivation:

S ⇒ aSb ⇒ aaSbb ⇒ aaλbb

26

Problems with exhaustive search:

• It is inefficient.

• It may not terminate if w ∉ L(G). If we impose the additional constraint that there are no λ-rules (that is, rules of the form A → λ) and no rules of the form A → B, then the above exhaustive search method always terminates with a correct, definite answer whether or not w ∈ L(G). We will see later that this constraint does not affect the power of context-free grammars in any significant way.

27

Example 5.8. The grammar in Example 5.7 is equivalent to the following grammar (except for the empty sentence), which satisfies the above constraint (no λ-rules and no rules of the form A → B):

T → TT | aTb | bTa | ab | ba

□

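Corollary. Let G be a context-free grammar which does not include rules of the forms A → λ and A → B, where A, B ∈ V. Then the derivation of a sentence w ∈ L(G) takes at most 2|w| − 1 steps.

Proof. Note that in such grammars, every derivation step either increases the length of the derived sentential form by at least 1 or changes a nonterminal to a terminal (with a rule A → a). Hence each step increases the sum of the length of the sentential form and the number of terminals in it by at least 1. This sum is 1 for the start symbol S and 2|w| for the sentence w, so there are at most 2|w| − 1 steps. □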
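The bound can be checked on a small instance. The sketch below enumerates all leftmost derivations of a string under the grammar of Example 5.8 and records how many steps each one takes. The function name `derivation_lengths` is hypothetical, and the pruning relies on the assumption stated in the corollary: without λ-rules and unit rules a sentential form never shrinks, so forms longer than w can be discarded.

```python
def derivation_lengths(rules, start, w):
    """Step counts of all leftmost derivations of w.

    Assumes the grammar has no λ-rules and no unit rules, so a sentential
    form never shrinks and forms longer than w can be discarded.
    """
    lengths = []
    stack = [(start, 0)]
    while stack:
        form, steps = stack.pop()
        if form == w:
            lengths.append(steps)
            continue
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:
            continue                         # a sentence other than w
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= len(w):
                stack.append((new, steps + 1))
    return lengths


example_5_8 = {"T": ["TT", "aTb", "bTa", "ab", "ba"]}
w = "abab"
lengths = derivation_lengths(example_5_8, "T", w)
assert lengths and max(lengths) <= 2 * len(w) - 1
print(sorted(lengths))  # [2, 3]
```

The two derivations found are T ⇒ aTb ⇒ abab and T ⇒ TT ⇒ abT ⇒ abab, both well within the bound 2|w| − 1 = 7.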

28

Theorem 5.2. Let G be a context-free grammar which does not include rules of the forms A → λ and A → B, where A, B ∈ V. Then the exhaustive search method always terminates with a correct answer.

Proof. Due to the above corollary, we can limit our search to at most 2|w| − 1 rounds (there is one derivation step per round), where w is the given string. If w ∈ L(G), we will find a (leftmost) derivation. Otherwise, the search will terminate with a NO answer. □

29

Next we will consider the time complexity of exhaustive search.

Initially, there is a single sentential form (which consists of the single start symbol S). In each round, a sentential form is expanded into at most |P| new sentential forms. There are at most 2|w| − 1 rounds. Hence an upper bound on the number of sentential forms is

|P| + |P|^2 + |P|^3 + ... + |P|^{2|w|−1} = (|P|^{2|w|} − |P|) / (|P| − 1) = O(|P|^{2|w|})

This is an exponential function of the length |w| of the input string. There are more efficient general parsers, such as the CYK and Earley parsers.
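To get a feeling for the size of this bound, the geometric sum can be evaluated for a concrete case. The sketch below takes |P| = 5 (the number of productions in Example 5.8) and |w| = 4; the function names are hypothetical.

```python
def forms_upper_bound(p, n):
    """Sum |P| + |P|^2 + ... + |P|^(2n−1) for |P| = p and |w| = n."""
    return sum(p ** k for k in range(1, 2 * n))


def closed_form(p, n):
    """(|P|^(2n) − |P|) / (|P| − 1), valid for p > 1."""
    return (p ** (2 * n) - p) // (p - 1)


# Example 5.8 has 5 productions; take an input string of length 4.
print(forms_upper_bound(5, 4))  # 97655
assert forms_upper_bound(5, 4) == closed_form(5, 4)
```

Already for a 4-symbol input the bound is close to one hundred thousand sentential forms, which illustrates why exhaustive search is not used in practice.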

30

Theorem 5.3. Every context-free grammar has an O(n^3)-time parser, where n is the length of the input string.
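The CYK algorithm mentioned above is one way to obtain the cubic bound. The following is a sketch of a CYK recognizer, not the construction used to prove the theorem. It assumes the grammar is given in Chomsky normal form (rules A → BC and A → a only), and the normal-form version of the Example 5.8 grammar below, with the helper nonterminals A, B, X and Y, is a hypothetical conversion that does not appear in the slides.

```python
def cyk(w, terminal_rules, binary_rules, start):
    """CYK recognizer for a grammar in Chomsky normal form.

    terminal_rules: set of (A, a) for rules A → a
    binary_rules:   set of (A, B, C) for rules A → BC
    """
    n = len(w)
    if n == 0:
        return False                         # such a grammar derives no λ
    # table[i][l] holds the nonterminals that derive w[i : i + l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, sym in enumerate(w):
        table[i][1] = {A for (A, a) in terminal_rules if a == sym}
    for length in range(2, n + 1):           # three nested loops over n
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for (A, B, C) in binary_rules:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# Hypothetical normal-form version of T → TT | aTb | bTa | ab | ba.
terminal_rules = {("A", "a"), ("B", "b")}
binary_rules = {("T", "T", "T"), ("T", "A", "X"), ("T", "B", "Y"),
                ("T", "A", "B"), ("T", "B", "A"), ("X", "T", "B"), ("Y", "T", "A")}
print(cyk("aabb", terminal_rules, binary_rules, "T"))  # True
```

The three nested loops over substring length, start position and split point give a running time proportional to n^3 for a fixed grammar.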

Context-free grammars and parsing are used mostly in programming languages and compilers.

In practice we usually require a linear-time parser.

Not all context-free grammars have a linear-time parser.

31

Definition. A (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more leftmost derivations.

Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more rightmost derivations.

Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more derivation trees.

Example 5.10. The grammar S → aSb | SS | λ is ambiguous since the sentence aabb has at least the following two leftmost derivations:

S ⇒ SS ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aabb

S ⇒ aSb ⇒ aaSbb ⇒ aabb

32

[Fig 5.4: two derivation trees for the sentence aabb]

Sometimes it is possible to transform an ambiguous grammar into an unambiguous one. For instance, the above grammar is equivalent to the following unambiguous grammar:

S → T | λ

T → U | UT

U → ab | aTb

33

It is very difficult to determine if a context-free grammar is ambiguous. (We will discuss this later.)

Example 5.11. The following grammar

E → E + E | E ∗ E | (E) | a

is ambiguous. This grammar is used to model the usual arithmetic expressions.

Usually, we impose the additional stipulation that ∗ is performed before + (that is, ∗ has a higher precedence than +). We may use the following (unambiguous) grammar to express this precedence:

Example 5.12.

E → E + T | T

T → T ∗ F | F

F → (E) | a
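The effect of the grammar in Example 5.12 on grouping can be seen with a small parser that returns a fully parenthesized copy of its input. This is a sketch under two assumptions: the operator ∗ is typed as the ASCII character `*`, and the left-recursive rules E → E + T and T → T ∗ F are handled by loops rather than by direct recursion, which is a substitution of technique and not the method of the slides. The function name `parenthesize` is hypothetical.

```python
def parenthesize(text):
    """Parse text with E → E + T | T, T → T * F | F, F → (E) | a and
    return a fully parenthesized copy that shows the grouping."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1

    def factor():                 # F → (E) | a
        if peek() == "(":
            eat("(")
            inner = expr()
            eat(")")
            return inner
        eat("a")
        return "a"

    def term():                   # T → T * F | F, written as a loop
        left = factor()
        while peek() == "*":
            eat("*")
            left = f"({left}*{factor()})"
        return left

    def expr():                   # E → E + T | T, written as a loop
        left = term()
        while peek() == "+":
            eat("+")
            left = f"({left}+{term()})"
        return left

    result = expr()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return result


print(parenthesize("a+a*a"))  # (a+(a*a))
print(parenthesize("a+a+a"))  # ((a+a)+a)
```

The first output shows that ∗ binds more tightly than +; the second shows that the left-recursive rule for + groups operands to the left.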

34

The above examples show that a context-free grammar can be used to impose precedence. Similarly, associativity can also be enforced by context-free grammars.

For left-associative operations, such as +:

L → L + E | E

For right-associative operations, such as ∗∗:

R → E ∗∗ R | E

35

We have shown that ambiguity can sometimes be removed by properly transforming the grammar. However, this is not always possible.

Certain context-free languages have only ambiguous grammars. They are called inherently ambiguous languages.

Definition. Let L be a context-free language. If L has an unambiguous grammar, it is unambiguous. Otherwise, it is inherently ambiguous.

Example 5.13. Consider the following language

L =def {a^n b^n c^m | n, m ≥ 0} ∪ {a^n b^m c^m | n, m ≥ 0}

The left part {a^n b^n c^m} can be generated by a grammar:

S → Sc | A

A → aAb | λ

36

Similarly, the right part {a^n b^m c^m} can be generated by a grammar:

T → aT | B

B → bBc | λ

Their union is described by one additional rule:

Q → S | T

The string a^n b^n c^n, which belongs to both parts, has two derivations.

Though this does not show that L is inherently ambiguous, it suggests that it may never be possible to combine the two parts in a single unambiguous grammar.

37

§5.3 Context-Free Grammars and Programming Languages

The syntax of a programming language is usually specified by a context-free grammar. Due to the consideration of parsing efficiency, we are usually restricted to the subclass of LL(1) or LR(1) grammars.

The following page contains C’s LALR(1) grammar.

38

Index

ambiguous grammar, 7, 32

sentence, 16

sentential form, 16

39-1
