



2 Languages and Grammars

2.1 Languages

We start with a finite, nonempty set $\Sigma$ of symbols, called the alphabet. From the individual symbols we construct strings (over or on $\Sigma$), which are finite sequences of symbols from the alphabet. The empty string $\lambda$ is a string with no symbols at all. Any set of strings over/on $\Sigma$ is a language over/on $\Sigma$.

Example 1.1

Example 1.2

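The concatenation of two strings $w$ and $v$ is the string obtained by appending the symbols of $v$ to the right end of $w$; that is, if

$w = a_1 a_2 \cdots a_n$

and

$v = b_1 b_2 \cdots b_m$,

then the concatenation of $w$ and $v$, denoted by $wv$, is

$wv = a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m$.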

If $w$ is a string, then $w^n$ is the string obtained by concatenating $w$ with itself $n$ times. As a special case, we define

$w^0 = \lambda$,

for all $w$. Note that $\lambda w = w \lambda = w$ for all $w$. The reverse of a string is obtained by writing its symbols in reverse order; if $w$ is a string as shown above, then its reverse is

$w^R = a_n \cdots a_2 a_1$.
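The three operations above map directly onto Python strings. The sketch below is illustrative and assumes that each symbol is a single character and that $\lambda$ is represented by the empty string ""; the function names are hypothetical.

```python
# Strings over an alphabet are modelled as Python str values, one character
# per symbol; the empty string λ is represented by "".

def concat(w: str, v: str) -> str:
    """wv: the symbols of v appended to the right end of w."""
    return w + v


def power(w: str, n: int) -> str:
    """w^n: w concatenated with itself n times, with w^0 = λ."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = ""  # w^0 = λ
    for _ in range(n):
        result = concat(result, w)
    return result


def reverse(w: str) -> str:
    """w^R: the symbols of w written in reverse order."""
    return w[::-1]


print(concat("ab", "ba"))     # abba
print(power("ab", 3))         # ababab
print(repr(power("ab", 0)))   # ''
print(reverse("abb"))         # bba
```

Note that `concat("", w)` and `concat(w, "")` both return `w`, matching $\lambda w = w \lambda = w$.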


If $w = uv$, then $u$ is said to be a prefix and $v$ a suffix of $w$. The length of a string $w$, denoted by $|w|$, is the number of symbols in the string. Note that

$|\lambda| = 0$.

If $u$ and $v$ are strings, then the length of their concatenation is the sum of the individual lengths,

$|uv| = |u| + |v|$.

Let us show that $|uv| = |u| + |v|$. To prove this by induction on the length of strings, let us define the length of a string recursively, by

$|a| = 1$,
$|wa| = |w| + 1$,

for all $a \in \Sigma$ and any string $w$ on $\Sigma$. This definition is a formal statement of our intuitive understanding of the length of a string: the length of a single symbol is one, and the length of any string is incremented by one if we add another symbol to it.

Basis: $|uv| = |u| + |v|$ holds for all $u$ of any length and all $v$ of length 1 (by definition).

Induction Hypothesis: we assume that $|uv| = |u| + |v|$ holds for all $u$ of any length and all $v$ of length $1, 2, \ldots, n$.

Induction Step: Take any $v$ of length $n + 1$ and write it as $v = wa$. Then,

$|v| = |w| + 1$,

$|uv| = |uwa| = |uw| + 1$.

By the induction hypothesis (which is applicable since $w$ is of length $n$),

$|uw| = |u| + |w|$,

so that

$|uv| = |u| + |w| + 1 = |u| + |v|$,

which completes the induction step.
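The recursive definition of length can be transcribed literally. This is a sketch, assuming single-character symbols and $\lambda$ represented as ""; the assertions spot-check the identity $|uv| = |u| + |v|$ on a few hand-picked pairs and are no substitute for the inductive proof.

```python
def length(w: str) -> int:
    """|w| computed from the recursive definition |a| = 1, |wa| = |w| + 1.

    The empty string is given |λ| = 0 so that the recursion has a base case.
    """
    if w == "":
        return 0
    if len(w) == 1:
        return 1                   # |a| = 1 for a single symbol a
    return length(w[:-1]) + 1      # |wa| = |w| + 1, peeling off the last symbol


# Spot-check |uv| = |u| + |v| on a few pairs; this tests the identity, it does not prove it.
pairs = [("", "ab"), ("a", "b"), ("aab", "ba"), ("abab", "")]
for u, v in pairs:
    assert length(u + v) == length(u) + length(v)
print("identity holds on all sample pairs")
```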


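If $\Sigma$ is an alphabet, then we use $\Sigma^*$ to denote the set of strings obtained by concatenating zero or more symbols from $\Sigma$; in particular, $\lambda \in \Sigma^*$. We denote

$\Sigma^+ = \Sigma^* - \{\lambda\}$.

The sets $\Sigma^*$ and $\Sigma^+$ are always infinite, since there is no limit on the length of the strings they contain.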

A language can thus be defined as a subset of $\Sigma^*$. A string in a language $L$ is also called a word or a sentence of $L$.
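Because $\Sigma^*$ is infinite, a program can only enumerate it up to some length bound. The hypothetical helpers below list $\Sigma^*$ and $\Sigma^+$ in order of increasing length, assuming single-character symbols.

```python
from itertools import product
from typing import Iterator, Set


def sigma_star(sigma: Set[str], max_len: int) -> Iterator[str]:
    """Yield the strings of Σ* of length <= max_len, shortest first.

    Σ* is infinite, so only this finite portion can be listed.
    """
    for n in range(max_len + 1):
        for symbols in product(sorted(sigma), repeat=n):
            yield "".join(symbols)      # n = 0 yields λ, i.e. ""


def sigma_plus(sigma: Set[str], max_len: int) -> Iterator[str]:
    """Σ+ = Σ* - {λ}, again truncated at max_len."""
    return (w for w in sigma_star(sigma, max_len) if w != "")


print(list(sigma_star({"a", "b"}, 2)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```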

Example 1.3 Let $\Sigma = \{a, b\}$. Then

$\Sigma^* = \{\lambda, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}$.

The set

$\{a, aa, aab\}$

is a language on $\Sigma$. Because it has a finite number of words, we call it a finite language. The set

$L = \{a^n b^n : n \ge 0\}$

is also a language on $\Sigma$. The strings aabb and aaaabbbb are words in the language $L$, but the string abb is not in $L$. This language is infinite.

Since languages are sets, the union, intersection, and difference of two languages are immediately defined. The complement of a language is defined with respect to $\Sigma^*$; that is, the complement of $L$ is

$\overline{L} = \Sigma^* - L$.
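A membership test makes the example concrete. The sketch below, with hypothetical helper names, decides membership in $L = \{a^n b^n : n \ge 0\}$ and in its complement relative to $\Sigma^*$ for $\Sigma = \{a, b\}$.

```python
SIGMA = frozenset("ab")


def in_anbn(w: str) -> bool:
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n


def in_complement(w: str) -> bool:
    """Membership in the complement of L: w is in Σ* but not in L."""
    return set(w) <= SIGMA and not in_anbn(w)


print(in_anbn("aabb"), in_anbn("aaaabbbb"), in_anbn("abb"))  # True True False
print(in_complement("abb"), in_complement("aabb"))          # True False
```

Strings containing symbols outside $\Sigma$ (such as "abc") belong neither to $L$ nor to its complement, since the complement is taken within $\Sigma^*$.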

The concatenation of two languages $L_1$ and $L_2$ is the set of all strings obtained by concatenating any element of $L_1$ with any element of $L_2$; specifically,

$L_1 L_2 = \{xy : x \in L_1, y \in L_2\}$.

We define $L^n$ as $L$ concatenated with itself $n$ times, with the special case

$L^0 = \{\lambda\}$

for every language $L$.
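For finite languages, both operations can be computed directly with set comprehensions. The languages `L1` and `L2` below are illustrative choices, not the (missing) data of Example 1.4; the function names are hypothetical.

```python
from typing import Set


def lang_concat(L1: Set[str], L2: Set[str]) -> Set[str]:
    """L1 L2 = {xy : x in L1, y in L2}."""
    return {x + y for x in L1 for y in L2}


def lang_power(L: Set[str], n: int) -> Set[str]:
    """L^n: L concatenated with itself n times, with L^0 = {λ}."""
    result = {""}          # L^0 = {λ}, with λ represented as ""
    for _ in range(n):
        result = lang_concat(result, L)
    return result


L1 = {"a", "aaa"}
L2 = {"b", "bbb"}
print(sorted(lang_concat(L1, L2)))  # ['aaab', 'aaabbb', 'ab', 'abbb']
```

Note that `lang_power(set(), 0)` is `{""}`: by the special case, $L^0 = \{\lambda\}$ even when $L$ is empty.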

Example 1.4


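Example 1.5 For

$L = \{a^n b^n : n \ge 0\}$,

we have

$L^2 = \{a^n b^n a^m b^m : n \ge 0, m \ge 0\}$.

Note that $n$ and $m$ are unrelated; for example, the string aabbaaabbb is in $L^2$.

The star-closure or Kleene closure of a language $L$ is defined as

$L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$,

and the positive closure as

$L^+ = L^1 \cup L^2 \cup \cdots$.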
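$L^*$ and $L^+$ are infinite whenever $L$ contains a nonempty string, so the sketch below only computes their members up to a length bound, assuming a finite $L$. It also checks the membership claim of Example 1.5 by trying every split of a string into two factors from $\{a^n b^n : n \ge 0\}$. All function names are hypothetical.

```python
from typing import Set


def star_upto(L: Set[str], max_len: int) -> Set[str]:
    """The strings of L* = L^0 ∪ L^1 ∪ L^2 ∪ ... with length <= max_len (L finite)."""
    result = {""}          # L^0 = {λ}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from L; keep only unseen ones.
        frontier = {x + y for x in frontier for y in L if len(x + y) <= max_len} - result
        result |= frontier
    return result


def plus_upto(L: Set[str], max_len: int) -> Set[str]:
    """The strings of L+ = L^1 ∪ L^2 ∪ ... with length <= max_len, using L+ = L L*."""
    return {y + x for y in L for x in star_upto(L, max_len) if len(y + x) <= max_len}


def in_anbn(w: str) -> bool:
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n


def in_L_squared(w: str) -> bool:
    """w is in L^2 for L = {a^n b^n : n >= 0} iff some split w = xy has x, y in L."""
    return any(in_anbn(w[:i]) and in_anbn(w[i:]) for i in range(len(w) + 1))


print(sorted(star_upto({"ab"}, 4)))   # ['', 'ab', 'abab']
print(in_L_squared("aabbaaabbb"))     # True
```

The loop in `star_upto` terminates because only strings of bounded length are ever added, and each pass must add at least one new string to continue.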

2.2 Regular Expressions

Definition 2.1 Let $\Sigma$ be a given alphabet. Then,

1. $\emptyset$, $\lambda$ (representing $\{\lambda\}$), and $a \in \Sigma$ (representing $\{a\}$) are regular expressions. They are called primitive regular expressions.

2. If $r_1$ and $r_2$ are regular expressions, so are $(r_1 + r_2)$, $(r_1 \cdot r_2)$, $(r_1^*)$, and $(r_1)$.

3. A string is a regular expression if it can be derived from the primitive regular expressions by applying a finite number of the operations +, * and concatenation.

A regular expression denotes a set of strings, which is therefore referred to as a regular set or language.

Regarding the notation of regular expression, texts will usually print them boldface; however, we assume that it will be understood that, in the context of regular expressions,

$\lambda$ is used to represent $\{\lambda\}$ and $a$ is used to represent $\{a\}$.
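To make "denotes a set of strings" concrete, the sketch below interprets a regular expression, encoded as nested tuples (an encoding assumed here, not taken from the text), and returns the members of its language up to length `k`. The function name `lang` is hypothetical.

```python
from typing import Set, Tuple

# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)            ∅
#   ("lambda",)           λ
#   ("sym", "a")          a ∈ Σ
#   ("union", r1, r2)     (r1 + r2)
#   ("concat", r1, r2)    (r1 · r2)
#   ("star", r1)          (r1*)
Regex = Tuple


def lang(r: Regex, k: int) -> Set[str]:
    """The strings of length <= k in the language denoted by r."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {r[1]} if k >= 1 else set()
    if kind == "union":
        return lang(r[1], k) | lang(r[2], k)
    if kind == "concat":
        return {x + y for x in lang(r[1], k) for y in lang(r[2], k) if len(x + y) <= k}
    if kind == "star":
        base = lang(r[1], k)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= k} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


# (a + b)* · c
r = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "c"))
print(sorted(lang(r, 2)))   # ['ac', 'bc', 'c']
```

Truncating every subresult at length `k` is safe here, because a string of length at most `k` can only be built from pieces that are themselves at most `k` long.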

Example 2.1 is a regular expression. Example 2.2

Beyond the usual properties of + and concatenation, important equivalences involving regular expressions concern properties of the closure (Kleene star) operation. Some are given below, where $\alpha$, $\beta$, $\gamma$ stand for arbitrary regular expressions:

1. . 2. . 3. . 4. . 5. . 6. . 7. . 8. .

In general, the distributive law does not hold for the closure operation. For example, the statement

$(\alpha + \beta)^* = \alpha^* + \beta^*$

is false, because the right-hand side denotes no string in which both $\alpha$ and $\beta$ appear.
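The failure of distributivity can be observed with Python's standard `re` module, using the single symbols a and b as stand-ins for the regular expressions in the counterexample. Note that `re` writes union as `|` rather than `+`; this is a quick check on sample strings, not a proof.

```python
import re

# Python's re module writes union as "|" rather than "+".
lhs = re.compile(r"(a|b)*")   # stands in for (α + β)* with α = a, β = b
rhs = re.compile(r"a*|b*")    # stands in for α* + β*

samples = ["", "a", "bb", "ab", "ba", "aab"]
differ = [w for w in samples if bool(lhs.fullmatch(w)) != bool(rhs.fullmatch(w))]
print(differ)   # ['ab', 'ba', 'aab']
```

Every string on which the two sides disagree contains both a and b, exactly as the argument predicts.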

2.3 Grammars

Definition 3.1 A grammar $G$ is defined as a quadruple

$G = (V, T, S, P)$,

where $V$ is a finite set of symbols called variables or nonterminals, $T$ is a finite set of symbols called terminal symbols or terminals, $S \in V$ is a special symbol called the start symbol, and $P$ is a finite set of productions (also called rules or production rules). We assume $V$ and $T$ are nonempty and disjoint sets. Production rules specify the transformation of one string into another. They are of the form

$x \to y$,

where

$x \in (V \cup T)^+$

and

$y \in (V \cup T)^*$.

Given a string $w$ of the form

$w = uxv$,

we say that the production $x \to y$ is applicable to this string, and we may use it to replace $x$ with $y$, thereby obtaining a new string,

$z = uyv$;

this is written $w \Rightarrow z$. We say that $w$ derives $z$ or that $z$ is derived from $w$. Successive strings are derived by applying the productions of the grammar in arbitrary order. A production can be used whenever it is applicable, and it can be applied as often as desired. If

$w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_n$,

we say that $w_1$ derives $w_n$, and write

$w_1 \stackrel{*}{\Rightarrow} w_n$.

The * indicates that an unspecified number of steps (including zero) can be taken to derive $w_n$ from $w_1$. Thus

$w \stackrel{*}{\Rightarrow} w$

is always the case. If we want to indicate that at least one production must be applied, we can write

$w \stackrel{+}{\Rightarrow} v$.

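Let $G = (V, T, S, P)$ be a grammar. Then the set

$L(G) = \{w \in T^* : S \stackrel{*}{\Rightarrow} w\}$

is the language generated by $G$. If $w \in L(G)$, then the sequence

$S \Rightarrow w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_n \Rightarrow w$

is a derivation of the sentence (or word) $w$. The strings $S, w_1, w_2, \ldots, w_n$ are called sentential forms of the derivation.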
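The derivation relation can be explored mechanically. The sketch below, with hypothetical names and the assumption that every symbol is one character, computes all one-step derivations $w \Rightarrow z$ and then searches breadth-first for sentences. Because sentential forms are cut off at `max_form_len`, every result is guaranteed to be in $L(G)$, but the search is complete only for grammars (like the illustrative one used here, $S \to aSb \mid \lambda$) whose short sentences never require much longer intermediate forms.

```python
from collections import deque
from typing import List, Set, Tuple

Production = Tuple[str, str]   # (x, y) for x -> y; "" stands for λ


def one_step(w: str, productions: List[Production]) -> Set[str]:
    """All z with w ⇒ z: choose x -> y and an occurrence w = u x v, giving z = u y v."""
    results = set()
    for x, y in productions:
        i = w.find(x)
        while i != -1:
            results.add(w[:i] + y + w[i + len(x):])
            i = w.find(x, i + 1)
    return results


def generate(start: str, productions: List[Production], variables: Set[str],
             max_form_len: int) -> Set[str]:
    """Sentences of L(G) reachable through sentential forms of length <= max_form_len."""
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        w = queue.popleft()
        if not any(ch in variables for ch in w):
            sentences.add(w)            # w is in T*, so it is a sentence
            continue
        for z in one_step(w, productions):
            if len(z) <= max_form_len and z not in seen:
                seen.add(z)
                queue.append(z)
    return sentences


# Illustrative grammar: S -> aSb, S -> λ.
P = [("S", "aSb"), ("S", "")]
print(sorted(generate("S", P, {"S"}, 7), key=len))   # ['', 'ab', 'aabb', 'aaabbb']
```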

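Example 3.1 Consider the grammar

$G = (\{S\}, \{a, b\}, S, P)$,

with $P$ given by

$S \to aSb$,
$S \to \lambda$.

Then

$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$,

so we can write

$S \stackrel{*}{\Rightarrow} aabb$.

The string aabb is a sentence in the language generated by $G$.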


Example 3.2 P:

Figure 1: Derivation tree

Example 3.3

Leftmost Derivation


This is a leftmost derivation of the string in the grammar (corresponding to ). Note that another leftmost derivation can be given for the above expression.

A grammar $G$ (such as the one above) is called ambiguous if some string in $L(G)$ has more than one leftmost derivation. An unambiguous grammar for the language is the following:

Note that for an inherently ambiguous language $L$, every grammar that generates $L$ is ambiguous.
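Ambiguity can be probed by counting leftmost derivations. The grammar below, $E \to E{+}E \mid E{*}E \mid a$, is a standard illustrative example assumed here; it is not the grammar of Example 3.3, whose productions are not reproduced in these notes. The length-based pruning relies on the grammar having no $\lambda$-productions, and the function name is hypothetical.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical ambiguous grammar (not the grammar of Example 3.3):
#   E -> E+E | E*E | a
GRAMMAR: Dict[str, List[str]] = {"E": ["E+E", "E*E", "a"]}


def count_leftmost(start: str, target: str) -> int:
    """Number of distinct leftmost derivations start ⇒* target.

    Pruning by length is valid only because no production shortens a
    sentential form (the grammar has no λ-productions).
    """
    @lru_cache(maxsize=None)
    def count(w: str) -> int:
        # Position of the leftmost variable, if any.
        i = next((k for k, ch in enumerate(w) if ch in GRAMMAR), None)
        if i is None:
            return 1 if w == target else 0
        if len(w) > len(target) or not target.startswith(w[:i]):
            return 0   # too long, or the terminal prefix already disagrees
        return sum(count(w[:i] + rhs + w[i + 1:]) for rhs in GRAMMAR[w[i]])

    return count(start)


print(count_leftmost("E", "a+a*a"))   # 2, so the grammar is ambiguous
```

The two leftmost derivations of a+a*a correspond to the groupings (a+a)*a and a+(a*a).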

Example 3.4


Show that

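1. $L(G) \subseteq L$ (all strings derived by $G$ are in $L$): starting from $S$, every production of $G$ adds as many $a$'s as $b$'s, so every string derived in $G$ has equally many $a$'s and $b$'s.

2. $L \subseteq L(G)$: Let $w \in L$. By definition of $L$, $n_a(w) = n_b(w)$ (the number of $a$'s in $w$ equals the number of $b$'s). We show that $w \in L(G)$ by induction on the length of $w$.

Basis: $\lambda$ is in both $L$ and $L(G)$. The only two strings of length 2 in $L$ are $ab$ and $ba$, derived by $S \Rightarrow aSb \Rightarrow ab$ and $S \Rightarrow bSa \Rightarrow ba$.

Induction Hypothesis: for every $w \in L$ with $2 \le |w| \le 2i$, we assume that $w \in L(G)$.

Induction Step: Let $w_1 \in L$ with $|w_1| = 2i + 2$.

- If $w_1$ is of the form $awb$ (or $bwa$), then $w \in L$ and $|w| = 2i$, so $w \in L(G)$ (by I. H.). We derive $awb$ using the rule $S \to aSb$, and $bwa$ using the rule $S \to bSa$.
- Otherwise, $w_1 = awa$ or $w_1 = bwb$. Let us assign a count of +1 to $a$ and -1 to $b$; since $w_1 \in L$, the total count is 0. We now show that the count goes through 0 at least once strictly inside $w_1 = awa$ (the case $bwb$ is similar): after the first $a$ the count is +1, and just before the final $a$ it is -1 (because by the end the count is 0). Hence the count is 0 at some intermediate position, which splits $w_1 = w'w''$ with $w', w'' \in L$.

We also have $|w'| \ge 2$ and $|w''| \ge 2$, so that $|w'| \le 2i$ and $|w''| \le 2i$. By the I. H., $w'$ and $w''$ can be derived in $G$ from $S$, and $w_1$ is then derived using the rule $S \to SS$.

Example 3.5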

For example, let us derive .

Example 3.6


derive ccbaba Solution:

Example 3.7

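To prove that $L(G) = L$, where $L = \{a^n b^n : n \ge 0\}$, we show

1. $L \subseteq L(G)$,
2. $L(G) \subseteq L$.

• Let $w = a^n b^n$. We apply $S \to aSb$ ($n$ times), thus

$S \stackrel{*}{\Rightarrow} a^n S b^n$;

then, applying $S \to \lambda$,

$S \stackrel{*}{\Rightarrow} a^n b^n$.

• $L(G) \subseteq L$: We need to show that, if $w$ can be derived in $G$, then $w \in L$. $\lambda = a^0 b^0$ is in the language, by definition. We first show that all sentential forms that still contain $S$ are of the form $a^i S b^i$, by induction on the length of the sentential form.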


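Basis: $S$ is a sentential form, and $S = a^0 S b^0$.

Induction Hypothesis: every sentential form of length $2i + 1$ that contains $S$ is of the form $a^i S b^i$.

Induction Step: a sentential form of length $2i + 3$ that contains $S$ can only be obtained by applying $S \to aSb$, i.e., it is derived as

$a^i S b^i \Rightarrow a^{i+1} S b^{i+1}$.

To get a sentence, we must apply the production $S \to \lambda$; i.e.,

$S \stackrel{*}{\Rightarrow} a^n S b^n \Rightarrow a^n b^n$

represents all possible derivations; hence $G$ derives only strings of the form $a^n b^n$.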
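Both halves of the argument can be spot-checked for $S \to aSb \mid \lambda$ by enumerating every sentential form up to a length bound. This sketch (hypothetical function name) confirms the claimed shapes only up to that bound; the induction is what covers all lengths.

```python
from typing import Set


def sentential_forms(max_len: int) -> Set[str]:
    """All sentential forms of S -> aSb | λ with length <= max_len."""
    rhs_options = ["aSb", ""]          # S -> aSb and S -> λ
    seen, frontier = {"S"}, ["S"]
    while frontier:
        next_frontier = []
        for w in frontier:
            i = w.find("S")
            if i == -1:
                continue               # a sentence: nothing left to rewrite
            for rhs in rhs_options:
                z = w[:i] + rhs + w[i + 1:]
                if len(z) <= max_len and z not in seen:
                    seen.add(z)
                    next_frontier.append(z)
        frontier = next_frontier
    return seen


forms = sentential_forms(11)
for w in forms:
    if "S" in w:
        i = w.index("S")
        assert w == "a" * i + "S" + "b" * i       # the claimed form a^i S b^i
    else:
        n = len(w) // 2
        assert w == "a" * n + "b" * n             # every sentence is a^n b^n
print(sorted((w for w in forms if "S" not in w), key=len))
# ['', 'ab', 'aabb', 'aaabbb', 'aaaabbbb', 'aaaaabbbbb']
```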

2.4 Classification of Grammars and Languages

Grammars (and the corresponding classes of languages) are classified by the form of their production rules into the Type 1, Type 2 and Type 3 classes, as follows.

• If all the grammar rules $x \to y$ satisfy $|x| \le |y|$, then the grammar is context-sensitive, or Type 1. Such a grammar $G$ generates a language $L(G)$ called a context-sensitive language. Note that $x$ has to be of length at least 1, and thereby $y$ too. Hence, it is not possible to derive the empty string in such a grammar.

• If all production rules are of the form $A \to y$, where $A \in V$ and $y \in (V \cup T)^*$, then the grammar is said to be context-free, or Type 2 (i.e., the left-hand side of each rule is a single variable).

• If the production rules are of the forms

$A \to xB$,
$A \to x$,

where $x \in T^*$ (a string of terminals or the empty string) and $A, B \in V$ (variables), then the grammar is called right-linear.

Similarly, for a left-linear grammar, the production rules are of the form

$A \to Bx$,
$A \to x$.

For a regular grammar, the production rules are of the form

$A \to aB$,
$A \to a$,
$A \to \lambda$,

with $a \in T$ and $A, B \in V$.

A language that can be generated by a regular grammar will (later) be shown to be regular. Note that a language can be generated by a regular grammar iff it can be generated by a right-linear grammar iff it can be generated by a left-linear grammar.
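The rule shapes of this section can be tested mechanically. The sketch assumes single-character symbols, with every symbol not in `V` treated as a terminal and "" standing for $\lambda$; the function names are hypothetical. The shapes checked are: $|x| \le |y|$ (Type 1), a single variable on the left (Type 2), $A \to xB \mid x$ (right-linear), $A \to Bx \mid x$ (left-linear), and $A \to aB \mid a \mid \lambda$ (regular), with $x \in T^*$ and $a \in T$.

```python
from typing import List, Set, Tuple

Production = Tuple[str, str]   # (x, y) for x -> y; "" stands for λ


def is_context_sensitive(P: List[Production]) -> bool:
    """Type 1 as defined above: every rule x -> y has |x| <= |y|."""
    return all(len(x) <= len(y) for x, y in P)


def is_context_free(P: List[Production], V: Set[str]) -> bool:
    """Type 2: every left-hand side is a single variable."""
    return all(len(x) == 1 and x in V for x, _ in P)


def _terminals_only(s: str, V: Set[str]) -> bool:
    return all(ch not in V for ch in s)


def is_right_linear(P: List[Production], V: Set[str]) -> bool:
    """Every rule is A -> xB or A -> x with x in T*."""
    return is_context_free(P, V) and all(
        _terminals_only(y[:-1] if y and y[-1] in V else y, V) for _, y in P)


def is_left_linear(P: List[Production], V: Set[str]) -> bool:
    """Every rule is A -> Bx or A -> x with x in T*."""
    return is_context_free(P, V) and all(
        _terminals_only(y[1:] if y and y[0] in V else y, V) for _, y in P)


def is_regular(P: List[Production], V: Set[str]) -> bool:
    """Every rule is A -> aB, A -> a, or A -> λ with a a single terminal."""
    def ok(y: str) -> bool:
        return (y == "" or (len(y) == 1 and y not in V)
                or (len(y) == 2 and y[0] not in V and y[1] in V))
    return is_context_free(P, V) and all(ok(y) for _, y in P)


V = {"S"}
right = [("S", "aS"), ("S", "")]
left = [("S", "Sa"), ("S", "b")]
print(is_right_linear(right, V), is_left_linear(right, V), is_regular(right, V))  # True False True
print(is_right_linear(left, V), is_left_linear(left, V))                          # False True
```

Under the Type 1 condition as stated, `right` is not context-sensitive, because $S \to \lambda$ shortens the string.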

2.5 Normal Forms of Context-Free Grammars

2.5.1 Chomsky Normal Form (CNF)

Definition 5.1 A context-free grammar $G = (V, T, S, P)$ is in Chomsky Normal Form if each rule is of the form

• $A \to BC$
• $A \to a$
• $S \to \lambda$

where $B, C \in V - \{S\}$.

Theorem 5.1 Let $G = (V, T, S, P)$ be a context-free grammar. There is an algorithm to construct a grammar $G' = (V', T, S', P')$ in Chomsky normal form that is equivalent to $G$ ($L(G') = L(G)$).
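Whether a grammar already is in CNF is easy to check. The sketch below tests the three rule shapes $A \to BC$, $A \to a$ and $S \to \lambda$, with $B, C \in V - \{S\}$; it does not implement the conversion algorithm of Theorem 5.1. Single-character symbols and the function name are assumptions.

```python
from typing import List, Set, Tuple

Production = Tuple[str, str]   # (A, y) for A -> y; "" stands for λ


def is_cnf(P: List[Production], V: Set[str], start: str) -> bool:
    """True iff every rule is A -> BC (B, C in V - {S}), A -> a, or S -> λ."""
    for A, y in P:
        if A not in V:
            return False
        if len(y) == 2 and all(s in V and s != start for s in y):
            continue        # A -> BC with B, C in V - {S}
        if len(y) == 1 and y not in V:
            continue        # A -> a with a a terminal
        if y == "" and A == start:
            continue        # S -> λ
        return False
    return True


V = {"S", "A", "B"}
print(is_cnf([("S", "AB"), ("A", "a"), ("B", "b")], V, "S"))   # True
print(is_cnf([("S", "aB"), ("B", "b")], V, "S"))               # False: aB mixes a terminal and a variable
```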

Example 5.1 Convert the given grammar to CNF.

Solution: A CNF equivalent can be given as :


2.5.2 Greibach Normal Form (GNF)

If a grammar is in GNF, then the length of the terminal prefix of the sentential form increases with every grammar rule application, which prevents left recursion.

Definition 5.2 A context-free grammar $G = (V, T, S, P)$ is in Greibach Normal Form if each rule is of the form,

• $A \to a A_1 A_2 \cdots A_n$
• $A \to a$
• $S \to \lambda$