1 - Language

Post on 15-Nov-2014

135 views 3 download

Tags:

Transcript of 1 - Language

Automata and Language TheoryFirst Semester SY 2009-2010

To understand language To know the fundamental units of

language

How can you define a language? English Filipino Japanese Chinese Etc…

English Different entities: letters, words, and

sentences Groups of letters make up words Groups of words make up sentences Not all collection of letters form a valid

word and as well as not all collection of words form a valid sentence.

Computer Languages Certain character strings are recognizable words

(DO, IF, END, …) Strings of words that are recognizable commands

Sets of commands becomes a program ( with or without data) that can be compiled.

It is necessary for us to adopt a definition of a “Language Structure” that is, a structure in which the decision of whether a given string of units constitutes a valid larger unit is not a matter of guesswork, but based on explicitly stated rules.

Theory of Formal Languages “formal” refers to the fact that all rules

for the languages are explicitly stated in terms of what strings of symbols can occur.

No liberties are tolerated, and no reference to any “deeper understanding” is required.

Languages - will be considered solely as symbols on paper and not as expressions of ideas in the minds of humans. Not a communication among intellects, but

a game of symbols with formal rules. The term “formal”, emphasizes that it is

the form of the string of symbols we are interested in, not the meaning.

Alphabet – finite set of fundamental units out of which we build structures.

Language – a certain specified set of strings of characters from the alphabet.

Words – those strings that are permissible in the language.

(empty string or null string) – a string having no letter

No matter what “alphabet” we are considering, the null string will always and for all languages the null word, if it is a word in the language, is also

Two words are considered the same if all their letters are the same and in the same order.

- language that has no words A subtle but important difference between a word

that has no letters, , and the language that has no words,

It is not true that is a word in the language since this language has no words at all.

A certain language L does not contain the word and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {}.

A certain language L does not contain the word and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {}. This language is not equal or same as L. Nevertheless, L + is the same as L since no new

word have been added. The fact that is a language even though it

has no words will turn out to be an important distinction.

English – familiar example of language The alphabet is the usual set of letters plus the

apostrophe and hyphen. Let us denote it by a Greek letter capital sigma:

= { a b c d e . . . Z ’ -}

, , , - time-honored tradition If we wished to be supermeticulous, we would

also include in the uppercase letters and the seldom used diacritical marks.

Valid Words in the Language- Strings, more or less, as is done in a dictionary

say,ENGLISH-WORDS={all the words in a standard

dictionary}

{} – mathematical notational denoting a set

o Does not have any grammaro Formal definition of language of the sentences

in English , we must begin by saying that this time our basic alphabet is the entries in the dictionary.

Consider , the capital gamma, as the alphabet

= {the entries in a standard dictionary, plus a blank space, plus the usual punctuation marks}

Language, ENGLISH-SENTENCESRule: Grammatical RulesThe trick of defining the language ENGLISH-SENTENCES by listing all the rules of English grammar allows us to give a finite description of an infinite language.

If we go by the rules of grammar only,Examine:

“I ate three Tuesdays”- this string is allowed in formal language- grammatically correct; only its meaning reveals that it is ridiculous

Meaning is something that we do not refer to in formal languages.

Syntax, not semantics or diction “just like the bad teacher who is interested only

in the correct spelling, not the ideas in a homework composition”

Two ways of defining Abstract Languages we treat Either they will be presented as an alphabet and the

exhaustive list of all valid words, or They will be presented as an alphabet and a set of

rules defining the acceptable words. Defining a Language – presenting the alphabet

and then specifying which strings are words. The word “specify” is trickier than we may at

first suppose

Consider the Language MY-PET and the alphabet for this language is{a c d g o t}- there is only one word for this language, and for

our own perverse reasons we wish to specify it by this sentence:

If the Earth and the Moon ever collide, thenMY-PET = {cat}

but, if the Earth and the Moon never collide, then

MY-PET = {dog}

MY-PET issue One or the other of these two events will occur, but

at this point in history of the universe, it is impossible to be certain whether the word dog is or is not in the language MY-PET.

This sentence is therefore not an adequate specification of the language MY-PET because it is not useful

To be an acceptable specification of a language, a set of rules must enable us to decide, in a finite amount of time, whether a given string of alphabet letters is or is not a word in the language.

MY-PET issue continued… It is not a requirement that all the letters in

the alphabet need to appear in the words selected for the language.

Language Different entities: letters/alphabets, words, and sentences

empty string or null string a string having no letter

A language with no words

+ union of sets

Automata and Language Theory

Define words in a language

The set of language-defining rules can be of two kinds- They can either tell us how to test a

string of alphabet letters that we might be presented with, to see if it is a valid word, or

- They can tell us how to construct all the words in the language by some clear procedures

Consider an alphabet having only one letter, the letter x,

= {x}Language definition: any nonempty string of

alphabet characters is a word:L1= {x xx xxx xxxx . . .}

Alternate form:L1= {xn for n = 1 2 3 . . . }

Concatenation – two strings are written down side by side to form a new longer string Example: to concatenate the word xxx and

the word xx, we obtain the word xxxxx. The words in this language are analogous to

positive integers, and the operation of concatenation is analogous to addition.

xn concatenated with xm is the word xn+m

Concatenation continued… It is often convenient to designate the

words in a given language by new symbols, that is, other than the ones in the alphabet. Say,

The word xxx is called a and the word xx is b. Concatenating a and b,

ab = xxxxx It is not always true that when two words

are concatenated they produce another word in the language

Concatenation continued… For example, the language is

L2 = {x xxx xxxxx xxxxxxx . . . }

= {xodd} = {x2n+1 for n = 0 1 2 3 . . .}

Then, a = xxx and b = xxxxx are both words in L2, but their concatenation ab = xxxxxxxx is not in L2.

In these simple examples, when we concatenate a with b, we get the same word as when we concatenate b with a.

ab = ba

Concatenation continued…ab = ba

This relationship does not hold for all languages. In English when we concatenate “house” and “boat,” we get “houseboat,” which is indeed a word but distinct from “boathouse,” which is different thing – not because they have different meanings, but because they are different words.

“Merry-go-round” and “carousel” mean the same thing, but they are different words.

Example Consider another language. Let us begin

with the alphabet: = { 0 1 2 3 4 5 6 7 8 9 }

and define the set of words:L3= {any finite string of alphabet letters that does not start with the letter zero}L3= { 1 2 3 4 5 6 7 8 9 10 11 12 . . . }

Defining Language rules … how to test …

L1= {xn for n = 1 2 3 . . . }

… how to construct … L3= {any finite string of alphabet letters that

does not start with the letter zero}

DEFINITION We define the function length of a string to be a

number of letters in the string.Example:

a = xxxx in the language L1

length(a) = 4

if c = 428 in the language L3

length© = 3Or, we could write directly that in L1

length(xxxx) = 4and in L3

length(428) = 3

Example:length() = 0

if length(w) = 0, then w = .

therefore, another definition of L3

L3={ any finite string of alphabet letters that, if it has a length more than 1, does not start with a zero}

There is some inherent ambiguity in the phrase “any finite string,” since it is not clear whether we intend to include the null string (, the string of no letters). We’ll define a language like L1 but that does

contain .

L4 = { x xx xxx xxxx . . . }

= { xn for n = 0 1 2 3 . . . }

Here x0 = , not x0 = 1 as in algebra.

DEFINITION Let us introduce the function reverse. If a

is a word in some language L, then reverse(a) is the same string of letters spelled backward, called the reverse of a, even if this backward string is not a word in L.

EXAMPLEreverse(xxx) = xxxreverse(xxxx) = xxxxreverse(145) = 541

But let us also note that in L3

reverse(140) = 041which is not a word in L3.

DEFINITIONLet us define a new language called PALINDROME over the alphabet = {a b}

PALINDROME = {, all strings x such that reverse(x)=x}

If we begin listing the elements in PALINDROME, we findPALINDROME = { a b aa bb aaa aba bab bbb

aaaa abba . . .}*Concatenation???

DEFINITIONGiven an alphabet , we wish to define a language in which any string of letters from is a word, even the null string. This language we shall call the closure of the alphabet. It is denoted by writing a star (an asterisk) after the name of the alphabet as a superscript:

*

This notation is sometimes known as the Kleene star after the logician who was one of the founders of this subject.

EXAMPLEIf = {x}, then

*=L4= { x xx xxx . . .}

EXAMPLEIf = { 0 1 }, then

* = { 0 1 00 01 10 11 000 001 . . .}

EXAMPLEIf = {a b c}, then

* ={ a b c aa ab ac ba bb bc ca cb cc aaa …}

We can think of the Kleene star as an operation that makes an infinite language of strings of letters out of an alphabet. Infinitely many words, each of finite length.

Lexicographic order words arranged in size order (words of shortest length first) and

then listed all the words of the same length alphabetically. We shall usually follow this method of sequencing a language.

DEFINITIONIf S is a set of words, then by S* we mean the set of all finite strings formed by concatenating words from S, where any word may be used as often as we like, and where the null string is also included.

Let us not make the mistake of confusing the two language

ENGLISH-WORDS* and ENGLISH-SENTENCES

the first language contains the word butterbutterbutterhat, the second does not.

Words in the ENGLISH-WORDS are the concatenate of arbitrarily many words from ENGLISH-WORDS

While ENGLISH-SENTENCES are restricted to juxtaposing only words from ENGLISH-WORDS in an order that compiles with the rules of grammar.

EXAMPLEIf S = {aa b}, then

S*= { plus any word composed of factors of aa and b}

= { plus all strings of a’s and b’s in which the a’s occur in even clumps}

= { b aa aab baa bbb aaaa aabb baab bbaa

bbbb aaaab aabaa aabbb baaaa baabb bbaab … }

The string aabaaab is not S* since it has a clump of a’s of length 3.

EXAMPLEIf S = {a ab}, then S*= { plus any word composed of factors of a and ab}

= { plus all strings of a’s and b’s except those that

start with b and those that contain double b} = { a aa ab aaa aab aba aaaa aaab aaba

abaa abab aaaaa aaaab aaaba aabaa aabab

abaaa abaab ababa ...}

To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenate of words from the base set S.

In the last example, to show that abaab is in S*, we can factor as follows:

(ab)(a)(ab)These three factors are all in the set S; therefore, their concatenation is in S*. This is the only way to factor this string into factors of (a) and (ab). When this happens, we say that the factoring is unique.

Sometimes factoring is not unique. For example, consider S= {xx xxx}. Then S*= { and all strings of more than one x }

= {xn for n = 0 2 3 4 5 . . .} = { xx xxx xxxx xxxxx xxxxxx . . . }

Notice that the word x is not in the language S*. The string xxxxxxx is in the closure for any of these three reasons. It is

(xx)(xx)(xxx) or (xx)(xxx)(xx) or (xxx)(xx)(xx)Also, x6 is either x2x2x2 or else x3x3.

Proof by constructive algorithm - the method of proving that something exists by showing how to create it.

If the alphabet has no letters, then its closure is the language with the null string as its only word, because is always a word in a Kleene closure. Symbolically, we write

If = (the empty set), *= { }

This is not the same as if S = { },then S*= { }

which is also true for a different reason, that is = .

The Kleene closure of two sets can end up being he same language even if the two sets that we started with were not.

EXAMPLEConsider the following languages

S = { a b ab} and T= {a b bb}Then both S* and T* are languages of a’s and b’s since any string of a’s and b’s can be factored into syllables or either (a) and (b), both of which are in S and T.

If for some reason we with to modify the concept of closure to refer only to the concatenation of some (not zero) string for a set S, we use the notation + instead of *. For Example

if S={x},then S+ = {x xx xxx xxxx ….}(S+ = S* except the word ^)

This is not to say that S+ cannot, in general, contain a word ^. It can, but only on the condition that S contains the word I initially.

What happens if we apply the closure operator twice?

(S*)* or S**

Theorem 1 For any set S of strings we have S*=S**

Proof It is analogous to say that if people are

made up of molecules and molecules are made up of atoms, the people are made up of atoms

Every words in S** is made up of factors from S*. Every factor from S* is made up of factors from S. Therefore, every word in S** is made up of factors from S. Therefore, every word in S** is also a word in S*. We can write

S** S* means “is contained in or equal

to.”

Now, in general, it is true that for any set A we know that AA*, since A* we can chose as a word any one from A. So if we consider A to be our set S*, we have

S* S**

Together, these two inclusions prove that

S*=S**

Consider the language S*, where S = {ab ba}. a. Write at least 30 words in S*. b. Can any of this language contain the substring aaa or bbb?c. Give another description of this

language.

Consider the language S*, where S={a b}. How many word does this language have:

a. of length 2b. of length 3c. of length n

Consider the language S*, where S={aa aba baa}.

a. Show that the words:aabaa, baaabaaa, &

baaaaababaaaa are all in this language.

b. Can any in this language have an odd total number

1. Consider the language S*,where S = {aa ab ba bb}

Give another description of this language

2. Give an example of set S such that S* only contains all possible strings of a’s and b’s that have length divisible by 3.

3. Prove that for all sets S.a. (S+)* = (S*)*b. (S+)+ = S+

c. is (S*)+ = (S+)* for all sets S?