CSC324 Formal Language Theory - University of …afsaneh/csc324w13/fltI.pdf · Example IDEs:...


CSC324 – Formal Language Theory

Afsaneh Fazly¹

January 9, 2013

¹ Thanks to A. Tafliovich, S. McIlraith, E. Joanis, S. Stevenson, G. Penn, D. Horton


The only language men ever speak perfectly is the one they learn in babyhood, when no one can teach them anything.

— Maria Montessori


What is a Programming Language (PL)?

We tend to think of a compiler or an IDE as a programming language:

• Example compilers: javac, gcc

• Example IDEs: Eclipse, PyDev, Java Workshop

But the language is not these things:

• A programming language is an abstract entity with some specification.

• A compiler and an IDE are pieces of software that implement the language.


Language Specification

Natural language:

• Syntax: specification of the ways in which words are put together to form phrases and sentences.

• Semantics: specification of the relation among the meanings of words, phrases, and sentences.

Programming language:

• Formal Syntax: units are not words but tokens

• Formal Semantics: defining semantics is hard!


More on Syntax

Syntax of a language tells us:

• what is legal or acceptable.

• what the relationships or dependencies in a legal sentence are.

Which one is acceptable in English?

• “used kids clothing store” ✓

• “clothing store used kids” ✓

What is the structure (dependencies among words)?

• need a way to represent structure: parse tree.


Stages of Translation Process

1. Lexical analysis: converts source code into a sequence of tokens.

2. Syntactic analysis: structures tokens into an initial parse tree.

3. Semantic analysis: augments the parse tree with semantic information, and performs semantic checks such as type checking, object binding, etc.

4. Code generation: performs optimizations, and produces final machine code.
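
As a rough illustration of stage 1 only, the sketch below turns a source string into (kind, text) tokens using Python's re module. The token categories, the name tokenize, and the tiny C-like token set are assumptions made for this example; a real lexer for a real language would be driven by that language's lexical rules.

```python
import re

# Hypothetical token categories for a tiny C-like language (an assumption).
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),      # /* ... */ comments, skipped below
    ("NUMBER",  r"\d+(?:\.\d+)?"),  # integer or decimal literal
    ("IDENT",   r"[A-Za-z_]\w*"),   # identifier
    ("OP",      r"[=*+\-/]"),       # single-character operator
    ("PUNCT",   r";"),              # statement terminator
    ("SKIP",    r"\s+"),            # white space, skipped below
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # let a comment span several lines
)

def tokenize(source):
    """Return a list of (kind, text) pairs; raise ValueError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup not in ("COMMENT", "SKIP"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 3; y = x * 17.2;"))
```

The later stages (parse tree, semantic checks, code generation) would consume this token list rather than the raw characters.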


Specifying Syntax Informally

Example:

“Everything between /* and */ is a comment and should be ignored.”

/* Do such and such, watching out for problems.

Store the result in y. */

x = 3;

y = x * 17.2;

When syntax is defined informally, incompatible dialects of the language may evolve.
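
To see how dialects can arise, note that the informal rule does not say what happens when a comment contains another /*. The sketch below shows two plausible readings. Both function names are hypothetical, and neither is claimed to be what any particular compiler does.

```python
def strip_comments_flat(source):
    """Reading 1: a comment ends at the first */ after its opening /*."""
    out = []
    i = 0
    while i < len(source):
        if source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
        else:
            out.append(source[i])
            i += 1
    return "".join(out)

def strip_comments_nested(source):
    """Reading 2: comments nest, so every /* needs its own */."""
    out = []
    depth = 0
    i = 0
    while i < len(source):
        if source.startswith("/*", i):
            depth += 1
            i += 2
        elif depth > 0 and source.startswith("*/", i):
            depth -= 1
            i += 2
        else:
            if depth == 0:          # keep only text outside all comments
                out.append(source[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)

program = "x = 3; /* outer /* inner */ y = 4; */ z = 5;"
print(repr(strip_comments_flat(program)))
print(repr(strip_comments_nested(program)))
```

The two readings agree on programs without nested comment markers and disagree on this one, which is the kind of incompatibility a formal definition is meant to rule out.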


Specifying Syntax Formally

• The state of the art is to define PL syntax formally.

• There are a number of well-understood formalisms for doing so. We’ll talk about this in some detail.


Specifying Syntax Formally

• Lexical rules specify the form of the building blocks of the language:

  • what’s a token (keyword, identifier, literal, operator, punctuation; plus the list of keywords and the form of identifiers)

  • how tokens are delimited

  • where white space can go

  • syntax of comments

• Syntactic rules specify how to put the building blocks together:

  • what are the acceptable combinations of keywords, identifiers, operators, etc. into larger units, such as mathematical/logical expressions and statements?


⇒ A tool for specifying lexical/syntactic rules is a grammar.


Today

• Grammar: a tool for specifying syntax of a language

  • Regular Grammar

  • Context-Free Grammar

• Derivations and Parse Trees: useful tools for implementation of a PL (compiler)

• Ambiguity in a PL

• Translation Process


Grammar

Informal idea of grammar: a bunch of rules. For example:

• Don’t end a sentence with a preposition.

• Subject and verb must agree in number.

A formal grammar is a different concept.

A language is a set of strings. A grammar generates a language, i.e., it specifies which strings are in the language.

There are many kinds of formal grammar.


Chomsky’s Hierarchy

[ named after linguist & political activist Noam Chomsky, who researched grammars for natural languages. ]

There are several categories of grammar, ordered from least to most expressive:

• Regular Grammars ⇒ generate Regular Languages

• Context-Free Grammars ⇒ generate Context-Free Languages

• Context-Sensitive Grammars

• Unrestricted Grammars


Regular Expressions

Regular expressions and regular grammars are two equivalent ways of describing regular languages.

Example Regular Expressions (REs):

• (0 + 1)∗

• 1∗(: + ; )∗

• (a + b)∗aa(a + b)∗ + ε

Notation:

• Kleene star: ∗ superscript denotes 0 or more repetitions.

• alternation: binary “+” denotes choice. It is sometimes denoted by |, e.g., (0|1)∗.

• grouping: “(” and “)” are used for grouping.

• empty string: ε (epsilon) denotes the empty or “null” string.

• empty language: ∅ denotes the language with no strings.
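
The example REs can be tried out with Python's re module. This is only a sketch: the translation into Python syntax is our own, and Python patterns are more expressive than the formal REs defined here, so the code sticks to alternation, grouping, and star. Note that Python writes alternation as | and uses + for "one or more", which is not the "+" of this notation.

```python
import re

# Slide notation -> Python pattern (our own translation, an assumption).
# Alternation "+" becomes "|"; ε becomes an empty alternative.
EXAMPLES = {
    "(0 + 1)*": r"(0|1)*",
    "1*(: + ;)*": r"1*(:|;)*",
    "(a + b)*aa(a + b)* + ε": r"(a|b)*aa(a|b)*|",
}

def in_language(pattern, string):
    """True if the whole string is described by the pattern."""
    return re.fullmatch(pattern, string) is not None

print(in_language(EXAMPLES["(0 + 1)*"], "0110"))
print(in_language(EXAMPLES["1*(: + ;)*"], "11:;:"))
print(in_language(EXAMPLES["(a + b)*aa(a + b)* + ε"], "baab"))
print(in_language(EXAMPLES["(a + b)*aa(a + b)* + ε"], ""))
print(in_language(EXAMPLES["(a + b)*aa(a + b)* + ε"], "ab"))
```

re.fullmatch is used rather than re.search because membership in L(r) is a statement about the entire string, not about some substring.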


Regular Expressions: Formal Definition

Let Σ be a given finite alphabet. A string is a regular expression (RE) if and only if it can be derived from the primitive regular expressions:

• ∅

• ε

• any a, s.t. a ∈ Σ

by a finite number of applications of the following rules:

If r1 and r2 are REs, then so are:

• r1 + r2, or r1|r2 (union, alternation)

• r1r2, or r1 · r2 (concatenation)

• r1∗ (Kleene closure)

• (r1) (grouping)


Regular Languages

Each RE r (defined over an alphabet Σ) specifies a regular language, denoted L(r).

Languages specified by primitive REs: L(∅), L(ε), ...

Languages specified by more complex REs: L(r1 + r2), L(r∗), ...


Regular Languages (cont’d)

Languages specified by primitive REs:

• L(∅) is the empty language, i.e., the language that contains no strings, denoted by {}

• L(ε) is the language {ε}

• for a ∈ Σ, L(a) is the language {a}


Regular Languages (cont’d)

Languages specified by more complex REs:

If r1 and r2 are regular expressions, then

• L(r1 + r2) = L(r1) ∪ L(r2), where S ∪ T = {s | s ∈ S or s ∈ T}

• L(r1r2) = L(r1)L(r2), where ST = {st | s ∈ S, t ∈ T}

• L((r1)) = L(r1)

• L(r1∗) = (L(r1))∗, where S∗ = S^0 ∪ S^1 ∪ S^2 ∪ ···, with S^0 = {ε}, S^1 = S, and S^j = the set of all possible concatenations of j strings from S
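
These set operations can be mirrored directly with Python sets of strings. In this sketch the empty Python string "" plays the role of ε, and because S∗ is infinite whenever S contains a non-empty string, star_up_to (a name made up for this example) only computes the finite part S^0 ∪ ... ∪ S^k.

```python
def union(S, T):
    """S ∪ T = {s | s in S or s in T}"""
    return S | T

def concat(S, T):
    """ST = {st | s in S, t in T}"""
    return {s + t for s in S for t in T}

def power(S, j):
    """S^j: all concatenations of j strings from S, with S^0 = {ε}."""
    result = {""}                 # "" stands for ε
    for _ in range(j):
        result = concat(result, S)
    return result

def star_up_to(S, k):
    """Finite approximation of S*: S^0 ∪ S^1 ∪ ... ∪ S^k."""
    result = set()
    for j in range(k + 1):
        result |= power(S, j)
    return result

print(sorted(concat({"0", "01"}, {"1"})))
print(sorted(star_up_to(union({"0"}, {"1"}), 2), key=lambda s: (len(s), s)))
```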


Regular Languages: Examples

Let Σ = {a, b}, r1 = a, and r2 = b.

L(r1|r2) = ?

L(r1 · r2) = ?

L((r1)) = ?

L(r1∗) = ?

L((r1|r2)∗) = ?


Regular Languages: Summary

A Regular Language is a language that:

• is generated by a Regular Grammar.

• can be described / specified by a regular expression.

• is accepted by a finite-state automaton.

We won’t talk about automata in this course: you studied them in CSC236/... .


Regular Expressions: Examples

Give regular expressions for these languages:

1. All alphanumeric strings beginning with an upper-case letter.

2. All strings of a’s and b’s in which the third-last character is b.

3. All strings of 0’s and 1’s in which every pair of adjacent 0’s appears before any pairs of adjacent 1’s.

4. All binary numbers with exactly three 1’s.
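
One way to gain confidence in a candidate answer is to compare it, on every short string, against a plain description of the language written as code. The sketch below does this for exercises 2 and 4. The two patterns are candidate answers offered as assumptions, not official solutions, and exercise 4 is read as "any string of 0's and 1's with exactly three 1's". Agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def disagreements(pattern, predicate, alphabet, max_len):
    """Strings on which the regex and the plain-Python predicate disagree."""
    return [
        s for s in strings_up_to(alphabet, max_len)
        if (re.fullmatch(pattern, s) is not None) != predicate(s)
    ]

# Candidate for exercise 2: third-last character is b.
third_last_b = r"(a|b)*b(a|b)(a|b)"
print(disagreements(third_last_b, lambda s: len(s) >= 3 and s[-3] == "b", "ab", 8))

# Candidate for exercise 4: exactly three 1's.
three_ones = r"0*10*10*10*"
print(disagreements(three_ones, lambda s: s.count("1") == 3, "01", 8))
```

An empty list means the candidate and the description agree on all strings up to the chosen length.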


Quiz

Could we use a regular grammar to describe a Natural Language (e.g., English)? This question was first asked and answered by Chomsky (1956, 1957).


Regular Expressions: Limitations

Regular expressions are not powerful enough to describe some languages.

Examples:

• The language consisting of all strings of one or more a’s followed by the same number of b’s.

• The language consisting of strings containing a’s, b’s, left brackets, and right brackets, such that the brackets match.

Question: How can we be sure there is no regular expression for these languages?

Question: Exactly what things can and cannot be expressed with a regular expression?


Context-Free Grammars (CFGs)

CFGs are more powerful than regular expressions.

A CFG has four parts:

• A set of terminals: the atomic symbols (tokens) of the language.

• A set of non-terminals: “variables” used in the grammar.

• A special non-terminal chosen as the starting non-terminal or start symbol: represents the top-level construct of the language.

• A set of rules (or productions), each specifying one legal way that a non-terminal can be rewritten to a sequence of terminals and non-terminals.
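
The four parts map naturally onto a small data structure. The sketch below is one possible encoding (the class name CFG, its field names, and the function generate are all assumptions for illustration). The example grammar S → a S b | a b is our own; it generates the strings of one or more a's followed by the same number of b's, a language that no regular expression describes.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    terminals: frozenset
    non_terminals: frozenset
    start: str
    productions: tuple   # pairs (non-terminal, tuple of symbols)

def generate(grammar, max_len):
    """Terminal strings of length <= max_len derivable from the start symbol.

    Assumes every production body contains at least one terminal, so that
    pruning on the number of terminals guarantees termination.
    """
    results = set()
    seen = {(grammar.start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Leftmost derivation: find the first non-terminal, if any.
        index = next(
            (i for i, sym in enumerate(form) if sym in grammar.non_terminals),
            None,
        )
        if index is None:
            results.add("".join(form))
            continue
        for head, body in grammar.productions:
            if head != form[index]:
                continue
            new_form = form[:index] + body + form[index + 1:]
            terminal_count = sum(sym in grammar.terminals for sym in new_form)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

# Hypothetical example grammar: S -> a S b | a b
anbn = CFG(
    terminals=frozenset("ab"),
    non_terminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
)
print(sorted(generate(anbn, 6), key=len))
```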


Context-Free Grammars: Example

A CFG for real numbers:

• Terminals: 0 1 2 3 4 5 6 7 8 9 . ±

• Non-terminals: real-number, part, digit, sign

• Productions:

  • A real-number is a ± sign followed by a numerical part, possibly followed by “.” and another numerical part

  • A part is a sequence of digits, i.e., a digit, or a digit followed by a part

  • A digit is any single terminal except “.”, “+”, and “-”

• Start symbol: real-number

Note that we use recursion to specify repeated occurrences.

We have “defined” this CFG using plain English. Not a good idea.
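
For comparison, here is the same grammar written as a small recognizer with one function per non-terminal. This is a sketch under two assumptions: "±" stands for the ASCII characters + and -, and the sign is mandatory, as the English wording suggests. The function names are made up for this example.

```python
DIGITS = "0123456789"

def parse_sign(text, pos):
    """sign -> + | -   (returns the position after the sign, or None)"""
    if pos < len(text) and text[pos] in "+-":
        return pos + 1
    return None

def parse_digit(text, pos):
    """digit -> 0 | 1 | ... | 9"""
    if pos < len(text) and text[pos] in DIGITS:
        return pos + 1
    return None

def parse_part(text, pos):
    """part -> digit | digit part   (the recursion gives the repetition)"""
    after_digit = parse_digit(text, pos)
    if after_digit is None:
        return None
    after_rest = parse_part(text, after_digit)
    return after_digit if after_rest is None else after_rest

def is_real_number(text):
    """real-number -> sign part | sign part . part"""
    pos = parse_sign(text, 0)
    if pos is None:
        return False
    pos = parse_part(text, pos)
    if pos is None:
        return False
    if pos == len(text):
        return True
    if text[pos] == ".":
        return parse_part(text, pos + 1) == len(text)
    return False

for candidate in ["+3.14", "-42", "42", "+3.", "+.5"]:
    print(candidate, is_real_number(candidate))
```

Each function corresponds to exactly one production, so disagreements about what the grammar means show up as concrete differences in the code.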
