MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
-
Upload
meljun-cortes-mbampa -
Category
Documents
-
view
216 -
download
0
Transcript of MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
1/26
CSC 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Context-free languages
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
2/26
Context-free grammar
This is an a different model for describinglanguages
The language is specified by productions
(substitution rules) that tell how strings can be
obtained, e.g.
Using these rules, we can derive strings like this:
A 0A1
A B
B #
A, B are variables
0, 1, # are terminals
A is the start variable
A 0A1 00A11 000A111 000B111 000#111
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
3/26
Some natural examples
Context-free grammars were first used fornaturallanguages
a girl with a flower likes the boy
SENTENCE
NOUN-PHRASE VERB-PHRASE
ART NOUN PREP ART NOUN VERB ART NOUN
CMPLX-NOUNCMPLX-NOUN CMPLX-NOUN
PREP-PHRASE
CMPLX-VERB
NOUN-PHRASE
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
4/26
Natural languages
We can describe (some fragments) of the Englishlanguage by a context-free grammar:
SENTENCE NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE CMPLX-NOUN
NOUN-PHRASE CMPLX-NOUN PREP-PHRASE
VERB-PHRASE CMPLX-VERBVERB-PHRASE CMPLX-VERB PREP-PHRASE
PREP-PHRASE PREP CMPLX-NOUN
CMPLX-NOUN ARTICLE NOUN
CMPLX-VERB VERB NOUN-PHRASE
CMPLX-VERB VERB
ARTICLE a
ARTICLE the
NOUN boy
NOUN girlNOUN flower
VERB likes
VERB touches
VERB sees
PREP with
variables: SENTENCE, NOUN-PHRASE,
terminals: a, the, boy, girl, flower, likes, touches, sees, with
start variable: SENTENCE
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
5/26
Programming languages
Context-free grammars are also used to describe(parts of) programming languages
For instance, expressions like (2 + 3) * 5 or
3 + 8 + 2 * 7 can be described by the CFG
+
*
()
0 1
9
Variables:
Terminals:+, *, (, ), 0, 1, , 9
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
6/26
Motivation for studying CFGs
Context-free grammars are essential forunderstanding the meaning of computer
programs
They are used in compilers
code:(2 + 3) * 5
meaning: add 2 and 3, and then multiply by 5
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
7/26
Definition of context-free grammar
A context-free grammar(CFG) is a 4-tuple(V, T, P, S) where
Vis a finite set ofvariables ornon-terminals
Tis a finite set ofterminals (VT= )
Pis a set ofproductions orsubstitution rules of theform
whereA is a symbol in Vand a is a string overVT S is a variable inVcalled the start variable
A a
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
8/26
Shorthand notation for productions
When we have multiple productions with thesame variable on the left like
we can write this in shorthand as
E E + E
E E * E
E (E)
E N
E E + E | E * E | (E) | 0 | 1
N 0N | 1N | 0 | 1
Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E
N 0N
N 1N
N 0
N 1
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
9/26
Derivation
A derivation is a sequential application ofproductions:
E
derivation
E * E
(E) * E
(E) * N
(E + E ) * N
(E + E ) * 1
(E + N) * 1
(N + N) * 1
(N + 1N) * 1 (N + 10) * 1
(1 + 10) * 1
a b
means b can be obtained
from a with one production
a b*
means b can be obtained
from a afterzero or more
productions
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
10/26
Language of a CFG
The language of a CFG(V, T, P, S) is the set of allstrings containing only terminals that can bederived from the start variable S
This is a language over the alphabet T
A language Lis context-free if it is the language
of some CFG
L = { | T* and S }*
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
11/26
Example 1
Is the string 00#11 in L?
How about 00#111, 00#0#1#11?
What is the language of this CFG?
A 0A1 | B
B #
variables:A, B
terminals: 0, 1, #
start variable:A
L= {0n#1n: n 0}
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
12/26
Example 2
Give derivations of(), (()())
How about ())?
S SS | (S) | convention: variables in uppercase,
terminals in lowercase, start variable first
S (S) (rule 2)
() (rule 3)
S (S) (rule 2)
(SS) (rule 1) ((S)S) (rule 2)
((S)(S)) (rule 2) (()(S)) (rule 3) (()()) (rule 3)
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
13/26
Examples: Designing CFGs
Write a CFG for the following languages Linear equations overx, y, z, like:
x + 5yz = 911xy = 2
Numbers without leading zeros, e.g., 109, 0 but not 019
The language L= {anbncmdm| n 0, m 0}
The language L= {anbmcmdn| n 0, m 0}
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
14/26
Context-free versus regular
Write a CFG for the language (0 + 1)*111
Can you do so forevery regular language?
Proof:
S A111
A | 0A | 1A
Every regular language is context-free
regular
expressionDFANFA
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
15/26
From regular to context-free
regular expression
a (alphabet symbol)
E1 + E2
CFG
E1E2
E1*
grammar with no rules
S
S a
S S1 | S2
S S1S2
S SS1 |
In all cases, S becomes the newstart symbol
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
16/26
Context-free versus regular
Is every context-free language regular? No! We already saw some examples:
This language is context-free but not regular
A 0A1 | BB # L= {0n#1n: n 0}
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
17/26
Parse tree
Derivations can also be represented using parsetrees
E E + E
V + E
x + E
x + (E)
x + (E - E)
x + (V- E) x + (y- E)
x + (y- V)
x + (y- z)
E
E E+
V ( E )
E - E
V V
x
y z
E E + E | E - E | (E) | V
V x | y | z
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
18/26
Definition of parse tree
A parse tree for a CFG G is an ordered tree withlabels on the nodes such that
Every internal node is labeled by a variable
Every leaf is labeled by a terminal or
Leaves labeled by have no siblings
If a node is labeledA and has childrenA1, , Ak from
left to right, then the rule
is a production in G.
A A1Ak
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
19/26
Left derivation
Always derive the leftmost variable first:
Corresponds to a left-to-right traversal of parse
tree
E E + E
V + E
x + E x + (E)
x + (E - E)
x + (V- E)
x + (y- E)
x + (y- V)
x + (y- z)
E
E E+
V ( E )
E - E
V V
x
y z
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
20/26
Ambiguity
A grammar is ambiguous if some strings havemore than one parse tree
Example: E E + E | E - E | (E) | VV x | y | z
x + y + z
E
E E+
E E+VV Vx
y z
E
E E+
E E+ V
V V
x y
z
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
21/26
Why ambiguity matters
The parse tree represents the intended meaning:
x + y + z
E
E E+
E E+VV Vx
y z
E
E E+
E E+ V
V V
x y
z
first add yand z,and then add this to x
first add x and y,and then add zto this
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
22/26
Why ambiguity matters
Suppose we also had multiplication:E E + E | E - E | E E | (E) | VV x | y | z
x y + z
E
E E*
E E+V
V Vx
y z
E
E E+
E E V
V V
x y
z
first x y, then + zfirst y + z, then x
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
23/26
Disambiguation
Sometimes we can rewrite the grammartoremove the ambiguity
Rewrite grammar so cannot be broken by +:
E E + E | E - E | E E | (E) | VV x | y | z
E T | E + T | E - T
T F | T FF (E) | V
V x | y | z
T stands forterm:x * (y + z)
F stands forfactor: x, (y + z)
A term always splits into factors
A factor is either a variable or a
parenthesized expression
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
24/26
Disambiguation
Example
x y + z
E T | E + T | E - T
T F | T FF (E) | V
V x | y | z
V V V
F
T
FT
E
E
T
F
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
25/26
Disambiguation
Can we always disambiguate a grammar?
No, for two reasons
There exists an inherently ambiguous context-free L:Every CFG for this language is ambiguous
There is no general procedure that can tell if a
grammar is ambiguous
However, grammars used in programming
languages can typically be disambiguated
-
7/31/2019 MELJUN CORTES -- AUTOMATA THEORY LECTURE - 8
26/26
Another Example
Is ab, baba, abbbaa in L?
How about a, bba?
What is the language of this CFG?
Is the CFG ambiguous?
S aB | bAA a | aS | bAA
B b | bS | aBB