Chomsky Normal Form - University of Notre Damecpennycu/2019/assets/fall/TOC... · Chomsky Normal...
Transcript of Chomsky Normal Form - University of Notre Damecpennycu/2019/assets/fall/TOC... · Chomsky Normal...
-
Chomsky Normal Form
-
Consider the CFG G
1
:
S → aSb | SS | ε
And the CFG G
2
:
S → aSb | SX | εX → X | S | ε
1. Are G
1
and G
2
equivalent?
(That is, do they generate the
same language?)
2. Is there an advantage of one CFG
over the other?
3. Can the disadvantage of G
2
be
eliminated?
4. How could that procedure be
formalized?
Chomsky Normal Form.
-
Chomsky Normal Form
A context-free grammar is in Chomsky
Normal Form if every rule is of the form:
S ⟶ εA ⟶ BCA ⟶ a
Where a is any terminal and A, B, and C are
any variables, except that B and C may not
be the start variable.
In this example, S is the start variable.
Implied by the definition:
● In A ⟶ BC, only two nonterminals are allowed in the RHS.
● In A ⟶ a, only one terminal is allowed in the RHS.
● In S ⟶ ε, only the start state may go to ε.● S may not appear on the RHS.
Simplifying and bounding grammars.
-
Theorem: Any CFG to CNFIdea: Make conversion in stages, systematically
changing or removing rules that do not match
the requirements.
Stages:
1. Add new start S
0
and rule S
0
⟶ S, where S was the original start variable.
(Guarantees S
0
isn’t on RHS.)
2. Eliminate ε-rules. For all A ⟶ ε, where A is not S
0
, then remove the rule and for
each occurrence of A in any RHS, add a
new rule with the occurrence deleted.
(e.g., for every rule R ⟶ uAv, we would add rule R ⟶ uv.)
2. (cont.) Replacement happens for each
occurrence of A, so R ⟶ uAvAw would add R ⟶ uvAw | uAvw | uvw.
If R ⟶ A exists, then add R ⟶ ε (unless this rule has already been removed).
Repeat until all ε-rules are removed.
3. Remove unit rules. For all rules of the
form A ⟶ B, remove the rule and for all B ⟶ u, add the rule A ⟶ u unless this was a unit rule previously removed (u is a string
of variables and terminals).
Repeat until all unit rules removed.
-
Theorem: Any CFG to CNF (continued)4. Convert all remaining rules into the
proper form.
Reminder: Proper form is
A ⟶ BCA ⟶ a
Replace each rule A ⟶ u1
u
2
…u
k
, where
k ≥ 3 and each u
i
is a variable or terminal,
with rules:
A ⟶ u1
A
1
A
1
⟶ u2
A
2
A
2
⟶ u3
A
3
…
A
k-2
⟶ uk-1
u
k
.
4. (cont.) A
i
’s are new variables.
Replace any terminal u
i
in the preceding
rules with the new variable U
i
and add
the rule U
i
⟶ ui
.
Repeat until all rules are in the proper
form.
-
Example: G6
to CNF: Step 1Before:
S ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε
After (new starting state):
S
0
⟶ SS ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε
-
Example: G6
to CNF: Step 2
After (remove ε-rules, B ⟶ ε):S
0
⟶ SS ⟶ ASA | aB | aA ⟶ B | S | εB ⟶ b | ε
After (remove ε-rules, A ⟶ ε):S
0
⟶ SS ⟶ ASA | aB | a | SA | AS | SA ⟶ B | S | εB ⟶ b
Previous:
S
0
⟶ SS ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε
-
Example: G6
to CNF: Step 3
After (remove unit rules, S ⟶ S):S
0
⟶ SS ⟶ ASA | aB | a | SA | AS | SA ⟶ B | SB ⟶ b
After (remove unit rules, S
0
⟶ S):S
0
⟶ S | ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ B | SB ⟶ b
Previous:
S
0
⟶ SS ⟶ ASA | aB | a | SA | AS | SA ⟶ B | SB ⟶ b
-
Example: G6
to CNF: Step 3 (cont.)
After (remove unit rules, A ⟶ B):S
0
⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ B | S | bB ⟶ b
After (remove unit rules, A ⟶ S):S
0
⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ S | b | ASA | aB | a | SA | ASB ⟶ b
Previous:
S
0
⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ B | SB ⟶ b
-
After (Eliminate aB on RHS):
S
0
⟶ AA1
| aB | UB | a | SA | AS
S ⟶ AA1
| aB | UB | a | SA | AS
A ⟶ b | AA1
| UB | a | SA | AS
A
1
⟶ SAU ⟶ aB ⟶ b
Example: G6
to CNF: Step 4Previous:
S
0
⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ b | ASA | aB | a | SA | ASB ⟶ b
After (Eliminate ASA on RHS):
S
0
⟶ ASA | AA1
| aB | a | SA | AS
S ⟶ ASA | AA1
| aB | a | SA | AS
A ⟶ b | ASA | AA1
| aB | a | SA | AS
A
1
⟶ SAB ⟶ b
Previous:
S
0
⟶ AA1
| aB | a | SA | AS
S ⟶ AA1
| aB | a | SA | AS
A ⟶ b | AA1
| UB | a | SA | AS
A
1
⟶ SAB ⟶ b
-
Example: G6
to CNFBefore:
S ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε
After (Final):
S
0
⟶ AA1
| UB | a | SA | AS
S ⟶ AA1
| UB | a | SA | AS
A ⟶ b | AA1
| UB | a | SA | AS
A
1
⟶ SAU ⟶ aB ⟶ b
Important: Upper size bound is the square of the original Grammar size.
-
Convert this to CNF:
S ⟶ 0S1 | ε
1. Add new start S
0
2. Eliminate ε-rules.3. Remove unit rules.
4. Convert all remaining rules into
the proper form.
-
Convert S ⟶ 0S1 | ε to CNFStep 1:
S
0
⟶ SS ⟶ 0S1 | ε
Step 2:
S
0
⟶ S | εS ⟶ 0S1 | 01
Step 3:
S
0
⟶ 0S1 | 01 | εS ⟶ 0S1 | 01
Step 4, part 1:
S
0
⟶ 0A | 01 | εS ⟶ 0A | 01A ⟶ S1
Step 4, part 2:
S
0
⟶ BA | BC | εS ⟶ BA | BCA ⟶ SCB ⟶ 0C ⟶ 1
-
CNF in Retrospect 1. The book does NOT give a good proof. Why is it a bad example of a proof?2. Wikipedia gives a better overview of the
proof.
https://en.wikipedia.org/wiki/Chomsky_normal_form
3. What benefit does CNF have?
Guaranteed maximum final grammar size.
Guaranteed maximum derivation length.
4. Why do we need CNF?
Proofs! Algorithms! Big-O Analysis!
5. BUT!!!! Does CNF eliminate ambiguity?
No.
https://en.wikipedia.org/wiki/Chomsky_normal_form
-
Chomsky Language Hierarchies
https://commons.wikimedia.org/wiki/File:Chomsky-hierarchy.svg
Formal Grammar Classifications
Each class is a subset of the class
above it.