Chomsky Normal Form - University of Notre Damecpennycu/2019/assets/fall/TOC... · Chomsky Normal...

15
Chomsky Normal Form

Transcript of Chomsky Normal Form - University of Notre Damecpennycu/2019/assets/fall/TOC... · Chomsky Normal...

  • Chomsky Normal Form

  • Consider the CFG G

    1

    :

    S → aSb | SS | ε

    And the CFG G

    2

    :

    S → aSb | SX | εX → X | S | ε

    1. Are G

    1

    and G

    2

    equivalent?

    (That is, do they generate the

    same language?)

    2. Is there an advantage of one CFG

    over the other?

    3. Can the disadvantage of G

    2

    be

    eliminated?

    4. How could that procedure be

    formalized?

    Chomsky Normal Form.

  • Chomsky Normal Form

    A context-free grammar is in Chomsky

    Normal Form if every rule is of the form:

    S ⟶ εA ⟶ BCA ⟶ a

    Where a is any terminal and A, B, and C are

    any variables, except that B and C may not

    be the start variable.

    In this example, S is the start variable.

    Implied by the definition:

    ● In A ⟶ BC, only two nonterminals are allowed in the RHS.

    ● In A ⟶ a, only one terminal is allowed in the RHS.

    ● In S ⟶ ε, only the start state may go to ε.● S may not appear on the RHS.

    Simplifying and bounding grammars.

  • Theorem: Any CFG to CNFIdea: Make conversion in stages, systematically

    changing or removing rules that do not match

    the requirements.

    Stages:

    1. Add new start S

    0

    and rule S

    0

    ⟶ S, where S was the original start variable.

    (Guarantees S

    0

    isn’t on RHS.)

    2. Eliminate ε-rules. For all A ⟶ ε, where A is not S

    0

    , then remove the rule and for

    each occurrence of A in any RHS, add a

    new rule with the occurrence deleted.

    (e.g., for every rule R ⟶ uAv, we would add rule R ⟶ uv.)

    2. (cont.) Replacement happens for each

    occurrence of A, so R ⟶ uAvAw would add R ⟶ uvAw | uAvw | uvw.

    If R ⟶ A exists, then add R ⟶ ε (unless this rule has already been removed).

    Repeat until all ε-rules are removed.

    3. Remove unit rules. For all rules of the

    form A ⟶ B, remove the rule and for all B ⟶ u, add the rule A ⟶ u unless this was a unit rule previously removed (u is a string

    of variables and terminals).

    Repeat until all unit rules removed.

  • Theorem: Any CFG to CNF (continued)4. Convert all remaining rules into the

    proper form.

    Reminder: Proper form is

    A ⟶ BCA ⟶ a

    Replace each rule A ⟶ u1

    u

    2

    …u

    k

    , where

    k ≥ 3 and each u

    i

    is a variable or terminal,

    with rules:

    A ⟶ u1

    A

    1

    A

    1

    ⟶ u2

    A

    2

    A

    2

    ⟶ u3

    A

    3

    A

    k-2

    ⟶ uk-1

    u

    k

    .

    4. (cont.) A

    i

    ’s are new variables.

    Replace any terminal u

    i

    in the preceding

    rules with the new variable U

    i

    and add

    the rule U

    i

    ⟶ ui

    .

    Repeat until all rules are in the proper

    form.

  • Example: G6

    to CNF: Step 1Before:

    S ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε

    After (new starting state):

    S

    0

    ⟶ SS ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε

  • Example: G6

    to CNF: Step 2

    After (remove ε-rules, B ⟶ ε):S

    0

    ⟶ SS ⟶ ASA | aB | aA ⟶ B | S | εB ⟶ b | ε

    After (remove ε-rules, A ⟶ ε):S

    0

    ⟶ SS ⟶ ASA | aB | a | SA | AS | SA ⟶ B | S | εB ⟶ b

    Previous:

    S

    0

    ⟶ SS ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε

  • Example: G6

    to CNF: Step 3

    After (remove unit rules, S ⟶ S):S

    0

    ⟶ SS ⟶ ASA | aB | a | SA | AS | SA ⟶ B | SB ⟶ b

    After (remove unit rules, S

    0

    ⟶ S):S

    0

    ⟶ S | ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ B | SB ⟶ b

    Previous:

    S

    0

    ⟶ SS ⟶ ASA | aB | a | SA | AS | SA ⟶ B | SB ⟶ b

  • Example: G6

    to CNF: Step 3 (cont.)

    After (remove unit rules, A ⟶ B):S

    0

    ⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ B | S | bB ⟶ b

    After (remove unit rules, A ⟶ S):S

    0

    ⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ S | b | ASA | aB | a | SA | ASB ⟶ b

    Previous:

    S

    0

    ⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ B | SB ⟶ b

  • After (Eliminate aB on RHS):

    S

    0

    ⟶ AA1

    | aB | UB | a | SA | AS

    S ⟶ AA1

    | aB | UB | a | SA | AS

    A ⟶ b | AA1

    | UB | a | SA | AS

    A

    1

    ⟶ SAU ⟶ aB ⟶ b

    Example: G6

    to CNF: Step 4Previous:

    S

    0

    ⟶ ASA | aB | a | SA | ASS ⟶ ASA | aB | a | SA | ASA ⟶ b | ASA | aB | a | SA | ASB ⟶ b

    After (Eliminate ASA on RHS):

    S

    0

    ⟶ ASA | AA1

    | aB | a | SA | AS

    S ⟶ ASA | AA1

    | aB | a | SA | AS

    A ⟶ b | ASA | AA1

    | aB | a | SA | AS

    A

    1

    ⟶ SAB ⟶ b

    Previous:

    S

    0

    ⟶ AA1

    | aB | a | SA | AS

    S ⟶ AA1

    | aB | a | SA | AS

    A ⟶ b | AA1

    | UB | a | SA | AS

    A

    1

    ⟶ SAB ⟶ b

  • Example: G6

    to CNFBefore:

    S ⟶ ASA | aBA ⟶ B | SB ⟶ b | ε

    After (Final):

    S

    0

    ⟶ AA1

    | UB | a | SA | AS

    S ⟶ AA1

    | UB | a | SA | AS

    A ⟶ b | AA1

    | UB | a | SA | AS

    A

    1

    ⟶ SAU ⟶ aB ⟶ b

    Important: Upper size bound is the square of the original Grammar size.

  • Convert this to CNF:

    S ⟶ 0S1 | ε

    1. Add new start S

    0

    2. Eliminate ε-rules.3. Remove unit rules.

    4. Convert all remaining rules into

    the proper form.

  • Convert S ⟶ 0S1 | ε to CNFStep 1:

    S

    0

    ⟶ SS ⟶ 0S1 | ε

    Step 2:

    S

    0

    ⟶ S | εS ⟶ 0S1 | 01

    Step 3:

    S

    0

    ⟶ 0S1 | 01 | εS ⟶ 0S1 | 01

    Step 4, part 1:

    S

    0

    ⟶ 0A | 01 | εS ⟶ 0A | 01A ⟶ S1

    Step 4, part 2:

    S

    0

    ⟶ BA | BC | εS ⟶ BA | BCA ⟶ SCB ⟶ 0C ⟶ 1

  • CNF in Retrospect 1. The book does NOT give a good proof. Why is it a bad example of a proof?2. Wikipedia gives a better overview of the

    proof.

    https://en.wikipedia.org/wiki/Chomsky_normal_form

    3. What benefit does CNF have?

    Guaranteed maximum final grammar size.

    Guaranteed maximum derivation length.

    4. Why do we need CNF?

    Proofs! Algorithms! Big-O Analysis!

    5. BUT!!!! Does CNF eliminate ambiguity?

    No.

    https://en.wikipedia.org/wiki/Chomsky_normal_form

  • Chomsky Language Hierarchies

    https://commons.wikimedia.org/wiki/File:Chomsky-hierarchy.svg

    Formal Grammar Classifications

    Each class is a subset of the class

    above it.