Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is...

Posted: 21-Dec-2015


Table-driven parsing

• Parsing performed by a finite state machine.

• Parsing algorithm is language-independent.

• FSM driven by table(s) generated automatically from grammar.

[Diagram: Language → generator → tables; Input → parser, which uses the tables and a stack]


Pushdown Automata

• A context-free grammar can be recognized by a finite state machine with a stack: a PDA.

• The PDA is defined by a set of internal states and a transition table.

• The PDA can read the input and read/write on the stack.

• The actions of the PDA are determined by its current state, the current top of the stack, and the current input symbol.

• There are three distinguished states:
  – start state: nothing seen
  – accept state: sentence complete
  – error state: current symbol doesn't belong.
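To make the table-driven moves concrete, here is a minimal Python sketch of a deterministic PDA. The language it recognizes (balanced parentheses), the state names and the helper `run_pda` are illustrative assumptions, not taken from the slides; the point is only that every move is looked up from (current state, top of stack, current input symbol), and that any missing entry leads to the error state.

```python
# Hypothetical example: a deterministic PDA for balanced parentheses.
START, ACCEPT, ERROR = "start", "accept", "error"
BOTTOM, EOF = "$", "EOF"   # stack-bottom marker and end-of-input symbol

# Transition table: (state, top of stack, input symbol) -> (new state, stack action)
TABLE = {
    (START, BOTTOM, "("): (START, "push"),
    (START, "(", "("): (START, "push"),
    (START, "(", ")"): (START, "pop"),
    (START, BOTTOM, EOF): (ACCEPT, None),   # nothing pending at end of input
}

def run_pda(text: str) -> bool:
    stack, state = [BOTTOM], START
    for symbol in list(text) + [EOF]:
        # Any combination missing from the table leads to the error state.
        state, action = TABLE.get((state, stack[-1], symbol), (ERROR, None))
        if state == ERROR:
            return False
        if action == "push":
            stack.append(symbol)
        elif action == "pop":
            stack.pop()
    return state == ACCEPT
```

For instance, `run_pda("(()())")` accepts, while `run_pda("(()")` reaches end of input with `(` still on the stack and is rejected.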


Top-down parsing

• Parse tree is synthesized from the root (sentence symbol).

• Stack contains symbols of rhs of current production, and pending non-terminals.

• Automaton is trivial (no need for explicit states).

• Transition table indexed by grammar symbol G and input symbol a. Entries in the table are terminals or productions: P → ABC…


Top-down parsing

• Actions:
  – Initially, the stack contains the sentence symbol.
  – At each step, let S be the symbol on top of the stack, and a the next token on input.
  – If T(S, a) is the terminal a (S is a terminal matching the token), read the token and pop S from the stack.
  – If T(S, a) is a production P → ABC…, remove S from the stack and push the symbols A, B, C… on the stack (A on top).
  – If the stack is empty and a is the end of file, accept.
  – If T(S, a) is undefined, signal an error.

• Semantic action: when starting a production, build a tree node for the non-terminal and attach it to its parent.
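The driver loop above can be sketched in Python. This is a minimal illustration under stated assumptions: it uses the non-left-recursive expression grammar from the FIRST/FOLLOW slides (E → T E’, E’ → + T E’ | ε, T → F T’, T’ → * F T’ | ε, F → id | ( E )), written with ASCII primes; the table is filled in by hand; tokens are plain strings with `"$"` standing for end of file. The names `TABLE`, `NONTERMINALS` and `ll1_parse` are hypothetical.

```python
# Hand-filled LL(1) table: (non-terminal, token) -> right-hand side to push.
# An empty list is an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the productions applied (a leftmost derivation);
    raise SyntaxError if the input is rejected."""
    tokens = list(tokens) + ["$"]          # "$" marks end of file
    stack = ["$", "E"]                     # sentence symbol on top of "$"
    pos, applied = 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == "$" and a == "$":
            return applied                 # stack exhausted at end of file: accept
        if top not in NONTERMINALS:        # terminal (or "$") on top: must match
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            pos += 1                       # read the token
        else:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"T({top}, {a}) is undefined")
            stack.pop()
            stack.extend(reversed(rhs))    # push C, B, A so that A ends on top
            applied.append((top, rhs))
```

On `["id", "+", "id", "*", "id"]` the parser applies 11 productions, the first being E → T E’; a semantic action building tree nodes would hook in where `applied.append` is called.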


Table-driven parsing and recursive descent parsing

• Recursive descent: every non-terminal is a procedure. The call stack holds the active procedures, corresponding to pending non-terminals.

• Stack still needed for context-sensitive legality checks, error messages, etc.

• Table-driven parser: recursion simulated with explicit stack.
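For comparison with the explicit-stack driver, here is a minimal recursive-descent sketch in Python for the same non-left-recursive expression grammar (E → T E’, E’ → + T E’ | ε, T → F T’, T’ → * F T’ | ε, F → id | ( E )). The class `Parser`, its method names and the tuple-shaped tree are hypothetical choices for illustration; the pending non-terminals live on the Python call stack instead of an explicit stack.

```python
class Parser:
    """One method per non-terminal; returns a nested-tuple parse tree."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of file
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.E()
        self.expect("$")                     # the whole input must be consumed
        return tree

    def E(self):                             # E  -> T E'
        return ("E", self.T(), self.E_prime())

    def E_prime(self):                       # E' -> + T E' | ε
        if self.peek() == "+":
            self.expect("+")
            return ("E'", "+", self.T(), self.E_prime())
        return ("E'",)                       # ε: other tokens are checked by the caller

    def T(self):                             # T  -> F T'
        return ("T", self.F(), self.T_prime())

    def T_prime(self):                       # T' -> * F T' | ε
        if self.peek() == "*":
            self.expect("*")
            return ("T'", "*", self.F(), self.T_prime())
        return ("T'",)

    def F(self):                             # F  -> id | ( E )
        if self.peek() == "id":
            self.expect("id")
            return ("F", "id")
        self.expect("(")
        inner = self.E()
        self.expect(")")
        return ("F", "(", inner, ")")
```

The single-token `peek()` tests play the role of the LL(1) table lookups: each `if` chooses a production from the next token.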


Building the parse table

• Define two functions on the symbols of the grammar: FIRST and FOLLOW.

• For a non-terminal N, FIRST(N) is the set of terminal symbols that can start any string derived from N:
  – First(If_Statement) = {if}
  – First(Expr) = {id, (}

• FOLLOW(N) is the set of terminals that can appear immediately after a string derived from N:
  – Follow(Expr) = {+, ), $}


Computing FIRST (N)

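• If N → ε, First(N) includes ε.

• If N → aABC…, First(N) includes a.

• If N → X1X2…, First(N) includes First(X1) (minus ε).

• If N → X1X2… and ε ∈ First(X1) (X1 can derive ε), First(N) also includes First(X2); likewise for X3 if X2 can also derive ε, and so on.

• Obvious generalization to First(α) where α is X1X2…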


Computing First (N)

• Grammar for expressions, without left recursion:

  E  → T E’
  E’ → + T E’ | ε
  T  → F T’
  T’ → * F T’ | ε
  F  → id | ( E )

• First(F) = { id, ( }
• First(T’) = { *, ε }
• First(T) = First(F) = { id, ( }
• First(E’) = { +, ε }
• First(E) = First(T) = { id, ( }
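The FIRST rules can be applied mechanically by iterating until nothing changes (a fixed point). Below is a minimal Python sketch under assumptions of my own: the grammar is a dict from non-terminal to a list of alternatives, an empty list stands for an ε-alternative, the string `"ε"` marks nullability inside a set, and primes are written as ASCII `'`. The names `GRAMMAR`, `first_of_string` and `compute_first` are hypothetical.

```python
EPS = "ε"  # marks that a symbol (or string) can derive the empty string

# Expression grammar without left recursion; [] is an ε-alternative.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}

def first_of_string(symbols, first):
    """First(X1 X2 ... Xn), given the current FIRST sets of the non-terminals."""
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # a terminal starts only itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result                       # Xi cannot vanish: stop here
    result.add(EPS)                             # all Xi can vanish (or string is empty)
    return result

def compute_first(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:                              # repeat until a fixed point
        changed = False
        for n, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first) - first[n]
                if new:
                    first[n] |= new
                    changed = True
    return first
```

`compute_first(GRAMMAR)` reproduces the five sets listed above; for example the entry for `"T'"` is `{"*", "ε"}`.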


Computing Follow (N)

• Follow (N) is computed from productions in which N appears on the rhs

• For the sentence symbol S, Follow(S) includes $.

• If A → αNβ, Follow(N) includes First(β) (minus ε)
  – because an expansion of N will be followed by an expansion from β.

• If A → αN, Follow(N) includes Follow(A)
  – because N will be expanded in the context in which A is expanded.

• If A → αNB and ε ∈ First(B) (B can derive ε), Follow(N) includes Follow(A).


Computing Follow (N)

E  → T E’
E’ → + T E’ | ε
T  → F T’
T’ → * F T’ | ε
F  → id | ( E )

• Follow(E) = { ), $ }    Follow(E’) = Follow(E) = { ), $ }
• Follow(T) = (First(E’) − {ε}) ∪ Follow(E’) = { +, ), $ }
• Follow(T’) = Follow(T) = { +, ), $ }
• Follow(F) = (First(T’) − {ε}) ∪ Follow(T’) = { *, +, ), $ }
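The FOLLOW rules are also a fixed-point computation, driven by every occurrence of a non-terminal on a right-hand side. This Python sketch is an illustration, not the slides' own code: it copies the FIRST sets from the previous slide instead of recomputing them, writes primes as ASCII `'`, and uses hypothetical names (`first_of_string`, `compute_follow`).

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
# FIRST sets of the non-terminals, copied from the previous slide.
FIRST = {"E": {"id", "("}, "E'": {"+", EPS}, "T": {"id", "("},
         "T'": {"*", EPS}, "F": {"id", "("}}

def first_of_string(symbols):
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})                  # a terminal starts only itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}                       # β is empty or can derive ε

def compute_follow(grammar, start):
    follow = {n: set() for n in grammar}
    follow[start].add(END)                      # Follow(S) includes $
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, n in enumerate(rhs):
                    if n not in grammar:
                        continue                # only non-terminals get Follow sets
                    beta = first_of_string(rhs[i + 1:])
                    new = beta - {EPS}          # A -> α N β: add First(β) minus ε
                    if EPS in beta:
                        new |= follow[a]        # β can vanish: add Follow(A)
                    if not new <= follow[n]:
                        follow[n] |= new
                        changed = True
    return follow
```

`compute_follow(GRAMMAR, "E")` yields exactly the five sets above, e.g. `{"*", "+", ")", "$"}` for `F`.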


Building LL (1) parse tables

Table indexed by non-terminal and token. Table entry is a production:

   for each production P: A → α loop
      for each terminal a in First(α) loop
         T(A, a) := P;
      end loop;
      if ε in First(α) then
         for each terminal b in Follow(A) loop
            T(A, b) := P;
         end loop;
      end if;
   end loop;

• All other entries are errors.

• If two assignments conflict, the parse table cannot be built.
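The table-construction loop translates almost line for line into Python. In this sketch (an illustration under my own assumptions), FIRST and FOLLOW are supplied as precomputed dicts copied from the earlier slides, an empty list is an ε-alternative, and the names `build_ll1_table` and `first_of_string` are hypothetical. Instead of stopping at the first clash, it collects every conflicting (A, a) entry.

```python
EPS, END = "ε", "$"

# Expression grammar without left recursion; [] is an ε-alternative.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
# FIRST and FOLLOW sets copied from the previous slides.
FIRST = {"E": {"id", "("}, "E'": {"+", EPS}, "T": {"id", "("},
         "T'": {"*", EPS}, "F": {"id", "("}}
FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"*", "+", ")", END}}

def first_of_string(symbols, first):
    result = set()
    for x in symbols:
        fx = first.get(x, {x})                   # a terminal starts only itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}                        # α is empty or can derive ε

def build_ll1_table(grammar, first, follow):
    """Return (table, conflicts); table maps (A, a) to the rhs α of A -> α."""
    table, conflicts = {}, []
    for a_nt, alternatives in grammar.items():   # for each production P: A -> α
        for alpha in alternatives:
            f = first_of_string(alpha, first)
            targets = f - {EPS}                  # T(A, a) := P for a in First(α)
            if EPS in f:
                targets |= follow[a_nt]          # T(A, b) := P for b in Follow(A)
            for t in sorted(targets):
                if (a_nt, t) in table:
                    conflicts.append((a_nt, t))  # two assignments conflict
                else:
                    table[(a_nt, t)] = alpha
    return table, conflicts
```

For the expression grammar this yields 13 entries and no conflicts. Feeding it a left-recursive grammar such as E → E + T | T, T → id reports a conflict at (E, id), since both alternatives of E start with id.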


LL (1) grammars

• If table construction is successful, the grammar is LL(1): left-to-right scan, leftmost derivation, one-token lookahead.

• If construction fails, one can conceive of LL(2), etc.

• Ambiguous grammars are never LL(k).

• If a terminal is in First for two different productions of A, the grammar cannot be LL(1).

• Grammars with left recursion are never LL(k).

• Some useful constructs are not LL(k).


Bottom-up parsing

• Synthesize tree from fragments.

• Automaton performs two actions:
  – shift: push next symbol on stack
  – reduce: replace the symbols of a production's right-hand side on top of the stack by its left-hand side

• Automaton synthesizes (reduces) when the end of a production is recognized.

• States of the automaton encode the synthesis so far and the expectation of pending non-terminals.

• Automaton has a potentially large set of states.

• Technique is more general than LL(k).


LR (k) parsing

• Left-to-right, rightmost derivation with k-token lookahead.

• Most general parsing technique for deterministic grammars.

• In general, not practical: the tables are too large (10^6 states for C++, Ada).

• Common subsets: SLR, LALR (1).


The states of the LR(0) automaton

• An item is a point within a production, indicating that part of the production has been recognized:
  – A → α . β : seen the expansion of α, expect to see an expansion of β

• A state is a set of items.

• Transitions between states are determined by terminals and non-terminals.

• Parsing tables are built from the automaton:
  – action: shift / reduce depending on next symbol
  – goto: change state depending on synthesized non-terminal


Building LR (0) states

• If a state includes:
     A → α . B β
  it also includes every item that is the start of B:
     B → . X Y Z

• Informally: if I expect to see B next, I expect to see anything that B can start with, and so on:
     X → . G H I

• States are built by closure from individual items.
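The closure rule can be sketched in Python as a worklist algorithm. The assumptions here are mine: an item is a tuple (lhs, rhs, dot) with the dot sitting before `rhs[dot]`; the grammar is the augmented expression grammar of the next slide (E’ → E, E → E + T | T, T → T * F | F, F → id | ( E )) with ASCII primes; `closure` and `show` are hypothetical helper names.

```python
# Augmented expression grammar (left recursion is fine for LR parsing).
PRODUCTIONS = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("id",)), ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}

def closure(items):
    """Close a set of items (lhs, rhs, dot): for A -> α . B β add every B -> . γ."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:   # dot before a non-terminal B
            for p_lhs, p_rhs in PRODUCTIONS:
                item = (p_lhs, p_rhs, 0)                    # B -> . X Y Z
                if p_lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)                       # its closure is needed too
    return frozenset(result)

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"
```

Closing the single item E’ → . E gives the seven items of S0 on the next slide, for instance `"F -> . ( E )"` among the `show`n strings.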


A grammar of expressions: initial state

• E’ → E
• E → E + T | T;   -- left recursion ok here.
• T → T * F | F;
• F → id | ( E )

• S0 = { E’ → .E, E → .E + T, E → .T, T → .T * F, T → .F, F → .id, F → .( E ) }


Adding states

• If a state has an item A → α . a β, and the next symbol in the input is a, we shift a onto the stack and enter a state that contains the item
     A → α a . β
  (as well as all other items brought in by closure).

• If a state has an item A → α . , this indicates the end of a production: reduce action.

• If a state has an item A → α . N β, then after a reduction that finds an N, go to a state with A → α N . β.


The LR (0) states for expressions

• S1 = { E’ → E. , E → E. + T }

• S2 = { E → T. , T → T. * F }

• S3 = { T → F. }

• S4 = { F → (. E) } plus, by closure, the items of S0 other than E’ → .E

• S5 = { F → id. }

• S6 = { E → E +. T, T → .T * F, T → .F, F → .id, F → .(E) }

• S7 = { T → T *. F, F → .id, F → .(E) }

• S8 = { F → (E.), E → E. + T }

• S9 = { E → E + T., T → T. * F }

• S10 = { T → T * F. }, S11 = { F → (E). }
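The whole collection S0 to S11 can be generated by combining closure with a goto step that moves the dot over one symbol. This Python sketch is self-contained and makes assumptions of its own: items are (lhs, rhs, dot) tuples, `closure`, `goto` and `lr0_states` are hypothetical names, and states are numbered in discovery order, which need not match the slide's numbering.

```python
PRODUCTIONS = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("id",)), ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for p_lhs, p_rhs in PRODUCTIONS:
                item = (p_lhs, p_rhs, 0)
                if p_lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` (terminal: shift; non-terminal: goto), then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None

def lr0_states():
    states = [closure({("E'", ("E",), 0)})]       # S0
    transitions = {}                              # (state index, symbol) -> state index
    symbols = sorted({s for _, rhs in PRODUCTIONS for s in rhs})
    i = 0
    while i < len(states):                        # the list grows while we scan it
        for x in symbols:
            target = goto(states[i], x)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions
```

For this grammar the sketch finds 12 states, matching S0 to S11 above, connected by 22 labeled arcs.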


Building SLR tables

• An arc between two states labeled with a terminal is a shift action.

• An arc between two states labeled with a non-terminal is a goto action.

• If a state contains an item A → α. (a reduce item), the action is to reduce by this production for all terminals in Follow(A).

• If there are shift-reduce or reduce-reduce conflicts, more elaborate techniques are needed.
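Applying these rules to the states S0 to S11 gives an SLR table, and the shift-reduce driver is then a short loop. In this Python sketch the table is written out by hand, with state numbers as on the LR(0) slide. The production numbers 1 to 6 are labels I assign for illustration, and the Follow sets used (Follow(E) = {+, ), $}, Follow(T) = Follow(F) = {+, *, ), $}) are the ones for this left-recursive grammar, not the ones computed on the earlier slides. `slr_parse` is a hypothetical name.

```python
# Productions of the augmented grammar (0 is E' -> E); a reduce only needs
# the left-hand side and the length of the right-hand side.
PRODS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}
FOLLOW_E = ["+", ")", "$"]
FOLLOW_T = FOLLOW_F = ["+", "*", ")", "$"]

# Shift actions (arcs labeled with terminals) and accept (S1 holds E' -> E.).
ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): ("acc", None),
    (2, "*"): ("s", 7),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "*"): ("s", 7),
}
# Reduce items: reduce by the production on every terminal in Follow(lhs).
for state, prod, follow in [(2, 2, FOLLOW_E), (3, 4, FOLLOW_T), (5, 6, FOLLOW_F),
                            (9, 1, FOLLOW_E), (10, 3, FOLLOW_T), (11, 5, FOLLOW_F)]:
    for t in follow:
        assert (state, t) not in ACTION, "shift-reduce conflict"
        ACTION[(state, t)] = ("r", prod)

# Goto actions (arcs labeled with non-terminals).
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def slr_parse(tokens):
    """Return the production numbers reduced, in order; raise SyntaxError on error."""
    tokens = list(tokens) + ["$"]
    stack, pos, reductions = [0], 0, []     # stack of state numbers, S0 at the bottom
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state S{stack[-1]}")
        kind, arg = act
        if kind == "s":                     # shift: consume token, enter new state
            stack.append(arg)
            pos += 1
        elif kind == "r":                   # reduce: pop |rhs| states, then goto on lhs
            lhs, length = PRODS[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                               # accept
            return reductions
```

For `id + id * id` the reductions come out as 6, 4, 2, 6, 4, 6, 3, 1, which is a rightmost derivation read backwards.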


LR (k) parsing

• Canonical LR(1): annotate each item with its own follow set:
  – (A → α. , f)
  – f is a subset of the follow set of A, because it is derived from the single specific production in which A occurs.

• A state that includes A → α. is a reduce state only if the next symbol is in f: fewer reduce actions, fewer conflicts; the technique is more powerful than SLR(1).

• Generalization: use sequences of k symbols in f.

• Disadvantage: state explosion; impractical in general, even for LR(1).
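The effect of item-specific follow sets shows up already in the initial state. This Python sketch of LR(1) closure is illustrative only: it stores one lookahead terminal per item (so the set f is spread over several items), hard-codes FIRST for the augmented expression grammar, and relies on that grammar having no ε-productions so that First(β a) is the FIRST of its first symbol. The names `closure1`, `first_of` and `lookaheads` are hypothetical.

```python
PRODUCTIONS = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("id",)), ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}
# FIRST sets for this grammar; it has no ε-productions.
FIRST = {"E'": {"id", "("}, "E": {"id", "("}, "T": {"id", "("}, "F": {"id", "("}}

def first_of(symbols):
    """FIRST of a non-empty symbol string; without ε-productions only the first symbol matters."""
    return FIRST.get(symbols[0], {symbols[0]})

def closure1(items):
    """LR(1) closure; an item is (lhs, rhs, dot, lookahead terminal)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:   # [A -> α . B β, la]
            b, beta = rhs[dot], rhs[dot + 1:]
            for new_la in first_of(beta + (la,)):         # lookaheads come from First(β la)
                for p_lhs, p_rhs in PRODUCTIONS:
                    item = (p_lhs, p_rhs, 0, new_la)
                    if p_lhs == b and item not in result:
                        result.add(item)
                        work.append(item)
    return result

def lookaheads(items, lhs, rhs, dot):
    """The set f attached to the LR(0) core lhs -> rhs with the dot at position `dot`."""
    return {la for l, r, d, la in items if (l, r, d) == (lhs, rhs, dot)}

S0 = closure1({("E'", ("E",), 0, "$")})
```

In this grammar Follow(F) is {+, *, ), $}, yet the F → .id items in the initial state carry only {+, *, $}: a `)` can follow F only inside parentheses, a context this state has not entered.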


LALR (1)

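• Compute item-specific follow sets, but only for a small set of states: LR(1) states with the same core (the same LR(0) items) are merged.

• Tables no bigger than SLR(1).

• Nearly the same power as LR(1) in practice (merging states can introduce reduce-reduce conflicts); slightly worse error diagnostics.

• Incorporated into yacc, bison, etc.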