Chapter 4: Top-Down Parsing
Objectives of Top-Down Parsing
an attempt to find a leftmost derivation for an input string.
an attempt to construct a parse tree for the input string starting from the root and creating the nodes of the parse tree in preorder.
[Figure: an input string derived from the start symbol through a sequence of leftmost derivation steps (=>lm =>lm =>lm)]
Approaches of Top-Down Parsing
1. With backtracking (making repeated scans of the input; a general form of top-down parsing)
Method: create a procedure for each nonterminal.
e.g. S -> cAd
     A -> ab | a

S() {
    if input-symbol == 'c' {
        Advance();
        if A() {
            if input-symbol == 'd' {
                Advance();
                return true;
            }
        }
    }
    return false;
}

A() {
    isave = input-pointer;
    if input-symbol == 'a' {
        Advance();
        if input-symbol == 'b' {
            Advance();
            return true;
        }
    }
    input-pointer = isave;
    if input-symbol == 'a' {
        Advance();
        return true;
    }
    else return false;
}
L = { cabd, cad }
[Figure: parse trees built while parsing the input c a d]
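The two procedures above can be sketched as runnable Python. This is a minimal sketch, not the slides' own code: the class name BacktrackingParser, the helper accepts, and the use of "$" as an end marker are illustrative assumptions.

```python
class BacktrackingParser:
    """Recognizer for S -> cAd, A -> ab | a with one procedure per nonterminal."""

    def __init__(self, text):
        self.inp = text + "$"      # "$" marks the end of the input (assumption)
        self.pos = 0               # the input pointer

    def symbol(self):
        return self.inp[self.pos]

    def advance(self):
        self.pos += 1

    def S(self):
        if self.symbol() == "c":
            self.advance()
            if self.A():
                if self.symbol() == "d":
                    self.advance()
                    return True
        return False

    def A(self):
        isave = self.pos           # remember where A started
        if self.symbol() == "a":   # first alternative: A -> ab
            self.advance()
            if self.symbol() == "b":
                self.advance()
                return True
        self.pos = isave           # backtrack: restore the input pointer
        if self.symbol() == "a":   # second alternative: A -> a
            self.advance()
            return True
        return False


def accepts(text):
    """True if S() succeeds and the whole input has been consumed."""
    p = BacktrackingParser(text)
    return p.S() and p.symbol() == "$"
```

With A -> ab tried first, both cabd and cad are accepted. If the two alternatives inside A() are swapped, A() succeeds on the a of cabd, S() then sees b instead of d, and A() is never re-entered, so cabd is rejected.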
Problems for top-down parsing with backtracking:
(1) Left recursion (can cause a top-down parser to go into an infinite loop).
Def. A grammar is said to be left-recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α.
(2) Backtracking: the parser must undo not only the movement of the input pointer but also the semantic actions, such as entries made in the symbol table.
(3) The order in which the alternatives are tried matters (for the grammar shown above, try w = cabd when A -> a is applied first).
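Elimination of Left-Recursion
With immediate left recursion, transform
A -> Aα | β
into
A -> βA'
A' -> αA' | ε
[Figure: parse tree for A -> Aα | β, a chain of A nodes growing down the left ===> parse tree for the transformed grammar, a chain of A' nodes growing down the right]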
e.g. E -> E + T | T
     T -> T * F | F
     F -> (E) | id
After transformation:
     E -> TE'
     E' -> +TE' | ε
     T -> FT'
     T' -> *FT' | ε
     F -> (E) | id
General form (with left recursion): A -> Aα1 | Aα2 | ... | Aαn | β1 | β2 | ... | βm, where no βi begins with A
After transformation: ==>
A -> β1A' | β2A' | ... | βmA'
A' -> α1A' | α2A' | ... | αnA' | ε
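The general transformation can be sketched in Python as below. The encoding is an assumption for illustration: a grammar is a dict from each nonterminal to a list of right-hand sides (tuples of symbols), with () standing for ε; the function name eliminate_immediate and the primed name for A' are hypothetical.

```python
def eliminate_immediate(grammar, nt):
    """Return a copy of grammar with the immediate left recursion of nt removed.

    A -> Aα1 | ... | Aαn | β1 | ... | βm   becomes
    A -> β1A' | ... | βmA'   and   A' -> α1A' | ... | αnA' | ε.
    Assumes the name nt + "'" is not already used in the grammar.
    """
    alphas = [rhs[1:] for rhs in grammar[nt] if rhs[:1] == (nt,)]
    betas = [rhs for rhs in grammar[nt] if rhs[:1] != (nt,)]
    result = dict(grammar)
    if alphas:
        nt_prime = nt + "'"
        result[nt] = [beta + (nt_prime,) for beta in betas]
        result[nt_prime] = [alpha + (nt_prime,) for alpha in alphas] + [()]
    return result


# The expression grammar from the example above.
expr = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
for nt in ["E", "T"]:
    expr = eliminate_immediate(expr, nt)
# expr["E"]  == [("T", "E'")]
# expr["E'"] == [("+", "T", "E'"), ()]
```

Applied to E and T in turn, this reproduces the transformed expression grammar listed above.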
What about left recursion that arises through a derivation of two or more steps?
e.g. S -> Aa | b
     A -> Ac | Sd | e
where S => Aa => Sda, so S is left-recursive even though none of its productions is immediately left-recursive.
Algorithm: Eliminating left recursion
Input: a context-free grammar G with no cycles (derivations of the form A =>+ A) and no ε-productions.
Output: an equivalent grammar with no left recursion.
Method:
1. Arrange the nonterminals in some order A1, A2, ..., An.
2. for i = 1 to n do {
       for j = 1 to i-1 do
           replace each production of the form Ai -> Ajγ by the productions
           Ai -> δ1γ | δ2γ | ... | δkγ,
           where Aj -> δ1 | δ2 | ... | δk are all the current Aj-productions;
       eliminate the immediate left recursion among the Ai-productions;
   }
An Example
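e.g. S -> Aa | b
     A -> Ac | Sd | e
Step 1 (i = 1): S has no immediate left recursion ==> S -> Aa | b
Step 2 (i = 2, j = 1): substitute the S-productions into A -> Sd ==> A -> Ac | Aad | bd | e
Step 3: eliminate the immediate left recursion among the A-productions ==> A -> bdA' | eA'
                                                                           A' -> cA' | adA' | ε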
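The algorithm can be sketched in Python as follows, using a dict from each nonterminal to a list of right-hand sides (tuples of symbols). The function name, the encoding, and the primed names for the new nonterminals are assumptions; like the algorithm itself, the sketch presumes no cycles and no ε-productions in the input.

```python
def eliminate_left_recursion(grammar, order):
    """Remove immediate and multi-step left recursion.

    grammar: dict nonterminal -> list of tuples of symbols; () stands for ε.
    order:   the chosen order A1, A2, ..., An of the nonterminals.
    Assumes no cycles, no ε-productions, and that names such as A' are unused.
    """
    g = {nt: list(rhss) for nt, rhss in grammar.items()}
    for i, ai in enumerate(order):
        # Replace each Ai -> Aj γ (j < i) using the current Aj-productions.
        for aj in order[:i]:
            new_rhss = []
            for rhs in g[ai]:
                if rhs[:1] == (aj,):
                    new_rhss.extend(delta + rhs[1:] for delta in g[aj])
                else:
                    new_rhss.append(rhs)
            g[ai] = new_rhss
        # Eliminate the immediate left recursion among the Ai-productions.
        alphas = [rhs[1:] for rhs in g[ai] if rhs[:1] == (ai,)]
        betas = [rhs for rhs in g[ai] if rhs[:1] != (ai,)]
        if alphas:
            ai_prime = ai + "'"
            g[ai] = [beta + (ai_prime,) for beta in betas]
            g[ai_prime] = [alpha + (ai_prime,) for alpha in alphas] + [()]
    return g


# The example above, with the order S, A; e is a terminal here.
example = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ("e",)]}
result = eliminate_left_recursion(example, ["S", "A"])
# result["A"]  == [("b", "d", "A'"), ("e", "A'")]
# result["A'"] == [("c", "A'"), ("a", "d", "A'"), ()]
```

The intermediate A-productions after the j loop are Ac | Aad | bd | e, matching Step 2 of the example.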
2. Non-backtracking (recursive-descent) parsing
Recursive descent: use a collection of mutually recursive routines to perform the syntax analysis.
Left Factoring: A -> αβ1 | αβ2 ==> A -> αA'
                                   A' -> β1 | β2
Method:
1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all the A-productions A -> αβ1 | αβ2 | ... | αβn | others by
   A -> αA' | others
   A' -> β1 | β2 | ... | βn
2. Repeat the transformation until no two alternatives of a nonterminal have a common prefix.
e.g. S -> iCtS | iCtSeS | a
     C -> b
==>  S -> iCtSS' | a
     S' -> eS | ε
     C -> b
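A minimal Python sketch of the left-factoring method, assuming the same dict-of-tuples grammar encoding (() is ε); the helper names and the naming scheme A', A'', ... for new nonterminals are hypothetical.

```python
def common_prefix(x, y):
    """Longest common prefix of two right-hand sides (tuples of symbols)."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]


def longest_shared_prefix(rhss):
    """Longest prefix shared by two or more alternatives; () if there is none."""
    best = ()
    for i in range(len(rhss)):
        for j in range(i + 1, len(rhss)):
            p = common_prefix(rhss[i], rhss[j])
            if len(p) > len(best):
                best = p
    return best


def left_factor(grammar):
    """Left-factor grammar (dict: nonterminal -> list of tuples; () is ε)."""
    g = {nt: list(rhss) for nt, rhss in grammar.items()}
    changed = True
    while changed:                                # step 2: repeat until stable
        changed = False
        for nt in list(g):
            alpha = longest_shared_prefix(g[nt])  # step 1
            if not alpha:
                continue
            new_nt = nt + "'"
            while new_nt in g:                    # assumed naming: A', A'', ...
                new_nt += "'"
            k = len(alpha)
            g[new_nt] = [rhs[k:] for rhs in g[nt] if rhs[:k] == alpha]
            others = [rhs for rhs in g[nt] if rhs[:k] != alpha]
            g[nt] = [alpha + (new_nt,)] + others
            changed = True
    return g


dangling = left_factor({
    "S": [("i", "C", "t", "S"), ("i", "C", "t", "S", "e", "S"), ("a",)],
    "C": [("b",)],
})
# dangling["S"]  == [("i", "C", "t", "S", "S'"), ("a",)]
# dangling["S'"] == [(), ("e", "S")]   i.e. S' -> ε | eS
```

The S'-alternatives come out in the order of the original S-alternatives, so they appear as ε | eS rather than eS | ε; the set of productions is the same as in the example.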
Predictive Parsing
Features:
- maintains a stack rather than making recursive calls
- table-driven
Components:
1. An input buffer with an endmarker ($)
2. A stack with an endmarker ($) on the bottom
3. A parsing table, a two-dimensional array M[A,a], where 'A' is a nonterminal symbol and 'a' is the current input symbol (terminal/token). Each entry of the array is either a production (grammar rule) or blank (an error entry).
Parsing Table
(for the grammar S -> (S)S | ε)
M[A,a] | (          | )       | $
S      | S -> (S)S  | S -> ε  | S -> ε
Algorithm:
Input: An input string w and a parsing table M for grammar G.
Output: A leftmost derivation of w or an error indication.
Initially, w$ is in the input buffer and S$ is on the stack, with S, the start symbol of the grammar, on top.
Method:
let a be the first symbol of w$; let X be the top stack symbol;
while (X ≠ $) {
    if X is a terminal {
        if X == a then pop X from the stack and advance a to the next input symbol;
        else ERROR();
    } else {
        if M[X, a] = X -> Y1Y2...Yn then {
            1. pop X from the stack;
            2. push YnYn-1...Y1 onto the stack, with Y1 on top;
            3. output the production X -> Y1Y2...Yn;
        } else ERROR();
    }
    let X be the new top stack symbol;
}
if (a == $) then accept else ERROR();
An Example
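A minimal Python sketch of the predictive parsing algorithm, assuming the grammar S -> (S)S | ε and its parsing table. The dict encoding of M, the function name predictive_parse, and raising Python's SyntaxError for ERROR() are assumptions for illustration; the end of the list `stack` is the top of the stack.

```python
# Parsing table M for S -> (S)S | ε; () stands for ε, a missing key is an error.
TABLE = {
    ("S", "("): ("(", "S", ")", "S"),
    ("S", ")"): (),
    ("S", "$"): (),
}


def predictive_parse(tokens, table, start):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    nonterminals = {nt for nt, _ in table}
    inp = list(tokens) + ["$"]
    stack = ["$", start]               # start symbol on top, $ at the bottom
    pos = 0
    derivation = []
    x = stack[-1]
    while x != "$":
        a = inp[pos]                   # the next input symbol
        if x not in nonterminals:      # X is a terminal
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()                # pop X and advance past a
            pos += 1
        elif (x, a) in table:          # M[X, a] = X -> Y1 Y2 ... Yn
            rhs = table[(x, a)]
            stack.pop()
            stack.extend(reversed(rhs))  # push Yn ... Y1, leaving Y1 on top
            derivation.append((x, rhs))
        else:
            raise SyntaxError(f"M[{x}, {a}] is an error entry")
        x = stack[-1]
    if inp[pos] != "$":
        raise SyntaxError(f"unexpected {inp[pos]!r} after a complete parse")
    return derivation
```

Each expansion records the production used, so the returned list is the leftmost derivation the algorithm outputs; for the input "()" it is S -> (S)S, S -> ε, S -> ε.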
Construction of the parsing table for a predictive parser
First and Follow
Def. First(α), where α is any string of grammar symbols, is the set of terminals that begin the strings derived from α. If α =>* ε, then ε is also in First(α).
Def. Follow(A), where A is a nonterminal, is the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form S =>* αAaβ for some α and β. If A can be the rightmost symbol in some sentential form, then $ is in Follow(A).
Compute First(X) for all grammar symbols X:
1. If X is a terminal, then First(X) = {X}.
2. If X -> ε is a production, then ε is in First(X).
3. If X is a nonterminal and X -> Y1Y2...Yk is a production, then place a in First(X) if, for some i, a is in First(Yi) and ε is in all of First(Y1), ..., First(Yi-1); that is, Y1...Yi-1 =>* ε. If ε is in First(Yj) for all j = 1, 2, ..., k, then add ε to First(X).
An Example
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}
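The three rules can be applied repeatedly until no First set grows. A minimal Python sketch, assuming a dict from each nonterminal to a list of right-hand sides (tuples of symbols, () for ε) and the string "ε" as the empty-string marker; any symbol that is not a key of the dict is treated as a terminal.

```python
EPS = "ε"


def first_sets(grammar):
    """Fixed-point computation of First(X) for every nonterminal X."""
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Rule 1: the First set of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                      # iterate until no First set grows
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[nt])
                derives_eps = True
                for y in rhs:           # rule 3: scan Y1, Y2, ... while nullable
                    first[nt] |= first_of(y) - {EPS}
                    if EPS not in first_of(y):
                        derives_eps = False
                        break
                if derives_eps:         # rule 2 (rhs is ε) or every Yj =>* ε
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = first_sets(GRAMMAR)
```

On the grammar above this reproduces the listed sets: {(, id} for E, T and F, {+, ε} for E', and {*, ε} for T'.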
Compute Follow(A) for all nonterminals A
1. Place $ in Follow(S), where S is the start symbol and $ is the input buffer endmarker.
2. If there is a production A -> αBβ, then everything in First(β) except ε is placed in Follow(B).
3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
An Example
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id    /* E is the start symbol */
Follow(E) = { $, ) }          // rules 1 & 2
Follow(E') = { $, ) }         // rule 3
Follow(T) = { +, $, ) }       // rules 2 & 3
Follow(T') = { +, $, ) }      // rule 3
Follow(F) = { *, +, $, ) }    // rules 2 & 3
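The Follow rules can likewise be iterated to a fixed point. A minimal Python sketch under the same assumed encoding; to keep it short, the First sets of the expression grammar are transcribed by hand from the example above rather than recomputed, and first_of_string and follow_sets are hypothetical helper names.

```python
EPS = "ε"

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}


def first_of_string(symbols, first):
    """First(β) for a string β of symbols; contains ε iff β =>* ε."""
    result = set()
    for y in symbols:
        fy = first.get(y, {y})          # a terminal's First set is itself
        result |= fy - {EPS}
        if EPS not in fy:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                          # rule 1
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in grammar:            # only nonterminals B
                        continue
                    beta = first_of_string(rhs[i + 1:], first)
                    new = beta - {EPS}              # rule 2
                    if EPS in beta:                 # rule 3 (β empty or nullable)
                        new |= follow[nt]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, FIRST, "E")
```

The result matches the Follow sets listed above.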
Construct a Predictive Parsing Table
1. For each production A -> α of the grammar, do steps 2 and 3.
2. For each terminal a in First(α), add A -> α to M[A, a].
3. If ε is in First(α), add A -> α to M[A, b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $].
4. Make each undefined entry of M be error.
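Steps 1 to 4 can be sketched in Python as below, assuming the same dict-of-tuples grammar encoding and, for brevity, the First and Follow sets of the expression grammar transcribed by hand from the examples above. Each table cell holds a list of productions, so a multiply-defined entry shows up as a list longer than one; build_table and first_of_string are hypothetical names.

```python
EPS = "ε"

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}


def first_of_string(symbols, first):
    """First(α) for a string α of symbols; contains ε iff α =>* ε."""
    result = set()
    for y in symbols:
        fy = first.get(y, {y})          # a terminal's First set is itself
        result |= fy - {EPS}
        if EPS not in fy:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return (table, conflicts). table maps (A, a) to the productions placed
    in M[A, a]; a key that is absent is an error entry (step 4)."""
    table = {}
    for nt, rhss in grammar.items():
        for rhs in rhss:                            # step 1
            f = first_of_string(rhs, first)
            targets = f - {EPS}                     # step 2
            if EPS in f:
                targets |= follow[nt]               # step 3, $ included
            for t in targets:
                table.setdefault((nt, t), []).append((nt, rhs))
    conflicts = {key: prods for key, prods in table.items() if len(prods) > 1}
    return table, conflicts


TABLE, CONFLICTS = build_table(GRAMMAR, FIRST, FOLLOW)
```

For the expression grammar CONFLICTS is empty, so the grammar is LL(1). Given the dangling-else grammar S -> iCtSS' | a, S' -> eS | ε, C -> b with First(S') = {e, ε} and Follow(S') = {e, $}, the same function places both S' -> eS and S' -> ε in M[S', e].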
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries is said to be LL(1).
First 'L': scan the input from left to right. Second 'L': produce a leftmost derivation. '1': use one input symbol of lookahead to determine each parsing action.
* No ambiguous or left-recursive grammar can be LL(1).
Properties of LL(1) grammar
A grammar G is LL(1) iff whenever A -> α | β are two distinct productions of G, the following conditions hold:
(1) For no terminal a do both α and β derive strings beginning with a (based on step 2 of the table construction), i.e., First(α) ∩ First(β) = Φ.
(2) At most one of α and β can derive the empty string (based on step 3).
(3) If β =>* ε, then α does not derive any string beginning with a terminal in Follow(A) (based on steps 2 and 3), i.e., First(α) ∩ Follow(A) = Φ. (In particular, if First(A) contains ε, then First(A) ∩ Follow(A) = Φ.)
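Conditions (1) to (3) can be checked mechanically once the First set of each alternative and Follow(A) are known. A minimal Python sketch; the function name ll1_violations and its input format (one First set per alternative, containing "ε" when that alternative derives the empty string) are assumptions.

```python
EPS = "ε"


def ll1_violations(alt_firsts, follow_a):
    """Check conditions (1)-(3) for the alternatives A -> α0 | α1 | ... .

    alt_firsts: list of First(αi) sets, each containing ε iff αi =>* ε.
    follow_a:   the set Follow(A).
    Returns (i, j, condition) triples for every violated condition.
    """
    problems = []
    n = len(alt_firsts)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            fa, fb = alt_firsts[i], alt_firsts[j]
            if i < j and (fa & fb) - {EPS}:
                problems.append((i, j, 1))   # both start with the same terminal
            if i < j and EPS in fa and EPS in fb:
                problems.append((i, j, 2))   # both derive the empty string
            if EPS in fb and (fa - {EPS}) & follow_a:
                problems.append((i, j, 3))   # First(αi) meets Follow(A)
    return problems
```

For S' -> eS | ε from the dangling-else grammar, ll1_violations([{"e"}, {"ε"}], {"e", "$"}) reports condition (3), because e is in both First(eS) and Follow(S'); for E' -> +TE' | ε with Follow(E') = {$, )} it reports nothing.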
Def. for multiply-defined entry: If G is left-recursive or ambiguous, then M will have at least one multiply-defined entry.
e.g. S -> iCtSS' | a
     S' -> eS | ε
     C -> b
generates M[S', e] = { S' -> ε, S' -> eS }, a multiply-defined entry.
Parsing table with multiply-defined entry
M[A,a] | a       | b       | e                   | i             | t  | $
S      | S -> a  |         |                     | S -> iCtSS'   |    |
S'     |         |         | S' -> ε, S' -> eS   |               |    | S' -> ε
C      |         | C -> b  |                     |               |    |
Difficulty in predictive parsing
- Left-recursion elimination and left factoring make the resulting grammar hard to read and difficult to use for translation purposes.
Thus:
* Use a predictive parser for control constructs.
* Use operator-precedence parsing for expressions.