Parsing
Prepared by
Manuel E. Bermúdez, Ph.D., Associate Professor, University of Florida
Programming Language Principles, Lecture 3
Context-Free Grammars
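• Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ is the set of terminals, P is the set of productions, and S ∈ Φ is the start symbol. All productions are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*.
• Re-writing using grammar rules:
– βAγ => βαγ if A → α (derivation).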
String Derivations
• Left-most derivation: At each step, the left-most nonterminal is re-written.
• Right-most derivation: At each step, the right-most nonterminal is re-written.
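To make the two derivation orders concrete, here is a minimal Python sketch. The toy grammar E → E + T | T, T → i and the helper name `derive` are assumptions chosen for illustration; they are not taken from the slides.

```python
# Toy grammar (an assumption for illustration): E -> E + T | T,  T -> i
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["i"]],
}

def derive(start, choices, leftmost=True):
    """Re-write the left-most (or right-most) nonterminal once per step.

    `choices` gives, for each step, the index of the production to apply
    to the nonterminal being re-written. Returns every sentential form.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        rhs = GRAMMAR[form[pos]][choice]
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

left = derive("E", [0, 1, 0, 0], leftmost=True)
right = derive("E", [0, 0, 1, 0], leftmost=False)
print(" => ".join(left))   # E => E + T => T + T => i + T => i + i
print(" => ".join(right))  # E => E + T => E + i => T + i => i + i
```

Both derivations use the same productions and end in the same sentence; only the order of the re-writes differs.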
Derivation Trees
Derivation trees: Describe re-writes, independently of the order (left-most or right-most).
• Each tree branch matches a production rule in the grammar.
Derivation Trees
Notes:
1) Leaves are terminals.
2) Bottom contour is the sentence.
3) Left recursion causes left branching.
4) Right recursion causes right branching.
Goal of Parsing
• Examine input string, determine whether it's legal.
• Equivalent to building derivation tree.
• Added benefit: tree embodies syntactic structure of input.
• Therefore, tree should be unique.
Ambiguous Grammars
• Definition: A CFG is ambiguous if there exist two different right-most (or left-most, but not both) derivations for some sentence z.
• (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.
Ambiguous Grammars
Classic ambiguities:
– Simultaneous left/right recursion:
E → E + E
  → i
– Dangling else problem:
S → if E then S
  → if E then S else S
  →
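The first classic ambiguity can be checked by counting derivation trees. The following is a rough sketch, assuming the grammar E → E + E | i and a hypothetical helper `count_trees`; it tries every + as the root operator and multiplies the counts for the two sides.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count the derivation trees of `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of trees that derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":          # tokens[k] is the root operator
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_trees("i + i".split()))      # 1
print(count_trees("i + i + i".split()))  # 2, so the grammar is ambiguous
```

A count above 1 for any sentence is enough to show ambiguity; here the two trees for i + i + i correspond to grouping the left or the right + first.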
Operator Precedence and Associativity
• Let's build a CFG for expressions consisting of:
– elementary identifier i.
– + and - (binary ops) have lowest precedence, and are left associative.
– * and / (binary ops) have middle precedence, and are right associative.
– + and - (unary ops) have highest precedence, and are right associative.
Corresponding Grammar for Expressions
E → E + T        E consists of T's,
  → E - T        separated by -'s and +'s
  → T            (lowest precedence).
T → F * T        T consists of F's,
  → F / T        separated by *'s and /'s
  → F            (next precedence).
F → - F          F consists of a single P,
  → + F          preceded by +'s and -'s
  → P            (next precedence).
P → '(' E ')'    P consists of a parenthesized E,
  → i            or a single i (highest precedence).
Operator Precedence and Associativity
• Operator precedence:
– The lower in the grammar, the higher the precedence.
• Operator associativity:
– Tie breaker for precedence.
– Left recursion in the grammar means
  • left associativity of the operator,
  • left branching in the tree.
– Right recursion in the grammar means
  • right associativity of the operator,
  • right branching in the tree.
Building Derivation Trees
Sample Input : - + i - i * ( i + i ) / i + i
(Human) derivation tree construction:
• Bottom-up.
• On each pass, scan entire expression, process operators with highest precedence (parentheses are highest).
• Lowest precedence operators are last, at the top of tree.
Abstract Syntax Trees
• AST is a condensed version of the derivation tree.
• No noise (intermediate nodes).
• String-to-tree transduction grammar:
– rules of the form A → ω => 's'.
– Build 's' tree node, with one child per tree from each nonterminal in ω.
Example
E → E + T        => +
  → E - T        => -
  → T
T → F * T        => *
  → F / T        => /
  → F
F → - F          => neg
  → + F          => +
  → P
P → '(' E ')'
  → i            => i
Sample Input : - + i - i * ( i + i ) / i + i
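One way to see the AST this transduction grammar produces is a small recursive-descent parser. This is only a sketch under assumptions: the function names, the whitespace-separated token format, and the nested-tuple tree representation are all hypothetical, and the left-recursive E rules are implemented as a loop (which yields the same left-branching trees) because plain recursive descent cannot handle left recursion directly.

```python
def parse(text):
    """Return an AST (nested tuples) for a whitespace-separated expression."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_e():
        # E -> E + T => + | E - T => - | T   (loop gives left branching)
        node = parse_t()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            node = (op, node, parse_t())
        return node

    def parse_t():
        # T -> F * T => * | F / T => / | F   (recursion gives right branching)
        left = parse_f()
        if peek() in ("*", "/"):
            op = peek()
            expect(op)
            return (op, left, parse_t())
        return left

    def parse_f():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            expect("-")
            return ("neg", parse_f())
        if peek() == "+":
            expect("+")
            return ("+", parse_f())
        return parse_p()

    def parse_p():
        # P -> '(' E ')' | i => i   (parentheses leave no node in the AST)
        if peek() == "(":
            expect("(")
            node = parse_e()
            expect(")")
            return node
        expect("i")
        return "i"

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree

ast = parse("- + i - i * ( i + i ) / i + i")
print(ast)
```

For the sample input, the root is the last binary +, its left child is the binary -, and the * node has the / node as its right child, reflecting the right associativity of * and /.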
String-to-Tree Transduction
• We transduce from vocabulary of input symbols, to vocabulary of tree node names.
• Could eliminate construction of unary + node, anticipating semantics.
F → - F          => neg
  → + F          // no more unary + node
  → P
The Game of Syntactic Dominoes
• The grammar:
E → E+T      T → P*T      P → (E)
  → T          → P          → i
• The playing pieces: An arbitrary supply of each piece (one per grammar rule).
• The game board:
– Start domino at the top.
– Bottom dominoes are the "input."
The Game of Syntactic Dominoes
• Game rules:
– Add game pieces to the board.
– Match the flat parts and the symbols.
– Lines are infinitely elastic.
• Object of the game:
– Connect start domino with the input dominoes.
– Leave no unmatched flat parts.
Parsing Strategies
• Same as for the game of syntactic dominoes.
– “Top-down” parsing: start at the start symbol, work toward the input string.
– “Bottom-up” parsing: start at the input string, work towards the goal symbol.
• In either strategy, the input can be processed left-to-right or right-to-left.
Top-Down Parsing
• Attempt a left-most derivation, by predicting the re-write that will match the remaining input.
• Use a string α (a stack, really) from which the input can be derived.
Top-Down Parsing
Start with S on the stack. At every step, two alternatives:
1) α (the stack) begins with a terminal t. Match t against the first input symbol.
2) α begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input.
The OPF does the “predicting” in such a predictive parser.
Classical Top-Down Parsing Algorithm
Push (Stack, S);
while not Empty (Stack) do
   if Top(Stack) ∈ Σ
      then if Top(Stack) = Head(input)
              then input := tail(input)
                   Pop(Stack)
              else error (Stack, input)
      else P := OPF (Stack, input)
           Push (Pop(Stack), RHS(P))
od
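The algorithm above can be sketched in Python as follows. Several things here are assumptions rather than part of the slides: the names `top_down_parse` and `example_opf`, the `$` end-of-input marker, and the choice to pass the OPF in as a function of the top nonterminal and the first input symbol. The example OPF is hand-written for the small grammar S → A, A → bAd | (empty) that appears in the LL(1) example later in the lecture.

```python
END = "$"  # assumed end-of-input marker

def top_down_parse(tokens, start, terminals, opf):
    """Classical top-down parse; returns the productions predicted, in order.

    `opf(nonterminal, lookahead)` returns the right-hand side to predict
    (a list of symbols, possibly empty) or None when nothing applies.
    """
    remaining = list(tokens) + [END]
    stack = [start]                      # top of the stack is the list's end
    used = []
    while stack:
        top = stack[-1]
        if top in terminals:
            if top != remaining[0]:
                raise SyntaxError(f"expected {top!r}, found {remaining[0]!r}")
            stack.pop()                  # match: consume stack and input symbol
            remaining.pop(0)
        else:
            rhs = opf(top, remaining[0])
            if rhs is None:
                raise SyntaxError(f"no production for {top!r} on {remaining[0]!r}")
            used.append((top, rhs))
            stack.pop()                  # replace the nonterminal by its RHS,
            stack.extend(reversed(rhs))  # left-most symbol ends up on top
    if remaining != [END]:
        raise SyntaxError(f"input left over: {remaining[:-1]}")
    return used

def example_opf(nonterminal, lookahead):
    """Hand-written OPF for S -> A, A -> b A d | (empty)."""
    table = {
        ("S", "b"): ["A"], ("S", END): ["A"],
        ("A", "b"): ["b", "A", "d"],
        ("A", "d"): [], ("A", END): [],
    }
    return table.get((nonterminal, lookahead))

for lhs, rhs in top_down_parse(list("bbdd"), "S", {"b", "d"}, example_opf):
    print(lhs, "->", " ".join(rhs) or "(empty)")
```

The printed sequence of productions is a left-most derivation of the input, which is what the top-down strategy is meant to reconstruct.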
Top-Down Parsing
• Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1).
• We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input.
• Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).
LL(1) Grammars
Definition: A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead) iff for all nonterminals A, and for all pairs of distinct productions A → α, A → β (α ≠ β),
Select (A → α) ∩ Select (A → β) = ∅
• Previous example: Grammar is not LL(1).
• More later on why, and what to do about it.
Example:
S → A        {b, ⊥}
A → bAd      {b}
  → ε        {d, ⊥}
(⊥ marks the end of the input.)
Disjoint!
Grammar is LL(1)!
        d          b            ⊥
S                  S → A        S → A
A       A → ε      A → bAd      A → ε
(At most) one production per entry.
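The Select sets and the table for this example can be reproduced with a short fixed-point computation. This is a sketch under assumptions: Select is taken to be First of the right-hand side, plus Follow of the left-hand side when the right-hand side can derive the empty string (the usual construction; the slides have not defined it at this point), `$` stands in for the end-of-input marker, and the function names are hypothetical.

```python
END = "$"  # stands in for the end-of-input marker

def select_sets(grammar, start):
    """Return one Select set per production of `grammar` (a list of (lhs, rhs))."""
    nonterminals = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)

    def first_of(symbols):
        # First set of a symbol string, and whether it can derive empty.
        out = set()
        for sym in symbols:
            if sym not in nonterminals:
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                        # iterate until nothing grows
        changed = False
        for lhs, rhs in grammar:
            f, empty = first_of(rhs)
            if empty and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for k, sym in enumerate(rhs):
                if sym in nonterminals:
                    f, empty = first_of(rhs[k + 1:])
                    new = f | (follow[lhs] if empty else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True

    result = []
    for lhs, rhs in grammar:
        f, empty = first_of(rhs)
        result.append(f | (follow[lhs] if empty else set()))
    return result

def ll1_table(grammar, start):
    """Build the (nonterminal, terminal) -> RHS table; fail on any conflict."""
    table = {}
    for (lhs, rhs), sel in zip(grammar, select_sets(grammar, start)):
        for t in sel:
            if (lhs, t) in table:
                raise ValueError(f"not LL(1): two productions for ({lhs}, {t})")
            table[(lhs, t)] = rhs
    return table

GRAMMAR = [("S", ["A"]), ("A", ["b", "A", "d"]), ("A", [])]
print(select_sets(GRAMMAR, "S"))
print(ll1_table(GRAMMAR, "S"))
```

A conflict while filling the table is exactly a pair of productions for the same nonterminal whose Select sets overlap, so `ll1_table` succeeding is the "at most one production per entry" condition.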