Bottom-Up Syntax Analysis Mooly Sagiv html://msagiv/courses/wcc03.html Textbook:Modern Compiler...
-
date post
20-Dec-2015 -
Category
Documents
-
view
228 -
download
1
Transcript of Bottom-Up Syntax Analysis Mooly Sagiv html://msagiv/courses/wcc03.html Textbook:Modern Compiler...
Bottom-Up Syntax Analysis
Mooly Sagiv
html://www.math.tau.ac.il/~msagiv/courses/wcc03.htmlTextbook:Modern Compiler Design
Chapter 2.2.5
Efficient Parsers • Pushdown automata
• Deterministic
• Report an error as soon as the input is not a prefix of a valid program
• Not usable for all context free grammars
bison
context free grammar
tokens parser
“Ambiguity errors”
parse tree
Kinds of Parsers
• Top-Down (Predictive Parsing) LL– Construct parse tree in a top-down matter– Find the leftmost derivation– For every non-terminal and token predict the next
production
• Bottom-Up LR– Construct parse tree in a bottom-up manner– Find the rightmost derivation in a reverse order– For every potential right hand side and token decide when a
production is found
Bottom-Up Syntax Analysis
• Input– A context free grammar– A stream of tokens
• Output– A syntax tree or error
• Method– Construct parse tree in a bottom-up manner– Find the rightmost derivation in (reversed order)– For every potential right hand side and token decide when a
production is found– Report an error as soon as the input is not a prefix of valid
program
Plan
• Pushdown automata
• Bottom-up parsing (informal)
• Bottom-up parsing (given a parser table)
• Constructing the parser table
• Interesting non LR grammars
Pushdown Automaton
control
parser-table
input
stack
$
$u t w
V
Informal ExampleS E$ E T | E + T T i | ( E )
i + i $
input
$
stack tree
input
+ i $i$
stack tree
i
shift
Informal ExampleS E$ E T | E + T T i | ( E )
input
+ i $i$
stack tree
ireduce T i
input
+ i $T$
stack tree
i
T
Informal ExampleS E$ E T | E + T T i | ( E )
input
+ i $T$
stack tree
ireduce E T
T
input
+ i $E$
stacktree
i
TE
Informal ExampleS E$ E T | E + T T i | ( E )
shift
input
+ i $E$
stacktree
i
TE
input
i $+E$
stacktree
i
TE
+
Informal ExampleS E$ E T | E + T T i | ( E )
shift
input
i $+E$
stacktree
i
TE
+
input
$i+E$
stacktree
i
TE
+ i
Informal ExampleS E$ E T | E + T T i | ( E )
reduce T i
input
$i+E$
stacktree
i
TE
+ i
input
$T+E$
stacktree
i
TE
+ i
T
Informal ExampleS E$ E T | E + T T i | ( E )
reduce E E + T
input
$T+E$
stacktree
i
TE
+ i
T
input
$E$
stacktree
i
TE
+ T
E
Informal ExampleS E$ E T | E + T T i | ( E )
input
$E$
stacktree
i
TE
+ T
E
shift
input
$ E
$
stacktree
i
TE
+ T
E
Informal ExampleS E$ E T | E + T T i | ( E )
input
$$ E
$
stacktree
i
TE
+ T
E
reduce Z E $
Informal Example
reduce E E + T
reduce T i
reduce E T
reduce T i
reduce Z E $
Handles• Identify the leftmost node (nonterminal) that has not
been constructed but all whose children have been constructed– A
Identifying Handles
• Create a finite state automaton over grammar symbols– Accepting states identify handles
• Report if the grammar is inadequate
• Push automaton states onto parser stack
• Use the automaton states to decide – Shift/Reduce
Example Finite State Automaton
Example Control Table
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
Example Control Table (2)
i + i $
input
s0($)
stack
shift 5stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
+ i $
inputs5 (i)
s0 ($) reduce T i
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
+ i $
inputs6 (T)
s0 ($) reduce E T
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
+ i $
input
s1(E)
s0 ($)
shift 3
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
i $
inputs3 (+)
s1(E)
s0 ($)
shift 5
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
$
inputs5 (i)
s3 (+)
s1(E)
s0($)
reduce T i
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
$
inputs4 (T)
s3 (+)
s1(E)
s0($)
reduce E E + Tstack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
$
input
s1 (E)
s0 ($)shift 2
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
inputs2 ($)
s1 (E)
s0 ($)
reduce Z E $
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
((i) $
input
s0($)shift 7
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
(i) $
input
s7(()
s0($)
shift 7
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
i) $
inputs7 (()
s7(()
s0($)
shift 5
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
) $
inputs5 (i)
s7 (()
s7(()
s0($)
reduce T i
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
) $
inputs6 (T)
s7 (()
s7(()
s0($)
reduce E T
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
) $
inputs8 (E)
s7 (()
s7(()
s0($)
shift 9
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
$
input
s9 ())
s8 (E)
s7 (()
s7(()
s0($)
reduce T ( E )
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
$
inputs6 (T)
s7(()
s0($)
reduce E T
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
$
inputs8 (E)
s7(()
s0($)
err
stack
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
Grammar Hierarchy
Non-ambiguous CFG
CLR(1)
LALR(1)
SLR(1)
LL(1)
LR(0)
Constructing LR(0) parsing table
• Add a production S’ S$• Construct a finite automaton accepting “valid
stack symbols”• States are set of items A
– The states of the automaton becomes the states of parsing-table
– Determine shift operations
– Determine goto operations
– Determine reduce operations
Example Finite State Automaton
Constructing a Finite Automaton
• NFA– For X X1 X2 … Xn
• X X1 X2 …XiXi+1 … Xn– “prefixes of rhs (handles)”– X1 X2 … Xi is at the top of the stack and we expect
Xi+1 … Xn
• The initial state S’ S$ (X X1…XiXi+1 … Xn, Xi+1 ) = X X1 …XiXi+1 … Xn
– For every production Xi+1 (X X1 X2 …XXi+1 … Xn, ) = Xi+1
• Convert into DFA
S E$ E T | E + T T i | ( E )
S E $
E T
E E + T
T i
T ( E )
E T
T
i
T i
T ( E )(
T (E )
)
E E + TE
E E + T+ E E + T TS E $
E
S E $
$
T (E )E
Example Finite State Automaton
Filling Parsing Table
• A state si
• reduce A – A si
• Shift – A t si
• Goto(si, X) = sj
– A X si
(si, X) = sj
• When conflicts occurs the grammar is not LR(0)
Example Finite State Automaton
Example Control Table
i + ( ) $ E T
0 s5 err s7 err err 1 6
1 err s3 err err s2
2 acc
3 s5 err s7 err err 4
4 reduce EE+T
5 reduce T i
6 reduce E T
7 s5 err s7 err err 8 6
8 err s3 err s9 err
9 reduce T(E)
ExampleS E $ E E + E | i
SE$EE+E
E i
0
SE$EE+E
2E
EE+ EEE+E
E i
4
+
EE+ EEE+E
5E
+
E i
i
1 S E $
$
3
i
SE$EE+E
E i
0SE$EE+E
2E
EE+ EEE+E
E i
4
+
EE+ EEE+E
5 E
+
E i
i
1 S E $
$3
ii + $ E
0 s1 err err 2
1 reduce E i
2 err s4 s3
3 accept
4 s1 5
5 red s4/red red
Interesting Non LR(0) GrammarS’ S$ S L = R | R
L *R | id
R L
S’ S$S L=R
S RL *RL idR L
S L=RR LL
=
Partial DFA
S L=RR L
L *RL id
LR(1) Parser
• Item A , t is at the top of the stack and we are expecting
t
• LR(1) State– Sets of items
• LALR(1) State– Merge items with the same look-ahead
Interesting Non LR(1) Grammars
• Ambiguous
– Arithmetic expressions
– Dangling-else
• Common derived prefix
– A B1 a b | B2 a c
– B1
– B2
• Optional non-terminals– St OptLab Ass
– OptLab id : | – Ass id := Exp
A motivating example• Create a desk calculator• Challenges
– Non trivial syntax
– Recursive expressions (semantics)• Operator precedence
Solution (lexical analysis)/* desk.l */
%%[0-9]+ { yylval = atoi(yytext);
return NUM ;}“+” { return PLUS ;}“-” { return MINUS ;}“/” { return DIV ;}“*” { return MUL ;} “(“ { return LPAR ;}“)” { return RPAR ;}“//”.*[\n] ; /* comment */[\t\n ] ; /* whitespace */. { error (“illegal symbol”, yytext[0]); }
Solution (syntax analysis)/* desk.y */
%token NUM%left PLUS, MINUS%left MUL, DIV%token LPAR, RPAR%start P%%P : E {printf(“%d\n”, $1) ;} ;E : NUM { $$ = $1 ;} | LPAR e RPAR { $$ = $2; } | e PLUS e { $$ = $1 + $3 ; } | e MINUS e { $$ = $1 - $3 ; } | e MUL e { $$ = $1 * $3; } | e DIV e {$$ = $1 / $3; } ;%%#include “lex.yy.c”
flex desk.l
bison desk.y
cc y.tab.c –ll -ly
Summary• LR is a powerful technique• Generates efficient parsers• Generation tools exit
– Bison, yacc, CUP
• But some grammars need to be tuned– Shift/Reduce conflicts– Reduce/Reduce conflicts– Efficiency of the generated parser