CSE P501 – Compilers
description
Transcript of CSE P501 – Compilers
CSE P501 – Compilers
Parsing
Context Free Grammars (CFG)
Ambiguous Grammars
Next
Spring 2014 Jim Hogg - UW - CSE - P501 C-1
Spring 2014 Jim Hogg - UW - CSE - P501 C-2
Source TargetFront End Back End
Scanchars
tokens
AST
IR
AST = Abstract Syntax TreeIR = Intermediate Representation
‘Middle End’
Optimize
Select Instructions
Parse
Convert
Allocate Registers
Emit
Machine Code
IR
IR
IR
IR
IR
Parsing
Valid Tokens != Valid Program
Spring 2014 Jim Hogg - UW - CSE - P501 C-3
So a MiniJava Scanner would happily accept the following program:
int ; = true { while ( x < true * if { or 123 ) goto count_99
We rely on a MiniJava Parser to reject this kind of gibberish
MiniJava includes the following tokens (among many others):
class int [ ( . true < this ) + * ; while = if id ilit ! / new {
But how do we specify what makes a valid MiniJava program?
What is Parsing?
Analogous to parsing an English sentence Analyze words into subject, verb, object, etc
How to parse a program? Analyze tokens into language constructs:
assignment, if-clause, function-call, while-loop, etc
Spring 2014 Jim Hogg - UW - CSE - P501 C-4
Parsing – so what’s the problem?
1. The set of valid programs is infinite2. The set of invalid programs is infinite
Q: How to specify all valid programs, succinctly?
A: Define a Grammar More specifically, a Context-Free Grammar (CFG)
Spring 2014 Jim Hogg - UW - CSE - P501 C-5
Grammar for the Hokum Language1. Prog Stm ; Prog | Stm2. Stm AsStm | IfStm 3. AsStm Var = Exp4. IfStm if Exp then AsStm5. VorC Var | Const6. Exp VorC | VorC + VorC7. Var [a-z]8. Const [0-9]
Spring 2014 Jim Hogg - UW - CSE - P501 C-6
• Context-Free Grammar ~ CFG ~ Grammar ~ Backus-Naur Form ~ BNF
• Productions, or Rules• Terminals & Non-Terminals; Start (Symbol)• Multiple languages present in the description
Context-Free Grammars (CFG)
Example Hokum Programs
Prog Stm ; Prog | StmStm AsStm | IfStm AsStm Var = ExpIfStm if Exp then AsStmVorC Var | ConstExp VorC | VorC + VorCVar [a-z]Const [0-9]
Spring 2014 Jim Hogg - UW - CSE - P501 C-7
a = 1; b = a + 4;z = 1;if b + 3 then z = 2
a = x < 20b = a + 4 + 5 ;z = 1if (a == 33) z < 2 ;
Hokum BNF Grammar
But how do we know which programs are legal or illegal, in Hokum?
Legal
Illegal
DerivationProg => Stm ; Prog=> AsStm ; Prog=> Var = Exp ; Prog=> a = Exp ; Prog=> a = VorC ; Prog=> a = Const ; Prog=> a = 1 ; Prog=> a = 1 ; Stm=> a = 1 ; IfStm=> a = 1 ; if Exp then AsStm=> a = 1 ; if VorC + VorC then AsStm=> a = 1 ; if Var + VorC then AsStm
Spring 2014 Jim Hogg - UW - CSE - P501 C-8
=> a = 1 ; if a + VorC then AsStm=> a = 1 ; if a + Const then AsStm=> a = 1 ; if a + 1 then AsStm=> a = 1 ; if a + 1 then Var = Exp=> a = 1 ; if a + 1 then b = Exp=> a = 1 ; if a + 1 then b = VorC=> a = 1 ; if a + 1 then b = Const=> a = 1 ; if a + 1 then b = 2
=> versus Leftmost, rightmost, middlemost Sentential Form & Sentence What is a Context-Sensitive
Grammar?
Prog Stm ; Prog | StmStm AsStm | IfStm AsStm Var = ExpIfStm if Exp then AsStmVorC Var | ConstExp VorC | VorC + VorCVar [a-z]Const [0-9]
Parse TreeProg
Stm
1
Spring 2014
; Prog
AsStm
= Exp
Var
a VorC
Const
Stm
IfStm
thenExp
if AsStm
+ VorCVorC
Var
a 1
Const
2
= Exp
Var
b VorC
Const
Prog Stm ; Prog | StmStm AsStm | IfStm AsStm Var = ExpIfStm if Exp then AsStmVorC Var | ConstExp VorC | VorC + VorCVar [a-z]Const [0-9]
Jim Hogg - UW - CSE - P501 C-9
Junk Nodes in the Parse TreeProg
Stm
1
Spring 2014
; Prog
AsStm
= Exp
Var
a VorC
Const
Stm
IfStm
thenExp
if AsStm
+ VorCVorC
Var
a 1
Const
2
= Exp
Var
b VorC
Const
Jim Hogg - UW - CSE - P501 C-10
AST (Abstract Syntax Tree)
Prog
Spring 2014
=
Var:a Const:1
IfStm
+
Var:a Const:1
=
Var:b Const:2
Jim Hogg - UW - CSE - P501 C-11
Why Not Just RegEx?
Spring 2014 Jim Hogg - UW - CSE - P501 C-12
v = [a-z] // variableo = + | - | | // operator
v ( o v )* // derives a + b c ok, but now add ( )
(? v (o v )? )* // derives (a + b) c// but also gibberish like: a + b) c (
Try to invent a regex description for arithmetic expressions: single-character variable names; operators + - and [Note: red
denotes terminals, below]
Almost every programming language includes such balanced pairs: ( ), { }, begin end. Conclusion: regex won’t work.
More generally, regex correspond to DFAs. They can only ‘count’ pairs up to a finite limit.
Context-Sensitive Grammar? All compiler work uses Context-Free Grammars, or CFGs
Why so-called? Alternatively: What is a non-context-free grammar? (ie, a Context-Sensitive
Grammar)
Suppose production B CFG: we can replace B by , no matter what eg: B =>
CSG: we can replace B by only in certain contexts. Ie, only when B is preceded and/or followed by certain strings
eg: c B d
Spring 2014 Jim Hogg - UW - CSE - P501 C-13
Language Kind ProductionsRegular B and B CContext-Free B Context-Sensitive B ' '
Example CSG
1. S a b c 2. | a S B c3. c B W B4. W B W X5. W X B X6. B X B c7. b B b b
Spring 2014 Jim Hogg - UW - CSE - P501 C-14
The following CSG generates the language an bn cn for n >= 1
Note: CSGs will not be discussed further, nor examined, as part of P501
Spring 2014 Jim Hogg - UW - CSE - P501 C-15
Parsing
The syntax of most programming languages can be specified by a Context-Free Grammar or CFG
Parsing = "How to fill the gap between Start Symbol and Sentence"
L(G) = the language generated by G = the set of sentences generated by G
Parsing: Given G and a sentence w in L(G ), construct the derivation, (parse tree) for w in some order
As we parse, do something useful at each node in the tree
Spring 2014 Jim Hogg - UW - CSE - P501 C-16
"in some order"
Top-down Start with the root Traverse the parse tree depth-first; scan tokens, Left-
to-right; create a Leftmost derivation LL(k)
Bottom-up Start at leaves and build up to the root; scan tokens,
Left-to-right; create a Rightmost derivation (in reverse) LR(k) and subsets: LALR(k) and SLR(k)
Spring 2014 Jim Hogg - UW - CSE - P501 C-17
"do something useful at each node"
Perform some semantic action:
Construct nodes of full parse tree (rare)
Construct abstract syntax tree (common)
Construct linear, lower-level representation like assembler code
Generate target code on the fly 1-pass compiler not common in production compilers: poor code quality
Spring 2014 Jim Hogg - UW - CSE - P501 C-18
Context-Free Grammars – Formal Description
A grammar G is a tuple <N, T, P, S> where
N a finite set of Non-terminal symbols
T a finite set of Terminal symbols
P a finite set of Productions A subset of N × (N T) *
S the start symbol, a distinguished element of N If not specified otherwise, this is taken as the Non-
Terminal on the LHS of the first production
Spring 2014 Jim Hogg - UW - CSE - P501 C-19
Standard Notations
a, b, c element of T A, B, C element of N w, x, y, z elements of T* X, Y, Z element of N T , , elements of (N T)*
(A, ) P => A
Spring 2014 Jim Hogg - UW - CSE - P501 C-20
Derivation Relations (1)
if B P then B => simply affirms G is context-free
A =>* denotes there is a chain of zero-or-more
productions, starting with A, that generates transitive closure
Spring 2014 Jim Hogg - UW - CSE - P501 C-21
Derivation Relations (2)
if B P then w B =>lm w derives leftmost prefix of A is all terminals (by construction)
if B P then B w =>rm w derives rightmost prefix of A may include terminals and non-terminals
We will only be interested in leftmost and rightmost derivations – not random orderings
Spring 2014 Jim Hogg - UW - CSE - P501 C-22
Languages
All the sentences (strings of Terminals) I can generate from NonTerminal A:
For A N, L(A) = { w | A =>* w }
All the sentences (strings of Terminal) I can generate from start symbol S:
If S is the start symbol of grammar G, define L(G ) = L(S)
Spring 2014 Jim Hogg - UW - CSE - P501 C-23
Reduced Grammars
Grammar G is reduced iff for every productionA in P there is some derivation
S =>* x A z => x z =>* xyz ie, no production is useless
Convention: we will use only reduced grammars
Spring 2014 Jim Hogg - UW - CSE - P501 C-24
Grammar G is unambiguous iff every w in L(G ) has a unique leftmost (or rightmost) derivation
Fact: unique leftmost or unique rightmost implies the other
A grammar lacking this property is ambiguous Note: other grammars that generate the same language
may be unambiguous So, "ambiguous" applies to a grammar – not a language
We need unambiguous grammars for parsing (well mostly: see later)
Ambiguous Grammars
Spring 2014 Jim Hogg - UW - CSE - P501 C-25
Example: Ambiguous Grammar
Exp Exp Op Exp | DigOp + | - | * | /Dig 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Exercise: show that this is ambiguous
How? Show two different leftmost or rightmost derivations for the same string
Equivalently: show two different parse trees for the same string
Spring 2014
2+3*4 – part 1
Give a leftmost derivation of 2+3*4; show the parse tree
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Exp
Exp
Jim Hogg - UW - CSE - P501 C-26
Spring 2014
2+3*4 – part 2
Exp
Exp
Exp
+
Exp => Exp + Exp
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Jim Hogg - UW - CSE - P501 C-27
Spring 2014
2+3*4 – part 3
Exp
Exp
Exp
ExpExp => Exp + Exp => Dig + Exp
Dig
+
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Jim Hogg - UW - CSE - P501 C-28
Spring 2014
2+3*4 – part 4
Exp
Exp
Exp
ExpExp => Exp + Exp => Dig + Exp => 2 + Exp
Dig
2 +
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
C-29Jim Hogg - UW - CSE - P501
Spring 2014
2+3*4 – part 5
Exp
Exp
Exp
ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp
Dig
2 +
Exp ::= Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig ::= [0-9]
Exp Exp
*C-30Jim Hogg - UW - CSE - P501
Spring 2014
2+3*4 – part 6
Exp
Exp
Exp
Dig
2 +
ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp Exp Ex
p
*
Dig
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
C-31Jim Hogg - UW - CSE - P501
Spring 2014
2+3*4 – part 7
ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp => 2 + 3 * Exp
Exp
Exp
Exp
Dig
2 +
Exp Exp
*
Dig
3
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
C-32Jim Hogg - UW - CSE - P501
Spring 2014
2+3*4 – part 8
ExpExp => Exp + Exp => Dig + Exp => 2 + Exp => 2 + Exp * Exp => 2 + Dig * Exp => 2 + 3 * Exp => 2 + 3 * Dig
Exp
Exp
Exp
Dig
2 +
Exp Exp
*
Dig
3
Dig
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
C-33Jim Hogg - UW - CSE - P501
2+3*4 – part 9
Exp
Exp
Exp
ExpExp => Exp Op Exp => Dig Op Exp => 2 Op Exp => 2 + Exp => 2 + Exp Op Exp => 2 + Dig Op Exp => 2 + 3 Op Exp => 2 + 3 * Exp => 2 + 3 * Dig => 2 + 3 * 4
Dig
2 +
Exp Exp
Op
Dig
3 *
Dig
4
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Spring 2014 C-34Jim Hogg - UW - CSE - P501
2+3*4 – part 10
Give a different leftmost derivation of 2+3*4
Exp
Exp
Exp
Exp => Exp * Exp => Exp + Exp * Exp => 2 + Exp * Exp => 2 + 3 * Exp => 2 + 3 * 4 Dig
4*
Exp Exp
Dig
2 +
Dig
3
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Spring 2014 C-35Jim Hogg - UW - CSE - P501
Are derivations equivalent?
+
2 3
4
* +
2
3 4
*
Result = 2 + (3 * 4) = 14Result = (2 + 3) * 4 = 20
Spring 2014 Jim Hogg - UW - CSE - P501 C-36
Spring 2014 Jim Hogg - UW - CSE - P501 C-37
Another example
Give two different derivations of 5 – 6 – 7
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Spring 2014 Jim Hogg - UW - CSE - P501 C-38
Another example
Give two different rightmost derivations of 5 – 6 – 7
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Exp => Exp - Exp => Exp - Exp - Exp => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7
-
-5
6 7
-
-
5 6
7
result = 6
result = -8
Exp => Exp - Exp => Exp - 7 => Exp - Exp - 7 => Exp - 6 - 7 => 5 - 6 - 7
Spring 2014 Jim Hogg - UW - CSE - P501 C-39
Another example
Give two different leftmost derivations of 5 – 6 – 7
Exp Exp + Exp | Exp – Exp | Exp * Exp
| Exp / Exp | Dig Dig [0-9]
Exp => Exp - Exp => 5 - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7
-
-5
6 7
-
-
5 6
7
result = 6
result = -8 Exp => Exp - Exp
=> Exp - Exp - Exp => 5 - Exp - Exp => 5 - 6 - Exp => 5 - 6 - 7
Spring 2014 Jim Hogg - UW - CSE - P501 C-40
What went wrong?
Grammar did not capture precedence or associativity Eg: 2 + (3 * 4) = 14 versus (2 + 3) * 4 =
20 Eg: 5 - (6 - 7) = 6 versus (5 - 6) - 7 = -8
Solution Create a non-terminal for each level of precedence Isolate the corresponding part of the grammar Force the parser to recognize higher precedence
sub-expressions first
Spring 2014 Jim Hogg - UW - CSE - P501 C-41
Classic Expression Grammar
exp exp + term | exp – term | termterm term * factor | term / factor |
factorfactor int | ( exp )int 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
E E + T | E – T | TT T * F | T / F | FF ( E ) | DD [0-9]
Spring 2014 Jim Hogg - UW - CSE - P501 C-42
Derive 2 + 3 * 4 E E + T | E – T | TT T * F | T / F | FF ( E ) | DD [0-9]
E => E + T => E + T * F => E + T * D => E + T * 4 => E + F * 4 => E + D * 4 => E + 3 * 4 => T + 3 * 4 => F + 3 * 4 => D + 3 * 4 => 2 + 3 * 4
+
2
3 4
*
Result = 2 + (3 * 4) = 14
This grammar yields the correct, expected (school algebra) result
Spring 2014 Jim Hogg - UW - CSE - P501 C-43
Derive 5 - 6 - 7
E => E - T => E - F => E - D => E - 7 => E - T - 7 => E - F - 7 => E - D - 7 => E - 6 - 7 => F - 6 - 7 => D - 6 - 7 => 5 - 6 - 7
E E + T | E – T | TT T * F | T / F | FF ( E ) | DD [0-9]
-
-
5 6
7
result = -8
• This grammar yields the correct, expected (school algebra) result• Note how left-recursive rules yield left-associativity
Spring 2014 Jim Hogg - UW - CSE - P501 C-44
Classic Example of Ambiguous Grammar
Grammar for conditional statements
stm if ( cond ) stm | if ( cond ) stm else stm
Exercise: show that this is ambiguous How?
“The Dangling Else” - a 'weakness' in C, Pascal, etc
Spring 2014 Jim Hogg - UW - CSE - P501 C-45
Two Derivations
if (cond) if (cond) stm else stm
stm if ( cond ) stm | if ( cond ) stm else stm
stm
if stm( cond
)
if stm( cond
) else stm
stm
if stm( cond
)
if stm( cond
) else stm
Spring 2014 Jim Hogg - UW - CSE - P501 C-46
Solving the Dangling Else
Fix the grammar to separate if statements with else clause from those without
Done in Java reference grammar Adds lots of non-terminals
Use some ad-hoc rule in parser “else matches closest unpaired if”
Change the language Only possible if you 'own' the language
Resolving Ambiguity with Grammar (1)
Stm IfElse | IfNoElse IfElse if ( Exp ) IfElse else IfElse IfNoElse if ( Exp ) Stm | if ( Exp ) IfElse else IfNoElse
formal, no additional rules beyond syntax
sometimes obscures original grammar
Spring 2014 Jim Hogg - UW - CSE - P501 C-47
Resolving Ambiguity with Grammar (2)
If you can (re-)design the language, avoid the problem entirely
IfStm if Exp then Stm end | if Exp then Stm else Stm end
formal, clear, elegant allows sequence of Stms in then and else branches, no
{ } needed extra end required for every if
Spring 2014 Jim Hogg - UW - CSE - P501 C-48
Spring 2014 Jim Hogg - UW - CSE - P501 C-49
Parser Tools and Operators
Most parser tools cope with ambiguous grammars Earlier productions chosen before later ones Longest match used if there is a choice Makes life simpler if used with discipline But be sure the tool does what you really want
Specify operator precedence & associativity Allows simpler, ambiguous grammar with fewer non-
terminals Used in CUP
Spring 2014 Jim Hogg - UW - CSE - P501 C-50
Next LR (bottom-up / shift-reduce) parsing
Reading Continue Cooper&Torczon chapter 3
Note Note: LR parsing is the toughest session in
P501
Next