Parsing
Derivations
A string is valid in a language if and only if there exists a derivation from the start symbol that produces it.
Begin with the start symbol and apply grammar rules until you produce the string.
Note that the final string (sentence) consists only of terminals.
Question
Given a formal grammar G and a sentence (program) p, is p derivable from G?
Or equivalently, is a given program p valid according to some language’s syntax (say C)?
Example: Context-Free Grammar
// derivable?
xum
xuwz
xwu
xuz
S ::= x A
| y B
A ::= u C
| v C
B ::= t
C ::= w
| z
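As a worked check of the four candidate strings, only xuz has a derivation from S. The steps below apply S ::= x A, then A ::= u C, then C ::= z:

```latex
S \Rightarrow x\,A \Rightarrow x\,u\,C \Rightarrow x\,u\,z
```

The other three fail for different reasons: in xum the last symbol m cannot be produced by C (which yields only w or z), xuwz has a fourth symbol although every sentence that starts with x has exactly three, and in xwu the symbol after x is w, while A must begin with u or v.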
Lexical Analyzer
The lexical analyzer translates the source program into a stream of lexical tokens.
Source program: a stream of (ASCII or Unicode) characters.
Lexical token: a compiler data structure that represents the occurrence of a terminal symbol.
A valid sentence consists of only allowable terminals.
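A minimal sketch of such a lexical analyzer for the toy grammar's terminals, written in C. The names TokenKind, Token, Lexer, lexerInit and the exact shape of nextToken are assumptions made for illustration, as is the choice to skip blanks between tokens; the slides only name a nextToken function without defining it.

```c
#include <assert.h>
#include <stddef.h>

/* One token kind per terminal of the toy grammar, plus end of input
   and an error kind for characters outside the alphabet.
   (All names here are illustrative, not taken from the slides.) */
typedef enum {
    TOK_X, TOK_Y, TOK_U, TOK_V, TOK_T, TOK_W, TOK_Z,
    TOK_EOF, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    size_t pos;          /* offset of the token in the source text */
} Token;

typedef struct {
    const char *src;     /* the source program: a stream of characters */
    size_t pos;          /* index of the next unread character */
} Lexer;

void lexerInit(Lexer *lx, const char *src) {
    assert(lx != NULL && src != NULL);
    lx->src = src;
    lx->pos = 0;
}

/* Return the next token; at end of input, keep returning TOK_EOF. */
Token nextToken(Lexer *lx) {
    Token tok;
    while (lx->src[lx->pos] == ' ')   /* assumption: blanks separate tokens */
        lx->pos++;
    tok.pos = lx->pos;
    char c = lx->src[lx->pos];
    if (c == '\0') {
        tok.kind = TOK_EOF;           /* do not advance past the terminator */
        return tok;
    }
    lx->pos++;
    switch (c) {
    case 'x': tok.kind = TOK_X; break;
    case 'y': tok.kind = TOK_Y; break;
    case 'u': tok.kind = TOK_U; break;
    case 'v': tok.kind = TOK_V; break;
    case 't': tok.kind = TOK_T; break;
    case 'w': tok.kind = TOK_W; break;
    case 'z': tok.kind = TOK_Z; break;
    default:  tok.kind = TOK_ERROR; break;   /* not an allowable terminal */
    }
    return tok;
}
```

Calling nextToken repeatedly on "x u z" would yield TOK_X, TOK_U, TOK_Z and then TOK_EOF, while the m in "xum" comes back as TOK_ERROR because it is not an allowable terminal.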
Example: Context-Free Grammar
// all terminals
T={x, y, u, v, t, w, z}
// allowable strings: elements of T*
S ::= x A
| y B
A ::= u C
| v C
B ::= t
C ::= w
| z
Predictive Parsing
Parsing: recognizing a string and doing something useful with it.
The most naïve approach to implementing a parser is recursive descent.
A form of top-down parsing.
Not as powerful as other methods, but easy enough to implement by hand.
Predictive Parsing
// Valid?
xum
xuwz
xwu
xuz
S ::= x A
| y B
A ::= u C
| v C
B ::= t
C ::= w
| z
A Predictive Parser in C (Sketch)
tokenTy token;
void parseS () {
  switch (token.kind) {
  case x:
    token = nextToken ();
    parseA ();
    break;
  case y:
    token = nextToken ();
    parseB ();
    break;
  default:
    error (…);
  }
}
// other functions are similar
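To make "other functions are similar" concrete, here is one possible complete recognizer for the toy grammar. It is a sketch under simplifying assumptions: every character of the input counts as one token (so no separate lexer is needed), error sets a flag instead of aborting, and the names derivable, peek and advance are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *input;   /* remaining input; one character = one token */
static bool ok;             /* cleared on the first syntax error */

static char peek(void)    { return *input; }
static void advance(void) { if (*input != '\0') input++; }
static void error(void)   { ok = false; }

/* C ::= w | z */
static void parseC(void) {
    switch (peek()) {
    case 'w': case 'z': advance(); break;
    default: error();
    }
}

/* B ::= t */
static void parseB(void) {
    if (peek() == 't') advance();
    else error();
}

/* A ::= u C | v C */
static void parseA(void) {
    switch (peek()) {
    case 'u': case 'v': advance(); parseC(); break;
    default: error();
    }
}

/* S ::= x A | y B   (the current token selects the alternative) */
static void parseS(void) {
    switch (peek()) {
    case 'x': advance(); parseA(); break;
    case 'y': advance(); parseB(); break;
    default: error();
    }
}

/* True if s is a sentence of the grammar: parseS must succeed
   and must also have consumed the whole input. */
bool derivable(const char *s) {
    assert(s != NULL);
    input = s;
    ok = true;
    parseS();
    return ok && peek() == '\0';
}
```

Under these assumptions derivable("xuz") holds, while derivable("xum"), derivable("xuwz") and derivable("xwu") do not. The final end-of-input check matters: without it, xuwz would be accepted because its prefix xuw is a valid sentence.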
Output: Abstract Syntax Tree
Input: xuz
S
  x
  A
    u
    C
      z
A Predictive Parser Emitting AST in C (Sketch)
tokenTy token;
S parseS () {
  A a;  // subtree returned by parseA
  B b;  // subtree returned by parseB
  switch (token.kind) {
  case x:
    token = nextToken ();
    a = parseA ();
    return newS1 (x, a);
  case y:
    token = nextToken ();
    b = parseB ();
    return newS2 (y, b);
  default:
    error (…);  // assumed not to return
  }
}
// other functions are similar
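The sketch above can be fleshed out into a small tree-building parser. The version below is only one way to do it and departs from the slide in labeled ways: it uses a single hypothetical Node type in place of per-rule constructors such as newS1 and newS2, treats each input character as a token, and returns NULL on a syntax error. The names Node, newNode, leaf, parse, printTree and freeTree are all invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    char label;           /* 'S', 'A', 'B', 'C', or a terminal character */
    struct Node *kid[2];  /* every rule here has at most two symbols */
} Node;

static Node *newNode(char label, Node *k0, Node *k1) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->label = label;
    n->kid[0] = k0;
    n->kid[1] = k1;
    return n;
}

static Node *leaf(char c) { return newNode(c, NULL, NULL); }

void freeTree(Node *n) {
    if (n == NULL) return;
    freeTree(n->kid[0]);
    freeTree(n->kid[1]);
    free(n);
}

static const char *input;  /* remaining input; one character = one token */

/* C ::= w | z */
static Node *parseC(void) {
    if (*input != 'w' && *input != 'z') return NULL;
    char c = *input++;
    return newNode('C', leaf(c), NULL);
}

/* B ::= t */
static Node *parseB(void) {
    if (*input != 't') return NULL;
    input++;
    return newNode('B', leaf('t'), NULL);
}

/* A ::= u C | v C */
static Node *parseA(void) {
    if (*input != 'u' && *input != 'v') return NULL;
    char c = *input++;
    Node *kid = parseC();
    return kid ? newNode('A', leaf(c), kid) : NULL;
}

/* S ::= x A | y B */
static Node *parseS(void) {
    Node *kid;
    switch (*input) {
    case 'x': input++; kid = parseA(); return kid ? newNode('S', leaf('x'), kid) : NULL;
    case 'y': input++; kid = parseB(); return kid ? newNode('S', leaf('y'), kid) : NULL;
    default:  return NULL;
    }
}

/* Parse a whole string; the caller owns the tree and frees it with freeTree. */
Node *parse(const char *s) {
    assert(s != NULL);
    input = s;
    Node *tree = parseS();
    if (tree != NULL && *input != '\0') {  /* leftover input: reject */
        freeTree(tree);
        tree = NULL;
    }
    return tree;
}

/* Print one label per line, indented two spaces per tree level. */
void printTree(const Node *n, int depth) {
    if (n == NULL) return;
    printf("%*s%c\n", depth * 2, "", n->label);
    printTree(n->kid[0], depth + 1);
    printTree(n->kid[1], depth + 1);
}
```

For the input xuz, parse returns a tree with S at the root, children x and A, then u and C under A, and z under C; printTree(tree, 0) prints that shape as an indented outline.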
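Predictive Parsing Difficulties
Both alternatives for S below begin with x, so the current token alone cannot tell the parser which rule to apply.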
// derivable?
xuz
S ::= x A
| x B
A ::= u C
| v C
B ::= t
C ::= w
| z
Or Even Worse
1 E ::= id
2 | num
3 | E + E
4 | E * E
5 | ( E )
// derivation of 15*(3+4)
E
By 4 => E * E
By 5, 3 => E * (E + E)
By 2 => E * (E + 4)
By 2 => E * (3 + 4)
By 2 => 15 * (3 + 4)
Or Even Worse
15*(3+4)
Rightmost derivation:
E
E * E
E * (E + E)
E * (E + 4)
E * (3 + 4)
15 * (3 + 4)
Leftmost derivation:
E
E * E
15 * E
15 * (E + E)
15 * (3 + E)
15 * (3 + 4)
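Ambiguous grammars
A grammar is ambiguous if there is a sentence with more than one parse tree.
Example: 15 * 3 + 4 has two parse trees.
Tree 1: E => E * E, where the left E is 15 and the right E is E + E (3 and 4).
Tree 2: E => E + E, where the left E is E * E (15 and 3) and the right E is 4.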
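The two trees are not just different drawings. If each tree is evaluated bottom-up, they give different results for the same sentence, which is why a compiler cannot leave the choice open:

```latex
\begin{align*}
\text{root } * :&\quad 15 * (3 + 4) = 15 * 7 = 105 \\
\text{root } + :&\quad (15 * 3) + 4 = 45 + 4 = 49
\end{align*}
```

The second reading, with + at the root, is the one that matches the usual convention that * binds tighter than +.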
Eliminating ambiguity
In programming language syntax, ambiguity often arises from missing operator precedence or associativity.
Does * have higher precedence than +?
Are * and + left associative?
Can sometimes rewrite the grammar to disambiguate this.
Beyond the scope of this course.
Unambiguous Grammar
Original (ambiguous) grammar:
E ::= id
| num
| E + E
| E * E
| ( E )
Rewritten grammar:
E ::= E + T
| T
T ::= T * F
| F
F ::= id
| num
| ( E )
Accepts the same language, but parses unambiguously.
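To see how the E/T/F layering encodes precedence and associativity, here is a small hand-written evaluator in C. It is a sketch with several assumptions that are not in the slides: the left-recursive rules E ::= E + T and T ::= T * F are implemented as loops (a recursive-descent routine that called itself first would never consume input), the id alternative of F is left out because an identifier has no value here, there is no overflow checking, and the names evaluate, parseE, parseT and parseF are invented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static const char *p;   /* cursor into the expression text */
static bool failed;     /* set on the first syntax error */

static void skipSpaces(void) { while (*p == ' ') p++; }

static long parseE(void);

/* F ::= num | ( E )      (id omitted in this sketch) */
static long parseF(void) {
    skipSpaces();
    if (*p == '(') {
        p++;
        long v = parseE();
        skipSpaces();
        if (*p == ')') p++;
        else failed = true;
        return v;
    }
    if (isdigit((unsigned char)*p)) {
        long v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        return v;
    }
    failed = true;
    return 0;
}

/* T ::= T * F | F    realized as: F { * F }, folding to the left */
static long parseT(void) {
    long v = parseF();
    for (skipSpaces(); *p == '*'; skipSpaces()) {
        p++;
        v *= parseF();
    }
    return v;
}

/* E ::= E + T | T    realized as: T { + T }, folding to the left */
static long parseE(void) {
    long v = parseT();
    for (skipSpaces(); *p == '+'; skipSpaces()) {
        p++;
        v += parseT();
    }
    return v;
}

/* Evaluate text; returns false on a syntax error or leftover input. */
bool evaluate(const char *text, long *result) {
    assert(text != NULL && result != NULL);
    p = text;
    failed = false;
    long v = parseE();
    skipSpaces();
    if (failed || *p != '\0') return false;
    *result = v;
    return true;
}
```

Because parseE only ever sees complete T values, a product is always finished before it takes part in a sum: evaluate("15 * 3 + 4", &r) stores 49 in r, and evaluate("15 * (3 + 4)", &r) stores 105. The ambiguous grammar gives no such guarantee.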
Limitations with Predictive Parsing
The grammar has to be rewritten to resolve ambiguity.
The resulting grammars and trees are ugly.
But… it is easy to write the code by hand, and very good for error reporting.
Doing better
We can do better.
We can use a parsing algorithm that can handle all deterministic context-free languages (though not all context-free grammars).
Remember: a context-free language might have many different context-free grammars.
The Yacc Tool
semantic analyzer specification -> Yacc -> parser
Originally developed for C, and now almost every mainstream language has its own Yacc-like tool:
bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …
Whole Structure
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium