ML-YACC

Post on 04-Jan-2016


David Walker

COS 320

Outline

• Last Week
  – Introduction to Lexing, CFGs, and Parsing

• Today
  – More parsing: automatic parser generation via ML-Yacc
  – Reading: Chapter 3 of Appel

Parser Implementation

• Implementation Options:
  1. Write a parser from scratch
     – not as boring as writing a lexer, but not exactly a weekend in the Bahamas
  2. Use a parser generator
     – very general and robust; sometimes not quite as efficient as hand-written parsers. Nevertheless, good for lazy compiler writers.

[Diagram: Parser Specification -> parser generator -> Parser; stream of tokens -> Parser -> abstract syntax]

ML-Yacc specification

• three parts:

User Declarations: declare values available in the rule actions

%%

ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts

%%

Rules: the parser is specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc declarations (preliminaries)

• specify type of positions
    %pos int * int

• specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op

• specify end-of-parse token
    %eop EOF

• specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

Simple ML-Yacc Example

%%

%term NUM | PLUS | MUL | LPAR | RPAR
%nonterm exp | fact | base

%pos int
%start exp
%eop EOF

%%

exp : fact ()
    | fact PLUS exp ()

fact : base ()
     | base MUL fact ()

base : NUM ()
     | LPAR exp RPAR ()

• the %term and %nonterm lines: grammar symbols
• the rules after the second %%: grammar rules
• the () after each production: semantic actions (currently do nothing)

attribute-grammars

• ML-Yacc uses an attribute-grammar scheme
  – each nonterminal may have a semantic value associated with it
  – when the parser reduces with (X ::= s)
    • a semantic action will be executed
    • it uses semantic values from symbols in s
  – when parsing is completed successfully
    • the parser returns the semantic value associated with the start symbol
    • usually a parse tree

attribute-grammars

• semantic actions typically build the abstract syntax for the internal language

• to use semantic values during parsing, we must declare symbol types:
  – %term NUM of int | PLUS | MUL | ...
  – %nonterm exp of int | fact of int | base of int

• type of semantic action must match type declared for LHS nonterminal in rule

ML-Yacc with Semantic Actions

%%

%term NUM of int | PLUS | MUL | LPAR | RPAR
%nonterm exp of int | fact of int | base of int

%pos int
%start exp
%eop EOF

%%

exp : fact (fact)
    | fact PLUS exp (fact + exp)

fact : base (base)
     | base MUL base (base1 * base2)

base : NUM (NUM)
     | LPAR exp RPAR (exp)

• the %term and %nonterm lines: grammar symbols with type declarations
• the rules after the second %%: grammar rules with semantic actions
• the actions compute an integer result
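The integer-valued actions can be imitated in a few lines of Python. This is only a sketch under stated assumptions: ML-Yacc generates a table-driven LR parser, whereas the hypothetical functions `parse_exp`, `parse_fact` and `parse_base` below are hand-written recursive descent, one per nonterminal, and the `(kind, value)` token encoding is invented for the illustration. Each function's return value plays the role of the semantic value of its nonterminal.

```python
# Sketch only: recursive descent standing in for the generated LR parser.
# Tokens are (kind, value) pairs; this encoding is an assumption.
def tokenize(text):
    kinds = {"+": "PLUS", "*": "MUL", "(": "LPAR", ")": "RPAR"}
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            i = j
        elif c in kinds:
            tokens.append((kinds[c], None))
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r}")
    tokens.append(("EOF", None))
    return tokens


def parse_exp(tokens, pos):
    # exp : fact (fact) | fact PLUS exp (fact + exp)
    fact, pos = parse_fact(tokens, pos)
    if tokens[pos][0] == "PLUS":
        exp, pos = parse_exp(tokens, pos + 1)
        return fact + exp, pos
    return fact, pos


def parse_fact(tokens, pos):
    # fact : base (base) | base MUL base (base1 * base2)
    base1, pos = parse_base(tokens, pos)
    if tokens[pos][0] == "MUL":
        base2, pos = parse_base(tokens, pos + 1)
        return base1 * base2, pos
    return base1, pos


def parse_base(tokens, pos):
    # base : NUM (NUM) | LPAR exp RPAR (exp)
    kind, value = tokens[pos]
    if kind == "NUM":
        return value, pos + 1
    if kind == "LPAR":
        exp, pos = parse_exp(tokens, pos + 1)
        if tokens[pos][0] != "RPAR":
            raise SyntaxError("expected RPAR")
        return exp, pos + 1
    raise SyntaxError(f"unexpected token {kind}")


def evaluate(text):
    tokens = tokenize(text)
    value, pos = parse_exp(tokens, 0)
    if tokens[pos][0] != "EOF":
        raise SyntaxError("expected EOF")
    return value
```

For instance, `evaluate("1 + 2 * 3")` multiplies before adding because MUL is handled one level below PLUS in the grammar.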

ML-Yacc with Semantic Actions

datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

%%
...
%%

exp : fact (fact)
    | fact PLUS exp (Add (fact, exp))

fact : base (base)
     | base MUL fact (Mul (base, fact))

base : NUM (Int NUM)
     | LPAR exp RPAR (exp)

• the actions now compute abstract syntax

A simpler grammar

datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

%%
...
%%

exp : NUM (Int NUM)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | LPAR exp RPAR (exp)

why don’t we just use this simpler grammar?

this grammar is ambiguous!

[Figure: two parse trees for NUM + NUM * NUM. One has * at the root, grouping (NUM + NUM) * NUM; the other has + at the root, grouping NUM + (NUM * NUM).]
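To see why the ambiguity matters, the two trees can be written out and evaluated. The sketch below mirrors the `exp` datatype with Python dataclasses; the `evaluate` function and the sample values 1, 2, 3 are assumptions made for illustration and are not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def evaluate(e):
    """Evaluate an abstract syntax tree built from Int, Add and Mul."""
    if isinstance(e, Int):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Mul):
        return evaluate(e.left) * evaluate(e.right)
    raise TypeError(f"not an expression: {e!r}")


# The two trees the ambiguous grammar allows for 1 + 2 * 3.
plus_at_root = Add(Int(1), Mul(Int(2), Int(3)))   # 1 + (2 * 3)
times_at_root = Mul(Add(Int(1), Int(2)), Int(3))  # (1 + 2) * 3
```

The same token sequence yields two different trees with two different values, so the parser needs a rule for choosing between them.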

But it is so clean that it would be nice to use. Moreover, we know which parse tree we want. We just need a mechanism to specify it!

Recall how LR parsing works:

exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

Input from lexer: NUM + NUM * NUM

Desired parse tree: NUM + (NUM * NUM), with + at the root

We have a shift-reduce conflict. What should we do to get the right parse?

State of parse so far    Yet to read    Next action
E + E                    * NUM          conflict; to get the right parse: SHIFT
E + E *                  NUM            SHIFT
E + E * NUM                             REDUCE (exp ::= NUM)
E + E * E                               REDUCE (exp ::= exp MUL exp)
E + E                                   REDUCE (exp ::= exp PLUS exp)
E                                       done

The alternative parse

exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

Input from lexer: NUM + NUM * NUM

We have a shift-reduce conflict. Suppose we REDUCE next.

State of parse so far    Yet to read    Next action
E + E                    * NUM          REDUCE (exp ::= exp PLUS exp)
E                        * NUM          SHIFT, SHIFT, REDUCE (exp ::= NUM)
E * E                                   REDUCE (exp ::= exp MUL exp)
E                                       done

Resulting parse tree: (NUM + NUM) * NUM, with * at the root

Summary

exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

Input from lexer: NUM + NUM * NUM
State of parse so far: E + E
Yet to read: * NUM
Desired parse tree: NUM + (NUM * NUM)

We have a shift-reduce conflict. We have E + E on the stack and we see *. We want to shift. We ALWAYS want to shift, since * has higher precedence than +
==> symbols to the right on the stack get processed first
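The two action sequences walked through above can be replayed mechanically. The following is a minimal sketch, not ML-Yacc's machinery: the hypothetical `replay` function takes the token list and an explicit list of "shift" / "reduce" actions, keeps a stack, and returns the grouping it ends up with as a parenthesised string.

```python
def replay(tokens, actions):
    """Apply 'shift'/'reduce' actions to a stack and return the final grouping."""
    stack, remaining = [], list(tokens)
    for action in actions:
        if action == "shift":
            stack.append(remaining.pop(0))
        elif action == "reduce" and stack[-1] == "NUM":
            # exp ::= NUM
            stack[-1] = ("E", "NUM")
        elif action == "reduce":
            # exp ::= exp PLUS exp  or  exp ::= exp MUL exp
            (_, left), op, (_, right) = stack[-3:]
            stack[-3:] = [("E", f"({left} {op} {right})")]
        else:
            raise ValueError(f"unknown action {action!r}")
    assert not remaining and len(stack) == 1
    return stack[0][1]


tokens = ["NUM", "+", "NUM", "*", "NUM"]

# Both parses agree until E + E is on the stack and * is the next token.
prefix = ["shift", "reduce", "shift", "shift", "reduce"]
shift_first = prefix + ["shift", "shift", "reduce", "reduce", "reduce"]
reduce_first = prefix + ["reduce", "shift", "shift", "reduce", "reduce"]
```

Replaying `shift_first` groups the multiplication first, while `reduce_first` groups the addition first; the only difference is the single decision taken at the conflict.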

Example 2

exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR

Input from lexer: NUM - NUM - NUM

We have a shift-reduce conflict. We have E - E on the stack and we see -. We want “-” to be a left-associative operator, ie: NUM - NUM - NUM == ((NUM - NUM) - NUM). What do we do?

State of parse so far    Yet to read    Next action
E - E                    - NUM          conflict: REDUCE (exp ::= exp MINUS exp)
E                        - NUM          SHIFT, SHIFT, REDUCE (exp ::= NUM)
E - E                                   REDUCE (exp ::= exp MINUS exp)
E                                       done

Example 2: Summary

exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR

Input from lexer: NUM - NUM - NUM
Resulting parse tree: ((NUM - NUM) - NUM)

We have a shift-reduce conflict. We have E - E on the stack and we see -. What do we do? REDUCE. We ALWAYS want to reduce, since - is left-associative.

precedence and associativity

• three solutions to dealing with operator precedence and associativity:
  1) let Yacc complain
     • its default choice is to shift when it encounters a shift-reduce conflict
     • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
  2) rewrite the grammar to eliminate ambiguity
     • can be complicated and less clear
  3) use Yacc precedence directives
     • %left, %right, %nonassoc

precedence and associativity

• given directives, ML-Yacc assigns precedence to each terminal and rule
  – precedence of a terminal is based on the order in which associativity is specified (terminals in later directives get higher precedence)
  – precedence of a rule is the precedence of its right-most terminal
    • eg: precedence of (E ::= E + E) == prec(+)

• a shift-reduce conflict is resolved as follows
  – prec(terminal) > prec(rule) ==> shift
  – prec(terminal) < prec(rule) ==> reduce
  – prec(terminal) = prec(rule) ==>
    • assoc(terminal) = left ==> reduce
    • assoc(terminal) = right ==> shift
    • assoc(terminal) = nonassoc ==> report as error

[Diagram: the RHS of a rule, ... E % E, is on the stack; the next input terminal is T, with T E ... yet to read]
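The resolution rules can be written down as a small decision function. This is a sketch under assumptions, not ML-Yacc's implementation: the names `build_table`, `rule_level` and `resolve` are hypothetical, directives are passed as `(assoc, terminals)` pairs in declaration order, and terminals are recognised by being spelled in upper case.

```python
def build_table(directives):
    """Map terminal -> (level, assoc). Later directives get higher levels."""
    table = {}
    for level, (assoc, terminals) in enumerate(directives, start=1):
        for terminal in terminals:
            table[terminal] = (level, assoc)
    return table


def rule_level(rhs, table):
    """Level of a rule: that of its right-most terminal, or None if undeclared."""
    for symbol in reversed(rhs):
        if symbol.isupper():  # assumption: terminals are spelled in upper case
            return table.get(symbol, (None, None))[0]
    return None


def resolve(rhs, lookahead, table):
    """Decide between reducing by the rule with this RHS and shifting lookahead."""
    level = rule_level(rhs, table)
    if level is None or lookahead not in table:
        return "shift"  # no precedence information: fall back to the default
    lookahead_level, assoc = table[lookahead]
    if lookahead_level > level:
        return "shift"
    if lookahead_level < level:
        return "reduce"
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


# Example input: two %left directives, the second binding tighter.
table = build_table([("left", ["PLUS", "MINUS"]), ("left", ["MUL", "DIV"])])
```

With this table, `resolve(["exp", "PLUS", "exp"], "MUL", table)` shifts, while the same rule against a MINUS lookahead reduces because the levels tie and the terminal is left-associative.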

precedence and associativity

datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp

%%

%left PLUS MINUS
%left MUL DIV

%%

exp : NUM (Int NUM)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MINUS exp (Sub (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | exp DIV exp (Div (exp1, exp2))
    | LPAR exp RPAR (exp)

precedence and associativity

precedence directives:

%left PLUS MINUS
%left MUL DIV

RHS of rule on stack: ... E PLUS E
input, terminal T next: MUL E ... (yet to read)

prec(MUL) > prec(PLUS) ==> SHIFT

precedence and associativity

precedence directives:

%left PLUS MINUS
%left MUL DIV

RHS of rule on stack: ... E PLUS E
input, terminal T next: MINUS E ... (yet to read)

prec(PLUS) = prec(MINUS), and MINUS is left-associative ==> REDUCE
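Putting the two cases together, here is a small shift-reduce loop that builds abstract syntax using these directives. It is a simplified sketch: it handles only numbers and the four binary operators (no parentheses), the `PREC` and `NODE` tables and the nested-tuple trees are assumptions, and it makes each decision while parsing, whereas a parser generator settles such conflicts once, when it builds its tables.

```python
# Levels and tree constructors implied by: %left PLUS MINUS / %left MUL DIV
PREC = {"PLUS": 1, "MINUS": 1, "MUL": 2, "DIV": 2}
NODE = {"PLUS": "Add", "MINUS": "Sub", "MUL": "Mul", "DIV": "Div"}


def parse(tokens):
    """Parse ints and operator names (no parentheses) into nested tuples."""
    stack = []
    for tok in list(tokens) + ["EOF"]:
        # While 'exp OP exp' is on top of the stack there is a shift-reduce choice.
        while len(stack) >= 3 and stack[-2] in PREC and isinstance(stack[-1], tuple):
            op = stack[-2]
            if tok != "EOF" and tok not in PREC:
                raise SyntaxError(f"operator expected, got {tok!r}")
            # Shift only if the lookahead binds tighter; ties reduce (%left).
            if tok != "EOF" and PREC[tok] > PREC[op]:
                break
            right = stack.pop()
            stack.pop()  # the operator, already saved in op
            left = stack.pop()
            stack.append((NODE[op], left, right))
        if tok != "EOF":
            stack.append(("Int", tok) if isinstance(tok, int) else tok)
    if len(stack) != 1 or not isinstance(stack[0], tuple):
        raise SyntaxError("incomplete expression")
    return stack[0]
```

Parsing `[1, "PLUS", 2, "MUL", 3]` shifts MUL and nests the product under the sum; parsing `[1, "PLUS", 2, "MINUS", 3]` reduces first and nests the sum under the difference.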

one more example

datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp

%%

%left PLUS MINUS
%left MUL DIV

%%

exp : NUM (Int NUM)
    | MINUS exp (Uminus exp)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MINUS exp (Sub (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | exp DIV exp (Div (exp1, exp2))
    | LPAR exp RPAR (exp)

RHS of rule on stack: ... MINUS E
yet to read: MUL E ...

what happens?

prec(*) > prec(-) ==> we SHIFT, so the unary minus ends up applied to the whole product

the fix

datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp

%%

%left PLUS MINUS
%left MUL DIV
%left UMINUS

%%

exp : NUM (Int NUM)
    | MINUS exp %prec UMINUS (Uminus exp)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MINUS exp (Sub (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | exp DIV exp (Div (exp1, exp2))
    | LPAR exp RPAR (exp)

RHS of rule on stack: ... MINUS E
yet to read: MUL E ...

changing the precedence of the rule alters the decision:

prec(UMINUS) > prec(MUL) ==> we REDUCE
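The effect of %prec on this one decision can be checked with a few lines of Python. This is a hedged sketch: the `LEVEL` table encodes the three %left directives, and the hypothetical `decide` function takes an optional `prec` argument standing in for a `%prec` annotation on the rule.

```python
# Levels implied by: %left PLUS MINUS / %left MUL DIV / %left UMINUS
LEVEL = {"PLUS": 1, "MINUS": 1, "MUL": 2, "DIV": 2, "UMINUS": 3}


def decide(rhs, lookahead, prec=None):
    """Shift or reduce, given a rule's RHS, the lookahead, and an optional %prec name."""
    # Without %prec the rule takes the level of its right-most terminal.
    terminals = [symbol for symbol in rhs if symbol in LEVEL]
    rule_level = LEVEL[prec] if prec is not None else LEVEL[terminals[-1]]
    if LEVEL[lookahead] > rule_level:
        return "shift"
    return "reduce"  # lower level, or equal level with %left
```

Calling `decide(["MINUS", "exp"], "MUL")` shifts, while `decide(["MINUS", "exp"], "MUL", prec="UMINUS")` reduces, which is the change the annotation is meant to produce.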

the dangling else problem

• Grammar:
    S ::= if E then S else S | if E then S | ...

• Consider: if a then if b then S else S
  – parse 1: if a then (if b then S else S)
  – parse 2: if a then (if b then S) else S

• Parser reports a shift-reduce conflict
  – in default behavior: shift (what we want)
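Why shifting gives parse 1 can be seen in a small hand-written parser. This sketch is recursive descent rather than LR, and it assumes conditions and plain statements are single identifiers; the hypothetical `parse_stmt` consumes an `else` as soon as it sees one, which corresponds to choosing shift at the conflict.

```python
def parse_stmt(tokens, pos=0):
    """Parse S ::= if E then S else S | if E then S | identifier.

    Returns (tree, next_position).
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # The conflict: an 'else' here could belong to this 'if' or an outer one.
        # Consuming it now is the 'shift' choice, so it binds to the nearest 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1


tokens = "if a then if b then s1 else s2".split()
tree, end = parse_stmt(tokens)
```

The resulting tree attaches the `else` branch to the inner `if b`, leaving the outer `if a` without one.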

the dangling else problem

• Grammar:
    S ::= if E then S else S | if E then S | ...

• Alternative solution is to rewrite the grammar:
    S ::= M | U
    M ::= if E then M else M | ...
    U ::= if E then S | if E then M else U

default behavior of ML-Yacc

• Shift-Reduce conflict
  – shift

• Reduce-Reduce conflict
  – reduce by first rule
  – generally considered unacceptable

• for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
  – you may use precedence directives tastefully

Note: To enter ML-Yacc hell, use a parser to catch type errors

• when doing assignment 3, your job is to catch parse errors

• there are lots of programming errors that will slip by the parser:
  – eg: 3 + true
  – catching these sorts of errors is the job of the type checker
  – just as catching program structure errors was the job of the parser, not the lexer
  – attempting to do type checking in the parser is impossible (in general)
    • why? Hint: what does “context-free grammar” imply?
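The division of labour can be illustrated with a toy language of integer and boolean literals joined by +. Everything here is an assumption made for the sketch (the token format, the tuple trees, the names `parse` and `type_of`); the point is only that `3 + true` is structurally fine, so the parser accepts it, and a separate pass over the tree rejects it.

```python
def parse(tokens):
    """Parse exp ::= atom ('+' atom)*, where an atom is an int or bool literal."""
    def atom(tok):
        if tok in ("true", "false"):
            return ("Bool", tok == "true")
        if tok.isdigit():
            return ("Int", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = atom(tokens[0])
    i = 1
    while i < len(tokens):
        if tokens[i] != "+" or i + 1 >= len(tokens):
            raise SyntaxError("expected '+' followed by an operand")
        tree = ("Add", tree, atom(tokens[i + 1]))
        i += 2
    return tree


def type_of(tree):
    """A separate pass over the tree: + requires two int operands."""
    tag = tree[0]
    if tag == "Int":
        return "int"
    if tag == "Bool":
        return "bool"
    left, right = type_of(tree[1]), type_of(tree[2])
    if left != "int" or right != "int":
        raise TypeError("+ expects two ints")
    return "int"
```

Here `parse("3 + true".split())` succeeds and returns a tree, while `type_of` applied to that tree raises `TypeError`; a malformed input such as `3 +` is rejected by `parse` itself.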