ML-YACC

David Walker

COS 320

Outline

• Last Week
  – Introduction to Lexing, CFGs, and Parsing

• Today
  – More parsing: automatic parser generation via ML-Yacc
  – Reading: Chapter 3 of Appel

Parser Implementation

• Implementation Options:

1. Write a Parser from scratch
  – not as boring as writing a lexer, but not exactly a weekend in the Bahamas

2. Use a Parser Generator
  – Very general & robust. Sometimes not quite as efficient as hand-written parsers. Nevertheless, good for lazy compiler writers.

[Figure: a parser specification is fed to the parser generator, which produces a parser; the parser takes a stream of tokens and produces abstract syntax.]

ML-Yacc specification

• three parts, separated by %%:

User Declarations: declare values available in the rule actions

ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts

Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc declarations (preliminaries)

• specify type of positions
    %pos int * int

• specify end-of-parse token
    %eop EOF

• specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

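Simple ML-Yacc Example

%%

%term NUM | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp | fact | base
%pos int
%start exp
%eop EOF

%%

exp : fact ()
    | fact PLUS exp ()

fact : base ()
     | base MUL fact ()

base : NUM ()
     | LPAR exp RPAR ()

Slide annotations: the declarations between the %% markers name the grammar symbols; the lines after the second %% are the grammar rules; the () after each rule is its semantic action (currently does nothing).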

attribute-grammars

• ML-Yacc uses an attribute-grammar scheme
  – each nonterminal may have a semantic value associated with it
  – when the parser reduces with (X ::= s)
    • a semantic action will be executed
    • uses semantic values from symbols in s
  – when parsing is completed successfully
    • parser returns semantic value associated with the start symbol
    • usually a parse tree

attribute-grammars

• semantic actions typically build the abstract syntax for the internal language

• type of semantic action must match type declared for LHS nonterminal in rule

ML-Yacc with Semantic Actions

%%

%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp of int | fact of int | base of int
%pos int
%start exp
%eop EOF

%%

exp : fact (fact)
    | fact PLUS exp (fact + exp)

fact : base (base)
     | base MUL fact (base * fact)

base : NUM (NUM)
     | LPAR exp RPAR (exp)

Slide annotations: grammar symbols with type declarations; grammar rules with semantic actions; computing integer result via semantic actions.

ML-Yacc with Semantic Actions

datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

%%
...
%%

exp : fact (fact)
    | fact PLUS exp (Add (fact, exp))

fact : base (base)
     | base MUL fact (Mul (base, fact))

base : NUM (Int NUM)
     | LPAR exp RPAR (exp)

Slide annotation: computing abstract syntax via semantic actions.
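As a hedged illustration of what the abstract syntax is good for, the sketch below repeats the slide's exp datatype and adds a hypothetical eval function (not part of the slides) that walks the tree. The tree value is written by hand, on the assumption that it is what the semantic actions would build for 1 + 2 * 3.

```sml
(* Abstract syntax from the slide. *)
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

(* Hypothetical evaluator: one case per constructor. *)
fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* Hand-built tree, assumed to be what the parser returns for 1 + 2 * 3. *)
val tree = Add (Int 1, Mul (Int 2, Int 3))
val result = eval tree
```

Separating tree construction (in the parser) from evaluation (a later pass) is the reason the actions build Add and Mul nodes instead of computing integers directly.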

A simpler grammar

%%
...
%%

exp : NUM (Int NUM)
    | exp PLUS exp (Add (exp1, exp2))
    | exp MUL exp (Mul (exp1, exp2))
    | LPAR exp RPAR (exp)

Why don’t we just use this simpler grammar?

A simpler grammar

%%
...
%%

This grammar is ambiguous!

[Figure: two different parse trees for NUM + NUM * NUM, one grouping the input as NUM + (NUM * NUM) and the other as (NUM + NUM) * NUM]

A simpler grammar

But it is so clean that it would be nice to use. Moreover, we know which parse tree we want. We just need a mechanism to specify it!
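To make the ambiguity concrete, here is a small sketch that writes both candidate trees for NUM + NUM * NUM as values of the exp datatype and prints them fully parenthesized. The show function and the concrete numbers 1, 2, 3 are illustrative assumptions, not part of the slides.

```sml
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

(* Hypothetical printer that makes the grouping of a tree explicit. *)
fun show (Int n) = Int.toString n
  | show (Add (a, b)) = "(" ^ show a ^ " + " ^ show b ^ ")"
  | show (Mul (a, b)) = "(" ^ show a ^ " * " ^ show b ^ ")"

(* The tree we want: * grouped below +. *)
val wanted = Add (Int 1, Mul (Int 2, Int 3))

(* The other tree the ambiguous grammar also allows. *)
val other = Mul (Add (Int 1, Int 2), Int 3)

val wantedText = show wanted
val otherText = show other
```

Both trees have the same leaves in the same order, so the token stream alone cannot tell them apart; only the grouping differs.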

Recall how LR parsing works:

exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

Input from lexer: NUM + NUM * NUM
Desired parse tree: NUM + (NUM * NUM)

State of parse so far: E + E (yet to read: * NUM)
We have a shift-reduce conflict. What should we do to get the right parse? SHIFT

After SHIFT SHIFT: E + E * NUM (nothing left to read)
After REDUCE (exp ::= NUM): E + E * E
After REDUCE (exp ::= exp MUL exp): E + E
After REDUCE (exp ::= exp PLUS exp): E

The alternative parse

Input from lexer: NUM + NUM * NUM

State of parse so far: E + E (yet to read: * NUM)
We have a shift-reduce conflict. Suppose we REDUCE next.

After REDUCE (exp ::= exp PLUS exp): E (yet to read: * NUM)
Now SHIFT SHIFT REDUCE: E * E (nothing left to read)
After REDUCE (exp ::= exp MUL exp): E

[Figure: the resulting parse tree groups the input as (NUM + NUM) * NUM]

Summary

Input from lexer: NUM + NUM * NUM
Desired parse tree: NUM + (NUM * NUM)

We have a shift-reduce conflict. We have E + E on stack, we see *. We want to shift. We ALWAYS want to shift since * has higher precedence than + ==> symbols to the right on the stack get processed first
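The shift-versus-reduce decision summarized above can be sketched with a tiny precedence-driven loop. This is a simplified operator-precedence parser for PLUS and MUL only, not the table-driven LR parser that ML-Yacc generates; the token type, prec levels, and function names are all assumptions made for illustration.

```sml
datatype token = NUM of int | PLUS | MUL
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

(* Assumed precedence levels: MUL binds tighter than PLUS. *)
fun prec PLUS = 1
  | prec MUL = 2
  | prec (NUM _) = 0

fun build (PLUS, l, r) = Add (l, r)
  | build (MUL, l, r) = Mul (l, r)
  | build (NUM _, _, _) = raise Fail "NUM is not an operator"

(* REDUCE: pop one operator and its two operands, push the combined tree. *)
fun reduce (oper :: ops, r :: l :: es) = (ops, build (oper, l, r) :: es)
  | reduce _ = raise Fail "malformed input"

(* State is (operator stack, operand stack), top of each stack at the head. *)
fun parse tokens =
  let
    fun loop ([], ([], [e])) = e
      | loop ([], st) = loop ([], reduce st)
      | loop (NUM n :: rest, (ops, es)) = loop (rest, (ops, Int n :: es))
      | loop (t :: rest, ([], es)) = loop (rest, ([t], es))
      | loop (t :: rest, st as (top :: ops, es)) =
          if prec t > prec top
          then loop (rest, (t :: top :: ops, es))   (* SHIFT *)
          else loop (t :: rest, reduce st)          (* REDUCE *)
  in
    loop (tokens, ([], []))
  end

val tree = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3]
```

With PLUS on the operator stack and MUL as the next token, the loop shifts, which is the choice the slides argue for; with equal precedence it reduces, which makes both operators left-associative in this sketch.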

Example 2

exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR

Input from lexer: NUM - NUM - NUM

State of parse so far: E - E (yet to read: - NUM)
We have a shift-reduce conflict. We have E - E on stack, we see -. We want "-" to be a left-associative operator, i.e.: NUM - NUM - NUM == ((NUM - NUM) - NUM). What do we do? REDUCE

After REDUCE (exp ::= exp MINUS exp): E (yet to read: - NUM)
After SHIFT SHIFT REDUCE: E - E (nothing left to read)
After REDUCE (exp ::= exp MINUS exp): E

Example 2: Summary

Input from lexer: NUM - NUM - NUM

We have a shift-reduce conflict. We have E - E on stack, we see -. What do we do? REDUCE. We ALWAYS want to reduce since - is left-associative.
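A short sketch of why the choice matters for subtraction: the two trees below correspond to reducing versus shifting at the conflict, and they evaluate to different numbers. The Sub constructor, the eval function, and the operands 10, 4, 3 are assumptions chosen for illustration.

```sml
datatype exp = Int of int | Sub of exp * exp

fun eval (Int n) = n
  | eval (Sub (a, b)) = eval a - eval b

(* REDUCE at the conflict: (10 - 4) - 3, the left-associative reading. *)
val leftAssoc = Sub (Sub (Int 10, Int 4), Int 3)

(* SHIFT at the conflict: 10 - (4 - 3), the right-associative reading. *)
val rightAssoc = Sub (Int 10, Sub (Int 4, Int 3))

val leftValue = eval leftAssoc
val rightValue = eval rightAssoc
```

Here leftValue is 3 and rightValue is 9, so the parser's default of shifting would silently produce the wrong grouping for "-".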

precedence and associativity

• three solutions to dealing with operator precedence and associativity:

1) let Yacc complain
  • its default choice is to shift when it encounters a shift-reduce error
  • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant

2) rewrite the grammar to eliminate ambiguity
  • can be complicated and less clear

3) use Yacc precedence directives
  • %left, %right, %nonassoc

precedence and associativity

• given directives, ML-Yacc assigns precedence to each terminal and rule
  – precedence of terminal based on order in which associativity is specified (terminals in later directives get higher precedence)
  – precedence of rule is the precedence of the right-most terminal
    • eg: precedence of (E ::= E + E) == prec(+)

• a shift-reduce conflict is resolved as follows
  – prec(terminal) > prec(rule) ==> shift
  – prec(terminal) < prec(rule) ==> reduce
  – prec(terminal) = prec(rule) ==>
    • assoc(terminal) = left ==> reduce
    • assoc(terminal) = right ==> shift
    • assoc(terminal) = nonassoc ==> report as error

[Figure: the RHS of a rule, ... E % E, is on the stack; the next input terminal is T, with T E ... yet to read]
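The resolution rules above translate almost line for line into a function. The sketch below is only a model of the decision, not ML-Yacc's actual implementation; the type names, the record fields, and the use of integers for precedence levels are assumptions.

```sml
datatype assoc = Left | Right | NonAssoc
datatype action = Shift | Reduce | Error

(* termPrec/termAssoc describe the next input terminal;
   rulePrec is the precedence of the rule whose RHS is on the stack. *)
fun resolve {termPrec : int, termAssoc : assoc, rulePrec : int} : action =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else
    case termAssoc of
      Left => Reduce
    | Right => Shift
    | NonAssoc => Error

(* Assumed levels: PLUS and MINUS at 1, MUL and DIV at 2, all left-associative. *)
val mulAfterPlus = resolve {termPrec = 2, termAssoc = Left, rulePrec = 1}
val minusAfterPlus = resolve {termPrec = 1, termAssoc = Left, rulePrec = 1}
```

Under these assumed levels, mulAfterPlus is Shift and minusAfterPlus is Reduce.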

datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp

precedence directives:

%left PLUS MINUS
%left MUL DIV

On stack: ... E PLUS E; yet to read: MUL E ...
prec(MUL) > prec(PLUS) ==> SHIFT

On stack: ... E PLUS E; yet to read: MINUS E ...
prec(PLUS) = prec(MINUS), and both are left-associative ==> REDUCE

On stack: ... MINUS E; yet to read: MUL E ...

what happens?

prec(*) > prec(-) ==> we SHIFT

%left PLUS MINUS
%left MUL DIV
%left UMINUS

On stack: ... MINUS E; yet to read: MUL E ...

changing precedence of rule alters decision: once the unary-minus rule is given the precedence of UMINUS (with %prec UMINUS),

prec(UMINUS) > prec(MUL) ==> we REDUCE

the dangling else problem

• Grammar:
    S ::= if E then S else S | if E then S | ...

• Consider: if a then if b then S else S
  – parse 1: if a then (if b then S else S)
  – parse 2: if a then (if b then S) else S

• Parser reports shift-reduce error
  – in default behavior: shift (what we want)

the dangling else problem

• Grammar:
    S ::= if E then S else S | if E then S | ...

• Alternative solution is to rewrite grammar:
    S ::= M | U
    M ::= if E then M else M | ...
    U ::= if E then S | if E then M else U

default behavior of ML-Yacc

• Shift-Reduce error
  – shift

• Reduce-Reduce error
  – reduce by first rule
  – generally considered unacceptable

• for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
  – you may use precedence directives tastefully

Note: To enter ML-Yacc hell, use a parser to catch type errors

• when doing assignment 3, your job is to catch parse errors

• there are lots of programming errors that will slip by the parser:
  – eg: 3 + true
  – catching these sorts of errors is the job of the type checker
  – just as catching program structure errors was the job of the parser, not the lexer
  – attempting to do type checking in the parser is impossible (in general)
    • why? Hint: what does "context-free grammar" imply?
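To illustrate the division of labor, here is a minimal sketch of a separate type-checking pass over abstract syntax. The Bool constructor, the ty datatype, and the typeOf function are assumptions for illustration; they are not the Fun language of the assignment.

```sml
datatype exp = Int of int | Bool of bool | Add of exp * exp
datatype ty = TInt | TBool

(* Returns SOME t when the expression is well typed, NONE otherwise. *)
fun typeOf (Int _) = SOME TInt
  | typeOf (Bool _) = SOME TBool
  | typeOf (Add (a, b)) =
      (case (typeOf a, typeOf b) of
         (SOME TInt, SOME TInt) => SOME TInt
       | _ => NONE)

(* 3 + true parses without complaint into a tree ... *)
val parsed = Add (Int 3, Bool true)

(* ... and only the type checker rejects it. *)
val checked = typeOf parsed
```

The parser's job ends once parsed exists; deciding that it is ill typed needs information about the operands that the grammar rule exp PLUS exp does not carry.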
