1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free...

Introduction to Parsing

Outline Regular languages revisited

Parser overview

Context-free grammars (CFG’s)

Derivations

Languages and Automata Formal languages are very important in CS

» Especially in programming languages

Regular languages» The weakest formal languages widely used» Many applications

We will also study context-free languages

Limitations of Regular Languages

Intuition: A finite automaton that runs long enough must repeat states

Finite automaton can’t remember # of times it has visited a particular state

Finite automaton has finite memory» Only enough to store in which state it is » Cannot count, except up to a finite limit

E.g., language of balanced parentheses is not regular: { (i )i | i ¸ 0}

The Functionality of the Parser

Input: sequence of tokens from lexer

Output: parse tree of the program

Example Java expr

x == y ? 1 : 2 Parser input

ID == ID ? INT : INT Parser output

== INT

Comparison with Lexical Analysis

Phase Input Output

Lexer Sequence of characters

Sequence of tokens

Parser Sequence of tokens

Parse tree

The Role of the Parser Not all sequences of tokens are programs . . . . . . Parser must distinguish between valid and

invalid sequences of tokens

We need» A language for describing valid sequences of

tokens» A method for distinguishing valid from invalid

sequences of tokens

Context-Free Grammars Programming language constructs have

recursive structure

An EXPR isif EXPR then EXPR else EXPR fi , orwhile EXPR loop EXPR pool , or…

Context-free grammars are a natural notation for this recursive structure

CFGs (Cont.) A CFG consists of

» A set of terminals T» A set of non-terminals N» A start symbol S (a non-terminal)» A set of productions

Assuming X 2 N X ! , or X ! Y1 Y2 ... Yn where Yi µ N [ T

Notational Conventions In these lecture notes

» Non-terminals are written upper-case» Terminals are written lower-case» The start symbol is the left-hand side of the

first production

Examples of CFGsExpr if Expr then Expr else Expr | while Expr do Expr | id

Examples of CFGs Simple arithmetic expressions:

E E E| E + E| E| id

The Language of a CFGRead productions as replacement rules: X ! Y1 ... Yn

Means X can be replaced by Y1 ... Yn

X ! Means X can be erased (replaced with empty

string)

Key Idea1. Begin with a string consisting of the start symbol

“S”2. Replace any non-terminal X in the string by a

right-hand side of some production

3. Repeat (2) until there are no non-terminals in the string

1 nX Y Y

The Language of a CFG (Cont.)

More formally, write

if there is a production

1 1 1 1 1i n i m i nX X X X X Y Y X X

1 i mX Y Y

The Language of a CFG (Cont.)

in 0 or more steps

1 1n mX X Y Y

The Language of a CFGLet G be a context-free grammar with start symbol

S. Then the language of G is:

1 1| and every is a terminaln n ia a S a a a

Terminals Terminals are called because there are no rules

for replacing them

Once generated, terminals are permanent

Terminals ought to be tokens of the language

ExamplesL(G) is the language of CFG G

Strings of balanced parentheses

Two grammars:

( )S SS

( ) | 0i i i

Arithmetic ExampleSimple arithmetic expressions:

Some elements of the language:

E E+E | E E | (E) | id

id id + id(id) id id(id) id id (id)

NotesThe idea of a CFG is a big step. But:

Membership in a language is “yes” or “no”; also need parse tree of the input

Must handle errors gracefully

Need an implementation of CFG’s (e.g., Cup)

More Notes Form of the grammar is important

» Many grammars generate the same language» Tools are sensitive to the grammar

» Note: Tools for regular languages (e.g., flex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice

Derivations and Parse TreesA derivation is a sequence of productions

A derivation can be drawn as a tree» Start symbol is the tree’s root» For a production add children» to node

1 nX Y Y X1 nY Y

Derivation Example Grammar

String

E E+E | E E | (E) | id

id id + id

Derivation Example (Cont.)

EE+EE E+Eid E + Eid id + Eid id + id

Derivation in Detail (1)

EE+EE +

EE+EE E+Eid E + E

EE+EE E+Eid E + id id +

EE+EE E+Eid E + Eid id + Eid id + id

Notes on Derivations A parse tree has

» Terminals at the leaves» Non-terminals at the interior nodes

An in-order traversal of the leaves is the original input

The parse tree shows the association of operations, the input string does not

Left-most and Right-most Derivations

The example is a left-most derivation» At each step, replace

the left-most non-terminal

There is an equivalent notion of a right-most derivation

EE+EE+idE E + idE id + idid id + id

Right-most Derivation in Detail (1)

EE+EE+

E E+id

EE+EE+idE E + id

EE+EE+idE E E

+ idid + id

EE+EE+idE E + idE id + idid id + id

Derivations and Parse Trees Note that right-most and left-most derivations

have the same parse tree

The difference is the order in which branches are added

Summary of Derivations We are not just interested in whether s 2L(G)

» We need a parse tree for s

A derivation defines a parse tree» But one parse tree may have many derivations

Left-most and right-most derivations are important in parser implementation

Issues A parser consumes a sequence of tokens s and

produces a parse tree Issues:

» How do we recognize that s 2 L(G) ?» A parse tree of s describes how s L(G) » Ambiguity: more than one parse tree

(interpretation) for some string s » Error: no parse tree for some string s» How do we construct the parse tree?

Ambiguity Grammar

E ! E + E | E * E | ( E ) | int

String

int * int + int

Ambiguity (Cont.)This string has two parse trees

+intint

*intint

Ambiguity (Cont.) A grammar is ambiguous if it has more than one

parse tree for some string» Equivalently, there is more than one right-most

or left-most derivation for some string Ambiguity is bad

» Leaves meaning of some programs ill-defined Ambiguity is common in programming languages

» Arithmetic expressions» IF-THEN-ELSE

Dealing with Ambiguity There are several ways to handle ambiguity

Most direct method is to rewrite the grammar unambiguously

E ! T + E | T T ! int * T | int | ( E )

Enforces precedence of * over +

Ambiguity: The Dangling Else

Consider the grammar E if E then E | if E then E else E | OTHER

This grammar is also ambiguous

The Dangling Else: Example The expression

if E1 then if E2 then E3 else E4

has two parse treesif

E2 E3 E4

• Typically we want the second form

The Dangling Else: A Fix else matches the closest unmatched then We can describe this in the grammar

E MIF /* all then are matched */

| UIF /* some then are unmatched */

MIF if E then MIF else MIF

| OTHERUIF if E then E | if E then MIF else UIF

Describes the same set of strings

The Dangling Else: Example Revisited

The expression if E1 then if E2 then E3 else E4

if(UIF)

E1 if(MIF)

E2 E3 E4

if(MIF)

E1 if(UIF)

• Not valid because the then expression is not a MIF

• A valid parse tree (for a UIF)

Ambiguity No general techniques for handling ambiguity

Impossible to convert automatically an ambiguous grammar to an unambiguous one

Used with care, ambiguity can simplify the grammar» Sometimes allows more natural definitions» We need disambiguation mechanisms

Precedence and Associativity Declarations

Instead of rewriting the grammar» Use the more natural (ambiguous) grammar» Along with disambiguating declarations

Most tools allow precedence and associativity declarations to disambiguate grammars

Examples …

Associativity Declarations Consider the grammar E E + E | int Ambiguous: two parse trees of int + int + int

E+int +

intint

E+int+

intint• Left-associativity declaration: %left +

Precedence Declarations Consider the grammar E E + E | E * E | int

» And the string int + int * intE

E+int *

intint

E*int+

intint• Precedence declarations: %left + %left *

Abstract Syntax Trees So far a parser traces the derivation of a

sequence of tokens The rest of the compiler needs a structural

representation of the program Abstract syntax trees

» Like parse trees but ignore some details» Abbreviated as AST

Abstract Syntax Tree. (Cont.) Consider the grammar

E int | ( E ) | E + E And the string

5 + (2 + 3) After lexical analysis (a list of tokens) int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ During parsing we build a parse tree …

Example of Parse TreeE

( E )+

E +int5

Traces the operation of the parser

Does capture the nesting structure

But too much info» Parentheses» Single-successor

Example of Abstract Syntax Tree

Also captures the nesting structure But abstracts from the concrete syntax

=> more compact and easier to use An important data structure in a compiler

Constructing An AST We first define the AST class hierarchy

» ASTNode IntNode , PlusNode Consider an abstract tree type with two constructors:

new IntNode(n)

new PlusNode(

Semantic Actions This is what we’ll use to construct ASTs

Each grammar symbol may have attributes» For terminal symbols (lexical tokens) attributes

can be calculated by the lexer

Each production may have an action» Written as: X Y1 … Yn { action }» That can refer to or compute symbol attributes

Constructing an AST We define an attribute ast for non-terminals

» Values of ast attributes are ASTs» We assume that int.lexval is the value of the

integer lexeme» Computed using semantic actions

E int E.ast = new intNode(int.lexval) | E1 + E2 E.ast = new PlusNode

(E1.ast, E2.ast)

| ( E1 ) E.ast = E1.ast

Parse Tree Example Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ A bottom-up evaluation of the ast attribute: E.ast = new PlusNode(new IntNode(5), new PlusNode(new IntNode(2), new IntNode(3))

Review We can specify language syntax using CFG A parser will answer whether s L(G)

» and will build a parse tree» which we convert to an AST» and pass on to the rest of the compiler

Next lectures:» How do we answer s L(G) and build a parse

Introduction to Top-Down Parsing

Terminals are seen in order of appearance in the token stream:

t2 t5 t6 t8 t9

The parse tree is constructed» From the top» From left to right

Recursive Descent Parsing Consider the grammar

E T + E | T T int | int * T | ( E )

Token stream is: int5 * int2

Start with top-level non-terminal E

Try the rules for E in order

Recursive Descent Parsing. Example (Cont.)

Try E0 T1 + E2 Then try a rule for T1 ( E3 )

» But ( does not match input token int5

Try T1 int . Token matches.

» But + after T1 does not match input token * Try T1 int * T2

» This will match but + after T1 will be unmatched Have exhausted the choices for T1

» Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.)

Try E0 T1

Follow same steps as before for T1

» And succeed with T1 int * T2 and T2 int

» With the following parse tree

int5 * T2

Recursive Descent Parsing. Notes.

Easy to implement by hand

But does not always work …

Implementation of a Recursive Descent Parser

A Recursive Descent Parser. Preliminaries

Let TOKEN be the type of tokens» Special tokens INT, OPEN, CLOSE, PLUS,

Let the global next point to the next token

A Recursive Descent Parser (2)

Define boolean functions that check the token string for a match of» A given token terminal bool term(TOKEN tok) { return *next++ ==

tok; }» A given production of S (the nth) bool Sn() { … } // and test» Any production of S: bool S() { … } // or test

These functions advance next

For production E T + E bool E1() { return T() && term(PLUS) && E(); }

For production E T bool E2() { return T(); }

For all productions of E (with backtracking) bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }

Functions for non-terminal Tbool T1() { return term(OPEN) && E() && term(CLOSE); }

bool T2() { return term(INT) && term(TIMES) && T(); }

bool T3() { return term(INT); }

bool T() { TOKEN *save = next; return (next = save, T1())

|| (next = save, T2()) || (next = save, T3()); }

Recursive Descent Parsing. Notes.

To start the parser » Initialize next to point to first token» Invoke E()

Notice how this simulates our backtracking example from lecture

Easy to implement by hand But does not always work …

» Predictive parsing is better

1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free...

Documents

Transcript of 1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free...

CFG exmpls

Almost-inner derivations of Lie algebras · This master thesis is about almost-inner derivations of Lie algebras, written to l ll the requirements for becoming a Master of Science

Derivations of Physics

REMARKS ON DERIVATIONS SEMIPRIMERINGS

Autorun Cfg

Taller Epp Cfg

Vhln01 Cfg

Frijol Cfg

physics12-derivations-electrostaticssbgstudy.com/documents/notes/class 12/physics/derivations... · 2019-11-23 · Title: physics12-derivations-electrostatics Author: CamScanner Subject:

Geshwic Cfg

Derivations and Constraints

Catálogo CFG

Penyederhanaan CFG

CFG Core Strategies

physics12-derivations-magnetics-effect 12/physics...Title physics12-derivations-magnetics-effect Author CamScanner Subject physics12-derivations-magnetics-effect

INVESTIGATING PHYSICS€¦ · INVESTIGATING PHYSICS ANDREW KENNY GILL & MACMILLAN DERIVATIONS OF FORMULAE AND LINKS. Contents Derivations of Formulae 1 Links 11. DERIVATIONS OF FORMULAE

physics12-derivations-emi-and-ac - SBG STUDYsbgstudy.com/documents/notes/class 12/physics/derivations...Title physics12-derivations-emi-and-ac Author CamScanner Subject physics12-derivations-emi-and-ac

Centones Derivations

Chapter10 CFG

Cs419 lec7 cfg