Syntax of All Sorts

53
Syntax of All Sorts COS 441 Princeton University Fall 2004

description

Syntax of All Sorts. COS 441 Princeton University Fall 2004. Acknowledgements. Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford Good source of supplemental information http://theory.stanford.edu/people/jcm/books/cpl-teaching.html - PowerPoint PPT Presentation

Transcript of Syntax of All Sorts

Page 1: Syntax of All Sorts

Syntax of All Sorts

COS 441

Princeton University

Fall 2004

Page 2: Syntax of All Sorts

Acknowledgements

Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford– Good source of supplemental information

http://theory.stanford.edu/people/jcm/books/cpl-teaching.html– Differs in foundational approach when

compared to Harper’s approach will discuss this difference in later lectures

Page 3: Syntax of All Sorts

Interpreter vs Compiler

Source Program

Source Program

Compiler

Input OutputInterpreter

Input OutputTarget

Program

Page 4: Syntax of All Sorts

Interpreter vs Compiler

• The difference is actually a bit more fuzzy

• Some “interpreters” compile to native code– SML/NJ runs native machine code!– Does fancy optimizations too

• Some “compilers” compile to byte-code which is interpreted – javac and a JVM which may also compile

Page 5: Syntax of All Sorts

Interactive vs Batch

• SML/NJ is a native code compiler with an interactive interface

• javac is a batch compiler for bytecode which may be interpreted or compiled to native code by a JVM

• Python compiles to bytecode then interprets the bytecode it has both batch and interactive interfaces

• Terms are historical and misleading– Best to be precise and verbose

Page 6: Syntax of All Sorts

Typical CompilerSource Program

Lexical Analyzer

Syntax Analyzer

Semantic Analyzer

Intermediate Code Generator

Code Optimizer

Code Generator Target Program

Page 7: Syntax of All Sorts

Brief Look at Syntax

• Grammar e ::= n | e + e | e £ e n ::= d | nd d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• Expressions in languagee e £ e e £ e + e n £ n + n nd £ d + d dd £ d + d

… 27 £ 4 + 3

Grammar defines a languageExpressions in language derived by sequence of productions

Page 8: Syntax of All Sorts

Parse Tree

• Derivation represented by treee e £ e e £ e + e n £ n + n nd £ d + d dd £ d + d … 27 £ 4 + 3

e

e £

e +

e

e27

4 3

Tree shows parenthesization of expression

Page 9: Syntax of All Sorts

Dealing with Parsing Ambiguity

• Ambiguity– Expression 27 £ 4 + 3 can be parsed two ways– Problem: 27 £ (4 + 3) (27 £ 4) + 3

• Ways to resolve ambiguity– Precedence

• Group £ before + • Parse 3 £ 4 + 2 as (3 £ 4) + 2

– Associativity• Parenthesize operators of equal precedence to left (or right)• Parse 3 + 4 + 5 as (3 + 4) + 5

Page 10: Syntax of All Sorts

Rewriting the Grammar

• Harper describe a way to rewrite the grammar to avoid ambiguity

• Eliminating ambiguity in a parsing grammar is a dark art– Not always even possible– Depends on the parsing technique you use– Learn all you want to know and more in a compiler

book or course

• Ambiguity is bad when trying to think formally about programming languages

Page 11: Syntax of All Sorts

Abstract Syntax Trees

• We want to reason about expressions inductively– So we need an inductive description of

expressions

• We could use the parse tree– But it has all this silly term and factor stuff to

parsing unambiguous

• Introduce nicer tree after the parsing phase

Page 12: Syntax of All Sorts

Syntax Comparisons

Digits d ::= 0 | … | 9

Numbers n ::= d | n d

Expressions e ::= n

| e1 + e2

| e1 £ e2

Numbers n 2 N

Expressions e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

Digits d ::= 0 | … | 9

Numbers n ::= d | n d

Expressions e ::= t | t + e

Terms t ::= f | f £ t

Factors f ::= n | (e)

Abstract

Ambiguous Concrete

Unambiguous Concrete

Page 13: Syntax of All Sorts

Induction Over Syntax

• We can think of the CFG notation as defining a predicate exp

• Thinking about things this way lets us use rule induction on abstract syntax

times(E1,E2)expE1 exp E2 exp

times-eplus(E1,E2)exp

E1 exp E2 expplus-e

num[N] expN 2 N

num-e

Page 14: Syntax of All Sorts

A More General Pattern

• One might ask what is the general pattern of predicates defined by CFG that represent abstract syntax– represents a signature that assigns arities

to operators such a num[N], plus, and times

– Note the T1 … Tn are all unique

OP(T1,…,Tn) term

T1 term Tn term(OP) = n

Page 15: Syntax of All Sorts

An Example

succ(X)natX nat

Szero nat

Z

Page 16: Syntax of All Sorts

An Example

succ(X)natX nat

Szero nat

Z

Operator Arity

zero 0

succ 1

Page 17: Syntax of All Sorts

An Example

succ(X)natX nat

Szero nat

Z

n ::= zero

| succ(n)

Operator Arity

zero 0

succ 1

Page 18: Syntax of All Sorts

Another Example

succ(X)oddX even

S-O

zero evenZ-E

succ(X)evenX odd

S-E

Page 19: Syntax of All Sorts

Another Example

succ(X)oddX even

S-O

zero evenZ-E

succ(X)evenX odd

S-E

Operator Arity

zero 0

succ 1

even ::= zero

| succ(odd)

odd ::= succ(even)

Page 20: Syntax of All Sorts

Another Example

succ(X)oddX even

S-O

zero evenZ-E

succ(X)evenX odd

S-E

Operator Arity

zero 0

succ 1

Operator Arity

succ 1

even ::= zero

| succ(odd)

odd ::= succ(even)

Page 21: Syntax of All Sorts

NOT Abstract Syntax

eq(X,X) equal

A(X,succ(Y),succ(Z))A(X,Y,Z)

F(succ(succ(N)),Z)F(succ(N),X) F(N,Y) A(X,Y,Z)

Page 22: Syntax of All Sorts

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

Page 23: Syntax of All Sorts

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

datatype nat = zero | succ of nat

Page 24: Syntax of All Sorts

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

datatype exp = num of nat | plus of (exp * exp)| times of (exp * exp)

datatype nat = zero | succ of nat

Page 25: Syntax of All Sorts

Abstract Syntax To ML

n ::= zero

| succ(n)

n 2 N

e ::= num[n]

| plus(e1,e2)

| times(e1,e2)

datatype exp = Num of int | Plus of (exp * exp)| Times of (exp * exp)

Capitalize by convention

datatype nat = Zero | Succ of nat

A small cheat for performance

Page 26: Syntax of All Sorts

Abstract Syntax To ML

• Converting things is not always straightforward– Constructor names need to be distinct

• ML lacks some features that would make some things easier– Operator overloading – Interacts badly with type-inference

• Subtyping would improve things too– We can get by with predicates sometimes instead

Page 27: Syntax of All Sorts

Abstract Syntax To ML (cont.)

n ::= zero

| succ(n)

datatype nat = Zero | Succ of nat

even ::= zero

| succ(odd)

odd ::= succ(even)

Page 28: Syntax of All Sorts

Abstract Syntax To ML (cont.)

n ::= zero

| succ(n)

datatype nat = Zero | Succ of nat

even ::= zero

| succ(odd)

odd ::= succ(even)

datatype even = EvenZero| EvenSucc of (odd)and odd = OddSucc of even

Page 29: Syntax of All Sorts

Abstract Syntax To ML (cont.)

n ::= zero

| succ(n)

datatype nat = Zero | Succ of nat

fun even(Zero) = true|even(Succ(n))= odd(n)and odd(Zero) = false |odd(Succ(n)) = even(n)

even ::= zero

| succ(odd)

odd ::= succ(even)

Page 30: Syntax of All Sorts

Syntax Trees Are Not Enough

• Programming languages include binding constructs– variables function arguments, variable function

declarations, type declarations

• When reasoning about binding constructs we want to ignore irrelevant details – e.g. the actual “spelling” of the variable– we only that variables are the same or different

• Still we must be clear about the scope of a variable definition

Page 31: Syntax of All Sorts

Abstract Binding Trees

• Harper uses abstract binding trees (Chapter 5)– This solution is based on very recent work

M. J. Gabbay and A.M. Pitts, A New Approach to Abstract Syntax with Variable Binding, Formal Aspects of Computing 13(2002)341-363

• The solution is new but the problems are old!Alonzo Church. A set of postulates for the foundation of logic. The

Annals of Mathematics, Second Series, Volume 33, Issue 2 (Apr. 1932), 346-366.

• We will talk about the problem with binding today and deal with the solutions in the next lecture

Page 32: Syntax of All Sorts

Lambda Calculus

• Formal system with three parts– Notation for function expressions– Proof system for equations– Calculation rules called reduction

• Basic syntactic notions– Free and bound variables– Illustrates some ideas about scope of binding

• Symbolic evaluation useful for discussing programs

Page 33: Syntax of All Sorts

History

• Original intention– Formal theory of substitution

• More successful for computable functions– Substitution ! symbolic computation– Church/Turing thesis

• Influenced design of Lisp, ML, other languages

• Important part of CS history and theory

Page 34: Syntax of All Sorts

Expressions and Functions

• Expressionsx + y x + 2 £ y + z

• Functionsx. (x + y) z. (x + 2 £ y + z)

• Application(x. (x + y)) 3 = 3 + y(z. (x + 2 £ y + z)) 5 = x + 2 £ y + 5

Parsing: x. f (f x) = x.( f (f (x)) )

Page 35: Syntax of All Sorts

Free and Bound Variables

• Bound variable is “placeholder”– Variable x is bound in x. (x+y) – Function x. (x+y) is same function as z. (z+y)

• Compare x+y dx = z+y dz x P(x) = z P(z)

• Name of free (=unbound) variable does matter– Variable y is free in x. (x+y) – Function x. (x+y) is not same as x. (x+z)

• Occurrences– y is free and bound in x. ((y. y+2) x) + y

Page 36: Syntax of All Sorts

Reduction

• Basic computation rule is -reduction

(x. e1) e2 [xÃe2]e1

where substitution involves renaming as

needed

• Example:

(f. x. f (f x)) (y. y+x)

Page 37: Syntax of All Sorts

Rename Bound Variables

• Rename bound variables to avoid conflicts

(f. z. f (f z)) (y. y+x)

z. (z+x)+x

• Substitute “blindly”

(f. x. f (f x)) (y. y+x)

x. (x+x)+x

Page 38: Syntax of All Sorts

Reduction

(f. x. f (f x)) (y. y+x)

Page 39: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

Rename bound “x” to “z” to avoid conflict with free “x”

Page 40: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

[fÃ(y. y+x)](z. f (f z))

Page 41: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

Page 42: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) ([yÃz](y+x))

Page 43: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) (z+x)

Page 44: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) (z+x)

z. [yÃ(z+x)](y+x)

Page 45: Syntax of All Sorts

Reduction

(f. z. f (f z)) (y. y+x)

z. (y. y+x) ((y. y+x) z))

z. (y. y+x) (z+x)

z. ((z+x)+x)

Page 46: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

Page 47: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

[fÃ(y. y+x)](x. f (f x))

Page 48: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

Page 49: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) ([yÃx](y+x))

Page 50: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) (x+x)

Page 51: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) (x+x)

x. [yÃ(x+x)](y+x)

Page 52: Syntax of All Sorts

Incorrect Reduction

(f. x. f (f x)) (y. y+x)

x. (y. y+x) ((y. y+x) x))

x. (y. y+x) (x+x)

x. ((x+x)+x)

Page 53: Syntax of All Sorts

Main Points About -calculus

• captures “essence” of variable binding– Function parameters– Bound variables can be renamed

• Succinct function expressions

• Simple symbolic evaluator via substitution

• Easy rule: always rename bound variables to be distinct