Introduction to Functional Programming. John Harrison

John Harrison
jrh@cl.cam.ac.uk

3rd December

Preface

These are the lecture notes accompanying the course Introduction to Functional Programming, which I taught at Cambridge University.

This course has mainly been taught in previous years by Mike Gordon. I have retained the basic structure of his course, with a blend of theory and practice, and have borrowed heavily in what follows from his own lecture notes, available in book form as Part II of (Gordon). I have also been influenced by those who have taught related courses here, such as Andy Gordon and Larry Paulson and, in the chapter on types, by Andy Pitts's course on the subject.

The large chapter on examples is not directly examinable, though studying it should improve the reader's grasp of the early parts and give a better idea of how ML is actually used.

Most chapters include some exercises, either invented specially for this course or taken from various sources. They are normally intended to require a little thought, rather than just being routine drill. Those I consider fairly difficult are marked with a (*).

These notes have not yet been tested extensively and no doubt contain various errors and obscurities. I would be grateful for constructive criticism from any readers.

John Harrison (jrh@cl.cam.ac.uk).


Plan of the lectures

This chapter indicates roughly how the material is to be distributed over a course of twelve lectures, each of slightly less than one hour.

1. Introduction and Overview. Functional and imperative programming: contrast, pros and cons. General structure of the course: how lambda calculus turns out to be a general programming language. Lambda notation: how it clarifies variable binding and provides a general analysis of mathematical notation. Currying. Russell's paradox.

2. Lambda calculus as a formal system. Free and bound variables. Substitution. Conversion rules. Lambda equality. Extensionality. Reduction and reduction strategies. The Church-Rosser theorem: statement and consequences. Combinators.

3. Lambda calculus as a programming language. Computability background; Turing completeness (no proof). Representing data and basic operations: truth values, pairs and tuples, natural numbers. The predecessor operation. Writing recursive functions: fixed point combinators. Let expressions. Lambda calculus as a declarative language.

4. Types. Why types? Answers from programming and logic. Simply typed lambda calculus. Church and Curry typing. Let polymorphism. Most general types and Milner's algorithm. Strong normalization (no proof), and its negative consequences for Turing completeness. Adding a recursion operator.

5. ML. ML as typed lambda calculus with eager evaluation. Details of evaluation strategy. The conditional. The ML family. Practicalities of interacting with ML. Writing functions. Bindings and declarations. Recursive and polymorphic functions. Comparison of functions.

6. Details of ML. More about interaction with ML. Loading from files. Comments. Basic data types: unit, booleans, numbers and strings. Built-in operations. Concrete syntax and infixes. More examples. Recursive types and pattern matching. Examples: lists and recursive functions on lists.

7. Proving programs correct. The correctness problem. Testing and verification. The limits of verification. Functional programs as mathematical objects. Examples of program proofs: exponential, GCD, append and reverse.

8. Effective ML. Using standard combinators. List iteration and other useful combinators; examples. Tail recursion and accumulators; why tail recursion is more efficient. Forcing evaluation. Minimizing consing. More efficient reversal. Use of 'as'. Imperative features: exceptions, references, arrays and sequencing. Imperative features and types; the value restriction.

9. ML examples I: symbolic differentiation. Symbolic computation. Data representation. Operator precedence. Association lists. Prettyprinting expressions. Installing the printer. Differentiation. Simplification. The problem of the 'right' simplification.

10. ML examples II: recursive descent parsing. Grammars and the parsing problem. Fixing ambiguity. Recursive descent. Parsers in ML. Parser combinators; examples. Lexical analysis using the same techniques. A parser for terms. Automating precedence parsing. Avoiding backtracking. Comparison with other techniques.

11. ML examples III: exact real arithmetic. Real numbers and finite representations. Real numbers as programs or functions. Our representation of reals. Arbitrary precision integers. Injecting integers into the reals. Negation and absolute value. Addition; the importance of rounding division. Multiplication and division by integers. General multiplication. Inverse and division. Ordering and equality. Testing. Avoiding reevaluation through memo functions.

12. ML examples IV: Prolog and theorem proving. Prolog terms. Case-sensitive lexing. Parsing and printing, including list syntax. Unification. Backtracking search. Prolog examples. Prolog-style theorem proving. Manipulating formulas; negation normal form. Basic prover; the use of continuations. Examples: Pelletier problems and whodunit.

Contents

1 Introduction
    1.1 The merits of functional programming
    1.2 Outline

2 Lambda calculus
    2.1 The benefits of lambda notation
    2.2 Russell's paradox
    2.3 Lambda calculus as a formal system
        2.3.1 Lambda terms
        2.3.2 Free and bound variables
        2.3.3 Substitution
        2.3.4 Conversions
        2.3.5 Lambda equality
        2.3.6 Extensionality
        2.3.7 Lambda reduction
        2.3.8 Reduction strategies
        2.3.9 The Church-Rosser theorem
    2.4 Combinators

3 Lambda calculus as a programming language
    3.1 Representing data in lambda calculus
        3.1.1 Truth values and the conditional
        3.1.2 Pairs and tuples
        3.1.3 The natural numbers
    3.2 Recursive functions
    3.3 Let expressions
    3.4 Steps towards a real programming language
    3.5 Further reading

4 Types
    4.1 Typed lambda calculus
        4.1.1 The stock of types
        4.1.2 Church and Curry typing
        4.1.3 Formal typability rules
        4.1.4 Type preservation
    4.2 Polymorphism
        4.2.1 Let polymorphism
        4.2.2 Most general types
    4.3 Strong normalization

5 A taste of ML
    5.1 Eager evaluation
    5.2 Consequences of eager evaluation
    5.3 The ML family
    5.4 Starting up ML
    5.5 Interacting with ML
    5.6 Bindings and declarations
    5.7 Polymorphic functions
    5.8 Equality of functions

6 Further ML
    6.1 Basic datatypes and operations
    6.2 Syntax of ML phrases
    6.3 Further examples
    6.4 Type definitions
        6.4.1 Pattern matching
        6.4.2 Recursive types
        6.4.3 Tree structures
        6.4.4 The subtlety of recursive types

7 Proving programs correct
    7.1 Functional programs as mathematical objects
    7.2 Exponentiation
    7.3 Greatest common divisor
    7.4 Appending
    7.5 Reversing

8 Effective ML
    8.1 Useful combinators
    8.2 Writing efficient code
        8.2.1 Tail recursion and accumulators
        8.2.2 Minimizing consing
        8.2.3 Forcing evaluation
    8.3 Imperative features
        8.3.1 Exceptions
        8.3.2 References and arrays
        8.3.3 Sequencing
        8.3.4 Interaction with the type system

9 Examples
    9.1 Symbolic differentiation
        9.1.1 First order terms
        9.1.2 Printing
        9.1.3 Derivatives
        9.1.4 Simplification
    9.2 Parsing
        9.2.1 Recursive descent parsing
        9.2.2 Parser combinators
        9.2.3 Lexical analysis
        9.2.4 Parsing terms
        9.2.5 Automatic precedence parsing
        9.2.6 Defects of our approach
    9.3 Exact real arithmetic
        9.3.1 Representation of real numbers
        9.3.2 Arbitrary-precision integers
        9.3.3 Basic operations
        9.3.4 General multiplication
        9.3.5 Multiplicative inverse
        9.3.6 Ordering relations
        9.3.7 Caching
    9.4 Prolog and theorem proving
        9.4.1 Prolog terms
        9.4.2 Lexical analysis
        9.4.3 Parsing
        9.4.4 Unification
        9.4.5 Backtracking
        9.4.6 Examples
        9.4.7 Theorem proving

Chapter 1

Introduction

Programs in traditional languages, such as FORTRAN, Algol, C and Modula-3, rely heavily on modifying the values of a collection of variables, called the state. If we neglect the input-output operations and the possibility that a program might run continuously (e.g. the controller for a manufacturing process), we can arrive at the following abstraction. Before execution, the state has some initial value $\sigma$, representing the inputs to the program, and when the program has finished, the state has a new value $\sigma'$, including the result(s). Moreover, during execution, each command changes the state, which has therefore proceeded through some finite sequence of values:

$$\sigma = \sigma_0 \to \sigma_1 \to \sigma_2 \to \cdots \to \sigma_n = \sigma'$$

For example in a sorting program, the state initially includes an array of values, and when the program has finished, the state has been modified in such a way that these values are sorted, while the intermediate states represent progress towards this goal.

The state is typically modified by assignment commands, often written in the form v = E or v := E where v is a variable and E some expression. These commands can be executed in a sequential manner by writing them one after the other in the program, often separated by a semicolon. By using statements like if and while, one can execute these commands conditionally, and repeatedly, depending on other properties of the current state. The program amounts to a set of instructions on how to perform these state changes, and therefore this style of programming is often called imperative or procedural. Correspondingly, the traditional languages intended to support it are known as imperative or procedural languages.

Functional programming represents a radical departure from this model. Essentially, a functional program is simply an expression, and execution means evaluation of the expression.[1] We can see how this might be possible, in general terms, as follows. Assuming that an imperative program (as a whole) is deterministic, i.e. the output is completely determined by the input, we can say that the final state, or whichever fragments of it are of interest, is some function of the initial state, say $\sigma' = f(\sigma)$.[2] In functional programming this view is emphasized: the program is actually an expression that corresponds to the mathematical function $f$. Functional languages support the construction of such expressions by allowing rather powerful functional constructs.

Functional programming can be contrasted with imperative programming either in a negative or a positive sense. Negatively, functional programs do not use variables: there is no state. Consequently, they cannot use assignments, since there is nothing to assign to. Furthermore the idea of executing commands in sequence is meaningless, since the first command can make no difference to the second, there being no state to mediate between them. Positively however, functional programs can use functions in much more sophisticated ways. Functions can be treated in exactly the same way as simpler objects like integers: they can be passed to other functions as arguments and returned as results, and in general calculated with. Instead of sequencing and looping, functional languages use recursive functions, i.e. functions that are defined in terms of themselves. By contrast, most traditional languages provide poor facilities in these areas. C allows some limited manipulation of functions via pointers, but does not allow one to create new functions dynamically. FORTRAN does not even support recursion at all.

To illustrate the distinction between imperative and functional programming, the factorial function might be coded imperatively in C (without using C's unusual assignment operations) as:

    int fact(int n)
    { int x = 1;
      while (n > 0)
       { x = x * n;
         n = n - 1;
       }
      return x;
    }

whereas it would be expressed in ML, the functional language we discuss later, as a recursive function:

    let rec fact n =
      if n = 0 then 1
      else n * fact(n - 1);;

[1] Functional programming is often called 'applicative programming' since the basic mechanism is the application of functions to arguments.

[2] Compare Naur's remarks (Raphael) that he can write any program in a single statement Output = Program(Input).


In fact, this sort of definition can be used in C too. However for more sophisticated uses of functions, functional languages stand in a class by themselves.
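As a small illustration of functions being passed as arguments, returned as results and stored like any other value, here is a minimal sketch in OCaml, a modern member of the ML family. The names twice, add_n and increments are ours, chosen purely for illustration.

```ocaml
(* A function that takes another function as an argument
   and applies it twice to x. *)
let twice f x = f (f x)

(* A function that creates and returns a new function at run time. *)
let add_n n = fun x -> x + n

(* Functions are ordinary values, so they can be stored in a list. *)
let increments = [add_n 1; add_n 10; add_n 100]

let () =
  Printf.printf "%d\n" (twice (add_n 3) 0);
  List.iter (fun f -> Printf.printf "%d\n" (f 5)) increments
```

Here add_n 3 is a function that did not exist until the program ran, which is exactly the kind of dynamic creation that plain C function pointers do not offer.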

1.1 The merits of functional programming

At first sight, a language without variables or sequencing might seem completely impractical. This impression cannot be dispelled simply by a few words here. But we hope that by studying the material that follows, readers will gain an appreciation of how it is possible to do a lot of interesting programming in the functional manner.

There is nothing sacred about the imperative style, familiar though it is. Many features of imperative languages have arisen by a process of abstraction from typical computer hardware, from machine code to assemblers, to macro assemblers, and then to FORTRAN and beyond. There is no reason to suppose that such languages represent the most palatable way for humans to communicate programs to a machine. After all, existing hardware designs are not sacred either, and computers are supposed to do our bidding rather than conversely. Perhaps the right approach is not to start from the hardware and work upwards, but to start with programming languages as an abstract notation for specifying algorithms, and then work down to the hardware (Dijkstra). Actually, this tendency can be detected in traditional languages too. Even FORTRAN allows arithmetical expressions to be written in the usual way. The programmer is not burdened with the task of linearizing the evaluation of subexpressions and finding temporary storage for intermediate results.

This suggests that the idea of developing programming languages quite different from the traditional imperative ones is certainly defensible. However, to emphasize that we are not merely proposing change for change's sake, we should say a few words about why we might prefer functional programs to imperative ones.

Perhaps the main reason is that functional programs correspond more directly to mathematical objects, and it is therefore easier to reason about them. In order to get a firm grip on exactly what programs mean, we might wish to assign an abstract mathematical meaning to a program or command; this is the aim of denotational semantics (semantics = meaning). In imperative languages, this has to be done in a rather indirect way, because of the implicit dependency on the value of the state. In simple imperative languages, one can associate a command with a function $\Sigma \to \Sigma$, where $\Sigma$ is the set of possible values for the state. That is, a command takes some state and produces another state. It may fail to terminate (e.g. while true do x := x), so this function may in general be partial. Alternative semantics are sometimes preferred, e.g. in terms of predicate transformers (Dijkstra). But if we add features that can pervert the execution sequence in more complex ways, e.g. goto, or C's break and continue, even these interpretations no longer work, since one command can cause the later commands to be skipped. Instead, one typically uses a more complicated semantics based on continuations.

By contrast functional programs, in the words of Henson, 'wear their semantics on their sleeves'.[3] We can illustrate this using ML. The basic datatypes have a direct interpretation as mathematical objects. Using the standard notation of $[\![X]\!]$ for 'the semantics of $X$', we can say for example that $[\![\mathtt{int}]\!] = \mathbb{Z}$. Now the ML function fact defined by:

    let rec fact n =
      if n = 0 then 1
      else n * fact(n - 1);;

has one argument of type int, and returns a value of type int, so it can simply be associated with an abstract partial function $\mathbb{Z} \to \mathbb{Z}$:

$$[\![\mathtt{fact}]\!](n) = \begin{cases} n! & \text{if } n \geq 0 \\ \bot & \text{otherwise} \end{cases}$$

(Here $\bot$ denotes undefinedness, since for negative arguments, the program fails to terminate.) This kind of simple interpretation, however, fails in non-functional programs, since so-called 'functions' might not be functions at all in the mathematical sense. For example, the standard C library features a function rand(), which returns different, pseudo-random values on successive calls. This sort of thing can be implemented by using a local static variable to remember the previous result, e.g.

    int rand(void)
    { static int n = 0;
      return n = 2147001325 * n + 715136305;
    }

Thus, one can see the abandonment of variables and assignments as the logical next step after the abandonment of goto, since each step makes the semantics simpler. A simpler semantics makes reasoning about programs more straightforward. This opens up more possibilities for correctness proofs, and for provably correct transformations into more efficient programs.
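To see how a generator like rand can be turned back into a function in the mathematical sense, one can pass the state in as an argument and hand the new state back with the result. The following OCaml sketch does this; the name next_rand and the small constants are hypothetical, chosen only to keep the arithmetic easy to follow, and are not those of any C library.

```ocaml
(* A pure pseudo-random step. The "state" is an ordinary argument,
   and the result is a pair (value, new state).
   The constants 75, 74 and 65537 are illustrative assumptions. *)
let next_rand seed =
  let r = (75 * seed + 74) mod 65537 in
  (r, r)

let () =
  let (r1, s1) = next_rand 0 in
  let (r2, _) = next_rand s1 in
  (* Calling again with the same state gives the same answer. *)
  let (r1', _) = next_rand 0 in
  assert (r1 = r1');
  Printf.printf "%d %d\n" r1 r2
```

Because nothing is hidden in a static variable, next_rand denotes an ordinary function from integers to pairs of integers, and successive values are obtained by threading the state through the calls by hand.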

Another potential advantage of functional languages is the following. Since the evaluation of expressions has no side-effect on any state, separate subexpressions can be evaluated in any order without affecting each other. This means that functional programs may lend themselves well to parallel implementation, i.e. the computer can automatically farm out different subexpressions to different processors. By contrast, imperative programs often impose a fairly rigid order of execution, and even the limited interleaving of instructions in modern pipelined processors turns out to be complicated and full of technical problems.

[3] More: denotational semantics can be seen as an attempt to turn imperative languages into functional ones by making the state explicit.

Actually, ML is not a purely functional programming language; it does have variables and assignments if required. Most of the time, we will work inside the purely functional subset. But even if we do use assignments, and lose some of the preceding benefits, there are advantages too in the more flexible use of functions that languages like ML allow. Programs can often be expressed in a very concise and elegant style[4] using higher-order functions (functions that operate on other functions). Code can be made more general, since it can be parametrized even over other functions. For example, a program to add up a list of numbers and a program to multiply a list of numbers can be seen as instances of the same program, parametrized by the pairwise arithmetic operation and the corresponding identity. In one case it is given + and 0 and in the other case, * and 1.[5] Finally, functions can also be used to represent infinite data in a convenient way; for example we will show later how to use functions to perform exact calculation with real numbers, as distinct from floating point approximations.
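The remark about adding and multiplying lists can be made concrete with a short OCaml sketch. The function name combine is ours, used only for illustration; it takes the pairwise operation and its identity as parameters.

```ocaml
(* Combine the elements of a list using a binary operation op,
   starting from its identity element e. *)
let rec combine op e l =
  match l with
  | [] -> e
  | h :: t -> op h (combine op e t)

(* The two programs from the text are instances of combine. *)
let sum = combine ( + ) 0
let product = combine ( * ) 1

let () =
  Printf.printf "%d %d\n" (sum [1; 2; 3; 4]) (product [1; 2; 3; 4])
```

The standard library function List.fold_right plays the same role as combine, with the list and the identity supplied in the opposite order.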

At the same time, functional programs are not without their problems. Since they correspond less directly to the eventual execution in hardware, it can be difficult to reason about their exact usage of resources such as time and space. Input-output is also difficult to incorporate neatly into a functional model, though there are ingenious techniques based on infinite sequences.

It is up to readers to decide, after reading this book, on the merits of the functional style. We do not wish to enforce any ideologies, merely to point out that there are different ways of looking at programming, and that in the right situations, functional programming may have considerable merits. Most of our examples are chosen from areas that might loosely be described as 'symbolic computation', for we believe that functional programs work well in such applications. However, as always one should select the most appropriate tool for the job. It may be that imperative programming, object-oriented programming or logic programming are more suited to certain tasks. Horses for courses.

1.2 Outline

For those used to imperative programming, the transition to functional programming is inevitably difficult, whatever approach is taken. While some will be impatient to get quickly to real programming, we have chosen to start with lambda calculus, and show how it can be seen as the theoretical underpinning for functional languages. This has the merit of corresponding quite well to the actual historical line of development.

[4] Elegance is subjective and conciseness is not an end in itself. Functional languages and other languages like APL often create a temptation to produce very short tricky code which is elegant to cognoscenti but obscure to outsiders.

[5] This parallels the notion of abstraction in pure mathematics, e.g. that the additive and multiplicative structures over numbers are instances of the abstract notion of a monoid. This similarly avoids duplication and increases elegance.

So first we introduce lambda calculus, and show how what was originally intended as a formal logical system for mathematics turned out to be a completely general programming language. We then discuss why we might want to add types to lambda calculus, and show how it can be done. This leads us into ML, which is essentially an extended and optimized implementation of typed lambda calculus with a certain evaluation strategy. We cover the practicalities of basic functional programming in ML, and discuss polymorphism and most general types. We then move on to more advanced topics including exceptions and ML's imperative features. We conclude with some substantial examples, which we hope provide evidence for the power of ML.

Further reading

Numerous textbooks on 'functional programming' include a general introduction to the field and a contrast with imperative programming; browse through a few and find one that you like. For example, Henson contains a good introductory discussion, and features a similar mixture of theory and practice to this text. A detailed and polemical advocacy of the functional style is given by Backus, the main inventor of FORTRAN. Gordon discusses the problems of incorporating input-output into functional languages, and some solutions. Readers interested in denotational semantics, for imperative and functional languages, may look at Winskel.

Chapter 2

Lambda calculus

Lambda calculus is based on the so-called 'lambda notation' for denoting functions. In informal mathematics, when one wants to refer to a function, one usually first gives the function an arbitrary name, and thereafter uses that name, e.g.

    Suppose $f : \mathbb{R} \to \mathbb{R}$ is defined by:

    $$f(x) = \begin{cases} 0 & \text{if } x = 0 \\ x^2 \sin(1/x^2) & \text{if } x \neq 0 \end{cases}$$

    Then $f'(x)$ is not Lebesgue integrable over the unit interval $[0, 1]$.

Most programming languages, C for example, are similar in this respect: we can define functions only by giving them names. For example, in order to use the successor function (which adds 1 to its argument) in nontrivial ways (e.g. consider a pointer to it), then even though it is very simple, we need to name it via some function definition such as:

    int suc(int n)
    { return n + 1;
    }

In either mathematics or programming, this seems quite natural, and generally works well enough. However it can get clumsy when higher order functions (functions that manipulate other functions) are involved. In any case, if we want to treat functions on a par with other mathematical objects, the insistence on naming is rather inconsistent. When discussing an arithmetical expression built up from simpler ones, we just write the subexpressions down, without needing to give them names. Imagine if we always had to deal with arithmetic expressions in this way:

    Define $x$ and $y$ by $x = 2$ and $y = 4$ respectively. Then $x x = y$.

Lambda notation allows one to denote functions in much the same way as any other sort of mathematical object. There is a mainstream notation sometimes used in mathematics for this purpose, though it's normally still used as part of the definition of a temporary name. We can write

$$x \mapsto t[x]$$

to denote the function mapping any argument $x$ to some arbitrary expression $t[x]$, which usually, but not necessarily, contains $x$ (it is occasionally useful to "throw away" an argument). However, we shall use a different notation developed by Church:

$$\lambda x.\ t[x]$$

which should be read in the same way. For example, $\lambda x.\ x$ is the identity function which simply returns its argument, while $\lambda x.\ x^2$ is the squaring function.
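For comparison, languages in the ML family let one write such unnamed functions directly. The sketch below uses OCaml's fun x -> ... as the counterpart of $\lambda x.\ \ldots$; the names seven, forty_nine and squares are ours and only label the results.

```ocaml
(* The identity function, written without a name, applied to 7. *)
let seven = (fun x -> x) 7

(* The squaring function, again unnamed, applied to 7. *)
let forty_nine = (fun x -> x * x) 7

(* An anonymous function handed directly to the standard List.map. *)
let squares = List.map (fun x -> x * x) [1; 2; 3]

let () =
  Printf.printf "%d %d\n" seven forty_nine;
  List.iter (fun n -> Printf.printf "%d\n" n) squares
```

Nothing here required a separate definition like suc: the functions are written down at the point of use, just like any other subexpression.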

The symbol $\lambda$ is completely arbitrary, and no significance should be read into it. Indeed one often sees, particularly in French texts, the alternative notation $[x]\ t[x]$. Apparently it arose by a complicated process of evolution. Originally, the famous Principia Mathematica (Whitehead and Russell) used the 'hat' notation $t[\hat{x}]$ for the function of $x$ yielding $t[x]$. Church modified this to $\hat{x}.\ t[x]$, but since the typesetter could not place the hat on top of the $x$, this appeared as $\wedge x.\ t[x]$, which then mutated into $\lambda x.\ t[x]$ in the hands of another typesetter.

2.1 The benefits of lambda notation

Using lambda notation we can clear up some of the confusion engendered by informal mathematical notation. For example, it's common to talk sloppily about '$f(x)$', leaving context to determine whether we mean $f$ itself, or the result of applying it to particular $x$. A further benefit is that lambda notation gives an attractive analysis of practically the whole of mathematical notation. If we start with variables and constants, and build up expressions using just lambda-abstraction and application of functions to arguments, we can represent very complicated mathematical expressions.

We will use the conventional notation $f(x)$ for the application of a function $f$ to an argument $x$, except that, as is traditional in lambda notation, the brackets may be omitted, allowing us to write just $f\ x$. For reasons that will become clear in the next paragraph, we assume that function application associates to the left, i.e. $f\ x\ y$ means $(f(x))(y)$. As a shorthand for $\lambda x.\ \lambda y.\ t[x, y]$ we will use $\lambda x\ y.\ t[x, y]$, and so on. We also assume that the scope of a lambda abstraction extends as far to the right as possible. For example $\lambda x.\ x\ y$ means $\lambda x.\ (x\ y)$ rather than $(\lambda x.\ x)\ y$.


At first sight, we need some special notation for functions of several arguments. However there is a way of breaking down such applications into ordinary lambda notation, called currying, after the logician Curry. (Actually the device had previously been used by both Frege and Schönfinkel, but it's easy to understand why the corresponding appellations haven't caught the public imagination.) The idea is to use expressions like $\lambda x\ y.\ x + y$. This may be regarded as a function $\mathbb{R} \to (\mathbb{R} \to \mathbb{R})$, so it is said to be a 'higher order function' or 'functional' since when applied to one argument, it yields another function, which then accepts the second argument. In a sense, it takes its arguments one at a time rather than both together. So we have for example:

$$(\lambda x\ y.\ x + y)\ 1\ 2 = (\lambda y.\ 1 + y)\ 2 = 1 + 2$$

Observe that function application is assumed to associate to the left in lambda notation precisely because currying is used so much.
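Currying carries over directly to ML-style languages. The following OCaml sketch mirrors the calculation above; the names add, add_one and add_pair are ours, introduced only for illustration.

```ocaml
(* A curried addition function, written as nested one-argument functions. *)
let add = fun x -> fun y -> x + y

(* Applying add to one argument yields another function,
   which then accepts the second argument. *)
let add_one = add 1

(* The uncurried alternative takes both arguments together, as a pair. *)
let add_pair (x, y) = x + y

let () =
  (* add 1 2 parses as (add 1) 2, since application associates to the left. *)
  Printf.printf "%d %d %d\n" (add 1 2) (add_one 2) (add_pair (1, 2))
```

The curried form is usually the more convenient one, since partial applications such as add_one come for free.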

Lambda notation is particularly helpful in providing a unified treatment of bound variables. Variables in mathematics normally express the dependency of some expression on the value of that variable; for example, the value of $x^2 + 2$ depends on the value of $x$. In such contexts, we will say that a variable is free. However there are other situations where a variable is merely used as a place-marker, and does not indicate such a dependency. Two common examples are the variable $m$ in

$$\sum_{m=1}^{n} m = \frac{n(n+1)}{2}$$

and the variable $y$ in

$$\int_0^x 2y + a \, dy = x^2 + ax$$

In logic, the quantifiers $\forall x.\ P[x]$ ('for all $x$, $P[x]$') and $\exists x.\ P[x]$ ('there exists an $x$ such that $P[x]$') provide further examples, and in set theory we have set abstractions like $\{x \mid P[x]\}$ as well as indexed unions and intersections. In such cases, a variable is said to be bound. In a certain subexpression it is free, but in the whole expression, it is bound by a variable-binding operation like summation. The part 'inside' this variable-binding operation is called the scope of the bound variable.

A similar situation occurs in most programming languages, at least from Algol 60 onwards. Variables have a definite scope, and the formal arguments of procedures and functions are effectively bound variables, e.g. n in the C definition of the successor function given above. One can actually regard variable declarations as binding operations for the enclosed instances of the corresponding variables. Note, by the way, that the scope of a variable should be distinguished sharply from its lifetime. In the C function rand that we gave in the introduction, n had a textually limited scope but it retained its value even outside the execution of that part of the code.

We can freely change the name of a bound variable without changing the meaning of the expression, e.g.

$$\int_0^x 2z + a \, dz = x^2 + ax$$

Similarly, in lambda notation, $\lambda x.\ E[x]$ and $\lambda y.\ E[y]$ are equivalent; this is called alpha-equivalence and the process of transforming between such pairs is called alpha-conversion. We should add the proviso that $y$ is not a free variable in $E[x]$, or the meaning clearly may change, just as

$$\int_0^x 2a + a \, da \neq x^2 + ax$$

It is possible to have identically-named free and bound variables in the same expression; though this can be confusing, it is technically unambiguous, e.g.

$$\int_0^x 2x + a \, dx = x^2 + ax$$

In fact the usual Leibniz notation for derivatives has just this property, e.g. in:

$$\frac{d}{dx} x^2 = 2x$$

$x$ is used both as a bound variable to indicate that differentiation is to take place with respect to $x$, and as a free variable to show where to evaluate the resulting derivative. This can be confusing; e.g. $f'(g(x))$ is usually taken to mean something different from $\frac{d}{dx} f(g(x))$. Careful writers, especially in multivariate work, often make the separation explicit by writing:

$$\left| \frac{d}{dx} x^2 \right|_x = 2x$$

or

$$\left| \frac{d}{dz} z^2 \right|_x = 2x$$

Part of the appeal of lambda notation is that all variable-binding operations like summation, differentiation and integration can be regarded as functions applied to lambda-expressions. Subsuming all variable-binding operations by lambda abstraction allows us to concentrate on the technical problems of bound variables in one particular situation. For example, we can view $\frac{d}{dx} x^2$ as a syntactic sugaring of $D\ (\lambda x.\ x^2)\ x$ where $D : (\mathbb{R} \to \mathbb{R}) \to \mathbb{R} \to \mathbb{R}$ is a differentiation operator, yielding the derivative of its first (function) argument at the point indicated by its second argument. Breaking down the everyday syntax completely into lambda notation, we have $D\ (\lambda x.\ \mathit{EXP}\ x\ 2)\ x$ for some constant $\mathit{EXP}$ representing the exponential function.
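To make the idea of $D$ as an ordinary higher order function concrete, here is a small Python sketch. It is only a numerical approximation using a central difference, an assumption of ours; the $D$ in the text is the exact differentiation operator.

```python
def D(f, h=1e-6):
    """Approximate differentiation operator of type (R -> R) -> R -> R.

    The central difference and the step size h are choices made for
    this sketch only.
    """
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

# D (\x. x^2) 3: the lambda abstraction is the function argument,
# and 3 is the point at which the derivative is evaluated.
slope_x = D(lambda x: x ** 2)(3.0)

# Renaming the bound variable changes nothing: D (\z. z^2) 3.
slope_z = D(lambda z: z ** 2)(3.0)

print(slope_x, slope_z)
```

The bound variable of the lambda expression and the evaluation point are now visibly separate, which is exactly the distinction that the Leibniz notation blurs.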


In this way, lambda notation is an attractively general 'abstract syntax' for mathematics; all we need is the appropriate stock of constants to start with. Lambda abstraction seems, in retrospect, to be the appropriate primitive in terms of which to analyze variable binding. This idea goes back to Church's encoding of higher order logic in lambda notation, and as we shall see in the next chapter, Landin has pointed out how many constructs from programming languages have a similar interpretation. In recent times, the idea of using lambda notation as a universal abstract syntax has been put especially clearly by Martin-Löf, and is often referred to in some circles as 'Martin-Löf's theory of expressions and arities'.

2.2 Russell's paradox

As we have said, one of the appeals of lambda notation is that it permits an analysis of more or less all of mathematical syntax. Originally, Church hoped to go further and include set theory, which, as is well known, is powerful enough to form a foundation for much of modern mathematics. Given any set $S$, we can form its so-called characteristic predicate $\chi_S$, such that:

$$\chi_S(x) = \begin{cases} \text{true} & \text{if } x \in S \\ \text{false} & \text{if } x \notin S \end{cases}$$

Conversely, given any unary predicate (i.e. function of one argument) $P$, we can consider the set of all $x$ satisfying $P(x)$; we will just write $P(x)$ for $P(x) = \text{true}$. Thus, we see that sets and predicates are just different ways of talking about the same thing. Instead of regarding $S$ as a set, and writing $x \in S$, we can regard it as a predicate and write $S(x)$.

This permits a natural analysis into lambda notation: we can allow arbitrary lambda expressions as functions, and hence indirectly as sets. Unfortunately, this turns out to be inconsistent. The simplest way to see this is to consider the Russell paradox of the set of all sets that do not contain themselves:

$$R = \{x \mid x \notin x\}$$

We have $R \in R \Leftrightarrow R \notin R$, a stark contradiction. In terms of lambda defined functions, we set $R = \lambda x.\ \neg (x\ x)$, and find that $R\ R = \neg (R\ R)$, obviously counter to the intuitive meaning of the negation operator $\neg$.

To avoid such paradoxes, Church (1940) followed Russell in augmenting lambda notation with a notion of type; we shall consider this in a later chapter. However the paradox itself is suggestive of some interesting possibilities in the standard, untyped, system, as we shall see later.

Footnote (on Martin-Löf's theory of expressions and arities): This was presented at the Brouwer Symposium in 1981, but was not described in the printed proceedings.


2.3 Lambda calculus as a formal system

We have taken for granted certain obvious facts, e.g. that $(\lambda y.\ 1 + y)\ 2 = 1 + 2$, since these reflect the intended meaning of abstraction and application, which are in a sense converse operations. Lambda calculus arises if we enshrine certain such principles, and only those, as a set of formal rules. The appeal of this is that the rules can then be used mechanically, just as one might transform $x - 3 = 5 - x$ into $2x = 5 + 3$ without pausing each time to think about why these rules about moving things from one side of the equation to the other are valid. As Whitehead (1919) says, symbolism and formal rules of manipulation:

[...] have invariably been introduced to make things easy. [...] by the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye, which otherwise would call into play the higher faculties of the brain. [...] Civilisation advances by extending the number of important operations which can be performed without thinking about them.

2.3.1 Lambda terms

Lambda calculus is based on a formal notion of lambda term, and these terms are built up from variables and some fixed set of constants using the operations of function application and lambda abstraction. This means that every lambda term falls into one of the following four categories:

1. Variables: these are indexed by arbitrary alphanumeric strings; typically we will use single letters from towards the end of the alphabet, e.g. $x$, $y$ and $z$.

2. Constants: how many constants there are in a given syntax of lambda terms depends on context. Sometimes there are none at all. We will also denote them by alphanumeric strings, leaving context to determine when they are meant to be constants.

3. Combinations, i.e. the application of a function $s$ to an argument $t$; both these components $s$ and $t$ may themselves be arbitrary $\lambda$-terms. We will write combinations simply as $s\ t$. We often refer to $s$ as the 'rator' and $t$ as the 'rand' (short for 'operator' and 'operand' respectively).

4. Abstractions of an arbitrary lambda-term $s$ over a variable $x$ (which may or may not occur free in $s$), denoted by $\lambda x.\ s$.

Formally, this defines the set of lambda terms inductively, i.e. lambda terms arise only in these four ways. This justifies our:

• Defining functions over lambda terms by primitive recursion.

• Proving properties of lambda terms by structural induction.

A formal discussion of inductive generation, and the notions of primitive recursion and structural induction, may be found elsewhere. We hope most readers who are unfamiliar with these terms will find that the examples below give them a sufficient grasp of the basic ideas.

We can describe the syntax of lambda terms by a BNF (Backus-Naur form) grammar, just as we do for programming languages.

$$\mathit{Exp} = \mathit{Var} \mid \mathit{Const} \mid \mathit{Exp}\ \mathit{Exp} \mid \lambda\, \mathit{Var}.\ \mathit{Exp}$$

and, following the usual computer science view, we will identify lambda terms with abstract syntax trees, rather than with sequences of characters. This means that conventions such as the left-association of function application, the reading of $\lambda x\, y.\ s$ as $\lambda x.\ \lambda y.\ s$, and the ambiguity over constant and variable names are purely a matter of parsing and printing for human convenience, and not part of the formal system.

One feature worth mentioning is that we use single characters to stand for variables and constants in the formal system of lambda terms and as so-called 'metavariables' standing for arbitrary terms. For example, $\lambda x.\ s$ might represent the constant function with value $s$, or an arbitrary lambda abstraction using the variable $x$. To make this less confusing, we will normally use letters such as $s$, $t$ and $u$ for metavariables over terms. It would be more precise if we denoted the variable $x$ by $V_x$ (the $x$'th variable) and likewise the constant $k$ by $C_k$; then all the variables in terms would have the same status. However this makes the resulting terms a bit cluttered.

2.3.2 Free and bound variables

We now formalize the intuitive idea of free and bound variables in a term, which, incidentally, gives a good illustration of defining a function by primitive recursion. Intuitively, a variable in a term is free if it does not occur inside the scope of a corresponding abstraction. We will denote the set of free variables in a term $s$ by $FV(s)$, and define it by recursion as follows:

$$\begin{aligned}
FV(x) &= \{x\} \\
FV(c) &= \emptyset \\
FV(s\ t) &= FV(s) \cup FV(t) \\
FV(\lambda x.\ s) &= FV(s) - \{x\}
\end{aligned}$$

Similarly we can define the set of bound variables in a term $BV(s)$:

$$\begin{aligned}
BV(x) &= \emptyset \\
BV(c) &= \emptyset \\
BV(s\ t) &= BV(s) \cup BV(t) \\
BV(\lambda x.\ s) &= BV(s) \cup \{x\}
\end{aligned}$$

For example, if $s = (\lambda x\, y.\ x)\ (\lambda x.\ z\ x)$ we have $FV(s) = \{z\}$ and $BV(s) = \{x, y\}$. Note that in general a variable can be both free and bound in the same term, as illustrated by some of the mathematical examples earlier. As an example of using structural induction to establish properties of lambda terms, we will prove the following theorem (a similar proof works for $BV$ too):

Theorem 2.1 For any lambda term $s$, the set $FV(s)$ is finite.

Proof. By structural induction. Certainly if $s$ is a variable or a constant, then by definition $FV(s)$ is finite, since it is either a singleton or empty. If $s$ is a combination $t\ u$ then by the inductive hypothesis, $FV(t)$ and $FV(u)$ are both finite, and then $FV(s) = FV(t) \cup FV(u)$, which is therefore also finite (the union of two finite sets is finite). Finally, if $s$ is of the form $\lambda x.\ t$ then $FV(t)$ is finite, by the inductive hypothesis, and by definition $FV(s) = FV(t) - \{x\}$ which must be finite too, since it is no larger. Q.E.D.
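The recursive definitions of $FV$ and $BV$ translate almost line for line into code. The following Python sketch assumes an encoding of our own choosing, with terms as tagged tuples `("var", x)`, `("const", c)`, `("app", s, t)` and `("lam", x, s)`; the function names `fv` and `bv` are likewise ours.

```python
def fv(term):
    """Set of free variables, following the four clauses for FV."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return fv(term[1]) | fv(term[2])
    if tag == "lam":
        return fv(term[2]) - {term[1]}
    raise ValueError(f"not a lambda term: {term!r}")

def bv(term):
    """Set of bound variables, following the four clauses for BV."""
    tag = term[0]
    if tag in ("var", "const"):
        return set()
    if tag == "app":
        return bv(term[1]) | bv(term[2])
    if tag == "lam":
        return bv(term[2]) | {term[1]}
    raise ValueError(f"not a lambda term: {term!r}")

# s = (\x y. x) (\x. z x), the example from the text.
s = ("app",
     ("lam", "x", ("lam", "y", ("var", "x"))),
     ("lam", "x", ("app", ("var", "z"), ("var", "x"))))

print(sorted(fv(s)), sorted(bv(s)))
```

Each function recurses only on immediate subterms, which is the pattern of primitive recursion over the inductive definition of terms, and it returns a finite Python set, in line with the theorem.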

2.3.3 Substitution

The rules we want to formalize include the stipulation that lambda abstraction and function application are inverse operations. That is, if we take a term $\lambda x.\ s$ and apply it as a function to an argument term $t$, the answer is the term $s$ with all free instances of $x$ replaced by $t$. We often make this more transparent in discussions by using the notation $\lambda x.\ s[x]$ and $s[t]$ for the respective terms.

However this simple-looking notion of substituting one term for a variable in another term is surprisingly difficult. Some notable logicians have made faulty statements regarding substitution. In fact, this difficulty is rather unfortunate since as we have said, the appeal of formal rules is that they can be applied mechanically.

We will denote the operation of substituting a term $s$ for a variable $x$ in another term $t$ by $t[s/x]$. One sometimes sees various other notations, e.g. $t[x{:=}s]$, $[s/x]t$, or even $t[x/s]$. The notation we use is perhaps most easily remembered by noting the vague analogy with multiplication of fractions: $x[t/x] = t$. At first sight, one can define substitution formally by recursion as follows:

$$\begin{aligned}
x[t/x] &= t \\
y[t/x] &= y \quad \text{if } x \neq y \\
c[t/x] &= c \\
(s_1\ s_2)[t/x] &= s_1[t/x]\ s_2[t/x] \\
(\lambda x.\ s)[t/x] &= \lambda x.\ s \\
(\lambda y.\ s)[t/x] &= \lambda y.\ (s[t/x]) \quad \text{if } x \neq y
\end{aligned}$$

However this isn't quite right. For example $(\lambda y.\ x + y)[y/x] = \lambda y.\ y + y$, which doesn't correspond to the intuitive answer. The original lambda term was 'the function that adds $x$ to its argument', so after substitution, we might expect to get 'the function that adds $y$ to its argument'. What we actually get is 'the function that doubles its argument'. The problem is that the variable $y$ that we have substituted is captured by the variable-binding operation $\lambda y.\ \ldots$. We should first rename the bound variable:

$$(\lambda y.\ x + y) = (\lambda w.\ x + w)$$

and only now perform a naive substitution operation:

$$(\lambda w.\ x + w)[y/x] = \lambda w.\ y + w$$

We can take two approaches to this problem. Either we can add a condition on all instances of substitution to disallow it wherever variable capture would occur, or we can modify the formal definition of substitution so that it performs the appropriate renamings automatically. We will opt for this latter approach. Here is the definition of substitution that we use:

$$\begin{aligned}
x[t/x] &= t \\
y[t/x] &= y \quad \text{if } x \neq y \\
c[t/x] &= c \\
(s_1\ s_2)[t/x] &= s_1[t/x]\ s_2[t/x] \\
(\lambda x.\ s)[t/x] &= \lambda x.\ s \\
(\lambda y.\ s)[t/x] &= \lambda y.\ (s[t/x]) \quad \text{if } x \neq y \text{ and either } x \notin FV(s) \text{ or } y \notin FV(t) \\
(\lambda y.\ s)[t/x] &= \lambda z.\ (s[z/y][t/x]) \quad \text{otherwise, where } z \notin FV(s) \cup FV(t)
\end{aligned}$$

The only difference is in the last two lines. We substitute as before in the two safe situations where either $x$ isn't free in $s$, so the substitution is trivial, or where $y$ isn't free in $t$, so variable capture won't occur (at this level). However where these conditions fail, we first rename $y$ to a new variable $z$, chosen not to be free in either $s$ or $t$, then proceed as before. For definiteness, the variable $z$ can be chosen in some canonical way, e.g. the lexicographically first name not occurring as a free variable in either $s$ or $t$.
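The corrected definition can be transcribed directly. The Python sketch below uses an assumed encoding of terms as tagged tuples `("var", x)`, `("const", c)`, `("app", s, t)` and `("lam", x, s)`. The helper `fresh` picks the first unused name of the form `v0`, `v1`, ..., which is our own canonical choice rather than the lexicographic one mentioned above.

```python
from itertools import count

def fv(term):
    """Free variables of a term."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return fv(term[1]) | fv(term[2])
    return fv(term[2]) - {term[1]}          # abstraction

def fresh(avoid):
    """First name v0, v1, ... not in `avoid` (hypothetical naming scheme)."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def subst(term, t, x):
    """Return term[t/x], renaming bound variables to avoid capture."""
    tag = term[0]
    if tag == "var":
        return t if term[1] == x else term
    if tag == "const":
        return term
    if tag == "app":
        return ("app", subst(term[1], t, x), subst(term[2], t, x))
    y, s = term[1], term[2]
    if y == x:                               # (\x. s)[t/x] = \x. s
        return term
    if x not in fv(s) or y not in fv(t):     # the two safe situations
        return ("lam", y, subst(s, t, x))
    z = fresh(fv(s) | fv(t))                 # rename y to z first
    return ("lam", z, subst(subst(s, ("var", z), y), t, x))

def plus(a, b):
    """Prefix form  + a b  of the infix term a + b."""
    return ("app", ("app", ("const", "+"), a), b)

# (\y. x + y)[y/x]: the bound y must be renamed before substituting.
adds_x = ("lam", "y", plus(("var", "x"), ("var", "y")))
renamed = subst(adds_x, ("var", "y"), "x")
print(renamed)
```

The result is the term $\lambda v_0.\ y + v_0$, 'the function that adds $y$ to its argument', rather than the doubling function produced by the naive definition.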

Footnote (on writing $x + y$): We will continue to use infix syntax for standard operators; strictly we should write $+\ x\ y$ rather than $x + y$.

Footnote (on the definition of substitution): Cognoscenti may also be worried that this definition is not in fact primitive recursive, because of the last clause. However it can easily be modified into a primitive recursive definition of multiple, parallel, substitution. This procedure is analogous to strengthening an induction hypothesis during a proof by induction. Note that by construction the pair of substitutions in the last line can be done in parallel rather than sequentially without affecting the result.

2.3.4 Conversions

Lambda calculus is based on three 'conversions', which transform one term into another one intuitively equivalent to it. These are traditionally denoted by the Greek letters $\alpha$ (alpha), $\beta$ (beta) and $\eta$ (eta). Here are the formal definitions of the operations, using annotated arrows for the conversion relations.

• Alpha conversion: $\lambda x.\ s \longrightarrow_\alpha \lambda y.\ s[y/x]$ provided $y \notin FV(s)$. For example, $\lambda u.\ u\ v \longrightarrow_\alpha \lambda w.\ w\ v$, but $\lambda u.\ u\ v \not\longrightarrow_\alpha \lambda v.\ v\ v$. The restriction avoids another instance of variable capture.

• Beta conversion: $(\lambda x.\ s)\ t \longrightarrow_\beta s[t/x]$.

• Eta conversion: $\lambda x.\ t\ x \longrightarrow_\eta t$, provided $x \notin FV(t)$. For example $\lambda u.\ v\ u \longrightarrow_\eta v$ but $\lambda u.\ u\ u \not\longrightarrow_\eta u$.

Of the three, $\beta$-conversion is the most important one to us, since it represents the evaluation of a function on an argument. $\alpha$-conversion is a technical device to change the names of bound variables, while $\eta$-conversion is a form of extensionality and is therefore mainly of interest to those taking a logical, not a programming, view of lambda calculus.

Footnote (on the names of the conversions): These names are due to Curry. Church originally referred to $\alpha$-conversion and $\beta$-conversion as 'rule of procedure I' and 'rule of procedure II' respectively.

2.3.5 Lambda equality

Using these conversion rules, we can define formally when two lambda terms are to be considered equal. Roughly, two terms are equal if it is possible to get from one to the other by a finite sequence of conversions ($\alpha$, $\beta$ or $\eta$), either forward or backward, at any depth inside the term. We can say that lambda equality is the congruence closure of the three reduction operations together, i.e. the smallest relation containing the three conversion operations and closed under reflexivity, symmetry, transitivity and substitutivity. Formally we can define it inductively as follows, where the horizontal lines should be read as 'if what is above the line holds, then so does what is below':

$$\frac{s \longrightarrow_\alpha t \ \text{ or } \ s \longrightarrow_\beta t \ \text{ or } \ s \longrightarrow_\eta t}{s = t}$$

$$t = t$$

$$\frac{s = t}{t = s}$$

$$\frac{s = t \ \text{ and } \ t = u}{s = u}$$

$$\frac{s = t}{s\ u = t\ u}$$

$$\frac{s = t}{u\ s = u\ t}$$

$$\frac{s = t}{\lambda x.\ s = \lambda x.\ t}$$

Note that the use of the ordinary equality symbol ($=$) here is misleading. We are actually defining the relation of lambda equality, and it isn't clear that it corresponds to equality of the corresponding mathematical objects in the usual sense. Certainly it must be distinguished sharply from equality at the syntactic level. We will refer to this latter kind of equality as 'identity' and use the special symbol $\equiv$. For example $\lambda x.\ x \not\equiv \lambda y.\ y$ but $\lambda x.\ x = \lambda y.\ y$.

For many purposes, $\alpha$-conversions are immaterial, and often $\equiv_\alpha$ is used instead of strict identity. This is defined like lambda equality, except that only $\alpha$-conversions are allowed. For example, $(\lambda x.\ x)\ y \equiv_\alpha (\lambda y.\ y)\ y$. Many writers use this as identity on lambda terms, i.e. consider equivalence classes of terms under $\equiv_\alpha$. There are alternative formalizations of syntax where bound variables are unnamed (de Bruijn 1972), and here syntactic identity corresponds to our $\equiv_\alpha$.
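One way to see why unnamed bound variables make $\equiv_\alpha$ coincide with syntactic identity is to translate named terms into a nameless form and compare the results. The Python sketch below does this under assumptions of our own: terms are tagged tuples `("var", x)`, `("const", c)`, `("app", s, t)`, `("lam", x, s)`, and a bound variable is replaced by the number of binders between it and its own binder. The function names are hypothetical.

```python
def to_nameless(term, binders=()):
    """Replace each bound variable by its distance to its binder.

    `binders` lists the enclosing bound names, innermost first.
    Free variables keep their names.
    """
    tag = term[0]
    if tag == "var":
        name = term[1]
        if name in binders:
            # index() finds the first, i.e. innermost, matching binder
            return ("bound", binders.index(name))
        return ("free", name)
    if tag == "const":
        return term
    if tag == "app":
        return ("app",
                to_nameless(term[1], binders),
                to_nameless(term[2], binders))
    # abstraction: the bound name itself is discarded
    return ("lam", to_nameless(term[2], (term[1],) + binders))

def alpha_equivalent(s, t):
    """True when s and t differ only in the names of bound variables."""
    return to_nameless(s) == to_nameless(t)

id_x = ("lam", "x", ("var", "x"))
id_y = ("lam", "y", ("var", "y"))

print(id_x == id_y, alpha_equivalent(id_x, id_y))
```

Here `id_x == id_y` plays the role of strict identity $\equiv$ on named terms, while `alpha_equivalent` plays the role of $\equiv_\alpha$.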

2.3.6 Extensionality

We have said that $\eta$-conversion embodies a principle of extensionality. In general philosophical terms, two properties are said to be extensionally equivalent (or coextensive) when they are satisfied by exactly the same objects. In mathematics, we usually take an extensional view of sets, i.e. say that two sets are equal precisely if they have the same elements. Similarly, we normally say that two functions are equal precisely if they have the same domain and give the same result on all arguments in that domain.

As a consequence of $\eta$-conversion, our notion of lambda equality is extensional. Indeed, if $f\ x$ and $g\ x$ are equal for any $x$, then in particular $f\ y = g\ y$ where $y$ is chosen not to be free in either $f$ or $g$. Therefore by the last rule above, $\lambda y.\ f\ y = \lambda y.\ g\ y$. Now by $\eta$-converting a couple of times at depth, we see that $f = g$. Conversely, extensionality implies that all instances of $\eta$-conversion do indeed give a valid equation, since by $\beta$-reduction, $(\lambda x.\ t\ x)\ y = t\ y$ for any $y$ when $x$ is not free in $t$. This is the import of $\eta$-conversion, and having discussed that, we will largely ignore it in favour of the more computationally significant $\beta$-conversion.

Footnote (on lambda equality versus equality of mathematical objects): Indeed we haven't been very precise about what the corresponding mathematical objects are. But there are models of the lambda calculus where our lambda equality is interpreted as actual equality.

2.3.7 Lambda reduction

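Lambda equality, unsurprisingly, is a symmetric relation. Though it captures the notion of equivalence of lambda terms well, it is more interesting from a computational point of view to consider an asymmetric version. We will define a 'reduction' relation $\longrightarrow$ as follows:

$$\frac{s \longrightarrow_\alpha t \ \text{ or } \ s \longrightarrow_\beta t \ \text{ or } \ s \longrightarrow_\eta t}{s \longrightarrow t}$$

$$t \longrightarrow t$$

$$\frac{s \longrightarrow t \ \text{ and } \ t \longrightarrow u}{s \longrightarrow u}$$

$$\frac{s \longrightarrow t}{s\ u \longrightarrow t\ u}$$

$$\frac{s \longrightarrow t}{u\ s \longrightarrow u\ t}$$

$$\frac{s \longrightarrow t}{\lambda x.\ s \longrightarrow \lambda x.\ t}$$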

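Actually the name 'reduction' (and one also hears $\beta$-conversion called $\beta$-reduction) is a slight misnomer, since it can cause the size of a term to grow, e.g.

$$\begin{aligned}
(\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x) &\longrightarrow (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x) \\
&\longrightarrow (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x) \\
&\longrightarrow \cdots
\end{aligned}$$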

However reduction does correspond to a systematic attempt to evaluate a term by repeatedly evaluating combinations $f(x)$ where $f$ is a lambda abstraction. When no more reductions except for $\alpha$ conversions are possible we say that the term is in normal form.


2.3.8 Reduction strategies

Let us recall, in the middle of these theoretical considerations, the relevance of all this to functional programming. A functional program is an expression and executing it means evaluating the expression. In terms of the concepts discussed here, we are proposing to start with the relevant term and keep on applying reductions until there is nothing more to be evaluated. But how are we to choose which reduction to apply at each stage? The reduction relation is not deterministic, i.e. for some terms $t$ there are several $t_i$ such that $t \longrightarrow t_i$. Sometimes this can make the difference between a finite and infinite reduction sequence, i.e. between a program terminating and failing to terminate. For example, by reducing the innermost redex (reducible expression) in the following, we have an infinite reduction sequence:

$$\begin{aligned}
&(\lambda x.\ y)\ ((\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)) \\
&\longrightarrow (\lambda x.\ y)\ ((\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)) \\
&\longrightarrow (\lambda x.\ y)\ ((\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)) \\
&\longrightarrow \cdots
\end{aligned}$$

and so ad infinitum. However the alternative of reducing the outermost redex first gives:

$$(\lambda x.\ y)\ ((\lambda x.\ x\ x\ x)\ (\lambda x.\ x\ x\ x)) \longrightarrow y$$

immediately, and there are no more reductions to apply.

The situation is clarified by the following theorems, whose proofs are too long to be given here. The first one says that the situation we have noted above is true in a more general sense, i.e. that reducing the leftmost outermost redex is the best strategy for ensuring termination.

Theorem 2.2 If $s \longrightarrow t$ with $t$ in normal form, then the reduction sequence that arises from $s$ by always reducing the leftmost outermost redex is guaranteed to terminate in normal form.

Formally, we define the 'leftmost outermost' redex recursively: for a term $(\lambda x.\ s)\ t$ it is the term itself; for any other term $s\ t$ it is the leftmost outermost redex of $s$ (or of $t$, if $s$ contains no redex), and for an abstraction $\lambda x.\ s$ it is the leftmost outermost redex of $s$. In terms of concrete syntax, we always reduce the redex whose $\lambda$ is the furthest to the left.
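The leftmost outermost strategy is short enough to sketch in code. The following Python fragment is an illustration under our own assumptions: terms are tagged tuples `("var", x)`, `("const", c)`, `("app", s, t)`, `("lam", x, s)`, only $\beta$-redexes are contracted ($\eta$ is ignored), and `step` and `normalize` are hypothetical names. The step limit in `normalize` is there only because some terms have no normal form.

```python
from itertools import count

def fv(term):
    """Free variables of a term."""
    tag = term[0]
    if tag == "var":
        return {term[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return fv(term[1]) | fv(term[2])
    return fv(term[2]) - {term[1]}

def subst(term, t, x):
    """Capture-avoiding substitution term[t/x]."""
    tag = term[0]
    if tag == "var":
        return t if term[1] == x else term
    if tag == "const":
        return term
    if tag == "app":
        return ("app", subst(term[1], t, x), subst(term[2], t, x))
    y, s = term[1], term[2]
    if y == x:
        return term
    if x not in fv(s) or y not in fv(t):
        return ("lam", y, subst(s, t, x))
    avoid = fv(s) | fv(t)
    z = next(f"v{i}" for i in count() if f"v{i}" not in avoid)
    return ("lam", z, subst(subst(s, ("var", z), y), t, x))

def step(term):
    """Contract the leftmost outermost beta-redex, or return None."""
    tag = term[0]
    if tag == "app":
        rator, rand = term[1], term[2]
        if rator[0] == "lam":                    # the term itself is the redex
            return subst(rator[2], rand, rator[1])
        reduced = step(rator)                    # otherwise look in the rator
        if reduced is not None:
            return ("app", reduced, rand)
        reduced = step(rand)                     # and only then in the rand
        return None if reduced is None else ("app", rator, reduced)
    if tag == "lam":
        reduced = step(term[2])
        return None if reduced is None else ("lam", term[1], reduced)
    return None                                  # variables and constants

def normalize(term, limit=1000):
    """Repeat normal order steps until no beta-redex remains."""
    for _ in range(limit):
        reduced = step(term)
        if reduced is None:
            return term
        term = reduced
    raise RuntimeError("no normal form found within the step limit")

# (\x. y) ((\x. x x x) (\x. x x x))
w = ("lam", "x", ("app", ("app", ("var", "x"), ("var", "x")), ("var", "x")))
growing = ("app", w, w)
term = ("app", ("lam", "x", ("var", "y")), growing)
result = normalize(term)
print(result)
```

On the example from the text, the outermost redex is contracted first and the argument, which on its own grows without bound, is simply discarded.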

2.3.9 The Church-Rosser theorem

The next assertion, the famous Church-Rosser theorem, states that if we start from a term $t$ and perform any two finite reduction sequences, there are always two more reduction sequences that bring the two back to the same term (though of course this might not be in normal form).

Theorem 2.3 If $t \longrightarrow s_1$ and $t \longrightarrow s_2$, then there is a term $u$ such that $s_1 \longrightarrow u$ and $s_2 \longrightarrow u$.

This has at least the following important consequences.

Corollary 2.4 If $t_1 = t_2$ then there is a term $u$ with $t_1 \longrightarrow u$ and $t_2 \longrightarrow u$.

Proof. It is easy to see (by structural induction) that the equality relation $=$ is in fact the symmetric transitive closure of the reduction relation. Now we can proceed by induction over the construction of the symmetric transitive closure. However less formally minded readers will probably find the following diagram more convincing:

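[Diagram: $t_1$ and $t_2$ at the top are joined by a zigzag of reductions in both directions; repeated use of the Church-Rosser theorem fills in the sides beneath the zigzag until they meet in a single term $u$ at the bottom.]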

We assume $t_1 = t_2$, so there is some sequence of reductions in both directions (i.e. the zigzag at the top) that connects them. Now the Church-Rosser theorem allows us to fill in the remainder of the sides in the above diagram, and hence reach the result by composing these reductions. Q.E.D.

Corollary 2.5 If $t = t_1$ and $t = t_2$ with $t_1$ and $t_2$ in normal form, then $t_1 \equiv_\alpha t_2$, i.e. $t_1$ and $t_2$ are equal apart from $\alpha$ conversions.

Proof. By the first corollary, we have some $u$ with $t_1 \longrightarrow u$ and $t_2 \longrightarrow u$. But since $t_1$ and $t_2$ are already in normal form, these reduction sequences to $u$ can only consist of alpha conversions. Q.E.D.

Hence normal forms, when they do exist, are unique up to alpha conversion. This gives us the first proof we have available that the relation of lambda equality isn't completely trivial, i.e. that there are any unequal terms. For example, since $\lambda x\, y.\ x$ and $\lambda x\, y.\ y$ are not interconvertible by alpha conversions alone, they cannot be equal.

Let us sum up the computational significance of all these assertions. In some sense, reducing the leftmost outermost redex is the best strategy, since it will work if any strategy will. This is known as normal order reduction. On the other hand any terminating reduction sequence will always give the same result, and moreover it is never too late to abandon a given strategy and start using normal order reduction. We will see later how this translates into practical terms.


2.4 Combinators

Combinators were actually developed as an independent theory by Schönfinkel (1924) before lambda notation came along. Moreover Curry rediscovered the theory soon afterwards, independently of Schönfinkel and of Church. (When he found out about Schönfinkel's work, Curry attempted to contact him, but by that time, Schönfinkel had been committed to a lunatic asylum.) We will distort the historical development by presenting the theory of combinators as an aspect of lambda calculus.

We will define a combinator simply to be a lambda term with no free variables. Such a term is also said to be closed; it has a fixed meaning independent of the values of any variables. Now we will later, in the course of functional programming, come across many useful combinators. But the cornerstone of the theory of combinators is that one can get away with just a few combinators, and express in terms of those and variables any term at all: the operation of lambda abstraction is unnecessary. In particular, a closed term can be expressed purely in terms of these few combinators. We start by defining:

$$\begin{aligned}
I &= \lambda x.\ x \\
K &= \lambda x\, y.\ x \\
S &= \lambda f\, g\, x.\ (f\ x)(g\ x)
\end{aligned}$$

We can motivate the names as follows. $I$ is the identity function. $K$ produces constant functions: when applied to an argument $a$ it gives the function $\lambda y.\ a$. Finally $S$ is a 'sharing' combinator, which takes two functions and an argument and shares out the argument among the functions. Now we prove the following:

Footnote (on the names): We are not claiming these are the historical reasons for them.

Footnote (on $K$): Konstant: Schönfinkel was German, though in fact he originally used $C$.

Lemma 2.6 For any lambda term $t$ not involving lambda abstraction, there is a term $u$ also not containing lambda abstractions, built up from $S$, $K$, $I$ and variables, with $FV(u) = FV(t) - \{x\}$ and $u = \lambda x.\ t$, i.e. $u$ is lambda-equal to $\lambda x.\ t$.

Proof. By structural induction on the term $t$. By hypothesis, it cannot be an abstraction, so there are just three cases to consider.

• If $t$ is a variable, then there are two possibilities. If it is equal to $x$, then $\lambda x.\ x = I$, so we are finished. If not, say $t = y$, then $\lambda x.\ y = K\ y$.

• If $t$ is a constant $c$, then $\lambda x.\ c = K\ c$.

• If $t$ is a combination, say $s\ u$, then by the inductive hypothesis, there are lambda-free terms $s'$ and $u'$ with $s' = \lambda x.\ s$ and $u' = \lambda x.\ u$. Now we claim $S\ s'\ u'$ suffices. Indeed:

$$\begin{aligned}
S\ s'\ u'\ x &= S\ (\lambda x.\ s)\ (\lambda x.\ u)\ x \\
&= ((\lambda x.\ s)\ x)((\lambda x.\ u)\ x) \\
&= s\ u \\
&= t
\end{aligned}$$

Therefore, by $\eta$-conversion, we have $S\ s'\ u' = \lambda x.\ S\ s'\ u'\ x = \lambda x.\ t$, since by the inductive hypothesis $x$ is not free in $s'$ or $u'$.

Q.E.D.

Theorem 2.7 For any lambda term $t$, there is a lambda-free term $t'$ built up from $S$, $K$, $I$ and variables, with $FV(t') = FV(t)$ and $t' = t$.

Proof. By structural induction on $t$, using the lemma. For example, if $t$ is $\lambda x.\ s$, we first find, by the inductive hypothesis, a lambda-free equivalent $s'$ of $s$. Now the lemma can be applied to $\lambda x.\ s'$. The other cases are straightforward. Q.E.D.

This remarkable fact can be strengthened, since $I$ is definable in terms of $S$ and $K$. Note that for any $A$:

$$\begin{aligned}
S\ K\ A\ x &= (K\ x)(A\ x) \\
&= (\lambda y.\ x)(A\ x) \\
&= x
\end{aligned}$$

So again by $\eta$-converting, we see that $I = S\ K\ A$ for any $A$. It is customary, for reasons that will become clear when we look at types, to use $A = K$. So $I = S\ K\ K$, and we can avoid the use of $I$ in our combinatory expression.
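The calculation above can be checked informally by writing the three combinators as curried Python functions. This is only a sketch: Python evaluates arguments eagerly, so it behaves like the lambda terms only where every application terminates, as it does in the cases tried here.

```python
# The combinators as curried Python functions.
I = lambda x: x
K = lambda x: lambda y: x
S = lambda f: lambda g: lambda x: f(x)(g(x))

# S K A behaves like I for any A; the customary choice is A = K.
SKK = S(K)(K)
SKI = S(K)(I)

print(SKK(42), SKI("a"), K("kept")("dropped"))
```

Both `SKK` and `SKI` return their argument unchanged, reflecting the fact that the second argument of $S$ is applied and then thrown away by $K$.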

Note that the proofs above are constructive, in the sense that they guide one in a definite procedure that, given a lambda term, produces a combinator equivalent. One proceeds bottom-up, and at each lambda abstraction, which by construction has a lambda-free body, applies the top-down transformations given in the lemma.
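The procedure described in the proofs can be sketched as follows. The encoding of terms as tagged tuples `("var", x)`, `("const", c)`, `("app", s, t)`, `("lam", x, s)`, and the names `abstract`, `compile_ski` and `run`, are assumptions made for this illustration. `run` interprets a lambda-free term using eager Python functions, so it is only a sanity check on terminating examples.

```python
S, K, I = ("const", "S"), ("const", "K"), ("const", "I")

def abstract(x, t):
    """Lemma: for lambda-free t, build a lambda-free term equal to \\x. t."""
    if t == ("var", x):
        return I                                  # \x. x = I
    if t[0] in ("var", "const"):
        return ("app", K, t)                      # \x. y = K y,  \x. c = K c
    # combination s u:  \x. s u = S (\x. s) (\x. u)
    return ("app", ("app", S, abstract(x, t[1])), abstract(x, t[2]))

def compile_ski(t):
    """Theorem: remove abstractions bottom-up, innermost bodies first."""
    if t[0] == "app":
        return ("app", compile_ski(t[1]), compile_ski(t[2]))
    if t[0] == "lam":
        return abstract(t[1], compile_ski(t[2]))
    return t

PRIMS = {
    "S": lambda f: lambda g: lambda x: f(x)(g(x)),
    "K": lambda x: lambda y: x,
    "I": lambda x: x,
}

def run(t, env=None):
    """Interpret a lambda-free term; free variables are looked up in env."""
    env = env or {}
    if t[0] == "const":
        return PRIMS[t[1]]
    if t[0] == "var":
        return env[t[1]]
    return run(t[1], env)(run(t[2], env))

# \x y. x   and   twice = \f x. f (f x)
first = ("lam", "x", ("lam", "y", ("var", "x")))
twice = ("lam", "f", ("lam", "x",
         ("app", ("var", "f"), ("app", ("var", "f"), ("var", "x")))))

print(compile_ski(first))
print(run(compile_ski(twice))(lambda n: n + 1)(0))
```

For $\lambda x\, y.\ x$ the inner abstraction becomes $K\ x$ and the outer one then yields $S\ (K\ K)\ I$. The translation is not optimized, so the output for larger terms grows quickly.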

Although we have presented combinators as certain lambda terms, they can also be developed as a theory in their own right. That is, one starts with a formal syntax excluding lambda abstractions but including combinators. Instead of $\alpha$, $\beta$ and $\eta$ conversions, one posits conversion rules for expressions involving combinators, e.g. $K\ x\ y \longrightarrow x$. As an independent theory, this has many analogies with lambda calculus, e.g. the Church-Rosser theorem holds for this notion of reduction too. Moreover, the ugly difficulties connected with bound variables are avoided completely. However the resulting system is, we feel, less intuitive, since combinatory expressions can get rather obscure.


Apart from their purely logical interest, combinators have a certain practical potential. As we have already hinted, and as will become much clearer in the later chapters, lambda calculus can be seen as a simple functional language, forming the core of real practical languages like ML. We might say that the theorem of combinatory completeness shows that lambda calculus can be 'compiled down' to a 'machine code' of combinators. This computing terminology is not as fanciful as it appears. Combinators have been used as an implementation technique for functional languages, and real hardware has been built to evaluate combinatory expressions.

Further reading

An encyclopedic but clear book on lambda calculus is Barendregt (1984). Another popular textbook is Hindley and Seldin (1986). Both these contain proofs of the results that we have merely asserted. A more elementary treatment focused towards computer science is Part II of Gordon (1988). Much of our presentation here and later is based on this last book.

Exercises

1. Find a normal form for $(\lambda x\, x\, x.\ x)\ a\ b\ c$.

2. Define $\mathit{twice} = \lambda f\, x.\ f(f\ x)$. What is the intuitive meaning of $\mathit{twice}$? Find a normal form for $\mathit{twice}\ \mathit{twice}\ \mathit{twice}\ f\ x$. (Remember that function application associates to the left.)

3. Find a term $t$ such that $t \longrightarrow_\beta t$. Is it true to say that a term is in normal form if and only if whenever $t \longrightarrow t'$ then $t \equiv_\alpha t'$?

4. What are the circumstances under which $s[t/x][u/y] \equiv_\alpha s[u/y][t/x]$?

5. Find an equivalent in terms of the $S$, $K$ and $I$ combinators alone for $\lambda f\, x.\ f(x\ x)$.

6. Find a single combinator $X$ such that all $\lambda$-terms are equal to a term built from $X$ and variables. You may find it helpful to consider $A = \lambda p.\ p\ K\ S\ K$ and then think about $A\ A\ A$ and $A\ (A\ A)$.

7. Prove that any $X$ is a fixed point combinator if and only if it is itself a fixed point of $G$, where $G = \lambda y\, m.\ m(y\ m)$.

Chapter 3

Lambda calculus as a programming language

One of the central questions studied by logicians in the 1930s was the Entscheidungsproblem or 'decision problem'. This asked whether there was some systematic, mechanical procedure for deciding validity in first order logic. If it turned out that there was, it would have fundamental philosophical, and perhaps practical, significance. It would mean that a huge variety of complicated mathematical problems could in principle be solved purely by following some single fixed method (we would now say algorithm): no additional creativity would be required.

As it stands the question is hard to answer, because we need to define in mathematical terms what it means for there to be a 'systematic, mechanical' procedure for something. Perhaps Turing (1936) gave the best analysis, by arguing that the mechanical procedures are those that could in principle be performed by a reasonably intelligent clerk with no knowledge of the underlying subject matter. He abstracted from the behaviour of such a clerk and arrived at the famous notion of a 'Turing machine'. Despite the fact that it arose by considering a human 'computer', and was purely a mathematical abstraction, we now see a Turing machine as a very simple computer. Despite being simple, a Turing machine is capable of performing any calculation that can be done by a physical computer. A model of computation of equal power to a Turing machine is said to be Turing complete.

Footnote (on physical computers): Actually more, since it neglects the necessary finite limit to the computer's store. Arguably any existing computer is actually a finite state machine, but the assumption of unbounded memory seems a more fruitful abstraction.

At about the same time, there were several other proposed definitions of 'mechanical procedure'. Most of these turned out to be equivalent in power to Turing machines. In particular, although it originally arose as a foundation for mathematics, lambda calculus can also be seen as a programming language, to be executed by performing $\beta$-conversions. In fact, Church proposed before Turing that the set of operations that can be expressed in lambda calculus formalizes the intuitive idea of a mechanical procedure, a postulate known as Church's thesis. Church (1936) showed that if this was accepted, the Entscheidungsproblem is unsolvable. Turing later showed that the functions definable in lambda calculus are precisely those computable by a Turing machine, which lent more credibility to Church's claims.

To a modern programmer, Turing machine programs are just about recognizable as a very primitive kind of machine code. Indeed, it seems that Turing machines, and in particular Turing's construction of a 'universal machine', were a key influence in the development of modern stored program computers, though the extent and nature of this influence is still a matter for controversy (Robinson 1994). It is remarkable that several of the other proposals for a definition of 'mechanical procedure', often made before the advent of electronic computers, correspond closely to real programming methods. For example, Markov algorithms, the standard approach to computability in the (former) Soviet Union, can be seen as underlying the programming language SNOBOL. In what follows, we are concerned with the influence of lambda calculus on functional programming languages.

LISP, the second oldest high level language (FORTRAN being the oldest), took a few ideas from lambda calculus, in particular the notation '(LAMBDA ···)' for anonymous functions. However it was not really based on lambda calculus at all. In fact the early versions, and some current dialects, use a system of dynamic binding that does not correspond to lambda calculus. (We will discuss this later.) Moreover there was no real support for higher order functions, while there was considerable use of imperative features. Nevertheless LISP deserves to be considered as the first functional programming language, and pioneered many of the associated techniques such as automatic storage allocation and garbage collection.

The influence of lambda calculus really took off with the work of Landin and Strachey in the 60s. In particular, Landin showed how many features of current (imperative) languages could usefully be analyzed in terms of lambda calculus, e.g. the notion of variable scoping in Algol 60. Landin (1966) went on to propose the use of lambda calculus as a core for programming languages, and invented a functional language ISWIM ('If you See What I Mean'). This was widely influential and spawned many real languages.

ML started life as the metalanguage (hence the name ML) of a theorem proving system called Edinburgh LCF (Gordon, Milner, and Wadsworth 1979). That is, it was intended as a language in which to write algorithms for proving theorems in a formal deductive calculus. It was strongly influenced by ISWIM but added an innovative polymorphic type system including abstract types, as well as a system of exceptions to deal with error conditions. These facilities were dictated by the application at hand, and this policy resulted in a coherent and focused design. This narrowness of focus is typical of successful languages (C is another good example) and contrasts sharply with the failure of committee-designed languages like Algol 60, which have served more as a source of important ideas than as popular practical tools. We will have more to say about ML later. Let us now see how pure lambda calculus can be used as a programming language.

² A single Turing machine program capable of simulating any other; we would now regard it as an interpreter. You will learn about this in the course on Computation Theory.

3.1 Representing data in lambda calculus

Programs need data to work on, so we start by fixing lambda expressions to encode data. Furthermore, we define a few basic operations on this data. In many cases, we show how a string of human-readable syntax s can be translated directly into a lambda expression s′. This procedure is known as 'syntactic sugaring': it makes the bitter pill of pure lambda notation easier to digest. We write such definitions as:

    s ≜ s′

This notation means 's = s′ by definition'; one also commonly sees this written as 's =def s′'. If preferred, we could always regard these as defining a single constant denoting that operation, which is then applied to arguments in the usual lambda calculus style, the surface notation being irrelevant. For example we can imagine 'if E then E₁ else E₂' as 'COND E E₁ E₂' for some constant COND. In such cases, any variables on the left of the definition need to be abstracted over, e.g. instead of

    fst p ≜ p true

(see below) we could write

    fst ≜ λp. p true

3.1.1 Truth values and the conditional

We can use any unequal lambda expressions to stand for 'true' and 'false', but the following work particularly smoothly:

    true ≜ λx y. x
    false ≜ λx y. y

Given those definitions, we can easily define a conditional expression just like C's '?:' construct. Note that this is a conditional expression, not a conditional command (this notion makes no sense in our context), and therefore the 'else' arm is compulsory:

    if E then E₁ else E₂ ≜ E E₁ E₂

Indeed we have:

    if true then E₁ else E₂ = true E₁ E₂
                            = (λx y. x) E₁ E₂
                            = E₁

and

    if false then E₁ else E₂ = false E₁ E₂
                             = (λx y. y) E₁ E₂
                             = E₂

Once we have the conditional, it is easy to define all the usual logical operations:

    not p ≜ if p then false else true
    p and q ≜ if p then q else false
    p or q ≜ if p then true else q
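As a rough illustration, the encodings above can be transcribed almost literally into Python's lambda notation. This is only a sketch: the names `cond`, `not_`, `and_`, `or_` and the decoding helper `to_bool` are choices made here and do not come from the text. Python also evaluates arguments eagerly, so both arms of `cond` are computed before one is selected.

```python
# Church booleans: a truth value is a curried two-argument selector.
true = lambda x: lambda y: x
false = lambda x: lambda y: y

# "if E then E1 else E2" is simply E E1 E2 (hypothetical name: cond).
cond = lambda e: lambda e1: lambda e2: e(e1)(e2)

# Logical operations defined through the conditional, as in the text.
not_ = lambda p: cond(p)(false)(true)
and_ = lambda p: lambda q: cond(p)(q)(false)
or_ = lambda p: lambda q: cond(p)(true)(q)


def to_bool(b):
    """Decode a Church boolean into a Python bool (inspection helper only)."""
    return b(True)(False)
```

Applying `to_bool` to an encoded value is the only point at which a native Python value appears; everything else is built from functions alone.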

3.1.2 Pairs and tuples

We can represent ordered pairs as follows:

    (E₁, E₂) ≜ λf. f E₁ E₂

The parentheses are not obligatory, but we often use them for the sake of familiarity or to enforce associations. In fact we simply regard the comma as an infix operator like +. Given the above definition, the corresponding destructors for pairs can be defined as:

    fst p ≜ p true
    snd p ≜ p false

It is easy to see that these work as required:

    fst (p, q) = (p, q) true
               = (λf. f p q) true
               = true p q
               = (λx y. x) p q
               = p

and

    snd (p, q) = (p, q) false
               = (λf. f p q) false
               = false p q
               = (λx y. y) p q
               = q
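The pairing construct can be tried out in the same style. The following is a minimal sketch in Python; the name `pair` for the comma operator is an assumption made here, since the text writes pairing with infix syntax.

```python
# Selectors reused from the truth-value encoding.
true = lambda x: lambda y: x
false = lambda x: lambda y: y

# (E1, E2) is the function that hands both components to its argument f.
pair = lambda e1: lambda e2: lambda f: f(e1)(e2)

# Destructors: pass the selector that keeps the wanted component.
fst = lambda p: p(true)
snd = lambda p: p(false)

example = pair("left")("right")
```

Here `fst(example)` and `snd(example)` recover the two strings, mirroring the two reduction sequences above.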

We can build up triples, quadruples, quintuples, indeed arbitrary n-tuples, by iterating the pairing construct:

    (E₁, E₂, ..., Eₙ) = (E₁, (E₂, ... Eₙ))

We need simply say that the infix comma operator is right-associative, and can understand this without any other conventions. For example:

    (p, q, r, s) = (p, (q, (r, s)))
                 = λf. f p (q, (r, s))
                 = λf. f p (λf. f q (r, s))
                 = λf. f p (λf. f q (λf. f r s))
                 = λf. f p (λg. g q (λh. h r s))

where in the last line we have performed an alpha-conversion for the sake of clarity. Although tuples are built up in a flat manner, it is easy to create arbitrary finitely-branching tree structures by using tupling repeatedly. Finally, if one prefers conventional functions over Cartesian products to our 'curried' functions, one can convert between the two using the following:

    CURRY f ≜ λx y. f(x, y)
    UNCURRY g ≜ λp. g (fst p) (snd p)

These special operations for pairs can easily be generalized to arbitrary n-tuples. For example, we can define a selector function to get the ith component of a flat tuple p. We will write this operation as (p)ᵢ and define (p)₁ = fst p and the others by (p)ᵢ = fst (sndⁱ⁻¹ p); for the last component of an n-tuple the outer fst is omitted, since sndⁿ⁻¹ p is already that component. Likewise we can generalize CURRY and UNCURRY:

    CURRYₙ f ≜ λx₁ ··· xₙ. f(x₁, ..., xₙ)
    UNCURRYₙ g ≜ λp. g (p)₁ ··· (p)ₙ

We can now write λ(x₁, ..., xₙ). t as an abbreviation for:

    UNCURRYₙ (λx₁ ··· xₙ. t)

giving a natural notation for functions over Cartesian products.
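The tuple conventions can be sketched as follows, again in Python. The helpers `make_tuple` and `select` are hypothetical names introduced here. `select` takes the tuple length explicitly, because the last component of a flat tuple is reached by `snd` alone.

```python
true = lambda x: lambda y: x
false = lambda x: lambda y: y
pair = lambda e1: lambda e2: lambda f: f(e1)(e2)
fst = lambda p: p(true)
snd = lambda p: p(false)


def make_tuple(*items):
    """Build the right-nested tuple (e1, (e2, (..., en))); needs n >= 2."""
    result = items[-1]
    for item in reversed(items[:-1]):
        result = pair(item)(result)
    return result


def select(p, i, n):
    """Return component i (1-based) of a flat n-tuple p."""
    for _ in range(i - 1):
        p = snd(p)
    # The last component has no enclosing pair, so no fst is applied.
    return p if i == n else fst(p)


# Conversion between functions on pairs and curried functions.
curry = lambda f: lambda x: lambda y: f(pair(x)(y))
uncurry = lambda g: lambda p: g(fst(p))(snd(p))

quad = make_tuple("p", "q", "r", "s")
add_pair = lambda p: fst(p) + snd(p)        # a function on pairs
subtract = lambda x: lambda y: x - y        # a curried function
```

For instance, `curry(add_pair)(2)(3)` and `uncurry(subtract)(pair(7)(4))` exercise the two conversions on ordinary Python integers.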

3.1.3 The natural numbers

We will represent each natural number n as follows:³

    n ≜ λf x. fⁿ x

that is, 0 = λf x. x, 1 = λf x. f x, 2 = λf x. f (f x) etc. These representations are known as Church numerals, though the basic idea was used earlier by Wittgenstein (1922).⁴ They are not a very efficient representation, amounting essentially to counting in unary: 1, 11, 111, 1111, .... There are, from an efficiency point of view, much better ways of representing numbers, e.g. tuples of trues and falses as binary expansions. But at the moment we are only interested in computability 'in principle', and Church numerals have various nice formal properties. For example, it is easy to give lambda expressions for the common arithmetic operators such as the successor operator, which adds one to its argument:

    SUC ≜ λn f x. n f (f x)

Indeed

    SUC n = (λn f x. n f (f x)) (λf x. fⁿ x)
          = λf x. (λf x. fⁿ x) f (f x)
          = λf x. (λx. fⁿ x) (f x)
          = λf x. fⁿ (f x)
          = λf x. fⁿ⁺¹ x
          = n + 1

³ The n in fⁿ x is just a meta-notation, and does not mean that there is anything circular about our definition.

⁴ '6.021 A number is the exponent of an operation.'

Similarly it is easy to test a numeral to see if it is zero:

    ISZERO n ≜ n (λx. false) true

since:

    ISZERO 0 = (λf x. x) (λx. false) true = true

and

    ISZERO (n + 1) = (λf x. fⁿ⁺¹ x) (λx. false) true
                   = (λx. false)ⁿ⁺¹ true
                   = (λx. false) ((λx. false)ⁿ true)
                   = false
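To experiment with these definitions, one can transcribe them into Python as below. This is a sketch under some assumptions: `church` and `to_int` are conversion helpers invented here to move between Python integers and Church numerals, and are not part of the encoding itself.

```python
true = lambda x: lambda y: x
false = lambda x: lambda y: y


def church(n):
    """Return the Church numeral for n: a function applying f to x n times."""
    def numeral(f):
        def apply_n(x):
            for _ in range(n):
                x = f(x)
            return x
        return apply_n
    return numeral


def to_int(c):
    """Decode a Church numeral by counting applications of 'add one' to 0."""
    return c(lambda k: k + 1)(0)


# SUC = \n f x. n f (f x)
suc = lambda n: lambda f: lambda x: n(f)(f(x))

# ISZERO n = n (\x. false) true
iszero = lambda n: n(lambda x: false)(true)
```

The result of `iszero` is a Church boolean, so it has to be applied to two alternatives, for example `iszero(church(0))("zero")("nonzero")`.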

The following perform addition and multiplication on Church numerals:

    m + n ≜ λf x. m f (n f x)
    m ∗ n ≜ λf x. m (n f) x

Indeed:

    m + n = λf x. m f (n f x)
          = λf x. (λf x. fᵐ x) f (n f x)
          = λf x. (λx. fᵐ x) (n f x)
          = λf x. fᵐ (n f x)
          = λf x. fᵐ ((λf x. fⁿ x) f x)
          = λf x. fᵐ ((λx. fⁿ x) x)
          = λf x. fᵐ (fⁿ x)
          = λf x. fᵐ⁺ⁿ x

and:

    m ∗ n = λf x. m (n f) x
          = λf x. (λf x. fᵐ x) (n f) x
          = λf x. (λx. (n f)ᵐ x) x
          = λf x. (n f)ᵐ x
          = λf x. ((λf x. fⁿ x) f)ᵐ x
          = λf x. (λx. fⁿ x)ᵐ x
          = λf x. (fⁿ)ᵐ x
          = λf x. fᵐⁿ x
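The two arithmetic operators can be checked numerically with a short Python sketch. As before, `church` and `to_int` are hypothetical conversion helpers introduced only to inspect the results; `add` and `mul` are names chosen here for the infix + and ∗ of the text.

```python
def church(n):
    """Church numeral for n, built iteratively to avoid deep recursion."""
    def numeral(f):
        def apply_n(x):
            for _ in range(n):
                x = f(x)
            return x
        return apply_n
    return numeral


def to_int(c):
    """Decode a Church numeral into a Python int."""
    return c(lambda k: k + 1)(0)


# m + n = \f x. m f (n f x): apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# m * n = \f x. m (n f) x: apply "f n times" m times over.
mul = lambda m: lambda n: lambda f: lambda x: m(n(f))(x)

two, three = church(2), church(3)
```

Decoding `add(two)(three)` and `mul(two)(three)` with `to_int` gives the sum and the product of the encoded numbers.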

Although those operations on natural numbers were fairly easy, a 'predecessor' function is much harder. What is required is a lambda expression PRE so that PRE 0 = 0 and PRE (n + 1) = n. Finding such an expression was the first piece of mathematical research by the logician Kleene (1935). His trick was, given λf x. fⁿ x, to 'throw away' one of the applications of f. The first step is to define a function 'PREFN' that operates on pairs such that:

    PREFN f (true, x) = (false, x)

and

    PREFN f (false, x) = (false, f x)

Given this function, then (PREFN f)ⁿ⁺¹ (true, x) = (false, fⁿ x), which is enough to let us define a predecessor function without much difficulty. Here is a suitable definition of 'PREFN':

    PREFN ≜ λf p. (false, if fst p then snd p else f (snd p))

Now we define:

    PRE n ≜ λf x. snd (n (PREFN f) (true, x))

It is left as an exercise to the reader to see that this works correctly.
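Without giving away the reasoning asked for in the exercise, the definition can at least be tested mechanically. The Python sketch below transcribes PREFN and PRE directly; `church` and `to_int` are hypothetical helpers for building and decoding numerals, and the conditional is written as a direct application of the Church boolean `fst(p)` to its two arms.

```python
true = lambda x: lambda y: x
false = lambda x: lambda y: y
pair = lambda e1: lambda e2: lambda f: f(e1)(e2)
fst = lambda p: p(true)
snd = lambda p: p(false)


def church(n):
    """Church numeral for n."""
    def numeral(f):
        def apply_n(x):
            for _ in range(n):
                x = f(x)
            return x
        return apply_n
    return numeral


def to_int(c):
    """Decode a Church numeral into a Python int."""
    return c(lambda k: k + 1)(0)


# PREFN = \f p. (false, if fst p then snd p else f (snd p))
# The flag in the first component is true only before the first step,
# so the first application of f is the one that gets skipped.
prefn = lambda f: lambda p: pair(false)(fst(p)(snd(p))(f(snd(p))))

# PRE n = \f x. snd (n (PREFN f) (true, x))
pre = lambda n: lambda f: lambda x: snd(n(prefn(f))(pair(true)(x)))
```

Because Python is eager, `f(snd(p))` is computed even on the step where it is discarded; this costs a little work but does not change the result.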

3.2 Recursive functions

Being able to define functions by recursion is a central feature of functional programming: it is the only general way of performing something comparable to iteration. At first sight, it appears that we have no way of doing this in lambda calculus. Indeed, it would seem that the naming of the function is a vital part of making a recursive definition, since otherwise, how can we refer to it on the right hand side of the definition without descending into infinite regress? Rather surprisingly, it can be done, although as with the existence of a predecessor function, this fact was only discovered after considerable effort.

The key is the existence of so-called fixed point combinators. A closed lambda term Y is said to be a fixed point (or fixpoint) combinator when for all lambda terms f, we have f(Y f) = Y f. That is, a fixed point combinator, given any term f as an argument, returns a fixed point for f, i.e. a term x such that f(x) = x. The first such combinator to be found (by Curry) is usually called Y. It can be motivated by recalling the Russell paradox, and for this reason is often called the 'paradoxical combinator'. We defined:

    R = λx. ¬(x x)

and found that:

    R R = ¬(R R)

That is, R R is a fixed point of the negation operator. So in order to get a general fixed point combinator, all we need to do is generalize ¬ into whatever function is given it as argument. Therefore we set:

    Y ≜ λf. (λx. f(x x)) (λx. f(x x))

It is a simple matter to show that it works:

    Y f = (λf. (λx. f(x x)) (λx. f(x x))) f
        = (λx. f(x x)) (λx. f(x x))
        = f((λx. f(x x)) (λx. f(x x)))
        = f(Y f)

Though valid from a mathematical point of view, this is not so attractive from a programming point of view, since the above was only valid for lambda equality, not reduction (in the last line we performed a backwards beta conversion). For this reason the following alternative, due to Turing, may be preferred:

    T ≜ (λx y. y (x x y)) (λx y. y (x x y))

(The proof that T f ⟶ f(T f) is left as an exercise.) However it does no harm if we simply allow Y f to be beta-reduced throughout the reduction sequence for a recursively defined function. Now let us see how to use a fixed point combinator (say Y) to implement recursive functions. We will take the factorial function as an example. We want to define a function fact with:

    fact(n) = if ISZERO n then 1 else n ∗ fact(PRE n)

The first step is to transform this into the following equivalent:

    fact = λn. if ISZERO n then 1 else n ∗ fact(PRE n)

which in its turn is equivalent to:

    fact = (λf n. if ISZERO n then 1 else n ∗ f(PRE n)) fact

This exhibits fact as the fixed point of a function F, namely:

    F = λf n. if ISZERO n then 1 else n ∗ f(PRE n)

Consequently we merely need to set fact = Y F. Similar techniques work for mutually recursive functions, i.e. a set of functions whose definitions are mutually dependent. A definition such as:

    f₁ = F₁ f₁ ··· fₙ
    f₂ = F₂ f₁ ··· fₙ
    ... = ...
    fₙ = Fₙ f₁ ··· fₙ

can be transformed, using tuples, into a single equation:

    (f₁, f₂, ..., fₙ) = (F₁ f₁ ··· fₙ, F₂ f₁ ··· fₙ, ..., Fₙ f₁ ··· fₙ)

If we now write t = (f₁, f₂, ..., fₙ), then each of the functions on the right can be recovered by applying the appropriate selector to t: we have fᵢ = (t)ᵢ. After abstracting out t, this gives an equation in the canonical form t = F t which can be solved by t = Y F. Now the individual functions can be recovered from t by the same method of selecting components of the tuple.
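The fixed point construction can be tried in Python, with one important caveat. Python evaluates arguments eagerly, so a literal transcription of Y loops forever. The sketch below therefore swaps in the eta-expanded call-by-value variant, commonly called Z, which is not the combinator defined in the text. It also uses native integers and Python's own conditional expression in place of Church numerals and the encoded 'if', purely to keep the example short; the names `Z`, `F`, `G`, `even` and `odd` are assumptions made here.

```python
# Call-by-value fixed point combinator (a variant of Y, often called Z).
# The inner "lambda v: x(x)(v)" delays the self-application until needed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# F = \f n. if n = 0 then 1 else n * f(n - 1); fact is its fixed point.
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
fact = Z(F)

# Mutual recursion through a pair, following the tuple construction above.
true = lambda x: lambda y: x
false = lambda x: lambda y: y
pair = lambda e1: lambda e2: lambda f: f(e1)(e2)
fst = lambda p: p(true)
snd = lambda p: p(false)

# G maps the tuple t = (even, odd) to a tuple of improved definitions,
# each of which reaches the other function by selecting from t.
G = lambda t: pair(lambda n: True if n == 0 else snd(t)(n - 1))(
    lambda n: False if n == 0 else fst(t)(n - 1))

t = Z(G)
even, odd = fst(t), snd(t)
```

The tuple case works with Z because an encoded pair is itself a function (of a selector), so delaying the self-application behind `lambda v: ...` is harmless.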

3.3 Let expressions

We've touted the ability to write anonymous functions as one of the merits of lambda calculus, and shown that names are not necessary to define recursive functions. Nevertheless it is often useful to be able to give names to expressions, to avoid tedious repetition of large terms. A simple form of naming can be supported as another syntactic sugar on top of pure lambda calculus:

    let x = s in t ≜ (λx. t) s

For example, as we would expect:

    (let z = 2 + 3 in z + z) = (λz. z + z) (2 + 3) = (2 + 3) + (2 + 3)

We can achieve the binding of multiple names to expressions either in a serial or parallel fashion. For the first, we will simply iterate the above construct. For the latter, we will write multiple bindings separated by 'and':

    let x₁ = s₁ and ··· and xₙ = sₙ in t

and regard it as syntactic sugar for:

    (λ(x₁, ..., xₙ). t) (s₁, ..., sₙ)

To illustrate the distinction between serial and parallel binding, consider:

    let x = 1 in let x = 2 in let y = x in x + y

which should yield 4, while:

    let x = 1 in let x = 2 and y = x in x + y

should yield 3. We will also allow arguments to names bound by 'let'; again this can be regarded as syntactic sugar, with f x₁ ··· xₙ = t meaning f = λx₁ ··· xₙ. t. Instead of a prefix 'let x = s in t', we will allow a postfix version, which is sometimes more readable:

    t where x = s

For example, we might write 'y < y² where y = 1 + x'. Normally, the 'let' and 'where' constructs will be interpreted as above, in a non-recursive fashion. For example:

    let x = x − 1 in ···

binds x to one less than whatever value it is bound to outside, rather than trying to bind x to the fixed point of x = x − 1.⁵ However where a recursive interpretation is required, we add the word 'rec', writing 'let rec' or 'where rec', e.g.

    let rec fact(n) = if ISZERO n then 1 else n ∗ fact(PRE n)

This can be regarded as a shorthand for 'let fact = Y F' where

    F = λf n. if ISZERO n then 1 else n ∗ f(PRE n)

as explained above.
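The desugaring of 'let' into application can be made concrete in Python, where an immediately applied lambda plays the role of (λx. t) s. In this sketch native integers replace Church numerals, and a two-parameter lambda stands in for the tuple-pattern abstraction λ(x, y). t; the variable names are chosen here for illustration.

```python
# let z = 2 + 3 in z + z   ==>   (\z. z + z) (2 + 3)
double_sum = (lambda z: z + z)(2 + 3)

# Serial binding: let x = 1 in let x = 2 in let y = x in x + y
# Each let nests inside the previous one, so y sees the inner x (2).
serial = (lambda x: (lambda x: (lambda y: x + y)(x))(2))(1)

# Parallel binding: let x = 1 in let x = 2 and y = x in x + y
# Both right-hand sides are evaluated outside the new bindings,
# so the x bound to y is still the outer x (1).
parallel = (lambda x: (lambda x, y: x + y)(2, x))(1)

# Non-recursive let: in "let x = x - 1 in x" the x on the right
# refers to the enclosing binding, here 10.
shadowed = (lambda x: (lambda x: x)(x - 1))(10)
```

The values of `serial` and `parallel` differ exactly as described above, because the argument of an application is evaluated in the scope surrounding the abstraction.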

⁵ There is such a fixed point, but the lambda term involved has no normal form, so is in a sense 'undefined'. The semantics of nonterminating terms is subtle, and we will not consider it at length here. Actually, the salient question is not whether a term has a normal form but whether it has a so-called head normal form; see Barendregt (1984) and also Abramsky (1990).

3.4 Steps towards a real programming language

We have established a fairly extensive system of 'syntactic sugar' to support a human-readable syntax on top of pure lambda calculus. It is remarkable that this device is enough to bring us to the point where we can write the definition of the factorial function almost in the way we can in ML. In what sense, though, is the lambda calculus augmented with these notational conventions really a programming language?

Ultimately, a program is just a single expression. However, given the use of let to bind various important subexpressions to names, it is natural instead to view the program as a set of definitions of auxiliary functions, followed finally by the expression itself, e.g.

    let rec fact(n) = if ISZERO n then 1 else n ∗ fact(PRE n) in
    ···
    fact(6)

We can read these definitions of auxiliary functions as equations in the mathematical sense. Understood in this way, they do not give any explicit instructions on how to evaluate expressions, or even in which direction the equations are to be used. For this reason, functional programming is often referred to as a declarative programming method, as also is logic programming (e.g. PROLOG).⁶ The program does not incorporate explicit instructions, merely declares certain properties of the relevant notions, and leaves the rest to the machine.

At the same time, the program is useless, or at least ambiguous, unless the machine somehow reacts sensibly to it. Therefore it must be understood that the program, albeit from a superficial point of view purely declarative, is to be executed in a certain way. In fact, the expression is evaluated by expanding all instances of defined notions (i.e. reading the equations from left to right), and then repeatedly performing β-conversions. That is, although there seems to be no procedural information in the program, a particular execution strategy is understood. In a sense then, the term 'declarative' is largely a matter of human psychology.

Moreover, there must be a definite convention for the reduction strategy to use, for we know that different choices of β-redex can lead to different behaviour with respect to termination. Consequently, it is only when we specify this that we really get a definite programming language. We will see in later chapters how different functional languages make different choices here. But first we pause to consider the incorporation of types.
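The dependence of termination on the reduction strategy can be observed even in Python, which always evaluates arguments before calling a function. The sketch below is an illustration under stated assumptions: `omega`, `const_one`, `eager` and `delayed` are names invented here, and the nonterminating term shows up in CPython as a `RecursionError` rather than as a genuine infinite loop.

```python
def omega():
    """Evaluate (\\x. x x)(\\x. x x), a term with no normal form.

    In CPython the endless self-application exhausts the call stack
    and surfaces as RecursionError.
    """
    return (lambda x: x(x))(lambda x: x(x))


# A constant function that ignores its argument: \y. 1
const_one = lambda _: 1


def eager():
    """Reduce the argument first (as Python does): the call never succeeds."""
    try:
        return const_one(omega())
    except RecursionError:
        return "diverged"


def delayed():
    """Pass the argument unevaluated, so the outer redex is contracted first."""
    return const_one(omega)
```

Calling `eager()` reports divergence, while `delayed()` returns 1: the same expression behaves differently depending on which redex is reduced first.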

⁶ Apparently Landin preferred 'denotative' to 'declarative'.


3.5 Further reading

Many of the standard texts already cited include details about the matters discussed here. In particular Gordon (1988) gives a clear proof that lambda calculus is Turing-complete, and that the lambda calculus analog of the 'halting problem' (whether a term has a normal form) is unsolvable. The influence of lambda calculus on real programming languages and the evolution of functional languages are discussed by Hudak (1989).

Exercises

1. Justify 'generalized β-conversion', i.e. prove that:

    (λ(x₁, ..., xₙ). t[x₁, ..., xₙ]) (t₁, ..., tₙ) = t[t₁, ..., tₙ]

2. Define f ∘ g ≜ λx. f(g x), and recall that I = λx. x. Prove that CURRY ∘ UNCURRY = I. Is it also true that UNCURRY ∘ CURRY = I?

3. What arithmetical operation does λn f x. f(n f x) perform on Church numerals? What about λm n. n m?

4. Prove that the following (due to Klop) is a fixed point combinator:

    ££££££££££££££££££££££££££

where:

    £ ≜ λabcdefghijklmnopqstuvwxyzr. r(thisisafixedpointcombinator)

5. Make a recursive definition of natural number subtraction.

6. Consider the following representation of lists, due to Mairson (1991):

    nil = λc n. n
    cons = λx l c n. c x (l c n)
    head = λl. l (λx y. x) nil
    tail = λl. snd (l (λx p. (cons x (fst p), fst p)) (nil, nil))
    append = λl₁ l₂. l₁ cons l₂
    append lists = λL. L append nil
    map = λf l. l (λx. cons (f x)) nil
    length = λl. l (λx. SUC) 0
    tack = λx l. l cons (cons x nil)
    reverse = λl. l tack nil
    filter = λl test. l (λx. (test x) (cons x) (λy. y)) nil

Why does Mairson call them 'Church lists'? What do all these functions do?

Chapter 4

Types

Types are a means of distinguishing different sorts of data, like booleans, natural numbers and functions, and making sure that these distinctions are respected, by, for example, ensuring that functions cannot be applied to arguments of the wrong type. Why should we want to add types to lambda calculus, and to programming languages derived from it? We can distinguish reasons from logic and reasons from programming.

From the logical point of view, we have seen that the Russell paradox makes it difficult to extend lambda calculus with set theory without contradiction. The Russell paradox may well arise because of the peculiar self-referential nature of the trick involved: we apply a function to itself. Even if it weren't necessary to avoid paradox, we might feel that intuitively we really have a poor grip of what's going on in a system where we can do such things. Certainly applying some functions to themselves, e.g. the identity function λx. x or a constant function λx. y, seems quite innocuous. But surely we would have a clearer picture of what sort of functions lambda terms denote if we knew exactly what their domains and codomains were, and only applied them to arguments in their domains. These were the sorts of reasons why Russell originally introduced types in Principia Mathematica.

Types also arose, perhaps at first largely independently of the above considerations, in programming languages, and since we view lambda calculus as a programming language, this motivates us too. Even FORTRAN distinguished integers and floating point values. One reason was clearly efficiency: the machine can generate more efficient code, and use storage more effectively, by knowing more about a variable. For example the implementation of C's pointer arithmetic varies according to the size of the referenced objects. If p is a pointer to objects occupying 4 bytes, then what C programmers write as p+1 becomes p+4 inside a byte-addressed machine. C's ancestor BCPL is untyped, and is therefore unable to distinguish pointers from integers; to support the same style of pointer arithmetic it is necessary to apply a scaling at every indirection, a significant penalty.

Apart from efficiency, as time went by, types also began to be appreciated more and more for their value in providing limited static checks on programs. Many programming errors, either trivial typos or more substantial conceptual errors, are suggested by type clashes, and these can be detected before running the program, during the compilation process. Moreover, types often serve as useful documentation for people reading code. Finally, types can be used to achieve better modularization and data hiding by 'artificially' distinguishing some data structure from its internal representation.

At the same time, some programmers dislike types, because they find the restrictions placed on their programming style irksome. There are untyped programming languages, imperative ones like BCPL and functional ones like ISWIM, SASL and Erlang. Others, like PL/I, have only weak typing, with the compiler allowing some mixtures of different types and casting them appropriately. There are also languages like LISP that perform typechecking dynamically, during execution, rather than statically during compilation. This can (at least in principle) degrade runtime performance, since it means that the computer is spending some time on making sure types are respected during execution; compare checking of array bounds during execution, another common runtime overhead. By contrast, we have suggested that static typing, if anything, can improve performance.

Just how restrictive a type system is found depends a lot on the nature of the programmer's applications and programming style. Finding a type system that allows useful static typechecking, while at the same time allowing programmers plenty of flexibility, is still an active research area. The ML type system is an important step along this road, since it features polymorphism, allowing the same function to be used with various different types. This retains strong static typechecking while giving some of the benefits of weak or dynamic typing.¹ Moreover, programmers never need to specify any types in ML: the computer can infer a most general type for every expression, and reject expressions that are not typable. We will consider polymorphism in the setting of lambda calculus below. Certainly, the ML type system provides a congenial environment for many kinds of programming. Nevertheless, we do not want to give the impression that it is necessarily the answer to all programming problems.

4.1 Typed lambda calculus

It is a fairly straightforward matter to modify lambda calculus with a notion of type, but the resulting changes, as we shall see, are far-reaching. The basic idea is that every lambda term has a type, and a term s can only be applied to a term t in a combination s t when the types match up appropriately, i.e. s has the type of a function σ → τ and t has type σ. The result, s t, then has type τ. This is, in programming language parlance, strong typing. The term t must have exactly the type σ: there is no notion of subtyping or coercion. This contrasts with some programming languages like C where a function expecting an argument of type float or double will accept one of type int and appropriately cast it. Similar notions of subtyping and coercion can also be added to lambda calculus, but to consider these here would lead us too far astray.

¹ As well perhaps as some of the overheads.

We will use 't : σ' to mean 't has type σ'. This is already the standard notation in mathematics where function spaces are concerned, because f : σ → τ means that f is a function from the set σ to the set τ. We will think of types as sets in which the corresponding objects live, and imagine t : σ to mean t ∈ σ. Though the reader may like to do likewise, as a heuristic device, we will view typed lambda calculus purely as a formal system and make the rules independent of any such interpretation.

4.1.1 The stock of types

The first stage in our formalization is to specify exactly what the types are. We suppose first of all that we have some set of primitive types, which might, for example, contain bool and int. We can construct composite types from these using the function space type constructor. Formally, we make the following inductive definition of the set Ty_C of types based on a set of primitive types C:

      σ ∈ C
    ----------
    σ ∈ Ty_C

    σ ∈ Ty_C    τ ∈ Ty_C
    ---------------------
       σ → τ ∈ Ty_C

For example, possible types might be int, bool → bool and (int → bool) → int → bool. We assume that the function arrow associates to the right, i.e. σ → τ → υ means σ → (τ → υ). This fits naturally with the other syntactic conventions connected with currying.
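The inductive definition translates naturally into a small data type. The Python sketch below is one possible representation, not something prescribed by the text: the class names `Prim` and `Fun` and the printer `show` are assumptions made here. The printer inserts parentheses only around a function type on the left of an arrow, which is exactly the right-associativity convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    """A primitive type drawn from the set C, e.g. int or bool."""
    name: str


@dataclass(frozen=True)
class Fun:
    """The function type dom -> cod built from two existing types."""
    dom: object
    cod: object


def show(ty):
    """Print a type, relying on the arrow associating to the right."""
    if isinstance(ty, Prim):
        return ty.name
    left = show(ty.dom)
    if isinstance(ty.dom, Fun):
        left = "(" + left + ")"   # parentheses are needed only on the left
    return left + " -> " + show(ty.cod)


INT, BOOL = Prim("int"), Prim("bool")
example = Fun(Fun(INT, BOOL), Fun(INT, BOOL))   # (int -> bool) -> int -> bool
```

Since the classes are frozen dataclasses, two types are equal exactly when they are built in the same way, matching the idea that types are freely generated.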

Before long, we shall extend the type system in two ways. First, we will allow so-called type variables as well as primitive type constants; these are the vehicle for polymorphism. Secondly, we will allow the introduction of other type constructors besides the function arrow. For example, we will later introduce a constructor × for the product type. In that case a new clause:

    σ ∈ Ty_C    τ ∈ Ty_C
    ---------------------
       σ × τ ∈ Ty_C

needs to be added to the inductive definition. Once we move to the more concrete case of ML, there is the possibility of new, user-defined types and type constructors, so we will get used to arbitrary sets of constructors of any arity. We will write (α₁, ..., αₙ)con for the application of an n-ary type constructor con to the arguments αᵢ. (Only in a few special cases like → and × do we use infix syntax, for the sake of familiarity.) For example (σ)list is the type of lists whose elements all have type σ.

An important property of the types, provable because of free inductive generation, is that σ → τ ≠ σ. (In fact, more generally, a type cannot be the same as any proper syntactic subexpression of itself.) This rules out the possibility of applying a term to itself, unless the two instances in question have distinct types.

4.1.2 Church and Curry typing

There are two major approaches to defining typed lambda calculus. One approach, due to Church, is explicit. A (single) type is attached to each term. That is, in the construction of terms, the untyped terms that we have presented are modified with an additional field for the type. In the case of constants, this type is pre-assigned, but variables may be of any type. The rules for valid term formation are then:

    --------
     v : σ

    Constant c has type σ
    ---------------------
           c : σ

    s : σ → τ    t : σ
    ------------------
         s t : τ

    v : σ    t : τ
    ----------------
    λv. t : σ → τ

For our purposes, however, we prefer Curry's approach to typing, which is purely implicit. The terms are exactly as in the untyped case, and a term may or may not have a type, and if it does, may have many different types.² For example, the identity function λx. x may be given any type of the form σ → σ, reasonably enough. There are two related reasons why we prefer this approach. First, it provides a smoother treatment of ML's polymorphism, and secondly, it corresponds well to the actual use of ML, where it is never necessary to specify types explicitly.

At the same time, some of the formal details of Curry-style type assignment are a bit more complex. We do not merely define a relation of typability in isolation, but with respect to a context, i.e. a finite set of typing assumptions about variables. We write:

    Γ ⊢ t : σ

to mean 'in context Γ, the term t can be given type σ'. (We will, however, write ⊢ t : σ or just t : σ when the typing judgement holds in the empty context.) The elements of Γ are of the form v : σ, i.e. they are themselves typing assumptions about variables, typically those that are components of the term. We assume that Γ never gives contradictory assignments to the same variable; if preferred we can think of it as a partial function from the indexing set of the variables into the set of types. The use of the symbol ⊢ corresponds to its general use for judgements in logic of the form Γ ⊢ φ, read as 'φ follows from assumptions Γ'.

² Some purists would argue that this isn't properly speaking 'typed lambda calculus' but rather 'untyped lambda calculus with a separate notion of type assignment'.

4.1.3 Formal typability rules

The rules for typing expressions are fairly natural. Remember the interpretation of t : σ as 't could be given type σ'.

    v : σ ∈ Γ
    ----------
    Γ ⊢ v : σ

    Constant c has type σ
    ---------------------
           c : σ

    Γ ⊢ s : σ → τ    Γ ⊢ t : σ
    --------------------------
           Γ ⊢ s t : τ

    Γ ∪ {v : σ} ⊢ t : τ
    --------------------
    Γ ⊢ λv. t : σ → τ

Once again, this is to be regarded as an inductive definition of the typability relation, so a term only has a type if it can be derived by the above rules. For example, let us see how to type the identity function. By the rule for variables we have:

    {x : σ} ⊢ x : σ

and therefore by the last rule we get:

    ∅ ⊢ λx. x : σ → σ

By our established convention for empty contexts, we write simply λx. x : σ → σ. This example also illustrates the need for the context in Curry typing, and why it can be avoided in the Church version. Without the context we could deduce x : τ for any τ, and then by the last rule get λx. x : σ → τ, clearly at variance with the intuitive interpretation of the identity function. This problem doesn't arise in Church typing, since in that case either both variables have type σ, in which case the rules imply that λx. x : σ → σ, or else the two x's are actually different variables, since their types differ and the types are a component of the term. Now, indeed, λx : σ. (x : τ) : σ → τ, but this is reasonable because the term is alpha-equivalent to λx : σ. (y : τ) : σ → τ. In the Curry version of typing, the types are not attached to the terms, and so we need some way of connecting instances of the same variable.
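The way a context threads through the rules can be seen in a small checker. The Python sketch below makes a simplifying assumption that should be stated loudly: to avoid having to guess the type of a bound variable, each abstraction carries an explicit annotation, so this is closer in spirit to the explicit (Church) style than to pure Curry-style assignment, and constants are omitted. All class and function names (`Prim`, `Fun`, `Var`, `App`, `Lam`, `type_of`) are choices made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    name: str


@dataclass(frozen=True)
class Fun:
    dom: object
    cod: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Lam:
    var: str
    var_type: object   # annotation on the bound variable (an assumption here)
    body: object


def type_of(ctx, term):
    """Return the type of term in context ctx (a dict), or None if untypable."""
    if isinstance(term, Var):
        # Variable rule: the context must contain an entry for the variable.
        return ctx.get(term.name)
    if isinstance(term, App):
        # Combination rule: function type sigma -> tau applied to exactly sigma.
        fun_ty = type_of(ctx, term.fun)
        arg_ty = type_of(ctx, term.arg)
        if isinstance(fun_ty, Fun) and arg_ty is not None and fun_ty.dom == arg_ty:
            return fun_ty.cod
        return None
    if isinstance(term, Lam):
        # Abstraction rule: type the body in the context extended with v : sigma.
        body_ty = type_of({**ctx, term.var: term.var_type}, term.body)
        return None if body_ty is None else Fun(term.var_type, body_ty)
    return None


INT = Prim("int")
identity = Lam("x", INT, Var("x"))
self_app = Lam("x", Fun(INT, INT), App(Var("x"), Var("x")))
```

With these definitions `type_of({}, identity)` yields the type int → int, while `type_of({}, self_app)` yields `None`, since x would need a type equal to one of its own proper subexpressions.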

4.1.4 Type preservation

Since the underlying terms of our typed lambda calculus are the same as in the untyped case, we can take over the calculus of conversions unchanged. However, for this to make sense, we need to check that all the conversions preserve typability, a property called type preservation. This is not difficult; we will sketch a formal proof for η-conversion and leave the other, similar cases to the reader. First we prove a couple of lemmas that are fairly obvious but which need to be established formally. First, adding new entries to the context cannot inhibit typability:

Lemma 4.1 If Γ ⊢ t : σ and Γ ⊆ Δ then Δ ⊢ t : σ.

Proof. By induction on the structure of t. Formally we are fixing t and proving the above for all Γ and Δ, since in the inductive step for abstractions, the sets concerned change. If t is a variable we must have that t : σ ∈ Γ. Therefore t : σ ∈ Δ and the result follows. If t is a constant the result is immediate since the stock of constants and their possible types is independent of the context. If t is a combination s u then we have, for some type τ, that Γ ⊢ s : τ → σ and Γ ⊢ u : τ. By the inductive hypothesis, Δ ⊢ s : τ → σ and Δ ⊢ u : τ and the result follows. Finally, if t is an abstraction λx. s then we must have, by the last rule, that σ is of the form τ → τ′ and that Γ ∪ {x : τ} ⊢ s : τ′. Since Γ ⊆ Δ we also have Γ ∪ {x : τ} ⊆ Δ ∪ {x : τ}, and so by the inductive hypothesis Δ ∪ {x : τ} ⊢ s : τ′. By applying the rule for abstractions, the result follows. Q.E.D.

Secondly, entries in the context for variables not free in the relevant term may be discarded.

Lemma 4.2 If Γ ⊢ t : σ, then Γ_t ⊢ t : σ where Γ_t contains only entries for variables free in t, or more formally Γ_t = {x : α | x : α ∈ Γ and x ∈ FV(t)}.

Proof. Again, we prove the above for all Γ and the corresponding Γ_t by structural induction over t. If t is a variable, then since Γ ⊢ t : σ we must have an entry x : σ in the context. But by the first rule we know {x : σ} ⊢ x : σ as required. In the case of a constant, then the typing is independent of the context, so the result holds trivially. If t is a combination s u then we have for some τ that Γ ⊢ s : τ → σ and Γ ⊢ u : τ. By the inductive hypothesis, we have Γ_s ⊢ s : τ → σ and Γ_u ⊢ u : τ. By the monotonicity lemma, we therefore have Γ_su ⊢ s : τ → σ and Γ_su ⊢ u : τ, since FV(s u) = FV(s) ∪ FV(u). Hence by the rule for combinations we get Γ_su ⊢ t : σ. Finally, if t has the form λx. s, we must have Γ ∪ {x : τ} ⊢ s : τ′ with σ of the form τ → τ′. By the inductive hypothesis we have (Γ ∪ {x : τ})_s ⊢ s : τ′, and so (Γ ∪ {x : τ})_s − {x : τ} ⊢ (λx. s) : σ. Now we need simply observe that (Γ ∪ {x : τ})_s − {x : τ} ⊆ Γ_t and appeal once more to monotonicity. Q.E.D.

Now we come to the main result:

Theorem 4.3. If $\Gamma \vdash t : \sigma$ and $t \to_\eta t'$, then also $\Gamma \vdash t' : \sigma$.

Proof. Since by hypothesis $t$ is an $\eta$-redex, it must be of the form $\lambda x.\, t'\ x$ with $x \notin FV(t')$. Its type can only have arisen by the last typing rule, and therefore $\sigma$ must be of the form $\tau \to \tau'$, and we must have $\Gamma \cup \{x : \tau\} \vdash (t'\ x) : \tau'$. This typing, in turn, can only have arisen by the rule for combinations. Since the context can give only one typing to each variable, we must therefore have $\Gamma \cup \{x : \tau\} \vdash t' : \tau \to \tau'$. But since by hypothesis $x \notin FV(t')$, the second lemma lets us discard the entry for $x$, and monotonicity then yields $\Gamma \vdash t' : \tau \to \tau'$, as required. Q.E.D.

Putting this together with the corresponding results for the other conversions, we see that if $\Gamma \vdash t : \sigma$ and $t \to t'$, then we have $\Gamma \vdash t' : \sigma$. This is reassuring, since if the evaluation rules that are applied during execution could change the types of expressions, it would cast doubt on the whole enterprise of static typing.

4.2 Polymorphism

The Curry typing system already gives us a form of polymorphism, in that a given term may have different types. Let us start by distinguishing between the similar concepts of polymorphism and overloading. Both of them mean that an expression may have multiple types. However, in the case of polymorphism, all the types bear a systematic relationship to each other, and all types following the pattern are allowed. For example, the identity function may have type $\sigma \to \sigma$, or $\tau \to \tau$, or $(\sigma \to \tau) \to (\sigma \to \tau)$, but all the instances have the same structure. By contrast, in overloading, a given function may have different types that bear no structural relation to each other, or only certain instances may be selected. For example, a function $+$ may be allowed to have type $\mathrm{int} \to \mathrm{int} \to \mathrm{int}$ or type $\mathrm{float} \to \mathrm{float} \to \mathrm{float}$, but not $\mathrm{bool} \to \mathrm{bool} \to \mathrm{bool}$. Yet a third related notion is subtyping, which is a more principled version of overloading, regarding certain types as embedded in others. This, however, turns out to be a lot more complex than it seems.

Footnote: Strachey, who coined the term "polymorphism", referred to what we call polymorphism and overloading as parametric and ad hoc polymorphism respectively.

Footnote: For example, if $\alpha$ is a subtype of $\alpha'$, should $\alpha \to \beta$ be a subtype or a supertype of $\alpha' \to \beta$? One can defend either interpretation ("covariant" or "contravariant") in certain situations.


4.2.1 Let polymorphism

Unfortunately the type system we have so far places some unpleasant restrictions on polymorphism. For example, the following expression is perfectly acceptable:

$$\mathrm{if}\ (\lambda x.\, x)\ \mathrm{true}\ \mathrm{then}\ (\lambda x.\, x)\ 1\ \mathrm{else}\ 0$$

Here is a proof according to the above rules. We assume that the constants can be typed in the empty environment, and we fold together two successive applications of the rule for combinations as a rule for if, remembering that "if $b$ then $t_1$ else $t_2$" is shorthand for "COND $b$ $t_1$ $t_2$".

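$$
\dfrac{
  \dfrac{\dfrac{\{x : \mathrm{bool}\} \vdash x : \mathrm{bool}}{\vdash (\lambda x.\, x) : \mathrm{bool} \to \mathrm{bool}} \quad \vdash \mathrm{true} : \mathrm{bool}}{\vdash (\lambda x.\, x)\ \mathrm{true} : \mathrm{bool}}
  \quad
  \dfrac{\dfrac{\{x : \mathrm{int}\} \vdash x : \mathrm{int}}{\vdash (\lambda x.\, x) : \mathrm{int} \to \mathrm{int}} \quad \vdash 1 : \mathrm{int}}{\vdash (\lambda x.\, x)\ 1 : \mathrm{int}}
  \quad
  \vdash 0 : \mathrm{int}
}{\vdash \mathrm{if}\ (\lambda x.\, x)\ \mathrm{true}\ \mathrm{then}\ (\lambda x.\, x)\ 1\ \mathrm{else}\ 0 : \mathrm{int}}
$$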

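The two instances of the identity function get types $\mathrm{bool} \to \mathrm{bool}$ and $\mathrm{int} \to \mathrm{int}$. But consider the following:

$$\mathrm{let}\ I = \lambda x.\, x\ \mathrm{in}\ \mathrm{if}\ I\ \mathrm{true}\ \mathrm{then}\ I\ 1\ \mathrm{else}\ 0$$

According to our definitions, this is just syntactic sugar for:

$$(\lambda I.\, \mathrm{if}\ I\ \mathrm{true}\ \mathrm{then}\ I\ 1\ \mathrm{else}\ 0)\ (\lambda x.\, x)$$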

As the reader may readily confirm, this cannot be typed according to our rules. There is just one instance of the identity function and it must be given a single type. This feature is unbearable, since in practical functional programming we rely heavily on using let. Unless we change the typing rules, many of the benefits of polymorphism are lost. Our solution is to make let primitive, rather than interpret it as syntactic sugar, and add the new typing rule:

$$\dfrac{\Gamma \vdash s : \sigma \qquad \Gamma \vdash t[s/x] : \tau}{\Gamma \vdash \mathrm{let}\ x = s\ \mathrm{in}\ t : \tau}$$

This rule, which sets up let polymorphism, expresses the fact that, with respect to typing at least, we treat let bindings just as if separate instances of the named expression were written out. The additional hypothesis $\Gamma \vdash s : \sigma$ merely ensures that $s$ is typable, the exact type being irrelevant. This is to avoid admitting as well-typed terms such as:

$$\mathrm{let}\ x = \lambda f.\, f\ f\ \mathrm{in}\ 0$$

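Now, for example, we can type the troublesome example:

$$
\dfrac{
  \dfrac{\{x : \sigma\} \vdash x : \sigma}{\vdash \lambda x.\, x : \sigma \to \sigma}
  \qquad
  \dfrac{\text{as given above}}{\vdash \mathrm{if}\ (\lambda x.\, x)\ \mathrm{true}\ \mathrm{then}\ (\lambda x.\, x)\ 1\ \mathrm{else}\ 0 : \mathrm{int}}
}{\vdash \mathrm{let}\ I = \lambda x.\, x\ \mathrm{in}\ \mathrm{if}\ I\ \mathrm{true}\ \mathrm{then}\ I\ 1\ \mathrm{else}\ 0 : \mathrm{int}}
$$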
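The contrast between let-bound and lambda-bound identifiers can be tried out directly. The following is a minimal sketch in present-day OCaml (a descendant of CAML Light, used here as an assumption rather than the course's own system); the names `via_let` and `via_lambda` are purely illustrative.

```ocaml
(* Let-bound identity: the binding is generalised to 'a -> 'a, so each
   use of i below receives its own instance (bool -> bool, int -> int). *)
let via_let =
  let i = fun x -> x in
  if i true then i 1 else 0

(* Lambda-bound identity: the parameter i must be given ONE type, so the
   desugared form is rejected by the type checker. It is left commented
   out so that this file still compiles:

   let via_lambda = (fun i -> if i true then i 1 else 0) (fun x -> x)
*)

let () = Printf.printf "via_let = %d\n" via_let
```

Uncommenting `via_lambda` should produce a type error of the same kind as the untypable desugared term discussed above.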

4.2.2 Most general types

As we have said, some expressions have no type, e.g. $\lambda f.\, f\ f$ or $\lambda f.\, (f\ \mathrm{true}, f\ 1)$. Typable expressions normally have many types, though some, e.g. true, have exactly one. We have already commented that the form of polymorphism in ML is parametric, i.e. all the different types an expression can have bear a structural similarity to each other. In fact more is true: there exists for each typable expression a most general type or principal type, and all possible types for the expression are instances of this most general type. Before stating this precisely we need to establish some terminology.

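First, we extend our notion of types with type variables. That is, types can be built up by applying type constructors either to type constants or variables. Normally we use the letters $\alpha$ and $\beta$ to represent type variables, and $\sigma$ and $\tau$ arbitrary types. Given this extension, we can define what it means to substitute one type for a type variable in another type. This is precisely analogous to substitution at the term level, and we even use the same notation. For example, $(\alpha \to \mathrm{bool})[(\alpha \to \beta)/\alpha] = (\alpha \to \beta) \to \mathrm{bool}$. The formal definition is much simpler than at the term level, because we do not have to worry about bindings. At the same time, it is convenient to extend it to multiple parallel substitutions (writing $\theta$ for the whole substitution in the last clause):

$$
\begin{aligned}
\alpha_i[\tau_1/\alpha_1, \ldots, \tau_k/\alpha_k] &= \tau_i && \text{for } 1 \le i \le k \\
\beta[\tau_1/\alpha_1, \ldots, \tau_k/\alpha_k] &= \beta && \text{if } \alpha_i \ne \beta \text{ for } 1 \le i \le k \\
(\sigma_1, \ldots, \sigma_n)\mathit{con}[\theta] &= (\sigma_1[\theta], \ldots, \sigma_n[\theta])\mathit{con}
\end{aligned}
$$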

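For simplicity we treat type constants as nullary constructors, e.g. consider $()\mathrm{int}$ rather than $\mathrm{int}$, but if preferred it is easy to include them as a special case. Given this definition of substitution, we can define the notion of a type $\sigma$ being more general than a type $\sigma'$, written $\sigma \preceq \sigma'$. This is defined to be true precisely if there is a set of substitutions $\theta$ such that $\sigma' = \sigma\,\theta$. For example we have:

$$
\begin{aligned}
\alpha &\preceq \sigma \\
\alpha \to \alpha &\preceq \beta \to \beta \\
\alpha \to \mathrm{bool} &\preceq (\beta \to \beta) \to \mathrm{bool} \\
\beta \to \alpha &\preceq \alpha \to \beta \\
\alpha \to \alpha &\not\preceq (\beta \to \beta) \to \beta
\end{aligned}
$$

Footnote: This relation is reflexive, so we should really say "at least as general as" rather than "more general than".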

Now we come to the main theorem:

Theorem 4.4. Every typable term has a principal type, i.e. if $t : \tau$, then there is some $\sigma$ such that $t : \sigma$ and for any $\sigma'$, if $t : \sigma'$ then $\sigma \preceq \sigma'$.

It is easy to see that the relation $\preceq$ is a preorder relation, i.e. is reflexive and transitive. The principal type is not unique, but it is unique up to renaming of type variables. More precisely, if $\sigma$ and $\tau$ are both principal types for an expression, then $\sigma \sim \tau$, meaning that $\sigma \preceq \tau$ and $\tau \preceq \sigma$.

We will not even sketch a proof of the principal type theorem; it is not particularly difficult, but is rather long. What is important to understand is that the proof gives a concrete procedure for finding a principal type. This procedure is known as Milner's algorithm, or often the Hindley-Milner algorithm. All implementations of ML, and several other functional languages, incorporate a version of this algorithm, so expressions can automatically be allocated a principal type, or rejected as ill-typed.
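To make parallel substitution and the "at least as general as" relation concrete, here is a small sketch in present-day OCaml. It is not Milner's algorithm (there is no unification and no type inference); it only checks, by one-sided matching, whether one type is a substitution instance of another. The datatype `ty` and the functions `subst`, `match_ty` and `at_least_as_general` are hypothetical names introduced for this illustration.

```ocaml
(* Types: variables, and constructors applied to argument lists.
   Type constants are nullary constructors, e.g. int = TCon ("int", []). *)
type ty =
  | TVar of string
  | TCon of string * ty list

(* Parallel substitution theta, given as an association list from
   variable names to types. Variables not mentioned are left alone. *)
let rec subst (theta : (string * ty) list) (t : ty) : ty =
  match t with
  | TVar a -> (try List.assoc a theta with Not_found -> t)
  | TCon (c, args) -> TCon (c, List.map (subst theta) args)

(* Try to extend theta so that applying it to [general] gives [specific].
   Variables on the [specific] side are treated as fixed. *)
let rec match_ty theta general specific =
  match general, specific with
  | TVar a, _ ->
      (match List.assoc_opt a theta with
       | None -> Some ((a, specific) :: theta)
       | Some t -> if t = specific then Some theta else None)
  | TCon (c, args), TCon (d, args') ->
      if c = d && List.length args = List.length args'
      then match_all theta args args'
      else None
  | TCon _, TVar _ -> None

and match_all theta gs ss =
  match gs, ss with
  | [], [] -> Some theta
  | g :: gs', s :: ss' ->
      (match match_ty theta g s with
       | None -> None
       | Some theta' -> match_all theta' gs' ss')
  | _ -> None

(* sigma is at least as general as tau if tau is an instance of sigma. *)
let at_least_as_general sigma tau = match_ty [] sigma tau <> None

(* A few types to experiment with. *)
let fn a b = TCon ("->", [a; b])
let bool_ty = TCon ("bool", [])
let alpha = TVar "a"
let beta = TVar "b"
```

With these definitions, `at_least_as_general (fn alpha alpha) (fn bool_ty bool_ty)` holds, while the reverse direction does not, since a constant cannot be instantiated to a variable.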

4.3 Strong normalization

Recall our examples of terms with no normal form, such as:

$$((\lambda x.\, x\ x\ x)\ (\lambda x.\, x\ x\ x)) \to ((\lambda x.\, x\ x\ x)\ (\lambda x.\, x\ x\ x)\ (\lambda x.\, x\ x\ x)) \to \cdots$$

In typed lambda calculus, this cannot happen by virtue of the following strong normalization theorem, whose proof is too long for us to give here.

Theorem 4.5. Every typable term has a normal form, and every possible reduction sequence starting from a typable term terminates.

At first sight this looks good: a functional program respecting our type discipline can be evaluated in any order, and it will always terminate in the unique normal form. (The uniqueness comes from the Church-Rosser theorem, which is still true in the typed system.) However, the ability to write nonterminating functions is essential to Turing completeness, so we are no longer able to define all computable functions, not even all total ones.

Footnote: The above principal type theorem in the form we have presented it is due to Milner, but the theorem and an associated algorithm had earlier been discovered by Hindley in a similar system of combinatory logic.

Footnote: Weak normalization means the first part only, i.e. every term can be reduced to normal form, but there may still be reduction sequences that fail to terminate.

This might not matter if we could define all "practically interesting" functions. However this is not so; the class of functions definable as typed lambda terms is fairly limited. Schwichtenberg has proved that if Church numerals are used, then the definable functions are those that are polynomials or result from polynomials using definition by cases. Note that this is strictly intensional: if another numeral representation is used, a different class of functions is obtained. In any case, it is not enough for a general programming language.

Since all definable functions are total, we are clearly unable to make arbitrary recursive definitions. Indeed, the usual fixpoint combinators must be untypable; $Y = \lambda f.\, (\lambda x.\, f (x\ x)) (\lambda x.\, f (x\ x))$ clearly isn't well-typed because it applies $x$ to itself and $x$ is bound by a lambda. In order to regain Turing-completeness we simply add a way of defining arbitrary recursive functions that is well-typed. We introduce a polymorphic recursion operator with all types of the form:

$$Rec : ((\sigma \to \tau) \to (\sigma \to \tau)) \to \sigma \to \tau$$

and the extra reduction rule that for any $F : (\sigma \to \tau) \to (\sigma \to \tau)$ we have:

$$Rec\ F \to F\ (Rec\ F)$$

From now on, we will suppose that recursive definitions, signalled by "let rec", map down to the use of these recursion operators.

Further reading

Barendregt and Hindley and Seldin also consider typed lambda calculus. The original paper by Milner is still a good source of information about polymorphic typing and the algorithm for finding principal types. A nice elementary treatment of typed lambda calculus, which gives a proof of Strong Normalization and discusses some interesting connections with logic, is Girard, Lafont, and Taylor. This also discusses a more advanced version of typed lambda calculus called "System F" where, even though strong normalization holds, the majority of "interesting" functions are definable.

Exercises

1. Is it true that if $\Gamma \vdash t : \sigma$ then for any substitution $\theta$ we have $\Gamma \vdash t : (\sigma\,\theta)$?

2. Prove formally the type preservation theorems for $\alpha$ and $\beta$ conversion.

3. Show that the converse to type preservation does not hold, i.e. it is possible that $t \to t'$ and $\Gamma \vdash t' : \sigma$ but not $\Gamma \vdash t : \sigma$.

4. Prove that every term of typed lambda calculus with principal type $(\alpha \to \alpha) \to (\alpha \to \alpha)$ reduces to a Church numeral.

5. To what extent can the typechecking process be reversed, i.e. the term inferred from the type? For instance, is it true that in pure typed lambda calculus with type variables but no constants and no recursion operator, any $t : \alpha \to \beta \to \alpha$ is in fact equal to $K = \lambda x\ y.\, x$ in the usual sense of lambda equality? If so, how far can this be generalized?

6. We will say that an arbitrary reduction relation $\to$ is weakly Church-Rosser if whenever $t \to t_1$ and $t \to t_2$, then there is a $u$ with $t_1 \to^* u$ and $t_2 \to^* u$, where $\to^*$ is the reflexive transitive closure of $\to$. Prove Newman's Lemma, which states that if a relation is weakly Church-Rosser and obeys strong normalization, then $\to^*$ is Church-Rosser. (Hint: use a form of wellfounded induction.)

Footnote (on total computable functions): Given any recursive enumeration of total computable functions, we can always create one not in the list by diagonalization. This will be illustrated in the Computability Theory course.

Footnote (to exercise 5): See Mairson for more on this question.

Footnote (to exercise 6): If you get stuck, there is a nice proof in Huet.

Chapter 5

A taste of ML

In the last few chapters, we have started with pure lambda calculus and systematically extended it with new primitive features. For example, we added the "let" construct as primitive for the sake of making the polymorphic typing more useful, and a recursion operation in order to restore the computational power that was lost when adding types. By travelling further along this path, we will eventually arrive at ML, although it's still valuable to retain the simple worldview of typed lambda calculus.

The next stage is to abandon the encodings of datatypes like booleans and natural numbers via lambda terms, and instead make them primitive. That is, we have new primitive types like bool and int (actually positive and negative integers), and new type constructors like $\times$, a step we have in many ways anticipated in the last chapter. Associated with these are new constants and new conversion rules. For example, the expression $2 + 2$ is evaluated using machine arithmetic, rather than by translation into Church numerals and performing $\beta$-conversions. These additional conversions, regarded as augmenting the usual lambda operations, are often referred to as "$\delta$-conversions". We will see in due course how the ML language is extended in other ways in comparison with pure lambda calculus. First, we must address the fundamental question of ML's evaluation strategy.

5.1 Eager evaluation

We have said that from a theoretical point of view, normal order (top down, left-to-right) reduction of expressions is to be preferred, since if any strategy terminates, this one will. However, it has some practical defects. For example, consider the following:

$$(\lambda x.\, x + x + x)\ (10 + 5)$$

A normal order reduction results in $(10 + 5) + (10 + 5) + (10 + 5)$, and the following reduction steps need to evaluate three separate instances of the same expression. This is unacceptable in practice. There are two main solutions to this problem, and these solutions divide the world of functional programming languages into two camps.

Footnote: This strategy is familiar from a few traditional languages like Algol 60, where it is referred to as call by name.

The first alternative is to stick with normal order reduction, but attempt to optimize the implementation so multiple subexpressions arising in this way are shared, and never evaluated more than once. Internally, expressions are represented as directed acyclic graphs rather than as trees. This is known as lazy or call-by-need evaluation, since expressions are only evaluated when absolutely necessary.

The second approach is to turn the theoretical considerations about reduction strategy on their head and evaluate arguments to functions before passing the values to the function. This is known as applicative order or eager evaluation. The latter name arises because function arguments are evaluated even when they may be unnecessary, e.g. $t$ in $(\lambda x.\, y)\ t$. Of course, eager evaluation means that some expressions may loop that would terminate in a lazy regime. But this is considered acceptable, since these instances are generally easy to avoid in practice. In any case, an eager evaluation strategy is the standard in many programming languages like C, where it is referred to as call by value.

ML adopts eager evaluation, for two main reasons. The first is that choreographing the reductions and sharings that occur in lazy evaluation is quite tricky, and implementations tend to be relatively inefficient and complicated. Unless the programmer is very careful, memory can fill up with pending unevaluated expressions, and in general it is hard to understand the space behaviour of programs. In fact, many implementations of lazy evaluation try to optimize it to eager evaluation in cases where there is no semantic difference. By contrast, in ML, we always first evaluate the arguments to functions and only then perform the $\beta$-reduction; this is simple and efficient, and is easy to implement using standard compiler technology.
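The cost difference between the three strategies can be observed by counting how often an argument expression is computed. The sketch below uses present-day OCaml (an assumption; the course itself uses CAML Light), simulating call by name with explicit thunks and call by need with the standard `Lazy` module. The names `costly`, `by_value`, `by_name`, `by_need` and `count` are illustrative only.

```ocaml
(* Counts how many times the argument expression has been computed. *)
let evals = ref 0

(* Stand-in for an argument expression such as 10 + 5. *)
let costly () = incr evals; 10 + 5

(* Call by value: the argument arrives already evaluated. *)
let by_value x = x + x + x

(* Call by name: the argument is a thunk, recomputed at each use. *)
let by_name x = x () + x () + x ()

(* Call by need: a Lazy.t value is computed at most once, on first use. *)
let by_need x = Lazy.force x + Lazy.force x + Lazy.force x

(* Run f with a fresh counter; return its result and the evaluation count. *)
let count f =
  evals := 0;
  let r = f () in
  (r, !evals)

let value_result = count (fun () -> by_value (costly ()))
let name_result = count (fun () -> by_name costly)
let need_result = count (fun () -> by_need (lazy (costly ())))

let () =
  List.iter
    (fun (label, (r, n)) -> Printf.printf "%s: result %d, %d evaluation(s)\n" label r n)
    [ ("by value", value_result); ("by name", name_result); ("by need", need_result) ]
```

All three calls return the same sum; only the number of times the argument is evaluated differs, which is the practical defect of unshared normal order reduction described above.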

The second reason for preferring applicative order evaluation is that ML is not a pure functional language, but includes imperative features (variables, assignments etc.). Therefore the order of evaluation of subexpressions can make a semantic difference, rather than merely affecting efficiency. If lazy evaluation is used, it seems to become practically impossible for the programmer to visualize, in a nontrivial program, exactly when each subexpression gets evaluated. In the eager ML system, one just needs to remember the simple evaluation rule.

It is important to realize, however, that the ML evaluation strategy is not simply bottom-up, the exact opposite of normal order reduction. In fact, ML never evaluates underneath a lambda. (In particular, it never reduces $\eta$-redexes, only $\beta$-redexes.) In evaluating $(\lambda x.\, s[x])\ t$, $t$ is evaluated first. However, $s[x]$ is not touched since it is underneath a lambda, and any subexpressions of $t$ that happen to be in the scope of a lambda are also left alone. The precise evaluation rules are as follows:

Footnote (on optimizing lazy evaluation): They try to perform strictness analysis, a form of static analysis that can often discover that arguments are necessarily going to be evaluated (Mycroft).

• Constants evaluate to themselves.

• Evaluation stops immediately at $\lambda$-abstractions and does not look inside them. In particular, there is no $\eta$-conversion.

• When evaluating a combination $s\ t$, first both $s$ and $t$ are evaluated. Then, assuming that the evaluated form of $s$ is a lambda abstraction, a toplevel $\beta$-conversion is performed and the process is repeated.

The order of evaluation of $s$ and $t$ varies between versions of ML. In the version we will use, $t$ is always evaluated first. Strictly, we should also specify a rule for let expressions, since as we have said they are by now accepted as a primitive. However, from the point of view of evaluation they can be understood in the original way, as a lambda applied to an argument, with the understanding that the rand (the argument) will be evaluated first. To make this explicit, the rule for

$$\mathrm{let}\ x = s\ \mathrm{in}\ t$$

is that first of all $s$ is evaluated, the resulting value is substituted for $x$ in $t$, and finally the new version of $t$ is evaluated. Let us see some examples of evaluating expressions:

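$$
\begin{aligned}
(\lambda x.\, (\lambda y.\, y + y)\ x)\ (2 + 2) &\to (\lambda x.\, (\lambda y.\, y + y)\ x)\ 4 \\
&\to (\lambda y.\, y + y)\ 4 \\
&\to 4 + 4 \\
&\to 8
\end{aligned}
$$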

Note that the subterm $(\lambda y.\, y + y)\ x$ is not reduced, since it is within the scope of a lambda. However, terms that are reducible and not enclosed by a lambda in both function and argument get reduced before the function application itself is evaluated, e.g. the second step in the following:

$$
\begin{aligned}
((\lambda f\ x.\, f\ x)\ (\lambda y.\, y + y))\ (2 + 2) &\to ((\lambda f\ x.\, f\ x)\ (\lambda y.\, y + y))\ 4 \\
&\to (\lambda x.\, (\lambda y.\, y + y)\ x)\ 4 \\
&\to (\lambda y.\, y + y)\ 4 \\
&\to 4 + 4
\end{aligned}
$$


The fact that ML does not evaluate under lambdas is of crucial importance to advanced ML programmers. It gives precise control over the evaluation of expressions, and can be used to mimic many of the helpful cases of lazy evaluation. We shall see a very simple instance in the next section.
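As a small illustration of using a lambda to postpone evaluation, the following sketch (present-day OCaml, assumed here in place of CAML Light) records which argument expressions actually get evaluated. The names `trace`, `noisy` and `second` are hypothetical helpers for this example only.

```ocaml
(* Records the labels of the expressions that were actually evaluated. *)
let trace : string list ref = ref []

(* Returns v, noting in the trace that this expression was evaluated. *)
let noisy label v = trace := label :: !trace; v

(* Ignores its first argument, in the manner of (fun x y -> y). *)
let second _ y = y

(* Eager: the unused argument is evaluated anyway.  The trace is sorted
   because the order in which the two arguments are evaluated is not
   something this sketch relies on. *)
let eager_result = second (noisy "unused" 0) (noisy "used" 1)
let eager_trace = List.sort compare !trace

(* Delayed: the unused argument is wrapped in a lambda.  Evaluation stops
   at the lambda, and since nothing applies it, its body never runs. *)
let () = trace := []
let delayed_result = second (fun () -> noisy "unused" 0) (noisy "used" 1)
let delayed_trace = !trace
```

Both calls return the same value, but only the eager version does the unnecessary work; wrapping an expression as `fun () -> ...` is the usual way to obtain this kind of control.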

5.2 Consequences of eager evaluation

The use of eager evaluation forces us to make additional features primitive, with their own special reduction procedures, rather than implementing them directly in terms of lambda calculus. In particular, we can no longer regard the conditional construct:

    if b then e1 else e2

as the application of an ordinary ternary operator:

    COND b e1 e2

The reason is that in any particular instance, because of eager evaluation, we evaluate all the expressions b, e1 and e2 first, before evaluating the body of the COND. Usually, this is disastrous. For example, recall our definition of the factorial function:

    let rec fact(n) = if ISZERO n then 1 else n * fact(PRE n)

If the conditional evaluates all its arguments, then in order to evaluate fact(0), the "else" arm must be evaluated, in its turn causing fact(PRE 0) to be evaluated. This in its turn causes the evaluation of fact(PRE (PRE 0)), and so on. Consequently the computation will loop indefinitely.

Accordingly, we make the conditional a new primitive construct, and modify the usual reduction strategy so that first the boolean expression is evaluated, and only then is exactly one of the two arms evaluated, as appropriate.
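The difference between a primitive conditional and an ordinary function can be seen in a short experiment. This is a sketch in present-day OCaml (an assumption); `cond`, `cond_delayed`, `arm` and `evaluated` are names made up for the illustration.

```ocaml
(* The conditional as an ordinary function: under eager evaluation both
   arms are evaluated before the body of cond ever runs. *)
let cond b e1 e2 = if b then e1 else e2

(* Count how many arms get evaluated in a call to cond. *)
let evaluated = ref 0
let arm v = incr evaluated; v
let picked = cond true (arm 1) (arm 2)
let arms_evaluated = !evaluated

(* Writing factorial with cond would therefore never terminate:

     let rec bad_fact n = cond (n = 0) 1 (n * bad_fact (n - 1))

   It is left commented out; calling it would recurse until the stack
   overflows. *)

(* Passing the arms as functions restores the intended behaviour, because
   evaluation stops at a lambda: only the chosen arm is ever applied. *)
let cond_delayed b e1 e2 = if b then e1 () else e2 ()

let rec fact n =
  cond_delayed (n = 0) (fun () -> 1) (fun () -> n * fact (n - 1))
```

The built-in `if ... then ... else` behaves like `cond_delayed` without the explicit lambdas, which is exactly what making the conditional primitive achieves.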

What about the process of recursion itself? We have suggested understanding recursive definitions in terms of a recursion operator Rec with its own reduction rule:

$$Rec\ f \to f (Rec\ f)$$

This would also loop using an eager evaluation strategy:

$$Rec\ f \to f (Rec\ f) \to f (f (Rec\ f)) \to f (f (f (Rec\ f))) \to \cdots$$

However, we only need a very simple modification to the reduction rule to make things work correctly:

$$Rec\ f \to f (\lambda x.\, Rec\ f\ x)$$

Now the lambda on the right means that $\lambda x.\, Rec\ f\ x$ evaluates to itself, and so only after the expression has been further reduced following substitution in the body of $f$ will evaluation continue.
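The modified rule can be transcribed almost literally. The sketch below is in present-day OCaml (assumed, rather than CAML Light), and `rec_op` is a hypothetical name for the operator; it uses the language's own `let rec` to define the operator, so it illustrates the reduction behaviour rather than explaining recursion from scratch.

```ocaml
(* Eager-safe recursion operator, following the rule
   Rec f --> f (fun x -> Rec f x).
   The inner lambda is a value, so the recursive call is only made when
   f's body actually applies its first argument. *)
let rec rec_op f = f (fun x -> rec_op f x)

(* The naive rule Rec f --> f (Rec f) would be written
     let rec bad_rec f = f (bad_rec f)
   and loops under eager evaluation as soon as it is applied, because the
   argument bad_rec f is evaluated before f is entered. *)

(* Factorial written without let rec, via the operator. *)
let fact = rec_op (fun self n -> if n = 0 then 1 else n * self (n - 1))

let () = Printf.printf "fact 5 = %d\n" (fact 5)
```

Here `rec_op` receives the type `(('a -> 'b) -> 'a -> 'b) -> 'a -> 'b`, which has the same shape as the type given for Rec in the previous chapter.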

5.3 The ML family

We have been talking about "ML" as if it were a single language. In fact there are many variants of ML, even including "Lazy ML", an implementation from Chalmers University in Sweden that is based on lazy evaluation. The most popular version of ML in education is "Standard ML", but we will use another version, called CAML ("camel") Light. We chose CAML Light for several reasons:

• The implementation is small and highly portable, so can be run effectively on Unix machines, PCs, Macs etc.

• The system is simple, syntactically and semantically, making it relatively easy to approach.

• The system is well suited to practical programming, e.g. it can easily be interfaced to C and supports standard make-compatible separate compilation.

However, we will be teaching quite general techniques, and any code we write can be run with a few minor syntactic changes in any version of ML, and indeed often in other functional languages.

5.4 Starting up ML

ML has already been installed on Thor. In order to use it, you need to put the CAML binary directory on your PATH. This can be done (assuming you use the "bash" shell or one of its family), as follows:

    PATH="$PATH:/home/jrh��/caml/bin"
    export PATH

To avoid doing this every time you log on, you can insert these lines near the end of your .bash_profile, or the equivalent for your favourite shell. Now to use CAML in its normal interactive and interpretive mode, you simply need to type camllight, and the system should fire up and present its prompt (#):

    $ camllight
    Caml Light version ��

Footnote (on the name CAML): The name stands for "Categorical Abstract Machine", the underlying implementation method.

In order to exit the system, simply type ctrl/d or quit();; at the prompt. If you are interested in installing CAML Light on your own machine, you should consult the following Web page for detailed information:

    http://pauillac.inria.fr/caml/

5.5 Interacting with ML

When ML presents you with its prompt, you can type in expressions, terminated by two successive semicolons, and it will evaluate them and print the result. In computing jargon, the ML system sits in a read-eval-print loop: it repeatedly reads an expression, evaluates it, and prints the result. For example, ML can be used as a simple calculator:

    #10 + 5;;
    - : int = 15

The system not only returns the answer, but also the type of the expression, which it has inferred automatically. It can do this because it knows the type of the built-in addition operator +. On the other hand, if an expression is not typable, the system will reject it, and try to give some idea about how the types fail to match up. In complicated cases, the error messages can be quite tricky to understand.

    #1 + true;;
    Toplevel input:
    let it = 1 + true;;
    This expression has type bool,
    but is used with type int.

Since ML is a functional language, expressions are allowed to have function type. The ML syntax for a lambda abstraction $\lambda x.\, t[x]$ is fun x -> t[x]. For example we can define the successor function:

    #fun x -> x + 1;;
    - : int -> int = <fun>


Again, the type of the expression, this time int -> int, is inferred and displayed. However, the function itself is not printed; the system merely writes <fun>. This is because, in general, the internal representations of functions are not very readable. Functions are applied to arguments just as in lambda calculus, by juxtaposition. For example:

    #(fun x -> x + 1) 4;;
    - : int = 5

Again as in lambda calculus, function application associates to the left, and you can write curried functions using the same convention of eliding multiple $\lambda$'s (i.e. fun's). For example, the following are all equivalent:

    #((fun x -> (fun y -> x + y)) 1) 2;;
    - : int = 3
    #(fun x -> fun y -> x + y) 1 2;;
    - : int = 3
    #(fun x y -> x + y) 1 2;;
    - : int = 3

5.6 Bindings and declarations

It is not convenient, of course, to evaluate the whole expression in one go; rather, we want to use let to bind useful subexpressions to names. This can be done as follows:

    #let successor = fun x -> x + 1 in
     successor(successor(successor 0));;
    - : int = 3

Function bindings may use the more elegant sugaring:

    #let successor x = x + 1 in
     successor(successor(successor 0));;
    - : int = 3

and can be made recursive just by adding the rec keyword:

    #let rec fact n = if n = 0 then 1
                     else n * fact(n - 1) in
     fact 6;;
    - : int = 720

By using and, we can make several bindings simultaneously, and define mutually recursive functions. For example, here are two simple, though highly inefficient, functions to decide whether or not a natural number is odd or even:

    #let rec even n = if n = 0 then true else odd (n - 1)
     and odd n = if n = 0 then false else even (n - 1);;
    even : int -> bool = <fun>
    odd : int -> bool = <fun>
    #even 12;;
    - : bool = true
    #odd 14;;
    - : bool = false

Footnote (on the printing of functions as <fun>): CAML does not store functions simply as syntax trees, but compiles them into bytecode.

In fact, any bindings can be performed separately from the final application. ML remembers a set of variable bindings, and the user can add to that set interactively. Simply omit the in and terminate the phrase with a double semicolon:

    #let successor = fun x -> x + 1;;
    successor : int -> int = <fun>

After this declaration, any later phrase may use the function successor, e.g.

    #successor 11;;
    - : int = 12

Note that we are not making assignments to variables. Each binding is only done once when the system analyses the input; it cannot be repeated or modified. It can be overwritten by a new definition using the same name, but this is not assignment in the usual sense, since the sequence of events is only connected with the compilation process, not with the dynamics of program execution. Indeed, apart from the more interactive feedback from the system, we could equally replace all the double semicolons after the declarations by in and evaluate everything at once. On this view we can see that the overwriting of a declaration really corresponds to the definition of a new local variable that hides the outer one, according to the usual lambda calculus rules. For example:

    #let x = 1;;
    x : int = 1
    #let y = 2;;
    y : int = 2
    #let x = 3;;
    x : int = 3
    #x + y;;
    - : int = 5

is the same as:

    #let x = 1 in
     let y = 2 in
     let x = 3 in
     x + y;;
    - : int = 5


Note carefully that, also following lambda calculus, variable binding is static, i.e. the first binding of x is still used until an inner binding occurs, and any uses of it until that point are not affected by the inner binding. For example:

    #let x = 1;;
    x : int = 1
    #let f w = w + x;;
    f : int -> int = <fun>
    #let x = 2;;
    x : int = 2
    #f 0;;
    - : int = 1

The first version of LISP, however, used dynamic binding, where a rebinding of a variable propagated to earlier uses of the variable, so that the analogous sequence to the above would return 2. This was in fact originally regarded as a bug, but soon programmers started to appreciate its convenience. It meant that when some low-level function was modified, the change propagated automatically to all applications of it in higher level functions, without the need for recompilation. The feature survived for a long time in many LISP dialects, but eventually the view that static binding is better prevailed. In Common LISP, static binding is the default, but dynamic binding is available if desired via the keyword special.

5.7 Polymorphic functions

We can define polymorphic functions, like the identity operator:

    #let I = fun x -> x;;
    I : 'a -> 'a = <fun>

ML prints type variables as 'a, 'b etc. These are supposed to be ASCII representations of $\alpha$, $\beta$ and so on. We can now use the polymorphic function several times with different types:

    #I true;;
    - : bool = true
    #I 1;;
    - : int = 1
    #I I I I 12;;
    - : int = 12

Each instance of I in the last expression has a different type, and intuitively corresponds to a different function. In fact, let's define all the combinators:

    #let I x = x;;
    I : 'a -> 'a = <fun>
    #let K x y = x;;
    K : 'a -> 'b -> 'a = <fun>
    #let S f g x = (f x) (g x);;
    S : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c = <fun>

Note that the system keeps track of the types for us, even though in the last case they were quite complicated. Now, recall that I = S K K; let us try this out in ML:

    #let I' = S K K;;
    I' : '_a -> '_a = <fun>

It has the right type, and it may easily be checked in all concrete cases, e.g.

    #I' 3 = 3;;
    - : bool = true

In the above examples of polymorphic functions, the system very quickly infers a most general type for each expression, and the type it infers is simple. This usually happens in practice, but there are pathological cases, e.g. the following example due to Mairson. The type of this expression takes about �� seconds to calculate, and occupies over �� lines on an ��-column terminal.

    let pair x y = fun z -> z x y in
    let x1 = fun y -> pair y y in
    let x2 = fun y -> x1(x1 y) in
    let x3 = fun y -> x2(x2 y) in
    let x4 = fun y -> x3(x3 y) in
    let x5 = fun y -> x4(x4 y) in
    x5(fun z -> z);;

We have said that the ML programmer need never enter a type. This is true in the sense that ML will already allocate as general a type as possible to an expression. However, it may sometimes be convenient to restrict the generality of a type. This cannot make code work that didn't work before, but it may serve as documentation regarding the intended purpose of the code; it is also possible to use shorter synonyms for complicated types. Type restriction can be achieved in ML by adding type annotations after some expressions. These type annotations consist of a colon followed by a type. It usually doesn't matter exactly where these annotations are added, provided they enforce the appropriate constraints. For example, here are some alternative ways of constraining the identity function to type int -> int:

Footnote (on I = S K K): We remarked that from the untyped point of view, S K A = I for any A. However, the reader may try, for example, S K S and see that the principal type is less general than expected.

Footnote (on the type of I'): Ignore the underscores for now. This is connected with the typing of imperative features, and we will discuss it later.


    #let I (x:int) = x;;
    I : int -> int = <fun>
    #let I x = (x:int);;
    I : int -> int = <fun>
    #let (I:int->int) = fun x -> x;;
    I : int -> int = <fun>
    #let I = fun (x:int) -> x;;
    I : int -> int = <fun>
    #let I = ((fun x -> x):int->int);;
    I : int -> int = <fun>

5.8 Equality of functions

Instead of comparing the actions of I and I' on particular arguments like 3, it would seem that we can settle the matter definitively by comparing the functions themselves. However, this doesn't work:

    #I' = I;;
    Uncaught exception: Invalid_argument "equal: functional value"

It is in general forbidden to compare functions for equality, though a few special instances, where the functions are obviously the same, yield true:

    #let f x = x + 1;;
    f : int -> int = <fun>
    #let g x = x + 1;;
    g : int -> int = <fun>
    #f = f;;
    - : bool = true
    #f = g;;
    Uncaught exception: Invalid_argument "equal: functional value"
    #let h = g;;
    h : int -> int = <fun>
    #h = f;;
    Uncaught exception: Invalid_argument "equal: functional value"
    #h = g;;
    - : bool = true

Why these restrictions? Aren't functions supposed to be first-class objects in ML? Yes, but unfortunately, (extensional) function equality is not computable. This follows from a number of classic theorems in recursion theory, such as the unsolvability of the halting problem and Rice's theorem. Let us give a concrete illustration of why this might be so. It is still an open problem whether the following function terminates for all arguments, the assertion that it does being known as the Collatz conjecture:

Footnote (on the undecidability results): You will see these results proved in the Computation Theory course. Rice's theorem is an extremely strong undecidability result which asserts that any nontrivial property of the function corresponding to a program is uncomputable from its text. An excellent computation theory textbook is Davis, Sigal and Weyuker.

let rec collatz n �if n �� then else if even�n� then collatz�n � ��else collatz�� n � ��

collatz � int � int � �fun

What is clear, though, is that if it does halt it returns 0. Now consider the following trivial function:

  #let f (x:int) = 0;;
  f : int -> int = <fun>

By deciding the equation collatz = f, the computer would settle the Collatz conjecture. It is easy to concoct other examples for open mathematical problems.
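The runtime refusal to compare functions can be observed directly. The following is only a sketch in present-day OCaml rather than CAML Light; the helper try_equal is a hypothetical name introduced for this illustration, and the exception message in OCaml differs slightly from the one shown in these notes.

```ocaml
(* Two separately defined functions with identical text. *)
let f x = x + 1
let g x = x + 1

(* Hypothetical helper: attempt the structural equality test and report
   None if the runtime refuses to compare functional values. *)
let try_equal (p : int -> int) (q : int -> int) : bool option =
  try Some (p = q) with Invalid_argument _ -> None

let () =
  match try_equal f g with
  | Some b -> Printf.printf "comparison succeeded: %b\n" b
  | None -> print_endline "comparison refused: functional value"
```

Wrapping the test in an option makes the failure visible as an ordinary value instead of an uncaught exception.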

It is possible to trap out applications of the equality operator to functions and datatypes built up from them as part of typechecking, rather than at runtime. Types that do not involve functions in these ways are known as equality types, since it is always valid to test objects of such types for equality. On the negative side, this makes the type system much more complicated. However one might argue that static typechecking should be extended as far as feasibility allows.

Further reading

Many books on functional programming bear on the general issues we have discussed here, such as evaluation strategy. A good elementary introduction to CAML Light and functional programming is Mauny (1995). Paulson (1991) is another good textbook, though based on Standard ML.

Exercises

1. Suppose that a 'conditional' function defined by ite(b,x,y) = if b then x else y is the only function available that operates on arguments of type bool. Is there any way to write a factorial function?

2. Use the typing rules in the last chapter to write out a formal proof that the S combinator has the type indicated by the ML system.

3. Write a simple recursively defined function to perform exponentiation of integers, i.e. calculate $x^n$ for $n \geq 0$. Write down two ML functions, the equality of which corresponds to the truth of Fermat's last theorem: there are no integers $x$, $y$, $z$ and natural number $n > 2$ with $x^n + y^n = z^n$ except for the trivial case where $x = 0$ or $y = 0$.

Footnote (on the Collatz conjecture): A good survey of this problem, and attempts to solve it, is given by Lagarias (1985). Strictly, we should use unlimited precision integers rather than machine arithmetic. We will see later how to do this.

Chapter 6

Further ML

In this chapter, we consolidate the previous examples by specifying the basic facilities of ML and the syntax of phrases more precisely, and then go on to treat some additional features such as recursive types. We might start by saying more about interaction with the system.

So far, we have just been typing phrases into ML's toplevel read-eval-print loop and observing the result. However this is not a good method for writing nontrivial programs. Typically, you should write the expressions and declarations in a file. To try things out as you go, they can be inserted in the ML window using 'cut and paste'. This operation can be performed using X-windows and similar systems, or in an editor like Emacs with multiple buffers. However, this becomes laborious and time-consuming for large programs. Instead, you can use ML's include function to read in the file directly. For example, if the file myprog.ml contains:

  let pythag x y z =
    x * x + y * y = z * z;;

  pythag 3 4 5;;

  pythag 5 12 13;;

  pythag 1 2 3;;

then the toplevel phrase include "myprog.ml";; results in:

  #include "myprog.ml";;
  pythag : int -> int -> int -> bool = <fun>
  - : bool = true
  - : bool = true
  - : bool = false
  - : unit = ()


That is, the ML system responds just as if the phrases had been entered at the top level. The final line is the result of evaluating the include expression itself.

In large programs, it is often helpful to include comments. In ML, these are written between the symbols (* and *), e.g.

  (* This function tests if (x,y,z) is a Pythagorean triple *)

  let pythag x y z =
    x * x + y * y = z * z;;

  (*comments*) pythag (*can*) 3 (*go*) 4 (*almost*) 5 (*anywhere*)
  (* and (* can (* be (* nested *) quite *) arbitrarily *) *);;

6.1 Basic datatypes and operations

ML features several built-in primitive types. From these, composite types may be built using various type constructors. For the moment, we will only use the function space constructor -> and the Cartesian product constructor *, but we will see in due course which others are provided, and how to define new types and type constructors. The primitive types that concern us now are:

• The type unit. This is a 1-element type, whose only element is written (). Obviously, something of type unit conveys no information, so it is commonly used as the return type of imperatively written 'functions' that perform a side-effect, such as include above. It is also a convenient argument where the only use of a function type is to delay evaluation.

• The type bool. This is a 2-element type of booleans (truth-values) whose elements are written true and false.

• The type int. This contains some finite subset of the positive and negative integers. Typically the permitted range is from $-2^{30}$ up to $2^{30} - 1$. The numerals are written in the usual way, optionally with a negation sign, e.g. 0, 32, -25.

• The type string contains strings (i.e. finite sequences) of characters. They are written and printed between double quotes, e.g. "hello". In order to include special characters in strings, C-like escape sequences are used. For example, \" is the double quote itself, and \n is the newline character.

Footnote: This is even more limited than expected for machine arithmetic because a bit is stolen for the garbage collector. We will see later how to use an alternative type of integers with unlimited precision.
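The remark about using unit to delay evaluation can be sketched as follows. This is written in present-day OCaml rather than CAML Light, and the names evaluated and delayed are hypothetical, chosen only for this illustration.

```ocaml
(* Records whether the body of the delayed computation has run yet. *)
let evaluated = ref false

(* A function of type unit -> int: its body runs only when () is supplied. *)
let delayed () =
  evaluated := true;
  6 * 7

let before_call = !evaluated   (* defining delayed has not run its body *)
let result = delayed ()        (* applying it to () forces the computation *)
let after_call = !evaluated    (* the side effect has now happened *)
```

Because ML evaluates eagerly, hiding an expression under a function of a unit argument is the simplest way to postpone it.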


The above values like (), false, 7 and "caml" are all to be regarded as constants in the lambda calculus sense. There are other constants corresponding to operations on the basic types. Some of these may be written as infix operators, for the sake of familiarity. These have a notion of precedence so that expressions are grouped together as one would expect. For example, we write x + y rather than + x y and x < 2 * y + z rather than < x (+ (* 2 y) z). The logical operator not also has a special parsing status, in that the usual left-associativity rule is reversed for it: not not p means not (not p). User-defined functions may be granted infix status via the #infix directive. For example, here is a definition of a function performing composition of functions:

  #let successor x = x + 1;;
  successor : int -> int = <fun>
  #let o f g = fun x -> f(g x);;
  o : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>
  #let add3 = o successor (o successor successor);;
  add3 : int -> int = <fun>
  #add3 0;;
  - : int = 3
  ##infix "o";;
  #let add3' = successor o successor o successor;;
  add3' : int -> int = <fun>
  #add3' 0;;
  - : int = 3

It is not possible to specify the precedence of user-defined infixes, nor to make user-defined non-infix functions right-associative. Note that the implicit operation of 'function application' has a higher precedence than any binary operator, so successor 1 * 2 parses as (successor 1) * 2. If it is desired to use a function with special status as an ordinary constant, simply precede it by prefix. For example:

  #o successor successor;;
  Toplevel input:
  >o successor successor;;
  Syntax error.
  #prefix o successor successor;;
  - : int -> int = <fun>
  #(prefix o) successor successor;;
  - : int -> int = <fun>

With these questions of concrete syntax out of the way, let us present a systematic list of the operators on the basic types above. The unary operators are:

  Operator   Type            Meaning
  -          int -> int      Numeric negation
  not        bool -> bool    Logical negation


and the binary operators, in approximately decreasing order of precedence, are:

  Operator   Type                          Meaning
  mod        int -> int -> int             Modulus (remainder)
  *          int -> int -> int             Multiplication
  /          int -> int -> int             Truncating division
  +          int -> int -> int             Addition
  -          int -> int -> int             Subtraction
  ^          string -> string -> string    String concatenation
  =          'a -> 'a -> bool              Equality
  <>         'a -> 'a -> bool              Inequality
  <          'a -> 'a -> bool              Less than
  <=         'a -> 'a -> bool              Less than or equal
  >          'a -> 'a -> bool              Greater than
  >=         'a -> 'a -> bool              Greater than or equal
  &          bool -> bool -> bool          Boolean 'and'
  or         bool -> bool -> bool          Boolean 'or'

For example, x > 0 & x < 1 is parsed as & (> x 0) (< x 1). Note that all the comparisons, not just the equality relation, are polymorphic. They not only order integers in the expected way, and strings alphabetically, but all other primitive types and composite types in a fairly natural way. Once again, however, they are not in general allowed to be used on functions.
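A few instances of the polymorphic comparisons may make the "fairly natural" ordering concrete. The sketch below uses present-day OCaml, whose comparison operators behave in the same spirit; the binding names are hypothetical and exist only for illustration.

```ocaml
(* The same operator < is used at four different types. *)
let ints_ordered = 3 < 5                      (* usual order on integers *)
let strings_ordered = "apple" < "banana"      (* alphabetical order *)
let pairs_ordered = (1, "z") < (2, "a")       (* first components decide *)
let lists_ordered = [1; 2] < [1; 3]           (* first differing element decides *)
let bools_ordered = false < true              (* false precedes true *)

let () =
  Printf.printf "%b %b %b %b %b\n"
    ints_ordered strings_ordered pairs_ordered lists_ordered bools_ordered
```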

The two boolean operations & and or have their own special evaluation strategy, like the conditional expression. In fact, they can be regarded as synonyms for conditional expressions:

  p & q   =  if p then q else false

  p or q  =  if p then true else q

Thus, the 'and' operation evaluates its first argument, and only if it is true, evaluates its second. Conversely, the 'or' operation evaluates its first argument, and only if it is false evaluates its second.
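This evaluation strategy can be checked with a small experiment. The sketch below is in present-day OCaml, where the operators are spelled && and || rather than & and or; the counter calls and the function probe are hypothetical names used only to make evaluation observable.

```ocaml
(* Counts how many times probe has actually been evaluated. *)
let calls = ref 0

(* Returns its argument unchanged, but records that it was called. *)
let probe (b : bool) : bool =
  incr calls;
  b

let and_result = false && probe true    (* second operand is skipped *)
let or_result = true || probe false     (* second operand is skipped *)
let calls_after_skips = !calls

let both = true && probe true           (* second operand is needed here *)
let calls_total = !calls
```

An ordinary user-defined function could not behave this way under eager evaluation, since both arguments would be evaluated before the call.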

6.2 Syntax of ML phrases

Expressions in ML can be built up from constants and variables; any identifier that is not currently bound is treated as a variable. Declarations bind names to values of expressions, and declarations can occur locally inside expressions. Thus, the syntax classes of expressions and declarations are mutually recursive. We can represent this by the following BNF grammar.

  expression   ::= variable
                 | constant
                 | expression expression
                 | expression infix expression
                 | not expression
                 | if expression then expression else expression
                 | fun pattern -> expression
                 | (expression)
                 | declaration in expression

  declaration  ::= let let_bindings
                 | let rec let_bindings

  let_bindings ::= let_binding
                 | let_binding and let_bindings

  let_binding  ::= pattern = expression

  pattern      ::= variables

  variables    ::= variable
                 | variable variables

Footnote: We neglect many constructs that we won't be concerned with. A few will be introduced later. See the CAML manual for full details.

The syntax class pattern will be expanded and explained more thoroughly later on. For the moment, all the cases we are concerned with are either just variable or variable variable ... variable. In the first case we simply bind an expression to a name, while the second uses the special syntactic sugar for function declarations, where the arguments are written after the function name to the left of the equals sign. For example, the following is a valid declaration of a function add4, which can be used to add 4 to its argument:

  #let add4 x =
     let y = successor x in
     let z = let w = successor y in
             successor w in
     successor z;;
  add4 : int -> int = <fun>
  #add4 1;;
  - : int = 5

It is instructive to unravel this declaration according to the above grammar. A toplevel phrase, terminated by two successive semicolons, may be either an expression or a declaration.


6.3 Further examples

It is easy to define by recursion a function that takes a positive integer $n$ and a function $f$ and returns $f^n$, i.e. $f \circ \cdots \circ f$ ($n$ times):

  #let rec funpow n f x =
     if n = 0 then x
     else funpow (n - 1) f (f x);;
  funpow : int -> ('a -> 'a) -> 'a -> 'a = <fun>

In fact, a little thought will show that the function funpow takes a machine integer n and returns the Church numeral corresponding to n. Since functions aren't printed, we can't actually look at the lambda expression representing a Church numeral:

  #funpow 6;;
  - : ('_a -> '_a) -> '_a -> '_a = <fun>

However it is straightforward to define an inverse function to funpow that takes a Church numeral back to a machine integer:

  #let defrock n = n (fun x -> x + 1) 0;;
  defrock : ((int -> int) -> int -> 'a) -> 'a = <fun>
  #defrock(funpow 32);;
  - : int = 32

We can try out some of the arithmetic operations on Church numerals:

  #let add m n f x = m f (n f x);;
  add : ('a -> 'b -> 'c) -> ('a -> 'd -> 'b) -> 'a -> 'd -> 'c = <fun>
  #let mul m n f x = m (n f) x;;
  mul : ('a -> 'b -> 'c) -> ('d -> 'a) -> 'd -> 'b -> 'c = <fun>
  #let exp m n f x = n m f x;;
  exp : 'a -> ('a -> 'b -> 'c -> 'd) -> 'b -> 'c -> 'd = <fun>
  #let test bop x y = defrock (bop (funpow x) (funpow y));;
  test :
    ((('a -> 'a) -> 'a -> 'a) ->
     (('b -> 'b) -> 'b -> 'b) -> (int -> int) -> int -> 'c) ->
    int -> int -> 'c = <fun>
  #test add 2 10;;
  - : int = 12
  #test mul 2 10;;
  - : int = 20
  #test exp 2 10;;
  - : int = 1024

The above is not a very efficient way of performing arithmetic operations. ML does not have a function for exponentiation, but it is easy to define one by recursion:


  #let rec exp x n =
     if n = 0 then 1
     else x * exp x (n - 1);;
  exp : int -> int -> int = <fun>

However this performs $n$ multiplications to calculate $x^n$. A more efficient way is to exploit the facts that $x^{2n} = (x^n)^2$ and $x^{2n+1} = x (x^n)^2$, as follows:

  #let square x = x * x;;
  square : int -> int = <fun>
  #let rec exp x n =
     if n = 0 then 1
     else if n mod 2 = 0 then square(exp x (n / 2))
     else x * square(exp x (n / 2));;
  exp : int -> int -> int = <fun>
  ##infix "exp";;
  #2 exp 10;;
  - : int = 1024
  #2 exp 20;;
  - : int = 1048576

Another classic operation on natural numbers is to find their greatest common divisor (highest common factor) using Euclid's algorithm:

  #let rec gcd x y =
     if y = 0 then x else gcd y (x mod y);;
  gcd : int -> int -> int = <fun>
  #gcd 100 52;;
  - : int = 4
  #gcd 7 159;;
  - : int = 1
  #gcd 24 60;;
  - : int = 12

We have used a notional recursion operator Rec to explain recursive definitions. We can actually define such a thing, and use it:

  #let rec Rec f = f(fun x -> Rec f x);;
  Rec : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>
  #let fact = Rec (fun f n -> if n = 0 then 1 else n * f(n - 1));;
  fact : int -> int = <fun>
  #fact 3;;
  it : int = 6

Note, however, that the lambda was essential, otherwise the expression Rec f goes into an infinite recursion before it is even applied to its argument:

  #let rec Rec f = f(Rec f);;
  Rec : ('a -> 'a) -> 'a = <fun>
  #let fact = Rec (fun f n -> if n = 0 then 1 else n * f(n - 1));;
  Uncaught exception: Out_of_memory


6.4 Type definitions

We have promised that ML has facilities for declaring new type constructors, so that composite types can be built up out of existing ones. In fact, ML goes further and allows a composite type to be built up not only out of preexisting types but also from the composite type itself. Such types, naturally enough, are said to be recursive. They are declared using the type keyword followed by an equation indicating how the new type is built up from existing ones and itself. We will illustrate this by a few examples. The first one is the definition of a sum type, intended to correspond to the disjoint union of two existing types.

  #type ('a,'b)sum = inl of 'a | inr of 'b;;
  Type sum defined.

Roughly, an object of type ('a,'b)sum is either something of type 'a or something of type 'b. More formally, however, all these things have different types. The type declaration also declares the so-called constructors inl and inr. These are functions that take objects of the component types and inject them into the new type. Indeed, we can see their types in the ML system and apply them to objects:

  #inl;;
  - : 'a -> ('a, 'b) sum = <fun>
  #inr;;
  - : 'a -> ('b, 'a) sum = <fun>
  #inl 5;;
  - : (int, 'a) sum = inl 5
  #inr false;;
  - : ('a, bool) sum = inr false

We can visualize the situation via the following diagram. Given two existing types $\alpha$ and $\beta$, the type $(\alpha, \beta)$sum is composed precisely of separate copies of $\alpha$ and $\beta$, and the two constructors map onto the respective copies:

[Diagram: the constructors inl and inr map $\alpha$ and $\beta$ onto two separate copies inside $(\alpha, \beta)$sum.]

This is similar to a union in C, but in ML the copies of the component types are kept apart and one always knows which of these an element of the union belongs to. By contrast, in C the component types are overlapped, and the programmer is responsible for this book-keeping.

6.4.1 Pattern matching

The constructors in such a definition have three very important properties:

• They are exhaustive, i.e. every element of the new type is obtainable either by inl x for some x or inr y for some y. That is, the new type contains nothing besides copies of the component types.

• They are injective, i.e. an equality test inl x = inl y is true if and only if x = y, and similarly for inr. That is, the new type contains a faithful copy of each component type without identifying any elements.

• They are distinct, i.e. their ranges are disjoint. More concretely this means in the above example that inl x = inr y is false whatever x and y might be. That is, the copy of each component type is kept apart in the new type.
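The injectivity and distinctness properties can be spot-checked with equality tests. The sketch below is in present-day OCaml, where constructor names must be capitalised, so Inl and Inr stand in for inl and inr; the binding names are hypothetical. Such tests illustrate the properties on particular values but of course do not prove them.

```ocaml
(* OCaml counterpart of the sum type; constructors are capitalised. *)
type ('a, 'b) sum = Inl of 'a | Inr of 'b

(* Injective: the same constructor on equal arguments gives equal values... *)
let same_argument = (Inl 3 : (int, bool) sum) = Inl 3
(* ...and on different arguments gives different values. *)
let different_argument = (Inl 3 : (int, bool) sum) = Inl 4

(* Distinct: different constructors never agree, even on the same argument. *)
let across_constructors = (Inl true : (bool, bool) sum) = Inr true
```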

The second and third properties of constructors justify our using pattern matching. This is done by using more general varstructs as the arguments in a lambda, e.g.

  #fun (inl n) -> n > 6
     | (inr b) -> b;;
  - : (int, bool) sum -> bool = <fun>

This function has the property, naturally enough, that when applied to inl n it returns n > 6 and when applied to inr b it returns b. It is precisely because of the second and third properties of the constructors that we know this does give a well-defined function. Because the constructors are injective, we can uniquely recover n from inl n and b from inr b. Because the constructors are distinct, we know that the two clauses cannot be mutually inconsistent, since no value can correspond to both patterns.

In addition, because the constructors are exhaustive, we know that each value will fall under one pattern or the other, so the function is defined everywhere. Actually, it is permissible to relax this last property by omitting certain patterns, though the ML system then issues a warning:

  #fun (inr b) -> b;;
  Toplevel input:
  >fun (inr b) -> b;;
  Warning: this matching is not exhaustive.
  - : ('a, 'b) sum -> 'b = <fun>


If this function is applied to something of the form inl x, then it will not work:

  #let f = fun (inr b) -> b;;
  Toplevel input:
  >let f = fun (inr b) -> b;;
  Warning: this matching is not exhaustive.
  f : ('a, 'b) sum -> 'b = <fun>
  #f (inl 3);;
  Uncaught exception: Match_failure

Though booleans are built into ML, they are effectively defined by a rather trivial instance of a recursive type, often called an enumerated type, where the constructors take no arguments:

  #type bool = false | true;;

Indeed, it is perfectly permissible to define things by matching over the truth values. The following two phrases are completely equivalent:

  #if 4 < 3 then 1 else 0;;
  - : int = 0
  #(fun true -> 1 | false -> 0) (4 < 3);;
  - : int = 0

Pattern matching is, however, not limited to casewise definitions over elements of recursive types, though it is particularly convenient there. For example, we can define a function that tells us whether an integer is zero as follows:

  #fun 0 -> true | n -> false;;
  - : int -> bool = <fun>
  #(fun 0 -> true | n -> false) 0;;
  - : bool = true
  #(fun 0 -> true | n -> false) 1;;
  - : bool = false

In this case we no longer have mutual exclusivity of patterns, since 0 matches either pattern. The patterns are examined in order, one by one, and the first matching one is used. Note carefully that unless the matches are mutually exclusive, there is no guarantee that each clause holds as a mathematical equation. For example in the above, the function does not return false for any n, so the second clause is not universally valid.

Note that only constructors may be used in the above special way as components of patterns. Ordinary constants will be treated as new variables bound inside the pattern. For example, consider the following:

  #let true_1 = true;;
  true_1 : bool = true
  #let false_1 = false;;
  false_1 : bool = false
  #(fun true_1 -> 1 | false_1 -> 0) (4 < 3);;
  Toplevel input:
  >(fun true_1 -> 1 | false_1 -> 0) (4 < 3);;
  Warning: this matching case is unused.
  - : int = 1

In general, the unit element (), the truth values, the integer numerals, the string constants and the pairing operation (infix comma) have constructor status, as well as other constructors from predefined recursive types. When they occur in a pattern the target value must correspond. All other identifiers match any expression and in the process become bound.

As well as the varstructs in lambda expressions, there are other ways of performing pattern matching. Instead of creating a function via pattern matching and applying it to an expression, one can perform pattern-matching over the expression directly using the following construction:

  match expression with pattern_1 -> E_1 | ... | pattern_n -> E_n

The simplest alternative of all is to use

  let pattern = expression

but in this case only a single pattern is allowed.
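Both forms can be illustrated briefly. The sketch below is in present-day OCaml, whose match and let syntax agree with CAML Light on these points; the names is_zero, quotient and remainder are hypothetical, and the underscore is OCaml's wildcard pattern, which matches anything without binding a name.

```ocaml
(* match ... with: pattern matching applied directly to an expression. *)
let is_zero n =
  match n with
  | 0 -> true
  | _ -> false

(* let with a single pattern: a pair is taken apart in one binding. *)
let (quotient, remainder) = (17 / 5, 17 mod 5)
```

The let form suits types with only one constructor, such as pairs, where a single pattern is already exhaustive.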

6.4.2 Recursive types

The previous examples have all been recursive only vacuously, in that we have not defined a type in terms of itself. For a more interesting example, we will declare a type of lists (finite ordered sequences) of elements of type 'a.

  #type ('a)list = Nil | Cons of 'a * ('a)list;;
  Type list defined.

Let us examine the types of the constructors:

  #Nil;;
  - : 'a list = Nil
  #Cons;;
  - : 'a * 'a list -> 'a list = <fun>


The constructor Nil, which takes no arguments, simply creates some object of type ('a)list which is to be thought of as the empty list. The other constructor Cons takes an element of type 'a and an element of the new type ('a)list and gives another, which we think of as arising from the old list by adding one element to the front of it. For example, we can consider the following:

  #Nil;;
  - : 'a list = Nil
  #Cons(1,Nil);;
  - : int list = Cons (1, Nil)
  #Cons(1,Cons(2,Nil));;
  - : int list = Cons (1, Cons (2, Nil))
  #Cons(1,Cons(2,Cons(3,Nil)));;
  - : int list = Cons (1, Cons (2, Cons (3, Nil)))

Because the constructors are distinct and injective, it is easy to see that all these values, which we think of as lists [], [1], [1;2] and [1;2;3], are distinct. Indeed, purely from these properties of the constructors, it follows that arbitrarily long lists of elements may be encoded in the new type. Actually, ML already has a type list just like this one defined. The only difference is syntactic: the empty list is written [] and the recursive constructor :: has infix status. Thus, the above lists are actually written:

  #[];;
  - : 'a list = []
  #1::[];;
  - : int list = [1]
  #1::2::[];;
  - : int list = [1; 2]
  #1::2::3::[];;
  - : int list = [1; 2; 3]

The lists are printed in an even more natural notation, and this is also allowed for input. Nevertheless, when the exact expression in terms of constructors is needed, it must be remembered that this is only a surface syntax. For example, we can define functions to take the head and tail of a list, using pattern matching.

  #let hd (h::t) = h;;
  Toplevel input:
  >let hd (h::t) = h;;
  Warning: this matching is not exhaustive.
  hd : 'a list -> 'a = <fun>
  #let tl (h::t) = t;;
  Toplevel input:
  >let tl (h::t) = t;;
  Warning: this matching is not exhaustive.
  tl : 'a list -> 'a list = <fun>


The compiler warns us that these both fail when applied to the empty list, since there is no pattern to cover it (remember that the constructors are distinct). Let us see them in action:

  #hd [1;2;3];;
  - : int = 1
  #tl [1;2;3];;
  - : int list = [2; 3]
  #hd [];;
  Uncaught exception: Match_failure

Note that the following is not a correct definition of hd. In fact, it constrains the input list to have exactly two elements for matching to succeed, as can be seen by thinking of the version in terms of the constructors:

  #let hd [x;y] = x;;
  Toplevel input:
  >let hd [x;y] = x;;
  Warning: this matching is not exhaustive.
  hd : 'a list -> 'a = <fun>
  #hd [5;6];;
  - : int = 5
  #hd [5;6;7];;
  Uncaught exception: Match_failure

Pattern matching can be combined with recursion. For example, here is a function to return the length of a list:

  #let rec length =
     fun [] -> 0
       | (h::t) -> 1 + length t;;
  length : 'a list -> int = <fun>
  #length [];;
  - : int = 0
  #length [5;3;1];;
  - : int = 3

Alternatively, this can be written in terms of our earlier 'destructor' functions hd and tl:

  #let rec length l =
     if l = [] then 0 else 1 + length(tl l);;

This latter style of function definition is more usual in many languages, notably LISP, but the direct use of pattern matching is often more elegant.


6.4.3 Tree structures

It is often helpful to visualize the elements of recursive types as tree structures, with the recursive constructors at the branch nodes and the other datatypes at the leaves. The recursiveness merely says that plugging subtrees together gives another tree. In the case of lists the 'trees' are all rather spindly and one-sided, with the list [1;2;3;4] being represented as:

[Diagram: a chain of :: nodes leaning to the right, each with a list element as its left child, ending in [] as the last right child.]

It is not difficult to define recursive types which allow more balanced trees, e.g.

  #type ('a)btree = Leaf of 'a
                  | Branch of ('a)btree * ('a)btree;;
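As a small illustration of working with such trees, the following sketch (in present-day OCaml, which writes the type parameter without parentheses) defines two recursive functions over a binary tree type of this shape. The functions size and depth and the two sample trees are hypothetical additions, not part of these notes.

```ocaml
type 'a btree = Leaf of 'a | Branch of 'a btree * 'a btree

(* Number of leaves in the tree. *)
let rec size = function
  | Leaf _ -> 1
  | Branch (l, r) -> size l + size r

(* Length of the longest path from the root down to a leaf. *)
let rec depth = function
  | Leaf _ -> 0
  | Branch (l, r) -> 1 + max (depth l) (depth r)

(* The same four leaves arranged evenly, and in a list-like chain. *)
let balanced = Branch (Branch (Leaf 1, Leaf 2), Branch (Leaf 3, Leaf 4))
let spindly = Branch (Leaf 1, Branch (Leaf 2, Branch (Leaf 3, Leaf 4)))
```

Both trees have the same size, but the balanced arrangement has the smaller depth, which is the sense in which such types allow more balanced structures than lists.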

In general, there can be several different recursive constructors, each with a different number of descendants. This gives a very natural way of representing the syntax trees of programming (and other formal) languages. For example, here is a type to represent arithmetical expressions built up from integers by addition and multiplication:

  #type expression = Integer of int
                   | Sum of expression * expression
                   | Product of expression * expression;;

and here is a recursive function to evaluate such expressions:

  #let rec eval =
     fun (Integer i) -> i
       | (Sum(e1,e2)) -> eval e1 + eval e2
       | (Product(e1,e2)) -> eval e1 * eval e2;;
  eval : expression -> int = <fun>
  #eval (Product(Sum(Integer 1,Integer 2),Integer 5));;
  - : int = 15
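Going in the opposite direction, from a tree back to text, shows where bracketing comes from: the tree already records the grouping, and parentheses are needed only in the printed form. The following sketch in present-day OCaml is one possible printer; the functions show and factor are hypothetical names, and the rule used (bracket a sum only when it appears directly under a product) is an assumption chosen for simplicity.

```ocaml
type expression =
  | Integer of int
  | Sum of expression * expression
  | Product of expression * expression

(* Print an expression, inserting parentheses only around a sum that
   occurs as an operand of a product. *)
let rec show = function
  | Integer i -> string_of_int i
  | Sum (a, b) -> show a ^ " + " ^ show b
  | Product (a, b) -> factor a ^ " * " ^ factor b

(* Operand of a product: sums must be bracketed, anything else is not. *)
and factor e =
  match e with
  | Sum _ -> "(" ^ show e ^ ")"
  | _ -> show e
```

For instance, a product whose left operand is a sum is printed with brackets around the sum, while a sum whose right operand is a product needs none.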


Such abstract syntax trees are a useful representation which allows all sorts of manipulations. Often the first step programming language compilers and related tools take is to translate the input text into an 'abstract syntax tree' according to the parsing rules. Note that conventions such as precedences and bracketings are not needed once we have reached the level of abstract syntax; the tree structure makes these explicit. We will use such techniques to write ML versions of the formal rules of lambda calculus that we were considering earlier. First, the following type is used to represent lambda terms:

  #type term = Var of string
             | Const of string
             | Comb of term * term
             | Abs of string * term;;
  Type term defined.

Note that we make Abs take a variable name and a term rather than two terms in order to prohibit the formation of terms that aren't valid. For example, we represent the term $\lambda x\ y.\ y\ (x\ x\ y)$ by:

  Abs("x",Abs("y",Comb(Var "y",Comb(Comb(Var "x",Var "x"),Var "y"))))

Here is the recursive definition of a function free_in to decide whether a variable is free in a term:

  #let rec free_in x =
     fun (Var v) -> x = v
       | (Const c) -> false
       | (Comb(s,t)) -> free_in x s or free_in x t
       | (Abs(v,t)) -> not x = v & free_in x t;;
  free_in : string -> term -> bool = <fun>
  #free_in "x" (Comb(Var "f",Var "x"));;
  - : bool = true
  #free_in "x" (Abs("x",Comb(Var "x",Var "y")));;
  - : bool = false

Similarly, we can define substitution by recursion. First, though, we need a function to rename variables to avoid name clashes. We want the ability to convert an existing variable name to a new one that is not free in some given expression. The renaming is done by adding prime characters.

  #let rec variant x t =
     if free_in x t then variant (x^"'") t
     else x;;
  variant : string -> term -> string = <fun>
  #variant "x" (Comb(Var "f",Var "x"));;
  - : string = "x'"
  #variant "x" (Abs("x",Comb(Var "x",Var "y")));;
  - : string = "x"
  #variant "x" (Comb(Var "f",Comb(Var "x",Var "x'")));;
  - : string = "x''"


Now we can define substitution just as we did in abstract terms:

  #let rec subst u (t,x) =
     match u with
       Var y -> if x = y then t else Var y
     | Const c -> Const c
     | Comb(s1,s2) -> Comb(subst s1 (t,x),subst s2 (t,x))
     | Abs(y,s) -> if x = y then Abs(y,s)
                   else if free_in x s & free_in y t then
                     let z = variant y (Comb(s,t)) in
                     Abs(z,subst (subst s (Var z,y)) (t,x))
                   else Abs(y,subst s (t,x));;
  subst : term -> term * string -> term = <fun>

Note the very close parallels between the standard mathematical presentation of lambda calculus, as we presented earlier, and the ML version here. All we really need to make the analogy complete is a means of reading and writing the terms in some more readable form, rather than by explicitly invoking the constructors. We will return to this issue later, and see how to add these features.
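To see the renaming step at work, it may help to run substitution on a term where naive replacement would capture a variable. The sketch below restates the definitions in present-day OCaml (with && and || for the boolean operators) so that it is self-contained. It assumes the convention that subst u (t, x) replaces free occurrences of x in u by t, so the renaming of the bound variable y to the fresh name z is written subst s (Var z, y); the names example and result are hypothetical.

```ocaml
type term =
  | Var of string
  | Const of string
  | Comb of term * term
  | Abs of string * term

let rec free_in x = function
  | Var v -> x = v
  | Const _ -> false
  | Comb (s, t) -> free_in x s || free_in x t
  | Abs (v, t) -> x <> v && free_in x t

(* Add primes to x until the name is not free in t. *)
let rec variant x t = if free_in x t then variant (x ^ "'") t else x

(* subst u (t, x): replace free occurrences of variable x in u by t. *)
let rec subst u (t, x) =
  match u with
  | Var y -> if x = y then t else Var y
  | Const c -> Const c
  | Comb (s1, s2) -> Comb (subst s1 (t, x), subst s2 (t, x))
  | Abs (y, s) ->
    if x = y then Abs (y, s)
    else if free_in x s && free_in y t then
      (* y would capture a free variable of t: rename y to a fresh z *)
      let z = variant y (Comb (s, t)) in
      Abs (z, subst (subst s (Var z, y)) (t, x))
    else Abs (y, subst s (t, x))

(* Substituting y for x in (\y. x y) must not let the binder capture y. *)
let example = Abs ("y", Comb (Var "x", Var "y"))
let result = subst example (Var "y", "x")
```

Here the bound variable is renamed to y' before the substitution, so the substituted y stays free in the result.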

6.4.4 The subtlety of recursive types

A recursive type may contain nested instances of other type constructors, including the function space constructor. For example, consider the following:

  #type ('a)embedding = K of ('a)embedding->'a;;
  Type embedding defined.

If we stop to think about the underlying semantics, this looks disquieting. Consider for example the special case when 'a is bool. We then have an injective function K:((bool)embedding->bool)->(bool)embedding. This directly contradicts Cantor's theorem that the set of all subsets of $X$ cannot be injected into $X$. Hence we need to be more careful with the semantics of types. In fact $\alpha \to \beta$ cannot be interpreted as the full function space, or recursive type constructions like the above are inconsistent. However, since all functions we can actually create are computable, it is reasonable to restrict ourselves to computable functions only. With that restriction, a consistent semantics is possible, although the details are complicated.

Footnote: Proof: consider $C = \{ i(s) \mid s \in \wp(X) \text{ and } i(s) \notin s \}$. If $i : \wp(X) \to X$ is injective, we have $i(C) \in C \Leftrightarrow i(C) \notin C$, a contradiction. This is similar to the Russell paradox, and in fact probably inspired it. The analogy is even closer if we consider the equivalent form that there is no surjection $j : X \to \wp(X)$, and prove it by considering $\{ s \mid s \notin j(s) \}$.

The above definition also has interesting consequences for the type system. For example, we can now define the Y combinator, by using K as a kind of type cast. Note that if the Ks are deleted, this is essentially the usual definition of the Y combinator in untyped lambda calculus. The let is used only for the sake of efficiency, but we do need the $\eta$-redex involving z in order to prevent looping under ML's evaluation strategy.

  #let Y h =
     let g (K x) z = h (x (K x)) z in
     g (K g);;
  Y : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>
  #let fact = Y (fun f n -> if n = 0 then 1 else n * f(n - 1));;
  fact : int -> int = <fun>
  #fact 6;;
  - : int = 720

Thus, recursive types are a powerful addition to the language, and allow us to back off the change of making the recursion operator a primitive.

Exercises

1. What goes wrong in our definition of subst if we replace

  Var y -> if x = y then t else Var y

by these separate patterns?

  Var x -> t | Var y -> Var y

2. It would be possible, though inefficient, to define natural numbers using the following recursive type:

  type num = Zero | Suc of num;;

Define functions to perform arithmetic on natural numbers in this form.

3. Use the type definition for the syntax of lambda terms that we gave above. Write a function to convert an arbitrary lambda term into an equivalent in terms of the S, K and I combinators, based on the technique we gave earlier. Represent the combinators by Const "S" etc.

4. Extend the syntax of lambda terms to incorporate Church-style typing. Write functions that check the typing conditions before allowing terms to be formed, but otherwise map down to the primitive constructors.

5. Consider a type of binary trees with strings at the nodes defined by:

  type stree = Leaf | Branch of stree * string * stree;;

Write a function strings to return a list of the strings occurring in the tree in left-to-right order, i.e. at each node it appends the list resulting from the left branch, the singleton list at the current node, and the list occurring at the right branch, in that order. Write a function that takes a tree where the corresponding list is alphabetically sorted and a new string, and returns a tree with the new string inserted in the appropriate place.

6. Refine the above functions so that the tree remains almost balanced at each stage, i.e. the maximum depths occurring at the left and right branches differ by at most one.

7. Can you characterize the types of expressions that can be written down in simply typed lambda calculus, without a recursion operator? What is the type of the following recursive function?

  let rec f i = (i o f) (i o i);;

Show how it can be used to write down ML expressions with completely polymorphic type $\alpha$. Can you write a terminating expression with this property? What about an arbitrary function type $\alpha \to \beta$?

Footnote (to exercise 6): See Adel'son-Vel'skii and Landis (1962) and various algorithm books for more details of these 'AVL trees' and similar methods for balanced trees.

Footnote (to exercise 7): The answer lies in the 'Curry-Howard isomorphism'; see Girard, Lafont, and Taylor (1989), for example.

Chapter 7

Proving programs correct

As programmers know through bitter personal experience� it can be very di�cultto write a program that is correct� i�e� performs its intended function� Most largeprograms have bugs� The consequences of these bugs can vary widely� Somebugs are harmless� some merely irritating� Some are deadly� For example theprograms inside heart pacemakers� aircraft autopilots� car engine managementsystems and antilock braking systems� radiation therapy machines and nuclearreactor controllers are safety critical� An error in one of them can lead directly toloss of life� possibly on a large scale� As computers become ever more pervasivein society� the number of ways in which program bugs can kill people increases��

How can we make sure a program is correct? One very useful method is simply to test it in a variety of situations. One tries the program out on inputs designed to exercise many different parts of it and reveal possible bugs. But usually there are too many possible situations to try them all exhaustively, so there may still be bugs lying undetected. As repeatedly emphasized by various writers, notably Dijkstra, program testing can be very useful for demonstrating the presence of bugs, but only in a few unusual cases can it demonstrate their absence.

Instead of testing a program, one can try to verify it mathematically. A program in a sufficiently precisely defined programming language has a definite mathematical meaning. Similarly, the requirements on a program can be expressed using mathematics and logic as a precise specification. One can then try to perform a mathematical proof that the program meets its specification.

¹Similar remarks apply to hardware, but we will focus on software.

One can contrast testing and verification of programs by considering the following simple mathematical assertion:

$$\sum_{n=0}^{N} n = \frac{N(N+1)}{2}$$

It is claimed that this holds for any $N$. We can certainly test it for any number of particular values of $N$. This may lead to our intuitive confidence in the formula, such that we can hardly believe it could ever fail. However in general a preponderance of numerical evidence can be deceptive; there are well known cases in number theory where things turned out to be false against the weight of many particular cases.² A more reliable procedure is to prove the above formula mathematically. This can easily be done using induction on $N$. In the same way, we hope to replace testing a program on a range of inputs by some completely general proof that it always works correctly. It is important, however, to appreciate two limitations of program proofs.

• One has no guarantee that the execution of the program on the computer corresponds exactly to the abstract mathematical model. One must be on the lookout for discrepancies, caused perhaps by bugs in compilers and operating systems, or physical defects in the underlying technology. This is, of course, a general problem in applications of mathematics to science. For example, assuming that arithmetic operations correspond exactly to their mathematical counterparts is a natural simplifying assumption, just as one might neglect air resistance in the analysis of a simple dynamical system. But both can be invalid in certain cases.

• The program is to be verified by proof against a mathematical specification. It is possible that this specification does not capture accurately what is actually wanted in the real world. In fact, it can be remarkably difficult to arrive at a precise mathematical version of informal requirements. There is evidence that many of the serious problems in computer systems are caused not by coding errors in the usual sense, but by a mistaken impression of what is actually wanted. Just formalizing the requirements is often valuable in itself.

We can represent this situation by the following diagram:

        Actual requirements
                 ↑
     Mathematical specification
                 ↑
         Mathematical model
                 ↑
           Actual system

We are trying to establish a link between the bottom and top boxes, i.e. the actual system and the actual requirements in real life. To do this, we proceed by producing a mathematical version of each. It is only the central link, between the mathematical model of the system and the mathematical version of the specification, that is mathematically precise. The other links remain informal. All we can do is try to keep them small by using a realistic model of the system and a high-level mathematical specification that is reasonably readable.

²For example, Littlewood proved in 1914 that $\pi(n) - \mathrm{li}(n)$ changes sign infinitely often, where $\pi(n)$ is the number of primes $\le n$ and $\mathrm{li}(n) = \int_0^n du/\ln(u)$. This came as a surprise since not a single sign change had been found despite extensive testing up to ��.

These reservations notwithstanding, program proving has considerable merit. Compared with testing, it establishes correctness once and for all, possibly for a whole class of programs at once (e.g. controlled by the setting of some parameter). Moreover, because it is a more principled analytical approach, the process of proving a program (or failing to do so) can lead to a deeper appreciation of the program and of the task in hand.

�� Functional programs as mathematical objects

In the introduction, we remarked that (pure) functional programs really correspond directly to functions in the mathematical sense. For this reason, it is often suggested that they are more amenable to formal proof than imperative programs. Even if this is true, and many would dispute it, it is worth realizing that the gap between this mathematical abstraction and the final execution in hardware is greater than for typical imperative languages. In particular, there can be substantial storage requirements hidden in the eventual realization on the computer. So one may just be getting easier proofs because they prove less about what really matters. We won't settle this argument here, but we want to show that reasoning about simple functional programs is often fairly straightforward.

The example at the end of the previous chapter showed that a naive association of function spaces of ML and function spaces of mathematics is not quite accurate. However, if we stay away from the higher reaches of recursive types and are not concerned with the space of all functions, only with some particular ones, we can safely identify the objects of ML with those of abstract mathematical realms.³

We intend to model functional programs as mathematical functions. Given the equations (typically recursive) that define an ML function, we want to interpret them as 'axioms' about the corresponding mathematical objects. For example, we would assume from the definition of the factorial function:

  let rec fact = fun 0 -> 1
                   | n -> n * fact(n - 1);;

that fact(0) = 1 and that for all $n \neq 0$ we have fact(n) = n * fact(n - 1). This is right, but we need to tread carefully because of the question of termination. For negative n, the program fails to terminate, so the second equation is true only in the vacuous sense that both sides are 'undefined'. It is much easier to reason about functions when we know they are total, so sometimes one uses a separate argument to establish termination. In the examples that follow, this is done in parallel with the correctness proof, i.e. we prove together that for each argument, the function terminates and the result satisfies our specification.

³We will neglect arithmetic overflow whenever we use arithmetic. There is a facility in CAML for arbitrary precision arithmetic, at the cost of, in principle, unlimited storage requirements.

As for the actual proof that the function obeys the specification, this can be a mathematical proof in the most general sense. However it often happens that when functions are defined by recursion, properties of them, dually, can be proved by induction; this includes termination. Moreover, the exact form of induction (arithmetical, structural, wellfounded) usually corresponds to the mode of recursion used in the definition. This should become clearer when we have seen a few examples.
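As a small illustration of reading a definition as a pair of 'axioms', the following sketch transcribes the factorial into OCaml (a close relative of CAML Light, chosen here only so that the example is runnable) and checks both equations on a finite range. The helper name axioms_hold_up_to is hypothetical. Such a check is testing rather than proof, and it deliberately avoids negative arguments, for which the recursion never reaches its base case.

```ocaml
(* Factorial as in the text; only meaningful for n >= 0. *)
let rec fact n = if n = 0 then 1 else n * fact (n - 1)

(* The two equations read off from the definition:
     fact 0 = 1
     fact n = n * fact (n - 1)   for n <> 0
   checked here for 1 <= n <= bound only. *)
let axioms_hold_up_to bound =
  let rec check n =
    n > bound || (fact n = n * fact (n - 1) && check (n + 1))
  in
  fact 0 = 1 && check 1
```

Evaluating axioms_hold_up_to 15 should give true; the general statement is what the induction arguments in the following sections establish.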

�� Exponentiation

Recall our simple definition of exponentiation:

  let rec exp x n =
    if n = 0 then 1
    else x * exp x (n - 1);;

We will prove the following property of exp:

Theorem �� For all $n \geq 0$ and x, exp x n is defined and exp x n = $x^n$.

Proof: The ML function exp was defined by primitive, step-by-step, recursion. It is therefore no surprise that our proof proceeds by ordinary, step-by-step induction. We prove that it holds for $n = 0$ and then that if it holds for any $n \geq 0$ it also holds for $n + 1$.

1. If $n = 0$, then by definition exp x n = 1. By definition, for any integer x, we have $x^0 = 1$, so the desired fact is established. Note that we assume $0^0 = 1$, an example of how one must state the specification precisely (what does $x^n$ mean in general?) and how it might surprise some people.

2. Suppose that for $n \geq 0$ we have exp x n = $x^n$. Since $n \geq 0$, we have $n + 1 \neq 0$ and so by definition exp x (n + 1) = x * exp x ((n + 1) - 1). Therefore:

  exp x (n + 1) = x * exp x ((n + 1) - 1)
                = x * exp x n
                = x * $x^n$
                = $x^{n+1}$

Q.E.D.
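The theorem can be spot-checked, though of course not proved, by comparing exp against an independently written power function. The sketch below is in OCaml; the names power_ref and agrees_up_to are hypothetical helpers introduced only for this illustration, and arithmetic overflow is neglected as in the text.

```ocaml
(* exp as defined in the text. *)
let rec exp x n = if n = 0 then 1 else x * exp x (n - 1)

(* Independent reference: multiply by x exactly n times, starting from 1.
   For n = 0 the loop body never runs, so power_ref x 0 = 1, including x = 0. *)
let power_ref x n =
  let r = ref 1 in
  for _ = 1 to n do r := !r * x done;
  !r

(* Compare the two on all -xmax <= x <= xmax and 0 <= n <= nmax. *)
let agrees_up_to xmax nmax =
  let ok = ref true in
  for x = -xmax to xmax do
    for n = 0 to nmax do
      if exp x n <> power_ref x n then ok := false
    done
  done;
  !ok
```

Note that exp 0 0 evaluates to 1, matching the convention $0^0 = 1$ that the proof had to make explicit.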

�� Greatest common divisor

Recall our function to calculate the greatest common divisor gcd(m, n) of two natural numbers m and n:

  let rec gcd x y =
    if y = 0 then x else gcd y (x mod y);;

In fact, we claim this works for any integers, not just positive ones. However, one needs to understand the precise definition of gcd in that case. Let us define the relation '$u \mid v$', or 'u divides v', for any two integers u and v to mean 'v is an integral multiple of u', i.e. there is some integer d with $v = du$. For example, $0 \mid 0$, $1 \mid 11$, $-2 \mid 4$ but $0 \nmid 1$, $3 \nmid 5$. We will say that d is a greatest common divisor of x and y precisely if:

• $d \mid x$ and $d \mid y$

• For any other integer $d'$, if $d' \mid x$ and $d' \mid y$ then $d' \mid d$.

Note that we say $d' \mid d$, not $d' \leq d$. This belies the use of 'greatest', but it is in fact the standard definition in algebra books. Observe that any pair of numbers (except 0 and 0) has two gcds, since if d is a gcd of x and y, so is $-d$.

Our specification is now: for any integers x and y, d = gcd x y is a gcd of x and y. This is another good example of how providing a precise specification, even for such a simple function, is harder than it looks. This specification also exemplifies the common property that it does not completely specify the answer, merely places certain constraints on it. If we defined:

  let rec ngcd x y = -(gcd x y);;

then the function ngcd would satisfy the specification just as well as gcd. Of course, we are free to tighten the specification if we wish, e.g. insist that if x and y are positive, so is the gcd returned.

The gcd function is not defined simply by primitive recursion. In fact, gcd x y is defined in terms of gcd y (x mod y) in the step case. Correspondingly, we do not use step-by-step induction, but wellfounded induction. We want a wellfounded relation that decreases over recursive calls; this will guarantee termination and act as a handle for proofs by wellfounded induction. In general, complicated relations on the arguments are required. But often it is easy to concoct a measure that maps the arguments to natural numbers such that the measure decreases over recursive calls. Such is the case here: the measure is $|y|$.


Theorem �� For any integers x and y, gcd x y terminates in a gcd of x and y.

Proof: Take some arbitrary n, and suppose that for all x and y with $|y| < n$ we have that gcd x y terminates with a gcd of x and y, and try to prove that the same holds for all x and y with $|y| = n$. This suffices for the main result, since any y will have some n with $|y| = n$. So, let us consider an x and y with $|y| = n$. There are two cases to consider, according to the case split used in the definition.

• Suppose $y = 0$. Then gcd x y = x by definition. Now trivially $x \mid x$ and $x \mid 0$, so it is a common divisor. Suppose d is another common divisor, i.e. $d \mid x$ and $d \mid 0$. Then indeed we have $d \mid x$ immediately, so x must be a greatest common divisor.

• Suppose $y \neq 0$. We want to apply the inductive hypothesis to gcd y (x mod y). We will write r = x mod y for short. The basic property of the mod function is that, since $y \neq 0$, for some integer q we have $x = qy + r$ and $|r| < |y|$. Since $|r| < |y|$ the inductive hypothesis tells us that d = gcd y (x mod y) is a gcd of y and r. We just need to show that it is a gcd of x and y. It is certainly a common divisor, since if $d \mid y$ and $d \mid r$ we have $d \mid x$, as $x = qy + r$. Now suppose $d' \mid x$ and $d' \mid y$. By the same equation, we find that $d' \mid r$. Thus $d'$ is a common divisor of y and r, but then by the inductive hypothesis, $d' \mid d$ as required.

Q.E.D.

Note that the basic property of the modulus operation that we used needs careful checking against the specification of mod in the CAML manual. There are well known differences between languages, and between implementations of the same language, when moduli involving negative numbers are concerned. If ever one is in doubt over the validity of a necessary assumption, it can be made explicit in the theorem: 'if ... then ...'.
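The divisibility-based specification can itself be written as an executable predicate and used to spot-check gcd on a small range, including negative arguments. The sketch below is in OCaml, whose mod satisfies x = (x / y) * y + (x mod y) with |x mod y| < |y| for nonzero y, which is the property the proof relies on; whether CAML Light's mod behaves identically should still be checked against its manual. The names divides and is_gcd are hypothetical helpers.

```ocaml
(* gcd as defined in the text. *)
let rec gcd x y = if y = 0 then x else gcd y (x mod y)

(* u | v : v is an integral multiple of u.  Only 0 is a multiple of 0. *)
let divides u v = if u = 0 then v = 0 else v mod u = 0

(* d is a gcd of x and y in the sense of the text: it is a common divisor,
   and every common divisor d' divides d.  Candidates d' are searched in
   [-bound, bound]; when x and y are not both zero no common divisor lies
   outside that range, and when both are zero d = 0 is divisible by everything. *)
let is_gcd d x y =
  let bound = max (abs x) (abs y) in
  let rec all d' =
    d' > bound
    || (((not (divides d' x && divides d' y)) || divides d' d) && all (d' + 1))
  in
  divides d x && divides d y && all (-bound)
```

Both gcd x y and its negation satisfy is_gcd, which illustrates the remark that the specification does not pin down a unique answer.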

�� Appending

We will now have an example of a function defined over lists. The function append is intended, not surprisingly, to append, or join together, two lists. For example, if we append [3; 2; 5] and [6; 3; 1] we get [3; 2; 5; 6; 3; 1].

  let rec append l1 l2 =
    match l1 with
      [] -> l2
    | (h::t) -> h::(append t l2);;

This is defined by primitive recursion over the type of lists. It is defined on the empty list, and then for h :: t in terms of the value for t. Consequently, when proving theorems about it, it is natural to use the corresponding principle of structural induction for lists: if a property holds for the empty list, and whenever it holds for t it holds for any h :: t, then it holds for any list. However this is not obligatory, and if preferred, we could proceed by mathematical induction on the length of the list. We will aim at proving that the append operation is associative.

Theorem �� For any three lists l1, l2 and l3 we have:

  append l1 (append l2 l3) = append (append l1 l2) l3

Proof: By structural induction on l1, we prove this holds for any l2 and l3.

• If l1 = [] then:

  append l1 (append l2 l3) = append [] (append l2 l3)
                           = append l2 l3
                           = append (append [] l2) l3
                           = append (append l1 l2) l3

• Now suppose l1 = h :: t. We may assume that for any l2 and l3 we have:

  append t (append l2 l3) = append (append t l2) l3

Therefore:

  append l1 (append l2 l3) = append (h :: t) (append l2 l3)
                           = h :: (append t (append l2 l3))
                           = h :: (append (append t l2) l3)
                           = append (h :: (append t l2)) l3
                           = append (append (h :: t) l2) l3
                           = append (append l1 l2) l3

Q.E.D.
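For readers who like to see a property exercised before (or after) proving it, here is a minimal OCaml sketch that checks associativity of append on every triple drawn from a small, arbitrarily chosen set of sample lists. The names samples and associative_on are hypothetical; the check complements the structural induction above and does not replace it.

```ocaml
(* append as defined in the text. *)
let rec append l1 l2 =
  match l1 with
  | [] -> l2
  | h :: t -> h :: append t l2

(* A few sample lists, including the empty list (the base case of the proof). *)
let samples = [ []; [1]; [2; 3]; [4; 5; 6] ]

(* Check append l1 (append l2 l3) = append (append l1 l2) l3
   for all l1, l2, l3 taken from ls. *)
let associative_on ls =
  List.for_all (fun l1 ->
    List.for_all (fun l2 ->
      List.for_all (fun l3 ->
        append l1 (append l2 l3) = append (append l1 l2) l3) ls) ls) ls
```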

�� Reversing

It is not difficult to define a function to reverse a list:

  let rec rev =
    fun [] -> []
      | (h::t) -> append (rev t) [h];;
  rev : 'a list -> 'a list = <fun>
  rev [1;2;3];;
  - : int list = [3; 2; 1]

We will prove that rev is an involution, i.e. that

  rev (rev l) = l

However, if we try to tackle this directly by list induction, we find that we need a couple of additional lemmas. We will prove these first.

Lemma �� For any list l we have append l [] = l.

Proof: Structural induction on l.

• If l = [] we have:

  append l [] = append [] []
              = []
              = l

• Now suppose l = h :: t and we know that append t [] = t. We find:

  append l [] = append (h :: t) []
              = h :: (append t [])
              = h :: t
              = l

Q.E.D.

Lemma �� For any lists l1 and l2 we have

  rev (append l1 l2) = append (rev l2) (rev l1)

Proof: Structural induction on l1.

• If l1 = [] we have:

  rev (append l1 l2) = rev (append [] l2)
                     = rev l2
                     = append (rev l2) []          (by the previous lemma)
                     = append (rev l2) (rev [])

• Now suppose l1 = h :: t and we know that

  rev (append t l2) = append (rev l2) (rev t)

then we find:

  rev (append l1 l2) = rev (append (h :: t) l2)
                     = rev (h :: (append t l2))
                     = append (rev (append t l2)) [h]
                     = append (append (rev l2) (rev t)) [h]
                     = append (rev l2) (append (rev t) [h])    (by associativity of append)
                     = append (rev l2) (rev (h :: t))
                     = append (rev l2) (rev l1)

Q.E.D.

Theorem �� For any list l we have rev (rev l) = l.

Proof: Structural induction on l.

• If l = [] we have:

  rev (rev l) = rev (rev [])
              = rev []
              = []
              = l

• Now suppose l = h :: t and we know that

  rev (rev t) = t

then we find:

  rev (rev l) = rev (rev (h :: t))
              = rev (append (rev t) [h])
              = append (rev [h]) (rev (rev t))      (by the previous lemma)
              = append (rev [h]) t                  (by the inductive hypothesis)
              = append (rev (h :: [])) t
              = append (append (rev []) [h]) t
              = append (append [] [h]) t
              = append [h] t
              = append (h :: []) t
              = h :: (append [] t)
              = h :: t
              = l

Q.E.D.
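The two lemmas and the theorem can each be phrased as a boolean-valued OCaml function and run over a handful of sample lists. This is a sketch for experimentation only; the names samples, lemma_right_unit, lemma_rev_append and involution are hypothetical, and passing these checks is evidence rather than proof.

```ocaml
(* append and rev as defined in the text. *)
let rec append l1 l2 =
  match l1 with
  | [] -> l2
  | h :: t -> h :: append t l2

let rec rev = function
  | [] -> []
  | h :: t -> append (rev t) [h]

let samples = [ []; [1]; [1; 2]; [1; 2; 3]; [3; 1; 2; 1] ]

(* First lemma: [] is a right unit for append. *)
let lemma_right_unit l = append l [] = l

(* Second lemma: rev distributes over append, swapping the arguments. *)
let lemma_rev_append l1 l2 = rev (append l1 l2) = append (rev l2) (rev l1)

(* Theorem: rev is an involution. *)
let involution l = rev (rev l) = l
```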

There are lots of theorems relating the list operations that can be proved in a similar style. Generally, one proceeds by list induction. In subtler cases, one has to split off lemmas which are themselves proved inductively, and sometimes to generalize the theorem before proceeding by induction. There are plenty of examples in the exercises for readers to try their hand at.

Further reading

Neumann �� catalogues the dangers arising from the use of computers in society, including those arising from software bugs. A very readable discussion of the issues, with extensive discussion of some interesting examples, is given by Peterson ��. The general issue of software verification was at one time controversial, with DeMillo, Lipton, and Perlis �� among others arguing against it. A cogent discussion is given by Barwise ��. There is now a large literature in software verification, though much of it is concerned with imperative programs. Many introductory functional programming books such as Paulson �� and Reade �� give some basic examples like the ones here. It is also worth looking at the work of Boyer and Moore �� on verifying properties of pure LISP functions expressed inside their theorem prover. One of the largest real proofs of the type we consider here is the verification by Aagaard and Leeser �� of a boolean simplifier used in VLSI logic synthesis.

Exercises

1. Prove the correctness of the more efficient program we gave for performing exponentiation:

  let square x = x * x;;

  let rec exp x n =
    if n = 0 then 1
    else if n mod 2 = 0 then square(exp x (n / 2))
    else x * square(exp x (n / 2));;

2. Recall the definition of length:

  let rec length =
    fun [] -> 0
      | (h::t) -> 1 + length t;;

Prove that length (rev l) = length l and that length (append l1 l2) = length l1 + length l2.

3. Define the map function, which applies a function to each element of a list, as follows:

  let rec map f =
    fun [] -> []
      | (h::t) -> (f h)::(map f t);;

Prove that if l ≠ [] then hd (map f l) = f (hd l), and that map f (rev l) = rev (map f l). Recall the definition of function composition:

  let o f g = fun x -> f(g x);;
  #infix "o";;

Prove that map f (map g l) = map (f o g) l.

4. McCarthy's 91 function can be defined by:

  let rec f x =
    if x > 100 then x - 10
    else f(f(x + 11));;

Prove that for $n \leq 101$, we have f(n) = 91. Pay careful attention to establishing termination. (Hint: a possible measure is $101 - x$.)


5. The problem of the Dutch National Flag is to sort a list of 'colours' (red, white and blue) so that the reds come first, then the whites and then the blues. However, the only method permitted is to swap adjacent elements. Here is an ML solution. The function dnf returns 'true' iff it has made a change, and this function is then repeated until no change occurs.

  type colour = Red | White | Blue;;

  let rec dnf =
    fun [] -> [],false
      | (White::Red::rest) -> Red::White::rest,true
      | (Blue::Red::rest) -> Red::Blue::rest,true
      | (Blue::White::rest) -> White::Blue::rest,true
      | (x::rest) -> let fl,ch = dnf rest in x::fl,ch;;

  let rec flag l =
    let l',changed = dnf l in
    if changed then flag l' else l';;

For example:

  flag [White; Red; Blue; Blue; Red; White; White; Blue; Red];;
  - : colour list = [Red; Red; Red; White; White; White; Blue; Blue; Blue]

Prove that the function flag always terminates with a correctly sorted list.

6. Define the following functions:

  let rec sorted =
    fun [] -> true
      | [h] -> true
      | (h1::h2::t) -> h1 <= h2 & sorted(h2::t);;

  let rec filter p =
    fun [] -> []
      | (h::t) -> let t' = filter p t in
                  if p h then h::t' else t';;

  let sameprop p l1 l2 =
    length(filter p l1) = length(filter p l2);;

  let rec permutes l1 l2 =
    fun [] -> true
      | (h::t) -> sameprop (fun x -> x = h) l1 l2 &
                  permutes l1 l2 t;;

  let permuted l1 l2 = permutes l1 l2 l1;;

What do they all do? Implement a sorting function sort and prove that for all lists l one has sorted (sort l) = true and permuted l (sort l) = true.

Chapter

Effective ML

In this chapter, we discuss some of the techniques and tricks that ML programmers can use to make programs more elegant and more efficient. We then go on to discuss some additional imperative features that can be used when the purely functional style seems inappropriate.

�� Useful combinators

The flexibility of higher order functions often means that one can write some very useful little functions that can be re-used for a variety of related tasks. These are often called combinators, and not simply because they can be regarded as lambda terms with no free variables. It often turns out that these functions are so flexible that practically anything can be implemented by plugging them together, rather than, say, explicitly making a recursive definition. In this way they correspond to the original view of combinators as ubiquitous building blocks for mathematical expressions.

For example, a very useful combinator for list operations, often called 'itlist' or 'fold', performs the following operation:

  itlist f [x1; x2; ...; xn] b = f x1 (f x2 (f x3 (... (f xn b))))

A straightforward definition in ML is:

  let rec itlist f =
    fun [] b -> b
      | (h::t) b -> f h (itlist f t b);;
  itlist : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b = <fun>

Quite commonly, when defining a recursive function over lists, all one is doing is repeatedly applying some operator in this manner. By using itlist with the appropriate argument, one can implement such functions very easily without explicit use of recursion. A typical use is a function to add all the elements of a list of numbers:

  let sum l = itlist (fun x sum -> x + sum) l 0;;
  sum : int list -> int = <fun>
  sum [1;2;3;4;5];;
  - : int = 15
  sum [];;
  - : int = 0
  sum [1;1;1;1];;
  - : int = 4

Those especially keen on brevity might prefer to code sum as:

  let sum l = itlist (prefix +) l 0;;

It is easy to modify this function to form a product rather than a sum:

  let prod l = itlist (prefix *) l 1;;
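For readers working in OCaml rather than CAML Light, the following sketch shows the same combinator and notes that, with this argument order, it coincides with the standard library function List.fold_right. Operators are passed as (+) and ( * ) instead of prefix + and prefix *; that substitution is an OCaml convention, not something taken from the text.

```ocaml
(* itlist f [x1; ...; xn] b = f x1 (f x2 (... (f xn b))) *)
let rec itlist f l b =
  match l with
  | [] -> b
  | h :: t -> f h (itlist f t b)

(* Sum and product via itlist; the spaces in ( * ) keep OCaml from
   reading the token as the start of a comment. *)
let sum l = itlist (+) l 0
let prod l = itlist ( * ) l 1
```

Since itlist agrees with List.fold_right, any function written with one can be moved to the other without change.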

Many useful list operations can be implemented in this way. For example here is a function to filter out only those elements of a list satisfying a predicate:

  let filter p l = itlist (fun x s -> if p x then x::s else s) l [];;
  filter : ('a -> bool) -> 'a list -> 'a list = <fun>
  filter (fun x -> x mod 2 = 0) [1;6;4;9;5;7;3;2];;
  - : int list = [6; 4; 2]

Here are functions to find whether either all or some of the elements of a list satisfy a predicate:

  let forall p l = itlist (fun h a -> p(h) & a) l true;;
  forall : ('a -> bool) -> 'a list -> bool = <fun>
  let exists p l = itlist (fun h a -> p(h) or a) l false;;
  exists : ('a -> bool) -> 'a list -> bool = <fun>
  forall (fun x -> x < 3) [1;2];;
  - : bool = true
  forall (fun x -> x < 3) [1;2;3];;
  - : bool = false

and here are alternative versions of old favourites length, append and map:

  let length l = itlist (fun x s -> s + 1) l 0;;
  length : 'a list -> int = <fun>
  let append l m = itlist (fun h t -> h::t) l m;;
  append : 'a list -> 'a list -> 'a list = <fun>
  let map f l = itlist (fun x s -> (f x)::s) l [];;
  map : ('a -> 'b) -> 'a list -> 'b list = <fun>

Some of these functions can themselves become useful combinators, and so on upwards. For example, if we are interested in treating lists as sets, i.e. avoiding duplicate elements, then many of the standard set operations can be expressed very simply in terms of the combinators above:

  let mem x l = exists (fun y -> y = x) l;;
  mem : 'a -> 'a list -> bool = <fun>
  let insert x l =
    if mem x l then l else x::l;;
  insert : 'a -> 'a list -> 'a list = <fun>
  let union l1 l2 = itlist insert l1 l2;;
  union : 'a list -> 'a list -> 'a list = <fun>
  let setify l = union l [];;
  setify : 'a list -> 'a list = <fun>
  let Union l = itlist union l [];;
  Union : 'a list list -> 'a list = <fun>
  let intersect l1 l2 = filter (fun x -> mem x l2) l1;;
  intersect : 'a list -> 'a list -> 'a list = <fun>
  let subtract l1 l2 = filter (fun x -> not mem x l2) l1;;
  subtract : 'a list -> 'a list -> 'a list = <fun>
  let subset l1 l2 = forall (fun t -> mem t l2) l1;;
  subset : 'a list -> 'a list -> bool = <fun>

The setify function is supposed to turn a list into a set by eliminating any duplicate elements.
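A short OCaml transcription makes it easy to experiment with these set operations. It is a sketch under the assumption that OCaml's || and not (mem x l2) stand in for CAML Light's or and not mem x l2; the behaviour on duplicates and the order of the results follow directly from the definitions above.

```ocaml
let rec itlist f l b =
  match l with
  | [] -> b
  | h :: t -> f h (itlist f t b)

let filter p l = itlist (fun x s -> if p x then x :: s else s) l []
let exists p l = itlist (fun h a -> p h || a) l false

(* Lists as sets: insert only adds an element that is not already present. *)
let mem x l = exists (fun y -> y = x) l
let insert x l = if mem x l then l else x :: l
let union l1 l2 = itlist insert l1 l2
let setify l = union l []
let intersect l1 l2 = filter (fun x -> mem x l2) l1
let subtract l1 l2 = filter (fun x -> not (mem x l2)) l1
```

Because itlist works from the right-hand end of the list, setify keeps the last occurrence of each duplicated element: setify [1; 2; 1; 3] gives [2; 1; 3].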

�� Writing efficient code

Here we accumulate some common tricks of the trade, which can often make ML programs substantially more efficient. In order to justify some of them, we need to sketch in general terms how certain constructs are executed in hardware.

�� Tail recursion and accumulators

The principal control mechanism in functional programs is recursion. If we are interested in efficient programs, it behoves us to think a little about how recursion is implemented on conventional hardware. In fact, there is not, in this respect at least, much difference between the implementation of ML and many other languages with dynamic variables, such as C.

If functions cannot be called recursively, then we are safe in storing their local variables (which includes the values of arguments) at a fixed place in memory; this is what FORTRAN does. However, this is not possible in general if the function can be called recursively. A call to a function f with one set of arguments may include within it a call to f with a different set of arguments. The old ones would be overwritten, even if the outer version of f needs to refer to them again after the inner call has finished. For example, consider the factorial function yet again:

  let rec fact n =
    if n = 0 then 1
    else n * fact(n - 1);;

A call to fact 6 causes another call to fact 5 (and beyond), but when this call is finished and the value of fact 5 is obtained, we still need the original value of n, namely 6, in order to do the multiplication yielding the final result. What normally happens in implementations is that each function call is allocated a new frame on a stack. Every new function call moves the stack pointer further down¹ the stack, creating space for new variables. When the function call is finished the stack pointer moves up and so the unwanted inner variables are discarded automatically. A diagram may make this clearer:

          n = 6
          n = 5
          n = 4
          n = 3
          n = 2
          n = 1
  SP ->   n = 0

This is an imagined snapshot of the stack during execution of the innermost recursive call, i.e. fact 0. All the local variables for the upper stages are stacked up above, with each instance of the function having its own stack frame, and when the calls are finished the stack pointer SP moves back up.

Therefore, our implementation of fact requires n stack frames when applied to argument n. By contrast, consider the following implementation of the factorial function:

  let rec tfact x n =
    if n = 0 then x
    else tfact (x * n) (n - 1);;
  tfact : int -> int -> int = <fun>
  let fact n = tfact 1 n;;
  fact : int -> int = <fun>
  fact 6;;
  - : int = 720

Although tfact is also recursive, the recursive call is the whole expression; it does not occur as a proper subexpression of some other expression involving values of variables. Such a call is said to be a tail call (because it is the very last thing the calling function does), and a function where all recursive calls are tail calls is said to be tail recursive.

¹Despite the name, stacks conventionally grow downwards.

What is significant about tail calls? When making a recursive call to tfact, there is no need to preserve the old values of the local variables. Exactly the same, fixed, area of storage can be used. This of course depends on the compiler's being intelligent enough to recognize the fact, but most compilers, including CAML Light, are. Consequently, re-coding a function so that the recursive core of it is tail recursive can dramatically cut down the use of storage. For functions like the factorial, it is hardly likely that they will be called with large enough values of n to make the stack overflow. However the naive implementations of many list functions can cause such an effect when the argument lists are long.

The additional argument x of the tfact function is called an accumulator, because it accumulates the result as the recursive calls rack up, and is then returned at the end. Working in this way, rather than modifying the return value on the way back up, is a common way of making functions tail recursive.
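The same accumulator idea applies to list functions, where long arguments are much more plausible than large factorials. The OCaml sketch below contrasts a naive length with an accumulator version; the names length_naive, length_acc and go are hypothetical. Whether the naive version actually exhausts the stack on a given list depends on the implementation and its stack limit, so the sketch makes no claim about that and only runs the tail recursive version on a long list.

```ocaml
(* Naive length: the addition is performed after the recursive call
   returns, so each list element needs its own stack frame. *)
let rec length_naive = function
  | [] -> 0
  | _ :: t -> 1 + length_naive t

(* Accumulator version: the recursive call to go is the whole result of
   the branch, i.e. a tail call, so a fixed area of storage suffices. *)
let length_acc l =
  let rec go acc = function
    | [] -> acc
    | _ :: t -> go (acc + 1) t
  in
  go 0 l
```

As with tfact, the wrapper length_acc supplies the initial accumulator value, so callers never see the extra argument.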

We have remarked that a fixed area of storage can be used for the arguments to a tail recursive function. On this view, one can look at a tail recursive function as a thinly-veiled imperative implementation. There is an obvious parallel with our C implementation of the factorial as an iterative function:

  int fact(int n)
  { int x = 1;
    while (n > 0)
     { x = x * n;
       n = n - 1;
     }
    return x;
  }

The initialization x = 1 corresponds to our setting of x to 1 by an outer wrapper function fact. The central while loop corresponds to the recursive calls, the only difference being that the arguments to the tail recursive function make explicit that part of the state we are interested in assigning to. Rather than assigning and looping, we make a recursive call with the variables updated. Using similar tricks and making the state explicit, one can easily write essentially imperative code in an ostensibly functional style, with the knowledge that under standard compiler optimizations, the effect inside the machine will, in fact, be much the same.

�� Minimizing consing

We have already considered the use of stack space. But various constructs in functional programs use another kind of store, usually allocated from an area called the heap. Whereas the stack grows and shrinks in a sequential manner based on the flow of control between functions, other storage used by the ML system cannot be reclaimed in such a simple way. Instead, the runtime system occasionally needs to check which bits of allocated memory aren't being used any more, and reclaim them for future applications, a process known as garbage collection. A particularly important example is the space used by constructors for recursive types, e.g. ::. For example, when the following fragment is executed:

  let l = 1::[] in tl l;;

a new block of memory, called a 'cons cell', is allocated to store the instance of the :: constructor. Typically this might be three words of storage, one being an identifier for the constructor, and the other two being pointers to the head and tail of the list. Now in general, it is difficult to decide when this memory can be reclaimed. In the above example, we immediately select the tail of the list, so it is clear that the cons cell can be recycled immediately. But in general this can't be decided by looking at the program, since l might be passed to various functions that may or may not just look at the components of the list. Instead, one needs to analyze the memory usage dynamically and perform garbage collection of what is no longer needed. Otherwise one would eventually run out of storage even when only a small amount is ever needed simultaneously.

Implementors of functional languages work hard on making garbage collection efficient. Some claim that automatic memory allocation and garbage collection often works out faster than typical uses of explicit memory allocation in languages like C (malloc etc.). While we wouldn't go that far, it is certainly very convenient that memory allocation is always done automatically. It avoids a lot of tedious and notoriously error-prone parts of programming.

Many constructs beloved of functional programmers use storage that needs to be reclaimed by garbage collection. While worrying too much about this would cripple the style of functional programs, there are some simple measures that can be taken to avoid gratuitous consing (creation of cons cells). One very simple rule of thumb is to avoid using append if possible. As can be seen by considering the way the recursive calls unroll according to the definition

  let rec append l1 l2 =
    match l1 with
      [] -> l2
    | (h::t) -> h::(append t l2);;

this typically generates n cons cells where n is the length of the first argument list. There are often ways of avoiding appending, such as adding extra accumulator arguments to functions that can be augmented by direct use of consing. A striking example is the list reversal function, which we coded earlier as:

  let rec rev =
    fun [] -> []
      | (h::t) -> append (rev t) [h];;

This typically generates about $n^2/2$ cons cells, where n is the length of the list. The following alternative, using an accumulator, only generates n of them:

  let rev =
    let rec reverse acc =
      fun [] -> acc
        | (h::t) -> reverse (h::acc) t in
    reverse [];;

Moreover, the recursive core reverse is tail recursive, so we also save stack space, and win twice over.
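The allocation counts can be made visible by routing every list construction through a counting helper. The OCaml sketch below does this with a hypothetical function cons and counter conses, under the modelling assumption that each use of :: (including the one hidden in [h]) allocates exactly one cell. For a list of length 10 it counts 55 cells for the appending version, which is n(n+1)/2 and so about $n^2/2$, against 10 for the accumulator version.

```ocaml
(* Global counter of cons cells created through the helper below. *)
let conses = ref 0
let cons h t = incr conses; h :: t

(* append and both reversal functions, with every :: replaced by cons. *)
let rec append l1 l2 =
  match l1 with
  | [] -> l2
  | h :: t -> cons h (append t l2)

let rec rev_naive = function
  | [] -> []
  | h :: t -> append (rev_naive t) (cons h [])

let rev_acc l =
  let rec reverse acc = function
    | [] -> acc
    | h :: t -> reverse (cons h acc) t
  in
  reverse [] l

(* Number of cells created while computing f x. *)
let count f x = conses := 0; ignore (f x); !conses
```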

For another typical situation where we can avoid appending by judicious use of accumulators, consider the problem of returning the fringe of a binary tree, i.e. a list of the leaves in left-to-right order. If we define the type of binary trees as:

  type btree = Leaf of string
             | Branch of btree * btree;;

then a simple coding is the following:

  let rec fringe =
    fun (Leaf s) -> [s]
      | (Branch(l,r)) -> append (fringe l) (fringe r);;

However the following more refined version performs fewer conses:

  let fringe =
    let rec fr t acc =
      match t with
        (Leaf s) -> s::acc
      | (Branch(l,r)) -> fr l (fr r acc) in
    fun t -> fr t [];;

Note that we have written the accumulator as the second argument� so thatthe recursive call has a more natural left�to�right reading� Here is a simple ex�ample of how either version of fringe may be used�

fringe �Branch�Branch�Leaf �a��Leaf �b��Branch�Leaf �c��Leaf �d��

� � string list � �a�� b�� c�� d�!
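The difference in consing between the two versions can be measured directly. The following is a sketch in modern OCaml rather than CAML Light; the counter conses, the helper cons and the function count are instrumentation invented for this illustration, not part of the notes.

```ocaml
type btree = Leaf of string | Branch of btree * btree

(* Instrumented cons: every cell created goes through this function. *)
let conses = ref 0
let cons h t = incr conses; h :: t

let rec append l1 l2 =
  match l1 with
  | [] -> l2
  | h :: t -> cons h (append t l2)

(* Simple version: builds singleton lists and appends them. *)
let rec fringe_naive = function
  | Leaf s -> cons s []
  | Branch (l, r) -> append (fringe_naive l) (fringe_naive r)

(* Accumulator version: one cons per leaf and nothing else. *)
let fringe_acc t =
  let rec fr t acc =
    match t with
    | Leaf s -> cons s acc
    | Branch (l, r) -> fr l (fr r acc)
  in
  fr t []

(* Run f on x and report the result with the number of cells created. *)
let count f x =
  conses := 0;
  let result = f x in
  (result, !conses)

let sample = Branch (Branch (Leaf "a", Leaf "b"), Branch (Leaf "c", Leaf "d"))
```

Evaluating count fringe_naive sample and count fringe_acc sample gives the same list with different cell counts.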

The first version creates 8 cons cells (4 for the singleton lists at the leaves and 4 more inside the calls to append), the second only 4. On larger trees the effect can be more dramatic. Another situation where gratuitous consing can crop up is in pattern matching. For example, consider the code fragment:

  fun [] -> []
    | (h::t) -> if h < 0 then t else h::t;;

The 'else' arm creates a cons cell even though what it constructs was in fact the argument to the function. That is, it is taking the argument apart and then rebuilding it. One simple way of avoiding this is to recode the function as:

  fun l ->
    match l with
      [] -> []
    | (h::t) -> if h < 0 then t else l;;

However ML offers a more flexible alternative: using the as keyword, a name may be identified with certain components of the pattern, so that it never needs to be rebuilt. For example:

  fun [] -> []
    | (h::t as l) -> if h < 0 then t else l;;
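A small sketch in modern OCaml shows the effect of the as pattern. The function names are illustrative, and physical equality (==) is used only to observe that the very same list cell is returned.

```ocaml
(* Takes the argument apart and rebuilds it in the else branch. *)
let drop_neg_rebuild = function
  | [] -> []
  | h :: t -> if h < 0 then t else h :: t

(* Names the whole argument with `as`, so no new cell is needed. *)
let drop_neg_shared = function
  | [] -> []
  | (h :: t) as l -> if h < 0 then t else l
```

With let l = [1; 2; 3], the expression drop_neg_shared l == l holds because the argument itself is returned, whereas the first version is only guaranteed to return a structurally equal list.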

8.2.3 Forcing evaluation

We have emphasized that, since ML does not evaluate underneath lambdas, one can use lambdas to delay evaluation. We will see some interesting examples later. Conversely, however, it can happen that one wants to force evaluation of expressions that are hidden underneath lambdas. For example, recall the tail recursive factorial above:

  #let rec tfact x n =
     if n = 0 then x
     else tfact (x * n) (n - 1);;

  #let fact n = tfact 1 n;;

Since we never really want to use tfact directly, it seems a pity to bind it to a name. Instead, we can make it local to the factorial function:

  #let fact1 n =
     let rec tfact x n =
       if n = 0 then x
       else tfact (x * n) (n - 1) in
     tfact 1 n;;

This, however, has the defect that the local recursive definition is only evaluated after fact receives its argument, since before that it is hidden under a lambda. Moreover, it is then reevaluated each time fact is called. We can change this as follows:

  #let fact2 =
     let rec tfact x n =
       if n = 0 then x
       else tfact (x * n) (n - 1) in
     tfact 1;;


Now the local binding is only evaluated once, at the point of declaration of fact2. According to our tests, the second version of fact is about 20% faster when called on the argument 6. The additional evaluation doesn't amount to much in this case, more or less just unravelling a recursive definition, yet the speedup is significant. In instances where there is a lot of computation involved in evaluating the local binding, the difference can be spectacular. In fact, there is a sophisticated research field of 'partial evaluation' devoted to performing optimizations like this, and much more sophisticated ones besides, automatically. In a sense, it is a generalization of standard compiler optimizations for ordinary languages such as 'constant folding'. In production ML systems, however, it is normally the responsibility of the user to force it, as here.
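The effect is easiest to observe when the local binding is expensive. The sketch below, in modern OCaml, uses a hypothetical table of squares and a counter builds (both invented for this illustration) to record how often the local binding is evaluated.

```ocaml
(* Counts how many times the table is constructed. *)
let builds = ref 0

let make_table () =
  incr builds;
  Array.init 10 (fun i -> i * i)

(* The binding sits under the lambda, so the table is rebuilt per call. *)
let square_slow n =
  let table = make_table () in
  table.(n)

(* The binding is evaluated once, when square_fast is declared. *)
let square_fast =
  let table = make_table () in
  fun n -> table.(n)
```

After these declarations builds is already 1, because square_fast forced its table. Each call to square_slow increments it again, while calls to square_fast leave it unchanged.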

We might note, in passing, that if functions are implemented by plugging together combinators, with fewer explicit lambdas, there is more chance that as much of the expression as possible will be evaluated at declaration time. To take a trivial example, f ∘ g will perform any evaluation of f and g that may be possible, whereas λx. f(g x) will perform none at all until it receives its argument. On the other side of the coin, when we actually want to delay evaluation, we really need lambdas, so a purely combinatory version is impossible.

8.3 Imperative features

ML has a fairly full complement of imperative features. We will not spend much time on the imperative style of programming, since that is not the focus of this course, and we assume readers already have sufficient experience. Therefore, we treat these topics fairly quickly with few illustrative examples. However some imperative features are used in later examples, and some knowledge of what is available will stand the reader in good stead for writing practical ML code.

8.3.1 Exceptions

We have seen on occasion that certain evaluations fail, e.g. through a failure in pattern matching. There are other reasons for failure, e.g. attempts to divide by zero.

  #1 / 0;;
  Uncaught exception: Division_by_zero

In all these cases the compiler complains about an 'uncaught exception'. An exception is a kind of error indication, but they need not always be propagated to the top level. There is a type exn of exceptions, which is effectively a recursive type, though it is usually recursive only vacuously. Unlike with ordinary types, one can add new constructors for the type exn at any point in the program via an exception declaration, e.g.


  #exception Died;;
  Exception Died defined.
  #exception Failed of string;;
  Exception Failed defined.

While certain built-in operations generate (one usually says raise) exceptions, this can also be done explicitly using the raise construct, e.g.

  #raise (Failed "I don't know why");;
  Uncaught exception: Failed "I don't know why"

For example, we might invent our own exception to cover the case of taking the head of an empty list:

  #exception Head_of_empty;;
  Exception Head_of_empty defined.
  #let hd = fun [] -> raise Head_of_empty
              | (h::t) -> h;;
  hd : 'a list -> 'a = <fun>
  #hd [];;
  Uncaught exception: Head_of_empty

Normally exceptions propagate out to the top, but they can be 'caught' inside an outer expression by using try ...with followed by a series of patterns to match exceptions, e.g.

  #let headstring sl =
     try hd sl
     with Head_of_empty -> ""
        | Failed s -> "Failure because "^s;;
  headstring : string list -> string = <fun>
  #headstring ["hi"; "there"];;
  - : string = "hi"
  #headstring [];;
  - : string = ""

It is a matter of opinion whether exceptions are really an imperative feature. On one view, functions just return elements of a disjoint sum consisting of their visible return type and the type of exceptions, and all operations implicitly pass back exceptions. Another view is that exceptions are a highly non-local control flow perversion, analogous to goto (or, perhaps more precisely, to C's setjmp and longjmp). Whatever the semantic view one takes, exceptions can often be quite useful.
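The two views can be put side by side in a short sketch, written in modern OCaml rather than CAML Light. The function hd_result is an illustration added here: it makes the 'disjoint sum' reading explicit by using OCaml's built-in result type, with failure as part of the return value instead of a non-local jump.

```ocaml
exception Head_of_empty

(* Exception-raising version, as in the text. *)
let hd = function
  | [] -> raise Head_of_empty
  | h :: _ -> h

(* The exception is caught where a default value makes sense. *)
let headstring sl =
  try hd sl with Head_of_empty -> ""

(* Disjoint-sum view: the caller must inspect Ok or Error explicitly. *)
let hd_result = function
  | [] -> Error Head_of_empty
  | h :: _ -> Ok h
```

With exceptions the failure passes silently through any intermediate callers; with the explicit sum type every caller has to handle or forward the Error case itself.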


8.3.2 References and arrays

ML does have real assignable variables, and expressions can, as a side-effect, modify the values of these variables. They are explicitly accessed via references (pointers in C parlance) and the references themselves behave more like ordinary ML values. Actually this approach is quite common in C too. For example, if one wants so-called 'variable parameters' in C, where changes to the local variables propagate outside, the only way to do it is to pass a pointer, so that the function can dereference it. Similar techniques are often used where the function is to pass back composite data.

In ML, one sets up a new assignable memory cell with the initial contents x by writing ref x. (Initialization is compulsory.) This expression yields a reference (pointer) to the cell. Subsequent access to the contents of the cell requires an explicit dereference using the ! operator, similar to unary * in C. The cell is assigned to using a conventional-looking assignment statement. For example:

  #let x = ref 1;;
  x : int ref = ref 1
  #!x;;
  - : int = 1
  #x := 2;;
  - : unit = ()
  #!x;;
  - : int = 2
  #x := !x + !x;;
  - : unit = ()
  #x;;
  - : int ref = ref 4
  #!x;;
  - : int = 4

Note that in most respects ref behaves like a type constructor, so one can pattern-match against it. Thus one could actually define an indirection operator like !:

  #let contents_of (ref x) = x;;
  contents_of : 'a ref -> 'a = <fun>
  #contents_of x;;
  - : int = 4

As well as being mutable, references are sometimes useful for creating explicitly shared data structures. One can easily create graph structures where numerous nodes contain a pointer to some single subgraph.

Apart from single cells, one can also use arrays in ML. In CAML these are called vectors. An array of elements of type α has type α vect. A fresh vector of size n, with each element initialized to x (once again the initialization is compulsory), is created using the following call:

  #make_vect n x;;

One can then read element m of a vector v using:

  #vect_item v m;;

and write value y to element m of v using:

  #vect_assign v m y;;

These operations correspond to the expressions v[m] and v[m] = y in C. The elements of an array are numbered from zero. For example:

  #let v = make_vect 5 0;;
  v : int vect = [|0; 0; 0; 0; 0|]
  #vect_item v 1;;
  - : int = 0
  #vect_assign v 1 10;;
  - : unit = ()
  #v;;
  - : int vect = [|0; 10; 0; 0; 0|]
  #vect_item v 1;;
  - : int = 10

All reading and writing is constrained by bounds checking, e.g.

  #vect_item v 5;;
  Uncaught exception: Invalid_argument "vect_item"
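For comparison, here is a sketch of the same operations in modern OCaml, where vectors are called arrays and the CAML Light functions make_vect, vect_item and vect_assign correspond to Array.make, v.(m) and v.(m) <- y. The name out_of_bounds is introduced here only to record the result of the bounds check.

```ocaml
(* A reference cell: initialization is compulsory. *)
let x = ref 1
let () = x := !x + !x        (* dereference with !, assign with := *)

(* A fresh array of five zeros, then one element overwritten. *)
let v = Array.make 5 0
let () = v.(1) <- 10

(* Indices run from 0 to 4; index 5 is rejected at run time. *)
let out_of_bounds =
  try ignore (v.(5)); false
  with Invalid_argument _ -> true
```

The last binding evaluates to true under the default compiler settings, since array accesses are bounds-checked unless checking is explicitly disabled.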

8.3.3 Sequencing

There is no need for an explicit sequencing operation in ML, since the normal rules of evaluation allow one to impose an order. For example one can do:

  #let _ = x := !x + 1 in
   let _ = x := !x + 1 in
   let _ = x := !x + 1 in
   let _ = x := !x + 1 in
   ();;

and the expressions are evaluated in the expected order. Here we use a special pattern which throws away the value, but we could use a dummy variable name instead. Nevertheless, it is more attractive to use the conventional notation for sequencing, and this is possible in ML by using a single semicolon:

  #x := !x + 1;
   x := !x + 1;
   x := !x + 1;
   x := !x + 1;;


8.3.4 Interaction with the type system

While polymorphism works very well for the pure functional core of ML, it has unfortunate interactions with some imperative features. For example, consider the following:

  #let l = ref [];;

Then l would seem to have polymorphic type α list ref. In accordance with the usual rules of let-polymorphism we should be able to use it with two different types, e.g. first

  #l := [1];;

and then

  #hd(!l) = true;;

But this isn't reasonable, because we would actually be writing something as an object of type int then reading it as an object of type bool. Consequently, some restriction on the usual rule of let polymorphism is called for where references are concerned. There have been many attempts to arrive at a sound but convenient restriction of the ML type system, some of them very complicated. Recently, different versions of ML seem to be converging on a relatively simple method, called the value restriction, due to Wright (1996), and CAML implements this restriction, with a twist regarding toplevel bindings. Indeed, the above sequence fails. But the intermediate behaviour is interesting. If we look at the first line we see:

  #let l = ref [];;
  l : '_a list ref = ref []

The underscore on the type variable indicates that l is not polymorphic in the usual sense; rather, it has a single fixed type, although that type is as yet undetermined. The second line works fine:

  #l := [1];;
  - : unit = ()

but if we now look at the type of l, we see that:

  #l;;
  - : int list ref = ref [1]

The pseudo-polymorphic type has now been fixed. Granted this, it is clear that the last line must fail:

  #hd(!l) = true;;
  Toplevel input:
  >hd(!l) = true;;
  This expression has type bool,
  but is used with type int.

So far, this seems quite reasonable, but we haven't yet explained why the same underscored type variables occur in apparently quite innocent purely functional expressions, and why, moreover, they often disappear on eta-expansion, e.g.

  #let I x = x;;
  I : 'a -> 'a = <fun>
  #I o I;;
  it : '_a -> '_a = <fun>
  #let I2 = I o I in fun x -> I2 x;;
  - : '_a -> '_a = <fun>
  #fun x -> (I o I) x;;
  it : 'a -> 'a = <fun>

Other techniques for polymorphic references often rely on encoding in the types the fact that an expression may involve references. This seems natural, but it can lead to the types of functions becoming cluttered with this special information. It is unattractive that the particular implementation of the function, e.g. imperative or functional, should be reflected in its type.

Wright's solution, on the other hand, uses just the basic syntax of the expression being let-bound, insisting that it is a so-called value before generalizing the type. What is really wanted is knowledge of whether the expression may cause side-effects when evaluated. However since this is undecidable in general, the simple syntactic criterion of its being or not being a value is used. Roughly speaking, an expression is a value if it admits no further evaluation according to the ML rules; this is why an expression can often be made into a value by performing a reverse eta conversion. Unfortunately this works against the techniques for forcing evaluation.
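The same behaviour can be reproduced in modern OCaml, which also implements a form of the value restriction (it displays the ungeneralized variables as '_weak1 rather than '_a). The following is a sketch under that assumption; the names id, compose, id2 and pair are illustrative.

```ocaml
(* A reference to an empty list gets a single, as yet unknown type. *)
let l = ref []
let () = l := [ 1 ]          (* this use fixes the type to int list ref *)
(* From here on, List.hd !l = true would be rejected by the type checker. *)

let id x = x
let compose f g x = f (g x)

(* compose id id is an application, not a syntactic value, so binding it
   directly would not be generalized. Eta-expansion turns it into a value: *)
let id2 x = compose id id x

(* id2 is fully polymorphic and can be used at two different types. *)
let pair = (id2 1, id2 "one")
```

Writing let id2 = compose id id instead would make the last line ill-typed, since the first use would fix the argument type to int.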

Exercises

1. Define the C combinator as:

  #let C f x y = f y x;;

What does the following function do?

  #fun f l1 l2 -> itlist (union o C map l2 o f) l1 [];;

2. What does this function do? Write a more efficient version.

  #let rec upto n = if n < 1 then [] else append (upto (n-1)) [n];;

3. Define a function to calculate Fibonacci numbers as follows:

  #let rec fib =
     fun 0 -> 0
       | 1 -> 1
       | n -> fib(n - 2) + fib(n - 1);;

Why is this very inefficient? Write a similar but better version.

4. Can you think of any uses for this exception or similar recursive ones?

  #exception Recurse of exn;;

5. Implement a simple version of quicksort using arrays. The array is first partitioned about some pivot, then two recursive calls sort the left and right portions. Which recursive calls are tail calls? What is the worst-case space usage? How can this be improved dramatically by a simple change?

6. Prove that the two versions of rev that we have quoted, inefficient and efficient, always give the same answer.

Chapter 9

Examples

As we have said, ML was originally designed as a metalanguage for a computer theorem prover. However it is suitable for plenty of other applications, mostly from the same general field of 'symbolic computation'. In this chapter we will give some examples of characteristic uses of ML. It is not necessary that the reader should understand every detail of the particular examples, e.g. the analyses of real number approximations given below. However it is worth trying out these programs and similar examples for yourself, and doing some of the exercises. There is no better way of getting a feel for how ML is actually used.

9.1 Symbolic differentiation

The phrase 'symbolic computation' is a bit vague, but roughly speaking it covers applications where manipulation of mathematical expressions, in general containing variables, is emphasized at the expense of actual numerical calculation. There are several successful 'computer algebra systems' such as Axiom, Maple and Mathematica, which can do certain symbolic operations that are useful in mathematics, e.g. factorizing polynomials and differentiating and integrating expressions. (They are also capable of purely numerical work.) We will illustrate how this sort of application can be tackled using ML.

We use symbolic differentiation as our example, because there is an algorithm to do it which is fairly simple and well known. The reader is probably familiar with certain derivatives of basic operations, e.g. d/dx sin(x) = cos(x), as well as results such as the Chain Rule and Product Rule for calculating the derivatives of composite expressions in terms of the derivatives of the components. Just as one can use these systematically in order to find derivatives, it is possible to program a fairly straightforward algorithm to do it in ML.


9.1.1 First order terms

We will allow mathematical expressions in a fairly general form. They may be built up from variables and constants by the application of operators. These operators may be unary, binary, ternary, or in general n-ary. We represent this using the following recursive datatype:

  #type term = Var of string
             | Const of string
             | Fn of string * (term list);;
  Type term defined.

For example, the expression:

  sin(x + y) / cos(x - exp(y)) - ln(1 + x)

is represented by:

  Fn("-",[Fn("/",[Fn("sin",[Fn("+",[Var "x"; Var "y"])]);
                  Fn("cos",[Fn("-",[Var "x";
                                    Fn("exp",[Var "y"])])])]);
          Fn("ln",[Fn("+",[Const "1"; Var "x"])])]);;

9.1.2 Printing

Reading and writing expressions in their raw form is rather unpleasant. This is a general problem in all symbolic computation systems, and typically the system's interface contains a parser and prettyprinter to allow respectively input and output in a readable form. A detailed discussion of parsing is postponed, since it is worth a section of its own, but we will now write a very simple printer for expressions, so that at least we can see what's going on. We want the printer to support ordinary human conventions, i.e.

• Variables and constants are just written as their names.

• Ordinary n-ary functions applied to arguments are written by juxtaposing the function and a bracketed list of arguments, e.g. f(x1,...,xn).

• Infix binary functions like + are written in between their arguments.

• Brackets are used where necessary for disambiguation.

• Infix operators have a notion of precedence to reduce the need for brackets.

First we declare a list of infixes, which is a list of pairs: each operator name together with its precedence.

  #let infixes = ["+",10; "-",10; "*",20; "/",20];;


It is common to use lists in this way to represent finite partial functions, since it allows more flexibility. They are commonly called association lists, and are very common in functional programming.¹ In order to convert the list into a partial function we use assoc:

  #let rec assoc a ((x,y)::rest) = if a = x then y else assoc a rest;;
  Toplevel input:
  >let rec assoc a ((x,y)::rest) = if a = x then y else assoc a rest;;
  Warning: this matching is not exhaustive.
  assoc : 'a -> ('a * 'b) list -> 'b = <fun>

The compiler warns us that if the appropriate data is not found in the list, the function will fail. Now we can define a function to get the infix precedence of a function:

  #let get_precedence s = assoc s infixes;;
  get_precedence : string -> int = <fun>
  #get_precedence "+";;
  - : int = 10
  #get_precedence "/";;
  - : int = 20
  #get_precedence "%";;
  Uncaught exception: Match_failure ...

Note that if we ever change this list of infixes, then any functions such as get_precedence that use it need to be redeclared. This is the main reason why many LISP programmers prefer dynamic binding. However we can make the set of infixes arbitrarily extensible by making it into a reference and dereferencing it on each application:

  #let infixes = ref ["+",10; "-",10; "*",20; "/",20];;
  infixes : (string * int) list ref =
    ref ["+", 10; "-", 10; "*", 20; "/", 20]
  #let get_precedence s = assoc s (!infixes);;
  get_precedence : string -> int = <fun>
  #get_precedence "^";;
  Uncaught exception: Match_failure ...
  #infixes := ("^",30)::(!infixes);;
  - : unit = ()
  #get_precedence "^";;
  - : int = 30

¹For more heavyweight applications, some alternative such as hash tables is much better, but for simple applications where the lists don't get too long, this kind of linear list is simple and adequate.


Note that by the ML evaluation rules, the dereferencing is only applied after the function get_precedence receives its argument, and hence results in the set of infixes at that time. We can also define a function to decide whether a function has any data associated with it. One way is simply to try the function get_precedence and see if it works:

  #let is_infix s =
     try get_precedence s; true
     with _ -> false;;

An alternative is to use a general function can that finds out whether another function succeeds:

  #let can f x = try f x; true with _ -> false;;
  can : ('a -> 'b) -> 'a -> bool = <fun>
  #let is_infix = can get_precedence;;
  is_infix : string -> bool = <fun>
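The interplay between the reference and the delayed dereference can be checked with a short sketch in modern OCaml. It uses the standard library function List.assoc in place of the hand-written assoc (it raises Not_found rather than Match_failure), and the precedence values are assumptions made for this illustration.

```ocaml
(* Extensible precedence table; the numbers are illustrative. *)
let infixes = ref [ ("+", 10); ("-", 10); ("*", 20); ("/", 20) ]

(* The dereference happens on each call, so later updates are seen. *)
let get_precedence s = List.assoc s !infixes

(* can f x is true exactly when f x evaluates without an exception. *)
let can f x =
  try ignore (f x); true
  with _ -> false

let is_infix = can get_precedence
```

After infixes := ("^", 30) :: !infixes, the existing is_infix recognizes "^" without being redeclared, because the table is only read when get_precedence is applied.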

We will use the following functions that we have already considered:

  #let hd(h::t) = h;;
  #let tl(h::t) = t;;
  #let rec length l = if l = [] then 0 else 1 + length(tl l);;

Without further ado, here is a function to convert a term into a string.

  #let rec string_of_term prec =
     fun (Var s) -> s
       | (Const c) -> c
       | (Fn(f,args)) ->
           if length args = 2 & is_infix f then
             let prec' = get_precedence f in
             let s1 = string_of_term prec' (hd args)
             and s2 = string_of_term prec' (hd(tl args)) in
             let ss = s1^" "^f^" "^s2 in
             if prec' <= prec then "("^ss^")" else ss
           else
             f^"("^(string_of_terms args)^")"

   and string_of_terms t =
     match t with
       [] -> ""
     | [t] -> string_of_term 0 t
     | (h::t) -> (string_of_term 0 h)^","^(string_of_terms t);;

The first argument prec indicates the precedence level of the operator of which the currently considered expression is an immediate subterm. Now if the current expression has an infix binary operator at the top level, parentheses are needed if the precedence of this operator is lower, e.g. if we are considering the right hand subexpression of x * (y + z). Actually we use parentheses even when they are equal, in order to make groupings clear. In the case of associative operators like + this doesn't matter, but we must distinguish x - (y - z) and (x - y) - z. (A more sophisticated scheme would give each operator an associativity.) The second, mutually recursive, function string_of_terms is used to print a comma-separated list of terms, as they occur for non-infix function applications of the form f(t1,...,tn). Let us see this in action:

  #let t =
     Fn("-",[Fn("/",[Fn("sin",[Fn("+",[Var "x"; Var "y"])]);
                     Fn("cos",[Fn("-",[Var "x";
                                       Fn("exp",[Var "y"])])])]);
             Fn("ln",[Fn("+",[Const "1"; Var "x"])])]);;
  t : term =
   Fn
    ("-",
     [Fn
       ("/",
        [Fn ("sin", [Fn ("+", [Var "x"; Var "y"])]);
         Fn ("cos", [Fn ("-", [Var "x"; Fn ("exp", [Var "y"])])])]);
      Fn ("ln", [Fn ("+", [Const "1"; Var "x"])])])
  #string_of_term 0 t;;
  - : string = "sin(x + y) / cos(x - exp(y)) - ln(1 + x)"
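The printer can be condensed somewhat in modern OCaml, as the following sketch shows. It is not the code of the notes: it uses pattern guards and the standard functions List.mem_assoc, List.assoc and String.concat instead of is_infix, get_precedence and string_of_terms, and the precedence table is an assumption made for the example.

```ocaml
type term =
  | Var of string
  | Const of string
  | Fn of string * term list

(* Assumed precedence table: * and / bind tighter than + and -. *)
let infixes = ref [ ("+", 10); ("-", 10); ("*", 20); ("/", 20) ]

(* prec is the precedence of the enclosing operator (0 at top level). *)
let rec string_of_term prec = function
  | Var s -> s
  | Const c -> c
  | Fn (f, [ a; b ]) when List.mem_assoc f !infixes ->
      let prec' = List.assoc f !infixes in
      let ss =
        string_of_term prec' a ^ " " ^ f ^ " " ^ string_of_term prec' b in
      (* Bracket when the enclosing operator binds at least as tightly. *)
      if prec' <= prec then "(" ^ ss ^ ")" else ss
  | Fn (f, args) ->
      f ^ "(" ^ String.concat "," (List.map (string_of_term 0) args) ^ ")"

let t =
  Fn ("-",
      [ Fn ("/",
            [ Fn ("sin", [ Fn ("+", [ Var "x"; Var "y" ]) ]);
              Fn ("cos", [ Fn ("-", [ Var "x"; Fn ("exp", [ Var "y" ]) ]) ]) ]);
        Fn ("ln", [ Fn ("+", [ Const "1"; Var "x" ]) ]) ])
```

Applying string_of_term 0 to t yields the infix string for the running example, and nesting a sum under a product, as in Fn("*",[Var "x"; Fn("+",[Var "y"; Var "z"])]), shows the bracketing rule at work.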

In fact, we don't need to convert a term to a string ourselves. CAML Light allows us to set up our own printer so that it is used to print anything of type term in the standard read-eval-print loop. This needs a few special commands to ensure that we interact properly with the toplevel loop:

  ##open "format";;
  #let print_term s =
     open_hvbox 0;
     print_string("`"^(string_of_term 0 s)^"`");
     close_box();;
  print_term : term -> unit = <fun>
  #install_printer "print_term";;
  - : unit = ()

Now compare the effect:

  #let t =
     Fn("-",[Fn("/",[Fn("sin",[Fn("+",[Var "x"; Var "y"])]);
                     Fn("cos",[Fn("-",[Var "x";
                                       Fn("exp",[Var "y"])])])]);
             Fn("ln",[Fn("+",[Const "1"; Var "x"])])]);;
  t : term = `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`
  #let x = t;;
  x : term = `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`

Once the new printer is installed, it is used whenever CAML wants to print something of type term, even if it is a composite datastructure such as a pair:

  #(x,t);;
  - : term * term =
   `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`,
   `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`

or a list:

  #[x; t; x];;
  - : term list =
   [`sin(x + y) / cos(x - exp(y)) - ln(1 + x)`;
    `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`;
    `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`]

This printer is rather crude in that it will not break up large expressions intelligently across lines. The CAML library format, from which we used a few functions above, provides for a better approach. Instead of converting the term into a string then printing it in one piece, we can make separate printing calls for different parts of it, and intersperse these with some special function calls that help the printing mechanism to understand where to break the lines. Many of the principles used in this and similar 'prettyprinting engines' are due to Oppen (1980). We will not consider this in detail here; see the CAML manual for details.²

9.1.3 Derivatives

Now it is a fairly simple matter to define the derivative function. First let us recall the systematic method we are taught in school:

• If the expression is one of the standard functions applied to an argument that is the variable of differentiation, e.g. sin(x), return the known derivative.

• If the expression is of the form f(x) + g(x) then apply the rule for sums, returning f'(x) + g'(x). Likewise for subtraction etc.

• If the expression is of the form f(x) * g(x) then apply the product rule, i.e. return f'(x) * g(x) + f(x) * g'(x).

• If the expression is one of the standard functions applied to a composite argument, say f(g(x)), then apply the Chain Rule and so give g'(x) * f'(g(x)).

²There is also a tutorial written by Pierre Weis available online as part of the CAML page: 'http://pauillac.inria.fr/caml/FAQ/format-eng.html'.


This is nothing but a recursive algorithm, though that terminology is seldom used in schools. We can program it in ML very directly:

  #let rec differentiate x tm = match tm with
       Var y -> if y = x then Const "1" else Const "0"
     | Const c -> Const "0"
     | Fn("-",[t]) -> Fn("-",[differentiate x t])
     | Fn("+",[t1;t2]) -> Fn("+",[differentiate x t1;
                                  differentiate x t2])
     | Fn("-",[t1;t2]) -> Fn("-",[differentiate x t1;
                                  differentiate x t2])
     | Fn("*",[t1;t2]) ->
         Fn("+",[Fn("*",[differentiate x t1; t2]);
                 Fn("*",[t1; differentiate x t2])])
     | Fn("inv",[t]) -> chain x t
         (Fn("-",[Fn("inv",[Fn("^",[t;Const "2"])])]))
     | Fn("^",[t;n]) -> chain x t
         (Fn("*",[n; Fn("^",[t; Fn("-",[n; Const "1"])])]))
     | Fn("exp",[t]) -> chain x t tm
     | Fn("ln",[t]) -> chain x t (Fn("inv",[t]))
     | Fn("sin",[t]) -> chain x t (Fn("cos",[t]))
     | Fn("cos",[t]) -> chain x t
         (Fn("-",[Fn("sin",[t])]))
     | Fn("/",[t1;t2]) -> differentiate x
         (Fn("*",[t1; Fn("inv",[t2])]))
     | Fn("tan",[t]) -> differentiate x
         (Fn("/",[Fn("sin",[t]); Fn("cos",[t])]))
   and chain x t u = Fn("*",[differentiate x t; u]);;

The chain function just avoids repeating the chain rule construction for many operations. Apart from that, we just systematically apply rules for sums, products etc. and the known derivatives of standard functions. Of course, we could add other functions such as the hyperbolic functions and inverse trigonometric functions if desired. A couple of cases, namely division and tan, avoid a complicated definition by applying the differentiator to an alternative formulation.

9.1.4 Simplification

If we try the differentiator out on our running example, then it works quite well:

  #t;;
  - : term = `sin(x + y) / cos(x - exp(y)) - ln(1 + x)`
  #differentiate "x" t;;
  - : term =
   `(((1 + 0) * cos(x + y)) * inv(cos(x - exp(y))) + sin(x + y) *
   (((1 - 0 * exp(y)) * -(sin(x - exp(y)))) * -(inv(cos(x - exp(y)) ^ 2))))
   - (0 + 1) * inv(1 + x)`
  #differentiate "y" t;;
  - : term =
   `(((0 + 1) * cos(x + y)) * inv(cos(x - exp(y))) + sin(x + y) *
   (((0 - 1 * exp(y)) * -(sin(x - exp(y)))) * -(inv(cos(x - exp(y)) ^ 2))))
   - (0 + 0) * inv(1 + x)`
  #differentiate "z" t;;
  - : term =
   `(((0 + 0) * cos(x + y)) * inv(cos(x - exp(y))) + sin(x + y) *
   (((0 - 0 * exp(y)) * -(sin(x - exp(y)))) * -(inv(cos(x - exp(y)) ^ 2))))
   - (0 + 0) * inv(1 + x)`

However, it fails to make various obvious simplifications such as 0 * x = 0 and x + 0 = x. Some of these redundant expressions are created by the rather unsubtle differentiation strategy which, for instance, applies the chain rule to f(t) even where t is just the variable of differentiation. However, some would be hard to avoid without making the differentiator much more complicated. Accordingly, let us define a separate simplification function that gets rid of some of these unnecessary expressions.

  #let simp =
     fun (Fn("+",[Const "0"; t])) -> t
       | (Fn("+",[t; Const "0"])) -> t
       | (Fn("-",[t; Const "0"])) -> t
       | (Fn("-",[Const "0"; t])) -> Fn("-",[t])
       | (Fn("+",[t1; Fn("-",[t2])])) -> Fn("-",[t1; t2])
       | (Fn("*",[Const "0"; t])) -> Const "0"
       | (Fn("*",[t; Const "0"])) -> Const "0"
       | (Fn("*",[Const "1"; t])) -> t
       | (Fn("*",[t; Const "1"])) -> t
       | (Fn("*",[Fn("-",[t1]); Fn("-",[t2])])) -> Fn("*",[t1; t2])
       | (Fn("*",[Fn("-",[t1]); t2])) -> Fn("-",[Fn("*",[t1; t2])])
       | (Fn("*",[t1; Fn("-",[t2])])) -> Fn("-",[Fn("*",[t1; t2])])
       | (Fn("-",[Fn("-",[t])])) -> t
       | t -> t;;

This just applies the simplification at the top level of the term. We need to execute it in a bottom-up sweep. This will also bring negations upwards through products. In some cases it would be more efficient to simplify in a top-down sweep, e.g. 0 * t for a complicated expression t, but this would need repeating for terms like (0 + 0) * 2. The choice is rather like that between normal and applicative order reduction in lambda calculus.

  #let rec dsimp =
     fun (Fn(fn,args)) -> simp(Fn(fn,map dsimp args))
       | t -> simp t;;

Now we get better results:

  #dsimp(differentiate "x" t);;
  - : term =
   `(cos(x + y) * inv(cos(x - exp(y))) + sin(x + y) *
   (sin(x - exp(y)) * inv(cos(x - exp(y)) ^ 2))) - inv(1 + x)`
  #dsimp(differentiate "y" t);;
  - : term =
   `cos(x + y) * inv(cos(x - exp(y))) - sin(x + y) *
   ((exp(y) * sin(x - exp(y))) * inv(cos(x - exp(y)) ^ 2))`
  #dsimp(differentiate "z" t);;
  - : term = `0`
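To see the two passes working together on something small enough to check by hand, here is a cut-down sketch in modern OCaml. It covers only sums, products and sin in the differentiator and only a few of the simplification rules, so it is an illustration of the technique rather than a port of the full code above.

```ocaml
type term = Var of string | Const of string | Fn of string * term list

(* Differentiator restricted to +, * and sin for this sketch. *)
let rec differentiate x tm =
  match tm with
  | Var y -> if y = x then Const "1" else Const "0"
  | Const _ -> Const "0"
  | Fn ("+", [ t1; t2 ]) ->
      Fn ("+", [ differentiate x t1; differentiate x t2 ])
  | Fn ("*", [ t1; t2 ]) ->
      Fn ("+", [ Fn ("*", [ differentiate x t1; t2 ]);
                 Fn ("*", [ t1; differentiate x t2 ]) ])
  | Fn ("sin", [ t ]) -> Fn ("*", [ differentiate x t; Fn ("cos", [ t ]) ])
  | _ -> failwith "differentiate: case not covered in this sketch"

(* Simplification at the top of a term only. *)
let simp = function
  | Fn ("+", [ Const "0"; t ]) | Fn ("+", [ t; Const "0" ]) -> t
  | Fn ("*", [ Const "0"; _ ]) | Fn ("*", [ _; Const "0" ]) -> Const "0"
  | Fn ("*", [ Const "1"; t ]) | Fn ("*", [ t; Const "1" ]) -> t
  | t -> t

(* Bottom-up sweep: simplify the arguments first, then the node itself. *)
let rec dsimp = function
  | Fn (f, args) -> simp (Fn (f, List.map dsimp args))
  | t -> simp t

(* d/dx sin(x * y), before and after simplification. *)
let xy = Fn ("*", [ Var "x"; Var "y" ])
let raw = differentiate "x" (Fn ("sin", [ xy ]))
let tidy = dsimp raw
```

Here raw is (1 * y + x * 0) * cos(x * y). The bottom-up sweep first reduces 1 * y to y and x * 0 to 0, and only then can the sum y + 0 collapse to y, leaving y * cos(x * y).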

In general, one can always add more and more sophisticated simplification. For example, consider:

  #let t2 = Fn("tan",[Var "x"]);;
  t2 : term = `tan(x)`
  #differentiate "x" t2;;
  - : term =
   `(1 * cos(x)) * inv(cos(x)) +
   sin(x) * ((1 * -(sin(x))) * -(inv(cos(x) ^ 2)))`
  #dsimp(differentiate "x" t2);;
  - : term = `cos(x) * inv(cos(x)) +
   sin(x) * (sin(x) * inv(cos(x) ^ 2))`

We would like to simplify cos(x) * inv(cos(x)) simply to 1, ignoring, as most commercial computer algebra systems do, the possibility that cos(x) might be zero. We can certainly add new clauses to the simplifier for that. Similarly, we might like to collect up terms in the second summand. However here we start to see that human psychology plays a role. Which of the following is preferable? It is hard for the machine to decide unaided.

  sin(x) * (sin(x) * inv(cos(x) ^ 2))

  sin(x) ^ 2 * inv(cos(x) ^ 2)

  sin(x) ^ 2 / cos(x) ^ 2

  (sin(x) / cos(x)) ^ 2

  tan(x) ^ 2

And having chosen the last one, should it then convert

  1 + tan(x) ^ 2

into

  sec(x) ^ 2

Certainly, it is shorter and more conventional. However it relies on the fact that we know the definitions of these fairly obscure trig functions like sec. Whatever the machine decides to do, it is likely that some user will find it upsetting.

9.2 Parsing

A defect of our previous example was that the input had to be written explicitly in terms of type constructors. Here we will show how to write a parser for this situation, and make our code sufficiently general that it can easily be applied to similar formal languages. Generally speaking, the problem of parsing is as follows. A formal language is defined by a grammar, which is a set of production rules for each syntactic category. For a language of terms with two infix operators + and * and only alphanumeric variables and numeral (0, 1 etc.) constants, it might look like this:

  term     --> name(termlist)
             | name
             | (term)
             | numeral
             | -term
             | term + term
             | term * term

  termlist --> term,termlist
             | term

We will assume for the moment that we know what names and numerals are, but we could have similar rules defining them in terms of single characters. Now this set of production rules gives us a way of generating the concrete, linear representation for all terms. Note that name, numeral, term and termlist are meant to be variables in the above, whereas elements like '+' and '(' written in bold type are actual characters. For example we have:


  term --> term * term
       --> term * (term)
       --> term * (term + term)
       --> term * (term + term * term)
       --> numeral * (name + name * name)

so, following similar rules for name and numeral, we can generate for example:

  10 * (x + y * z)

The task of parsing is to reverse this process of applying production rules, i.e. to take a string and discover how it could have been generated by the production rules for a syntactic category. Typically, the output is a parse tree, i.e. a tree structure that shows clearly the sequence of applications of the rules.

One problem with parsing is that the grammar may be ambiguous, i.e. a given string can often be generated in several different ways. This is the case in our example above, for the string could also have been generated by:

  term --> term * term
       --> term * (term)
       --> term * (term * term)
       --> term * (term + term * term)
       --> numeral * (name + name * name)

The obvious way to fix this is to specify rules of precedence and associativity for infix operators. However, we can make the grammar unambiguous without introducing additional mechanisms at the cost of adding new syntactic categories:

  atom     --> name(termlist)
             | name
             | numeral
             | (term)
             | -atom

  mulexp   --> atom * mulexp
             | atom

  term     --> mulexp + term
             | mulexp

  termlist --> term,termlist
             | term

Note that this makes both infix operators right associative. For a more complicated example of this technique in action, look at the formal definition of expressions in the ANSI C Standard.

9.2.1 Recursive descent parsing

The production rules suggest a very simple way of going about parsing using a series of mutually recursive functions. The idea is to have a function for each syntactic category, and a recursive structure that reflects exactly the mutual recursion found in the grammar itself. For example, the procedure for parsing terms, say term will, on encountering a - symbol, make a recursive call to itself to parse the subterm, and on encountering a name followed by an opening parenthesis, will make a recursive call to termlist. This in itself will make at least one recursive call to term, and so on. Such a style of programming is particularly natural in a language like ML where recursion is the principal control mechanism.

In our ML implementation� we suppose that we have some input list of objectsof arbitrary type � for the parser to consume� These might simply be characters�but in fact there is usually a separate level of lexical analysis which collectscharacters together into tokens such as �x� � �� and �%� � so by the time wereach this level the input is a list of tokens� We avoid committing ourselvesto any particular type� Similarly� we will allow the parser to construct objectsof arbitrary type � from its input� These might for example be parse trees asmembers of a recursive datatype� or might simply be numbers if the parser is totake an expression and evaluate it� Now in general� a parser might not consumeall its input� so we also make it output the remaining tokens� Therefore� a parserwill have type

��list� � � ��list

For example� when given the input characters �x � y� z the function atom

should process the characters �x � y� and leave the remaining characters z� Itmight return a parse tree for the processed expression using our earlier recursivetype� and hence we would have�

atom ��x � y� � z� � Fn�� Var �x�� Var �y�!�� z�

Since any call of the function atom must be from within the function mulexp�this second function will use the result of atom and deal with the input thatremains� making a second call to atom to process the expression �z �

9.2.2 Parser combinators

Another reason why this technique works particularly well in ML is that we can define some useful combinators for plugging parsers together. In fact, by giving them infix status, we can make the ML parser program look quite similar in structure to the original grammar defining the language.

First we declare an exception to be used where parsing fails. Then we define an infix ++ that applies two parsers in sequence, pairing up their results, and an infix || which tries first one parser, then the other. We then define a many-fold version of ++ that repeats a parser as far as possible, putting the results into a list. Finally, the infix >> is used to modify the results of a parser according to a function.

The CAML rules for symbolic identifiers such as ++ make them infix automatically, so we suppress this during the definition using the prefix keyword. Their precedences are also automatic, following from the usual precedence of their first characters as arithmetic operations etc. That is, ++ is the strongest, >> falls in the middle, and || is the weakest; this is exactly what we want.

    exception Noparse;;

    let prefix || parser1 parser2 input =
      try parser1 input
      with Noparse -> parser2 input;;

    let prefix ++ parser1 parser2 input =
      let result1,rest1 = parser1 input in
      let result2,rest2 = parser2 rest1 in
      (result1,result2),rest2;;

    let rec many parser input =
      try let result,next = parser input in
          let results,rest = many parser next in
          (result::results),rest
      with Noparse -> [],input;;

    let prefix >> parser treatment input =
      let result,rest = parser input in
      treatment(result),rest;;
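As a rough illustration, the same four combinators can be sketched in present-day OCaml rather than CAML Light. This is only a sketch under some assumptions: the alternation operator is renamed `|||` because OCaml reserves `||` for boolean disjunction, and the parsers `some`, `digit`, `letter` and `ident` are hypothetical helpers introduced just for the demonstration. Note also that in OCaml `|||` and `>>` share one precedence level, so mixed uses of those two need explicit parentheses, unlike the CAML Light situation described above.

```ocaml
exception Noparse

(* Alternation: try p1 and fall back to p2 if it fails.
   Named ||| here (an assumption) since || is boolean "or" in OCaml. *)
let ( ||| ) p1 p2 input = try p1 input with Noparse -> p2 input

(* Sequencing: run p1, then p2 on the remaining input, pairing the results. *)
let ( ++ ) p1 p2 input =
  let r1, rest1 = p1 input in
  let r2, rest2 = p2 rest1 in
  ((r1, r2), rest2)

(* Repetition: apply p as often as it succeeds, collecting results in a list. *)
let rec many p input =
  try
    let r, next = p input in
    let rs, rest = many p next in
    (r :: rs, rest)
  with Noparse -> ([], input)

(* Post-process the result of a parser with a function. *)
let ( >> ) p f input =
  let r, rest = p input in
  (f r, rest)

(* Hypothetical atomic parsers over character lists, for the demo only. *)
let some pred = function
  | h :: t when pred h -> (h, t)
  | _ -> raise Noparse

let digit = some (fun c -> '0' <= c && c <= '9')
let letter = some (fun c -> 'a' <= c && c <= 'z')

(* One letter followed by any number of digits; returns the letter and
   how many digits followed it. *)
let ident = letter ++ many digit >> (fun (c, ds) -> (c, List.length ds))
```

Applying `ident` to the characters of `x12+` consumes `x12` and leaves `+` as the unparsed remainder, which is exactly the "result plus rest of input" shape of parser described in the text.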

We will, in what follows, use the following other functions. Most of these have been defined above, the main exception being explode which converts a string into a list of 1-element strings. It uses the inbuilt functions sub_string and string_length; we do not describe them in detail but their function should be easy to grasp from the example.

    let rec itlist f =
      fun [] b -> b
        | (h::t) b -> f h (itlist f t b);;

    let uncurry f(x,y) = f x y;;

    let K x y = x;;

    let C f x y = f y x;;

    let o f g x = f(g x);;
    #infix "o";;

    let explode s =
      let rec exap n l =
        if n < 0 then l else
        exap (n - 1) ((sub_string s n 1)::l) in
      exap (string_length s - 1) [];;

In order to get started, we define a few 'atomic' parsers. The function some accepts any item satisfying a predicate and returns it. The function a is similar, but demands a particular item, and finally finished just makes sure there are no items left to be parsed.

    let some p =
      fun [] -> raise Noparse
        | (h::t) -> if p h then (h,t) else raise Noparse;;

    let a tok = some (fun item -> item = tok);;

    let finished input =
      if input = [] then 0,input else raise Noparse;;

9.2.3 Lexical analysis

Our parsing combinators can easily be used, in conjunction with a few simple character discrimination functions, to construct a lexical analyzer for our language of terms. We define a type of tokens (or lexemes), and the lexer translates the input string into a list of tokens. The 'other' cases cover symbolic identifiers; they are all straightforward as they consist of just a single character, rather than composite symbols made of several characters.

    type token = Name of string | Num of string | Other of string;;

    let lex =
      let several p = many (some p) in
      let lowercase_letter s = "a" <= s & s <= "z" in
      let uppercase_letter s = "A" <= s & s <= "Z" in
      let letter s = lowercase_letter s or uppercase_letter s in
      let alpha s = letter s or s = "_" or s = "'" in
      let digit s = "0" <= s & s <= "9" in
      let alphanum s = alpha s or digit s in
      let space s = s = " " or s = "\n" or s = "\t" in
      let collect(h,t) = h^(itlist (prefix ^) t "") in
      let rawname =
         some alpha ++ several alphanum >> (Name o collect) in
      let rawnumeral =
         some digit ++ several digit >> (Num o collect) in
      let rawother = some (K true) >> Other in
      let token =
        (rawname || rawnumeral || rawother) ++ several space >> fst in
      let tokens = (several space ++ many token) >> snd in
      let alltokens = (tokens ++ finished) >> fst in
      fst o alltokens o explode;;

For example:

    #lex "sin(x + y) * cos(2 * x + y)";;
    - : token list =
     [Name "sin"; Other "("; Name "x"; Other "+"; Name "y"; Other ")";
      Other "*"; Name "cos"; Other "("; Num "2"; Other "*"; Name "x";
      Other "+"; Name "y"; Other ")"]

9.2.4 Parsing terms

Since we are now working at the level of tokens, we define trivial parsers to accept tokens of (only) a certain kind:

    let name =
      fun (Name s::rest) -> s,rest
        | _ -> raise Noparse;;

    let numeral =
      fun (Num s::rest) -> s,rest
        | _ -> raise Noparse;;

    let other =
      fun (Other s::rest) -> s,rest
        | _ -> raise Noparse;;

Now we can define a parser for terms, in a form very similar to the original grammar. The main difference is that each production rule has associated with it some sort of special action to take as a result of parsing.

    let rec atom input
           = (name ++ a (Other "(") ++ termlist ++ a (Other ")")
                  >> (fun (((name,_),args),_) -> Fn(name,args))
           || name
                  >> (fun s -> Var s)
           || numeral
                  >> (fun s -> Const s)
           || a (Other "(") ++ term ++ a (Other ")")
                  >> (snd o fst)
           || a (Other "-") ++ atom
                  >> snd) input
      and mulexp input
           = (atom ++ a(Other "*") ++ mulexp
                  >> (fun ((a,_),m) -> Fn("*",[a;m]))
           || atom) input
      and term input
           = (mulexp ++ a(Other "+") ++ term
                  >> (fun ((a,_),m) -> Fn("+",[a;m]))
           || mulexp) input
      and termlist input
           = (term ++ a (Other ",") ++ termlist
                  >> (fun ((h,_),t) -> h::t)
           || term
                  >> (fun h -> [h])) input;;

Let us package everything up as a single parsing function:

    let parser = fst o (term ++ finished >> fst) o lex;;

To see it in action, we try with and without the printer (see above) installed:

    #parser "sin(x + y) * cos(2 * x + y)";;
    - : term =
     Fn
      ("*",
       [Fn ("sin", [Fn ("+", [Var "x"; Var "y"])]);
        Fn ("cos", [Fn ("+", [Fn ("*", [Const "2"; Var "x"]);
                              Var "y"])])])
    #install_printer "print_term";;
    - : unit = ()
    #parser "sin(x + y) * cos(2 * x + y)";;
    - : term = `sin(x + y) * cos(2 * x + y)`

9.2.5 Automatic precedence parsing

In the above parser, we 'hardwired' a parser for the two infix operators. However it would be nicer, and consistent with our approach to printing, if changes to the set of infixes could propagate to the parser. Moreover, even with just two different binary operators, our production rules and consequent parser were already starting to look artificial; imagine if we had a dozen or so. What we would like is to automate the production of a layer of parsers, one for each operator, in order of precedence. This can easily be done. First, we define a general function that takes a parser for what, at this level, are regarded as atomic expressions, and produces a new parser for such expressions separated by a given operator. Note that this could also be defined using parsing combinators, but the function below is simple and more efficient.

    let rec binop op parser input =
      let atom1,rest1 as result = parser input in
      if not (rest1 = []) & hd rest1 = Other op then
        let atom2,rest2 = binop op parser (tl rest1) in
        Fn(op,[atom1; atom2]),rest2
      else result;;

Now we define functions that pick out the infix operator in our association list with (equal) lowest precedence, and delete it from the list. If we maintained the list already sorted in precedence order, we could simply take the head and tail of the list.

    let findmin l = itlist
      (fun (_,pr1 as p1) (_,pr2 as p2) -> if pr1 <= pr2 then p1 else p2)
      (tl l) (hd l);;

    let rec delete x (h::t) = if h = x then t else h::(delete x t);;

Now we can define the generic precedence parser:

    let rec precedence ilist parser input =
      if ilist = [] then parser input else
      let opp = findmin ilist in
      let ilist' = delete opp ilist in
      binop (fst opp) (precedence ilist' parser) input;;
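The layering idea can be tried out in isolation with a minimal OCaml sketch. Several things here are assumptions made to keep the example self-contained: tokens are plain strings rather than the `token` type, the hypothetical `atom` parser treats any single token as a constant, deletion is done with `List.filter`, and the `infixes` table with its precedence numbers is invented for the demonstration.

```ocaml
exception Noparse

type term = Const of string | Fn of string * term list

(* Hypothetical atomic parser: any single token counts as a constant. *)
let atom = function
  | tok :: rest -> (Const tok, rest)
  | [] -> raise Noparse

(* Parse p-expressions separated by the operator op, nesting to the right. *)
let rec binop op p input =
  let (atom1, rest1) as result = p input in
  match rest1 with
  | tok :: rest when tok = op ->
      let atom2, rest2 = binop op p rest in
      (Fn (op, [atom1; atom2]), rest2)
  | _ -> result

(* Pick an operator of lowest precedence from a non-empty association list. *)
let findmin l =
  List.fold_left
    (fun ((_, pr1) as p1) ((_, pr2) as p2) -> if pr1 <= pr2 then p1 else p2)
    (List.hd l) (List.tl l)

(* One layer of binop per operator, weakest operator outermost. *)
let rec precedence ilist p input =
  match ilist with
  | [] -> p input
  | _ ->
      let opp = findmin ilist in
      let ilist' = List.filter (fun q -> q <> opp) ilist in
      binop (fst opp) (precedence ilist' p) input

(* Assumed operator table for the demonstration. *)
let infixes = [ ("+", 10); ("*", 20) ]
let example = fst (precedence infixes atom [ "1"; "+"; "2"; "*"; "3" ])
```

With this table the multiplication ends up nested inside the addition, whichever order the operators appear in, because the weaker operator is always handled by the outermost layer.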

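and hence can modify the main parser to be both simpler and more general:

    let rec atom input
           = (name ++ a (Other "(") ++ termlist ++ a (Other ")")
                  >> (fun (((name,_),args),_) -> Fn(name,args))
           || name
                  >> (fun s -> Var s)
           || numeral
                  >> (fun s -> Const s)
           || a (Other "(") ++ term ++ a (Other ")")
                  >> (snd o fst)
           || a (Other "-") ++ atom
                  >> snd) input
      and term input = precedence (!infixes) atom input
      and termlist input
           = (term ++ a (Other ",") ++ termlist
                  >> (fun ((h,_),t) -> h::t)
           || term
                  >> (fun h -> [h])) input;;

    let parser = fst o (term ++ finished >> fst) o lex;;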

for example:

    #parser "2 * sin(x)^2 + 2 * sin(y)^2 - 2";;
    - : term =
     Fn
      ("+",
       [Fn ("*", [Const "2"; Fn ("^", [Fn ("sin", [Var "x"]); Const "2"])]);
        Fn
         ("-",
          [Fn ("*", [Const "2"; Fn ("^", [Fn ("sin", [Var "y"]); Const "2"])]);
           Const "2"])])
    #install_printer "print_term";;
    - : unit = ()
    #parser "2 * sin(x)^2 + 2 * sin(y)^2 - 2";;
    - : term = `2 * sin(x) ^ 2 + (2 * sin(y) ^ 2 - 2)`

9.2.6 Defects of our approach

Our approach to lexing and parsing is not particularly efficient. In fact, CAML and some other ML implementations include a generator for LR parsers similar to YACC (Yet Another Compiler Compiler), which is often used to generate parsers in C under Unix. Not only are these more general in the grammars that they can handle, but the resulting parsers are more efficient. However we believe the present approach is rather clear, adequate for most purposes, and a good introduction to programming with higher order functions.

The inefficiency of our approach can be reduced by avoiding a few particularly wasteful features. Note that if two different productions for the same syntactic category have a common prefix, then we should avoid parsing it more than once, unless it is guaranteed to be trivial. Our production rules for term have this property:

    term --> name(termlist)
           | name
           | ...

We carefully put the longer production first in our actual implementation, otherwise success in reading a name would cause the abandonment of attempts to read a parenthesized list of arguments. Nevertheless, because of the redundancy we have mentioned, this way of constructing the parser is wasteful. In this case it doesn't matter much, because the duplicated call only analyzes one token. However the case for termlist is more significant:

    let ...
      and termlist input
           = (term ++ a (Other ",") ++ termlist
                  >> (fun ((h,_),t) -> h::t)
           || term
                  >> (fun h -> [h])) input;;

Here we could reparse a whole term, which might be arbitrarily large. We can avoid this in several ways. One is to move the || alternation to after the initial term has been read. A convenient way to code it is to eschew explicit recursion in favour of many:

    let ...
      and termlist input
           = (term ++ many (a (Other ",") ++ term >> snd)
                  >> (fun (h,t) -> h::t)) input;;

Therefore the final version of the parser is:

    let rec atom input
           = (name ++ a (Other "(") ++ termlist ++ a (Other ")")
                  >> (fun (((name,_),args),_) -> Fn(name,args))
           || name
                  >> (fun s -> Var s)
           || numeral
                  >> (fun s -> Const s)
           || a (Other "(") ++ term ++ a (Other ")")
                  >> (snd o fst)
           || a (Other "-") ++ atom
                  >> snd) input
      and term input = precedence (!infixes) atom input
      and termlist input
           = (term ++ many (a (Other ",") ++ term >> snd)
                  >> (fun (h,t) -> h::t)) input;;

A general defect of our approach, and recursive descent parsing generally, is that so-called left recursion in productions causes problems. This is where a given category expands to a sequence including the category itself as the first item. For example, if we had wanted to make the addition operator left-associative in our earlier grammar, we could have used:

    term --> term + mulexp
           | mulexp

The naive transcription into ML would loop indefinitely, calling term on the same subterm over and over. There are various slightly ad hoc ways of dealing with this problem. For example, one can convert this use of recursion, which in some ways is rather gratuitous, into the explicit use of repetition. We can do this using standard combinators like many to return a list of 'mulexp's and then run over the list building the tree left-associated.
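The repetition trick just described can be sketched in OCaml on a deliberately tiny language. Everything specific here is an assumption chosen for brevity: the input is a list of characters, the only operands are single digits, the only operator is subtraction (where associativity visibly matters), and the names `expr`, `parse_expr` and `eval` are hypothetical.

```ocaml
exception Noparse

let ( ++ ) p1 p2 input =
  let r1, rest1 = p1 input in
  let r2, rest2 = p2 rest1 in
  ((r1, r2), rest2)

let rec many p input =
  try
    let r, next = p input in
    let rs, rest = many p next in
    (r :: rs, rest)
  with Noparse -> ([], input)

let ( >> ) p f input =
  let r, rest = p input in
  (f r, rest)

let some pred = function
  | h :: t when pred h -> (h, t)
  | _ -> raise Noparse

let a tok = some (fun item -> item = tok)

(* Hypothetical miniature syntax tree: digits and subtraction only. *)
type expr = Const of int | Sub of expr * expr

let digit =
  some (fun c -> '0' <= c && c <= '9')
  >> (fun c -> Const (Char.code c - Char.code '0'))

(* The left-recursive rule  expr --> expr - digit | digit  is recoded as
   "digit followed by many (- digit)", and the resulting list is folded
   from the left so that a-b-c becomes (a-b)-c. *)
let parse_expr =
  digit ++ many (a '-' ++ digit >> snd)
  >> (fun (first, rest) ->
       List.fold_left (fun acc d -> Sub (acc, d)) first rest)

let rec eval = function
  | Const n -> n
  | Sub (l, r) -> eval l - eval r
```

On the characters of `8-3-2` this builds `Sub (Sub (Const 8, Const 3), Const 2)`, which evaluates to 3, whereas a right-nested tree would give 7.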

A final problem is that we do not handle errors very gracefully. We always raise the exception Noparse, and this, when caught, initiates backtracking. However this is not always appropriate for typical grammars. At best, it can cause expensive reparsing, and it can lead to baffling error indications at the top level. For example, on encountering a term followed by +, we expect another term after the +, and if we don't find one, we should generate a comprehensible exception for the user, rather than raise Noparse and perhaps initiate several other pointless attempts to parse the expression in a different way.

9.3 Exact real arithmetic

Real arithmetic on computers is normally done via floating point approximations. In general, we can only manipulate a real number, either ourselves or inside a computer, via some sort of finite representation. Some question how numbers can be said to 'exist' if they have no finite representation. For example, Kronecker accepted integers and rationals because they can be written down explicitly, and even algebraic numbers (those that are roots of polynomials with integer coefficients, e.g. $\sqrt{2}$; transcendental numbers are the ones that aren't), because they can be represented using the polynomials of which they are solutions. However he rejected transcendental numbers because apparently they could not be represented finitely. Allegedly he greeted the famous proof by Lindemann (1882) that $\pi$ is transcendental with the remark that this was 'interesting, except that $\pi$ does not exist'.

However, given our modern perspective, we can say that after all many more numbers than Kronecker would have accepted do have a finite representation, namely the program, or in some more abstract sense the rule, used to calculate them to greater and greater precision. For example, we can write a program that will produce for any argument $n$ the first $n$ digits of $\pi$. Alternatively it can produce a rational number $r$ such that $|\pi - r| < 2^{-n}$. Whatever approach is taken to the successive approximation of a real number, the key point is that its representation, the program itself, is finite.

This works particularly nicely in a language like ML where higher order functions are available. What we have called a 'program' above will just be an ML function. We can actually represent the arithmetic operations on numbers as higher order functions that, given functions for approximating $x$ and $y$, will produce new ones for approximating $x + y$, $xy$, $\sin(x)$ and so on, for a wide range of functions. In an ordinary programming language, we would need to define a concrete representation for programs, e.g. 'Gödel numbering', and write an interpreter for this representation. (Effectively we are manipulating functions 'in extension' rather than 'in intension', i.e. not concerning ourselves with their representation.)

9.3.1 Representation of real numbers

Our approach is to represent a real $x$ by a function $f_x : \mathbb{N} \to \mathbb{Z}$ that for each $n \in \mathbb{N}$ returns an approximation of $x$ to within $2^{-n}$, appropriately scaled. In fact, we have:

\[ |f_x(n) - 2^n x| < 1 \]

This is of course equivalent to $|f_x(n)/2^n - x| < 1/2^n$. We could return a rational approximation directly, but it would still be convenient for the rationals to have denominators that are all powers of some number, so that common denominators arising during addition don't get too big. Performing the scaling directly seems simpler, requiring only integer arithmetic. Our choice of the base 2 is largely arbitrary. A lower base is normally advantageous because it minimizes the granularity of the achievable accuracies. For example, if we used base 10, then even if we only need to increase the accuracy of an approximation slightly, we have to step up the requested accuracy by a factor of 10.

9.3.2 Arbitrary-precision integers

CAML's standard integers (type int) have a severely limited range, so first of all we need to set up a type of unlimited-precision integers. This is not so difficult to program, but is slightly tedious. Fortunately, the CAML release includes a library that gives a fast implementation of arbitrary precision integer (and in fact rational) arithmetic. A version of CAML Light with this library pre-loaded is installed on Thor. Assuming you have used the following path setting:


    PATH="$PATH:/home/jrh13/caml/bin"
    export PATH

then the augmented version of CAML Light can be fired up using:

    $ camllight my_little_caml

When the system returns its prompt, use the #open directive to make the library functions available:

    ##open "num";;

This library sets up a new type num of arbitrary precision rational numbers; we will just use the integer subset. CAML does not provide overloading, so it is necessary to use different symbols for the usual arithmetic operations over num.

Note that small constants of type num must be written as Int k rather than simply k. In fact Int is a type constructor for num used in the case when the number is a machine-representable integer. Larger ones use Big_int.

The unary negation on type num is written minus_num. Another handy unary operator is abs_num, which finds absolute values, i.e. given $x$ returns $|x|$.

The usual binary operations are also available. The (truncating) division and modulus functions, called quo_num and mod_num, do not have infix status. However most of the other binary operators are infixes, and the names are derived from the usual ones by adding a slash '/'. It is important to realize that in general, one must use =/ to compare numbers for equality. This is because the constructors of the type num are, as always, distinct by definition, yet in terms of the underlying meaning they may overlap. For example we get different results from using truncating division and true division on $2^{30}$ and 2, even though they are numerically the same. One uses type constructor Int and the other Ratio.

    #(Int 2 **/ Int 30) // Int 2 = quo_num (Int 2 **/ Int 30) (Int 2);;
    it : bool = false
    #(Int 2 **/ Int 30) // Int 2 =/ quo_num (Int 2 **/ Int 30) (Int 2);;
    it : bool = true

Here is a full list of the main infix binary operators:

    Operator   Type                  Meaning
    **/        num -> num -> num     Exponentiation
    */         num -> num -> num     Multiplication
    +/         num -> num -> num     Addition
    -/         num -> num -> num     Subtraction
    =/         num -> num -> bool    Equality
    <>/        num -> num -> bool    Inequality
    </         num -> num -> bool    Less than
    <=/        num -> num -> bool    Less than or equal
    >/         num -> num -> bool    Greater than
    >=/        num -> num -> bool    Greater than or equal


Let us see some further examples:

    #Int 5 */ Int 14;;
    it : num = Int 70
    #Int 2 **/ Int 30;;
    it : num = Big_int <abstr>
    #(Int 2 **/ Int 30) // Int 2;;
    it : num = Ratio <abstr>
    #quo_num (Int 2 **/ Int 30) (Int 2);;
    it : num = Int 536870912

Note that large numbers are not printed. However we can always convert one to a string using string_of_num:

    #string_of_num(Int 2 **/ Int 150);;
    - : string = "1427247692705959881058285969449495136382746624"

This also has an inverse, called naturally enough num_of_string.

9.3.3 Basic operations

Recall that our real numbers are supposed to be (represented by) functions $\mathbb{Z} \to \mathbb{Z}$. In ML we will actually use int -> num, since the inbuilt type of integers is more than adequate for indexing the level of accuracy. Now we can define some operations on reals. The most basic operation, which gets us started, is to produce the real number corresponding to an integer. This is easy:

    #let real_of_int k n = (Int 2 **/ Int n) */ Int k;;
    real_of_int : int -> int -> num = <fun>
    #real_of_int 23;;
    - : int -> num = <fun>

It is obvious that for any $k$ this obeys the desired approximation criterion:

\[ |f_k(n) - 2^n k| = |2^n k - 2^n k| = 0 < 1 \]

Now we can define the first nontrivial operation, that of unary negation:

    let real_neg f n = minus_num(f n);;

The compiler generalizes the type more than intended, but this will not trouble us. It is almost as easy to see that the approximation criterion is preserved. If we know that for each $n$:

\[ |f_x(n) - 2^n x| < 1 \]

then we have for any $n$:

\begin{align*}
|f_{-x}(n) - 2^n(-x)| &= |-f_x(n) - 2^n(-x)| \\
&= |-(f_x(n) - 2^n x)| \\
&= |f_x(n) - 2^n x| \\
&< 1
\end{align*}

Similarly, we can define an 'absolute value' function on real numbers, using the corresponding function abs_num on numbers:

    let real_abs f n = abs_num (f n);;

The correctness of this is again straightforward, using the fact that $||x| - |y|| \le |x - y|$. Now consider the problem of adding two real numbers. Suppose $x$ and $y$ are represented by $f_x$ and $f_y$ respectively. We could define:

\[ f_{x+y}(n) = f_x(n) + f_y(n) \]

However this gives no guarantee that the approximation criterion is maintained; we would have:

\begin{align*}
|f_{x+y}(n) - 2^n(x+y)| &= |f_x(n) + f_y(n) - 2^n(x+y)| \\
&\le |f_x(n) - 2^n x| + |f_y(n) - 2^n y|
\end{align*}

We can guarantee that the sum on the right is less than 2, but not that it is less than 1 as required. Therefore, we need in this case to evaluate $x$ and $y$ to greater accuracy than required in the answer. Suppose we define:

\[ f_{x+y}(n) = (f_x(n+1) + f_y(n+1))/2 \]

Now we have:

\begin{align*}
|f_{x+y}(n) - 2^n(x+y)| &= |(f_x(n+1) + f_y(n+1))/2 - 2^n(x+y)| \\
&\le |f_x(n+1)/2 - 2^n x| + |f_y(n+1)/2 - 2^n y| \\
&= \tfrac{1}{2}|f_x(n+1) - 2^{n+1}x| + \tfrac{1}{2}|f_y(n+1) - 2^{n+1}y| \\
&< \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 \\
&= 1
\end{align*}

Apparently this just gives the accuracy required. However we have implicitly used real mathematical division above. Since the function is supposed to yield an integer, we are obliged to round the quotient to an integer. If we just use quo_num, the error from rounding this might be almost 1, after which we could never guarantee the bound we want, however accurately we evaluate the arguments. However with a bit more care we can define a division function that always returns the integer closest to the true result (or one of them in the case of two equally close ones), so that the rounding error never exceeds $\tfrac{1}{2}$. This could be done directly in integer arithmetic, but the most straightforward coding is to use rational division followed by rounding to the nearest integer, as there are built-in functions for both these operations:

    #let ndiv x y = round_num(x // y);;
    ndiv : num -> num -> num = <fun>
    ##infix "ndiv";;
    #(Int 23) ndiv (Int 5);;
    - : num = Int 5
    #(Int 22) ndiv (Int 5);;
    - : num = Int 4
    #(Int(-11)) ndiv (Int 4);;
    - : num = Int -3
    #(Int(-9)) ndiv (Int 4);;
    - : num = Int -2

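Now if we define:

\[ f_{x+y}(n) = (f_x(n+2) + f_y(n+2)) \;\mathrm{ndiv}\; 4 \]

everything works:

\begin{align*}
|f_{x+y}(n) - 2^n(x+y)| &= |(f_x(n+2) + f_y(n+2)) \;\mathrm{ndiv}\; 4 - 2^n(x+y)| \\
&\le \tfrac{1}{2} + |(f_x(n+2) + f_y(n+2))/4 - 2^n(x+y)| \\
&= \tfrac{1}{2} + \tfrac{1}{4}|f_x(n+2) + f_y(n+2) - 2^{n+2}(x+y)| \\
&\le \tfrac{1}{2} + \tfrac{1}{4}|f_x(n+2) - 2^{n+2}x| + \tfrac{1}{4}|f_y(n+2) - 2^{n+2}y| \\
&< \tfrac{1}{2} + \tfrac{1}{4} \cdot 1 + \tfrac{1}{4} \cdot 1 \\
&= 1
\end{align*}

Accordingly we make our definition:

    let real_add f g n =
      (f(n + 2) +/ g(n + 2)) ndiv (Int 4);;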
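The error analysis for addition can be checked numerically with a small OCaml sketch. It rests on a loud assumption: native `int` values stand in for the arbitrary-precision `num` type, so it is only meaningful for small precisions. The helper `third` (an approximation to 1/3) and the integer-only coding of `ndiv` are hypothetical choices for the demonstration, not the definitions used in the notes.

```ocaml
(* Nearest-integer division done directly in integer arithmetic.
   Assumes y > 0; exact ties are rounded upwards. *)
let ndiv x y =
  let num = (2 * x) + y and den = 2 * y in
  let q = num / den in
  (* OCaml's / truncates towards zero, so correct it to a floor. *)
  if num mod den < 0 then q - 1 else q

(* A "real" is a function from a precision n to an integer within 1 of 2^n * x. *)
let real_of_int k n = k * (1 lsl n)

(* Hypothetical sample real: 1/3, accurate to within 1/2 at every precision. *)
let third n = ndiv (1 lsl n) 3

(* Addition: evaluate both arguments two bits more accurately, then round. *)
let real_add f g n = ndiv (f (n + 2) + g (n + 2)) 4

let seven = real_add (real_of_int 3) (real_of_int 4)
let two_thirds = real_add third third
```

For instance `seven 5` is exactly 7 * 32 = 224, and `two_thirds n` stays within 1 of 2^n * 2/3, which is the approximation criterion the derivation above establishes.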

We can define subtraction similarly, but the simplest approach is to build it out of the functions that we already have:

    #let real_sub f g = real_add f (real_neg g);;
    real_sub : (int -> num) -> (int -> num) -> int -> num = <fun>


It is a bit trickier to define multiplication, inverses and division. However the cases where we multiply or divide by an integer are much easier and quite common. It is worthwhile implementing these separately as they can be made more efficient. We define:

\[ f_{mx}(n) = (m f_x(n+p+1)) \;\mathrm{ndiv}\; 2^{p+1} \]

where $p$ is chosen so that $2^p \ge |m|$. For correctness, we have:

\begin{align*}
|f_{mx}(n) - 2^n(mx)| &\le \tfrac{1}{2} + \left| \frac{m f_x(n+p+1)}{2^{p+1}} - 2^n(mx) \right| \\
&= \tfrac{1}{2} + \frac{|m|}{2^{p+1}} |f_x(n+p+1) - 2^{n+p+1}x| \\
&< \tfrac{1}{2} + \frac{|m|}{2^{p+1}} \\
&= \tfrac{1}{2} + \tfrac{1}{2} \cdot \frac{|m|}{2^p} \\
&\le \tfrac{1}{2} + \tfrac{1}{2} \\
&= 1
\end{align*}

In order to implement this, we need a function to find the appropriate $p$. The following is crude but adequate:

    let log2 =
      let rec log2 x y =
        if x </ Int 1 then y
        else log2 (quo_num x (Int 2)) (y + 1) in
      fun x -> log2 (x -/ Int 1) 0;;

The implementation is simply:

    let real_intmul m x n =
      let p = log2 (abs_num m) in
      let p1 = p + 1 in
      (m */ x(n + p1)) ndiv (Int 2 **/ Int p1);;
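A quick way to see that the bound really holds is to multiply an approximation of 1/3 by 3 and observe that the result is exactly 2^n, since it is an integer within 1 of 2^n. The OCaml sketch below does this under the assumption that native `int` values replace `num` (so only small precisions make sense); `third` and the integer-only `ndiv` are hypothetical helpers for the demonstration.

```ocaml
(* Nearest-integer division for y > 0, ties rounded upwards. *)
let ndiv x y =
  let num = (2 * x) + y and den = 2 * y in
  let q = num / den in
  if num mod den < 0 then q - 1 else q

(* Least p such that 2^p >= x, by repeated halving. *)
let log2 x =
  let rec go x y = if x < 1 then y else go (x / 2) (y + 1) in
  go (x - 1) 0

(* Hypothetical sample real: 1/3 to within 1/2 at every precision. *)
let third n = ndiv (1 lsl n) 3

(* Multiply a real by the integer m: evaluate p + 1 extra bits, then round. *)
let real_intmul m x n =
  let p = log2 (abs m) in
  let p1 = p + 1 in
  ndiv (m * x (n + p1)) (1 lsl p1)

let one = real_intmul 3 third
```

Here `log2 3` is 2, so three extra bits of the argument are requested before the final rounding division by 8.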

For division by an integer, we define:

\[ f_{x/m}(n) = f_x(n) \;\mathrm{ndiv}\; m \]

For correctness, we can ignore the trivial cases when $m = 0$, which should never be used, and when $m = \pm 1$, since then the result is exact. Otherwise, we assume $|f_x(n) - 2^n x| < 1$, so $|f_x(n)/m - 2^n x/m| < 1/|m| \le \tfrac{1}{2}$, which, together with the fact that $|f_x(n) \;\mathrm{ndiv}\; m - f_x(n)/m| \le \tfrac{1}{2}$, yields the result. So we have:

    let real_intdiv m x n =
      x(n) ndiv (Int m);;


9.3.4 General multiplication

General multiplication is harder because the error in approximation for one number is multiplied by the magnitude of the second number. Therefore, before the final accuracies are calculated, a preliminary evaluation of the arguments is required to determine their approximate magnitude. We proceed as follows. Suppose that we want to evaluate $xy$ to precision $n$. First we choose $r$ and $s$ so that $|r - s| \le 1$ and $r + s = n + 2$. That is, both $r$ and $s$ are slightly more than half the required precision. We now evaluate $f_x(r)$ and $f_y(s)$, and select natural numbers $p$ and $q$ that are the corresponding 'binary logarithms', i.e. $|f_x(r)| \le 2^p$ and $|f_y(s)| \le 2^q$. If both $p$ and $q$ are zero, then it is easy to see that we may return just 0. Otherwise, remember that either $p > 0$ or $q > 0$ as we will need this later. Now set:

\begin{align*}
k &= n + q - s + 3 = q + r + 1 \\
l &= n + p - r + 3 = p + s + 1 \\
m &= (k + l) - n = p + q + 4
\end{align*}

We claim that $f_{xy}(n) = (f_x(k) f_y(l)) \;\mathrm{ndiv}\; 2^m$ has the right error behaviour, i.e. $|f_{xy}(n) - 2^n(xy)| < 1$. If we write:

\begin{align*}
2^k x &= f_x(k) + \delta \\
2^l y &= f_y(l) + \epsilon
\end{align*}

with $|\delta| < 1$ and $|\epsilon| < 1$, we have:

\begin{align*}
|f_{xy}(n) - 2^n(xy)| &\le \tfrac{1}{2} + \left| \frac{f_x(k) f_y(l)}{2^m} - 2^n(xy) \right| \\
&= \tfrac{1}{2} + 2^{-m} |f_x(k) f_y(l) - 2^{k+l} xy| \\
&= \tfrac{1}{2} + 2^{-m} |f_x(k) f_y(l) - (f_x(k) + \delta)(f_y(l) + \epsilon)| \\
&= \tfrac{1}{2} + 2^{-m} |\delta f_y(l) + \epsilon f_x(k) + \delta\epsilon| \\
&\le \tfrac{1}{2} + 2^{-m} (|\delta f_y(l)| + |\epsilon f_x(k)| + |\delta\epsilon|) \\
&\le \tfrac{1}{2} + 2^{-m} (|f_y(l)| + |f_x(k)| + |\delta\epsilon|) \\
&< \tfrac{1}{2} + 2^{-m} (|f_y(l)| + |f_x(k)| + 1)
\end{align*}

Now we have $|f_x(r)| \le 2^p$, so $|2^r x| < 2^p + 1$. Thus $|2^k x| < 2^{q+1}(2^p + 1)$, so $|f_x(k)| < 2^{q+1}(2^p + 1) + 1$, i.e. $|f_x(k)| \le 2^{q+1}(2^p + 1)$. Similarly $|f_y(l)| \le 2^{p+1}(2^q + 1)$. Consequently:

\begin{align*}
|f_y(l)| + |f_x(k)| + 1 &\le 2^{p+1}(2^q + 1) + 2^{q+1}(2^p + 1) + 1 \\
&= 2^{p+q+1} + 2^{p+1} + 2^{p+q+1} + 2^{q+1} + 1 \\
&= 2^{p+q+2} + 2^{p+1} + 2^{q+1} + 1
\end{align*}

Now for our error bound we require $|f_y(l)| + |f_x(k)| + 1 \le 2^{m-1}$, or dividing by 2 and using the discreteness of the integers:

\[ 2^{p+q+1} + 2^p + 2^q < 2^{p+q+2} \]

We can write this as $(2^{p+q} + 2^p) + (2^{p+q} + 2^q) < 2^{p+q+1} + 2^{p+q+1}$, which is true because we have either $p > 0$ or $q > 0$. So at last we are justified in defining the function below. Note that the preliminary values are passed through abs_num, since the bounds above are on $|f_x(r)|$ and $|f_y(s)|$ and log2 returns 0 for any negative argument:

    let real_mul x y n =
      let n2 = n + 2 in
      let r = n2 / 2 in
      let s = n2 - r in
      let xr = abs_num(x(r))
      and ys = abs_num(y(s)) in
      let p = log2 xr
      and q = log2 ys in
      if p = 0 & q = 0 then Int 0 else
      let k = q + r + 1
      and l = p + s + 1
      and m = p + q + 4 in
      (x(k) */ y(l)) ndiv (Int 2 **/ Int m);;
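The choice of k, l and m can be exercised with a small OCaml sketch that multiplies 3 by an approximation of 1/3 and checks that the answer is exactly 2^n. As before this is only a sketch under stated assumptions: native `int` arithmetic replaces `num` (fine for the small precisions used here, but it would overflow for large ones), and `third`, `real_of_int` and the integer-only `ndiv` are hypothetical stand-ins.

```ocaml
(* Nearest-integer division for y > 0, ties rounded upwards. *)
let ndiv x y =
  let num = (2 * x) + y and den = 2 * y in
  let q = num / den in
  if num mod den < 0 then q - 1 else q

(* Least p such that 2^p >= x. *)
let log2 x =
  let rec go x y = if x < 1 then y else go (x / 2) (y + 1) in
  go (x - 1) 0

let real_of_int k n = k * (1 lsl n)
let third n = ndiv (1 lsl n) 3

let real_mul x y n =
  let n2 = n + 2 in
  let r = n2 / 2 in
  let s = n2 - r in
  (* Preliminary evaluation: magnitudes of the arguments, as binary logs. *)
  let p = log2 (abs (x r)) and q = log2 (abs (y s)) in
  if p = 0 && q = 0 then 0
  else
    let k = q + r + 1 and l = p + s + 1 and m = p + q + 4 in
    ndiv (x k * y l) (1 lsl m)

let one = real_mul (real_of_int 3) third
let minus_one = real_mul (real_of_int (-3)) third
```

The second definition uses a negative argument, which is the case where taking absolute values before `log2` matters.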

9.3.5 Multiplicative inverse

Next we will define the multiplicative inverse function. In order to get any sort of upper bound on this, let alone a good approximation, we need to get a lower bound for the argument. In general, there is no better way than to keep evaluating the argument with greater and greater accuracy until we can bound it away from zero. We will use the following lemma to justify the correctness of our procedure:

Lemma 9.1 If $2e \ge n + k + 1$, $|f_x(k)| \ge 2^e$ and $|f_x(k) - 2^k x| < 1$, where $f_x(k)$ is an integer and $e$, $n$ and $k$ are natural numbers, then if we define

\[ f_y(n) = 2^{n+k} \;\mathrm{ndiv}\; f_x(k) \]

we have $|f_y(n) - 2^n x^{-1}| < 1$, i.e. the required bound.

Proof: The proof is rather tedious and will not be given in full. We just sketch the necessary case splits. If $|f_x(k)| > 2^e$ then a straightforward analysis gives the result; the rounding in ndiv gives an error of at most $\tfrac{1}{2}$, and the remaining error is $< \tfrac{1}{2}$. If $|f_x(k)| = 2^e$ but $n + k \ge e$, then although the second component of the error may now be twice as much, i.e. $< 1$, there is no rounding error because $f_x(k) = \pm 2^e$ divides into $2^{n+k}$ exactly. (We use here the fact that $2^e - 1 \ge 2^{e-1}$, because since $2e \ge n + k + 1$, $e$ cannot be zero.) Finally, if $|f_x(k)| = 2^e$ and $n + k < e$, we have $|f_y(n) - 2^n/x| < 1$ because both $|f_y(n)| \le 1$ and $0 < |2^n/x| < 1$, and both these numbers have the same sign. Q.E.D.

Now suppose we wish to find the inverse of $x$ to accuracy $n$. First we evaluate $f_x(0)$. There are two cases to distinguish:

1. If $|f_x(0)| > 2^r$ for some natural number $r$, then choose the least natural number $k$ (which may well be zero) such that $2r + k \ge n + 1$, and set $e = r + k$. We now return $2^{n+k} \;\mathrm{ndiv}\; f_x(k)$. It is easy to see that the conditions of the lemma are satisfied. Since $|f_x(0)| \ge 2^r + 1$ we have $|x| > 2^r$, and so $|2^k x| > 2^{r+k}$. This means $|f_x(k)| > 2^{r+k} - 1$, and as $f_x(k)$ is an integer, $|f_x(k)| \ge 2^{r+k} = 2^e$ as required. The condition that $2e \ge n + k + 1$ is easy to check. Note that if $r \ge n$ we can immediately deduce that $f_y(n) = 0$ is a valid approximation.

2. If $|f_x(0)| \le 1$, then we call the function 'msd' that returns the least $p$ such that $|f_x(p)| > 1$. Note that this may in general fail to terminate if $x = 0$. Now we set $e = n + p + 1$ and $k = e + p$, and return $2^{n+k} \;\mathrm{ndiv}\; f_x(k)$. Once again the conditions for the lemma are satisfied. Since $|f_x(p)| \ge 2$, we have $|2^p x| > 1$, i.e. $|x| > 1/2^p$. Hence $|2^k x| > 2^{k-p} = 2^e$, and so $|f_x(k)| > 2^e - 1$, i.e. $|f_x(k)| \ge 2^e$.

To implement this, we first define the msd function:

    let msd =
      let rec msd n x =
        if abs_num(x(n)) >/ Int 1 then n else msd (n + 1) x in
      msd 0;;

and then translate the above mathematics into a simple program:

    let real_inv x n =
      let x0 = x(0) in
      let k =
        if x0 >/ Int 1 then
          let r = log2 x0 - 1 in
          let k0 = n + 1 - 2 * r in
          if k0 < 0 then 0 else k0
        else
          let p = msd x in
          n + 2 * p + 1 in
      (Int 2 **/ Int (n + k)) ndiv (x(k));;

Now, of course, it is straightforward to define division:

    let real_div x y = real_mul x (real_inv y);;


9.3.6 Ordering relations

The interesting thing about all these functions is that they are in fact uncomputable in general. The essential point is that it is impossible to decide whether a given number is zero. We can keep approximating it to greater and greater accuracy, but if the approximations always return 0, we cannot tell whether it is going to do so indefinitely or whether the next step will give a nonzero answer. (For a proof that it is uncomputable, simply wrap up the halting problem, by defining $f(n) = 1$ if a certain Turing machine has halted after no more than $n$ iterations, and $f(n) = 0$ otherwise.) If $x$ is not zero, then the search will eventually terminate, but if $x$ is zero, it will run forever.

Accepting this, it is not difficult to define the relational operators. To decide the ordering relation of $x$ and $y$ it suffices to find an $n$ such that $|x_n - y_n| \ge 2$. For example, if $x_n \ge y_n + 2$ we have

\[ 2^n x > x_n - 1 \ge y_n + 1 > 2^n y \]

and so $x > y$. We first write a general procedure to perform this search, and then define all the orderings in terms of it. Note that the only way to arrive at the reflexive orderings is to justify the irreflexive version!

    let separate =
      let rec separate n x y =
        let d = x(n) -/ y(n) in
        if abs_num(d) >/ Int 1 then d
        else separate (n + 1) x y in
      separate 0;;

    let real_gt x y = separate x y >/ Int 0;;
    let real_ge x y = real_gt x y;;
    let real_lt x y = separate x y </ Int 0;;
    let real_le x y = real_lt x y;;

9.3.7 Caching

In order to try out all the above functions, we would like to be able to print out a decimal representation of some approximation to a real. This is not hard, using some more standard facilities in the CAML library. If $d$ decimal places are desired, then we make $n$ such that $2^n > 10^d$ and hence the accuracy corresponds at least to the number of digits printed.

    let view x d =
      let n = 4 * d in
      let out = x(n) // (Int 2 **/ Int n) in
      approx_num_fix d out;;


Now we can test out some simple examples, which seem to work:

    #let x = real_of_int 3;;
    x : int -> num = <fun>
    #let xi = real_inv x;;
    xi : int -> num = <fun>
    #let wun = real_mul x xi;;
    wun : int -> num = <fun>
    #view x 20;;
    it : string = "3.00000000000000000000"
    #view xi 20;;
    it : string = ".33333333333333333333"
    #view wun 20;;
    it : string = "1.00000000000000000000"

However there is a subtle and serious bug in our implementation, which shows itself if we try larger expressions. The problem is that subexpressions can be evaluated many times. Most obviously, this happens if they occur several times in the input expression. But even if they don't, the multiplication, and even more so, the inverse, require trial evaluations at differing accuracies. As the evaluation propagates down to the bottom level, there is often an exponential buildup of reevaluations. For example, the following is slow (it takes several seconds):

    #let x1 = real_of_int 1 in
     let x2 = real_mul x1 x1 in
     let x3 = real_mul x2 x2 in
     let x4 = real_mul x3 x3 in
     let x5 = real_mul x4 x4 in
     let x6 = real_mul x5 x5 in
     let x7 = real_mul x6 x6 in
     view x7 10;;
    - : string = "+1.0000000000"

We can fix this problem using the idea of caching or memo functions (Michie 1968). We give each function a reference cell to remember the most accurate version already calculated. If there is a second request for the same accuracy, it can be returned immediately without further evaluation. What's more, we can always construct a lower-level approximation, say n, from a higher one, say n + k with k ≥ 1. If we know that $|f_x(n + k) - 2^{n+k} x| < 1$ then we have:

\[
\begin{aligned}
|f_x(n + k) \mathbin{\mathrm{ndiv}} 2^k - 2^n x|
  &\le \frac{1}{2} + \left|\frac{f_x(n + k)}{2^k} - 2^n x\right| \\
  &= \frac{1}{2} + \frac{1}{2^k}\,|f_x(n + k) - 2^{n+k} x| \\
  &< \frac{1}{2} + \frac{1}{2^k} \\
  &\le 1
\end{aligned}
\]


Hence we are always safe in returning f_x(n + k) ndiv 2^k. We can implement this memo technique via a generic function memo to be inserted in each of the previous functions:

  let real_of_int k = memo (fun n -> (Int 2 **/ Int n) */ Int k);;

  let real_neg f = memo (fun n -> minus_num(f n));;

  let real_abs f = memo (fun n -> abs_num (f n));;

  let real_add f g = memo (fun n ->
    (f(n + 2) +/ g(n + 2)) ndiv (Int 4));;

  let real_sub f g = real_add f (real_neg g);;

  let real_intmul m x = memo (fun n ->
    let p = log2 (abs_num m) in
    let p1 = p + 1 in
    (m */ x(n + p1)) ndiv (Int 2 **/ Int p1));;

  let real_intdiv m x = memo (fun n ->
    x(n) ndiv (Int m));;

  let real_mul x y = memo (fun n ->
    let n2 = n + 2 in
    let r = n2 / 2 in
    let s = n2 - r in
    let xr = x(r)
    and ys = y(s) in
    let p = log2 xr
    and q = log2 ys in
    if p = 0 & q = 0 then Int 0 else
    let k = q + r + 1
    and l = p + s + 1
    and m = p + q + 4 in
    (x(k) */ y(l)) ndiv (Int 2 **/ Int m));;

  let real_inv x = memo (fun n ->
    let x0 = x(0) in
    let k =
      if x0 >/ Int 1 then
        let r = log2 x0 - 1 in
        let k1 = n + 1 - 2 * r in
        if k1 < 0 then 0 else k1
      else
        let p = msd x in
        n + 2 * p + 1 in
    (Int 2 **/ Int (n + k)) ndiv (x(k)));;

  let real_div x y = real_mul x (real_inv y);;

where


  let memo f =
    let mem = ref (-1,Int 0) in
    fun n -> let (m,res) = !mem in
             if n <= m then
               if m = n then res
               else res ndiv (Int 2 **/ Int(m - n))
             else
               let res = f n in
               mem := (n,res); res;;
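The caching idea can be tried in isolation. The following is a minimal sketch in modern OCaml rather than CAML Light, and it assumes native integers in place of the arbitrary-precision num type, so it is only meaningful for small accuracies. The names ndiv, calls and third are chosen for this illustration (ndiv imitates the nearest-integer division used in the notes; calls exists only to count evaluations).

```ocaml
(* Nearest-integer division for a positive divisor y (ties round upward).
   This stands in for the ndiv of the notes, on native ints only. *)
let ndiv x y =
  let q = x / y and r = x mod y in
  (* OCaml's / truncates toward zero, so move to a floor quotient first. *)
  let q, r = if r < 0 then (q - 1, r + y) else (q, r) in
  if 2 * r >= y then q + 1 else q

(* Illustration only: counts how often the underlying function really runs. *)
let calls = ref 0

(* Remember the most accurate approximation computed so far.  A request for
   the same accuracy returns it directly; a request for a lower accuracy n
   is answered by scaling the cached value for m down by 2^(m - n). *)
let memo f =
  let mem = ref (-1, 0) in
  fun n ->
    let (m, res) = !mem in
    if n <= m then
      if m = n then res else ndiv res (1 lsl (m - n))
    else begin
      let res = f n in
      mem := (n, res);
      res
    end

(* The real number 1/3: approximation n is the integer nearest to 2^n / 3. *)
let third = memo (fun n -> incr calls; ndiv (1 lsl n) 3)
```

Asking for third 20 twice, and then for third 10, runs the underlying function only once; the last answer is derived from the cached one and is still within 1 of 2^10 / 3, as the inequality above predicts.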

Now the above sequence of multiplications is instantaneous. Here are a few more examples:

  #let pi1 = real_div (real_of_int 22) (real_of_int 7);;
  pi1 : int -> num = <fun>
  #view pi1 10;;
  it : string = "…"
  #let pi2 = real_div (real_of_int 355) (real_of_int 113);;
  pi2 : int -> num = <fun>
  #view pi2 10;;
  it : string = "…"
  #let pidiff = real_sub pi1 pi2;;
  pidiff : int -> num = <fun>
  #view pidiff 20;;
  it : string = "…"
  #let ipidiff = real_inv pidiff;;
  ipidiff : int -> num = <fun>
  #view ipidiff 20;;
  it : string = "…"

Of course, everything we have done to date could be done with rational arithmetic. Actually, it may happen that our approach is more efficient in certain situations, since we avoid calculating numbers that might have immense numerators and denominators when we just need an approximation. Nevertheless, the present approach really comes into its own when we want to define transcendental functions like exp, sin etc. We will not cover this in detail for lack of space, but it makes an interesting exercise. One approach is to use truncated Taylor series. Note that finite summations can be evaluated directly rather than by iterating the addition function; this leads to much better error behaviour.

9.4 Prolog and theorem proving

The language Prolog is popular in Artificial Intelligence research, and is used in various practical applications such as expert systems and intelligent databases. Here we will show how the main mechanism of Prolog, namely depth-first search through a database of rules using unification and backtracking, can be implemented in ML. We do not pretend that this is a full-blown implementation of Prolog, but it gives an accurate picture of the general flavour of the language, and we will be able to run some simple examples in our system.

9.4.1 Prolog terms

Prolog code and data are represented using a uniform system of first order terms. We have already defined a type of terms for our mathematical expressions, and associated parsers and printers. Here we will use something similar, but not quite the same. First of all, it simplifies some of the code if we treat constants as nullary functions, i.e. functions that take an empty list of arguments. Accordingly we define:

  type term = Var of string
            | Fn of string * (term list);;

Where we would formerly have used Const s, we will now use Fn(s,[]). Note that we will treat functions of different arities (different numbers of arguments) as distinct, even if they have the same name. Thus there is no danger of our confusing constants with true functions.

9.4.2 Lexical analysis

In order to follow Prolog conventions, which include case sensitivity, we also modify lexical analysis. We will not attempt to conform exactly to the full details of Prolog, but one feature is very important: alphanumeric identifiers that begin with an upper case letter or an underscore are treated as variables, while other alphanumeric identifiers, along with numerals, are treated as constants. For example X and Answer are variables while x and john are constants. We will lump all symbolic identifiers together as constants too, but we will distinguish the punctuation symbols: left and right brackets, commas and full stops. Non-punctuation symbols are collected together into the longest strings possible, so symbolic identifiers need not consist only of one character.

  type token = Variable of string
             | Constant of string
             | Punct of string;;

The lexer therefore looks like this:


  let lex =
    let several p = many (some p) in
    let collect(h,t) = h^(itlist (prefix ^) t "") in
    let upper_alpha s = "A" <= s & s <= "Z" or s = "_"
    and lower_alpha s = "a" <= s & s <= "z" or "0" <= s & s <= "9"
    and punct s = s = "(" or s = ")" or s = "[" or s = "]"
                  or s = "," or s = "."
    and space s = s = " " or s = "\n" or s = "\t" in
    let alphanumeric s = upper_alpha s or lower_alpha s in
    let symbolic s = not space s & not alphanumeric s & not punct s in
    let rawvariable =
      some upper_alpha ++ several alphanumeric >> (Variable o collect)
    and rawconstant =
      (some lower_alpha ++ several alphanumeric ||
       some symbolic ++ several symbolic) >> (Constant o collect)
    and rawpunct = some punct >> Punct in
    let token =
      (rawvariable || rawconstant || rawpunct) ++
      several space >> fst in
    let tokens = (several space ++ many token) >> snd in
    let alltokens = (tokens ++ finished) >> fst in
    fst o alltokens o explode;;

For example:

  #lex "add(X,Y,Z) :- X is Y+Z.";;
  - : token list =
   [Constant "add"; Punct "("; Variable "X"; Punct ",";
    Variable "Y"; Punct ","; Variable "Z"; Punct ")";
    Constant ":-"; Variable "X"; Constant "is"; Variable "Y";
    Constant "+"; Variable "Z"; Punct "."]
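The classification rule on its own is easy to state as a function. The sketch below is in modern OCaml rather than CAML Light and assumes the input has already been split into lexemes, so it does none of the longest-match grouping that the lexer above performs; classify, is_upper and is_punct are names invented for this illustration.

```ocaml
type token = Variable of string | Constant of string | Punct of string

(* Upper case letters and underscore start a variable. *)
let is_upper c = ('A' <= c && c <= 'Z') || c = '_'

(* The six punctuation symbols assumed here: brackets, comma, full stop. *)
let is_punct s = List.mem s ["("; ")"; "["; "]"; ","; "."]

(* Classify one already-separated lexeme. *)
let classify s =
  if is_punct s then Punct s
  else if s <> "" && is_upper s.[0] then Variable s
  else Constant s
```

Under these assumptions classify "Answer" is a Variable, while classify "john", classify "12" and classify ":-" are all Constants.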

9.4.3 Parsing

The basic parser is pretty much the same as before; the printer is exactly the same. The main modification to the parser is that we allow Prolog lists to be written in a more convenient notation. Prolog has '.' and 'nil' corresponding to ML's '::' and '[]', and we set up the parser so that lists can be written much as in ML, e.g. '[1,2,3]'. We also allow the Prolog notation '[H|T]' instead of 'cons(H,T)', typically used for pattern-matching. After the basic functions:

  let variable =
    fun (Variable s::rest) -> s,rest
      | _ -> raise Noparse;;

  let constant =
    fun (Constant s::rest) -> s,rest
      | _ -> raise Noparse;;


we have a parser for terms and also for Prolog rules, of these forms:

  term.
  term :- term_1, ..., term_n.

The parsers are as follows:

  let rec atom input
        = (constant ++ a (Punct "(") ++ termlist ++ a (Punct ")")
               >> (fun (((name,_),args),_) -> Fn(name,args))
        || constant
               >> (fun s -> Fn(s,[]))
        || variable
               >> (fun s -> Var s)
        || a (Punct "(") ++ term ++ a (Punct ")")
               >> (snd o fst)
        || a (Punct "[") ++ list
               >> snd) input
  and term input = precedence (!infixes) atom input
  and termlist input
        = (term ++ a (Punct ",") ++ termlist
               >> (fun ((h,_),t) -> h::t)
        || term
               >> (fun h -> [h])) input
  and list input
        = (term ++ (a (Constant "|") ++ term ++ a (Punct "]")
                           >> (snd o fst)
                  || a (Punct ",") ++ list
                           >> snd
                  || a (Punct "]")
                           >> (K (Fn("[]",[]))))
               >> (fun (h,t) -> Fn(".",[h; t]))
        || a (Punct "]")
               >> (K (Fn("[]",[])))) input
  and rule input
        = (term ++ (a (Punct ".")
                           >> (K [])
                  || a (Constant ":-") ++ term ++
                     many (a (Punct ",") ++ term >> snd) ++
                     a (Punct ".")
                           >> (fun (((_,h),t),_) -> h::t))) input;;

  let parse_term = fst o (term ++ finished >> fst) o lex;;

  let parse_rules = fst o (many rule ++ finished >> fst) o lex;;

9.4.4 Unification

Prolog uses a set of rules to solve a current goal by trying to match one of the rules against the goal. A rule consisting of a single term can solve a goal immediately. In the case of a rule 'term :- term_1, ..., term_n.', if the goal matches term, then Prolog needs to solve each term_i as a subgoal in order to finish the original goal.

However, goals and rules do not have to be exactly the same. Instead, Prolog assigns variables in both the goal and the rule to make them match up, a process known as unification. This means that we can end up proving a special case of the original goal, e.g. P(f(X)) instead of P(Y). For example:

• To unify f(g(X),Y) and f(g(a),X) we can set X = a and Y = a. Then both terms are f(g(a),a).

• To unify f(a,X,Y) and f(X,a,Z) we can set X = a and Y = Z, and then both terms are f(a,a,Z).

• It is impossible to unify f(X) and X.

In general, unifiers are not unique. For example, in the second example we could choose to set Y = f(b) and Z = f(b). However one can always choose one that is most general, i.e. any other unification can be reached from it by further instantiation (compare most general types in ML). In order to find it, roughly speaking, one need only descend the two terms recursively in parallel, and on finding a variable on either side, assign it to whatever the term on the other side is. One also needs to check that the variable hasn't already been assigned to something else, and that it doesn't occur in the term being assigned to it (as in the last example above). A simple implementation of this idea follows. We maintain an association list giving instantiations already made, and we look each variable up to see if it is already assigned before proceeding. We use the existing instantiations as an accumulator.

  let rec unify tm1 tm2 insts =
    match tm1 with
      Var(x) ->
        (try let tm1' = assoc x insts in
             unify tm1' tm2 insts
         with Not_found ->
             augment (x,tm2) insts)
    | Fn(f1,args1) ->
        match tm2 with
          Var(y) ->
            (try let tm2' = assoc y insts in
                 unify tm1 tm2' insts
             with Not_found ->
                 augment (y,tm1) insts)
        | Fn(f2,args2) ->
            if f1 = f2 then itlist2 unify args1 args2 insts
            else raise (error "functions do not match");;


where the instantiation lists need to be augmented with some care to avoid the so-called 'occurs check' problem. We must disallow instantiation of X to a non-trivial term involving X, as in the third example above. Most real Prologs ignore this, either for (claimed) efficiency reasons or in order to allow weird cyclic datastructures instead of simple first order terms.

  let rec occurs_in x =
    fun (Var y) -> x = y
      | (Fn(_,args)) -> exists (occurs_in x) args;;

  let rec subst insts = fun
      (Var y as tm) -> (try assoc y insts with Not_found -> tm)
    | (Fn(f,args)) -> Fn(f,map (subst insts) args);;

  let raw_augment =
    let augment1 theta (x,s) =
      let s' = subst theta s in
      if occurs_in x s & not(s = Var(x))
      then raise (error "Occurs check")
      else (x,s') in
    fun p insts -> p::(map (augment1 [p]) insts);;

  let augment (v,t) insts =
    let t' = subst insts t in
    match t' with
      Var(w) -> if w <= v then
                  if w = v then insts
                  else raw_augment (v,t') insts
                else raw_augment (w,Var(v)) insts
    | _ -> if occurs_in v t'
           then raise (error "Occurs check")
           else raw_augment (v,t') insts;;
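The three unification examples above can be replayed with a compact unifier. This is a sketch in modern OCaml under different design assumptions from the code in the notes: instead of keeping the instantiation list fully substituted via augment, it keeps a "triangular" list and lets subst follow chains of bindings on lookup. The exception Unify_failure is a name introduced here.

```ocaml
type term = Var of string | Fn of string * term list

(* Apply the instantiations, following chains such as X -> Y, Y -> a. *)
let rec subst insts tm =
  match tm with
  | Var v ->
      (match List.assoc_opt v insts with
       | Some t -> subst insts t
       | None -> tm)
  | Fn (f, args) -> Fn (f, List.map (subst insts) args)

let rec occurs_in x tm =
  match tm with
  | Var y -> x = y
  | Fn (_, args) -> List.exists (occurs_in x) args

exception Unify_failure of string

(* Extend insts so that tm1 and tm2 become equal, or raise Unify_failure. *)
let rec unify tm1 tm2 insts =
  match subst insts tm1, subst insts tm2 with
  | Var x, Var y when x = y -> insts
  | Var x, t | t, Var x ->
      (* x is unassigned here; refuse to bind it to a term containing x. *)
      if occurs_in x t then raise (Unify_failure "occurs check")
      else (x, t) :: insts
  | Fn (f, args1), Fn (g, args2) ->
      if f = g && List.length args1 = List.length args2
      then List.fold_left2 (fun i a b -> unify a b i) insts args1 args2
      else raise (Unify_failure "functions do not match")
```

With this sketch, unifying f(g(X),Y) with f(g(a),X) yields instantiations under which both terms become f(g(a),a), and attempting to unify f(X) with X raises Unify_failure because of the occurs check.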

9.4.5 Backtracking

Prolog proceeds by depth-first search, but it may backtrack: even if a rule unifies successfully, if all the remaining goals cannot be solved under the resulting instantiations, then another initial rule is tried. Thus we consider the whole list of goals rather than one at a time, to give the right control strategy.


  let rec first f =
    fun [] -> raise (error "No rules applicable")
      | (h::t) -> try f h with error _ -> first f t;;

  let rec expand n rules insts goals =
    first (fun rule ->
      if goals = [] then insts else
      let conc,asms =
        rename_rule (string_of_int n) rule in
      let insts' = unify conc (hd goals) insts in
      let local,global = partition
        (fun (v,_) -> occurs_in v conc or
                      exists (occurs_in v) asms) insts' in
      let goals' = (map (subst local) asms) @
                   (tl goals) in
      expand (n + 1) rules global goals') rules;;

Here we use a function 'rename' to generate fresh variables for the rules each time:

  let rec rename s =
    fun (Var v) -> Var("~"^v^s)
      | (Fn(f,args)) -> Fn(f,map (rename s) args);;

  let rename_rule s (conc,asms) =
      (rename s conc,map (rename s) asms);;

Finally, we can package everything together in a function prolog that tries to solve a given goal using the given rules:

  type outcome = No | Yes of (string * term) list;;

  let prolog rules goal =
    try let insts = expand 0 rules [] [goal] in
        Yes(filter (fun (v,_) -> occurs_in v goal) insts)
    with error _ -> No;;

This says either that the goal cannot be solved, or that it can be solved with the given instantiations. Note that we only return one answer in this latter case, but this is easy to change if desired.
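To see the control strategy run end to end, here is a self-contained sketch in modern OCaml. It is a simplification rather than a transcription: it repeats a compact chain-following unifier so that the block stands alone, it does not separate local from global instantiations, it returns an option instead of the outcome type, and rules are built directly as data because no parser is included. Failure_to_solve is a name introduced here.

```ocaml
type term = Var of string | Fn of string * term list

exception Failure_to_solve

let rec subst insts = function
  | Var v as tm ->
      (match List.assoc_opt v insts with
       | Some t -> subst insts t
       | None -> tm)
  | Fn (f, args) -> Fn (f, List.map (subst insts) args)

let rec occurs_in x = function
  | Var y -> x = y
  | Fn (_, args) -> List.exists (occurs_in x) args

let rec unify tm1 tm2 insts =
  match subst insts tm1, subst insts tm2 with
  | Var x, Var y when x = y -> insts
  | Var x, t | t, Var x ->
      if occurs_in x t then raise Failure_to_solve else (x, t) :: insts
  | Fn (f, a1), Fn (g, a2) ->
      if f = g && List.length a1 = List.length a2
      then List.fold_left2 (fun i s t -> unify s t i) insts a1 a2
      else raise Failure_to_solve

(* Fresh copy of a rule's variables; "~" cannot start a user variable. *)
let rec rename s = function
  | Var v -> Var ("~" ^ v ^ s)
  | Fn (f, args) -> Fn (f, List.map (rename s) args)

(* Try f on each element in turn; failure means "backtrack, try the next". *)
let rec first f = function
  | [] -> raise Failure_to_solve
  | h :: t -> (try f h with Failure_to_solve -> first f t)

(* Depth-first search over the whole list of outstanding goals. *)
let rec expand n rules insts goals =
  match goals with
  | [] -> insts
  | goal :: rest ->
      first (fun (conc, asms) ->
          let s = string_of_int n in
          let insts' = unify (rename s conc) goal insts in
          expand (n + 1) rules insts' (List.map (rename s) asms @ rest))
        rules

let prolog rules goal =
  try Some (expand 0 rules [] [goal]) with Failure_to_solve -> None

(* A small database: each rule is a conclusion with its list of subgoals. *)
let c s = Fn (s, [])
let family = [
  (Fn ("female", [c "alice"]), []);
  (Fn ("parents", [c "edward"; c "victoria"; c "albert"]), []);
  (Fn ("parents", [c "alice"; c "victoria"; c "albert"]), []);
  (Fn ("sister_of", [Var "X"; Var "Y"]),
   [Fn ("female", [Var "X"]);
    Fn ("parents", [Var "X"; Var "M"; Var "F"]);
    Fn ("parents", [Var "Y"; Var "M"; Var "F"])]);
]
```

Because a failure anywhere in the remaining goals propagates back to the nearest enclosing call of first, a rule that unifies but leads to a dead end is abandoned and the next rule is tried, which is the backtracking behaviour described above.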

9.4.6 Examples

We can use the Prolog interpreter just written to try some simple examples from Prolog textbooks. For example:


  #let rules = parse_rules
    "male(albert).
     male(edward).
     female(alice).
     female(victoria).
     parents(edward,victoria,albert).
     parents(alice,victoria,albert).
     sister_of(X,Y) :-
       female(X),
       parents(X,M,F),
       parents(Y,M,F).";;
  rules : (term * term list) list =
   [`male(albert)`, []; `male(edward)`, [];
    `female(alice)`, []; `female(victoria)`, [];
    `parents(edward,victoria,albert)`, [];
    `parents(alice,victoria,albert)`, [];
    `sister_of(X,Y)`,
     [`female(X)`; `parents(X,M,F)`; `parents(Y,M,F)`]]
  #prolog rules (parse_term "sister_of(alice,edward)");;
  - : outcome = Yes []
  #prolog rules (parse_term "sister_of(alice,X)");;
  - : outcome = Yes ["X", `edward`]
  #prolog rules (parse_term "sister_of(X,Y)");;
  - : outcome = Yes ["Y", `edward`; "X", `alice`]

The following are similar to some elementary ML list operations. Since Prolog is relational rather than functional, it is possible to use Prolog queries in a more flexible way, e.g. ask what arguments would give a certain result, rather than vice versa:

  #let r = parse_rules
    "append([],L,L).
     append([H|T],L,[H|A]) :- append(T,L,A).";;
  r : (term * term list) list =
   [`append([],L,L)`, [];
    `append(H . T,L,H . A)`, [`append(T,L,A)`]]
  #prolog r (parse_term "append([1,2],[3],[1,2,3])");;
  - : outcome = Yes []
  #prolog r (parse_term "append([1,2],[3,4],X)");;
  - : outcome = Yes ["X", `1 . (2 . (3 . (4 . [])))`]
  #prolog r (parse_term "append([3,4],X,X)");;
  - : outcome = No
  #prolog r (parse_term "append([1,2],X,Y)");;
  - : outcome = Yes ["Y", `1 . (2 . X)`]

In such cases, Prolog seems to be showing an impressive degree of intelligence. However under the surface it is just using a simple search strategy, and this can easily be thwarted. For example, the following loops indefinitely:

  prolog r (parse_term "append(X,[3,4],X)");;


9.4.7 Theorem proving

Prolog is acting as a simple theorem prover, using a database of logical facts (the rules) in order to prove a goal. However it is rather limited in the facts it can prove, partly because its depth-first search strategy is incomplete, and partly because it can only make logical deductions in certain patterns. It is possible to make Prolog-like systems that are more powerful, e.g. see Stickel (1988). In what follows, we will just show how to build a more capable theorem prover using essentially similar technology, including the unification code and an identical backtracking strategy.

In particular, unification is an effective way of deciding how to specialize universally quantified variables. For example, given the facts that ∀X. p(X) ⇒ q(X) and p(f(a)), we can unify the two expressions involving p and thus discover that we need to set X to f(a). By contrast, the very earliest theorem provers tried all possible terms built up from the available constants and functions (the 'Herbrand base').

Usually, depth-first search would go into an infinite loop, so we need to modify the Prolog strategy slightly. We will use depth first iterative deepening. This means that the search depth has a hard limit, and attempts to exceed it cause backtracking. However, if no proof is found at a given depth, the bound is increased and another attempt is made. Thus, first one searches for proofs of depth 1, and if that fails, searches for one of depth 2, then depth 3 and so on. For 'depth' one can use various parameters, e.g. the height or overall size of the search tree; we will use the number of unifiable variables introduced.
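Iterative deepening is independent of theorem proving, so it can be illustrated on a plain tree search. The following modern OCaml sketch assumes a successor function succ and a test is_goal supplied by the caller, and measures depth as the number of edges from the start node; dfs, deepen and Depth_exceeded are names chosen for the illustration.

```ocaml
exception Depth_exceeded

(* Depth-limited depth-first search returning the path to a goal node.
   Running out of depth and running out of children are both reported
   as Depth_exceeded, which makes the caller backtrack. *)
let rec dfs succ is_goal limit node =
  if is_goal node then [node]
  else if limit = 0 then raise Depth_exceeded
  else
    let rec try_children = function
      | [] -> raise Depth_exceeded
      | c :: cs ->
          (try node :: dfs succ is_goal (limit - 1) c
           with Depth_exceeded -> try_children cs)
    in
    try_children (succ node)

(* Raise the bound until a solution appears.  Like the prover, this does
   not terminate when there is no solution at any depth. *)
let rec deepen succ is_goal limit start =
  try (limit, dfs succ is_goal limit start)
  with Depth_exceeded -> deepen succ is_goal (limit + 1) start
```

On the infinite binary tree where node n has children 2n and 2n+1, unbounded depth-first search from 1 would descend through 2, 4, 8, ... for ever when looking for 5, whereas deepen (fun n -> [2*n; 2*n+1]) (fun n -> n = 5) 0 1 stops once the bound reaches 2.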

Manipulating formulas

We will simply use our standard first order terms to denote formulas, introducing new constants for the logical operators. Many of these are written infix.

  Operator       Meaning
  ~(p)           not p
  p & q          p and q
  p | q          p or q
  p --> q        p implies q
  p <-> q        p if and only if q
  forall(X,p)    for all X, p
  exists(X,p)    there exists an X such that p

An alternative would be to introduce a separate type of formulas, but this would require separate parsing and printing support. We will avoid this, for the sake of simplicity.


Preprocessing formulas

It's convenient if the main part of the prover need not cope with implications and 'if and only if's. Therefore we first define a function that eliminates these in favour of the other connectives.

  let rec proc tm =
    match tm with
      Fn("~",[t]) -> Fn("~",[proc t])
    | Fn("&",[t1; t2]) -> Fn("&",[proc t1; proc t2])
    | Fn("|",[t1; t2]) -> Fn("|",[proc t1; proc t2])
    | Fn("-->",[t1; t2]) ->
          proc (Fn("|",[Fn("~",[t1]); t2]))
    | Fn("<->",[t1; t2]) ->
          proc (Fn("&",[Fn("-->",[t1; t2]);
                        Fn("-->",[t2; t1])]))
    | Fn("forall",[x; t]) -> Fn("forall",[x; proc t])
    | Fn("exists",[x; t]) -> Fn("exists",[x; proc t])
    | t -> t;;

The next step is to push the negations down the formula, putting it into so-called 'negation normal form' (NNF). We define two mutually recursive functions that create NNF for a formula, and its negation.

  let rec nnf_p tm =
    match tm with
      Fn("~",[t]) -> nnf_n t
    | Fn("&",[t1; t2]) -> Fn("&",[nnf_p t1; nnf_p t2])
    | Fn("|",[t1; t2]) -> Fn("|",[nnf_p t1; nnf_p t2])
    | Fn("forall",[x; t]) -> Fn("forall",[x; nnf_p t])
    | Fn("exists",[x; t]) -> Fn("exists",[x; nnf_p t])
    | t -> t
  and nnf_n tm =
    match tm with
      Fn("~",[t]) -> nnf_p t
    | Fn("&",[t1; t2]) -> Fn("|",[nnf_n t1; nnf_n t2])
    | Fn("|",[t1; t2]) -> Fn("&",[nnf_n t1; nnf_n t2])
    | Fn("forall",[x; t]) -> Fn("exists",[x; nnf_n t])
    | Fn("exists",[x; t]) -> Fn("forall",[x; nnf_n t])
    | t -> Fn("~",[t]);;

We will convert the negation of the input formula into negation normal form, and the main prover will then try to derive a contradiction from it. This will suffice to prove the original formula.
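The same pair of mutually recursive functions can be written against a dedicated formula type, which makes the De Morgan and quantifier dualities easy to read off. This modern OCaml sketch deliberately takes the alternative the notes decline (a separate type of formulas) and assumes that implications and 'if and only if's have already been eliminated; the constructor names are invented for the illustration.

```ocaml
type formula =
  | Atom of string
  | Not of formula
  | And of formula * formula
  | Or of formula * formula
  | Forall of string * formula
  | Exists of string * formula

(* nnf_p fm is the NNF of fm; nnf_n fm is the NNF of (Not fm). *)
let rec nnf_p = function
  | Not p -> nnf_n p
  | And (p, q) -> And (nnf_p p, nnf_p q)
  | Or (p, q) -> Or (nnf_p p, nnf_p q)
  | Forall (x, p) -> Forall (x, nnf_p p)
  | Exists (x, p) -> Exists (x, nnf_p p)
  | Atom _ as a -> a
and nnf_n = function
  | Not p -> nnf_p p
  | And (p, q) -> Or (nnf_n p, nnf_n q)          (* De Morgan *)
  | Or (p, q) -> And (nnf_n p, nnf_n q)          (* De Morgan *)
  | Forall (x, p) -> Exists (x, nnf_n p)         (* quantifier duality *)
  | Exists (x, p) -> Forall (x, nnf_n p)
  | Atom _ as a -> Not a
```

For instance, the negation of "for all X, p and not q" comes out as "there exists an X such that not p or q", with negation applied only to atoms.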

The main prover

At each stage, the prover has a current formula, a list of formulas yet to be considered, and a list of literals. It is trying to reach a contradiction. The following strategy is employed:


• If the current formula is 'p & q', then consider 'p' and 'q' separately, i.e. make 'p' the current formula and add 'q' to the formulas to be considered.

• If the current formula is 'p | q', then try to get a contradiction from 'p' and then one from 'q'.

• If the current formula is 'forall(X,p)', invent a new variable to replace 'X': the right value can be discovered later by unification.

• If the current formula is 'exists(X,p)', invent a new constant to replace 'X'.

• Otherwise, the formula must be a literal, so try to unify it with a contradictory literal.

• If this fails, add it to the list of literals, and proceed with the next formula.

We desire a similar backtracking strategy to Prolog: only if the current instantiation allows all remaining goals to be solved do we accept it. We could use lists again, but instead we use continuations. A continuation is a function that is passed to another function and can be called from within it to 'perform the rest of the computation'. In our case, it takes a list of instantiations and tries to solve the remaining goals under these instantiations. Thus, rather than explicitly trying to solve all remaining goals, we simply try calling the continuation.
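The way a continuation carries "the rest of the computation" can be seen in a toy search that has nothing to do with logic. In this modern OCaml sketch, choose, solve and No_solution are names made up for the illustration; a choice is kept only if the continuation succeeds with it, and otherwise the next candidate is tried.

```ocaml
exception No_solution

(* Try each candidate in turn.  A candidate is accepted only if the
   continuation, i.e. everything that still has to be done, succeeds. *)
let rec choose candidates cont =
  match candidates with
  | [] -> raise No_solution
  | x :: rest -> (try cont x with No_solution -> choose rest cont)

(* Find x and y in 1..9 with x + y = 10 and x * y = 21. *)
let solve () =
  let digits = [1; 2; 3; 4; 5; 6; 7; 8; 9] in
  choose digits (fun x ->
    choose digits (fun y ->
      if x + y = 10 && x * y = 21 then (x, y) else raise No_solution))
```

The outer call does not commit to x = 1 or x = 2, because the inner search fails for them and the exception passes control back to try the next value; the prover uses the same mechanism with instantiation lists in place of digits.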

  let rec prove fm unexp pl nl n cont i =
    if n < 0 then raise (error "No proof") else
    match fm with
      Fn("&",[p;q]) ->
          prove p (q::unexp) pl nl n cont i
    | Fn("|",[p;q]) ->
          prove p unexp pl nl n
          (prove q unexp pl nl n cont) i
    | Fn("forall",[Var x; p]) ->
          let v = mkvar() in
          prove (subst [x,Var v] p) (unexp@[fm]) pl nl (n - 1) cont i
    | Fn("exists",[Var x; p]) ->
          let v = mkvar() in
          prove (subst [x,Fn(v,[])] p) unexp pl nl (n - 1) cont i
    | Fn("~",[t]) ->
          (try first (fun t' -> let i' = unify t t' i in
                                cont i') pl
           with error _ ->
              prove (hd unexp) (tl unexp) pl (t::nl) n cont i)
    | t ->
          (try first (fun t' -> let i' = unify t t' i in
                                cont i') nl
           with error _ ->
              prove (hd unexp) (tl unexp) (t::pl) nl n cont i);;


We set up the final prover as follows:

  let prover =
    let rec prove_iter n t =
      try let insts = prove t [] [] [] n I [] in
          let globinsts = filter
            (fun (v,_) -> occurs_in v t) insts in
          n,globinsts
      with error _ -> prove_iter (n + 1) t in
    fun t -> prove_iter 0 (nnf_n(proc(parse_term t)));;

This implements the iterative deepening strategy. It tries to find the proof with the fewest universal instantiations; if it succeeds, it returns the number required and any toplevel instantiations, throwing away instantiations for internally created variables.

Examples

Here are some simple examples from Pelletier (1986).

  #let P1 = prover "p --> q <-> ~(q) --> ~(p)";;
  P1 : int * (string * term) list = 0, []
  #let P13 = prover "p | q & r <-> (p | q) & (p | r)";;
  P13 : int * (string * term) list = 0, []
  #let P16 = prover "(p --> q) | (q --> p)";;
  P16 : int * (string * term) list = 0, []
  #let P18 = prover "exists(Y,forall(X,p(Y)-->p(X)))";;
  P18 : int * (string * term) list = 2, []
  #let P19 = prover "exists(X,forall(Y,forall(Z,
       (p(Y)-->q(Z))-->p(X)-->q(X))))";;
  P19 : int * (string * term) list = 6, []

A bigger example is the following:

  #let P55 = prover
    "lives(agatha) & lives(butler) & lives(charles) &
     (killed(agatha,agatha) | killed(butler,agatha) |
      killed(charles,agatha)) &
     (forall(X,forall(Y,
        killed(X,Y) --> hates(X,Y) & ~(richer(X,Y))))) &
     (forall(X,hates(agatha,X)
               --> ~(hates(charles,X)))) &
     (hates(agatha,agatha) & hates(agatha,charles)) &
     (forall(X,lives(X) & ~(richer(X,agatha))
               --> hates(butler,X))) &
     (forall(X,hates(agatha,X) --> hates(butler,X))) &
     (forall(X,~(hates(X,agatha)) | ~(hates(X,butler))
               | ~(hates(X,charles))))
     --> killed(agatha,agatha)";;
  P55 : int * (string * term) list = 6, []


In fact, the prover can work out 'whodunit':

  #let P55' = prover
    "lives(agatha) & lives(butler) & lives(charles) &
     (killed(agatha,agatha) | killed(butler,agatha) |
      killed(charles,agatha)) &
     (forall(X,forall(Y,
        killed(X,Y) --> hates(X,Y) & ~(richer(X,Y))))) &
     (forall(X,hates(agatha,X)
               --> ~(hates(charles,X)))) &
     (hates(agatha,agatha) & hates(agatha,charles)) &
     (forall(X,lives(X) & ~(richer(X,agatha))
               --> hates(butler,X))) &
     (forall(X,hates(agatha,X) --> hates(butler,X))) &
     (forall(X,~(hates(X,agatha)) | ~(hates(X,butler))
               | ~(hates(X,charles))))
     --> killed(X,agatha)";;
  P55' : int * (string * term) list = 6, ["X", `agatha`]

Further reading

Symbolic differentiation is a classic application of functional languages. Other symbolic operations are major research issues in their own right; Davenport, Siret, and Tournier (1988) give a readable overview of what computer algebra systems can do and how they work. Paulson (1983) discusses the kind of simplification strategy we use here in a much more general setting.

Parsing with higher order functions is another popular example. It seems to have existed in the functional programming folklore for a long time; an early treatment is given by Burge (1975). Our presentation has been influenced by the treatments in Paulson (1991) and Reade (1989).

The first definition of 'computable real number' in our sense was in the original paper of Turing (1936). His approach was based on decimal expansions, but he needed to modify it (Turing 1937) for reasons explored in the exercises below. The approach to real arithmetic given here follows the work of Boehm, Cartwright, O'Donnell, and Riggle (1986). More recently a high-performance implementation in CAML Light has been written by Ménissier-Morain (1994), whose thesis (in French) contains detailed proofs of correctness for all the algorithms for elementary transcendental functions. For an alternative approach using 'linear fractional transformations', see Potts (1996).

Our approach to Prolog search and backtracking, either using lists or continuations, is fairly standard. For more on implementing Prolog, see for example Boizumault (1993), while for more on the actual use of the language, the classic text by Clocksin and Mellish (1987) is recommended. A detailed discussion of continuations in their many guises is given by Reynolds (1993). The unification algorithm given here is simple and purely functional, but rather inefficient. For faster imperative algorithms, see Martelli and Montanari (1982). Our theorem prover is based on leanTAP (Beckert and Posegga 1995). For another important theorem proving method conveniently based on Prolog-style search, look at the Prolog Technology Theorem Prover (Stickel 1988).

Exercises

1. Modify the printer so that each operator has an associativity, used when printing iterated cases of the operator to avoid excessive parentheses.

2. The differentiation rules for inverses 1/g(x) and quotients f(x)/g(x) are not valid when g(x) = 0. Several others for functions like ln are also true only under certain conditions. Actually most commercial computer algebra systems ignore this fact. However, try to do better, by writing a new version of the differentiator that not only produces the derivative but also a list of conditions that must be assumed for the derivative to be valid.

3. Try programming a simple procedure for indefinite integration. In general, it will not be able to do every integral, but try to make it understand a few basic principles. (Footnote: In fact there are algorithms that can integrate all algebraic expressions, not involving sin, ln etc.; see Davenport (1981).)

4. Read the documentation for the format library, and try to implement a printer that behaves better when the output is longer than one line.

5. What happens if the parser functions like term, atom etc. are all eta-reduced by deleting the word input? What happens if precedence is consistently eta-reduced by deleting input?

6. How well do precedence parsers generated by precedence treat distinct operators with the same precedence? How can the behaviour be improved? Can you extend it to allow a mix of left and right associative operators? For example, one really wants subtraction to associate to the left.

7. Rewrite the parser so that it gives helpful error messages when parsing fails.

8. Try representing a real number by the function that generates the first n digits in a decimal expansion. Can you implement the addition function?

9. How can you retain a positional representation while making the basic operations computable?

10. Implement multiprecision integer arithmetic operations yourself. Make your multiplication algorithm $O(n^{\log_2 3})$ where n is the number of digits in the arguments.

11. Using memo as we have defined it, is it the case that whatever series of arguments a function is applied to, it will always give the same answer for the same argument? That is, are the functions true mathematical functions despite the internal use of references?

12. Write a parser-evaluator to read real-valued expressions and create the functional representation for the answer.

13. Extend the real arithmetic routines to some functions like exp and sin.

14. Our preprocessing and negation normal form process can take exponential time in the length of the input in the case of many 'if and only if's. Modify it so that it is linear in this number.

15. Extend the initial transformation into negation normal form so that it gives Skolem normal form. (Footnote: Skolem normal form is covered in most elementary logic books, e.g. Enderton (1972).)

16. Implement a more efficient unification algorithm, e.g. following Martelli and Montanari (1982).

17. Implement a version of the Prolog Technology Theorem Prover (Stickel 1988).

Bibliography

Aagaard, M. and Leeser, M. (1994) Verifying a logic synthesis tool in Nuprl: A case study in software verification. In v. Bochmann, G. and Probst, D. K. (eds.), Computer Aided Verification: Proceedings of the Fourth International Workshop, CAV'92, Volume 663 of Lecture Notes in Computer Science, Montreal, Canada, pp. 69–81. Springer Verlag.

Abramsky, S. (1990) The lazy lambda-calculus. In Turner, D. A. (ed.), Research Topics in Functional Programming, Year of Programming series, pp. 65–116. Addison-Wesley.

Adel'son-Vel'skii, G. M. and Landis, E. M. (1962) An algorithm for the organization of information. Soviet Mathematics Doklady, 3, 1259–1262.

Backus, J. (1978) Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM, 21, 613–641.

Barendregt, H. P. (1984) The Lambda Calculus: Its Syntax and Semantics, Volume 103 of Studies in Logic and the Foundations of Mathematics. North-Holland.

Barwise, J. (1989) Mathematical proofs of computer correctness. Notices of the American Mathematical Society, 7, 844–851.

Beckert, B. and Posegga, J. (1995) leanTAP: Lean, tableau-based deduction. Journal of Automated Reasoning, 15, 339–358. Also available on the Web from http://i12www.ira.uka.de/~posegga/LeanTaP.ps.Z.

Boehm, H. J., Cartwright, R., O'Donnell, M. J., and Riggle, M. (1986) Exact real arithmetic: a case study in higher order programming. In Conference Record of the 1986 ACM Symposium on LISP and Functional Programming, pp. 162–173. Association for Computing Machinery.

Boizumault, P. (1993) The implementation of Prolog. Princeton series in computer science. Princeton University Press. Translated from 'Prolog: l'implantation' by A. M. Djamboulian and J. Fattouh.


Boyer, R. S. and Moore, J. S. (1979) A Computational Logic. ACM Monograph Series. Academic Press.

Burge, W. H. (1975) Recursive Programming Techniques. Addison-Wesley.

Church, A. (1936) An unsolvable problem of elementary number-theory. American Journal of Mathematics, 58, 345–363.

Church, A. (1940) A formulation of the Simple Theory of Types. Journal of Symbolic Logic, 5, 56–68.

Church, A. (1941) The calculi of lambda-conversion, Volume 6 of Annals of Mathematics Studies. Princeton University Press.

Clocksin, W. F. and Mellish, C. S. (1987) Programming in Prolog (3rd ed.). Springer-Verlag.

Curry, H. B. (1930) Grundlagen der Kombinatorischen Logik. American Journal of Mathematics, 52, 509–536, 789–834.

Davenport, J. H. (1981) On the integration of algebraic functions, Volume 102 of Lecture Notes in Computer Science. Springer-Verlag.

Davenport, J. H., Siret, Y., and Tournier, E. (1988) Computer algebra: systems and algorithms for algebraic computation. Academic Press.

Davis, M. D., Sigal, R., and Weyuker, E. J. (1994) Computability, complexity, and languages: fundamentals of theoretical computer science (2nd ed.). Academic Press.

de Bruijn, N. G. (1972) Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indagationes Mathematicae, 34, 381–392.

DeMillo, R., Lipton, R., and Perlis, A. (1979) Social processes and proofs of theorems and programs. Communications of the ACM, 22, 271–280.

Dijkstra, E. W. (1976) A Discipline of Programming. Prentice-Hall.

Enderton, H. B. (1972) A Mathematical Introduction to Logic. Academic Press.

Frege, G. (1893) Grundgesetze der Arithmetik begriffsschriftlich abgeleitet. Jena. Partial English translation by Montgomery Furth in 'The basic laws of arithmetic. Exposition of the system', University of California Press, 1964.

Girard, J.-Y., Lafont, Y., and Taylor, P. (1989) Proofs and Types, Volume 7 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press.


Gordon� A� D� �� Functional Programming and Input�Output� DistinguishedDissertations in Computer Science� Cambridge University Press�

Gordon� M� J� C� �� Programming Language Theory and its Implementa�tion� applicative and imperative paradigms� Prentice�Hall International Seriesin Computer Science� Prentice�Hall�

Gordon� M� J� C�� Milner� R�� and Wadsworth� C� P� �� Edinburgh LCF� AMechanised Logic of Computation� Volume � of Lecture Notes in ComputerScience� Springer�Verlag�

Henson� M� C� �� Elements of functional languages� Blackwell Scienti�c�

Hindley� J� R� and Seldin� J� P� �� Introduction to Combinators and ��Calculus� Volume � of London Mathematical Society Student Texts� CambridgeUniversity Press�

Hudak� P� �� Conception� evolution� and application of functional program�ming languages� ACM Computing Surveys� �� -��

Huet� G� �� Con�uent reductions� abstract properties and applications toterm rewriting systems� Journal of the ACM � �� -��

Kleene� S� C� �� A theory of positive integers in formal logic� AmericanJournal of Mathematics� �� -�� -��

Lagarias� J� �� The �x " � problem and its generalizations� TheAmerican Mathematical Monthly � �� -�� Available on the Web ashttp��www�cecm�sfu�ca�organics�papers�lagarias�index�html�

Landin� P� J� �� The next �� programming languages� Communications ofthe ACM � � ��-��

Lindemann� F� �� &Uber die Zahl �� Mathematische Annalen� �� -��

Mairson� H� G� �� Deciding ML typability is complete for deterministic expo�nential time� In Conference Record of the Seventeenth Annual ACM Symposiumon Principles of Programming Languages �POPL�� San Francisco� pp� ��-��Association for Computing Machinery�

Mairson, H. G. �� Outline of a proof theory of parametricity. In Hughes, J. (ed.), ACM Symposium on Functional Programming and Computer Architecture, Volume �� of Lecture Notes in Computer Science, Harvard University, pp. ��-��

Martelli, A. and Montanari, U. �� An efficient unification algorithm. ACM Transactions on Programming Languages and Systems, �� -��

Mauny, M. �� Functional programming using CAML Light. Available on the Web from http://pauillac.inria.fr/caml/tutorial/index.html.

Ménissier-Morain, V. �� Arithmétique exacte, conception, algorithmique et performances d'une implémentation informatique en précision arbitraire. Thèse, Université Paris ��

Michie, D. �� "Memo" functions and machine learning. Nature, �� -��

Milner, R. �� A theory of type polymorphism in programming. Journal of Computer and Systems Sciences, �� -��

Mycroft, A. �� Abstract interpretation and optimising transformations of applicative programs. Technical report CST�� Computer Science Department, Edinburgh University, King's Buildings, Mayfield Road, Edinburgh EH��JZ, UK.

Neumann, P. G. �� Computer-related risks. Addison-Wesley.

Oppen, D. �� Prettyprinting. ACM Transactions on Programming Languages and Systems, �� -��

Paulson, L. C. �� A higher-order implementation of rewriting. Science of Computer Programming, �� -��

Paulson, L. C. �� ML for the Working Programmer. Cambridge University Press.

Pelletier, F. J. �� Seventy-five problems for testing automatic theorem provers. Journal of Automated Reasoning, �� -�� Errata, JAR � ��-��

Peterson, I. �� Fatal Defect: Chasing Killer Computer Bugs. Arrow.

Potts, P. �� Computable real arithmetic using linear fractional transformations. Unpublished draft for PhD thesis, available on the Web as http://theory.doc.ic.ac.uk/~pjp/pub/phd/draft.ps.gz.

Raphael, B. �� The structure of programming languages. Communications of the ACM, � ��-��

Reade, C. �� Elements of Functional Programming. Addison-Wesley.

Reynolds, J. C. �� The discoveries of continuations. Lisp and Symbolic Computation, �� -��

Robinson, J. A. �� Logic, computers, Turing and von Neumann. In Furukawa, K., Michie, D., and Muggleton, S. (eds.), Machine Intelligence �� pp. �-�� Clarendon Press.

Schönfinkel, M. �� Über die Bausteine der mathematischen Logik. Mathematische Annalen, �� -�� English translation, 'On the building blocks of mathematical logic', in van Heijenoort �� pp. ��-��

Schwichtenberg, H. �� Definierbare Funktionen im λ-Kalkül mit Typen. Archiv für mathematische Logik und Grundlagenforschung, �� -��

Stickel, M. E. �� A Prolog Technology Theorem Prover: Implementation by an extended Prolog compiler. Journal of Automated Reasoning, �� -��

Turing, A. M. �� On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society �� -��

Turing, A. M. �� Correction to Turing �� Proceedings of the London Mathematical Society �� -��

van Heijenoort, J. (ed.) �� From Frege to Gödel: A Source Book in Mathematical Logic �� Harvard University Press.

Whitehead, A. N. �� An Introduction to Mathematics. Williams and Norgate.

Whitehead, A. N. and Russell, B. �� Principia Mathematica (� vols). Cambridge University Press.

Winskel, G. �� The formal semantics of programming languages: an introduction. Foundations of computing. MIT Press.

Wittgenstein, L. �� Tractatus Logico-Philosophicus. Routledge & Kegan Paul.

Wright, A. �� Polymorphism for imperative languages without imperative types. Technical Report TR�� Rice University.
