The theory of the foundations of mathematics - 1870 to 1940 -

Mark Scheffer (Version 1.0)

Mark Scheffer, id. 415968, e-mail: [email protected]. Last changes: March 22, 2002. This report is part of a practical component of the Computing Science study at the Eindhoven University of Technology.

To work on the foundations of mathematics, two things are needed: Love and Blood.

- Anonymous quote, 2001.

Contents

1 Introduction

2 Cantor’s paradise
  2.1 The beginning of set-theory
  2.2 Basic concepts

3 Mathematical constructs in set-theory
  3.1 Some mathematical concepts
  3.2 Relations
  3.3 Functions
  3.4 Induction Methods
    3.4.1 Induction
    3.4.2 Deduction
    3.4.3 The principle of induction
  3.5 Real numbers
    3.5.1 Dedekind’s cuts
    3.5.2 Cantor’s chains of segments
    3.5.3 Cauchy-sequences
    3.5.4 Properties of the three definitions
  3.6 Infinite sets
  3.7 The Continuum Hypothesis
  3.8 Cardinal and Ordinal numbers and Paradoxes
    3.8.1 Cardinal numbers and Cantor’s Paradox
    3.8.2 Ordinal numbers and Burali-Forti’s Paradox

4 Peano and Frege
  4.1 Peano’s arithmetic
  4.2 Frege’s work

5 Russell
  5.1 Russell’s paradox
  5.2 Consequences and philosophies
  5.3 Zermelo Fraenkel
    5.3.1 Axiomatic set theory
    5.3.2 Zermelo Fraenkel (ZF) Axioms

6 Hilbert
  6.1 Hilbert’s proof theory
  6.2 Hilbert’s 23 problems

7 Types
  7.1 Russell and Whitehead’s Principia Mathematica
  7.2 Ramsey, Hilbert and Ackermann
  7.3 Quine

8 Gödel
  8.1 Informally: Gödel’s incompleteness theorems
  8.2 Formally: Gödel’s Incompleteness Theorems
    8.2.1 On formally undecidable propositions
    8.2.2 The impossibility of an ‘internal’ proof of consistency
    8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3
  8.3 Gödel’s theorem and Peano Arithmetic
  8.4 Consequences
  8.5 Neumann-Bernays-Gödel axioms

9 Church and Turing
  9.1 Turing and Turing Machine
  9.2 Church and the Lambda Calculus
  9.3 The Church-Turing thesis

10 Conclusion

A Timeline and Images

Mathematical Notations

Many different notations have been developed for set theory and logic. Most notations that we have used are standard today; other notations that we have used are introduced in the text.

Mathematical Logic

symbol   meaning                              also described as

∧        conjunction                          and
∨        disjunction (inclusive)              or
¬        negation                             not
ϕ(x)     propositional function
→        implication                          if ... then
↔        bi-implication                       if and only if, iff
≡        equivalence                          is equivalent to
∀        universal quantifier                 for all
∃        existential quantifier               exists
∃!       one-element existential quantifier   exists a unique

In most places we have chosen to use the following notation¹ to denote quantifications:

(relation : range : term)
denotes the relationship over a set of terms ranging over range

Consider a general pattern (Q x : ϕ(x_0, ..., x_n) : t(x_0, ..., x_n)), with Q a quantifier, ϕ a boolean expression in terms of the dummies x_0, ..., x_n, and t(x_0, ..., x_n) the term of the quantification. The quantification is the accumulation of values t(x_0, ..., x_n) using an operator or relation indicated by Q, over all values (x_0, ..., x_n) for which ϕ(x_0, ..., x_n) holds.

¹ Notation originally due to E.W. Dijkstra.

This notation is suitable for formal manipulation and unambiguous in the sense that it explicitly indicates the quantifier Q, the dummies and the range of the dummies that is indicated by the boolean expression ϕ (i.e. it exactly determines the domain of the quantification). This allows us to reason about general properties of quantifications, in a way in which the (scopes of the) bound variables are clearly identified. Note that this type of quantification is only suitable for binary operations that are symmetric and associative.

Example:

(Σ x : 0 ≤ x ≤ 5 : x^2)

=

0^2 + 1^2 + 2^2 + 3^2 + 4^2 + 5^2

=

Σ_{x=0}^{5} x^2

Example:

(∃x : x ∈ ℕ : x^3 − x^2 = 18)

≡

‘there exists a natural number x such that x^3 − x^2 = 18’

If the range covers all possible values of the variable (here: x), or if it is clear what the range of a variable is, we can omit it.

Example:

(∀x : true : x ∈ A → x ∈ B)

≡

(∀x :: x ∈ A → x ∈ B)

≡

‘all elements of A are also elements of B’
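The three examples above can be mirrored in Python as a rough sketch. The helper name `quantify` is hypothetical, and the search bound `range(100)` is an assumption made only so that the existential example terminates; the notation itself ranges over all of ℕ.

```python
def quantify(op, domain, in_range, term):
    """Accumulate term(x) with `op` over all x in `domain` for which in_range(x) holds.

    Mirrors the pattern (Q x : range : term); `op` plays the role of Q.
    """
    return op(term(x) for x in domain if in_range(x))


# (Σ x : 0 ≤ x ≤ 5 : x^2)
sum_of_squares = quantify(sum, range(100), lambda x: 0 <= x <= 5, lambda x: x ** 2)

# (∃ x : x ∈ ℕ : x^3 − x^2 = 18), searched only up to an assumed finite bound
exists_root = quantify(any, range(100), lambda x: True, lambda x: x ** 3 - x ** 2 == 18)

# (∀ x :: x ∈ A → x ∈ B), with the implication p → q written as (not p) or q
A, B = {1, 2}, {1, 2, 3}
all_in_b = quantify(all, A | B, lambda x: True, lambda x: (x not in A) or (x in B))
```

Here `sum`, `any` and `all` stand in for the quantifiers Σ, ∃ and ∀; all three are symmetric and associative accumulations, in line with the restriction noted above.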

Chapter 1

Introduction

Pure mathematics is, in its way, the poetry of logical ideas.

- Albert Einstein

This report covers the most important developments and theory of the foundations of mathematics in the period from 1870 to 1940. The tale of the foundations is fairly familiar in general terms and for its philosophical content; here the main emphasis is laid on the mathematical theory. The history of the foundations of mathematics is a complicated and many-sided story; with this article I do not aim to give a definitive or complete version, but to capture what I consider the essence of the theoretical developments, and to present them in a clear and modern setting. Some basic mathematical knowledge of set theory and logic is presupposed.

By the middle of the nineteenth century, certain logical problems (for example paradoxes around the notions of infinity, the infinitesimal and continuity) at the heart of mathematics had inspired a movement, led by German mathematicians, to provide mathematics with more rigorous foundations.

This is where the theory of this report begins, with the emergence of set theory by the German mathematician Cantor. In section 2.1 we informally describe how work on a problem concerning trigonometric series gradually led Cantor to his theory of sets (section 2.2). As a result of the work of Weierstrass, Dedekind and Cantor, pure mathematics had been provided with much more sophisticated foundations. The notion of infinitesimal had been banished, ‘real’ numbers had been provided with a logically consistent definition (section 3.5), continuity had been redefined and, more controversially, a whole new branch of arithmetic had been invented which addressed itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7).

In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could not resolve this paradox, but it was not taken so seriously, partly because the paradoxes appeared in a rather technical region.

The Italian mathematician Peano (section 4.1) was able to show that the whole of arithmetic could be founded upon a system that uses three basic notions and five initial axioms. At the same time the German mathematician Frege (section 4.2) worked on developing a logical basis for mathematics. Like Peano, Frege wanted to put mathematics on firm grounds. But Frege’s grounds were strictly logical; he followed a development later called logicism, also known as the development of so-called mathematical logic.

The British mathematician Russell noted Peano’s work and later that of Frege. Soon thereafter he showed (section 5.1) how finite descriptions like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed out the difficulties that arose with self-referential terms. The paradox that Russell found existed not only in specific technical regions but in all of the axiomatic systems underlying mathematics at the same time (section 5.1). But since the paradoxes could be avoided in most practical applications of set theory, the belief in set theory as a proper foundation of mathematics remained. Axiomatic set theory (section 5.3.1) was an attempt to arrive at a theory without paradoxes. Various responses to the paradox (section 5.2) led to new sets of axioms for set theory. The two main approaches are by the German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hungarian von Neumann, the Austrian Gödel and the Swiss Bernays (section 8.5). It also led to the emergence of the ‘intuitionistic’ philosophy of mathematics by the Dutch mathematician Brouwer (not covered here) and to a theory of types, proposed by Russell himself with the help of his former teacher, the English mathematician Whitehead. Despite the paradox, Russell and Whitehead still claimed that all mathematics could be founded on a mathematical logic; this belief was given a definite presentation in their work ‘Principia Mathematica’ (section 7.1). Various consequences followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and Ramsey, see section 7.2).

At the turn of the century, the German mathematician David Hilbert listed certain important problems concerning the foundations of mathematics and mathematics in general (section 6.2). To overcome paradoxes and other problems that arose in existing systems, Hilbert developed a theory of axiomatic systems (section 6.1). He then encouraged his student Zermelo to use this axiomatic method to develop a first set of axioms for set theory (section 5.3.2). Hilbert subsequently made more precise demands on any proposed set of axioms for mathematics (section 6.1) in terms of consistency, completeness and decidability.

In 1931 Gödel showed that consistency and completeness could not both be attained (chapter 8). Gödel’s work left outstanding Hilbert’s question of decidability. The English mathematician Turing proved in 1936 that there are undecidable problems, by giving the so-called halting problem that cannot be solved by any algorithm (section 9.1), after formalizing the notion of algorithm with his concept of the Turing Machine. The American mathematician Church (independently) obtained the same result but with another formalization of the notion of an algorithm, using his computational model of lambda calculus (section 9.2). In section 9.3 we state that these two notions are equivalent and correspond to the intuitive notion of algorithm or computability. In chapter 10 I summarize the theory of the foundations of mathematics, before giving my own opinion and making some suggestions for future work.

This article is part of the practical component of my study of computing science, and was written for a large part in 8 weeks at the Heriot-Watt university in Edinburgh under supervision of prof. F. Kamareddine. I want to thank Rob Nederpelt and the formal methods section of the computing science department of the Eindhoven University of Technology for making this possible. Rob Nederpelt always inspired me to continue working on this report and was patient in explaining difficult proofs to me. And last but not least, I want to thank Fairouz Kamareddine for her support and positive motivation, and Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last revision) for the typesetting and editing of large parts of this document and for helping me in many ways to finish this article in such a small period of time.


Chapter 2

Cantor’s paradise

2.1 The beginning of set-theory

Perhaps the most surprising thing about mathematics is that it is so surprising. The rules which we make up at the beginning seem ordinary and inevitable, but it is impossible to foresee their consequences. These have only been found out by long study, extending over many centuries. Much of our knowledge is due to a comparatively few great mathematicians such as Newton, Euler, Gauss, or Riemann; few careers can have been more satisfying than theirs. They have contributed something to human thought even more lasting than great literature, since it is independent of language.

- Titchmarsh, E. C. in [88]

By the late 19th century the discussions about the foundations of geometry had become the focus for a running debate about the nature of the branches of mathematics ([23, last paragraph of section 35, page 69/70]). Although there had been no conscious plan leading in that direction, the stage was set for a consideration of questions about the fundamental nature of mathematics.

In the study of logic, the work of the English mathematician George Boole in the 1850s ([49, chapter 2.S4, page 51]), and the American Charles Sanders Peirce around 1880 ([49, page 187]), had contributed to the development of a symbolism to explore logical deductions, and in Germany the logician Gottlob Frege (see [98]) had directed keen attention to fundamental questions.

All of these debates came together through the pioneering work of the German mathematician Georg Cantor on the concept of a set. Cantor had begun work in this area because of his interest in Riemann’s theory of trigonometric series.

In Germany at the university of Halle, the direction of Cantor’s research turned away from number theory and towards analysis. This was due to Heine, one of his senior colleagues at Halle, who challenged Cantor to prove the open problem on the uniqueness of representation of a function as a trigonometric series (see [30, section 5.2, page 182]). Starting from the work on trigonometric series and on the function of a complex variable done by the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor in 1870 showed ([30, page 182]) that such a function can be represented in only one way by a trigonometric series. Consideration of the collection of numbers (originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not conflict with such a representation led him, first, in 1872, to define irrational numbers in terms of convergent sequences of rational numbers (or quotients of integers, see section 3.5.2) and then to begin his major lifework, the theory of sets and the concept of transfinite numbers.


2.2 Basic concepts

The essence of mathematics lies in its freedom.

- Georg Cantor, quoted in [58]

In 1874 Cantor published his first article on set-theory. A set, wrote Cantor (in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published in [20, page 261-281]), is “a collection of definite, distinguishable objects of perception or thought conceived as a whole”. In this report we use a similar description of the concept of a set.

What is a set? A (finite or infinite) collection of objects, that is consideredas a single, abstract object.

A set is sometimes also called aggregate, class or (as it was first called by Riemann (see [31, page 88]) and later by the mathematician Russell) manifold. The objects are also called elements or members of the set.

We denote a set by listing its elements between braces ‘{’ and ‘}’, and membership of an element in a set by the membership relation ∈.

Example: If we consider a set that contains natural numbers, we write 4 ∈ {2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write 4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.

In a mathematical context we mostly consider sets of numbers and functions. We denote the well-known set of natural numbers by ℕ (this set is also called the naturals), the integers by ℤ, the fractional numbers by ℚ (this set is also called the rationals) and the reals by ℝ (this set is also called the continuum). The objects of a set themselves can also be sets.

What is set theory? A branch of mathematics that deals with the properties of well-defined collections of objects, which may be of a mathematical nature, such as numbers or functions, or not.


Cantor defined ([49, page 288]) two sets A and B to be identical (equal), notation A = B, if and only if A and B have the same elements. When set-theory was later axiomatized, this definition also became known as the

Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))

Example: {3, 3, 7} = {7, 3} and {2, 3, 4} = {2, 3, 4}

The relation ‘is a subset of’, notation ⊆, indicates that one set is contained in the other:

Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)

Definition of proper subset: A ⊂ B := A ⊆ B ∧ A ≠ B
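As an informal illustration, the three definitions above can be transcribed for finite collections. The function names are hypothetical, and Python lists are used on purpose to show that repetition and order of the listed elements do not matter under extensionality.

```python
def is_subset(a, b):
    """A ⊆ B := (∀x :: x ∈ A → x ∈ B)"""
    return all(x in b for x in a)


def is_equal(a, b):
    """A = B := (∀x :: x ∈ A ↔ x ∈ B), checked over every x occurring in A or B."""
    return all((x in a) == (x in b) for x in list(a) + list(b))


def is_proper_subset(a, b):
    """A ⊂ B := A ⊆ B ∧ A ≠ B"""
    return is_subset(a, b) and not is_equal(a, b)
```

For instance, `is_equal([3, 3, 7], [7, 3])` holds, while `is_proper_subset([2, 3], [3, 2])` does not.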

We often want to create a new set from a given set by selecting elements that have certain properties. For example we take the set of powers of three or the set of all even numbers (to be exact: the set containing those elements of the set of natural numbers that have the property of being divisible by 2). This principle was used by Cantor, and we also call it the unrestricted or naive comprehension principle because it later (see sections 3.8 and 5.1) turned out to be untenable.

Comprehension principle: For all properties ϕ there is precisely one set, denoted by {x | ϕ(x)}, whose elements are exactly those objects which have the property ϕ.

We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). As a consequence (by taking ϕ(x) = false for all x), there is at least one set that has no elements: the empty set, denoted by ∅.

Theorem: (∃!x :: (∀y :: y ∉ x))

Proof: If we take ϕ to be false, the comprehension principle says that ‘there is precisely one set whose elements are exactly those objects which have the property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)). This is equivalent to saying there is no element y that can be a member of x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and call it the empty set.

Corollary: (∀a :: ∅ ⊆ a)

Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know that (∀y :: y ∉ ∅). This gives us (∀x :: false → x ∈ a), which is true.
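A program cannot range over ‘all objects’, so the following sketch assumes a finite universe `U` and therefore illustrates only a restricted form of comprehension, not the naive principle itself. The names `comprehension`, `U`, `evens` and `empty` are illustrative.

```python
def comprehension(universe, phi):
    """Return {x | x in universe and phi(x)} as an immutable set."""
    return frozenset(x for x in universe if phi(x))


U = range(20)  # assumed finite universe
evens = comprehension(U, lambda x: x % 2 == 0)

# Taking phi to be constantly false yields the empty set.
empty = comprehension(U, lambda x: False)

# The corollary: the empty set is a subset of every set (checked on two samples).
empty_is_subset_of_samples = all(empty <= s for s in (evens, frozenset(U)))
```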

Using the comprehension principle we can create new sets from given sets. So now we can introduce some operations on sets, by applying the comprehension principle. But before we do that, we first introduce some general (i.e. regardless of whether the operations are set-theoretic or not) properties of operations: idempotence, commutativity, associativity and distributivity. Although Cantor did not formulate these properties as such, they are used in the branch of calculus and are useful in the set theory that follows in this chapter.

Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and G are elements of that domain (for example sets), on which we have defined the equality relation ‘=’.

Definition of idempotence:
⊕ is idempotent := (∀E :: E ⊕ E = E)

Definition of commutativity:
⊕ is commutative := (∀E,F :: E ⊕ F = F ⊕ E)

Definition of associativity:
⊕ is associative := (∀E,F,G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))

Definition of distributivity:
⊕ is distributive² over ⊗ := (∀E,F,G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))

¹ These properties can also be generalized to operations of arbitrary arity, but this will not be necessary for our discussion.

² This form of distributivity is also called left-distributivity, as opposed to right-distributivity:
⊕ is right-distributive over ⊗ := (∀E,F,G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G))
In ordinary mathematics this distinction is often left out for commutative operations, and we for example simply say that × is distributive over + (when in fact it is both left- and right-distributive).


The symbol ∪ is employed to denote the union of two sets. Thus, the set A ∪ B is defined as the set that consists of all elements belonging to set A or to set B (or to both).

Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}

The intersection operation is denoted by the symbol ∩. A ∩ B is defined as the set composed of all elements that belong to both A and B.

Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}

Any two sets the intersection of which is the empty set are said to be disjoint. A collection of sets is called (pairwise) disjoint or mutually exclusive if any two distinct sets in it are disjoint.

Example: The operations union and intersection on sets are both idempotent, commutative and associative.
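The example can be checked by brute force on a small case. The sketch below is not a proof: it only tests all subsets of an assumed three-element universe, and all function names are hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_idempotent(op, dom):
    return all(op(e, e) == e for e in dom)


def is_commutative(op, dom):
    return all(op(e, f) == op(f, e) for e in dom for f in dom)


def is_associative(op, dom):
    return all(op(op(e, f), g) == op(e, op(f, g)) for e in dom for f in dom for g in dom)


def is_left_distributive(op, over, dom):
    """op is (left-)distributive over `over` on dom."""
    return all(op(e, over(f, g)) == over(op(e, f), op(e, g))
               for e in dom for f in dom for g in dom)


DOMAIN = subsets({0, 1, 2})  # assumed small test domain


def union(a, b):
    return a | b


def intersection(a, b):
    return a & b


def difference(a, b):
    return a - b
```

On this domain union and intersection pass all three checks and distribute over each other, whereas set difference is neither idempotent nor commutative.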

The difference of sets B and A, denoted B − A, contains those elements of B that are not in A.

Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}

If A ⊆ B we often call the difference B − A the relative complement of A in B. We then call B the universe, and if it is clear what the universe is we often denote the relative complement of A by A^c. From the definitions that we have introduced so far, we can deduce three properties that are known as the laws of reciprocity. The second and third law are also known as the laws of de Morgan, named after the English mathematician Augustus de Morgan:

First law of reciprocity: A ⊆ B ↔ A^c ⊇ B^c

Second law of reciprocity: (A ∪ B)^c = A^c ∩ B^c

Third law of reciprocity: (A ∩ B)^c = A^c ∪ B^c
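The three laws can be spot-checked exhaustively for a small universe. This is a sketch under the assumption of a four-element universe; `UNIVERSE`, `complement` and `reciprocity_laws_hold` are illustrative names.

```python
from itertools import combinations

UNIVERSE = frozenset({0, 1, 2, 3})  # assumed universe for the complement


def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def complement(a):
    """Relative complement of A in the universe."""
    return UNIVERSE - a


def reciprocity_laws_hold():
    """Check all three laws for every pair of subsets of the universe."""
    all_subsets = subsets(UNIVERSE)
    for a in all_subsets:
        for b in all_subsets:
            first = (a <= b) == (complement(a) >= complement(b))
            second = complement(a | b) == complement(a) & complement(b)
            third = complement(a & b) == complement(a) | complement(b)
            if not (first and second and third):
                return False
    return True
```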

We define the power set of V, denoted by P(V), as the set of all subsets of V. Note that if V = ∅, this operation creates a larger set from a given set V.

Definition of powerset: P(V) := {A | A ⊆ V}

Given a set V, we thus have that (∀y :: y ∈ P(V) ↔ y ⊆ V)
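For finite V the power set can be enumerated directly. A minimal sketch, with `power_set` as a hypothetical helper built on the standard `itertools.combinations`:

```python
from itertools import combinations


def power_set(v):
    """P(V) := {A | A ⊆ V} for a finite set V, as a set of frozensets."""
    items = list(v)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}
```

A set with n elements has 2^n subsets, so `power_set({1, 2, 3})` has 8 members, and `power_set(set())` has exactly one member, the empty set.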

We can extend the union of a pair of sets to any finite collection of sets; the union is then defined as the set of all objects which belong to at least one set in the collection A. We can do the same for the intersection.

Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}

Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}

We can divide a set of objects into a partition, that is a family of subsets that are mutually exclusive and jointly exhaustive. Assume P is a set of subsets of X.

Definition of partition: P is a partition of X :=
X = ⋃{A | A ∈ P} ∧ (∀A,B : A,B ∈ P : A = B ∨ A ∩ B = ∅)
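The generalized union and intersection and the partition test can be sketched for finite families as follows. The names are hypothetical; the intersection of an empty family is rejected here as a design choice, because the definition above would make it contain every object.

```python
def big_union(family):
    """⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}"""
    return frozenset(x for y in family for x in y)


def big_intersection(family):
    """⋂A := {x | (∀y :: y ∈ A → x ∈ y)}, for a nonempty family only."""
    family = [frozenset(y) for y in family]
    if not family:
        raise ValueError("intersection of an empty family is not representable here")
    return frozenset(x for x in family[0] if all(x in y for y in family))


def is_partition(p, x):
    """P is a partition of X: the members cover X and are pairwise equal or disjoint."""
    p = [frozenset(a) for a in p]
    covers = big_union(p) == frozenset(x)
    disjoint = all(a == b or not (a & b) for a in p for b in p)
    return covers and disjoint
```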

In this chapter I have made extensive use of [30] in section 2.1 and [17] in section 2.2.

Chapter 3

Mathematical constructs in set-theory

3.1 Some mathematical concepts

The mathematician is entirely free, within the limits of his imagination, to construct what world he pleases. What he is to imagine is a matter for his own caprice; he is not thereby discovering the fundamental principles of the universe nor becoming acquainted with the ideas of God. If he can find, in experience, sets of entities which obey the same logical scheme as his mathematical entities, then he has applied his mathematics to the external world; he has created a branch of science.

- J.W.N. Sullivan in Aspects of Science, 1925

Now that we have this apparatus of set-theory available, we will see that it is not just a separate branch of mathematics, but that we can define some basic mathematical constructs in set-theory. In this section we will consider pairs and the cartesian product, necessary before we can treat relations (in section 3.2) and functions (in section 3.3).

First we consider the mathematical concept of an ordered pair <a, b>. Compared to a ‘normal’ pair, where two pairs are considered equal if they have the same elements, we want an ordered pair to also have the property that the elements appear in the same order:

(∀c, d :: <a, b> = <c, d> ↔ a = c ∧ b = d)

We can now easily verify that the following definition (see [17, chapter 8]) in set-theory satisfies the desired property.

Definition of ordered pair¹: <a, b> := {{a}, {a, b}}
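The desired property can be tested mechanically on a few values. The sketch below encodes the definition with Python `frozenset` objects; `pair` and `pairs_behave` are hypothetical names, and the check is exhaustive only over the small value range passed in.

```python
from itertools import product


def pair(a, b):
    """Ordered pair <a, b> := {{a}, {a, b}} encoded with frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def pairs_behave(values):
    """Check <a, b> = <c, d> ↔ a = c ∧ b = d for all a, b, c, d in values."""
    return all((pair(a, b) == pair(c, d)) == (a == c and b == d)
               for a, b, c, d in product(values, repeat=4))
```

Note that `pair(1, 1)` collapses to `{{1}}`, which the definition permits, and that `pair(1, 2)` differs from `pair(2, 1)` although `{1, 2}` and `{2, 1}` are the same set.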

As the cartesian product A × B is by definition the set of all ordered pairs <a, b> with a ∈ A and b ∈ B, we can now use the same definition in set-theory:

Definition of cartesian product: A × B := {<a, b> | a ∈ A ∧ b ∈ B}

Let V = {V_i | i ∈ I} be a set of sets. We now define the cartesian product of a set of sets, denoted by ×V or ×_{i∈I} V_i. The definition uses the concept of a function, which will be introduced in section 3.3.

Definition of cartesian product of a set of sets:
×V := {f : I → ⋃_{i∈I} V_i | (∀i : i ∈ I : f(i) ∈ V_i)}
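Both products can be enumerated for finite sets. In this sketch Python tuples stand in for ordered pairs and dictionaries stand in for the functions f : I → ⋃ V_i; these representations and the names `cartesian` and `family_product` are assumptions made for illustration.

```python
from itertools import product


def cartesian(a, b):
    """A × B := {<a, b> | a ∈ A ∧ b ∈ B}, with tuples as ordered pairs."""
    return {(x, y) for x in a for y in b}


def family_product(family):
    """Product of an indexed family.

    `family` maps each index i in I to the set V_i; the result lists every
    function f (as a dict) with f(i) ∈ V_i for all i ∈ I.
    """
    indices = list(family)
    return [dict(zip(indices, choice))
            for choice in product(*(family[i] for i in indices))]
```

For example, a family with index set {'i', 'j', 'k'} and member sizes 2, 1 and 2 yields 2 · 1 · 2 = 4 functions.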

¹ Representation originally by Kuratowski, see [49, page 294].

3.2 Relations

Mathematicians do not study objects, but relations between objects. Thus, they are free to replace some objects by others as long as the relations remain unchanged. Content to them is irrelevant: they are interested in form only.

- J.H. Poincaré

In mathematics, a relation maps each element from an input set (called domain) to either true or false. We formalize this notion in set-theory.

Definition of binary relation:
R is a binary relation between X and Y := R ⊆ X × Y

Note: We can easily generalize this definition for n-ary relations: R is an n-ary relation on X_1, ..., X_n := R ⊆ X_1 × X_2 × ... × X_n, for n ∈ ℕ. We call n the arity of the relation.

Example: We have already seen the definitions of the subset and proper subset relations in section 2.2. There we defined the set R ⊆ X × Y implicitly by using a statement; only those pairs <x, y> are in R for which the statement holds (here we are in fact using the comprehension principle of section 2.2). We will continue to use statements to define relations.

We define the following shorthand notation (sometimes also written in infix notation as xRy): R(x, y) := <x, y> ∈ R.

The mathematical expression ‘x < y’ is now equivalent to the set theoreticexpression ‘< x, y >∈ R’, with R representing the ‘less than’ relation.

Example: The relation < on the naturals (i.e. between ℕ and ℕ) can be defined as:

{ <0, 1>, <1, 2>, <2, 3>, ...
  <0, 2>, <1, 3>, <2, 4>, ...
  <0, 3>, <1, 4>, <2, 5>, ...
  ... }


On a relation R we can define the concepts of domain and range.

Definition of domain, range:
dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}

If we define the identity relation of X, we want it to have the usual property that id_X(x) = x for all x ∈ X (see for example [3, section 1.9.5.b, page 30]). In set-theory, we denote the identity relation on V by I_V.

Definition of identity relation: I_V := {<x, y> ∈ V × V | x = y}
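A relation as a set of pairs, together with its domain, range and the identity relation, might be sketched like this. The relation `less_than` is restricted to an assumed finite fragment {0, ..., 4} of the naturals, and tuples again stand in for ordered pairs.

```python
X = range(5)  # assumed finite fragment of the naturals

# The relation < on X, as an explicit set of ordered pairs.
less_than = {(x, y) for x in X for y in X if x < y}


def dom(r):
    """dom(R): every x that is related to some y."""
    return {x for (x, _) in r}


def ran(r):
    """ran(R): every y that some x is related to."""
    return {y for (_, y) in r}


def identity(v):
    """I_V := {<x, y> ∈ V × V | x = y}"""
    return {(x, x) for x in v}
```

On this fragment the domain of `less_than` omits 4 (nothing in X is larger) and the range omits 0 (nothing in X is smaller).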

Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did for operations in section 2.2, we can also define some general properties of relations. Note that we have already defined an equality relation ‘=’ on X in section 2.2. We can explicitly state on which domain a property holds (e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).

Definition of reflexivity:
R is reflexive := (∀x : x ∈ X : R(x, x))

Definition of symmetry:
R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))

Definition of anti-symmetry:
R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)

Definition of transitivity:
R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))

Definition of connectivity:
R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))

Definition of equivalence:
R is an equivalence relation := R is reflexive, symmetric and transitive

Note: Asymmetric means not symmetric, and is not the same as anti-symmetric.

Example: The subset relation is reflexive, anti-symmetric (note that the proof of anti-symmetry uses the axiom of extensionality of section 2.2) and transitive, but not connective.
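The claims in this example can be checked on a small instance. The sketch below tests the subset relation on the power set of an assumed two-element set; the predicate names are hypothetical and follow the definitions of reflexivity, symmetry, anti-symmetry, transitivity and connectivity given above.

```python
from itertools import combinations


def is_reflexive(r, X):
    return all(r(x, x) for x in X)


def is_symmetric(r, X):
    return all((not r(x, y)) or r(y, x) for x in X for y in X)


def is_anti_symmetric(r, X):
    return all((not (r(x, y) and r(y, x))) or x == y for x in X for y in X)


def is_transitive(r, X):
    return all((not (r(x, y) and r(y, z))) or r(x, z)
               for x in X for y in X for z in X)


def is_connective(r, X):
    return all(r(x, y) or x == y or r(y, x) for x in X for y in X)


# Power set of {0, 1}: ∅, {0}, {1}, {0, 1}
POWER = [frozenset(c) for n in range(3) for c in combinations((0, 1), n)]


def subset(a, b):
    return a <= b
```

Connectivity fails because {0} and {1} are distinct and neither is a subset of the other.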

If R is an equivalence relation on a set X, we denote the equivalence class of x with respect to R as [x]_R.

Definition of equivalence class: [x]_R := {y ∈ X | R(x, y)}

If R is an equivalence relation on X, the quotient set X/R of X modulo R is the set of equivalence classes [x]_R for all x ∈ X.

Definition of quotient set: X/R := {[x]_R | x ∈ X}
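As a small illustration, take congruence modulo 3 on {0, ..., 8}; this relation and the function names below are chosen only as an example and do not come from the text.

```python
def equivalence_class(x, X, r):
    """[x]_R := {y ∈ X | R(x, y)}"""
    return frozenset(y for y in X if r(x, y))


def quotient_set(X, r):
    """X/R := {[x]_R | x ∈ X}"""
    return {equivalence_class(x, X, r) for x in X}


def same_remainder(x, y):
    """Example equivalence relation: x and y leave the same remainder modulo 3."""
    return (x - y) % 3 == 0


classes = quotient_set(range(9), same_remainder)
```

The nine elements fall into three classes, and every element lies in exactly one of them, so the quotient set is a partition of the underlying set.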

We now continue to build on the concept of relations, by categorizing them based on the properties they have. An important property of relations is the ability to compare and order elements. Suppose X and Y are sets, and R is a relation on X.

Definition of (weak) partial ordering: R is a (weak) partial ordering := R is reflexive, anti-symmetric and transitive (on X)

Definition of quasi ordering: R is a quasi ordering := R is irreflexive and transitive

Definition of strict partial ordering: R is a strict partial ordering := R is irreflexive, anti-symmetric and transitive

Definition of (total or linear) ordering: R is a (total or linear) ordering := R is irreflexive, anti-symmetric, transitive and connective

Definition of well-ordering: R is a well-ordering := R is an ordering on X and each nonempty subset of X has a least element


Definition of well-foundedness: A set S is well-founded by a relation R := S is partially ordered by R and contains no infinite descending chains

A set S contains a set C that is an infinite descending chain iff C ⊂ S ∧ C ≠ ∅ ∧ C has no minimal element.

Theorem: (without proof) Any subset of a well-founded set is also well-founded.

Now that we can speak of a set of which the elements are ordered by a relation R, we define the well-known concepts of (immediate) successor and predecessor.

Definition of (immediate) predecessor: An element x_1 ∈ X is a predecessor of an element x_2 ∈ X (with respect to an ordering R on X) := R(x_1, x_2) ∧ ¬R(x_2, x_1). x_1 is an immediate predecessor of x_2 if in addition
(¬∃x_3 : x_3 ∈ X ∧ x_3 ≠ x_1 ∧ x_3 ≠ x_2 : R(x_1, x_3) ∧ R(x_3, x_2))

Definition of (immediate) successor: An element x_2 ∈ X is a successor of an element x_1 ∈ X (with respect to an ordering R on X) := R(x_1, x_2) ∧ ¬R(x_2, x_1). x_2 is an immediate successor of x_1 if in addition
(¬∃x_3 : x_3 ∈ X ∧ x_3 ≠ x_1 ∧ x_3 ≠ x_2 : R(x_1, x_3) ∧ R(x_3, x_2))

Note that with these definitions it can be easily proved that if a relation R on X is an ordering, then each element has at most one immediate predecessor and at most one immediate successor. If X is finite, each element except the smallest has exactly one immediate predecessor and each element except the largest has exactly one immediate successor; in an infinite ordered set such as the rationals under <, an immediate predecessor or successor need not exist. The notions of smallest and largest elements will be introduced hereafter. In the literature the immediate successor or predecessor is sometimes called just successor or predecessor. Sometimes we also see that the term ‘direct’ is used instead of ‘immediate’, or we simply speak of the ‘next’ or ‘previous’ value.

When R is a partial ordering we often denote it by the symbol ⪯, and when it is a quasi ordering by ≺. Now we can distinguish elements based on their order. Let X be a set, partially ordered by ⪯, and let Y be a subset of X.

Definition of minimal element:
x is a minimal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : y ⪯ x)

Definition of maximal element:
x is a maximal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : x ⪯ y)

Definition of least element:
x is a least (also called smallest or first) element of X :=
x ∈ X ∧ (∀y : y ∈ X : x ⪯ y)

Definition of greatest element:
x is a greatest (also called maximum, largest or last) element of X :=
x ∈ X ∧ (∀y : y ∈ X : y ⪯ x)

Definition of lowerbound:
x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⪯ y)

Definition of upperbound:
x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⪯ x)

Definition of infimum:
x is an infimum for Y in X := x is the greatest lowerbound for Y in X

Definition of supremum:
x is a supremum for Y in X := x is the smallest upperbound for Y in X

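Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then R is a (weak) partial order on X. Since R is reflexive, it is not strict and it is not a quasi order in the sense defined above; it is not a (total) order either, because 4 and 6 are incomparable. 4 and 6 are minimal elements of X, but X has no least element. If we regard X as a subset of ℕ ordered by the same relation, 1 is a lowerbound for X, and 2 is the infimum of X.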
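The divisibility example can be recomputed directly. In the sketch below the ambient set for the bounds is assumed to be {1, ..., 36} rather than all of ℕ, which is enough because a lowerbound must divide 4; the function names are hypothetical.

```python
X = {4, 6, 12, 24, 36}
AMBIENT = range(1, 37)  # assumed finite stand-in for the naturals


def divides(x, y):
    """R(x, y) := x is a divisor of y."""
    return y % x == 0


def minimal_elements(s, r):
    return {x for x in s if not any(y != x and r(y, x) for y in s)}


def least_elements(s, r):
    """Empty if there is no least element, otherwise a one-element set."""
    return {x for x in s if all(r(x, y) for y in s)}


def lower_bounds(subset, ambient, r):
    return {x for x in ambient if all(r(x, y) for y in subset)}


def infimum(subset, ambient, r):
    """Greatest lowerbound with respect to r, as an empty or one-element set."""
    bounds = lower_bounds(subset, ambient, r)
    return {x for x in bounds if all(r(b, x) for b in bounds)}
```

Here ‘greatest’ is meant with respect to divisibility: 2 is the infimum because every other lowerbound (only 1) divides it.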


The so-called least number principle says that any non-empty subset of the natural numbers has a least element. This principle can be shown (a proof can be found in [59, page 7]) to be equivalent to the principles of weak and strong induction, which will be introduced in section 3.4.

Example: The relation < on the naturals is an example of a total ordering on N. From the so-called least number principle we can conclude that N is also well-ordered by <. We prove the latter.

Proof: We know that < is an ordering on N. We show by induction on the number of elements of A, notation | A |, that (∀A : A ⊆ N ∧ A ≠ ∅ : A has a least element).
Suppose N = {0, . . . , n}, n ∈ N. Let A ⊆ N. For | N | = 0 it is trivial that A is well-ordered. For | N | = n + 1, if A ∩ {0, . . . , n} = ∅, n + 1 is a least element of A. If A ∩ {0, . . . , n} ≠ ∅, we can apply the induction principle to conclude that A ∩ {0, . . . , n} has a least element. The least element of A ∩ {0, . . . , n} is also a least element of A ∩ {0, . . . , n + 1}.


3.3 Functions

In mathematics, a function maps each element of an input set to exactly one element of an output set; in other words, it is a special kind of relation that indicates for each pair < x, y > of elements of the input and output set whether it belongs to the function or not. More precisely, f is a function or mapping from X to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y, notation f(x) = y. We can define this notion in set-theory by using a relation between X and Y such that for each x ∈ X there is a unique y ∈ Y such that < x, y > ∈ f.

Definition of function: f is a function from a set X to a set Y, notation f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : < x, y > ∈ f))

The definitions of domain and range as given in the subsection about relations can now also be used for functions. We now introduce a notation for the set of all functions f : X → Y.

Definition of Y^X: Y^X := {f ∈ P(X × Y) | f is a function from X to Y}

As we did before for relations and operations, we now define some general properties for functions.

Definition of injective: f : X → Y is injective or an injection := (∀x1, x2 : x1, x2 ∈ X : x1 ≠ x2 → f(x1) ≠ f(x2))

Definition of surjective: f : X → Y is surjective or a surjection := (∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))

Definition of bijective: f : X → Y is bijective or a bijection := f is surjective and f is injective

If f is bijective, f is also called a (one-to-one) correspondence between X and Y.

Example: We have the following property: f : X → Y is surjective ↔ Ran(f) = Y.
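For finite sets the definitions of function, injection and surjection can be tested directly. The sketch below is our own illustration in Python, representing a relation as a set of pairs; the names `is_function`, `ran`, `is_injective` and `is_surjective` are assumptions of this sketch, not notation from the text.

```python
def is_function(f, X, Y):
    """f is a subset of X x Y and every x in X has exactly one image."""
    pairs_ok = all(x in X and y in Y for x, y in f)
    unique_image = all(sum(1 for a, _ in f if a == x) == 1 for x in X)
    return pairs_ok and unique_image

def ran(f):
    """Range of f: all second components of its pairs."""
    return {y for _, y in f}

def is_injective(f):
    # assumes f is a function: different arguments need different images
    images = [y for _, y in f]
    return len(images) == len(set(images))

def is_surjective(f, Y):
    # the property from the example: f is surjective iff Ran(f) = Y
    return ran(f) == set(Y)

X = {0, 1, 2}
Y = {"a", "b", "c"}
f = {(0, "a"), (1, "b"), (2, "c")}   # a bijection
g = {(0, "a"), (1, "a"), (2, "b")}   # a function, neither injective nor surjective
h = {(0, "a"), (0, "b"), (1, "c")}   # a relation, but not a function
```

Here `f` passes all three checks, `g` is a function that fails both the injectivity and the surjectivity check, and `h` is rejected by `is_function` because 0 has two images and 2 has none.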


Example: f : N → [−2π, 2π], with f(x) = sin(x) is a function and a relation. g : [−2π, 2π] → N, with g(x) = y iff x = sin(y) is a relation, not a function.

We will now consider two special kinds of functions: the identity function and the sequence.

Definition of sequence: s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N)

Definition of identity function: The identity function id_X := id_X : X → X and (∀x : x ∈ X : id_X(x) = x)

We now introduce some operations on functions in set-theory. We can easily check that these definitions correspond to mathematical operations.

Definition of composition: The composition g ∘ f of two functions f : A → B and g : B → C := the function g ∘ f : A → C with (g ∘ f)(x) = g(f(x)), for all x ∈ A

Definition of inverse function: The inverse of a bijection f : X → Y := the function f^−1 : Y → X with (∀y : y ∈ Y : f^−1(y) = x ↔ y = f(x))

Definition of restricted function: The restriction of a function f : X → Y to X0, with X0 ⊆ X := the function f|X0 : X0 → Y with (∀x : x ∈ X0 : f|X0(x) = f(x))
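The three operations can be illustrated on finite functions. In this Python sketch, which is ours and only an illustration, a function is a dictionary from arguments to values, so the codomain is implicit; `inverse` therefore treats the range of f as Y, which is an assumption of the sketch.

```python
def compose(g, f):
    """(g o f)(x) = g(f(x)) for all x in the domain of f."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Inverse of a bijection f onto its range; fails if f is not injective."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return inv

def restrict(f, X0):
    """Restriction of f to a subset X0 of its domain."""
    return {x: f[x] for x in X0}

f = {1: "a", 2: "b", 3: "c"}
g = {"a": 10, "b": 20, "c": 30}

g_after_f = compose(g, f)
f_inv = inverse(f)
round_trip = compose(f_inv, f)      # should be the identity on {1, 2, 3}
f_on_12 = restrict(f, {1, 2})
```

Composing `f` with its inverse gives back the identity function on the domain of `f`, which matches the defining property f^−1(y) = x ↔ y = f(x).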

Just as in algebra, we can now combine a set and relations on that set into a structure.

Definition of (relational) structure: 〈X,R0, . . . , Rp〉 is a (relational) structure := X is a set and R0, . . . , Rp are relations on X

The concept of a structure enables us to abstract from the exact set and relations, and reason about sets of structures instead. There also is a useful definition for equivalence of structures, called isomorphism.


Let R = 〈X,R0, . . . , Rp〉 and S = 〈Y, S0, . . . , Sp〉 be two structures, such that (∀i : 0 ≤ i ≤ p : the arity of Ri and Si is n_i + 1).

Definition of isomorphism: f is an isomorphism between R and S := f is a bijection from X to Y and (∀i : 0 ≤ i ≤ p : (∀x0, . . . , x_{n_i} : x0, . . . , x_{n_i} ∈ X : Ri(x0, . . . , x_{n_i}) ↔ Si(f(x0), . . . , f(x_{n_i}))))

With the notion of isomorphism, we can now abstract over structures. When two structures are similar (the sets are of the same size and the relationships between the elements in one structure are retained between images of those elements in the other structure), we call them isomorphic.

Definition of isomorphic: Two structures R and S are isomorphic, notation R ≅ S := there exists an isomorphism from R to S

Definition of automorphism: f is an automorphism of R := f is an isomorphism from R to R

Example: An isomorphism from structure 〈N, <〉 to 〈N_even, <〉 is given by f : N → N_even, with f(n) = 2n. f is not an isomorphism from 〈N, ⊕〉 to 〈N, <〉, with a ⊕ b := b divides a.

Example: The function g : R+ → R with g(x) = log(x) is an isomorphism between 〈R+, ∗〉 and 〈R, +〉, because g is a bijection and for all r1, r2 ∈ R+, log(r1 ∗ r2) = log(r1) + log(r2).

Example: An automorphism of 〈A,R0, . . . , Rp〉 is the identity function id_A : A → A, so id_A = {< a, a > | a ∈ A}. Also, the function f(x) = 2x^3 is an automorphism of 〈R, <〉.
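On finite structures the definition of isomorphism can be tested by brute force. The following Python sketch is our own illustration: relations are given as (predicate, arity) pairs, `is_isomorphism` is a hypothetical helper name, and the structures are finite truncations of the ones in the examples, so a positive answer says nothing about the infinite structures themselves.

```python
from itertools import product
from operator import lt

def is_isomorphism(f, X, Y, rels_X, rels_Y):
    """Check that dict f is a bijection X -> Y preserving every relation."""
    X, Y = set(X), set(Y)
    # f must be total on X, onto Y, and injective
    if set(f) != X or set(f.values()) != Y or len(Y) != len(X):
        return False
    for (r, arity), (s, arity_s) in zip(rels_X, rels_Y):
        if arity != arity_s:
            return False
        for args in product(X, repeat=arity):
            if bool(r(*args)) != bool(s(*(f[a] for a in args))):
                return False
    return True

def divided_by(a, b):
    """a (+) b := b divides a, as in the example."""
    return a % b == 0

X = range(1, 6)                       # assumption: finite slice of N
double = {n: 2 * n for n in X}
Y = set(double.values())
identity = {n: n for n in X}

order_preserved = is_isomorphism(double, X, Y, [(lt, 2)], [(lt, 2)])
mixed = is_isomorphism(double, X, Y, [(divided_by, 2)], [(lt, 2)])
id_is_automorphism = is_isomorphism(identity, X, X, [(lt, 2)], [(lt, 2)])
```

Doubling preserves < on the slice, it does not translate divisibility into <, and the identity passes the check as an automorphism.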


3.4 Induction Methods

There is a tradition of opposition between adherents of induction and deduction. In my view it would be just as sensible for the two ends of a worm to quarrel.

- A. Whitehead, quoted in [76]

3.4.1 Induction

Induction is a method of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal. It should not be confused with the mathematical principle of induction (treated in section 3.4.3). In ordinary induction we examine a certain number of cases and then generalize. Reasoning by analogy, where a conclusion is drawn from an analogous situation, is also a primitive form of induction (see [23, page 6]).

Example of inductive reasoning (footnote 2):

Coffee shop burger no. 1 was greasy.
Coffee shop burger no. 2 was greasy.
...
Coffee shop burger no. 100 was greasy.
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger will be greasy).

So in induction the conclusion contains information that was not contained in the premisses. This is the source of uncertainty in inductions: inductions are strengthened as confirming instances pile up, but they can never bring certainty (unless every possible case is actually examined, in which case they become deductions). As said in [49, page 366], the broad difference between deductive and inductive reasoning is that in deduction the conclusion asserts less than the premisses, whereas in induction it asserts more. In chapter 14, section 3 of [49] there is a more detailed treatment of inductive reasoning, including a distinction between determinative and conceptual induction. In both these kinds of induction, the conclusion goes beyond the premisses (or the evidence).

2Example from: Peter Suber, Philosophy department, Earlham College.


3.4.2 Deduction

Mathematics, in its widest significance, is the development of all types of formal, necessary, deductive reasoning.

- A. Whitehead, quoted in [100]

In contrast to induction, deduction is a method of reasoning that is based on a rigorous proof: a derivation (using fixed rules called a system of logic) of one statement (the conclusion) from one or more statements (the premisses), i.e. a chain of statements, each of which is either a premise or a consequence of a statement occurring earlier in the proof. In deductive reasoning, we are not directly concerned with the truth of the conclusion but rather with whether the conclusion does or does not follow from the premisses. If the conclusion follows from the premisses, we say that our reasoning is valid; if it does not, we say that our reasoning is invalid.

The Greeks regarded deductive reasoning, not empirical procedures, as the method to establish mathematical facts. This usage is a generalization of what the Greek philosopher Aristotle called the syllogism (see [49, chapter 1, sections 5 and 6]), but a syllogism is now recognized as merely a special case of a deduction. Also, the traditional view that deduction proceeds from the general to the specific has been abandoned as incorrect by most logicians. Some experts regard all valid inferences as deductive in form and for this and other reasons reject the supposed contrast between deduction and induction. The German mathematician Hilbert greatly contributed to deductive reasoning, as we will see when we introduce his proof theory (also known as the axiomatic method) in chapter 6. Logic, in a mathematical context, can be seen as the theory of the formal structure of deductive reasoning. The logic of Hilbert’s metamathematics (see section 6.1) and Russell’s Principia Mathematica (see section 7.1) are a form of reasoning with deductive certainty, although others have proposed different formalizations of deductive logic (see [49, page 121]). Originally based on Aristotle’s logic, the deductive argument has become more subtle and complex and is now based on modern symbolic logic.


3.4.3 The principle of induction

Informal

The principle of induction, also known as mathematical induction, is an important process for proving theorems. It was even used by Peano to define the concept of natural numbers (see section 4.1, axiom 3). ‘Mathematical induction’ is unfortunately named, for it is unambiguously a form of deduction. The name was probably inspired by the fact that, just like induction, it generalizes to a whole set from a smaller sample. But, as we will see, mathematical induction concludes with deductive certainty.

The informal structure of the proof of a theorem by mathematical induction is fairly simple:

1) Basis. Prove that the theorem holds for a specific case (which often is minimal for a given ordering of the elements). This case is also called the base case.

2) Induction step. Prove a rule that says that if the theorem holds for an arbitrary element, it is true for the next case. This often is a rule of heredity that tells us that the theorem is true for the immediate successor case of an arbitrary element if it is true for the arbitrary element itself. The claim that the theorem is true for an arbitrary element is called the induction hypothesis.

3) Conclusion. Together, 1 and 2 imply that the theorem holds for all cases starting with the base case. If you didn’t use the minimal case in step 1, then you have proven only that the theorem holds for that case and its successors, not for all possible cases.

The induction step can take two forms which correspond to two forms of mathematical induction. Again we assume there is an ordering of the elements with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it holds for the element n + 1
Strong: prove that if the theorem holds for all elements up to some arbitrary element n, then it holds for the element n + 1


We will now formally state the principle of induction. This is important, since many mistakes are being made in applying the principle. It does not go without saying that if we are to use mathematical induction to prove that some theorem applies to ‘all possible cases’, then those cases must somehow be enumerable and in some way linked to the integers. And we have to be able to speak about the minimal case, the nth case, the successor of a given case, etc.

Formal

Suppose that we want to prove that a property ϕ(s) holds for all s ∈ S. The induction principle assumes that S is a well-founded set and every element except for the smallest has an immediate predecessor. This condition is also expressed by saying that S is inductive. The structure of an inductive set in fact resembles that of the naturals, i.e. if we have the axioms (see Peano axioms in section 4.1) 0 is in N and if x is in N then x + 1 is in N, the set N is inductive. In case the set S is the naturals, we also refer to the principle as natural induction.
The principle presupposes the following two conditions:

A) S is a set, well-founded by relation R (such that ‘+’ denotes the immediate successor of an element with respect to the relation R) and with smallest element e

B) Every element except e has a (unique) immediate predecessor and ϕ is a property of elements of S

If A and B hold, we can use the induction principle.

Definition of the (weak) (mathematical) induction principle: if

C) ϕ(e) (i.e. e has property ϕ)

D) (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the (unique) immediate successor of s also has property ϕ)

then the property ϕ holds for every element in S


Step C is also called the base of a proof by induction, step D is also called the induction step, and ϕ(s) is called the induction hypothesis.

Proof: Suppose S is a well-founded set and every element except the smallest, denoted e, has an immediate predecessor, and suppose that a property ϕ is true for e, as well as for the immediate successor s+ ∈ S if it is true for s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem of page 26 we also know that if S is well-founded, any subset of S is also well-founded, thus the non-empty set N contains a smallest element n. If n = e, we have a contradiction. If n > e, n has an immediate predecessor, denoted n−. Since n is the smallest element for which ϕ doesn’t hold, ϕ must hold for n−. But then by D, ϕ must also hold for the immediate successor of n−, that is n: contradiction. Thus ϕ must be true for all s ∈ S.

As we mentioned before, this principle can be generalized in several ways. One way is to prove in step C that ϕ holds for a (possibly non-minimal) case b ∈ S. In step D we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)). The conclusion then is that the property ϕ holds for all elements in S that are ordered larger or equal to b.

We now show (with proof by contradiction) why the additional property B, that every element except the smallest must have an immediate predecessor, is necessary for the induction principle.
Consider the natural numbers with the ordering ⪯ defined as follows:

• if n and m are both even, then n ⪯ m if n < m

• if n and m are both odd, then n ⪯ m if n < m

• if n is even and m is odd, we always define n ⪯ m

We can check that N is well-founded by ⪯, but not every element (for example 1) has an immediate predecessor. We take for ϕ the property of being even. The smallest element in the ordering is 0, which is even. Also, if s has property ϕ then so does the successor of s. That is because in our ordering, the successor of an even number is always the next even number, never an odd number, and if s has property ϕ, then s must be even.

Therefore (with only conditions A, C and D holding) every natural number is even: contradiction!
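The counterexample can be explored numerically. The Python sketch below is ours and checks the claims only on a finite range (the bound `LIMIT` is an arbitrary assumption), so it illustrates the argument rather than proving it: the base case and the induction step hold for the property "is even", every even number m below 1 has m + 2 strictly between itself and 1, and yet the conclusion fails.

```python
def precedes(n, m):
    """Strict version of the ordering: all evens first, then all odds."""
    return (n % 2, n) < (m % 2, m)

def successor(s):
    """Immediate successor in this ordering: the next number of the same parity."""
    return s + 2

def is_even(n):
    return n % 2 == 0

LIMIT = 1000  # assumption: size of the finite range that is checked

# Condition C: the smallest element 0 has the property.
base_ok = is_even(0)

# Condition D: if s is even, so is its immediate successor.
step_ok = all((not is_even(s)) or is_even(successor(s)) for s in range(LIMIT))

# Condition B fails at 1: any candidate predecessor m (necessarily even)
# has m + 2 strictly between m and 1, so no predecessor is immediate.
no_immediate_pred_of_1 = all(
    precedes(m, m + 2) and precedes(m + 2, 1) for m in range(0, LIMIT, 2)
)

# The would-be conclusion "every natural number is even" is false.
conclusion_holds = all(is_even(n) for n in range(LIMIT))
```

The induction step never reaches an odd number because the chain 0, 2, 4, ... of immediate successors stays inside the evens, which is exactly where condition B is needed.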

There is however a weaker principle, called transfinite induction, which, suitably stated, does apply to every well-ordered set. But first we regard a stronger principle, that is based on the same assumptions (A and B) as the weak induction principle.

Principle of strong (mathematical) induction: The same as for (weak) induction, but with C and D replaced by

E) (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S we have ϕ(x) if all R-predecessors y of x have property ϕ)

Sometimes this is also informally stated using the infamous three dots as (∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ . . . ∧ ϕ(s)) → ϕ(s+)).

Proof: Suppose 〈X,R〉 is a structure such that A, B and E hold. Again we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus {x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e′ (since 〈X,R〉 is well-founded). We now have ¬ϕ(e′) ∧ (∀z : z ∈ X : R(z, e′) → ϕ(z)). According to E (substitute z for y, X for S, and take e′ for x) we then have ϕ(e′): contradiction.

Note that the base case is not really left out, since it is implicitly present in the quantification (take e for x). This form of induction, when applied to ordinals (ordinals form a well-ordered and hence well-founded set and are introduced in section 3.8.2) is called transfinite induction.

Principle of transfinite induction (footnote 3): The same as for strong induction, but instead of requiring A and B as assumptions, it can be applied to any set S that is well-ordered by a relation R, and with smallest element e.

3 Sometimes this principle is called the Principle of Complete Induction, for example in [4], but this is less common.


Examples of such sets are the ordinals or cardinals, or even the class of all ordinals. A proof by transfinite induction typically needs to distinguish three cases:

1. s is a minimal element

2. s has an immediate predecessor (i.e. the set of elements which are smaller than s has a largest element). In this case we can apply normal induction.

3. s has no immediate predecessor (i.e. s is a so-called limit-ordinal, see also section 3.8.2). The case for limit ordinals is typically approached by noting that a limit ordinal b is (by definition) the union of all ordinals a < b and using this fact to prove ϕ(b) assuming that ϕ(a) holds true for all a < b.

Proof: The proof of the principle of transfinite induction is similar to the proof of the strong induction principle.

Clearly, all three given principles are equivalent, since we proved them to be true. These proofs however are based on an underlying set of axioms (the so-called ZF axioms and the Peano axioms, that will be introduced in section 5.3 and chapter 4 respectively). Without these conditions (to be exact, without Peano’s induction axiom), we cannot directly prove the principles to be true from the ZF axioms alone (footnote 4). In that case we can prove the equivalence of the principles by showing that they imply each other. As an example, we now prove that (mathematical) induction is a special case of transfinite induction, for the set of natural numbers. To prove this it suffices to show that (C and D) ↔ E.

4 With only the fundamental axioms of Zermelo-Fraenkel set theory, it is not possible to prove mathematical induction. An extra axiom is needed, the infamous Axiom of Choice, or one of its equivalent forms. The four statements known as ‘Axiom of Choice’, ‘Zorn’s Lemma’, ‘Well-Ordering principle’ (also known as well-ordering theorem, see section 3.8.2) and ‘Mathematical Induction Principle’ are all equivalent, meaning that if you assume one of them to be true, the others follow as consequences, but none of them can be proven from the other fundamental axioms in ZF set theory alone. There are also other equivalent statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise to prove the equivalence of these statements.


Normal induction (IND):

(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k)→ ϕ(k + 1))→ (∀n : n ∈ N : ϕ(n)))

Transfinite induction (TFIND):

(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p))→ ψ(q))→ (∀m : m ∈ N : ψ(m)))

We can prove the equivalence of IND and TFIND in two ways: in a constructive way or with a proof by contradiction. We give both proofs.

Proof by Contradiction: (from: [17])

It suffices to prove that IND’ ≡ TFIND’, with

IND’ ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k)→ ϕ(k + 1)))

TFIND’ ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p))→ ψ(q)))

Proof of TFIND’ → IND’: Assume ϕ is a property. We assume TFIND’, and instantiate ψ with the property ϕ. We now want to prove IND’. If we take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true. Thus we have ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Assume k ∈ N, ϕ(k) ∧ ¬ϕ(k + 1). That means the condition of TFIND’ (∀p : p ∈ N : p < q → ϕ(p)), with q = k + 1, must not be true: ¬(∀p : p ∈ N : p < k + 1 → ϕ(p)), i.e. (∃p : p ∈ N : p < k + 1 ∧ ¬ϕ(p)). Let s ∈ N be the smallest number such that s < k + 1 ∧ ¬ϕ(s), that is (∀r : r ∈ N : r < s → ϕ(r)). But then we would have ϕ(s) according to TFIND’ (namely if we take s for q and r for p), contradiction. Now we have proved that (∀ϕ :: (∀k : k ∈ N : ϕ(k) → ϕ(k + 1))), and since we have already proven (∀ϕ :: ϕ(0)), we have IND’.

Proof of IND’ → TFIND’: Assume IND’, instantiate ϕ with ψ. For all properties ψ we have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)). First we prove this for q = 0. If we take q = 0, we have (∀p : p ∈ N : p < 0 → ψ(p)) → ψ(0), i.e. ψ(0). This is true by the assumption of IND’. Now we prove this for q > 0. Suppose we have (∀p : p ∈ N : p < q → ψ(p)). By IND’ we also know that (∀k : k ∈ N : ψ(k) → ψ(k + 1)), and thus ψ(q) also holds for all q > 0. Hereby we have proved TFIND’.


Constructive Proof:

Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). For this, TFIND requires us to show: (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let k ∈ N. We now have to show that (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k). If k = 0, (∀l : l ∈ N : l < k → ϕ(l)) is trivially true since the range of l is empty, and ϕ(k) holds for k = 0 by the assumption ϕ(0). Assume k > 0, and (∀l : l ∈ N : l < k → ϕ(l)). This means ϕ(k − 1) holds (since k − 1 ∈ N). But we have assumed that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Thus ϕ(k) holds also for k > 0.

Proof of IND → TFIND: Assume ϕ is a property. Also assume that (i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)). Let s(k) := (∀l : l ∈ N : l < k → ϕ(l)), for all k ∈ N. We prove (∀n : n ∈ N : ϕ(n)) by first proving that (∀n : n ∈ N : s(n)) by using IND, and subsequently that (∀n : n ∈ N : s(n) → ϕ(n)). Clearly, s(0) holds trivially since the range of l is empty in that case. Suppose s(k) holds. Since s(k + 1) ≡ s(k) ∧ ϕ(k), we can conclude s(k + 1) because ϕ(k) follows from (i) and the definition of s(k). Now we have s(0) ∧ (∀k : k ∈ N : s(k) → s(k + 1)), and thus (by using IND) that (∀n : n ∈ N : s(n)). And, by the definition of s, (i) gives us that (∀n : n ∈ N : ϕ(n)).

Structural Induction

In many cases we do not want to prove properties about the integers or similar well-ordered sets. In such cases straight induction is not always useful. However, forms of induction can also be appropriate when trying to prove properties about structures defined recursively. This generalized induction principle is known as structural induction. It is useful when objects are built up from more primitive objects: if we can show the primitive objects have the desired property, and that the act of building preserves that property, then we have shown that all objects must have the property. The inductive hypothesis (i.e., the assumption) is to assume that something is true for ‘simpler’ forms of an object and then prove that it holds for ‘more complex’ forms. ‘Complexity’ can be defined in several ways: the most common way is to say that one object is more complex than another if it includes that other object as a subpart, but this need not always be the case.

A general treatment of recursively defined structures (formal definition of structural induction over recursive datatypes) will be presented in a later version of this report.

Example: We show that mathematical induction is an instance of the general notion of structural induction over values of recursively defined types, in a later version of this report.

Example: As an example of the use of mathematical induction we prove the binomial theorem. The binomial theorem states that for all x, y ∈ R, and n ∈ N we have

\[ \mathrm{EQ} \equiv (x + y)^{n} = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j} \]

We call the left-hand side of this equality LHS, and the right-hand side RHS, and abbreviate the equality by EQ. We assume two real numbers x and y and prove EQ by induction on n.

Basis case: For n = 0 the EQ clearly is correct, since both sides are 1. For some reason, most textbooks take n = 1 as the basis, in which case LHS is simply x + y, and RHS is

\[ \binom{1}{0} x^{1-0} y^{0} + \binom{1}{1} x^{1-1} y^{1} = x + y \]

Induction case: We assume EQ is true for n = k and have to show that it is then also true for n = k + 1:

\[ (x + y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j} \]

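First, we rewrite the left side of this equation (in the third step we are in fact using the induction hypothesis):

\[
\begin{aligned}
\mathrm{LHS} = (x + y)^{k+1} &= (x + y)^{k} (x + y) \\
&= \left( \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j} \right) (x + y) \\
&= \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}
\end{aligned}
\]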

In rewriting the right side of the equation, we use Pascal’s identity:

\[ \left( \forall k, n : k, n \in \mathbb{N} \wedge 0 < k \le n : \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k} \right) \]

We first prove the latter:

\[
\begin{aligned}
\binom{n}{k-1} + \binom{n}{k}
&= \frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} \\
&= \frac{n!\,k}{k!\,(n-k+1)!} + \frac{n!\,(n-k+1)}{k!\,(n-k+1)!}
 = \frac{n!\,(k + (n-k+1))}{k!\,(n-k+1)!} \\
&= \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}
\end{aligned}
\]

Now we rewrite RHS:

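\[ \mathrm{RHS} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j} \]

We split out the j = 0 and j = k + 1 terms before applying Pascal’s identity.

\[
\begin{aligned}
\mathrm{RHS} &= x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k+1}{j} x^{k+1-j} y^{j} \\
&= x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \left( \binom{k}{j} + \binom{k}{j-1} \right) x^{k+1-j} y^{j} \\
&= x^{k+1} + y^{k+1} + \sum_{j=1}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k} \binom{k}{j-1} x^{k+1-j} y^{j}
\end{aligned}
\]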

We can now bring x^{k+1} into the first sum (as the j = 0 term), and y^{k+1} into the second sum (as the j = k + 1 term). This gives

\[ \mathrm{RHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^{j} + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j} \]

and

\[ \mathrm{LHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1} \]

The first sums of LHS and RHS are the same, and we can see that the second sums are also equal, by doing a dummy transformation (let i = j − 1):

\[ \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^{j} = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1} \]

So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and n ∈ N.
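As a sanity check on the statement just proved, the two sides of EQ and Pascal’s identity can be compared numerically. This Python sketch is our own addition: it uses exact rational arithmetic from the standard `fractions` module and `math.comb` (Python 3.8 or later) for the binomial coefficients, and the sample values of x, y and the range of n are arbitrary assumptions. Such a check illustrates the theorem but of course does not replace the proof.

```python
from fractions import Fraction
from math import comb

def lhs(x, y, n):
    """Left-hand side of EQ: (x + y)^n."""
    return (x + y) ** n

def rhs(x, y, n):
    """Right-hand side of EQ: sum of C(n, j) x^(n-j) y^j for j = 0..n."""
    return sum(comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1))

def pascal_holds(n, k):
    """Pascal's identity C(n+1, k) = C(n, k-1) + C(n, k) for 0 < k <= n."""
    return comb(n + 1, k) == comb(n, k - 1) + comb(n, k)

# Arbitrary sample values (assumptions of this sketch).
x, y = Fraction(3, 2), Fraction(-2, 5)

eq_holds = all(lhs(x, y, n) == rhs(x, y, n) for n in range(11))
pascal_ok = all(pascal_holds(n, k) for n in range(1, 11) for k in range(1, n + 1))
```

Because `Fraction` arithmetic is exact, the comparison `lhs == rhs` is not affected by rounding.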

Example: We give an example of a proof about binary trees using structural induction. First we define a data structure for binary trees. For this example we will use a definition in the notation of the language Z to describe recursive data structures. The structure of a binary tree is well known and says that a tree is either a leaf or made up of two subtrees glued together by a node.

TREE ::= leaf | node < TREE × TREE >

An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)). We now define the size of a tree, by counting both the leaves and the nodes. The basic idea of the definition is that we define the size of a tree inductively over the structure, saying how the size of a given tree is calculated from the sizes of its parts. Again we define the size in the language Z, by first declaring its type and then saying how it is defined in each of the two cases:

size : TREE → N

∀ t1, t2 : TREE •
    size(leaf) = 1 ∧
    size(node(t1, t2)) = 1 + size(t1) + size(t2)

Similarly, we make two new definitions about trees:

leaves : TREE → N

nodes : TREE → N

∀ t1, t2 : TREE •
    leaves(leaf) = 1 ∧
    leaves(node(t1, t2)) = leaves(t1) + leaves(t2) ∧
    nodes(leaf) = 0 ∧
    nodes(node(t1, t2)) = 1 + nodes(t1) + nodes(t2)

We now want to prove the following theorem by structural induction on the size of the tree t.

Theorem: For all trees t, size(t) = leaves(t) + nodes(t).

Proof: Let t, t′, t1 and t2 be of type TREE. We prove the theorem by induction on the size of t.
Base case: Assume t = leaf. Then size(t) = size(leaf) = 1. Also, leaves(t) + nodes(t) = leaves(t) + 0 = 1 + 0 = 1.
Induction case: Assume t = node(t1, t2). The induction hypothesis says that the theorem holds for all t′ with size(t′) < size(t). Then size(t) = size(node(t1, t2)) = 1 + size(t1) + size(t2) = (apply induction hypothesis to t1 and t2) 1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)).
And leaves(t) + nodes(t) = leaves(node(t1, t2)) + nodes(node(t1, t2)) = (leaves(t1) + leaves(t2)) + (1 + nodes(t1) + nodes(t2)) = (commutativity and associativity of +) 1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)).
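The Z definitions translate almost literally into a program. The following Python sketch is our own rendering: a leaf is the string "leaf" and node(t1, t2) is a tuple, both representation choices being assumptions of the sketch, and `all_trees` is a hypothetical helper that enumerates every tree with a given number of nodes so that the theorem can be checked on small cases.

```python
LEAF = "leaf"

def node(t1, t2):
    """Glue two subtrees together with a node."""
    return ("node", t1, t2)

def size(t):
    return 1 if t == LEAF else 1 + size(t[1]) + size(t[2])

def leaves(t):
    return 1 if t == LEAF else leaves(t[1]) + leaves(t[2])

def nodes(t):
    return 0 if t == LEAF else 1 + nodes(t[1]) + nodes(t[2])

def all_trees(n):
    """Yield every binary tree with exactly n nodes."""
    if n == 0:
        yield LEAF
        return
    for i in range(n):
        for left in all_trees(i):
            for right in all_trees(n - 1 - i):
                yield node(left, right)

# The example tree from the text.
example = node(LEAF, node(node(LEAF, LEAF), LEAF))

# Check size(t) = leaves(t) + nodes(t) on all trees with at most 5 nodes.
theorem_holds = all(
    size(t) == leaves(t) + nodes(t) for n in range(6) for t in all_trees(n)
)
```

Each function follows the two cases of the recursive definition, which is the same case split that the structural induction proof uses.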


3.5 Real numbers

What do we mean when we say ‘continuum’? Here is a description Albert Einstein gave on page 83 of [21]:

The surface of a marble table is spread out in front of me. I can get from any point on this table to any other point by passing continuously from one point to a ‘neighboring’ one, and repeating this process a (large) number of times, or, in other words, by going from point to point without executing ‘jumps’. I am sure the reader will appreciate with sufficient clearness what I mean here by ‘neighboring’ and by ‘jumps’ (if he is not too pedantic). We express this property of the surface by describing the latter as a continuum.

People have been using the concept of real numbers for a long time (the Babylonians for example already calculated with roots long B.C., see [12]). In order for set theory to cover the fundamental structures of analysis, a precise and formal basis for the real numbers was needed. Even simple equations have no solutions if all we know are the rational numbers (for example, there is no rational number x such that x^2 = x ∗ x = 2).

When Cantor developed his set theory, it was well known that each type of number could be constructed as the limit of a sequence of numbers of another type. But it became clear that, especially in connection with theorems asserting the existence of some limit relations (see [30, page 182]), the proof might require irrational numbers to be defined in terms of rational ones, in order to avoid begging the question of existence involved in the theorem. Cauchy and Heine tried to define the irrational or real numbers in the second half of the 19th century. In 1872 Cantor and Dedekind followed with their precise definition of the real numbers. We first present the three methods (of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals and then show that they are identifiable.


3.5.1 Dedekind’s cuts

As a professor in the Polytechnic School in Zurich I found myself for the first time obliged to lecture upon the elements of the differential calculus and felt more keenly than ever before the lack of a really scientific foundation for arithmetic.

- Richard Dedekind, in the opening of the paper in which Dedekind’s cuts were introduced.

Dedekind defined a cut to determine a real number. A cut is a partition of a sequence into two disjoint nonempty subsequences, all the members of one of which are less than all the members of the other. Dedekind used the point at which the sequence is partitioned (footnote 5) to define a real number.

Definition of a (Dedekind) cut: Given an ordering < on a set V, a subset C ⊆ V is a cut in V :=

1) C ≠ ∅ ∧ C ≠ V

2) (∀a, b : a, b ∈ V : a ∈ C ∧ b < a → b ∈ C)

3) C does not have a greatest element

Example: {x ∈ Q | x < 0 ∨ x^2 < 2} is a cut in Q (the condition x < 0 makes the set closed downwards, as required by 2). Notice that we can also define the same cut as {x ∈ Q | x < 0 ∨ x^4 < 4}.

Each real number r can now be defined by a cut C in Q if r is the supremum for C. Each cut then determines a unique real number (see paragraph 3.5.4). We want to identify cuts that define the same real number, such as for example {x ∈ Q | x < 0 ∨ x^2 < 2} and {x ∈ Q | x < 0 ∨ x^4 < 4}.

Definition of (Dedekind) cut equivalence: A cut C1 is equivalent to a cut C2, notation C1 ∼ C2 := there is an r that is a supremum for both C1 and C2

We can now define R_Dedekind as the set of all equivalence classes of all cuts in Q: R_Dedekind := {C ⊆ Q | C is a cut in Q} / ∼.

5 Actually, Dedekind’s original definition did not use a partition but a slightly more complex division. For details see the link ‘Dedekind cuts’ at http://zax.mine.nu/stage.


Example: {x ∈ Q | x < 0 ∨ x^2 < 2} has √2 as supremum. We can identify the real number √2 with the equivalence class of all sets that have √2 as supremum.
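The three conditions of a cut can be probed with exact rational arithmetic. In the Python sketch below, which is our own illustration, the cut for √2 is written as "x < 0 or x² < 2" so that the set is closed downwards; this form, the helper `larger_member` and the sample grid are assumptions of the sketch. The function `larger_member` exhibits, for any member x, a strictly larger member, which is why the cut has no greatest element.

```python
from fractions import Fraction

def in_cut(x):
    """Membership in the cut determining sqrt(2)."""
    return x < 0 or x * x < 2

def larger_member(x):
    """For x in the cut, return a strictly larger element of the cut.

    For x >= 0 with x^2 < 2 the value (2x + 2) / (x + 2) exceeds x by
    (2 - x^2) / (x + 2) > 0 and its square is still below 2.
    """
    if x < 0:
        return Fraction(0)
    return (2 * x + 2) / (x + 2)

# Condition 1: the cut is neither empty nor all of Q.
nonempty_and_proper = in_cut(Fraction(1)) and not in_cut(Fraction(2))

# Condition 2 on a sample grid: members are closed downwards.
grid = [Fraction(i, 10) for i in range(-30, 31)]
downward_closed = all(
    in_cut(b) for a in grid if in_cut(a) for b in grid if b < a
)

# Condition 3: climbing with larger_member never leaves the cut.
x = Fraction(1)
climb = []
for _ in range(10):
    x = larger_member(x)
    climb.append(x)
stays_inside = all(in_cut(v) for v in climb)
strictly_increasing = all(u < v for u, v in zip([Fraction(1)] + climb, climb))
```

The values produced by the climb (4/3, 7/5, ...) approach √2 from below without ever reaching it.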

3.5.2 Cantor’s chains of segments

In mathematics the art of proposing a question must be held of higher value than solving it.

- A thesis defended in Cantor’s doctoral examination.

Cantor defined a chain of segments to determine a real number (see also [17, chapter 12]). This is a sequence of ever decreasing intervals in Q, the limit of which determines a unique real number.

Definition of chain of segments: < a_n, b_n >^V_{n∈N} is a chain of segments (in V) :=

1) (∀n : n ∈ N : a_n ∈ V ∧ b_n ∈ V)

2) (∀n : n ∈ N : a_n ≤ a_{n+1} ≤ b_{n+1} ≤ b_n)

3) (∀n : n ∈ N : b_n − a_n ≤ 2^−n)

Example: Consider the following chain of segments in Q: <<1, 2>, <1.4, 1.5>, <1.41, 1.42>, <1.414, 1.415>, ...>. Each segment ‘includes’ √2.

Note that <a_n, b_n>^V_{n∈N} (notation <a_n, b_n>^V, or <a_n, b_n> when it is clear which set V is meant) is actually a sequence, and that 3) puts a lower bound on the speed of convergence. We now want to be able to say when two chains are equivalent.

Definition of chain equivalence: The chains of segments <a_n, b_n> and <c_n, d_n> are equivalent, notation <a_n, b_n> ∼ <c_n, d_n> := (∀k : k ∈ N : b_k ≥ c_k ∧ d_k ≥ a_k)
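The two definitions above can be tried out on finite prefixes of chains. The sketch below is an illustration under stated assumptions, not the author's construction: it builds a chain for √2 by bisection (the names `sqrt2_chain`, `is_chain_prefix` and `equivalent_prefix` are hypothetical), and it can only test the conditions for the finitely many segments it generates.

```python
from fractions import Fraction

def sqrt2_chain(length: int):
    """First `length` segments <a_n, b_n> of a chain in Q enclosing sqrt(2)."""
    a, b = Fraction(1), Fraction(2)      # 1 <= sqrt(2) <= 2, and b - a = 2^0
    chain = [(a, b)]
    for _ in range(length - 1):
        mid = (a + b) / 2
        if mid * mid <= 2:               # sqrt(2) lies in the upper half
            a = mid
        else:                            # sqrt(2) lies in the lower half
            b = mid
        chain.append((a, b))
    return chain

def is_chain_prefix(chain) -> bool:
    """Check conditions 2) and 3) of the definition on a finite prefix."""
    nested = all(a0 <= a1 <= b1 <= b0
                 for (a0, b0), (a1, b1) in zip(chain, chain[1:]))
    fast = all(b - a <= Fraction(1, 2 ** n) for n, (a, b) in enumerate(chain))
    return nested and fast

def equivalent_prefix(c1, c2) -> bool:
    """Check b_k >= c_k and d_k >= a_k for every index both prefixes share."""
    return all(b >= c and d >= a for (a, b), (c, d) in zip(c1, c2))

bisection = sqrt2_chain(12)
# The decimal chain from the example: <1, 2>, <1.4, 1.5>, <1.41, 1.42>, ...
decimal = [(Fraction(1), Fraction(2)),
           (Fraction(14, 10), Fraction(15, 10)),
           (Fraction(141, 100), Fraction(142, 100)),
           (Fraction(1414, 1000), Fraction(1415, 1000))]
```

Both prefixes satisfy the chain conditions, and `equivalent_prefix(bisection, decimal)` holds on the shared indices, as expected for two chains that enclose the same number.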

Theorem: ∼ is an equivalence relation on the set of all chains of segments of Q


Each equivalence class of chains of segments in Q now uniquely determines a real number r. To be precise, r is determined by the equivalence class of <a_n, b_n> under ∼ if (∀n : n ∈ N : a_n ≤ r ≤ b_n); r is then the only real number with this property (see also paragraph 3.5.4).

We can now define R_Cantor as the set of all equivalence classes of chains of segments in Q: R_Cantor := {<a_n, b_n>^Q_{n∈N}} / ∼

3.5.3 Cauchy-sequences

Men pass away, but their deeds abide.

- Louis Cauchy, his last words quoted in [22].

Cauchy defined a Cauchy sequence to determine a real number. His sequence of numbers defines a real number by letting the terms come closer to that real number in every step.

Definition of Cauchy Sequence: With a partial order on a set⁶ V, <a_n>^V_{n∈N} is a Cauchy sequence in V :=

1) (∀n : n ∈ N : a_n ∈ V)

2) (∀k : k ∈ N : (∃p : p ∈ N : (∀n, m : n, m ∈ N : n, m > p → |a_n − a_m| ≤ 2^{-k})))

Example: The informally defined sequences (using ‘...’ to indicate an infinite continuation) 1, 1.4, 1.414, 1.4142, 1.41421, 1.414213, ... and 1, 1.414, 1.4121, ... are both Cauchy sequences. For each n ∈ N, a_{n+1} lies closer to √2 than a_n.
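Condition 2) of the definition can be checked numerically for a concrete sequence. The sketch below is only an illustration: it assumes the sequence of decimal truncations of √2 (computed exactly with `math.isqrt`), and the helper names `a` and `cauchy_condition_holds` are not from the original text. A finite test like this cannot establish the Cauchy property; it only shows what the quantifiers ask for.

```python
from fractions import Fraction
from itertools import combinations
from math import isqrt

def a(n: int) -> Fraction:
    """sqrt(2) truncated to n decimal digits, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def cauchy_condition_holds(k: int, p: int, upto: int) -> bool:
    """Check |a_n - a_m| <= 2^-k for all p < n, m <= upto (a finite test)."""
    bound = Fraction(1, 2 ** k)
    return all(abs(a(n) - a(m)) <= bound
               for n, m in combinations(range(p + 1, upto + 1), 2))

first_terms = [a(n) for n in range(5)]   # 1, 14/10, 141/100, 1414/1000, ...
```

For this sequence the choice p = k works for every k, because two truncations with more than k digits differ by less than 10^{-k} ≤ 2^{-k}.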

We also denote a Cauchy sequence <a_n>_{n∈N} simply by <a_n>. We now want to be able to say when two Cauchy sequences are equivalent.

⁶ V is in general an ordered, commutative ring. We will not discuss this further here, and for the rest of this paragraph we take V = Q.


Definition of Cauchy sequence equivalence: The sequences <a_n> and <b_n> are equivalent, notation <a_n> ∼ <b_n> := lim_{n→∞}(a_n) = lim_{n→∞}(b_n)

Note that the definition of equivalence uses the hitherto undefined notion of a limit. The following definition formalizes this notion.

Definition of sequence convergence: A sequence <a_n>_{n∈N} of elements of a set V is said to converge to a sequence <b_n>_{n∈N}, notation lim_{n→∞}(a_n) = lim_{n→∞}(b_n) := (∀k : k ∈ N : (∃p, q : p, q ∈ N : (∀n, m : n, m ∈ N ∧ n > p ∧ m > q : |a_n − b_m| < 2^{-k})))

Note: convergence is usually defined in terms of real numbers, but we cannot use such a definition here because we have yet to define the reals. The number r is then called the limit of the sequence <a_n>, notation lim_{n→∞}(a_n) = r, if (∀k : k ∈ N : (∃p : p ∈ N : (∀n : n ∈ N ∧ n > p : |a_n − r| < 2^{-k}))). A sequence is said to diverge if it does not converge.

Theorem: Any convergent sequence <a_n>_{n∈N} is bounded and has a unique limit.
Proof: First we prove the uniqueness (by contradiction). Suppose the sequence has two limits, c and c′. Take any k ∈ N. From the definition of convergence there is an integer p such that |a_n − c| < 2^{-k} if n > p, and an integer p′ such that |a_n − c′| < 2^{-k} if n > p′. For n > p ∧ n > p′, adding the two inequalities gives (using the triangle inequality (∀a, b :: |a + b| ≤ |a| + |b|)): |c′ − c| = |(a_n − c) + (c′ − a_n)| ≤ |a_n − c| + |a_n − c′| < 2 ∗ 2^{-k}. Hence |c′ − c| < 2 ∗ 2^{-k} for all k ∈ N. This means c = c′, thus the limit is indeed unique. Now we prove boundedness. The sequence converges, so we can take, for example, k = 1. Then there is a p such that |a_j − c| < 2^{-1} for j > p. We then have, again using the triangle inequality, that |a_j| ≤ |a_j − c| + |c| < 2^{-1} + |c| for j > p. The sequence is therefore bounded by M = max{|a_0|, |a_1|, ..., |a_p|, 1 + |c|}.

Each real number can now be defined by an equivalence class of Cauchy sequences: r is determined by the equivalence class of <a_n> under ∼ if r = lim_{n→∞}(a_n) for each sequence <a_n> from that equivalence class.


We can now define R_Cauchy as the set of all equivalence classes of Cauchy sequences in Q: R_Cauchy := {<a_n>^Q_{n∈N}} / ∼

3.5.4 Properties of the three definitions

Before these definitions for real numbers were given, we intuitively thought of the reals as infinite sequences of (decimal) digits. In the rest of this section we assume that by R we mean this set of reals, i.e. all infinite sequences of decimal digits. We can now check whether the three new definitions indeed are correct ways to identify real numbers:

1) <a_n, b_n>^Q is a chain of segments → (∃!c : c ∈ R : (∀n : n ∈ N : a_n ≤ c ≤ b_n))

2) C is a cut in Q → (∃!c : c ∈ R : c = supremum(C))

3) <a_n>_{n∈N} is a Cauchy sequence → (∃!c : c ∈ R : lim_{n→∞}(a_n) = c)

Then we can check for every newly defined set X of reals that:

a) it contains a countable, densely ordered (i.e. (∀r₁, r₂ : r₁, r₂ ∈ D ∧ r₁ < r₂ : (∃q : q ∈ D : r₁ < q < r₂))) set D without endpoints, which is dense in X.

b) every Dedekind cut has a supremum in X.

Every set for which a) and b) hold is isomorphic with R. If a definition satisfies a) and b) it possesses the properties we intuitively want the real numbers to have. It can be proven that if these two properties hold we have defined the reals successfully, such that there is a total ordering on the reals, the reals are densely ordered and the ordering is continuous.


3.6 Infinite sets

Our minds are finite, and yet even in these circumstances of finitude we are surrounded by possibilities that are infinite, and the purpose of life is to grasp as much as we can out of that infinitude.

- A.N. Whitehead in [76]

The size of a finite set V, notation |V|, can be defined by the number of elements that it has. But counting the elements does not end for infinite sets. Cantor was concerned with the problem of measuring the sizes of infinite sets (because he was investigating questions about singularities of Fourier series, see [30, chapter 4]) and proposed a rather nice solution to this problem. He observed that two finite sets have the same size if the elements of one set can be paired with the elements of the other set; this method compares sets without resorting to counting and can be extended to infinite sets.

This is the concept of an equivalence relation between sets (the relation is also referred to as ‘are of the same cardinality’, ‘equipotent’ or ‘equipollent’ (see [30, page 229])).

Definition of set equivalence: A set V is equivalent to a set W, notation V ∼ W := there is a bijection f : V → W

It is simple to check that ∼ has the properties of an equivalence relation, i.e. it is reflexive, symmetric and transitive. But if we consider ∼ to be a true relation, we need the concept of 𝒱, the set of all sets: ∼ ⊆ 𝒱 × 𝒱. But the existence of 𝒱 is paradoxical, see section 3.8.

This new method to measure the number of elements of a set is reflected in the notion of cardinality of a set, and led to the surprising result that there are many levels of infinity. Before we present a proof of this result, using Cantor’s famous diagonalization method, we first introduce some more definitions.


Postulate for Cardinal numbers: With every set V is associated a well-defined abstract entity V̿, called the cardinal number of V, such that V ∼ W ↔ V̿ = W̿. We can think of V̿ as denoting the common property of set equivalence (as defined above) of all sets in the equivalence class of V.

It proved difficult, however, to come to an exact definition of cardinality from this postulate. Cantor regarded cardinals as special abstract entities of a new kind. In 1884, the German mathematician Frege came up with his own definition of cardinal numbers. He discussed it with the mathematician Russell and they proposed the idea of defining V̿ as V/∼, the equivalence class of V modulo ∼. The postulate for cardinal numbers then follows at once. Frege also identified finite cardinal numbers with natural numbers: the cardinal number of ∅ is 0, that of {∅} is 1, that of {∅, {∅}} is 2, .... This Frege-Russell definition would become standard, until (as we will see in section 3.8) it became known that this definition could also lead to a paradox.

Cantor used the Hebrew letter aleph to name the different levels of infinity. The cardinality of the set of natural numbers is by definition called aleph-null or aleph-naught, notation ℵ₀. The ‘next levels’ of infinity are called ℵ₁, ℵ₂, .... Since the cardinality of the set of reals was unknown, Cantor denoted it by c. If we assume the continuum hypothesis (see section 3.7), which says that there is no level of infinity between the cardinality of N and that of R, the cardinality of the set of reals can also be denoted by aleph-one, notation ℵ₁.

Property of cardinality: Given the cardinality V̿ of a set V, we have

• If V is finite: V̿ = the number of elements of V

• If V is infinite: V̿ = ℵ_i, when there exists a bijection between V and the set P^i(N) (this identification assumes the generalized continuum hypothesis, see section 3.7)

Sometimes the cardinality of a set V is also denoted by |V|, after the size of a set V. A more rigorous treatment of cardinal numbers will be given in section 3.8.1. This new concept enabled Cantor to define more concepts for the analysis of infinite sets. It also inspired others to analyze the properties of infinite sets.


No other question has ever moved so profoundly the spirit of man, no other idea has so fruitfully stimulated his intellect; yet no other concept stands in greater need of clarification than that of the infinite.

- D. Hilbert, quoted in [96]

In the rest of this section we will present some of the results of the research on infinite sets.

Definition of finite: A set V is finite := (∃n : n ∈ N : V ∼ {x ∈ N | x < n})

Definition of infinite: A set V is infinite := V is not finite

Definition of Dedekind infinite: A set V is Dedekind infinite := (∃W : W ⊂ V : V ∼ W)

Theorem: V is Dedekind infinite ↔ V is infinite (from [17])
Proof: We show that V is infinite iff N ≤₁ V. We prove the two implications of the theorem separately:

V is Dedekind infinite → V is infinite: V is Dedekind infinite, i.e. there exists a W ⊂ V such that V ∼ W, i.e. there exists a bijection f : V → W. Because W is a proper subset of V there also exists an a ∈ V such that a ∉ W. Consider the function g : N → V, defined recursively by g(0) = a and g(k + 1) = f(g(k)). We now have to show that g is an injection, i.e. for all i, j ∈ N : i ≠ j → g(i) ≠ g(j). We use induction on i:

i = 0: if 0 ≠ j then g(0) = a ∉ W and g(j) ∈ W, so g(0) ≠ g(j).

i = k + 1: assume k + 1 ≠ j, then we can prove g(k + 1) ≠ g(j) by induction on j:

j = 0: g(0) = a ∉ W and g(k + 1) ∈ W, so g(k + 1) ≠ g(0).

j = l + 1: we know k + 1 ≠ j = l + 1, so k ≠ l. By the induction hypothesis g(k) ≠ g(l). Since f is a bijection we also have that f(g(k)) ≠ f(g(l)), i.e. g(k + 1) ≠ g(l + 1) or g(i) ≠ g(j).


V is Dedekind infinite ← V is infinite: N ≤₁ V, so there exists an injection f : N → V. We show that W := V − {f(0)}, clearly a proper subset of V (W ⊂ V), is equivalent to V (W ∼ V). The following function g is a bijection from V to W: g(f(i)) = f(i + 1) for all i ∈ N, and g(x) = x if x ≠ f(i) for all i ∈ N.

Definition of countable: A set V is countable, also called denumerable := V is finite or V ∼ N

Definition of uncountable: A set V is uncountable := V is not countable

Definition of denumeration: A denumeration of a set V is a bijection f : N → V

Cantor then proved that N, Z and Q all have the same cardinality and also called these sets countably infinite.

Theorem: Q is countable
Proof: We give a bijection from N to Q, by listing all elements of Q. Consider a table with all fractions a/b (a ∈ N, b ∈ N⁺), with fraction a/b on the a-th row and the b-th column. If we list all elements row by row, we would not obtain a correspondence between N and Q, since the list would never get to the second row. By listing the elements along the diagonals (south-west to north-east), starting from the north-west corner, we obtain a correspondence between N and Q. Because 2/2 = 1/1, etc., we skip an element when it would cause a repetition. We can also give a bijection from Q to an infinite subset of N, which is equivalent to N: for each fraction a/b ∈ Q with a and b relatively prime, let f(<a, b>) := ½(a + b)(a + b + 1) + b.
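The diagonal listing can be written down as a short generator. This is a sketch under stated assumptions rather than the author's procedure: it enumerates the non-negative fractions a/b of the table only, it walks each diagonal a + b = s from south-west to north-east, and the names `enumerate_rationals` and `pair` are illustrative. The pairing function is taken in the standard Cantor form with final term b, which is an assumption about the formula intended in the text.

```python
from fractions import Fraction
from itertools import count, islice

def enumerate_rationals():
    """Yield each non-negative rational a/b exactly once, diagonal by diagonal."""
    seen = set()
    for s in count(1):                 # s = a + b indexes the diagonals
        for b in range(1, s + 1):      # walk from south-west to north-east
            a = s - b
            q = Fraction(a, b)
            if q not in seen:          # skip repetitions such as 2/2 = 1/1
                seen.add(q)
                yield q

def pair(a: int, b: int) -> int:
    """Cantor pairing (1/2)(a + b)(a + b + 1) + b, injective on N x N."""
    return (a + b) * (a + b + 1) // 2 + b

first = list(islice(enumerate_rationals(), 10))
```

Every fraction a/b sits on the finite diagonal s = a + b, so the generator reaches it after finitely many steps; that is the point of listing by diagonals instead of by rows.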

An example of an uncountable set is the set of real numbers, R. Cantor first proved that R is uncountable in 1873; the proof given here uses a technique he introduced later, called diagonalization (also known as the diagonal method), see [17, page 99].

Theorem: R is uncountable
Proof: Suppose there is a bijection f between N and R. We contradict this by finding an x in R that is not paired with anything in N. We construct this x by taking the first fractional digit of x arbitrarily, but never 0, 9 or the first fractional digit of f(1); the second fractional digit of x likewise different from 0, 9 and the second fractional digit of f(2); etc. Continuing this way down the diagonal of the table of digits, we obtain all digits of x. x is not f(n) for any n because the n-th fractional digit of x differs from the n-th fractional digit of f(n).
Note that we avoid the problem of certain numbers such as 2.3999... and 2.4000... being equal by never selecting a 9 or a 0. Similarly, we can use this diagonalization method to show that N ≁ {0, 1}^N.
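The diagonal construction is easy to carry out on a finite fragment of a claimed enumeration. The sketch below is illustrative only: the table of digits is hypothetical, rows are indexed from 0, and the rule "use 5 unless the diagonal digit is 5, then use 6" is just one of many choices that avoid 0, 9 and the diagonal digit.

```python
def diagonal_digits(table):
    """Given rows of fractional digits, return digits of an x missing from the table.

    The n-th digit of x differs from the n-th digit of row n and is never 0 or 9.
    """
    return [5 if row[n] != 5 else 6 for n, row in enumerate(table)]

# A hypothetical finite fragment of a claimed enumeration f(0), f(1), ...
table = [
    [1, 4, 1, 5, 9],
    [7, 1, 8, 2, 8],
    [3, 3, 3, 3, 3],
    [5, 5, 5, 5, 5],
    [0, 9, 0, 9, 6],
]
x = diagonal_digits(table)
```

Whatever rows are supplied, `x` disagrees with row n in position n, so it cannot equal any row of the table.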

Theorem: (∀V :: P(V) ∼ {0, 1}^V) (see [17, page 98])
Proof: We show that there is a bijection K from P(V) to {0, 1}^V. For W ⊆ V, define K(W) (also denoted K_W), the characteristic function of W, as:
K_W(v) = 1 if v ∈ W
K_W(v) = 0 if v ∉ W.

We now show that K is a bijection from P(V) to {0, 1}^V:

1) K is injective: let W₁, W₂ ⊆ V and suppose W₁ ≠ W₂. That means there is an element w ∈ V such that (w ∈ W₁ ∧ w ∉ W₂) ∨ (w ∉ W₁ ∧ w ∈ W₂). Then we have that (K_{W₁}(w) = 1 ∧ K_{W₂}(w) = 0) ∨ (K_{W₁}(w) = 0 ∧ K_{W₂}(w) = 1), and thus (∃w : w ∈ V : K_{W₁}(w) ≠ K_{W₂}(w)), i.e. K_{W₁} ≠ K_{W₂}.

2) K is surjective: suppose g ∈ {0, 1}^V. Let W_g = {v ∈ V | g(v) = 1}. Then (∀v : v ∈ V : K_{W_g}(v) = 1 ↔ g(v) = 1), thus (∀v : v ∈ V : K_{W_g}(v) = g(v)), and g = K_{W_g}.
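For a finite set V the bijection between subsets and characteristic functions can be checked exhaustively. The following sketch assumes a three-element V and represents a function in {0, 1}^V as a tuple of (element, bit) pairs; the names `powerset`, `K` and `K_inverse` are illustrative choices, not notation from the source.

```python
from itertools import chain, combinations, product

def powerset(V):
    """All subsets of the finite set V, as frozensets."""
    items = sorted(V)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def K(W, V):
    """Characteristic function of W on V, stored as a tuple of (v, 0 or 1) pairs."""
    return tuple((v, 1 if v in W else 0) for v in sorted(V))

def K_inverse(g):
    """Recover W_g = {v | g(v) = 1} from a function g in {0,1}^V."""
    return frozenset(v for v, bit in g if bit == 1)

V = {"a", "b", "c"}
subsets = powerset(V)
# Every function from V to {0, 1}, in the same representation as K(W, V).
functions = [tuple(zip(sorted(V), bits)) for bits in product((0, 1), repeat=len(V))]
images = [K(W, V) for W in subsets]
```

Here the 8 subsets map to 8 distinct functions, and these are exactly the 8 functions in {0, 1}^V, mirroring steps 1) and 2) of the proof.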

We can define an ordering relation ≤₁ on the cardinalities of sets. We say that V ≤₁ W if there is an injection from V to W. Then V <₁ W of course means that V ≤₁ W holds but not V ∼ W. This relation on the set of cardinals only depends on the cardinals themselves and not on the choice of the particular sets V and W. The relation ≤₁ is reflexive and transitive. Cantor also conjectured that ≤₁ is a partial order. This was later proven independently by the two mathematicians F. Bernstein and E. Schröder (see [59, page 39]).

We give two theorems that are based on the relation <₁:

Theorem: (without proof) (∀V : V is a non-empty set : V <₁ P(V))


Theorem: V is Dedekind infinite ↔ N ≤₁ V
Proof: This theorem follows directly from the theorem on page 53 and the definition of infinite.

Although we have seen that N is countable but R is not, we might still think that there is some smaller interval of the reals that can be paired with the naturals.

Theorem: N ≁ [0, 1]
Proof of Poincaré (see [17]): We show there is no bijection f : N → [0, 1], in particular (∀f : (f : N → [0, 1]) : f is not surjective). We do this by constructing for every function f : N → [0, 1] a y ∈ [0, 1] such that (∀n : n ∈ N : f(n) ≠ y). We construct this y by means of a chain of segments (see paragraph 3.5.2).
Let f : N → [0, 1]. Let S_n be an infinite chain of segments such that

1) (∀i : i ∈ N : f(i) ∉ S_i)

2) (∀i : i ∈ N : S_{i+1} ⊆ S_i)

3) (∀i : i ∈ N : |S_i| = 3^{-i-1}), with |S_i| being the length of segment S_i.

We can construct such a chain of segments, for if we divide a segment S_n = [p_n, q_n] in three equal parts (i.e. each part has length 3^{-n-2}), at least one of these parts does not contain f(n + 1). We take this part for S_{n+1}. The constructed chain of segments determines (see paragraph 3.5.2) a real number y, with (∀n : n ∈ N : y ∈ S_n), and thus certainly y ∈ [0, 1]. We also have that (∀n : n ∈ N : f(n) ∉ S_n ∧ y ∈ S_n), so (∀n : n ∈ N : y ≠ f(n)).
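The construction of the segments S_0, S_1, ... can be carried out step by step for any given f. The sketch below is an illustration under assumptions: `f` is a hypothetical sequence of rationals in [0, 1] chosen only for the demonstration, `avoiding_chain` is an illustrative name, and only finitely many segments are produced, so the limit point y itself is not computed.

```python
from fractions import Fraction

def avoiding_chain(f, steps: int):
    """Segments S_0, S_1, ... in [0, 1] with f(i) not in S_i and |S_i| = 3^(-i-1)."""
    lo, hi = Fraction(0), Fraction(1)
    chain = []
    for i in range(steps):
        third = (hi - lo) / 3
        parts = [(lo + j * third, lo + (j + 1) * third) for j in range(3)]
        # A point lies in at most two of the three closed thirds,
        # so at least one part does not contain f(i).
        lo, hi = next((p, q) for p, q in parts if not p <= f(i) <= q)
        chain.append((lo, hi))
    return chain

def f(n: int) -> Fraction:
    """A hypothetical sequence in [0, 1] used only for illustration."""
    return Fraction(n % 7, 7) if n % 2 else Fraction(1, n + 2)

segments = avoiding_chain(f, 20)
```

Any point of the last segment lies in all earlier segments as well, and therefore differs from f(0), ..., f(19); the proof applies the same reasoning to the limit of the infinite chain.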

The following theorem gives a way to prove the equivalence of sets:

Theorem of Cantor-Bernstein: V ≤₁ W ∧ W ≤₁ V → V ∼ W
Proof: Assume V ≤₁ W and W ≤₁ V. Then there are injections f : V → W and g : W → V. Since Dom(g) = W and g is injective, g is a bijection from W to Ran(g), so W ∼ Ran(g); it therefore suffices to prove Ran(g) ∼ V. Since Ran(g) ⊆ V and g ∘ f is an injection from V to Ran(g), we have V ≤₁ Ran(g). And since for all W and V, W ⊆ V ∧ V ≤₁ W → V ∼ W (see the lemma below), we have Ran(g) ∼ V, and hence V ∼ W.


Lemma: W ⊆ V ∧ V ≤₁ W → V ∼ W
Proof: Suppose W ⊆ V and V ≤₁ W. There is an injection h : V → W. Let A₀ := V − W, and (∀n : n ∈ N : A_{n+1} := h(A_n)). We now give the desired bijection k : V → W.

• k(a) := a if a ∉ ⋃_n A_n

• k(a) := h(a) if a ∈ ⋃_n A_n

We show that k is a bijection:

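• k is injective: Suppose a ≠ b, then k(a) ≠ k(b) follows by a case analysis on the four cases a ∉ ⋃_n A_n ∧ b ∉ ⋃_n A_n, a ∉ ⋃_n A_n ∧ b ∈ ⋃_n A_n, a ∈ ⋃_n A_n ∧ b ∉ ⋃_n A_n, and a ∈ ⋃_n A_n ∧ b ∈ ⋃_n A_n. In all cases k(a) ≠ k(b) follows from the definition of k and the injectivity of h; in the two mixed cases, note that h maps ⋃_n A_n into itself, so exactly one of k(a) and k(b) lies in ⋃_n A_n.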

• k is surjective: Suppose w ∈ W, thus w ∉ A₀. Again we use case analysis:

– if w ∉ ⋃_n A_n then w = k(w).

– if w ∈ ⋃_n A_n, assume w ∈ A_p. Since w ∉ A₀, p ≥ 1. Thus there is a w′ ∈ A_{p−1} such that w = k(w′).
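The bijection k of the lemma can be made concrete for a small example. The sketch below assumes V = N, W = N − {0} and the hypothetical injection h(n) = 2n + 2, none of which come from the original text. Then A₀ = {0}, A₁ = {2}, A₂ = {6}, and so on, and k shifts the elements of this orbit along h while leaving every other number fixed. Only a finite window of N is inspected.

```python
def h(n: int) -> int:
    """A hypothetical injection from V = N into W = N - {0}."""
    return 2 * n + 2

def orbit_of_A0(limit: int) -> set:
    """Elements of the union of A_0 = V - W = {0}, A_1 = h[A_0], ... below `limit`."""
    orbit, a = set(), 0
    while a < limit:
        orbit.add(a)
        a = h(a)
    return orbit

LIMIT = 1000
ORBIT = orbit_of_A0(LIMIT)          # {0, 2, 6, 14, 30, ...}

def k(a: int) -> int:
    """The bijection of the lemma: shift along the orbit, fix everything else.

    Only valid for a < LIMIT, because ORBIT is computed up to LIMIT.
    """
    return h(a) if a in ORBIT else a

values = [k(a) for a in range(400)]
```

On this window k never takes the value 0, never repeats a value, and hits every element of W below 400, which matches the injectivity and surjectivity arguments above.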

Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b by using the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1] and subsequently that (0, 1) ∼ (a, b). Then, by the transitivity of ∼ we can conclude that (a, b) ∼ [0, 1].

Proof of (0, 1) ∼ [0, 1]: The identity function id_{(0,1)} : (0, 1) → [0, 1] is an injection from (0, 1) to [0, 1], so (0, 1) ≤₁ [0, 1]. The function f(x) = ⅓(x + 1) is an injection from [0, 1] to (0, 1), so [0, 1] ≤₁ (0, 1).

By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1].

Proof of (0, 1) ∼ (a, b): The function f(x) = (b − a)x + a is a bijection from (0, 1) to (a, b).

Using the Cantor-Bernstein theorem we can also prove that (a, b) ∼ (0, 1) ∼ R ∼ R^n ∼ {0, 1}^N ∼ P(N) ∼ N^N, for all n ∈ N, n ≥ 1.


Theorem: V is infinite → N ≤₁ V
Proof: V is infinite and thus not empty. We take one element x₀ ∈ V. Next, we take an element x₁ ∈ V − {x₀}. We can repeat this infinitely (i.e. for all n we can select an x_{n+1} ∈ V − {x₀, ..., x_n}), if we assume that it is possible to always select an element from any non-empty set (see the axiom of choice below). In this way we get a countable subset of V, namely {x₀, x₁, x₂, ...}. The only assumption we have made here is the so-called axiom of choice.

Axiom of choice (AC): Given any set W of non-empty sets V, there is a function f which assigns to each member V of W an element f(V) of V.

This definition was proposed first in an article by Zermelo in 1908 (translated in [93, pages 199-215]). Such a function f is called a choice function for W. The axiom can be restricted by limiting it to those families W of a particular cardinality. Since for any finite W the axiom is provable, the weakest non-trivial case occurs when W is denumerable (see page 54 for the definition of denumerable). This case is known as the Denumerable axiom. Zermelo regarded the AC as already implicitly used by mathematicians. In response some people asked when this assumption developed from mathematics, when it is implicitly used, and when exactly it can or cannot be avoided. Zermelo attempted to prove AC, but the controversy over his proof of 1904 (see [63, page 310]) led Zermelo to axiomatize set theory (see section 5.3.1). We can add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom of Choice). For more details on the role of the AC, we refer to section 5.3 and [63]. See http://zax.mine.nu/stage and click on ‘links’ for some quotes about the AC.

An instance of the following theorem (without proof) of the British mathematician F.P. Ramsey is often used in graph theory. The notation V^n in this theorem is defined as the set of all subsets of V with n elements, i.e. V^n := {X ⊆ V | X has n elements}.

Theorem of Ramsey: If V is a denumerable set and f : V^n → {0, 1, ..., m − 1} with n, m ∈ N and n, m ≥ 1 then (∃W : W ⊆ V : W is denumerable and f is constant on W^n).


Theorem: R² ∼ R ∼ (0, 1)
Proof: We can say that R ∼ (0, 1) if there is a bijection between (0, 1) and R. Indeed, there exists a bijection f : (0, 1) → R, defined as f(x) = tan((π/2)(2x − 1)). Thus: R ∼ (0, 1). Since R ∼ (0, 1), we can represent an element of R² by two real numbers between 0 and 1, and map these to an element r ∈ R by alternately taking the next digit of each of the two numbers. For example, we map (0.76584..., 0.13275...) uniquely to 0.71635.... Thus: R² ∼ R. Since ∼ is transitive, we know that R² ∼ R ∼ (0, 1).
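The digit interleaving used in this proof is simple to write down for finite digit strings. The sketch below is illustrative: it works on the first five fractional digits of the two numbers from the example, represented as strings, and the names `interleave` and `split` are assumptions made here.

```python
def interleave(x_digits: str, y_digits: str) -> str:
    """Alternate the fractional digits of two numbers in (0, 1)."""
    return "".join(a + b for a, b in zip(x_digits, y_digits))

def split(z_digits: str):
    """Inverse direction: digits at even positions and at odd positions."""
    return z_digits[0::2], z_digits[1::2]

z = interleave("76584", "13275")     # digits of 0.76584... and 0.13275...
```

Because `split` recovers both digit strings from the interleaved one, two different pairs can never be mapped to the same result, which is the uniqueness claimed in the proof.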

Theorem: P(N) ∼ (0, 1)
Proof: First we show that P(N) ≤₁ R. Suppose V ∈ P(N), and map V to the decimal 0.a₁a₂..., with a_i = 1 if i ∈ V and a_i = 0 otherwise. This injection proves that P(N) ≤₁ R. Now we give an injection from (0, 1) to P(N): assume r ∈ (0, 1), i.e. r = 0.a₁a₂... with 0 ≤ a_i ≤ 9. We want to identify numbers such as 0.3999... and 0.4000.... Therefore we assume there is no i ∈ N such that a_n = 9 for all n > i, n ∈ N. Then we map r to the set {1a₁, 1a₁a₂, ...} of natural numbers. Clearly, this mapping is well-defined and injective. For example, r = 0.17803... is mapped to the set {11, 117, 1178, 11780, 117803, ...}. Thus (0, 1) ≤₁ P(N). Since R ∼ (0, 1), the theorem of Cantor-Bernstein gives P(N) ∼ (0, 1).
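The injection from (0, 1) into P(N) can be illustrated on a finite prefix of digits. In the sketch below the function name `prefix_set` is an assumption, and only the first few elements of the infinite set are produced.

```python
def prefix_set(digits: str) -> set:
    """Map fractional digits a1 a2 ... to the naturals 1a1, 1a1a2, ... (finite prefix)."""
    return {int("1" + digits[:i]) for i in range(1, len(digits) + 1)}

example = prefix_set("17803")        # first digits of r = 0.17803...
```

The leading 1 keeps digit strings that start with zeros apart, for example 0.05... and 0.5... yield 10 and 15 as their smallest elements.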

Corollary: P(N) ∼ R

Proof: This directly follows from P(N) ∼ (0, 1) and (0, 1) ∼ R, and the transitivity of ∼.


3.7 The Continuum Hypothesis

We still think that the study of the size of the continuum should be our guiding light for further research in set theory.

- Judah Haim in [33]

After showing that the real numbers cannot be put into one-to-one correspondence with the natural numbers (see section 3.6), Cantor hypothesized in 1877 that each infinite subset of R is either denumerable or equivalent to the continuum. This hypothesis was first published in 1878 in [13] and later became known as:

The Continuum Hypothesis (CH): (N ≤₁ A ≤₁ R) → (A ∼ N ∨ A ∼ R)

This hypothesis (as given in [17, page 128]) is also known in many other forms, of which we will mention and explain the most important. We can immediately see that the following version of CH is equivalent to the given definition: ‘any set of real numbers is either finite, countably infinite or has the same cardinality as the entire set of reals’. This means that ‘the number of real numbers is the next level of infinity above the number of natural numbers’ (see also [30, page 197]).

As we saw in section 3.6, Cantor defined the cardinality of the natural numbers to be ℵ₀, and the next levels of infinity to be ℵ₁, ℵ₂, ℵ₃, etc. He also named the cardinality of the reals c, for continuum. Cantor’s original formulation of CH was: (B) c = ℵ₁. Since Cantor also proved that P(N) ∼ R (see page 59), we can also state CH as: (C) P(N) ∼ ℵ₁. The cardinality of the power set of any set X is equal to the cardinality of {0, 1}^X (see page 55), often denoted as 2^X, so another formulation⁷ of CH is: (D) 2^{ℵ₀} = ℵ₁ (see [31]). These formulations are all equivalent in ZFC, although (B) leads us to think about sizes of reals, (C) about subsets and (D) about cardinal exponentiation. We will not go into details of less precise or more dependent formulations such as ‘what is the cardinality of the set of points on a geometrical line?’.

⁷ Actually, in this formulation we have identified the cardinalities ℵ₀ and ℵ₁ with the sets that have these cardinalities.


Some of the theory that is needed in the remaining part of this section, for the generalized continuum hypothesis, will be introduced in later chapters. If you are not familiar with the notations that are used, you might want to skip the remaining part of this section and get back to it later.

In 1908 the German mathematician Felix Hausdorff proposed the following generalization of CH (which is also called the aleph hypothesis):

The Generalized Continuum Hypothesis (GCH): (∀r : r is an ordinal : 2^{ℵ_r} = ℵ_{r+1})

For a definition and the notation of ordinal numbers, we refer to section 3.8.2. Obviously (see section 5.3), we have that ZF + GCH ⊢ CH. Note that ZF + GCH ⊢ AC (so we don’t need ZFC once we have GCH).
Cantor and many other great mathematicians spent years trying to prove CH or its negation (Cantor tried to prove his hypothesis by using a decomposition theorem; for details see [31, page 117]), but did not succeed. This problem was so important that Hilbert (see section 6.2) put it first in his list of 23 problems.

In 1938 significant progress was made when the mathematician Gödel proved that CH is consistent with ZFC (see section 5.3.2) by constructing a model of ZFC + CH (see also his later article ‘What is Cantor’s continuum problem?’). Since Gödel had proved his famous incompleteness theorem (see chapter 8) in the same period, mathematicians suspected that CH was one of the statements of ZFC that can neither be proved nor disproved, i.e. that CH was undecidable in ZFC. It took until 1963 before this was proved by Paul Cohen in [15].

To do that he used a new technique called forcing. Forcing is a combinatorial technique for proving statements consistent with the axioms of set theory. Cohen used it to prove that the negation of AC and the negation of CH are consistent with the axioms of set theory (AC and CH were already known to be consistent). Essentially it consists of a method of performing the following algorithm: start with a model M of set theory. Construct an object X not in M with certain properties. Consider the smallest model M′ with X an element of M′ and M a subset of M′ (this is done in a way such that the construction of M′ is implicit in the construction of X). For more details on forcing, see [51] and [81].

Thus Cohen constructed a model of ZFC + ¬CH and this, along with Gödel’s model of ZFC + CH, showed that CH is undecidable in ZFC. This means that either CH or ¬CH could be added as an axiom to ZFC. But since neither of these axioms seems axiomatic or ‘self-evident’ they have, unlike AC, not been adopted as axioms of set theory. Mathematicians either accept this incompleteness in set theory or try to find more intuitive axioms that will help decide it. In other words, the question remains what intuitive axiom of set theory we need to make it more complete, and whether, with some axiom system for set theory, the continuum hypothesis is true.


3.8 Cardinal and Ordinal numbers and Paradoxes

Every transfinite consistent multiplicity, that is, every transfinite set, must have a definite aleph as its cardinal number.

- Georg Cantor

3.8.1 Cardinal numbers and Cantor’s Paradox

In section 3.6 we already encountered cardinal numbers and the notion of set equivalence. After defining the equivalence of sets (see page 51), Cantor realized that all sets that are equivalent to a given set V have a common property. He identified this property with the cardinal number V̿ of a set V, a property that abstracts from the nature and order of the elements of a set.

Example: Consider the following sets: A = {1, 2, 3}, B = {3, 2, 1}, C = {4, 7, {a, b}}, D = {1, 4}. We can say that A ∼ B ∼ C, or (equivalently) A̿ = B̿ = C̿. We also have A ≁ D, or A̿ ≠ D̿. Note that in this example the equality ‘=’ between cardinal numbers is a new type of equality that is defined as A̿ = B̿ ↔ A ∼ B.

We can see that cardinality abstracts from the order and nature of the elements, and for finite sets the cardinal number can be identified with the ordinary ‘number of elements’. Therefore we identify the cardinal number of a finite set of n elements with the natural number n. We denote the smallest infinite (or transfinite) cardinal number by ℵ₀. As we have already seen on page 52, this is the cardinal number of N or any denumerable infinite set. Cantor defined the ‘next’ levels of infinity by ℵ₁, ℵ₂, ....

The next question was how to pass from the abstract notion of cardinal numbers to real cardinal numbers, i.e. one wanted to regard cardinal numbers as objects of the mathematical system. It turned out to be quite a problem to define the cardinal V̿ of a set V as an object of set theory. In naive set theory, as well as in Quine’s ‘New Foundations’ (see section 7.3), the definition of the cardinal V̿ of V poses no problem: V̿ can be defined as the set of all sets equivalent to V. But this definition of cardinal numbers (first given by Frege, see section 3.6) can lead to a paradox that was first found by Cantor.

Cantor’s paradox: The set of all sets is its own power set. Therefore, the cardinality of the set of all sets must be bigger than itself.

In axiomatic set theory however (e.g. in ZF, see section 5.3), without the unrestricted comprehension axiom, there is no set which contains all sets equivalent to V. With this paradox the need arose to find a new definition of cardinals in a context without the unrestricted comprehension axiom, such that traditional paradoxes could no longer be derived.

Several new definitions of cardinal numbers were then proposed, based on ordinal numbers (for which we refer to the next section⁸). The following definition, which comes from the mathematician von Neumann, is now the standard definition for cardinal numbers.

Definition of Cardinal number (or initial number): A cardinal number α := an ordinal number α with property (∀γ :: α ∼ γ → α ≤ γ)

For each set V we can prove (see [17, section 2.10]) that there exists exactly one cardinal number α satisfying V ∼ α (the proof uses AC). We call this unique α the cardinality or cardinal number of the set V; it is also denoted by V̿.

In other words, with the axiom of choice we can develop the theory of ordinals in the von Neumann way and define V̿ to be the least ordinal α equivalent to V. The existence of such an α is guaranteed by the well-ordering theorem. If we have the axiom of foundation among our axioms, even if the axiom of choice is absent we can define V̿ as the set of all sets W of least rank among those equivalent with V (see [1]). In the absence of the axioms of choice and foundation the operation V̿ is undefinable (see [1]).

For more information on the definition and calculus of cardinal numbers, we refer to [59, chapter 6], [25] and [34].

⁸ The rest of this section depends on concepts that are defined in later chapters.


3.8.2 Ordinal numbers and Burali-Forti’s Paradox

We already introduced Cantor’s concept of cardinal number in section 3.6, and saw in the previous paragraph that it abstracts from the order and nature of the elements of a set. Cantor also defined a property of sets, the ordinal number, that only abstracts from the nature of the elements of a set, but retains the order in which they are given.

Here we consider sets with a total ordering (see page 25). Recall that, in addition, for a well-ordered set each non-empty subset also has a first member in the given ordering (see also section 3.2). In the case of ordered sets, the concept of equivalence is now replaced by the sharper concept of similarity. We consider two ordered sets V and W similar, notation V ≃ W, if there is a bijection between V and W that retains all order relations. Note that we have already seen this relation with the concept of isomorphism (‘is isomorphic to’, see page 31), and note that ≃ is an equivalence relation. Instead of saying two sets are similar, we also can say they are of the same order type.

Definition of an Order Type: An equivalence class under the (isomorphism) relation ≃

The equivalence class to which an ordered set V belongs is called the order type of V. All well-ordered sets that are as such similar to a given set V have a common property. Cantor identified this property with the ordinal number V̄ of a well-ordered set V, a property that only abstracts from the nature of the elements of a set. And just as for cardinals (see section 3.8.1) the question was posed how to define ordinal numbers as part of set theory. In 1883 Cantor defined in [13] an ordinal number as the order type of a well-ordered set.

Definition of Ordinal Number (Cantor): A well-ordered set V has ordinal number o := o is the order type of V

If a set is finite and simply ordered, it is well-ordered and it has an ordinal number. The ordinal number of that set is the same, regardless of the order of the elements. For each finite and simply ordered set, we can therefore identify the (finite) cardinal number with the ordinal number.

Example: 0 = ∅; 1 = {0}; 2 = {0, 1}; 3 = {0, 1, 2} are ordinal numbers.

The smallest infinite ordinal number is called ω. This is the ordinal number of the sequence 0, 1, 2, 3, . . ., which can be seen as N or as the sequence of finite cardinal numbers in their ‘natural’ order. We introduce some other transfinite ordinals by example (from [10, page 66]).

Example:

If we call the set ∅ ‘0’, the next set ‘1’, etc., then consider the union of all the sets 0, 1, 2, . . . . This is another ordinal, called ω, and it is the first non-finite ordinal. It has a successor: ω ∪ {ω}, called ω + 1. More ordinals can be obtained by continuing this succession, and taking the union of all these ordinals yields an ordinal we call ω ∗ 2, etc. The natural numbers in reverse order are denoted ∗ω.

V1 = {2, 3, 4, . . . , 1} ; V2 = {3, 4, 5, . . . , 1, 2}

V3 = {1, 3, 5, . . . , 2, 4, 6, . . .} ; V4 = {. . . , 3, 2, 1}

V5 = {1, 3, 5, . . . , 6, 4, 2} ; V6 = {1, 11, 21, . . . , 2, 12, 22, . . .}

The order types of these sets, each taken in the order in which its elements are listed, are:

N : ω ; V1 : ω + 1 ; V2 : ω + 2 ; V3 : ω + ω = ω ∗ 2

V4 : ∗ω ; V5 : ω + ∗ω ; V6 : ω ∗ 10
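The rearrangements V1, V3 and V6 can be made tangible by sorting a finite sample of the natural numbers with a suitable sort key. The following is only a sketch: the key functions and the sample range 1..30 are assumptions made for illustration, and a finite sample can hint at an order type but never exhibit it.

```python
# Each key maps n to a tuple; Python compares tuples lexicographically,
# so the first component decides which "block" a number belongs to.

def key_v1(n):
    # V1: 2, 3, 4, ..., 1   (the number 1 is moved behind all others)
    return (n == 1, n)

def key_v3(n):
    # V3: 1, 3, 5, ..., 2, 4, 6, ...   (all odd numbers before all even ones)
    return (n % 2 == 0, n)

def key_v6(n):
    # V6: 1, 11, 21, ..., 2, 12, 22, ...   (ten blocks, grouped by last digit)
    return ((n - 1) % 10, n)

sample = range(1, 31)
v1 = sorted(sample, key=key_v1)
v3 = sorted(sample, key=key_v3)
v6 = sorted(sample, key=key_v6)

print(v1[:3], "...", v1[-1])
print(v3[:3], "...", v3[15:18])
print(v6[:6])
```

In each case the first tuple component numbers the blocks and the second orders the elements inside a block, which mirrors the way ω + 1, ω + ω and ω ∗ 10 are built from copies of ω.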

For ordinal numbers n of N and m of M we say that n < m if the well-ordered set N is similar to a proper initial segment of M.

Unfortunately, a situation similar to that for cardinal numbers was found for ordinal numbers. In 1897 Burali-Forti, the Italian assistant of the mathematician Peano, found that this definition can give rise to a paradox (see [18, page 259]).


The Burali-Forti Paradox: The set of all ordinal numbers, taken in their natural order, forms a well-ordered series, and therefore also has an ordinal number Ω. But the ordinal number of any initial segment of the set of all ordinals exceeds every number of that segment, and therefore Ω exceeds any ordinal number whatsoever, including Ω itself.

This led to new proposals for definitions of ordinal numbers. We present below another definition, given by John von Neumann in [61]. In 1923 he pointed out that among all well-ordered sets having a Cantorian ordinal as their order type, there is a particular one with some very special properties. Von Neumann defined this particular set as the ordinal of that order type.

Definition of ordinal number: A set α is an ordinal number :=

1) α is a well-ordered set with the binary relation ∈ as its ordering

2) (∀β :: β ∈ α → β ⊂ α)

With this definition of ordinal numbers, the Burali-Forti paradox can no longer be derived: the collection of all ordinals is well-ordered by ∈ and 2) also holds for it, so if it were a set it would itself be an ordinal number and hence a member of itself, which is impossible; it is therefore not a set (a proof is given in [59, section 4.2]). According to this definition, the empty set is an ordinal number. This ordinal number is also denoted by 0. Similarly we denote the ordinal number {0} by 1, {0, 1} by 2, {0, 1, 2} by 3, etc. Otherwise said: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, . . .. These ordinal numbers, which are finite sets, are called finite ordinal numbers. The finite ordinal numbers are identified with the natural numbers. The set ω = {0, 1, 2, . . .} of all natural numbers is also an ordinal number. An ordinal number that is an infinite set, like ω, is called a transfinite ordinal number. For every well-ordered set V, there exists exactly one ordinal number isomorphic to V.
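The finite von Neumann ordinals can be built explicitly. The sketch below uses Python's `frozenset` as a stand-in for hereditarily finite sets; the helper names `successor`, `finite_ordinal` and `is_transitive` are chosen here for illustration and do not come from [61]. It checks, for the first few ordinals only, that every member is also a subset and that membership coincides with the usual order on the natural numbers.

```python
def successor(alpha: frozenset) -> frozenset:
    """Return alpha ∪ {alpha}, the next ordinal after alpha."""
    return alpha | frozenset([alpha])

def finite_ordinal(n: int) -> frozenset:
    """Build the set 0 = ∅, 1 = {0}, 2 = {0, 1}, ... for a given n."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha

def is_transitive(alpha: frozenset) -> bool:
    """Check that every member of alpha is also a subset of alpha."""
    return all(beta <= alpha for beta in alpha)

ordinals = [finite_ordinal(n) for n in range(6)]

# The ordinal n has exactly n members, namely 0, 1, ..., n - 1.
print([len(alpha) for alpha in ordinals])

# Membership agrees with the usual "less than" on 0..5.
print(all((ordinals[m] in ordinals[n]) == (m < n)
          for m in range(6) for n in range(6)))
```

This covers finite ordinals only; ω and the transfinite ordinals have no finite representation of this kind.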

Definition of ordinal number of a well-ordered set V: The ordinal number of a well-ordered set V := the ordinal number isomorphic to V


A detailed treatment of ordinal calculus that is based on this definition of ordinal numbers is outside the scope of this report. In the remainder of this section we will only define the most common concepts.

As we saw in 3.2 we also write α ∈ β (we denote ordinals by lower-case Greek letters) as α < β, which defines an ordering on the ordinal numbers. The least ordinal number is of course 0, and the ordering of the finite ordinal numbers coincides with the usual ordering of the natural numbers. The least transfinite ordinal is ω (see also 5.3.2). The ordering ≤, defined by α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the ordinal numbers. Therefore we can apply transfinite induction (see page 37) on ordinal numbers.

For any ordinal number α, the set α′ = {γ | γ ≤ α} (called a segment of α) also is an ordinal number; α′ is the successor of α, and α is the unique predecessor of α′. A transfinite ordinal without a predecessor is called a limit ordinal number, and all the other ordinal numbers are called isolated ordinal numbers. The first limit ordinal number is ω. For any set V of ordinal numbers, {γ | (∃η : η ∈ V : γ < η)} is an ordinal number, the supremum of V.
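For finite von Neumann ordinals both constructions can be computed directly: the segment of α is α ∪ {α}, and the supremum of a set of ordinals is the union of its members. The sketch below again models sets as `frozenset` values; the helper names are illustrative assumptions, and the subset test `<=` is used for ≤, which agrees with the ordinal order on these sets.

```python
def successor(alpha: frozenset) -> frozenset:
    """Return alpha ∪ {alpha}."""
    return alpha | frozenset([alpha])

def finite_ordinal(n: int) -> frozenset:
    """Build the finite ordinal n = {0, 1, ..., n - 1}."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha

def supremum(ordinals) -> frozenset:
    """Union of all given ordinals: the least ordinal >= each of them."""
    return frozenset().union(*ordinals)

three = finite_ordinal(3)

# The segment {γ | γ ≤ 3}, collected from a larger ordinal, is the ordinal 4.
segment = frozenset(g for g in finite_ordinal(10) if g <= three)
print(segment == successor(three) == finite_ordinal(4))

# The supremum of {1, 4, 2} is 4.
sup = supremum([finite_ordinal(1), finite_ordinal(4), finite_ordinal(2)])
print(sup == finite_ordinal(4))
```

The same union applied to all finite ordinals at once would give ω, which is why ω has no predecessor; that step cannot be carried out on a finite machine and is not attempted here.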

A full treatment of the theory of ordinal numbers is omitted here. Rigorous study has produced a complete calculus of ordinal numbers and produced significant results. We only mention here the so-called well-ordering theorem, which Cantor had accepted as true (see [18, page 257]) but which was first proved rigorously by Zermelo in 1904.

Well-Ordering Theorem: Every set can be well-ordered.

This means that ordinals give us a way of ‘counting’ any set, even if it is not finite. The particular significance of the well-ordering theorem lies in the possibility that we can apply the principle of mathematical induction (which is well known for denumerable sets, see section 3.4.3) to any arbitrary well-ordered set. Ordinal numbers form the basis of transfinite induction, which is a generalization of the principle of induction.


We now have the following properties (given without proof):

• Two finite and ordered sets have the same order type if and only if they have the same cardinal number

• Cantor’s theorem: the cardinality of any set is lower than the cardinality of the set of all its subsets (i.e. there is no highest aleph)

• If two sets have the same ordinal number, they have the same cardinal number, but not necessarily vice versa

For more information and theory on cardinal numbers, ordinal calculus and set theory we refer to two classical books on set theory: [25] and [34]. The first one gives a good introduction to set theory and presupposes little mathematical knowledge; the latter is more suitable for readers with experience in set theory.

Chapter 4

Peano and Frege

4.1 Peano’s arithmetic

Questions that pertain to the foundations of mathematics, although treated by many in recent times, still lack a satisfactory solution. The difficulty has its main source in the ambiguity of language.

- Peano in the opening of the paper ‘Arithmetices Principia, nova methodo exposita’, in which he introduces axioms for the integers

The Italian mathematician Giuseppe Peano (1858-1932) worked successively on the infinitesimal calculus, on the foundations of mathematics and on linguistic studies. After his work on calculus (see Peano’s first publication [65]) and geometry (see [66] [67]), Peano gained particular interest in the field of number theory, also known as arithmetic. Like Dedekind (see quote on page 46), Peano became aware of the lack of rigour in mathematics through his experience in teaching infinitesimal calculus.

What is number theory? The field of mathematics consisting of the study of the properties of the natural numbers.

Since then, Peano strove for rigor, for an abstract mathematics. He came to the conclusion that mathematics must be constructed, independently of intuition or common sense, in a way that absolutely guarantees the validity of its theorems.

In order to satisfy this requirement he devoted himself to the transformation of mathematics into a self-contained system, and rewrote mathematics in symbolic form as an axiomatic system (see section 6.1), based exclusively on postulated primitive notions and primitive propositions. To discard intuition, he first renounced ordinary language (because it is often insufficient and imprecise) and desired a new mathematical symbolism, consisting entirely of neutral symbols. Second, he formalized the logic of the mathematical argument to replace intuitive inference by the application of a limited number of stated logical rules.

So Peano formalized both the language of mathematics and the logic of the mathematical argument, and to that end he developed parts of symbolic logic and gave a first formalization of propositional and predicate calculus. This development was rudimentary and would later be worked out in full detail by the mathematicians Russell and Whitehead in ‘Principia Mathematica’ (1910, see section 7.1). He introduced letters to denote propositions and propositional functions (Peano’s logic notation) and the symbol ∈ for the membership relation of a set.

The work of formalization of mathematics was published in the journal ‘Rivista di Matematica’ (a journal he had founded himself) and in ‘Formulario Mathematico’, a series of 5 books that is also known as ‘Formulaire de Mathematique’1. In 1899 he axiomatized the arithmetic of cardinal numbers, to be published in the third volume of ‘Formulario Mathematico’ in 1901. Peano based the foundations of arithmetic on 5 axioms (see [31, page 227]) that are formulated with the help of three (undefined) terms, acquaintance with which is assumed:

a) N (the set of natural numbers)

b) 0 (the particular natural number zero)

c) a+ (the immediate successor of the natural number a)

1 The original ‘Formulaire de Mathematique’ was called ‘Formulario Mathematico’ when the first final version appeared in 1908, because Peano at that time consistently used Interlingua, his simplified dialect of Latin, for all his mathematical publications.


Definition of the Peano axioms for the natural numbers:

1) 0 ∈ N

(zero is a natural number)

2) a ∈ N→ a+ ∈ N

(the immediate successor of any number is a number)

3) 0 ∈ S ∧ (∀x :: (x ∈ S → x+ ∈ S)) → N ⊂ S

(if a set S contains zero and if, whenever it contains a number x, it also contains the immediate successor x+ of that number, then S includes the whole of N)

4) a, b ∈ N ∧ a+ = b+ → a = b

(no two different numbers have the same immediate successor)

5) a ∈ N → a+ ≠ 0

(zero is not the immediate successor of a number)

Axiom 3 formalizes the principle known as mathematical induction. We can show that the five axioms of Peano can be derived in ZF (see section 5.3). For more information on the Peano axioms, I refer to [31, chapter 5], [49, page 146-147] and [64, appendix A].

After defining the natural numbers, Peano used recursive definitions to define the arithmetical sum, product and other operators, and he derived much of elementary number theory.

Example: Peano defined the sum a + b by recursion with respect to b: a + 0 = a, a + (b+) = (a + b)+. Similarly we can define the product a ∗ b: a ∗ 0 = 0, a ∗ (b+) = (a ∗ b) + a.
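The two recursive definitions translate almost literally into a program. In the sketch below a natural number is represented by nested tuples, with `()` for 0 and `(a,)` for a+; this representation and the helper names `succ`, `add`, `mul`, `from_int` and `to_int` are assumptions made for illustration, not Peano's notation.

```python
ZERO = ()

def succ(a):
    """Return a+, the immediate successor of a."""
    return (a,)

def add(a, b):
    """a + 0 = a,  a + (b+) = (a + b)+"""
    if b == ZERO:
        return a
    return succ(add(a, b[0]))

def mul(a, b):
    """a * 0 = 0,  a * (b+) = (a * b) + a"""
    if b == ZERO:
        return ZERO
    return add(mul(a, b[0]), a)

def from_int(n):
    """Build the numeral for n by applying succ n times to ZERO."""
    a = ZERO
    for _ in range(n):
        a = succ(a)
    return a

def to_int(a):
    """Count how many times succ was applied."""
    n = 0
    while a != ZERO:
        a = a[0]
        n += 1
    return n

two, three = from_int(2), from_int(3)
print(to_int(add(two, three)))  # 5
print(to_int(mul(two, three)))  # 6
```

In this representation `succ(a)` is never equal to `ZERO`, and `succ(a) == succ(b)` only when `a == b`, which corresponds to axioms 5 and 4.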

Peano then showed how rationals and reals can be formally obtained from naturals, and further considered elementary analysis and geometry. In later years, Peano turned away from the foundations of mathematics and devoted most of his time to his new international auxiliary language Interlingua. He invented this language (see [49, page 148-150]) in an attempt to reduce the grammatical structure of languages and create a universal language. His mathematical work was to have a profound influence on mathematical thought, but his language Interlingua received little response.


4.2 Frege’s work

As I think about acts of integrity and grace, I realize that there is nothing in my knowledge to compare with Frege’s dedication to truth. His entire life was on the verge of completion, much of his work had been ignored to the benefit of men infinitely less capable, his second volume was about to be published, and upon finding that his fundamental assumption was in error, he responded with intellectual pleasure clearly submerging any feelings of disappointment. It was almost superhuman and a telling indication of that of which men are capable if their dedication is to creative work and knowledge instead of cruder efforts to dominate and be known.

- B. Russell about Frege, in [93, page 127]

The German mathematician and philosopher Gottlob Frege (1848-1925) was one of the founders of modern symbolic logic, putting forward the (logistic) view that mathematics is reducible to logic. He wrote many important papers on philosophy. Frege once said ‘every good mathematician is at least half a philosopher, and every good philosopher is at least half a mathematician’. Famous is his analysis of the ontological argument for the existence of God, but we will not discuss his philosophical writings here. We will mention his three most important works on the foundations of mathematics: Begriffsschrift, Grundlagen der Arithmetik and Grundgesetze der Arithmetik.

Begriffsschrift

Like Peano, the German mathematician Gottlob Frege invented a logical symbolism, to which he gave the name ‘Begriffsschrift’ (in English known as ‘Concept script’). We will not treat the symbolism that was used in Begriffsschrift in full detail here (it can be found in [49, page 175-182] and in [31, page 177-199]), but give a few examples of his new logic and describe the rest of his work in general terms.

Frege rejected the subject/predicate regimentation on which Aristotelian logic depends, and recognized (though he was not the first to do so) that the patterns of Aristotle cannot always be used to evaluate inferences correctly.


Example: Certain obvious inferences, such as:

If Joe doesn’t wear a kilt, then Joe is not Scottish.

Joe doesn’t wear a kilt.

Therefore, Joe is not Scottish.

do not fall under the patterns of traditional logic (also called syllogisms). Actually this is another kind of inference, one that contains a conditional expression, of the form:

if B then A

B

Therefore, A.

Frege adopted this new rule in the system of logic of his Begriffsschrift. With arbitrary expressions for A and B, the rule later became known as modus ponens. A logic that evaluates these sorts of expressions is called a propositional logic.

What is propositional calculus (or sentential calculus)? A symbolic system for treating compound propositions and their logical relationships. Compound propositions are formed via a set of derivation rules using standard symbols: ∧, ∨, →, ¬. Basic propositions consist of simple, unanalyzed propositions.

Frege based his propositional calculus on 6 axioms: for all x, y and z:

1 x→ (y → x)

2 (x→ (y → z))→ ((x→ y)→ (x→ z))

3 (x→ (y → z))→ (y → (x→ z))

4 (x→ y)→ (¬y → ¬x)

5 ¬¬x→ x

6 x→ ¬¬x
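That each of the six axioms is a tautology of classical two-valued logic can be checked mechanically with truth tables. The sketch below does so by brute force over all assignments to x, y and z; the helper names `implies` and `is_tautology` are illustrative assumptions, and the check concerns only the truth-functional reading of → and ¬, not Frege's own two-dimensional notation.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p → q is false only when p is true and q is false."""
    return (not p) or q

# The six axioms as Boolean functions of x, y, z (unused arguments are ignored).
AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(implies(x, y), implies(x, z))),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}

def is_tautology(formula) -> bool:
    """True if the formula holds under all eight truth assignments."""
    return all(formula(x, y, z)
               for x, y, z in product((False, True), repeat=3))

results = {number: is_tautology(formula) for number, formula in AXIOMS.items()}
print(results)

# Modus ponens preserves truth: whenever B → A and B hold, A holds.
modus_ponens_sound = all(
    a for a, b in product((False, True), repeat=2) if implies(b, a) and b
)
print(modus_ponens_sound)
```

Since the axioms are tautologies and modus ponens preserves truth, everything derivable from them is a tautology as well; the converse (completeness) is not something a truth-table check of the axioms can establish.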


Derivations in the propositional calculus were based on two procedures of substitution and the rule of modus ponens. For the full calculus of predicates, three additional axioms were needed. For all x, y and (propositional functions) F:

7 (x = y)→ (F (x)→ F (y))

8 x = x

9 (∀x :: F (x))→ F (y)

Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists of three parts. In the first part he provides a list of inferences from which, he believes, all truths of logic can be derived. Then Frege demonstrates in the second part the completeness of his logic (i.e. all inferences that can be shown to be valid inferences using the techniques of Aristotelian or propositional logic can also be shown to be valid using only Frege’s laws and rules of inference). The third part of Begriffsschrift shows that logic alone suffices to show the validity of certain inferences (about properties that are hereditary in so-called ‘ancestral sequences’). He also showed that mathematical induction (see section 3.4.3) can be replaced by a principle about ancestral sequences that depends only on logical laws.

Grundlagen der Arithmetik

Throughout his work Frege developed (as the first to do so) the main thesis of logicism, that mathematics is reducible to logic. But to that end he had to do more than develop a new logical symbolism. His next book, ‘Die Grundlagen der Arithmetik’ (1884), was devoted to the ‘foundations of arithmetic’. In this book, Frege treated the foundations of arithmetic, based on the concept of (cardinal) numbers. He put forward the logicist philosophy that arithmetic could be founded upon logic alone, and he discussed the work of others in detail (see [49, 184-185]). In [31, page 183] we learn more about Frege’s philosophy. In the introduction of his book Frege announced his three guiding principles:

1) Always to separate sharply the psychological from the logical, the subjective from the objective

2) Never to ask for the meaning of a word in isolation, but only in the context of a proposition

3) Never to lose sight of the distinction between concept and object

In his book he presented his own theory of numbers, and wanted to show that all the truths of arithmetic are derivable from logical laws and definitions alone. He did this by sketching the proofs, without giving the official Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that he needed a new version of Begriffsschrift, to accommodate the new requirements of his formalization of the concept of number, but also to fill in pieces that were simply missing.

Grundgesetze der Arithmetik

In his next three papers, ‘Function and Concept’, ‘On Sense and Meaning’ and ‘On Concept and Object’ (1892), he introduced all the modifications that he was to make to his language, Begriffsschrift, and his logical system. During that period he also completed his definitions of the natural numbers and some of the proofs of simple truths of arithmetic from these definitions and logical laws. His new logical calculus included a symbolic representation of the truth value of any given proposition, which provided a shorter notation for many Begriffsschrift propositions. The calculus also had several other new logical and arithmetical symbols, one of the most important of them being a notation for what Frege called the ‘course-of-values’ of a propositional function. The course-of-values of a propositional function ϕ, denoted by Frege as ἐϕ(ε), represented the truth values for all possible values of the argument (here ε). We denote it as cov and define equal course-of-values by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)).

In 1893, Frege published the first volume of his ‘Grundgesetze der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of logic and began the proofs that were to make the project successful. In the second part Frege wanted to define the natural numbers and some basic laws governing them and, in the third part, he would define the real numbers and lay the foundations for expressing analysis in terms of logic.

In 1902, when volume 2 was in press, he received a now famous letter from the English mathematician and logician Russell (see chapter 5), who pointed out, with great modesty, that a contradiction could be derived in Frege’s system (see section 5.1). This contradiction would later be named after Russell and become known as ‘Russell’s paradox’.


Hardly anything more unwelcome can befall a scientific writer than one of the foundations of his edifice be shaken after his work is finished. I have been placed in this position by a letter of Mr. Bertrand Russell just as printing of the second volume was nearing completion . . . .

- The first paragraph of the appendix from Frege’s ‘Grundgesetze der Arithmetik’

After many letters between the two (see for example [93, pages 124-128]), Frege modified one of his axioms and explained in an appendix to the book that this was done to restore the consistency of the system. However, with this modified axiom many of the theorems of volume 1 do not go through, and Frege must have known this. He probably never realized that even with the modified axiom the system is inconsistent, since this was not shown until after Frege’s death in 1925, by Leśniewski (see [85]).

The scope of Frege’s Grundgesetze is similar to that of Principia Mathematica (to be discussed in section 7.1), and both aimed at a logistic basis for mathematics, but with Russell’s theory of types Principia Mathematica did not contain the paradox. Frege’s contribution to the foundations of mathematics was therefore largely indirect (through Principia Mathematica, see [49, page 181]). Although Frege attracted only a small audience in his lifetime, he was a major influence on Peano and Russell, and in the years thereafter his influence on contemporary philosophy, especially on thought about language and logic, has become ubiquitous.

In this text I have made extensive use of the excellent books [98] and [97] about Frege, which contain many more references about Frege and his work, and of chapter 4.5 from [31] and chapter 6, section 4 from [49].

Chapter 5

Russell

The fact that all Mathematics is Symbolic Logic is one of the greatest discoveries of our age; and when this fact has been established, the remainder of the principles of mathematics consists in the analysis of Symbolic Logic itself.

- B. Russell in Principles of Mathematics, 1903

The English logician and philosopher Bertrand Russell (1872-1970) published in his long life an incredible number of books on logic, the theory of knowledge and many other topics. He certainly was one of the most important logicians and philosophers of the 20th century.

Russell’s private life, affairs, imprisonment, his social and political campaigns and advocacy of both pacifism and nuclear disarmament are certainly interesting, but we will not discuss these subjects here (for more information and references on Russell’s life and work see [62], [80] and [31, chapter 6, 7, 11 and sections 8.2, 8.3, 8.4, 8.8.3, 8.9.2, 10.1, 10.2.1]). I quote the following assessment from [73]: “Bertrand Russell had one of the most widely varied and persistently influential intellects of the 20th century. During most of his active life, a span of three generations, Russell had at any time more than 40 books in print ranging over philosophy, mathematics, science, ethics, sociology, education, history, religion, politics and polemic. The extent of his influence resulted partly from his amazing efficiency in applying his intellect (he normally wrote at the rate of 3,000 largely unaltered words a day) and partly from the deep humanitarian feeling that was the mainspring of his actions. This feeling expressed itself consistently at the frontier of social change through what he himself would have called a liberal anarchistic, left-wing, and skeptical atheist temperament.”

Here, we will focus on Russell’s mathematical contributions to the foundations of mathematics. His contributions relating to mathematics include his discovery of Russell’s paradox, his defense of logicism (the view that mathematics is, in some significant sense, reducible to formal logic), his introduction of the theory of types, and his refining and popularizing of the first-order predicate calculus. Along with Kurt Gödel (see chapter 8), he is usually credited with being one of the two most important logicians of the twentieth century. We will look at each of these contributions in more detail.

Russell discovered the paradox which bears his name in 1901, while working on his ‘Principles of Mathematics’ (1903). The paradox and the closely related vicious circle principle are discussed in section 5.1. Russell’s own response to the paradox came with the introduction of types (see chapter 7). Using the vicious circle principle also adopted by Henri Poincaré, together with Russell’s so-called ‘no-class’ theory of classes, Russell was then able to explain why the unrestricted comprehension axiom (see section 2.1) fails: propositional functions, such as ‘x is a set’, should not be applied to themselves since self-application would involve a vicious circle. On this view, it follows that it is possible to refer to a collection of objects for which a given condition (or predicate) holds only if they are all at the same level or ‘type’.

Although first introduced by Russell in 1903 in the Principles, his theory of types finds its mature expression in his 1908 article ‘Mathematical Logic as Based on the Theory of Types’ and in the monumental work he co-authored with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913). Principia Mathematica and the theory of types will be treated in detail in chapter 7. The theory admits of two versions, the ‘simple theory’ and the ‘ramified theory’. Both versions of the theory later came under attack. For some, they were too weak since they failed to resolve all of the known paradoxes. For others, they were too strong since they disallowed many mathematical definitions which, although consistent, violated the vicious circle principle. Russell’s response to the second of these objections was to introduce, within the ramified theory, the axiom of reducibility. Although the axiom successfully lessened the vicious circle principle’s scope of application, many claimed that it was simply too ad hoc to be justified philosophically.

Of equal significance during this same period was Russell’s defense of logicism, the theory that mathematics was in some important sense reducible to logic. First defended in his Principles, and later in more detail in ‘Principia Mathematica’, Russell’s logicism consisted of two main theses. The first is that all mathematical truths can be translated into logical truths or, in other words, that the vocabulary of mathematics constitutes a proper subset of that of logic. The second is that all mathematical proofs can be recast as logical proofs or, in other words, that the theorems of mathematics constitute a proper subset of those of logic.

Like that of Gottlob Frege, Russell’s basic idea for defending logicism was that numbers may be identified with sets of sets and that number-theoretic statements may be explained in terms of quantifiers and identity. It followed that number-theoretic operations could be explained in terms of set-theoretic operations such as intersection, union, and the like. In ‘Principia Mathematica’ Whitehead and Russell were able to provide detailed derivations of many major theorems in set theory, finite and transfinite arithmetic, and elementary measure theory. A fourth volume on geometry was planned but never completed.

For more information on Russell’s theory of types and about Principia Mathematica, we refer to chapter 7. In this chapter we used parts of [73] and [39].


5.1 Russell’s paradox

I hoped sooner or later to arrive at a perfect mathematics which should leave no room for doubts, and bit by bit to extend the sphere of certainty from mathematics to other sciences.

- Russell, in [78]

Paradoxes have been known for a long time, but with the introduction of more formal systems at the end of the 19th century in particular, paradoxes became more influential on the foundations of mathematics. Before we describe the most famous paradox of Russell, we first define the notion of a paradox.

What is a paradox? A paradox is a statement which appears self-contradictory or contrary to expectations, and is also known as an antinomy.

In an axiomatic system (see section 6.1) a paradox is a derivation thatleads to a contradictory statement.

A paradox is properly something which is contradictory to general opinion; but is frequently used to signify something self-contradictory [...] Paralogism, by its etymology, is best fitted to signify an offence against the formal rules of inference.

- De Morgan, in [31, page 310]

In [86], three ‘paradox threats’ are identified: when systems are complex, formal or designed for computers, there often is not enough intuition to notice inconsistencies. With the previously described formalizations, the systems of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2), not to mention that of Russell himself, were at risk. And indeed, in 1902 Russell discovered a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox turned out to lie at the basis of mathematics, since it could be formulated in all the systems mentioned above. We first formulate the paradox in Cantor’s set theory:

Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R


Russell in 1901 studied Cantor’s work [31, section 6.6.1] and, after noting that some sets belong to themselves while the rest do not, showed that the set of all sets which do not belong to themselves belongs to itself if and only if it does not do so, and, by repetition of the argument, vice versa. Russell also expressed this paradox in terms of predicates, and as such first presented his discovery in a letter to Frege (see [93, page 124] and see also the quote on page 78).
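The predicate form of the paradox has a simple computational analogue. In the sketch below, a predicate is modelled as a Python function that may be applied to any function, including itself; the name `russell` and this modelling are assumptions made for illustration. Asking whether `russell` holds of itself requires the negation of that very answer, so the evaluation never settles on a truth value, which Python reports as a `RecursionError`.

```python
def russell(p):
    """Holds of a predicate p exactly when p does not hold of itself."""
    return not p(p)

def evaluate_self_application():
    """Try to decide russell(russell); return None if no value is reached."""
    try:
        return russell(russell)
    except RecursionError:
        # russell(russell) = not russell(russell) = not not russell(russell) = ...
        return None

# On ordinary predicates the definition behaves as expected.
always_true = lambda p: True
always_false = lambda p: False
print(russell(always_true))    # False: always_true holds of itself
print(russell(always_false))   # True: always_false does not hold of itself

outcome = evaluate_self_application()
print(outcome)                 # None: no stable truth value exists
```

This is only an analogy: a non-terminating evaluation is not the same thing as a derivable contradiction, but it shows the same circular dependency of the answer on itself.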

Since Peano’s system was based on the set theory of Cantor, Peano’s work also contained the paradox. In Frege’s work (Grundgesetze der Arithmetik) self-application was not possible, so R ∈ R was not allowed, but the paradox could still be expressed by using Frege’s notion (see page 77) of the course-of-values of a function. If we define equal course-of-values cov by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)), we can derive the paradox in Frege’s work as follows (see also [86, page 7] for a slightly different proof):

Define f(x) := (¬∀ϕ :: (cov(ϕ) = x)→ ϕ(x)), and let K := cov(f).

¬f(K)

≡ def. f

¬(¬∀ϕ :: cov(ϕ) = K → ϕ(K))

≡ elim.¬¬

(∀ϕ :: cov(ϕ) = K → ϕ(K))

≡ instantiate ϕ with f

cov(f) = K → f(K)

≡ def. K, elim. →

f(K)

The paradox had a big influence, since it could be formulated in all systems, and in classical logic every statement is entailed by a contradiction. In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore appeared that no proof could be trusted once it was discovered that the logic underlying all mathematics was inconsistent. Russell’s paradox arises as a result of naive set theory’s so-called unrestricted or naive comprehension axiom (see page 16). Cantor created this axiom with the intuition that any coherent condition may be used to determine a set. But that means that the condition ϕ that determines a set V = {x | ϕ(x)} may depend on the whole set V, i.e. it allows impredicative definitions (see below for the definition of impredicative). Most attempts at resolving Russell’s paradox have therefore concentrated on various ways of restricting or abandoning this axiom.

Before we consider the consequences of the discovery of the paradox, we first take a further look at the nature of the paradox, following Russell’s own analysis. While writing ‘The Principles’, Russell’s attention was attracted by what is now known as Cantor’s paradox, and (according to a letter he wrote to the mathematician Jourdain) he found that there was something wrong with his earlier refutation of Cantor’s paradox (see [29, section 7]). He removed his earlier refutation from ‘The Principles’ and his revised diagnosis uncovered a true paradox. As we have already seen, he summarized this discovery and the reasoning that led to it in a second letter to Frege.

After discovering his famous paradox, Russell traced the fallacy back to what he called the ‘vicious circle principle’. The ‘vicious circle’ that his principle is named after arises from the assumption that a set of objects may contain members which can only be defined by means of the set as a whole. Therefore, Russell said that statements are illegitimate and meaningless if they involve a set of objects that contains members which presuppose this (total or whole) set of objects. That means a statement is only legitimate if all propositions it contains refer to already defined sets.

Definition of impredicative: A definition is impredicative if it involves a set V that has a member v ∈ V whose definition depends on V.1

1 Note that a direct implementation of this definition as a new axiom of set theory is not possible. We might rephrase the definition as ‘whatever set contains an apparent element, that element must not be dependent on that set’. This might be implemented by fixing ‘an apparent element’ of a set and then expressing its independence of the other elements of that set. This independence means that, regardless of the nature of the elements of the


In a sense those impredicative definitions are thus circular, and were considered the cause of antinomies. For more information about impredicativity, see [57, section 15.3].

Definition of Vicious Circle Principle²: Definitions, assumptions or statements involving all of a set must not be a part or an element of that set. In other words, impredicative definitions should be avoided.

In terms of set theory we can formulate the principle as: No set V is allowed to contain members v definable only in terms of V, or members v involving or presupposing V.

Vicious circle fallacies are arguments that are condemned by the vicious circle principle. Such arguments may not necessarily lead to contradictions (since fallacious arguments can lead to true conclusions).

In Principia Mathematica (see [31, section 7.2]), Russell assembles a collection of seven different paradoxes, all of which were based on the same circular type of reasoning, and then he resolved them by making their circularity explicit. We will now mention eight of the most well-known paradoxes, most of which stem from violations of the vicious circle principle.

set, the nature of the apparent element remains the same. The ‘nature’ of the elements can be seen as all the members of that element (or in case the element is an individual, the nature of the apparent element can be seen as that individual). This leads us to the following axiom:
(∀X :: (∀x : x ∈ X : x = a → (∀x′ : x′ ∈ X ∧ x ≠ x′ : x′ = b(x′) → a ∈ X))).
Clearly this does not avoid the paradox of Russell. We consider a set X := R ≡ {x | x ∉ x} and an element x ∈ R, i.e. we have x ∉ x. Despite the fact that the set X is ‘too large’, the axiom does not prohibit the existence of the set X. The axiom tells us x = a → (∀x′ : x′ ∈ R ∧ x ≠ x′ : x′ = b(x′) → a ∈ R). In other words, we can change each element in R except x and the nature of x should not depend on it. The only thing we know about x is that x ∉ x and x ∈ R. So to obtain a contradiction we have to show that x ∈ x ∨ x ∉ R. Now we can change all x′ into any value b(x′), but still we will have x ∉ x and x ∈ R. So unfortunately this most ‘direct’ attempt to solve the paradox fails.

²Russell formulated it originally as ‘Whatever involves all of a collection must not be one of the collection’. Or, as formulated in [49, page 113]: ‘If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total’. Another formulation of [87] says ‘No entity can be defined in terms of a totality of which it is itself a possible member’.


1 Russell’s paradox (1903), which we have discussed in this section. The impredicativity is clear in the definition of the set that contains all sets that are not members of themselves. There are many popularizations of this paradox; one of them is from Russell himself (1919) and concerns the plight of the barber of a certain village who has enunciated the principle that he shaves all those persons of the village, and only those, who do not shave themselves. The paradox is then formed by the question ‘Does the barber shave himself?’.

2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2. The impredicativity comes from the ordinal number of the naturally ordered set of all ordinal numbers.

3 Cantor’s paradox, which we have discussed in section 3.8.1. The impredicativity comes from the cardinal number of the set of all sets.

4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I am lying’, his utterance is self-contradictory, and it cannot be either true or false. The oldest form of this particular paradox, in the words of Principia Mathematica, is that of Epimenides the Cretan, ‘who said that all Cretans were liars, and all other statements made by Cretans were certainly lies’.”.

5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956) published a paradox in [74] in 1905. He considered a set V of all non-terminating decimals that can be defined in a finite number of words. By arranging V as a sequence, and applying Cantor’s diagonal argument to the members of V, a different non-terminating decimal was produced, which is nevertheless defined in a finite number of words.

6 Paradox of definitions. Again we quote from [49]: “The possible definitions of specific ordinal numbers can be arranged in a sequence, and there are therefore at most ℵ0 of them. But the totality of ordinal numbers is not denumerable, and so there exist ordinal numbers which cannot be individually defined. Among such indefinable ordinals there is a least, and thus it appears that the description ‘the least indefinable ordinal’ yields a definition of an entity that cannot be defined.”.

7 Berry’s paradox: “The least integer not nameable in fewer than nineteen syllables” is itself a name that contains only eighteen syllables.

8 The Grelling-Nelson paradox: The German philosopher Kurt Grelling (1886-1942) published a paradox with his friend Leonard Nelson (1882-1927) in 1908. As described in [31, page 336]: “Some words can be predicated of themselves: in English, ‘word’ is a word, ‘noun’ is a noun, and so on. This property is called ‘autological’, and is obviously itself autological. Other English words are not autological; ‘German’, say, or ‘verb’. They are called ‘heterological’ - but this word is heterological if and only if it is not so.”.

The first three paradoxes are logical paradoxes that can be formulated within Cantor’s set theory. The remaining five are mainly paradoxes of naming; they are of a semantic kind. All these paradoxes have stimulated fundamental research, especially Russell’s paradox, which revealed the vicious circle principle and first showed the need for a theory of types or some other restriction of the power of the comprehension axiom.
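The circularity in Russell’s paradox (and in its barber popularization) can be checked mechanically on a small scale. The following is a minimal sketch, not taken from the source: it assumes a hypothetical finite universe of n objects with an arbitrary membership relation, stored as a boolean table `member[x][r]` meaning ‘x ∈ r’ (or ‘r shaves x’), and it searches every such relation for an object r that contains exactly those x with x ∉ x. The function names are illustrative only.

```python
from itertools import product

def has_russell_element(member, n):
    """True if some r satisfies: for every x, (x in r) <=> not (x in x)."""
    return any(
        all(member[x][r] == (not member[x][x]) for x in range(n))
        for r in range(n)
    )

def count_relations_with_russell_element(n):
    """Enumerate all 2**(n*n) membership relations on n objects."""
    count = 0
    for bits in product([False, True], repeat=n * n):
        member = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        if has_russell_element(member, n):
            count += 1
    return count

# None of the 512 relations on three objects admits a 'Russell element':
# taking x = r would require (r in r) == not (r in r).
print(count_relations_with_russell_element(3))  # 0
```

The search is exhaustive only for the chosen n; the general impossibility follows from the diagonal step noted in the comment, not from the enumeration.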


5.2 Consequences and philosophies

Perhaps the greatest paradox of all is that there are paradoxes in mathematics.

- E. Kasner and J. Newman quoted in [46]

The various proposals to overcome this paradox led to various theories. One proposal was to reconstruct set theory on an axiomatic basis (this axiomatic method was first suggested by Hilbert, see section 6.1) sufficiently restrictive to exclude the paradoxes. Hilbert and other formalists had the basic idea to allow the use of only well-defined and finitely constructible objects, together with rules of inference that were deemed to be absolutely certain.

In 1908 the mathematician Zermelo was the first to attempt to formulate proper axioms for set-theory such that the paradox is not deducible, but most other parts of set-theory are. This attempt was successful and, after a refinement by the mathematician Fraenkel, led to the ZF axiom system (see section 5.3), which is still the most accepted basis today. Subsequent refinements to ZF have been made by Skolem, and later by the three mathematicians von Neumann, Bernays and Gödel (see section 8.5).

Russell’s own response to the paradox came with the introduction of his theory of types in his Principia Mathematica (see section 5.4). Russell already laid out a first version of his theory to eliminate the paradoxes in 1908. Since self-application (R ∈ R) caused a contradiction, he decided to suppress this. With this approach he assigned types to variables (as types he took sets) and allowed expressions such as x ∈ y only if the type of x is one less (in some order) than the type of y. The outlawing of impredicative definitions seemed a solution to the known paradoxes in set theory. But it turned out there are essential and accepted parts of mathematics that contain impredicative definitions. This was a serious problem for Russell’s solution, despite the fact that many instances of impredicative definitions in mathematics could be circumvented. We quote from [22, page 265]: “In 1918, the German mathematician Hermann Weyl (1885-1955) tried to construct as much parts of analysis as possible from the natural number system without the use of impredicative definitions. Although he succeeded in obtaining a considerable part of analysis, he was unable to derive the important theorem that every nonempty set of real numbers having an upperbound has a least upperbound”.

Other attempts towards a solution for the paradoxes of set theory focus on the foundations of logic. Luitzen Brouwer and the intuitionists took this approach and tried to prevent the paradoxes by denying the principle of the excluded middle (which states that any mathematical statement is either true or false). Brouwer first attacked the logical foundations of mathematics in his doctoral thesis in 1907; this formed the beginning of the Intuitionist School. The intuitionists had the basic idea that one cannot assert the existence of a mathematical object unless one can also indicate how to go about constructing it.

In the period after the discovery of the paradoxes, we distinguish three main philosophies of mathematics: logicism, intuitionism and formalism.

What is Logicism? A school of mathematical thought which holds the thesis that mathematics is a part of (or a branch of) logic.

Logicists contend that all of mathematics can be deduced from pure logic, without the use of any specifically mathematical concepts, such as number or set. The first ideas date back to Leibniz (1666) and the actual reduction of mathematics to logic was started by Dedekind (1888) and Frege (1884-1903) and later by Peano, and Whitehead and Russell (in Principia Mathematica 1910-1913).

What is Intuitionism? A school of mathematical thought founded by the 20th century Dutch mathematician L.E.J. Brouwer (1881-1966) that contends that the primary objects of mathematical discourse are mental constructions governed by self-evident laws.

Intuitionists have challenged many of the oldest principles of mathematics as being non-constructive (and hence meaningless). They proposed that a proof in mathematics should be accepted only if it constructed the mathematical entity it talked about, and not if it merely showed that the entity ‘could’ be constructed or that supposing its non-existence would result in contradiction.


Brouwer had the fundamental insight that such nonconstructive arguments will be avoided if one abandons a principle of classical logic (which lies for example behind De Morgan’s laws). This is the principle of the excluded third (or excluded middle), which asserts that for every proposition ϕ, either ϕ or ¬ϕ; or equivalently that, for every ϕ, ¬¬ϕ implies ϕ. This principle is basic to classical logic and had already been enunciated by Aristotle, though with some reservations, as he pointed out that the statement “there will be a sea battle tomorrow” is neither true nor false.
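To make the two formulations of the excluded middle concrete, the sketch below evaluates ϕ ∨ ¬ϕ and ¬¬ϕ → ϕ in two settings. The first is the ordinary two-valued truth table. The second is a three-valued chain 0 < 1/2 < 1 with Heyting-style connectives, a standard textbook countermodel that is an assumption of this example and does not come from the source; it is meant only to illustrate that the principle can fail once truth values other than ‘true’ and ‘false’ are admitted, and it is not a full account of intuitionistic logic.

```python
from fractions import Fraction

TOP = Fraction(1)
BOTTOM = Fraction(0)
HALF = Fraction(1, 2)

def neg(a):
    """Heyting negation on a chain: only 'false' is negated to 'true'."""
    return TOP if a == BOTTOM else BOTTOM

def implies(a, b):
    """Heyting implication on a chain."""
    return TOP if a <= b else b

def lor(a, b):
    return max(a, b)

def valid(formula, values):
    """A formula is valid if it takes the top value for every input."""
    return all(formula(v) == TOP for v in values)

def excluded_middle(p):
    return lor(p, neg(p))

def double_negation(p):
    return implies(neg(neg(p)), p)

two_valued = [BOTTOM, TOP]
three_valued = [BOTTOM, HALF, TOP]

print(valid(excluded_middle, two_valued), valid(double_negation, two_valued))      # True True
print(valid(excluded_middle, three_valued), valid(double_negation, three_valued))  # False False
```

Both formulas fail at the middle value: ¬(1/2) = 0, so ϕ ∨ ¬ϕ evaluates to 1/2, and ¬¬(1/2) = 1, so ¬¬ϕ → ϕ also evaluates to 1/2.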

Because of the weight it places on mental apprehension through construction of purported mathematical entities, intuitionism is sometimes also called constructivism. A still more severe form of constructivism, which we will not discuss further, is strict finitism, in which one rejects infinite sets. More information on intuitionism can be found in [60].

What is Formalism? A school of mathematical thought introduced by the 20th century mathematician David Hilbert, which holds that all mathematics can be reduced to rules for manipulating formulas without any reference to the meanings of the formulas.

Formalists contend that it is the mathematical symbols themselves, and not any meaning that might be ascribed to them, that are the basic objects of mathematical thought. Hilbert’s program, called formalism, was to concentrate on the formal language of mathematics and to study its syntax. A statement about mathematics should then be a metatheorem, that is, a theorem provable within the syntax of mathematics.

These three philosophies do not necessarily contradict each other, and all three are still advocated today. Whether the logicist thesis has been established seems to be a matter of opinion. Though successful, it can be questioned on the ground that the systematic development of logic presupposes mathematical ideas in its formulation. The intuitionists succeeded in rebuilding large parts of present-day mathematics, but a large part is still wanting, making intuitionist mathematics less powerful and in many respects much more complicated than classical mathematics. These are serious objections to the intuitionistic approach, but it is generally conceded that its methods do not lead to contradictions, and some hope for a new intuitionist reconstruction of mathematics carried out in a different and more successful way. Unfortunately for the formalists, a consequence of Gödel’s incompleteness theorem (see chapter 8) is that the consistency of mathematics can be proved only in a language which is stronger than the language of mathematics itself. Yet formalism is not dead: most pure mathematicians are tacit formalists, but the naive attempt to prove the consistency of mathematics in a weaker system had to be abandoned. From [11, item from Paul Bernays] we learn that most mathematicians of all three philosophies are also philosophical realists: “While no one, except an extremist intuitionist, will deny the importance of the language of mathematics, most mathematicians are also philosophical realists who believe that the words of this language denote entities in the real world. Following the Swiss mathematician Paul Bernays (1888-1977), this position is also called Platonism, since Plato believed that mathematical entities really exist.”. For more information about realism, see [57].


5.3 Zermelo Fraenkel

5.3.1 Axiomatic set theory

After the discovery of Russell’s paradox, it became clear that set theory needed a new and more rigorous basis. Hilbert’s proof theory, which will be treated in more detail in section 6.1, offered a way to put set theory on firm and hopefully consistent grounds. The so-called ideal calculus was a first formalization of Cantor’s set theory, but it lacked the preciseness of Hilbert’s later theories and was inconsistent because it still contained in some form the (naive) comprehension principle (see page 16). The first real axiomatization of set theory was given in 1908 by the German mathematician Ernst Zermelo in [101]. The attitude adopted in his axiomatic development of set theory is that it is not necessary to know what ‘sets’ are and the ‘things’ that are its elements, nor what the ‘membership relation’ means [49, see page 288, paragraph 1]. Zermelo instead postulated a domain B of abstract objects and represented the elements or ‘things’ of this domain by the letters a, b, c, . . .. He then defined the primitive notions of equality and membership: a = b states that ‘a’ and ‘b’ designate the same ‘thing’; a ∈ b is defined on the domain B, and if a ∈ b holds, we call b a set and a an element of this set. Thus some, but not necessarily all, objects of B are sets. The assumptions adopted about these notions are called the axioms of the theory. Its theorems are the axioms together with the statements that can be deduced from the axioms using the rules of inference (see also section 6), for example by a system of logic. Criteria for the choice of axioms have been identified by several people (see Hilbert’s theory in section 6, or [49, last sentence of page 287]). The most accepted criteria (more formally defined in chapter 6) include:

1. Consistency of the system (it should be impossible to derive both a statement and its negation, in other words the paradoxes should be avoided).

2. Plausibility (the axioms should be in accord with intuitive beliefs about sets, see [60]).

3. Completeness (richness of the theory: the desirable results of Cantorian set theory ought to be derived as theorems).

In the next paragraph we will present the set of axioms that Zermelo chose and that formed the basis for all future axiomatizations of set theory (see also section 8.5).

5.3.2 Zermelo Fraenkel (ZF) Axioms

Zermelo formulated his axiomatic system in 1908; the extensions of Fraenkel are from 1922. In the same year (1922) the Norwegian mathematician Skolem (1887-1963) proposed a formal language for formulating the theory.

Zermelo noted that the sets involved in a derivation of the paradoxes are very large³ (for Cantor’s paradox it is the set of all sets (see section 3.8.1), for Russell’s paradox it is the set of all sets which are not members of themselves (see section 3.8.2), and for the Burali-Forti paradox (see section 3.8.2) it is the set of all well-orderings). Therefore he wanted to restrict the size of sets, and he changed the (naive) comprehension principle into his separation axiom, such that the paradox could no longer be derived:

Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x)))
For every set z and definite⁴ property ϕ of sets there exists a set whose elements are exactly those of z having the property ϕ.

There are also certain limitations on the property ϕ (i.e. it should be definite) that we will mention later in section 8.5. We show that the standard derivation of Russell’s paradox cannot be applied when the naive comprehension axiom is replaced by the separation axiom.

Let R = {x | x ∈ Z ∧ x ∉ x}.

R ∈ R ↔ R ∈ Z ∧ R ∉ R
      → R ∉ R, contradiction.

R ∉ R ↔ R ∉ Z ∨ R ∈ R
      ← R ∉ Z

In both derivations above we can only conclude that R ∈ R ↔ R ∉ R if we know that R ∈ Z. Since the separation axiom gives us no grounds to conclude R ∈ Z, Russell’s derivation of his paradox does not apply; the argument merely shows that R ∉ Z.

³The term proper class is sometimes used to refer to these ‘excessively large’ sets; all other sets are then referred to as improper classes. This means all sets are classes but not every class is a set. A class that is not a set is called a proper class.

⁴See section 8.5 for the definition of the concept of definiteness.
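The effect of replacing comprehension by separation can be tried out on hereditarily finite sets. The sketch below is an illustration under stated assumptions, not part of the source: sets are modelled as Python `frozenset` values, `separate` is a hypothetical helper playing the role of the separation axiom, and Z is an arbitrarily chosen small set. Because frozensets are built bottom-up, x ∉ x holds for every one of them, so the Russell subset R of Z coincides with Z itself, and the only conclusion available is R ∉ Z rather than a contradiction.

```python
def separate(z, phi):
    """Separation: the subset of z whose elements satisfy the property phi."""
    return frozenset(x for x in z if phi(x))

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
Z = frozenset({empty, one, two})  # an arbitrary example set

# R = {x | x in Z and x not in x}
R = separate(Z, lambda x: x not in x)

print(R == Z)   # True: no frozenset is a member of itself
print(R in Z)   # False: R exists as a set, but it lies outside Z
```

This checks a single instance only; that R ∉ Z holds for every set Z is the content of the derivation above.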

However, this fact alone does not guarantee that there does not exist a paradox, as claimed in some articles, but merely that the separation axiom does not permit the construction of paradoxical sets with elements defined in terms of the sets themselves. Until consistency is proved, there might be other, less obvious ways to construct a paradox.

We now give all of the ZF axioms that constitute set theory. The first seven axioms are those that were originally formulated by Zermelo. Axioms 8 and 9 were later added by Fraenkel and von Neumann respectively. The axioms 1 through 8 are the original set of the Zermelo-Fraenkel axioms.

In the definitions below we use several shorthand notations. If we wish, however, we can express these definitions in full detail, such that the notation of each expression does not depend on previous axioms. For example, in axiom 8 we use ∃! to denote that there is exactly one y, in axiom 9 we use the symbols ∩ and ∅, and in axiom 6 we use ⊆ to express x ⊆ z as a shorthand for (∀y :: y ∈ x → y ∈ z). The separation and substitution axioms are actually axiom schemes.

The Zermelo-Fraenkel axioms:

1. Extensionality axiom (or axiom of determination):
(∀x, y :: (∀z :: z ∈ x ↔ z ∈ y) → x = y)
Sets are uniquely determined by their members, or to be exact: if every element of a set x is at the same time an element of y, and conversely, then x = y.

2. Axiom of the empty set:
(∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.

3. Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), where ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of z having the property ϕ.

4. Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a and b.

5. Sum-set axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.

6. Power set axiom:
(∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.

7. Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z))
There exists a successor set.

8. Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y :: ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y))))
The image of a set under an operation ϕ (functional property) is again a set.

9. Axiom of foundation or axiom of regularity (by von Neumann):
(∀a :: a ≠ ∅ → (∃b :: b ∈ a ∧ b ∩ a = ∅))
Every non-empty set is disjoint from at least one of its elements.
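Several of the axioms listed above have direct finite counterparts, which can be useful for experimenting. The following sketch is an illustration only and rests on assumptions not in the source: sets are modelled as Python `frozenset` values (so only hereditarily finite sets are covered), and the helper names `pair`, `union` and `power_set` are hypothetical. Extensionality corresponds to frozenset equality, which compares members only.

```python
from itertools import combinations

def pair(a, b):
    """Pairing axiom: the set whose elements are exactly a and b."""
    return frozenset({a, b})

def union(z):
    """Union axiom: the objects occurring in at least one element of z."""
    return frozenset(x for w in z for x in w)

def power_set(z):
    """Power set axiom: the set whose elements are exactly the subsets of z."""
    elems = list(z)
    return frozenset(
        frozenset(c)
        for k in range(len(elems) + 1)
        for c in combinations(elems, k)
    )

empty = frozenset()
one = pair(empty, empty)   # {∅}
two = pair(empty, one)     # {∅, {∅}}

print(pair(empty, one) == pair(one, empty))  # True (extensionality)
print(union(two) == one)                     # True
print(power_set(one) == two)                 # True
print(len(power_set(two)))                   # 4
```

The infinity and replacement axioms have no counterpart in this finite model; every set built this way has finitely many elements.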

Theorem: (from [49, chapter 11]) The domain B itself (see page 92) is not a set.
Proof: Suppose V is any given set. Then⁵ V has a subset W that consists of those elements of V that are not members of themselves. W is not an element of itself (otherwise we would have W ∈ W, while W consists of elements that are not members of themselves). But if W were an element of V − W, we would also have W ∈ W. This means that W is not a member of V. But W is certainly in B, and therefore B is not the same as V. Thus B cannot coincide with any set at all.

⁵Since the property x ∉ x is definite. See section 8.5 for the definition of the concept of definiteness.

The theory is not complete, since many statements are independent of ZF. Independent of the previous axioms, the following two statements have a more dubious status (and are not part of standard ZF):

10. Axiom of choice (AC):
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ Ran(f) ⊆ ⋃x ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.

Definition of choice function: A function f is called a choice function for the set V := Dom(f) = V − {∅} ∧ (∀v : v ∈ Dom(f) : f(v) ∈ v)

11. Generalized Continuum Hypothesis (GCH):
For any cardinal ℵ_r: |{0, 1}^{ℵ_r}| = ℵ_{r+1}, i.e. 2^{ℵ_r} = ℵ_{r+1}

In 1908 Felix Hausdorff proposed this generalization of CH. Another formulation of this axiom and more information are given in section 3.6. In the remainder of this section, we will give a short explanation of the nature of the other axioms. For more detailed information, we refer to section 8.5 and to the rich literature on set theory that is available (for example [17], [24], [49, chapter 11], [28]).

The axioms are not minimal. For example, as we have already seen in section 2.2⁶, the axiom of the empty set can be deduced from the separation axiom. We also have: empty set axiom + substitution axiom ⇒ separation axiom. We have also seen in section 2.2 how we can define basic operations with the extensionality and separation axioms. The pairing, sum and powerset axioms, together with the extensionality axiom, ensure uniqueness of the pairs, sums and powersets of sets. With these axioms alone we can already create an infinite number of sets. However, each set constructed with axioms 1 to 6 only has a finite number of elements. It is the infinity axiom that we need to create infinite sets. These sets are not unique, but the smallest successor set, denoted ω, is unique. We call its elements the natural numbers. With this axiom we can now also prove the principle of induction for ω (see section 3.4.3). The substitution axiom says that whenever ϕ is a property of sets such that to every x there is exactly one y for which ϕ(x, y), and a is a set, then there exists a set whose elements are exactly those y for which an x ∈ a exists such that ϕ(x, y). The foundation axiom says that each non-empty set has epsilon-minimal elements (see below). An implication of this axiom is that there is no function f defined on ω such that (∀i : i ∈ ω : f(i + 1) ∈ f(i)). For a motivation and analysis of the role of the foundation axiom we refer to [17, section 2.1].

⁶The existence of the empty set in section 2.2 was actually derived from the comprehension principle, but the result can similarly be obtained from the separation axiom.

Definition of epsilon-minimal:
An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅

Another corollary of the foundation axiom is that there is no set which has itself as its only element. Note that to prevent the paradoxes we need the separation axiom, not the foundation axiom.
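The notion of an epsilon-minimal element, and the absence of infinitely descending membership chains, can be illustrated on small sets. The sketch below is an assumption-laden example rather than anything from the source: sets are Python `frozenset` values, the sample sets are the first few von Neumann-style numerals (each containing all smaller ones), and the helper names are hypothetical.

```python
def epsilon_minimal(a):
    """Elements b of a with b ∩ a = ∅."""
    return {b for b in a if not (b & a)}

def descent_length(a):
    """Length of the longest chain a ∋ a1 ∋ a2 ∋ ... starting at a."""
    return 0 if not a else 1 + max(descent_length(b) for b in a)

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})

print(epsilon_minimal(three) == {zero})  # True: ∅ shares no element with 'three'
print(descent_length(three))             # 3: three ∋ two ∋ one ∋ zero
```

Every chain of memberships starting from such a set stops after finitely many steps, which is the finite shadow of the statement that no function f on ω satisfies f(i + 1) ∈ f(i) for all i.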

The origin of the axiom of choice was Cantor’s recognition of the importance of being able to well-order arbitrary sets; i.e., to define an ordering relation for a given set such that each nonempty subset has a least element. The virtue of a well-ordering for a set is that it offers a means of proving that a property holds for each of its elements by a process (transfinite induction) similar to mathematical induction. Zermelo (1904) gave the first proof that any set can be well-ordered. His proof employed a set-theoretic principle that he called the axiom of choice, which, shortly thereafter, was shown to be equivalent to the so-called well-ordering theorem. A choice function for a set x ‘chooses’ an element from each nonempty member of x. One form of the principle reads: if x is a nonempty set the elements of which are nonempty sets, then there exists a function f with domain x such that for each member a of x, f(a) ∈ a. For a more detailed discussion of the axiom of choice we refer to [17, section 2.9].

Intuitively, the axiom asserts the possibility of making a simultaneous choice of an element in every nonempty member of any set; this guarantee accounts for its name. The assumption is significant only when the set has infinitely many members. Zermelo was the first to state the axiom explicitly, although it had been used earlier, essentially unnoticed. It soon became the subject of vigorous controversy because of its unconstructive nature. There are a few mathematicians who feel that the use of the axiom of choice is improper, but to the vast majority it, or an equivalent assertion, has become an indispensable and commonplace tool. For this discussion of the axiom of choice we have used [63], [77] and [11].
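For a finite family of sets a choice function can be written down explicitly, which is why the axiom only carries weight for infinite families. The sketch below is a minimal illustration under assumptions that are not in the source: the family consists of finite sets of integers, the hypothetical helper `choice_function` represents f as a dictionary, and the rule ‘take the least element’ is just one convenient explicit choice.

```python
def choice_function(x):
    """Return f with Dom(f) = x - {∅} and f[a] in a for every a in Dom(f)."""
    return {a: min(a) for a in x if a}

family = frozenset({
    frozenset({3, 1, 2}),
    frozenset({7}),
    frozenset(),          # the empty member is excluded from the domain
    frozenset({5, 4}),
})

f = choice_function(family)

print(set(f) == family - {frozenset()})   # True: Dom(f) = x - {∅}
print(all(f[a] in a for a in f))          # True: f(a) ∈ a
print(f[frozenset({1, 2, 3})])            # 1
```

The rule `min` works here only because integers are well-ordered in the obvious way; for arbitrary infinite families no such uniform rule need be available, and that gap is what the axiom fills.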

A discussion of the Generalized Continuum Hypothesis can be found in section 3.7.

Chapter 6

Hilbert

The further a mathematical theory is developed, the more harmoniously and uniformly does its construction proceed, and unsuspected relations are disclosed between hitherto separated branches of science.

- Hilbert, quoted in [76]

David Hilbert (1862-1943) was a German mathematician who reduced geometry to a series of axioms and contributed substantially to the establishment of the formalistic foundations of mathematics. His first work was on invariant theory and in 1888 he proved his famous Basis theorem (see [5]). After that he did significant work in the area of algebraic number theory, and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’, in 1897. In 1899 he published the ‘Grundlagen der Geometrie’ (which appeared in English as ‘The Foundations of Geometry’ in 1902), which contained (see [31, section 4.7.2]) what would become a widely accepted set of 21 axioms for Euclidean geometry and an analysis of their significance. This axiomatic method that Hilbert used (for geometry, but its application and concept are more general and can be used far beyond the domain of geometry, see also [57, section 14.7]) will be treated in section 6.1. A substantial part of Hilbert’s fame rests on a list of 23 mathematical problems he outlined in 1900, and posed as a challenge for the next century. Some of these problems were related to the foundations of mathematics (see section 6.2). From 1905 onwards Hilbert attempted to lay a firm foundation of mathematics by proving its consistency, work that eventually resulted in the two volumes of ‘Grundlagen der Mathematik’ (written with Paul Bernays, 1934 and 1939) that were intended to lead to a proof theory. Although in 1931 Kurt Gödel showed this goal to be unattainable (see chapter 8), the work Hilbert had done on the foundations of mathematics nevertheless remained influential in the development of logic. Hilbert’s work on integral equations in about 1909 (see [45]) led to research in functional analysis and established the basis for his work on infinite-dimensional space, later called Hilbert space (see [22, page 232]). When Hilbert was made an honorary citizen of Königsberg he gave an address which ended with six famous words, showing his enthusiasm for mathematics and optimism for solving mathematical problems: “There are absolutely no unsolvable problems. Instead of the foolish ignorabimus [Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen, wir werden wissen” [We must know, we shall know].


6.1 Hilbert’s proof theory

Hilbert formalized mathematical theories in order to turn them into well-defined objects of discussion, thus making possible the new kind of investigation to which he gave the new name metamathematics. Hilbert was the first to emphasize that strict formalization of a theory involves total abstraction from the meaning, the result being called a formal system or formalism. In its structure, a formalized theory is no longer a system of meaningful propositions but one of sentences as sequences of words, which in turn are sequences of letters (a symbolic language). Hilbert’s method of making the formal system as a whole the object of mathematical study is called metamathematics or proof theory.

What is metamathematics? The study of mathematics itself (with respect to formalized mathematical systems, metamathematics thus consists of statements about the signs and formulas occurring within axiomatic systems). One of the primary goals of metamathematics is to determine the nature of mathematical reasoning.

After Hilbert presented an axiomatic development of geometry in ‘Grundlagen der Geometrie’ (1899), he devoted himself to the much greater task of applying his new metamathematical method to pure mathematics as a whole. Or, as Hilbert wrote in 1917: “Since the examination of the consistency is a task that cannot be avoided, it appears necessary to axiomatize logic itself and to prove that number theory and set theory are only parts of logic”. Hilbert took a formal(istic) approach to achieve this logicist goal (logicism uses logic as the basis of mathematics, while the formalists attempted to axiomatize mathematics successfully; see also the philosophies in section 5.2). To this end Hilbert identified three properties that an axiomatic system should have: it should be decidable, complete and consistent. In order to define these notions, we first have to make precise some other concepts.

Definition of an axiom: A proposition that is regarded as true without proof.

Definition of free variable: A variable that is not bound within the scope of a quantifier.


An axiom that does not contain any variables is also called an axiom statement; an axiom with free variables is called an axiom scheme, and each free variable is to be quantified over all well-formed formulas.

Definition of statement (or sentence): A well-formed formula with no free variables.

Of the systems that Hilbert’s proof theory applies to, we here consider those susceptible to Gödel’s incompleteness theorem (which will be presented in chapter 8).

Definition of an STGA language: A language¹ L is Susceptible to Gödel’s argument (STGA) if it consists of:

1 E, a denumerable set of (well-formed) expressions (also called formulas) of L

2 S ⊆ E, sentences of L (i.e. with no free variables)

3 P ⊆ S, provable sentences of L

4 R ⊆ S, refutable sentences of L

5 H ⊆ E, predicates of L (i.e. with free variables, H ∩ S = ∅). For convenience, we here assume predicates to have exactly one variable.

6 A function ϕ : E × N → E; ϕ assigns to every E ∈ E and n ∈ N an expression E(n), such that for every predicate H ∈ H taken for E and every n ∈ N, H(n) is a sentence (H(n) ∈ S, hence H(n) ∈ E). We can think of such a function ϕ as a substitution function. Informally, the sentence H(n) expresses the proposition that the number n belongs to the set named by H.

¹Sometimes also called system, since it not only defines a language but also includes the (dis)provability and truth of expressions.

The following set is the only one that depends on a semantic interpretation of the expressions, and is normally determined by a model that we accept as representing the truth. The model should be distinguished from the set of derivation rules that (syntactically or mechanically) determines whether sentences are provable or refutable. It is important to realize that the truth of a sentence is not the same as the provability of that sentence.

7 T ⊆ S, true sentences of L. This set can be determined by a model(see page 107)

First, we give an intuitive explanation of this definition: In most parts of mathematics, not every sequence of symbols is meaningful or useful. Therefore we only consider the so-called well-formed formulas E. Some of these formulas (also called propositions) do not contain free variables; we name them sentences (S). Some of them are provable from the axiomatic system (i.e. they can be derived from the axioms and derivation rules of the axiomatic system), and are elements of P. Others are refutable, also called disprovable (i.e. their negation can be derived from the axioms and derivation rules of the axiomatic system) and are elements of R. These notions only depend on whether the sentence is derivable from the axiomatic system and are independent of the truth of the sentence. We call the set of true sentences T (the other sentences are false). Other formulas have free variables, i.e. they are functions. We call them predicates (H). We also assume there exists a function ϕ that assigns to every expression H ∈ H and natural number n a sentence H(n).

What is an Axiomatic System? An axiomatic system (sometimes also called formal axiomatic system) is a logical system that gives rise to an STGA language and has an explicitly stated finite set of axioms from which provable sentences can be derived (using a finite set of derivation rules)

The set of axioms and derivation rules determines which sentences of L are provable or not. The axiomatic system also contains a syntax definition that determines the well-formedness of expressions of L. Normally, the syntax definition of an axiomatic system consists of an alphabet of symbols and a set of rules. We show that this notion of an axiomatic system gives rise to a language that falls under the category of STGA languages. Such an axiomatic system A is often defined as follows:


Definition of axiomatic system: An axiomatic system A consists of:

• An alphabet Σ, consisting of a finite number of constants (with their arities) and variables.

• A recursive definition of a syntax, determining which formulas are well-formed formulas.

• An initially determined and fixed set of axioms and derivation rules (also called transformation rules or rules of inference).

The recursive definition over the given alphabet gives us the set of expressions. The variables enable us to form predicates. The set of axioms and derivation rules let us prove or refute sentences. Ideally, we want the provable sentences to coincide with the sentences we intuitively consider true (P = T) and the refutable sentences to coincide with those we consider false. We call a system with this property correct. We now give an example of a definition of a simple axiomatic system.

Example: axiomatic system A1

• Σ = {∨², ¬¹, (⁰, )⁰, ∀², x⁰, y⁰, R0², true⁰, false⁰}

The numbers that are written in superscript denote the arity of the relations; a constant or variable is a 0-ary relation.

• ϕ is a well-formed formula if it

0. is one of the constants true and false.

1. is an atomic formula Ri(x1, . . . , xj), with Ri a relation with arity j, and x1, . . . , xj variables or constants.

2. has the form of ϕ1 ∨ ϕ2, ϕ1 ∧ ϕ2, (ϕ1), ¬ϕ1, ∀xi(ϕ1), where ϕ1 and ϕ2 are smaller formulas and xi is some variable from Σ.


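• For all variables x, variables or constants c and d and well-formed formula ϕ, the derivation rules are the following (each rule is written as premise → conclusion):

    R0(c, d) → true
    ∀x(ϕ) → false
    ¬false → true
    ¬true → false
    true ∧ ϕ → ϕ
    false ∧ ϕ → false
    true ∨ ϕ → true
    ϕ ∨ true → ϕ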


The STGA language L that can be constructed² on the basis of A1, denoted by LA1, consists of the following parts:

1. E is the set of usual mathematical predicates formed by the symbols of the given alphabet (so E includes the binary relation R0).

2. S is the set of those expressions without free variables (i.e. propositions).

3. The provable sentences P are those that the derivation rules reduce to true. For example, ¬ false ∧ R0(false, true) → true ∧ R0(false, true) → true ∧ true → true.

4. The refutable sentences R are those that the derivation rules reduce to false. For example, ∀y (false ∨ y) ∧ true → false ∧ true → false.

5. The predicates are those expressions with one free variable.

6. For each such predicate we can replace the free variable by a formula that is represented³ by a natural number, and obtain a proposition.

7. The definition of an axiomatic system does not include a model. If we think of the standard logic that is used in practice, we can see that for all formulas except those with an ∀-symbol, the formulas are derivable if and only if they are true.
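The two example derivations in items 3 and 4 can be replayed mechanically. The following Python sketch is only an illustration: the tuple encoding of formulas, the name `reduce_formula` and the innermost-first order in which rules are applied are assumptions made here and are not part of the definition of A1. The eight derivation rules are applied exactly as they are listed for A1.

```python
# Formulas of A1 encoded as nested tuples (a hypothetical encoding):
#   "true", "false"              constants
#   ("R0", c, d)                 atomic formula R0(c, d)
#   ("not", f)                   negation
#   ("and", f, g), ("or", f, g)  conjunction, disjunction
#   ("forall", x, f)             universal quantification over variable x

def reduce_formula(f):
    """Apply the derivation rules of A1 innermost-first until none applies."""
    if isinstance(f, str):
        return f                      # constant or variable: nothing to rewrite
    op = f[0]
    if op == "R0":
        return "true"                 # R0(c, d) -> true
    if op == "forall":
        return "false"                # forall x (phi) -> false
    if op == "not":
        a = reduce_formula(f[1])
        if a == "false":
            return "true"             # not false -> true
        if a == "true":
            return "false"            # not true -> false
        return ("not", a)
    a, b = reduce_formula(f[1]), reduce_formula(f[2])
    if op == "and":
        if a == "true":
            return b                  # true and phi -> phi
        if a == "false":
            return "false"            # false and phi -> false
        return ("and", a, b)
    if op == "or":
        if a == "true":
            return "true"             # true or phi -> true
        if b == "true":
            return a                  # phi or true -> phi (as listed for A1)
        return ("or", a, b)
    raise ValueError(f"unknown connective: {op!r}")

# (not false) and R0(false, true)
provable = ("and", ("not", "false"), ("R0", "false", "true"))
# (forall y (false or y)) and true
refutable = ("and", ("forall", "y", ("or", "false", "y")), "true")

print(reduce_formula(provable))
print(reduce_formula(refutable))
```

Run as is, the sketch prints true for the first sentence and false for the second, matching the derivations in items 3 and 4.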

We now introduce some concepts related to STGA languages and axiomatic systems. We assume that A is an axiomatic system that gives rise to an STGA language L.

Definition of derivable: A formula ϕ is derivable in L := ϕ ∈ P.
A formula ϕ is derivable from an axiomatic system A, notation A ⊢ ϕ := there is an axiom ai of A and a sequence of formulas ϕ1, . . . , ϕl such that ϕ1 = ai and ϕl = ϕ and each ϕi follows from the preceding formulas and the axioms of A by the derivation rules of A.

We call the sequence of formulas ϕ1, . . . , ϕl in a derivation of the statement ϕ a formal proof π of the statement ϕ. When A ⊢ ϕ, we also write ϕ ∈ A.

² Sometimes it is also said that an axiomatic system A1 gives rise to a language LA.
³ An example of such a bijective function between a predicate and a set of natural numbers will be given in section 8.2.

Example:

A1 ⊢ ¬ false ∧ R0(false, true)

A1 ⊬ ∀x)x¬ (since the formula is not well-formed, i.e. it cannot be built from the syntax definition)

A1 ⊬ ∀y (false ∨ y) ∧ true (since it does not follow from the derivation rules, i.e. is a refutable sentence)

Hilbert proposed a program to reformulate all mathematics as a formal axiomatic theory, and this theory has to be proved to be consistent, i.e. free from contradiction. The standard method that was used to prove the consistency of axiomatic systems was to give a ‘model’. A model for an axiomatic theory is simply a system of objects, chosen from some other theory and satisfying the axioms.

This means we can relate axiomatic systems to existing systems by means of a model, also called interpretation or structure. A model of a formal axiomatic theory is a well-defined mathematical system with the particular structure that is characterized by the theory.

Definition of universe: Set of values that variables of an axiomatic system may take

Definition of a model: A universe together with an assignment of n-ary relations to n-ary constants, and a corresponding assignment of the variables.

We define a model M for an axiomatic system A by: M = (U, P1, . . . , Pk) with U a universe for A and P1, . . . , Pk the relations corresponding to symbols R1, . . . , Rk of A. If a formula ϕ is true in the model M (i.e. by interpretation of the relation symbols by the corresponding relations), notation M ⊨ ϕ, we say that M is a model of ϕ.


Example: Let M1 = (N, ≤) be a model for axiomatic system A1

M1 ⊨ ∀x∀y(x ≤ y ∨ y ≤ x)
M1 ⊭ ∀x∀y(x ≤ y ∧ y ≤ x)
Note that instead of using R1 for the relation symbol, we immediately took the interpretation ≤.
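As a small illustration of what truth in the model (N, ≤) means, the two formulas of the example can be tested on a finite initial segment of N. This is a sketch under stated assumptions: the function name `holds_on_segment` and the bound 50 are chosen here for illustration. A finite search can refute a universally quantified formula by exhibiting a counterexample, but a search that finds none does not prove the formula for all of N.

```python
from itertools import product

def holds_on_segment(formula, bound):
    """Check a two-variable universal formula for all x, y in {0, ..., bound - 1}.

    Returns (True, None) if no counterexample exists below the bound,
    otherwise (False, (x, y)) with the first counterexample found.
    """
    for x, y in product(range(bound), repeat=2):
        if not formula(x, y):
            return False, (x, y)
    return True, None

def total(x, y):
    """x ≤ y ∨ y ≤ x"""
    return x <= y or y <= x

def both(x, y):
    """x ≤ y ∧ y ≤ x"""
    return x <= y and y <= x

print(holds_on_segment(total, 50))   # no counterexample below the bound
print(holds_on_segment(both, 50))    # counterexample: x = 0, y = 1
```

The counterexample x = 0, y = 1 is enough to establish that the second formula is not true in the model, whereas the truth of the first formula in all of N rests on the totality of ≤ and not on the finite check.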

A theory Th of a model M, notation Th(M), is the set of true statements in the language of that model.

Definition of a theory: Th(M) := {ϕ | ϕ is a statement and M ⊨ ϕ}

So now we can say that Hilbert was looking for an axiomatic system for which logic can be a model. Hilbert proposed such an axiomatic system to have the properties of consistency, completeness and decidability. We will now introduce these concepts, along with some other properties of axiomatic systems. Since the properties of an axiomatic system A give rise to corresponding properties in the language LA, we here distinguish in each definition between the property of a language and of an axiomatic system.

Definition of decidability:
A language L is decidable := (∀ϕ :: (ϕ ∈ P ∨ ϕ ∈ R)).
An axiomatic system A is decidable := (∀ϕ :: there is an algorithm that decides in a finite number of steps whether (or not) A ⊢ ϕ) (see also [49, page 270])

Definition of consistency:
A language L is consistent := ¬(∃s : s ∈ S : s ∈ P ∧ s ∈ R), i.e. P ∩ R = ∅ or no sentence is both provable and refutable in L.
An axiomatic system A is consistent := ¬(∃ϕ :: A ⊢ ϕ ∧ A ⊢ ¬ϕ) (i.e. it is not possible for any formula ϕ, to derive both ϕ and ¬ϕ) (see also [49, page 240])

A language L is inconsistent if it is not consistent. Clearly, L is inconsistent if P and R are not disjoint. Note that consistency and decidability do not refer to T, but only concern P and R. The following definitions of completeness, soundness and correctness also depend on the truth set T (and therefore on the model that determines that truth set).


Definition of completeness:
A language L is complete for a model M := (∀ϕ :: M ⊨ ϕ → ϕ ∈ P).
An axiomatic system A is complete for model M := (∀ϕ :: M ⊨ ϕ → A ⊢ ϕ) (i.e. all true statements in the model are derivable/provable)

A language L is incomplete if it is not complete. Note that the statement (∀ϕ :: M ⊨ ϕ → A ⊢ ϕ) is equivalent with its contrapositive (∀ϕ :: A ⊬ ϕ → M ⊭ ϕ), i.e. all statements ϕ that are not derivable/provable, are also not true in the model.

Definition of soundness:
A language L is sound for a model M := (∀ϕ :: ϕ ∈ P → M ⊨ ϕ).
An axiomatic system A is a sound axiomatization for a model M := (∀ϕ :: A ⊢ ϕ → M ⊨ ϕ) (i.e. if a statement ϕ is derivable/provable, it is true in the model)

Definition of correctness:
A language L is correct for a model M := P ⊆ T ∧ R ∩ T = ∅ (i.e. every provable sentence is true and every refutable sentence is false (not true)).
An axiomatic system A is correct for a model M := A is sound for M and A is complete for M

Theorem: If L is correct, it is consistent.
Proof: This follows directly from the definitions of correctness and consistency because if P is a subset of T and T is disjoint from R, then P must be disjoint from R.
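The language-level definitions above are statements about the sets S, P, R and T, so for a finite toy language they can be written down directly as set operations. The sketch below is an illustration only: the function names and the three-sentence language are assumptions introduced here, and decidability is read as ranging over the sentences S. It also confirms the theorem by brute force over every choice of P, R and T in the toy language, which illustrates the theorem but does not replace the proof.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_decidable(S, P, R):
    return S <= (P | R)            # every sentence is provable or refutable

def is_consistent(P, R):
    return not (P & R)             # P ∩ R = ∅

def is_complete(P, T):
    return T <= P                  # every true sentence is provable

def is_sound(P, T):
    return P <= T                  # every provable sentence is true

def is_correct(P, R, T):
    return P <= T and not (R & T)  # P ⊆ T and R ∩ T = ∅

def correct_implies_consistent(S):
    """Check the theorem for every triple (P, R, T) of subsets of S."""
    parts = subsets(S)
    return all(is_consistent(P, R)
               for P in parts for R in parts for T in parts
               if is_correct(P, R, T))

S = frozenset({"s1", "s2", "s3"})  # hypothetical three-sentence language
print(correct_implies_consistent(S))
```

The converse fails: with P = {s1}, R = {s2} and T = ∅ the language is consistent, but it is not correct because the provable sentence s1 is not true.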


6.2 Hilbert’s 23 problems

Who of us would not be glad to lift the veil behind which the future lies hidden: to cast a glance at the next level of our science and at the secrets of its development during future centuries? What particular goals will there be toward which the leading mathematical spirits of coming generations will strive? What new methods and new facts in the wide and rich field of mathematical thought will the next centuries disclose?

- D. Hilbert, in the opening of his speech to the 1900 Congress in Paris

In 1900 Hilbert outlined his list of 23 mathematical problems to the International Congress of Mathematicians in Paris, which he urged upon the attention of his contemporaries. His famous address was important and still influences and stimulates mathematical research all over the world today. It was not only a collection of problems, but it was also his philosophy of mathematics (see also the formalist viewpoint in section 5.2) and a collection of problems important to that philosophy. Many of the problems have since been solved, and each solution was a noted event (or even a mathematical breakthrough). Some of these problems however remain unsolved to this day. In 2000, in the footsteps of Hilbert, the Clay Mathematics Institute (see http://zax.mine.nu/interests/questions/clay.htm) has made a new list of 7 (for a large part mathematical) problems to be solved in this century.

Among those problems is one of the original problems (number 8) of Hilbert. It requires a solution to the Riemann hypothesis, which is usually considered to be the most important unsolved problem in mathematics. We mention some of the original problems that are related to the foundations of mathematics. For a complete source of information on the 23 (or 25?, see [32]) original publications of Hilbert, see the articles [41] and [40], also available online [42].

• Problem 1: Cantor’s problem of the cardinal number of the continuum. This problem is also known as the Continuum Hypothesis and extensively covered in section 3.7.

• Problem 2: The consistency of the axioms of arithmetic. The question is whether it can be shown that the axioms on which arithmetic is based are consistent. Godel later showed that a consistent formal system that contains arithmetic (see chapter 8) can never prove its own consistency. Another metamathematical argument might exist, that cannot be expressed in the system, but can prove its consistency.

• Problem 6: Mathematical treatment of the axioms of physics. It asks to treat in the same manner, by means of axioms, those physical sciences in which mathematics plays an important part; in the first rank are the theory of probabilities and mechanics. So far no complete axiomatization of physics has been found.

• Problem 9: Proof of the most general law of reciprocity in algebraic number theory. For any field of numbers, the law of reciprocity (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html) is to be proved for the residues of the lth power, when l denotes a prime, and further when l is a power of 2 or a power of an odd prime. This problem is still unsolved.

• Problem 10: Decidability of solvability of diophantine equations. This question asks, ‘given a diophantine equation with any number of unknown quantities and with rational integral numerical coefficients, to devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers’. In modern terminology the problem asks to devise an algorithm that tests whether a polynomial has an integral root. A root of a polynomial is an assignment of values to its variables so that the value of the polynomial is 0. A root is an integral root if all variables are assigned integer values. Some polynomials have an integral root (for example 6x³yz² + 3xy² − x³ − 10 has an integral root at x = 5, y = 3 and z = 0) and some do not.
Hilbert did not use the term algorithm but rather ‘a process according to which it can be determined by a finite number of operations’. In order to solve this problem this notion had to be made more precise (this was done by Turing, see section 9.1). Also, Hilbert asked that an algorithm be devised. Thus he apparently assumed such an algorithm exists, but now we know that this problem is algorithmically unsolvable. In 1970, the young Russian Yuri Matijasevic, building on the work of Martin Davis, Hilary Putnam and Julia Robinson, showed that no algorithm exists for testing whether a polynomial has integral roots.

• Problem 23: Further development of the methods of the calculus of variations. Of the 23 problems Hilbert posed, this one is the least definite, since it involves the general question of extending the calculus of variations, which basically is the theory of the variation of functions. With some examples that we will not treat here, Hilbert gave a justification of the necessity for an extension of the differential and integral calculus (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).
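To make the notion of an integral root from Problem 10 concrete, the example polynomial can be evaluated directly and its roots searched for inside a bounded box. The sketch below is illustrative only: the names `p` and `integral_roots` and the bound 5 are assumptions made here. A bounded search of this kind finds a root whenever one lies inside the box, but an empty result for a given bound says nothing about larger values, which is consistent with the fact that no algorithm decides the general problem.

```python
from itertools import product

def p(x, y, z):
    """The example polynomial 6x^3yz^2 + 3xy^2 - x^3 - 10."""
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10

def integral_roots(poly, arity, bound):
    """All integer roots of poly with every coordinate in [-bound, bound]."""
    values = range(-bound, bound + 1)
    return [v for v in product(values, repeat=arity) if poly(*v) == 0]

print(p(5, 3, 0))                             # 3*5*9 - 125 - 10 = 0
print((5, 3, 0) in integral_roots(p, 3, 5))   # the root lies inside the box
```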

At the end of his article, Hilbert says that he does not believe mathematics will, like other sciences, split into separate branches whose connection becomes ever more loose, but that the organic unity of mathematics is inherent in the nature of this science, for mathematics is the foundation of all exact knowledge of natural phenomena. For a more detailed assessment of Hilbert’s view, see [49, section 12.4] and [31, section 4.7].

Chapter 7

Types

7.1 Russell and Whitehead’s Principia Mathematica

Logic has become more mathematical and mathematics has become more logical. The consequence is that it has now become wholly impossible to draw a line between the two; in fact, the two are one. They differ as boy and man; logic is the youth of mathematics and mathematics is the manhood of logic.

- B. Russell in [79, page 194]

In section 4.1 we saw that with the postulates he presented, Peano stated and organized the fundamental laws of number theory, the core of mathematics. If statements satisfying these conditions could be derived in this logic, it would show that (at least part of) mathematics was founded in pure logic. As we have seen in section 4.2, Frege adhered to the goal of logicism that all of mathematics could be derived from logic alone. But unfortunately the language that he created was inconsistent, as we have learned from Russell’s paradox in section 5.1. In his 1908 paper, ‘Mathematical Logic as Based on the Theory of Types’, Russell laid out a theory to eliminate the paradoxes. With Principia Mathematica, Bertrand Russell and his teacher, the mathematician Alfred Whitehead, presented this theory to prevent the paradoxes while at the same time allowing many of the operations Frege considered desirable. The theory of types basically says that all sets and other entities have a logical ‘type’, these types can be ordered and sets are always constructed from specified members with lower types. We will look at the theory of types in more detail in section 7.2.
Principia Mathematica consisted of three volumes (sometimes also called ‘the Principia’) and was named after the ‘Philosophiae naturalis principia mathematica’ of the English physicist Isaac Newton. But unlike Newton’s book it dealt not with the application of mathematical techniques to physics, but to logic and mathematics itself. With their mathematical treatment of the principles of the mathematicians, Russell and Whitehead intended to summarize the recent work in logic as well as to give a revolutionary and systematic development of mathematical logic and derive basic mathematical principles from the principles of logic alone.
Their collaboration began in 1903 when Whitehead and Russell were both in the initial stages of preparing second volumes to earlier books on related topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s 1903 ‘The Principles of Mathematics’. Their work overlapped considerably and they began collaborating on what would become ‘Principia Mathematica’. The approach of Russell and Whitehead was essentially that of Frege, to define mathematical entities (like numbers) in pure logic and then derive their fundamental properties. Indeed, their definition of natural numbers was basically the same as the one of Frege, but unlike him, they opted to avoid the philosophical aspects and justifications. Although ‘Principia’ was largely successful, there still was criticism of the axiom of infinity and the axiom of reducibility; they were considered to be too ad hoc to be justified philosophically. In 1919 Russell published about the philosophy behind his work in an ‘Introduction to Mathematical Philosophy’ which was accessible to a broad audience and therefore has been the main source through which Russell’s logicist view of mathematics has become known.

I quote the following assessment about Principia Mathematica from [91]: “In addition to its notation (much of it borrowed from Peano), its masterful development of logical systems for propositional and predicate logic, and its overcoming of difficulties that had beset earlier logical theories and logistic conceptions, the Principia offered discussions of functions, definite descriptions, truth, and logical laws that had a deep influence on discussions in analytical philosophy and logic throughout the 20th century. What is perhaps missing is any hesitation or perplexity about the limits of logic: whether this logic is, for example, provably consistent, complete, or decidable, or whether there are concepts expressible in natural languages but not in this logical notation. This is somewhat odd, given the well-known list of problems posed by Hilbert in 1900 that came to animate 20th-century logic, especially German logic. The Principia is a work of confidence and mastery and not of open problems and possible difficulties and shortcomings; it is a work closer to the naive progressive elements of the Jahrhundertwende than to the agonizing fin de siecle.”. We would like to add that with the very formal and accurate build-up of mathematics, Russell and Whitehead not only managed to avoid the paradoxes but also created one of the most impressive and complicated works of all times, one that is, next to Aristotle’s Organon, considered to be the most influential book on logic that was ever written.

In the next section we will further investigate Russell’s theory of types. The English mathematician Frank Plumpton Ramsey (1903-1930) offered criticism of the theory of types that was accommodated in later editions of Principia Mathematica. The result of this is the ‘deramified theory of types’ that will be treated in subsequent sections, together with a later simplification of this theory by the mathematicians Hilbert and Wilhelm Ackermann (1896-1962) from Germany.
The mathematician Alonzo Church also published articles on type systems, but did not develop his typed version of lambda calculus before the 1940’s, and his typed lambda calculus thereby falls outside the scope of this article (1870-1940). We will only summarize his work in this paragraph. The main difference between the type structure of Russell and that of Church is that the former is set-based with a linear ordering of types and the latter is function-based with a non-linear order of types. The type theory that emerged from Church’s lambda calculus (see section 9.2) was extended with simple types in 1940 to prevent paradoxes, similar to the extension of logical set theory with simple types by Russell in 1910. Church also proposed another logical set theory in 1974.

[..] in the simple theory of types it is well known that the individuals may be dispensed with if classes and relations of all types are retained; or one may abandon also classes and relations of the lowest type, retaining only those of higher type. In fact any finite number of levels at the bottom of the hierarchy of types may be deleted. But this is no reduction in the variety of entities, because the truncated theory of types, by appropriate deletions of entities in each type, can be made isomorphic to the original hierarchy - and indeed the continued adequacy of the truncated hierarchy to the original purposes depends on this isomorphism.

- A. Church in ‘The need for abstract entities’.

Organization of Principia Mathematica

The nearly 2,000-page Principia Mathematica starts with a short preface that explains what it wants to demonstrate, namely that pure mathematics can be based on logic alone and requires no other primitive notions. Russell classifies statements that involve logical constants only (such as the laws of reciprocity, see page 18 of Principia Mathematica) as pure mathematics, and other mathematical assertions that also refer to non-logical contents (such as the statement that (perceptual) space is three-dimensional) as part of applied mathematics. The belief was then expressed that pure mathematics was sufficient to include all traditional mathematics. Then, after an introduction, the first volume introduces a symbolic logic that is based on a small set of axioms, and then lays out the propositional and predicate calculi. Built upon these, Whitehead and Russell define types, sets, relations and their properties, and basic operations on sets. The second volume continues with a purely logical theory of cardinal and ordinal arithmetic. This allowed them to introduce basic arithmetic, including addition, multiplication and exponentiation of both finite cardinals and of relations.
The volume ends with a general theory of simply ordered sets (series) which is followed by a logical base of fundamental mathematical analysis, including subjects such as convergent sequences, continuity, limits and derivatives.
The third volume was meant to prepare the ground for the fourth and concluding volume on geometry (which was never completed), and contained a theory of numbers that was called ‘measurement’. It starts with a theory of well-ordered sets, finite, infinite and continuous series, the negative integers, ratios and the real numbers, and finally vectors, coordinates and basic geometric notions such as angles.
More details about the organization of Principia Mathematica and a critical assessment of its work can be found in [31, chapter 7, and specifically section 7.8].

The symbolic logic and notation of Principia Mathematica

Russell and Whitehead opted for the more modern notation of Peano instead of Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated functions as first-class citizens. A good introduction to the logical calculus and the specific notation that was used in Principia Mathematica can be found in [49, section 3.2 and 3.3] and [31, sections 7.2, 7.3, 7.7 and 7.8].

Russell’s theory of types

Russell’s 1908 paper included a categorization of most of the important contradictions of that time, and an analysis of their common characteristics. To prevent the paradoxes he catalogued, Russell formulated the vicious circle principle (see page 85) and implemented it using types in Principia Mathematica (see for details [31, section 7.9] and [49, section 3.2 and 3.3]).

What is a type?
A type is the range of significance of a propositional function, that is, the collection of arguments for which the said function is significant and has values.

The type of a variable in a proposition is fixed by all the values the function is concerned with, i.e. by the totality over which the variable ranges. This division of objects into types (the type of an object can be seen as a property of that object) is necessary to conform to the vicious circle principle, i.e. to make sure that ‘whatever contains an apparent variable must not be a possible value of that variable’. This can be established by making sure that whatever contains an apparent variable is of a different and higher type than the possible values of that variable. This linear order of types prevents vicious circles, since the variables contained in an object determine the type of that object.

Russell then defined an individual as being not a proposition but a constant, destitute of complexity. We can now categorize propositions by their types. First order propositions are elementary propositions that only contain individuals, second order propositions are propositions with first-order propositions as variables and possibly propositions of lower than first order types. This can be continued, such that the (n + 1)th order propositions contain propositions of order n and possibly others of order smaller than n.


We now also restrict relations like ∈ so that x ∈ y is only significant when y is of a type one level higher than x, and we confine quantifiers always to a single level. As can be proved however, this way of restricting propositions prevents the paradoxes but can in some cases be needlessly restrictive.

For more information about types in Principia Mathematica, see [31, section 7.9] and [49, section 3.3]. For a formalization (in modern notation) of Russell’s Ramified Theory of Types (RTT), we refer to [86, chapter 3]. In turn, this reference is partly based on [52], [53], [54] and [43], all of which in a certain context discuss RTT.
A detailed introduction to the (symbolic) logic and notation of Principia Mathematica, as well as a formal introduction to RTT, STT and NF and MP (see section 7.3), is to be included in a later version of this report.

7.2 Ramsey, Hilbert and Ackermann

Suppose a contradiction were to be found in the axioms of set theory. Do you seriously believe that a bridge would fall down?

- F.P. Ramsey, quoted in [58]

Ramsey published his first major work ‘The Foundations of Mathematics’ (see [69, page 105-142]) in 1925. In this publication he attempted to improve Principia Mathematica in two ways. First he proposed dropping the axiom of reducibility which, he writes, is “[...] certainly not self-evident and there is no reason to suppose it true; and if it were true, this would be a happy accident and not a logical necessity, for it is not a tautology.”. His second suggestion was to simplify Russell’s theory of types by regarding certain semantic paradoxes as linguistic. He accepted Russell’s solution to remove the logical paradoxes of set theory arising from, for example, ‘the set of all sets which are not members of themselves’. However, the semantic paradoxes such as ‘this is a lie’ are, Ramsey claims, quite different and depend on the meaning of the word ‘lie’. These he removed with his reinterpretation of the axiom of reducibility.
After his suggestions, Russell’s theory became known as the ramified theory of types (RTT), and Ramsey’s modification of the theory as the deramified theory of types.
For more detailed information about the history of deramification, we refer to [86, chapter 4].

Hilbert, together with Ackermann (see [2]), simplified Russell’s theory of types by removing the orders into what has become known as the ‘simple theory of types’ (STT). We quote from page 115 of [49]: “[In the simple theory of types,] every individual or individual variable is said to be of type i; and if a predicate or predicate variable ϕ(x1, . . . , xn) has arguments x1, . . . , xn, of types τ1, . . . , τn respectively, then ϕ(x1, . . . , xn) is said to be of type (τ1, . . . , τn). Thus, for example, any predicate with two individual arguments is of type (i, i), while a predicate with a single argument that is itself a predicate with two individual arguments is of type (i, i, (i, i)). Having introduced the hierarchy of types in this way, we shall now require bound variables to be of some definite type. Every quantifier will then range over the totality of all entities of the same type as the bound variable. When this is done, we have a very comprehensive logical calculus which is secure against vicious circularity”.
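The typing rule in the quotation (a predicate whose arguments have types τ1, . . . , τn has type (τ1, . . . , τn)) can be sketched with nested tuples. The encoding and the names `predicate_type`, `well_typed` and `order` are assumptions introduced here for illustration; in particular `order`, the nesting depth of a type, is a hypothetical helper and not a notion taken from [49]. The sketch shows why a predicate can never be applied to itself: a finite type cannot be equal to one of its own components.

```python
I = "i"  # the type of individuals

def predicate_type(*argument_types):
    """Type of a predicate whose arguments have the given types."""
    return tuple(argument_types)

def well_typed(pred_type, argument_types):
    """A predicate of type (t1, ..., tn) accepts exactly arguments of types t1, ..., tn."""
    return pred_type != I and tuple(argument_types) == pred_type

def order(t):
    """Nesting depth of a type: 0 for individuals, 1 + the deepest argument otherwise."""
    if t == I:
        return 0
    return 1 + max((order(a) for a in t), default=0)

binary = predicate_type(I, I)           # (i, i): two individual arguments
mixed = predicate_type(I, I, binary)    # (i, i, (i, i)): two individuals and one binary predicate

print(well_typed(binary, [I, I]))           # application to two individuals
print(well_typed(mixed, [I, I, binary]))    # a binary predicate as third argument
print(well_typed(binary, [binary]))         # self-application is rejected
print(order(binary), order(mixed))
```

Because every argument type is strictly shallower than the type of the predicate it is passed to, a quantifier that ranges over one definite type can never range over the predicate in which it occurs.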

A further discussion and formalization (in the form of Church’s simply typed lambda calculus λ→ c) of the simple theory of types can be found in [86].

7.3 Quine

Just as the introduction of the irrational numbers . . . is a convenient myth [which] simplifies the laws of arithmetic . . . so physical objects are postulated entities which round out and simplify our account of the flux of existence . . . The conceptional scheme of physical objects is [likewise] a convenient myth, simpler than the literal truth and yet containing that literal truth as a scattered part

- Quine, quoted in [50]

Willard Van Orman Quine (1908-2000) was an American mathematician who became interested in the work of Russell. An alternative to Russell’s system is one that allows a single universe of all types (or all sets). In Russell’s theory such an object is too big but according to others, including Quine, having a set of all sets or a type of all types is legitimate as long as we do not permit forming all subsets. If there is some restriction on which subsets can be formed, for example by requiring a stratified predicate to define the subset, then no contradiction will result. Quine proposed in [94, pages 80-101] a system called New Foundations, NF, based on this idea. To restrict the way subsets are formed, Quine restricted the comprehension axiom to:

NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not free in ϕ(y) and ϕ(y) is stratified

In [86, footnote 4], we find two definitions of stratification.

Definition of heterogeneous stratification: A well-formed formula ϕ is heterogeneously stratified := there is a function f from the variables and constants of ϕ to the natural numbers such that for each atomic well-formed formula F (x1, . . . , xn) of ϕ, f(F ) = 1 + (max : 1 ≤ i ≤ n : f(xi))

Definition of homogeneous stratification: A well-formed formula ϕ is homogeneously stratified := ϕ is heterogeneously stratified and for the corresponding function f we also have that f(xi) = f(xj) for 1 ≤ i, j ≤ n

With the NFC axiom Russell’s paradox is prevented, since the formula ϕ ≡ x ∉ x is not stratified.
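The claim that x ∉ x is not stratified can be checked mechanically. The following is a minimal sketch, assuming the common textbook reading of stratification for formulas built from membership and equality atoms (levels must satisfy f(y) = f(x) + 1 for each atom x ∈ y, and f(x) = f(y) for each atom x = y) rather than the two definitions quoted from [86]. The function name `stratify` and the tuple encoding of atoms are hypothetical. Negation and other connectives do not affect stratification, so only the atoms of a formula are passed in.

```python
from collections import deque

def stratify(atoms):
    """Try to assign a natural-number level to every variable.

    atoms: list of (x, rel, y) with rel in {"in", "eq"}.
    "in" demands level[y] == level[x] + 1, "eq" demands equal levels.
    Returns a dict of levels, or None if the formula is not stratified.
    """
    graph = {}
    for x, rel, y in atoms:
        step = 1 if rel == "in" else 0
        graph.setdefault(x, []).append((y, step))
        graph.setdefault(y, []).append((x, -step))
    level = {}
    for start in graph:
        if start in level:
            continue
        level[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, step in graph[u]:
                if v not in level:
                    level[v] = level[u] + step
                    component.append(v)
                    queue.append(v)
                elif level[v] != level[u] + step:
                    return None  # two constraints disagree
        # Shift each connected component so its lowest level is 0.
        low = min(level[v] for v in component)
        for v in component:
            level[v] -= low
    return level

print(stratify([("x", "in", "x")]))                     # atom of x ∉ x
print(stratify([("x", "in", "y"), ("y", "in", "z")]))   # a stratified formula
```

The first call fails because the single atom x ∈ x would require f(x) = f(x) + 1, which is exactly why the Russell predicate is excluded from NFC.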


We quote from [86, page 3]: “NF is weak for mathematical induction and the axiom of choice is not compatible with NF. We cannot prove Peano’s axiom[s] in it, unless we assume the existence of a class with m + 1 elements. Also, NF is said to lack motivation because its axiom of comprehension is justified only on technical grounds and one’s mental image of set theory does not lead to such an axiom. To overcome some of the difficulties, Quine adopted similar measures to NBG (Neumann-Bernays-Godel, see section 8.5) set theory[, and developed another non-iterative set theory called ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains a bifurcation of classes into elements and non-elements. Sets can enjoy the property of being full objects whereas classes cannot. ML was obtained from NF by replacing (NFC) by two axioms, one for class existence and one for elementhood. The rule of class existence provides [. . . ] the existence of the classes of all elements satisfying any condition ϕ, stratified or not. The rule of elementhood is such as to provide the elementhood of just those classes which exist for NF. Therefore, the two axioms of comprehension for ML [are]:
Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified with set variables only in which y does not occur free.
Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is any formula in which y does not occur free.
ML was liked both for the manipulative convenience we regain in it and the symmetrical universe it furnishes. It was however proved subject to the Burali-Forti paradox”.

For more information, we refer to [70], [71], [72] and the website http://diamond.boisestate.edu/~holmes/holmes/nf.html.

Chapter 8

Godel

The development of mathematics towards greater precision has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules. [. . .] It will be shown below that this is not the case, that on the contrary there are in the two systems mentioned [viz. Principia Mathematica and ZF] relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.

- K. Godel, in the opening of the paper introducing the incompleteness theorem (1931)

8.1 Informally: Godel’s incompleteness theorems

No system of Hilbert’s type in which the integers (or Peano’s arithmetic, see section 4.1) can be defined can be both consistent and complete. At the time this seemed unreal, but in 1931 Kurt Godel (born in 1906 in Brünn, Austria-Hungary, now Brno, Czech Republic) presented mathematicians with the astounding and melancholy conclusion that the axiomatic method has certain limitations, which rule out the possibility that even ordinary arithmetic (as axiomatized by Peano) can ever be fully axiomatized. As a corollary of this theorem, he proved that it is impossible to establish the internal logical consistency of a very large class of deductive systems. It provoked a reappraisal of philosophies of mathematics.


Godel’s famous incompleteness theorem and the corresponding corollary are also called the first and the second incompleteness theorem. Godel was able to show that, if an axiomatic system of formalized arithmetic is wide enough, then

1. The system is necessarily incomplete, in the sense that there exists a formula ϕ of the system such that neither ϕ nor its negation is derivable (see also section 8.2 for the definition of incompleteness), and

2. If the system is consistent, then no proof of its consistency is possible which can be formalized within it (see also section 8.2 for the definition of consistency).

We first indicate (in 8 steps, following the lines of the original proof of Godel) the main lines of both theorems in this section, and provide a more rigorous and exact proof of the theorems in section 8.2 and further sections.

1 The (syntax of) formulas of an axiomatic system are precisely defined and built up from a finite alphabet of symbols. Proofs are nothing but finite series of formulas and can be replaced by numbers. With such a representation, the Godel numbering, Godel gave a well-ordering of all well-formed formulae of an axiomatic system S (to be precise, of ω-complete systems, see section 8.2 for more details). Godel then showed how to represent metamathematical concepts such as ‘formula’, ‘proof-schema’ and ‘provable formula’ by a series of natural numbers. We define gn(ϕ) to be the Godel number corresponding to the well-formed formula ϕ of S.

2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable formula. Precisely, we define prov(ϕ) := ‘ϕ is a provable formula’. A class sign is a formula with just one free variable. We suppose that the class signs are ordered by a function R with domain N, such that R(n) is defined as the nth class sign. By [R(n); q] we denote the formula which is obtained by replacing the free variable in R(n) by q.

3 We now define a set K of natural numbers by n ∈ K ↔ ¬prov([R(n); n]). Since the symbols that are used in this formula are all definable in S, there also is a formula with one free variable (i.e. a class sign) that denotes n ∈ K, for some natural number n. We call this class sign C. So there is a natural number q such that C = R(q). We now show that the proposition G ≡ [R(q); q] is unprovable in S. Since¹ this formula says that q ∈ K, that is ¬prov([R(q); q]), we can say that G is a proposition that asserts of itself that it is not provable.

4 We show that G is provable ↔ ¬G is provable, and hence that G is undecidable:

• Suppose G is provable. This means [R(q); q] is provable (by replacing the variable in the class sign by q), that is q ∈ K, i.e. ¬prov([R(q); q]), and this says ¬prov(G): G is not provable.

• Suppose ¬G is provable. This means the negation ¬[R(q); q] is provable (by replacing the variable in the class sign C by q), that is q ∉ K, i.e. ¬¬prov([R(q); q]), and this is equivalent to prov([R(q); q]) or prov(G): G is provable.

A proof of G leads to a proof of ¬G and vice versa, so if either were provable the system S would be inconsistent. So if we assume that S is consistent, then neither G nor ¬G can be provable: G is undecidable in S.

5 By a metamathematical consideration we know however that G is true. From the remark that G asserts its own unprovability, it follows at once that G is true, since G is unprovable (because undecidable). So there is a true statement in S (namely G) that is not provable: the system S is incomplete!

6 If we add G as an axiom, we can again apply the argument given in the previous five steps in the same way. Basically we then create another formula G′, since in step 3 a proposition is defined that states ‘this formula is not provable’, or in other words ‘this formula does not follow from the axioms’. That means that the proposition depends on the set of axioms. Therefore, as I. Grattan-Guinness cleverly calls it in [31, page 510], the system S is ‘essentially incompleteable’.

7 Godel then showed that ‘if arithmetic is consistent, it is incomplete’. We want to prove this conditional statement as a whole. We define the condition of the statement by A: ‘arithmetic is consistent’. We have already seen in section 6.1 that this means that there is at least one formula ϕ of arithmetic that is not provable. So we can express A ≡ (∃y :: (∀x :: ¬(x is a proof of y))). A system is incomplete if there is a true statement that is not provable. Thus we can represent the conclusion of the conditional statement by G.

¹ By replacing in the class sign C, which expresses that n ∈ K for some natural number n, the free variable by q.

8 We can now formally prove A → G (see section 8.2 for the proof). This means that if A is provable, we know (by modus ponens or the rule of detachment) that G is provable. But we already saw that (unless S is inconsistent), G is not provable; thus if S is consistent, A is not provable! That means that if arithmetic is consistent, its consistency cannot be established by metamathematical reasoning within the formalism of arithmetic (this is Godel’s theorem 11, see [93, page 614]). Or, as expressed in [31, page 510], ‘any set S of consistent formulae of PM cannot include the formula F asserting its consistency’.


8.2 Formally: Godel’s Incompleteness Theorems

The first incompleteness theorem says that Principia Mathematica or any other system in which arithmetic can be developed, is essentially incomplete, that is, in any consistent set of arithmetical axioms there are statements that are true but cannot be derived from the set.

The second theorem says that it is impossible to give a metamathematical proof of the consistency of a system comprehensive enough to contain the whole of arithmetic - unless the proof itself employs rules of inference in certain essential respects different from the derivation rules identifying theorems within the systems.

In the following two paragraphs, we will first give an abstract version of Godel’s first and second incompleteness theorem, investigate the set of languages that the theorem applies to, and then in the third paragraph fill in the details by giving a specific Godel numbering for arithmetic. Then in the next sections we will apply the theorem to the system of Peano Arithmetic and that of Principia Mathematica, and discuss the consequences of the incompleteness theorem.

8.2.1 On formally undecidable propositions

We assume there is an STGA language L and investigate the conditions on a system L under which Godel showed that there is a true sentence that is not provable in L (i.e. (∃t : t ∈ T : t ∉ P)). We define the following concepts:
A predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A).
A is expressible in L := A is expressed by some predicate of L.
Note that expressibility in L concerns only T, and not P and R.

Theorem: Not every set of numbers is expressible.
Proof: (from [84]) Since L is built up of a finite number of symbols and derivation rules, there are only denumerably many expressions or predicates of L. But (by Cantor’s theorem, see page 69) there are non-denumerably many sets of natural numbers. Therefore, not every set of numbers is expressible in L.


Let gn be a function that assigns to each expression a unique natural number (just as in step 1 in section 8.1, i.e. gn is a bijection between E and N). For any E ∈ E, we also call gn(E) the Godel number of E. We will give a specific numbering in section 8.2.3. For this abstract treatment the only assumption² we make is that every number is the Godel number of some expression.

We define En to be the inverse of gn, i.e. gn(En) = n. The diagonalization of En, for En ∈ H, is defined by En(n). We define d(n) to be the Godel number of the diagonalization of En, that is: d(n) := gn(En(n)), and call d the diagonal function of the system. For each set of natural numbers A, we define A∗ to be the set of all numbers n such that d(n) ∈ A, i.e. we have n ∈ A∗ ↔ d(n) ∈ A. For any set of natural numbers A, we define its complement Ã to be the set of all natural numbers not in A. The complement operation ∼ binds stronger than the ∗, i.e. Ã∗ is to be read as (Ã)∗.
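The definitions of gn, En, d and A∗ can be made concrete on a toy language. The following is a minimal sketch under loud assumptions: the alphabet, the convention that every string over it counts as an expression, and the convention that applying an expression to n means appending the decimal numeral of n in parentheses are all hypothetical and chosen only so that gn is a genuine bijection with the natural numbers, as assumed in the text. Sets of numbers are represented by their membership tests.

```python
ALPHABET = "0123456789()H~"   # hypothetical toy alphabet
K = len(ALPHABET)

def gn(expr):
    """Bijective base-K numeration: every string gets a unique number."""
    n = 0
    for ch in expr:
        n = n * K + ALPHABET.index(ch) + 1
    return n

def E(n):
    """Inverse of gn: the expression E_n whose Godel number is n."""
    chars = []
    while n > 0:
        n, r = divmod(n - 1, K)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def diagonalize(n):
    """E_n(n): the expression E_n applied to its own Godel number."""
    return E(n) + "(" + str(n) + ")"

def d(n):
    """The diagonal function d(n) = gn(E_n(n))."""
    return gn(diagonalize(n))

def star(A):
    """Membership test of A*: n is in A* iff d(n) is in A."""
    return lambda n: A(d(n))

def complement(A):
    """Membership test of the complement of A."""
    return lambda n: not A(n)

h = gn("H")
print(h, E(h), diagonalize(h))
```

Because the numeration is bijective, `E(gn(s)) == s` and `gn(E(n)) == n` hold for every string and every natural number, which is the only property of gn that the abstract argument relies on.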

Abstract form of Godel’s first theorem: Let P be the set of Godel numbers of all the provable sentences, and let P̃ denote its complement. If the set P̃∗ is expressible in L and L is correct, then there is a true sentence of L not provable in L.
Proof: (based on [84]) Suppose L is correct and P̃∗ is expressible in L by a predicate H with Godel number h. Let G be the diagonalization of H (i.e. the sentence H(h)). We show that G is true but not provable in L. H expresses P̃∗ in L, i.e. H(n) is true ↔ n ∈ P̃∗ for all n ∈ N. In particular, H(h) is true ↔ h ∈ P̃∗. We have that h ∈ P̃∗ ↔ d(h) ∈ P̃ ↔ d(h) ∉ P. But since h is the Godel number of H, by the definition of d, d(h) is the Godel number of H(h), and so d(h) ∈ P ↔ H(h) is provable in L and d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h) is not provable in L. This means that H(h) is either true and not provable in L or false but provable in L. The latter alternative violates the hypothesis that L is correct. Hence it must be that H(h) is true but not provable in L.

Note that in this proof we have not defined the set T by a model but determined the truth of G by a metamathematical argument, just as we have seen in step 5 of section 8.1, that is nevertheless commonly accepted by all mathematicians. Note also that the proposition G corresponds to the proposition G of point 3 of section 8.1, since H(h) is a proposition that expresses of itself that it is not provable.

² This assumption is made for technical reasons that simplify the proof; Godel’s original numbering did not have this restriction.

Theorem: If L is correct and if the set P̃∗ (the ∗ of the complement of P) is expressible in L, then L is incomplete.
Proof: A system L that is correct and for which the set P̃∗ is expressible in L contains a sentence G that is true but neither provable nor refutable (by the previous theorem and the assumption of correctness). Hence G is true but undecidable in L, and hence L is incomplete.

That is where the name incompleteness theorem comes from. By this theorem, it follows immediately that if a system is consistent, and the set P̃∗ is expressible in that system (which we will later see is true for a system of basic arithmetic), then it is incomplete. Note that this is the statement A → G of point 8 in section 8.1.
When we study a particular language L, such as a system containing Peano’s arithmetic or the system of Principia Mathematica, we have to verify the assumption that P̃∗ is expressible in L. We can do this by separately verifying the following conditions.

G1 : For any set A expressible in L, the set A∗ is expressible in L.

G2 : For any set A expressible in L, the complement Ã is expressible in L.

G3 : The set P is expressible in L.

Theorem: G1 ∧ G2 ∧ G3 → P̃∗ is expressible in L.
Proof: G1 and G2 imply that for any expressible set A, the set Ã∗ (the ∗ of the complement of A) is expressible in L. In particular we then have that if P is expressible in L (i.e. G3 holds), P̃∗ is expressible in L.

Before we prove a general form of Godel’s second incompleteness theorem, we introduce some more definitions.

A sentence En is a Godel sentence for a set A of natural numbers if either En is true and its Godel number lies in A, or En is false and its Godel number lies outside A, i.e. En is a Godel sentence for A if and only if En ∈ T ↔ n ∈ A.

Diagonal Lemma: For any set A, if A∗ is expressible in L, then there is a Godel sentence for A.


Proof: Suppose H is a predicate that expresses A∗ in L; let h be its Godel number. Then d(h) is the Godel number of H(h). For any number n, H(n) is true ↔ n ∈ A∗; therefore, H(h) is true ↔ h ∈ A∗ ↔ d(h) ∈ A, and since d(h) is the Godel number of H(h), H(h) is a Godel sentence for A.

Lemma: If L satisfies G1, then for any set A expressible in L, there is a Godel sentence for A.
Proof: L satisfies G1, thus for any expressible set A, A∗ is expressible in L. Now we can apply the previous lemma to conclude that there is a Godel sentence for A.

With the diagonal lemma we can also prove the first theorem as follows: Since P̃∗ (the ∗ of the complement of P) is expressible in L, by the diagonal lemma, there is a Godel sentence G for P̃. A Godel sentence for P̃ is a sentence which is (by the definition of a Godel sentence) true if and only if it is not provable in L. So for any correct system L, a Godel sentence for P̃ is a sentence which is true but not provable in L.

8.2.2 The impossibility of an ‘internal’ proof of consistency

With the diagonal lemma we can also prove a general form of Godel’s second theorem, that was first formulated in this form by the Polish mathematician Alfred Tarski.

A general form of Godel’s second theorem (by Tarski)

1. The set T̃∗ (the ∗ of the complement of T) is not expressible in L

2. If condition G1 holds, then T̃ is not expressible in L

3. If conditions G1 and G2 both hold, then the set T is not expressible in L (i.e. for systems for which G1 and G2 hold, truth within the system is not definable within the system.)

Proof: To begin with, there cannot possibly be a Godel sentence for the set T̃, because such a sentence would be true if and only if its Godel number was not the Godel number of a true sentence, and this is absurd.


1. If T̃∗ were expressible in L, then by the diagonal lemma, there would be a Godel sentence for the set T̃, which we have just shown is impossible. Therefore, T̃∗ is not expressible in L.

2. Suppose condition G1 holds. Then if T̃ were expressible in L, the set T̃∗ would be expressible in L, violating (1).

3. If G2 also holds, then if T were expressible in L, then T̃ would also be expressible in L, violating (2).

Now that we have seen both theorems in a general form, we will consider particular mathematical languages, starting with first order arithmetic, which we can build on in section 8.3 to prove the incompleteness of systems based on Peano’s arithmetic and other systems.

8.2.3 Godel numbering and a concrete proof of G1, G2 and G3

This section will be completed in a later version of this document. For the moment we refer to Godel’s original work that can be found in [93].
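Until then, the idea behind a prime-power numbering can be illustrated as follows. This is a minimal sketch, not Godel’s original assignment: the symbol list and the codes in `CODE` are hypothetical, and the scheme shown is injective but not surjective, so it does not satisfy the simplifying assumption of section 8.2.1 that every number is a Godel number. A string of symbols with codes c1, c2, . . . , ck is mapped to 2^c1 · 3^c2 · . . . · pk^ck, and unique prime factorization recovers the string.

```python
SYMBOLS = ["0", "s", "+", "*", "=", "(", ")", "~", "x"]   # hypothetical alphabet
CODE = {sym: i + 1 for i, sym in enumerate(SYMBOLS)}      # codes 1, 2, 3, ...

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(formula):
    """Encode a symbol string as a product of prime powers."""
    g = 1
    for p, sym in zip(primes(), formula):
        g *= p ** CODE[sym]
    return g

def decode(g):
    """Recover the symbol string from its number by factoring."""
    out = []
    for p in primes():
        if g == 1:
            break
        e = 0
        while g % p == 0:
            g //= p
            e += 1
        if e == 0 or e > len(SYMBOLS):
            raise ValueError("not the number of a symbol string")
        out.append(SYMBOLS[e - 1])
    return "".join(out)

n = godel_number("0=0")
print(n, decode(n))
```

A proof, being a finite sequence of formulas, can be numbered in the same way by using the numbers of its formulas as exponents.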


8.3 Godel’s theorem and Peano Arithmetic

The classification of the various modes of syllogisms, when they are exact, has little importance in mathematics. In the mathematical sciences are found numerous forms of reasoning irreducible to syllogisms.

- G. Peano in [68, page 379]

There are various incompleteness proofs of Peano Arithmetic (with and without exponentiation). We mention three of them. The simplest uses a truth set defined by Tarski and shows that every axiomatizable subsystem of N (the complete theory of arithmetic) is incomplete. This proof of Godel’s first theorem however cannot be formalized in arithmetic (since the truth set is not expressible in arithmetic), and is based on the underlying assumption that Peano Arithmetic is correct, implying that every sentence provable in Peano Arithmetic is a true sentence. Godel’s original incompleteness proof involves the much weaker assumption of ω-consistency.

Definition of simple consistency: An axiomatic system A is simply consistent := no sentence is both provable and refutable in A

Definition of ω-inconsistent: An axiomatic system A is ω-inconsistent := there is a predicate F (w) (in one free variable w) such that the sentence (∃w :: F (w)) is provable but all the sentences F (0), F (1), . . . are refutable

Definition of ω-incomplete: An axiomatic system A is ω-incomplete := there is a predicate F (w) (in one free variable w) such that all the sentences F (0), F (1), . . . are provable but the sentence (∀w :: F (w)) is not provable

Godel’s original proof was based on the assumption of ω-consistency and shows that every axiomatizable ω-consistent system in which all true Σ0-sentences are provable is incomplete. This proof is of course formalizable in Peano Arithmetic (and this is necessary for Godel’s second theorem) and also shows that any axiomatic system A that is simply consistent and in which all true Σ0-sentences are provable, is ω-incomplete.
The third proof (1936) is due to Rosser and uses the even weaker assumption of simple consistency. It is based on an axiomatic system by the American mathematician Raphael Robinson (1911-1995), that we refer to as R. It shows that every axiomatizable simply consistent extension of R is incomplete, but to that end uses a more elaborate sentence than the Godel sentence G, which asserts its own unprovability.

We intend to include the three proofs in a later version of this document. They can be found in [84], but in a particular presentation that does not use the concept of a model for axiomatic systems, and that sometimes attaches different meanings to established definitions. Nevertheless it contains in our opinion one of the best discussions of Godel’s incompleteness theorems.
In a later version of this document we will also show how, given the proof of incompleteness of Peano Arithmetic, Godel’s theorems apply to Principia Mathematica.

We quote K. Godel on the first page of [27]:

“The most comprehensive formal systems that have been set up hitherto are the system of Principia Mathematica on the one hand and the Zermelo-Fraenkel axiom system of set theory (further developed by J. von Neumann) on the other. These two systems are so comprehensive that in them all methods of proof today used in mathematics are formalized, that is, reduced to a few axioms and rules of inference. One might therefore conjecture that these axioms and rules of inference are sufficient to decide any mathematical question that can at all be formally expressed in these systems. It will be shown that this is not the case, that on the contrary there are in the two systems mentioned relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms”.


8.4 Consequences

I had a lot of conversations with him [Godel] and a lot of disagreements. Like most others, I was hard to convince about the incompleteness theorem. There was at the time a tendency, which I shared, to think that it was special to a certain type of formalization of logic and that a radical reformalization might have the effect that the Godel argument did not apply. I persisted in that longer than I should have, and he was always trying to convince me otherwise.

- A. Church in an interview at Princeton University (1985)

In a later version of this document we will discuss the implications of Godel’s theorem and show the reactions that followed the publication of his paper [27] in 1931.


8.5 Neumann-Bernays-Godel axioms

There is an infinite set A that is not too big.

There’s no sense in being precise when you don’t even know what you’re talking about.

- John von Neumann (sources unknown)

Let us recapitulate the situation of the axiomatic theory of sets before we introduce the Neumann-Bernays-Godel theory.

When Cantor introduced his set theory, he gave the informal definition (see page 16) of a set being ‘any comprehension into a whole M of definite and separate objects m of our intuition or thought’. After Hilbert proposed his proof theory, set theory was given a more rigorous basis, and axiomatic theories for Cantor’s sets were proposed. Cantor’s definition was replaced by the principle of comprehension (see page 16), which was adopted by Frege and Russell. Based on this principle a first formal theory of sets, called ‘ideal calculus’, was developed (not treated in detail here, see for example [36]). The antinomies of Burali-Forti and Russell however showed that this theory was inconsistent, and one way to restore consistency was to incorporate in the system a theory of types, as was done by Russell. At the same time, intuitionists tried to do mathematics without Cantor’s set theory at all. Others tried to overcome the inconsistencies by making Cantor’s set theory more rigidly axiomatic, and the most successful axiomatization of set theory was presented by Zermelo in 1908.

His task was to axiomatize set theory in such a way that all contradictions are excluded but the system is still sufficiently wide for all that is valuable in this theory to be preserved. As we have seen in section 5.3, Zermelo postulated a domain of abstract objects (sets) and elements of this domain, defined the primitive notions of ‘equality’ and the ‘is element of’ relation, and introduced 7 axioms. The comprehension axiom was replaced by the weaker separation axiom, that only allows new sets to be created from existing sets and with definite predicates. Before we describe why the Hungarian mathematician von Neumann opposed this solution and came up with his own solution to the paradoxes, we will look at this separation axiom in more detail. Zermelo defined the separation axiom as follows:

Separation axiom:
(∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), where ϕ is definite and does not contain y.
For every set z there exists a set y whose elements are exactly those of z having the property ϕ.

The concept of definiteness in this axiom was defined by Zermelo as follows: “A question or assertion ϕ, the validity or invalidity of which is decided without arbitrariness by the basic laws of logic, is said to be ‘definite’ ”.
We have already seen on page 93 that this axiom excludes the paradoxes of Russell and Burali-Forti, and as Kneebone remarks³ in [49, page 263] also the semantic paradoxes.

In [83], the Norwegian mathematician Skolem pointed out that the definition of ‘definiteness’ was rather vague and he made precise the formulation of ‘by the basic laws of logic’. Fraenkel used Skolem’s idea to formulate the separation axiom in a new way (for details, see [49, page 290, 291]). In 1922 Fraenkel proposed the introduction of another axiom that allows the existence of larger cardinal numbers than hitherto possible. The foundation axiom of von Neumann makes the occurrence of so-called extraordinary sets impossible. A set V1 is extraordinary if there is an infinite sequence of sets V1, V2, V3, . . . such that V2 ∈ V1, V3 ∈ V2, etc. Von Neumann’s subsequent interest in set theory led to the second major axiomatization of set theory in the 1920s.

His formulation differed considerably from that of Zermelo and Fraenkel (see section 5.3) because the notion of function, rather than that of set, was taken as primitive. In a series of papers beginning in 1937, however, the Swiss logician Paul Bernays, a collaborator of the formalist David Hilbert, modified the von Neumann approach in a way that put it in much closer contact with Zermelo and Fraenkel. In 1940, the Czech-born Kurt Godel, known for his incompleteness proof (see section 8.1), further simplified the theory. This version is known as the Neumann-Bernays-Godel (NBG) axioms.

³ We quote: “since a definite property is one that is decidable by the basic relations of the domain B [of sets, the abstract objects postulated by Zermelo], no such property as that of being definable in a finite number of words can be used in the definition of a set, and the semantic paradoxes are thus also excluded”.


Before we give the axioms, it is convenient to adopt the undefined notions of class and the membership relation (though, as is also true in Zermelo and Fraenkel, ∈ suffices). In the axioms we distinguish between the use of capital Latin letters and lowercase Latin letters for the variables. The capital letters stand for variables that take classes (the totalities corresponding to certain properties) as values. A class is defined to be a set if it is a member of some class; those classes that are not sets are called proper classes. The lowercase letters are used as special restricted variables for sets.

Example: ‘for all x, A(x)’ stands for ‘for all X, if X is a set, then A(X)’; i.e. the condition holds for all sets. Intuitively, sets are intended to be those classes that are adequate for mathematics, and proper classes are thought of as those collections that are ‘so big’ that, if they were permitted to be sets, contradictions would follow. In the Neumann-Bernays-Godel axioms, the classical paradoxes are avoided. This can be proven by showing in each case that the collection on which the paradox is based is a proper class, i.e. is not a set.

Theorem: With the Neumann-Bernays-Godel axioms, the derivation of Russell’s paradox does not apply.

Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not a set. For all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. We prove by contradiction that R is not a set.
Suppose R is a set. Taking R for y in the above statement gives R ∈ R ↔ R is a set ∧ R ∉ R. Suppose R ∈ R. Then R ∉ R: contradiction. So we must have R ∉ R. Then by our assumption we have R is a set ∧ R ∉ R, and thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get a contradiction, our assumption that R is a set must be wrong.

The Neumann-Bernays-Godel axioms (NBG):

1 Extensionality axiom (or axiom of determination):
(∀X, Y :: (∀z :: z ∈ X ↔ z ∈ Y ) → X = Y )
Classes are uniquely determined by their members, to be exact: if every element (that is a set) of a class X is at the same time an element of Y , and conversely, then X = Y .


2 Axiom of the empty set:
(∃x∀y :: y ∉ x)
There is an (improper, see also footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.

3 Axiom for class formation:
(∃Y ∀x :: x ∈ Y ↔ ϕ(x)), where ϕ is a proposition in which the existential and universal quantifiers are applied to set variables only.
There exists a class Y whose elements are exactly those sets x having the property ϕ.

4 Pairing axiom:
(∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b))
Given two sets a and b there exists a set whose elements are exactly a and b.

5 Sum-set axiom or Union axiom:
(∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w))
For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.

6 Power set axiom:
(∀z∃y∀x :: x ∈ y ↔ x ⊆ z)
For every set z there is a set y whose elements are exactly the subsets of z.

7 Axiom of infinity:
(∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : a ∪ {a} ∈ z))
There exists a successor set, i.e. a set that contains ∅ and with each element a also its successor a ∪ {a}.

8 Axiom of choice:
(∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a)))
Every set x has a choice function.

9 Axiom of replacement or axiom of substitution (by Fraenkel):
(∀x∃!y : ϕ is a class : ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y))))
The image of a set under an operation (functional property) is again a set.

10 Axiom of restriction:
X ≠ ∅ → (∃y : y ∈ X : X ∩ y = ∅)
Every non-empty class is disjoint from one of its elements.
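Several of these axioms describe operations that can be tried out on hereditarily finite sets. The following is a minimal sketch, assuming sets are modelled as nested Python `frozenset` values; all function names are hypothetical, and the successor operation a ∪ {a} is the usual von Neumann convention, adopted here as an assumption. Proper classes, the axiom of infinity and the axiom of choice for infinite sets are outside what such a finite model can show.

```python
from itertools import combinations

EMPTY = frozenset()                       # axiom 2: the empty set

def pair(a, b):
    """Axiom 4: the set whose elements are exactly a and b."""
    return frozenset({a, b})

def union(z):
    """Axiom 5: all objects occurring in at least one element of z."""
    return frozenset(x for w in z for x in w)

def power_set(z):
    """Axiom 6: the set of all subsets of z."""
    items = list(z)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def successor(a):
    """The successor a ∪ {a} used in the axiom of infinity."""
    return a | frozenset({a})

def separation(z, phi):
    """Zermelo's separation: the elements of z with property phi."""
    return frozenset(x for x in z if phi(x))

def has_disjoint_element(X):
    """Axiom 10 for a non-empty X: some element y has X ∩ y = ∅."""
    return any(not (X & y) for y in X)

one = successor(EMPTY)                    # {∅}
two = successor(one)                      # {∅, {∅}}
print(len(power_set(two)), has_disjoint_element(two))
```

Every non-empty set built this way satisfies the axiom of restriction, since a set can only be constructed from sets that already exist.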


The axioms 1, 3, 9 and 10 are different from ZF. The third axiom (scheme) is presented in a form to facilitate a comparison with the third axiom (scheme) of ZF. In a detailed development of NBG, however, there appears, instead, a list of seven axioms (not schemes) stating that for each of certain conditions there exists a corresponding class of all those sets satisfying the condition. From this finite set of axioms, each instance of the above scheme can be obtained as a theorem. When obtained in this way, the third axiom scheme of NBG is called the class existence theorem.

In contrast to the ninth axiom scheme of ZF (see section 5.3.2), that of NBG is not an axiom scheme but an axiom. Thus, with the comments above about the third axiom in mind, it follows that NBG has only a finite number of axioms. On the other hand, since the ninth axiom scheme of ZF provides an axiom for each formula, ZF has infinitely many axioms. The finiteness of the axioms for NBG makes the logical study of the system simpler.

The relationship between the theories may be summarized by the statement that ZF is essentially the part of NBG that refers only to sets. We give the following theorems without proof:

Theorem: Every theorem of ZF is a theorem of NBG

Theorem: Any theorem of NBG that speaks only about sets is a theorem of ZF

Theorem: ZF is consistent if and only if NBG is consistent

Note that the fact that NBG avoids the classical paradoxes and that there is no apparent way to derive any one of them in ZF does not settle the question of the consistency of either theory. All we know from the last theorem is that either both theories are consistent, or both are inconsistent.

Chapter 9

Church and Turing

9.1 Turing and Turing Machine

We may hope that machines will eventually compete with men in all purely intellectual fields.

- Alan Turing in [38, page 46]

Alan Mathison Turing (1912-1954) was an English mathematician and logician who was a pioneer in the field of computer theory and who contributed important logical analyses of computer processes. Turing studied in Cambridge, worked there on probability theory and (independently of de Moivre) discovered the central limit theorem. In 1936 he won the Smith’s Prize. As we have seen in the previous chapters, many mathematicians had attempted to eliminate all possible error from mathematics by establishing a formal, or purely algorithmic, procedure for establishing truth (the so-called formalist program). With his incompleteness theorem (see section 8.1), Kurt Godel threw up an obstacle to this effort, for he showed that any useful mathematical axiom system is incomplete in the sense that there must exist propositions whose truth can never be decided (within the system). Turing was motivated by Godel’s work to seek an algorithmic method of determining whether any given proposition was undecidable, with the ultimate goal of eliminating such propositions from mathematics. Instead, he proved in his seminal paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ (reprinted in [19]) that there cannot exist any such universal method of determination. We now consider this decision problem, or Entscheidungsproblem, in more detail.

Decidability was one of Hilbert’s requirements for an axiomatic system (see section 6.1). The problem of decidability asks whether, given a mathematical proposition, one can find an algorithm that decides if the proposition is true or false. Given an algorithm, it is easy to see that it decides certain propositions. It is much more difficult to prove that no algorithm can decide certain propositions. To this end Turing introduced a hypothetical computing device (later called the Turing machine). The Turing Machine and the proof of undecidability are given later in this section.

After this important publication Turing completed his Ph.D. in 1938 on systems of logic based on ordinals, under the direction of Alonzo Church (see section 9.2). During the war Turing worked on breaking German Enigma codes, and in 1948 he worked in Manchester on the construction of a new digital computer. He described a modern computer before technology had reached the point where its construction was a realistic possibility. His efforts in the construction of early computers and the development of early programming techniques were of prime importance. He also championed the theory that computers could eventually be constructed that would be capable of human thought, and he proposed a simple test, now known as the Turing test, to assess this capability. Turing’s papers on the subject are widely acknowledged as the foundation of research in artificial intelligence. In 1952 Turing published the first part of his theoretical study of morphogenesis, the development of pattern and form in living organisms.

The Turing Machine

Turing introduced his hypothetical computing device in 1936. He originally conceived the machine as a mathematical tool that could infallibly recognize undecidable propositions - i.e., those mathematical statements that, within a given formal axiomatic system (that includes at least arithmetic), can be neither proved nor disproved. Gödel had demonstrated that such propositions exist in any such system. Turing instead proved that there can never exist any universal algorithmic method for determining whether a proposition is undecidable. This was left open by Gödel, since the incompleteness theorem (see section 8.1) only stated that consistency and completeness could not be attained at the same time; that means there were statements (in consistent systems) about numbers, indubitably true, which could not be proved from finitely many rules. But the decidability of mathematical statements was not settled by Gödel’s theorem, because the formulation of the problem needs a formal definition of (algorithmic) method (or a definition of the notion of algorithm in the definition of decidability in section 6.1). To this end Turing introduced a machine that was later to be called the Turing machine, an idealized mathematical model that reduces the logical structure of any computing device to its essentials. By abstracting the essential features of information processing, Turing was instrumental in the development of the modern digital computer. His model served as a basis for all subsequent digital computers, which share his basic scheme of an input/output device (tape and head), memory (tape) and central processing unit (head and transition function).

Nowadays there are many models of computing devices available in the theory of computation (complexity). We will not cover restricted models such as finite automata and pushdown automata (and corresponding notions such as regular languages and context-free grammars). We now directly introduce the much more powerful model of Turing that we need to investigate all mathematical problems.

The Turing Machine model uses an infinite tape as its unlimited memory, and has a tape head that can read and write symbols (of a set Γ) and move along the tape (to the L(eft) or R(ight)). We here assume the tape is right-infinite; this means the tape continues infinitely to the right but has a left-most position. Initially the tape contains an input string of symbols from an input alphabet Σ and is blank (i.e. filled with a special blank symbol ⊔) everywhere else. The Turing Machine is always in some state q of a set of states Q, and starts in an initial state q0. It uses a transition function δ that determines how it gets from one configuration (that is, the current state, the tape contents and the head location) to the next. A transition consists of writing a symbol of the tape alphabet Γ to the tape and moving the tape head either Left or Right, and depends on the current state and the current symbol on the tape. The computation (i.e. the sequence of transitions) continues until the Turing Machine enters either the (final) state q_accept or the (final) state q_reject. We can define a Turing Machine (sometimes called deterministic, since each transition is determined uniquely given the configuration) formally as a septuple:


Definition of a Turing Machine (TM):
A Turing Machine (TM) := (Q, Σ, Γ, δ, q0, q_accept, q_reject) with:

1 Q is a finite set of states.

2 Σ is a finite input alphabet not containing the special blank symbol ⊔.

3 Γ is a finite tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ.

4 δ is the transition function, where δ is finite and δ : Q × Γ → Q × Γ × {L, R}.

5 q0 is the start state, where q0 ∈ Q.

6 q_accept is the accept state, where q_accept ∈ Q.

7 q_reject is the reject state, where q_reject ∈ Q and q_reject ≠ q_accept.

We call configurations accepting configurations if the state is q_accept, rejecting configurations if the state is q_reject, and halting configurations if the state is either q_accept or q_reject. A start configuration C on input w is a configuration with state q0, with the head on the leftmost position of the tape, and with just w on the tape.

After defining the Turing Machine, Turing made his famous proposal (known as Turing’s thesis, see also section 9.3) for the concept of ‘computability by a Turing machine’. The proposal says that whenever there is an effective method for obtaining the values of a mathematical function (i.e. it is intuitively or effectively computable), the function can be computed by a Turing Machine. The converse claim is trivial, and if the thesis is correct we can reduce problems of the (non-)existence of effective methods to problems of the (non-)existence of Turing Machines. We quote one of Turing’s formulations from [90]:

Turing’s Thesis: LCM’s [Logical Computing Machines, Turing’s expression for Turing Machines] can do anything that could be described as “rule of thumb” or “purely mechanical”.

We now introduce more of Turing’s theory of Turing Machines before we give his proof of undecidability.


We define a language to be a set of strings, a string being a finite sequence of alphabet symbols (i.e. w ∈ Σ∗ for all strings w). We say that a TM M accepts input string w if a sequence of configurations C_1, . . . , C_k exists where

1 C_1 is the start configuration of M on input w.

2 Each C_i yields C_{i+1} via the transition function δ of M.

3 C_k is an accepting configuration.

The set of strings that M accepts is called the language of M.

Definition of the language of a TM: The language of a TM M, notation L(M) := { w | w is a string that M accepts }.

Let w ∈ Σ∗. We now define a notion that covers the ability of a TM to end in the accept state when started with any string of a certain language.

Definition of Turing-recognizable: A language L is Turing-recognizable := there exists a TM M (which is then said to recognize L) such that for all strings w:

1 with input w, M stops in q_accept if w ∈ L and

2 with input w, M stops in q_reject or does not stop (loops) if w ∉ L.

If language L is recognized by a TM M we say that M is an acceptor for L. We distinguish between recognizing and deciding capabilities.

Definition of Turing-decidable (or decidable): A language L is Turing-decidable := there exists a deterministic TM M (which is then said to decide L) such that for all strings w:

1 with input w, M halts in q_accept if w ∈ L, and

2 with input w, M halts in q_reject if w ∉ L.

If a language L is decided by a TM M we say that M is a decider for L.

There are several variants of Turing Machines, such as double-sided infinite Turing Machines, multitape Turing Machines, non-deterministic Turing Machines and certain types of so-called enumerators. Most variants are equivalent in the sense that they can recognize the same set of languages (but not necessarily equally efficiently).

Example: We now give an example of a Turing Machine solving a mathematical problem by first defining it as a language problem. The problem (idea from [56]) is to design a Turing Machine that computes the function

f(x, y) = x + y if x ≥ y

f(x, y) = 0 if x < y

For simplicity, we assume x and y to be positive integers. First we have to choose a convention for representing positive integers, and decide what the initial situation of the tape is. We choose a unary notation in which any positive integer x is represented by w(x) ∈ {1}+, such that |w(x)| = x. We assume that w(x) and w(y) are on the tape in unary notation, separated by a single ‘0’, with the read-write head on the left-most symbol of w(x). We first describe how the sum of x and y can be calculated, then how the comparison x ≥ y can be made, and finally how to combine those two machines into a Turing Machine that computes the desired function.

Calculating the sum

To add the two numbers a and b, we only have to remove the separating 0, so addition amounts to the concatenation of two strings. The following Turing Machine, called Adder, adds a and b and is relatively simple to construct:

Adder = (Q, Σ, Γ, δ, q0, qA, qR), with

Q = {q0, q1, . . . , q4}

Σ = {0, 1}

Γ = {0, 1, ⊔}

q0 = q0

qA = q4

qR =

δ(q0, 1) = (q0, 1, R)

δ(q0, 0) = (q1, 1, R)

δ(q1, 1) = (q1, 1, R)

δ(q1, ⊔) = (q2, ⊔, L)

δ(q2, 1) = (q3, 0, L)

δ(q3, 1) = (q3, 1, L)

δ(q3, ⊔) = (q4, ⊔, R)

Note that we remove the ‘0’ by temporarily creating an extra ‘1’, a fact that is remembered by putting the machine into state q1. The transition δ(q2, 1) = (q3, 0, L) is needed to remove this extra ‘1’ at the end of the computation. Finally, we move the read-write head back to the leftmost ‘1’. This is not strictly necessary in this example, because the machine is designed such that it will terminate right after any addition, but it is not harmful and normally a good habit to let any action terminate in a state from which it is easy to take further transitions.
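The Adder can be checked mechanically. The following Python sketch is an illustration and not part of the original construction: the names run and ADDER, and the character '_' standing in for the blank symbol, are choices made here. It also assumes a tape that is unbounded in both directions (one of the equivalent variants mentioned earlier), so that the final sweep to the left in state q3 finds a blank cell to the left of the input.

```python
BLANK = "_"  # assumed stand-in for the blank tape symbol

# Transition function of Adder, copied from the list above:
# (state, symbol read) -> (next state, symbol written, head move)
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),
    ("q2", "1"): ("q3", "0", "L"),
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}


def run(delta, tape_input, start="q0", halting=("q4",), max_steps=10_000):
    """Run a deterministic TM; return (final state, tape contents, head position)."""
    tape = dict(enumerate(tape_input))  # sparse tape, missing cells are blank
    state, head, steps = start, 0, 0
    while state not in halting:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = tape.get(head, BLANK)
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += 1 if move == "R" else -1
        steps += 1
    cells = "".join(tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1))
    return state, cells.strip(BLANK), head


# x = 2 and y = 3 in unary, separated by a single 0
state, cells, head = run(ADDER, "110111")
print(state, cells, head)
```

On this input the machine halts in q4 with five ‘1’s followed by a single ‘0’ on the tape, and with the head back on the leftmost ‘1’.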

Comparison

To compare two numbers a and b, we again assume they are written in the notation that we used before and divided by a single ‘0’. We will construct a Turing Machine that halts in an accepting state if a ≥ b and in a rejecting state if a < b. To this end we match each ‘1’ on the left of the dividing ‘0’ with a ‘1’ on the right. We can do this by starting at the leftmost ‘1’ (of the number a) and alternately checking off the leftmost unchecked symbols of the numbers a and b by replacing them with the symbols ‘x’ and ‘y’ respectively. The matching stops when one of the two sequences of ‘1’s is completely checked off. If a < b then the right sequence will still contain ‘1’s, and if a ≥ b either the left sequence still contains ‘1’s or neither sequence contains ‘1’s. In the first case (a < b), we still find a ‘1’ on the right when all ‘1’s on the left have been replaced. We use this to get into the rejecting state. In the second case (a ≥ b), when we attempt to match another ‘1’, we encounter a blank at the right of the working space, which can be used as a signal to enter the accepting state. If we work this out in detail, we get the following Turing Machine called Comparer := (Q, Σ, Γ, δ, q0, qA, qR), with:


Q = {q0, q1, q2, q3, q4, q5, q6, q7}

Σ = {0, 1}

Γ = {0, 1, x, y, ⊔}

q0 = q0

qA = q6

qR = q7

The transitions of δ can be grouped in several parts.

δ(q0, 1) = (q1, x, R)

δ(q1, 1) = (q1, 1, R)

δ(q1, 0) = (q2, 0, R)

δ(q2, y) = (q2, y, R)

δ(q2, 1) = (q3, y, L)

This set replaces the leftmost ‘1’ of a with ‘x’, then causes the read-write head to travel right to the first ‘1’ of b and replace it with the symbol ‘y’. When the dividing ‘0’ is passed, the machine enters state q2, indicating that it is now dealing with the number b. When the symbol ‘y’ has been written, the machine enters state q3, indicating that a ‘1’ of b has been successfully paired with a ‘1’ of a. The next group of transitions reverses the direction, repositions the read-write head over the leftmost ‘1’ of a, and returns control to the initial state:

δ(q3, y) = (q3, y, L)

δ(q3, 0) = (q4, 0, L)

δ(q4, 1) = (q4, 1, L)

δ(q4, x) = (q0, x, R)


The rewriting continues in this way on an input string of a ‘1’s, a ‘0’ and b ‘1’s, stopping only when on one side no more ‘1’s can be replaced. In that case either the left side does not contain any more ‘1’s (a ≤ b), or the right side has run out of ‘1’s (a > b). In case the left side does not contain any more ‘1’s, the transition δ(q4, x) = (q0, x, R) will leave the read-write head on a ‘0’ instead of a ‘1’.

δ(q0, 0) = (q5, x, L) (a ≤ b)

δ(q2, ⊔) = (q6, ⊔, L) (a > b)

In the first case we still have to check whether the right side has any ‘1’s left, to determine whether a = b. This is done in the state q5.

δ(q5, x) = (q5, x, R)

δ(q5, 0) = (q5, 0, R)

δ(q5, y) = (q5, y, R)

δ(q5, 1) = (q7, y, R) (a < b)

δ(q5, ⊔) = (q6, ⊔, L) (a = b)
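The transitions of Comparer can be traced in the same mechanical way. The Python sketch below is only an illustration: the names COMPARER and final_state and the character '_' for the blank symbol are assumptions made here. The machine is taken to stop as soon as no transition applies, and the sketch simply reports the state reached. With the transitions exactly as listed above, that state is q6 when a ≥ b and q7 when a < b.

```python
BLANK = "_"  # assumed stand-in for the blank tape symbol

# Transition function of Comparer, copied from the groups above.
COMPARER = {
    ("q0", "1"): ("q1", "x", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q2", "0", "R"),
    ("q2", "y"): ("q2", "y", "R"),
    ("q2", "1"): ("q3", "y", "L"),
    ("q3", "y"): ("q3", "y", "L"),
    ("q3", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "x"): ("q0", "x", "R"),
    ("q0", "0"): ("q5", "x", "L"),
    ("q2", BLANK): ("q6", BLANK, "L"),
    ("q5", "x"): ("q5", "x", "R"),
    ("q5", "0"): ("q5", "0", "R"),
    ("q5", "y"): ("q5", "y", "R"),
    ("q5", "1"): ("q7", "y", "R"),
    ("q5", BLANK): ("q6", BLANK, "L"),
}


def final_state(a, b, max_steps=100_000):
    """Run Comparer on a ones, a zero and b ones; return the state in which it stops."""
    tape = dict(enumerate("1" * a + "0" + "1" * b))
    state, head = "q0", 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in COMPARER:  # no applicable transition: the machine stops
            return state
        state, written, move = COMPARER[key]
        tape[head] = written
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached without halting")


for a, b in [(2, 1), (1, 1), (1, 2)]:
    print(a, b, final_state(a, b))
```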

Combining Turing Machines for complicated tasks

We now have to put together the Turing Machines Adder and Comparer to obtain the desired Turing Machine that computes the given function. We can do this by starting with the input a and b in the previously described notation and starting position, and using Comparer to determine whether or not a ≥ b. We index all states of Comparer with a C, i.e. its last transition will be δ(qC,2, ⊔) = (qC,6, ⊔, L) or δ(qC,5, ⊔) = (qC,6, ⊔, L) if a ≥ b, and δ(qC,5, 1) = (qC,7, y, R) if a < b. In the first case (a ≥ b), the Comparer should send a ‘start signal’ to the Adder, to give a + b as output. In the second case (a < b), the Comparer should send a ‘start signal’ to a Turing Machine (called Eraser) that simply replaces all ‘1’s by ‘0’s to output the value 0 in the desired format.

We show how we can let the Comparer send these ‘start signals’. We first index all states of the Adder by A and of the Eraser by E. Now in case of a ≥ b, Comparer ends in state qC,6, and we can add a transition δ(qC,6, ∗) = δ(qA,0, ∗). The star ‘∗’ stands for any possible symbol, so actually this transition is a shorthand notation for a set of transitions. Similarly, we can let δ(qC,7, ∗) = δ(qE,0, ∗) bring the Eraser into its initial state. The Adder respectively Eraser will then give the desired output because their behavior on the input does not change as a result of the renaming of the states (to be exact: the state in which the Comparer terminates is suitable as an initial position for Adder or Eraser). The only thing we have not taken care of is that when the Comparer enters a final state, it does not have the initial representation of the numbers a and b on tape, but has replaced the ‘1’s by ‘x’s and ‘y’s. We can easily (it is just some extra work, you can try it as an exercise if you want) fix this by letting Comparer, as the last action before entering a final state, replace all ‘x’s and ‘y’s by ‘1’s. The result is a Turing Machine that combines Comparer, Adder and Eraser to compute the function f. Similarly to this example, we can for example multiply two numbers a and b, and we can also translate macro-instructions like ‘if p then qj else qk’ (meaning that when we read ‘p’ on tape, the Turing Machine goes into a state qj and otherwise into a state qk), and even combine them into complicated subprograms that can be invoked repeatedly whenever needed.
(End of Example)

The Entscheidungsproblem

After introducing the notion of a TM in [89], Turing answered Hilbert’s decision problem for mathematical logic (in German called ‘Entscheidungsproblem’) in the negative. The Entscheidungsproblem asks whether there exists a definite method or algorithm which (at least in principle) can be applied to any given mathematical proposition to decide whether that proposition is provable. We now define the notion of an algorithm by the notion of a Turing Machine, and the set of decidable problems by the set of languages that can be decided by some TM. If we look at the definition of decidability in section 6.1, it requires that for all formulas ϕ an algorithm, i.e. a TM, exists that decides whether ϕ is true or not. If we code ϕ by means of a language, and this is always possible (see the previous example for a demonstration), we can reformulate the problem as: for every language L, does there exist a TM M that decides L? We now show that this is not possible for all problems (i.e. languages) by giving a specific problem, the Halting problem, that is not decidable.

The Halting problem is the problem of testing whether a TM accepts a given input string. We define the problem by stating it as a language problem, and asking whether that language is decidable.

Definition of the Halting problem:
H := { ⟨M, w⟩ | M is a TM, w is a string and M accepts w }. Is H decidable? (A negative answer also settles the general question: is there for each language a TM that decides for all strings w whether they belong to the language or not, that is (using Turing’s thesis, see section 9.3): is there for each problem an algorithm that can decide it?)

Theorem: H is recognizable.
Proof (by Turing): The following TM U, also called the Universal Turing Machine because it is capable of simulating any other Turing Machine, recognizes H. We define U informally, because a detailed definition of the septuple such a TM consists of (see the definition of a TM) is a lot of work.

Description of Universal Turing Machine: U = “On input ⟨M, w⟩, where M is a TM and w is a string:

1 simulate M on input w

2 if M ever enters its accept state, accept”

Note that this TM loops on input ⟨M, w⟩ if M loops on w, which is why this machine does not decide H. If the algorithm had some way to determine that M was not halting on w, it could reject. Hence H is sometimes called the Halting problem. As Turing demonstrated, an algorithm has no way to make this determination.

Theorem: H is undecidable (see also [82, page 165]).
Proof (by Turing): We assume H is decidable and obtain a contradiction. Suppose D is a decider for H, defined by
D(⟨M, w⟩) := “

• accept if M accepts w

• reject if M does not accept w”

Now we construct a new TM O with D as a subroutine. This new TM calls D to determine what M does when the input to M is its own description ⟨M⟩. Once O has determined this information, it does the opposite. That is, it rejects if M accepts and accepts if M does not accept. The following is a description of O:
O = “On input ⟨M⟩, where M is a TM:

1 run D on input ⟨M, ⟨M⟩⟩,

2 output the opposite of what D outputs; that is, if D accepts, reject and if D rejects, accept”

We summarize the behavior of O as follows:
O(⟨M⟩) = “

• accept if M does not accept ⟨M⟩

• reject if M accepts ⟨M⟩”

Now we obtain the contradiction by running O with its own description ⟨O⟩ as input. In that case we get:
O(⟨O⟩) = “

• accept if O does not accept ⟨O⟩

• reject if O does accept ⟨O⟩”

Whatever O does on input ⟨O⟩, it is forced to do the opposite, which is impossible. Thus neither O nor D can exist.
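The diagonal construction can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: programs are represented by Python functions instead of encoded descriptions ⟨M⟩, a candidate decider is any total function D(program, argument) returning True for ‘accepts’ and False otherwise, and the names make_O and candidates are hypothetical. The sketch shows that every candidate D gives the wrong verdict on the machine O built from it.

```python
def make_O(D):
    """Build the machine O from a candidate decider D(program, argument) -> bool."""
    def O(program):
        # 1. run D on <program, <program>>
        # 2. output the opposite of what D outputs
        return not D(program, program)
    return O


# Two hypothetical candidate deciders; any other total D fails in the same way.
candidates = {
    "always accept": lambda program, argument: True,
    "always reject": lambda program, argument: False,
}

for name, D in candidates.items():
    O = make_O(D)
    predicted = D(O, O)  # the verdict of D on the question "does O accept <O>?"
    actual = O(O)        # what O really does on its own description
    print(name, "| D predicts:", predicted, "| O actually returns:", actual)
    assert predicted != actual
```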

Turing wrote in his last publication about the interpretation of unsolvable problems, such as the Halting problem for Turing machines:

These . . . may be regarded as going some way towards a demonstration, within mathematics itself, of the inadequacy of ‘reason’ unsupported by common sense.

- Alan Turing

In this section I have made extensive use of [38] [92] for information on the life and work of Turing and [89] [82] [19] for the theory of TM’s and the Halting problem. Another valuable source of information on Turing’s life and work is the website http://www.turing.org.uk/


9.2 Church and the Lambda Calculus

Alonzo Church (1903-1995) was an American mathematician whose work is of major importance in mathematical logic, recursion theory and theoretical computer science. One of his most important contributions to logic is his invention in the 1930s of the lambda calculus. He is also remembered for Church’s thesis, published in 1936 in [14, page 345-363], stating that the lambda calculus can be used to embody a correct formalization of the notion of computability (see section 9.3). The notion of lambda definability is conceptually the basis for the discipline of functional programming, and the lambda calculus is also the basis for type theory. Church also founded the Journal of Symbolic Logic in 1936. He had 31 doctoral students including famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We now introduce the lambda calculus (Church’s formalization of the notion of effective calculability) in a modern setting, using [9, chapter 4].

Application and abstraction

First we introduce the basic concepts of λ-calculus. A formalization follows thereafter. The lambda calculus has only two basic operations, abstraction and application.

• Abstraction is for constructing functions: For an expression E we introduce λx.E to denote the abstraction of E over x, i.e. ‘the function of x which computes E’.
Example¹: λx . x + 1, λn . n × n, etc.
We will later see how to define a recursive function; this is not so easy since we do not have function names.

• (Function) application: The expression F A denotes that F is considered as a function (an algorithm) applied to input A. The original lambda calculus theory is type-free so we also consider F F, that is, F applied to itself.
Example: (λx . x + 1) 4, (λn . n × n) 7, etc.

¹ Note that in some examples we have simplified the notation for the clarity of the example, since in pure lambda calculus we do not have arithmetic symbols, like + and ×, but we can encode these operations in the pure lambda calculus, as we will later see.


These two notions become very powerful if we introduce the rule of beta reduction, which allows us to apply an abstraction to an argument and, for example, rewrite (λx . x + 1) 4 to 4 + 1. Similarly (λn . n × n) 7 can be reduced to 7 × 7. It is also allowed to use arbitrary nesting: ((λn . λx . (x + 1) × n) 7) 4 can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7.

Similar to ordinary mathematics, the names of the variables are irrelevant to the rules that can be applied, which allows a transformation of the names (also known as dummy transformation). This rule in lambda calculus is called alpha conversion. For example, alpha conversion allows us to rewrite λn . nn to λx . xx, since they are essentially the same function.

Note that we also want to use functions as variables and arguments: ((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4 should reduce to the earlier expression (4 + 1) × 7.
But so far we only have functions of one argument; we now introduce functions with more arguments, while avoiding new notations. We can solve this problem by using iteration of applications, often called currying after the American mathematician H.B. Curry who made it popular.

Example: f(x, y) = 3 × x + y can be written as F1 ≡ λx . (λy . 3 × x + y). Then f(4, 5) is written (F1 4) 5, that is ((λx . (λy . 3 × x + y)) 4) 5, which can be reduced (by using beta reduction) to 3 × 4 + 5.
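Abstraction, application and currying have direct counterparts in programming languages with anonymous functions. The following Python fragment is an informal illustration only: Python evaluates its arguments eagerly and computes with built-in integers, so it mirrors the results of the reductions above and not the symbolic rewriting itself. The names F1, nested and higher are chosen here for the illustration.

```python
# f(x, y) = 3 * x + y in curried form: F1 = λx . (λy . 3 × x + y)
F1 = lambda x: (lambda y: 3 * x + y)
print(F1(4)(5))  # (F1 4) 5

# ((λn . λx . (x + 1) × n) 7) 4
nested = lambda n: (lambda x: (x + 1) * n)
print(nested(7)(4))

# ((λf . (λn . λx . f x × n) 7)(λy . y + 1)) 4: a function passed as an argument
higher = lambda f: (lambda n: (lambda x: f(x) * n))(7)
print(higher(lambda y: y + 1)(4))
```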

The above explanation and examples give an idea of what lambda calculus is. We will now work towards a more formal definition of lambda calculus. The system of lambda calculus is based on the structure of Abstract Reduction Systems (ARS). The terms of the ARS then coincide with the inductively defined lambda terms and the reduction relation will be β-reduction. So before we formally define the lambda calculus, we introduce the most relevant theory of abstract reduction systems.

Abstract Reduction Systems

Definition of Abstract Reduction System (ARS): An abstract reduction system A := a structure ⟨A, →⟩ consisting of a set A and a binary relation → on A (i.e. → ⊆ A × A).
The relation → is also called reduction or rewrite relation. If for a, b ∈ A we have a → b, we call b a one-step reduct of a.

The transitive and reflexive closure of → is written as ↠ (or alternatively →∗). This means ↠ is the smallest relation on A satisfying, for all a, b, c ∈ A,

(closure of →) if a → b then a ↠ b,

(reflexive) a ↠ a, and

(transitive) if a ↠ b and b ↠ c then a ↠ c.

Thus a ↠ b if and only if there exists a finite sequence of reduction steps a ≡ a0 → a1 → . . . → an ≡ b. This sequence may be empty, in which case a ≡ b. Here ≡ denotes (the syntactic) identity of elements of A, i.e. a ≡ b if and only if a and b are the same element of A.

Definition of Normal Form: A term a ∈ A of an ARS ⟨A, →⟩ is a normal form := there is no b ∈ A such that a → b. Furthermore, b ∈ A has a normal form if and only if b ↠ a for some normal form a ∈ A.

Definition of Weakly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is weakly normalizing (or weakly terminating) := every a ∈ A has a normal form. In this case we also say that A is weakly normalizing.

Definition of Strongly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is strongly normalizing (also called terminating, well-founded or noetherian) := there exists no infinite reduction a0 → a1 → a2 → . . ., with an ∈ A for all n ∈ N.

Lemma If an ARS is strongly normalizing, it is weakly normalizing.

Proof: We prove this by proving the contraposition: if ⟨A, →⟩ is not weakly normalizing then ⟨A, →⟩ is not strongly normalizing. Suppose ⟨A, →⟩ is not weakly normalizing. Then there is a0 ∈ A without a normal form. Since a0 has no normal form, certainly a0 is not a normal form itself, so there is a1 ∈ A such that a0 → a1. Now a0 has no normal form, so a1 cannot have a normal form either (otherwise the step a0 → a1 followed by a reduction of a1 to its normal form would give a0 a normal form); in particular a1 is not a normal form. Thus we get an element a2 ∈ A such that a1 → a2. Repeating this process yields an infinite reduction a0 → a1 → a2 → . . ..

Definition of Unique Normal Form: The reduction relation → of an ARS ⟨A, →⟩ has the unique normal form property := for all a, b, c ∈ A such that a ↠ b, a ↠ c, and b, c are normal forms, we have b ≡ c.

Lemma An ARS ⟨A, →⟩ with the unique normal form property is not always weakly normalizing.
Proof: For instance, the abstract reduction system with only one element a ∈ A and rewrite rule a → a has no normal forms, so it trivially has the unique normal form property and is not weakly normalizing.

Definition of Local Confluence: A reduction relation → of an ARS ⟨A, →⟩ is called locally confluent or weakly confluent (also weakly Church-Rosser) := for all a, b, c ∈ A with a → b and a → c there exists a d ∈ A such that b ↠ d and c ↠ d.

Definition of Confluence: A reduction relation → of an ARS ⟨A, →⟩ is called confluent (or has the Church-Rosser property, or is Church-Rosser) := for all a, b, c ∈ A with a ↠ b and a ↠ c there exists a d ∈ A such that b ↠ d and c ↠ d.

Lemma If a reduction relation has the unique normal form property and is weakly normalizing then it is confluent.
Proof: Suppose we have a ↠ b and a ↠ c. Since → is weakly normalizing, there are normal forms b′ and c′ such that b ↠ b′ and c ↠ c′. By transitivity we also have a ↠ b′ and a ↠ c′, and thus by the unique normal form property b′ ≡ c′. Hence b ↠ b′ and c ↠ b′.

Lemma If → is confluent then → has the unique normal form property.
Proof: Suppose a ↠ b, a ↠ c, and b, c are normal forms. By confluence, there exists a d such that b ↠ d and c ↠ d. Since b and c are normal forms, we must have b ≡ d and c ≡ d, thus b ≡ c.
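For a finite ARS these properties can be checked by exhaustive search, which makes it easy to experiment with small examples such as the one-element system a → a from the proof above. The following Python sketch assumes a representation chosen here for illustration: an ARS is a dictionary that maps every element to the set of its one-step reducts, and all function names are hypothetical.

```python
def reducts(ars, a):
    """All b that a reduces to in zero or more steps, found by graph search."""
    seen, todo = {a}, [a]
    while todo:
        for b in ars[todo.pop()]:
            if b not in seen:
                seen.add(b)
                todo.append(b)
    return seen


def normal_forms(ars, a):
    """Normal forms reachable from a (elements without one-step reducts)."""
    return {b for b in reducts(ars, a) if not ars[b]}


def weakly_normalizing(ars):
    return all(normal_forms(ars, a) for a in ars)


def strongly_normalizing(ars):
    # In a finite ARS an infinite reduction exists exactly when some element
    # can reach itself in one or more steps.
    return not any(a in reducts(ars, b) for a in ars for b in ars[a])


def unique_normal_forms(ars):
    return all(len(normal_forms(ars, a)) <= 1 for a in ars)


def confluent(ars):
    return all(
        reducts(ars, b) & reducts(ars, c)
        for a in ars
        for b in reducts(ars, a)
        for c in reducts(ars, a)
    )


loop = {"a": {"a"}}                                    # the system a -> a
fork = {"a": {"b", "c"}, "b": set(), "c": set()}       # two distinct normal forms
diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}

for name, ars in [("loop", loop), ("fork", fork), ("diamond", diamond)]:
    print(name, weakly_normalizing(ars), strongly_normalizing(ars),
          unique_normal_forms(ars), confluent(ars))
```

The system loop has the unique normal form property without being weakly normalizing, fork is strongly normalizing but not confluent, and diamond has all four properties.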

Syntax

Now that we have seen the basic principle of lambda calculus, we will give a more formal definition. We formally define the syntax of the lambda calculus by giving its grammar.

Definition of the Syntax of Lambda Terms:
Lambda Term E := C | v | (E1 E2) | (λv . E), with

• C ranges over a set of constants (we will use the constant names a, b, c, . . . for elements of C)

• v ranges over a (denumerable) set of variables (using v, w, x, . . .)

• (E1 E2) denotes a combination involving the application of one expression (E1) to another (E2). The subexpression E1 is referred to as the operator and E2 is referred to as the operand

• (λv . E) denotes an abstraction. Informally it denotes a function of v which produces result E. The subexpression E is referred to as the body of the abstraction and v is called the bound variable of the abstraction

We also call lambda terms simply ‘terms’ or ‘expressions’.
Notational conventions: to achieve a minimal notation, we drop parentheses whenever possible, and assume:

• Association to the left for iterated application:
F E1 E2 . . . En denotes (. . . ((F E1) E2) . . . En),

• Association to the right for iterated abstraction:
λx1 . λx2 . . . . λxn . E, or shortly λx1 x2 . . . xn . E,
denotes λx1 . (λx2 . (. . . (λxn . E) . . .)).

Example: We can write the expression F1 of the previous example as λx y . 3 × x + y, and λv . E1 E2 means (λv . (E1 E2)).

Free/Bound Variables and α-conversion

We distinguish between free and bound occurrences of variables in an expression. An occurrence of v in E is said to be bound if it occurs within a subexpression of E of the form λv . E1, and the occurrence is said to be free otherwise.

Example: n occurs free in λx . (x + 1) × n, whereas x occurs bound in this expression. Both n and x occur bound in λn . (λx . x + 1) × n. Further x occurs both bound and free in (λx . x + 1) × x (the second occurrence of ‘x’ in this expression is bound, the third occurrence is free).

Definition of free variables: The free variables of a term E, denoted by FV(E), form a set of variables defined recursively by:

• FV(C) = ∅,

• FV(v) = {v},

• FV(E1 E2) = FV(E1) ∪ FV(E2),

• FV(λv . E) = FV(E) − {v}.

An expression E is said to be closed if FV (E) = ∅.

Example: The expression λz . (λx . z + x)(λy . y × z) is closed.
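The recursive definition of FV translates directly into a program. The Python sketch below assumes a representation of lambda terms as nested tuples, chosen here for illustration: ("const", c), ("var", v), ("app", E1, E2) and ("lam", v, E). Infix arithmetic such as x + 1 is encoded as the application of a constant, in line with the footnote on simplified notation; the helper names are hypothetical.

```python
def const(c):
    return ("const", c)


def var(v):
    return ("var", v)


def app(e1, e2):
    return ("app", e1, e2)


def lam(v, e):
    return ("lam", v, e)


def binop(op, left, right):
    """Infix arithmetic such as x + 1, encoded as the application (op left) right."""
    return app(app(const(op), left), right)


def fv(term):
    """Free variables FV(E), following the four clauses of the definition."""
    tag = term[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {term[1]}
    if tag == "app":
        return fv(term[1]) | fv(term[2])
    if tag == "lam":
        return fv(term[2]) - {term[1]}
    raise ValueError(f"not a lambda term: {term!r}")


# λx . (x + 1) × n: n is free, x is bound
e1 = lam("x", binop("*", binop("+", var("x"), const(1)), var("n")))
# (λx . x + 1) × x: x occurs both bound and free
e2 = binop("*", lam("x", binop("+", var("x"), const(1))), var("x"))
# λz . (λx . z + x)(λy . y × z): closed
e3 = lam("z", app(lam("x", binop("+", var("z"), var("x"))),
                  lam("y", binop("*", var("y"), var("z")))))

print(fv(e1), fv(e2), fv(e3))
```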

α-conversion

We consider two terms as ‘equivalent’ if they only differ in the names of their bound variables. So λx . x and λy . y are considered equivalent. But we must distinguish λx . y + x and λy . y + y, since one has a free occurrence of y and the other has not. Note also that λxy . xy and λxy . yx are not equivalent. The renaming process is called α-conversion, and allows us to change the name of a bound variable, as long as we do so consistently. It is formally defined as the equivalence relation generated by the following reduction:

Definition of α-reduction: λx . E →α λy . E′, where E′ is obtained from E by replacing all free occurrences of x in E by y, provided y is fresh, that is, y occurs neither as a free variable nor as a bound variable in the expression E (i.e. it does not occur in E).

Expressions that can be made textually equivalent by renaming bound variables are called α-convertible or alpha(betically) equivalent. When two lambda terms E1 and E2 are α-convertible in this sense we write E1 ≡α E2, and often also E1 ≡ E2.

Example: Some α-conversions:
λx . x + 1 ≡α λy . y + 1
λx . (λy . y × x) ≢α λy . (λy . y × y) (because the y that replaces x in (λy . y × x) would get bound)
λx . (λy . x × y) y ≡α λx . (λz . x × z) y

From now on, two λ-terms are considered (syntactically) equal if they are α-convertible to each other.

Substitution

We now formally define the concept of substitution for a variable in lambda terms.

Definition of Substitution: The substitution of expression E for each free occurrence of v in expression E0, denoted by E0[E/v], is defined by induction on the structure of E0 as:

• C[E/v] ≡ C

• x[E/v] ≡ E   if x ≡ v
  x[E/v] ≡ x   if x ≢ v

• (E1 E2)[E/v] ≡ (E1[E/v])(E2[E/v])

• (λx . E1)[E/v] ≡ λx . E1   if x ≡ v
  (λx . E1)[E/v] ≡ λx . (E1[E/v])   if x ≢ v and x ∉ FV(E)
  (λx . E1)[E/v] ≡ λy . ((E1[y/x])[E/v])   if x ≢ v and x ∈ FV(E), where y ∉ FV(E1 E)

Example: (λx . z + 7 × x)[x + 3/z] ≡ (λy . z + 7 × y)[x + 3/z] ≡ λy . (x + 3) + 7 × y.

The following lemma tells us that substitution behaves well; it can be proven by induction on the structure of λ-terms.

Lemma For all terms E0, E1, E2 and variables x, y such that x ≢ y and x ∉ FV(E2):

E0[E1/x][E2/y] ≡ E0[E2/y][E1[E2/y]/x].
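The definition of substitution can be turned into a small program as well. The following Python sketch is an illustration under assumptions made here: terms are nested tuples ("const", c), ("var", v), ("app", E1, E2) and ("lam", v, E), arithmetic is encoded as application of a constant, and fresh names are formed by appending primes. When picking the new bound variable y, the sketch also excludes v itself, which is slightly stronger than the condition y ∉ FV(E1 E) and prevents y from being replaced by the outer substitution.

```python
def fv(t):
    """Free variables of a term."""
    if t[0] == "const":
        return set()
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return fv(t[1]) | fv(t[2])
    return fv(t[2]) - {t[1]}  # ("lam", v, body)


def fresh(name, avoid):
    """Append primes to name until it differs from every name in avoid."""
    while name in avoid:
        name += "'"
    return name


def subst(e0, e, v):
    """E0[E/v]: substitute e for the free occurrences of v in e0."""
    tag = e0[0]
    if tag == "const":
        return e0
    if tag == "var":
        return e if e0[1] == v else e0
    if tag == "app":
        return ("app", subst(e0[1], e, v), subst(e0[2], e, v))
    x, body = e0[1], e0[2]
    if x == v:                       # v is bound here: nothing to replace
        return e0
    if x not in fv(e):               # no capture possible
        return ("lam", x, subst(body, e, v))
    y = fresh(x, fv(body) | fv(e) | {v})   # rename the bound variable first
    return ("lam", y, subst(subst(body, ("var", y), x), e, v))


def binop(op, left, right):
    return ("app", ("app", ("const", op), left), right)


x, z = ("var", "x"), ("var", "z")
# (λx . z + 7 × x)[x + 3 / z]
term = ("lam", "x", binop("+", z, binop("*", ("const", 7), x)))
result = subst(term, binop("+", x, ("const", 3)), "z")
print(result)
```

The result is the term λx′ . (x + 3) + 7 × x′, which agrees with the example above up to the name of the bound variable.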


Reduction System for the Lambda Calculus

As we have seen with an example at the beginning of this section, the main rule for the lambda calculus is the beta reduction rule, which we can now formally define.

Definition of β-reduction: β-reduction is the compatible relation generated by (λv . E1) E2 →β E1[E2/v], with the rules:

    E1 →β E2              E1 →β E2              E1 →β E2
  ----------------      ----------------      ------------------
  E1 E →β E2 E          E E1 →β E E2          λv.E1 →β λv.E2

As before, any term matching the left-hand side of the rule is called a redex, and thus any expression of the form (λv . E1) E2 is called a β-redex.
β-reduction is a reduction relation →β of the pure lambda calculus. We often write → resp. ↠ instead of →β and ↠β. We use =β (or sometimes simply =) to denote the equivalence relation generated by →β. Note the difference between ≡ (α) and = (β).

Example: (λnx . (x + 1)× n) 7 4→β (λx . (x + 1)× 7) 4→β (4 + 1)× 7.

Example: This example illustrates the need for α-conversion during β-reduction, even if distinct names are chosen from the start. Define TWICE ≡ λf . λx . f(fx), then

(λy . yy)TWICE

→β TWICE TWICE

≡ (λf . λx . f(fx)) TWICE

→β λx . TWICE (TWICE x)

≡ λx . TWICE ((λf . λx . f(fx))x)

→β λx . TWICE ((λx . f(fx))[x/f ]) (Note the name clash)

≡α λx . TWICE ((λy . f(fy))[x/f ])


≡ λx . TWICE (λy . x(xy))

→β . . .

Example:

1. (λx . x + 1) ((λy . y × y) 3) ↠β (two possibilities) (3 × 3) + 1, so different reduction paths are possible.

2. Ω ≡ (λx . xx)(λx . xx) →β (λx . xx)(λx . xx) →β · · ·, thus infinite sequences of steps are possible: β-reduction is not always terminating. This corresponds to ‘self-reproducing programs’.

3. (λx . xxx)(λx . xxx) →β (λx . xxx)(λx . xxx)(λx . xxx) →β · · ·, so terms can even become arbitrarily large.

4. (λy . c)((λx . xxx)(λx . xxx)) → c, but also
(λy . c)((λx . xxx)(λx . xxx)) → (λy . c)((λx . xxx)(λx . xxx)(λx . xxx))
and the latter term can be reduced to c or again to a longer term, etc.
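Examples 2 to 4 can be replayed mechanically. The Python sketch below is a hedged illustration, not part of the original development: the tuple encoding, the names `step` and `normalize`, and the step limit of 100 are assumptions made here. `step` always contracts the leftmost-outermost β-redex, which is one particular choice of reduction path.

```python
import itertools

# Terms (illustrative encoding):
#   ("var", x) | ("const", c) | ("lam", x, body) | ("app", f, a)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "const":
        return set()
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, e, v):
    """Capture-avoiding substitution t[e/v]."""
    if t[0] == "const":
        return t
    if t[0] == "var":
        return e if t[1] == v else t
    if t[0] == "app":
        return ("app", subst(t[1], e, v), subst(t[2], e, v))
    x, body = t[1], t[2]
    if x == v:
        return t
    if x in free_vars(e):            # rename the binder to avoid capture
        avoid = free_vars(body) | free_vars(e) | {v}
        y = next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)
        x, body = y, subst(body, ("var", y), x)
    return ("lam", x, subst(body, e, v))

def step(t):
    """Contract the leftmost-outermost β-redex; None if t is a normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":            # t is itself a β-redex
            return subst(f[2], a, f[1])
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def normalize(t, max_steps=100):
    """Reduce t for at most max_steps steps; None means no normal form was reached."""
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return None

x = ("var", "x")
w2 = ("lam", "x", ("app", x, x))                  # λx . xx
omega = ("app", w2, w2)                           # example 2
w3 = ("lam", "x", ("app", ("app", x, x), x))      # λx . xxx
growing = ("app", w3, w3)                         # example 3
discard = ("app", ("lam", "y", ("const", "c")), growing)  # example 4

print(step(omega) == omega)   # Ω reduces to itself
print(normalize(omega))       # gives up after max_steps
print(normalize(discard))     # the outermost redex throws the argument away
```

With this strategy the term of example 4 reaches c in a single step, whereas a strategy that first reduces the argument would follow the second, non-terminating path of that example.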

Although we already saw that λ-calculus is neither weakly nor strongly normalizing, it does have the important confluence property. First we introduce the following definition of the diamond property that we use to prove that →β is confluent. To prevent confusion in the notation we will from now on also use the implication symbol ⇒.

Definition of the Diamond Property: A binary relation → on the lambda terms Λ satisfies the diamond property, notation → |= ♦, :=
(∀M, M1, M2 : M, M1, M2 ∈ Λ : (M → M1 ∧ M → M2) ⇒ (∃M3 : M3 ∈ Λ : M1 → M3 ∧ M2 → M3))

Note that a reduction →β has the Church-Rosser property if its transitive, reflexive closure ↠β satisfies the diamond property.
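On a finite relation the diamond property can simply be checked by enumeration. The Python sketch below is only an illustration on toy relations, not on lambda terms: the names `has_diamond` and `reflexive_transitive_closure` and the two example relations are assumptions made here. It also computes the transitive, reflexive closure, so that the statement of the lemma that follows can be observed on a small example.

```python
def has_diamond(rel):
    """rel maps each element to the set of its one-step successors.

    The relation has the diamond property when any two one-step
    successors of the same element share a one-step successor.
    """
    return all(rel[m1] & rel[m2]
               for m in rel for m1 in rel[m] for m2 in rel[m])

def reflexive_transitive_closure(rel):
    """Return the transitive, reflexive closure of rel in the same format."""
    closure = {m: {m} | rel[m] for m in rel}
    changed = True
    while changed:
        changed = False
        for m in rel:
            reachable = set().union(*(closure[k] for k in closure[m]))
            if not reachable <= closure[m]:
                closure[m] |= reachable
                changed = True
    return closure

# a -> b, a -> c, b -> d, c -> d, d -> d: the two branches meet again in d.
good = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"d"}}
# a -> b, a -> c, but b and c only loop on themselves and never meet.
bad = {"a": {"b", "c"}, "b": {"b"}, "c": {"c"}}

print(has_diamond(good), has_diamond(reflexive_transitive_closure(good)))
print(has_diamond(bad), has_diamond(reflexive_transitive_closure(bad)))
```

Note that the definition also covers the case M1 ≡ M2, so every element that can be reached in one step must itself have a successor; this is why d carries a loop in the first example.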

Lemma: Let → be a binary relation on a set Λ with ↠ its transitive, reflexive closure and let → |= ♦. Then ↠ |= ♦.

Proof: Assume → is a binary relation on a set Λ with ↠ its transitive, reflexive closure, and → |= ♦. We now have to prove that ↠ |= ♦. Suppose M, L, K ∈ Λ, M ↠ L and M ↠ K. We then have to prove (∃N : N ∈ Λ : L ↠ N ∧ K ↠ N). Let

(*) M ≡M0 →M1 → . . .→Mn ≡ L, for some n ∈ N

(**) M ≡ K0 → K1 → . . .→ Km ≡ K, for some m ∈ N

We now need to apply a technique called induction loading (for more information, see the links on http://zax.mine.nu/stage/) to prove that K and L have a common reduct N. To be precise, we show that l(m,n) holds for all m,n ∈ N, with

l(m,n) := there exist terms N(i, j) ∈ Λ, for all i, j ∈ N with 0 ≤ i ≤ n ∧ 0 ≤ j ≤ m, such that:

(a) N(i, 0) ≡Mi if 0 ≤ i ≤ n

(b) N(0, j) ≡ Kj if 0 ≤ j ≤ m

(c) N(i, j)→ N(i, j + 1) if 0 ≤ i ≤ n ∧ 0 ≤ j < m

(d) N(i, j)→ N(i + 1, j) if 0 ≤ i < n ∧ 0 ≤ j ≤ m

Clearly, when l(m,n) is true for all m,n ∈ N, we know that K and L have a common reduct, namely N(n, m). So the only remaining proof obligation is to show that l(m,n) holds for all m,n ∈ N. We prove this by induction on n.

Base case (n): n = 0

(a) let N(0, 0) be M0, then (a) holds trivially by reflexivity of ‘≡’.

(b) let N(0, j) be Kj for 0 ≤ j ≤ m, then (b) also holds trivially.

Note that this is valid in combination with the definition under (a) since N(0, 0) ≡ M0 ≡ M ≡ K0.

(c) N(i, j)→ N(i, j + 1) holds because i = 0 and (**).

(d) N(i, j)→ N(i + 1, j) holds trivially because n = 0 yields an empty range for i.


Induction case (n): Induction hypothesis (i.h.-n): suppose that for n = k, k ∈ N, for all m ∈ N the statement l(m,n) is true. We now prove the statement for n = k + 1. We do this by induction on m.

Base case (m): m = 0

(a) let N(k + 1, 0) be Mk+1 for 0 ≤ k ≤ m, then (a) holds trivially.

(b) since j = 0 this amounts to N(0, 0) ≡ K0.

This is true because of our previous definition of N(0, 0) ≡ M0 and the fact that M0 ≡ M ≡ K0.

(c) holds trivially, because m = 0 yields an empty range for j.

(d) N(i, j)→ N(i + 1, j) because j = 0 and (*).

Induction case (m): Induction hypothesis (i.h.-m): suppose that for m = r and n = k + 1, r ∈ N, the statement l(m,n) is true. We now prove the statement for m = r + 1.

(a) N(i, 0) ≡Mi for 0 ≤ i ≤ k + 1 follows from i.h.-n.

(b) N(0, j) ≡ Kj for 0 ≤ j ≤ r + 1 follows from i.h.-m.

(c) and (d)

We already know from the induction hypotheses that N(i, j) → N(i, j + 1) is okay for (0 ≤ i ≤ k + 1 ∧ 0 ≤ j < r) ∨ (0 ≤ i < k ∧ 0 ≤ j < r + 1). What we now have to show is that this is also true for i = k + 1 and j = r + 1. We know by (c) of i.h.-m there exists a N(k, r) such that N(k, r) → N(k, r + 1). We also know by (d) of i.h.-n that there exists a N(k, r) such that N(k, r) → N(k + 1, r). Then by the diamond property of → we know (∃N(k + 1, r + 1) : N(k + 1, r + 1) ∈ Λ : N(k, r + 1) → N(k + 1, r + 1) ∧ N(k + 1, r) → N(k + 1, r + 1)).
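The construction in this proof can be pictured as filling a grid of terms square by square. The Python sketch below illustrates this on a small finite relation; it assumes a toy relation `rel` that satisfies the diamond property, and the function name `tile` and the use of `min` to pick a common successor are choices made here, not part of the proof. `grid[i][j]` plays the role of N(i, j).

```python
def tile(rel, ms, ks):
    """Fill the grid N(i, j) from two reduction paths with a common start.

    rel maps each element to the set of its one-step successors and is
    assumed to satisfy the diamond property.
    ms is the path M0 -> ... -> Mn, ks is the path K0 -> ... -> Km.
    """
    n, m = len(ms) - 1, len(ks) - 1
    grid = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        grid[i][0] = ms[i]           # condition (a): N(i, 0) is Mi
    for j in range(m + 1):
        grid[0][j] = ks[j]           # condition (b): N(0, j) is Kj
    for i in range(n):
        for j in range(m):
            # N(i, j) steps to both N(i, j+1) and N(i+1, j); the diamond
            # property provides a common successor that closes the square.
            common = rel[grid[i][j + 1]] & rel[grid[i + 1][j]]
            grid[i + 1][j + 1] = min(common)   # any element of common will do
    return grid

rel = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"d"}}
grid = tile(rel, ["a", "b", "d"], ["a", "c", "d"])
for row in grid:
    print(row)
print("common reduct:", grid[-1][-1])
```

The corner `grid[n][m]` is the common reduct of L ≡ Mn and K ≡ Km; the double loop mirrors the nested induction, first on n and then on m.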

We can now sketch the proof² of the following fundamental theorem of the untyped lambda calculus:

² The lines of the proof are due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but as far as I know this is the first proof that formalized the above lemma to a reasonable extent.


Theorem (Church, Rosser): →β is confluent.
Proof: By the previous lemma, we know that if any binary relation on a set satisfies the diamond property, its transitive reflexive closure also satisfies the diamond property. Suppose we have a binary relation →partial−β on the set Λ such that ↠β is the transitive reflexive closure of →partial−β. So if we prove that →partial−β satisfies the diamond property, by application of the previous lemma we have proved that ↠β satisfies the diamond property, i.e. →β is confluent.
A concrete definition of →partial−β, a proof that its transitive reflexive closure is indeed ↠β, and a proof that →partial−β satisfies the diamond property can be found on pages 60-62 of [6].

Theorem: λ-calculus has the unique normal form property.
Proof: Suppose that a term a of 〈Λ,→〉 has two normal forms, n1 ∈ Λ and n2 ∈ Λ. This means there is no b ∈ Λ such that n1 → b or n2 → b. But a ↠ n1 ∧ a ↠ n2, and then by the Church-Rosser property we know (∃c : c ∈ Λ : n1 ↠ c ∧ n2 ↠ c). Since neither n1 nor n2 can be reduced any further, c ≡ n1 and c ≡ n2, and so we must have n1 ≡ n2.

Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, . . ..

Note that the term (λx.xx)(λx.xx) cannot be reduced to a normal form. Confluence is a fundamental property for functional programming; we rely on this when we evaluate programs by rewriting, knowing that we never have to backtrack an evaluation (this is also one of the main differences with logic programming).

In the λ-calculus we have defined in this section, we can represent natural numbers and basic operations on the natural numbers. We will not show this here; in most books on the lambda calculus there are some examples of how to do basic arithmetic in lambda calculus. The λ-calculus represents a certain class of (partial) functions on the integers. By a classical result of the American mathematician Stephen C. Kleene (1909-1994) this is exactly the set of (partial) recursive functions. The proof can be found in [6, theorem 9.2.16]. Church also thought of the set of functions that could be calculated in his λ-calculus, and conjectured the following thesis:


Church’s thesis (1936): The set of effectively computable functions, i.e. functions that intuitively (effectively) can be computed, is the same as the set of functions that can be defined in λ-calculus.

A more formal version and detailed treatment of Church’s thesis can be found in section 9.3.

Alan Turing proved in 1937 that the class of Turing computable functions is the same as the class of functions definable in λ-calculus.

So the power of Turing Machines is the same as the power of λ-calculus. Both models capture the intuitive idea of computation. This important thesis is the subject of the next section.
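This section mentions that natural numbers and their basic operations can be represented in the λ-calculus without showing how. One common encoding, the Church numerals, represents the number n as the term that applies a function n times. The Python sketch below is a hedged illustration using Python's own lambda expressions; the names `zero`, `succ`, `add`, `mul`, `to_int` and `from_int` are assumptions made here, and the two conversion helpers step outside the pure calculus for readability.

```python
# Church numerals: the number n is λf . λx . f(f(...(f x)...)) with n applications of f.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)

def from_int(k):
    """Build the Church numeral for a non-negative Python int."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n

# TWICE from the earlier example is exactly the Church numeral for 2.
twice = lambda f: lambda x: f(f(x))

print(to_int(add(from_int(2))(from_int(3))))
print(to_int(mul(from_int(3))(from_int(4))))
print(to_int(twice(twice)))   # (λy . yy) TWICE reduces to TWICE TWICE
```

Under this encoding, applying one numeral to another amounts to exponentiation, which is why TWICE TWICE behaves like the numeral for 4.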


9.3 The Church-Turing thesis

The Church-Turing thesis concerns the intuitive notion of algorithm (or effective or mechanical method) in logic and mathematics. The notion of an algorithm or an effective method is an informal one, and attempts to characterize this effectiveness lacked rigor, mainly because the key requirement that the method demands no insight or ingenuity is left unexplicated.

One of Turing’s achievements in his paper of 1936 (reprinted in [19] and available online at http://www.abelard.org/turpap2/tp2-ie.asp) was to present a formally exact predicate with which the informal predicate ‘can be calculated by means of an algorithm or effective method’ may be replaced. The formal concept proposed by Turing is that of computability by a Turing Machine (see section 9.1). He introduced this thesis in [90] in the course of arguing that the ‘Entscheidungsproblem’ for the predicate calculus is unsolvable.

Turing’s thesis: TMs can do anything that could be described as intuitively computable

Church also presented in [14] a formally exact way to express this notion of intuitively computable. Turing’s method was however more obvious and more general than Church’s, since the latter only considered functions of positive integers. In order to calculate the values of the function Church introduced his lambda calculus and specified the notion of a recursive function (see section 9.2).

Church’s thesis: A function of positive integers is effectively computableonly if it is recursive

The reverse implication is also referred to as the converse of Church’s thesis. The class of lambda-definable functions and the class of recursive functions were later shown to be identical. This was established in the case of functions of positive integers by Church and the American mathematician Kleene (see [47], [14]). After learning of Church’s proposal, Turing quickly established that the apparatus of lambda-definability and his own apparatus of computability were equivalent ([89], page 263).

Theorem: Lambda-definability and Turing Machine-computability are equivalent.
Proof: See [89, page 263] for a proof that Turing’s machines and Church’s lambda calculus can compute the same set of functions.

Although Turing and Church had chosen different ways to formalize the intuitive notion of effective computability, respectively by identifying the notion with that of computability by a Turing Machine and in the lambda calculus, both methods are equivalent. After this proof of equivalence, Kleene introduced the term ‘Church-Turing thesis’ to refer to either of the two equivalent theses ([48], page 232).

Church-Turing thesis: The intuitive notion of an algorithm equals the Turing Machine algorithm or (equivalently) the calculable functions of lambda-calculus

There are a number of misunderstandings of the Church-Turing thesis, collected in [16]; Turing did not show that

• Any problem can be solved ‘by instructions, explicitly stated rules or procedures’

• A universal TM ‘can compute any function that any computer, with any architecture can compute’ (Turing said nothing about the limits of what can be computed by a machine)

• Whatever can be calculated by a machine (working on finite data in accordance with a finite program of instructions) is Turing-machine-computable (this is known as Thesis M, see [16])

• Any process that can be given a systematic mathematical description (or a ‘precise enough characterization of a set of steps’, or that is ‘scientifically describable’ or ‘scientifically explicable’) can be simulated by a TM (this is known as Thesis S, see [16])

Since the word ‘computable’ is often tied by definition to effective calculability, the Church-Turing thesis is often stated as ‘All computable functions are computable by a Turing Machine’ (a function is said to be computable if and only if there is an effective procedure for determining its values).


If we summarize the above, we can say that to define the concept of an algorithm, Church used a notational system, the lambda calculus. Turing did the same with his theoretical computing device, the Turing Machine. On the face of it they seemed very different from one another, but these two definitions turned out to be equivalent, in the sense that each picks out the same set of mathematical functions. The Church-Turing thesis is the assertion that this set contains every function whose values can be obtained by a method or algorithm corresponding to our intuitive notion of effectively computable. Clearly, if there were functions of which an informal (intuitive) statement, but not the formal statement, were true, then the latter would be less general than the former and so could not reasonably be employed to replace it. When the thesis is expressed in terms of the formal concept by Turing, it is appropriate to refer to the thesis also as the Turing thesis, and likewise for the case of Church. It is agreed amongst mathematicians and logicians that ‘computable by means of a TM’ is the correct accurate rendering of the informal notion in question.

Chapter 10

Conclusion

It is a profoundly erroneous truism, repeated by all copy books and by eminent people, when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. . . . The study of mathematics is apt to commence in disappointment . . . We are told that by its aid the stars are weighed and the billions of molecules in a drop of water are counted. Yet, like the ghost of Hamlet’s father, this greatest science eludes the efforts of our mental weapons to grasp it.

- A. Whitehead, in [99]

When I started my study on the foundations of mathematics, I did not quite know what to expect. By now I’ve learned that the foundations of mathematics can be a fascinating and important subject. Learning this new subject was an interesting challenge, but sometimes hard work when I had to go through numerous books that were full of details or too vague and philosophical. Most books that I found on the foundations of mathematics were either very detailed and descriptive (the book [31] of I. Grattan-Guinness has an unmatched level of detail and exactness) or treat only a part of the theory that was developed from 1890 to 1940 (for example, [17] gives an excellent introduction to set theory). One of the better, though relatively unknown, books is that of G.T. Kneebone [49], which is quite complete and still considerably theoretic. One of the motivations to write this article was to present the theory properly. Hopefully that makes it more clear and enjoyable. Some of the good literature used, such as the books just mentioned, will be found in the references at the end of this report.

At the same time, I also tried to briefly introduce the reader to the historical context of the most important developments. Most undergraduate courses I have taken gave little or no information about the history that lies behind the theory. Emphasis was laid on the accumulation of mathematical knowledge. I believe that the history of mathematics in education can not only make the study of mathematics more interesting, but also help in the growth of mathematical understanding and appreciation of the current form of the theory.

I want to conclude this report with a summary of the theory and my own view on the project, and with some ideas for future work.

The project

In the beginning of the 20th century Hilbert said we should formalize all of mathematics, all of mathematical reasoning. This ‘project’ (from now on I will refer to it as the project) has been the central theme of this report. When reading about the work and biographies of all those brilliant men who have devoted themselves to this problem, you can (at least that’s what happened to me) get caught up in this fascinating philosophical question.

To most people however, this all seems very impractical. We all know you can make a popular operating system or start your own business on the web and in one year make a million dollars if you’re lucky. And when it comes to verifying mathematical proofs and making reliable software, a formal basis is rarely used, the human mind is still the most important, and other techniques, such as model-checking, are preferred. It might be worth writing another article on how and why in that respect the more practical, working mathematicians and more theoretical logicians (or formalists, if you prefer) grew apart. But let’s first go back to the project.

The attempt to formalize mathematical reasoning is not new - the Greeks already thought rationality was the supreme goal. We can think of Plato and Reason, or as Russell¹ would say - think of Pythagoras and Rationality! Aristotle made a big step in formalizing reasoning, with his patterns of reasoning that are known as syllogisms. Ever since, logic was further developed and important contributions came from De Morgan, Leibniz and especially Boole. Because he was interested in theology and God (see [31, chapter 3] and also [30, section 5.8, page 203]), Cantor became obsessed with the notion of the infinite, and developed his theory of infinite sets. With Cantor mathematics got more abstract, and some people regarded his set theory as a disease. Poincaré, the great French mathematician, said² (from [95]): “Later generations will regard Mengenlehre (set theory) as a disease from which one has recovered.” Peano and Frege, as we have learned in chapter 4, brought mathematical reasoning to an even higher level of formalization. So far, so good. But there turned out to be some problems, and although Cantor had already noticed this (see Cantor’s paradox in section 3.8), it was Russell who spread the bad news to everyone, by stating his Russell paradox. At that point Hilbert proposed to use a formal axiomatic method to solve these problems, and he gave his famous three requirements of consistency, completeness and decidability.

This proposal of Hilbert to formalize mathematics led to the development of several axiomatic systems, such as those of Zermelo and Fraenkel, and of Gödel, Bernays and von Neumann. Russell and Whitehead made their own attempt to formalize mathematics, with their theory of types. But although all of these attempts were fruitful to a certain extent, in total they all failed, and it took Gödel and Turing to show that in fact ‘the project’ couldn’t be done. Formalizing mathematics so that we have absolute truth is not possible! But these works of Gödel and Turing were new and complicated, and not everyone clearly recognized their importance. And even nowadays, few people are familiar with the details of their work, and we often see confusion between notions like ‘checking the proof of a statement’ and ‘checking whether a statement is true (or not)’. There is also much confusion about the exact implications of Gödel’s and Turing’s work. For any consistent axiomatic system that contains arithmetic, Gödel constructed a statement within arithmetic that can be neither proved nor refuted within that system. Turing later formalized the notion of computability to show there is no mechanical procedure to decide if a statement is correct or not.

¹ Although rationality is more commonly associated with Plato, Russell always insisted on attributing it to Pythagoras (see [62]).

² Whether or not he actually said this is a matter of debate amongst historians of mathematics.

At first this was a shock, but then mathematicians were saying (and again it would be nice to write an article about the different responses of mathematicians and logicians): so what - we should do mathematics exactly the same way as we’ve always done it, this does not apply to the problems I care about. Indeed mathematicians continued with their work, and the theorems of Gödel and Turing had little or no impact in practice on how we (should) do mathematics. The only effect the project might have had on working mathematicians is that they have become a bit more precise in the use of language and in writing their proofs. Some of course were inspired by problems like the 23 of Hilbert. But there has been another consequence of all this theoretical work, that I was made aware of through a videotaped lecture of G.J. Chaitin on the internet. I quote him about Hilbert’s attempt to formalize all mathematics after the publications of the theorems of Gödel and Turing: “It failed in that precise technical sense. But in fact it succeeded magnificently, not formalization of reasoning, but formalization of algorithms has been the great technological success of our time - computer programming languages! So if you look at the history of the beginning of this century you’ll see papers by logicians studying the foundations of mathematics in which they had predicate calculi. Now you look back and you say this is clearly a programming language! [...] If you look at Turing’s paper of course there is a machine language [...]. Or, as von Neumann said: the universal Turing Machine is really the notion of a general purpose programmable computer - and that’s the idea of software. [...] If you look at papers by Alonzo Church you see the lambda calculus, which is a functional programming language. If you look at Gödel’s original paper you see what looks like LISP, it’s very close to LISP”. As he showed, there are numerous examples of unexpected offspring of theoretical research, and all of the foundational work is not so impractical after all! As G.J. Chaitin concluded in his speech, this is the way “we’re all benefiting from the glorious failure of this project!”. Now this is not entirely true, but it is true that theoretical studies, as he says, “don’t have spin-off in dollars right away, but sometimes they have vastly unexpected consequences”. Formal methods/studies have not always done a good job promoting themselves - maybe we can emphasize this aspect and show that technology often advances through fascinating impractical ideas.

Status of the project

That brings us to ask if the question of the foundation of mathematics, more than a decade after Hilbert formulated it, is now settled once and for all. The short answer is: it is not. Even from the amount of interesting resources on current research that are available on the internet alone, we can conclude there is still a lot of work to do on the foundations of mathematics. I am considering creating an online version of this document with more background information and links.

Although Gödel and Turing showed that it is impossible to totally formalize even basic arithmetic, let alone the whole of mathematics, it is still possible to formalize parts of mathematics (for example, geometry) successfully. As P. Andrews says in [4], “attempts to understand the nature of reasoning and to build sophisticated information systems which can draw logical conclusions may be regarded as part of an endeavor to fashion more powerful intellectual tools for coping with the increasingly complex problems which confront mankind.” In that respect the formalization is not restricted to mathematical reasoning, and it can also be applied to other disciplines (such as physics, chemistry or even social sciences). Especially the development of software and computer systems will be facilitated by a formalization of theories. Although total formalization of parts of mathematics is very useful, this is not the focus of most current research: (most people believe that) the human mind will (at least for the near future) be the one to prove whether a given mathematical statement is true or not.

Ideas for future work and distinction between mathematics and software

And although it cannot be determined by a machine whether any given mathematical statement is true, we can try to develop an axiomatic system such that as many as possible of the interesting statements³ can be proved within that system. This is useful because, even when all axiomatic systems are incomplete and there are undecidable statements, if we provide one of the statements that the system does contain, and which we claim to be decidable by providing a concrete and completely formalized (dis)proof of it within that system, we still have a way to decide mechanically whether or not the proof is correct for the given statement. The question then is whether the set of statements for which we can do this still forms a part of mathematics that is interesting enough. This has to be a part of our investigation: to find out how many of the practical mathematical proofs contain ‘meta-arguments’, in other words which classes will fall outside our system. Although we want to change as little as possible on the side of mathematics itself, this also might be a necessary option⁴. As P. Andrews calls his book [4], we get: ‘to truth through proof’. This should be the first goal for the near future:

(1) Investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’), which formalization is best usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as many parts of mathematics as possible.

³ As interesting statements, we consider all statements in the (everyday) work of practicing mathematicians. These ‘practical’ statements do not include the specific purely theoretical statements that Gödel invented for his incompleteness theorem.

Formalization is not only important to check the correctness of mathematical theories that are becoming ever more complex. Many models in physics and chemistry depend on underlying mathematical theorems, and the success of the model depends on the correctness of the mathematical theorems. Also, we are becoming more and more dependent on automated systems, in particular computers and software. There is a growing need for reliable (that is, correctly specified and working according to the specifications) software, not only for (safety) critical systems, but also in everyday applications. A formal approach can not only be used to prove correctness of mathematical statements but also of computer programs. This is an important point:
Distinction between mathematics and software construction.

Instead of the proofs of mathematical statements, we are then checking the derivation steps of program derivations. I want to emphasize this difference, since it is often unclear or left implicit which of the two is meant when arguments for/against formalistic studies are given. We have to realize that we can never obtain a 100% guarantee of correctness of any algorithm, since we are also dependent on the correctness of the proof-checker. That is why we have to try to keep the proof-checker as simple, small and intuitive as possible (see also the ‘de Bruijn criterion’ in [26, pages 4 and 26]). Analogously, we can never obtain a 100% guarantee of correctness of any mathematical statement, since we learned from Gödel that the consistency of any sufficiently strong axiomatic system cannot be proved within that system, and therefore we had better also try to keep the axiomatic system as simple, small and intuitive as possible (we could see all this as the de Bruijn criterion variant for axiomatic systems). But nevertheless, any such implementation of a proof checker would give us the highest degree of certainty possible.

⁴ For a successful formalization of parts of mathematics we therefore do not only look at the axiomatic system, but it also might require us to limit certain parts of mathematics so that they contain fewer undecidable proofs or require us to rewrite certain existing proofs to a form that is permitted by the system.
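To give an impression of how small a proof checker can be, here is a Python sketch of a checker for a toy logic with assumptions and modus ponens as the only inference rule. It is purely illustrative: the proof format, the name `check_proof` and the encoding of an implication A → B as the tuple `("->", A, B)` are assumptions made here, and the sketch is not the checker of any existing system.

```python
def check_proof(assumptions, proof):
    """Check a proof line by line; return True when every line is justified.

    proof is a list of (formula, justification) pairs, where a justification
    is either the string "assume" or a tuple ("mp", i, j) meaning: this line
    follows by modus ponens from line i (A) and line j (A -> this formula).
    """
    derived = []
    for formula, why in proof:
        if why == "assume":
            ok = formula in assumptions
        else:
            _, i, j = why
            # Only earlier lines may be used, and line j must be the
            # implication from line i to the current formula.
            ok = (0 <= i < len(derived) and 0 <= j < len(derived)
                  and derived[j] == ("->", derived[i], formula))
        if not ok:
            return False
        derived.append(formula)
    return True

assumptions = {"p", ("->", "p", "q"), ("->", "q", "r")}

good_proof = [
    ("p", "assume"),
    (("->", "p", "q"), "assume"),
    ("q", ("mp", 0, 1)),
    (("->", "q", "r"), "assume"),
    ("r", ("mp", 2, 3)),
]
bad_proof = [
    ("p", "assume"),
    ("r", ("mp", 0, 0)),   # r does not follow from p alone
]

print(check_proof(assumptions, good_proof))
print(check_proof(assumptions, bad_proof))
```

The checker only verifies that each step is justified; it does not search for proofs, which reflects the difference between checking the proof of a statement and deciding whether a statement is true.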

Software and Proof Checking

I would also like to remark that proof checking for programs can only give us a way to verify the correctness of programs. At least as important (to obtain correct programs) is the correct construction of programs. This is the focus of the work in the area of programming methodology. At the Eindhoven University of Technology for example, the techniques of E.W. Dijkstra are used to derive correct programs from their specification. Unfortunately both areas (proof checking/verification vs. construction/derivation) are merely advocates of their own approach, while a combination of both could give the best results. Although there has been some minor work on formalizing these proof techniques and combining formal methods and program derivations (see for example [26]), cooperation is still minimal. If we go one step further back in the process of creating correct software, the success of any piece of software depends on the correctness of its specification. These first phases of software engineering (indicating user requirements/specifications) can also be adapted to comply with the methods of program derivation and formal proof checkers (note that we not only use the term ‘proof checker’ for mathematics, i.e. to check mathematical statements, but also for the software variant: for checking algorithm/program derivations). And since we can never obtain a 100% guarantee of correctness of software (it depends for example on the correctness of the specifications and the proof checker itself), model checking techniques can also be used as a verification method to improve reliability even further. Therefore I argue for an integrated approach, for only the combination of all of the mentioned methods can give us the highest reliability (i.e. highest chance of correctness of software). Such an integrated approach requires research and cooperation between the various branches representing the methods I mentioned before and ultimately incorporation in the software engineering process.

Mathematics and Proof Checking

Let’s go back to proof checking of mathematical statements. We men-tioned the first goal of investigating and formalizing proof checking. As anext step (2) we can think of building proof assistants. Proof assistants notonly check the proofs for us, but also help us in making the proofs: theyare tools that are a combination of a proof development system and a proofchecker. A good article about proof assistants using dependent type systemscan be found in [8]. Also an interesting article on computer assisted mathe-matics (for computer algebra) is [7] with an abstract history of computationsversus proofs in mathematics. The notion of ‘helping’ or ‘assisting’ in makingproofs might be considered vague. For complicated statements, we can thinkof tools that keep track of the context of the proof, of the remaining proofobligations and even fill in part of the proofs for us automatically.

Proof assistants should make it easier for us to prove mathematical theorems.Then (3) we can think of building a standard library of proved mathematics.After a proof checker has confirmed the correctness of a given mathematicalstatement and its corresponding proof, they can be stored in a database. Itcan be accessible to everyone via the internet and even be used for previouslymentioned automated proving methods by proof assistants. And although wecan not see the quality of mathematical work as evident as the quality of phys-ical products, this could be the long awaited ‘quality stamp’ for mathematics.There have already been attempts to build standard libraries of mathematics(see the Mizar project at http://www.mizar.org/ and the PRL project, seehttp://.www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html, but they lackthe formal basis that has to be provided by (1) and (2)). Barendregt andhis group have formalized parts of algebra using the theorem prover COQ.This shows that it is possible to formalize large parts of mathematics, butthe process itself of formalizing mathematics is too direct and informal andneeds to be further developed. Many valuable experiences have come out ofattempts on what are here called phase (2), (3) and (4), but for a successfulresult this is premature and do we first have to start thoroughly at the be-ginning (1). Work in this direction was done in [44], where a syntax-drivenderivation system is presented for a formal language of mathematics called

177

Weak Type Theory. This is a start of a more rigorous approach to the trans-lation of mathematical texts (statements and proofs).

We see the extension of proof assistants with more intelligent and sophisticated automated proving methods as the final phase (4) of future work. The field of automated proving includes classical theorem proving methods (such as automated induction). Newer methods come from areas such as neural networks, fuzzy logic, genetic and DNA computing, and in the future possibly even quantum computing.

I want to end these ideas by summarizing the steps that lie ahead of us in a new project.

The new project (for mathematics):

1 investigate which parts of mathematics can(not) be formalized (i.e. contain ‘meta-arguments’) and which formalization is most usable and allows most parts of (practical) mathematics to be formalized, and completely formalize proof checking for as many parts of mathematics as possible

2 build a proof assistant (probably based on some form of WTT and some form of TT)

3 build a standard library (archive) of proved mathematics

4 further develop automated proving techniques (to build into the proof assistant)

And similarly we can formulate the new project for computer systems:

The new project (for software construction):

1 formalize as much of program derivation checking as possible

2 build a programming assistant (environment) based on a suitable (and preferably popular) programming language

3 build a standard library of reusable correct software (i.e. suitable for component-based software engineering) and its specification

4 further develop automated proving and program derivation techniques


One of the most important questions, part of step (1), has so far been avoided in this conclusion: what should we take as the basis of mathematics? This is one of the most difficult questions and, as we have seen, many great scientists have thought about it. There is currently no consensus on what the best approach is, and I am not in a position to give a well-argued opinion. Thorough research into the alternatives will have to yield the best approach and show which choice of foundational system is most usable in practice.

The only thing I can say is that recently most people seem to favor type theory over category theory, relational calculi and also set theory. P.J. Scott, for example, favors type theory over category theory in the introduction of [55]. H. Barendregt gives arguments for the use of type theory over set theory in [7], and we quote from [4, the second page of the preface]: “[People prefer the approach they are most familiar with.] However, those familiar with both type theory and axiomatic set theory recognize that in some ways the former provides a more natural vehicle than the latter for formalizing what mathematicians actually do”. On the other hand, at http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment of the choice of a foundational system, with advantages of set theory over type theory. Also, several new types of logic have been proposed, such as IF logic (see [37]) and several types of so-called ‘fuzzy logics’, but so far they seem to lack the precision, formalization and proofs needed to support claims that they can be used successfully as a foundation for mathematics.

A final remark on the debate between type theory and axiomatic set theory as a foundational basis: if there is a mapping from the axioms of (some form of) set theory into (some form of) type theory and vice versa, then type-theoretic expressions have their counterparts in set theory. It is interesting to investigate whether among such mappings there is indeed a bijection. That would show that both theories are equivalent in expressive power, so that the debate can turn to the question of which theory is more intuitive and useful.

Some do not really believe in a successful formalization of mathematics, but rather see the indeterminacies in mathematical representations and the undecidabilities in any formal system as the source of problem-solving and creative power (see [87, page 174]). This standpoint was already expressed in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Dagegen muß behauptet werden, daß die Wahrheit nicht eine ausgeprägte Münze ist, die fertig gegeben und so eingestrichen werden kann” (“Against this it must be maintained that truth is not a minted coin that can be given ready-made and simply pocketed”).


I am aware of the limitations of this report. Many chapters are still informal, such as the treatment of Frege’s work in chapter 4. The theory of types in chapter 7 and Gödel’s incompleteness theorem in chapter 8 are not completely covered, and certain subjects closer to logic (such as intuitionism) are treated only minimally. The only excuse I have is that it is simply not possible to study all the original works in such a short period of time and to include all the theory in this report. I hope to complete this work at a later stage. It might also be worthwhile to extend (on both sides) the period covered in this report. Recently we have seen interesting new theories on category and type theory and even on the foundations of mathematics, for example Chaitin’s results on randomness; it seems that he continued where Gödel and Turing left off. Finally, I would like to remark that the ‘new project’, consisting of the four steps mentioned in this conclusion, is just my own view of the work that lies ahead of us. To end with a concluding remark by Alan Turing, from his paper on the Turing test: “We can only see a short distance ahead, but we can see plenty there that needs to be done”.

Mark Scheffer, August 2001⁵

⁵ P.S. To those who wonder what the turtle and the elephant are doing on the cover of this report, I refer to the website http://zax.mine.nu/stage/.

Appendix A

Timeline and Images

Figure A.1: Luitzen Brouwer

Figure A.2: Georg Cantor

Drawings by Soshichi Uchii; photo of Quine by Kelly Wise; photo of Ramsey courtesy of Harcourt, Brace, Jovanovich.


Figure A.3: Richard Dedekind

Figure A.4: Gottlob Frege

Figure A.5: Kurt Gödel

Figure A.6: David Hilbert


Figure A.7: John von Neumann

Figure A.8: Giuseppe Peano

Figure A.9: Henri Poincaré

Figure A.10: Willard Van Orman Quine


Figure A.11: Frank Plumpton Ramsey

Figure A.12: Bertrand Russell

Figure A.13: Alan Turing

Bibliography

[1] A.A. Fraenkel, Y. Bar-Hillel and A. Levy. Foundations of set theory. North-Holland Press, Amsterdam, 2nd edition, 1973. First edition 1958.

[2] W. Ackermann and D. Hilbert. Grundzüge der theoretischen Logik, volume XXVII of Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen. Springer-Verlag, Berlin, first edition, 1928.

[3] J.H.J. Almering. Analyse. Delftse Uitgevers Maatschappij, 1993.

[4] P. Andrews. An introduction to mathematical logic and type theory: to truth through proof. Academic Press, 1986.

[5] J. Backer and P. Rudnicki. Hilbert’s basis theorem. Association of Mizar Users, University of Bialystok, 12, 2000. Published in Journal of Formalised Mathematics.

[6] H. Barendregt. The Lambda Calculus - Its Syntax and Semantics, volume 103. Elsevier Science Publishing Company, Inc., 1984.

[7] H. Barendregt and A.M. Cohen. Electronic Communication of Mathematics and the Interaction of Computer Algebra Systems and Proof Assistants. J. Symbolic Computation. Academic Press, 2001.

[8] H. Barendregt and H. Geuvers. Proof-checking using Dependent Type Systems, volume 2, chapter 18, pages 1149-1240 of Handbook of Artificial Reasoning. Oxford Press, 2001.

[9] C.J. Bloo. Computational Models. TU/e Press, 2001. Manuscript originally started by H. Geuvers and J. Hooman.


[10] J. Breuer. Introduction to the Theory of Sets. Prentice-Hall, August 1958.

[11] Encyclopedia Britannica. P. Bernays. EB, 2000.

[12] K.S. Brown. Mathematics. Seanet, 1991.

[13] G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. Journal f. reine und angew. Math., Gesammelte Abhandlungen, 84, pages 119-133, 1878. Translated in ‘Contributions to the foundation of the theory of transfinite numbers’ (translation from German) by Philip E. Jourdain, Dover Publishing, 1952.

[14] A. Church. An unsolvable problem in elementary number theory, volume 58. American Journal of Mathematics, 1936.

[15] P.J. Cohen. Set Theory and the Continuum Hypothesis. Benjamin, 1966.

[16] B.J. Copeland. The Church-Turing Thesis. Springer-Verlag, 1997. Item in Stanford Encyclopedia of Philosophy.

[17] D. van Dalen, H.C. Doets and H. de Swart. Sets: Naive, Axiomatic and Applied. Pergamon Press, 1978.

[18] J.W. Dauben. Georg Cantor, His Mathematics and Philosophy of the Infinite. Harvard University Press, 1979.

[19] M. Davis. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions. Raven Press, New York, 1965.

[20] Diverse. Mathematische Annalen, 65. Springer-Verlag, Berlin, 1908.

[21] A. Einstein. Relativity: the special and general theory. Methuen Press, London, 1970.

[22] H. Eves. Mathematical Circles Revisited. Boston Press, 1971.

[23] H. Eves. Foundations and fundamental concepts of mathematics. Dover Publications Inc., Mineola, New York, third edition, 1997.


[24] A. Fraenkel. Einleitung in die Mengenlehre. Springer-Verlag, third edition, 1928.

[25] A.A. Fraenkel. Abstract Set Theory. North-Holland Press, Amsterdam, 3rd edition, 1966. First edition in 1953.

[26] M. Franssen. Cocktail. Eindhoven University Press, 2000. Doctoral thesis.

[27] K. Gödel. On formally undecidable propositions of Principia Mathematica and related systems. Dover Publications, New York, 1992. English translation of Gödel’s original 1931 publication of the incompleteness theorem. First published in 1962 by Basic Books, Inc., New York.

[28] D. Goldrei. Classic set theory, a guided independent study. Chapman and Hall, 1996.

[29] I. Grattan-Guinness. How did Russell write the principles of mathematics (1903). McMaster University Library Press, 1997. In the Journal of the Bertrand Russell Archive.

[30] I. Grattan-Guinness. From the Calculus to Set theory 1630-1910. Princeton University Press, 2000. First published in 1980 by G. Duckworth & Co, London.

[31] I. Grattan-Guinness. The Search for Mathematical Roots 1870-1940. Princeton University Press, 2000.

[32] I. Grattan-Guinness. A sideways look at Hilbert’s Twenty-three Problems of 1900. Middlesex University Press, 2000.

[33] J. Haim. Introduction of the Israel Mathematical Conference Proceedings, volume 6. Bar-Ilan University Press, 1993.

[34] P.R. Halmos. Naive Set Theory. Van Nostrand Press, London, 1990.

[35] G.W.F. Hegel. Phänomenologie des Geistes. Reprint: Meiner, Hbg., 1807. English translation ‘The Phenomenology of Mind’ by J.B. Baillie in 1910, London.

[36] H. Hermes and H. Schulz. Mathematische Logik. Unknown, 1952. In Encyklopedia Mathematische Wissenschaften, I1, 1, I, page 58.


[37] J. Hintikka. The Principles of Mathematics Revisited. Cambridge University Press, 1996.

[38] A. Hodges. Turing. The Great Philosophers. Phoenix, 1997.

[39] A.D. Irvine. Bertrand Arthur William Russell. Stanford University Press, 2000.

[40] D. Joyce. Hilbert’s 1900 Address. Clark University, Worcester, 1997.

[41] D. Joyce. A list of Hilbert’s problems. Clark University, Worcester, 1997.

[42] D. Joyce. The Mathematical Problems of David Hilbert, http://aleph0.clarku.edu/~djoyce/hilbert/. Clark University, Worcester, 1997.

[43] F. Kamareddine and T. Laan. A reflection on Russell’s ramified types and Kripke’s hierarchy of truths. Journal of the Interest Group in Pure and Applied Logic, 4(2):195–213, 1996.

[44] F. Kamareddine and R. Nederpelt. A derivation system for a formal language of mathematics. To be published, July 2001.

[45] I. Kaplansky. Encyclopedia Britannica, item on David Hilbert. EB, 1990.

[46] E. Kasner and J. Newman. Mathematics and the imagination. New York Publishing, 1940.

[47] S.C. Kleene. Lambda-definability and recursiveness. Duke Mathematical Journal 2:340-353, 1936.

[48] S.C. Kleene. Mathematical Logic. New York, 1967.

[49] G.T. Kneebone. Mathematical logic and the foundations of mathematics. D. van Nostrand Company, 1963. Reprint 2001.

[50] J. Koenderink. Solid Shape. Cambridge, 1990.

[51] K. Kunen. Set theory: an introduction to independence proofs. New York Press, 1980.


[52] T. Laan. A formalization of the ramified type theory. TUE Computing Science Reports, 1994. Technical Report 94-33.

[53] T. Laan. The Evolution of Type Theory in Logic and Mathematics. PhD thesis, Eindhoven University of Technology, 1997.

[54] T. Laan and R.P. Nederpelt. A modern elaboration of the ramified theory of types. Studia Logica, 57(2/3):243–278, 1996.

[55] J. Lambek and P.J. Scott. Introduction to higher order logic. Cambridge University Press, 2001.

[56] P. Linz. An introduction to formal languages and automata. D.C. Heath and Company, 1990.

[57] J.R. Lucas. The conceptual roots of mathematics. Routledge Press, 2000.

[58] D. MacHale. Comic Sections. Dublin, 1993.

[59] Mosche Machover. Set theory, logic and their limitations. Cambridge University Press, 1996.

[60] P. Mancosu. From Brouwer to Hilbert, the debate on the foundations of mathematics in the 1920s. Oxford University Press, 1998.

[61] E. Maor. To infinity and beyond. Boston Press, 1987.

[62] R. Monk. Russell. The Great Philosophers. Routledge, 1999. First published in 1997 by Phoenix.

[63] G.H. Moore. Zermelo’s axiom of choice: its origins, development and influence. Springer-Verlag, 1982.

[64] E. Nagel and J. R. Newman. Gödel’s proof. New York University Press, 1986. First published in 1958.

[65] G. Peano. Calcolo differenziale e principii di calcolo integrale. Turin Press, 1884.

[66] G. Peano. Applicazioni geometriche del calcolo infinitesimale. Turin Press, 1887.


[67] G. Peano. Calcolo geometrico secondo l’Ausdehnungslehre di H. Grassmann e preceduto dalle operazioni della logica deduttiva. Fratelli Bocca, Torino, 1888. English translation ‘Geometric Calculus: According to the Ausdehnungslehre of H. Grassmann’ by Lloyd Kannenberg, November 1999, Birkhäuser.

[68] G. Peano. Dizionario di matematica. Parte prima. Logica matematica. Unknown, 1901. In Ri(e)vista di mathematica, edited by Peano.

[69] L.J.J. Wittgenstein P.M. Sullivan. The foundations of mathematics. Unknown, June 1927. Reprinted by F. P. Ramsey, June 1927, Theoria 61 (2) (1995), pages 105-142.

[70] W. Van Orman Quine. Mathematical Logic. Harvard University Press, 1951. Revised edition of Norton, New York 1940.

[71] W. Van Orman Quine. From a Logical Point of View: 9 Logico-Philosophical Essays. Harvard University Press, Cambridge, Massachusetts, 2nd edition, 1961.

[72] W. Van Orman Quine. Set Theory and its Logic. Harvard University Press, Cambridge, Massachusetts, 1963.

[73] R.C.W. Bertrand Russell entry in Encyclopedia Britannica. EB, 2000.

[74] J. Richard. Les principes des mathématiques et le problème des ensembles. Revue générale des sciences pures et appliquées, 16, 1905. Published also in Acta Mathematica 30 (1906), pages 295-296.

[75] B. Riemann. Über die Hypothesen, welche der Geometrie zu Grunde liegen. Göttingen Press, 1854.

[76] N. Rose. Mathematical Maxims and Minims. Raleigh NC, 1988.

[77] H. Rubin and J.E. Rubin. Equivalents of the axiom of choice. North-Holland Press, Amsterdam, 1963.

[78] B. Russell. My philosophical development. London: George Allen and Unwin, New York: Simon and Schuster, 1959.


[79] B. Russell. Introduction to Mathematical Philosophy. The Great Philosophers. London: George Allen and Unwin; New York: The Macmillan Company, 1999. First published in 1997 by Phoenix.

[80] B. Russell. The autobiography of Bertrand Russell. Routledge, 2000.

[81] S. Shelah. Proper forcing, lecture notes in mathematics. Springer-Verlag, 1982.

[82] M. Sipser. Introduction to the theory of computation. PWS Publishing Company, Boston, 1997.

[83] A.T. Skolem. Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre. Akademiska Bokhandeln, Helsinki, 1922. In ‘Matematikerkongressen i Helsingfors 4-7 juli 1922, Den femte skandinaviska matematikerkongressen’, pages 217-232. Reprinted in ‘Selected Works in Logic’ by A.T. Skolem, edited by Jens E. Fenstad, 1970, Universitetsforlaget, Oslo.

[84] R.M. Smullyan. Gödel’s incompleteness theorems. Oxford Logic Guides. Oxford University Press, 1992.

[85] B. Sobocinski. L’analyse de l’antinomie Russellienne par Lesniewski. Unknown, 1950. Methodus I, pages 94-107, 220-228, 308-316; Methodus II, pages 237-257.

[86] F. Kamareddine, T. Laan and R. Nederpelt. Types in Logic and Mathematics before 1940, volume 8. Bulletin of Symbolic Logic, January 2002. To be published.

[87] M. Tiles. Mathematics and the image of reason. Routledge, 1991.

[88] E.C. Titchmarsh. Mathematical Maxims and Minims. Rome Press, 1988.

[89] A.M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, Series 2, volume 42, pages 230-265, 1936. With corrections in Proceedings of the London Mathematical Society, Series 2, volume 43 (1937), pages 544-546. Reprinted with some annotations in ‘The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions’, ed. Martin Davis, 1965, Raven Press, New York.


[90] A.M. Turing. Intelligent Machinery. National Physical Laboratory, 1948. National Physical Laboratory Report in ‘Machine Intelligence 5’ by Meltzer, B. and Michie, P., 1969, Edinburgh University Press.

[91] Unknown. Encyclopedia Britannica; Item on Principia Mathematica. EB, 2000.

[92] Unknown. Encyclopedia Britannica; Item on Turing. EB, 2000.

[93] J. van Heijenoort. From Frege to Gödel: source book in mathematical logic 1879-1931. Harvard University Press, 1967.

[94] W. van Orman Quine. New foundations for Mathematical Logic. The American Monthly, February 1937. 44(2), pages 70-80.

[95] Various. The Mathematical Intelligencer, volume 13. Springer-Verlag, Berlin, 1991.

[96] J. von Neumann. Zur Einführung der transfiniten Zahlen. Acta Szeged. 1:199-208 [I, 3], 1923.

[97] J. Weiner. Frege in Perspective. Cornell, 1990.

[98] J. Weiner. Frege. Past Masters. Oxford University Press, 1999.

[99] A. Whitehead. An introduction to Mathematics. Williams and Norgate, London, 1911.

[100] A. Whitehead. A treatise on universal algebra. New York, 1960.

[101] E. Zermelo. Untersuchungen über die Grundlagen der Mengenlehre, I. Springer-Verlag, 1908. In Mathematische Annalen 65, 1908, pages 261-281.
