MT3502 - Real Analysis Jonathan M. Fraserjmf32/MT3502Notes.pdfMT3502 - Real Analysis Jonathan M....

MT3502 - Real Analysis∗

Jonathan M. FraserSeptember 17, 2018

About the courseThis module continues the study of analysis begun in MT2502 Analysis. The approach willbe rigorous, with precise definitions of the concepts involved and careful proofs of importanttheorems which become the building blocks of more advanced work. Near the end of the coursethe language of metric spaces will be introduced to show how the ideas developed for realfunctions fit into a much broader framework. These notes are adapted from notes originallywritten by Kenneth Falconer.

The course will have the following components:

1. Brief review of sets, numbers and functions

2. Countable and uncountable sets

3. Review of convergence, continuity and uniform continuity

4. Riemann integration

5. Power series

6. Convergence and continuity in normed and metric spaces

There are many books on analysis which cover most of this course and related topics, inparticular the following (note that these texts are for further reading or getting a different per-spective on the course).

John M. Howie, Real Analysis, Springer, 2001.

Robert G. Bartle & Donald R. Sherbert, Introduction to Real Analysis, 4th edition. (Wiley,2011)

Kenneth Ross, Elementary Analysis, 2nd edition. (Springer, 2014).

David Brannan, A First Course in Mathematical Analysis. (Cambridge University Press, 2006)

D.J.H. Garling, A Course in Mathematical Analysis, Vol.1. (Cambridge University Press,2014) (More advanced)

Ian Stewart and David Tall, The Foundations of Mathematics. 2nd Edition. (Oxford UniversityPress, 2015) (For those interested in set theory, the nature of numbers, infinity, etc.)

∗This work by Kenneth J. Falconer and Jonathan M. Fraser is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

1

https://creativecommons.org/licenses/by-nc-sa/4.0/

https://creativecommons.org/licenses/by-nc-sa/4.0/

•Writing mathematics clearly and carefully becomes increasingly important as you get furtherinto the subject. Advanced mathematical arguments can have a complicated logical structure,and this structure must be clear to any reader (and even more importantly to the writer!).• In particular, written mathematics should not just be a list of equations but should make cleartheir logical relationship to each other. The careful use of words to express such relationshipsis important. In particular, the precise use of words such as ‘for all’, ‘for some’, ‘implies’, ‘if’,‘only if’ is crucial.• Styles of mathematical writing vary considerably. You should develop your own style usingwords as well as symbols, to make your arguments as clear as you can.

1 Sets, functions, numbers, relations - RevisionThis section is just a reminder of the standard notation and terminology which we will be using.

1.1 SetsSets and relations between sets are fundamental to mathematics.

We will take a set to be a collection of objects which we term the members or elements ofthe set. Generally we use capital letters to denote sets and small letters to denote elements.

We write a ∈ A to mean that an element a is a member of a set A and a 6∈ A to mean it is nota member of A.

We sometimes write {· · ·} for the set containing the elements · · · .Here are some standard sets that we will encounter:

Z= {. . . ,−3,−2,−1,0,1,2,3, . . .} the IntegersQ= {p/q such that p,q ∈ Z,q 6= 0} the Rational numbersR= the Real numbersC= the Complex numbersN= Z+ = {1,2,3, . . .} the Natural numbers or Positive integersQ+ = the Positive rationals, etc.∅ the Empty set or Null set, i.e. the set with no elements.

We write{x : Property involving x} or {x| Property involving x}

to mean “the set of x such that the Property involving x holds”.

In this notation, the closed and open intervals of real numbers between a and b are, respectively,

[a,b] = {x : a≤ x≤ b} and (a,b) = {x : a < x < b}.

We say that A is a subset of B (A ⊆ B) if every element of A is an element of B. Note that2 ∈ Z but {2} ⊆ Z, etc.

If A⊆ B and A 6= B we say A is a proper subset of B (some books distinguish this by writingA⊂ B).

2

1.1.1 Set operations

Here are some basic operations on sets:

Intersection A∩B = {x : x ∈ A and x ∈ B};

Union A∪B = {x : x ∈ A or x ∈ B};

Set difference A\B = {x : x ∈ A and x 6∈ B};

Complement Ac = {x : x ∈U and x 6∈ A} where U is the universal set under consideration.

More generally, if we have a collection of sets Ai (i ∈ I) indexed by another set I, we can formtheir intersection and union:⋂

i∈I

Ai = {x : x ∈ Ai for all i ∈ I};⋃i∈I

Ai = {x : there exists i ∈ I such that x ∈ Ai}.

Power setsThe power set of A, written P (A), is the set of all subsets of A, that is P (A) = {Y : Y ⊆ X}. Forexample, P ({1,2}) =

{∅,{1},{2},{1,2}

}.

Cartesian productsThe Cartesian product or direct product A×B of two sets A and B is the set of all ordered pairs

A×B = {(a,b) : a ∈ A, b ∈ B}.

Example 1.1.1. If A = {1,2} and B = {1,2,3} then

A×B = {(1,1),(1,2),(1,3),(2,1),(2,2),(2,3)}.

2. R×R= {(x,y) : x ∈ R,y ∈ R}= R2 = the Euclidean plane.

1.2 FunctionsFunctions relate elements of different sets. A function, mapping or transformation f from aset A to a set B is a rule or formula that associates to each element a ∈ A exactly one elementf (a) ∈ B.

We write f : A→ B to mean that f is a function from A to B. We call A the domain of f andB the range (or codomain) of f .

[Formally a function f : A→ B is a subset f ⊆ A×B with the property that for each a ∈ Athere exists a unique element b ∈ B such that (a,b) ∈ f . Note that some books, especially inalgebra, write a f instead of f (a)].

Two functions f and g are equal if and only if they have the same domain and the samerange and f (x) = g(x) for all x in the domain.

3

Example 1.2. The following are functions:

1. f : {a,b,c,d}→ {1,2,3}; f (a) = 1, f (b) = 2, f (c) = 1, f (d) = 3.

2. f : Z→ Z; f (x) = x+1 for each x ∈ Z.

3. f : Z→ Z; f (x) = 2x for each x ∈ Z.

4. f : Q→Q; f (x) = 2x for each x ∈Q.

5. For any given set A the mapping iA : A→ A given by f (x) = x for each x ∈ A is called theidentity function on A.

Example 1.3. The operation of addition on the integers Z is a function + : Z×Z→ Z.

Example 1.4. A sequence (a1,a2,a3, . . .) of elements from a set A is just a convenient way ofdefining the function f : N→ A given by f (n) = an.

Types of function

A function f : A→ B is surjective or onto, if, for every b ∈ B, there exists a (not necessarilyunique) a ∈ A such that b = f (a).

A mapping f : A→ B is injective or one-to-one, if distinct elements of A always map to distinctimages in B, i.e. for x,y ∈ A, x 6= y, then f (x) 6= f (y).

A mapping is bijective or a one-to-one correspondence if it is both injective and surjective.

In Example 1.2 above, 1, 2, 4 and 5 are surjective; 3 is not. For example, to prove function 2is surjective: given any y ∈ Z, let x = y− 1, then f (x) = y− 1+ 1 = y. To show that 3 is notsurjective, note that there is no element x ∈ Z such that f (x) = 1.

Usually, a proof of injectivity uses the contrapositive of the condition just stated – one showsthat:

f (x) = f (y) =⇒ x = y.

Functions 2, 3, 4 and 5 are injective; 1 is not. For example, function 3 is injective because

f (x) = f (y) =⇒ 2x = 2y =⇒ x = y.

Function 1 is not injective since the distinct elements a and c both map to 1.

Thus functions 2, 4 and 5 are bijections; 1 and 2 are not.

4

Example 1.5. The following functions are all different:

sin : R→ R is neither injective or surjectivesin : R→ [−1,1] is surjectivesin : [−π

2 ,π

2 ]→ R is injectivesin : [−π

2 ,π

2 ]→ [−1,1] is bijective.

Composition and inverses of functions Given two functions f : A→ B and g : B→ C, weoften want to combine these functions to give us a new function from A to C whose effect on anelement a ∈ A is that of first applying the f to a, and then applying g to the result:

For two functions f : A→ B and g : B→C, their composition is the function g◦ f : A→Cdefined by (g◦ f )(a) = g( f (a)) for all a ∈ A.

Note that composition is associative; that is if h : A→ B, g : B→ C and f : C→ D thenf ◦ (g◦h) and ( f ◦g)◦h are equal, as they both map A→ D and(

f ◦ (g◦h))(a) = f

((g◦h)(a)

)= f(g(h(a))

)= ( f ◦g)

((h)(a)

)=(( f ◦g)◦h

)(a)

for all a ∈ A.Let f : A→ B. A function g : B→ A is called the inverse of f if g ◦ f = iA and f ◦ g = iB.

If such an inverse exists we say that f is is invertible. We write f−1 for the inverse of f , so thatf−1 ◦ f = iA and f ◦ f−1 = iB. (The inverse of a function is unique).

It turns out that we can exactly characterize the invertible mappings:

Theorem 1.6. Let f : A→ B. Then f has an inverse if and only if it is a bijection.

Proof. Suppose f has inverse f−1. To show that f is injective, let x,y ∈ A be such that f (x) =f (y). Then x = f−1( f (x)) = f−1( f (y)) = y.

To show f is surjective, let z ∈ B so that f−1(z) ∈ A. Then f ( f−1(z)) = iB(z) = z. So f issurjective and thus bijective.

Now suppose f is a bijection. We obtain an inverse by ‘reversing the arrows’. The bijectiveproperty means that we can define f−1 : B→ A by f−1(y) = {the unique x : f (x) = y}. Thenimmediately f ( f−1(y)) = y and f−1( f (x)) = x.

Example 1.7. tan : (−π

2 ,π

2 )→ R is a bijection and tan−1 : R→ (−π

2 ,π

2 ) is its inverse.

1.3 NumbersThroughout this course we will assume the familiar and standard arithmetic properties of Z,Qand, in particular for this course, R.

Concisely, the reals R together with the operations of + and × form a field, i.e. R is acommutative group under +, and R\{0} is a commutative group under ×, and the distributivelaw holds.

R also satisfies the standard order properties, that is for all a,b,c,d ∈R we have either a≤ bor b≤ a, with both holding iff a = b, and

a≤ b,b≤ c implies a≤ ca≤ b,c≤ d implies a+ c≤ b+d0≤ a≤ b,0≤ c≤ d implies ac≤ bd.

A key property of R, that distinguishes it from Q, is the completeness property: everyCauchy sequence of real numbers converges to a real number (there are various other equivalentdefinitions of completeness). We will come back to this in more detail later.

5

1.4 RelationsAnother vital concept in pure mathematics is that of relations. Formally, given a set X , arelation on X is a set R ⊆X×X . At first this seems like a rather pathological object to hold suchimportance, but recognising that (x,y)∈R should be interpreted as “x is related to y”, highlightsthe significance: a collection of ordered pairs is the mathematically rigourous formulation ofobjects being related to each other. Sometimes we write xR y to mean (x,y) ∈ R , and then ifthe relation is defined by a familiar mathematical symbol, for example, the less than or equal torelation “≤”, then we might use this symbol instead of R . For example, the less than or equalto relation on Z can be represented as ≤ = {(2,3),(−4,0),(10,10), . . .} for example. Noticethat 2≤ 3 and so the pair (2,3) ∈≤ but 3 > 2 and so (3,2) /∈≤.

Certain types of relations take on a greater significance and so have special names. Let Rbe a relation on a set X . Then

1. If xR x for all x ∈ X , then R is called reflexive.

2. If xR y⇒ yR x for all x,y ∈ X , then R is called symmetric.

3. If (xR y and yR z)⇒ xR z for all x,y,z ∈ X , then R is called transitive.

Finally, if a relation is reflexive, symmetric and transitive, then it is called an equivalencerelation. If R is an equivalence relation on X , then the equivalence class of x ∈ X is defined by

[x] = {y ∈ X : xR y},

that is, the collection of all points which are thought of as ‘equivalent to x’. Equivalence classesform a partition of X , in that X = ∪x∈X [x] and [x]∩ [y] 6= /0⇒ [x] = [y].

6

2 Countable and uncountable sets

2.1 Size and similarity of setsWhen do two sets have the same size? When is one set larger than another?

For finite sets, this is easy: two sets X and Y have the same size if they contain the samenumber of elements. The set X is larger than the set Y if X contains a greater number ofelements. But, if you think about it, what is the ‘number’? We determine the number of elementsof a finite set by establishing a one-one correspondence between the set and another set forwhich we ‘know’ its number of elements (e.g. the ‘set’ of fingers on our hand). For example{1,2,3} and {3,4,5} have the same size because there is a one-one correspondence betweenthem: 1↔ 3, 2↔ 4, 3↔ 5. The set {1,2,3} is smaller than the set {5,6,7,8} because no suchcorrespondence can be established, but there is an injection 1 7→ 5, 2 7→ 6, 3 7→ 7.

What about infinite sets? For example, which set is larger: N = {1,2,3, . . .} or N \ {1} ={2,3,4, . . .}? One may be tempted to say that N is larger as it contains all the elements ofN\{1} plus an extra element. But there is nevertheless a bijection or correspondence betweenthe two sets of numbers:

N 1 2 3 4 5 6 . . .l l l l l l

N\{1} 2 3 4 5 6 7 . . .

– removing an element from an infinite set does not change its size from this point of view! Buthow about N and 2N= {2,4,6, . . .}? This time we have removed ‘lots’ of elements: 1, 3, 5,. . . .Nonetheless, they still ‘look the same’. More precisely, there is a bijection between them:

N 1 2 3 4 5 6 . . .l l l l l l

2N 2 4 6 8 10 12 . . .

This leads us to the definition of ‘similarity’ of sets.

7

Definition 2.1. We say that a set A is similar to a set B if there is a bijection f : A→ B, in whichcase we write A' B.

Example 2.2. {1,2,3} ' {4,5,6} since x 7→ x+3 is a bijection between the sets.{1,2,3, . . .} ' {4,5,6, . . .} since x 7→ x+3 is a bijection between the sets.

Proposition 2.3. Similarity ' is an equivalence relation; that is for all sets A,B and C:

Reflexive: A' A (since the identity mapping iA : A→ A, iA(x) = x is a bijection)

Symmetric: A' B =⇒ B' A (since if f : A→ B is a bijection then so is f−1 : B→ A).

Transitive: A ' B and B 'C =⇒ A 'C (since if f : A→ B and g : B→C are bijectionsthen so is their composition g◦ f : A→C)

Because of the symmetry condition, we can simply say that two sets A and B are similar ifthere exists a bijection between A and B.

The ‘size’ of finite sets does not pose a problem.

Definition 2.4. We say that a non-empty set A is finite if it is similar to {1,2, . . . ,n} for somen ∈N, in which case we say that A has cardinality n ∈N and we write |A|= n. A non-empty setthat is not finite is infinite.

Proposition 2.5. For A,B finite sets, A' B if and only if A and B have the same cardinality.

Proof. If A and B both have cardinality n then A ' {1,2, . . . ,n} and {1,2, . . . ,n} ' B so it bytransitivity A' B.

If A' B and |A|= n then {1,2, . . . ,n}' A so by transitivity {1,2, . . . ,n}' B so |B|= n.

The situation for infinite sets gets rather more complicated as some examples indicate.

Example 2.6.

1. N ' N\{1} ≡ {2,3,4, . . .} since the mapping f : N→ N\{1} given by f (x) = x+1is a bijection.

2. N' 5N≡ {5,10,15, . . .} since f : N→ 5N given by f (n) = 5n is a bijection; thus wehave the following 1-1 correspondence:

N 1 2 3 4 5 6 . . .l l l l l l

5N 5 10 15 20 25 30 . . .

3. N' Z since f : N→ Z given by

f (n) =

{n/2 if n is even,−((n−1)/2) if n is odd.

is a bijection. Thus we have the pairing:

N 1 2 3 4 5 6 . . .l l l l l l

Z 0 1 −1 2 −2 3 . . .

8

4. R' R+ since exp : R→ R+ is a bijection.

5. (−π/2,π/2)' R since tan : (−π/2,π/2)→ R is a bijection.

Whilst these examples might seem reasonable, we will shortly encounter rather more sur-prising pairs of similar and non-similar sets. In particular, we will show that N'Q but N 6' R.Thus the ‘size’ of R is ‘strictly larger’ than the ‘size’ of N and Q, in other words the infinite setsQ and R are of different sizes.

2.2 Countable sets

Definition 2.7. An infinite set A is countable if N'A, that is if there exists a bijection f :N→A.We can think of such a bijection as a list or enumeration of the elements of A:

N 1 2 3 4 5 6 . . .l l l l l l

A a1 a2 a3 a4 a5 a6 . . .

where each element of A appears exactly once as an ai.

Thus to show that an infinite set A is countable it is enough to show that we can list, count,or enumerate its elements as a sequence (a1,a2,a3,a4, . . .) so that each element of A occurssomewhere in the sequence. (Bear in mind that such a list is just a way of specifying a bijectionN→ A).

Alternatively, since ' is an equivalence relation, if we can show that A' B for a set B thatis known to be countable, then A must be countable.

[Note that some books also consider finite sets to be countable.]

Example 2.8.

1. Z is countable since we may enumerate Z as 0,1,−1,2,−2,3,−3,4, . . ..

2. Q∩ (0,1), i.e. the set of rational numbers between 0 and 1, is countable since it maybe enumerated in the logical sequence:

12,

13,

23,

14,��2

4,

34,

15,

25,35,

45,

16,��2

6, . . .

where we delete any fraction that has already appeared in another form. (We will see laterthat Q itself is countable.)

3. The set N×N of all pairs of natural numbers is countable. To see this write N×N inan array as below (as coordinates of points in the plane with both coordinates naturalnumbers) and enumerate in the manner indicated by superscripts:

... ↖ ......

...10(4,1) 14(4,2) 19(4,3) 25(4,4) . . .

↖ ↖ ↖6(3,1) 9(3,2) 13(3,3) 18(3,4) . . .

↖ ↖ ↖3(2,1) 5(2,2) 8(2,3) 12(2,4) . . .

↖ ↖ ↖ ↖1(1,1) 2(1,2) 4(1,3) 7(1,4) . . .

9

Thus we may enumerate N×N as

(1,1),(1,2),(2,1),(1,3),(2,2),(3,1),(1,4),(2,3),(3,2),(4,1),(1,5), . . . .

More formally, the mapping f : N×N→ N

f (a,b) =(a+b−2

∑k=1

k)+a = 1

2(a+b−2)(a+b−1)+a

is a bijection that gives the position of (a,b) in the list.

4. By writing pairs in an array in the same way as (3) we may show that if A and B arecountable then the product A×B = {(a,b) : a ∈ A,b ∈ B} is countable. It follows thatZ, Z×Z, Z×Z×Z, . . . are all countable.

Sometimes it is easier to set up an injection or surjection than to find a bijection between Nand a given set.

Proposition 2.9. (i) Every subset of a countable set is countable or finite.(ii) If f : A→ B is a surjection and A is countable then B is countable or finite.(iii) If f : A→ B is an injection and B is countable then A is countable or finite.

Proof. (i) If A is countable we may list its elements as (a1,a2,a3, . . .). Any subset may beenumerated by deleting elements not in the subset, i.e. as (ai1,ai2,ai3, . . .) where 1≤ i1 < i2 <i3 < · · · and this will either terminate or give an enumeration of a countable set.

(ii) List A as (a1,a2,a3, . . .). Then define A′⊆A by A′= {ai ∈A such that f (ai) 6= f (a j) for all1≤ j < i}. Then A′ ' B and by (i) A′ is countable or finite, so B is countable or finite.

(iii) We have A' f (A)⊆ B, so using (i) f (A) and thus A is countable or finite.

Part (iii) enables us to show certain sets are countable by ‘coding’ them by integers andusing the uniqueness of factorisation of integers.

Example 2.10. The map f : N×N→N given by f (a,b) = 2a3b is an injection, since if 2a3b =2a′3b′ then (a,b) = (a′,b′) by unique factorisation. By Proposition 2.9 (iii) N×N is countable.(This is an alternative to the direct proof above).

The following property, often stated as ‘A countable union of countable sets is countable’,is useful in showing certain sets are countable.

Theorem 2.11 (Cantor’s Theorem). Let A1,A2, . . . be a countable family of countable sets. Then⋃∞i=1 Ai is countable.

Proof. For each i = 1,2,3 . . . let Ai = {ai1,ai2,ai3, . . .}. We can enumerate⋃

∞i=1 Ai as

(a11,a12,a21,a13,a22,a31,a14, . . .) and then delete any duplicates. Alternatively, define an injec-tion f :

⋃∞i=1 Ai→N by f (ai j) = 2i3 j so that countability follows from Proposition 2.9 (iii).

Example 2.12. (i) The set Q of all rationals is countable.(ii) The set of all algebraic numbers (i.e. solutions of polynomial equations with integer

coefficients) is countable.

Proof. (i) For n = 1,2,3, . . . let An = {r/n : r ∈ Z}. Then An is countable, so by Theorem 2.11Q=

⋃∞n=1 An is countable.

(ii) For n = 1,2,3, . . . let An be the set of all zeros of polynomials with integer coefficientsof degree ≤ n. There are countably many such polynomials (since the list of coefficients are inbijective correspondence with the (n+1)-fold product Z×·· ·×Z), and each such polynomialhas at most n distinct roots, so each An is countable. But the set of algebraic numbers is

⋃∞n=1 An

which is countable by Theorem 2.11.

10

2.3 Some uncountable setsAn infinite set that is not countable is called uncountable. We start with Cantor’s classical‘diagonal argument’ that demonstrates that the real numbers are uncountable.

Proposition 2.13. R∩ (0,1) is uncountable and so R is uncountable.

Proof. Suppose, for a contradiction, that R∩ (0,1) is countable, so that we may enumerate itselements as a list (a1,a2,a3,a4, . . .) which must contain every number in (0,1). We may expressthese numbers in decimal form

a1 = 0 .a11 a12 a13 a14 . . .

a2 = 0 .a21 a22 a23 a24 . . .

a3 = 0 .a31 a32 a33 a34 . . .

a4 = 0 .a41 a42 a43 a44 . . .

......

so that ai j is the jth decimal digit of ai. (If ai is a number with two decimal expansions, takethe one ending in a string of 0s rather than that ending in a string of 9s.)

Define

b = 0.b1 b2 b3 b4 . . . where bi =

{5 if aii 6= 57 if aii = 5 .

Then b 6= ai for all i, since bi 6= aii, that is b differs from ai in the ith decimal place. Thus b is notin the list, which contradicts the assumption that the list contains all real numbers in (0,1).

Recall that P (A) denotes the power set of A, that is the set of all subsets of A. The followingresult, which is really just a variant of the previous one, intuitively says that the power set of aset A has cardinality strictly larger than that of A itself.

Theorem 2.14. Let A be a non-empty set. Then P (A) 6' A.

Proof. Suppose, for a contradiction, that there exists a bijection f : A→ P (A). Let B = {x ∈A such that x /∈ f (x)}. Since f is a bijection, B = f (a) for some a ∈ A.

From the definition of B, a ∈ B iff a /∈ f (a) = B, a contradiction. Thus there that there canbe no bijection from A to B.

It follows, at least intuitively, that, given any infinite set there is a strictly larger one, so thereare infinitely many ‘different sizes’ of infinity.

Example 2.15. (i) The set of all subsets of N is uncountable.(ii) However, the set of all finite subsets of N is countable.

Proof. (i) This follows from Theorem 2.14.(ii) For each n let An be the set of all subsets of {1,2, . . . ,n}. Then An is finite; indeed

|An|= 2n. The set of all finite subsets of N is just⋃

∞n=1 An so is countable by Theorem 2.11.

11

2.4 Cardinality of infinite sets - a very brief discussionThis section is non-examinable.

We return to thinking of cardinality as representing the size of sets – can the idea of |A|denoting the number of elements in a finite set A be extended in a meaningful manner to infi-nite sets? We recall and extend the Definition 2.1 to allow comparison as well as equality ofcardinalities – we start to think of |A| as the size of A even if A is infinite.

Definition 2.16. As before, two sets A and B are similar if there is a bijection f : A→ B, inwhich case we now write |A|= |B| and say that they have the same cardinality.

We say that the set A has cardinality less than or equal to B if there is an injection f : A→ B,and we write this as |A| ≤ |B|. We say that the set A has cardinality strictly less than B if |A| ≤ |B|and |A| 6= |B| in which case we write |A|< |B|

Example 2.17. |Z|< |R|, since f : Z→ R given by f (n) = n is an injection and Z 6' R.

Theorem 2.18. Let A, B, C be sets. Then(i) |A|= |A|,(ii) If |A|= |B| then |B|= |A|,(iii) If |A|= |B| and |B|= |C| then |A|= |C|,(iv) |A| ≤ |A|,(v) If |A| ≤ |B| and |B| ≤ |C| then |A| ≤ |C|.

Proof. Parts (i)-(iii) are just a restatement of Proposition 2.1 concerning properties of bijections.Part (iv) follows since the identity map is an injection, and (v) follows since if f : A→ B

and g : B→C are injections then g◦ f : A→C is an injection.

Despite the notation ‘=’ and ‘≤’ with its intuitive connotations, one thing is missing fromthe above list of properties, namely that if |A| ≤ |B| and |B| ≤ |A| then |A| = |B|. Without this‘antisymmetry’ property one might have both |A|< |B| and |B|< |A| and cardinality would bea rather limited notion. Of course, this can be proved, but it is a serious theorem known as theSchroeder–Bernstein Theorem.

Theorem 2.19 (Schroeder–Bernstein). Given two sets A and B, if there exist injections f : A→Band g : B→ A then there exists a bijection h : A→ B. More succinctly:

|A| ≤ |B| and |B| ≤ |A| =⇒ |A|= |B|.

Proof. Included for information only.If a ∈ A is such that f (a) = b, call a the parent of b. Similarly, b ∈ B is the parent of c ∈ A

if g(b) = c. (Notice that the injectivity of f and g means that an element can have at most oneparent.)

Let z ∈ A∪B. An ancestral chain for z is a sequence z0,z1, . . . such that z0 = z and zi+1 isthe parent of zi for each i. (An ancestral chain may be of finite or infinite length.) If there is noinfinite ancestral chain for z, then the depth of z is the index of the last element in the uniquelongest ancestral chain for z; otherwise z has infinite depth. (Observe that z may have depth 0.)

Let Ae, Be be the subsets of A, B consisting of even-depth elements; Ao, Bo be their sub-sets consisting of odd-depth elements; and A∞, B∞ be their subsets consisting of infinite-depthelements.

12

Notice that f maps Ae to Bo, Ao to Be, and A∞ to B∞. Similarly, g maps Be to Ao, Bo to Ae,and B∞ to A∞.

Observe that elements of Ao ∪Bo ∪A∞ ∪B∞ always have parents; this may not be true forelements of Ae∪Be, since an element of this set may have depth 0.

Define h : A→ B by

h(a) =

{f (a) if a ∈ Ae∪A∞,g−1(a) if a ∈ Ao.

This mapping is defined everywhere since g−1(a) exists for all a ∈ Ao and is unique by theinjectivity of g.

Suppose a1,a2 ∈ A are such that h(a1) = h(a2). If h(a1) = h(a2) lies in Be, then a1,a2 ∈ Ao.So g−1(a1) = h(a1) = h(a2) = g−1(a2). So a1 = g(g−1(a1)) = g(g−1(a2)) = a2. If h(a1) =h(a2) lies in Bo ∪ B∞, then a1,a2 ∈ Ae ∪ A∞. So f (a1) = h(a1) = h(a2) = f (a2). But f isinjective, so a1 = a2. Thus h is injective.

Choose b ∈ B. If b ∈ Be, let a = g(b). Then h(a) = g−1(g(b)) = b. If b ∈ Bo∪B∞, then bhas a parent a ∈ A with f (a) = b. In fact, a must lie in Ae∪A∞, so h(a) = f (a) = b. Thus h issurjective.

Therefore h is a bijection from A to B and thus |A|= |B|.

This fact that ≤ satisfies all the properties of a total order allows us to extend some of ourearlier notions to infinite cardinalities.

Corollary 2.20. (i) Let A be a non-empty set. Then |A|< |P (A)|.(ii) There are infinitely many different infinite cardinalities.(iii) There is no largest cardinality.(iv) There is no set containing all sets.

Proof. (i) We showed in Theorem 2.14 that |A| 6= |P (A)|, but clearly |A| ≤ |P (A)| since a 7→ {a}is an injection. Then |A|< |P (A)|.

(ii) We may define a sequence by A1 =N and An = P (An−1) for n≥ 2. By (i) |A1|< |A2|<|A3|< · · · .

(iii) This follows from (i).(iv) If there was such a set it would have to have cardinality strictly greater than its own, by

(i).

So far we have not given a meaning to |A| outside the context of ‘=’ and ‘≤’. Motivated byfinite sets, we can think of |A| as an ‘infinite number’ or cardinal representing the cardinality ofA, and so of any set B such that A' B.

Example 2.21. We write ℵ0 (‘aleph nought’) for the cardinality of N so a set is countable if|A|= ℵ0. Thus |N|= |Q|= ℵ0.

We write c for the cardinality of R or ‘cardinality of the continuum’ . Thus |R| = |C| = cand ℵ0 < c.

It is possible to define an arithmetic on cardinals:

Definition 2.22. Let A and B be disjoint infinite sets. The sum and product of cardinals |A| and|B| is defined by

|A|+ |B|= |A∪B|, |A| · |B|= |A×B|.

13

It may be checked that these operations are well-defined, i.e. independent of which sets ofgiven cardinality are chosen. The basic arithmetic for infinite cardinals is very simple:

Theorem 2.23. For any two infinite sets A and B

|A|+ |B|= |A| · |B|= max(|A|, |B|).

Typically, if you are given specific A and B this is reasonably easy to prove. However,proving the statement in general is quite difficult, and is beyond our scope here.

Example 2.24. |C|= |R×R|= c× c= c.

The Continuum HypothesisDoes there exist a cardinal a such that ℵ0 < a< c, or to put it another way, is there a set X suchthat |N|< |X |< |R|? That is, is there an uncountable set that is ‘smaller’ than the reals?

The conjecture, originally made by Cantor, that no such set exists is called the ContinuumHypothesis.

Godel showed that one cannot disprove the Continuum Hypothesis using standard mathe-matical logic, and later Cohen showed that one cannot prove the Continuum Hypothesis usingstandard mathematical logic.

Therefore the Continuum Hypothesis is independent of the usual axioms of mathematics andone can choose whether to assume that the Continuum Hypothesis is true or that its negation istrue.

14

3 Review of convergence, continuity and uniform continuity‘Mathematical analysis’ usually refers to the rigorous treatment of those areas of mathematicsthat involve some sort of limit, including continuity, differentiation, integration, etc., and theremainder of this course concerns central topics in analysis. We need to be careful to workfrom precise definitions and develop properties and theorems from those definitions, otherwiseit is easy to draw false conclusions. Sketch diagrams, such as graphs, are often very helpfulin getting an idea of what is happening, but this intuition then has to be converted into preciseargument.

This chapter essentially reviews material that you will have met in MT2502 that we willneed in this course. For detailed proofs and many examples, see the MT2502 notes or any bookon basic analysis. There is one ‘new’ topic in this chapter, namely uniform continuity.

3.1 Bounds of sets of real numbers

Many of the difficulties that are encountered with real numbers stem from the the fact that aset of real numbers need not contain a maximum or minimum element. To get around this, weintroduce infima and suprema.

Definition 3.1. (Bounds, suprema and infima) Let X be a non-empty set of real numbers. Thens is an upper bound for X if x≤ s for all x ∈ X. Similarly s is an lower bound for X if x≥ s forall x ∈ X. A set that has an upper (resp. lower) bound is termed bounded above (resp. below)A set of numbers that has both a lower and upper bound is called bounded. (Thus A is boundedif there exists a number M such that |x| ≤M for all x ∈ X.)

For X a non-empty set of real numbers, a number s is called the supremum or least upperbound of X, written supX, if

(a) x≤ s for all x ∈ X (i.e. s is an upper bound) and(b) for all t < s, there exists x ∈ X with x > t (i.e. no number less than s is an upper bound).

Similarly a number s is called the infimum or greatest lower bound of X, written infX, if(a) x≥ s for all x ∈ X (i.e. s is a lower bound) and(b) for all t > s, there exists x ∈ X with x < t (i.e. no number greater than s is a lower

bound).

We think of supX and infX as the ‘generalised maximum and minimum’ of the set X , but itis important to realise that supX and infX need not be members of X .

Examples

1. The closed interval [2,7] has a maximum element of 7, which is the supremum, and aminimum element of 2, which is the infimum.

2. However, the open interval (2,7) does not contain a maximum element. Nevertheless, 7is the supremum of the set, since it is an upper bound and no number less than 7 is anupper bound. Similarly, 2 is the infimum of the set. Thus sup(2,7) = 7 and inf(2,7) = 2.

3. The set X = {0, 12 ,

23 ,

34 , . . .}= {(n−1)/n : n ∈ N} has supX = 1, infX = minX = 0, and

X contains its infimum, but not its supremum.

15

Axiom of completeness

The axiom of completeness states: every non-empty set of real numbers X that is boundedabove has a supremum (i.e. a least upper bound).

The axiom of completeness is what distinguishes the real numbers from the rationals. It isequivalent to asserting the existence of real numbers since we can essentially regard any realnumber as the supremum of a set of rationals.

For example, let

X = {x ∈Q+ : x2 < 2} ⊇ {1,1.4,1.41,1.414.,1.4142, . . .}.

Clearly X is bounded above (by 10, say) so by the axiom has a supremum. It is easy to checkthat (supX)2 = 2, so the axiom guarantees the existence of

√2 as a real number, but supX /∈Q

as there is no rational which when squared gives 2.

3.2 Convergence of sequencesFormally, a sequence (of real numbers) is a function f : N → R, but we invariably writexn = f (n) and think of a sequence as an infinite list of numbers (x1,x2,x3, . . .) which we mayabbreviate as (xn)

∞n=1 or (xn)n or just (xn) if the context is clear.

Usually the most important thing about a sequence is how the terms xn behave when n islarge, that is the limiting behaviour as n→ ∞ and in particular whether or not the sequenceconverges.

Definition 3.2. (Convergence) The sequence xn converges to x or has limit x, written xn→ xor limxn = x, if for all ε > 0 there exists N ∈ N such that |xn− x| < ε for all n ≥ N. Such asequence is called convergent; a sequence that is not convergent is divergent.

To show that a sequence is convergent, we must show that for every number ε > 0 we canfind an N as stated. Note that we do not have to find the least such N.

Examples 1. The sequence(2n+1

3n+1

)converges to 2

3 .

Proof. Let ε > 0 be given. If n > 1/(9ε) then∣∣∣∣2n+13n+1

− 23

∣∣∣∣= 19n+3

<1

9n< ε,

so 2n+13n+1 →

23 . (We have taken ‘N’ = d1/9εe.)

2. The sequence((−1)n) is divergent.

Proof. Suppose (−1)n→ x. Take ε = 12 . If x≥ 0 then for all odd n, |(−1)n−x|= |−1−x|> 1

2 ,and if x≤ 0 then for all even n, |(−1)n−x|= |1−x|> 1

2 . In either case there is no N such that|(−1)n− x|< 1

2 for all n≥ N, so the sequence is divergent.

3. (Standard sequences: fixed powers of integers) For all α > 0, 1/nα→ 0.

4. (Standard sequences: powers of a fixed number) For c ∈ R

cn→ 0 if |c|< 1cn→ 1 if c = 1cn is divergent if |c|> 1 or c =−1

16

The following standard results on convergence of sequences are recorded here for reference.Full details of proofs, etc, may be found in the MT2503 notes or in books on analysis.

Lemma 3.3. (Boundedness) Every convergent sequence is bounded.

Proposition 3.4. (Arithmetic properties) If (xn),(yn) are sequences with xn → x, yn → y andλ ∈ R, then:

(a) xn + yn→ x+ y(b) λxn→ λx(c) xn− yn→ x− y(d) xnyn→ xy(e) xn/yn→ x/y provided yn,y 6= 0.

Sample proof. (d) By Lemma 3.3 (yn) is bounded, say |yn| ≤ M for all n. Using the triangleinequality,

|xnyn− xy|= |(xn− x)yn + x(yn− y)| ≤ |xn− x||yn|+ |x||yn− y|≤ |xn− x|M+(|x|+1)|yn− y|. (∗)

Given ε > 0, chose N1 such that |xn−x|< ε

2M if n≥N1 and chose N2 such that |yn−y|< ε

2(|x|+1)if n≥ N2. By (∗), if n≥max{N1,N2} then |xnyn− xy|< ε

2 +ε

2 = ε. Thus xnyn→ xy.

Lemma 3.5. (Subsequences) Let xn→ x. If yn is a subsequence of xn (i.e. a sequence obtainedby deleting terms) then yn→ x.

The following key properties of sequences are in fact equivalent to the axiom of complete-ness. They also guarantee that certain sequences converge without the need to find the limit.

Proposition 3.6. (Increasing sequences) Every increasing sequence bounded above is conver-gent. Similarly, every decreasing sequence bounded below is convergent.

Proof. Let (xn) be an increasing sequence bounded above by some M, so that xn ≤ xn+1 andxn ≤M for all n. By the axiom of completeness the set of terms {xn : n ∈N} has a supremum x.

Let ε > 0. By definition of supremum, there is some N such that xN > x−ε. So for all n≥N

x− ε < xN ≤ xn ≤ x < x+ ε.

Thus xn→ x, i.e. the sequence is convergent. 2

Theorem 3.7. (Convergent subsequences) Every bounded sequence has a convergent subse-quence.

Definition 3.8. (Cauchy sequences) (xn) is a Cauchy sequence if given ε > 0 there exists Nsuch that |xn− xm|< ε for all n,m≥ N.

Theorem 3.9. (General principle of convergence.) A sequence of real numbers is convergent ifand only it is a Cauchy sequence.

Again, this can guarantee convergence without the need to find the limit. For example, thesequence given by

x1 = 1, xn+1 = xn +sinn2n

is easily seen to be Cauchy and so is convergent.

17

3.3 Continuous functionsThroughout this section we will consider functions f : A→ R where A⊆ R. Usually A will beeither an interval [a,b] or R and these are the cases you should keep in mind.

Definition 3.10. (Continuous function) A function f : A→ R is continuous at c ∈ A if for allε > 0 there exists δ > 0 such that if x ∈ A and |x− c|< δ then | f (x)− f (c)|< ε.

A function f is continuous on the set A if f is continuous at all x ∈ A.

To remember and understand this definition, think of the picture!Example f : [0,1]→ R given by f (x) = x2 is continuous.

Proof. Let c ∈ [0,1]. Let ε > 0 be given. Suppose x ∈ [0,1] satisfies |x−c|< 12ε (thus 1

2ε is our‘δ’). Then

| f (x)− f (c)|= |x2− c2|= |x+ c||x− c| ≤ 2|x− c|< 2× 12ε = ε.

Thus f is continuous at c for every c ∈ [0,1], so is continuous on [0,1]. 2

The following characterisation of continuity in terms of sequences can be very useful.

Proposition 3.11. (Sequence characterisation of continuity) The function f : A→ R is contin-uous at c ∈ A if and only if f (xn)→ f (c) for all sequences (xn) with xn ∈ A and xn→ c.

Example f : [0,1]→ R given by f (x) = x2 is continuous.

Proof. Let c ∈ [0,1]. Suppose xn → c. Then by Proposition 3.4(d) f (xn) = x2n → c2 = f (c).

Thus f is continuous at c by the sequence definition, so f is continuous on [0,1]. 2

The standard properties of continuity may be proved from the definitions, in particular com-bining continuous functions in natural ways yields continuous functions.

Proposition 3.12. (Arithmetic properties) Let f ,g : A→R be continuous at c ∈ A (resp. on A),and λ ∈ R. Then the following are continuous at c (resp. on A):

(a) f +g (b) λ f(c) f −g (d) f g(e) f/g (provided g(x) 6= 0 for x ∈ A) (f) | f |(g) min{ f ,g} (h) max{ f ,g}.

Sample proof. (c) (using an ε-δ argument) Let f ,g be continuous at c. So given ε > 0 there existδ1 > 0 such that | f (x)− f (c)| < 1

2ε if |x− c| < δ1, and δ2 > 0 such that |g(x)− g(c)| < 12ε if

|x− c|< δ2. Then if |x− c|< min{δ1,δ2},

|( f (x)−g(x))− ( f (c)−g(c))|= |( f (x)− f (c))− (g(x)−g(c))|

≤ | f (x)− f (c)|+ |g(x)−g(c)|< 12ε+ 1

2ε = ε.

Thus f −g is continuous at c.

Sample proof. (e) (using a sequence argument) Let xn → c. Since f ,g are continuous at c,f (xn)→ f (c) and g(xn)→ g(c) 6= 0. Applying Proposition 3.4(e), f (xn)/g(xn)→ f (c)/g(c),so f/g is continuous at c.

Sample proof. (g) Note that min{ f ,g}= 12

(( f +g)−| f −g|

)and use parts (a), (c), (f). 2

18

Corollary 3.13. (Polynomials) Every polynomial p is continuous on R. (A polynomial is afunction of the form p(x) = anxn + · · ·+a1x+a0 where ai ∈ R.)

Proof. The constant functions f (x) = a and the identity g(x) = x are clearly continuous on R.Every polynomial may be formed from these two functions using combinations of +,−,×, so,using Proposition 3.12 repeatedly, is continuous on R. 2

Proposition 3.14. (Composition) Let g :R→R be continuous at x and f :R→R be continuousat g(x). Then the composition f ◦g : R→ R is continuous at x (where ( f ◦g)(x) = f (g(x)) ).

Proof. Exercise - use sequences to get a very short proof! 2

There are two very important theorems concerning continuous functions on closed intervals:the Extreme Value Theorem and the Intermediate Value Theorem. We state them here; thereare various proofs which can be found in books or in the MT2503 notes, but all of them dependcrucially on the completeness of the real numbers (or some equivalent property) along with thedefinition of continuity.

Theorem 3.15. (Extreme value theorem) Let f : [a,b]→ R be continuous. Then f is boundedand attains its bounds, i.e. there exists M such that | f (x)| ≤M for all x ∈ [a,b], and there existc1,c2 ∈ [a,b] such that f (c1) = inf{ f (x) : x ∈ [a,b]} and f (c2) = sup{ f (x) : x ∈ [a,b]}.

Theorem 3.16. (Intermediate value theorem) Let f ,g : [a,b]→ R be continuous. Let k ∈ Rsatisfy f (a)< k < f (b) (or f (b)< k < f (a)). Then there exists c ∈ (a,b) with f (c) = k.

Finally, another important theorem in real analysis, which we may call upon from time totime.

Theorem 3.17. (Mean value theorem) Let f : [a,b]→ R be a continuous function and supposef is differentiable on (a,b). Then there exists a point c ∈ (a,b) such that

f ′(c) =f (b)− f (a)

b−a.

3.4 Uniform continuityUniform continuity is a very useful (and natural) notion that allows us to simplify certain ar-guments. There is one main result that we will need, namely that ‘a continuous function on aclosed interval is uniformly continuous’; in many ways this is in the spirit of the extreme valueand intermediate value theorems mentioned above.

Consider the following proofs that certain functions are continuous on R.

Examples 1. Let f : R→ R be f (x) = 3sinx. Let c ∈ R. Given ε > 0, if |x− c|< 13ε then

| f (x)− f (c)|= |3sinx−3sinc|= 3|sinx− sinc| ≤ 3|x− c|< 3× 13ε = ε.

Thus f is continuous at c for all c ∈ R, so is continuous on R.Note here that the same ‘δ’= 1

3ε works for all c ∈ R.

2. Let f : R→ R be f (x) = x2. Let c ∈ R. Given ε > 0, if |x− c|< min{1,ε/(2|c|+1)} then

| f (x)− f (c)|= |x2− c2|= |x+ c||x− c| ≤ (|x|+ |c|)|x− c| ≤((|c|+1)+ |c|

)|x− c|< ε. (∗)

19

Thus f is continuous at c, so is continuous on R.Note here that the larger c is, the smaller this ‘δ’ = min{1,ε/(2|c|+1)} is. For a given ε we

cannot find a δ that works for all c. From (∗), we need |x− c|< δ to imply that | f (x)− f (c)|=|x+ c||x− c| < ε, so |x− c| has to be very small if c is large and x is close to c making |x+ c|large.

Thus the situation in Example 1 is nicer than that in Example 2 in the sense that, in thedefinition of continuity, for any given ε we can find a δ that works for all c ∈R simultaneously.In mathematics we use the word uniform to refer to such situations when a number does notdepend on some underlying parameter (c in this case).

This leads to the following definition.

Definition 3.18. (Uniform continuity) Let A⊆R. A function f : A→R is uniformly continuouson A if for all ε > 0 there exists δ > 0 such that for all c ∈ A and all x ∈ A with |x− c| < δ wehave | f (x)− f (c)|< ε.

This should be compared with Definition 6.10 relating to continuity on a set which may bestated in the following way. Note that changing the order of the conditions significantly changesthe meaning.

Alternative Definition 3.10. (Continuity on a set) Let A ⊆ R. A function f : A→ R is con-tinuous on A if for all c ∈ A and for all ε > 0 there exists δ > 0 such that for all x ∈ A with|x− c|< δ we have | f (x)− f (c)|< ε.

Thus the function in Example 1 is uniformly continuous on R but the function in Example2 is continuous on R but not uniformly continuous.

Of course, every function that is uniformly continuous function on a set A is necessarilycontinuous on A. However, the converse is not in general true, as Example 2 shows. However,the converse is true in the very important case when A = [a,b] is a bounded closed interval.

Theorem 3.19. (Uniform continuity) Let f : [a,b]→ R be a continuous function. Then f isuniformly continuous on [a,b].

Proof. This may be proved in various ways that all depend (directly or indirectly) on the com-pleteness axiom. Here we give a proof using the fact that every bounded sequence has a con-vergent subsequence.

Assume, seeking a contradiction, that f : [a,b]→ R is continuous but not uniformly con-tinuous. Then there is some ε > 0 such that for every δ > 0 we can find c,x ∈ [a,b] such that|x− c|< δ and | f (x)− f (c)| ≥ ε.

Thus, for each n ∈ N, taking δ = 1n , choose cn,xn ∈ [a,b] such that

|xn− cn|< 1n and | f (xn)− f (cn)| ≥ ε.

By Proposition 3.7 (cn) has a convergent subsequence (c′n), with c′n → c, say; let (x′n) be thecorresponding subsequence of (xn). Then

x′n = (x′n− c′n)+ c′n→ 0+ c.

Since f is continuous at c, using the sequence definition of continuity,

0 < ε≤ | f (x′n)− f (c′n)| → | f (c)− f (c)|= 0,

20

which is the contradiction sought. 2

Examples 1. Every polynomial is uniformly continuous on every bounded closed interval, sincepolynomials are continuous.

2. Let f : (0,1)→ R be f (x) = 1/x. Then f is not uniformly continuous on (0,1) (of courseit is continuous on (0,1)). To see this, take ε = 1 and some δ > 0 . Then | f (c+ δ)− f (c)| =∣∣ 1

c+δ− 1

c

∣∣ = δ

c(c+δ) which will be greater than 1 if we take c close enough to 0 to ensure thatδ

c(c+δ) > 1.This example shows that Theorem 3.19 is not in general true for open intervals (a,b).

21

4 Riemann integrationIntegration is often first encountered as a means of finding the area under a curve. The difficulty,of course, is that, whilst we can easily find the area of a rectangle by multiplying its side-lengths,for an irregular region finding or even defining the area may be less obvious. A similar problemarises with volumes, averages and other quantities associated with varying functions. We thinkof an integral as some sort of limit of approximating sums, but this needs to be made precise.

When one first meets integration one learns that to integrate a function one has to find an‘anti-derivative’. It is really rather remarkable that two apparently different concepts are relatedin this way. A precise definition of derivative was given in MT2502 and in this chapter we willdefine the integral carefully using the definition due to Bernhard Riemann in 1854. We will thensee how various properties of integration follow from this definition, including the ‘fundamentaltheorem of calculus’ which relates integration and differentiation.

4.1 Dissections and lower and upper sumsThe underlying idea in defining an integral is to approximate a function by ‘step functions’ forwhich the areas under the graphs are easily calculated. We start with the notion of a dissectionof an interval, which we use to define lower and upper sums that approximate the integral.

Definition 4.1. (Dissection) A dissection or partition D of a closed interval [a,b] is a set ofnumbers

a = a0 < a1 < a2 < · · ·< an−1 < an = b.

The numbers {ai : i = 0,1, . . . ,n} are called the points of dissection of [a,b]. The maximumlength of the intervals formed by the dissection max1≤i≤n{ai− ai−1} is sometimes called themesh or norm of the dissection.

Definition 4.2. (Lower and upper sums) Let f : [a,b]→R be a bounded function. We define thelower and upper sums of f with respect to the dissection D = {a = a0 < a1 < · · ·< an = b} by

S(D) =n

∑i=1

(ai−ai−1) inf{ f (x) : ai−1 ≤ x≤ ai}, (4.1)

S(D) =n

∑i=1

(ai−ai−1)sup{ f (x) : ai−1 ≤ x≤ ai}. (4.2)

We sometimes write S f (D) or S f (D) instead of S(D) or S(D) if we wish to emphasise thefunction, but when just one function is under consideration we omit the subscript.

Note that S(D) and S(D) are the areas under the graphs of lower and upper approximationsto f by piecewise constant functions. Note also that S(D), respectively S(D), are finite numbersif and only if f is bounded below, respectively above.

Clearly S(D)≤ S(D) for every dissection D .The next lemma and its corollary note how the lower and upper sums change when extra

points are added to a dissection.

Lemma 4.3. Let D be a dissection of [a,b] and let D ′ be a dissection obtained from D byinserting a new point of dissection a′. Then

S(D ′)≥ S(D) and S(D ′)≤ S(D).

22

Proof. Let D be the dissection a = a0 < a1 < a2 < · · ·< an−1 < an = b so D ′ is the dissectiona = a0 < · · ·< ak < a′ < ak+1 < · · ·< an = b for some 0≤ k ≤ n−1.

From the definition (4.1) the only difference between the lower sums S(D ′) and S(D) resultsfrom the contributions between ak and ak+1. Thus

S(D ′)−S(D) =[(a′−ak) inf

ak≤x≤a′f (x)+(ak+1−a′) inf

a′≤x≤ak+1f (x)

]− (ak+1−ak) inf

ak≤x≤ak+1f (x)

≥[(a′−ak) inf

ak≤x≤ak+1f (x)+(ak+1−a′) inf

ak≤x≤ak+1f (x)

]− (ak+1−ak) inf

ak≤x≤ak+1f (x)

= 0.

Similarly S(D ′)−S(D)≤ 0. 2

Definition 4.4. (Refinement) We say that a dissection D ′ is a refinement of a dissection D ifevery point in D is a point of D ′.

Corollary 4.5. Let D ′ be a refinement of D . Then

S(D ′)≥ S(D) and S(D ′)≤ S(D).

Proof. This follows from Lemma 4.3 by repeatedly adding single points to the dissection. 2

It is not surprising that upper sums are always greater than or equal to lower sums.

Proposition 4.6. Let D1 and D2 be any dissections of [a,b]. Then

S(D1)≤ S(D2).

Proof. Let D1∪D2 be the dissection consisting of the union of the points of dissection in D1and D2. Then D1∪D2 is a refinement of both D1 and D2, so by Corollary 4.5

S(D1)≤ S(D1∪D2)≤ S(D1∪D2)≤ S(D2). 2

4.2 Definition of the integralFrom Proposition 4.6,

supD

S(D) ≤ infD

S(D). (4.3)

It is easy to find functions for which there is strict inequality in (4.3). However, for very manyfunctions we get equality and we take the common value of supD S(D) and infD S(D) to be theintegral of the function.

Definition 4.7. (Integrable, integral) Let f : [a,b]→R be a bounded function. If supD S f (D) =

infD S f (D) we say that f is Riemann integrable, or that the integral of f exists, and we call thecommon value the integral of f . We denote this by the usual symbol, thus∫ b

af (x)dx = sup

DS f (D) = inf

DS f (D). (4.4)

To avoid a clutter of symbols we may just write∫ b

a f or even∫

f when the interval ofdefinition is clear,

If b < a it is convenient to define∫ b

a f (x)dx =−∫ a

b f (x)dx.It is often easier to work with a sequence of dissections rather than considering the inf or

sup over all dissections.

23

Corollary 4.8. A bounded function f : [a,b]→R is integrable with∫ b

a f = l if and only if thereis a sequence of dissections Dn such that S(Dn)→ l and S(Dn)→ l.

In particular, f is integrable if and only if for all ε > 0 there is a dissection D such thatS(D)−S(D)< ε.

Proof. ‘ =⇒ ’ If f is integrable then for each n ∈ N we may find dissections:D ′n with S(D ′n) ≥ supD S(D)− 1

n = l− 1n

and D ′′n with S(D ′′

n) ≤ infD S(D)+ 1n = l + 1

n .Let Dn = D ′n∪D ′′

n . Then

l− 1n ≤ S(D ′n) ≤ S(Dn) ≤ S(Dn) ≤ S(D ′′

n) ≤ l + 1n ,

so S(Dn), S(Dn)→ l.

‘⇐=’ Since S(Dn) ≤ supD S(D) ≤ infD S(D) ≤ S(Dn),

if S(Dn)→ l and S(Dn)→ l then supD S(D) = infD S(D) = l, so f is integrable. 2

Using the definition of integral we can now integrate basic functions.

Example 4.9. The function f : [0,1]→ R given by f (x) = x2 is integrable with∫ 1

0 x2dx = 13 .

Calculation. For each n ∈ N, let Dn be the dissection of [0,1] into n equal parts,0 < 1

n < 2n < · · ·< n−1

n < 1. Then

S(Dn) =n

∑i=1

1n

(i−1

n

)2

=1n3

n

∑i=1

(i−1)2

=1n3 ×

16(n−1)n(2n−1)

=16

(1− 1

n

)(2− 1

n

)→ 1

3as n→ ∞.

S(Dn) =n

∑i=1

1n

(in

)2

=1n3

n

∑i=1

i2

=1n3 ×

16

n(n+1)(2n+1)

=16

(1+

1n

)(2+

1n

)→ 1

3as n→ ∞.

[Recall the sum of squares formula: ∑ni=1 i2 = 1

6n(n+1)(2n+1).]

We conclude that∫ 1

0 x2dx exists and∫ 1

0 x2dx = 13 . 2

Example 4.10. Define f : [0,1]→ R by f (x) ={

1 (x ∈Q)0 (x /∈Q)

. Then f is not integrable.

Calculation. Let D be any dissection 0 = a0 < a1 < · · ·< an = 1. Since every interval containsboth rational and irrational numbers, S(D) = ∑

ni=1(ai−ai−1)×0 = 0 and

S(D) = ∑ni=1(ai−ai−1)×1 = an−a0 = 1. Thus supD S(D) = 0 and infD S(D) = 1 so f is

not integrable. 2

Of course, many functions are integrable, in particular all continuous functions and allmonotonic functions.

Proposition 4.11. (Continuous functions) Let f : [a,b]→R be continuous. Then f is integrable.

24

Proof. By Theorem 3.19 f is uniformly continuous on [a,b]. Thus, given ε > 0 there existsδ > 0 such that if |x− y| < δ then | f (x)− f (y)| < ε

b−a . Let D be any dissection of [a,b] withintervals all < δ in length. Then:

S(D)−S(D) =n

∑i=1

(ai−ai−1)[

sup{ f (x) : ai−1 ≤ x≤ ai}− inf{ f (x) : ai−1 ≤ x≤ ai}]

≤n

∑i=1

(ai−ai−1)(

ε

b−a

)= (b−a)

ε

b−a= ε.

This is true for all ε > 0 so supD S(D) = infD S(D) and f is integrable. 2

Recall that a function f : [a,b]→ R is increasing if f (x)≤ f (y) for all a≤ x≤ y≤ b and isdecreasing if f (x) ≥ f (y) for all a ≤ x ≤ y ≤ b. We call a function that is either increasing ordecreasing monotonic.

Proposition 4.12. (Monotonic functions) Let f : [a,b]→R be monotonic. Then f is integrable.

Proof. Assume that f is increasing; the proof for decreasing functions is similar. We may alsoassume that f (a)< f (b) (otherwise f is constant and trivially integrable). Let ε > 0 and let Dbe a dissection of [a,b] such that ai−ai−1 ≤ ε

f (b)− f (a) for all i = 1, . . . ,n. Then:

S(D)−S(D) ≤n

∑i=1

(ai−ai−1) f (ai) −n

∑i=1

(ai−ai−1) f (ai−1)

=n

∑i=1

(ai−ai−1)[

f (ai)− f (ai−1)]

≤ ε

f (b)− f (a)

n

∑i=1

[f (ai)− f (ai−1)

]= ε.

This is true for all ε > 0 so supD S(D) = infD S(D) and f is integrable. 2

4.3 Properties of the integralThere are many familiar properties of the integral which need to be derived from the definition.We list these and indicate proofs. As before, we assume that our functions are bounded.

1. If f : [a,b]→ R and a < c < b then∫ b

a f =∫ c

a f +∫ b

c f , provided the integrals exist.

Proof. Let D ′n,D′′n be dissections of [a,c], [c,b] with

S(D ′n)→∫ c

a f and S(D ′n)→∫ c

a f ,S(D ′′

n)→∫ b

c f and S(D ′′n)→

∫ bc f .

Let Dn be the dissection of [a,b] given by the points of D ′n,D′′n together with the point c. Then

S(Dn) = S(D ′n)+S(D ′′n)→

∫ ca f +

∫ bc f ,

S(Dn) = S(D ′n)+S(D ′′n)→

∫ ca f +

∫ bc f .

2. Let f be integrable and λ ∈ R. Then∫ b

a λ f = λ∫ b

a f .Proof. Follows since for every dissection Sλ f (D)= λS f (D) (λ≥ 0) and Sλ f (D)= λS f (D) (λ<0) and similarly for the upper sums.

25

3. Let f ,g : [a,b]→ R be integrable. Then f +g is integrable and∫ b

a ( f +g) =∫ b

a f +∫ b

a g.

Proof. Take dissections D ′n,D′′n with

S f (D ′n),S f (D ′n)→∫ b

a f and Sg(D′′n),Sg(D

′′n)→

∫ ba g.

Let Dn = D ′n∪D ′′n . Then

S f (Dn),S f (Dn)→∫ b

a f and Sg(Dn),Sg(Dn)→∫ b

a g.Note that

sup[( f +g)(x) : ai−1 ≤ x≤ ai

]≤ sup

[f (x) : ai−1 ≤ x≤ ai

]+ sup

[g(x) : ai−1 ≤ x≤ ai

],

with the reverse inequality for ‘inf’, so thatS f (Dn)+Sg(Dn)≤ S f+g(Dn)≤ S f+g(Dn)≤ S f (Dn)+Sg(Dn),

from which S f+g(Dn),S f+g(Dn)→∫ b

a f +∫ b

a g.

Note that (2) and (3) show that integration is linear, i.e.∫ b

a (λ f +µg) = λ∫ b

a f +µ∫ b

a g.

4. Let f ,g : [a,b]→ R be integrable. Then f g is integrable.Proof. First we show that f 2 is integrable. Since f is integrable, it is bounded, so | f (x)| ≤Mfor some M, for all x ∈ [a,b]. Then for all x,y,

| f (x)2− f (y)2|= | f (x)+ f (y)|| f (x)− f (y)| ≤(| f (x)|+ | f (y)|

)| f (x)− f (y)| ≤ 2M| f (x)− f (y)|.

In particular, for any interval I,

supx∈I

{f (x)2}− inf

x∈I

{f (x)2}≤ 2M

[supx∈I

{f (x)

}− inf

x∈I

{f (x)

}].

Thus for every dissection D

S f 2(D)−S f 2(D)≤ 2M[S f (D)−S f (D)

].

Given ε> 0 we can find a dissection such that S f (D)−S f (D)< ε/2M, so S f 2(D)−S f 2(D)< ε,and it follows by Corollary 4.8 that f 2 is integrable.

The conclusion for general products now follows using (2) and (3), sincef g = 1

4

[( f +g)2− ( f −g)2].

5. Let f : [a,b]→ R be integrable. Then | f | is integrable and∣∣∣∫ b

af∣∣∣≤ ∫ b

a| f |.

Proof. Note that for any interval I,

supx∈I

{| f (x)|

}− inf

x∈I

{| f (x)|

}≤ sup

x∈I

{f (x)

}− inf

x∈I

{f (x)

},

so for every dissection D

S| f |(D)−S| f |(D)≤ S f (D)−S f (D).

Thus if f is integrable so is | f | by Corollary 4.8. Also S f (D) ≤ S| f |(D) for all dissectionsD , so taking suprema over dissections gives the inequality.

26

6. Let f : [a,b]→ R be integrable with M1 ≤ f (x)≤M2 for all x ∈ [a,b]. Then

M1(b−a)≤∫ b

af ≤M2(b−a).

In particular, if f ≥ 0 then∫ b

a f ≥ 0.

Proof. This follows since for any dissection D , M1(b−a)≤ S(D)≤ S(D)≤M2(b−a).

7. (Cauchy-Schwartz inequality) Let f ,g : [a,b]→ R be integrable.Then(∫ b

a| f g|

)2≤

∫ b

af 2

∫ b

ag2.

Proof. Note that for all λ ∈ R

0≤∫ b

a(λ| f |+ |g|)2 = λ

2∫ b

af 2 +2λ

∫ b

a| f g|+

∫ b

ag2.

The inequality now follows from the discriminant condition ‘b2 ≤ 4ac’ for the quadratic ex-pression to be non-negative for all λ.

4.4 Integration and differentiationGiven the different ways in which they are defined, it is remarkable that integration and differ-entiation are ‘inverse’ operations. This is made precise in the ‘fundamental theorem of calculus’which is conveniently stated in two parts.

Note that in (4.5) where the upper limit of integration is a variable x, we often just write∫

fto denote this function of x in the usual way.

Theorem 4.13. (Fundamental theorem of calculus – Part 1) Let f : [a,b]→ R be integrable.Define

F(x) =∫ x

af (t)dt (a≤ x≤ b). (4.5)

Then F is continuous on [a,b]. Moreover, F is differentiable at all x at which f is continuous,with

F ′(x) = f (x). (4.6)

Proof. Since every integrable function is bounded, | f (x)| ≤ M for some M > 0. Using theproperties (1), (5), (6) of the integral, if x,x+h ∈ [a,b],∣∣F(x+h)−F(x)

∣∣= ∣∣∣∫ x+h

af −

∫ x

af∣∣∣= ∣∣∣∫ x+h

xf∣∣∣≤ ∣∣∣∫ x+h

x| f |∣∣∣≤Mh.

Thus given ε > 0, if |h|< ε/M then∣∣F(x+h)−F(x)

∣∣< M× (ε/M) = ε, so F is continuous atx and so is continuous on [a,b].

Now let x be a point at which f is continuous. Given ε > 0, there exists δ > 0 such that if|y− x|< δ then f (x)− ε < f (y)< f (x)+ ε. By property (6), if 0 < h < δ,

h( f (x)− ε)≤∫ x+h

xf ,

∫ x

x−hf ≤ h( f (x)+ ε),

27

that is, if 0 < |h|< δ,

f (x)− ε≤ F(x+h)−F(x)h

=1h

∫ x+h

xf ≤ f (x)+ ε.

Thus limh→0

F(x+h)−F(x)h

= f (x), so F is differentiable at x with derivative f . 2

Theorem 4.14. (Fundamental theorem of calculus – Part 2) Let f : [a,b]→ R be differentiablewith derivative f ′ continuous on [a,b]. Then∫ x

af ′ = f (x)− f (a) ≡

[f]x

a. (4.7)

Proof. Let φ(x) =∫ x

a f ′. By Theorem 4.13, since f ′ is continuous, φ is differentiable withφ′(x)= f ′(x) for all x∈ [a,b]. Hence φ′(x)− f ′(x)≡ 0, so by the mean value theorem φ(x)− f (x)is constant on [a,b]. In particular,

φ(x)− f (x) = φ(a)− f (a) = − f (a),

that is ∫ x

af ′ = φ(x) = f (x)− f (a) (x ∈ [a,b]). 2

It follows from Theorem 4.13 that to integrate a function f it is enough to find a functionF(x) with derivative f (x) and add (or subtract) a constant so that F(a) = 0. Of course, weconventionally write

∫f for the indefinite integral F which is a function as opposed to the

definite integral∫ b

a f which is a number. Thus∫ x

1(4x3−6x2 +2x−1)dx = x4−2x3 + x2− x+1,

by checking the derivative of the RHS and getting the constant term by setting x = 1.

The standard rules for integration follow from the Fundamental Theorem of Calculus.

Proposition 4.15. (Integration by substitution) Let g : [a,b]→ R be differentiable with g′ con-tinuous, and let f : g([a,b])→ R be continuous. Then∫ g(b)

g(a)f (x)dx =

∫ b

af(g(y)

)g′(y)dy. (4.8)

Proof. Write F(x) =∫ x

c f for some c, so F ′(x) = f (x) by (4.6). Then∫ b

af(g(y)

)g′(y)dy =

∫ b

aF ′(g(y)

)g′(y)dy

=∫ b

a(F ◦g)′(y)dy (chain rule)

= F(g(b)

)−F

(g(a)

)(by(4.7))

=∫ g(b)

cf −

∫ g(a)

cf (definition of F)

=∫ g(b)

g(a)f . 2

28

Example∫ √π

0sin(y2) y dy =

∫ √π

0

12 sin(y2) 2y dy =

∫π

0

12 sinx dx =

[− 1

2 cosx]π

0 = 1,

where we have taken f (x) = 12 sinx, g(y) = y2, a = 0 and b =

√π in (4.8).

Proposition 4.16. (Integration by parts) Let f ,g : [a,b]→R be differentiable with f ′,g′ contin-uous. Then ∫ b

af g′ =

[f g]b

a −∫ b

af ′g.

Proof. From the product rule for differentiation, ( f g)′ = f g′+ f ′g. By (4.7)

[f g]b

a =∫ b

a( f g)′ =

∫ b

af g′+

∫ b

af ′g. 2

4.5 Improper integralsSo far we have assumed that we are integrating over a bounded interval [a,b] and that theintegrand f is bounded. We can remove these restrictions by defining integrals as the limit ofintegrals of bounded functions over bounded intervals.

Definition 4.17. (Improper integral - unbounded domain) Let f : [a,∞)→ R be a boundedfunction. We define ∫

∞

af (x) dx = lim

X→∞

∫ X

af (x) dx,

provided that the right-hand integrals exist for all X and the limit exists.

Example ∫∞

1

dxx2 = lim

X→∞

∫ X

1

dxx2 = lim

X→∞

[− 1

x

]X

1= lim

X→∞

[1− 1

X

]= 1. 2

Domains of the form (−∞,b] or (−∞,∞) may be treated in a similar way.

Definition 4.18. (Improper integral - unbounded function) Let f : [a,b]→R be bounded belowbut not above. We define ∫ b

af (x) dx = lim

M→∞

∫ b

afM(x) dx,

where fM(x) = min{ f (x),M}, provided that the right-hand integral exists for all M and thelimit exists.

Example To find∫ 1

0

dx√x

let fM(x) ={

M (0≤ x≤ 1/M2)1/√

x (1/M2 ≤ x≤ 1)for M > 1. Then

∫ 1

0fM(x)dx =

∫ 1/M2

0Mdx+

∫ 1

1/M2

dx√x

=1

M2 ×M+[2x1/2

]1

1/M2=

1M

+[2− 2

M

]→ 2. 2

29

5 Sequences and series of functions and power seriesThe ultimate aim of this chapter is to determine when we can take a convergent power seriesand differentiate or integrate it term by term to get a series summing to the derivative or integralof the original series sum. First, we consider sequences of functions which will eventually playthe role of partial sums of series.

5.1 Convergence of sequences of functionsConsider a sequence of functions fn : [a,b]→R (or R→R). What does it mean to say that ( fn)converges to a function f ? There are many possible ways of defining this, the most obviousdefinition just requires convergence of the sequence of numbers fn(x) at every x in the domain.

Definition 5.1. (Pointwise convergence) Let fn, f : [a,b]→R (or R→R) . We say that fn→ fpointwise if for all x ∈ [a,b] (or R), fn(x)→ f (x) (as a sequence of real numbers).

Examples.

1.x2

n+ x2 → 0 pointwise on R, since, given ε > 0,∣∣∣ x2

n+ x2 −0∣∣∣≤ x2

n< ε if n > x2/ε.

However, the following examples show that pointwise convergence does not behave very well.(Drawing graphs may help you see what is happening.)

2. Let fn : R→ R be fn(x) =

0 (x≤ 0)nx (0≤ x≤ 1/n)1 (1/n≤ x)

.

Then fn→ f pointwise where f (x) ={

0 if (x≤ 0)1 if (0 < x) . Note that f is not continuous

even though every fn is continuous.

3. Let fn : R→ R be fn(x) =nx2

1+nx2 .

Then fn(0) = 0→ 0, but fn(x)→ 1 if x 6= 0, since given ε > 0, if n > 1/(εx2),

| fn(x)−1|=∣∣∣ nx2

1+nx2 −1∣∣∣= 1

1+nx2 <1

nx2 < ε.

Again, f is not continuous even though every fn is continuous.

4. Let fn : [0,1]→ R be fn(x) =

n2x (0≤ x≤ 1/n)n2(2/n− x) (1/n≤ x≤ 2/n)0 (2/n≤ x≤ 1)

.

Then fn(x)→ 0 for all x ∈ [0,1]. But for all n

1 =∫ 1

0fn 6→

∫ 1

00 = 0.

5. Let fn : R→ R be fn(x) = 1n sinnx. Then | fn(x)| ≤ 1

n → 0 so fn → 0 pointwise, butf ′n(x) = cosnx 6→ 0 = d

dx0.

30

These examples show that continuity, integration and differentiation do not behave well ontaking pointwise limits of a sequence of functions. To overcome such problems we introduce astronger form of convergence, which requires the same ‘N’ to apply at all points x simultane-ously in the definition of convergence.

Definition 5.2. (Uniform convergence) Let fn : [a,b]→ R (or R→ R). We say that fn → funiformly if given ε > 0 there exists N such that for all n≥ N and all x ∈ [a,b] (or R), we have| fn(x)− f (x)|< ε.

Thus for pointwise convergence, ‘N’ depends on x, whilst for uniform convergence the sameN is required for all x.

Note that if fn converges to f uniformly then fn→ f pointwise, but the converse is not true.In the above examples, the sequence in (5) converges uniformly, but those in (1), (2), (3), (4) donot.

Example. Let fn : [0,5]→ R be fn(x) =4n+3sinx

2n+ x2 . Then fn(x)→ 2 uniformly on [0,5]. For

given ε > 0, ∣∣∣4n+3sinx2n+ x2 −2

∣∣∣= ∣∣∣3sinx−2x2

2n+ x2

∣∣∣≤ |3sinx|+2|x2|2n

≤ 532n

< ε

if n > 53/2ε; this is independent of x ∈ [0,1], so convergence is uniform.

The first advantage of uniform convergence is that it behaves well with respect to continuity.The following result is often stated succinctly as ‘The uniform limit of a sequence of continuousfunctions is continuous’.

Theorem 5.3. (Continuity and uniform convergence) Let f , fn : [a,b]→ R with fn → f uni-formly. If fn is continuous for all n, then f is continuous.

Proof. Let c ∈ [a,b]. Given ε > 0 we may choose N ∈ N such that | fN(x)− f (x)| < 13ε for

all x ∈ [a,b] by uniform continuity. Since fN is continuous at c there exists δ > 0 such that| fN(x)− fN(c)|< 1

3ε if |x− c|< δ. Thus, if |x− c|< δ,

| f (x)− f (c)| =∣∣ f (x)− fN(x)+ fN(x)− fN(c)+ fN(c)− f (c)

∣∣≤

∣∣ f (x)− fN(x)∣∣+ ∣∣ fN(x)− fN(c)

∣∣+ ∣∣ fN(c)− f (c))∣∣ (triangle inequality)

< 13ε+ 1

3ε+ 13ε = ε.

Hence f is continuous at c. 2

Uniform convergence also behaves well with respect to integration.

Theorem 5.4. (Integration and uniform convergence) Let fn : [a,b]→ R be integrable, and

fn→ f uniformly. Then f is integrable, and∫ b

afn→

∫ b

af .

Proof. Given ε > 0, by uniform convergence choose N such that | fn(x)− f (x)|< 13(b−a)ε for all

x ∈ [a,b] and n≥ N. Take a dissection D such that S fN (D)−S fN (D)< 13ε.

Since fN(x)> f (x)− 13(b−a)ε,

31

S fN (D) > S f (D)− (b−a)1

3(b−a)ε = S f (D)− 1

3ε.

SimilarlyS fN (D) < S f (D)+ 1

3ε.

ThusS f (D)−S f (D) ≤ S fN (D)+ 1

3ε−S fN (D)+ 13ε < 1

3ε+ 13ε+ 1

3ε = ε.

We can find such a dissection for all ε > 0, so f is integrable.Moreover, for all n≥ N,

fn(x)−1

3(b−a)ε ≤ f (x) ≤ fn(x)+

13(b−a)

ε,

so integrating ∫ b

afn− 1

3ε ≤∫ b

af ≤

∫ b

afn +

13ε

for all n≥ N. Thus∫ b

a fn→∫ b

a f . 2

Example. From the uniform convergence in the previous example,∫ 5

0

4n+3sinx2n+ x2 dx→

∫ 5

02dx = 10.

We need to be a little careful differentiating sequences of functions and taking the limit -here we need uniform convergence of the derivative rather than just of the function.

Corollary 5.5. (Differentiation and uniform convergence) Let fn : [a,b]→ R be differentiablewith f ′n continuous for all n. Suppose fn→ f pointwise and f ′n converges uniformly on [a,b].Then f : [a,b]→ R is differentiable and f ′(x) = limn→∞ f ′n(x) for all x ∈ [a,b].

Proof. Let f ′n→ g uniformly on [a,b]. Then∫ x

ag = lim

n→∞

∫ x

af ′n (as f ′n→ g uniformly)

= limn→∞

[fn(x)− fn(a)

](FTC-2)

= limn→∞

fn(x)− limn→∞

fn(a)

= f (x)− f (a).

As g is continuous (as the uniform limit of continuous functions) applying FTC-1 gives f ′(x) =g(x) = limn→∞ f ′n(x). 2

5.2 Convergence of series of functionsLike sequences of real numbers, sequences of functions often occur in the form of partial sumsof series, in particular of power series. We can formulate the results of the previous section interms of series of functions.

32

Definition 5.6. (Uniform convergence of series) Let fn : [a,b]→ R (or R→ R) and define thepartial sums SN(x) = ∑

Nn=1 fn(x). If SN tends to a function S uniformly on [a,b] (or R), we say

that the series ∑∞n=1 fn converges uniformly on [a,b] (or R) with sum S.

We can read off properties of integration and differentiation of series from the correspondingresults for sequences of functions in the last section.

Corollary 5.7. Let S(x) = ∑∞n=1 fn(x) be a uniformly convergent series on [a,b].

(1) If fn is continuous for all n then the sum S is continuous,(2) if fn is integrable for all n then the sum S is integrable with

∫ ba S(x)dx=∑

∞n=1

∫ ba fn(x)dx,

(3) if fn is differentiable with f ′n continuous for all n and ∑∞n=1 f ′n(x) is uniformly

convergent on [a,b], then the sum S is differentiable with S′(x) = ∑∞n=1 f ′n(x).

Proof. (1) and (2) follow from the definition of uniform convergence of a series by applyingTheorems 5.3 and 5.4 to the partial sums SN(x) = ∑

Nn=1 fn(x) which are uniformly convergent

to S(x).For (3), we have SN(x)→ S(x) pointwise and S′N(x) = ∑

Nn=1 f ′n(x) continuous and uniformly

convergent on [a,b]. By Corollary 5.5 S(x) is differentiable with S′(x) = limN→∞ S′N(x) =limN→∞ ∑

Nn=1 f ′n(x) = ∑

∞n=1 f ′n(x). 2

The Weierstrass M-test gives a very useful way of checking uniform convergence of series.Note that here and elsewhere, if an ≥ 0 for all n, we follow the usual convention of writing∑

∞n=1 an < ∞ to mean that the series of non-negative terms is convergent.

Proposition 5.8. (Weierstrass M-test) Let fn : [a,b]→ R (or R→ R). Suppose that there is asequence of non-negative numbers (Mn)n∈N such that ∑

∞n=1 Mn < ∞ and | fn(x)| ≤ Mn for all

n ∈ N and x ∈ [a,b] (or R). Then the series∞

∑n=1

fn(x) converges uniformly on [a,b] (or R).

Proof. First note that for each x ∈ [a,b] the series ∑∞n=1 | fn(x)| is convergent by the comparison

test (comparing with ∑∞n=1 Mn), so we may define the sum S(x) = ∑

∞n=1 fn(x) for each x.

Since ∑∞1 Mn is convergent, given ε > 0 we may find N such that ∑

∞N+1 Mn < ε. Then for all

x ∈ [a,b], ∣∣S(x)− N

∑1

fn(x)∣∣= ∣∣ ∞

∑N+1

fn(x)∣∣ ≤ ∞

∑N+1| fn(x)| ≤

∞

∑N+1

Mn < ε.

Hence the partial sums ∑Nn=1 fn(x) converge uniformly to S(x), so the series is uniformly con-

vergent. 2

Example. For 0 ≤ x ≤ 1,∣∣∣ x(n+ x2)2

∣∣∣ ≤ 1n2 and

∞

∑n=1

1n2 < ∞. By the Weierstrass M-test,

∞

∑n=1

x(n+ x2)2 is uniformly convergent on [0,1]. Thus we may integrate term by term to get

∫ 1

0

( ∞

∑n=1

x(n+ x2)2 dx

)=

∞

∑n=1

∫ 1

0

x dx(n+ x2)2 =

∞

∑n=1

[ −12(n+ x2)

]1

0=

∞

∑n=1

12

(1n− 1

n+1

)=

12.

33

Example. Fix a > 1. Then |n−x| ≤ n−a if x≥ a and ∑∞n=1

1na < ∞, so by the Weierstrass M-test,

the Riemann zeta function

ζ(x) =∞

∑n=1

1nx = 1+

12x +

13x +

14x + · · ·

is uniformly convergent on [a,∞). We may differentiate term by term to get

ζ′(x) =

∞

∑n=1

− lognnx =

− log22x +

− log33x +

− log44x + · · · ,

for all x ∈ [a,∞), since ∑∞n=1

1na | − logn| < ∞, and thus for all x > 1 since this is valid for all

a > 1.

5.3 Power seriesThe most important examples of series of functions are power series, that is series of the form

∞

∑n=0

anxn ≡ a0 +a1x+a2x2 +a3x3 + . . . where an ∈ R,

corresponding to fn(x) = anxn in the previous section.

Lemma 5.9. Suppose that the power series∞

∑n=0

anxn converges for x = x0 for some x0 6= 0. Then

for all 0 < r < |x0| the series converges uniformly and absolutely on [−r,r] and is continuouson [−r,r].

Proof. The terms of a convergent series are bounded, so there is some number M such that|anxn

0| ≤M for all n. Then for |x| ≤ r < |x0|

|anxn| ≤ |anrn| = |anxn0|

rn

|x0|n≤M

(r|x0|

)n

.

Since∞

∑n=0

M( r|x0|

)n< ∞ is a convergent geometric series with ratio

r|x0|

< 1, it follows from

the Weierstrass M-test that∞

∑n=0|anxn| is uniformly convergent on [−r,r]. 2

It now follows easily that a series converges uniformly strictly inside its ‘interval of conver-gence’.

Theorem 5.10. (Uniform convergence of power series) For a power series∞

∑n=0

anxn, one of the

following is true:(a) the series is uniformly and absolutely convergent on the interval [−r,r] for every r > 0.(b) the series converges only when x = 0,(c) there is a number R > 0 such that the series is uniformly and absolutely convergent

on every interval [−r,r] with 0 < r < R and diverges if |x|> R.

34

Proof. Let A = {|x| : ∑∞n=0 anxn converges}. If A = {0} we are in case (b). If A is unbounded,

then the conclusion of case (a) follows from Lemma 5.9. Otherwise, A is bounded and, takingR = supA, the conclusion of case (c) follows from Lemma 5.9. 2

Definition 5.11. (Radius of convergence) The number R in (c) is called the radius of conver-gence of the series; for convenience we say that the radius of convergence is 0 if case (b) holdsand is ∞ in case (a).

In case (c) there is an interval of convergence on which the series converges, and this willbe (−R,R),(−R,R], [−R,R) or [−R,R] depending on whether the series converges at the endpoints of the intervals.

Recall that the radius of convergence of a series may often be found using the ratio test.

Proposition 5.12. (Ratio test) Let∞

∑n=1

cn be a series with cn 6= 0 for all n. Suppose limn→∞

∣∣∣∣cn+1

cn

∣∣∣∣= L.

Then (a) if L < 1 the series is absolutely convergent and thus convergent,(b) if L > 1 the series is divergent,(c) if L = 1 no conclusion can be drawn.

Proof. Done in earlier courses. 2

We now show that the radius of convergence of the ‘differentiated’ and ‘integrated’ seriesare the same as that of a given series.

Proposition 5.13. (Radii of convergence of differentiated and integrated series) The power

series∞

∑n=0

anxn, the differentiated series∞

∑n=0

n anxn−1 and the integrated series∞

∑n=0

an

n+1xn+1 all

have the same radii of convergence.

Proof. Let 0 < |x|< |x1|. There is a constant M such thatn|x|

( |x||x1|

)n≤M for all n ∈ N. Then

1|x||anxn| ≤ |n anxn−1| = n

|x|

( |x||x1|

)n|anxn

1| ≤ M|anxn1|.

Let R and R′ be the radii of convergence of the original series and of the differentiated series.If |x| < R′ then ∑

∞n=1 |n anxn−1| converges, so by the comparison test ∑

∞n=0 |anxn| converges,

giving R ≥ R′. Similarly, if |x1| < R then ∑∞n=0 |anxn

1| converges so by the comparison test∑

∞n=1 |n anxn−1| converges for all |x|< |x1| giving R′ ≥ R. Thus R′ = R.

The proof for the integrated series is similar; alternatively set nan = bn−1 (n ≥ 1) in thedifferentiated series. 2

That power series can be differentiated and integrated term by term inside their interval ofconvergence now follows from the results for general series of functions.

35

Corollary 5.14. (Integration and differentiation of power series) Let the power series∞

∑n=0

anxn

have radius of convergence R (possibly ∞) and write S(x) =∞

∑n=0

anxn for the sum. Then for

|x|< R, the sum S is continuous at x, and∫ x

0S =

∞

∑n=0

an

n+1xn+1,

and

S′(x) =∞

∑n=0

n an xn−1.

Proof. The integrated and differentiated series also have radius of convergence R by Proposition5.13. Take r such that |x|< r < R. These series are uniformly convergent on [−r,r] by Theorem5.10. The conclusion is immediate from Corollary 5.7 taking fn(x) = anxn. 2

Example. The geometric series

1− x2 + x4− x6 + · · ·= 11+ x2

has radius of convergence 1 (and interval of convergence (−1,1)). Thus for every 0 < r < 1,the series is uniformly convergent on [−r,r] by Proposition 5.10. By Corollary 5.14, if |x|< 1,

x− x3

3+

x5

5− x7

7+ · · · =

∫ x

0

dx1+ x2 = tan−1 x. (5.9)

In fact this series also sums to tan−1 x when x = 1 which is Gregory’s series for π = 4tan−1 1.It may be shown, using trig formulae or complex numbers, that

π

4= 4tan−1 1

5− tan−1 1

239.

Using the series (5.9) to estimate the two terms in this identity can give very accurate approx-imations to π; indeed until relatively recently the value of π to the greatest number of decimalplaces was found in this way.

Example. Fix 0 < a < 1. The geometric series

1− x+ x2− x3 + · · ·= 11+ x

has radius of convergence 1 (and interval of convergence (−1,1)). Thus for every 0 < r < 1,the series is uniformly convergent on [−r,r] by Proposition 5.10. By Corollary 5.14, we get thelogarithmic series for |x|< 1,

x− x2

2+

x3

3− x4

4+ · · · =

∫ x

0

dx1+ x

= log(1+ x).

36

6 Metric and normed spacesThe aim of this chapter is to show that ideas such as convergence, continuity, etc, extend veryeasily to a vastly wider setting, that includes N-dimensional Euclidean space RN , spaces ofsequences and spaces of functions. We introduce the very general concept of a metric space,where there is a notion of distance which satisfies some very basic properties, and in particularconsider normed spaces where there is an underlying vector space.

6.1 Metric spacesAny reasonable notion of the distance between points of a set should satisfy three basic andnatural conditions.

Definition 6.1. (Metric space) Let X be a non-empty set. A metric or distance function d on Xis a function d : X×X → R such that

(M1) d(x,y) ≥ 0 for all x,y ∈ X, with d(x,y) = 0 if and only if x = y, (positivity)

(M2) d(x,y) = d(y,x) for all x,y ∈ X, (symmetry)

(M3) d(x,y) ≤ d(x,z) + d(z,y) for all x,y,z ∈ X. (triangle inequality)

The pair (X ,d) is called a metric space.

Thinking of d(x,y) as the distance between two points, these three conditions have obviousinterpretations. (M1) says that all distances are non-negative and are strictly positive just whenthe points are distinct. (M2) says that the distance from one point to another is the same as thedistance going the other way. The triangle inequality (M3) says that the length of a journeycannot decrease if a specified intermediate point is visited on the way.

For most examples, conditions (M1) and (M2) are virtually obvious but some checking maybe needed for (M3).

Examples.

1. d(x,y) = |x− y| is a metric on R.

(M1) and (M2) are trivial, for (M3) note that for all x,y,z ∈ R,

d(x,y) = |x− y|= |(x− z)+(z− y)| ≤ |x− z|+ |z− y|= d(x,z)+d(z,y)

using the triangle inequality for real numbers.

2. d(x,y) =∣∣1

x −1y

∣∣ is a metric on [1,∞).

(M1) and (M2) are easy, for (M3), for all x,y,z ∈ R,

d(x,y) =∣∣1

x −1y

∣∣= ∣∣(1x −

1z

)+(1

z −1y

)∣∣≤ ∣∣1x −

1z

∣∣+ ∣∣1z −

1y

∣∣= d(x,z)+d(z,y)

again using the triangle inequality for real numbers.

3. Let X be any set and define d(x,y) ={

0 (x = y)1 (x 6= y) . Then d is a metric on X called the

discrete metric.

Again (M1) and (M2) are clear. For (M3), either d(x,y) = 0 in which case the inequalityis obvious, or d(x,y) = 1 so x 6= y with either x 6= z so d(x,z) = 1 or z 6= y so d(z,y) = 1.

37

4. d((x1,x2),(y1,y2)

)= max{|x1− y1|, |x2− y2|} is a metric on R2.

(M1) and (M2) are clear. For (M3), for all (x1,x2),(y1,y2),(z1,z2) ∈ R2,

d((x1,x2),(y1,y2)

)= max{|x1− y1|, |x2− y2|}≤ max{|x1− z1|+ |z1− y1|, |x2− z2|+ |z2− y2|}≤ max{|x1− z1|, |x2− z2|}+ max{|z1− y1|, |z2− y2|}= d

((x1,x2),(z1,z2)

)+d((z1,z2),(y1,y2)

)where we have used the triangle inequality for real numbers.

5. d((x1,x2),(y1,y2)

)= +

√|x1− y1|2 + |x2− y2|2 is the Euclidean or usual metric on R2.

(M1) and (M2) are clear. The triangle inequality (M3) is an exercise using Cauchy’sinequality, or see below.

6.2 Normed spacesMany of the most important metric spaces are vector spaces (e.g. RN , spaces of continuousfunctions), and the natural distance is translation invariant, in the sense that, d(x+ z,y+ z) =d(x,y) for all x,y,z. In other words the distance between two points x,y depends only on theirvector difference. In this situation it is natural to define distances in terms of the ‘length’ ofvectors.

Here, all vector spaces will be over R, i.e. vectors will only be multiplied by real scalars.However, there is no problem in extending the work to vector spaces over C.

Definition 6.2. (Normed space) Let X be a vector space. A norm on X is a function ‖·‖ : X→Rsuch that:

(N1) ‖x‖ ≥ 0 for all x ∈ X, with ‖x‖= 0 if and only if x = 0, (positivity)

(N2) ‖λx‖ = |λ|‖x‖ for every x ∈ X and every λ ∈ R, (scalar property)

(N3) ‖x+ y‖ ≤ ‖x‖+‖y‖ for all x,y ∈ X. (triangle inequality)

The pair (X ,‖ · ‖) is called a normed space.

We can immediately make two deductions from the triangle inequality for norms.

Lemma 6.3. In a normed space (X ,‖ · ‖)

(1)∥∥∥ n

∑i=1

xi

∥∥∥ ≤ n

∑i=1‖xi‖ for all xi ∈ X ,

(2)∣∣∣‖x‖−‖y‖∣∣∣ ≤ ‖x− y‖ for all xi ∈ X (reverse triangle inequality).

Proof (1) follows by repeated application of the triangle inequality (N3).For (2), let x,y ∈ X . Then

‖x‖= ‖(x− y)+ y‖ ≤ ‖x− y‖+‖y‖,

so‖x‖−‖y‖ ≤ ‖x− y‖.

38

Interchanging the roles of x and y completes the proof. 2

We think of ‖x‖ as the ‘length’ of the vector x in the vector space X , so that ‖x− y‖ is a‘distance’ between the two points x and y. In this way a normed space is automatically a metricspace in a natural way, as the following lemma makes precise.

Lemma 6.4. Let (X ,‖ · ‖) be a normed space. Define d : X×X → R by

d(x,y) = ‖x− y‖.

Then d is a metric on X

Proof. The three properties required for d to be a metric follow easily from the three definingproperties of the norm.

Property (M1) follows from (N1) by noting that d(x,y) = ‖x− y‖ ≥ 0 with equality if andonly if ‖x− y‖= 0, i.e. x = y.

For (M2), d(x,y) = ‖x− y‖= ‖−1(y− x)‖= |−1| ‖(y− x)‖= d(y,x), using (N2).For (M3), d(x,y) = ‖x−y‖= ‖(x− z)+(z−y)‖ ≤ ‖(x− z)‖+‖(z−y)‖= d(x,z)+d(z,y),

using (N3). 2

Examples. Here follow some standard examples of normed spaces. In most cases (N1) and(N2) are easy to check, but (N3) will depend on establishing an appropriate inequality whichmay be more involved.

1. The familiar space (R, | · |) is a normed space, with (N2) the absolute value multiplicationproperty and (N3) the triangle inequality for real numbers.

2. Norms can be defined on the vector space RN ={

x = (x1,x2, . . . ,xN) : xi ∈ R}

of coor-dinate vectors in many ways, including

(a) ‖x‖= max1≤i≤N |xi| (maximum norm),

(b) ‖x‖1 = ∑Ni=1 |xi| (1-norm or ‘New York’ norm (why?)),

(c) ‖x‖2 =(∑

Ni=1 |xi|2

)1/2 (2-norm, Euclidean norm or usual norm),

(d) ‖x‖p =(∑

Ni=1 |xi|p

)1/p (for p≥ 1 only) (p-norm).

Showing (a) and (b) are norms on RN are simple exercises (using the triangle inequalityfor real numbers), (d) is rather harder and depends on Minkowski’s inequality.

We show here that (c) is a norm on RN :

For (N1), note that ‖x‖2 ≥ 0 and ‖x‖2 = 0 if and only if xi = 0 for all i, i.e. if and only ifx = 0.

For (N2), ‖λx‖2 =(

∑Ni=1 |λxi|2

)1/2=(

∑Ni=1 |λ|2|xi|2

)1/2= |λ|

(∑

Ni=1 |xi|2

)1/2= |λ|‖x‖2.

For (N3) we use Cauchy’s inequality, that(

∑Ni=1 |aibi|

)2 ≤(

∑Ni=1 a2

i)(

∑Ni=1 b2

i)

for realnumbers ai,bi. [This can be proved in exactly the same way as the Cauchy-Schwartzinequality for integrals.] Then

‖x+ y‖22 = ∑

Ni=1(xi + yi)

2 = ∑Ni=1 x2

i + 2∑Ni=1 xiyi + ∑

Ni=1 y2

i

≤ ∑Ni=1 x2

i +2(

∑Ni=1 x2

i)1/2(

∑Ni=1 y2

i)1/2

+ ∑Ni=1 y2

i

=((

∑Ni=1 x2

i)1/2

+(

∑Ni=1 y2

i)1/2

)2=(‖x‖2 + ‖y‖2

)2.

39

3. The set C[0,1] of real-valued continuous functions on [0,1] forms a vector space underthe obvious operations

( f +g)(x) = f (x)+g(x) and (λ f )(x) = λ f (x) (x ∈ [0,1], λ ∈ R).

Possible norms on C[0,1] include:

(a) ‖ f‖∞ = supx∈[0,1]

| f (x)| is known as the supremum norm or uniform norm, since, as

we will see, it is closely related to uniform convergence.

For (N1), note that ‖ f‖∞ ≥ 0 and ‖ f‖∞ = 0 if and only if supx∈[0,1] | f (x)|= 0, i.e. if andonly if f (x) = 0 for all x ∈ [0,1].

For (N2), ‖λ f‖∞ = supx∈[0,1]

|λ f (x)|= supx∈[0,1]

|λ|| f (x)|= |λ| supx∈[0,1]

| f (x)|= |λ|‖x‖∞.

To verify the triangle inequality (N3),

‖ f +g‖∞ = supx∈[0,1] | f (x)+g(x)| ≤ supx∈[0,1](| f (x)|+ |g(x)|)≤ supx∈[0,1] | f (x)| + supy∈[0,1] |g(y)| = ‖ f‖∞ + ‖g‖∞.

(b) ‖ f‖1 =∫ 1

0| f (x)|dx is known as the 1-norm. (Exercise: show it is a norm.) Note

that a function may have a large ‖ · ‖∞-norm but a small ‖ · ‖1-norm.

(c) ‖ f‖p =(∫ 1

0| f (x)|pdx

)1/pis the p-norm, and is a norm for p≥ 1. The triangle

inequality requires an integral version of Minkowski’s inequality.

4. The space C1[0,1] of functions on [0,1] with continuous derivative becomes a normedspace taking

‖ f‖= supx∈[0,1]

| f (x)|+ supx∈[0,1]

| f ′(x)|.

(Exercise).

5. (For those who are familiar with inner products.) If 〈 , 〉 is an inner product on a vectorspace X then ‖x‖= |〈x,x〉|1/2 defines a norm on X . The triangle inequality follows as in2(c) above, using the Cauchy-Schwartz inequality in the form |〈x,y〉| ≤ ‖x‖‖y‖.

6.3 Convergence in metric and normed spacesMetric spaces (and thus normed spaces) have enough structure to be able to define notions suchas convergence and continuity. Indeed, the definitions are virtually the same as those we knowfor sequences of real numbers and real functions. We just have to replace |xn− x| by d(xn,x),etc. in the definitions.

Definition 6.5. (Convergence) In a metric space (X ,d), the sequence (xn)∞n=1 converges to x,

written xn → x or limxn = x, if for all ε > 0 there exists N ∈ N such that d(xn,x) < ε for alln≥ N.

Thus, in a normed space (X ,‖ · ‖), xn → x if for all ε > 0 there exists N ∈ N such that‖xn− x‖< ε for all n≥ N.

40

Things become easy if we note that the above definition is precisely the definition that thesequence of real numbers

(d(xn,x)

)∞

n=1 converges to 0. Thus we may reformulate this definition.

Definition 6.6. (Equivalent definition of convergence) In a metric space (X ,d), the sequence(xn)

∞n=1 converges to x if d(xn,x)→ 0 as n→ ∞.In a normed space (X ,‖ · ‖) this becomes ‖xn− x‖→ 0 as n→ ∞.

Examples.

1. In (R2,‖ · ‖2), the sequence((n+1

n , 2nn+1)

)→ (1,2). To see this,∥∥(n+1

n , 2nn+1)− (1,2)

∥∥2 =

√(n+1

n −1)2 +( 2nn+1 −2)2 =

√1n2 +

4(n+1)2 ≤

√5n2 =

√5

n → 0.

2. (Uniform convergence) Let C[0,1] be the vector space of continuous functions on [0,1],and let ‖ f‖∞ = supx∈[0,1] | f (x)| be the uniform norm on C[0,1], with corresponding metric

d( f ,g) = supx∈[0,1]

| f (x)− g(x)|. Then fn→ f in ‖ · ‖∞ if and only if fn→ f uniformly on

[0,1] .

To see this, note fn→ f in ‖ · ‖∞ if and only if supx∈[0,1] | fn(x)− f (x)|= ‖ fn− f‖∞→ 0,that is if and only if given ε > 0 there is an N such that for all n ≥ N, supx∈[0,1] | fn(x)−f (x)| ≤ ε, i.e. | fn(x)− f (x)| ≤ ε for all x∈ [0,1] and n≥N, which is uniform convergence.

3. Let fn ∈C[0,1] be fn(x) =

2nx (0≤ x≤ 1/2n)2n(1

n − x) (1/2n≤ x≤ 1/n)0 (1/n≤ x≤ 1)

.

Then ‖ fn− 0‖∞ = supx∈[0,1] | fn(x)− 0| = 1 for all n, but ‖ fn− 0‖1 =∫ 1

0 | fn(x)− 0|dx =1/(2n)→ 0. Thus fn→ 0 in ‖ · ‖1 but fn 6→ 0 in ‖ · ‖∞.

4. Let d be the discrete metric on some set X given by d(x,y) ={

0 (x = y)1 (x 6= y) .

Then xn→ x iff d(xn,x)→ 0 iff there is an N such that xn = x for all n≥ N.

There are many instances where two different metrics (or norms) on the same space areclosely enough related to yield the same convergent sequences, as well as sharing other proper-ties. This motivates the definition of equivalent metrics (or norms).

Definition 6.7. (Equivalence of metrics or norms) We say that two metrics d and d′ on the sameset X are equivalent if there are positive constants 0 < a≤ b such that

ad(x,y)≤ d′(x,y)≤ bd(x,y) for all x,y ∈ X . (6.10)

Similarly, two norms ‖ · ‖ and ‖ · ‖′ on the same vector space X are equivalent if

a‖x‖ ≤ ‖x‖′ ≤ b‖x‖ for all x ∈ X . (6.11)

Note that given equivalent norms satisfying (6.11), the associated metrics given by d(x,y) =‖x− y‖ and d′(x,y) = ‖x− y‖′ automatically satisfy (6.10).

Lemma 6.8. Let d and d′ be equivalent metrics on X. If xn,x ∈ X then xn→ x in d if and onlyif xn→ x in d′. Similarly for equivalent norms.

41

Proof Using (6.10), if xn→ x in d then d′(xn,x)≤ bd(xn,x)→ 0 so xn→ x in d′. Conversely, ifxn→ x in d′ then d(xn,x)≤ 1

ad′(xn,x)→ 0 so xn→ x in d. 2

Corollary 6.9. The norms ‖ · ‖1, ‖ · ‖2 and ‖ · ‖∞ on RN are all equivalent.

Proof (for R2). Note that for (x1,x2) ∈ R2,

12(|x1|+ |x2|

)≤max

{|x1|, |x2|

}≤(|x1|2 + |x2|2

)1/2 ≤(|x1|+ |x2|

),

that is12

∥∥(x1,x2)∥∥

1 ≤∥∥(x1,x2)

∥∥∞≤∥∥(x1,x2)

∥∥2 ≤

∥∥(x1,x2)∥∥

1,

i.e. the norms are equivalent. 2

It follows that in Example 1 above, we showed that the sequence considered converges in‖ · ‖2, so it also converges in the equivalent norms ‖ · ‖1 and ‖ · ‖∞

In fact it can be shown that any two norms on RN are equivalent.

Note that we can easily define continuity of a function between metric spaces, either with an‘ε-δ’ definition or using sequences, by direct analogue with continuity of real-valued functionson R.

Definition 6.10. (Continuous functions between metric spaces) Given metric spaces (X ,d) and(Y,d′), a function f : X →Y is continuous at c ∈ X if for all ε > 0 there exists δ > 0 such that ifx ∈ X and d(x,c)< δ then d′( f (x), f (c))< ε.

A function f is continuous on X if f is continuous at all c ∈ X.

The sequential characterisation of continuity also works as expected. That is, if (X ,d) and(Y,d′) are metric spaces and f : X → Y , then f is continuous at c ∈ X if for every sequence(xn)n in X with xn → c in d, we have f (xn)→ f (c) in d′. Continuity of functions betweenmetric spaces can be developed in a similar way to that for functions on the real numbers.

6.4 CompletenessRecall the General Principle of Convergence: every Cauchy sequence of real numbers con-verges. Such a statement often holds in other metric or normed spaces; such spaces are calledcomplete. Anything that guarantees that a sequence converges without the need to find the limitis extremely useful.

Definition 6.11. (Cauchy Sequence) A sequence (xn) in a metric space (X ,d) is called a Cauchysequence (or just Cauchy) if for all ε > 0, there exists N ∈ N such that d(xn,xm) < ε for alln,m≥ N.

Thus (xn) is Cauchy in a normed space (X ,‖ ·‖) if for all ε > 0, there exists N ∈N such that‖xn− xm‖< ε for all n,m≥ N.

Example.

1. Let fn(x) = sin(x+1/n) in(C[0,1],‖ · ‖1).Then

‖ fn− fm‖1 =∫ 1

0

∣∣sin(x+1/n)− sin(x+1/m)∣∣dx≤

∫ 1

0

∣∣1/n−1/m∣∣dx≤ 1

n+

1m

< ε

if n,m≥ 2/ε = ‘N’. Thus ( fn) is Cauchy.

42

Proposition 6.12. In a metric or normed space, every convergent sequence is a Cauchy se-quence.

Proof. Suppose xn→ x in (X ,d). Given ε > 0 we may find N ∈ N such that d(xn,x) < ε/2 ifn≥ N. Then if n,m≥ N,

d(xn,xm)≤ d(xn,x)+d(x,xm)< ε/2+ ε/2 = ε,

using the triangle inequality, so (xn) is Cauchy. 2

The converse of this proposition is in general not true. The Cauchy sequences may bethought of sequences that are ‘trying to converge’ and a space is complete if there is somewherefor them to converge to.

Definition 6.13. (Complete metric or normed space) A metric space (X ,d) or normed space(X ,‖ · ‖) is called complete if every Cauchy sequence in the space converges to some point inthe space. [A complete normed space is called a Banach space.]

Of course, we already know that the real numbers are complete under the usual distance.

Proposition 6.14. (General Principle of Convergence) Every Cauchy sequence of real numbersconverges, i.e. the space (R, | · |) is complete.

It is useful to have the completeness property in a normed space since we can assert thatlimits of sequences exist without explicitly having to find the limit.

Many important normed spaces are complete and we will give some of these below. Indeedthe completeness of many spaces follows without too much difficulty from the completeness ofthe real numbers.

Important Examples 6.11.

1. Let [a,b]⊂ R be a closed interval. Then ([a,b], | · |) is complete.

Proof: Note that if (xn) is a Cauchy sequence in ([a,b], | · |) then it is certainly a Cauchysequence of real numbers, so xn→ x for some x ∈ R. Then x−a = limn→∞(xn−a) ≥ 0and b− x = limn→∞(b− xn)≥ 0, so x ∈ [a,b]. 2

2. The spaces (RN ,‖ · ‖2), (RN ,‖ · ‖1), and (RN ,‖ · ‖∞) are all complete.

Proof . We first consider (RN ,‖ · ‖1); to keep coordinate notation reasonably simple wejust prove completeness of (R2,‖ · ‖1), though the calculation is virtually the same forlarger N. Let (xn) =

((x1

n,x2n))

be a Cauchy sequence in ‖ · ‖1. Thus given ε > 0, there isan N such that if n,m≥ N then

ε > ‖xn− xm‖1 =∥∥(x1

n,x2n)− (x1

m,x2m)∥∥

1 =∥∥(x1

n− x1m,x

2n− x2

m)∥∥

1 = ∑2i=1∣∣xi

n− xim∣∣.

In particular, ε > |xin− xi

m∣∣ for each i = 1,2, i.e. (xi

n)∞n=1 is a Cauchy sequence of real

numbers, so by the GPC, xin→ xi for some xi ∈ R for i = 1,2. Let x = (x1,x2). Then

‖xn− x‖1 = ∑2i=1 |xi

n− xi| → 0,

i.e. xn→ x in ‖ · ‖1. Thus (R2,‖ · ‖1) is complete.

We noted in Corollary (6.9) that the norms ‖·‖1, ‖·‖2 and ‖·‖∞ are equivalent. It followseasily from the definition of equivalence that if a space is complete under one norm thenit is complete under any equivalent norm, so RN is also complete under ‖ · ‖2 and ‖ · ‖∞.[In fact RN is complete under any norm that might be defined on it.] 2

43

3. The space (C[0,1],‖ · ‖∞) is complete.

[Recall that C[0,1] is the space of continuous functions on [0,1] and ‖ f (x)‖= supy∈[0,1] | f (y)|;of course [0,1] can be replaced by any other closed interval [a,b].]

Proof: Suppose ( fn)n is a Cauchy sequence in (C[0,1],‖ · ‖∞). For each x ∈ [0,1],

| fn(x)− fm(x)| ≤ supy∈[0,1]

| fn(y)− fm(y)|= ‖ fn− fm‖∞,

so given ε > 0, there is N ∈ N such that for all n,m≥ N,

| fn(x)− fm(x)|< ε for every x ∈ [0,1]. (6.12)

Thus for each x, the sequence(

fn(x))

n is a Cauchy sequence of real numbers, so by thecompleteness of R (i.e. the GPC) it converges to a limit which we call f (x).

Letting m→ ∞ in (6.12) it follows that if n ≥ N then | fn(x)− f (x)| ≤ ε for every x ∈[0,1]. Thus fn→ f uniformly on [0,1]. As the uniform limit of a sequence of continuousfunctions is continuous (Theorem 5.3) f ∈C[0,1], and also | fn(x)− f (x)|∞ ≤ ε for all xif n≥ N, so fn→ f in ‖ · ‖∞. 2

Here is an example of how we can use completeness to show that a sequence of functionsconverges where it would be impossible to find an explicit function that is the limit.

Example. Define a sequence of continuous functions fn : [0,1]→ R by

f0(x) = x3, fn(x) =x2

sin(

fn−1(x))+

12

cosx (n = 1,2, . . .). (6.13)

Then fn converges in ‖ · ‖∞.

Check. For positive integers n,m, (6.13) gives

| fn(x)− fm(x)|=x2

∣∣sin(

fn−1(x))− sin

(fm−1(x)

)∣∣≤ x2

∣∣ fn−1(x)− fm−1(x)∣∣

so taking suprema,

‖ fn(x)− fm(x)‖∞ ≤12‖ fn−1(x)− fm−1(x)‖∞.

We can iterate this m times to get

‖ fn(x)− fm(x)‖∞ ≤1

2m ‖ fn−m(x)− f0(x)‖∞ ≤1

2m ,

where the final inequality follows since 0 ≤ fn(x) ≤ 1 for all n and all x ∈ [0,1] ( by a simpleinduction using (6.13)).Thus, given ε > 0, choose N such that 1/2N < ε. Then if n,m ≥ N,‖ fn(x)− fm(x)‖∞ ≤ 1/2N < ε. Hence fn is a Cauchy sequence in the complete normed space(C[0,1],‖ · ‖∞), so fn converges in the norm ‖ · ‖∞ to some f ∈C[0,1]. 2

44

6.5 The contraction mapping theoremThis final section concerns a very important and useful theorem valid in any complete metric ornormed space that can be used to show that a unique solution exists to a wide range of problems,including simple equations, systems of equations, matrix equations, differential equations, etc,etc. It also provides an algorithm for finding the solution.

We will work with a function f that contracts distances on a complete metric space (X ,d).

Definition 6.15. Let (X ,d) be a metric space. A function f : X → X is a contraction if there isa constant 0 < c < 1 such that

d(

f (x), f (y))≤ c d(x,y) for all x,y ∈ X . (6.14)

Given a function f : X → X we write f n : X → X for the nth iterate of f , defined by f n(x) =f(

f(

f (· · · f (x) · · ·)))

where f is applied n times.We will prove the contraction mapping theorem, also known as Banach’s fixed point theo-

rem.

Theorem 6.16. (Contraction mapping theorem) Let f be a contraction on a complete metricspace (X ,d). Then f has a unique fixed point x0, i.e. there exists exactly one x0 ∈ X such thatf (x0) = x0.

Furthermore, for every x ∈ X, we have d(

f n(x),x0)≤ cnd

(x,x0)→ 0 as n→ ∞.

Proof. Take any x ∈ X . We show that the sequence ( f n(x))∞n=1 is Cauchy. First, using (6.14)

repeatedly, for all n ∈ N,

d(

f n(x), f n−1(x))≤ cd

(f n−1(x), f n−2(x))≤ c2d

(f n−2(x), f n−3(x)

)≤ ·· · ≤ cn−2d

(f 2(x), f 1(x)

)≤ cn−1d

(f 1(x),x

). (6.15)

Now let n > m be integers. Applying the triangle inequality repeatedly

d(

f n(x), f m(x))≤ d

(f n(x), f n−1(x)

)+d(

f n−1(x), f n−2(x))+ · · ·+d

(f m+1(x), f m(x)

)≤

[cn−1 + cn−2 + · · ·+ cm]d( f (x),x

)< cm d( f (x),x)

1− cusing (6.15) and summing the geometric series. Given ε > 0, choose N large enough so thatcNd( f (x),x)/(1− c)< ε. Then d

(f n(x), f m(x)

)< ε if n,m≥ N. Thus ( f n(x))∞

n=1 is a Cauchysequence in the complete metric space (X ,d) so f n(x)→ x0 for some x0 ∈ X .

Using the triangle inequality and (6.14),

d( f (x0),x0)≤ d( f (x0), f n(x))+d( f n(x),x0)≤ cd(x0, f n−1(x))+d( f n(x),x0)→ 0

as n→ ∞. It follows that d( f (x0),x0) = 0 and so f (x0) = x0, that is x0 is a fixed point of f .

Suppose that x1 is also a fixed point. Then by (6.14)

d(x0,x1) = d( f (x0), f (x1))≤ cd(x0,x1),

so d(x0,x1) = 0 since 0 < c < 1, so x0 = x1 giving uniqueness of the fixed point.

Finally, for every x ∈ X , applying (6.14) repeatedly,

d(

f n(x),x0)= d(

f n(x), f n(x0))≤ cnd(x,x0)→ 0

as n→ ∞. 2

45

Example 6.17. (Roots of simple equations) The equation x3 + 3x− 1 = 0 has a unique realsolution x' 0.32219.

Rearranging the equation gives x =1

x2 +3.

The mapping f : R→ R given by f (x) = 1/(x2 + 3) is a contraction on the complete space(R, | · |), since

| f (x)− f (y)|=∣∣∣∣ 1x2 +3

− 1y2 +3

∣∣∣∣= ∣∣∣∣ (y− x)(y+ x)(x2 +3)(y2 +3)

∣∣∣∣≤ |x− y|

(|x|

x2 +31

y2 +3+|y|

y2 +31

x2 +3

)≤ 2

3 |x− y|.

By the CMT there is a unique x ∈ R such that f (x) = x, i.e. the equation has a unique solutionin R. Moreover, for any y ∈R, the sequence ( f ny)n converges to the solution, so taking y = 0.5we get the sequence of approximations

0.5, 0.308, 0.323, 0.3221, 0.32219, . . . . 2

Example 6.18. (Simultaneous nonlinear equations) The simultaneous equations

4x− siny = 5, cos2 x+4y = 3,

have a unique solution (x,y) ∈ R2.

We can rearrange these equations as

15x+ 1

5 siny+1 = x, −16 cos2 x+ 1

3y+ 12 = y. (6.16)

Recalling that ‖(x,y)‖∞ = max{|x|, |y|}, define f on (R2,‖ · ‖∞) by

f (x,y) =(1

5x+ 15 siny+1,−1

6 cos2 x+ 13y+ 1

2

).

Then

f (x,y)− f (x′,y′) =(1

5(x− x′)+ 15(siny− siny′),−1

6(cos2 x− cos2 x′)+ 13(y− y′)

),

so a simple estimate gives

‖ f (x,y)− f (x′,y′)‖∞ ≤max{25 ,

23}max{|x− x′|, |y− y′|}= 2

3‖(x,y)− (x′,y′)‖∞.

Hence f is a contraction on the complete normed space (R2,‖ · ‖∞), so by the CMT there is aunique (x,y) ∈ R2 such that f (x,y) = (x,y), that is a unique solution to (6.16).

As before, f n(x1,y1) gives a sequence of increasingly good approximations to this solutionfor any initial (x1,y1) 2

We next turn to the question of whether a given differential equation has a solution and if sowhether it is unique. This does not always happen even for simple DEs, for example,

dydx

= 2|y|1/2 with y(0) = 0

has y = 0 and y = x2 both as solutions, in fact, it has infinitely many solutions. However, theCMT may used to show that that a wide variety of differential equations have unique solutions.

46

Example 6.19. (Differential equations) The differential equation

d fdx

= 14 cos

(f (x)

)+ x3 with f (0) = 0

has a unique solution for 0≤ x≤ 1.

This differential equation is equivalent to the integral equation

f (t) = 14

∫ x

0cos(

f (u))du+ 1

4x4, (6.17)

which incorporates the initial conditions of the DE. Define a mapping φ on (C[0,1],‖ · ‖∞) by(φ( f )

)(x) = 1

4

∫ x

0cos(

f (u))du+ 1

4x4.

Then ∣∣(φ( f ))(x)−

(φ(g)

)(x)∣∣= ∣∣∣1

4

∫ x

0

(cos(

f (u))− cos

(g(u)

))du∣∣∣

≤ 14

∫ x

0

∣∣cos(

f (u))− cos

(g(u)

)∣∣du

≤ 14

∫ x

0

∣∣ f (u)−g(u)∣∣du

≤ 14x ‖ f −g‖∞ ≤ 1

4‖ f −g‖∞ since x ∈ [0,1].

In particular, ‖φ( f )−φ(g)‖∞ ≤ 14‖ f − g‖∞. Hence φ is a contraction on the complete normed

space (C[0,1],‖ · ‖∞). By the CMT there is a unique fixed point of φ, i.e a unique function thatsatisfies φ( f ) = f , that is the solution to the integral equation (6.17) and thus to the differentialequation. We can use iteration to find a sequence of approximations φn(g) to this solution fstarting with any initial g ∈C[0,1] and this can be done numerically. 2

47

MT3502 - Real Analysis Jonathan M. Fraserjmf32/MT3502Notes.pdfMT3502 - Real Analysis Jonathan M....

Documents

Transcript of MT3502 - Real Analysis Jonathan M. Fraserjmf32/MT3502Notes.pdfMT3502 - Real Analysis Jonathan M....