
Notes on Real Analysis, 3A03

Prof. S. Alama¹

Revised April 5, 2020

Contents

1 The Completeness Axiom, Supremum and Infimum

2 Sequences
   2.1 Limits of Sequences
   2.2 Monotone Sequences
   2.3 Divergent Sequences and Subsequences
   2.4 Cauchy Sequences

3 Series

4 Cardinality

5 Limits and Continuity
   5.1 Limits of Functions
   5.2 Continuous Functions
   5.3 Extensions of limit concepts
   5.4 Uniform Continuity

6 Differentiability
   6.1 Differentiable Functions
   6.2 The Mean Value Theorem and its Consequences
   6.3 Taylor's Theorem

7 The Fundamental Theorem of Calculus

Introduction

Real Analysis is the study of the real numbers R and of real valued functions of one or more

real variables. If this sounds familiar, it should: you’ve been doing calculations with real

numbers and functions for several years now, starting in high school and continuing through

calculus. On the other hand, most of what you’ve done with real numbers and functions

has involved applying “rules” intended to justify which kinds of calculations are appropriate

(in the sense that they lead to correct results,) which might have seemed mysterious and

arbitrary to you at the time. The goal of this course is to develop some understanding

of functions of a real variable using logical deductive reasoning, and in doing so obtain a

¹ © 2018 All Rights Reserved. Do not distribute without author's permission.


much firmer grasp of how these “rules” may be justified and how to determine whether any

computational procedure will always yield correct results.

One of the great mysteries, in fact, is to figure out what the real numbers actually are.

You’ve learned that they include all of the other more familiar number systems from elemen-

tary mathematics: the counting (natural) numbers N, integers Z, rational numbers (fractions

of integers) Q, and algebraic numbers (ie, roots of polynomials with integer coefficients, such as √2, ∛5, . . . ). In high school you mostly learned the algebra of real numbers. Amazingly,

one cannot really understand the reals via algebra. Their fundamental properties involve

analysis– basically, one must understand some sense of limits.

Let's summarize some algebraic and geometric properties of the reals, which you already

know but perhaps haven’t thought of in these terms:

• R is a field: If a, b, c ∈ R, then so are a + b, ab, and b/c if c ≠ 0. Addition and

multiplication are commutative, a + b = b + a and ab = ba, associative, a + (b + c) =

(a+ b) + c and a(bc) = (ab)c, and distributive, a(b+ c) = ab+ ac. Each a ∈ R has an

additive inverse −a ∈ R, and a multiplicative inverse 1/a ∈ R, provided a ≠ 0.

• R is an ordered set. If a ≠ b are distinct real numbers, then either a < b or b < a.

In addition, the ordering is transitive (ie, if a > b and b > c, then a > c,) and is

consistent with the algebraic operations. In particular, if a, b, c ∈ R with a < b then

a+ c < b+ c. If c > 0, then ac < bc, but if c < 0, then ac > bc.

• R has a “metric property”, determined by the absolute value,

|x| = x if x ≥ 0, and |x| = −x if x < 0.

Then, the distance between a, b ∈ R is defined by d(a, b) = |a − b|, the length of the

line segment joining the two points on the number line.

We say R is an ordered field. However, the rational number set Q is also an ordered

field! The distinction between R and Q, and the intimate relationship between them form

an important theme in the course. To see how the algebraic and ordering properties are

defined, and how the various familiar “rules” are justified from the axioms, see Section 2 in

the Bartle & Sherbert textbook.

Let’s take a little time to verify some basic ordering and metric properties of R.

Lemma 0.1. Assume a, b ∈ R with a, b > 0. Then

a < b if and only if a² < b².


We often write “iff” for if and only if, or use the equivalence arrow with two heads, ⇐⇒ .

So we could also have written the statement of the lemma symbolically as,

∀a, b > 0, [a < b ⇐⇒ a² < b²].

Since this is an equivalence, we have two things to prove. First, we must show that if 0 < a < b, then it must be true that a² < b². Since a > 0, we multiply the inequality a < b by a and it remains true:

a² = a · a < b · a.

As a < b and both are positive, b · a < b · b = b², and so substituting we arrive at a² < b², and the first half of the statement is verified.

Secondly, we assume that a, b > 0 and a² < b²; what we need to conclude is that a < b. Subtracting a² from both sides gives,

0 < b² − a² = (b − a)(b + a),

using the field properties to write the difference of squares as a factored product. Now,

a, b > 0 implies that (a+ b) > 0, so multiplying both sides by 1/(a+ b) (which is positive,)

we have (b−a) > 0. Adding a to both sides yields a < b, as needed. The Lemma is therefore

proven.

We remark that the following is a consequence of the Lemma: if x, y > 0, then x < y iff √x < √y. (You don't need to do anything much to verify this, just think about it for

a minute.) Finally, note that the condition a, b > 0 is essential– the Lemma would be false

if stated for any a, b ∈ R. (Try a numerical example with a < 0 and b > 0.) Moral: the

hypotheses of a statement are very very important!
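Here is one such numerical example, written as a short Python check (an illustration added here, not part of the original notes):

```python
# Numerical example (illustrative) showing the hypothesis a, b > 0 is essential:
# with a negative a, "a < b" no longer forces "a^2 < b^2".
a, b = -3.0, 2.0
print(a < b)          # True
print(a**2 < b**2)    # False, since 9 < 4 fails
```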

We conclude this section with some facts about the absolute value. From the definition

of the absolute value, we have:

(a) For all x ∈ R, |x| ≥ 0;

(b) For all x ∈ R, −|x| ≤ x ≤ |x|;

(c) For all x, y ∈ R, |xy| = |x| |y|.

To verify each, treat the cases x ≥ 0 and x < 0 separately, in the definition. For example, to

verify (b): if x ≥ 0, |x| = x and −|x| ≤ 0 ≤ x, and if x < 0 then −|x| = x and |x| > 0 ≥ x.

We leave the other two properties as an exercise.

The following inequality is very important, and we will use it constantly.


Theorem 0.2 (Triangle Inequality). For any x, y ∈ R, |x+ y| ≤ |x|+ |y|.

Proof. First, notice that if both x, y = 0, or if x + y = 0, then the inequality is true, so we

can assume in the rest of the proof that both sides of the inequality are strictly positive.

First, notice that, by the definition of the absolute value, |x|² = x² ∀x ∈ R. Calculate the square of the left term,

|x + y|² = (x + y)² = x² + y² + 2xy      (using the algebraic axioms)
         ≤ x² + y² + |2xy|               (using property (b) of absolute value)
         = |x|² + |y|² + 2|x| |y|         (using property (c) of absolute value)
         = (|x| + |y|)²,

so we have that |x + y|² ≤ (|x| + |y|)². Since a = |x + y| ≥ 0, b = |x| + |y| > 0, we apply

Lemma 0.1 to obtain the triangle inequality.

How about |x − y|, which measures the distance between x and y on the number line?

The absolute value does not behave in the same way for differences; after all, either x or y

could be negative!! We do have this interesting fact:

Theorem 0.3 (Reverse Triangle Inequality). For all x, y ∈ R, |x − y| ≥ ||x| − |y||.

Proof. Use the triangle inequality in this tricky way:

|x| = |(x − y) + y| ≤ |x − y| + |y|.

Therefore, |x − y| ≥ |x| − |y|. Now, if we switch the roles of x, y in the above line, we get the opposite inequality, |x − y| = |y − x| ≥ |y| − |x|. (After all, who's to say which number is called x?) Then, by definition of absolute value,

||x| − |y|| = |x| − |y| if |x| ≥ |y|, and ||x| − |y|| = |y| − |x| if |x| < |y|,

so ||x| − |y|| ≤ |x − y| in either case.
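As a quick sanity check of Theorems 0.2 and 0.3 (purely illustrative, not a proof), one can test both inequalities on random samples:

```python
# Random sanity check (illustrative only) of the Triangle Inequality (Thm 0.2)
# and the Reverse Triangle Inequality (Thm 0.3).
import random

for _ in range(10_000):
    x = random.uniform(-10, 10)
    y = random.uniform(-10, 10)
    assert abs(x + y) <= abs(x) + abs(y)        # Theorem 0.2
    assert abs(x - y) >= abs(abs(x) - abs(y))   # Theorem 0.3
print("both inequalities held on all samples")
```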


1 The Completeness Axiom, Supremum and Infimum

Upper and Lower Bounds

We assume throughout that S ⊂ R, a proper, nonempty subset of the real numbers.

Definition 1.1. We say that S ⊂ R is bounded above if there exists v ∈ R with x ≤ v for

all x ∈ S. The number v is called an upper bound for the set S.

We say that S ⊂ R is bounded below if there exists t ∈ R with x ≥ t for all x ∈ S. The

number t is called a lower bound for the set S.

The set S is called bounded if it is both bounded above and below. If S is not bounded,

we say it is unbounded.

If S is bounded above or below it has infinitely many upper or lower bounds. In particular,

if v is an upper bound for S, any z > v is also an upper bound.

Example 1.2. (a) S1 = (0, 3] = {x ∈ R : 0 < x ≤ 3} is bounded above by v = 3 or π or

10000000. It is bounded below by t = 0 or −π or -9999999999999. It is a bounded set.

(b) S2 = N = {1, 2, 3, . . . } is bounded below by any t ≤ 1. It is unbounded above (although

we really need to prove this!) and is an unbounded set.

(c) S3 = {q ∈ Q : q² ≤ 2} = [−√2, √2] ∩ Q is a bounded set. Any v ≥ √2 is an upper bound, and any t ≤ −√2 is a lower bound.

In each example, when there is an upper or lower bound there is a "best" choice, the

most efficient upper or lower bound, the unique one which is closest to the set S.

Definition 1.3. (A) If S ⊂ R is bounded above, we define its supremum u = supS to be

the least (smallest) upper bound of S:

(i) u is an upper bound for S; and

(ii) if v is any upper bound for S, u ≤ v.

(B) If S ⊂ R is bounded below, we define its infimum w = inf S to be the greatest

(largest) lower bound of S:

(i) w is a lower bound for S; and

(ii) if t is any lower bound for S, t ≤ w.

(C) If S is unbounded above, we write supS = +∞.


(D) If S is unbounded below, we write inf S = −∞.

Returning to Example 1.2, inf S1 = 0, supS1 = 3, inf S2 = 1, supS2 = +∞, inf S3 = −√2, and supS3 = √2. Notice that even though the set S3 ⊂ Q, its

supremum and infimum are irrational. Thus, if we base our number system on only the

rational numbers Q, then some sets would not have suprema or infima. This is the final

axiom we choose for the real numbers R:

Completeness Axiom (sometimes called the "Continuum Axiom"): If S ⊂ R is any nonempty subset which is bounded above, then u = supS exists as a real number.

One cannot prove that this is true from the other axioms; what is required is to demon-

strate that one can construct a set of objects which satisfies the Ordered Field, Metric, and

Completeness Axioms. This was done (separately) by Cantor and by Dedekind, both in

1872 but by completely different procedures. This construction is mind-blowingly abstract

in both cases, and the details are not important to what we will do in this course, so we will

not present it.

Proposition 1.4. Let S ⊂ R and define the set T = {x | −x ∈ S}. If S is bounded below, show that T is bounded above, and inf S = − sup T.

Proof. First, we show that T is bounded above: since S is bounded below, it has a lower bound t, with t ≤ x for all x ∈ S. For any y ∈ T, y = −x with x ∈ S, and therefore y = −x ≤ −t for all y ∈ T; that is, (−t) is an upper bound for T, so T is bounded above.

Let u = sup T, which exists by the Completeness Axiom. We need to show inf S = −u, that is, we need to show (−u) satisfies the conditions B(i) and B(ii) from the definition. First we verify (i): for any x ∈ S, y = −x ∈ T, and since u is an upper bound for T, −x = y ≤ u, which is equivalent to x ≥ −u. Therefore (−u) is a lower bound for S and B(i) holds.

Next we verify (ii): let t be any lower bound for S, so x ≥ t ∀x ∈ S. Equivalently,

y = −x ∈ T and y ≤ −t, so (−t) is an upper bound for T . Since u = supT is the smallest

upper bound, u ≤ −t and thus −u ≥ t. Hence, B(ii) is also satisfied with −u = inf S.

As a corollary to the Proposition, we see that the Completeness Axiom implies that every

nonempty S ⊂ R which is bounded below must have an infimum w = inf S ∈ R.

When using the definition of the sup and inf in proofs it will be convenient to have a

more concrete criterion for each. The first statement, that u = supS is an upper bound, is

easy to represent concretely:

(i) x ≤ u ∀x ∈ S.


The second condition may be written in different ways which are equivalent. If u is the

smallest upper bound for S, then any v < u is not an upper bound for S:

(ii)′ ∀v < u, ∃x ∈ S with x > v.

Furthermore, if v < u then we can write v = u − ε with ε = u − v > 0. And so it is also

equivalent to demand:

(ii)′′ ∀ε > 0 ∃x ∈ S with x > u− ε.

The same applies to the infimum:

Theorem 1.5. Assume S ⊂ R is bounded below. Then w = inf S if and only if:

(i) x ≥ w ∀x ∈ S,

and one of the following holds:

(ii)′ for any t > w, ∃x ∈ S with x < t; or

(ii)′′ for any ε > 0, ∃x ∈ S with x < w + ε.

Let’s use these!

Proposition 1.6. Assume S is bounded below, a > 0, and T = {ax : x ∈ S}. Then

inf T = a inf S.

Proof. Call w = inf S. By the definition, we need to show that (aw) is a lower bound for

T , and that it is the largest lower bound for T .

First, for any y ∈ T , y = ax for some x ∈ S. Using part (i) of the definition of inf S,

(and a > 0,) we have y = ax ≥ aw, and so (aw) is a lower bound for T (and (i) is verified

for T .)

Second, we will verify (ii′) for T using (ii′) for S. Take any t > aw. Then t/a > w (since a > 0). By (ii′) for inf S there exists x ∈ S with x < t/a. Multiply by a and call y = ax ∈ T, with the property y = ax < a · (t/a) = t. Therefore, t is not a lower bound for T, and by (ii′) we're done.

Proposition 1.7. Let S1, S2 ⊂ R be bounded below, and T = {x1 + x2 : x1 ∈ S1, x2 ∈ S2}. Then inf T = inf S1 + inf S2.

Proof. Call inf S1 = w1 and inf S2 = w2. We need to show that (w1 + w2) is a lower bound

for T , and that it is the largest lower bound for T .


First, for any y ∈ T we can find x1 ∈ S1 and x2 ∈ S2 with y = x1 + x2. Using property

(i) from the definition of the inf of S1 and S2, we have

y = x1 + x2 ≥ w1 + w2,

and so (w1 + w2) is a lower bound for T .

Second, we will verify (ii′′) for T . Given any ε > 0, we apply (ii′′) for S1 and S2, but with

ε/2 replacing ε: there exist x1 ∈ S1 and x2 ∈ S2 so that x1 < w1 + ε/2 and x2 < w2 + ε/2. Since x1 + x2 = y ∈ T, with

y = x1 + x2 < (w1 + ε/2) + (w2 + ε/2) = (w1 + w2) + ε,

we have verified (ii′′) for T , and therefore by definition we have proven the Proposition.


The Rationals are Dense in the Reals

The rationals Q are a proper subset of the reals R: irrational numbers are like “holes” in Q.

However, we don’t worry about this too much, since we have learned that irrational numbers

can always be approximated to arbitrary precision by rational numbers. We say that Q is

“dense” in R because of this relationship. There are several equivalent ways to state the

density of the rationals Q in the reals R. The one we prove here is that between any two

real numbers there is at least one rational number.

Theorem 1.8 (Density Theorem). For any a, b ∈ R with a < b, there exists a rational number r = m/n (m ∈ Z, n ∈ N) with a < r < b.

This is another way of saying that any real number can be approximated by rational

numbers, to arbitrary precision. Indeed, for any x ∈ R, consider a = x − 10^{−k} and b = x + 10^{−k}, for some k ∈ N. Then applying the Density Theorem to the interval (a, b), we obtain r ∈ Q with x − 10^{−k} < r < x + 10^{−k}, which is equivalent to |r − x| < 10^{−k}, that is, r

agrees with x up to the kth decimal place.

How about the irrational numbers? They’re dense in the real numbers too!

Corollary 1.9. For any a, b ∈ R with a < b, there exists an irrational number s with

a < s < b.

To prove the density of the irrationals, apply the Density Theorem to the interval (a/√2, b/√2), to get a rational number r ∈ Q with a < r√2 < b. Since the product of a rational number with an irrational number is always irrational (prove it!), the Corollary is verified with s = r√2.

We require two basic facts to prove the Density Theorem. These may seem obvious, but

can be proven using sup and inf.

Theorem 1.10 (Archimedean Principle). For any x ∈ R there exists n ∈ N with x < n.

In other words, the set N of natural numbers is not bounded from above.

Proof. To derive a contradiction, we assume that there is no such n ∈ N, that is, n ≤ x for

all n ∈ N. So x is an upper bound for the set S = N, and by the Completeness Theorem

there exists a supremum, u = supN.

By the two properties of the sup, u is an upper bound for N,

u ≥ n for all n ∈ N, (1.1)


and u− 1 is not an upper bound for N, so there exists m ∈ N with

u− 1 < m.

But then, u < m+ 1, and since (m+ 1) ∈ N, this contradicts (1.1). Thus, there must be an

n ∈ N with x < n.

Example 1.11. Let S = {1/n : n ∈ N}. Then inf S = 0.

First, ∀n ∈ N, 1/n > 0, so w = 0 is a lower bound. Second, for any t > 0, we apply the Archimedean Property to x = 1/t to conclude that there exists n ∈ N with n > 1/t. Hence, 1/n < t, and t > 0 is not a lower bound for S. By definition, inf S = 0.
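The argument is constructive, and a short sketch (added for illustration; the helper name witness_below is ours, not from the notes) shows how, for a given t > 0, one can exhibit an element of S below t:

```python
# Illustration of Example 1.11: for any t > 0 there is an n in N with 1/n < t,
# so no t > 0 is a lower bound of S = {1/n : n in N}.  The helper name is ours.
import math

def witness_below(t: float) -> int:
    """Return some n in N with 1/n < t, using the Archimedean choice n > 1/t."""
    return math.ceil(1 / t) + 1   # comfortably larger than 1/t

for t in [0.5, 0.01, 1e-7]:
    n = witness_below(t)
    print(t, n, 1 / n < t)        # the last column is True in each case
```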

Theorem 1.12. Let S ⊂ Z be nonempty and bounded below. Then S contains a minimal

element: there exists m ∈ S with m ≤ n for all n ∈ S.

Proof. Since S is bounded below, by the Completeness Axiom there exists w = inf S. If

w ∈ S then we are done. So assume w ∉ S, in order to derive a contradiction. We apply the two properties of the inf: first, w ≤ n for all n ∈ S. Since w ∉ S, in fact we have the

slightly stronger

w < n for all n ∈ S,

and w + 1 is not a lower bound for S, so there exists k ∈ S ⊂ Z for which

k < w + 1.

Since k > w and w is the greatest lower bound of S, k cannot itself be a lower bound for S, so there must be j ∈ S with j < k. Putting these all together,

w < j < k < w + 1.

In particular, 0 < k − j < (w + 1) − j < (w + 1) − w = 1, but k − j ∈ Z and the distance

between any two integers is at least one! So this is impossible, and we conclude that S has

a minimum element.

Applied to subsets of the natural numbers this property is called the “well-ordering

principle”:

Corollary 1.13 (Well-Ordering Principle). Any nonempty subset of N has a minimum ele-

ment.

The Corollary follows from the previous Theorem, since N is bounded below (by 1), and

so any nonempty subset of N is bounded below.

We’re now ready to prove the Density Theorem.


Proof of the Density Theorem. First we choose the denominator of r = m/n. By the Archimedean

Property, there exists n ∈ N with n > 1/(b − a), that is

1/n < (b − a). (1.2)

To find the numerator, consider the set S = {k ∈ Z : k > na}. Since S ⊂ Z and S is

bounded below by na, by Theorem 1.12 S contains a minimal element m, and so m ∈ S and

m − 1 ∉ S. This implies:

m > na and m− 1 ≤ na. (1.3)

Putting (1.2) and (1.3) together we get:

a < m/n = (m − 1)/n + 1/n ≤ a + 1/n < a + (b − a) = b.

Then the conclusion holds with r = m/n.
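The proof is constructive, and the following hedged sketch (the function rational_between and its floating-point details are our own illustration, not part of the notes) mirrors the two steps: choose the denominator by the Archimedean Property, then the numerator as in Theorem 1.12:

```python
# Sketch of the construction in the proof (our illustration, not the notes' text):
# pick n with 1/n < b - a, then m = the least integer strictly greater than n*a.
import math

def rational_between(a: float, b: float):
    assert a < b
    n = math.floor(1 / (b - a)) + 1   # Archimedean choice: n > 1/(b - a), so 1/n < b - a
    m = math.floor(n * a) + 1         # least integer with m > n*a
    return m, n                       # then a < m/n < b

m, n = rational_between(1.41, math.sqrt(2))
print(m, n, 1.41 < m / n < math.sqrt(2))   # a fraction strictly between the endpoints
```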


2 Sequences

A sequence of real numbers is a function f : N → R; for each counting number n ∈ N we associate to it a real number f(n) = xn. There are many different ways of denoting the

sequence. Here are a few which you may see:

(x1, x2, x3, . . . ) = (xn) = (xn)n∈N = (xn : n ∈ N) .

The text also writes X = (xn), using a single capital letter for the sequence as a whole.

Example 2.1. (a) The function f can be explicitly given. For instance, xn = (2n² + 1)/n²:

((2n² + 1)/n²)n∈N = (3, 9/4, 19/9, 33/16, . . . ).

(b) (xn) = ((−1)^n)n∈N = (−1, 1, −1, 1, −1, . . . ).

(c) We may define sequences by iteration. Let g : R→ R be a given real-valued function.

Choose an initial value x1 ∈ R and then define the sequence iteratively,

xn+1 = g (xn) , n = 1, 2, 3, . . .

This is a natural way to define sequences, for example to approximate solutions to equa-

tions. For a more specific example, take g(x) = (1/2)(x + 2/x), and generate the sequence (xn) by iteration:

x1 = 2 and xn+1 = (1/2)(xn + 2/xn), n = 1, 2, 3, . . . .

The first few values are:

(xn) = (2, 1.5, 1.416, 1.414215686 . . . , 1.414213562 . . . , . . . )

Later, we will prove that this sequence converges to √2.
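A few lines of Python (an illustration we add here; the notes only list the values) reproduce these first values of the iteration:

```python
# First values of the iteration x_{n+1} = (x_n + 2/x_n)/2 with x_1 = 2 (illustrative).
def g(x: float) -> float:
    return 0.5 * (x + 2.0 / x)

x = 2.0
for n in range(1, 6):
    print(n, x)   # 2, 1.5, 1.4166..., 1.4142156..., 1.4142135...
    x = g(x)
```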

Note that a sequence is not the same thing as a set. A sequence is an infinite ordered

list of numbers. In a sequence the same number may appear several times, and changing

the order of the elements of a sequence creates an entirely different sequence. A set has no

order, and there is no point in repeating the same value several times. Taking Example (b)

above, {xn : n ∈ N} = {−1,+1} is a set with two elements. The sequence (xn) is not the

same thing. If we let (yn)n∈N = (−1, −1, 1, 1, −1, −1, 1, 1, . . . ), this is a different sequence than (xn)n∈N, yet it takes values in the same set {yn : n ∈ N} = {−1, +1}.


2.1 Limits of Sequences

Definition 2.2. We say that the sequence (xn) converges to x ∈ R if: for every ε > 0 there exists K ∈ N so that |xn − x| < ε for every n ≥ K.

We write x = limn→∞ xn, or xn → x as n → ∞.

If there is no value of x for which xn converges to x, we say (xn) diverges.

Example 2.1 (a) converges to x = 2: xn = (2n² + 1)/n² → 2 as n → ∞. Let's verify this via the definition: let ε > 0 be given. We calculate

|xn − 2| = |(2n² + 1)/n² − 2| = |((2n² + 1) − 2n²)/n²| = |1/n²| = 1/n². (2.1)

We need to determine when the right-hand side 1/n² < ε. This is true when n > 1/√ε. By the Archimedean property, we may choose K ∈ N with K > 1/√ε, so for all n ≥ K, n ≥ K > 1/√ε and so 1/n² < ε for n ≥ K. Plugging back into (2.1) we have

|xn − 2| = 1/n² < ε ∀n ≥ K,

and so xn → 2 by definition.
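The choice of K in this verification can also be tested numerically; the following sketch is our illustration (the specific ε and the range of n checked are arbitrary choices):

```python
# Checking the epsilon-K bookkeeping for x_n = (2n^2 + 1)/n^2 -> 2 (illustrative).
import math

eps = 0.001
K = math.floor(1 / math.sqrt(eps)) + 1        # any K > 1/sqrt(eps) works
ok = all(abs((2*n*n + 1) / (n*n) - 2) < eps for n in range(K, K + 1000))
print(K, ok)                                   # ok is True: |x_n - 2| < eps for n >= K
```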

Remark 2.3. By a Practice Problem, the inequality condition |xn − x| < ε is equivalent to

two inequalities, above and below,

x− ε < xn < x+ ε.

Another more geometrical way to write this is in terms of open intervals,

xn ∈ (x− ε , x+ ε) .

The interval (x− ε , x+ ε) is called an open neighborhood of size ε centered at x. So

xn → x (as n → ∞) is equivalent to saying that the elements of the sequence lie in (x − ε, x + ε) eventually always, that is, for all n ≥ N(ε).

We verify the following properties of the limit for sequences:

Theorem 2.4. Assume the sequence (xn)n∈N is convergent, x = limn→∞

xn.

(a) The limit is unique: if limn→∞

xn = y, then y = x.

(b) (xn)n∈N is bounded; that is, ∃ M ∈ R with |xn| ≤M , ∀n ∈ N.


(c) If xn ≥ 0 ∀n ∈ N, then x ≥ 0.

(d) If a ≤ xn ≤ b ∀n ∈ N, then a ≤ x ≤ b.

Be careful: a bounded sequence may not be convergent! Note also that if we have strict

inequality in any of the above, for instance, if xn > 0 ∀n ∈ N, we may not conclude that

x > 0. (Exercise: find examples of sequences which demonstrate these two remarks!)

Proof. For (a), suppose that there are two different limits, y ≠ x. We apply the definition

of convergence twice: for any ε > 0, there exist two numbers N1 = N1(ε) ∈ N and N2 =

N2(ε) ∈ N, so that

|xn − x| < ε ∀ n ≥ N1 and |xn − y| < ε ∀ n ≥ N2.

By taking N = max{N1, N2}, then both of the above conditions hold,

|xn − x| < ε and |xn − y| < ε ∀n ≥ N.

This is true for any ε > 0, and here we will choose ε = |y − x|/2. Then, using the Triangle

Inequality trick,

|x− y| = |(x− xn) + (xn − y)| ≤ |xn − x|+ |xn − y| < 2ε = |x− y|, ∀n ≥ N,

which is impossible (because of the strict inequality.) This proves (a).

For (b), apply the definition of limit with ε = 1, so ∃N ∈ N for which |xn − x| < 1

whenever n ≥ N . Then, by the triangle inequality,

|xn| = |(xn − x) + x| ≤ |xn − x|+ |x| < 1 + |x|, ∀n ≥ N.

This gives a bound on the absolute value of all but finitely many of the xn; for the finite

collection x1, . . . , xN−1, one of these has the largest absolute value, and for every n ∈ N,

|xn| ≤ max {|x1|, |x2|, . . . , |xN−1|, 1 + |x|} .

The right hand side is a number independent of n, and so the sequence {xn} is bounded.

For (c), argue by contradiction and suppose that the limit x < 0. Let ε = |x| > 0, and

apply the definition of limit: ∃N ∈ N for which |xn − x| < ε, whenever n ≥ N . Unfolding

the absolute value into two inequalities, we have

x− ε < xn < x+ ε ∀n ≥ N,

but we only need the upper bound on xn, and so we keep

0 ≤ xn < x+ ε = x+ |x| = 0 ∀n ≥ N,


since we are assuming xn ≥ 0 but x < 0. This is impossible (strict inequality!) and hence

x ≥ 0.

For (d), let yn = xn − a, and so by hypothesis yn ≥ 0 ∀n ∈ N. Since

|yn − (x − a)| = |(xn − a) − (x − a)| = |xn − x|,

by the definition of the limit yn −−−→n→∞

(x− a). Using part (c) we conclude that (x− a) ≥ 0,

in other words, x ≥ a. To verify that x ≤ b, define the sequence zn = (b − xn) ≥ 0, and

argue as above. (Exercise!)

Here’s an old friend from calculus:

Theorem 2.5 (Squeeze Theorem). Suppose (xn)n∈N, (yn)n∈N, and (zn)n∈N are sequences in

R with:

(i) xn ≤ yn ≤ zn ∀n ∈ N; and

(ii) (xn)n∈N, (zn)n∈N are convergent, with

limn→∞

xn = L = limn→∞

zn.

Then (yn)n∈N is convergent, and limn→∞

yn = L.

Proof. This is a very simple consequence of the definition of convergence. As in the proof of

(a) above, for any ε > 0 there exists a single value N = N(ε) ∈ N for which both

|xn − L| < ε and |zn − L| < ε, ∀n ≥ N.

Using a Practice Problem from Chapter 2 in the book, the inequality with the absolute value

may be written as a double inequality,

L− ε < xn < L+ ε and L− ε < zn < L+ ε, ∀n ≥ N.

Combining the lower bound for xn from the first with the upper bound for zn from the second, together with (i), we have:

L − ε < xn ≤ yn ≤ zn < L + ε, ∀n ≥ N.

Reading off the outer parts of this chain, L − ε < yn < L + ε, ∀n ≥ N, which is equivalent to yn → L as n → ∞.

Remark 2.6. We can also think of convergence to a limit in terms of the limit of the distance

from xn to the limit value x. That is, call this distance rn = |xn−x|, ∀n ∈ N. Then, rn ≥ 0,

and so the limit exists if and only if: ∀ε > 0 there exists N ∈ N so that

0 ≤ rn = |xn − x| < ε, ∀n ≥ N.

Notice that this statement is equivalent to limn→∞

rn = 0.


Theorem 2.7. Assume (xn)n∈N and (yn)n∈N are convergent, xn −−−→n→∞

x and yn −−−→n→∞

y.

Then each of the following combinations is convergent:

(a) xn + yn → x + y;

(b) xn yn → x y;

(c) If y ≠ 0, xn/yn → x/y.

The proof of the above Theorem may be found in section 3.2 of the textbook. To give

you an idea of how to do it (Triangle Inequality!), consider (b): write

|xnyn − xy| = |xnyn − xny + xny − xy| = |xn(yn − y) + (xn − x)y|

≤ |xn| |yn − y|+ |xn − x| |y|.

By Theorem 2.4 (b), convergent sequences are bounded so there exists M ∈ R such that

|xn| ≤M . Hence,

|xnyn − xy| ≤M |yn − y|+ |y| |xn − x|. (2.2)

(Note that |y| is a constant!) Now we can either proceed by the definition (for any ε > 0,

choose N = N(ε) such that both

M |yn − y| < ε/2 and |y| |xn − x| < ε/2,

∀n ≥ N,) or by using Remark 2.6, observing that the right-hand side of (2.2) tends to zero and substituting above. The others may be proven in a similar way.

Example 2.8. We will show that limn→∞ n^{1/n} = 1. First, note that for n ≥ 2, n^{1/n} > 1. So we can write

n^{1/n} = 1 + bn, with bn > 0.

Hence, taking the nth power and applying the Binomial Theorem,

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^{n−k} b^k,

we have:

n = (1 + bn)^n = Σ_{k=0}^{n} (n choose k) bn^k > 1 + [n(n − 1)/2] bn².


The inequality in the last line is valid since the terms omitted from the sum are all positive,

and hence the last line is smaller than the sum above it.

Rearranging, we have (n − 1) > [n(n − 1)/2] bn², and hence 0 < bn < √(2/n), ∀n ≥ 2. Recalling the definition of bn,

1 < n^{1/n} = 1 + bn < 1 + √(2/n)

for all n ≥ 2. By the Squeeze Theorem, we obtain the desired limit.
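Numerically, the squeeze 1 < n^{1/n} < 1 + √(2/n) is easy to observe; the following check is only an illustration, not part of the argument:

```python
# Observing 1 < n**(1/n) < 1 + sqrt(2/n) for a few n (illustrative only).
import math

for n in [2, 10, 100, 10_000]:
    val = n ** (1 / n)
    upper = 1 + math.sqrt(2 / n)
    print(n, val, upper, 1 < val < upper)   # the bound holds for each n >= 2
```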


2.2 Monotone Sequences

Definition 2.9. Let (xn)n∈N be a sequence in R.

(i) The sequence is monotone increasing if x1 ≤ x2 ≤ x3 . . . , ie, if xn ≤ xn+1 ∀n ∈ N.

(ii) The sequence is monotone decreasing if x1 ≥ x2 ≥ x3 . . . , ie, if xn ≥ xn+1 ∀n ∈ N.

(iii) The sequence is bounded above if ∃M ∈ R with xn ≤M ∀n ∈ N.

(iv) The sequence is bounded below if ∃m ∈ R with xn ≥ m ∀n ∈ N.

If the sequence is both bounded above and bounded below, then it is bounded in the

sense of Theorem 2.4 (b). (Check it!) If (i) or (ii) hold with strict inequalities, ie, xn < xn+1

∀n ∈ N, we say the sequence is strictly monotone increasing.

Example 2.10. Define a sequence by iteration:

x1 = 0, xn+1 = √(5 + xn), n ∈ N.

So

X = (xn)n∈N = (0, √5, √(5 + √5), √(5 + √(5 + √5)), . . . ).

Typically with a sequence defined via iteration it’s not possible to have a simple explicit

formula for xn; to get x24 you first need to calculate the first 23 values.

We first claim that 0 ≤ xn < 5 ∀n ∈ N. To verify this we use induction. First, when

n = 1, 0 = x1 < 5, so the claim is true for n = 1. Next, assume that the claim is true for xn,

and show it must hold for xn+1. Indeed, if 0 ≤ xn < 5, then xn+1 = √(5 + xn) > 0 (square roots are positive), and

xn+1 = √(5 + xn) < √(5 + 5) = √10 < √25 = 5,

so the claim must be true for all n.

In particular, by the claim, (xn)n∈N is bounded above and below.

We also claim that (xn) is monotone increasing, xn ≤ xn+1 for all n. Again we use

induction: when n = 1, x1 = 0 < √5 = x2, so it's true. Assuming xn−1 ≤ xn, we calculate

xn+1 − xn = √(5 + xn) − √(5 + xn−1)
          = [(5 + xn) − (5 + xn−1)] / [√(5 + xn) + √(5 + xn−1)]     [using a − b = (a² − b²)/(a + b)]
          = (xn − xn−1) / [√(5 + xn) + √(5 + xn−1)]
          ≥ 0,

by the assumption xn ≥ xn−1 (and since the denominator is positive.) Therefore xn+1 ≥ xn

holds for all n.


Now, a sequence which is monotone increasing and bounded above is hemmed in, and has

nowhere to go. It must always move to the right, but can never get past the upper bound;

in this case, xn ≤ xn+1 ≤ 5 for all n ∈ N. So the values have to get compressed together as

n increases, that is, the sequence must converge:

Theorem 2.11 (Monotone Sequence Theorem). (a) If (xn)n∈N is monotone increasing and

bounded above, then (xn)n∈N is convergent; ∃x ∈ R with xn −−−→n→∞

x.

(b) If (xn)n∈N is monotone decreasing and bounded below, then (xn)n∈N is convergent; ∃x ∈ R with xn → x as n → ∞.

Returning to Example 2.10, the sequence (xn) defined there is monotone increasing and

bounded above, so by the Monotone Sequence Theorem it is convergent, that is xn → x as n → ∞.

But what is the limit x? We can find an equation for it by passing to the limit in the iteration

equation, xn+1 = √(5 + xn) ∀n ∈ N. This implies:

xn+1 = √(5 + xn), ∀n ∈ N   =⇒   x² = limn→∞ (xn+1)² = limn→∞ (5 + xn) = 5 + x,

and so x is a solution of the equation x² − x − 5 = 0. By the quadratic formula, x = 1/2 ± (1/2)√21 are the roots of the polynomial. Since xn ≥ 0 ∀n, by Theorem 2.4 the limit x ≥ 0, and we conclude that x = (1/2)(1 + √21) = limn→∞ xn.
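A short computation (illustrative only, not part of the proof) shows the iterates approaching (1 + √21)/2 ≈ 2.7912878:

```python
# The iterates of x_{n+1} = sqrt(5 + x_n), x_1 = 0, approach (1 + sqrt(21))/2 (illustrative).
import math

x = 0.0
for _ in range(30):
    x = math.sqrt(5 + x)
print(x, (1 + math.sqrt(21)) / 2)   # the two printed values agree to many digits
```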

If the terms “bounded above” and “bounded below” sound familiar from supremum and

infimum, it’s no accident. For a monotone increasing sequence, we will show below that

limn→∞

xn = sup{xn | n ∈ N}, that is, we define the set of numbers included in the sequence,

S = {xn | n ∈ N}, and

limn→∞

xn = supS = sup{xn | n ∈ N}.

Often, this is written as supn∈N

xn for convenience. Similarly, if the sequence is monotone

decreasing,

limn→∞ xn = infn∈N xn = inf{xn | n ∈ N}.

The Monotone Sequence Theorem is logically equivalent to the Completeness Theorem

(the existence of the supremum for sets which are bounded above.) In other words, we could

have chosen to take the Monotone Sequence Theorem as an axiom for the completeness of

R, and used it to prove that every set S which is bounded above has a supremum. To really

prove one or the other is to “construct R from Q”, and come to grips with what kind of

beast R really is. But let’s show that they really are equivalent!


Completeness Theorem =⇒ Monotone Sequence Theorem. Let (xn)n∈N be monotone increas-

ing and bounded above, and define (as above) the set

S = {xn | n ∈ N}.

Then the set S is bounded above, and by the Completeness Theorem there exists a supremum

x ∈ R, x = supS. We need to show that xn −−−→n→∞

x.

First, x is an upper bound for S, so xn ≤ x ∀n ∈ N. It is the smallest upper bound for

S, so given any ε > 0 there exists an element xK ∈ S (so ∃K ∈ N,) with x− ε < xK . Since

the sequence is monotone increasing, xK ≤ xn ∀n ≥ K. Putting these inequalities together,

x− ε < xK ≤ xn ≤ x < x+ ε, ∀n ≥ K,

and therefore xn −−−→n→∞

x.

Monotone Sequence Theorem =⇒ Completeness Theorem. This one is trickier, but it’s more

fun! Let S ⊂ R be a nonempty set which is bounded above. The idea is to define two se-

quences, xn ∈ S and a sequence of upper bounds vn for S, each of which converges monoton-

ically to supS. Start by taking any element x1 ∈ S and any upper bound v1. Then x1 ≤ v1,

and if x1 = v1 then they must agree with the supremum. (We had a practice problem like

that!) Suppose they’re not the same, and let r1 = v1 − x1 > 0.

Now look at the midpoint between these points, y1 = (1/2)(x1 + v1). If y1 is an upper bound for S, we define the second point in each sequence by v2 = y1 and x2 = x1. Since v2 = y1 is the midpoint, the distance between x2 and v2 is half of what it was, r2 = v2 − x2 = (1/2) r1. On the other hand, if y1 is not an upper bound for S, then there exists x2 ∈ S with x1 < y1 < x2. In this case, we keep the old upper bound, v2 = v1. In choosing x2 > y1 the interval (x2, v2) has shrunk by at least half, so r2 = v2 − x2 ≤ (1/2) r1. So in either case, we have found x2 ∈ S and an upper bound v2 so that

x1 ≤ x2 ≤ v2 ≤ v1, and r2 = v2 − x2 ≤ (1/2) r1.

Now proceed by iteration. Assume we have already found x1, . . . , xn ∈ S and upper

bounds v1, v2, . . . , vn with

x1 ≤ x2 ≤ · · · ≤ xn−1 ≤ xn ≤ vn ≤ vn−1 ≤ · · · ≤ v2 ≤ v1,

and rn = vn − xn ≤ 2^{−n+1} r1. Following the above procedure, we ask if the midpoint yn = (1/2)(xn + vn) is an upper bound for S or not. If it is, we keep xn+1 = xn but swap vn+1 = yn. If not, we find xn+1 ∈ S with xn+1 > yn > xn and keep the upper bound vn+1 = vn. The distance between them satisfies rn+1 ≤ (1/2) rn ≤ 2^{−n} r1 → 0 as n → ∞.


Since (xn) is monotone increasing and bounded above (by v1,) by the Monotone Sequence

Theorem it converges, xn −−−→n→∞

x. Similarly, (vn) is monotone decreasing and bounded below

(by x1,) so it also converges, vn −−−→n→∞

v. Since 0 ≤ vn − xn ≤ rn −−−→n→∞

0, by the Squeeze

Theorem we must have x = v. So we only need to show that v = supS.

First, take any y ∈ S. Since each vn is an upper bound for S, y ≤ vn ∀n ∈ N. By

Theorem 2.4 (d), y ≤ v, and so v is an upper bound for S. Now, for any ε > 0, since

xn → v there exists K ∈ N with v − ε < xK < v + ε. In particular, xK ∈ S, and so the first

inequality shows that v − ε is not an upper bound for S. We conclude that v = supS, and

so the supremum exists.
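The bisection idea in this proof can be illustrated concretely. The sketch below is our own illustration, run on the specific set S = {x : x² ≤ 2} where sup S = √2; the membership and upper-bound tests are stand-ins chosen for this particular S:

```python
# Bisection illustration of the construction above, on the concrete set
# S = {x : x*x <= 2}, whose supremum is sqrt(2).
def in_S(x: float) -> bool:
    return x * x <= 2

def is_upper_bound(v: float) -> bool:
    return v * v >= 2        # for v >= 0, v bounds S above exactly when v >= sqrt(2)

x, v = 0.0, 2.0              # x is in S, v is an upper bound
for _ in range(50):
    y = 0.5 * (x + v)        # midpoint
    if is_upper_bound(y):
        v = y                # keep x, shrink the upper bound
    else:
        assert in_S(y)       # for this S, a non-upper-bound midpoint lies in S
        x = y                # move the point of S upward
print(x, v)                  # both are within 2/2**50 of sqrt(2) = 1.41421356...
```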

As a corollary of the above construction, we obtain an interesting and clarifying fact

about the supremum and infimum.

Theorem 2.12. Let S ⊂ R be a nonempty set, bounded above. Then there exists a sequence

(xn) with xn ∈ S ∀n, monotone increasing, for which xn −−−→n→∞

supS.

Of course, the same applies to the infimum, except that the sequence will be monotone

decreasing.

Another interesting application (done in class) is constructing sequences which converge

monotonically to √a, for any a > 0. See Example 3.3.5 in the text.

Exercise 2.13. We return to Example 2.1 (c),

x1 = 2 and xn+1 = (1/2)(xn + 2/xn), n = 1, 2, 3, . . . ,

to show limn→∞ xn = √2.

(a) Use induction to show that xn² − 2 ≥ 0 for all n ∈ N.

(b) Use induction to show that (xn) is monotone decreasing in n.

(c) Use the Monotone Sequence Theorem to show convergence, xn −−−→n→∞

x, and identify x

as the solution of a polynomial equation.

About Induction. We hope that you’ve already seen mathematical induction somewhere

else before. But we remind you of it here, and in the spirit of healthy skepticism about all

things, (which we encourage in studying math,) we show that it isn’t magic.

Proposition 2.14. For each n ∈ N, let P (n) denote some statement involving the value of

n. If we can show both:


(1) P (1) is true; and

(2) For every n ∈ N, if P (n) is assumed to be true then P (n+ 1) is true,

then P (n) is true for all n ∈ N.

Proof. Let S = {n ∈ N | P (n) is false.}. To derive a contradiction, we suppose that S is not

empty. By the Well Ordering Principle (Corollary 1.13), S has a minimal element. By (1), P(1) is true, so 1 ∉ S and the minimal element is at least 2; call it (k + 1), so (k + 1) ∈ S ⊂ N but for any n ≤ k, n ∉ S. Therefore, P(k) is true. But, by (2), P(k) true implies P(k + 1) is also true, and this contradicts (k + 1) ∈ S.

From the proof, we can see that the following version of induction is also verified:

If we can show both:

(1) P (1) is true; and

(2) For every k ∈ N, if P (n) is assumed to be true for all n ≤ k, then P (k + 1) is true,

then P (n) is true for all n ∈ N.

Sometimes this form of induction is needed (for example, for sequences defined by iteration

involving several previous terms.) This is discussed in section 1.2 of the textbook.


2.3 Divergent Sequences and Subsequences

Although we prefer sequences which converge, there are also many interesting things to learn

about divergent sequences. Sequences can diverge in various ways, some more interesting

than others.

The simplest kind of divergence is called proper divergence in the book.

Definition 2.15 (Properly divergent sequences). Let (xn)n∈N be a sequence in R.

(a) We say the sequence properly diverges to +∞ if:

∀a > 0 ∃ K ∈ N so that xn > a ∀n ≥ K.

We write limn→∞

xn = +∞ or xn −−−→n→∞

+∞, even though the sequence does not converge.

(b) We say the sequence properly diverges to −∞ if:

∀b < 0 ∃ K ∈ N so that xn < b ∀n ≥ K.

We write limn→∞

xn = −∞ or xn −−−→n→∞

−∞, even though the sequence does not converge.

So a sequence diverges to infinity if, for any fixed number a > 0, xn is eventually always

larger than a.

Example 2.16. Let xn = √n (3 + sin(n)), n ∈ N. Then xn → +∞ as n → ∞.

Choose any a > 0. Since sin(n) ≥ −1 for any n, we have

xn ≥ √n (3 − 1) = 2√n > a

provided n > a²/4. So, take K ∈ N with K > a²/4. If n ≥ K > a²/4, by the above calculation, xn > a, and by definition xn → +∞.

Proposition 2.17. If (xn)n∈N is monotone increasing and not bounded above, then xn −−−→n→∞

+∞ (properly divergent).

If (xn)n∈N is monotone decreasing and not bounded below, then xn −−−→n→∞

−∞ (properly

divergent).

Proof. Take any a > 0. If (xn)n∈N is unbounded above, then in particular a is not an upper

bound for the sequence, so there exists K ∈ N so that a < xK . Since the sequence is

monotone increasing, xK ≤ xn for all n ≥ K, and therefore

a < xK ≤ xn ∀ n ≥ K.

By definition, xn −−−→n→∞

+∞.

The second statement is similar, and is left as an exercise.


Example 2.18. Let r > 1 and define the sequence xn = r^n, n ∈ N. Then, xn+1 = r^{n+1} = r xn > xn ∀n ∈ N, so (xn)n∈N is (strictly) monotone increasing. In particular, notice that xn > 1 ∀n.

We claim that the sequence is not bounded above, in which case we can apply Propo-

sition 2.17 to conclude xn −−−→n→∞

+∞. We argue by contradiction, and suppose that (xn)

is bounded above. In that case, by the Monotone Sequence Theorem, xn = r^n → x for

x for

some x ∈ R. But we pass to the limit in the equation xn+1 = r xn, to obtain the equation

x = r x. Since r > 1, we must have x = 0. But, by Theorem 2.4, x ≥ 1, a contradiction.

Therefore, the sequence (xn) is unbounded, and Proposition 2.17 applies.

If the sequence isn’t monotone the variety of divergent behavior is much greater.

Example 2.19. Define the sequence

xn = n sin[nπ/2] = { 0, if n = 2k, k ∈ N (even);  n, if n = 4k − 3, k ∈ N;  −n, if n = 4k − 1, k ∈ N }
    = (1, 0, −3, 0, 5, 0, −7, 0, . . . ).

The sequence is unbounded above and unbounded below. It must therefore diverge (The-

orem 2.4!) but it does not properly diverge to either ±∞. Instead, it breaks down into

pieces, each of which has different limiting behavior. We call those parts of the sequence

subsequences.

Definition 2.20. (a) Let X = (xn)n∈N be a sequence in R, and let (nk)k∈N be a strictly

increasing sequence of counting numbers: nk ∈ N and n1 < n2 < n3 < . . . . We call the

new sequence (xnk)k∈N a subsequence of the original sequence X.

(b) If (xnk)k∈N is a subsequence of X = (xn)n∈N which converges, and y = limk→∞ xnk, we

call y a subsequential limit point of the sequence X.

[Often we will drop “subsequential” and simply call y a “limit point” of the sequence.

Other books use the term cluster point.]

So a subsequence is a sequence which is extracted from the original sequence. It must be

itself a sequence (an infinite ordered list) and it must take elements from the original sequence

in the same order as they originally appeared. You can think of making a subsequence by

eliminating undesirable elements from X, but still leaving an infinite number.

In the previous example, there are three natural choices of subsequences. First, the

even elements, indexed by nk = 2k, k ∈ N are x2k = 0. This subsequence is convergent,

0 = limk→∞

x2k, and so y = 0 is a subsequential limit point of the original sequence.


Another interesting subsequence is defined by the indices nk = 4k − 3, k ∈ N, with

x4k−3 = nk = 4k − 3. This subsequence is properly divergent to +∞. The third distinct

subsequence is defined by the choice nk = 4k− 1, and so x4k−1 = −(4k− 1) −→ −∞ is also

a properly divergent subsequence. We do not call ±∞ subsequential limit points, as they

are not real numbers.

Exercise 2.21. Consider the following crazy looking sequence:

(xn)n∈N = (1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 2/6, 3/6, 4/6, 5/6, . . . ).

[Notice that each xn ∈ (0, 1), and every rational number between 0 and 1 is somewhere on

the list!] What is the set of all possible subsequential limit points of (xn)?

What if the sequence is actually convergent: do we learn anything more by looking at

subsequences? NO:

Theorem 2.22. If xn −→ x (is convergent), then every subsequence (xnk)k∈N converges (as

k →∞) to x also.

This statement almost does not require proof. For, if xn −→ x, then by definition, ∀ε > 0

∃K ∈ N so that |x − xn| < ε ∀n ≥ K. The subsequence consists of elements of (xn), so

whenever k is large enough that nk ≥ K we have |x − xnk| < ε, and so the subsequence

(xnk)k∈N converges to x too.

The contrapositive of Theorem 2.22 is useful to determine divergence of sequences:

Corollary 2.23. If (xn)n∈N has two different subsequential limits (ie, it contains two subse-

quences which converge to distinct values,) then the sequence is divergent.

Example 2.24. The sequence xn = (−1)n is divergent, since it has two distinct subsequential

limits. The odd subsequence x2k−1 −→ −1 as k → ∞, and the even subsequence x2k −−−→k→∞

+1 as k →∞. By Corollary 2.23 we conclude that the whole sequence must be divergent.

If a sequence diverges, must it always have (subsequential) limit points? A properly

divergent sequence has no limit points (why?) so we need to restrict ourselves a bit to get a

positive answer:

Theorem 2.25 (Bolzano-Weierstrass Theorem). Let (xn)n∈N be a bounded sequence in R.

Then (xn)n∈N contains a convergent subsequence.

More precisely, if xn ∈ [a, b] ∀n ∈ N, then there exists y ∈ [a, b] and a subsequence

(xnk)k∈N with limk→∞ xnk = y.


There are several different proofs of this important theorem. We will use one based on

this interesting fact about any sequence of real numbers:

Lemma 2.26. Every sequence (xn)n∈N contains a monotone subsequence.

The subsequence we find might be monotone increasing or monotone decreasing; it de-

pends on the sequence (xn)n∈N.

Proof of Lemma 2.26. Given a sequence (xn) of real numbers, we define the set of its “peaks”,

P = {k ∈ N | xk ≥ xn ∀n ≥ k}.

The idea is that if we graph the sequence in the plane with points (n, xn) and connect the

dots, then we get a mountain range, and standing at the peak points our view to the right

is not blocked by the rest of the sequence. Notice that for a monotone decreasing sequence,

every n is a peak, while for a strictly increasing sequence, no n is a peak.

If P is an infinite set, then it is an infinite subset of N and can be written as a sequence,

(nj)j∈N, that is the set

P = {n1 < n2 < n3 < · · · } = {nj | j ∈ N}.

Hence, it defines a subsequence (xnj)j∈N, and each index nj is a peak, so xnj ≥ xnj+1 ∀j ∈ N; this is a monotone decreasing subsequence.

If P is not an infinite set (ie, it has only finitely many elements, possibly none,) then let K be its largest element (take K = 0 if P is empty). Then, n1 = K + 1 is not a peak, so there exists n2 > n1 for which

xn2 > xn1 . Since n2 > K it is not a peak either, so ∃n3 > n2 > n1 with xn3 > xn2 > xn1 .

Continuing in this way, we construct a strictly monotone increasing subsequence.
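For a finite prefix of a sequence, the set of peaks is easy to compute; the following sketch (illustrative only, applied to a made-up list of values) shows the peak indices yielding a non-increasing subsequence:

```python
# Computing the "peaks" of a finite prefix of a sequence (illustrative only):
# k is a peak if x_k >= x_n for every later index n in the prefix.
xs = [3.0, 1.0, 4.0, 1.5, 2.0, 1.2, 1.1, 0.9, 0.95, 0.7]

peaks = [k for k in range(len(xs))
         if all(xs[k] >= xs[n] for n in range(k, len(xs)))]
print(peaks)                    # [2, 4, 5, 6, 8, 9]
print([xs[k] for k in peaks])   # [4.0, 2.0, 1.2, 1.1, 0.95, 0.7]: non-increasing
```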

The proof of the Bolzano-Weierstrass Theorem is then very easy:

Proof of B-W. Assume (xn)n∈N is a bounded sequence, so there exists M ∈ R with |xn| ≤M

∀n ∈ N. By Lemma 2.26 it contains a monotone subsequence (xnk)k∈N (either increasing

or decreasing.) As the whole sequence is bounded, so is the subsequence, |xnk| ≤ M ∀k ∈

N. Thus, by the Monotone Sequence Theorem the subsequence converges: ∃x ∈ R with

limk→∞ xnk= x. If a ≤ xn ≤ b ∀n ∈ N, then by Theorem 2.4 we must have the limit

a ≤ x ≤ b also.

Finally, for a divergent sequence (xn) we distinguish the two most important subsequen-

tial limits:


Definition 2.27. If X = (xn) is a bounded sequence, let S be the set of all subsequential

limits of X. The limit superior of X is the supremum of this set, and the limit inferior is

its infimum,

lim supn→∞

xn = supS, lim infn→∞

xn = inf S.

If S is unbounded above, we define lim supn→∞ xn = +∞, and if S is unbounded below, define

lim infn→∞ xn = −∞.

Example 2.28. a. (xn) = (2n cos(nπ) / (2n − 1)) = (−2, +4/3, −6/5, +8/7, −10/9, . . . ).

The odd subsequence x2k−1 → −1, which is a subsequential limit point. The even subsequence also converges, x2k → +1, another limit point. In fact, these are the only two, so S = {−1, +1} is the set of all limit points. Therefore, lim supn→∞ xn = +1, and lim infn→∞ xn = −1. Notice that inf{xn | n ∈ N} = −2, which is not the same as the liminf!

b. (xn)n∈N = (2^{n(−1)^n}) = (1/2, 4, 1/8, 16, 1/32, 64, . . . ).

Verify yourself that there is an unbounded subsequence, and a subsequence converging to 0, so lim supn→∞ xn = +∞ while lim infn→∞ xn = 0.

Proposition 2.29. Let (xn)n∈N be a bounded sequence. Then, lim infn→∞ xn ≤ lim supn→∞ xn,

and they are equal if and only if (xn)n∈N converges.

Proof. The infimum of any set is smaller than or equal to the supremum, so lim infn→∞ xn ≤ lim supn→∞ xn.

If (xn)n∈N is convergent to x, then x is its only limit point (remember, all subsequences

of a convergent sequence converge to the same limit,) and so S = {x} has supS = x = inf S.

To prove the converse, suppose lim infn→∞ xn = lim supn→∞ xn = L but (xn)n∈N diverges.

Since L is not the limit of (xn)n∈N, we have:

∃ε > 0 so that ∀N ∈ N there exists n ≥ N with |xn − L| ≥ ε.

Let’s use this to construct a subsequence! First, take N = 1: there exists n1 ≥ 1 with

|xn1 − L| ≥ ε. Next, take N = n1 + 1: there exists n2 ≥ n1 + 1 with |xn2 − L| ≥ ε. Then,

take N = n2 + 1: there exists n3 ≥ n2 + 1 with |xn3 − L| ≥ ε. Continue like this forever.

You get indices n1 < n2 < n3 < · · · and a subsequence (xnk)k∈N with |xnk

− L| > ε, that is,

the subsequence does not converge to L.

Since the original sequence was bounded, so is the subsequence. But the Bolzano-

Weierstrass Theorem says that any bounded sequence contains a convergent subsequence,

so (xnk)k∈N has a further subsequence which must converge, to a limit point y ≠ L. But

we are assuming that there is only one limit point, and so this is a contradiction.


There are formulas for liminf and limsup (which are in the book, but I will not prove

them here):

Lemma 2.30. For any sequence (xn)n∈N (bounded or not),

lim supn→∞ xn = infk∈N (supn≥k xn) = limk→∞ (supn≥k xn),

lim infn→∞ xn = supk∈N (infn≥k xn) = limk→∞ (infn≥k xn).
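These formulas can be observed numerically on a long finite prefix of the sequence from Example 2.28a; the computation below is an added illustration, using a finite prefix as a stand-in for the infinite tails:

```python
# Tail suprema and infima for x_n = 2n*cos(n*pi)/(2n - 1) (Example 2.28a),
# computed on a finite prefix (illustrative only).
import math

N = 2000
xs = [2*n*math.cos(n*math.pi) / (2*n - 1) for n in range(1, N + 1)]

for k in [1, 10, 100, 1000]:
    tail = xs[k - 1:]
    print(k, max(tail), min(tail))   # sup decreases toward +1, inf increases toward -1
```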

2.4 Cauchy Sequences

Cauchy introduced a concept of convergence which looks like the usual definition, except

that it makes no mention of the limit value.

Definition 2.31. (xn)n∈N is a Cauchy sequence if: ∀ε > 0, ∃H ∈ N for which

|xn − xm| < ε ∀n,m ≥ H.

A sequence is Cauchy if eventually (n ≥ H) every element is arbitrarily close to every

other element (m ≥ H also.) Notice that n,m are treated symmetrically in this definition,

but often it will be convenient to indicate which one is the larger one, so we are permitted

to choose one to be the larger, ie,

|xn − xm| < ε ∀n > m ≥ H.

The definition of Cauchy sequences looks very much like that of limits, and indeed for

sequences in R the two are in fact equivalent.

Theorem 2.32. For any real number sequence, (xn)n∈N is a Cauchy sequence if and only if

(xn)n∈N is convergent.

This theorem is yet another equivalent statement of the Continuum Property of the

reals. That is, each of the following statements is logically equivalent, in the sense that if

you assume any one of them to be true we can prove the others are true:

(a) Every set S ⊂ R which is bounded above has a supremum u = supS ∈ R.

(b) Every sequence (xn)n∈N which is monotone and bounded converges to some x ∈ R.

(c) Every Cauchy sequence (xn)n∈N converges to some x ∈ R.


Each of these is an existence statement, and each is about “filling in the holes”, the idea

that R is a continuum.

Proof of Theorem 2.32. We break the argument down into steps.

Step 1: If (xn)n∈N is a Cauchy sequence then (xn)n∈N is bounded.

This is done exactly as in the proof of Theorem 2.4(b) (the fact that a convergent sequence

must be bounded.) Again, take ε = 1 in the definition of Cauchy Sequence, and get K ∈ N for which |xn − xK| < ε = 1, ∀n ≥ K. Now, finish the proof as above, letting xK play the

role of the limit in the proof of Theorem 2.4(b). [The details are left as an exercise.]

Step 2: By Bolzano-Weierstrass, (xn)n∈N contains a convergent subsequence (xnk)k∈N.

Thus, ∃ x ∈ R with lim_{k→∞} xnk = x.

Step 3: If (xn)n∈N is a Cauchy sequence which contains a convergent subsequence, then

the whole sequence (xn)n∈N converges.

Let ε > 0 be any given value. By the Cauchy condition, ∃N ∈ N with |xm − xj| < ε/2 for all m, j ≥ N. The subsequence converges, so ∃K1 ∈ N with |xnk − x| < ε/2 for all k ≥ K1. Finally, since nk → ∞ as k → ∞, we may choose K2 ∈ N with nk ≥ N whenever k ≥ K2. Take K = max{K1, K2}. For every m ≥ N and any k ≥ K,

|xm − x| ≤ |xm − xnk| + |xnk − x| < ε/2 + ε/2 = ε,

and so xm → x.

Where did we use the Completeness Axiom? It’s in Step 2: the Bolzano-Weierstrass

Theorem uses the Monotone Sequence Theorem, which is equivalent to the Completeness

Axiom!

3 Series

You may want to think of a series of real numbers, ∑_{n=1}^∞ an, as an "infinite sum", but that notion is too vague and may lead to confusion and error. We define an infinite series as a limit of finite sums, and so it is just another example of a sequence!

Definition 3.1. Given a sequence (an)n∈N in R, we define the series ∑_{n=1}^∞ an as follows: for each k ∈ N, define the kth partial sum

sk = ∑_{n=1}^k an.


We say the series ∑_{n=1}^∞ an converges if the sequence (sk)k∈N is convergent, and the series diverges if the sequence (sk)k∈N is divergent.

Since each partial sum is a finite sum of real numbers, each sk is very clearly defined; it

is the sum of the first k terms, the running tally as you sum the series one term at a time.

The series is the limiting value, if such a limit exists.
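As a quick illustration (not part of the notes), here is a short Python sketch of this "running tally": it computes the partial sums sk for the series ∑ 1/n², which is chosen only as a demonstration.

# Sketch: compute partial sums s_k = a_1 + ... + a_k for a_n = 1/n^2.
def partial_sums(a, K):
    """Return the list [s_1, ..., s_K] of partial sums of the sequence a(n)."""
    sums, running = [], 0.0
    for n in range(1, K + 1):
        running += a(n)
        sums.append(running)
    return sums

s = partial_sums(lambda n: 1.0 / n**2, 100_000)
print(s[9], s[99], s[-1])   # the running tally appears to settle near 1.6449...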

Remark 3.2. From the definition we note that the convergence or divergence of the series doesn't depend on which value of n the sum starts with; the series ∑_{n=1}^∞ an, ∑_{n=0}^∞ an, ∑_{n=3}^∞ an, and ∑_{n=2018}^∞ an are either all convergent or all divergent. All that matters is the "tail" of the infinite sequence (sn)n∈N of partial sums.

Example: Geometric Series ∑_{n=0}^∞ r^n, where r ∈ R is constant.

This is the most important series, and it illustrates divergence and convergence very well.

This is one of the only series where we have an explicit formula for the partial sums:

sk = ∑_{n=0}^k r^n = (1 − r^{k+1})/(1 − r), k = 0, 1, 2, . . . (valid for r ≠ 1).

If you want to start the series with n = 1, then each term has a common factor of r, so ∑_{n=1}^∞ r^n = r ∑_{n=0}^∞ r^n, and

∑_{n=1}^k r^n = r ∑_{n=0}^{k−1} r^n = r · (1 − r^k)/(1 − r) = (r − r^{k+1})/(1 − r).

If |r| < 1, then r^{k+1} → 0 as k → ∞, and so the partial sums converge, and the series is convergent,

∑_{n=0}^∞ r^n = lim_{k→∞} sk = lim_{k→∞} (1 − r^{k+1})/(1 − r) = 1/(1 − r),

and the limit value is explicitly known.
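A quick numerical check of this case (illustration only; the value r = 0.7 is chosen arbitrarily): the partial sums approach 1/(1 − r).

# Sketch: partial sums of the geometric series sum_{n>=0} r^n versus 1/(1-r).
r = 0.7
target = 1 / (1 - r)
s, term = 0.0, 1.0            # term holds r^n, starting from r^0 = 1
for k in range(50):
    s += term
    term *= r
print(s, target)              # the two values agree to many decimal places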

If r > 1, r^n → ∞ (properly divergent), and since sk > r^k the partial sums properly diverge to infinity, so the series is divergent. If r = 1, then r^k = 1 and sk = k + 1 (the number of terms), and the series is also properly divergent to infinity.

When r < −1, r^{k+1} is unbounded and diverges, and so does sk. (But not properly! |r^{k+1}| → ∞ as k → ∞, but the values oscillate in sign.) A more interesting case is r = −1, for which


sk alternates between 1 (k even) and 0 (k odd). So a series can diverge even if the partial

sums remain bounded; there are divergent series which are not properly divergent to ±∞.

For most convergent series we don't know the value to which the series converges. This is

where Cauchy sequences come in handy: a series converges if and only if its partial sums

form a Cauchy sequence!

Theorem 3.3. The series ∑_{n=1}^∞ an converges if and only if: ∀ε > 0, ∃H ∈ N so that

| ∑_{n=m+1}^p an | < ε, ∀p > m ≥ H. (3.1)

Proof. This is not difficult to prove, since the left-hand side of (3.1) can be rewritten in terms of the partial sums,

|sp − sm| = | ∑_{n=m+1}^p an |.

So the condition (3.1) is exactly the statement that the partial sums (sk)k∈N form a Cauchy sequence in R. By Theorem 2.32 they converge if and only if (3.1) holds.

An immediate consequence comes from looking at a special case of (3.1), when m+1 = p,

and there is only one term in the sum,

|ap| = | ∑_{n=p}^p an | < ε, ∀p > H.

We may conclude (after changing notation from ap to the more usual an,) that:

Corollary 3.4. If ∑_{n=1}^∞ an is a convergent series, then lim_{n→∞} an = 0.

Thus, for a series to converge, a necessary condition is that the terms an → 0 as n → ∞. Stewart calls this a "test for divergence", since it is most often used in the contrapositive: if an does not tend to zero as n → ∞, then the series must diverge.

This condition is, however, not sufficient for convergence of the series; not only must the general term an → 0 as n → ∞, but it must tend to zero sufficiently rapidly that the partial sums settle at a limiting value. The boundary between convergent and divergent behavior is subtle, as we will see via examples later on.


Nonnegative Series

An important special case is that of series with nonnegative terms, an ≥ 0 ∀n ∈ N.

For nonnegative series, the partial sums are monotone increasing,

s_{k+1} = ∑_{n=1}^{k+1} an = sk + a_{k+1} ≥ sk, ∀k ∈ N.

Theorem 3.5. If ∑_{n=1}^∞ an is a series with nonnegative terms, an ≥ 0 ∀n ∈ N, then the series either converges or properly diverges to +∞.

In particular, the series converges if and only if the partial sums form a bounded sequence.

Note that this is not true for series whose terms change sign: for example, ∑_{n=1}^∞ (−1)^n has bounded partial sums even though the series diverges.

Remark 3.6. As noted in Remark 3.2, it is enough that the series has nonnegative terms

eventually always, that is, ∃N ∈ N for which an ≥ 0 ∀n ≥ N . In that case, the partial sums

will be monotone increasing from n = N on, ie, sN ≤ sN+1 ≤ sN+2 ≤ · · · , the sequence

(sN+k)k=0,1,2,... is monotone increasing, and hence either convergent or properly divergent to

+∞.

Example 3.7 (The Harmonic Series). ∑_{n=1}^∞ 1/n diverges. To get an idea of what's happening, write out the first several terms, and group them in this funny way:

∑_{n=1}^∞ 1/n = 1 + 1/2 + [1/3 + 1/4] + [1/5 + 1/6 + 1/7 + 1/8] + [1/9 + · · · + 1/16] + · · ·
            ≥ 1 + 1/2 + 2 · (1/4) + 4 · (1/8) + 8 · (1/16) + · · ·
            ≥ 1 + 1/2 + 1/2 + 1/2 + 1/2 + · · ·

Basically, we group the terms a_{2^{m−1}+1} + · · · + a_{2^m} together. There are 2^{m−1} terms in each group, and each term in the group is ≥ 2^{−m}, so each group adds at least 1/2 to the sum. To make this official, write it in terms of partial sums with k = 2^m,

s_{2^{m+1}} = s_{2^m} + [ 1/(2^m + 1) + 1/(2^m + 2) + · · · + 1/2^{m+1} ] ≥ s_{2^m} + 1/2,

since the bracket contains 2^m terms, each ≥ 2^{−(m+1)}.

We may then conclude (induction!) that s_{2^m} ≥ 1 + m/2 for each m ∈ N, and so the sequence of partial sums is unbounded, and thus divergent. (Remember, it is monotone increasing, as the series has nonnegative terms!)


Example 3.8 (The p-series). The series ∑_{n=1}^∞ 1/n^p converges for p > 1. We use essentially the same idea as for the harmonic series, but now we want to show that the partial sums are bounded above, so we group the terms a little differently:

∑_{n=1}^∞ 1/n^p = 1 + [1/2^p + 1/3^p] + [1/4^p + 1/5^p + 1/6^p + 1/7^p] + [1/8^p + · · · + 1/15^p] + · · ·
              ≤ 1 + 2 · (1/2^p) + 4 · (1/4^p) + 8 · (1/8^p) + · · ·
              = 1 + 1/2^{p−1} + 1/4^{p−1} + 1/8^{p−1} + · · ·
              = 1 + 1/2^{p−1} + (1/2^{p−1})^2 + (1/2^{p−1})^3 + · · ·
              = ∑_{m=0}^∞ (1/2^{p−1})^m,

so the p-series is bounded above by a geometric series, with r = 2^{−(p−1)}. Since p > 1, 0 < 2^{−(p−1)} < 1, so the geometric series converges, and hence so does the p-series for p > 1.

With a few examples as above we can test convergence of series with nonnegative terms

via the Comparison Tests:

Theorem 3.9 (Comparison Tests). Let ∑_{n=1}^∞ an, ∑_{n=1}^∞ bn be series with nonnegative terms, for which 0 ≤ an ≤ bn ∀n ∈ N.

(a) If ∑_{n=1}^∞ bn converges, then ∑_{n=1}^∞ an converges also.

(b) If ∑_{n=1}^∞ an diverges, then ∑_{n=1}^∞ bn diverges also.

Remark 3.10. As in Remark 3.2, it is not really necessary for 0 ≤ an ≤ bn ∀n ∈ N; it

suffices that it be true eventually always, that is: there exists N ∈ N so that 0 ≤ an ≤ bn

∀n ≥ N.

Example 3.11. ∑_{n=1}^∞ 1/n^p with 0 < p < 1 diverges (properly, to +∞.)

Let an = 1/n. Since n^p ≤ n for 0 < p < 1 and n ∈ N, we have an = 1/n ≤ 1/n^p. Since the harmonic series ∑_{n=1}^∞ 1/n diverges, by the Comparison Test we conclude that the p-series ∑_{n=1}^∞ 1/n^p diverges for 0 < p < 1 also.


Example 3.12. ∑_{n=1}^∞ √(n+3)/(n³ + 2n) converges. Since n ≥ 1, we obtain a term bn with the same order of magnitude as an by the following estimate:

√(n+3)/(n³ + 2n) ≤ √(n + 3n)/n³ = 2√n/n³ = 2/n^{5/2} = bn.

Since ∑_{n=1}^∞ bn converges (p-series, p = 5/2 > 1,) by the Comparison Test the original series converges also.

Exercise 3.13. (a) Assume an, bn ≥ 0 ∀n ∈ N, ∑_{n=1}^∞ an converges, and (bn)n∈N is bounded. Show that ∑_{n=1}^∞ an bn converges.

(b) Assume an, bn ≥ 0 ∀n ∈ N, ∑_{n=1}^∞ an diverges, and ∃c > 0 for which bn ≥ c ∀n. Show that ∑_{n=1}^∞ an bn diverges.

(c) Assume an ≥ 0 ∀n ∈ N and ∑_{n=1}^∞ an converges. Show that ∑_{n=1}^∞ an² converges. [Hint: use (a).]

Absolute and Conditional Convergence

When the terms of the series may change sign, the variety of behavior of series is greater. This isn't surprising: the partial sums are no longer monotone, so it mirrors the situation we had for nonmonotone sequences, which could diverge by oscillation without properly diverging to ±∞.

Let’s look at the simple example: ∑_{n=1}^∞ (−1)^n/n. This is an alternating series, an = (−1)^n bn with bn ≥ 0 ∀n ∈ N. In this case bn = 1/n, but what we do for this example will be the same for any alternating series with the additional property that (bn)n∈N is monotone decreasing. As in the previous examples, we get interesting information by grouping the terms in two different ways. First, group the terms in pairs with the even terms in front:

∑_{n=1}^∞ (−1)^n/n = −1 + [1/2 − 1/3] + [1/4 − 1/5] + [1/6 − 1/7] + · · · ,

where each bracketed pair is ≥ 0.

In other words, since the series terms alternate in sign with (−1)^n, and the absolute values decrease in magnitude, the terms

a_{2j} + a_{2j+1} = b_{2j} − b_{2j+1} ≥ 0, ∀j = 1, 2, 3, . . .


Thus, the odd partial sums s_{2j+1} satisfy

s_{2j+1} = s_{2j−1} + a_{2j} + a_{2j+1} ≥ s_{2j−1},

and therefore the subsequence of odd partial sums is monotone increasing, s1 ≤ s3 ≤ s5 ≤ · · · ≤ s_{2j+1} ≤ · · ·

Grouping the terms in pairs, but starting with the odd terms,

∑_{n=1}^∞ (−1)^n/n = [−1 + 1/2] + [−1/3 + 1/4] + [−1/5 + 1/6] + · · · ,

where each bracketed pair is ≤ 0. With this grouping,

a_{2j−1} + a_{2j} = −b_{2j−1} + b_{2j} ≤ 0, ∀j = 1, 2, 3, . . .

and the even partial sums are monotone decreasing, s2 ≥ s4 ≥ s6 ≥ · · · ≥ s_{2j} ≥ · · ·

Since a_{2j} = b_{2j} ≥ 0, by the monotonicity above we have

s1 ≤ s_{2j−1} ≤ s_{2j−1} + b_{2j} = s_{2j} ≤ s2.

Therefore, s2 is an upper bound for the monotone increasing sequence of odd partial sums (s_{2j−1})_{j∈N}, and by the Monotone Sequence Theorem, it converges: ∃ s_odd with lim_{j→∞} s_{2j−1} = s_odd. Similarly, s1 is a lower bound for the monotone decreasing sequence of even partial sums (s_{2j})_{j∈N}, and hence ∃ s_even with lim_{j→∞} s_{2j} = s_even. But s_{2j} = s_{2j−1} + b_{2j}, and

lim_{j→∞} b_{2j} = lim_{j→∞} 1/(2j) = 0,

so

s_even = lim_{j→∞} s_{2j} = lim_{j→∞} ( s_{2j−1} + b_{2j} ) = s_odd.

Therefore, both the odd and the even subsequences converge to the same limit s = s_odd = s_even, and so s = lim_{k→∞} sk and the alternating series converges.

As you may have noticed, we really didn’t use the exact form of the series, only the facts

that it alternated, with absolute value of the terms monotonically decreasing to zero. We

have thus proven the following general convergence theorem for alternating series:

Theorem 3.14 (Alternating Series Theorem). Let ∑_{n=1}^∞ an be an alternating series, an = (−1)^n bn, ∀n ∈ N, with: bn ≥ 0 ∀n ∈ N; bn ≥ b_{n+1} ∀n ∈ N; and bn → 0 as n → ∞. Then ∑_{n=1}^∞ an converges.


We want to distinguish series which converge because of cancellation between positive

and negative terms, and those which converge because the magnitude of the terms tends to

zero rapidly enough.

Definition 3.15. We say the series ∑_{n=1}^∞ an converges absolutely if the series of its absolute values ∑_{n=1}^∞ |an| is convergent. If ∑_{n=1}^∞ an is convergent but ∑_{n=1}^∞ |an| diverges, we say the original series ∑_{n=1}^∞ an converges conditionally.

Example 3.16. The alternating series ∑_{n=1}^∞ (−1)^n/n converges conditionally. More generally, the alternating p-series ∑_{n=1}^∞ (−1)^n/n^p converges for any p > 0! It converges absolutely for all p > 1, and conditionally if 0 < p ≤ 1.

One thing we should check: if a series converges absolutely, is it convergent (in the sense

of the original definition)? Fortunately, the answer is yes, and absolute convergence is a

stronger condition than simple convergence (as in Definition 3.1).

Theorem 3.17. If the series ∑_{n=1}^∞ an is absolutely convergent, then it is convergent.

Proof. This is another exercise in the Cauchy criterion for convergence. Since ∑_{n=1}^∞ an is absolutely convergent, the series ∑_{n=1}^∞ |an| converges, so its partial sums form a Cauchy sequence; by Theorem 3.3, ∀ε > 0 ∃H ∈ N so that

| ∑_{n=m+1}^p |an| | = ∑_{n=m+1}^p |an| < ε, ∀p > m ≥ H.

Now we apply the triangle inequality repeatedly (use induction!) to verify that

| ∑_{n=m+1}^p an | ≤ ∑_{n=m+1}^p |an| < ε, ∀p > m ≥ H.

By Theorem 3.3, ∑_{n=1}^∞ an is convergent.

Exercise 3.18. Find a convergent series ∑_{n=1}^∞ xn for which the series ∑_{n=1}^∞ xn² diverges. [The original series must be conditionally convergent (why?)]

This part was not covered in class, but maybe we’ll come back to it later. . .

The tests for convergence of series which you learned from Stewart may all be obtained

by using the Cauchy criterion or the Comparison tests. As an example, we prove the Root

Test.


Theorem 3.19 (Root Test). Suppose lim_{n→∞} |an|^{1/n} = L exists. Then ∑_{n=1}^∞ an converges absolutely if L < 1 and diverges if L > 1.

As you already know, the case L = 1 is indeterminate, since any of the p-series an = 1/n^p gives L = 1 regardless of the value of p. (Check it!)

Proof. First assume L < 1. Choose ε > 0 so that r = L + ε < 1. Then, by the existence of the limit, there exists K ∈ N with

|an|^{1/n} < L + ε = r, ∀n ≥ K.

In other words, |an| < r^n holds ∀n ≥ K. By the Comparison Test, since r < 1 and the geometric series ∑_{n=K}^∞ r^n converges, the series ∑_{n=1}^∞ an converges absolutely.

Now assume L > 1. Choose ε > 0 so that L − ε ≥ 1 this time. Then, by the existence of the limit, there exists K ∈ N so that

|an|^{1/n} > L − ε ≥ 1, ∀n ≥ K.

So |an| ≥ 1 ∀n ≥ K, and by the necessary condition for convergence, Corollary 3.4, the series ∑_{n=1}^∞ an diverges.

Remark 3.20. Notice that we really didn't need the limit to exist in the Root Test; what we really needed for convergence was: ∃ r < 1 and ∃K ∈ N for which |an|^{1/n} ≤ r ∀n ≥ K. Then, the Root Test is just the Comparison Test (to the Geometric Series) in disguise. This weaker condition would be satisfied if we knew that every subsequential limit point of |an|^{1/n} was strictly smaller than one; in other words, for convergence we only need lim sup_{n→∞} |an|^{1/n} < 1.

Similarly, for divergence we really only need to set up the necessary condition by showing that |an|^{1/n} > 1 infinitely often in n. This holds, in particular, whenever some subsequential limit point of |an|^{1/n} is strictly larger than one, that is, whenever lim sup_{n→∞} |an|^{1/n} > 1. So a sharper version of the Root Test is stated in terms of limsup rather than limit:

∑_{n=1}^∞ an converges absolutely if lim sup_{n→∞} |an|^{1/n} < 1 and diverges if lim sup_{n→∞} |an|^{1/n} > 1.

4 Cardinality

We have been careful to distinguish sets in R from sequences of real numbers.

• A set S ⊂ R is any collection of real numbers. It has no order, and there is no point

in repeating elements when describing S. It could contain finitely or infinitely many

elements.


• A sequence (xn)n∈N is an ordered, infinite list of real numbers, indexed by n ∈ N.

Question: when can a set S be written as a sequence,

S = {xn | n ∈ N}??

• This question really is about counting, putting the elements of S in order.

• Children use their fingers to count. Mathematicians use N = {1, 2, 3, . . . }.

We start with finite sets. Clearly, a set S ⊂ R with only finitely many elements can not

be represented as a sequence. But we can understand what it means to count the elements

of a finite set.

Definition 4.1. For each n ∈ N, define the subsets Jn = {1, 2, 3, . . . , (n− 1), n} ⊂ N. A set

S is finite if it is in one-to-one correspondence with one of the sets Jn. A set which is not

finite is called infinite.

• So Jn is like having n fingers, and we assign to each element of S exactly one “finger”

in Jn.

• Be careful not to confuse “finite” and “bounded”. The set S = [0, 1] is bounded, but

infinite.

• A one-to-one correspondence establishes an equivalence between sets; two sets which

are both in 1-1 correspondence with Jn are equivalent to each other in the sense of

counting.

• If two sets S, T are in one-to-one correspondence, we say they have the same cardi-

nality.

Now let’s consider infinite sets.

Definition 4.2. We say the set S is countably infinite (or denumerable) if S is in one-

to-one correspondence with N.

In other words, S is countably infinite if its elements can be indexed by n ∈ N. In yet

other words, a countably infinite set can be written as a sequence!

Example 1: the counting numbers, S1 = N is a countably infinite set.

Example 2: the even counting numbers, S2 = {2, 4, 6, 8 . . . } = {2k | k ∈ N} is countably

infinite.


• Here xk = 2k, k ∈ N, gives the 1-1 correspondence from N to S2.

• Notice that S2 ⊂ S1, and {xk}k∈N is a subsequence of the sequence N = (1, 2, 3, . . . ).

The example can be generalized:

Theorem 4.3. If S1 is countably infinite and S2 ⊂ S1, then S2 is either finite or countably

infinite.

Proof. • Suppose S2 is an infinite set.

• Since S1 is countably infinite, there is a sequence (xn)n∈N which lists all of the elements,

S1 = {xn |n ∈ N}.

• Let n1 be the smallest index for which xn1 ∈ S2, n2 > n1 the next index for which

xn2 ∈ S2, etc.

• Then the subsequence (xnk)k∈N is an enumeration of the elements of S2. ♦

Now let’s think about some operations on sets. For example, the union of the sets A and

B, A ∪ B, is the set whose elements x satisfy x ∈ A or x ∈ B. Here is an example which is

special, but in the end completely typical:

Example: S = Z = N ∪ {0, −1, −2, −3, . . . }, a union of countably infinite sets, is countable.

• Careful! If we count all the positive integers first, we’ll never get to the negative ones!

• Instead, we alternate between the two sets:

Z = {0, 1,−1, 2,−2, 3,−3, . . . }

• We could write down an exact formula for the 1-1 correspondence, xk = f(k),

f(k) = { k/2, if k is even; −(k − 1)/2, if k is odd, }

and verify that it’s injective and surjective, but usually we just indicate how to count

the elements by writing the sequence!
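For illustration (not from the notes; the function name f and the checks are ad hoc), here is a small sketch of this counting function: it lists the first few values and verifies on a finite range that no integer is hit twice and every integer in a symmetric window is hit.

# Sketch: the pairing k |-> f(k) that enumerates Z as 0, 1, -1, 2, -2, ...
def f(k):
    return k // 2 if k % 2 == 0 else -(k - 1) // 2

print([f(k) for k in range(1, 12)])    # [0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5]
# On the range k = 1..2N+1 the values are exactly {-N, ..., N}, each hit once:
N = 100
hit = [f(k) for k in range(1, 2 * N + 2)]
assert sorted(hit) == list(range(-N, N + 1))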


Now let’s look at any two countably infinite sets, S1 and S2. We write each one as a

sequence,

S1 = {x1, x2, x3, . . . } and S2 = {y1, y2, y3, . . . },

and then the union may be written as a sequence too,

S1 ∪ S2 = {x1, y1, x2, y2, x3, y3, . . . }.

Theorem 4.4. If S1, S2 are countably infinite sets, then the union S1 ∪S2 is also countably

infinite.

Can we find an infinite set which isn’t countable? Suppose we look at an infinite number

of sets, and take their union.

Theorem 4.5. Suppose we have a countably infinite collection of sets, S1, S2, S3, . . . , and

each of the sets Si, i ∈ N is countably infinite, so

Si = {ai,1, ai,2, ai,3, . . . , ai,j, . . . }, ∀i ∈ N.

Then, their union, S = ⋃_{i=1}^∞ Si, is also countably infinite.

By the infinite union we mean that S contains all elements ai,j with i, j ∈ N.

Proof. List each set Si as the ith row in an array,

S1 = { a1,1, a1,2, a1,3, a1,4, . . . }
S2 = { a2,1, a2,2, a2,3, a2,4, . . . }
S3 = { a3,1, a3,2, a3,3, a3,4, . . . }
S4 = { a4,1, a4,2, a4,3, a4,4, . . . }
 .       .      .      .      .

and count them on diagonals, as in Figure 1. Thus,

S = {a1,1, a1,2, a2,1, a3,1, a2,2, a1,3, a1,4, a2,3, . . . },

writes the union S as a sequence, and so S is countably infinite.

Let’s see how that works for an important example, the set of all rational numbers!

S = {q ∈ Q | q > 0} = ⋃_{i=1}^∞ { i/j | j ∈ N }

Now you may be starting to think that all infinite sets are countable. This is not the

case: there are different “sizes” of infinite sets, and some infinite sets are too large to be

counted!


Figure 1: Counting a countable union of countable sets.

Figure 2: Counting the rational numbers, Q.

Theorem 4.6. The real number set R is uncountable. That is, R is not in one-to-one

correspondence with N.

So R is much larger than N or Z or Q, each of which is of the same cardinality.

There are two famous arguments to prove this, both due to Georg Cantor. Here is one

which uses an old friend, the monotone sequence theorem.

Proof. We argue by contradiction: suppose that R = {x1, x2, x3, . . . } is a complete enumer-

ation of the real numbers.

• First take x1, and find any closed interval I1 = [a1, b1] with x1 ∉ I1.

• Next, take x2, and find a closed interval I2 = [a2, b2] ⊆ I1 so that x2 ∉ I2. Since I2 ⊆ I1, also x1, x2 ∉ I2.


• We continue in this way, constructing nested closed intervals,

Ik = [ak, bk] ⊆ Ik−1 ⊆ · · · ⊆ I2 ⊆ I1,

with the property that x1, x2, . . . , xk /∈ Ik.


• Look at the endpoints of these intervals,

a1 ≤ a2 ≤ a3 ≤ · · · ≤ ak ≤ · · · ≤ bk ≤ · · · ≤ b3 ≤ b2 ≤ b1.

• The (ak) are monotone increasing, and bounded above by b1; the (bk) are monotone decreasing, and bounded below by a1.

• By the Monotone Sequence Theorem, both sequences converge,

lim_{k→∞} ak = a ≤ b = lim_{k→∞} bk,

and the interval [a, b] ⊆ [ak, bk] = Ik ∀k.

• Therefore, xk ∉ [a, b] for any k, so there exist real numbers which are not on the list (xk)k∈N, a contradiction.

• Even though Q is a dense set in R, it is much smaller: Q is countably infinite, but R is uncountably infinite!

5 Limits and Continuity

We now consider functions f : A ⊆ R→ R, defined on a domain set A ⊆ R. We will review

the basic concept of limits in this context (familiar to you from calculus) and connect it back

to the limit concept for sequences studied above.


5.1 Limits of Functions

To study the limit of f(x) as x → c, we need to know that we can approach the number c through points x ∈ A, the domain of f.

Example 5.1. Consider the function

f(x) = { √x, if x ≥ 0; −7, if x = −1. }

The domain of the function is the set A = {−1}∪ [0,∞). The point c = −1 is in the domain,

but it is an isolated point: it has no neighboring points which are also in the domain, and so

we can’t “approach” c = −1 as a limit of points in the domain A. On the other hand, any

point in the other part of the domain, [0,∞) has neighbors which are also in the domain of

f .

For any δ > 0, call the set Vδ(c) = (c − δ, c + δ) the δ-neighborhood of c. It is an open

interval of width 2δ centered at x = c on the real line. In the previous example, when c = −1

and δ < 1, the δ-neighborhoods Vδ(−1) do not contain any other points from the set A other

than −1 itself. That is what we mean when we say c = −1 is an isolated point in A.

Definition 5.2. Let A ⊂ R. A point c ∈ R is called a limit point (often called cluster point)

of the set A if for every δ > 0, the δ-neighborhood (c− δ, c+ δ) contains a point x ∈ A, with

x 6= c.

If A is an interval, such as [a, b], (a, b), (a, b], (a,∞), (−∞, b], etc., then any point in the

interval or any endpoint is a limit point of the interval. This will typically be the case with

the examples we do, and so we won’t worry so much about limit points in most examples.

CAUTION: a point c ∈ R can be a limit point of a set A without being an element of A.

For example, if A = (0, 1), the endpoints c = 0 and c = 1 are limit points of A even though

they’re not elements of A.

Applying the definition repeatedly with δ = 1/n, n ∈ N, we observe that an equivalent

definition of limit point can be given in terms of sequences:

Theorem 5.3. Let A ⊆ R. Then, c ∈ R is a limit point of A if and only if there exists a sequence (xn)n∈N with: xn ∈ A ∀n ∈ N, xn ≠ c ∀n ∈ N, and xn → c as n → ∞.

We may now define what we mean by lim_{x→c} f(x), for limit points c of the domain of f.


Definition 5.4. Let f : A ⊆ R → R and suppose c is a limit point of the domain A. Then lim_{x→c} f(x) = L if:

∀ε > 0, ∃δ > 0 so that |f(x) − L| < ε ∀x ∈ A with 0 < |x − c| < δ.

Notice that the condition 0 < |x− c| < δ says that x is very close to c, but not equal to

c (as in the definition of limit point!) So the limit of a function is in general not related

to the value of f(c), if indeed c ∈ A at all.

We also point out that the condition “x ∈ A” means that if c is an endpoint of the set

A, the limit above is actually a one-sided limit: a limit from the right if we are at the left

endpoint, and a limit from the left if c is the right endpoint of A. We define a limit from the

right as follows, by restricting to values of x > c:

Definition 5.5. Let f : A ⊆ R → R and suppose c is a limit point of the set {x ∈ A | x > c}. Then lim_{x→c+} f(x) = L if:

∀ε > 0, ∃δ > 0 so that |f(x) − L| < ε ∀x ∈ A with c < x < c + δ.

Similarly, we may define the limit from the left, lim_{x→c−} f(x), as above, but for c − δ < x < c.

Example 5.6. Let f(x) = √x, defined on A = [0,∞). We will show that lim_{x→0} √x = 0. (Note that c = 0 is a limit point of A.)

Let ε > 0 be any given value. Since x ∈ A = [0,∞) satisfies |f(x) − 0| = √x < ε exactly when 0 ≤ x < ε², we choose δ = ε² > 0. Then, if x ∈ A and 0 < |x − 0| = x < δ, indeed |f(x) − 0| = √x < ε, and the definition is satisfied. Notice that because c = 0 is an endpoint of the interval A = [0,∞), this is actually a limit from the right. Since f is not defined to the left of c = 0, there is no limit possible from the left.

Now, try lim_{x→4} √x = 2. (Note again that c = 4 is a limit point of A.) Again, we simplify the desired expression |f(x) − 2|, in order to express it in terms of x − 4:

|f(x) − 2| = |√x − 2| = | (x − 4)/(√x + 2) | = |x − 4| / (√x + 2).

Since x ∈ A = [0,∞), √x + 2 ≥ 2, and hence we have |f(x) − 2| ≤ (1/2)|x − 4|. So if we choose δ = 2ε, then if |x − 4| < δ = 2ε, by the above calculation, |f(x) − 2| ≤ (1/2)|x − 4| < ε, and so the limit statement is verified. ♦

We next relate limits of functions to limits of sequences, which we spent so much time

studying in the previous sections.


Theorem 5.7. Let f : A ⊆ R → R, and c ∈ R a limit point of A. Then L = lim_{x→c} f(x) if and only if: for every sequence (xn)n∈N with xn ∈ A, xn ≠ c ∀n ∈ N, and xn → c as n → ∞, it is true that f(xn) → L as n → ∞.

Proof. First, assume L = lim_{x→c} f(x). Then, for any ε > 0, ∃δ > 0 with |f(x) − L| < ε for all x ∈ A with 0 < |x − c| < δ. Take any sequence (xn)n∈N with xn ∈ A, xn ≠ c ∀n ∈ N, and xn → c. By the convergence of the sequence, ∃K ∈ N so that |xn − c| < δ, ∀n ≥ K. Since xn ≠ c, taking x = xn in the first limit condition, whenever n ≥ K we then have |f(xn) − L| < ε; that is, the sequence (f(xn))n∈N converges to L, which is the desired conclusion.

Now, assume the second statement is true, that for every sequence (xn)n∈N with xn ∈ A, xn ≠ c ∀n ∈ N, and xn → c, we have f(xn) → L. We argue by contradiction, and assume that f(x) does not converge to L as x → c. This means that:

∃ε0 > 0 so that ∀δ > 0, there exists x ∈ A with 0 < |x − c| < δ and |f(x) − L| ≥ ε0.

Applying this repeatedly with δ = 1/n, n ∈ N, we may generate a sequence (xn)n∈N with the property that xn ∈ A ∀n ∈ N, 0 < |xn − c| < 1/n, and |f(xn) − L| ≥ ε0. That is, xn ≠ c, xn → c, and f(xn) does not converge to L. This contradicts the hypothesis that all such sequences converge to L, and so the Theorem is verified.

This equivalent description of convergence of limits of functions will have many important

consequences for us. In particular, the contradiction argument in the second part of the proof

will be useful as a criterion for determining when a limit does not exist!

Corollary 5.8. If there exist two sequences (xn)n∈N, (un)n∈N ⊂ A, with xn, un ≠ c ∀n and xn, un → c as n → ∞, for which lim_{n→∞} f(xn) ≠ lim_{n→∞} f(un), then lim_{x→c} f(x) does not exist.

Example 5.9. f(x) = cos(1/x), A = {x ≠ 0}. Then c = 0 is a limit point of A. If we look at the sequence xn = 1/(nπ), n ∈ N, then xn → 0, xn ≠ 0 ∀n, and f(xn) = cos(nπ) = (−1)^n. So there are two subsequences (corresponding to odd and even n,) along which there are different subsequential limit values of f(xn), and by the Corollary the limit lim_{x→0} cos(1/x) does not exist.
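A quick numerical sketch (illustration only): along xn = 1/(nπ) the values f(xn) = cos(nπ) flip between +1 and −1, so no single limiting value is possible.

import math

# Sketch: f(x) = cos(1/x) evaluated along x_n = 1/(n*pi), which tends to 0.
for n in range(1, 9):
    x_n = 1.0 / (n * math.pi)
    print(n, round(math.cos(1.0 / x_n), 6))   # alternates -1, 1, -1, 1, ...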

We can also use the equivalence of the two kinds of convergence to extend various facts

about limits from sequences to functions:

Theorem 5.10. Assume f, g : A ⊆ R → R, and c ∈ R a limit point of A. Assume also that L = lim_{x→c} f(x) and M = lim_{x→c} g(x) both exist. Then:


(a) For any α ∈ R, lim_{x→c} αf(x) = αL.

(b) lim_{x→c} [f(x) + g(x)] = L + M.

(c) lim_{x→c} [f(x) · g(x)] = LM.

(d) If M ≠ 0, lim_{x→c} [f(x)/g(x)] = L/M.

These all follow from Theorem 5.7 and the corresponding facts proven for sequences. If (xn)n∈N is any sequence with xn ∈ A, xn ≠ c ∀n ∈ N, and xn → c, then

lim_{n→∞} [f(xn) + g(xn)] = lim_{n→∞} f(xn) + lim_{n→∞} g(xn) = L + M.

Since this is true for all such sequences, statement (b) must hold. The others are all proven in the same way. We also have:

Theorem 5.11 (Squeeze Theorem). If f(x) ≤ g(x) ≤ h(x) ∀x ∈ A, c ∈ R is a limit point

of A, and limx→c f(x) = L = limx→c h(x), then limx→c g(x) = L also.

Example 5.12. We show that lim_{x→0} x cos(1/x) = 0. Note that cos(1/x) has no limit as x → 0, so the limit laws in Theorem 5.10 do not apply! However, |cos(1/x)| ≤ 1 for all x ≠ 0, so

−|x| ≤ x cos(1/x) ≤ |x| ∀x ≠ 0.

Since lim_{x→0} |x| = 0 (Exercise: prove it via the definition!), by the Squeeze Theorem we have lim_{x→0} x cos(1/x) = 0.


5.2 Continuous Functions

By introducing limits we have devised a new “operation” which may be performed on func-

tions, other than evaluation of a function f(x) at a point x in its domain. We distinguish a

special class of functions, the continuous functions which are well-behaved with respect to

the taking of limits.

Definition 5.13. Let A ⊆ R, and f : A→ R. We say that f is continuous at a ∈ A if

∀ε > 0 ∃δ > 0 so that |f(x)− f(a)| < ε, ∀ x ∈ A with |x− a| < δ.

We note that in the definition, a ∈ A is in the domain of f and so f(a) has been defined.

And, as long as a is a limit point of the set A, the condition given in the definition matches

with the existence of the limit, limx→a

f(x) = f(a).

We thus have the equivalent definition of continuity at a limit point a ∈ A:

Theorem 5.14. Let f : A ⊆ R→ R and a ∈ A a limit point. Then f is continuous at a if

and only if limx→a

f(x) = f(a).

If a ∈ A is not a limit point, in other words, if a is an isolated point of A, then any

f is continuous at a. So generally we only really care about continuity at limit points of

the domain A. As with limits, it will often be convenient to think of continuity in terms

of sequences. Applying Theorem 5.7 to Theorem 5.14, we have yet one more equivalent

condition for continuity at a limit point:

Theorem 5.15. Let f : A ⊆ R → R and a ∈ A a limit point. Then f is continuous at a if and only if lim_{n→∞} f(xn) = f(a) holds for every sequence (xn)n∈N ⊂ A with xn → a.

The best functions are continuous everywhere on their domain A:

Definition 5.16. Let f : A ⊆ R→ R. We say f is continuous on A if f is continuous at

each point x ∈ A.

Continuity on a set A can be written in a very satisfying way: a continuous function is one for which

lim_{n→∞} f(xn) = f( lim_{n→∞} xn )

holds for every convergent sequence (xn)n∈N ⊂ A whose limit lies in A.

As in the previous discussion of limits, the use of sequences makes for a simple condition

for a function to be discontinuous:

Corollary 5.17. Let f : A ⊆ R → R and a ∈ A. If there exists a sequence (xn)n∈N ⊂ A with xn → a such that lim_{n→∞} f(xn) ≠ f(a), then f is discontinuous at a.


Note that if there is a sequence (xn)n∈N ⊂ A with xn → a for which (f(xn))n∈N doesn't converge at all, then f is discontinuous at a. One way to exploit this is to find a sequence xn → a for which f(xn) has two different subsequential limits. For example, if

f(x) = sgn(x) = { −1, if x < 0; 0, if x = 0; 1, if x > 0, }

and we take xn = (−1)^n (1/n) → 0, then f(xn) = (−1)^n, which is divergent since it has two subsequential limit points. Thus, the signum function is discontinuous at x = 0.

Using the properties of the limit from Theorem 5.10 we immediately derive the usual

properties of continuous functions from calculus:

Theorem 5.18. Assume f, g : A ⊆ R → R are each continuous at a ∈ A. Then (f + g), fg, and f/g (provided g(a) ≠ 0) are all continuous at a ∈ A.

The other natural combination between continuous functions is composition. To define

the composition of two functions h(x) = g ◦ f(x) = g(f(x)) we have to be sure that their

domain and range are compatible.

Theorem 5.19. Let f : A ⊆ R → R and g : B ⊆ R → R with f(A) ⊆ B. If f is

continuous at a ∈ A and g is continuous at b = f(a) ∈ B, then h = g ◦ f is continuous at a.

Proof. Using the definition of continuity for g at y = b, for any ε > 0 there exists γ > 0 for

which

|g(y)− g(b)| < ε for all y ∈ B with |y − b| < γ.

Now apply the definition of continuity for f at x = a (adapting the Greek letters appropri-

ately): there exists δ > 0 for which

|f(x)− f(a)| < γ for all x ∈ A with |x− a| < δ.

Now, for x ∈ A with |x − a| < δ, call f(x) = y ∈ B, and so |y − b| = |f(x) − f(a)| < γ.

By the first continuity statement, we then have |g(f(x))− g(f(a))| = |g(y)− g(b)| < ε, and

we’re done.

As an exercise, you can make an alternate proof of the continuity of g ◦ f by using

sequences (xn)n∈N ⊂ A.

With these combinations we can verify that a lot of common elementary functions are

continuous, such as polynomials and rational functions (as long as the denominator is not

zero.)


Example 5.20. First, f(x) = √x is continuous on [0,∞). For any a ∈ (0,∞), by a familiar trick

|f(x) − f(a)| = |√x − √a| = | (x − a)/(√x + √a) | ≤ a^{−1/2} |x − a|.

Given any ε > 0, let δ = ε√a, and the definition of continuity at a is verified. For a = 0, given ε > 0 we take δ = ε² (verify the details!)

Now we know that functions of the form h(x) =√P (x), with P (x) polynomial, are

continuous at any points in their domain. (That is, x for which P (x) ≥ 0.) How do we get a

larger “vocabulary” of continuous functions? Via calculus, as we’ll see in the section on the

derivative. . .

Continuous functions are special, but we gain even more special properties when the

domain of the continuous function is a closed, bounded interval: that is, A = [a, b], and

f : [a, b]→ R is continuous on the set [a, b].

Theorem 5.21. Let f : [a, b]→ R be continuous. Then:

(a) f is bounded on [a, b]; that is, ∃M > 0 so that |f(x)| ≤M ∀x ∈ [a, b].

(b) f attains its maximum and minimum values in [a, b]. That is, there exist points

α, β ∈ [a, b] for which

f(α) ≤ f(x) ≤ f(β) for all x ∈ [a, b].

Another way to express (a) is that the image of f , the set of values y = f(x),

f([a, b]) := {f(x) | x ∈ [a, b]}

is a bounded set in R. As for (b), we could equivalently state it in this way: there exist points α, β ∈ [a, b] so that:

f(α) = inf_{x∈[a,b]} f(x), and f(β) = sup_{x∈[a,b]} f(x).

CAREFUL: this theorem does not hold in case the domain of f is either unbounded, or not closed (ie, lacks one or both endpoints.) Try the example f(x) = e^{−x} on the interval [0,∞) (there is no minimum) or the interval (0, 1] (there is no maximum.)

Proof. We argue by contradiction, and assume that f is not bounded. Since |f(x)| ≥ 0

always, this means we are assuming that {|f(x)| | x ∈ [a, b]} is unbounded above. Therefore,

for any n ∈ N, n is not an upper bound for this set, and there exists xn ∈ [a, b] with


|f(xn)| ≥ n. As we have done many times before, we recognize that we have constructed a sequence (xn)n∈N, in this case in the interval [a, b], and so a ≤ xn ≤ b ∀n ∈ N. Hence (xn)n∈N is a bounded sequence, and by Bolzano-Weierstrass it contains a convergent subsequence, xnk → x. By properties of limits of sequences, a ≤ x ≤ b, so x ∈ [a, b], and by continuity,

f(x) = lim_{k→∞} f(xnk).

However, |f(xnk)| ≥ nk → ∞ (diverges properly), and so this is impossible. Therefore f([a, b]) is bounded, and (a) must hold.

For (b), let’s show that α exists; a similar argument works for β. Let

A = inf_{x∈[a,b]} f(x) = inf{f(x) | x ∈ [a, b]},

which exists since we just showed that f([a, b]) is bounded. By the definition of the infimum, for every n ∈ N, ∃xn ∈ [a, b] with

A ≤ f(xn) ≤ A + 1/n, ∀n ∈ N. (5.1)

Again, we’ve constructed a sequence (xn)n∈N, a ≤ xn ≤ b, ∀n ∈ N, which is bounded, and using Bolzano-Weierstrass again we conclude the existence of a subsequence (xnk)k∈N and α ∈ [a, b] with xnk → α. From (5.1), f(xn) → A as n → ∞, and by continuity we then have:

f(α) = lim_{k→∞} f(xnk) = A,

since every subsequence of a convergent sequence converges to the same limit. This proves (b), for the existence of a minimum.

Here is an old friend from Calculus:

Theorem 5.22 (Intermediate Value Theorem). Let f : [a, b]→ R be continuous. If f(a) <

k < f(b), then ∃c ∈ (a, b) with k = f(c).

There are (at least) two proofs of this, by slightly different means. The first one is based

on the Supremum, and it heavily uses the following Lemma, which is problem A in Practice

Problems # 5:

Lemma 5.23. If g : A ⊂ R → R is continuous at d ∈ A and g(d) > 0, then ∃δ > 0 so that g(x) > 0 ∀x ∈ A with d − δ < x < d + δ.

Proof # 1. Define the set

S = {u ∈ [a, b] | f(x) < k ∀ x ∈ [a, u]},


and c = supS. Then either f(c) < k, f(c) > k, or f(c) = k; we eliminate the first two.

If f(c) < k, by Lemma 5.23 with g(x) = k − f(x), ∃δ > 0 for which g(x) = k − f(x) > 0

in (c− δ, c+ δ), and so (c+ δ) ∈ S, which contradicts c = supS. Hence, f(c) ≥ k.

If f(c) > k, again by Lemma 5.23, but now with g(x) = f(x) − k, ∃δ > 0 for which

g(x) = f(x) − k > 0 in (c − δ, c + δ), and therefore (c − δ) is an upper bound for S, again

contradicting the choice of c = supS.

In conclusion, f(c) = k, with c ∈ (a, b).

The second proof is more constructive; in fact, it gives an algorithm, “bisection”, by

which one can approximate the value of c.

Proof # 2. Define L = b − a, the length of the interval I0 = [a, b]. Consider the midpoint of the interval, p0 = (a + b)/2. If f(p0) = k, then we are done, and c = p0. If not, define a new interval I1 = [a1, b1] as follows:

• if f(p0) < k, then take a1 = p0 and b1 = b.

• if f(p0) > k, then take a1 = a and b1 = p0.

The interval I1 ⊂ I0, and has length L1 = L/2. Now repeat the above procedure with I1, getting I2 ⊂ I1 ⊂ I0, and repeat again, etc. That is, if we have already constructed n intervals, In ⊂ In−1 ⊂ · · · ⊂ I1 ⊂ I0, with each Ij = [aj, bj] having length Lj = Lj−1/2 = 2^{−j}L, then let pn = (an + bn)/2 be the midpoint of In. If f(pn) = k, then we are done and c = pn. If not, define the next interval In+1 = [an+1, bn+1] as follows:

• if f(pn) < k, then take an+1 = pn and bn+1 = bn.

• if f(pn) > k, then take an+1 = an and bn+1 = pn.

If the process is non-terminating, we get a sequence of nested intervals, I0 ⊃ I1 ⊃ I2 ⊃ · · · , and so the endpoints form two monotone sequences,

a ≤ a1 ≤ a2 ≤ · · · ≤ an ≤ · · · ≤ bn ≤ · · · ≤ b2 ≤ b1 ≤ b.

By the Monotone Sequence Theorem, each sequence converges,

c = lim_{n→∞} an, d = lim_{n→∞} bn, and c ≤ d.

As bn − an = 2^{−n}L → 0 as n → ∞, by the Squeeze Theorem d = c. And by continuity, f(c) = lim_{n→∞} f(an) = lim_{n→∞} f(bn). Since f(an) < k and f(bn) > k ∀n, we have f(c) ≤ k and f(c) ≥ k, so f(c) = k.
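Since this second proof is really an algorithm, here is a minimal sketch of it in code (the helper name bisect, the test function x³ − 2 on [0, 2], and the target k = 0 are illustrative choices, not from the notes); it approximates the cube root of 2.

# Sketch: the bisection construction of Proof #2, applied to f(x) = x^3 - 2
# on [a, b] = [0, 2] with k = 0, so c should be the cube root of 2.
def bisect(f, a, b, k, steps=50):
    assert f(a) < k < f(b)
    for _ in range(steps):
        p = (a + b) / 2          # midpoint p_n
        if f(p) == k:
            return p
        if f(p) < k:
            a = p                # a_{n+1} = p_n, b_{n+1} = b_n
        else:
            b = p                # a_{n+1} = a_n, b_{n+1} = p_n
    return (a + b) / 2           # intervals shrink like 2^{-n}, so this approximates c

print(bisect(lambda x: x ** 3 - 2, 0.0, 2.0, 0.0))   # about 1.259921, i.e. 2^(1/3)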


5.3 Extensions of limit concepts

Here we consider some variations on the theme of limits of functions. First, as in Calculus

we can also consider one-sided limits, as x tends to a from the right or from the left. It

is also useful to consider the possibility that functions diverge properly as x→ a, from one

side or another.

Definition 5.24. Let f : A ⊆ R → R, and a ∈ R a limit point of the set A.

(a) We say that f(x) converges to L as x tends to a from the right, and write lim_{x→a+} f(x) = L if: ∀ε > 0 ∃δ > 0 such that |f(x) − L| < ε whenever x ∈ A and a < x < a + δ.

(b) We say that f(x) converges to L as x tends to a from the left, and write lim_{x→a−} f(x) = L if: ∀ε > 0 ∃δ > 0 such that |f(x) − L| < ε whenever x ∈ A and a − δ < x < a.

(c) We say that f properly diverges to +∞ as x tends to a from the right, and write lim_{x→a+} f(x) = +∞ if: ∀M > 0 ∃δ > 0 such that f(x) > M whenever x ∈ A and a < x < a + δ.

(d) We say that f properly diverges to +∞ as x tends to a from the left, and write lim_{x→a−} f(x) = +∞ if: ∀M > 0 ∃δ > 0 such that f(x) > M whenever x ∈ A and a − δ < x < a.

We could also have proper divergence of f(x) to −∞ as x → a±; just require f(x) < −M instead of f(x) > M in (c), (d) above, since lim_{x→a±} f(x) = −∞ is the same as lim_{x→a±} [−f(x)] = +∞.

There is also the case of limits as x→ ±∞:

Definition 5.25. Let f : [a,∞) → R. We say lim_{x→∞} f(x) = L if

∀ε > 0, ∃B > 0 so that |f(x) − L| < ε ∀x > B.

Let f : (−∞, b] → R. We say lim_{x→−∞} f(x) = L if

∀ε > 0, ∃B > 0 so that |f(x) − L| < ε ∀x < −B.


Example 5.26. f(x) = 1/x³, x ≠ 0. Then lim_{x→0−} f(x) = −∞: for any M > 0, the inequality f(x) = 1/x³ < −M (with x < 0) is equivalent to 0 > x > −1/M^{1/3}. So by taking δ = 1/M^{1/3} we satisfy the definition.

In addition, lim_{x→∞} f(x) = 0: for any ε > 0 and x > 0,

|f(x) − 0| = 1/x³ < ε if and only if x > ε^{−1/3}.

Choosing B = ε^{−1/3}, the definition is satisfied.


5.4 Uniform Continuity

This is a refined form of continuity, which seems to be a kind of technical difference with the

original definition but which is surprisingly more useful than plain continuity.

Definition 5.27. Let f : A ⊆ R → R. We say that f is uniformly continuous on A if:

∀ε > 0, ∃δ > 0 such that for all x, u ∈ A with |x − u| < δ, we have |f(x) − f(u)| < ε.

This looks a lot like the statement “f is continuous on A”:

For all u ∈ A and for all ε > 0, ∃δ > 0 such that if x ∈ A with |x − u| < δ, then |f(x) − f(u)| < ε.

But notice the subtle difference: for regular continuity on A, δ could be different for different

values of u ∈ A, whereas in uniform continuity the same value of δ must be chosen no matter

which u ∈ A is tested.

Example 5.28. f(x) = 1/x, with domain A = [a,∞) for any a > 0. Let’s see if we can apply the definition: for x, u ≥ a > 0, we have

|f(x) − f(u)| = |1/x − 1/u| = |x − u| / (xu)   (since |x − u| = |u − x|)   (5.2)
             ≤ (1/a²) |x − u|   (since x, u ≥ a > 0).

This last inequality is true ∀x, u ∈ A = [a,∞), so given any ε > 0, if we choose δ = a²ε, then we may conclude that the definition is satisfied, and f is uniformly continuous on A.

The above inequality gives the perfect set-up for proving uniform continuity: the differ-

ence |f(x)−f(u)| is at most proportional to the distance |x−u|. This situation occurs often

enough that we have a special name for it:

Definition 5.29. A function f is called Lipschitz continuous on the set A ⊆ R if ∃K ≥ 0

for which

|f(x)− f(u)| ≤ K |x− u|, ∀ x, u ∈ A.

In the previous example, f(x) = 1/x is Lipschitz continuous on the set A = [a,∞), with

“Lipschitz constant” K = 1/a². It is an easy exercise in applying the definition to prove the

following:


Theorem 5.30. If f is Lipschitz continuous on A then f is also uniformly continuous on

A.

What’s nice about Lipschitz is that it has a geometrical interpretation, which uniform continuity doesn’t have: it says that the slopes of the secant lines to the graph of y = f(x) are uniformly bounded,

m = | (f(u) − f(x)) / (u − x) | ≤ K, ∀x, u ∈ A, x ≠ u.

Exercise 5.31. Show that f(x) =√x is continuous on [0,∞) but not Lipschitz continuous.

[Hint: assume it is, and get a contradiction when u = 0.]

We know that the function f(x) = 1/x is continuous everywhere on its domain (it’s a

rational function,) but it seems we can only show it’s uniformly continuous if the domain

excludes a neighborhood of the bad point u = 0. How would we show that it’s not uniformly

continuous? We need to invert the definition.

Lemma 5.32. Let f : A ⊆ R → R. f is not uniformly continuous on A if and only if:

∃ε > 0 such that ∀δ > 0 ∃x, u ∈ A with |x − u| < δ and |f(x) − f(u)| ≥ ε.

That is, to show a function is not uniformly continuous, we need to find, for each δ > 0,

pairs of points which are close to within distance δ (|x− u| < δ) for which the y-values are

not close (|f(x) − f(u)| ≥ ε). Applying this criterion for a sequence of δ = δn = 1/n, n ∈ N, we have:

Lemma 5.33. Let f : A ⊆ R → R. f is not uniformly continuous on A if and only if:

∃ε > 0 and a pair of sequences (xn)n∈N, (un)n∈N with xn, un ∈ A, (xn − un) → 0 as n → ∞, and |f(xn) − f(un)| ≥ ε ∀n ∈ N.

Let’s try these on f(x) = 1/x, A = (0, 1). We know the problem is near u = 0. Setting up Lemma 5.32, given δ with 0 < δ < 1, let u = δ and x = δ/2, so |u − x| = δ/2 < δ. And then, by the calculation (5.2),

|f(x) − f(u)| = |x − u| / (xu) = (δ/2) / (δ²/2) = 1/δ ≥ 1,

so by Lemma 5.32 f is not uniformly continuous on A = (0, 1).


Alternatively, we could use Lemma 5.33, with a pair of sequences. Take sequences which both converge to zero, say xn = 2^{−n}, un = 2^{−n−1}, so |xn − un| = 2^{−n−1} → 0 as n → ∞, but

|f(xn) − f(un)| = |xn − un| / (xn un) = 2^{−n−1} / 2^{−2n−1} = 2^n ≥ 2.

Whichever way you prefer, we obtain the non-uniform continuity of f on A.
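A numerical sketch of the sequence argument (illustration only): the points xn = 2^{−n} and un = 2^{−n−1} get arbitrarily close together, yet f(xn) and f(un) stay far apart for f(x) = 1/x.

# Sketch: pairs (x_n, u_n) with x_n - u_n -> 0 but |f(x_n) - f(u_n)| large,
# for f(x) = 1/x on (0, 1).
f = lambda x: 1.0 / x
for n in range(1, 8):
    x_n, u_n = 2.0 ** (-n), 2.0 ** (-n - 1)
    print(n, x_n - u_n, abs(f(x_n) - f(u_n)))   # gap shrinks, |f difference| = 2^n grows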

Is there a way to guarantee that a continuous function is in fact uniformly continuous?

Yes, provided the domain is a closed and bounded set:

Theorem 5.34. If f is continuous on the interval [a, b] (closed & bounded), then f is also

uniformly continuous on [a, b].

Example 5.35. f(x) =√x is uniformly continuous on [0, 1]. We already showed it was con-

tinuous, so the upgrade to uniform continuity follows directly from the Theorem. However,

from Exercise 5.31 it is not Lipschitz continuous, so this gives an example of how a function

can be uniformly, but not Lipschitz, continuous.

Proof. To obtain a contradiction, assume that f is continuous on A = [a, b] but not uniformly. By Lemma 5.33 there exist ε > 0 and a pair of sequences (xn)n∈N, (un)n∈N with xn, un ∈ A, and (un − xn) → 0, but |f(un) − f(xn)| ≥ ε > 0, ∀n ∈ N. By our old friends Bolzano and Weierstrass, there exist a subsequence and u ∈ [a, b] for which lim_{k→∞} unk = u. Next, consider (xnk)k∈N, with the same indices nk as for u. If a sequence converges, then we know that all subsequences converge to the same limit, so (unk − xnk) → 0 as k → ∞. Therefore,

xnk = unk − (unk − xnk) → u as k → ∞,

so the subsequences (xnk)k∈N, (unk)k∈N both converge to the same u ∈ [a, b]. Since f is continuous,

lim_{k→∞} ( f(unk) − f(xnk) ) = f(u) − f(u) = 0,

which contradicts |f(un) − f(xn)| ≥ ε > 0, ∀n ∈ N.

Example 5.36. Show that f(x) = ∛x is uniformly continuous on [0,∞).

Since a³ − b³ = (a − b)(a² + ab + b²), if one of x, u ≠ 0,

|f(x) − f(u)| = |∛x − ∛u| = |x − u| / (x^{2/3} + x^{1/3}u^{1/3} + u^{2/3}).

So if both x, u ≥ 1, we have

|f(x) − f(u)| ≤ (1/3)|x − u|,


and f is Lipschitz continuous on [1,∞), and therefore uniformly continuous there. Therefore, ∀ε > 0, ∃δ1 > 0 for which x, u ∈ [1,∞) with |x − u| < δ1 implies |f(x) − f(u)| < ε.

Next, for fixed u ∈ (0, 2], f is continuous at u since ∀ε > 0,

|f(x) − f(u)| ≤ (1/u^{2/3}) |x − u| < ε, for all x > 0 with |x − u| < u^{2/3} ε.

And when u = 0, |f(x) − f(0)| = ∛x < ε whenever x > 0 and |x − 0| < ε³, so f is continuous on the closed interval [0, 2]. By Theorem 5.34, f is then uniformly continuous on [0, 2], so ∃δ0 > 0 so that ∀x, u ∈ [0, 2] with |x − u| < δ0 we have |f(x) − f(u)| < ε.

Since any x, u ∈ [0,∞) with |x − u| < 1 both lie either in [0, 2] or in [1,∞), if we let δ = min{δ0, δ1, 1}, the definition of uniform continuity is satisfied with this δ.


6 Differentiability

You know all about the derivative from first-year calculus. But here we’re interested in

understanding the definition and what it says about functions, not so much about calculating

derivatives in particular examples. Much of what we’ll do here actually is proven in Stewart’s

book, but now we will try to put things together so they give a more complete picture.

6.1 Differentiable Functions

Throughout the section we will assume f : I ⊆ R → R, where the domain of f is an

interval. This could be an open interval, I = (a, b) or (a,∞) or (−∞, b); or a closed

interval, I = [a, b] or [a,∞) or (−∞, b]; or neither, I = (a, b] or I = [a, b).

Definition 6.1. Assume f : I ⊆ R → R and c ∈ I. We say f is differentiable at c, with derivative f ′(c), if lim_{x→c} (f(x) − f(c))/(x − c) = f ′(c). In other words,

∀ε > 0, ∃δ > 0 so that ∀x ∈ I with 0 < |x − c| < δ, | (f(x) − f(c))/(x − c) − f ′(c) | < ε.

Remark 6.2. Notice that if c is an endpoint of the interval I, then this is actually a one-sided

limit. For instance, if I = [a, b], and c = a (left endpoint), then x ∈ I with 0 < |x − c| < δ

means c = a < x < a + δ (with x ≤ b), so the limit is from the right only.

If f is differentiable at all c ∈ I, we say the function f is differentiable on I.

We want to think of differentiability as a property which certain functions have, like

continuity or uniform continuity. The existence of the derivative says something nontrivial

about the smoothness of the graph y = f(x): it says that there is a well-defined tangent line.

And the property of differentiability is stronger than continuity, in the sense that it is more

special:

Theorem 6.3. Suppose f is differentiable at c ∈ I. Then, f is continuous at c.

Proof. Take any x ∈ I, x ≠ c. Then,

f(x) − f(c) = [ (f(x) − f(c))/(x − c) ] (x − c).

Since f is differentiable, both terms in the product on the right-hand side have limits as x → c, and therefore the limit exists:

lim_{x→c} [f(x) − f(c)] = lim_{x→c} [ (f(x) − f(c))/(x − c) ] (x − c) = f ′(c) · 0 = 0.

In particular lim_{x→c} f(x) = f(c), so (by definition) f is continuous at c.


Example 6.4. Let f(x) = |x|, which is continuous on R. (Check it with the definition!) At c = 0, f is not differentiable: for x ≠ 0,

(f(x) − f(0))/(x − 0) = |x|/x = sgn x.

The limit as x → 0 does not exist. (We did this example back in the beginning of Section 5.)

Be careful! When f is differentiable, f itself must also be continuous, but f ′(x) might be discontinuous! For example

f(x) = { x² sin(1/x), if x ≠ 0; 0, if x = 0, }    (6.1)

is differentiable at all x ∈ R, including c = 0. For x ≠ 0, use the usual rules from calculus,

f ′(x) = 2x sin(1/x) − cos(1/x).

When x = 0, we use the definition,

f ′(0) = lim_{x→0} (f(x) − f(0))/(x − 0) = lim_{x→0} x sin(1/x) = 0,

(by the Squeeze Theorem!) So f is differentiable at c = 0. However, f ′(x) has no limit as x → 0, so f ′(x) is discontinuous at c = 0.
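A numerical sketch of this phenomenon (illustration only; the helper names f and fprime are ad hoc): the difference quotients of f at 0 shrink to 0, while the derivative values f ′(x) = 2x sin(1/x) − cos(1/x) keep oscillating between roughly −1 and 1 as x → 0.

import math

# Sketch: f(x) = x^2 sin(1/x) (with f(0) = 0) has f'(0) = 0, yet f' oscillates near 0.
def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):                       # valid for x != 0, by the calculus rules above
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

for k in range(1, 6):
    h = 10.0 ** (-k)
    print(h, (f(h) - f(0)) / h, fprime(h))
# The middle column (difference quotient) tends to 0; the last column keeps
# swinging through values near +-1, so f' has no limit at 0.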

The verification of all the usual “rules” of differential calculus (product, quotient, chain

rule) may now be done in a logical, rigorous fashion. The only one which is a little subtle is

the chain rule.

Theorem 6.5 (Chain Rule). Suppose I, J ⊆ R are intervals, g : J → R, f : I → R,

f(I) ⊆ J . Assume f is differentiable at c ∈ I, and g is differentiable at d = f(c) ∈ J . Then

h(x) = g(f(x)) is differentiable at x = c, and

h′(c) = g′(f(c)) f ′(c).

This follows directly from the definition of differentiability.

Proof of the Chain Rule. The key is the following observation: since g(y) is differentiable at y = d, the function

ϕ(y) = { (g(y) − g(d))/(y − d), if y ≠ d; g′(d), if y = d, }

is continuous at y = d. Then, for x ≠ c,

(h(x) − h(c))/(x − c) = [ (g(f(x)) − g(d))/(f(x) − d) ] · [ (f(x) − f(c))/(x − c) ] = ϕ(f(x)) · [ (f(x) − f(c))/(x − c) ],

and the final expression makes sense (and the identity still holds) even at points where f(x) = d, since then both sides are zero.


By the continuity of ϕ(y) and the differentiability of f at x = c, each term in the product

converges as x→ c, and hence

h′(c) = limx→c [h(x) − h(c)]/(x − c) = g′(f(c)) f ′(c).

The following should be a very familiar result; we will need it in a crucial way in the next

part:

Theorem 6.6. Suppose f attains its maximum at c ∈ (a, b), i.e., f(c) ≥ f(x), ∀x ∈ (a, b). If

f is differentiable at x = c, then f ′(c) = 0.

Of course, the same applies for minima, f(c) ≤ f(x), ∀x ∈ (a, b).

Proof. For all x ∈ (a, c), x < c and f(x) ≤ f(c), so [f(x) − f(c)]/(x − c) ≥ 0. Since f is differentiable,

f ′(c) = limx→c− [f(x) − f(c)]/(x − c) ≥ 0.

On the other hand, for x ∈ (c, b), x > c and f(x) ≤ f(c), so [f(x) − f(c)]/(x − c) ≤ 0, and we also have

f ′(c) = limx→c+ [f(x) − f(c)]/(x − c) ≤ 0.

The only possibility is f ′(c) = 0.

Remark 6.7. Notice that if the maximum is attained at the right endpoint, c = b, then

the first half of the above is still valid, and we conclude that f ′(c) ≥ 0 (f increases towards

its max at the right endpoint.) And if the max occurs on the left endpoint, c = a, then

the second half of the argument gives us f ′(c) ≤ 0 (and f decreases away from its max on

the left endpoint.) For minima, the inequalities are reversed: g′(a) ≥ 0 or g′(b) ≤ 0 if the

minimum occurs on an endpoint.

As a first application of this principle, we prove the following property of the derivative

due to Darboux.

Theorem 6.8 (Darboux’s Theorem). Suppose f is differentiable on [a, b], and there exists

a constant λ with f ′(a) < λ < f ′(b). Then ∃ c ∈ (a, b) with f ′(c) = λ.

The same conclusion holds if the order of the values is reversed, and f ′(b) < λ < f ′(a);

just consider g(x) = −f(x) and µ = −λ, and apply the Theorem to g and µ.


Proof. Let g(x) = f(x)−λx, so g is differentiable on [a, b], g′(x) = f ′(x)−λ, and g′(a) < 0 <

g′(b). Therefore, it’s enough to show that ∃c ∈ (a, b) with g′(c) = 0. Since g is differentiable

on [a, b], by Theorem 6.3 it is continuous, and so by Theorem 5.3.4 in the textbook it attains

its minimum value in [a, b]. By Remark 6.7, the minimum cannot occur on the endpoints

x = a, b, since g′(a) < 0 < g′(b). So the minimum must be attained in the open interval,

c ∈ (a, b), and so we are done, by Theorem 6.6.

Remember from the example (6.1) above that f ′(x) might be discontinuous. However,

Darboux’s Theorem says that whether f ′ is continuous or not, it satisfies the Intermediate

Value property. In particular, this says that f ′(x) cannot have a jump discontinuity, only

discontinuities in which at least one of the one-sided limits doesn’t exist (as in the example

(6.1) above.)

6.2 The Mean Value Theorem and its Consequences

Theorem 6.9 (Lagrange’s Mean Value Theorem). Suppose f is continuous on [a, b] and

differentiable on (a, b). Then, there exists c ∈ (a, b) so that

f(b)− f(a) = f ′(c)(b− a).

Equivalently, there exists c ∈ (a, b) so that

[f(b) − f(a)]/(b − a) = f ′(c).

The left-hand side gives the slope of the secant line through the points (a, f(a)) and (b, f(b))

on the graph y = f(x), while the right-hand side gives the slope of the tangent line to the

graph at x = c.

There is a somewhat stronger version of the Mean Value Theorem, due to Cauchy:

Theorem 6.10 (Cauchy’s Mean Value Theorem). Assume f, g are both continuous on [a, b]

and differentiable on (a, b), and in addition assume that g′(t) ≠ 0, ∀t ∈ (a, b). Then, ∃c ∈ (a, b) with

[f(b) − f(a)]/[g(b) − g(a)] = f ′(c)/g′(c). (6.2)

This has a similar geometrical interpretation to Lagrange's MVT: consider the path r(t) = (g(t), f(t)), t ∈ [a, b], in the plane. The left-hand side of (6.2) is the slope of the secant line connecting the endpoints (g(a), f(a)) and (g(b), f(b)), and the right-hand side is the slope of the tangent line to the path at t = c.

It is easy to see that the Lagrange MVT is a corollary of Cauchy’s MVT, obtained by

choosing the function g(t) = t. To prove the MVT we start with the following special case:


Theorem 6.11 (Rolle’s Theorem). Suppose f is continuous on [a, b] and differentiable on

(a, b), with f(a) = 0 = f(b). Then there exists c ∈ (a, b) with f ′(c) = 0.

So between any two zeros of a differentiable function, there must be a critical point!

Proof of Rolle’s Theorem. If f(x) = 0 ∀x ∈ [a, b] we are done, since f ′(c) = 0 ∀c ∈ (a, b).

So assume f is not the constant function zero, and so it must be either positive or negative

somewhere. As f is continuous on [a, b], by Theorem 5.3.4 in the textbook, it attains

its maximum and minimum values in [a, b]. Since f(a) = f(b) = 0, either the (positive)

maximum or the (negative) minimum (or both) must be attained in the interior, at some

c ∈ (a, b). By Theorem 6.6, f ′(c) = 0.

The two Mean Value Theorems are merely Rolle’s Theorem if we use the secant line to

the curve as the horizontal axis. That is, we subtract off the equation of the secant, and

apply Rolle’s Theorem:

Proof of Cauchy's MVT. First, we claim that g(a) ≠ g(b). Indeed, if it were true that g(a) = g(b), then by Rolle's Theorem (applied to g(t) − g(a)) we would have g′(c) = 0 for some c ∈ (a, b), but we have assumed that g′(t) ≠ 0, ∀t ∈ (a, b).

Then let

h(t) = f(t) − f(a) − [(f(b) − f(a))/(g(b) − g(a))] (g(t) − g(a)).

Then h is continuous on [a, b] and differentiable on (a, b), with h(a) = 0 = h(b), so by Rolle's Theorem, ∃c ∈ (a, b) with h′(c) = 0, that is

0 = f ′(c) − [(f(b) − f(a))/(g(b) − g(a))] g′(c),

which is (6.2).

Almost every important tool you learned in calculus is related to the Mean Value Theo-

rem; many are actually more or less equivalent to the Mean Value Theorem!

Here are some easy consequences:

Proposition 6.12. Suppose f is continuous on [a, b] and differentiable on (a, b).

(a) If f ′(x) ≥ 0 ∀x ∈ (a, b), then f is monotone increasing on [a, b]. (Strictly monotone

increasing if f ′(x) > 0 ∀x.)

(b) If f ′(x) ≤ 0 ∀x ∈ (a, b), then f is monotone decreasing on [a, b]. (Strictly monotone decreasing if f ′(x) < 0 ∀x.)

(c) If f ′(x) = 0 ∀x ∈ (a, b), then f(x) is constant in [a, b].


Of course you've known forever that if f is constant, then f ′(x) = 0 ∀x; part (c) is the

converse. It says that the only functions whose derivatives are zero are the constants.

Proof. Let a ≤ x < u ≤ b, and apply the MVT in the interval [x, u]: we get c ∈ (x, u) with

f(u) − f(x) = f ′(c)(u − x).

Since (u − x) > 0, the sign of the term on the right is given by the sign of f ′(c), and that proves (a), (b). For (c), this says f(x) = f(u) for any pair x, u ∈ [a, b], which is another way of saying f(x) is a constant function: it always takes on the same value, independent of x.

We can also connect back to the concepts of uniform continuity and Lipschitz continuity,

from the last section.

Example 6.13. Suppose f is differentiable on an interval I ⊆ R, and f ′(x) is a bounded

function on I; that is, ∃B ≥ 0 with

|f ′(x)| ≤ B ∀x ∈ I. (6.3)

Show that f is Lipschitz continuous on I.

Take any x, u ∈ I, and apply the MVT in the interval between them: ∃c ∈ I between x, u

for which f(x)− f(u) = f ′(c)(x− u). Taking the absolute value and using the boundedness

condition (6.3),

|f(x)− f(u)| = |f ′(c)| |x− u| ≤ B |x− u|,

which holds ∀x, u ∈ I. Hence f is Lipschitz continuous on I, and therefore it is uniformly

continuous on I.
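As a concrete (and purely numerical) check of this example, here is a short Python sketch for f(x) = sin x, where we may take B = 1 in (6.3) since |f ′(x)| = |cos x| ≤ 1; sampling points is of course only an illustration, not a proof.

```python
import math
import random

# For f(x) = sin x we may take B = 1 in (6.3), since |f'(x)| = |cos x| <= 1.
# Sample random pairs and confirm the Lipschitz estimate |f(x) - f(u)| <= B|x - u|.
B = 1.0
random.seed(0)
for _ in range(10_000):
    x, u = random.uniform(-50, 50), random.uniform(-50, 50)
    assert abs(math.sin(x) - math.sin(u)) <= B * abs(x - u) + 1e-12  # tiny slack for rounding
print("Lipschitz estimate held at all sampled pairs")
```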

Here is an old friend from Calculus, which is easily proven using the Cauchy MVT. Note

that in this formulation we consider a one-sided limit x→ a+, and do not need to know that

the functions are differentiable at the endpoint x = a:

Theorem 6.14 (L'Hopital's Rule). Assume both f, g are differentiable on (a, b), and g′(x) ≠ 0, ∀x ∈ (a, b). Suppose that

limx→a+ f(x) = 0 = limx→a+ g(x), and ∃ L ∈ R with limx→a+ f ′(x)/g′(x) = L.

Then,

limx→a+ f(x)/g(x) = L.


Proof. By hypothesis, ∀ε > 0, ∃δ > 0 so that ∀x ∈ (a, a + δ),

L − ε < f ′(x)/g′(x) < L + ε. (6.4)

Take any s, t with a < s < t < a + δ. As in the proof of CMVT, by Rolle's Theorem and g′(x) ≠ 0, we have g(s) ≠ g(t), and by the CMVT, ∃c ∈ (s, t) ⊂ (a, a + δ) with

[f(t) − f(s)]/[g(t) − g(s)] = f ′(c)/g′(c).

Substituting into (6.4),

L − ε < f ′(c)/g′(c) = [f(t) − f(s)]/[g(t) − g(s)] < L + ε.

For t fixed, let s → a+, so we have f(s), g(s) → 0 (by hypothesis), and

L − ε ≤ f(t)/g(t) ≤ L + ε,

for all t ∈ (a, a + δ). (Note that g(t) ≠ 0 here: since g′ never vanishes, Darboux's Theorem implies g′ has only one sign, so g is strictly monotone on (a, b); together with g(s) → 0 as s → a+, this forces g(t) ≠ 0 for t > a.) By definition, limt→a+ f(t)/g(t) = L.
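For instance (a hypothetical numerical check, not required by the proof): with f(x) = 1 − cos x and g(x) = x², both tending to 0 as x → 0+, the rule predicts f(x)/g(x) → 1/2, since f ′(x)/g′(x) = sin x/(2x) → 1/2. A small Python sketch:

```python
import math

# f(x) = 1 - cos x and g(x) = x^2 both tend to 0 as x -> 0+;
# f'(x)/g'(x) = sin x / (2x) -> 1/2, so L'Hopital's Rule predicts f(x)/g(x) -> 1/2.
f = lambda x: 1 - math.cos(x)
g = lambda x: x * x
for x in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(x, f(x) / g(x), math.sin(x) / (2 * x))
```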

6.3 Taylor’s Theorem

Actually, Taylor’s Theorem also qualifies as a direct consequence of the Mean Value Theorem,

but it deserves its own subsection!

Definition 6.15. Let I ⊆ R be an interval.

We say f is continuously differentiable on I, and write f ∈ C1(I), if f is differentiable

at all x ∈ I and f ′(x) is continuous on I (written f ′ ∈ C(I) or C0(I).)

We say f is n-times continuously differentiable on I, f ∈ Cn(I), if the nth derivative, f^(n)(x) = (dⁿ/dxⁿ) f(x), exists at every x ∈ I and is continuous on I (that is, f^(n) ∈ C(I).)

By Theorem 6.3, if f ∈ Cn(I) then automatically each f^(k)(x) is continuous on I, k = 0, 1, 2, . . . , n − 1. So Cn(I) ⊂ Ck(I) for each k = 0, 1, 2, . . . , n − 1. Recalling Remark 6.2,

continuity and differentiability on the endpoints of the interval I are interpreted in terms of

one-sided limits.

If f is n times differentiable at x0 ∈ I, we may define the nth-order Taylor polynomial,

based at x0,

Pn(x) = Pn(x; x0) := f(x0) + f ′(x0)(x − x0) + (1/2!) f ′′(x0)(x − x0)² + · · · + (1/n!) f^(n)(x0)(x − x0)^n
      = ∑_{k=0}^{n} (1/k!) f^(k)(x0)(x − x0)^k. (6.5)


We say that f and Pn agree to nth order at x0:

f(x0) = Pn(x0), f ′(x0) = Pn′(x0), . . . , f^(n)(x0) = Pn^(n)(x0).

In fact, Pn is the unique polynomial of degree at most n which agrees with f to order n at x0.

If we stop at n = 1, then P1(x) is the equation of the tangent line to the graph y = f(x),

an old friend from calculus. The tangent line is the best linear approximation to the graph for

x near x0, and we expect Pn to be the best polynomial of order n to approximate the values

of f(x), for x near x0. The precision of this approximation is given by Taylor’s Theorem:

Theorem 6.16 (Taylor’s Theorem). Suppose I ⊆ R is an interval, f ∈ Cn(I), and f (n+1)(x)

exists for all x ∈ I. Then, for any x0, x ∈ I, ∃c ∈ I with c between x0 and x, so that:

f(x) = Pn(x; x0) + [1/(n + 1)!] f^(n+1)(c) (x − x0)^(n+1).

The remainder term Rn(x; x0) = [1/(n + 1)!] f^(n+1)(c) (x − x0)^(n+1) is the error made by approximating the value of f(x) by Pn(x; x0).

Proof. This proof uses a trick. Let x0, x ∈ I be fixed, and consider (for variable t ∈ I and thinking of x, x0 as constants) the auxiliary function

F(t) = f(x) − f(t) − ∑_{k=1}^{n} (1/k!)(x − t)^k f^(k)(t).

We note that F(x0) = f(x) − Pn(x; x0), which is the remainder term in the Taylor approximation of f.

By the hypotheses on f, F(t) is differentiable for all t ∈ I, and we calculate:

F ′(t) = −f ′(t) − ∑_{k=1}^{n} [ (−k/k!)(x − t)^(k−1) f^(k)(t) + (1/k!)(x − t)^k f^(k+1)(t) ]
       = ∑_{j=0}^{n−1} (1/j!)(x − t)^j f^(j+1)(t) − ∑_{k=0}^{n} (1/k!)(x − t)^k f^(k+1)(t)   (re-indexing the first sum with k = j + 1)
       = −(1/n!)(x − t)^n f^(n+1)(t).

Now, define another function

G(t) = F(t) − [(x − t)/(x − x0)]^(n+1) F(x0),

which is also differentiable on I, with G(x) = 0 = G(x0). Applying Rolle's Theorem (or the LMVT), ∃c lying between x, x0 with G′(c) = 0. Hence,

0 = G′(c) = F ′(c) + (n + 1) [(x − c)^n/(x − x0)^(n+1)] F(x0),


and rearranging the above we get

F(x0) = −[1/(n + 1)] [(x − x0)^(n+1)/(x − c)^n] F ′(c)
      = [1/(n + 1)] [(x − x0)^(n+1)/(x − c)^n] (1/n!)(x − c)^n f^(n+1)(c)
      = [1/(n + 1)!] (x − x0)^(n+1) f^(n+1)(c),

which is what we had to prove.
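To see the theorem "in action" numerically, here is a small Python sketch (an illustration under the stated hypotheses, not part of the proof): for f(x) = e^x at x0 = 0, every derivative equals 1, and the remainder is bounded by e^{|x|} |x|^(n+1)/(n + 1)! since the unknown c lies between 0 and x.

```python
import math

def taylor_poly(derivs_at_x0, x0, x):
    # Evaluate P_n(x; x0) from the list [f(x0), f'(x0), ..., f^(n)(x0)], as in (6.5).
    return sum(d * (x - x0) ** k / math.factorial(k) for k, d in enumerate(derivs_at_x0))

# For f(x) = e^x at x0 = 0 every derivative is 1, and Taylor's Theorem bounds the
# remainder by e^{|x|} |x|^{n+1} / (n+1)!.
x = 0.7
for n in range(1, 7):
    Pn = taylor_poly([1.0] * (n + 1), 0.0, x)
    actual = abs(math.exp(x) - Pn)
    bound = math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)
    print(n, actual, bound, actual <= bound)
```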

Example 6.17. Show that

x − x³/6 ≤ sin x ≤ x, ∀x ∈ [0, π/2].

Use Taylor, based at x0 = 0, but which order n? f(x) = sin x is C∞(R); it is differentiable to every order. Since the highest order in the estimate is x³, let's try n = 2 (and the cubic will come from the remainder.) Since P2(x; 0) = x, by Taylor's Theorem, ∀x ∈ [0, π/2], ∃c = c(x) with 0 < c < x and

sin x = P2(x; 0) + (1/3!) f ′′′(c)(x − 0)³ = x − (1/6) cos(c) x³.

Since c ∈ (0, π/2), 0 < cos(c) < 1, and therefore we obtain the desired estimates.
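A quick numerical spot-check of this inequality (a Python sketch; the Taylor argument above is the actual proof):

```python
import math

# Sample many points of [0, pi/2] and check  x - x^3/6 <= sin x <= x.
N = 10_000
for i in range(N + 1):
    x = (math.pi / 2) * i / N
    lower, upper = x - x ** 3 / 6, x
    # Allow a tiny slack for floating-point rounding near x = 0, where the
    # two sides agree to within machine precision.
    assert lower - 1e-15 <= math.sin(x) <= upper + 1e-15
print("Inequality verified at", N + 1, "sample points")
```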

Here’s a question: assume f is C∞(R), that is all the derivatives f (k)(x) exist for x ∈ R,

and at x = 0 all derivatives f (k)(0) = 0. Is it true that f(x) ≡ 0, the zero function?

Example 6.18. Define a function

g(x) = e^(−1/x) if x > 0, and g(x) = 0 if x ≤ 0.

As an Exercise, use Taylor's Theorem to prove that ∀n ∈ N,

e^(1/x) ≥ (1/n!) x^(−n), ∀x > 0. (6.6)

[Hint: use u = 1/x and apply Taylor to e^u for u ∈ [0,∞).]

As a consequence of (6.6), for any k ∈ N, it is another Exercise to show that

limx→0 g(x)/x^k = 0, ∀k. (6.7)

[Hint: use the Squeeze Theorem.]


It turns out that g is C∞(R). This is not hard to prove, but is a bit long (there is a sketch

below.) Let’s assume this, and verify that for every n = 0, 1, 2, . . . , the Taylor polynomial

Pn(x; 0) ≡ 0. So every derivative g(n)(0) = 0, yet g is not the constant zero function. We

use induction on n. When n = 0, P0(x; 0) = g(0) = 0. Assume Pn−1(x; 0) ≡ 0, so that Pn(x; 0) = an x^n with an = g^(n)(0)/n!. By Taylor's Theorem, ∀x > 0, ∃c ∈ (0, x) so that

g(x) = Pn(x; 0) + [1/(n + 1)!] g^(n+1)(c) x^(n+1) = an x^n + [1/(n + 1)!] g^(n+1)(c) x^(n+1).

By (6.7),

0 = limx→0+ g(x)/x^n = limx→0+ [ an + (1/(n + 1)!) g^(n+1)(c) x ] = an,

since 0 < c < x and we are assuming g^(n+1) is continuous. Thus Pn(x; 0) ≡ 0 and g^(n)(0) = 0.

By induction we conclude this is true ∀n ∈ N.

Thus, g(x) is an example of a function which vanishes to all orders at x = 0, yet is not

the constant zero function.

[Here we sketch how to show that g ∈ C∞ in a neighborhood of zero. The first step is to verify, by induction, that for all n ∈ N there exists a polynomial Qn(u) so that, for x > 0, g is n-times differentiable (by the usual Chain and Product Rules) with

(dⁿ/dxⁿ) g(x) = (dⁿ/dxⁿ) e^(−1/x) = Qn(1/x) e^(−1/x) = Qn(1/x) g(x).

Notice that g′(x) = (1/x²) g(x), so Q1(u) = u². And when taking higher derivatives, (d/dx) Qn(1/x) is again a polynomial in u = 1/x.

Now use the definition of the derivative and (6.7) to show that

g^(n+1)(0) = limx→0+ [g^(n)(x) − 0]/(x − 0) = 0.

This verifies that g ∈ C∞(R), with each g^(n)(0) = 0.] ♦
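Numerically, the flatness of g at 0 is quite dramatic; the following Python sketch (just an illustration of (6.7), assuming nothing beyond the definition of g) shows g(x)/x^k collapsing to 0 even for large k.

```python
import math

def g(x):
    # The flat function of Example 6.18: e^(-1/x) for x > 0, and 0 for x <= 0.
    return math.exp(-1.0 / x) if x > 0 else 0.0

# g(x)/x^k -> 0 as x -> 0+ for every k, illustrating (6.7):
# e^(-1/x) vanishes faster than any power of x.
for k in [1, 5, 10]:
    for x in [1e-1, 5e-2, 2e-2, 1e-2]:
        print(k, x, g(x) / x ** k)
```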

Theorem 6.19 (Second Derivative Test). Assume that f is C2 for x ∈ (a, b), and c ∈ (a, b)

with f ′(c) = 0.

(a) If f ′′(c) > 0, then c is a strict local minimum for f : ∃δ > 0 so that f(c) < f(x), ∀x ∈ (c − δ, c + δ) with x ≠ c.

(b) If f ′′(c) < 0, then c is a strict local maximum for f : ∃δ > 0 so that f(c) > f(x), ∀x ∈ (c − δ, c + δ) with x ≠ c.


Proof. We prove (a); for (b) apply (a) to g(x) = −f(x).

Since by hypothesis f ′′(x) is continuous in (a, b) and f ′′(c) > 0, ∃δ > 0 (with (c − δ, c + δ) ⊂ (a, b)) so that f ′′(x) > 0 for all x ∈ (c − δ, c + δ). Let x ∈ (c − δ, c + δ) with x ≠ c. Applying Taylor's Theorem with n = 1, ∃u between x and c for which

f(x) = f(c) + f ′(c)(x − c) + (1/2) f ′′(u)(x − c)² = f(c) + (1/2) f ′′(u)(x − c)²,

since f ′(c) = 0. As u ∈ (c − δ, c + δ), f ′′(u) > 0, and therefore f(x) > f(c).

Taylor’s Theorem is also useful in Numerical Analysis, in which we want to solve prob-

lems in calculus or differential equations by approximation algorithms implemented on the

computer. To do this, we need to discretize the variables so as to only retain a finite number

of values to represent a function and its derivatives. The method of finite differences consists

in approximating the derivative by difference quotients, with values of f chosen on a grid of

x-values, with spacing xk − xk−1 = h between them.

One of the easiest approximations to the derivative at a point x is the "forward difference",

Dhf(x) = [f(x + h) − f(x)]/h, h > 0.

This is a “forward” difference since it compares f(x) with the next value forward f(x+ h).

You could also make the “backward difference”,

D−hf(x) = [f(x) − f(x − h)]/h, h > 0,

which is the same as using a negative value for h in the forward difference (and so its analysis

will be exactly the same as for Dhf(x).) Let’s assume that f is C2 in an open interval I

including x, and h is small enough so that x+ h ∈ I also. Then, applying Taylor’s Theorem

with n = 1 (i.e., with remainder term involving the second derivative f ′′), ∃c between x and x + h with:

f(x + h) = f(x) + f ′(x)[(x + h) − x] + (1/2!) f ′′(c)((x + h) − x)² = f(x) + h f ′(x) + (h²/2!) f ′′(c).

Rearranging,

Dhf(x) − f ′(x) = [f(x + h) − f(x) − h f ′(x)]/h = (h/2) f ′′(c).

Assume that we know that the second derivative is bounded,

sup_{x∈I} |f ′′(x)| ≤ M.


Then, we have the following estimate on the error made by approximating f ′(x) by forward differences:

|Dhf(x) − f ′(x)| ≤ (1/2) M h.

We say that the forward difference gives a first order accurate approximation to the derivative, because the error of approximation is at most proportional to the first power h = h¹ of the step size h.

Consider the centered difference approximation to the derivative,

Dchf(x) = [f(x + h) − f(x − h)]/(2h), h > 0.

It turns out that this is a better approximation to f ′(x), in the sense that it is second order accurate.
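The two difference quotients are easy to compare in practice. The following Python sketch (an illustration of the error estimates, not a proof) applies both to f(x) = sin x at x = 1, where the exact derivative is cos(1); reducing h by a factor of 10 should reduce the forward-difference error by about 10 and the centered-difference error by about 100.

```python
import math

def forward_diff(f, x, h):
    # D_h f(x) = (f(x+h) - f(x)) / h  -- first order accurate
    return (f(x + h) - f(x)) / h

def centered_diff(f, x, h):
    # D^c_h f(x) = (f(x+h) - f(x-h)) / (2h)  -- second order accurate
    return (f(x + h) - f(x - h)) / (2 * h)

x, exact = 1.0, math.cos(1.0)
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    ef = abs(forward_diff(math.sin, x, h) - exact)
    ec = abs(centered_diff(math.sin, x, h) - exact)
    print(f"h={h:.0e}  forward error={ef:.2e}  centered error={ec:.2e}")
```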

Exercise 6.20. Suppose f is three times differentiable in I, and ∃M ≥ 0 with the third

derivative uniformly bounded by M, that is,

sup_{x∈I} |f^(3)(x)| ≤ M.

Use Taylor's Theorem (with n = 2) to show that if both x + h, x − h ∈ I, then

| [f(x + h) − f(x − h)]/(2h) − f ′(x) | ≤ (1/6) M h².

As the question suggests, you will apply Taylor's Theorem with n = 2 (with remainder involving f ′′′) to obtain two values c+ lying between x and x + h, and c− lying between x − h and x, with

f(x ± h) = f(x) ± h f ′(x) + (1/2) h² f ′′(x) ± (1/6) h³ f ′′′(c±).

Now find |Dchf(x) − f ′(x)|, and finish as above.


7 The Fundamental Theorem of Calculus

Theorem 7.1 (Fundamental Theorem of Calculus, I). Assume that f is differentiable on

[a, b], and the derivative f ′(x) is Riemann integrable on [a, b]. Then,

f(b) − f(a) = ∫_a^b f ′(x) dx.

Proof. Let ε > 0 be any given value. Since f ′(x) is Riemann integrable, by the Darboux criterion for integrability there exists a partition of [a, b],

P = {a = x0 < x1 < · · · < xN = b},

so that 0 ≤ UP(f ′) − LP(f ′) < ε. By Theorem 6.3, since f is differentiable it is also continuous on [a, b]. We apply the Mean Value Theorem to f on each subinterval [xi−1, xi], i = 1, 2, . . . , N, of the partition: there exists ci ∈ (xi−1, xi) so that f ′(ci)(xi − xi−1) = f(xi) − f(xi−1), ∀i = 1, 2, . . . , N. We then use these values to make a "tagged" partition P and a Riemann sum,

S(f ′, P) = ∑_{i=1}^{N} f ′(ci)(xi − xi−1) = ∑_{i=1}^{N} [f(xi) − f(xi−1)]   (a telescoping sum!)
          = f(xN) − f(x0) = f(b) − f(a).

Since mi(P) ≤ f ′(ci) ≤ Mi(P), we have LP(f ′) ≤ S(f ′, P) ≤ UP(f ′) (this is true for any Riemann sum), and since f ′ is Riemann integrable we similarly have LP(f ′) ≤ ∫_a^b f ′(x) dx ≤ UP(f ′). By the choice of P, 0 ≤ UP(f ′) − LP(f ′) < ε, and so

|f(b) − f(a) − ∫_a^b f ′(x) dx| = |S(f ′, P) − ∫_a^b f ′(x) dx| < ε,

true for any ε > 0. This implies f(b) − f(a) − ∫_a^b f ′(x) dx = 0, as desired.

If g is a Riemann integrable function on [a, b], we may define a function f : [a, b] → R by integration,

f(x) = ∫_a^x g(t) dt. (7.1)

Exercise 7.2. For g Riemann integrable on [a, b], show that f (as defined above) is Lipschitz

continuous on [a, b].


Theorem 7.3 (Fundamental Theorem of Calculus, II). If g is a continuous function on [a, b]

then f(x) = ∫_a^x g(t) dt is differentiable on [a, b], with f ′(x) = g(x) ∀x ∈ [a, b].

For this version of FTC we need the Mean Value Theorem in a slightly different form:

Theorem 7.4 (MVT for Integrals). If g is a continuous function on [α, β], then there exists

c ∈ (α, β) with

g(c) = [1/(β − α)] ∫_α^β g(t) dt.

That is, a continuous g attains its mean (average) value on any closed interval. Notice

that if g(x) = f ′(x) for some differentiable function f , then (applying FTC I) we recover

Lagrange’s Mean Value Theorem for f !

Proof of MVT4I. Since g is continuous on [α, β], it attains its maximum and minimum values, so ∃u, v ∈ [α, β] with

g(u) ≤ g(t) ≤ g(v), ∀t ∈ [α, β].

By a property of the integral,

g(u)(β − α) = ∫_α^β g(u) dt ≤ ∫_α^β g(t) dt ≤ ∫_α^β g(v) dt = g(v)(β − α),

that is,

g(u) ≤ [1/(β − α)] ∫_α^β g(t) dt ≤ g(v).

So by the Intermediate Value Theorem, ∃c between u and v (and therefore c ∈ (α, β)) with

g(c) = [1/(β − α)] ∫_α^β g(t) dt.

Proof of FTC II. Take any s, x ∈ [a, b], x ≠ s. We look at the difference quotient for f, and apply a property of the Riemann integral:

[f(x) − f(s)]/(x − s) = [1/(x − s)] [ ∫_a^x g(t) dt − ∫_a^s g(t) dt ] = [1/(x − s)] ∫_s^x g(t) dt.

(Recall that we use the convention that ∫_x^s g(t) dt = −∫_s^x g(t) dt, so we don't need to worry whether x < s or s < x.) Since g is continuous on [a, b], we may use MVT4I to obtain c between x and s, with g(c) = [1/(x − s)] ∫_s^x g(t) dt. Therefore,

|[f(x) − f(s)]/(x − s) − g(x)| = |g(c) − g(x)|.


Since g is continuous at x, ∀ε > 0 ∃δ > 0 so that |g(y) − g(x)| < ε whenever y ∈ [a, b] and |y − x| < δ. As c lies between x and s, if |x − s| < δ then |c − x| < δ also, so

|[f(x) − f(s)]/(x − s) − g(x)| = |g(c) − g(x)| < ε, ∀s with 0 < |x − s| < δ.

By definition of the limit, f is differentiable at x ∈ [a, b], with f ′(x) = g(x).

Corollary 7.5. If g is continuous on [a, b], then g has an antiderivative, f (defined as in

(7.1)) which is differentiable on [a, b].

Remark 7.6. You may have noticed there is a little asymmetry between FTC I and FTC

II: in the first, we only assume f ′ is Riemann integrable, while in the second we ask for g to

be continuous. In fact, not every Riemann integrable function has an antiderivative at each

point in [a, b]: for example, if we take

g(t) = 1 if t ≥ 0, and g(t) = −1 if t < 0,

then g is Riemann integrable on [−1, 1], and f(x) = ∫_0^x g(t) dt = |x|, which fails to be

differentiable at x = 0. (It is Lipschitz continuous, as the Exercise above asks!)

Example 7.7. Define h(x) = ∫_0^{ln x} ln t dt. For which x is h differentiable, and what is h′(x)?

This is a composition,

h(x) = f(φ(x)) with φ(x) = ln x, and f(u) = ∫_0^u ln t dt.

As this is analysis (and not calculus) we need to be sure the appropriate Theorems are

applicable. f is differentiable for u ∈ (0,∞), and so FTC II applies on any closed interval

[a, b] as long as a > 0. By the Chain Rule (Theorem 6.5), we need for φ(x) to be differentiable

on an interval I with φ(I) ⊂ (0,∞), the domain where f is differentiable. Therefore, we need

to choose I with φ(x) = lnx ∈ (0,∞), so any I ⊂ (1,∞) will do. Hence, h is differentiable

in (1,∞), with

h′(x) = f ′(φ(x)) φ′(x) = ln(ln x)/x, x ∈ (1,∞).
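As a sanity check on this computation, here is a Python sketch that uses the elementary antiderivative ∫ ln t dt = t ln t − t (which tends to 0 as t → 0+), rather than the FTC machinery, to evaluate h, and compares a difference quotient of h with the formula ln(ln x)/x:

```python
import math

def f(u):
    # f(u) = integral of ln t dt from 0 to u, evaluated via t*ln(t) - t (limit 0 at t = 0+).
    return u * math.log(u) - u if u > 0 else 0.0

def h(x):
    # h(x) = f(ln x), meaningful here for x > 1 so that ln x > 0.
    return f(math.log(x))

# Compare a centered difference quotient of h with ln(ln x)/x on (1, infinity).
for x in [2.0, 3.0, 10.0]:
    dh = 1e-6
    numeric = (h(x + dh) - h(x - dh)) / (2 * dh)
    formula = math.log(math.log(x)) / x
    print(x, numeric, formula)
```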
