Introduction, Motivation, Basics About Sets, Functions, Counting

216
Lectures on Advanced Calculus with Applications, I audrey terras Math. Dept., U.C.S.D., La Jolla, CA 92093-0112 November, 2010 Part I Introduction, Motivation, Basics About Sets, Functions, Counting 1 Preface These notes come from various courses that I have taught at U.C.S.D. using Serge Lang’s Undergraduate Analysis as the basic text. My lectures are an attempt to make the subject more accessible. Recently Rami Shakarchi published Problems and Solutions for Undergraduate Analysis, which provides solutions to all the problems in Lang’s book. This caused me to collect my own exercises which are included. Exams are also to be found. The main difference between the approach of Lang and that of other similar books is the treatment of the integral which emphasizes the properties of the integral as a linear function from the set of piecewise continuous real valued functions on an interval to the real numbers. Thus the approach can be viewed as intermediate between the Riemann integral and the Lebesgue integral. Since we are interested mainly in piecewise continuous functions, we are really getting the Riemann integral. In these lectures we include more pictures and examples than the usual texts. Moreover, we include less definitions from point set topology. Our aim is to make sense to an audience of potential high school math teachers, or economists, or engineers. We did not write these lectures for potential math. grad students. We will always try to include examples, pictures and applications. Applications will include Fourier analysis, fractals, .... Warning to the reader: This course is to calculus as fixing a car is to driving a car. Moreover, sometimes the car is invisible because it is an infinitesimal car or because it is placed on the road at infinity. It is thus important to ask questions and do the exercises. A Suggestion: You should treat any mathematics course as a language course. This means that you must be sure to memorize the definitions and practice the new vocabulary every day. Form a study group to discuss the subject. It is always a good idea to look at other books too; in particular, your old calculus book. Another Warning: Also, beware of typos. I am a terrible proof reader. Your calculus class was probably one that would have made sense to Newton and Leibniz in the 1600s. However, that turned out not to be sufficient to figure out complicated problems. The basic idea of the real numbers was missing as well as a real understanding of the concept of limit. This course starts with the foundations that were missing in your calculus course. You may not see why you need them at first. Don’t be discouraged by that. Persevere and you will get to derivatives and integrals. We will assume that you know the basics of proofs, sets. Other References: Hans Sagan, Advanced Calculus Tom Apostol, Mathematical Analysis Dym & McKean, Fourier Series and Integrals 1.1 Some History Around the early 1800’s Fourier was studying heat flow in wires or metal plates. He wanted to model this mathematically and came up with the heat equation. Suppose that we have a wire stretched out on the x-axis from x =0 to x =1. Let 1

Transcript of Introduction, Motivation, Basics About Sets, Functions, Counting

Page 1: Introduction, Motivation, Basics About Sets, Functions, Counting

Lectures on Advanced Calculus with Applications, I

audrey terrasMath. Dept., U.C.S.D., La Jolla, CA 92093-0112

November, 2010

Part I

Introduction, Motivation, Basics About Sets,Functions, Counting1 Preface

These notes come from various courses that I have taught at U.C.S.D. using Serge Lang’s Undergraduate Analysis as thebasic text. My lectures are an attempt to make the subject more accessible. Recently Rami Shakarchi published Problemsand Solutions for Undergraduate Analysis, which provides solutions to all the problems in Lang’s book. This caused me tocollect my own exercises which are included. Exams are also to be found. The main difference between the approach of Langand that of other similar books is the treatment of the integral which emphasizes the properties of the integral as a linearfunction from the set of piecewise continuous real valued functions on an interval to the real numbers. Thus the approachcan be viewed as intermediate between the Riemann integral and the Lebesgue integral. Since we are interested mainly inpiecewise continuous functions, we are really getting the Riemann integral.In these lectures we include more pictures and examples than the usual texts. Moreover, we include less definitions

from point set topology. Our aim is to make sense to an audience of potential high school math teachers, or economists,or engineers. We did not write these lectures for potential math. grad students. We will always try to include examples,pictures and applications. Applications will include Fourier analysis, fractals, ....Warning to the reader: This course is to calculus as fixing a car is to driving a car. Moreover, sometimes the car is

invisible because it is an infinitesimal car or because it is placed on the road at infinity. It is thus important to ask questionsand do the exercises.A Suggestion: You should treat any mathematics course as a language course. This means that you must be sure

to memorize the definitions and practice the new vocabulary every day. Form a study group to discuss the subject. It isalways a good idea to look at other books too; in particular, your old calculus book.Another Warning: Also, beware of typos. I am a terrible proof reader.Your calculus class was probably one that would have made sense to Newton and Leibniz in the 1600s. However, that

turned out not to be sufficient to figure out complicated problems. The basic idea of the real numbers was missing as well as areal understanding of the concept of limit. This course starts with the foundations that were missing in your calculus course.You may not see why you need them at first. Don’t be discouraged by that. Persevere and you will get to derivatives andintegrals. We will assume that you know the basics of proofs, sets.Other References:Hans Sagan, Advanced CalculusTom Apostol, Mathematical AnalysisDym & McKean, Fourier Series and Integrals

1.1 Some History

Around the early 1800’s Fourier was studying heat flow in wires or metal plates. He wanted to model this mathematicallyand came up with the heat equation. Suppose that we have a wire stretched out on the x-axis from x = 0 to x = 1. Let

1

Page 2: Introduction, Motivation, Basics About Sets, Functions, Counting

u(x, t) represent the temperature of the wire at position x and time t. The heat equation is the PDE below, for t > 0and 0 < x < 1:

∂u

∂t= c

∂2u

∂t2.

Here c is a positive constant depending on the metal. If you are given an initial heat distribution f(x) on the wire at time0, then we have the initial condition: u(x, 0) = f(x) also.Fourier plugged in the function u(x, t) = X(x)T (t) and found that to for the solution to satisfy the initial condition he

needed to express f(x) as a Fourier series:

f(x) =∞∑

n=−∞ane

2πinx. (1)

Note that eix = cosx + isinx, where i = (−1)1/2 (which is not a real number). This means you can rewrite the series ofcomplex exponentials as 2 series - one involving cosines and the other involving sines. Fourier made the claim that anyfunction f(x) has such an expression as a sum of cnsin(nx) and dncos(nx). People took issue with this although they didbelieve in power series expressions of functions (Taylor series and Laurent expansions). But the conditions under which suchseries converge to the function were really unclear when Fourier first worked on the subject.Fourier tells us that the Fourier coefficients are

an =

1∫0

f(y)e−2πinydy. (2)

If you believe that it is legal to interchange sum and integral, then a bit of work will make you believe this, but unfortu-nately, that isn’t always legal when f is a bad guy. This left mathematicians in an uproar in the early 1800’s. And it tookat least 50 years to bring some order to the subject.Part of the problem was that in the early 1800’s people viewed integrals as antiderivatives. And they had no precise

meaning for the convergence of a series of functions of x such as the Fourier series above. They argued a lot. They wouldnot let Fourier publish his work until many years had passed. False formulas abounded. Confusion reigned supreme. So thiscourse was invented. We won’t have time to go into the history much, but it is fascinating. Bressoud, A Radical Approachto Real Analysis, says a little about the history. Another reference is Grattan-Guinness and Ravetz, Joseph Fourier. Stillanother is Lakatos, Proofs and Refutations.We will end up with a precise formulation of Fourier’s theorems. And we will be able to do many more things of interest

in applied mathematics. In order to do all this we need to understand what the real numbers are, what we mean by the limitof a sequence of numbers or of a sequence of functions, what we mean by derivatives and integrals. You may think that youlearned this in calculus, but unless you had an unusual calculus class, you just learned to compute derivatives and integralsnot so much how to prove things about them.Fourier series (and integrals) are important for all sorts of things such as analysis of time series, looking for periodicities.

The finite version leads to a computer algorithm called the fast Fourier transform, which has made it possible to do thingssuch as weather prediction in a reasonable amount of time. Matlab has a nice demo of the search for periodicities. Wemodified it in our book Fourier Analysis on Finite Groups and Applications to look for periodicities in LA yearly rainfall.The first answer I found was 12.67 years. See p. 159 of my book. Another version leads to the number 28.75 years.

2 Why Analysis? Some Motivation and a Look Forward

Almost any applied math. problem leads to an analysis question. Look at any book on mathematical methods of physicsand engineering. There are also many theoretical problems in computer science that lead to analysis questions. The samecan be said of economics, chemistry and biology. Here we list a few examples. We do not give all the details. The idea isto get a taste of such problems.Example 1. Population Growth Model - The Logistic Equation.References.I. Stewart, Does God Play Dice? The Mathematics of Chaos, p. 155.J. T. Sandefur, Discrete Dynamical Systems.

2

Page 3: Introduction, Motivation, Basics About Sets, Functions, Counting

Define the logistics function Lk(x) = kx(1 − x), for x ∈ [0, 1]. Here k is a fixed real number with 0 < k < 4. Letx0 ∈ [0, 1] be fixed. Form a sequence

x0, x1 = Lk(x0), x2 = Lk(x1), · · · , xn = Lk(xn−1), · · ·Question: What happens to xn as n →∞ ?The answer depends on k. For k near 0 there is a limit. For k near 4 the behavior is chaotic. Our course should give us

the tools to solve this sort of problem. Similar problems come from weather forecasting, orbits of asteroids. You can putthese problems on a computer to get some intuition. But you need analysis to prove that you intuition is correct (or not).

Example 2. Central Limit Theorem in Probability and Statistics.References.Feller, Probability TheoryDym and McKean, Fourier Series and Integrals, p. 114Terras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. I

Where does the bell shaped curve originate?

Figure 1: normal curve e−πx2

,−∞ < x <∞

The central limit theorem is the main theorem in probability and statistics. It is the foundation for the chi-squared test.In the language of probability, it says the following.Central Limit Theorem I. Let Xn be a sequence of independent identically distributed random variables with density

f(x) normalized to have mean 0 and standard deviation 1. Then, as n→∞, the normalized sum of these variables

X1 + · · ·+Xn√n

→n→∞ the normal distribution with density G(x), where G(x) =

1√2πe−x

2/2.

Here ”→ ” means approaches. To translate this into analysis, we need a definition.

Definition 1 For integrable functions f and g:R→ R, define the convolution f ∗ g ( f "splat" g) to be

(f ∗ g) =∞∫

−∞f(y)g(x− y)dy.

3

Page 4: Introduction, Motivation, Basics About Sets, Functions, Counting

Then we have the analysis version of the central limit theorem.Central Limit Theorem II. Suppose that f : R → [0,∞) is a probability density normalized to have mean 0 and

standard deviation 1. This means that

∞∫−∞

f(x)dx = 1,

∞∫−∞

xf(x)dx = 0,

∞∫−∞

x2f(x)dx = 1.

Then we have the following limit as n→∞b√n∫

a√n

(f ∗ · · · ∗︸ ︷︷ ︸ fn

)(x)dx →n→∞

1√2π

b∫a

e−x2/2dx.

Example 3. Surprising Formulas.a) Riemann zeta function.References.Lang, Undergraduate AnalysisEdwards, Riemann’s Zeta FunctionTerras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. I

Definition 2 The Riemann zeta function ζ(s) is defined for s > 1 by

ζ(s) =∑n≥1

n−s.

Euler proved ζ(2) = π2/6 and similar formulas for ζ(2n), n = 1, 2, 3, .... .

b) Gamma Function.References.Lang, Undergraduate AnalysisEdwards, Riemann’s Zeta FunctionTerras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. I

Definition 3 For s > 0, define the gamma function by Γ(s) =

∞∫0

e−yys−1dy.

Then we can get n factorial from gamma:

Γ(n+ 1) = n! = n(n− 1)(n− 2) · · · 1.

Another result says

Γ(1/2) =√π.

c) Theta Function.References.Lang, Undergraduate AnalysisTerras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. IOne of the Jacobi identities says that for t > 0, we have

θ(t) =∞∑

n=−∞e−πt

2

=

√π

tθ(1

t).

4

Page 5: Introduction, Motivation, Basics About Sets, Functions, Counting

This is a rather unexpected formula - a hidden symmetry of the theta function. It implies (as Riemann showed in apaper published in 1859) that the Riemann zeta function also has a symmetry, relating ζ(s) with ζ(1− s).

d) Famous Inequalities.i) Cauchy-Schwarz InequalityReference.Lang, Undergraduate Analysis

Suppose that V is a vector space such as Rn with a scalar product < v,w >=n∑i=1

viwi, if vi denotes the ith coordinate

of v in Rn. Then the length of v is ‖v‖ = √< v, v >. The Cauchy-Schwarz inequality says

| < v,w > | ≤ ‖v‖ ‖w‖ . (3)

This inequality implies the triangle inequality

‖v + w‖ ≤ ‖v‖+ ‖w‖ ,which says that the sum of the lengths of 2 sides of a triangle is greater than or equal to the length of the third side.

Figure 2: sum of vectors in the plane

The inequality of Cauchy-Schwarz is very general. It works for any inner product space V - even one that is infinitedimensional such as V = C[0, 1], the space of continuous real-valued functions on the interval [0, 1]. Here the inner productfor f, g ∈ V is

< f, g >=

1∫0

f(x)g(x)dx.

In this case, Cauchy-Schwarz says ⎧⎨⎩1∫0

f(x)g(x)dx

⎫⎬⎭2

≤1∫0

f(x)2dx

1∫0

g(x)2dx. (4)

Amazingly the same proof works for inequality (3) as for inequality (4).ii) The Isoperimetric Inequality.Reference.Dym and McKean, Fourier Series and IntegralsThis inequality is related to Queen Dido’s problem which is to maximize the area enclosed by a curve of fixed length. In

800 B.C., as recorded in Virgil’s Aeneid, Queen Dido wanted to buy land to found the ancient city of Carthage. The locals

5

Page 6: Introduction, Motivation, Basics About Sets, Functions, Counting

would only sell her the amount of land that could be enclosed with a bull’s hide. She cut the hide into narrow strips andthen made a long strip and used it to enclose a circle (actually a semicircle with one boundary being the MediterraneanSea).The isoperimetric inequality says that if A is the area enclosed by a plane curve and L is the length of the curve enclosing

this area,4πA ≤ L2.

Moreover, equality only holds for the circle which maximizes A for fixed L.

Figure 3: curve in plane of length L enclosing area A

6

Page 7: Introduction, Motivation, Basics About Sets, Functions, Counting

iii) Heisenberg Inequality and the Uncertainty Principle.References.Dym and McKean, Fourier Series and IntegralsTerras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. I, p. 20Quantum mechanics says that you cannot measure position and momentum to arbitrary precision at the same time. The

analyst’s interpretation goes as follows. Suppose f : R→ R and define the Fourier transform of f to be

f̂(w) =

∞∫−∞

f(t)e−2πitwdt.

Here i =√−1 and eiθ = cos θ + i sin θ.

Now, suppose that we have the following facts

∞∫−∞

|f(t)|2 dt = 1,∞∫

−∞t |f(t)|2 dt = 0,

∞∫−∞

w∣∣∣f̂(w)∣∣∣2 dw = 0.

Then we have the uncertainty inequality

∞∫−∞

t2 |f(t)|2 dt∞∫

−∞w2∣∣∣f̂(w)∣∣∣2 dw ≥ ( 1

)2.

The integral over t measures the square of the time duration of the signal f(t) and the integral over w measures the squareof the frequency spread of the signal. The uncertainty inequality can be shown to be equivalent to the following inequalityinvolving the derivative of f rather than the Fourier transform of f :

∞∫−∞

t2 |f(t)|2 dt∞∫

−∞|f ′(u)|2 du ≥ 1

4.

This completes our quick introduction to some famous problems of analysis. Hopefully we will manage to investigatemost of them in more detail. But next to set theory.

3 Set Theory and Functions

G. Cantor (1845-1918) developed the theory of infinite sets. It was controversial. There are paradoxes for those who throwcaution to the winds and consider sets whose elements are sets. For example, consider Russell’s paradox. It was statedby B. Russell (1872-1970). We use the notation: x ∈ S to mean that x is an element of the set S; x /∈ S meaning x is notan element of the set S. The notation {x|x has property P} is read as the set of x such that x has property P . Considerthe set X defined by

X = {sets S|S /∈ S}.Then X ∈ X implies X /∈ X and X /∈ X implies X ∈ X. This is a paradox. The set X can neither be a member ofitself nor not a member of itself. There are similar paradoxes that sound less abstract. Consider the barber who must shaveevery man in town who does not shave himself. Does the barber shave himself? A mystery was written inspired by theparadox: The Library Paradox by Catharine Shaw. There is also a comic book about Russell, Logicomix by A. Doxiadisand C. Papadimitriou.We will hopefully avoid paradoxes by restricting consideration to sets of numbers, vectors, functions. This would not be

enough for "constructionists" such as E. Bishop, once at U.C.S.D. Anyway for applied math., one can hope that paradoxicalsets and barbers do not appear.Most books on calculus do a little set theory. We assume you are familiar with the notation. Let’s do pictures in

the plane. We write A ⊂ B if A is a subset of B; i.e., x ∈ A implies x ∈ B. If A ⊂ B, the complement of A inB is B − A = {x ∈ B |x /∈ A} . The empty set is denoted ∅. It has no elements. The intersection of sets A and B isA∩B = {x|x ∈ A and x ∈ B}. The union of sets A and B is A∪B = {x|x ∈ A or x ∈ B}. Here or means either or both.See Figure 4.

7

Page 8: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 4: intersection and union

8

Page 9: Introduction, Motivation, Basics About Sets, Functions, Counting

Definition 4 If A and B are sets, the Cartesian product of A and B is the set of ordered pairs (a, b) with a ∈ A andb ∈ B; i.e.,

A×B = {(a, b)|a ∈ A, b ∈ B}.

Example 1. Suppose A and B are both equal to the set of all real numbers; A = B = R. Then A×B = R×R = R2.That is the Cartesian product of the real line with itself is the set of points in the plane.

Example 2. Suppose C is the interval [0, 1] and D is the set consisting of the point {2}. Then C×D is the line segmentof length 1 at height 2 in the plane. See Figure 5 below.

Figure 5: The Cartesian product [0, 1]× {2}.

Example 3. [0, 1]× [0, 1]× [0, 1] = [0, 1]3 is the unit cube in 3-space. See Figure 6.

Example 4. [0, 1]× [0, 1]× [0, 1]× [0, 1] = [0, 1]4 is the 4-dimensional cube or tesseract. Draw it by "pulling out" the3-dimensional cube. See T. Banchoff, Beyond the Third Dimension. Figure 7 below shows the edges and vertices of the4-cube (actually more of a 4-rectangular solid) as drawn by Mathematica.

9

Page 10: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 6: [0, 1]3

10

Page 11: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 7: [0, 1]4

11

Page 12: Introduction, Motivation, Basics About Sets, Functions, Counting

Next we recall the definitions of functions which will be extremely important for the rest of these notes.

Definition 5 A function (or mapping or map) f maps set A into set B means that for every a ∈ A there is a uniqueelement f(a) ∈ B. The notation is f : A→ B.

Definition 6 The function f : A→ B is one-to-one (1-1 or injective) if and only if f(a) = f(a′) implies a = a′.

Definition 7 The function f : A→ B is onto (or surjective) if and only if for every b ∈ B, there exists a ∈ A such thatf(a) = b.

A function that is 1-1 and onto is also called bijective or a bijection. Given a function f : A → B, we can draw agraph consisting of points (x, f(x)), for all x ∈ A. It is a subset of the Cartesian product A× B. Equivalently a functionf : A→ B can be viewed as a subset F of A×B such that (a, b) and (a, c) in F implies b = c.

Notation. From now on I will use the abbreviations:iff if and only if∀ for every∃ there existss.t. such that→ approaches (in the limit to be defined later in gory detail).= the thing on the left of .= is defined to be the thing on the right of .=

Example 1. Suppose A = B = [0, 1]. A subset which is not a function is the square wave pictured below. This is abad function for calculus since there are infinitely many values of the function at 2 points in the interval. We can make thesquare wave into a function by collapsing the 2 vertical lines to points.

Figure 8: not a function

Example 2. Consider the logistic map L(x) = 3x(1−x) for x ∈ [0, 1]. If we consider L as a map from [0, 1] into [0, 1],we see from the figure below that L is neither 1-1 nor onto. It is not 1-1 since there are 2 points a, a′ ∈ [0, 1] such thatf(a) = f(a′) = 1/2. It is not onto since there is no point a ∈ [0, 1] such that f(a) = 0.9.

12

Page 13: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 9: The logistic map f(x) = 3x(x− 1).

13

Page 14: Introduction, Motivation, Basics About Sets, Functions, Counting

Definition 8 Suppose f : A→ B and g : B → C. Then the composition g◦f : A→ C is defined by (g ◦ f) (x) = g(f(x)),for all x ∈ A.

It is easily seen (and the reader should check) that this operation is associative; i.e., f ◦(g◦h) = (f ◦g)◦h. However thisoperation is not commutative in general; i.e., f ◦ g �= g ◦ f usually. For example, consider f(x) = x2 and g(x) = x+1,for x ∈ R.There is a right identity for the operation of composition of functions. If f : A → B and IA(x) = x, ∀x ∈ A, then

f ◦ IA = f. Similarly IB is a left identity for f ; i.e., IB ◦ f = f.

Definition 9 If f : A→ B is 1-1 and onto, it has an inverse function f−1 :→ A defined by requiring f ◦ f−1 = IB andf−1 ◦ f = IA. If f(a) = b, then f−1(b) = a.

Exercise 10 a) Prove that if f : A→ B is 1-1 and onto, it has an inverse function f−1.b) Conversely show that if f has an inverse function, then f must be 1-1 and onto.

Example 1. Let f(x) = x2 map [0,∞) onto [0,∞). Then f is 1-1 and onto with the inverse function f−1(x) = √x = x1/2.Here of course we take the non-negative square root of x ≥ 0. It is necessary to restrict f to non-negative real number inorder for f to be 1-1.Example 2. Define f(x) = ex. Then f maps (−∞,∞) 1-1 onto (0,∞). The inverse function is f−1(x) = log x = loge x.

It is only defined for positive x. We discuss these functions in more detail later.

When you draw the graph of f−1, you just need to reflect the graph of f across the line y = x.

4 Mathematical Induction

Notation:Z+ {1, 2, 3, 4, ...} the positive integersZ {0,±1,±2,±3, ...} the integersR (−∞,+∞) real numbers

We will assume that you are familiar with the integers as far as arithmetic goes. They satisfy most of the axioms thatwe will list later for the real numbers. In particular, Z is closed under addition and multiplication (also subtraction butnot division). This means n,m ∈ Z implies n +m,n −m and n ∗m are all unique elements of Z . Moreover, one has anidentity for +, namely 0, an identity for *, namely 1. Addition and multiplication are associative and commutative. Thereis an additive inverse in Z for every n ∈ Z namely −n. But unless n = ±1, there is no multiplicative inverse for n in Z.One thing that differentiates Z from the real numbers R is the following axiom. Moreover there is an ordering of Z which

behaves well with respect to addition and multiplication. We will list the order axioms later, with one exception.

Axiom 11 The Well Ordering Axiom. If S ⊂ Z+, and S �= ∅, then S has a least element a ∈ S such that a ≤ x,∀x ∈ S.

This axiom says that any non-empty set of positive integers has a least element. We usually call such a least element aminimum. By an axiom, we mean that it is a basic unproved assumption.G. Peano (1858-1932) wrote down the 5 Peano Postulates (or axioms) for the natural numbers Z+ ∪ {0}. We won’t list

them here. See, for example, Birkhoff and MacLane, Survey of Modern Algebra. Once one has these axioms it would benice to show that something exists satisfying the axioms. We will not do that here, feeling pretty confident that you believeZ exists.The most important fact about the well ordering axiom is that it is equivalent to mathematical induction.Domino Version of Mathematical Induction. Given an infinite line of equally spaced dominos of equal dimensions

and weight, in order to knock over all the dominos by just knocking over the first one in line, we should make sure that thenth domino is so close to the (n+1)st domino that when the nth domino falls over, it knocks over the (n+1)st domino. SeeFigure 10.

14

Page 15: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 10: An infinite line of equally spaced dominos. If the nth domino is close enough to knock over the n+1st domino,then once you knock over the 1st domino, they should all fall over.

Translating this to theorems, we get the followingPrinciple of Mathematical Induction I.Suppose you want to prove an infinite list of theorems Tn, n = 1, 2, .... It suffices to do 2 things.1) Prove T1.2) Prove that Tn true implies Tn+1 true for all n ≥ 1.

Note that this works by the well ordering axiom. If S = {n ∈ Z+|Tn is false}, then either S is empty or S has a leastelement q. But we know q > 1 by the fact that we proved T1. And we know that Tq−1 is true since q is the least element ofS. But then by 2) we know Tq−1 implies Tq, contradicting q ∈ S.

Example 1. Tn is the formula used by Gauss as a youth to confound his teacher:

1 + 2 + · · ·+ n = n(n+ 1)

2, n = 1, 2, 3, ....

We follow our procedure.First, prove T1. 1 = 1(2)

2 . Yes, that is certainly true.Second, assume Tn and use it to prove Tn+1, for n = 1, 2, 3, .....

1 + 2 + · · ·+ n = n(n+ 1)

2

Add the next term in the sum, namely, n+ 1, to both sides of the equation:

n+ 1 = n+ 1

Obtain

1 + 2 + · · ·+ n+ (n+ 1) = n(n+ 1)

2+ (n+ 1)

and finish by noting that

n(n+ 1)

2+ (n+ 1) = (n+ 1)

(n2+ 1)= (n+ 1)

(n+ 2

2

),

which gives us formula Tn+1.

Note: I personally find this proof a bit disappointing. It does not seem to reveal the underlying reason for the truthof such a formula and it requires that you believe in mathematical induction. Of course there are many other proofs of thissort of thing. For example, look at

15

Page 16: Introduction, Motivation, Basics About Sets, Functions, Counting

1 + 2 + · · ·+ n

n+ (n− 1) + · · ·+ 1When you add corresponding terms you always get n+ 1. There are n such terms. Thus twice our sum is n(n+ 1).Example 2. The formula relating n! and Gamma.Assume the definition of the gamma function given in the preceding section of these notes makes sense:

Γ(s) =

∞∫0

e−tts−1dt, for s > 0.

We want to prove that for n=0,1,2,..., we haveFormula Gn : Γ(n+ 1) = n! = n(n− 1)(n− 2) · · · 2 · 1.Here 0! is defined to be 1.

Proof. by Induction.Step 1. Check G0.

Γ(0 + 1) =

∞∫0

e−tt1−1dt =

∞∫0

e−tdt = −e−t∣∣∞t=0

= 1− 0 = 0!.

Step 2. Check that Gn implies Gn+1.Assume Gn which is

n! = Γ(n+ 1) =

∞∫0

e−ttndt.

Recall integration by parts which says that ∫udv = uv −

∫vdu.

Set u = e−t and dv = tn. Then du = −e−tdt and v = 1n+1 t

n+1. Plug this into the integration by parts formula and get

n! = Γ(n+ 1) =

∞∫0

e−ttndt =1

n+ 1e−ttn+1

∣∣∣∣∞t=0

+1

n+ 1

∞∫0

e−ttn+1dt

= 0 +1

n+ 1Γ(n+ 2).

This says (n+ 1)n! = Γ(n+ 2), which is formula Gn+1. This completes our induction proof.

There is also a second induction principle. See Lang, p. 10. You should be able to translate it into something youbelieve about dominoes. Of course, you have never really seen or been able to draw an infinite collection of dominos. Youmight want to try to draw a domino picture for the second form of mathematical induction.

16

Page 17: Introduction, Motivation, Basics About Sets, Functions, Counting

5 Finite and Denumerable or Countable Sets

Definition 12 A set S is finite with n elements iff there is a 1-1, onto map

f : S → {1, 2, 3, ..., n}.

Write |S| = n if this is the case.

Examples.1) The empty set ∅ is a finite set with 0 elements.2) If set A has n elements, set B has m elements, and A ∩B = ∅, then A ∪B has n+m elements.

Proposition 13 Properties of |S| for finite sets S, T .1) T ⊂ S implies |T | ≤ |S| .2) |S ∪ T | ≤ |S|+ |T | .3) |S ∩ T | ≤ min{|S| , |T |}.4)|S × T | = |S| · |T | .5) Define TS = {f : S → T}. Then ∣∣TS∣∣ = |T ||S| .Discussion.These facts are fairly simple to prove. It is elementary combinatorics. For example, we sketch a proof of property 5).Let S = {s1, ..., sn} and f ∈ TS means we can write f(si) = ti, with ti ∈ T, for i = 1, 2, ..., n . Then f ∈ TS corresponds

to the element (t1, ..., tn) ∈ T × · · · × T︸ ︷︷ ︸n

. This correspondence is 1-1, onto.

Definition 14 A set S is denumerable (or countable and infinite) iff there is a 1-1, onto map f : Z+ → S.

Examples.1) Z=the set of all integers is denumerable.2) 2Z=the set of even integers is denumerable.

Corresponding statements for infinite sets to those of the preceding proposition defy intuition. Cantor’s set theoryboggled many minds. Luckily we only need to note a few things from Cantor’s theory. Later we will have a new way tothink about the size of infinite sets. For example length of an interval or area of a region in the plane or volume of a regionin 3-space.

Proposition 15 Facts About Denumerable Sets.Fact 1) a) If S is a denumerable set, then there exists a proper subset T ⊂ S, proper meaning T �= S, such that there is

a bijection f : S → T.Fact 1b) Any infinite subset of a denumerable set is also denumerable.Fact 2) If sets S and T are denumerable, then so is the Cartesian product S × T.Fact 3) If {Sn}n≥1 is a sequence of denumerable sets, then the union

⋃n≥1

Sn is denumerable.

Fact 4) The following sets are all denumerable:Z+=the positive integersZ=the integers2Z=the even integersZ-2Z=the odd integersQ={mn

∣∣m,n ∈ Z, n �= 0}= the rational numbersFact 5) The set R of all real numbers is not denumerable. Here we view R as the set of all decimal expansions. We

will have more to say about it in the next section.

17

Page 18: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. (See the stories after this proof for a more amusing way to see these facts).Fact 1a) Suppose that h:Z+ → S is 1-1,onto. Write h(n) = sn, for n = 1, 2, 3, ..... That is we can think of S as a

sequence {sn}n≥1 with the property that sn = sm implies n = m. So let T = {s2, s3, s4, ...} = S − {s1} . That is, takeone element s1 out of S. Define the map g:Z+ → T by g(n) = sn+1. It should be clear that g is 1-1,onto. Thus T isdenumerable.Fact 1b) Suppose that T is an infinite subset of the denumerable set S. Then writing S as a sequence as in 1a, we have

S = {s1, s2, s3, s4, ...} and T = {sk1 , sk2 , sk3, ...}, with k1 < k2 < k3 < · · · < kn < kn+1 < · · · . We can think of T as asubsequence of S and then mapping h:Z+ → S is defined by h(n) = skn , for all n ∈ Z+. Again it should be clear that h is1-1 and onto.Fact 2) Let S = {sn}n≥1 and T = {tn}n≥1. Then S × T = {(sn, tm)}(n,m)∈Z+×Z+ . So we have a 1-1, onto map

f :Z+ × Z+ → S × T defined by f(n,m) = (sn, tm). Thus to prove 2) we need only show that there is a 1-1, onto mapg:Z+ → Z+ ×Z+. For then f ◦ g is 1-1 from Z+ onto S × T. We will do this as follows by lining up the points with positiveinteger coordinates in the plane using the arrows as in Figure 11.

Figure 11: The arrows indicate the order to enumerate points with positive integer coordinates in the plane.

Start off at (1, 1) and follow the arrows. Our list is

(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2), (2, 3), (1, 4), ....

The general formula for the function g−1 : Z+ × Z+ → Z+ is

g(m,n) =(m+ n− 2)(m+ n− 1)

2+ n.

Why? See Sagan, Advanced Calculus, p. 51.The triangle with (m,n) in the line is depicted in Figure 12.The triangle has 1 + 2 + 3 + · · · + [(m + n) − 2] = (m+n−2)(m+n−1)

2 terms. This is the number of terms before(m + n − 1, 1). Then to get to (m,n) you have to add n to this. Lang, Undergraduate Analysis, p. 13, uses a differentfunction g(m,n) = 2m3n. This does not map onto Z+.Fact 3) Let Sm = {sm1, sm2, sm3, ...}. Then

⋃m≥1

Sm = {sm,n|m,n ∈ Z+} . We can use arguments similar to those we

just found to see that this set is denumerable. Of course we need to be careful since the map defined by f(m,n) = sm,n form,n ∈ Z+ may not be 1-1.

18

Page 19: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 12: In the list to the right of the triangle we count the number of points with positive integer coordinates on eachdiagonal. The last diagonal goes through the point (m,n).

Fact 4) These examples follow fairly easily from 1), 2) and 3). We will let the reader fill in the details.Fact 5) Here we useCantor’s diagonal argument (Sagan, Advanced Calculus, p. 53) to see that the set of real numbers

R is not denumerable. This is a proof by contradiction. We will assume that R is denumerable and deduce a contradiction.In fact, we look at the interval [0, 1] and show that even this proper subset of R is not denumerable. We can represent realnumbers in [0, 1] by decimal expansions like .12379285..... By this, we mean 1

10 +2100 +

31000 +

710000 +

9100000 + · · · .

In general a real number in [0, 1] has the decimal representation α = .a1a2a3a4 · · · , with ai ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.Thus if [0, 1] were denumerable, we’d have a list of decimals including every element of [0, 1] exactly once as follows:

α1 = .a11a12a13a14 · · ·α2 = .a21a22a23a24 · · ·α3 = .a31a32a33a34 · · ·

· · · · · · · · · · · · · · · · · · · · · · · · · · ·· · · · · · · · · · · · · · · · · · · · · · · · · · ·· · · · · · · · · · · · · · · · · · · · · · · · · · ·

Here aj ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.Look now at the stuff on the diagonal in this list. Then define a new decimal β = .b1b2b3b4 · · · by forcing it to differ

from all of the αj . This is done by defining

bj =

{0, if ajj �= 01, if ajj = 0.

We have thus found an element β ∈ [0, 1] which is not on our great list. That is, we have found a contradiction to theexistence of such a list. Why isn’t β on the list? Well β cannot be equal to any αj since the jth digit in β, namely bj doesnot equal the jth digit in αj , which is ajj .Moral: There are lots more real numbers than rational numbers.

19

Page 20: Introduction, Motivation, Basics About Sets, Functions, Counting

6 Stories about the Infinite Motel - Interpretation of the Facts about De-numerable Sets

Reference: N. Ya Vilenkin, Stories about Sets.Consider the mega motel of the galaxy with rooms labelled by the positive integers. See Figure 13. This motel extended

across almost all of the galaxies.

Figure 13: The megamotel of the galaxy with denumerably many rooms.

A traveller arrived at the motel and saw that it was full. He began to be worried as the next galaxy was pretty far away."No problem," said the manager and proceeded to move the occupant of room rn to room rn+1 for all n = 1, 2, 3, .... Thenroom r1 was vacant and the traveller was given that room. See Figure 14.Moral: If you add or subtract an element from a denumerable set, you still have a denumerable set.A few days later a denumerably infinite number of bears showed up at the motel which was still full. The angry bears

began to growl. But the manager did not worry. He moved the guest in room rn into room r2n, for all n = 1, 2, 3, ....This freed up the odd numbered rooms for the bears. So bear bn was placed in room r2n−1, for all n = 1, 2, 3, .... .Moral. A union of 2 denumerable sets is denumerable.The motel was part of a denumerable chain of denumerable motels. Later, when the galactic economy went into a

depression, the chain closed all motels but 1. All the motels in the chain were full and the manager of the mega motelwas told to find rooms for all the guests from the infinite chain of denumerable motels. This motel manager showed hiscleverness again as Figure 15 indicates. He listed the rooms in all the hotels in a table so that hotel i has guests roomsri,1, ri,2, ri,3, ri,4, .... Then, in a slightly different manner from the proof of Fact 2 in Proposition 15, he twined a red threadthrough all the rooms, lining up the guests so that he could put them into rooms in his motel.Moral: The Cartesian product of 2 denumerable sets is denumerable.But the next part of the story concerns a defeat of this clever motel manager. The powers that be in the commission of

cosmic motels asked the manager to compile a list of all the ways in which the rooms of his motel could be occupied. This

20

Page 21: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 14: A traveler t arrives at the full hotel. Guest gn is moved to room rn+1 and the traveler t is put in room r1.

21

Page 22: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 15: The manager of the mega motel has to put the guests from the entire chain of denumerable motels into his motel.He runs a red thread through the rooms to put the guests in order and thus into 1-1 correspondence with Z+ and the roomsin his motel.

22

Page 23: Introduction, Motivation, Basics About Sets, Functions, Counting

list was supposed to be an infinite table. Each line of the table was to be an infinite sequence of 0’s and 1’s. At the nthposition there would be a 1 if room rn were occupied and a 0 otherwise. For example, the sequence 0000000000000 ......would represent an empty motel. The sequence 1010101010101010........ would mean that the odd rooms were occupiedand the even rooms empty.The proof of Fact 5 in Proposition 15 (Cantor’s diagonal argument) shows that this list is incomplete. For suppose the

table is

α1 = a11a12a13 · · · a1n · · ·α2 = a21a22a23 · · · a2n · · ·α3 = a31a32a33 · · · a3n · · ·

............

αn = an1an2an3 · · · ann · · ·............

Now define β = b1b2b3 · · · with bi ∈ {0, 1} by saying bn = 1, if ann = 0, and bn = 0, if ann = 1. Then β cannotbe in our table at any row. For the nth entry in β cannot equal the nth entry in αn, for any n.So the set of all ways of occupying the motel is not denumerable. The motel manager failed this time.Moral. The set of all sequences of 0s and 1s is not denumerable. Nor are the real numbers.

Part II

The Real Numbers7 Pictures of our Cast of Characters

Figure 16 shows our favorite sets of numbers. Of course, they are all infinite sets an thus we cannot put all the elements in.First we have Z = {0,±1,±2....}, the integers, a discrete set of equally spaced points on a line, marching out to infinity.Then we have the rational numbers Q =

{mn

∣∣m,n ∈ Z, n �= 0} . This set is everywhere dense in the real line; everyopen interval contains a rational. For example any open interval containing 0, must contain infinitely many of the numbers12n , n = 1, 2, 3, 4, .... However, Q is full of holes where

√2, e, π would be if they were rational but they are not.

The set of real numbers, R, consists of all decimal expansions including that for

e = 2.71828 18284 59045 23536 02874 71352 66249 7757.....

It can be pictured as a continuous line, with no holes or gaps. You can think of the real numbers algebraically as decimals.By this we mean an infinite series:

α =∞∑

j=−naj10

−j , with aj ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

In the usual decimal notation, we write α = a1a2a3a4a5 · · · . This representation is not unique; for example, 0.999999 · · · =1. To see this, use the geometric series

∞∑n=0

xn =1

1− x, for |x| < 1.

Here I assume that you learned about infinite series in calculus. They are of course limits, which we have yet to definecarefully. Anyway, back to our example, we have

0.9999999 · · · = 9

10

∞∑n=0

(1

10

)n=9

10

1

1− 110

= 1.

Z is the set of real numbers which have a decimal representation with only all 0’s or all 9’s after the decimal point.Q is the set of real numbers with decimals that are repeating after a certain point. For example, 1

3 = 0.333333333 · · · ;17 = 0.142 857142857142857 · · · . This too comes from the geometric series and the fact that 142857

999999 =17 .

23

Page 24: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 16: The red dots indicate integers in the first line, rational numbers in the second line, and real numbers in the 3rdline. Of course we cannot actually draw all the rationals in an interval so we tried to indicate a cloud of points.

24

Page 25: Introduction, Motivation, Basics About Sets, Functions, Counting

8 The Field Axioms for the Real Numbers

Given real numbers x, y, z we have unique real numbers x+ y, xy such that the following axioms hold ∀ x, y, z ∈ R.A1. associative law for addition: (x+ y) + z = x+ (y + z)A2. identity for addition: ∃0 ∈ R s.t. 0 + x = xA3. inverses for addition: ∀x ∈ R, ∃ − x ∈ R s.t. x+ (−x) = (−x) + x = 0.A4. commutative law for addition: x+ y = y + xM1. associative law for multiplication: x(yz) = (xy)zM2. identity for multiplication: ∃1 ∈ R s.t. 1x = x1 = xM3. multiplicative inverses for non-zero elements: ∀x ∈ R s.t. x �= 0, ∃x−1 ∈ R s.t. xx−1 = 1 = x−1xM4. commutative law for multiplication: xy = yxD. distributive law: x(y + z) = xy + xz

Any set with 2 operations + and × that satisfy the preceding 9 axioms is called a field. The rational numbers Q alsosatisfy these 9 axioms and are thus a field too. Mostly fields are topics studied in algebra, not analysis.From these laws you can deduce the many facts that you know from school before college. For example we list a few

facts.Facts About R that Follow from the Field Axioms.Fact 1) ∀x, y, z ∈ R, if xy = xz and x �= 0, then y = z.Fact 2) 0 · x = 0 ∀x ∈ R.Fact 3) The elements 0 and 1 are unique.Fact 4) −(−x) = (−x)(−x) = x.

Proof. Fact 1) Multiply the equation by x−1 which exists by M3. This gives

y = 1 · y = (x−1 · x)y = x−1(xy) = x−1(xz) = (x−1 · x)z = 1 · z = z.

Here we have used axioms M2, M3,M1.Fact 2) Using our axioms M2, D, A2 we have

0 · x+ x = 0 · x+ 1 · x = (0 + 1) · x = 1 · x = x.

It follows that 0 · x+ x = x. Now subtract x from both sides (or equivalently add −x to both sides) to get

(0 · x+ x)− x = x− x

which says by A1 and A30 · x+ (x− x) = 0.

Thus by A3 and A2, 0 · x+ 0 = 0 and again by A2, we have 0 · x = 0.Fact 3) and Fact 4) We leave these proofs to the reader.

9 Order Axioms for R

The set R has a subset P which we know as the set of positive real numbers. Then P satisfies the following 2 OrderAxioms:Ord 1. R = P ∪ {0} ∪ (−P ), where −P = {−x|x ∈ R} = negative real numbers. Moreover, this union is disjoint;

i.e., the intersection of any pair of the 3 sets is empty.Ord 2. x, y ∈ P implies x+ y and xy ∈ P.

Definition 16 For real numbers x, y we write x < y iff y − x ∈ P. We write x ≤ y iff either x < y or x = y.

All the usual properties of inequalities can be deduced from our 2 order axioms and this definition. We will do a few ofthese.Facts about Order. ∀ x, y, z ∈ RFact 1) Transitivity. x < y and y < z implies x < z.

25

Page 26: Introduction, Motivation, Basics About Sets, Functions, Counting

Fact 2) Trichotomy. For any x, y, z ∈ R exactly one of the following inequalities is true: x < y, y < x, or x = y.Fact 3) x < y implies x+ z < y + z for any z ∈ R.Fact 4) 0 < x iff x ∈ P.Fact 5) If 0 < c and x < y, then cx < cy.Fact 6) If c < 0 and x < y, then cy < cx.Fact 7) 0 < 1Fact 8) If 0 < x < y, then 0 < 1

y <1x .

Proof. We will leave most of these proofs to the reader as Exercises. But let’s do 1)and 7).Fact 1) x < y means y − x ∈ P. y < z means z − y ∈ P. Then by Ord2 and the axioms for arithmetic in R, we havey − x+ z − y = z − x ∈ P. This says x < z.Fact 7) note that by Ord 1, 1 must either be in P or −P , since 1 �= 0 (Why?). If 1 ∈ −P, then −1 ∈ P and Ord 2 says(−1)(−1) = 1 ∈ P, contradiction.

Definition 17 The absolute value |x| of a real number x is defined by |x| ={

x, if x ≥ 0,−x, if x < 0.

Equivalently, we can write|x| =

√x2. (5)

This is true because we always take the positive square root.Properties of the Absolute Value.For all x, y, z ∈ R, we have the following properties.Property 1) |x| ≥ 0 ∀x ∈ R and |x| = 0 iff x = 0.Property 2) |xy| = |x| |y| .Property 3) the triangle inequality: |x+ y| ≤ |x| = |y| .

Proof. We leave all but 3) to the reader as Exercises.

Property 3) Using Formula (5), we have |x+ y|2 =(√

(x+ y)2

)2= (x+ y)

2= x2 + 2xy + y2 = |x|2 + 2xy + |y|2 ≤

|x|2 + 2 |x| |y|+ |y|2 = (|x|+ |y|)2 .Here we have used the fact that x ≤ |x| as well as property 2) of the absolute value. Now take square roots of both sides

to finish the proof. To know that is legal you need to prove that 0 ≤ x < y implies 0 ≤ √x < √y. We prove these 2 factsbelow.Proofs of some Facts about Inequalities that were Needed in the Preceding Proof.Fact 1) x ≤ |x| , ∀x ∈ RFact 2) 0 ≤ x < y implies 0 ≤ √x < √y.

Proof. Fact 1) If x ≥ 0, then |x| = x and the result is clear. If x < 0, then |x| = −x > 0 > x, and the result holds by thetransitivity property of >.Fact 2)We prove it by contradiction. Assume

√x >

√y. Then x =

√x√x >

√x√y >

√y√y = y. Again by transitivity,

we have x > y, a contradiction to our hypothesis.The absolute value is very useful. For example, it allows us to define the distance d(x, y) between 2 real numbers x and

y to be d(x, y) = |x− y| .

26

Page 27: Introduction, Motivation, Basics About Sets, Functions, Counting

10 The Last Axiom for the Real Numbers

10.1 The Holes in Q

We have stated 9 field axioms and 2 order axioms. Both of these axioms are also valid for the rational numbers; i.e., Q isan ordered field just like R. So what distinguishes Q from R? We tried indicate this in Figure 16. Q has holes like

√2, π, e,

while R is a continuum. Of course the holes in Q are as invisible as points. We will prove later that every interval on thereal line contains a rational number.There is a fairly simple axiom that allows us to say that R has no holes. Before stating this axiom, let’s explain why

√2

is irrational. The Pythagoreans noticed this over 1000 years ago but kept it secret on pain of death. It seemed evil to themthat the diagonal of a unit square or the hypotenuse of such a nice triangle as that in Figure 17 should be irrational.

Figure 17:√2 is the length of the diagonal of a square each of whose sides has length 1.

Theorem 18√2 is not rational.

Proof. (by contradiction)Suppose

√2 were rational and √

2 =m

n, with m,n ∈ Z, n �= 0. (6)

We can assume that the fraction mn is in lowest terms; i.e., m and n have no common divisors. Square formula (6). This

gives

2 =m2

n2and then 2n2 = m2.

But then m must be even, since the square of an odd number is odd. So m = 2r, for some r ∈ Z. Thereforem2 = 4r2 = 2n2.

Divide by 2 to see that n has to be even since n2 is even. This is a contradiction since n and m now have a common divisor,namely, 2.

Similarly (or, better, using unique factorization of positive integers as a product of primes) one can show that√m is

irrational for any positive integer m such that m is not the square of another integer. Thus√5,√6 are also irrational. You

can do similar things for cube roots. It is harder to see that π and e are irrational. We will at least show e is irrationallater. In fact, e and π are transcendental, meaning that they are not roots of a polynomial with rational coefficients. Ofcourse,

√2 is a root of x2 − 2. See Hardy and Wright, Theory of Numbers, for more information.

A reference for weird facts about numbers (without proof) is David Wells, The Penguin Dictionary of Curious andInteresting Numbers. Here we learn that J. Lambert proved π /∈ Q in 1766. And in 1882 Lindemann proved π to betranscendental. This implies that it is not possible to square the circle with ruler and compass - one of the 3 famousproblems of antiquity. It asks for the ruler and compass construction of a square whose area equals that of a given circle.

27

Page 28: Introduction, Motivation, Basics About Sets, Functions, Counting

The other 2 problems are angle trisection and cube duplication. To understand these problems you need to figure out theprecise rules for ruler and compass constructions. Many undergraduate algebra books use Galois theory to show that allthree problems are impossible.Despite the provable impossibility of circle squaring, circle squarers abound. In 1897 the Indiana House of Representatives

almost passed a law setting π = 16√3∼= 9.237 6 - due to the efforts of a circle squarer.

Now many computers have been put to work finding more and more digits of π.

year digits where1961 100,000 U.S., Shanks and Wrench1967 500,000 France1988 201 million Japan, Y. Canada1989 over 1 billion U.S., Chudnovsky brothers

What is the point of such calculations? Some believe that π is a normal number, which means that there is, in somesense, no pattern at all in the decimal expansion of π. But we digress into number theory. Anyway here are the first fewdigits:

π ∼= 3.14159 26535 89793 23846 26433 83279 50288 41972 .A reference is S. S. Hall, Mapping the Next Millennium, Chapter 13, which also contains a map obtained from trends in thefirst million digits of π as produced by the Chudnovsky brothers.So, anyway we have lots of irrational numbers like

√2, π, e. But we have another way to know that Q is full of holes.

Thinking of R as the set of all infinite decimals, we know (by Cantor’s diagonal argument) that R is not denumerable, whileQ is denumerable. So R is actually much much bigger than Q.But we only need one more axiom to fill in the holes in Q.

10.2 Axiom C. The Completeness Axiom.

Before stating the completeness axiom, we need a definition.

Definition 19 We say that a real number a is the least upper bound (or supremum) of a set S of real numbers iff

x ≤ a, ∀x ∈ S and if x ≤ b,∀x ∈ S implies a ≤ b.Notation: a = l.u.b.S (or a = supS).

What is the l.u.b.? It is just what it claims to be; namely, the least of all the upper bounds for S (assuming S has upperbounds). There is also an analogous definition of greatest lower bound (g.l.b.) or infimum. We give the reader the jobof writing down the definition. It is the greatest of all possible lower bounds.If the set S is finite, there is no problem finding the l.u.b. or g.l.b. of S. Then you would say l.u.b.S is the maximum

element of the finite set, for example. But when the set S is infinite, things become a lot less obvious. Why even shouldthe l.u.b. of S exist? We will soon state an axiom that proclaims the existence of the l.u.b. or g.l.b. of a bounded set.Otherwise we would have no way to know. We will have proclaimed this existence by fiat. If we were kind, we would alsoproduce a proof that the real numbers actually exist. I am sure you are not really worried about that. Or are you?The l.u.b. of S is the left most real number to the right of all the elements of S. If you confuse right and left as much as

I do, you may find this confusing.Examples.1) l.u.b.{√2, π, e} = π.2) If S = (0, 1) = {x ∈ R|0 < x < 1}, then l.u.b.S = 1. This example shows that the least upper bound need not be

an element of the set.3) If S =

{1n

∣∣n ∈ Z+} , then g.l.b.S = 0. This example shows that the greatest lower bound need not be in the set.The Completeness Axiom.Suppose S is a non-empty set of real numbers and that S is bounded above; i.e., there is a real number B such that x ≤ B

for all x ∈ S. Then there exists a real number a = l.u.b.S.

Assume l.u.b. S = a /∈ S. This is the most interesting case. Figure 18 shows the picture of a least upper bound a fora bounded set S. The red dots represent points of the infinite set S. Of course I cannot really put an infinite number of

28

Page 29: Introduction, Motivation, Basics About Sets, Functions, Counting

points on a page in a lifetime. So you have to imagine them. Also points have no width. So what I am drawing is notreally the points but a representation of what they would be if they had width. Again use your imagination. The point B(blue oval) at the right end is an upper bound for the set S. The point a (blue oval) is the least upper bound of S.

Assume l.u.b. S = a /∈ S. One way to characterize l.u.b. S is to note that, for a small positive ε, the interval (a− ε, a)has infinitely many points from the set S, Why? Otherwise there would be a smaller l.u.b. S than a. On the other hand,the interval [a, a+ ε) has no points from S, again assuming ε is small and positive. Why? Otherwise a would not even bean upper bound for S.

Figure 18: The infinite bounded set S is indicated with red dots. An upper bound B for S is indicated with a blue oval.The least upper bound a of S is indicated with a blue oval. The interval (a − ε, a) contains infinitely many points of S.The interval (a, a+ ε) contains no points of S. Here a− ε and a+ ε are also indicated with blue ovals. Since S is infinitewe cannot really draw all its points. Moreover points are invisible really. So our figure is just a shadow of what is reallyhappening. That you have to imagine.

The completeness axiom will imply all we need to know about limits and their existence. And that is what this courseis about. Without it we would be in serious trouble.The result of all this is that R is characterized by 9+2+1 axioms; 9 field axioms, 2 order axioms, and 1 completeness

axiom. To summarize that, we say that R is a complete ordered field.Next let us look at an example of how the completeness axiom can be used to fill in a hole in the rationals.Example 1. A Set of Rational Numbers whose l.u.b. is

√2 - found by Newton’s Method.

See Figure 19.

Define x1 = 2, x2 = 12x1 +

1x1, ..., xn =

12xn−1 +

1xn−1

, n = 2, 3, 4, 5, ..... In this way, we have an inductive definition ofan infinite set S = {xn|n ∈ Z+} . The first few elements of S are 2, 1.5, 1.416666666...., 1.414215686, ...... We claim thatthe least upper bound of this set is

√2. We will prove this claim later after noting that {xn}n≥1 is an decreasing sequence

that is bounded below and thus must have a limit (which we are about to define), which is√2. This says that 2 is the l.u.b.

of S.Note on Newton’s Method.This is a method which often approximates the root of a polynomial very well. In this case the polynomial is x2− 2. To

find a root near x1 = 2, you need to look at the tangent to the curve y = f(x) = x2 − 2 at the point (x1, f(x1)). See Figure19. The point where that tangent line intersects the x-axis is x2. To find it, look at the slope of the tangent at (x1, f(x1))which is f ′(x1) = 2x1 = 4. Then use the point-slope equation of a line to get:

4 =f(x1)− 0x1 − x2

=2

2− x2 .

That says x2 = 1.5.

29

Page 30: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 19: Graph showing the function y = x2 − 2 as well as its tangent line at the point x1 = 2. The 1st two Newtonapproximations to

√2 are given. The first approximation is x1 = 2. The second approximation is x2 which is the

intersection of the tangent line at x1 and the x-axis.

30

Page 31: Introduction, Motivation, Basics About Sets, Functions, Counting

Part III

Limits11 Definition of Limits

Recall that a sequence {xn}n≥1 = {x1, x2, x3, x4, ...} has indices n which are positive integers. We do not assume thatthe mapping from Z+ to {xn}n≥1 is 1-1. In fact a possible sequence has all xn equal to the same number, say 2. Noproblem finding the limit of that sequence. You can think of a sequence as a vector with infinitely many components.

Definition 20 If {xn}n≥1 is a sequence of real numbers, we say that the real number L is the limit of xn as n goes to ∞and write L = lim

n→∞xn iff, for every ε > 0 there is an N ∈ Z+ (with N depending on ε) such that n ≥ N implies

|xn − L| < ε.

In this definition you are supposed to think of ε as an arbitrarily small positive guy. Paul Erdös used to call children"epsilons." A computer might think ε = 10−10. Such a small number of inches would be invisible. We have tried to draw apicture of a sequence approaching a limit. See Figure 20. Here we graph points (n, xn) and L = lim

n→∞xn. Given a small

positive number ε, we are supposed to be able to find a positive integer N = N(ε) depending on ε so that every (n, xn),for n ≥ N(ε) is in the shaded box. Then you have to imagine taking an even smaller ε say ε

109 and there will be a newprobably larger N = N( ε

109 ) so that all (n, xn) with n ≥ N( ε109 ) are in the new smaller version of the shaded box. These

xn will be extremely close to a. And the point is that you can make all but a finite number of the xn as close you want to a.

Figure 20: illustration of the definition of limit

This definition is one of the most important in the course. It was not given by I. Newton (1642-1727) or W. Leibniz(1646-1716) when they invented calculus in the 1600s. Our definition makes the idea of the sequence xn approaching a realnumber L very precise by saying the distance between xn and L is getting smaller than any small positive epsilon. Memorizeit or stop reading now. We will have a similar definition of a limit of a function f(x) as x approaches a finite real number a.

31

Page 32: Introduction, Motivation, Basics About Sets, Functions, Counting

This definition of limit dates from the 1800s. It is due to B. Bolzano (1781-1848), A. Cauchy (1789-1857) and K.Weierstrass (1815-1897). Weierstrass was the advisor of the first woman math. prof. - Sonya Kovalevsky. The main factabout Cauchy that comes to my mind is that he lost that memoir of Galois. See Edna Kramer, The Nature and Growth ofModern Mathematics. An entertaining story book on the history of math. is E. T. Bell, Men of Mathematics.Now we have a precise meaning to apply to our example above obtained using Newton’s method to find a sequence of

rationals approaching√2. We will do the proof later after we know more about limits. Let’s consider another favorite

example.Example 2. A Sequence of Rational Numbers Approaching e.Define e by the Taylor series for ex. That is,

e =∞∑n=0

1

n!.

This means that e is the limit of the sequence of partial sums

sn =n∑k=0

1

k!= 1 + 1 +

1

2+1

6+1

24+ · · ·+ 1

n!.

So we have s1 = 1, s2 = 2, s3 = 1 + 1 + 12 = 2.5, s4 = 1 + 1 +

12 +

16∼= 2.666 7, .... This sequence does not converge as fast as

that in example 1.Example 3. The Simplest Limit (except for a constant sequence).Define the sequence xn = 1

n , for all n ∈ Z+. I think it is obvious what the limit of this sequence is; namely,

limn→∞xn = lim

n→∞1

n= 0.

To prove this from the definition of limit, we need to show that given ε > 0, we can find N ∈ Z+ so that n ≥ N implies∣∣ 1n − 0

∣∣ < ε. That means, we need1

n< ε, if n ≥ N.

The inequality is equivalent to saying n > 1ε . This means that we should take N =

⌊1ε

⌋+ 1. Here the floor of x = �x� =

the greatest integer ≤ x. For then if n ≥ N ≥ 1ε + 1 >

1ε , we have n >

1ε .

To picture this you could graph the points (n, 1n ) as in Figure 21. We seek the limit of the y-coordinates as thex-coordinates approach infinity.Example 3 is this sort of limit that anyone could work out - even before having the wonderful Definition 20. More subtle

limits however could confuse experts such as Cauchy himself. We will see more examples after considering the main factsabout limits of sequences.

32

Page 33: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 21: The red points are (n, 1n ), for n = 1, 2, ..., 10. It is supposed to be clear that the y-coordinates are approaching0 as the x-coordinates march out to infinity. That is the points are getting closer and closer to the y-axis. Of course youcannot actually see what is happening at infinity.

33

Page 34: Introduction, Motivation, Basics About Sets, Functions, Counting

12 Facts About Limits

Now that we have seen a few examples, perhaps we should prove a few things so that we can refer to them rather thanreprove them every time.Facts About Limits of Sequences of Real Numbers.Fact 1) (Uniqueness) If the limit a = lim

n→∞xn exists, then a is unique; i.e., if also b = limn→∞xn, then a = b.

Fact 2) (Limit Exists implies Sequence Bounded) If a = limn→∞xn exists, then the set {xn|n ∈ Z

+} is bounded

above and below.Fact 3) (Limit of a Sum is the Sum of the Limits) If we have limits a = lim

n→∞xn and b = limn→∞yn, then

limn→∞ (xn + yn) = a+ b.

Fact 4) (Limit of a Product) If we have limits a = limn→∞xn and b = lim

n→∞yn, then limn→∞ (xnyn) = ab.

Fact 5) (Limit of a Quotient) If we have limits a = limn→∞xn and b = lim

n→∞yn, and in addition b �= 0, then we haveyn �= 0, for all n sufficiently large and a

b = limn→∞

xnyn, assuming we throw out the finite number of n such that yn = 0.

Fact 6) (Limit of an Increasing Bounded Sequence) Suppose that xn ≤ xn+1 ≤ B, for all n ∈ Z+. Then the limita = lim

n→∞xn exists. Moreover, a is the least upper bound of the set of all elements in the sequence. That is,

a = limn→∞xn = l.u.b.{xn|n ∈ Z

+}.

There is an analogous result for decreasing sequences which are bounded below.Fact 7) (Limits Preserve ≤) Suppose a = lim

n→∞xn and b = limn→∞yn and xn ≤ yn, for all n. Then a ≤ b.

Proof. Fact 1) Let us postpone this one to the end of this subsection. It may seem to be the most obvious but it requiresa little thought.Fact 2) Let ε = 1 be given. Then there exists N such that n ≥ N implies |xn − L| < 1. By properties of inequalities,this implies |xn| = |xn − L+ L| ≤ |xn − L| + |L| < 1 + |L| , if n ≥ N. It follows that a bound on the sequence ismax {|x1| , ..., |xN−1| , |L|+ 1} .Fact 3) Suppose we are given an ε > 0. Then we know we can find N and M depending on ε so that

n ≥ N implies |xn − a| < ε and n ≥M implies |yn − b| < ε. (7)

Therefore if we take K = max{N,M} and n ≥ K, we have

|xn + yn − (a+ b)| ≤ |xn − a|+ |yn − b| < 2ε.

You may worry that we got 2ε rather than ε. The trick to get rid of 2, is to start out in formula (7) with ε2 rather than

ε. This is called an " ε2 proof." Watch out for the ε10 proofs!

Fact 4) The main idea is to start with what you need to prove: for n large enough

|xnyn − ab| < ε. (8)

In order to get this, given our hypotheses, we use a trick that you may remember from calculus, if you ever proved theformula for the derivative of a product. We know |xn − a| is eventually small for large n, so we should be able to show|xnyn − xnb| is small. Thus we subtract xnb from xnyn − ab and add it back in to obtain

|xnyn − ab| = |xnyn − xnb+ xnb− ab| .

Now use the triangle inequality and the multiplicative property of the absolute value, finding that

|xnyn − ab| = |xnyn − xnb+ xnb− ab| ≤ |xnyn − xnb|+ |xnb− ab| = |xn| |yn − b|+ |b| |xn − a| . (9)

In order to make |xn| |yn − b|+ |b| |xn − a| small, we actually need to know that |xn| is not blowing up as n→∞. Fact2 tells us that there is a positive number K such that |xn| ≤ K, for all n. This, plus inequality (9), implies

|xnyn − ab| ≤ K |yn − b|+ |b| |xn − a| . (10)

34

Page 35: Introduction, Motivation, Basics About Sets, Functions, Counting

Given ε > 0, we can find N and M depending on ε so that we have an improved version of the inequalities (7)

n ≥ N implies |xn − a| < ε

1 + |b| and n ≥M implies |yn − b| < ε

K. (11)

We divide by 1+ |b| rather than |b| because b might be 0. We know K > 0 so we don’t need to add 1 to K to avoid dividingby 0.Now we combine inequalities (10) and (11) to get the desired inequality (8).

Fact 5) We will prove this sort of thing later. The reader should think about it though, using ideas similar to the proof of4).Fact 6) We know that the set S = {xn|n ∈ Z+} is non-empty and bounded. Therefore, by the completeness axiom, it hasa least upper bound which we will call a. We want to show that a = lim

n→∞xn. Suppose that we have been given ε > 0. Look

at a− ε. We know that a− ε < a and thus a− ε cannot be an upper bound for the set S of all xn, n ∈ Z+. Look at Figure22.

Figure 22: Picture of the proof of fact 6. {xn} is a bounded increasing sequence and a = lub {xn|n ∈ Z+} . Here we assumen ≥ N. We know N exists such that xN ∈ (a− ε, a] as a− ε is not an upper bound for the set of all xk, k ∈ Z+.

This means there is an N ∈ Z+ so that a− ε < xN ≤ a. Since {xn} is an increasing sequence, this means that for alln ≥ N we have

a− ε < xN ≤ xn ≤ a.This implies that |xn − a| < ε if n ≥ N. Therefore according to our definition of limit, a = lim

n→∞xn.

Fact 7) Using Facts 3 and 4, it suffices to look at zn = yn − xn ≥ 0,∀n. We know that limn→∞zn = b − a and we must show

that limn→∞zn ≥ 0. Do a proof by contradiction looking at Figure 23.We leave the details to the reader as we will return to do a more general version of this argument later. Next we need 2

lemmas which will prove useful now and in the future.

Lemma 21 Suppose that a is a real number such that |a| < ε for all ε > 0. Then a = 0.

Proof. We do a proof by contradiction again. If a is not 0, then |a| > 0 and then, we can take ε = |a|2 . This means

|a| < |a|2 . But that is absurd as it implies 1 <

12 and thus 2 < 1 and 1 < 0. We have our contradiction.

Lemma 22 The set Z+ of positive integers is not bounded above.

Proof. Again we need a proof by contradiction. Suppose that a real number b is an upper bound for the set Z+. By thecompleteness axiom, then Z+ has a least upper bound a. But this means that a − 1 is not an upper bound for Z+ andthere is a positive integer n such that a − 1 < n. Therefore a < n + 1, by a property of inequalities. But then we have acontradiction as n+ 1 ∈ Z+ and a was supposed to be an upper bound for Z+.Now we can prove our 1st fact about limits.

Proof. of Fact 1 About Limits.If a = lim

n→∞xn and b = limn→∞xn, we must show a = b. The number a− b satisfies the hypothesis of Lemma 21. Why?

We have|a− b| = |a− xn + xn − b| ≤ |a− xn|+ |xn − b| .

Given, ε > 0, we know there must be positive integers N and M such that n ≥ N implies |a− xn| < ε2 and n≥ M implies

|b− xn| < ε2 . It follows that n ≥ max{N,M} implies

|a− b| = |a− xn + xn − b| ≤ |a− xn|+ |xn − b| < ε.Thus, by Lemma 21, we see that a = b. Limits are unique.

35

Page 36: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 23: Picture of a proof by contradiction in which we assume that the sequence {xn} is non-negative while the limit Lis negative. But then |xn − L| ≥ |L| > 0 for all n. This contradicts the definition of limit.

13 More Examples of Limits of Sequences

Example 1. limn→∞

(1 + 1

n

)n= e.

Here xn =(1 + 1

n

)n, for n = 1, 2, 3, .... So we have

x1 = 2, x2 =

(1 +

1

2

)2= 2. 25, x3 =

(1 +

1

3

)3∼= 2.370 4, ....., x30 =

(1 +

1

30

)30∼= 2.674 3.

This is an increasing sequence bounded above by 4. It therefore has a limit. It can be shown (using l’Hopital’s rule aftertaking the natural logarithm) that the limit is e ∼= 2.71828. We will say more about this later. See Section 23.Note that yn =

(1 + 1

n

)n+1is a decreasing sequence, also approaching e as n goes to infinity. We have, for example

y1 = 4, y2 =

(1 +

1

2

)3∼= 3.375, y3 =

(1 +

1

3

)4∼= 3.160 5, ..., y30 =

(1 +

1

30

)31∼= 2.763 5.

Example 2. xn = cos(nπ) = (−1)n, n = 1, 2, 3, ..... This sequence has no limit as n goes to infinity. The sequencealternates between 1 and -1. Thus it cannot decide on a limit.Proof. To prove there is no limit, proceed by contradiction. If L = lim

n→∞(−1)n, then according to the definition of limit

we can take ε = 1 (or any positive number), and find N so that n ≥ N implies |xn − L| < 1. This means that for n evenand large we have

|1− L| < 1 or equivalently − 1 < 1− L < 1and for n odd and large we have

|1 + L| = |−1− L| < 1 or equivalently − 1 < 1 + L < 1.But then we can add the inequalities and obtain −2 < 2 < 2. Contradiction. Thus the sequence has no limit.Example 3. lim

n→∞n

3n−2 =13 .

36

Page 37: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. We need to show that the distance between n3n−2 and

13 is small for large n. To do this, look at the following∣∣∣∣ n

3n− 2 −1

3

∣∣∣∣ = 1

3

∣∣∣∣3n− (3n− 2)3n− 2∣∣∣∣ = 1

3

∣∣∣∣ 2

3n− 2∣∣∣∣ < ε.

This inequality is equivalent to

3n− 2 > 2

3ε,

which says

n >1

3

(2

3ε+ 2

).

Such an n exists by Lemma 22 above.Another proof can be found using some high school algebra to see that

n

3n− 2 =n

3n− 21n1n

=1

3− 2n

.

Then use the fact that limn→∞

1n = 0 which was proved above and Facts about limits stated above to prove the result.

Next recall the example of a sequence approaching√2 obtained using Newton’s method.

Example 4. Define x1 = 2, x2 = 12x1 +

1x1, ..., xn+1 =

12xn +

1xn, n = 1, 3, 4, 5, ..... Then lim

n→∞xn =√2.

To prove that this limit is indeed correct, using facts about limits proved above, it suffices for us to show the following.Claim. {xn} is a decreasing sequence bounded below; i.e.,

xn > xn+1 > 1, for all n = 1, 2, 3, ....

Once this claim is proved, it is not hard to see that the limit must exist (by the analog of Fact 6 for decreasing boundedsequences) and that it must be

√2. If L = lim

n→∞xn, then look at the recursion

xn+1 =1

2xn +

1

xn.

Take the limit as n→∞ to see that (using the appropriate facts about limits)

L =L

2+1

L.

Multiply by 2L to obtain 2L2 = L2 + 2. Thus L2 = 2. Therefore L = ±√2. Why must L be positive? Use Fact 7.Proof. of the Claim.Here we use mathematical induction. We give the induction step, taking a = xn and b = xn+1. We need to show that if

a2 > 2 and a > 0 then b = a2 +

1a implies 0 < b < a and b

2 > 2 which implies b > 1.To see this, note that

a− b = a−(a

2+1

a

)=a

2− 1

a=a2 − 22a

.

Then a2 > 2 and a > 0 imply that a− b > 0.Next look at

b2 − 2 =(a

2+1

a

)2− 2 = a2

4+ 1 +

1

a2− 2 =

(a

2− 1

a

)2> 0.

So b2 > 2. The proof of the claims is complete which finishes the proof that limn→∞xn =

√2.

37

Page 38: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 24: n-gons approaching a circle of radius 1 for n = 4, 8, 16.

14 Some History. Dedekind Cuts.

Why is the completeness axiom necessary? Why do we need to understand limits? This idea is fundamental to most ofapplied mathematics. It is central to differential equations and therefore to the theory of earthquakes and cosmology. Theidea of "gradually getting there," "tending towards," "approaching" is one of the most basic. We cannot actually draw apicture of what is happening to xn when n has moved infinitely far out. The idea is subtle.

You can sometimes see it happen on a computer. Look for example at Figure 24.

Here an equilateral polygon approaches a circle. So its area must approach that of a circle. This gives a way toapproximate π. Archimedes did this around 250 B.C. Assume the radius of the circle is 1. Then

4 sides give the area 2

8 sides give the area 2.828

16 sides give the area 3.061.

Around 1821 Cauchy had formulated a principle of convergence of sequences of real numbers. His idea was to use theidea of approximation. We will give the definition of Cauchy sequence soon. It gives a useful way to construct the realnumbers as well as a criterion for convergence of a sequence. See V. Bryant, Yet Another Introduction to Analysis for moreexamples.The need for clarification of the concept of a real number became apparent in 1826 when Abel corrected Cauchy’s belief

that a sequence of continuous functions must have a continuous limit function. This showed that intuition can be verymisleading when investigating limits. See G. Temple, 100 Years of Math., for more discussion of the history of the conceptof limit.The completeness axiom for R can be stated in a different way. If A and B are non-empty sets of real numbers such that

a ≤ b for all a ∈ A and b ∈ B, then there exists a real number ω so that ω ≥ a for all a ∈ A and ω ≤ b for all b ∈ B. Thisaxiom is another way of saying there are no holes in the real line. V. Bryant, Yet Another Introduction to Analysis, p. 11,calls ω "Piggy-in-the-middle." See Figure 25.

Figure 25: piggy in the middle - real number between 2 sets A and B such that a ≤ b for every a ∈ A and b ∈ B.

In 1872 Dedekind used this sort of idea to construct the real numbers. A Dedekind cut consists of 2 sets A and B ofrational numbers such that1) Q = A ∪B2) a ∈ A and b ∈ B implies a ≤ b.

38

Page 39: Introduction, Motivation, Basics About Sets, Functions, Counting

15 Cauchy Sequences

Next we want to define Cauchy sequences. This gives a convergence criterion, a new version of the completeness axiom, anda way to construct the real numbers, the space of Lebesgue integrable functions, and Hensel’s space of p-adic numbers forevery prime p (as space which has many applications in number theory).

Definition 23 A sequence {xn} of real numbers is a Cauchy sequence iff for every ε > 0 there is an N ∈ Z+such thatn,m ≥ N implies |xn − xm| < ε.

In this definition, we just ask that the distance between xn and xm is less than ε for all but a finite number of n and m.To say this another way, we ask that the sequence elements xn and xm become arbitrarily close as m,n → ∞. The usefulthing about this convergence criterion is that it does not require you to know what the limit is.Cauchy made this definition in 1821. He did not prove the following theorem.

Theorem 24 Every Cauchy sequence of real numbers has a limit.

This theorem is actually a consequence of our completeness axiom and we will soon give a proof. In fact, the theorem islogically equivalent to the completeness axiom. Thus one could construct R (as Cantor did in 1883) as "limits of" Cauchysequences of rational numbers. Here we identify 2 Cauchy sequences if they converge to the same limit. This gives aconstruction of R which is analogous to that used to construct the spaces of Lebesgue integrable functions out of the spaceof continuous functions.Before thinking about proving the preceding theorem, we need to think about subsequences.

Definition 25 Suppose that {xn} is a sequence. A subsequence {xnk} is a sequence obtained by selecting out certain termsof the original sequence. Here

1 ≤ n1 < n2 < n3 < · · · < nk < nk+1 < · · · .

Example. Consider the sequence xn = (−1)n. One subsequence consists of the terms with even indices x2n = 1.Another subsequence consists of the terms with odd indices x2n+1 = −1. Both of these subsequences converge (since theyare constant) even though the original sequence does not converge. According the Fact 4 below this gives another proofthat the original sequence does not converge.

Facts About Cauchy (and Other) Sequences of Real Numbers.Fact 1. If a sequence of real numbers has a limit then it is a Cauchy sequence.Fact 2. Cauchy sequences of real numbers are bounded.Fact 3. Any bounded sequence of real numbers has a convergent subsequence.Fact 4. For a Cauchy sequence of real numbers, if a subsequence converges to L, then the original sequence also converges

to L.Proof. Fact 1.Suppose lim

n→∞xn = L. Given ε > 0, we know there is an N ∈ Z+ so that n ≥ N implies |xn − L| < ε. Similarly m ≥ Nimplies |xm − L| < ε. Therefore

|xn − xm| = |xn − L+ L− xm| ≤ |xn − L|+ |L− xm| < 2ε. (12)

Here we have used the triangle inequality. It follows that our sequence is Cauchy. Replace ε by ε2 if you feel paranoid

about the 2ε in formula (12).Fact 2.Suppose that {xn} is a Cauchy sequence of real numbers. Given ε = 1, we know by definition that there is an integer

N1 so that n ≥ N1 implies |xn − xN1 | < 1. Use the triangle inequality to see that n ≥ N1 implies

|xn| = |xn − xN1+ xN1

| ≤ |xn − xN1|+ |xN1

| < 1 + |xN1| .

This gives a bound on the sequence elements xn such that n≥ N1. That means we have a bound on all but a finitenumber of sequence elements. It is then not hard to get a bound on the entire sequence. Such a bound is

max{|x1| , ..., |xN1−1| , 1 + |xN1|}.

39

Page 40: Introduction, Motivation, Basics About Sets, Functions, Counting

Fact 3.We need to show that any bounded sequence of real numbers has a convergent subsequence. We give the proof of Bryant

in Yet Another Introduction to Analysis. There is also a proof in Lang, Undergraduate Analysis as a Corollary to theBolzano-Weierstrass Theorem (p. 38).Step 1. Any sequence has a subsequence which is either increasing or decreasing.To prove this, we use the Spanish or (La Jolla) Hotel Argument. Consider a sequence of hotels placed on the real

axis such that the nth hotel has height xn, n ∈ Z+. See Figure 26.

Figure 26: Picture of case A in the Spanish Hotel Argument. The indices {kj} correspond to hotels of strictly decreasingheight so that an eyeball at the top of the hotels with label kn can see the ocean and palm tree at infinity for every n ∈ Z+.

Note that if the sequence {xn} is not bounded above and below, you can easily find a subsequence that is either increasingor decreasing.There are two possibilities.

Case A). There is an infinite sequence of hotels with unblocked views to the right in the direction of the sea at infinity. SeeFigure 26. This means that there is an infinite sequence of positive integers k1 < k2 < k3 < · · · < kn < · · · such that fromthe top of the corresponding hotel, a person has an unblocked view to the right in the direction of the sea. If this is thecase, then

xk1 > xk2 > xk3 > · · · > xkn > xkn+1 > · · · .Thus we have an infinite strictly decreasing sequence of hotel heights.If Case A) is false, we must be in Case B.

Case B). In this case, after a finite number of hotels, every hotel has a blocked view. Let the finite number of hotels beindexed by kj , j = 1, ..., N . Then the next integer after that is N +1 = m1 with the property that there is a positive integerm2 > m1 such that xm2

≥ xm1. This means hotel m2 blocks the view of hotel m1. Continue to obtain a subsequence which

is increasing:xm1 ≤ xm2 ≤ xm3 ≤ · · · ≤ xmn ≤ xmn+1 ≤ · · · .

See Figure 27.Step 2. Convergence of Subsequence from Step 1.To see the convergence, just recall that we are assuming in Fact 3 that our original sequence and thus any subsequence

is bounded. So we just need to apply Fact 6 about limits. Any increasing or decreasing bounded sequence must converge.Fact 4.Let {xn} be a Cauchy sequence with a convergent subsequence {xnk} such that lim

k→∞xnk = L. So given ε > 0, there is a

positive integer K such that k ≥ K implies |xnk − L| < ε.We want to show that our original sequence xn converges to L. For any k ∈ Z+ we have nk ≥ k. Since our sequence is

Cauchy, we know there exists a positive integer N so that for k ≥ N we have |xk − xnk | < ε. Therefore if k ≥ max{K,N},we have

|xk − L| ≤ |xk − xnk |+ |xnk − L| < 2ε.

40

Page 41: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 27: Case B of the Spanish Hotel Argument. The sequence of indices {mk} corresponds to hotels of increasing height.

Here we used the triangle inequality. It follows that the original sequence converges to L. Again you may want to replaceε by ε

2 .Proof of Theorem 24Proof. We want to show that every Cauchy sequence of real numbers converges to a real number.Fact 2 about Cauchy sequences says a Cauchy sequence is bounded.Fact 3 about Cauchy sequences says a bounded sequence has a convergent subsequence.Fact 4 about Cauchy sequences says that once you have a convergent subsequence the original sequence is forced to

converge to the same limit as the subsequence.So we are done.

One can base calculus on the concept of an infinitesimal rather than on the idea of a limit. This is called "non-standardanalysis." It was created by A. Robinson. R. Rucker, Infinity and the Mind, p. 93, says that it is simpler to believe inthe infinitely small "dx" rather than to let Δx approach 0. But, Rucker says: "So great is the average person’s fear ofinfinity that to this day calculus all over the world is being taught as a study of limit processes instead of what it really isinfinitesimal analysis." There are a few calculus texts based on non-standard analysis: H.J. Keisler, Elementary Calculusand Henle and Kleinberg, Infinitesimal Calculus. I will not pursue this subject at all in these notes. It seems harder to dealwith than limits since so few people have actually tried to understand it.Others argue that the universe is finite. See Greenspan, Discrete Models, where it is said that "It is unfortunate that

so many scientists have been conditioned to believe that 1030 particles can always be well approximated by an infinitenumber of points." Classical applied math. views the vibrating string as a continuum like R. Greenspan argues that weshould perhaps replace the continuum with a large finite set of points. This replaces calculus with finite difference calculusor the finite element method. We will have nothing to say about that here, except to note that in the end usually one needsa computer to obtain an approximate solution to our applied math. problems and that leads us to replace derivatives withfinite differences, for example.Rucker, Infinity and the Mind, considers this question also. On p. 33, for example, he says: "The question of whether or

not matter is infinitely divisible may never be decided. For whenever an allegedly minimal particle is exhibited, there willbe those who claim that if a high enough energy were available, the particle could be decomposed; and whenever someonewishes to claim that matter is infinitely divisible, there will be some smallest known particle, which cannot be split."There is also a very basic controversy in mathematics - that of constructivism. One aspect of the constructivist approach

is to seek so-called "constructive proofs" which involve a new meaning for the mathematical word "or." Sets and numbersmust be constructed. Constructive mathematicians seek a different approach to the completeness axiom and thus to theexistence of limits. References are E. Bishop, Foundations of Constructive Analysis (where this course and graduate coursesare done in a constructive manner) and Volume 39 of the Journal, Contemporary Mathematics, which was dedicated toErrett Bishop, including an article by Bishop titled "Schizophrenia in Contemporary Mathematics." I personally find thatthis constructive approach to the basics of the logic of our proofs does in fact lead my brain to so many twists and turns thatschizophrenia might be a good description. I will say no more about it here.

41

Page 42: Introduction, Motivation, Basics About Sets, Functions, Counting

Part IV

Limits of FunctionsNow we want to consider the limit of a function f(x) as x approaches a, denoted lim

x→af(x).

Definition 26 Suppose I is an open interval containing the point a and f : I − {a} → R. Then we say L is the limit off(x) (or f(x) converges to L) as x approaches a and write lim

x→af(x) = L iff

∀ε > 0,∃δ > 0 s.t. 0 < |x− a| < δ =⇒ |f(x)− L| < ε. (13)

Figure 28: The graph of a function y = f(x) is red. The definition of limx→a

f(x) = L says that given a positive ε we can find

a positive δ (depending on ε) so that for x �= a in the interval (a− δ, a+ δ), the graph of the function must lie in the bluebox of height 2ε and width 2δ, except perhaps for (a, f(a)). In the picture f(a) is undefined and thus there is a hole in thegraph at (a, L).

Again you need to memorize this definition. And, yes, it is a pretty horrific sentence full of quantifiers ∃∀. If you believein the usual logic of mathematics, then you should be willing to write down the negation of this statement. If you are aconstructivist, I do not know what you would do.See Figure 28 for a picture of the definition of limit. Note that we do not assume that f(a) is defined. Thus |x− a| = 0

is excluded from consideration in the statement in formula (13) of the definition of limit. It is assumed in the definitionthat δ is small enough that 0 < |x− a| < δ implies x ∈ I − {a} and thus f(x) makes sense. If f(a) is defined that is O.K.too. It is not required that f(a) = L however. If f(a) �= L, the point (a, f(a)) would be outside the little blue box for smallenough ε in Figure 28.

42

Page 43: Introduction, Motivation, Basics About Sets, Functions, Counting

Note. We could weaken the hypotheses in the definition of limit. Most authors assume that a is an accumulationpoint of the set S where f is defined. This means that for every δ > 0, there is a point x �= a such that x ∈ S∩ (a− δ, a+ δ).This insures that there are points x �= a such that 0 < |x− a| < δ and f(x) makes sense. See Apostol, Mathematical Analysisor Sagan, Advanced Calculus. Lang, Undergraduate Analysis, does not do this, nor does he assume that he is only takingpoints x �= a in (a− δ, a+ δ). This allows f(a) to have a bad definition (in which case you would not have a limit) or a tobe an isolated point (i.e., any non-accumulation point) of the domain of f (in which case you would have a limit trivially).We give the more general definition of limit (with accumulation points) in Lectures, II. For now, our definition suffices.Example 1. Suppose f(x) = 3x − 1, x ∈ R. Then lim

x→2(3x− 1) = 5. Of course, you do not need the definition to

compute this limit since f(x) is a continuous function and thus our limit is f(2). But we have not proved anything aboutcontinuous functions yet, nor even defined them. So we prove that our limit is correct.Proof. |3x− 1− 5| = |3x− 6| = 3 |x− 2| < ε if |x− 2| < ε

3 = δ.For an example of lim

x→af(x) where f(a) is not defined, look at what we will later call a derivative.

Example 2. limx→a

x2−1x−1 = 2.

To prove this, just note that as long as x �= 1, we have the equality: x2−1x−1 = x+ 1. Thus we can proceed as in Example

1 to obtain the proof. We leave this to the reader. Note that the graph of the function x2−1x−1 is a straight line with a hole

at the point (1, 2).

If you want, you can also consider right and left hand limits. You just replace the open interval I in the definition oflimit above with a half open interval (a− δ, a) or (a, a+ δ), for small positive δ. For example consider the function knownas floor of x = �x� = L = the greatest integer ≤ x. See Figure 29. Then we want to say

limx→1x<1

�x� = 0 and limx→1x>1

�x� = 1.

One also writes limx→1x<1

�x� = limx→1−

�x� and limx→1x>1

�x� = limx→1+

�x� .

Figure 29: graph of the floor of x=�x�

43

Page 44: Introduction, Motivation, Basics About Sets, Functions, Counting

Before looking at more examples, it will help to know the basics about limits.Properties of Limits.In the following, we always assume our functions are defined on an open interval I containing the point a, except perhaps

at x = a.Property 1) Sequential Definition of Limits. lim

x→af(x) = L iff for every sequence {xn} of points in I − {a} such that

limn→∞xn = a, we have lim

n→∞f(xn) = L.Moral. If you hate δ for some reason, you can replace δ’s with sequences and thus N ’s.

Property 2) Limits are unique. That is, limx→a

f(x) = L and limx→a

f(x) = K implies K = L.

Property 3) Limit of a sum is the sum of the limits. Suppose limx→a

f(x) = L and limx→a

g(x) = M. Then

limx→a

(f(x) + g(x)) = L+M.

Property 4) Limit of a product is the product of the limits. Suppose limx→a

f(x) = L and limx→a

g(x) = M. Then

limx→a

(f(x)g(x)) = LM.

Property 5) Limit of a quotient is the quotient of the limits. Suppose limx→a

f(x) = L and limx→a

g(x) = M and, in

addition, M �= 0. Then lim f(x)g(x)

x→a

= LM .

Property 6) Limits preserve inequalities. Suppose limx→a

f(x) = L and limx→a

g(x) =M and, f(x) ≤ g(x), for all x in anopen interval containing a, except perhaps when x = a. Then L ≤M.Proof. Property 1). We leave this proof as an exercise.Property 2). We leave this as an exercise. Imitate the analogous proof for limits of sequences.Property 3). This proof is similar to that of the analogous fact for limits of sequences. We leave it as an exercise.Property 4). We proceed as in the proof of the analogous fact for sequences. Note that

|f(x)g(x)− LM | = |f(x)g(x)− f(x)M + f(x)M − LM |≤ |f(x)g(x)− f(x)M |+ |f(x)M − LM |= |f(x)| |g(x)−M |+ |f(x)− L| |M | .

We need to bound |f(x)| for x close to a in order to be able to make the first term in the last sum small. To do thisuse the definition of limit with ε = 1. This says there is a δ1 > 0 such that 0 < |x− a| < δ1 implies |f(x)− L| < 1. Since|f(x)| = |f(x)− L+ L| ≤ |f(x)− L|+ |L|, it follows that 0 < |x− a| < δ1 implies |f(x)| < 1 + |L| . Thus 0 < |x− a| < δ1implies

|f(x)g(x)− LM | ≤ (1 + |L|) |g(x)−M |+ |f(x)− L| |M | . (14)

We know that ∃ δ2 > 0 s.t.0 < |x− a| < δ2 implies |g(x)−M | < ε

2 (1 + |L|) . (15)

And ∃ δ3 > 0 s.t.0 < |x− a| < δ3 implies |f(x)− L| < ε

2 (1 + |M |) . (16)

Define δ = min{δ1, δ2, δ3}. Then 0 < |x− a| < δ implies (combining inequalities (14), (15) and (16) )

|f(x)g(x)− LM | < (1 + |L|) ε

2 (1 + |L|) +ε

2 (1 + |M |) |M | ≤ ε.

A shorter proof can be obtained using Property 1 and the corresponding fact for limits of sequences.Property 5. By Property 4, it suffices to prove the special case that

limx→a

g(x) =M, with M �= 0 implies lim 1

g(x)x→a

=1

M. (17)

Since dividing by 0 is a "no no," we need to show that for x close enough to a, g(x) �= 0. Since M �= 0, we can takeε = |M |

2 . So ∃ δ1 > 0 s.t. 0 < |x− a| < δ1 implies

||g(x)| − |M || ≤ |g(x)−M | < |M |2.

44

Page 45: Introduction, Motivation, Basics About Sets, Functions, Counting

(The first inequality here follows from the triangle inequality; i.e., ||A| − |B|| ≤ |A−B|. The proof of this is an exercise.)Therefore − |M |

2 < |g(x)| − |M | < |M |2 which implies, upon adding |M | , that

|g(x)| > |M |2> 0, if 0 < |x− a| < δ1. (18)

Now we prove (17). To do this, we need to prove the following is < ε when x is close enough to a:∣∣∣∣ 1g(x) − 1

M

∣∣∣∣ = ∣∣∣∣M − g(x)Mg(x)

∣∣∣∣ .From formula (18), we know 0 < |x− a| < δ1 implies∣∣∣∣M − g(x)

Mg(x)

∣∣∣∣ < |M − g(x)||M |22

= 2|M − g(x)||M |2 .

This can be made < ε since ∃δ < δ1 s.t. |M − g(x)| < ε2 |M |2 if 0 < |x− a| < δ. This completes the proof of (17).

Property 6. Look at h(x) = g(x)−f(x).Then h(x) ≥ 0, for all x in an open interval containing a, except perhaps when x = a.We know from earlier properties that lim

x→ah(x) =M −L � K. Thus it suffices to show K ≥ 0. Assume K < 0 and deduce a

contradiction. Note first that for x in our open interval with x �= a we have |h(x)−K| ≥ |K| = −K > 0. See Figure 26 below.But we know ∀ε > 0 ∃δ > 0 s.t. |h(x)−K| < ε for 0 < |x− a| < δ. This implies |K| = −K ≤ h(x) −K ≤ |h(x)−K| < ε∀ε > 0. Lemma 21 says |K| = 0. This contradicts our hypothesis that K < 0 and we’re done.

Figure 30: We plot f(x) = x4 + 1 ≥ 0 in red; g(x) = −1 in green. It should be clear that the distance between f(x) andg(x); i.e., |f(x)− (−1)| ≥ 1 for all x and thus f(x) cannot approach a negative number like −1 as x→ a.

45

Page 46: Introduction, Motivation, Basics About Sets, Functions, Counting

Example 1. limh→0

(x+h)2−x2h = 2x.

Here we are computing the derivative of the function f(x) = x2. This is the sort of limit everyone (e.g., Newton andLeibniz) could do without knowing the precise definition of limit. The computation goes as follows:

(x+ h)2 − x2h

=x2 + 2xh+ h2 − x2

h= 2x+ h.

Taking limits as h→ 0 and using the properties of limits given above, we see that

limh→0

(x+ h)2 − x2h

= limh→0

(2x+ h) = limh→0

2x+ limh→0

h = 2x.

Here we have used the fact that the limit of a sum is the sum of the limits, the limit of a constant function is the constant,limh→0

h = 0, and the limit of a product is the product of the limits. Note that x is a constant during our calculation.

Example 2.Define f(x) = 1 if x is rational and f(x) = 0 if x is irrational. Then lim

x→af(x) does not exist for any real number a. To

see this, you just have to note that any interval contains both rationals and irrationals. See the proof below. If limx→a

f(x) = L,

we know ∃δ > 0 s.t. 0 < |x− a| < δ implies |f(x)− L| < 12 . But there are points x such that 0 < |x− a| < δ with f(x) = 1

and other points u such that 0 < |u− a| < δ with f(u) = 0. Then 1 = |f(x)− f(u)| ≤ |f(x)− L| + |L− f(u)| < 1. But1 < 1 is impossible.

Theorem 27 Any open interval I contains both rational and irrational numbers.

Proof. Rationals in I.It suffices to show that there is a rational number arbitrarily close to any real number. Given ε > 0 (our measure of

closeness), we know by Lemma 22 there is a positive integer n such that n > 1ε . Therefore, it suffices to show that for any

real number a there is a rational number q such that |a− q| ≤ 1n .

Case 1. a is positive.If a > 0, then, for n as in the preceding paragraph, look at the set S = {k ∈ Z+|na < k} . This set has a least element

m by the well ordering axiom for the positive integers. This means (m− 1) ≤ na. Thereforem

n− 1

n≤ a < m

n,

which says∣∣a− m

n

∣∣ ≤ 1n < ε. Thus we have found a rational number within ε distance of a.

Case 2. a is negative.In this case −a is positive and we can use Case 1 to find a rational number q so that |−a− q| < ε. It follows that

|a− (−q)| < ε. Of course q rational implies −q is rational.Case 3. a=0.In this case, life is even easier as we have

∣∣0− 1n

∣∣ = 1n < ε.

Irrationals in I.We need to show that there is an irrational number arbitrarily close to any given rational number q. We know from the

preceding that we can choose a rational number r such that∣∣r − q√2∣∣ < ε. This implies ∣∣∣ r√

2− q∣∣∣ < ε√

2< ε. Since r√

2is

irrational (as otherwise r√2= u is rational and then so is

√2 = r

u contradicting Theorem 18), we are done.

Before considering another example, we need a Lemma.

Lemma 28 Squeeze Lemma. Suppose f(x) ≤ g(x) ≤ h(x) for all x in some open interval containing a. If limx→a

f(x) =

L = limx→a

h(x), then limx→a

g(x) = L also.

Proof. We know from Property 6 above that if limx→a

g(x) exists, it must be both ≤ L and ≥ L and, therefore = L. But whymust the limit exist? Given ε > 0, ∃δ1 s.t. 0 < |x− a| < δ1 implies |f(x)− L| < ε. Similarly ∃δ2 s.t. 0 < |x− a| < δ2implies |h(x)− L| < ε. It follows that if δ = min{δ1, δ2}, we have

0 < |x− a| < δ implies − ε < f(x)− L ≤ g(x)− L ≤ h(x)− L < ε.

46

Page 47: Introduction, Motivation, Basics About Sets, Functions, Counting

Thus0 < |x− a| < δ implies |g(x)− L| < ε.

Example 3. limx→0

sin xx = 1.

First look at Figure 31. That may convince you that the formula is correct.

Figure 31: a graph of sin xx

How are we going to prove this formula? Later we will see that this limit is the derivative of sinx at x = 0 and thus itis cos 0 = 1. Defining the sine and cosine as in your favorite calculus book, and measuring our angles in radians, we haveFigure 32 which implies that

sinx cosx

2≤ x

2≤ tanx

2.

Multiply by 2sin x to see that

cosx ≤ x

sinx≤ 1

cosx.

Thus sin xx is squeezed between 2 functions approaching 1 as x → 0 and by the Squeeze Lemma (which was Lemma 28), it

must approach 1 as well. Am I cheating by assuming that cosx is continuous at x = 0? We will say more about trigonometricfunctions later.

47

Page 48: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 32: A circle of radius 1 is drawn with an acute angle x measured in radians; i.e., 0 ≤ x < π2 . By looking at the aras

of the triangle 0AC, the arc of the circle with angle x, the triangle 0BC, we see that sin x cos x2 ≤ x2 ≤ tan x

2 .

48

Page 49: Introduction, Motivation, Basics About Sets, Functions, Counting

16 Limits Involving Infinity

Until now we have been dealing with limits of functions f(x) as x approaches a finite real number a. What happens if we letx approach ∞? This definition is similar to the definition of limit of a sequence. We can of course also let x approach −∞.

Definition 29 Suppose f : (a,∞) → R for some a. Then limx→∞f(x) = L means ∀ε > 0 ∃B > 0 s.t. x ≥ B implies

|f(x)− L| < ε.

Exercise. Show that limx→∞f(x) = L ⇐⇒ lim

y → 0y > 0

f( 1y ) = L.

Example 1. limx→∞

1x = 0.

Proof. Given ε > 0, we need to find B > 0 s.t. x ≥ B implies∣∣ 1x − 0

∣∣ = 1|x| < ε. Since x ≥ B > 0, |x| = x and thus we

need 1x < ε. This is equivalent to x >

1ε . So we can take B = 1 +

1ε . Then x ≥ B implies x > 1

ε and we have the desiredinequality.

It is possible to prove that these infinite limits have the usual properties of limits. We leave that to the reader to check.You can use the exercise after the definition to do this. That is, make the change of variables y = 1/x and let y go to 0.That will also simplify the following examples.Example 2. lim

x→∞1+x1−x = 1.

Proof. Note that 1+x1−x =

1+x1−x

1x1x

=1+ 1

x

1− 1x

. Using the various properties of limits this approaches 1+01−0 = 1, as x→∞.

Example 3. limx→−∞

1x = 0.

Example 4. limx→−∞

1+x1−x = −1.

49

Page 50: Introduction, Motivation, Basics About Sets, Functions, Counting

Part V

Continuous FunctionsIntuitively continuous functions defined on an open interval in R are those whose graphs have no breaks or holes. They canhave jagged peaks and valleys though. See Figure 33.

Figure 33: On the upper left is a continuous function on an interval, while on the lower right is a discontinuous function.

Definition 30 If S ⊂ R, f : S → R is continuous at a ∈ S

∀ε > 0 ∃δ > 0 s.t. |x− a| < δ implies |f(x)− f(a)| < ε. (19)

We say that f is continuous on the set S if f is continuous at every point a ∈ S.

The reader should compare the εδ definition in equation (19) with that in equation (13). The only difference is that herefor a continuous function we let x = a while before in the definition of limit as x approaches a we never allow x to equal a.Note: Assuming that a is an accumulation point of S, f is continuous at a iff lim

x→af(x) = f(a) using the more general

definition of limit mentioned above in the note after the definition of limit. If a is an isolated point of S, all functions definedon S are continuous at a. Thus all functions f whose domain of definition is Z are continuous on Z.

50

Page 51: Introduction, Motivation, Basics About Sets, Functions, Counting

Example 1. Polynomials p(x) = anxn + an−1xn−1 + · · ·+ a1x+ a0, (an, an−1, ..., a1, a0 ∈ R) are continuous at all pointsin R.Proof. This follows from Fact 1 below as well as the fact that constant functions f(x) = c are easily seen to be continuouseverywhere as is the identity function f(x) = x (exercise).

Example 2. Define f(x) ={

sin xx , x �= 01, x = 0.

This function is also continuous at all points in R.

Proof. The continuity at non-zero points is easy from Fact 1 below. At x = 0, however, one must use the definition ofcontinuity and Example 3 from the section on limits. See Figure 31.

Example 3. Define f(x) ={x sin 1

x , x �= 00, x = 0.

. This function is also continuous at all points in R. See Figure 34 below.

The unusual aspect of this function is that there are infinitely many peaks and valleys on the interval [0, 1]. Thus you cannot really draw the graph in a finite amount of time.Proof. Again the continuity at non-zero points is easy from Facts 1 and 2 below. But at x = 0 one must use the definitionof continuity and the fact that |sin θ| ≤ 1 for all angles θ. Thus

∣∣x sin 1x

∣∣ ≤ |x| and we can take ε = δ in the definition ofcontinuity at 0.

Figure 34: a graph of x sin 1x . There are infinitely many wiggles in this curve near 0 although the function is everywhere

continuous.

Facts About Continuous Functions.Fact 1) Sum, Product, Quotient of Continuous Functions are Continuous. Suppose f, g : S → R are bothcontinuous at a ∈ S. Then the sum f + g and product fg are also continuous at a. If g(a) �= 0, then the quotient fg is alsocontinuous at a.Fact 2) Composite of Continuous Functions is Continuous. Suppose f : S → T and g : T → R for subsets S, Tof R. If a ∈ S, then b = f(a) ∈ T. If f is continuous at a and g is continuous at b = f(a), then the composite functiong ◦ f : S → R is continuous at a. Recall that (g ◦ f) (x) = g (f(x)) .Proof. Fact 1) These facts follow from the corresponding facts about limits.Fact 2) Given ε > 0 we know there is δ > 0 s.t. |y − b| < δ implies |g(y)− g(b)| < ε. For this very δ we know there is aδ1 > 0 s.t. |x− a| < δ1 implies |f(x)− f(a)| < δ. Putting these 2 sentences together gives (since b = f(a))

|x− a| < δ1 implies |f(x)− f(a)| < δ which implies |g(f(x))− g(b)| < ε.

51

Page 52: Introduction, Motivation, Basics About Sets, Functions, Counting

The most important properties of a continuous function on a closed finite interval [a, b] are the Intermediate Value Theoremand the Weierstrass Theorem on the existence of maxima and minima. We state these 2 theorems first and then prove them.

Theorem 31 Intermediate Value Theorem. Suppose f : [a, b]→ R is continuous. Then for every γ between f(a) andf(b) there exists a point c ∈ [a, b] such that f(c) = γ.Roughly this says that the graph of f has no breaks or holes. It must cross every horizontal line y = γ if γ is between

f(a) and f(b). See Figure 35.

Figure 35: Assume f is continuous on the closed finite interval [a, b]. The intermediate value theorem says that for any γin f [a, b] there is a point where the horizontal line y = γ intersects the graph of y = f(x). We label the point (c, γ) on ourgraph.

Theorem 32 Weierstrass Theorem on the Existence of Maxima and Minima. Suppose f : [a, b] → R iscontinuous. Then there exists a point c ∈ [a, b] such that f(c) is the minimum value of f on [a, b]; i.e., f(c) ≤ f(x)∀x ∈ [a, b]. We write f(c) = min{f(x) | x ∈ [a, b]}. Similarly there exists a point d ∈ [a, b] such that f(d) is the maximumvalue of f on [a, b]; i.e., f(d) ≥ f(x) ∀x ∈ [a, b]. Write f(d) = max{f(x)x ∈ [a, b]}.

The Weierstrass theorem assures the existence of solutions to max-min problems for continuous functions on closed finiteintervals. If you drop either the hypothesis that the interval is closed and finite or the hypothesis that the function iscontinuous on the interval, then the conclusion may be false. Ultimately it is possible to replace the closed interval with an

52

Page 53: Introduction, Motivation, Basics About Sets, Functions, Counting

arbitrary compact set (see Lang, Undergraduate Analysis, for the definition) such as the Cantor dust (see Figure 36) obtainedby repeatedly removing middle thirds of intervals in [0, 1]. We will see this example again later in the course. The actualCantor set is invisible. We can only show pictures of approximations to the set. The Cantor set is a fractal. We will saymore about fractals later. See Falconer, Fractal Geometry for more information on fractals.

Figure 36: 5 approximations to the Cantor dust, each approximation removing middle thirds of the intervals in the approxi-mation above.

You may question how useful it is to know that a max or min exists without knowing how to find it. Sometimes existencedoes tell you everything. Of course once we have derivatives we will have more information on the location of a max or min.The following example says BEWARE that you know your function and interval satisfy the Weierstrass hypotheses.

Example. Consider the function f(x) = 1x , for x ∈ (0, 1]. This function has no maximum value even though it is

continuous on the interval (0,1]. This is not a closed interval of course.

Given a function like f(x) = x5 − 3x+129, you probably know (at least if you have taken numerical analysis) there aremay ways to solve f(c) = 7 approximately. Programs like Mathematica, Matlab, Scientific Workplace do these things withamazing speed. Scientific Workplace tells me the 5 solutions are approximately

{2. 104 4 + 1. 504 0i, 2. 104 4− 1. 504 0i,−0.780 88− 2. 505 9i,−0.780 88 + 2. 505 9i,−2. 647 0} .

Only the last root is real. That is the one we are talking about here. Yeah, this is real analysis not complex. So we ignore4 out of 5 roots. In Figure 37 we plot the function f(x) = x5 − 3x + 129 and the line y = 7 on the interval [−3, 1]. Theintermediate value theorem tells us that the root c of f(c) = 7 exists since f(−3) = −105 and f(1) = 127.

In what follows we give a numerical analyst’s proof of the Intermediate Value Theorem (from V. Bryant, Yet AnotherIntroduction to Analysis). It tells you how to approximate a point c ∈ [a, b] with f(c) = γ.Proof. of the Intermediate Value Theorem by the Bisection Method. This is a proof by induction.Step 1. Set r1 = a and s1 = b. Assume f(r1) < f(s1). Then f(r1) < γ < f(s1). Define m1 =

r1+s12 =the midpoint

of the interval [r1, s1]. To obtain the next subinterval [r2, s2], there are 3 cases according to the location of f(m1). See theFigure 38.Case a. If f(m1) < γ, set r2 = m1 and s2 = s1.Case b. If f(m1) > γ, set r2 = r1 and s2 = m1.Case c. f(m1) = γ and we are done as m1 = c.Induction Step. Assume that [rj , sj ] have been found for j = 1, 2, ..., n. Define [rn+1, sn+1] as follows. Look at themidpoint mn =

rn+sn2 . Again consider 3 cases according to the location of f(mn).

Case a. If f(mn) < γ, set rn+1 = mn and sn+1 = sn.Case b. If f(mn) > γ, set rn+1 = rn and sn+1 = mn.Case c. f(mn) = γ and we are done as mn = c.Assuming we are never in Case c (when we would be done after a finite number of steps), we create 2 infinite sequences

{rn} and {sn} with the following properties:

a = r1 ≤ r2 ≤ · · · ≤ rn ≤ rn+1 ≤ sn+1 ≤ sn ≤ · · · ≤ s2 ≤ s1 = b

and sn − rn = b−a2n−1 .

53

Page 54: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 37: The graph of f(x) = x5 − 3x + 129 and that of the line y = 7. The x-coordinate of the intersection of the 2curves is approximately −2.647.

Claim. limn→∞rn = lim

n→∞sn = c and f(c) = γ.

Proof of Claim. The sequence {rn} is increasing and bounded. Thus it must have a limit which we will call c, using Fact6 in the section on facts about limits of sequences. Similarly {sn} is decreasing and bounded and must have a limit whichwe will call c′. We also know that

sn − rn = b− a2n−1

→ 0 as n→∞.It follows that lim

n→∞ (sn − rn) = c′ − c = 0.

We know also thatf(rn) ≤ γ ≤ f(sn).

Since f is continuous, we havef(c) = lim

n→∞f(rn) ≤ γ ≤ limn→∞f(sn) = f(c).

That completes the proof of the Intermediate Value Theorem.

Next we prove the earlier theorem about existence of maxima and minima. This will require us to remember the factsabout limits of sequences proved using the Spanish Hotel argument.Proof. of the Weierstrass Theorem that a Continuous Function Achieves its Maxima and Minima on a FiniteClosed Interval [a,b].Suppose f : [a, b]→ R is continuous. We want to find c ∈ [a, b] so that f(x) ≤ f(c) for all x ∈ [a, b]. We need 3 steps.

Step 1. Any sequence in the closed interval [a, b] has a subsequence converging to a limit in [a, b].This is Fact 3 in the list of facts about Cauchy (and other) sequences of real numbers. It was proved using the Spanish

Hotel argument.Step 2. The set {f(x)|x ∈ [a, b]} , is bounded above.This is true since otherwise there is a sequence of points xn ∈ [a, b] such that f(xn) > n, for all n ∈ Z+. By Step 1,

we know that the sequence {xn} has a convergent subsequence {xnk} such that limk→∞

xnk = c ∈ [a, b]. Now we know thatf(xnk) > nk ≥ k, for all k ∈ Z+. This implies that lim

k→∞f (xnk) =∞. But this contradicts the continuity of the function f

which implies that limk→∞

f (xnk) = f(c) which is a finite real number.

54

Page 55: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 38: Here the visualize the 1st step in the bisection method with the 1st interval being [a, b] = [r1, s1] = [−3, 1]. Wehave f(x) = x5 − 3x + 129. and seek to find c with f(c) = 7. This is the polynomial from Figure 37. The midpoint ism1 = −1. Then clearly f(m1) > γ = 7. So the next interval is [r2, s2] = [−3,−1]. Of course you can see from the graphapproximately where the curve crosses the line y = 7. And we noted above that an approximation to c with f(c) = 7 is−2. 647 0. The reader should find the next interval [r3, s3], or even enough intervals to get our approximation to c.

55

Page 56: Introduction, Motivation, Basics About Sets, Functions, Counting

Step 3. l.u.b. {f(x)|x ∈ [a, b]} .= β = max {f(x)|x ∈ [a, b]} = f(c) for some c ∈ [a, b].By Step 2 and the completeness axiom we know the least upper bound β exists. But why does β = f(c) for some

c ∈ [a, b]? By the definition of least upper bound, there is a point un ∈ [a, b] such that

f(un)̇ > β − 1

n, for every n ∈ Z+.

Once more by Fact 3 about Cauchy (and other) sequences, we know {un} has a convergent subsequence {unk} withlimk→∞

unk = c ∈ [a, b]. We want to show that f(c) = β. To do this, note that

β ≥ f(unk) > β −1

nk≥ β − 1

kfor every k ∈ Z+.

Therefore by the Squeeze Lemma (which was Lemma 28) and the continuity of f ,

β = limk→∞

f (unk) = f( limk→∞

unk) = f(c).

This completes the proof of the Weierstrass Theorem on Existence of Maxima. We leave minima to the reader. Itamounts to replacing f by −f . You should not have to do the entire proof over again.

The following Corollary follows from the 2 preceding theorems.

Corollary 33 If a function f : [a, b]→ R is continuous, then the image f [a, b] = {f(x)|x ∈ [a, b]} is also a closed interval.

56

Page 57: Introduction, Motivation, Basics About Sets, Functions, Counting

Part VI

Derivatives17 Examples

Next we finally get to do some calculus (if by calculus, you mean derivatives and integrals only). We start with derivatives.It would be hard to do applied mathematics without derivatives. Newton invented calculus (along with Leibniz) to

formulate Newton’s 3 laws of motion such as force = mass × acceleration (1666). Of course acceleration is a second derivative.See E.T. Bell, Men of Mathematics for some of this story. The derivative can be used to represent all sorts of instantaneousrates of change - not just that of distance with respect to time. Thus one finds differential equations in physics, chemistry,economics, biology, ecology, weather modeling.For example, the predator-prey equations describe the evolution of 2 interacting species such as cats and mice on a

desert island. If u(t)=number of mice at time t, v(t)=number of cats at time t, the predator-prey equations say:

u′ = (a− bv −mu)uv′ = (cu− d− nv)v,

where a, b, c, d,m, n are constants. References for these equations include H. Kocak, Differential and Difference Equationsthrough Computer Experiment ; or J. M. Maynard Smith, Mathematical Ideas in Biology. See K. Devlin, Mathematics: TheScience of Patterns, Chapter 3, for more examples.The difference between the graph of an everywhere differentiable function and a merely everywhere continuous function

is quite visible. The more derivatives a function has the smoother the graph.Example 1. An infinitely differentiable function The Gaussian or normal density.Consider the function G(x) = 1√

πe−x

2

which is plotted in Figure 39. The nth derivative G(n)(x) exists for everyn = 1, 2, 3, ... and every real point x. This graph is as smooth as they come.

Figure 39: plot of G(x) = 1√πe−x

2

Example 2. A function which is not differentiable at one point.Consider the absolute value function |x| whose graph is in Figure 40. This function has derivative 1 for x > 0, −1 for

x < 0, but no derivative at x = 0. You can see the problem in the graph. There is a very sharp turn at x = 0.

57

Page 58: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 40: plot of |x|

Example 3. Weierstrass Continuous Nowhere Differentiable Function.Here we examine a function that is as non-smooth as one can imagine. In fact, people did not believe that such a function

could exist before Weierstrass. Hermite described these functions as a "dreadful plague." Poincaré wrote: "Yesterday, if anew function was invented it was to serve some practical end; today they are specially invented only to show up the argumentsof our fathers, and they will never have any other use." Even as late as the 1960’s, before "everyone" had a computer fastenough to graph these things, such examples were viewed as pathological monsters. Now there are thousands of websiteswith pictures of approximations of them.Later we will examine the Weierstrass continuous nowhere differentiable function in more detail. If you are impatient, see

Falconer, Fractal Geometry. TheWeierstrass function is defined by an infinite series depending on 2 parameters λ ands:

f(x) =∞∑k=0

λ(s−2)k sin(λkx

), for λ > 1 and 1 < s < 2.

We will prove later (in Lectures, II) that f(x) is continuous. This is an easy consequence of the Weierstrass M-test foruniform convergence of series of functions. However it can be shown that the derivative f ′(x) does not exist at any point x.We will give exercises on this in a final exam.Figures 41, 42, and 43 show graphs of approximations of such functions on the interval [0, .5] (in which the infinite series

is cut off at some finite number of terms). It turns out that the graph of the Weierstrass function is a fractal having boxdimension s. We will define box dimension in a final exam. See Falconer, loc. cit., if you feel impatient. A smooth curvewould have box dimension 1. The closer to 2 the dimension gets, the more the curve fills the 2D picture. In 41 we takeλ = 1.5, s = .7, and cut off the sum at k = 3000. In Figure 42 we take λ = 1.5, s = .9, and cut off the sum at k = 3000. InFigure 43 we take λ = 5, s = .9, and cut off the sum at k = 1000.

58

Page 59: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 41: Mathematica plot of the sum3000∑k=0

1.5(1.7−2)k sin(1.5kx

).

Figure 42: Mathematica plot of the sum3000∑k=0

1.5(1.9−2)k sin(1.5kx

).

59

Page 60: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 43: Mathematica plot of the sum1000∑k=0

5(1.9−2)k sin(5kt)

60

Page 61: Introduction, Motivation, Basics About Sets, Functions, Counting

18 Definition of Derivative

The derivative f ′(x) can be thought of geometrically as the slope of the tangent line to the curve y = f(x) at the point(x, f(x)). But what does that really mean? It means that you take limits of slopes of secant lines; i.e., lines through points(x, f(x)) and (x+ h, f(x+ h)) for small values of |h| . See Figure 44.

Figure 44: The derivative of f(x) at the point x is the limit of the slopes ΔyΔx =

f(x+Δx)−f(x)Δx of the secant lines as Δx→ 0.

Here by secant line, we mean the line connnecting (x, f(x)) and (x+Δx, f(x+Δx)). The derivative f ′(x) is the slope of thetangent to the curve y = f(x) at the point (x, f(x)).

If instead t =time and y = f(t) = distance from origin at time t, then ΔyΔt =

f(t+Δt)−f(t)Δt = average velocity over the

interval from t to t+Δt. The instantaneous velocity at time t is the limit of ΔyΔt as Δt approaches 0.More precisely, suppose the function f is defined on an open interval I containing the point c

Definition 34 The derivative f ′(c) is defined by

f ′(c) .= limh→0

f(c+ h)− f(c)h

.=df

dx(c).

If this limit exists we say that f is differentiable at c. When f is differentiable at every point in a set S, we say that f isdifferentiable on S.

61

Page 62: Introduction, Motivation, Basics About Sets, Functions, Counting

Example 1. Consider f(x) = x3− 2x+1. This is differentiable at any point x, as are all polynomials. Let’s compute thederivative.

limh→0

f(x+ h)− f(x)h

= limh→0

((x+ h)3 − 2(x+ h) + 1)− (x3 − 2x+ 1)

h

= limh→0

(x3 + 3x2h+ 3xh2 + h3 − 2(x+ h) + 1)− (x3 − 2x+ 1)

h

= limh→0

3x2h+ 3xh2 + h3 − 2hh

= 3x2 − 2.

Example 2. Define the function g(x) =

{1, x ∈ Q0, x /∈ Q. The graph looks like 2 horizontal lines even though one line

has way fewer points, as the rationals are denumerable while the irrationals are not. We saw earlier that this is not acontinuous function at any point x since any interval on the real line contains both rationals and irrationals. See the sectionon properties of limits. Later we will show that this implies the derivative cannot exist at any point. Thus this function isnowhere differentiable.

It is also possible to define 1-sided derivatives of a function f at a point c. For the right-hand derivative at c, you onlyneed f to be defined on [c, c+ δ) for some δ > 0.

Definition 35 The right-hand derivative is defined by: f′+(c) = lim

h→ 0h > 0

f(c+h)−f(c)h .

We leave it to the reader to define the left-hand derivative. The right- and left-hand derivatives must be equal at thepoint c in order for f ′(c), the 2-sided derivative to exist.Example. The absolute value. The function f(x) = |x| has right-hand derivative f

′+(0) = 1, while the left-hand

derivative f′−(0) = −1. Thus, the absolute value function does not have a 1-sided derivative. To see the right-hand formula,

note that

f′+(c) = lim

h→ 0h > 0

|0 + h| − |0|h

= limh→ 0h > 0

|h|h= lim

h→ 0h > 0

h

h= 1.

We leave it to the reader to check the left-hand formula.

19 Alternative Definition of the Derivative: The Linear Approximation.

Definition 36 Suppose f is defined on the open interval I. If c ∈ I, then f is differentiable at c if and only if there is areal number L and a function φ defined on an open interval containing 0 such that

f(c+ h) = f(c) + Lh+ φ(h), and limh→0

φ(h)

h= 0. (20)

In formula (20), L = f ′(c). Note that the first 2 terms on the right (f(c) + Lh) are a linear function of h (holding cfixed). We view φ(h) as a 2nd order term since φ(h)

h must approach 0 with h. This 2nd definition of derivative is thebeginning of a Taylor expansion at c. See Figure 45 for an illustration of formula (20).

Set

ψ(h) =

{φ(h)h , h �= 00, h = 0.

(21)

Then ψ is continuous at 0 and φ(h) = hψ(h) and we see that φ(h) approaches 0 faster than h as h → 0. It is also clearthat φ is continuous at 0.

62

Page 63: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 45: The linear approximation to the derivative (as a function of h holding x fixed) is shown here as f(x) + f ′(x)h.The difference with f(x+ h) is the function φ(h). One has f(x+ h)− (f(x) + f ′(x)h) = φ(h).

63

Page 64: Introduction, Motivation, Basics About Sets, Functions, Counting

We will need this 2nd definition of derivative in proving the chain rule. It is also a useful way to get a rough linearapproximation to a non-linear function. Before showing the 2 definitions of derivative are the same, let’s do an example.Example. Let f(x) =

√x. Then we can show that f ′(x) = 1

21√x.

f ′(x) =√x+ h−√x

h=

√x+ h−√x

h

√x+ h+

√x√

x+ h+√x=

x+ h− xh(√x+ h+

√x) = 1√

x+ h+√x→ 1

2

1√x, as h→ 0.

Here we assume√x is continuous for x ≥ 0. Exercise. Prove the preceding statement.

Consider using formula (20) to approximate√5 =

√4 + 1 ∼=

√4 + L1, where L = f ′(4) = 1

4 . So we get√5 ∼= 2.25.

Scientific Workplace says√5 = 2.236 1. Our result was not so bad.

Theorem 37 The usual definition 34 of derivative is equivalent to the linear approximation definition 36 with f ′(c) = L.

Proof. Definition 34 implies 36. Suppose f is differentiable at the point c according to Definition 34. Set L = f ′(c)and set

φ(h) = f(c+ h)− f(c)− Lh.Then

limh→0

φ(h)

h= lim

h→0

f(c+ h)− f(c)− Lhh

= limh→0

(f(c+ h)− f(c)

h− L

)= lim

h→0

(f(c+ h)− f(c)

h− L

)= 0,

since L = f ′(c). This proves 36.Definition 36 implies 34. Suppose there is a real number L and there is a function φ as in Definition 36. So we have

f(c+ h) = f(c) + Lh+ φ(h).

Subtract f(c) and divide by h to obtain

f(c+ h)− f(c)h

=Lh+ φ(h)

h= L+

φ(h)

h.

Since we know from Definition 36 that limh→0

φ(h)h = 0, It follows that lim

h→0

f(c+h)−f(c)h = L and f is differentiable according to

Definition 34 with f ′(c) = L.

20 Properties of the Derivative.

To make our lives easier when trying to find the formula for the derivative of some function, it will help to know the propertiesof the derivative.Just the Facts about Derivatives.Fact 1) Differentiability implies Continuity. Suppose f is differentiable at c. Then f must be continuous at c.Fact 2) Derivative is Linear. Suppose f and g are differentiable at c. And suppose k is a real number. Then f + g andkf are differentiable at c and

(f + g)′(c) = f ′(c) + g′(c); (kf)

′(c) = kf ′(c).

Fact 3) Product Rule. Suppose f and g are differentiable at c. Then so is the product fg and

(fg)′(c) = f(c)g′(c) + f ′(c)g(c).

Fact 4) Quotient Rule. Suppose f and g are differentiable at c and g(c) �= 0. Then the quotient fg is differentiable at cand (

f

g

)′(c) =

f ′(c)g(c)− f(c)g′(c)g2(c)

.

64

Page 65: Introduction, Motivation, Basics About Sets, Functions, Counting

Fact 5) Chain Rule. Suppose f is differentiable at c and g is differentiable at f(c). Then the composite (g ◦ f) (x) =g (f(x)) is differentiable at c and

(g ◦ f)′ (c) = g′ (f(c)) f ′(c).The Leibniz notation makes the chain rule easy to remember, setting u = f(x) and y = g(f(x)) = g(u), we have

dy

dx=dy

du

du

dx.

It seems to say that you can just cancel the du’s.Proof. Fact 1) We need to show that the existence of f ′(c) implies f(c + h) approaches f(c) as h → 0. Use the linearapproximation Definition 36 of the derivative. This says

f(c+ h) = f(c) + f ′(c)h+ φ(h), whereφ(h)

h→ 0 as h→ 0.

To finish the proof, you just need to convince yourself that φ(h)→ 0 as h→ 0. We saw this after Formula (21) for example.Fact 2) I leave these proofs to you as Exercises.Fact 3) To show the product rule, we use the trick we needed in the proof of the limit of a product result, i.e., we add andsubtract a term in between the 2 things in the numerator of the difference quotient for the derivative:

f(c+ h)g(c+ h)− f(c)g(c)h

=f(c+ h)g(c+ h)− f(c+ h)g(c) + f(c+ h)g(c)− f(c)g(c)

h

=f(c+ h)g(c+ h)− f(c+ h)g(c)

h+f(c+ h)g(c)− f(c)g(c)

h

= f(c+ h)g(c+ h)− g(c)

h+f(c+ h)− f(c)

hg(c).

Using Fact 1 and properties of limits, we see that this last mess approaches f(c)g′(c) + f ′(c)g(c) as h→ 0.Fact 4) To prove the quotient rule it suffices to do the case f(x) = 1, ∀x in the domain of f and g, by the product rule. Todo this, look at

1g(c+h) − 1

g(c)

h=1

h

g(c)− g(c+ h)g(c)g(c+ h)

=1

g(c)g(c+ h)

g(c)− g(c+ h)h

.

Since g is non-zero in an open interval containing c (because g(c) �= 0 and g is continuous at c by Fact 1), the denominatoris not vanishing for small enough values of h. Since g is continuous at c by Fact 1, we see that the last mess approaches1

g(c)2 (−g′(c)) as h→ 0.

Fact 5) It is easy to convince ourselves of the formula by writing

Δy

Δx=Δy

Δu

Δu

Δx.

Here Δy = g(x+Δx)− g(x) and Δu = f(x+Δx)− f(x). The problem with this "proof" is that Δu might vanish at lotsof points as Δx→ 0. Then you would be dividing by 0 (which is certainly a real no-no). So we need to devise a proof thatavoids this pitfall. That is really why we like the 2nd definition of the derivative using the function ψ rather than φ; seeFormulas (36) and (21).Let

k = k(h) = f(x+ h)− f(x) = f ′(x)h+ hψ1(h) (22)

and let y = f(x). Then we have

g(f(x+ h))− g(f(x)) = g(y + k)− g(y) = g′(y)k + kψ2(k).It follows that

g(f(x+ h))− g(f(x))h

=g′(y)k + kψ2(k)

h= g′(y)

k

h+k

hψ2(k).

Now from formula (22) we see that kh approaches f

′(x) as h→ 0. We also use the fact that the functions ψ2(k) and k arecontinuous at h = 0 and both take the value 0 at h = 0. Therefore the limit of the last mess is g′(f(x))f ′(x) as h→ 0 andthe chain rule is proved.

65

Page 66: Introduction, Motivation, Basics About Sets, Functions, Counting

The Usual Examples.Example 1. Let f(x) = k=constant, for all x ∈ R. Then f ′(x) = 0, for all x ∈ R.Proof.

f(x+ h)− f(x)h

=k − kh

= 0.

Take the limit as h→ 0 and you get 0 (what else?)Example 2. Let f(x) = ax+ b, ∀x ∈ R where a, b are constants. Then f ′(x) = a ∀x ∈ R.Proof.

f(x+ h)− f(x)h

=(a(x+ h) + b)− (ax+ b)

h=ah

h= a.

Take the limit as h→ 0 and you get a, which is the derivative. You could also do this example by doing the case a = 1, b = 0first and then using the linearity property of derivatives.Example 3. Let f(x) = xn, where n = 1, 2, 3, 4, 5, .... Then f ′(x) = nxn−1.

Proof. by Mathematical Induction.The Case n=1. We have done the case n = 1 in Example 2. Here we get f(x) = x, f ′(x) = 1x1−1 = 1x0 = 1.Induction Step. Use the fact that xn = x · xn−1 and the product rule to see that

d(xn)

dx=d(x · xn−1)

dx=dx

dxxn−1 + x

d(xn−1

)dx

= 1xn−1 + x(n− 1)xn−2 = nxn−1.

Here we used the (n− 1)st formula. This completes the proof of the Induction step.

We can use these examples and the rules for derivatives to deduce that all polynomials

p(x) = anxn + an−1xn−1 + · · ·+ a1x+ a0, where aj ∈ R

are everywhere differentiable. By the quotient rule we can differentiate rational functions p(x)q(x) , for polynomials p(x),

q(x), at any point x ∈ R such that q(x) �= 0. Any computer with a program such as Scientific Workplace, Matlab orMathematica can grind out these derivatives and lots more. We have to say more about ex, log x, sinx, cosx, xa, forx > 0, a ∈ R, before we can do more interesting derivatives. Then we can use the chain rule to get derivatives of functionslike

√1 +

√x or xx or sin( 1x ).

Our next goals are to figure out the mean value theorem and the formula for the derivative of the inverse of a function.Here we mean the inverse function for the operation of composition of functions, such as the derivative of

√x knowing the

derivative of x2.

21 The Mean Value Theorem and Applications

This theorem will be of great use in deducing properties of the function f(x) from properties of its derivative f ′(x). It alsohelps us to prove that any function whose derivative is identically 0 on an interval must be a constant. This fact is importantfor integration. It explains why the antiderivative tables always have +C at the end of the formulas.

Theorem 38 The Mean Value Theorem. Suppose f : [a, b]→ R is continuous and differentiable on (a, b). Then thereis a point c ∈ (a, b) such that

f ′(c) =f(b)− f(a)b− a .

The conclusion of the mean value theorem says that the tangent to the curve y = f(x) at the point (c, f(c)) is parallel tothe line through the points (a, f(a)) and (b, f(b)). See Figure 46.

66

Page 67: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 46: The mean value theorem says the slope of the line through (a, f(a)) and (b, f(b)) equals the slope of the tangentat some point c.

67

Page 68: Introduction, Motivation, Basics About Sets, Functions, Counting

According the C.H. Edwards, Jr., in The Historical Development of the Calculus, p. 314, our proof is due to O. Bonnet(1819-1890). The proof is 1st done in a special case and we need to know the 1st Derivative Test before proceeding.

Theorem 39 First Derivative Test. Suppose that f : (a, b)→ R is differentiable and

f(c) ≥ f(x) for all x ∈ (c− δ, c+ δ) ⊂ (a, b) for some δ > 0.Then we say that f has a local maximum at x = c. It follows that the derivative f ′(c) = 0.

Proof. Consider the difference quotient

f(c+ h)− f(c)h

=

{number ≤ 0, if δ > h > 0number ≥ 0, if − δ < h < 0.

See Figure 47 For the derivative f ′(c) to exist, this must have a limit as h→ 0. This limit must be 0 as the right hand limitis ≤ 0, while the left-hand limit is ≤ 0 (since limits preserve inequalities).

Figure 47: The first derivative test for a local maximum is pictured. The slopes of secant lines on the left are positive whilethose on the right are negative. So the slope of the tangent line must be 0.

Proof. of the Mean Value Theorem.Special Case. f(a) = f(b) = 0. (also known as Rolle’s Theorem).By the hypothesis of the mean value theorem f : [a, b]→ R is continuous and differentiable on (a, b). In this special case

f(b)−f(a)b−a = 0 and thus we need to find a point c ∈ (a, b) such that f ′(c) = 0. If f(x) =constant, then any point c ∈ (a, b) will

work. Otherwise we know f(u) �= f(a) for some u ∈ (a, b). We may assume f(u) > f(a). Otherwise we can make a similarargument which is left to the reader. Let c be the point in [a, b] such that f(c) = max{f(x)|x ∈ (a, b)}. We know that themaximum of a continuous function is attained on a closed finite interval by the Weierstrass theorem (see Theorem 32). Itfollows from the fact that f(a) = f(b) that c �= a and c �= b. Then by the first derivative test we know that f ′(c) = 0. Thiscompletes the proof of the special case.

68

Page 69: Introduction, Motivation, Basics About Sets, Functions, Counting

In general, f(a) �= f(b).Then we pull a function out of a hat or bonnet:

g(x).= f(x)− f(b)− f(a)

b− a (x− a).

The equation of the line joining the points (a, f(a)) and (b, f(b)) is h(x) = f(a) + f(b)−f(a)b−a (x− a). So g(x)− f(a) is the

difference of f(x) and h(x). Then g(x) satisfies the hypotheses of the special case (Rolle’s theorem). For g(a) = f(a) = g(b).Moreover g′(x) = f ′(x) − f(b)−f(a)

b−a . So the special case tells us that there is a point c ∈ (a, b) such that g′(c) = 0. Thismeans

f ′(c) =f(b)− f(a)b− a ,

which is exactly what the mean value theorem asserts in the general case and we are done.

We get the following corollary immediately since now knowing that the slopes of all tangent lines are positive implies thesame for all secant lines.

Corollary 40 Suppose that f : [a, b] → R is continuous and differentiable on (a, b). If f ′(x) > 0 for all x ∈ (a, b), thenthe function f(x) is monotone strictly increasing on [a, b]; i.e., for every pair of points u, v ∈ [a, b] such that u < v, wehave f(u) < f(v).

Proof. By the mean value theoremf(v)− f(u)v − u = f ′(c) for some c ∈ (u, v). (23)

Here we replace a, b with u, v, of course. But we know f ′(c) > 0 and v − u > 0. It follows from equation (23) thatf(v)− f(u) > 0. Thus f(u) < f(v).

Similarly one can show that if f : [a, b] → R is continuous and differentiable on (a, b) and f ′(x) < 0 for all x ∈ (a, b),then the function f(x) is monotone strictly decreasing on [a, b]; i.e., for every pair of points u, v ∈ [a, b] such that u < v,we have f(u) > f(v). This is an exercise.Yet another exercise is to show that if f : [a, b] → R is continuous and differentiable on (a, b) and f ′(x) ≥ 0 for all

x ∈ (a, b), then the function f(x) is monotone increasing on [a, b]; i.e., for every pair of points u, v ∈ [a, b] such thatu ≤ v, we have f(u) ≤ f(v). Similarly if f : [a, b] → R is continuous and differentiable on (a, b) and f ′(x) ≤ 0 for allx ∈ (a, b), then the function f(x) is monotone decreasing on [a, b]; i.e., for every pair of point u, v ∈ [a, b] such thatu ≤ v, we have f(u) ≥ f(v).A similar fact is important enough that we state it as a Corollary. We will need it in the theory of integration.

Corollary 41 Suppose that f : [a, b]→ R is continuous and differentiable on (a, b) and f ′(x) = 0 for all x ∈ (a, b). Thenf(x) is constant.

Proof. This is left to the reader. If v ∈ (a, b], use the mean value theorem to see that f(v)−f(a)v−a = 0.

In the next theorem we use a fact about the second derivative which is of course the derivative of the derivative; i.e.,f ′′ = (f ′)′. We use the 2nd derivative to obtain a test for whether the graph of f(x) is strictly convex up, meaning thatthe graph of y = f(x), for u ≤ x ≤ v lies below the chord (which is the finite part of the secant line) connecting (u, f(u))and (v, f(v)) for every pair of points u, v in the domain of f. You can state this as an inequality by parameterizing the chordthrough (u, f(u)) and (v, f(v)) as the set of points t(u, f(u)) + (1− t)(v, f(v)) = (tu+ (1− t)v, tf(u) + (1− t)f(v)) for theparameter t ∈ [0, 1]. Then f(x) is strictly convex up if

f(tu+ (1− t)v) < tf(u) + (1− t)f(v), for all t ∈ [0, 1].See Figure 48 for an example.Of course there will be a similar test telling whether the graph is strictly convex down (lying above all the secants).

The terminology for this is often confusing. The word "convex" or its negative is often replaced by the word "concave,"especially in older calculus books such as E. Purcell, Calculus with Analytic Geometry, where the definition is slightly differentin another way. Purcell’s definition involves the placement of the tangent lines rather than the secant lines.

69

Page 70: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 48: A curve is "convex up" if it always lies below the chord (or finite part) of the secant line for every pair of pointsu, v in the domain of the function. Here the chord is shown in turquoise while the curve is in red.

70

Page 71: Introduction, Motivation, Basics About Sets, Functions, Counting

Theorem 42 Convexity Test. Suppose that f : [a, b] → R is continuous and the second derivative f ′′(x) exists andf ′′(x) > 0 for all x ∈ (a, b). Then f(x) is strictly convex up.

Proof. The equation of the secant line is given by

L(x) = f(u) +f(v)− f(u)v − u (x− u).

Set g(x) = L(x)− f(x). Our goal is to show that g(x) > 0 for all x ∈ (u, v). Note that g(u) = g(v) = 0.Use the mean value theorem for f on the interval (u, v) to see that g′(x) = f ′(d) − f ′(x), for some d ∈ (u, v). Here d

depends only on u and v, and is independent of x.Then g”(x) = −f ′′(x). The hypothesis that f ′′(x) > 0 for all x ∈ (a, b) implies that g′ is strictly decreasing on [u, v]. We

see that g′(x) = f(v)−f(u)v−u − f ′(x). We know that g′(d) = 0 for some d ∈ (u, v). So we must have g′(x) > 0 for u < x < d

and g′(x) < 0 for d < u < v. It follows from our discussion after the mean value theorem that g(x) strictly increases from0 to some positive value as x goes from u to d and then g(x) decreases from this positive value back to 0 as x goes from d tov. This is just what we wanted as it shows g(x) > 0 for all x ∈ (u, v).

Theorem 43 Second Derivative Test. If f : [a, b]→ R, suppose f ′(x) exists at any x in an open interval containing thepoint c, and f ′(c) = 0, while f ′′(c) > 0 for some c ∈ (a, b). Then f(c) is a local or relative minimum of f(x); meaningthat ∃ δ > 0 s.t. f(c) ≤ f(x) for all x ∈ [c− δ, c+ δ].

Proof. By the preceding theorem, if we know f ′′(x) exists and is positive on an open interval containing c, then f(x) isstrictly convex up on [c − δ, c + δ] for some δ > 0. This makes f(c) a local minimum since f(x) must be greater than f(c)for all x ∈ [c− δ, c+ δ].But we don’t need to know that f ′′(x) exists except at the point x = c. For we have

0 < f ′′(c) = limh→0

f ′(c+ h)− f ′(c)h

= limh→0

f ′(c+ h)h

.

Therefore f ′(x) must be positive on (c, c+ δ1) for some small δ1 > 0 and f ′(x) must be negative on (c− δ2, c) for some smallδ2 > 0. This means that taking δ = 1

2 min{δ1, δ2}, we have our result. For then f is decreasing on [c− δ, c] and increasingon [c, c+ δ]. That makes f(c) a minimum on [c− δ, c+ δ].

22 Inverse Functions and Their Derivatives

We seek a rule to tell us how to differentiate log x assuming we know the derivative of ex, (or the derivative of√x, assuming

we know the derivative of x2), or ( the derivative of arcsin(x) assuming we know the derivative of sin(x)). This is the inversefunction theorem in 1 variable. We will prove a special case.

Theorem 44 Inverse Function Theorem. Suppose f : [a, b]→ R is continuous and differentiable on (a, b) with f ′(x) > 0for all x ∈ (a, b). Then there is an inverse function g : [f(a), f(b)] → R meaning that g(f(x)) = x for all x ∈ [a, b] andf(g(y)) = y for all y ∈ [f(a), f(b)]. We often write g = f−1 although this is not to be confused with 1

f . The inverse functiong is differentiable and

g′(y) =1

f ′(g(y)). (24)

Note that the Leibniz notation for derivatives makes this theorem look like high school algebra as it says dxdy =

1dydx

.

Moreover, if we know the inverse function to be differentiable, then the formula for its derivative follows from the chain ruleas g(f(x)) = x implies g′(f(x))f ′(x) = 1 and thus as x = g(y), we get formula (24). Unfortunately we need to show thatthe inverse function is differentiable. Before doing that, let’s look at a few examples.Example 1. Consider the function

y = f(x) = xn = x · · ·xn times

, n = 1, 2, 3, 4, .....

71

Page 72: Introduction, Motivation, Basics About Sets, Functions, Counting

The inverse function is g(y) = y1n = n

√y, defined for y ≥ 0 if n is even and all y if n is odd. Then g′(y) = 1

ny1n−1.

No surprise that it follows the general formula for powers. But we need to prove this using the inverse function theorem.Assume y > 0 if n is even and in general that y �= 0. Then by formula (24) we have, using the fact that f ′(x) = nxn−1,

g′(y) =1

f ′(g(y))=

1

n(g(y))n−1=1

ny−

n−1n =

1

ny1n−1.

Here we’ve used the following definition in Example 2 of a rational power of y.

Example 2. Let r ∈ Q and r = pq , with integers p and q, where q > 0. Then we can define

xpq =

(x1q

)p, (25)

assuming x > 0 if q is even. We leave it as an exercise for the reader to show that f ′(x) = rxr−1.

Proof. of the inverse function theorem.Step 1. Define f−1.We are assuming f : [a, b] → R is continuous and differentiable on (a, b) with f ′(x) > 0 for all x ∈ (a, b). Thus f(x)

is strictly increasing on [a, b] by the mean-value theorem. This shows that f is 1-1 on [a, b]. By the intermediate valuetheorem we know that f maps [a, b] 1-1, onto the interval [f(a), f(b)]. Thus the inverse function g : [f(a), f(b)] → [a, b] iswell-defined, 1-1, and onto. Given y ∈ [f(a), f(b)],we know by the intermediate value theorem that there is an x ∈ [a, b] suchthat f(x) = y and so we then define g(y) = x. Note that x is unique since f is 1-1.Step 2. The inverse function g is also strictly increasing.Suppose that f(a) < r < s < f(b). We want to show that g(r) < g(s). We do a proof by contradiction. If g(s) ≤ g(r),

then since f is strictly increasing, we have s = f(g(s)) ≤ f(g(r)) = r. But this says s ≤ r while r < s giving us ourcontradiction. So g must be strictly increasing.Step 3. The continuity of the inverse function.We want to show that g(y) is continuous at r = f(c) for c ∈ [a, b]. Let us first assume that c is not an endpoint. Suppose

we are given ε > 0 which is small enough that c±ε ∈ (a, b). Let g(r+δ2) = c+ε and g(r−δ1) = c−ε. Set δ = min{δ1, δ2}.Then we claim that

|y − f(c)| < δ implies |g(y)− c| < ε.To see this, look at Figure 49 and note that

|y − f(c)| < δ ⇐⇒ −δ < y − f(c) < δ ⇐⇒ f(c)− δ < y < f(c) + δ.

This implies r− δ1 < y < r+ δ2. It follows upon applying g to this inequality, using the fact that g is strictly increasing, wehave

c− ε = g(r − δ1) < g(y) < g (r + δ2) = c+ ε.Thus |g(y)− c| < ε if |y − r| < δ. This proves the continuity of g. We leave the case that c = a or c = b to the reader asan exercise.Note that the graph of the inverse function g is obtained from that of the function f by reflecting the graph of f across

the line y = x. Thus it is inconceivable that if the graph of f has no breaks, then the graph of g could have breaks.Step 4. The formula for the derivative of the inverse function.As in Step 3, let f(c) = r and thus g(r) = c. Then

g′(r) = limy→r

g(y)− g(r)y − r = lim

y→r

x− cf(x)− f(c) = lim

y→r

1f(x)−f(c)

x−c.

This proof is over once we prove that if y approaches r, then x = g(y) approaches c = g(r) as the limit on the right asx approaches c is 1

f ′(x) =1

f ′(g(y)) . But the continuity of g which was proved in Step 3 says that if y approaches r, thenx = g(y) approaches c = g(r). So we’re done.

72

Page 73: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 49: Figure for the proof of the continuity of the inverse of a function y = f(x) with positive derivative on an interval.The inverse function is x = g(y). For a point c let r = f(c). Given small positive ε, define the positive numbers δ1 and δ2by r + δ2 = f(c+ ε) = and r − δ1 = f(c− ε).

73

Page 74: Introduction, Motivation, Basics About Sets, Functions, Counting

23 Favorite Functions and Their Derivatives

We assume you have seen the functions ex, log x, sinx, cosx, etc. before. But how do you define them precisely? HereI will take the approach that you believe in infinite series and I will define ex by its Taylor series, even though we will notdiscuss series for a while. Then I will define log x to be the inverse function of ex. Many calculus books take a differentapproach and define log x as an integral. That seems less natural to me and we haven’t yet discussed integrals either. Asto the trigonometric functions, I will say very little about them here.

23.1 Exponential

Definition 45 The exponential function is defined for any real number x by

ex = exp(x) =∞∑n=0

xn

n!= lim

M→∞

M∑n=0

xn

n!= 1 + x+

x2

2+x3

6+x4

24+ · · · . (26)

The power series in formula (26) converges for all real x as we will see later using the ratio test. Assuming that it is legalto differentiate a power series term-by-term where it converges (as we will later prove), we see that

dex

dx= 0 + 1 + x+

x2

2+x3

6+x4

24+ · · · = ex.

We also see that e0 = 1. It follows that y = f(x) = ex satisfies the first order ordinary differential equation

dy

dx= y and the initial condition y(0) = 1. (27)

The ODE and Initial Condition in Formula (27) gives another way to define y = ex since the solution to this 1st order ODEwith initial condition problem is unique. The good thing about our definition vs this one is that ours gives us a way tocompute the function. Maybe that doesn’t impress anyone with a calculator or a computer, but it used to make me very

happy. The series converges pretty fast. Try computing e as10∑n=0

1n! = 2.718 3. This gives the first 5 digits by summing 11

terms of the series. Pretty cool.Yes, I love this series. It even works for complex numbers z = x+ iy, where i =

√−1, for square matrices X, and forp−adic numbers so dearly loved by number theorists. Moreover there is a great generalization to Lie algebras, but we won’tgo there. The complex number version gives you a way to understand sinx and cosx without knowing trigonometry sinceeiθ = cos θ + i sin θ (Euler’s formula).Next let’s derive the basic facts about the exponential from the power series definition, making use of theorems about

power series that will be proved later in the course. But you saw them in the power series part of the usual calculus class.

Facts About the Exponential Function.Fact 1. Exp takes addition to multiplication.

ex+y = exey.

Proof. Multiply the power series and use the binomial theorem. This gives

ex+y =∞∑n=0

xn

n!

∞∑m=0

ym

m!=

∞∑k=0

∑m+n=km,n≥0

xn

n!

ym

m!.

Here we treat the double series as we would a double integral. We change variables fromm,n to n, k = m + n. It followsthat

ex+y =∞∑k=0

k∑n=0

xnyk−n

n!(k − n)! =∞∑k=0

1

k!

k∑n=0

k!

n!(k − n)!xnyk−n =

∞∑k=0

1

k!(x+ y)k,

by the binomial theorem.

74

Page 75: Introduction, Motivation, Basics About Sets, Functions, Counting

Fact 2. Exp never vanishes. ex �= 0 and e−x = 1ex . In fact, e

x > 0 for all real numbers x.Proof. By Fact 1, we know that exe−x = ex−x = e0 = 1. This implies that ex �= 0 and e−x = 1

ex . Since the power seriesdefining ex is positive when x ≥ 0 as all the terms are ≥ 0 and the first term is 1, we see that ex > 0 for all real numbersx. For x < 0, we know ex = 1

e−x , and the reciprocal of a positive number is positive.Fact 3. The exponential grows faster than any power of x as x goes to ∞. More precisely, we have

limx→∞

ex

xn=∞.

Proof. By our definition of ex for x > 0, ex is bounded below by any one term of the power series, e.g.,

ex >xn+1

(n+ 1)!.

Therefore, when x > 0, we have

ex

xn>

xn+1

(n+ 1)!

1

xn=

x

(n+ 1)!→∞, as x→∞, since n is fixed.

Now we can graph ex. We know that it is strictly increasing for all real x, since it is its own derivative and it is alwayspositive. The second derivative is the same and positive and thus the graph is convex up everywhere. The function ex goesto ∞ as x→∞. Since e−x = 1

ex , it follows that ex → 0 as x→ −∞.

Figure 50: the graph of y =ex

Fact 4. (ex)y= exy.

Proof. We postpone our proof until we have discussed log x.Fact 5. e is irrational.

Proof. This is a proof by contradiction. Suppose instead that e = ab , where a, b ∈ Z+. Let k ≥ b and look at the kth

partial sum of the power series for e. If we subtract the kth partial sum from e, we get

e−k∑

n=0

xn

n!=a

b−

k∑n=0

xn

n!=

∞∑n=k+1

xn

n!.

75

Page 76: Introduction, Motivation, Basics About Sets, Functions, Counting

Multiply this by k! to find

k!

(a

b−

k∑n=0

xn

n!

)= k!

∞∑n=k+1

xn

n!. (28)

Now the left-hand side of this last equation is an integer while (by the argument below) the right-hand side is in the interval(0, 1). This is a contradiction, proving that e is irrational.To see that the right-hand side of equation (28) is in the interval (0, 1), proceed as follows.

k!∞∑

n=k+1

xn

n!=

k!

(k + 1)!+

k!

(k + 2)!+

k!

(k + 3)!+ · · ·

=1

k + 1+

1

(k + 1)(k + 2)+

1

(k + 1)(k + 2)(k + 3)+ · · ·

<1

k + 1+

1

(k + 1)2+

1

(k + 1)3+ · · ·

This last sum can be evaluated using the formula for the geometric series∞∑n=0

xn = 11−x , if |x| < 1. This gives

∞∑n=1

1

(k + 1)n=

1

k + 1

1

1− 1k+1

=1

k.

It follows that the right-hand side of equation (28) is in the interval (0, 1) and we have our contradiction.

Lambert proved that e and π are irrational in 1761. Gelfond proved that eπ is irrational in 1929. See Hardy and Wright,Introduction to the Theory of Numbers, p. 46, for the proof of the irrationality of π. In fact, both e and π are transcendental;i.e., they are not roots of a polynomial with rational coefficients. Note that rational numbers r = a

b , with a, b ∈ Z, b �= 0,are roots of the first degree polynomial with rational coefficients x− r or, if you want integer coefficients bx− a.There is a whole branch of number theory devoted to such questions. In the 1970’s a French mathematician named

Apéry showed that the following value of the Riemann zeta function is irrational:

ζ(3) =∞∑n=1

1

n3.

Apéry was an unknown older French mathematician when he found his proof and so no one believed him at first. SeeVan Der Poorten, "A Proof that Euler Missed ..." in The Mathematical Intelligencer, 1 (1979), 195-203. Euler showed that

ζ(2) =∞∑n=1

1

n2=π2

6, ζ(4) =

∞∑n=1

1

n4=π4

90,

and similar formulas for ζ(2n), n = 3, 4, 5, 6, .... saying that ζ(2n) = rπ2n, where r is rational. See the last section ofLectures, II for a proof of the 1st formula using Fourier series.

23.2 Logarithm and the Power Function

Here we reverse the treatment of exp and log found in most calculus books. Note that exp : R → R+ = (0,∞) is 1-1 andonto. Why? It is 1-1 since it is its own derivative and it is positive, thus strictly increasing and its graph is convex up. It isonto since ex certainly approaches ∞ as x→∞. Moreover as x→ −∞, we see that ex = 1

e−x → 0. Since ex is continuous,by the intermediate value theorem it must be onto (0,∞). So ex satisfies the hypotheses of our theorem on inverse functions(Theorem 44).The moral of the preceding paragraph is that we can define an inverse function to y = ex.

Definition 46 We define the logarithm base e (natural logarithm) denoted log y by g(y) = log y = x iff y = ex.

76

Page 77: Introduction, Motivation, Basics About Sets, Functions, Counting

Note that most calculus books call this function ln y for natural logarithm to distinguish it from the logarithm base 10.We will have no need for base 10 logs and so we will probably hopefully never write ln y.

Properties of the LogarithmProperty 0) The Differential Equation.The function g(y) = log y maps (0,∞) 1-1 onto R. It satisfies the differential equation g′(y) = 1

y with g(1) = 0.

Proof. We use Theorem 44 which says that if y = f(x) = ex, then the derivative of the inverse function is obtained asfollows:

g′(y) =1

f ′(g(y))=

1

f(g(y))=1

y.

Since f(0) = 1, we know that g(1) = 0.The calculus book usually defines log y as:

log y =

y∫1

1

tdt.

This will make more sense after we have proved the Fundamental Theorem of Calculus (which will be discussed soon). Forthen we will know that the derivative of the integral as a function of its upper endpoint is just the integrand. And, of course,0∫0

1t dt = 0.

Property 1) log takes multiplication to addition; i.e.,

log u+ log v = log(uv).

Proof. To see this, use Fact 1 for ex. This saysex+y = exey. (29)

Let u = ex and v = ey. Then take logs of both sides of formula (29). That says

log(ex+y

)= log (exey) .

Since log erases exp, we havelog u+ log v = x+ y = log (exey) = log(uv).

Now we are ready to graph y = log x. Of course the graph is obtained by reflecting the graph of ex across the line y = x.Since dy

dx =1x > 0 if x > 0, we know that log x is increasing on (0,∞). And the second derivative d2y

dx2 = − 1x2 < 0, for x > 0.

This means that the graph is convex down. We also know that log 1 = 0 and log e = 1. Recall that e ∼= 2.7183. So we getFigure 51. We also know lim log y = −∞

y → 0y > 0

and lim log y =∞y→∞

from the corresponding facts about ex.

77

Page 78: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 51: graph of y = log x

78

Page 79: Introduction, Motivation, Basics About Sets, Functions, Counting

Now we define a function that includes most of the powers that we might want to think about for real variables.

Definition 47 The power function is defined as follows. Suppose x > 0 and a is any real number. Then definexa = exp(a log x).

Proposition 48 (Definitions of Power Functions Agree for Rational Powers.) If a = mn , for m,n ∈ Z, n > 0,

then xa from Definition 47 gives the same answer as our earlier definition in formula (25). As in Definition 47, we areassuming that x > 0.

Proof. Now our old definition said xmn =

(x1n

)m, for x > 0. First one should look at the case n = 1, when xm is defined to

be a product of x’s takenm times. Why is that the same as exp(m log x)? Note that log(x · · ·x) = log x+· · ·+log x = m log x.This proves the log of the desired formula. Since the log is 1-1, this proves the desired formula.Now look at the special case m = 1. By y = x

1n = n

√x, we mean the inverse function to x = yn. So we need to show

that with our new definition,(x1n

)n= x. Well, we have from properties of ex = expx,

(exp( 1n log x)

)n= exp

(n 1n log x

)=

exp log x = x. Note also that exp( 1n log x) is positive.Finally we leave it to the reader to do the general case as an exercise.Property 2 of Log. log(xa) = a log x.

Proof. By Definition 47 of xa and the fact that ex and log are inverse functions, we have:

log (xa) = log(exp(a log x)) = a log x.

Facts about Powers.Property 1 of Powers. xaxb = xa+b.

Proof. Take the log of both sides. On the left, you get

log(xaxb) = log xa + log xb = a log x+ b log x = (a+ b) log x.

On the right, you getlog xa+b = (a+ b) log x.

Since the 2 logs are equal, so are the original entities, because log is 1-1.Property 2 of Powers. (xa)b = xab.

Proof. Again take the log of both sides. On the left, you get

log((xa)

b)= b log (xa) = ba log x.

On the right, you getlog(xab)= ab log x.

Since the 2 logs are equal, so are the originals.Example 1. lim

h→0(1 + h)

1h = e.

Proof. Take logs. Set g(y) = log y. Recall that g′(y) = 1y and therefore

limh→0

g[(1 + h)

1h

]= lim

h→0

log(1 + h)

h= g′(1) =

1

1= 1.

So erasing g by exponentiating will compute the desired limit as f(x) = ex is continuous. Thus

f

(limh→0

g[(1 + h)

1h

])= lim

h→0

(f(g[(1 + h)

1h

]))= lim

h→0

[(1 + h)

1h

]= f(1) = e.

Example 2. limy→∞

log yy = 0.

Proof. Set y = ex. Note that y →∞ iff x→∞. The stated formula is therefore a result of the fact that log yy = x

ex → 0,as x→∞.

79

Page 80: Introduction, Motivation, Basics About Sets, Functions, Counting

23.3 Complex Numbers and Trigonometric Functions

Since −1 is not the square of a real number, to get a root of x2 + 1 = 0, we need to enlarge our world beyond the real line.Let i denote a creature whose square is −1. Thus i =

√−1 is not real, yes, it’s imaginary, but not really more imaginarythat most of mathematics. Of course the equation x2 + 1 = 0 also has the root −i. And it is really impossible to say whichof the 2 roots we are calling i. But worry not. We just call one of them i.Define the set C of complex numbers to be:

C = {z = x+ iy | x, y ∈ R}

If z = x+ iy, with x, y ∈ R, we say that x = Re z is the real part of z and y = Im z is the imaginary part of z.We can visualize the complex numbers z = x+ iy, with x, y ∈ R as points (x, y) in the plane. See Figure 52.

Figure 52: The complex number z = x+ iy, with x, y real can be pictured as a point in the plane with coordinates (x, y),or thought of as a vector from the origin to (x, y).

The set C forms a field; i.e., it satisfies the 9 field axioms stated in the earlier section on the field axioms for R, wherewe have defined sum and product of complex numbers as follows if z = x+ iy and w = u+ iv, with x, y, u, v ∈ R

z + w = (x+ u) + i(y + v) and zw = (x+ iy)(u+ iv) = xu− yv + i(xv + yu).

And if w �= 0, you can divide by w to getz

w=x+ iy

u+ iv=x+ iy

u+ iv

u− ivu− iv =

xu+ yv + i(yu− xv)u2 + v2

=xu+ yv

u2 + v2+ iyu− xvu2 + v2

.

80

Page 81: Introduction, Motivation, Basics About Sets, Functions, Counting

Here we just recalled an old trick from high school algebra, multiplying by 1 in the form of the conjugate u − iv of thedenominator over u− iv.To say that C forms a field means that you can add, subtract, multiply and divide by non-0 numbers with the usual laws

of associativity, distributivity, commutativity, etc. Note however that C is not an ordered field, just a field. The identityfor addition is the complex number 0 = 0 + i0. A complex number is 0 iff both its real and imaginary parts are 0.Next we define the complex absolute value.

Definition 49 Absolute value of a complex number z is |z| =√x2 + y2 if z = x+ iy, with x, y ∈ R.

Complex conjugate of z is c(z) = z = x− iy.

Then |z|2 = zz. Also c(z + w) = c(z) + c(w) and c(zw) = c(z)c(w). The map c(z) is a continuous function whichmaps C 1-1 onto C fixing elements of R.The properties of complex absolute value are the same as those for the real absolute value.Properties of Complex Absolute Value.Property 1) |z| ≥ 0, for all complex numbers z and |z| = 0 iff z = 0.Property 2) |zw| = |z| |w| .Property 3) triangle inequality |z + w| ≤ |z|+ |w| .We leave the proofs of these properties to the reader. We will prove these things later in these notes, when we consider

normed vector spaces. This time there really is a triangle in the triangle inequality. See Figure 53.

Figure 53: Since the complex absolute value |z| is the length of the vector from the origin to the point representing z in theplane and since the sum of 2 complex numbers z and w corresponds to the sum of the 2 vectors corresponding to z and w,the triangle inequality says the sum of the lengths of 2 sides of a triangle is greater than or equal to the length of the thirdside.

81

Page 82: Introduction, Motivation, Basics About Sets, Functions, Counting

We could redo all of the things about limits and derivatives for complex numbers, but we won’t. That’s another course.And we would fall asleep the 2nd time through. In particular series of complex numbers work just like series of real numbers.If z is a complex number, we can define the complex exponential as a complex power series:

exp(z) = ez =∞∑n=0

zn

n!.

It turns out that this series converges for all complex numbers z. Then we have

ez+w = ezew (30)

just as in the real case. You multiply complex power series just the same way you multiply real power series (carefully).And the binomial theorem is as true for complex numbers as for real numbers.So we see that

ex+iy = exeiy.

What does this mean? Let’s look at eiy :

eiy =∞∑n=0

(iy)n

n!= 1 + iy +

(iy)2

2+(iy)

3

6+(iy)

4

24+ · · ·

= 1 + iy − y2

2− iy

3

6+y4

24+ · · · .

=

{1− y

2

2+y4

24+ · · ·

}+ i

{y − y

3

6+y5

120+ · · ·

}= cos y + i sin y.

Here we are defining sinx and cosx by their Taylor series. You can deduce the standard trigonometric identities fromthe properties of the complex exponential like formula (30).One finds that

∣∣eiy∣∣ = 1, for y ∈ R. To see this note that ∣∣eiy∣∣2 = eiye−iy = eiy−iy = e0 = 1. Since∣∣eiy∣∣ must be

positive it follows that∣∣eiy∣∣ = 1 for all real y. This says sin2 y + cos2 y = 1, for y ∈ R.

See Figure 54.

82

Page 83: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 54: For an acute angle θ, cos θ is the x−coordinate of eiθ and sin θ is the y-coordinate. In particular, the complexabsolute value

∣∣eiθ∣∣ = 1.

83

Page 84: Introduction, Motivation, Basics About Sets, Functions, Counting

If we define sinx and cosx by their Taylor series, we have

f(x) = cosx =∞∑n=0

(−1)n x2n

(2n)!and g(x) = sinx =

∞∑n=0

(−1)n x2n+1

(2n+ 1)!. (31)

You can check (assuming it is legal to differentiate term-by-term) that sinx and cosx satisfy the following system of ODEswith initial conditions:

f ′(x) = −g(x) and g′(x) = f(x), f(0) = 1 and g(0) = 0. (32)

It is possible to define sin and cos to be solutions of the initial value problems in equation (32). Lang, UndergraduateAnalysis, defines the trig functions this way. I don’t find this to be very satisfying since it does not tell you how to computethem. Of course you might prefer the geometric definition seen in Figure 54 for an acute angle x measured in radians. Alldefinitions are the same once you show that whatever your definition of sin and cos it satisfies formula (32). For you learnin differential equations that solutions to such things are unique.I prefer to deduce all the standard trig identities from our Taylor series for sin and cos, eiy = cos y+i sin y, ez+w = ezew.

It is a nice exercise. For example, you can easily see from the fact that all powers in the Taylor series for cosx are even,that cos(−x) = cosx.To figure out the addition formula for cosx, look at

eiu+iv = eiueiv = (cosu+ i sinu) (cos v + i sin v) = cosu cos v − sinu sin v + i(cosu sin v + sinu cos v).

Thereforecos(u+ v) = cosu cos v − sinu sin v.

I learned this treatment of elementary functions as by power series an undergrad reading P. Dienes, The Taylor Series,Chapter 4. I really like this approach since it allows you to compute the functions and minimizes the memorization of trigidentities.Of course we still have the problem of defining π. Lang defines π by saying that π2 is the first positive zero of cosx. It

takes a bit of effort to see that such a zero exists if you are not allowed to look at Figure 54.Then Lang goes on to show that sin and cos are periodic of period 2π. This is certainly not obvious from the Taylor

series in formula (31). Any finite combination via sum, product, composite, difference, quotient, inverse of the functionsseen up to now (polynomials, powers, exponentials, sines, cosine) are called elementary functions. Of course they mightbe very complicated such as

eπx − 2001 log√x− 500x 2

3 +√x.

These are the functions considered in calculus.There are other favorite functions with many applications; e.g., in statistics, physics,... A good reference for many such

functions is N.N. Lebedev, Special Functions. Mathematica and Matlab know many of these functions: the error function,Bessel functions, Airy integrals, ....We have already seen the gamma function in the introduction. The incomplete gamma function is another favorite, as

are Legendre polynomials, also known as spherical harmonics because they arise in problems with spherical symmetry, suchas the problem of understanding the vibrations of the earth after a large earthquake, the solution of the Schrödinger equationfor the hydrogen atom, or the study of the sun’s magnetic field.Not every function that you can think of is expressible by its Taylor series. One example is the infinitely differentiable

function defined by

f(x) =

{exp(−1x2 ), x �= 0

0, x = 0

One can show that all the derivatives f (n)(0) = 0, for n = 0, 1, 2, 3, .... This implies that the Taylor series of this function atthe origin is identically 0. We will say more about that later when we discuss Taylor series. Similar functions can be usedto glue 0 to 1 in an infinitely differentiable way (C∞ - glue).

84

Page 85: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 55: The function f(x) =

{exp(−1x2 ), x �= 0

0, x = 0is an infinitely differentiable function not represented by its Taylor

series at 0.

85

Page 86: Introduction, Motivation, Basics About Sets, Functions, Counting

Part VII

Basic Properties of Integrals24 Axioms for Integrals and the Fundamental Theorem of Calculus

In this section we prove the basic properties of the Riemann integral of a continuous function on a finite interval from 2 basicaxioms. We will not show that an integral satisfying these 2 axioms exists until much later, using a method which mayseem different from that of limits of Riemann sums that you saw in calculus. It is an intermediate method between thatusually used for Riemann integrals and that used for Lebesgue integrals. Both kinds of integrals give the same answer forcontinuous functions on finite intervals, or even piecewise continuous functions on finite intervals (i.e., functions with a finitenumber of jump discontinuities; that is, points where the right- and left- hand limits exist but are different). An example isseen in Figure 33.Both Riemann and Lebesgue integrals were invented to discuss Fourier analysis. Yes, the Fourier coefficients in equation

(2) are integrals. Both ideas of integrals start with area. The integral of a constant function f(x) = c, for all x in [a, b]will be c(b− a) which is the area under the curve. See Figure 56.

Figure 56: The integral of the constant function f(x) = 11 from 2 to 14 is 11 ∗ 12 = 132.

But, of course, there are many more reasons we need integrals; for example, to compute probabilities. Many tests from

statistics come from computing 1√2π

∞∫x

e−t2

2 dt.

Anyway, the amazing thing is that we can do most of the stuff you learned in calculus about integrals from 2 measly

86

Page 87: Introduction, Motivation, Basics About Sets, Functions, Counting

axioms. We will write definite integrals as Idc f or

d∫c

f rather than

d∫c

f(x)dx to emphasize that the definite integral is a

real number depending on the function f, the interval [c, d] and nothing else. The variable x is a "dummy variable." Itis there to help us evaluate integrals; for example, when we want to make a substitution. But the definite integral is not afunction of the variable x.

The 2 Simple Axioms for IntegralsSuppose that f : [a, b]→ R is continuous and a < b are finite numbers. We write for a < c < d < b

Idc f =

d∫c

f =

d∫c

f(x)dx.

Axiom I. Suppose that m ≤ f(x) ≤M, for all x ∈ [a, b]. Thenm(d− c) ≤ Idc f ≤M(d− c).

See Figure 57

Figure 57: Illustration of Axiom 1 for integrals showing that when the function is between the positive numbers m and M,then you expect the integral over [a, b] to be between the area of the large rectangle which is M(b− a) and that of the smallrectangle which is m(b− a).

Axiom II. Suppose c < r < d. ThenIdc f = I

rc f + I

dr f.

87

Page 88: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 58: Illustration of Axiom 2 showing that the integral of f on [a, b] should equal the sum of the integrals of f on [a, r]and on [r, b] if a < r < b.

88

Page 89: Introduction, Motivation, Basics About Sets, Functions, Counting

It follows from Axiom I that Iccf = 0. If c > d, define Idc f = −Icdf.

Later we will construct the integral starting with integrals of step maps or piecewise constant functions on the finiteinterval [a, b]. A step map looks like f(x) = wi, ai−1 < x < ai, for i = 1, ..., n. Here a = a0 < a1 < a2 < · · · < an = b.Then we define

Ibaf =

b∫a

f =

n∑i=1

wi(ai − ai−1).

See Figure 59.

Figure 59: graph of a step function with values wj on the jth subinterval of the partition of [a, b]

Integrals are extended to continuous functions and others by taking limits of Cauchy sequences of step functions usingthe absolute value norm

‖f‖1 =b∫a

|f(x)| dx.

But we have to talk about normed vector spaces first.For the rest of this part of the lectures we assume that there exists an integral satisfying Axioms I and II. These 2 puny

axioms are enough to be able to prove the Fundamental Theorem of Calculus, that famous theorem that Newton and Leibnizfought to own.

89

Page 90: Introduction, Motivation, Basics About Sets, Functions, Counting

Theorem 50 Fundamental Theorem of Calculus. Suppose that we have an integral satisfying our 2 axioms able tointegrate all continuous functions on finite intervals. Then if f : [a, b]→ R is continuous with −∞ < a < b <∞, we have

d

dx

x∫a

f =d

dx

x∫a

f(t)dt = f(x) for all x ∈ [a, b].

Proof. Case 1. The Right-Hand Derivative.For the right-hand derivative, we need to investigate the difference quotient below with h > 0. We use Axiom II to see:

x+h∫a

f −x∫a

f

h=

x+h∫x

f

h.

Here we use the fact that h > 0 and thus a < x < x+ h < b when h is near enough to 0.Let f(u) = min{f(t) | x ≤ t ≤ x+ h} and f(v) = max{f(t) | x ≤ t ≤ x+ h}. We know that u, v ∈ [x, x+ h] exist by

the Weierstrass theorem on the existence of maxima and minima for continuous functions on closed finite intervals. So nowby Axiom I, we have

f(u) =f(u)h

h≤

x+h∫x

f

h≤ f(v)h

h= f(v).

Now let h approach 0 from above. The quantity in the middle approaches the right-hand derivative of

x∫a

f as a function

of x, holding a fixed. The quantities on the outside of the inequalities approach f(x) by the continuity of f. So by theSqueeze Lemma, the right-hand derivative is f(x).Case 2. The Left-Hand Derivative.We leave it to the reader as an exercise to show that the left-hand derivative is also f(x). This is a little tricky because

the denominator in the difference quotient is now negative and a < x+ h < x < b for h < 0. One has, again using Axiom 2,

x+h∫a

f −x∫a

f

h=

−x∫

x+h

f

h=

x∫x+h

f

−h .

Now use the same argument as in Case 1 to see that the left-hand derivative is f(x), to complete the proof.

There was a huge quarrel over who invented calculus and "proved" the fundamental theorem - Newton or Leibniz.The effects may still persist. According to Wiener in 1949, " it became an act of faith and of patriotic loyalty for Britishmathematicians to use the less flexible Newtonian notation and to affect to look down on the new work done by the Leibnizianschool on the Continent .... When the great continental school of the Bernoullis and Euler arose (not to mention Lagrangeand Laplace who came later) there were no men (sic) of comparable calibre north of the Channel to compete with them ...."See Edna E. Kramer, The Nature and Growth of Mathematics, p. 172.

Anyway it is relatively simple now to prove all the basic rules of calculus that you know and love, using our 2 axiomsand the fundamental theorem. No more memorizing formulas! Next we show the basic fact which allows the evaluation ofintegrals by finding antiderivatives.

Corollary 51 The integral is unique. Assume that f : [a, b]→ R is continuous with −∞ < a < b <∞. If F is differentiableon (a, b) with F ′ = f, then for x ∈ (a, b), we have

x∫a

f(t)dt =

x∫a

f = F (x)− F (a). (33)

90

Page 91: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. Well, by the fundamental theorem, we know that the derivative with respect to x of the left-hand side of formula(33) is f(x). By Corollary 41 of the mean-value theorem, formula (33) is true up to a constant. That is, we have a constantC such that

x∫a

f = F (x) + C.

To determine C, set x = a. We saw earlier that

a∫a

f = 0. Thus 0 = F (a) + C, which implies C = −F (a), proving our

Corollary.The Fundamental Theorem of Calculus says that the derivative erases the integral:

d

dx

x∫a

f = f(x).

The next Corollary essentially says the integral erases the derivative. It is just a restatement of the preceding Corollary.

Corollary 52 With the same hypotheses as the preceding Corollary, we have

x∫a

d

dtF (t) dt = F (x)− F (a).

As a result of the fundamental theorem and the preceding Corollary we see that the integral and the derivative are

essentially inverse functions on functions. Thus the indefinite integral is called an antiderivative and written∫f. It is only

determined up to a constant by Corollary 41 of the mean value theorem.Examples.Example 1. Constant Function.Suppose f(x) = c = constant, for all x ∈ [a, b]. Then by Axiom I we have

c(b− a) ≤b∫a

f ≤ c(b− a).

It follows thatb∫a

c = c(b− a). (34)

If c > 0, this integral is the area of the rectangle bounded by y = c, the x-axis, x = a, and x = b.If c < 0, the integral is the negative of the area of this rectangle. We could also prove formula (34) from the preceding

Corollaries. For the derivative of F (x) = cx is c and thus

b∫a

c = F (b)− F (a) = cb− ca = c(b− a).

Example 2. Standard Examples from Calculus.By the preceding Corollaries any derivative formula gives an integral formula when read backwards. Any table of

derivatives is a table of integrals. So for example, we have the following formulas, assuming that 0 < a < b for the middleformula:

b∫a

ex dx = eb − ea,b∫a

1

ydy = log b− log a,

b∫a

cosx dx = sin b− sin a.

91

Page 92: Introduction, Motivation, Basics About Sets, Functions, Counting

25 Rules for Integrals

Most of the following rules for integrals come from some property of derivatives and the fundamental theorem of calculus.Rules for Integrals. Suppose that f and g are continuous on [a, b].Rule 1) Linearity. If α, β ∈ R, then

b∫a

(αf + βg) = α

b∫a

f + β

b∫a

g.

Rule 2) Integrals Preserve ≤ . Suppose that f(x) ≤ g(x) for all x ∈ [a, b]. Thenb∫a

f ≤b∫a

g.

It follows that ∣∣∣∣∣∣b∫a

f

∣∣∣∣∣∣ ≤b∫a

|f | .

Rule 3) Positivity. Suppose that f(x) ≥ 0 for all x ∈ [a, b]. If there is a point c ∈ [a, b] such that f(c) > 0, thenb∫a

f > 0.

Rule 4) Substitution.Suppose g′(x) is continuous on [a, b]. Then, if f is continuous on g[a, b],

g(b)∫g(a)

f(u)du =

b∫a

f(g(x))g′(x)dx.

Rule 5) Integration by Parts.Suppose that f ′ and g′ are continuous on [a, b]. Then

b∫a

fg′ = f(b)g(b)− f(a)g(a)−b∫a

f ′g.

Proof. Rules 1), 4), 5) come from corollary 51 and corresponding rules for derivatives. So let’s do these first.Rule 1). We want to show that for all x ∈ [a, b] :

x∫a

(αf + βg) = α

x∫a

f + β

x∫a

g.

Take derivatives of the function of x on the left using the fundamental theorem. You get (αf + βg) (x). But the derivativeof the function of x on the right is also (αf + βg) (x), using the linearity of derivatives.So now we know that the left hand function of x has the same derivative as the right hand function of x. But then

corollary 41 of the mean value theorem says that the two functions differ by a constant K; i.e.,

x∫a

(αf + βg) = α

x∫a

f + β

x∫a

g +K.

92

Page 93: Introduction, Motivation, Basics About Sets, Functions, Counting

What is K? Set x = a and use the fact that

a∫a

h = 0. This says K must be 0 and Rule 1 is proved.

Rule 4). We want to prove that for all x ∈ [a, b]:g(x)∫g(a)

f(u)du =

x∫a

f(g(t))g′(t)dt.

Again we differentiate the left hand function of x, using the fundamental theorem of calculus and the chain rule, obtainingf(g(x))g′(x). When we differentiate the right hand side as a function of x, we get the same answer using just thefundamental theorem. Again Corollary 41 of the mean value theorem tells us that the left and right hand side must differby a constant. Plug in x = a to see that the constant must be 0.Rule 5). Prove this as an exercise imitating the proof of Rule 1 using the rule for the derivative of a product.

Rule 2). We want to show assuming a < b and f(x) ≤ g(x) for all x ∈ [a, b] thatb∫a

f ≤b∫a

g. To do this, look at

h = g−f . Then h is ≥ 0 on [a, b] and Axiom 1 for integrals says

b∫a

h ≥ 0(b−a) = 0. By the linearity property just proved,

then

b∫a

g −b∫a

f =

b∫a

h ≥ 0. This implies the result.

Rule 3). We want to prove that

b∫a

f > 0. For this, you should draw a picture. See Figure 60. By the continuity of f at

c, if we’re given ε = f(c)/2, then there is a δ so that |x− c| < δ implies |f(x)− f(c)| < f(c)/2. This means that−f(c)2

< f(x)− f(c) < f(c)

2.

Add f(c) to this and get, for x ∈ (c− δ, c+ δ),

0 <f(c)

2< f(x) <

3f(c)

2.

It follows using Axiom 2 for integrals and the fact that the integral preserves ≤b∫a

f =

c−δ∫a

f +

c+δ∫c−δ

f +

b∫c+δ

f ≥c+δ∫c−δ

f(c)

2=f(c)

2(2δ) > 0.

Here we have assumed a < c < b. If a = c or c = b, the result still works. We leave this to you to prove as an exercise.

26 Questions and History

A Question to Ponder. Can you find a formula for any indefinite integral?

For example, consider

x∫0

dt

(1−t2) 14and

x∫0

e−t2

dt.

Answer.Joseph Liouville (1809-1882) proved that such integrals as

Erf(x) =2√π

x∫0

e−t2

dt = the error function

93

Page 94: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 60: Picture proof that a non-negative continuous function f(x) on an interval [a, b] such that f(c) > 0 at some pointc must be positive in a subinterval (c − δ, c + δ). Here take ε ≤ f(c)/2 and the corresponding δ with the graph of y = f(x)above the interval (c− δ, c+ δ) in the pink box.

and

F (x|m) =x∫0

(1−m sin2 t)− 12 dt = the elliptic integral of the 1st kind

cannot be expressed in terms of a finite number of elementary functions. A reference is: D.G. Mead, American Math.Monthly, 68 (1961), 152-156. The error function comes up in statistics and in the solution of various partial differentialequations. The elliptic integrals come up in various sorts of applied math. problems as well as in number theory.Mathematica, Matlab, Maple, etc. attempt to do all "doable" integrals. But of course these programs have to recognize

that some integrals simply are not doable. See Wolfram’s book to accompany Mathematica for some discussion of thisproblem.Leibniz search for closed form expressions of integrals. J. Stillwell, Math. and its History, p. 110, says: "The search for

closed forms was a wild goose chase but, like many efforts to solve intractable problems, it led to worthwhile results in otherdirections. Attempts to integrate rational functions raised the problem of factorization of polynomials and led ultimatelyto the fundamental theorem of algebra .... Attempts to integrate (1 − x4)− 1

2 led to the theory of elliptic functions.... theproblem of deciding which algebraic functions may be integrated in closed form has been solved only recently, though not ina form suitable for calculus textbooks, which continue to remain oblivious to most of the developments since Leibniz." SeeJ. H. Davenport, On the Integration of Algebraic Functions, Lecture Notes in Computer Science, Springer-Verlag, 1981.Unlike Leibniz, Newton evaluated integrals by expanding functions in power series and integrating the power series term-

by-term. Isaac Newton offered his works on calculus to the British Royal Society and Cambridge University Press but theworks were rejected. Thus Leibniz managed to publish the first paper on calculus. As Stillwell notes, loc. cit., p. 109:"This led to Leibniz’s initially receiving credit for the calculus and later to a bitter dispute with Newton and his followersover the question of priority for the discovery." Stillwell finds (p. 110): "One thing has changed: it is now much easier topublish a calculus book than it was for Newton."

94

Page 95: Introduction, Motivation, Basics About Sets, Functions, Counting

27 Taylor’s Formula with Remainder

Next we consider the Taylor formula which is a finite version of Taylor series. In 1715 Brook Taylor (1685-1731) was the firstperson to publish the formula.

Theorem 53 Taylor’s Formula with Remainder.Suppose that J is an interval and f : J → R has n continuous derivatives on the interval. If a, x ∈ J, we have

f(x) =n−1∑k=0

f (k)(a)

k!(x− a)k +Rn,

where the remainder is

Rn =1

(n− 1)!

x∫a

f (n)(t)(x− t)n−1dt.

Proof. Induction on n.Start with the fundamental theorem of calculus. This says

f(x)− f(a) =x∫a

f ′(t)dt,

which is the case n = 1 of Taylor’s formula.For the induction step, assume the nth formula and deduce the (n+1)st. Start with Rn, the nth formula for the remainder,

and use integration by parts to find

Rn =f (n)(a)

n!(x− a)n +Rn+1.

Let’s do the details. Integration by parts says∫udv = uv− ∫ vdu. So let u = f (n)(t) and dv = (x− t)n−1dt to obtain

du = f (n+1)(t)dt and v = −1n (x− t)n and thus:

Rn =1

(n− 1)!

x∫a

f (n)(t)(x− t)n−1dt

=1

(n− 1)!

⎧⎨⎩ −1n (x− t)nf (n)(t)∣∣∣∣xa

+1

n

x∫a

(x− t)nf (n+1)(t)dt⎫⎬⎭

=1

n!f (n)(a)(x− a)n +Rn+1.

This completes the proof of the induction step.

It is much easier to remember the alternate formula for the remainder. It looks much like the next term in the Taylorseries.

Corollary 54 Taylor’s Formula with Alternate Remainder. Under the same hypotheses as for Taylor’s formula, wecan find a point c between a and x such that the remainder in Taylor’s formula has the form:

Rn =1

n!f (n)(c)(x− a)n.

Proof. Case 1. a < x.We are assuming that f (n) is continuous on the interval [a, x]. So by the Weierstrass theorem on maxima and minima

we have constants m and M such that

m ≤ f (n)(x) ≤M for all x ∈ [a, b].

95

Page 96: Introduction, Motivation, Basics About Sets, Functions, Counting

In this case, since integrals (in the right direction) preserve inequalities, we have

m

n!(x− a)n ≤ m

(n− 1)!

x∫a

(x− t)n−1dt (35)

≤ Rn =1

(n− 1)!

x∫a

f (n)(t)(x− t)n−1dt (36)

≤ M

(n− 1)!

x∫a

(x− t)n−1dt = M

n!(x− a)n. (37)

Therefore

m ≤ Rn n!

(x− a)n ≤M. (38)

By the intermediate value theorem for f (n), we know there is a point c in [a, x] such that

Rn n!

(x− a)n = f(n)(c).

Solve for Rn to obtain the desired formula.Case 2. x < a.

We leave the details of this case to the reader as an exercise. Note that

x∫a

= −a∫x

and that for n even, −(x−t)n−1 ≥ 0

for t ∈ [x, a]. The inequality is reversed if n is odd. You have to take account of that to get formula (35). But in the lattercase another sign switch will occur when you derive formula (38). The joy of inequalities.

28 Examples of Taylor’s Formula

Example 1. The Taylor Formula Remainder need not always approach 0 as n→∞.Define

f(x) =

{e−1/x, x > 00, x ≤ 0.

This function has the property that f (n)(x) exists for all values of x. Moreover it can be shown that f (n)(0) = 0 for alln = 0, 1, 2, 3, .... See the exercises. So the Taylor formula says

f(x) =n−1∑k=0

f (k)(a)

k!(x− a)k +Rn

= 0 +Rn.

Since f(x) = e−1/x > 0 if x > 0, we see that if we let n → ∞, the Taylor series does not represent the function forx > 0. Thus, for positive x, the remainder Rn cannot approach 0 as n→∞.We say that such a function is "not analytic". Weierstrass defined analytic functions around 1870 and found many

applications. His student Sonya Kovalevsky (alias Sofia Kovalevskaya) wrote a thesis about power series (in 2 or morevariables) solutions of partial differential equations.

Example 2. The Exponential.

ex =∞∑n=0

xn

n!

96

Page 97: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 61: The function f(x) ={e−1/x, x > 00, x ≤ 0.

The ratio test (which we consider later) shows that the remainder (2nd form) goes to 0 as n→∞Stirling’s formula for n! would make this even clearer as it says

n! ∼√2πnnne−n, as n→∞, meaning that lim

n→∞n!√

2πnnne−n= 1.

Read "∼" as "is asymptotic to." A proof of Stirling’s formula can be found in Lang, Undergraduate Analysis.

As we noted earlier eix = cosx + i sinx, where i =√−1. So you can get the Taylor series for cosx and sinx out of

that for eix.

Example 3. Geometric Series.

1

1− x =∞∑n=0

xn, for |x| < 1.

Example 4. The Logarithm.

− log(1− x) =∞∑n=1

xn

n, for |x| < 1.

Example 5. The Binomial Series.

(1 + x)s =∞∑n=0

(s

n

)xn, for |x| < 1.

Here the generalized binomial coefficient is(s

n

)=s(s− 1) · · · (s− n+ 1)n(n− 1) · · · 2 · 1 =

Γ(s+ n)

Γ(s)Γ(n+ 1),

where Γ(s) is the gamma function from the introduction.You can plug an integer value of s into the binomial series and obtain the binomial theorem. In that case when n becomes

bigger than s the coefficient is 0. When n is negative Γ(n + 1) is infinite and you have to use the first formula for thebinomial coefficient. For example, when s = −1, you get the geometric series with x replaced by −x. When s = 1

2 , you get√1 + x = exp

{12 log (1 + x)

}.

97

Page 98: Introduction, Motivation, Basics About Sets, Functions, Counting

Example 6. Legendre Polynomials (Spherical Harmonics).The Legendre polynomials can be defined by

Pn(x) =1

2nn!

dn

dxn(x2 − 1)n .

This is called the Rodrigues formula for the Legendre polynomials. The first few are:

P0(x) = 1, P1(x) = x, P2(x) =1

2

(3x2 − 1) , P3(x) = 1

2

(5x3 − 3x) .

Mathematica knows them and will graph them quickly. The Mathematica command Plot[Table[LegendreP[i, x],{i, 30}], {x, -1, 1}] produced the following figure.

Figure 62: Graph of Legendre polynomials Pn for n = 0, ..., 30.

There is a way of putting all the Legendre polynomials together in what is called a generating function (beloved ofcombinatorists):

(1− 2xt+ t2)− 12 =

∞∑n=0

Pn(x)tn, for |t| < 1 and |x| ≤ 1.

It is possible to use the generating function to derive properties of the Legendre polynomials quickly. For example, you cansee that Pn(1) = 1.References for Legendre polynomials include: N.N. Lebedev, Special Functions, S. Wolfram, Mathematica, R. Courant

and D. Hilbert, Methods of Mathematical Physics, Vol. I.The Legendre polynomials arise in problems of applied math. which have spherical symmetry. They form a system of

orthogonal polynomials on the interval [−1, 1] with the inner product

< f, g >=

1∫−1f(t)g(t)dt.

The Legendre polynomials are pairwise orthogonal for this inner product; i.e., < Pn, Pm >= 0 if n �= m. It is possible toexpand arbitrary continuous functions on [−1, 1] is series of Legendre polynomials (a generalized Fourier series) convergentin the sense of the norm coming from this inner product ‖f‖ = √< f, f >. We will say much more about such things in the

98

Page 99: Introduction, Motivation, Basics About Sets, Functions, Counting

last section of Lectures, II. The Legendre polynomials are important for obtaining explicit solutions of partial differentialequations with spherical symmetry; e.g., Schrödinger’s equation for the hydrogen atom.

Stillwell, Mathematics and its History, p. 107 says: "It is misleading ... to describe Newton as a founder of calculusunless one understands calculus, as he did, as an algebra of infinite series. In this calculus, differentiation and integrationare carried out term by term on powers of x and hence are comparatively trivial."Newton studied the binomial series and used it to obtain the power series for arcsine in 1669. Mercator found the power

series for log(1− x) in 1668.There is also

arctanx =∞∑n=0

(−1)n+1 x2n+1

2n+ 1, if |x| < 1.

By a theorem of Abel, this converges for x = 1 and gives the Leibniz formula:

π

4= 1− 1

3+1

5− 17+ · · · .

Of course there are many applications of power series. For example, in the theory of perturbations, small deviations fromthe state of a physical system, one plugs a power series into the differential equation describing the system and computes asmany coefficients of the solution as possible, in order to deduce information about the system.

29 A Look at the History of the Integral

References: G. Temple, 100 Years of Mathematics; J. Korevaar, Mathematical Methods.In 1854 Riemann began studying integrals in order to understand Fourier series of periodic functions. He based his

theory on Riemann sums S(f, P ) which approach the Riemann integral

b∫a

f(x)dx as you refine the partition P = {a =

x0 < x1 < · · · < xn = b} of the interval [a, b] on the x - axis.. Here, for sk ∈ [xk, xk+1), the Riemann sum is

S(f, P ) =

n∑k=1

f(sk)(xk − xk−1).

"Refining the partition" means that the lengths of all subintervals [xk, xk+1) get smaller. Assume that f is continuous orpiecewise continuous on [a, b]. You may view this as approximating the function f(x) by the step function with value f(sk)on the kth subinterval [xk, xk+1). See Figure 63.

In 1875 Darboux modified the Riemann sums by taking f(sk) to be either the max or min of f(x) on the kth subinterval.He thus got upper and lower sums. For a function f(x) to be Riemann integrable one wants the upper and lower sums toapproach the same limit as the partition P is refined. The common limit of the upper and lower Darboux sums is called the

Riemann integral

b∫a

f(x)dx. This is fine for piecewise continuous functions.

Our Axioms 1 and 2 for the integral actually come out of upper and lower Darboux sums. We will not go this routehowever. Instead we will prove the existence of the integral for continuous functions (or piecewise continuous functions) bya method closer to that of Lebesgue (1902). Lebesgue’s methods led to a better theory of Fourier series and to the moderntheory of probability measures.If you are only interested in the calculation of the sort of integrals that arise in calculus, then it does not matter what

definition of integral you use. For a continuous function on a closed finite interval or a piecewise continuous function (i.e., afunction that is mostly continuous but has a finite number of jump discontinuities) the Lebesgue integral equals the Riemannintegral. However, the Lebesgue integral will integrate more functions than the Riemann integral. For a positive function on

99

Page 100: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 63: The Riemann sum is obtained by summing terms f(sk)(xk − xk−1) over k = 1, ..., 7. Since our function isnon-negative, this means we add up the areas of the rectangles with dotted sides, green tops and bases on the x-axis.

100

Page 101: Introduction, Motivation, Basics About Sets, Functions, Counting

an infinite interval (or a positive unbounded function) the Lebesgue integral is the same as the improper Riemann integral,when the latter exists.For theoretical work, the Lebesgue integral is much nicer than the Riemann integral. For example, the interchange of

integral and limit is legal in many more situations for the Lebesgue integral. The hypotheses of the Lebesgue dominatedconvergence theorem are quite weak in comparison to those that we will discuss later for the Riemann integral. Similarlythe story of repeated integrals is much simpler for Lebesgue integrals (see the statement of Fubini’s theorem).There are at least 2 ways to approach the Lebesgue integral.Lebesgue’s Way. Define the Lebesgue measure μ(S) of a (measurable) subset S ⊂ R where μ[a, b] = b−a. Suppose the

bounded function f has range contained in the interval [c, d] on the y-axis. Then partition the interval on the y-axis ratherthan the x-axis by partition Q = {c = y0 < y1 < · · · < yn = d}. Thus obtain step functions approximating f involving setsSk = f

−1[yk−1, yk]. Assume the function is such that all these sets are Lebesgue measurable. Let γk ∈ [yk−1, yk]. Thenthe step function φ(x) is defined to have the constant value γk on Sk, for k = 1, ..., n. The Lebesgue integral of stepfunction ψ(x) is defined to be

n∑k=1

γkμ(Sk).

See Figure 64.

Figure 64: To get a Lebesgue integral create a step function by partitioning the y-axis. Then define the step function φ tobe constant on the inverse image of the intervals on the y-axis. For example, φ(x) = γ1 for all x ∈ f−1[y0, y1] = S1.

The New-Fangled Way. Use Cauchy sequences of step functions with respect to the L1-norm on the space of nice

101

Page 102: Introduction, Motivation, Basics About Sets, Functions, Counting

functions on [a, b]

‖f‖1 =b∫a

|f(x)| dx.

In order to do this, we must think about normed vector spaces (infinite dimensional) of functions. The method is the sameas that used to create the real numbers as limits of Cauchy sequences of rational numbers.

The Lebesgue measure μ of a set S of real numbers starts out with the idea that an interval I = [a, b] or (a, b) or[a, b) or (a, b] has measure μ(I) = b − a. Measurable sets are created by taking countable unions, complements, countableintersections. Then the measure is assumed to have certain properties on measurable sets. In particular, the Lebesgueshould be non-negative and countably additive meaning that if one has a infinite sequence Sn of pairwise disjoint (n �= m=⇒ Sn ∩ Sm = ∅) measurable sets then

μ

( ∞⋃n=1

Sn

)=

∞∑n=1

μ(Sn).

We will not say more about the Lebesgue integral or measure here.

102

Page 103: Introduction, Motivation, Basics About Sets, Functions, Counting

1

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

Exercises 1 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

Draw a picture to illustrate each problem, if possible. 1) Let A,B,C be sets. a) Show that ( ) ( ).A B C A B C∩ ∩ = ∩ ∩ b) Show that if ∅ denotes the empty set, then A∪∅=A. c) Let A⊂B and B⊂C. Show that if we denote the complement of B in C by Bc ={x∈C|x∉B} and the complement of A in C by Ac, then Bc⊂Ac. d) Show that (A∩B)c=Ac∪Bc. 2) Define the function L:[0,1]→[0,1] by L(x)=½x(1-x). a) Show that L is not 1-1. b) Show that L is not onto. c) Compute ( ) ( ) ( ( )).L L x L L x=�

3) Suppose that a function f:A→B is 1-1 and onto. a) Define the inverse function f-1:B→A. b) Show that f=(f-1)-1 . c) Look at the example with A=B=[0,1] and f(x)=x2. Show that this function is 1-1 and onto. What is the inverse function f-1 ? 4) Suppose A is a finite set and |A| = the number of elements in A. a) Show that if f:A→B is 1-1, then |A|≤|B|. b) Show that if f:A→B is onto, then |B|≤|A|. 5) Use mathematical induction to prove that if ! ( 1) 2 1n n n= ⋅ − ⋅ ⋅ ⋅ ⋅ , then 2n < n! , for all n=4,5,6,7,....

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

Exercises 2 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

1) a) Explain why the set � of rational numbers is denumerable.

b) Suppose that g:S→T is 1-1 and onto. Show that S is denumerable if and only if T is denumerable. 2) Which of the following sets is denumerable: { }2 1,2,3,.....n n = or the interval [-2,1] ?

Explain your answer. 3) a) Suppose that a is a real number and n,m are positive integers. Prove that .n m n ma a a+ = ⋅ b) Prove the formula in part a) if both n and m are negative integers.

c) Again suppose that a is real and n,m are positive integers. Show that ( ) .mn n ma a= �

4) a) Suppose that a is a positive real number. Show that there is a positive integer n such that 1 .xn

<

b) Show that 7 is irrational. 5) a) Use the 2 order axioms ORD1 and ORD2 listed on p. 22 of the lectures plus the definition of a < b to deduce that x < y implies x+z < y+z (which is Fact 3 in our list of facts about order). b) Using the properties of inequalities (see our list of facts about order) and the definition of

absolute value, find the set of real numbers x such that 2 11 .4

x − < Draw a picture of the set.

6) Prove that ||x|-|y||≤|x-y| for all real numbers x and y.

Exercises I

Page 104: Introduction, Motivation, Basics About Sets, Functions, Counting

2

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃

Exercises 3 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃ 1) Tell whether the sequences below have a limit. Find the limit, if possible. Prove that your answer is correct. a) 1+(-1)n b) n!/nn c) (1-n)/(1+n) 2) Assume that {xn} is a sequence of real numbers. Show that if lim nn

a x→∞

= exists, then the set

{x1,x2,x3, .... } is bounded (both above and below). 3) Assume that {xn} and {yn} are sequences of real numbers. Show that if lim nn

a x→∞

= exists and

lim nnb y

→∞= exists, then lim n nn

ab x y→∞

= exists.

4) True-False. State whether the following are true or false. Give a brief reason for your answer (such as a reference to some fact in these notes or a textbook). a) Every bounded sequence of real numbers is convergent to a real number. b) Every bounded sequence of real numbers is Cauchy. c) Suppose that an ≤ bn ≤ cn for n≥n0. If the limit lim limn nn n

L a c→∞ →∞

= = exists and is the same for

both {an} and {cn} , then {bn} has a limit which is also the same; i.e. lim nnL b

→∞= .

5) a) Define lim nna

→∞∞ = .

b) Suppose that {an} is an unbounded sequence of non-negative real numbers. Show that {an} has a subsequence converging to ∞ according to your definition in part a). ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃

Exercises 4 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃ 1) Find the following limits. Then explain your answer using the εδ definition of limit. This means given ε, find δ as a function of ε. a) (5 1)lim

0x

x−

→ b) 2lim

1x

x→ − c) lim

00

xxx

→>

.

2) True-False. Tell whether the following statements are true or false. Give a brief reason for your answer.

a) ( )lim f x Lx a

=→

implies ( )lim0f a h L

h+ =

→.

b) Suppose f:[a,∞)→�. Then ( )lim f x Lx ax a

=→>

implies L = f(a).

Hint: Draw a picture of the graph of y=f(x) near x=a. 3) Prove or disprove: a) ( ) 3 ( )lim lim

3f x f x

x a x a=

→ → b) ( ) (3 )lim lim

3f x f x

x a x a=

→ →.

4) Prove parts 1,2,3 in the Proposition about limits.

Page 105: Introduction, Motivation, Basics About Sets, Functions, Counting

3

5) Prove that if f:(0,∞)→R, then 1( ) ( ) .lim lim

00

f x L f Lxx x

x

= ⇔ =→ ∞ →

>

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃ Exercises 5

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃ 1) Determine where the following functions are continuous. Explain your answers briefly. Draw the graphs. a) ( )f x x= � �� � = floor of x= greatest integer ≤ x.

b) 1,

( )0, .if x

f xotherwise

∈�= �

c) f:[-1,1]→[-1,1] defined by setting f(0)=0 and then connecting the point 1 ,0

2k �� � �

to two points

1 1,

2 1 2 1k k �� + +� �

and 1 1,

2 1 2 1k k �� − −� �

for all integers k∈� with k≠0. So the function consists

of infinitely many line segments. This example implies that it is wrong to say that continuity means being able to draw the graph without lifting the pen from the paper. You cannot really draw this graph. 2) a) Use the Intermediate Value Theorem to show that any polynomial of odd degree has a real root. Why doesn't this work for polynomials of even degree? b) Suppose that f:[a,b]→[a,b] is continuous. Show that there is an x∈[a,b] such that f(x)=x The point x is called a fixed point of f. Hint: Look at g(x)=f(x)-x. 3) True-False. Tell whether the following statements are true or false. Give a brief reason for your

answer. a) Every function f:[0,1]→� has a maximum value.

b) Every continuous function f:[0,1] →� has a minimum value.

c) Every continuous function f:(0,1) →���has a maximum value.

4) Suppose that f:[a,b] →��is continuous and strictly increasing; i.e., x < y, for x,y in [a,b] implies

f(x) < f(y). Show that if c=f(a) and d=f(b), the inverse function f-1:[c,d] →� exists and is

continuous and strictly increasing as well. Draw a picture.

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

Exercises 6 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

1) a) Assuming the derivative of ex is ex, find the derivative of log(x)=loge(x)=ln(x) using the theorem about derivatives of inverse functions. (In this course, e is the only base we use.) b) Compute the derivative of the following function using properties of derivatives such as the chain rule: f(x)= x1/x . Hint: Recall that the definition of xa for x>0 is xa=ealogx. 2) Assume that f and g are functions on the open interval (a,b). Assume that both f and g are

differentiable at x∈(a,b). Suppose c is a real constant. Prove using the definition of derivative that: a) (f+g)’(x)=f’(x)+g’(x). b) (cf)'=cf' if c is a constant. c) Using mathematical induction, prove the formula for the derivative of xn, for n=1,2,3,... . 3) a) Define f(x) = xsin(1/x) if x≠0 and f(0)=0. Show that f(x) is not differentiable at x=0 but

Page 106: Introduction, Motivation, Basics About Sets, Functions, Counting

4

f(x) is continuous at x=0. b) Discuss the differentiability of the floor function [x]=�x�=the largest integer ≤ x , for all real numbers x. Does the function have right- and left- hand derivatives at some points? Are they

equal? 4) a) Suppose that a continuous function f on [a,b] is differentiable on (a,b) and that the derivative f'(x) is bounded (above and below) on (a,b). Prove that then f is uniformly continuous on [a,b]. Hint. Use the mean value theorem. b) First Derivative Test. Consider the function f(x)=x1/x from problem 1b). Find all relative and absolute extrema for this function on (0,∞). Also find the glb and lub of {f(x)|x}, if possible. Sketch the function.

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂ Exercises 7

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂

1) Prove the Generalized Mean Value Theorem also known as Cauchy’s Mean Value Theorem. This says the following. Assume that f and g are differentiable on (a,b) and continuous on [a,b]. Show there exists a point c∈(a,b) such that

[f(b)-f(a)]g’(c)=[g(b)-g(a)]f’(c). Hint. As in the proof of the mean value theorem, consider a function to which you can apply Rolle’s

theorem. Try the function: h(x)=[f(b)-f(a)][g(x)-g(a)]-[g(b)-g(a)][f(x)-f(a)]. 2) Prove l’Hôpital’s rule. This says the following. Assume that f and g are differentiable on an open

interval (a,c). Suppose also that g(x) and g’(x) do not vanish in (a,c). Finally assume that both f(x) and g(x) approach 0 as x goes to c, with x < c. Then l’Hôpital’s rule says:

'( ) ( ), .lim lim'( ) ( )f x f xIf k then kg x g xx c x c

x c x c

= =→ →< <

3) a) Define 1

( 1) , 1( )0, 1

xe for xg xfor x

−−�� >= �

≤�. Prove that g(x) is continuous at x=1.

b) Show that the function g(x) is differentiable everywhere. You have to consider 3 cases: x < 1, x = 1, and x > 1. 4) a) Show that the function g(x) from problem 3 has a second derivative g”(x) for all x. Again there are 3 cases. b) Using mathematical induction, consider the nth derivative g(n)(x) for the function g(x) of problem 1. Show g(n)(x) exists for all n∈�+ and all real numbers x. Again there are 3 cases.

Hint: the induction hypothesis should be something like the following:

g(n)(x)=0, for x≤1 and 1

( 1)( ) 1( )1

xnng x P ex

−− �= � −� �

, for x>1,

where Pn(u) is a polynomial. 5) Sketch a graph of the function g(x) from problem 3. What is the Taylor series for g(x) centered at x=1? Does it represent g(x)?

Page 107: Introduction, Motivation, Basics About Sets, Functions, Counting

5

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε Exercises 8

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε

1) Prove that 2

6

sin6 3

x dx

π

π

π π≤ ≤� without evaluating the integral.

2) Assume that a<b<c. Note that if a function f(x) is continuous on [a,b] and also continuous on [b,c], but has a jump discontinuity at x=b, then we can extend the integral to say f is integrable on [a,c] and

( ) ( ) ( ) .c b c

a a b

f t dt f t dt f t dt= +� � �

Show that this integral still satisfies the our axioms I and II. 3) Define sign(x) = 1 if x>0, sign(x)=-1 if x<0, and sign(0)=0. Show that for any a<b, we have (using the

preceding problem): ( ) | | | |b

a

sign x dx b a= −� . Why is |x| not an antiderivative of sign(x)?

4) The Mean Value Theorem for Integrals. Suppose f:[a,b]→� is continuous. Show that then there is a

point c in [a,b] such that ( ) ( )( ).b

a

f t dt f c b a= −�

Hint. First note that f has a maximum M and minimum m on [a,b]. Why? Then use the axioms for

integrals to see that ( ) ( ) ( ).b

a

m b a f t dt M b a− ≤ ≤ −� Then 1 ( )

b

a

f t dtb a− � is in the interval [m,M].

Use the intermediate value theorem to finish the problem.

5) Compute 2

.x

t

x

d e dtdx −

�� � ��

6) What is wrong with the formula 11

211

1 1 2 ?dtt t −−

− �= = −���

7) Write out Taylor’s formula with remainder as in Lang p. 109 with a=0 and n=6 for the function

f(x) = log(1-x). 8) Cauchy-Schwarz Inequality.

Suppose that f and g are continuous on [a,b]. Show that 2

2 2( ) ( ) ( ) ( ) .b b b

a a a

f t g t dt f t dt g t dt≤� � �

Hint: Let x be a fixed real number and look at ( )2

2 2( ) ( ) .b

a

f t xg t dt Ax Bx C− = + +� This is always

non-negative. What does that say about B2-4AC ? 9) Show using the fundamental theorem of calculus that, assuming the derivatives f’, g’ are continuous on

an open interval containing [a,b], then for all x in [a,b], we have:

'( ) ( ) ( ) ( ) ( ) ( ) ( ) '( )x x

a a

f t g t dt f x g x f a g a f t g t dt= − −� � .

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε

Page 108: Introduction, Motivation, Basics About Sets, Functions, Counting

1

��������������������������������������������������� I, Practice Exam 1

��������������������������������������������������� 1) Define and give an example: a) function f:S→T; 1-1 function; onto function; b) denumerable set; c) lub; d) lim n

nx

→ ∞;

e) If f:(c,d) → � and a∈(c,d), define the 2-sided limit ( )lim

, ( , )

f xx a

x a x c d→

≠ ∈

;

f) Cauchy sequence; g) subsequence.

2) a) State the well ordering axiom for �+.

b) Define the absolute value |x| for a real number x and state its 3 basic properties. c) State the completeness axiom for the real numbers.

d) State the negation of definition of the 2-sided limit ( )lim

( , ),

f xx a

x c d x a→

∈ ≠

for the function f:(c,d) → �

with a∈(c,d). Hint: You need recall how to negate a statement with the quantifiers ∀, ∃.

3) a) Prove that an increasing sequence of real numbers which is bounded above has a limit. b) Show that a Cauchy sequence is bounded. c) Show that if a sequence {xn} is Cauchy and it has a convergent subsequence, then {xn} converges to the same limit as the subsequence.

d) Prove that the set �+ does not have an upper bound in �.

e) Suppose that lim nx Ln

=→ ∞

and lim ny Mn

=→ ∞

. Prove that .lim n nx y LMn

=→ ∞

4) a) Prove that �5 is irrational.

b) Show that � is denumerable.

c) Show that � is not denumerable.

d) State and prove the formula for the number of functions f:S→T, if S and T are finite sets. e) Prove by induction that for each positive integer n and for any real x with x≥-1, we have (1+x)n ≥ 1+nx. This is (Jacob) Bernoulli’s inequality. Where did you use x≥-1? Is it really necessary?

5) Define a sequence inductively as follows: 1 11, 1 .n na a a+= = + Show that

lim nLn

a=→ ∞

exists and find a formula for the limit. Answer: The limit is the golden ratio:

1 5

2L += .

Hint: First show that 0< an < 2 for all n, by induction. Then show that the sequence is strictly increasing. Then note that L must satisfy L2=1+L. Why?

Page 109: Introduction, Motivation, Basics About Sets, Functions, Counting

2

6) State whether the following sequences have limits. If they do, find the limit and prove that the sequence approaches that limit using the definition of limit. If they don't, explain why they don't.

a) sin( )lim nn

π→ ∞

; b) 2lim1

nnn +→ ∞

; c) 2

2lim1

n nnn

−+→ ∞

; d) 1lim2nn→ ∞

.

7) True-False. Tell whether the following statements are true or false. Give a brief reason for your answer. a) The only real number a which satisfies the inequality |a|< ε, for every ε>0, is the number a=0. b) The sum of two irrational numbers is irrational. c) For all real numbers x,y we have |x-y|≤|x|-|y|.

d) If x� �� � denotes the largest integer less than or equal to x, then 1.lim1x

x=� �� �

e) Any decreasing sequence of real numbers has a limit. f) If f:S→T and A is a subset of set T, define the inverse image f-1(A)={x|x∈S and f(x)∈A }. Then f-1(A∩B)=f-1(A)∩f-1(B) for any subsets A and B of the set T. We do not assume here that f is 1-1 and onto. g) If f:S→T is 1-1 and onto then there is an inverse function g:T→S such that the composition f � g =identity on T and g � f=identity on S. h) If x>0 and y is any real number, there is a positive integer n such that nx>y. i) A sequence of real numbers can have 2 different limits. 8. Find the lub and glb for the following sets, if possible.

a) 2 1m mm

+� �∈� + �� ;

b) { }20 5x x and x∈ > <� ;

c) the positive integers d) the open interval (0,3). 9) Find the following limits, if possible. Then explain your answer using the definition of limit.

a) lim0

0

xxx

→>

;

b) Define f(x)=0 if x is irrational and f(x)= 1 if x is rational. Find ( )lim f xx ax a

→≠

for any rational number a.

���������������������������������������������������

Page 110: Introduction, Motivation, Basics About Sets, Functions, Counting

1

��������������������������������������������������� I��Practice Exam �

��������������������������������������������������� 1) a) Define and give an example of a function f(x) which is continuous at x=c.

b) Give 2 definitions for the derivative f’(c). Draw sketches to illustrate the meaning of both defns. c) Define ax for a>0 and explain why this generalizes an for n∈�+

.

d) State the 2 axioms for the integral of a continuous function on [a,b].

2) State and prove: a) mean value theorem b) fundamental theorem of calculus c) Weierstrass theorem on the existence of maxima and minima d) intermediate value theorem

3) Prove the following properties of the derivative

a) chain rule b) linearity c) product rule 4) Prove the following properties of the integral of a continuous function:

a) '( ) ( ) ( ),b

a

f x dx f b f a= −� assuming f’ continuous on [a,b]. b) linearity

c) integration by parts d) substitution formula e) integrals preserve ≤ 5) True-False.

Tell whether the following statements are true or false. If true, give a brief reason for your answer. If false, give a counterexample. a) Suppose f:(a,b)→���is continuous. Then there is a point c in (a,b) such that f(x) ≤ f(c) for all x∈(a,b).

b) b b

a a

fg f g=� � for functions f,g:[a,b]→� continuous.

c) Suppose f:[a,b] →� is continuous and f is differentiable on (a,b) with f’(x) =0 for all x∈(a,b ). Then

f(x)=constant, for all x∈[a,b]. d) Recall that a function f(x) defined for x∈���is even iff f(-x)=f(x) for all x∈�. We say f(x) is odd iff

f(-x)=-f(x), for all x∈�. Then the derivative f’(x) of an even function f(x) is an odd function.

e) Suppose that x=g(y) is the inverse function to y=f(x). Then g’(y)=1/f’(y). f) f,g continuous on [a,b] implies max{f,g} continuous on [a,b]. g) f continuous at x=c implies f differentiable at x=c.

Page 111: Introduction, Motivation, Basics About Sets, Functions, Counting

2

6) a) Compute g’(21) when x=g(y) is the inverse function for y=f(x)=2x3+1. b) Find the set of points where the following function f(x) is continuous:

c) Compute lim 1 .n

n

xn→∞

�+� �� �

d) Suppose that f is a differentiable function on the whole real line such that f’(x)=2xf(x) for all x∈�.

Show that there is a constant C so that 2

( ) xf x Ce= for all x.

e) Compute 0

( ) .td g x t dx

dt−�

7) a) Suppose that f’(x) > 0 for all x in (a,b). Why does y=f(x) have an inverse function x=g(y)? What is the formula for the derivative of the inverse function? Prove it.

b) State Taylor’s formula with remainder (2nd version). Apply it to get the formula for log(1-x) using the first 3 terms of the Taylor series plus remainder.

c) Define ex by its Taylor series and prove that euev = eu+v.

d) Define log(y) as the inverse function to ex and show that log(uv) = logu+logv.

8) Define a function 1/ , 0

( )0, 0

xe for xf x

for x

−� >= � ≤

. Show that f(n)(0) exists for all n∈�+. Sketch this

function. Is f(x) represented by its Taylor series around the point 0? Why?

9) Show that the following function gives an example of a continuous function whose graph cannot be drawn without lifting pen from paper. Define f(x) inductively as a map from [-1,1] to [-1,1] consisting of straight line segments connecting the point (1/(2k),0) to the 2 points (1/(2k+1),1/(2k+1)) and (1/(2k-1),1/(2k-1)) for k=±1,±2,±3, .... Sketch the function.

���������������������������������������������������

1,( )

0,x rational

f xx irrational

�= �

Page 112: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

1

Sierpinski TriangleSierpinski Triangle

3D Cantor Dust

Lorenz attractor

Coastline of Great Britain

Mandelbrot Set

What makes a fractal?

I’m using 2 references: Fractal Geometry by Kenneth FalconerEncounters with Chaos by Denny Gulick

1) A f t l i b t f �n ith i t di i1) A fractal is a subset of �n with non integer dimension. Of course this make no sense without a definition of dimension.

2) Fractal contains copies of itself at many scales.

3) It is too irregular to be described by traditional language.

Mandelbrot coined the term in 1977 though many examples were known. "Clouds are not spheres mountains are not cones coastlines are notClouds are not spheres, mountains are not cones, coastlines are not circles, and bark is not smooth, nor does lightning travel in a straight line."(Mandelbrot, 1983).

Other references: the web, books by MandelbrotM. Barnsley, Fractals Everywhere

I, Discussion of Final, Fractals:

from Wikipedia: list of fractals by Hausdoff dimension

Page 113: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

2

1) Cantor Dust.From the interval [0,1] remove the middle third. Then remove the middle third from the remaining 2 intervals [0,1/3] and [2/3,1].Keep going with this removal of middle thirds forever …..You end up with the Cantor dust. Impossible to draw it. We give the first 5 steps. Later we will see the Cantor Dust has box dimension l /l 6ln2/ln3 � .63.

Higher dimension than a point but smaller than an interval. There are many interesting facts about the Cantor dust. For example, the set is uncountable, but if you integrate the function that is 1 on the Cantor set and 0 off the set (using the Lebesgue integral), you get 0. The Riemann integral cannot deal with this.

Defn. Start with C0=[0,1]. Let C1=[0,1/3]�[2/3,1]. Continue in this way, defining Ck as a union of 2k subintervals, each of length 3-k

obtained by removing middle thirds of the intervals in Ck-1. The Cantor set C is the intersection of all the Ck, for k running over all integers � 0.

Problem 1. Show that the Cantor set can be identified with all the real numbers in [0,1] that can be represented in the form

This includes, for example, 20/27 = 2/3 + 2/27 = 0.202000... . Here, if a number has 2 expansions, we are saying 1 and only 1 of these expansions has no dn taking the value 1. Thus 1/3�C since even though 1/3=.10000 …, since we also have 1/3=.0222….

Thi i li th i t i th C t t

1

, {0,2}.3n

nnn

dx with d�

� ��

This implies there are as many points in the Cantor set as there are real numbers. The Cantor set is uncountable.Hint. The numbers requiring a 1 in the 1st place of their ternary expansion lie in the interval (1/3,2/3). The numbers requiring a 1 in the 2nd place of their ternary expansion lie in the union of the intervals (1/9,2/9) and (7/9,8,9).

Page 114: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

3

2) Sierpinski Triangle.

Start with an equilateral triangle and remove the center triangle.

R h i l f h f hRemove the center triangles from each of the 3 remaining triangles.

Keep going forever.

Fractal dimensions give a way of comparing fractals. Fractal dimensions can be defined in connection with real world data, such as the coastline of Great Britain. It turns out to have fractal dimension approximately 1.2. Here we will only look at the box dimension. It is only one of a wide variety of notions of fractal dimension. (The oldest and perhaps most important is the Hausdoff dimension. It is harder to calculate and you will need to look at the references such as Falconer to find out what that is.)

Definition of Box Dimension.Defn. Suppose S is a subset of �n, n=1,2,3. By an n-box, we mean a closed interval if n=1; a square if n=2; a cube if n=3.Defn. >0 let N() be the smallest number of n-boxes of side length needed to cover S. Example

here we use ln(x) rather than log(x)

Example.It takes 8 boxes of length 1/3 to cover the unit square with center square of side 1/3 removed.Defn. Box dimension of S� �n is defined to be the following limit if it exists:

Note. The box dimension is often called capacity. It sometimes

00

ln ( )dim lim .1lnBNS

p ydiffers from the Hausdorff dimension which we do not define here. But both definitions of dimension agree on most of the simple examples. For an m-dimensional surface in �n, n=1,2,3, the box dimension can be shown to be m. Thus, for example, the box dimension of the surface of the sphere is 2. See Falconer, p. 44.

Problem 2. Show that the box dimension of the unit square is 2.

Page 115: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

4

Theorems to aid us in computing the box dimension.Theorem 1. Let S be a subset of �n, n=1,2,3. If 0<r<1, then the limit defining dimBS exists iff the following limit exists and then they are equal

ln ( )dim lim .1ln

k

B k

N rS��

Proof Sketch. Given >0 and r with 0<r<1, you just have to find a positive integer k such that

ln kr

1 1

1

. ( ) ( ) ( ).1 1 1ln( ) , ln ln ln ,

k k k k

k k

r r Then N r N N r

As x monotone

� �

� � � �

� � � � � �� � �� � � � � �� � � � � �

� � � � � �

1

1

1

1

( )

ln ( ) ln ( ) ln ( ).ln ( ) ln ( ) ln ( ) .

ln 1 /ln 1 / ln 1 /

k k

k k

k k

k k

r rand N r N N r

N r N N rSor r

� � � � � �� � � � � �� �

� �

Example.The Box dim of the Cantor set C.

Recall that C is obtained from the interval [0,1] by continually removing middle thirds. At each stage there are twice as many intervals as the preceding stage And each interval hasmany intervals as the preceding stage. And each interval has length 1/3 that of the preceding stage. Since N(1/3) =2, we find that, by induction, N(1/3K)=2K, for each k�1.Let r=1/3 in the preceding theorem and find

1lnln 23dim lim limln 31l

kk

B kk k

NC

�� ��

� �� �� �� �

� �� �ln 1

3ln 2 ln 2lim 0.63.ln 3 ln 3

k

k

kk��

� �� �� �

� � �

Page 116: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

5

Problem 3. Show that the box dimension of the set of rational numbers in [0,1] is 1.

Defn. Suppose our set S is a subset of �n, n=1,2,3. The distance between 2 points x,y is denoted �x-y �. Suppose that a function f:S�S has the property that for some constant r with 0<r<1, �f(x)-f(y)� = r �x-y �, for all x,y in S. Then we call f a similarityf S Th t t i ll d th i il it t tof S. The constant r is called the similarity constant.

Example. Let S=[0,1] & f(x)=2/3 + x/3. Then f is a similarity with constant 1/3. It maps [0,1] onto [2/3,1].

Defn. If there are m similarities f1,…,fm of S such that S=f1(S) � … � fm(S), and the images are non-overlapping except possibly for boundaries, we say that S is a self-similar set. It is composed of m (shrunk) copies of itselfcopies of itself.

Example. The Canter set C. Define g(x)=x/3 and f(x)= 2/3 + x/3. Then C=f(C) � g(C). Both f and g are similarity functions with similarity constant 1/3.

Problem 4. Show that the Sierpinski triangle is a self-similar set. Use this to see that the box dimension of the Sierpinski triangle is ln3/ln2�1.58 using the following theorem. You need to define functions of vectors in the plane. First put the origin at the left hand base point of the big triangle. Then figure out what function shrinks the big triangle to the small one at the origin. Next what vector must you add to that function to shift the left small triangle over to the right one at the base?

Another Theorem making it easy to compute the box dimension.

Theorem 2. Suppose that S is a self similar set in �n, n=1,2,3; i.e., S=f1(S) � … � fm(S), non-overlapping, and such that each similarity function fi has the same similarity constant r. Then

dimBS=ln m/ln(1/r).

Proof. We use Theorem 1. Since N(rk) =cmk, for some positive constant c (Why? is Problem 5 Hint Think about the Cantor set) weconstant c (Why? is Problem 5. Hint. Think about the Cantor set), we have

ln ( ) ln lnlim lim .ln(1/ ) ln(1/ ) ln(1/ )

k k

k kk k

N r m mr r r�� ��

� �

Page 117: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

6

Defn. A set M of real numbers is said to have Lebesgue measure 0 ifffor every > 0 there is a sequence of intervals En such that

Sets of Lebesgue measure 0 are considered negligible in the theory of

11

( ) .n nnn

M E and length E ��

� ���

Sets of Lebesgue measure 0 are considered negligible in the theory of the Lebesgue integral. One can ignore what a function does on such sets when computing Lebesgue integrals. And one identifies functions that are equal except on a set of Lebesgue measure 0.

Problem 6. Show that any countable set M of real numbers has Lebesguemeasure 0.

Problem 7. Show that the Cantor set has Lebesgue measure 0.Hint. To do this recall that setting C0=[0,1]. Let C2=[0,1/3]�[2/3,1]. Continue g 0 2in this way, defining Ck as a union of 2k subintervals, each of length 3-k obtained by removing middle thirds of the intervals in Ck-1. The Cantor set C is the intersection of all the Ck, for k running over all integers � 0.Show that Ck is has length (2/3)k.

Defn. The Devil's Staircase or the Almost Perfect Sneak. Define a function on the interval [0,1] as follows. Write the complement of the Cantor set in [0,1] as a union of intervals. First define f on the complement of the Cantor set. Define f(x)=½ for x� ; f(x)=¼ for x� , f(x)=¾ for x� .

Then in the kth step, going from left to right on the 2k-1

subintervals left out of the Cantor set define

1 2,3 3

� �� �� �

1 2,9 9

� �� �� �

7 8,9 9

� �� �� �

f(x) =

Define f(x) for x�C=the Cantor set by f(x) = l.u.b.{ f(t) | t < x, t�[0,1]-C} and set f(0)=0.

Problem 8. Prove that f(x) is increasing, continuous and has derivative f’(x)=0 except on the Cantor set C, which has Lebesgue measure 0. But f(0)=0 and f(1)=1. So the fundamental theorem of calculus fails for this function:

1 3 2 1, , ,2 2 2

k

k k k��

1

0

1 (1) (0) '( ) 0.f f f t dt� � � ��Hint. If f were not continuous, f would have a jump. Then there would have to be an open subinterval of [0,1] containing no value of f. But the range of f contains all numbers of the form (2n+1)/2k in [0,1].

Page 118: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

7

A person moving toward you according to the Devil’s staircase law y=f(x) would cover a unit distance in a unit of time, but you might never see him or her move even if you were watching all the time. Thus Korevaar, Mathematical Methods, p. 404, calls this function "the almost perfect sneak.“

In 1872 Weierstrass found functions that were continuous everywhere but nowhere differentiable. This shocked many famous mathematicians who had thought such a function impossible.

Hermite described these functions as a "dreadful plague."

Poincaré wrote: "Yesterday, if a new function was invented it was to serve some practical end; today they are specially invented only to show

th t f f th d th ill h thup the arguments of our fathers, and they will never have any other use." Even as late as the 1960's, before "everyone" had a computer fast enough to graph these things, such examples were viewed as pathological monsters. Now there are thousands of websites with pictures of approximations of them.

The Weierstrass Nowhere Differentiable Function.Weierstrass published this construction in 1872. It too is a fractal.Defn. The Weierstrass function f. The definition involves parameters �>1 and 1<s<2. Then f:[0,1]�� is defined by

( 2)( ) sin( )s k kf t t� ��

�� � 2

3

4

Theorem 3. If � is large enough, the box dimension of the graph of f(t) in �2 is s. By the graph of f, we mean the set of points (t,f(t)), for all t in [0,1].

There are lots of pictures of this graph on the web. For example h // k / k / f

1

( ) sin( ).k

f t t� ���

0.002 0.004 0.006 0.008 0.01

-1

1

http://en.wikipedia.org/wiki/Weierstrass_function.http://planetmath.org/encyclopedia/WeierstrassFunction.htmlOr see http://www.math.washington.edu/%7Econroy/for an animation zooming in on the Weierstrass function

1

1( ) sin(2 ).2

kk

kf t t

� �

Page 119: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

8

Problem 9. Show that is a

continuous function on [0,1].

( 2)

1

( ) sin( )s k k

k

f t t� ��

� �

The following pictures of the Weierstrass function with �=2 and

0.002 0.004 0.006 0.008 0.01

The following pictures of the Weierstrass function with �=2 and s=1.9 were produced using the Mathematica commands below. I summed the first 150 terms of the Fourier series for the Weierstrass function and then plotted the result on the interval [0,1]. Why do 150 terms suffice? Look at the geometric series with x=2^(-.1). To get x^k<10^(-4), you need log(x^k)<log(10^-4) to get 4 significant digits. Then we want the next term which is the estimate for the remainder to be

k log(2^(-.1)) < -4 log 10 � -.1 k log2 < -4 log 10 � k > 40 log 10/log 2

40*Log[10]/Log[2]//N �132.877

The commands to sum the first 150 terms and make a function of it. Then plot on the interval [0,1].

fun[t_]:=fun[t]=Sum[2^(-k*(2-1.9))*Sin[t*2^k],{k,1,150}]Plot[fun[t],{t,0,1}];

Same function plotted on the interval [0,0.01].

3

4

Weierstrass function �=2 and s=1.9 plotted on the interval [0,1].

0.2 0.4 0.6 0.8 1

-2

2

4

The essence of a fractal - the

0.002 0.004 0.006 0.008 0.01

-1

1

2

same behavior at all scales. This function also looks nowhere differentiable.

Page 120: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

9

Defn. Rf[a,b]=sup{|f(t)-f(u)|, t,u in[a,b]}.

Proposition 1. Let f:[0,1]��, be continuous. Suppose that 0<�<1, and m is the least integer greater than or equal to 1/�.

Preparations for computing the box dimension of a graph of a function.

If N(�) is the number of squares of side � that intersect the graph of f, we have

Proof.The number of squares of side � in the column above the interval [k � (k+1) �] that intersect the graph of f is at least

1 11 1

1 1

[ , ( 1) ] ( ) 2 [ ,( 1) ].m m

f fk k

R k k N m R k k� � � � � � �� �

� �

� �

� � � � �� �

[k �,(k+1) �] that intersect the graph of f is at least Rf[k�, (k+1)�]/�. It is at most 2+ Rf[k�, (k+1)�]/�, using the fact that f is continuous. Sum over all the intervals to finish the proof.

Problem 10. Fill in the details in this proof. Draw a picture.0.002 0.004 0.006 0.008 0.01

-1

1

2

3

4

Lemma 1. Let f:[0,1]�� be continuous. Assume the box dimension of the graph of f exists.1) Suppose that � c>0 and s, 1�s�2, such that

|f(t)-f(u)| �c|t-u|2-s, t,u in [0,1]. Then dimBgraphf �s.This remains true if the inequality only holds for |t-u|<� , for some �>0.

2) Suppose that � c>0, �0>0 and s, 1�s�2 such that t in [0,1], and � with 0< � � �0 � u such that |t-u|< �and |f(t)-f(u)|�c �2-s.Then dimBgraphf �s.

Proof.1) From the hypothesis of 1) we see that Rf[a,b]�c|a-b|2-s, for a,b in [0,1]. Using the notation of the proposition, we see that

m<(1+ �-1) and thus

0.002 0.004 0.006 0.008 0.01

-1

1

2

3

4

( )N(�)�(1+ �-1)(2+c�-1�2-s) �c1 �-s,

where c1 is independent of �. The result follows from the definition of box dimension.2) Similarly the hypothesis of 2) implies that Rf[a,b]�c|a-b|2-s. Since �-1 �m, we have from Proposition 1 that N(�) � �-1 �-1 c �2-s

=c�-s. Again, the result follows from the definition of box dimension.

Page 121: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

10

Problem 11. Suppose f:[a,b] �� has a continuous derivative. Show dimBgraphf=1. Hint. Use the mean value theorem and Lemma 1.Now we prove Theorem 3 which says that � large enough implies that the box dimension of the graph of the Weierstrass function is s.

Given h in (0,1), let N be the integer such that �-(N+1) � h < �-N.

4

Use our definition

Splitting sums up into 2 parts we get:

( 2)

1

( ) sin( ).s k k

kf t t� �

��

� �

� �� � � �

� �� � � � � �� � � �

( 2)

1

( 2) ( 2)

( ) ( ) sin sin

sin sin sin sin

s k k k

k

Ns k k k s k k k

f t h f t t h t

t h t t h t

� � �

� � � � � �

��

�� �

!� � � � �" #

� � � �

� �

0.002 0.004 0.006 0.008 0.01

-1

1

2

3

Here we used the mean value theorem on the first N terms and that |sinx|�1 for the rest of the terms.Then we sum the geometric series to see that

� �� � � � � �� � � �( ) ( )

1 1

( 2) ( 2)

1 1

sin sin sin sin

2 .

k k NN

s k k s k

k k N

t h t t h t

h

� � � � � �

� � �

� � �

�� �

� � �

� � � � � �

� �

� �

� �

( 1) ( 2)( 1)2

1 2( ) ( ) 2 ,1 1

s N s Ns

s shf t h f t ch� �

� �

� � ��

� �� � � � �� �

where c is independent of h.This implies that the dimB(graph f) � s by the Lemma above.

T th th t k d fi i f(t h) f(t) d lit itTo go the other way, take our sum defining f(t+h)-f(t) and split it into 3 parts, the first N-1 terms, the Nth term, and the rest. This implies that

� � � �( 1)

( 2)

,

( ) ( ) sin( ) sin

N N

s N N N

if h then

f t h f t t h t

� �

� � �

� �

� �

� � � � �

(*) ( 2) 1 ( 2)( 1)

1 2

2 .1 1

s N s s N

s s� �

� �

� � � � �

� �� �� �

0.002 0.004 0.006 0.008 0.01

-1

1

2

3

4

Page 122: Introduction, Motivation, Basics About Sets, Functions, Counting

11/27/2010

11

Suppose that �>2 is large enough that

For �<1/�, take N such that ( 1).N N� � �� � �� �

( 2)(*) 1 , .20

s Nright hand side of is N� ��

t, � h, such that

and such that

All this implies that

( 2) 2 21 1( ) ( ) .s N s sf t h f t � � �� � �� � � �

( 1)N Nh� �� � � �

� � 1sin sin .10

N Nt h t� � ! !� � " # " #

It follows from the preceding Lemma that dimB(graphf) �s

( ) ( ) .20 20

f t h f t � � �� � �

Problem 12. Show that any function satisfying condition 2 of Lemma 1 with s>1 must be nowhere differentiable. It follows that the Weierstrass function is continuous but nowhere differentiable.

There is lots more to say about fractals, but we will stop here.I leave you with a picture of the Mandelbrot set from Wikipedia.

Page 123: Introduction, Motivation, Basics About Sets, Functions, Counting

�����FINAL 1) Page 2 of fractals lecture. Show that the Cantor set can be expressed as certain triadic expansions and thus is uncountable. 2) Page 3 of fractals lecture. Show that the box dimension of the unit square is 2. 3) Page 5 of fractals lecture. Show that the box dimension of the set of rational numbers is 1. Compare with

the box dimension of the Cantor set. How can it be that the Cantor set has a smaller box dimension than the rationals even though the Cantor set is uncountable?

4) Page 5 of fractals lecture. Show that the Sierpinski triangle is a self-similar set. Show using Thm. 2 that the box dimension of the Sierpinski triangle is ln3/ln2. 5) Answer the Why? in the Proof of Theorem 2 on p.5 of the fractals lecture. 6) Page 6 of fractals lecture. Show that any countable set M of real numbers has Lebesgue measure 0. 7) Page 6 of fractals lecture. Show that the Cantor C set (fractals lecture p. 2) has Lebesgue measure 0. Hint. To do this show that Ck in the definition of C has length (2/3)k. 8) Page 6 of fractals lecture. Prove that the Devil’s Staircase function f(x) is increasing, continuous and has derivative f’(x)=0 except on the Cantor set C, which has Lebesgue measure 0. 9) Page 8 of fractals lecture. Show that the Weierstrass function is a continuous function on [0,1]. Hint. Use a convergence test (often called the Weierstrass M-Test) from calculus which we will do in the next part of the notes. 10) Page 9 of fractals lecture. Fill in the details of the proof of Proposition 1, page 9 of the fractals lecture. 11) Page 10 of fractals lecture. Suppose f:[a,b]�� has a continuous derivative. Show dimBgraphf=1. Hint. Use

the mean value theorem and Lemma 1. 12) Page 11 of fractals lecture. Show that any function satisfying condition 2 of Lemma 1 must be nowhere differentiable. It follows that the Weierstrass function is continuous but nowhere differentiable.

( 2)

1( ) sin( ) .s k k

kf t tλ λ

∞−

=

=�

Page 124: Introduction, Motivation, Basics About Sets, Functions, Counting

Lectures on Advanced Calculus with Applications, II

Audrey Terras

December, 2010

Part I

Normed Vector Spaces1 Motivation

Recall Section 1.1 of Lectures I where we noted that Fourier needed to express a function f(x) as a Fourier series:

f(x) =∞∑

n=−∞ane

2πinx.

Here eix = cosx + i sinx, where i =√−1 (which is not a real number). This means you can rewrite the series of complex

exponentials as 2 series - one involving cosines and the other involving sines. The Fourier coefficients are

an =

1∫0

f(y)e−2πinydy.

In order to investigate Fourier series as well as integrals, we will need to look at normed vector spaces.

2 Why worry about infinite dimensional normed vector spaces?

We want to understand the integral from a more modern perspective rather than that of your calculus book. Secondly wewant to understand convergence of series of functions - something that proved problematic for Cauchy in the 1800s. Thesethings are important for many applications in physics, engineering, statistics. We will be able to study vibrating things suchas violin strings, drums, buildings, bridges, spheres, planets, stock values. Quantum physics, for example, involves Hilbertspace, which is a type of normed vector space with a scalar product where all Cauchy sequences of vectors converge.The theory of such normed vector spaces was created at the same time as quantum mechanics - the 1920s and 1930s. So

with this approach we are moving ahead hundreds of years from Newton and Leibnitz, perhaps 70 years from Riemann.Fourier series involve orthogonal sets of vectors in an infinite dimensional normed vector space:

C[a, b] = {f : [a, b]→ R |f continuous} .The L2−norm of a continuous function f in C[a, b] is

‖f‖2 =⎛⎝ b∫a

|f(x)|2 dx⎞⎠1/2

.

This is an analog of the usual idea of length of a vector f = (f(1), ..., f(n)) ∈ Rn :

‖f‖2 =⎛⎝ n∑j=1

|f(j)|2⎞⎠1/2

.

1

Page 125: Introduction, Motivation, Basics About Sets, Functions, Counting

There are other natural norms for f ∈ C[a, b] such as:

‖f‖1 =b∫a

|f(x)| dx.

‖f‖∞ = maxa≤x≤b

|f(x)| .

On finite dimensional vector spaces such as Rn it does not matter what norm you use when you are trying to figure outwhether a sequence of vectors has a limit. However, in infinite dimensional normed vector spaces, convergence can disappearif a different norm is used. Not all norms are equivalent in infinite dimensions. We will discuss this in detail later.Note that C[a, b] is infinite dimensional since the set {1, x, x2, x3, ..., xn, ...} is an infinite set of linearly independent

vectors. Prove this as follows. Suppose that we have a linear dependence relationn∑j=0

cjxj = 0, for all x in [a, b]. This

implies all the constants cj = 0. Why? Exercise.Infinite dimensional vector spaces are thus more interesting than finite dimensional ones. Each (inequivalent) norm leads

to a different notion of convergence of sequences of vectors.

3 What is a Normed Vector Space?

In what follows we define normed vector space by 5 axioms. We will not put arrows on our vectors. We will try to keepvectors and scalars apart by mostly using Greek letters for scalars. Our scalars will be real in this section. However, at theend of these lectures, we will allow complex scalars. It simplifies Fourier series.

Definition 1 A vector space V is a set of vectors v ∈ V which is closed under addition and closed under multiplicationby scalars α ∈ R. This means ∀ vectors u, v ∈ V, there is a unique sum u + v ∈ V and ∀ scalar α ∈ R, there is a uniqueproduct αv ∈ V. Moreover the following 5 axioms must hold for all u, v, w ∈ V and all α, β ∈ R:VS1. u+ (v + w) = (u+ v) + wVS2. ∃ 0 ∈ V s.t. 0 + v = vVS3. ∀ v ∈ V ∃ v′ ∈ V s.t. v + v′ = 0.VS4. v + u = u+ vVS5. 1v = v, α(βv) = (αβ)v, (α+ β)v = αv + βv, α(u+ v) = αu+ αv.

You may say we cheated by putting 4 axioms into VS5.

Definition 2 A vector space V is a normed vector space if there is a norm function mapping V to the non-negative realnumbers, written ‖v‖ , for v ∈ V, and satisfying the following 3 axioms:

N1. ‖v‖ ≥ 0 ∀v ∈ V and ‖v‖ = 0 if and only if v = 0.N2. ‖αv‖ = |α| ‖v‖ , ∀v ∈ V and ∀α ∈ R. Here |α| = absolute value of α.N3. ‖u+ v‖ ≤ ‖u‖+ ‖v‖ , ∀u, v ∈ V. Triangle Inequality.

Definition 3 The distance between 2 vectors u, v in a normed vector space V is defined by

d(u, v) = ‖u− v‖ .

Example 1. 3-Space.

R3 =

⎧⎨⎩⎛⎝ x1x2x3

⎞⎠∣∣∣∣∣∣x1, x2, x3 ∈ R⎫⎬⎭ .

2

Page 126: Introduction, Motivation, Basics About Sets, Functions, Counting

Define addition and multiplication by scalars as usual:⎛⎝ x1x2x3

⎞⎠+⎛⎝ y1y2y3

⎞⎠ =

⎛⎝ x1 + y1x2 + y2x3 + y3

⎞⎠ .α

⎛⎝ x1x2x3

⎞⎠ =

⎛⎝ αx1αx2αx3

⎞⎠ , ∀ α ∈ R.The usual norm is

‖x‖2 =√x21 + x

22 + x

23 if x =

⎛⎝ x1x2x3

⎞⎠ .Other norms are possible:

‖x‖1 = |x1|+ |x2|+ |x3| or ‖x‖∞ = max {|x1| , |x3| , |x3|} .

The proof that these definitions make R3 a normed vector space is tedious. So we make it an exercise.

Example 2. The space of continuous functions on an interval.

C[a, b] = {f : [a, b]→ R |f continuous} .For f, g ∈ C[a, b], define (f + g)(x) = f(x) + g(x) for all x ∈ [a, b] and define for α ∈ R (αf)(x) = αf(x) for all

x ∈ [a, b]. We leave it as an exercise to check the axioms for a vector space. The most interesting part of the exercise is toshow that f + g and αf are both continuous functions on [a, b].Again there are many possible norms. We will look at 3:

‖f‖2 =⎛⎝ b∫a

|f(x)|2 dx⎞⎠1/2

.

‖f‖1 =b∫a

|f(x)| dx.

‖f‖∞ = maxa≤x≤b

|f(x)| .

Most of the axioms for norms are easy to check. Let’s do it for the ‖f‖1 norm.Proof of N1. ‖v‖ ≥ 0 ∀ v ∈ V and ‖v‖ = 0 if and only if v = 0.Since |f(x)| ≥ 0 for all x we know that the integral is ≥ 0, because the integral preserves inequalities (see Lectures I).

Suppose that

b∫a

|f(x)| dx = 0. Since f is continuous (as is |f |), this implies f(x) = 0 for all x ∈ [a, b] by the positivity of

the integral in Lectures I.

Proof of N2. ‖αv‖ = |α| ‖v‖ , ∀ v ∈ V and ∀α ∈ R .

Also for any α ∈ R and f ∈ C[a, b], we have ‖αf‖1 =b∫a

|αf(x)| dx =b∫a

|α| |f(x)| dx = |α|b∫a

|f(x)| dx = |α| ‖f‖1 . This

proves N2 for norms. Here we used the multiplicative property of absolute value as well as the linearity of the integral (i.e.,scalars come out of the integral from Lectures, I).

3

Page 127: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof of N3. ‖u+ v‖ ≤ ‖u‖+ ‖v‖ , ∀ u, v ∈ V.Using the definition of the 1-norm, and the triangle inequality for real numbers as well as the fact that the integral

preserves ≤, we see that

‖f + g‖1 =b∫a

|f(x) + g(x)| dx ≤b∫a

(|f(x)|+ |g(x)|) dx.

To finish the proof, use the linearity of the integral to see that

b∫a

(|f(x)|+ |g(x)|) dx =b∫a

|f(x)| dx+b∫a

|g(x)| dx = ‖f‖1 + ‖g‖1 .

Putting it all together gives ‖f + g‖1 ≤ ‖f‖1 + ‖g‖1 .

4 Scalar Products.

You have seen the dot (or scalar or inner) product in R3. It is⎛⎝ x1x2x3

⎞⎠ •⎛⎝ y1y2y3

⎞⎠ = x1y1 + x2y2 + x3y3.

It turns out there is a similar thing for C[a, b]. First let’s define the scalar product on a vector space and see how to geta norm if, in addition, the scalar product is positive definite.

Definition 4 A (positive definite) scalar product is a function mapping (v, w) ∈ V × V to < v,w >∈ R such that:SP1. < v,w >=< w, v >, ∀ v, w ∈ V (symmetry)

SP2. < u, v + w >=< u, v > + < u,w >, ∀ u, v, w ∈ VSP3. < αv,w >= α < v,w >, ∀ v, w ∈ V and ∀αεRSP4. < v, v > ≥ 0,∀ v ∈ V and < v, v >= 0 ⇐⇒ v = 0 (positive definite)

Axioms SP1,2,3 imply that < v,w > is linear in each variable holding the other variable fixed. Axiom SP4 says thescalar product is positive definite. We will always want to assume SP4 because we want to be able to get a norm out ofthe scalar product via the following definition.

Definition 5 If V is a vector space with a (positive definite) scalar product < v,w > for v, w ∈ V, define the associatednorm by ‖v‖ = √< v, v >, for all v ∈ V

Before proving that this really gives a norm, let’s look at some examples.Example 1. In R3 the scalar product is

< x, y >=

⎛⎝ x1x2x3

⎞⎠ •⎛⎝ y1y2y3

⎞⎠ = x1y1 + x2y2 + x3y3.

It is easy to check the axioms. For example, the positive definiteness follows from the fact that squares of real numbers are≥ 0 and sums of non-negative numbers are non-negative:

< x, x >= x21 + x22 + x

23 ≥ 0.

And 0 =< x, x > ≥ x2i implies xi = 0 for all i and thus x = 0.

Example 2. C[a, b] = {f : [a, b]→ R | f is continuous}

4

Page 128: Introduction, Motivation, Basics About Sets, Functions, Counting

For f, g in C[a, b], define the scalar product by

< f, g >=

b∫a

fg.

Once more, it is not hard to use the properties of the integral to check axioms SP1,SP2,SP3 (exercise). To see SP4,note that f(x)2 ≥ 0 for all x ∈ [a, b] implies by the fact that integrals preserve ≥ that

< f, f >=

b∫a

f(x)2dx ≥ 0.

Now suppose that < f, f >= 0. By the positivity property of the integral, we know that f(x)2 = 0 for all x ∈ [a, b] whichsays that f is the 0 function (the identity for addition in our vector space C[a, b]).

Then the norm associated to this scalar product is ‖f‖2 =⎛⎝ b∫a

|f(x)|2 dx⎞⎠1/2

.

The following theorem is so useful people from lots of countries got their names attached.

Theorem 6 Cauchy-Schwarz (Bunyakovsky) InequalitySuppose that V is a vector space with scalar product < v,w >. Then, defining the norm ‖v‖ = √< v, v >, we have for

all v, w ∈ V :|< v,w >| ≤ ‖v‖ ‖w‖ .

Proof. Let t ∈ R and look atf(t) =< v + tw, v + tw > .

By properties of the scalar product, we have 0 ≤ f(t) =< v, v > +2t < v,w > +t2 < w,w > .As a function of f, we see that f(t) = At2 + Bt + C, where A =< w,w >, B = 2 < v,w > and C =< v, v > . So

the graph of f(t) is that of a parabola above or touching the t-axis. For example, in Figure 1, we have drawn a parabolatouching the t-axis at one point.

Figure 1: graph of a parabola with positive leading coefficient and non-positive discriminant

5

Page 129: Introduction, Motivation, Basics About Sets, Functions, Counting

Recall the quadratic formula for the roots r± of f(t) = At2 +Bt+ C = 0,

r± =−B ±√B2 − 4AC

2A.

Since we have at most one real root it follows that

B2 − 4AC ≤ 0.Now plug in A =< w,w >, B = 2 < v,w > and C =< v, v > . This gives the Cauchy-Schwarz inequality.

Corollary 7 Under the hypotheses of the preceding theorem, using Definition 5, ‖v‖ = √< v, v > defines a norm on V .

Proof. We must prove:N1. ‖v‖ ≥ 0 ∀ v ∈ V and ‖v‖ = 0 if and only if v = 0.N2. ‖αv‖ = |α| ‖v‖ , ∀ v ∈ V and ∀α ∈ R .N3. Triangle Inequality. ‖u+ v‖ ≤ ‖u‖+ ‖v‖ , ∀ u, v ∈ V.We get N1 from SP4.We get N2 from SP3. For then ‖αv‖2 =< αv, αv >= α2 < v, v >= |α|2 ‖v‖2 , ∀ v ∈ V and ∀αεR .To prove the triangle inequality N3, we need to use the Cauchy-Schwarz inequality. This proof goes as follows. By the

linearity and symmetry of the scalar product we see that

‖u+ v‖2 = < u+ v, u+ v >=< u, u > +2 < u, v > + < v, v >

= ‖u‖2 + 2 < u, v > + ‖v‖2≤ ‖u‖2 + 2 |< u, v >|+ ‖v‖2 as x ≤ |x|≤ ‖u‖2 + 2 ‖u‖ ‖v‖+ ‖v‖2 by Cauchy − Schwarz= (‖u‖+ ‖v‖)2 .

Now use the fact that the square root √ preserves inequalities to finish the proof of the triangle inequality.

What’s the good of all this? Now we can happily define limits of sequences of vectors {vn} in our normed vector spaceV . Can you guess the definition of lim

n→∞vn = L ∈ V ?Answer: ∀ε > 0, ∃Nε ∈ Z+ s.t. n ≥ Nε implies ‖vn − L‖ < ε. That is, just replace absolute value in the old

definition of limit with the norm.Similarly we can define a Cauchy sequence {vn} in the normed vector space V . We will do all this in detail later, but

you should be able to guess what we will say.

Another use of the scalar product is to define orthogonal vectors in a vector space V with a scalar product.

Definition 8 Two vectors v, w ∈ V, a vector space V with scalar product <,>, are defined to be orthogonal if the scalarproduct < v,w >= 0.

In a vector space with scalar product, you can also define the angle θ between 2 vectors v, w ∈ V, by< v,w >= ‖v‖ ‖w‖ cos θ.

From this, the preceding definition makes sense as 2 vectors are orthogonal iff the cosine of the angle between them is 0.This definition of cosine agrees with the usual one if V = R2.

What is the cosine law? Using the triangles in Figure 2, it says

‖v − w‖2 = ‖v‖2 − 2 ‖v‖ ‖w‖ cos θ + ‖w‖2 .You also need to see, using the axioms for scalar product, that

‖v − w‖2 =< v − w, v − w >= ‖v‖2 − 2 < v,w > + ‖w‖2 .Putting these last 2 formulas together yields < v,w >= ‖v‖ ‖w‖ cos θ.

6

Page 130: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 2: Visualizing vectors in a normed vector space and the angle between them. The cosine law says ‖v − w‖2 =‖v‖2 − 2 ‖v‖ ‖w‖ cos θ + ‖w‖2 .

5 Comparison of Norms

Suppose {vn} is a sequence in a normed vector space V with norm ‖‖ , We will say limn→∞vn = L and "vn converges to

L ∈ V " ⇐⇒ limn→∞ ‖vn − L‖ = 0. Note that the last sequence ‖vn − L‖ is a sequence of real numbers.

Question. We know that there are lots of norms on V . How can we guarantee that 2 different norms ‖‖α and ‖‖βproduce the same convergent sequences in V ?The answer is that equivalent norms produce the same convergent sequences where we define equivalent as follows.

Definition 9 2 norms ‖‖α and ‖‖β on the vector space V are equivalent iff there are constants A,B > 0 such that for allv ∈ V we have

A ‖v‖α ≤ ‖v‖β ≤ B ‖v‖α .

In the preceding definition we are assuming that the constants A and B are independent of v ∈ V.Why do equivalent norms lead to the same convergent sequences?Answer. Suppose {vn} is a convergent sequence for the ‖‖α-norm; i.e., for some L ∈ V we have lim

n→∞ ‖vn − L‖α = 0.And suppose ‖‖β is an equivalent norm. Then

A ‖vn − L‖α ≤ ‖vn − L‖β ≤ B ‖vn − L‖α .

Since the outside sequences go to 0 as n→∞, it follows by the squeeze lemma that the guy in the middle has to go to 0as well.Similarly lim

n→∞ ‖vn − L‖β = 0 implies limn→∞ ‖vn − L‖ = 0.

Moral. It does not matter which of 2 equivalent norms you use to test a sequence for convergence.

Theorem 10 All norms on Rn are equivalent.

Proof. See Lang, Undergraduate Analysis, p. 145.

7

Page 131: Introduction, Motivation, Basics About Sets, Functions, Counting

Thus, for our purposes, it does not matter which norm you use on finite dimensional vector spaces. You get the samedefinition of convergence of sequences. However, things are very different for infinite dimensional vector spaces. See Figure3 below for a sequence of functions fn in C[0, 1] such that

limn→∞ ‖fn − 0‖1 = 0 using the norm ‖f‖1 =

1∫0

|f(x)| dx.

However,‖fn − 0‖∞ = 1 for all n, using the norm ‖f‖∞ = max

a≤x≤b|f(x)| .

It follows that the norms ‖‖1 and ‖‖∞ on C[0, 1] are not equivalent.

Exercise. Using the same example, show that limn→∞ ‖fn − 0‖2 = 0 using the norm ‖f‖2 =

⎛⎝ 1∫0

|f(x)|2 dx⎞⎠1/2

. It

follows that the norms ‖‖2 and ‖‖∞ on C[0, 1] are not equivalent.

Proposition 11 The norms ‖f‖1 =b∫a

|f(x)| dx and ‖f‖2 =⎛⎝ b∫a

f(x)2dx

⎞⎠1/2

are not equivalent. However, we do have

the inequality‖f‖1 ≤

√b− a ‖f‖2 .

Proof. To prove the inequality, use the Cauchy-Schwarz inequality on the functions |f | and g(x) = 1 for all x ∈ [a, b]. Thisgives

|< f, g >| ≤ ‖f‖2 ‖g‖2 .So we have |< |f | , 1 >| ≤ ‖|f |‖2 ‖1‖2 . Now this really means

‖f‖1 =

b∫a

|f(x)| dx ≤⎛⎝ b∫a

f(x)2dx

⎞⎠1/2⎛⎝ b∫a

1dx

⎞⎠1/2

=√b− a

⎛⎝ b∫a

f(x)2dx

⎞⎠1/2

=√b− a ‖f‖2 .

To see that ‖‖1 and ‖‖2 are not equivalent norms, we take a = 0 and b = 1. Then we look at the following example.Define as in Lang, Undergraduate Analysis, p. 147:

gn(x) =

{ √n, for 0 ≤ x ≤ 1

n ;1√x, for 1

n ≤ x ≤ 1.

Note that gn is continuous. Why? Exercise. Show also that

‖gn‖1 = 2−1√n

and‖gn‖2 =

√1 + log n.

It follows that there cannot be a constant C > 0 such that ‖v‖2 ≤ C ‖v‖1 at least on the interval [0, 1]. Can you extendthis idea to arbitrary intervals [a, b]?

8

Page 132: Introduction, Motivation, Basics About Sets, Functions, Counting

6 Limits in a Normed Vector Space

Let V be a normed vector space with norm ‖‖ . Recall our definition.

Definition 12 We say that a sequence of vectors vn ∈ V converges to L ∈ V and write limn→∞vn = L ⇐⇒

limn→∞ ‖vn − L‖ = 0.

Examples.Example 1) Let V = R2 and consider vn = ( 1n ,

1n2 ). Using whatever norm is your favorite, limn→∞vn = 0 = (0, 0). Prove

this as an exercise.Example 2) Define a sequence of functions on [0, 1] by the picture in Figure 3. One can show (exercise) that

Figure 3: Define a function fn(x), 0 ≤ x ≤ 1 by this graph.

‖fn − 0‖1 → 0 as n → ∞ while ‖fn − 0‖∞ = 1 for all n. So the sequence has a limit function 0 using the 1-norm onC[0, 1] but not using the ∞−norm.

Definition 13 A Cauchy sequence {vn} of vectors in a normed vector space V means that

∀ε > 0,∃N s.t. n,m ≥ N =⇒ ‖vn − vm‖ < ε.

Let’s list some facts about limits of sequences of vectors in V . Compare with the analogs for sequences of real numbersand you will see that mostly we replace the absolute value in the real version with the norm in our new version. We didthese things in Lectures I. Sometimes we write exactly the same formulas as before, though of course we mean somethingdifferent as now we are talking about sequences of vectors in V which may in fact be sequences of functions. The proofs ofthe following facts will be mostly the same as before, after replacing absolute value with norm.Facts1) Uniqueness of Limits.

limn→∞vn = L and lim

n→∞vn =M =⇒ L =M.

2) Cauchy implies BoundedIf the sequence {vn} is Cauchy then {vn} is bounded; i.e., contained in a ball of radius r and center 0;meaning that ‖vn‖ ≤ r for all n ∈ Z+.

9

Page 133: Introduction, Motivation, Basics About Sets, Functions, Counting

3) Linearity of Limits.

limn→∞vn = L and lim

n→∞wn =M

implieslimn→∞ (vn + wn) = L+M

and for all α ∈ R.limn→∞ (αvn) = αL

4) Convergent implies Cauchy. {vn} converges =⇒ {vn} Cauchy.5) Subsequences of Convergent Sequences Converge to Same Limit as the Original Sequence.limn→∞vn = L implies for any subsequence {vnk}k≥1 we have lim

k→∞vnk = L.

6) Limits are the same for Equivalent Norms.If ‖‖α and ‖‖β are 2 equivalent norms on V , then

limn→∞ ‖vn − L‖α = 0⇐⇒ lim

n→∞ ‖vn − L‖β = 0.Proofs of the Facts. Compare with analogous proofs in R.1) ‖L−M‖ = ‖L− vn + vn −M‖ ≤ ‖L− vn‖+ ‖vn −M‖ by the triangle inequality.We know that ‖L− vn‖ → 0 and ‖vn −M‖ → 0 as n→∞. This means 0 ≤ ‖L−M‖ ≤ 0. Thus ‖L−M‖ = 0. By

the first property N1 of norms, this implies L−M = 0.

2) Look at the proof of the analogous fact in Lectures I. Replace the absolute value in the old proof with the norm andyou have proved Fact 2. Let’s do it for practice.Take ε = 1 and find N0 such that n ≥ N0 implies ‖vn − vN0

‖ < 1. Then‖vn‖ ≤ ‖vn − vN0 + vN0‖ ≤ ‖vn − vN0‖+ ‖vN0‖ < 1 + ‖vN0‖ .

Take B = max{‖x1‖ , ..., ‖xN0−1‖ , ‖vN0‖+ 1}. This gives the bound B on all the vn.

3) Use the triangle inequality to see that

‖vn + wn − (L+M)‖ ≤ ‖vn − L‖+ ‖wn −M‖ .So given ε > 0, we can choose N1 and N2 so that

n ≥ N1 implies ‖vn − L‖ < ε/2and

n ≥ N2 implies ‖wn −M‖ < ε/2.Thus n ≥ max{N1, N2} implies

‖vn + wn − (L+M)‖ < ε

2+ε

2= ε.

Exercise. Prove the second part of Fact 3) and then generalize to show that if {αn} denotes a sequence of real numberssuch that lim

n→∞αn = α, then limn→∞αnvn = αL, assuming lim

n→∞vn = L for the sequence {vn} of vectors in V .

4) The analogous proof was in Lectures 1. Replace all the absolute values in that proof with our norm and you have theproof of Fact 4.If lim

n→∞vn = L and ε > 0 is given, there exists Nε ∈ Z+ such that n ≥ Nε implies ‖vn − L‖ < ε/2. So if m ≥ Nε, we

have ‖vm − L‖ < ε/2. It follows from the triangle inequality that for n,m ≥ Nε we have‖vn − vm‖ ≤ ‖vn − L+ L− vm‖

≤ ‖vn − L‖+ ‖L− vm‖ < ε

2+ε

2= ε.

10

Page 134: Introduction, Motivation, Basics About Sets, Functions, Counting

This means that our sequence is Cauchy.

5) See Lectures I. Suppose that {vnk} is a subsequence of {vn} and limn→∞vn = L. We need to prove that limk→∞

vnk = L.

For this we do the usual trick with the triangle inequality:

‖vnk − L‖ = ‖vnk − vn + vn − L‖ ≤ ‖vnk − vn‖+ ‖vn − L‖ .

Suppose we are given ε > 0. Since our sequence {vn} is convergent, it is Cauchy (by Fact 4) and there is an N1 so that

n,m ≥ N1 implies ‖vm − vn‖ < ε.

If k ≥ N1, then nk ≥ k implies nk ≥ N1 and thus ‖vnk − vk‖ < ε.Since lim

n→∞vn = L, we know there is an N2 such that

k ≥ N2 implies ‖vk − L‖ < ε.

It follows that, setting N = max {N1, N2} , we have the result we want, namely

k ≥ N =⇒ ‖vnk − L‖ ≤ ‖vnk − vk‖+ ‖vk − L‖ < 2ε.

6) We proved this in a previous section.

7 Completeness of Normed Vector Spaces

Definition 14 Suppose that V is a normed vector space. We say that V is complete iff every Cauchy sequence {vn} ofvectors vn ∈ V converges to a limit L ∈ V.

Theorem 15 Examples of Complete and Incomplete Normed Vector Spaces.1) Rk is a complete normed vector space.2) The space of continuous functions on a finite interval [a, b] which we denote C[a, b] is complete using the norm ‖‖∞

but not complete using the norms ‖‖1 or ‖‖2 .

Proof.1) We did the case k = 1 in Lectures I. Let’s take k = 2 for simplicity here. Suppose we use the norm ‖‖∞ for our

proof. That is, if x =(x(1)

x(2)

), set ‖x‖∞ = max

{∣∣x(1)∣∣ , ∣∣x(2)∣∣} . This is legal as all norms on R2 are equivalent (see Lang,Undergraduate Analysis, p. 145). Suppose {vn} is a Cauchy sequence of vectors in R2. Write vn =

(v(1)n

v(2)n

).

Claim.{v(j)n

}is a Cauchy sequence in R for j = 1 and j = 2.

Proof of Claim. Since {vn} is a Cauchy sequence and ‖vn − vm‖∞ = max{∣∣∣v(1)n − v(1)m

∣∣∣ , ∣∣∣v(2)n − v(2)m∣∣∣} , for j = 1

or j = 2, we have

∀ε > 0, ∃N s.t. n,m ≥ N =⇒∣∣∣v(j)n − v(j)m

∣∣∣ ≤ ‖vn − vm‖∞ < ε.

Q.E.D. Claim.

Since R is complete and Cauchy implies convergent in R, we know there exist L(1), L(2) ∈ R such that for j = 1 and 2,

L(j) = limn→∞v

(j)n .

11

Page 135: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 4: Here is a picture of the proof that a Cauchy sequence of vectors in the plane must converge because the projectionsof the sequence to points on the 2 axes converge to limits in R, the red star and the green star.

Let L =(L(1)

L(2)

).

Claim. limn→∞vn = L.

Proof. See Figure 4.

∀ε > 0, ∃Nj s.t. n ≥ Nj =⇒∣∣∣v(j)n − L(j)

∣∣∣ < ε. Let N=max {N1, N2} . Then n ≥ N implies

‖vn − L‖∞ = max{∣∣∣v(1)n − L(1)

∣∣∣ , ∣∣∣v(2)n − L(2)∣∣∣} < ε.

Q.E.D. Claim.2) We postpone the proof that C[a, b] is complete in the ‖‖∞ norm.To see that C[0, 1] is not complete with respect to the ‖‖1 norm, we must find a Cauchy sequence of continuous functions

that does not converge to an element of C[0, 1] in this norm.Example.Consider the function fn(x) defined by

fn(x) =

⎧⎨⎩0, 0 ≤ x ≤ 1

2 − 1n

1 + n(x− 12 ),

12 − 1

n ≤ x ≤ 12

1, 12 ≤ x.

See Figure 5.

Define the function L (x) ={0, 0 ≤ x < 1

2

1, 12 ≤ x ≤ 1.

Since L(x) is not continuous, it is not in the space C[0, 1]. But

‖fn − L‖1 = area of triangle in Figure 6 =1

2n→ 0, as n→∞.

Thus the sequence {fn} approaches a limit in the ‖‖1-norm, but not a limit that is continuous. So we have a Cauchysequence approaching a limit L /∈ C[0, 1]. This says C[0, 1] is not complete. It is like the rationals, full of holes. To make

12

Page 136: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 5: The function fn(x) designed not to converge to a continuous function with respect to the ‖‖1 norm as n→∞.

Figure 6: Triangle whose area is ‖fn − L‖1

13

Page 137: Introduction, Motivation, Basics About Sets, Functions, Counting

a space containing limits of all Cauchy sequences in C[0, 1], you need to add all Lebesgue integrable functions on [0,1]. Wewill not say much about the Lebesgue integral in this course. If you are interested, you could look at Lang, UndergraduateAnalysis, p. 262, Apostol, Mathematical Analysis, or Korevaar, Mathematical Methods.Figure 7 shows the difference between ‖f − g‖1 and ‖f − g‖∞ . Let f be the purple function and g be the blue one.

Then ‖f − g‖∞ is the maximum length of the pink dotted lines while ‖f − g‖1 is the area between the 2 curves.

Figure 7: This figure can be used to see the difference between ‖f − g‖1 and ‖f − g‖∞ .

14

Page 138: Introduction, Motivation, Basics About Sets, Functions, Counting

8 Open and Closed Sets in a Normed Vector Space and Other Definitions.

Here we give a very brief discussion of some concepts from point set topology, in particular open and closed sets. Youcould probably live without more definitions (unless you plan to go to grad school in math.). But I find that in discussingcontinuity, these ideas clarify things. If you always hated εδ, you will be happy to learn that these ideas allow you to forgetabout εδ.

Definition 16 An open set U in a normed vector space V is a subset of V such that ∀a ∈ U, ∃ r > 0 such that the openball of radius r and center a, B(a, r) = {x ∈ V | ‖x− a‖ < r} ⊂ U .

This means that U has no hard edges - no boundary points. As an example, if V = R with the usual absolute value asnorm, the open interval (a, b) is an open set. In any normed vector space V, the open ball B(a, r) is an open set.A closed set F in a normed vector space V is a subset of V such that the complement F c is an open set. Closed sets

have hard edges. For example, if V = R with the usual absolute value as norm, the closed interval [a, b] is a closed set. Inany normed vector space V, the closed ball B(a, r) ={x ∈ V | ‖x− a‖ ≤ r} is a closed set.

See Figure ?? for pictures of open and closed sets.

Figure 8: pictures of an open set (top) and a closed set (bottom). The boundary of the open set is dashed to indicate thatit is not part of the set.

Theorem 17 Properties of Open & Closed Sets in a normed vector space V1) The empty set ∅ and V are both open and closed.2) Finite intersections of open sets are open.

Arbitrary intersections of closed sets are closed.3) Arbitrary unions of open sets are open.

Finite unions of closed sets are closed.

Proof. I will leave most of these proofs as exercises. Let me do parts of 2) and 3).

2) Suppose that U1, ..., Un are open sets in V . If p ∈n⋂i=1

Ui, then ∀i, ∃δi > 0 such that B(p, δi) ⊂ Ui. Let δ =

min{δ1, ..., δn}. Then B(p, δi) ⊂n⋂i=1

Ui. Thusn⋂i=1

Ui is open. This proof is pictured for n = 2 in Figure 9.

15

Page 139: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 9: An open turquoise star is intersected with an open purple rhombus. The red point is in the intersection of the 2sets. Then 2 discs with centers at the red point are found, the yellow disc staying inside the star and the pink disc stayinginside the rhombus.

3) Suppose {Ui}i∈I denotes an arbitrary collection of open sets in V. If p ∈⋃i∈IUi, then p ∈ Ui, for some i and thus

∃δi > 0 such that B(p, δ) ⊂ Ui ⊂⋃i∈IUi. It follows that

⋃i∈IUi is open.

The facts about closed sets can be derived from the facts about open sets using (A ∪B)c = Ac ∩Bc.A set may be neither open nor closed. For example, consider the half-open interval (a, b].

Definition 18 The closure A of a subset A of a complete normed vector space V is the set of points p ∈ V such that∀δ > 0,∃x ∈ A such that ‖x− p‖ < δ. This says that ∀δ > 0, B(p, δ) ∩A �= ∅.

You should picture a point in the closure of a set as a point sticking to the set. Some call points in the closure of Aadherent to A for that reason. See Figure 10.

Proposition 19 A point p is in A iff ∃ a sequence {xn} of points xn ∈ A such that p = limn→∞xn.

Proof. =⇒ If p ∈ A, for every n ∈ Z+, ∃xn ∈ B(p, 1n ) ∩A. It follows that p = limn→∞xn.

⇐= If p = limn→∞xn, for a sequence {xn} of points xn ∈ A, it follows that ∀δ > 0, ∃N s.t. n ≥ N implies xn ∈ B(p, δ)∩A.

This implies p ∈ A.Examples.1) Any point in S is in the closure of S; i.e., S ⊂ S. Just take x = p in the definition of closure.2) If V = R2, using any of our favorite norms, any point in the closed ball

{x ∈ R2 | ‖x‖ ≤ r} is in the closure of the

open ball B(0, r) ={x ∈ R2 | ‖x‖ < r} .

If a point x is in the complement of the closed ball, then it is not in the closure of B(0, r).Thus B(0, r) =

{x ∈ R2 | ‖x‖ ≤ r} .

3) What is the closure of Q=the rationals? Answer. R.4) What is the closure of { 1n |n ∈ Z+}? Answer. {0} ∪ { 1n |n ∈ Z+}.5) What is the closure of the interval (a, b) in R? Answer. [a, b].

16

Page 140: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 10: The red point is an element of the closure of the purple set.

6) What is the closure of the set {polynomials p(x) = anxn + · · ·+ a1x+ a0|aj ∈ R} in the space C[a, b] of continuousfunctions on a finite interval [a, b] using the ‖‖∞ norm? Answer. C[a, b] by the Weierstrass Theorem proved in a latersection saying that any continuous function on a finite interval can be uniformly approximated by polynomials.

Proposition 20 A set S in a normed vector space is closed iff S = S.

Proof. Suppose S is closed. Then Sc is open. First we know that S ⊂ S. Now to go the other way, suppose p ∈ Sc. Then∃δ > 0 such that B(p, δ) ⊂ Sc. This means that p is not in S. That shows S closed implies S = S.To go the other way, suppose S = S. We want to show that Sc is open. Suppose p ∈ Sc. Then we know that p is not

in S. By definition, this means ∃δ > 0 such that B(p, δ) ⊂ Sc. That is what we needed to show Sc open.Corollary 21 S is closed in a normed vector space V ⇐⇒ ∀ sequence {xn} of points xn ∈ S having a limit p = lim

n→∞xn in

V, it follows that p ∈ S.Proposition 22 A closed subset A of a complete normed vector space V is complete.

Proof. Suppose that {xn} is a Cauchy sequence in A. Since V is complete, we know ∃p ∈ V such that limn→∞xn = p. Since

A is closed, it follows from the preceding Corollary that p ∈ A.It can be useful to learn the following definition.

Definition 23 A point a in the set S − {a} is an accumulation point of the set S. This means that for every δ > 0, thedeleted ball B(a, δ)− {a} = {x | 0 < ‖x− a‖ < δ} contains a point of S.

A set S is closed iff it contains all its accumulation points.Examples.Example 1) The set Z of integers has no accumulation points. For n ∈ Z , one sees that (B(n, 12 )− {n}) ∩ Z = ∅.Example 2) What is the set of accumulation points of Q=the rationals? Answer. R.Example 3) What is the set of accumulation points of { 1n |n ∈ Z+}? Answer. {0}.Example 4) What is the set of accumulation points of the open ball B(a, r) in R2 using the usual norm? Answer.

The closed ball B(a, r) = {x ∈ R2 | ‖x− a‖2 ≤ r}.Another important word in the mathematician’s vocabulary is "compact." I will give a definition that works for normed

vector spaces but not more general spaces that mathematicians like to consider. We will not say much more about compactsets except to note that in finite dimensional normed vector spaces a set is compact iff it is closed and bounded. Herebounded means contained in a ball of radius r and center 0. This is false for infinite dimensional normed vector spaces,however. A ball of radius r in infinite dimensions is not compact. Very inconvenient.

Definition 24 A subset S of a normed vector space V is (sequentially) compact iff every sequence {xn} in S has asubsequence {xnk} converging to an element of S.See Lang, Undergraduate Analysis or Apostol, Mathematical Analysis, for more information on compact sets.

17

Page 141: Introduction, Motivation, Basics About Sets, Functions, Counting

9 Limits of Functions in Normed Vector Spaces

This brings us to the story of limits of functions in and on normed vector spaces. Why are we interested in this question?We want to know when we can interchange limit and integral, limit and derivative. We want to know whether a seriesof functions (such as a Taylor series or a Fourier series) converges. In fact, we need to know precisely what we mean byconvergence of a series of functions. We want to think of the definite integral as a function on the space C[a, b] of continuousfunctions on a finite closed interval [a, b]. What sort of function is it? Linear? Continuous?In fact, proving things about limits of functions on normed vector spaces is no harder than it was for functions on the

real line. We will basically copy the proofs from those of the analogous results for functions on the real line. Suppose thatV and W are normed vector spaces. I will use the same symbol for the norm on V and the norm on W . Suppose S ⊂ Vand f : S →W. I will make a more general hypothesis on the nature of a and S in lim

x→af(x) = L, however.

Definition 25 Suppose S ⊂ V and f : S →W. Here V and W are normed vector spaces. We will denote both norms by ‖‖though they may be different. Assume a is an accumulation point of S. Define the limit of f as x approaches a:

limx→ ax ∈ S

f(x) = limx→a

f(x) = L ∈W

to mean that for every ε > 0, there exists δ > 0 (depending on ε) such that x ∈ S and 0 < ‖x− a‖ < δ implies‖f(x)− L‖ < ε.

Examples.Example 1) Let V = C[a, b], the space of continuous real-valued functions on the finite interval [a, b]. Define I : V → R

by I(f) =

b∫a

f(x)dx. Suppose our norms are ‖f‖1 =b∫a

|f(x)| dx on V and the usual absolute value on R. Does

limf→0

I(f) = 0?

Answer: Yes. In fact, here δ = ε, since (using the fact that integrals preserve inequalities), we see that ‖f − 0‖1 < εimplies

|I(f)− I(0)| =∣∣∣∣∣∣b∫a

f(x)dx− 0∣∣∣∣∣∣ ≤

b∫a

|f(x)| dx = ‖f‖1 < ε.

Example 2) Set f(x, y) = y2−x2y2+x2 , for (x, y) �= (0, 0) in R2. Does lim

(x,y)→(0,0)f(x, y) = 0? Here you can use any of our

favorite norms on R2 and ordinary absolute value on R.Answer. No. The function f(x, y) is −1 on the x-axis and +1 on the y-axis. Set y = kx. Then f(x, kx) = k2−1

k2+1 .Thus f(x, y) has different values on various lines through the origin. It follows that there is no limit as (x, y) approaches theorigin. Figure 11 shows a 3D graph of the surface z = f(x, y) with contours down on the x, y-plane. Along the lines (x, kx)the values are k2−1

k2+1 . So the contours are straight lines.It is hard to see how badly the function fails to have a limit as (x, y) approaches (0, 0). Perhaps it is better just to think

about what has to happen to the surface z = f(x, y) if z has to be −1 on the x-axis and +1 on the y-axis.

There are more examples in the exercises. The properties of limits are similar to those for functions on the reals.

18

Page 142: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 11: 3D plot of z = f(x, y) = y2−x2y2+x2 .

19

Page 143: Introduction, Motivation, Basics About Sets, Functions, Counting

Notes on the fine points of the definition of limit.Suppose f : S → W. Before defining the limit lim

x→af(x), we need to make sure that there are points x ∈ S which are

arbitrarily close to a. If we don’t assure that, then the limit could be anything. Thus we should assume minimally that a ∈ S.Then S∩{x | x ∈ V, ‖x− a‖ < δ} �= ∅ for all δ > 0. This is what is done by Lang, Undergraduate Analysis, p. 160. However,this does not assure the non-emptyness of the intersection with the punctured ball: S ∩ {x | x ∈ V, 0 < ‖x− a‖ < δ} �= ∅for all δ > 0. For example Lang seems to allow S = {a} in the definition of limit. Moreover, Lang leaves out the assumption0 < ‖x− a‖ in the definition of limit. I cannot make myself do that. See also Dieudonné, Foundations of Modern Analysis,Volume I. Sagan, Advanced Calculus and Apostol, Mathematical Analysis, p. 77, assume that a is an accumulation pointof S. So I will do that. It won’t really matter for the examples we want to consider in infinite dimensional normed vectorspaces where our sets will be like C[a, b].Properties of Limits in Normed Vector Spaces.We assume that V,W are normed vector spaces, S ⊂ V, f, g : S →W and we assume that a is an accumulation point

of S.Property 1) Uniqueness.Suppose lim

x→af(x) = L and lim

x→af(x) =M . Then L =M .

Property 2) Linearity.Suppose α, β ∈ R and lim

x→af(x) = L and lim

x→ag(x) =M. Then lim

x→a(αf(x) + βg(x)) = αL+ βM .

Property 3) Composite.Suppose we have 3 normed vector spaces V,W,Z and functions f : S → T, g : T → Z, with the usual assumptions on

the sets S ⊂ V and T ⊂W as well as the points a and L. If limx→a

f(x) = L and limy→L

g(y) =M , then limx→a

g (f(x)) =M.

Property 4) Inequalities.Suppose f, g : S → R (with the absolute value making R a normed vector space) and f(x) ≤ g(x) for all x ∈ S. If

limx→a

f(x) = L and limx→a

g(x) =M, then L ≤M.Property 5) Independence of Which Equivalent Norm is Used.Using equivalent norms on either V or W leads to the same definition of lim

x→af(x) = L.

Some Proofs.Proof. Property 1) First we see that

‖L−M‖ = ‖L− f(x) + f(x)−M‖ ≤ ‖L− f(x)‖+ ‖f(x)−M‖ .Then, for any ε > 0, ∃ δ1 such that 0 < ‖x− a‖ < δ1 implies ‖L− f(x)‖ < ε

2 . And ∃δ2 such that 0 < ‖x− a‖ < δ2implies ‖f(x)−M‖ < ε

2 . Thus taking δ = min{δ1, δ2} we see that 0 < ‖x− a‖ < δ implies‖L−M‖ ≤ ‖L− f(x)‖+ ‖f(x)−M‖ < ε.

Since ε was arbitrary this means ‖L−M‖ = 0 by a result from Lectures I, Section 11. By the first axiom of norms, thisimplies L−M = 0.

Proof. Property 2) First we note that

‖(αf(x) + βg(x))− (αL+ βM)‖ = ‖αf(x)− αL+ βg(x)− βM‖≤ ‖αf(x)− αL‖+ ‖βg(x)− βM‖= |α| ‖f(x)− L‖+ |β| ‖g(x)−M‖ .

Now ∃ δ1 so that 0 < ‖x− a‖ < δ1 implies ‖f(x)− L‖ < ε2(1+|α|) . And ∃ δ2 so that 0 < ‖x− a‖ < δ2 implies

‖g(x)−M‖ < ε2(1+|β|) . Then let δ = min{δ1, δ2} so that 0 < ‖x− a‖ < δ implies

‖(αf(x) + βg(x))− (αL+ βM)‖ ≤ |α| ‖f(x)− L‖+ |β| ‖g(x)−M‖<

ε| |α|2 (1 + |α|) +

ε |β|2 (1 + |β|) < ε.

Proof. Property 3) ∀ε ∃δ1 such that 0 < ‖y − L‖ < δ1 implies ‖g(y)−M‖ < ε.

20

Page 144: Introduction, Motivation, Basics About Sets, Functions, Counting

And ∃δ2 such that 0 < ‖x− a‖ < δ2 implies ‖f(x)− L‖ < δ1.Then 0 < ‖x− a‖ < δ2 implies, setting y = f(x), ‖g(y)−M‖ < ε since 0 < ‖y − L‖ < δ1.

Figure 12: Picture of the proof of property 4 of limits. Here we assume h(x) ≥ 0 and limx→a

h(x) = K < 0. The h(x) values

are red circles to the right of 0. The limiting value K is a blue star to the left of 0. No way can this happen since the redcircles can never get close to the blue star.

Proof. Property 4) Define h(x) = g(x) − f(x). Then h(x) ≥ 0 for all x ∈ S. Let K = M − L. If K < 0, we can geta contradiction. We know from Property 1 of Limits that K = M − L = lim

x→ah(x). See Figure 12. The red circles are the

values of h(x) to the right of 0 and the blue star is K, to the left of 0. Take ε = |K|2 . Then ∃δ such that 0 < ‖x− a‖ < δ

implies |h(x)−K| < |K|2 .

Proof. Then, since K is negative, h(x) −K < −K2 and, adding K to both sides, h(x) < K

2 < 0, a contradiction toh(x) ≥ 0.Exercise. Prove Property 5) of Limits.We won’t discuss limits of products in general. You can refer to Lang, Undergraduate Analysis, p.162-3 for the general

story of limits of products. You get the joy of considering a special case in an exercise. There are lots of other cases one couldlook at; e.g., scalar valued function times vector valued function, matrix valued function times matrix valued function,.....If you hate εδ stuff, you will love the following theorem, which allows you to think about limits of sequences instead.

Theorem 26 Sequential Definition of Limits. Suppose S ⊂ V and f : S → W, where V and W are normed vectorspaces. Assume that a is an accumulation point of the set S. Then the existence of lim

x→af(x) = L is equivalent to saying

that for every sequence of vectors {xn} in S such that limn→∞xn = a, we have lim

n→∞f(xn) = L exists.

Proof. =⇒ We leave this part as an exercise.⇐= Proof by Contradiction. Suppose ∀ {xn} in S s.t. lim

n→∞xn = a, we have limn→∞f(xn) = L. If, by contradiction,

limx→a

f(x) does not equal L, then (using the rules for negating a statement involving lots of ∀∃), we see that ∃ε > 0 s.t.

∀n ∈ Z+, ∃ xn ∈ S with 0 < ‖xn − a‖ < 1n and ‖f(xn)− L‖ ≥ ε. Since then lim

n→∞xn = a, this is a contradiction to

limn→∞f(xn) = L.Maybe we should try to draw a picture of the definition of limit in higher dimensions. The problem is that it is hard to

draw the graph of a function unless it maps a subset of the plane into the reals. Here the graph of z = f(x, y) is 3-dimensional.Just plot the points (x, y, f(x, y)) in 3-space. Of course in the infinite dimensional case good luck drawing pictures. Evendrawing the graph of a function from the plane to the plane requires 4 dimensional pictures. You can still project themdown to 2 dimensions as you would in the case of a real valued function of 2 variables. Or you can make a movie of thegraph being rotated.

21

Page 145: Introduction, Motivation, Basics About Sets, Functions, Counting

10 Continuous Functions in Normed Vector Spaces

Suppose that V,W are normed vector spaces and S is a subset of V.

Definition 27 f : S → W is continuous at c ∈ S iff for every ε > 0, there exists δ > 0 (depending on ε) such thatx ∈ S and ‖x− c‖ < δ implies ‖f(x)− f(c)‖ < ε.

We see (assuming that c is an accumulation point of S) that f : U →W is continuous at c ∈ U iff limx→c

f(x) = f(c).

The following theorem allows you to erase εδ from your vocabulary. For a proof, see Apostol, Mathematical Analysis,p. 82.

Theorem 28 Suppose that V,W are normed vector spaces and S is a subset of V. We say that f : S →W is continuousat c ∈ S iff for every open set Y ⊂W, the inverse image f−1(Y ) = {x ∈ V | f(x) ∈ Y } is open in V .

The definition includes some slightly crazy continuous functions. If V =W = R and S = Z, every point of Z is isolatedmeaning that there is a δ > 0 such that B(p, δ) ∩ Z = {p}. Take δ = 1

2 . Thus every function f : Z→ R is continuous.When V = W = R, we view continuity to mean that the graph of y = f(x) does not break up at x = c. When

V = R2, you can think a similar thing about the surface z = f(x, y) in 3-space. But recalling Figure 11 of the functionf(x, y) = y2−x2

y2+x2 , it is even hard to see the break up for a function of 2 variables. It is easier to recall that we saw that f(x, y)has a different value on various lines through the origin. It is 1 on the y-axis and -1 on the x-axis, for example.We can use the properties of limits to deduce the following properties of continuous functions.

Properties of Continuous Functions.Property 1) Linearity. Suppose that f, g : U → W, where U ⊂ V and V,W are normed vector spaces. Let α, β be(real) scalars. Then f and g continuous at c ∈ U implies that (αf + βg) is continuous at c.Property 2) Composition. Suppose that V,W,Z are normed vector spaces with U ⊂ V and T ⊂W. Let c ∈ U. Supposethat f : U → T and g : T → Z. Suppose f is continuous at c and g is continuous at f(c). Then g ◦ f is continuous at c.Property 3) Sequential Definition of Continuity.Assume V,W are normed vector spaces with U ⊂ V. The function f : U → W is continuous at c ∈ U iff ∀ sequence

{vn} of vectors in V such that limn→∞vn = c, we have lim

n→∞f (vn) = f (c) .

For the proofs, you just have to look at the proofs of the corresponding properties of limits. We leave it to you as anexercise.

Examples (the same as those in the section on limits).Example 1) Let V = C[a, b], the space of continuous real-valued functions on the finite interval [a, b]. Define I : V → R

by I(f) =

b∫a

f(x)dx. Suppose we use ‖f‖1 =b∫a

|f(x)| dx on V and the usual absolute value on R. We showed earlier that

the linear function I(f) is actually continuous at f = 0. Now I claim I(f) is continuous on V ; i.e., continuous everywhere.Why? Using properties of the integral on continuous functions that we proved in Lectures I,

|I(f)− I(g)| = |I(f − g)| ≤ I (|f − g|) = ‖f − g‖1 .

This means that given ε > 0, we can take δ = ε and then ‖f − g‖1 < ε implies |I(f)− I(g)| < ε (which is the εδdefinition of continuity at g (or f). In fact, since δ depends only on ε and not on f or g, we have proved that the functionI(f) is uniformly continuous - a concept we are about to define.

Exercise. Is I(f) still continuous when we replace the norm ‖f‖1 on V with ‖f‖2? Explain your answer.

Example 2) Look again at the function f(x, y) = y2−x2y2+x2 , for (x, y) �= (0, 0) in R2. We know that this function cannot be

continuous at (0, 0) since it has no limit as (x, y) → (0, 0). This function is continuous at every other point though. Thereare more such examples in the exercises.

22

Page 146: Introduction, Motivation, Basics About Sets, Functions, Counting

Definition 29 Suppose that V,W are normed vector spaces and U ⊂ V . We say that f : U →W is uniformly continuouson U iff ∀ε > 0 ∃δ > 0 (with δ depending only on ε) such that ∀ u, v ∈ U, ‖u− v‖ < δ implies ‖f(u)− f(v)‖ < ε.

The point in this definition is that δ does not depend on u, v ∈ U.Example 1 just considered is an example of a uniformly continuous function where in fact δ = ε. We get lots more

examples using the following theorem.

Theorem 30 Suppose K is a compact subset of a normed vector space V and W is any normed vector space. A continuousfunction f : K →W must be uniformly continuous on K.

For a proof of the preceding theorem, see p. 198 of Lang, Undergraduate Analysis.

More Examples.Example 3) V=normed vector space. Let f(x) = ‖x‖ . Then f is uniformly continuous on V . For all x, y ∈ V, thetriangle inequality implies (as it did for ordinary absolute value in an early homework problem from part 1),

|‖x‖ − ‖y‖| ≤ ‖x− y‖ .

This says we can take ε = δ again.

Example 4) Suppose that L : Rn → Rm is linear. Then take ej to be the standard basis vector ej =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

0...01...0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠, with a 1

in the jth row and the rest of the entries being 0. Every vector v ∈ Rn, can be written uniquely in the form:

v =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

v1...

vj−1vj...vn

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠=

n∑j=1

vjej .

It follows from linearity of L, that

Lv =n∑j=1

vjLej .

Write Lej =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

a1j...

aj−1,jajj...amj

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠∈ Rm. So we see that

Lv =n∑j=1

vj

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

a1j...

aj−1,jajj...amj

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠= Av,

23

Page 147: Introduction, Motivation, Basics About Sets, Functions, Counting

where we multiply the matrix A whose entries are aij with the vector v. As an exercise, show that the linear function L isuniformly continuous. You can see this by using the infinity norm on Rn and Rm and showing that there is a constant C > 0so that ‖Lx‖∞ ≤ C ‖x‖∞ . The constant C depends on the entries aij of the matrix A. If you take the K = max |aij | ,then C = nK should work.When a normed vector space V is not complete, one can always obtain a larger complete space W containing V. This

is done by taking all the Cauchy sequences {xn} of elements xn ∈ V modulo the equivalence relation {xn} � {yn} iff‖xn − yn‖ → 0 as n→∞. This larger space W is then called the completion of V. See Lang, Undergraduate Analysis formore information on completions.

11 Completeness of C[a, b] with respect to the ∞ Norm.

We promised earlier to prove the following theorem.

Theorem 31 The normed vector space C[a, b] of continuous real valued functions on the finite interval [a, b] is complete withrespect to the norm ‖f‖∞ = max

x∈[a,b]|f(x)| .

Proof. Recall that V "complete" means every Cauchy sequence in V converges to an element of V . So let {fn} be a Cauchysequence in C[a, b] using the norm ‖f‖∞. This means for every x ∈ [a, b], the sequence {fn(x)} of real numbers is Cauchy;as ∀ε ∃Nε s.t.

|fn(x)− fm(x)| ≤ ‖fn − fm‖∞ < ε when n,m ≥ Nε. (1)

We showed in Lectures I that Cauchy sequences of real numbers converge to a limit in R. Thus ∀x ∈ [a, b] there is a functionf(x) = lim

n→∞fn(x). Now we need to show that fn converges uniformly to f on [a, b].

Let ε > 0 be given. There is M =M(x, ε) ≥ Nε so that m ≥M implies |fm(x)− f(x)| < ε. Then for n ≥ Nε we havethe following sneaky formula by adding and subtracting fm(x) and using the triangle inequality:

|f(x)− fn(x)| ≤ |f(x)− fm(x)|+ |fm(x)− fn(x)| < ε+ ‖fn − fm‖∞ < 2ε. (2)

We chose Nε so that formula (1) holds. This implies ‖f − fn‖∞ < 2ε for n ≥ Nε which is uniform convergence of fn to fon [a, b] since Nε does not depend on x.Next we need to show that f is continuous on [a, b]. To see this, note that for x, y ∈ [a, b], using the triangle inequality

in a sneaky way again (this time adding and subtracting fn(x)− fn(y)):|f(x)− f(y)| ≤ |f(x)− fn(x) + fn(x)− fn(y) + fn(y)− f(y)|

≤ |f(x)− fn(x)|+ |fn(x)− fn(y)|+ |fn(y)− f(y)| .We know that for n ≥ Nε the 1st and 3rd terms are < 2ε by formula (2). Since fn is continuous, there is a positive δ,

depending on n, ε and y such that |x−y| < δ implies the middle term is also < ε. So the final result is that |f(x)− f(y)| < 5ε,if |x− y| < δ. Replace ε by ε/5, if you like.

Part II

Series in a Normed Vector Space12 The Basics

In this section I will be more sketchy than usual. Hope that is OK and that you remember some of the series part of calculus.We are mostly interested in series of functions like power series and Fourier series. That is, we are interested in series ina normed vector space like C[a, b] that is infinite dimensional. And we want to know: can we interchange limit and

∑,

derivative and∑, integral and

∑? The answer will depend on the norm used. Before doing the infinite dimensional theory

of series we’d better review the theory of series of real numbers.First define what we mean by convergence of a series in a normed vector space; i.e., convergence of the sequence of partial

sums.

24

Page 148: Introduction, Motivation, Basics About Sets, Functions, Counting

Definition 32 Suppose V is a normed vector space. Let {vn} be a sequence of vectors in V . Then we say the series∞∑n=1

vn

converges to s and write∞∑n=1

vn = s iff s = limn→∞

n∑k=1

vk.

So if∞∑n=1

vn = s, the sequence of partial sums sn =n∑k=1

vk is a Cauchy sequence, which means ∀ε ∃N s.t. ‖sn − sm‖ < ε

if n,m ≥ N. But if we assume n > m ≥ N, then sn − sm =n∑k=1

vk −m∑k=1

vk =n∑

k=m+1

vk. Thus we have

∥∥∥∥∥n∑

k=m+1

vk

∥∥∥∥∥ < ε.Now take n = m+1 ≥ N . Then we see that ‖vn‖ < ε. We have proved the following necessary condition for convergence

of a series.

Necessary but NOT Sufficient Condition for Convergence. The series∞∑n=1

vn converges implies that the terms

approach 0; i.e. limn→∞vn = 0.

This condition is not sufficient for convergence. The integral test will give us many examples (e.g., the harmonic series).

13 Tests for Convergence of Series of Non-Negative Real Numbers.

Test 1. Suppose an ≥ 0 for all n ∈ Z+. Then the series∞∑n=1

an converges iff the partial sums sn =n∑k=1

ak are bounded.

Proof. The partial sums form an increasing sequence. We know from Lectures I that an increasing sequence sn convergesiff it is bounded and then it converges to the least upper bound of the set {s1, s2, s3, ...}.

Test 2. Comparison Test. Suppose∞∑n=1

an and∞∑n=1

bn are series of real numbers and there exists a positive constant

C (independent of n) such that 0 ≤ an ≤ Cbn for all n ≥ N.a) If

∞∑n=1

bn converges, then so does∞∑n=1

an.

b) If∞∑n=1

an diverges, then so does∞∑n=1

bn .

Proof. You can ignore all the an for n < N . They cannot affect the convergence of∞∑n=1

an. Explain why as an exercise.

a) So we will just assume the first N terms an are 0. Then we can show that the partial sumsn∑k=1

ak are bounded and

use Test 1 to complete the proof. For we have

n∑k=1

ak ≤ Cn∑k=1

bk ≤ C∞∑k=1

bk.

b) Exercise.

Test 3. Ratio Test. Suppose that 0 ≤ an.a) If ∃ a positive constant C such that 0 < C < 1 and an+1 ≤ Can, for all n ≥ N, then

∞∑n=1

an converges.

b) If ∃ a positive constant C such that C > 1 and an+1 ≥ Can, for all n ≥ N, then∞∑n=1

an diverges.

25

Page 149: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. a) Compare with the geometric series∞∑n=0

Cn = 11−C . We can, as before, ignore the first N terms of the series.

Then, by induction, we can show that aN+k ≤ CkaN , for k = 1, 2, ..... Again you can do this as an exercise. Thecomparison test will give the conclusion of part a).b) Again compare with the geometric series. Once more, ignore the first N terms and obtain by induction aN+k ≥ CkaN ,

for k = 1, 2, .... The comparison test will then imply the conclusion of part b) since for C > 1,∞∑n=0

Cn diverges. In fact,

the terms Cn do not approach 0.Question: Where is the ratio in the ratio test? It is concealed in the hypothesis: an+1 ≤ Can. Divide by an ifan �= 0 and you have a ratio.Exercise. State and prove the nth root test.

Test 4. The Integral Test.Suppose 0 ≤ f(x) for all x ≥ 1 and that f(x) is monotone decreasing and continuous. Then

∞∑n=1

f(n) converges iff

∞∫1

f(x)dx converges.

Proof. Draw a picture.

Figure 13: a picture proof of the integral test

From Figure 13 you see that

f(n) ≤n∫

n−1f(x)dx ≤ f(n− 1).

Add these inequalities up from n = 2 to N and obtain

N∑n=2

f(n) ≤N∫1

f(x)dx ≤N−1∑n=1

f(n).

We let you finish the argument as an exercise.There is yet one more such test - the nth root test. We leave this one to you to remember or Google.

26

Page 150: Introduction, Motivation, Basics About Sets, Functions, Counting

Example.My favorite function is the Riemann zeta function.

ζ(s) =∞∑n=1

1

ns.

The integral test says that this converges if s > 1 and diverges if s ≤ 1.The case s = 1 is the divergent harmonic series. The cases s = 2n an even integer > 1 were evaluated by Euler as

rational numbers times π2n. Some have wasted much time thinking about the case of s = 3 and the like producing somehorrendous formulas but not anything like Euler’s formula.Riemann wrote a paper about 1850 showing how to make sense of ζ(s) for all complex numbers s except for s = 1. Why

was Riemann interested in this? He was interested in the distribution of prime numbers 2, 3, 5, 7, 11, .... Euler had showedthe product formula that for Re s > 1,

ζ(s) =∏

p=prime

(1− 1

ps

)−1.

Here we define the infinite product as the limit of the partial products. We won’t prove this formula. It relies on thegeometric series and the fundamental theorem of arithmetic (every n ≥ 1 is a product of powers of primes and the productis unique up to order).Thanks to complex analysis, it turns out that the location of the complex numbers s such that ζ(s) = 0 gives information

about the distribution of primes. About 50 years after Riemann’s work Hadamard and de la Vallée Poussin used what wasknown about the location of zeta zeros to show the prime number theorem which says that the number of primes ≤ x isasymptotic to x

log x as x→∞.Riemann stated the Riemann hypothesis saying that the non-real zeros of ζ(s) must have Re s = 1

2 . You win 1 milliondollars from the Clay Math. Institute if you can prove it. However, I think there was a deadline for the proof and thatdeadline has passed. A reference for zeta is Edwards, Riemann’s Zeta Function.

14 Non-Absolute Convergence

Theorem 33 The Alternating Series Test. Suppose 0 ≤ an ≤ an−1 for all n and limn→∞an = 0. Then the series

∞∑n=1

(−1)n−1an converges. This sort of convergence is called conditional convergence if∞∑n=1

an diverges.

Proof. Consider the even and odd partial sums separately.Case 1. The Odd partial sums

s2n−1 = a1 − (a2 − a3)− · · · − (a2n−2 − a2n−1)

s2n+1 = a1 − (a2 − a3)− · · · − (a2n−2 − a2n−1)− (a2n − a2n+1)= s2n−1 − (a2n − a2n+1)≤ s2n−1

This implies (since the terms an are decreasing) that the odd partial sums are decreasing.Case 2. The even partial sums

s2n = (a1 − a2) + (a3 − a4) + · · ·+ (a2n−1 − a2n).

s2n+2 = (a1 − a2) + (a3 − a4) + · · ·+ (a2n−1 − a2n) + (a2n+1 − a2n+2)= s2n + (a2n+1 − a2n+2)≥ s2n.

27

Page 151: Introduction, Motivation, Basics About Sets, Functions, Counting

This says the even partial sums are increasing.We also know the relation between even and odd partial sums, given by

s2n = s2n−1 − a2n ≤ s2n−1So the picture of the partial sums is

s2 ≤ s4 ≤ · · · ≤ s2n ≤ s2n−1 ≤ · · · ≤ s3 ≤ s1.

The even partial sums are an increasing sequence of real numbers which is bounded above and thus they have a limit

L = limn→∞s2n.

using stuff from Lectures I (the existence of the l.u.b.). The odd partial sums are a decreasing sequence bounded below andthus have a limit

M = limn→∞s2n+1.

To see thatL =M = lim

n→∞sn,

use the relation between even and odd partial sums s2n = s2n−1 − a2n and the hypothesis that limn→∞an = 0 to obtain

(using properties of limits):

L = limn→∞s2n = lim

n→∞ (s2n−1 − a2n) = limn→∞s2n−1 − lim

n→∞a2n = limn→∞s2n−1 − 0 = lim

n→∞s2n−1 =M.

Example.∞∑n=1

(−1)n−1 1n converges by the alternating series test. With some effort, using the Taylor series for log(1−x) =∞∑n=1

xn

n , which has radius of convergence 1, it is possible to show that

log 2 =∞∑n=1

(−1)n−1 1n= 1− 1

2+1

3− 14+1

5− 16+1

7− 18+− · · · .

If one rearranges this series to sum one odd then two evens:

s = 1− 12− 14+1

3− 16− 18+1

5− 1

10− 1

12+− · · · ,

with a little algebra, one can show that s= log 22 . That is, we get half the sum of the original series with alternating signs.

Theorem 34 Riemann’s Rearrangement Theorem. You can rearrange a series that converges conditionally to make

it converge to anything or diverge to ∞ or −∞. Rearrangement of the series∞∑n=1

bn means take a 1-1,onto map σ of the

positive integers onto the positive integers and look at the series∞∑n=1

bσ(n).

This moral of this theorem is that conditionally or non-absolutely convergent series are quite nasty. Absolute convergence

of∞∑n=1

vn means∞∑n=1

‖vn‖ converges in R and will be discussed in the next section. For a proof of Riemann’s Rearrangement

Theorem, see Sagan, Advanced Calculus.

28

Page 152: Introduction, Motivation, Basics About Sets, Functions, Counting

15 Absolute Convergence in Normed Vector Spaces

Suppose that V is a complete normed vector space (recall complete means contains limits of all Cauchy sequences whichmust converge to limits in V ).

Theorem 35 Absolute Convergence =⇒ Convergence in a Complete Normed Vector Space.Suppose V is a complete normed vector space and vn ∈ V for all n. Then

∞∑n=1

‖vn‖ converges in R implies∞∑n=1

vn converges to a limit s in V.

This is called absolute convergence of∞∑n=1

vn.

Proof. We will show that the sequence sn =n∑k=1

vk of partial sums is Cauchy and then completeness of V says that it must

converge to a limit s ∈ V . To see this, assume n > m and use the triangle inequality to obtain

‖sn−sm‖ =∥∥∥∥∥

n∑k=m+1

vk

∥∥∥∥∥ ≤n∑

k=m+1

‖vk‖ .

We can chose N to make the last sum < ε for n > m ≥ N by the hypothesis that∞∑n=1

‖vn‖ converges in R.

Example. Suppose X is an m×m real matrix. Define eX = exp(X) =∞∑n=0

1n!X

n. Use the operator norm on X defined

by‖X‖ = max {‖Xv‖ | v ∈ Rm, ‖v‖ = 1} .

Here ‖v‖ denotes any of our favorite norms on Rm. We know we can say max rather than l.u.b. or sup by Theorem 2.2,p. 197, of Lang, Undergraduate Analysis, because the unit sphere in Rm is compact and the function on Rm producedby taking vector v to Xv is continuous linear. More about matrix norms can be found in Strang, Linear Algebra and itsApplications, Chapter 7 or Horn and Johnson, Matrix Analysis, Chapter 5.Then one can show that

‖XY ‖ ≤ ‖X‖ ‖Y ‖ .

Since∞∑n=0

1n! ‖X‖n always converges, we know that the matrix exponential series always converges.

Theorem 36 If∞∑n=1

vn converges absolutely then any rearrangement converges to the same limit.

Proof. See Lang, Undergraduate Analysis, p. 227.

For the following convergence test recall that C[a, b] is the space of continuous functions on a finite closed interval [a, b]and that this space is complete using the norm ‖f‖∞ = max

X∈[A,B]|f(x)| . Recall that we needed this next test in the final

exam from Lectures I when we looked at the fractal nature of the Weierstrass nowhere differentiable continuous function.This test gave the continuity.

Theorem 37 Weierstrass M-Test.

Theorem 38 Let fn ∈ C[a, b]. Suppose ‖fn‖∞ ≤ Mn for all n = 1, 2, 3, ... and suppose that∞∑n=1

Mn converges in R.

Then∞∑n=1

fn converges in the ‖‖∞ norm (i.e., uniformly) to a continuous function f ∈ C[a, b].

29

Page 153: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. We make our usual argument to see that the sequence of partial sums of∞∑n=1

fn is Cauchy with respect to the ‖‖∞

norm. Then completeness of C[a, b] completes the proof that∞∑n=1

fn converges to a continuous function f .

That is, the nth partial sum sn =n∑k=1

fk implies sn− sm =n∑k=1

fk −m∑k=1

fk =n∑

k=m+1

fk. By the triangle inequality and

the hypothesis of the Weierstrass M-Test, ∀ε > 0,∃N s.t. n > m ≥ N implies∥∥∥∥∥n∑

k=m+1

fk

∥∥∥∥∥ ≤n∑

k=m+1

‖fk‖ ≤n∑

k=k+1

Mk < ε.

It follows that the sequence sn is Cauchy and the series∞∑n=1

fn converges uniformly as we have proved that C[a, b] is

complete in the ‖‖∞−norm.

Example 1.∞∑n=1

sin(nx)n2 converges uniformly and absolutely on any interval [a, b].

Here use the M-Test with Mn =1n2 . The series

∞∑n=1

Mn =∞∑n=1

1n2 converges by the integral test.

If we replace 1n2 with 1

n , the question of convergence is more delicate. We will think about it when we look at Fourierseries later.

Example 2. Let 0 < c < 1. Then∞∑n=0

xn converges uniformly and absolutely to 11−x for all x ∈ [−c, c].

Here use the M-Test with Mn = cn. The series

∞∑n=0

Mn =∞∑n=0

cn = 11−c (using the geometric series).

Example 3. Integral Operators on C[0, 1] = V.Let ‖f‖ denote any of our 3 favorite norms on C[0, 1]. Suppose that K(x, y) is a continuous function of 2 variables for

x, y ∈ [0, 1]. As an example, consider sin(xy). Define the integral operator L with kernel K to be

Lf(x) =

1∫0

K(x, y)f(y)dy.

Then L : V → V is linear and can be shown to be continuous. We can define a norm on the space L(V, V ) of suchoperators L to be ‖L‖ = lub {‖Lf‖ | f ∈ V, ‖f‖ = 1} . This is called the operator norm.

One has an operator-valued geometric series for λ ∈ R:∞∑n=0

λnLn = (I − λL)−1 . One can show, using propertiesof the operator norm that this series converges when ‖λL‖ = |λ| ‖L‖ < 1. This fact is quite useful when considering thespectral theory of integral and differential operators. See Courant and Hilbert, Methods of Mathematical Physics, Vol. Iand Stakgold, Green’s Functions and Boundary Value Problems for more information

Part III

Power Series16 Radius of Convergence

Here we consider convergence and differentiation and integration of power series∞∑n=0

an(x−p)n. I will mostly assume p = 0for simplicity. We consider x ∈ R mostly. The same arguments should work for complex numbers and matrices, with a

30

Page 154: Introduction, Motivation, Basics About Sets, Functions, Counting

little thought.

Theorem 39 a) Suppose {an} is a sequence of real numbers and∞∑n=0

anxn converges for some x �= 0. Then

∞∑n=0

anun

converges absolutely for all u such that |u| < |x| .b) If

∞∑n=0

anxn diverges, then

∞∑n=0

anun diverges if |u| > |x| .

Proof. a)∞∑n=0

anxn converges implies lim

n→∞anxn = 0. Since convergent implies bounded, we have |anxn| ≤ M for all n.

This implies if |u| < |x| , we can apply the comparison test by writing

|anun| =∣∣∣∣anxnunxn

∣∣∣∣ ≤M ∣∣∣∣unxn∣∣∣∣ .

So we can compare the series∞∑n=0

|anun| with the convergent geometric series∞∑n=0

∣∣unxn

∣∣.We leave the proof of b) as an exercise. Hint: Use part a).We are calling the following positive number the radius of convergence because we (the definers) are really thinking of

power series in the complex plane where we actually get an open circle of radius R where the power series converges. Onthe real line where this course mostly lives, we only get an interval.

Definition 40 Now define the radius of convergence R of∞∑n=0

anxn

R = l.u.b.

{r

∣∣∣∣∣ r ≥ 0,∞∑n=0

|an| rn converges

}.

I am using the convention here that R (blackboard bold face font) is the set of real numbers and should not be confusedwith ordinary capital R. This convention comes to us from Nicolas Bourbaki (the unreal mathematician).

Corollary 41 Let S denote the set of real numbers x such that∞∑n=0

anxn converges. Then S must be of the following forms:

{0}, (−R,R), (−R,R], [−R,R), [−R,R], or (−∞,∞) = R.

Theorem 42 Formulas for the Radius of Convergence R of a Power Series∞∑n=0

anxn.

We assume an �= 0, ∀n.Formula 1) Assume the limit lim

n→∞

∣∣∣ anan+1

∣∣∣ exists. Then it is the radius of convergence R for∞∑n=0

anxn. That is

R = limn→∞

∣∣∣∣ anan+1∣∣∣∣ .

Formula 2) Again, assuming the limit involved exists, we have the formula:

1

R= lim

n→∞ |an|1n .

31

Page 155: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. Formula 1) Use the ratio test for absolute convergence of∞∑n=0

anxn. Setting lim

n→∞

∣∣∣ anan+1

∣∣∣ = c, we see that the ratioof the terms in the power series approaches a limit:∣∣∣∣an+1xn+1anxn

∣∣∣∣ = ∣∣∣∣an+1anx

∣∣∣∣→ |x|c, as n→∞

The ratio test proved above says the series converges if |x| < c. and diverges if |x| > c. Why? (Exercise). So c = R,the radius of convergence. This comes from the definition of radius of convergence as a least upper bound. If |x| > R, theseries diverges and if |x| < R, the series converges. That is what we needed to show.Formula 2) We leave this to the reader as an Exercise.

Example 1). Our friend the geometric series∞∑n=0

xn = 11−x . The ratio of convergence is 1 by any of the formulas since

an = 1, for all n.

Example 2). Our best friend the exponential series∞∑n=0

1n!x

n = ex. Use the first formula for the radius of convergence.

Obtain

R = limn→∞

∣∣∣∣ anan+1∣∣∣∣ = R = lim

n→∞

∣∣∣∣∣ 1n!1

(n+1)!

∣∣∣∣∣ = limn→∞

∣∣∣∣ (n+ 1)!n!

∣∣∣∣ limn→∞

∣∣∣∣ (n+ 1)1

∣∣∣∣ =∞.This means the series converges everywhere.The nice thing about this example is that the exponential series also converges for all complex numbers and for all n× n

matrices. This gives a nice way to get the power series for sine and cosine out of Euler’s formula

eiθ = cos θ + i sin θ.

Example 3). Here’s a bad one:∞∑n=0

n!xn.

The radius of convergence is the reciprocal of that for Example 2. Thus you get R = 0. This series only converges atx = 0.Exercise. Replace R in Examples 1 and 2 with the space Rn×n of n× n real matrices. Obtain the matrix geometric seriesand the matrix exponential. Where do they converge? Use the definition ‖L‖ = lub {‖Lv‖ | v ∈ Rn, ‖v‖ = 1} to get anorm on the space of matrices. See Apostol, Calculus, Vol. II for many applications of the matrix exponential to differentialequations.

17 Differentiation and Integration of Power Series

We want to legally interchange

b∫a

and limit (or∞∑n=0

) so that we can integrate a power series term-by-term at least if we

stay inside the radius of convergence. Similarly we want to legally interchange ddx and limit (or

∞∑n=0

),

Example.Integrate the geometric series to get the power series for log(1 − x). This is legal if |x| < 1 using the corollary of the

following theorem.

− log(1− x) =x∫0

1

1− tdt =x∫0

∞∑n=0

tndt =

∞∑n=0

x∫0

tndt =

∞∑n=0

tn+1

n+ 1

∣∣∣∣x0

=

∞∑n=0

xn+1

n+ 1.

Note that the formula makes sense if x = −1, although the following theorem and corollary do not justify the equality at −1.

32

Page 156: Introduction, Motivation, Basics About Sets, Functions, Counting

Theorem 43 (Interchange of limit and integral). Suppose that fn : [a, b] → R is continuous ∀n and convergesuniformly on [a, b] to f . Then

limn→∞

b∫a

fn =

b∫a

limn→∞fn =

b∫a

f.

Proof. We already know that f must be continuous on [a, b]. So we can integrate it. See Lectures, I. Using properties ofthe integral from Lectures I, we find that∣∣∣∣∣∣

b∫a

f −b∫a

fn

∣∣∣∣∣∣ =∣∣∣∣∣∣b∫a

(f − fn)∣∣∣∣∣∣ ≤

b∫a

|f − fn| ≤b∫a

‖f − fn‖∞ = ‖f − fn‖∞ (b− a).

It follows that given ε > 0, if we take N so that n ≥ N, implies ‖f − fn‖∞ < εb−a (as we can by uniform convergence),

then we have ∣∣∣∣∣∣b∫a

f −b∫a

fn

∣∣∣∣∣∣ < ε.As a corollary, we see that it is legal to integrate a series of functions term-by-term on a finite interval if the series

converges uniformly on that interval.

Corollary 44 Suppose gn : [a, b] → R is continuous ∀n and∞∑n=0

gn(x) converges uniformly on [a, b] to s(x). Then s(x) is

continuous on [a, b] and∞∑n=0

b∫a

gn =

b∫a

∞∑n=0

gn =

b∫a

s.

Proof. Exercise.The proof of the last theorem was essentially one line. But differentiation requires more effort, as well as more hypotheses.

Why should that be?Example. Consider fn(x) =

sin(nx)√n. This sequence converges uniformly to 0 on [−π, π]. Why? However,

f′n(x) =

√n cos(nx).

This derivative has problems converging at all, much less to 0. For example, it diverges at x = 0. In fact, it can beshown to diverge everywhere. Suppose x �= 0. You just need to see that if x �= 0, ∃N such that |cos (Nx)| < 1

2 and thus

|cos(2Nx)| = ∣∣2 cos2 (Nx)− 1∣∣ = 1− 2 cos2 (Nx) > 1

2.

It follows that there is a subsequence satisfying∣∣√nk cos(nkx)∣∣ > 1

2

√nk and thus diverging.

Exercise. Find an example to show that we need uniform convergence of fn to f in the hypothesis of the precedingtheorem. Pointwise convergence at each x in [a, b] does not suffice for the interchange of integral and limit.Hint. Look at fn(x) = n2x(1− x)n, on [0, 1].

Theorem 45 (Interchange of Derivative and Limit). Suppose {fn} is a sequence of continuously differentiablefunctions on [a, b]. Suppose, also that the sequence of derivatives {f ′n} converges uniformly to g on [a, b]. And, finallyassume there exists one point x0 ∈ [a, b] such that fn(x0) converges.

Then

g(x) = limn→∞

d

dxfn(x) =

d

dxlimn→∞fn(x).

and fn(x) converges uniformly to f(x), with f ′(x) = g(x), for all x ∈ [a, b].

33

Page 157: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. By the fundamental theorem of calculus from Lectures, I, ∀n, ∃cn s.t.

fn(x) =

x∫a

f ′n + cn. (3)

Here cn = fn(a).Let x = x0 and take the limit of formula (3) as n→∞. This says that ∃c = lim

n→∞cn.

Next take the limit of formula (3) for general x ∈ [a, b] as n → ∞ using the preceding Theorem (which we can sincethe sequence of derivatives converges uniformly). This says, the sequence {fn} converges pointwise (i.e., for each fixed x in[a, b]) to

f(x) =

x∫a

g + c.

To finish the proof of this theorem, we just need to see {fn} converges uniformly to f . Well, try this, using our favoritetriangle inequality and properties of integrals from Lectures I,

|fn(x)− f(x)| =

∣∣∣∣∣∣x∫a

f′n + cn −

x∫a

g − c∣∣∣∣∣∣

≤∣∣∣∣∣∣x∫a

f′n −

x∫a

g

∣∣∣∣∣∣+ |cn − c| ≤x∫a

∣∣∣f ′n − g∣∣∣+ |cn − c|≤ (b− a)

∥∥∥f ′n − g∥∥∥∞ + |cn − c| .

So if you insist on giving me an ε > 0, I can certainly find an N (depending only on ε and not on x) such that n > Nimplies |fn(x)− f | < ε. That is the meaning of uniform convergence.

Corollary. Suppose gn(x) [a, b] → R is continuously differentiable, ∀n and∞∑n=0

g′n(x) converges uniformly on [a, b] to

s(x). And assume there is one point x0 ∈ [a, b] such that∞∑n=0

gn(x0) converges. Then we have∞∑n=0

gn(x) converges to r(x)

uniformly on [a, b] and s(x) = r′(x); i.e.,

∞∑n=0

d

dxgn(x) =

d

dx

∞∑n=0

gn(x).

Proof. Exercise. Deduce this corollary from the preceding theorem.Examples. For our power series, these 2 theorems, allow us to integrate and differentiate term by term on closed subintervalsof (−R,R), where R is the radius of convergence. I find this somewhat surprising, since the power series you get bydifferentiation term-by-term blows up the term by a factor of n:

d

dxanx

n = nanxn−1.

It is also a bit surprising that integration does not produce a series that converges in a larger region since

x∫0

antndt = an

xn+1

n+ 1, for n ≥ 0.

Theorem 46 Suppose f(x) =∞∑n=0

anxn with radius of convergence R. Then for x in (−R,R), we have the following facts.

34

Page 158: Introduction, Motivation, Basics About Sets, Functions, Counting

1)x∫0

f(t)dt =

∞∑n=0

anxn+1

n+ 1.

2)

f ′(x) =∞∑n=0

nanxn−1.

The integrated and differentiated series have the same radius of convergence R.

Proof. 2) Note that if R is its radius of convergence, a power series converges uniformly on closed subintervals of (−R,R).So we just have to convince ourselves that the extra factor of n in nan does not affect the radius of convergence.

To see this, note that if 0 < |x| < c < R, we know by the definition of radius of convergence R on the first page of thispart of the lectures, that

∞∑n=0

|an| cn < ∞. Thus limn→∞anc

n = 0. It follows that there is a bound M such that |ancn| ≤ M ,for all n.Therefore we have:

n |an| |x|n−1 = n

c

( |x|c

)n−1|an| cn ≤Mn

c

( |x|c

)n−1.

So∞∑n=0

n |an| |x|n−1 can be compared with∞∑n=0

n(|x|c

)n−1. This series converges by the ratio test as the ratios are

(n+ 1)(|x|c

)nn(|x|c

)n−1 =n+ 1

n

|x|c→ |x|

c< 1, as n→∞.

1) is left to the reader for Exercise.

Examples. The power series for ex, sin(x), cos(x) converge absolutely and uniformly on closed and bounded sets in thereal line. So they are differentiable and integrable term by term. The same holds if we replace R with C or n× n matrices.

18 Taylor Series

In Lectures, I, we proved Taylor’s formula with remainder:

f(x) =n−1∑k=0

f (k)(a)

k!(x− a)k +Rn.

We had 2 formulas for the remainder. The easiest to remember is Rn =f(k)(c)k! (x−a)k, for some c between a and x. If

one can show that limn→∞Rn = 0, then the function f(x) is represented by its Taylor series within the radius of convergence.

That is, for |x− a| < R,f(x) =

∞∑k=0

f (k)(a)

k!(x− a)k.

See Wikipedia (Taylor series article) for animated pictures showing the convergence of the Taylor series to ex. We provedlast quarter that our favorite functions: ex, sin(x), cos(x), log(1 − x) are represented by their Taylor series within theradius of convergence interval. Of course we cheated and defined ex by its Taylor series. So we have:

ex =∞∑n=0

1

n!xn, for all x ∈ R;

35

Page 159: Introduction, Motivation, Basics About Sets, Functions, Counting

log(1− x) = −∞∑n=0

1

nxn, for |x| < 1.

Figure 14 shows the sum of the first 7 terms of the Taylor series for ex, namely f(x) = 1+x+ x2

2 +x3

6 +x4

24 +x5

5∗24 +x6

30∗24in red and ex in blue on the interval [−5, 5].

Figure 14: Plot of ex blue and 1st 7 terms of its Taylor series about 0 red.

Another example is the Binomial Series:

(1 + x)α =

∞∑n=0

n

)xn, (4)

where we define the generalized binomial coefficient(α

n

)=α(α− 1) · · · (α− n+ 1)

n!.

As usual, define 0! = 1. When α is a positive integer, this is the ordinary binomial theorem. Otherwise you need to restrictx to have |x| < 1. Why?Exercise. Prove formula (4) using Taylor’s formula from Lectures, I. Find the radius of convergence. Why is this the

binomial theorem when α is a positive integer?

We also saw earlier that there are functions f(x) not represented by their Taylor series; e.g., f(0) = 0, and for x �= 0,define f(x) = e

− 1x2 . For this function, one can show that f (n)(0) = 0, for all n. This means the Taylor series for f(x)

around x = 0 is 0 even though f(x) itself is positive except at x = 0. For x �= 0,

e− 1x2 = f(x) �=

∞∑n=0

f (n)(0)

n!xn = 0.

See Figure 15.

36

Page 160: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 15: The plot of f(x) = e−1x2 . This function is not represented by its Taylor series about 0 which is 0 for all x.

37

Page 161: Introduction, Motivation, Basics About Sets, Functions, Counting

Part IV

The Integral Exists!19 Introduction

Recall our Lectures, I, Part 7. We assumed that for any finite interval [a, b] there exists an integral∫ baf of a continuous

function f : [a, b]→ R. And we assumed the integral satisfies the 2 axioms:INT 1. m ≤ f(x) ≤M, ∀x ∈ [a, b] =⇒ m(b− a) ≤ ∫ b

af ≤M(b− a).

INT 2. a < c < b =⇒ ∫ baf =

∫ caf +

∫ bcf.

We deduced all the other basic facts about integrals from these 2 axioms; e.g., the fundamental theorem of calculus,Taylor’s formula.However, we never showed that such an integral

∫ baf exists! I doubt that anyone was too worried. But now we

will finally prove the existence of the integral. We will in fact be able to integrate more general functions than just thosethat are continuous on [a, b].Note: We should also have proved R exists. That is done by forming the completion of Q (the space of Cauchy sequences

{xn} of rationals mod the equivalence relation {xn} � {yn} iff limn→∞ |xn − yn| = 0.

How will we create∫ baf ? It will be somewhat reminiscent of calculus. Recall the Riemann sums, where you partition

the interval [a, b] with points a = a0 < a1 < · · · < an = b. Then you form rectangles the sum of whose areas approximatesthe integral. In Figure 16 we approximate

∫ 10x2dx as in calculus by dividing up the x-axis interval [0, 1] into 5 equal parts.

The blue lines show the tops of the rectangles and also provide the graph of a step function approximating f(x) = x2. Ourapproximation for the integral is

.2 ∗ ((.2)2 + (.4)2 + (.6)2 + (.8)2 + 12) ∼= 0.44

The actual integral is x3/3|10= 1/3 ∼= .333333.

Figure 16: The calculus way of approximating the integral∫ 10x2dx. Divide the x-axis into 5 equal parts and take the value

of the function at the right- (or left-) hand end point of the subinterval for the height of the rectangle.

38

Page 162: Introduction, Motivation, Basics About Sets, Functions, Counting

Our newfangled way of finding an approximation for the same integral∫ 10x2dx is illustrated in Figure 17. We need to

find a step function s(x) such that ‖s− f‖∞ = l.u.b.0≤x≤1

|s(x)− f(x)| is small. To do this, one should 1st divide up the y-axisrather than the x-axis. Although in the end we divide up both axes.

Figure 17: The newfangled way of approximating the integral∫ 10x2dx . We find a step function in blue s(x) such that∥∥x2 − s(x)∥∥ ≤ 1

5 .

For∫ 10x2dx, we divide the y-axis interval (also [0, 1]) into 5 equal subdivisions and go down to the x-axis via the inverse

function√x to our original function x2. Now our approximation for the integral is:

.2 ∗(√.2− 0

)+ .4 ∗

(√.4−

√.2)+ .6 ∗

(√.6−

√.4)+ .8 ∗

(√.8−

√.6)+ 1 ∗

(1−

√.8) ∼= 0.450 26.

Numerically this is not impressive. But it will give us a way of constructing integrals that throws a new light on the theory.We will be able to get the Lebesgue integral simply by changing our norm from ‖‖∞ norm to the ‖‖1−norm.

Our method of integrating x2 requires us to say the integral∫ 10x2dx is the limit as n→∞ of

∫ 10sn(x)dx for a sequence

of step function sn(x) such that limn→∞

∥∥sn(x)− x2∥∥∞ = 0. More generally we write St[a, b]=the space of step functions on

the interval [a, b]. We will be able to integrate functions f on [a, b] that are uniform limits of sequences of step functionssn(x). These functions are in St[a, b] = the closure of the space of step functions with respect to the ‖‖∞ norm. The officialname for St[a, b] is the space of regulated functions. Yes, you can Google it.In the exercises, we give an example of a non-regulated function. This function can’t decide whether it is 1,−1 or 0 on

any small interval containing 0. It does not have a right-hand limit at 0. The function x sin(1/x)|x sin(1/x)| is pictured in figure 18.

This is not quite the function in the exercises as it does not have a value when x sin(1/x) = 0.

20 Continuous Linear Extension Theorem.

We will extend the integral from step functions to piecewise continuous functions and further. To do this, we need thecontinuous linear extension theorem. Suppose that F and E are normed vector spaces. Assume that E is complete; i.e.,all Cauchy sequences in E converge to a limit in E. We will write ‖‖ for the norms on both F and E. In our case E=thereal numbers=R with norm the usual absolute value.Suppose that F0 is a subspace of F ; i.e., a subset which is a vector space using the same operations of + and multiplication

by scalars as in F . In our case, F0 will be St[a, b], the step functions on the finite interval [a, b].Let L : F0 → E be linear; i.e., L(αx+ βy) = αL(x) + βL(y), for all x, y in F0. and α, β ∈ R. In our case L will be the

integral over a finite interval [a, b].

39

Page 163: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 18: Plot of x sin(1/x)|x sin(1/x)| on (0, .2)

We also assume that L is continuous for the norms on E and F0. We will use the ∞−norm on St[a, b]; i.e., ‖f‖∞ =l.u.b.x∈[a,b]

|f(x)| .

Lemma 47 Suppose L : F → E is a linear function, where E,F are normed vector spaces. Then L is continuous ⇔ ∃constant C > 0 such that ‖L(x)‖ ≤ C ‖x‖ , ∀x ∈ F.

Proof. =⇒ L continuous implies it is continuous at x = 0. Thus, taking ε = 1, we see that ∃δ > 0 such that ‖x‖ < δimplies ‖L(x)‖ < 1.Now we use a trick coming from properties of the norm. If x �= 0, (which we may assume without any problem since

that case is pretty clear as L(0) = 0), we have∥∥∥ δx2‖x‖

∥∥∥ = δ2 < δ. Since L is linear, we see that

1 >

∥∥∥∥L( δx

2 ‖x‖)∥∥∥∥ = δ

2 ‖x‖ ‖L(x)‖ .

It follows that ‖L(x)‖ < 2δ ‖x‖ . So our constant C = 2

δ .⇐= Now suppose ‖L(x)‖ ≤ C ‖x‖ , ∀x ∈ F. Using the linearity of L, we see that ‖L(x)− L(y)‖ = ‖L(x− y)‖ ≤C ‖x− y‖ < ε, if ‖x− y‖ < ε

C = δ. It follows that in fact L is uniformly continuous on F .

Lemma 48 If F0 is a subspace of a normed vector space F , then the closure F0 is also a subspace of F .

Proof. Recall the fact from the earlier section which says that the closure F0 consist of points of F which are limits ofsequences of points from F0. So we get this lemma from properties of limits. Let x, y ∈ F0, and {xn} {yn} be sequencesof points from F0 such that lim

n→∞xn = x and limn→∞yn = y. Suppose α, β ∈ R. We need to show that

limn→∞ (αxn + βyn) = αx+ βy.

This is an exercise in the properties of limits. It follows that αx +βy ∈ F0. That means F0 is a vector space.

Theorem 49 Continuous Linear Extension TheoremAssume F=normed vector space, F0 = subspace of F , E = complete normed vector space, L:F0 → E, is continuous

linear. Then L can be extended to a unique continuous linear function L : F0 → E. Here F0 is the closure of F0, as inLemma 48. If C is the constant in Lemma 47 for L, then it also works for L.

40

Page 164: Introduction, Motivation, Basics About Sets, Functions, Counting

Proof. Suppose x ∈ F0. Then there is a sequence of points {xn} from F0 such that limn→∞xn = x.

Claim 1. {L (xn)} is a Cauchy sequence in the complete normed vector space E.Proof of Claim 1.Using the linearity and continuity of L and Lemma 47, we know there is a positive constant C such that

‖L(xn)− L(xm)‖ = ‖L(xn − xm)‖ ≤ C ‖xn − xm‖ < εif n,m are sufficiently large, since {xn} is Cauchy.Q.E.D. claim 1.Since L (xn) ∈ E, a complete normed vector space, we know

∃ limn→∞L (xn) = w ∈ E.

We want to definew = L(x).

Claim 2. w is unique and thus w = L(x) is a legal definition of a function.Proof of Claim 2. Suppose we take another sequence {un} from F0 such that lim

n→∞un = x. Then

limn→∞L (un) = v ∈ E.

We want to show that v = w.Using the triangle inequality and the linearity and continuity of L with the existence of its positive constant C, we see

that

‖v − w‖ ≤ ‖v − L(un)‖+ ‖L(un)− L(xn)‖+ ‖L(xn)− w‖≤ ‖v − L(un)‖+ C ‖xn − un‖+ ‖L(xn)− w‖ .

But each of the 3 terms on the right approaches 0 as n→∞. By the Squeeze Lemma then ‖v − w‖ = 0.Q.E.D. claim 2.Claim 3. The function L(x) is linear.Exercise. Prove this using properties of limits and the linearity of L.

Claim 4. The function L(x) is continuous with the same constant C as for L.Proof of Claim 4.By the definition of L(x) and the continuity of the norm, if lim

n→∞xn = x, for a sequence {xn} from F0, we have∥∥L(x)∥∥ = ∥∥∥ limn→∞L (xn)

∥∥∥ = limn→∞ ‖L (xn)‖ .

We know that ‖L(xn)‖ ≤ C ‖xn‖ ∀n. Since the limit preserves ≤, we see (using the continuity of ‖‖) that∥∥L(x)∥∥ ≤ limn→∞C ‖xn‖ = C ‖x‖ .

Q.E.D. claim 4.This completes our proof of the continuous linear extension theorem - the hardest part of our construction of the integral.

Next we proceed to use the theorem.

Part V

Construction of the Integral Continued21 The Integral of a Step Map on [a,b]

We assume −∞ < a < b <∞. A step map or function is just what it says; i.e., a function whose graph looks like a bunchsteps. More precisely, we define f : [a, b] −→ R to be a step function if we can partition [a, b] by P = {a0 < a1 < · · · < an}

41

Page 165: Introduction, Motivation, Basics About Sets, Functions, Counting

so that f is constant on each subinterval (ai−1, ai), for each i = 1, ..., n:

f(x) = wi, ∀x ∈ (ai−1, ai).We don’t really care what values are assigned at the endpoints of the subintervals. These endpoints correspond to subintervalsof length 0. Even if you assign a value to the function on such subintervals, it will contribute nothing to the Riemann integraldefined below. See Figure 19.

Figure 19: A step function y = f(x).

Definition 50 The Riemann integral of the step function f is defined to be

b∫a

f = I(f,P) =n∑i=1

wi(ai − ai−1).

The sumn∑i=1

wi(ai − ai−1) should look like a Riemann sum. Here wi = f(ci) for any choice of points ci ∈ (ai−1, ai).We define a partition Q = {b0 < b1 < · · · < bm} to be a refinement of the partition P = {a0 < a1 < · · · < an} if the

set of points {a0, a1, · · · , an} is contained in the set of points {b0, b1, · · · , bm}.So, for example, the points P = { i

n

∣∣ i = 0, ..., n} form a partition (regular) of [0, 1]. The set Q = { i2n

∣∣ i = 0, ..., 2n}is a refinement of P.Now we need to worry about the possibility that the integral of a step function might depend on what partition we choose.

Lemma 51 Suppose f is a step function with respect to 2 partitions

P = {a0 < a1 < · · · < an} and Q = {b0 < b1 < · · · < bm}of [a, b]. Then I(f,P) = I(f,Q).Proof. Step 1. The partitions P and Q have a common refinement R whose subinterval endpoints are the union of theendpoints from the partitions P and Q; i.e., the points {a0, a1, · · · , an} ∪ {b0, b1, · · · , bm}. Call the common refinementR = {a = r0 < r1 < · · · < rk = b}.Step 2. We claim that I(f,P) = I(f,R) = I(f,Q).To see this, look at Figure 20.This shows that if we look at any subinterval of partition P, like (ai−1, ai) and further subdivide it by inserting a point

c; ai−1 < c < ai, then since f is the constant wi on the subinterval (ai−1, ai), the contribution to I(f,P) from (ai−1, ai)is

wi(ai − ai−1) = wi(c− ai−1) + wi(ai − c).

42

Page 166: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 20: Inserting a partition point in the ith subinterval of P replaces a term in I(f,P) = ∫ baf by the sum of 2 terms

representing the sums of the areas of the 2 rectangles pictured (if the function is ≥ 0 on the ith subinterval of P anyway)but this does not change the final result.

The right hand side is the sum associated to the 2 subintervals (ai−1, c) and (c, ai) in our new partition refined by adding onepoint to the ith subinterval. Keep doing this for as many points as you want to add to P to get the refinement R. Hopefullythis convinces you that I(f,P) = I(f,R). Similarly, since R is also a refinement of Q, we see that I(f,Q) = I(f,R). Itfollows that I(f,P) = I(f,Q).

Lemma 52 Suppose that St[a, b] denotes the set of all step functions defined on the interval [a, b]. Define the integral

b∫a

f

for f ∈ St[a, b] as in Definition 50. By Lemma 51, the integral is independent of the partition P used to define the stepfunction f . The integral then has the following properties on step functions.

Lemma 53 a) Integrals Preserve ≤ .

f, g ∈ St[a, b] and f(x) ≤ g(x), for all x ∈ [a, b], thenb∫a

f ≤b∫a

g.

b) Integral is Continuous Linear. The map I(f) =

b∫a

f is a continuous linear map from St[a, b] into R using the

‖‖∞ norm on St[a, b] and the usual absolute value on R. In particular,∣∣∣∣∣∣b∫a

f

∣∣∣∣∣∣ ≤ (b− a) ‖f‖∞ .Proof. a) Integrals Preserve ≤ .Given step function f with corresponding partition P and step function g with partition Q, we take a common refinement

R of P and Q. Both f and g are step functions with respect to R. Let R = {a0 < a1 < · · · < an}. Suppose that for eachi = 1, ..., n:

f(x) = wi, ∀x ∈ (ai−1, ai)g(x) = vi, ∀x ∈ (ai−1, ai).

If f(x) ≤ g(x), for all x ∈ [a, b], then wi ≤ vi, ∀i. It follows that

I(f,P) =b∫a

f =n∑i=1

wi(ai − ai−1) ≤ I(g,P) =b∫a

g =n∑i=1

vi(ai − ai−1).

43

Page 167: Introduction, Motivation, Basics About Sets, Functions, Counting

b) Integral is Continuous Linear. Note that if f is a step function with respect to the partition P = {a0 < a1 <· · · < an} of [a, b] and for each i = 1, ..., n:

f(x) = wi, ∀x ∈ (ai−1, ai),then

I(f,P) =b∫a

f =n∑i=1

wi(ai − ai−1).

We have ‖f‖∞ = maxi=1,...,n

|wi| . It follows, using the triangle inequality and the fact that the lengths of the subintervals ofthe partition add up to b− a, that ∣∣∣∣∣∣

b∫a

f

∣∣∣∣∣∣ ≤n∑i=1

|wi| (ai − ai−1) ≤n∑i=1

‖f‖∞ (ai − ai−1)

= ‖f‖∞n∑i=1

(ai − ai−1) = (b− a) ‖f‖∞ .

Continuity at f = 0 follows from this inequality. Once we have proved linearity of the integral, the (uniform) continuitywill also follow since then we can say∣∣∣∣∣∣

b∫a

f −b∫a

g

∣∣∣∣∣∣ =∣∣∣∣∣∣b∫a

(f − g)∣∣∣∣∣∣ ≤

∣∣∣∣∣∣b∫a

|f − g|∣∣∣∣∣∣ ≤ (b− a) ‖f − g‖∞ .

For the last 2 inequalities, we are using the fact that the integral on step functions preserves inequalities. One quickly findsa δ as a function of ε and independent of f and g.

Linearity. Let α, β ∈ R. Given step function f with corresponding partition P and step function g with partition Q, wetake a common refinement R of P and Q. Both f and g are step functions with respect toR. LetR = {a0 < a1 < · · · < an}.Suppose that for each i = 1, ..., n:

f(x) = wi, ∀x ∈ (ai−1, ai)g(x) = vi, ∀x ∈ (ai−1, ai).

Then αf + βg is a step function for partition R with values for each i = 1, ..., n:

(f + g) (x) = αwi + βvi, ∀x ∈ (ai−1, ai).

This shows that St[a, b] is indeed a normed vector space with norm ‖‖∞.It follows from our definition of the integral on step functions that

I(αf + βg,R) =

b∫a

(αf + βg) =n∑i=1

(αwi + βvi) (ai − ai−1)

= α

n∑i=1

wi(ai − ai−1) + βn∑i=1

vi(ai − ai−1)

= α

b∫a

f + β

b∫a

g.

So the integral is linear.

Exercise. Show that it does not matter what (finite) values the function f takes on the partition points.

44

Page 168: Introduction, Motivation, Basics About Sets, Functions, Counting

22 The Integral on Step Functions satisfies the 2 Axioms for Integrals fromLectures I.

Recall from Lectures I that we needed 2 axioms for integrals in order to prove the fundamental theorem of calculus. Nowwe prove those axioms for step functions f ∈ St[a, b].

Axiom 1. If m ≤ f(x) ≤M for all x ∈ [a, b], then m(b− a) ≤b∫a

f ≤M(b− a).

Axiom 2. If a < c < b, then

b∫a

f =

c∫a

f +

b∫c

f.

Proof. Axiom 1 for Step Functions. Since m ≤ f(x) ≤ M for all x ∈ [a, b] and we have proved that the integralpreserves inequalities, we have

m(b− a) =b∫a

m ≤b∫a

f ≤b∫a

M =M(b− a).

Proof. Axiom 2 for Step Functions. For f a step function on [a, b], we can always add a point like c to our partitionP = {a0 < a1 < · · · < an} of [a, b] defining f and assume c = aj for some j. Suppose that for each i = 1, ..., n:

f(x) = wi, ∀x ∈ (ai−1, ai).

Proof. Then according to our definition of the integral of a step function

I(f,P) =

b∫a

f =n∑i=1

wi(ai − ai−1)

=

j∑i=1

wi(ai − ai−1) +n∑

i=j+1

wi(ai − ai−1)

=

c∫a

f +

b∫c

f.

This completes our proofs of the properties of the integral on step functions. In the next section we prove a result whichwill help us with our scheme to integrate continuous functions.

23 Uniform Approximation of Continuous Functions by Step Functions

Theorem 54 f : [a, b]→ R continuous implies f is uniformly continuous on [a, b].

Proof. (by Contradiction)Otherwise ∃ε > 0 s.t. ∀n ∈ Z+ ∃xn, yn ∈ [a, b] with |xn − yn| < 1

n and |f(xn)− f (yn)| ≥ ε.But we showed in Lectures I (using the Spanish Hotel argument) that any bounded sequence of real numbers has a

convergent subsequence. Thus there is an infinite subset J of Z+ such that

∃ limn→∞n ∈ J

xn = x ∈ [a, b]

and∃ limn→∞n ∈ J

yn = y ∈ [a, b].

45

Page 169: Introduction, Motivation, Basics About Sets, Functions, Counting

But then |xn − yn| < 1n implies using the continuity of absolute value:

|x− y| =

∣∣∣∣∣∣∣∣∣ limn→∞n ∈ J

xn − limn→∞n ∈ J

yn

∣∣∣∣∣∣∣∣∣= lim

n→∞n ∈ J

|xn − yn| ≤ limn→∞n ∈ J

1

n= 0,

which means x = y.On the other hand, since |f(xn)− f (yn)| ≥ ε and x = y, we have

0 = |f(x)− f(y)| =

∣∣∣∣∣∣∣∣∣ limn→∞n ∈ J

f(xn)− limn→∞n ∈ J

f(yn)

∣∣∣∣∣∣∣∣∣= lim

n→∞n ∈ J

|f(xn)− f(yn)| ≥ ε.

This contradiction proves the theorem.Now we are ready to uniformly approximate continuous functions by step functions.

Theorem 55 For every continuous function f : [a, b] → R, there exists a sequence of step function sn ∈ St[a, b] such that‖f − sn‖∞ → 0 as n→∞.Proof. Given ε > 0, the preceding theorem says

∃δ > 0 s.t. |x− y| < δ implies |f(x)− f(y)| < ε. (5)

Now let n be so large that b−an < δ and let P = {a0 < a1 < · · · < an} be a partition of [a, b] such that ai−ai−1 = b−an < δ.

Define the step function sn on the subinterval (ai−1, ai) to take any value f(c) for c ∈ (ai−1, ai). Then by formula (5)|f(x)− sn(x)| < ε for all x ∈ (ai−1, ai).

This means ‖f − sn‖∞ < ε for n sufficiently large.

The preceding theorem says that the closure of the space of step functions with respect to the ‖‖∞ norm, written St[a, b],contains all continuous functions on [a, b]. It clearly also contains all piecewise continuous functions (which are alloweda finite number of jump discontinuities).The space St[a, b] is called the space of regulated functions. We saw earlier (and in the exercises) that not all bounded

functions on [a, b] are regulated. A function f(x) must have right and left hand limits at all x ∈ [a, b] to be regulated.

24 Extension of the Integral to the Regulated Functions on [a,b].

We can use the continuous linear extension theorem to extend the integral from St[a, b] to St[a, b]. Every f ∈ St[a, b] is alimit with respect to the ‖‖∞ norm of a sequence sn ∈ St[a, b]. Then the continuous linear extension theorem tells us todefine

b∫a

f = limn→∞

b∫a

sn.

We need to show that this extended integral satisfies the same 2 axioms that the step functions were shown to satisfy.

46

Page 170: Introduction, Motivation, Basics About Sets, Functions, Counting

Theorem 56 Properties of the Integral on the Space St[a, b] of Regulated Functions (which includes all con-tinuous, even piecewise continuous functions on [a, b]).

1)

b∫a

f is a linear map from f ∈ St[a, b] intob∫a

f ∈ R and

∣∣∣∣∣∣b∫a

f

∣∣∣∣∣∣ ≤ (b− a) ‖f‖∞ .

2)

b∫a

f preserves inequalities; i.e., f, g ∈ St[a, b] with f(x) ≤ g(x) ∀x ∈ [a, b] implies

b∫a

f ≤b∫a

g.

3) a < c < b implies

b∫a

f =

c∫a

f +

b∫c

f.

Proof. 1) This is just the continuous linear extension theorem from the last lecture.2) Let h(x) = g(x) − f(x) ∀x ∈ [a, b]. Then h(x) ≥ 0 ∀x ∈ [a, b]. Suppose sn ∈ St[a, b] s.t. lim

n→∞ ‖sn − h‖∞ = 0.

Suppose sn is a step function for the partition P = {a0 < a1 < · · · < am} of [a, b] and for i = 1, ...,m,sn(x) = wi, ∀x ∈ (ai−1, ai).

If wi < 0 for some i, we can replace wi by 0 and make a new step function s∗n which is even closer to h than sn was(because h(x) ≥ 0 for all x). Now we see that

b∫a

g −b∫a

f =

b∫a

h = limn→∞

b∫a

s∗n ≥ 0.

3) Let sn be a sequence of step maps converging to f in the ∞−norm on [a, b]. Then sn must also converge to f inthe ∞−norm on [a, c] and on [c, b]. We showed that for step functions we have:

b∫a

sn =

c∫a

sn +

b∫c

sn.

Take the limit as n→∞ to get property 3 using the basic properties of limits from last quarter.This completes our discussion of the existence of an integral with the properties Int1 and Int 2 from Lectures I. From

this we deduced in Lectures I all the basic properties of integrals such as the fundamental theorem of calculus, integrationby parts, the formula for substituting in an integral, the formulas like

b∫a

xndx =xn+1

n+ 1

∣∣∣∣bx=a

=bn+1

n+ 1− an+1

n+ 1,

when n �= −1 and assuming that if n < −1, 0 is not in the interval [a, b].To extend the fundamental theorem of calculus from continuous functions to piecewise continuous functions or regulated

functions, requires a little more effort. Lang does this in Undergraduate Analysis, Chapter 10 and produces theoremslegalizing differentiation under the integral sign. In the last part of the book he extends the theory to integrals in severalvariables. I leave it to you to read these things.

47

Page 171: Introduction, Motivation, Basics About Sets, Functions, Counting

Part VI

Dirac and WeierstrassOne goal of this part of the lectures is to solve the following problem. Given a continuous function f : [a, b] → R, find apolynomial p such that ‖f − p‖∞ (where ‖g‖∞ = max

a≤x≤b|g(x)|) is arbitrarily small. You might think we had already

solved this problem with Taylor’s formula and series. But Taylor polynomials do not give arbitrarily good approximationsunless the function f has lots of derivatives. Moreover, we know that there are infinitely differentiable functions that are notrepresented by their Taylor series. See Figure 15. Thus we must find a new method to get polynomial approximations. Atthe same time we will create the mechanism to find other sorts of approximations that we will need when we discuss Fourierseries in the last section.

25 A New Kind of Product of Functions

You are familiar with the pointwise product of functions defined by (f · g) (x) = f(x) · g(x). You just take the product ofthe real numbers f(x) and g(x). Thus if we define f(x) = 1 for all x, we get ( f · g) (x) = g(x). So f(x) = 1 is the identityfor pointwise product.Now we want to define a new kind of product, well, not that new if you got to convolution in the Laplace transform

section of ODEs courses. And not so new if you are taking probability or statistics courses where given independent randomvariables with densities f and g, the density of the sum of the random variables is the convolution product f ∗ g. For thisproduct, f ∗ 1 �= f.Our aim is to use convolution in order to uniformly approximate continuous functions f on [a, b] by polynomials

∑Nn=0 anx

n

(or continuous functions of period 1 by trigonometric polynomials∑∞

n=−∞ ane2πinx for Fourier series).

We will assume that our functions are piecewise continuous and that at least one of the functions in f ∗ g vanishes off abounded interval so that we know the integrals exist without thinking too hard about the meaning of

∞∫−∞

= limA→=−∞

⎛⎝ limB→∞

B∫A

⎞⎠ .

Definition 57 The convolution of f and g is defined for all x by

(f ∗ g) (x) =∞∫

−∞f(t)g(x− t)dt.

One can read f ∗g as f splat g since it does splat the properties of the 2 functions together, preserving the best properties.If one function is discontinuous, but the other is a polynomial, the convolution is a polynomial.Example. Let g(x) = 1− x2, for all x. Define

f(x) =

⎧⎪⎪⎨⎪⎪⎩1, if 0 < x < 1−1, if − 1 < x < 00, if x = 00, if x ≥ 1 or x ≤ −1.

See Figures 21 and 22.

48

Page 172: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 21: g(x) = 1− x2

Figure 22: The f(x) =

⎧⎪⎪⎨⎪⎪⎩1, if 0 < x < 1−1, if − 1 < x < 00, if x = 00, if x ≥ 1 or x ≤ −1.

49

Page 173: Introduction, Motivation, Basics About Sets, Functions, Counting

Then, since f has differing formulas on different intervals (and mercifully g does not), we get

(f ∗ g)(x) = −10∫

−1g(x− t)dt+

1∫0

g(x− t)dt

= −0∫

−1

(1− (x− t)2) dt+ 1∫

0

(1− (x− t)2) dt

=−2x33

+(x+ 1)

3

3+(x− 1)3

3= 2x.

Figure 23: f ∗ g, where f is from Figure 21 and g is from Figure 22.

So we see that even though f is discontinuous, when convolved with a polynomial, we get a polynomial.

Theorem 58 Properties of Convolution.Assume f, g, h are piecewise continuous and at least one vanishes off a bounded interval when necessary to make an

integral exist. Let α ∈ R. Then we have the following properties.1) f ∗ g = g ∗ f2) f ∗ (g + h) = f ∗ g + f ∗ h3) (αf) ∗ g = α (f ∗ g)4) f ∗ (g ∗ h) = (f ∗ g) ∗ h5) Suppose that c and d are both positive. If f(x) = 0 when |x| > c. and g(x) = 0 when |x| > d. Then (f ∗g)(x) = 0

when |x| > c+ d.6) If g(x) is a polynomial then (f ∗ g)(x) is a polynomial.7) (f ∗ g)′ = f ∗ (g′) assuming g continuously differentiable.

Proof. 1) Make the change of variables u = x− t. Then t = x− u and du = −dt and the order of integration changes sowe get:

(f ∗ g) (x) =∞∫

−∞f(t)g(x− t)dt = −

−∞∫∞f(x− u)g(u)du =

∞∫−∞

g(u)f(x− u)du = (g ∗ f) (x).

50

Page 174: Introduction, Motivation, Basics About Sets, Functions, Counting

2)-4) are exercises.

5)

∞∫−∞

f(t)g(x− t)dt =c∫

−cf(t)g(x− t)dt.

Now if x > c+ d, and −c ≤ t ≤ c, we see that x− t > c+ d− c = d and g(x− t) = 0. So x > c+ d implies (f ∗ g) = 0.If x < −c − d and −c ≤ t ≤ c, we see that x − t < −c − d + c = −d and g(x − t) = 0. Thus x < −c − d implies(f ∗ g) = 0.6) It suffices using 2) and 3) to consider the case that g(x) = xn. Then supposing f(x) = 0 if |x| > c, we have the

following, using the binomial theorem

(f ∗ g) (x) =

∞∫−∞

f(t)g(x− t)dt =c∫

−cf(t)(x− t)ndt

=

c∫−cf(t)

n∑k=0

(n

k

)xn−k(−t)kdt =

n∑k=0

(n

k

)xn−k

c∫−cf(t)(−t)kdt.

Letc∫

−cf(t)(−t)kdt = ck.

Then

(f ∗ g) (x) =n∑k=0

(n

k

)ckx

n−k,

which is a polynomial.7) This is an exercise using a result legalizing interchange of derivative and integral which can be found in Lang,

Undergraduate Analysis, p. 276, Theorem 7. The hypothesis requires the uniform continuity of ∂∂x (f(t)g(x− t)) .

This concludes our discussion of convolution. It is important in the theory of probability and Fourier analysis, Laplacetransforms. In fact the Fourier and Laplace transforms change convolution into ordinary pointwise product of the transformedfunctions. Convolution is also used to smooth data thanks to property 7). We want to use it to approximate continuousfunctions by polynomials.

26 A Function Which is Not an Ordinary Function

We want to think about something called the Dirac delta "function" denoted δ(x). This is not to be confused with our earlieruse of the Greek letter δ to mean a small positive number. The Dirac delta function is used in physics to represent animpulse. Examples are a point mass and a point charge. It is often said to be a function that is 0 for x �= 0 and ∞ at x = 0.At least that is what I remember as an undergrad taking physics classes for my minor. It used to drive me crazy, since as amathematics major, I learned that such a function could not exists. Laurent Schwartz whose theory of distributions (1944)legalized delta describes how the formulas involving delta drove him crazy in 1935 in his autobiography. See L. Schwartz, AMathematician Grappling with his Century, p. 218. He also says: " ... it’s a good thing that theoretical physicists do notwait for mathematical justification before going ahead with their theories!"The graph usually associated with δ shows a unit arrow or spike at the origin - not a point at infinite height. See Figure

24.Another "definition" of δ is that for any continuous function f(x) it is supposed to give the formula

∞∫−∞

f(t)δ(t)dt = f(0).

We show in the exercises that this formula forces δ(x) = 0 for x �= 0. But if delta were really a function this would force∞∫

−∞f(t)δ(t)dt = 0 and not f(0).

51

Page 175: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 24: Dirac delta

Yet another definition of delta is that it is the identity for convolution. That is just as bad.This disturbed mathematicians mightily in the 1930’s and 1940’s. Engineers and physicists just happily used delta and

even its derivative or infinite sums of deltas. Finally Laurent Schwartz and others legalized delta and its relatives by callingit a distribution or generalized function and developing the calculus of distributions. See my book Harmonic Analysison Symmetric Spaces and Applications, I, for an introduction. Other references are Korevaar, Mathematical Methods,Stakgold, Green’s Functions and Boundary Value Problems, and Schwartz, Math. for the Physical Sciences. The subject ofanalysis on distributions does not seem to appear in undergrad math. courses.So what to do? In this section we consider sequences of functions that have the behavior of delta in the limit as n→∞.

These are called Dirac sequences or approximate identities (for convolution). In fact, Dirac did this himself.

Definition 59 A Dirac sequence (of positive type) Kn(x) is a sequence of functions Kn : R→ R such thatDir 1. Kn(x) ≥ 0, for all x and n.

Dir 2.

∞∫−∞

Kn(x)dx = 1, for all n.

Dir 3. ∀ ε > 0 and ∀ δ > 0, ∃ N ∈ Z+ s.t. n ≥ N implies

−δ∫−∞

Kn(x)dx+

∞∫δ

Kn(x)dx < ε.

The 3 properties say that the area under the curve (which is 1) y = Kn(x) becomes more and more concentrated at theorigin as n→∞. You might think it hard to find such a sequence but we have the following examples.

Example 1. Let K(x) be such that K(x) ≥ 0 for all x and

∞∫−∞

K(x)dx = 1. Then Kn(x) = nK(nx) is a Dirac sequence.

Example 2. The Landau Kernel.This kernel is named for the number theorist Edmund Landau (1877-1938) who was alive at the same time that Dirac

wrote his important article introducing his function (1926-7).Define the Landau kernel to be

Ln(x) =

{(1−t2)n

cn, −1 ≤ t ≤ 1

0 otherwise.where cn =

1∫−1

(1− t2)n dt.

It is possible to use integration by parts to find that

cn =(n!)

222n+1

(2n)! (2n+ 1).

52

Page 176: Introduction, Motivation, Basics About Sets, Functions, Counting

See Courant and Hilbert, Methods of Mathematical Physics, Vol. I, p. 84. Unless you know Stirling’s formula, this is nottoo helpful in proving the Landau kernel gives a Dirac sequence. Lang gives a simple inequality which is all we need. InFigure 25, we plot Ln(t) =

(1− t2)n (2n)!(2n+1)

(n!)222n+1, for t ∈ [−1, 1], when n = 10 in blue; 30 in magenta ; 60 in green; 90

in turquoise, 120 in purple. Figure 25 shows the area under the curve (which is 1) begins to concentrate at the origin as nincreases.

Figure 25: Ln(t) =(1− t2)n (2n)!(2n+1)

(n!)222n+1, for t ∈ [−1, 1], when n = 10 in blue; 30 in magenta ; 60 in green; 90 in turquoise,

120 in purple

Lemma. cn ≥ 2n+1 .

Proof.

cn2=

1∫0

(1− t2)n dt = 1∫

0

(1− t)n (1 + t)n dt ≥1∫0

(1− t)n dt = 1

n+ 1.

It is clear that the Landau kernel has the first 2 properties of a Dirac sequence. To prove the 3rd property, we argueas in Lang, Undergraduate Analysis. Given ε > 0 and 0 < δ < 1, we need to find N so that n ≥ N makes the followingintegral < ε :

1

cn

1∫δ

(1− t2)n dt ≤ n+ 1

2

1∫δ

(1− t2)n dt ≤ n+ 1

2

1∫δ

(1− δ2)n dt = n+ 1

2

(1− δ2)n (1− δ) .

We can make the stuff on the right < ε for large n, since we can show that because 0 < 1− δ2 < 1, limn→∞

n+12

(1− δ2)n = 0.

To see this, you could use l’Hôpital’s rule or just remember the appropriate fact about exponentials (if c < 0, then xecx → 0,as x→∞).

27 A Dirac Sequence Approaches The Dirac Delta

We want to prove that any Dirac sequence Kn behaves like an identity for convolution in the limit as n→∞. Some peoplecall Dirac sequences "approximate identities" for this reason.

53

Page 177: Introduction, Motivation, Basics About Sets, Functions, Counting

Theorem 60 Suppose f is a bounded piecewise continuous function on R. Let I be any finite interval on which f iscontinuous. Define ‖g‖∞ = max

x∈I|g(x)|. Then lim

n→∞ ‖f −Kn ∗ f‖∞ = 0. This says that the sequence Kn ∗ f converges

uniformly to f on the interval I. Here the norm is taken on the interval I and we are assuming that the kernel vanishes offI.

Proof. Using the 1st and 2nd properties of Dirac sequences plus the fact that integrals preserve ≤, we have:

|(Kn ∗ f) (x)− f(x)| =

∣∣∣∣∣∣∞∫

−∞Kn(t)f(x− t)dt− f(x)

∞∫−∞

Kn(t)dt

∣∣∣∣∣∣=

∣∣∣∣∣∣∞∫

−∞Kn(t) (f(x− t)− f(x)) dt

∣∣∣∣∣∣≤

∞∫−∞

Kn(t) |f(x− t)− f(x)| dt.

Since f is uniformly continuous on I by Theorem 54, given ε there is a δ such that |f(x− t)− f(x)| < ε when |t| < δ.Since f is bounded, there is a bound M so that |f(x)| ≤M for all x.Now we can break up the last integral

∞∫−∞

Kn(t) |f(x− t)− f(x)| dt =−δ∫

−∞Kn(t) |f(x− t)− f(x)| dt+

∞∫δ

Kn(t) |f(x− t)− f(x)| dt+δ∫

−δKn(t) |f(x− t)− f(x)| dt.

For the first 2 integrals, use the bound on f and the 3rd property of Dirac sequences to see that, for large enough n, theyare less than 2Mε or even ε, if you prefer.For the last integral, use the uniform continuity of f to see that the integral is

≤ εδ∫

−δKn(t)dt ≤ ε

∞∫−∞

Kn(t)dt = ε.

This completes the proof that |(Kn ∗ f) (x)− f(x)| ≤ (2M + 1) ε. You can replace ε by ε2M+1 if you are paranoid.

Corollary 61 Weierstrass Theorem. Any continuous function f on a finite closed interval [a, b] can be uniformlyapproximated by polynomials.

Proof. We can use the preceding theorem for the interval [0, 1] along with the Landau kernel. See Lang, UndergraduateAnalysis, for a general interval. Suppose f is continuous on [0, 1] and vanishes outside [0, 1]. We still need to see that Ln ∗ fis a polynomial. Suppose x ∈ [0, 1]. Then

∞∫−∞

Ln(x− t)f(t)dt =1∫0

Ln(x− t)f(t)dt.

Also we see that for x, t ∈ [0, 1], we have −1 ≤ x− t ≤ 1 and so Ln(x− t) =(1− (x− t)2

)nand the integral is

1

cn

1∫0

(1− (x− t)2

)nf(t)dt.

This is a polynomial using the same reasoning as in our proof in the 1st section that convolution of f with any polynomialis a polynomial.

54

Page 178: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 26: Gσ(x) = 12πσ e

− x2

2σ2 , where σ = 1/10, 1/20, 1/30, 1/40.

Figure 27: The unit step function U(x) = 1 if x > 0, U(x) = 0, otherwise.

55

Page 179: Introduction, Motivation, Basics About Sets, Functions, Counting

Example. The Gauss Kernel.

One can show (exercise) that the Gauss kernel (or normal density) Gσ(x) =12πσ e

− x2

2σ2 approaches the Dirac deltaas σ → 0. Figure 26 shows the kernel for σ = 1/10, 1/20, 1/30, 1/40.

As an example, let’s approach the Unit Step function in Figure 27.Convolutions of the Gaussians from Figure 26 and the Unit Step function are shown in Figure 28.

Figure 28: Gσ ∗ U, for σ = 110 ,

120 ,

130 ,

140 .

Part VII

Fourier SeriesWe said a bit about Fourier series in the introduction. Fourier’s original paper was published in 1822. The publication wasdelayed by mathematicians such as Lagrange. See the book on Fourier by Grattan-Guinness and Ravetz. It may seem oddthat people worried so much about Fourier expansions when they did not worry about Taylor expansions.It will be slightly easier if we allow complex-valued functions like f(x) = e2πinx, where i =

√−1. This means we need toallow vector spaces with complex scalars.

28 Inner Product Spaces with Complex Scalars

We want to look at vector spaces V with complex (or real) scalars. This means that V has all the axioms for addition andmultiplication by scalars given earlier. Such a vector space is said to have a Hermitian scalar product < v,w > forv, w ∈ V if it satisfies the following axioms.Axioms for a Hermitian Scalar ProductFor all z, w, u ∈ V and α, β ∈ C we have:Axiom P1. < z,w >∈ C and < z,w >= < w, z >. Here α = u+ iv ∈ C has complex conjugate α = u− iv.Axiom P2. < αz + βw, u >= α < z, u > +β < w, u > .Axiom P3. < z, z >≥ 0 and < z, z >= 0 ⇐⇒ z = 0.

Definition 62 As in the case of real scalars, we will say that 2 vectors z, w ∈ V are orthogonal if < z,w >= 0.

56

Page 180: Introduction, Motivation, Basics About Sets, Functions, Counting

Theorem 63 If V is a vector space with complex scalars and < z,w > denotes an Hermitian scalar product on V, then‖z‖ .= √< z, z > gives a norm on V according to our usual axioms for norms on vector spaces.

Proof. We need to check the 3 norm axioms.N1. ‖v‖ ≥ 0 ∀v ∈ V and ‖v‖ = 0 if and only if v = 0.N2. ‖αv‖ = |α| ‖v‖ , ∀v ∈ V and ∀α ∈ R. Here |α| = complex absolute value of α.N3. ‖u+ v‖ ≤ ‖u‖+ ‖v‖ , ∀u, v ∈ V. Triangle Inequality.The first follows from Property 3 of the scalar product. To see the second, note that if α ∈ C

‖αz‖ = √< αz, αz > =√αα ‖z‖ .

We are now done since the complex absolute value of α is√αα.

In order to prove N3, we need the following theorem. The proof will then be an exercise.

Theorem 64 Cauchy-Schwarz Inequality for Complex scalar Product Spaces. Suppose < z,w > is an Hermitianscalar product on the complex vector space V. Then for all z, w ∈ V, we have:

| < z,w > | ≤ ‖z‖ ‖w‖ .Proof. First note that if v, w are non-0 vectors in V , then there is a unique scalar α ∈ C such that v-αw is orthogonal tow. The scalar is α = <v,w>

<w,w> . It is an exercise to check this statement.Next note that we have the Pythagorean Theorem for this general situation. It says that if 2 vectors u, v are

orthogonal, then ‖u+ v‖2 = ‖u‖2 + ‖v‖2 . This is another exercise, requiring you to expand ‖u+ v‖2 =< u + v, u + v >and make use of orthogonality of u and v.To prove Cauchy-Schwarz, write v = v − αw + αw, where α = <v,w>

<w,w> . Then

‖v‖2 = ‖v − αw‖2 + ‖αw‖2 ≥ ‖αw‖2 .This means

‖v‖2 ≥ |α|2 ‖w‖2 = < v,w >2

‖w‖4 ‖w‖2 = < v,w >2

‖w‖2 .

The Theorem follows immediately upon multiplying by ‖w‖2.Example 1. Let C2 =

{(z1z2

)∣∣∣∣ zj ∈ C} . Define the sum of 2 vectors z, w ∈ C2 by(z1z2

)+

(w1w2

)=

(z1 + w1z2 + w2

)and the product with α ∈ C by

α

(z1z2

)=

(αz1αz2

).

The Hermitian scalar product on C2 is defined by⟨(z1z2

),

(w1w2

)⟩= z1w1 + z2w2.

We leave it as an exercise for you to check that this satisfies the 3 axioms for an Hermitian scalar product.Example 2. Let V = C[0, 1] = {f : [0, 1]→ C} . If f, g ∈ V and α ∈ C, define (f + g)(x) = f(x) + g(x) and

(αf) (x) = αf(x), for all x ∈ [0, 1]. Then define the Hermitian scalar product by

< f, g >=

1∫0

f(x)g(x)dx.

It is easy to integrate a complex valued function h(x) = u(x) + iv(x), with u(x) and v(x) real valued. The definition is

b∫a

h =

b∫a

(u+ iv) =

b∫a

u+ i

b∫a

v.

It is an exercise to check the axioms for a Hermitian scalar product.

57

Page 181: Introduction, Motivation, Basics About Sets, Functions, Counting

29 Complex Exponentials

Our Fourier series will involve complex exponentials because it would be twice as much work to write series of sines andcosines. We can define for z ∈ C, the complex exponential by the same old Taylor series, except that now all the termsare complex:

ez =∞∑n=0

zn

n!.

This series converges absolutely and uniformly in the complex plane by the complex version of the Weierstrass M-Test. Thesame proof that worked for real exponentials shows that ez+w = ezew. One also has (ez)w = ezw. And, as before, if y isreal, eiy = cos y + i sin y.Complex conjugation c(z) = z is a continuous function satisfying zw = zw. This means that if p(z) is a polynomial

with real coefficients, then p(z) = p (z) . Furthermore ez = ez. It is an exercise to prove the preceding statements.The complex exponentials en(x) = e2πinx, n = 0,±1,±2,±3, .... form a set of mutually orthogonal functions in C[0, 1] =

{f : [0, 1]→ C} using the scalar product defined in Example 2 of the last section. To see this, suppose m �= n and do theintegral:

< en, em >=

1∫0

e2πinxe2πimxdx =

1∫0

e2πi(n−m)xdx =1

2πi(n−m)e2πi(n−m)x

∣∣∣∣1x=0

= 0.

Here we are assuming a complex version of the fundamental theorem of calculus, but you can check that it works given ourdefinitions of these integrals. It is certainly simpler to do things this way than to show the analogous properties of cos(2πnx)and sin(2πmx), using trigonometric identities. We also need to know that 1 = e2πin, for any integer n.If n = m, we find that

‖en‖2 =< en, en >=1∫0

e2πinxe2πinxdx =

1∫0

1dx = 1.

Our goal is to expand a function f ∈ V = C[0, 1] in a Fourier series

f(x) =∞∑

n=−∞γne

2πinx, with γn =< f, en >=

1∫0

f(t)e−2πnitdt. (6)

Fourier claimed to be able to expand "arbitrary" functions in his series. That will not be possible if we want the series inFormula (6) to converge pointwise or better yet uniformly. For applications we might want even better convergence in thatwe may want to be able to differentiate term-by-term. Generally we will find that the more derivatives f has, the faster theFourier series will converge.If you hate complex numbers and you want to keep everything real, use eix = cosx+ i sinx and write the Fourier series

as

f(x) =a02+

∞∑n=1

(an cos(2πnx) + bn sin(2πnx)) ,

where for n = 1, 2, 3, ....

an =

1∫0

f(t) cos(2πnt)dt and bn =

1∫0

f(t) sin(2πnt)dt. (7)

Exercise. Using Taylor series prove that eiy = cos y + i sin y.Exercise. Show that if an, bn, and cn are defined as in equations (6) and (7), then they are related as follows for

n = 1, 2, 3, 4, ...

cn =1

2(an − ibn) , c−n = 1

2(an + ibn) .

58

Page 182: Introduction, Motivation, Basics About Sets, Functions, Counting

30 Facts about Generalized Fourier Series in Hermitian scalar Product Spaces

We do some infinite dimensional linear algebra here. You should be familiar with the finite dimensional version from calculus.The set-up is that V is a complex vector space with Hermitian scalar product < v,w >. Given an infinite set {v1, v2, v3, . . . }of pairwise orthogonal vectors in V , we want to express an arbitrary vector as an infinite sum

∞∑n=1

γnvn, with scalars γn ∈ C.

Theorem 65 Facts about Generalized Fourier Series in Infinite Dimensional scalar Product SpacesSuppose {v1, v2, v3, . . . } are non-zero elements of V and pairwise orthogonal; i.e. < vn, vm >= 0 if n �= m.Fact 1. Non-0 Orthogonal elements of V are linearly independent.

For any K, if there are scalars γn ∈ C such thatK∑n=1

γnvn = 0, then cn = 0 for all n = 1, 2, ...,K. This says that

{v1, v2, v3, . . . } is a linearly independent set.Fact 2. Formula for the Fourier Coefficients.And suppose that a vector z ∈ V has an expression

z =∞∑n=1

γnvn, with γn ∈ C,

(converging with respect to the norm ‖w‖ = √< v, v >). Then the nth generalized Fourier coefficient is

γj =< z, vj >

< vj , vj >.

Fact 3. Bessel’s Inequality. Assume that the set is orthonormal, meaning that ‖vn‖ = 1, for all n. Then we haveBessel’s inequality

∞∑n=0

|γn|2 ≤ ‖z‖2 .

Fact 4. Parseval’s Equality (or the Plancherel Theorem).Assume that the set is orthonormal. Also assume that {v1, v2, v3, . . . } is complete; that is, any vector z ∈ V has an

expression z =∞∑n=1

γnvn, with γn ∈ C. Then we have Parseval’s equality:

∞∑n=0

|γn|2 = ‖z‖2 .

Proof. Fact 1. IfK∑n=1

γnvn = 0, take the scalar product with vj to get

0 =

⟨K∑n=1

γnvn, vj

⟩=

K∑n=1

γn 〈vn, vj〉 = γj 〈vj , vj〉 .

The 1st equality holds since < 0, w >= 0 for any vector w. The 2nd holds by the linearity of the scalar product in the1st variable. The last equality holds by the pairwise orthogonality of the vectors vn. Since vj �= 0, it follows that γj = 0.Fact 2. Look at the scalar product < z, vj >:

< z, vj >=

⟨ ∞∑n=1

γnvn, vj

⟩=

∞∑n=1

γn 〈vn, vj〉 = γj < vj , vj > .

We can move limits such as infinite sums in and out of the scalar product < v,w > because the scalar product is continuousin v holding w fixed (exercise using the Cauchy-Schwarz inequality). Again, we have used the linearity of the scalar productin the 1st variable and the pairwise orthogonality of the vectors vn. Solve for γj to finish the proof.

59

Page 183: Introduction, Motivation, Basics About Sets, Functions, Counting

Fact 4. Look at the scalar product < z, z >:

< z, z >=

⟨ ∞∑n=1

γnvn,

∞∑m=1

γmvm

⟩=

∞∑n=1

∞∑m=1

γnγm 〈vn, vm〉 =∞∑n=0

|γn|2 .

Here we used the continuity and linearity of the scalar product in each variable holding the other variable fixed as well asthe pairwise orthogonality of the vn plus the fact that ‖vn‖ = 1, ∀n.We leave the proof of Fact 3 as an exercise.Example. Let V = C[0, 1] = {f : [0, 1]→ C | f is continuous} . Note that we can identify this space with C[− 1

2 ,12 ] or

C(I) for any other closed interval I of length 1.

Define the scalar product as < f, g >=

1∫0

f(t)g(t)dt. We have an orthonormal set S ={en(t) = e

2πint|n ∈ Z} . Thenorm induced on this space by the scalar product is the L2−norm given by

‖f‖2 =

√√√√√ 1∫0

|f(t)|2 dt.

Thanks to the preceding theorem, once we have proved that S is a complete orthonormal set, then we will know that anyfunction f ∈ C[0, 1] has a Fourier series converging in the L2−norm. That is

f(x) =∞∑

n=−∞γne

2πinx, with γn =< f, en >=

1∫0

f(t)e−2πitdt.

For the applications to partial differential equations that we want to discuss, we will need the Fourier series to convergepointwise at each x, along with all the necessary derivatives. We will need extra hypotheses on f to insure such niceconvergence.

You can also view the kth partial sum of the generalized Fourier series for z in Theorem 65 as giving best approximations to

z in the subspace ofK∑n=1

αnvn, αn ∈ C, spanned by the v′ns. Here "best" means that the mean square error∥∥∥∥∥z −

K∑n=1

αnvn

∥∥∥∥∥is smallest when αj = γj =< z, vj > . Another way to say this is to say that the Fourier coefficients give the bestleast squares fit to z. We leave the proof of this as an exercise for the reader.

31 When do we have a Complete Orthonormal Set of Vectors in our ScalarProduct Space?

First recall our definition from the preceding section.

Definition 66 A complete orthonormal set {vn}n≥1 in an scalar product space V means that {vn}n≥1 is an orthonormalset of vectors such that every vector z ∈ V can be expanded in a generalized Fourier series

z =∞∑n=1

γnvn, where γn =< z, vn > .

Here convergence of the series means convergence with respect to the norm ‖w‖ = √< w,w >.Theorem 67 3 Equivalent Conditions for an Orthogonal Set to be Complete.Suppose {vn}n≥1 is an orthonormal set in an scalar product space V . The following are equivalent:1) {vn}n≥1 is complete.

2) Parseval’s Equality holds:∞∑n=0

|γn|2 = ‖z‖2 if γn =< z, vn > .

3) No non-0 vector w ∈ V is orthogonal to all the vn, for n = 1, 2, 3, .....

60

Page 184: Introduction, Motivation, Basics About Sets, Functions, Counting

Exercise. Prove the preceding theorem. Hint: You need only show 1) =⇒ 2) =⇒ 3) =⇒ 1).In order to prove that our favorite sequence {e2πinx, n ∈ Z} is complete for the scalar product space C[0, 1], we will need

to use the theory of convolution and Dirac sequences studied earlier. But now we need different kernels from the Landaukernel or the Gauss kernel.

Definition 68 The nth partial sum of the Fourier series of f ∈ C[0, 1] is defined to be

sn(x) =n∑

k=−nγke

2πikx, with γk =< f, ek >=

1∫0

f(t)e−2πkitdt.

Note that the nth partial sum is always taken to be the symmetric sum of terms from −n to +n. There are 2n+1 termsin the nth partial sum.

Definition 69 The Dirichlet kernel is defined to be

Dn(x) =n∑

k=−ne2πikx.

Definition 70 The Fejér (or Cesàro) kernel is

Kn(x) =1

n

n−1∑k=0

Dk(x).

Theorem 71 Properties of the Dirichlet and Fejér Kernels.Property 1. Suppose f ∈ C[− 1

2 ,12 ]. Then if we assume that convolution is over the interval [− 1

2 ,12 ] and that both Kn

and f vanish off this interval, we find that the convolution of f and the Dirichlet kernel is the nth partial sum of the Fourierseries of f ; i.e.,

(f ∗Dn) (x) =n∑

k=−ne2πikx

12∫

− 12

f(t)e−2πiktdt = sn.

The convolution of f and the Fejér kernel is the average of the first n partial sums of the Fourier series of f ; i.e.

f ∗Kn =1

n{s0 + s1 + · · ·+ sn−1} .

Property 2.

Dn(x) =sin((2n+ 1)πx)

sin(πx); Kn(x) =

1

n

(sin(πnx)

sin(πx)

)2.

Proof. Property 1.

(f ∗Dn)(x) =12∫

− 12

f(t)n∑

k=−ne2πik(x−t)dt =

n∑k=−n

e2πikx

12∫

− 12

f(t)e−2πiktdt.

We leave the rest as an exercise.Property 2. We can use the formula for the sum of a geometric progression to see that

Dn(x) =n∑

k=−ne2πikx = e−2πinx

n∑k=−n

e2πi(k+n)x = e−2πinx2n∑j=0

e2πijx

= e−2πinxe(2n+1)2πix − 1e2πix − 1 = e−2πinxe−πix

e(2n+1)2πix − 1eπix − e−πix

=e(2n+1)πix − e−(2n+1)πix

eπix − e−πix =sin((2n+ 1)πx)

sin(πx).

61

Page 185: Introduction, Motivation, Basics About Sets, Functions, Counting

We leave it as an exercise to prove the formula for Kn.

A plot of D10(x) =sin((2∗10+1)πx)

sin(πx) is in Figure 29. Plots of Dn(x) =sin((2n+1)πx)

sin(πx) , n = 1, 2, ..., 10 are in Figure 30. A plot

of K10(x) =110

(sin(10πx)sin(πx)

)2is in Figure 31. Plots of Kn(x) =

1n

(sin(πnx)sin(πx)

)2, n = 1, 2, ..., 10, are in Figure 32.

Figure 29: A plot of D10(x) =sin((2∗10+1)πx)

sin(πx) .

Figure 32 may make you believe the following Theorem.

Theorem 72 The Fejér kernels form a Dirac sequence of positive type.

Proof. Using

Kn(x) =1

n

n−1∑k=0

k∑j=−k

e2πijx =1

n

(sin(πnx)

sin(πx)

)2,

we need to check 3 things from Definition 59. These are:Dir 1. Kn(x) ≥ 0, for all x and n.

Dir 2.

∞∫−∞

Kn(x)dx = 1, for all n.

Dir 3. ∀ ε > 0 and ∀ δ > 0, ∃ N ∈ Z+ s.t. n ≥ N implies

−δ∫− 12

Kn(x)dx+

12∫δ

Kn(x)dx < ε.

Dir1 is clear. We leave Dir 2 to the reader as an exercise. That leaves Dir 3. Since Kn(−x) = Kn(x) and we areassuming everything vanishes outside [− 1

2 ,12 ], we just need to note that

1

n

12∫δ

(sin(πnx)

sin(πx)

)2dx ≤ 1

n

12∫δ

1

sin2(πx)dx→ 0, as n→∞.

62

Page 186: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 30: Plots of Dn(x) =sin((2n+1)πx)

sin(πx) , n = 1, 2, ..., 10.

Figure 31: A plot of K10(x) =110

(sin(10πx)sin(πx)

)2.

63

Page 187: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 32: Plots of Kn(x) =1n

(sin(πnx)sin(πx)

)2, n = 1, 2, ..., 10.

Theorem 73 (Fejér’s Theorem, 1904) Suppose that f ∈ C[0, 1] and f(0) = f(1). Then if Kn denotes the Fejér kernel,

f ∗Kn approaches f uniformly on [0, 1] as n→∞.

This means limn→∞ ‖f ∗Kn − f‖∞ = 0. Here, for bounded functions g on [0, 1], we define ‖g‖∞ = lub{|g(x)| |x ∈ [0, 1]}.

One says that the Fourier series of f is Cesàro summable to f where f is continuous.

Proof. This follows from the fact that Kn is a Dirac sequence of positive type by Theorem 60.

It follows from Fejér’s Theorem 73 that if f ∈ C[0, 1] and f(0) = f(1) then f can be uniformly approximated bytrigonometric polynomials of the form

k∑j=−k

γje2πijx, for γj ∈ C.

For

(f ∗Kn)(x) =1

n

n−1∑k=0

k∑j=−k

γje2πijx, where γj =

1∫0

f(x)e−2πijxdx.

Corollary 74 The set {e2πinx|n ∈ Z} is a complete orthonormal set for C[0, 1].

Proof. If w ∈ C[0, 1] is orthogonal to e2πinx for all n, it follows from Fejér’s theorem that w must be 0.Uniform convergence implies L2-convergence. Thus, if f ∈ C[0, 1] satisfies f(0) = f(1), then it can be uniformly approx-

imated by trigonometric polynomials which implies that it can be approximated in L2-norm by trigonometric polynomials.But this means

f(x) =∞∑

n=−∞γne

2πinx, with γn =< f, en >=

1∫0

f(t)e−2πintdt. (8)

Here convergence of the series is with respect to the L2-norm.

64

Page 188: Introduction, Motivation, Basics About Sets, Functions, Counting

Warning: When we say that the convergence of the Fourier series in formula (8) is in the L2-norm, this does not sufficeusually for what we are thinking is happening. We always hope to have uniform convergence, but we know that convergencewith respect to ‖‖∞ is harder to achieve.Maybe we should write some other symbol than = as we have not claimed that the Fourier series converges for every

point x ∈ [0, 1]. That is false.Note that we need not assume that f has period 1 at least if all we want is ‖‖2−norm convergence, since it is possible

to replace any f ∈ C[0, 1] with a g ∈ C[0, 1] such that g(0) = g(1) and such that ‖g − f‖2 is arbitrarily small. This is anexercise.The following Corollaries follow from Theorem 65 and Corollary 74.

Corollary 75 (Parseval’s Equality for Fourier Series) Suppose that f(x) is piecewise continuous on [0, 1]. Then

1∫0

|f(x)|2 dx =∞∑

n=−∞|γn|2 ,

where γn =< f, en >=

1∫0

f(t)e−2πintdt.

Corollary 76 (Parseval’s Equality for scalar Product)Suppose that f and g are piecewise continuous on [0, 1]. Then

1∫0

f(t)g(t)dt =

∞∑n=−∞

αnβn,

if αn =< f, en >=

1∫0

f(t)e−2πintdt and βn =< g, en >=

1∫0

g(t)e−2πintdt.

Exercise. Prove the preceding Corollary.

32 Pointwise Convergence of Fourier Series

For the applications we need pointwise convergence of the Fourier series and even the ability to differentiate the Fourier seriesterm-by-term. When is this legal? To figure such things out, consider the Dirichlet kernel defined in the last section:

Dn(x) =

n∑k=−n

e2πikx =sin((2n+ 1)πx)

sin(πx).

Recall that we showed in the last section that the convolution of f with the Dirichlet kernel is the nth partial sum of theFourier series of f when the period interval has length 1:

f ∗Dn = sn =n∑

k=−ne2πikxγk, where γk =

12∫

− 12

f(t)e−2πiktdt.

However, the Dirichlet kernel is not as nice as the Fejér kernel because it is not positive.Before proving some theorems, let’s look at a few examples.

Example 1. The step function f(x) =

{1, 0 < x ≤ π,−1, −π ≤ x < 0. Make this periodic of period 2π on the real line.

You need to make the change of variables x = 2πy here. This changes the Fourier series and coefficients to

f(x) =∞∑

n=−∞γne

inx, with γn =< f, en >=1

π∫−πf(t)e−intdt.

65

Page 189: Introduction, Motivation, Basics About Sets, Functions, Counting

Here γ0 = 0 and the Fourier coefficient for n �= 0 is

γn =−12π

0∫−πe−intdt+

1

π∫0

e−intdt =1

(−e−int−in

∣∣∣∣0−π+e−int

−in∣∣∣∣π0

)

=i

2πn

(−1 + einπ + e−inπ − 1) .Then for n = 1, 2, 3, ...

γneinx + γ−ne

−inx =4(−1)n sin(nx)

nπ.

Figure 33 shows a plot of the function f(x) at the top left, then it shows the nth partial sums of the Fourier series forn = 10, 50, 100.

Figure 33: The function f(x) ={

1, 0 < x ≤ π,−1, −π ≤ x < 0. , is shown at top left. The nth partial sums of the Fourier series of

f for n = 10, 50, 100 are shown in the next 3 plots. The Gibbs phenomenon is revealed.

This reveals the Gibbs phenomenon which arises because the function has a jump discontinuity at 0. This sort of thingalways happens at a jump discontinuity. It does not help to take more terms of the Fourier series. There will always be anovershoot of about 9%. See Dym and McKean, Fourier Series and Integrals for more details.The phenomenon had been noted before Gibbs by various people. Gibbs explained the phenomenon, replying to Michael-

son (in a letter to Nature in 1898). Michaelson had become angry that his machine for computing the 1st 80 terms in aFourier series gave a result that was not close enough to the function near the jumps.

The next example shows that convergence is very nice when there are no jumps.Example 2. Consider the function f(x) = |x| for |x| ≤ π. Make this periodic of period 2π on the real line.The graphs in Figure 34 show the function plus the 10th, 50th and 100th partial sums of its Fourier series. The

convergence is quite rapid.

Example 3. Let f(x) = x, if x ∈ [0, 1) and make it periodic of period 1 elsewhere. This produces a sawtooth graph.See Figure 35.

66

Page 190: Introduction, Motivation, Basics About Sets, Functions, Counting

Figure 34: Define f(x) = |x| for |x| ≤ π. Make this periodic of period 2π on the real line. The graphs show at top left f(x)on the period interval, then the 10th, 50th and 100th partial sums of its Fourier series.

Figure 35: The sawtooth function - a function of period 1 defined by f(x) = x, for x ∈ [0, 1).

67

Page 191: Introduction, Motivation, Basics About Sets, Functions, Counting

One finds that the Fourier coefficients of the sawtooth function are

γn =

1∫0

xe−2πinxdx ={

12 , n = 0,−12πin , n �= 0.

Then the Parseval identity implies

1

3=

1∫0

x2dx =∞∑

n=−∞|γn|2 =

1

4+

2

4π2

∞∑n=1

1

n2.

Therefore we have Euler’s formulaπ2

6=

∞∑n=1

1

n2= ζ(2).

The 1st step in showing the convergence of Fourier series of differentiable functions is the following Lemma which is alsouseful in quantum mechanics. We could weaken the hypothesis to assume only that the function f is Lebesgue integrableon [0, 1].

Lemma 77 Riemann-Lebesgue Lemma. Suppose that f(x) is a piecewise continuous function on [0, 1]. Then

limn→∞

1∫0

f(x)e2πinxdx = 0.

That is, the nth Fourier coefficient of f approaches 0 as n approaches ∞.

Proof. Parseval’s equality for f says that

1∫0

|f(x)|2 dx =∞∑

n=−∞|γn|2 . Since the integral is finite, the series must converge.

It follows that the terms of the sum approach 0 as n→∞.

Note that the hypothesis of the Riemann-Lebesgue Lemma could be weakened considerably were we to discuss Lebesgueintegrable functions.Now we investigate the speed of convergence of the Fourier series to the function. We show that the more continuous

derivatives f has, the faster the Fourier series converges to f. This sort of theorem goes back to Dirichlet (1829).

Theorem 78 Smoothness of f increases speed of convergence of Fourier series of f .Suppose f : [0, 1] → C, with f(0) = f(1). If the pth derivative of f is continuous for some p ≥ 1, then the nth partial

sum of the Fourier series of f converges uniformly to f ; i.e.,

sn =n∑

k=−ne2πikxγk, where γk =

12∫

− 12

f(t)e−2πiktdt

‖f − sn‖∞ → 0, as n→∞.Moreover there is a positive constant b such that

‖f − sn‖∞ ≤ b

np−12

.

Proof. We will take our interval to be [− 12 ,

12 ]. Since our functions have period 1, this is no problem. It is an exercise to

show that if Dn(x) =n∑

k=−ne2πikx = sin((2n+1)πx)

sin(πx) , is the Dirichlet kernel, then

12∫

− 12

Dn(t)dt = 1.

68

Page 192: Introduction, Motivation, Basics About Sets, Functions, Counting

By the same trick that was used in proving Fejér’s theorem, we have

(sn − f)(x) = (Dn ∗ f − f)(x) =− 12∫

− 12

(f(x− y)− f(x))Dn(y)dy

=

− 12∫

− 12

{f(x− y)− f(x)

sin(πy)

}sin (π (2n+ 1) y) dy.

Next note that the quantity in brackets is continuous at y = 0 (the only place in the interval [− 12 ,

12 ] where something

could go wrong), since f ′(x) exists and is continuous. The reason is that

f(x− y)− f(x)sin(πy)

=f(x− y)− f(x)

y

y

sin(πy),

which approaches −f ′(x) · 1π as y → 0. Then we can apply the Riemann-Lebesgue Lemma to see that (sn − f)(x)→ 0, asn→∞.How do we estimate ‖f − sn‖∞? To do this use the fact that we now know that we have pointwise convergence of the

Fourier series; i.e.,

f(x) =∞∑

n=−∞γne

2πinx, with γn =< f, en >=

1∫0

f(t)e−2πintdt.

If we write γn = f̂(n), then it is an exercise to show that

f̂ (p)(n) = (2πin)pf̂(n). (9)

Now we obtain our estimate as follows making use of the Cauchy-Schwarz inequality for the scalar product space of

sequences {αn}n∈Z , where 〈{αn} , {βn}〉 =∞∑

n=−∞αnβn and ‖{αn}‖ =

∞∑n=−∞

|αn|2 . Cauchy-Schwarz says

|〈{αn} , {βn}〉| ≤ ‖{αn}‖ ‖{βn}‖ .

Now let’s finish the proof. Note that using the triangle inequality and formula (9)

|(s− fn) (x)| =∣∣∣∣∣∣∑|k|>n

f̂(k)e2πikx

∣∣∣∣∣∣ ≤∑|k|>n

∣∣∣f̂(k)∣∣∣ = ∑|k|>n

∣∣∣f̂ ′(k)∣∣∣2π |k| .

Now the Cauchy-Schwarz inequality and Bessel’s inequality say that

|(s− fn) (x)|2 ≤ 1

4π2

∑|k|>n

∣∣∣f̂ ′(k)∣∣∣2 ∑|k|>n

1

|k|2 ≤1

4π2‖f ′‖22

∑|k|>n

1

|k|2 .

We know that∑|k|>n

1|k|2 ≤ 1

n by the proof of the integral test. This completes the proof when p = 1. We leave it as an

exercise to prove the result for general p ≥ 1.

We can, in fact, allow the function f to have a jump discontinuity at a point c. Then, assuming that f has right- andleft- hand derivatives at c, the Fourier series converges to 1

2 (f(c+) + f(c−)) , where f(c+) denotes the right-hand limit and

f(c−), the left-hand limit at c. We saw this in Figure 33. There are actually weaker hypotheses that suffice. See the manyreferences on Fourier series for these things.

69

Page 193: Introduction, Motivation, Basics About Sets, Functions, Counting

1

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

Exercises 1 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ 1) a) Suppose �x�α is a norm on a vector space E and �x�β is a norm on a vector space F. Show that you can make the Cartesian product E×F={(x,y) | x∈E and y∈F} into a vector space. b) Show that the Cartesian product E×F has a norm given by �(x,y)�= �x�α + �y�β . c) Consider the special case that E=F=� with �x�α=�x�β=|x|=the ordinary absolute value of x. Using the definition of norm in b), draw a picture of the region in the plane given by the set of points {(x,y)∈�2| �(x,y)-(1,2)�<3 }. 2) a) Suppose the E is an inner product space with inner product denoted by <x,y> for x,y in E. Then use the norm �v� =(<v,v>)1/2. Prove that �v+w�2+�v-w�2=2�v�2+2�w�2. b) Let E=�2 and draw a picture to explain why the formula in part a) is called the parallelogram law. 3) Suppose that E=C[0,1], the space of continuous real-valued functions on the interval [0,1].

a) Prove that the L1 norm 1

( )b

a

f f x dx= � is a norm.

b) Explain why C[0,1] is an infinite dimensional vector space.

4) Take the vector space E of problem 3. Show that if 1

2

20

( )f f x dx= � , then we have

�f�1 ≤ �f�2. Hint. Use the Cauchy-Schwarz inequality. What happens to the norm �f�2 if we replace the interval [0,1] with the interval [a,b]? 5) Let E = �2 using the usual inner product (i.e., the dot product) and the usual Euclidean norm. Under what conditions on 2 vectors x,y in �2 do we have equality in the Cauchy-Schwarz inequality |<x,y>|≤ �x� �y� ? Prove your answer. 6) Which of the following are norms on the 1-dimensional vector space E=� ? Give reasons for your answers. a) |x| = ordinary absolute value of x in �. b) x2. c) |x|/(1+|x|) d) 2|x| e) �x�=1 if x≠0 and �x�=0 if x=0 f) x3. 7) Which of the following give equivalent norms on C[a,b]? Why?

1

( )b

a

f f x dx= � , 2

2( )

b

a

f f x dx= � ,

8) Define for x=(x1,x2) in �2, the p-norm �x�p=( |x1|p + |x2|p)1/p for any p=1,2,3,4,.... Prove that

{ }1 2lim max , .pp

x x x x∞→∞

= =

[ , ]max ( ) .x a b

f f x∞ ∈

=

Exercises II

Page 194: Introduction, Motivation, Basics About Sets, Functions, Counting

2

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

Exercises 2 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

1) Suppose that nn

n

xv

y �

= � �� �

is a sequence of vectors in �2. Show that if we define limits for sequences of

vectors in �2 using the norm v∞

, then

lim nn

av L

b→∞

�= = � �� �

if and only if lim nnx a

→∞= and lim .nn

y b→∞

=

2) Consider the sequence of functions on [0,1] given by fn(x) = xn, for 0≤x≤1.

a) Show that for each x∈[0,1] we have lim ( ) ( )nnf x f x

→∞= exists (defining f(x) to be this limit). This is

called a pointwise limit. Note that f(x) is only piecewise continuous on [0,1]. b) Show that the limit in part a) is not uniform; i.e., nf f

∞− does not → 0 as n →∞.

c) What happens to 1nf f− and

2nf f− as n →∞ ? Here both integrals are over [0,1].

3) a) Show that the sequence of partial sums of the series 0

(1 )n

n

x x∞

=

−� converges pointwise (i.e., for each x

in [0,1]) but not uniformly on [0,1].

b) Show that the sequence of partial sums of the series 0

( 1) (1 )n n

n

x x∞

=

− −� converges uniformly on [0,1].

4) Consider the sequence of functions defined by fn(x) = 1/(x+n), for n=1,2,3,4, with x∈(0,∞). We have 4

notions of convergence: pointwise for each fixed x∈(0,∞) and convergence with respect to the norms

1 2, , and

∞. Here the integrals are computed on the interval (0, ∞). Under which

definition of convergence does fn converge to the 0-function as n→∞?

5) Define L:�n→�m to be linear iff L(αx+βy) = αLx+βLy for all x,y in �n and α,β in �. Show that L is

uniformly continuous on �n.

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

Exercises 3 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

1) Suppose that <,> is a scalar product on a vector space V and suppose that we have a subset S of V with a vector a adherent to S. Let f and g be functions mapping S into V such that the two following limits exist lim ( )x ax S

f x L→∈

= and lim ( )x ax S

g x M→∈

= . Prove that then lim ( ), ( ) , .x ax S

f x g x L M→∈

=

Page 195: Introduction, Motivation, Basics About Sets, Functions, Counting

3

2) Define C[a,b] to be the space of continuous real valued functions on the finite interval [a,b]. Define the

mapping I:C[a,b]→� by ( ) ( )b

a

I f f x dx= � for f∈C[a,b].

a) Is I continuous when we use the � �∞ norm on C[a,b] and ordinary absolute value as our norm on �?

Why? b) What if we use the � �1 norm on C[a,b]? Why?

3) Consider the following functions on �2

2

2 2

2 , ( , ) (0,0)( , )

0, ( , ) (0,0)

x y x yf x y x y

x y

�≠�= +�

� =

2 2 2

4 4

( ) , ( , ) (0,0)( , )

0, ( , ) (0,0)

y x x yg x y x y

x y

� − ≠�= +�� =

a) Does 0 0 0 0

limlim ( , ) limlim ( , )x y y x

h x y h x y L→ → → →

= = when h=f or h=g ?

b) Do the repeated limits in part a) equal the limit as (x,y)→(0,0) using any of the favorite norms on �2?

That is, does ( , ) (0,0)

lim ( , )x y

h x y L→

= ?

c) Are either of the functions f,g continuous at the origin?

4) Let `2 denote the set of all sequences a={an}n≥1 of real numbers such that 2

1n

na

=� converges.

a) Show that you can define addition and scalar multiplication to make `2 a vector space.

b) Define for a={an}n≥1 and b={bn}n≥1 in `2, <a,b>=1

n nn

a b∞

=� . Show that this series converges and has the

properties of a scalar product. You will need to use the Cauchy-Schwarz inequality on the partial sums.

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ Exercises 4

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ 1) State whether the following series converge and give a reason for your answer.

a) 0

12nn

=� b)

1

1n n

=� c) 2

1

1n n

=� d)

1

( 1)n

n n

=

−� .

2) Suppose that the series 1

nn

a∞

=� converges absolutely. Show that 2

1n

n

a∞

=� also converges absolutely.

3) Test for convergence and explain your answer.

a) 3

0

n

n

n e∞

=� b)

log

2

1log

n

n n

=

�� �� �

� .

Page 196: Introduction, Motivation, Basics About Sets, Functions, Counting

4

4) a) Does the series 1 1

n

n

n xn

= +� converge pointwise for each x in (0,1)? Does this series converge

uniformly on (0,1)?

b) Show that the series 1 3

n

nxn

x∞

=� converges uniformly for x∈ [0,C].

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ Exercises 5

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ 1) Find the radius of convergence of the following power series.

a) 1

n

n

nx∞

=� b)

1

1 n

n

xn

=� c)

1

n n

n

n x∞

=� d)

1

12

nn

n

x∞

=� .

2) What functions f(x) are represented by the power series in parts a), b) and d) of problem 1. Compute

the power series for the derivatives f’(x) of these 3 functions. For what values of x is it legal to differentiate term by term?

3) Find the Taylor series at 0 for the function 1 x− . 4) For an integer n, the Bessel function Jn(x) is defined by the power series

2

0

( 1)( )!( )! 2

k nk

nk

xJ xk n k

+∞

=

− �= � �+ � �� .

Find the radius of convergence of this power series and then show that the Bessel function satisfies Bessel’s equation

2 " ' 2 2( ) ( ) ( ) ( ) 0.n n nx J x xJ x x n J x+ + − = ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

Exercises 6 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

For most of these problems, it should help to draw a picture. 1) Suppose V is a normed vector space with norm denoted by � �. For any subset A of V define the closure of

A to be the set

{ }0, . . .A x V a A s t x aδ δ= ∈ ∀ > ∃ ∈ − <

Show that if A⊂B, then A B⊂ . 2) Using the definition of closure of a set in problem 1, show that

2 2x xx y x y

y y� � � � � �� � � �∈ < = ∈ ≤� � � � � �� � � �� � � � � �

� � .

Page 197: Introduction, Motivation, Basics About Sets, Functions, Counting

5

3) Let f(x)=x2, for all x∈[0,1]. Find a step function s(x) on the interval [0,1] such that � f-s �∞ < 1/10.

4) Define the function f(x) = x sin(1/x), when x≠0, and f(0) = 0. Define g(x) = 1 if x>0, g(x) = -1 if x<0,

and g(0)=0. Show that h(x)=g(f(x)) is not the (uniform) limit of a sequence of step functions on the interval [0,1], using the norm � �∞.

5) Show that for 2 equivalent norms � �α and � �β on a vector space V that the open sets, closed sets and

continuous functions are the same no matter which of the 2 norms you use.

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ Exercises 7

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ 1) a) A Mean Value Theorem. Suppose that f,g:[a,b]→� are continuous on [a,b] and g(x)≥0 for all

x∈[a,b]. Show that there exists c∈[a,b] such that

( )b b

a a

fg f c g=� � .

b) Show that the result of problem 1 need not be true if g(x) can assume both positive and negative values on [a,b].

2) Define a b

b a

f f= −� � . Show that b c b

a a c

f f f= +� � � no matter what the ordering of the points a,b,c is.

There are 6 possible orderings starting with a<c<b, a<c<b, etc.

3) Let g:[a,b]→� be continuous and g(x) ≥ 0 for all x∈[a,b]. Show that if 0b

a

g =�

then g(x)=0 ∀x∈[a,b].

4) Let f:[0,1]→� be defined by f(x)=1 for all x in [0,1]. Define F by

1, 0( ) , 0 1

0, 1.

xF x x x

x

=��= < <�� =

Clearly F’(x)=f(x) for all x∈(0,1). However F(1)-F(0) ≠ 1

0f� .

What went wrong with the fundamental theorem of calculus here?

Page 198: Introduction, Motivation, Basics About Sets, Functions, Counting

6

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

Exercises 8 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ 1) a) Suppose that f(x)=1, for 0≤x≤1 and f(x)=0, otherwise. Compute (f * f)(x). Draw the graphs of f and f * f. b) Suppose that f(x) is as in part a). Define g(x) = 1-x2, for all x. Compute f * g. Draw the graphs of g and f * g. 2) Suppose that α is a real number and f,g,h are piecewise continuous functions such that h vanishes outside

the interval [-c,c]. Show that a) (f+g) * h = f * h + g * h b) (αg) * h = α(g * h). c) Suppose that f,g,h are piecewise continuous functions that vanish outside an interval [-c,c]. Show that (f * g) * h = f * (g * h). 3) (Delta is not a function). Suppose that there is a piecewise continuous function δ such that

( ) ( ) (0)f t t dt fδ∞

−∞

=� for all continuous functions f which vanish outside an interval [-c,c] for some c.

Show that this implies δ(x)=0, for all x≠0 and thus that ( ) ( ) 0.f t t dtδ∞

−∞

=� A similar result could be

proved assuming only that δ is a Lebesgue integrable function on finite intervals. (In fact, δ is not an ordinary function. It is a generalized function or distribution.)

4) Define the Gaussian by 2

41( )4

xt

tG x etπ

= . Show that G1/n, for n=1,2,3,.... is a Dirac sequence.

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀

Exercises 9 ∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀ 1) Compute the Fourier series of the function f(x)=x2 for x in[0,1) and extended to all real numbers to be periodic of period 1. What does the Parseval identity say for this function?

2) Suppose that f is a continuous function on [0,1]. Show that if 1

0

( ) 0nf x x dx =� for all n=0,1,2,3, .... , then f must be identically

0. You will need to remember the Weierstrass theorem on uniform approximation here.

Page 199: Introduction, Motivation, Basics About Sets, Functions, Counting

1

****************************************************************************************************����PRACTICE EXAM 1

**************************************************************************************************** 1) Define and give an example: a) norm; b) scalar product; c) equivalent norms; d) limit of a sequence {vn} in a normed vector space; e) Cauchy sequence in a normed vector space; f) complete normed vector space; g) closure of a set in a normed vector space;

for function f:D→W, with V ⊃ { x | 0<�x-c�<δ } for some δ>0. h) Here V,W are normed vector spaces.

i) continuous function f mapping a set in normed vector space V to another normed vector space; j) uniform convergence of a sequence {fn} of functions fn ∈C[a,b], the space of continuous real-valued functions on an interval [a,b]

k) � f �∞ , � f �1 , � f �2 for f ∈ C[a,b] = {continuous real valued functions on the interval [a,b] } l) the cosine of the angle between 2 non-0 vectors v,w in a vector space V with a scalar product

2) True - False. Tell whether the following statements are true or false. Give a brief reason for your answer. a) C[a,b] the space of continuous real-valued functions on an interval [a,b], is a finite dimensional vector

space.

b) For 1 2 2 21 2

2

,v

v v v vv �

= ∈ = +� �� �

� defines a norm on vectors in the plane.

c) The function 2

2 2( , ) xf x yx y

=+

, for (x,y)≠(0,0) and f(0,0)=0 is continuous on the plane �2.

d) Suppose that fn is a sequence of continuous functions on the interval [a,b]. Suppose for every x in [a,b] we have lim ( ) ( ).nn

f x f x→∞

= Then f(x) is continuous on [a,b].

e) �2 is a complete normed vector space.

3) State and prove the Cauchy-Schwarz inequality. What norm is being used in this inequality? Would it still be true if we replaced that norm with some other one? 4) Suppose that � �α and � �β are equivalent norms on a vector space V. Show that if {vn} is a sequence of vectors in V and L∈V, we have

lim lim .n nn nv L v Lα β→∞ →∞

− ⇔ −

5) Suppose that <v,w> denotes a scalar product of 2 vectors v,w in the vector space V.

Show that <v,w> is a continuous function of v, holding w fixed. Is it uniformly continuous? 6) Suppose that L:�2→�2 is a linear map. Show that it is continuous.

( )limx c

f x L→

=

Page 200: Introduction, Motivation, Basics About Sets, Functions, Counting

2

7) Show that f:U→W , where U is a subset of a normed vector space V and W is a normed vector space, is continuous at a point a in U if and only if for every sequence vn in U such that lim nn

v a→∞

= we have ( )lim ( ).nnf v f a

→∞=

8) The function I mapping C[a,b]=the space of continuous real valued functions on the interval [a,b]

into � defined by ( )b

aI f f= � is continuous with respect to the � �∞ norm on C[a,b] and the usual

absolute value on �.

9) Show that the norms � �∞ and � �1 on the space C[a,b]=the space of continuous real valued functions on the interval [a,b] into � are not equivalent.

10) Explain how the picture below can be used to see the difference between �f-g�∞ and �f-g�1 assuming that f is the purple function which starts at the top and g is the blue function which starts at the bottom. ****************************************************************************************************

Page 201: Introduction, Motivation, Basics About Sets, Functions, Counting

1

****************************************************************************************************����PRACTICE EXAM 2

**************************************************************************************************** 1) Define and give an example:

a) radius of convergence of 0

nn

na x

=� b) adherent point to a set A in a normed vector space V

c) continuous linear map L:V→W, where V,W are vector spaces d) step function on the interval [a,b] e) Riemann integral of a step map f) partition of an interval [a,b]

g) convergence of a series nv� in a normed vector space

2) a) State and prove the integral test. b) State and prove the comparison test.

3) Which of the series below converge and why?

a) 2

1( 1)n n n

= −� b) 2

( 1)log( )

n

n n

=

−� c) 2

1( 1)n n n

= −�

d) 21

sin( )2n

nn n

= −� e) 1 2nn

n∞

=�

4) True - False. Tell whether the following statements are true or false. Give a brief reason for your answer.

a) 0 0

n nn n

n na x converges implies a u converges

∞ ∞

= =� � for all u such that |u| < |x|.

b) ( )1

1 2 1.n

n

x converges if xn

=

− <�

c) fn continuous on [a,b] and d) Define f(x)=0 for x irrational and f(x) = 1 for x rational. Then [ , ]f St a b∈ =the space of regulated functions on [a,b]. e) Any bounded function on [a,b] can be uniformly approximated by step functions and is thus integrable. f) Any piecewise continuous function on [a,b] can be uniformly approximated by step functions and is thus integrable.

g) If 1

lim 0,n nn na then a

→∞=

= � is convergent.

5) a) State and prove a formula for the radius of convergence of 0

nn

n

a x∞

=� .

b) Apply your formula to find the radius of convergence of 1

1 n

nxn

=� .

c) What function does this power series represent within the interval where it converges?

lim ( ) ( ) [ , ] lim .b b

n na a

f x f x x a b f fn n

= ∀ ∈ � =→ ∞ → ∞ � �

Page 202: Introduction, Motivation, Basics About Sets, Functions, Counting

2

6) a∈the closure of A= . . lim .n nA a sequence a A s t a an⇔ ∃ ∈ = →∞

7) a) Define �g�∞ = l.u.b. { |g(x)| | x∈[0,1] }, where l.u.b.=least upper bound. Consider the function f(x) = x3. Find a step function s on [0,1] such that �f-s�∞ ≤ .25.

b) Compute 1

0s� .

8) a) State and prove the continuous linear extension theorem.

b) How did we use the continuous linear extension theorem to extend the integral from step functions to continuous functions?

9) Suppose that f is a step function on [a,b] with respect to partition P and g is a step function on [a,b] with

respect to partition Q. Show that if R is a refinement of P and Q , then both f and g are step functions with respect to R and I(f+g,R ) = I(f,R ) + I(g,R ).

10) Show that the map which sends step function f to b

af� is continuous with respect to the � �∞ norm on the

vector space St[a,b] of step functions on [a,b]. 11) Show that any continuous function on a finite closed interval is uniformly continuous on that interval.

12) a) Define the integral b

af� for any function f which is a uniform limit of a sequence sn of step

functions on [a,b]. b) Explain why you know that this integral is a continuous linear function of f, where continuity is with respect to the � �∞ norm on the space of bounded functions on [a,b]. 13) Explain why the integral has the following 2 properties on the space C[a,b] of continuous functions on the finite interval [a,b].

a) the integral preserves ≤

b) for any c with a<c<b we have b c b

a a c

f f f= +� � � .

14) Fundamental Theorem of Calculus. Suppose f:[a,b]→� is continuous. Show that then

( ) ( )x

a

F x F a f− = � for every x∈[a,b] ⇔ F’(x)=f(x), ∀x∈[a,b].

15) Let a<b<c. Suppose that f(x) is continuous on [a,b] and we define g(x) to be equal to f(x) for all x in [a,c)

and for all x in (c,b] but we define g(c) = f(c) + 100. Show that .b b

a a

f g=� �

Page 203: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

1

Vibrating Things

Resonance is what happens when something is forced to vibrate at certain frequencies - -unbounded oscillations.

In “real” life this would cause the spring to self destruct Such things can happen in bridges ordestruct. Such things can happen in bridges or other structures. Most texts say the Tacoma Narrows bridge disaster was an example of such resonance. However, the latest research questions this conclusion.

The original Tacoma Narrows Bridge opened to traffic on July 1, 1940. It collapsed just four months later during a 42-mile-per-hour wind storm on Nov. 7.

II, Discussion of Final

Page 204: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

2

M. Braun, Differential Equations and their Applications: “When the bridge began heaving violently, the authorities notified Professor F. B. Farquharson of the University of Washington. Professor Farquharson had conducted numerous tests on a simulated model of the bridge and had assured everyone of its stability. The professor was the last man on the bridge. Even when the span was g ptilting more than twenty-eight feet up and down, he was making scientific observations with little or no anticipation of the imminent collapse of the bridge. “

“A large sign near the bridge approach advertised a local bank with the slogan ‘as safe as the Tacoma Bridge.’”

“After the collapse of the Tacoma Bridge, the n f th st t f W shin t n m d n m ti n lgovernor of the state of Washington made an emotional

speech in which he declared ‘We are going to build the exact same bridge, exactly as before.’ Upon hearing this, the noted engineer Von Karman sent a telegram to the governor stating ‘If you build the exact same bridge exactly as before, it will fall into the exact same river exactly as before.’”

Non Forced Vibration of a String.Let’s look at the wave equation which describes the motion ofa vibrating string. Assume the string has constant densityρ and constant tension τ. Then one can derive the followingPDE known as the wave equation, using Newton’s law ofmotion of the principle of least action:motion of the principle of least action:

(1) , 0<x<π, 0<t.

One assumes, for example, that the string is tied down at the boundary points giving the boundary conditions:

2 2

2 2

u ut x

τρ

∂ ∂=∂ ∂

(2) u(0,t)=u(π,t)=0, for all t>0.

And one may assume initial conditions

(3) u(x,0)=f(x), for 0<x<π.( ,0) 0u xt

∂ =∂

Page 205: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

3

The method of separation of variables of Daniel Bernoulli says: look for a solution of the PDE in formula (1) of the form u(x,t)=X(x)T(t). If you want this to satisfy (2), assume X(0)=X(π). If you want it to satisfy (3), you are in trouble for the 1st

part, but the 2nd part becomes T ’(0)=0.N pl (x t) X(x)T(t) int f m l (1)

2 2u uτ∂ ∂Now plug u(x,t)=X(x)T(t) into formula (1) You get (setting c=τ/ρ)

X(x)T”(t)=cT(t)X”(x).Divide both sides by X(x)T(t) (hoping you are not dividing by 0). This gives:

(4) ."( ) "( )( ) ( )

T t X xcT t X x

=

2 2

u ut x

τρ

∂ ∂=∂ ∂

This implies each side is constant. Call the constant λ. It is often called the separation constant. It is an eigenvalue in the 1st ODE below.

Exercise 1. Prove that each side in equation (4) must be constant.

Thus we now have 2 ODES to solve. Here assume c=1 in (4).ODE 1. X”(x) = λX(x), 0 = X(0) = X(π).ODE 2. T”(t) = λT(t), T ’(0) = 0.

Look at ODE1. The general solution from Math. 20D is ,

with constants ai To satisfy the boundary conditions we( ) ( )1 2( ) exp expX x a x a xλ λ= + −

with constants ai. To satisfy the boundary conditions, we need λ<0. This means, since eix=cosx+isinx when i=(-1)½, that we should write, for λ=-μ2,

with constants cj.In order to satisfy the 1st boundary condition, we need

X(0)=c1=0. This makes X(x)=c2sin(μx). The 2nd boundary condition is X(π)=c2sin(μπ)=0. This says

( ) ( )1 2( ) cos sinX x c x c xμ μ= +

μπ=nπ, for some integer n=1,2,3,.....So we find the separation constant in equation (4) is

λ=-μ2=n2, for some integer n=1,2,3,.....Then(5) X(x)=c2sin(nx), for n=1,2,3,.....

Page 206: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

4

Look at ODE2. Math 20D says that the general solution may be taken to be

Since T ’(0) = nb2 = 0, we see that (6) T(t) b (nt) f n 1 2 3

( ) ( )1 2( ) cos sinT t b nt b nt= +

(6) T(t) = b1cos(nt), for n=1,2,3,.....

Now turn to the problem of the 1st part of the initial condition (3). For this you need to be able to write the function

f(x)=X(x)T(0) = c2sin(nx). But, what should we do if the initial shape of the string is not a sine function; e.g., the plucked string pictured f ; g , p g pbelow.

0 1 2

1

Page 207: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

5

To solve the plucked string problem you need to represent the function f(x)=u(x,0) as a Fourier sine series:

(7) .

Then the constants cn are given by the formula1

( ) sin( )nn

f x c nx≥

=�n g y

(8)

This is proved in Lang, p. 318, for sufficiently smooth functions f. Of course, our plucked string function does not look smooth, just continuous. It has that sharp point, remember.

Anyway our final solution to the vibrating string problem

0

( )sin( ) .nc f y ny dyπ

= �

Anyway our final solution to the vibrating string problem is:

(9)

Here the constants cn are from formula (8) and we assume τ=ρin the PDE (1). Exercise 2. Check (9) solves the PDE.

1

( , ) sin( )cos( ).nn

u x t c nx nt≥

=�

Forced Motion of a Vibrating StringAssume the vibrating string is as before but now

apply an external force of the form f(x)cos(ωt). This leads to the PDE:

( )( ) costt xxu u f x tτ ωρ

− =

Assume 1=τ/ρ, for simplicity. So our problem is now

(10)

(0, ) ( , ) 0; ( ,0) ( ,0) 0.tu t u t u x u xρ

π= = = =

( )( ) cos(0, ) ( , ) 0; ( ,0) ( ,0) 0.tt xx

t

u u f x tu t u t u x u x

ωπ

− == = = =

To solve (10), plug in

1

( , ) ( )sin( ).nn

u x t c t nx≥

=�

Page 208: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

6

Define the inner product for piecewise continuous functions f, g on [0,π] by(12)

Then 0

, ( ) ( ) .f g f x g x dxπ

= �

0

sin( ) sin( )( ) ( , ), ( , )2 2

nn nxc t u t u x t dx

π

π π∗= ∗ = �

Here we use the fact that

Exercise 3. Prove the last formula and then use it to show that

v (x)= n=1 2 3 forms an orthonormal family for

2 2

2

0

sin ( ) .2

nx dxπ π=�

sin( )nxvn(x)= ,n=1,2,3,.... forms an orthonormal family for

the inner product space of piecewise continuous functions on [0,π] using the inner product defined by formula (12).

If we plug formula (11) into (10) without worrying about our issues of interchange of derivative and summation, we see that we need

( )0 0

( ) ( , ) ( ) ( , ) ( ) cos( ) ( )n tt n xx nc t u c t v x dx u x t f x t v x dxπ π

ω′′ = = +� �

Exercise 4. Show this last formula holds using integration by parts and the boundary conditions.

So from our last formula, we have

0 0

"( , ) ( ) cos( ) ( ) ( ) .n nu x t v x dx t f x v x dxπ π

ω= +� �

( ) ( , ) ( ) cos( ) ( ) ( )n n nc t n u x t v x dx t f x v x dxπ π

ω′′ = − +� �

So now we need to solve an ODE :

0 02 ( ) , cos( ).n nn c t f v tω= − +

( )2( ) ( ) , cos , (0) (0) 0n n n n nc t n c t f v t c cω′′ ′= − + = =

Page 209: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

7

You can solve this using methods from Math. 20D – for example, variation of parameters. The result is:

Exercise 5. Check this answer.

( )2 2

,( ) cos( ) cos( ) .nn

f vc t t nt

ω= −

Thus a solution for the problem posed by formula (10) is

(13)

Exercise 6. What happens to formula (13) as ω→±n???

2 21

cos( ) cos( ) sin( )( , ) ( ), , , ( ) .2

n n n n nn

t nt nxu x t a v x a f v v xn

ωω π

=

−= = =−�

pp ( )You need to compute

There is a similar result as ω→-n. You should be able to deduce from this that unless an=0, the function u(x,t) blows up as t goes to infinity. This is resonance.

2 2

cos( ) cos( )lim .n

t ntnω

ωω→

−−

Mathematica plots a vibrating square drum:Animate[Plot3D[(Sin[5*Pi*x]*Sin[3*Pi* y]-Sqrt[2/3]*Sin[3*Pi*x]*Sin[5*Pi*y])*Cos[Sqrt[2]*Pi*t/10000],{x,0,3/2},{y,0,3/2},PlotPoints->80,ColorFunction->Hue,Mesh->False,FaceGrids->None,Axes->False,Boxed->False],{t,0,10000}]

Page 210: Introduction, Motivation, Basics About Sets, Functions, Counting

12/11/2010

8

Page 211: Introduction, Motivation, Basics About Sets, Functions, Counting

1

∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε �����������������������������������������������������������������∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε∃δ∃∀∂∞�∀ε

The test concerns Fourier series and an application to resonance. Part I. Vibrating Bridges & Resonance. Read the part of the lectures telling the story of Fourier series. It might also help to remember some things from the ODEs part of calculus. For example, in problem 18 on page 215 of Boyce and DePrima the ODE represents the forced motion of a vibrating spring: u”+u = 3cos(ft.). u(0)=0=u’(0). You first find the solution assuming f is not 1 or -1. Then use l’Hopital’s rule to see what happens if you take the limit as f approaches 1. You should see the spring blow up! This phenomenon is called “resonance.” That is what happens when something is forced to vibrate at certain frequencies One sees unbounded oscillations. In “real” life this would cause the spring to self destruct. Such things can happen in bridges or other structures. Most texts say the Tacoma Narrows bridge disaster was an example of such resonance. However, the latest research questions this conclusion. The Millenium pedestrian bridge in London showed such resonance and was closed for a time and fixed up to keep it from resonating with the many pedestrians walking across. Solders are required not to walk in step across bridges for similar reasons. The original Tacoma Narrows Bridge opened to traffic on July 1, 1940. It collapsed just four

months later during a 42-mile-per-hour wind storm on Nov. 7. There are many web sites devoted to the subject. See a movie of the collapse http://www.archive.org/details/SF121 or http://www.ketchum.org/bridgecollapse.html Or see Judith Packer’s website at the U. of Colorado http://spot.colorado.edu/~packer/Fourier09.html

The London Millennium Footbridge is a pedestrian-only steel suspension bridge crossing the River Thames in London, England. The bridge opened on June 10, 2000 but unexpected resonant lateral vibration caused the bridge to be closed on June 12 for modifications. The movements were produced by the sheer numbers of pedestrians (90,000 users in the first day, with up to 2,000 on the bridge at any one time). M. Braun, Differential Equations and their Applications, notes: “There were many humorous and ironic incidents associated with the collapse of the Tacoma Bridge. When the bridge began heaving violently, the authorities notified Professor F. B. Farquharson of the University of Washington. Professor Farquharson had conducted numerous tests on a simulated model of the bridge and had assured everyone of its stability. The professor was the last man on the bridge. Even when the span was tilting more than twenty-eight feet up and down, he was making scientific observations with little or no anticipation of the imminent collapse of the bridge. “ “A large sign near the bridge approach advertised a local bank with the slogan ‘as safe as the Tacoma Bridge.’” “After the collapse of the Tacoma Bridge, the governor of the state of Washington made an emotional speech in which he declared ‘We are going to build the exact same bridge, exactly as before.’

Page 212: Introduction, Motivation, Basics About Sets, Functions, Counting

2

Upon hearing this, the noted engineer Von Karman sent a telegram to the governor stating ‘If you build the exact same bridge exactly as before, it will fall into the exact same river exactly as before.’” Another famous bridge failure due to resonance: Angers Bridge, in Angers, France, 16 April 1850. The collapse was due to resonance from marching soldiers. Part II. Short course in series expansions and PDEs like the wave equation. Applied Math. involves solving certain partial differential equations, for example, the wave equation, heat equation, Schrödinger’s equation. One way to find solutions for such PDEs is the method of separation of variables. Usually this leads to Fourier series or generalized Fourier series. Let’s look at the wave equation which describes the motion of a vibrating string. Assume the string has constant density ρ and constant tension τ. Then one can derive the following PDE known as the wave equation, using Newton’s law of motion of the principle of least action:

(1) 2 2

2 2

u ut x

τρ

∂ ∂=∂ ∂

, 0<x<π, 0<t.

One assumes, for example, that the string is tied down at the boundary points giving the boundary conditions: (2) u(0,t)=u(π,t)=0, for all t>0. And one may assume initial conditions (3) u(x,0)=f(x), for 0<x<π. The method of separation of variables of Daniel Bernoulli says: look for a solution of the PDE in formula (1) of the form u(x,t)=X(x)T(t). If you want this to satisfy (2), assume X(0)=X(π). If you

want it to satisfy (3), you are in trouble for the 1st part, but the 2nd part becomes T’(0)=0. Now plug u(x,t)=X(x)T(t) into formula (1). You get (setting c=τ/ρ)

X(x)T”(t)=cT(t)X”(x), Divide both sides by X(x)T(t) (hoping you are not dividing by 0). This gives:

(4) "( ) "( )( ) ( )

T t X xcT t X x

= .

This implies each side is constant. Call the constant λ. It is often called the separation constant. It is an eigenvalue in the 1st ODE below. Exercise 1. Prove that each side in equation (4) must be constant. Thus we now have 2 ODES to solve, assuming c=1 (or τ=ρ in formula (1)). ODE 1. X”(x) = λX(x), 0 = X(0) = X(π). ODE 2. T”(t) = λT(t), T ’(0) = 0. Look at ODE1. The general solution from Math. 20D is

( ) ( )1 2( ) exp expX x a x a xλ λ= + − ,

with constants ai. To satisfy the boundary conditions, we need λ<0. This means, since eix=cosx+isinx when i=(-1)½, that we should write, for λ=-μ2, ( ) ( )1 2( ) cos sinX x c x c xμ μ= + ,

with constants cj. In order to satisfy the 1st boundary condition, we need X(0)=c1=0. This makes X(x)=c2sin(μx). The 2nd boundary condition is X(π)=c2sin(μπ)=0. This says

μπ=nπ, for some integer n=1,2,3,..... So we find the separation constant in equation (4) is

( ,0) 0u xt

∂ =∂

Page 213: Introduction, Motivation, Basics About Sets, Functions, Counting

3

λ=-μ2=n2, for some integer n=1,2,3,..... Then (5) X(x)=c2sin(nx), for n=1,2,3,..... Look at ODE2. Math 20D says that the general solution may be taken to be

( ) ( )1 2( ) cos sinT t b nt b nt= + .

Since T ’(0) = nb2 = 0, we see that b2=0 and (6) T(t) = b1cos(nt), for n=1,2,3,..... Now we turn to the problem of the 1st part of the initial condition (3). For this you need to be able to write the function f(x)=X(x)T(0)=c2sin(nx). But, what should we do if the initial shape of the string is not a sine function; e.g., the plucked string pictured below. To solve the plucked string problem you need to represent the function f(x)=u(x,0) as a Fourier sine series (thinking that f is an odd function of period 2π): (7)

1

( ) sin( )nn

f x c nx≥

=� .

Then the constants cn are given by the formula

(8) 0

2 ( )sin( ) .nc f y ny dyπ

π= �

This was proved in Lectures II, for sufficiently smooth functions f. Of course, our plucked string function does not look smooth, just continuous. It has that sharp point, remember. Anyway our final solution to the vibrating string problem is: (9)

1

( , ) sin( )cos( ).nn

u x t c nx nt≥

=�

Here the constants cn are from formula (8). Exercise 2. Check that formula (9) solves the PDE (1) assuming c=1 (or τ=ρ in formula (1)). What assumptions do you need to make on the function f to know that the Fourier coefficients decrease rapidly enough to make the differentiated series converge uniformly so that it is legal to differentiate term-by-term? Thus we see that Fourier series seem necessary if we want to understand vibrating things like strings. They are also useful in the analysis of almost any phenomenon; e.g., the stock market, heat diffusion, yearly San Diego rainfall measurements. Of course, many questions are raised in the mathematical brain here. We put some of them in the preceding exercise. You might also ask: 1) Are the solutions u(x,t) to (1),(2), and (3) unique? Then whatever crazy method we use will lead to the same answer. 2) When is an infinite series of solutions to a PDE again a solution?

0 π/2 π

1

Page 214: Introduction, Motivation, Basics About Sets, Functions, Counting

4

We won’t answer these questions here. The quest for solutions to more general boundary value problems such as that of a vibrating drum leads to Fourier series in 2 variables or series involving Bessel functions, depending on the shape of the drum (square, circular, ...). See Courant and Hilbert, Methods of Mathematical Physics, I, p. 302. Sometimes the problem may lead to Fourier integrals; e.g., heat equation on an infinite metal wire placed along the positive real axis. We have no time to think about this. The references below will. Other References on Fourier Analysis. Dym and McKean, Fourier Series and Integrals; Terras, Harmonic Analysis on Symmetric Spaces and Applications. Fourier’s paper “Théorie analytique de la chaleur” [Analytic theory of heat] appeared in 1822. He says in it that “there is no function f(x) or part of a function which cannot be expressed by a trigonometric series.” In this generality the statement is false. Dirichlet, Riemann, Lebesgue and many others have slaved to make Fourier theory rigorous. Now it helps to use the theory of distributions or generalized functions to keep us from worrying too much about interchanges of derivative and sum. In 1876 DuBois Reymond gave an example of a continuous function whose Fourier series diverges on an everywhere dense set of points. Kolmogorov (1926) gave an example of a Lebesgue integrable function on [0,1] whose Fourier series diverges everywhere. In 1966 Carleson show that a square integrable function on [0,1] has a Fourier series that converges almost everywhere (i.e., except on a set of Lebesgue measure 0). Charles Fefferman (who at age 12 was in the class analogous to Math 142 that I took as an undergrad) has shown that Carleson’s result does not generalize to more than 1 variable. Fefferman was awarded the Fields medal in 1978 for this.

Part III. Forced Motion of a Vibrating String. Assume the vibrating string is as before but now apply an external force of the form f(x)cos(ωt). This leads to the PDE:

( )( ) cos

(0, ) ( , ) 0( ,0) ( ,0) 0.

tt xx

t

u u f x t

u t u tu x u x

τ ωρ

π

− =

= == =

Assume 1=τ/ρ, for simplicity. So our problem is now

(10) ( )( ) cos

(0, ) ( , ) 0( ,0) ( ,0) 0.

tt xx

t

u u f x tu t u tu x u x

ωπ

− == == =

To solve (10), plug in (11)

1

( , ) ( )sin( ).nn

u x t c t nx≥

=�

Define the inner product for piecewise continuous functions f, g on [0,π] by

(12) 0

, ( ) ( ) .f g f x g x dxπ

= �

Page 215: Introduction, Motivation, Basics About Sets, Functions, Counting

5

Then 0

sin( ) sin( )( ) ( , ), ( , )2 2

nn nxc t u t u x t dx

π

π π∗= ∗ = � .

Here we use the fact that 2

0

sin ( ) .2

nx dxπ π=�

Exercise 3. Prove the last formula and then use it to show that vn(x)=sin( )

2

nxπ

,n=1,2,3,.... forms

an orthonormal family for the inner product space of piecewise continuous functions on [0,π] using the inner product defined by formula (12). If we plug formula (11) into (10) without worrying about our issues of interchange of derivative and summation, we see that we need

( )0 0

0 0

( ) ( , ) ( ) ( , ) ( ) cos( ) ( )

( , ) ( ) cos( ) ( ) ( ) .

n tt n xx n

n n

c t u x t v x dx u x t f x t v x dx

u x t v x dx t f x v x dx

π π

π π

ω

ω

′′ = = +

′′= +

� �

� �.

Exercise 4. Show this last formula holds using integration by parts and the boundary conditions. So from our last formula, we have

2

0 02

( ) ( , ) ( ) cos( ) ( ) ( )

( ) , cos( ).

n n n

n n

c t n u x t v x dx t f x v x dx

n c t f v t

π π

ω

ω

′′ = − +

= − +

� �

So now we have an ODE to solve 2( ) ( ) , cos( )

(0) (0) 0.

n n n

n n

c t n c t f v t

c c

ω′′ = − +

′= =

You can solve this using methods from Math. 20D – for example, variation of parameters. The result is:

( )2 2

,( ) cos( ) cos( )nn

f vc t t nt

ω= −

−.

Exercise 5. Check this answer. Thus a solution for the problem posed by formula (10) is

(13) 2 21

cos( ) cos( ) sin( )( , ) ( ), , , ( ) .2

n n n n nn

t nt nxu x t a v x a f v v xn

ωω π

=

−= = =−�

Exercise 6. What happens to formula (13) as ω→±n ??? You need to compute

Page 216: Introduction, Motivation, Basics About Sets, Functions, Counting

6

2 2

cos( ) cos( )limn

t ntnω

ωω→

−−

.

You should be able to deduce from this that unless an=0, the function u(x,t) blows up as t goes to infinity. This is resonance. The phenomenon of resonance is quite general. It works for all sorts of vibrating objects, even with non-constant density and tension and even in higher dimensions. Of course it can be used for good as well as evil. Consider musical instruments for example. Can they be played for evil? I suppose you might be able to destroy a glass by making it resonate.