Math 218 Notes


AN INTRODUCTION TO FUNCTIONS OF SEVERAL REAL VARIABLES

Robert C. Gunning

© Robert C. Gunning


Introduction

I have taught the honors version of calculus of several variables at Princeton University from time to time over a number of years. In the past I used Michael Spivak's path-breaking and highly original Calculus on Manifolds as the course textbook, supplemented by one or another of the available excellent textbooks covering the standard approaches to calculus in several variables and including a wide range of examples and problems. As usual though, for anyone who has taught a course from the same textbook more than a couple of times, there are some parts of Spivak's treatment that I would like to expand, and some parts that I would like to contract or handle differently. During the 2007-2008 academic year Lillian Pierce and I co-taught the course, while she was a graduate student here, and reorganized the presentation of the material and the problem assignments; her suggestions were substantial and she played a significant part in working through the revisions that year. The goal was to continue the rather abstract treatment of the underlying mathematical topics but to supplement it by more emphasis on using the theoretical material to solve problems and to see some applications. The problems were divided into two categories: a first group of problems testing a basic understanding of the essential topics discussed and their applications, problems that almost all serious students should be able to solve without too much difficulty; and a second group of problems covering more theoretical aspects and tougher calculations, challenging the students but still not particularly difficult. The temptation to include a third category of optional very challenging problems, to introduce interested students to a wider range of other theoretical and practical aspects, was frustrated by a resounding lack of interest on the part of the students. A team of Princeton undergraduate mathematics majors consisting of Robert Haraway, Adam Hesterberg, Jay Holt and Alex Schiller took careful notes of the lectures that both Lillian and I gave. In the subsequent years I have used these notes when teaching the course again, modified each year with revisions and additions using the very helpful corrections and suggestions from the students taking the course and the graduate assistants co-teaching and grading the course.

The resulting notes still follow to an extent the pattern and outlook pioneered in Michael Spivak's book. The principal differences are more emphasis on differentiation and the inverse mapping and related theorems; a more extensive treatment of orientation, which can be a rather confusing concept; an introduction of differential forms rather more simply and primitively, rather than as duals to differentiations, which seems sufficient when the interest is really on their role in Euclidean spaces; and more emphasis on the classical interpretations and analytic aspects of differential forms. The course based on these notes covers a good deal of material, so it meets four hours each week, one of which is devoted principally to examples and illustrative calculations, while the discussion of the problems is left to office hours outside the regular class schedule. It has been possible at least to discuss all the topics covered in general terms, focusing on the most difficult proofs and leaving it to the students to read the details of some of the proofs in the notes. I would like to express here my sincere thanks to the students who compiled the lecture notes that were the basis for these notes, Robert Haraway, Adam Hesterberg, Jay Holt and Alex Schiller; to Lillian Pierce for her suggestions and great help in reorganizing the course; to the graduate students who have co-taught the course and done the major share of grading the assignments; and to the students who have taken the course since its reorganization for a number of corrections and suggestions for greater clarity in the notes. The remaining errors and confusion are my own responsibility though.

Robert C. Gunning
Fine Hall, Princeton University, May 2011

Contents

Introduction
1 Background
  1.1 Norms
  1.2 Topological Preliminaries
  1.3 Continuous Mappings
2 Differentiable Mappings
  2.1 The Derivative
  2.2 The Chain Rule
  2.3 Higher Derivatives
  2.4 Functions
3 The Rank Theorem
  3.1 The Inverse Mapping Theorem
  3.2 The Implicit Function Theorem
  3.3 The Rank Theorem
4 Integration
  4.1 Riemann Integral
  4.2 Fubini's Theorem
  4.3 Limits and Improper Integrals
  4.4 Change of Variables
5 Differential Forms
  5.1 Line Integrals
  5.2 Differential Forms
6 Integration of Differential Forms
  6.1 Integrals of Differential Forms
  6.2 Stokes's Theorem
A Exterior Algebras

Chapter 1

Background

1.1 Norms

Some fairly standard set-theoretic notation and terminology will be used consistently. In particular, a ∈ A indicates that a is an element of, or a point in, the set A, while A ⊂ B indicates that A is a subset of B, possibly equal to B. The union A ∪ B of sets A and B consists of all elements that belong to either A or B or to both A and B, while the intersection A ∩ B consists of all elements that belong to both A and B. The complement A − B consists of all elements of A that do not belong to B, whether or not B is a subset of A. If the set A is understood from context it may be omitted; so −B consists of all elements in a set A that are not in B, where the set A is understood. A mapping f : A → B associates to each element a ∈ A an element f(a) ∈ B, the image of the point a. The mapping f is injective if distinct elements of A have distinct images in B, is surjective if every element of B is the image of some element of A, and is bijective if it is both injective and surjective; thus f is bijective if and only if it is a one-to-one mapping between the sets A and B, and consequently has an inverse mapping f^{-1} : B → A that is also bijective.

The n-dimensional real vector space will be denoted by R^n, following the Bourbaki convention. A vector x ∈ R^n always will be viewed as a column vector of length n, so as given by

x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}

where x_i ∈ R. For notational convenience a vector x ∈ R^n sometimes will be indicated by listing its coordinates in the form x = {x_j} = {x_1, ..., x_n}, although it still will be viewed as a column vector; and the image of a mapping f : R^n → R^m often will be denoted by f(x_1, ..., x_n), viewing the mapping f as a function of the individual coordinates of the column vector x. The origin in R^n is the vector 0 = {0} = {0, 0, ..., 0}.

Vectors are added by adding their coordinates, and a vector is multiplied by a real number a by multiplying its coordinates by a, so that

x + y = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix} \quad\text{and}\quad a\,x = \begin{pmatrix} a\,x_1 \\ \vdots \\ a\,x_n \end{pmatrix} \quad \text{for any } a ∈ R.

Linear mappings between vector spaces are described by matrix multiplication; for example, a 2 × 3 matrix A = {a_ij} describes the mapping A : R^3 → R^2 that takes a vector x = {x_j} ∈ R^3 to the vector

Ax = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \end{pmatrix} ∈ R^2.

There are various norms measuring the sizes or lengths of vectors in R^n, only two of which will be considered here:

the Euclidean norm or Cartesian norm of a vector x = {x_j} is defined by

‖x‖_2 = \sqrt{\sum_{j=1}^n x_j^2} = \sqrt{x_1^2 + \cdots + x_n^2} (with the non-negative square root);

the supremum norm or sup norm of a vector x = {x_j} is defined by

‖x‖_∞ = \max_{1≤j≤n} |x_j|.

In general, a norm on the vector space R^n is a mapping R^n → R that associates to any x ∈ R^n a real number ‖x‖ ∈ R and that has the following properties:

1. positivity: ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;
2. homogeneity: ‖cx‖ = |c| ‖x‖ for any c ∈ R;
3. the triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.

That the supremum norm satisfies these three properties is obvious, except perhaps for the triangle inequality; to verify that, if x, y ∈ R^n then since |x_j| ≤ ‖x‖_∞ and |y_j| ≤ ‖y‖_∞ for 1 ≤ j ≤ n it follows that |x_j + y_j| ≤ |x_j| + |y_j| ≤ ‖x‖_∞ + ‖y‖_∞ for 1 ≤ j ≤ n, and consequently that ‖x + y‖_∞ = \max_{1≤j≤n} |x_j + y_j| ≤ ‖x‖_∞ + ‖y‖_∞. That the Euclidean norm satisfies these three properties also is obvious, except for the triangle inequality. It is convenient to demonstrate that together with an inequality involving the inner product of two vectors, which is defined by

(1.1)   (x, y) = \sum_{j=1}^n x_j y_j   for vectors x = {x_j} and y = {y_j} ∈ R^n;
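The two norms and the inner product are straightforward to compute; the following short Python sketch (an illustration added here, not part of the original notes) evaluates them for a sample vector and spot-checks the triangle inequality on random inputs.

```python
import math, random

def sup_norm(x):
    # ||x||_inf = max_j |x_j|
    return max(abs(xj) for xj in x)

def euclidean_norm(x):
    # ||x||_2 = sqrt(sum_j x_j^2), the non-negative square root
    return math.sqrt(sum(xj * xj for xj in x))

def inner(x, y):
    # (x, y) = sum_j x_j y_j
    return sum(xj * yj for xj, yj in zip(x, y))

x = [3.0, -4.0, 12.0]
print(euclidean_norm(x), sup_norm(x))   # 13.0 and 12.0

# spot-check the triangle inequality for both norms on random vectors
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    for norm in (euclidean_norm, sup_norm):
        assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-12
```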

often this is written (x, y) = x·y and called the dot product of the two vectors x and y. It is clear from its definition that the inner or dot product has the following properties:

1. linearity: (c_1 x_1 + c_2 x_2, y) = c_1 (x_1, y) + c_2 (x_2, y) for any c_1, c_2 ∈ R;
2. symmetry: (x, y) = (y, x);
3. positivity: (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0.

It is apparent from symmetry that the inner product (x, y) is also a linear function of the vector y. The Euclidean norm can be defined in terms of the inner product by

(1.2)   ‖x‖_2 = \sqrt{(x, x)} (with the non-negative square root),

as is also quite clear from the definition. Conversely the inner product can be defined in terms of the Euclidean norm by

(1.3)   (x, y) = \tfrac{1}{4} ‖x + y‖_2^2 − \tfrac{1}{4} ‖x − y‖_2^2,

since

‖x + y‖_2^2 − ‖x − y‖_2^2 = (x + y, x + y) − (x − y, x − y) = (x, x) + 2(x, y) + (y, y) − (x, x) + 2(x, y) − (y, y) = 4(x, y);

equation (1.3) is called the polarization identity.
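As a quick sanity check, the polarization identity is easy to confirm numerically; this small Python fragment (an illustrative addition) compares both sides of (1.3) on random vectors.

```python
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def sq_norm(x):
    return inner(x, x)  # ||x||_2^2 = (x, x)

for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(4)]
    y = [random.uniform(-1, 1) for _ in range(4)]
    lhs = inner(x, y)
    rhs = 0.25 * sq_norm([a + b for a, b in zip(x, y)]) \
        - 0.25 * sq_norm([a - b for a, b in zip(x, y)])
    assert abs(lhs - rhs) < 1e-12   # equation (1.3)
```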

Theorem 1.1 For any x, y ∈ R^n
(i) |(x, y)| ≤ ‖x‖_2 ‖y‖_2, and this is an equality if and only if the two vectors are linearly dependent;
(ii) ‖x + y‖_2 ≤ ‖x‖_2 + ‖y‖_2, and this is an equality if and only if one of the two vectors is a non-negative multiple of the other.

Proof: If x, y ∈ R^n where y ≠ 0 and if t ∈ R, introduce the continuous function f(t) of the variable t defined by

(1.4)   f(t) = ‖x + ty‖_2^2 = \sum_{j=1}^n (x_j + t y_j)^2 = \sum_{j=1}^n x_j^2 + 2t \sum_{j=1}^n x_j y_j + t^2 \sum_{j=1}^n y_j^2 = ‖x‖_2^2 + 2t(x, y) + t^2 ‖y‖_2^2.

The function f(t) becomes large for |t| large, since ‖y‖_2^2 > 0 by assumption, so it must take a minimum value at some point. Since f′(t) = 2(x, y) + 2t‖y‖_2^2 there is actually just a single point at which f′(t) = 0, the point t_0 = −(x, y)/‖y‖_2^2, so this must be the point at which the function f(t) takes its minimum value; and the minimum value is

(1.5)   f(t_0) = ‖x‖_2^2 − 2\frac{(x, y)^2}{‖y‖_2^2} + \frac{(x, y)^2}{‖y‖_2^4}\,‖y‖_2^2 = \frac{‖x‖_2^2\,‖y‖_2^2 − (x, y)^2}{‖y‖_2^2}.

It is clear from (1.4) that f(t) ≥ 0 at all points t ∈ R, so in particular at the point t_0, and that yields the inequality (i). It is clear from (1.5) that this inequality is an equality if and only if f(t_0) = 0, hence if and only if x + t_0 y = 0 since f(t_0) = ‖x + t_0 y‖_2^2; and since y ≠ 0 that is just the condition that the vectors x and y are linearly dependent. Next from (1.4) with t = 1 and from the inequality (i) it follows that

‖x + y‖_2^2 = ‖x‖_2^2 + 2(x, y) + ‖y‖_2^2 ≤ ‖x‖_2^2 + 2‖x‖_2 ‖y‖_2 + ‖y‖_2^2 = (‖x‖_2 + ‖y‖_2)^2,

which yields the inequality (ii). This inequality is an equality if and only if (x, y) = ‖x‖_2 ‖y‖_2, or equivalently if and only if (i) is an equality and (x, y) ≥ 0; and since y ≠ 0 it follows from what has already been demonstrated that the inequality (i) is an equality if and only if x = c y for some real number c, and then (x, y) = c‖y‖_2^2 ≥ 0 if and only if c ≥ 0. That suffices for the proof.

The very useful inequality (i) in the preceding theorem, called the Cauchy-Schwarz inequality, can be written

\frac{|(x, y)|}{‖x‖_2\,‖y‖_2} ≤ 1 if x ≠ 0, y ≠ 0;

as a consequence there is an angle θ, called the angle between the nonzero vectors x and y, that is determined uniquely up to a multiple of 2π by

(1.6)   \cos θ = \frac{(x, y)}{‖x‖_2\,‖y‖_2}.

In particular if θ = 0 or π, so that cos θ = ±1, then by Theorem 1.1 (i) the two vectors x and y are linearly dependent; they are parallel and in the same direction if θ = 0, so that (x, y) > 0, or parallel and in the opposite direction if θ = π, so that (x, y) < 0. The two vectors are orthogonal or perpendicular to one another when the angle θ is either π/2 or 3π/2, so that (x, y) = 0. The geometrical interpretation of the norm ‖x‖_2 in terms of the inner product makes that norm particularly useful in many applications. In general, two norms ‖x‖_a and ‖x‖_b on a vector space R^n are equivalent if there are nonzero constants c_a and c_b such that ‖x‖_a ≤ c_a ‖x‖_b and ‖x‖_b ≤ c_b ‖x‖_a for all x ∈ R^n. The Euclidean and supremum norms are equivalent, since they are related by the very useful inequalities

(1.7)   ‖x‖_∞ ≤ ‖x‖_2 ≤ \sqrt{n}\,‖x‖_∞ for any x ∈ R^n.

To verify these inequalities, if x = {x_j} ∈ R^n then |x_j| ≤ ‖x‖_∞ for 1 ≤ j ≤ n, so

‖x‖_2^2 = \sum_{j=1}^n x_j^2 ≤ \sum_{j=1}^n ‖x‖_∞^2 = n\,‖x‖_∞^2, hence ‖x‖_2 ≤ \sqrt{n}\,‖x‖_∞;

on the other hand

‖x‖_2^2 = \sum_{j=1}^n x_j^2 ≥ x_j^2, so ‖x‖_2 ≥ |x_j| for 1 ≤ j ≤ n, hence ‖x‖_2 ≥ ‖x‖_∞.
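Both the Cauchy-Schwarz inequality and the comparison (1.7) are easy to exercise numerically; the following Python sketch (an illustration added here, not from the original notes) checks them on random vectors.

```python
import math, random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    return math.sqrt(inner(x, x))

def norm_sup(x):
    return max(abs(a) for a in x)

n = 7
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    # Cauchy-Schwarz: |(x, y)| <= ||x||_2 ||y||_2
    assert abs(inner(x, y)) <= norm2(x) * norm2(y) + 1e-9
    # equivalence of norms (1.7): ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
    assert norm_sup(x) <= norm2(x) + 1e-9
    assert norm2(x) <= math.sqrt(n) * norm_sup(x) + 1e-9
```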

It follows from (1.7) that whenever ‖x‖_2 is small then ‖x‖_∞ is also small, and conversely; so for many purposes the two norms are interchangeable. If it is not really necessary to specify which norm is meant, the notation ‖x‖ will be used, meaning either ‖x‖_2 or ‖x‖_∞. Of course some care must be taken; in particular ‖x‖ should have the same meaning in any single equation, or usually throughout any single proof, for quite obvious reasons. Any norm can be used to define a distance function d(x, y) by

(1.8)   d(x, y) = ‖x − y‖.

As an immediate consequence of the properties of norms it follows that the distance function has the following properties:

1. positivity: d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;
2. symmetry: d(x, y) = d(y, x);
3. triangle inequality: d(x, y) ≤ d(x, z) + d(z, y);

so the notion of distance does have the expected properties. Equivalent norms determine distance functions that are equivalent in the corresponding sense, as is quite evident from the definitions. For some purposes it is convenient to view an m × n matrix A = { a_{ij} | 1 ≤ i ≤ m, 1 ≤ j ≤ n } as a vector in R^{mn}, and to consider its norm as a vector; thus

(1.9)   ‖A‖_∞ = \max_{1≤i≤m, 1≤j≤n} |a_{ij}| and ‖A‖_2 = \sqrt{\sum_{1≤i≤m, 1≤j≤n} a_{ij}^2} (the non-negative square root).

If x ∈ R^n then Ax ∈ R^m, and since

\left| \sum_{j=1}^n a_{ij} x_j \right| ≤ \sum_{j=1}^n |a_{ij}|\,|x_j| ≤ \sum_{j=1}^n ‖A‖_∞\,‖x‖_∞ = n\,‖A‖_∞\,‖x‖_∞

it follows that

(1.10)   ‖Ax‖_∞ = \max_{1≤i≤m} \left| \sum_{j=1}^n a_{ij} x_j \right| ≤ n\,‖A‖_∞\,‖x‖_∞.
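A quick numerical illustration of the bound (1.10), added here as a sketch: it forms Ax for random matrices and vectors and confirms that ‖Ax‖_∞ never exceeds n‖A‖_∞‖x‖_∞.

```python
import random

def matvec(A, x):
    # Ax for an m-by-n matrix A stored as a list of rows
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def sup_norm_vec(x):
    return max(abs(a) for a in x)

def sup_norm_mat(A):
    # ||A||_inf = max_{i,j} |a_ij|, treating A as a vector in R^{mn}
    return max(abs(aij) for row in A for aij in row)

m, n = 4, 6
for _ in range(1000):
    A = [[random.uniform(-5, 5) for _ in range(n)] for _ in range(m)]
    x = [random.uniform(-5, 5) for _ in range(n)]
    # inequality (1.10): ||Ax||_inf <= n ||A||_inf ||x||_inf
    assert sup_norm_vec(matvec(A, x)) <= n * sup_norm_mat(A) * sup_norm_vec(x) + 1e-9
```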

An ε-neighborhood N_ε(a) of a point a ∈ R^n, an open ε-neighborhood to be more specific, is a subset of R^n defined by

(1.11)   N_ε(a) = { x ∈ R^n | ‖x − a‖ < ε };

the corresponding closed ε-neighborhood is the subset N̄_ε(a) = { x ∈ R^n | ‖x − a‖ ≤ ε }. An open cell in R^n is a product of open intervals Δ = { x = {x_j} ∈ R^n | a_j < x_j < b_j }, and a closed cell is the corresponding product of closed intervals Δ̄ = { x = {x_j} ∈ R^n | a_j ≤ x_j ≤ b_j }.

1.2 Topological Preliminaries

A sequence of points a_i ∈ R^n is a Cauchy sequence if for any ε > 0 there is an index N for which ‖a_i − a_j‖ < ε whenever i, j > N. There are many topological notions and terms that are in common use. A subset U ⊂ R^n is said to be open if for each a ∈ U there is an ε > 0 such that N_ε(a) ⊂ U. It is clear from the inequalities (1.7) that the condition that a set be open is independent of which of the two norms ‖x‖_2 or ‖x‖_∞ is used to define the neighborhood N_ε(a). Intuitively a set U ⊂ R^n is open if for any point a ∈ U all points that are near enough to a are also in the set U. For example an open ε-neighborhood N_ε(a) of a point a ∈ R^n and an open cell Δ ⊂ R^n are open subsets. An open subset of R^n containing a point a is often called an open neighborhood of the point a; an open ε-neighborhood N_ε(a) is thus an open neighborhood of the point a in this sense as well. The collection T of open subsets of R^n has the following characteristic properties:

1. if U_α ∈ T for all α in some index set then ∪_α U_α ∈ T;
2. if U_i ∈ T for 1 ≤ i ≤ N for some N then ∩_{i=1}^N U_i ∈ T;
3. ∅ ∈ T and R^n ∈ T.

These properties can be summarized in the statement that an arbitrary union of open sets is open, a finite intersection of open sets is open, and the empty set and the set R^n itself are open. That the open subsets of R^n satisfy these three conditions is quite clear. It is also clear that an infinite intersection of open sets is not necessarily open; for instance ∩_{ℓ=1}^∞ N_{1/ℓ}(a) = {a}, and a single point is not open. A collection T of subsets of an arbitrary abstract set S having these three characteristic properties is said to be a topology on the set S, and the sets U ∈ T are said to be the open sets in this topology. For many purposes the relevant geometrical properties of a space can be described just in terms of the collection of open sets, the topology of that space, rather than in terms of a norm or distance function on the space. There are a great many topologies in addition to that defined by the Cartesian norm or the supremum norm or any norm that is equivalent to these norms, or indeed by any norm at all. Two particularly simple examples of topologies on a space that have occasional uses are the discrete topology, in which all subsets are open sets, and the indiscrete topology, in which the only open sets are the empty set and the entire set; the discrete topology is the largest possible topology, that with the greatest collection of open sets, while the indiscrete topology is the smallest possible topology, that with the least collection of open sets.

A point a ∈ R^n is a limit point of a subset E ⊂ R^n if for any ε > 0 the intersection N_ε(a) ∩ E contains points of E other than a; clearly that is equivalent to the condition that for all ε > 0 the intersection N_ε(a) ∩ E is an infinite set of points. The set of limit points of E is denoted by E′ and is called the derived set of E. The set E is said to be closed if E′ ⊂ E. For example a closed neighborhood N̄_ε(a) of a point a ∈ R^n and a closed cell in R^n are closed subsets. The closure of a set E, denoted by Ē, is defined to be the union Ē = E ∪ E′, and is readily seen to be a closed set; indeed it is easy to see that the closure of a set E is the smallest closed set containing E. A closed neighborhood N̄_ε(a) of a point a ∈ R^n is the closure of the open neighborhood N_ε(a), and a closed cell Δ̄ in R^n is the closure of the open cell Δ provided that Δ ≠ ∅.

A basic result is that a subset E ⊂ R^n is closed if and only if its complement F = R^n − E is open. To verify that, if E is closed and a ∈ F then a ∉ E′, so a is not a limit point of E, hence there must be some ε > 0 such that N_ε(a) ∩ E = ∅; consequently N_ε(a) ⊂ F, so F is open. Conversely if F is open then no point of F can be a limit point of E, since for any point a ∈ F there is an ε > 0 such that N_ε(a) ⊂ F, hence N_ε(a) ∩ E = ∅; consequently all the limit points of E are contained in E, so E is closed. The complementarity between open and closed sets can be used to define the interior of a subset E ⊂ R^n, denoted by E°, as the complement of the closure of the exterior of E, that is, E° = R^n − \overline{(R^n − E)}; it is easy to see that the interior of E is the largest open set contained in E. It is also clear from the complementarity between open and closed sets that the collection C of closed subsets of R^n has the following characteristic properties:

1. if C_α ∈ C for all α in some index set then ∩_α C_α ∈ C;
2. if C_i ∈ C for 1 ≤ i ≤ N for some N then ∪_{i=1}^N C_i ∈ C;
3. ∅ ∈ C and R^n ∈ C.

These properties can be summarized in the statement that an arbitrary intersection of closed sets is closed, a finite union of closed sets is closed, and the empty set and the full set R^n are both closed; but an arbitrary union of closed sets is not necessarily closed, since any set can be written as a union of its points and each point is a closed set. A topology on a space can be defined equivalently by giving a collection of sets satisfying these three conditions and defining the open sets to be their complements, although it is customary to define topologies directly in terms of the open sets.

A subset S ⊂ R^n can be viewed as a set in itself, without regard to the ambient space R^n in which it is contained; in a sense that amounts to considering the set S itself as the entire universe, rather than viewing it as a subset contained in the universe R^n. The restriction of a norm on R^n to the subset S can be used to describe a distance function on the set S itself, and open and closed subsets of S then can be defined in terms of this distance when an ε-neighborhood of a point a in the set S is taken to be the intersection N_ε(a) ∩ S. Thus a subset E ⊂ S is said to be relatively open if for any point a ∈ E there is an ε > 0 such that N_ε(a) ∩ S ⊂ E, where N_ε(a) ∩ S consists of those points of S that are at distance less than ε from the point a. A point a ∈ S is a relative limit point of E in S if for any ε > 0 the intersection N_ε(a) ∩ S contains a point of E other than a, or equivalently, if for any ε > 0 the intersection N_ε(a) ∩ E is infinite; and that is just the condition that a ∈ E′ ∩ S, where E′ is the derived set of E as a subset of R^n, so a relative limit point of E is a limit point of E in the ordinary sense but one that is also contained in S. A subset E ⊂ S is said to be relatively closed if it contains all its relative limit points.

Clearly a relatively open subset of S need not be an open subset of R^n; if S is a coordinate axis in the plane R^2 an open interval of S is relatively open in S but is not open in R^2. Equally clearly a relatively closed subset of S need not be a closed subset of R^n; for if S ⊂ R^n is not closed in R^n it is still relatively closed in itself. However a subset E ⊂ S is relatively open if and only if it is the intersection E = U ∩ S of S with an open subset U ⊂ R^n, and is relatively closed if and only if it is the intersection E = C ∩ S of S with a closed subset C ⊂ R^n. Indeed it is clear that if E = U ∩ S where U ⊂ R^n is open then E is relatively open; and if E ⊂ S is relatively open then by definition for each point a ∈ E there is an ε_a > 0 such that N_{ε_a}(a) ∩ S ⊂ E, and then U = ∪_a N_{ε_a}(a) is an open subset of R^n such that E = U ∩ S. Similarly if E = C ∩ S for some closed subset C ⊂ R^n then E′ ⊂ C′ ⊂ C, so E′ ∩ S ⊂ C ∩ S = E, and if E ⊂ S is a relatively closed subset of S then C = E ∪ E′ is a closed subset of R^n and C ∩ S = (E ∩ S) ∪ (E′ ∩ S) = E. The relatively open subsets of a set S clearly satisfy the conditions that they form a topology on S, called the relative topology on S. This can be a somewhat confusing notion, and it is important to keep clearly in mind the distinction between open subsets of a set S ⊂ R^n, those open subsets of R^n that are contained in S, and relatively open subsets of the set S, subsets of S that are open in the relative topology of S but are not necessarily open subsets of R^n.

The boundary of a subset E ⊂ R^n is the closed set ∂E = Ē ∩ \overline{(R^n − E)}. The boundaries of neighborhoods N_ε(a) and of cells as defined earlier are also their boundaries in this sense. For more general sets some caution is necessary, since the boundary may not always correspond to what naively might be viewed as the boundary of the set; for example if E is the set of all points in an open cell Δ that have rational coordinates then ∂E = Δ̄.

Two subsets E, F ⊂ R^n are separated if Ē ∩ F = E ∩ F̄ = ∅. This is somewhat stronger than the condition that the two sets are disjoint; for example E = [0, 1) and F = [1, 2] are disjoint subsets of R but are not separated, since Ē ∩ F = {1}, but E = [0, 1) and G = (1, 2] are separated subsets of R. A subset S ⊂ R^n is connected if it cannot be written as the union of two nonempty separated subsets. Alternatively a topological space S, such as a subset of R^n with the relative topology, is connected if it cannot be written as a disjoint union of two nonempty relatively open subsets of S, or equivalently if there is no subset of S that is both relatively open and relatively closed, other than the empty set and all of S. The equivalence of these conditions is quite obvious.

It is easy to see that the entire space R^n is connected. Indeed if U ⊂ R^n is a subset other than the empty set or all of R^n that is both open and closed, choose a point a ∈ U and a point b ∈ R^n − U. The set of real numbers s such that a + t(b − a) ∈ U for 0 ≤ t < s is nonempty, since U is open, and it is bounded above since b ∉ U, so this set has a least upper bound s_0, by one of the characteristic properties of the real number system. Since U is closed it must be the case that a + s_0(b − a) ∈ U, for this is the limit of the points a + s(b − a) ∈ U as s → s_0; but then since U is open it also must be the case that a + s(b − a) ∈ U for some real numbers s > s_0, a contradiction. The same argument shows that an open neighborhood N_r(a) and an open cell in R^n are connected sets. On the other hand the subset Q ⊂ R of rational points, with the topology it inherits from R, is not connected; indeed the interval [√2, √3] ∩ Q is both relatively open and relatively closed.

An open covering of a subset E ⊂ R^n is a collection of open sets U_α, not necessarily a countable collection, such that E ⊂ ∪_α U_α. If some of the sets U_α are redundant they can be eliminated, and the remaining sets are also an open covering of E, called a subcovering of the set E. A set E is said to be compact if every open covering of E has a finite subcovering, that is, if for any open covering {U_α} of E finitely many of the sets U_α actually cover all of E. This is a rather subtle notion, but it is very important and frequently used. To show that a set E is not compact it suffices to find a single open covering {U_α} of E such that no finite collection of the sets U_α can cover E; but to show that E is compact it is necessary to show that for any open covering {U_α} of E finitely many of the sets U_α actually will cover E. An example of a non-compact subset of R^n is a nonempty open neighborhood N_1(0) of the origin in R^n; this set is covered by the open subsets U_ℓ = N_{1−1/ℓ}(0) for ℓ = 1, 2, 3, ..., but the union of any finite collection of these subsets will be just the set U_ℓ for the largest ℓ in the collection, and that is a proper subset of N_1(0). An open cell is also non-compact, for essentially the same reason. A closed cell is an example of a compact subset; but the proof is rather more subtle, and rests on the basic topological properties of the real number system, just as did the proof that R^n is connected.

For the proof it is convenient to define the edgesize of an open or closed cell Δ = { x = {x_j} ∈ R^n | a_j ≤ x_j ≤ b_j } ⊂ R^n to be the nonnegative number e(Δ) = \max_{1≤j≤n} (b_j − a_j).

Lemma 1.1 If Δ_ℓ ⊂ R^n are closed cells in R^n for ℓ = 1, 2, ... such that Δ_ℓ ⊃ Δ_{ℓ+1} and \lim_{ℓ→∞} e(Δ_ℓ) = 0 then ∩_ℓ Δ_ℓ is a single point of R^n.

Proof: If Δ_ℓ = { x = {x_j} | a_j^ℓ ≤ x_j ≤ b_j^ℓ } then for each j clearly a_j^ℓ ≤ a_j^{ℓ+1} and b_j^{ℓ+1} ≤ b_j^ℓ; and since Δ_ℓ ⊂ Δ_1 it is also the case that a_j^ℓ ≤ b_j^1 and b_j^ℓ ≥ a_j^1. The basic completeness property of the real number system implies that any increasing sequence of real numbers bounded from above and any decreasing sequence of real numbers bounded from below have limiting values; therefore \lim_{ℓ→∞} a_j^ℓ = a_j and \lim_{ℓ→∞} b_j^ℓ = b_j for some uniquely determined real numbers a_j, b_j, and it is clear that the cell Δ = { x = {x_j} | a_j ≤ x_j ≤ b_j } is contained in the intersection ∩_ℓ Δ_ℓ. On the other hand since (b_j − a_j) ≤ (b_j^ℓ − a_j^ℓ) ≤ e(Δ_ℓ) and \lim_{ℓ→∞} e(Δ_ℓ) = 0 it must be the case that b_j = a_j, so the limiting cell is just a single point of R^n, which concludes the proof.

Theorem 1.2 A closed cell in R^n is compact.

Proof: If a closed cell Δ = { x = {x_j} | a_j ≤ x_j ≤ b_j } is not compact there is an open covering {U_α} of Δ that does not admit any finite subcovering. The cell Δ can be written as the union of the 2^n closed cells arising from bisecting each of its sides. If finitely many of the sets {U_α} covered each of the subcells then finitely many would cover the entire set Δ, which is not the case; hence at least one of the subcells cannot be covered by finitely many of the sets {U_α}. Then bisect each of the sides of that subcell, and repeat the process. The result is that there is a collection of closed cells Δ_ℓ which cannot be covered by finitely many of the open sets {U_α} and for which Δ_ℓ ⊃ Δ_{ℓ+1} and \lim_{ℓ→∞} e(Δ_ℓ) = 0. It then follows from the preceding lemma that ∩_ℓ Δ_ℓ = a, a single point in R^n. This point must be contained within one of the sets U_{α_0}, and if ℓ is sufficiently large then Δ_ℓ ⊂ U_{α_0} as well; but that is a contradiction, since the cells Δ_ℓ were chosen so that none of them could be covered by finitely many of the sets {U_α}. Therefore the cell Δ is compact, which concludes the proof.
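The nested-cell construction in these two proofs is essentially an algorithm, and it can be illustrated numerically; the following Python sketch (an added illustration, with an arbitrary rule standing in for the choice of a subcell without a finite subcover) bisects a cell in R^2 repeatedly and watches the cells shrink to a single point, as Lemma 1.1 guarantees.

```python
# Repeated bisection of a closed cell in R^2, illustrating Lemma 1.1:
# nested cells with edgesize tending to 0 pin down a single point.
cell = [(0.0, 1.0), (0.0, 1.0)]   # [a_1, b_1] x [a_2, b_2]

def edgesize(cell):
    return max(b - a for a, b in cell)

for step in range(50):
    # bisect each side and (arbitrarily here) keep the lower half of each;
    # in the compactness proof one keeps a subcell with no finite subcover
    cell = [(a, (a + b) / 2) for a, b in cell]

print(edgesize(cell))            # about 8.9e-16: the cells have collapsed
print([a for a, b in cell])      # the single limit point, here the origin
```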

Theorem 1.3 A closed subset of a compact set is compact.

Proof: Suppose that E ⊂ F where F is compact and E is closed, and that {U_α} is an open covering of the set E. The sets U_α together with the open set R^n − E form a covering of the compact set F, so finitely many of these sets cover F hence also cover E. Clearly the set R^n − E covers none of the points of E, so the remaining finitely many sets U_α necessarily cover E. Therefore E is compact, which suffices for the proof.

Theorem 1.4 (Heine-Borel Theorem) A subset E ⊂ R^n is compact if and only if it is closed and bounded.

Proof: A bounded subset E ⊂ R^n is contained in a sufficiently large closed cell Δ ⊂ R^n; the cell Δ is compact by Theorem 1.2, so if E is also closed then by Theorem 1.3 it is compact. Conversely suppose that E is a compact set. If E is not bounded it can be covered by the collection of open neighborhoods N_ℓ(0) for ℓ = 1, 2, ..., but it is not covered by any finite set of these neighborhoods; that contradicts the compactness of E, so a compact set is bounded. If E is not closed then there is a limit point a ∈ E′ that is not contained in E. Each closed neighborhood N̄_{1/ℓ}(a) of the point a must contain a point of E, since a ∈ E′, but the intersection of all of these neighborhoods consists of the point a itself, which is not contained in E. The complements U_ℓ = R^n − N̄_{1/ℓ}(a) then form an open covering of E, but no finite number of these sets suffice to cover E, since the union of finitely many such sets must be one of the sets U_ℓ and hence does not cover the points of E contained in N̄_{1/ℓ}(a); and that again is a contradiction, which suffices to conclude the proof.

Theorem 1.5 (Casorati-Weierstrass Theorem) A subset E ⊂ R^n is compact if and only if every sequence of distinct points in E has a limit point in E.

Proof: Suppose that E is compact. If {a_ν} ⊂ E is a collection of distinct points of E with no limit points in E then the points a_ν can have no limit points in R^n, for since E is closed by the Heine-Borel Theorem these limit points would necessarily lie in E; in particular the set {a_ν} is a closed set. Each point a_ν has an open neighborhood U_ν that contains none of the other points, since otherwise a_ν would itself be a limit point of this collection of points. These open sets U_ν together with the open set R^n − (∪_ν {a_ν}) form an open covering of E; and since E is compact finitely many of these sets already cover E. That is a contradiction, since the set R^n − (∪_ν {a_ν}) does not cover any of the points a_ν and no finite collection of the sets U_ν covers all the points a_ν. On the other hand, suppose that E ⊂ R^n is not compact. Then by the Heine-Borel Theorem either E is not bounded or E is not closed. If E is not bounded it must contain a sequence of distinct points a_ν such that ‖a_ν‖ is a strictly increasing sequence of real numbers with no finite limit, and this sequence can have no limit point in E. On the other hand if E is not closed it contains a sequence of distinct points a_ν with a limit point not contained in E. That suffices to conclude the proof.

As already noted, for any set S ⊂ R^n a subset E ⊂ S that is relatively open in S need not be open in R^n and a subset F ⊂ S that is relatively closed in S need not be closed in R^n. However a subset E ⊂ S is compact in the relative topology of S, in which the definition of compactness is expressed in terms of open coverings by relatively open sets in S, if and only if E is compact as a subset of R^n. Indeed that is quite clear, since the relatively open sets A_α in S are the intersections A_α = S ∩ U_α of S with open subsets U_α in R^n, so that E ⊂ ∪_α A_α if and only if E ⊂ ∪_α U_α. Thus the property that a set be compact is a much more intrinsic property of that set than that it be either open or closed.

Note carefully though that the Heine-Borel Theorem only holds for subsets of the entire space R^n; the proof quite explicitly rested on properties of closed cells in R^n, and it is clear that a subset of an open cell Δ ⊂ R^n can be relatively closed and bounded in Δ but not closed in R^n, hence not compact. However it is clear that the Casorati-Weierstrass Theorem holds in any subset S ⊂ R^n in terms of the relative topology of S, since the relative limit points of a subset E ⊂ S are just the points of the intersection E′ ∩ S.

1.3 Continuous Mappings

A mapping f from a subset U ⊂ R^m to a subset V ⊂ R^n associates to each point x = {x_j} ∈ U a point f(x) = y = {y_i} ∈ V; the coordinates y_i depend on the point x so can be viewed as given by functions y_i = f_i(x), which are the coordinate functions of the mapping f. The mapping f is continuous at a point a ∈ U if for every ε > 0 there is a δ > 0 such that ‖f(a) − f(x)‖ ≤ ε whenever x ∈ U and ‖x − a‖ ≤ δ, or equivalently, for every ε > 0 there is a δ > 0 such that f(U ∩ N_δ(a)) ⊂ V ∩ N_ε(f(a)). It is clear from the inequalities (1.7) that in the definition of continuity the norm ‖x‖ can be either the Euclidean norm ‖x‖_2 or the supremum norm ‖x‖_∞. It is also clear that the mapping f is continuous at a point a if and only if each of its coordinate functions f_i is continuous at that point. The mapping is said to be continuous in the subset U if it is continuous at each point of U.

Theorem 1.6 A mapping f : S → R^n from a subset S ⊂ R^m into R^n is continuous in S if and only if f^{-1}(U) is a relatively open subset of S for any open subset U ⊂ R^n.

Proof: If f : S → R^n is continuous, U ⊂ R^n is an open subset and a ∈ f^{-1}(U), then f(a) = b ∈ U; and since U is open there is an open neighborhood N_ε(b) of the point b such that N_ε(b) ⊂ U. Since f is continuous there is a δ > 0 such that f(S ∩ N_δ(a)) ⊂ N_ε(b), hence that S ∩ N_δ(a) ⊂ f^{-1}(N_ε(b)) ⊂ f^{-1}(U); that is the case for any point a ∈ f^{-1}(U), hence f^{-1}(U) is a relatively open subset of S. On the other hand if f^{-1}(U) is a relatively open subset of S for any open subset U ⊂ R^n, then for any point a ∈ S with image f(a) = b ∈ R^n and for any ε > 0 the set f^{-1}(N_ε(b)) is a relatively open subset of S, since N_ε(b) is an open subset of R^n; since the point a is in this set there must be some δ > 0 for which S ∩ N_δ(a) ⊂ f^{-1}(N_ε(b)), or equivalently for which f(S ∩ N_δ(a)) ⊂ N_ε(b), showing that f is continuous at the point a. That is the case for any point a ∈ S, so f is continuous in S, which suffices for the proof.

Theorem 1.7 A mapping f : S → R^n from a subset S ⊂ R^m into R^n is continuous in S if and only if f^{-1}(E) is a relatively closed subset of S for any closed subset E ⊂ R^n.

Proof: This is an immediate consequence of the preceding theorem, since a subset of S is closed in the relative topology of S if and only if its complement

in S is relatively open and f^{-1}(R^n − E) = S − f^{-1}(E) for any subset E ⊂ R^n. That suffices for the proof.

These results show that the continuity of a mapping in a set S really is a property of the topology of S. Indeed a mapping between two general topological spaces is defined to be continuous if and only if the inverse image of any open subset is open. A simple consequence of this characterization of continuity is that if g : U → V and f : V → W are continuous mappings between subsets U, V, W of some Euclidean spaces then the composition f ∘ g : U → W, the mapping that takes a point x ∈ U to the point (f ∘ g)(x) = f(g(x)), is also continuous; indeed if E ⊂ W is open then f^{-1}(E) is open since f is continuous, and (f ∘ g)^{-1}(E) = g^{-1}(f^{-1}(E)) is open since g is continuous, and consequently f ∘ g is continuous. The results in Theorems 1.6 and 1.7 involve the inverse image of a set under a mapping f; the image of an open set under a continuous mapping is not necessarily open, and the image of a closed set under a continuous mapping is not necessarily closed. For instance if f : R → R is the continuous mapping f(x) = e^{−x^2} then f(R) = { x | 0 < x ≤ 1 }, which is neither open nor closed although R itself is both open and closed.

Theorem 1.8 If f : S → R^n is a continuous mapping from a subset S ⊂ R^m into R^n then the image f(E) of any compact subset E ⊂ S is a compact subset of R^n.

Proof: If E ⊂ S is compact and f(E) is contained in a union of open sets U_α then E is contained in the union of the relatively open sets f^{-1}(U_α); and since E is compact it is contained in the union of finitely many of the sets f^{-1}(U_α), so f(E) is contained in the union of the corresponding finitely many sets U_α. That suffices for the proof.

Since a compact subset of R^n is closed, by the Heine-Borel Theorem, the inverse image under a continuous mapping of a compact set is necessarily closed; but it is not necessarily compact. For example the inverse image of the set [−1, 1] ⊂ R under the mapping f : R → R given by f(x) = sin x is an unbounded set, hence is not compact.

Theorem 1.9 If f is a continuous function on a compact set U ⊂ R^n then there are points a, b ∈ U such that

(1.19)   f(a) = \sup_{x∈U} f(x) and f(b) = \inf_{x∈U} f(x).

Proof: The image f(U) ⊂ R is compact by the preceding theorem, hence is a closed set; so if α = \sup_{x∈U} f(x) then since α is a limit point of the set f(U) it must be contained in the set f(U), hence α = f(a) for some point a ∈ U, and correspondingly for β = \inf_{x∈U} f(x). That suffices for the proof.

Theorem 1.10 A one-to-one and continuous mapping from a compact subset U ⊂ R^m onto a subset V ⊂ R^n has a continuous inverse.

Proof: If the mapping f : U → V is one-to-one it has a well defined inverse mapping g : V → U. To show that g is continuous it suffices to show that g^{-1}(E) is relatively closed in V for any closed subset E ⊂ U, in view of Theorem 1.7. If E is closed then by Theorem 1.3 it is compact, since U is compact; and then g^{-1}(E) = f(E) is compact by Theorem 1.8, hence is closed by the Heine-Borel Theorem, and that suffices for the proof.

The assumption of compactness is essential in the preceding theorem. For example the mapping f : [0, 2π) → R^2 defined by f(t) = (cos t, sin t) ∈ R^2 is clearly one-to-one and continuous; but the inverse mapping fails to be continuous at the point f(0) = (1, 0).

Some properties of continuity are not purely topological properties, in the sense that they cannot be stated purely in terms of open and closed sets, but are metric properties, involving the norms used to define continuity. By definition a mapping f : U → W between two subsets of Euclidean spaces is continuous at a point a ∈ U if and only if for every ε > 0 there is a δ > 0 such that ‖f(a) − f(x)‖ ≤ ε whenever ‖x − a‖ ≤ δ. The mapping f is continuous in the set U if for each point a ∈ U and any ε > 0 there is a δ_a > 0, which may depend on the point a, such that ‖f(a) − f(x)‖ ≤ ε whenever ‖x − a‖ ≤ δ_a. The mapping f is uniformly continuous in U if it is possible to find values δ_a that are independent of the point a ∈ U. Equivalently the mapping f is uniformly continuous in U if for any ε > 0 there exists δ > 0 such that ‖f(x) − f(y)‖ < ε whenever x, y ∈ U and ‖x − y‖ < δ. It should be kept in mind that the two norms are in different spaces; and it is evident from the inequalities (1.7) that they may be different norms as well. Not all continuous mappings are uniformly continuous, as for example the mapping f : R → R defined by f(x) = x^2; but in some circumstances continuous mappings are automatically uniformly continuous.
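To see concretely why the mapping f(x) = x^2 fails to be uniformly continuous on R, here is a short worked detail added for illustration: if |x − a| ≤ δ ≤ 1 then

|f(x) − f(a)| = |x − a|\,|x + a| ≤ δ\,(2|a| + 1),

so the choice δ_a = \min(1, ε/(2|a| + 1)) works at the point a; but taking x = a + δ gives |f(x) − f(a)| = δ(2a + δ) for a > 0, which exceeds any fixed ε once a is large, so no single δ > 0 can serve for every point a. On a compact set the points a range over a bounded set, so the factor 2|a| + 1 is bounded and a single δ does suffice; the next theorem establishes this in general.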

Theorem 1.11 A continuous mapping f : U → R^n from a compact subset U ⊂ R^m into R^n is uniformly continuous.

Proof: If f : U → R^n is continuous and ε > 0, then for any point a ∈ U there is a δ_a > 0 such that ‖f(x) − f(a)‖ < ε/2 whenever x ∈ N_{δ_a}(a) ∩ U. The neighborhoods N_{δ_a/2}(a) for all points a ∈ U are an open covering of U, and since U is compact finitely many of these neighborhoods cover all of U. Let δ > 0 be the minimum of the finitely many positive numbers δ_a/2 for this finite covering. If x, y ∈ U are any two points such that ‖x − y‖ < δ, then x ∈ N_{δ_a/2}(a) for one of the neighborhoods of the finite covering, and since ‖x − y‖ < δ ≤ δ_a/2 it is also the case that y ∈ N_{δ_a}(a). It follows that ‖f(x) − f(y)‖ ≤ ‖f(x) − f(a)‖ + ‖f(a) − f(y)‖ < ε/2 + ε/2 = ε, so f is uniformly continuous, which concludes the proof.

PROBLEMS: GROUP 1
(1) Sketch the following subsets of R^2 = { x = (x_1, x_2) }, and for each subset indicate whether it is open, closed, compact, or connected, and find its boundary.
(i) { x | 0 < ‖x‖ ≤ 1 }
(ii) { x | 0 < x_1 ≤ 1, x_2 = sin(1/x_1) }
(iii) { x | ‖x‖_∞ ≤ 1 } ∪ { x | 0 < ‖x‖_2 ≤ 1 }

(3) Show that if f and g are continuous functions in an open subset U ⊂ R^n then max(f, g), min(f, g), and |f| are also continuous; show that the converses do not hold, so if max(f, g) is continuous it is not necessarily the case that f and g are continuous, and similarly for min(f, g) and |f|.

(4) Show that the function ‖x‖_1 = \sum_{j=1}^n |x_j| of a vector x = {x_j} ∈ R^n is also a norm. What is the shape of an open ε-neighborhood of the origin in the plane R^2 defined by this norm?

(5) Show that the continuous image of a connected set is connected.

(6) Show that if E_i ⊂ R^n are nonempty compact subsets for i = 1, 2, ... such that E_{i+1} ⊂ E_i then ∩_i E_i is also nonempty. Is that the case if the subsets E_i are closed but not necessarily compact?

PROBLEMS: GROUP 2

(7) Show that a norm ‖x‖_0 on vectors in R^n can be defined by an inner product (x_1, x_2)_0 on R^n, in the sense that ‖x‖_0^2 = (x, x)_0, if and only if the norm satisfies the parallelogram law:

‖x + y‖_0^2 + ‖x − y‖_0^2 = 2‖x‖_0^2 + 2‖y‖_0^2.

As a consequence, show that the supremum norm cannot be defined by an inner product.

As a consequence, show that the supremum norm cannot be dened by an inner product. (8) The set of all real polynomials in a single variable x form a real vector space P, but not a vector space of nite dimension; nonetheless it is possible

1.3. CONTINUOUS MAPPINGS to dene norms on this vector space. Show that the expressions p(x) p(x) c dened for a polynomial p(x) = a0 + a1 x + a2 x2 + by p(x)

17 and

=

sup1 1 x 2 2

|p(x)| and

p(x)

c

= max |a0 |, |a1 |, |a2 |,

are norms on the vector space P, but that these two norms are not equivalent norms. (9) Show that (E ) E for any subset E Rn , but that the containment is not necessarily an equality. (10) There are at least 3 topologies that have been described on the sets Rn : the ordinary topology, dened by the Euclidean or supremum norm; the discrete topology; and the indiscrete topology. For each pair of these topologies determine whether the identity mapping : Rn Rn is a continuous mapping when the rst space Rn has one topology of the pair and the second space Rn has the other topology of the pair. (11) Show that any two norms on Rn are equivalent. (12) Show that if f (x1 , x2 ) is a continuous function in the unit cell R2 then the function g(x1 ) = sup0x2 1 f (x1 , x2 ) is a continuous function on the unit interval [0, 1]. (13) Let : E E be a mapping from a subset E Rn into itself such that ||(x) (y)||2 = ||x y||2 for any points x, y E. (i) Show that if E is compact the mapping : E E is a homeomorphism, a continuous one-to-one mapping from E onto E with a continuous inverse. (ii) Is the mapping necessarily a homeomorphism if the set E is not compact? Why? (14) Show that if E Rn are compact connected subsets such that E+1 E then E = E=1 E is also connected. Is that the case if the subsets E are not assumed to be compact? Why?

Chapter 2

Differentiable Mappings

2.1 The Derivative

A mapping f : U → R^n from an open subset U ⊂ R^m into R^n is differentiable at a point a ∈ U if there is a linear mapping A : R^m → R^n, described by an n × m matrix A, such that for all h in an open neighborhood of the origin in R^m

(2.1)   f(a + h) = f(a) + Ah + ε(h) where \lim_{h→0} ‖ε(h)‖/‖h‖ = 0.

Here h ∈ R^m so Ah ∈ R^n, while f(a), f(a + h), ε(h) ∈ R^n. This definition is independent of the norm chosen; for if \lim_{h→0} ‖ε(h)‖_2/‖h‖_2 = 0 then from the inequalities (1.7) it follows that

‖ε(h)‖_∞/‖h‖_∞ ≤ \sqrt{m}\,‖ε(h)‖_2/‖h‖_2,

so \lim_{h→0} ‖ε(h)‖_∞/‖h‖_∞ = 0 as well, and the converse holds similarly. If f is differentiable at a it is continuous at a, since \lim_{h→0} Ah = 0 and \lim_{h→0} ε(h) = 0. For example, if m = 3 and n = 2, so that f : R^3 → R^2, then (2.1) takes the form

\begin{pmatrix} f_1(a+h) \\ f_2(a+h) \end{pmatrix} = \begin{pmatrix} f_1(a) \\ f_2(a) \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ h_3 \end{pmatrix} + \begin{pmatrix} ε_1(h) \\ ε_2(h) \end{pmatrix}.

If f is a constant mapping then f(a + h) = f(a), which is (2.1) for A = 0 and ε(h) = 0; so any constant mapping is differentiable. If f is the linear mapping f(x) = Ax for a matrix A then f(a + h) = A(a + h) = f(a) + Ah, which is (2.1) for the matrix A and ε(h) = 0, so a linear mapping also is differentiable. If m = n = 1 it is possible to divide (2.1) by the real number h and to rewrite that equation in the form

\lim_{h→0} \left( \frac{f(a + h) − f(a)}{h} − A \right) = \lim_{h→0} \frac{ε(h)}{h} = 0;

this is a form of the familiar definition that the real-valued function f(x) of the variable x ∈ R is differentiable at the point a and that its derivative at that point is the real number A.
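Definition (2.1) can be probed numerically: when A is the matrix of partial derivatives, the ratio ‖f(a+h) − f(a) − Ah‖/‖h‖ should tend to 0 as h shrinks. The following Python sketch (an added illustration, with an arbitrarily chosen sample mapping) estimates A by central differences and watches that ratio decay.

```python
import math

def f(x):
    # a sample differentiable mapping f : R^2 -> R^2
    x1, x2 = x
    return [x1 * x1 + math.sin(x2), x1 * x2]

def jacobian(f, a, h=1e-6):
    # estimate the matrix A = f'(a) of partial derivatives by central differences
    n, m = len(f(a)), len(a)
    A = [[0.0] * m for _ in range(n)]
    for j in range(m):
        ap, am = a[:], a[:]
        ap[j] += h; am[j] -= h
        fp, fm = f(ap), f(am)
        for i in range(n):
            A[i][j] = (fp[i] - fm[i]) / (2 * h)
    return A

a = [1.0, 0.5]
A = jacobian(f, a)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = [t, -2 * t]                       # a shrinking increment h
    fa = f(a)
    fah = f([ai + hi for ai, hi in zip(a, h)])
    Ah = [sum(A[i][j] * h[j] for j in range(2)) for i in range(2)]
    eps = [fah[i] - fa[i] - Ah[i] for i in range(2)]
    ratio = max(abs(e) for e in eps) / max(abs(v) for v in h)
    print(t, ratio)                       # the ratio ||eps(h)||/||h|| tends to 0
```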

Theorem 2.1 A mapping f : R^m → R^n is differentiable at a point a if and only if each of the coordinate functions f_i of the mapping f is differentiable at that point.

Proof: If f is a differentiable mapping it follows from (2.1) that each coordinate function f_i satisfies

(2.2)   f_i(a + h) = f_i(a) + \sum_{j=1}^m a_{ij} h_j + ε_i(h) where \lim_{h→0} |ε_i(h)|/‖h‖ = 0,

since |ε_i(h)|/‖h‖ ≤ ‖ε(h)‖/‖h‖; and this is just the condition that each of the coordinate functions f_i of the mapping f is differentiable at the point a. Conversely if each of the coordinate mappings f_i is differentiable at the point a then (2.2) holds for 1 ≤ i ≤ n. The collection of these n equations taken together forms the equation (2.1) in which ε(h) = {ε_i(h)}; and since ‖ε(h)‖_∞ = \max_{1≤i≤n} |ε_i(h)| it follows that \lim_{h→0} ‖ε(h)‖_∞/‖h‖ = 0, so the mapping f is differentiable. That concludes the proof.

For the special case of a vector h = {h_j} for which h_j = 0 for j ≠ k, for some index k, equation (2.2) takes the form

f_i(a_1, ..., a_k + h_k, ..., a_m) = f_i(a_1, ..., a_k, ..., a_m) + a_{ik} h_k + ε_i(h_k) where \lim_{h_k→0} |ε_i(h_k)|/|h_k| = 0;

that is just the condition that f_i(x), viewed as a function of the variable x_k alone for fixed values x_j = a_j of the remaining variables for j ≠ k, is a differentiable function of that variable x_k and that its derivative at the point x_k = a_k is the real number a_{ik}. The constant a_{ik} is called the partial derivative of the function f_i with respect to the variable x_k at the point a, and is denoted by a_{ik} = ∂_k f_i(a). It follows that the entries in the matrix A = {a_{ik}} are the uniquely determined partial derivatives of the coordinate functions of the mapping; this matrix is called the derivative of the mapping f at the point a and is denoted by f′(a), so

(2.3)   f′(a) = { f′(a)_{ik} } = { ∂_k f_i(a) | 1 ≤ k ≤ m, 1 ≤ i ≤ n }.

Generally it is a fairly straightforward matter to calculate the partial derivatives of a function: merely consider all the variables except one of them as constants, and apply the familiar techniques for calculating derivatives of a function of one variable. That can be applied to each coordinate function of a mapping, to yield the derivative of that mapping. It is evident from (2.1) that a linear combination c_1 f_1 + c_2 f_2 of two mappings f_1, f_2 : R^m → R^n that are differentiable at a point a is again differentiable at that point for any constants c_1, c_2 ∈ R, and it follows from (2.3) that differentiation is linear in the sense that (c_1 f_1 + c_2 f_2)′(a) = c_1 f_1′(a) + c_2 f_2′(a). There are various alternative notations for derivatives and partial derivatives of functions of several variables that are in common use. For instance Df(a) is often used for f′(a), and D_k f(a) or ∂f(a)/∂x_k for ∂_k f(a) when f(x) is a real-valued function of the variable x ∈ R^n; and when the mapping f : R^m → R^n is viewed as giving the coordinates y_i of a point y ∈ R^n as functions y_i(x) of the coordinates x_j of points x ∈ R^m the notation ∂y_i/∂x_j is quite commonly used for ∂_j y_i.

If a mapping f : R^m → R^n is differentiable at a point a ∈ R^m then it has partial derivatives ∂_k f_i(a) with respect to each variable x_k at that point; but it is not true conversely that if the coordinate functions of a mapping f have partial derivatives at a point a with respect to each variable then f is a differentiable mapping. For example the mapping f : R^2 → R defined by

f(x) = \begin{cases} \dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } x ≠ 0, \\ 0 & \text{if } x = 0 \end{cases}

vanishes identically in the variable x_2 if x_1 = 0, so ∂_2 f(0, 0) = 0, and similarly ∂_1 f(0, 0) = 0. This function is not even continuous at the origin, since for instance it takes the value 1/2 whenever x_1 = x_2 except at the origin, where it takes the value 0; hence it is not differentiable at the origin. However if the partial derivatives of a mapping not only exist but also are continuous then the mapping is differentiable.
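A quick numerical look at this example (an added illustration): along the line x_1 = x_2 the function sits at 1/2 arbitrarily close to the origin, while along the axes it is 0, so no limit exists at 0 even though both partial derivatives there are 0.

```python
def f(x1, x2):
    # x1*x2/(x1^2 + x2^2) away from the origin, 0 at the origin
    return 0.0 if x1 == x2 == 0 else x1 * x2 / (x1 * x1 + x2 * x2)

for t in [1e-1, 1e-3, 1e-6]:
    print(f(t, t), f(t, 0.0))   # 0.5 along the diagonal, 0.0 along the axis

# both partial derivatives at the origin exist and vanish:
h = 1e-8
print((f(h, 0.0) - f(0.0, 0.0)) / h, (f(0.0, h) - f(0.0, 0.0)) / h)  # 0.0 0.0
```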

Theorem 2.2 If the partial derivatives of a mapping f : R^m → R^n exist at all points near a and are continuous at the point a then the mapping f is differentiable at the point a.

Proof: In view of Theorem 2.1 it is enough to prove this for the special case that n = 1, in which case the mapping f : R^m → R is just a real valued function; and for convenience only the case m = 2 will be demonstrated in detail, since it is easier to follow the proof in the simpler case and all the essential ideas are present. Assume that the partial derivatives ∂_k f(x) exist for all points x near a and are continuous at a = {a_j}, and consider a fixed vector h = {h_j}. When one of the variables is held fixed and f is viewed as a function of the remaining variable it is a differentiable function of a single variable. The mean value theorem for functions of a single variable asserts that if f(x) is continuous in a closed interval [a, b] and is differentiable at each point of the open interval (a, b) then f(b) − f(a) = f′(c)(b − a) for some point c ∈ (a, b); this can be applied to the function f(x_1, a_2) of the single variable x_1 in the interval between a_1 and a_1 + h_1, and to the function f(a_1 + h_1, x_2) of the single variable x_2 in the interval between a_2 and a_2 + h_2, if h is sufficiently small; as a consequence there exist values ξ_1 between a_1 and a_1 + h_1 and ξ_2 between a_2 and a_2 + h_2 such that

f(a_1 + h_1, a_2) − f(a_1, a_2) = h_1 ∂_1 f(ξ_1, a_2) and
f(a_1 + h_1, a_2 + h_2) − f(a_1 + h_1, a_2) = h_2 ∂_2 f(a_1 + h_1, ξ_2).

Then

f(a + h) − f(a) = f(a_1 + h_1, a_2 + h_2) − f(a_1, a_2)
 = f(a_1 + h_1, a_2 + h_2) − f(a_1 + h_1, a_2) + f(a_1 + h_1, a_2) − f(a_1, a_2)
 = h_2 ∂_2 f(a_1 + h_1, ξ_2) + h_1 ∂_1 f(ξ_1, a_2)
 = h_2 ∂_2 f(a_1, a_2) + h_1 ∂_1 f(a_1, a_2) + ε(h)

where

ε(h) = h_2 ( ∂_2 f(a_1 + h_1, ξ_2) − ∂_2 f(a_1, a_2) ) + h_1 ( ∂_1 f(ξ_1, a_2) − ∂_1 f(a_1, a_2) ).

By the triangle inequality

|ε(h)|/‖h‖ ≤ | ∂_2 f(a_1 + h_1, ξ_2) − ∂_2 f(a_1, a_2) | + | ∂_1 f(ξ_1, a_2) − ∂_1 f(a_1, a_2) |

since |h_1|/‖h‖ ≤ 1 and |h_2|/‖h‖ ≤ 1; and since the partial derivatives are assumed to be continuous at the point a it follows that \lim_{h→0} |ε(h)|/‖h‖ = 0. That shows that the mapping f : R^2 → R is differentiable at the point (a_1, a_2), which suffices to conclude the proof.

If a mapping f : R^m → R^n is differentiable at all points of an open subset U ⊂ R^m the partial derivatives of the coordinate functions of the mapping f exist at each point of U, but need not be continuous functions in U. The standard example for functions of a single variable is the function

(2.4)   f(x) = \begin{cases} x^2 \sin(1/x^2) & \text{if } x ≠ 0, \\ 0 & \text{if } x = 0, \end{cases}

which is differentiable at all points x ∈ R but for which the derivative is not continuous at the point 0. Indeed if x ≠ 0 then

f′(x) = 2x \sin(1/x^2) − (2/x) \cos(1/x^2),

which is not bounded as x tends to zero; however

f′(0) = \lim_{h→0} \frac{h^2 \sin(1/h^2) − 0}{h} = \lim_{h→0} h \sin(1/h^2) = 0

since |\sin(1/h^2)| ≤ 1 for h ≠ 0. Thus the hypothesis in the preceding theorem is strictly stronger than just the assumption that the partial derivatives of the coordinate functions of the mapping exist. A mapping f : U → R^n defined in an open subset U ⊂ R^m is said to be continuously differentiable or of class C^1 in U if the partial derivatives of its coordinate functions exist and are continuous throughout the set U. By the preceding theorem, that is precisely equivalent to the condition that the mapping f is differentiable at each point a ∈ U and the matrix function f′(a) is a continuous function of the variable a ∈ U. There are thus two fairly natural but not entirely equivalent notions that relate to properties of differentiability of mappings: the formal definition (2.1) that a mapping is differentiable in an open set, and the condition C^1 that the component functions of a mapping have continuous partial derivatives in an open set. Different applications or considerations may lead to one or the other of these two properties being the primary one considered; that is just a matter of convenience, and you may expect to find different emphases in different situations. It is important though to recognize that these two notions are not entirely equivalent to one another, and to remember how they are actually related.

2.2 The Chain Rule

If g : U → R^m is a mapping defined in an open neighborhood U of a point a ∈ R^l and f : V → R^n is a mapping defined in an open neighborhood V of the point b = g(a) ∈ R^m, where g(U) ⊂ V, the composition φ = f ∘ g : U → R^n is the mapping defined by φ(x) = f(g(x)) for any x ∈ U; this situation is described in the following diagram:

(2.5)   U ⊂ R^l \xrightarrow{\;g\;} V ⊂ R^m \xrightarrow{\;f\;} W ⊂ R^n, with φ = f ∘ g : a ↦ b = g(a) ↦ f(b) = φ(a).

Theorem 2.3 If the mapping g is differentiable at the point a and the mapping f is differentiable at the point b = g(a) then the composite function φ = f ∘ g is differentiable at the point a and φ′(a) = f′(g(a)) g′(a).

Proof: Since the mapping f is differentiable at the point b

(2.6)   f(b + h) = f(b) + f′(b)h + ε_f(h) where \lim_{h→0} ε_f(h)/‖h‖ = 0,

and since the mapping g is differentiable at the point a

(2.7)   g(a + k) = g(a) + g′(a)k + ε_g(k) where \lim_{k→0} ε_g(k)/‖k‖ = 0.

Substituting (2.7) into (2.6), where b = g(a) and h = g′(a)k + ε_g(k), leads to the result that

φ(a + k) = f(g(a + k)) = f(b + h) = f(b) + f′(b)h + ε_f(h) = φ(a) + f′(b)( g′(a)k + ε_g(k) ) + ε_f(h),

hence that

(2.8)   φ(a + k) = φ(a) + f′(b) g′(a) k + ε(k)

where ε(k) = f′(b) ε_g(k) + ε_f(h). From the inequality (1.10) and the triangle inequality it follows that

\frac{‖ε(k)‖}{‖k‖} ≤ \frac{‖f′(b) ε_g(k)‖}{‖k‖} + \frac{‖ε_f(h)‖}{‖h‖} · \frac{‖g′(a)k + ε_g(k)‖}{‖k‖} ≤ n\,‖f′(b)‖ \frac{‖ε_g(k)‖}{‖k‖} + \frac{‖ε_f(h)‖}{‖h‖} \left( n\,‖g′(a)‖ + \frac{‖ε_g(k)‖}{‖k‖} \right).

Since \lim_{k→0} ‖ε_g(k)‖/‖k‖ = 0 and \lim_{h→0} ‖ε_f(h)‖/‖h‖ = 0, while \lim_{k→0} h = 0, it follows from the preceding equation that \lim_{k→0} ‖ε(k)‖/‖k‖ = 0, and it then follows from (2.8) that φ = f ∘ g is differentiable at the point a and that φ′(a) = f′(g(a)) g′(a), which concludes the proof.
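The matrix identity φ′(a) = f′(g(a)) g′(a) can also be checked numerically; this Python sketch (an added illustration with arbitrarily chosen f and g) compares a finite-difference Jacobian of f ∘ g with the product of the finite-difference Jacobians of f and g.

```python
import math

def g(t):                      # g : R^2 -> R^2
    return [t[0] * t[1], t[0] + t[1] ** 2]

def f(x):                      # f : R^2 -> R^2
    return [math.sin(x[0]) + x[1], x[0] * x[1]]

def jac(F, a, h=1e-6):
    # finite-difference Jacobian of F at a
    n, m = len(F(a)), len(a)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        ap, am = a[:], a[:]
        ap[j] += h; am[j] -= h
        Fp, Fm = F(ap), F(am)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = [0.7, -1.2]
lhs = jac(lambda t: f(g(t)), a)          # Jacobian of the composition
rhs = matmul(jac(f, g(a)), jac(g, a))    # product f'(g(a)) g'(a)
print(lhs)
print(rhs)                               # the two matrices agree to ~1e-9
```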

this is just an extension of the familiar product rule for dierentiating functions from the case of functions of a single variable to the case of functions of several variables, since (x) = f1 (x)f2 (x). The entries in the matrix (x) thus have the form k (x) = f2 (x)k f1 (x) + f1 (x)k f2 (x). The corresponding argument shows that the quotient f1 /f2 of two dierentiable functions is dierentiable at any point x at which f2 (x) = 0, and that its derivative has the expected form. For another application of the chain rule, if f : U V is a one-to-one mapping between two open subsets U, V Rm and if g : V U is the inverse mapping then g f : U U is the identity mapping g f (x) = x, so (g f ) (x) = I where I is the m m identity matrix. If the mappings f and g are continuously dierentiable then by the chain rule g f (x) f (x) = (g f ) (x) = I; thus the matrix g f (x) is the inverse of the

2.2. THE CHAIN RULE

25

matrix f (x) at each point x U , so both matrices are nonsingular matrices at each point x U . An alternative notation for the chain rule is suggestive and sometimes quite useful. When mappings f : Rl Rm and g : Rm Rn are described in terms of the coordinates t = {t1 , . . . , tl } Rl , x = {x1 , . . . , xm } Rm and y = {y1 , . . . , yl } Rn , the coordinate functions of the mappings f , g and have the form yi = fi (x) = i (t) and xj = gj (t). The partial derivatives are sometimes denoted by

( )ik = k i =

yi , tk

(f )ij = j fi (x) =

yi , xj

(g )jk = k gj (t) =

xj . tk

By the preceding theorem the derivative of the composite function $\varphi = f \circ g$ is the matrix product $\varphi' = f'\,g'$, which in terms of the entries of these matrices is $(\varphi')_{ik} = \sum_{j=1}^m (f')_{ij}\,(g')_{jk}$, or equivalently $\partial_k \varphi_i = \sum_{j=1}^m \partial_j f_i\; \partial_k g_j$; and in the alternative notation this takes the form

(2.9)  $\dfrac{\partial y_i}{\partial t_k} = \displaystyle\sum_{j=1}^m \dfrac{\partial y_i}{\partial x_j}\, \dfrac{\partial x_j}{\partial t_k}.$
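The identity (2.9) is easy to verify symbolically in particular cases; the following sketch uses sympy, with the change to polar coordinates $x_1 = t_1\cos t_2$, $x_2 = t_1\sin t_2$ and an arbitrarily chosen function $f$ serving only as illustrations.

import sympy as sp

t1, t2 = sp.symbols('t1 t2')   # coordinates t
x1, x2 = sp.symbols('x1 x2')   # intermediate coordinates x

g = [t1 * sp.cos(t2), t1 * sp.sin(t2)]   # x = g(t), polar coordinates
f = x1**2 * x2 + sp.exp(x2)              # y = f(x), an arbitrary choice

# Left side of (2.9): differentiate the composite function directly.
phi = f.subs({x1: g[0], x2: g[1]})
lhs = sp.diff(phi, t1)

# Right side of (2.9): sum over the intermediate coordinates x_j,
# evaluating each dy/dx_j at the point x = g(t).
rhs = sum(sp.diff(f, xj).subs({x1: g[0], x2: g[1]}) * sp.diff(gj, t1)
          for xj, gj in zip([x1, x2], g))

print(sp.simplify(lhs - rhs))  # 0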

This is the extension to mappings in several variables of the traditional formulation of the chain rule for functions of a single variable as the identity $\frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt}$; this form of the chain rule is in some ways easier to remember, and, with some caution, easier to use than the version of the chain rule in the preceding theorem. It is customary however to omit any explicit mention of the points at which the derivatives are taken; so some care must be taken to remember that the derivative $\frac{\partial y_i}{\partial x_j}$ is evaluated at the point $x = g(t)$ while the derivatives $\frac{\partial x_j}{\partial t_k}$ and $\frac{\partial y_i}{\partial t_k}$ are evaluated at the point $t$. This lack of clarity means that some caution must be taken when this notation is used. Some care also must be taken with the chain rule in those cases where the compositions are not quite so straightforward. For instance if $\varphi(x_1, x_2) = f\bigl(x_1, x_2, g(x_1, x_2)\bigr)$ for a function $f(x_1, x_2, x_3)$ of three variables and a function $g(x_1, x_2)$ of two variables, the function $\varphi$ is really the composition $\varphi = f \circ G$ of the mapping $f: \mathbb{R}^3 \to \mathbb{R}^1$ given by the function $f$ and the mapping $G: \mathbb{R}^2 \to \mathbb{R}^3$ given by
$G(x_1, x_2) = \begin{pmatrix} x_1 \\ x_2 \\ g(x_1, x_2) \end{pmatrix},$


so if the function $g$ is differentiable it follows from the chain rule that
$\varphi'(x) = f'(G(x))\, G'(x) = \bigl(\partial_1 f(G(x)) \;\; \partial_2 f(G(x)) \;\; \partial_3 f(G(x))\bigr) \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \partial_1 g(x) & \partial_2 g(x) \end{pmatrix}$
$= \bigl(\partial_1 f(G(x)) + \partial_3 f(G(x))\,\partial_1 g(x) \;\;\;\; \partial_2 f(G(x)) + \partial_3 f(G(x))\,\partial_2 g(x)\bigr),$
hence the coordinate functions of the matrix $\varphi'(x)$ are
$\partial_1 \varphi(x) = \partial_1 f(G(x)) + \partial_3 f(G(x))\,\partial_1 g(x), \qquad \partial_2 \varphi(x) = \partial_2 f(G(x)) + \partial_3 f(G(x))\,\partial_2 g(x).$
This amounts to calculating the partial derivative $\partial_1 \varphi(x)$ as the sum of the partial derivatives of the function $f\bigl(x_1, x_2, g(x_1, x_2)\bigr)$ with respect to each of its three variables, multiplying each of these derivatives by the derivative of whatever occupies the place of that variable with respect to the variable $x_1$. Some care must be taken with this calculation, though. For example, if $f(x_1, x_2) = \sin x_1 + x_2^3$ then
$\partial_1 f(x_1, x_2) = \cos x_1 \quad\text{and}\quad \partial_2 f(x_1, x_2) = 3x_2^2,$
and consequently $\partial_1 f(e^{x_2}, x_1 x_2) = \cos e^{x_2}$ and $\partial_2 f(e^{x_2}, x_1 x_2) = 3(x_1 x_2)^2$;

therefore if $g(x_1, x_2) = f(e^{x_2}, x_1 x_2)$ then
$\partial_1 g(x_1, x_2) = \partial_1 f(e^{x_2}, x_1 x_2)\,\dfrac{\partial}{\partial x_1}(e^{x_2}) + \partial_2 f(e^{x_2}, x_1 x_2)\,\dfrac{\partial}{\partial x_1}(x_1 x_2) = 0 + 3(x_1 x_2)^2\, x_2 = 3x_1^2 x_2^3$
and
$\partial_2 g(x_1, x_2) = \partial_1 f(e^{x_2}, x_1 x_2)\,\dfrac{\partial}{\partial x_2}(e^{x_2}) + \partial_2 f(e^{x_2}, x_1 x_2)\,\dfrac{\partial}{\partial x_2}(x_1 x_2) = \cos(e^{x_2})\, e^{x_2} + 3(x_1 x_2)^2\, x_1 = e^{x_2}\cos e^{x_2} + 3x_1^3 x_2^2.$
This can be checked in this case by an alternative approach, noting that $g(x_1, x_2) = \sin e^{x_2} + (x_1 x_2)^3$ and applying the chain rule for functions of a single variable. Some practice, checked by going back to the form of the chain rule given in Theorem 2.3, may prove helpful; and if there are any doubts about an application of


the chain rule they can be cleared up by identifying the function as an explicit composition of mappings. It should be noted that in this case the meaning of the expression
$\dfrac{\partial}{\partial x_1} f(x_1, x_2, x_3) \quad\text{where } x_3 = g(x_1, x_2)$
is not clear; it may mean either the derivative of the function $f$ with respect to its first variable or the derivative of the composite function of the two variables $x_1, x_2$ with respect to the variable $x_1$, while $\partial_1 f(x_1, x_2, x_3)$ where $x_3 = g(x_1, x_2)$ is less ambiguous. The chain rule also is useful in deriving information about the derivatives of functions that are defined only implicitly. For example, if a function $f(x_1, x_2)$ satisfies the equation
$f(x_1, x_2)^5 + x_1 f(x_1, x_2) + f(x_1, x_2) = 2x_1 + 3x_2$
and the initial condition $f(0,0) = 0$, then the values of that function are determined implicitly but not explicitly by the preceding equation. This equation is the condition that the composition of the mapping $F: \mathbb{R}^2 \to \mathbb{R}^3$ defined by $F(x_1, x_2) = \bigl(x_1, x_2, f(x_1, x_2)\bigr)$ and the mapping $G: \mathbb{R}^3 \to \mathbb{R}$ defined by $G(x_1, x_2, y) = y^5 + x_1 y + y - 2x_1 - 3x_2$ is the trivial mapping $G(F(x_1, x_2)) = 0$, so that $(G \circ F)'(0,0) = 0$; and if the function $f$ is differentiable it follows from the chain rule that
$\partial_1 (G \circ F) = 5f(x_1, x_2)^4\, \partial_1 f(x_1, x_2) + \bigl(x_1\, \partial_1 f(x_1, x_2) + f(x_1, x_2)\bigr) + \partial_1 f(x_1, x_2) - 2,$
so since $\partial_1 (G \circ F) = 0$ and $f(0,0) = 0$ the preceding equation reduces to $\partial_1 f(0,0) = 2$. A similar calculation yields the value of $\partial_2 f(0,0)$.
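This implicit calculation is easy to double-check symbolically. A minimal sketch using sympy, with names mirroring the mappings above: differentiating $G(x_1, x_2, y) = 0$ along $y = f(x_1, x_2)$ gives $\partial_j f = -\,\partial_j G / \partial_y G$.

import sympy as sp

x1, x2, y = sp.symbols('x1 x2 y')

# G(x1, x2, y) = 0 defines y = f(x1, x2) implicitly near (0,0) with f(0,0) = 0.
G = y**5 + x1*y + y - 2*x1 - 3*x2

df_dx1 = -sp.diff(G, x1) / sp.diff(G, y)
df_dx2 = -sp.diff(G, x2) / sp.diff(G, y)

point = {x1: 0, x2: 0, y: 0}   # the initial condition f(0,0) = 0
print(df_dx1.subs(point))      # 2, agreeing with the calculation above
print(df_dx2.subs(point))      # 3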

2.3 Higher Derivatives

If $f: U \to \mathbb{R}$ is a function defined in an open set $U \subset \mathbb{R}^2$ and if the partial derivative $\partial_{j_1} f(x)$ exists at all points $x \in U$, the function $\partial_{j_1} f(x)$ may itself have partial derivatives, such as $\partial_{j_2}\bigl(\partial_{j_1} f(x)\bigr)$, which for convenience is shortened to $\partial_{j_2}\partial_{j_1} f(x)$; and the process may continue, leading to $\partial_{j_3}\partial_{j_2}\partial_{j_1} f(x)$ and so on. The order in which successive derivatives are taken may be significant; for example, a straightforward calculation for the function
$f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2 (x_1^2 - x_2^2)}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \ne (0,0) \\ 0 & \text{if } (x_1, x_2) = (0,0) \end{cases}$
shows that $\partial_1\partial_2 f(0,0) = 1$ but $\partial_2\partial_1 f(0,0) = -1$. However for sufficiently regular functions the order of differentiation is irrelevant.

Theorem 2.4 If $f: U \to \mathbb{R}$ is a function in an open subset $U \subset \mathbb{R}^2$, if the partial derivatives $\partial_1 f(x)$, $\partial_2 f(x)$, $\partial_1\partial_2 f(x)$, $\partial_2\partial_1 f(x)$ exist at all points $x \in U$, and if the mixed partial derivatives $\partial_1\partial_2 f(x)$, $\partial_2\partial_1 f(x)$ are continuous at a point $a \in U$, then $\partial_1\partial_2 f(a) = \partial_2\partial_1 f(a)$.
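Before turning to the proof, the asymmetry computed for the example above is easy to confirm numerically; a rough finite-difference sketch, where the step sizes are ad hoc choices and the outer step is kept much coarser than the inner one so that the two limits are taken in the stated order:

def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 * (x1**2 - x2**2) / (x1**2 + x2**2)

def d1(g, x1, x2, h):  # central difference in the first variable
    return (g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h)

def d2(g, x1, x2, h):  # central difference in the second variable
    return (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)

outer, inner = 1e-3, 1e-7
d1f = lambda a, b: d1(f, a, b, inner)
d2f = lambda a, b: d2(f, a, b, inner)
print(d1(d2f, 0.0, 0.0, outer))  # approximately  1
print(d2(d1f, 0.0, 0.0, outer))  # approximately -1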


Proof: By an application of the mean value theorem for functions of one variable it follows that
$\Delta = f(a_1+h_1, a_2+h_2) - f(a_1+h_1, a_2) - f(a_1, a_2+h_2) + f(a_1, a_2) = \phi(a_1+h_1) - \phi(a_1) = \phi'(\xi_1)\,h_1,$
where $\phi(x_1) = f(x_1, a_2+h_2) - f(x_1, a_2)$ and $\xi_1$ is between $a_1$ and $a_1+h_1$, if $h$ is sufficiently small; and by another application of the mean value theorem for functions of one variable it further follows that
$\phi'(\xi_1) = \partial_1 f(\xi_1, a_2+h_2) - \partial_1 f(\xi_1, a_2) = \partial_2\partial_1 f(\xi_1, \xi_2)\,h_2,$
where $\xi_2$ is between $a_2$ and $a_2+h_2$, if $h$ is sufficiently small; consequently $\Delta = \partial_2\partial_1 f(\xi_1, \xi_2)\,h_2 h_1$. On the other hand it is possible to group the terms in the equation for $\Delta$ in another way, so that
$\Delta = \bigl(f(a_1+h_1, a_2+h_2) - f(a_1, a_2+h_2)\bigr) - \bigl(f(a_1+h_1, a_2) - f(a_1, a_2)\bigr).$
The same argument used for the first grouping, when applied to this grouping, amounts to interchanging the roles of the two variables, which leads to the result that $\Delta = \partial_1\partial_2 f(\xi_1', \xi_2')\,h_1 h_2$ for some points $\xi_1'$ between $a_1$ and $a_1+h_1$ and $\xi_2'$ between $a_2$ and $a_2+h_2$, if $h$ is sufficiently small; the points $\xi_1'$ and $\xi_2'$ are not necessarily the same as the points $\xi_1$ and $\xi_2$, since they are derived from different applications of the mean value theorem in one variable. Comparing the two expressions for $\Delta$ and dividing by $h_1 h_2$ shows that $\partial_2\partial_1 f(\xi_1, \xi_2) = \partial_1\partial_2 f(\xi_1', \xi_2')$. Since the functions $\partial_1\partial_2 f(x_1, x_2)$ and $\partial_2\partial_1 f(x_1, x_2)$ are both continuous, and $\xi_1$ and $\xi_1'$ both approach $a_1$ as $h_1$ tends to $0$, and correspondingly for the points $\xi_2$ and $\xi_2'$, it follows in the limit that $\partial_2\partial_1 f(a_1, a_2) = \partial_1\partial_2 f(a_1, a_2)$, which suffices to conclude the proof.

The preceding result shows that so long as the mixed partial derivatives exist and are continuous functions the order in which partial derivatives are taken is immaterial; in particular that is the case for $C^2$ functions, those for which all the second partial derivatives exist and are continuous. In such cases the notation can be simplified, for example by writing $\partial_1^2\partial_2 f(x)$ in place of $\partial_1\partial_2\partial_1 f(x)$ or $\partial_2\partial_1\partial_1 f(x)$, or by writing $\partial_{j_1}\partial_{j_2} f(x)$ in place of $\partial_{j_1}\bigl(\partial_{j_2} f(x)\bigr)$. It is sometimes


convenient to use the multi-index notation, in which $\partial^I f(x)$ where $I = (3, 2, 1)$ stands for $\partial_1^3\partial_2^2\partial_3 f(x)$, for instance. Another notation frequently used is
$\dfrac{\partial^3 f(x)}{\partial x_1^2\, \partial x_2} = \partial_1^2\partial_2 f(x).$
The definition (2.1) of differentiability involved an approximation of a function by a polynomial of degree $1$ for sufficiently small values of the auxiliary variable $h$. The existence of higher order derivatives can be interpreted correspondingly as involving an approximation of a function by polynomials of higher degree for sufficiently small values of the auxiliary variable $h$. Since this topic is sometimes omitted in treatments of functions of a single variable, the discussion here will begin with the results for functions of one variable, from which the general result follows readily.

Theorem 2.5 (Taylor expansion in one variable) If a function $f: U \to \mathbb{R}^1$ has derivatives up to order $k+1$ in an open neighborhood $U \subset \mathbb{R}^1$ of a point $a \in \mathbb{R}^1$ then for any $h \in \mathbb{R}^1$ sufficiently small

(2.10)  $f(a+h) = f(a) + f'(a)h + \dfrac{1}{2!} f''(a)h^2 + \dfrac{1}{3!} f'''(a)h^3 + \cdots + \dfrac{1}{k!} f^{(k)}(a)h^k + \dfrac{1}{(k+1)!} f^{(k+1)}(\xi)\,h^{k+1}$

where $\xi$ is between $a$ and $a+h$.

Proof: For a fixed $h \in \mathbb{R}$ sufficiently small that the closed interval from $a$ to $a+h$ is contained in $U$ let

(2.11)  $R(x) = f(a+x) - f(a) - f'(a)x - \dfrac{1}{2!} f''(a)x^2 - \dfrac{1}{3!} f'''(a)x^3 - \cdots - \dfrac{1}{k!} f^{(k)}(a)x^k - \dfrac{1}{(k+1)!}\, c\, x^{k+1}$

where $c \in \mathbb{R}$ is chosen so that $R(h) = 0$. Clearly $R(0) = 0$ as well, so it follows from an application of the mean value theorem for functions of a single variable that $R'(\xi_1) = 0$ for some value $\xi_1$ between $0$ and $h$. Note that

(2.12)  $R'(x) = f'(a+x) - f'(a) - f''(a)x - \dfrac{1}{2!} f'''(a)x^2 - \cdots - \dfrac{1}{(k-1)!} f^{(k)}(a)x^{k-1} - \dfrac{1}{k!}\, c\, x^k,$

which is much the same as (2.11) but for the function $f'$ and the index $k-1$ in place of the index $k$. In particular $R'(0) = 0$, and since $R'(\xi_1) = 0$ it follows from another application of the mean value theorem for functions of a single variable that $R''(\xi_2) = 0$ for some $\xi_2$ between $0$ and $\xi_1$. The process continues, with further differentiation of the expression (2.11), yielding a sequence of points $\{\xi_1, \xi_2, \ldots, \xi_{k+1}\}$ where $\xi_{i+1}$ is between $0$ and $\xi_i$, hence is between $0$ and $h$, and $R^{(i)}(\xi_i) = 0$. Finally for the case $i = k+1$ it follows that $R^{(k+1)}(x) = f^{(k+1)}(a+x) - c$, and since $R^{(k+1)}(\xi_{k+1}) = 0$ it follows that $c = f^{(k+1)}(a+\xi_{k+1})$, which is the required value $f^{(k+1)}(\xi)$ for the point $\xi = a + \xi_{k+1}$ between $a$ and $a+h$. That suffices for the proof.
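The quality of the approximation in (2.10) is easy to observe numerically. The following sketch compares the degree-$k$ Taylor polynomials of $\sin$ at $a = 0$ (an illustrative choice) with the true values, together with the bound on the remainder that (2.10) provides, since every derivative of $\sin$ is bounded by $1$.

import math

def taylor_sin(h, k):
    """Degree-k Taylor polynomial of sin at a = 0, evaluated at h."""
    coeffs = [0.0, 1.0, 0.0, -1.0]   # derivatives of sin at 0 repeat with period 4
    return sum(coeffs[n % 4] / math.factorial(n) * h**n for n in range(k + 1))

h = 0.3
for k in (1, 3, 5, 7):
    err = abs(math.sin(h) - taylor_sin(h, k))
    bound = h**(k + 1) / math.factorial(k + 1)   # |f^(k+1)| <= 1 in (2.10)
    print(k, err, bound, err <= bound)           # the bound always holds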

The corresponding result in several variables can be deduced from the result in a single variable by an application of the chain rule.

Theorem 2.6 (Taylor expansion in several variables) If $f: U \to \mathbb{R}$ has continuous partial derivatives up to order $k+1$ in an open neighborhood $U \subset \mathbb{R}^m$ of a point $a \in \mathbb{R}^m$ then for any $h = \{h_j\} \in \mathbb{R}^m$ sufficiently small

(2.13)  $f(a+h) = f(a) + \displaystyle\sum_{j=1}^m \partial_j f(a)\,h_j + \dfrac{1}{2!} \sum_{j_1, j_2 = 1}^m \partial_{j_1}\partial_{j_2} f(a)\, h_{j_1} h_{j_2} + \cdots + \dfrac{1}{k!} \sum_{j_1, \ldots, j_k = 1}^m \partial_{j_1}\cdots\partial_{j_k} f(a)\, h_{j_1}\cdots h_{j_k} + \dfrac{1}{(k+1)!} \sum_{j_1, \ldots, j_{k+1} = 1}^m \partial_{j_1}\cdots\partial_{j_{k+1}} f(\xi)\, h_{j_1}\cdots h_{j_{k+1}}$

where $\xi$ is between $a$ and $a+h$ on the line segment connecting them.

Proof: Let $\gamma(t) = a + th$ for any $t \in \mathbb{R}$ and consider the function $g(t) = f(\gamma(t))$, for which $g(0) = f(\gamma(0)) = f(a)$ and $g(1) = f(\gamma(1)) = f(a+h)$. By the Taylor expansion (2.10) of the function $g(t)$

(2.14)  $g(1) = g(0) + g'(0) + \dfrac{1}{2!} g''(0) + \cdots + \dfrac{1}{k!} g^{(k)}(0) + \dfrac{1}{(k+1)!} g^{(k+1)}(\tau)$

for some $\tau \in (0,1)$. By repeated applications of the chain rule
$g'(t) = \displaystyle\sum_{j_1=1}^m \partial_{j_1} f(a+th)\, h_{j_1}, \qquad g''(t) = \sum_{j_1, j_2 = 1}^m \partial_{j_1}\partial_{j_2} f(a+th)\, h_{j_1} h_{j_2},$
and in general
$g^{(\nu)}(t) = \displaystyle\sum_{j_1, j_2, \ldots, j_\nu = 1}^m \partial_{j_1}\partial_{j_2}\cdots\partial_{j_\nu} f(a+th)\, h_{j_1} h_{j_2}\cdots h_{j_\nu};$
substituting these values into (2.14) for $t = 0$ yields (2.13) for $\xi = a + \tau h$ and thereby concludes the proof.


As a consequence, if a function $f(x)$ is twice differentiable near a point $a \in \mathbb{R}^m$ and if $h \in \mathbb{R}^m$ is sufficiently small then

(2.15)  $f(a+h) = f(a) + \displaystyle\sum_{j=1}^m \partial_j f(a)\, h_j + \dfrac{1}{2} \sum_{j_1, j_2 = 1}^m \partial_{j_1}\partial_{j_2} f(\xi)\, h_{j_1} h_{j_2}$

for some point $\xi$ between $a$ and $a+h$ on the line segment connecting them. The expression $\sum_{j_1, j_2=1}^m \partial_{j_1}\partial_{j_2} f(\xi)\, h_{j_1} h_{j_2}$ is a quadratic form in the variables $h_j$ defined by the matrix $\{\partial_{j_1}\partial_{j_2} f(\xi)\}$; this matrix, which by Theorem 2.4 is a symmetric matrix, is called the Hessian of the function $f(x)$ at the point $\xi$. The last term in the Taylor expansions (2.10) and (2.13) is called Lagrange's form of the remainder; there are various alternative forms for the last term that are often used, but they will not be needed in the discussion here.
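A quick numerical illustration of (2.15): subtracting the first-order approximation from $f(a+h)$ leaves a quantity governed by the Hessian quadratic form. The sketch below, with an arbitrary test function and its gradient and Hessian computed by hand, shows the second-order approximation (with the Hessian evaluated at $a$ rather than at $\xi$) improving rapidly as $h$ shrinks.

import numpy as np

def f(x):
    return np.exp(x[0]) * np.sin(x[1]) + x[0] * x[1]**2

def grad_f(x):
    return np.array([np.exp(x[0]) * np.sin(x[1]) + x[1]**2,
                     np.exp(x[0]) * np.cos(x[1]) + 2 * x[0] * x[1]])

def hess_f(x):
    return np.array([[np.exp(x[0]) * np.sin(x[1]),
                      np.exp(x[0]) * np.cos(x[1]) + 2 * x[1]],
                     [np.exp(x[0]) * np.cos(x[1]) + 2 * x[1],
                      -np.exp(x[0]) * np.sin(x[1]) + 2 * x[0]]])

a = np.array([0.5, 1.0])
for s in (1e-1, 1e-2, 1e-3):
    h = s * np.array([1.0, -2.0])
    approx = f(a) + grad_f(a) @ h + 0.5 * h @ hess_f(a) @ h
    print(s, abs(f(a + h) - approx))   # the error decays like |h|^3 here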

2.4 Functions

A common application of differentiation in the study of functions of several variables is to the determination of extremal values of a differentiable function $f$ defined in an open set $U \subset \mathbb{R}^m$. A point $a \in U$ is a local maximum (plural: local maxima) of $f$ if $f(a) \ge f(x)$ for all points $x$ near $a$, and is a local minimum (plural: local minima) of the function $f$ if $f(a) \le f(x)$ for all points $x$ near $a$; in either case the point $a$ is called a local extremum (plural: local extrema) of the function $f$. It is familiar that local extrema of differentiable functions $f(x)$ of a single variable occur among those points $a$ at which $f'(a) = 0$. Local extrema of a differentiable function $f$ of several variables occur similarly among those points $a$ at which $f'(a) = 0$, which are called the critical points of the function $f$. That is quite clear, since if $a$ is a local extremum then it is a local extremum of the function $f(x)$ as a function of any one coordinate $x_j$ when the remaining coordinates are held fixed, so by the one-variable result $\partial_j f(a) = 0$; and since that is the case for all indices $j$ it follows that $f'(a) = \{\partial_j f(a)\} = 0$, hence the point $a$ is a critical point. Not all critical points are local extrema though; the origin is a critical point of the function $f(x) = x^3$ of a single variable but is neither a local maximum nor a local minimum. The process of finding maxima or minima of functions defined in an open subset $U \subset \mathbb{R}^m$ begins with the identification of the critical points of the function, which are candidates for at least local extrema. In many cases it is fairly easy to tell by close inspection whether a local extremum is a local maximum or local minimum; and if that fails there is an analogue for functions of several variables of the second-derivative test familiar from the examination of extrema of functions of a single variable. If $f$ is a twice continuously differentiable function with a critical point $a \in \mathbb{R}^m$ then since $f'(a) = 0$ the Taylor expansion (2.15) takes the form

$f(a+h) = f(a) + \dfrac{1}{2} \displaystyle\sum_{j_1, j_2 = 1}^m \partial_{j_1}\partial_{j_2} f(\xi)\, h_{j_1} h_{j_2}$


where $\xi$ is a point between $a$ and $a+h$ on the line joining these two points. If the Hessian $\{\partial_{j_1}\partial_{j_2} f(a)\}$ is positive definite then by continuity the Hessian $\{\partial_{j_1}\partial_{j_2} f(\xi)\}$ is positive definite for all points $\xi \in U'$ for some open neighborhood $U'$ of $a$, so that
$\displaystyle\sum_{j_1, j_2 = 1}^m \partial_{j_1}\partial_{j_2} f(\xi)\, h_{j_1} h_{j_2} \ge 0 \quad\text{for all } \xi \in U';$
consequently $f(a+h) \ge f(a)$ for all sufficiently small $h$, so $a$ is a local minimum of the function $f$. Similarly if the Hessian $\{\partial_{j_1}\partial_{j_2} f(a)\}$ is negative definite then $a$ is a local maximum. If the Hessian is neither positive definite nor negative definite the second derivative test generally provides no information; but if the Hessian has some strictly positive and some strictly negative eigenvalues then $a$ is neither a local maximum nor a local minimum, but is a point called a saddle point of the function. Geometrically the graph of a function of two variables near such a point looks something like an ordinary saddle: it is concave upwards in one direction and concave downwards in another direction. The difficulty lies in determining whether the Hessian is positive or negative definite. The simplest general result is that a symmetric matrix is positive definite if and only if the principal minors (the determinants of the submatrices formed from the first $k$ rows and columns, for $k$ running from $1$ to the size of the matrix) are all strictly positive, and the matrix is negative definite if and only if the principal minors alternate in sign beginning with a negative sign, so the $1 \times 1$ minor is negative, the $2 \times 2$ minor is positive, and so on. For the case of functions in $\mathbb{R}^m$ for sufficiently small $m$ this is fairly easy to apply: for example if $m = 2$ the principal minors of a matrix $M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}$ are $m_{11}$ and $m_{11}m_{22} - m_{12}m_{21}$, so it is fairly easy to see whether the critical point is a local maximum or local minimum, although it does involve calculating some second derivatives. It is sometimes tempting to think that a function $f$ has a local minimum at $a \in \mathbb{R}^m$ if $\partial_j^2 f(a) \ge 0$ for $1 \le j \le m$; but that is not necessarily the case, since for example the matrix $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has strictly positive diagonal entries but is not positive definite, since the determinants of the two principal minors are $1$ and $-3$. As an example, consider the function
$f(x_1, x_2) = x_1^4 x_2 + x_1 x_2^4 - x_1 x_2$
in the plane $\mathbb{R}^2$. Since
$f'(x_1, x_2) = \bigl(4x_1^3 x_2 + x_2^4 - x_2, \;\; x_1^4 + 4x_1 x_2^3 - x_1\bigr) = \bigl((4x_1^3 + x_2^3 - 1)x_2, \;\; (x_1^3 + 4x_2^3 - 1)x_1\bigr),$

it is a straightforward calculation to see that there are just $4$ critical points, points at which $f'(x) = 0$: the points
$(0,0), \quad (1,0), \quad (0,1), \quad \Bigl(\dfrac{1}{\sqrt[3]{5}}, \dfrac{1}{\sqrt[3]{5}}\Bigr).$
The function $f(x_1, x_2) = x_1 x_2 (x_1^3 + x_2^3 - 1)$ is $0$ at the origin $(0,0)$ but is clearly negative along the line $x_1 = x_2$ and positive along the line $x_1 = -x_2$ near the


origin, so the origin is not an extremum. It is possible to examine the other $3$ critical points similarly, although perhaps with a bit more trouble; but in this case it is also easy to use the second derivative test. Since
$\partial_1^2 f(x) = 12x_1^2 x_2, \qquad \partial_1\partial_2 f(x) = \partial_2\partial_1 f(x) = 4x_1^3 + 4x_2^3 - 1, \qquad \partial_2^2 f(x) = 12x_1 x_2^2,$
the Hessian matrix of the function $f(x)$, the matrix
$H(x) = \begin{pmatrix} \partial_1^2 f(x) & \partial_1\partial_2 f(x) \\ \partial_2\partial_1 f(x) & \partial_2^2 f(x) \end{pmatrix},$
has the following values at the $4$ critical points:
$H(0,0) = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}, \quad H(1,0) = \begin{pmatrix} 0 & 3 \\ 3 & 0 \end{pmatrix}, \quad H(0,1) = \begin{pmatrix} 0 & 3 \\ 3 & 0 \end{pmatrix}, \quad H\Bigl(\dfrac{1}{\sqrt[3]{5}}, \dfrac{1}{\sqrt[3]{5}}\Bigr) = \dfrac{3}{5}\begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}.$

The matrix $H(0,0)$ is neither positive nor negative definite, since the determinants of the principal minors are $0$ and $-1$; indeed the usual calculation shows that the eigenvalues of the matrix $H(0,0)$ are $\pm 1$, so the point $(0,0)$ is a saddle point. Similarly the two points $(1,0)$ and $(0,1)$ also are saddle points. However the determinants of the principal minors of the matrix $H\bigl(\frac{1}{\sqrt[3]{5}}, \frac{1}{\sqrt[3]{5}}\bigr)$ are $\frac{12}{5}$ and $\frac{27}{5}$, so that matrix is positive definite and consequently the point $\bigl(\frac{1}{\sqrt[3]{5}}, \frac{1}{\sqrt[3]{5}}\bigr)$ is a local minimum of the function $f(x)$. It is clear that it is not a global minimum, since for instance $f(x,x) = 2x^5 - x^2$ takes arbitrarily large negative values.
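These classifications are straightforward to confirm with a computer algebra system; a sketch using sympy:

import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**4 * x2 + x1 * x2**4 - x1 * x2

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

for pt in sp.solve(grad, [x1, x2], dict=True):
    if not all(v.is_real for v in pt.values()):
        continue   # discard complex solutions of the polynomial system
    print(pt, list(H.subs(pt).eigenvals()))
# (0,0), (1,0) and (0,1) have eigenvalues of both signs (saddle points),
# while at (5**(-1/3), 5**(-1/3)) the eigenvalues 3 and 9/5 are positive.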

There is a notational convention that is commonly used in considering the derivatives of functions and is quite convenient. The derivative $f'(x)$ of a function is a $1 \times m$ matrix, so it is not a vector in the sense used for points $x \in \mathbb{R}^m$, which are always viewed as column vectors; but the transpose ${}^t f'(x)$ is a column vector, so it can be viewed as an ordinary vector in $\mathbb{R}^m$. To avoid confusion and keep these distinctions clear, the transpose of the derivative is often denoted by $\nabla f(x)$, although an earlier but still used alternative notation is $\mathrm{grad}\, f(x)$; and this vector is called the gradient of the function $f(x)$. Thus for a function $f(x_1, x_2)$ of two variables
$f'(x) = \bigl(\partial_1 f(x) \;\; \partial_2 f(x)\bigr) \quad\text{and}\quad \nabla f(x) = \mathrm{grad}\, f(x) = \begin{pmatrix} \partial_1 f(x) \\ \partial_2 f(x) \end{pmatrix}.$
Although this is really a rather trivial point, the convention that the gradient of a function defined in a subset of a vector space $\mathbb{R}^m$ is a vector in the same sense as all other vectors in $\mathbb{R}^m$ is quite standard and quite useful; $\nabla f(x)$ is a vector that can be used in the same context as the vector $x$, either in the addition of vectors or in the dot or inner product of two vectors. For example, a straight


line through a point $a \in \mathbb{R}^m$ in the direction of a unit vector $u$ can be described parametrically as the set of points $x = \gamma(t)$ for $t \in \mathbb{R}$, where $\gamma: \mathbb{R} \to \mathbb{R}^m$ is the mapping $\gamma(t) = a + tu$. If $f: U \to \mathbb{R}$ is a differentiable function in an open set $U \subset \mathbb{R}^m$ containing the point $a$, the restriction of $f$ to this straight line can be viewed as a function $(f \circ \gamma)(t) = f(a + tu)$ of the parameter $t \in \mathbb{R}$ near the origin. The derivative of this restriction is called the directional derivative of the function $f$ at the point $a$ in the direction of the unit vector $u$, and is denoted by $\nabla_u f(a)$. It follows from the chain rule that $\nabla_u f(a) = f'(a)\,\gamma'(0)$; the coordinate functions of the mapping $\gamma(t)$ are $\gamma_j(t) = a_j + u_j t$, so the derivative $\gamma'(0)$ is just the $m \times 1$ matrix or vector $\gamma'(0) = \{u_j\}$, hence

(2.16)  $\nabla_u f(a) = \displaystyle\sum_{j=1}^m \partial_j f(a)\, u_j = \nabla f(a) \cdot u.$
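As a quick check of (2.16), the derivative of the restriction $t \mapsto f(a+tu)$ can be compared with the dot product $\nabla f(a) \cdot u$; a sketch with an arbitrary test function:

import numpy as np

def f(x):
    return x[0]**2 * x[1] + np.sin(x[1])

def grad_f(x):
    return np.array([2 * x[0] * x[1], x[0]**2 + np.cos(x[1])])

a = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0   # a unit vector

t = 1e-6
restricted = (f(a + t * u) - f(a - t * u)) / (2 * t)
print(restricted, grad_f(a) @ u)  # the two values agree to rounding error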

Thus the directional derivative of the function $f$ in the direction of a unit vector $u$ is the dot product of the gradient of the function $f$ at the point $a$ with the unit vector $u$. The angle $\theta$ between the vectors $\nabla f(a)$ and $u$ is determined by

(2.17)  $\cos\theta = \dfrac{\nabla f(a) \cdot u}{\|\nabla f(a)\|_2} = \dfrac{\nabla_u f(a)}{\|\nabla f(a)\|_2}$

since $u$ is a unit vector; so it follows that the directional derivative is $\nabla_u f(a) = \|\nabla f(a)\|_2 \cos\theta$, the length of the projection of the gradient vector $\nabla f(a)$ in the direction of the unit vector $u$. Thus the maximal directional derivative of the function $f(x)$ at the point $a$ is in the direction of the vector $\nabla f(a)$ and is equal to $\|\nabla f(a)\|_2$, while the minimal directional derivative is in the direction of the vector $-\nabla f(a)$ and is equal to $-\|\nabla f(a)\|_2$; and $\nabla_u f(a) = 0$ for any vector $u$ orthogonal to the gradient $\nabla f(a)$. The gradient $\nabla f(a)$ of a function $f$ at a point $a$ is thus a vector in the direction in which $f$ is increasing most rapidly, and the length $\|\nabla f(a)\|_2$ of the gradient vector is the maximum rate of increase of the function $f$. The question whether a vector is to be viewed as a row vector or a column vector can be avoided merely by writing a vector as an explicit linear combination of basis vectors in the vector space $\mathbb{R}^m$. If $f(x) = x_j$ then $f'(x) = (0, \cdots, 0, 1, 0, \cdots, 0)$, where the entry $1$ is in column $j$; thus $f'(x)$ is independent of $x$ and actually is one of the standard basis vectors for the vector space $\mathbb{R}^m$. It is tempting to avoid introducing a separate notation and to denote this function just by $x_j$. The standard notation for its derivative then would be $x_j'$, which is somewhat confusing since $x'$ commonly is used to denote another set of variables in $\mathbb{R}^n$; it is clearer and rather more customary to denote the derivative of the function $x_j$ by $dx_j$, so that

(2.18)  $dx_j = (0, \cdots, 0, 1, 0, \cdots, 0),$


with the entry $1$ in column $j$ and all other entries $0$. With this notation the vector $f'(x)$ for an arbitrary differentiable function $f$ can be denoted correspondingly by $df(x)$ and written in terms of the basis (2.18) as

(2.19)  $df(x) = \displaystyle\sum_{j=1}^m \partial_j f(x)\, dx_j.$

This form for the derivative $f'(x)$ is called a differential form, or to be more explicit, a differential form of degree $1$. More generally a mapping $f: U \to \mathbb{R}^m$, when viewed as associating to a point $x \in U \subset \mathbb{R}^m$ a vector also of dimension $m$, is called a vector field on the subset $U \subset \mathbb{R}^m$; this is familiar from physics, for the electric, magnetic and gravitational fields. A vector field $f$ in an open subset $U \subset \mathbb{R}^m$ with the coordinate functions $f_j(x)$ also can be written as a linear combination of the standard basis vectors in $\mathbb{R}^m$, and consequently as a differential form of degree $1$, explicitly

(2.20)  $f(x) = \displaystyle\sum_{j=1}^m f_j(x)\, dx_j.$

The discussion of differential forms will be taken up again in Chapter 5. The condition that a function $f$ is differentiable at a point $a$ can be viewed as the condition that $f$ can be approximated near the point $a$ by an affine function, a polynomial of degree $1$ in the variables in $\mathbb{R}^m$, and that the error in this approximation is fairly small; this interpretation is often quite useful in practice. It is often expressed as the condition that for small changes $\Delta x$ in the coordinates of a point in $\mathbb{R}^m$ the change in the value of the function is approximately a linear function of $\Delta x$; for (2.1) can be written

(2.21)  $\Delta f(x) = f(x + \Delta x) - f(x) = f'(x)\,\Delta x + \epsilon(\Delta x),$

so the change $\Delta f(x)$ in the value of the function $f(x)$ is approximately equal to the linear function $f'(x)\,\Delta x$ of the change $\Delta x$ in the variable $x$, and the error $\epsilon(\Delta x)$ is much smaller than the change $\Delta x$ in the variable since
$\lim_{\Delta x \to 0} \|\epsilon(\Delta x)\| / \|\Delta x\| = 0.$

If the function $f(x)$ is $C^2$ in an open neighborhood $U$ of the point $x$ and if $M = \sup_{t \in U} \max_{j_1, j_2} |\partial_{j_1}\partial_{j_2} f(t)|$ is a bound on the Hessian of the function $f(t)$ for $t \in U$ then
$\Bigl| \displaystyle\sum_{j_1, j_2 = 1}^m \partial_{j_1}\partial_{j_2} f(t)\, h_{j_1} h_{j_2} \Bigr| \le M m^2 \|h\|^2$
for all points $t \in U$, and it follows from (2.15) that

(2.22)  $\|\epsilon(\Delta x)\| \le M m^2 \|\Delta x\|^2;$

this is a rough but sometimes useful estimate for the size of the error term in (2.1).
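The behavior described by (2.21) and (2.22) is easy to observe numerically: the error of the affine approximation shrinks quadratically with the step. A sketch, with an arbitrary test function:

import numpy as np

def f(x):
    return np.exp(x[0] - x[1]) + x[0] * x[1]

def fprime(x):   # the 1 x 2 derivative matrix of f
    return np.array([np.exp(x[0] - x[1]) + x[1],
                     -np.exp(x[0] - x[1]) + x[0]])

x = np.array([0.3, 0.7])
for s in (1e-1, 1e-2, 1e-3):
    dx = s * np.array([1.0, 1.0])
    err = abs(f(x + dx) - f(x) - fprime(x) @ dx)
    ratio = err / np.max(np.abs(dx))**2
    print(s, err, ratio)   # the ratio stays bounded, as (2.22) predicts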


In practice it is often convenient to have a quick approximation for the values of a function $f$ in a neighborhood of a point when the value of the function at that point is known. Another of the standard approximations in one variable, the mean value theorem, does extend directly to real-valued functions of several variables, but extends only as an inequality to mappings from $\mathbb{R}^m$ to $\mathbb{R}^n$.

Theorem 2.7 (The Mean Value Theorem) If $f: U \to \mathbb{R}$ is a differentiable function in an open set $U \subset \mathbb{R}^m$ and if two points $a, b$ and the line joining them lie in $U$ then

(2.23)  $f(b) - f(a) = \nabla f(c) \cdot (b - a)$

for some point $c$ between $a$ and $b$ on the line joining these two points.

Proof: The function $g(t) = f\bigl(a + t(b-a)\bigr)$ is a differentiable function of the variable $t$ in an open neighborhood of the interval $[0,1]$, and by the mean value theorem for functions of one variable $g(1) - g(0) = g'(\tau)$ for a point $\tau \in (0,1)$. Since $g(1) = f(b)$ and $g(0) = f(a)$, while by the chain rule
$g'(t) = \displaystyle\sum_{j=1}^m \partial_j f\bigl(a + t(b-a)\bigr)\,(b_j - a_j),$
it follows that
$f(b) - f(a) = g(1) - g(0) = g'(\tau) = \displaystyle\sum_{j=1}^m \partial_j f\bigl(a + \tau(b-a)\bigr)\,(b_j - a_j) = \sum_{j=1}^m \partial_j f(c)\,(b_j - a_j) = \nabla f(c) \cdot (b-a)$
where $c = a + \tau(b-a)$, and that suffices for the proof.

Although the preceding theorem can be applied to each coordinate function of a mapping $f: U \to \mathbb{R}^n$ for $n > 1$, the points at which the derivatives of the different coordinate functions are evaluated may be different. However for some purposes an estimate is useful enough, and that problem can be avoided.

Theorem 2.8 (The Mean Value Inequality) If $f: U \to \mathbb{R}^n$ is a differentiable mapping in an open set $U \subset \mathbb{R}^m$ and if two points $a, b$ and the line joining them lie in $U$ then

(2.24)  $\|f(b) - f(a)\| \le m\sqrt{n}\, \|f'(c)\|\, \|b - a\|$

for some point $c$ between $a$ and $b$ on that line.

Proof: For any vector $u \in \mathbb{R}^n$ for which $\|u\|_2 = 1$ the dot product $f_u(x) = u \cdot f(x)$ is a real-valued function to which the Mean Value Theorem can be

applied, so
$f_u(b) - f_u(a) = \nabla f_u(c) \cdot (b-a) = \displaystyle\sum_{j=1}^m \partial_j (u \cdot f)(c)\,(b_j - a_j) = \sum_{i=1}^n \sum_{j=1}^m u_i\, \partial_j f_i(c)\,(b_j - a_j) = \sum_{i=1}^n \sum_{j=1}^m u_i\, f'(c)_{ij}\,(b_j - a_j) = u \cdot f'(c)(b-a)$
for some point $c$ between $a$ and $b$ on the line joining these two points. If the unit vector $u$ is chosen to lie in the direction of the vector $f(b) - f(a)$ then

$\|f(b) - f(a)\|_2 = |f_u(b) - f_u(a)| = |u \cdot f'(c)(b-a)| \le \|f'(c)(b-a)\|_2$
by the Cauchy-Schwarz inequality, so by (1.7) and (1.10)
$\|f(b) - f(a)\| \le \|f(b) - f(a)\|_2 \le \|f'(c)(b-a)\|_2 \le \sqrt{n}\,\|f'(c)(b-a)\| \le m\sqrt{n}\,\|f'(c)\|\,\|b-a\|,$
which suffices for the proof.

Theorem 2.9 If $f: U \to \mathbb{R}^n$ is a differentiable mapping in a cell $\Delta \subset \mathbb{R}^m$ such that $\|f'(x)\| \le M$ for all $x \in \Delta$ then the mapping $f$ is uniformly continuous in $\Delta$.

Proof: If $\|f'(x)\| \le M$ for all $x \in \Delta$ then for any two points $x, y \in \Delta$ the line joining them is contained in $\Delta$, so it follows from the Mean Value Inequality that
$\|f(x) - f(y)\| \le m\sqrt{n}\, M\, \|x - y\|,$
which shows that the mapping $f$ is uniformly continuous in $\Delta$ and thereby concludes the proof.
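A numerical sanity check of the Mean Value Inequality, simplified to Euclidean norms throughout (a weaker but easier comparison than the norms used above) and with an arbitrary test mapping:

import numpy as np

def f(x):    # a mapping from R^2 to R^2
    return np.array([np.sin(x[0]) * x[1], x[0]**2 + np.cos(x[1])])

def jac(x):  # its derivative matrix, computed by hand
    return np.array([[np.cos(x[0]) * x[1], np.sin(x[0])],
                     [2 * x[0], -np.sin(x[1])]])

a, b = np.array([0.0, 1.0]), np.array([1.0, 2.0])
lhs = np.linalg.norm(f(b) - f(a))

# Estimate sup_c ||f'(c)|| along the segment by sampling; the Euclidean
# operator norm of a matrix is its largest singular value.
sup_norm = max(np.linalg.norm(jac(a + t * (b - a)), 2)
               for t in np.linspace(0.0, 1.0, 101))
print(lhs <= sup_norm * np.linalg.norm(b - a))   # True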

PROBLEMS: GROUP 1

(1) Find the derivatives of the following mappings:
(i) $f: \mathbb{R}^3 \to \mathbb{R}^3$ where $f(x) = \bigl(x_1 + x_2, \; x_2^3 + x_3^2, \; x_1 x_3\bigr)$;
(ii) $f: \mathbb{R}^2 \to \mathbb{R}^1$ where $f(x) = \sin(x_1^2 + x_2^2)$;
(iii) $f: \mathbb{R}^1 \to \mathbb{R}^2$ where $f(x) = \bigl(\cos(\sin x), \; e^{\sin x}\bigr)$.

(2) Find the following iterated partial derivatives:
(i) $\partial_1\partial_2 \sin(x_1 x_2)$, (ii) $\partial_1^2 \sin(x_1 x_2)$, (iii) $\partial_1\partial_2\partial_3\, e^{x_1 + x_2 + x_3}$.

(3) Find $\dfrac{\partial^2}{\partial r^2} f(r\cos\theta, r\sin\theta)$ where $f(x_1, x_2)$ is a twice continuously differentiable function of two variables.

(4) Find the local maxima and minima of the function $f(x) = \log(x_1^2 + x_2^2 + 1)$ in the plane $\mathbb{R}^2$.

(5) Show that the function $f(x_1, x_2) = (x_1^2 + 2x_2^2)\, e^{-x_1^2 - x_2^2}$ has a minimal and a maximal value in $\mathbb{R}^2$; find the points at which these extrema are attained, and the values of the function $f$ at those points.

(6) Find the directional derivative $\nabla_u \bigl(x_1^2 x_2 + \sin(x_1 x_3)\bigr)$ at the point $(0, 1, 2)$ in the direction of the unit vector $u = \frac{1}{\sqrt{14}}(1, 2, 3)$.

(7) Find the directions in which the function $f(x) = x_1 x_2 + x_2 x_3 + x_3 x_1$ in $\mathbb{R}^3$ is increasing and decreasing most rapidly at the point $(1, 1, 2)$.

PROBLEMS: GROUP 2

(8) Show that the function $f(x) = \sqrt{|x_1 x_2|}$ (the non-negative square root) is not differentiable at the origin. Show that a function $f(x)$ such that $|f(x)| \le \|x\|_2^2$ in an open neighborhood of the origin is differentiable at the origin, and find its derivative at the origin. If a function $f(x)$ satisfies $|f(x)| \le \|x\|_2$ in an open neighborhood of the origin, is it necessarily differentiable at the origin? (Either show that it is differentiable or give a counterexample.)

(9) If $f_1, f_2$ are $C^1$ functions defined in an open neighborhood of the point $(2,1)$ in $\mathbb{R}^2$ such that $f_1(2,1) = 4$ and $f_2(2,1) = 3$, and such that
$f_1^2 + f_2^2 + 4 = 29 \quad\text{and}\quad \dfrac{f_1^2}{x_1^2} + \dfrac{f_2^2}{x_2^2} + 4 = 17,$
find $\partial_1 f_2(2,1)$ and $\partial_2 f_1(2,1)$.

(10) If $f$ is a $C^1$ real-valued function in $\mathbb{R}^2$ such that $f(1,2) = f(1,4) = f(2,1) = f(4,1) = 1$ and
$\partial_1 f(1,1) = 1$, $\partial_1 f(1,2) = 1$, $\partial_1 f(1,4) = 3$, $\partial_1 f(2,1) = 4$, $\partial_1 f(4,1) = 2$,
$\partial_2 f(1,1) = 1$, $\partial_2 f(1,2) = 5$, $\partial_2 f(1,4) = 1$, $\partial_2 f(2,1) = 2$, $\partial_2 f(4,1) = 3$,
find $\partial_1 g(1,2)$ where $g(x_1, x_2) = f\bigl(f(x_1, x_2), f(x_2, x_1)\bigr)$.

(11) If $g$ is a $C^1$ function in $\mathbb{R}^2$ and $f(x_1, x_2, x_3) = g\bigl(x_1, g(x_1, g(x_2, x_3))\bigr)$, find the derivative of the function $f$ in terms of the partial derivatives of the function $g$.

(12) Show that the set $\mathcal{M}$ of real $2 \times 2$ matrices of the form
$M(a,b) = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$
form a real algebra, in the sense that the sum and product of two matrices of this form have the same form. Show that the mapping that associates to a complex number $z = a + ib \in \mathbb{C}$ the matrix $M(z) = M(a,b) \in \mathcal{M}$ has the property that $M(z_1 + z_2) = M(z_1) + M(z_2)$ and $M(z_1 z_2) = M(z_1) M(z_2)$. (The algebra of such mat