Math Course Notes-4


Description: Course notes on linear algebra and calculus.

Transcript of Math Course Notes-4

Page 1: Math Course Notes-4

Introductory Mathematics

1 Linear Algebra

Reference: Gilbert Strang, Introduction to Linear Algebra, Wellesley-Cambridge Press, Third Edition. Chapters 1-7.

1.1 Vectors (read Sections 1.1, 3.1, 3.5 of Strang’s book)

1.1.1 Rn as a vector space

Definition: Rn is the most important example of a vector space; its elements are called vectors and are represented as

v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad x_1, x_2, \ldots, x_n \in \mathbb{R}.

Remark: Observe that we are using the notation of a column vector and its transpose is a row vector:

vT = (x1, x2, ..., xn).

Example: R (real line), R2 (plane, planar vectors), R3 (whole space), etc.

Operations: Rn has two operations, the addition of two vectors:

v + u = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix},

and the multiplication of a vector by a scalar λ ∈ R:

\lambda v = \lambda \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \lambda x_1 \\ \lambda x_2 \\ \vdots \\ \lambda x_n \end{pmatrix}.
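As a quick numerical illustration of these two operations (a sketch using Python with numpy, which is not part of the course text; the particular vectors are arbitrary):

```python
import numpy as np

# Two vectors of R^3, stored as 1-D arrays with components x1, x2, x3.
v = np.array([1.0, 2.0, 3.0])
u = np.array([4.0, 5.0, 6.0])

# Addition and multiplication by a scalar act componentwise,
# exactly as in the two formulas above.
print(v + u)                 # [5. 7. 9.]
print(2.5 * v)               # [2.5 5.  7.5]

# A linear combination lambda*v + mu*u.
lam, mu = 2.0, -1.0
print(lam * v + mu * u)      # [-2. -1.  0.]
```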


Page 2: Math Course Notes-4

These operations satisfy the usual distributive and associative laws. When we combine both operations we say that we form linear combinations of u and v: λv + µu.

Moreover, there is a unique zero vector 0 such that 0 + v = v for all v ∈ Rn:

0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Finally, for all v, there exists a unique −v such that v + (−v) = 0.

Definition: In general, a real vector space is a collection of vectors together with the rules of vector addition and multiplication by a real number.

Examples: The vector space of all real functions f(x); in this case the 'vectors' are functions. The vector space that consists only of the zero vector; we call it the vector space of dimension 0 since it only has the zero element.

1.1.2 Subspaces of Rn

Example: Consider the vector space R3 and a plane through 0. This plane is also a vector space. We say that it is a subspace of R3.

Definition: A subspace V of a vector space is a set of vectors that satisfies the following two conditions:

1. If u, v ∈ V then u + v ∈ V.

2. If v ∈ V and λ ∈ R then λv ∈ V.

Remark: Observe that these two conditions are equivalent to saying that all linear combinations stay in the subspace.

Remark: Observe that the zero vector is always in the subspace. Therefore, a plane in R3 that does not contain 0 is never a subspace.

1.1.3 Linearly dependent and independent vectors

Remark: A subspace that contains u and v must contain all linear combinations.


Page 3: Math Course Notes-4

Definition: If we choose k vectors v1,...,vk the set of all linear combinations

{λ1v1 + · · ·+ λkvk, λ1, ..., λk ∈ R}

forms a subspace called the subspace spanned by the vectors v1, ..., vk.

Definition: The dimension of a subspace is the smallest k such that k vectors span the subspace.

Definition: k vectors v1, ..., vk in Rn are linearly dependent if there exist real numbers λ1, ..., λk, not all zero, such that

λ1v1 + · · ·+ λkvk = 0.

On the other hand, if the equation λ1v1 + · · ·+ λkvk = 0 has only the solution λ1 = · · · = λk = 0, then v1, ..., vk are linearly independent.

Remark: Observe that λ1v1 + · · ·+ λkvk = 0 is a system of n equations in k unknowns.

Example: Two vectors are linearly dependent if one is a multiple of the other, that is, they lie on the same line.

Example: In R3, 3 vectors are linearly independent if they do not lie in the same plane, but if they do then they are linearly dependent.
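A numerical way to test this definition (a sketch; numpy's rank function anticipates the notion of rank introduced in Section 1.2.2 below) is to place v1, ..., vk as the columns of a matrix and check whether the homogeneous system above has only the trivial solution:

```python
import numpy as np

def linearly_independent(vectors):
    """True if the given vectors of R^n are linearly independent.

    The vectors are placed as columns of a matrix A; they are independent
    exactly when lambda_1 v_1 + ... + lambda_k v_k = 0 has only the trivial
    solution, i.e. when the rank of A equals k.
    """
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# Three vectors of R^3 lying in the same plane (the third is v1 + v2).
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + v2
print(linearly_independent([v1, v2]))        # True
print(linearly_independent([v1, v2, v3]))    # False
```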

Elementary properties:

• If one of the k vectors v1, ..., vk is zero, then the vectors are linearly dependent.

• If two vectors are equal in v1, ..., vk, then they are linearly dependent.

• If v1, ..., vk are linearly independent, then any subcollection is also linearly independent.

• If v1, ..., vk are linearly dependent, then the family v1, ..., vk, u is also linearly dependent, for any u.

• v1, ..., vk are linearly dependent if and only if one of them equals a linear combination of the others.

• If v1, ..., vk are linearly independent (or dependent) and we add a multiple of one of the vectors to another one, then the resulting family is also linearly independent (or dependent).


Page 4: Math Course Notes-4

1.1.4 Basis of a subspace

Definition: k vectors v1, ..., vk form a basis of a subspace V of Rn if they are linearly independent and they span V.

Example: The n vectors

e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}

form a basis of Rn. This is called the standard or canonical basis of Rn.

Remark: Rn has infinitely many bases.

Theorem: If v1, ..., vk forms a basis of a subspace V, then every v ∈ V can be uniquely written as a linear combination of the basis. In other words, there exist unique λ1, ..., λk ∈ R such that

v = λ1v1 + · · ·+ λkvk.

The λ1, ..., λk are called the coordinates of v in the basis.

Theorem: If v1, ..., vk spans a subspace V, then some subcollection of it is a basis of V. Therefore, a basis is formed by the smallest number of vectors that span V and are linearly independent.

Theorem: If V has a basis of k vectors, then any other basis also has k vectors.

Definition: The number of vectors in a basis is called the dimension of the subspace.

Example: Rn has dimension n, since the standard basis has n elements.

Change of basis rule: Assume that we have two bases B = {u1, ..., uk} and B′ = {v1, ..., vk} of the same subspace V. Let w ∈ V have coordinates λ1, ..., λk in the basis B and coordinates µ1, ..., µk in the basis B′. Then we have the following rule:

P \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix},

where P is the matrix whose columns are the coordinates of each element of B in the basis B′.
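A small sketch of the change of basis rule in R2 (the bases here are illustrative choices; B′ is taken to be the standard basis, so the columns of P are simply the vectors of B):

```python
import numpy as np

# B = {u1, u2} and B' = the standard basis of R^2.
u1 = np.array([1.0, 1.0])
u2 = np.array([1.0, -1.0])

# P has as columns the coordinates of u1 and u2 in the basis B'
# (with B' the standard basis these are u1 and u2 themselves).
P = np.column_stack([u1, u2])

lam = np.array([2.0, 3.0])     # coordinates of w in the basis B
mu = P @ lam                   # coordinates of w in the basis B'
print(mu)                      # [ 5. -1.], i.e. w = 2*u1 + 3*u2 = (5, -1)

# Going back from B' to B amounts to solving P * lambda = mu.
print(np.linalg.solve(P, mu))  # [2. 3.]
```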


Page 5: Math Course Notes-4

1.2 Matrices (read Sections 7.1, 7.2, 7.3, 2.4, 2.5, 2.7, 3.2, 3.3, 3.4 of Strang’s book)

1.2.1 Linear transformations in Rn

Definition: Let U ⊂ Rn and V ⊂ Rm be two subspaces. A transformation T : U → V assigns a vector v ∈ V to each vector u ∈ U, that is, T(u) = v. The transformation T is called linear if for all u1, u2, u ∈ U and λ ∈ R,

T(u1 + u2) = T(u1) + T(u2) and T(λu) = λT(u).

U is called the domain and V is called the range or image of T.

Example: rotations of vectors in the plane.

Remark: Observe that the range of T defined as V = {T(u), u ∈ U} is a subspace of Rm.

Definition: The kernel of T is the set of vectors K = {u ∈ U : T(u) = 0}.

Remark: The kernel of T is a subspace of U.

Matrix of a linear transformation: We can assign a matrix to every linear transformation. Consider the standard bases of U (of dimension k) and V (of dimension ℓ). Then T is completely determined by the values of T on the standard basis of U. That is, T is completely determined by the matrix with ℓ rows and k columns, where each column is the image of an element of the standard basis of U.

Remark: The same T can be represented by other matrices using different bases of U and V. If B is a basis of U and B′ is a basis of V, then the matrix of T in these bases has as columns the coordinates of the images of the elements of B in the basis B′.

Change of basis rule: One matrix can be computed from the other using the product rule

T_{B,B′} = (P_{B′,S})^{-1} T_{S,S} P_{B,S},

where PB,S is the matrix whose columns are the elements of the basis B, and S denotes the standard basis.
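A sketch of this product rule for a concrete transformation (a rotation of the plane by 90 degrees, with an illustrative basis B = B′):

```python
import numpy as np

# T = rotation of R^2 by 90 degrees, written in the standard basis S.
T_SS = np.array([[0.0, -1.0],
                 [1.0,  0.0]])

# A second basis B of R^2; here we also take B' = B.
P_BS = np.column_stack([[1.0, 1.0], [1.0, -1.0]])   # elements of B as columns

# Product rule: T_{B,B'} = (P_{B',S})^{-1} T_{S,S} P_{B,S}.
T_BB = np.linalg.inv(P_BS) @ T_SS @ P_BS
print(T_BB)                    # [[ 0.  1.] [-1.  0.]]

# Consistency check: T maps u1 = (1, 1) to (-1, 1), whose coordinates in B
# are (0, -1) -- exactly the first column of T_BB.
print(T_SS @ P_BS[:, 0])       # [-1.  1.]
```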

1.2.2 Matrices

Definition: An m × n array of numbers is called a matrix. m is the number of rows, n is the number of columns, and we denote by ai,j the entry in row i and column j.


Page 6: Math Course Notes-4

Example:

A = \begin{pmatrix} 3 & 2 & 3 \\ 0 & -1 & 0 \end{pmatrix}

Here A is a 2 × 3 matrix and a2,1 = a2,3 = 0.

Definition: An n × n matrix is called a square matrix.

The zero m × n matrix is the matrix that has all entries equal to zero.

Matrix operations: Addition of matrices: If A and B are m × n matrices, then A + B is the m × n matrix with entries ai,j + bi,j.

Multiplication by a scalar: If A is an m × n matrix and λ ∈ R, then the matrix λA is m × n with entries λai,j.

Notation: −A = (−1)A and A − B = A + (−B).

Remark: The set of m × n matrices is a vector space of dimension mn.

Matrix multiplication: We can only multiply two matrices A and B if the number of columns of A is the same as the number of rows of B. So let A be an m × n matrix and B an n × p matrix. Then C = AB is the m × p matrix whose entries are computed as

ci,j = ai,1b1,j + ai,2b2,j + · · ·+ ai,nbn,j.

Properties: (AB)C = A(BC), A(B + C) = AB + AC and (A + B)C = AC + BC.
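The entry formula can be implemented directly and compared with numpy's built-in product (the matrices below are arbitrary illustrations):

```python
import numpy as np

A = np.array([[3.0,  2.0, 3.0],
              [0.0, -1.0, 0.0]])      # 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 4.0]])            # 3 x 2
D = np.array([[1.0, -1.0],
              [2.0,  0.0]])           # 2 x 2

# c_{i,j} = a_{i,1} b_{1,j} + ... + a_{i,n} b_{n,j}
m, n = A.shape
p = B.shape[1]
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

print(np.allclose(C, A @ B))                    # True: matches numpy's product
print(np.allclose((A @ B) @ D, A @ (B @ D)))    # True: (AB)D = A(BD)
```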

Warning: AB ≠ BA in general, even if both products are meaningful.

Definition: The n × n identity matrix is defined as

I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.

Properties: If A is an m × n matrix then AIn = A, and if B is an n × m matrix then InB = B. Therefore, if A is an n × n matrix, AIn = InA = A.

Definition: If A is an m × n matrix with entries ai,j then its transpose AT is the n × m matrix with entries aj,i.

Definition: A square matrix is symmetric if AT = A.


Page 7: Math Course Notes-4

Properties: (AT)T = A, (A+B)T = AT +BT, (λA)T = λAT and

(AB)T = BTAT.

Definition: We say that a square matrix A is invertible if there exists a matrix A−1, called the inverse of A, such that

AA−1 = A−1A = In.

Properties:

1. If A−1 exists, it is unique.

2. If A−1 exists, then A−1 is invertible and (A−1)−1 = A.

3. If A and B are invertible then AB is invertible and (AB)−1 = B−1A−1.

4. If A is invertible then AT is invertible and (AT)−1 = (A−1)T.

5. (λA)−1 = (1/λ)A−1 if λ ≠ 0.

Definition: A square matrix A is said to be orthogonal if A−1 = AT.

Definition: The column rank of an m × n matrix A is the number of linearly independent vectors among the columns of A. We define the row rank of A in the same way.

Remark: The column rank does not change if we add to a column a multiple of another column, or if we multiply a column by a non-zero scalar.

Theorem: column rank = row rank.

Definition: The rank of a matrix is its column rank.

1.2.3 Systems of linear equations

Definition: Let A be an m × n matrix, x a vector (variable) in Rn and b a vector (given) in Rm. Then the equation Ax = b is a system of m linear equations in n variables.

If b = 0 then the system is called homogeneous; otherwise it is called inhomogeneous.

Definition: We define the augmented matrix of A as the m × (n + 1) matrix whose columns are the columns of A plus the vector b.


Page 8: Math Course Notes-4

Theorem: A linear system Ax = b has a solution if and only if the rank of A equals the rank of the augmented matrix.

Remark: Observe that a homogeneous system always has the solution x = 0.

Remark: Observe that if the rank of the augmented matrix is n + 1 (that is, the columns of A and b form a linearly independent family), then there is no solution to the system Ax = b.
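A sketch of this rank criterion in numpy (matrix_rank is a numerical rank estimate; the matrices are illustrative):

```python
import numpy as np

def has_solution(A, b):
    """Check solvability of Ax = b: rank(A) must equal the rank of (A | b)."""
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # rank 1: the second row is twice the first
print(has_solution(A, np.array([1.0, 2.0])))   # True:  b lies in the column space
print(has_solution(A, np.array([1.0, 0.0])))   # False: rank of (A | b) jumps to 2
```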

Theorem: Every solution to the system Ax = b can be written as x = xh + xp, where xh is a solution to the homogeneous system Ax = 0, and xp is a particular solution of Ax = b.

Corollary: If Ax = b has a solution, then the number of solutions is the same as the number of solutions to Ax = 0.

Theorem: Ax = 0 has a unique solution (x = 0) if and only if the columns of A are linearly independent. In other words, if and only if rank(A) = n.

Remark: If m < n (fewer equations than variables), then there cannot be more than m linearly independent columns, and the system cannot have a unique solution.

Theorem: The set of solutions of Ax = 0 is a subspace of Rn of dimension n − rank(A).

Theorem: An n × n square matrix is invertible if and only if rank(A) = n.

1.3 Determinants (read Chapter 5 of Strang’s book)

1.3.1 Properties of the determinants

Definition: The determinant is a function det(A) that associates a number to every n × n square matrix A, and that has the following properties:

1. det(In) = 1,

2. det(a1, ..., λai, ..., an) = λ det(A), λ ∈ R,

3. det(a1, ..., ai + b, ..., an) = det(A) + det(a1, ..., b, ..., an), b ∈ Rn,

4. det(A) = 0 if two of the columns of A are equal,

where a1, ..., an are the column vectors of A.

Theorem: These properties uniquely define the determinant.


Page 9: Math Course Notes-4

Geometric interpretation: det(A) is the volume of the parallelepiped determined by the column vectors of A.

Properties:

• If one of the columns of A is 0, then the determinant of A is 0.

• If we swap two columns of A then we change the sign of the determinant.

• If we add to a column a multiple of another column, this does not change the value of the determinant.

• If A is triangular then the determinant equals the product of the diagonal terms. Therefore,

\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \det\begin{pmatrix} a & b \\ 0 & d - (bc)/a \end{pmatrix} = ad - bc.

• det(A) = det(AT). Therefore, every property involving columns remains valid for rows.

• det(AB) = det(A) det(B).

• det(A−1) = 1/det(A). Therefore, if A is invertible, then det(A) ≠ 0.

Theorem: If A is an n × n matrix then

det(A) ≠ 0 ⇔ rank(A) = n.

1.3.2 Cofactors, inverse and Cramer’s rule

Definition: The minor matrix of an n × n matrix A is the (n − 1) × (n − 1) matrix Ai,j obtained by deleting row i and column j of A. det(Ai,j) is called a minor of A, and the det(Ai,i) are the principal minors. The numbers ci,j = (−1)^{i+j} det(Ai,j) are called the cofactors of A. The matrix C whose entries are the cofactors ci,j of A is called the cofactor matrix.

Theorem: For all j = 1, ..., n, we have the cofactor expansion formula

\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{i,j} \det(A_{i,j}).


Page 10: Math Course Notes-4

Remark: Observe that in order to apply this formula we need to choose some j from 1, ..., n.
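A direct recursive implementation of this expansion (expanding along column j, 0-based here), cross-checked against numpy's determinant; the test matrix is arbitrary:

```python
import numpy as np

def det_by_cofactors(A, j=0):
    """Determinant of a square matrix via cofactor expansion along column j."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        # Minor matrix A_{i,j}: delete row i and column j.
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        total += (-1) ** (i + j) * A[i, j] * det_by_cofactors(minor)
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 1.0, 1.0]])
print(det_by_cofactors(A), np.linalg.det(A))   # both equal -3 (up to rounding)
```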

Theorem: The rank of an n × n matrix A equals the size of the largest minor matrix with a non-zero determinant.

Formula for the inverse of a matrix: If det(A) ≠ 0, then

A^{-1} = \frac{1}{\det(A)} C^{T}.

Remark: This is a simple formula but computationally very inefficient. In practice it is better to use elementary row operations, that is, swapping two rows, multiplying a row by a non-zero scalar, and adding a multiple of a row to another. Starting from the n × 2n matrix (A : In), we proceed by elementary row operations until we arrive at (In : A−1).

Theorem: If A is an n × n matrix with det(A) ≠ 0, then the linear system Ax = b has a unique solution given by x = A−1b.

Cramer’s rule: The solution to this system can also be written as

x_1 = \frac{d_1}{\det(A)}, \quad \ldots, \quad x_n = \frac{d_n}{\det(A)},

where di is the determinant of the matrix obtained by replacing column i of A by the vector b.
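A short sketch of Cramer's rule for a 2 × 2 system (illustrative numbers), checked against a direct solve:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (assumes det(A) != 0)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                   # replace column i of A by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer(A, b))                    # [0.8 1.4]
print(np.linalg.solve(A, b))           # same solution
```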

1.4 Rn as a Euclidean space (read Section 1.2, Chapter 4 of Strang’s book)

1.4.1 The inner product and norm

Definition: The inner product of two vectors u,v ∈ Rn is defined as

u · v = u1v1 + · · ·+ unvn.

Remark: Using matrix notation, u · v = uTv.

Properties:

1. u · v = v · u.

2. (λu) · v = λ(u · v).

3. (u + v) ·w = (u ·w) + (v ·w).


Page 11: Math Course Notes-4

4. u · u ≥ 0 and u · u = 0 if and only if u = 0.

Any operation in a vector space satisfying these properties is called an inner product, and a vector space equipped with an inner product is called a Euclidean space.

Cauchy-Schwarz inequality: (u · v)2 ≤ (u · u)(v · v).

Definition: The norm of a vector v ∈ Rn is defined as

‖v‖ = √(v · v).

Geometric interpretation: The norm of a vector corresponds to the distance to 0.

Properties:

1. ‖λv‖ = |λ| ‖v‖ for all λ ∈ R.

2. ‖u + v‖ ≤ ‖u‖+ ‖v‖.

3. ‖v‖ ≥ 0 and ‖v‖ = 0 if and only if v = 0.
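A numerical sketch of the inner product, the norm, and the two inequalities above (Cauchy-Schwarz and the triangle inequality), with arbitrary vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

dot = u @ v                         # inner product u . v = u^T v
norm_u = np.sqrt(u @ u)             # ||u|| = sqrt(u . u)
norm_v = np.linalg.norm(v)          # the same norm via numpy

print(dot, norm_u, norm_v)          # 11.0 3.0 5.0

# Cauchy-Schwarz: (u . v)^2 <= (u . u)(v . v)
print(dot**2 <= (u @ u) * (v @ v))                 # True (121 <= 225)
# Triangle inequality: ||u + v|| <= ||u|| + ||v||
print(np.linalg.norm(u + v) <= norm_u + norm_v)    # True
```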

1.4.2 Orthogonality

Definition: Two vectors u and v in Rn are orthogonal if u · v = 0.

Geometric interpretation: They form an angle θ = 90◦. In fact,

u · v = ‖u‖ ‖v‖ cos(θ).

Definition: If ‖v‖ = 1, then v is called a unit vector.

Theorem: If v1, ..., vk are pairwise orthogonal and different from 0, then they are linearly independent.

Definition: Any collection of n pairwise orthogonal vectors in a Euclidean space of dimension n forms a basis called an orthogonal basis. If moreover the vectors are unit vectors, the basis is called orthonormal.

Remark: We can always obtain an orthonormal basis from an orthogonal one by dividing each vector by its norm.

Theorem: Every Euclidean space has an orthogonal basis (and therefore an orthonormal one). Proof: the Gram-Schmidt process.
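A minimal sketch of the Gram-Schmidt process mentioned in the proof: each vector has its components along the previously constructed vectors removed and is then normalized (the input vectors are an illustrative basis of R3).

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors of R^n."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for q in basis:
            w -= (w @ q) * q           # remove the component along q
        basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
Q = np.column_stack(gram_schmidt(vs))
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal
```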

Example: The standard basis is an orthonormal basis of Rn.


Page 12: Math Course Notes-4

Theorem: If v1, ..., vn is an orthonormal basis of an n-dimensional vector space, then the coordinates of every vector v in the space with respect to this basis are given by

v = (v · v1)v1 + · · ·+ (v · vn)vn.

Projection of a vector onto a subspace: Consider a basis v1, ..., vk of a subspace of dimension k in Rn. The vector v in the subspace that is closest to a given vector b in Rn is called the orthogonal projection of b onto the subspace. Note that since v is in the subspace, it can be written in a unique way as a linear combination of the basis v1, ..., vk. Therefore, we are looking for the coordinates of v in the basis.

Solution to this problem: if v1, ..., vk forms an orthonormal basis of the subspace, then the orthogonal projection of b onto the subspace is given by

v = (b · v1)v1 + · · ·+ (b · vk)vk.

Least squares approximations: Consider a linear system Ax = b that has no solution. Then we want to find a vector e with minimal ‖e‖ and such that the system

Ax + e = b

has a solution. In this case, a solution x is called a least squares solution.

Remark: this problem can be seen as a particular case of the projection of a vector onto a subspace, where the subspace is the set of all vectors Ax, and we want to find x∗ that minimizes the norm ‖Ax∗ − b‖. Then Ax∗ is the orthogonal projection of b onto the subspace. That is, x∗ is such that

(Ax∗ − b) · Ax = 0, for all x.

This is equivalent to solving the linear system of equations

ATb = ATAx∗,

where we observe that ATA is a symmetric matrix.

Example: Fit a line to n data points.
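A sketch of this line-fitting example using the normal equations ATAx∗ = ATb (the data points are made up; numpy's least squares routine is used as a cross-check):

```python
import numpy as np

# Data points (t_i, y_i); we fit a line y = c0 + c1 * t.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Row i of A is (1, t_i), so the overdetermined system is A c = y.
A = np.column_stack([np.ones_like(t), t])

# Normal equations: A^T A c* = A^T y.
c_star = np.linalg.solve(A.T @ A, A.T @ y)
print(c_star)                                  # approx [1.09 0.94]

# numpy's built-in least squares gives the same coefficients.
print(np.linalg.lstsq(A, y, rcond=None)[0])
```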


Page 13: Math Course Notes-4

1.5 Eigenvalues and eigenvectors (read Sections 6.1,6.2,6.4,6.5 of Strang’s book)

1.5.1 Introduction to eigenvalues and eigenvectors

Definition: Let A be an n × n matrix. If x ∈ Rn, x ≠ 0 and λ ∈ R are such that

Ax = λx,

then we say that x is an eigenvector of A with eigenvalue λ.

Remark: The eigenvectors with eigenvalue 0 are the non-zero elements in the kernel of A.

Property: If x is an eigenvector of A with eigenvalue λ then cx is also an eigenvector of A with eigenvalue λ for all c ≠ 0.

The equation for the eigenvalues: since Ax = λx ⇔ (A − λIn)x = 0, this implies that λ is an eigenvalue of A ⇔ the columns of the matrix (A − λIn) are linearly dependent ⇔ rank(A − λIn) < n ⇔ det(A − λIn) = 0.

Definition: The equation det(A − λIn) = 0 is called the characteristic equation of A, and it is a polynomial equation of degree n. Therefore, there are at most n eigenvalues of A, but some of them may be complex.

Remark: It can be proved that a symmetric matrix has only real eigenvalues.
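A numerical sketch for a small symmetric matrix (chosen for illustration): the eigenvalues appear both as roots of the characteristic polynomial and as the output of numpy's eigensolver.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # symmetric, so its eigenvalues are real

# Characteristic equation det(A - lambda*I) = lambda^2 - 4*lambda + 3 = 0.
coeffs = np.poly(A)                  # coefficients of the characteristic polynomial
print(coeffs)                        # [ 1. -4.  3.]
print(np.roots(coeffs))              # eigenvalues 3 and 1

# The eigensolver returns the same eigenvalues, plus eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                       # 3 and 1 (order may differ)
x, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(A @ x, lam * x))   # True: A x = lambda x
```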

1.5.2 Diagonalization of a matrix

Definition: We say that an n × n matrix A is diagonalizable if there exists an invertible matrix S and a diagonal matrix D such that

A = SDS−1,

which is equivalent to S−1AS = D.

Application: If A is diagonalizable then we can easily compute its square:

A2 = AA = SDS−1SDS−1 = SD2S−1,

and D2 is the diagonal matrix whose diagonal entries are the squares of the diagonal entries of D. By induction, we get that Am = SDmS−1.

Application: eA = ∑_{k=0}^{∞} Ak/k! = In + A + A2/2! + · · · = SeDS−1, and eD is the diagonal matrix whose diagonal entries are the exponentials of the diagonal entries of D.


Page 14: Math Course Notes-4

Property: If A is diagonalizable, then det(A) = det(D), which equals the product of the diagonal elements of D.

Spectral Theorem: A is diagonalizable ⇔ A has n linearly independent eigenvectors v1, ..., vn with eigenvalues λ1, ..., λn.

In this case v1, ..., vn are the columns of S and λ1, ..., λn are the diagonal of D.

Consequence: If A is diagonalizable, then det(A) equals the product of the eigenvalues.

Remark: If A has n different eigenvalues λ1, ..., λn, then the corresponding eigenvectors v1, ..., vn are linearly independent. Therefore, in this case A is diagonalizable.

Remark: S is not unique.

Theorem: A diagonalizable matrix is invertible if and only if all its eigenvalues are non-zero. In this case, A−1 = SD−1S−1, where D−1 is the diagonal matrix with the inverses of the elements in the diagonal of D.

Theorem: If A is a symmetric matrix and λ1 and λ2 are two different eigenvalues, then the corresponding eigenvectors x and y are orthogonal.

Spectral Theorem for symmetric matrices: Let A be an n × n symmetric matrix. Then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ−1. The columns of the matrix Q can be taken as the orthonormal eigenvectors and the diagonal of D are the corresponding eigenvalues.

Definition: The trace of a matrix is the sum of the elements in the diagonal.

Properties: tr(λA) = λ tr(A), tr(AT) = tr(A), tr(A + B) = tr(A) + tr(B), tr(In) = n, tr(AB) = tr(BA).

Property: If A is diagonalizable then the trace of A is the sum of the eigenvalues.

Definition: A square matrix A is idempotent if A2 = A.

Property: If A is idempotent and λ is an eigenvalue of A then λm is also an eigenvalue of A for all m ∈ N.

Theorem: If A is an idempotent matrix, the only eigenvalues of A are zero or one.


Page 15: Math Course Notes-4

1.5.3 Quadratic forms

Definition: An n × n symmetric matrix A is said to be positive definite if

xTAx > 0, for all x ≠ 0.

Negative definite if xTAx < 0 for all x ≠ 0.
Positive semidefinite if xTAx ≥ 0 for all x ≠ 0.
Negative semidefinite if xTAx ≤ 0 for all x ≠ 0.
Indefinite if both xTAx > 0 and xTAx < 0 are possible.

Theorem: A is positive definite if and only if all the eigenvalues of A are > 0. A is negative definite if and only if all the eigenvalues of A are < 0. A is positive semidefinite if and only if all the eigenvalues of A are ≥ 0. A is negative semidefinite if and only if all the eigenvalues of A are ≤ 0. A is indefinite if and only if A has two eigenvalues of different sign.

Theorem: A is positive definite if and only if the leading principal minors dk are positive for all k = 1, ..., n, where dk is the determinant of its upper-left k × k submatrix. A is negative definite if and only if (−1)k dk > 0 for all k = 1, ..., n.

Definition: A quadratic form Q(x1, ..., xn) is a function of the form

Q(x_1, \ldots, x_n) = x^T A x = \sum_{i,j=1}^{n} a_{i,j} x_i x_j,

where A is an n × n matrix and x = (x1, ..., xn)T.

Remark: Since the coefficient of xixj is ai,j + aj,i, the quadratic form doesn't change if we replace both coefficients by (ai,j + aj,i)/2, which corresponds to assuming that A is symmetric.
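A sketch of both tests for positive definiteness (eigenvalues and leading principal minors) on an illustrative symmetric matrix:

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])    # symmetric and positive definite

# Test 1: all eigenvalues are > 0.
print(np.all(np.linalg.eigvalsh(A) > 0))              # True

# Test 2: all leading principal minors d_k are > 0.
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(minors)                                         # [2.0, 3.0, 4.0]
print(all(d > 0 for d in minors))                     # True

# Consequently the quadratic form x^T A x is positive for any x != 0.
x = np.array([1.0, -2.0, 3.0])
print(x @ A @ x > 0)                                  # True
```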


Page 16: Math Course Notes-4

2 Calculus in one variable

References: Domingo Pestana, José Manuel Rodríguez, Elena Romera and Venancio Álvarez, Curso práctico de cálculo y precálculo, Ariel Ciencia.

Ron Larson and Bruce H. Edwards, Calculus, Brooks Cole.

2.1 Sequences in R (Chapter 5 of Pestana’s book)

Definition: A sequence of real numbers assigns a number xn to each natural number n = 1, 2, 3, ...

Notation: {xn}n∈N = {x1, x2, x3, ..., xn, ...}

Remark: A sequence may be defined by a formula or by a recursion.

Example: Fibonacci sequence: x1 = x2 = 1, xn = xn−1 + xn−2, n = 3, 4, ...

Definition: A sequence is monotone increasing if for every n ∈ N, xn+1 ≥ xn, and strictly monotone increasing if xn+1 > xn.

A sequence is monotone decreasing if for every n ∈ N, xn+1 ≤ xn, and strictly monotone decreasing if xn+1 < xn.

A sequence is monotone if it is monotone increasing or monotone decreasing.

Definition: A sequence is bounded if there exists a real number M > 0 such that for all n ∈ N, xn ∈ [−M, M].

Definition: A sequence converges to a real number x ∈ R if for all ε > 0 there exists N(ε) > 0 such that for all n > N, |xn − x| < ε.

Notation: limn→∞ xn = x.

Definition: A sequence is said to be convergent if there exists x ∈ R such that limn→∞ xn = x. Otherwise, it is called divergent.

Remark: A divergent sequence may have a limit +∞ or −∞.

Definition: A sequence converges to +∞ (or −∞) if for all A > 0 there exists N(A) > 0 such that for all n > N, xn > A (or xn < −A).

Notation: limn→∞ xn = +∞ or −∞.

Property: If limn→∞ xn = x and limn→∞ yn = y then

\lim_{n\to\infty}(x_n + y_n) = x + y, \qquad \lim_{n\to\infty} x_n y_n = xy, \qquad \lim_{n\to\infty} \frac{x_n}{y_n} = \frac{x}{y},

the last equality being true only if yn, y ≠ 0.


Page 17: Math Course Notes-4

Criteria of convergence:

Theorem: If limn→∞ xn = limn→∞ yn = x, and for all n ∈ N, xn ≤ an ≤ yn, then limn→∞ an = x.

Remark: This result is also true if x = +∞ or −∞.

Theorem: Every bounded and monotone sequence is convergent.

Remark: Observe that every convergent sequence is bounded.

Definition: A sequence is Cauchy if for all ε > 0 there exists N(ε) > 0 such that for all n, m > N, |xn − xm| < ε.

Theorem: A sequence is convergent if and only if it is a Cauchy sequence.

Subsequences:

Definition: Let n1 < n2 < n3 < · · · be a strictly monotone increasing sequence of positive integers. Then the sequence {xnk}k∈N = {xn1, xn2, ...} is called a subsequence of {xn}.

Property: If limn→∞ xn = x, then limk→∞ xnk = x for any subsequence.

Theorem (Bolzano-Weierstrass): Every bounded sequence has a convergent subsequence.

Infimum and supremum:

Definition: Let A ⊂ R be a subset of real numbers. Then the infimum of A is the number a in R satisfying:

• a ≤ x for all x ∈ A.

• for any ε > 0 there exists x ∈ A such that a + ε > x.

Notation: inf A = a and if no such a exists then inf A = −∞.

The supremum of A is the number a in R satisfying:

• a ≥ x for all x ∈ A.

• for any ε > 0 there exists x ∈ A such that a − ε < x.

Notation: sup A = a and if no such a exists then sup A = +∞.

Definition: inf xn = inf{xn} and sup xn = sup{xn}.

Remark: If a sequence xn is monotone decreasing and bounded below then limn→∞ xn = inf xn.

Subsequential limits:

Definition: A number a is called a subsequential limit of a sequence xn if there exists a subsequence convergent to a.


Page 18: Math Course Notes-4

Definition: Let A be the set of subsequential limits of a sequence xn. Then the limit inferior of the sequence is the infimum of A and the limit superior of the sequence is the supremum of A.

Notation: lim infn→∞ xn = inf A and lim supn→∞ xn = sup A.

Remark: Observe that if the sequence xn converges to x, then A = {x}, so

\liminf_{n\to\infty} x_n = \limsup_{n\to\infty} x_n = x.

Observe also that the limit superior and limit inferior are both subsequential limits.

2.2 Functions of one variable

2.2.1 Limits of functions (Chapter 8 of Pestana’s book)

We consider functions f : A ⊂ R → R or f : R → R. A is called the domain of f and {y ∈ R : f(x) = y, x ∈ A} is the image or range of f.

Definition: Let f : A ⊂ R → R and let x0 ∈ A. We say that y is the limit of f at x0 if for every sequence of numbers xn ∈ A convergent to x0, the sequence f(xn) converges to y.

Remark: We do not need that x0 ∈ A.

Notation: limx→x0 f(x) = y.

Properties: If limx→x0 f(x) = y1 and limx→x0 g(x) = y2 then

\lim_{x\to x_0}(f(x) + g(x)) = y_1 + y_2, \qquad \lim_{x\to x_0} f(x)g(x) = y_1 y_2, \qquad \lim_{x\to x_0} \frac{f(x)}{g(x)} = \frac{y_1}{y_2},

the last equality being true only if y2 ≠ 0 and g(x) ≠ 0 in some neighborhood of x0.

Theorem: limx→x0 f(x) = y if and only if for all ε > 0, there exists δ > 0 such that if |x − x0| < δ then |f(x) − y| < ε.

Definition: A function is bounded if there exists a real number M > 0 such that f(x) ∈ [−M, M] for all x ∈ A.

Definition: A function is monotone increasing if for every x < y, x, y ∈ A, f(x) ≤ f(y), and strictly monotone increasing if f(x) < f(y).

A function is monotone decreasing if for every x < y, x, y ∈ A, f(x) ≥ f(y), and strictly monotone decreasing if f(x) > f(y).


Page 19: Math Course Notes-4

2.2.2 Continuity (Chapter 9 of Pestana’s book)

Definition: We say that a function f : A ⊂ R → R is continuous at x0 ∈ A if limx→x0 f(x) = f(x0).

If f is continuous at every point of a set B ⊂ A, then we say that f is continuous on B.

Properties: If f, g are continuous at x0 then so are f + g and fg. If moreover g(x) ≠ 0 in a neighborhood of x0, then f/g is continuous at x0.

Theorem: If f : [a, b] → R is continuous, then f is bounded.

Theorem (Weierstrass): Let f : [a, b] → R be continuous and let m = min[a,b] f(x) and M = max[a,b] f(x). Then there exist two points x1, x2 ∈ [a, b] such that f(x1) = m and f(x2) = M. In other words, f achieves its maximum and minimum.

Corollary: If f : [a, b] → R is continuous then it takes any value between m and M.

Definition: Let f : A ⊂ R → R where A is an interval. Then f is uniformly continuous if for all ε > 0 there exists δ > 0 such that if |x − y| < δ then |f(x) − f(y)| < ε.

Remark: Clearly, a uniformly continuous function is continuous. Is the converse true? Yes, if A = [a, b].

Example: f(x) = 1/x is continuous on (0, 1] but not uniformly continuous.

Composition of functions:

Definition: Let g : A → B and f : C → D, with B ⊂ C. Then the composition of f and g is the function h : A → D defined as h(x) = f(g(x)), for all x ∈ A.

Notation: h = f ◦ g.

Theorem: If f and g are continuous, so is h = f ◦ g.

Inverse of a function:

Definition: Let f : [a, b] → R be a strictly monotone continuous function with f(a) = c and f(b) = d. Observe that f takes every value between c and d. We define the inverse function of f as the function f−1 : [c, d] → [a, b] such that

f−1(y) = x ⇔ f(x) = y.

Theorem: f−1 is also continuous.


Page 20: Math Course Notes-4

2.2.3 Differentiation (Chapters 10,11,14 of Pestana’s book)

Definition: Let f : [a, b] → R. The derivative of f at x is defined as

f'(x) = \lim_{t\to x} \frac{f(t) - f(x)}{t - x},

provided that the limit exists. If the limit exists and is finite, we say that f is differentiable at x.

Theorem: If f is differentiable at x then f is continuous at x.

Example: The function f(x) = |x| is continuous at 0 but not differentiable at 0.

Remark: There exist functions which are continuous everywhere but are not differentiable at any point.

Geometric interpretation: If f is differentiable at x0, then the equation of

the tangent line to the function f at the point x0 is

y = f(x0) + f ′(x0)(x− x0).

That is, the derivative of f at x0 is the slope of this tangent line.

Properties: If f and g are differentiable at x ∈ [a, b], then so are f + g, fg and f/g (the last one provided that g(x) ≠ 0), and the derivatives are given by

• (f + g)′(x) = f ′(x) + g′(x),

• (fg)′(x) = f ′(x)g(x) + g′(x)f(x),

• (f/g)′(x) = (f ′(x)g(x) − g′(x)f(x))/g2(x).

Examples:

1. f(x) = c, then f ′(x) = 0, where c is a real constant.

2. f(x) = x, then f ′(x) = 1.

3. f(x) = xn, then f ′(x) = nxn−1, where n is a positive integer.

4. f(x) = log(x), then f ′(x) = 1/x.

5. f(x) = sin(x), then f ′(x) = cos(x).


Page 21: Math Course Notes-4

6. f(x) = cos(x), then f ′(x) = − sin(x).

Chain rule: Let f : [a, b] → R be a continuous function and assume that f is differentiable at x ∈ [a, b]. Let g be defined on the image of f and assume that g is differentiable at f(x). Then h = g ◦ f is differentiable at x and

h′(x) = g′(f(x))f ′(x).

Application: Let f : [a, b] → R be a strictly monotone continuous function with f(a) = c and f(b) = d. Let g : [c, d] → [a, b] be its inverse. Assume that f is differentiable at x. Then g is differentiable at y = f(x) if and only if f ′(x) ≠ 0, and in this case

g'(y) = \frac{1}{f'(g(y))}.

Definition: Let f : [a, b] → R. A point x ∈ [a, b] is called a local maximum if there exists δ > 0 such that f(x) ≥ f(y) for all y ∈ [a, b] such that |x − y| < δ. A local minimum satisfies f(x) ≤ f(y).

Theorem: If f has a local minimum or maximum at x ∈ (a, b) and f is differentiable at x, then f ′(x) = 0.

Example: f(x) = x3, f ′(0) = 0 but 0 is not a local maximum or minimum.

Mean value theorem: If f is continuous on [a, b] and differentiable on (a, b), then there exists x ∈ (a, b) such that

f'(x) = \frac{f(b) - f(a)}{b - a}.

In particular, if f(b) = f(a), then there exists x ∈ (a, b) such that f ′(x) = 0 (Rolle’s theorem).

Geometric interpretation: (f(b) − f(a))/(b − a) is the slope of the secant line connecting the points (a, f(a)) and (b, f(b)). Thus, the mean value theorem says that there exists x ∈ (a, b) such that the tangent line to f at x is parallel to the secant line.

Theorem: Consider f differentiable in (a, b). Then f is monotone increasing in (a, b) if and only if f ′ ≥ 0 in (a, b), and monotone decreasing in (a, b) if and only if f ′ ≤ 0 in (a, b). Moreover, f is constant in (a, b) if and only if f ′ = 0 in (a, b).


Page 22: Math Course Notes-4

L'Hôpital's rule: Let f, g be two differentiable functions in (a, b) with g′ ≠ 0 in (a, b). Assume that one of the following hypotheses is satisfied:

(a) limx→a f(x) = limx→a g(x) = 0.

(b) limx→a f(x) = limx→a g(x) = +∞ or −∞.

Then,

\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}.

Remark: a can be +∞ or −∞, and limx→a f ′(x)/g′(x) can be +∞ or −∞.

Derivatives of higher order:

Definition: If f has a derivative f ′ and f ′ is differentiable, we denote its derivative by f ′′ and we call it the second derivative of f. In the same way, when they exist, we can define higher order derivatives, and f (n) is called the nth order derivative of f.

Taylor's theorem: Let f be defined on [a, b] such that f (n−1) is continuous in [a, b] and differentiable in (a, b) (that is, f (n) exists in (a, b)). Then, there exists x ∈ (a, b) such that

f(b) = f(a) + f'(a)(b - a) + \frac{f''(a)}{2!}(b - a)^2 + \frac{f^{(3)}(a)}{3!}(b - a)^3 + \cdots + \frac{f^{(n)}(x)}{n!}(b - a)^n.

This is called the nth order Taylor expansion of f around a.

Remark: This theorem says that differentiable functions may be locally approximated by a polynomial, and f (n)(x)(b − a)n/n! is called the error of order n of this approximation. For example, if n = 2 and f ′′ is bounded, then the error is of order (b − a)2.
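A numerical sketch of this approximation for f(x) = e^x around a = 0 (an illustrative choice): the more terms we keep, the smaller the error at b = 0.5.

```python
import math

def taylor_exp(b, n):
    """Taylor polynomial of e^x around a = 0 with n terms (degree n - 1)."""
    return sum(b**k / math.factorial(k) for k in range(n))

b = 0.5
for n in (2, 4, 6):
    error = abs(math.exp(b) - taylor_exp(b, n))
    # By Taylor's theorem the error is of order (b - a)^n = b^n here.
    print(n, error)
```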

Theorem: Let x0 be such that f ′(x0) = 0 and assume that f ′′ is continuous at x0. Then if f ′′(x0) < 0, f has a local maximum at x0, and if f ′′(x0) > 0, it is a local minimum.

Definition: A function is called convex if for all a, b and λ ∈ (0, 1)

f((1− λ)a+ λb) ≤ (1− λ)f(a) + λf(b),

and concave if the inequality is ≥.

Theorem: If f ′′ ≥ 0, then f is convex, and if f ′′ ≤ 0, then it is concave.

Property: If f is convex and monotone increasing and g is convex, then f ◦ g is convex; similarly, if f is concave and monotone increasing and g is concave, then f ◦ g is concave.


Page 23: Math Course Notes-4

2.2.4 Integration (Chapters 12,13 of Pestana’s book)

The definite integral:

Definition: Let f : [a, b] → R be a bounded function. Consider a partition

of [a, b] into n intervals determined by the points

a = x0 < x1 < · · · < xn = b.

We define the following quantities for i = 1, ..., n,

Mi = sup{f(x) : x ∈ [xi−1, xi]}, mi = inf{f(x) : x ∈ [xi−1, xi]}.

We also define the upper sum and lower sum by

U(f) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}), \qquad L(f) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}).

If there exists a real number I for which

sup L(f) = inf U(f) = I,

where the supremum and infimum are taken with respect to all possible partitions of [a, b], we say that f is integrable in [a, b] and we denote the definite integral by

\int_a^b f(x)\,dx = I.

Geometric interpretation: The definite integral of an integrable continuous positive function in [a, b] equals the area between the x-axis and the graph of f.
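A sketch of the upper and lower sums for f(x) = x2 on [0, 1] with a uniform partition (since f is increasing there, the supremum and infimum on each subinterval sit at the right and left endpoints); both sums approach the definite integral 1/3:

```python
import numpy as np

def upper_lower_sums(f, a, b, n):
    """Upper and lower sums of an increasing f on [a, b], uniform partition."""
    x = np.linspace(a, b, n + 1)
    widths = np.diff(x)
    lower = np.sum(f(x[:-1]) * widths)   # inf on [x_{i-1}, x_i] at the left end
    upper = np.sum(f(x[1:]) * widths)    # sup on [x_{i-1}, x_i] at the right end
    return upper, lower

f = lambda x: x**2
for n in (10, 100, 1000):
    U, L = upper_lower_sums(f, 0.0, 1.0, n)
    print(n, L, U)       # both tend to 1/3 as n grows
```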

Remark: This definition can be extended to infinite intervals or intervals where f is unbounded. In this case, these are called improper integrals.

Properties: If f and g are integrable functions in [a, b] and c ∈ R then

1. f + g is integrable in [a, b] and \int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.

2. cf is integrable in [a, b] and \int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx.

3. If m ≤ f(x) ≤ M for all x ∈ [a, b], then

m(b − a) ≤ \int_a^b f(x)\,dx ≤ M(b − a).


Page 24: Math Course Notes-4

4. If c ∈ [a, b] then \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.

Notation: \int_b^a f(x)\,dx = −\int_a^b f(x)\,dx.

Theorem: If f is continuous in [a, b] then f is integrable in [a, b]. In fact, if f is continuous in all but finitely many points of [a, b] then f is integrable in [a, b].

Fundamental theorem of calculus: If f is continuous in [a, b], then the function F(x) = \int_a^x f(t)\,dt is differentiable in [a, b] and F′(x) = f(x). F is called a primitive of f and we have that

\int_a^b f(x)\,dx = F(b) − F(a).

Mean value theorem for integrals: Let f be continuous in [a, b]. Then there exists z ∈ (a, b) such that

f(z) = \frac{1}{b - a} \int_a^b f(x)\,dx.

Proof: Apply the mean value theorem to the primitive function F.

Remark: If F is a primitive of f, then F(x) + c is also a primitive of f for any constant c.

Terminology: A primitive is also called the indefinite integral and is denoted by \int f(x)\,dx = F(x) + c.

Basic primitives:

1. \int 1\,dx = x + c.

2. \int x^n\,dx = \frac{x^{n+1}}{n+1} + c, n ∈ N.

3. \int \frac{1}{x}\,dx = \ln|x| + c.

4. \int e^x\,dx = e^x + c.

5. \int \sin(x)\,dx = −\cos(x) + c.

6. \int (f(x) + g(x))\,dx = \int f(x)\,dx + \int g(x)\,dx.

7. \int c f(x)\,dx = c \int f(x)\,dx.


Page 25: Math Course Notes-4

Methods of integration:

Integration by substitution: Assume that f is continuous and let F be its primitive. Recall that if ϕ(t) is a differentiable function, then F ◦ ϕ is differentiable and

dF(ϕ(t))/dt = F′(ϕ(t))ϕ′(t).

But since F′ = f, we get

dF(ϕ(t))/dt = f(ϕ(t))ϕ′(t).

Therefore, if x = ϕ(t),

F(x) = F(ϕ(t)) = \int f(ϕ(t))ϕ′(t)\,dt = \int f(x)\,dx.

In many cases, the first integral can be computed much more easily than \int f(x)\,dx.

Formally, we substitute x by ϕ(t) and dx by ϕ′(t)dt.

Integration by parts: Since (fg)′ = f′g + g′f, we have that

\int f(x)g′(x)\,dx = f(x)g(x) − \int f′(x)g(x)\,dx.
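A symbolic check of the integration by parts formula for f(x) = x and g(x) = e^x (this uses sympy, which is an assumption of this sketch and not part of the course material):

```python
import sympy as sp

x = sp.symbols('x')
f = x                 # f(x) = x
g = sp.exp(x)         # g(x) = e^x, so g'(x) = e^x

# Left-hand side: the integral of f(x) g'(x).
lhs = sp.integrate(f * sp.diff(g, x), x)
# Right-hand side: f(x) g(x) minus the integral of f'(x) g(x).
rhs = f * g - sp.integrate(sp.diff(f, x) * g, x)

print(sp.simplify(lhs - rhs))    # 0: both sides equal (x - 1) e^x
```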

3 Calculus in several variables and optimization

Reference: Besada, García, Mirás and Vázquez, Cálculo de varias variables, Prentice Hall, 2001.

3.1 Functions of several variables

Sequences in Rn: {xn}n∈N where each xn ∈ Rn.

Definition: A sequence is bounded if there exists a real number M > 0 such that for all n ∈ N, ‖xn‖ ≤ M.

Definition: A sequence converges to x ∈ Rn if for all ε > 0 there exists N(ε) > 0 such that for all n > N, ‖xn − x‖ < ε.

Functions in Rn: f : Rn → R. We say that a function f is continuous at x0 ∈ Rn if limx→x0 f(x) = f(x0), that is, for any sequence xn in Rn convergent

to x0, the sequence f(xn) converges to f(x0).


Page 26: Math Course Notes-4

Differentiation in Rn:

Definition: A function f : Rn → R is differentiable at x0 ∈ Rn if there exists a linear transformation T : Rn → R such that

\lim_{h\to 0} \frac{f(x_0 + h) - f(x_0) - T(h)}{\|h\|} = 0.

In this case Df(x0) = T.

Example: If f is a linear transformation, then Df(x0) is a constant transformation.

Matrix of T: Consider the standard basis e1, ..., en in Rn. Then the columns of the matrix of T in this basis are T(e1), ..., T(en), thus we obtain a 1 × n matrix (vector). This vector is called the gradient of f at x0 and is denoted ∇f(x0).

Directional derivative: Let u be a unit vector of Rn. The directional derivative of f : Rn → R at x0 ∈ Rn in the direction of u, if it exists, is defined as

D_u f(x_0) = \lim_{t\to 0} \frac{f(x_0 + tu) - f(x_0)}{t}.

Partial derivatives: The kth partial derivative of f : Rn → R at x0 ∈ Rn is the directional derivative of f at x0 in the direction of ek, the kth element of the standard basis of Rn. We denote it by ∂f/∂xk(x0).

Theorem: If f is differentiable at x0, then all partial derivatives of f at x0 exist and

\nabla f(x_0) = \left( \frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0) \right).

Theorem: Duf(x0) = ∇f(x0) · u.

Geometric interpretation: If ∇f(x0) ≠ 0, then Duf(x0) is the orthogonal projection of ∇f(x0) in the direction of u. The maximum value of the directional derivative is attained when ∇f(x0) and u are in the same direction.

Definition: The Hessian matrix of a twice differentiable function f : Rn → R at a point x0 is the matrix with entries ∂2f/∂xi∂xj(x0).

Theorem: If f has a local maximum or minimum at x0 then ∇f(x0) = 0.

Theorem: Let x0 be such that ∇f(x0) = 0. If the Hessian matrix at x0 is positive definite, then f has a local minimum at x0. If it is negative definite, then it is a local maximum.
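A numerical sketch tying these notions together for f(x, y) = x2 + 3y2 (an illustrative function): central differences approximate the gradient, the directional derivative is ∇f · u, and at the critical point (0, 0) the Hessian diag(2, 6) is positive definite, so f has a local minimum there.

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3.0 * y**2

def gradient(f, p, h=1e-6):
    """Central finite-difference approximation of the gradient of f at p."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(len(p)):
        e = np.zeros_like(p)
        e[k] = h
        g[k] = (f(p + e) - f(p - e)) / (2.0 * h)
    return g

p0 = np.array([1.0, 2.0])
grad = gradient(f, p0)
print(grad)                      # approx [2. 12.]  (the exact gradient is (2x, 6y))

# Directional derivative D_u f(p0) = grad . u for a unit vector u.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(grad @ u)                  # approx 9.9 = (2 + 12)/sqrt(2)

# The Hessian of f is constant, diag(2, 6); it is positive definite,
# so the critical point (0, 0), where the gradient vanishes, is a local minimum.
H = np.array([[2.0, 0.0],
              [0.0, 6.0]])
print(np.all(np.linalg.eigvalsh(H) > 0))   # True
```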
