Linear Algebra Study Guide
8/3/2019 Linear Algebra Study Guide
http://slidepdf.com/reader/full/linear-algebra-study-guide 1/25
Linear Algebra Written Examinations
Study Guide
Eduardo Corona, with other authors as they join in
November 2, 2008
Contents

1 Vector Spaces and Matrix Operations
2 Linear Operators
3 Diagonalizable Operators
   3.1 The Rayleigh Quotient and the Min-Max Theorem
   3.2 Gershgorin's Discs Theorem
4 Hilbert Space Theory: Inner Product, Orthogonal Projection and Adjoint Operators
   4.1 Orthogonal Projection
   4.2 The Gram-Schmidt Process and QR Factorization
   4.3 Riesz Representation Theorem and the Adjoint Operator
5 Normal and Self-Adjoint Operators: Spectral Theorems and Related Results
   5.1 Unitary Operators
   5.2 Positive Operators and Square Roots
6 Singular Value Decomposition and the Moore-Penrose Generalized Inverse
   6.1 Singular Value Decomposition
   6.2 The Moore-Penrose Generalized Inverse
   6.3 The Polar Decomposition
7 Matrix Norms and Low Rank Approximation
   7.1 The Frobenius Norm
   7.2 Operator Norms
   7.3 Low Rank Matrix Approximation
8 Generalized Eigenvalues, the Jordan Canonical Form and $e^A$
   8.1 The Generalized Eigenspace $K_\lambda$
   8.2 A Method to Compute the Jordan Form: The Points Diagram
   8.3 Applications: Matrix Powers and Power Series
9 Nilpotent Operators
10 Other Important Matrix Factorizations
11 Other Topics (which appear in past exams)
12 Yet More Topics I Can Think of
1 Vector Spaces and Matrix Operations
2 Linear Operators
Definition 1 Let $U, V$ be vector spaces over a field $F$ (usually $F = \mathbb{R}$ or $\mathbb{C}$). Then $\mathcal{L}(U,V) = \{T : U \to V \mid T \text{ is linear}\}$. In particular, $\mathcal{L}(U,U) = \mathcal{L}(U)$ is the space of linear operators on $U$, and $\mathcal{L}(U,F) = U^*$ is its algebraic dual.
Definition 2 (Important Subspaces) Given a subspace $W \leq V$, we have $T^{-1}(W) \leq U$. In particular, we are interested in $T^{-1}(\{0\}) = \operatorname{Ker}(T)$. Also, if $S \leq U$, then $T(S) \leq V$. We are most interested in $T(U) = \operatorname{Ran}(T)$.
Theorem 3 Let $U, V$ be vector spaces over $F$ with $\dim(U) = n$, $\dim(V) = m$. Given a basis $B = \{u_1,\ldots,u_n\}$ of $U$ and a basis $B' = \{v_1,\ldots,v_m\}$ of $V$, to each $T \in \mathcal{L}(U,V)$ we can associate a matrix $[T]^{B'}_B$ such that:

$$T u_i = a_{1i} v_1 + \ldots + a_{mi} v_m \quad \forall i \in \{1,\ldots,n\}, \qquad [T]^{B'}_B = (a_{ij}) \in M_{m \times n}(F)$$

so that the following diagram commutes:

$$\begin{array}{ccc} U & \xrightarrow{\;T\;} & V \\ B \downarrow & & \downarrow B' \\ F^n & \xrightarrow{[T]^{B'}_B} & F^m \end{array}$$

Conversely, given a matrix $A \in M_{m\times n}(F)$, there is a unique $T_A \in \mathcal{L}(U,V)$ such that $A = [T_A]^{B'}_B$.
Proposition 4 Given $T \in \mathcal{L}(U)$, there exist bases $B$ and $B'$ of $U$ such that:

$$[T]^{B'}_B = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$$

Here $B$ is constructed as an extension of a basis for $\operatorname{Ker}(T)$, and $B'$ as an extension of $\{T(u)\}_{u \in B}$.
Theorem 5 (Rank and Nullity) $\dim(U) = \dim(\operatorname{Ker}(T)) + \dim(\operatorname{Ran}(T))$. Here $\nu(T) = \dim(\operatorname{Ker}(T))$ is known as the nullity of $T$, and $r(T) = \dim(\operatorname{Ran}(T))$ as the rank of $T$.
Change of Basis: Let $U$ be a vector space over $F$, $T \in \mathcal{L}(U)$, and $B$, $B'$ bases of $U$. Then there exists an invertible matrix $P$ such that:

$$[T]_{B'} = P \, [T]_B \, P^{-1}$$
$P$ is a matrix that performs a change of coordinates. This means that, if two matrices are similar, they represent the same linear operator in different bases. This further justifies that key properties of matrices are preserved under similarity.
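As a quick numerical check (a NumPy sketch; the random matrices here are illustrative assumptions, not examples from the text), similar matrices share their spectrum, and hence their trace and determinant:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))       # matrix of T in one basis
P = rng.standard_normal((4, 4))       # change-of-basis matrix (invertible almost surely)
B = P @ A @ np.linalg.inv(P)          # matrix of the same operator in another basis

# Similar matrices have the same eigenvalues (up to floating-point error).
eigA = np.sort_complex(np.linalg.eigvals(A))
eigB = np.sort_complex(np.linalg.eigvals(B))
assert np.allclose(eigA, eigB)
assert np.isclose(np.trace(A), np.trace(B))
```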
3 Diagonalizable Operators
If $U = V$ (that is, $T$ is a linear operator), it is natural to require that the bases $B$ and $B'$ coincide. In this case, it is no longer generally true that we can find a basis $B$ such that the corresponding matrix is diagonal. However, if there exists a basis $B$ such that $[T]_B$ is a diagonal matrix, we say $T$ is diagonalizable.
Definition 6 Let $V$ be a vector space over $F$, $T \in \mathcal{L}(V)$. $\lambda \in F$ is an eigenvalue of $T$ if there exists a nonzero vector $v \in V$ such that $Tv = \lambda v$. All nonzero vectors for which this holds are known as eigenvectors of $T$.
We can immediately derive, from this definition, that the existence of the eigenpair $(\lambda, v)$ (eigenvalue $\lambda$ and corresponding eigenvector $v$) is equivalent to the existence of a nonzero solution $v$ to

$$(T - \lambda I)v = 0$$

This in turn tells us that the eigenvalues of $T$ are those $\lambda$ such that the operator $T - \lambda I$ is not invertible. After selecting a basis $B$ for $V$, this also means:

$$\det([T]_B - \lambda I) = 0$$
which is called the characteristic equation of $T$. We notice this equation does not depend on the choice of basis $B$, since it is invariant under similarity:

$$\det(PAP^{-1} - \lambda I) = \det(P(A - \lambda I)P^{-1}) = \det(A - \lambda I)$$

This equation, finally, is equivalent to finding the complex roots of a polynomial in $\lambda$. We know this to be a genuinely hard problem for $n \geq 5$, and a numerically ill-posed problem at that.
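The equivalence can be illustrated numerically (a NumPy sketch with an arbitrary small matrix; `np.poly` returns the characteristic polynomial's coefficients, so this is a consistency check, not a recommended algorithm for large $n$):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
coeffs = np.poly(A)                  # coefficients of det(A - lambda I), leading coefficient 1
roots = np.sort(np.roots(coeffs))    # roots of the characteristic polynomial
eigs = np.sort(np.linalg.eigvals(A)) # eigenvalues computed directly

# Both routes recover the same spectrum for this small, well-conditioned example.
assert np.allclose(roots, eigs)
```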
Definition 7 Let $V$ be a vector space over $F$, $T \in \mathcal{L}(V)$, and $\lambda$ an eigenvalue of $T$. Then $E_\lambda = \{v \in V \mid Tv = \lambda v\}$ is the eigenspace for $\lambda$.
Theorem 8 Let $V$ be a finite-dimensional vector space over $F$, $T \in \mathcal{L}(V)$. The following are equivalent:
i) $T$ is diagonalizable
ii) $V$ has a basis of eigenvectors of $T$
iii) There exist subspaces $W_1,\ldots,W_n$ such that $\dim(W_i) = 1$, $T(W_i) \subseteq W_i$, and $V = \bigoplus_{i=1}^n W_i$
iv) $V = \bigoplus_{i=1}^k E_{\lambda_i}$ with $\{\lambda_1,\ldots,\lambda_k\}$ the eigenvalues of $T$
v) $\sum_{i=1}^k \dim(E_{\lambda_i}) = \dim(V)$
Proposition 9 Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. Then $T$ has at least one eigenvalue (this is a corollary of the Fundamental Theorem of Algebra, applied to the characteristic equation).
Theorem 10 (Schur's factorization) Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. There always exists a basis $B$ such that $[T]_B$ is upper triangular.
3.1 The Rayleigh Quotient and the Min-Max Theorem
3.2 Gershgorin’s Discs Theorem
Although calculating the eigenvalues of a big matrix is a very difficult problem (computationally and analytically), it is very easy to come up with regions of the complex plane where all the eigenvalues of a particular operator $T$ must lie. This technique was first devised by the Russian mathematician Semyon Aranovich Gershgorin (1901-1933):
Theorem 11 (Gershgorin, 1931) Let $A = (a_{ij}) \in M_n(\mathbb{C})$. For each $i \in \{1,\ldots,n\}$, we define the $i$th radius of $A$ as $r_i(A) = \sum_{j \neq i} |a_{ij}|$ and the $i$th Gershgorin disc as $D_i(A) = \{z \in \mathbb{C} \mid |z - a_{ii}| \leq r_i(A)\}$. Then, if we define $\sigma(A) = \{\lambda \mid \lambda \text{ is an eigenvalue of } A\}$, it follows that:

$$\sigma(A) \subseteq \bigcup_{i=1}^n D_i(A)$$

That is, all eigenvalues of $A$ must lie inside one or more Gershgorin discs.
Proof. Let $\lambda$ be an eigenvalue of $A$, $v$ an associated eigenvector. We fix $i$ as the index of the coordinate of $v$ with maximum modulus, that is, $|v_i| \geq |v_k| \;\forall k$. Necessarily $|v_i| \neq 0$. Then:

$$Av = \lambda v \implies \lambda v_i = \sum_j a_{ij} v_j$$

$$(\lambda - a_{ii}) v_i = \sum_{j \neq i} a_{ij} v_j$$

$$|\lambda - a_{ii}| \, |v_i| \leq \sum_{j \neq i} |a_{ij}| \, |v_j| \leq \sum_{j \neq i} |a_{ij}| \, |v_i|$$

Dividing by $|v_i|$ gives $|\lambda - a_{ii}| \leq r_i(A)$, that is, $\lambda \in D_i(A)$.
Now, we know that $A$ represents a linear operator $T \in \mathcal{L}(\mathbb{C}^n)$, and that therefore its eigenvalues are invariant under transposition of $A$ and under similarity. Therefore:
Corollary 12 Let $A = (a_{ij}) \in M_n(\mathbb{C})$. Then:

$$\sigma(A) \subseteq \bigcap \left\{ \bigcup_{i=1}^n D_i(PAP^{-1}) \;\Big|\; P \text{ is invertible} \right\}$$
Of course, if $T$ is diagonalizable, one of these $P$'s is the one for which $PAP^{-1}$ is diagonal, and then the Gershgorin discs degenerate to the $n$ points we are looking for. However, if we don't want to compute the eigenvalues, we can still use this to come up with a fine heuristic to shrink the region given by the union of the Gershgorin discs: we can use permutation matrices or diagonal matrices as our $P$'s to get a "reasonable region". This result also hints at the fact that, if we perturb a matrix $A$, the eigenvalues change continuously.
The Gershgorin disc theorem is also a quick way to prove $A$ is invertible if it is diagonally dominant, and it also provides results when the Gershgorin discs are pairwise disjoint (namely, that each disc must then contain exactly one eigenvalue).
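The theorem is easy to check numerically. A minimal NumPy sketch (the matrix below is an arbitrary example chosen so that its discs are disjoint and exclude the origin):

```python
import numpy as np

def gershgorin_discs(A):
    """Return (centers, radii): centers a_ii and radii r_i = sum_{j != i} |a_ij|."""
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[ 4.0, 0.5, 0.2],
              [ 0.3, -1.0, 0.4],
              [ 0.1, 0.2, 2.0]])
centers, radii = gershgorin_discs(A)
eigs = np.linalg.eigvals(A)

# Every eigenvalue lies in at least one disc |z - a_ii| <= r_i.
assert all(np.any(np.abs(lam - centers) <= radii + 1e-12) for lam in eigs)

# Here 0 lies in no disc (A is strictly diagonally dominant), so A is invertible.
assert np.all(np.abs(centers) > radii)
```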
4 Hilbert Space Theory: Inner Product, Orthogonal Projection and Adjoint Operators
Definition 13 Let $V$ be a vector space over $F$. An inner product on $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ such that:
1) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle \;\forall u,v,w \in V$
2) $\langle \lambda u, w \rangle = \lambda \langle u, w \rangle \;\forall u,w \in V$, $\lambda \in F$
3) $\langle u, v \rangle = \overline{\langle v, u \rangle}$
4) $\langle u, u \rangle \geq 0$, and $\langle u, u \rangle = 0 \implies u = 0$
By definition, every inner product induces a natural norm on $V$, given by $\|v\|_V = \sqrt{\langle v, v \rangle}$.

Definition 14 We say $u$ and $v$ are orthogonal, written $u \perp v$, if $\langle u, v \rangle = 0$.
Some important identities:

1. Pythagoras: $u \perp v \iff \|u+v\|^2 = \|u\|^2 + \|v\|^2$

2. Cauchy-Bunyakovsky-Schwarz: $|\langle u,v \rangle| \leq \|u\| \, \|v\| \;\forall u,v \in V$, with equality $\iff u = \lambda v$

3. Parallelogram: $\|u+v\|^2 + \|u-v\|^2 = 2(\|u\|^2 + \|v\|^2) \;\forall u,v \in V$

4. Polarization:

$$\langle u,v\rangle = \tfrac{1}{4}\left\{\|u+v\|^2 - \|u-v\|^2\right\} \;\forall u,v \in V \quad \text{if } F = \mathbb{R}$$

$$\langle u,v\rangle = \tfrac{1}{4}\sum_{k=1}^4 i^k \left\|u + i^k v\right\|^2 \;\forall u,v \in V \quad \text{if } F = \mathbb{C}$$

In fact, identities 3 and 4 (Parallelogram and Polarization) give us necessary and sufficient conditions for a norm to be induced by some inner product. In this fashion, we can prove $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are not induced by an inner product by showing the parallelogram law fails.
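The parallelogram failure is easy to exhibit. A minimal sketch with the standard basis vectors of $\mathbb{R}^2$:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

def norm_inf(x):
    return np.max(np.abs(x))

# Parallelogram law fails for the max norm: 2 != 4.
lhs = norm_inf(u + v)**2 + norm_inf(u - v)**2        # 1 + 1 = 2
rhs = 2 * (norm_inf(u)**2 + norm_inf(v)**2)          # 2 * (1 + 1) = 4
assert lhs != rhs    # so ||.||_inf is not induced by any inner product

# The Euclidean norm, by contrast, satisfies it for the same vectors.
lhs2 = np.linalg.norm(u + v)**2 + np.linalg.norm(u - v)**2
rhs2 = 2 * (np.linalg.norm(u)**2 + np.linalg.norm(v)**2)
assert np.isclose(lhs2, rhs2)
```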
Definition 15 $v \in V$ is said to be of unit norm if $\|v\| = 1$.
Definition 16 A subset $S \subseteq V$ is said to be orthogonal if the elements of $S$ are mutually orthogonal (perpendicular).

Definition 17 $S$ is orthonormal if it is orthogonal and its elements are of unit norm.
If $S$ is orthogonal, then it is automatically linearly independent. Intuitively, we can think of orthogonal vectors as vectors which do not "cast a shadow" on each other, and therefore point in completely exclusive directions. We have the following property: if $S = \{v_1,\ldots,v_n\}$ is orthogonal, then for all $v \in \operatorname{span}(S)$:

$$v = \alpha_1 v_1 + \ldots + \alpha_n v_n, \qquad \alpha_i = \frac{\langle v, v_i \rangle}{\langle v_i, v_i \rangle} \;\forall i$$

Thus, we can obtain the coefficient for each element of $S$ independently, by computing the inner product with the corresponding $v_i$. Furthermore, if $S$ is orthonormal:

$$\alpha_i = \langle v, v_i \rangle \;\forall i$$

These coefficients are also called the abstract Fourier coefficients.
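A small sketch of recovering Fourier coefficients (the orthonormal pair in $\mathbb{R}^3$ below is an arbitrary example):

```python
import numpy as np

# An orthonormal set {v1, v2} in R^3.
v1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
v = 3.0 * v1 - 2.0 * v2              # a vector in span{v1, v2}

# Abstract Fourier coefficients: alpha_i = <v, v_i> since the set is orthonormal.
a1, a2 = v @ v1, v @ v2
assert np.isclose(a1, 3.0) and np.isclose(a2, -2.0)

# v is reconstructed exactly from its coefficients.
assert np.allclose(a1 * v1 + a2 * v2, v)
```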
Theorem 18 (Bessel's Inequality) Let $\{v_1,\ldots,v_m\}$ be an orthonormal set, $v \in V$. Then:

$$\sum_{i=1}^m |\langle v, v_i \rangle|^2 \leq \|v\|^2$$

with equality $\iff v \in \operatorname{span}(\{v_i\}_{i=1}^m)$.
4.1 Orthogonal Projection
This last result suggests that, for an orthogonal set $S$, in order to retrieve the component going "in the $i$th direction", we only need to compute $\frac{\langle v, v_i\rangle}{\langle v_i, v_i\rangle} v_i$. This is in fact a projection of our vector $v$ in the direction of $v_i$, or the "shadow" that $v$ casts on the direction of $v_i$. We shall define this more generally, and see that we can define projection operators which give us the component of a vector in a given subspace of $V$:
Definition 19 Let $S \subseteq V$. We define the orthogonal complement $S^\perp = \{v \in V \mid \langle v, s \rangle = 0 \;\forall s \in S\}$. If $S \leq V$ is a (closed) subspace, then $S \oplus S^\perp = V$ and $(S^\perp)^\perp = S$ (always true in finite dimension).
Definition 20 Let $W \leq V$. Then we define $P_W \in \mathcal{L}(V)$ such that, if $v = v_W + v_{W^\perp}$, then $P_W(v) = v_W$. We can also define this operator by its action on a suitable basis of $V$: if we take bases $B_W = \{w_1,\ldots,w_p\}$ of $W$ and $B_{W^\perp} = \{u_1,\ldots,u_{n-p}\}$ of $W^\perp$, then $B = B_W \cup B_{W^\perp}$ and:

$$P_W(w_i) = w_i \;\forall i \in \{1,\ldots,p\}, \qquad P_W(u_j) = 0 \;\forall j \in \{1,\ldots,n-p\}$$

$$[P_W]_B = \begin{pmatrix} I_{p\times p} & 0 \\ 0 & 0 \end{pmatrix}$$
From this, a myriad of properties of $P_W$ can be deduced:

1. $P_W^2 = P_W$: this follows easily from the fact that $P_W w = w \;\forall w \in W$.

2. $\operatorname{Ran}(P_W) = W$ and $\operatorname{Ker}(P_W) = W^\perp$.

3. $v - P_W v \in W^\perp \;\forall v \in V$: we can deduce this directly from the definition, or compute the inner product with any member of $W$. It also follows from the diagram one can draw in $\mathbb{R}^2$ or $\mathbb{R}^3$: if we remove the "shadow" cast by a vector, all that is left is the orthogonal component. This additionally tells us that:

$$P_{W^\perp} = I - P_W$$
4. $\|v - P_W v\| \leq \|v - w\| \;\forall w \in W$: this is a very strong result: it tells us the orthogonal projection is the best approximation to $v$ by vectors in $W$. This is a key result which justifies the use of the projection in applications such as least squares, polynomial interpolation and approximation, Fourier series, etc. In fact, this result can be extended to projections onto convex sets in Hilbert spaces.

5. $\|P_W v\| \leq \|v\| \;\forall v \in V$: this tells us the projection is a contraction. In particular, we know that $\|P_W\| = 1$, since there are vectors (namely, those in $W$) for which equality holds.

6. $\langle P_W u, v \rangle = \langle u, P_W v \rangle \;\forall u,v \in V$ ($P_W$ is "self-adjoint"). This can be proved explicitly using the unique decompositions of $u$ and $v$ as sums of components in $W$ and $W^\perp$. In particular, this also tells us that the matrix which represents $P_W$ is symmetric / self-adjoint as well if we choose a basis of orthonormal vectors.

7. It can be shown that properties (1) and (4), (1) and (5), or (1) and (6) completely characterize the orthogonal projection. That is, from these properties alone we can deduce the rest, and the operator $P$ has to be a projection onto its range.
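These properties are easy to verify numerically. A minimal NumPy sketch, where $W$ is the column space of an arbitrary random matrix and $P_W = QQ^\top$ for $Q$ with orthonormal columns spanning $W$:

```python
import numpy as np

rng = np.random.default_rng(1)
# Q: orthonormal basis (as columns) for a random 2-dimensional subspace W of R^5.
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))
P = Q @ Q.T                        # matrix of the orthogonal projection onto W

v = rng.standard_normal(5)
assert np.allclose(P @ P, P)                       # property 1: idempotent
assert np.allclose(P, P.T)                         # property 6: self-adjoint
assert np.isclose((v - P @ v) @ (P @ v), 0.0)      # property 3: v - Pv is orthogonal to W
assert np.linalg.norm(P @ v) <= np.linalg.norm(v) + 1e-12   # property 5: contraction
```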
4.2 The Gram-Schmidt Process and QR Factorization
We can ask ourselves whether, for any basis of $V$, there exists a procedure to turn it into an orthonormal basis. The Gram-Schmidt process does exactly this, and as a by-product it also gives us a very useful matrix factorization, the QR factorization:
Theorem 21 (Gram-Schmidt) If $\{w_i\}_{i=1}^m$ is a linearly independent set, there exists an orthonormal set $\{u_i\}_{i=1}^m$ such that $\operatorname{span}(\{u_i\}_{i=1}^m) = \operatorname{span}(\{w_i\}_{i=1}^m)$. It can be constructed through the following process:

$$v_1 = w_1, \qquad v_k = w_k - \sum_{j=1}^{k-1} \frac{\langle w_k, v_j \rangle}{\langle v_j, v_j \rangle} v_j = P_{\operatorname{span}(\{v_i\}_{i=1}^{k-1})^\perp}(w_k)$$

$$u_1 = \frac{v_1}{\|v_1\|}, \qquad u_k = \frac{v_k}{\|v_k\|}$$

Furthermore, by completing $\{w_i\}_{i=1}^m$ to a full basis of $V$ (if $m \leq n$), we can always obtain an orthonormal basis of $V$ following this process.
Theorem 22 (QR Factorization) Let $A$ be a matrix of full column rank, with columns $\{w_i\}_{i=1}^m$. Then, by applying Gram-Schmidt to the columns of $A$ (augmenting them to obtain a full basis if $m < n$), we obtain the following:

$$w_k = \|v_k\| \, u_k + \sum_{j=1}^{k-1} \langle w_k, u_j \rangle \, u_j \;\forall k$$
If we write this in matrix form, where $Q$ is the matrix with columns $\{u_i\}_{i=1}^n$ (by definition, an orthogonal / unitary matrix) and $R_{kk} = \|v_k\|$, $R_{jk} = \langle w_k, u_j \rangle$ for $j < k$ (an upper triangular matrix), this last expression provides the following factorization of $A$:

$$A = \begin{pmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_m \\ | & | & & | \end{pmatrix} = QR = \begin{pmatrix} | & & | & | & & | \\ u_1 & \cdots & u_m & u_{m+1} & \cdots & u_n \\ | & & | & | & & | \end{pmatrix} \begin{pmatrix} \|v_1\| & \langle w_2, u_1 \rangle & \cdots & \langle w_m, u_1 \rangle \\ 0 & \|v_2\| & \cdots & \langle w_m, u_2 \rangle \\ 0 & 0 & \ddots & \vdots \\ 0 & 0 & \cdots & \|v_m\| \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$

That is, $A = (Q_1 \mid Q_2) \begin{pmatrix} R_1 \\ 0 \end{pmatrix} = Q_1 R_1$, where $Q_1$ has the same column space as $A$.
This factorization is very useful to solve linear systems of equations (there are efficient ways to compute QR, namely the Householder algorithm and other sparse or incomplete QR routines) because, once computed, the system $Ax = b$ is equivalent to solving:

$$Rx = Q^* b$$

which can be rapidly solved through backward substitution (since $R$ is upper triangular). Also, the QR factorization is extensively used to obtain easier formulas for certain matrix products that appear in applications such as OLS and smoothing splines.
A relevant result regarding this factorization is that, although it is not unique in general, if we have $A = Q_1 R_1 = Q_2 R_2$, then it can be shown that $D = R_2 R_1^{-1}$ is a diagonal, unitary matrix.
4.3 Riesz Representation Theorem and the Adjoint Operator
For any linear operator $T \in \mathcal{L}(V,W)$, we can obtain a related operator $T^* \in \mathcal{L}(W,V)$ called the adjoint operator, which has very interesting properties. This operator becomes even more relevant for applications in infinite-dimensional vector spaces. It is defined as follows:
Definition 23 Let $T \in \mathcal{L}(V,W)$. Then the adjoint operator $T^* \in \mathcal{L}(W,V)$ is defined by the following functional relation:

$$\langle Tv, w \rangle_W = \langle v, T^* w \rangle_V \quad \forall v \in V, \; w \in W$$

If we choose orthonormal bases $B$ and $B'$ for $V$ and $W$, then the matrix that represents the adjoint is the conjugate transpose of the matrix that represents $T$. We get:

$$\langle Ax, y \rangle_{\mathbb{R}^m} = \langle x, A^* y \rangle_{\mathbb{R}^n} \quad \forall x \in \mathbb{R}^n, \; y \in \mathbb{R}^m$$

where $A = [T]^{B'}_B$ and $A^* = [T^*]^B_{B'} = ([T]^{B'}_B)^*$.
The existence and uniqueness of this operator is given by the Riesz Representation Theorem for Hilbert spaces:

Theorem 24 (Riesz Representation) Let $V$ be a Hilbert space over $F$, and $\varphi \in \mathcal{L}(V,F)$ a continuous linear functional (an element of the topological dual). Then there exists a unique $v \in V$ such that:

$$\varphi(u) = \langle u, v \rangle \quad \forall u \in V$$

Therefore, the adjoint operator is always well defined by the functional relation we have outlined, starting from the linear functional $L_w(v) = \langle Tv, w \rangle_W$ for each fixed $w \in W$.
Remark 25 Here is a quick application of the adjoint operator and the orthogonal projection operator. Let $A \in M_{m\times n}(F)$, and $Ax = b$ a system of linear equations. Then the least squares solution to this system is given by the solution to:

$$A x_0 = P_{\operatorname{Ran}(A)}(b)$$

since we are projecting $b$ onto the column space of $A$, and we know this is the best approximation we can have using linear combinations of the columns of $A$. Using properties of the projection operator, we now know that:

$$\langle Ax, \, b - P_{\operatorname{Ran}(A)}(b) \rangle = 0 \;\forall x, \qquad \text{i.e.} \quad \langle Ax, \, b - Ax_0 \rangle = 0 \;\forall x$$

Now, using the adjoint of $A$, we find:

$$\langle x, \, A^* b - A^* A x_0 \rangle = 0 \;\forall x$$

So $A^* b = A^* A x_0$, and therefore, if $A^* A$ is invertible:

$$x_0 = (A^* A)^{-1} A^* b, \qquad \hat{b} = A x_0 = A (A^* A)^{-1} A^* b$$

Incidentally, this also tells us that the matrix projecting a vector onto the column space of $A$ is given by $P_{\operatorname{Ran}(A)} = A (A^* A)^{-1} A^*$.
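The normal-equations route above can be checked directly (a NumPy sketch; the random overdetermined system is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 3))     # full column rank almost surely, so A*A is invertible
b = rng.standard_normal(7)

# Least squares via the normal equations A* A x0 = A* b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A x0 is orthogonal to the column space of A.
assert np.allclose(A.T @ (b - A @ x0), 0.0)

# Projection onto Ran(A): P = A (A* A)^{-1} A*.
P = A @ np.linalg.solve(A.T @ A, A.T)
assert np.allclose(P @ b, A @ x0)   # P b is the fitted vector b-hat
assert np.allclose(P @ P, P)        # P is a projection
```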
Properties of the Adjoint ($T, S \in \mathcal{L}(V,W)$):

1. $(T + S)^* = T^* + S^*$

2. $(\lambda T)^* = \bar{\lambda} \, T^*$

3. $(T^*)^* = T$

4. $I_V^* = I_V$ (the identity is self-adjoint)

5. $(ST)^* = T^* S^*$

6. If $B$ and $B'$ are orthonormal bases of $V$ and $W$, then $[T^*]^B_{B'} = ([T]^{B'}_B)^*$ (be careful: the orthonormality of the bases is essential here).
The most important property of the adjoint, however, provides us with an explicit relation between the kernels and the images of $T$ and $T^*$. These relations can be deduced directly from the definition, and provide us with comprehensive tools to study the spaces $V$ and $W$:
Theorem 26 ("Fundamental Theorem of Linear Algebra II") Let $V, W$ be finite-dimensional Hilbert spaces, $T \in \mathcal{L}(V,W)$. Then:

$$\operatorname{Ker}(T^*) = \operatorname{Ran}(T)^\perp, \qquad \operatorname{Ran}(T^*) = \operatorname{Ker}(T)^\perp$$

Thus, we can always write $V = \operatorname{Ker}(T) \oplus \operatorname{Ran}(T^*)$ and $W = \operatorname{Ker}(T^*) \oplus \operatorname{Ran}(T)$.
Proof. ($\operatorname{Ker}(T^*) = \operatorname{Ran}(T)^\perp$): Let $v \in W$ and consider any $Tu \in \operatorname{Ran}(T)$. Then $v \in \operatorname{Ran}(T)^\perp \iff \langle Tu, v \rangle = 0 \;\forall u \iff \langle u, T^* v \rangle = 0 \;\forall u \iff T^* v = 0 \iff v \in \operatorname{Ker}(T^*)$.

The proof of the second statement is obtained by exchanging the roles of $T$ and $T^*$ above.
A couple of results that follow from this one are:

1. $T$ is injective $\iff$ $T^*$ is onto

2. $\operatorname{Ker}(T^* T) = \operatorname{Ker}(T)$, and thus $r(T^* T) = r(T) = r(T^*)$ (rank).
5 Normal and Self-Adjoint Operators: Spectral Theorems and Related Results

Depending on the field $F$ we are working with, we can obtain "field sensitive" theorems that characterize diagonalizable operators. In particular, we are interested in the cases where the field is either $\mathbb{R}$ or $\mathbb{C}$. This discussion will also yield important results on isometric, unitary and positive operators.
Definition 27 $T \in \mathcal{L}(V)$ is said to be a self-adjoint operator if $T = T^*$. If $F = \mathbb{R}$, this is equivalent to saying $[T]_B$ is symmetric, and if $F = \mathbb{C}$, that $[T]_B$ is Hermitian (equal to its conjugate transpose), for $B$ an orthonormal basis.
Definition 28 $T \in \mathcal{L}(V)$ is said to be normal if it commutes with its adjoint, that is, if $T T^* = T^* T$.
Remark 29 If $F = \mathbb{R}$, then an operator $T$ is normal and not self-adjoint $\iff$ there exists an orthonormal basis $B$ of $V$ such that $[T]_B$ is a block diagonal matrix, with blocks of size 1 and blocks of size 2 which are multiples of rotation matrices.
First, we introduce a couple of interesting results on self-adjoint and normal operators:
Proposition 30 Let $T \in \mathcal{L}(V)$, $F = \mathbb{C}$. Then there exist unique self-adjoint operators $T_1$ and $T_2$ such that $T = T_1 + i T_2$. $T$ is then self-adjoint $\iff T_2 = 0$, and is normal $\iff T_1 T_2 = T_2 T_1$. These operators are given by:

$$T_1 = \frac{T + T^*}{2}, \qquad T_2 = \frac{T - T^*}{2i}$$
Proposition 31 If $T \in \mathcal{L}(V)$ is normal, then $\operatorname{Ker}(T) = \operatorname{Ker}(T^*)$ and $\operatorname{Ran}(T) = \operatorname{Ran}(T^*)$.
The most important properties of these families of operators, however, have to do with the spectral information we can retrieve:
Proposition 32 Let $T \in \mathcal{L}(V)$ be self-adjoint, $F = \mathbb{C}$. If $\lambda$ is an eigenvalue of $T$, then $\lambda \in \mathbb{R}$.

Proof. For $u$ a unit eigenvector, we have:

$$\lambda = \lambda \langle u, u \rangle = \langle Tu, u \rangle = \langle u, Tu \rangle = \bar{\lambda} \langle u, u \rangle = \bar{\lambda}$$
Proposition 33 Let $T \in \mathcal{L}(V)$, $F = \mathbb{C}$. Then $T$ is self-adjoint $\iff \langle Tv, v \rangle \in \mathbb{R} \;\forall v \in V$.

Proof. ($\implies$) Using self-adjointness and the properties of the inner product:

$$\langle Tv, v \rangle = \langle v, Tv \rangle = \overline{\langle Tv, v \rangle} \;\forall v \in V$$

This in particular tells us the Rayleigh quotient of such an operator is always real, and we can also rederive the previous proposition.
Proposition 34 If $T \in \mathcal{L}(V)$ is a normal operator:
i) $\|Tv\| = \|T^* v\| \;\forall v \in V$
ii) $T - \lambda I$ is normal $\forall \lambda \in \mathbb{C}$
iii) $v$ is an eigenvector of $T$ with eigenvalue $\lambda$ $\iff$ $v$ is an eigenvector of $T^*$ with eigenvalue $\bar{\lambda}$

Proof. (i): $\langle Tv, Tv \rangle = \langle v, T^* T v \rangle = \langle v, T T^* v \rangle = \langle T^* v, T^* v \rangle$

(ii): $(T - \lambda I)^* (T - \lambda I) = T^* T - \lambda T^* - \bar{\lambda} T + |\lambda|^2 I = T T^* - \lambda T^* - \bar{\lambda} T + |\lambda|^2 I = (T - \lambda I)(T - \lambda I)^*$

(iii): $(T - \lambda I)v = 0 \implies \|(T - \lambda I)v\| = \|(T^* - \bar{\lambda} I)v\| = 0$ (by i and ii) $\iff (T^* - \bar{\lambda} I)v = 0$
Theorem 35 (Spectral Theorem, $F = \mathbb{C}$ Version) Let $V$ be a finite-dimensional Hilbert space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. $V$ has an orthonormal basis of eigenvectors of $T$ $\iff$ $T$ is normal.

Proof. ($\Longleftarrow$) By Schur's factorization, there exists a basis of $V$ such that $[T]_B$ is upper triangular. By Gram-Schmidt, we can turn this basis into an orthonormal one, $Q$, and by studying the QR factorization, we realize the resulting matrix is still upper triangular. However, since the basis is now orthonormal and $T$ is normal, it follows that $[T]_Q$ is a normal, upper triangular matrix. This necessarily implies $[T]_Q$ is diagonal (we can see this by computing the products of the off-diagonal elements, and concluding they have to be zero for normality to hold).

($\Longrightarrow$) If this is the case, then we have an orthonormal basis $Q$ and a diagonal $\Lambda$ such that $[T]_Q = \Lambda$. Since a diagonal matrix is always normal, it follows that $T$ is a normal operator.
Theorem 36 (Spectral Theorem, $F = \mathbb{R}$ Version) Let $V$ be a finite-dimensional Hilbert space over $\mathbb{R}$, $T \in \mathcal{L}(V)$. $V$ has an orthonormal basis of eigenvectors of $T$ $\iff$ $T$ is self-adjoint.

Proof. We follow the proof for the complex case, noting that, since $F = \mathbb{R}$, both Schur's factorization and Gram-Schmidt will yield matrices with real entries. Finally, a diagonal matrix with real entries is always self-adjoint (since this only means that it is symmetric). We can also apply the theorem for the complex case and use the properties of self-adjoint operators.
In either case, we then have the following powerful properties:

1. $V = \bigoplus_{i=1}^k E_{\lambda_i}$ and $(E_{\lambda_i})^\perp = \bigoplus_{j \neq i} E_{\lambda_j} \;\forall i$

2. If we denote $P_j = P_{E_{\lambda_j}}$, then $P_i P_j = \delta_{ij} P_i$

3. (Spectral Resolution of the Identity)

$$I_V = \sum_{i=1}^k P_i$$

4. (Spectral Resolution of $T$)

$$T = \sum_{i=1}^k \lambda_i P_i$$
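The spectral resolution can be checked numerically for a real symmetric matrix (a NumPy sketch; the matrix is an arbitrary example with distinct eigenvalues, so each eigenspace projector is rank one):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])      # real symmetric, hence orthonormally diagonalizable

lams, Q = np.linalg.eigh(A)         # eigenvalues (1, 3, 5) and orthonormal eigenvectors

# One rank-one orthogonal projector per eigenvector.
projs = [np.outer(Q[:, i], Q[:, i]) for i in range(3)]

assert np.allclose(projs[0] @ projs[1], 0.0)                     # P_i P_j = 0 for i != j
assert np.allclose(sum(projs), np.eye(3))                        # resolution of the identity
assert np.allclose(sum(l * P for l, P in zip(lams, projs)), A)   # A = sum_i lambda_i P_i
```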
These properties characterize all diagonalizable operators on finite-dimensional Hilbert spaces. Some important results that follow from this are:
Theorem 37 (Cayley-Hamilton) Let $T \in \mathcal{L}(V)$, $V$ a finite-dimensional Hilbert space. If $p$ is the characteristic polynomial of $T$, then $p(T) = 0$.
Theorem 38 Let $V$ be a vector space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$ normal. Then there exists a polynomial $p \in \mathbb{C}[x]$ such that $p(T) = T^*$. This polynomial can be found by solving the Lagrange interpolation problem $\{p(\lambda_i) = \bar{\lambda}_i\}_{i=1}^k$.
We also have the following properties for $T$ normal, which we can now deduce using the spectral decomposition of $T$. These properties basically tell us that, if $T$ is normal, we can operate with it almost as if it were a number, through its spectral representation:

1. If $q$ is a polynomial, then $q(T) = \sum_{i=1}^k q(\lambda_i) P_i$

2. If $T^p = 0$ for some $p$, then $T = 0$

3. An operator commutes with $T$ $\iff$ it commutes with each $P_i$

4. $T$ has a normal "square root" (an $S$ such that $S^2 = T$)

5. $T$ is a projection $\iff$ all its eigenvalues are $0$'s or $1$'s

6. $T = -T^*$ (anti-adjoint) $\iff$ all its eigenvalues are pure imaginary numbers
5.1 Unitary Operators
Definition 39 An operator $T \in \mathcal{L}(V)$ is said to be orthogonal ($F = \mathbb{R}$) / unitary ($F = \mathbb{C}$) if $\|Tv\| = \|v\| \;\forall v \in V$ (this means $T$ is a linear isometry, a "rigid transformation"). A unitary operator can also be characterized as a normal operator with $T^* T = T T^* = I_V$.
Theorem 40 (Mazur-Ulam) If $f$ is an isometry such that $f(0) = 0$ and $f$ is onto, then $f$ is a linear isometry (a unitary operator).
Theorem 41 The following statements are equivalent for $T \in \mathcal{L}(V)$:
i) $T$ is an isometry
ii) $T^* T = T T^* = I_V$
iii) $\langle Tu, Tv \rangle = \langle u, v \rangle \;\forall u,v \in V$
iv) If $B$ is an orthonormal basis, then $T(B)$ is an orthonormal basis
v) There exists an orthonormal basis $B$ of $V$ such that $T(B)$ is an orthonormal basis.
Theorem 42 If $\lambda$ is an eigenvalue of an isometry, then $|\lambda| = 1$. Moreover, $T$ is an isometry $\iff$ $T^*$ is an isometry as well.
5.2 Positive Operators and Square Roots
Definition 43 Let $V$ be a finite-dimensional Hilbert space, $T \in \mathcal{L}(V)$. We say $T$ is a positive operator if $T$ is self-adjoint and $\langle Tv, v \rangle \geq 0 \;\forall v \in V$.
Remark 44 A matrix $A$ is said to be a positive operator (a positive semidefinite matrix) if $\langle Ax, x \rangle = x^* A x \geq 0 \;\forall x \in F^n$.
Remark 45 If $F = \mathbb{C}$, we can remove the assumption that $T$ is self-adjoint, since $\langle Tv, v \rangle \in \mathbb{R} \;\forall v$ already implies it (Proposition 33).
Remark 46 The operators $T^* T$ and $T T^*$ are always positive. In fact, it can be shown that any positive operator $T$ is of the form $S S^*$. This is a general version of the famous Cholesky factorization for symmetric positive definite matrices.
Proposition 47 $T$ is a positive operator $\iff$ $T$ is self-adjoint and all its eigenvalues are real and non-negative.
Some properties of positive operators:

1. If $T, U \in \mathcal{L}(V)$ are positive operators, then $T + U$ is positive

2. $T \in \mathcal{L}(V)$ positive $\implies cT$ is positive $\forall c \geq 0$

3. $T \in \mathcal{L}(V)$ positive and invertible $\implies T^{-1}$ is positive

4. $T \in \mathcal{L}(V)$ positive $\implies T^2$ is positive (the converse is false in general)

5. If $T, U \in \mathcal{L}(V)$ are positive operators, then $TU = UT$ implies $TU$ is positive. Here we use heavily that $TU = UT$ implies there is a basis of vectors which are simultaneously eigenvectors of $T$ and $U$.
Definition 48 Let $T \in \mathcal{L}(V)$. We say $S$ is a square root of $T$ if $S^2 = T$.

We note that, in general, the square root is not unique. For example, the identity has an infinite number of square roots: permutations, reflections and rotations by $180°$. However, we can show that, if an operator is positive, then it has a unique positive square root.

Proposition 49 $T \in \mathcal{L}(V)$ is positive $\iff$ $T$ has a unique positive square root.
6 Singular Value Decomposition and the Moore-Penrose Generalized Inverse

6.1 Singular Value Decomposition

The Singular Value Decomposition for $T \in \mathcal{L}(V,W)$ (and the corresponding factorization for matrices) is, without a doubt, one of the most useful results in linear algebra. It is used in applications such as least squares regression, smoothing spline and ridge regression, principal component analysis, matrix norms, noise filtering, and low rank approximation of matrices and operators. This decomposition also enables us to define a generalized inverse (also known as the pseudoinverse), and to compute other decompositions, such as the polar decomposition and explicit positive square roots.
Theorem 50 (Singular Value Decomposition, or SVD) Let $V, W$ be Hilbert spaces over $F$, $T \in \mathcal{L}(V,W)$ with rank $r$. Then there exist orthonormal bases $\{v_1,\ldots,v_n\}$ of $V$ and $\{u_1,\ldots,u_m\}$ of $W$, as well as positive scalars $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0$, such that:

$$T v_i = \sigma_i u_i, \;\; i \leq r; \qquad T v_i = 0, \;\; i > r$$

These scalars are called the "singular values" of $T$. Conversely, if bases and scalars like these exist, then $\{v_1,\ldots,v_n\}$ is an orthonormal basis of eigenvectors of $T^* T$ such that the first $r$ are associated to the eigenvalues $\sigma_1^2,\ldots,\sigma_r^2$, and the rest are associated to $\lambda = 0$.
Using what we know about positive operators, we can see why the statement of this theorem must always be true. Regardless of what $T$ is, $T^* T$ is a positive operator, and therefore diagonalizable, with nonnegative eigenvalues $\{\sigma_1^2,\ldots,\sigma_r^2\}$ and possibly also $0$. We then obtain the set $\{u_1,\ldots,u_m\}$ by computing $u_i = T v_i / \sigma_i$ for the first $r$ vectors, and then completing it to an orthonormal basis of $W$.
Also, this theorem immediately has a geometric interpretation: by choosing the "right" bases, we know exactly what the action of $T$ on the unit sphere is. Basically, $T$ sends the unit sphere to the boundary of an $r$-dimensional ellipsoid (since it squashes $v_{r+1},\ldots,v_n$ to zero), with axes along the first $r$ $u_i$'s. The biggest axis of this ellipsoid is the one in the direction of $u_1$, and the smallest is the one in the direction of $u_r$.
Finally, we note that this theorem applied to a matrix $A \in M_{m\times n}(F)$ yields the following matrix factorization: if $V, U$ are the unitary matrices with $\{v_1,\ldots,v_n\}$ and $\{u_1,\ldots,u_m\}$ as columns, and if $\Sigma$ is the matrix in $M_{m\times n}(F)$ with all zeros except $\Sigma_{ii} = \sigma_i$ for $i \leq r$, then:

$$A = U \Sigma V^* = \begin{pmatrix} | & & | \\ u_1 & \cdots & u_m \\ | & & | \end{pmatrix} \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix} \begin{pmatrix} - & v_1^* & - \\ & \vdots & \\ - & v_n^* & - \end{pmatrix}$$

This factorization is known as the Singular Value Decomposition, or SVD factorization, of $A$.
We know that, for the system of equations $Ax = b$, the best approximation is given by the solution of $A^* A x = A^* b$. By using the SVD, we can always compute the solution with minimum norm. Given an SVD for $A$, $A = U \Sigma V^*$, we have the following:

$$\|Ax - b\| = \|U \Sigma V^* x - b\| = \|\Sigma V^* x - U^* b\|$$

since $U$ is a unitary matrix. Therefore, all we need is to minimize $\|\Sigma y - c\|$, and then solve for $x = V y$, where $c = U^* b$. However, it is clear that:

$$\|\Sigma y - c\|^2 = \sum_{i=1}^r |\sigma_i y_i - c_i|^2 + \sum_{i=r+1}^m |c_i|^2$$

which is minimized precisely when $y_i = \frac{c_i}{\sigma_i}$ for $i \leq r$, and its minimum value is $\sum_{i=r+1}^m |c_i|^2$. If we want the $y$ with minimum norm, all we have to do is make the rest of its coordinates zero.
Now, solving for $x$, if we define:

$$\Sigma^\dagger = \begin{pmatrix} \sigma_1^{-1} & & & \\ & \ddots & & \\ & & \sigma_r^{-1} & \\ & & & 0 \end{pmatrix} \in M_{n\times m}(F)$$

then the solution to this problem is given by:

$$x = V y = V \Sigma^\dagger c = (V \Sigma^\dagger U^*) b$$
From the properties of least squares and this last formula, we already know that the matrix $V \Sigma^\dagger U^*$ does the following:

1. If $b \in \operatorname{Ran}(A)$ (the system is consistent), then it gives us the solution to $Ax = b$ with minimum norm. For any $x \in \mathbb{R}^n$, we know we can write $x = P_{\operatorname{Ker}(A)} x + P_{\operatorname{Ker}(A)^\perp} x$. Since $A(P_{\operatorname{Ker}(A)} x) = 0$, $(V \Sigma^\dagger U^*) b$ is the unique solution in $\operatorname{Ker}(A)^\perp$.
2. If $b \notin \operatorname{Ran}(A)$ (the system is inconsistent), then it projects $b$ onto $\operatorname{Ran}(A)$, and then gives us the unique solution to $Ax = P_{\operatorname{Ran}(A)} b$ in $\operatorname{Ker}(A)^\perp$.

3. We can also deduce this from the fact that, by the construction of the SVD, the Fundamental Theorem of Linear Algebra II, and $A^* = V \Sigma^* U^*$, the set $\{v_1,\ldots,v_r\}$ is a basis for $\operatorname{Ker}(A)^\perp$, $\{v_{r+1},\ldots,v_n\}$ for $\operatorname{Ker}(A)$, $\{u_1,\ldots,u_r\}$ for $\operatorname{Ran}(A)$, and $\{u_{r+1},\ldots,u_m\}$ for $\operatorname{Ran}(A)^\perp$.
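The minimum-norm recipe $y_i = c_i/\sigma_i$, $x = Vy$ can be sketched in NumPy (the rank-deficient matrix below is an arbitrary example; `np.linalg.lstsq` also returns the minimum-norm least squares solution, so it serves as the reference):

```python
import numpy as np

rng = np.random.default_rng(4)
# A rank-deficient 4x3 matrix: the third column is the sum of the first two.
B = rng.standard_normal((4, 2))
A = np.column_stack([B, B[:, 0] + B[:, 1]])
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))        # numerical rank
assert r == 2

c = U.T @ b
y = np.zeros(3)
y[:r] = c[:r] / s[:r]             # y_i = c_i / sigma_i for i <= r, zero elsewhere
x = Vt.T @ y                      # minimum-norm least squares solution x = V y

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)
```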
6.2 The Moore-Penrose Generalized Inverse
Theorem 51 (Moore-Penrose Generalized Inverse) Given $V, W$ Hilbert spaces over $F$ and $T \in \mathcal{L}(V,W)$ with rank $r$, there exists a unique linear operator, which we call the Moore-Penrose generalized inverse (or pseudoinverse, for short) $T^\dagger : W \to V$, such that, if $S$ is the restriction of $T$ to $\operatorname{Ker}(T)^\perp$, then:

$$T^\dagger|_{\operatorname{Ran}(T)} = S^{-1}, \qquad T^\dagger|_{\operatorname{Ran}(T)^\perp} = 0$$

As an extension of this inverse, it has the following properties:

$$T^\dagger T = P_{\operatorname{Ker}(T)^\perp}, \qquad T T^\dagger = P_{\operatorname{Ran}(T)}$$

Finally, if we have an SVD of $T$, the pseudoinverse $T^\dagger$ can be computed as:

$$T^\dagger u_j = \frac{v_j}{\sigma_j}, \;\; j \leq r; \qquad T^\dagger u_j = 0, \;\; j > r$$
In matrix form, if $A = U \Sigma V^*$, then:

$$A^\dagger = V \Sigma^\dagger U^*$$
The following properties can be obtained for the SVD and the Pseudoinverse:
1. Let A 2 M mn(C). Then, A; A and A> have the same singular values.Also, (Ay) = (A)y and (Ay)> = (A>)y
2. (Moore-Penrose Conditions) Let T 2 L(V; W ). If an operator U is such that: (a) T U T T , (b) U T U U and (c) U T and T U areself-adjoint then U T y. These conditions are a characterization of thepseudoinverse of T as a linear operator.
3. We can check that the general formula for the projection onto Ran(A) which we calculated with the adjoint matrix is:

P_{Ran(A)} = A (A^* A)^† A^* = A A^† = U Σ Σ^† U^*

where Σ Σ^† is a diagonal matrix with 1's in the first r diagonal entries and 0's in the rest.
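The Moore-Penrose conditions are easy to verify numerically. A small numpy sketch follows, with a made-up rank-deficient matrix:

```python
import numpy as np

# Check the Moore-Penrose conditions for P = A^+ on a random matrix of
# rank at most 2 (the shapes and seed are arbitrary illustrative choices).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
P = np.linalg.pinv(A)

assert np.allclose(A @ P @ A, A)         # (a) T U T = T
assert np.allclose(P @ A @ P, P)         # (b) U T U = U
assert np.allclose((A @ P).T, A @ P)     # (c) T U self-adjoint (real case)
assert np.allclose((P @ A).T, P @ A)     #     U T self-adjoint
```

Since these four conditions characterize the pseudoinverse, any matrix passing all four assertions must equal `np.linalg.pinv(A)` up to rounding.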
6.3 The Polar Decomposition
Another useful decomposition that can be obtained from the SVD is the polar decomposition. This is a generalization of the decomposition of a complex number z = |z| e^{i arg(z)}.
Theorem 52 Let A ∈ M_n(F). Then there exist a unitary matrix W and a positive (semidefinite) matrix P such that A = W P. If A is invertible, then this decomposition is unique. One way to derive this is by using P = |A| = √(A^* A). Given an SVD A = U Σ V^*, then W = U V^* and P = V Σ V^*.

Proof. A = U Σ V^* = (U V^*)(V Σ V^*) = W P. As a product of unitary matrices, W is unitary, and the positivity of P follows from the fact that Σ is diagonal with non-negative entries.
Some useful results that follow from this decomposition are:
1. A = W P is normal ⟺ W P² = P² W

2. Using that a positive matrix has a unique positive square root, we can use the previous result to conclude that A = W P is normal ⟺ W P = P W

3. If A = W P, then det(P) = |det A| and det(W) = e^{i arg(det A)}.
The polar decomposition, which can be extended to linear operators in infinite dimensions, tells us that we can view any linear operator as the composition of a partial isometry and a positive operator.
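The construction W = U V^*, P = V Σ V^* from the proof of Theorem 52 can be sketched in numpy (the example matrix is made up for illustration):

```python
import numpy as np

# Polar decomposition A = W P from an SVD A = U Sigma V^*:
# take W = U V^* (unitary) and P = V Sigma V^* (positive semidefinite).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
U, s, Vh = np.linalg.svd(A)
W = U @ Vh                       # orthogonal here, since A is real
P = Vh.T @ np.diag(s) @ Vh       # positive semidefinite

assert np.allclose(W @ P, A)
assert np.allclose(W.T @ W, np.eye(2))          # W unitary
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)  # P positive
```

(scipy.linalg also ships a ready-made `polar` routine; the point here is only that the SVD already contains the factors.)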
7 Matrix Norms and Low Rank Approximation
Matrices are very versatile: they can be seen as rearranged vectors in R^{m×n}, we can identify a group of matrices with some mathematical object, or we can just take them as members of the vector space of linear transformations from F^n to F^m. In any case, it is very useful to have the notion of a matrix norm.
7.1 The Frobenius Norm
If we consider matrices as members of R^{m×n}, it is then natural to endow them with the usual Euclidean norm and inner product:

‖A‖_F = ( Σ_{i,j} a_{ij}² )^{1/2}

⟨A, B⟩_F = Σ_{i,j} a_{ij} b_{ij}
Or equivalently, we can write:

‖A‖_F = √tr(A^* A),    ⟨A, B⟩_F = tr(B^* A)

In any case, we conclude that the space (M_{m×n}(F), ‖·‖_F) is a Hilbert space. This norm has the following properties:

1. ‖Ax‖ ≤ ‖A‖_F ‖x‖ (Euclidean norms on F^m and F^n; Lipschitz condition). In particular, this condition tells us that every linear operator in M_{m×n}(F) is continuous.

2. For A and B such that AB makes sense, ‖AB‖_F ≤ ‖A‖_F ‖B‖_F.
3. Given an SVD A = U Σ V^*, ‖A‖²_F = tr(V Σ^* Σ V^*) = tr(Σ^* Σ) = Σ_{i=1}^r σ_i².

4. Given A unitarily diagonalizable, we can reinterpret the spectral decomposition as follows: A = Q Λ Q^* = Σ_{i=1}^n λ_i q_i q_i^* = Σ_{i=1}^n λ_i Z_{ii}, where {Z_{ii}}_{i=1}^n is an orthonormal set in (M_{n×n}(F), ‖·‖_F). Also, given an SVD of A ∈ M_{m×n}(F), A = U Σ V^* = Σ_{i=1}^r σ_i (U_i V_i^*) = Σ_{i=1}^r σ_i Z_{ii}, where again the Z matrices are orthonormal. An orthonormal basis for M_{m×n}(F) can then be given by Z_{ij} = U_i V_j^*.
5. (Pythagorean Theorem) If Ran(A) ⊥ Ran(B), then A ⊥ B in the Frobenius inner product, and ‖A + B‖²_F = ‖A‖²_F + ‖B‖²_F (not true for general matrix norms).

6. (Pseudoinverse, revisited) A^† is the matrix X that minimizes ‖I − AX‖_F. That is, it is the best approximation to an inverse of A in the Frobenius norm.

7. (Best approximation by unitary matrices) min{‖Q − Q₀‖_F : Q unitary} is attained at Q = U V^*, where Q₀ = U Σ V^* is an SVD of Q₀.
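Property 7 (the orthogonal Procrustes problem) can be illustrated numerically; the sketch below compares the candidate Q = U V^* against random unitaries (the matrix Q₀, seed, and sample count are made up for illustration):

```python
import numpy as np

# Nearest-unitary sketch: for Q0 = U Sigma V^*, the closest unitary in
# Frobenius norm is Q = U V^*. Compare against random orthogonal matrices.
rng = np.random.default_rng(1)
Q0 = rng.standard_normal((3, 3))
U, s, Vh = np.linalg.svd(Q0)
Q_best = U @ Vh
best = np.linalg.norm(Q_best - Q0, 'fro')

for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
    assert np.linalg.norm(Q - Q0, 'fro') >= best - 1e-12
```

Random sampling of course only illustrates the minimality claim; the proof goes through maximizing tr(Q^* Q₀) over unitary Q.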
7.2 Operator Norms
If we consider matrices as operators in L(F^n, F^m), it is then natural to use the corresponding operator norm. This norm depends on the norms we use for F^n and F^m, and it measures the maximum distortion of the unit sphere under the action of A. That is, given A : (F^n, ‖·‖_a) → (F^m, ‖·‖_b):

‖A‖_{a,b} = max_{‖x‖_a = 1} ‖Ax‖_b

Then, by definition of an operator norm, it follows that ‖Ax‖_b ≤ ‖A‖_{a,b} ‖x‖_a for all x ∈ F^n.
This maximum is always attained at some point on the sphere, since ‖Ax‖_b is a continuous function and the sphere is compact. Although we can potentially use any norms we want on the domain and range of the transformation, it is often the case that ‖·‖_a and ‖·‖_b are both p-norms with the same value of p. In this case, we talk about the p-norm of A, written ‖A‖_p.

An important question that arises then is how to calculate this operator norm. For general p, this becomes a constrained optimization problem (often a nontrivial one). However, for some important cases, we can again say something in terms of the SVD or in terms of the entries of A.
Properties of ‖A‖₂:

1. ‖A‖₂ = max{σ_i} = σ₁. As we know, this is a very significant fact, deeply tied to the geometric interpretation of the SVD. As mentioned before, the SVD reveals that the unit sphere is mapped to an r-dimensional ellipsoid whose major semi-axis has length σ₁.

2. min{‖Ax‖₂ : x ∈ Ker(A)^⊥, ‖x‖₂ = 1} = σ_r

3. ‖A‖₂ = max_{‖x‖₂=1} max_{‖y‖₂=1} |y^* A x| = max_{‖x‖₂=1} max_{‖y‖₂=1} |⟨Ax, y⟩_{F^m}|

4. ‖A^*‖₂ = ‖A‖₂, and ‖A^* A‖₂ = ‖A‖₂²

5. For U and V unitary, ‖U A V‖₂ = ‖A‖₂

6. If A is invertible, then ‖A^{-1}‖₂ = 1/σ_r. In general, we have ‖A^†‖₂ = 1/σ_r.
We also have closed forms for the 1- and ∞-norms:

‖A‖₁ = max_{j=1,...,n} Σ_{i=1}^m |a_{ij}| (maximum ℓ₁ norm of the columns)

‖A‖_∞ = max_{i=1,...,m} Σ_{j=1}^n |a_{ij}| (maximum ℓ₁ norm of the rows)

We observe that ‖A^*‖₁ = ‖A‖_∞.
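These closed forms can be checked against numpy's built-in matrix norms (the example matrix is made up for illustration):

```python
import numpy as np

# Induced 1-, 2- and infinity-norms of a small example matrix, compared
# with the column-sum, row-sum, and largest-singular-value formulas.
A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())        # max column sum
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())   # max row sum
assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])  # sigma_1
assert np.isclose(np.linalg.norm(A.T, 1), np.linalg.norm(A, np.inf))        # ||A*||_1 = ||A||_inf
```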
7.3 Low Rank Matrix Approximation:
We have now seen that the SVD provides us with tools to compute matrix norms and to derive results about the vector space of matrices of a given size. In a way that is completely analogous to the theory of general Hilbert spaces, this leads us to a theory of matrix approximation: by eliminating the singular values that are not significant, we produce an approximation of A that has
lower rank. This can immediately be seen as truncating a "Fourier series" of A, using the orthonormal basis suggested by our SVD:
Let A = U Σ V^* = Σ_{i=1}^r σ_i (U_i V_i^*) = Σ_{i=1}^r σ_i Z_{ii}, where {Z_{ij}} = {U_i V_j^*} is the orthonormal basis of M_{m×n}(F) (with respect to the Frobenius inner product) as before. Then it becomes evident that:

σ_i = ⟨A, Z_{ii}⟩_F

That is, the σ_i (and all of the entries in Σ) are the Fourier coefficients of A in this particular orthonormal basis. We notice that, since the Z_{ij} are outer products of two vectors, rank(Z_{ij}) = 1 for all i, j.
Now, it is often the case that A is "noisy", either because it represents the pixels of a blurred image, or because it is a transformation that involves some noise. However, as in other filtering or approximation schemes, we expect the noise to be of "high frequency"; equivalently, the signal-to-noise ratio decreases in proportion to σ_i. Therefore, by truncating the series after a certain k, the action of A remains almost intact, but we often get rid of a significant amount of noise. Also, using results on abstract Fourier series, we can show that this is the best approximation to A of rank k in the Frobenius norm, that is:
A_k = Σ_{i=1}^k σ_i Z_{ii}

‖A − A_k‖²_F = Σ_{i=k+1}^r σ_i² = min_{rank(B)=k} ‖A − B‖²_F
We also have the following results:

1. Defining the "error matrix" E_k = A − A_k, it follows from the Pythagorean theorem that ‖A_k‖²_F = ‖A‖²_F − ‖E_k‖²_F. We can also define a relative error:

R_k² = ‖E_k‖²_F / ‖A‖²_F = ( Σ_{i=k+1}^r σ_i² ) / ( Σ_{i=1}^r σ_i² )

2. The matrix A_k is the result of k successive approximations to A, each of rank 1.

3. A_k is also an optimal approximation under the ‖·‖₂ norm, with minimum value ‖A − A_k‖₂ = σ_{k+1}.
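Both error formulas can be verified with a truncated SVD in numpy (random matrix and rank k chosen arbitrarily for illustration):

```python
import numpy as np

# Truncated-SVD low-rank approximation A_k and its error formulas:
# ||A - A_k||_F^2 = sum_{i>k} sigma_i^2  and  ||A - A_k||_2 = sigma_{k+1}.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

assert np.isclose(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])  # sigma_{k+1} in 1-indexed notation
```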
8 Generalized Eigenvalues, the Jordan Canonical Form and e^A
The theory of generalized eigenspaces is a natural extension of the results fordiagonalizable operators and their spectral decomposition. Although the SVD
does provide some of these properties, it is desirable to have a decomposition of the space as a direct sum of invariant subspaces. This also leads to the Jordan canonical form, a block diagonal matrix with which we can easily operate and calculate matrix powers, power series, and important operators like the exponential operator.
8.1 The Generalized Eigenspace K_λ
Definition 53 Let V be a vector space over C and T ∈ L(V). For an eigenvalue λ of T, we define the algebraic multiplicity am(λ) as the multiplicity of λ as a root of the characteristic polynomial p(λ) = det(T − λI), and the geometric multiplicity gm(λ) as the dimension of the eigenspace E_λ = Ker(T − λI).
Let V be a vector space over C and T ∈ L(V) a linear operator. We know that T has a set of distinct eigenvalues {λ_i}_{i=1}^k, and either the algebraic and geometric multiplicities of each λ_i coincide (T is diagonalizable) or, for some λ, gm(λ) < am(λ). In the latter case, the problem is that the eigenspaces fail to span the entire space. We can then consider powers of the operator (T − λI); since Ker((T − λI)^m) ⊆ Ker((T − λI)^{m+1}) for all m (the kernel "grows"), we can define the following:

K_λ = {v ∈ V : (T − λI)^m v = 0 for some m ∈ N}

These generalized eigenspaces have the following properties:
1. E_λ ⊆ K_λ for every eigenvalue λ of T (by definition)

2. K_λ ⊆ V and is invariant under the action of T: T(K_λ) ⊆ K_λ

3. If dim(V) < ∞, then dim(K_λ) = am(λ)

4. K_{λ₁} ∩ K_{λ₂} = {0} for λ₁ ≠ λ₂
Theorem 54 (Generalized Eigenvector Decomposition) Let V be a finite-dimensional vector space over C and T ∈ L(V). Then if {λ_i}_{i=1}^k are the eigenvalues of T and the characteristic polynomial p splits over the field F,

V = ⊕_{i=1}^k K_{λ_i}
Proof.
Theorem 55 (Jordan Canonical Form Theorem) Under the conditions of the generalized eigenvector decomposition, there exists a basis B such that [T]_B is block diagonal, with blocks J_1, ..., J_k:

[T]_B = ( J_1  0    ...  0   )
        ( 0    J_2  ...  0   )
        ( ...       ...  ... )
        ( 0    0    ...  J_k )
Where each J_i is a Jordan canonical form. A Jordan canonical form in turn is also a block diagonal matrix, composed of Jordan blocks, which are matrices of the form:

J_i^(p) = ( λ_i  1    0    ...  0   )
          ( 0    λ_i  1    ...  0   )
          ( ...            ...  1   )
          ( 0    0    0    ...  λ_i )

And the number of blocks in J_i coincides with the geometric multiplicity of λ_i (dim(E_{λ_i})). Also, the maximum size of these blocks is the first m for which Ker((T − λ_i I)^m) = Ker((T − λ_i I)^{m+1}) = K_{λ_i}.
Proof.
8.2 A method to compute the Jordan Form: The points diagram
8.3 Applications: Matrix Powers and Power Series
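For the diagonalizable case, matrix powers and power series reduce to scalar functions of the eigenvalues; a minimal numpy sketch (the symmetric example matrix is made up for illustration, and non-diagonalizable A would require the Jordan form instead):

```python
import numpy as np

# If A = Q Lambda Q^{-1}, then A^n = Q Lambda^n Q^{-1} and
# e^A = Q e^Lambda Q^{-1}. A symmetric A is diagonalized by eigh.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
evals, Q = np.linalg.eigh(A)

A5 = Q @ np.diag(evals ** 5) @ Q.T        # A^5 via the spectral decomposition
assert np.allclose(A5, np.linalg.matrix_power(A, 5))

expA = Q @ np.diag(np.exp(evals)) @ Q.T   # e^A via the spectral decomposition
```

The same pattern evaluates any power series f(A) = Q f(Λ) Q^{-1} entrywise on the eigenvalues.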
9 Nilpotent Operators
10 Other Important Matrix Factorizations
1. LDU Factorization
2. Cholesky
11 Other Topics (which appear in past exams)
1. Limits with Matrices
2. Symplectic Matrices
3. Perron-Frobenius and the Theory of Matrices with Positive Entries.
4. Markov Chains
5. Graph Adjacency Matrices and the Graph Laplacian. Dijkstra and Floyd-Warshall.
6. Matrices of rank k and the Sherman-Morrison-Woodbury formula (for the inverse of rank-k updates of a matrix)
12 Yet more Topics I can think of
1. Symmetric Positive Semidefinite Matrices and the Variance-Covariance Matrix
2. Krylov Subspaces: CG, GMRES and Lanczos Algorithms
3. Toeplitz and Wavelet Matrices
4. Polynomial Interpolation