
Benjamin McKay

Abstract Linear Algebra

October 19, 2016


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.


Contents

I Basic Definitions
1 Vector Spaces
2 Fields
3 Direct Sums of Subspaces

II Jordan Normal Form
4 Jordan Normal Form
5 Decomposition and Minimal Polynomial
6 Matrix Functions of a Matrix Variable
7 Symmetric Functions of Eigenvalues
8 The Pfaffian

III Factorizations
9 Dual Spaces
10 Singular Value Factorization
11 Factorizations

IV Tensors
12 Quadratic Forms
13 Tensors and Indices
14 Tensors
15 Exterior Forms

Hints
Bibliography
List of notation
Index



Basic Definitions


Chapter 1

Vector Spaces

The ideas of linear algebra apply more widely, in more abstract spaces than Rn.

Definition

To avoid rewriting everything twice, once for real numbers and once for complex numbers, let K stand for either R or C.

Definition 1.1. A vector space V over K is a set (whose elements are called vectors) equipped with two operations, addition (written +) and scaling (written ·), so that

a. Addition laws:

1. u + v is in V
2. (u + v) + w = u + (v + w)
3. u + v = v + u

for any vectors u, v, w in V ,

b. Zero laws:

1. There is a vector 0 in V so that 0 + v = v for any vector v in V.
2. For each vector v in V, there is a vector w in V, for which v + w = 0.

c. Scaling laws:

1. av is in V
2. 1 · v = v
3. a(bv) = (ab)v
4. (a + b)v = av + bv
5. a(u + v) = au + av

for any numbers a, b ∈ K, and any vectors u and v in V .

Because (u + v) + w = u + (v + w), we never need parentheses in adding up vectors.



Kn is a vector space, with the usual addition and scaling.

The set V of all real-valued functions of a real variable is a vector space: we can add functions, (f + g)(x) = f(x) + g(x), and scale functions, (c f)(x) = c f(x). This example is the main motivation for developing an abstract theory of vector spaces.

Take some region inside R^n, like a box, or a ball, or several boxes and balls glued together. Let V be the set of all real-valued functions of that region. Unlike R^n, which comes equipped with the standard basis, there is no “standard basis” of V. By this, we mean that there is no collection of functions fi we know how to write down so that every function f is a unique linear combination of the fi. Even still, we can generalize a lot of ideas about linear algebra to various spaces like V instead of just R^n. Practically speaking, there are only two types of vector spaces that we ever encounter: R^n (and its subspaces) and the space V of real-valued functions defined on some region in R^n (and its subspaces).

The set K^{p×q} of p × q matrices is a vector space, with the usual matrix addition and scaling.

1.1 If V is a vector space, prove that

a. 0 v = 0 for any vector v, and

b. a 0 = 0 for any scalar a.

1.2 Let V be the set of real-valued polynomial functions of a real variable.Prove that V is a vector space, with the usual addition and scaling.

1.3 Prove that there is a unique vector w for which v + w = 0. (Let's always call that vector −v.) Prove also that −v = (−1)v.

We will write u − v for u + (−v) from now on. We define linear relations, linear independence, bases, subspaces, bases of subspaces, and dimension using exactly the same definitions as for R^n.

Remark 1.2. Thinking as much as possible in terms of abstract vector spaces saves a lot of hard work. We will see many reasons why, but the first is that every subspace of any vector space is itself a vector space.


Review problems

1.4 Prove that if u+ v = u+ w then v = w.

1.5 Imagine that the population p_j at year j is governed (at least roughly) by some equation

p_{j+1} = a p_j + b p_{j−1} + c p_{j−2}.

Prove that for fixed a, b, c, the set of all sequences . . . , p_1, p_2, . . . which satisfy this law is a vector space.

1.6 Give examples of subsets of the plane

a. invariant under scaling of vectors (sending u to au for any number a), but not under addition of vectors. (In other words, if you scale vectors from your subset, they have to stay inside the subset, but if you add some vectors from your subset, you don’t always get a vector from your subset.)

b. invariant under addition but not under scaling or subtraction.

c. invariant under addition and subtraction but not scaling.

1.7 Take positive real numbers and “add” by the law u ⊕ v = uv and “scale” by a ⊙ u = u^a. Prove that the positive numbers form a vector space with these funny laws for addition and multiplication.

1.8 Which of the following sets are vector spaces (with the usual addition andscalar multiplication for real-valued functions)? Justify your answer.

a. The set of all continuous functions of a real variable.

b. The set of all nonnegative functions of a real variable.

c. The set of all polynomial functions of degree exactly 3.

d. The set of all symmetric 10 × 10 matrices A, i.e. A = A^T.

Bases

We define linear combinations, linear relations, linear independence, bases andthe span of a set of vectors identically.

1.9 Find bases for the following vector spaces:

a. The set of polynomial functions of degree 3 or less.

b. The set of 3× 2 matrices.

c. The set of n× n upper triangular matrices.

d. The set of polynomial functions p(x) of degree 3 or less which vanish atthe origin x = 0.


Lemma 1.3. The number of elements in a linearly independent set is never more than the number of elements in a spanning set: if v1, v2, . . . , vp ∈ V is a linearly independent set of vectors and w1, w2, . . . , wq ∈ V is a spanning set of vectors in the same vector space, then p ≤ q. Moreover, p = q just when v1, v2, . . . , vp is a basis. In particular, any two bases have the same number of elements.

Proof. If v1 = 0 then

1 v1 + 0 v2 + 0 v3 + · · · + 0 vp = 0,

a linear relation. So v1 ≠ 0. We can write v1 as a linear combination, say

v1 = b1 w1 + b2 w2 + · · · + bq wq.

Not all of the b1, b2, . . . , bq coefficients can vanish, since v1 ≠ 0. If we relabel the subscripts, we can arrange that b1 ≠ 0. Solve for w1:

w1 = (1/b1) v1 − (b2/b1) w2 − · · · − (bq/b1) wq.

Therefore we can write each of w1, w2, w3, . . . , wq as linear combinations of v1, w2, w3, . . . , wq. So v1, w2, w3, . . . , wq is a spanning set.

Next replace v1 in this argument by v2, and then by v3, etc. We can always replace one of the vectors w1, w2, . . . , wq by each of the vectors v1, v2, . . . , vp. If p ≥ q, we can keep going like this until we replace all of the vectors w1, w2, . . . , wq by the vectors v1, v2, . . . , vq: v1, v2, . . . , vq is a spanning set. If p = q, we find that v1, v2, . . . , vp span, so form a basis. If p > q, vq+1 is a linear combination

vq+1 = b1 v1 + b2 v2 + · · · + bq vq,

a linear relation, a contradiction.

Definition 1.4. The dimension of a vector space V is n if V has a basis consisting of n vectors. If there is no such value of n, then we say that V has infinite dimension.

Remark 1.5. We can include the possibility that n = 0 by defining K^0 to consist in just a single vector 0, a zero dimensional vector space.

1.10 Let V be the set of polynomials of degree at most p in n variables. Findthe dimension of V .

Subspaces

The definition of a subspace is identical to that for Rn.

Let V be the set of real-valued functions of a real variable. The set P of continuous real-valued functions of a real variable is a subspace of V.


Let V be the set of all infinite sequences of real numbers. We add a sequence x1, x2, x3, . . . to a sequence y1, y2, y3, . . . to make the sequence x1 + y1, x2 + y2, x3 + y3, . . . . We scale a sequence by scaling each entry. The set of convergent infinite sequences of real numbers is a subspace of V.

In these last two examples, we see that a large part of analysis is encoded into subspaces of infinite dimensional vector spaces. (We will define dimension shortly.)

1.11 Describe some subspaces of the space of all real-valued functions of a realvariable.

Review problems

1.12 Which of the following are subspaces of the space of real-valued functions of a real variable?

a. The set of everywhere positive functions.

b. The set of nowhere positive functions.

c. The set of functions which are positive somewhere.

d. The set of polynomials which vanish at the origin.

e. The set of increasing functions.

f. The set of functions f(x) for which f(−x) = f(x).

g. The set of functions f(x) each of which is bounded from above and below by some constant functions.

1.13 Which of the following are subspaces of the vector space of all 3 × 3 matrices?

a. The invertible matrices.

b. The noninvertible matrices.

c. The matrices with positive entries.

d. The upper triangular matrices.

e. The symmetric matrices.

f. The orthogonal matrices.


1.14 Prove that for any subspace U of a finite dimensional vector space V, there is a basis for V

u1, u2, . . . , up, v1, v2, . . . , vq

so that

u1, u2, . . . , up

form a basis for U.

1.15

a. Let H be an n × n matrix. Let P be the set of all matrices A for which AH = HA. Prove that P is a subspace of the space V of all n × n matrices.

b. Describe this subspace P for

H = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}.

Sums and Direct Sums

Suppose that U, W ⊂ V are two subspaces of a vector space. Then the intersection U ∩ W ⊂ V is also a subspace. Let U + W be the set of all sums u + w for any u ∈ U and w ∈ W. Then U + W ⊂ V is a subspace.

1.16 Prove that U +W is a subspace, and that

dim(U +W ) = dimU + dimW − dim(U ∩W ).

If U and W are any vector spaces (not necessarily subspaces of any particular vector space V) the direct sum U ⊕ W is the set of all pairs (u, w) for any u ∈ U and w ∈ W. We add pairs and scale pairs in the obvious way:

(u1, w1) + (u2, w2) = (u1 + u2, w1 + w2)

and

c (u, w) = (cu, cw).

If u1, u2, . . . , up is a basis for U and w1, w2, . . . , wq is a basis for W , then

(u1, 0) , (u2, 0) , . . . (up, 0) , (0, w1) , (0, w2) , . . . (0, wq)

is a basis for U ⊕W . In particular,

dim(U ⊕ W) = dim U + dim W.


Linear Maps

Definition 1.6. A linear map T between vector spaces U and V is a rule which associates to each vector x from U a vector Tx from V so that

a. T (x0 + x1) = Tx0 + Tx1

b. T (ax) = aTx

for any vectors x0, x1 and x in U and real number a. We will write T : U → V to mean that T is a linear map from U to V.

Let U be the vector space of all real-valued functions of a real variable. Imagine 16 scientists standing one at each kilometer along a riverbank, each measuring the height of the river at the same time. The height at that time is a function h of how far you are along the bank. The 16 measurements of the function, say h(1), h(2), . . . , h(16), sit as the entries of a vector in R^16. So we have a map T : U → R^16, given by sampling values of functions h(x) at various points x = 1, x = 2, . . . , x = 16:

Th = \begin{pmatrix} h(1) \\ h(2) \\ \vdots \\ h(16) \end{pmatrix}.

This T is a linear map.

Any p × q matrix A determines a linear map T : R^q → R^p, by the equation Tx = Ax. Conversely, given a linear map T : R^q → R^p, define a p × q matrix A by letting the j-th column of A be Te_j. Then Tx = Ax. We say that A is the matrix associated to T. In this way we can identify the space of linear maps T : R^q → R^p with the space of p × q matrices. It is convenient to write T = A to mean that T has associated matrix A.
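The recipe “column j of the matrix is Te_j” is easy to test numerically. Here is a minimal Python sketch (our own illustration; the function name matrix_of and the sample map are not from the text):

import numpy as np

def matrix_of(T, q, p):
    # Column j of the associated matrix is T applied to the standard basis vector e_j.
    A = np.zeros((p, q))
    for j in range(q):
        e = np.zeros(q)
        e[j] = 1.0
        A[:, j] = T(e)
    return A

# Example: T(x) = (x1 + 2 x2, 3 x3), a linear map from R^3 to R^2.
T = lambda x: np.array([x[0] + 2 * x[1], 3 * x[2]])
A = matrix_of(T, q=3, p=2)
x = np.array([1.0, 1.0, 1.0])
assert np.allclose(A @ x, T(x))   # Tx = Ax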

There is an obvious linear map I : V → V given by Iv = v for any vector v in V, called the identity map.

Definition 1.7. If S : U → V and T : V → W are linear maps, then TS : U → W, given by (TS)x = T(Sx), is their composition.


If U, W ⊂ V are subspaces, then there is an obvious linear map

T : U ⊕ W → U + W, T(u, w) = u + w.

This map is a bijection just when U ∩ W = {0}, clearly, in which case we use this map to identify U ⊕ W with U + W, and say U + W is a direct sum of subspaces.

1.17 Prove that if A is the matrix associated to a linear map S : R^p → R^q and B the matrix associated to T : R^q → R^r, then BA is the matrix associated to their composition.

Remark 1.8. From now on, we won’t distinguish a linear map T : R^q → R^p from its associated matrix, which we will also write as T. Once again, deliberate ambiguity has many advantages.

Remark 1.9. A linear map between abstract vector spaces doesn’t have an associated matrix; this idea only makes sense for maps T : R^q → R^p.

Let U and V be two vector spaces. The set W of all linear maps T : U → V is a vector space: we add linear maps by (T1 + T2)(u) = T1(u) + T2(u), and scale by (cT)u = cTu.

Definition 1.10. The kernel of a linear map T : U → V is the set of vectors u in U for which Tu = 0. The image is the set of vectors v in V of the form v = Tu for some u in U.

Definition 1.11. A linear map T : U → V is an isomorphism if

a. Tx = Ty just when x = y (one-to-one) for any x and y in U , and

b. For any z in V, there is some x in U for which Tx = z (onto).

Two vector spaces U and V are called isomorphic if there is an isomorphism between them.

Being isomorphic means effectively being the same for purposes of linearalgebra.

Remark 1.12. When working with an abstract vector space V, the role that has up to now been played by a change of basis matrix will henceforth be played by an isomorphism F : R^n → V. Equivalently, Fe1, Fe2, . . . , Fen is a basis of V.

Let V be the vector space of polynomials p(x) = a + bx + cx^2 of degree at most 2. Let F : R^3 → V be the map

F \begin{pmatrix} a \\ b \\ c \end{pmatrix} = a + bx + cx^2.

Clearly F is an isomorphism.

1.18 Prove that a linear map T : U → V is an isomorphism just when its kernel is 0, and its image is V.

1.19 Let V be a vector space. Prove that I : V → V is an isomorphism.

1.20 Prove that an isomorphism T : U → V has a unique inverse map T^{-1} : V → U so that T^{-1}T = 1 and TT^{-1} = 1, and that T^{-1} is linear.

1.21 Let V be the set of polynomials of degree at most 2, and map T : V → R^3 by, for any polynomial p,

Tp = \begin{pmatrix} p(0) \\ p(1) \\ p(2) \end{pmatrix}.

Prove that T is an isomorphism.

Theorem 1.13. If v1, v2, . . . , vn is a basis for a vector space V, and w1, w2, . . . , wn are any vectors in a vector space W, then there is a unique linear map T : V → W so that Tvi = wi.

Proof. If there were two such maps, say S and T, then S − T would vanish on v1, v2, . . . , vn, and therefore by linearity would vanish on any linear combination of v1, v2, . . . , vn, therefore on any vector, so S = T.

To see that there is such a map, we know that each vector x in V can be written uniquely as

x = x1 v1 + x2 v2 + · · · + xn vn.

So let's define

Tx = x1 w1 + x2 w2 + · · · + xn wn.

If we take two vectors, say x and y, and write them as linear combinations of basis vectors, say with

x = x1 v1 + x2 v2 + · · · + xn vn,

y = y1 v1 + y2 v2 + · · · + yn vn,

then

T(x + y) = (x1 + y1) w1 + (x2 + y2) w2 + · · · + (xn + yn) wn = Tx + Ty.


Similarly, if we scale a vector x by a number a, then

ax = a x1v1 + a x2v2 + · · ·+ a xnvn,

so that

T(ax) = a x1 w1 + a x2 w2 + · · · + a xn wn = a Tx.

Therefore T is linear.

Corollary 1.14. A vector space V has dimension n just when it is isomorphic to K^n. To each basis

v1, v2, . . . , vn

we associate the unique linear isomorphism F : Rn → V so that

F (e1) = v1, F (e2) = v2, . . . , F (en) = vn.

Suppose that T : V → W is a linear map between finite dimensional vector spaces, and we have a basis

v1, v2, . . . , vp ∈ V

and a basis

w1, w2, . . . , wq ∈ W.

Then we can write each element Tvj somehow in terms of these w1, w2, . . . , wq, say

Tvj = ∑_i Aij wi,

for some numbers Aij. Let A be the matrix with entries Aij; we say that A is the matrix of T in these bases.
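Before relating A to isomorphisms below, here is a small illustration of this definition (our own example, not from the text): the derivative map Tp(x) = p'(x) on the space of polynomials of degree at most 2, using the basis 1, x, x^2 on both sides. Since T1 = 0, Tx = 1, and Tx^2 = 2x, column j of the matrix holds the coefficients of Tvj in the basis w1, w2, w3.

import numpy as np

# Matrix of the derivative map in the basis 1, x, x^2 (for both source and target):
# column j lists the coefficients Aij of T vj expanded in w1, w2, w3.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
# Check on p(x) = 3 + 5x + 7x^2, with coordinate vector (3, 5, 7):
# A times it gives (5, 14, 0), the coordinates of p'(x) = 5 + 14x.
print(A @ np.array([3.0, 5.0, 7.0]))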

Let F : Rp → V be the isomorphism

F (ej) = vj ,

and let G : R^q → W be the isomorphism

G (ei) = wi.

Then G^{-1}TF : R^p → R^q is precisely the linear map

G^{-1}TFx = Ax,

given by the matrix A. The proof: clearly

Ae_j = j-th column of A = \begin{pmatrix} A1j \\ A2j \\ \vdots \\ Aqj \end{pmatrix} = ∑_i Aij e_i.


Therefore

GAe_j = ∑_i Aij wi = Tvj = TFe_j.

So GA = TF, or A = G^{-1}TF.

We can now just take all of the theorems we have previously proven about matrices and prove them for linear maps between finite dimensional vector spaces, by just replacing the linear map by its matrix. For example,

Theorem 1.15. Let T : U → V be a linear transformation of finite dimensional vector spaces. Then

dim kerT + dim imT = dimU.

The proof is that the kernels and images are identified when we match up T and A using F and G as above.

Definition 1.16. If T : U → V is a linear map, and W is a subspace of U, the restriction, written T|_W : W → V, is the linear map defined by T|_W(w) = Tw for w in W, only allowing vectors from W to map through T.

Review problems

1.22 Prove that if linear maps satisfy PS = T and P is an isomorphism, then S and T have the same kernel, and isomorphic images.

1.23 Prove that if linear maps satisfy SP = T, and P is an isomorphism, then S and T have the same image and isomorphic kernels.

1.24 Prove that dimension is invariant under isomorphism.

1.25 Prove that the space of all p× q matrices is isomorphic to Rpq.

Quotient Spaces

A subspace W of a vector space V doesn’t usually have a natural choice of complementary subspace. For example, if V = R^2, and W is the vertical axis, then we might like to choose the horizontal axis as a complement to W. But this choice is not natural, because we could carry out a linear change of variables, fixing the vertical axis but not the horizontal axis (for example, a shear along the vertical direction). There is a natural choice of vector space which plays the role of a complement, called the quotient space.

Definition 1.17. If V is a vector space and W a subspace of V, and v a vector in V, the translate v + W of W is the set of vectors in V of the form v + w where w is in W.

The translates of the horizontal plane through 0 in R^3 are just the horizontal planes.


1.26 Prove that any subspace W will have

w +W = W,

for any w from W .

Remark 1.18. If we take W the horizontal plane (x3 = 0) in R^3, then the translates

\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + W and \begin{pmatrix} 7 \\ 2 \\ 1 \end{pmatrix} + W

are the same, because we can write

\begin{pmatrix} 7 \\ 2 \\ 1 \end{pmatrix} + W = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 7 \\ 2 \\ 0 \end{pmatrix} + W = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + W.

This is the main idea behind translates: two vectors make the same translate just when their difference lies in the subspace.

Definition 1.19. If x + W and y + W are translates, we add them by (x + W) + (y + W) = (x + y) + W. If s is a number, let s(x + W) = sx + W.

1.27 Prove that addition and scaling of translates is well-defined, independent of the choice of x and y in a given translate.

Definition 1.20. The quotient space V/W of a vector space V by a subspace W is the set of all translates v + W of all vectors v in V.

Take V the plane, V = R^2, and W the vertical axis. The translates of W are the vertical lines in the plane. The quotient space V/W has the various vertical lines as its points. Each vertical line passes through the horizontal axis at a single point, uniquely determining the vertical line. So the translates are the points

\begin{pmatrix} x \\ 0 \end{pmatrix} + W.

The quotient space V/W is just identified with the horizontal axis, by taking

\begin{pmatrix} x \\ 0 \end{pmatrix} + W to x.

Lemma 1.21. The quotient space V/W of a vector space by a subspace is a vector space. The map T : V → V/W given by the rule Tx = x + W is an onto linear map.


Remark 1.22. The concept of quotient space can be circumvented by using some complicated matrices, as can everything in linear algebra, so that one never really needs to use abstract vector spaces. But that approach is far more complicated and confusing, because it involves a choice of basis, and there is usually no natural choice to make. It is always easiest to carry out linear algebra as abstractly as possible, descending into choices of basis at the latest possible stage.

Proof. One has to check that (x + W) + (y + W) = (y + W) + (x + W), but this follows from x + y = y + x clearly. Similarly all of the laws of vector spaces hold. The 0 element of V/W is the translate 0 + W, i.e. W itself. To check that T is linear, consider scaling: T(sx) = sx + W = s(x + W), and addition: T(x + y) = x + y + W = (x + W) + (y + W).

Lemma 1.23. If U and W are subspaces of a vector space V, and V = U ⊕ W is a direct sum of subspaces, then the map T : V → V/W taking vectors v to v + W restricts to an isomorphism T|_U : U → V/W.

Remark 1.24. So, while there is no natural complement to W, every choice of complement is naturally identified with the quotient space.

Proof. The kernel of T|_U is clearly U ∩ W = 0. To see that T|_U is onto, take a vector v + W in V/W. Because V = U ⊕ W, we can somehow write v as a sum v = u + w with u from U and w from W. Therefore v + W = u + W = T|_U u lies in the image of T|_U.

Theorem 1.25. If V is a finite dimensional vector space and W a subspace of V, then

dimV/W = dimV − dimW.

Definition 1.26. If T : U → V is a linear map, and U0 ⊂ U and V0 ⊂ V are subspaces, and T(U0) ⊂ V0, we can define vector spaces U′ = U/U0, V′ = V/V0 and a linear map T′ : U′ → V′ so that T′(u + U0) = (Tu) + V0.

It is easy to check that T ′ is a well defined linear map.

Determinants

Definition 1.27. If T : V → V is a linear map taking a finite dimensional vector space to itself, define det T to be

det T = det A,

where F : R^n → V is an isomorphism, and A is the matrix associated to F^{-1}TF : R^n → R^n.

Remark 1.28. There is no definition of determinant for a linear map of an infinite dimensional vector space, and there is no general theory to handle such things, although there are many important examples.


Remark 1.29. A map T : U → V between different vector spaces doesn’t have a determinant.
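To see Definition 1.27 in action, here is a small sketch (our own example, not from the text): take V the space of polynomials of degree at most 2, Tp(x) = p(x + 1), and F : R^3 → V the isomorphism sending (a, b, c) to a + bx + cx^2. Since T1 = 1, Tx = 1 + x, and Tx^2 = 1 + 2x + x^2, the matrix of F^{-1}TF is upper triangular.

import numpy as np

# Columns are T applied to 1, x, x^2, written in coordinates with respect to 1, x, x^2.
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
print(np.linalg.det(A))   # det T = 1, whichever isomorphism F we choose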

1.28 Prove that the value of the determinant is independent of the choice of isomorphism F.

1.29 Let V be the vector space of polynomials of degree at most 2, and let T : V → V be the linear map Tp(x) = 2p(x − 1) (shifting a polynomial p(x) to 2p(x − 1)). For example, T1 = 2, Tx = 2(x − 1), Tx^2 = 2(x − 1)^2.

a. Prove that T is a linear map.

b. Prove that T is an isomorphism.

c. Find detT .

Theorem 1.30. Suppose that S : V → V and T : V → V are diagonalizable linear maps, i.e. each has a basis of eigenvectors. Then ST = TS just when there is a basis which consists of eigenvectors simultaneously for both S and T.

This is hard to prove for matrices, but easy in the abstract setting of linear maps.

Proof. If there is a basis of simultaneous eigenvectors, then clearly the matrices of S and T are diagonal in that basis, so commute, so ST = TS. Now suppose that ST = TS.

Clearly the result is true if dim V = 1. More generally, clearly the result is true if T = λI for any constant λ, because all vectors in V are then eigenvectors of T.

More generally, for any eigenvalue λ of T, let Vλ be the λ-eigenspace of T. Because T is diagonalizable, V is the direct sum

V = Vλ1 ⊕ Vλ2 ⊕ · · · ⊕ Vλp

of the eigenspaces, summed over the eigenvalues λ1, λ2, . . . , λp of T. We claim that each Vλ is S-invariant, for each eigenvalue λ of T. The proof: pick any vector v ∈ Vλ. We want to prove that Sv ∈ Vλ. Since Vλ is the λ-eigenspace of T, clearly Tv = λv. But then we need to prove that T(Sv) = λ(Sv). This is easy:

TSv = STv = Sλv = λSv.


Review problems

1.30 Let T : V → V be the linear map Tx = 2x. Suppose that V has dimension n. What is det T?

1.31 Let V be the vector space of all 2 × 2 matrices. Let A be a 2 × 2 matrix with two different eigenvalues, λ1 and λ2, and eigenvectors x1 and x2 corresponding to these eigenvalues. Consider the linear map T : V → V given by TB = AB (matrix multiplication of B by A on the left). What are the eigenvalues of T and what are the eigenvectors? (Warning: the eigenvectors are vectors from V, so they are matrices.) What is det T?

1.32 The same but let TB = BA.

1.33 Let V be the vector space of polynomials of degree at most 2, and let T : V → V be defined by Tq(x) = q(−x). What is the characteristic polynomial of T? What are the eigenspaces of T? Is T diagonalizable?

1.34 (Due to Peter Lax [4].) Consider the problem of finding a polynomial p(x) with specified average values on each of a dozen intervals on the x-axis. (Suppose that the intervals don’t overlap.) Does this problem have a solution? Does it have many solutions? (All you need is a naive notion of average value, but you can consult a calculus book, for example [9], for a precise definition.)

(a) For each polynomial p of degree n, let Tp be the vector whose entries are the averages. Suppose that the number of intervals is at least n. Show that Tp = 0 only if p = 0.

(b) Suppose that the number of intervals is no more than n. Show that we can solve Tp = b for any given vector b.

1.35 How many of the “nutshell” criteria for invertibility of a matrix can you translate into criteria for invertibility of a linear map T : U → V? How much more if we assume that U and V are finite dimensional? How much more if we assume as well that U = V?

Complex Vector Spaces

If we change the definition of a vector space, a linear map, etc. to use complex numbers instead of real numbers, we have a complex vector space, complex linear map, etc. All of the examples so far in this chapter work just as well with complex numbers replacing real numbers. We will refer to a real vector space or a complex vector space to distinguish the sorts of numbers we are using to scale the vectors. Some examples of complex vector spaces:

a. Cn

b. The space of p× q matrices with complex entries.


c. The space of complex-valued functions of a real variable.

d. The space of infinite sequences of complex numbers.

Inner Product Spaces

Definition 1.31. An inner product on a real vector space V is a choice of a real number 〈x, y〉 for each pair of vectors x and y so that

a. 〈x, y〉 is a real-valued linear map in x for each fixed y

b. 〈x, y〉 = 〈y, x〉

c. 〈x, x〉 ≥ 0 and equal to 0 just when x = 0.

A real vector space equipped with an inner product is called an inner product space. A linear map between inner product spaces is called orthogonal if it preserves inner products.

Theorem 1.32. Every inner product space of dimension n is carried by some orthogonal isomorphism to R^n with its usual inner product.

Proof. Use the Gram–Schmidt process to construct an orthonormal basis, using the same formulas we have used before, say u1, u2, . . . , un. Define a linear map Fx = x1 u1 + · · · + xn un, for x in R^n. Clearly F is an orthogonal isomorphism.

Take A any symmetric n × n matrix with positive eigenvalues, and let 〈x, y〉_A = 〈Ax, y〉 (with the usual inner product on R^n appearing on the right hand side). Then the expression 〈x, y〉_A is an inner product. Therefore by the theorem, we can find a change of variables taking it to the usual inner product.
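That change of variables can be computed exactly as in the proof above: run Gram–Schmidt in the inner product 〈x, y〉_A. Here is a short Python sketch (our own, with a particular 2 × 2 matrix A chosen for illustration):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])     # symmetric with positive eigenvalues

def ip(x, y):
    # the inner product <x, y>_A = <Ax, y>
    return (A @ x) @ y

def gram_schmidt(vectors):
    # orthonormalize with respect to <., .>_A
    basis = []
    for v in vectors:
        for u in basis:
            v = v - ip(v, u) * u          # remove the component along u
        basis.append(v / np.sqrt(ip(v, v)))
    return basis

u1, u2 = gram_schmidt([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
# F(e1) = u1, F(e2) = u2 is then an orthogonal isomorphism: the Gram matrix is the identity.
print(np.round([[ip(u1, u1), ip(u1, u2)], [ip(u2, u1), ip(u2, u2)]], 6))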

Definition 1.33. A linear map T : V → V from an inner product space to itself is symmetric if 〈Tv, w〉 = 〈v, Tw〉 for any vectors v and w.

Theorem 1.34 (Spectral Theorem). Given a symmetric linear map T on a finite dimensional inner product space V, there is an orthogonal isomorphism F : R^n → V for which F^{-1}TF is the linear map of a diagonal matrix.

Hermitian Inner Product Spaces

Definition 1.35. A Hermitian inner product on a complex vector space V is a choice of a complex number 〈z, w〉 for each pair of vectors z and w from V so that

a. 〈z, w〉 is a complex-valued linear map in z for each fixed w


b. 〈z, w〉 is the complex conjugate of 〈w, z〉

c. 〈z, z〉 ≥ 0 and equal to 0 just when z = 0.

Review problems

1.36 Let V be the vector space of complex-valued polynomials of a complex variable of degree at most 3. Prove that for any four distinct points z0, z1, z2, z3, the expression

〈p(z), q(z)〉 = p(z0)\overline{q(z0)} + p(z1)\overline{q(z1)} + p(z2)\overline{q(z2)} + p(z3)\overline{q(z3)}

is a Hermitian inner product.

1.37 Continuing the previous question, if the points z0, z1, z2, z3 are z0 = 1, z1 = −1, z2 = i, z3 = −i, prove that the map T : V → V given by Tp(z) = p(−z) is unitary.

1.38 Continuing the previous two questions, unitarily diagonalize T .

1.39 State and prove a spectral theorem for normal complex linear maps T : V → V on a Hermitian inner product space, and define the terms adjoint, normal and unitary for complex linear maps V → V.


Chapter 2

Fields

Instead of real or complex numbers, we can dream up wilder notions of numbers.

Definition 2.1. A field is a set F equipped with operations + and · so that

a. Addition laws

1. x + y is in F
2. (x + y) + z = x + (y + z)
3. x + y = y + x

for any x, y and z from F .

b. Zero laws

1. There is an element 0 of F for which x+ 0 = x for any x from F

2. For each x from F there is a y from F so that x+ y = 0.

c. Multiplication laws

1. xy is in F
2. x(yz) = (xy)z
3. xy = yx

for any x, y and z in F .

d. Identity laws

1. There is an element 1 in F for which x1 = 1x = x for any x in F.
2. For each x ≠ 0 there is a y ≠ 0 for which xy = 1. (This y is called the reciprocal or inverse of x.)
3. 1 ≠ 0

e. Distributive law

1. x(y + z) = xy + xz

for any x, y and z in F .

We will not ask the reader to check all of these laws in any of our examples, because there are just too many of them. We will only give some examples; for a proper introduction to fields, see Artin [1].



Of course, the set of real numbers R is a field (with the usual addition and multiplication), as is the set C of complex numbers and the set Q of rational numbers. The set Z of integers is not a field, because the integer 2 has no integer reciprocal.

Let F be the set of all rational functions p(x)/q(x), with p(x) and q(x) polynomials, and q(x) not the 0 polynomial. Clearly for any pair of rational functions, the sum

p1(x)/q1(x) + p2(x)/q2(x) = (p1(x)q2(x) + q1(x)p2(x)) / (q1(x)q2(x))

is also rational, as is the product, and the reciprocal.

2.1 Suppose that F is a field. Prove the uniqueness of 0, i.e. that there is only one element z in F which satisfies x + z = x for every element x.

2.2 Prove the uniqueness of 1.

2.3 Let x be an element of a field F. Prove the uniqueness of the element y for which x + y = 0.

Henceforth, we write this y as −x.

2.4 Let x be an element of a field F. If x ≠ 0, prove the uniqueness of the reciprocal.

Henceforth, we write the reciprocal of x as 1/x, and write x + (−y) as x − y.

Some Finite Fields

Let F be the set of numbers F = {0, 1}. Carry out multiplication by the usual rule, but when you add, x + y won’t mean the usual addition, but instead will mean the usual addition except when x = y = 1, and then we set 1 + 1 = 0. F is a field called the field of Boolean numbers.

2.5 Prove that for Boolean numbers, −x = x and 1/x = x.

Suppose that p is a positive integer. Let F be the set of numbers Fp = {0, 1, 2, . . . , p − 1}. Define addition and multiplication as usual for integers, but if the result is bigger than p − 1, then subtract multiples of p from the result until it lands in Fp, and let that be the definition of addition and multiplication. F2 is the field of Boolean numbers. We usually write x = y (mod p) to mean that x and y differ by a multiple of p.

For example, if p = 7, we find

5 · 6 = 30 (mod 7)
     = 30 − 28 (mod 7)
     = 2 (mod 7).

This is arithmetic in F7.

It turns out that Fp is a field for any prime number p.

2.6 Prove that Fp is not a field if p is not prime.

The only trick in seeing that Fp is a field is to see why there is a reciprocal. It can’t be the usual reciprocal as a number. For example, if p = 7,

6 · 6 = 36 (mod 7)
     = 36 − 35 (mod 7)
     = 1 (mod 7)

(because 35 is a multiple of 7). So 6 has reciprocal 6 in F7.

The Euclidean Algorithm

To compute reciprocals, we first need to find greatest common divisors, using the Euclidean algorithm. The basic idea: given two numbers, for example 12132 and 2304, divide the smaller into the larger, writing a quotient and remainder:

12132 − 5 · 2304 = 612.

Take the two last numbers in the equation (2304 and 612 in this example), and repeat the process on them, and so on:

2304 − 3 · 612 = 468
612 − 1 · 468 = 144
468 − 3 · 144 = 36
144 − 4 · 36 = 0.

Stop when you hit a remainder of 0. The greatest common divisor of the numbers you started with is the last nonzero remainder (36 in our example).

Now that we can find the greatest common divisor, we will need to write the greatest common divisor as an integer linear combination of the original numbers. If we write the two numbers we started with as a and b, then our goal is to compute integers u and v for which


ua + bv = gcd(a, b). To do this, let's go backwards. Start with the second to last equation, giving the greatest common divisor:

36 = 468− 3 · 144

Plug the previous equation into it:

= 468− 3 · (612− 1 · 468)

Simplify:

= −3 · 612 + 4 · 468

Plug in the equation before that:

= −3 · 612 + 4 · (2304 − 3 · 612)
= 4 · 2304 − 15 · 612
= 4 · 2304 − 15 · (12132 − 5 · 2304)
= −15 · 12132 + 79 · 2304.

We have it: gcd(a, b) = ua + vb, in our case 36 = −15 · 12132 + 79 · 2304.

What does this algorithm do? At each step downward, we are facing an equation like a − bq = r, so any number which divides into a and b must divide into r and b (the next a and b) and vice versa. The remainders r get smaller at each step, always smaller than either a or b. On the last line, b divides into a. Therefore b is the greatest common divisor of a and b on the last line, and so is the greatest common divisor of the original numbers. We express each remainder in terms of previous a and b numbers, so we can plug them in, cascading backwards until we express the greatest common divisor in terms of the original a and b. In the example, that gives (−15)(12132) + (79)(2304) = 36.

Let's compute a reciprocal modulo an integer. Let's compute 17^{-1} modulo 1009. Take a = 1009, and b = 17.

1009 − 59 · 17 = 6
17 − 2 · 6 = 5
6 − 1 · 5 = 1
5 − 5 · 1 = 0.

Going backwards

1 = 6 − 1 · 5
  = 6 − 1 · (17 − 2 · 6)
  = −1 · 17 + 3 · 6
  = −1 · 17 + 3 · (1009 − 59 · 17)
  = −178 · 17 + 3 · 1009.


So finally, modulo 1009, −178 · 17 = 1. So 17^{-1} = −178 = 1009 − 178 = 831 (mod 1009).

This is how we can compute reciprocals in Fp: we take a = p, and b the number to reciprocate, and apply the process. If p is prime, the resulting greatest common divisor is 1, and so we get up + vb = 1, and so vb = 1 (mod p), so v is the reciprocal of b.
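The back-substitution above is usually packaged as the extended Euclidean algorithm. Here is a short Python sketch of it (the function names are our own, not from the text):

def extended_gcd(a, b):
    # returns (g, u, v) with g = gcd(a, b) and u*a + v*b == g
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # g == u*b + v*(a % b) == u*b + v*(a - (a // b)*b)
    return g, v, u - (a // b) * v

def reciprocal_mod(b, p):
    # the reciprocal of b in Fp (p prime): the v with v*b = 1 (mod p)
    g, u, v = extended_gcd(p, b)
    assert g == 1
    return v % p

print(extended_gcd(12132, 2304))   # (36, -15, 79), as computed above
print(reciprocal_mod(17, 1009))    # 831, since -178 = 831 (mod 1009)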

2.7 Compute 15^{-1} in F79.

2.8 Solve the linear equation 3x+ 1 = 0 in F5.

2.9 Prove that Fp is a field whenever p is a prime number.

Matrices

Matrices with entries from any field F are added, subtracted, and multiplied by the same rules. We can still carry out forward elimination, back substitution, calculate inverses, determinants, characteristic polynomials, eigenvectors and eigenvalues, using the same steps.

2.10 Let F be the Boolean numbers, and A the matrix

A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},

thought of as having entries from F. Is A invertible? If so, find A^{-1}.

All of the ideas of linear algebra worked out for the real and complex numbers have obvious analogues over any field, except for the concept of inner product, which is much more sophisticated. From now on, we will only state and prove results for real vector spaces, but those results which do not require inner products (or orthogonal or unitary matrices) continue to hold with identical proofs over any field.

2.11 If A is a matrix whose entries are rational functions of a variable t, prove that the rank of A is constant in t, except for finitely many values of t.

Polynomials

Consider the field

F2 = {0, 1}.

Consider the polynomial

p(x) = x^2 + x.


Clearly

p(0) = 0^2 + 0 = 0.

Keeping in mind that 2 = 0 in F2, clearly

p(1) = 1^2 + 1 = 1 + 1 = 0.

Therefore p(x) = 0 for any value of x in F2. So p(x) is zero, as a function. But we will still want to say that p(x) is not zero as a polynomial, because it is x^2 + x, a sum of powers of x with nonzero coefficients. We should think of polynomials as abstract expressions, sums of constants times powers of a variable x, and distinguish them from polynomial functions. Think of x as just a symbol, abstractly, not representing any value. So p(x) is nonzero as a polynomial (because it has nonzero coefficients), but p(x) is zero as a polynomial function.

A rational function is a ratio p(x)/q(x) of polynomials, with q(x) not the zero polynomial. CAREFUL: it isn’t really a function, and should probably be called something like a rational expression. We are stuck with the standard terminology here. We consider two such expressions to be the same after cancellation of any common factor from numerator and denominator. So 1/x is a rational function, in any field, and x/x^2 = 1/x in any field. Define addition, subtraction, multiplication and division of rational functions as you expect. For example,

1/x + 1/(x + 1) = (2x + 1) / (x(x + 1)),

over any field.

CAREFUL: over the field F2, we know that x^2 + x vanishes for every x. So the rational function

f(x) = 1/(x^2 + x)

is actually not defined, no matter what value of x you plug in, because the denominator vanishes. But we still consider it a perfectly well defined rational function, since it is made out of perfectly well defined polynomials.

If x is an abstract variable (think of just a letter, not a value taken from any field), then we write F(x) for the set of all rational functions p(x)/q(x). Clearly F(x) is a field. For example, F2(x) contains 0, 1, x, 1 + x, 1/x, 1/(x + 1), . . . .

Subfields

If K is a field, a subfield F ⊂ K is a subset containing 0 and 1 so that if a and b are in F, then a + b, a − b and ab are in F, and, if b ≠ 0, then a/b is in F. In particular, F is itself a field. For example, Q ⊂ R, R ⊂ C, and Q ⊂ C are subfields. Another example: if F is any field, then F ⊂ F(x) is a subfield.


2.12 Find all of the subfields of F7.

2.13 Find a subfield of C other than R or Q.

Example: Over the field R, the polynomial x^2 + 1 has no roots. A polynomial p(x) with coefficients in a field F splits if it is a product of linear factors. If F ⊂ K is a subfield, we say that a polynomial p(x) splits over K if it splits into a product of linear factors, allowing the factors to have coefficients from K. Example: x^2 + 1 splits over C:

x^2 + 1 = (x − i)(x + i).

If F ⊂ K is a subfield, then K is an F-vector space. For example, C is an R-vector space of dimension 2. The dimension of K as an F-vector space is called the degree of K over F.

Splitting fields

We won’t prove the following theorem:

Theorem 2.2. If F is a field and p(x) is a polynomial over F, then there is a field K containing F as a subfield, over which p(x) splits into linear factors, and so that every element of K is expressible as a rational function of the roots of p(x) with coefficients from F. Moreover, K has finite degree over F.

This field K is uniquely determined up to an isomorphism of fields, and is called the splitting field of p(x) over F.

For example, over F = R the polynomial p(x) = x^2 + 1 has splitting field C:

x^2 + 1 = (x − i)(x + i).

Example of a splitting field

Consider the polynomial p(x) = x^2 + x + 1 over the finite field F2. Let’s look for roots of p(x). Try x = 0:

p(0) = 0^2 + 0 + 1 = 1,

no good. Try x = 1:

p(1) = 1^2 + 1 + 1 = 1 + 1 + 1 = 1,

since 1 + 1 = 0. No good. So p(x) has no roots in F2. We know by theorem 2.2 that there is some splitting field K for p(x), containing F2, so that p(x) splits into linear factors over K, say

p(x) = (x− α) (x− β) ,

for some α, β ∈ K.


What can we say about this mysterious field K? We know that it contains F2, contains α, contains β, and that everything in it is made up of rational functions over F2 with α and β plugged in for the variables. We also know that K has finite dimension over F2. Otherwise, K is a total mystery: we don’t know its dimension, or a basis of it over F, or its multiplication or addition rules, or anything. We know that in F2, 1 + 1 = 0. Since F2 ⊂ K is a subfield, this holds in K as well. So in K, for any c ∈ K,

c(1 + 1) = c0 = 0.

Therefore c + c = 0 in K, for any element c ∈ K. Roughly speaking, the arithmetic rules in F2 impose themselves in K as well.

A clever trick, which you probably wouldn’t notice at first: it turns out that β = α + 1. Why? Clearly by definition, α is a root of p(x), i.e.

α^2 + α + 1 = 0.

So then let’s try α+ 1 and check that it is also a root.

(α + 1)^2 + (α + 1) + 1 = α^2 + 2α + 1 + α + 1 + 1,

but numbers in K cancel in pairs, c+ c = 0, so

= α^2 + α + 1 = 0,

since α is a root of p(x). Therefore elements of K can be written in terms of α purely.

The next fact about K: clearly

{0, 1, α, α+ 1} ⊂ K.

We want to prove that

{0, 1, α, α + 1} = K.

How? First, let's try to make up an addition table for these 4 elements:

+      | 0      1      α      α+1
-------+---------------------------
0      | 0      1      α      α+1
1      | 1      0      α+1    α
α      | α      α+1    0      1
α+1    | α+1    α      1      0

To make up a multiplication table, we need to note that

0 = α^2 + α + 1,

so that

α^2 = −α − 1 = α + 1,


and

α(α + 1) = α^2 + α = α + 1 + α = 1.

Therefore

(α + 1)(α + 1) = α^2 + 2α + 1 = α.

This gives the complete multiplication table:

·      | 0      1      α      α+1
-------+---------------------------
0      | 0      0      0      0
1      | 0      1      α      α+1
α      | 0      α      α+1    1
α+1    | 0      α+1    1      α

Looking for reciprocals, we find that

1/0 does not exist,
1/1 = 1,
1/α = α + 1,
1/(α + 1) = α.

So {0, 1, α, α + 1} is a field, containing F2, and p(x) splits over this field, and the field is generated by F2 and α, so this field must be the splitting field of p(x):

{0, 1, α, α+ 1} = K.

So K is the finite field with 4 elements, K = F4.
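As a quick check on these tables, here is a small Python sketch (our own, not from the text): store an element c0 + c1·α of K as the pair (c0, c1), with coefficients mod 2 and α^2 = α + 1.

def add(a, b):
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

def mul(a, b):
    # (a0 + a1 α)(b0 + b1 α) = a0 b0 + (a0 b1 + a1 b0) α + a1 b1 α^2, and α^2 = α + 1
    c0 = a[0] * b[0] + a[1] * b[1]
    c1 = a[0] * b[1] + a[1] * b[0] + a[1] * b[1]
    return (c0 % 2, c1 % 2)

names = {(0, 0): "0", (1, 0): "1", (0, 1): "a", (1, 1): "a+1"}   # a stands for α
print("addition:")
for x in names:
    print([names[add(x, y)] for y in names])
print("multiplication:")
for x in names:
    print([names[mul(x, y)] for y in names])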

2.14 Consider the polynomial

p(x) = x^3 + x^2 + 1

over the field F2. Suppose that the splitting field K of p(x) contains a root α of p(x). Prove that α^2 and 1 + α + α^2 are the two other roots. Compute the addition table and the multiplication table of the 8 elements

0, 1, α, 1 + α, α^2, 1 + α^2, α + α^2, 1 + α + α^2.

Use this to prove that

K = {0, 1, α, 1 + α, α^2, 1 + α^2, α + α^2, 1 + α + α^2},

so K is the finite field F8.


Construction of splitting fields

Suppose that F is a field and p(x) is a polynomial over F. We say that p(x) is irreducible if p(x) does not split into a product of lower degree factors over F. Basic fact: if p(x) is irreducible, and p(x) divides a product q(x)r(x), then p(x) must divide one of the factors q(x) or r(x).

Suppose that p(x) is a nonconstant irreducible polynomial. (Think for example of x^2 + 1 over F = R, to have some concrete example in mind.) We have no roots of p(x) in F, so can we construct a splitting field explicitly?

Let V be the vector space of all polynomials over F in a variable x. Let W ⊂ V be the subspace consisting of all polynomials divisible by p(x). Clearly if p(x) divides two polynomials, then it divides their sum, and their scalings, so W ⊂ V is a subspace. Let K = V/W. So K is a vector space.

Every element of K is a translate, so has the form

q(x) +W,

for some polynomial q(x). Any two translates, say q(x) + W and r(x) + W, are equal just when q(x) − r(x) ∈ W, as in our general theory of quotient spaces. So this happens just when q(x) − r(x) is divisible by p(x). In other words, if you write down a translate q(x) + W ∈ K and I write down a translate r(x) + W ∈ K, then these will be the same translate just exactly when

r(x) = q(x) + p(x)s(x),

for some polynomial s(x) over F.

So far K is only a vector space. Let’s make K into a field. We know how to add elements of K, since K is a vector space. How do we multiply elements? Take two elements, say

q(x) +W, r(x) +W,

and try to define their product to be

q(x)r(x) +W.

Is this well defined? If I write the same translates down differently, I could write them as

q(x) +Q(x)p(x) +W, r(x) +R(x)p(x) +W,

and my product would turn out to be

(q(x) + Q(x)p(x))(r(x) + R(x)p(x)) + W
= q(x)r(x) + p(x)(q(x)R(x) + Q(x)r(x) + Q(x)R(x)p(x)) + W
= q(x)r(x) + W,

the same translate, since your result and mine agree up to multiples of p(x), so represent the same translate. So now we can multiply elements of K.


The next claim is that K is a finite dimensional vector space. This is not obvious, since K = V/W and both V and W are infinite dimensional vector spaces. Take any element of K, say q(x) + W, and divide q(x) by p(x), say

q(x) = p(x)Q(x) + r(x),

a quotient Q(x) and remainder r(x). Clearly q(x) differs from r(x) by a multiple of p(x), so

q(x) +W = r(x) +W.

Therefore every element of K can be written as a translate

r(x) +W

for r(x) a polynomial of degree less than the degree of p(x). Clearly r(x) is unique, since we can’t quotient out anything of lower degree than p(x). Therefore K is identified as a vector space with the vector space of polynomials in x of degree less than the degree of p(x).

The notation is much nicer if we write x + W as, say, α. Then clearly x^2 + W = α^2, etc., so q(x) + W = q(α) for any polynomial q(x) over F. So we can say that α ∈ K is an element so that p(α) = 0, so p(x) has a root over K. Moreover, every element of K is a polynomial over F in the element α.

We need to check that K is a field. The hard bit is checking that every element of K has a reciprocal. Pick any element q(α) ∈ K. We want to prove that q(α) has a reciprocal, i.e. that there is some element r(α) so that q(α)r(α) = 1. Fix q(α) and consider the F-linear map

T : K → K,

given by

T(r(α)) = q(α)r(α).

If q(α) ≠ 0, then T(r(α)) = 0 just when q(α)r(α) = 0, i.e. just when q(x)r(x) is divisible by p(x). But since p(x) is irreducible, we know that p(x) divides q(x)r(x) just when p(x) divides one of q(x) or r(x). But then q(x) + W = 0 or r(x) + W = 0, i.e. q(α) = 0 or r(α) = 0. We know by hypothesis that q(α) ≠ 0, so r(α) = 0. In other words, the kernel of T is {0}. Therefore T is invertible. So T is an isomorphism of F-vector spaces. In particular, T is onto. So there must exist some r(α) ∈ K so that

T (r(α)) = 1,

i.e.

q(α)r(α) = 1,

so q(α) has a reciprocal, r(α) = 1/q(α).

The remaining axioms of fields are easy to check, so K is a field, containing F, and containing a root α for p(x). Every element of K is a polynomial in α.


The dimension of K over F is finite. We only need to check that p(x) splits over K into a product of linear factors. Clearly we can split off one linear factor: x − α, since α is a root of p(x) over K. Inductively, if p(x) doesn’t completely split into a product of linear factors, we can try to factor out as many linear factors as possible, and then repeat the whole process for any remaining nonlinear factors.

If you have to calculate in K, how do you do it? The elements of K look like q(α), where α is just some formal symbol, and you add and multiply as usual with polynomials. But we can always assume that q(α) is a polynomial in α of degree less than the degree of p(x), and then subtract off any p(x)-multiples when we multiply or add, since p(α) = 0. For example, if p(x) = x^2 + 1, and F = R, then the field K consists of expressions like q(α) = b + cα, where b and c are any real numbers. When we multiply, we just replace α^2 + 1 by 0, i.e. replace α^2 by −1. So we just get K being the usual complex numbers.
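The reciprocal argument above can also be carried out numerically. Here is a sketch (our own, not from the text) in K = R[x]/(x^2 + 1), storing b + cα as the pair (b, c) in the basis 1, α and solving T(r) = 1 for r = 1/q(α):

import numpy as np

def mult_matrix(q):
    # matrix of the linear map T(r) = q r on K, in the basis 1, α, using α^2 = −1
    b, c = q
    # q · 1 = b + c α and q · α = −c + b α give the two columns
    return np.array([[b, -c],
                     [c,  b]])

def reciprocal(q):
    # solve T(r) = 1, i.e. find 1/q(α)
    return np.linalg.solve(mult_matrix(q), np.array([1.0, 0.0]))

print(reciprocal((1.0, 1.0)))   # 1/(1 + α) = 0.5 − 0.5 α, matching 1/(1 + i)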

Transcendental numbers

Some boring examples: a number a ∈ C is algebraic if it is the solution of a polynomial equation p(a) = 0 where p(x) is a nonzero polynomial with rational coefficients. A number which is not algebraic is called transcendental. If x is an abstract variable, and F is a field, let

F (x)

be the set of all rational functions p(x)/q(x) in the variable x with p(x) and q(x) polynomials with coefficients from F.

Theorem 2.3. Take an abstract variable x. A number a ∈ C is transcendental if and only if the field

Q(a) = { p(a)/q(a) : p(x)/q(x) ∈ Q(x) and q(a) ≠ 0 }

is isomorphic to Q(x).

Proof. If a is transcendental, then the map

φ : p(x)/q(x) ∈ Q(x) ↦ p(a)/q(a) ∈ Q(a)

is clearly well defined, onto, and preserves all arithmetic operations. Is φ 1-1? Equivalently, does φ have trivial kernel? Suppose that p(x)/q(x) lies in the kernel of φ. Then p(a) = 0. Therefore p(x) = 0, since a is transcendental. Therefore p(x)/q(x) = 0. So the kernel is trivial, and so φ is a bijection preserving all arithmetic operations, so φ is an isomorphism of fields.

On the other hand, take any complex number a and suppose that there is some isomorphism of fields

ψ : Q(x)→ Q(a).


Let b = ψ(x). Because ψ is a field isomorphism, all arithmetic operations carried out on x must then be matched up with arithmetic operations carried out on b, so

ψ(p(x)/q(x)) = p(b)/q(b).

Because ψ is an isomorphism, some element must map to a, say

ψ(p0(x)/q0(x)) = a.

So

p0(b)/q0(b) = a.

So Q(b) = Q(a). Any algebraic relation on a clearly gives one on b and vice versa. Therefore a is algebraic if and only if b is. Suppose that b is algebraic. Then q(b) = 0 for some nonzero polynomial q(x), and then ψ is not defined on 1/q(x), a contradiction.


Chapter 3

Direct Sums of Subspaces

Subspaces have a kind of arithmetic.

Definition 3.1. The intersection of two subspaces is the collection of vectors which belong to both of the subspaces. We will write the intersection of subspaces U and V as U ∩ V.

The subspace U of R^3 given by the vectors of the form

\begin{pmatrix} x1 \\ x2 \\ x1 − x2 \end{pmatrix}

intersects the subspace V consisting in the vectors of the form

\begin{pmatrix} x1 \\ 0 \\ x3 \end{pmatrix}

in the subspace written U ∩ V, which consists in the vectors of the form

\begin{pmatrix} x1 \\ 0 \\ x1 \end{pmatrix}.

Definition 3.2. If U and V are subspaces of a vector space W, write U + V for the set of vectors w of the form w = u + v for some u in U and v in V; call U + V the sum.

3.1 Prove that U + V is a subspace of W .

Definition 3.3. If U and V are two subspaces of a vector space W, we will write U + V as U ⊕ V (and say that U ⊕ V is a direct sum) to mean that every vector x of U + V can be written uniquely as a sum x = y + z with y in U and z in V. We will also say that U and V are complementary, or complements of one another.



R3 = U ⊕ V for U the subspace consisting of the vectors

x = \begin{pmatrix} x1 \\ x2 \\ 0 \end{pmatrix}

and V the subspace consisting of the vectors

x = \begin{pmatrix} 0 \\ 0 \\ x3 \end{pmatrix},

since we can write any vector x uniquely as

x = \begin{pmatrix} x1 \\ x2 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ x3 \end{pmatrix}.

3.2 Give an example of two subspaces of R3 which are not complementary.

Theorem 3.4. U + V is a direct sum U ⊕ V just when U ∩ V consists of just the 0 vector.

Proof. If U + V is a direct sum, then we need to see that U ∩ V only contains the zero vector. If it contains some vector x, then we can write x uniquely as a sum x = y + z, but we can also write x = (1/2)x + (1/2)x or as x = (1/3)x + (2/3)x, as a sum of vectors from U and V. Therefore x = 0.

On the other hand, if there is more than one way to write x = y + z = Y + Z for some vectors y and Y from U and z and Z from V, then 0 = (y − Y) + (z − Z), so Y − y = z − Z, a nonzero vector from U ∩ V.

Lemma 3.5. If U ⊕ V is a direct sum of subspaces of a vector space W, then the dimension of U ⊕ V is the sum of the dimensions of U and V. Moreover, putting together any basis of U with any basis of V gives a basis of U ⊕ V.

Proof. Pick a basis for U, say u1, u2, . . . , up, and a basis for V, say v1, v2, . . . , vq. Then consider the set of vectors given by throwing all of the u's and v's together. The u's and v's are linearly independent of one another, because any linear relation

0 = a1u1 + a2u2 + · · ·+ apup + b1v1 + b2v2 + · · ·+ bqvq

would allow us to write

a1u1 + a2u2 + · · ·+ apup = − (b1v1 + b2v2 + · · ·+ bqvq) ,

so that a vector from U (the left hand side) belongs to V (the right hand side), which is impossible unless that vector is zero, because U and V intersect only


at 0. But that forces 0 = a1u1 + a2u2 + · · · + apup. Since the u's are a basis, this forces all a's to be zero. The same for the b's, so it isn't a linear relation. Therefore the u's and v's put together give a basis for U ⊕ V.

We can easily extend these ideas to direct sums with many summands U1 ⊕ U2 ⊕ · · · ⊕ Uk.

3.3 Prove that if U ⊕ V = W, then any linear maps S : U → Rp and T : V → Rp determine a unique linear map Q : W → Rp, written Q = S ⊕ T, so that Q|U = S and Q|V = T.

Application: Simultaneously Diagonalizing Several Matrices

Theorem 3.6. Suppose that T1, T2, . . . , TN are linear maps taking a vector space V to itself, each diagonalizable, and each commuting with the others (which means T1T2 = T2T1, etc.). Then there is a single invertible linear map F : Rn → V diagonalizing all of them.

Proof. Since T1 and T2 commute, if x is an eigenvector of T1 with eigenvalue λ, then T2x is too:

T1 (T2x) = T2T1x

= T2λx

= λ (T2x) .

So each eigenspace of T1 is invariant under T2. The same is true for any two of the linear maps T1, T2, . . . , TN. Because T1 is diagonalizable, V is a direct sum of the eigenspaces of T1. So it suffices to find a basis for each eigenspace of T1 which diagonalizes all of the linear maps; in other words, it suffices to prove this on each eigenspace separately. So let's restrict to an eigenspace of T1, where T1 = λ1, so that T1 = λ1 is diagonal in any basis. By the same reasoning applied to T2, etc., we can work on a common eigenspace of all of the Tj, arranging that T2 = λ2, etc., diagonal in any basis.
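A minimal numerical sketch of this argument, assuming NumPy (the matrices T1, T2 below are illustrative choices built from a common hidden eigenbasis, not part of the text): diagonalize T1, then diagonalize the restriction of T2 to each eigenspace of T1.

import numpy as np

Q = np.array([[1., 1., 0.], [0., 1., 1.], [1., 0., 1.]])
T1 = Q @ np.diag([2., 2., 5.]) @ np.linalg.inv(Q)
T2 = Q @ np.diag([7., 3., 3.]) @ np.linalg.inv(Q)   # commutes with T1

eigvals, eigvecs = np.linalg.eig(T1)
columns, done = [], np.zeros(len(eigvals), dtype=bool)
for i in range(len(eigvals)):
    if done[i]:
        continue
    mask = np.abs(eigvals - eigvals[i]) < 1e-8   # group a repeated eigenvalue
    done |= mask
    E = eigvecs[:, mask]                         # basis of this eigenspace of T1
    # restriction of T2 to the eigenspace, written in the basis E
    R = np.linalg.lstsq(E, T2 @ E, rcond=None)[0]
    _, V = np.linalg.eig(R)
    columns.append(E @ V)                        # common eigenvectors
F = np.hstack(columns)

print(np.round(np.linalg.inv(F) @ T1 @ F, 6))    # both outputs are diagonal
print(np.round(np.linalg.inv(F) @ T2 @ F, 6))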

Transversality

Lemma 3.7. If V is a finite dimensional vector space, containing two subspaces U and W, then

dimU + dimW = dim(U +W ) + dim (U ∩W ) .

Proof. Take any basis for U ∩ W. Then while you pick some more vectors from U to extend it to a basis of U, I will simultaneously pick some more vectors from W to extend it to a basis of W. Clearly we can throw our vectors together to get a basis of U + W. Count them up.

This lemma makes certain inequalities on dimensions obvious.


Lemma 3.8. If U and W are subspaces of an n dimensional vector space V, say of dimensions p and q, then

max {0, p + q − n} ≤ dim (U ∩ W) ≤ min {p, q},
max {p, q} ≤ dim (U + W) ≤ min {n, p + q}.

Proof. All inequalities but the first are obvious. The first follows from the last by applying lemma 3.7 on the previous page.

3.4 How few dimensions can the intersection of subspaces of dimensions 5 and 3 in R7 have? How many?

Definition 3.9. Two subspaces U and W of a finite dimensional vector space V are transverse if U + W = V.

3.5 How few dimensions can the intersection of transverse subspaces of dimensions 5 and 3 in R7 have? How many?

3.6 Must subspaces in direct sums be transverse?

Computations

In Rn, all abstract concepts of linear algebra become calculations.

3.7 Suppose that U and W are subspaces of Rn. Take a basis for U, and put it into the columns of a matrix, and call that matrix A. Take a basis for W, and put it into the columns of a matrix, and call that matrix B. How do you find a basis for U + W? How do you see if U + W is a direct sum?

Proposition 3.10. Suppose that U and W are subspaces of Rn and that A and B are matrices whose columns give bases for U and W respectively. Apply Gaussian elimination to find a basis for the kernel of (A B), say

\begin{pmatrix} x1 \\ y1 \end{pmatrix}, \begin{pmatrix} x2 \\ y2 \end{pmatrix}, . . . , \begin{pmatrix} xs \\ ys \end{pmatrix}.

Then the vectors Ax1, Ax2, . . . , Axs form a basis for the intersection of U and W.

Proof. For example, Ax1 + By1 = 0, so Ax1 = −By1 = B(−y1) lies in the image of A and of B. Therefore the vectors Ax1, Ax2, . . . , Axs lie in U ∩ W. Suppose that some vector v also lies in U ∩ W. Then v = Ax = B(−y) for some vectors x and y. But then Ax + By = 0, so

\begin{pmatrix} x \\ y \end{pmatrix}

is a linear combination of the vectors

\begin{pmatrix} x1 \\ y1 \end{pmatrix}, \begin{pmatrix} x2 \\ y2 \end{pmatrix}, . . . , \begin{pmatrix} xs \\ ys \end{pmatrix},


so x = ∑ aj xj for some numbers aj, and therefore v = Ax = ∑ aj Axj. Therefore these Axj span the intersection. Suppose they suffer some linear relation: 0 = ∑ cj Axj. So 0 = A ∑ cj xj. But the columns of A are linearly independent, so A is 1-1. Therefore 0 = ∑ cj xj. At the same time,

0 = ∑ cj Axj = ∑ cj B(−yj) = B(−∑ cj yj).

But B is also 1-1, so 0 = ∑ cj yj. So

0 = ∑ cj \begin{pmatrix} xj \\ yj \end{pmatrix}.

But these vectors are linearly independent by construction.
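The following is a minimal sketch of proposition 3.10 in code, assuming SymPy; the subspaces U and W are the illustrative ones from the beginning of this chapter, not a prescribed example from the text.

from sympy import Matrix

A = Matrix([[1, 0], [0, 1], [1, -1]])   # columns span U: vectors (x1, x2, x1 - x2)
B = Matrix([[1, 0], [0, 0], [0, 1]])    # columns span W: vectors (x1, 0, x3)

AB = A.row_join(B)                      # the matrix (A B)
kernel = AB.nullspace()                 # basis of the kernel of (A B)

p = A.shape[1]
basis_of_intersection = [A * v[:p, :] for v in kernel]
print(basis_of_intersection)            # one vector, (1, 0, 1): a basis of U ∩ W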


Jordan Normal Form


Chapter 4

Jordan Normal Form

We can't quite diagonalize every linear map, but we will see how close we can come: the Jordan normal form.

Jordan Normal Form and Strings

Diagonalizing is powerful. By diagonalizing a linear map, we can see what it does completely, and we can easily carry out immense computations; for example, finding large powers of a linear map. The trouble is that some linear maps can't be diagonalized. How close can we come?

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

is the simplest possible example. Its only eigenvalue is λ = 0. As a real or complex matrix, it has only one eigenvector,

\begin{pmatrix} 1 \\ 0 \end{pmatrix},

up to rescaling. Not enough eigenvectors to form a basis of R2, so not enough to diagonalize. We will build this entire chapter from this simple example.

4.1 What does the map taking x to Ax look like for this matrix A?

Let's write

∆1 = (0), ∆2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, ∆3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},

and in general write ∆n or just ∆ for the square matrix with 1's just above the diagonal, and 0's everywhere else.

4.2 Prove that ∆ej = ej−1, except for ∆e1 = 0. So we can think of ∆ as shifting the standard basis, like the proverbial lemmings stepping forward until e1 falls off the cliff:


en ↦ en−1 ↦ · · · ↦ e2 ↦ e1 ↦ 0, each arrow given by applying ∆.

A matrix of the form λ + ∆ is called a Jordan block. Our goal in this chapter is to prove:

Theorem 4.1. Suppose that T : V → V is a linear map on a finite dimensional vector space V over the field K = R or C, and that all of the eigenvalues of T lie in the field K. Then there is a basis in which the matrix of T is in Jordan normal form:

\begin{pmatrix} λ1 I + ∆ & & & \\ & λ2 I + ∆ & & \\ & & \ddots & \\ & & & λN I + ∆ \end{pmatrix},

broken into Jordan blocks.

A λ-string for a linear map T : V → V with eigenvalue λ is a collection of nonzero vectors of the form

v ↦ (T − λI)v ↦ · · · ↦ (T − λI)^{k−1} v ↦ (T − λI)^k v ↦ 0,

each arrow given by applying T − λI.

4.3 For T = λI +∆, show that e1, e2, . . . , en is a λ-string.

4.4 Find strings of

T = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 3 \end{pmatrix}.

Now we prove the theorem.

Proof. It is enough to give a basis of strings, since the isomorphism taking that basis to the standard basis will identify T with the matrix of Jordan blocks: T will act on its strings just like each of those Jordan blocks acts on its strings.

If dim V = 1, any linear map T : V → V is a scaling, so the result is obvious. Suppose that dim V = n > 1. Pick an eigenvalue λ of T, and a λ-eigenvector v of T. Let V0 be the span of v, and V1 = V/V0. Since T takes V0 to V0, we can define a quotient map T1 : V1 → V1, so

T1 (w + V0) = Tw + V0.


By induction, T1 has a basis of strings. Pick one such string for an eigenvalue µ ≠ λ, say

u ↦ (T1 − µI)u ↦ · · · ↦ (T1 − µI)^ℓ u ↦ 0,

each arrow given by applying T1 − µI.

Since u ∈ V1, we can write u as a translate u = w + V0. Then

(T1 − µI)^{ℓ+1} u = 0

tells us that

(T − µI)^{ℓ+1} w = cv,

for some number c ∈ K, since 0 ∈ V1 corresponds precisely to elements spanned by v back up in V.

This w vector might not give us a string. Let's patch it by trying to add something on to it: try w + bv for some number b ∈ K, and check that

(T − µI)^{ℓ+1}(w + bv) = (c + b(λ − µ)^{ℓ+1}) v.

We can let

b = −c / (λ − µ)^{ℓ+1},

and find that w + bv lies in a string in V which maps to the original string down in V1.

Next, what if µ = λ? We take a λ-string in V1, say

u ↦ (T1 − λI)u ↦ · · · ↦ (T1 − λI)^ℓ u ↦ 0,

each arrow given by applying T1 − λI.

Again we write u as w + V0, some w ∈ V . Clearly

(T1 − λI)^{ℓ+1} u = 0

implies that

(T − λI)^{ℓ+1} w = cv

for some c ∈ K. If c = 0, then w starts a string, and we use that string. If c ≠ 0, then

(T − λI)^{ℓ+2} w = 0

and again w starts a string, but one step longer and containing cv. Rescale to arrange c = 1, so v belongs to this string.

If we end up with two different strings both containing v, the difference of the two will end up at 0 instead of v, so a shorter string. Therefore we can


arrange that at most one string contains v. If v doesn't already show up in any one of our strings, then we can also add a string consisting of just v.

We have lifted every string in V1 to a string in V. The vectors in our strings map to a basis of V1, so their span has dimension at least n − 1. But v is in our strings too, so we have all n dimensions of V spanned by our strings. Our strings have exactly n vectors in them, so form a basis.

Generalized Eigenspaces

If λ is an eigenvalue of a linear map T : V → V, a vector v ∈ V is a generalized eigenvector of T with eigenvalue λ if (T − λI)^k v = 0, for some integer k > 0. If k = 1 then v is an eigenvector in the usual sense.

A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

satisfies A^2 = 0, so every vector x in K2 is a generalized eigenvector of A, with eigenvalue 0. In the generalized sense, we have lots of eigenvectors.

4.5 Prove that every vector in Kn is a generalized eigenvector of ∆ with eigenvalue 0. Then prove that for any number λ, every vector in Kn is a generalized eigenvector of λ + ∆ with eigenvalue λ.

For example, each vector in a λ-string is a generalized eigenvector with eigenvalue λ.

Suppose that T : V → V is a linear map, pick some λ ∈ K and let

Vλ(T ) ⊂ V

be the set of all generalized eigenvectors of T with eigenvalue λ. Call Vλ(T) the generalized λ-eigenspace of T.

Lemma 4.2. The generalized eigenspaces of a linear map are subspaces.

Proof. If v ∈ Vλ(T) and c ∈ K, we need to prove that cv ∈ Vλ(T). By definition, v is a generalized eigenvector with eigenvalue λ, so

(T − λI)^k v = 0

for some positive integer k. But then

(T − λI)^k (cv) = c (T − λI)^k v = 0.

Similarly, if v1, v2 ∈ Vλ(T ), say

(T − λI)^{k1} v1 = 0


and (T − λI)^{k2} v2 = 0

(the powers k1 and k2 could be different) then clearly

(T − λI)^{max(k1,k2)} (v1 + v2) = 0.

4.6 Prove that every nonzero generalized eigenvector v belongs to a string, and all of the vectors in the string are linearly independent.

4.7 Prove that nonzero generalized eigenvectors of a square matrix, with different eigenvalues, are linearly independent. Prove that the sum of all of the generalized eigenspaces is a direct sum.

Corollary 4.3. If T : V → V is a linear map on a finite dimensional vector space V over K = R or C, and all eigenvalues of T belong to K, then V is a direct sum

V = ⊕_λ Vλ(T)

of the generalized eigenspaces of T .

Cracking of Jordan Blocks

Jordan normal form is very sensitive to small changes in matrix entries. For this reason, we cannot compute Jordan normal form unless we know the matrix entries precisely, to infinitely many decimals.

4.8 If T : V → V is a linear map on an n-dimensional vector space and has n different eigenvalues, show that T is diagonalizable.

The matrix

∆2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}

is not diagonalizable, but the nearby matrix

\begin{pmatrix} ε1 & 1 \\ 0 & ε2 \end{pmatrix}

is, as long as ε1 ≠ ε2, since these ε1 and ε2 are the eigenvalues of this matrix. It doesn't matter how small ε1 and ε2 are. The same idea clearly works for ∆ of any size, and for λ + ∆ of any size, and so for any matrix in Jordan normal form.


Theorem 4.4. Every linear map T : V → V on a finite dimensional complex vector space can be approximated as closely as we like by a diagonalizable linear map.

Remark 4.5. By “approximated”, we mean that we can make a new linear map whose matrix entries, in any basis we like, are all as close as we like to the entries of the matrix of the original linear map.

Proof. Put your linear map into Jordan normal form, say F^{-1}TF, and then use the trick from the last example, to make diagonalizable matrices B close to F^{-1}TF. Then FBF^{-1} is diagonalizable too, and is close to T.

Remark 4.6. Using the same proof, we can also approximate any real linear map arbitrarily closely by diagonalizable real linear maps (i.e. diagonalized by real change of basis matrices), just when all of its eigenvalues are real.

Uniqueness of Jordan Normal Form

Proposition 4.7. A linear map T : V → V on a finite dimensional vector space has at most one Jordan normal form, up to reordering the Jordan blocks.

Proof. Clearly the eigenvalues are independent of change of basis. The problem is to figure out how to measure, for each eigenvalue, the number of blocks of each size. Fix an eigenvalue λ. Let

dm(λ, T) = dim ker (T − λI)^m.

(We will only use this notation in this proof.) Clearly dm(λ, T) is independent of choice of basis. For example, d1(λ, T) is the number of blocks with eigenvalue λ, while d2(λ, T) counts vectors at or next to the end of strings with eigenvalue λ. All 1 × 1 blocks contribute to both d1(λ, T) and d2(λ, T). The difference d2(λ, T) − d1(λ, T) is the number of blocks of size at least 2 × 2. Similarly the number d3(λ, T) − d2(λ, T) measures the number of blocks of size at least 3 × 3, etc. But then the difference

(d2(λ, T) − d1(λ, T)) − (d3(λ, T) − d2(λ, T)) = 2 d2(λ, T) − d1(λ, T) − d3(λ, T)

is the number of blocks of size at least 2 × 2, but not 3 × 3 or more, i.e. exactly 2 × 2.

The number of m × m blocks is the difference between the number of blocks at least m × m and the number of blocks at least (m + 1) × (m + 1), so

number of m × m blocks = (dm(λ, T) − dm−1(λ, T)) − (dm+1(λ, T) − dm(λ, T))
                       = 2 dm(λ, T) − dm−1(λ, T) − dm+1(λ, T).

(We can just define d0(λ, T) = 0 to allow this equation to hold even for m = 1.) Therefore the number of blocks of any size is independent of the choice of basis.
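A minimal numerical sketch of this block-counting formula, assuming NumPy and SciPy (the test matrix is an illustrative choice, already in Jordan form with eigenvalue 0):

import numpy as np
from scipy.linalg import block_diag

def jordan_block(lam, n):
    return lam * np.eye(n) + np.diag(np.ones(n - 1), 1)

# one 1x1, one 2x2 and one 3x3 block, all with eigenvalue 0
T = block_diag(jordan_block(0, 1), jordan_block(0, 2), jordan_block(0, 3))
n = T.shape[0]

def d(m, lam=0.0):
    # d_m(lam, T) = dim ker (T - lam I)^m = n - rank((T - lam I)^m)
    M = np.linalg.matrix_power(T - lam * np.eye(n), m)
    return n - np.linalg.matrix_rank(M)

for m in range(1, 4):
    print(f"number of {m}x{m} blocks:", 2 * d(m) - d(m - 1) - d(m + 1))  # 1, 1, 1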


Working over arbitrary fields

Over F2, the matrices

\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

are in Jordan normal form. In fact, they are the only 2 × 2 matrices that are in Jordan normal form, because we only have 0's and 1's to put into the entries.

Over F2, consider the matrix

A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.

Consider the problem of finding the Jordan normal form of A. First, the characteristic polynomial of A is

p(x) = det(xI − A) = det \begin{pmatrix} x & −1 \\ −1 & x − 1 \end{pmatrix},

but −1 = 1 in F2, since 1 + 1 = 0, so

p(x) = det \begin{pmatrix} x & 1 \\ 1 & x + 1 \end{pmatrix} = x(x + 1) − 1 = x^2 + x + 1.

Let’s look for roots of p(x), i.e. eigenvalues. Try x = 0:

p(0) = 0^2 + 0 + 1 = 1,

no good. Try x = 1:

p(1) = 1^2 + 1 + 1 = 1 + 1 + 1 = 1,

since 1 + 1 = 0. No good. So p(x) has no roots in F2, i.e. A has no eigenvalues in F2. We immediately recognize, from section 2 on page 27, that the splitting field of p(x) is

F4 = {0, 1, α, α+ 1} ,

and that p(α) = p(α + 1) = 0. So over F4, A has eigenvalues α and α + 1, which are distinct. Each distinct eigenvalue contributes at least one linearly independent eigenvector, so A is diagonalizable and has Jordan normal form

\begin{pmatrix} α & 0 \\ 0 & α + 1 \end{pmatrix}.


To find the basis of eigenvectors, proceed as usual. Try λ = α, and find that eigenvectors

\begin{pmatrix} x \\ y \end{pmatrix}

must satisfy

A \begin{pmatrix} x \\ y \end{pmatrix} = α \begin{pmatrix} x \\ y \end{pmatrix},

which forces precisely

y = αx and x + y = αy.

The first equation tells us that if x = 0 then y = 0, and we want a nonzero eigenvector. So we can assume that x ≠ 0, rescale to arrange x = 1, and then y = α, so our eigenvector is

\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ α \end{pmatrix}.

In the same way, for λ = α + 1, we find the eigenvector

\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ α + 1 \end{pmatrix}.

So the matrix

F = \begin{pmatrix} 1 & 1 \\ α & α + 1 \end{pmatrix}

diagonalizes A over F4:

F^{-1} A F = \begin{pmatrix} α & 0 \\ 0 & α + 1 \end{pmatrix},

as you can easily check.

Review problems

Compute the matrix F for which F^{-1}AF is in Jordan normal form, and the Jordan normal form itself, for

4.9
A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}

4.10
A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}

4.11 Thinking about the fact that ∆ has string en, en−1, . . . , e1, what is the Jordan normal form of A = (∆n)^2? (Don't try to find the matrix F bringing A to that form.)


4.12 Find a matrix F so that F−1AF is in Jordan normal form, where

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ −1 & 0 & −2 & 0 \end{pmatrix}

4.13 Find a matrix F so that F−1AF is in Jordan normal form, where

A = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

4.14 Without computation (and without finding the matrix F taking A to Jordan normal form), explain how you can see the Jordan normal form of

\begin{pmatrix} 1 & 10 & 100 \\ 0 & 20 & 200 \\ 0 & 0 & 300 \end{pmatrix}

4.15 If a linear map T : V → V satisfies a complex polynomial equation f(T) = 0, show that each eigenvalue of T must satisfy the same equation.

4.16 Prove that any reordering of Jordan blocks can occur by changing the choice of the basis we use to bring a linear map T to Jordan normal form.

4.17 Prove that every linear map is a product of two diagonalizable linear maps.

4.18 Suppose that to each n × n matrix A we assign a number D(A), and that D(AB) = D(A)D(B).

(a) Prove that D(P^{-1}AP) = D(A) for any invertible matrix P.

(b) Define a function f(x) by

f(x) = D \begin{pmatrix} x & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}.

Prove that f(x) is multiplicative; i.e. f(ab) = f(a)f(b) for any numbers a and b.


(c) Prove that on any diagonal matrix

A = \begin{pmatrix} a1 & & & \\ & a2 & & \\ & & \ddots & \\ & & & an \end{pmatrix}

we have

D(A) = f (a1) f (a2) . . . f (an) .

(d) Prove that on any diagonalizable matrix A,

D(A) = f (λ1) f (λ2) . . . f (λn) ,

where λ1, λ2, . . . , λn are the eigenvalues of A (counted with multiplicity).

(e) Use the previous exercise to show that D(A) = f(det(A)) for any matrix A.

(f) In this sense, det is the unique multiplicative quantity associated to a matrix, up to composing with a multiplicative function f. What are all continuous multiplicative functions f? (Warning: it is a deep result that there are many discontinuous multiplicative functions.)

4.19 Consider the matrix

A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}

over the finite field F2. Calculate the characteristic polynomial; you should get

p(x) = x^3 + x^2 + 1.

Prove that the matrix is not diagonalizable over F2, but becomes diagonalizable over the finite field F8. Hint: see problem 2.14 on page 29. If we write the roots of p(x) as α, α^2, 1 + α + α^2, then calculate out a basis of eigenvectors.


Chapter 5

Decomposition and Minimal Polynomial

The Jordan normal form has an abstract version for linear maps, called the decomposition of a linear map.

Polynomial Division

5.1 Divide x^2 + 1 into x^5 + 3x^2 + 4x + 1, giving quotient and remainder.

5.2 Use the Euclidean algorithm (subsection 2 on page 23) applied to polynomials instead of integers, to compute the greatest common divisor r(x) of a(x) = x^4 + 2x^3 + 4x^2 + 4x + 4 and b(x) = x^5 + 2x^3 + x^2 + 2. Find polynomials u(x) and v(x) so that u(x)a(x) + b(x)v(x) = r(x). Find the least common multiple.

Given any pair of polynomials a(x) and b(x), the Euclidean algorithm writes their greatest common divisor r(x) as

r(x) = u(x) a(x) + v(x) b(x),

a “linear combination” of a(x) and b(x). Similarly, if we have any number of polynomials, we can write the greatest common divisor of any pair of them as a linear combination. Pick two pairs of polynomials, and write the greatest common divisor of the greatest common divisors as a linear combination, etc. Keep going until you hit the greatest common divisor of the entire collection. We can unwind this process, to write the greatest common divisor of the entire collection of polynomials as a linear combination of the polynomials themselves.
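A minimal sketch of this computation, assuming SymPy, whose gcdex routine runs the extended Euclidean algorithm; the polynomials are the ones from problem 5.2.

from sympy import symbols, gcdex, expand

x = symbols('x')
a = x**4 + 2*x**3 + 4*x**2 + 4*x + 4
b = x**5 + 2*x**3 + x**2 + 2

u, v, r = gcdex(a, b, x)       # u*a + v*b = r = gcd(a, b)
print(r)                       # the greatest common divisor
print(expand(u*a + v*b - r))   # 0, confirming the linear combination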

5.3 For integers 2310, 990 and 1386 (instead of polynomials) express their greatest common divisor as an “integer linear combination” of them.

The Minimal Polynomial

We are interested in equations satisfied by a linear map.

Lemma 5.1. Every linear map T : V → V on a finite dimensional vector space V satisfies a polynomial equation p(T) = 0 with p(x) a nonzero polynomial.

Proof. The set of all linear maps V → V is a finite dimensional vector space, of dimension n^2 where n = dim V. Therefore the elements I, T, T^2, . . . , T^{n^2} cannot be linearly independent.


5.4 Prove that ∆^k has zeros down the diagonal, for any integer k ≥ 1.

5.5 Prove that, for any number λ, the diagonal entries of (λI + ∆)^k are all λ^k.

Definition 5.2. The minimal polynomial of a linear map T : V → V is the smallest degree polynomial

m(x) = x^d + a_{d−1} x^{d−1} + · · · + a_0

with coefficients in K for which m(T) = 0.

Lemma 5.3. There is a unique minimal polynomial for any linear map T : V → V on a finite dimensional vector space V. The minimal polynomial divides every other polynomial s(x) for which s(T) = 0. All roots of the minimal polynomial are eigenvalues.

Proof. For example, if T satisfies two polynomials, say 0 = T^3 + 3T + 1 and 0 = 2T^3 + 1, then we can rescale the second equation by 1/2 to get 0 = T^3 + 1/2, and then we have two equations which both start with T^3, so just take the difference: 0 = (T^3 + 3T + 1) − (T^3 + 1/2). The point is that the T^3 terms wipe each other out, giving a new equation of lower degree. Keep going until you get the lowest degree possible nonzero polynomial. Rescale to get the leading coefficient to be 1.

If s(x) is some other polynomial, and s(T) = 0, then divide m(x) into s(x), say s(x) = q(x)m(x) + r(x), with remainder r(x) of smaller degree than m(x). But then 0 = s(T) = q(T)m(T) + r(T) = r(T), so r(x) has smaller degree than m(x), and r(T) = 0. But m(x) is already the smallest degree possible without being 0. So r(x) = 0, and m(x) divides s(x).

If λ is a root of m(x), say m(x) = (x − λ)p(x), but not an eigenvalue, then

p(T) = (T − λI)^{-1} m(T) = 0,

but p(x) is a smaller degree polynomial than the minimal, a contradiction.

5.6 Prove that the minimal polynomial of ∆n is m(x) = x^n.

5.7 Prove that the minimal polynomial of a Jordan block λ + ∆n is m(x) = (x − λ)^n.

Lemma 5.4. If A and B are square matrices with minimal polynomials mA(x) and mB(x), then the matrix

C = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}

has minimal polynomial mC(x) the least common multiple of the polynomials mA(x) and mB(x).


Proof. Calculate that

C^2 = \begin{pmatrix} A^2 & 0 \\ 0 & B^2 \end{pmatrix},

etc., so for any polynomial q(x),

q(C) = \begin{pmatrix} q(A) & 0 \\ 0 & q(B) \end{pmatrix}.

Let l(x) be the least common multiple of the polynomials mA(x) and mB(x). Then clearly l(C) = 0. So mC(x) divides l(x). But mC(C) = 0, so mC(A) = 0. Therefore mA(x) divides mC(x). Similarly, mB(x) divides mC(x). So mC(x) is the least common multiple.

Using the same proof:

Lemma 5.5. If a linear map T : V → V has invariant subspaces U and W so that V = U ⊕ W, then the minimal polynomial of T is the least common multiple of the minimal polynomials of T|U and T|W.

Lemma 5.6. Suppose that T : V → V is a linear map on a finite dimensional vector space V over a field K and that all of the eigenvalues of T lie in K. Then the minimal polynomial m(x) is

m(x) = (x − λ1)^{d1} (x − λ2)^{d2} . . . (x − λs)^{ds}

where λ1, λ2, . . . , λs are the eigenvalues of T and dj is no larger than the dimension of the generalized eigenspace of λj. In fact dj is the size of the largest Jordan block with eigenvalue λj in the Jordan normal form.

Proof. We can assume that T is already in Jordan normal form: the minimal polynomial is the least common multiple of the minimal polynomials of the blocks.

Corollary 5.7. Suppose that T : V → V is a linear map on a finite dimensional vector space V over a field K and that all of the eigenvalues of T lie in K. Then T is diagonalizable just when it satisfies a polynomial equation s(T) = 0 where

s(x) = (x − λ1)(x − λ2) . . . (x − λs),

for some distinct numbers λ1, λ2, . . . , λs ∈ K, which happens just when its minimal polynomial is a product of distinct linear factors over K.

Review problems

5.8 Find the minimal polynomial of

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.


5.9 Prove that the minimal polynomial of any 2× 2 matrix A is

m(λ) = λ^2 − (tr A) λ + det A,

(where tr A is the trace of A), unless A is a multiple of the identity matrix, say A = cI for some number c, in which case m(λ) = λ − c.

5.10 Use Jordan normal form to prove the Cayley–Hamilton theorem: every linear map T : V → V on a finite dimensional complex vector space satisfies p(T) = 0, where p(λ) = det (T − λI) is the characteristic polynomial of T.

5.11 Prove that if A is a square matrix with real entries, then the minimal polynomial of A has real coefficients.

5.12 If T : V → V is a linear map on a finite dimensional complex vector space V of dimension n, and T^n = I, prove that T is diagonalizable. Give an example on a real vector space to show that T need not be diagonalizable over the real numbers.

Appendix: How to Find the Minimal Polynomial

Given a square matrix A, to find its minimal polynomial requires that we find linear relations among powers of A. If we find a relation like A^3 = I + 5A, then we can multiply both sides by A to obtain a relation A^4 = A + 5A^2. In particular, once some power of A is a linear combination of lower powers of A, then every higher power is also a linear combination of lower powers.

For each n × n matrix A, just for this appendix let's write A (underlined) to mean the vector you get by chopping out the columns of A and stacking them on top of one another. For example, if

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},

then the underlined vector is

\begin{pmatrix} 1 \\ 3 \\ 2 \\ 4 \end{pmatrix}.

Clearly a linear relation like A^3 = I + 5A will hold just when the same relation holds among the underlined vectors.

Now let's suppose that A is n × n, and let's form the matrix

B = ( I   A   A^2   . . .   A^n ),

each entry underlined, i.e. each power written as a column vector. Clearly B has n^2 rows and n + 1 columns. Apply forward elimination to B, and call the resulting matrix U. If one of the columns, say the one coming from A^3, is not a pivot column, then A^3 is a linear combination of lower powers of A, so therefore A^4 is too,


etc. So as soon as you hit a pivotless column of B, all subsequent columns are pivotless. Therefore U looks like

U = \begin{pmatrix} ∗ & ∗ & ∗ & ∗ & ∗ & ∗ & ∗ & ∗ \\ & ∗ & ∗ & ∗ & ∗ & ∗ & ∗ & ∗ \\ & & ∗ & ∗ & ∗ & ∗ & ∗ & ∗ \end{pmatrix},

pivots straight down the diagonal, until you hit rows of zeros. Cut out all of the pivotless columns of U except the first pivotless column. Also cut out the zero rows. Then apply back substitution, turning U into

\begin{pmatrix} 1 & & & & a_0 \\ & 1 & & & a_1 \\ & & \ddots & & \vdots \\ & & & 1 & a_p \end{pmatrix}.

Then the minimal polynomial is m(x) = x^{p+1} − a0 − a1 x − · · · − ap x^p. To see that this works, you notice that we have cut out all but the column of A^{p+1}, the smallest power of A that is a linear combination of lower powers. So the minimal polynomial has to express A^{p+1} as a linear combination of lower powers, i.e. we are solving the linear equations a0 I + a1 A + · · · + ap A^p = A^{p+1}. These equations yield the matrix B with a0, a1, . . . , ap as the unknowns, and we just apply elimination. On large matrices, this process is faster than finding the determinant. But it has the danger that small perturbations of the matrix entries alter the minimal polynomial drastically, so we can only apply this process when we know the matrix entries precisely.
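A minimal sketch of this procedure, assuming SymPy for exact arithmetic; the matrix A below is the one from problem 5.13, and the variable names are illustrative only.

from sympy import Matrix, Poly, eye, symbols

A = Matrix([[0, 1, 1], [2, 1, 2], [0, 0, -1]])
n = A.shape[0]

# the columns vec(I), vec(A), vec(A^2), ... of the matrix B described above
powers = [eye(n)]
for _ in range(n):
    powers.append(A * powers[-1])
vecs = [P.reshape(n * n, 1) for P in powers]

B = vecs[0]
for q in range(1, n + 1):
    # is vec(A^q) a linear combination of the earlier columns?
    ker = B.row_join(vecs[q]).nullspace()
    if ker:
        k = ker[0]
        coeffs = [-k[j] / k[q] for j in range(q)]   # A^q = sum_j coeffs[j] A^j
        break
    B = B.row_join(vecs[q])

x = symbols('x')
m = Poly(x**q - sum(c * x**j for j, c in enumerate(coeffs)), x)
print(m)   # minimal polynomial: x**2 - x - 2 for this A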

5.13 Find the minimal polynomial of

\begin{pmatrix} 0 & 1 & 1 \\ 2 & 1 & 2 \\ 0 & 0 & −1 \end{pmatrix}.

What are the eigenvalues?

Decomposition of a Linear Map

There is a more abstract version of the Jordan normal form, applicable to an abstract finite dimensional vector space.

Definition 5.8. If T : V → W is a linear map, and U is a subspace of V, recall that the restriction T|U : U → W is defined by T|U u = Tu for u in U.


Definition 5.9. Suppose that T : V → V is a linear map from a vector space back to itself, and U is a subspace of V. We say that U is invariant under T if whenever u is in U, Tu is also in U.

A difficult result to prove by any other means:

Corollary 5.10. If a linear map T : V → V on a finite dimensional vector space is diagonalizable, then its restriction to any invariant subspace is diagonalizable.

Proof. The linear map satisfies the same polynomial equation s(T) = 0, with s(x) a product of distinct linear factors, even after restricting to the subspace, so corollary 5.7 applies.

5.14 Prove that a linear map T : V → V on a finite dimensional vector space is diagonalizable just when every subspace invariant under T has a complementary subspace invariant under T.

Definition 5.11. A linear map N : V → V from a vector space back to itself is called nilpotent if there is some positive integer k for which N^k = 0.

Clearly a linear map on a finite dimensional vector space is nilpotent just when its minimal polynomial is p(x) = x^k for some positive integer k.

Corollary 5.12. A linear map N : V → V is nilpotent just when the restriction of N to any N invariant subspace is nilpotent.

5.15 Prove that the only nilpotent which is diagonalizable is 0.

5.16 Give an example to show that the sum of two nilpotents might not be nilpotent.

Definition 5.13. Two linear maps S : V → V and T : V → V commute whenST = TS.

Lemma 5.14. The sum and difference of commuting nilpotents are nilpotent.

Proof. If S and T are nilpotent linear maps V → V, say S^p = 0 and T^q = 0. Then take any number r ≥ p + q and expand out the sum (S + T)^r. Because S and T commute, every term can be written with all its S factors on the left and all its T factors on the right, and there are r factors in all, so either at least p of the factors must be S or at least q must be T; hence each term vanishes.

Lemma 5.15. If two linear maps S, T : V → V commute, then S preserves each generalized eigenspace of T, and vice versa.

Proof. If ST = TS, then clearly ST^2 = T^2 S, etc., so that S p(T) = p(T) S for any polynomial p(T). Suppose that T has an eigenvalue λ. If x is a generalized eigenvector, i.e. (T − λI)^p x = 0 for some p, then

(T − λI)^p Sx = S (T − λI)^p x = 0,

so that Sx is also a generalized eigenvector with the same eigenvalue.


Theorem 5.16. Take T : V → V any linear map on a finite dimensional vector space V over a field K, and suppose that all of the eigenvalues of T belong to K. Then T can be written in just one way as a sum T = D + N with D diagonalizable, N nilpotent, and all three of T, D and N commuting.

For T = λI + ∆, set D = λI, and N = ∆.

Remark 5.17. If any two of T, D and N commute, and T = D + N, then it is easy to check that all three commute.

Proof. First, let's prove that D and N exist, and then prove they are unique. One proof that D and N exist, which doesn't require Jordan normal form: split up V into the direct sum of the generalized eigenspaces of T. It is enough to find some D and N on each one of these spaces. But on each generalized eigenspace, say with eigenvalue λ, we can let D = λI and let N = T − λI. So existence of D and N is obvious.

Another proof that D and N exist, which uses Jordan normal form: pick a basis in which the matrix of T is in Jordan normal form. Let's also call this matrix T, say

T = \begin{pmatrix} λ1 I + ∆ & & & \\ & λ2 I + ∆ & & \\ & & \ddots & \\ & & & λN I + ∆ \end{pmatrix}.

Let

D = \begin{pmatrix} λ1 I & & & \\ & λ2 I & & \\ & & \ddots & \\ & & & λN I \end{pmatrix},

and

N = \begin{pmatrix} ∆ & & & \\ & ∆ & & \\ & & \ddots & \\ & & & ∆ \end{pmatrix}.

This proves that D and N exist.

Why are D and N uniquely determined? All generalized eigenspaces of T are D and N invariant. So we can restrict to a single generalized eigenspace of T, and need only show that D and N are uniquely determined there. If λ is the eigenvalue, then D − λI = (T − λI) − N is a difference of commuting nilpotents, so nilpotent by lemma 5.14 on the preceding page. Therefore D − λI is both nilpotent and diagonalizable, and so vanishes: D = λI and N = T − λI, uniquely determined.
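A minimal sketch of the Jordan-form construction of D and N, assuming SymPy; the matrix T below is an illustrative choice, built by conjugating a Jordan matrix, not an example from the text.

from sympy import Matrix, diag

J = Matrix([[3, 1, 0], [0, 3, 0], [0, 0, 5]])   # a Jordan matrix
P = Matrix([[1, 2, 0], [0, 1, 1], [1, 0, 1]])   # an invertible change of basis
T = P * J * P.inv()

F, JT = T.jordan_form()                          # T = F * JT * F**-1
D = F * diag(*[JT[i, i] for i in range(JT.shape[0])]) * F.inv()
N = T - D

print(N**2)        # the zero matrix: N is nilpotent
print(D*N - N*D)   # the zero matrix: D and N commute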


Theorem 5.18. Suppose that T0 : V → V and T1 : V → V are commuting linear maps (i.e. T0T1 = T1T0) on a finite dimensional vector space V over a field K, and that all of the eigenvalues of T0 and T1 belong to K. Then splitting each into its diagonalizable and nilpotent parts, T0 = D0 + N0 and T1 = D1 + N1, any two of the maps T0, D0, N0, T1, D1, N1 commute.

Proof. If x is a generalized eigenvector of T0 (so that (T0 − λ)^k x = 0 for some λ and integer k > 0), then T1x is also (because (T0 − λ)^k T1x = 0 too, by pulling the T1 across to the left). Since V is a direct sum of generalized eigenspaces of T0, we can restrict to a generalized eigenspace of T0 and prove the result there. So we can assume that T0 = λ0 + N0, for some number λ0 ∈ K. Switching the roles of T0 and T1, we can assume that T1 = λ1 + N1. Clearly D0 = λ0 and D1 = λ1 commute with one another and with anything else. The commuting of T0 and T1 is equivalent to the commuting of N0 and N1.

Review problems

5.17 Find the decomposition T = D +N of

T = \begin{pmatrix} ε1 & 1 \\ 0 & ε2 \end{pmatrix}

(using the same letter T for the linear map and its associated matrix).

5.18 If two linear maps S : V → V and T : V → V on a finite dimensional complex vector space commute, show that the eigenvalues of ST are products of eigenvalues of S with eigenvalues of T.


Chapter 6

Matrix Functions of a Matrix Variable

We will make sense out of expressions like e^T, sin T, cos T for square matrices T, and for linear maps T : V → V. We expect that the reader is familiar with calculus and infinite series.

Definition 6.1. A function f(x) is analytic if near each point x = x0, f(x) is the sum of a convergent Taylor series, say

f(x) = a0 + a1 (x − x0) + a2 (x − x0)^2 + . . .

We will henceforth allow the variable x (and the point x = x0 around which we take the Taylor series) to take on real or complex values.

Definition 6.2. If f(x) is an analytic function and T : V → V is a linear map on a finite dimensional vector space, define f(T) to mean the infinite series

f(T) = a0 + a1 (T − x0) + a2 (T − x0)^2 + . . . ,

just plugging the linear map T into the expansion.

Lemma 6.3. Under an isomorphism F : U → V of finite dimensional vector spaces,

f(F^{-1}TF) = F^{-1} f(T) F,

with each side defined when the other is.

Proof. Expanding out, we see that (F^{-1}TF)^2 = F^{-1}T^2 F. By induction, (F^{-1}TF)^k = F^{-1}T^k F for k = 1, 2, 3, . . . . So for any polynomial function p(x), we see that p(F^{-1}TF) = F^{-1} p(T) F. Therefore the partial sums of the Taylor expansion converge on the left hand side just when they converge on the right, approaching the same value.

Remark 6.4. If a square matrix is in square blocks,

A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix},

then clearly

f(A) = \begin{pmatrix} f(B) & 0 \\ 0 & f(C) \end{pmatrix}.

So we only need to work one block at a time.


Theorem 6.5. Let f(x) be an analytic function. If A is a single Jordan block, A = λ + ∆n, and the series for f(x) converges near x = λ, then

f(A) = f(λ) + f′(λ)∆ + (f″(λ)/2!)∆^2 + (f‴(λ)/3!)∆^3 + · · · + (f^{(n−1)}(λ)/(n−1)!)∆^{n−1}.

Proof. Expand out the Taylor series, keeping in mind that ∆^n = 0.

Corollary 6.6. The value of f(A) does not depend on which Taylor series we use for f(x): we can expand f(x) about any point as long as the series converges on the spectrum of A. If we change the choice of the point to expand around, the resulting expression for f(A) determines the same matrix.

Proof. Entries will be given by the formulas above, which don't depend on the particular choice of Taylor series, only on the values of f(x) and its derivatives at the points x in the spectrum of A.

Corollary 6.7. Suppose that T : V → V is a linear map of a finite dimensional vector space. Split T into T = D + N, diagonalizable and nilpotent parts. Take f(x) an analytic function given by a Taylor series converging on the spectrum of T (which is the spectrum of D). Then

f(T) = f(D + N) = f(D) + f′(D)N + f″(D)N^2/2! + f‴(D)N^3/3! + · · · + f^{(n−1)}(D)N^{n−1}/(n−1)!

where n is the dimension of V.

Remark 6.8. In particular, the result is independent of the choice of point about which we expand f(x) into a Taylor series, as long as the series converges on the spectrum of T.

Consider the matrix

A = \begin{pmatrix} π & 1 \\ 0 & π \end{pmatrix} = π + ∆.

Then

sin A = sin(π + ∆) = sin(π) + sin′(π)∆ = 0 + cos(π)∆ = −∆ = \begin{pmatrix} 0 & −1 \\ 0 & 0 \end{pmatrix}.
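A minimal numerical check of this example, assuming NumPy: summing the Taylor series of sin at the matrix A = πI + ∆ reproduces the closed form sin(π)I + cos(π)∆ = −∆ from theorem 6.5.

import numpy as np
from math import factorial

Delta = np.array([[0.0, 1.0], [0.0, 0.0]])
A = np.pi * np.eye(2) + Delta

# partial sums of sin(A) = A - A^3/3! + A^5/5! - ...
S = np.zeros((2, 2))
for k in range(30):
    S += (-1) ** k * np.linalg.matrix_power(A, 2 * k + 1) / factorial(2 * k + 1)

print(np.round(S, 6))                                       # approximately -Delta
print(np.sin(np.pi) * np.eye(2) + np.cos(np.pi) * Delta)    # the closed form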


6.1∗ Find √(1 + ∆), log(1 + ∆), e^∆.

Remark 6.9. If f(x) is analytic on the spectrum of a linear map T : V → V on a finite dimensional vector space, then we could actually define f(T) by using a different Taylor series for f(x) around each eigenvalue, but that would require a more sophisticated theory. (We could, for example, split up V into generalized eigenspaces for T, and compute out f(T) on each generalized eigenspace separately; this proves convergence.) However, we will never use such a complicated theory; we will only define f(T) when f(x) has a single Taylor series converging on the entire spectrum of T.

The Exponential and Logarithm

The series

e^x = 1 + x + x^2/2! + x^3/3! + . . .

converges for all values of x. Therefore e^T is defined for any linear map T : V → V on any finite dimensional vector space.

6.2∗ Recall that the trace of an n × n matrix A is tr A = A11 + A22 + · · · + Ann. Prove that det e^A = e^{tr A}.
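A minimal numerical spot check of exercise 6.2, assuming NumPy and SciPy; the matrix is a random illustrative choice, not a proof.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

print(np.linalg.det(expm(A)))   # these two numbers agree
print(np.exp(np.trace(A)))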

For any positive number r,

log(x + r) = log r − ∑_{k=1}^∞ (1/k) (−x/r)^k.

For |x| < r, this series converges by the ratio test. (Clearly x can actually be real or complex.) If T : V → V is a linear map on a finite dimensional vector space, and if every eigenvalue of T has positive real part, then we can pick r larger than the largest eigenvalue of T. Then

log T = (log r) I − ∑_{k=1}^∞ (1/k) (−(T − rI)/r)^k

converges. The value of this sum does not depend on the value of r, since the value of log x = log((x − r) + r) doesn't.

Remark 6.10. The same tricks work for complex linear maps. We won't ever be tempted to consider f(T) for f(x) anything other than a real-valued function of a real variable; the reader may be aware that there is a sophisticated theory of complex functions of a complex variable.


6.3∗ Find the Taylor series of f(x) = 1/x around the point x = r (as long as r ≠ 0). Prove that f(A) = A^{-1}, if all eigenvalues of A have positive real part.

6.4∗ If f(x) is the sum of a Taylor series converging on the spectrum of a matrix A, why are the entries of f(A) smooth functions of the entries of A? (A function is called “smooth” to mean that we can differentiate the function any number of times with respect to any of its variables, in any order.)

6.5∗ For any complex number z = x + iy, prove that e^z converges to e^x (cos y + i sin y).

6.6∗ Use the result of the previous exercise to prove that e^{log A} = A if all eigenvalues of A have positive real part.

6.7∗ Use the results of the previous two exercises to prove that log e^A = A if all eigenvalues of A have imaginary part strictly between −π/2 and π/2.

Lemma 6.11. If A and B are two n × n matrices, and AB = BA, then e^{A+B} = e^A e^B.

Proof. We expand out the product e^A e^B and collect terms. (The process proceeds term by term exactly as it would if A and B were real numbers, because AB = BA.)

Corollary 6.12. e^A is invertible for all square matrices A, and (e^A)^{-1} = e^{-A}.

Proof. A commutes with −A, so e^A e^{-A} = e^0 = I.

Definition 6.13. A real matrix A is skew-symmetric if A^t = −A. A complex matrix A is skew-adjoint if A^∗ = −A.

Corollary 6.14. If A is skew-symmetric/complex skew-adjoint then e^A is orthogonal/unitary.

Proof. Term by term in the Taylor series, (e^A)^t = e^{A^t} = e^{-A}. Similarly for the other cases.

Lemma 6.15. If two n × n matrices A and B commute (AB = BA) and the eigenvalues of A and B have positive real part and the products of their eigenvalues also have positive real part, then log(AB) = log A + log B.

Proof. The eigenvalues of AB will be products of eigenvalues of A and of B, as seen in section 5 on page 57. Again the result proceeds as it would if A and B were numbers, term by term in the Taylor series.

Corollary 6.16. If A is orthogonal/unitary and all eigenvalues of A have positive real part, then log A is skew-symmetric/complex skew-adjoint.

6.8∗ What do corollaries 6.14 and 6.16 say about the matrix

A = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}?


6.9∗ What can you say about e^A for A symmetric? For A self-adjoint?

If we slightly alter a matrix, we only slightly alter its spectrum.

Theorem 6.17 (Continuity of the spectrum). Suppose that A is an n × n complex matrix, and pick some disks in the complex plane which together contain exactly k eigenvalues of A (counting each eigenvalue by its multiplicity). In order to ensure that a complex matrix B also has exactly k eigenvalues (also counted by multiplicity) in those same disks, it is sufficient to ensure that each entry of B is close enough to the corresponding entry of A.

Proof. Eigenvalues are the roots of the characteristic polynomial det(A − λI). If each entry of B is close enough to the corresponding entry of A, then each coefficient of the characteristic polynomial of B is close to the corresponding coefficient of the characteristic polynomial of A. The result follows by the argument principle (theorem 6.29 on page 69 in the appendix to this chapter).

Remark 6.18. The eigenvalues vary as differentiable functions of the matrix entries as well, except where eigenvalues “collide” (i.e. at matrices for which two eigenvalues are equal), when there might not be any way to write the eigenvalues in terms of differentiable functions of matrix entries. In a suitable sense, the eigenvectors can also be made to depend differentiably on the matrix entries away from eigenvalue “collisions.” See Kato [3] for more information.

6.10∗ Find the eigenvalues of

A = \begin{pmatrix} 0 & 1 \\ t & 0 \end{pmatrix}

as a function of t. What happens at t = 0?

6.11 Prove that if an n × n complex matrix A has n distinct eigenvalues, then so does every complex matrix whose entries are close enough to the entries of A.

Corollary 6.19. If f(x) is an analytic function given by a Taylor series converging on the spectrum of a matrix A, then f(B) is defined by the same Taylor expansion as long as each entry of B is close enough to the corresponding entry of A.

6.12∗ Prove that a complex square matrix A is invertible just when it has the form A = e^L for some square matrix L.


Appendix: Analysis of Infinite Series

Proposition 6.20. Suppose that f(x) is defined by a convergent Taylor series

f(x) = ∑_k a_k (x − x0)^k,

converging for x near x0. Then both of the series

∑_k |a_k| |x − x0|^k   and   ∑_k k a_k (x − x0)^{k−1}

converge for x near x0.

Proof. We can assume, just by replacing x by x − x0, that x0 = 0. Let's suppose that our Taylor series converges for −b < x < b. Then it must converge for x = b/r for any r > 1. So the terms must get small eventually, i.e.

|a_k (b/r)^k| → 0.

For large enough k,

|a_k| < (r/b)^k.

Pick any R > r. Then

|a_k| (b/R)^k < (b/R)^k (r/b)^k = (r/R)^k.

Therefore if |x| ≤ b/R, we have

∑ |a_k| |x|^k ≤ ∑ (r/R)^k,

a geometric series of diminishing terms, so convergent.

Similarly,

∑ |k a_k x^k| ≤ ∑ k (r/R)^k,

which converges by the comparison test.

Corollary 6.21. Under the same conditions, the series

∑_k \binom{k}{ℓ} a_k (x − x0)^{k−ℓ}

converges in the same domain.


Proof. The same trick works.

Lemma 6.22. If f(x) is the sum of a convergent Taylor series, then f′(x) is too. More specifically, if

f(x) = ∑ a_k (x − x0)^k,

then

f′(x) = ∑ k a_k (x − x0)^{k−1}.

Proof. Let

f1(x) = ∑ k a_k (x − x0)^{k−1},

which we know converges in the same open interval where the Taylor series for f(x) converges. We have to show that f1(x) = f′(x). Pick any points x + h and x where f(x) converges. Expand out:

(f(x + h) − f(x))/h − f1(x)
  = (∑_k a_k (x + h)^k − ∑_k a_k x^k)/h − ∑_k k a_k x^{k−1}
  = ∑_k a_k ( ((x + h)^k − x^k)/h − k x^{k−1} )
  = ∑_k a_k ( ∑_{ℓ=0}^{k} \binom{k}{ℓ} x^{k−ℓ} h^{ℓ−1} − x^k h^{−1} − k x^{k−1} )
  = h ∑_k a_k ∑_{ℓ=2}^{k} \binom{k}{ℓ} x^{k−ℓ} h^{ℓ−2}
  = h ∑_k a_k k(k − 1) ∑_{ℓ=0}^{k−2} (1/((ℓ + 2)(ℓ + 1))) \binom{k−2}{ℓ} x^{k−2−ℓ} h^ℓ.

Each term has absolute value no more than

|a_k k(k − 1) \binom{k}{ℓ} (x + h)^{k−2}|,

which are the terms of a convergent series. The expression

(f(x + h) − f(x))/h − f1(x)

is governed by a convergent series multiplied by h. In particular the limit as h → 0 is 0.

Corollary 6.23. Any function f(x) which is the sum of a convergent Taylor series in a disk has derivatives of all orders everywhere in the interior of that disk, given by formally differentiating the Taylor series of f(x).

All of these tricks work equally well for complex functions of a complex variable, as long as they are sums of Taylor series.


Appendix: Perturbing Roots of Polynomials

[Figure: A point z travelling around a circle winds around each point p inside, and doesn't wind around any point p outside.]

Lemma 6.24. As a point z travels counterclockwise around a circle, it travels once around every point inside the circle, and does not travel around any point outside the circle.

[Figure: As z travels around the circle, the angle from p to z increases by 2π if p is inside, but is periodic if p is outside.]

Remark 6.25. Let's state the result more precisely. The angle between any points z and p is defined only up to 2π multiples. As we will see, if z travels around a circle counterclockwise, and p doesn't lie on that circle, we can select this angle to be a continuously varying function φ. If p is inside the circle, then this function increases by 2π as z travels around the circle. If p is outside the circle, then this function is periodic as z travels around the circle.

Proof. Rotate the picture to get p to lie on the positive x-axis, say p = (p0, 0). Scale to get the circle to be the unit circle, so z = (cos θ, sin θ). The vector from z to p is

p− z = (p0 − cos θ,− sin θ) .

This vector has angle φ from the horizontal, where

(cos φ, sin φ) = ((p0 − cos θ)/r, −sin θ/r)

with

r = √((p0 − cos θ)^2 + sin^2 θ).

If p0 > 1, then cos φ > 0 so that after adding multiples of 2π, we must have φ contained inside the domain of the arcsin function:

φ = arcsin(−sin θ / r),

a continuous function of θ, and φ(θ + 2π) = φ(θ). This continuous function φ is uniquely determined up to adding integer multiples of 2π.


On the other hand, suppose that p0 < 1. Consider the angle Φ between Z = (p0 cos θ, p0 sin θ) and P = (1, 0). By the argument above,

Φ = arcsin(−p0 sin θ / r)

where

r = √((1 − p0 cos θ)^2 + p0^2 sin^2 θ).

Rotating by θ takes P to z and Z to p. Therefore the angle of the ray from z to p is φ = Φ(θ) + θ, a continuous function increasing by 2π every time θ increases by 2π. Since Φ is uniquely determined up to adding integer multiples of 2π, so is φ.

Corollary 6.26. Consider the complex polynomial function

P (z) = (z − p1) (z − p2) . . . (z − pn) .

Suppose that p1 lies inside some disk, and all other roots p2, p3, . . . , pn lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z) increases by 2π.

Proof. The argument of a product is the sum of the arguments of the factors, so the argument of P(z) is the sum of the arguments of z − p1, z − p2, etc.

Corollary 6.27. Consider the complex polynomial function

P (z) = a (z − p1) (z − p2) . . . (z − pn) .

Suppose that some roots p1, p2, . . . , pk all lie inside some disk, and all other roots pk+1, pk+2, . . . , pn lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z) increases by 2πk.

Corollary 6.28. Consider two complex polynomial functions P(z) and Q(z). Suppose that P(z) has k roots lying inside some disk, and Q(z) has ℓ roots lying inside that same disk, and all other roots of P(z) and Q(z) lie outside that disk. (So no roots of P(z) or Q(z) lie on the boundary of the disk.) Then as z travels once around the boundary of that disk, the argument of the complex number w = P(z)/Q(z) increases by 2π(k − ℓ).

Theorem 6.29 (The argument principle). If P(z) is a polynomial, with k roots inside a particular disk, and no roots on the boundary of that disk, then every polynomial Q(z) of the same degree as P(z) and whose coefficients are sufficiently close to the coefficients of P(z) has exactly k roots inside the same disk, and no roots on the boundary.


Proof. To apply corollary 6.28 on the preceding page, we have only to ensure that Q(z)/P(z) is not going to change in argument (or vanish) as we travel around the boundary of that disk. So we have only to ensure that while z stays on the boundary of the disk, Q(z)/P(z) lies in a particular half-plane, for example that Q(z)/P(z) is never a negative real number (or 0). So it is enough to ensure that |P(z) − Q(z)| < |P(z)| for z on the boundary of the disk. Let m be the minimum value of |P(z)| for z on the boundary of the disk. Suppose that the furthest point of our disk from the origin is some point z with |z| = R. Then if we write out Q(z) = P(z) + ∑ cj z^j, we only need to ensure that the coefficients c0, c1, . . . , cn satisfy ∑ |cj| R^j < m, to be sure that Q(z) will have the same number of roots as P(z) in that disk.


Chapter 7

Symmetric Functions of Eigenvalues

Symmetric Functions

Definition 7.1. A function f(x1, x2, . . . , xn) is symmetric if its value is unchanged by permuting the variables x1, x2, . . . , xn.

For example, x1 + x2 + · · · + xn is clearly symmetric.

Definition 7.2. The elementary symmetric functions are the functions

s1(x) = x1 + x2 + . . . + xn

s2(x) = x1x2 + x1x3 + · · ·+ x1xn + x2x3 + · · ·+ xn−1xn

...

sk(x) = ∑_{i1 < i2 < · · · < ik} x_{i1} x_{i2} . . . x_{ik}.

For any (real or complex) numbers x = (x1, x2, . . . , xn) let

Px(t) = (t− x1) (t− x2) . . . (t− xn) .

Clearly the roots of Px(t) are precisely the entries of the vector x.

7.1∗ Prove that

Px(t) = t^n − s1(x) t^{n−1} + s2(x) t^{n−2} + · · · + (−1)^k sk(x) t^{n−k} + · · · + (−1)^n sn(x).
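A minimal sketch of this expansion, assuming SymPy; three variables only, as an illustration.

from sympy import symbols, expand, Poly

t, x1, x2, x3 = symbols('t x1 x2 x3')

P = expand((t - x1) * (t - x2) * (t - x3))
print(Poly(P, t).all_coeffs())
# coefficients are 1, -s1(x), s2(x), -s3(x):
# s1 = x1 + x2 + x3, s2 = x1*x2 + x1*x3 + x2*x3, s3 = x1*x2*x3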

Definition 7.3. Let s(x) = (s1(x), s2(x), . . . , sn(x)), so that s : Rn → Rn (If wework with complex numbers, then s : Cn → Cn.)

Lemma 7.4. The map s is onto, i.e. for each complex vector w in Cn thereis a complex vector z in Cn so that s(z) = w.

Proof. Pick any w in Cn. Let z1, z2, . . . , zn be the complex roots of the polynomial

P(t) = t^n − w1 t^(n−1) + w2 t^(n−2) + · · · + (−1)^n wn.

Such roots exist by the fundamental theorem of algebra. Clearly Pz(t) = P(t), since these polynomial functions have the same roots and the same leading term; matching coefficients, s(z) = w.

Lemma 7.5. The entries of two vectors z and w are permutations of one another just when s(z) = s(w).


Proof. s(z) = s(w) just when Pz(t) = Pw(t), i.e. just when the roots of Pz(t) and Pw(t) are the same numbers with the same multiplicities, i.e. just when the entries of z and w are permutations of one another.

Corollary 7.6. A function is symmetric just when it is a function of the elementary symmetric functions.

Remark 7.7. This means that every symmetric function f : Cn → C has the form f(z) = h(s(z)), for a unique function h : Cn → C, and conversely if h is any function at all, then f(z) = h(s(z)) determines a symmetric function.

Theorem 7.8. A symmetric function of some complex variables is continuous just when it is expressible as a continuous function of the elementary symmetric functions, and this expression is uniquely determined.

Proof. If h(z) is continuous, clearly f(z) = h(s(z)) is. If f(z) is continuous and symmetric, then given any sequence w1, w2, . . . in Cn converging to a point w, we let z1, z2, . . . be a sequence in Cn for which s(zj) = wj, and z a point for which s(z) = w. The entries of zj are the roots of the polynomial

Pzj(t) = t^n − wj1 t^(n−1) + wj2 t^(n−2) + · · · + (−1)^n wjn.

By the argument principle (theorem 6.29 on page 69), we can rearrange the entries of each of the various z1, z2, . . . vectors so that they converge to z. Therefore h(wj) = f(zj) converges to f(z) = h(w). If there are two expressions, f(z) = h1(s(z)) and f(z) = h2(s(z)), then because s is onto, h1 = h2.

If a = (a1, a2, . . . , an), write z^a to mean z1^a1 z2^a2 . . . zn^an. Call a the weight of the monomial z^a. We will order weights by “alphabetical” order, for example so that (2, 1) > (1, 2). Define the weight of a polynomial to be the highest weight of any of its monomials. (The zero polynomial will not be assigned any weight.) The weight of a product of nonzero polynomials is the sum of the weights. The weight of a sum is at most the highest weight of any term. The weight of sj(z) is

(1, 1, . . . , 1, 0, 0, . . . , 0),

with j ones.

Theorem 7.9. Every symmetric polynomial f has exactly one expression as a polynomial in the elementary symmetric polynomials. If f has real/rational/integer coefficients, then f is a real/rational/integer coefficient polynomial of the elementary symmetric polynomials.

Proof. For any monomial z^a, let

m_a(z) = Σp z^(p(a)),

a sum over all permutations p. Every symmetric polynomial, if it contains a monomial z^a, must also contain z^(p(a)), for any permutation p. Hence every symmetric polynomial is a sum of the symmetrized monomials m_a(z). Consequently the weight a of a symmetric polynomial f must satisfy a1 ≥ a2 ≥ · · · ≥ an. We have only


to write the m_a(z) in terms of the elementary symmetric functions, with integer coefficients. Let bn = an and bj = aj − aj+1 for j < n. Then s(z)^b = s1(z)^b1 s2(z)^b2 . . . sn(z)^bn has leading monomial z^a, so m_a(z) − s(z)^b has lower weight. Apply induction on the weight.

z1^2 + z2^2 = (z1 + z2)^2 − 2 z1z2. To compute out these expressions: f(z) = z1^2 + z2^2 has weight (2, 0). The polynomials s1(z) and s2(z) have weights (1, 0) and (1, 1). So we subtract off s1(z)^2, which has weight (2, 0), from f(z), and find f(z) − s1(z)^2 = −2 z1z2 = −2 s2(z).
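The same computation can be checked symbolically (a small sketch with sympy, just for this two-variable example):

    import sympy as sp

    z1, z2 = sp.symbols('z1 z2')
    s1, s2 = z1 + z2, z1*z2
    # s1^2 - 2 s2 should expand to z1^2 + z2^2.
    print(sp.expand(s1**2 - 2*s2))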

Sums of powers

Define pj(z) = z1^j + z2^j + · · · + zn^j, the sums of powers.

Lemma 7.10. The sums of powers are related to the elementary symmetric functions by

0 = k sk − p1 sk−1 + p2 sk−2 − · · · + (−1)^(k−1) pk−1 s1 + (−1)^k pk.

Proof. Let's write z(`) for z with the `-th entry removed, so if z is a vector in Cn, then z(`) is a vector in Cn−1.

pj sk−j = Σ` z`^j Σ_{i1<i2<···<ik−j} zi1 zi2 . . . zik−j

Either we can't pull a z` factor out of the second sum, or we can:

= Σ` z`^j Σ_{i1<i2<···<ik−j; iα ≠ `} zi1 zi2 . . . zik−j + Σ` z`^(j+1) Σ_{i1<i2<···<ik−j−1; iα ≠ `} zi1 zi2 . . . zik−j−1

= Σ` z`^j sk−j(z(`)) + Σ` z`^(j+1) sk−j−1(z(`)).

Putting in successive terms of our sum,

pj sk−j − pj+1 sk−j−1 = Σ` z`^j sk−j(z(`)) + Σ` z`^(j+1) sk−j−1(z(`)) − Σ` z`^(j+1) sk−j−1(z(`)) − Σ` z`^(j+2) sk−j−2(z(`))
= Σ` z`^j sk−j(z(`)) − Σ` z`^(j+2) sk−j−2(z(`)).

Hence the sum collapses to

p1 sk−1 − p2 sk−2 + · · · + (−1)^k pk−1 s1 = Σ` z` sk−1(z(`)) + (−1)^k Σ` z`^k s0(z(`)) = k sk + (−1)^k pk,

since each monomial of sk arises from exactly k choices of `, and s0 = 1 so the last sum is pk. Rearranging gives the identity of the lemma.
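A numerical spot check of lemma 7.10 (a sketch; the vector z is chosen at random and the helper names are ours):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(1)
    z = rng.standard_normal(5)

    def s(k):
        # elementary symmetric function s_k(z)
        return sum(np.prod(z[list(c)]) for c in combinations(range(len(z)), k))

    def p(j):
        # power sum p_j(z)
        return np.sum(z ** j)

    for k in range(1, 6):
        lhs = k * s(k) + sum((-1) ** j * p(j) * s(k - j) for j in range(1, k)) + (-1) ** k * p(k)
        print(round(lhs, 12))   # each line should be 0 up to round-off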


Proposition 7.11. Every symmetric polynomial is a polynomial in the sums of powers. If the coefficients of the symmetric polynomial are real (or rational), then it is a real (or rational) polynomial function of the sums of powers. Every continuous symmetric function of complex variables is a continuous function of the sums of powers.

Proof. We can solve recursively for the sums of powers in terms of the elementary symmetric functions and conversely.

Remark 7.12. The standard reference on symmetric functions is [5].

The Invariants of a Square Matrix

Definition 7.13. A complex-valued or real-valued function f(A), depending on the entries of a square matrix A, is an invariant if f(FAF−1) = f(A) for any invertible matrix F.

So an invariant is independent of change of basis. If T : V → V is a linear map on an n-dimensional vector space, we can define the value f(T) of any invariant f of n × n matrices, by letting f(T) = f(A) where A is the matrix of F−1TF, for any isomorphism F : Rn → V.

For any n× n matrix A, write

det(A − λ I) = sn(A) − sn−1(A) λ + sn−2(A) λ^2 + · · · + (−1)^n λ^n.

The functions s1(A), s2(A), . . . , sn(A) are invariants.

The functions pk(A) = tr(A^k) are invariants.

7.2 If A is diagonal, with diagonal entries z1, z2, . . . , zn, then prove that sj(A) = sj(z1, z2, . . . , zn), the elementary symmetric functions of the eigenvalues.

7.3∗ Generalize the previous exercise to A diagonalizable.
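Exercises 7.2 and 7.3 can be explored numerically (a sketch with numpy; the random matrices are only for illustration). The coefficients of the characteristic polynomial carry the invariants s1(A), . . . , sn(A): they are unchanged by conjugation and match the elementary symmetric functions of the eigenvalues:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    F = rng.standard_normal((4, 4))          # almost surely invertible
    B = F @ A @ np.linalg.inv(F)

    # np.poly gives the coefficients of det(lambda I - A); up to signs these
    # are 1, s1(A), ..., sn(A).
    print(np.round(np.poly(A), 6))
    print(np.round(np.poly(B), 6))                       # should agree with the line above
    print(np.round(np.poly(np.linalg.eigvals(A)), 6))    # and, up to round-off, with this one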


Theorem 7.14. Each continuous (or polynomial) invariant function of a complex matrix has exactly one expression as a continuous (or polynomial) function of the elementary symmetric functions of the eigenvalues. Each polynomial invariant function of a real matrix has exactly one expression as a polynomial function of the elementary symmetric functions of the eigenvalues.

Remark 7.15. We can replace the elementary symmetric functions of the eigenvalues by the sums of powers of the eigenvalues.

Proof. Every continuous invariant function f(A) determines a continuous function f(z) by setting A to be the diagonal matrix with diagonal entries z1, z2, . . . , zn.

Taking F any permutation matrix, invariance tells us that f(FAF−1) = f(A).

But f(FAF−1) is given by applying the associated permutation to the entries of

z. Therefore f(z) is a symmetric function. If f(A) is continuous (or polynomial) then f(z) is too. Therefore f(z) = h(s(z)), for some continuous (or polynomial) function h; so f(A) = h(s(A)) for diagonal matrices. By invariance, the same is true for diagonalizable matrices. If we work with complex matrices, then every matrix can be approximated arbitrarily closely by diagonalizable matrices (by theorem 4.4 on page 48). Therefore by continuity of h, the equation f(A) = h(s(A)) holds for all matrices A.

For real matrices, the equation only holds for those matrices whose eigenvalues are real. However, for polynomials this is enough, since two polynomial functions equal on an open set must be equal everywhere.

Remark 7.16. Consider the function f(A) = sj(|λ1| , |λ2| , . . . , |λn|), where A has eigenvalues λ1, λ2, . . . , λn. This function is a continuous invariant of a real matrix A, and is not a polynomial in λ1, λ2, . . . , λn.


Chapter 8

The Pfaffian

Skew-symmetric matrices have a surprising additional polynomial invariant, called the Pfaffian, but it is only invariant under rotations, and only exists for skew-symmetric matrices with an even number of rows and columns.

Skew-Symmetric Normal Form

Theorem 8.1. For any skew-symmetric matrix A with an even number of rows and columns, there is a rotation matrix F so that FAF−1 is block diagonal, with 2 × 2 blocks

( 0  aj ; −aj  0 ),    j = 1, 2, . . . , n,

down the diagonal and zeroes elsewhere. (We can say that F brings A to skew-symmetric normal form.) If A is any skew-symmetric matrix with an odd number of rows and columns, we can arrange the same equation, again via a rotation matrix F, but the normal form has an extra row of zeroes and an extra column of zeroes after the last block.

Proof. Because A is skew-symmetric, A is skew-adjoint, so normal when thought of as a complex matrix. So there is a unitary basis of C2n of complex eigenvectors of A. If λ is a complex eigenvalue of A, with complex eigenvector z, scale z


to have unit length, and then

λ = λ 〈z, z〉 = 〈λz, z〉 = 〈Az, z〉 = −〈z, Az〉 = −〈z, λz〉 = −λ̄ 〈z, z〉 = −λ̄.

Therefore λ = −λ̄, i.e. λ has the form ia for some real number a. So there are two different possibilities: λ = 0 or λ = ia with a ≠ 0.

If λ = 0, then z lies in the kernel, so if we write z = x + iy then both x and y lie in the kernel. In particular, we can write a real orthonormal basis for the kernel, and then x and y will be real linear combinations of those basis vectors, and therefore z will be a complex linear combination of those basis vectors. Let's take u1, u2, . . . , us to be a real orthonormal basis for the kernel of A, and clearly then the same vectors u1, u2, . . . , us form a complex unitary basis for the complex kernel of A.

Next let's take care of the nonzero eigenvalues. If λ = ia is a nonzero eigenvalue, with unit length eigenvector z, then taking complex conjugates on the equation Az = λz = iaz, we find Az̄ = −ia z̄, so z̄ is another eigenvector, with eigenvalue −ia. So they come in pairs. Since the eigenvalues ia and −ia are distinct, the eigenvectors z and z̄ must be perpendicular. So we can always make a new unitary basis of eigenvectors, throwing out any λ = −ia eigenvector and replacing it with z̄ if needed, to ensure that for each eigenvector z in our unitary basis of eigenvectors, z̄ also belongs to our unitary basis. Moreover, we have three equations: Az = iaz, 〈z, z〉 = 1, and 〈z, z̄〉 = 0. Write z = x + iy with x and y real vectors, and expand out all three equations in terms of x and y to find Ax = −ay, Ay = ax, 〈x, x〉 + 〈y, y〉 = 1, 〈x, x〉 − 〈y, y〉 = 0 and 〈x, y〉 = 0. So if we let X = √2 x and Y = √2 y, then X and Y are unit vectors, and AX = −aY and AY = aX.

Now if we carry out this process for each eigenvalue λ = ia with a > 0, then we can write down vectors X1, Y1, X2, Y2, . . . , Xt, Yt, one pair for each eigenvector from our unitary basis with a nonzero eigenvalue. These vectors are each unit length, and each Xi is perpendicular to each Yi. We also have AXi = −aiYi and AYi = aiXi.

If zi and zj are two different eigenvectors from our original unitary basis of eigenvectors, and their eigenvalues are λi = iai and λj = iaj with ai, aj > 0, then we want to see why Xi, Yi, Xj and Yj must be perpendicular. This follows immediately from zi, z̄i, zj and z̄j being perpendicular, by just expanding into real and imaginary parts. Similarly, we can see that u1, u2, . . . , us are perpendicular to each Xi and Yi. So finally, we can let

F = (X1 Y1 X2 Y2 . . . Xt Yt u1 u2 . . . us).


Clearly these vectors form a real orthonormal basis, so F is an orthogonal matrix. We want to arrange that F be a rotation matrix. Let's suppose that F is not a rotation. We can either change the sign of one of the vectors u1, u2, . . . , us (if there are any), or replace X1 by −X1, which switches the sign of a1, to make F a rotation.
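Numerically, the real Schur decomposition produces exactly this kind of normal form, since the Schur factor of a skew-symmetric matrix is itself skew-symmetric and quasi-triangular, hence built from 2 × 2 blocks. Here is a sketch using scipy (the random matrix and seed are made up, and the orthogonal factor need not have determinant +1):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(2)
    M = rng.standard_normal((6, 6))
    A = M - M.T                        # a random skew-symmetric matrix

    T, Z = schur(A, output='real')     # A = Z T Zt with Z orthogonal
    print(np.round(T, 3))              # block diagonal, 2x2 blocks [[0, a], [-a, 0]]
    print(np.allclose(Z @ T @ Z.T, A))
    print(round(np.linalg.det(Z), 6))  # +1 or -1; flip one column of Z if you need a rotation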

Partitions

A partition of the numbers 1, 2, . . . , 2n is a choice of division of these numbers into pairs. For example, we could choose to partition 1, 2, 3, 4, 5, 6 into

{4, 1} , {2, 5} , {6, 3} .

This is the same partition if we write the pairs down in a different order, like

{2, 5} , {6, 3} , {4, 1} ,

or if we write the numbers inside each pair down in a different order, like

{1, 4} , {5, 2} , {6, 3} .

It isn’t really important that the objects partitioned be numbers. Of course, you can’t partition an odd number of objects into pairs. Each permutation p of the numbers 1, 2, 3, . . . , 2n has an associated partition

{p(1), p(2)} , {p(3), p(4)} , . . . , {p(2n− 1), p(2n)} .

For example, the permutation 3, 1, 4, 6, 5, 2 has associated partition

{3, 1} , {4, 6} , {5, 2} .

Clearly two different permutations p and q could have the same associated partition, i.e. we could first transpose various of the pairs of the partition of p, keeping each pair in order, and then transpose entries within each pair, but not across different pairs. Consequently, there are n! 2^n different permutations associated to the same partition: n! ways to permute pairs, and 2^n ways to swap the order within each pair. When you permute a pair, like changing 3, 1, 4, 6, 5, 2 to 4, 6, 3, 1, 5, 2, this is the effect of a pair of transpositions (one exchanging 3 and 4, and another exchanging 1 and 6), so it has no effect on signs. Therefore if two permutations have the same partition, the root cause of any difference in sign must be from transpositions inside each pair. For example, while it is complicated to find the signs of the permutations 3, 1, 4, 6, 5, 2 and of 4, 6, 1, 3, 5, 2, it is easy to see that these signs must be different.

On the other hand, we can write each partition in “alphabetical order”, like for example rewriting

{4, 1} , {2, 5} , {6, 3}

as

{1, 4} , {2, 5} , {3, 6}


so that we put each pair in order, and then order the pairs among one another by their lowest elements. This in turn determines a permutation, called the natural permutation of the partition, given by putting the elements in that order; in our example this is the permutation

1, 4, 2, 5, 3, 6.

We write the sign of a permutation p as sgn(p), and define the sign of a partition P to be the sign of its natural permutation. Watch out: if we start with a permutation p, like 6, 2, 4, 1, 3, 5, then the associated partition P is

{6, 2} , {4, 1} , {3, 5} .

This is the same partition as

{1, 4} , {2, 6} , {3, 5}

(just written in alphabetical order). The natural permutation q of P is therefore

1, 4, 2, 6, 3, 5,

so the original permutation p is not the natural permutation of its associated partition.

8.1 How many partitions are there of the numbers 1, 2, . . . , 2n?

8.2 Write down all of the partitions of

(a) 1, 2;

(b) 1, 2, 3, 4;

(c) 1, 2, 3, 4, 5, 6.

The Pfaffian

We want to write down a square root of the determinant.

If A is a 2 × 2 skew-symmetric matrix,

A = ( 0  a ; −a  0 ),

then det A = a^2, so the entry a = A12 is a polynomial function of the entries of A, which squares to det A.


A huge calculation shows that if A is a 4 × 4 skew-symmetric matrix, then

det A = (A12A34 − A13A24 + A14A23)^2.

So A12A34 − A13A24 + A14A23 is a polynomial function of the entries of A which squares to det A.

Definition 8.2. For any 2n × 2n skew-symmetric matrix A, let

Pf A = (1/(n! 2^n)) Σp sgn(p) Ap(1)p(2) Ap(3)p(4) . . . Ap(2n−1)p(2n)

where sgn(p) is the sign of the permutation p. Pf is called the Pfaffian.

Remark 8.3. Don’t ever try to use this horrible formula to compute a Pfaffian. We will find a better way soon.

Lemma 8.4. For any 2n × 2n skew-symmetric matrix A,

Pf A = ΣP sgn(P) Ap(1)p(2) Ap(3)p(4) . . . Ap(2n−1)p(2n),

where the sum is over partitions P and the permutation p is the natural permutation of the partition P. In particular, Pf A is an integer coefficient polynomial of the entries of the matrix A.

Proof. Each permutation p has an associated partition P. So we can write the Pfaffian as a sum

Pf A = (1/(n! 2^n)) ΣP Σp sgn(p) Ap(1)p(2) Ap(3)p(4) . . . Ap(2n−1)p(2n)

where the first sum is over all partitions P, and the second over all permutations p which have P as their associated partition. But if two permutations p and q both have the same associated partition

{p(1), p(2)} , {p(3), p(4)} , . . . , {p(2n− 1), p(2n)} ,

then p and q give the same pairs of indices in the expression Ap(1)p(2)Ap(3)p(4) . . . Ap(2n−1)p(2n). Perhaps some of the indices in these pairs might be reversed. For example, we might have partition P being

{1, 5} , {2, 6} , {3, 4} ,

and permutations p being 1, 5, 2, 6, 3, 4 and q being 5, 1, 2, 6, 3, 4. The contribution to the sum coming from p is

sgn(p)A15A26A34,


while that from q is

sgn(q) A51A26A34.

But then A51 = −A15, a sign change which is perfectly offset by the sign sgn(q): each transposition inside a pair changes the sign of the permutation, so sgn(q) = −sgn(p).

So put together, we find that for any two permutations p and q with the same partition, their contributions are the same:

sgn(q)Aq(1)q(2)Aq(3)q(4) . . . Aq(2n−1)q(2n) = sgn(p)Ap(1)p(2)Ap(3)p(4) . . . Ap(2n−1)p(2n).

Therefore the n! 2^n permutations with associated partition P all contribute the same amount as the natural permutation of P.

Rotation Invariants of Skew-Symmetric Matrices

Theorem 8.5. Pf^2 A = det A.

Moreover, for any 2n × 2n matrix B,

Pf(BABt) = Pf(A) det(B).

If A is in skew-symmetric normal form, with 2 × 2 blocks

( 0  aj ; −aj  0 ),    j = 1, 2, . . . , n,

down the diagonal, then

Pf A = a1a2 . . . an.

Proof. Let's start by proving that Pf A = a1a2 . . . an for A in skew-symmetric normal form. Looking at the terms that appear in the Pfaffian, we find that at least one of the factors Ap(2j−1)p(2j) in each term will vanish unless these factors come from among the entries A1,2, A3,4, . . . , A2n−1,2n. So all terms vanish, except when the partition is (in alphabetical order)

{1, 2} , {3, 4} , . . . , {2n− 1, 2n} ,

yielding

Pf A = a1a2 . . . an.

In particular, Pf^2 A = det A for A in skew-symmetric normal form.


For any 2n× 2n matrix B,

n! 2^n Pf(BABt)
= Σp sgn(p) (BABt)p(1)p(2) (BABt)p(3)p(4) . . . (BABt)p(2n−1)p(2n)
= Σp sgn(p) ( Σ_{i1 i2} Bp(1)i1 Ai1i2 Bp(2)i2 ) ( Σ_{i3 i4} Bp(3)i3 Ai3i4 Bp(4)i4 ) · · · ( Σ_{i2n−1 i2n} Bp(2n−1)i2n−1 Ai2n−1i2n Bp(2n)i2n )
= Σ_{i1 i2 . . . i2n} ( Σp sgn(p) Bp(1)i1 Bp(2)i2 Bp(3)i3 . . . Bp(2n)i2n ) Ai1i2 Ai3i4 . . . Ai2n−1i2n
= Σ_{i1 i2 . . . i2n} det( Bei1 Bei2 . . . Bei2n ) Ai1i2 Ai3i4 . . . Ai2n−1i2n .

If any two of the indices i1, i2, . . . , i2n are equal then two columns are equal inside the determinant, and that term vanishes, so we can write this as a sum over permutations:

= Σq det( Beq(1) Beq(2) . . . Beq(2n) ) Aq(1)q(2) Aq(3)q(4) . . . Aq(2n−1)q(2n)
= Σq sgn(q) det( Be1 Be2 . . . Be2n ) Aq(1)q(2) Aq(3)q(4) . . . Aq(2n−1)q(2n)
= n! 2^n det B Pf A.

Finally, to prove that Pf^2 A = det A, we just need to get A into skew-symmetric normal form via a rotation matrix B, and then Pf^2 A = Pf^2(BABt) = det(BABt) = det A.

How do you calculate the Pfaffian in practice? It is like the determinant, except that you start running your finger down the first column under the diagonal, and you write down −,+,−,+, . . . in front of each entry from the first column, and then the Pfaffian you get by crossing out that row and column,


and symmetrically the corresponding column and row. So

Pf
(  0  −2   1   3 )
(  2   0   8  −4 )
( −1  −8   0   5 )
( −3   4  −5   0 )

= −(2) · Pf ( 0  5 ; −5  0 ) + (−1) · Pf ( 0  −4 ; 4  0 ) − (−3) · Pf ( 0  8 ; −8  0 )

= −(2) · 5 + (−1) · (−4) − (−3) · 8,

where each 2 × 2 matrix is what remains of A after crossing out the first row and column together with the row and column of the chosen entry.
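Before proving that this rule works, here is a short recursive implementation of it as a check (a sketch; the function name is ours, and the method is far too slow for large matrices):

    import numpy as np

    def pf_expand(A):
        # Pfaffian by expansion along the first column, as in the example above.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 0:
            return 1.0
        if n % 2 == 1:
            return 0.0
        if n == 2:
            return A[0, 1]
        total = 0.0
        for i in range(1, n):
            keep = [k for k in range(n) if k not in (0, i)]
            total += (-1) ** i * A[i, 0] * pf_expand(A[np.ix_(keep, keep)])
        return total

    A = np.array([[0, -2, 1, 3], [2, 0, 8, -4], [-1, -8, 0, 5], [-3, 4, -5, 0]], float)
    print(pf_expand(A))                          # 18, the value of the expression above
    print(pf_expand(A) ** 2, np.linalg.det(A))   # Pf(A)^2 should equal det(A)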

Let's prove that this works:

Lemma 8.6. If A is a skew-symmetric matrix with an even number of rows and columns, larger than 2 × 2, then

Pf A = −A21 Pf A[21] + A31 Pf A[31] − · · · = Σ_{i>1} (−1)^(i+1) Ai1 Pf A[i1],

where A[ij] is the matrix A with rows i and j and columns i and j removed.

Proof. Let's define a polynomial P(A) in the entries of a skew-symmetric matrix A (with an even number of rows and columns) by setting P(A) = Pf A if A is 2 × 2, and setting

P(A) = Σ_{i>1} (−1)^(i+1) Ai1 P(A[i1]),

for larger A. We need to show that P(A) = Pf A. Clearly P(A) = Pf A if A is in skew-symmetric normal form. Each term in Pf A corresponds to a partition, and each partition must put 1 into one of its pairs, say in a pair {1, i}. It then can't use 1 or i in any other pair. Clearly P(A) also has exactly one factor like Ai1 in each term, and then no other factors get to have i or 1 as subscripts. Moreover, all terms in P(A) and in Pf A have a coefficient of 1 or −1. So it is clear that the terms of P(A) and of Pf A are the same, up to sign. We have to fix the signs.

Suppose that we swap rows 2 and 3 of A and columns 2 and 3. Let's show that this changes the signs of P(A) and of Pf A. For Pf A, this is immediate from


theorem 8.5 on page 82. Let Q be the permutation matrix of the transposition of 2 and 3. (To be more precise, let Qn be the n × n permutation matrix of the transposition of 2 and 3, for any value of n ≥ 3. But let's write all such matrices Qn as Q.) Let B = QAQt, i.e. A with rows 2 and 3 and columns 2 and 3 swapped. So Bij is just Aij unless i or j is either 2 or 3. So

P(B) = −B21 P(B[21]) + B31 P(B[31]) − B41 P(B[41]) + . . .
     = −A31 P(A[31]) + A21 P(A[21]) − A41 P(QA[41]Qt) + . . .

By induction, the sign changes in the last term (and in every later term), so

     = A21 P(A[21]) − A31 P(A[31]) + A41 P(A[41]) − . . .
     = −P(A).

So swapping rows 2 and 3 changes the sign. In the same way,

P(QAQt) = sgn(q) P(A),

for Q the permutation matrix of any permutation q of the numbers 2, 3, . . . , 2n. If we start with A in skew-symmetric normal form, letting the numbers a1, a2, . . . , an in the skew-symmetric normal form be some abstract variables, then Pf A is just a single term of Pf and of P, and these terms have the same sign. All of the terms of Pf are obtained by permuting indices in this term, i.e. as Pf(QAQt) for suitable permutation matrices Q. Indeed you just need to take Q the permutation matrix of the natural permutation of each partition. Therefore the signs of Pf and of P are the same for each term, so P = Pf.

8.3∗ Prove that the odd degree elementary symmetric functions of the eigenvalues vanish on any skew-symmetric matrix.

Let s1(a), . . . , sn(a) be the usual symmetric functions of some numbers a1, a2, . . . , an. For any vector a let t(a) be the vector

t(a) = ( s1(a1^2, a2^2, . . . , an^2), s2(a1^2, a2^2, . . . , an^2), . . . , sn−1(a1^2, a2^2, . . . , an^2), a1a2 . . . an ).

Lemma 8.7. Two complex vectors a and b in Cn satisfy t(a) = t(b) just when b can be obtained from a by permutation of entries and changing signs of an even number of entries. A function f(a) is invariant under permutations and even numbers of sign changes just when f(a) = h(t(a)) for some function h.

Proof. Clearly sn(a1^2, a2^2, . . . , an^2) = (a1a2 . . . an)^2 = tn(a)^2. In particular, the symmetric functions of a1^2, a2^2, . . . , an^2 are all functions of t1, t2, . . . , tn. Therefore


if we have two vectors a and b with t(a) = t(b), then a1^2, a2^2, . . . , an^2 are a permutation of b1^2, b2^2, . . . , bn^2. So after permutation, a1 = ±b1, a2 = ±b2, . . . , an = ±bn, equality up to some sign changes. Since we also know that tn(a) = tn(b), we must have a1a2 . . . an = b1b2 . . . bn. If none of the ai vanish, then a1a2 . . . an = b1b2 . . . bn ensures that none of the bi vanish either, and that the number of sign changes is even. It is possible that one of the ai vanishes, in which case we can change its sign as we like to arrange that the number of sign changes is even.

Lemma 8.8. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = FAFt, by some rotation matrix F, just when they have skew-symmetric normal forms with 2 × 2 blocks

( 0  aj ; −aj  0 )   and   ( 0  bj ; −bj  0 ),    j = 1, 2, . . . , n,

down the diagonals respectively, with t(a) = t(b).

Proof. If we have a skew-symmetric normal form for a matrix A, with numbers a1, a2, . . . , an as above, then t1(a), t2(a), . . . , tn−1(a) are the elementary symmetric functions of the squares of the eigenvalues, while tn(a) = Pf A, so clearly t(a) depends only on the invariants of A under rotation. In particular, suppose that I find two different skew-symmetric normal forms, one with numbers a1, a2, . . . , an and one with numbers b1, b2, . . . , bn. Then the numbers b1, b2, . . . , bn must be given from the numbers a1, a2, . . . , an by permutation and switching of an even number of signs. In fact we can attain these changes by actual rotations as follows.

For example, think about 4 × 4 matrices. The permutation matrix F of 3, 4, 1, 2 permutes the first two and second two basis vectors, and is a rotation because the number of transpositions is even. When we replace A by FAFt, we swap a1 with a2. Similarly, we can take the matrix F which reflects e1 and e3, changing the sign of a1 and of a2. So we can clearly carry out any permutations, and any even number of sign changes, on the numbers a1, a2, . . . , an.


Lemma 8.9. Any polynomial in a1, a2, . . . , an can be written in only one way as h(t(a)).

Proof. Recall that every complex number has a square root (a complex number with half the argument and the square root of the modulus). Clearly 0 has only 0 as square root, while all other complex numbers z have two square roots, which we write as ±√z.

Given any complex vector b, I can solve t(a) = b by first constructing a solution c to

s(c) = (b1, b2, . . . , bn−1, bn^2),

and then letting aj = ±√cj. Clearly t(a) = b unless tn(a) has the wrong sign. If we change the sign of one of the aj then we can fix this. So t : Cn → Cn is onto, and therefore any function h is uniquely determined by the composition h(t(a)).

Theorem 8.10. Each polynomial invariant of a skew-symmetric matrix with an even number of rows and columns can be expressed in exactly one way as a polynomial function of the even degree symmetric functions of the eigenvalues and the Pfaffian. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = FAFt, by some rotation matrix F, just when their even degree symmetric functions and their Pfaffian agree.

Proof. If f(A) is a polynomial invariant under rotations, i.e. f(FAFt) = f(A), then we can write f(A) = h(t(a)), with a1, a2, . . . , an the numbers in the skew-symmetric normal form of A, and h some function. Let's write the restriction of f to the normal form matrices as a polynomial f(a). We can split f into a sum of homogeneous polynomials of various degrees, so let's assume that f is already homogeneous of some degree. We can pick any monomial in f and sum it over permutations and over changes of signs of any even number of variables, and f will be a sum over such quantities. So we only have to consider each such quantity, i.e. assume that

f = Σ (±a1)^(d_p(1)) (±a2)^(d_p(2)) . . . (±an)^(d_p(n))

where the sum is over all choices of any even number of minus signs and all permutations p of the degrees d1, d2, . . . , dn. If all degrees d1, d2, . . . , dn are even, then f is a symmetric polynomial in a1^2, a2^2, . . . , an^2, so a polynomial in t1(a), t2(a), . . . , tn−1(a) and tn(a)^2. If all degrees are odd, then they are all positive, and we can divide out a factor of a1a2 . . . an = tn(a). So let's assume that at least one degree is even, say d1, and that at least one degree is odd, say d2. All terms in the sum above that put a plus sign in front of a1 and a2 cancel those terms which put a minus sign in front of both a1 and a2. Similarly, terms putting


a minus sign in front of a1 and a plus sign in front of a2 cancel those which do the opposite. So f = 0. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the symmetric functions of the squared eigenvalues.

The characteristic polynomial of a 2n × 2n skew-symmetric matrix A is clearly

det(A − λ I) = (λ^2 + a1^2)(λ^2 + a2^2) . . . (λ^2 + an^2),

so that

s2j(A) = sj(a1^2, a2^2, . . . , an^2),    s2j−1(A) = 0,

for any j = 1, . . . , n. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the even degree symmetric functions of the eigenvalues.

The Fast Formula for the Pfaffian

Since Pf(BABt) = det B Pf A, we can find the Pfaffian of a large matrix by a sort of Gaussian elimination process, picking B to be a permutation matrix, or a strictly lower triangular matrix, to move A one step towards skew-symmetric normal form. Careful:

8.4∗ Prove that replacing A by BABt, with B a permutation matrix, permutes the rows and columns of A.

8.5∗ Prove that if B is a strictly lower triangular matrix which adds a multiple of, say, row 2 to row 3, then BABt is A with that row addition carried out, and with the same multiple of column 2 added to column 3.

We leave the reader to formulate the obvious notion of Gaussian elimination of skew-symmetric matrices to find the Pfaffian.
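One possible shape for that elimination, as a sketch in Python (the function name and the pivoting choice are ours; row and column operations are applied in pairs so that each step is a congruence A ↦ BABt with det B = ±1):

    import numpy as np

    def pf_eliminate(A):
        A = np.array(A, dtype=float)
        n = A.shape[0]
        if n % 2 == 1:
            return 0.0
        pf = 1.0
        for k in range(0, n - 1, 2):
            # swap the largest entry of column k below the diagonal into row k+1
            pivot = k + 1 + int(np.argmax(np.abs(A[k + 1:, k])))
            if A[pivot, k] == 0.0:
                return 0.0                       # the whole column vanishes
            if pivot != k + 1:
                A[[k + 1, pivot], :] = A[[pivot, k + 1], :]
                A[:, [k + 1, pivot]] = A[:, [pivot, k + 1]]
                pf = -pf                         # the transposition has determinant -1
            a = A[k, k + 1]
            pf *= a
            for i in range(k + 2, n):            # clear the rest of rows/columns k and k+1
                c = A[i, k] / a
                A[i, :] += c * A[k + 1, :]
                A[:, i] += c * A[:, k + 1]
                d = A[i, k + 1] / a
                A[i, :] -= d * A[k, :]
                A[:, i] -= d * A[:, k]
        return pf

    A = np.array([[0, -2, 1, 3], [2, 0, 8, -4], [-1, -8, 0, 5], [-3, 4, -5, 0]], float)
    print(pf_eliminate(A))                       # 18 again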


Factorizations


Chapter 9

Dual Spaces

In this chapter, we learn how to manipulate whole vector spaces, rather than just individual vectors. Out of any abstract vector space, we will construct some new vector spaces, giving algebraic operations on vector spaces rather than on vectors.

The Vector Space of Linear Maps Between Two Vector Spaces

If V and W are two vector spaces, then a linear map T : V → W is also often called a homomorphism of vector spaces, or a homomorphism for short, or a morphism to be even shorter. We won't use this terminology, but we will nevertheless write Hom (V,W ) for the set of all linear maps T : V → W.

Definition 9.1. A linear map is onto if every output w in W comes from some input: w = Tv for some input v in V. A linear map is 1-to-1 if any two distinct vectors v1 ≠ v2 get mapped to distinct vectors Tv1 ≠ Tv2.

9.1 Turn Hom (V,W ) into a vector space.

9.2 Prove that a linear map is an isomorphism just when it is 1-to-1 and onto.

9.3 Give the simplest example you can of a 1-to-1 linear map which is not onto.

9.4 Give the simplest example you can of an onto linear map which is not 1-to-1.

9.5 Prove that a linear map is 1-to-1 just when its kernel consists precisely in the zero vector.

The Dual Space

The simplest possible vector space is K itself.

Definition 9.2. If V is a vector space, let V ∗ = Hom (V,K), i.e. V ∗ is the set of linear maps T : V → K, i.e. the set of scalar-valued linear functions on V. We call V ∗ the dual space of V.

We will usually write vectors in V with Roman letters, and vectors in V ∗ with Greek letters. The vectors in V ∗ are often called covectors.


If V = Kn, every linear function looks like

α(x) = a1x1 + a2x2 + · · ·+ anxn.

We can write this as

α(x) = (a1 a2 . . . an) x, the row matrix (a1 a2 . . . an) multiplying the column vector x with entries x1, x2, . . . , xn.

So we will identify Kn∗ with the set of row matrices. We will write e1, e2, . . . , en for the obvious basis: ei is the i-th row of the identity matrix.

9.6 Why is V ∗ a vector space?

9.7 What is dimV ∗?

Remark 9.3. V and V ∗ have the same dimension, but we should think of them as quite different vector spaces.

Lemma 9.4. Suppose that V is a vector space with basis v1, v2, . . . , vn. Pick any numbers a1, a2, . . . , an ∈ K. There is precisely one linear function f : V → K so that

f (v1) = a1 and f (v2) = a2 and . . . and f (vn) = an.

Proof. The equations above uniquely determine a linear function f, by theorem 1.13 on page 11, since we have defined the linear function on a basis.

Lemma 9.5. Suppose that V is a vector space with basis v1, v2, . . . , vn. There is a unique basis for V ∗, called the basis dual to v1, v2, . . . , vn, which we will write using Greek letters, say as ξ1, ξ2, . . . , ξn, so that

ξi(vj) = 1 if i = j, and ξi(vj) = 0 if i ≠ j.

Remark 9.6. The hard part is getting used to the notation: ξ1, ξ2, . . . , ξn are each a linear function taking vectors from V to numbers: ξ1, ξ2, . . . , ξn : V → K.

Proof. For each fixed i, the equations uniquely define ξi as above. The functions ξ1, ξ2, . . . , ξn are linearly independent, because if they satisfy a linear relation Σ aiξi = 0, then applying the linear function Σ aiξi to the basis vector vj must give zero, but we find that we get aj, so aj = 0, and this holds for each j, so all numbers a1, a2, . . . , an vanish. The linear functions ξ1, ξ2, . . . , ξn span V ∗ because, if we have any linear function f on V, then we can set


a1 = f(v1), a2 = f(v2), . . . , an = f(vn), and find f(v) = Σ ajξj(v) for v = v1 or v = v2, etc., and therefore for v any linear combination of v1, v2, etc. Therefore f = Σ ajξj, and we see that these functions ξ1, ξ2, . . . , ξn span V ∗.

9.8 Find the dual basis ξ1, ξ2, ξ3 to the basis

v1 = (1, 0, 0), v2 = (1, 2, 0), v3 = (1, 2, 3) ∈ K3 (written as column vectors).

9.9 Recall that we identified each vector in Kn∗ with a row matrix. Suppose that v1, v2, . . . , vn is a basis for Kn. Let ξ1, ξ2, . . . , ξn be the dual basis. Let F be the matrix whose columns are v1, v2, . . . , vn, and G be the matrix whose rows are ξ1, ξ2, . . . , ξn. Prove that G = F−1.
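Problem 9.9 is easy to check numerically; here is a sketch with numpy, using the basis from problem 9.8 (so it also gives away that answer):

    import numpy as np

    F = np.array([[1, 1, 1],
                  [0, 2, 2],
                  [0, 0, 3]], dtype=float)   # columns are v1, v2, v3
    G = np.linalg.inv(F)                     # rows are the dual basis xi_1, xi_2, xi_3
    print(np.round(G, 6))
    print(np.round(G @ F, 6))                # the identity matrix: xi_i(v_j) = 1 or 0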

Lemma 9.7. For any finite dimensional vector space V, V ∗∗ and V are isomorphic, by associating to each vector x from V the linear function fx from V ∗∗ defined by

fx(α) = α(x).

Remark 9.8. This lemma is very confusing, but very simple, and therefore very important.

Proof. First, let's ask what V ∗∗ means. Its vectors are linear functions on V ∗, by definition. Next, let's pick a vector x in V and construct a linear function on V ∗. How? Take any covector α in V ∗, and let's assign to it some number f(α). Since α is (by definition again) a linear function on V, α(x) is a number. Let's take the number f(α) = α(x). Let's call this function f = fx. The rest of the proof is a series of exercises.

9.10 Check that fx is a linear function.

9.11 Check that the map T (x) = fx is a linear map T : V → V ∗∗.

9.12 Check that T : V → V ∗∗ is one-to-one: i.e. if we pick two different vectors x and y in V, then fx ≠ fy.

Remark 9.9. Although V and V ∗∗ are identified as above, V and V ∗ cannot be identified in any “natural manner,” and should be thought of as different.

Definition 9.10. If T : V → W is a linear map, write T ∗ : W ∗ → V ∗ for the linear map given by

T ∗(α)(v) = α(Tv).

Call T ∗ the transpose of T.

9.13 Prove that T ∗ : W ∗ → V ∗ is a linear map.

9.14 What does this notion of transpose have to do with the notion of transpose of matrices?


Chapter 10

Singular Value Factorization

We will analyse statistical data, using the spectral theorem.

Principal Components

Consider a large collection of data coming in from some kind of measuring equipment. Let's suppose that the data consists in a large number of vectors, say vectors v1, v2, . . . , vN in Rn. How can we get a good rough description of what these vectors look like?

We can take the mean of the vectors

µ = (1/N)(v1 + v2 + · · · + vN),

as a good description of where they lie. How do they arrange themselves around the mean? To keep things simple, let's subtract the mean from each of the vectors. So assume that the mean is µ = 0, and we are asking how the vectors arrange themselves around the origin.

Imagine that these vectors v1, v2, . . . , vN tend to lie along a particular line through the origin. Let's try to take an orthonormal basis of Rn, say u1, u2, . . . , un, so that u1 points along that line. How can we find the direction of that line? We look at the quantity 〈vk, x〉. If the vectors lie nearly on a line through 0, then for x on that line, 〈vk, x〉 should be large positive or negative, while for x perpendicular to that line, 〈vk, x〉 should be nearly 0. If we square, we can make sure the large positive or negative becomes large positive, so we take the quantity

Q(x) = 〈v1, x〉^2 + 〈v2, x〉^2 + · · · + 〈vN , x〉^2.

The spectral theorem guarantees that we can pick an orthonormal basis u1, u2, . . . , un of eigenvectors of the symmetric matrix A associated to Q. We will arrange the eigenvalues λ1, λ2, . . . , λn from largest to smallest. Because Q(x) ≥ 0, we see that none of the eigenvalues are negative. Clearly Q(x) grows fastest in the direction x = u1.

10.1 The symmetric matrix A associated to Q(x) (for which 〈Ax, x〉 = Q(x)


for every vector x) is

Aij = 〈v1, ei〉 〈v1, ej〉 + 〈v2, ei〉 〈v2, ej〉 + · · · + 〈vN , ei〉 〈vN , ej〉.

If we rescale all of the vectors v1, v2, . . . , vN by the same nonzero scalar, then the resulting vectors tend to lie along the same lines or planes as the original vectors did. So it is convenient to replace Q(x) by the quadratic polynomial function

Q(x) = Σk 〈vk, x〉^2 / Σ` ‖v`‖^2.

This has associated symmetric matrix

Aij = Σk 〈vk, ei〉 〈vk, ej〉 / Σ` ‖v`‖^2,

which we will call the covariance matrix associated to the data.

Lemma 10.1. Given any set of nonzero vectors v1, v2, . . . , vN in Rn, write them as the columns of a matrix V. Their covariance matrix

A = V V t / Σk ‖vk‖^2

has an orthonormal basis of eigenvectors u1, u2, . . . , un with eigenvalues 1 ≥ λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0.

Remark 10.2. The square roots of the eigenvalues are the correlation coefficients, each indicating how much the data tends to lie in the direction of the associated eigenvector.

Proof. We have only to check that 〈x, V V t x〉 = Σk 〈vk, x〉^2, an exercise for the reader, to see that A is the covariance matrix. Eigenvalues of A can't be negative, as mentioned already. For any vector x of length 1, the Schwarz inequality says that

Σk 〈vk, x〉^2 ≤ Σk ‖vk‖^2.

Therefore, by the minimum principle, eigenvalues of A can't exceed 1.

Our data lies mostly along a line through 0 just when λ1 is large, and the remaining eigenvalues λ2, λ3, . . . , λn are much smaller. More generally, if we find that the first dozen or so eigenvalues are relatively large, and the rest are relatively much smaller, then our data must lie very close to a subspace of dimension a dozen or so. The data tends most strongly to lie along the u1 direction; fluctuations about that direction are mostly in the u2 direction, etc.

Every vector x can be written as x = a1 u1 + a2 u2 + · · · + an un, and the numbers a1, a2, . . . , an are recovered from the formula ai = 〈x, ui〉. If the eigenvalues λ1, λ2, . . . , λd are relatively much larger than the rest, we can say


[Figure 10.1: Applying principal components analysis to some data points. (a) Data points; the mean is marked as a cross. (b) The same data; lines indicate the directions of the eigenvectors, and vectors drawn out from the mean in those directions have lengths given by the correlation coefficients.]

that our data live near the subspace spanned by u1, u2, . . . , ud, and say that our data has d effective dimensions. The numbers a1, a2, . . . , ad are called the principal components of a vector x.

To store the data, instead of remembering all of the vectors v1, v2, . . . , vN, we just keep track of the eigenvectors u1, u2, . . . , ud, and of the principal components of the vectors v1, v2, . . . , vN. In matrices, this means that instead of storing V, we store F = (u1 u2 . . . un), and store the first d rows of FtV; let W be the matrix of these rows. Coming out of storage, we can approximately recover the vectors v1, v2, . . . , vN as the columns of (u1 u2 . . . ud)W. The matrix Ft represents an orthogonal transformation putting the vectors v1, v2, . . . , vN nearly into the subspace spanned by e1, e2, . . . , ed, and mostly along the e1 direction, with fluctuations mostly along the e2 direction, etc. So it is often useful to take a look at the columns of W themselves, as a convenient picture of the data.
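Here is the whole principal-components recipe in a few lines of numpy (a sketch; the synthetic data, the random seed and the cutoff d are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    data = rng.standard_normal((100, 3)) @ np.diag([5.0, 1.0, 0.1])
    V = (data - data.mean(axis=0)).T          # columns v_1, ..., v_N with the mean removed

    A = V @ V.T / np.sum(V * V)               # the covariance matrix of the text
    lam, F = np.linalg.eigh(A)                # eigh sorts eigenvalues in increasing order
    lam, F = lam[::-1], F[:, ::-1]            # reorder from largest to smallest
    print(np.round(np.sqrt(np.maximum(lam, 0.0)), 4))   # correlation coefficients

    d = 1                                     # keep d principal components
    W = F[:, :d].T @ V                        # the stored data
    approx = F[:, :d] @ W                     # approximate reconstruction
    print(round(np.linalg.norm(V - approx) / np.linalg.norm(V), 3))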

Singular Value Factorization

Theorem 10.3. Every real matrix A can be written as A = UΣV t, with U and V orthogonal, and Σ of the same dimensions as A, with the form

Σ = ( D  0 ; 0  0 ),

with D the diagonal matrix with nonnegative diagonal entries σ1, σ2, . . . , σr.


Proof. Suppose that A is p × q. Just like when we worked out principal components, we order the eigenvalues of AtA from largest to smallest. For each eigenvalue λj, let σj = √λj. (Since we saw that the eigenvalues λj of AtA aren't negative, the square root makes sense.) Let V be the matrix whose columns are an orthonormal basis of eigenvectors of AtA, ordered by eigenvalue. Write

V = (V1 V2)

with V1 the eigenvectors with positive eigenvalues, and V2 those with 0 eigenvalue. For each nonzero eigenvalue, define a vector

uj = (1/σj) Avj.

Suppose that there are r nonzero eigenvalues. Let's check that these vectors u1, u2, . . . , ur are orthonormal.

〈ui, uj〉 = 〈(1/σi) Avi, (1/σj) Avj〉
         = (1/(σiσj)) 〈Avi, Avj〉
         = (1/(σiσj)) 〈vi, AtAvj〉
         = (1/(σiσj)) 〈vi, λjvj〉,

which is λj/√(λiλj) = 1 if i = j, and 0 otherwise.

If there aren't enough vectors u1, u2, . . . , ur to make up a basis (i.e. if r < p), then just write down some more vectors to make up an orthonormal basis, say vectors ur+1, ur+2, . . . , up, and let

U1 = (u1 u2 . . . ur),
U2 = (ur+1 ur+2 . . . up),
U = (U1 U2).

By definition of these uj , Avj = σjuj , so AV1 = U1D. Calculate

UΣV t = (U1 U2) ( D  0 ; 0  0 ) ( V1t ; V2t )
      = U1DV1t
      = AV1V1t
      = A,

using AV1 = U1D, and using the facts that AV2 = 0 (each column vj of V2 satisfies ‖Avj‖^2 = 〈vj , AtAvj〉 = 0) and V1V1t + V2V2t = I, so that AV1V1t = A − AV2V2t = A.


Corollary 10.4. Any square matrix A can be written as A = KP (the Cartan decomposition, also called the polar decomposition), where K is orthogonal and P is symmetric and positive semidefinite.

Proof. Write A = UΣV t and set K = UV t and P = V ΣV t.
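Numerically (a sketch with numpy; the random matrix is only for illustration):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))

    U, sigma, Vt = np.linalg.svd(A)           # A = U diag(sigma) Vt
    K = U @ Vt                                # orthogonal
    P = Vt.T @ np.diag(sigma) @ Vt            # symmetric, nonnegative eigenvalues
    print(np.allclose(K @ P, A))
    print(np.allclose(K.T @ K, np.eye(4)))
    print(np.round(np.linalg.eigvalsh(P), 6))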


Chapter 11

Factorizations

Most theorems in linear algebra are obvious consequences of simple factorizations.

LU Factorization

Forward elimination is messy: we swap rows and add rows to lower rows. We want to put together all of the row swaps into one permutation matrix, and all of the row additions into one strictly lower triangular matrix.

Algorithm

To forward eliminate a matrix A (let's say with n rows), start by setting p to be the permutation 1, 2, 3, . . . , n (the identity permutation), L = I, U = A. To start with, no entries of L are painted. Carry out forward elimination on U.

a. Each time you find a nonzero pivot in U, you paint a larger square box in the upper left corner of L. (The number of rows in this painted box is always the number of pivots in U.)

b. When you swap rows k and ` of U,

1. swap entries k and ` of the permutation p, and

2. swap rows k and ` of L, but only swap unpainted entries which lie beneath painted ones.

c. If you add s (row k) to row ` in U, then put −s into column k, row ` in L.

The painted box in L is always square, with number of rows and columns equal to the number of pivots drawn in U.

Remark 11.1. As each step begins, the pivot rows of U with nonzero pivots in them are finished, and the entries inside the painted box and the entries on and above the diagonal of L are finished.

Theorem 11.2. By following the algorithm above, every matrix A can be written as A = P−1LU where P is a permutation matrix of a permutation p, L a strictly lower triangular matrix, and U an upper triangular matrix.


Figure 11.1: Computing the LU factorization

p = 1, 2, 3;   L = ( 1 0 0 ; 0 1 0 ; 0 0 1 );   U = ( 1 0 0 ; 2 0 1 ; 3 3 3 )

p = 1, 2, 3;   L = ( 1 0 0 ; 2 1 0 ; 3 0 1 );   U = ( 1 0 0 ; 0 0 1 ; 0 3 3 )

p = 1, 3, 2;   L = ( 1 0 0 ; 3 1 0 ; 2 0 1 );   U = ( 1 0 0 ; 0 3 3 ; 0 0 1 )

(The remaining pivots are nonzero, so the later steps only enlarge the painted box and change nothing else.)

Proof. Let's show that after each step, we always have PA = LU and always have L strictly lower triangular. For the first forward elimination step, we might have to swap rows. There is no painted box yet, so the algorithm says that the row swap leaves all entries of L alone. Let Q be the permutation matrix of the required row swap, and q the permutation. Our algorithm will pass from p = 1, L = I, U = A to p = q, L = I, U = QA, and so PA = LU.

Next, we might have to add some multiples of the first row of U to lower rows. We carry this out by a strictly lower triangular matrix, say

S = ( 1  0 ; s  I ),

with s a vector. Notice that

S−1 = ( 1  0 ; −s  I )

subtracts the corresponding multiples of row 1 from lower rows. So U becomes Unew = SU, while the permutation p (and hence the matrix P) stays the same. The matrix L becomes Lnew = S−1L = S−1, strictly lower triangular, and PnewA = LnewUnew.


Suppose that after some number of steps, we have reduced the upper left corner of U to echelon form, say

U = ( U0  U1 ; 0  U2 ),

with U0 in echelon form. Suppose that

L = ( L0  0 ; L1  I )

is strictly lower triangular, and that P is some permutation matrix. Finally, suppose that PA = LU.

Our next step in forward elimination could be to swap rows k and ` in U, and these we can assume are rows in the bottom of U, i.e. rows of U2. Suppose that Q is the permutation matrix of a transposition so that QU2 is U2 with the appropriate rows swapped. In particular, Q^2 = I since Q is the permutation matrix of a transposition. Let

Pnew = ( I  0 ; 0  Q ) P,
Lnew = ( I  0 ; 0  Q ) L ( I  0 ; 0  Q ),
Unew = ( I  0 ; 0  Q ) U.

Check that PnewA = LnewUnew. Multiplying out:

Lnew = ( L0  0 ; QL1  I ),

strictly lower triangular. The upper left corner L0 is the painted box. So Lnew is just L with rows k and ` swapped under the painted box.

If we add s (row k) of U to row `, this means multiplying by a strictly lower triangular matrix, say S. Then PA = LU implies that PA = LS−1SU. But LS−1 is just L with s (column `) subtracted from column k.

11.1 Find the LU-factorization of each of

A = ( 0  1 ; 1  0 ),   B = (1),   C = (1).

11.2∗ Suppose that A is an invertible matrix. Prove that any two LU-factorizations of A which have the same permutation matrix P must be the same.


Tensors


Chapter 12

Quadratic Forms

Quadratic forms generalize the concept of inner product, and play a crucial role in modern physics.

Bilinear Forms

Definition 12.1. A bilinear form on a vector space V is a rule associating to each pair of vectors x and y from V a number b(x, y) which is linear as a function of x for each fixed y and also linear as a function of y for each fixed x.

If V = R, then every bilinear form on V is b(x, y) = cxy where c could be any constant.

Every inner product on a vector space is a bilinear form. Moreover, a bilinear form b on V is an inner product just when it is symmetric (b(v, w) = b(w, v)) and positive definite (b(v, v) > 0 unless v = 0).

If V = Rp, each p × p matrix A determines a bilinear map b by the rule b(v, w) = 〈v, Aw〉. Conversely, given a bilinear form b on Rp, we can define a matrix A by setting Aij = b(ei, ej), and then clearly if we expand out,

b(x, y) = Σij xi yj b(ei, ej) = Σij xi yj Aij = 〈x, Ay〉.

So every bilinear form on Rp has the form b(x, y) = 〈x, Ay〉 for a uniquely determined matrix A.
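A tiny numerical illustration of this correspondence (a sketch with numpy; the matrix is made up):

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, -1.0],
                  [4.0, 0.0, 1.0]])
    b = lambda x, y: x @ (A @ y)              # b(x, y) = <x, A y>

    e = np.eye(3)
    recovered = np.array([[b(e[i], e[j]) for j in range(3)] for i in range(3)])
    print(np.allclose(recovered, A))          # True: A_ij = b(e_i, e_j)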


Of course, we can add and scale bilinear forms in the obvious way to make more bilinear forms, and the bilinear forms on a fixed vector space V form a vector space.

Lemma 12.2. Fix a basis v1, v2, . . . , vp of V. Given any collection of numbers bij (with i, j = 1, 2, . . . , p), there is precisely one bilinear form b with bij = b(vi, vj). Thus the vector space of bilinear forms on V is isomorphic to the vector space of p × p matrices (bij).

Proof. Given any bilinear form b we can calculate out the numbers bij = b(vi, vj). Conversely given any numbers bij, and vectors x = Σi xi vi and y = Σj yj vj, we can let

b(x, y) = Σi,j bij xi yj.

Clearly adding bilinear forms adds the associated numbers bij, and scaling bilinear forms scales those numbers.

Lemma 12.3. Let B be the set of all bilinear forms on V. If V has finite dimension, then dim B = (dim V)^2.

Proof. There are (dim V)^2 numbers bij.

Review problems

12.1 Let V be any vector space. Prove that b(x0 + y0, x1 + y1) = y0(x1) is a bilinear form on V ⊕ V ∗.

12.2∗ What are the numbers bij if b(x, y) = 〈x, y〉 (the usual inner product on Rn)?

12.3∗ Suppose that V is a finite dimensional vector space, and let B be the vector space of all bilinear forms on V. Prove that B is isomorphic to the vector space Hom (V, V ∗), by the isomorphism F : Hom (V, V ∗) → B given by taking each linear map T : V → V ∗ to the bilinear form b given by b(v, w) = (Tw)(v). (This gives another proof that dim B = (dim V)^2.)

12.4∗ A bilinear form b on V is degenerate if b(x, y) = 0 for all x, for some fixed y ≠ 0, or for all y, for some fixed x ≠ 0.

a. Give the simplest example you can of a nonzero bilinear form which is degenerate.

b. Give the simplest example you can of a bilinear form which is nondegenerate.

12.5∗ Let V be the vector space of all 2 × 2 matrices. Let b(A,B) = tr AB. Prove that b is a nondegenerate bilinear form.


12.6∗ Let V be the vector space of polynomial functions of degree at most 2. For each of the expressions

(a) b(p(x), q(x)) = ∫ from −1 to 1 of p(x)q(x) dx,

(b) b(p(x), q(x)) = ∫ from −∞ to ∞ of p(x)q(x)e^(−x^2) dx,

(c) b(p(x), q(x)) = p(0)q′(0),

(d) b(p(x), q(x)) = p(1) + q(1),

(e) b(p(x), q(x)) = p(0)q(0) + p(1)q(1) + p(2)q(2),

is b bilinear? Is b degenerate?

Quadratic Forms

Definition 12.4. A bilinear form b on a vector space V is symmetric if b(x, y) = b(y, x) for all x and y in V. A bilinear form b on a vector space V is positive definite if b(x, x) > 0 for all x ≠ 0.

12.7∗ Which of the bilinear forms in problem 12.6 are symmetric?

Definition 12.5. The quadratic form Q of a bilinear form b on a vector space V is the real-valued function Q(x) = b(x, x).

The squared length of a vector in Rn,

Q(x) = ‖x‖^2 = 〈x, x〉 = Σi xi^2,

is the quadratic form of the inner product on Rn.

Every quadratic form on Rn has the form

Q(x) = Σij Aij xi xj,

for some numbers Aij = Aji. We could make a symmetric matrix A


with those numbers as entries, so that Q(x) = 〈x, Ax〉. The symmetric matrix A is uniquely determined by the quadratic form Q and uniquely determines Q.

12.8∗ What are the quadratic forms of the bilinear forms in problem 12.6 on the preceding page?

Lemma 12.6. Every quadratic form Q determines a symmetric bilinear form b by

b(x, y) = (1/2)(Q(x + y) − Q(x) − Q(y)).

Moreover, Q is the quadratic form of b.

Proof. There are various identities we have to check on b to ensure that b is a bilinear form. Each identity involves a finite number of vectors. Therefore it suffices to prove the result over a finite dimensional vector space V (replacing V by the span of the vectors involved in each identity). Be careful: the identities have to hold for all vectors from V, but we can first pick vectors from V, and then replace V by their span and then check the identity.

Since we can assume that V is finite dimensional, we can take a basis for V and therefore assume that V = Rn. Therefore we can write

Q(x) = 〈x,Ax〉 ,

for a symmetric matrix A. Expanding out

b(x, y) = 12 (〈x+ y,A(x+ y)〉 − 〈x,Ax〉 − 〈y,Ay〉)

= 〈x,Ay〉 ,

which is clearly bilinear.
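As a quick numerical sanity check of lemma 12.6 (a sketch only, using a randomly chosen symmetric matrix, not data from the text), one can verify that (1/2)(Q(x + y) − Q(x) − Q(y)) recovers 〈x, Ay〉:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    A = (M + M.T) / 2            # a symmetric matrix
    Q = lambda x: x @ A @ x      # its quadratic form

    x = rng.standard_normal(4)
    y = rng.standard_normal(4)

    b = 0.5 * (Q(x + y) - Q(x) - Q(y))
    assert np.isclose(b, x @ A @ y)   # recovers the symmetric bilinear form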

12.9∗ The results of this chapter are still true over any field (although we won't try to make sense of being positive definite if our field is not R), except for lemma 12.6. Find a counterexample to lemma 12.6 over the field of Boolean numbers.

Theorem 12.7. The equation

b(x, y) = (1/2)(Q(x + y) − Q(x) − Q(y))

gives an isomorphism between the vector space of symmetric bilinear forms b and the vector space of quadratic forms Q.

The proof is obvious just looking at the equation: if you scale the left side, then you scale the right side, and vice versa, and similarly if you add bilinear forms on the left side, you add quadratic forms on the right side.


Sylvester’s Law of Inertia

Theorem 12.8 (Sylvester's Law of Inertia). Given a quadratic form Q on a finite dimensional vector space V, there is an isomorphism F : V → Rn for which

Q(x) = x_1² + x_2² + ⋯ + x_p² − x_{p+1}² − x_{p+2}² − ⋯ − x_{p+q}²

(p positive terms and q negative terms), where

Fx = (x_1, x_2, . . . , x_n).

We cannot by any linear change of variables alter the value of p (the number of positive terms) or the value of q (the number of negative terms).

Remark 12.9. Sylvester's Law of Inertia tells us what all quadratic forms look like, if we allow ourselves to change variables. The numbers p and q are the only invariants. The reader should keep in mind that in our study of the spectral theorem, we only allowed orthogonal changes of variable, so we got eigenvalues as invariants. But here we allow any linear change of variable; in particular we can rescale, so only the signs of the eigenvalues are invariant.
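Since only the signs of the eigenvalues survive rescaling, a short NumPy sketch (with a made-up symmetric matrix, purely for illustration) can read p and q off the eigenvalues of the matrix A of the quadratic form:

    import numpy as np

    A = np.array([[0.0, 1.0,  0.0],   # Q(x) = 2 x1 x2 - x3^2
                  [1.0, 0.0,  0.0],
                  [0.0, 0.0, -1.0]])

    eigenvalues = np.linalg.eigvalsh(A)
    p = int(np.sum(eigenvalues > 0))   # number of positive squares
    q = int(np.sum(eigenvalues < 0))   # number of negative squares
    print(p, q)                        # here: p = 1, q = 2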

Proof. We could apply the spectral theorem, but we will instead use elementary algebra. Take any basis for V, so that we can assume that V = Rn, and that Q(x) = 〈x, Ax〉 for some symmetric matrix A. In other words,

Q(x) = A_{11} x_1² + A_{12} x_1 x_2 + ⋯ .

Suppose that A_{11} ≠ 0. Let's collect together all terms containing x_1 and complete the square:

A_{11} x_1² + ∑_{j>1} A_{1j} x_1 x_j + ∑_{i>1} A_{i1} x_i x_1
  = A_{11} ( x_1 + (1/A_{11}) ∑_{j>1} A_{1j} x_j )² − (1/A_{11}) ( ∑_{j>1} A_{1j} x_j )².

Let

y_1 = x_1 + (1/A_{11}) ∑_{j>1} A_{1j} x_j.

Then

Q(x) = A_{11} y_1² + ⋯

where the ⋯ involve only x_2, x_3, . . . , x_n. Changing variables to use y_1 in place of x_1 is an invertible linear change of variables, and gets rid of nondiagonal terms involving x_1.


We can continue this process using x_2 instead of x_1, until we have used up all variables x_i with A_{ii} ≠ 0. So let's suppose that all diagonal terms of A vanish. If there is some nondiagonal term of A which doesn't vanish, say A_{12}, then make new variables y_1 = x_1 + x_2 and y_2 = x_1 − x_2, so x_1 = (1/2)(y_1 + y_2) and x_2 = (1/2)(y_1 − y_2). Then x_1 x_2 = (1/4)(y_1² − y_2²), turning the A_{12} x_1 x_2 term into two diagonal terms.

So now we have killed off all nondiagonal terms, so we can assume that

Q(x) = A_{11} x_1² + A_{22} x_2² + ⋯ + A_{nn} x_n².

We can rescale x_1 by any nonzero constant c, which scales A_{11} by 1/c². Let's choose c so that c² = ±A_{11}.

12.10∗ Apply this method to the quadratic form

Q(x) = x_1² + x_1 x_2 + x_2 x_1 + x_2 x_3 + x_3 x_2.

Next we have to show that the numbers p of positive terms and q of negative terms cannot be altered by any linear change of variables. We can assume (by using the linear change of variables we have just constructed) that V = Rn and that

Q(x) = x_1² + x_2² + ⋯ + x_p² − (x_{p+1}² + x_{p+2}² + ⋯ + x_{p+q}²).

We want to show that p is the largest dimension of any subspace on which Q is positive definite, and similarly that q is the largest dimension of any subspace on which Q is negative definite. Consider the subspace V_+ of vectors of the form

x = (x_1, x_2, . . . , x_p, 0, 0, . . . , 0).

Clearly Q is positive definite on V_+. Similarly, Q is negative definite on the subspace V_− of vectors of the form

x = (0, . . . , 0, x_{p+1}, x_{p+2}, . . . , x_{p+q}, 0, . . . , 0).

Suppose that we can find some subspace W of V of greater dimension than p, so that Q is positive definite on W. Let P_+ be the orthogonal projection to V_+. In other words, for any vector x in Rn, let

P_+ x = (x_1, x_2, . . . , x_p, 0, 0, . . . , 0).

Then P_+|_W : W → V_+ is a linear map, and dim W > dim V_+, so

dim W = dim ker P_+|_W + dim im P_+|_W
      ≤ dim ker P_+|_W + dim V_+
      < dim ker P_+|_W + dim W,

so, subtracting dim W from both sides,

0 < dim ker P_+|_W.

Therefore there is a nonzero vector x in W for which P_+ x = 0, i.e.

x = (0, . . . , 0, x_{p+1}, x_{p+2}, . . . , x_{p+q}, x_{p+q+1}, x_{p+q+2}, . . . , x_n).

Clearly Q(x) > 0 since x lies in W. But clearly

Q(x) = −x_{p+1}² − x_{p+2}² − ⋯ − x_{p+q}² ≤ 0,

a contradiction.

Remark 12.10. Much of the proof works over any field, as long as we can divide by 2, i.e. as long as 2 ≠ 0. However, there could be a problem when we try to rescale: even if 2 ≠ 0, we can only arrange

Q(x) = ε_1 x_1² + ε_2 x_2² + ⋯ + ε_n x_n²,

where each ε_i can be rescaled by any nonzero number of the form c². (There is no reasonable analogue of the numbers p and q in a general field.) In particular, since every complex number has a square root, the same theorem is true for complex quadratic forms, but in the stronger form that we can arrange q = 0, i.e. we can arrange

Q(x) = x_1² + x_2² + ⋯ + x_p².

12.11∗ Prove that a quadratic form on any real n-dimensional vector space is nondegenerate just when p + q = n, with p and q as in Sylvester's law of inertia.

12.12∗ For a complex quadratic form, prove that if we arrange our quadratic form to be

Q(x) = x_1² + x_2² + ⋯ + x_p²,

then we are stuck with the resulting value of p, no matter what linear change of variables we employ.

Kernel and Null Cone

Definition 12.11. Take a vector space V. The kernel of a symmetric bilinear form b on V is the set of all vectors x in V for which b(x, y) = 0 for any vector y in V.

12.13∗ Find the kernel of the symmetric bilinear form b(x, y) = x_1 y_1 − x_2 y_2 for x and y in R3.

Definition 12.12. The null cone of a symmetric bilinear form b(x, y) is the set of all vectors x in V for which b(x, x) = 0.

The vector x = (1, 1) lies in the null cone of the symmetric bilinear form b(x, y) = x_1 y_1 − x_2 y_2 for x and y in R2. Indeed the null cone of that symmetric bilinear form is 0 = b(x, x) = x_1² − x_2², so it's the pair of lines x_1 = x_2 and x_1 = −x_2 in R2.

12.14 Prove that the kernel of a symmetric bilinear form lies in its null cone.

12.15∗ Find the null cone of the symmetric bilinear form b(x, y) = x_1 y_1 − x_2 y_2 for x and y in R3. What part of the null cone is the kernel?

12.16∗ Prove that the kernel is a subspace. For which symmetric bilinear forms is the null cone a subspace? For which symmetric bilinear forms is the kernel equal to the null cone?

Review problems

12.17∗ Prove that the kernel of a symmetric bilinear form consists precisely of the vectors x for which Q(x + y) = Q(y) for all vectors y, with Q the quadratic form of that bilinear form. (In other words, we can translate in the x direction without altering the value of the function Q(y).)

Orthonormal Bases

Definition 12.13. If b is a nondegenerate symmetric bilinear form on a vector space V, a basis v_1, v_2, . . . , v_n for V is called orthonormal for b if

b(v_i, v_j) = ±1 if i = j, and 0 if i ≠ j.

Corollary 12.14. Any nondegenerate symmetric bilinear form on any finite dimensional vector space has an orthonormal basis.

Proof. Take the linear change of variables guaranteed by Sylvester's law of inertia, and then the standard basis will be orthonormal.

12.18 Find an orthonormal basis for the symmetric bilinear form b(x, y) = x_1 y_2 + x_2 y_1 on R2.

12.19 Suppose that b is a symmetric bilinear form on a finite dimensional vector space V.

(a) For each vector x in V, define a linear map ξ : V → R by ξ(y) = b(x, y). Write this covector ξ as ξ = Tx. Prove that the map T : V → V∗ given by ξ = Tx is linear.

(b) Prove that the kernel of T is the kernel of b.

(c) Prove that T is an isomorphism just when b is nondegenerate. (The moral of the story is that a nondegenerate symmetric bilinear form b identifies the vector space V with its dual space V∗ via the map T.)

(d) If b is nondegenerate, prove that for each covector ξ, there is a unique vector x in V so that b(x, y) = ξ(y) for every vector y in V.

Review problems

12.20∗ What is the linear map T of problem 12.19 (i.e. write down the associated matrix) for each of the following symmetric bilinear forms?

(a) b(x, y) = x1y2 + x2y1 on R2

(b) b(x, y) = x1y1 + x2y2 on R2

(c) b(x, y) = x1y1 − x2y2 on R2

(d) b(x, y) = x1y1 − x2y2 − x3y3 − x4y4 on R4

(e) b(x, y) = x1 (y1 + y2 + y3) + (x1 + x2 + x3) y1


Chapter 13

Tensors and Indices

In this chapter, we define the concept of a tensor in Rn, following an approach common in physics.

What is a Tensor?

Vectors x have entries x_i:

x = (x_1, x_2, . . . , x_n).

A matrix A has entries A_{ij}. To describe the entries of a vector x, we use a single index, while for a matrix we use two indices. A tensor is just an object whose entries have any number of indices. (Entries will also be called components.)

In a physics course, one learns that stress applied to a crystal causes an electric field (see Feynman et al. [2] II-31-12). The electric field is a vector E (at each point in space) with components E_i, while the stress is a symmetric matrix S with components S_{ij} = S_{ji}. These are related by the piezoelectric tensor P, which has components P_{ijk}:

E_i = ∑_{jk} P_{ijk} S_{jk}.

Just as a matrix is a rectangle of numbers, a tensor with three indices is a box of numbers.
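A contraction like E_i = ∑_{jk} P_{ijk} S_{jk} is exactly what numpy.einsum computes; this sketch uses made-up random numbers for P and S, purely to illustrate the index bookkeeping:

    import numpy as np

    rng = np.random.default_rng(1)
    P = rng.standard_normal((3, 3, 3))   # a "box of numbers": three indices
    S = rng.standard_normal((3, 3))
    S = (S + S.T) / 2                    # stress is symmetric

    E = np.einsum('ijk,jk->i', P, S)     # E_i = sum_{j,k} P_ijk S_jk
    assert E.shape == (3,)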

For this chapter, all of our tensors will be “tensors in Rn,” which means that the indices all run from 1 to n. For example, our vectors are literally in Rn, while our matrices are n × n, etc.

The subject of tensors is almost trivial, since there is really nothing much we can say in any generality about them. There are two subtle points: upper versus lower indices and summation notation.

Upper versus lower indices

It is traditional (following Einstein) to write the components of a vector x not as x_i but as x^i, so not as

x = (x_1, x_2, . . . , x_n),

but instead as

x = (x^1, x^2, . . . , x^n).

In particular, x^2 doesn't mean x · x, but means the second entry of x. Next we write elements of Rn∗ as

y = (y_1  y_2  . . .  y_n),

with indices down. We will call the elements of Rn∗ covectors. Finally, we write matrices as

A = (A^i_j), with the upper index labelling rows and the lower index labelling columns (so the first row is A^1_1, A^1_2, . . . , A^1_q, and so on),

so A^row_column. In general, a tensor can have as many upper and lower indices as we need, and we will treat upper and lower indices as being different. For example, the components of a matrix look like A^i_j, never like A_{ij} or A^{ij}, which would represent tensors of a different type.

Summation Notation

Following Einstein further, whenever we write an expression with some letter appearing once as an upper index and once as a lower index, like A^i_j x^j, this means ∑_j A^i_j x^j, i.e. a sum is implicitly understood over the repeated j index.

We will often refer to a vector x as x^i. This isn't really fair, since it confuses a single component x^i of a vector with the entire vector, but it is standard. Similarly, we write a matrix A as A^i_j and a tensor with 2 upper and 3 lower indices as t^{ij}_{klm}. The names of the indices have no significance and will usually change during the course of calculations.

Operations

What can we do with tensors? Very little. At first sight, they look complicated. But there are very few operations on tensors. We can

a. Add tensors that have the same numbers of upper and of lower indices; for example add s^{ij}_{klm} to t^{ij}_{klm} to get s^{ij}_{klm} + t^{ij}_{klm}. If the tensors are vectors or matrices, this is just adding in the usual way.

b. Scale; for example 3 t^{ij}_{klm} means the obvious thing: triple each component of t^{ij}_{klm}.

c. Swap indices of the same type; for example, take a tensor t^i_{jk} and make the tensor t^i_{kj}. There is no nice notation for doing this.

d. Take tensor products: just write down two tensors beside one another, with distinct indices; for example, the tensor product of s^i_j and t^i_j is s^i_j t^k_l. Note that we have to change the names on the indices of t before we write it down, so that we don't use the same index names twice.

e. Finally, contract: take any one upper index and any one lower index, and set them equal and sum. For example, we can contract the i and k indices of a tensor t^i_{jk} to produce t^i_{ji}. Note that t^i_{ji} has only one index j, since the summation convention tells us to sum over all possibilities for the i index. So t^i_{ji} is a covector.

In tensor calculus there are some additional operations on tensor quantities (various methods for differentiating and integrating), and these additional operations are essential to physical applications, but tensor calculus is not in the algebraic spirit of this book, so we will never consider any other operations than those listed above.

If x^i is a vector and y_i a covector, then we can't add them, because the indices don't match. But we can take their tensor product x^i y_j, and then we can contract to get x^i y_i. This is of course just y(x), thinking of every covector y as a linear function on Rn.


If x^i is a vector and A^i_j is a matrix, then their tensor product is A^i_j x^k, and contracting gives A^i_j x^j, which is the vector Ax. So matrix multiplication is tensor product followed by contraction.

If A^i_j and B^i_j are two matrices, then A^i_k B^k_j is the matrix AB. Similarly, A^k_j B^i_k is the matrix BA. These are the two possible contractions of the tensor product A^i_j B^k_l.

A matrix A^i_j has only one contraction: A^i_i, the trace, which is a number (because it has no free indices).
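These contractions are easy to spell out with numpy.einsum; a small sketch with arbitrary sample matrices (not taken from the text):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)

    assert np.isclose(np.einsum('i,i->', x, y), y @ x)        # x^i y_i = y(x)
    assert np.allclose(np.einsum('ij,j->i', A, x), A @ x)     # A^i_j x^j = Ax
    assert np.allclose(np.einsum('ik,kj->ij', A, B), A @ B)   # A^i_k B^k_j = AB
    assert np.allclose(np.einsum('kj,ik->ij', A, B), B @ A)   # A^k_j B^i_k = BA
    assert np.isclose(np.einsum('ii->', A), np.trace(A))      # A^i_i = tr A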

It is standard in working with tensors to write the entries of the identity matrix not as I^i_j but as δ^i_j:

δ^i_j = 1 if i = j, and 0 otherwise.

The trace of the identity matrix is δ^i_i = n, since for each value of i we add one. Tensor products of the identity matrix give many other tensors, like δ^i_j δ^k_l.

13.1 Take the tensor product of the identity matrix and a covector ξ, and simplify all possible contractions.

If a tensor has various lower indices, we can average over all permutations of them. This process is called symmetrizing over indices. For example, a tensor t_{ijk} can be symmetrized (over its last two indices) to a tensor (1/2) t_{ijk} + (1/2) t_{ikj}. Obviously we can also symmetrize over any two upper indices. But we can't symmetrize over an upper and a lower index.

If we fix our attention on a pair of indices, we can also antisymmetrize over them, say taking a tensor t_{jk} and producing

(1/2) t_{jk} − (1/2) t_{kj}.

A tensor is symmetric in some indices if it doesn't change when they are permuted, and is antisymmetric in those indices if it changes by the sign of the permutation when the indices are permuted. Again focusing on just two lower indices, we can split any tensor into a sum

t_{jk} = [(1/2) t_{jk} + (1/2) t_{kj}] + [(1/2) t_{jk} − (1/2) t_{kj}]

of a symmetric part (the first bracket) and an antisymmetric part (the second bracket).
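Here is a quick NumPy check of this splitting for a tensor with two lower indices (a matrix of made-up numbers):

    import numpy as np

    rng = np.random.default_rng(3)
    t = rng.standard_normal((3, 3))

    sym = 0.5 * (t + t.T)     # (1/2) t_jk + (1/2) t_kj
    anti = 0.5 * (t - t.T)    # (1/2) t_jk - (1/2) t_kj

    assert np.allclose(sym, sym.T)
    assert np.allclose(anti, -anti.T)
    assert np.allclose(sym + anti, t)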

13.2∗ Suppose that a tensor t_{ijk} is symmetric in i and j, and antisymmetric in j and k. Prove that t_{ijk} = 0.

Of course, we write a tensor as 0 to mean that all of its components are 0.

Let's look at some tensors with lots of indices. Working in R3, define a tensor ε by setting

ε_{ijk} = 1 if i, j, k is an even permutation of 1, 2, 3; −1 if i, j, k is an odd permutation of 1, 2, 3; 0 if i, j, k is not a permutation of 1, 2, 3.

Of course, i, j, k fails to be a permutation just when two or three of i, j or k are equal. For example,

ε_{123} = 1, ε_{221} = 0, ε_{321} = −1, ε_{222} = 0.

13.3 Take three vectors x, y and z in R3, and calculate the contraction ε_{ijk} x^i y^j z^k.

13.4∗ Prove that every tensor t_{ijk} which is antisymmetric in all lower indices is a constant multiple t_{ijk} = c ε_{ijk}.

More generally, working in Rn, we can define a tensor ε by

ε_{i_1 i_2 ... i_n} = 1 if i_1, i_2, . . . , i_n is an even permutation of 1, 2, . . . , n; −1 if it is an odd permutation; 0 if it is not a permutation of 1, 2, . . . , n.

Note that

ε_{i_1 i_2 ... i_n} A^{i_1}_1 A^{i_2}_2 ⋯ A^{i_n}_n = det A,

for a matrix A.
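One can build ε in R3 by hand and check the determinant identity numerically; a sketch with an arbitrary 3 × 3 matrix (not an example from the text):

    import numpy as np
    from itertools import permutations

    eps = np.zeros((3, 3, 3))
    for perm in permutations(range(3)):
        i, j, k = perm
        # sign of the permutation: determinant of the permuted identity matrix
        eps[i, j, k] = np.linalg.det(np.eye(3)[list(perm)])

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0],
                  [4.0, 0.0, 1.0]])

    # eps_{ijk} A^i_1 A^j_2 A^k_3, contracting against the columns of A
    lhs = np.einsum('ijk,i,j,k->', eps, A[:, 0], A[:, 1], A[:, 2])
    assert np.isclose(lhs, np.linalg.det(A))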


Changing Variables

Let's change variables from x to y = Fx, with F an invertible matrix. We want to make a corresponding tensor F_* t out of any tensor t, so that sums, scalings, tensor products, and contractions will correspond, and so that on vectors F_* x will be Fx. These requirements will determine what F_* t has to be.

Let's start with a covector ξ_i. We need to preserve the contraction ξ(x) on any vector x, so we need to have F_* ξ(F_* x) = ξ(x). Replacing the vector x by some vector y = Fx, we get F_* ξ(y) = ξ(F^{-1} y), for any vector y, which identifies F_* ξ as

F_* ξ = ξ F^{-1}.

In indices:

(F_* ξ)_i = ξ_j (F^{-1})^j_i.

In other words, F_* ξ is ξ contracted against F^{-1}. So vectors transform as F_* x = Fx (contract with F), and covectors as F_* ξ = ξ F^{-1} (contract with F^{-1}). We can contract any tensor with as many vectors and covectors as needed to form a number; in order to preserve these contractions, the tensor's upper indices must transform like vectors, and its lower indices like covectors, when we carry out F_*. For example,

(F_* t)^i_{jk} = F^i_p t^p_{qr} (F^{-1})^q_j (F^{-1})^r_k.

In other words, we contract one copy of F with each upper index and contract one copy of F^{-1} with each lower index.

For example, let's see the invariance of contraction under F_* in the simplest case of a matrix:

(F_* A)^i_i = F^i_j A^j_k (F^{-1})^k_i
            = A^j_k (F^{-1})^k_i F^i_j
            = A^j_k (F^{-1} F)^k_j
            = A^j_k δ^k_j
            = A^j_j,

since A^j_k δ^k_j has a sum over j and k, but each term vanishes unless j = k, in which case we find A^j_j being added.
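The same computation can be checked numerically: under F_*, a matrix A^i_j becomes F A F^{-1}, and its contraction (the trace) is unchanged. A sketch with random data:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))
    F = rng.standard_normal((4, 4))          # almost surely invertible
    Finv = np.linalg.inv(F)

    # (F_* A)^i_j = F^i_p A^p_q (F^-1)^q_j
    FA = np.einsum('ip,pq,qj->ij', F, A, Finv)
    assert np.isclose(np.trace(FA), np.trace(A))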

13.5 Prove that both of the contractions of a tensor t^i_{jk} are preserved by linear change of variables.

13.6∗ Find F∗ε. (Recall the tensor ε from example 13 on the previous page.)

Remark 13.1. Note how abstract the subject is: it is rare that we would write down examples of tensors, with actual numbers in their entries. It is more common that we think about tensors as abstract algebraic gadgets for storing multivariate data from physics. Writing down examples, in this rarefied air, would only make the subject more confusing.


Two Problems

Two problems:

a. How can you tell if two tensors can be made equal by a linear change of variable?

b. How can a tensor be split into a sum of tensors, in a manner that is unaltered by linear change of variables?

The first problem has no general solution. For a matrix, the answer is Jordan normal form (theorem 4.1 on page 44). For a quadratic form, the answer is Sylvester's law of inertia (theorem 12.8 on page 111). But a general tensor can store an enormous amount of information, and no one knows how to describe it. We can give some invariants of a tensor, to help to distinguish tensors. These invariants are similar to the symmetric functions of the eigenvalues of a matrix.

The second problem is much easier, and we will provide a complete solution.

Tensor Invariants

The contraction of indices is invariant under F_*, as is tensor product. So given a tensor t, we can try to write down some invariant numbers out of it by taking any number of tensor products of that tensor with itself, and any number of contractions until there are no indices left.

For example, a matrix A^i_j has among its invariants the numbers

A^i_i,  A^i_j A^j_i,  A^i_j A^j_k A^k_i,  . . .

In the notation of earlier chapters, these numbers are

tr A,  tr(A²),  tr(A³),  . . .

i.e. the functions we called p_k(A) in example 7 on page 74. We already know from that chapter that every real-valued polynomial invariant of a matrix is a function of p_1(A), p_2(A), . . . , p_n(A). More generally, all real-valued polynomial invariants of a tensor are polynomial functions of those obtained by taking some number of tensor products followed by some number of contractions. (The proof is very difficult; see Olver [6] and Procesi [7].)
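For instance, the first few of these invariants of a matrix are easy to tabulate numerically (an illustrative matrix only):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])

    p1 = np.trace(A)             # A^i_i
    p2 = np.trace(A @ A)         # A^i_j A^j_i
    p3 = np.trace(A @ A @ A)     # A^i_j A^j_k A^k_i
    print(p1, p2, p3)            # 5.0 13.0 35.0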

13.7∗ Describe as many invariants of a tensor t^{ij}_{kl} as you can.

It is difficult to decide how many of these invariants you need to write down before you can be sure that you have a complete set, in the sense that every invariant is a polynomial function of the ones you have written down. General theorems (again see Olver [6] and Procesi [7]) ensure that eventually you will produce a finite complete set of invariants.

If a tensor has more lower indices than upper indices, then so does every tensor product of it with itself any number of times, and so does every contraction. Therefore there are no polynomial invariants of such tensors. Similarly, if there are more upper indices than lower indices then there are no polynomial invariants. For example, a vector has no polynomial invariants. For example, a quadratic form Q_{ij} = Q_{ji} has no upper indices, so has no polynomial invariants. (This agrees with Sylvester's law of inertia (theorem 12.8 on page 111), which tells us that the only invariants of a quadratic form (over the real numbers) are the integers p and q, which are not polynomials in the Q_{ij} entries.)

Breaking into Irreducible Components

Cartesian Tensors

Tensors in engineering and applied physics never have any upper indices. However, the engineers and applied physicists never carry out any linear changes of variable except for those described by orthogonal matrices. This practice is (quite misleadingly) referred to as working with “Cartesian tensors.” The point to keep in mind is that they are working in Rn in a physical context in which lengths (and distances) are physically measurable. In order to preserve lengths, the only linear changes of variable we can employ are those given by orthogonal matrices.

If we had a tensor with both upper and lower indices, like t^i_{jk}, we can see that it transforms under a linear change of variables y = Fx as

(F_* t)^i_{jk} = F^i_p t^p_{qr} (F^{-1})^q_j (F^{-1})^r_k.

Let's define a new tensor by letting s_{ijk} = t^i_{jk}, just dropping the upper index. If F is orthogonal, i.e. F^{-1} = F^t, then F = (F^{-1})^t, so F^i_p = (F^{-1})^p_i. Therefore

(F_* s)_{ijk} = (F^{-1})^p_i s_{pqr} (F^{-1})^q_j (F^{-1})^r_k
             = F^i_p t^p_{qr} (F^{-1})^q_j (F^{-1})^r_k
             = (F_* t)^i_{jk}.

We summarize this calculation:

Theorem 13.2. Dropping upper indices to become lower indices is an operation on tensors which is invariant under any orthogonal linear change of variable.

Note that this trick only works for orthogonal matrices F, i.e. orthogonal changes of variable.

13.8∗ Prove that doubling (i.e. the linear map y = Fx = 2x) acts on a vector x by doubling it, on a covector by scaling by 1/2. (In general, this linear map Fx = 2x acts on any tensor t with p upper and q lower indices by scaling by 2^{p−q}, i.e. F_* t = 2^{p−q} t.) What happens if you first lower the index of a vector and then apply F_*? What happens if you apply F_* and then lower the index?

13.9∗ Prove that contracting two lower indices with one another is an operation on tensors which is invariant under orthogonal linear change of variable, but not under rescaling of variables.

Engineers often prefer their approach to tensors: only lower indices, and all of the usual operations. However, their approach makes rescalings, and other nonorthogonal transformations (like shears, for example) more difficult. There is a similar approach in relativistic physics to lower indices: by contracting with a quadratic form.


Chapter 14

Tensors

We give a mathematical definition of tensor, and show that it agrees with the more concrete definition of chapter 13. We will continue to use Einstein's summation convention throughout this chapter.

Multilinear Functions and Multilinear Maps

Definition 14.1. If V_1, V_2, . . . , V_p are some vector spaces, then a function t(x_1, x_2, . . . , x_p) is multilinear if t(x_1, x_2, . . . , x_p) is a linear function of each vector when all of the other vectors are held constant. (The vector x_1 comes from the vector space V_1, etc.)

Similarly a map t taking vectors x_1 in V_1, x_2 in V_2, . . . , x_p in V_p, to a vector w = t(x_1, x_2, . . . , x_p) in a vector space W is called a multilinear map if t(x_1, x_2, . . . , x_p) depends linearly on each vector x_i, when all of the other vectors are held constant.

The function t(x, y) = xy is multilinear for x and y real numbers, being linear in x when y is held fixed and linear in y when x is held fixed. Note that t is not linear as a function of the vector (x, y).

Any two linear functions ξ : V → R and η : W → R on two vector spaces determine a multilinear map t(x, y) = ξ(x) η(y) for x from V and y from W.

What is a Tensor?

We already have a definition of tensor, but only for tensors “in Rn”. We want to define some kind of object which we will call a tensor in an abstract vector space, so that when we pick a basis (identifying the vector space with Rn), the object becomes a tensor in Rn. The clue is that a tensor in Rn has lower and upper indices, which can be contracted against vectors and covectors. For now, let's only think about tensors with just upper indices. Upper indices contract against covectors, so we should be able to “plug in covectors,” motivating the definition:

Definition 14.2. For any finite dimensional vector spaces V_1, V_2, . . . , V_p, let V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p (called the tensor product of the vector spaces V_1, V_2, . . . , V_p) be the set of all multilinear maps

t(ξ_1, ξ_2, . . . , ξ_p),

where ξ_1 is a covector from V_1∗, ξ_2 is a covector from V_2∗, etc. Each such multilinear map t is called a tensor.

A tensor t^{ij} in Rn following our old definition (from chapter 13) yields a tensor t(ξ, η) = t^{ij} ξ_i η_j following this new definition. On the other hand, if t(ξ, η) is a tensor in Rn ⊗ Rn, then we can define a tensor following our old definition by letting t^{ij} = t(e^i, e^j), where e^1, e^2, . . . , e^n is the usual dual basis to the standard basis of Rn.

Let V be a finite dimensional vector space. Recall that there is a natural isomorphism V → V∗∗, given by sending any vector v to the linear function f_v on V∗ given by f_v(ξ) = ξ(v). We will henceforth identify any vector v with the function f_v; in other words we will from now on use the symbol v itself instead of writing f_v, so that we think of a covector ξ as a linear function ξ(v) on vectors, and also think of a vector v as a linear function on covectors ξ, by the bizarre definition v(ξ) = ξ(v). In this way, a vector is the simplest type of tensor.

Definition 14.3. Let V and W be finite dimensional vector spaces, and take v a vector in V and w a vector in W. Then write v ⊗ w for the multilinear map

v ⊗ w(ξ, η) = ξ(v) η(w).

So v ⊗ w is a tensor in V ⊗ W, called the tensor product of v and w.

Definition 14.4. If s is a tensor in V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p, and t is a tensor in W_1 ⊗ W_2 ⊗ ⋯ ⊗ W_q, then let s ⊗ t, called the tensor product of s and t, be the tensor in V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p ⊗ W_1 ⊗ W_2 ⊗ ⋯ ⊗ W_q given by

s ⊗ t(ξ_1, ξ_2, . . . , ξ_p, η_1, η_2, . . . , η_q) = s(ξ_1, ξ_2, . . . , ξ_p) t(η_1, η_2, . . . , η_q).

Definition 14.5. Similarly, we can define the tensor product of several tensors. For example, given finite dimensional vector spaces U, V and W, and vectors u from U, v from V and w from W, let u ⊗ v ⊗ w mean the multilinear map u ⊗ v ⊗ w(ξ, η, ζ) = ξ(u) η(v) ζ(w), etc.

14.1∗ Prove that

(av) ⊗ w = a(v ⊗ w) = v ⊗ (aw),
(v_1 + v_2) ⊗ w = v_1 ⊗ w + v_2 ⊗ w,
v ⊗ (w_1 + w_2) = v ⊗ w_1 + v ⊗ w_2,

for any vectors v, v_1, v_2 from V and w, w_1, w_2 from W and any number a.

14.2∗ Take U, V and W any finite dimensional vector spaces. Prove that (u ⊗ v) ⊗ w = u ⊗ (v ⊗ w) = u ⊗ v ⊗ w for any three vectors u from U, v from V and w from W.

Theorem 14.6. If V and W are two finite dimensional vector spaces, with bases v_1, v_2, . . . , v_p and w_1, w_2, . . . , w_q, then V ⊗ W has as a basis the vectors v_i ⊗ w_J for i running over 1, 2, . . . , p and J running over 1, 2, . . . , q.

Proof. Take the dual bases v^1, v^2, . . . , v^p and w^1, w^2, . . . , w^q. Every tensor t from V ⊗ W has the form

t(ξ, η) = t(ξ_i v^i, η_J w^J) = ξ_i η_J t(v^i, w^J);

so let t^{iJ} = t(v^i, w^J) to find

t(ξ, η) = t^{iJ} ξ_i η_J = t^{iJ} v_i ⊗ w_J (ξ, η).

So the v_i ⊗ w_J span. Any linear relation between the v_i ⊗ w_J, just reading these lines from bottom to top, would yield a vanishing multilinear map, so would have to satisfy 0 = t(v^k, w^L) = t^{kL}, forcing all coefficients in the linear relation to vanish.

Remark 14.7. A similar theorem, with a similar proof, holds for any tensor products: take any finite dimensional vector spaces V_1, V_2, . . . , V_p, and pick any basis for V_1 and any basis for V_2, etc. Then taking one vector from each basis, and taking the tensor product of these vectors, we obtain a tensor in V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p. These tensors, when we throw in all possible choices of basis vectors for all of those bases, yield a basis for V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p, called the tensor product basis.

14.3 Let V = R3, W = R2 and let

x = (1, 2, 3),  y = (4, 5).

What is x ⊗ y in terms of the standard basis vectors e_i ⊗ e_J?


Definition 14.8. Tensors of the form v ⊗ w are called pure tensors.

14.4 Prove that every tensor in V ⊗W can be written as a sum of pure tensors.

Consider in R3 the tensor

e_1 ⊗ e_1 + e_2 ⊗ e_2 + e_3 ⊗ e_3.

This tensor is not pure (which is certainly not obvious just looking at it). Let's see why. Any pure tensor x ⊗ y must be

x ⊗ y = (x_1 e_1 + x_2 e_2 + x_3 e_3) ⊗ (y_1 e_1 + y_2 e_2 + y_3 e_3)
      = x_1 y_1 e_1 ⊗ e_1 + x_2 y_1 e_2 ⊗ e_1 + x_3 y_1 e_3 ⊗ e_1
      + x_1 y_2 e_1 ⊗ e_2 + x_2 y_2 e_2 ⊗ e_2 + x_3 y_2 e_3 ⊗ e_2
      + x_1 y_3 e_1 ⊗ e_3 + x_2 y_3 e_2 ⊗ e_3 + x_3 y_3 e_3 ⊗ e_3.

If we were going to have x ⊗ y = e_1 ⊗ e_1 + e_2 ⊗ e_2 + e_3 ⊗ e_3, we would need x_1 y_1 = 1, x_2 y_2 = 1, x_3 y_3 = 1, but also x_1 y_2 = 0, so x_1 = 0 or y_2 = 0, contradicting x_1 y_1 = x_2 y_2 = 1.
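Problem 14.9 below suggests a concrete way to see this: writing a tensor in R3 ⊗ R3 as the 3 × 3 matrix of its coefficients, pure tensors x ⊗ y become rank one matrices (outer products), while this tensor becomes the identity matrix, of rank 3. A NumPy sketch with arbitrary vectors:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    pure = np.outer(x, y)                  # coefficients of the pure tensor x ⊗ y
    t = np.eye(3)                          # coefficients of e1⊗e1 + e2⊗e2 + e3⊗e3

    print(np.linalg.matrix_rank(pure))     # 1: every pure tensor gives a rank one matrix
    print(np.linalg.matrix_rank(t))        # 3: so this tensor is not pure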

Definition 14.9. The rank of a tensor is the minimum number of pure tensors that can appear when it is written as a sum of pure tensors.

Definition 14.10. If U, V and W are finite dimensional vector spaces and b : U ⊕ V → W is a map for which b(u, v) is linear in u for any fixed v and linear in v for any fixed u, say that b is a bilinear map.

Theorem 14.11 (Universal Mapping Theorem). Every bilinear map b : U ⊕ V → W induces a unique linear map B : U ⊗ V → W, by the rule B(u ⊗ v) = b(u, v). Sending b to B = Tb gives an isomorphism T between the vector space Z of all bilinear maps f : U ⊕ V → W and the vector space Hom(U ⊗ V, W).

14.5∗ Prove the universal mapping theorem.

Review problems

14.6∗ Write down an isomorphism V∗ ⊗ W∗ = (V ⊗ W)∗, and prove that it is an isomorphism.

14.7∗ If S : V_0 → V_1 and T : W_0 → W_1 are linear maps of finite dimensional vector spaces, prove that there is a unique linear map, which we will write as S ⊗ T : V_0 ⊗ W_0 → V_1 ⊗ W_1, so that

S ⊗ T(v_0 ⊗ w_0) = (S v_0) ⊗ (T w_0).

14.8∗ What is the rank of e1 ⊗ e1 + e2 ⊗ e2 + e3 ⊗ e3 as a tensor in R3 ⊗ R3?

14.9∗ Let any tensor v ⊗ w in V ⊗ W eat a covector ξ from V∗ by the rule (v ⊗ w) ξ = ξ(v) w. Prove that this makes v ⊗ w into a linear map V∗ → W. Prove that this definition extends to a linear map V ⊗ W → Hom(V∗, W). Prove that the rank of a tensor as defined above is the rank of the associated linear map V∗ → W. Use this to find the rank of ∑_i e_i ⊗ e_i in Rn ⊗ Rn.

14.10∗ Take two vector spaces V and W and define a vector space V ∗ W to be the collection of all real-valued functions on V ⊕ W which are zero except at finitely many points. Careful: these functions don't have to be linear. Picking any vectors v from V and w from W, let's write the function

f(x, y) = 1 if x = v and y = w, and 0 otherwise,

as v ∗ w. So clearly V ∗ W is a vector space, whose elements are linear combinations of elements of the form v ∗ w. Let Z be the subspace of V ∗ W spanned by the vectors

(av) ∗ w − a(v ∗ w),
v ∗ (aw) − a(v ∗ w),
(v_1 + v_2) ∗ w − v_1 ∗ w − v_2 ∗ w,
v ∗ (w_1 + w_2) − v ∗ w_1 − v ∗ w_2,

for any vectors v, v_1, v_2 from V and w, w_1, w_2 from W and any number a.

a. Prove that if V and W both have positive and finite dimension, then V ∗ W and Z are infinite dimensional.

b. Write down a linear map V ∗ W → V ⊗ W.

c. Prove that your linear map has kernel containing Z.

It turns out that (V ∗ W)/Z is isomorphic to V ⊗ W. We could have defined V ⊗ W to be (V ∗ W)/Z, and this definition has many advantages for various generalizations of tensor products.

Remark 14.12. In the end, what we really care about is that tensors using our abstract definitions should turn out to have just the properties they had with the more concrete definition in terms of indices. So even if the abstract definition is hard to swallow, we will really only need to know that tensors have tensor products, contractions, sums and scaling, change according to the usual rules when we linearly change variables, and that when we tensor together bases, we obtain a basis for the tensor product. This is the spirit behind problem 14.10.

“Lower Indices”

Fix a finite dimensional vector space V. Consider the tensor product V ⊗ V∗. A basis v_1, v_2, . . . , v_n for V has a dual basis v^1, v^2, . . . , v^n for V∗, and a tensor product basis v_i ⊗ v^j. Every tensor t in V ⊗ V∗ has the form

t = t^i_j v_i ⊗ v^j.

So it is clear where the lower indices come from when we pick a basis: they come from V∗.

14.11 We saw in chapter 13 that a matrix A is written in indices as A^i_j. Describe an isomorphism between Hom(V, V) and V ⊗ V∗.

Lets define contractions.

Theorem 14.13. Let V and W be finite dimensional vector spaces. There is a unique linear map

V ⊗ V∗ ⊗ W → W,

called the contraction map, that on pure tensors takes v ⊗ ξ ⊗ w to ξ(v) w.

Remark 14.14. We can generalize this idea in the obvious way to any tensor product of any finite number of finite dimensional vector spaces: if one of the vector spaces is the dual of another one, then we can contract. For example, we can contract V ⊗ W ⊗ V∗ by a linear map which on pure tensors takes v ⊗ w ⊗ ξ to ξ(v) w.

Proof. Pick a basis v_1, v_2, . . . , v_p of V and a basis w_1, w_2, . . . , w_q of W. Define T on the basis v_i ⊗ v^j ⊗ w_K by

T(v_i ⊗ v^j ⊗ w_K) = w_K if i = j, and 0 if i ≠ j.

By theorem 1.13 on page 11, there is a unique linear map

T : V ⊗ V∗ ⊗ W → W

which has these values on these basis vectors. Writing any vector v in V as v = a^i v_i, any covector ξ in V∗ as ξ = b_i v^i, and any vector w in W as w = c^J w_J, we find

T(v ⊗ ξ ⊗ w) = a^i b_i c^J w_J = ξ(v) w.

Therefore there is a linear map T : V ⊗ V∗ ⊗ W → W that on pure tensors takes v ⊗ ξ ⊗ w to ξ(v) w. Any other such map, say S, which agrees with T on pure tensors, must agree on all linear combinations of pure tensors, so on all tensors.

“Swapping Indices”

We have one more tensor operation to generalize to abstract vector spaces: when working in indices we can associate to a tensor t^i_{jk} the tensor t^i_{kj}, i.e. swap indices. This generalizes in the obvious manner.

Theorem 14.15. Take V and W finite dimensional vector spaces. There is a unique linear isomorphism V ⊗ W → W ⊗ V, which on pure tensors takes v ⊗ w to w ⊗ v.

14.12∗ Prove theorem 14.15, by imitating the proof of theorem 14.13.

Remark 14.16. In the same fashion, we can make a unique linear isomorphism reordering the factors in any tensor product of vector spaces. To be more specific, take any permutation q of the numbers 1, 2, . . . , p. Then (with basically the same proof) there is a unique linear isomorphism

V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p → V_{q(1)} ⊗ V_{q(2)} ⊗ ⋯ ⊗ V_{q(p)}

which takes each pure tensor v_1 ⊗ v_2 ⊗ ⋯ ⊗ v_p to the pure tensor v_{q(1)} ⊗ v_{q(2)} ⊗ ⋯ ⊗ v_{q(p)}.

Summary

We have now achieved our goal: we have defined tensors on an abstract finite dimensional vector space, and defined the operations of addition, scaling, tensor product, contraction and “index swapping” for tensors on an abstract vector space.

All there is to know about tensors is that

a. they are sums of pure tensors v ⊗ w,

b. the pure tensor v ⊗ w depends linearly on v and linearly on w, and

c. the universal mapping property.

Another way to think about the universal mapping property is that there are no identities satisfied by tensors other than those which are forced by (a) and (b); if there were, then we couldn't turn a bilinear map which didn't satisfy that identity into a linear map on tensors, i.e. we would contradict the universal mapping property. Roughly speaking, there is nothing else that you could know about tensors besides (a) and (b) and the fact that there is nothing else to know.

Cartesian Tensors

If V is a finite dimensional inner product space, then we can ask how to “lower indices” in this abstract setting. Given a single vector v from V, we can “lower its index” by turning v into a covector. We do this by constructing the covector ξ(x) = 〈v, x〉. One often sees this covector ξ written as v∗ or some such notation, and usually called the dual to v.

14.13∗ Prove that the map ∗ : V → V∗ given by taking v to v∗ is an isomorphism of finite dimensional vector spaces.

Careful: the covector v∗ depends on the choice not only of the vector v, but also of the inner product.

14.14∗ In the usual inner product in Rn, what is the map ∗?

14.15∗ Let 〈x, y〉_0 be the usual inner product on Rn, and define a new inner product by the rule 〈x, y〉_1 = 2〈x, y〉_0. Calculate the map which gives the dual covector in the new inner product.

The inverse to ∗ is usually also written ∗, and we write the vector dual to a covector ξ as ξ∗. Naturally, we can define an inner product on V∗ by 〈ξ, η〉 = 〈ξ∗, η∗〉.

14.16∗ Prove that this defines an inner product on V ∗.

If V and W are finite dimensional inner product spaces, we then define an inner product on V ⊗ W by setting

〈v_1 ⊗ w_1, v_2 ⊗ w_2〉 = 〈v_1, v_2〉 〈w_1, w_2〉.

This expression only determines the inner product on pure tensors, but since the inner product is required to be bilinear and every tensor is a sum of pure tensors, we only need to know the inner product on pure tensors.

14.17∗ Prove that this defines an inner product on V ⊗W .

Let's write V^{⊗2} to mean V ⊗ V, etc. We refer to tensors in a vector space V to mean elements of V^{⊗p} ⊗ V∗^{⊗q} for some positive integers p and q, i.e. tensor products of vectors and covectors. The elements of V^{⊗p} are called covariant tensors: they are sums of tensor products of vectors. The elements of V∗^{⊗p} are called contravariant tensors: they are sums of tensor products of covectors.

14.18∗ Prove that an inner product yields a unique linear isomorphism

∗ : V^{⊗p} ⊗ V∗^{⊗q} → V^{⊗(p+q)},

so that

(v_1 ⊗ v_2 ⊗ ⋯ ⊗ v_p ⊗ ξ_1 ⊗ ξ_2 ⊗ ⋯ ⊗ ξ_q)∗ = v_1 ⊗ v_2 ⊗ ⋯ ⊗ v_p ⊗ ξ_1∗ ⊗ ξ_2∗ ⊗ ⋯ ⊗ ξ_q∗.

This isomorphism “raises indices.” Similarly, we can define a map to “lower indices.”

Polarization

We will generalize the isomorphism between symmetric bilinear forms and quadratic forms to an isomorphism between symmetric tensors and polynomials.

Definition 14.17. Let V be a finite dimensional vector space. If t is a tensor in V∗^{⊗p}, i.e. a multilinear function t(v_1, v_2, . . . , v_p) depending on p vectors v_1, v_2, . . . , v_p from V, then we can define the polarization of t to be the function (also written traditionally with the same letter t) t(v) = t(v, v, . . . , v).

If t is a covector, so a linear function t(v) of a single vector v, then the polarization is the same linear function.

In R2, if t = e^1 ⊗ e^2, then the polarization is t(x) = x^1 x^2 for x = (x^1, x^2) in R2.

The antisymmetric tensor t = e^1 ⊗ e^2 − e^2 ⊗ e^1 in R2 has polarization t(x) = x^1 x^2 − x^2 x^1 = 0, vanishing.

Definition 14.18. A function f : V → R on a finite dimensional vector space is called a polynomial if there is a linear isomorphism F : Rn → V for which f(F(x)) is a polynomial in the usual sense.

Clearly the choice of linear isomorphism F is irrelevant. A different choice would only alter the linear functions by linear combinations of one another, and therefore would alter the polynomial functions by substituting linear combinations of new variables in place of old variables. In particular, the degree of a polynomial function is well defined. A polynomial function f : V → R is called homogeneous of degree d if f(λx) = λ^d f(x) for any vector x in V and number λ. Clearly every polynomial function splits into a unique sum of homogeneous polynomial functions.

There are two natural notions of multiplying symmetric tensors, which simply differ by a factor. The first is

s ∘ t(x_1, x_2, . . . , x_{a+b}) = ∑_p s(x_{p(1)}, x_{p(2)}, . . . , x_{p(a)}) t(x_{p(a+1)}, x_{p(a+2)}, . . . , x_{p(a+b)}),

when s has a lower indices and t has b, and the sum is over all permutations p of the numbers 1, 2, . . . , a + b. The second is

s t(x_1, x_2, . . . , x_{a+b}) = (1/(a + b)!) s ∘ t.

Theorem 14.19. Polarization is a linear isomorphism taking symmetric contravariant tensors to polynomials, preserving degree, and taking products to products (using the second multiplication above).

Chapter 15

Exterior Forms

This chapter develops the definition and basic properties of exterior forms.

Why Forms?

This section is just for motivation; the ideas of this section will not be used subsequently.

Antisymmetric contravariant tensors are also called exterior forms. In terms of indices, they have no upper indices, and they are skew-symmetric in their lower indices. The reason for the importance of exterior forms is quite deep, coming from physics. Consider fluid passing into and out of a region in space. We would like to measure how rapidly some quantity (for instance heat) flows into that region. We do this by integrating the flux of the quantity across the boundary: counting how much is going in, and subtracting off how much is going out. This flux is an integral along the boundary surface. But if we change our mind about which part is the inside and which is the outside, the sign of the integral has to change. So this type of integral (called a surface integral) has to be sensitive to the choice of inside and outside, called the orientation of the surface. This sign sensitivity has to be built into the integral. We are used to integrating functions, but they don't change sign when we change orientation the way a flux integral should. For example, the area of a surface is not a flux integral, because it doesn't depend on orientation. Let's play around with some rough ideas to get a sense of how exterior forms provide just the right sign changes for flux integrals: we can integrate exterior forms. For precise definitions and proofs, Spivak [8] is an excellent introduction.

Let's imagine some kind of object α that we can integrate over any surface S (as long as S is reasonably smooth, except maybe for a few sharp edges: let's not make that precise since we are only playing around). Suppose that the integral ∫_S α is a number. Of course, S must be a surface with a choice of orientation. Moreover, if we write −S for the same surface with the opposite orientation, then ∫_{−S} α = −∫_S α. Suppose that ∫_S α varies smoothly as we smoothly deform S. Moreover, suppose that if we cut a surface S into two surfaces S_1 and S_2,

S = S_1 ∪ S_2,

then ∫_S α = ∫_{S_1} α + ∫_{S_2} α: the integral is a sum of locally measurable quantities.

Fix a point, which we can translate to become the origin, and scale up the picture so that eventually, for S a little piece of surface, ∫_S α is very nearly invariant under small translations of S. (We can do this because the integral varies smoothly, so after rescaling the picture the integral hardly varies at all.) Just for simplicity, let's assume that ∫_S α is unchanged when we translate the surface S. Any two opposite sides of a box are translations of one another, but with opposite orientations. So they must have opposite signs for ∫ α. Therefore any small box has as much of our quantity entering as leaving. Approximating any region with small boxes, we must get total flux ∫_S α = 0 when S is the boundary of the region.

boundary of the region.Pick two linearly independent vectors u and v, and let P be the parallelogram

at the origin with sides u and v. Pick any vector w perpendicular to u and vand with

det(u v w

)> 0.

Orient P so that the outside of P is the side in the direction of w.

If we swap u and v, then we change the orientation of the parallelogram P .Lets write α(u, v) for

∫Pα. Slicing the parallelogram into 3 equal pieces, say

into 3 parallelograms with sides u/3, v, we see that α(u/3, v) = α(u, v)/3.

In the same way, we can see that α(λu, v) = λα(u, v) for λ any positive rationalnumber (dilate by the numerator, and cut into a number of pieces given by thedenominator). Because α(u, v) is a smooth function of u and v, we see thatα(λu, v) = λα(u, v) for λ > 0. Similarly, α(0, v) = 0 since the parallelogram isflattened into a line. Moreover, α(−u, v) = −α(u, v), since the parallelogram of−u, v is the parallelogram of u, v reflected, reversing its orientation. So α(u, v)scales in u and v. By reversing orientation, α(v, u) = −α(u, v). A shear appliedto the parallelogram will preserve the area, and after the shear we can cutand paste the parallelogram. The integral must be preserved, by translationinvariance, so α(u+ v, v) = α(u, v).

The hard part is to see why α is linear as a function of u. This comes fromthe drawing

Page 143: AbstractLinearAlgebra - University College CorkA linear map between abstract vector spaces doesn’t have an associatedmatrix;thisideaonlymakessenseformapsT: Rq→Rp. Let U and V be

Definition 139

v w

u

u+ v

v + w

The integral over the boundary of this region must vanish. If we pick three vectors u, v and w, which we draw as the standard basis vectors, then the region has boundary given by various parallelograms and triangles (each triangle being half a parallelogram), and the vanishing of the integral gives

0 = α(u, v) + α(v, w) + (1/2) α(w, u) − (1/2) α(w, u) − α(u + w, v).

Therefore α(u, v) + α(v, w) = α(u + w, v), so that finally α is a tensor. If you like indices, you can write α as α_{ij} with α_{ji} = −α_{ij}.

Our argument is only slightly altered if we keep in mind that the integral ∫_S α should not really be exactly translation invariant, but only vary slightly with small translations, and that the integral around the boundary of a small region should be small. We can still carry out the same argument, but throwing in error terms proportional to the area of surface and extent of translation, or to the volume of a region. We end up with α being an exterior form whose coefficients are functions.

If we imagine a flow contained inside a surface, we can similarly measure flux across a curve. We also need to be sensitive to orientation: which side of the boundary of a surface is the inside of the surface. Again the correct object to work with in order to have the correct sign sensitivity is an exterior form (whose coefficients are functions, not just numbers). Similar remarks hold in any number of dimensions. So exterior forms play a vital role because they are the objects we integrate. We can easily change variables when we integrate exterior forms.

Definition

Definition 15.1. A tensor t in V∗^{⊗p} is called a p-form if it is antisymmetric, i.e.

t(v_1, v_2, . . . , v_p)

is antisymmetric as a function of the vectors v_1, v_2, . . . , v_p: for any permutation q,

t(v_{q(1)}, v_{q(2)}, . . . , v_{q(p)}) = (−1)^N t(v_1, v_2, . . . , v_p),

where (−1)^N is the sign of the permutation q.

The form in Rn

ε(v_1, v_2, . . . , v_n) = det(v_1 v_2 . . . v_n)

is called the volume form of Rn (because of its interpretation as an integrand: ∫_R ε is the volume of any region R).

A covector ξ in V∗ is a 1-form, because there are no permutations you can carry out on ξ(v).

In Rn we traditionally write points as

x = (x^1, x^2, . . . , x^n),

and write dx^1 for the covector given by the rule dx^1(y) = y^1 for any vector y in Rn. Then α = dx^1 ⊗ dx^2 − dx^2 ⊗ dx^1 is a 2-form:

α(u, v) = u^1 v^2 − u^2 v^1.
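A quick numerical check (with arbitrary vectors) that this α is antisymmetric and computes the 2 × 2 determinant of its arguments:

    import numpy as np

    def alpha(u, v):
        # alpha(u, v) = u^1 v^2 - u^2 v^1
        return u[0] * v[1] - u[1] * v[0]

    u = np.array([2.0, 5.0])
    v = np.array([1.0, 4.0])

    assert np.isclose(alpha(u, v), -alpha(v, u))                            # antisymmetry
    assert np.isclose(alpha(u, v), np.linalg.det(np.column_stack((u, v))))  # = det(u v)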

15.1∗ If t is a 3-form, prove that

a. t(x, y, y) = 0 and

b. t(x, y + 3x, z) = t(x, y, z)

for any vectors x, y, z.


Hints

1.1.

a. 0 v = (0 + 0) v = 0 v + 0 v. Add the vector w for which 0 v + w = 0 to both sides.

b. a 0 = a (0 + 0) = a 0 + a 0. Similar.

1.3. If there are two such vectors, w_1 and w_2, then v + w_1 = v + w_2 = 0. Therefore

w_1 = 0 + w_1
    = (v + w_2) + w_1
    = v + (w_2 + w_1)
    = v + (w_1 + w_2)
    = (v + w_1) + w_2
    = 0 + w_2
    = w_2.

0 = (1 + (−1)) v = 1 v + (−1) v = v + (−1) v,

so (−1) v = −v.

1.8.

a. Yes: limits of sums and scalings are sums and scalings of limits, so continuity survives sums and scalings.

b. No: scaling by a negative number will take a (not everywhere 0) nonnegative function to a nonpositive (somewhere negative) function. For example, −(x²) = −x². Therefore you can't scale functions while staying inside the set of nonnegative functions.

c. No: (x³ + 1) + (−x³ + 7) = 8 has degree 0, so you can't add functions while staying in the set of polynomials of degree 3.

d. Yes: if you add symmetric 3 × 3 matrices A and B, you get a matrix A + B which satisfies

(A + B)^t = A^t + B^t = A + B,

so A + B is symmetric, and if a is any number then clearly

(aA)^t = a(A^t) = aA,

so aA is symmetric. So you can add and scale symmetric matrices, and they stay symmetric.

1.9. You might take:

a. 1, x, x², x³

b. the six 2 × 3 matrices each of which has a single entry equal to 1 and every other entry 0 (one such matrix for each of the six positions)

c. The set of matrices e(i, j) for i ≤ j, where e(i, j) has zeroes everywhere, except for a 1 at row i and column j.

d. A polynomial p(x) = a + bx + cx² + dx³ vanishes at the origin just when a = 0. A basis: x, x², x³.

1.11. Some examples you might think of:

• The set of constant functions.

• The set of linear functions.

• The set of polynomial functions.

• The set of polynomial functions of degree at most d (for any fixed d).

• The set of functions f(x) which vanish when x < 0.

• The set of functions f(x) for which there is some interval outside of which f(x) vanishes.

• The set of functions f(x) for which f(x)p(x) goes to zero as x gets large, for every polynomial p(x).

• The set of functions which vanish at the origin.

1.12.


a. no

b. no

c. no

d. yes

e. no

f. yes

g. yes

1.13.

a. no

b. no

c. no

d. yes

e. yes

f. no

1.15.

a. If AH = HA and BH = HB, then (A + B)H = H(A + B), clearly. Similarly 0H = H0 = 0, and (cA)H = H(cA).

b. P is the set of diagonal 2× 2 matrices.

1.17. Ax = Sx and By = Ty for any x in Rp and y in Rq. Therefore TSx = BSx = BAx for any x in Rp. So (BA)e_j = TSe_j, and therefore BA has the required columns to be the associated matrix.

a. 1x = 1y just when x = y, since 1x = x.

b. Given any z in V , we find z = 1x just for x = z.

1.20. Given z in V , we can find some x in U so that Tx = z. If there are twochoices for this x, say z = Tx = Ty, then we know that x = y. Therefore x isuniquely determined. So let T−1z = x. Clearly T−1 is uniquely determined,by the equation T−1T = 1, and satisfies TT−1 = 1 too. Lets prove that T−1

is linear. Pick z and w in V . Then let Tx = z and Ty = w. We know

Page 148: AbstractLinearAlgebra - University College CorkA linear map between abstract vector spaces doesn’t have an associatedmatrix;thisideaonlymakessenseformapsT: Rq→Rp. Let U and V be

144 Hints

We know that x and y are uniquely determined by these equations. Since T is linear, Tx + Ty = T(x + y). This gives z + w = T(x + y). So

T^{−1}(z + w) = T^{−1}T(x + y)
             = x + y
             = T^{−1}z + T^{−1}w.

Similarly, T(ax) = aTx, and taking T^{−1} of both sides gives aT^{−1}z = T^{−1}(az). So T^{−1} is linear.

1.21. Write p(x) = a + bx + cx^2. Then

Tp = ( a           )
     ( a + b + c   )
     ( a + 2b + 4c ).

To solve for a, b, c in terms of p(0), p(1), p(2), we solve

( 1 0 0 ) ( a )   ( p(0) )
( 1 1 1 ) ( b ) = ( p(1) )
( 1 2 4 ) ( c )   ( p(2) ).

Apply forward elimination:

( 1 0 0 | p(0) )
( 1 1 1 | p(1) )
( 1 2 4 | p(2) )

( 1 0 0 | p(0) )
( 0 1 1 | p(1) − p(0) )
( 0 2 4 | p(2) − p(0) )

( 1 0 0 | p(0) )
( 0 1 1 | p(1) − p(0) )
( 0 0 2 | p(2) − 2p(1) + p(0) )

Back substitute to find:

c = (1/2) p(0) − p(1) + (1/2) p(2)
b = p(1) − p(0) − c = −(3/2) p(0) + 2 p(1) − (1/2) p(2)
a = p(0).

Therefore we can recover p = a + bx + cx^2 completely from knowing p(0), p(1), p(2), so T is one-to-one and onto.

1.22. The kernel of T is the set of x for which Tx = 0. But Tx = 0 implies Sx = P^{−1}Tx = 0, and Sx = 0 implies Tx = PSx = 0. So the same kernel.


The image of S is the set of vectors of the form Sx, and each is carried by P to a vector of the form PSx = Tx. Conversely P^{−1} carries the image of T to the image of S. Check that this is an isomorphism.

1.24. If T : U → V is an isomorphism, and F : Rn → U is an isomorphism, prove that TF : Rn → V is also an isomorphism.

1.27. If x + W is a translate, we might find that we can write this translate two different ways, say as x + W but also as z + W. So x and z are equal up to adding a vector from W, i.e. x − z lies in W. Then after scaling, clearly sx − sz = s(x − z) also lies in W. So sx + W = sz + W, and therefore scaling is defined independent of any choices. A similar argument works for addition of translates.

1.28. Take F and G two isomorphisms. Determinants of matrices multiply. Let A be the matrix associated to F^{−1}TF : Rn → Rn and B the matrix associated to G^{−1}TG : Rn → Rn. Let C be the matrix associated to G^{−1}F. Therefore CAC^{−1} = B.

det B = det(CAC^{−1})
      = det C det A (det C)^{−1}
      = det A.

1.29.

a. T(p(x) + q(x)) = 2 p(x − 1) + 2 q(x − 1) = Tp(x) + Tq(x), and T(a p(x)) = 2a p(x − 1) = a Tp(x).

b. If Tp(x) = 0 then 2 p(x − 1) = 0, so p(x − 1) = 0 for any x, so p(x) = 0 for any x. Therefore T has kernel {0}. As for the image, if q(x) is any polynomial of degree at most 2, then let p(x) = (1/2) q(x + 1). Clearly Tp(x) = q(x). So T is onto.

c. To find the determinant, we need an isomorphism. Let F : R3 → V,

F(a, b, c) = a + bx + cx^2.


Calculate the matrix A of F^{−1}TF by

F^{−1}TF(a, b, c) = F^{−1} T(a + bx + cx^2)
                  = F^{−1} 2(a + b(x − 1) + c(x − 1)^2)
                  = F^{−1} (2a + 2b(x − 1) + 2c(x^2 − 2x + 1))
                  = F^{−1} ((2a − 2b + 2c) + (2b − 4c)x + 2c x^2)
                  = (2a − 2b + 2c, 2b − 4c, 2c).

So the associated matrix is

A = ( 2 −2  2 )
    ( 0  2 −4 )
    ( 0  0  2 )

giving

det T = det A = 8.
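A quick machine check of this matrix and determinant (a sketch using sympy; the basis 1, x, x^2 is the one given by F above):

    import sympy as sp

    a, b, c, x = sp.symbols('a b c x')
    p = a + b*x + c*x**2                      # a general element of V in the basis 1, x, x^2
    Tp = sp.expand(2*p.subs(x, x - 1))        # T p(x) = 2 p(x - 1)
    rows = sp.Poly(Tp, x).all_coeffs()[::-1]  # coefficients of 1, x, x^2 in T p
    A = sp.Matrix([[sp.diff(r, v) for v in (a, b, c)] for r in rows])
    print(A)        # Matrix([[2, -2, 2], [0, 2, -4], [0, 0, 2]])
    print(A.det())  # 8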

1.30. det T = 2^n

1.31. det T = (det A)^2. The eigenvalues of T are the eigenvalues of A. The eigenvectors with eigenvalue λj are spanned by

( xj, 0 ),   ( 0, xj ).

1.32. Let y1 and y2 be the eigenvectors of A^t with eigenvalues λ1 and λ2 respectively. Then the eigenvalues of T are those of A, with multiplicity 2, and the λi-eigenspace is spanned by

( yi^t, 0 ),   ( 0, yi^t ).

1.33. The characteristic polynomial is p(λ) = (λ + 1)(λ − 1)^2. The 1-eigenspace is the space of polynomials q(x) for which q(−x) = q(x), so q(x) = a + cx^2. This eigenspace is spanned by 1, x^2. The (−1)-eigenspace is the space of polynomials q(x) for which q(−x) = −q(x), so q(x) = bx. The eigenspace is spanned by x. Indeed T is diagonalizable, and diagonalized by the isomorphism Fe1 = 1, Fe2 = x, Fe3 = x^2, for which

F^{−1}TF = ( 1  0  0 )
           ( 0 −1  0 )
           ( 0  0  1 ).
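A machine check along the same lines as in 1.29, assuming the map in this exercise is T p(x) = p(−x), which is what the eigenspace description above suggests:

    import sympy as sp

    a, b, c, x = sp.symbols('a b c x')
    p = a + b*x + c*x**2
    Tp = sp.expand(p.subs(x, -x))             # assumed: T p(x) = p(-x)
    rows = sp.Poly(Tp, x).all_coeffs()[::-1]
    A = sp.Matrix([[sp.diff(r, v) for v in (a, b, c)] for r in rows])
    print(A)                                  # the diagonal matrix diag(1, -1, 1)
    print(sp.factor(A.charpoly().as_expr()))  # (lambda - 1)**2*(lambda + 1)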

1.34.


(a) A polynomial with average value 0 on some interval must take on the value 0 somewhere on that interval, being either 0 throughout the interval or positive somewhere and negative somewhere else. A polynomial in one variable can't have more zeros than its degree.

(b) It is enough to assume that the number of intervals is n, since if it is smaller, we can just add some more intervals and specify some more choices for average values on those intervals. But then Tp = 0 only for p = 0, so T is an isomorphism.

1.36. Clearly the expression is linear in p(z) and "conjugate linear" in q(z). Moreover, if 〈p(z), p(z)〉 = 0, then p(z) has roots at z0, z1, z2 and z3. But p(z) has degree at most 3, so has at most 3 roots or else is everywhere 0.

2.1. If there were two, say z1 and z2, then z1+z2 = z1, but z1+z2 = z2+z1 = z2.

2.2. Same proof, but with · instead of +.

2.6. If p = ab, then in Fp arithmetic ab = p (mod p) = 0 (mod p). If a has a reciprocal, say c, then ab = 0 (mod p) implies that b = cab (mod p) = c0 (mod p) = 0 (mod p). So b is a multiple of p, and p is a multiple of b, so p = b and a = 1.

2.7. You find −21 as the answer from the Euclidean algorithm. But you can add 79 any number of times to the answer, to get it to be between 0 and 78, since we are working modulo 79, so the final answer is −21 + 79 = 58.
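A sketch of the extended Euclidean algorithm behind this computation. The hint does not restate which reciprocal is being asked for; the values −21 and 58 are consistent with inverting 15 modulo 79, so 15 is used below purely as an illustrative guess:

    def extended_gcd(a, b):
        """Return (g, u, v) with g = gcd(a, b) and u*a + v*b = g."""
        if b == 0:
            return a, 1, 0
        g, u, v = extended_gcd(b, a % b)
        return g, v, u - (a // b) * v

    g, u, v = extended_gcd(15, 79)
    print(g, u, v)   # 1 -21 4
    print(u % 79)    # 58, the answer shifted to lie between 0 and 78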

2.8. x = 3

2.9. It is easy to check that Fp satisfies the addition laws, zero laws, multiplication laws, and the distributive law: each one holds in the integers, and to see that it holds in Fp, we just keep track of multiples of p. For example, x + y in Fp is just addition up to a multiple of p, say x + y + ap, usual integer addition, for some integer a. So (x + y) + z in Fp is (x + y) + z in the integers, up to a multiple of p, and so equals x + (y + z) up to a multiple of p, etc. The tricky bit is the reciprocal law. Since p is prime, nothing divides into p except p and 1. Therefore for any integer a between 1 and p − 1, the greatest common divisor gcd(a, p) is 1. The Euclidean algorithm computes out integers u and v so that ua + vp = 1, so that ua = 1 (mod p). Adding or subtracting enough multiples of p to u, we find a reciprocal for a.
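The reciprocal law is easy to see by brute force for a small prime (a sanity check, not a proof; trying a composite modulus makes it fail):

    p = 7
    recip = {a: next(b for b in range(1, p) if (a * b) % p == 1) for a in range(1, p)}
    print(recip)   # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}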


2.10. Gauss–Jordan elimination applied to (A 1):

( 0 1 0 | 1 0 0 )
( 1 0 1 | 0 1 0 )
( 1 1 0 | 0 0 1 )

( 1 0 1 | 0 1 0 )
( 0 1 0 | 1 0 0 )
( 1 1 0 | 0 0 1 )

( 1 0 1 | 0 1 0 )
( 0 1 0 | 1 0 0 )
( 0 1 1 | 0 1 1 )

( 1 0 1 | 0 1 0 )
( 0 1 0 | 1 0 0 )
( 0 0 1 | 1 1 1 )

( 1 0 0 | 1 0 1 )
( 0 1 0 | 1 0 0 )
( 0 0 1 | 1 1 1 )

So

A^{−1} = ( 1 0 1 )
         ( 1 0 0 )
         ( 1 1 1 ).
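Here is a small sketch of the same Gauss–Jordan computation over F2, carried out in Python (0/1 arithmetic modulo 2; it assumes the input matrix is invertible):

    def inverse_mod2(A):
        """Invert a square 0/1 matrix over F2 by Gauss-Jordan elimination on (A 1)."""
        n = len(A)
        M = [row[:] + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(A)]
        for col in range(n):
            pivot = next(r for r in range(col, n) if M[r][col] == 1)  # assumes A is invertible
            M[col], M[pivot] = M[pivot], M[col]
            for r in range(n):
                if r != col and M[r][col] == 1:
                    M[r] = [(x + y) % 2 for x, y in zip(M[r], M[col])]
        return [row[n:] for row in M]

    A = [[0, 1, 0],
         [1, 0, 1],
         [1, 1, 0]]
    print(inverse_mod2(A))   # [[1, 0, 1], [1, 0, 0], [1, 1, 1]]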

2.11. Carry out Gauss–Jordan elimination thinking of the entries as living in the field of rational functions. The result has rational functions as entries. For any value of t for which none of the denominators of the entries are zero, and none of the pivot entries are zero, the rank is just the number of pivots. There are finitely many entries, and the denominator of each entry is a polynomial, so has finitely many roots.

3.2. You could try letting U be the horizontal plane, consisting in the vectors

x = ( x1 )
    ( x2 )
    ( 0  )

and V any other plane, for instance the plane consisting in the vectors

x = ( x1 )
    ( 0  )
    ( x3 ).

3.3. Any vector w in Rn splits uniquely into w = u + v for some u in U and v in V. So we can unambiguously define a map Q by the equation Qw = Su + Tv. It is easy to check that this map Q is linear. Conversely, given any map Q : Rn → Rp, define S = Q|U and T = Q|V.


3.7. The pivot columns of (A B) form a basis for U + W. All columns are pivot columns just when U + W is a direct sum.

4.1. Take any point of the plane, project it horizontally onto the vertical axis, and then rotate the vertical axis clockwise by a right angle to become the horizontal axis. If you do it twice, clearly you end up at the origin.

4.4.

λ = 2: e1, e2, e3
λ = 3: e4, e5, e6

4.6. Clearly v by itself is a string of length 1. Let k be the length of the longest string starting from v. So the string is

v → (T − λI) v → · · · → (T − λI)^{k−1} v → (T − λI)^k v → 0,

each arrow being an application of T − λI. Suppose there is a linear relation, say

c0 v + c1 (T − λI) v + · · · + ck (T − λI)^k v = 0.

Multiply both sides with a big enough power of T − λI to kill off all but the first term. This power exists, because all of the vectors in the string are killed by some power of T − λI, and the further down the list, the smaller a power of T − λI you need to do the job, since you already have some T − λI sitting around. So this forces c0 = 0. Hit with the next smallest power of T − λI to force c1 = 0, etc.

4.7. Take the shortest possible linear relation between nonzero generalized eigenvectors. Suppose that it involves vectors vi with various different eigenvalues λi: (T − λi I)^{ki} vi = 0. If you can find a shorter relation, use it instead. If you can find a relation of the same length with lower powers ki, use it instead. Let wi = (T − λ1 I) vi. These wi are still generalized eigenvectors, each with the same eigenvalue as vi, but a smaller power for k1. So these wi must all vanish. So all vi are λ1-eigenvectors. The same works for λ2, so all of the vectors vi are also λ2-eigenvectors, so all vanish.

To show that the sum of generalized eigenspaces is a direct sum, we need to show that every vector in it has a unique expression as a sum of generalized eigenvectors of distinct eigenvalues. If there are two such expressions, take their difference and you get a linear relation among generalized eigenvectors.

4.8. Two ways: (1) There is an eigenvector for each eigenvalue. Pick one for each. Eigenvectors with different eigenvalues are linearly independent, so they form a basis. (2) Each Jordan block of size k has an eigenvalue λ of multiplicity k. So all blocks must be 1 × 1, and hence A is diagonal in Jordan normal form.

4.9.

F = ( 1/2   1/2 )        F^{−1}AF = ( 0 0 )
    ( −1/2  1/2 ),                  ( 0 2 )

4.10.

F = ( 1 0 0 )        F^{−1}AF = ( 0 1 0 )
    ( 0 1 1 ),                  ( 0 0 0 )
    ( 0 1 0 )                   ( 0 0 0 )

4.11. Two blocks, each at most n/2 long, and a zero block if needed to pad it out.

4.12.

F = ( −i/4   1/2    i/4   1/2 )        F^{−1}AF = ( i 1  0  0 )
    (  1/4   i/4    1/4  −i/4 ),                  ( 0 i  0  0 )
    (  i/4    0    −i/4    0  )                   ( 0 0 −i  1 )
    ( −1/4   i/4   −1/4  −i/4 )                   ( 0 0  0 −i )

4.13.

F = ( 0 0 1 0 0 )
    ( 1 0 0 0 0 )
    ( 0 0 0 1 0 ),
    ( 0 1 0 0 0 )
    ( 0 0 0 0 1 )

F^{−1}AF = ( 0 1 0 0 0 )
           ( 0 0 0 0 0 )
           ( 0 0 0 1 0 )
           ( 0 0 0 0 1 )
           ( 0 0 0 0 0 ).

4.17. Clearly it is enough to prove the result for a single Jordan block. Given an n × n Jordan block λ + ∆, let A be any diagonal matrix

A = diag(a1, a2, . . . , an)

with all of the aj distinct and nonzero. Compute out the matrix B = A(λ + ∆) to see that all diagonal entries of B are distinct and that B is upper triangular.


Why is B diagonalizable? Then λ + ∆ = A^{−1}B, a product of diagonalizable matrices.

5.1. Using long division of polynomials,

x^5 + 3x^2 + 4x + 1 = (x^2 + 1)(x^3 − x + 3) + (5x − 2),

so the quotient is x^3 − x + 3 and the remainder is 5x − 2.
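The same division can be checked with sympy (a quick sketch, not part of the exercise):

    import sympy as sp
    x = sp.symbols('x')
    q, r = sp.div(x**5 + 3*x**2 + 4*x + 1, x**2 + 1, x)
    print(q, r)   # x**3 - x + 3   5*x - 2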

5.2. Long division of polynomials gives

x^5 + 2x^3 + x^2 + 2 = (x − 2)(x^4 + 2x^3 + 4x^2 + 4x + 4) + (2x^3 + 5x^2 + 4x + 10),

then

x^4 + 2x^3 + 4x^2 + 4x + 4 = ((1/2)x − 1/4)(2x^3 + 5x^2 + 4x + 10) + ((13/4)x^2 + 13/2),

and finally

2x^3 + 5x^2 + 4x + 10 = (2x + 5)(x^2 + 2) + 0.

Clearly r(x) = x^2 + 2 (up to scaling; we will always scale to get the leading term to be 1). Solving backwards for r(x):

r(x) = u(x) a(x) + v(x) b(x)

with

u(x) = −(2/13)(x − 1/2),
v(x) = (2/13)(x^2 − (5/2)x + 3).
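A machine check of this computation (a sketch using sympy; a(x) and b(x) are read off from the divisions above):

    import sympy as sp
    x = sp.symbols('x')
    a = x**5 + 2*x**3 + x**2 + 2
    b = x**4 + 2*x**3 + 4*x**2 + 4*x + 4
    u, v, r = sp.gcdex(a, b, x)          # u*a + v*b = r, with r the monic gcd
    print(r)                             # x**2 + 2
    print(sp.simplify(u*a + v*b - r))    # 0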


5.3. The Euclidean algorithm yields

2310 − 2 · 990 = 330
990 − 3 · 330 = 0

and

1386 − 1 · 990 = 396
990 − 2 · 396 = 198
396 − 2 · 198 = 0.

Therefore the greatest common divisors are 330 and 198 respectively. Apply the Euclidean algorithm to these:

330 − 1 · 198 = 132
198 − 1 · 132 = 66
132 − 2 · 66 = 0.

Therefore the greatest common divisor of 2310, 990 and 1386 is 66. Turning these equations around, we can write

66 = 198 − 1 · 132
   = 198 − 1 · (330 − 1 · 198)
   = 2 · 198 − 1 · 330
   = 2 (990 − 2 · 396) − 1 · (2310 − 2 · 990)
   = 4 · 990 − 4 · 396 − 1 · 2310
   = 4 · 990 − 4 · (1386 − 1 · 990) − 1 · 2310
   = 8 · 990 − 4 · 1386 − 1 · 2310.
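The arithmetic is easy to confirm (a quick check, nothing more):

    from math import gcd
    print(gcd(2310, 990), gcd(1386, 990), gcd(gcd(2310, 990), gcd(1386, 990)))   # 330 198 66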

5.6. Clearly ∆n^n = 0, since ∆n shifts each vector of the string en, en−1, . . . , e1 one step (and sends e1 to 0). Therefore the minimal polynomial of ∆n must divide x^n. But for k < n, ∆n^k en = en−k is not 0, so x^k is not the minimal polynomial.

5.7. If the minimal polynomial of ∆ is m(x), then the minimal polynomial of λ + ∆ must be m(x − λ).

5.8. m(λ) = λ^2 − 5λ − 2

5.10. Take the matrix A of T to have Jordan normal form, and you find characteristic polynomial

det (A − λ I) = (λ1 − λ)^{n1} (λ2 − λ)^{n2} . . . (λN − λ)^{nN},

with n1 the sum of the sizes of all Jordan blocks with eigenvalue λ1, etc. The characteristic polynomial is clearly divisible by the minimal polynomial.


5.11. Split the minimal polynomial m(x) into real and imaginary parts. Check that s(A) = 0 for s(x) either of these parts, a polynomial equation of the same or lower degree. The imaginary part has lower degree, so vanishes.

5.12. Let

zk = cos(2πk/n) + i sin(2πk/n).

By de Moivre's theorem, zk^n = 1. If we take k = 1, 2, . . . , n, these are the so-called n-th roots of 1, so that

z^n − 1 = (z − z1)(z − z2) . . . (z − zn).

Clearly each root of 1 is at a different angle. T^n = I implies that

(T − z1 I)(T − z2 I) . . . (T − zn I) = 0,

so by corollary 5.7, T is diagonalizable. Over the real numbers, we can take

T = ( 0 −1 )
    ( 1  0 ),

which satisfies T^4 = 1, but has complex eigenvalues, so is not diagonalizable over the real numbers.

5.13. Applying forward elimination to the matrix

1  0  2  2
0  2  2  6
0  0  0  0
0  1  1  3
1  1  3  5
0  0  0  0
0  1  1  3
0  2  2  6
1 −1  1 −1

add −(row 1) to row 5 and −(row 1) to row 9, giving

1  0  2  2
0  2  2  6
0  0  0  0
0  1  1  3
0  1  1  3
0  0  0  0
0  1  1  3
0  2  2  6
0 −1 −1 −3

Move the pivot ↘. Add −(1/2)(row 2) to row 4, −(1/2)(row 2) to row 5, −(1/2)(row 2) to row 7, −(row 2) to row 8, and (1/2)(row 2) to row 9:

1  0  2  2
0  2  2  6
0  0  0  0
0  0  0  0
0  0  0  0
0  0  0  0
0  0  0  0
0  0  0  0
0  0  0  0

Moving the pivot further finds no more pivots. Cutting out all of the pivotless columns after the first one, and all of the zero rows, yields

1 0 2
0 2 2

Scale row 2 by 1/2:

1 0 2
0 1 1

The minimal polynomial is therefore

−2 − λ + λ^2 = (λ + 1)(λ − 2).

The eigenvalues are λ = −1 and λ = 2. We can't see which eigenvalue has multiplicity 2 and which has multiplicity 1.

5.17. If ε1 ≠ ε2, then D = T and N = 0. If ε1 = ε2, then

D = ( ε1  0 )        N = ( 0  1 )
    ( 0  ε2 ),           ( 0  0 ).

If we imagine ε1 and ε2 as variables, then D "jumps" (its top right corner changes from 0 to 1 "suddenly") as ε1 and ε2 collide.

5.18. We have seen previously that S and T will preserve each other's generalized eigenspaces: if (S − λ)^k x = 0, then (S − λ)^k Tx = 0. Therefore we can restrict to a generalized eigenspace of S, and then further restrict to a generalized eigenspace of T. So we can assume that S = λ0 + N0 and T = λ1 + N1, with λ0 and λ1 complex numbers and N0 and N1 commuting nilpotent linear maps. But then ST = λ0 λ1 + N where N = λ0 N1 + λ1 N0 + N0 N1. Clearly large enough powers of N will vanish, because they will be sums of terms like

λ0^j λ1^k N0^(k+ℓ) N1^(j+ℓ).

So N is nilpotent.

6.11. If we have a matrix A with n distinct eigenvalues, then every nearby matrix has nearby eigenvalues, so still distinct.

7.2.

det (A − λ I) = (z1 − λ)(z2 − λ) . . . (zn − λ)
             = (−1)^n Pz(λ)
             = s_n(z) − s_{n−1}(z) λ + s_{n−2}(z) λ^2 + · · · + (−1)^n λ^n.

8.1. There are (2n)! permutations of 1, 2, . . . , 2n, and each partition is associated to n! 2^n different permutations, so there are (2n)!/(n! 2^n) different partitions of 1, 2, . . . , 2n.

8.2. In alphabetical order, the first pair must always start with 1. Then we can choose any number to sit beside the 1, and the smallest number not chosen starts the second pair, etc.


(a) {1, 2}

(b) {1, 2}, {3, 4}
    {1, 3}, {2, 4}
    {1, 4}, {2, 3}

(c) {1, 2}, {3, 4}, {5, 6}
    {1, 2}, {3, 5}, {4, 6}
    {1, 2}, {3, 6}, {4, 5}
    {1, 3}, {2, 4}, {5, 6}
    {1, 3}, {2, 5}, {4, 6}
    {1, 3}, {2, 6}, {4, 5}
    {1, 4}, {2, 3}, {5, 6}
    {1, 4}, {2, 5}, {3, 6}
    {1, 4}, {2, 6}, {3, 5}
    {1, 5}, {2, 3}, {4, 6}
    {1, 5}, {2, 4}, {3, 6}
    {1, 5}, {2, 6}, {3, 4}
    {1, 6}, {2, 3}, {4, 5}
    {1, 6}, {2, 4}, {3, 5}
    {1, 6}, {2, 5}, {3, 4}
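A short sketch that enumerates partitions into pairs in exactly this alphabetical order and compares the count with the formula (2n)!/(n! 2^n) from 8.1:

    from math import factorial

    def pair_partitions(items):
        # Pair the smallest remaining element with each possible partner in turn;
        # this reproduces the alphabetical order used above.
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for i, partner in enumerate(rest):
            for tail in pair_partitions(rest[:i] + rest[i + 1:]):
                yield [(first, partner)] + tail

    n = 3
    parts = list(pair_partitions(list(range(1, 2 * n + 1))))
    print(len(parts), factorial(2 * n) // (factorial(n) * 2 ** n))   # 15 15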

9.5. If a linear map T : V → W is 1-to-1 then T0 ≠ Tv unless v = 0, so there are no vectors v in the kernel of T except v = 0. On the other hand, suppose that T : V → W has only the zero vector in its kernel. If v1 ≠ v2 but Tv1 = Tv2, then v1 − v2 ≠ 0 but T(v1 − v2) = Tv1 − Tv2 = 0. So the vector v1 − v2 is a nonzero vector in the kernel of T.

9.6. Hom(V, W) is always a vector space, for any vector space W.

9.7. If dim V = n, then V is isomorphic to Kn, so without loss of generality we can just assume V = Kn. But then V∗ is identified with row matrices, so has dimension n as well: dim V∗ = dim V.

9.10.

fx(α + β) = (α + β)(x)
          = α(x) + β(x)
          = fx(α) + fx(β).

Similarly for fx(s α).

9.11. fsx(α) = α(sx) = s α(x) = s fx(α). Similarly for any two vectors x and y in V, expand out fx+y.

9.12. We need only find a linear function α for which α(x) ≠ α(y). Identifying V with Kn using some choice of basis, we can assume that V = Kn, and thus x and y are different vectors in Kn. So some entry of x is not equal to some entry of y, say xj ≠ yj. Let α be the function α(z) = zj, i.e. α = ej.


10.1. Calculate out 〈Ax, x〉 for x = x1 e1 + x2 e2 + · · · + xn en to see that this gives Q(x). We know that the equation 〈Ax, x〉 = Q(x) determines the symmetric matrix A uniquely.

11.1.

p = 1, 2:   L = ( 1 0 )   U = ( 0 1 )
                ( 0 1 ),      ( 1 0 )

p = 2, 1:   L = ( 1 0 )   U = ( 1 0 )
                ( 0 1 ),      ( 0 1 )

p = 2, 1:   L = ( 1 0 )   U = ( 1 0 )
                ( 0 1 ),      ( 0 1 )

12.14. If b(x, y) = 0 for all y, then plug in y = x to find b(x, x) = 0.

12.18.

(1/√2) (1, 1),   (1/√2) (1, −1).

12.19.

(a) Suppose that ξ = Tx and η = Ty, i.e. ξ(z) = b(x, z) and η(z) = b(y, z) for any vector z. Then ξ(z) + η(z) = b(x, z) + b(y, z) = b(x + y, z), so Tx + Ty = ξ + η = T(x + y). Similarly for scaling: aTx = T(ax).

(b) If Tx = 0 then 0 = b(x, y) for all vectors y from V, so x lies in the kernel of b, i.e. the kernel of T is the kernel of b. Since b is nondegenerate, the kernel of b consists precisely in the 0 vector. Therefore T is 1-to-1.

(c) T : V → V∗ is a 1-to-1 linear map, and V and V∗ have the same dimension, say n. Since T is 1-to-1, dim ker T = 0, so dim im T = dim ker T + dim im T = dim V = n. Therefore T is onto, hence T is an isomorphism.

(d) You have to take x = T−1ξ.

13.1. The tensor product is δijξk. There are two contractions:

a. δiiξk = n ξk (in other words, nξ), and

b. δijξi = ξj (in other words, ξ).

13.3. ε_{ijk} x^i y^j z^k = det( x  y  z ).
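A numerical spot check of this identity (a sketch; eps below plays the role of the Levi-Civita symbol on three indices):

    import itertools
    import numpy as np

    def eps(i, j, k):
        # sign of (i, j, k) as a permutation of (0, 1, 2); zero if an index repeats
        signs = {(0, 1, 2): 1, (1, 2, 0): 1, (2, 0, 1): 1,
                 (0, 2, 1): -1, (2, 1, 0): -1, (1, 0, 2): -1}
        return signs.get((i, j, k), 0)

    x, y, z = np.random.default_rng(0).standard_normal((3, 3))
    lhs = sum(eps(i, j, k) * x[i] * y[j] * z[k]
              for i, j, k in itertools.product(range(3), repeat=3))
    print(np.isclose(lhs, np.linalg.det(np.column_stack([x, y, z]))))   # True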


13.5. One contraction is s_k = t^i_{ik}.

(F∗t)^i_{ik} = F^i_l t^l_{pq} (F^{−1})^p_i (F^{−1})^q_k
            = t^l_{pq} (F F^{−1})^p_l (F^{−1})^q_k
            = t^l_{lq} (F^{−1})^q_k
            = (F∗s)_k.

14.3.

x⊗ y = 4 e1 ⊗ e1 + 5 e1 ⊗ e2 + 8 e2 ⊗ e1 + 10 e2 ⊗ e2 + 12 e3 ⊗ e1 + 15 e3 ⊗ e2
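In coordinates this is just an outer product. The vectors x and y are not restated in the hint, but the coefficients above are consistent with x = (1, 2, 3) and y = (4, 5), used here as an inferred example:

    import numpy as np
    x = np.array([1, 2, 3])   # inferred from the coefficients 4, 5, 8, 10, 12, 15 above
    y = np.array([4, 5])
    print(np.outer(x, y))
    # [[ 4  5]
    #  [ 8 10]
    #  [12 15]]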

14.4. Picking a basis v1, v2, . . . , vp for V and w1, w2, . . . , wq for W, we have already seen that every tensor in V ⊗ W has the form

t^{iJ} vi ⊗ wJ,

so is a sum of pure tensors.

14.11. Take any linear map T : V → V and define a tensor t in V ⊗ V∗, which is thus a bilinear map t(ξ, v) for ξ in V∗ and v in V∗∗ = V, by the rule

t (ξ, v) = ξ(Tv).

Clearly if we scale T, then we scale t by the same amount. Similarly, if we add linear maps on the right side of equation 15, then we add tensors on the left side. Therefore the mapping taking T to t is linear. If t = 0 then ξ(Tv) = 0 for any vector v and covector ξ. Thus Tv = 0 for any vector v, and so T = 0. Therefore the map taking T to t is 1-to-1. Finally, we need to see that the map taking T to t is onto, the tricky part. But we can count dimensions for that: if dim V = n then dim Hom(V, V) = n^2 and dim V ⊗ V∗ = n^2.




List of notation

Rn        The space of all vectors with n real number entries, 17
p × q     Matrix with p rows and q columns, 17
0         Any matrix whose entries are all zeroes, 19
Σ         Sum, 22
I         The identity matrix, 25
In        The n × n identity matrix, 25
ei        The i-th standard basis vector (also the i-th column of the identity matrix), 25
A^{−1}    The inverse of a square matrix A, 28
det       The determinant of a square matrix, 47
A^t       The transpose of a matrix A, 60
dim       Dimension of a subspace, 82
ker A     The kernel of a matrix A, 85
im A      The image of a matrix A, 89
〈x, y〉    Inner product of two vectors, 117
||x||     The length of a vector, 117
|z|       Modulus (length) of a complex number, 140
arg z     Argument (angle) of a complex number, 140
C         The set of all complex numbers, 140
Cn        The set of all complex vectors with n entries, 140
〈z, w〉    Hermitian inner product, 142
A∗        Adjoint of a complex matrix or complex linear map A, 143


Index

alphabetical order, see partition, alphabetical order, 79
analytic function, 61
antisymmetrize, 120
associated
    matrix, see matrix, associated

basis
    dual, 92
bilinear form, 107
    degenerate, 108
    nondegenerate, 108
    positive definite, 109
    symmetric, 109
bilinear map, 130
block
    Jordan, see Jordan block
Boolean numbers, 22

Cayley–Hamilton
    theorem, see theorem, Cayley–Hamilton
commuting, 58
complement, see subspace, complementary
complementary subspace, see subspace, complementary
complex
    linear map, see linear map, complex
    vector space, see vector space, complex
component, 117
components, see principal components, 95
composition, 9
covector, 91, 118
    dual, 134

degenerate
    bilinear form, see bilinear form, degenerate
dimension
    effective, 97
direct sum, 8, see subspace, direct sum

effective dimension, see dimension, effective
eigenvector
    generalized, 46

field, 21
form
    bilinear, see bilinear form
    volume, 140
form, exterior, 139

generalized
    eigenvector, see eigenvector, generalized

Hermitian
    inner product, see inner product, Hermitian
homomorphism, 91

identity
    permutation, see permutation, identity
identity map, see linear, map, identity
inertia
    law of, 111
inner product, 18
    Hermitian, 18
    space, 18
intersection, 35
invariant
    subspace, see subspace, invariant
inverse, 21
isomorphism, 10


Jordan
    block, 44
    normal form, 44

linear
    map, 9
        complex, 17
        identity, 9

map
    linear, see linear map
matrix
    associated, 9
    skew-adjoint, 64
    skew-symmetric, 64
minimal polynomial, see polynomial, minimal
multilinear, 127
multiplicative
    function, 51

nilpotent, 58
nondegenerate
    bilinear form, see bilinear form, nondegenerate
normal form
    Jordan, see Jordan normal form

partition, 79
    alphabetical order, 79
    associated to a permutation, 79
permutation
    identity, 101
    natural, of partition, 80
Pfaffian, 81
polarization, 135
polynomial, 135
    homogeneous, 135
    minimal, 54
principal component, 97
principal components, 95
product
    tensor, see tensor, product
product, inner, see inner product
pure
    tensor, see tensor, pure

quadratic form
    kernel, 115
quotient space, 14

rank
    tensor, see tensor, rank
real
    vector space, see vector space, real
reciprocal, 21
restriction, 13, 57

skew-symmetric
    normal form, 77
string, 44
subspace
    complement, see subspace, complementary
    complementary, 35
    direct sum, 35
    invariant, 58
    sum, 35
sum
    of subspaces, see subspace, sum
sum, direct, see subspace, direct sum
Sylvester, 111
symmetrize, 120

tensor, 117, 128
    antisymmetric, 120
    contraction, 119
    contravariant, 134
    covariant, 134
    product, 119
    pure, 130
    rank, 130
    symmetric, 120
tensor product
    basis, 129
    of tensors, 128
    of vector spaces, 128
    of vectors, 128
theorem
    Cayley–Hamilton, 56
trace, 63, 74, 120
translate, 13
transpose, 93
transverse, 38

vector, 3
vector space, 3
    complex, 17
    real, 17

weight
    of polynomial, 72