MA 631 Notes Fall 13


    Notes for Linear Algebra

    Fall 2013

    1 Vector spaces

Definition 1.1 [Vector spaces] Let V be a set with a binary operation +, F a field, and (c, v) ↦ cv a mapping from F × V into V. Then V is called a vector space over F (or a linear space over F) if

(i) u + v = v + u for all u, v ∈ V
(ii) u + (v + w) = (u + v) + w for all u, v, w ∈ V
(iii) ∃ 0 ∈ V : v + 0 = v for all v ∈ V
(iv) ∀ u ∈ V ∃ −u ∈ V : u + (−u) = 0
(v) (ab)v = a(bv) for all a, b ∈ F and v ∈ V
(vi) (a + b)v = av + bv for all a, b ∈ F and v ∈ V
(vii) a(u + v) = au + av for all a ∈ F and u, v ∈ V
(viii) 1v = v for all v ∈ V

Note that (i)-(iv) mean that (V, +) is an abelian group. The mapping (c, v) ↦ cv is called scalar multiplication, the elements of F scalars, the elements of V vectors. The vector 0 ∈ V is called the zero vector (do not confuse it with 0 in F).

In this course we practically study two cases: F = ℝ and F = ℂ. We call them real vector spaces and complex vector spaces, respectively.

Examples 1.2 [of vector spaces] (a) F a field, F^n = {(x1, . . . , xn) : xi ∈ F} the n-tuple space. The operations are defined by

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn)

c(x1, . . . , xn) = (cx1, . . . , cxn)

Note that ℝ^2 and ℝ^3 can be interpreted as the ordinary plane and space, respectively, in rectangular coordinate systems.

(b) F a field, S a set, V = {f : S → F} (arbitrary functions from S to F). The operations are defined by

(f + g)(s) = f(s) + g(s) for f, g ∈ V, s ∈ S

(cf)(s) = cf(s) for c ∈ F, f ∈ V, s ∈ S


(c) F[x], the space of polynomials over a field F in the unknown x. Note that by definition these are identified with sequences (a0, a1, a2, . . .), an ∈ F for all n, such that only finitely many of the an are non-zero. Addition and scalar multiplication are defined componentwise.

(d) F a field, F^{m×n} the space of m × n matrices with components in F. For A ∈ F^{m×n} denote by Aij the component of the matrix A in the ith row and the jth column, i ∈ [1, m], j ∈ [1, n]. Then the operations are defined by

    (A + B)ij = Aij + Bij

    (cA)ij = cAij

Proposition 1.3 [Basic properties of vector spaces] Let V be a vector space over F, v ∈ V and c ∈ F. Then

c0 = 0

0v = 0

(−1)v = −v

cv = 0 ⟺ c = 0 or v = 0

Definition 1.4 [Linear combination] Let v1, . . . , vn ∈ V and c1, . . . , cn ∈ F. Then the vector c1v1 + ··· + cnvn is called a linear combination of v1, . . . , vn.

    Definition 1.5 [Subspace] A subset of a vector space V which is itself a vector space

    with respect to the operations in V is called a subspace of V.

Theorem 1.6 [Subspace criterion] A non-empty subset W of a vector space V is a subspace of V if and only if

au + bv ∈ W for all a, b ∈ F, u, v ∈ W.

Examples 1.7 [of subspaces]
(a) Let I ⊆ ℝ be an interval. Then C(I), the set of all continuous real-valued functions on I, is a subspace of the vector space of all functions on I. If k is a positive integer, then C^k(I), the k times continuously differentiable functions on I, is a subspace of C(I).
(b) F[x]_n, the polynomials of degree ≤ n over F, is a subspace of F[x].
(c) The space of symmetric n × n matrices (i.e. such that Aij = Aji) is a subspace of F^{n×n}.
(e) W = {0} is a trivial subspace in any vector space (it only contains the zero vector).
(f) If C is a collection of subspaces of V, then their intersection

⋂_{W∈C} W

is a subspace of V. This is not true for unions.


Definition 1.8 [Span] Let A be a subset of a vector space V. Denote by C the collection of all subspaces of V that contain A. Then the span of A is the subspace of V defined by

span A = ⋂_{W∈C} W

Note: span A is the smallest subspace of V which contains A.

Theorem 1.9 If A = ∅, then span A = {0}; otherwise span A is the set of all linear combinations of elements of A.

Example 1.10 The vectors (3, 5) and (0, 2) span ℝ^2.

Example 1.11 If W1, W2 ⊆ V are subspaces, then

span(W1 ∪ W2) = W1 + W2

where

W1 + W2 := {u + v : u ∈ W1, v ∈ W2}

Definition 1.12 [Linear independence, linear dependence] A finite set of vectors v1, . . . , vn is said to be linearly independent if c1v1 + ··· + cnvn = 0 implies c1 = ··· = cn = 0. In other words, all nontrivial linear combinations are different from zero. Otherwise, the set {v1, . . . , vn} is said to be linearly dependent.

An infinite set of vectors is said to be linearly independent if every finite subset of it is linearly independent.

    Examples 1.13 (a) In the space Fn, the vectors

    e1 = (1, 0, 0, . . . , 0)

    e2 = (0, 1, 0, . . . , 0)...

    en = (0, 0, 0, . . . , 1)

are linearly independent.
(b) In F[x]_n, the polynomials 1, x, x^2, . . . , x^n are linearly independent. In F[x], the infinite collection of polynomials {x^i : 0 ≤ i < ∞} is linearly independent.
(c) If a set of vectors contains the zero vector, it is linearly dependent.

Lemma 1.14 Let S be a linearly independent subset of V. Suppose that u ∈ V is not contained in span S. Then the set S ∪ {u} is linearly independent.

Definition 1.15 [Basis] A basis of V is a linearly independent set B ⊆ V such that span B = V.


Examples 1.16 a) {e1, . . . , en} is a basis of F^n (the canonical basis).
b) {1, x, . . . , x^n} is a basis of F[x]_n. The infinite collection of polynomials {x^i : 0 ≤ i < ∞} is a basis of F[x].

Theorem 1.17 Let V ≠ {0} be a vector space and S a linearly independent subset of V. Then there exists a basis B of V such that S ⊆ B. In particular, every non-trivial vector space has a basis.

Theorem 1.18 Let V be spanned by u1, . . . , un, and let v1, . . . , vm be linearly independent in V. Then m ≤ n.

    Definition 1.19 [finite-dimensional vector space] A vector space V is said to be finite-dimensional if it has a finite basis.

Corollary 1.20 Let V be a finite-dimensional vector space. Then
(a) Each basis of V is finite.
(b) If {u1, . . . , um} and {v1, . . . , vn} are two bases of V, then m = n.

    Definition 1.21 [Dimension] The dimension of a finite-dimensional vector space V isthe number of vectors in any basis of V. It is denoted by dim V.

    The trivial vector space {0}, the one consisting of a single zero vector, has no bases,and we define dim{0} = 0.

Examples 1.22 (a) dim F^n = n
(b) dim F[x]_n = n + 1
(c) dim F^{m×n} = mn
(d) F[x] is not a finite-dimensional space

Corollary 1.23 Let V be a finite dimensional vector space.
(a) If W is a subspace of V, then W is finite dimensional and dim W ≤ dim V.
(b) If W is a proper subspace of V, then dim W < dim V.

Theorem 1.24 Let W1 and W2 be finite dimensional subspaces of V. Then W1 + W2 is finite dimensional and

dim W1 + dim W2 = dim(W1 ∩ W2) + dim(W1 + W2).

Lemma 1.25 If {u1, . . . , un} is a basis for V, then for any v ∈ V there exist unique scalars c1, . . . , cn such that

v = c1u1 + ··· + cnun


Definition 1.26 [Coordinates]
Let B = {u1, . . . , un} be an ordered basis of V. If v = c1u1 + ··· + cnun, then (c1, . . . , cn) are the coordinates of the vector v with respect to the basis B. Notation:

(v)_B = (c1, . . . , cn)^t

(a column vector) is the coordinate vector of v relative to B.

Examples 1.27 (a) The canonical coordinates of a vector v = (x1, . . . , xn) ∈ F^n in the standard (canonical) basis {e1, . . . , en} are its components x1, . . . , xn, since v = x1e1 + ··· + xnen.

(b) The coordinates of a polynomial p ∈ P_n(ℝ) given by

p = a0 + a1x + a2x^2 + ··· + anx^n

in the basis {1, x, . . . , x^n} are its coefficients a0, a1, . . . , an.

Theorem 1.28 [coordinate mapping] Let dim V = n and B = {u1, . . . , un} be an ordered basis of V. Then the mapping

M_B : V → F^n given by v ↦ (v)_B

is a bijection. For any u, v ∈ V and c ∈ F we have

M_B(u + v) = M_B(u) + M_B(v)

M_B(cv) = c M_B(v)

Note: The mapping M_B : V → F^n not only is a bijection, but also preserves the vector operations. Since there is nothing else defined in V, we have a complete identity of V and F^n. Any property of V can be proven by first substituting F^n for V and then using the mapping M_B.

Theorem 1.29 [Change of coordinates] Let B = {u1, . . . , un} and B′ = {u′1, . . . , u′n} be two ordered bases of V. Denote by P_{B′,B} the n × n matrix with jth column given by

(P_{B′,B})_j = (u′_j)_B

for j = 1, . . . , n. Then the matrix P_{B′,B} is invertible with (P_{B′,B})^{-1} = P_{B,B′} and

(v)_B = P_{B′,B} (v)_{B′}

for every v ∈ V.
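In coordinates, the change of basis above is a single matrix-vector product. A minimal numerical sketch (assuming the numpy library; the basis vectors below are made up for the illustration, not taken from the text):

    import numpy as np

    # columns of P are the B-coordinates of the new basis vectors u'_1, u'_2
    P = np.column_stack([[1., 1.], [1., -1.]])

    v_Bp = np.array([2., 3.])     # coordinates of some v relative to B'
    v_B = P @ v_Bp                # (v)_B = P (v)_{B'}; here [5., -1.]

    # going back: (v)_{B'} = P^{-1} (v)_B
    assert np.allclose(np.linalg.solve(P, v_B), v_Bp)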


Definition 1.30 [Row space, row rank] Let A ∈ F^{m×n}. Denote by vi = (Ai1, . . . , Ain) ∈ F^n the ith row vector of A. Then the subspace span{v1, . . . , vm} of F^n is called the row space of A, and dim(span{v1, . . . , vm}) is called the row rank of A.

Definition 1.31 [elementary row operations, row equivalence] The following operations on m × n matrices are called elementary row operations:

(a) Multiplication of one row by a non-zero scalar.
(b) Addition of a multiple of one row to another.
(c) Interchange of two rows.

Two matrices A, B ∈ F^{m×n} are called row-equivalent if A can be transformed into B by a finite series of elementary row operations.

    Note: This is an equivalence relation.

    Theorem 1.32 Row equivalent matrices have the same row space.

Note: Theorem 1.32 has a converse: if two matrices have the same row space, then they are row equivalent (a proof may be found in textbooks). We will not need this fact.

Definition 1.33 [row echelon matrix] Let A ∈ F^{m×n} have row vectors v1, . . . , vm. A is called a row-echelon matrix if

(a) There exists an integer M such that v1, . . . , vM ≠ 0 and vM+1, . . . , vm = 0.
(b) If ji = min{j : Aij ≠ 0}, 1 ≤ i ≤ M, then j1 < j2 < . . . < jM.

    Theorem 1.34 If R is a row echelon matrix, then the non-zero row vectors of R are abasis of the row space of R.

Remark 1.35 (i) Every matrix is row equivalent to a row echelon matrix.
(ii) Theorems 1.32 and 1.34 show how to find a basis of span{v1, . . . , vm} for a given set of vectors v1, . . . , vm ∈ F^n: form the matrix with rows v1, . . . , vm, reduce it to row echelon form, and take the non-zero rows. In particular, this gives the dimension of that subspace. A small computational sketch is given below.
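A sketch of Remark 1.35(ii), assuming the sympy library (the vectors are made up for the example): the non-zero rows of a row echelon form of the matrix with rows v1, . . . , vm form a basis of their span.

    from sympy import Matrix

    # rows are the given vectors v1, ..., vm (here in Q^3)
    A = Matrix([[1, 2, 3],
                [2, 4, 6],
                [1, 0, 1]])

    R, pivots = A.rref()
    basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]
    print(len(basis), A.rank())    # 2 2: the span is 2-dimensional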

Definition 1.36 [direct sums, complementary subspaces] Let W1 and W2 be subspaces of a vector space V. Their sum W1 + W2 is called direct if W1 ∩ W2 = {0}. In this case one writes W1 ⊕ W2 for their sum.

If W1 ⊕ W2 = V, then the subspaces W1 and W2 are called complementary subspaces of V.

Theorem 1.37 Let W1 and W2 be subspaces of V. Then the following are equivalent:
(i) The sum of W1 and W2 is direct.
(ii) If w1 ∈ W1, w2 ∈ W2 and w1 + w2 = 0, then w1 = w2 = 0.
(iii) For each w ∈ W1 + W2 there exist unique vectors w1 ∈ W1 and w2 ∈ W2 such that w = w1 + w2.
(iv) If B1 is a basis of W1 and B2 is a basis of W2, then B1 ∪ B2 is a basis of W1 + W2.


    Corollary 1.38 For every subspace W of V there exists a subspace U of V such thatW and U are complementary.

    Note: U is generally not unique.

Example 1.39 Let V = C[0, 1]. Let W1 = {f ∈ V : ∫_0^1 f(x) dx = 0} and W2 = {f ∈ V : f ≡ const}. Then W1 and W2 are complementary subspaces of V.

Definition 1.40 [external direct sum] Let V and W be vector spaces over F. The external direct sum V ⊕ W of V and W is defined by

V ⊕ W := {(v, w) | v ∈ V, w ∈ W},

(v1, w1) + (v2, w2) = (v1 + v2, w1 + w2) for all v1, v2 ∈ V, w1, w2 ∈ W,

c(v, w) = (cv, cw) for all v ∈ V, w ∈ W, c ∈ F.

Theorem 1.41 Let V ⊕ W be the external direct sum of V and W. Then
(a) V ⊕ W is a vector space over F.
(b) If V and W are finite dimensional, then V ⊕ W is finite-dimensional and

dim(V ⊕ W) = dim V + dim W.

(c) V′ := {(v, 0) | v ∈ V} and W′ := {(0, w) | w ∈ W} are complementary subspaces of V ⊕ W.


    2 Linear Transformations

Definition 2.1 [Linear transformation]
Let V and W be vector spaces over a field F. A linear transformation from V to W is a mapping T : V → W such that

(i) T(u + v) = Tu + Tv for all u, v ∈ V;
(ii) T(cu) = cTu for all u ∈ V, c ∈ F.

If V = W then T is often called a linear operator on V.

Theorem 2.2 [Elementary properties of linear transformations] Let T : V → W be a linear transformation. Then
(a) T(0) = 0, T(−u) = −Tu, T(Σ ciui) = Σ ciTui,
(b) Im T = {Tu : u ∈ V} is a subspace of W (also denoted by R(T)),
(c) Ker T = {u ∈ V : Tu = 0} is a subspace of V (also denoted by N(T)),
(d) T is injective (one-to-one) if and only if Ker T = {0}.

Examples 2.3 [of linear transformations] (a) A matrix A ∈ F^{m×n} defines a linear transformation T : F^n → F^m by Tu = Au (matrix multiplication). We denote this transformation by T_A.

(b) T : C^1(0, 1) → C(0, 1) defined by Tf = f′, where f′ is the derivative of the function f.

(c) T : ℝ[x]_n → ℝ[x]_{n−1} defined by Tf = f′, as above.

(d) T : C[0, 1] → ℝ defined by Tf = ∫_0^1 f(x) dx.

Theorem 2.4 Let V and W be vector spaces, where V is finite dimensional with basis {u1, . . . , un}. Let {v1, . . . , vn} be a subset of W. Then there exists a unique linear transformation T : V → W such that Tui = vi for all 1 ≤ i ≤ n.

Theorem 2.5 Let T : F^n → F^m be a linear transformation. Then there exists a unique matrix A ∈ F^{m×n} such that T = T_A.

Definition 2.6 [Rank of a linear transformation] Let T : V → W be linear and Im(T) be finite dimensional. Then

rank(T) := dim Im(T). (2.1)

Theorem 2.7 Let T : V → W be a linear transformation and V finite dimensional. Then Im(T) is finite dimensional and

rank(T) + dim Ker(T) = dim V.

Definition 2.8 [Column space, column rank] Let A ∈ F^{m×n}. Then the subspace of F^m spanned by the columns of A is called the column space of A, and its dimension the column rank of A.


Lemma 2.9 If R ∈ F^{m×n} is a row echelon matrix, then

row rank R = column rank R.

Theorem 2.10 Let A ∈ F^{m×n} and T = T_A the induced linear transformation from F^n to F^m. Then
(a) Im(T_A) is the column space of A. In particular rank(T_A) = column rank(A),
(b) dim Ker(T_A) = n − row rank(A),
(c) row rank(A) = column rank(A).

Definition 2.11 [Rank of a matrix] Let A ∈ F^{m×n}. Then

rank(A) := row rank(A) = column rank(A).
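A quick numerical illustration of Theorem 2.10(c) (a sketch, assuming numpy; the matrix is made up for the example): the rank of a matrix equals the rank of its transpose.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [0., 1., 1.]])

    # column rank = rank of A, row rank = rank of A^t; they agree
    print(np.linalg.matrix_rank(A))    # 2
    print(np.linalg.matrix_rank(A.T))  # 2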

Definition 2.12 Let V and W be vector spaces over F. The set of all linear transformations from V to W is denoted by L(V, W).

Theorem 2.13 L(V, W) is a vector space (over F), with ordinary addition and multiplication by scalars as defined for functions (from V to W). If V and W are finite dimensional, then L(V, W) is finite dimensional and

dim L(V, W) = dim V · dim W.

    Note: A proof of this will follow from Theorem 2.25.

Proposition 2.14 Let V, W, Z be vector spaces over F. For any T ∈ L(V, W) and U ∈ L(W, Z) the composition U ∘ T, also denoted by UT, is a linear transformation from V to Z, i.e. UT ∈ L(V, Z).

Example 2.15 Let A ∈ F^{m×n}, B ∈ F^{k×m} and T_A, T_B be defined as in Example 2.3(a). Then the composition T_B T_A is a linear transformation from F^n to F^k given by the product matrix BA, i.e. T_B T_A = T_{BA}.

Definition 2.16 [Isomorphism] A transformation T ∈ L(V, W) is called an isomorphism if T is bijective. If an isomorphism T ∈ L(V, W) exists, the vector spaces V and W are called isomorphic.

Proposition 2.17 If T ∈ L(V, W) is an isomorphism, then T^{-1} ∈ L(W, V) is an isomorphism.


Theorem 2.18 Let V and W be finite dimensional, and T ∈ L(V, W).
(i) T is injective if and only if whenever {u1, . . . , uk} is linearly independent, then {Tu1, . . . , Tuk} is also linearly independent.
(ii) T is surjective (i.e. Im(T) = W) if and only if rank(T) = dim W.
(iii) T is an isomorphism if and only if whenever {u1, . . . , un} is a basis in V, then {Tu1, . . . , Tun} is a basis in W.
(iv) If T is an isomorphism, then dim V = dim W.

Theorem 2.19 Let V and W be finite dimensional, dim V = dim W, and T ∈ L(V, W). Then the following are equivalent:
(i) T is an isomorphism,
(ii) T is injective,
(iii) T is surjective.

Definition 2.20 [Automorphism, GL(V)] An isomorphism T ∈ L(V, V) is called an automorphism of V. The set of all automorphisms of V is denoted by GL(V), which stands for general linear group. In the special case V = F^n one sometimes writes GL(V) = GL(n, F).

Note: GL(V) is not a subspace of L(V, V) (it does not contain zero, for example), but it is a group with respect to composition. This group is not abelian, i.e. TU ≠ UT for some T, U ∈ GL(V), unless dim V = 1.

Example 2.21 Let T : ℝ[x]_n → ℝ[x]_n be defined by Tf(x) = f(x + a), where a ∈ ℝ is a fixed number. Then T is an isomorphism.

Theorem 2.22 If dim V = n, then V is isomorphic to F^n.

Definition 2.23 [Matrix representation] Let dim V = n and dim W = m. Let B = {u1, . . . , un} be a basis in V and C = {v1, . . . , vm} be a basis in W. Let T ∈ L(V, W). The unique matrix A defined by

Tu_j = Σ_{i=1}^m A_ij v_i

is called the matrix of T relative to B, C and denoted by [T]_{B,C}.

Theorem 2.24 In the notation of Definitions 1.26 and 2.23, for each vector u ∈ V

(Tu)_C = [T]_{B,C} (u)_B.

Theorem 2.25 The mapping T ↦ [T]_{B,C} defines an isomorphism from L(V, W) to F^{m×n}.
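As an illustration of Definition 2.23 and Theorem 2.24, the following sketch (assuming numpy) builds the matrix of the differentiation operator of Example 2.3(c) relative to the monomial bases:

    import numpy as np

    # Matrix of T f = f' from R[x]_3 to R[x]_2 relative to the bases
    # B = {1, x, x^2, x^3} and C = {1, x, x^2}: column j holds the
    # C-coordinates of T(x^j) = j x^(j-1).
    A = np.zeros((3, 4))
    for j in range(1, 4):
        A[j - 1, j] = j

    p_B = np.array([5., 0., 3., 2.])   # p = 5 + 3x^2 + 2x^3
    print(A @ p_B)                     # [0. 6. 6.], i.e. p' = 6x + 6x^2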


Theorem 2.26 Let B, C, D be bases in the finite dimensional vector spaces V, W, Z, respectively. For any T ∈ L(V, W) and U ∈ L(W, Z) we have

[UT]_{B,D} = [U]_{C,D} [T]_{B,C}.

Corollary 2.27 Let V, W be finite dimensional with bases B, C. Then T ∈ L(V, W) is an isomorphism if and only if [T]_{B,C} is an invertible matrix. In this case

([T]_{B,C})^{-1} = [T^{-1}]_{C,B}.

Corollary 2.28 Let B = {u1, . . . , un} and B′ = {u′1, . . . , u′n} be two bases in V. Then

[I_V]_{B′,B} = P_{B′,B}

where P_{B′,B} is the transition matrix defined in Theorem 1.29.

Theorem 2.29 Let V and W be finite dimensional, B and B′ bases in V, and C and C′ bases in W. Then for every T ∈ L(V, W) we have

[T]_{B′,C′} = Q [T]_{B,C} P

where

P = [I]_{B′,B} and Q = [I]_{C,C′}

In the special case V = W, B = C and B′ = C′ we have

[T]_{B′,B′} = P^{-1} [T]_{B,B} P.

Definition 2.30 [Similar matrices] Two matrices A, A′ ∈ F^{n×n} are said to be similar, denoted A ∼ A′, if there is an invertible matrix P ∈ F^{n×n} such that

A′ = P^{-1} A P.

Note: Similarity is an equivalence relation.

Theorem 2.31 Two matrices A and A′ in F^{n×n} are similar if and only if there exist an n-dimensional vector space V, T ∈ L(V, V), and two bases B, B′ of V with A = [T]_{B,B} and A′ = [T]_{B′,B′}.

Theorem 2.32 Let V and W be finite dimensional with bases B, C. For all T ∈ L(V, W),

rank(T) = rank [T]_{B,C}.

Corollary 2.33 If A and A′ are similar, then rank(A) = rank(A′).


Definition 2.34 [Linear functional, Dual space]
Let V be a vector space over F. Then V* := L(V, F) is called the dual space of V. The elements f ∈ V* are called linear functionals on V (they are linear transformations from V to F).

Corollary 2.35 V* is a vector space. If V is finite dimensional, then dim V* = dim V.

Examples 2.36 (a) T : C[0, 1] → ℝ defined by Tf = ∫_0^1 f(x) dx.
(b) The trace of a square matrix A ∈ F^{n×n} is defined by tr A = Σ_i Aii. Then tr ∈ (F^{n×n})*.
(c) Every linear functional f on F^n is of the form

f(x1, . . . , xn) = Σ_{i=1}^n ai xi

for some ai ∈ F (and each f of this form is a linear functional).

Theorem 2.37 Let V be finite dimensional and B = {u1, . . . , un} be a basis of V. Then there is a unique basis B* = {f1, . . . , fn} of V* such that

fi(uj) = δij,

where δij = 1 if i = j and 0 if i ≠ j (Kronecker delta symbol). B* has the property that

f = Σ_{i=1}^n f(ui) fi

for all f ∈ V*.

Definition 2.38 The basis B* found in Theorem 2.37 is called the dual basis of B.

Example 2.39 In Example 2.36(c) the dual basis of the standard basis B = {e1, . . . , en} is given by fi(x1, . . . , xn) = xi, i = 1, . . . , n.

    Systems of linear equations

Definition 2.40 [Systems of linear equations] A system of m linear equations in n unknowns x1, . . . , xn is defined by

a11 x1 + . . . + a1n xn = b1
a21 x1 + . . . + a2n xn = b2
   ...
am1 x1 + . . . + amn xn = bm          (2.2)

where aij ∈ F and bi ∈ F are given. If A ∈ F^{m×n} has entries Aij = aij and x ∈ F^{n×1} and b ∈ F^{m×1} are column vectors with entries x1, . . . , xn and b1, . . . , bm, then (2.2) can be written as

Ax = b.

Every x ∈ F^{n×1} with Ax = b is called a solution of Ax = b. Systems of the form Ax = 0 are called homogeneous. If b ≠ 0, then Ax = b is called inhomogeneous.


Theorem 2.41 [homogeneous systems] If A ∈ F^{m×n}, then the set of solutions of Ax = 0 is a subspace of F^{n×1} of dimension n − rank(A). In particular, x1 = x2 = . . . = xn = 0 is the unique solution of Ax = 0 if and only if rank(A) = n.

Theorem 2.42 [inhomogeneous systems] Let A ∈ F^{m×n} and b ∈ F^{m×1}. Define the matrix (A, b) ∈ F^{m×(n+1)} by adjoining b as (n + 1)-st column to A.

(i) The inhomogeneous system Ax = b has at least one solution if and only if rank(A, b) = rank(A).

(ii) Suppose that rank(A, b) = rank(A). Then the solution of Ax = b is unique if and only if rank(A) = n.

(iii) Let rank(A, b) = rank(A) < n (i.e. Ax = b has multiple solutions), and let xp be a particular solution of Ax = b. Then x is a solution of Ax = b if and only if x = xp + xh, where xh is a solution of Ax = 0.

Corollary 2.43 [elementary form of Fredholm's Alternative] Let A ∈ F^{n×n}. Then exactly one of the following is true:

(i) Ax = 0 has a non-trivial solution.
(ii) Ax = b is solvable for every b ∈ F^{n×1}.

    Gaussian elimination

Definition 2.44 [Equivalent systems] Let A, A′ ∈ F^{m×n}, b, b′ ∈ F^{m×1}. Then the systems Ax = b and A′x = b′ are called equivalent if and only if they have the same set of solutions.

Theorem 2.45 If (A, b) and (A′, b′) are row equivalent, then Ax = b and A′x = b′ are equivalent.

Remark 2.46 [Gaussian elimination] The inhomogeneous system Ax = b is solved by the following method:

To (A, b) find a row-equivalent matrix (R, b′), where R is a row-echelon matrix.

If b′ has non-zero entries below the non-zero rows of R, then Ax = b has no solution.

Otherwise solve Rx = b′ by backward substitution. Denote the first non-zero entries in each row of R as pivots. If the k-th column of R contains a pivot, then xk is a determined variable. Otherwise xk is a free variable (any choice will lead to a solution).

Example 2.47 Consider

[  1   1   1 ] [x1]   [ 1 ]
[ −1   2  −1 ] [x2] = [ 5 ]
[  1  −1   1 ] [x3]   [ b ]


This system is equivalent to

[ 1  1  1 ] [x1]   [   1   ]
[ 0  3  0 ] [x2] = [   6   ]
[ 0  0  0 ] [x3]   [ b + 3 ]

If b ≠ −3, then the system has no solution. If b = −3, then x3 is free, x2 = 2 and x1 = −1 − x3, i.e. the general solution is

x = (−1 − x3, 2, x3)^t.
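A computational sketch of this elimination (assuming sympy; the entries follow the reconstruction of Example 2.47 above, with b = −3):

    from sympy import Matrix, symbols, linsolve

    x1, x2, x3 = symbols('x1 x2 x3')

    # augmented matrix (A, b) of the system with b = -3
    M = Matrix([[ 1,  1,  1,  1],
                [-1,  2, -1,  5],
                [ 1, -1,  1, -3]])

    print(M.rref()[0])                  # reduced row echelon form of (A, b)
    print(linsolve(M, (x1, x2, x3)))    # {(-x3 - 1, 2, x3)}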


    3 Determinants

Definition 3.1 [Permutation, Transposition, Sign]
A permutation of a set S is a bijective mapping σ : S → S. If S = {1, . . . , n} we call σ a permutation of degree n (or a permutation of n letters).

Notation: σ = (σ(1), . . . , σ(n)). S_n is the set of all permutations of degree n.

A permutation which interchanges two numbers and leaves all the others fixed is called a transposition. Notation: (j, k).

If σ ∈ S_n, then the sign of σ is defined as follows: sg(σ) = 1 if σ can be written as a product of an even number of transpositions, and sg(σ) = −1 if σ can be written as a product of an odd number of transpositions.

    Note: The sign of a permutation is well-defined, see MA634.

Definition 3.2 [Determinant] Let A ∈ F^{n×n}. Then the determinant of A is

det(A) = Σ_{σ∈S_n} sg(σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}.

Note: every term in this sum contains exactly one element from each row and exactly one element from each column.

Examples 3.3 (i) If A ∈ F^{2×2}, then det(A) = a11 a22 − a12 a21.
(ii) Let A = diag(a11, . . . , ann) be a diagonal matrix (i.e., A is the n × n matrix with diagonal elements a11, . . . , ann and zeros off the diagonal). Then det(A) = a11 ··· ann.

Theorem 3.4 Let A, B, C ∈ F^{n×n}.
(a) If B is obtained from A by multiplying one row of A by k, then

det(B) = k det(A).

(Note that det(kB) = k^n det(B).)
(b) If A, B and C are identical, except for the i-th row, where

cij = aij + bij (1 ≤ j ≤ n),

then

det(C) = det(A) + det(B).

(Note that in general det(A + B) ≠ det(A) + det(B).)
(c) If B is obtained from A by interchange of two rows, then

det(B) = −det(A).

(d) If A has two equal rows, then det(A) = 0.
(e) If B is obtained from A by adding a multiple of one row to another row, then

det(B) = det(A).


Remark 3.5 Let A ∈ F^{n×n}. By using two elementary row operations, row interchange and adding a multiple of one row to another row, we can transform A into a row echelon matrix R, whose leading entries in non-zero rows are non-zero numbers (not necessarily ones). By the above theorem, det(A) = (−1)^p det(R), where p is the number of row interchanges used. From Theorem 3.13 below it will follow that det(R) is the product of the diagonal entries of R. This gives a method for calculating det(A) which is generally more convenient than using Definition 3.2. A short computational sketch follows.
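A minimal sketch of this method in plain Python (using only row interchanges and row additions, and tracking the sign (−1)^p):

    def det_by_elimination(a):
        """Determinant via row reduction, as in Remark 3.5."""
        a = [row[:] for row in a]          # work on a copy
        n = len(a)
        sign = 1
        for j in range(n):
            # find a row with a non-zero entry in column j and swap it up
            p = next((i for i in range(j, n) if a[i][j] != 0), None)
            if p is None:
                return 0                    # no pivot in this column -> det = 0
            if p != j:
                a[j], a[p] = a[p], a[j]
                sign = -sign                # each interchange flips the sign
            for i in range(j + 1, n):
                m = a[i][j] / a[j][j]
                for k in range(j, n):
                    a[i][k] -= m * a[j][k]  # add a multiple of row j to row i
        d = sign
        for j in range(n):
            d *= a[j][j]                    # product of the diagonal of R
        return d

    print(det_by_elimination([[2., 1.], [4., 5.]]))   # 6.0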

Definition 3.6 [Elementary matrices] A matrix E ∈ F^{n×n} is called an elementary matrix if it is of one of the following three types (here E(ij) denotes the matrix with ij-entry 1 and all other entries 0):

(i) E = diag(1, . . . , 1, c, 1, . . . , 1), where c ≠ 0 is the entry in the j-th row and j-th column.
(ii) E = In + cE(ij), where c ∈ F, i ≠ j.
(iii) E = In − E(ii) − E(jj) + E(ij) + E(ji), where i ≠ j.

Note: If A ∈ F^{n×n} and E is an elementary matrix, then EA is found from A by multiplying the j-th row by c if E is of type (i), by adding c times the j-th row to the i-th row if E is of type (ii), and by exchanging the i-th and j-th rows if E is of type (iii).

Remark 3.7 A square matrix A is invertible if and only if it can be transformed into In via elementary row operations. From this it follows that A is invertible if and only if A = Es ··· E1 for elementary matrices E1, . . . , Es.

Lemma 3.8 Let A, B ∈ F^{n×n} and A invertible. Then

det(AB) = det(A) det(B)

Theorem 3.9 A ∈ F^{n×n} is invertible if and only if det(A) ≠ 0. In this case det(A^{-1}) = 1/det(A).

Theorem 3.10 Let A, B ∈ F^{n×n}. Then

det(AB) = det(A) det(B).

Corollary 3.11 If A and B are similar, then det(A) = det(B).

Theorem+Definition 3.12 [Transpose] The transpose A^t of A ∈ F^{n×n} is defined by (A^t)_ij := A_ji. Then

det(A) = det(A^t).

Note: This implies that all parts of Theorem 3.4 are valid with row replaced by column.


Theorem 3.13 If A11 ∈ F^{r×r}, A21 ∈ F^{(n−r)×r}, and A22 ∈ F^{(n−r)×(n−r)}, then

det [ A11   0  ]
    [ A21  A22 ]  =  det(A11) det(A22).

Note: By induction it follows that for a block lower triangular matrix

det [ A11   0  ···  0  ]
    [  ·    ·        ·  ]
    [ Am1   ·  ···  Amm ]  =  det(A11) ··· det(Amm)

In particular, for a lower triangular matrix one has

det [ a11   0  ···  0  ]
    [  ·    ·        ·  ]
    [ an1   ·  ···  ann ]  =  a11 ··· ann

By Theorem 3.12 this also holds for upper triangular matrices.

Definition 3.14 [Cofactors] Let A ∈ F^{n×n} and i, j ∈ {1, . . . , n}. Denote by A(i|j) ∈ F^{(n−1)×(n−1)} the matrix obtained by deleting the i-th row and j-th column from A. Then

cij := (−1)^{i+j} det(A(i|j))

is called the i, j-cofactor of A.

Theorem 3.15 [Cofactor expansion]
(a) For every i ∈ {1, . . . , n} one has the i-th row cofactor expansion

det(A) = Σ_{j=1}^n aij cij.

(b) For every j ∈ {1, . . . , n} one has the j-th column cofactor expansion

det(A) = Σ_{i=1}^n aij cij.

Theorem 3.16 If A is invertible, then

A^{-1} = (1/det(A)) adj(A)

where adj(A) ∈ F^{n×n} is the adjoint matrix of A defined by

(adj(A))_ij = c_ji.


Theorem 3.17 [Cramer's rule] If A ∈ F^{n×n} is invertible and b ∈ F^n, then the unique solution x ∈ F^n of the equation Ax = b is given by

x_j = det(B_j) / det(A)   (j = 1, . . . , n),

where B_j is the matrix obtained by replacing the j-th column of A by the vector b.
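A small sketch of Cramer's rule (assuming numpy; the matrix and right-hand side are made up for the example):

    import numpy as np

    def cramer_solve(A, b):
        """Solve Ax = b via Cramer's rule (Theorem 3.17); illustration only."""
        A, b = np.asarray(A, float), np.asarray(b, float)
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for j in range(len(b)):
            Bj = A.copy()
            Bj[:, j] = b                 # replace the j-th column of A by b
            x[j] = np.linalg.det(Bj) / d
        return x

    print(cramer_solve([[2., 1.], [1., 3.]], [3., 5.]))   # [0.8 1.4]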

Remark 3.18 Using adj(A) to compute A^{-1} or Cramer's rule to solve the equation Ax = b is numerically impractical, since the computation of determinants is too expensive compared to other methods (e.g. Gaussian elimination). However, Theorems 3.16 and 3.17 have theoretical value: they show the continuous dependence of the entries of A^{-1} on the entries of A and, respectively, of x on the entries of A and b.

Definition 3.19 [triangular matrices] A matrix A ∈ F^{n×n} is said to be upper triangular if Aij = 0 for all i > j. If, in addition, Aii = 1 for all 1 ≤ i ≤ n, the matrix A is said to be unit upper triangular. Similarly, lower triangular matrices and unit lower triangular matrices are defined.

    Theorem 3.20 The above four classes of matrices are closed under multiplication. Forexample, if A, B Fnn are unit lower triangular matrices, then so is AB.

    Theorem 3.21 The above four classes of matrices are closed under taking inverses (ifinvertible).


    4 The LU Decomposition Method

    This chapter is not covered any more in Algebra II. Part of this material has beenintegrated into Chapter 2 (systems of linear equations, Gaussian elimination). The othercontents are now part of the course Applied Linear Algebra.


    5 Diagonalization

Throughout this section, we use the following notation: V is a finite dimensional vector space, dim V = n, and T ∈ L(V, V). Also, B is a basis in V and [T]_B := [T]_{B,B} ∈ F^{n×n} is a matrix representing T.

Our goal in this and the following sections is to find a basis in which the matrix [T]_B is as simple as possible. In this section we study conditions under which the matrix [T]_B can be made diagonal.

5.1 Definition (Eigenvalue, eigenvector)
A scalar λ ∈ F is called an eigenvalue of T if there is a non-zero vector v ∈ V such that

Tv = λv

In this case, v is called an eigenvector of T corresponding to the eigenvalue λ. (Sometimes these are called characteristic value, characteristic vector.)

5.2 Theorem + Definition (Eigenspace)
For every λ ∈ F the set

E_λ = {v ∈ V : Tv = λv} = Ker(λI − T)

is a subspace of V. If λ is an eigenvalue, then E_λ ≠ {0}, and it is called the eigenspace corresponding to λ.

Note that E_λ always contains 0, even though 0 is never an eigenvector. At the same time, the zero scalar 0 ∈ F may be an eigenvalue.

5.3 Remark
0 is an eigenvalue ⟺ Ker T ≠ {0} ⟺ T is not invertible ⟺ [T]_B is singular ⟺ det[T]_B = 0.

5.4 Definition (Eigenvalue of a matrix)
λ ∈ F is called an eigenvalue of a matrix A ∈ F^{n×n} if there is a nonzero v ∈ F^n such that

Av = λv

Eigenvectors and eigenspaces of matrices are defined accordingly.

5.5 Simple properties
(a) λ is an eigenvalue of A ∈ F^{n×n} ⟺ λ is an eigenvalue of T_A ∈ L(F^n, F^n).
(b) λ is an eigenvalue of T ⟺ λ is an eigenvalue of [T]_B for any basis B.
(c) If A = diag(λ1, . . . , λn), then λ1, . . . , λn are eigenvalues of A with eigenvectors e1, . . . , en.


(d) dim E_λ = nullity(λI − T) = n − rank(λI − T).

5.6 Lemma
The function p(x) := det(xI − [T]_B) is a polynomial of degree n. This function is independent of the basis B.

5.7 Definition (Characteristic polynomial)
The function

C_T(x) := det(xI − [T]_B)

is called the characteristic polynomial of T.

For a matrix A ∈ F^{n×n}, the function

C_A(x) := det(xI − A)

is called the characteristic polynomial of A.

Note that C_T(x) = C_{[T]_B}(x) for any basis B.

5.8 Lemma
λ is an eigenvalue of T if and only if C_T(λ) = 0.

5.9 Corollary
T has at most n eigenvalues. If V is complex (F = ℂ), then T has at least one eigenvalue. In this case, according to the Fundamental Theorem of Algebra, C_T(x) is decomposed into linear factors.

5.10 Corollary
If A ∼ B are similar matrices, then C_A(x) ≡ C_B(x), hence A and B have the same set of eigenvalues.

5.11 Examples

(a) A = [ 3  −2 ]
        [ 1   0 ]

Then C_A(x) = x^2 − 3x + 2 = (x − 1)(x − 2), so that λ = 1, 2. Also, E_1 = span(1, 1) and E_2 = span(2, 1), so that E_1 ⊕ E_2 = ℝ^2.

(b) A = [ 1  1 ]
        [ 0  1 ]

Then C_A(x) = (x − 1)^2, so that λ = 1 is the only eigenvalue, E_1 = span(1, 0).

(c) A = [ 0  −1 ]
        [ 1   0 ]

Then C_A(x) = x^2 + 1, so that there are no eigenvalues in the real case and two eigenvalues (i and −i) in the complex case.


(d) A = [ 3  3  2 ]
        [ 1  2  2 ]
        [ 1  1  0 ]

Then C_A(x) = (x − 1)(x − 2)^2, so that λ = 1, 2. E_1 = span(1, 1, 0) and E_2 = span(2, 0, 1). Here E_1 ⊕ E_2 ≠ ℝ^3 (not enough eigenvectors).

(e) A = [ 1  0  0 ]
        [ 0  2  0 ]
        [ 0  0  2 ]

Then C_A(x) = (x − 1)(x − 2)^2, so that λ = 1, 2. E_1 = span{e1} and E_2 = span{e2, e3}. Now E_1 ⊕ E_2 = ℝ^3.
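A quick numerical check of Example 5.11(a) as reconstructed above (a sketch, assuming numpy):

    import numpy as np

    A = np.array([[3., -2.],
                  [1.,  0.]])

    vals, vecs = np.linalg.eig(A)
    print(vals)                      # eigenvalues 1 and 2 (in some order)

    # each column of vecs is an eigenvector for the corresponding eigenvalue
    for lam, v in zip(vals, vecs.T):
        assert np.allclose(A @ v, lam * v)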

5.12 Definition (Diagonalizability)
T is said to be diagonalizable if there is a basis B such that [T]_B is a diagonal matrix.
A matrix A ∈ F^{n×n} is said to be diagonalizable if there is a similar matrix D ∼ A which is diagonal.

5.13 Lemma
(i) If v1, . . . , vk are eigenvectors of T corresponding to distinct eigenvalues λ1, . . . , λk, then the set {v1, . . . , vk} is linearly independent.
(ii) If λ1, . . . , λk are distinct eigenvalues of T, then E_{λ1} + ··· + E_{λk} = E_{λ1} ⊕ ··· ⊕ E_{λk}.

Proof of (i) goes by induction on k.

5.14 Corollary
If T has n distinct eigenvalues, then T is diagonalizable. (The converse is not true.)

5.15 Theorem
T is diagonalizable if and only if there is a basis B consisting entirely of eigenvectors of T. (In this case we say that T has a complete set of eigenvectors.)

Not all matrices are diagonalizable, even in the complex case, see Example 5.11(b).

5.16 Definition (Invariant subspace)
A subspace W is said to be invariant under T if TW ⊆ W, i.e. Tw ∈ W for all w ∈ W.

The restriction of T to a T-invariant subspace W is denoted by T|_W. It is a linear transformation of W into itself.

5.17 Examples
(a) Any eigenspace E_λ is T-invariant. Note that any basis in E_λ consists of eigenvectors.

(b) Let A = [ 0  1  0 ]
            [ 1  0  0 ]
            [ 0  0  1 ]

Then the subspaces W1 = span{e1, e2} and W2 = span{e3} are T_A-invariant.


The restriction T_A|_{W1} is represented by the matrix

[ 0  1 ]
[ 1  0 ]

and the restriction T_A|_{W2} is the identity.

5.18 Lemma
Let V = W1 ⊕ W2, where W1 and W2 are T-invariant subspaces. Let B1 and B2 be bases in W1, W2, respectively. Denote [T|_{W1}]_{B1} = A1 and [T|_{W2}]_{B2} = A2. Then

[T]_{B1∪B2} = [ A1   0 ]
              [  0  A2 ]

Matrices like this are said to be block-diagonal. Note: by induction, this generalizes to V = W1 ⊕ ··· ⊕ Wk.

5.19 Definition (Algebraic multiplicity, geometric multiplicity)
Let λ be an eigenvalue of T. The algebraic multiplicity of λ is its multiplicity as a root of C_T(x), i.e. the highest power of (x − λ) that divides C_T(x). The geometric multiplicity of λ is the dimension of the eigenspace E_λ.

The same definition goes for eigenvalues of matrices.

Both algebraic and geometric multiplicities are at least one.

5.20 Theorem
T is diagonalizable if and only if the sum of geometric multiplicities of its eigenvalues equals n. In this case, if λ1, . . . , λs are all distinct eigenvalues, then

V = E_{λ1} ⊕ ··· ⊕ E_{λs}

Furthermore, if B1, . . . , Bs are arbitrary bases in E_{λ1}, . . . , E_{λs} and B = B1 ∪ ··· ∪ Bs, then

[T]_B = diag(λ1, . . . , λ1, λ2, . . . , λ2, . . . , λs, . . . , λs)

where each eigenvalue λi appears mi = dim E_{λi} times.

5.21 Corollary
Assume that

C_T(x) = (x − λ1) ··· (x − λn)

where the λi are not necessarily distinct.
(i) If all the eigenvalues have the same algebraic and geometric multiplicities, then T is diagonalizable.
(ii) If all the eigenvalues are distinct, then T is diagonalizable.


5.22 Corollary
Let T be diagonalizable, and let D1, D2 be two diagonal matrices representing T (in different bases). Then D1 and D2 have the same diagonal elements, up to a permutation.

5.23 Examples (continued from 5.11)

(a) The matrix is diagonalizable, its diagonal form is D = diag(1, 2).
(b) The matrix is not diagonalizable.
(c) The matrix is not diagonalizable in the real case, but is diagonalizable in the complex case. Its diagonal form is D = diag(i, −i).
(d) The matrix is not diagonalizable.


    6 Generalized eigenvectors, Jordan decomposition

Throughout this and the next sections, we use the following notation: V is a finite dimensional complex vector space, dim V = n, and T ∈ L(V, V).

In this and the next sections we focus on nondiagonalizable transformations. Those, as we have seen by examples, do not have enough eigenvectors.

6.1 Definition (Generalized eigenvector, Generalized eigenspace)
Let λ be an eigenvalue of T. A vector v ≠ 0 is called a generalized eigenvector of T corresponding to λ if

(T − λI)^k v = 0

for some positive integer k.

The generalized eigenspace corresponding to λ is the set of all generalized eigenvectors corresponding to λ plus the zero vector. We denote that space by U_λ.

6.2 Example

Let A = [ λ  1 ]
        [ 0  λ ]

for some λ ∈ ℂ. Then λ is the (only) eigenvalue of A, and E_λ = span{e1}. Since (A − λI)^2 is the zero matrix, U_λ = ℂ^2.

6.3 Notation
For each k ≥ 1, denote U_λ^(k) = Ker(T − λI)^k. Clearly, U_λ^(1) = E_λ and U_λ^(k) ⊆ U_λ^(k+1) (i.e., {U_λ^(k)} is an increasing sequence of subspaces of V). Note that if U_λ^(k) ≠ U_λ^(k+1), then dim U_λ^(k) < dim U_λ^(k+1).

Observe that U_λ = ⋃_{k≥1} U_λ^(k). Since each U_λ^(k) is a subspace of V, their union U_λ is a subspace, too.

6.4 Lemma
There is an m = m_λ such that U_λ^(k) ≠ U_λ^(k+1) for all 1 ≤ k ≤ m − 1, and

U_λ^(m) = U_λ^(m+1) = U_λ^(m+2) = ···

In other words, the sequence of subspaces U_λ^(k) strictly increases up to k = m and stabilizes after k ≥ m. In particular, U_λ = U_λ^(m).

Proof. By way of contradiction, assume that

U_λ^(k) = U_λ^(k+1) ≠ U_λ^(k+2)

Pick a vector v ∈ U_λ^(k+2) \ U_λ^(k+1) and put u := (T − λI)v. Then (T − λI)^{k+2} v = 0, so (T − λI)^{k+1} u = 0, hence u ∈ U_λ^(k+1). But then u ∈ U_λ^(k), and so (T − λI)^k u = 0, hence (T − λI)^{k+1} v = 0, a contradiction. □


6.5 Corollary
m_λ ≤ n, where n = dim V. In particular, U_λ = U_λ^(n).

6.6 Definition (Polynomials in T)
By T^k we denote T ∘ ··· ∘ T (k times), i.e. inductively T^k v = T(T^{k−1} v). A polynomial in T is

a_k T^k + ··· + a_1 T + a_0 I

where a_0, . . . , a_k ∈ ℂ. Example: (T − λI)^k is a polynomial in T for any k ≥ 1.

Note: T^k T^m = T^m T^k. For arbitrary polynomials p, q we have p(T)q(T) = q(T)p(T).

6.7 Lemma
The generalized eigenvectors of T span V.

Proof goes by induction on n = dim V. For n = 1 the lemma follows from 5.9. Assume that the lemma holds for all vector spaces of dimension < n. Let λ be an eigenvalue of T. We claim that V = V1 ⊕ V2, where V1 := Ker(T − λI)^n = U_λ and V2 := Im(T − λI)^n.

Proof of the claim:
(i) Show that V1 ∩ V2 = {0}. Let v ∈ V1 ∩ V2. Then (T − λI)^n v = 0 and (T − λI)^n u = v for some u ∈ V. Hence (T − λI)^{2n} u = (T − λI)^n v = 0, i.e. u ∈ U_λ. Now 6.5 implies that 0 = (T − λI)^n u = v.
(ii) Show that V = V1 + V2. In view of 2.7 we have dim V1 + dim V2 = dim V. Since V1 and V2 are independent by (i), we have V1 + V2 = V.

The claim is proven. Since V1 = U_λ, it is already spanned by generalized eigenvectors. Next, V2 is T-invariant, because for any v ∈ V2 we have v = (T − λI)^n u for some u ∈ V, so Tv = T(T − λI)^n u = (T − λI)^n Tu ∈ V2. Since dim V2 < n (remember that dim V1 ≥ 1), by the inductive assumption V2 is spanned by generalized eigenvectors of T|_{V2}, which are, of course, generalized eigenvectors for T. This proves the lemma. □

6.8 Lemma
Generalized eigenvectors corresponding to distinct eigenvalues of T are linearly independent.

Proof. Let v1, . . . , vm be generalized eigenvectors corresponding to distinct eigenvalues λ1, . . . , λm. We need to show that if v1 + ··· + vm = 0, then v1 = ··· = vm = 0. It is enough to show that v1 = 0.

Let k be the smallest positive integer such that (T − λ1 I)^k v1 = 0. We now apply the transformation

R := (T − λ1 I)^{k−1} (T − λ2 I)^n ··· (T − λm I)^n

to the vector v1 + ··· + vm (if k = 1, then the first factor in R is missing). Since all the factors in R commute (as polynomials in T), R kills all the vectors v2, . . . , vm, and we get Rv1 = 0.


Next, we replace T − λi I by (T − λ1 I) + (λ1 − λi)I for all i = 2, . . . , m in the product formula for R. We then expand this formula by the Binomial Theorem. All the terms but one will contain (T − λ1 I)^r with some r ≥ k, which kills v1, due to our choice of k. The equation Rv1 = 0 is then equivalent to

(λ1 − λ2)^n ··· (λ1 − λm)^n (T − λ1 I)^{k−1} v1 = 0

This contradicts our choice of k (remember that the eigenvalues are distinct, so that λi ≠ λ1). □

6.9 Definition (Nilpotent transformations)
The transformation T : V → V is said to be nilpotent if T^k = 0 for some positive integer k. The same definition goes for matrices.

6.10 Example
If A ∈ ℂ^{n×n} is an upper triangular matrix whose diagonal entries are zero, then it is nilpotent.

6.11 Lemma
T is nilpotent if and only if 0 is its only eigenvalue.

Proof. Let T be nilpotent and λ be its eigenvalue with eigenvector v ≠ 0. Then T^k v = λ^k v for all k, and then T^k v = 0 implies λ^k = 0, so λ = 0.

Conversely, if 0 is the only eigenvalue, then by 6.7, V = U_0 = Ker T^n, so T^n = 0. □

6.12 Theorem (Structure Theorem)
Let λ1, . . . , λm be all distinct eigenvalues of T : V → V with corresponding generalized eigenspaces U_{λ1}, . . . , U_{λm}. Then
(i) V = U_{λ1} ⊕ ··· ⊕ U_{λm}
(ii) Each U_{λj} is T-invariant
(iii) (T − λj I)|_{U_{λj}} is nilpotent for each j
(iv) Each T|_{U_{λj}} has exactly one eigenvalue, λj
(v) dim U_{λj} equals the algebraic multiplicity of λj

Proof. (i) Follows from 6.7 and 6.8.
(ii) Recall that U_{λj} = Ker(T − λj I)^n by 6.5. Hence, for any v ∈ U_{λj} we have

(T − λj I)^n Tv = T(T − λj I)^n v = 0   (by 6.6)

so that Tv ∈ U_{λj}.
(iii) Follows from 6.5.
(iv) From 6.11 and (iii), (T − λj I)|_{U_{λj}} has exactly one eigenvalue, 0. Therefore, T|_{U_{λj}} has exactly one eigenvalue, λj. (A general fact: µ is an eigenvalue of T if and only if µ − λ


is an eigenvalue of T − λI.)
(v) Pick a basis Bj in each U_{λj}; then B := ⋃_{j=1}^m Bj is a basis of V by (i). Due to 5.18 and (i), the matrix [T]_B is block-diagonal, whose diagonal blocks are [T|_{U_{λj}}]_{Bj}, 1 ≤ j ≤ m. Then

C_T(x) = det(xI − [T]_B)
       = C_{[T|_{U_{λ1}}]_{B1}}(x) ··· C_{[T|_{U_{λm}}]_{Bm}}(x)
       = C_{T|_{U_{λ1}}}(x) ··· C_{T|_{U_{λm}}}(x)

due to 3.11. Since T|_{U_{λj}} has the only eigenvalue λj, we have C_{T|_{U_{λj}}}(x) = (x − λj)^{dim U_{λj}}, hence

C_T(x) = (x − λ1)^{dim U_{λ1}} ··· (x − λm)^{dim U_{λm}}

The theorem is proven. □

6.13 Corollary
Since E_λ ⊆ U_λ, the geometric multiplicity of λ never exceeds its algebraic multiplicity.

6.14 Definition (Jordan block matrix)
An m × m matrix J is called a Jordan block matrix for the eigenvalue λ if

J = [ λ  1  0  ···  0 ]
    [ 0  λ  1  ···  0 ]
    [ ·     ·   ·    · ]
    [ 0  ···    λ   1 ]
    [ 0  0  ···  0  λ ]

6.15 Properties of Jordan block matrices
(i) C_J(x) = (x − λ)^m, hence λ is the only eigenvalue of J, and its algebraic multiplicity is m
(ii) (J − λI)^m = 0, i.e. the matrix J − λI is nilpotent
(iii) rank(J − λI) = m − 1, hence nullity(J − λI) = 1, so the geometric multiplicity of λ is 1
(iv) Je1 = λe1, hence (J − λI)e1 = 0, i.e. E_λ = span{e1}
(v) Jek = λek + e_{k−1}, hence (J − λI)ek = e_{k−1} for 2 ≤ k ≤ m
(vi) (J − λI)^k ek = 0 for all 1 ≤ k ≤ m.

Note that the map J − λI takes

em ↦ e_{m−1} ↦ ··· ↦ e1 ↦ 0

6.16 Definition (Jordan chain)
Let λ be an eigenvalue of T. A Jordan chain is a sequence of non-zero vectors {v1, . . . , vm} such that (T − λI)v1 = 0 and (T − λI)vk = v_{k−1} for k = 2, . . . , m. The


length of the Jordan chain is m.

Note that a Jordan chain contains exactly one eigenvector (v1), and all the other vectors in the chain are generalized eigenvectors.

6.17 Lemma
Let λ be an eigenvalue of T. Suppose we have s ≥ 1 Jordan chains corresponding to λ, call them {v11, . . . , v1m1}, . . ., {vs1, . . . , vsms}. Assume that the vectors {v11, v21, . . . , vs1} (the eigenvectors in these chains) are linearly independent. Then all the vectors {vij : i = 1, . . . , s, j = 1, . . . , mi} are linearly independent.

Proof. Let M = max{m1, . . . , ms} be the maximum length of the chains. The proof goes by induction on M. For M = 1 the claim is trivial. Assume the lemma is proved for chains of lengths ≤ M − 1. Without loss of generality, assume that m1 ≥ m2 ≥ ··· ≥ ms, i.e. the lengths are decreasing, so M = m1.

By way of contradiction, let

Σ_{i,j} cij vij = 0

Applying (T − λI)^{m1−1} to the vector Σ cij vij kills all the terms except the last vectors vim1 in the chains of maximum length (of length m1). Those vectors will be transformed to (T − λI)^{m1−1} vim1 = vi1, so we get

Σ_{i=1}^p c_{im1} vi1 = 0

where p ≤ s is the number of chains of length m1. Since the vectors {vi1} are linearly independent, we conclude that c_{im1} = 0 for all i. That reduces the problem to the case of chains of lengths ≤ M − 1. □

6.18 Corollary
Let B = {v1, . . . , vm} be a Jordan chain corresponding to an eigenvalue λ. Then B is linearly independent, i.e. it is a basis in the subspace W := span{v1, . . . , vm}. Note that W is T-invariant, and the matrix [T|_W]_B is exactly a Jordan block matrix.

6.19 Definition (Jordan basis)
A basis B of V is called a Jordan basis for T if it is a union of some Jordan chains.

6.20 Remark
If B is a Jordan basis of V, then [T]_B is a block diagonal matrix whose diagonal blocks are Jordan block matrices.


6.21 Definition (Jordan matrix)
A matrix Q is called a Jordan matrix corresponding to an eigenvalue λ if

Q = [ J1      0 ]
    [     ·     ]
    [ 0      Js ]

where J1, . . . , Js are Jordan block matrices corresponding to λ, and their lengths decrease: |J1| ≥ ··· ≥ |Js|.

A matrix A is called a Jordan matrix if

A = [ Q1      0 ]
    [     ·     ]
    [ 0      Qr ]

where Q1, . . . , Qr are Jordan matrices corresponding to distinct eigenvalues λ1, . . . , λr.

6.22 Example
Let T : ℂ^4 → ℂ^4 have only one eigenvalue, λ. We can find all possible Jordan matrices for T; there are 5 distinct such matrices (they correspond to the block-size patterns 4, 3+1, 2+2, 2+1+1 and 1+1+1+1).

6.23 Theorem (Jordan decomposition)
Let V be a finite dimensional complex vector space.

(i) For any T : V → V there is a Jordan basis B of V, so that the matrix [T]_B is a Jordan matrix. The latter is unique, up to a permutation of eigenvalues.
(ii) Every matrix A ∈ ℂ^{n×n} is similar to a Jordan matrix. The latter is unique, up to a permutation of eigenvalues (i.e., of the Qj's in 6.21).

Note: the uniqueness of the matrices Qj is achieved by the requirement |J1| ≥ ··· ≥ |Js| in 6.21.

Note: the Jordan basis B is not unique, not even after fixing the order of eigenvalues.

Proof of Theorem 6.23.
(ii) follows from (i). It is enough to prove (i) assuming that T has just one eigenvalue λ, and then use 6.12. We can even assume that the eigenvalue of T is zero, by switching from T to T − λI. Hence, we assume that T is nilpotent.

The proof of existence goes by induction on n = dim V. The case n = 1 is trivial. Assume the theorem is proved for all spaces of dimension < n. Consider W := Im T. If dim W = 0, then T = 0, so the matrix [T]_B is diagonal (actually, zero) in any basis. So, assume that m := dim W ≥ 1.


Note that W is T-invariant, and m < n, because n − m = nullity T ≠ 0. So, by the inductive assumption there is a basis B in W that is the union of Jordan chains. Let k be the number of those chains.

The last vector in each Jordan chain has a pre-image under T (because it belongs to W = Im T). So, we can extend each of those k Jordan chains by one more vector. Now we get k Jordan chains for the transformation T : V → V. By 6.17 all the vectors in those chains are linearly independent, so they span a subspace of dimension m + k.

Next, consider the space K := Ker T. Note that K0 := Ker T|_W is a subspace of K. The first vectors in our Jordan chains make a basis in K0, so k = dim K0 = nullity T|_W. This basis can be extended to a basis in K, and thus we can get r := dim K − dim K0 new vectors. Note that dim K − dim K0 = n − m − k. Hence, the total number of vectors we found is n. The last r vectors are eigenvectors, so they make r Jordan chains of length 1.

Finally, note that all our vectors are independent, therefore they make a basis in V.

To prove the uniqueness, note that for any k ≥ 1 the number of Jordan chains in the Jordan basis that have length ≥ k equals rank T^{k−1} − rank T^k, so it is independent of the basis. This easily proves the uniqueness. □

6.24 Strategy to find the Jordan matrix
Given a matrix A ∈ ℂ^{n×n} you can find a similar Jordan matrix J ∼ A as follows. First, find all the eigenvalues. Then, for every eigenvalue λ and k ≥ 1 find the number rk = rank(A − λI)^k. You will get a sequence n > r1 > r2 > ··· > rp = rp+1 = ··· (in view of 6.4). Then r_{k−1} − rk (with r0 = n) is the number of Jordan blocks of length ≥ k corresponding to λ.

Example: A = [ 1  0  1 ]
             [ 0  1  1 ]
             [ 0  0  1 ]

Here λ = 1 is the only eigenvalue, and r1 = rank(A − I) = 1. Then (A − I)^2 is the zero matrix, so r2 = 0. Therefore,

J = [ 1  1  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]

In addition, we can find a Jordan basis in this example, i.e. a basis in which the transformation is represented by a Jordan matrix. The strategy: pick a vector v1 ∈ Im(A − I), then take one of its preimages v2 ∈ (A − I)^{-1} v1, and an arbitrary eigenvector v3 ∈ Ker(A − I) independent of v1. Then {v1, v2} and {v3} are two Jordan chains making a basis.
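A computational sketch of this strategy for the example above (assuming sympy, whose jordan_form method returns a transition matrix P and a Jordan matrix J with A = P J P^{-1}, possibly with the blocks in a different order):

    from sympy import Matrix, eye

    A = Matrix([[1, 0, 1],
                [0, 1, 1],
                [0, 0, 1]])

    # ranks r_k = rank((A - I)^k), as in 6.24
    r1 = ((A - eye(3))**1).rank()
    r2 = ((A - eye(3))**2).rank()
    print(r1, r2)            # 1 0: one chain of length >= 2, and 3 - 1 = 2 chains in total

    P, J = A.jordan_form()   # A = P*J*P**(-1); J has Jordan blocks of sizes 2 and 1
    print(J)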


7 Minimal Polynomial and Cayley-Hamilton Theorem

We continue using the notation V, n, T of Section 6. The statements 7.1-7.4 below hold for any field F, but in the rest of Section 7 we work in the complex field.

7.1 Lemma
For any T : V → V there exists a nonzero polynomial f ∈ P(F) such that f(T) = 0. If we require that the leading coefficient equal 1 and the degree of the polynomial be minimal, then such a polynomial is unique.

Proof: Recall that L(V, V) is a vector space of finite dimension, dim L(V, V) = n^2. The transformations I, T, T^2, . . . belong to L(V, V), so there is a k ≥ 1 such that I, T, . . . , T^k are linearly dependent, i.e. there exist c0, . . . , ck ∈ F, not all zero, with

c0 I + c1 T + ··· + ck T^k = 0

and we can assume that ck ≠ 0. Dividing through by ck gives ck = 1. Lastly, if there are two distinct polynomials of degree k and with ck = 1 satisfying the above equation, then their difference is a nonzero polynomial g of degree < k such that g(T) = 0, contradicting the minimality of the degree. This proves the uniqueness. □

7.2 Definition (Minimal polynomial)
The unique polynomial found in Lemma 7.1 is called the minimal polynomial M_T(x) of T : V → V. If A is an n × n matrix, then the minimal polynomial of T_A is denoted by M_A(x).

7.3 Examples

(a) A = [ 0  1 ]
        [ 0  0 ]

Here A^2 = 0, so that the minimal polynomial is M_A(x) = x^2. (It is easy to check that cI + A ≠ 0 for any c, so degree one is not enough.) Compare this to the characteristic polynomial C_A(x) = x^2.

(b) It is an easy exercise to prove C_A(A) = 0 for any 2 × 2 matrix A (over any field F). Hence M_A(x) either coincides with C_A(x) or is of degree one. In the latter case A − cI = 0 for some c ∈ F, hence A = cI, in which case C_A(x) = (x − c)^2 and M_A(x) = x − c.

(c) A = [ 2  3  1 ]
        [ 0  2  0 ]
        [ 0  0  2 ]

Here A − 2I is an upper triangular matrix with zero main diagonal, so it is nilpotent, see 6.10. It is easy to check that (A − 2I)^2 = 0, hence the minimal polynomial is M_A(x) = (x − 2)^2 = x^2 − 4x + 4. Compare this to the characteristic polynomial C_A(x) = (x − 2)^3. Note that C_A(x) is a multiple of M_A(x).
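A small computational check of Example 7.3(c) (a sketch, assuming sympy):

    from sympy import Matrix, eye, zeros, symbols

    x = symbols('x')
    A = Matrix([[2, 3, 1],
                [0, 2, 0],
                [0, 0, 2]])

    N = A - 2*eye(3)
    print(N == zeros(3, 3), N**2 == zeros(3, 3))   # False True, so M_A(x) = (x - 2)^2
    print(A.charpoly(x).as_expr().factor())        # (x - 2)**3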


7.4 Corollary
(i) Let B be a basis of V; then M_T(x) = M_{[T]_B}(x).
(ii) Similar n × n matrices have the same minimal polynomial.

From now on, F = ℂ again.

7.5 Theorem
Let λ1, . . . , λr be all distinct eigenvalues of T and U_{λj} the corresponding generalized eigenspaces. Let mj be the maximum size of the Jordan blocks corresponding to the eigenvalue λj. Consider the polynomial

p(x) = (x − λ1)^{m1} ··· (x − λr)^{mr}

Then:
(i) deg p ≤ dim V
(ii) if f(x) is a nonzero polynomial such that f(T) = 0, then f is a multiple of p
(iii) p(x) = M_T(x)

Note: mj is the length of the longest Jordan chain in U_{λj}. Also, mj is the smallest positive integer s.t. (T − λj I)^{mj} v = 0 for all v ∈ U_{λj}.

Proof:
(i) Follows from 6.23.
(ii) Let f(T) = 0. We will show that f(x) is a multiple of (x − λj)^{mj} for all j. Fix a j and write f(x) = c(x − c1)^{t1} ··· (x − cs)^{ts} (x − λj)^t, where c1, . . . , cs denote the roots of f other than λj (if λj is not a root of f, we simply put t = 0). If t < mj, then there is a vector v ∈ U_{λj} such that u := (T − λj I)^t v ≠ 0. Recall that U_{λj} is T-invariant, so it is (T − cI)-invariant for any c. Hence, u ∈ U_{λj}. Furthermore, each transformation T − ci I leaves U_{λj} invariant, and is a bijection of U_{λj} because ci ≠ λj. Therefore, the transformation c(T − c1 I)^{t1} ··· (T − cs I)^{ts} is a bijection of U_{λj}, so it takes u to a nonzero vector w. Thus, w = f(T)v ≠ 0, hence f(T) ≠ 0, a contradiction. This proves (ii).
(iii) By (ii), M_T(x) is a multiple of p(x). It remains to prove that p(T) = 0, then use 7.1. To prove that p(T) = 0, recall that V = U_{λ1} ⊕ ··· ⊕ U_{λr}, and for every v ∈ U_{λj} we have (T − λj I)^{mj} v = 0. □

7.6 Example
Let J be an m × m Jordan block matrix for eigenvalue λ, see 6.14. Then U_λ = ℂ^m and (J − λI)^m = 0 (and m is the minimal such power). So, M_J(x) = (x − λ)^m. Note that C_J(x) = M_J(x). In general, though, C_A(x) ≠ M_A(x).

7.7 Theorem (Cayley-Hamilton)
The characteristic polynomial C_T(x) is a multiple of the minimal polynomial M_T(x). In particular, C_T(T) = 0, i.e. any linear operator satisfies its own characteristic equation.


Proof: Let λ1, . . . , λr be all distinct eigenvalues of T, and p1, . . . , pr their algebraic multiplicities. Then C_T(x) = (x − λ1)^{p1} ··· (x − λr)^{pr}. Note that pj = dim U_{λj} ≥ mj, where mj is the maximum size of the Jordan blocks corresponding to λj. So, by 7.5 the minimal polynomial M_T(x) = (x − λ1)^{m1} ··· (x − λr)^{mr} divides C_T(x). □

7.8 Corollary
(i) C_T(x) and M_T(x) have the same linear factors.
(ii) T is diagonalizable if and only if M_T(x) = (x − λ1) ··· (x − λr), i.e. M_T(x) has no multiple roots.

7.9 Examples

(a) A = [ 1   0   0 ]
        [ 1   0   1 ]
        [ 1  −1  −2 ]

Here C_A(x) = (x + 1)^2 (x − 1). Therefore, M_T(x) may be either (x + 1)^2 (x − 1) or (x + 1)(x − 1). To find it, it is enough to check if (A + I)(A − I) = 0, i.e. if A^2 = I. This is not true, so M_T(x) = (x + 1)^2 (x − 1). The Jordan form of the matrix is

J = [ −1  1  0 ]
    [  0 −1  0 ]
    [  0  0  1 ]

(b) Assume that C_A(x) = (x − 2)^4 (x − 3)^2 and M_A(x) = (x − 2)^2 (x − 3). Find all possible Jordan forms of A. Answer: there are two Jordan blocks of length one for λ = 3, and two or three Jordan blocks for λ = 2, of lengths 2+2 or 2+1+1.


8 Norms and Inner Products

This section is devoted to vector spaces with an additional structure - the inner product. The underlying field here is either F = ℝ or F = ℂ.

8.1 Definition (Norm, distance)
A norm on a vector space V is a real valued function ||·|| satisfying

1. ||v|| ≥ 0 for all v ∈ V, and ||v|| = 0 if and only if v = 0.
2. ||cv|| = |c| ||v|| for all c ∈ F and v ∈ V.
3. ||u + v|| ≤ ||u|| + ||v|| for all u, v ∈ V (triangle inequality).

A vector space V = (V, ||·||) together with a norm is called a normed space. In normed spaces, we define the distance between two vectors u, v by d(u, v) = ||u − v||.

Note that property 3 has a useful implication: | ||u|| − ||v|| | ≤ ||u − v|| for all u, v ∈ V.

8.2 Examples
(i) Several standard norms in ℝ^n and ℂ^n:

||x||_1 := Σ_{i=1}^n |xi|                  (1-norm)

||x||_2 := ( Σ_{i=1}^n |xi|^2 )^{1/2}      (2-norm)

note that this is the Euclidean norm,

||x||_p := ( Σ_{i=1}^n |xi|^p )^{1/p}      (p-norm)

for any real p ∈ [1, ∞),

||x||_∞ := max_{1≤i≤n} |xi|                (∞-norm)

(ii) Several standard norms in C[a, b] for any a < b:

||f||_1 := ∫_a^b |f(x)| dx

||f||_2 := ( ∫_a^b |f(x)|^2 dx )^{1/2}

||f||_∞ := max_{a≤x≤b} |f(x)|
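A quick numerical sketch of the vector norms in (i) (assuming numpy):

    import numpy as np

    x = np.array([3., -4., 0.])

    print(np.linalg.norm(x, 1))        # 7.0   (1-norm)
    print(np.linalg.norm(x, 2))        # 5.0   (2-norm, Euclidean)
    print(np.linalg.norm(x, np.inf))   # 4.0   (infinity-norm)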


8.3 Definition (Unit sphere, Unit ball)
Let ||·|| be a norm on V. The set

S1 = {v ∈ V : ||v|| = 1}

is called the unit sphere in V, and the set

B1 = {v ∈ V : ||v|| ≤ 1}

the unit ball (with respect to the norm ||·||). The vectors v ∈ S1 are called unit vectors.

For any vector v ≠ 0 the vector u = v/||v|| belongs to the unit sphere S1, i.e. any nonzero vector is a multiple of a unit vector.

    The following four statements, 8.4–8.7, are given for the sake of completeness; they will not be used in the rest of the course. Their proofs involve some advanced material from real analysis. Students who are not familiar with it may disregard the proofs.

    8.4 Theorem
    Any norm on IR^n or C||^n is a continuous function. Precisely: for any ε > 0 there is a δ > 0 such that for any two vectors u = (x1, . . . , xn) and v = (y1, . . . , yn) satisfying max_i |xi − yi| < δ we have | ||u|| − ||v|| | < ε.

    Proof. Let g = max{||e1||, . . . , ||en||}. By the triangle inequality, for any vector

    v = (x1, . . . , xn) = x1 e1 + · · · + xn en

    we have

    ||v|| ≤ |x1| ||e1|| + · · · + |xn| ||en|| ≤ (|x1| + · · · + |xn|) g

    Hence if max{|x1|, . . . , |xn|} < δ, then ||v|| < nδg.

    Now let two vectors v = (x1, . . . , xn) and u = (y1, . . . , yn) be close, so that max_i |xi − yi| < δ. Then | ||u|| − ||v|| | ≤ ||u − v|| < nδg. It is then enough to set δ = ε/(ng). This proves the continuity of the norm ||·|| as a function of v. □

    8.5 Corollary
    Let ||·|| be a norm on IR^n or C||^n. Denote by

    S1^e = {(x1, . . . , xn) : |x1|^2 + · · · + |xn|^2 = 1}

    the Euclidean unit sphere in IR^n (or C||^n). Then the function ||·|| is bounded above and below on S1^e:

    0 < min_{v ∈ S1^e} ||v|| ≤ max_{v ∈ S1^e} ||v|| < ∞


    Proof. Indeed, it is known in real analysis that S1^e is a compact set. Note that ||·|| is a continuous function on S1^e. It is known in real analysis that a continuous function on a compact set always takes its maximum and minimum values. In our case, the minimum value of ||·|| on S1^e is strictly positive, because 0 ∉ S1^e. This proves the corollary. □

    8.6 Definition (Equivalent norms)
    Two norms, ||·||a and ||·||b, on V are said to be equivalent if there are constants 0 < C1 < C2 such that C1 ≤ ||u||a/||u||b ≤ C2 < ∞ for all u ≠ 0. This is an equivalence relation.

    8.7 Theorem
    In any finite dimensional space V, any two norms are equivalent.

    Proof. Assume first that V = IR^n or V = C||^n. It is enough to prove that any norm is equivalent to the 2-norm. Any vector v = (x1, . . . , xn) ∈ V is a multiple of a Euclidean unit vector u ∈ S1^e, so it is enough to check the equivalence for vectors u ∈ S1^e, which immediately follows from 8.5. So, the theorem is proved for V = IR^n and V = C||^n. An arbitrary n-dimensional vector space over F = IR or F = C|| is isomorphic to IR^n or C||^n, respectively. □
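    Theorem 8.7 can be illustrated numerically. For instance, ||u||∞ ≤ ||u||1 ≤ n ||u||∞ on IR^n, so the 1-norm and the ∞-norm are equivalent with C1 = 1 and C2 = n. The sketch below (NumPy assumed; the random sampling is my own illustration) estimates these constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
ratios = []
for _ in range(10_000):
    u = rng.standard_normal(n)
    ratios.append(np.linalg.norm(u, 1) / np.linalg.norm(u, np.inf))

# For every nonzero u in IR^n:  1 <= ||u||_1 / ||u||_inf <= n
print(min(ratios), max(ratios))   # sampled estimates of C1 and C2
```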

    8.8 Theorem + Definition (Matrix norm)
    Let ||·|| be a norm on IR^n or C||^n. Then

    ||A|| := sup_{||x||=1} ||Ax|| = sup_{x≠0} ||Ax|| / ||x||

    defines a norm in the space of n×n matrices. It is called the matrix norm induced by ||·||.

    Proof is a direct inspection.

    Note that the supremum can be replaced by a maximum in Theorem 8.8. Indeed, one can obviously write ||A|| = sup_{||x||2=1} ||Ax||/||x||, then argue that the function ||Ax|| is continuous on the Euclidean unit sphere S1^e (as a composition of two continuous functions, Ax and ||·||), then argue that ||Ax||/||x|| is a continuous function, as a ratio of two continuous functions, of which ||x|| ≠ 0, so that ||Ax||/||x|| takes its maximum value on S1^e.

    Note that there are norms on IR^{n×n} that are not induced by any norm on IR^n, for example ||A|| := max_{i,j} |aij| (Exercise, use 8.10(ii) below).

    8.9 Theorem
    (i) ||A||1 = max_{1≤j≤n} Σ_{i=1}^n |aij|   (maximum column sum)
    (ii) ||A||∞ = max_{1≤i≤n} Σ_{j=1}^n |aij|   (maximum row sum)


    Note: There is no explicit characterization of ||A||2 in terms of the aij.

    8.10 Theorem
    Let ||·|| be a norm on IR^n or C||^n. Then
    (i) ||Ax|| ≤ ||A|| ||x|| for all vectors x and matrices A.
    (ii) ||AB|| ≤ ||A|| ||B|| for all matrices A, B.
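    The column/row-sum formulas of 8.9 and the inequalities of 8.10 are easy to check numerically. A minimal sketch (NumPy assumed; np.linalg.norm with ord=1 and ord=inf computes exactly these two induced matrix norms):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

col_sum = np.abs(A).sum(axis=0).max()   # 8.9(i):  maximum column sum = ||A||_1
row_sum = np.abs(A).sum(axis=1).max()   # 8.9(ii): maximum row sum    = ||A||_inf
assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))

# 8.10: ||Ax|| <= ||A|| ||x||  and  ||AB|| <= ||A|| ||B||  (here for the 1-norm)
assert np.linalg.norm(A @ x, 1) <= np.linalg.norm(A, 1) * np.linalg.norm(x, 1) + 1e-12
assert np.linalg.norm(A @ B, 1) <= np.linalg.norm(A, 1) * np.linalg.norm(B, 1) + 1e-12
print(col_sum, row_sum)
```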

    8.11 Definition (Real Inner Product)
    Let V be a real vector space. A real inner product on V is a real-valued function on V × V, denoted by ⟨·, ·⟩, satisfying

    1. ⟨u, v⟩ = ⟨v, u⟩ for all u, v ∈ V
    2. ⟨cu, v⟩ = c⟨u, v⟩ for all c ∈ IR and u, v ∈ V
    3. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ for all u, v, w ∈ V
    4. ⟨u, u⟩ ≥ 0 for all u ∈ V, and ⟨u, u⟩ = 0 iff u = 0

    Comments: 1 says that the inner product is symmetric, 2 and 3 say that it is linear in the first argument (the linearity in the second argument then follows from 1), and 4 says that the inner product is non-negative and non-degenerate (just like a norm). Note that ⟨0, v⟩ = ⟨u, 0⟩ = 0 for all u, v ∈ V.

    A real vector space together with a real inner product is called a real inner product space, or sometimes a Euclidean space.

    8.12 Examples
    (i) V = IR^n: ⟨u, v⟩ = Σ_{i=1}^n ui vi (standard inner product). Note: ⟨u, v⟩ = u^t v = v^t u.
    (ii) V = C([a, b]) (real functions): ⟨f, g⟩ = ∫_a^b f(x) g(x) dx

    8.13 Definition (Complex Inner Product)
    Let V be a complex vector space. A complex inner product on V is a complex-valued function on V × V, denoted by ⟨·, ·⟩, satisfying

    1. ⟨u, v⟩ equals the complex conjugate of ⟨v, u⟩, for all u, v ∈ V
    2, 3, 4 as in 8.11.

    A complex vector space together with a complex inner product is called a complex inner product space, or sometimes a unitary space.

    8.14 Simple properties
    (i) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
    (ii) ⟨u, cv⟩ = c̄ ⟨u, v⟩ (where c̄ is the complex conjugate of c)

    The properties (i) and (ii) are called conjugate linearity in the second argument.


    8.15 Examples
    (i) V = C||^n: ⟨u, v⟩ = Σ_{i=1}^n ui v̄i (standard inner product). Note: ⟨u, v⟩ = u^t v̄ = v̄^t u.
    (ii) V = C([a, b]) (complex functions): ⟨f, g⟩ = ∫_a^b f(x) ḡ(x) dx

    Note: the term inner product space from now on refers to either a real or a complex inner product space.

    8.16 Theorem (Cauchy-Schwarz-Buniakowsky inequality)
    Let V be an inner product space. Then

    |⟨u, v⟩| ≤ ⟨u, u⟩^{1/2} ⟨v, v⟩^{1/2}

    for all u, v ∈ V. The equality holds if and only if {u, v} is linearly dependent.

    Proof. We do it in the complex case. Assume that v ≠ 0. Consider the function

    f(z) = ⟨u − zv, u − zv⟩ = ⟨u, u⟩ − z⟨v, u⟩ − z̄⟨u, v⟩ + |z|^2 ⟨v, v⟩

    of a complex variable z. Let z = re^{iθ} and ⟨u, v⟩ = se^{iψ} be the polar forms of the numbers z and ⟨u, v⟩. Set θ = ψ and let r vary from −∞ to ∞; then

    0 ≤ f(z) = ⟨u, u⟩ − 2sr + r^2 ⟨v, v⟩

    Since this holds for all r ∈ IR (also for r < 0, because then all the terms are nonnegative), the discriminant has to be ≤ 0, i.e. s^2 − ⟨u, u⟩⟨v, v⟩ ≤ 0. This completes the proof in the complex case. In the real case it goes even easier: just assume z ∈ IR. The equality case in the theorem corresponds to zero discriminant; then the above polynomial assumes a zero value, and hence u = zv for some z ∈ C||. (We left out the case v = 0, do it yourself as an exercise.) □

    8.17 Theorem + Definition (Induced norm)
    If V is an inner product space, then ||v|| := ⟨v, v⟩^{1/2} defines a norm on V. It is called the induced norm.

    To prove the triangle inequality, you will need 8.16.

    8.18 Example
    The inner products in Examples 8.12(i) and 8.15(i) induce the 2-norm on IR^n and C||^n, respectively.
    The inner products in Examples 8.12(ii) and 8.15(ii) induce the 2-norm on the spaces C[a, b] of real and complex functions, respectively.


    8.19 Theorem
    Let V be a real vector space with norm ||·||. The norm ||·|| is induced by an inner product if and only if the function

    ⟨u, v⟩ := (1/4) ( ||u + v||^2 − ||u − v||^2 )   (polarization identity)

    satisfies the definition of an inner product. In this case ||·|| is induced by the above inner product.

    Note: A similar but more complicated polarization identity holds in complex inner product spaces.
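    The criterion in 8.19 can be probed numerically: define ⟨u, v⟩ by the polarization identity and test one of the inner product axioms, say additivity in the first argument, on random vectors. For the 2-norm the defect is at rounding-error level, while for the 1-norm it is of order one, so the 1-norm on IR^n (n ≥ 2) is not induced by any inner product. A small sketch (NumPy assumed; the test itself is my own illustration, not part of the notes):

```python
import numpy as np

def polar(u, v, norm):
    # candidate inner product from the polarization identity (8.19)
    return 0.25 * (norm(u + v) ** 2 - norm(u - v) ** 2)

def additivity_defect(norm, trials=1000, n=4, seed=2):
    # largest |<u+v, w> - <u, w> - <v, w>| over random u, v, w
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        u, v, w = rng.standard_normal((3, n))
        d = abs(polar(u + v, w, norm) - polar(u, w, norm) - polar(v, w, norm))
        worst = max(worst, d)
    return worst

print(additivity_defect(lambda x: np.linalg.norm(x, 2)))  # rounding-error size: the 2-norm passes
print(additivity_defect(lambda x: np.linalg.norm(x, 1)))  # order 1: the 1-norm fails
```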


    9 Orthogonal vectors

    In this section, V is always an inner product space (real or complex).

    9.1 Definition (Orthogonal vectors)
    Two vectors u, v ∈ V are said to be orthogonal if ⟨u, v⟩ = 0.

    9.2 Example
    (i) The canonical basis vectors e1, . . . , en in IR^n or C||^n with the standard inner product are mutually (i.e., pairwise) orthogonal.
    (ii) Any vectors u = (u1, . . . , uk, 0, . . . , 0) and v = (0, . . . , 0, vk+1, . . . , vn) are orthogonal in IR^n or C||^n with the standard inner product.
    (iii) The zero vector 0 is orthogonal to any vector.

    9.3 Theorem (Pythagoras)
    If ⟨u, v⟩ = 0, then ||u + v||^2 = ||u||^2 + ||v||^2.

    Inductively, it follows that if u1, . . . , uk are mutually orthogonal, then

    ||u1 + · · · + uk||^2 = ||u1||^2 + · · · + ||uk||^2

    9.4 Theorem
    If nonzero vectors u1, . . . , uk are mutually orthogonal, then they are linearly independent.

    9.5 Definition (Orthogonal/Orthonormal basis)
    A basis {ui} in V is said to be orthogonal if all the basis vectors ui are mutually orthogonal. If, in addition, all the basis vectors are unit vectors (i.e., ||ui|| = 1 for all i), then the basis is said to be orthonormal, or an ONB.

    9.6 Theorem (Fourier expansion)
    If B = {u1, . . . , un} is an ONB in a finite dimensional space V, then

    v = Σ_{i=1}^n ⟨v, ui⟩ ui

    for every v ∈ V, i.e. ci = ⟨v, ui⟩ are the coordinates of the vector v in the basis B. One can also write this as [v]^t_B = (⟨v, u1⟩, . . . , ⟨v, un⟩).

    Note: the numbers ⟨v, ui⟩ are called the Fourier coefficients of v in the ONB {u1, . . . , un}.

    For example, in IR^n or C||^n with the standard inner product, the coordinates of any vector v = (v1, . . . , vn) satisfy the equations vi = ⟨v, ei⟩.
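    For a concrete instance of 9.6, take the ONB of IR^2 obtained by rotating the canonical basis by 45 degrees; the sketch below (NumPy assumed, and the specific vectors are my own choice) recovers a vector from its Fourier coefficients.

```python
import numpy as np

# An ONB of IR^2: the canonical basis rotated by 45 degrees
u1 = np.array([1.0,  1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)

v = np.array([3.0, -2.0])
c = [v @ u1, v @ u2]                     # Fourier coefficients <v, u_i>
reconstructed = c[0] * u1 + c[1] * u2    # v = sum_i <v, u_i> u_i
print(c, np.allclose(reconstructed, v))  # coefficients, True
```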


    9.7 Definition (Orthogonal projection)
    Let u, v ∈ V, and v ≠ 0. The orthogonal projection of u onto v is

    Pr_v u = ( ⟨u, v⟩ / ||v||^2 ) v

    Note that the vector w := u − Pr_v u is orthogonal to v. Therefore, u is the sum of two vectors, Pr_v u parallel to v, and w orthogonal to v (see the diagram below).

    [Figure 1: Orthogonal projection of u onto v; the projection Pr_v u equals ||u|| cos θ · (v/||v||).]

    9.8 Definition (Angle)
    In the real case, for any nonzero vectors u, v ∈ V let

    cos θ = ⟨u, v⟩ / ( ||u|| ||v|| )

    By 8.16, we have cos θ ∈ [−1, 1]. Hence, there is a unique angle θ ∈ [0, π] with this value of cosine. It is called the angle between u and v.

    Note that cos θ = 0 if and only if u and v are orthogonal. Also, cos θ = ±1 if and only if u, v are proportional, v = cu; then the sign of c coincides with the sign of cos θ.

    9.9 Theorem
    Let B = {u1, . . . , un} be an ONB in a finite dimensional (real) space V. Then

    v = Σ_{i=1}^n Pr_{ui} v = Σ_{i=1}^n ||v|| cos θi · ui

    where θi is the angle between ui and v.


    9.10 Theorem (Gram-Schmidt)
    Let nonzero vectors w1, . . . , wm be mutually orthogonal. For v ∈ V, set

    w_{m+1} = v − Σ_{i=1}^m Pr_{wi} v

    Then the vectors w1, . . . , w_{m+1} are mutually orthogonal, and

    span{w1, . . . , wm, v} = span{w1, . . . , wm, w_{m+1}}

    In particular, w_{m+1} = 0 if and only if v ∈ span{w1, . . . , wm}.

    9.11 Algorithm (Gram-Schmidt orthogonalization)

    Let {v1, . . . , vn} be a basis in V. Define

    w1 = v1

    and then inductively, for m ≥ 1,

    w_{m+1} = v_{m+1} − Σ_{i=1}^m Pr_{wi} v_{m+1} = v_{m+1} − Σ_{i=1}^m ( ⟨v_{m+1}, wi⟩ / ||wi||^2 ) wi

    This gives an orthogonal basis {w1, . . . , wn}, which agrees with the basis {v1, . . . , vn} in the following sense:

    span{v1, . . . , vm} = span{w1, . . . , wm}

    for all 1 ≤ m ≤ n.

    The basis {w1, . . . , wn} can be normalized by ui = wi/||wi|| to give an ONB {u1, . . . , un}.

    Alternatively, an ONB {u1, . . . , un} can be obtained directly by

    w1 = v1,   u1 = w1/||w1||

    and inductively, for m ≥ 1,

    w_{m+1} = v_{m+1} − Σ_{i=1}^m ⟨v_{m+1}, ui⟩ ui,   u_{m+1} = w_{m+1}/||w_{m+1}||
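    The second form of the algorithm translates directly into code. A minimal sketch (NumPy assumed; using the standard inner product on IR^n, which is my choice for the illustration):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (algorithm 9.11, second form)."""
    ortho = []
    for v in vectors:
        w = v.astype(float)
        for u in ortho:
            w = w - (v @ u) * u           # subtract Pr_u v = <v, u> u  (u is a unit vector)
        ortho.append(w / np.linalg.norm(w))
    return ortho

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(vs)
G = np.array([[ui @ uj for uj in us] for ui in us])
print(np.allclose(G, np.eye(3)))          # True: {u1, u2, u3} is an ONB
```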

    9.12 Example
    Let V = Pn(IR) with the inner product given by ⟨f, g⟩ = ∫_0^1 f(x) g(x) dx. Applying Gram-Schmidt orthogonalization to the basis {1, x, . . . , x^n} gives the first n + 1 of the so-called Legendre polynomials.
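    The same loop runs verbatim in this polynomial setting; only the inner product changes. A sketch using numpy.polynomial (my own choice of tooling; the output polynomials are left unnormalized, i.e. only proportional to the ones mentioned in 9.12):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def inner(f, g):
    # <f, g> = integral over [0, 1] of f(x) g(x) dx, exact for polynomials
    h = (f * g).integ()
    return h(1.0) - h(0.0)

basis = [P([0.0] * k + [1.0]) for k in range(4)]   # 1, x, x^2, x^3
ortho = []
for v in basis:
    w = v
    for u in ortho:
        w = w - (inner(v, u) / inner(u, u)) * u    # subtract Pr_u v
    ortho.append(w)

for i, w in enumerate(ortho):
    print(f"w_{i} coefficients:", w.coef.round(4))  # constant term first
print(round(inner(ortho[1], ortho[2]), 12))         # 0.0, i.e. w_1 and w_2 are orthogonal
```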


    9.13 Corollary
    Let W ⊆ V be a finite dimensional subspace of an inner product space V, and dim W = k. Then there is an ONB {u1, . . . , uk} in W. If, in addition, V is finite dimensional, then the basis {u1, . . . , uk} of W can be extended to an ONB {u1, . . . , un} of V.

    9.14 Definition (Orthogonal complement)
    Let S ⊆ V be a subset (not necessarily a subspace). Then

    S⊥ := {v ∈ V : ⟨v, w⟩ = 0 for all w ∈ S}

    is called the orthogonal complement of S.

    9.15 Theorem
    S⊥ is a subspace of V. If W = span S, then W⊥ = S⊥.

    9.16 Example
    If S = {(1, 0, 0)} in IR^3, then S⊥ = span{(0, 1, 0), (0, 0, 1)}.
    Let V = C[a, b] (real functions) with the inner product from 8.12(ii). Let S = {f ≡ const} (the subspace of constant functions). Then S⊥ = {g : ∫_a^b g(x) dx = 0}. Note that V = S ⊕ S⊥, see 1.34(b).

    9.17 Theorem
    If W is a finite dimensional subspace of V, then

    V = W ⊕ W⊥

    Proof. By 9.13, there is an ONB {u1, . . . , uk} of W. For any v ∈ V the vector

    v − Σ_{i=1}^k ⟨v, ui⟩ ui

    belongs to W⊥. Hence, V = W + W⊥. The linear independence of W and W⊥ follows from 9.4. □

    Note: the finite dimension of W is essential. Let V = C[a, b] with the inner product from 8.12(ii) and W ⊂ V be the set of real polynomials restricted to the interval [a, b]. Then W⊥ = {0}, and at the same time V ≠ W.

    9.18 Theorem (Parseval's identity)
    Let B = {u1, . . . , un} be an ONB in V. Then

    ⟨v, w⟩ = Σ_{i=1}^n ⟨v, ui⟩ ⟨ui, w⟩   (in the real case this equals [v]^t_B [w]_B = [w]^t_B [v]_B)


    for all v, w ∈ V. In particular,

    ||v||^2 = Σ_{i=1}^n |⟨v, ui⟩|^2   (in the real case, = [v]^t_B [v]_B)

    Proof. Follows from 9.6. □

    9.19 Theorem (Bessel's inequality)
    Let {u1, . . . , un} be an orthonormal subset of V. Then

    ||v||^2 ≥ Σ_{i=1}^n |⟨v, ui⟩|^2

    for all v ∈ V.

    Proof. For any v ∈ V the vector

    w := v − Σ_{i=1}^n ⟨v, ui⟩ ui

    belongs to {u1, . . . , un}⊥. Hence, by Pythagoras (9.3),

    ||v||^2 = ||w||^2 + Σ_{i=1}^n |⟨v, ui⟩|^2 ≥ Σ_{i=1}^n |⟨v, ui⟩|^2. □

    9.20 Definition (Isometry)
    Let V, W be two inner product spaces (both real or both complex). An isomorphism T : V → W is called an isometry if it preserves the inner product, i.e.

    ⟨Tv, Tw⟩ = ⟨v, w⟩

    for all v, w ∈ V. In this case V and W are said to be isometric.

    Note: it can be shown by the polarization identity that T : V → W preserves the inner product if and only if T preserves the induced norms, i.e. ||Tv|| = ||v|| for all v ∈ V.

    9.21 Theorem
    Let dim V < ∞. A linear transformation T : V → W is an isometry if and only if whenever {u1, . . . , un} is an ONB in V, then {Tu1, . . . , Tun} is an ONB in W.

    9.22 Corollary
    Finite dimensional inner product spaces V and W (over the same field) are isometric if and only if dim V = dim W.

    9.23 Example
    Let V = IR^2 with the standard inner product. Then the maps defined by the matrices

    A1 =
    ( 0 −1 )
    ( 1  0 )
    and
    A2 =
    ( 1  0 )
    ( 0 −1 )

    are isometries of IR^2 (a rotation by π/2 and a reflection, respectively).


    10 Operators in Inner Product Spaces

    Definition 10.1 [isometries] Let V and W be inner product spaces. An isomorphism T : V → W is called an isometry if

    ⟨Tv, Tw⟩ = ⟨v, w⟩ for all v, w ∈ V.

    In this case V and W are called isometric. If V = W is a complex (real) inner product space, then an isometry T : V → V is called a unitary (orthogonal) operator.

    Proposition 10.2 [Properties of isometries]
    (a) ⟨Tv, Tw⟩ = ⟨v, w⟩ for all v, w ∈ V if and only if ||Tv|| = ||v|| for all v ∈ V.
    (b) If dim V < ∞ and {u1, . . . , un} is an ONB in V, then T : V → W is an isometry if and only if {Tu1, . . . , Tun} is an ONB in W.
    (c) Finite dimensional inner product spaces V and W are isometric if and only if dim V = dim W.
    (d) The unitary operators on a complex inner product space V form a subgroup U(V) of GL(V). The orthogonal operators on a real inner product space form a subgroup O(V) of GL(V).

    Definition 10.3 [orthogonal, unitary matrices] A matrix Q ∈ IR^{n×n} is called orthogonal if QQ^t = I. A matrix U ∈ C||^{n×n} is called unitary if UU^H = I. (Here U^H = Ū^t is the Hermitean transpose.)

    Theorem 10.4 (a) Let Q ∈ IR^{n×n}. Then the following are equivalent:
    (i) Q is orthogonal,
    (ii) Q^t is orthogonal,
    (iii) the rows of Q are an ONB of IR^n,
    (iv) the columns of Q are an ONB of IR^n,
    (v) ⟨Qu, Qv⟩ = ⟨u, v⟩ for all u, v ∈ IR^n (where ⟨·, ·⟩ is the standard inner product).
    (b) A completely analogous result holds for unitary matrices in C||^{n×n}.

    Corollary 10.5 (a) The orthogonal matrices in IR^{n×n} form a subgroup O(n) of GL(n, IR). The unitary matrices in C||^{n×n} form a subgroup U(n) of GL(n, C||).
    (b) If Q ∈ O(n), then det Q ∈ {−1, 1}. If U ∈ U(n), then |det U| = 1.
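    The equivalences in 10.4 and the determinant statement in 10.5(b) are easy to verify for a concrete matrix. A minimal sketch (NumPy assumed; the rotation matrix is my own choice of example):

```python
import numpy as np

t = 0.7
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])      # a rotation, hence an orthogonal matrix

print(np.allclose(Q @ Q.T, np.eye(2)))       # 10.4(i):  Q Q^t = I
print(np.allclose(Q.T @ Q, np.eye(2)))       # 10.4(iv): the columns form an ONB
u, v = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.isclose((Q @ u) @ (Q @ v), u @ v))  # 10.4(v):  <Qu, Qv> = <u, v>
print(np.isclose(abs(np.linalg.det(Q)), 1))  # 10.5(b):  det Q = +-1
```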

    Lemma 10.6 Let T ∈ L(V), B = {u1, . . . , un} be an ONB of V and A := [T]_B. Then

    A_{k,j} = ⟨Tuj, uk⟩ for all j, k.

    Theorem 10.7 (a) Let V be a real (complex) inner product space and B an ONB in V. Then T ∈ L(V) is orthogonal (unitary) if and only if [T]_B is an orthogonal (unitary) matrix.


    Theorem 10.8 If Q ∈ IR^{n×n} is orthogonal, then σ(Q) ⊆ {−1, 1}. If U ∈ C||^{n×n} is unitary, then σ(U) ⊆ {z ∈ C|| : |z| = 1}.

    Theorem 10.9 [Riesz] Let V be a finite-dimensional inner product space and f ∈ V* (i.e. f : V → F a linear functional). Then there exists a unique v ∈ V such that

    f(u) = ⟨u, v⟩ for all u ∈ V.

    Note: For each fixed v ∈ V the mapping u ↦ ⟨u, v⟩ defines a linear functional on V. The Riesz Theorem says that all linear functionals are of this form and thereby leads to a natural identification of V* with V via the mapping v ↦ ⟨·, v⟩.

    Theorem+Definition 10.10 [adjoint operator] Let V be a finite-dimensional inner product space and T ∈ L(V). Then there exists a unique T* ∈ L(V) such that

    ⟨Tu, v⟩ = ⟨u, T*v⟩ for all u, v ∈ V.

    T* is called the adjoint operator of T.

    Proposition 10.11 [Properties of adjoints] Let T, S ∈ L(V) and c ∈ F. Then
    (i) (T + S)* = T* + S*,
    (ii) (cT)* = c̄ T*,
    (iii) (TS)* = S* T*,
    (iv) (T*)* = T.

    Theorem 10.12 Let V be a real (complex) finite-dimensional inner product space, B an ONB of V, and T ∈ L(V). Then

    [T*]_B = [T]^t_B   (respectively [T]^H_B).
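    For the standard inner product on C||^n, Theorem 10.12 says that in the canonical ONB the adjoint of the operator x ↦ Ax is given by the Hermitean transpose A^H. A minimal numerical check (NumPy assumed; the random test data are my own choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def inner(x, y):
    # standard inner product on C^n: <x, y> = sum_i x_i * conj(y_i)
    return np.sum(x * np.conj(y))

A_star = A.conj().T                                       # the Hermitean transpose A^H
print(np.isclose(inner(A @ u, v), inner(u, A_star @ v)))  # <Au, v> = <u, A* v>: True
```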

    Definition 10.13 [selfadjoint operators, symmetric and hermitean matrices]
    An operator T ∈ L(V) on a finite-dimensional inner product space V is called selfadjoint if T* = T.
    A matrix A ∈ IR^{n×n} is called symmetric if A = A^t, i.e. Akj = Ajk for all 1 ≤ j, k ≤ n.
    A matrix A ∈ C||^{n×n} is called hermitean if A = A^H, i.e. Akj = Ājk for all 1 ≤ j, k ≤ n.

    Corollary 10.14 Let V be a finite-dimensional real (complex) inner product space and B an ONB of V. Then T ∈ L(V) is selfadjoint if and only if [T]_B is symmetric (hermitean).

    Theorem 10.15 Let T be selfadjoint on V. Then
    (a) σ(T) ⊆ IR.
    (b) If λ1, λ2 ∈ σ(T), λ1 ≠ λ2 and v1, v2 are corresponding eigenvectors, then v1 ⊥ v2.
    (c) T has an eigenvalue.


    Theorem 10.16 U ∈ L(V) is orthogonal (unitary) if and only if U* = U^{−1}.

    Definition 10.17 [orthogonally, unitarily equivalent matrices] Two matrices A, B ∈ IR^{n×n} (C||^{n×n}) are called orthogonally equivalent (unitarily equivalent) if there exists an orthogonal matrix O (unitary matrix U) such that B = O^{−1}AO (B = U^{−1}AU).

    Note: If two matrices are orthogonally (unitarily) equivalent, then they are similar, but not vice versa.

    Lemma 10.18 Let V be a finite-dimensional inner product space, T ∈ L(V), and W ⊆ V a T-invariant subspace, i.e. Tw ∈ W for all w ∈ W.
    (a) If T is selfadjoint, then W⊥ is T-invariant.
    (b) If T is an isometry, then W⊥ is T-invariant.

    Theorem 10.19 [Spectral theorem for selfadjoint operators]
    Let V be a finite-dimensional inner product space and T ∈ L(V) selfadjoint. Then there exists an ONB of V consisting entirely of eigenvectors of T. In particular, T is diagonalizable.

    Theorem 10.20 If A is symmetric in IR^{n×n} (hermitean in C||^{n×n}), then A is orthogonally (unitarily) equivalent to a diagonal matrix.
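    Theorem 10.20 is what numerical routines for symmetric/hermitean eigenproblems implement. As a sketch (NumPy assumed), np.linalg.eigh returns real eigenvalues and a matrix Q whose columns are orthonormal eigenvectors, so that A = Q D Q^t:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                        # a symmetric matrix

eigvals, Q = np.linalg.eigh(A)           # 10.15(a): the eigenvalues are real
D = np.diag(eigvals)

print(np.allclose(Q.T @ Q, np.eye(4)))   # the columns of Q form an ONB (10.19)
print(np.allclose(A, Q @ D @ Q.T))       # A is orthogonally equivalent to D (10.20)
```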

    Note: (1) Theorems 10.19 and 10.20 carry over to unitary operators on complex finite-dimensional inner product spaces, respectively to unitary matrices, with essentially the same proofs. They do not hold for orthogonal operators and matrices, due to the lack of an analogue of Theorem 10.15(c). However, the following can be shown: Let T : V → V be an orthogonal operator on a finite-dimensional real inner product space V. Then V = V1 ⊕ · · · ⊕ Vm, where the Vi are mutually orthogonal subspaces and each Vi is a T-invariant one- or two-dimensional subspace of V.

    (2) Much of the above theory for selfadjoint and unitary operators can be carried out for the more general case of normal operators T, i.e. operators such that T*T = TT*.