
Contents

1 Review and Background
   1.1 Vector Space, Basis, Linear Independence
   1.2 Linear Operator, Linear Transformation
       1.2.1 Matrix of a Transformation
       1.2.2 Change of Basis, Matrix case
   1.3 Range, Kernel, Onto, One-to-one, invertibility
       1.3.1 Range
       1.3.2 Kernel and Null Space
       1.3.3 One-to-one, Onto, Invertible
       1.3.4 Isomorphic
       1.3.5 Rank Plus Nullity
   1.4 A Few Words About Solutions of Linear Systems of Equations
   1.5 Sum vs. Direct Sum of Subspaces
   1.6 Extending a Linearly Independent Set to a Basis
   1.7 Determinants
   1.8 Brief Review: Eigenpairs, Eigenspace, Diagonalizability
       1.8.1 Similarity and Diagonalizability
   1.9 Partitioned Matrices

2 Invariant Subspaces
   2.1 Toward a Direct Sum Decomposition

3 Jordan, Generalized, and Stuff
   3.1 Triangular Form
       3.1.1 Generalized Eigenvectors
       3.1.2 A Basis for an Invariant Subspace: Case Study
       3.1.3 Block Partitions of Matrices

4 Inner Product Spaces and Normed Spaces

5 Singular Value Decomposition
   5.1 Matrix 2-norm and Frobenius norm
       5.1.1 Motivation for SVD in terms of Semi-Positive Definite Matrices


   5.2 The Singular Value Decomposition (SVD)



List of Contributors

M. E. Kilmer, Tufts University

E. de Sturler, Virginia Tech


Chapter 1

Review and Background

Warning Label:

And I do not fear to say that those who are held
Wise among men and who search the reasons of things
Are those who bring the most sorrow on themselves.

(Medea, Euripides)

1.1 Vector Space, Basis, Linear Independence

Definition 1.1. A vector space is a nonempty set V of objects called vectors, on which are defined two operations, called addition and multiplication by scalars, which are subject to the conditions in the list below.

For V to be a vector space, these axioms must hold for all vectors u,v,w ∈ V

and for all scalars c, d in the set of scalars (either R or C).

1. v + u ∈ V (closed under addition)

2. v + u = u+ v

3. (u+ v) +w = u+ (v +w)

4. There is in V a “zero”, denoted 0, such that u+ 0 = u

5. For each u in V , ∃ an additive inverse, denoted −u, such that u+ (−u) = 0.

6. For each u in V and each scalar c, cu ∈ V (closed under scalar multiplication)

7. c(u+ v) = cu+ cv

8. (c+ d)u = cu+ du


9. c(du) = (cd)u

10. 1u = u

Before proceeding, let us distinguish between a real vector space V and a complex vector space V. In the former, the term scalar refers only to the set of (field of) real numbers, R. In the latter, scalars are elements of the set of (field of) complex numbers, C, which of course includes the reals. In theory, it is possible to define vector spaces over other fields besides R and C, but the discussion is best left until after a course on modern algebra. For this class, we will consider only real vector spaces (the scalars can only be real-valued) or complex vector spaces (the scalars can be complex-valued), where the type of space should be clear from the context.

As a warmup, let’s start this section with a proof of the following theorem.

Theorem 1.2. The zero in a vector space V must be unique.

Proof. Given any u ∈ V, if w is a “zero” in V, then by the definition of a zero in Axiom 4 we have u + w = u. Together with Axiom 2, we also observe u + w = u = w + u. Suppose 0 also represents a zero in the vector space. Since it is an element of the vector space, we can substitute it for u in the previous equation to get 0 + w = 0 = w + 0 = w, where the last equality comes from Axiom 4. Hence w = 0, so the zero is unique.

Examples of Vector Spaces:

1. The spaces Rn, n ≥ 1. (real vector spaces)

2. The spaces Cn, n ≥ 1. (complex vector spaces; elements are column vectors

with real or complex entries; use field of complex numbers as the scalars inthe above).

3. The set of real-valued polynomials Pn. Elements are p(t) = a0 + a1t + · · · + ant^n, of degree at most n, n ≥ 1. (Real vector space – the coefficients and the independent variable are all real; use real scalars.) The “0” is the constant polynomial p(t) = 0.

4. The set of complex-valued polynomials Πn. The elements are p(z) = c0 + c1z + · · · + cnz^n of degree at most n, n ≥ 1. (Complex vector space – coefficients could be complex, indept variable could be complex, use C for the scalars.) The zero is the constant polynomial p(z) = 0.

5. The infinite dimensional space R^∞ (we will come back to dimension shortly). Its vectors have infinitely many components, x = (3, 4, 8.9, . . .), but the rules of addition and scalar multiplication still apply.

6. The set of all real-valued, continuous functions of a single variable defined on a set D (D might be an interval on the real line, for example), usually denoted C(D), or C[a, b]. Here, elements of the vector space look like f(t), g(t). Two vectors are equal iff their values are equal ∀ t ∈ D.

7. The space of 3× 3 complex-valued matrices (complex vector space), denotedC3×3. The “vector” elements in this case are 3 × 3 matrices with real orcomplex entries. Vector addition is componentwise addition; multiplicationby a scalar means multiply each matrix entry by the scalar in question.

Definition 1.3. A subspace, H, of a vector space V is a nonempty subset thatsatisfies two requirements:

• For every u,v ∈ H, u+ v ∈ H. (H is closed under vector addition)

• For every u ∈ H, and any scalar c (in the same field as we used to define V ),cu ∈ H (H is closed under scalar multiplication).

NOTE: Subspaces are themselves vector spaces!

Example 1.1. The set of all lower triangular 3 × 3 complex valued matrices iseasily proved to be a subspace of the vector space C3×3. On the other hand, the setof all real-valued lower triangular 3 × 3 matrices is not a subspace of C3×3 – whynot??

Recall that a linear combination of vectors in a vector space refers to any FINITE sum of scalar multiples (use the scalars appropriate to the vector space) of vectors in the vector space. We use, for a finite positive integer n,

span(v1, . . . , vn) := {α1v1 + · · · + αnvn | α1, . . . , αn any scalars from the field that defines the v.s.},

to denote the set of all linear combinations of the vectors v1, . . . , vn. Remember, too, that the set spanned by v1, . . . , vn is a subspace of V.

Definition 1.4. We also need to remember that the indexed set of vectors {v1, . . . , vn} in V is said to be linearly independent if the equation

c1v1 + · · · + cnvn = 0    (1.1)

has only the trivial solution (all ci = 0). If equation (1.1) has a non-trivial solution (i.e. at least one ci is non-zero), the set is dependent.

NOTE: There is a difference between checking for linear dependence in Rn (Cn) and in a general vector space. Checking a set in Rn for dependence amounts to looking for non-trivial solutions to a matrix equation Ax = 0, where the columns of A are the vi to be checked for independence. But in the case of a general vector space, one must rely directly on the definition of independence/dependence.

Example 1.2. p1(z) = 1, p2(z) = 2i+z, p3(z) = z. The set {p1, p2, p3} is dependentbecause (−2i)1 + (1)(2i+ z) + (−1)z = 0 for any z.


Example 1.3. The set {sin(t), cos(t)} is linearly independent in C[0, 1]. This isbecause the only way that

c1 sin(t) + c2 cos(t) = 0

for ALL VALUES OF t, is if c1 = c2 = 0.

Finally, we need to recall the following.

Definition 1.5. Let V be a vector space. Then the set2 B = {b1, . . . ,bn} ofelements in V is called a basis for V if

• B is a linearly independent set;

• V = span(b1, . . . , bn).

In other words, a basis provides the minimal amount of information (non-redundant information) necessary to be able to reconstruct any element of V .

Definition 1.6. The number of elements in the basis for the vector space is referredto as its dimension.

We can now distinguish between infinite dimensional and finite dimensional vector spaces as those needing an infinite basis set vs. those requiring a finite basis set. The spaces R^∞ and C[0, 1] are examples of infinite dimensional vector spaces.

NOTE: Unless otherwise stated, we are talking about finite dimensional vec-tor spaces – n in the basis definition will represent a fixed integer – and lineartransformations and operators between finite dimensional vector spaces. However,we will from time to time explicitly discuss the infinite dimensional case. Similarly,although much of the theory holds for vector spaces over some field, as noted at thebeginning, we focus (for practical purposes) on vector spaces of the real or complexfield.

Exercise 1.1. Show that if V is any finite n-dimensional vector space, any collection of n linearly independent elements in V must be a basis for V.

In particular, the representation of any element of V in the given basis is alsounique:

Theorem 1.7 (Unique Representation Theorem). Let B = {b1, . . . , bn} be a basis for a vector space V. Then for any v ∈ V, there exists a unique set of scalars (also called expansion coefficients) such that

v = c1b1 + · · · + cnbn.

2n could represent infinity here.


Thus, if someone gives you the basis, and the expansion coefficients, you canget v back again. But if you change the basis, the representation, in general, willchange.

Consider P2, which is the set of all real-valued polynomials of degree 2 or less. The standard basis for this space is {1, t, t^2}. Any vector in P2 can be written uniquely as p(t) = c1(1) + c2 t + c3 t^2. So if my neighbor just transmitted to me the vector (c1, c2, c3)^T in R3, and I knew he/she had used the standard basis, I could reconstruct the corresponding polynomial in P2. However, if I change the basis to, say, {4, 1 − t, 3t^2} (you should be able to verify that this is a basis for P2), then you expect the expansion coefficients of a particular polynomial in the standard basis vs. this basis will in general be different.

In fact, for ANY finite dimensional vector space, once you establish the basis,it’s OK to think in terms of vectors (in Rn or Cn) of the expansion coefficientsrather than the original vector space. There is an isomorphism here. We willcome back to this after we discuss linear transformations in the next section.

1.2 Linear Operator, Linear Transformation

Let V and W be vector spaces. Then L is a linear transformation from V to W if L : V → W and the following two conditions hold:

1. L(f + g) = L(f) + L(g) for every f, g ∈ V .

2. L(αf) = αL(f) for every scalar α (in the appropriate field) and every f ∈ V .

Note that in particular item 2 implies that L(0V ) = 0W , where we have used thesubscript to indicate that the element that plays the role of 0 in V may be differentfrom the 0 in W .

NOTATIONAL ASIDE: Although in the above, we have written the lineartransformation using a function-type notation (i.e. the input argument on which Lacts appears in parenthesis to the right of L) it is not uncommon in textbooks toomit the parenthesis. In other words, L(f) and Lf mean the same thing.

Example 1.4. Differentiation. Suppose that V is the set of all continuously differ-entiable real-valued functions on [0, 1] (notation for this is C1[0, 1], the superscriptindicates that the functions in question are at least once differentiable on [0, 1], andthat those derivatives are continuous.) Let f(t) ∈ V , and define L(f) = f ′(t).Check that this is a linear transformation. What is W?

We might also express this using differential operator notation – for instance,Dk acting on a k-times differentiable function means take k derivatives of thatfunction. So, here, L(f) := D(f(t)) = f ′(t). Likewise, if V is the set of all k timesdiff’ble functions on [0, 1], then L(f) := Dkf is a linear transformation.

As noted in any undergraduate course on ordinary differential equations, if weform a linear combination of linear differential operators, it will still define a linear


transformation: e.g. if

L = c0 + c1D + c2D^2,

then

L(f) = c0 f(t) + c1 D(f(t)) + c2 D^2(f(t)) = c0 f(t) + c1 f′(t) + c2 f″(t)

is a linear transformation (here V = C^2[0, 1], which is the set of all twice continuously differentiable functions on [0, 1], while W = C[0, 1]).

Exercise 1.2. Note that, in the above example, we can replace the constant coeffi-cients above with coefficients that are a function of t, and it would still give a linear(with respect to f) transformation. Show this.

Example 1.5. Suppose u is a function of two independent variables, x, y. Recallfrom 3rd semester calculus the definition of a partial derivative. Notationally, wemay use ux, uxx, uxy etc. to denote partials (first partial with respect to x, 2ndpartial with respect to x or the partial with respect to x then y, respectively).

Let V denote the (infinite dimensional) space of functions of 2 variables with an infinite number of partial derivatives in both x and y on closed intervals of the real line, V = C^∞[a, b] × C^∞[c, d]. Consider the linear transformation defined by

Lu = uxx(x, y) + uyy(x, y) − γ^2 u(x, y)

for any given non-negative real constant γ. L is a linear transformation between twoinfinite dimensional vector spaces; note due to choice of V , W = V . The followingtherefore represents a partial differential equation in operator notation:

Lu = g(x, y)

for some given g(x, y) ∈ W . In a PDEs course you are looking for functionsu(x, y) ∈ V which satisfy this equation.

Example 1.6. Integration. Suppose V = C[0, 1]. Define L(f) by L(f)(x) = ∫_0^x f(t) dt. Check that this is a linear transformation from V to V.

By linear operator we mean that the linear map is between the vector spaceV and the same space V . That is, it is a linear transformation in which W is V .The distinction may seem somewhat subtle. Consider the following examples.

Example 1.7. Let

A = (  4  −1   2
       0   1  −3
       4   0  −1 ).

Then if L(x) := Ax, L : V → V, where V = R3. So in this example, L is a linear operator (and by default, a linear transformation). On the other hand, if

B = ( −1  2
       0  1
       0  1 ),

then with L(x) = Bx, L : R2 → R3, so that in the latter case, L defines a linear transformation, but not a linear operator.


Example 1.8. Let L = D, the first derivative operator. Then L is a linear transformation from C1[0, 1] to C[0, 1], but not a linear operator (because there are functions in C1[0, 1] whose first derivatives are continuous on [0, 1] but not themselves continuously differentiable, so D does not map C1[0, 1] into itself).

Now let L = D2, the 2nd derivative operator. Then L is a linear operator from C∞[0, 1] to C∞[0, 1]. Note that in our previous PDE example, the choice of V gives us a linear operator as well.

Example 1.9. The identity operator is a linear operator between V and V that doesnot change the input, and is denoted I. For instance, if V = Πn, I(p(z)) = p(z).

One very important linear transformation is the one alluded to in the previoussection which maps from V to the set of basis expansion coefficients for a fixed basison V .

Example 1.10. Let p(t) ∈ P2. Using the standard basis for P2, we may write

p(t) = a0 + a1t + a2t^2.

We can define

L(p(t)) = (a0, a1, a2)^T,

so that L : P2 → R3. Check that this is a linear transformation.

Mapping to the Basis Expansion Coefficients. The previous example illustrates a linear transformation that is so fundamental that we (along with most linear algebra textbooks) have a notation for it. Given a basis, B, for an n-dimensional vector space V, if x ∈ V, it can be expanded in this basis. The vector of expansion coefficients in this basis is of length n, and will be denoted using brackets with a subscript: L(x) = [x]B.

In Example 1.10, then, if B denotes the standard basis {1, t, t^2}, we would have [p]B = (a0, a1, a2)^T. Specifically, if p(t) = 4 − 3t + 8t^2, [p]B = (4, −3, 8)^T. On the other hand, if B = {2, 1 − t, 3t^2} then [p]B = (1/2, 3, 8/3)^T.

1.2.1 Matrix of a Transformation

A picture is worth a thousand words here - we’ll do this in class, but you should beable to find one in most LA textbooks.

Suppose L : V → W, where (for simplicity) we assume V, W are real vector spaces (the same derivation can be done if both are complex vector spaces, but making a choice now makes the notation easier below). We will assume V and W are finite dimensional of dimension n and m, respectively. Let BV denote a basis for V and BW denote a basis for W. From the previous section, we can define 2 ADDITIONAL linear transformations:


• One from V → Rn according to the rule v ↦ [v]BV for v ∈ V;

• One from W → Rm according to the rule w ↦ [w]BW for w ∈ W.

Consider the following. Let v ∈ V. Define w = L(v) ∈ W. By the Unique Representation Theorem, we can now assign to w a unique coefficient vector [w]BW in Rm, and to v a unique coefficient vector [v]BV in Rn.

So does there exist a FOURTH linear transformation LM : Rn → Rm that would map [v]BV directly to [w]BW? Yes. Note that since the mapping is between Rn and Rm, this linear transformation *must* be definable in terms of a matrix-vector product. So what is the matrix M such that M [v]BV = [w]BW? Note that for this to work, M MUST be of size m × n.

Definition 1.8. If BV = {v1, . . . , vn}, then the matrix of the linear transformation L defined above must be given by

M = [ [L(v1)]BW , [L(v2)]BW , · · · , [L(vn)]BW ].

The implication is key: virtually everything interesting you want to know about L can be obtained by examining properties of M. For example, the dimension of the range of L will be the dimension of the range of M; L will be invertible iff M is invertible; eigenvalues of L are eigenvalues of M, etc. We will come back to this concept throughout the semester.

Important Notational Point. This matrix is very important, and its entries clearly depend on L, and on the bases used for V and W. Therefore, it, too, has its own short-hand notation: the matrix of the transformation L between vector spaces V and W with bases BV and BW, respectively, is denoted

[L]BW←BV ,

so that it is clear now from the notation that the matrix is associated explicitly with the transformation L and particular bases for V and W, and that the matrix is consistent with moving from [v]BV to [w]BW.

Exercise 1.3. Let L : P2 → P2 with L(p(t)) = 3p(t). Find the matrix of the transformation for these cases:

1. use the standard basis for both V and W

2. use the standard basis for V but use {1− t, 1 + t, t2} for W

3. repeat the 2nd example but with the bases for V and W reversed

4. use {1− t, 1 + t, t2} as the basis for both V and W .


1.2.2 Change of Basis, Matrix case

We start with a visual. Suppose A is a real, square, 3 × 3 matrix, and for simplicity assume that the matrix is invertible, so that the columns are independent and thus are a basis for R3. Then the linear transformation defined by Ax corresponds to a change of basis for R3. To see this, observe that when given x, we can draw it in our original coordinate system using the standard basis because x = x1e1 + x2e2 + x3e3. But after action by A we have

y = x1a1 + x2a2 + x3a3.

Now, instead of the xi being coordinates for the standard basis, they have become coefficients in the coordinate system determined by the columns of A.

Exercise 1.4. More generally, define L(x) := Ax for n × n invertible A. Then (for real A) this defines a linear mapping from Rn to Rn. We can use the standard basis {e1, . . . , en} for Rn on the V side of the picture in the previous subsection. Since A has independent columns, they form a basis for Rn as well, so we can use them as the basis on the W side. Show [L]BW←BV = I, the n × n identity matrix.

Next suppose that you have two different bases for Rn (Cn), call them BV and BW. For any x ∈ Rn, we should be able to find the vectors [x]BV and [x]BW. Now, multiplication by the identity matrix I is a linear operator from Rn to Rn. As you will show in the next exercise, for this operator, the matrix of the transformation in this special case is the so-called change of basis matrix, M, such that M[x]BV = [x]BW.

Exercise 1.5. Prove that the matrix of the transformation for the case in the preceding paragraph (i.e. L = I) is the change of basis matrix

[ [v1]BW , . . . , [vn]BW ],

where BV = {v1, . . . , vn}.

Furthermore, the previous discussion leads us to another observation. Given any x, V[x]BV = x, where V is the matrix whose columns are the basis vectors in BV. Similarly, W[x]BW = x, where W contains the basis vectors in BW. V, W must be invertible (why?), hence

x = V[x]BV = W[x]BW  =⇒  [x]BW = W^(−1)V [x]BV,  so M = W^(−1)V.

Exercise 1.6. Show/argue that this M must be equal to the matrix of the trans-formation from the previous exercise.
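Here is a minimal NumPy sketch (my own illustration, with made-up bases for R2) of the identity M = W^(−1)V from the discussion above: it builds the change of basis matrix for two bases and checks that it converts coordinates correctly.

import numpy as np

# Two bases for R^2, stored as matrix columns (these particular bases are made up).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])           # basis BV = {(1,0), (1,1)}
W = np.array([[2.0, 0.0],
              [0.0, 3.0]])           # basis BW = {(2,0), (0,3)}

M = np.linalg.solve(W, V)            # change of basis matrix M = W^{-1} V

x = np.array([5.0, 7.0])             # a vector written in standard coordinates
x_BV = np.linalg.solve(V, x)         # [x]BV
x_BW = np.linalg.solve(W, x)         # [x]BW

print(np.allclose(M @ x_BV, x_BW))   # True: M maps BV-coordinates to BW-coordinates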

1.3 Range, Kernel, Onto, One-to-one, invertibility

1.3.1 Range

If L is a linear transformation whose domain is the vector space V (the set ofallowable input objects is the domain), and v ∈ V then we say the output, or image


of v under L, is L(v).

Definition 1.9. The range of a linear transformation L is the set of all possible images (outputs):

R(L) = {x ∈ W | x = L(v), v ∈ V }.

Other possible/equivalent short-hand notation for the range is Range(L) or Ran(L).

Let us consider again Example 1.10, and ask ourselves what the range must be. We contend that it must be all of R3. To prove this, we need to show that for every element in R3, there must exist some element in P2 which maps to it.

Proof. Let x = (c1, c2, c3)^T represent an arbitrary element in R3. Now define the polynomial q(t) = c1 + c2 t + c3 t^2. Clearly, L(q(t)) = x. Since x was an arbitrary element of R3, our proof is complete.

However, it is not always the case that if L is a linear transformation from V to W, the range is all of W. To see this, consider L(x) = Bx from Example 1.7. Let y = (2, 2, −2)^T. It is straightforward to show that y ∉ R(L) by showing that there is no solution to Bx = y. Thus, in this example R(L) ≠ R3.

A word about set equality and set containment. We say that C ⊂ D if we can show that any element of C must be an element of D. We say that two sets C, D are equal if it can be shown that ∀c ∈ C, c ∈ D AND ∀d ∈ D, d ∈ C (that is, C ⊂ D and D ⊂ C). Thus, one can show that two sets are not equal simply by showing that there is at least one element of one of the two sets that is not contained in the other.

In the special case that the linear transformation is defined explicitly through a matrix-vector product, i.e. L(x) := Ax, then we usually talk about the range of A rather than the range of L. So in this case, the notation would be R(A), and since matrix-vector products always define linear transformations, it is understood from context that R(A) means the range of the associated linear transformation.

Furthermore, we have an EXACT way of specifying the range of a lineartransformation associated with a matrix-vector product:

Definition 1.10. The range of A, denoted R(A) (or Range(A) or Ran(A)), is the set of all linear combinations of the columns of A. This is also often referred to as the column space of A. For A m × n,

R(A) = {y | y = Ax for some x ∈ V } = span(a1, . . . , an),

where V is either Rn or Cn depending on context.
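A quick numerical way to test whether a given y lies in the column space is to compare rank(A) with rank([A | y]); appending y increases the rank exactly when y is not in R(A). The sketch below is my own (assuming NumPy); B is the 3 × 2 matrix from Example 1.7 and y is a sample right-hand side chosen here so that Bx = y is inconsistent.

import numpy as np

B = np.array([[-1.0, 2.0],
              [ 0.0, 1.0],
              [ 0.0, 1.0]])
y = np.array([2.0, 2.0, -2.0])

# y lies in R(B) exactly when appending y to B does not increase the rank.
rank_B  = np.linalg.matrix_rank(B)
rank_By = np.linalg.matrix_rank(np.column_stack([B, y]))
print(rank_B, rank_By, rank_B == rank_By)    # 2 3 False  => y is not in R(B)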


1.3.2 Kernel and Null Space

If L : V → W is a linear transformation, we know (by definition of linear transfor-mation) that whatever the zero is in W , that element must be in the range of L.Therefore, it is natural to ask what (and how many) elements in V map to it.

Definition 1.11. The kernel of a linear transformation L : V → W is the set of all elements in V that map to the zero element in W:

ker(L) = {x ∈ V | L(x) = 0 ∈ W }.

Note that in some texts, the kernel is often called the null space, which may be expressed in short-hand in different ways, such as Null(L), or Nul(L), or N(L). On the other hand, some books reserve null space as a particular case of the kernel specific to linear transformations defined by matrix-vector products.

Let us consider Example 1.4 (differentiation). You should have reasoned that W in that case is the class of all continuous (though not necessarily differentiable anymore) functions on [0, 1]. The “zero” of W is the constant function g(t) = 0. What maps to this function? Answer: any function that is continuously diff’ble on [0, 1] such that its derivative is zero. So, if f(t) = c (i.e. c could be any real number; the point is that f(t) is a constant function), then g(t) := f′(t) = 0. Thus, ker(L) is the set of all constant functions on [0, 1].

In Example 1.6, the kernel contains only the constant function 0. In the special case that the linear transformation is defined explicitly through a matrix-vector product, L(x) := Ax, we call the kernel of the linear transformation the null space of A.

Definition 1.12. The null space of A is denoted N(A) (or Nul(A) or Null(A)) and is given as

N(A) = {x | Ax = 0}.

For the A in Example ??, you should be able to show that N(A) = span( (1, 4, 3, 1)^T ).

Exercise 1.7. Prove that, for an arbitrary linear transformation L between two arbitrary vector spaces V and W, Null(L) and Range(L) are subspaces of V and W, respectively.

Exercise 1.8. Find the dimensions of the kernels in Examples ??, 1.6, and 1.7 (the latter for both transformations induced by the two different matrices).

1.3.3 One-to-one, Onto, Invertible

Definition 1.13. Let L : V → W be a linear transformation. Then L is onto W if R(L) = W. We may equivalently say that L is surjective.


Definition 1.14. Let L : V → W be a linear transformation. Then L is one-to-one if for every y ∈ R(L), there exists a unique x ∈ V such that y = L(x). Inother words, if for any v,u ∈ V we observe L(v) = L(u), due to uniqueness it mustbe the case that v = u. We may equivalently say that L is injective.

Exercise 1.9. From the previous definition and the definition of kernel, it’s easyto prove that if the kernel is non-trivial, the linear transformation cannotbe one-to-one. Prove this.

Exercise 1.10. Give examples (previous ones from this text, or others you define) of linear transformations which satisfy a) one-to-one but not onto, b) onto but not one-to-one. Make sure to give examples for which V and/or W are neither Rn nor Cn.

Definition 1.15. Let L : V → W be a linear transformation. Then L is invertible, or bijective, if L is BOTH one-to-one and onto. Thus, there must exist an inverse, denoted L−1, which is a linear transformation from W to V which is also invertible, and such that L−1(L(v)) = v and L(L−1(w)) = w, for every v ∈ V and every w ∈ W.

1.3.4 Isomorphic

Definition 1.16. Let V and W be two vector spaces. Then V and W are said tobe isomorphic to each other if there exists a linear transformation between themthat is invertible.

To show they are isomorphic, it suffices to give the corresponding invertibleLT.

Suppose for the moment that V is a real vector space and let dim(V ) = n for finite n. Let B = {b1, . . . , bn} denote a basis for V. There is a unique way (by the Unique Representation Theorem) of expanding every single element of V in this basis. I.e., for any x ∈ V,

x = ∑_{i=1}^{n} ci bi,  and this expansion is unique.

A fundamentally important/relevant fact relative to basis expansion is contained in the following exercise.

Exercise 1.11. Define L : V → Rn as the map that takes an element of V and strips off the expansion coefficients in order, placing them in a vector of length n, namely (c1, . . . , cn)^T (i.e. L(x) = [x]B). Then this linear transformation is invertible – prove it!


As corollaries to the exercise above, we observe:

Example 1.11. For integer n > 0, Pn−1 is isomorphic to Rn.

Example 1.12. For integer n > 0, Πn−1 is isomorphic to Cn.

What are the practical implications? Well, suppose I want to prove that some collection of polynomials is linearly independent. It suffices to a) pick a basis for the polynomials, b) analyze the vectors of expansion coefficients for linear independence. The latter involves showing that the only solution to Ax = 0 is the trivial solution, where the columns of the matrix A are the column vectors of expansion coefficients in the chosen basis. A small numerical check along these lines follows.
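For instance (a minimal sketch of my own, assuming NumPy; the polynomials are those of Example 1.2), the dependence of {1, 2i + z, z} shows up as a rank deficiency of the matrix of coefficient vectors:

import numpy as np

# Coefficient vectors in the basis {1, z}, with complex scalars.
p1 = np.array([1.0, 0.0], dtype=complex)        # 1
p2 = np.array([2.0j, 1.0], dtype=complex)       # 2i + z
p3 = np.array([0.0, 1.0], dtype=complex)        # z

A = np.column_stack([p1, p2, p3])               # 2 x 3 matrix of coefficients
print(np.linalg.matrix_rank(A))                 # 2 < 3  => the set is dependent

# The dependence relation from Example 1.2: (-2i)*p1 + 1*p2 + (-1)*p3 = 0.
print(np.allclose(A @ np.array([-2.0j, 1.0, -1.0]), 0))    # True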

1.3.5 Rank Plus Nullity

Suppose L is a linear transformation L : V →W . Clearly Range(L) ⊂W .

Theorem 1.17. For finite dimensional vector spaces V, W and a linear transformation L : V → W,

dim(Range(L)) + dim(ker(L)) = dim(V ).

A special case of this is usually presented in a first course on linear algebra,namely that for an m× n matrix A,

dim(R(A)) + dim(N(A)) = n.

Since rank is defined as the dimension of the range space, the above result isoften called the Rank-plus-nullity Theorem.
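As a sanity check (my own sketch, assuming NumPy and SciPy are available), the matrix version of the theorem can be verified numerically: the rank is computed directly, and the nullity is read off as the number of vectors in a computed null space basis.

import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # 4 x 6, rank 2 by construction

rank = np.linalg.matrix_rank(A)             # dim R(A)
N = null_space(A)                           # orthonormal basis for N(A), one column per basis vector
nullity = N.shape[1]                        # dim N(A)

print(rank, nullity, A.shape[1])            # 2 4 6
print(rank + nullity == A.shape[1])         # True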

Return to Example 1.12 and review it in the context of this theorem.

1.4 A Few Words About Solutions of Linear Systems of Equations

When L(x) := Ax for an m × n matrix A, then L defines a linear transformation L : Cn → Cm.

A solution to Ax = b exists iff b ∈ R(A). When a solution exists, then it isunique iff N(A) = {0}.

Therefore, it is only possible for an inverse to exist if n = m, and the dimensionof the range is equal to n.

For a particular b, to solve a small system, recall that we typically form theaugmented m × (n + 1) matrix [A,b], put it in row echelon form (REF) anduse backsubstitution or put it in reduced row echelon form (RREF) (in whichcase the solution(s) can be determined directly from the form).


To find a basis for N(A), we solve Ax = 0. Either x = 0 is the only solution,or there are free variables. In the latter case, write the pivot variables in terms ofthe free variables, and expand.

Example 1.13. Let

A = (  1  0   1
      −2  0  −2 ).

The RREF is

U = ( 1  0  1
      0  0  0 ),

so there are 2 free variables x2, x3, meaning the dimension of the null space must be 2. The pivot variable can be expressed as x1 = −x3. The solution set to Ax = 0 is the same as the solution set to Ux = 0. The solution set to the latter can be expressed as

(x1, x2, x3)^T = (−x3, x2, x3)^T = x3 (−1, 0, 1)^T + x2 (0, 1, 0)^T,

so a basis for the null space is given by { (−1, 0, 1)^T, (0, 1, 0)^T }.
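The hand procedure of Example 1.13 (row reduce, identify free variables, expand) can be mirrored symbolically; here is a minimal SymPy sketch of mine, using the 2 × 3 matrix as printed above (treat the entries as illustrative). SymPy is an assumption; any CAS with rref/nullspace routines works.

import sympy as sp

A = sp.Matrix([[ 1, 0,  1],
               [-2, 0, -2]])

U, pivot_cols = A.rref()        # reduced row echelon form and pivot column indices
print(U)                         # Matrix([[1, 0, 1], [0, 0, 0]])
print(pivot_cols)                # (0,): only column 0 is a pivot, so x2 and x3 are free

basis = A.nullspace()            # one basis vector per free variable
for v in basis:
    print(v.T, (A * v).T)        # each basis vector v satisfies A v = 0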

Exercise 1.12. Note that the procedure outlined above should, of course, be givingus linearly independent vectors that span the nullspace, otherwise, we don’t have abasis. Prove that the procedure will give linearly independent vectors.

1.5 Sum vs. Direct Sum of Subspaces

The sum of a finite number of subspaces V1, . . . , Vp of a finite dimensional vector space is the set whose elements can be written as a sum of elements in the respective subspaces:

V1 + · · · + Vp := {x | x = v1 + · · · + vp, vi ∈ Vi}.

Theorem 1.18. The sum of subspaces Vi ⊂ V , i = 1, . . . , p, is a subspace of V .

Proof. Let u, v ∈ V1 + V2 + · · · + Vp. Then by definition, it must be the case that

u = u1 + · · · + up, with ui ∈ Vi;  v = v1 + · · · + vp, with vi ∈ Vi.

Therefore,

u + v = u1 + · · · + up + v1 + · · · + vp = (u1 + v1) + (u2 + v2) + · · · + (up + vp),

using the commutative and associative laws in V. But because each Vi is a subspace and therefore closed under addition, ui + vi ∈ Vi, i = 1, . . . , p. This shows that u + v is a sum of elements each living in Vi, and therefore it, too, is in the sum of the subspaces.

Furthermore, if c is an arbitrary scalar, then

cu = cu1 + · · ·+ cup


because of the distributive law for scalar multiplication in the vector space V. Noting that each Vi is a subspace and therefore closed under scalar multiplication, it follows that cui ∈ Vi, i = 1, . . . , p. Thus, cu is a sum of elements each living in Vi, so it, too, must be in the sum.

Therefore, since the set is closed under addition and scalar multiplication, it must be a subspace of V.

Definition 1.19. Now define U = V1 + · · · + Vp. If the expression of each u ∈ U as a sum of exactly p terms, u = u1 + · · · + up with ui ∈ Vi, is unique, then we say that U is a direct sum of these subspaces,

U = V1 ⊕ · · · ⊕ Vp.

NOTE that the direct sum of subspaces of V also forms a subspace of V, and thus in particular, the 0 element of V is contained in the direct sum. What is especially noteworthy about direct sums is that the uniqueness ensures there is only one way to express 0: i.e. if we put u = 0, then we are forced to take each ui = 0 in the expression.

Suppose p = 2 and V1 ∩ V2 = {0}. Then it is possible to show that V1 + V2 must be a direct sum of the two subspaces. Conversely, the subspace V1 + · · · + Vp CANNOT be a direct sum unless Vi ∩ Vj = {0} for i ≠ j (for p > 2, these pairwise conditions are necessary but not sufficient on their own). Therefore, when we write U ⊕ W for two subspaces U, W of V, it is implicit that U ∩ W = {0}.

1.6 Extending a Linearly Independent Set to a Basis

Suppose V is a finite dimensional vector space – say dim(V ) = n > 1. Let v ∈ V, v ≠ 0. Note that span(v) is a 1-dimensional vector space.

It is possible to use this single vector as a starting point, and through a procedure, or algorithm, obtain a basis for V (i.e., get n linearly independent vectors that span V)! In fact, from any set of k (1 ≤ k ≤ n − 1) linearly independent vectors, it is possible to extend to a basis for V.

Theorem 1.20. Given a finite dimensional vector space V of dimension n > 0,every linearly independent set of k < n vectors in V can be extended to a basis ofV .

In some sense, this seems obvious. Certainly, if we know that dim(V ) = n,there must exist a basis for it from which we’ve deduced that V has this dimension.So there exists at least one set of n linearly independent vectors that span V . Thusthe proof is trivial if those k independent vectors we’re given are chosen from amongthose n in the known basis. But what if we’re given a different set of k independentvectors? The general proof is below.

Proof. Let S := {v1, . . . ,vj} for any integer j with 1 ≤ j < n be a set of linearlyindependent vectors in V . Now V has finite dimension n, so it has some known


basis which spans the space, call it B = {w1, . . . , wn}. We assume n ≥ 2 and j < n. Let k = 1 to begin. Note that S contains j independent vectors to begin with, and vectors are gradually added (though maybe not on each step):

1. If S ∪ {wk} is still a linearly independent set, then set S ← S ∪ {wk}. If S now contains n vectors, stop; else, set k ← k + 1 and repeat Step 1.

2. Otherwise, discard wk, set k ← k + 1 and repeat Step 1.

This process guarantees that eventually you will get n linearly independent vectors in V on or before step k = n. Why? Since every set of n linearly independent vectors in V must be a basis for V (you should be able to show this!), we are done.
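The proof is constructive; here is a minimal NumPy sketch of my own (for subspaces of Rn, using rank as the independence test) of the same greedy procedure. The function name and starting vector are illustrative.

import numpy as np

def extend_to_basis(S, W):
    # Extend the independent columns of S to a basis of R^n by greedily appending
    # columns of W (a known basis, e.g. the identity) that keep the set independent.
    n = W.shape[0]
    cols = [S[:, j] for j in range(S.shape[1])]
    for k in range(n):
        candidate = cols + [W[:, k]]
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            cols = candidate                  # wk kept: still independent
        if len(cols) == n:
            break                             # n independent vectors: a basis
    return np.column_stack(cols)

# Start from one independent vector in R^3 and extend using the standard basis.
S = np.array([[1.0], [1.0], [0.0]])
B = extend_to_basis(S, np.eye(3))
print(B)
print(np.linalg.matrix_rank(B))               # 3: the columns form a basis of R^3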

Corollary 1.21. Let V be a finite dimensional vector space, and U a propersubspace of V . Then there exists a subspace W such that V = U ⊕W .

Although we could prove it now, the proof is elegant (trivial) after we studyorthogonal complements.

1.7 Determinants

In Matrix Analysis and Applied Linear Algebra by Carl Meyer, the author makesthe following statement about determinants of matrices:

... today matrix and linear algebra are in the main stream of appliedmathematics, while the role of determinants has been relegated to a mi-nor backwater position. Nevertheless, it is still important to understandwhat a determinant is and to learn a few of its fundamental properties...to explore those properties that are useful in the further development ofmatrix theory and its applications.

This sentiment is echoed in Linear Algebra and its Applications, Third Ed byGil Strang: “The simple things about the determinant are not the explicit formulas,but the properties it possesses.”

Hence, we keep the discussion on determinants very short and sweet, with aneye to practical relevance (e.g. we will not present Cramer’s rule). Please referenceany standard linear algebra text for a complete discussion on determinants and alltheir properties.

Notation: If A is n × n, we will typically use det(A) to denote the determinant of A. This notation suggests there is a mapping – indeed, det : Cn×n → C if A is complex valued, and det : Rn×n → R if A is real valued.

Definition 1.22. det(A) for 2× 2 A is given by a11a22 − a21a12.

Let Mij denote the (n − 1) × (n − 1) matrix (called a submatrix of A) you would get if you deleted row i and column j from A. The determinant of this submatrix is called the (i, j)th minor of A; the signed quantity (−1)^(i+j) det(Mij) is called the (i, j)th cofactor.

Definition 1.23. (compare to Lay, 3.1) Let A be n × n, n ≥ 2. Fix an index 1 ≤ i ≤ n. Then the determinant can be computed via co-factor expansion on row i as

det(A) = ∑_{j=1}^{n} (−1)^(i+j) aij det(Mij).

For the sake of completeness, we note that co-factor expansion can also bedone via columns instead of rows (exchange the use of i and j in the formulaabove). Note that the definition is recursive in nature: the det of a 5x5 requiresdets of 4x4’s which requires dets of 3x3’s which requires dets of 2x2’s.
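To make the recursion explicit, here is a short, deliberately naive Python sketch of mine (assuming NumPy) of cofactor expansion along the first row; it is illustrative only and costs factorial work, which is exactly the point made below.

import numpy as np

def det_cofactor(A):
    # Determinant by cofactor expansion along row 0 (exponential cost).
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    if n == 2:
        return A[0, 0] * A[1, 1] - A[1, 0] * A[0, 1]
    total = 0.0
    for j in range(n):
        M0j = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 0, column j
        total += (-1) ** j * A[0, j] * det_cofactor(M0j)
    return total

A = np.array([[3.0, -9.0,  8.0],
              [1.0,  2.0, -1.0],
              [7.0,  4.0, -4.0]])
print(det_cofactor(A), np.linalg.det(A))      # the two values agree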

Any student who has ever used co-factor expansion to compute a determinantof a matrix of dimension larger than 3 knows that computation of the determinantusing co-factor expansion involves A LOT OF WORK due to the recursion. Never-theless, for small problems where we want you to compute something by hand (likeeigenvalues - see next section), it’s useful. It’s also often a useful formula when thematrix A has special structure.

For example, you can use co-factor expansion to prove that the determi-nant of an upper triangular, lower triangular, or diagonal matrix is theproduct of its diagonal elements.

That leads us to the practical computation of a determinant. Let U representa REF of an n × n matrix A. Then U is square and triangular. Fortunately, wehave a theorem that tells us the determinant of an upper triangular matrix is easyto compute:

Theorem 1.24. If U is a REF of n × n A, obtained using only row interchanges and row-replacement operations (no scaling of rows), then

det(A) = ±u11u22 · · · unn,

and the sign determination depends on the number of row swaps that were performed in computing U (negative if odd, positive if even).

For the proof, see any linear algebra text.

To keep it brief, we include a few useful facts about determinants (a quick numerical check appears after the list). Note that determinants only make sense if we are talking about square matrices.

• A is invertible iff det(A) ≠ 0.

• det(AB) = det(A)det(B).

• If A is invertible, det(A−1) = 1/det(A).

• det(AT ) = det(A).
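Here is the promised check: a minimal NumPy sketch of mine, with randomly generated matrices, verifying the facts above numerically.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                        # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A)))     # True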


Now we cover a few useful applications for determinants (besides using themto specify eigenvalues, which we cover in the next section).

Example 1.14. Linear independence of functions, and the Wronskian. Consider the following nth order linear ODE:

a0(t) y^(n)(t) + a1(t) y^(n−1)(t) + · · · + an(t) y(t) = g(t).

In a standard ordinary linear differential equations (ODEs) class, you learn that to form a “general solution” to this you need a linear combination of n linearly independent solutions to the ODE. Solutions to an nth order ODE are themselves functions yi(t) of the independent variable t. Given a list of solutions y1(t), . . . , yn(t) to the ODE, the set is independent if the only solution to

c1 y1(t) + · · · + cn yn(t) = 0

is the trivial solution c1 = · · · = cn = 0. Note we are talking about this equation being true for ANY value of t (at least within a subinterval of the real line).

Now since this equation must hold for all t (in the prescribed subinterval), and since all the functions are n times differentiable, we can take n − 1 derivatives of this equation to obtain n − 1 more equations:

c1 y1^(i)(t) + · · · + cn yn^(i)(t) = 0,   i = 0, . . . , n − 1.

We can conveniently write the preceding set of equations in matrix-vector form as

( y1(t)         y2(t)         · · ·   yn(t)
  y1′(t)        y2′(t)        · · ·   yn′(t)
  y1″(t)        y2″(t)        · · ·   yn″(t)
   ...            ...                   ...
  y1^(n−1)(t)   y2^(n−1)(t)   · · ·   yn^(n−1)(t) )  ( c1, c2, . . . , cn )^T  =  0.

The matrix is a function of t, so its determinant (called the Wronskian) will be a function of t. We would need the determinant to be non-zero to conclude that the yi are independent. Now if the Wronskian proves to be non-zero when a particular value of t is plugged in (we usually use t0, where the initial condition y(t0) = c is specified in the problem), this is sufficient to show they are independent. But if it evaluates to zero for a particular value of t, you actually cannot conclude anything without restrictions on the coefficient functions ai(t) and g(t).
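As a quick symbolic illustration (a SymPy sketch of my own, tying back to Example 1.3), the Wronskian of {sin t, cos t} is identically −1, so the pair is independent:

import sympy as sp

t = sp.symbols('t')
y1, y2 = sp.sin(t), sp.cos(t)

# Wronskian matrix: the functions and their first derivatives.
W = sp.Matrix([[y1,             y2            ],
               [sp.diff(y1, t), sp.diff(y2, t)]])
print(sp.simplify(W.det()))      # -1, nonzero for every t => sin and cos are independent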

Example 1.15. Determinants, geometry, and change of variables in integrals. The numerical solution of 2D partial differential equations is often achieved through use of a method known as the finite element method. We will be seeing more about FEM after we discuss inner products. For now, we concentrate on a well-known subproblem in FEM computation – the need to compute integrals over the elements, where an element is a 2D polygon, possibly on an irregular grid. If we want to evaluate the integral of f(x, y) over one of these polygons, we'd like to do


them all by transforming the problem to one of computation of integration over areference element. In this example we will focus on a very simple instance.

Imagine a grid where the horizontal axis is the u axis and the vertical axis is the v axis. Recall that in R2, any pair of vectors a1, a2 implicitly forms a parallelogram (corner tagged to (0,0)) with the components of a1 the (u, v) endpoint of one side and the components of a2 the endpoint of the other side. Let A = [a1, a2].

If the parallelogram is a rectangle, then it’s clear A will be diagonal and det(A)will be equal to the area of the rectangle.

In particular, the matrix representation of the unit square would be A =[e1, e2], and det(A) = 1 is the area of the unit square.

Certain geometric transformations can be represented as linear transformations (e.g. a shear transformation, rotation, or dilation – see, for example, Lay, 1.8). If we wanted to convert our unit square to a parallelogram (still tagged to the origin), we apply our transformation to the columns of the unit-square matrix. This amounts to multiplication by the matrix of the transformation, which we'll call B. So the coordinates of the new parallelogram consist of the columns of the matrix P = BI. Note that det(P) = det(B) det(I) = det(B).

Thus, the columns of B form the sides of a parallelogram, and the area of theparallelogram determined by the columns of B is the absolute value of the determi-nant. A formal proof of this can be found in many linear algebra textbooks.
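For a concrete check (a minimal NumPy sketch with a made-up shear and dilation), the area of the image of the unit square under B equals |det(B)|:

import numpy as np

# A shear that maps the unit square to a parallelogram with the same base.
B = np.array([[1.0, 1.5],
              [0.0, 1.0]])

P = B @ np.eye(2)                  # columns of P are the sides of the sheared parallelogram
print(abs(np.linalg.det(P)))       # 1.0: shearing preserves area

# A dilation by 2 in x and 3 in y scales the area by 6.
D = np.diag([2.0, 3.0])
print(abs(np.linalg.det(D)))       # 6.0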

Now we will go back to how this relates to the change of variables in doubleintegrals. (TBD)

1.8 Brief Review: Eigenpairs, Eigenspace, Diagonalizability

Let A be an n × n matrix (it can have real or complex entries). Remember that if λ is real or complex, x is a NONZERO vector in Rn or Cn, and

Ax = λx,

then x is an eigenvector with corresponding eigenvalue λ. The zero vector is never an eigenvector!

Finding eigenvalues of a matrix is equivalent (on paper) to computing roots ofthe n-degree characteristic polynomial det(A − λI). This equivalence may notproduce an efficient algorithm for computing eigenvalues, however! If you want toknow more about practical computation of eigenvalues, we recommend a course inNumerical Linear Algebra.
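A small NumPy sketch of my own (the matrix is made up) illustrating the equivalence: the roots of the characteristic polynomial match the eigenvalues returned by a standard eigensolver, up to roundoff.

import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])              # eigenvalues 1 and 2

eigs = np.linalg.eigvals(A)              # eigenvalues from a numerical eigensolver
charpoly = np.poly(A)                    # coefficients of the characteristic polynomial
roots = np.roots(charpoly)               # its roots

print(np.allclose(np.sort_complex(eigs), np.sort_complex(roots)))   # True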

We will consider eigenvalues and vectors in a subsequent section in greaterdetail. The few things worth reminding you at this point:

• Even if A is a real-valued matrix, it can have a complex-valued eigenpair. This follows from the fact that the characteristic polynomial, even if real valued, need not have all real roots^3. There are situations when you can guarantee this will never be the case, however, and we will study those in a later section.

3Fundamental Theorem of Algebra


Note, however, this puts an interesting twist with respect to the vector spaces we should be considering. If A is real but λ and x are complex, we might prefer to think of the linear transformation defined by multiplication with A over Cn. In this case, it's clear x is in the domain and range of the transformation.

• If A is real-valued, any complex (non-real) eigenvalues and their eigenvectors come in complex conjugate pairs. Thus, if A is real-valued 3 × 3 and you know one eigenvalue is complex, you know another is its complex conjugate, and therefore the third eigenvalue must be real.

• If A has 0 as an eigenvalue, the matrix is not invertible. You should be ableto prove this.

• As roots of a polynomial can have multiplicity greater than 1, this impliesthat you may get multiple copies of the same eigenvalue.

If A is real-valued and the eigenpair is real with λ ≠ 0, it is easy to visualize (in R2 or R3) the meaning of eigenvector and eigenvalue – multiplication by A maps x back in the direction of itself (the direction of x is invariant under multiplication by A), scaled by an amount λ.

We have a special term for expressing the concept of repeated eigenvalues.

Definition 1.25. We say an eigenvalue λ has algebraic multiplicity m, for a positive integer m, if λ is a root of multiplicity m of the characteristic polynomial.

In a first course on Linear Algebra, texts may drop the term “algebraic” fromthe definition. But it is needed to distinguish it from the concept of geometricmultiplicity. Recall that to find an eigenvector for an eigenvalue λ, one needs tosolve the homogeneous equation (A−λI)x = 0. This is equivalent to finding a basisfor Null(A−λI). Recall that Null(A−λI) is called the eigenspace correspondingto the eigenvalue λ.

Definition 1.26. The geometric multiplicity of an eigenvalue λ is the dimen-sion of the eigenspace Null(A− λI).

Theorem 1.27. The geometric multiplicity of an eigenvalue λ is at least 1, butless than or equal to the algebraic multiplicity of λ.

Exercise 1.13. Prove the first part of the theorem (namely, that the geometricmultiplicity of an eigenvalue must be at least 1.)

1.8.1 Similarity and Diagonalizability

Definition 1.28. Two n × n matrices A, B are similar if there exists an invertible matrix X such that

A = XBX−1 (which of course implies that X−1AX = B).


If the dimension of the eigenspace associated with an eigenvalue λ is k ≥ 1, then there must be k linearly independent vectors (they might be in Cn even when A is real!) that span the eigenspace – those are k linearly independent eigenvectors associated with λ. Now suppose that µ ≠ λ is another eigenvalue of A, and suppose that the dimension of the eigenspace corresponding to µ is j. Then there is a theorem we learned in first semester linear algebra that says that the k eigenvectors from λ and the j eigenvectors from µ must be linearly independent.

Recall that A is diagonalizable^4 if there exists invertible W and diagonal D such that A = WDW−1. In other words, A is diagonalizable if it is similar to a diagonal matrix (more on this in Chapter 2). Recall from first semester linear algebra that you can prove that if two matrices are similar, they share the same eigenvalues. Thus, if A is diagonalizable as above, its eigenvalues must be the elements on the diagonal of the diagonal matrix D.

If A is n × n with n distinct eigenvalues, A must be diagonalizable (see the footnote for a caveat) – why?? On the other hand, if A has any eigenvalue for which the geometric and algebraic multiplicities are not equal, then the matrix cannot be diagonalizable.

We will study the connection between algebraic and geometric multiplicity fur-ther in the next two chapters. We will also study invariant subspaces (eigenspacesare special examples of these) and the connection to direct sums and diagonaliz-ability.

Exercise 1.14. A matrix is called skew-symmetric if A = −A^T. The matrix

A = (  1  2
      −2  3 )

(whose off-diagonal part is skew-symmetric) has the complex conjugate pair of eigenvalues 2 ± i√3 (and x, x̄ are the eigenvectors; their entries aren't important for this exercise). Setting D = diag(2 + i√3, 2 − i√3) and X = [x, x̄], A = XDX−1. However, there is no way to diagonalize A with a real similarity transform. Prove this.

Example 1.16. This is to illustrate that diagonalizability and invertibility are entirely different concepts; one does not imply the other. Consider

A = ( 1  1
      1  1 ).

Clearly the matrix is not invertible and has rank 1. Computing det(A − λI) = 0 we see that λ1 = 0, λ2 = 2. An eigenvector for λ1 is x1 = (−1, 1)^T and x2 = (1, 1)^T is an eigenvector for λ2. The matrix is diagonalizable, with X = [x1, x2] (check that

4 There is a subtle but important point to be made here. The matrix A could be real but have complex eigenvalues/vectors. If that is the case, the matrix will not be diagonalizable by a real-valued matrix X and a real-valued diagonal matrix D. However, if we consider L : Cn → Cn, then A is diagonalizable. We will work under the latter assumption.


X is invertible) and

A = X ( 0  0
        0  2 ) X−1.
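A minimal NumPy check of Example 1.16 (my own sketch, assuming NumPy):

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
X = np.array([[-1.0, 1.0],
              [ 1.0, 1.0]])           # columns are the eigenvectors x1, x2
D = np.diag([0.0, 2.0])

print(np.allclose(A, X @ D @ np.linalg.inv(X)))   # True: A = X D X^{-1}
print(np.linalg.matrix_rank(A))                    # 1: A is not invertible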

1.9 Partitioned Matrices

Definition 1.29. A matrix partition is a set of horizontal and vertical dividingrules that result in a partitioned, or block, matrix. The same matrix can be blocked,or partitioned, a number of ways.

The easiest way to describe this is to see some examples.

Example 1.17.

A = ( 3  −9   8
      1   2  −1
      7   4  −4 )  =  ( A11  A12
                        A21  A22 ).

This specific partition corresponds to a block 2 × 2 matrix, with the first block A11 equal to a 2 × 2 submatrix and the lower right submatrix A22 equal to 1 × 1. However, we could have also partitioned this in several other ways: a) as a block 2 × 2 such that A11 is 1 × 1 and A22 is 2 × 2; b) as a block 2 × 2 where A11 is 2 × 1; c) as a block 2 × 2 where A11 is 1 × 2; d) as a block 3 × 2; e) as a block 2 × 3; f) as a block 3 × 3.

Notationally, since we use capital letters to define matrices in this text, andbecause any given block of a block partitioned matrix can itself be considered amatrix, we continue to use upper case letters for the blocks, with the subscriptindicating to which block-row or block-column the submatrix belongs.

We say leading principal submatrix when we mean the A11 corresponding to a particular partition. Sometimes we refer to the lowest-rightmost block as the trailing submatrix.

Exercise 1.15. Identify all the block partitionings, and the corresponding elements and sizes of each block, for the first example.

You can block partition any matrix, rectangular or square. And if you have a pair of matrices, if we partition conformably, we can write traditional matrix operations using the block partitioning. For example, suppose A is 4 × 6, and we partition it into a block 1 × 3 where each block is a 4 × 2 submatrix. Simultaneously, we partition a 6 × 10 matrix B into a block 3 × 1, where each submatrix block is 2 × 10. Then the product AB is defined and can be computed as

AB = ( A11  A12  A13 ) ( B11
                         B21
                         B31 )  =  A11B11 + A12B21 + A13B31.


Since each AijBjk is the product of a 4 × 2 with a 2 × 10, and we sum these, theresulting product is indeed 4× 10, as we knew it had to be.
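A quick NumPy check (my own sketch, with random matrices of the stated sizes) that the blocked product agrees with the ordinary product:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 10))

# Partition A into a block 1 x 3 (each block 4 x 2) and B into a block 3 x 1 (each block 2 x 10).
A_blocks = [A[:, 0:2], A[:, 2:4], A[:, 4:6]]
B_blocks = [B[0:2, :], B[2:4, :], B[4:6, :]]

blocked = sum(Aj @ Bj for Aj, Bj in zip(A_blocks, B_blocks))
print(np.allclose(blocked, A @ B))      # True: block multiplication matches the full product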

Of particular interest are block triangular and block diagonal matrices. Consider the block 4 × 4, block upper triangular n × n matrix

A = ( A11  A12  A13  A14
       0   A22  A23  A24
       0    0   A33  A34
       0    0    0   A44 ).

To back out the value of n, I have to be told the dimensions of the diagonal blocks.For example, if each diagonal block is 3× 3, then A is a 12× 12 matrix. If the first2 blocks are size 2 and the last 2 are size 1, then n = 6.

A special case of a block upper triangular matrix is a block diagonal matrix, in which case the off-diagonal blocks are 0 matrices of appropriate dimension. For example, a block 3 × 3 block diagonal matrix would have the form

A = ( A11   0    0
       0   A22   0
       0    0   A33 ).

Block diagonal matrices are of great interest to us in the remainder of the course.