Matrices
DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html
Carlos Fernandez-Granda
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
Column and row space
The column space col (A) of a matrix A is the span of its columns
The row space row (A) is the span of its rows.
Rank
For any matrix A
dim (col (A)) = dim (row (A))
This is the rank of A
Orthogonal column spaces
If the column spaces of A, B ∈ R^{m×n} are orthogonal, then

⟨A, B⟩ = 0

Proof:

⟨A, B⟩ := tr(AᵀB) = ∑_{i=1}^n ⟨A:,i, B:,i⟩ = 0

Consequence:

||A + B||_F^2 = ||A||_F^2 + ||B||_F^2
Linear maps
Given two vector spaces V and R associated to the same scalar field, a linear map f : V → R is a map from vectors in V to vectors in R such that for any scalar α and any vectors ~x1, ~x2 ∈ V
f (~x1 + ~x2) = f (~x1) + f (~x2)
f (α~x1) = α f (~x1)
Matrix-vector product
The product of a matrix A ∈ C^{m×n} and a vector ~x ∈ C^n is a vector A~x ∈ C^m such that

(A~x)[i] = ∑_{j=1}^n Aij ~x[j]

A~x is a linear combination of the columns of A:

A~x = ∑_{j=1}^n ~x[j] A:,j
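As a quick sanity check (a minimal numpy sketch, not part of the original slides), both views of the product agree:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # A in R^{3x2}
x = np.array([10.0, -1.0])          # x in R^2

# (Ax)[i] = sum_j A[i, j] * x[j]
via_product = A @ x
# Ax as a linear combination of the columns of A
via_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.allclose(via_product, via_columns)
```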
Equivalence between matrices and linear maps
For finite m, n, every linear map f : C^n → C^m can be uniquely represented by a matrix F ∈ C^{m×n}
Proof
The matrix is

F := [f(~e1) f(~e2) · · · f(~en)],

whose columns are the result of applying f to the standard basis

For any vector ~x ∈ C^n

f(~x) = f( ∑_{i=1}^n ~x[i] ~ei ) = ∑_{i=1}^n ~x[i] f(~ei) = F~x
Projecting and lifting
When a matrix A ∈ C^{m×n} is fat, i.e., n > m, we say that it projects vectors onto a lower-dimensional space (this is not an orthogonal projection!)

When a matrix is tall, i.e., m > n, we say that it lifts vectors to a higher-dimensional space
Adjoint
Given two vector spaces V and R with inner products ⟨·, ·⟩_V and ⟨·, ·⟩_R, the adjoint f∗ : R → V of a linear map f : V → R satisfies

⟨f(~x), ~y⟩_R = ⟨~x, f∗(~y)⟩_V

for all ~x ∈ V and ~y ∈ R
Conjugate transpose
The entries of the conjugate transpose A∗ ∈ C^{n×m} of A ∈ C^{m×n} are

(A∗)ij = conj(Aji),  1 ≤ i ≤ n, 1 ≤ j ≤ m

If the entries are all real, this is just the transpose

The adjoint f∗ : C^m → C^n of a linear map f : C^n → C^m represented by a matrix F corresponds to F∗
Symmetric and Hermitian
A symmetric matrix is equal to its transpose
A Hermitian or self-adjoint matrix is equal to its conjugate transpose
![Page 14: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/14.jpg)
Range
Let V and R be vector spaces associated to the same scalar field; the range of a map f : V → R is the set of vectors in R reached by f:

range(f) := {~y | ~y = f(~x) for some ~x ∈ V}

The range of a matrix is the range of its associated linear map
The range is the column space
For any matrix A ∈ C^{m×n}

range(A) = col(A)

Proof:

col(A) ⊆ range(A) because A:,i = A~ei for 1 ≤ i ≤ n

range(A) ⊆ col(A) because A~x is a linear combination of the columns of A for any ~x ∈ C^n
Null space
Let V and R be vector spaces; the null space of a map f : V → R is the set of vectors that are mapped to zero:

null(f) := {~x | f(~x) = ~0}

The null space of a matrix is the null space of its associated linear map

The null space of a linear map is a subspace
Null space and row space

For any matrix A ∈ R^{m×n}

null(A) = row(A)^⊥

Proof:

Any vector ~x ∈ row(A) can be written as ~x = Aᵀ~z for some ~z ∈ R^m

If ~y ∈ null(A) then

⟨~y, ~x⟩ = ⟨~y, Aᵀ~z⟩ = ⟨A~y, ~z⟩ = 0
Null space and range
Let A ∈ R^{m×n}. Then
dim (range (A)) + dim (null (A)) = n
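A minimal numerical check of this identity (assuming numpy and scipy; the example matrix is made up):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])            # rank-1 matrix in R^{2x3}

rank = np.linalg.matrix_rank(A)            # dim(range(A)) = 1
nullity = null_space(A).shape[1]           # dim(null(A)) = 2
assert rank + nullity == A.shape[1]        # equals n = 3
```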
Identity matrix
The identity matrix of dimensions n × n is
    ⎡ 1 0 · · · 0 ⎤
I = ⎢ 0 1 · · · 0 ⎥
    ⎢      ⋱      ⎥
    ⎣ 0 0 · · · 1 ⎦

For any ~x ∈ C^n, I~x = ~x
Matrix inverse
The inverse of A ∈ C^{n×n} is a matrix A^{−1} ∈ C^{n×n} such that

AA^{−1} = A^{−1}A = I
Orthogonal matrices
An orthogonal matrix is a square matrix such that
UᵀU = UUᵀ = I

The columns U:,1, U:,2, . . . , U:,n form an orthonormal basis

For any ~x ∈ R^n

~x = UUᵀ~x = ∑_{i=1}^n ⟨U:,i, ~x⟩ U:,i
Product of orthogonal matrices
If U, V ∈ R^{n×n} are orthogonal matrices, then UV is also an orthogonal matrix:

(UV)ᵀ(UV) = VᵀUᵀUV = I
Orthogonal matrices

Orthogonal matrices change the direction of vectors, not their magnitude:

||U~x||_2^2 = ~xᵀUᵀU~x = ~xᵀ~x = ||~x||_2^2
Orthogonal-projection matrix
Given a subspace S ⊆ R^n of dimension d, the matrix

P := UUᵀ,

where the columns U:,1, U:,2, . . . , U:,d of U are an orthonormal basis of S, maps any vector ~x to its orthogonal projection onto S:

P~x = UUᵀ~x = ∑_{i=1}^d ⟨U:,i, ~x⟩ U:,i = P_S ~x
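A minimal numpy sketch of this construction (the subspace is a random example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Orthonormal basis of a random 2-dimensional subspace S of R^3
U, _ = np.linalg.qr(rng.standard_normal((3, 2)))
P = U @ U.T                                # P := U U^T

x = rng.standard_normal(3)
p = P @ x                                  # orthogonal projection of x onto S
assert np.allclose(U.T @ (x - p), 0)       # residual is orthogonal to S
assert np.allclose(P @ p, p)               # projecting twice changes nothing
```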
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
Singular value decomposition
Every rank-r real matrix A ∈ R^{m×n} has a singular-value decomposition (SVD) of the form

A = [~u1 ~u2 · · · ~ur] diag(σ1, σ2, . . . , σr) [~v1 ~v2 · · · ~vr]ᵀ = USVᵀ
Singular value decomposition
- The singular values σ1 ≥ σ2 ≥ · · · ≥ σr are positive real numbers
- The left singular vectors ~u1, ~u2, . . . , ~ur form an orthonormal set
- The right singular vectors ~v1, ~v2, . . . , ~vr also form an orthonormal set
- The SVD is unique if all the singular values are different
- If σi = σi+1 = . . . = σi+k, then ~ui, . . . , ~ui+k can be replaced by any orthonormal basis of their span (the same holds for ~vi, . . . , ~vi+k)
- The SVD of an m × n matrix with m ≥ n can be computed in O(mn²)
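These properties are easy to verify numerically; a minimal numpy sketch (the matrix is a random example):

```python
import numpy as np

A = np.random.randn(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD, A = U diag(s) Vt

assert np.allclose(U.T @ U, np.eye(4))     # left singular vectors: orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(4))   # right singular vectors: orthonormal
assert np.all(np.diff(s) <= 0)             # singular values sorted, nonnegative
assert np.allclose(U @ np.diag(s) @ Vt, A)
```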
Column and row space
- The left singular vectors ~u1, ~u2, . . . , ~ur are a basis for the column space
- The right singular vectors ~v1, ~v2, . . . , ~vr are a basis for the row space

Proof:

span(~u1, . . . , ~ur) ⊆ col(A) since ~ui = A(σi⁻¹ ~vi)

col(A) ⊆ span(~u1, . . . , ~ur) because A:,i = U(SVᵀ~ei)
Singular value decomposition
A = [~u1 · · · ~ur ~ur+1 · · · ~um] S [~v1 · · · ~vr ~vr+1 · · · ~vn]ᵀ

where ~u1, . . . , ~ur are a basis of range(A), ~v1, . . . , ~vr are a basis of row(A), ~vr+1, . . . , ~vn are a basis of null(A), and S ∈ R^{m×n} contains σ1, . . . , σr on its diagonal and zeros elsewhere
Rank and numerical rank
The rank of a matrix is equal to the number of nonzero singular values
Given a tolerance ε > 0, the numerical rank is the number of singular values greater than ε
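A minimal sketch of this notion (numerical_rank is a hypothetical helper name, not a library function):

```python
import numpy as np

def numerical_rank(A, eps):
    """Number of singular values greater than the tolerance eps."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > eps))

# Exact rank 2, but numerically rank 1 for any reasonable tolerance
A = np.outer([1.0, 2.0], [3.0, 4.0]) + 1e-12 * np.random.randn(2, 2)
print(np.linalg.matrix_rank(A, tol=1e-6), numerical_rank(A, eps=1e-6))  # 1 1
```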
Linear maps
The SVD decomposes the action of a matrix A ∈ R^{m×n} on a vector ~x ∈ R^n into:

1. Rotation

Vᵀ~x = ∑_{i=1}^n ⟨~vi, ~x⟩ ~ei

2. Scaling

SVᵀ~x = ∑_{i=1}^n σi ⟨~vi, ~x⟩ ~ei

3. Rotation

USVᵀ~x = ∑_{i=1}^n σi ⟨~vi, ~x⟩ ~ui
Linear maps

[Figure: the action of A = USVᵀ on two vectors ~x and ~y: Vᵀ rotates ~v1, ~v2 onto ~e1, ~e2; S scales them to σ1~e1, σ2~e2; U rotates the result onto σ1~u1, σ2~u2]

[Figure: the same decomposition when σ2 = 0, so that the second component is mapped to ~0]
Singular values
For any orthogonal matrices Ũ ∈ R^{m×m} and Ṽ ∈ R^{n×n}, the singular values of ŨA and AṼ are the same as those of A

Proof:

Ū := ŨU and V̄ᵀ := VᵀṼ are orthogonal, so ŪSVᵀ and USV̄ᵀ are valid SVDs for ŨA and AṼ
Singular values
The singular values satisfy

σ1 = max { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n } = max { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m }

σi = max { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n, ~x ⊥ ~v1, . . . , ~vi−1 }
   = max { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m, ~y ⊥ ~u1, . . . , ~ui−1 },  2 ≤ i ≤ min{m, n}
Proof

Consider ~x ∈ R^n such that ||~x||_2 = 1 and, for a fixed 1 ≤ i ≤ n,

~x ⊥ ~v1, . . . , ~vi−1

We decompose ~x into

~x = ∑_{j=i}^n αj ~vj + P_{row(A)^⊥} ~x

where 1 = ||~x||_2^2 ≥ ∑_{j=i}^n αj²

Then

||A~x||_2^2 = ⟨ ∑_{k=1}^n σk ⟨~vk, ~x⟩ ~uk , ∑_{k=1}^n σk ⟨~vk, ~x⟩ ~uk ⟩

= ∑_{k=1}^n σk² ⟨~vk, ~x⟩²   because ~u1, . . . , ~un are orthonormal

= ∑_{k=1}^n σk² ⟨ ~vk , ∑_{j=i}^n αj ~vj + P_{row(A)^⊥} ~x ⟩²

= ∑_{j=i}^n σj² αj²   because ~v1, . . . , ~vn are orthonormal

≤ σi² ∑_{j=i}^n αj²   because σi ≥ σi+1 ≥ . . . ≥ σn

≤ σi²
Singular vectors
The right singular vectors satisfy

~v1 = argmax { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n }

~vi = argmax { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n, ~x ⊥ ~v1, . . . , ~vi−1 },  2 ≤ i ≤ n

and the left singular vectors satisfy

~u1 = argmax { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m }

~ui = argmax { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m, ~y ⊥ ~u1, . . . , ~ui−1 },  2 ≤ i ≤ m
Proof

~vi achieves the maximum:

||A~vi||_2^2 = ⟨ ∑_{k=1}^n σk ⟨~vk, ~vi⟩ ~uk , ∑_{k=1}^n σk ⟨~vk, ~vi⟩ ~uk ⟩

= ∑_{k=1}^n σk² ⟨~vk, ~vi⟩²

= σi²

The same proof applies to ~ui after replacing A by Aᵀ
Optimal subspace for orthogonal projection
Given a set of vectors ~a1, ~a2, . . . , ~an and a fixed dimension k ≤ n, the SVD of

A := [~a1 ~a2 · · · ~an] ∈ R^{m×n}

provides the k-dimensional subspace that captures the most energy:

∑_{i=1}^n ||P_{span(~u1, ~u2, . . . , ~uk)} ~ai||_2^2 ≥ ∑_{i=1}^n ||P_S ~ai||_2^2

for any subspace S of dimension k
Proof

Because ~u1, ~u2, . . . , ~uk are orthonormal,

∑_{i=1}^n ||P_{span(~u1, ~u2, . . . , ~uk)} ~ai||_2^2 = ∑_{i=1}^n ∑_{j=1}^k ⟨~uj, ~ai⟩² = ∑_{j=1}^k ||Aᵀ~uj||_2^2

We proceed by induction on k

The base case k = 1 follows from

~u1 = argmax { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m }
Proof

Let S be a subspace of dimension k

S ∩ span(~u1, . . . , ~uk−1)^⊥ contains a nonzero vector ~b: if V has dimension n, S1, S2 ⊆ V and dim(S1) + dim(S2) > n, then dim(S1 ∩ S2) ≥ 1

There exists an orthonormal basis ~b1, ~b2, . . . , ~bk for S such that ~bk := ~b is orthogonal to ~u1, ~u2, . . . , ~uk−1
Induction hypothesis

∑_{i=1}^{k−1} ||Aᵀ~ui||_2^2 = ∑_{i=1}^n ||P_{span(~u1, ~u2, . . . , ~uk−1)} ~ai||_2^2

≥ ∑_{i=1}^n ||P_{span(~b1, ~b2, . . . , ~bk−1)} ~ai||_2^2

= ∑_{i=1}^{k−1} ||Aᵀ~bi||_2^2
Proof

Recall that

~uk = argmax { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m, ~y ⊥ ~u1, . . . , ~uk−1 }

which implies ||Aᵀ~uk||_2^2 ≥ ||Aᵀ~bk||_2^2

We conclude

∑_{i=1}^n ||P_{span(~u1, ~u2, . . . , ~uk)} ~ai||_2^2 = ∑_{i=1}^k ||Aᵀ~ui||_2^2 ≥ ∑_{i=1}^k ||Aᵀ~bi||_2^2 = ∑_{i=1}^n ||P_S ~ai||_2^2
Best rank-k approximation
Let USVᵀ be the SVD of a matrix A ∈ R^{m×n}

The truncated SVD U:,1:k S1:k,1:k Vᵀ:,1:k is the best rank-k approximation

U:,1:k S1:k,1:k Vᵀ:,1:k = argmin_{ {Ã | rank(Ã) = k} } ||A − Ã||_F
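A minimal numpy sketch of the truncated SVD (best_rank_k is a hypothetical helper name); the final check uses the fact that the squared error equals the energy in the discarded singular values:

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A in Frobenius norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.randn(8, 5)
L = best_rank_k(A, 2)
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A - L, 'fro'), np.sqrt(np.sum(s[2:] ** 2)))
```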
Proof
Let Ã be an arbitrary matrix in R^{m×n} with rank(Ã) = k

Let Ũ ∈ R^{m×k} be a matrix with orthonormal columns such that col(Ũ) = col(Ã)

||U:,1:k Uᵀ:,1:k A||_F^2 = ∑_{i=1}^n ||P_{col(U:,1:k)} ~ai||_2^2 ≥ ∑_{i=1}^n ||P_{col(Ũ)} ~ai||_2^2 = ||ŨŨᵀA||_F^2
Orthogonal column spaces
If the column spaces of A, B ∈ R^{m×n} are orthogonal, then

||A + B||_F^2 = ||A||_F^2 + ||B||_F^2
Proof

col(A − ŨŨᵀA) is orthogonal to col(Ã) = col(Ũ), so, writing A − Ã = (A − ŨŨᵀA) − (Ã − ŨŨᵀA),

||A − Ã||_F^2 = ||A − ŨŨᵀA||_F^2 + ||Ã − ŨŨᵀA||_F^2

≥ ||A − ŨŨᵀA||_F^2

= ||A||_F^2 − ||ŨŨᵀA||_F^2

≥ ||A||_F^2 − ||U:,1:k Uᵀ:,1:k A||_F^2

= ||A − U:,1:k Uᵀ:,1:k A||_F^2
Reminder
For any pair of m × n matrices A and B

tr(BᵀA) = tr(ABᵀ)
Frobenius norm

For any matrix A ∈ R^{m×n} with singular values σ1, . . . , σ_{min{m,n}}

||A||_F = √( ∑_{i=1}^{min{m,n}} σi² )

Proof:

||A||_F^2 = tr(AᵀA)

= tr(V S Uᵀ U S Vᵀ)

= tr(V S S Vᵀ)   because UᵀU = I

= tr(Vᵀ V S S)   by the reminder above

= tr(S S)   because VᵀV = I
Operator norm
The operator norm of a linear map and of the corresponding matrix A ∈ R^{m×n} is

||A|| := max { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n } = σ1
Nuclear norm
The nuclear norm of a matrix A ∈ R^{m×n} with singular values σ1, . . . , σ_{min{m,n}} is

||A||_* := ∑_{i=1}^{min{m,n}} σi
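All three norms can be read off the singular values; a minimal numpy check (the matrix is a random example):

```python
import numpy as np

A = np.random.randn(5, 3)
s = np.linalg.svd(A, compute_uv=False)

assert np.isclose(np.linalg.norm(A, 2), s[0])                        # operator
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # Frobenius
assert np.isclose(np.linalg.norm(A, 'nuc'), np.sum(s))               # nuclear
```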
Multiplication by orthogonal matrices
For any orthogonal matrices Ũ ∈ R^{m×m} and Ṽ ∈ R^{n×n}, the singular values of ŨA and AṼ are the same as those of A

Consequence:

The operator, Frobenius, and nuclear norms of ŨA and AṼ are the same as those of A
Hölder’s inequality for matrices

For any matrix A ∈ R^{m×n},

||A||_* = sup { ⟨A, B⟩ : ||B|| ≤ 1, B ∈ R^{m×n} }

Consequence: the nuclear norm satisfies the triangle inequality

||A + B||_* = sup { ⟨A + B, C⟩ : ||C|| ≤ 1, C ∈ R^{m×n} }

≤ sup { ⟨A, C⟩ : ||C|| ≤ 1, C ∈ R^{m×n} } + sup { ⟨B, D⟩ : ||D|| ≤ 1, D ∈ R^{m×n} }

= ||A||_* + ||B||_*
Proof
The proof relies on the following lemma:

For any Q ∈ R^{n×n}

max_{1≤i≤n} |Qii| ≤ ||Q||

Proof:

Since ||~ei||_2 = 1,

max_{1≤i≤n} |Qii| ≤ max_{1≤i≤n} √( ∑_{j=1}^n Qji² ) = max_{1≤i≤n} ||Q~ei||_2 ≤ ||Q||
Proof

Let A := USVᵀ. Then

sup_{||B||≤1} tr(AᵀB) = sup_{||B||≤1} tr(V S Uᵀ B)

= sup_{||B||≤1} tr(S Uᵀ B V)   by the cyclic property of the trace

≤ sup_{||M||≤1} tr(S M)   since ||UᵀBV|| = ||B||

≤ sup_{max_{1≤i≤n} |Mii| ≤ 1} tr(S M)   by the lemma

= sup_{max_{1≤i≤n} |Mii| ≤ 1} ∑_{i=1}^n Mii σi

≤ ∑_{i=1}^n σi

= ||A||_*
Proof

To complete the proof, we show that equality is achieved

UVᵀ has operator norm equal to one, and

⟨A, UVᵀ⟩ = tr(AᵀUVᵀ)

= tr(V S UᵀU Vᵀ)

= tr(VᵀV S)

= tr(S)

= ||A||_*
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
Denoising correlated signals

Aim: estimate n m-dimensional signals ~x1, ~x2, . . . , ~xn ∈ R^m from

~yi = ~xi + ~zi,  1 ≤ i ≤ n

Assumption 1: the signals are similar and approximately span an unknown low-dimensional subspace

Assumption 2: the noisy perturbations are independent / uncorrelated
Denoising correlated signals
By the assumptions,

X := [~x1 ~x2 · · · ~xn]

is approximately low rank, whereas

Z := [~z1 ~z2 · · · ~zn]

is full rank

If Z is not too large, a low-rank approximation to

Y := [~y1 ~y2 · · · ~yn] = X + Z

should correspond mostly to X
Denoising via SVD truncation
1. Stack the vectors as the columns of a matrix Y ∈ R^{m×n}

2. Compute the SVD of Y = USVᵀ

3. Truncate the SVD to produce the low-rank estimate L

L := U:,1:k S1:k,1:k Vᵀ:,1:k

for a fixed value of k ≤ min{m, n}
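A minimal numpy sketch of this procedure on synthetic data (the signal model and noise level are made-up assumptions, not the MNIST setup used below):

```python
import numpy as np

def svd_denoise(Y, k):
    """Rank-k estimate of the signals obtained by truncating the SVD of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Synthetic example: signals spanning a 3-dimensional subspace, plus noise
m, n, k = 50, 200, 3
X = np.random.randn(m, k) @ np.random.randn(k, n)
Y = X + 0.5 * np.random.randn(m, n)
L = svd_denoise(Y, k)
print(np.linalg.norm(L - X, 'fro') < np.linalg.norm(Y - X, 'fro'))   # True
```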
Important decision

What rank k to choose?

- Large k will approximate the signals well but not suppress the noise
- Small k will suppress the noise but may not approximate the signals well
Denoising of digit images
MNIST data
Signals: 6131 28 × 28 images of the digit 3

Noise: iid Gaussian, scaled so that the SNR is 0.5 in ℓ2 norm
More noise than signal!
[Figure: singular values σi of the data, the signals, and the noise, plotted for i up to 700 and zoomed in to i ≤ 50]
k = 40

[Figure: image rows comparing the signals, their rank-40 approximation, the SVD-truncation estimate, and the noisy data]

k = 10

[Figure: the same comparison for k = 10]
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
Collaborative filtering
A :=

       Bob  Molly  Mary  Larry
        1     1      5     4      The Dark Knight
        2     1      4     5      Spiderman 3
        4     5      2     1      Love Actually
        5     4      2     1      Bridget Jones's Diary
        4     5      1     2      Pretty Woman
        1     2      5     5      Superman 2
Intuition
Some people have similar tastes and hence produce similar ratings
Some movies are similar and hence elicit similar reactions
This tends to induce low-rank structure in the matrix of ratings
SVD
A − µ~1~1ᵀ = USVᵀ = U diag(7.79, 1.62, 1.55, 0.62) Vᵀ

where

µ := (1/(mn)) ∑_{i=1}^m ∑_{j=1}^n Aij
Rank 1 model
µ~1~1ᵀ + σ1~u1~v1ᵀ =

         Bob        Molly       Mary       Larry
       1.34 (1)   1.19 (1)   4.66 (5)   4.81 (4)   The Dark Knight
       1.55 (2)   1.42 (1)   4.45 (4)   4.58 (5)   Spiderman 3
       4.45 (4)   4.58 (5)   1.55 (2)   1.42 (1)   Love Actually
       4.43 (5)   4.56 (4)   1.57 (2)   1.44 (1)   B.J.'s Diary
       4.43 (4)   4.56 (5)   1.57 (1)   1.44 (2)   Pretty Woman
       1.34 (1)   1.19 (2)   4.66 (5)   4.81 (5)   Superman 2

(actual ratings in parentheses)
First left singular vector
~u1 =

      D. Knight   Sp. 3   Love Act.   B.J.'s Diary   P. Woman   Sup. 2
       −0.45      −0.39     0.39         0.39          0.39     −0.45

The coefficients cluster the movies into action (−) and romantic (+)
First right singular vector
~v1 =

      Bob    Molly    Mary    Larry
      0.48   0.52    −0.48   −0.52

The coefficients cluster the people into action (−) and romantic (+)
Generalization
Each rating is a sum of k terms

rating(movie i, user j) = ∑_{l=1}^k σl ~ul[i] ~vl[j]

Singular vectors cluster users and movies in different ways

Singular values weight the importance of the different factors
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
Sample covariance matrix
The sample covariance matrix of {~x1, ~x2, . . . , ~xn} ⊂ R^m is

Σ(~x1, . . . , ~xn) := 1/(n − 1) ∑_{i=1}^n (~xi − av(~x1, . . . , ~xn)) (~xi − av(~x1, . . . , ~xn))ᵀ

where

av(~x1, ~x2, . . . , ~xn) := (1/n) ∑_{i=1}^n ~xi

Entrywise,

Σ(~x1, . . . , ~xn)ij = var(~x1[i], . . . , ~xn[i]) if i = j, and cov((~x1[i], ~x1[j]), . . . , (~xn[i], ~xn[j])) if i ≠ j
Variation in a certain direction

For a unit vector ~d ∈ R^m

var(~dᵀ~x1, . . . , ~dᵀ~xn) = 1/(n − 1) ∑_{i=1}^n ( ~dᵀ~xi − av(~dᵀ~x1, . . . , ~dᵀ~xn) )²

= 1/(n − 1) ∑_{i=1}^n ( ~dᵀ(~xi − av(~x1, . . . , ~xn)) )²

= ~dᵀ ( 1/(n − 1) ∑_{i=1}^n (~xi − av(~x1, . . . , ~xn))(~xi − av(~x1, . . . , ~xn))ᵀ ) ~d

= ~dᵀ Σ(~x1, . . . , ~xn) ~d

The covariance matrix captures the variance in every direction!
Principal component analysis
Given n data vectors ~x1, ~x2, . . . , ~xn ∈ R^d:

1. Center the data,

~ci = ~xi − av(~x1, ~x2, . . . , ~xn),  1 ≤ i ≤ n

2. Group the centered data as the columns of a matrix

C = [~c1 ~c2 · · · ~cn]

3. Compute the SVD of C

The left singular vectors are the principal directions

The principal components are the coefficients of the centered vectors in the basis of principal directions
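A minimal numpy sketch of these three steps (the data-generation part is a made-up example):

```python
import numpy as np

def pca(X):
    """PCA via the SVD; X holds one data vector per column."""
    C = X - X.mean(axis=1, keepdims=True)              # 1.-2. center and group
    U, s, Vt = np.linalg.svd(C, full_matrices=False)   # 3. SVD of C
    return U, s, U.T @ C            # directions, singular values, components

# Correlated 2D data with a known covariance (assumed example)
X = np.linalg.cholesky(np.array([[2.0, 1.2],
                                 [1.2, 1.0]])) @ np.random.randn(2, 5000)
U, s, comps = pca(X)
print("principal directions:\n", U)
print("std along them:", s / np.sqrt(X.shape[1] - 1))
```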
Directions of maximum variance
The principal directions satisfy

~u1 = argmax { var(~dᵀ~x1, . . . , ~dᵀ~xn) : ||~d||_2 = 1 }

~ui = argmax { var(~dᵀ~x1, . . . , ~dᵀ~xn) : ||~d||_2 = 1, ~d ⊥ ~u1, . . . , ~ui−1 },  2 ≤ i ≤ k
Directions of maximum variance
The associated singular values satisfy

σ1/√(n − 1) = max { std(~dᵀ~x1, . . . , ~dᵀ~xn) : ||~d||_2 = 1 }

σi/√(n − 1) = max { std(~dᵀ~x1, . . . , ~dᵀ~xn) : ||~d||_2 = 1, ~d ⊥ ~u1, . . . , ~ui−1 },  2 ≤ i ≤ k
Proof

Σ(~x1, . . . , ~xn) = 1/(n − 1) ∑_{i=1}^n (~xi − av(~x1, . . . , ~xn))(~xi − av(~x1, . . . , ~xn))ᵀ = 1/(n − 1) CCᵀ

For any vector ~d

var(~dᵀ~x1, . . . , ~dᵀ~xn) = ~dᵀ Σ(~x1, . . . , ~xn) ~d = 1/(n − 1) ~dᵀCCᵀ~d = 1/(n − 1) ||Cᵀ~d||_2^2

so the result follows from the variational characterization of the singular values and vectors, recalled below
Singular values
The singular values of A satisfy

σ1 = max { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n } = max { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m }

σi = max { ||A~x||_2 : ||~x||_2 = 1, ~x ∈ R^n, ~x ⊥ ~v1, . . . , ~vi−1 }
   = max { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m, ~y ⊥ ~u1, . . . , ~ui−1 },  2 ≤ i ≤ min{m, n}
Singular vectors
The left singular vectors of A satisfy

~u1 = argmax { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m }

~ui = argmax { ||Aᵀ~y||_2 : ||~y||_2 = 1, ~y ∈ R^m, ~y ⊥ ~u1, . . . , ~ui−1 },  2 ≤ i ≤ m
PCA in 2D

[Figure: three 2D data sets with the principal directions ~u1 and ~u2 overlaid: σ1/√(n − 1) = 0.705, σ2/√(n − 1) = 0.690; σ1/√(n − 1) = 0.983, σ2/√(n − 1) = 0.356; σ1/√(n − 1) = 1.349, σ2/√(n − 1) = 0.144]
Centering

[Figure: PCA applied to uncentered data (σ1/√(n − 1) = 5.077, σ2/√(n − 1) = 0.889) and to centered data (σ1/√(n − 1) = 1.261, σ2/√(n − 1) = 0.139); without centering, ~u1 is dominated by the mean of the data]
PCA of faces
Data set of 400 64 × 64 images from 40 subjects (10 per subject)
Each face is vectorized and interpreted as a vector in R4096
PCA of faces

[Figure: the center (average face) and the principal directions displayed as images, with σi/√(n − 1) below each: PD 1-5: 330, 251, 192, 152, 130; PD 10, 15, 20, 30, 40, 50: 90.2, 70.8, 58.7, 45.1, 36.0, 30.8; PD 100, 150, 200, 250, 300, 359: 19.0, 13.7, 10.3, 8.01, 6.14, 3.06]
Projection onto first 7 principal directions
[Figure: a face expressed as a combination of the center and the first 7 principal directions, with coefficients 8613 (Center), −2459 (PD 1), +665 (PD 2), −180 (PD 3), +301 (PD 4), +566 (PD 5), +638 (PD 6), +403 (PD 7)]
Projection onto first k principal directions
[Figure: a face image and its projections onto the first 5, 10, 20, 30, 50, 100, 150, 200, 250, 300, and 359 principal directions]
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
![Page 148: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/148.jpg)
Covariance matrix
The covariance matrix of a random vector ~x is defined as

Σ~x := ⎡ Var(~x[1])         Cov(~x[1], ~x[2])   · · ·   Cov(~x[1], ~x[n]) ⎤
       ⎢ Cov(~x[2], ~x[1])   Var(~x[2])         · · ·   Cov(~x[2], ~x[n]) ⎥
       ⎢        ⋮                  ⋮            ⋱              ⋮         ⎥
       ⎣ Cov(~x[n], ~x[1])   Cov(~x[n], ~x[2])   · · ·   Var(~x[n])       ⎦

     = E(~x~xᵀ) − E(~x) E(~x)ᵀ

If the covariance matrix is diagonal, the entries of ~x are uncorrelated
Covariance matrix after a linear transformation

Let Σ~x ∈ R^{n×n} be the covariance matrix of ~x. For any matrix A ∈ R^{m×n} and any vector ~b ∈ R^m

Σ_{A~x+~b} = A Σ~x Aᵀ

Proof:

Σ_{A~x} = E((A~x)(A~x)ᵀ) − E(A~x) E(A~x)ᵀ

= A ( E(~x~xᵀ) − E(~x)E(~x)ᵀ ) Aᵀ

= A Σ~x Aᵀ
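A minimal Monte Carlo check of this identity (the covariance and the transformation are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])               # covariance of x
A = np.array([[1.0, -1.0],
              [0.0,  3.0]])

# Sample x with covariance Sigma_x, then transform it
x = np.linalg.cholesky(Sigma_x) @ rng.standard_normal((2, 100_000))
y = A @ x
print(np.round(np.cov(y), 2))                  # empirical covariance of A x
print(A @ Sigma_x @ A.T)                       # A Sigma_x A^T, up to sampling error
```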
Variance in a fixed direction
The variance of ~x in the direction of a unit-norm vector ~v equals

Var(~vᵀ~x) = ~vᵀ Σ~x ~v
SVD of covariance matrix
Σ~x = UΛUᵀ = [~u1 ~u2 · · · ~un] diag(σ1, σ2, . . . , σn) [~u1 ~u2 · · · ~un]ᵀ

(since Σ~x is symmetric positive semidefinite, its SVD doubles as an eigendecomposition, with Λ = diag(σ1, . . . , σn))
Directions of maximum variance
The SVD of the covariance matrix Σ~x of a random vector ~x satisfies

σ1 = max_{||~v||_2 = 1} Var(~vᵀ~x)

~u1 = argmax_{||~v||_2 = 1} Var(~vᵀ~x)

σk = max_{||~v||_2 = 1, ~v ⊥ ~u1, . . . , ~uk−1} Var(~vᵀ~x)

~uk = argmax_{||~v||_2 = 1, ~v ⊥ ~u1, . . . , ~uk−1} Var(~vᵀ~x)
Directions of maximum variance

[Figure: three Gaussian point clouds with ~u1 and ~u2 overlaid: √σ1 = 1.22, √σ2 = 0.71; √σ1 = 1, √σ2 = 1; √σ1 = 1.38, √σ2 = 0.32]
Probabilistic interpretation of PCA

[Figure: directions of maximum variance of the true covariance and of the sample covariance for n = 5, n = 20, and n = 100]
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
![Page 159: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/159.jpg)
Dimensionality reduction
Data with a large number of features can be difficult to analyze or process
Dimensionality reduction is a useful preprocessing step
If the data are modeled as vectors in Rm, we can reduce the dimension by projecting onto Rk, where k < m

For orthogonal projections, the new representation is 〈~b1, ~x〉, 〈~b2, ~x〉, . . . , 〈~bk, ~x〉 for a basis ~b1, . . . , ~bk of the subspace onto which we project
Problem: How do we choose the subspace?
![Page 160: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/160.jpg)
Optimal subspace for orthogonal projection
Given a set of vectors ~a1, ~a2, . . . , ~an and a fixed dimension k ≤ n, the SVD of

A := [~a1 ~a2 · · · ~an] ∈ Rm×n

provides the k-dimensional subspace that captures the most energy:

∑_{i=1}^{n} ||P_span(~u1,~u2,...,~uk) ~ai||_2^2  ≥  ∑_{i=1}^{n} ||P_S ~ai||_2^2

for any subspace S of dimension k
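A sketch under assumed random data comparing the energy captured by span(~u1, . . . , ~uk) with that captured by a random k-dimensional subspace:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 10, 50, 3

A = rng.standard_normal((m, n))                 # columns are the vectors a_i
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Energy captured by projecting the columns onto span(u_1, ..., u_k)
P_opt = U[:, :k] @ U[:, :k].T
energy_opt = np.sum((P_opt @ A) ** 2)

# Compare with a random k-dimensional subspace
B, _ = np.linalg.qr(rng.standard_normal((m, k)))
P_rand = B @ B.T
energy_rand = np.sum((P_rand @ A) ** 2)

print(energy_opt >= energy_rand)          # True
print(energy_opt, np.sum(s[:k] ** 2))     # equals the sum of top squared singular values
```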
![Page 161: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/161.jpg)
Nearest neighbors in principal-component space
Nearest-neighbors classification (Algorithm 4.2 in Lecture Notes 1) computes n distances in Rm for each new example

Cost: O(nmp) for p examples

Idea: Project onto the first k principal directions beforehand. Then the cost is:

- O(m²n), if m < n, to compute the principal directions
- kmn operations to project the training set
- kmp operations to project the test set
- knp operations to perform nearest-neighbor classification

Faster if p > m (see the sketch below)
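A minimal sketch of this pipeline; the function name and the synthetic data are illustrative, not the course's Algorithm 4.2:

```python
import numpy as np

def pca_nearest_neighbor(train, labels, test, k):
    """Project onto the first k principal directions, then 1-NN.

    train: m x n (columns are training examples), test: m x p."""
    centered = train - train.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    Uk = U[:, :k]                # m x k principal directions
    train_k = Uk.T @ train       # k x n projected training set
    test_k = Uk.T @ test         # k x p projected test set
    # n distances in R^k per test example instead of R^m
    d = np.linalg.norm(train_k[:, :, None] - test_k[:, None, :], axis=0)
    return labels[d.argmin(axis=0)]

X = np.random.default_rng(4).standard_normal((100, 60))
y = np.arange(60) % 3
print(pca_nearest_neighbor(X, y, X[:, :5], k=10))
```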
![Page 162: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/162.jpg)
Face recognition
Training set: 360 64 × 64 images from 40 different subjects (9 each)
Test set: 1 new image from each subject
We model each image as a vector in R4096 (m = 4096)
To classify we:
1. Project onto first k principal directions
2. Apply nearest-neighbor classification using the ℓ2-norm distance in Rk
![Page 163: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/163.jpg)
Performance
[Plot: number of errors as a function of the number of principal components (0–100)]
![Page 164: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/164.jpg)
Nearest neighbor in R41
[Panels: test image, its projection, the closest projection among the training set, and the corresponding training image]
![Page 165: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/165.jpg)
Dimensionality reduction for visualization
Motivation: Visualize high-dimensional features projected onto 2D or 3D
Example:
Seeds from three different varieties of wheat: Kama, Rosa and Canadian
Features:
- Area
- Perimeter
- Compactness
- Length of kernel
- Width of kernel
- Asymmetry coefficient
- Length of kernel groove
![Page 166: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/166.jpg)
Projection onto the first two PCs

[Scatter plot: projection onto the first PC vs. projection onto the second PC]
![Page 167: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/167.jpg)
Projection onto the last two PCs

[Scatter plot: projection onto the (d−1)th PC vs. projection onto the dth PC]
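A sketch of how such a 2-D visualization could be produced with numpy/matplotlib; the random matrix stands in for the real seeds features, which are not reproduced here:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the 7 seed features (area, perimeter, ...)
rng = np.random.default_rng(5)
X = rng.standard_normal((210, 7))     # rows are examples

centered = X - X.mean(axis=0)
centered /= centered.std(axis=0)      # standardize: the features have different units

U, s, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = centered @ Vt.T                 # coordinates in the principal-component basis

plt.scatter(pcs[:, 0], pcs[:, 1])     # first two PCs
plt.xlabel('Projection onto first PC')
plt.ylabel('Projection onto second PC')
plt.show()
```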
![Page 168: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/168.jpg)
Whitening
Motivation: Dominating principal directions are not necessarily the most informative

Principal directions corresponding to small singular values may contain information that is drowned out by the main directions
Intuitively, linear skew obscures useful structure
![Page 169: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/169.jpg)
PCA of faces
[Images: the center of the face data and the first five principal directions PD 1–PD 5, with σi/√(n − 1) = 330, 251, 192, 152, 130]
![Page 170: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/170.jpg)
Whitening
Given ~x1, ~x2, . . . , ~xn ∈ Rd we:
1. Center the data,
~ci = ~xi − av (~x1, ~x2, . . . , ~xn) , 1 ≤ i ≤ n
2. Group the centered data as columns of a matrix

C = [~c1 ~c2 · · · ~cn]

3. Compute the SVD of C = USV T

4. Whiten by applying the linear map US−1UT:

~wi := US−1UT~ci
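A direct transcription of the four steps into numpy (a sketch; the scaling of the synthetic data is arbitrary):

```python
import numpy as np

def whiten(X):
    """Whitening as on the slide: X has one data point per column (d x n)."""
    C = X - X.mean(axis=1, keepdims=True)             # 1.-2. center and group
    U, s, Vt = np.linalg.svd(C, full_matrices=False)  # 3. SVD of C
    return U @ np.diag(1.0 / s) @ U.T @ C             # 4. apply U S^{-1} U^T

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 500)) * np.array([[10.0], [1.0], [0.1]])
W = whiten(X)
# W W^T is the identity, so the sample covariance is I/(n-1)
print(np.round(W @ W.T, 6))
```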
![Page 171: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/171.jpg)
Covariance of whitened data
Matrix of whitened vectors

W = US−1UTC

The covariance matrix of the whitened data is

Σ(~w1, . . . , ~wn) = (1/(n − 1)) WWT

 = (1/(n − 1)) US−1UT C CT US−1UT

 = (1/(n − 1)) US−1UT USVT VSUT US−1UT

 = (1/(n − 1)) I
![Page 176: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/176.jpg)
Whitening
[Scatter plots of the data at three stages: ~x1, . . . , ~xn; UT~x1, . . . , UT~xn (rotated to align with the principal directions); S−1UT~x1, . . . , S−1UT~xn (rescaled to equalize variance)]
![Page 177: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/177.jpg)
Whitened faces
[Face images ~x and the corresponding whitened images US−1UT~x]
![Page 178: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/178.jpg)
Basic properties
Singular value decomposition
Denoising
Collaborative filtering
Principal component analysis
Probabilistic interpretation
Dimensionality reduction
Eigendecomposition
![Page 179: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/179.jpg)
Eigenvectors
An eigenvector ~q of a square matrix A ∈ Rn×n satisfies
A~q = λ~q
for a scalar λ, which is the corresponding eigenvalue
Even if A is real, its eigenvectors and eigenvalues can be complex
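For instance (a standard example, not from the slides), a real 90° rotation has complex eigenvalues, since no real direction is mapped to a multiple of itself:

```python
import numpy as np

# A real 90-degree rotation in the plane
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
eigvals, eigvecs = np.linalg.eig(R)
print(eigvals)   # [0.+1.j, 0.-1.j] -- complex even though R is real
```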
![Page 180: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/180.jpg)
Eigendecomposition
If a matrix has n linearly independent eigenvectors then it is diagonalizable

Let ~q1, . . . , ~qn be linearly independent eigenvectors of A ∈ Rn×n with eigenvalues λ1, . . . , λn. Then

A = [~q1 ~q2 · · · ~qn] [ λ1  0   · · ·  0  ]
                        [ 0   λ2  · · ·  0  ]
                        [        · · ·      ]
                        [ 0   0   · · ·  λn ] [~q1 ~q2 · · · ~qn]−1

  = QΛQ−1
![Page 181: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/181.jpg)
Proof
AQ = [A~q1 A~q2 · · · A~qn]

   = [λ1~q1 λ2~q2 · · · λn~qn]

   = QΛ
![Page 185: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/185.jpg)
Not all matrices have an eigendecomposition
A := [ 0  1 ]
     [ 0  0 ]

Assume A has an eigenvector ~q associated to an eigenvalue λ:

[ ~q[2] ]   [ 0  1 ] [ ~q[1] ]   [ λ~q[1] ]
[   0   ] = [ 0  0 ] [ ~q[2] ] = [ λ~q[2] ]

So ~q[2] = λ~q[1] and λ~q[2] = 0: if λ ≠ 0 then ~q[2] = 0 and hence ~q[1] = 0, a contradiction, so λ = 0 and ~q[2] = 0. The only eigenvectors are multiples of ~e1, so A has no basis of eigenvectors
![Page 188: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/188.jpg)
Spectral theorem for symmetric matrices
If A ∈ Rn×n is symmetric, then it has an eigendecomposition of the form
A = UΛUT
where the matrix of eigenvectors U is an orthogonal matrix
![Page 189: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/189.jpg)
Eigendecomposition vs SVD
Symmetric matrices also have a singular value decomposition
A = USV T
Left singular vectors = eigenvectors
Singular values = magnitude of eigenvalues
~vi = ~ui if λi ≥ 0
~vi = −~ui if λi < 0
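A quick numerical check (random symmetric matrix assumed) that the singular values match the eigenvalue magnitudes:

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                 # symmetric, typically with eigenvalues of both signs

eigvals, Q = np.linalg.eigh(A)
U, s, Vt = np.linalg.svd(A)

# Singular values are the magnitudes of the eigenvalues
print(np.allclose(np.sort(s), np.sort(np.abs(eigvals))))
```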
![Page 190: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/190.jpg)
Application of eigendecomposition
Fast computation of

AA · · ·A~x = Ak~x

Ak = QΛQ−1 QΛQ−1 · · · QΛQ−1

   = QΛkQ−1

   = Q [ λ1^k  0     · · ·  0    ]
       [ 0     λ2^k  · · ·  0    ]
       [           · · ·         ]
       [ 0     0     · · ·  λn^k ] Q−1
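A sketch of this trick on an assumed diagonalizable matrix, compared against repeated multiplication:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # a diagonalizable example
eigvals, Q = np.linalg.eig(A)

def matrix_power_eig(k):
    # A^k = Q Lambda^k Q^{-1}: one diagonalization, then k only enters
    # through elementwise powers of the eigenvalues
    return Q @ np.diag(eigvals ** k) @ np.linalg.inv(Q)

print(np.allclose(matrix_power_eig(10), np.linalg.matrix_power(A, 10)))
```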
![Page 194: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/194.jpg)
Application of eigendecomposition
Writing any vector ~x = ∑_{i=1}^{n} αi ~qi in the basis of eigenvectors,

Ak~x = ∑_{i=1}^{n} αi Ak~qi = ∑_{i=1}^{n} αi λi^k ~qi

If |λ1| > |λ2| ≥ . . . then the ~q1 component will eventually dominate
![Page 198: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/198.jpg)
Power method
Set ~x1 := ~x/||~x||2, where ~x is randomly generated

For i = 2, 3, . . ., compute

~xi := A~xi−1 / ||A~xi−1||2
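A minimal implementation of the iteration (the matrix and iteration count are illustrative):

```python
import numpy as np

def power_method(A, iters=50, seed=0):
    """Sketch of the power method for the dominant eigenvector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x, x @ A @ x   # eigenvector estimate and Rayleigh quotient

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
q1, lam1 = power_method(A)
print(lam1, np.linalg.eigvalsh(A)[-1])   # close to the largest eigenvalue
```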
![Page 199: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/199.jpg)
Power method

[Plots: successive iterates ~x1, ~x2, ~x3 aligning with the dominant eigenvector ~q1, with ~q2 shown for reference]
![Page 202: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/202.jpg)
Deer and wolves

Model for the deer population dn+1 and wolf population wn+1 in year n + 1:

dn+1 = (5/4) dn − (3/4) wn

wn+1 = (1/4) dn + (1/4) wn,   n = 0, 1, 2, . . .
![Page 203: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/203.jpg)
Deer and wolves

[ dn ]             [ d0 ]
[ wn ] = QΛnQ−1 [ w0 ]

 = [ 3  1 ] [ 1  0     ] [  0.5  −0.5 ] [ d0 ]
   [ 1  1 ] [ 0  0.5^n ] [ −0.5   1.5 ] [ w0 ]

 = ((d0 − w0)/2) [ 3 ]  +  ((3w0 − d0)/2^(n+1)) [ 1 ]
                 [ 1 ]                          [ 1 ]
![Page 207: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/207.jpg)
Deer and wolves

[Plot: deer and wolf populations over years 0–10 (population axis 0–100), with separate curves for deer and wolves]
![Page 208: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/208.jpg)
Deer and wolves

[Plot: populations over years 0–10 (population axis 0–100)]
![Page 209: Matrices - NYU Courant](https://reader035.fdocuments.net/reader035/viewer/2022081701/62ddd2e2f6c90e31d90316f1/html5/thumbnails/209.jpg)
Deer and wolves

[Plot: populations over years 0–10 (population axis 0–1600)]