ELE 538B: Sparsity, Structure and Inference
Low-Rank Matrix Recovery
Yuxin Chen
Princeton University, Spring 2017
Outline
• Low-rank matrix completion and recovery
• Spectral methods
• Nuclear norm minimization
• RIP and low-rank matrix recovery
• Phase retrieval / solving random quadratic systems of equations
• Matrix completion
Matrix recovery 8-2
Low-rank matrix completion and recovery
Matrix recovery 8-3
Motivation 1: recommendation systems
[figure: user-by-movie rating matrix, mostly missing entries (?)]
• Netflix challenge: Netflix provides highly incomplete ratings from about 0.5 million users for 17,770 movies
• How to predict unseen user ratings for movies?
Matrix recovery 8-4
In general, we cannot infer missing ratings
[figure: partially observed rating matrix, observed entries marked X, missing entries marked ?]
Underdetermined system (more unknowns than observations)
Matrix recovery 8-5
... unless rating matrix has other structure
[figure: rating matrix approximated by a product of two thin factor matrices]
A few factors explain most of the data −→ low-rank approximation
How to exploit (approx.) low-rank structure in prediction?
Matrix recovery 8-6
Motivation 2: sensor localization
• n sensors / points $x_j \in \mathbb{R}^3$, $j = 1, \cdots, n$
• Observe partial information about pairwise distances
$$D_{i,j} = \|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2x_i^\top x_j$$
• Want to infer distance between every pair of nodes
Matrix recovery 8-7
Motivation 2: sensor localization
Introduce
$$X = \begin{bmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{bmatrix} \in \mathbb{R}^{n \times 3},$$
then the distance matrix $D = [D_{i,j}]_{1 \le i,j \le n}$ can be written as
$$D = d_2 e^\top + e\, d_2^\top - \underbrace{2XX^\top}_{\text{low rank}}$$
where $d_2 := [\|x_1\|^2, \cdots, \|x_n\|^2]^\top$
$\mathrm{rank}(D) \ll n$ −→ low-rank matrix completion
Matrix recovery 8-8
Motivation 3: structure from motion
Given multiple images and a few correspondences between image features, how to estimate locations of 3D points?
Fig. credit: Snavely, Seitz, & Szeliski
Structure from motion: reconstruct 3D scene geometry (structure) and camera poses (motion) from multiple images
Matrix recovery 8-9
Motivation 3: structure from motion
Tomasi and Kanade’s factorization:
• Consider n 3D points in m different 2D frames
• $x_{i,j} \in \mathbb{R}^{2\times 1}$: location of the jth point in the ith frame
$$x_{i,j} = \underbrace{M_i}_{\text{projection matrix} \,\in\, \mathbb{R}^{2\times 3}} \;\underbrace{p_j}_{\text{3D position} \,\in\, \mathbb{R}^3}$$
• Matrix of all 2D locations
$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \cdots & x_{m,n} \end{bmatrix} = \underbrace{\begin{bmatrix} M_1 \\ \vdots \\ M_m \end{bmatrix} \begin{bmatrix} p_1 & \cdots & p_n \end{bmatrix}}_{\text{low-rank factorization}} \in \mathbb{R}^{2m \times n}$$
Goal: fill in missing entries of X given a small number of entries
Matrix recovery 8-10
Motivation 4: missing phase problem
Detectors record intensities of diffracted rays
• electric field $x(t_1, t_2)$ −→ Fourier transform $\hat{x}(f_1, f_2)$
Fig credit: Stanford SLAC
intensity of electrical field:
$$|\hat{x}(f_1, f_2)|^2 = \Big|\int x(t_1, t_2)\, e^{-i2\pi(f_1 t_1 + f_2 t_2)}\, dt_1\, dt_2\Big|^2$$
Phase retrieval: recover signal $x(t_1, t_2)$ from intensity $|\hat{x}(f_1, f_2)|^2$
Matrix recovery 8-11
A discrete-time model: solving quadratic systems
[figure: measurement model, a signal x passes through a linear map A and only the entrywise magnitudes y = |Ax|² are recorded]
Solve for $x \in \mathbb{C}^n$ in m quadratic equations
$$y_k = |\langle a_k, x\rangle|^2, \quad k = 1, \ldots, m, \qquad \text{or} \qquad y = |Ax|^2 \;\text{ where } |z|^2 := \big[|z_1|^2, \cdots, |z_m|^2\big]^\top$$
Matrix recovery 8-12
An equivalent view: low-rank factorization
Lifting: introduce $X = xx^*$ to linearize constraints
$$y_k = |a_k^* x|^2 = a_k^*(xx^*)a_k \quad \Longrightarrow \quad y_k = \langle a_k a_k^*, X\rangle \tag{8.1}$$
$$\text{find} \quad X \succeq 0 \qquad \text{s.t.} \quad y_k = \langle a_k a_k^*, X\rangle, \; k = 1, \cdots, m; \quad \mathrm{rank}(X) = 1$$
Matrix recovery 8-13
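Aside: the lifting identity (8.1) is easy to verify numerically. A minimal NumPy sketch; the sizes m, n and the Gaussian model are illustrative assumptions, not from the lecture.

```python
import numpy as np

# Sketch: verify y_k = |<a_k, x>|^2 equals <a_k a_k^*, X> with X = x x^*.
rng = np.random.default_rng(0)
m, n = 50, 10                                    # illustrative sizes
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y = np.abs(A @ x) ** 2                           # quadratic measurements |Ax|^2
X = np.outer(x, x.conj())                        # lifted rank-1 matrix X = x x^*
y_lifted = np.real(np.einsum('ki,ij,kj->k', A.conj(), X, A))  # a_k^* X a_k

assert np.allclose(y, y_lifted)                  # constraints are linear in X
```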
Setup
• Consider M ∈ Sn×n (symmetric square case for simplicity)
• $\mathrm{rank}(M) = r \ll n$
• Singular value decomposition (SVD) of M:
$$M = \underbrace{U\Sigma V^\top}_{(2n-r)r \text{ degrees of freedom}} = \sum_{i=1}^r \sigma_i u_i v_i^\top$$
where $\Sigma = \mathrm{diag}(\sigma_1, \cdots, \sigma_r)$ contains all singular values $\sigma_i$;
$U := [u_1, \cdots, u_r]$, $V := [v_1, \cdots, v_r]$ consist of singular vectors
Matrix recovery 8-14
Low-rank matrix completion
Observed entries
$$M_{i,j}, \quad (i,j) \in \underbrace{\Omega}_{\text{sampling set}}$$
Completion via rank minimization
$$\text{minimize}_X \;\; \mathrm{rank}(X) \qquad \text{s.t.} \;\; X_{i,j} = M_{i,j}, \; (i,j) \in \Omega$$
• With the operator $P_\Omega$ (orthogonal projection onto the subspace of matrices supported on Ω), this can be written as
$$\text{minimize}_X \;\; \mathrm{rank}(X) \qquad \text{s.t.} \;\; P_\Omega(X) = P_\Omega(M)$$
Matrix recovery 8-15
More general: low-rank matrix recovery
Linear measurements
$$y_i = \langle A_i, M\rangle = \mathrm{Tr}(A_i^\top M), \quad i = 1, \ldots, m$$
• An operator form:
$$y = \mathcal{A}(M) := \begin{bmatrix} \langle A_1, M\rangle \\ \vdots \\ \langle A_m, M\rangle \end{bmatrix}$$
Recovery via rank minimization
minimizeX rank(X) s.t. y = A(X)
Matrix recovery 8-16
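Aside: the operator view is easy to mirror in code. A minimal NumPy sketch of $\mathcal{A}$ and its adjoint $\mathcal{A}^*(y) = \sum_i y_i A_i$; the random $A_i$ and all sizes are illustrative assumptions.

```python
import numpy as np

# Sketch: A(X) = [<A_1,X>, ..., <A_m,X>] and its adjoint A*(y) = sum_i y_i A_i.
rng = np.random.default_rng(2)
m, n = 30, 8
As = rng.standard_normal((m, n, n))                   # stack of measurement matrices

def A_op(X):
    return np.tensordot(As, X, axes=([1, 2], [0, 1]))  # <A_i, X> = Tr(A_i^T X)

def A_adj(y):
    return np.tensordot(y, As, axes=(0, 0))            # sum_i y_i A_i

# adjoint check: <A(X), y> = <X, A*(y)>
X, y = rng.standard_normal((n, n)), rng.standard_normal(m)
assert np.isclose(A_op(X) @ y, np.sum(X * A_adj(y)))
```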
Spectral methods
Matrix recovery 8-17
Signal + noise
Suppose each $(i,j) \in \Omega$ independently with prob. p. One can write the observation $P_\Omega(M)$ as
$$\frac{1}{p}P_\Omega(M) = \underbrace{M}_{\text{signal}} + \underbrace{\frac{1}{p}P_\Omega(M) - M}_{\text{noise}}$$
• Noise has mean zero: $\mathbb{E}\Big[\frac{1}{p}P_\Omega(M)\Big] = M$
Matrix recovery 8-18
Low-rank denoising
$$\frac{1}{p}P_\Omega(M) = \underbrace{M}_{\text{low-rank signal}} + \underbrace{\frac{1}{p}P_\Omega(M) - M}_{:=E \text{ (zero-mean noise)}}$$
Algorithm 8.1 Spectral method
$\hat{M} \longleftarrow$ best rank-r approximation of $\frac{1}{p}P_\Omega(M)$
Matrix recovery 8-19
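Aside: Algorithm 8.1 is a few lines of NumPy. This sketch uses an illustrative random rank-r model and a dense SVD; at scale one would use a truncated SVD instead.

```python
import numpy as np

# Sketch of Algorithm 8.1: best rank-r approximation of (1/p) * P_Omega(M).
def spectral_estimate(M_obs, p, r):
    """M_obs: observed entries with zeros elsewhere; p: sampling probability."""
    Y = M_obs / p                                  # unbiased surrogate of M
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]             # keep top-r singular triplets

rng = np.random.default_rng(1)
n, r, p = 500, 3, 0.05
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r truth
mask = rng.random((n, n)) < p                                  # random sampling set
M_hat = spectral_estimate(M * mask, p, r)
print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))           # relative error
```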
Histograms of singular values of PΩ(M)
A $10^4 \times 10^4$ random rank-3 matrix M with p = 0.003
Fig. credit: Keshavan, Montanari, Oh ’10
Matrix recovery 8-20
Performance of spectral methods
Theorem 8.1 (Keshavan, Montanari, Oh ’10)
Suppose the number of observed entries m obeys $m \gtrsim n\log n$. Then
$$\frac{\|\hat{M} - M\|_F}{\|M\|_F} \lesssim \underbrace{\frac{\max_{i,j}|M_{i,j}|}{\frac{1}{n}\|M\|_F}}_{:=\nu} \cdot \sqrt{\frac{nr\log^2 n}{m}}$$
• ν reflects whether the energy of M is spread out
• When $m \gg \nu^2 nr\log^2 n$, the estimate $\hat{M}$ is very close to the truth¹
• Degrees of freedom ≍ nr
−→ nearly-optimal sample complexity for incoherent matrices
¹The logarithmic factor can be improved.
Matrix recovery 8-21
Perturbation bounds
To ensure $\hat{M}$ is a good estimate, it suffices to control the noise E.
Lemma 8.2
Suppose $\mathrm{rank}(M) = r$. For any perturbation E,
$$\|\mathcal{P}_r(M+E) - M\| \le 2\|E\|, \qquad \|\mathcal{P}_r(M+E) - M\|_F \le 2\sqrt{2r}\,\|E\|$$
where $\mathcal{P}_r(X)$ is the best rank-r approximation of X
Matrix recovery 8-22
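Aside: a quick numerical sanity check of Lemma 8.2; the sizes and the random perturbation model are arbitrary choices.

```python
import numpy as np

# Sketch: check both bounds of Lemma 8.2 on a random rank-r signal.
def best_rank_r(X, r):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(3)
n, r = 100, 4
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r signal
E = 0.1 * rng.standard_normal((n, n))                          # perturbation

D = best_rank_r(M + E, r) - M
assert np.linalg.norm(D, 2) <= 2 * np.linalg.norm(E, 2) + 1e-10
assert np.linalg.norm(D, 'fro') <= 2 * np.sqrt(2 * r) * np.linalg.norm(E, 2) + 1e-10
```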
Proof of Lemma 8.2
By matrix perturbation theory,
$$\|\mathcal{P}_r(M+E) - M\| \overset{\text{triangle ineq.}}{\le} \|\mathcal{P}_r(M+E) - (M+E)\| + \|(M+E) - M\| \le \sigma_{r+1}(M+E) + \|E\| \overset{\text{Weyl's ineq.}}{\le} \underbrace{\sigma_{r+1}(M)}_{=0} + \|E\| + \|E\| = 2\|E\|$$
The 2nd inequality of Lemma 8.2 follows since both $\mathcal{P}_r(M+E)$ and M are rank-r, so their difference has rank at most 2r, and hence
$$\|\mathcal{P}_r(M+E) - M\|_F \le \sqrt{2r}\,\|\mathcal{P}_r(M+E) - M\|$$
Matrix recovery 8-23
Controlling the noise
Recall that the entries of $E = \frac{1}{p}P_\Omega(M) - M$ are zero-mean and independent.
A bit of random matrix theory ...
Lemma 8.3 (Chapter 2.3, Tao ’12)
Suppose $X \in \mathbb{R}^{n\times n}$ is a random symmetric matrix obeying
• $\{X_{i,j} : i < j\}$ are independent
• $\mathbb{E}[X_{i,j}] = 0$ and $\mathrm{Var}[X_{i,j}] \lesssim 1$
• $\max_{i,j}|X_{i,j}| \lesssim \sqrt{n}$
Then $\|X\| \lesssim \sqrt{n\log n}$
Matrix recovery 8-24
Proof of Theorem 8.1
Let $M_{\max} := \max_{i,j}|M_{i,j}|$. The zero-mean matrix $\tilde{E} = \frac{\sqrt{p}}{M_{\max}}E$ obeys
$$\mathrm{Var}\big[\tilde{E}_{i,j}\big] = p(1-p)\cdot\Big(\frac{\sqrt{p}}{M_{\max}}\cdot\frac{1}{p}M_{i,j}\Big)^2 \le \Big(\frac{M_{i,j}}{M_{\max}}\Big)^2 \lesssim 1, \qquad |\tilde{E}_{i,j}| \le \frac{|M_{i,j}|}{\sqrt{p}\,M_{\max}} \lesssim \frac{1}{\sqrt{p}}.$$
Lemma 8.3 tells us that if $p \gtrsim \frac{1}{n}$, then
$$\|\tilde{E}\| \lesssim \sqrt{n\log n} \quad\Longleftrightarrow\quad \|E\| \lesssim M_{\max}\sqrt{\frac{n\log n}{p}}$$
This together with Lemma 8.2 and the fact $m \asymp pn^2$ establishes Theorem 8.1.
Matrix recovery 8-25
Nuclear norm minimization
Matrix recovery 8-26
Convex relaxation
$$\text{minimize}_X \;\; \underbrace{\mathrm{rank}(X)}_{\text{nonconvex}} \qquad \text{s.t.} \;\; P_\Omega(X) = P_\Omega(M)$$
$$\text{minimize}_X \;\; \underbrace{\mathrm{rank}(X)}_{\text{nonconvex}} \qquad \text{s.t.} \;\; \mathcal{A}(X) = \mathcal{A}(M)$$
Question: what is a convex surrogate for rank(·)?
Matrix recovery 8-27
Nuclear norm
Definition 8.4
The nuclear norm of X is
$$\|X\|_* := \sum_{i=1}^n \underbrace{\sigma_i(X)}_{i\text{th largest singular value}}$$
• Nuclear norm is a counterpart of the ℓ₁ norm for the rank function
• Equivalence among different norms:
$$\|X\| \le \|X\|_F \le \|X\|_* \le \sqrt{r}\,\|X\|_F \le r\,\|X\|$$
• (Tightness) $\{X : \|X\|_* \le 1\}$ is the convex hull of rank-1 matrices obeying $\|uv^\top\| \le 1$ (Fazel ’02)
Matrix recovery 8-28
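Aside: the chain of norm inequalities is easy to check numerically on a random rank-r matrix; the sizes are illustrative.

```python
import numpy as np

# Sketch: verify ||X|| <= ||X||_F <= ||X||_* <= sqrt(r)||X||_F <= r||X||.
rng = np.random.default_rng(4)
n, r = 60, 5
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r matrix

s = np.linalg.svd(X, compute_uv=False)           # singular values
spec, fro, nuc = s[0], np.sqrt(np.sum(s**2)), np.sum(s)
assert spec <= fro <= nuc <= np.sqrt(r) * fro <= r * spec
```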
Additivity of nuclear norm
Fact 8.5
Let A and B be matrices of the same dimensions. If $AB^\top = 0$ and $A^\top B = 0$, then $\|A + B\|_* = \|A\|_* + \|B\|_*$.
• If the row (resp. column) spaces of A and B are orthogonal, then $\|A + B\|_* = \|A\|_* + \|B\|_*$
• Similar to the ℓ₁ norm: when x and y have disjoint support,
$\|x + y\|_1 = \|x\|_1 + \|y\|_1$ (a key to the study of ℓ₁-min under RIP)
Matrix recovery 8-29
Proof of Fact 8.5
Suppose $A = U_A \Sigma_A V_A^\top$ and $B = U_B \Sigma_B V_B^\top$, which gives
$$AB^\top = 0, \;\; A^\top B = 0 \qquad \Longleftrightarrow \qquad V_A^\top V_B = 0, \;\; U_A^\top U_B = 0$$
Thus, one can write
$$A = [U_A, U_B, U_C]\begin{bmatrix}\Sigma_A & & \\ & 0 & \\ & & 0\end{bmatrix}[V_A, V_B, V_C]^\top, \qquad B = [U_A, U_B, U_C]\begin{bmatrix}0 & & \\ & \Sigma_B & \\ & & 0\end{bmatrix}[V_A, V_B, V_C]^\top$$
and hence
$$\|A + B\|_* = \bigg\|[U_A, U_B]\begin{bmatrix}\Sigma_A & \\ & \Sigma_B\end{bmatrix}[V_A, V_B]^\top\bigg\|_* = \|A\|_* + \|B\|_*$$
Matrix recovery 8-30
Dual norm
Definition 8.6 (Dual norm)
For a given norm $\|\cdot\|_A$, the dual norm is defined as
$$\|X\|_A^* := \max\{\langle X, Y\rangle : \|Y\|_A \le 1\}$$
• ℓ₁ norm ←dual→ ℓ∞ norm
• nuclear norm ←dual→ spectral norm
• ℓ₂ norm ←dual→ ℓ₂ norm
• Frobenius norm ←dual→ Frobenius norm
Matrix recovery 8-31
Representing nuclear norm via SDP
Since the spectral norm is the dual norm of the nuclear norm,
$$\|X\|_* = \max\{\langle X, Y\rangle : \|Y\| \le 1\}$$
The constraint is equivalent to
$$\|Y\| \le 1 \quad\Longleftrightarrow\quad YY^\top \preceq I \quad\overset{\text{Schur complement}}{\Longleftrightarrow}\quad \begin{bmatrix}I & Y\\ Y^\top & I\end{bmatrix} \succeq 0$$
Fact 8.7
$$\|X\|_* = \max_Y \bigg\{\langle X, Y\rangle \;\bigg|\; \begin{bmatrix}I & Y\\ Y^\top & I\end{bmatrix} \succeq 0\bigg\}$$
Fact 8.8 (Dual characterization)
$$\|X\|_* = \min_{W_1, W_2}\bigg\{\frac{1}{2}\mathrm{Tr}(W_1) + \frac{1}{2}\mathrm{Tr}(W_2) \;\bigg|\; \begin{bmatrix}W_1 & X\\ X^\top & W_2\end{bmatrix} \succeq 0\bigg\}$$
• Optimal point: $W_1 = U\Sigma U^\top$, $W_2 = V\Sigma V^\top$ (where $X = U\Sigma V^\top$)
Matrix recovery 8-32
Aside: dual of semidefinite program
$$\text{(primal)}\quad \text{minimize}_X \;\; \langle C, X\rangle \qquad \text{s.t.} \;\; \langle A_i, X\rangle = b_i, \; 1 \le i \le m, \quad X \succeq 0$$
$$\Updownarrow$$
$$\text{(dual)}\quad \text{maximize}_y \;\; b^\top y \qquad \text{s.t.} \;\; \sum_{i=1}^m y_i A_i + S = C, \quad S \succeq 0$$
Exercise: use this to verify Fact 8.8
Matrix recovery 8-33
Nuclear norm minimization via SDP
Convex relaxation of rank minimization:
$$\hat{M} = \arg\min_X \|X\|_* \qquad \text{s.t.} \;\; y = \mathcal{A}(X)$$
This is solvable via SDP:
$$\text{minimize}_{X, W_1, W_2} \;\; \frac{1}{2}\mathrm{Tr}(W_1) + \frac{1}{2}\mathrm{Tr}(W_2) \qquad \text{s.t.} \;\; y = \mathcal{A}(X), \quad \begin{bmatrix}W_1 & X\\ X^\top & W_2\end{bmatrix} \succeq 0$$
Matrix recovery 8-34
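Aside: a sketch of this SDP in CVXPY, specialized to matrix completion; the sizes, sampling model, and default solver are assumptions, and for large problems one would use a first-order method instead.

```python
import cvxpy as cp
import numpy as np

# Sketch: nuclear norm minimization via the SDP of Fact 8.8.
rng = np.random.default_rng(5)
n, r, p = 20, 2, 0.5
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r truth
Omega = (rng.random((n, n)) < p).astype(float)                 # sampling mask

Z = cp.Variable((2 * n, 2 * n), PSD=True)                      # [[W1, X], [X^T, W2]] >= 0
W1, X, W2 = Z[:n, :n], Z[:n, n:], Z[n:, n:]
constraints = [cp.multiply(Omega, X) == Omega * M]             # P_Omega(X) = P_Omega(M)
prob = cp.Problem(cp.Minimize(0.5 * cp.trace(W1) + 0.5 * cp.trace(W2)),
                  constraints)
prob.solve()
print(np.linalg.norm(X.value - M) / np.linalg.norm(M))         # small if recovery succeeds
```

Equivalently, one can minimize `cp.norm(X, 'nuc')` directly; the explicit $(W_1, W_2)$ form above makes the SDP of Fact 8.8 visible.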
RIP and low-rank matrix recovery
Matrix recovery 8-35
RIP for low-rank matrices
Almost parallel results to compressed sensing ...2
Definition 8.9
The r-restricted isometry constants $\delta_r^{\mathrm{ub}}(\mathcal{A})$ and $\delta_r^{\mathrm{lb}}(\mathcal{A})$ are the smallest quantities s.t.
$$(1 - \delta_r^{\mathrm{lb}})\|X\|_F \le \|\mathcal{A}(X)\|_F \le (1 + \delta_r^{\mathrm{ub}})\|X\|_F, \qquad \forall X: \mathrm{rank}(X) \le r$$
²One can also define RIP w.r.t. $\|\cdot\|_F^2$ rather than $\|\cdot\|_F$.
Matrix recovery 8-36
RIP and low-rank matrix recovery
Theorem 8.10 (Recht, Fazel, Parrilo ’10, Candes, Plan ’11)
Suppose $\mathrm{rank}(M) = r$. For any fixed integer K > 0, if
$$\frac{1 + \delta_{Kr}^{\mathrm{ub}}}{1 - \delta_{(2+K)r}^{\mathrm{lb}}} < \sqrt{\frac{K}{2}},$$
then nuclear norm minimization is exact
• Can be easily extended to account for noise and imperfect structural assumptions
Matrix recovery 8-37
Geometry of nuclear norm ball
Level set of nuclear norm ball: $\bigg\|\begin{bmatrix}x & y\\ y & z\end{bmatrix}\bigg\|_* \le 1$
Fig. credit: Candes ’14
Matrix recovery 8-38
Some notation
Recall $M = U\Sigma V^\top$
• Let T be the span of matrices of the form (called the tangent space)
$$T = \{UX^\top + YV^\top : X, Y \in \mathbb{R}^{n\times r}\}$$
• Let $P_T$ be the orthogonal projection onto T:
$$P_T(X) = UU^\top X + XVV^\top - UU^\top XVV^\top$$
• Its complement $P_{T^\perp} = \mathcal{I} - P_T$:
$$P_{T^\perp}(X) = (I - UU^\top)\,X\,(I - VV^\top)$$
$$\Longrightarrow \; M P_{T^\perp}(X)^\top = 0 \;\text{ and }\; M^\top P_{T^\perp}(X) = 0$$
Matrix recovery 8-39
Proof of Theorem 8.10
Suppose $X = M + H$ is feasible and obeys $\|M + H\|_* \le \|M\|_*$. The goal is to show that $H = 0$ under RIP.
The key is to decompose $H = H_0 + \underbrace{H_1 + H_2 + \ldots}_{H_c}$, where
• $H_0 = P_T(H)$ (rank ≤ 2r)
• $H_c = P_{T^\perp}(H)$ (obeying $MH_c^\top = 0$ and $M^\top H_c = 0$)
• $H_1$: best rank-(Kr) approximation of $H_c$ (K is a constant)
• $H_2$: best rank-(Kr) approximation of $H_c - H_1$
• ...
Matrix recovery 8-40
Proof of Theorem 8.10
Informally, the proof proceeds by showing that
1. $H_0$ dominates $\sum_{i\ge2} H_i$ (by objective function)
2. (converse) $\sum_{i\ge2} H_i$ dominates $H_0 + H_1$ (by RIP + feasibility)
These can happen simultaneously only when $H = 0$
Matrix recovery 8-41
Proof of Theorem 8.10
Step 1 (which does not rely on RIP). Show that
$$\sum_{j\ge2} \|H_j\|_F \le \|H_0\|_*/\sqrt{Kr}. \tag{8.2}$$
This follows immediately by combining the following 2 observations:
(i) Since $M + H$ is assumed to be a better estimate:
$$\|M\|_* \ge \|M + H\|_* \ge \|M + H_c\|_* - \|H_0\|_* \ge \underbrace{\|M\|_* + \|H_c\|_*}_{\text{Fact 8.5 } (MH_c^\top = 0 \text{ and } M^\top H_c = 0)} - \|H_0\|_* \tag{8.3}$$
$$\Longrightarrow \quad \|H_c\|_* \le \|H_0\|_* \tag{8.4}$$
(ii) Since the nonzero singular values of $H_{j-1}$ dominate those of $H_j$ ($j \ge 2$):
$$\|H_j\|_F \le \sqrt{Kr}\,\|H_j\| \le \sqrt{Kr}\,\big[\|H_{j-1}\|_*/(Kr)\big] \le \|H_{j-1}\|_*/\sqrt{Kr}$$
$$\Longrightarrow \quad \sum_{j\ge2}\|H_j\|_F \le \frac{1}{\sqrt{Kr}}\sum_{j\ge2}\|H_{j-1}\|_* = \frac{1}{\sqrt{Kr}}\|H_c\|_* \tag{8.5}$$
Matrix recovery 8-42
Proof of Theorem 8.10
Step 2 (using feasibility + RIP). Show that $\exists\rho < \sqrt{K/2}$ s.t.
$$\|H_0 + H_1\|_F \le \rho \sum_{j\ge2} \|H_j\|_F \tag{8.6}$$
If this claim holds, then
$$\|H_0 + H_1\|_F \le \rho\sum_{j\ge2}\|H_j\|_F \overset{(8.2)}{\le} \rho\,\frac{1}{\sqrt{Kr}}\|H_0\|_* \le \rho\,\frac{1}{\sqrt{Kr}}\big(\sqrt{2r}\,\|H_0\|_F\big) = \rho\sqrt{\frac{2}{K}}\,\|H_0\|_F \le \rho\sqrt{\frac{2}{K}}\,\|H_0 + H_1\|_F. \tag{8.7}$$
This cannot hold with $\rho < \sqrt{K/2}$ unless $\underbrace{H_0 + H_1 = 0}_{\text{equivalently, } H_0 = H_1 = 0}$
Matrix recovery 8-43
Proof of Theorem 8.10
We now prove (8.6). To connect $H_0 + H_1$ with $\sum_{j\ge2} H_j$, we use feasibility:
$$\mathcal{A}(H) = 0 \quad\Longleftrightarrow\quad \mathcal{A}(H_0 + H_1) = -\sum_{j\ge2}\mathcal{A}(H_j),$$
which taken collectively with RIP yields
$$\big(1 - \delta_{(2+K)r}^{\mathrm{lb}}\big)\|H_0 + H_1\|_F \le \big\|\mathcal{A}(H_0 + H_1)\big\|_F = \Big\|\sum_{j\ge2}\mathcal{A}(H_j)\Big\|_F \le \sum_{j\ge2}\big\|\mathcal{A}(H_j)\big\|_F \le \sum_{j\ge2}\big(1 + \delta_{Kr}^{\mathrm{ub}}\big)\|H_j\|_F$$
This establishes (8.6) as long as $\rho := \frac{1 + \delta_{Kr}^{\mathrm{ub}}}{1 - \delta_{(2+K)r}^{\mathrm{lb}}} < \sqrt{\frac{K}{2}}$.
Matrix recovery 8-44
Gaussian sampling operators satisfy RIP
If the entries of $\{A_i\}_{i=1}^m$ are i.i.d. $\mathcal{N}(0, 1/m)$, then
$$\delta_{5r}(\mathcal{A}) < \frac{\sqrt{3} - \sqrt{2}}{\sqrt{3} + \sqrt{2}}$$
with high prob., provided that $m \gtrsim nr$ (near-optimal sample size)
This satisfies the assumption of Theorem 8.10 with K = 3
Matrix recovery 8-45
Precise phase transition
Using statistical dimension machinery, we can locate the precise phase transition (Amelunxen, Lotz, McCoy & Tropp ’13):
nuclear norm min
• works if $m > \text{stat-dim}\big(\mathcal{D}(\|\cdot\|_*, X)\big)$
• fails if $m < \text{stat-dim}\big(\mathcal{D}(\|\cdot\|_*, X)\big)$
where
$$\text{stat-dim}\big(\mathcal{D}(\|\cdot\|_*, X)\big) \approx n^2\,\psi\Big(\frac{r}{n}\Big)$$
and
$$\psi(\rho) = \inf_{\tau\ge0}\bigg\{\rho + (1-\rho)\Big[\rho(1 + \tau^2) + (1-\rho)\int_\tau^2 (u - \tau)^2\,\frac{\sqrt{4 - u^2}}{\pi}\,du\Big]\bigg\}$$
Matrix recovery 8-46
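Aside: ψ(ρ) is easy to evaluate numerically. A sketch with SciPy; the grid of ρ values is an arbitrary choice.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Sketch: evaluate psi(rho) by minimizing the bracketed expression over tau.
def psi(rho):
    def objective(tau):
        integral = quad(lambda u: (u - tau)**2 * np.sqrt(4 - u**2) / np.pi,
                        tau, 2)[0] if tau < 2 else 0.0
        return rho + (1 - rho) * (rho * (1 + tau**2) + (1 - rho) * integral)
    return minimize_scalar(objective, bounds=(0, 2), method='bounded').fun

for rho in (0.05, 0.1, 0.2):
    print(rho, psi(rho))   # predicted m/n^2 at the phase transition
```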
Aside: subgradient of nuclear norm
Subdifferential (set of subgradients) of $\|\cdot\|_*$ at M is
$$\partial\|M\|_* = \big\{UV^\top + W : P_T(W) = 0, \; \|W\| \le 1\big\}$$
• Does not depend on singular values of M
• $Z \in \partial\|M\|_*$ iff
$$P_T(Z) = UV^\top, \qquad \|P_{T^\perp}(Z)\| \le 1.$$
Matrix recovery 8-47
Derivation of statistical dimension
WLOG, suppose $X = \begin{bmatrix} I_r & \\ & 0\end{bmatrix}$, then $\partial\|X\|_* = \bigg\{\begin{bmatrix} I_r & \\ & W\end{bmatrix} \,\bigg|\, \|W\| \le 1\bigg\}$.
Let $G = \begin{bmatrix} G_{11} & G_{12}\\ G_{21} & G_{22}\end{bmatrix}$ have i.i.d. standard Gaussian entries.
From last lecture, we know that
$$\text{stat-dim}\big(\mathcal{D}(\|\cdot\|_*, X)\big) \approx \inf_{\tau\ge0}\mathbb{E}\Big[\inf_{Z\in\partial\|X\|_*}\|G - \tau Z\|_F^2\Big] = \inf_{\tau\ge0}\mathbb{E}\bigg[\inf_{W:\|W\|\le1}\bigg\|\begin{bmatrix} G_{11} & G_{12}\\ G_{21} & G_{22}\end{bmatrix} - \tau\begin{bmatrix} I_r & \\ & W\end{bmatrix}\bigg\|_F^2\bigg]$$
Matrix recovery 8-48
Derivation of statistical dimension
Observe that
$$\mathbb{E}\bigg[\inf_{W:\|W\|\le1}\bigg\|\begin{bmatrix} G_{11} & G_{12}\\ G_{21} & G_{22}\end{bmatrix} - \tau\begin{bmatrix} I_r & \\ & W\end{bmatrix}\bigg\|_F^2\bigg] = \mathbb{E}\Big[\|G_{11} - \tau I_r\|_F^2 + \|G_{21}\|_F^2 + \|G_{12}\|_F^2 + \inf_{\|W\|\le1}\|G_{22} - \tau W\|_F^2\Big]$$
$$= r\big(2n - r + \tau^2\big) + \mathbb{E}\Big[\sum\nolimits_{i=1}^{n-r}\big(\sigma_i(G_{22}) - \tau\big)_+^2\Big].$$
Recall from random matrix theory (Marchenko-Pastur law) that the empirical distribution of $\sigma_i(G_{22})/\sqrt{n-r}$ converges, so
$$\frac{1}{n-r}\,\mathbb{E}\bigg[\sum_{i=1}^{n-r}\big(\sigma_i(G_{22}) - \tau\big)_+^2\bigg] \to \int_0^2 (u - \tau)_+^2\,\frac{\sqrt{4 - u^2}}{\pi}\,du,$$
where $G_{22} \sim \mathcal{N}\big(0, \frac{1}{n-r}I\big)$. Taking ρ = r/n and minimizing over τ lead to a closed-form expression for the phase transition boundary.
Matrix recovery 8-49
Numerical phase transition (n = 30)
Figure credit: Amelunxen, Lotz, McCoy, & Tropp ’13
Matrix recovery 8-50
Sampling operators that do NOT satisfy RIP
Unfortunately, many sampling operators fail to satisfy RIP (e.g. none of the 4 motivating examples in this lecture satisfies RIP)
Two important examples:
• Phase retrieval / solving random quadratic systems of equations
• Matrix completion
Matrix recovery 8-51
Phase retrieval / solving random quadratic systems of equations
Matrix recovery 8-52
Rank-one measurements
Measurements: see (8.1)
$$y_i = a_i^\top \underbrace{xx^\top}_{:=M} a_i = \big\langle \underbrace{a_i a_i^\top}_{:=A_i}, M\big\rangle, \qquad 1 \le i \le m$$
$$\mathcal{A}(X) = \begin{bmatrix} \langle A_1, X\rangle\\ \langle A_2, X\rangle\\ \vdots\\ \langle A_m, X\rangle\end{bmatrix} = \begin{bmatrix} \langle a_1a_1^\top, X\rangle\\ \langle a_2a_2^\top, X\rangle\\ \vdots\\ \langle a_ma_m^\top, X\rangle\end{bmatrix}$$
Matrix recovery 8-53
Rank-one measurements
Suppose $a_i \sim \mathcal{N}(0, I_n)$
• If x is independent of $a_i$, then
$$\big\langle a_ia_i^\top, xx^\top\big\rangle = \big|a_i^\top x\big|^2 \asymp \|x\|^2 \quad\Rightarrow\quad \big\|\mathcal{A}(xx^\top)\big\|_F \asymp \sqrt{m}\,\|xx^\top\|_F$$
• Consider $A_i = a_ia_i^\top$:
$$\big\langle a_ia_i^\top, A_i\big\rangle = \|a_i\|^4 \approx n\,\|a_ia_i^\top\|_F \quad\Longrightarrow\quad \|\mathcal{A}(A_i)\|_F \ge \big|\big\langle a_ia_i^\top, A_i\big\rangle\big| \approx n\,\|A_i\|_F$$
• If sample size $m \asymp n$ (information limit), then
$$\frac{\max_{X:\,\mathrm{rank}(X)=1}\|\mathcal{A}(X)\|_F/\|X\|_F}{\min_{X:\,\mathrm{rank}(X)=1}\|\mathcal{A}(X)\|_F/\|X\|_F} \gtrsim \frac{n}{\sqrt{m}} \gtrsim \sqrt{n} \quad\Longrightarrow\quad \frac{1 + \delta_K^{\mathrm{ub}}}{1 - \delta_{2+K}^{\mathrm{lb}}} \gtrsim \sqrt{n} \gg \sqrt{K}$$
−→ violates the RIP condition in Theorem 8.10
Matrix recovery 8-54
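Aside: this imbalance is easy to see numerically. A sketch comparing the two ratios; the sizes are arbitrary.

```python
import numpy as np

# Sketch: for rank-one Gaussian measurements, a generic rank-1 X has
# ||A(X)||_F on the order of sqrt(m)||X||_F, while X = a_1 a_1^T gives ~ n ||X||_F.
rng = np.random.default_rng(6)
n, m = 100, 600
a = rng.standard_normal((m, n))                  # measurement vectors a_k

def ratio(X):
    AX = np.einsum('ki,ij,kj->k', a, X, a)       # entries <a_k a_k^T, X>
    return np.linalg.norm(AX) / np.linalg.norm(X)

x = rng.standard_normal(n)
print(ratio(np.outer(x, x)), np.sqrt(m))         # ~ sqrt(m): X independent of a_k
print(ratio(np.outer(a[0], a[0])), n)            # ~ n: X aligned with one measurement
```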
Why do we lose RIP?
Problem:
• Low-rank matrices X (e.g. $a_ia_i^\top$) might be too aligned with some rank-one measurements −→ loss of incoherence in some measurements
• Some measurements $\langle A_i, X\rangle$ might have too high a leverage on $\mathcal{A}(X)$ when measured in $\|\cdot\|_F$ −→ change $\|\cdot\|_F$ to other norms!
Matrix recovery 8-55
Mixed-norm RIP
Solution: modify RIP appropriately ...
Definition 8.11 (RIP-ℓ₂/ℓ₁)
Let $\xi_r^{\mathrm{ub}}(\mathcal{A})$ and $\xi_r^{\mathrm{lb}}(\mathcal{A})$ be the smallest quantities s.t.
$$(1 - \xi_r^{\mathrm{lb}})\|X\|_F \le \|\mathcal{A}(X)\|_1 \le (1 + \xi_r^{\mathrm{ub}})\|X\|_F, \qquad \forall X: \mathrm{rank}(X) \le r$$
Matrix recovery 8-56
Analyzing phase retrieval via RIP-`2/`1
Theorem 8.12 (Chen, Chi, Goldsmith ’15)
Theorem 8.10 continues to hold if we replace $\delta_r^{\mathrm{ub}}$ and $\delta_r^{\mathrm{lb}}$ with $\xi_r^{\mathrm{ub}}$ and $\xi_r^{\mathrm{lb}}$, respectively.
• Follows the same proof as for Theorem 8.10, except that $\|\cdot\|_F$ is replaced by $\|\cdot\|_1$ in Slide 8-44
• Back to the example in Slide 8-54: if x is independent of $a_i$, then
$$\big\langle a_ia_i^\top, xx^\top\big\rangle = \big|a_i^\top x\big|^2 \asymp \|x\|^2 \quad\Rightarrow\quad \big\|\mathcal{A}(xx^\top)\big\|_1 \asymp m\,\|xx^\top\|_F$$
$$\|\mathcal{A}(A_i)\|_1 = \big|\big\langle a_ia_i^\top, A_i\big\rangle\big| + \sum_{j:\,j\ne i}\big|\big\langle a_ia_i^\top, A_j\big\rangle\big| \approx (n + m)\,\|A_i\|_F$$
For both cases, $\|\mathcal{A}(X)\|_1/\|X\|_F$ is of the same order
Matrix recovery 8-57
Analyzing phase retrieval via RIP-`2/`1
A debiased operator satisfies the RIP condition of Theorem 8.12 when $m \gtrsim nr$:
$$\mathcal{B}(X) := \begin{bmatrix} \langle A_1 - A_2, X\rangle\\ \langle A_3 - A_4, X\rangle\\ \vdots \end{bmatrix} \in \mathbb{R}^{m/2}$$
• Debiasing is crucial when $r \gg 1$
• A consequence of the Hanson-Wright inequality for quadratic forms (Hanson & Wright ’71, Rudelson & Vershynin ’13)
Matrix recovery 8-58
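Aside: a minimal sketch of the debiased operator $\mathcal{B}$, assuming m is even and rank-one $A_i = a_ia_i^\top$ with Gaussian $a_i$ as above; all names and sizes are illustrative.

```python
import numpy as np

# Sketch: B(X) = [<A_1 - A_2, X>, <A_3 - A_4, X>, ...] pairs up consecutive
# rank-one measurements; the pairwise differences cancel the common bias.
rng = np.random.default_rng(7)
n, m = 50, 400                                   # m assumed even
a = rng.standard_normal((m, n))

def B_op(X):
    AX = np.einsum('ki,ij,kj->k', a, X, a)       # <a_k a_k^T, X> for each k
    return AX[0::2] - AX[1::2]                   # length-m/2 vector of differences

X = rng.standard_normal((n, n))
print(B_op(X).shape)                             # (m/2,)
```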
Theoretical guarantee for phase retrieval
$$\text{(PhaseLift)}\quad \text{minimize}_{X\in\mathbb{R}^{n\times n}} \;\; \underbrace{\mathrm{Tr}(X)}_{\|\cdot\|_* \text{ for PSD matrices}} \qquad \text{s.t.} \;\; y_i = a_i^\top X a_i, \; 1 \le i \le m, \quad X \succeq 0 \;\;(\text{since } X = xx^\top)$$
Theorem 8.13 (Candes, Strohmer, Voroninski ’13, Candes, Li ’14, ...)
Suppose $a_i \overset{\text{ind.}}{\sim} \mathcal{N}(0, I)$. With high prob., PhaseLift recovers $xx^\top$ exactly as soon as $m \gtrsim n$
Matrix recovery 8-59
Extension of phase retrieval
$$\text{(PhaseLift)}\quad \text{minimize}_{X\in\mathbb{R}^{n\times n}} \;\; \underbrace{\mathrm{Tr}(X)}_{\|\cdot\|_* \text{ for PSD matrices}} \qquad \text{s.t.} \;\; a_i^\top X a_i = a_i^\top M a_i, \; 1 \le i \le m, \quad X \succeq 0$$
Theorem 8.14 (Chen, Chi, Goldsmith ’15, Cai, Zhang ’15, Kueng, Rauhut, Terstiege ’17)
Suppose $M \succeq 0$, $\mathrm{rank}(M) = r$, and $a_i \overset{\text{ind.}}{\sim} \mathcal{N}(0, I)$. With high prob., PhaseLift recovers M exactly as soon as $m \gtrsim nr$
Matrix recovery 8-60
Matrix completion
Matrix recovery 8-61
Sampling operators for matrix completion
Observation operator (projection onto matrices supported on Ω):
$$Y = P_\Omega(M)$$
where $(i,j) \in \Omega$ with prob. p (random sampling)
• $P_\Omega$ does NOT satisfy RIP when $p \ll 1$!
• For example,
$$\underbrace{\begin{bmatrix} 1&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\end{bmatrix}}_{M} \qquad \underbrace{\begin{bmatrix} ?&X&?&X&X\\ X&?&X&?&X\\ ?&X&X&?&?\\ X&?&?&X&?\\ X&?&X&?&X\end{bmatrix}}_{\Omega}$$
$$\|P_\Omega(M)\|_F = 0, \quad\text{or equivalently,}\quad \frac{1 + \delta_K^{\mathrm{ub}}}{1 - \delta_{2+K}^{\mathrm{lb}}} = \infty$$
Matrix recovery 8-62
Which sampling pattern?
Consider the following sampling pattern:
$$\begin{bmatrix} X&X&X&X&X\\ ?&?&?&?&?\\ X&X&X&X&X\\ X&X&X&X&X\\ X&X&X&X&X\end{bmatrix}$$
• If some rows/columns are not sampled, recovery is impossible.
Matrix recovery 8-63
Which low-rank matrices can we recover?
Compare the following rank-1 matrices:
$$\underbrace{\begin{bmatrix} 1&0&\cdots&0\\ 0&0&\cdots&0\\ \vdots&\vdots&&\vdots\\ 0&0&\cdots&0\end{bmatrix}}_{\text{hard}} \qquad \text{vs.} \qquad \underbrace{\begin{bmatrix} 1&1&\cdots&1\\ 1&1&\cdots&1\\ \vdots&\vdots&&\vdots\\ 1&1&\cdots&1\end{bmatrix}}_{\text{easy}}$$
Column / row spaces cannot be aligned with canonical basis vectors
Matrix recovery 8-64
Coherence
Definition 8.15
The coherence parameter µ of $M = U\Sigma V^\top$ is the smallest quantity s.t.
$$\max_i \|U^\top e_i\|^2 \le \frac{\mu r}{n} \qquad \text{and} \qquad \max_i \|V^\top e_i\|^2 \le \frac{\mu r}{n}$$
[figure: a canonical basis vector $e_i$ and its projection $P_U(e_i)$ onto the column space U]
• $\mu \ge 1$ (since $\sum_{i=1}^n \|U^\top e_i\|^2 = \|U\|_F^2 = r$)
• $\mu = 1$ if $\frac{1}{\sqrt{n}}\mathbf{1} = U = V$ (most incoherent)
• $\mu = \frac{n}{r}$ if $e_i \in U$ (most coherent)
Matrix recovery 8-65
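Aside: a sketch computing the coherence µ of Definition 8.15 for an n × n matrix; the two test matrices echo the hard/easy examples above.

```python
import numpy as np

# Sketch: mu(M) = (n/r) * max over i of max(||U^T e_i||^2, ||V^T e_i||^2).
def coherence(M, r):
    U, _, Vt = np.linalg.svd(M)
    n = M.shape[0]
    mu_U = n / r * np.max(np.sum(U[:, :r]**2, axis=1))   # squared row norms of U
    mu_V = n / r * np.max(np.sum(Vt[:r].T**2, axis=1))   # squared row norms of V
    return max(mu_U, mu_V)

n = 100
print(coherence(np.ones((n, n)), 1))             # all-ones matrix: mu = 1
M = np.zeros((n, n)); M[0, 0] = 1.0
print(coherence(M, 1))                           # e_1 e_1^T: mu = n (most coherent)
```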
Performance guarantee
Theorem 8.16 (Candes & Recht ’09, Candes & Tao ’10, Gross ’11, ...)
Nuclear norm minimization is exact and unique with high probability, provided that
$$m \gtrsim \mu\, n r \log^2 n$$
• This result is optimal up to a logarithmic factor
• Established via a RIPless theory
Matrix recovery 8-66
Numerical performance of nuclear-norm minimization
n = 50
Fig. credit: Candes, Recht ’09
Matrix recovery 8-67
KKT condition
Lagrangian:
$$\mathcal{L}(X, \Lambda) = \|X\|_* + \langle\Lambda, P_\Omega(X) - P_\Omega(M)\rangle = \|X\|_* + \langle P_\Omega(\Lambda), X - M\rangle$$
When M is the minimizer, the KKT condition reads
$$0 \in \partial_X\,\mathcal{L}(X, \Lambda)\big|_{X=M} \;\Longleftrightarrow\; \exists\Lambda \text{ s.t. } -P_\Omega(\Lambda) \in \partial\|M\|_*$$
$$\Longleftrightarrow\; \exists W \text{ s.t. } UV^\top + W \text{ is supported on } \Omega, \; P_T(W) = 0, \text{ and } \|W\| \le 1$$
Matrix recovery 8-68
Optimality condition via dual certificate
A slightly stronger condition than KKT guarantees uniqueness:
Lemma 8.17
M is the unique minimizer of nuclear norm minimization if
• the sampling operator $P_\Omega$ restricted to T is injective, i.e.,
$$P_\Omega(H) \ne 0 \qquad \forall \text{ nonzero } H \in T$$
• $\exists W$ s.t.
$$UV^\top + W \text{ is supported on } \Omega, \quad P_T(W) = 0, \quad \text{and} \quad \|W\| < 1$$
Matrix recovery 8-69
Proof of Lemma 8.17
For any $W_0$ obeying $\|W_0\| \le 1$ and $P_T(W_0) = 0$, and any feasible H (so that $P_\Omega(H) = 0$), one has
$$\|M + H\|_* \ge \|M\|_* + \big\langle UV^\top + W_0, H\big\rangle$$
$$= \|M\|_* + \big\langle UV^\top + W, H\big\rangle + \langle W_0 - W, H\rangle$$
$$= \|M\|_* + \big\langle UV^\top + W, P_\Omega(H)\big\rangle + \big\langle W_0 - W, P_{T^\perp}(H)\big\rangle$$
If we take $W_0$ s.t. $\langle W_0, P_{T^\perp}(H)\rangle = \|P_{T^\perp}(H)\|_*$, this is
$$\ge \|M\|_* + \|P_{T^\perp}(H)\|_* - \|W\|\cdot\|P_{T^\perp}(H)\|_* = \|M\|_* + (1 - \|W\|)\,\|P_{T^\perp}(H)\|_* > \|M\|_*$$
unless $P_{T^\perp}(H) = 0$.
But if $P_{T^\perp}(H) = 0$, then $H \in T$ and $H = 0$ by injectivity. Thus, $\|M + H\|_* > \|M\|_*$ for any feasible $H \ne 0$, concluding the proof.
Matrix recovery 8-70
Constructing dual certificates
Use the “golfing scheme” to produce an approximate dual certificate (Gross ’11)
• Think of it as an iterative algorithm (with sample splitting) to solve the KKT condition
Matrix recovery 8-71
Proximal algorithm
In the presence of noise, one needs to solve
$$\hat{X} = \arg\min_X \;\frac{1}{2}\|y - \mathcal{A}(X)\|_F^2 + \lambda\|X\|_*$$
which can be solved via proximal methods.
Proximal operator:
$$\mathrm{prox}_{\lambda\|\cdot\|_*}(X) = \arg\min_Z \;\frac{1}{2}\|Z - X\|_F^2 + \lambda\|Z\|_* = U\,\mathcal{T}_\lambda(\Sigma)\,V^\top$$
where the SVD of X is $X = U\Sigma V^\top$ with $\Sigma = \mathrm{diag}(\{\sigma_i\})$, and
$$\mathcal{T}_\lambda(\Sigma) = \mathrm{diag}\big(\{(\sigma_i - \lambda)_+\}\big)$$
Matrix recovery 8-72
Proximal algorithm
Algorithm 8.2 Proximal gradient methods
for t = 0, 1, ···:
$$X^{t+1} = \mathcal{T}_{\lambda\mu_t}\Big(X^t - \mu_t\,\mathcal{A}^*\big(\mathcal{A}(X^t) - y\big)\Big)$$
where $\mu_t$: step size / learning rate
Matrix recovery 8-73
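Aside: a sketch of Algorithm 8.2 specialized to noisy matrix completion, where $\mathcal{A} = P_\Omega$ so that $\mathcal{A}^*(\mathcal{A}(X) - y) = P_\Omega(X - Y)$; the step size, λ, and iteration count are illustrative assumptions.

```python
import numpy as np

def svt(X, tau):
    """Singular value soft-thresholding: the prox of tau * ||.||_* at X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_grad_complete(Y, mask, lam=1.0, mu=1.0, iters=200):
    """Proximal gradient for 0.5 * ||P_Omega(X) - Y||_F^2 + lam * ||X||_*."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        grad = mask * (X - Y)                 # A*(A(X) - y) with A = P_Omega
        X = svt(X - mu * grad, mu * lam)      # X_{t+1} = T_{lam*mu}(X - mu*grad)
    return X

# usage: X_hat = prox_grad_complete(M_obs, mask.astype(float), lam=0.1)
```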
Reference
[1] “Lecture notes, Advanced topics in signal processing (ECE 8201),” Y. Chi, 2015.
[2] “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” B. Recht, M. Fazel, P. Parrilo, SIAM Review, 2010.
[3] “Exact matrix completion via convex optimization,” E. Candes and B. Recht, Foundations of Computational Mathematics, 2009.
[4] “Matrix completion from a few entries,” R. Keshavan, A. Montanari, S. Oh, IEEE Transactions on Information Theory, 2010.
[5] “Matrix rank minimization with applications,” M. Fazel, Ph.D. Thesis, 2002.
[6] “Modeling the world from internet photo collections,” N. Snavely, S. Seitz, and R. Szeliski, International Journal of Computer Vision, 2008.
Matrix recovery 8-74
Reference
[7] “Computer vision: algorithms and applications,” R. Szeliski, Springer, 2011.
[8] “Shape and motion from image streams under orthography: a factorization method,” C. Tomasi and T. Kanade, International Journal of Computer Vision, 1992.
[9] “Topics in random matrix theory,” T. Tao, American Mathematical Society, 2012.
[10] “A singular value thresholding algorithm for matrix completion,” J. Cai, E. Candes, Z. Shen, SIAM Journal on Optimization, 2010.
[11] “Recovering low-rank matrices from few coefficients in any basis,” D. Gross, IEEE Transactions on Information Theory, 2011.
[12] “Incoherence-optimal matrix completion,” Y. Chen, IEEE Transactions on Information Theory, 2015.
Matrix recovery 8-75
Reference
[13] “The power of convex relaxation: Near-optimal matrix completion,” E. Candes and T. Tao, IEEE Transactions on Information Theory, 2010.
[14] “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,” E. Candes and Y. Plan, IEEE Transactions on Information Theory, 2011.
[15] “PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming,” E. Candes, T. Strohmer, and V. Voroninski, Communications on Pure and Applied Mathematics, 2013.
[16] “Solving quadratic equations via PhaseLift when there are about as many equations as unknowns,” E. Candes, X. Li, Foundations of Computational Mathematics, 2014.
[17] “A bound on tail probabilities for quadratic forms in independent random variables,” D. Hanson, F. Wright, Annals of Mathematical Statistics, 1971.
Matrix recovery 8-76
Reference
[18] “Hanson-Wright inequality and sub-Gaussian concentration,” M. Rudelson, R. Vershynin, Electronic Communications in Probability, 2013.
[19] “Exact and stable covariance estimation from quadratic sampling via convex programming,” Y. Chen, Y. Chi, and A. Goldsmith, IEEE Transactions on Information Theory, 2015.
[20] “ROP: Matrix recovery via rank-one projections,” T. Cai and A. Zhang, Annals of Statistics, 2015.
[21] “Low rank matrix recovery from rank one measurements,” R. Kueng, H. Rauhut, and U. Terstiege, Applied and Computational Harmonic Analysis, 2017.
[22] “Mathematics of sparsity (and a few other things),” E. Candes, International Congress of Mathematicians, 2014.
Matrix recovery 8-77