Latent Structure Beyond Sparse Codes -...
![Page 1: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/1.jpg)
Latent Structure Beyond Sparse Codes
Benjamin Recht
Department of EECS and Statistics
University of California, Berkeley
![Page 2: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/2.jpg)
Sparse Codes
[Figure panels: overcompleteness ratios 1.25x, 2.5x, 5x, 10x]
Figure 1. Learned dictionaries. Each panel shows 100 basis functions selected at random from the dictionary of a given overcompleteness ratio.
resulting in dictionaries containing more specialized elements such as straight contours, blobs, local curvature, and gratings. The specialized elements are better matched to the structures occurring in natural images, as evidenced by the fact that they yield lower L1 norm representations, steeper coefficient decay, and better denoising. It seems plausible that they may also result in improved image compression though this remains to be seen.
These results are of relevance to neuroscience because the input layer of V1 is thought to be at least 100x
redundancy
Which mathematical representations can be learned robustly?
robustness and sparsity
Gabor-like thingies...
![Page 3: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/3.jpg)
Sparse Approximation
• Use the fact that images are sparse in wavelet basis to reduce number of measurements required for signal acquisition.
[Figure: pixels and their large wavelet coefficients; wideband signal samples and their large Gabor coefficients, on time and frequency axes]
Compressed Sensing
• $n_{\text{patients}} \ll n_{\text{peaks}}$
• If very few are needed for diagnosis, search for a sparse set of markers
Lasso
![Page 4: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/4.jpg)
Cardinality Minimization
• PROBLEM: Find the vector of lowest cardinality that satisfies/approximates the underdetermined linear system $\Phi x = y$, where $\Phi : \mathbb{R}^p \to \mathbb{R}^n$
• NP-HARD:
  – Reduce to EXACT-COVER
  – Hard to approximate
  – Known exact algorithms require enumeration
• HEURISTIC: Replace cardinality with $\ell_1$ norm
![Page 5: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/5.jpg)
• Recommender Systems: rank of the data matrix
• Quantum Tomography: rank of the density matrix
• Seismic Imaging: rank of the unfolded tensor
• Geometric Structure: rank of the Gram matrix
![Page 6: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/6.jpg)
Affine Rank Minimization
• PROBLEM: Find the matrix of lowest rank that satisfies/approximates the underdetermined linear system $\Phi(X) = y$, where $\Phi : \mathbb{R}^{p_1 \times p_2} \to \mathbb{R}^n$
• NP-HARD:
  – Reduce to solving polynomial equations
  – Hard to approximate
  – Exact algorithms are awful
• HEURISTIC: Replace rank with nuclear norm
![Page 7: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/7.jpg)
Heuristic: Gradient Descent
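IDEA: Replace rank with nuclear norm:
$$\text{minimize } \|X\|_* \quad \text{subject to } \Phi(X) = b$$
Succeeds when number of samples is $\tilde{O}(r(p_1 + p_2))$
Factor the model as $M = L R^*$, with $M$ of size $p_1 \times p_2$, $L$ of size $p_1 \times r$, and $R^*$ of size $r \times p_2$.
• Step 1: Pick $(i,j)$ and compute residual: $e = L_i R_j^T - M_{ij}$
• Step 2: Take a mixture of current model and corrected model ($\alpha, \beta > 0$):
$$\begin{bmatrix} L_i \\ R_j \end{bmatrix} \leftarrow \begin{bmatrix} \alpha L_i - \beta e R_j \\ \alpha R_j - \beta e L_i \end{bmatrix}$$
Some guy on livejournal, 2006; Fazel, Parrilo, Recht, 2007; Candes and Recht, 2008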
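The two-step update can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the step sizes `alpha` and `beta`, the small random starting point, and the fully observed rank-one test matrix are all choices made here, not values from the slides. Both rows are updated from their old values, as in the update written on the slide.

```python
import random

def sgd_factorize(M, r, alpha=0.9995, beta=0.05, steps=20000, seed=0):
    """Fit M ~ L R^T by visiting one entry (i, j) per step.

    alpha, beta, steps and the random start are illustrative assumptions.
    """
    rng = random.Random(seed)
    p1, p2 = len(M), len(M[0])
    # Small random start: an all-zero start would never move.
    L = [[rng.uniform(-0.1, 0.1) for _ in range(r)] for _ in range(p1)]
    R = [[rng.uniform(-0.1, 0.1) for _ in range(r)] for _ in range(p2)]
    for _ in range(steps):
        i, j = rng.randrange(p1), rng.randrange(p2)
        # Step 1: residual e = L_i R_j^T - M_ij
        e = sum(a * b for a, b in zip(L[i], R[j])) - M[i][j]
        # Step 2: mix current rows with corrected rows; the right-hand
        # side is evaluated in full before either row is overwritten.
        L[i], R[j] = (
            [alpha * a - beta * e * b for a, b in zip(L[i], R[j])],
            [alpha * b - beta * e * a for a, b in zip(L[i], R[j])],
        )
    return L, R

def mean_squared_error(M, L, R):
    """Average squared residual of L R^T against M over all entries."""
    p1, p2 = len(M), len(M[0])
    total = 0.0
    for i in range(p1):
        for j in range(p2):
            pred = sum(a * b for a, b in zip(L[i], R[j]))
            total += (pred - M[i][j]) ** 2
    return total / (p1 * p2)

# Hypothetical rank-one test matrix M = u v^T.
u = [1.0, 0.5, -1.0, 2.0]
v = [1.0, -1.0, 0.5]
M = [[ui * vj for vj in v] for ui in u]
L, R = sgd_factorize(M, r=2)
print(mean_squared_error(M, L, R))
```

Choosing `alpha` slightly below 1 shrinks the visited rows at every step, which acts like a penalty on the squared Frobenius norms of the factors; in a matrix completion setting the sampling would be restricted to the observed entries.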
![Page 8: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/8.jpg)
System Identification: find a dynamical model that agrees with time series data
• All linear systems are combinations of single pole filters.
• Leverage this structure for new algorithms and analysis.
Observe a time series $y_1, y_2, \ldots, y_T$ driven by the input $u_1, u_2, \ldots, u_T$
What is a principled way to build a parsimonious model for the input-output responses?
Na et al, 2012
Shah, Bhaskar, Tang, and Recht 2012
![Page 9: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/9.jpg)
Linear Inverse Problems
• Find me a solution of $y = \Phi x$
• $\Phi$ is $n \times p$, $n < p$
• Of the infinite collection of solutions, which one should we pick?
• Leverage structure: Sparsity, Rank, Smoothness, Symmetry
• How do we design algorithms to solve underdetermined systems with priors?
![Page 10: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/10.jpg)
Sparsity
• 1-sparse vectors of Euclidean norm 1
• Convex hull is the unit ball of the $\ell_1$ norm
$$\|x\|_1 = \sum_{i=1}^{p} |x_i|$$
[Figure: the $\ell_1$ unit ball in the plane, with axis ticks at 1 and -1]
![Page 11: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/11.jpg)
$$\text{minimize } \|x\|_1 \quad \text{subject to } \Phi x = y$$
[Figure: axes $x_1$ and $x_2$ with the line $\Phi x = y$]
Compressed Sensing: Candes, Romberg, Tao, Donoho, Tanner, Etc...
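For a single equation ($n = 1$) the $\ell_1$ problem can be solved by hand, which makes the picture with the line $\Phi x = y$ concrete. Since $|y| = |a \cdot x| \le \|a\|_\infty \|x\|_1$, the smallest possible $\ell_1$ norm is $|y| / \|a\|_\infty$, and it is attained by putting all the weight on the coordinate with the largest $|a_i|$. The sketch below assumes a nonzero row `a`; the function names and the numbers are illustrative only.

```python
def l1_min_single_equation(a, y):
    """Minimize ||x||_1 subject to a . x = y (one equation, a nonzero)."""
    # All weight goes on the coordinate with the largest |a_i|
    # (the first such coordinate if there is a tie).
    k = max(range(len(a)), key=lambda i: abs(a[i]))
    x = [0.0] * len(a)
    x[k] = y / a[k]
    return x

def l2_min_single_equation(a, y):
    """Minimum Euclidean norm solution of a . x = y, for comparison."""
    scale = y / sum(ai * ai for ai in a)
    return [scale * ai for ai in a]

a, y = [1.0, 2.0], 2.0
x_l1 = l1_min_single_equation(a, y)   # sparse: one nonzero entry
x_l2 = l2_min_single_equation(a, y)   # dense: every entry nonzero
print(x_l1, sum(abs(v) for v in x_l1))
print(x_l2, sum(abs(v) for v in x_l2))
```

The $\ell_1$ solution sits at a vertex of the scaled $\ell_1$ ball, which is why it is sparse, while the least-norm solution spreads weight over both coordinates.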
![Page 12: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/12.jpg)
Rank
• 2x2 matrices
• plotted in 3d
• rank 1: $x^2 + z^2 + 2y^2 = 1$
![Page 13: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/13.jpg)
Rank
• 2x2 matrices
• plotted in 3d
• rank 1: $x^2 + z^2 + 2y^2 = 1$
• Convex hull: unit ball of the nuclear norm
$$\|X\|_* = \sum_i \sigma_i(X)$$
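A small numerical check of this picture, assuming the three plotted coordinates are the entries of the symmetric matrix $\begin{bmatrix} x & y \\ y & z \end{bmatrix}$ (an assumption inferred from the equation $x^2 + z^2 + 2y^2 = 1$, which is then the unit Frobenius norm condition). The helper names are made up for this sketch.

```python
import math

def nuclear_norm_sym2(x, y, z):
    """Nuclear norm of the symmetric matrix [[x, y], [y, z]]."""
    mean = (x + z) / 2.0
    radius = math.hypot((x - z) / 2.0, y)
    # Eigenvalues are mean +/- radius; for a symmetric matrix the
    # singular values are the absolute values of the eigenvalues.
    return abs(mean + radius) + abs(mean - radius)

def rank_one_point(theta):
    """Coordinates (x, y, z) of v v^T for the unit vector v = (cos, sin)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * c, c * s, s * s

for theta in (0.0, 0.3, 1.0, 2.5):
    x, y, z = rank_one_point(theta)
    print(x * x + z * z + 2 * y * y, nuclear_norm_sym2(x, y, z))
```

Every rank-one point of unit Frobenius norm has nuclear norm 1, and an average of two such points, for example `(0.5, 0.0, 0.5)`, still has nuclear norm at most 1, as expected of a point in the convex hull.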
![Page 14: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/14.jpg)
Nuclear Norm Heuristic
• 2x2 matrices
• plotted in 3d
$$\|X\|_* = \sum_i \sigma_i(X)$$
Rank Minimization/Matrix Completion
Fazel 2002. R, Fazel, and Parrilo 2007
![Page 15: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/15.jpg)
Integer Programming
• Integer solutions: all components of x are ±1
• Convex hull is the unit ball of the $\ell_\infty$ norm
[Figure: the square with vertices (1,-1), (1,1), (-1,-1), (-1,1)]
![Page 16: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/16.jpg)
$$\text{minimize } \|x\|_\infty \quad \text{subject to } \Phi x = y$$
[Figure: axes $x_1$ and $x_2$ with the line $\Phi x = y$]
Donoho and Tanner 2008
Mangasarian and Recht 2009
![Page 17: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/17.jpg)
Parsimonious Models
• Search for best linear combination of fewest atoms
• “rank” = fewest atoms needed to describe the model
[Figure labels: model, weights, atoms, rank]
![Page 18: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/18.jpg)
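Atomic Norms
• Given a basic set of atoms, $\mathcal{A}$, define the function
$$\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$$
• When $\mathcal{A}$ is centrosymmetric, we get a norm
$$\|x\|_{\mathcal{A}} = \inf\Big\{\sum_{a \in \mathcal{A}} |c_a| : x = \sum_{a \in \mathcal{A}} c_a a\Big\}$$
IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$
• When can we compute this?
• When does this work?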
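As a worked instance of the definition, the atomic sets from the earlier slides give back familiar norms. The display below only restates those three cases in the atomic-norm notation ($e_i$ denotes the $i$-th standard basis vector, a symbol introduced here):

```latex
\begin{align*}
\mathcal{A} = \{\pm e_i\}_{i=1}^{p}
  &\;\Longrightarrow\; \|x\|_{\mathcal{A}} = \|x\|_1 = \sum_{i=1}^{p} |x_i| \\
\mathcal{A} = \{uv^T : \|u\|_2 = \|v\|_2 = 1\}
  &\;\Longrightarrow\; \|X\|_{\mathcal{A}} = \|X\|_* = \sum_i \sigma_i(X) \\
\mathcal{A} = \{\pm 1\}^{p}
  &\;\Longrightarrow\; \|x\|_{\mathcal{A}} = \|x\|_\infty = \max_i |x_i|
\end{align*}
```

In each case the unit ball of the resulting norm is the convex hull of the atoms, which is exactly the gauge formula $\|x\|_{\mathcal{A}} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$.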
![Page 19: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/19.jpg)
Hierarchical dictionary for image patches
Union of Subspaces
• X has structured sparsity: linear combination of elements from a set of subspaces $\{U_g\}$.
• Atomic set: unit norm vectors living in one of the $U_g$
Permutations and Rankings
• X a sum of a few permutation matrices
• Examples: Multiobject Tracking, Ranked elections, BCS
• Convex hull of permutation matrices: doubly stochastic matrices.
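The easy direction of the last bullet can be checked directly: any convex combination of permutation matrices has nonnegative entries and unit row and column sums. The sketch below uses made-up helper names and an arbitrary choice of three permutations and weights.

```python
def permutation_matrix(perm):
    """Matrix with a 1 in position (i, perm[i]) and 0 elsewhere."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]

def convex_combination(weights, matrices):
    """Entrywise weighted sum; weights are assumed nonnegative, summing to 1."""
    n = len(matrices[0])
    return [[sum(w * P[i][j] for w, P in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def is_doubly_stochastic(X, tol=1e-9):
    """Nonnegative entries with every row and column summing to 1."""
    n = len(X)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in X)
    cols_ok = all(abs(sum(X[i][j] for i in range(n)) - 1.0) <= tol
                  for j in range(n))
    nonneg = all(v >= -tol for row in X for v in row)
    return rows_ok and cols_ok and nonneg

perms = [(0, 1, 2), (1, 2, 0), (2, 1, 0)]
X = convex_combination([0.5, 0.3, 0.2], [permutation_matrix(p) for p in perms])
print(X, is_doubly_stochastic(X))
```

The converse direction, that every doubly stochastic matrix is such a combination, is the Birkhoff-von Neumann theorem and is not demonstrated by this check.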
![Page 20: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/20.jpg)
• Moments: convex hull of $[1, t, t^2, t^3, t^4, \ldots]$, $t \in T$, some basic set.
• System Identification, Image Processing, Numerical Integration, Statistical Inference
• Solve with semidefinite programming
• Cut-matrices: sums of rank-one sign matrices.
• Collaborative Filtering, Clustering in Genetic Networks, Combinatorial Approximation Algorithms
• Approximate with semidefinite programming
• Low-rank Tensors: sums of rank-one tensors
• Computer Vision, Image Processing, Hyperspectral Imaging, Neuroscience
• Approximate with alternating least-squares
![Page 21: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/21.jpg)
Atomic norms in sparse approximation
• Greedy approximations
• Best n term approximation to a function f in the convex hull of A.
$$\|f - f_n\|_{L_2} \le \frac{c_0 \|f\|_{\mathcal{A}}}{\sqrt{n}}$$
• Maurey, Jones, and Barron (1980s-90s)
• Devore and Temlyakov (1996)
• Random Feature Heuristics (Rahimi and R, 2007)
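One way to see a $1/\sqrt{n}$ rate in action is a conditional-gradient (Frank-Wolfe) iteration, which adds at most one atom per step. This is a stand-in chosen for illustration, not necessarily the construction used in the cited works. The sketch assumes the atoms are the signed coordinate vectors $\pm e_i$ and that the target `f` lies in their convex hull; the target vector and function names are made up.

```python
import math

def greedy_n_term(f, n_terms):
    """Frank-Wolfe on ||g - f||^2 over conv{+e_i, -e_i}; one atom per step."""
    p = len(f)
    g = [0.0] * p  # the origin lies in the hull because the atoms are symmetric
    for k in range(n_terms):
        resid = [fi - gi for fi, gi in zip(f, g)]
        # Best atom: the signed coordinate most aligned with the residual.
        i = max(range(p), key=lambda j: abs(resid[j]))
        sign = 1.0 if resid[i] >= 0 else -1.0
        step = 2.0 / (k + 2)
        # Convex update g <- (1 - step) * g + step * atom
        g = [(1.0 - step) * gj for gj in g]
        g[i] += step * sign
    return g

def l2_error(f, g):
    return math.sqrt(sum((fi - gi) ** 2 for fi, gi in zip(f, g)))

f = [0.4, -0.3, 0.2]  # hypothetical target with l1 norm 0.9
for n in (1, 10, 50, 200):
    # error next to the reference curve 4 / sqrt(n + 2)
    print(n, l2_error(f, greedy_n_term(f, n)), 4.0 / math.sqrt(n + 2))
```

With this step size the standard Frank-Wolfe guarantee bounds the squared error by a constant over $n + 2$ (here 16, from the diameter of the hull), which matches the $1/\sqrt{n}$ shape of the bound on the slide.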
![Page 22: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/22.jpg)
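Tangent Cones
• Set of directions that decrease the norm from x form a cone:
$$T_{\mathcal{A}}(x) = \{d : \|x + \alpha d\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} \text{ for some } \alpha > 0\}$$
• x is the unique minimizer if the intersection of this cone with the null space of Φ equals {0}
$$\text{minimize } \|z\|_{\mathcal{A}} \quad \text{subject to } \Phi z = y$$
[Figure: the point $x$ on the affine space $y = \Phi z$, the ball $\{z : \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}}\}$, and the cone $T_{\mathcal{A}}(x)$]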
![Page 23: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/23.jpg)
Mean Width
Support Function: $S_C(d) = \sup_{x \in C} d'x$
$S_C(d) + S_C(-d)$ measures width of C when projected onto span of d.
mean width: $w(C) = \int_{S^{p-1}} S_C(u)\,du$
[Figure: the set $C$ between the supporting hyperplanes at $d'x$ and $-d'x$]
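For a polytope the supremum in the support function is attained at a vertex, so all three quantities can be computed from a vertex list. The sketch below works in the plane ($p = 2$) and assumes that $du$ denotes the uniform probability measure on the circle, so the mean width is an average over directions; that normalization and the helper names are assumptions.

```python
import math

def support(points, d):
    """S_C(d) = max of <d, x> over the vertices of C = conv(points)."""
    return max(sum(di * xi for di, xi in zip(d, x)) for x in points)

def width(points, d):
    """S_C(d) + S_C(-d): extent of C along the direction d."""
    return support(points, d) + support(points, [-di for di in d])

def mean_width_2d(points, n_dirs=3600):
    """Average of S_C(u) over equally spaced unit directions u in the plane."""
    total = 0.0
    for k in range(n_dirs):
        t = 2.0 * math.pi * k / n_dirs
        total += support(points, (math.cos(t), math.sin(t)))
    return total / n_dirs

square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(width(square, (1.0, 0.0)))
print(width(square, (1 / math.sqrt(2), 1 / math.sqrt(2))))
print(mean_width_2d(square), 4 / math.pi)
```

For the square, $S_C(u) = |\cos t| + |\sin t|$, whose average over the circle is $4/\pi$, so the numerical average can be compared against a closed form.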
![Page 24: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/24.jpg)
• When does a random subspace, $U$ in $\mathbb{R}^p$, intersect a convex cone $C$ only at the origin?
• Gordon (1988): with high probability if
$$\mathrm{codim}(U) \ge p\, w(C \cap S^{p-1})^2$$
where $w(C \cap S^{p-1}) = \int_{S^{p-1}} S_C(u)\,du$ is the mean width.
• Corollary: For inverse problems, if Φ is a random Gaussian matrix with n rows, need
$$n \ge p\, w(T_{\mathcal{A}}(x) \cap S^{p-1})^2$$
for exact recovery of x.
![Page 25: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/25.jpg)
Rates
• Hypercube: $n \ge p/2$
• Sparse Vectors, p vector, sparsity s: $n \ge 2s \log\left(\frac{p}{s}\right) + \frac{5s}{4}$
• Block sparse, M groups (possibly overlapping), maximum group size B, k active groups: $n \ge k\left(\sqrt{2\log(M-k)} + \sqrt{B}\right)^2 + kB$
• Low-rank matrices: p1 x p2, (p1<p2), rank r: $n \ge 3r(p_1 + p_2 - r)$
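To get a feel for these thresholds, the four formulas can be evaluated for concrete problem sizes. The function names and the example sizes below are made up, and `log` is taken to be the natural logarithm, which is an assumption about the slide's notation.

```python
import math

def hypercube_rate(p):
    """n >= p / 2"""
    return p / 2

def sparse_rate(p, s):
    """n >= 2 s log(p / s) + 5 s / 4"""
    return 2 * s * math.log(p / s) + 5 * s / 4

def block_sparse_rate(M, B, k):
    """n >= k (sqrt(2 log(M - k)) + sqrt(B))^2 + k B"""
    return k * (math.sqrt(2 * math.log(M - k)) + math.sqrt(B)) ** 2 + k * B

def low_rank_rate(p1, p2, r):
    """n >= 3 r (p1 + p2 - r)"""
    return 3 * r * (p1 + p2 - r)

print(hypercube_rate(100))
print(sparse_rate(p=1000, s=10))
print(block_sparse_rate(M=100, B=5, k=3))
print(low_rank_rate(p1=10, p2=20, r=2))
```

For a 10-sparse vector in dimension 1000, for example, the sparse bound comes out at roughly a tenth of the ambient dimension, and the low-rank bound scales with $r(p_1 + p_2)$ rather than with the $p_1 p_2$ entries of the matrix.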
![Page 26: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/26.jpg)
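Robust Recovery (deterministic)
• Suppose we observe $y = \Phi x + w$ with $\|w\|_2 \le \delta$
$$\text{minimize } \|z\|_{\mathcal{A}} \quad \text{subject to } \|\Phi z - y\| \le \delta$$
• If $\hat{x}$ is an optimal solution, then $\|x - \hat{x}\| \le \frac{2\delta}{\epsilon}$ provided that
$$n \ge \frac{p\, w(T_{\mathcal{A}}(x) \cap S^{p-1})^2}{(1 - \epsilon)^2}$$
[Figure: the ball $\{z : \|z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}}\}$ and the region $\|\Phi z - y\| \le \delta$]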
![Page 27: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/27.jpg)
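Robust Recovery (statistical)
• Suppose we observe $y = \Phi x + w$
$$\text{minimize } \|\Phi z - y\|^2 + \mu \|z\|_{\mathcal{A}}$$
• If $\hat{x}$ is an optimal solution, then $\|\Phi x - \Phi \hat{x}\|_2 \le \sqrt{\mu \|x\|_{\mathcal{A}}}$ provided that $\mu \ge E_w[\|\Phi^* w\|_{\mathcal{A}}^*]$
And under an additional “cone condition”, $\|x - \hat{x}\|_2 \le \eta(x, \mathcal{A}, \Phi, \gamma)\,\mu$
$$\mathrm{cone}\{u : \|x + u\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \gamma \|u\|\}$$
Bhaskar, Tang, and Recht 2011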
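In the simplest special case the regularized problem has a closed-form solution. Assuming $\Phi$ is the identity, the atoms are $\pm e_i$ (so the atomic norm is the $\ell_1$ norm), and the loss is the squared Euclidean norm with no factor of one half, the minimizer of $\|z - y\|_2^2 + \mu \|z\|_1$ is soft-thresholding at level $\mu/2$. All of these are assumptions made for the sketch, as are the function names and numbers.

```python
def soft_threshold(y, mu):
    """Minimizer of ||z - y||_2^2 + mu * ||z||_1, coordinate by coordinate."""
    t = mu / 2.0  # threshold is mu / 2 because the loss carries no factor 1/2
    out = []
    for v in y:
        if v > t:
            out.append(v - t)
        elif v < -t:
            out.append(v + t)
        else:
            out.append(0.0)
    return out

def objective(z, y, mu):
    return sum((zi - yi) ** 2 for zi, yi in zip(z, y)) + mu * sum(abs(zi) for zi in z)

y = [3.0, -0.2, -2.5]
z_hat = soft_threshold(y, mu=2.0)
print(z_hat, objective(z_hat, y, 2.0))
```

Small entries of the observation are set exactly to zero and large ones are shrunk toward zero, which is how the penalty $\mu$ trades data fit against the atomic norm.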
![Page 28: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/28.jpg)
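Denoising Rates (re-derivations)
• Sparse Vectors, p vector, sparsity s: $\frac{1}{p}\|\hat{x} - x^\star\|_2^2 = O\left(\frac{\sigma^2 s \log(p)}{p}\right)$
• Low-rank matrices: p1 x p2, (p1<p2), rank r: $\frac{1}{p_1 p_2}\|\hat{x} - x^\star\|_F^2 = O\left(\frac{\sigma^2 r}{p_1}\right)$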
![Page 29: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/29.jpg)
Atomic Norm Minimization
IDEA: minimize $\|z\|_{\mathcal{A}}$ subject to $\Phi z = y$
• Generalizes existing, powerful methods
• Rigorous formula for developing new analysis algorithms
• Tightest bounds on number of measurements needed for model recovery in all common models
• One algorithm prototype for many data-mining applications
Chandrasekaran, Recht, Parrilo, and Willsky 2010
![Page 30: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/30.jpg)
Learning representations
• Gram matrix of y vectors indicates overlapping support
• Use graph algorithms to identify single dictionary elements at a time
• ASSUME:
  – very sparse vectors: $s < N^{1/2}/\log(N)$
  – very incoherent dictionary (much more than RIP)
  – number of observations is much bigger than N
[Figure: vectors $x$ and $z$] $|\langle \Phi x, \Phi z \rangle| \approx |\langle x, z \rangle|$
Arora, Ge, and Moitra
Agarwal, Anandkumar, and Netrapalli
![Page 31: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/31.jpg)
Extended representations
$C = \pi(K \cap L)$, where $C$ is a convex body, $\pi$ a linear map, $K$ a cone, and $L$ an affine space
• regular hexagon is the projection of a 3-dim'l slice of $\mathbb{R}^5_+$: $\{y \in \mathbb{R}^5_+ : y_1 + y_2 + y_3 + y_5 = 2,\ y_3 + y_4 + y_5 = 1\}$
• this non-regular hexagon only has the trivial LP-lift
![Page 32: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/32.jpg)
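$C = \pi(K \cap L)$
• [Figure: $\ell_1$ ball with axis ticks at 1 and -1] $K = \mathbb{R}^{2d}_+$, $L = \{y : \sum_{i=1}^{2d} y_i = 1\}$, $\pi = \begin{bmatrix} I & -I \end{bmatrix}$
• [Figure: square with vertices (1,-1), (1,1), (-1,-1), (-1,1)] $K = \mathbb{R}^{2d}_+$, $L = \{y : y_i + y_{i+d} = 1,\ 1 \le i \le d\}$, $\pi = \begin{bmatrix} I & -I \end{bmatrix}$
• $K = \mathcal{S}^{d_1 + d_2}_+$, $L = \{Z : \mathrm{trace}(Z) = 1\}$, $\pi\left(\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}\right) = B$
• $K = \mathcal{S}^{d+1}_+$, $L = \left\{Z = \begin{bmatrix} T & x \\ x^T & u \end{bmatrix} : T \text{ Toeplitz},\ T_{11} = u = 1\right\}$, $\pi\left(\begin{bmatrix} T & x \\ x^T & u \end{bmatrix}\right) = x$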
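The first of these representations can be verified by hand: every point of the $\ell_1$ ball is the image under $\pi = \begin{bmatrix} I & -I \end{bmatrix}$ of a nonnegative vector in $\mathbb{R}^{2d}$ whose entries sum to 1. The sketch below builds one such preimage; the function names and the way the leftover mass is placed are choices made here for illustration.

```python
def lift_l1_point(x):
    """Return y >= 0 with sum(y) = 1 and [I -I] y = x, assuming ||x||_1 <= 1."""
    slack = (1.0 - sum(abs(v) for v in x)) / 2.0
    if slack < 0:
        raise ValueError("x lies outside the l1 unit ball")
    pos = [max(v, 0.0) for v in x]    # positive parts go in the first d entries
    neg = [max(-v, 0.0) for v in x]   # negative parts go in the last d entries
    # Leftover mass is split between y_1 and y_{1+d}; it cancels under [I -I].
    pos[0] += slack
    neg[0] += slack
    return pos + neg

def project(y):
    """Apply the linear map [I -I] to a vector of length 2d."""
    d = len(y) // 2
    return [y[i] - y[i + d] for i in range(d)]

x = [0.25, -0.5]
y = lift_l1_point(x)
print(y, sum(y), project(y))
```

Conversely, any nonnegative $y$ with unit sum projects to a point with $\|x\|_1 \le \sum_i y_i = 1$, so the image of the slice is exactly the $\ell_1$ ball.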
![Page 33: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/33.jpg)
Extended representations
$C = \pi(K \cap L)$, where $\pi$ is a linear map, $K$ a cone, and $L$ an affine space
• regular hexagon is the projection of a 3-dim'l slice of $\mathbb{R}^5_+$: $\{y \in \mathbb{R}^5_+ : y_1 + y_2 + y_3 + y_5 = 2,\ y_3 + y_4 + y_5 = 1\}$
• this non-regular hexagon only has the trivial LP-lift
polar body: $C^* = \{y : \langle x, y \rangle \le 1 \ \ \forall x \in C\}$
C has a lift into K if there are maps $A : C \to K$ and $B : C^* \to K^*$ such that
$$1 - \langle x, y \rangle = \langle A(x), B(y) \rangle$$
for all extreme points x ∈ C and y ∈ C*
Gouveia, Parrilo, and Thomas
Representation learning becomes matrix factorization
![Page 34: Latent Structure Beyond Sparse Codes - LCSLlcsl.mit.edu/ldr-workshop/Slides/Recht_LDR_MIT_112313.pdf · Latent Structure Beyond Sparse Codes Benjamin Recht Department of EECS and](https://reader035.fdocuments.net/reader035/viewer/2022081523/5fd775957b5be137b4201fd1/html5/thumbnails/34.jpg)
Learning extended representations?
$C = \pi(K \cap L)$, where $C$ is a convex body, $\pi$ a linear map, $K$ a cone, and $L$ an affine space
• Learning representation through NMF?
• Ties immediately with gaussian width analysis
• Could obviate graph structured arguments
• What are the right features?