Transcript of "New Algorithms for Learning Incoherent and Overcomplete Dictionaries"

[Slide 1]

New Algorithms for Learning Incoherent and Overcomplete Dictionaries

Sanjeev Arora (Princeton), Rong Ge (Microsoft Research), Tengyu Ma (Princeton), Ankur Moitra (MIT)

ICERM Workshop, May 7
[Slide 2]

Dictionary Learning

• Simple "dictionary elements" build complicated objects
• Given the objects, can we learn the dictionary?
[Slide 3]

Why dictionary learning? [Olshausen Field '96]

natural image patches + dictionary learning yield Gabor-like filters
[Slide 4]

Example: Image Completion [Mairal, Elad & Sapiro '08]
[Slide 5]

Outline

• Dictionary Learning problem
• Getting a crude estimate
• Refining the solution
[Slide 6]

Dictionary Learning Problem

• Given samples of the form Y = AX
• X is a sparse matrix
• Goal: learn A (the dictionary)
• Interesting case: m > n (overcomplete)

[Diagram: Y = AX, where the columns of A (n × m) are the dictionary elements, each column of X is a sparse combination, and the columns of Y are the samples]
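The generative model above can be sketched in code; the dimensions, sparsity level, and coefficient distribution below are illustrative assumptions, not the talk's exact setup:

```python
import numpy as np

# Synthetic dictionary-learning instance Y = A X with an overcomplete
# dictionary (m > n) and randomly k-sparse coefficients.
rng = np.random.default_rng(0)
n, m, k, p = 64, 128, 5, 1000   # signal dim, dict size, sparsity, #samples

# Random unit-norm columns are incoherent with high probability:
# pairwise inner products concentrate around O(1/sqrt(n)).
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

# Each column of X has k nonzero entries with random signs.
X = np.zeros((m, p))
for j in range(p):
    support = rng.choice(m, size=k, replace=False)
    X[support, j] = rng.choice([-1.0, 1.0], size=k)

Y = A @ X                        # the observed samples
print(Y.shape)                   # (64, 1000)
```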
[Slide 7]

Previous Approach

Alternating Minimization: alternate between the dictionary A and the sparse code X.

• Fix A, solve for X: LASSO, Basis Pursuit, Matching Pursuit
• Fix X, solve for A: Least Squares, K-SVD
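A minimal alternating-minimization loop, assuming a crude hard-thresholding decoder in place of LASSO or matching pursuit (the function name and parameters here are illustrative):

```python
import numpy as np

# Alternating minimization sketch: (1) sparse-code each sample by keeping
# the k largest correlations with the current dictionary, then refitting
# by least squares on that support; (2) refit the dictionary by least
# squares and renormalize its columns.
def alternating_minimization(Y, A0, k, iters=10):
    A = A0.copy()
    m = A.shape[1]
    for _ in range(iters):
        # Sparse-coding step, one sample at a time.
        X = np.zeros((m, Y.shape[1]))
        for j in range(Y.shape[1]):
            corr = A.T @ Y[:, j]
            S = np.argsort(-np.abs(corr))[:k]
            X[S, j], *_ = np.linalg.lstsq(A[:, S], Y[:, j], rcond=None)
        # Dictionary step: least-squares update, then renormalize columns.
        A = Y @ np.linalg.pinv(X)
        A /= np.linalg.norm(A, axis=0, keepdims=True) + 1e-12
    return A, X
```

With a good initialization this refines the dictionary; from a random start it can get stuck, as the next slides discuss.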
[Slide 8]

Problem with Alternating Minimization

• Is the objective function monotone?
• Local minimum issues

[Diagram: alternating between Dictionary A and Sparse Code X]
[Slide 9]

Empirical Behavior

• Synthetic experiment, "qualitative" plot
• K-SVD converges with prob. 1/3, using random samples as the initial dictionary

[Plot: log accuracy vs. iterations; converges with prob. 1/3, gets stuck with prob. 2/3, and is slow at the beginning. A "This Talk" label marks the behavior addressed here.]
[Slide 10]

Provable Algorithms

• Run in poly time, use poly samples, learn the ground truth
• Separate modeling and optimization error
• Design new algorithms / tweak old algorithms
• Work only on "reasonable" instances
[Slide 11]

When is the solution "reasonable"?

• Consider image completion
• The representation should be unique and robust!

[Figure: a dictionary and an image]
[Slide 12]

Sparse Recovery

• Given A, y, find x
• Incoherence [Donoho Huo '99]: dictionary elements have pairwise inner products at most μ/√n
• Solution is unique and robust
• Long line of work [Logan; Donoho, Stark; Elad; …]
• Handles sparsity up to √n
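Sparse recovery itself is a cited subroutine here, not this talk's contribution; as one standard instantiation, a greedy orthogonal-matching-pursuit-style decoder might look like:

```python
import numpy as np

# Greedy decoder: given A and y = A x with x k-sparse, repeatedly pick
# the column most correlated with the current residual, then refit by
# least squares on the selected support.
def omp(A, y, k):
    S, r = [], y.copy()
    for _ in range(k):
        S.append(int(np.argmax(np.abs(A.T @ r))))
        coef, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        r = y - A[:, S] @ coef
    x = np.zeros(A.shape[1])
    x[S] = coef
    return x
```

Under incoherence and sparsity up to roughly √n, such decoders provably recover the unique sparse x.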
[Slide 13]

Our Results

• Thm: If the dictionary A is incoherent and X is randomly k-sparse from a "nice distribution", we can learn A with accuracy ε whenever the sparsity satisfies k ≤ min{√n/(μ log m), m^0.4}
• Handles sparsity up to √n
• Sample complexity O*(m/ε²)
• Independently, [Agarwal et al.] obtain a similar result with slightly different assumptions and weaker sparsity; later, [Barak et al.] obtained a stronger result using SOS
[Slide 14]

Our Results

• Thm: Given an estimated dictionary that is ε-close to the true dictionary, one iteration of K-SVD outputs an ε/2-close dictionary
• Works whenever ε < 1/log m (previous analyses required 1/poly)
• Sample complexity O(m log m)
• Combined: we can learn an incoherent dictionary with O*(m) samples in poly time
[Slide 15]

Outline

• Dictionary Learning problem
• Getting a crude estimate
• Refining the solution
[Slide 16]

Ideas

• Find the support of X, without knowing A
• Given the support of X, find an approximate A
[Slide 17]

Finding the Support

• Tool: test whether the supports of two columns of X intersect
• Disjoint supports ≈ small inner product; intersecting supports ≈ large inner product
[Slide 18]

Finding the Support: Overlapping Clustering

• Connect pairs of samples with large inner product
• Vertex = sample
• Cluster = rows of X!
[Slide 19]

Overlapping Clustering

• Main problem: the clusters overlap
• Idea: count the number of common neighbors
• A pair of points that shares a unique cluster ⇒ recovers that cluster
• Many common neighbors ⇒ same cluster; few common neighbors ⇒ different clusters
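The graph-building and common-neighbor steps can be sketched as follows; the threshold `tau` and the recovery rule are simplifying assumptions of this sketch:

```python
import numpy as np

# Connect samples whose inner product is large, then recover the cluster
# shared by a seed pair of samples as their common neighbors.
def connection_graph(Y, tau):
    G = np.abs(Y.T @ Y) > tau          # adjacency by inner-product threshold
    np.fill_diagonal(G, False)
    return G

def cluster_of_pair(G, i, j):
    # Samples sharing a dictionary element with both i and j: the common
    # neighbors (plus i, j themselves) approximate one row-support of X.
    common = np.flatnonzero(G[i] & G[j])
    return sorted(set(common) | {i, j})
```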
[Slide 20]

Estimate Dictionary Elements

• Focus on one row of X / column of A
• Can use SVD to find the maximum-variance direction
• Or take the samples with the same sign and average them
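The SVD route can be sketched as below, a minimal version assuming the cluster's samples have already been collected into a matrix:

```python
import numpy as np

# Estimate one dictionary element from the samples in its cluster via the
# top left-singular vector (the maximum-variance direction).
def estimate_element(Y_cluster):
    U, s, Vt = np.linalg.svd(Y_cluster, full_matrices=False)
    return U[:, 0]
```

The sign of the estimate is inherently ambiguous, which is fine since dictionary elements are only identifiable up to sign.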
[Slide 21]

Outline

• Dictionary Learning problem
• Getting a crude estimate
• Alternating Minimization works!
[Slide 22]

K-SVD [Aharon, Elad, Bruckstein '06]

• Given: a good guess Â
• Goal: find an even better dictionary
• Update one dictionary element A_i:
  • Take all samples containing that element
  • Decode: y ≈ Â x̂
  • Residual: r = y − Σ_{j≠i} Â_j x̂_j = ±A_i + Σ_{j≠i} (A_j x_j − Â_j x̂_j), where the final sum is the noise term
  • Use the top singular vector of the residuals
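One K-SVD-style element update might be sketched as below; the decoder here is a simplified stand-in for the accurate decoding the analysis guarantees, and the names and parameters are illustrative:

```python
import numpy as np

# For element i: decode each sample that uses it, subtract the other
# elements' contributions, and take the top singular vector of the
# residuals as the new estimate of A_i.
def ksvd_update(Y_i, A_hat, i, k):
    m = A_hat.shape[1]
    R = np.zeros_like(Y_i)
    for t in range(Y_i.shape[1]):
        y = Y_i[:, t]
        # Crude decode: keep the k largest correlations, forcing i in.
        S = list(np.argsort(-np.abs(A_hat.T @ y))[:k])
        if i not in S:
            S[-1] = i
        coef, *_ = np.linalg.lstsq(A_hat[:, S], y, rcond=None)
        x = np.zeros(m); x[S] = coef
        # Residual: remove every decoded element except i.
        x_others = x.copy(); x_others[i] = 0.0
        R[:, t] = y - A_hat @ x_others
    U, _, _ = np.linalg.svd(R, full_matrices=False)
    return U[:, 0]
```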
[Slide 23]

K-SVD Illustrated

• Blue: true dictionary; dashed: estimated dictionary
• Take all samples with the same element; compute the residuals
• Hope: in the residuals the noise is small and random, and the top singular vector is robust to random noise
[Slide 24]

K-SVD: Intuition

• When the error (Â_i − A_i) is random:
  • Still incoherent
  • Can "decode"
  • Noise looks random
• When the error is adversarial:
  • May not be incoherent
  • Noise can be correlated
• Bad case: the errors are highly correlated, all pointing in the same direction
[Slide 25]

Make the Noise "Random"

• Observation: we can detect the bad case, since correlated errors show up as a large singular value!
• To handle the bad case, we:
  • Perturb the estimated dictionary
  • Keep the perturbation small
  • Ensure the result has low spectral norm
• min ‖B‖  s.t.  ‖B_i − Â_i‖ ≤ ε for all i
• The program is convex, and OPT ≤ ‖A‖
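The talk states the convex program but not a solver; as a rough numerical sketch, one can alternate projections between the spectral-norm ball and the per-column constraints. This heuristic is an assumption of this sketch, not the authors' method:

```python
import numpy as np

# Project each column of B back into the eps-ball around A_hat's column.
def project_columns(B, A_hat, eps):
    D = B - A_hat
    norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    scale = np.minimum(1.0, eps / norms)
    return A_hat + D * scale

# Project onto the spectral-norm ball by clipping singular values at t.
def clip_spectral(B, t):
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.minimum(s, t)) @ Vt

# Alternating projections: for convex sets with nonempty intersection,
# this converges to a point satisfying both constraints.
def low_spectral_perturbation(A_hat, eps, t, iters=100):
    B = A_hat.copy()
    for _ in range(iters):
        B = project_columns(clip_spectral(B, t), A_hat, eps)
    return B
```

Since OPT ≤ ‖A‖, any target t slightly above ‖A‖ makes the intersection nonempty.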
[Slide 26]

Low Spectral Norm is Enough

• Key Lemma: when B has small spectral norm and |⟨B_i, B_j⟩| ≤ 1/log m, a random set of k columns of B is "almost orthogonal" ⇒ decoding is accurate for a random sample
• Proof sketch: consider BᵀB
  • The diagonal entries are large
  • The off-diagonal entries are small in expectation
  • Concentration ⇒ a random submatrix is diagonally dominant
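A quick numerical sanity check of the lemma's flavor; the sizes and the random incoherent B below are arbitrary choices for illustration:

```python
import numpy as np

# For a random incoherent B, check that a random k-column submatrix of
# B^T B is diagonally dominant (diagonals ~1, off-diagonals ~1/sqrt(n)).
rng = np.random.default_rng(6)
n, m, k = 400, 800, 15
B = rng.standard_normal((n, m))
B /= np.linalg.norm(B, axis=0)

S = rng.choice(m, size=k, replace=False)
G = B[:, S].T @ B[:, S]            # k x k Gram matrix of the submatrix
off = np.abs(G - np.diag(np.diag(G))).sum(axis=1)
print((off < np.diag(G)).all())    # diagonally dominant?
```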
[Slide 27]

Conclusion & Open Problems

• K-SVD works provably with a good initialization
• Does the proof give any insight in practice? (e.g., whitening)
• Is "the error behaves randomly" useful in other settings?
• Handle larger sparsity?
• Work with an RIP assumption?
• Lower bounds?

Thank you!
[Slide 28]

Thank you! Questions?
[Slide 29]

K-SVD [Aharon, Elad, Bruckstein '06]

• Given: a good guess
• Goal: find an even better dictionary
• Update one dictionary element:
  • Take all samples with the element
  • Remove the other elements
  • Use the top singular vector of the residuals
• Hope: in the residuals the error is small and random, and SVD is robust to random noise
[Slide 30]

Other Applications

• Image Denoising [Mairal et al. '09]
• Digital Zooming [Couzinie-Devy '10]
[Slide 31]

Applications

• Image Completion [Mairal, Elad & Sapiro '08]
• Image Denoising [Mairal et al. '09]
• Digital Zooming [Couzinie-Devy '10]
[Slide 32]

Refining the Solution

• Use the other columns to reduce the variance!
• Get ε accuracy with poly(m, n)·log(1/ε) samples