Informatics and Mathematical Modelling / Intelligent Signal Processing MLSP 2007 1 Morten Mørup...

Informatics and Mathematical Modelling / Intelligent Signal Processing MLSP 2007 1 Morten Mørup Multiplicative updates for the LASSO Morten Mørup and Line Harder Clemmensen Informatics and Mathematical Modeling Technical University of Denmark


Informatics and Mathematical Modelling / Intelligent Signal Processing

MLSP 2007 1 Morten Mørup

Multiplicative updates for the LASSO

Morten Mørup and Line Harder Clemmensen

Informatics and Mathematical Modeling

Technical University of Denmark


Overview

Multiplicative updates (MU)

Non-negative matrix factorization (NMF)

Convergence of MU

Semi-NMF

MU for the LASSO

Results obtained analyzing one small-scale and two large-scale bioinformatics datasets


Multiplicative updates

Step size parameter
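For reference, a common way to write a multiplicative update is sketched below. The symbols are assumptions and may differ from the slide's own notation: C is the cost, θ the non-negative parameters, the gradient is split into two non-negative parts, and α is the step size parameter.

```latex
\frac{\partial C}{\partial \theta_i} = \nabla_i^{+} - \nabla_i^{-},
\qquad \nabla_i^{+} \ge 0,\; \nabla_i^{-} \ge 0,
\qquad
\theta_i \leftarrow \theta_i \left( \frac{\nabla_i^{-}}{\nabla_i^{+}} \right)^{\alpha}
```

A non-negative θ stays non-negative under this rule, and a stationary point of C (where the two gradient parts are equal) is a fixed point of the update.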


Non-negative matrix factorization (NMF)

(Lee & Seung - 2001)

NMF gives a parts-based representation (Lee & Seung, Nature 1999)
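A minimal sketch of the Lee & Seung multiplicative updates for the least-squares NMF cost, assuming NumPy. The function name `nmf_mu`, the factor names `W` and `H`, and the small constant `eps` guarding against division by zero are illustrative choices, not taken from the slides.

```python
import numpy as np

def nmf_mu(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix V by W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start: a zero entry would stay zero forever.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is multiplied by (negative gradient part) / (positive gradient part).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H) ** 2)
    return W, H, errors
```

The returned `errors` list holds the squared reconstruction error after each sweep, which makes it easy to check that the cost does not increase.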


Proof of convergence for step size parameter equal to 1 by auxiliary functions

(Lee & Seung 2001)
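The auxiliary-function argument can be summarized as follows. The symbols are assumed here: C is the cost, θ the parameters, and G an auxiliary function that upper-bounds C and touches it at the current estimate.

```latex
G(\theta, \theta') \ge C(\theta), \qquad G(\theta, \theta) = C(\theta)

\theta^{t+1} = \arg\min_{\theta} G(\theta, \theta^{t})
\;\Longrightarrow\;
C(\theta^{t+1}) \le G(\theta^{t+1}, \theta^{t}) \le G(\theta^{t}, \theta^{t}) = C(\theta^{t})
```

The cost is therefore non-increasing from one iteration to the next, which is the sense of convergence used in this argument.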


Multiplicative updates can also be used for Semi-NMF

(Ding et al. 2006)

(A) MU

(B) MUqp

(Sha et al. 2003)
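A hedged sketch of a semi-NMF multiplicative update in the style of Ding et al., assuming NumPy and the factorization X ≈ F Gᵀ with F unconstrained and G non-negative. The names `semi_nmf`, `pos`, `neg` and the constant `eps` are illustrative, and the exact form shown on the slide may differ.

```python
import numpy as np

def pos(M):
    """Elementwise positive part of M."""
    return (np.abs(M) + M) / 2.0

def neg(M):
    """Elementwise negative part of M, so that M = pos(M) - neg(M)."""
    return (np.abs(M) - M) / 2.0

def semi_nmf(X, rank, n_iter=100, eps=1e-12, seed=0):
    """Approximate X (p x n, mixed signs) by F @ G.T with G >= 0."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], rank)) + 0.1  # strictly positive start
    errors = []
    for _ in range(n_iter):
        # F is unconstrained, so it is solved exactly by least squares.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        B = X.T @ F
        A = F.T @ F
        # Multiplicative step: the square root of the ratio of the
        # negative-gradient terms to the positive-gradient terms.
        G *= np.sqrt((pos(B) + G @ neg(A)) / (neg(B) + G @ pos(A) + eps))
        errors.append(np.linalg.norm(X - F @ G.T) ** 2)
    return F, G, errors
```

Splitting `B` and `A` into positive and negative parts is what lets the update handle data with mixed signs while keeping `G` non-negative.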


The least absolute shrinkage and selection operator (LASSO)

(Tibshirani, 1996)

Also known as basis pursuit denoising, BPD (Chen et al. 1999)
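For reference, the LASSO objective in commonly used notation. The notation is assumed here and the slide's own symbols may differ: y is the response vector, A the design matrix, w the coefficient vector, and λ ≥ 0 the regularization weight.

```latex
\hat{w} = \arg\min_{w} \; \tfrac{1}{2}\,\lVert y - A w \rVert_2^2 + \lambda\,\lVert w \rVert_1
```

The L1 penalty drives some coefficients exactly to zero, which is the source of the selection behaviour.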


The LASSO problem is in general highly overcomplete

Y (I x J) = [I x M matrix] X (M x J)

LASSO is based on a sparse coding principle (the principle of parsimony): the simplest solution is also the best solution


LASSO by quadratic programming

(Other approaches: LARS, Homotopy method (Drori et al. 2006), Dantzig Selector (Friedlander and Saunders))
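A sketch of the standard recasting, in assumed notation (y the response, A the design matrix, w the coefficients, λ the regularization weight). Splitting w into two non-negative parts turns the L1 penalty into a linear term, leaving a quadratic program with only non-negativity constraints.

```latex
w = w^{+} - w^{-}, \qquad w^{+} \ge 0,\; w^{-} \ge 0,
\qquad v = \begin{bmatrix} w^{+} \\ w^{-} \end{bmatrix}

\min_{v \ge 0} \; \tfrac{1}{2}\, v^{\top} Q\, v + c^{\top} v + \tfrac{1}{2}\, y^{\top} y,
\qquad
Q = \begin{bmatrix} A^{\top}A & -A^{\top}A \\ -A^{\top}A & A^{\top}A \end{bmatrix},
\qquad
c = \lambda \mathbf{1} - \begin{bmatrix} A^{\top} y \\ -A^{\top} y \end{bmatrix}
```

For λ > 0, the two parts of a coefficient are never both positive at the minimum, so the linear term λ·1ᵀv equals λ‖w‖₁ there.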


This recast problem can naturally be solved by multiplicative updates


Multiplicative updates for the LASSO

(A) MU

(B) MUqp
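A minimal sketch of one way to solve the LASSO with multiplicative updates, assuming NumPy. It applies the non-negative quadratic programming update of Sha et al. to the coefficients split into non-negative positive and negative parts. This is an illustration under assumed notation (design matrix `A`, response `y`, weight `lam`), not the authors' released code, and the names `lasso_mu` and `lasso_objective` are hypothetical.

```python
import numpy as np

def lasso_mu(A, y, lam, n_iter=2000, eps=1e-12):
    """Minimise 0.5 * ||y - A w||^2 + lam * ||w||_1 via multiplicative updates."""
    m = A.shape[1]
    G = A.T @ A
    q = A.T @ y
    # Quadratic program in v = [w_plus; w_minus] >= 0:
    # 0.5 * v^T Q v + c^T v (plus a constant).
    Q = np.block([[G, -G], [-G, G]])
    c = np.concatenate([lam - q, lam + q])
    Q_pos = np.maximum(Q, 0.0)   # elementwise positive part of Q
    Q_neg = np.maximum(-Q, 0.0)  # elementwise negative part of Q
    v = np.ones(2 * m)           # strictly positive start
    for _ in range(n_iter):
        a = Q_pos @ v
        d = Q_neg @ v
        # Sha et al. style update: v stays non-negative by construction.
        v *= (-c + np.sqrt(c * c + 4.0 * a * d)) / (2.0 * a + eps)
    return v[:m] - v[m:]

def lasso_objective(A, y, w, lam):
    """LASSO cost for a coefficient vector w."""
    r = y - A @ w
    return 0.5 * r @ r + lam * np.abs(w).sum()
```

With an identity design matrix the LASSO solution is the soft-thresholded response, which gives a simple sanity check: `lasso_mu(np.eye(3), np.array([3.0, -2.0, 0.5]), lam=1.0)` should be close to `[2, -1, 0]`.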


Proof of convergence of updates

(A)

(B) Follows directly from the proof given in (Sha et al. 2003)

Bounds derived in (Ding et al. 2006)


Small scale data set (M < J)

Prostate cancer: The study examines the correlation between the level of prostate-specific antigen and 8 clinical measures (M = 8). The clinical measures were taken on 97 men (J = 97) who were about to receive a radical prostatectomy.

Data taken from (Stamey et al., 1989), also used as an example in (Hastie et al. 2001)

QP: Matlab standard QP solver

BP: BPD algorithm from www.sparselab.stanford.edu


Large scale data sets (M >> J)

Microarray data taken from (Pochet et al. 2004)

Dataset 1 (Alon et al. 1999): Colon cancer, 2000 genes (40 tumor, 22 normal; train: 27/13, test: 13/9)


Dataset 2 (Hedenfalk et al. 2001): Breast cancer, 3226 genes. The samples comprise BRCA1 mutation, BRCA2 mutation, and sporadic cases of breast cancer. We considered distinguishing tissues with BRCA1 mutations from tissues with BRCA2 mutations or sporadic cases (7 tumor, 15 normal; train: 4/10, test: 3/5)


Conclusion

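Multiplicative updates form simple algorithms for solving the LASSO and, as such, also generalize to unconstrained optimization, i.e. by writing an unconstrained variable as the difference of two non-negative parts, one positive and one negative.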

The updates are ensured to converge (in the sense of monotonically decreasing the objective function).

The MU-algorithms devised for the LASSO are more stable than traditional QP solvers such as MATLAB’s standard QP-solver, but not as fast as state-of-the-art algorithms such as the solver given for BPD at www.sparselab.stanford.edu.

The MU-algorithms can easily be extended to the elastic net and the fused lasso, and form a general optimization framework.


References

A. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proc. Natl. Acad. Sci. USA, 1999.

S.S. Chen, D.L. Donoho, and M.A. Saunders, “Atomic decomposition by basis pursuit,” SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1999.

C. Ding, T. Li, and M.I. Jordan, “Convex and semi-nonnegative matrix factorizations,” LBNL Tech Report 60428, 2006.

I. Drori and D.L. Donoho, “Solution of l1 minimization problems by LARS/homotopy methods,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.

M.P. Friedlander and M.A. Saunders, “Discussion of ‘The Dantzig selector’ by Candès and Tao,” submitted to Annals of Statistics.

V. Guigue, A. Rakotomamonjy, and S. Canu, “Kernel basis pursuit,” European Conference on Machine Learning, Porto, 2005.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.

I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gutsterson, M. Esteller, M. Raffeld, Z. Yakhini, A. Ben-Dor, E. Dougherty, J. Kononen, L. Bubendorf, W. Fehrle, S. Pittaluga, S. Gruvberger, N. Loman, O. Johannsson, H. Olsson, B. Wilfond, G. Sauter, O.-P. Kallioniemi, A. Borg, and J. Trent, “Gene-expression profiles in hereditary breast cancer,” The New England Journal of Medicine, vol. 344, pp. 539–548, 2001.

D.D. Lee and H.S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–91, 1999.

D.D. Lee and H.S. Seung, “Algorithms for non-negative matrix factorization,” in Advances in Neural Information Processing Systems, 2000, pp. 556–562.

M. Mørup and L.H. Clemmensen, “Mulasso,” http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5235/zip/imm5235.zip.

N. Pochet, F. De Smet, A. K. Suykens, and L. R. De Moor Bart, “Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction,” Bioinformatics, vol. 20, no. 17, pp. 3185–95, 2004.

M.R. Osborne, B. Presnell, and B.A. Turlach, “A new approach to variable selection in least squares problems,” IMA Journal of Numerical Analysis, vol. 20, no. 3, pp. 389–403, 2000.

S.C. Shaobing and D. Donoho, “Basis pursuit,” 28th Asilomar Conf. Signals, Systems, Computers, 1994.

F. Sha, L.K. Saul, and D.D. Lee, “Multiplicative updates for nonnegative quadratic programming in support vector machines,” in Advances in Neural Information Processing Systems 15, 2002.

T. Stamey, J. Kabalin, J. McNeal, I. Johnstone, H. Freiha, E. Redwine, and N. Yang, “Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II. Radical prostatectomy treated patients,” Journal of Urology, vol. 16, pp. 1076–1083, 1989.

R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.

R. Tibshirani and M.A. Saunders, “Sparsity and smoothness via the fused lasso,” J. R. Statist. Soc. B, vol. 67, no. 1, pp. 91–108, 2005.

H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” J. R. Statist. Soc. B, vol. 67, no. 2, pp. 301–320, 2005.