Informatics and Mathematical Modelling / Intelligent Signal Processing
MLSP 2007, Morten Mørup

Multiplicative updates for the LASSO
Morten Mørup and Line Harder Clemmensen
Informatics and Mathematical Modelling, Technical University of Denmark
Overview
Multiplicative updates (MU)
Non-negative matrix factorization (NMF)
Convergence of MU
Semi-NMF
MU for the LASSO
Results obtained analyzing one small and two large-scale bioinformatics datasets
Multiplicative updates
Step size parameter
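The update equation that accompanied this slide was a figure and did not survive extraction. A reconstruction of the standard multiplicative-update form, with the step-size parameter written as the exponent η (symbol assumed, not taken from the slide), is:

```latex
\theta_i \leftarrow \theta_i
\left( \frac{[\nabla_{\theta} E]^{-}_{i}}{[\nabla_{\theta} E]^{+}_{i}} \right)^{\eta},
\qquad
\nabla_{\theta} E = [\nabla_{\theta} E]^{+} - [\nabla_{\theta} E]^{-},
\quad [\nabla_{\theta} E]^{\pm} \ge 0
```

At a fixed point the positive and negative parts of the gradient balance, so the gradient vanishes; the ratio form keeps θ non-negative throughout.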
Non-negative matrix factorization (NMF)
(Lee & Seung, 2001)
NMF gives a parts-based representation (Lee & Seung, Nature 1999)
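The factorization and its update rules appeared as figures on the slide. A minimal numpy sketch of the Lee & Seung multiplicative updates for the Euclidean cost (the function name nmf_mu and the random initialization are illustrative, not from the slide):

```python
import numpy as np

def nmf_mu(V, r, n_iter=200, eps=1e-9):
    """Lee & Seung (2001) multiplicative updates for NMF: V ~ W H,
    minimizing ||V - W H||_F^2 subject to W >= 0, H >= 0."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, r))
    H = rng.uniform(0.1, 1.0, size=(r, m))
    for _ in range(n_iter):
        # Each update multiplies by (negative gradient part)/(positive part),
        # which preserves non-negativity and monotonically decreases the cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because the factors are updated by element-wise ratios, zeros stay zero, which is one source of the sparse, parts-based representations noted above.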
Proof of convergence for η = 1 by auxiliary functions
(Lee & Seung, 2001)
Multiplicative updates can also be used for Semi-NMF
(Ding et al. 2006)
(A) MU
(B) MUqp
(Sha et al. 2003)
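The semi-NMF updates themselves were figures on this slide. For reference, the updates of Ding et al. (2006) for X ≈ FGᵀ with F unconstrained and G ≥ 0 take the form below (symbols follow that paper, not the slide):

```latex
F = X G \,(G^{\top} G)^{-1},
\qquad
G_{ik} \leftarrow G_{ik}\,
\sqrt{\frac{\big[(X^{\top} F)^{+}\big]_{ik} + \big[G\,(F^{\top} F)^{-}\big]_{ik}}
           {\big[(X^{\top} F)^{-}\big]_{ik} + \big[G\,(F^{\top} F)^{+}\big]_{ik}}}
```

where M⁺ = (|M| + M)/2 and M⁻ = (|M| - M)/2 split a matrix into its positive and negative parts; the square root is the MUqp-style damping of Sha et al.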
The least absolute shrinkage and selection operator (LASSO)
(Tibshirani, 1996). Also known as basis pursuit denoising (BPD) (Chen et al., 1999).
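The objective on this slide was a figure. Written here in assumed notation (y for the response, A for the design/dictionary matrix, λ for the regularization weight), the LASSO solves:

```latex
\min_{\mathbf{x}} \; \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_{2}^{2} + \lambda \|\mathbf{x}\|_{1}
```

The ℓ₁ penalty both shrinks coefficients and sets many of them exactly to zero, which is what makes the estimator a selection operator.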
The LASSO problem is in general highly overcomplete:
Y (I × J) = A (I × M) X (M × J), with M ≫ I (many more dictionary columns than observations)
LASSO is based on a sparse-coding principle (the principle of parsimony): the simplest solution is also the best solution.
LASSO by quadratic programming
(Other approaches: LARS (Efron et al., 2004), homotopy method (Drori et al., 2006), Dantzig selector (Friedlander and Saunders))
This recast problem can naturally be solved by multiplicative updates
Multiplicative updates for the LASSO
(A) MU
(B) MUqp
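The update equations for variants (A) and (B) were figures on the slide. A minimal numpy sketch of the plain MU variant (A), assuming the standard positive/negative gradient split after the non-negative recast x = x⁺ - x⁻ (the function name lasso_mu and the initialization are illustrative):

```python
import numpy as np

def lasso_mu(A, y, lam, n_iter=1000, eps=1e-12):
    """Multiplicative-update sketch for min_x ||y - A x||^2 + lam * ||x||_1.

    Writes x = u - v with u, v >= 0 (the QP recast of the LASSO) and
    multiplies each factor by the ratio of the negative to the positive
    part of its gradient, which keeps u and v non-negative throughout.
    """
    G, c = A.T @ A, A.T @ y
    Gp, Gn = np.maximum(G, 0.0), np.maximum(-G, 0.0)   # G = Gp - Gn
    cp, cn = np.maximum(c, 0.0), np.maximum(-c, 0.0)   # c = cp - cn
    u = np.full(A.shape[1], 1e-2)
    v = np.full(A.shape[1], 1e-2)
    for _ in range(n_iter):
        # grad_u = 2 G (u - v) - 2 c + lam, split into +/- parts:
        u = u * (2 * (Gn @ u + Gp @ v + cp)) / (2 * (Gp @ u + Gn @ v + cn) + lam + eps)
        # grad_v = -2 G (u - v) + 2 c + lam, split the same way:
        v = v * (2 * (Gn @ v + Gp @ u + cn)) / (2 * (Gp @ v + Gn @ u + cp) + lam + eps)
    return u - v
```

The λ term enters only the denominators, so a larger regularization weight uniformly shrinks both factors toward zero, as the ℓ₁ penalty should.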
Proof of convergence of updates
(A)
(B) Follows directly from the proof given in (Sha et al., 2003); bounds derived in (Ding et al., 2006)
Small-scale data set (M < J)
Prostate cancer: the study examines the correlation between the level of prostate-specific antigen and 8 clinical measures (M = 8). The clinical measures were taken on 97 men (J = 97) who were about to receive a radical prostatectomy.
Data taken from (Stamey et al., 1989); also used as an example in (Hastie et al., 2001).
QP: MATLAB's standard QP solver
BP: BPD algorithm from www.sparselab.stanford.edu
Large-scale data sets (M ≫ J)
Microarray data taken from (Pochet et al., 2004).
Dataset 1 (Alon et al., 1999): colon cancer, 2000 genes (40 tumor, 22 normal; train: 27/13, test: 13/9).
Dataset 2 (Hedenfalk et al., 2001): breast cancer, 3226 genes; BRCA1 mutation, BRCA2 mutation, and sporadic cases of breast cancer. We considered separating BRCA1 mutations from the tissues with BRCA2 mutations or sporadic mutations (7 tumor, 15 normal; train: 4/10, test: 3/5).
Conclusion
Multiplicative updates form simple algorithms for solving the LASSO and, via the split x = x⁺ - x⁻, also generalize to unconstrained optimization.
The updates are guaranteed to converge, in the sense of monotonically decreasing the objective function.
The MU algorithms devised for the LASSO are more stable than traditional QP solvers such as MATLAB's standard QP solver, but not as fast as state-of-the-art algorithms such as the solver given for BPD at www.sparselab.stanford.edu.
The MU algorithms can easily be extended to the elastic net and the fused lasso, and form a general optimization framework.
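As a sketch of why these extensions are straightforward (objectives written in assumed notation, not taken from the slides): both penalties add separable terms whose gradients split cleanly into positive and negative parts, so the same multiplicative machinery applies.

```latex
\min_{\mathbf{x}} \; \|\mathbf{y}-\mathbf{A}\mathbf{x}\|_{2}^{2}
  + \lambda_{1}\|\mathbf{x}\|_{1} + \lambda_{2}\|\mathbf{x}\|_{2}^{2}
\quad \text{(elastic net)}
```

```latex
\min_{\mathbf{x}} \; \|\mathbf{y}-\mathbf{A}\mathbf{x}\|_{2}^{2}
  + \lambda_{1}\|\mathbf{x}\|_{1} + \lambda_{2}\sum_{j} |x_{j+1}-x_{j}|
\quad \text{(fused lasso)}
```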
References
A. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Natl. Acad. Sci. USA, 1999.
S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1999.
C. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," LBNL Tech Report 60428, 2006.
I. Drori and D. L. Donoho, "Solution of l1 minimization problems by LARS/homotopy methods," in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
M. P. Friedlander and M. A. Saunders, "Discussion of 'The Dantzig selector' by Candès and Tao," submitted to Annals of Statistics.
V. Guigue, A. Rakotomamonjy, and S. Canu, "Kernel basis pursuit," European Conference on Machine Learning, Porto, 2005.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.
I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gutsterson, M. Esteller, M. Raffeld, Z. Yakhini, A. Ben-Dor, E. Dougherty, J. Kononen, L. Bubendorf, W. Fehrle, S. Pittaluga, S. Gruvberger, N. Loman, O. Johannsson, H. Olsson, B. Wilfond, G. Sauter, O.-P. Kallioniemi, A. Borg, and J. Trent, "Gene-expression profiles in hereditary breast cancer," The New England Journal of Medicine, vol. 344, pp. 539–548, 2001.
D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–91, 1999.
D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems, 2000, pp. 556–562.
M. Mørup and L. H. Clemmensen, "Mulasso," http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5235/zip/imm5235.zip.
N. Pochet, F. De Smet, A. K. Suykens, and B. De Moor, "Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction," Bioinformatics, vol. 20, no. 17, pp. 3185–95, 2004.
M. R. Osborne, B. Presnell, and B. A. Turlach, "A new approach to variable selection in least squares problems," IMA Journal of Numerical Analysis, vol. 20, no. 3, pp. 389–403, 2000.
S. C. Shaobing and D. Donoho, "Basis pursuit," 28th Asilomar Conf. Signals, Systems and Computers, 1994.
F. Sha, L. K. Saul, and D. D. Lee, "Multiplicative updates for nonnegative quadratic programming in support vector machines," in Advances in Neural Information Processing Systems 15, 2002.
T. Stamey, J. Kabalin, J. McNeal, I. Johnstone, H. Freiha, E. Redwine, and N. Yang, "Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II: radical prostatectomy treated patients," Journal of Urology, vol. 16, pp. 1076–1083, 1989.
R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
R. Tibshirani and M. A. Saunders, "Sparsity and smoothness via the fused lasso," J. R. Statist. Soc. B, vol. 67, no. 1, pp. 91–108, 2005.
H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," J. R. Statist. Soc. B, vol. 67, no. 2, pp. 301–320, 2005.