Principled Regularization for Probabilistic Matrix Factorization

Robert Bell, Suhrid Balakrishnan
AT&T Labs-Research

Duke Workshop on Sensing and Analysis of High-Dimensional Data
July 26-28, 2011
Probabilistic Matrix Factorization (PMF)

• Approximate a large n-by-m matrix R by
– M = P′Q
– P and Q each have k rows, k ≪ n, m
– m_ui = p_u′q_i
– R may be sparsely populated
• Prime tool in Netflix Prize
– 99% of ratings were missing
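As a concrete sketch of this setup (toy sizes and numpy, an illustration rather than anything from the talk), the rank-k factorization and a single predicted entry look like:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 6, 5, 2            # users, items, rank (toy sizes)
P = rng.normal(size=(k, n))  # P and Q each have k rows, as on the slide
Q = rng.normal(size=(k, m))

M = P.T @ Q                  # rank-k approximation of the n-by-m matrix R

# A single predicted entry is the inner product of the two factor vectors.
u, i = 3, 1
m_ui = P[:, u] @ Q[:, i]
assert np.isclose(m_ui, M[u, i])
assert np.linalg.matrix_rank(M) == k
```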
Regularization for PMF

• Needed to avoid overfitting
– Even after limiting rank of M
– Critical for sparse, imbalanced data
• Penalized least squares
– Minimize

$$\sum_{(u,i)\ \mathrm{observed}} \left(r_{ui} - p_u' q_i\right)^2 + \lambda \left( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \right)$$

– or

$$\sum_{(u,i)\ \mathrm{observed}} \left(r_{ui} - p_u' q_i\right)^2 + \lambda_P \sum_u \|p_u\|^2 + \lambda_Q \sum_i \|q_i\|^2$$

– λ's selected by cross validation
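A minimal numpy sketch of the two-λ objective above; the dict-of-observed-ratings representation and the function name are illustrative assumptions, not the authors' code. Setting lam_P equal to lam_Q recovers the single-λ version:

```python
import numpy as np

def pmf_objective(r_obs, P, Q, lam_P, lam_Q):
    """Penalized least squares for PMF.

    r_obs maps (u, i) -> observed rating; P and Q are k-by-n and
    k-by-m factor matrices.  Only observed cells enter the fit term.
    """
    sq_err = sum((r - P[:, u] @ Q[:, i]) ** 2 for (u, i), r in r_obs.items())
    penalty = lam_P * np.sum(P ** 2) + lam_Q * np.sum(Q ** 2)
    return sq_err + penalty

# Tiny example with three observed cells out of a 4-by-3 matrix.
rng = np.random.default_rng(1)
P = rng.normal(size=(2, 4))
Q = rng.normal(size=(2, 3))
r_obs = {(0, 0): 1.0, (2, 1): -0.5, (3, 2): 0.25}
loss = pmf_objective(r_obs, P, Q, lam_P=0.1, lam_Q=0.1)
```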
Research Questions

• Should we use separate λ_P and λ_Q?
Research Questions

• Should we use separate λ_P and λ_Q?
• Should we use k separate λ's for each dimension of P and Q?
Matrix Completion with Noise (Candes and Plan, Proc IEEE, 2010)

• Rank reduction without explicit factors
– No pre-specification of k, rank(M)
• Regularization applied directly to M
– Trace norm, aka nuclear norm
– Sum of the singular values of M
• Minimize $\|M\|_*$ subject to $\sum_{(u,i)\ \mathrm{observed}} (r_{ui} - m_{ui})^2 \le \delta^2$
• "Equivalent" to L2 regularization for P, Q
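The slides do not say how this program is solved; one standard device for trace-norm problems (an assumption here, not from the talk) is singular-value soft-thresholding, the proximal operator of the nuclear norm. It shrinks every singular value toward zero and drops the small ones, so rank falls out of the optimization instead of being pre-specified:

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * (trace norm).

    Each singular value of M is reduced by tau and clipped at zero,
    so small singular values vanish and the rank of the result drops.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
# Noisy low-rank matrix: rank-2 signal plus small Gaussian noise.
M = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(8, 5))
X = svt(M, tau=1.0)
```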
Research Questions

• Should we use separate λ_P and λ_Q?
• Should we use k separate λ's for each dimension of P and Q?
• Should we use the trace norm for regularization?
Bayesian Matrix Factorization (BPMF) (Salakhutdinov and Mnih, ICML 2008)

• Let r_ui ~ N(p_u′q_i, σ²)
• No PMF-type regularization
• p_u ~ N(μ_P, Λ_P⁻¹) and q_i ~ N(μ_Q, Λ_Q⁻¹)
• Priors for σ², μ_P, μ_Q, Λ_P, Λ_Q
• Fit by Gibbs sampling
• Substantial reduction in prediction error relative to PMF with L2 regularization
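One ingredient of that Gibbs sampler can be sketched: by conjugacy, the full conditional of a single user factor p_u given the item factors and hyperparameters is Gaussian. The function and toy numbers below are a hypothetical illustration of that update, not the authors' code:

```python
import numpy as np

def conditional_pu(ratings, Q_cols, mu_P, Lambda_P, sigma2):
    """Gaussian full conditional of one user factor p_u in BPMF.

    ratings: the r_ui for items this user rated; Q_cols: the matching
    q_i stacked as columns (k-by-#rated).  The conjugate update adds
    the scaled outer products of the q_i to the prior precision.
    """
    prec = Lambda_P + (Q_cols @ Q_cols.T) / sigma2
    cov = np.linalg.inv(prec)
    mean = cov @ (Lambda_P @ mu_P + Q_cols @ ratings / sigma2)
    return mean, cov

# One Gibbs draw for p_u with made-up data.
rng = np.random.default_rng(0)
k = 2
Q_cols = rng.normal(size=(k, 5))   # factors of the 5 items user u rated
ratings = rng.normal(size=5)
mean, cov = conditional_pu(ratings, Q_cols, np.zeros(k), np.eye(k), 1.0)
p_u = rng.multivariate_normal(mean, cov)
```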
Research Questions

• Should we use separate λ_P and λ_Q?
• Should we use k separate reg. parameters for each dimension of P and Q?
• Should we use the trace norm for regularization?
• Does BPMF "regularize" appropriately?
Matrix Factorization with Biases

• Let m_ui = μ + a_u + b_i + p_u′q_i
• Regularization similar to before
– Minimize

$$\sum_{(u,i)\ \mathrm{observed}} (r_{ui} - m_{ui})^2 + \lambda \left( \sum_u a_u^2 + \sum_i b_i^2 + \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \right)$$

– or

$$\sum_{(u,i)\ \mathrm{observed}} (r_{ui} - m_{ui})^2 + \lambda_a \sum_u a_u^2 + \lambda_b \sum_i b_i^2 + \lambda_P \sum_u \|p_u\|^2 + \lambda_Q \sum_i \|q_i\|^2$$
Research Questions

• Should we use separate λ_P and λ_Q?
• Should we use k separate reg. parameters for each dimension of P and Q?
• Should we use the trace norm for regularization?
• Does BPMF "regularize" appropriately?
• Should we use separate λ's for the biases?
Some Things this Talk Will Not Cover

• Various extensions of PMF
– Combining explicit and implicit feedback
– Time varying factors
– Non-negative matrix factorization
– L1 regularization
– λ's depending on user or item sample sizes
• Efficiency of optimization algorithms
– Use Newton's method, each coordinate separately
– Iterate to convergence
No Need for Separate λ_P and λ_Q

• M = (cP)′(c⁻¹Q) is invariant for c ≠ 0
• For initial P and Q
– Solve for c to minimize $\lambda_P \|cP\|^2 + \lambda_Q \|c^{-1}Q\|^2$
– $c = \left( \lambda_Q \|Q\|^2 / \lambda_P \|P\|^2 \right)^{1/4}$
– Gives $2\,(\lambda_P \lambda_Q)^{1/2}\, \|P\|\, \|Q\|$
• Sufficient to let λ_P = λ_Q = λ_PQ
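The rescaling argument is easy to verify numerically. A quick check with Frobenius norms and made-up, deliberately unequal λ values:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(2, 6))
Q = rng.normal(size=(2, 4))
lam_P, lam_Q = 3.0, 12.0   # deliberately unequal

# Optimal rescaling from the slide: c = (lam_Q ||Q||^2 / (lam_P ||P||^2))^(1/4)
c = (lam_Q * np.sum(Q**2) / (lam_P * np.sum(P**2))) ** 0.25

# M is unchanged by the rescaling ...
assert np.allclose((c * P).T @ (Q / c), P.T @ Q)

# ... and the minimized penalty equals 2 sqrt(lam_P lam_Q) ||P|| ||Q||.
penalty = lam_P * np.sum((c * P)**2) + lam_Q * np.sum((Q / c)**2)
target = 2 * np.sqrt(lam_P * lam_Q) * np.linalg.norm(P) * np.linalg.norm(Q)
assert np.isclose(penalty, target)
```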
Bayesian Motivation for L2 Regularization

• Simplest case: only one item
– R is n-by-1
– r_u1 = a_1 + ε_u1, with a_1 ~ N(0, σ_a²), ε_u1 ~ N(0, σ²)
• Posterior mean (or MAP) of a_1 minimizes

$$\sum_{u=1}^{n} (r_{u1} - a_1)^2 + \lambda_a a_1^2$$

– with λ_a = σ²/σ_a²
– giving $\hat{a}_1 = \big(n/(n+\lambda_a)\big)\,\bar{r}_1$
• Best λ_a is inversely proportional to σ_a²
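A small numeric check of this shrinkage result, with assumed values for σ² and σ_a²: the penalized minimizer at λ_a = σ²/σ_a² coincides with the posterior mean computed directly from the normal-normal model:

```python
import numpy as np

def map_a1(r, lam_a):
    """Closed-form minimizer of sum_u (r_u - a)^2 + lam_a * a^2."""
    n = len(r)
    return n * np.mean(r) / (n + lam_a)

rng = np.random.default_rng(0)
sigma2, sigma_a2 = 1.0, 0.25   # assumed noise and prior variances
n = 20
a1 = rng.normal(0, np.sqrt(sigma_a2))
r = a1 + rng.normal(0, np.sqrt(sigma2), size=n)

lam_a = sigma2 / sigma_a2      # the slide's optimal penalty
a_hat = map_a1(r, lam_a)

# Same answer as the Bayesian posterior mean computed directly.
post_mean = (r.sum() / sigma2) / (n / sigma2 + 1 / sigma_a2)
assert np.isclose(a_hat, post_mean)
```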
Implications for Regularization of PMF

• Allow λ_a ≠ λ_b
– If σ_a² ≠ σ_b²
• Allow λ_a ≠ λ_b ≠ λ_PQ
• Allow λ_PQ1 ≠ λ_PQ2 ≠ … ≠ λ_PQk?
– Trace norm does not
– BPMF appears to
Simulation Experiment Structure

• n = 2,500 users, m = 400 items
• 250,000 observed ratings
– 150,000 in Training (to estimate a, b, P, Q)
– 50,000 in Validation (to tune λ's)
– 50,000 in Test (to estimate MSE)
• Substantial imbalance in ratings
– 8 to 134 ratings per user in Training data
– 33 to 988 ratings per item in Training data
Simulation Model

• r_ui = a_u + b_i + p_u1 q_i1 + p_u2 q_i2 + ε_ui
• Elements of a, b, P, Q, and ε
– Independent normals with mean 0
– Var(a_u) = 0.09
– Var(b_i) = 0.16
– Var(p_u1 q_i1) = 0.04
– Var(p_u2 q_i2) = 0.01
– Var(ε_ui) = 1.00
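A numpy sketch of generating data to this spec. Only the product variances are given, so the factor scaling below (equal variances for p and q, using Var(pq) = Var(p)Var(q) for independent mean-zero factors) is one choice among many; uniform sampling of the (u, i) pairs is also an assumption, so the talk's deliberate imbalance is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2500, 400

# Component variances from the slide.  Var(p)Var(q) must be 0.04 and
# 0.01, so give each factor variance 0.2 and 0.1 respectively.
a = rng.normal(0, np.sqrt(0.09), size=n)
b = rng.normal(0, np.sqrt(0.16), size=m)
p1 = rng.normal(0, np.sqrt(0.2), size=n)
q1 = rng.normal(0, np.sqrt(0.2), size=m)
p2 = rng.normal(0, np.sqrt(0.1), size=n)
q2 = rng.normal(0, np.sqrt(0.1), size=m)

# 250,000 observed (u, i) pairs, sampled uniformly for simplicity.
u = rng.integers(0, n, size=250_000)
i = rng.integers(0, m, size=250_000)
m_ui = a[u] + b[i] + p1[u] * q1[i] + p2[u] * q2[i]   # E(r_ui)
r = m_ui + rng.normal(0, 1.0, size=250_000)          # add unit-variance noise
```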
Evaluation

• Test MSE for estimation of m_ui = E(r_ui)

$$\mathrm{MSE} = \frac{1}{|\mathrm{Test}|} \sum_{(u,i) \in \mathrm{Test}} (\hat{m}_{ui} - m_{ui})^2$$

• Limitations
– Not real data
– Only one replication
– No standard errors
PMF Results for k = 0

| Restrictions on λ's | Values of λ_a, λ_b | MSE for m | Δ MSE |
|---|---|---|---|
| Grand mean; no (a, b) | NA | .2979 | |
| λ_a = λ_b = 0 | 0 | .0712 | -.2267 |
| λ_a = λ_b | 9.32 | .0678 | -.0034 |
| Separate λ_a, λ_b | 9.26, 9.70 | .0678 | .0000 |
PMF Results for k = 1

| Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1 | MSE for m | Δ MSE |
|---|---|---|---|
| Separate λ_a, λ_b | 9.26, 9.70 | .0678 | |
| λ_a = λ_b = λ_PQ1 | 11.53 | .0439 | -.0239 |
| Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44 | .0439 | .0000 |
PMF Results for k = 2

| Restrictions on λ's | Values of λ_a, λ_b, λ_PQ1, λ_PQ2 | MSE for m | Δ MSE |
|---|---|---|---|
| Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44, NA | .0439 | |
| λ_a, λ_b, λ_PQ1 = λ_PQ2 | 8.44, 9.94, 19.84, 19.84 | .0441 | +.0002 |
| Separate λ_a, λ_b, λ_PQ1, λ_PQ2 | 8.43, 10.24, 13.38, 27.30 | .0428 | -.0013 |
Results for Matrix Completion

• Performs poorly on raw ratings
– MSE = .0693
– Not designed to estimate biases
• Fit to residuals from PMF with k = 0
– MSE = .0477
– "Recovered" rank was 1
– Worse than MSEs from PMF: .0428 to .0439
Results for BPMF

• Raw ratings
– MSE = .0498, using k = 3
– Early stopping
– Not designed to estimate biases
• Fit to residuals from PMF with k = 0
– MSE = .0433, using k = 2
– Near .0428, for best PMF with biases
Summary

• No need for separate λ_P and λ_Q
• Theory suggests using separate λ's for distinct sets of exchangeable parameters
– Biases vs. factors
– For individual factors
• Tentative simulation results support the need for separate λ's across factors
– BPMF does so automatically
– PMF requires a way to do efficient tuning