Introduction to Core Science Models
Yahoo! Labs
2011/19/11
Agenda
Basic Counting Models: EMP
Feature Based Models: OLR
RLFM: Feature Model + Collaborative Filtering
Bonus: Tutorial on Collaborative Filtering
Note:
› Will focus on the science framework
› Will not focus on the optimization problem
EMP + OLR:
Basic Counting Models: EMP
› Simple CTR model based on counting clicks/views
Feature Based Models: OLR
RLFM: Feature Model + Collaborative Filtering
Bonus: Tutorial on Collaborative Filtering
Today Module on Yahoo FP:
Counting Models: CTR
Estimate CTR for each article independently
CTR = Click-Thru-Rate = Total Clicks / Total Views
Online Model: Update every 5 mins:
CTR_t = (C_t + C_{t-1} + ... + C_1) / (V_t + V_{t-1} + ... + V_1) = Σ_{s ≤ t} C_s / Σ_{s ≤ t} V_s

C_t = clicks during period t
V_t = views during period t
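The pooled-count estimate can be written out directly; this is a minimal illustrative helper, not code from the talk:

```python
def ctr(clicks, views):
    """Estimate CTR from per-period click and view counts.

    clicks[s] and views[s] are the totals for period s; the estimate
    pools all periods seen so far: sum(C_s) / sum(V_s).
    """
    total_views = sum(views)
    if total_views == 0:
        return 0.0  # no traffic yet: no estimate
    return sum(clicks) / total_views

# Example: three 5-minute periods
print(ctr([4, 6, 5], [100, 200, 200]))  # 15 clicks / 500 views = 0.03
```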
CTR Curves for Two Days
Traffic obtained from a controlled randomized experiment. Things to note: (a) short lifetimes, (b) temporal effects, (c) often breaking news stories.
Each curve is the CTR of an item in the Today Module over time
Counting Models: Most Popular
EMP: Estimated Most Popular (aka GMP): › Decay = forget about old clicks and views (γ in the range 0.95-0.99)
Segmented Most Popular:› Separate model for each segment of the population
CTR_EMP = (C_t + γ·C_{t-1} + γ²·C_{t-2} + ...) / (V_t + γ·V_{t-1} + γ²·V_{t-2} + ...)

CTR_EMP^Male = (C_t^Male + γ·C_{t-1}^Male + γ²·C_{t-2}^Male + ...) / (V_t^Male + γ·V_{t-1}^Male + γ²·V_{t-2}^Male + ...)
Tracking behavior of the Estimated Most Popular model:
Low click-rate articles get more temporal smoothing
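The decayed-count idea can be sketched as a small class (the class name and γ value are illustrative, not from the talk):

```python
class EMP:
    """Estimated Most Popular: CTR from exponentially decayed counts.

    gamma close to 1 (e.g. 0.95-0.99) forgets old clicks and views slowly.
    """
    def __init__(self, gamma=0.97):
        self.gamma = gamma
        self.clicks = 0.0
        self.views = 0.0

    def update(self, period_clicks, period_views):
        # Decay the running totals, then add the newest period.
        self.clicks = self.gamma * self.clicks + period_clicks
        self.views = self.gamma * self.views + period_views

    def ctr(self):
        return self.clicks / self.views if self.views > 0 else 0.0

emp = EMP(gamma=0.95)
for c, v in [(5, 100), (8, 100), (2, 100)]:
    emp.update(c, v)
print(round(emp.ctr(), 4))
```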
OLR: Online Logistic Regression
Basic Counting Models: EMP
Feature Based Models: OLR
› Motivation for using regression
› Logistic Regression framework
› Online Logistic Regression: general case
› Per item-OLR Use Case : Today Module
› Improving Model
RLFM: Feature Model + Collaborative Filtering
Affinity Models: Log Odds
Bonus: Tutorial on Collaborative Filtering
Motivation for using Regression:
Logistic Regression:
› Natural framework to include more features: age, gender, location, user interests, …
› X_{k,u} = value of feature k for user u, e.g. the user's age
› W_k = weight parameter to be learned for each feature
• EMP breaks down if the segment is too small, e.g. 40-year-old males in New York:

CTR_EMP^{Male,40,NY} = (C_t^{Male,40,NY} + γ·C_{t-1}^{Male,40,NY} + ...) / (V_t^{Male,40,NY} + γ·V_{t-1}^{Male,40,NY} + ...)

• Logistic regression models the log odds instead:

Log(P_click / (1 - P_click)) = b + Σ_{k∈features} W_k·X_{k,u}
Linear Regression: One Dimension

SSE = Σ_{i∈examples} (Y_i - a·X_i - b)²

• Find the values of "a" and "b" that minimize the Sum of Squared Errors (SSE)
• Take the derivatives of SSE with respect to "a" and "b" and set them to 0

[Plot: linear fit Y = a·X + b with X = height, Y = weight; vertical bars show the errors]
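Setting those two derivatives to zero gives the standard closed-form solution; a small sketch (not code from the talk, data made up):

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of Y = a*X + b.

    Solving dSSE/da = 0 and dSSE/db = 0 gives the usual normal equations:
    a = cov(X, Y) / var(X),  b = mean(Y) - a * mean(X).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Heights -> weights lying exactly on Y = 2*X - 20
a, b = fit_line([60, 70, 80, 90], [100, 120, 140, 160])
print(a, b)  # 2.0 -20.0
```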
Can't Apply Linear Model to Click Prediction

For example: probability of click for an article on retirement as a function of age.

[Plot: data points vs. a linear model; the linear fit doesn't represent the data well. Y-axis: probability of click (0.0-1.0), X-axis: age (0-100)]
Logistic Model for Click Prediction

Probability of click for an article on retirement as a function of age:

P(Click) = 1 / (1 + Exp(-(a·Age - b)))

[Plot: data points vs. the logistic model; the logistic model fits much better. Y-axis: probability of click (0.0-1.0), X-axis: age (0-100)]
Logistic Regression: One Dimension

• How to find parameters "a" and "b" from many training examples (Y_i, Age_i):

P(Y_i) = 1 / (1 + exp(-Y_i·(a·Age_i - b)))

P(Y_i = +1) = probability the user clicked on the article
P(Y_i = -1) = probability the user didn't click

• Maximize the product of probabilities (Likelihood):

Likelihood = P(Y_1)·P(Y_2)·P(Y_3)· ... ·P(Y_n)

• "Hard" to solve
Optimize Logistic Likelihood for 4 Data Points:

Likelihood(1..4) = P(Y_1)·P(Y_2)·P(Y_3)·P(Y_4)

P(Y_i) = 1 / (1 + exp(-Y_i·(a·Age_i - b)))

[Plot: the likelihood and the individual Prob(Y_i) curves; X-axis: parameter "a"]

For simplicity: assume the value of "b" is known.
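With "b" fixed, the likelihood surface over "a" can be explored by brute force; a small sketch with made-up data points (clicks for older users, no-clicks for younger ones):

```python
import math

def likelihood(a, b, data):
    """Product of P(Y_i) over the examples, with P as in the slides:
    P(Y_i) = 1 / (1 + exp(-Y_i * (a*Age_i - b))).  b is held fixed."""
    p = 1.0
    for y, age in data:
        p *= 1.0 / (1.0 + math.exp(-y * (a * age - b)))
    return p

# 4 illustrative examples (Y_i, Age_i)
data = [(+1, 70), (+1, 60), (-1, 30), (-1, 20)]
b = 2.0  # assume b is known

# Grid search over "a" stands in for a proper optimizer
best_a = max((a / 1000 for a in range(-300, 301)),
             key=lambda a: likelihood(a, b, data))
print(best_a)
```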
Optimize Logistic Likelihood for 40 Data Points:

Likelihood(1..40) = P(Y_1)·P(Y_2)· ... ·P(Y_40)

[Plot: rescaled likelihoods for 4 vs. 40 data points; more data gives a narrower peak. X-axis: parameter "a"]

For simplicity: assume the value of "b" is known.
Gaussian Approximation to Likelihood:

Exp(-(a - m_40)² / 2σ_40²) ≈ Likelihood_40(a)

• Replace the likelihood with a simple Gaussian with two hyperparameters:
  • Mean m_40: the average value for "a"
  • Standard deviation σ_40: the error around the mean

[Plot: Gaussian_Max vs. Likelihood_40; X-axis: parameter "a"]
The Gaussian approximation allows an update for one data point at a time:

Exp(-(a - m_40)² / 2σ_40²) ≈ P(Y_40)·{P(Y_39)·P(Y_38)· ... ·P(Y_1)}
                           ≈ P(Y_40)·Exp(-(a - m_39)² / 2σ_39²)

Posterior ≈ Likelihood * Prior

• Note: for simplicity all normalizations are ignored.
OLR: Online Logistic Regression: One Parameter

• Solve the Bayesian update for each new event (Y, Age):

P(Y) = 1 / (1 + exp(-Y·(a·Age - b)))

Exp(-(a - m_t)² / 2σ_t²) ≈ P(Y)·Exp(-(a - m_{t-1})² / 2σ_{t-1}²)

Posterior ≈ Likelihood * Prior

• Yrank approximate solution: Scott Roy talk: http://twiki.corp.yahoo.com/pub/Personalization/YRank/YRankLearning.ppt
• Yrank update formulas:

m_t = m_{t-1} + ...
1/σ_t² = 1/σ_{t-1}² + ...
OLR: Online Logistic Regression: General Case

• Replace the one parameter "a" by a set of parameters {w_f}
• Replace the one feature "Age" by a set of features {X_f}
• Solve the Bayesian update for each new event (Y, {X_f}):

P(Y) = 1 / (1 + exp(-Y·Σ_f w_f·X_f))

Exp(-Σ_f (w_f - m_{f,t})² / 2σ_{f,t}²) ≈ P(Y)·Exp(-Σ_f (w_f - m_{f,t-1})² / 2σ_{f,t-1}²)

Posterior ≈ Likelihood * Prior

• Yrank update formulas:

m_{f,t} = m_{f,t-1} + ...
1/σ_{f,t}² = 1/σ_{f,t-1}² + ...
OLR: General Case: Features

• Multi-dimension logistic regression model:

P(Y) = 1 / (1 + exp(-Y·Σ_{f∈features} w_f·X_f))

Σ_f w_f·X_f = w_1·1                                                              <= Baseline
  + w_2·X_{u=Male} + w_3·X_{u=Age40s} + w_4·X_{u=SanJose} + w_5·X_{u=likeSports}  <= User Features
  + w_6·X_{i=about_Sports} + w_7·X_{i=about_NBA}                                  <= Article Features
  + w_8·X_{u=likeSports & i=about_Sports}                                         <= User*Article Features

• More on features:
http://twiki.corp.yahoo.com/view/SRelevance/NewsRecommendationFeatures
http://twiki.corp.yahoo.com/view/SRelevance/COREUserProfilesSparsePolarity
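The general-case Bayesian update can be sketched in code. The transcript does not reproduce the exact Yrank formulas, so this sketch (hypothetical function `olr_update`) uses one common approximation for a diagonal-Gaussian posterior: find the posterior mode by gradient ascent, then tighten each variance with the local likelihood curvature:

```python
import math

def olr_update(m, s2, x, y, n_steps=50, lr=0.1):
    """One update of a diagonal-Gaussian posterior N(m_f, s2_f) over
    weights, given one event (y in {+1,-1}, feature values x).

    Laplace-style approximation, NOT the exact Yrank formulas:
    maximize log P(y|w) - sum_f (w_f - m_f)^2 / (2 s2_f) by gradient
    ascent, then grow the precision by the curvature p(1-p)*x_f^2.
    """
    w = list(m)
    for _ in range(n_steps):
        z = sum(wf * xf for wf, xf in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-y * z))
        g = y * (1.0 - p)  # d log P(y|w) / dz
        w = [wf + lr * (g * xf - (wf - mf) / s2f)
             for wf, xf, mf, s2f in zip(w, x, m, s2)]
    z = sum(wf * xf for wf, xf in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    s2_new = [1.0 / (1.0 / s2f + p * (1.0 - p) * xf * xf)
              for s2f, xf in zip(s2, x)]
    return w, s2_new

# One click event on two active features: means move up, variances shrink
m, s2 = olr_update([0.0, 0.0], [1.0, 1.0], x=[1.0, 1.0], y=+1)
print(m, s2)
```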
OLR: Online Logistic Regression
Basic Counting Models: EMP
Feature Based Models: OLR
› Motivation for using regression
› Logistic Regression framework
› Online Logistic Regression: General Case
› Per item-OLR Use Case : Today Module
› Improving Model
RLFM: Feature Model + Collaborative Filtering
Affinity Models: Log Odds
Bonus: Tutorial on Collaborative Filtering
Per item-OLR Use Case: Yahoo FP Today Module

• Front Page module:
  • Articles don't live very long ( < 1 day )
  • Many clicks/views for each article
• Each article is treated independently: a new OLR model for each new article
• Predict the CTR for each user & article pair (u, i):

P(Y_ui = 1) = 1 / (1 + exp(-Σ_{f∈user_features} w_{i,f}·X_{u,f}))

Σ_f w_{i,f}·X_{u,f} = w_1·1                                          <= Baseline
  + w_2·X_{u=Male} + w_3·X_{u=Age20s} + w_4·X_{u=NewYork}
  + w_5·X_{u=likeSports} + w_6·X_{u=likeNFL} + w_7·X_{u=likeMusic}   <= User Features
Per item-OLR Use Case: Yahoo FP Today Module

P(Y_ui = 1) = 1 / (1 + exp(-Σ_{f∈user_features} w_{i,f}·X_{u,f}))

[Plot: logistic curve of P(Y_ui = 1) vs. Σ_f w_f·X_f, from -6 to 4 on the x-axis]
Per item-OLR Use Case: Yahoo FP Today Module

• Each article has its own OLR model and its own set of weights {w_{i,f}}:

P(Y_ui = 1) = 1 / (1 + exp(-Σ_{f∈user_features} w_{i,f}·X_{u,f}))

• Each article has its own prior:

Prior ~ Exp(-Σ_{f∈features} (w_{i,f} - m_{i,f})² / 2σ_{i,f}²)

• For each event (Y_ui, {X_{u,f}}) update the hyperparameters for that article with the Yrank update formulas:

m_{i,f,t} = m_{i,f,t-1} + ...
1/σ_{i,f,t}² = 1/σ_{i,f,t-1}² + ...
Per item-OLR Use Case: Yahoo FP Today Module

• How to use the OLR model:
• Choose a candidate pool: roughly 50-100 articles picked by editors
• Explore: in a small bucket, try all 50-100 articles randomly
  • Modeling: for each event (click/view) apply Yrank for that article
• Exploit: for the remainder (the larger bucket)
  • Scoring: predict each article's CTR, and order by decreasing CTR:

CTR = P(Y_ui = 1) = 1 / (1 + exp(-Σ_{f∈user_features} m_{i,f}·X_{u,f}))
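The scoring-and-ranking step might look like this (article names, weight values and feature encoding are invented for illustration):

```python
import math

def score(m_i, x_u):
    """Predicted CTR for article i and user u from the per-item
    OLR posterior means m_i and the user's feature vector x_u."""
    z = sum(m * x for m, x in zip(m_i, x_u))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-item weight means (baseline, male, likeSports)
articles = {"art_a": [-2.0, 0.1, 0.8],
            "art_b": [-2.0, 0.0, -0.5]}
user = [1.0, 1.0, 1.0]  # baseline feature = 1; male; likes sports

# Exploit: serve articles in decreasing order of predicted CTR
ranked = sorted(articles, key=lambda i: score(articles[i], user), reverse=True)
print(ranked)
```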
Improving Online Learning:

• Correlated OLR: include interactions between hyperparameters: improvement

Prior ~ Exp(-(1/2)·Σ_{f1,f2} (w_{f1} - m_{f1})·A_{f1,f2}·(w_{f2} - m_{f2}))

• Mini-batch: update multiple data points at once: no gain in CTR

Likelihood(mini-batch) = P(Y_1)· ... ·P(Y_n)

• TechPulse 2011: Taesup Moon, Pradheep Elango, Su-Lin Wu
http://twiki.corp.yahoo.com/pub/YResearch/CokeLabDiary/techpulse.pdf
Improving Explore/Exploit: UCB

• UCB: improve the explore/exploit strategy: improvement
• Old strategy: ε-greedy
  • Explore: update OLR only from events in a small random bucket
  • Exploit: order articles in decreasing value of predicted CTR
• New strategy: UCB (aka Upper Confidence Bound)
  • Single bucket
  • Explore: update OLR with all events
  • Exploit: order articles in decreasing value of the "optimistic" CTR_UCB
• TechPulse 2011: Taesup Moon, Pradheep Elango, Su-Lin Wu
http://twiki.corp.yahoo.com/pub/YResearch/CokeLabDiary/techpulse.pdf
Improving Explore/Exploit: UCB

• Upper Confidence Bound strategy: improvement
• Exploit: order articles in decreasing value of the "optimistic" CTR_UCB
• ONE-DIMENSION EXAMPLE:
• Replace the normal CTR:

CTR = 1 / (1 + exp(-m·X))

• with the optimistic CTR:

CTR_UCB = 1 / (1 + exp(-(m·X + z·√(σ²·X²))))

z = tunable parameter

[Plot: CTR and CTR_UCB curves vs. m·X, from -6 to 4 on the x-axis]
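The one-dimension example can be checked numerically; a sketch with illustrative values for m, σ and the tunable parameter z:

```python
import math

def ctr(m, x):
    """Plain CTR estimate from the posterior mean m."""
    return 1.0 / (1.0 + math.exp(-m * x))

def ctr_ucb(m, sigma, x, z=1.0):
    """Optimistic CTR: add z standard deviations of the score m*x,
    i.e. z * sqrt(sigma^2 * x^2), inside the sigmoid."""
    return 1.0 / (1.0 + math.exp(-(m * x + z * math.sqrt(sigma**2 * x**2))))

x = 1.0
print(ctr(0.5, x))           # plain estimate
print(ctr_ucb(0.5, 1.0, x))  # optimistic estimate is larger
```

Uncertain articles (large σ) get a bigger optimism bonus, so they are explored more often; as σ shrinks with data, CTR_UCB falls back toward the plain CTR.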
RLFM: Regression based Latent Factor Model
Basic Counting Models: EMP
Feature Based Models: OLR
RLFM: Feature Model + Collaborative Filtering
› RLFM components
› Using RLFM: Offline & Online update
Bonus: Tutorial on Collaborative Filtering
RLFM: Regression based Latent Factor Model

• RLFM basic idea:
  • Build a single logistic regression model for all users "u" and articles "i"
  • Add collaborative filtering using matrix factorization
• Modeling:
  • Most of it is done offline in big batch mode (millions of events)
  • One part of the model is also updated online (one event at a time, using the Yrank update)
⇒ Latent factor models are work in progress:
• Original Y! Labs paper: Deepak Agarwal, Bee-Chung Chen
http://twiki.corp.yahoo.com/pub/YResearch/CokeLabDiary/featfact.pdf
• Implementation for Coke:
http://twiki.corp.yahoo.com/view/YResearch/RLFMForCoke
RLFM: Regression based Latent Factor Model
RLFM components:
1) Build a logistic regression model for all users "u" and articles "i"
2) Add user bias and article bias
3) Collaborative Filtering using Matrix Factorization
4) Predict factors for new user/article: Cold Start
5) Add Logistic Regression + Bias + Matrix Factorization
1) Build logistic regression for all users/articles:

• Build a single logistic regression model for all users {u} and articles {i}:

P(Y_ui = 1) = 1 / (1 + exp(-Σ_{f∈all_features} w_f·X_{ui,f}))

Σ_f w_f·X_{ui,f} = w_1·1                                                          <= Baseline
  + w_2·X_{u=Male} + w_3·X_{u=Age40s} + w_4·X_{u=SanJose} + w_5·X_{u=likeSports}  <= User Features
  + w_6·X_{i=about_Sports} + w_7·X_{i=about_NBA}                                  <= Article Features
  + w_8·X_{u=likeSports & i=about_Sports}                                         <= User*Article Features

• A single set of parameters {w_f} for all users and articles
• Learned offline in batch mode
2) Add per-user and per-article baseline:

• Add bias parameters:
  • Some articles are more/less popular than others
  • Some users read more/fewer stories than others

P(Y_ui = 1) = 1 / (1 + exp(-(α_u + β_i + Σ_{f∈all_features} w_f·X_{ui,f})))

• The baseline is no longer the same for every user/article:
  • Old baseline: w_1
  • New baseline: w_1 + α_u + β_i
• More parameters to optimize: {w_f}, {α_u}, {β_i}
• Better with some priors (to be described later)
3) Matrix Factorization Motivation

• How to deal with:
  • An article about disaster preparedness:
    • Hurricanes: needs users from the coastline: Texas => Northeast
    • Earthquakes: needs users from the West coast
• Would need a feature X_user_WestCoast * X_about_earthquakes, but I don't have that …
• But if I have many views/clicks over many such articles I can discover that pattern !!!
3) Matrix Factorization Motivation

• I can discover patterns within clicks. SIMPLE EXAMPLE:

[Click matrix: rows = users (SanJose, Oakland, NewYork, CDC, ...), columns = articles (Earthquake ..., Politics ...)]

• Clicks mostly explained by U_1*V_1 + U_2*V_2, e.g.:

U_1 = (0, 0, 1, 1)ᵀ with V_1 = (0 0 0 0 1 1 1 1)
U_2 = (1, 0, 0, 0)ᵀ with V_2 = (1 1 1 1 0 0 0 0)
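The rank-2 structure can be verified numerically; a sketch assuming the block structure of the example (the user/article ordering is illustrative):

```python
def outer(u, v):
    """Outer product of column vector u and row vector v."""
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    """Elementwise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical block-structured click matrix: two user groups, two topics
u1, v1 = [0, 0, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1]  # group 1 clicks topic B
u2, v2 = [1, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0]  # group 2 clicks topic A
clicks = add(outer(u1, v1), outer(u2, v2))
for row in clicks:
    print(row)
```

Two factor pairs reproduce the whole 4x8 matrix with 4 + 8 numbers per factor instead of 32 entries.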
3) Matrix Factorization Motivation

• Most clicks explained by:

P(Click_ui) = P(Y_ui = 1) = 1 / (1 + exp(-(U_{u,1}·V_{i,1} + U_{u,2}·V_{i,2})))

• The general case:

P(Y_ui = 1) = 1 / (1 + exp(-Σ_{k∈factors} U_{uk}·V_{ik}))

• Note: number of factors ~ 50-200 << N_users & N_articles
3) Matrix Factorization Model

• Matrix factorization model, aka collaborative filtering:

P(Y_ui = 1) = 1 / (1 + exp(-Σ_{k∈factors} U_{uk}·V_{ik}))

• Obtain the U's and V's by maximizing the following likelihood:

Likelihood = Π_{ui∈examples} 1 / (1 + exp(-Y_ui·Σ_{k∈factors} U_{uk}·V_{ik}))

• Π => product over all past events (clicks/views)
• Y_ui = +1 for clicks and -1 for views
• Better with some priors ...
3) Matrix Factorization Model

• Get the U's and V's: maximize Likelihood * Prior
• Better with some priors:

prior for each U_uk ~ Exp(-(U_uk - m_a)² / 2σ_a²)
prior for each V_ik ~ Exp(-(V_ik - m_b)² / 2σ_b²)

Choose: m_a = 0 and m_b = 0
σ_a is the same for all U_uk
σ_b is the same for all V_ik

• Note: the above priors are uncorrelated; the original RLFM paper used correlated priors.
4) Matrix Factorization Model: Cold Start Problem

• Matrix factorization model:

P(Y_ui) = 1 / (1 + exp(-Σ_{k∈factors} U_{uk}·V_{ik}))

• Cold start problem: for a new user U = 0, and for a new article V = 0
• Solution: choose a different prior:

for each U_uk: Exp(-(U_uk - Σ_{a∈user_features} G_{k,a}·X_{u,a})² / 2σ_a²)
for each V_ik: Exp(-(V_ik - Σ_{b∈item_features} D_{k,b}·X_{i,b})² / 2σ_b²)

• The parameters G's & D's are obtained by maximizing Likelihood * Prior
5) RLFM: Regression based Latent Factor Model

• Putting it back together: Bias + Regression + Matrix Factorization:

P(Y_ui = 1) = 1 / (1 + exp(-(α_u + β_i + Σ_{f∈features} w_f·X_{ui,f} + Σ_{k∈factors} U_{uk}·V_{ik})))

• Priors:

for each α_u: Exp(-(α_u - Σ_{a∈user_features} g_a·X_{u,a})² / 2σ_α²)
for each β_i: Exp(-(β_i - Σ_{b∈item_features} d_b·X_{i,b})² / 2σ_β²)
for each U_uk: Exp(-(U_uk - Σ_{a∈user_features} G_{k,a}·X_{u,a})² / 2σ_a²)
for each V_ik: Exp(-(V_ik - Σ_{b∈item_features} D_{k,b}·X_{i,b})² / 2σ_b²)
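The combined prediction formula can be sketched as one function (all parameter values below are invented for illustration):

```python
import math

def rlfm_predict(alpha_u, beta_i, w, x_ui, U_u, V_i):
    """P(Y_ui = 1) for the full RLFM model:
    user bias + article bias + feature regression + matrix factorization."""
    z = (alpha_u + beta_i
         + sum(wf * xf for wf, xf in zip(w, x_ui))
         + sum(uk * vk for uk, vk in zip(U_u, V_i)))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters: one regression feature, two latent factors
p = rlfm_predict(alpha_u=0.2, beta_i=-1.0, w=[0.5], x_ui=[1.0],
                 U_u=[0.3, -0.1], V_i=[0.4, 0.2])
print(round(p, 4))
```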
RLFM: Regression based Latent Factor Model
Basic Counting Models: EMP
Feature Based Models: OLR
RLFM: Feature Model + Collaborative Filtering› RLFM components
› Using RLFM: Offline & Online update
Bonus: Tutorial on Collaborative Filtering
Using RLFM: Offline Modeling:
• Offline Modeling:
• Batch mode: Maximize: Likelihood * Prior
• Millions to Billions of examples processed at once
• Input: {Y’s, X’s} all events and features
• Output:
  parameters: {w_f}, {g_a}, {d_b}, {G_{k,a}}, {D_{k,b}}
  factors: {α_u}, {β_i}, {U_uk}, {V_ik}
Using RLFM: Online Modeling and Scoring:

• Online scoring + some modeling
• For a new user or new article: compute the factors from g, d, G, D
  • e.g. new user bias: α_u = Σ_{a∈user_features} g_a·X_{u,a}
• For an old user or old article: get the factors from the offline batch mode
• For each event (click/view) on article "i":
  • Update V_ik using the per-item OLR approach
  • Predict the score using the updated V_ik:

P(Y_ui = 1) = 1 / (1 + exp(-(α_u + β_i + Σ_{f∈features} w_f·X_{ui,f} + Σ_{k∈factors} U_{uk}·V_{ik})))
RLFM: Offline Results on Coke Data: Today Module

• RLFM results on an offline experiment
• Y! Front Page Today Module
• CTR relative lift for RLFM vs. Feature-Only as a function of clicks/user
http://twiki.corp.yahoo.com/view/YResearch/RLFMReplayExperiments
Q & A

Contributors:
Pradheep Elango, Su-Lin Wu, Taesup Moon, Pranam Kolari
Deepak Agarwal, Bee-Chung Chen, Scott Roy
Jean-Marc Langlois
• Coke Science Papers:
http://twiki.corp.yahoo.com/view/YResearch/CokeLabDiary
Tutorial on Collaborative Filtering

Based on the following chapter, by two of the Netflix Prize winners:
http://research.yahoo.com/files/korenBellChapterSpringer.pdf
Collaborative Filtering: Introduction
Goal: predict the rating r_ui for a movie "i" that a user "u" hasn't seen yet
› Prediction based on the matrix of user/movie ratings:
● r_ui = 1 through 5 stars
› Prediction equations for integer ratings are simpler than for binary clicks
› The rating matrix is a large, very sparse matrix:
● 10M-100M users and 10k-100k movies, but with ~99% blank entries
Based on: http://research.yahoo.com/files/korenBellChapterSpringer.pdf
› This talk: focus on the most relevant models & ignore some improvements:
● Baseline adjustment: user bias, movie bias and overall average rating
● Time-aware model, binary features (rated, rented)
This talk:
› Adjusted ratings:

r_ui^adjusted <= r_ui^raw - Baseline(ui)
Collaborative Filtering: the models
Correlated Neighborhood Model› Predict new rating based on ratings of similar movies
Global Neighborhood Model› Enlarge Neighborhood to be “global”
› Introduce adjustable weight parameters
Factorized Neighborhood Model› Apply matrix factorization to weight parameters
SVD Model› Apply matrix factorization to rating matrix itself
Collaborative Filtering: Correlated Neighborhood Model

• Define a movie-movie similarity measure S_ij based on correlation:

S_ij ∝ Σ_{u∈Union(i,j)} r_ui·r_uj / Normalization

• Define the correlated neighborhood: the set of ~20 movies with the largest S_ij that are rated by "u"
• Define the weight: normalized S_ij
• Predict the unknown r_ui based on the known ratings r_uj of similar movies
• "You will like movie i because you liked movies j"

[Diagram: target item "ui" surrounded by rated neighbors uj1 ... uj6, linked by similarities S_ij]
Collaborative Filtering: Correlated Neighborhood Model

• Movies: i=1 Star Trek, i=2 Star Wars, i=3 Action movie, i=4 Horror movie

S_ij ∝ Σ_{u∈Union(i,j)} r_ui·r_uj / Normalization

Ratings r_ui: [4×4 matrix of ±1 entries, rows = users, columns = movies]

Movie-movie similarity S_ij (movies × movies):

1.0 1.0 0.5 0.0
1.0 1.0 0.5 0.0
0.5 0.5 1.0 0.5
0.0 0.0 0.5 1.0
Collaborative Filtering: Correlated Neighborhood Model

• Similarity measure:

S_ij ∝ Σ_{u∈Union(i,j)} r_ui·r_uj / Normalization

• Correlated neighborhood: the set of ~20 movies with the largest S_ij that are rated by "u"
• Weight: normalized S_ij
• Scoring:

r̃_ui = Σ_{j∈correlated_neighbors} r_uj·S_ij / Σ_j S_ij

• Predict the unknown r_ui from the known ratings r_uj of similar movies
• "You will like movie i because you liked movies j"
• Simple, intuitive model with the ability to explain why we recommend a new movie
• Modeling:
  • Need to precompute and store S_ij: 10k × 10k = 100M entries
  • Weights are fixed to the normalized value of S_ij
  • The optimal neighborhood is small
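The scoring rule can be sketched as follows (function and dictionary names are hypothetical, and the tiny example is made up):

```python
def predict_rating(ratings_u, sims_i, k=20):
    """Correlated-neighborhood prediction for one user and target movie i.

    ratings_u: {movie_j: adjusted rating r_uj} for movies the user rated
    sims_i:    {movie_j: similarity S_ij to the target movie}
    Uses the ~k most similar rated movies; weights = normalized S_ij.
    """
    rated = [(sims_i[j], r) for j, r in ratings_u.items() if j in sims_i]
    neighbors = sorted(rated, reverse=True)[:k]
    denom = sum(s for s, _ in neighbors)
    if denom == 0:
        return 0.0  # no usable neighbors
    return sum(s * r for s, r in neighbors) / denom

# User liked the two Star movies, disliked horror; predict an action movie
pred = predict_rating({"StarTrek": 1.0, "StarWars": 1.0, "Horror": -1.0},
                      {"StarTrek": 0.5, "StarWars": 0.5, "Horror": 0.5})
print(pred)
```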
Collaborative Filtering: Global Neighborhood Model

• Extend the neighborhood to all known ratings for user "u": R(u) = {r_uj known}
• Let the weights w_ij be free parameters
• Scoring:

r̃_ui = |R(u)|^(-1/2)·Σ_{j∈R(u)} r_uj·w_ij

• Modeling: pick the w_ij to minimize the regularized sum of squared errors:

SSE = Σ_{past_ratings} ( r_ui - |R(u)|^(-1/2)·Σ_{j∈R(u)} r_uj·w_ij )² + λ·Σ_ij w_ij²

λ = regularization parameter

• Better predictive power than the previous model
• Not easy to explain recommendations
• Expensive modeling, scoring and storage of w_ij: size = 100M
• Could try to limit it based on S_ij, but there is a better approach
Reduce Number of Free Parameters: Matrix Factorization

• Want to reduce the number of free parameters in w_ij: current size 10k × 10k = 100M
• Matrix factorization goal: reduce the number of free parameters to ~1M
• Toy example #1: the weight matrix is uniform:

Weight = (matrix of all 1's) = U·Vᵀ

• Replace the 10k × 10k matrix with the outer product of two vectors, each 10k long: U(10k), V(10k)
• U & V are called factors
Reduce Number of Free Parameters: Matrix Factorization

• Toy example #2: the weight matrix is almost uniform:

         1.0 0.8 1.0 0.8
Weight = 0.8 1.0 0.8 1.0  = d_1·U_1·V_1 + d_2·U_2·V_2
         1.0 0.8 1.0 0.8
         0.8 1.0 0.8 1.0

with d_1 = 0.9, U_1 = (1, 1, 1, 1)ᵀ, V_1 = (1 1 1 1)
and  d_2 = 0.1, U_2 = (+1, -1, +1, -1)ᵀ, V_2 = (+1 -1 +1 -1)

Weights: w_ij = Σ_{k∈{1,2}} d_k·U_ik·V_jk
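Toy example #2 can be verified numerically; a minimal sketch:

```python
def outer(u, v, scale):
    """scale * (outer product of column u and row v)."""
    return [[scale * ui * vj for vj in v] for ui in u]

# Almost-uniform weight matrix as d1*U1*V1 + d2*U2*V2
U1, V1 = [1, 1, 1, 1], [1, 1, 1, 1]
U2, V2 = [1, -1, 1, -1], [1, -1, 1, -1]
W = [[a + b for a, b in zip(r1, r2)]
     for r1, r2 in zip(outer(U1, V1, 0.9), outer(U2, V2, 0.1))]
for row in W:
    print([round(x, 1) for x in row])
```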
Reduce Number of Free Parameters: Matrix Factorization

• Toy example #3: an arbitrary weight matrix:

[4×4 matrix decomposed into four outer products: Weight = d_1·U_1·V_1 + d_2·U_2·V_2 + d_3·U_3·V_3 + d_4·U_4·V_4]

• Notice that:
  • An arbitrary N×N matrix can be decomposed using N sets of factors
  • The amplitudes are decreasing: d_1 = 2.18 >> d_4 = 0.04
• Can approximate the weight matrix with a small set of factors
Note on convention for Matrix Factorization:

• The last equation is the definition of the SVD (Singular Value Decomposition):

w_ij = Σ_k U_ik·d_k·V_jk

• where the factors U's, V's are chosen to be normalized and independent from each other:

Σ_i U_ik·U_ik' = 1 if k = k'
Σ_i U_ik·U_ik' = 0 if k ≠ k'

• In this talk and in Koren & Bell's chapter the d_k's are incorporated inside the U_k, V_k (just a convention difference):

w_ij = Σ_k U_ik·V_jk

• where the factors are now normalized as: Σ_i U_ik·U_ik' = d_k if k = k'
Collaborative Filtering: Factorized Neighborhood Model

• Recall the Global Neighborhood Model, where the w_ij are free parameters:

r̃_ui = |R(u)|^(-1/2)·Σ_{j∈R(u)} r_uj·w_ij

• Apply matrix factorization to w_ij:

w_ij = Σ_{k∈factors} U_ik·V_jk   => choose N_k (number of factors) << N (number of movies): ~200 << 10k-100k

• Scoring: Factorized Neighborhood Model:

r̃_ui = Σ_{k∈factors} U_ik·( |R(u)|^(-1/2)·Σ_{j∈R(u)} r_uj·V_jk )

• Modeling: free parameters U_ik and V_jk:

SSE = Σ_{past_ratings} ( r_ui - Σ_{k∈factors} U_ik·|R(u)|^(-1/2)·Σ_{j∈R(u)} r_uj·V_jk )² + λ_1·Σ U_ik² + λ_2·Σ V_jk²

• Cheaper computation with the same predictive power
Collaborative Filtering: SVD Model

• SVD: historical name for matrix factorization applied to the rating matrix:

r_ui = Σ_{k∈factors} U_uk·V_ik   => choose N_k (number of factors) << N (number of movies): ~200 << 10k-100k

• Scoring:

r̃_ui = Σ_{k∈factors} U_uk·V_ik

• Modeling: free parameters U_uk and V_ik:

SSE = Σ_{past_ratings} ( r_ui - Σ_{k∈factors} U_uk·V_ik )² + λ_1·Σ U_uk² + λ_2·Σ V_ik²

• Same predictive power
• Not easy to explain recommendations
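The SVD-model objective can be minimized with plain stochastic gradient descent; a minimal sketch with illustrative hyperparameters (not the modeling code used for Coke):

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=1000, lr=0.05, lam=0.02):
    """SGD fit of r_ui ~ sum_k U[u][k]*V[i][k] with L2 regularization.

    ratings: list of (user_index, item_index, adjusted_rating).
    """
    random.seed(0)
    U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on the squared error plus L2 penalty
                U[u][f] += lr * (err * vf - lam * uf)
                V[i][f] += lr * (err * uf - lam * vf)
    return U, V

# Tiny adjusted-ratings example: 2 users x 2 movies, entry (1,1) unobserved
ratings = [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 1.0)]
U, V = factorize(ratings, n_users=2, n_items=2)
pred_11 = sum(U[1][f] * V[1][f] for f in range(2))
print(round(pred_11, 2))  # the regularized fit tends to pull this toward ~1
```

Note how the factorization fills in the missing (user 1, movie 1) entry from the observed pattern; this is exactly the cold-start-free part of the collaborative filtering story.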
The End