Leveraging User Libraries to Bootstrap Collaborative Filtering
Laurent Charlin, Columbia University
Richard Zemel, University of Toronto
Hugo Larochelle, Université de Sherbrooke
KDD'14, August 2014
Motivation
● Difficult to keep up with new information
– Researcher:
● Hundreds of papers are published each year at top conferences
● ArXiv.org posts several new papers in our field every day
– How can you efficiently find all interesting papers?
Solution: Recommendations
● Document recommendation
– Scientific articles
● Recommending papers to reviewers
● Recommending papers to conference attendees
– Books, music
● Novelty: Leverage the libraries of users
– Articles: researchers' previously published papers
– Books & music: purchased items
Desiderata
● Want a model which quickly gives good recommendations
● Model which performs well for all users
– Both new and frequent users
[Figure: number of ratings per user]
Preference Prediction
● Collaborative filtering:
– Intuition: Users with similar past preferences are likely to have similar future preferences.
– Uses only user preferences
● Shortcoming:
– Cannot deal with new users (cold-start regime)
[Salakhutdinov & Mnih'08]
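The collaborative-filtering idea and its cold-start shortcoming can be illustrated with a minimal matrix-factorization sketch. This is not the cited model's implementation: the function name `factorize`, the toy ratings, and all hyperparameters are assumptions chosen for illustration, and plain L2-regularized gradient descent stands in for any particular training procedure.

```python
import numpy as np

def factorize(ratings, mask, rank=2, lr=0.01, reg=0.1, epochs=2000, seed=0):
    """Fit ratings ~= U @ V.T using only the observed entries (mask == 1)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = 0.1 * rng.standard_normal((n_users, rank))  # user factors
    V = 0.1 * rng.standard_normal((n_items, rank))  # item factors
    for _ in range(epochs):
        err = mask * (ratings - U @ V.T)  # zero wherever a rating is unobserved
        grad_U = -err @ V + reg * U
        grad_V = -err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# Toy data (hypothetical): 4 users x 5 items; mask == 0 marks a missing rating.
ratings = np.array([[5, 4, 0, 1, 0],
                    [4, 5, 0, 1, 2],
                    [1, 0, 5, 4, 0],
                    [0, 0, 0, 0, 0]], dtype=float)
mask = np.array([[1, 1, 0, 1, 0],
                 [1, 1, 0, 1, 1],
                 [1, 0, 1, 1, 0],
                 [0, 0, 0, 0, 0]], dtype=float)  # last user has no ratings

U, V = factorize(ratings, mask)
pred = U @ V.T
print(np.round(pred, 2))
```

In this sketch the last user has no observed ratings, so only the regularizer acts on that row of `U` and the predictions for that user stay close to zero regardless of the item. That is the cold-start problem in miniature: preferences alone carry no signal for a new user.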
Preference Prediction with Side Information
● Side information:
– Any information about users and items, excluding preferences.
– E.g., user demographics, item content
– Advantages:
● Better predictions in cold-start regimes
● Other available information may be indicative of preferences (content information about items)
Collaborative Score Topic Model (CSTM)
[Figure: ratings matrix with missing entries (?), alongside two word-count matrices labeled "Words"]
● Twin topic models
– Topics are shared
– Topic representations then live in the same space
● Match representation of documents ( ) to users' representations ( )
● Useful for cold-start
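Because the twin topic models share their topics, a user's library and a candidate document can be compared directly in topic space. The sketch below assumes the topic proportions have already been inferred and uses a plain dot product as the matching score. The slides do not state the scoring function, so the dot product, the name `match_scores`, and the toy vectors are all assumptions.

```python
import numpy as np

def match_scores(user_topics, doc_topics):
    """Dot-product affinity between every user and every document.

    Both inputs hold topic proportions over the same shared topics,
    one row per user or per document (assumed to be inferred already).
    """
    return user_topics @ doc_topics.T

# Hypothetical topic proportions over 3 shared topics.
user_topics = np.array([[0.8, 0.1, 0.1],   # user 0: library mostly on topic 0
                        [0.1, 0.1, 0.8]])  # user 1: library mostly on topic 2
doc_topics = np.array([[0.7, 0.2, 0.1],    # document 0: mostly topic 0
                       [0.1, 0.2, 0.7]])   # document 1: mostly topic 2

scores = match_scores(user_topics, doc_topics)
best_doc = scores.argmax(axis=1)  # highest-scoring document for each user
print(scores)
print(best_doc)
```

No ratings are involved in this score, which is why a component of this kind remains usable for a user who has a library but has not yet rated anything.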
● Per-user regression on document features
● Useful for frequent users
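A per-user regression on document features can be sketched as a ridge regression fitted separately for each user. This is a stand-in for illustration only: the closed-form ridge solution, the name `fit_user_weights`, and the toy features are assumptions, not the CSTM formulation.

```python
import numpy as np

def fit_user_weights(doc_features, user_ratings, reg=1.0):
    """Closed-form ridge regression for a single user.

    doc_features: (n_rated_docs, n_features) features of documents the user rated
    user_ratings: (n_rated_docs,) the user's ratings for those documents
    """
    n_features = doc_features.shape[1]
    A = doc_features.T @ doc_features + reg * np.eye(n_features)
    b = doc_features.T @ user_ratings
    return np.linalg.solve(A, b)

# Hypothetical data: 4 rated documents, 2 features, ratings generated by w = [2, 1].
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = X @ np.array([2.0, 1.0])

w_weak_reg = fit_user_weights(X, y, reg=1e-8)    # nearly unregularized fit
w_strong_reg = fit_user_weights(X, y, reg=10.0)  # weights shrunk toward zero
print(w_weak_reg, w_strong_reg)
```

With many ratings the data term dominates and the weights capture that user's individual tastes. With few ratings the regularizer dominates and the weights stay near zero, which is consistent with this component mainly helping frequent users.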
● A graphical model of user-item preferences and textual side information:
● User libraries
● Item content
CSTM
● Relationship to other models
– Degenerate cases of CSTM correspond to other useful models (language & collaborative filtering models)
● Model is learned using EM
– Variational inference
● Non-conjugate model
● Mean-field for topic realizations
● Dirac delta posterior (MAP) for other parameters
Related Work
● Combining item content with collab. filtering
– fLDA [Agarwal & Chen'10]
– Collaborative Topic Regression [Wang & Blei'11]
● Using (user) side information with collab. filtering
– Relational learning via collective matrix factorization [Singh & Gordon'08]
– Regression-based Latent Factor Models [Agarwal & Chen'09]
Datasets
● Conference datasets
– Users are reviewers
● User libraries are reviewers' published papers
– NIPS'10
● 48 users, 1251 items
– ICML'12
● 433 users, 861 items
– NIPS'13
● 1042 users, 1305 papers
● Book dataset
– Users are book readers
● User libraries are users' purchased books
– Kobo
● 316 users, 2601 items
[Figure labels: Deep Learning, RL/Planning, Bayesian Nonparametrics, Graphical Models, Neuroscience, Optimization, Large Margin]
Preference prediction results (ICML'12)
[Figure: RMSE for Constant, Language Models (SI), PMF (CF), LR (SI), CTR (CF+SI), CSTM (CF+SI)]
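The results are reported as RMSE (root-mean-square error), where lower is better. A minimal sketch of the metric follows, together with a constant predictor for comparison. Treating the "Constant" baseline as "always predict the mean training rating" is an assumption, as are the function name `rmse` and the toy ratings.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between held-out ratings and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical ratings, for illustration only.
train_ratings = [3, 1, 2, 2]
test_ratings = [2, 0, 3, 3]

# Assumed constant baseline: predict the mean training rating everywhere.
constant_pred = np.full(len(test_ratings), np.mean(train_ratings))
baseline_rmse = rmse(test_ratings, constant_pred)
print(baseline_rmse)
```

A baseline of this kind gives a reference point: a model that uses collaborative filtering (CF), side information (SI), or both should reach a lower RMSE than a predictor that ignores users and items entirely.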
Book recommendation results
● CSTM outperforms others in completely cold-start regimes
● Bag of words is limiting
● Reading interest cannot be represented as a mean book
[Figure panels: NIPS-10, ICML-12, Books]
Preference Prediction with Textual Side Information
[Figure: test performance versus quantity of available user data; annotation: "Online learning conditioned on previous users."]
Conclusion & Future Work
● Take away
– Good performance in both cold- and warm-start regimes
– User side information -> Quickly provide good recommendations
● Online recommendations
● Future work
– Computational
● Faster inference
– Domains
● Legislative, images
– How do you generally model different sources of side information?
● Active elicitation