Leveraging User Libraries to Bootstrap Collaborative Filtering Laurent Charlin, Columbia University Richard Zemel, University of Toronto Hugo Larochelle, Université de Sherbrooke KDD'14 August 2014


Motivation

● Difficult to keep up with new information
  – Researcher:
    ● Hundreds of papers are published each year at top conferences
    ● ArXiv.org proposes several new papers in our field every day
  – How can you efficiently find all interesting papers?

Solution: Recommendations

● Document recommendation
  – Scientific articles
    ● Recommending papers to reviewers
    ● Recommending papers to conference attendees
  – Books, music
● Novelty: Leverage the libraries of users
  – Articles: researchers' previously published papers
  – Books & music: purchased items

[Figure: users with unknown preferences (?) for Item 1, Item 2, Item 3. User libraries: user purchases, users' previously published papers]

Data

Desiderata

● Want a model which quickly gives good recommendations
● Model which performs well for all users
  – Both new and frequent users

[Figure: number of ratings per user]

Preference Prediction

● Collaborative filtering:
  – Intuition: Users with similar past preferences are likely to have similar future preferences.
  – Uses only user preferences
● Shortcoming:
  – Cannot deal with new users (cold-start regime)

[Salakhutdinov & Mnih'08]
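The cold-start shortcoming can be illustrated with a minimal matrix-factorization sketch in the spirit of PMF. This is an illustration under stated assumptions, not the implementation from the cited work: the function name `fit_pmf`, the toy ratings, and all hyperparameters are hypothetical.

```python
import numpy as np

def fit_pmf(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.1, epochs=500, seed=0):
    """Fit user and item factors by SGD on observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, rank))  # user factors
    V = 0.1 * rng.standard_normal((n_items, rank))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]
            # Compute both gradients before updating either factor.
            grad_u = err * V[i] - reg * U[u]
            grad_v = err * U[u] - reg * V[i]
            U[u] += lr * grad_u
            V[i] += lr * grad_v
    return U, V

# Hypothetical toy data: users 0 and 1 have ratings, user 2 has none.
toy_ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0)]
U, V = fit_pmf(toy_ratings, n_users=3, n_items=2)
print("user 0 predictions:", U[0] @ V.T)
print("user 2 predictions:", U[2] @ V.T)
```

User 2 never appears in a training triple, so their factor vector stays at its random initialization and their predictions carry no information. This is the gap that side information is meant to fill.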

Preference Prediction with Side Information

● Side information:
  – Any information about users and items, excluding preferences.
  – E.g., user demographics, item content
  – Advantages:
    ● Better predictions in cold-start regimes
    ● Other available information may be indicative of preferences (content information about items)

Collaborative Score Topic Model (CSTM)

[Figure: ratings matrix with unobserved entries (?), shown with word-count matrices labeled "Words"]

Collaborative Score Topic Model (CSTM)

● Twin topic models
  – Topics are shared
  – Topic representations then live in the same space

Collaborative Score Topic Model (CSTM)

● Match representation of documents ( ) to users' representations ( )
● Useful for cold-start
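Because documents and user libraries share one topic space, a new user can be scored against documents without any ratings. The sketch below assumes a plain dot product between topic-proportion vectors. The slides do not give the exact matching function, so the function name, the similarity choice, and the toy vectors are all hypothetical.

```python
import numpy as np

def topic_match_scores(user_topics, doc_topics):
    """Score every (user, document) pair by the dot product of their topic vectors."""
    return user_topics @ doc_topics.T

# Hypothetical topic proportions over 3 shared topics.
user_topics = np.array([[0.8, 0.1, 0.1],   # library mostly about topic 0
                        [0.1, 0.1, 0.8]])  # library mostly about topic 2
doc_topics = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.2, 0.7]])

scores = topic_match_scores(user_topics, doc_topics)
ranking = np.argsort(-scores, axis=1)  # best-matching document first, per user
print(ranking)
```

Each user's top-ranked document is the one whose topic mix most resembles their library, which is the behavior one would want in the cold-start regime.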

Collaborative Score Topic Model (CSTM)

● Per-user regression on document features
● Useful for frequent users
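A per-user regression on document features can be sketched as ridge regression fit separately for each user. This is a stand-in under stated assumptions: CSTM learns this component jointly with the rest of the model, whereas the closed-form solver, the function name, and the toy data below are hypothetical.

```python
import numpy as np

def fit_user_regression(doc_features, ratings, reg=1.0):
    """Ridge regression from document features to one user's observed ratings."""
    d = doc_features.shape[1]
    A = doc_features.T @ doc_features + reg * np.eye(d)
    b = doc_features.T @ ratings
    return np.linalg.solve(A, b)  # per-user weight vector

# Hypothetical user who rated four documents described by two features.
doc_features = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 0.0],
                         [0.0, 1.0]])
ratings = np.array([3.0, 0.0, 3.0, 0.0])

weights = fit_user_regression(doc_features, ratings)
print(weights)
```

The more ratings a user has, the better their individual weights are determined, which is consistent with the slide's point that this component helps frequent users.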

Collaborative Score Topic Model (CSTM)

● A graphical model of user-item preferences and textual side information:
  ● User Libraries
  ● Item Content

CSTM

● Relationship to other models
  – Degeneracies of CSTM correspond to other useful models (language & collaborative filtering models)
● Model is learned using EM
  – Variational inference
    ● Non-conjugate model
    ● Mean-field for topic realizations
    ● Dirac delta posterior (MAP) for other parameters

Related Work

● Combining item content with collab. filtering
  – fLDA [Agarwal & Chen'10]
  – Collaborative Topic Regression [Wang & Blei'11]
● Using (user) side information with collab. filtering
  – Relational learning via collective matrix factorization [Singh & Gordon'08]
  – Regression-based Latent Factor Models [Agarwal & Chen'09]

Experiments

Datasets

● Conference datasets
  – Users are reviewers
    ● User libraries are reviewers' published papers
  – NIPS'10
    ● 48 users, 1251 items
  – ICML'12
    ● 433 users, 861 items
  – NIPS'13
    ● 1042 users, 1305 papers
● Book dataset
  – Users are book readers
    ● User libraries are users' purchased books
  – Kobo
    ● 316 users, 2601 items

[Figure labels: Deep Learning, RL/Planning, Bayesian Nonparametrics, Graphical Models, Neuroscience, Optimization, Large Margin]

Preference prediction results (ICML'12)

[Figure: RMSE for Constant, Language Models (SI), PMF (CF), LR (SI), CTR (CF+SI), CSTM (CF+SI)]
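The RMSE metric used in this comparison can be computed as sketched below. The `constant_baseline` helper assumes that the "Constant" method predicts the mean training rating. The slides do not define it, so that reading, like the function names, is hypothetical.

```python
import numpy as np

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed ratings."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

def constant_baseline(train_ratings, n_test):
    """Assumed 'Constant' baseline: predict the mean training rating everywhere."""
    return np.full(n_test, np.mean(train_ratings))

# Hypothetical ratings, for illustration only.
train = [0.0, 1.0, 2.0, 3.0]
test = [1.0, 2.0]
print(rmse(constant_baseline(train, len(test)), test))
```

Lower RMSE is better, so a model is only useful if it beats this kind of constant predictor.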

Book recommendation results

● CSTM outperforms others in completely cold-start regimes
● Bag of words is limiting
● Reading interest cannot be represented as a mean book

[Figure panels: NIPS-10, ICML-12, Books]

Preference Prediction with Textual Side Information

[Figure: test performance versus quantity of available user data. Annotation: online learning conditioned on previous users.]

NIPS'13 recommendation system

● Provided paper/poster recommendations to NIPS reviewers

Conclusion & Future Work

● Take away
  – Good performance both in cold and warm start regimes
  – User side-information -> Quickly provide good recommendations
    ● Online recommendations
● Future work
  – Computational
    ● Faster inference
  – Domains
    ● Legislative, images
  – How do you generally model different sources of side-info?
    ● Active elicitation