
Recommendation Challenges in Web Media Settings
Ronny Lempel, Yahoo! Labs, Haifa, Israel (April 2013)


1. Recommendation Challenges in Web Media Settings (Ronny Lempel, Yahoo! Labs, Haifa, Israel)

2. Recommender Systems
- Pioneered in the mid/late 90s by Amazon
- Today applied everywhere:
  - Shopping sites
  - Content sites (news, sports, gossip, ...)
  - Multimedia streaming services (videos, music)
  - Social networks
- Easily merit a dedicated academic course

3. Recommendation in Social Networks
(screenshot examples)

4. Recommender Systems: Example of Effectiveness
- 1988: Random House releases Touching the Void, a book by a mountain climber detailing a harrowing account of near death in the Andes. It got good reviews but modest commercial success
- 1999: Into Thin Air, another mountain-climbing tragedy book, becomes a best-seller
- By virtue of Amazon's recommender system, Touching the Void started to sell again, prompting Random House to rush out a new edition
- A revised paperback edition spent 14 weeks on the New York Times bestseller list
- From The Long Tail, by Chris Anderson

5. The Netflix Challenge
- Slides 4-6 courtesy of Yehuda Koren, member of Challenge winners BellKor's Pragmatic Chaos

6. "We're quite curious, really. To the tune of one million dollars." Netflix Prize rules:
- Goal was to improve on Netflix's existing movie recommendation technology
- The open-to-the-public contest began October 2, 2006; winners announced September 2009
- Prize based on reduction in root mean squared error (RMSE) on test data, i.e. the square root of the mean squared difference between predicted and actual ratings
  - $1 million grand prize for 10% improvement on the Cinematch result
  - $50K 2007 progress prize for 8.43% improvement
  - $50K 2008 progress prize for 9.44% improvement
- Netflix gets full rights to use IP developed by the winners
- Example of crowdsourcing: Netflix basically got over 100 researcher years (and good publicity) for $1.1M

7. Netflix Movie Ratings Data
- Training data:
  - 100 million ratings
  - 480,000 users
  - 17,770 movies
  - 6 years of data: 2000-2005
- Test data:
  - Last few ratings of each user (2.8 million)
  - Dates of ratings are given
- The slide shows a sample table of (user, movie, score) triples, with the test-set scores hidden as "?"

8. Recommender Systems: Mathematical Abstraction
- Consider a matrix R of users and the items they've consumed
- Users correspond to the rows of R, products to its columns, with r_ij = 1 whenever person i consumed item j
- In other cases, r_ij might be the rating given by person i to item j
- The matrix R (of size |U| x |I|) is typically very sparse and often very large
- Real-life task: top-k recommendation. From among the items that weren't consumed by each user, predict which ones the user would most enjoy
- Related task on ratings data: matrix completion. Predict users' ratings for items they have yet to rate, i.e. complete missing values
- (A code sketch of this abstraction follows this slide)
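To make the abstraction concrete, here is a minimal Python sketch, not from the talk: the consumption log and the popularity baseline are illustrative assumptions. It builds a sparse R from (user, item) events and produces top-k recommendations over the unconsumed cells.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical consumption log: (user, item) pairs, r_ij = 1 on consumption.
events = [(0, 0), (0, 2), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2)]
users, items = zip(*events)
n_users, n_items = max(users) + 1, max(items) + 1

# R is |U| x |I|, binary, and typically very sparse.
R = csr_matrix((np.ones(len(events)), (users, items)),
               shape=(n_users, n_items))

# Top-k recommendation: per user, score only the items with r_ij = 0.
# A trivial baseline scorer: item popularity (column sums); the neighborhood
# and factorization models on the next slides supply better scores.
popularity = np.asarray(R.sum(axis=0)).ravel()
k = 2
for u in range(n_users):
    candidates = np.where(R[u].toarray().ravel() == 0)[0]
    top_k = candidates[np.argsort(-popularity[candidates])][:k]
    print(f"user {u}: recommend items {top_k.tolist()}")
```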
9. Types of Recommender Systems
- At a high level, two main techniques:
  - Content-based recommendation: characterizes the affinity of users to certain features (content, metadata) of their preferred items. Lots of classification technology under the hood
  - Collaborative filtering: exploits similar consumption and preference patterns between users. See next slides
- Many state-of-the-art systems combine both techniques

10. Collaborative Filtering: Neighborhood Models
- Compute the similarity of items [users] to each other
  - Items are considered similar when users tend to rate them similarly or to co-consume them
  - Users are considered similar when they tend to co-consume items or rate items similarly
- Recommend to a user:
  - Items similar to items he/she has already consumed [rated highly]
  - Items consumed [rated highly] by similar users
- Key questions: how exactly to define pair-wise similarities? How to combine them into quality recommendations? (An item-item sketch follows slide 11)

11. Collaborative Filtering: Matrix Factorization
- Latent factor models (LFM): map both users and items to some f-dimensional space R^f, i.e. produce an f-dimensional vector v_u for each user and w_i for each item
- Define rating estimates as inner products: q_ui = <v_u, w_i>
- Main problem: finding a mapping of users and items to the latent factor space that produces good estimates
- Closely related to dimensionality reduction techniques for the ratings matrix R (e.g. Singular Value Decomposition): R (|U| x |I|) is approximated by the product of V (|U| x f) and W (f x |I|)
- (A gradient-descent sketch follows below)
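As one concrete answer to slide 10's pair-wise similarity question, cosine similarity between the item columns of R is a common choice. A minimal sketch with illustrative data; the similarity-sum scorer and function names are assumptions, not from the talk:

```python
import numpy as np

# The binary user-item matrix R from the earlier sketch, dense for clarity.
R = np.array([[1, 0, 1, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 0]], dtype=float)

# Items are similar when the same users co-consume them: cosine similarity
# between the item columns of R.
norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0                    # guard against never-consumed items
sim = (R.T @ R) / np.outer(norms, norms)   # |I| x |I| item-item similarities
np.fill_diagonal(sim, 0.0)                 # an item is not its own neighbor

def recommend(u, k=2):
    """Score unconsumed items by their total similarity to consumed items."""
    scores = sim @ R[u]
    scores[R[u] > 0] = -np.inf             # exclude already-consumed items
    return np.argsort(-scores)[:k]

print(recommend(0))
```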
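And for slide 11, a minimal stochastic gradient descent sketch of fitting the latent vectors v_u and w_i to observed ratings. It uses plain squared error with L2 regularization, illustrative hyperparameters, and no bias terms, so it is a simplification of what actual Netflix Prize systems used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 2, 4.0), (1, 3, 1.0), (2, 0, 4.0)]
n_users, n_items, f = 3, 4, 8

V = 0.1 * rng.standard_normal((n_users, f))   # user factors v_u
W = 0.1 * rng.standard_normal((n_items, f))   # item factors w_i

lr, reg = 0.01, 0.05                          # illustrative hyperparameters
for epoch in range(200):
    for u, i, r in ratings:
        err = r - V[u] @ W[i]                 # q_ui = <v_u, w_i>
        V[u] += lr * (err * W[i] - reg * V[u])
        W[i] += lr * (err * V[u] - reg * W[i])

# Matrix completion: every cell of the reconstructed R, including unrated ones.
print(np.round(V @ W.T, 2))
```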
12. Web Media Sites
(screenshot examples)

13. Challenge: Cold Start Problems
- Good recommendations require observed data on the user being recommended to [the items being recommended]:
  - What did the user consume/enjoy before?
  - Which users consumed/enjoyed this item before?
- User cold start: what happens when a new user arrives at a system? How can the system make a good first impression?
- Item cold start: how do we recommend newly arrived items with little historic consumption?
- In certain settings, items are ephemeral: a significant portion of their lifetime is spent in the cold-start state, e.g. news recommendation

14. Low False-Positive Costs
- False positive: recommending an irrelevant item
- Consequence, in media sites: a bit of lost time, as opposed to lots of lost time or money in other settings
- Opportunity: better address cold-start issues
  - Item cold start: show a new item to a select group of users whose feedback should help in modeling it for everyone. Note the very short item lifetimes in news cycles
  - User cold start: more aggressive exploration vs. playing it safe and perpetuating popular items (see the exploration sketch after slide 21)
- Search: injecting randomization into the ranking of search results (Pandey et al., VLDB 2005)

15. Challenge: Inferring Negative Feedback
- In many recommendation settings we only know which items users have consumed, not whether they liked them, i.e. no explicit ratings data
- What can we infer about satisfaction with consumed items from observing other interactions with the content?
  - Web pages: what happens after the initial click?
  - Short online videos: what happens after pressing play?
  - TV programs: zapping patterns
- What can we infer about items the user did not consume?
  - Was the user even aware of the items he/she did not consume?
  - What items did the recommender system expose the user to?

16. Presentation Bias: Effect on Media Consumption
- Pop culture: items' longevity creates familiarity
- Media sites: items are ephemeral, and users are mostly unaware of items the site did not expose them to
- Presentation bias obscures users' true taste: they essentially select the best of the little that was shown
- Must correctly account for presentation bias when modeling: "seen and not selected" ≠ "not seen and not selected"
- Search: negative interpretation of skipped search results (Joachims, KDD 2002)

17. Layouts of Recommendation Modules
- Interpreting interactions in vertical layouts is easy using the skips paradigm
- What about 2D, tabbed, horizontal layouts?

18. Layouts of Recommendation Modules
- What about multiple presentation formats?

19. Personalized / Popular / Contextual
(screenshot of recommendation modules)

20. Contextual, Personalized, Popular
- Web media sites often display links to additional stories on each article page: matching the article's context, matching the user, consumed by the user's friends, popular
- When creating a unified list for a given user reading a specific page, what should be the relative importance of matching the additional stories to the page vs. matching them to the user?
- Ignoring story context might create offending recommendations
- Related direction: tensor factorization (Karatzoglou et al., RecSys 2010)

21. Challenge: Incremental Collaborative Filtering
- In a live system, we often cannot afford to regularly recompute recommendations over the entire history
- Problem: neither neighborhood models nor matrix factorization models easily lend themselves to faithful incremental processing
- Setting (diagram on slide): user-item interactions arrive in periods t1, t2, t3, ..., and running the CF algorithm on period t_i yields model M_i = CF-ALG(t_i)
- Does there exist an f such that f(M1, M2) ≈ CF-ALG(t1 ∪ t2)? Is there a model aggregation function f(M_prev, M_curr) that is good enough? (A sketch follows this slide)
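One way to see why slide 21's question is interesting: for a toy neighborhood model whose "model" is just per-period co-occurrence counts, the aggregation function f is exact summation, while factorization models enjoy no such property. A sketch under that assumption, with hypothetical data:

```python
import numpy as np

def cf_alg(R):
    """Toy CF 'model': the item-item co-occurrence counts of one period."""
    return R.T @ R

# Interactions from two consecutive periods over the same 4 items; each row
# is one user's activity within that period (a per-period session).
R_t1 = np.array([[1, 0, 1, 0],
                 [0, 1, 1, 0]], dtype=float)
R_t2 = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]], dtype=float)

M1, M2 = cf_alg(R_t1), cf_alg(R_t2)

# Candidate aggregation function: f(M_prev, M_curr) = M_prev + M_curr.
M_agg = M1 + M2

# For per-period co-occurrence counts this equals retraining on the full
# history, so f is exact here...
M_full = cf_alg(np.vstack([R_t1, R_t2]))
assert np.allclose(M_agg, M_full)
# ...but no such exact f is known for latent factor models, hence the
# slide's open question: is there an f(M_prev, M_curr) that is good enough?
```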
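Returning to slide 14's exploration opportunity: the slide cites randomized ranking (Pandey et al.), and epsilon-greedy selection, shown below purely as an illustrative stand-in, is one standard way to inject such randomization:

```python
import random

def recommend_slot(ranked_items, cold_pool, epsilon=0.1):
    """With probability epsilon, explore an item from the cold-start pool to
    gather feedback on it; otherwise exploit the top-ranked item. Cheap false
    positives in media settings are what make this epsilon affordable."""
    if random.random() < epsilon:
        return random.choice(cold_pool)    # explore
    return ranked_items[0]                 # exploit

# Hypothetical usage: a ranked list from the recommender plus new cold items.
print(recommend_slot(["story_a", "story_b"], ["new_story_x"]))
```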
22. Challenge: Repeated Recommendations
- One typically doesn't buy the same book twice, nor do people typically read the same news story twice
- But people listen to the songs they like over and over again, and watch movies they like multiple times as well
- When and how frequently is it OK to recommend an item that was already consumed?
- On the other hand, when should we stop showing a recommendation if the user doesn't act upon it?
- Implication: a recommendation system may not only need to track aggregated consumption to date
  - It may need to track consumption timelines
  - It may need to track recommendation history

23. Challenge: Recommending Sets & Sequences of Items
- In some domains, users consume multiple items in rapid succession (e.g. music playlists)
- Recent works: WWW 2012 (Aizenberg et al., sets) and KDD 2012 (Chen et al., sequences)
- From independent utility of recommendations to set or sequence utility: predicting items that go well together
- Sometimes need to respect constraints
  - Tiling recommendations: in TV watchlist generation, the broadcast schedule further complicates matters due to program overlaps
  - Perhaps a new domain of constrained recommendations?
- Search: result-set attributes, e.g. diversity (Agrawal et al., WSDM 2009)
- Netflix tutorial at RecSys 2012: diversity is key at Netflix

24. Social Networks and Recommendation Computation
- Some are hailing social networks as a silver bullet for recommender systems: "Tell me who your friends are and we'll tell you what you like"
- Is it really the case that we like the same media as our friends?
  - Affinity trumps friendship! There are people out there who are more like us than our limited set of friends
  - Once affinity is considered, the marginal value of social connections is often negligible
- Not to be confused with non-friendship social networks, where connections are affinity-related (Epinions)

25. Social Networks and Recommendation Consumption
- Previous slide notwithstanding, social is a great motivator for consuming recommendations: "People like you rate Lincoln very highly" vs. "Your friends Alice and Bob saw Lincoln last night and loved it"
- Explaining recommendations to motivate and increase consumption is an emerging practice
- Some commercial systems completely separate their explanation generation from their recommendation generation
  - So Alice and Bob may not be why the system recommended Lincoln to you, but they will be leveraged to get you to watch it
- Privacy in the face of joint consumption of a personalized experience?

26. Questions, Comments? Thank you!
rlempel (at) yahoo-inc dot com