
Open Recommendation Platform

ACM RecSys 2013, Hong Kong

Torben Brodt, plista GmbH

Keynote

International News Recommender Systems Workshop and Challenge

October 13th, 2013

where
● news websites
● below the article

different types
● content
● advertising

Where it’s coming from

[diagram: recommendations flow between Visitors and Publisher]

* the company I work for

Where it’s coming from

good recommendations for...

User happy!

Advertiser happy!

Publisher happy!

plista* happy!

Where it’s coming from

some years ago

[diagram: Visitors, Publisher, and Context feed the Recommendations]

Collaborative Filtering

● well-known algorithm
● more data means more knowledge

Where it’s coming from

one recommender

Collaborative Filtering

● time
● trust
● mainstream

Parameter Tuning

2008
● finished studies
● 1st publication
● plista was born

today
● 5k recs/second
● many publishers

Where it’s coming from

one recommender = good results

"use as many recommenders as possible!"

Where it’s coming from

netflix prize

Collaborative Filtering

Most Popular

Text Similarity

etc ...

Where it’s coming from

more recommenders

● we have one score
● lucky success? bad loss?
● we needed to keep track of the different recommenders

success: 0.31 %

understanding performance

lost in serendipity

number of
● clicks
● orders
● engagements
● time on site
● money

[scale: bad → good]

understanding performance

how to measure success


● features?
● big data math?
● counting!

for blending we just count floats

understanding performance

evaluation technology

Algo1 1+1 10

Algo2 100 2+5

Algo...

understanding performance

evaluation technology

impressions

collaborative filtering 500 +1

most popular 500

text similarity 500

ZINCRBY"impressions"

"collaborative_filtering"

"1"

ZREVRANGEBYSCORE "impressions"
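As a rough sketch of what this counting looks like in code (assuming the redis-py client; the key and member names mirror the slide):

    import redis

    r = redis.Redis()

    # one sorted set per metric; members are the recommender names
    r.zincrby("impressions", 1, "collaborative_filtering")

    # leaderboard: recommenders ordered by impression count
    for algo, count in r.zrevrangebyscore("impressions", "+inf", "-inf",
                                          withscores=True):
        print(algo.decode(), int(count))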

understanding performance

evaluation technology

impressions

collaborative filtering 500

most popular 500

text similarity 500

clicks

collaborative filtering 100

most popular 10

... 1

needs division

ZREVRANGEBYSCORE "clicks" +inf -inf

ZREVRANGEBYSCORE "impressions" +inf -inf
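The division is one line per recommender; a minimal sketch, again assuming redis-py (the helper name ctr is mine):

    import redis

    r = redis.Redis()

    def ctr(algo):
        # click-through rate = clicks / impressions
        clicks = r.zscore("clicks", algo) or 0.0
        imps = r.zscore("impressions", algo) or 0.0
        return clicks / imps if imps else 0.0

    # rank recommenders by CTR instead of raw click counts
    algos = [a.decode() for a in r.zrange("impressions", 0, -1)]
    print(sorted(algos, key=ctr, reverse=True))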

● CF is "always" the best recommender
● but "always" is just the average over all contexts

let's check the context!

understanding performance

evaluation results: success

● We like anonymization! We get a big context from the web itself.
● URL + HTTP headers provide
  ○ user agent -> device -> mobile
  ○ IP address -> geolocation
  ○ referer -> origin (search, direct)

Context

consider a list of the best recommenders per context attribute, sorted by what is relevant:
● clicks (content recs)
● price (advertising recs)

publisher = welt.de

collaborative filtering 689

most popular 420

text similarity 135

category = archive

text similarity 400

collaborative filtering 200

... 100

hour = 15

recent 80

collaborative filtering 10

... 5

Context

publisher = welt.de

collaborative filtering 689

most popular 420

text similarity 135

weekday = sunday

collaborative filtering 400

most popular 200

... 100

category = archive

text similarity 200

collaborative filtering 10

... 5

ZUNIONSTORE "clk" 3 "p:welt.de:clk" "w:sunday:clk" "c:archive:clk" WEIGHTS 4 1 1

ZREVRANGEBYSCORE "clk" +inf -inf

ZUNIONSTORE "imp" 3 "p:welt.de:imp" "w:sunday:imp" "c:archive:imp" WEIGHTS 4 1 1

ZREVRANGEBYSCORE "imp" +inf -inf

Context

evaluation context

Context can be used for optimization and targeting.

classical targeting is a limitation

Context

Targeting

Recommenders

collaborative filtering 500 +1

most popular 500

text similarity 500

Advertising

RWE Europe 500 +1

IBM Germany 500

Intel Austria 500

Onsite

new iphone su... 500 +1

twitter buys p.. 500

google has seri. 500


Context

Livecube

context

recap
● added another dimension

result

● better for news: Collaborative Filtering

● better for content: Text Similarity

Context

evaluation context: success


what did we get?

● possibly many recommenders

● know how to measure success

● technology to see success

now breathe!

● real-time evaluation technology exists

● to choose the best algorithm for the current context, we need to learn: the multi-armed Bayesian bandit

the ensemble

Data Science

“shuffle”: exploration vs. exploitation

temporary success?

No. 1 gets the most traffic

local minima?

Interested? Look for Ted Dunning + Bayesian Bandit
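A minimal Thompson-sampling sketch of such a Bayesian bandit (plain Python; the counts are illustrative, in practice the clicks and impressions come from the counters above):

    import random

    # per recommender: (clicks, impressions without a click)
    stats = {
        "collaborative_filtering": (100, 400),
        "most_popular": (10, 490),
        "text_similarity": (1, 499),
    }

    def choose():
        # sample a plausible CTR per arm from a Beta posterior and take
        # the best sample; weak arms still win occasionally, which gives
        # the "shuffle" exploration for free
        return max(stats, key=lambda a: random.betavariate(
            stats[a][0] + 1, stats[a][1] + 1))

    def update(arm, clicked):
        clicks, misses = stats[arm]
        stats[arm] = (clicks + 1, misses) if clicked else (clicks, misses + 1)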

● new total/avg is much better
● thx bandit
● thx ensemble

more research
● time series

✓ better results

[plot: success over time]

✓ easy exploration

● tradeoff (money decision)
● between the price/time we "waste" in offline evaluation
● and the price we lose with bad recommendations

● minimum pre-testing
● no risk if a recommender crashes
● "bad" code might find its context

trial and error

● now plista developers can try ideas

● and allow researchers to do the same

collaboration

Ensemble is able to choose

big pool of algorithms

Collaborative Filtering

Most Popular

Text Similarity

Ensemble

BPR-Linear
WR-MF
SVD++
etc.

Research Algorithms

● first and only dataset in the news context
  ○ millions of items
  ○ only relevant for a short time
● dataset has many attributes!!
● many publishers have user intersection
  ○ regional
  ○ contextual
● real world!!!
  ○ you can guide the user
  ○ you don’t need to follow his route
● real time!!
  ○ this is industry, it has to be usable

researcher has idea

src http://userserve-ak.last.fm/serve/_/7291575/Wickie%2B4775745.jpg


... probably hosted by a university, plista, or any cloud provider?

... needs to start the server

"message bus"● event notifications

○ impression○ click

● error notifications● item updates

train model from it
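A rough sketch of consuming these notifications (plain Python; the message format follows the impression example below, the handler functions are hypothetical):

    import json

    def handle_message(raw):
        msg = json.loads(raw)
        if msg["type"] == "impression":
            train_on(msg["context"])      # hypothetical model update
        elif msg["type"] == "click":
            reward(msg["context"])        # hypothetical positive feedback
        else:
            pass                          # error notifications, item updates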

... api implementation

{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418,    // publisher
      "14": 31721,  // widget
      ...
    },
    "lists": {
      "10": [100, 101]  // channel
    },
    ...
  }
}

... package content

api specs hosted at http://orp.plista.com

{ // json
  "type": "impression",
  "recs": ...
  // what was recommended
}

api specs hosted at http://orp.plista.com

... package content

{ // json
  "type": "click",
  "context": ...
  // will include the position
}

... package content

api specs hosted at http://orp.plista.com

recs

{ // json
  "recs": {
    "int": {
      "3": [13010630, 84799192]
      // 3 refers to content recommendations
    },
    ...
  }
}

generated by researchers, to be shown to real users

[diagram: Researcher -> API -> Real User]

... reply to recommendation requests

api specs hosted at http://orp.plista.com
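To make the flow concrete, here is a minimal sketch of the server a researcher would run (Python standard library only; the request type name and port are assumptions, the reply follows the recs format above):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class OrpHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers["Content-Length"]))
            msg = json.loads(body)
            if msg.get("type") == "recommendation_request":  # assumed type name
                # "3" = content recommendations; the item ids are illustrative
                reply = {"recs": {"int": {"3": [13010630, 84799192]}}}
            else:
                reply = {}  # impressions and clicks only train the model
            data = json.dumps(reply).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("", 8080), OrpHandler).serve_forever()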

recs


● happy user

● happy researcher

● happy plista

research can profit

● real user feedback

● real benchmark

quality is win-win #2

use common frameworks

src http://en.wikipedia.org/wiki/Pac-Man

how to build a fast system?

● no movies!

● news articles outdate quickly!

● visitors need the recs NOW

● => handle the data very fast

src http://static.comicvine.com/uploads/original/10/101435/2026520-flash.jpg

quick and fast

● fast web server

● fast network protocol

● fast message queue (or Apache Kafka)

● fast storage

"send quickly" technologies


"real-time features feel better in a real-time world"

we don't need batch! see http://goo.gl/AJntul

our setup
● PHP, it's easy
● Redis, it's fast
● R, it's well known

comparison to plista

Overview

[diagram: Visitors and Publisher exchange Recommendations and Feedback; Preferences feed an Ensemble of recommenders (Collaborative Filtering, Most Popular, Text Similarity, etc.)]

● 2012
  ○ Contest v1
● 2013
  ○ ACM RecSys “News Recommender Challenge”
● 2014
  ○ CLEF News Recommendation Evaluation Labs “newsreel”

Overview