A recommendation engine for your PHP application

Posted on 13-Apr-2017


A recommendation engine for your PHP apps

1) Intro to recommender systems

2) PredictionIO

3) Case Study

Definition: a system that helps people find things when finding what they need is challenging because there are many choices or alternatives

So… it’s a search engine!

Search Engines

Document base is (almost) static

Queries are dynamic

Search Engines

Create an index by analyzing the documents

Calculate each document's relevance to a query: tf*idf (term frequency × inverse document frequency)
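A minimal sketch of tf*idf scoring follows. It assumes whitespace tokenization and the plain idf = log(N / df) variant. Real search engines use smoothed variants or BM25.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document against the query with a plain tf*idf sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term never appears in the collection
            score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


docs = ["php recommendation engine", "php search engine", "cooking recipes"]
print(tf_idf_scores("recommendation engine", docs))
```

The rare term "recommendation" weighs more than "engine", which appears in two of the three documents. The last document shares no query term, so it scores 0.0.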

Recommender systems

Document base is growing (e.g. Netflix)

Query is static: find something I like

Classification

Domain: news, products, …

Helps define what can be suggested

Purpose: sales, information, education, build a community

What is TripAdvisor's purpose?

Personalization levels

• Non-personalized: best sellers

• Demographic: age, location

• Ephemeral: based on current activities

• Persistent

Types of input

• Explicit: ask user to rate something

• Implicit: inferred from user behaviour

Output

• Prediction: predicted rating, evaluation

• Recommendations: suggestion list, top-n, offers, promotion

• Filtering: email filters, news articles

A model for comparison

Users: people with preferences

Items: the subjects of ratings

Ratings: expressions of opinion

(Community: the space where opinions make sense)

Non-personalized

Best seller

Most popular

Trending

Summary of community ratings: e.g. the best hotel in town

Example: visitor ratings per hotel (blank = not rated); AVG is the mean of the ratings each hotel received

         Hotel A   Hotel B   Hotel C
John     3         5
Jane               3
Fred               1         0
Tom      4
AVG      3.5       3         0
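A minimal sketch of this non-personalized summary follows, using the hotel ratings above. The nested-dictionary layout is an assumption for illustration.

```python
def average_ratings(ratings):
    """Mean rating per item, skipping users who did not rate it."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            totals[item] = totals.get(item, 0) + rating
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


# Ratings from the table above; a missing rating is simply absent.
ratings = {
    "John": {"Hotel A": 3, "Hotel B": 5},
    "Jane": {"Hotel B": 3},
    "Fred": {"Hotel B": 1, "Hotel C": 0},
    "Tom": {"Hotel A": 4},
}

averages = average_ratings(ratings)
best_first = sorted(averages, key=averages.get, reverse=True)
print(averages)    # {'Hotel A': 3.5, 'Hotel B': 3.0, 'Hotel C': 0.0}
print(best_first)  # ['Hotel A', 'Hotel B', 'Hotel C']
```

A plain mean gives Hotel C's single rating as much weight as Hotel B's three. That is one reason "best in town" lists often also require a minimum number of ratings.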

Content based

Users rate items

We build a model of user preference

Look for similar items based on the model

Example user profile (weight per feature):

Action       0.7
Sci Fi       3.2
Vin Diesel   1.2
…            …
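A minimal content-based scoring sketch follows. It assumes items are described by sets of features and scores an item by summing the profile weights of the features it has. The catalog and titles are hypothetical.

```python
def score_item(profile, item_features):
    """Sum the user's weights for the item's features (unknown features count 0)."""
    return sum(profile.get(feature, 0.0) for feature in item_features)


# Feature weights from the example profile above.
profile = {"Action": 0.7, "Sci Fi": 3.2, "Vin Diesel": 1.2}

# Hypothetical catalog: each title is described by its content features.
catalog = {
    "Movie X": {"Action", "Vin Diesel"},
    "Movie Y": {"Sci Fi"},
    "Movie Z": {"Romance"},
}

ranked = sorted(catalog, key=lambda title: score_item(profile, catalog[title]), reverse=True)
print(ranked)  # ['Movie Y', 'Movie X', 'Movie Z']
```

Movie Z gets no score because the profile knows nothing about "Romance". This illustrates the serendipity limitation listed next.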

https://www.amazon.com/Relevant-Search-applications-Solr-Elasticsearch/dp/161729277X

http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine

Problems/Limitations

Need to know items content

User cold start: time to learn important features for the user

What if user interests change?

Lack of serendipity: users rarely discover, by accident, something they like

Collaborative filtering

No need to analyze (index) content

Can capture subtler aspects of taste that the item content does not describe

Serendipity

User-User

Select the people in my neighborhood, i.e. users with similar taste. If other people share my taste, I want their opinions combined

[Figure: user × movie rating matrix with Joe, Tom and other users; movies include E.T.; one of Joe's ratings is unknown (?)]

User-User: which users have similar tastes?

Item-Item

Find items I have expressed an opinion on and look at how other people felt about them. Similarities between items are precomputed

[Figure: the same user × movie rating matrix (Joe, Tom, …; E.T., …)]

Item-Item: which items are similar?

Problems/Limitations

Sparsity

When recommending from a large item set, users will have rated only some of the items

User Cold start

Not enough is known about a new user to decide who is similar

Item cold start

Cannot predict ratings for a new item until some similar users have rated it (not a problem for content-based systems)

Scalability

With millions of ratings, computations become slow

Dimensionality reduction

Express my opinions as a set of tastes

Compact representation of the matrix with relevant features

[Figure: the movie Rogue One and the user Joe, each described by a short vector of taste values (1 3 5 and 1 2 3)]
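One common way to get such a compact representation is matrix factorization. It learns k latent "taste" factors per user and per item so that their dot product approximates the known ratings.

The sketch below uses plain stochastic gradient descent on the example matrix from the next slides. It is an illustration only; PredictionIO's recommendation template trains with ALS on Spark MLlib instead. The values of k, the learning rate, the regularization and the epoch count are arbitrary assumptions.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q with SGD (None = not rated)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in ratings]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in ratings[0]]
    known = [(u, i, r) for u, row in enumerate(ratings)
             for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        for u, i, r in known:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the known ratings only."""
    errors = [(r - predict(P, Q, u, i)) ** 2 for u, row in enumerate(ratings)
              for i, r in enumerate(row) if r is not None]
    return math.sqrt(sum(errors) / len(errors))


# Rows: Joe, Tom, Alice, Bob; columns: Item1..Item5.
ratings = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
]

P, Q = factorize(ratings)
print(round(predict(P, Q, 0, 2), 1))  # the model's guess for Joe's missing Item3
```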

An example

         Item1   Item2   Item3   Item4   Item5
Joe      8       1       ?       2       7
Tom      2       ?       5       7       5
Alice    5       4       7       4       7
Bob      7       1       7       3       8

How similar are Joe and Tom? How similar are Joe and Bob?

Only consider items both users have rated

For each item, calculate the absolute difference between the two users' ratings, then take the average of these differences over the items

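Sim(Joe, Tom) = (|8-2| + |2-7| + |7-5|)/3 = 13/3 = 4.3

Sim(Joe, Bob) = (|8-7| + |1-1| + |2-3| + |7-8|)/4 = 3/4 = 0.75

This "Sim" is really a distance: the lower the value, the more similar the tastes, so Joe is much closer to Bob than to Tom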
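A minimal sketch of this user-user measure follows, assuming each user is a list of ratings with None for "not rated". The function name is hypothetical.

```python
def mean_abs_difference(a, b):
    """Average |a_i - b_i| over items rated by both users (None = not rated).

    Lower means more similar; returns None if nothing was rated in common.
    """
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return None
    return sum(abs(x - y) for x, y in common) / len(common)


# Rows of the example matrix (Item1..Item5).
joe = [8, 1, None, 2, 7]
tom = [2, None, 5, 7, 5]
bob = [7, 1, 7, 3, 8]

print(round(mean_abs_difference(joe, tom), 1))  # 4.3
print(mean_abs_difference(joe, bob))            # 0.75
```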

Now we have a score or weight for each user

Recommend what similar users have rated highly

To calculate the rating of an item to recommend, weight each user's rating by how similar that user is to you.

Use the entire matrix, or

use a k-NN algorithm: people who historically have had the same tastes as me

Aggregate using a weighted sum

Weights depend on similarity
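A minimal sketch of the weighted-sum prediction follows, applied to Joe's missing Item3. The slides do not say how a distance becomes a weight. This sketch assumes weight = 1 / (1 + distance), so closer users count more. The k parameter switches between the whole matrix (k=None) and k-NN.

```python
def mean_abs_difference(a, b):
    """Average |a_i - b_i| over items rated by both (None = not rated)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return None
    return sum(abs(x - y) for x, y in common) / len(common)


def predict_rating(target, others, item, k=None):
    """Weighted average of other users' ratings for `item`.

    Assumption: weight = 1 / (1 + distance). k=None uses every user who
    rated the item; otherwise only the k most similar ones (k-NN).
    """
    candidates = []
    for ratings in others:
        d = mean_abs_difference(target, ratings)
        if ratings[item] is None or d is None:
            continue
        candidates.append((1 / (1 + d), ratings[item]))
    candidates.sort(reverse=True)  # highest weight (most similar) first
    if k is not None:
        candidates = candidates[:k]
    if not candidates:
        return None
    total_weight = sum(w for w, _ in candidates)
    return sum(w * r for w, r in candidates) / total_weight


joe = [8, 1, None, 2, 7]
tom = [2, None, 5, 7, 5]
alice = [5, 4, 7, 4, 7]
bob = [7, 1, 7, 3, 8]
others = [tom, alice, bob]

print(round(predict_rating(joe, others, item=2), 2))  # 6.66, whole matrix
print(predict_rating(joe, others, item=2, k=1))       # Bob only
```

Tom rated Item3 lowest (5), but he is the least similar user, so his rating pulls the prediction down only slightly.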

How similar are Item1 and Item2? How similar are Item1 and Item3?

Only consider users who have rated both items

For each user, calculate the absolute difference between their ratings of the two items, then take the average of these differences over the users

Sim(I1, I2) = (|8-1| + |5-4| + |7-1|)/3 = 14/3 = 4.7

Sim(I1, I3) = (|2-5| + |5-7| + |7-7|)/3 = 5/3 = 1.7

Lower values mean more similar items: Item1 is closer to Item3 than to Item2
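A minimal item-item sketch follows. It uses the same mean absolute difference, computed down the columns over the users who rated both items. It also precomputes every item pair, as the item-item approach suggests. Names are hypothetical.

```python
def item_distance(matrix, i, j):
    """Average |r_i - r_j| over users who rated both items i and j (None = not rated)."""
    pairs = [(row[i], row[j]) for row in matrix
             if row[i] is not None and row[j] is not None]
    if not pairs:
        return None
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Rows: Joe, Tom, Alice, Bob; columns: Item1..Item5.
matrix = [
    [8, 1, None, 2, 7],
    [2, None, 5, 7, 5],
    [5, 4, 7, 4, 7],
    [7, 1, 7, 3, 8],
]

print(round(item_distance(matrix, 0, 1), 1))  # 4.7
print(round(item_distance(matrix, 0, 2), 1))  # 1.7

# Precompute all pairwise item distances once (e.g. in a nightly job).
n_items = len(matrix[0])
distances = {(i, j): item_distance(matrix, i, j)
             for i in range(n_items) for j in range(i + 1, n_items)}
```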

As with user-user, use the whole matrix or identify neighbors

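Cosine similarity

[Figure: the rating vectors [3,5] and [2,7] drawn from the origin [0,0]; their similarity is the cosine of the angle between them]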
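A minimal cosine-similarity sketch for the two vectors on the slide follows. Unlike the mean absolute difference, higher means more similar. Returning 0.0 for a zero vector is an assumption, since the cosine is undefined there.

```python
import math


def cosine_similarity(a, b):
    """cos(angle) = (a · b) / (|a| |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as "no similarity"
    return dot / (norm_a * norm_b)


print(round(cosine_similarity([3, 5], [2, 7]), 3))  # 0.966
```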

Our domain

Domain: online book shop, both paper and digital

Recommend titles, both old and new

- Who bought this also bought

- You might like

Choosing the tool

PredictionIO

Under the Apache umbrella

Based on a solid open-source stack (built on Apache Spark)

Customizable engine templates

SDK for PHP

Installation

http://actionml.com/docs/pio_by_actionml

Pre-baked Amazon AMIs

Installation via source code

http://predictionio.incubator.apache.org/install/install-sourcecode/

You can choose storage

MySQL/PostgreSQL, or Elasticsearch + HBase

The event server

Pattern: user -- action -- item

User 1 purchased product X

User 2 viewed product Y

User 1 added product Z to the cart

$ pio app new MyApp1

[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$] Name: MyApp1
[INFO] [App$] ID: 1
[INFO] [App$] Access Key: 3mZWDzci2D5YsqAnqNnXH9SB6Rg3dsTBs8iHkK6X2i54IQsIZI1eEeQQyMfs7b3F

$ pio eventserver

Server runs on port 7070 by default

$ curl -i -X GET http://localhost:7070

{"status":"alive"}

$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
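Events are recorded by POSTing JSON to the same /events.json endpoint, with the access key as a query parameter. The sketch below builds such a request for "User 1 purchased product X", using the Event API field names. The helper name and the access-key placeholder are hypothetical, and nothing is sent unless urlopen is called.

```python
import json
import urllib.parse
import urllib.request


def build_event_request(access_key, event, base_url="http://localhost:7070"):
    """Build (but do not send) a POST to the event server's /events.json."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# user -- action -- item: "User 1 purchased product X"
event = {
    "event": "buy",
    "entityType": "user",
    "entityId": "1",
    "targetEntityType": "item",
    "targetEntityId": "X",
}
request = build_event_request("YOUR_ACCESS_KEY", event)
# With the event server running: urllib.request.urlopen(request)
```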

Events modeling

what can/should we model?

rate, like, buy, view, depending on the algorithm

$set, $unset and $delete

_pio* are reserved

setUser($uid, array $properties=array(), $eventTime=null)

unsetUser($uid, array $properties, $eventTime=null)

deleteUser($uid, $eventTime=null)

setItem($iid, array $properties=array(), $eventTime=null)

unsetItem($iid, array $properties, $eventTime=null)

deleteItem($iid, $eventTime=null)

recordUserActionOnItem($event, $uid, $iid, array $properties=array(), $eventTime=null)

createEvent(array $data)

getEvent($eventId)
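The sketch below is a rough Python equivalent of what these SDK calls send. It assumes they map onto Event API payloads roughly like this: setUser becomes a $set event on a user entity, deleteItem a $delete event on an item, and recordUserActionOnItem a user event targeting an item. The helper names are hypothetical and not part of the PHP SDK.

```python
from datetime import datetime, timezone


def entity_event(name, entity_type, entity_id, properties=None, event_time=None):
    """Build an Event API payload; eventTime defaults to now (UTC, ISO 8601)."""
    if name == "$unset" and not properties:
        # Mirrors unsetUser/unsetItem above, where $properties is required.
        raise ValueError("$unset needs the properties to remove")
    event = {"event": name, "entityType": entity_type, "entityId": str(entity_id)}
    if properties:
        event["properties"] = properties
    event["eventTime"] = (event_time or datetime.now(timezone.utc)).isoformat()
    return event


def set_user(uid, properties=None, event_time=None):
    return entity_event("$set", "user", uid, properties, event_time)


def delete_item(iid, event_time=None):
    return entity_event("$delete", "item", iid, event_time=event_time)


def user_action_on_item(action, uid, iid, properties=None, event_time=None):
    event = entity_event(action, "user", uid, properties, event_time)
    event["targetEntityType"] = "item"
    event["targetEntityId"] = str(iid)
    return event


when = datetime(2017, 4, 13, 10, 0, tzinfo=timezone.utc)
print(user_action_on_item("buy", 1, "X", event_time=when))
```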

Engines

DASE architecture

Data Source and Preparation

Algorithm

Serving

Evaluation
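PredictionIO engines are written in Scala. The Python sketch below only illustrates how the four DASE parts hand data to one another, using a toy "most popular" algorithm and a simple hit-rate metric for evaluation. The class and method names are hypothetical and only loosely modeled on the component names above.

```python
from collections import Counter


class DataSource:
    """D: reads raw events, here an in-memory list of (event, user, item)."""
    def __init__(self, events):
        self.events = events

    def read_training(self):
        return list(self.events)


class Preparator:
    """Keeps only the event names the engine is configured to use."""
    def __init__(self, event_names=("buy", "view")):
        self.event_names = set(event_names)

    def prepare(self, events):
        return [(user, item) for name, user, item in events if name in self.event_names]


class Algorithm:
    """A: toy popularity model, items ranked by event count."""
    def train(self, prepared):
        return Counter(item for _, item in prepared)

    def predict(self, model, query):
        # This toy model ignores the query and returns every item, most popular first.
        return [item for item, _ in model.most_common()]


class Serving:
    """S: post-processes predictions (drops excluded items, keeps the top `num`)."""
    def serve(self, query, ranked):
        excluded = set(query.get("exclude", []))
        return [item for item in ranked if item not in excluded][: query["num"]]


def hit_rate(recommended, relevant):
    """E: share of relevant items that appear among the recommendations."""
    return len(set(recommended) & set(relevant)) / len(relevant) if relevant else 0.0


events = [("buy", "u1", "book1"), ("view", "u2", "book1"),
          ("view", "u1", "book2"), ("rate", "u3", "book3")]
model = Algorithm().train(Preparator().prepare(DataSource(events).read_training()))
query = {"num": 2, "exclude": ["book1"]}
print(Serving().serve(query, Algorithm().predict(model, query)))  # ['book2']
```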

$ pio template get apache/incubator-predictionio-template-recommender MyRecommendation

$ cd MyRecommendation

engine.json

"datasource": { "params" : { "appName": "MyApp1", "eventNames": ["buy", "view"] } },

$ pio build --verbose

$ pio train

$ pio deploy

Getting recommendations
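Once deployed, the engine answers HTTP queries, on port 8000 by default. The sketch below assumes the recommendation template's query format ({"user": ..., "num": ...}) and its itemScores response. The helper names and the sample response are made up for illustration.

```python
import json
import urllib.request


def build_query(user_id, num=4, base_url="http://localhost:8000"):
    """Build (but do not send) a query for `num` recommendations for a user."""
    body = json.dumps({"user": str(user_id), "num": num}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def item_ids(response_text):
    """Extract item IDs, in the order returned, from an itemScores response."""
    return [entry["item"] for entry in json.loads(response_text)["itemScores"]]


request = build_query(1)
# With the engine deployed: response_text = urllib.request.urlopen(request).read()
sample = '{"itemScores": [{"item": "22", "score": 4.07}, {"item": "62", "score": 4.05}]}'
print(item_ids(sample))  # ['22', '62']
```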

Implementation

Two kinds of suggestions

- who bought this also bought (similarities)

- you may like (recommendation)

View

Like (add to basket, add to wishlist)

Conversion (buy)

Recorded in batch

4 engines

2 for books, 2 for ebooks

(not needed now)

Retrained every night with new data

recordLike($user, array $item)

recordConversion($user, array $item)

recordView($user, array $item)

createUser($uid)

getRecommendation($uid, $itype, $n = self::N_SUGGESTION)

getSimilarity($iid, $itype, $n = self::N_SUGGESTION)

user cold start/item cold start

If we don't get enough suggestions, switch to non-personalized ones (best sellers); non-logged-in users get these too
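The fallback can be sketched as follows. It assumes the personalized suggestions and the best sellers are both plain lists of item IDs. The function name is hypothetical.

```python
def with_fallback(personalized, best_sellers, n):
    """Return up to n items: personalized first, topped up with best sellers (no duplicates)."""
    result = []
    for item in list(personalized) + list(best_sellers):
        if len(result) == n:
            break
        if item not in result:
            result.append(item)
    return result


print(with_fallback(["b7"], ["b1", "b7", "b3"], 3))  # ['b7', 'b1', 'b3']
print(with_fallback([], ["b1", "b2", "b3"], 2))      # ['b1', 'b2'] (anonymous user)
```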

Michele Orselli CTO@Ideato

_orso_

micheleorselli / ideatosrl

mo@ideato.it

https://joind.in/talk/93d2d

Links

• http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine?next_slideshow=1

• https://www.coursera.org/learn/recommender-systems-introduction

• http://actionml.com/

• https://github.com/grahamjenson/ger