Recommender Systems

RECOMMENDATION SYSTEMS Usman Sharif

Transcript of Recommender Systems

Page 1: Recommender Systems

RECOMMENDATION SYSTEMS

Usman Sharif

Page 2: Recommender Systems

Why recommendation systems?

Provide a better experience to your users.

Understand the behavior and patterns of users.

Create opportunities to re-engage inactive users.

Boost sales.

Can work better than a search feature, because users do not have to know what to ask for.

Page 3: Recommender Systems

How some companies are using Recommendation Systems - Amazon

Page 4: Recommender Systems

How some companies are using Recommendation Systems - Gmail

Page 5: Recommender Systems

A simple recommendation system

Consider the following scenario:

A library has books and members.

Members can have books issued.

The library wants to build a recommender system to recommend books to its members.

Page 6: Recommender Systems

Scoring Matrices

Book 1 Book 2 Book 3 Book 4

User 1 X X

User 2 X

User 3 X X

User 4 X X X

User 5 X X

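Book-to-book counts (each cell is the number of users who borrowed both books; the diagonal is the number of users who borrowed that book):

        Book 1  Book 2  Book 3  Book 4
Book 1  4       1       2       1
Book 2  1       2       0       1
Book 3  2       0       2       1
Book 4  1       1       1       2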

Page 7: Recommender Systems

Using the scoring matrices

If a user has read Book 1, recommend Books 3, 2, 4.

If a user has read Book 2, recommend Books 1, 4, 3.

If a user has read Book 3, recommend Books 1, 4, 2.

If a user has read Book 4, recommend Books 1, 2, 3.
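The scoring step above can be sketched in a few lines of Python. The positions of the X marks in the user-book matrix are not recoverable from this transcript, so the `borrowed` dictionary below is an assumption: one assignment that is consistent with the book-to-book counts shown. The function names are illustrative only.

```python
from itertools import product

# Assumed borrowing history (user -> set of books). Chosen to match the
# book-to-book counts on the slide; not taken from the slide itself.
borrowed = {
    1: {1, 2},
    2: {1},
    3: {1, 3},
    4: {1, 3, 4},
    5: {2, 4},
}
books = [1, 2, 3, 4]


def cooccurrence(borrowed, books):
    """Count, for each ordered pair of books, how many users borrowed both."""
    counts = {(a, b): 0 for a, b in product(books, repeat=2)}
    for user_books in borrowed.values():
        for a, b in product(user_books, repeat=2):
            counts[(a, b)] += 1
    return counts


def recommend(counts, books, read):
    """Rank the other books by co-occurrence with `read`, highest first."""
    others = [b for b in books if b != read]
    # sorted() is stable, so tied books keep ascending book order.
    return sorted(others, key=lambda b: -counts[(read, b)])


counts = cooccurrence(borrowed, books)
for book in books:
    print(f"Read Book {book}: recommend {recommend(counts, books, book)}")
```

Ties (for example, every book in the Book 4 row has a count of 1) are broken here by book number, which is a choice made for this sketch.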

Page 8: Recommender Systems

Advantages

Very simple to understand and implement.

Works well when a single user activity (for example, the one book a user has just read) is used to recommend further items.

Page 9: Recommender Systems

Disadvantages

Cannot work for a new user with no history.

In a real-world scenario with thousands of books and thousands of members, most cells are bound to be zero (a sparse matrix).

Does not consider more than one item from the user’s history at a time.

Page 10: Recommender Systems

Another Try

Our Books records might look like this:

BookId  Title                   Genre          Writer              Language
1       The Great Gatsby        Classic        F Scott Fitzgerald  English
2       Nine Stories            Short Stories  J D Salinger        English
3       The Sun Also Rises      Classic        Ernest Hemingway    English
4       The Hunger Games        Action         Suzanne Collins     English
5       The Ambler Warning      Thriller       Robert Ludlum       English
6       The Catcher in the Rye  Classic        J D Salinger        English
7       To Kill a Mockingbird   Classic        Harper Lee          English

Page 11: Recommender Systems

Create an Item Similarity Matrix

        Book 1  Book 2  Book 3  Book 4  Book 5  Book 6  Book 7
Book 1  3       1       2       1       1       2       2
Book 2  1       3       1       1       1       2       1
Book 3  2       1       3       1       1       2       2
Book 4  1       1       1       3       1       1       1
Book 5  1       1       1       1       3       1       1
Book 6  2       2       2       1       1       3       2
Book 7  2       1       2       1       1       2       3

• This would always be a square (n x n) matrix.

• Each cell has the count of matching attributes, excluding attributes that are unique to each book (here Genre, Writer and Language are compared; BookId and Title are not).

• In general, any measure of similarity can be used here.
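As a minimal sketch, the similarity matrix can be computed by counting matching attributes for every pair of books. The attribute values come from the Books records; the `similarity` helper and variable names are assumptions made for illustration.

```python
# Book attributes as (Genre, Writer, Language). BookId and Title are unique
# per book, so they are left out of the comparison.
books = {
    1: ("Classic", "F Scott Fitzgerald", "English"),
    2: ("Short Stories", "J D Salinger", "English"),
    3: ("Classic", "Ernest Hemingway", "English"),
    4: ("Action", "Suzanne Collins", "English"),
    5: ("Thriller", "Robert Ludlum", "English"),
    6: ("Classic", "J D Salinger", "English"),
    7: ("Classic", "Harper Lee", "English"),
}


def similarity(a, b):
    """Number of attribute positions on which the two books agree."""
    return sum(x == y for x, y in zip(a, b))


ids = sorted(books)
matrix = [[similarity(books[i], books[j]) for j in ids] for i in ids]

for i, row in zip(ids, matrix):
    print(f"Book {i}:", *row)
```

Any other similarity measure could be swapped in by replacing `similarity`; the rest of the sketch stays the same.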

Page 12: Recommender Systems

To Recommend

Look at what a user has previously read.

Use the values from the similarity matrix and recommend books based on how similar they are to the books the user has already read.
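One way to turn this into code is sketched below. The matrix is the item similarity matrix for Books 1 to 7. Summing similarities when the user has read more than one book is an assumption of this sketch, as are the `recommend` function and its `top_n` parameter.

```python
# Item similarity matrix (rows and columns are Books 1-7).
SIMILARITY = [
    [3, 1, 2, 1, 1, 2, 2],
    [1, 3, 1, 1, 1, 2, 1],
    [2, 1, 3, 1, 1, 2, 2],
    [1, 1, 1, 3, 1, 1, 1],
    [1, 1, 1, 1, 3, 1, 1],
    [2, 2, 2, 1, 1, 3, 2],
    [2, 1, 2, 1, 1, 2, 3],
]


def recommend(history, top_n=3):
    """Rank unread books by their summed similarity to the books in `history`."""
    scores = {}
    for candidate in range(1, len(SIMILARITY) + 1):
        if candidate in history:
            continue  # never recommend a book the user has already read
        scores[candidate] = sum(
            SIMILARITY[read - 1][candidate - 1] for read in history
        )
    # Highest score first; ties are broken by book number.
    ranked = sorted(scores, key=lambda book: (-scores[book], book))
    return ranked[:top_n]


print(recommend({3}))     # books most similar to Book 3
print(recommend({2, 6}))  # a user who has read Books 2 and 6
```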

Page 13: Recommender Systems

Advantages

Recommendations can be pre-computed for a very large Item base.

Fast lookups can be built to perform recommendations.

For example, if a user is viewing the page for Book 3, you may want to recommend Books 1, 6 and 7.

Works for new or non-registered users.

Page 14: Recommender Systems

Disadvantage

Does not consider the user’s history; instead, it looks at a collective trend.

Page 15: Recommender Systems

Another Approach - The Users

Our Users records might look like this:

UserId  Gender  Age  Location
1       Male    34   Pakistan
2       Female  28   Pakistan
3       Male    38   India
4       Male    32   India
5       Female  21   Pakistan
6       Female  24   Pakistan

Page 16: Recommender Systems

The User Borrowing

UserId  BookId
1       3
1       7
2       2
3       1
3       5
3       7
4       6
4       7
5       2
6       4
6       6
6       7

Page 17: Recommender Systems

Transforming User Borrowing

        User 1  User 2  User 3  User 4  User 5  User 6
Book 1  -       -       X       -       -       -
Book 2  -       X       -       -       X       -
Book 3  X       -       -       -       -       -
Book 4  -       -       -       -       -       X
Book 5  -       -       X       -       -       -
Book 6  -       -       -       X       -       X
Book 7  X       -       X       X       -       X

• Issue with too many zero values.

• Any solutions?
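A short sketch of this transformation, using the (UserId, BookId) pairs from the User Borrowing records, makes the sparsity concrete. Variable names are illustrative.

```python
# (UserId, BookId) pairs from the User Borrowing records.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]
users = range(1, 7)
books = range(1, 8)

borrowed = set(borrowings)
# One row per book, one column per user; 1 marks a borrowed book (an X).
matrix = [[1 if (u, b) in borrowed else 0 for u in users] for b in books]

zeros = sum(row.count(0) for row in matrix)
total = len(matrix) * len(matrix[0])
print(f"{zeros}/{total} cells are zero")
```

Even with six users and seven books, most of the cells are empty, and the proportion grows quickly as users and books are added.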

Page 18: Recommender Systems

Transform the Users Records

Consider Age as a discrete column with ranges like {0-10, 11-20, 21-30, 31-40, …} so that we can create some partitions like this:

PartitionId  Gender  AgeGroup  Location
1            Male    31-40     Pakistan
2            Female  21-30     Pakistan
3            Male    31-40     India
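The partitioning can be sketched as follows. The user records come from the Users table; the `age_group` helper and the rule of numbering partitions in order of first appearance are assumptions of this sketch that happen to reproduce the partition ids shown.

```python
# UserId -> (Gender, Age, Location), from the Users records.
users = {
    1: ("Male", 34, "Pakistan"),
    2: ("Female", 28, "Pakistan"),
    3: ("Male", 38, "India"),
    4: ("Male", 32, "India"),
    5: ("Female", 21, "Pakistan"),
    6: ("Female", 24, "Pakistan"),
}


def age_group(age):
    """Bucket an age into the ranges 0-10, 11-20, 21-30, 31-40, ..."""
    if age <= 10:
        return "0-10"
    low = (age - 1) // 10 * 10 + 1
    return f"{low}-{low + 9}"


partition_ids = {}   # (Gender, AgeGroup, Location) -> PartitionId
user_partition = {}  # UserId -> PartitionId
for user_id, (gender, age, location) in users.items():
    key = (gender, age_group(age), location)
    # New combinations get the next free id, in order of first appearance.
    partition_ids.setdefault(key, len(partition_ids) + 1)
    user_partition[user_id] = partition_ids[key]

print(partition_ids)
print(user_partition)
```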

Page 19: Recommender Systems

Recreate User Borrowing using Partition Information

Fewer zero-valued cells (11/21 compared with 30/42 previously).

Far fewer columns than we previously had!

The notation has been changed from ‘X’ to a count.

        Partition 1  Partition 2  Partition 3
Book 1  0            0            1
Book 2  0            2            0
Book 3  1            0            0
Book 4  0            1            0
Book 5  0            0            1
Book 6  0            1            1
Book 7  1            1            2
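The partition-level counts can be reproduced with a small sketch. The user-to-partition mapping below is derived from the Users records and the partition definitions (Users 2, 5 and 6 fall in Partition 2, Users 3 and 4 in Partition 3); variable names are illustrative.

```python
from collections import Counter

# UserId -> PartitionId, derived from the Users records and the partitions.
user_partition = {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 2}

# (UserId, BookId) pairs from the User Borrowing records.
borrowings = [
    (1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
    (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7),
]

# Count borrowings per (book, partition); missing keys count as 0.
counts = Counter((book, user_partition[user]) for user, book in borrowings)

partitions = (1, 2, 3)
matrix = [[counts[(book, p)] for p in partitions] for book in range(1, 8)]

zeros = sum(row.count(0) for row in matrix)
print(f"{zeros}/{len(matrix) * len(partitions)} cells are zero")
for book, row in enumerate(matrix, start=1):
    print(f"Book {book}:", *row)
```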

Page 20: Recommender Systems

To Recommend

See what partition a user belongs to.

Look at the column of that partition and sort the books in descending order based on their frequency count.
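A minimal sketch of this lookup is given below. The counts are the non-zero cells of the partition table; skipping books the user has already borrowed and breaking ties by book number are assumptions of this sketch, not something the slides specify.

```python
# PartitionId -> {BookId: borrow count}; only non-zero cells are stored.
COUNTS = {
    1: {3: 1, 7: 1},
    2: {2: 2, 4: 1, 6: 1, 7: 1},
    3: {1: 1, 5: 1, 6: 1, 7: 2},
}


def recommend(partition, already_read=(), top_n=3):
    """Return the most borrowed books in the user's partition."""
    column = COUNTS[partition]
    candidates = [book for book in column if book not in already_read]
    # Highest count first; ties are broken by book number.
    return sorted(candidates, key=lambda book: (-column[book], book))[:top_n]


print(recommend(2))                    # a new user in Partition 2
print(recommend(3, already_read={7}))  # a Partition 3 user who has read Book 7
```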

Page 21: Recommender Systems

Advantages

Continues to improve over time.

More partitions can be added over time.

Instead of using a collective score, the technique partitions the user base into ‘similar’ users.

The technique can easily be extended to the item side: rather than having books as rows, we can have book clusters.

Page 22: Recommender Systems

Disadvantages

Needs some seed data to start.

Requires some transformations.

Can become very complex as the number of users/items grows.

Page 23: Recommender Systems

Evaluating Performance (Metrics)

Almost any Information Retrieval metric can be used.

Three interesting ones:

Accuracy

Coverage

Normalized Distance Based Performance Measure (NDPM)

Page 24: Recommender Systems

Accuracy

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
2       7       1     No
2       5       2     Yes
3       3       1     No
3       7       2     No

• Takes into account the order in which recommendations are shown to users and how they responded to them.

• For rank position = 1:

• Acc(1) = (# of positive responses with rank less than or equal to 1) / (total recommendations with rank less than or equal to 1)

• Therefore, Acc(1) = 1 / 3 = 33.33%

• Similarly, Acc(2) = 2 / 6 = 33.33%
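The accuracy calculation can be checked with a short sketch. The rows are the ones in the table for this slide, with Yes/No encoded as booleans; `accuracy_at` is an illustrative name.

```python
# (UserId, BookId, Rank, Response) rows; True means the user responded Yes.
responses = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]


def accuracy_at(k, rows):
    """Share of positive responses among recommendations with rank <= k."""
    shown = [row for row in rows if row[2] <= k]
    positive = sum(1 for row in shown if row[3])
    return positive / len(shown)


for k in (1, 2):
    print(f"Acc({k}) = {accuracy_at(k, responses):.2%}")
```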

Page 25: Recommender Systems

Coverage

Shows the coverage of items that appear in the recommendations for all users.

For rank position = 1:

Cov(1) = (unique items in recommendations with rank less than or equal to 1) / (total items)

Therefore, Cov(1) = 2 / 7 = 28.57%

Similarly, Cov(2) = 4 / 7 = 57.14%

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
2       7       1     No
2       5       2     Yes
3       3       1     No
3       7       2     No
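Coverage can be sketched the same way. The rows are those in the coverage table, and the catalogue size of 7 comes from the Books records; `coverage_at` is an illustrative name.

```python
# (UserId, BookId, Rank, Response) rows; True means the user responded Yes.
responses = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]
TOTAL_ITEMS = 7  # number of books in the catalogue


def coverage_at(k, rows, total_items):
    """Share of the catalogue that appears in recommendations with rank <= k."""
    items = {book for _, book, rank, _ in rows if rank <= k}
    return len(items) / total_items


for k in (1, 2):
    print(f"Cov({k}) = {coverage_at(k, responses, TOTAL_ITEMS):.2%}")
```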

Page 26: Recommender Systems

Normalized Distance Based Performance Measure (NDPM)

Assesses the quality of a recommendation system, taking into account the order in which items are shown.

NDPM = (C- + 0.5 x C+) / Cu

Pairs are written as (response to the higher-ranked item, response to the lower-ranked item) and are counted per user.

C- is the number of recommended item pairs where the user responded (No, Yes).

C+ is the number of recommended item pairs where the user responded (Yes, No).

Cu is the number of all item pairs where the user’s responses were not the same.

UserId  BookId  Rank  Response
1       3       1     Yes
1       2       2     No
1       7       3     No
1       5       4     Yes
2       3       1     Yes
2       7       2     No

In our example, for User 1: C-(1) = 2, C+(1) = 2 and Cu(1) = 4 => NDPM(1) = (2 + 0.5 x 2) / 4 = 75%

For User 2: C-(2) = 0, C+(2) = 1 and Cu(2) = 1 => NDPM(2) = (0 + 0.5 x 1) / 1 = 50%

Averaged over both users: NDPM = (0.75 + 0.5) / 2 = 62.5%
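The worked example can be reproduced with the sketch below. It follows the formula exactly as stated on this slide, which may differ from other formulations of NDPM. The function names are illustrative, and the sketch assumes every user has at least one pair of differing responses (otherwise Cu would be zero).

```python
from itertools import combinations

# (UserId, BookId, Rank, Response) rows; True means the user responded Yes.
responses = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (1, 7, 3, False),
    (1, 5, 4, True),
    (2, 3, 1, True),
    (2, 7, 2, False),
]


def ndpm_for_user(rows):
    """NDPM for one user, using the slide's (C- + 0.5 x C+) / Cu."""
    ordered = [resp for _, _, _, resp in sorted(rows, key=lambda r: r[2])]
    c_minus = c_plus = 0
    # combinations() keeps rank order: `higher` is always shown before `lower`.
    for higher, lower in combinations(ordered, 2):
        if not higher and lower:
            c_minus += 1  # (No, Yes)
        elif higher and not lower:
            c_plus += 1   # (Yes, No)
    c_u = c_minus + c_plus  # pairs whose responses differ
    return (c_minus + 0.5 * c_plus) / c_u


def ndpm(rows):
    """Average the per-user NDPM values."""
    by_user = {}
    for row in rows:
        by_user.setdefault(row[0], []).append(row)
    scores = [ndpm_for_user(user_rows) for user_rows in by_user.values()]
    return sum(scores) / len(scores)


print(f"NDPM = {ndpm(responses):.1%}")
```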

Page 27: Recommender Systems

How to improve results

Maintain a list of recommendations each user has already seen, and do not recommend those items again for some time.

Give users a way to tell you what they are looking for.

Infer the above from user searches.
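The first suggestion, not repeating recently shown recommendations, can be sketched as a small in-memory tracker. The class name, its methods and the 30-day cool-down are all assumptions made for illustration; a real system would persist this state.

```python
from datetime import datetime, timedelta


class SeenRecommendations:
    """Remembers when each item was last shown to each user."""

    def __init__(self, cooldown=timedelta(days=30)):  # assumed cool-down
        self.cooldown = cooldown
        self._last_shown = {}  # (user_id, item) -> datetime last shown

    def filter(self, user_id, candidates, now):
        """Drop candidates shown to this user within the cool-down window."""
        fresh = []
        for item in candidates:
            shown = self._last_shown.get((user_id, item))
            if shown is None or now - shown >= self.cooldown:
                fresh.append(item)
        return fresh

    def mark_shown(self, user_id, items, now):
        """Record that these items were just shown to the user."""
        for item in items:
            self._last_shown[(user_id, item)] = now


tracker = SeenRecommendations()
start = datetime(2024, 1, 1)
tracker.mark_shown(user_id=1, items=[3], now=start)
print(tracker.filter(1, [3, 7], now=start + timedelta(days=10)))
print(tracker.filter(1, [3, 7], now=start + timedelta(days=30)))
```

Passing `now` explicitly keeps the sketch deterministic and easy to test.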

Page 28: Recommender Systems

Some standard algorithms

Item Hierarchy: You bought a printer, you will also need ink.

Attribute-based recommendations: You like reading classics, written by Salinger, you might like “Catcher in the Rye”.

Collaborative Filtering – User-User Similarity: People like you who read “The Hunger Games” also read “The Ambler Warning”.

Collaborative Filtering – Item-Item Similarity: You like “Catcher in the Rye” so you will like “Nine Stories”.

Social + Interest Graph Based: Your friends like “The Great Gatsby” so you will like “The Great Gatsby” too.

Model Based: Training SVM, LDA, SVD for implicit features.

Page 29: Recommender Systems

Some Tools

Apache Mahout (Java)

Crab (Python)

Easyrec (RESTful API)

Page 30: Recommender Systems

Questions??