Combining Content-based and Collaborative Filtering

Gabriela Polčicová, Pavol Návrat

Department of Computer Science and Engineering, Slovak University of Technology

polcicova@dcs.elf.stuba.sk

Overview

• Information Filtering and Its Types

• Combined Method

• Experiments with Information Filtering Methods

• Conclusions

Information Filtering (1)

• Definition

– delivery of relevant information to the people who need it

• Types of Information Filtering

– Content-based: for textual documents

– Collaborative: for communities of users

• Interests

– information about interests is stored in profiles

– users express opinions about documents as ratings

• Ratings $\{i, j, r_{ij}\}$

– for user $i$ and item $j$, $r_{ij}$ is the value of the rating

Information Filtering (2)

Filter (diagram):

– learning interests from rated items {user, item, value}

– estimating the value of rating for unrated items {user, item}

– choosing recommendations {user, item, estimation}
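The three steps of the filter can be sketched in Python as a small driver that receives the learning and estimating steps as functions. This is a minimal sketch, not the authors' implementation: the names `run_filter`, `learn_user_means` and `estimate_user_mean`, the per-user-mean estimator and the recommendation threshold are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rating = Tuple[str, str, float]          # {user, item, value}
Pair = Tuple[str, str]                   # {user, item}, not yet rated
Recommendation = Tuple[str, str, float]  # {user, item, estimation}


def run_filter(
    rated: List[Rating],
    unrated: List[Pair],
    learn: Callable[[List[Rating]], object],
    estimate: Callable[[object, str, str], Optional[float]],
    threshold: float,
) -> List[Recommendation]:
    model = learn(rated)                         # learning interests
    recommendations = []
    for user, item in unrated:
        value = estimate(model, user, item)      # estimating the value of rating
        if value is not None and value >= threshold:
            recommendations.append((user, item, value))  # choosing recommendations
    return recommendations


# Hypothetical toy learner: a user's "interest" is just their mean rating.
def learn_user_means(rated: List[Rating]) -> Dict[str, float]:
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for user, _, value in rated:
        sums[user] = sums.get(user, 0.0) + value
        counts[user] = counts.get(user, 0) + 1
    return {user: sums[user] / counts[user] for user in sums}


def estimate_user_mean(model: Dict[str, float], user: str, item: str) -> Optional[float]:
    return model.get(user)  # None for users without ratings: no estimate possible


rated = [("u1", "a", 5.0), ("u1", "b", 3.0), ("u2", "a", 2.0)]
unrated = [("u1", "c"), ("u2", "b"), ("u3", "a")]
recs = run_filter(rated, unrated, learn_user_means, estimate_user_mean, threshold=3.5)
print(recs)  # [('u1', 'c', 4.0)]
```

Letting the estimator return `None` models a method that cannot estimate a rating at all, which is what the Coverage metric counts.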

Content-based Filtering (1)

• Basic idea

– recommending documents based on their content and properties

• Profile

– consists of keywords with assigned weights

– only documents matching the profile are recommended

• Recommendations

– based on objectively measurable properties

Content-based Filtering (2)

Diagram:

– documents rated by the user (documents, ratings) → profile: keywords and phrases with weights

– documents unrated by the user that match the profile ⇒ recommended documents (documents of interest)

Collaborative Filtering (1)

• Basic idea

– automating “word of mouth”

– leveraging the opinions of like-minded users when making decisions

• Schema

– collecting users’ opinions

– searching for like-minded users

– making recommendations

Collaborative Filtering (2)

Diagram: the profile of the current user is compared with the profiles of users 1 to 5; documents from the profiles of like-minded users ⇒ recommended documents

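Collaborative Filtering (3)

• Similarity measure: Pearson Correlation Coefficient

$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \, \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$

• Recommendations computation: weighted sum of ratings

$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|}$$

where $c$ is the current user, $\bar{r}_c$ and $\bar{r}_i$ are the mean ratings of users $c$ and $i$, $I_{ci}$ is the set of items rated by both $c$ and $i$, and $U_{cj}$ is the set of users who rated item $j$.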
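Both formulas can be sketched in Python. Assumptions not stated on the slide: ratings are held in a dictionary per user, $\bar{r}_c$ and $\bar{r}_i$ are means over all of a user's ratings, and a pair of users with fewer than two common items gets $k_{ci} = 0$. The function names and the toy ratings are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def mean_rating(ratings: Ratings, user: str) -> float:
    row = ratings[user]
    return sum(row.values()) / len(row)


def pearson(ratings: Ratings, c: str, i: str) -> float:
    """k_ci over I_ci, the items rated by both users; 0.0 when it is undefined."""
    common = set(ratings[c]) & set(ratings[i])
    if len(common) < 2:
        return 0.0
    mc, mi = mean_rating(ratings, c), mean_rating(ratings, i)
    num = sum((ratings[c][j] - mc) * (ratings[i][j] - mi) for j in common)
    den = math.sqrt(sum((ratings[c][j] - mc) ** 2 for j in common)
                    * sum((ratings[i][j] - mi) ** 2 for j in common))
    return num / den if den else 0.0


def predict(ratings: Ratings, c: str, j: str) -> Optional[float]:
    """Estimate r_cj as c's mean plus the k_ci-weighted deviations of the users in U_cj."""
    num = den = 0.0
    for i, row in ratings.items():
        if i == c or j not in row:      # only other users who rated item j
            continue
        k = pearson(ratings, c, i)
        num += (row[j] - mean_rating(ratings, i)) * k
        den += abs(k)
    return mean_rating(ratings, c) + num / den if den else None  # None: no estimate


ratings = {
    "current": {"A": 2, "B": 4, "C": 6},
    "u1": {"A": 2, "B": 4, "C": 6, "E": 5, "F": 3},   # same tastes: k = 1
    "u2": {"A": 6, "B": 4, "C": 2, "E": 3, "F": 5},   # opposite tastes: k = -1
}
print(pearson(ratings, "current", "u1"), pearson(ratings, "current", "u2"))  # 1.0 -1.0
print(predict(ratings, "current", "E"))  # 5.0
```

Both neighbours push the estimate up: u1 liked E above its mean and agrees with the current user, while u2 rated E below its mean but has opposite tastes, so its negative coefficient flips the sign of its deviation.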

Combining Content-based and Collaborative Filtering (1)

• For each user, estimating the missing ratings with the content-based filtering method

• Searching for like-minded users

– computing the coefficient $k_{ci}$ between the current user and the i-th user from ratings only

– computing the coefficient $k'_{ci}$ between the current user and the i-th user from both ratings and estimates

• New recommendations computation

– a weighted sum of ratings and estimates, in which ratings are weighted by $k_{ci}$ and estimates by $k'_{ci}$

Datasets for Experiments

• Data:

– EachMovie: users' ratings of movies

www.research.digital.com/SRC/eachmovie/

– IMDB: textual information for CBF (movie descriptions)

www.imdb.com/

• Datasets:

– A: ratings up to Mar 1, 1996 (810 ratings from 71 users)

– B: ratings up to Mar 15, 1996 (2407 ratings from 131 users)

– C: ratings up to Apr 1, 1996 (12290 ratings from 651 users)
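Each dataset covers the rating stream up to a later cutoff date, so each one contains the previous. A minimal sketch of taking such snapshots and reporting them as on the slide, assuming every rating carries the date it was entered and that "up to" includes the cutoff day; the `Rating` record and the toy ratings are hypothetical, not EachMovie data.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Rating(NamedTuple):
    user: int
    item: int
    value: int
    entered: date  # hypothetical field: when the rating was given


def snapshot(ratings: List[Rating], cutoff: date) -> List[Rating]:
    # Assumption: "up to" includes ratings entered on the cutoff day itself.
    return [r for r in ratings if r.entered <= cutoff]


def describe(ratings: List[Rating]) -> Tuple[int, int]:
    """(number of ratings, number of distinct users), as reported per dataset."""
    return len(ratings), len({r.user for r in ratings})


toy = [
    Rating(1, 10, 5, date(1996, 2, 20)),
    Rating(2, 10, 3, date(1996, 3, 1)),
    Rating(2, 11, 4, date(1996, 3, 10)),
    Rating(3, 12, 6, date(1996, 3, 25)),
]
cutoffs = {"A": date(1996, 3, 1), "B": date(1996, 3, 15), "C": date(1996, 4, 1)}
for name, cutoff in cutoffs.items():
    print(name, describe(snapshot(toy, cutoff)))
# A (2, 2)
# B (3, 2)
# C (4, 3)
```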

EachMovie Data and Constant Method

Chart: percentage of ratings in EachMovie for each rating value (1 to 6), for datasets A, B and C

• Constant Method: $\hat{r}_{cj} = 5$
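The chart's percentages and the constant baseline are both short computations. A minimal sketch, assuming ratings on the 1 to 6 scale shown on the chart; the function names and the toy ratings are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rating_distribution(values: List[int], scale=range(1, 7)) -> Dict[int, float]:
    """Percentage of ratings with each value on the scale (0.0 for unused values)."""
    counts = Counter(values)
    return {v: 100.0 * counts[v] / len(values) for v in scale}


def constant_method(user: str, item: str) -> float:
    """Baseline that ignores both user and item and always estimates 5."""
    return 5.0


toy_ratings = [5, 5, 6, 4, 5, 3, 1, 5]
print(rating_distribution(toy_ratings))
# {1: 12.5, 2: 0.0, 3: 12.5, 4: 12.5, 5: 50.0, 6: 12.5}
print(constant_method("any user", "any movie"))  # 5.0
```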

Experiments with Combination of Content-based and Collaborative Filtering (2)

Diagram of the experiment:

– divide the dataset into a training set (90%) and a test set (10%)

– apply the content-based, collaborative and combined filtering methods (using the training and test sets) and the constant method (using the test set); each method produces recommendations

– evaluate the methods' performance
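The 90% / 10% division can be sketched as a seeded random split of the rating triples. The slide does not say whether the split was random or done per user; this sketch assumes a uniform random split, and `split_ratings` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_ratings(ratings: Sequence[T], test_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the ratings and cut off test_fraction of them as the test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)


train, test = split_ratings(range(810))  # dataset A has 810 ratings
print(len(train), len(test))             # 729 81
```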

Metrics

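• Coverage = percentage of items for which the method is able to compute estimates

• Precision $= \dfrac{|R \cap L|}{|R|}$

• Recall $= \dfrac{|R \cap L|}{|L|}$

• F-measure $= \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

• Accuracy $= \dfrac{|R \cap L| + |\bar{R} \cap \bar{L}|}{|L| + |\bar{L}|}$

• NMAE $= \dfrac{\sum |\hat{r}_{ij} - r_{ij}|}{n \cdot s}$

R: set of recommended items, L: set of liked items ($\bar{R}$, $\bar{L}$: their complements)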
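The metrics can be computed from test-set pairs of (true rating, estimate). This sketch rests on assumptions the slide leaves open: an item counts as liked when its true rating reaches `like_threshold` and as recommended when its estimate does, the set-based metrics use only the items the method could estimate, and $s$ in NMAE is taken to be the range of the rating scale. The name `evaluate` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def evaluate(pairs: List[Tuple[float, Optional[float]]],
             like_threshold: float, scale_range: float) -> Dict[str, float]:
    """pairs: (true rating, estimate or None) per test rating; needs >= 1 estimate."""
    covered = [(r, e) for r, e in pairs if e is not None]
    items = set(range(len(covered)))
    liked = {k for k, (r, _) in enumerate(covered) if r >= like_threshold}        # L
    recommended = {k for k, (_, e) in enumerate(covered) if e >= like_threshold}  # R
    hits = len(recommended & liked)                                  # |R ∩ L|
    rejections = len((items - recommended) & (items - liked))        # |R̄ ∩ L̄|
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "coverage": len(covered) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": (hits + rejections) / len(items),  # |L| + |L̄| = all covered items
        "nmae": sum(abs(e - r) for r, e in covered) / (len(covered) * scale_range),
    }


test_pairs = [(6, 5.5), (5, 4.0), (2, 5.0), (1, 1.5), (4, None)]
print(evaluate(test_pairs, like_threshold=5, scale_range=5))
# {'coverage': 0.8, 'precision': 0.5, 'recall': 0.5, 'f_measure': 0.5, 'accuracy': 0.5, 'nmae': 0.25}
```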

Results of Experiments

Charts: Coverage, Accuracy and F-measure for datasets A, B and C; series: CF, CBF, combined, constant

Conclusions

• Combining content-based and collaborative filtering might help in the initial phase, when only few ratings are available

Future work

• Weighting of coefficients

• Comparing the method with additional methods

Content-based Filtering - Vector Representation of Documents and Profiles

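• Dictionary: D = ( … , computer, … , learning, … , machine, … )

• Document $j$ (words: computer, machine, learning) is represented by a vector of TF-IDF weights over D:

$W_j = (0, \dots, 0, 0.5, 0, \dots, 0, 0.3, 0, \dots, 0, 0.2, 0, \dots, 0)$

• Profile of user $i$: $\mathit{profile}_i = \sum_{j=1}^{n} r_{ij} \, W_j$

• Similarity: $\mathrm{Sim}(W, \mathit{Profile}) = \dfrac{W \cdot \mathit{Profile}}{|W| \, |\mathit{Profile}|}$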
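The representation can be sketched in plain Python with sparse vectors stored as dictionaries. The slide does not fix a TF-IDF variant; this sketch assumes raw term counts times $\log(N/df)$, which gives weight 0 to words occurring in every document. Function names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: word -> weight


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))  # document frequency
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
            for doc in docs]


def profile(vectors: List[Vector], ratings: List[float]) -> Vector:
    """profile_i = sum over rated documents j of r_ij * W_j."""
    result: Vector = {}
    for w_j, r in zip(vectors, ratings):
        for word, weight in w_j.items():
            result[word] = result.get(word, 0.0) + r * weight
    return result


def cosine(a: Vector, b: Vector) -> float:
    """Normalized dot product Sim(W, Profile)."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


docs = [["computer", "machine", "learning"], ["cooking", "recipes"],
        ["machine", "learning", "theory"], ["cooking", "wine"]]
vectors = tfidf_vectors(docs)
user_profile = profile(vectors[:2], [5, 1])  # the user rated documents 0 and 1
for j in (2, 3):
    print(j, round(cosine(vectors[j], user_profile), 3))
# 2 0.328
# 3 0.036
```

Because the profile sums rating-weighted vectors, a poorly rated document still contributes with a small positive weight; the slide's formula does not subtract a mean rating.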

Collaborative Filtering - Example

A B C D E F G

current 1 4 5

1 3 5 1 2

2 1 3 2 5

3 5 1 4 5

4 1 4 2 4

5 2 4 2 5

2

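Combining Content-based and Collaborative Filtering (2)

• Similarity measure: Pearson Correlation Coefficient

$$k'_{ci} = \frac{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}'_c)(r^{CBF}_{ij} - \bar{r}'_i)}{\sqrt{\sum_{j \in I'_{ci}} (r^{CBF}_{cj} - \bar{r}'_c)^2 \, \sum_{j \in I'_{ci}} (r^{CBF}_{ij} - \bar{r}'_i)^2}}$$

• Recommendations computation: weighted sum of ratings and estimates

$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|}$$

where $r^{CBF}$ denotes ratings together with CBF estimates, $\bar{r}'$ the corresponding mean, and the primed sets $I'_{ci}$ and $U'_{cj}$ are taken over ratings and estimates.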
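The combined estimate can be sketched in Python. Several details are assumptions made for the sketch rather than statements from the slides: $U'_{cj}$ is taken to be the users who have only a CBF estimate (no rating) for item $j$, so no user is counted twice; $\bar{r}_c$ and $\bar{r}_i$ are means over actual ratings, while $k'_{ci}$ uses means over ratings completed with estimates; and every user has at least one rating. The CBF estimates are passed in as input. All names and the toy data are hypothetical.

```python
import math
from typing import Dict, Optional

Table = Dict[str, Dict[str, float]]  # user -> {item: rating or estimate}


def mean(row: Dict[str, float]) -> float:
    return sum(row.values()) / len(row)


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson coefficient over the items present in both rows (0.0 if undefined)."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a), mean(b)
    num = sum((a[j] - ma) * (b[j] - mb) for j in common)
    den = math.sqrt(sum((a[j] - ma) ** 2 for j in common)
                    * sum((b[j] - mb) ** 2 for j in common))
    return num / den if den else 0.0


def complete(ratings: Table, estimates: Table) -> Table:
    """r^CBF: each user's ratings, with CBF estimates filling the unrated items."""
    return {u: {**estimates.get(u, {}), **ratings[u]} for u in ratings}


def predict_combined(ratings: Table, estimates: Table, c: str, j: str) -> Optional[float]:
    completed = complete(ratings, estimates)
    num = den = 0.0
    for i in ratings:
        if i == c:
            continue
        if j in ratings[i]:                    # i in U_cj: rating weighted by k_ci
            k = pearson(ratings[c], ratings[i])
            num += (ratings[i][j] - mean(ratings[i])) * k
            den += abs(k)
        elif j in estimates.get(i, {}):        # i in U'_cj: estimate weighted by k'_ci
            k_prime = pearson(completed[c], completed[i])
            num += (estimates[i][j] - mean(ratings[i])) * k_prime
            den += abs(k_prime)
    return mean(ratings[c]) + num / den if den else None


ratings = {
    "current": {"A": 2, "B": 4, "C": 6},
    "u1": {"A": 2, "B": 4, "C": 6, "E": 5, "F": 3},
    "u2": {"A": 3, "B": 5, "C": 7},
}
estimates = {"u2": {"E": 7.0, "F": 3.0}}  # hypothetical CBF estimates
print(predict_combined(ratings, estimates, "current", "E"))  # 5.5
```

Without the estimates only u1 rated E, and the result would be $4 + 1/1 = 5.0$; the CBF estimate lets u2, who has not rated E, contribute as a second like-minded user.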

Experiments with Combination of Content-based and Collaborative Filtering (1)

• Content-based Filtering Method (CBF)

– documents and profiles: vector representation - weighted keywords (TF-IDF)

– estimation computation: normalized dot product of document and profile vectors

• Collaborative Filtering (CF)

– Pearson correlation coefficient

– weighted sum of ratings

• Combination of CF and CBF

– Pearson correlation coefficients

– weighted sum of ratings and CBF estimates

• Constant Method ($\hat{r}_{cj} = 5$)