Combining Content-based and Collaborative Filtering
Department of Computer Science and Engineering, Slovak University of Technology
Gabriela Polčicová, Pavol Návrat
Overview
• Information Filtering and its Types
• Combined Method
• Experiment with Information Filtering Methods
• Conclusions
Information Filtering (1)
– delivery of relevant information to the people who need it
• Types of Information Filtering
– Content-based - for textual documents
– Collaborative - for communities of users
• Interests
– information about a user's interests is stored in a profile
– opinions about documents are expressed as ratings
• Ratings {i, j, rij}
– a triple for user i and item j, where rij is the value of the rating
Information Filtering (2)
[Diagram: the filter takes rated items {user, item, value} and unrated items {user, item} as input and produces recommendations {user, item, estimation} in three steps]
– learning interests
– estimating the value of the rating
– choosing recommendations
Content-based Filtering (1)
• Basic idea
– recommending documents based on content and properties of document
• Profile
– consists of keywords with assigned weights
– only documents matching profile are recommended
• Recommendations
– based on objective measurable properties
Content-based Filtering (2)
[Diagram: the documents rated by the user (documents, ratings) identify the documents of interest, from which the PROFILE (keywords and phrases with weights) is built. Documents unrated by the user that match the profile become the recommended documents.]
Collaborative Filtering (1)
• Basic idea
– automating “word of mouth”
– leverage opinions of like-minded users while making decisions
• Schema
– collecting users’ opinions
– searching for like-minded users
– making recommendations
Collaborative Filtering (2)
[Diagram: the profile of the current user is compared with the profiles of users 1 to 5. Documents from the like-minded users' profiles become the recommended documents.]
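Collaborative Filtering (3)
• Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
• Recommendations computation: weighted sum of ratings
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci}}{\sum_{i \in U_{cj}} |k_{ci}|}$$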
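The Pearson coefficient kci and the weighted sum of ratings can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the nested-dictionary layout are hypothetical, the index set of kci is taken to be the items rated by both users, and each user's mean is taken over all of that user's ratings, which the slides do not specify.

```python
from math import sqrt

def mean_rating(ratings, user):
    """Mean of all ratings given by one user (assumption: not only common items)."""
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(ratings, c, i):
    """Coefficient k_ci over the items rated by both c and i; 0.0 if undefined."""
    common = set(ratings[c]) & set(ratings[i])
    if not common:
        return 0.0
    mc, mi = mean_rating(ratings, c), mean_rating(ratings, i)
    num = sum((ratings[c][j] - mc) * (ratings[i][j] - mi) for j in common)
    den = sqrt(sum((ratings[c][j] - mc) ** 2 for j in common)
               * sum((ratings[i][j] - mi) ** 2 for j in common))
    return num / den if den else 0.0

def predict(ratings, c, j):
    """Estimate for user c and item j as a weighted sum over users who rated j."""
    mc = mean_rating(ratings, c)
    num = den = 0.0
    for i in ratings:
        if i == c or j not in ratings[i]:
            continue
        k = pearson(ratings, c, i)
        num += (ratings[i][j] - mean_rating(ratings, i)) * k
        den += abs(k)
    # None signals that no estimate can be computed (no usable neighbour).
    return mc + num / den if den else None

# Hypothetical toy data: user -> {item: rating}
example = {"c": {"A": 1, "B": 5}, "u": {"A": 2, "B": 4, "C": 6}}
estimate = predict(example, "c", "C")
```

Returning None when no neighbour contributes makes it easy to count the items a method cannot cover.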
Combining Content-based and Collaborative Filtering (1)
• Computing estimates for the missing ratings of each user with the Content-based Filtering method
• Searching for like-minded users
– computing coefficient kci between the current and the i-th user (from ratings only)
– computing coefficient kci’ between the current and the i-th user (from both ratings and estimates)
• New recommendations computation
– a weighted sum of ratings and estimates, in which ratings are weighted by the coefficients kci and estimates by the coefficients kci’
Datasets for Experiments
• Data:
– EachMovie - users' ratings for movies
www.research.digital.com/SRC/eachmovie/
– IMDB - textual information for CBF (movies' descriptions)
www.imdb.com/
• Datasets:
– A - ratings from the period up to Mar 1, 1996
(810 ratings from 71 users)
– B - ratings from the period up to Mar 15, 1996
(2407 ratings from 131 users)
– C - ratings from the period up to Apr 1, 1996
(12290 ratings from 651 users)
EachMovie Data and Constant Method
[Figure: percentage of ratings in EachMovie for each rating value (1 to 6) in datasets A, B and C; vertical axis from 0% to 45%]
• Constant Method rcj = 5
Experiments with Combination of Content-based and Collaborative Filtering (2)
[Diagram: experimental procedure]
– the dataset is divided into a training set (90%) and a test set (10%)
– the Content-based, Collaborative and Combined Filtering methods compute recommendations from the training and test sets
– the Constant method computes recommendations for the test set
– the filtering methods are applied and their performance is evaluated
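The 90%/10% division can be sketched as below. The slides do not say how ratings were assigned to the two sets, so the random shuffle, the seed and the function name are assumptions made only for illustration.

```python
import random

def split_ratings(ratings, test_fraction=0.1, seed=0):
    """Split a list of (user, item, value) triples into (training, test) lists."""
    shuffled = list(ratings)               # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # assumption: uniformly random split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: 100 rating triples
data = [(u, j, 1 + (u + j) % 6) for u in range(10) for j in range(10)]
training, test = split_ratings(data)
```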
Metrics
• Coverage = percentage of items for which the method is able to compute estimates
• Precision = $|R \cap L| \,/\, |R|$
• Recall = $|R \cap L| \,/\, |L|$
• Accuracy = $(|R \cap L| + |\bar{R} \cap \bar{L}|) \,/\, (|L| + |\bar{L}|)$
• F-measure = $2 \cdot Precision \cdot Recall \,/\, (Precision + Recall)$
• NMAE = $\sum |r_{ij} - \hat{r}_{ij}| \,/\, (n \cdot s)$
R - set of recommended items, L - set of liked items
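A minimal Python sketch of these metrics, using sets for R and L. The function names are hypothetical. The slide does not define n and s; the sketch assumes n is the number of test ratings and s is the width of the rating scale, and it treats the complements in Accuracy as relative to an explicit set of all test items.

```python
def precision(recommended, liked):
    """|R ∩ L| / |R|; 0.0 when nothing is recommended."""
    return len(recommended & liked) / len(recommended) if recommended else 0.0

def recall(recommended, liked):
    """|R ∩ L| / |L|; 0.0 when nothing is liked."""
    return len(recommended & liked) / len(liked) if liked else 0.0

def f_measure(recommended, liked):
    """Harmonic mean of precision and recall."""
    p, r = precision(recommended, liked), recall(recommended, liked)
    return 2 * p * r / (p + r) if p + r else 0.0

def accuracy(recommended, liked, all_items):
    """Share of items that are either recommended and liked, or neither."""
    not_recommended = all_items - recommended
    not_liked = all_items - liked
    hits = len(recommended & liked) + len(not_recommended & not_liked)
    return hits / len(all_items)

def nmae(actual, estimated, scale):
    """Mean absolute error divided by the assumed scale width s."""
    total = sum(abs(a - e) for a, e in zip(actual, estimated))
    return total / (len(actual) * scale)

# Hypothetical example sets
R = {"a", "b", "c"}
L = {"b", "c", "d"}
ALL = {"a", "b", "c", "d", "e"}
```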
Results of Experiments
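[Figure: four bar charts comparing the CF, CBF, combined and constant methods on datasets A, B and C. Panel titles as given in the source: Coverage (axis 0.8 to 1), Accuracy (axis 0.7 to 0.9), F-measure (axis 0.8 to 1), F-measure (axis 0.8 to 1). The plotted values are not recoverable.]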
Conclusions
• Combining content-based and collaborative filtering may help in the initial phase, when few ratings are available
Future work
• Weighting of coefficients
• Comparing the combined method with additional methods
Content-based Filtering - Vector Representation of Documents and Profiles
• Dictionary: D = ( … , computer, … , learning, … , machine, … )
• Documentj: "computer machine learning"
• Document vector of TF-IDF weights: Wj = (0, … , 0, 0.5, 0, … , 0, 0.3, 0, … , 0, 0.2, 0, … , 0)
• Profile: $profile_i = \sum_{j=1}^{n} r_j \cdot w_{ij}$
• Similarity: $Sim(W, Profile) = \frac{W \cdot Profile}{|W| \cdot |Profile|}$
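The profile construction and the similarity computation can be sketched as follows. The sketch assumes the TF-IDF document vectors are already computed and share one dictionary order; the function names and the toy numbers are hypothetical.

```python
from math import sqrt

def build_profile(doc_vectors, doc_ratings):
    """Profile component i is the sum over rated documents j of r_j * w_ij."""
    n_terms = len(doc_vectors[0])
    return [sum(r * w[i] for r, w in zip(doc_ratings, doc_vectors))
            for i in range(n_terms)]

def similarity(w, profile):
    """Normalized dot product of a document vector and a profile vector."""
    dot = sum(a * b for a, b in zip(w, profile))
    norm = sqrt(sum(a * a for a in w)) * sqrt(sum(b * b for b in profile))
    return dot / norm if norm else 0.0

# Hypothetical data: two rated documents over a three-keyword dictionary
docs = [[0.5, 0.3, 0.2], [0.0, 1.0, 0.0]]
profile = build_profile(docs, [2, 1])
score = similarity([0.5, 0.3, 0.2], profile)
```

The similarity score can then serve as the content-based estimate for an unrated document, after whatever mapping to the rating scale the method uses.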
Collaborative Filtering - Example
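[Table: ratings of items A to G by the current user and users 1 to 5. The column positions of the ratings were lost in the source, so each row lists only its values in order.]
items: A B C D E F G
current: 1 4 5
user 1: 3 5 1 2
user 2: 1 3 2 5
user 3: 5 1 4 5
user 4: 1 4 2 4
user 5: 2 4 2 5
unplaced value: 2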
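Combining Content-based and Collaborative Filtering (2)
• Similarity measure: Pearson Correlation Coefficient
$$k_{ci} = \frac{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in I_{ci}} (r_{cj} - \bar{r}_c)^2 \sum_{j \in I_{ci}} (r_{ij} - \bar{r}_i)^2}}$$
– kci’ is computed in the same way from both ratings and CBF estimates
• Recommendations computation: weighted sum of ratings and estimates
$$\hat{r}_{cj} = \bar{r}_c + \frac{\sum_{i \in U_{cj}} (r_{ij} - \bar{r}_i)\, k_{ci} + \sum_{i \in U'_{cj}} (r^{CBF}_{ij} - \bar{r}_i)\, k'_{ci}}{\sum_{i \in U_{cj}} |k_{ci}| + \sum_{i \in U'_{cj}} |k'_{ci}|}$$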
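A rough Python sketch of the combined weighted sum follows. It rests on several assumptions that the slides leave open: the first user set is read as the users who rated item j, the second as the users who have only a CBF estimate for j, the primed coefficient is computed over profiles in which CBF estimates fill the missing ratings, and all names and data are hypothetical.

```python
from math import sqrt

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation of two {item: value} dicts over their common items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    ma, mb = mean(a.values()), mean(b.values())
    num = sum((a[j] - ma) * (b[j] - mb) for j in common)
    den = sqrt(sum((a[j] - ma) ** 2 for j in common)
               * sum((b[j] - mb) ** 2 for j in common))
    return num / den if den else 0.0

def predict_combined(ratings, estimates, c, j):
    """Weighted sum of ratings (weights k_ci) and CBF estimates (weights k'_ci)."""
    # Profiles with CBF estimates filling the gaps; real ratings take priority.
    filled = {u: {**estimates.get(u, {}), **ratings[u]} for u in ratings}
    mc = mean(ratings[c].values())
    num = den = 0.0
    for i in ratings:
        if i == c:
            continue
        mi = mean(ratings[i].values())
        if j in ratings[i]:                  # user i rated item j
            k = pearson(ratings[c], ratings[i])
            num += (ratings[i][j] - mi) * k
            den += abs(k)
        elif j in estimates.get(i, {}):      # only a CBF estimate exists
            k_prime = pearson(filled[c], filled[i])
            num += (estimates[i][j] - mi) * k_prime
            den += abs(k_prime)
    return mc + num / den if den else None

# Hypothetical toy data
ratings = {"c": {"A": 1, "B": 5},
           "u": {"A": 2, "B": 4, "C": 6},
           "v": {"A": 1, "B": 5}}
cbf_estimates = {"v": {"C": 4.0}}
combined = predict_combined(ratings, cbf_estimates, "c", "C")
```

With an empty estimates dictionary the function reduces to the plain collaborative weighted sum, which gives a simple check that the two methods agree where no CBF estimates are involved.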
Experiments with Combination of Content-based and Collaborative Filtering (1)
• Content-based Filtering Method (CBF)
– documents and profiles: vector representation - weighted keywords (TF-IDF)
– estimation computation: normalized dot product of document and profile vectors
• Collaborative Filtering (CF)
– Pearson correlation coefficient
– weighted sum of ratings
• Combination of CF and CBF
– Pearson correlation coefficients
– weighted sum of ratings and CBF estimations
• Constant Method (rcj = 5)