Collaborative Filtering Based Recommendation
Danielle Hyunsook Lee
January 22, 2008
http://www.reel.com
http://www.amazon.com
3 Collaborative Filtering Recommender System, Danielle Lee
Classification of Recommender Systems
• Collaborative Filtering Recommender System
• Content-based Recommender System
  - Recommendations are generated from the features associated with products and the ratings given by a user.
• Case-based Recommender System
  - A form of content-based recommendation that is well suited to domains where each item is described by a well-defined set of features (a case).
• Hybrid Recommender System
  - A combination of two or more recommendation techniques to gain better performance with fewer of the drawbacks of any individual one (Burke, 2002).
Recommendation Taxonomy
[Figure: recommendation taxonomy for an e-store engine (Schafer, et al., 2001)]
• Targeted Customer Inputs: implicit navigation, explicit navigation, keyword/item attribute, ratings, purchase history
• Community Inputs: item attribute, external item popularity, purchase history, ratings, text comments
• Recommendation Method: raw retrieval, manually selected, statistical summarization, attribute-based, item-to-item correlation, user-to-user correlation
• Outputs: suggestion, prediction, ratings, reviews
• Delivery: push, pull, passive
• Degree of Personalization: non-personalized, ephemeral, persistent
• Response/Feedback
What is Collaborative Filtering?
• Originated from the Information Tapestry project at Xerox PARC.
  - It allows its users to annotate the documents that they read, and the system recommends documents based on those annotations.
• Collaborative Filtering is ‘the process of filtering or evaluating items using the opinions of other people’ (Schafer, et al., 2007).
• CF recommends items that are likely to interest a user, based on an evaluation that averages the opinions of people with similar tastes.
• Once a user ‘A’ rates some items, CF calculates correlations between the ratings that different users have given those items in order to find A’s neighbors. Then, using the neighbors’ opinions of new items, CF predicts new items that user ‘A’ will like.
• People who agreed in the past will also agree in the future.
What is CF recommendation?
Core Concepts in CF
• User: any individual who provides ratings to a system
  - Covers both users who provide ratings and users who receive recommendations
• Item: anything for which a human can provide a rating
  - Ex) art, books, CDs, journal articles, music, movies, or vacation destinations
• Ratings: a vote from a user for an item, expressed as some value
  - Scalar/ordinal ratings (5-point Likert scale), binary ratings (like/dislike), unary ratings (observed/absence of rating)
  - Explicit ratings and implicit ratings
User Tasks for Collaborative Filtering
• Help me find new items I might like
• Advise me on a particular item
• Help me find a user (or some users) I might like
• Help our group find something new that we might like
• Help me find a mixture of “new” and “old” items
• Help me with tasks that are specific to this domain
Properties of Domains
• Are the properties of Data Distribution suitable for CF?
  - There are many items.
  - Users rate multiple items (a user who rates only a single item gives no information for relating items to each other).
  - There are more users rating than items to be recommended.
  - Skewed rating distribution.
• Are the Underlying Meaning properties suitable for CF?
  - For each user of the community, there are other users with common needs or tastes.
  - Item evaluation requires personal taste.
  - Items are heterogeneous.
• Are these properties of Data Persistence suitable for CF?
  - Dynamically changing items (e.g., news or job cases)
  - Persistent taste
Motivations for Collaborative Filtering based Recommendations
• Collaborative filtering systems work through the people in the system, and people are expected to be better at evaluating documents than a computed function.
  - Automatic filtering systems vs. collaborative filtering by people: for example, figuring out which of two cake recipes is “easier to follow” (Maltz & Ehrlich, 1995).
• Completely independent of any machine-readable representation of the objects being recommended.
  - Works well for complex objects such as music and movies.
• A good way to convey the history and value of information: a reader can tell whether they are the first person to read it or are looking at the most commonly used reference.
Basic Stages to Generate CF Recommendations
• The input for the CF prediction generation algorithms is a matrix of users’ ratings on items, referred to as the ratings matrix.
(1) Similarity computation: assessing the similarity of all the users to the active user, i.e., the user for whom a recommendation is sought.
(2) Prediction/Recommendation generation: computing the active user’s rating prediction. This is done for a target item whose rating is unknown, and the prediction is obtained by weighting the ratings of the K most similar users on the target item according to the user-to-user similarity computed in (1).
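Stage (1) can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system: the ratings are invented, the function names `pearson` and `top_k_neighbors` are hypothetical, and Pearson correlation over co-rated items is only one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical ratings matrix: user -> {item: rating}. Values are invented.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 5},
    "u3": {"a": 1, "b": 5, "d": 2},
}

def pearson(ratings, u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    cov = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    var_u = sum((ratings[u][i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings[v][i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return cov / sqrt(var_u * var_v)

def top_k_neighbors(ratings, active, k):
    """Return the k users most similar to the active user as (user, similarity)."""
    sims = [(pearson(ratings, active, other), other)
            for other in ratings if other != active]
    sims.sort(reverse=True)
    return [(other, s) for s, other in sims[:k]]

print(top_k_neighbors(ratings, "u1", k=1))
```

Here `u2` rates every shared item exactly one point below `u1`, so the two are perfectly correlated, while `u3` rates in the opposite direction and is excluded from the neighborhood.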
Similarity Computation
• As the user profile, a matrix of users and items is used
Astronomy for Kids
Bagheera: In the Wild
Learning Network
The GeoNet Game
Leonardo Homepage
Bob 4 1 5
Alice 5
Mark 1 5 4 ????
Beatrice 1 5 4 3
Table. Teacher ratings for educational web sites (Walker, et al., 2004)
Prediction/Recommendation Generation
• The system responds to a user’s request by predicting how much they would like a specific item and by recommending a set of items to the user.
• Prediction & Recommendation Algorithms
  - Neighborhood-based Algorithm
    - Correlation, mean squared difference, personality diagnosis
    - Identifies the active user’s neighborhood; user similarity is computed over every user’s ratings (a computationally expensive approach)
  - Non-Neighborhood-based Algorithm
    - Bayesian Networks
[Figure: taxonomy of prediction/recommendation algorithms. Non-probabilistic algorithms: user-based nearest neighbor, item-based nearest neighbor, dimension reduction. Probabilistic algorithms: Bayesian-network models. Others.]
User-based Nearest Neighbor Algorithm
• First, the neighbors of user u are identified; then a prediction for an item i is generated by analyzing the ratings for i from the users in u’s neighborhood.
• Skewed neighboring is possible.
• Items are not weighted differently according to their properties.
• Calculating a user’s perfect neighborhood is immensely resource intensive.
$$\mathrm{pred}(u,i) = \bar{r}_u + \frac{\sum_{n \in \mathrm{neighbors}(u)} \mathrm{sim}(u,n) \cdot (r_{ni} - \bar{r}_n)}{\sum_{n \in \mathrm{neighbors}(u)} \mathrm{sim}(u,n)}$$
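The user-based prediction (the active user’s mean rating plus the similarity-weighted average of each neighbor’s deviation from that neighbor’s own mean) can be sketched as follows. The data, the precomputed similarities in `sims`, and the function name `predict_user_based` are all hypothetical; the sketch also assumes positive similarities and restricts the sums to neighbors who actually rated the target item.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def predict_user_based(ratings, sims, u, i):
    """pred(u,i) = mean(r_u) + sum_n sim(u,n)*(r_ni - mean(r_n)) / sum_n sim(u,n)."""
    base = mean(ratings[u].values())
    num = 0.0
    den = 0.0
    for n, s in sims.items():
        if i in ratings[n]:  # assumption: only neighbors who rated i contribute
            num += s * (ratings[n][i] - mean(ratings[n].values()))
            den += s
    if den == 0:
        return base  # no usable neighbor: fall back to the user's own mean
    return base + num / den

# Invented example: user "u" has not rated item "i".
ratings = {
    "u":  {"a": 4, "b": 2},
    "n1": {"a": 5, "b": 2, "i": 5},
    "n2": {"a": 3, "b": 1, "i": 5},
}
sims = {"n1": 0.5, "n2": 0.5}  # assumed output of a similarity step

print(predict_user_based(ratings, sims, "u", "i"))
```

Subtracting each neighbor’s mean compensates for users who rate systematically high or low, which is why the formula works with deviations rather than raw ratings.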
Item-based Nearest Neighbor Algorithm
• Generate predictions based on similarities between items.
• The prediction for a user u and item i is a weighted sum of user u’s ratings for the items most similar to i.
• The size of the model can be as large as the square of the number of items.
• Pruning to the top n correlations can be efficient but can miss the target item.
$$\mathrm{pred}(u,i) = \frac{\sum_{j \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,j) \cdot r_{uj}}{\sum_{j \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,j)}$$
Item-based Nearest Neighbor Algorithm
The Matrix   Speed   Sideways   Brokeback Mountain
User1 5 4 3
User2 4 5 5 3
User3 3 ??? 4
User4 5 3 3 4
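The item-based prediction, a similarity-weighted average of the ratings the user has already given, can be sketched as below. The item names, ratings, and similarity values are invented and do not correspond to the table above; `predict_item_based` and the `(i, j)` key layout of `item_sims` are hypothetical choices made for this illustration.

```python
def predict_item_based(user_ratings, item_sims, i):
    """Weighted average of the user's ratings r_uj, weighted by sim(i, j)."""
    num = 0.0
    den = 0.0
    for j, r_uj in user_ratings.items():
        s = item_sims.get((i, j), 0.0)  # items with no stored similarity are ignored
        num += s * r_uj
        den += s
    if den == 0:
        return None  # no rated item is similar to i, so no prediction is possible
    return num / den

# Invented data: the user rated j1 and j2; the target item is "i".
user_ratings = {"j1": 4, "j2": 2}
item_sims = {("i", "j1"): 0.75, ("i", "j2"): 0.25}  # assumed precomputed model

print(predict_item_based(user_ratings, item_sims, "i"))
```

Because item-to-item similarities can be computed offline, only this cheap weighted average has to run at recommendation time.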
Other Non-Probabilistic Algorithms
• Dimensionality Reduction
  - Map the item space to a smaller number of underlying “dimensions.”
  - Expensive offline computation and mathematical complexity
• Association Rule Mining
  - Build models based on commonly occurring patterns in the ratings matrix.
Probabilistic Algorithm
• Bayesian-Network
  - Derive probabilistic dependencies among users or items using decision trees.
• Probabilistic Clustering/Dimensionality Reduction Techniques
  - Expectation Maximization (EM) algorithm for CF with a Gaussian probability distribution.
• Probabilistic algorithms can produce a probability distribution across possible rating values, i.e., information that captures the likelihood of each possible rating value.
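To illustrate what a distribution over rating values offers beyond a single predicted number, the sketch below derives both an expected rating and the most likely rating from one distribution. The distribution is invented, and the sketch does not show how a Bayesian network or EM model would produce it.

```python
def expected_rating(dist):
    """Expected value of a {rating: probability} distribution."""
    return sum(rating * p for rating, p in dist.items())

def most_likely_rating(dist):
    """Rating value with the highest probability."""
    return max(dist, key=dist.get)

# Invented output of some probabilistic model for one (user, item) pair.
dist = {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.5, 5: 0.2}

print(expected_rating(dist), most_likely_rating(dist))
```

The spread of the distribution can also serve as a rough confidence signal: a flat distribution suggests the model knows little about this user and item.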
Evaluation of Collaborative Filtering System
• To determine the quality of the predictions and recommendations
  - Accuracy
    - Predictive accuracy: the ability of a CF system to predict a user’s rating for an item. Mean Absolute Error (MAE) = the average absolute difference between the predicted ratings and the actual ratings given by users
    - Rank accuracy: precision, half-life utility
  - Novelty / Serendipity (Karypis, 2001)
  - Coverage (Sarwar, et al., 2000)
  - Learning Rate (Schein, et al., 2001)
  - Confidence (Herlocker, 1999)
  - User Satisfaction (Swearingen & Sinha, 2001; Dahlen, B. J., 1998)
  - Site Performance
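Mean Absolute Error as defined above can be computed with a few lines of Python. The predictions and held-out ratings below are invented, and keying both dictionaries by `(user, item)` pairs is just one convenient, hypothetical layout.

```python
def mean_absolute_error(predicted, actual):
    """Average |predicted - actual| over the (user, item) pairs present in both."""
    pairs = [(predicted[k], actual[k]) for k in actual if k in predicted]
    if not pairs:
        raise ValueError("no overlapping (user, item) pairs to evaluate")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Invented held-out ratings and the predictions made for them.
actual = {("u1", "a"): 5, ("u1", "b"): 3, ("u2", "a"): 4}
predicted = {("u1", "a"): 4.5, ("u1", "b"): 3.0, ("u2", "a"): 2.0}

print(mean_absolute_error(predicted, actual))
```

A lower MAE means predictions are closer to what users actually rated; it says nothing about whether the top of a ranked list is good, which is what rank accuracy measures.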
Problems regarding CF (Cont.)
• Data Sparsity & Ratings Scarcity
  - The ratings matrix is sparse, and only a small fraction of all possible user-item entries is known.
  - Many CF algorithms have been designed specifically for data sets where there are many more users than items (e.g., the MovieLens data set has 65,000 users and 5,000 movies).
  - CF may be inappropriate in a domain where there are many more items than users.
• Implicit vs. explicit ratings
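How sparse a ratings matrix is can be measured as the fraction of user-item entries that are actually known. The sketch below uses invented data and a hypothetical `density` helper; it counts only items that received at least one rating.

```python
def density(ratings):
    """Fraction of user-item cells that hold a rating (1.0 = fully dense)."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    known = sum(len(user_ratings) for user_ratings in ratings.values())
    cells = len(ratings) * len(items)
    return known / cells if cells else 0.0

# Invented ratings matrix: 3 users, 4 items, 6 known ratings.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"d": 2},
    "u3": {"a": 1, "d": 5},
}

print(density(ratings))  # sparsity is 1 - density
```

Real data sets are typically far sparser than this toy example, which is why so few pairs of users have enough co-rated items for a reliable similarity estimate.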
Problems regarding CF (Cont.)
• Problems regarding cold-start
  - New item problem: if the number of users who have rated an item is small, an accurate prediction for this item cannot be generated.
  - New user problem: if the number of items rated by a user is small, the items rated by this user are unlikely to overlap with those rated by other active users, so user-to-user similarity cannot be reliably computed.
  - New community problem: without sufficient ratings, it is hard for personalized CF recommendations to offer distinct value.
• Clear reward systems are necessary to convince users to vote on or rate items.
Possible solutions for Cold-start Problem
• As the solution for the new user problem:
  - Having the user rate some initial items before they can use the service
  - Displaying non-personalized recommendations until the user has rated enough items
  - Asking the user to describe their taste in aggregate
  - Asking the user for demographic information
  - Using ratings of other users with similar demographics as recommendations
• As the solution for the new item problem:
  - Recommending items through non-CF techniques such as content analysis or metadata
  - Randomly selecting items with few or no ratings and asking users to rate those items
• As the solution for the new community problem:
  - Providing rating incentives to a small “bootstrap” subset of the community before inviting the entire community
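One of the new-user remedies above, showing non-personalized recommendations until the user has rated enough items, can be sketched as a simple fallback. Everything here is hypothetical: the threshold `min_ratings=5`, the use of mean rating as the non-personalized ranking, and the `personalized` callback that stands in for a real CF recommender.

```python
def popular_items(ratings, n):
    """Non-personalized ranking: items ordered by mean rating, ties by name."""
    by_item = {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            by_item.setdefault(item, []).append(r)
    means = {item: sum(rs) / len(rs) for item, rs in by_item.items()}
    return sorted(means, key=lambda item: (-means[item], item))[:n]

def recommend(user, ratings, personalized, min_ratings=5, n=3):
    """Use the CF recommender only once the user has rated enough items."""
    seen = ratings.get(user, {})
    if len(seen) < min_ratings:
        # Cold-start user: fall back to popular items the user has not rated.
        candidates = popular_items(ratings, n + len(seen))
        return [item for item in candidates if item not in seen][:n]
    return personalized(user)[:n]

# Invented data: "old" has rated enough items, "new" has rated only one.
ratings = {
    "old": {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1},
    "new": {"a": 5},
}
cf = lambda user: ["x", "y"]  # placeholder for a real CF recommender

print(recommend("new", ratings, cf, n=2))
print(recommend("old", ratings, cf, n=2))
```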
Problems regarding CF (Cont.)
• Rarely-rated entities: users, items, and user-item pairs with few co-ratings
• Opinionated users: users who provided more than 4 ratings and whose ratings have a std. dev. greater than 1.5
• Black sheep (peculiar users): users who provided more than 4 ratings and for whom the average distance of their rating on item i from the mean rating of item i is greater than 1
• Niche items: items that received fewer than 5 ratings
• Controversial items: items whose ratings have a std. dev. greater than 1.5
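The thresholds above translate directly into code. The sketch below uses invented ratings and hypothetical function names, and it assumes the population standard deviation (`statistics.pstdev`), since the definitions do not say which variant is meant.

```python
from statistics import mean, pstdev

# Invented ratings matrix: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 1, "c": 5, "d": 1, "e": 5},
    "u2": {"a": 3, "b": 3, "c": 3, "d": 3, "e": 3},
    "u3": {"a": 3, "b": 3},
    "u4": {"c": 1},
}

def ratings_by_item(ratings):
    by_item = {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            by_item.setdefault(item, []).append(r)
    return by_item

def opinionated_users(ratings):
    """More than 4 ratings and std. dev. of the user's ratings above 1.5."""
    return sorted(u for u, rs in ratings.items()
                  if len(rs) > 4 and pstdev(list(rs.values())) > 1.5)

def black_sheep(ratings):
    """More than 4 ratings and mean distance from the item means above 1."""
    item_mean = {i: mean(rs) for i, rs in ratings_by_item(ratings).items()}
    return sorted(u for u, rs in ratings.items()
                  if len(rs) > 4
                  and mean([abs(r - item_mean[i]) for i, r in rs.items()]) > 1)

def niche_items(ratings):
    """Items that received fewer than 5 ratings."""
    return sorted(i for i, rs in ratings_by_item(ratings).items() if len(rs) < 5)

def controversial_items(ratings):
    """Items whose ratings have a std. dev. above 1.5."""
    return sorted(i for i, rs in ratings_by_item(ratings).items() if pstdev(rs) > 1.5)

print(opinionated_users(ratings), black_sheep(ratings), controversial_items(ratings))
```

In this toy data every item is a niche item because no item has five ratings; on a realistic data set the four categories would overlap only partially.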
Problems regarding CF (Cont.)
• Explanation
  - “Why was I recommended this item?”
  - Most recommender systems take a black-box approach and need to provide transparency.
  - Explanations provide transparency, exposing the reasoning and data behind a recommendation (Herlocker, et al., 2000).
  - Benefits of explanations:
    - Transparency, Scrutability, User Involvement, Education, Acceptance, Trust, Effectiveness, Persuasiveness, Satisfaction (Tintarev & Masthoff, 2007)
  - Explanations of both ‘How’ and ‘Why’ are required (explanations about model/process errors and data errors).
• Privacy & Security
• Trust
Problems regarding CF
• Ad hoc user profiles / Copy profile attack
  - Profiles created with malicious intent to bias recommendations in the attacker’s favor.
  - A real profile attack case involving a sex manual
    - http://www.news.com/2100-1023-976435.html
• Shilling attacks (profile injection attacks)
  - Push attacks
  - Nuke attacks
• Robust statistical methods to detect spam or random noise are required.
  - M-estimator
  - SVD (singular value decomposition) / new SVD based on Hebbian learning
  - PLSA
“Active/Trusted” Collaborative Filtering
• There is a direct connection between the people casting ratings and the receivers of the recommendations based on those ratings.
• An approach closer to “word of mouth”
• Searches for trustworthy users by exploiting trust propagation over the trust network, rather than searching for similar users as CF does (Massa & Avesani, 2007)
• Just providing a trust statement is an effective way of bootstrapping recommender systems (RSs) for new users with very few ratings.
Trust Networks and Trust Metrics
• Trust Metrics: algorithms whose goal is to predict, based on the trust network, the trustworthiness of “unknown” users.
  - Local Trust Metrics: take into account the very personal and subjective views of the users, so every user has a different trust value for each other user. Example: MoleTrust
  - Global Trust Metrics: compute a global “reputation” value that approximates how the community as a whole considers a certain user. Example: PageRank
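A global trust metric in the spirit of PageRank can be sketched with plain power iteration over a trust network. This is a simplified illustration, not MoleTrust and not a production PageRank: the trust statements are invented, and the damping factor, iteration count, and the function name `global_trust` are assumptions.

```python
def global_trust(trust, damping=0.85, iterations=50):
    """PageRank-style reputation: trust[a] lists the users that a trusts."""
    nodes = sorted(set(trust) | {v for vs in trust.values() for v in vs})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            trusted = trust.get(n, [])
            if trusted:
                share = damping * rank[n] / len(trusted)
                for v in trusted:
                    new[v] += share
            else:
                # A user who trusts nobody spreads reputation evenly.
                share = damping * rank[n] / len(nodes)
                for v in nodes:
                    new[v] += share
        rank = new
    return rank

# Invented trust statements: alice and bob trust carol, carol trusts alice.
trust = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}

reputation = global_trust(trust)
print(sorted(reputation, key=reputation.get, reverse=True))
```

Every user sees the same reputation values here, which is what distinguishes a global metric from a local one that would compute trust separately from each user’s point of view.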
Trust-Aware Recommender Architecture
(Massa & Avesani, 2004; Massa & Avesani, 2007)
Hybrid Recommender System
• Combination of two or more different recommendation technologies
• The space of possible hybrid recommender systems (Burke, 2007)
Weight Mixed Switch FC Cascade FA Meta
CF/CN
CF/DM
CF/KB
CN/CF
CN/DM
CN/KB
DM/CF
DM/CN
DM/KB
KB/CF
KB/CN
KB/DM
FC = Feature Combination, FA = Feature Augmentation, CF = Collaborative, CN = Content-based, DM = Demographic, KB = Knowledge-based
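The “Weight” column of the table corresponds to the simplest hybrid: each recommender scores the candidate items and the scores are combined linearly. The sketch below assumes scores already normalized to a common range; the scores, the weights, and the function name `weighted_hybrid` are invented for illustration.

```python
def weighted_hybrid(scores_by_recommender, weights):
    """Combine per-recommender item scores into one weighted score per item."""
    total = sum(weights.values())
    combined = {}
    for name, scores in scores_by_recommender.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score / total
    # Rank by combined score, breaking ties by item name for determinism.
    ranking = sorted(combined, key=lambda item: (-combined[item], item))
    return ranking, combined

# Invented scores from a collaborative (CF) and a content-based (CN) recommender.
scores = {
    "CF": {"x": 0.9, "y": 0.2},
    "CN": {"x": 0.1, "y": 0.8, "z": 0.5},
}
weights = {"CF": 0.75, "CN": 0.25}

ranking, combined = weighted_hybrid(scores, weights)
print(ranking)
```

Item `z` has no collaborative score at all, yet it still appears in the ranking through the content-based component, which is one way a hybrid softens the new item problem.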
Other Useful Resources
• Schafer, J. B., Frankowski, D., Herlocker, J. & Sen, S. (2007) Collaborative Filtering Recommender Systems, In Brusilovsky, P., Kobsa, A. & Nejdl, W. (Eds.) The Adaptive Web (LNCS 4321), Springer, NY., pp. 291 ~ 324
• Walker, A., Recker, M. M., Lawless, K. & Wiley, D. (2004) Collaborative Information Filtering: a review and an educational application, International Journal of Artificial Intelligence in Education, 14, pp. 1 ~ 26
• Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J. T. (2004) Evaluating Collaborative Filtering Recommender Systems, ACM Transactions on Information Systems, 22 (1), pp. 5 ~ 53
• Adomavicius, G. & Tuzhilin, A. (2005) Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, 17 (6), pp. 734 ~ 749
• Burke, R. (2002) Hybrid Recommender Systems: Survey and Experiments, User Modeling and User-Adapted Interaction, 12, pp. 331 ~ 370