Collaborative Filtering Based Recommendationrosta/SocialWeb/CF.pdf · 2008-01-23 · 3...

Collaborative Filtering Based Recommendation

Danielle Hyunsook Lee

January 22, 2008


http://www.reel.com


http://www.amazon.com

3 Collaborative Filtering Recommender System, Danielle Lee

Classification of Recommender Systems

• Collaborative Filtering Recommender System
• Content-based Recommender System
  - Recommendations generated from the features associated with products and the ratings from a user.
• Case-based Recommender System
  - A form of content-based recommendation well suited to domains where each item is described by a well-defined set of features (a case).
• Hybrid Recommender System
  - A combination of two or more recommendation techniques that gains better performance with fewer of the drawbacks of any individual one (Burke, 2002).


Recommendation Taxonomy (Schafer, et al., 2001)

[Figure: taxonomy of e-store recommendation applications, built around an E-store Engine with Response/Feedback loops. Recoverable labels:]
• Targeted customer inputs: implicit navigation, explicit navigation, keyword/item attribute, ratings, purchase history
• Community inputs: item attribute, external item popularity, purchase history, ratings, text comments
• Recommendation method: raw retrieval, manually selected, statistical summarization, attribute-based, item-to-item correlation, user-to-user correlation
• Degree of personalization: non-personalized, ephemeral, persistent
• Outputs: suggestion, prediction, ratings, reviews
• Delivery: push, pull, passive

What is Collaborative Filtering?

• Originated from the Information Tapestry project at Xerox PARC.
  - Tapestry allowed its users to annotate the documents they read, and the system used those annotations to filter documents for other users.

• Collaborative filtering is “the process of filtering or evaluating items using the opinions of other people” (Schafer, et al., 2007).

• CF recommends items likely to interest a user by averaging the opinions of people with similar tastes.

• Once user A rates some items, CF computes correlations between the ratings that different users have given to the same items in order to find A's neighbors. It then uses the neighbors' opinions of new items to predict which new items A will like.

• Underlying assumption: people who agreed in the past will also agree in the future.

What is CF recommendation?




Core Concepts in CF

• User: any individual who provides ratings to a system
  - Both users who provide ratings and users who receive recommendations
• Item: anything for which a human can provide a rating
  - e.g., art, books, CDs, journal articles, music, movies, or vacation destinations
• Rating: a vote from a user for an item, expressed as some value
  - Scalar/ordinal ratings (e.g., a 5-point Likert scale), binary ratings (like/dislike), unary ratings (observed / absence of a rating)
  - Explicit ratings and implicit ratings


User Tasks for Collaborative Filtering

• Help me find new items I might like
• Advise me on a particular item
• Help me find a user (or some users) I might like
• Help our group find something new that we might like
• Help me find a mixture of “new” and “old” items
• Help me with tasks that are specific to this domain


Properties of Domains

• Are the data distribution properties suitable for CF?
  - There are many items.
  - Most users rate a single item.
  - There are more users rating than items to be recommended.
  - Skewed rating distribution.
• Are the underlying meaning properties suitable for CF?
  - For each user of the community, there are other users with common needs or tastes.
  - Item evaluation requires personal taste.
  - Items are heterogeneous.
• Are the data persistence properties suitable for CF?
  - Dynamically changing items (e.g., news or job listings)
  - Persistent taste


Motivations for Collaborative Filtering based Recommendations

• CF systems work through the people in the system, and people are expected to be better at evaluating documents than a computed function.
  - Example: automatic filtering vs. collaborative filtering by people for deciding which of two cake recipes is “easier to follow” (Maltz & Ehrlich, 1995).
• Completely independent of any machine-readable representation of the objects being recommended.
• Works well for complex objects such as music and movies.
• A good way to convey the history and value of information: whether the reader is the first person to read it, or is looking at the most commonly used reference.


Basic Stages to Generate CF Recommendations

• The input to CF prediction algorithms is a matrix of users' ratings on items, referred to as the ratings matrix.

(1) Similarity computation: assess the similarity of all users to the active user, i.e., the user for whom a recommendation is sought.

(2) Prediction/recommendation generation: compute the active user's predicted rating for a target item whose rating is unknown, by weighting the ratings of the K most similar users on that item according to the user-to-user similarities computed in (1).
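The two stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular system: the ratings matrix is a dict of dicts (user to {item: rating}), similarity is cosine similarity over co-rated items (one of several common choices), and the names `cosine_sim`, `predict` and the default `k=2` are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

# Hypothetical sparse ratings matrix: user -> {item: rating}; absent keys are unknown ratings.
Ratings = Dict[str, Dict[str, float]]


def cosine_sim(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Stage (1): cosine similarity of two users over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, active: str, item: str, k: int = 2) -> Optional[float]:
    """Stage (2): similarity-weighted average of the K most similar users' ratings on `item`."""
    candidates = [
        (cosine_sim(ratings[active], r), r[item])
        for user, r in ratings.items()
        if user != active and item in r
    ]
    # Keep the K most similar users among those who actually rated the target item.
    neighbors = sorted(candidates, reverse=True)[:k]
    weight = sum(abs(s) for s, _ in neighbors)
    if weight == 0:
        return None  # no usable neighbors, so no prediction
    return sum(s * r for s, r in neighbors) / weight


ratings = {
    "A": {"x": 5, "y": 3},
    "B": {"x": 5, "y": 3, "z": 4},
    "C": {"x": 1, "y": 5, "z": 2},
}
print(predict(ratings, "A", "z"))  # about 3.2: B (similarity 1.0) outweighs C (about 0.67)
```

Restricting candidates to users who rated the target item before taking the top K keeps every selected neighbor useful for the prediction.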


Similarity Computation

• A user-item ratings matrix serves as the user profile.

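Columns (web sites): Astronomy for Kids | Bagheera: In the Wild | Learning Network | The GeoNet Game | Leonardo Homepage
Bob: 4, 1, 5
Alice: 5
Mark: 1, 5, 4, ???? (???? = the rating to be predicted)
Beatrice: 1, 5, 4, 3
[Ratings listed per row; empty cells not shown.]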

Table. Teacher ratings for educational web sites (Walker, et al., 2004)
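Pearson correlation is one common way to compute the user-to-user similarity on a matrix like the teacher-ratings table. The sketch below assumes each user's mean is taken over the co-rated items only (implementations differ on this point); the function name and the sample profiles are hypothetical and are not the data from Walker et al.

```python
from math import sqrt


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation between two users over the items both have rated.

    Returns 0.0 when there are fewer than two co-rated items, or when either
    user's ratings on those items have zero variance (correlation undefined).
    """
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical profiles: u2 agrees with u1, u3 reverses u1's preferences.
u1 = {"s1": 1, "s2": 5, "s3": 4}
u2 = {"s1": 1, "s2": 5, "s3": 4, "s4": 3}
u3 = {"s1": 5, "s2": 1, "s3": 2}
print(pearson(u1, u2))  # ≈ 1.0 (identical on the co-rated items)
print(pearson(u1, u3))  # ≈ -1.0 (u3 = 6 - u1 on the co-rated items)
```

Unlike plain cosine similarity, Pearson correlation subtracts each user's mean, so a user who rates everything high and one who rates everything low can still count as similar if their preferences move together.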


Prediction/Recommendation Generation

• The system responds to a user's request by predicting how much the user would like a specific item, or by recommending a set of items.
• Prediction and recommendation algorithms:
  - Neighborhood-based algorithms
    - Correlation, mean squared difference, personality diagnosis
    - Identify the active user's neighborhood by computing the similarity to every other user from their ratings (a computationally expensive approach)
  - Non-neighborhood-based algorithms
    - Bayesian networks



[Figure: taxonomy of prediction/recommendation algorithms. Non-probabilistic algorithms: user-based nearest neighbor, item-based nearest neighbor, dimensionality reduction. Probabilistic algorithms: Bayesian-network models. The figure also lists an “Others” category.]


User-based Nearest Neighbor Algorithm

• First, the neighbors of user u are computed; then a prediction for item i is generated by analyzing the ratings for i given only by users in u's neighborhood.
• Skewed neighborhoods are possible.
• Items are not weighted differently according to their properties.
• Computing a user's exact (“perfect”) neighborhood is extremely resource-intensive.

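$$\mathit{pred}(u,i) = \bar{r}_u + \frac{\sum_{n \in \mathit{neighbors}(u)} \mathit{sim}(u,n) \cdot (r_{ni} - \bar{r}_n)}{\sum_{n \in \mathit{neighbors}(u)} \mathit{sim}(u,n)}$$

where $r_{ni}$ is neighbor n's rating of item i, and $\bar{r}_u$ and $\bar{r}_n$ are the mean ratings of users u and n.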
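The user-based prediction formula on this slide can be transcribed directly. A sketch under assumptions: the neighborhood is taken to be every other user with positive similarity who rated i, each user mean is computed over all of that user's ratings, and the similarity function is supplied by the caller; `predict_mean_centered` and `equal_weight` are hypothetical names.

```python
from typing import Callable, Optional

Ratings = dict[str, dict[str, float]]
Similarity = Callable[[dict[str, float], dict[str, float]], float]


def mean(r: dict[str, float]) -> float:
    return sum(r.values()) / len(r)


def predict_mean_centered(ratings: Ratings, u: str, i: str, sim: Similarity) -> Optional[float]:
    """pred(u,i) = mean(u) + sum_n sim(u,n) * (r_ni - mean(n)) / sum_n sim(u,n)."""
    num = den = 0.0
    for n, r_n in ratings.items():
        if n == u or i not in r_n:
            continue
        s = sim(ratings[u], r_n)
        if s <= 0:  # assumption: only positively similar users form neighbors(u)
            continue
        num += s * (r_n[i] - mean(r_n))
        den += s
    if den == 0:
        return None
    return mean(ratings[u]) + num / den


# Stand-in similarity that weights every neighbor equally; replace with Pearson or cosine.
equal_weight: Similarity = lambda a, b: 1.0

ratings = {
    "u": {"a": 4, "b": 2},
    "n1": {"a": 5, "b": 3, "i": 5},
    "n2": {"a": 2, "b": 2, "i": 1},
}
print(predict_mean_centered(ratings, "u", "i", equal_weight))  # ≈ 3.0 (the deviations cancel)
```

Subtracting each neighbor's mean before averaging compensates for users who rate on systematically higher or lower parts of the scale; the active user's own mean is then added back.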


Item-based Nearest Neighbor Algorithm

• Generates predictions based on similarities between items.
• The prediction for user u and item i is a weighted sum of user u's ratings for the items most similar to i.
• The model can be as large as the square of the number of items.
• Keeping only the top-n correlations per item is more efficient, but the pruned model can miss the target item.

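$$\mathit{pred}(u,i) = \frac{\sum_{j \in \mathit{ratedItems}(u)} \mathit{sim}(i,j) \cdot r_{uj}}{\sum_{j \in \mathit{ratedItems}(u)} \mathit{sim}(i,j)}$$

where $r_{uj}$ is user u's rating of item j.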
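A direct transcription of the item-based prediction formula on this slide, as a sketch: item-to-item similarities are assumed to be precomputed (e.g., offline) and passed in as a dict keyed by item pairs, and only positive similarities are used so the weights stay non-negative. The `item_sims` structure, the function name and the sample items are hypothetical.

```python
from typing import Optional


def predict_item_based(
    user_ratings: dict[str, float],
    target: str,
    item_sims: dict[tuple[str, str], float],
) -> Optional[float]:
    """pred(u,i) = sum_j sim(i,j) * r_uj / sum_j sim(i,j), over the items j rated by u."""
    num = den = 0.0
    for j, r_uj in user_ratings.items():
        # The similarity table is assumed symmetric, so look up either key order.
        s = item_sims.get((target, j), item_sims.get((j, target), 0.0))
        if s <= 0:  # pruned or non-positive similarities contribute nothing
            continue
        num += s * r_uj
        den += s
    return num / den if den else None


sims = {("C", "A"): 0.9, ("B", "C"): 0.3}
print(predict_item_based({"A": 4, "B": 2}, "C", sims))  # ≈ 3.5
```

If the similarity table has been pruned to the top-n correlations per item, pairs that were dropped simply fall back to 0.0 here; when none of the user's rated items remain, no prediction is possible, which is the risk of missing the target item noted above.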


Item-based Nearest Neighbor Algorithm

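Columns (movies): The Matrix | Speed | Sideways | Brokeback Mountain
User1: 5, 4, 3
User2: 4, 5, 5, 3
User3: 3, ???, 4 (??? = the rating to be predicted)
User4: 5, 3, 3, 4
[Ratings listed per row; empty cells not shown.]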


Other Non-Probabilistic Algorithms

• Dimensionality reduction
  - Maps the item space to a smaller number of underlying “dimensions”.
  - Requires expensive offline computation and is mathematically complex.
• Association rule mining
  - Builds models based on commonly occurring patterns in the ratings matrix.
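To make “map the item space to a smaller number of dimensions” concrete, here is a sketch of the simplest case: a rank-1 truncated SVD computed by power iteration in plain Python on a small dense matrix. Real systems would use a numerical library, keep k > 1 dimensions, and must handle missing ratings; here every cell is assumed known (e.g., after filling gaps with item means), and the function names and iteration count are hypothetical choices.

```python
from math import sqrt

Matrix = list[list[float]]


def top_singular_triplet(m: Matrix, iters: int = 100) -> tuple[float, list[float], list[float]]:
    """Power iteration: returns (sigma, u, v) with M ≈ sigma * u v^T.

    Assumes M v never becomes the zero vector (true for a non-zero,
    non-negative ratings matrix started from the all-ones vector).
    """
    rows, cols = len(m), len(m[0])
    v = [1.0] * cols
    for _ in range(iters):
        # u = M v, normalised
        u = [sum(m[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        nu = sqrt(sum(x * x for x in u))
        u = [x / nu for x in u]
        # v = M^T u, normalised
        v = [sum(m[r][c] * u[r] for r in range(rows)) for c in range(cols)]
        nv = sqrt(sum(x * x for x in v))
        v = [x / nv for x in v]
    sigma = sum(u[r] * m[r][c] * v[c] for r in range(rows) for c in range(cols))
    return sigma, u, v


def rank1_approx(m: Matrix) -> Matrix:
    """Rebuild the matrix from its single strongest latent dimension."""
    sigma, u, v = top_singular_triplet(m)
    return [[sigma * ur * vc for vc in v] for ur in u]


# A rank-1 input is reproduced (up to rounding) by its one-dimensional approximation.
print(rank1_approx([[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]))
```

Each user is then described by one coordinate (an entry of u) and each item by one coordinate (an entry of v); a missing rating is estimated from the corresponding cell of the reconstruction.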


Probabilistic Algorithm

• Bayesian networks
  - Derive probabilistic dependencies among users or items using decision trees.
• Probabilistic clustering/dimensionality reduction techniques
  - Expectation Maximization (EM) algorithm for CF with Gaussian probability distributions.
• Probabilistic algorithms can produce a probability distribution across possible rating values: information that captures the likelihood of each possible rating value.
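As a small illustration of a model that outputs a distribution over rating values, here is a naive Bayes sketch, the simplest Bayesian-network structure: the target item's rating is the parent of every other item rating, and those ratings are assumed conditionally independent. This is not the decision-tree Bayesian network or the EM clustering named above; the 1 to 5 scale, Laplace smoothing with `alpha`, and all names are assumptions.

```python
from collections import Counter

SCALE = (1, 2, 3, 4, 5)  # assumed rating scale


def rating_distribution(ratings, user, target, alpha=1.0):
    """P(r_target = v | the user's other ratings) under a naive Bayes model.

    Parameters are estimated from the other users who rated `target`, with
    Laplace smoothing `alpha` so unseen combinations never get probability zero.
    """
    trainers = [r for u, r in ratings.items() if u != user and target in r]
    prior = Counter(r[target] for r in trainers)
    scores = {}
    for v in SCALE:
        # Smoothed prior P(r_target = v)
        p = (prior[v] + alpha) / (len(trainers) + alpha * len(SCALE))
        for j, r_uj in ratings[user].items():
            if j == target:
                continue
            # Ratings of item j among trainers who gave `target` the value v
            with_j = [r[j] for r in trainers if r[target] == v and j in r]
            matches = sum(1 for x in with_j if x == r_uj)
            p *= (matches + alpha) / (len(with_j) + alpha * len(SCALE))
        scores[v] = p
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}


ratings = {
    "a": {"x": 5, "t": 5},
    "b": {"x": 5, "t": 5},
    "c": {"x": 1, "t": 1},
    "me": {"x": 5},
}
dist = rating_distribution(ratings, "me", "t")
print(max(dist, key=dist.get))  # 5
```

Because the result is a full distribution, a system can report both a point estimate (e.g., the expected rating, the sum of v times P(v)) and how confident the model is in it.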


Evaluation of Collaborative Filtering System

• Purpose: to determine the quality of the predictions and recommendations.
• Accuracy
  - Predictive accuracy: the ability of a CF system to predict a user's rating for an item. Mean Absolute Error (MAE) is the average absolute difference between the predicted ratings and the actual ratings given by a user.
  - Rank accuracy: precision, half-life utility.
• Novelty / Serendipity (Karypis, 2001)
• Coverage (Sarwar, et al., 2000)
• Learning Rate (Schein, et al., 2001)
• Confidence (Herlocker, 1999)
• User Satisfaction (Swearingen & Sinha, 2001; Dahlen, 1998)
• Site Performance
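Two of the accuracy measures above can be sketched in Python: MAE for predictive accuracy, and precision of a top-N list for rank accuracy. Treating items the user rated 4 or higher as “relevant” is an assumption for illustration, as are the function names; half-life utility is not shown.

```python
def mean_absolute_error(predicted: dict[str, float], actual: dict[str, float]) -> float:
    """Average |p_i - r_i| over items that have both a prediction and a true rating."""
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no overlapping items to evaluate")
    return sum(abs(predicted[i] - actual[i]) for i in common) / len(common)


def precision_at_n(ranked: list[str], actual: dict[str, float], n: int, threshold: float = 4.0) -> float:
    """Fraction of the top-n recommended items that the user rated >= threshold.

    Items the user never rated count as not relevant.
    """
    top = ranked[:n]
    if not top:
        return 0.0
    hits = sum(1 for i in top if actual.get(i, 0.0) >= threshold)
    return hits / len(top)


actual = {"a": 5, "b": 3, "c": 4, "d": 1}
predicted = {"a": 4.5, "b": 3.5, "c": 3.0, "d": 2.0}
print(mean_absolute_error(predicted, actual))        # 0.75
print(precision_at_n(["a", "b", "c"], actual, n=3))  # ≈ 0.667
```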


Problems regarding CF (Cont.)

• Data sparsity and rating scarcity
  - The ratings matrix is sparse: only a small fraction of all possible user-item entries is known.
  - Many CF algorithms have been designed specifically for data sets with many more users than items (e.g., the MovieLens data set has 65,000 users and 5,000 movies).
  - CF may be inappropriate in domains with many more items than users.
• Implicit vs. explicit ratings



• Cold-start problems
  - New item problem: if only a few users have rated an item, accurate predictions for that item cannot be generated.
  - New user problem: if a user has rated only a few items, those items are unlikely to overlap with the items rated by other users, so user-to-user similarity cannot be reliably computed.
  - New community problem: without sufficient ratings, it is hard for personalized CF recommendations to offer distinctive value.
• Clear reward systems are needed to convince users to rate items.


Possible solutions for Cold-start Problem


• Solutions for the new user problem:
  - Have the user rate some initial items before using the service
  - Display non-personalized recommendations until the user has rated enough items
  - Ask the user to describe their taste in aggregate
  - Ask the user for demographic information, then use the ratings of other users with similar demographics as recommendations
• Solutions for the new item problem:
  - Recommend items through non-CF techniques such as content analysis or metadata
  - Randomly select items with few or no ratings and ask users to rate them
• Solution for the new community problem:
  - Provide rating incentives to a small “bootstrap” subset of the community before inviting the entire community
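One of the new-user strategies above, displaying non-personalized recommendations until the user has rated enough items, can be sketched as a simple switch. The threshold of 5 ratings, “popularity” as the highest mean rating among items with at least 2 ratings, and the `personalized` callback standing in for any CF recommender are all hypothetical choices.

```python
from typing import Callable


def popular_items(ratings, min_count=2, n=3):
    """Non-personalized fallback: highest mean rating among items with >= min_count ratings."""
    totals: dict[str, list[float]] = {}
    for r in ratings.values():
        for item, value in r.items():
            totals.setdefault(item, []).append(value)
    eligible = {i: sum(v) / len(v) for i, v in totals.items() if len(v) >= min_count}
    # Sort by descending mean, breaking ties by item name for a stable order.
    return sorted(eligible, key=lambda i: (-eligible[i], i))[:n]


def recommend(ratings, user, personalized: Callable[[str], list[str]], min_ratings=5, n=3):
    """Switch to CF only once the user has rated at least `min_ratings` items."""
    rated = ratings.get(user, {})
    if len(rated) < min_ratings:
        # Too few ratings to find reliable neighbors: fall back, skipping items already rated.
        return [i for i in popular_items(ratings, n=n + len(rated)) if i not in rated][:n]
    return personalized(user)[:n]


ratings = {"a": {"x": 5, "y": 3}, "b": {"x": 4, "y": 2, "z": 5}, "new": {"y": 1}}
print(recommend(ratings, "new", personalized=lambda u: ["cf-item"]))  # ['x']
```

Requiring a minimum rating count keeps a single enthusiastic rating (item z above) from topping the popularity list.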


• Rarely-rated entities: users, items, and user pairs or item pairs with few co-ratings
• Opinionated users: provided more than 4 ratings, and the std. dev. of their ratings is greater than 1.5
• Black sheep (peculiar users): provided more than 4 ratings, and the average distance between their rating on item i and the mean rating of item i is greater than 1
• Niche items: received fewer than 5 ratings
• Controversial items: received ratings whose std. dev. is greater than 1.5
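The thresholds above translate directly into code. A sketch that assumes the population standard deviation (the slide does not say which), and reads “average distance” for black sheep as the mean absolute difference between the user's rating and each rated item's mean rating; `classify` is a hypothetical name.

```python
from statistics import mean, pstdev


def classify(ratings: dict[str, dict[str, float]]):
    """Return (opinionated users, black sheep, niche items, controversial items)."""
    by_item: dict[str, list[float]] = {}
    for r in ratings.values():
        for item, value in r.items():
            by_item.setdefault(item, []).append(value)
    item_mean = {i: mean(v) for i, v in by_item.items()}

    # More than 4 ratings and a rating std. dev. above 1.5
    opinionated = {u for u, r in ratings.items()
                   if len(r) > 4 and pstdev(list(r.values())) > 1.5}
    # More than 4 ratings and an average distance from item means above 1
    black_sheep = {u for u, r in ratings.items()
                   if len(r) > 4
                   and mean([abs(v - item_mean[i]) for i, v in r.items()]) > 1}
    niche = {i for i, v in by_item.items() if len(v) < 5}
    controversial = {i for i, v in by_item.items() if pstdev(v) > 1.5}
    return opinionated, black_sheep, niche, controversial
```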



• Explanation
  - “Why was I recommended this item?”
  - Most recommender systems are black boxes and need to provide transparency.
  - Explanations provide transparency, exposing the reasoning and data behind a recommendation (Herlocker, et al., 2000).
  - Benefits of explanations:
    - Transparency, scrutability, user involvement, education, acceptance, trust, effectiveness, persuasiveness, satisfaction (Tintarev & Masthoff, 2007)
  - Explanations of both “how” and “why” are required (explanations of model/process errors and of data errors).
• Privacy & Security
• Trust

Problems regarding CF

• Ad hoc user profiles / copy-profile attacks
  - Malicious users try to bias recommendations in their favor.
  - A real profile-attack case involving a sex manual:
    - http://www.news.com/2100-1023-976435.html
• Shilling attacks (profile injection attacks)
  - Push attacks
  - Nuke attacks
• Robust statistical methods that detect spam or random noise are required:
  - M-estimator
  - SVD (singular value decomposition) / a newer SVD based on Hebbian learning
  - PLSA
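To show why an M-estimator resists injected profiles, here is a sketch of a Huber M-estimate of one item's average rating, computed by iteratively reweighted averaging. It illustrates robust estimation in general, not the detection method of any specific work; the tuning constant `k=1.0`, the MAD-based scale, and the sample ratings are assumptions.

```python
from statistics import median


def huber_mean(values: list[float], k: float = 1.0, iters: int = 50) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging.

    Ratings whose residual is within k * scale keep full weight; larger residuals
    (e.g., from injected push or nuke ratings) get weight k / |residual in scale units|.
    """
    mu = median(values)
    # Robust scale: median absolute deviation, rescaled to match a normal std. dev.
    scale = 1.4826 * median(abs(v - mu) for v in values) or 1.0
    for _ in range(iters):
        weights = []
        for v in values:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mu


genuine = [4, 5] * 5
attacked = genuine + [1, 1, 1]  # three injected "nuke" ratings
print(sum(attacked) / len(attacked))  # plain mean ≈ 3.69
print(huber_mean(attacked))           # ≈ 4.06, closer to the genuine mean of 4.5
```

The plain mean gives each injected rating full influence, whereas the Huber weights shrink the influence of ratings far from the bulk, so an attacker needs many more fake profiles to move the estimate by the same amount.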


“Active/Trusted” Collaborative Filtering

• There is a direct connection between the people who provide ratings and the people who receive recommendations based on those ratings.
• An approach closer to “word of mouth”.
• Searches for trustworthy users by propagating trust over the trust network, rather than searching for similar users as CF does (Massa & Avesani, 2007).
• Simply providing a trust statement is an effective way of bootstrapping recommender systems (RSs) for new users with very few ratings.


Trust Networks and Trust Metrics

• Trust metrics: algorithms whose goal is to predict, based on the trust network, the trustworthiness of “unknown” users.
  - Local trust metrics: reflect the very personal and subjective views of each user; every user can hold a different trust value in each other user. Example: MoleTrust.
  - Global trust metrics: compute a global “reputation” value that approximates how the community as a whole regards a certain user. Example: PageRank.
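A local trust metric can be sketched as breadth-first propagation from the active user. The code below is a simplified metric in the spirit of MoleTrust, not a faithful reimplementation: trust flows only from users closer to the source to users farther away (which removes cycles), only predecessors whose predicted trust reaches a threshold pass trust on, and propagation stops at a horizon. The defaults `horizon=2` and `threshold=0.6`, the graph format, and the function name are hypothetical.

```python
def local_trust(graph: dict[str, dict[str, float]], source: str,
                horizon: int = 2, threshold: float = 0.6) -> dict[str, float]:
    """Predict `source`'s trust in users up to `horizon` hops away.

    graph[a][b] is a's explicit trust statement about b, in [0, 1].
    """
    # Breadth-first layers: layers[d] holds users first reached at distance d.
    dist = {source: 0}
    layers = [[source]]
    for d in range(1, horizon + 1):
        nxt = []
        for a in layers[-1]:
            for b in graph.get(a, {}):
                if b not in dist:
                    dist[b] = d
                    nxt.append(b)
        if not nxt:
            break
        layers.append(nxt)

    trust = {source: 1.0}
    for d in range(1, len(layers)):
        for b in layers[d]:
            # Only sufficiently trusted users one layer closer may vouch for b.
            preds = [a for a in layers[d - 1]
                     if b in graph.get(a, {}) and trust.get(a, 0.0) >= threshold]
            den = sum(trust[a] for a in preds)
            if den > 0:
                # Trust-weighted average of the predecessors' statements about b
                trust[b] = sum(trust[a] * graph[a][b] for a in preds) / den
    del trust[source]
    return trust


graph = {"me": {"a": 0.9, "b": 0.4}, "a": {"c": 0.8}, "b": {"c": 0.2, "d": 1.0}}
print(local_trust(graph, "me"))  # d is absent: its only path runs through b (trust 0.4 < 0.6)
```

Because the result depends on who the source is, two users can hold very different predicted trust in the same third user, which is what distinguishes a local metric from a global reputation score such as PageRank.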


Trust-Aware Recommender Architecture

(Massa & Avesani, 2004; Massa & Avesani, 2007)


Hybrid Recommender System

• A combination of two or more different recommendation technologies
• The space of possible hybrid recommender systems (Burke, 2007)


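[Table: the space of possible hybrid recommenders. Columns (hybridization methods): Weight, Mixed, Switch, FC, Cascade, FA, Meta. Rows (ordered pairs of techniques): CF/CN, CF/DM, CF/KB, CN/CF, CN/DM, CN/KB, DM/CF, DM/CN, DM/KB, KB/CF, KB/CN, KB/DM. Cell entries not shown.]

FC = Feature Combination, FA = Feature Augmentation, CF = Collaborative, CN = Content-based, DM = Demographic, KB = Knowledge-based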
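The “Weight” column, a weighted hybrid, is the simplest to sketch: it combines the scores of two recommenders linearly. The sketch assumes both components score candidate items on the same scale; the CF weight of 0.7, the fallback policy for items scored by only one component, and the function name are hypothetical.

```python
def weighted_hybrid(cf_scores: dict[str, float], cn_scores: dict[str, float],
                    w_cf: float = 0.7) -> list[tuple[str, float]]:
    """Rank items by w_cf * CF score + (1 - w_cf) * content-based score.

    An item scored by only one component keeps that component's score
    (one of several reasonable policies).
    """
    combined = {}
    for i in cf_scores.keys() | cn_scores.keys():
        if i in cf_scores and i in cn_scores:
            combined[i] = w_cf * cf_scores[i] + (1 - w_cf) * cn_scores[i]
        else:
            combined[i] = cf_scores.get(i, cn_scores.get(i))
    # Highest combined score first; ties broken by item name.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


print(weighted_hybrid({"a": 4.0, "b": 2.0}, {"a": 2.0, "c": 5.0}))
# c first (content-only score 5.0), then a (≈ 3.4), then b (2.0)
```

The fallback lets a brand-new item with no ratings (item c above) still be ranked through its content score, which is one way a hybrid offsets CF's new-item problem.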

Other Useful Resources
• Schafer, J. B., Frankowski, D., Herlocker, J. & Sen, S. (2007) Collaborative Filtering Recommender Systems. In Brusilovsky, P., Kobsa, A. & Nejdl, W. (Eds.) The Adaptive Web (LNCS 4321), Springer, NY, pp. 291-324.
• Walker, A., Recker, M. M., Lawless, K. & Wiley, D. (2004) Collaborative Information Filtering: A Review and an Educational Application. International Journal of Artificial Intelligence in Education, 14, pp. 1-26.
• Herlocker, J. L., Konstan, J. A., Terveen, L. G. & Riedl, J. T. (2004) Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems, 22(1), pp. 5-53.
• Adomavicius, G. & Tuzhilin, A. (2005) Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), pp. 734-749.
• Burke, R. (2002) Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12, pp. 331-370.

