Measuring Surprise in Recommender Systems
Marius Kaminskas, Derek Bridge
Workshop on ‘Recommender Systems Evaluation: Dimensions and Design’
October 10, 2014
Introduction
• Beyond-accuracy objectives:
• novelty, diversity, serendipity
• How to measure them?
• user studies: expensive to conduct, small-scale
• offline studies: cheap to conduct, datasets available, but need evaluation metrics
• Our focus: metrics for offline evaluation of serendipity
Serendipity
• “The faculty of making happy and unexpected discoveries by accident” [Oxford English Dictionary]
• Serendipitous item = surprising + relevant
Measuring Recommendation Surprise
• Comparing recommended items to a baseline recommender
• motivation: serendipitous items are difficult to predict
• Measuring the recommended item’s distance from a set of expected items
• motivation: an item is surprising if it is different from what the user expects
(both strategies are sketched below)
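A minimal sketch of the two strategies in Python. All names here (`baseline_recs`, `expected_items`, `dist`) are illustrative placeholders under assumed formalizations, not the authors' definitions:

```python
def surprise_vs_baseline(recommended, baseline_recs):
    """Strategy 1: the share of recommended items that a primitive
    baseline recommender would NOT have predicted."""
    baseline = set(baseline_recs)
    return sum(1 for i in recommended if i not in baseline) / len(recommended)

def surprise_vs_expected(item, expected_items, dist):
    """Strategy 2: distance of a recommended item from the user's expected
    items; `dist` is any item-item distance in [0, 1]. Returns both the
    lower-bound (min) and average aggregations discussed on later slides."""
    distances = [dist(item, j) for j in expected_items]
    return min(distances), sum(distances) / len(distances)
```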
Our Goals
• Investigate alternative surprise metric definitions
• existing approaches exploit average distance between the target item and the set of expected items
• we hypothesize that averaging the distances results in information loss (see the toy example below)
• Measure the surprise of recommendations produced by state-of-the-art recommendation algorithms
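A toy illustration (not from the slides) of how averaging can lose information: two recommended items with identical average distance to the user's expected items, but very different minimum distance.

```python
# Distances from two recommended items to the user's two expected items.
d_a = [0.1, 0.9]   # nearly identical to one expected item -> not surprising
d_b = [0.5, 0.5]   # moderately far from everything

for name, d in [("a", d_a), ("b", d_b)]:
    print(f"item {name}: avg={sum(d)/len(d):.2f}  min={min(d):.2f}")
# item a: avg=0.50  min=0.10   <- the average hides the close match
# item b: avg=0.50  min=0.50
```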
Proposed Surprise Metrics
• Co-occurrence-based surprise (sketched below)
• lower-bound distance variant
• average distance variant
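The slide names the metric but not its formula. One plausible reconstruction (an assumption, not confirmed by the slides) uses normalized pointwise mutual information (NPMI) over item co-occurrence in user profiles, with distance = -NPMI:

```python
import math

def npmi(i, j, profiles):
    """Normalized PMI of items i and j over user profiles (sets of items),
    in [-1, 1]. ASSUMPTION: the slides only say 'co-occurrence-based';
    NPMI is one standard instantiation."""
    n = len(profiles)
    n_i = sum(1 for p in profiles if i in p)
    n_j = sum(1 for p in profiles if j in p)
    n_ij = sum(1 for p in profiles if i in p and j in p)
    if n_ij == 0:
        return -1.0                      # never co-occur
    p_i, p_j, p_ij = n_i / n, n_j / n, n_ij / n
    if p_ij == 1.0:
        return 1.0                       # always co-occur
    return math.log2(p_ij / (p_i * p_j)) / -math.log2(p_ij)

def s_co_occ(item, user_profile, profiles, lower_bound=True):
    """Co-occurrence surprise: distance(i, j) = -npmi(i, j), aggregated by
    min (lower-bound variant) or mean (average variant) over the profile."""
    dists = [-npmi(item, j, profiles) for j in user_profile]
    return min(dists) if lower_bound else sum(dists) / len(dists)
```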
Proposed Surprise Metrics
• Content-based surprise (sketched below)
• lower-bound distance variant
• average distance variant
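A matching sketch for the content-based variant. The set-based feature representation and the Jaccard distance are assumptions; movie genres are just an example:

```python
def jaccard_distance(feats_i, feats_j):
    """1 - Jaccard similarity between two item feature sets (e.g., genres).
    ASSUMPTION: the slides do not specify the content representation."""
    union = feats_i | feats_j
    if not union:
        return 0.0
    return 1.0 - len(feats_i & feats_j) / len(union)

def s_cont(item_feats, profile_feats, lower_bound=True):
    """Content-based surprise of an item against the feature sets of the
    items in the user's profile: min (lower-bound) or mean (average)."""
    dists = [jaccard_distance(item_feats, f) for f in profile_feats]
    return min(dists) if lower_bound else sum(dists) / len(dists)
```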
Experiments
• Datasets: MovieLens 1M and LastFM 1K
• Recommendation algorithms
• matrix factorization, user-based k-NN (k=50), item-based k-NN (k=50)
• Evaluation methodology:
• ‘one plus random’: one 5-star item + 1000 random items
• recommend top-10 items
• measure recall and surprise
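A per-user sketch of the 'one plus random' protocol; `rank_items` is a hypothetical stand-in for any trained recommender, not the authors' code:

```python
import random

def one_plus_random_recall(rank_items, user, held_out_item, unrated_pool,
                           n_random=1000, top_n=10):
    """Rank one held-out 5-star item against n_random items the user has not
    rated; recall@top_n for this user is 1 if the held-out item enters the
    top-n list, else 0. Averaging over test users gives overall recall.
    `rank_items(user, candidates)` returns candidates best-first."""
    candidates = random.sample(unrated_pool, n_random) + [held_out_item]
    ranked = rank_items(user, candidates)
    return 1.0 if held_out_item in ranked[:top_n] else 0.0
```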
Results: surprise value comparison
• MF recommendations are the most accurate, but least surprising
• For S_cont, the average and lower-bound distance results are consistent:
• UB recommendations are the most surprising
• For S_co-occ, the results are inconsistent:
• sensitivity to rare items results in extreme metric values, close to 1 or -1
• this results in different outcomes for the lower-bound and average distance metric variants (illustrated below)
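Under the NPMI reconstruction assumed earlier, this sensitivity is easy to reproduce: an item that appears in a single profile co-occurs either always or never with any other item, pinning the value at exactly 1 or -1.

```python
# Reuses npmi() from the co-occurrence sketch above.
profiles = [frozenset({"a", "b"})] + [frozenset({"c"})] * 9999

print(npmi("a", "b", profiles))   # 1.0  -> "a" occurs once, always with "b"
print(npmi("a", "c", profiles))   # -1.0 -> "a" and "c" never co-occur
```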
Results: the impact of user’s profile size
Conclusions
• Results demonstrate the trade-off between recommendation accuracy and serendipity
• Matrix factorization produces the most accurate but least surprising recommendations
• User-based k-NN produces the least accurate but most surprising recommendations
• As the user’s profile size increases, information may be lost when using the average distance metric
• The co-occurrence-based metric is sensitive to rare items and needs to be modified
Future Work
• Comparing the proposed metrics against existing serendipity metrics
• Measuring other beyond-accuracy objectives – diversity, novelty, coverage – and their relation to serendipity
• Conducting a user study to confirm the effectiveness of the proposed metrics
Thank you
• Questions?