Measuring Surprise in Recommender Systems
Marius Kaminskas, Derek Bridge
Workshop on ‘Recommender Systems Evaluation: Dimensions and Design’
October 10, 2014
Introduction
• Beyond-accuracy objectives:
• novelty, diversity, serendipity
• How to measure them?
• user studies: expensive to conduct, small-scale
• offline studies: cheap to conduct, datasets available, but need evaluation metrics
• Our focus: metrics for offline evaluation of serendipity
Serendipity
• “The faculty of making happy and unexpected discoveries by accident” [Oxford English Dictionary]
• Serendipitous item = surprising + relevant
Measuring Recommendation Surprise
• Comparing recommended items to a baseline recommender
• motivation: serendipitous items are difficult to predict
• Measuring the recommended item’s distance from a set of expected items
• motivation: an item is surprising if it is different from what the user expects
(both strategies are sketched below)
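A minimal sketch of the two strategies in Python. All names here (`baseline_recs`, `expected_items`, `dist`) are illustrative placeholders under assumed formalizations, not the authors' definitions:

```python
def surprise_vs_baseline(recommended, baseline_recs):
    """Strategy 1: the share of recommended items that a primitive
    baseline recommender would NOT have predicted."""
    baseline = set(baseline_recs)
    return sum(1 for i in recommended if i not in baseline) / len(recommended)

def surprise_vs_expected(item, expected_items, dist):
    """Strategy 2: distance of a recommended item from the user's expected
    items; `dist` is any item-item distance in [0, 1]. Returns both the
    lower-bound (min) and average aggregations discussed on later slides."""
    distances = [dist(item, j) for j in expected_items]
    return min(distances), sum(distances) / len(distances)
```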
Our Goals
• Investigate alternative surprise metric definitions
• existing approaches exploit average distance between the target item and the set of expected items
• we hypothesize that averaging the distances results in information loss (see the toy example below)
• Measure the surprise of recommendations produced by state-of-the-art recommendation algorithms
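A toy illustration (not from the slides) of how averaging can lose information: two recommended items with identical average distance to the user's expected items, but very different minimum distance.

```python
# Distances from two recommended items to the user's two expected items.
d_a = [0.1, 0.9]   # nearly identical to one expected item -> not surprising
d_b = [0.5, 0.5]   # moderately far from everything

for name, d in [("a", d_a), ("b", d_b)]:
    print(f"item {name}: avg={sum(d)/len(d):.2f}  min={min(d):.2f}")
# item a: avg=0.50  min=0.10   <- the average hides the close match
# item b: avg=0.50  min=0.50
```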
Proposed Surprise Metrics
• Co-occurrence-based surprise (sketched below)
• lower-bound distance variant
• average distance variant
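The slide names the metric but not its formula. One plausible reconstruction (an assumption, not confirmed by the slides) uses normalized pointwise mutual information (NPMI) over item co-occurrence in user profiles, with distance = -NPMI:

```python
import math

def npmi(i, j, profiles):
    """Normalized PMI of items i and j over user profiles (sets of items),
    in [-1, 1]. ASSUMPTION: the slides only say 'co-occurrence-based';
    NPMI is one standard instantiation."""
    n = len(profiles)
    n_i = sum(1 for p in profiles if i in p)
    n_j = sum(1 for p in profiles if j in p)
    n_ij = sum(1 for p in profiles if i in p and j in p)
    if n_ij == 0:
        return -1.0                      # never co-occur
    p_i, p_j, p_ij = n_i / n, n_j / n, n_ij / n
    if p_ij == 1.0:
        return 1.0                       # always co-occur
    return math.log2(p_ij / (p_i * p_j)) / -math.log2(p_ij)

def s_co_occ(item, user_profile, profiles, lower_bound=True):
    """Co-occurrence surprise: distance(i, j) = -npmi(i, j), aggregated by
    min (lower-bound variant) or mean (average variant) over the profile."""
    dists = [-npmi(item, j, profiles) for j in user_profile]
    return min(dists) if lower_bound else sum(dists) / len(dists)
```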
Proposed Surprise Metrics
• Content-based surprise (sketched below)
• lower-bound distance variant
• average distance variant
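A matching sketch for the content-based variant. The set-based feature representation and the Jaccard distance are assumptions; movie genres are just an example:

```python
def jaccard_distance(feats_i, feats_j):
    """1 - Jaccard similarity between two item feature sets (e.g., genres).
    ASSUMPTION: the slides do not specify the content representation."""
    union = feats_i | feats_j
    if not union:
        return 0.0
    return 1.0 - len(feats_i & feats_j) / len(union)

def s_cont(item_feats, profile_feats, lower_bound=True):
    """Content-based surprise of an item against the feature sets of the
    items in the user's profile: min (lower-bound) or mean (average)."""
    dists = [jaccard_distance(item_feats, f) for f in profile_feats]
    return min(dists) if lower_bound else sum(dists) / len(dists)
```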
Experiments
• Datasets: MovieLens 1M and LastFM 1K
• Recommendation algorithms
• matrix factorization, user-based k-NN (k=50), item-based k-NN (k=50)
• Evaluation methodology:
• ‘one plus random’: one 5-star item + 1000 random items
• recommend top-10 items
• measure recall and surprise
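A per-user sketch of the 'one plus random' protocol; `rank_items` is a hypothetical stand-in for any trained recommender, not the authors' code:

```python
import random

def one_plus_random_recall(rank_items, user, held_out_item, unrated_pool,
                           n_random=1000, top_n=10):
    """Rank one held-out 5-star item against n_random items the user has not
    rated; recall@top_n for this user is 1 if the held-out item enters the
    top-n list, else 0. Averaging over test users gives overall recall.
    `rank_items(user, candidates)` returns candidates best-first."""
    candidates = random.sample(unrated_pool, n_random) + [held_out_item]
    ranked = rank_items(user, candidates)
    return 1.0 if held_out_item in ranked[:top_n] else 0.0
```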
Results: surprise value comparison
• MF recommendations are the most accurate, but least surprising
• For S_cont, the average and lower-bound distance results are consistent:
• UB recommendations are the most surprising
• For S_co-occ, the results are inconsistent:
• sensitivity to rare items results in extreme metric values, close to 1 or -1
• this results in different outcomes for the lower-bound and average distance metric variants (illustrated below)
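Under the NPMI reconstruction assumed earlier, this sensitivity is easy to reproduce: an item that appears in a single profile co-occurs either always or never with any other item, pinning the value at exactly 1 or -1.

```python
# Reuses npmi() from the co-occurrence sketch above.
profiles = [frozenset({"a", "b"})] + [frozenset({"c"})] * 9999

print(npmi("a", "b", profiles))   # 1.0  -> "a" occurs once, always with "b"
print(npmi("a", "c", profiles))   # -1.0 -> "a" and "c" never co-occur
```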
Results: the impact of user’s profile size
Conclusions
• Results demonstrate the trade-off between recommendation accuracy and serendipity
• Matrix factorization produces the most accurate but least surprising recommendations
• User-based k-NN produces the least accurate but most surprising recommendations
• As the user’s profile size increases, information may be lost when using the average distance metric
• The co-occurrence-based metric is sensitive to rare items and needs to be modified
Future Work
• Comparing the proposed metrics against existing serendipity metrics
• Measuring other beyond-accuracy objectives – diversity, novelty, coverage – and their relation to serendipity
• Conducting a user study to confirm the effectiveness of the proposed metrics
Thank you
• Questions?