Setting Goals and Choosing Metrics for Recommender System Evaluations
Gunnar Schröder, Maik Thiele, Wolfgang Lehner
Gunnar Schröder (T-Systems Multimedia Solutions / Dresden University of Technology)
UCERSTI 2 Workshop at the 5th ACM Conference on Recommender Systems, Chicago, October 23rd, 2011
How Do You Evaluate Recommender Systems?
Qualitative Techniques vs. Quantitative Techniques
Accuracy Metrics: RMSE, MAE, Precision, Recall, ROC Curves, Area under the Curve, Mean Average Precision, F1-Measure
Non-Accuracy Metrics
User-Centric Evaluation
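As a point of reference, here is a minimal Python sketch of a few of the quantitative metrics named above. The function names and toy data are mine, not from the talk:

```python
import math

def rmse(predicted, actual):
    """Prediction accuracy: root mean squared error of rating predictions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Prediction accuracy: mean absolute error of rating predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def precision_recall_f1_at_k(ranking, relevant, k):
    """Classification accuracy on the top-k slice of a ranked item list."""
    hits = sum(1 for item in ranking[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy data: predicted vs. true ratings, and a ranked list
# of items 'a'..'j' of which four are relevant.
print(rmse([4.5, 3.0, 2.0], [5.0, 3.0, 1.0]))   # ~0.645
print(mae([4.5, 3.0, 2.0], [5.0, 3.0, 1.0]))    # 0.5
print(precision_recall_f1_at_k(list("abcdefghij"), {"a", "c", "f", "j"}, 4))  # (0.5, 0.5, 0.5)
```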
But why do you do it exactly this way?
Some of the Issues This Paper Tries to Address
A large variety of metrics has been published
Some metrics are highly correlated [Herlocker 2004]
There is little guidance for evaluating recommenders and choosing metrics:
Which aspects of the usage scenario and the data influence the choice?
Which metrics are applicable? What do these metrics express, and what are the differences among them?
Which metric represents our use case best? How much do the metrics suffer from biases?
Factors That Influence the Choice of Evaluation Metrics
The choice of metrics is influenced by:
Preference data: explicit or implicit; unary, binary, or numerical
Recommender task and interaction: prediction, classification, ranking, similarity, presentation
Objectives for recommender usage: business goals and user interests
Major Classes of Evaluation Metrics
Prediction Accuracy Metrics
Ranking Accuracy Metrics
Classification Accuracy Metrics
Non-Accuracy Metrics
(Figure: a ranked list of ten items with predicted scores from 5.0 down to 1.2)
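To make the class distinction concrete, here is a hedged sketch of one representative from each of the two classes not covered by the earlier snippet: a ranking accuracy metric (a simplified Kendall's tau) and a non-accuracy metric (catalog coverage). All names and data are hypothetical; the paper does not prescribe these particular implementations:

```python
def kendalls_tau(predicted_scores, true_scores):
    """Ranking accuracy: pairwise rank correlation between predicted and
    true scores (tied pairs are skipped, so this is a simplified tau)."""
    concordant = discordant = 0
    n = len(predicted_scores)
    for i in range(n):
        for j in range(i + 1, n):
            s = (predicted_scores[i] - predicted_scores[j]) * (true_scores[i] - true_scores[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)

def catalog_coverage(recommendation_lists, catalog):
    """Non-accuracy metric: share of the catalog that is ever recommended."""
    recommended = set().union(*map(set, recommendation_lists))
    return len(recommended & set(catalog)) / len(catalog)

# Hypothetical data: a recommender that ranks four items almost correctly,
# and three users' top-2 lists drawn from a ten-item catalog.
print(kendalls_tau([5.0, 4.8, 4.7, 4.3], [5.0, 4.7, 4.8, 4.3]))  # ~0.667: one swapped pair
print(catalog_coverage([["a", "b"], ["b", "c"], ["a", "d"]], list("abcdefghij")))  # 0.4
```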
Why Precision, Recall and F1-Measure May Fool You
Ideal recommender (examples a–f) vs. worst-case recommender (examples g–l)
Four recommendations (R1–R4), e.g. Precision@4
Ten items with a varying ratio of relevant items (1–9 relevant items)
Precision, recall, and F1-measure are very sensitive to the ratio of relevant items
They fail to distinguish between an ideal recommender and a worst-case recommender when the ratio of relevant items is varied
(Figure 3)
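The effect is easy to reproduce. Below is a toy construction in the spirit of Figure 3 (not its exact data): with ten items and Precision@4, an ideal recommender for a catalog with one relevant item is indistinguishable from a worst-case recommender for a catalog with seven:

```python
def precision_at_k(ranking, relevant, k=4):
    return sum(1 for item in ranking[:k] if item in relevant) / k

items = list(range(10))

def ideal(n_relevant):
    """Ideal recommender: every relevant item ranked before every irrelevant one."""
    relevant = set(range(n_relevant))
    return sorted(items, key=lambda i: i not in relevant), relevant

def worst_case(n_relevant):
    """Worst-case recommender: every relevant item ranked last."""
    relevant = set(range(n_relevant))
    return sorted(items, key=lambda i: i in relevant), relevant

# An ideal recommender with 1 relevant item and a worst-case recommender
# with 7 relevant items receive the same score:
print(precision_at_k(*ideal(1)))       # 0.25
print(precision_at_k(*worst_case(7)))  # 0.25
```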
What is the Ideal Length for a Top-k Recommendation List?
A typical ranking produced by a recommender on a set of ten items, four of which are relevant
The length of the top-k recommendation list is varied in examples a (k=1) to j (k=10)
(Figure 1)
Markedness = Precision + InvPrecision - 1
Informedness = Recall + InvRecall - 1
Matthews Correlation = ±√(Informedness × Markedness)
[Powers 2007]
(part of Figure 1)
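A minimal sketch of these three quantities, assuming the geometric-mean identity for Matthews correlation from [Powers 2007]; the function and argument names are mine:

```python
import math

def markedness_informedness_mcc(tp, fp, fn, tn):
    """Markedness, informedness, and Matthews correlation from the
    confusion counts of a top-k list treated as a binary classifier."""
    precision     = tp / (tp + fp)   # positive predictive value
    inv_precision = tn / (tn + fn)   # negative predictive value
    recall        = tp / (tp + fn)   # true positive rate
    inv_recall    = tn / (tn + fp)   # true negative rate
    markedness   = precision + inv_precision - 1
    informedness = recall + inv_recall - 1
    # Matthews correlation as the geometric mean of the two, carrying the
    # sign of the covariance term tp*tn - fp*fn (Powers 2007).
    mcc = math.copysign(math.sqrt(abs(markedness * informedness)), tp * tn - fp * fn)
    return markedness, informedness, mcc

# The ideal(1) / worst_case(7) recommenders from the earlier sketch both
# scored Precision@4 = 0.25; these metrics separate them cleanly:
print(markedness_informedness_mcc(tp=1, fp=3, fn=0, tn=6))  # ( 0.25,  0.67,  0.41)
print(markedness_informedness_mcc(tp=1, fp=3, fn=6, tn=0))  # (-0.75, -0.86, -0.80)
```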
From Simple Classification Measures to Partial Ranking Measures
Moving a single relevant item through the recommender's ranking (examples a–j)
Idea: Consider both classification and ranking for the top-k recommendations
Area under the Curve → Limited Area under the Curve
Boolean Kendall's Tau → Limited Boolean Kendall's Tau
(Figure 2)
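The paper defines Limited Area under the Curve and Limited Boolean Kendall's Tau precisely; as an illustration of the idea only, here is plain AUC over binary relevance plus a naive top-k restriction. The restriction rule is my assumption, not the paper's definition. (Over the same item pairs, a boolean Kendall's tau can be read as (concordant − discordant) / total, i.e. 2·AUC − 1.)

```python
def auc(ranking, relevant):
    """AUC for a binary-relevance ranking: the fraction of
    (relevant, non-relevant) item pairs that are ordered correctly."""
    correct = total = 0
    for i, a in enumerate(ranking):
        for b in ranking[i + 1:]:
            if (a in relevant) != (b in relevant):
                total += 1
                if a in relevant:  # the relevant item of the pair comes first
                    correct += 1
    return correct / total if total else 1.0

def limited_auc(ranking, relevant, k):
    """One plausible 'limited' variant (my reading, not necessarily the
    paper's): only pairs involving at least one top-k item are counted,
    so ordering below the cutoff no longer influences the score."""
    top_k = set(ranking[:k])
    correct = total = 0
    for i, a in enumerate(ranking):
        for b in ranking[i + 1:]:
            if (a in relevant) != (b in relevant) and (a in top_k or b in top_k):
                total += 1
                if a in relevant:
                    correct += 1
    return correct / total if total else 1.0

# Ten items, four relevant; reshuffling items below the k=4 cutoff
# changes auc() but leaves limited_auc() untouched:
rel = {"a", "b", "c", "h"}
print(auc(list("abcdefghij"), rel), limited_auc(list("abcdefghij"), rel, 4))  # ~0.833  ~0.947
print(auc(list("abcdefgjih"), rel), limited_auc(list("abcdefgjih"), rel, 4))  # 0.75    ~0.947
```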
A Further, More Complex Example to Study at Home
Conclusions:
For classification, use markedness, informedness, and Matthews correlation instead of precision, recall, and F1-measure
Limited Area under the Curve and Limited Boolean Kendall's Tau are useful metrics for top-k recommender evaluations
(Figure 4)
Conclusion and Contributions
Important aspects that influence the metric choice: objectives for recommender usage, recommender task and interaction, and properties of the preference data
Some problems of precision, recall, and F1-measure
The advantages of markedness, informedness, and Matthews correlation
Two new metrics that measure the ranking of a limited top-k list: Limited Area under the Curve and Limited Boolean Kendall's Tau
Guidelines for choosing a metric (see paper)
Thank You Very Much!
Do not hesitate to contact me if you have any questions, comments, or answers!
Slides are available via e-mail or SlideShare.