Evaluation Metrics
Presented by Dawn Lawrie
Some Possibilities
Precision
Recall
F-measure
Mean Average Precision
Mean Reciprocal Rank
Precision
Proportion of things of interest in some set
Example: I'm interested in apples
[Figure: a set of 5 pieces of fruit, 3 of which are apples]
Precision = 3 apples / 5 pieces of fruit
Recall
Proportion of all the things of interest that appear in the set
Example: I'm looking for apples
[Figure: a set containing 3 of the 6 apples that exist]
Recall = 3 apples / 6 total apples
F-measure
Harmonic mean of precision and recall: a combined measure that weights each equally
F1 = (2 × precision × recall) / (precision + recall)
Where to use
The set is well defined
Order of things in the set doesn't matter
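The set-based metrics above can be sketched in a few lines. This is a minimal illustration using the apple counts from the slides (3 apples in a set of 5 pieces of fruit, 6 apples in total); the function names are my own.

```python
def precision(relevant_in_set, set_size):
    """Proportion of the set that is of interest."""
    return relevant_in_set / set_size

def recall(relevant_in_set, total_relevant):
    """Proportion of all things of interest that made it into the set."""
    return relevant_in_set / total_relevant

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

p = precision(3, 5)    # 3 apples in a set of 5 pieces of fruit -> 0.6
r = recall(3, 6)       # 3 of the 6 existing apples retrieved -> 0.5
print(p, r, f1(p, r))  # F1 = 2*0.6*0.5 / (0.6+0.5) = 6/11
```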
But with a Ranked List
[Figure: two ranked lists of items, positions 1 through 10]
Mean Average Precision
Also known as MAP
Favored IR metric for ranked retrieval
Computing Average Precision
Let Relevant = Set of Apples
Ordered list = ranked list
The apples appear at ranks 2, 3, 6, 10, 11, and 12

AP(Relevant) = ( Σ_{r ∈ Relevant} Precision(Rank(r)) ) / |Relevant|

AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.50
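The average-precision formula can be sketched directly: at the rank of the i-th relevant item, precision is i divided by that rank. This is a minimal illustration using the apple ranks from the slide; the function name is my own.

```python
def average_precision(relevant_ranks):
    """AP = (1/|Relevant|) * sum over relevant items of Precision(Rank(r)).

    relevant_ranks: 1-based ranks at which the relevant items appear.
    """
    ranks = sorted(relevant_ranks)
    total = 0.0
    for i, rank in enumerate(ranks, start=1):
        total += i / rank  # precision at the rank of the i-th relevant item
    return total / len(ranks)

# Apples found at ranks 2, 3, 6, 10, 11, 12, as in the slide:
ap = average_precision([2, 3, 6, 10, 11, 12])
print(round(ap, 3))  # (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.504
```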
Compute MAP
Compute the average over a query set
Queries: Apple Query, Blueberry Query, Pineapple Query, Banana Query

MAP(Query) = ( Σ_{q ∈ Query} AP(Relevant(q)) ) / |Query|
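MAP is then just the mean of AP over the query set. In this sketch, only the apple query's ranks come from the slides; the other three queries' rank lists are made-up values for illustration.

```python
def average_precision(relevant_ranks):
    """AP over the 1-based ranks at which relevant items appear."""
    ranks = sorted(relevant_ranks)
    return sum(i / r for i, r in enumerate(ranks, start=1)) / len(ranks)

def mean_average_precision(queries):
    """queries: one list of relevant-item ranks per query."""
    return sum(average_precision(q) for q in queries) / len(queries)

queries = [
    [2, 3, 6, 10, 11, 12],  # apple query (ranks from the slide)
    [1, 4],                 # hypothetical blueberry query
    [3],                    # hypothetical pineapple query
    [2, 5, 7],              # hypothetical banana query
]
print(round(mean_average_precision(queries), 3))
```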
Limitation of MAP
Results can be biased for query sets that include queries with few relevant documents
Mean Reciprocal Rank

RR(q) = 0 if q retrieves no relevant documents; otherwise 1 / TopRank(q)
(the reciprocal rank, where TopRank(q) is the rank of the first relevant document retrieved for q)

MRR(Query) = ( Σ_{q ∈ Query} RR(q) ) / |Query|
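The MRR definition above can be sketched as follows, using the first-relevant-document ranks from the next slide (5, 15, 205, 215); the function names are my own, and `None` stands in for a query with no relevant document retrieved.

```python
def reciprocal_rank(top_rank):
    """RR(q): 0 if no relevant document retrieved, else 1 / TopRank(q).

    top_rank: 1-based rank of the first relevant document, or None.
    """
    return 0.0 if top_rank is None else 1.0 / top_rank

def mean_reciprocal_rank(top_ranks):
    """Mean of RR over a query set."""
    return sum(reciprocal_rank(r) for r in top_ranks) / len(top_ranks)

# First relevant document at ranks 5, 15, 205, 215 for four queries:
print(round(mean_reciprocal_rank([5, 15, 205, 215]), 3))  # ≈ 0.069
```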
Understanding MRR
Ranks:        5      15      205      215
RR values:    0.2    0.067   0.0049   0.0047
Average rank: 110    MRR: 0.069
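The contrast between average rank and MRR can be shown numerically: moving one deep hit from rank 750 to rank 900 shifts the average rank substantially but leaves MRR essentially unchanged. The rank lists here reuse the slide's shallow ranks with a made-up deep hit.

```python
def mrr(ranks):
    """Mean reciprocal rank over first-relevant-document ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def avg_rank(ranks):
    """Plain arithmetic mean of the ranks."""
    return sum(ranks) / len(ranks)

a = [5, 15, 205, 750]
b = [5, 15, 205, 900]
print(avg_rank(a), avg_rank(b))            # 243.75 vs 281.25: large gap
print(round(mrr(a), 4), round(mrr(b), 4))  # nearly identical (both ≈ 0.0682)
```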
MRR vs. Average Rank
MRR = MAP when there is exactly one relevant document
MRR bounds the result between 0 and 1, where 1 is perfect retrieval
Average rank is greatly influenced by documents retrieved at large ranks, but documents at such deep ranks matter little in practice
MRR minimizes the difference between, say, ranks 750 and 900
Take Home Message
Precision/recall and F-measure are good for well-defined sets
MAP is good for ranked results when you're looking for 5+ things
MRR is good for ranked results when you're looking for fewer than 5 things, and best when looking for just one