Evaluation Metrics - Unimol
Transcript of "Evaluation Metrics," presented by Dawn Lawrie

Page 1:

Evaluation Metrics
Presented by Dawn Lawrie

Page 2:

Some Possibilities

Precision
Recall
F-measure
Mean Average Precision
Mean Reciprocal Rank

Page 3:

Precision

Proportion of things of interest in some set.

Example: I'm interested in apples.

[Figure: a set of 5 pieces of fruit, 3 of which are apples]

Precision = 3 apples / 5 pieces of fruit = 0.6

Page 4:

Recall

Proportion of the things of interest that appear in the set, out of all the things of interest.

Example: I'm looking for apples.

[Figure: the same set of fruit; 3 of the 6 total apples appear in it]

Recall = 3 apples / 6 total apples = 0.5

Page 5:

F-measure

Harmonic mean of precision and recall: a combined measure that values each equally.

F1 = (2 * precision * recall) / (precision + recall)
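Plugging in the fruit example's precision (0.6) and recall (0.5), the harmonic mean works out as a small sketch:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Fruit example: precision 0.6, recall 0.5
print(round(f1(0.6, 0.5), 3))  # 0.545
```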

Page 6:

Where to use

The set is well defined.
The order of things in the set doesn't matter.

Page 7:

But with a Ranked List...

[Figure: two ranked lists of 10 results, where the same items appear at different ranks]

Page 8:

Mean Average Precision

Also known as MAP.
The favored IR metric for ranked retrieval.

Page 9:

Computing Average Precision

Let Relevant = the set of apples. The ordered list is the ranked list; apples appear at ranks 2, 3, 6, 10, 11, and 12.

AP(Relevant) = ( Σ_{r ∈ Relevant} Precision(Rank(r)) ) / |Relevant|

Accumulating precision at each relevant rank:

1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12

AP = (1/2 + 2/3 + 3/6 + 4/10 + 5/11 + 6/12) / 6 ≈ 0.504
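The computation above can be sketched directly from the formula; the ranks come from the slide's example:

```python
def average_precision(relevant_ranks, num_relevant):
    """Sum Precision(Rank(r)) over relevant documents, divided by |Relevant|.

    relevant_ranks: 1-based ranks at which relevant documents were retrieved.
    num_relevant: total number of relevant documents, |Relevant|.
    """
    ranks = sorted(relevant_ranks)
    # Precision at the rank of the i-th relevant hit is (i + 1) / rank.
    total = sum((i + 1) / rank for i, rank in enumerate(ranks))
    return total / num_relevant

# The slide's example: apples retrieved at ranks 2, 3, 6, 10, 11, 12
ap = average_precision([2, 3, 6, 10, 11, 12], num_relevant=6)
print(round(ap, 3))  # 0.504
```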

Page 18:

Computing MAP

Compute the average of AP over a query set, e.g. an apple query, a blueberry query, a pineapple query, and a banana query.

MAP(Query) = ( Σ_{q ∈ Query} AP(Relevant(q)) ) / |Query|

Page 19:

Limitation of MAP

Results can be biased for query sets that include queries with few relevant documents


Page 20:

Mean Reciprocal Rank

RR(q) = 0 if q retrieves no relevant documents; otherwise 1 / TopRank(q)

RR is the reciprocal rank of a single query; MRR averages it over the query set:

MRR(Query) = ( Σ_{q ∈ Query} RR(q) ) / |Query|

Page 22:

Understanding MRR

Ranks of the first relevant document for four queries: 5, 15, 205, 215

RR values: 0.2, 0.067, 0.0049, 0.0047

Average rank: 110    MRR: 0.069
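The slide's numbers can be reproduced with a short sketch; `None` stands in for a query that retrieves no relevant document:

```python
def reciprocal_rank(top_rank):
    """RR(q): 1/TopRank(q), or 0 if no relevant document was retrieved.

    top_rank is the 1-based rank of the first relevant document, or None.
    """
    return 0.0 if top_rank is None else 1.0 / top_rank

def mean_reciprocal_rank(top_ranks):
    """MRR: mean of RR over the query set."""
    return sum(reciprocal_rank(r) for r in top_ranks) / len(top_ranks)

# The slide's example: first relevant document at ranks 5, 15, 205, 215
ranks = [5, 15, 205, 215]
print(sum(ranks) / len(ranks))                # average rank: 110.0
print(round(mean_reciprocal_rank(ranks), 3))  # MRR: 0.069
```

Note how the two huge ranks (205, 215) drag the average rank to 110, while MRR stays close to the contribution of the best query.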

Page 27:

MRR vs. Average Rank

MRR equals MAP when there is one relevant document.
MRR bounds the result between 0 and 1, where 1 is perfect retrieval.
Average rank is greatly influenced by documents retrieved at large ranks, yet large ranks do not reflect the importance of those documents in practice.
MRR minimizes the difference between, say, rank 750 and rank 900.

Page 28:

Take Home Message

Precision/recall and F-measure are good for well-defined sets.
MAP is good for ranked results when you're looking for 5+ things.
MRR is good for ranked results when you're looking for fewer than 5 things, and best when you want just 1 thing.