Information Retrieval
Quality of a Search Engine
Is it good? How fast does it index?
- Number of documents/hour (at a given average document size)
How fast does it search?
- Latency as a function of index size
Expressiveness of the query language
Measures for a search engine
All of the preceding criteria are measurable. The key measure, however, is user happiness: fast but useless answers won't make a user happy.
Happiness is elusive to measure. The commonest approach is to measure the relevance of search results. How do we measure it? It requires 3 elements:
1. A benchmark document collection
2. A benchmark suite of queries
3. A binary assessment of either Relevant or Irrelevant for each query-doc pair
Evaluating an IR system: standard benchmarks
TREC: the National Institute of Standards and Technology (NIST) has run a large IR testbed for many years.
Other doc collections: marked by human experts, for each query and for each doc, as Relevant or Irrelevant.
On the Web everything is more complicated, since we cannot mark the entire corpus!
General scenario
[Figure: Venn diagram of the Retrieved and Relevant sets of documents within the collection]
Precision: % of retrieved docs that are relevant [the issue: how much "junk" is found]
Recall: % of relevant docs that are retrieved [the issue: how much of the "info" is found]
How to compute them
Precision: fraction of retrieved docs that are relevant: P = tp / (tp + fp)
Recall: fraction of relevant docs that are retrieved: R = tp / (tp + fn)

                | Relevant            | Not Relevant
  Retrieved     | tp (true positive)  | fp (false positive)
  Not Retrieved | fn (false negative) | tn (true negative)
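As a sketch of the two formulas above, precision and recall can be computed directly from the retrieved and relevant document sets (the doc IDs below are purely illustrative):

```python
def precision_recall(retrieved, relevant):
    # tp: docs both retrieved and relevant; fp: retrieved but not relevant;
    # fn: relevant but not retrieved. (tn is not needed for P and R.)
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical judgments: 3 docs retrieved, 4 docs relevant, 2 in common.
retrieved = {"d1", "d2", "d3"}
relevant = {"d2", "d3", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 2/3 and 2/4 = 0.5
```

Note that a full evaluation would average these values over the whole benchmark query suite.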
Some considerations
You can get high recall (but low precision) by retrieving all docs for all queries!
Recall is a non-decreasing function of the number of docs retrieved, while precision usually decreases.
Precision vs. Recall
[Figures: Venn diagrams of the Retrieved and Relevant sets, illustrating four scenarios]
- Highest precision, very low recall
- Lowest precision and recall
- Low precision and very high recall
- Very high precision and recall
Precision-Recall curve
We measure precision at various levels of recall. Note: it is an AVERAGE over many queries.
[Figure: precision vs. recall plot with measured points]
A common picture
[Figure: a typical precision-recall plot over many queries]
Interpolated precision
If you can increase precision by increasing recall, then you should get to count that: the interpolated precision at recall level r is the maximum precision observed at any recall level ≥ r.
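A minimal sketch of this rule, assuming the measured curve is given as hypothetical (recall, precision) points taken while walking down a ranked result list:

```python
def interpolated_precision(points, r):
    # Interpolated precision at recall level r: the maximum precision
    # observed at any recall >= r (0.0 if no such point exists).
    return max((p for rec, p in points if rec >= r), default=0.0)

# Hypothetical (recall, precision) measurements.
points = [(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.45), (1.0, 0.3)]
print(interpolated_precision(points, 0.5))  # max precision over recall >= 0.5 -> 0.5
```

This is what smooths the raw sawtooth-shaped curve into a non-increasing one.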
Other measures
Precision at fixed depth: most appropriate for web search, e.g. precision in the top 10 results.
11-point interpolated average precision: the standard measure for TREC. Take the interpolated precision at 11 recall levels, from 0% to 100% in steps of 10%, and average them.
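The 11-point measure can be sketched by reusing the interpolation rule above; again the (recall, precision) points are hypothetical measurements, one list per query before averaging:

```python
def eleven_point_avg_precision(points):
    # points: list of (recall, precision) pairs measured down a ranked list.
    levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    interp = [max((p for rec, p in points if rec >= r), default=0.0)
              for r in levels]
    return sum(interp) / len(interp)

# Toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0.
print(eleven_point_avg_precision([(0.5, 1.0), (1.0, 0.5)]))  # (6*1.0 + 5*0.5)/11
```

The TREC figure is then the mean of this value over all benchmark queries.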
F measure
Combined measure (weighted harmonic mean of precision and recall):

  1/F = α·(1/P) + (1−α)·(1/R),  i.e.  F = (β² + 1)·P·R / (β²·P + R),  with β² = (1−α)/α

People usually use the balanced F1 measure, i.e., with β = 1 (α = ½), thus 1/F = ½·(1/P + 1/R), i.e. F1 = 2PR / (P + R).
Use this if you need to optimize a single measure that balances precision and recall.
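The weighted harmonic mean above translates directly into code; this is a sketch using the β parameterization:

```python
def f_measure(p, r, beta=1.0):
    # Weighted harmonic mean of precision p and recall r:
    # F = (beta^2 + 1) * p * r / (beta^2 * p + r).
    # beta = 1 gives the balanced F1 measure.
    b2 = beta * beta
    return (b2 + 1) * p * r / (b2 * p + r)

print(f_measure(0.5, 0.5))  # balanced F1 of equal P and R -> 0.5
```

Because the harmonic mean is dominated by the smaller of P and R, a system cannot score well on F1 by sacrificing one for the other.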