WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION.
Posted: 22-Dec-2015
INTRODUCTION
Evaluation is necessary.
Why evaluate?
What to evaluate?
How to evaluate?
WHY EVALUATE?
Need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.
The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.
WHAT TO EVALUATE
What is measured should reflect the ability of the IRS to satisfy user needs:
Coverage of the system – to what extent the IRS includes relevant material.
Time lag – the average interval between the time the user's query request is made and the time taken to obtain an answer set.
Form of presentation of the output.
Effort involved on the part of the user in getting answers to his/her query request.
Recall of the IRS – the percentage of relevant material actually retrieved in the answer to a query request.
Precision of the IRS – the percentage of retrieved material that is actually relevant.
HOW TO EVALUATE?
Various methods are available.
EVALUATION
2 main processes in IR:
User query request (also called information query, retrieval strategy, or search request)
Answer set / hits
Need to know whether the documents retrieved in the answer set fulfil the user query request. This evaluation process is known as retrieval performance evaluation. Evaluation is based on 2 main components:
Test reference collection
Evaluation measure
EVALUATION
A test reference collection consists of:
A collection of documents
A set of example information requests
A set of relevant documents (provided by specialists) for each information request
2 interrelated measures – RECALL and PRECISION
RETRIEVAL PERFORMANCE EVALUATION
Relevance, Recall and Precision
Parameters defined:
I = information request
R = set of relevant documents
|R| = number of documents in this set
A = document answer set retrieved for the information request
|A| = number of documents in this set
|Ra| = number of documents in the intersection of sets R and A
RETRIEVAL PERFORMANCE EVALUATION
Recall = fraction of the relevant documents (set R) which have been retrieved:
R = |Ra| / |R|
Precision = fraction of the retrieved documents (set A) which are relevant:
P = |Ra| / |A|
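The two formulas can be sketched directly with document sets. A minimal example, where the function name and document IDs are illustrative and not from the slides:

```python
# Recall and precision computed from document sets.
def recall_precision(relevant, answer):
    """Return (recall, precision) for a set R of relevant documents
    and a retrieved answer set A."""
    ra = relevant & answer               # Ra: relevant documents retrieved
    recall = len(ra) / len(relevant)     # |Ra| / |R|
    precision = len(ra) / len(answer)    # |Ra| / |A|
    return recall, precision

# Hypothetical mini-collection: 2 of 3 relevant documents retrieved,
# 2 of 4 retrieved documents relevant.
r, p = recall_precision({"d3", "d5", "d9"}, {"d3", "d9", "d40", "d41"})
print(r, p)
```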
[Figure: Venn diagram over the collection showing the relevant documents |R|, the answer set |A|, and the relevant documents in the answer set |Ra| – precision and recall for a given example information request]
RETRIEVAL PERFORMANCE EVALUATION
Recall and precision are expressed as percentages.
Retrieved documents are sorted by degree of relevance (ranking); the user sees a ranked list.
RETRIEVAL PERFORMANCE EVALUATION
a. 10 documents in an IRS with a collection of 100 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
b. A query request was submitted and the following documents were retrieved and ranked according to relevance.
RETRIEVAL PERFORMANCE EVALUATION
1. d123 *
2. d84
3. d56 *
4. d6
5. d8
6. d9 *
7. d511
8. d129
9. d187
10. d25 *
11. d38
12. d48
13. d250
14. d113
15. d3 *
(* = relevant document)
RETRIEVAL PERFORMANCE EVALUATION
c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).
d123 ranked 1st: R = 1/10 × 100% = 10%, P = 1/1 × 100% = 100%
d56 ranked 3rd: R = 2/10 × 100% = 20%, P = 2/3 × 100% ≈ 67%
d9 ranked 6th: R = 3/10 × 100% = 30%, P = 3/6 × 100% = 50%
d25 ranked 10th: R = 4/10 × 100% = 40%, P = 4/10 × 100% = 40%
d3 ranked 15th: R = 5/10 × 100% = 50%, P = 5/15 × 100% ≈ 33%
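The percentages above can be reproduced by walking down the ranked list and recomputing recall and precision at each rank where a relevant document appears. A minimal sketch using the example's data:

```python
# Reproduces the recall/precision percentages of the worked example.
# The relevant set and ranking are taken directly from the slides.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

results = []
found = 0
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        found += 1
        recall = round(100 * found / len(relevant))  # |Ra| / |R| as %
        precision = round(100 * found / rank)        # |Ra| / |A| as %
        results.append((doc, rank, recall, precision))

for doc, rank, r, p in results:
    print(f"{doc} ranked {rank}: R = {r}%, P = {p}%")
```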
A = relevant documents, Ā = non-relevant documents
C = retrieved documents, Ĉ = not retrieved documents
N = total number of documents in the system

Contingency table:
                Relevant   Non-relevant
Retrieved       A ∩ C      Ā ∩ C
Not retrieved   A ∩ Ĉ      Ā ∩ Ĉ
RETRIEVAL PERFORMANCE EVALUATION
Contingency table for the example: N = 100, A = 10, Ā = 90, C = 15, Ĉ = 85

                Relevant      Non-relevant
Retrieved       5             15 − 5 = 10
Not retrieved   10 − 5 = 5    100 − 10 − 10 = 80

Recall = 5/10 × 100% = 50%, Precision = 5/15 × 100% ≈ 33%
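Only three counts are needed; the remaining cells of the table follow by subtraction. A sketch using the example's numbers (variable names are illustrative):

```python
# Filling the contingency table by subtraction, with the example's counts:
# N = 100, |A| = 10, |C| = 15, |A ∩ C| = 5.
N = 100                   # total documents in the system
relevant = 10             # |A|
retrieved = 15            # |C|
relevant_retrieved = 5    # |A ∩ C|

# Remaining cells by subtraction:
nonrelevant_retrieved = retrieved - relevant_retrieved            # Ā ∩ C = 10
relevant_not_retrieved = relevant - relevant_retrieved            # A ∩ Ĉ = 5
nonrelevant_not_retrieved = N - relevant - nonrelevant_retrieved  # Ā ∩ Ĉ = 80

recall = relevant_retrieved / relevant       # 5/10 = 0.5
precision = relevant_retrieved / retrieved   # 5/15 ≈ 0.33
print(recall, round(precision, 2))
```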
OTHER ALTERNATIVE MEASURES
Harmonic mean – a single measure which combines R and P
E measure – a single measure which combines R and P; the user specifies whether they are more interested in R or P
User-oriented measures – based on the user's interpretation of which documents are relevant and which are not
Expected search length
Satisfaction – focuses only on relevant documents
Frustration – focuses only on non-relevant documents
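The harmonic mean of R and P (commonly called the F measure) can be sketched as follows; the function name is illustrative, and the example values come from the worked query above (R = 50%, P ≈ 33%):

```python
# Harmonic mean of recall and precision (F measure).
def harmonic_mean(recall, precision):
    """F = 2 / (1/R + 1/P) = 2RP / (R + P); defined as 0 if either is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# With R = 0.5 and P = 5/15 from the worked example, F = 0.4.
print(round(harmonic_mean(0.5, 5 / 15), 4))  # 0.4
```

Unlike the arithmetic mean, the harmonic mean is pulled toward the smaller of the two values, so a system cannot score well by maximizing only one of R and P.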
REFERENCE COLLECTION
Experimentation in IR is done on test collections. Example of a test collection:
1. A yearly conference known as TREC – Text REtrieval Conference. It is dedicated to experimentation with a large test collection of over 1 million documents; testing is time-consuming. For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRS.
TREC NIST site – http://trec.nist.gov
REFERENCE COLLECTION
The collection is known as TIPSTER (the TIPSTER/TREC test collection). The collection is composed of:
Documents
A set of example information requests or topics
A set of relevant documents for each example information request
OTHER TEST COLLECTIONS
ADI – documents on information science
CACM – computer science
INSPEC – abstracts on electronics, computers and physics
ISI – library science
Medlars – medical articles
These collections were used by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York in 1983 – "Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types" – http://www.ncstrl.org