WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION.
Posted: 22-Dec-2015
INTRODUCTION
Evaluation is necessary.
Why evaluate?
What to evaluate?
How to evaluate?
WHY EVALUATE?
Need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.
The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.
WHAT TO EVALUATE
What is measured should reflect the ability of the IRS to satisfy user needs:
Coverage of the system – to what extent the IRS includes relevant material.
Time lag – the average interval between the time the user's query request is made and the time taken to obtain an answer set.
Form of presentation of the output.
Effort involved on the part of the user in getting answers to his/her query request.
Recall of the IRS – the percentage of relevant material actually retrieved in the answer to a query request.
Precision of the IRS – the percentage of retrieved material that is actually relevant.
HOW TO EVALUATE?
Various methods are available.
EVALUATION
2 main processes in IR:
User query request (also called information query, retrieval strategy, or search request)
Answer set / hits
Need to know whether the documents retrieved in the answer set fulfil the user query request. This evaluation process is known as retrieval performance evaluation. Evaluation is based on 2 main components:
Test reference collection
Evaluation measure
EVALUATION
A test reference collection consists of:
A collection of documents
A set of example information requests
A set of relevant documents (provided by specialists) for each information request
2 interrelated measures – RECALL and PRECISION
RETRIEVAL PERFORMANCE EVALUATION
Relevance, Recall and Precision
Parameters defined:
I = information request
R = set of relevant documents
|R| = number of documents in this set
A = document answer set retrieved for the information request
|A| = number of documents in this set
|Ra| = number of documents in the intersection of sets R and A
RETRIEVAL PERFORMANCE EVALUATION
Recall = fraction of the relevant documents (set R) which have been retrieved:
R = |Ra| / |R|
Precision = fraction of the retrieved documents (set A) which are relevant:
P = |Ra| / |A|
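The two formulas can be sketched directly with document sets. A minimal example, where the function name and document IDs are illustrative and not from the slides:

```python
# Recall and precision computed from document sets.
def recall_precision(relevant, answer):
    """Return (recall, precision) for a set R of relevant documents
    and a retrieved answer set A."""
    ra = relevant & answer               # Ra: relevant documents retrieved
    recall = len(ra) / len(relevant)     # |Ra| / |R|
    precision = len(ra) / len(answer)    # |Ra| / |A|
    return recall, precision

# Hypothetical mini-collection: 2 of 3 relevant documents retrieved,
# 2 of 4 retrieved documents relevant.
r, p = recall_precision({"d3", "d5", "d9"}, {"d3", "d9", "d40", "d41"})
print(r, p)
```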
[Figure: Venn diagram over the collection showing the relevant documents |R|, the answer set |A|, and the relevant documents in the answer set |Ra| – precision and recall for a given example information request]
RETRIEVAL PERFORMANCE EVALUATION
Recall and precision are expressed as percentages.
Retrieved documents are sorted by degree of relevance (ranking); the user sees a ranked list.
RETRIEVAL PERFORMANCE EVALUATION
a. 10 documents in an IRS with a collection of 100 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
b. A query request was submitted and the following documents were retrieved and ranked according to relevance.
RETRIEVAL PERFORMANCE EVALUATION
1. d123 *
2. d84
3. d56 *
4. d6
5. d8
6. d9 *
7. d511
8. d129
9. d187
10. d25 *
11. d38
12. d48
13. d250
14. d113
15. d3 *
(* = relevant document)
RETRIEVAL PERFORMANCE EVALUATION
c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).
d123 ranked 1st: R = 1/10 × 100% = 10%, P = 1/1 × 100% = 100%
d56 ranked 3rd: R = 2/10 × 100% = 20%, P = 2/3 × 100% ≈ 67%
d9 ranked 6th: R = 3/10 × 100% = 30%, P = 3/6 × 100% = 50%
d25 ranked 10th: R = 4/10 × 100% = 40%, P = 4/10 × 100% = 40%
d3 ranked 15th: R = 5/10 × 100% = 50%, P = 5/15 × 100% ≈ 33%
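The percentages above can be reproduced by walking down the ranked list and recomputing recall and precision at each rank where a relevant document appears. A minimal sketch using the example's data:

```python
# Reproduces the recall/precision percentages of the worked example.
# The relevant set and ranking are taken directly from the slides.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

results = []
found = 0
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        found += 1
        recall = round(100 * found / len(relevant))  # |Ra| / |R| as %
        precision = round(100 * found / rank)        # |Ra| / |A| as %
        results.append((doc, rank, recall, precision))

for doc, rank, r, p in results:
    print(f"{doc} ranked {rank}: R = {r}%, P = {p}%")
```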
A = relevant documents, Ā = non-relevant documents
C = retrieved documents, Ĉ = not retrieved documents
N = total number of documents in the system

Contingency table:
                Relevant   Non-relevant
Retrieved       A ∩ C      Ā ∩ C
Not retrieved   A ∩ Ĉ      Ā ∩ Ĉ
RETRIEVAL PERFORMANCE EVALUATION
Contingency table for the example: N = 100, A = 10, Ā = 90, C = 15, Ĉ = 85

                Relevant      Non-relevant
Retrieved       5             15 − 5 = 10
Not retrieved   10 − 5 = 5    100 − 10 − 10 = 80

Recall = 5/10 × 100% = 50%, Precision = 5/15 × 100% ≈ 33%
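Only three counts are needed; the remaining cells of the table follow by subtraction. A sketch using the example's numbers (variable names are illustrative):

```python
# Filling the contingency table by subtraction, with the example's counts:
# N = 100, |A| = 10, |C| = 15, |A ∩ C| = 5.
N = 100                   # total documents in the system
relevant = 10             # |A|
retrieved = 15            # |C|
relevant_retrieved = 5    # |A ∩ C|

# Remaining cells by subtraction:
nonrelevant_retrieved = retrieved - relevant_retrieved            # Ā ∩ C = 10
relevant_not_retrieved = relevant - relevant_retrieved            # A ∩ Ĉ = 5
nonrelevant_not_retrieved = N - relevant - nonrelevant_retrieved  # Ā ∩ Ĉ = 80

recall = relevant_retrieved / relevant       # 5/10 = 0.5
precision = relevant_retrieved / retrieved   # 5/15 ≈ 0.33
print(recall, round(precision, 2))
```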
OTHER ALTERNATIVE MEASURES
Harmonic mean – a single measure which combines R and P
E measure – a single measure which combines R and P; the user specifies whether they are more interested in R or P
User-oriented measures – based on the user's interpretation of which documents are relevant and which are not
Expected search length
Satisfaction – focuses only on relevant documents
Frustration – focuses only on non-relevant documents
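The harmonic mean of R and P (commonly called the F measure) can be sketched as follows; the function name is illustrative, and the example values come from the worked query above (R = 50%, P ≈ 33%):

```python
# Harmonic mean of recall and precision (F measure).
def harmonic_mean(recall, precision):
    """F = 2 / (1/R + 1/P) = 2RP / (R + P); defined as 0 if either is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# With R = 0.5 and P = 5/15 from the worked example, F = 0.4.
print(round(harmonic_mean(0.5, 5 / 15), 4))  # 0.4
```

Unlike the arithmetic mean, the harmonic mean is pulled toward the smaller of the two values, so a system cannot score well by maximizing only one of R and P.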
REFERENCE COLLECTION
Experimentation in IR is done on test collections. Example of a test collection:
1. A yearly conference known as TREC – Text REtrieval Conference. It is dedicated to experimentation with a large test collection of over 1 million documents; testing is time-consuming. For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRS.
TREC NIST site – http://trec.nist.gov
REFERENCE COLLECTION
The collection is known as TIPSTER (the TIPSTER/TREC test collection). The collection is composed of:
Documents
A set of example information requests or topics
A set of relevant documents for each example information request
OTHER TEST COLLECTIONS
ADI – documents on information science
CACM – computer science
INSPEC – abstracts on electronics, computers and physics
ISI – library science
Medlars – medical articles
These collections were used by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York in 1983 – "Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types" – http://www.ncstrl.org