Evaluation of multimedia interactive information retrieval
Introduction to free text information retrieval (IR)
• overview of process;
• indexing documents;
• matching queries;
• traditional evaluation.
The problems of multimedia and interaction
• Problem of imposed relevance judgements & task support;
• Problem of a fixed notion of relevance.
A developing new view of IR evaluation
Summary
Slides: http://www.dcs.gla.ac.uk/~mark/research/talks/mpegparis.pdf
Evaluating modern IR (1) © Mark D Dunlop 1998
The IR process
[Diagram: the user's information need is instantiated as an initial query (e.g. "Find me documents about Alice and conversations") and turned into a query representation; the author's ideas are written as documents and indexed into document representations (Index 1, Index 2). The query and document representations are compared to produce ranked document representations; the user marks documents relevant and the system re-ranks.]
Indexing textual collections
“Sometimes, when you are a Bear of very Very Little Brain, and you think of Things, you find sometimes that a Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it.” - A.A. Milne
Standard steps in indexing text
• Sort the terms alphabetically and convert to lower case
• Remove stop words
• Stem words to their base form
• Weight words inversely to how often they occur
Extensions
• Phrase matching
• Thesaurus expansion
• Proper name detection
• Disambiguation
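The standard indexing steps above can be sketched in Python. This is a toy illustration, not the indexing used in any particular system: the stop-word list is abbreviated, the plural-stripping "stemmer" stands in for a real stemmer such as Porter's, and the `doc_freq` counts are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "and", "are", "in", "is", "of", "the", "to", "very", "when", "you"}

def index_document(text, n_docs, doc_freq):
    """Tokenise and lowercase, remove stop words, crudely stem,
    then weight each term inversely to its collection frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())          # lowercase + tokenise
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    stems = [t.rstrip("s") for t in tokens]               # toy plural-stripping "stemmer"
    counts = Counter(stems)
    # tf-idf style weight: terms that occur in many documents score low
    return {term: tf * math.log(n_docs / (1 + doc_freq.get(term, 0)))
            for term, tf in counts.items()}

weights = index_document("Bears of very little brain think of things",
                         n_docs=100, doc_freq={"thing": 40, "bear": 5})
# stop words are gone; the common term "thing" is weighted below the rarer "bear"
```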
References
• [Rijsbergen79]: classic text on IR – now on the web
• [SJW97]: compilation of classic papers in IR
Querying textual collections
bear, brain, different, find, get, inside, little, look, open, other, out, people, quite, seem, sometime(2), thing(3), think, very(3), when(2), you(4).
The size of bear brains
Boolean matching
“brain* AND bear*”
Term counting with automatic stemming
“The size of bear brains”
Vector space matching with IDF
Score documents according to cos(θ(A,B)) = (A·B) / (|A| |B|)
Probabilistic matching
Retrieve if P(relevant | x) > P(non-relevant | x)
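The vector-space score above can be sketched over sparse term-weight vectors. The documents and their IDF-style weights here are hypothetical, purely to show the cos(θ) = (A·B) / (|A| |B|) ranking in action:

```python
import math

def cosine(a, b):
    """cos(theta) = (A . B) / (|A| |B|) over sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical IDF-style weights for a query and two indexed documents.
query = {"bear": 1.0, "brain": 1.0, "size": 1.0}
docs = {"milne": {"bear": 2.8, "brain": 4.6, "thing": 0.9},
        "other": {"size": 1.2, "road": 2.0}}
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# ranking == ["milne", "other"]
```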
Traditional evaluation
Create a test collection
• Set of documents, queries and relevance judgements
Calculate the following scores for each good document in the ranked list
Precision = (number of relevant documents in list so far) / (number of documents in list so far)
Recall = (number of relevant documents in list so far) / (total number of relevant documents)
Average over all queries and plot the results.
[Figure: precision–recall curve; precision on the y-axis, recall on the x-axis, both from 0.0 to 1.0.]
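The precision and recall definitions above can be computed at each relevant document in a ranked list; averaging such points over all queries gives the plotted curve. Document IDs here are made up for illustration:

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """At each relevant document in the ranked list, record the
    (recall, precision) pair accumulated so far."""
    points, hits = [], 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            points.append((hits / len(relevant_ids), hits / rank))
    return points

# Toy ranking where d1 and d2 are the only relevant documents.
pts = precision_recall_points(["d3", "d1", "d7", "d2"], {"d1", "d2"})
# pts == [(0.5, 0.5), (1.0, 0.5)]
```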
The IR process
[Diagram repeated from earlier: the IR process, here highlighting the interactive loop in which the user marks documents relevant and the system re-ranks.]
Relevance feedback: an example of interactive IR
User marks some documents as being relevant
System expands the query using this information
• either using added terms (automatically or semi-automatically), or
• model the set of known relevant documents and compare others with it.
Evaluation
• It is interactive ⇒ users make decisions based on their information needs;
• Problem with measuring recall.
Automatic v manual query expansion
• In theory: selecting terms to expand the query considerably improves query expansion [Harman88].
• In practice: the situation is less clear – experts may be able to improve on automatic expansion, but novices do not [MvR97].
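The "model the set of known relevant documents" route can be sketched as a Rocchio-style feedback update. This is an illustrative assumption, not the specific system discussed in the talk: `alpha` and `beta` are hypothetical mixing weights, and the term weights are invented.

```python
def expand_query(query_vec, relevant_docs, alpha=1.0, beta=0.5):
    """Rocchio-style update: keep the original query terms (alpha) and add
    the mean term weights of the documents the user marked relevant (beta)."""
    new_q = {t: alpha * w for t, w in query_vec.items()}
    for doc in relevant_docs:
        for term, w in doc.items():
            new_q[term] = new_q.get(term, 0.0) + beta * w / len(relevant_docs)
    return new_q

q = expand_query({"bear": 1.0}, [{"bear": 2.0, "honey": 1.0}])
# "honey" joins the query even though the user never typed it
```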
Term picking relevance feedback interface
Widening the evaluation: Pejtersen’s model
[Pejtersen96]
Problems of multimedia and interactive IR
Textual IR works fairly well based on terms
• No easy equivalent of terms exists for images, video, and sound.
Traditional evaluation ignores users
• Fixed queries and relevance judgements;
• Task becomes matching expert judgements, not finding relevant documents.
Traditional evaluation requires definitive relevance judgements
• Problematic for text;
• Very unclear for non-text.
Introduction of HCI evaluation methods
• Task centred, own tasks, workplace evaluation, think-alouds & experiments;
• But much more expensive than traditional evaluation.
Image relevance test
Is this relevant to a Commission publication on new roads policy throughout the EU?
Paul Klee, Highway & Byways
Multimedia evaluation: Sample image to index
Grand Floridian, honeymoon hotel
Harmandas’s web art searcher
Using 7 web art collections / 2,609 paintings indexed by neighbouring text.
Evaluated with art students using their own queries – performed well for their queries.
WebMuseum (Paris) / Munch page
Added problem with time-based media
What is a document?
How do you retrieve?
• Scene detection
• Texture and colours
• Subtitles
• Speech recognition
• Person identification
All expensive and rather dodgy.
Summary
• IR started with almost pure system benchmarking;
• Systems are now good enough that we can look beyond this to the user;
• Methods are being inherited from the user modelling and HCI worlds and adapted for a community used to fast and cheap evaluation.
Mira is an ESPRIT working group on evaluation in IR
Future
• Meeting coming up in Dublin (October) and Glasgow (April).
• Four sub-working groups on:
  * Photographic retrieval for journalists
  * How much consensus is found for relevance judgements?
  * Exploring the possibility of a multimedia test collection
  * Applying a multi-level evaluation framework to IR
More information
• Mira home page: http://www.dcs.gla.ac.uk/mira/
• These slides: http://www.dcs.gla.ac.uk/~mark/research/talks/mpegmira.pdf