Evaluation of multimedia interactive information retrieval
Introduction to free text information retrieval (IR)
• overview of process;
• indexing documents;
• matching queries;
• traditional evaluation.
The problems of multimedia and interaction
• Problem of imposed relevance judgements & task support;
• Problem of a fixed notion of relevance.
A developing new view of IR evaluation
Summary
Slides: http://www.dcs.gla.ac.uk/~mark/research/talks/mpegparis.pdf
Evaluating modern IR (1) © Mark D Dunlop 1998
The IR process
[Diagram: the user's information need is instantiated as an initial query (e.g. "Find me documents about Alice and conversations") and turned into a query representation; the author's ideas are written as documents and indexed into document representations (Index 1, Index 2). The query and document representations are compared to produce ranked document representations; the user marks documents relevant and the system re-ranks.]
Indexing textual collections
“Sometimes, when you are a Bear of very Very Little Brain, and you think of Things, you find sometimes that a Thing which seemed very Thingish inside you is quite different when it gets out into the open and has other people looking at it.” - A.A. Milne
Standard steps in indexing text
• Sort the terms alphabetically and convert to lower case
• Remove stop words
• Stem words to their base form
• Weight words inversely to how often they occur
Extensions
• Phrase matching
• Thesaurus expansion
• Proper name detection
• Disambiguation
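The standard indexing steps above can be sketched in Python. This is a toy illustration, not the indexing used in any particular system: the stop-word list is abbreviated, the plural-stripping "stemmer" stands in for a real stemmer such as Porter's, and the `doc_freq` counts are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "and", "are", "in", "is", "of", "the", "to", "very", "when", "you"}

def index_document(text, n_docs, doc_freq):
    """Tokenise and lowercase, remove stop words, crudely stem,
    then weight each term inversely to its collection frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())          # lowercase + tokenise
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    stems = [t.rstrip("s") for t in tokens]               # toy plural-stripping "stemmer"
    counts = Counter(stems)
    # tf-idf style weight: terms that occur in many documents score low
    return {term: tf * math.log(n_docs / (1 + doc_freq.get(term, 0)))
            for term, tf in counts.items()}

weights = index_document("Bears of very little brain think of things",
                         n_docs=100, doc_freq={"thing": 40, "bear": 5})
# stop words are gone; the common term "thing" is weighted below the rarer "bear"
```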
References
• [Rijsbergen79]: classic text on IR – now on the web
• [SJW97]: compilation of classic papers in IR
Querying textual collections
bear, brain, different, find, get, inside, little, look, open, other, out, people, quite, seem, sometime(2), thing(3), think, very(3), when(2), you(4).
The size of bear brains
Boolean matching
“brain* AND bear*”
Term counting with automatic stemming
“The size of bear brains”
Vector space matching with IDF
Score documents according to cos(θ(A,B)) = (A·B) / (|A| |B|)
Probabilistic matching
Retrieve if P(relevant | x) > P(non-relevant | x)
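The vector-space score above can be sketched over sparse term-weight vectors. The documents and their IDF-style weights here are hypothetical, purely to show the cos(θ) = (A·B) / (|A| |B|) ranking in action:

```python
import math

def cosine(a, b):
    """cos(theta) = (A . B) / (|A| |B|) over sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical IDF-style weights for a query and two indexed documents.
query = {"bear": 1.0, "brain": 1.0, "size": 1.0}
docs = {"milne": {"bear": 2.8, "brain": 4.6, "thing": 0.9},
        "other": {"size": 1.2, "road": 2.0}}
ranking = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# ranking == ["milne", "other"]
```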
Traditional evaluation
Create a test collection
• Set of documents, queries and relevance judgements
Calculate the following scores for each good document in the ranked list
Precision = (number of relevant documents in list so far) / (number of documents in list so far)
Recall = (number of relevant documents in list so far) / (total number of relevant documents)
Average over all queries and plot the results.
[Figure: precision–recall curve; precision on the y-axis, recall on the x-axis, both from 0.0 to 1.0.]
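The precision and recall definitions above can be computed at each relevant document in a ranked list; averaging such points over all queries gives the plotted curve. Document IDs here are made up for illustration:

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """At each relevant document in the ranked list, record the
    (recall, precision) pair accumulated so far."""
    points, hits = [], 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            points.append((hits / len(relevant_ids), hits / rank))
    return points

# Toy ranking where d1 and d2 are the only relevant documents.
pts = precision_recall_points(["d3", "d1", "d7", "d2"], {"d1", "d2"})
# pts == [(0.5, 0.5), (1.0, 0.5)]
```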
The IR process
[Diagram repeated from earlier: the IR process, here highlighting the interactive loop in which the user marks documents relevant and the system re-ranks.]
Relevance feedback: an example of interactive IR
User marks some documents as being relevant
System expands the query using this information
• either using added terms (automatically or semi-automatically), or
• model the set of known relevant documents and compare others with it.
Evaluation
• It is interactive ⇒ users make decisions based on their information needs;
• Problem with measuring recall.
Automatic v manual query expansion
• In theory: selecting terms to expand the query considerably improves query expansion [Harman88].
• In practice: the situation is less clear – experts may be able to improve on automatic expansion, but novices do not [MvR97].
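The "model the set of known relevant documents" route can be sketched as a Rocchio-style feedback update. This is an illustrative assumption, not the specific system discussed in the talk: `alpha` and `beta` are hypothetical mixing weights, and the term weights are invented.

```python
def expand_query(query_vec, relevant_docs, alpha=1.0, beta=0.5):
    """Rocchio-style update: keep the original query terms (alpha) and add
    the mean term weights of the documents the user marked relevant (beta)."""
    new_q = {t: alpha * w for t, w in query_vec.items()}
    for doc in relevant_docs:
        for term, w in doc.items():
            new_q[term] = new_q.get(term, 0.0) + beta * w / len(relevant_docs)
    return new_q

q = expand_query({"bear": 1.0}, [{"bear": 2.0, "honey": 1.0}])
# "honey" joins the query even though the user never typed it
```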
Term picking relevance feedback interface
Widening the evaluation: Pejtersen’s model
[Pejtersen96]
Problems of multimedia and interactive IR
Textual IR works fairly well based on terms
• No easy equivalent of terms exists for images, video, and sound.
Traditional evaluation ignores users
• Fixed queries and relevance judgements;
• Task becomes matching expert judgements, not finding relevant documents.
Traditional evaluation requires definitive relevance judgements
• Problematic for text;
• Very unclear for non-text.
Introduction of HCI evaluation methods
• Task centred, own tasks, workplace evaluation, think-alouds & experiments;
• But much more expensive than traditional evaluation.
Image relevance test
Is this relevant to a Commission publication on new roads policy throughout the EU?
Paul Klee, Highway & Byways
Multimedia evaluation: Sample image to index
Grand Floridian, honeymoon hotel
Harmandas’s web art searcher
Using 7 web art collections / 2,609 paintings indexed by neighbouring text.
Evaluated with art students using their own queries – performed well for their queries.
WebMuseum (Paris) / Munch page
Added problem with time-based media
What is a document?
How do you retrieve?
• Scene detection
• Texture and colours
• Subtitles
• Speech recognition
• Person identification
All expensive and rather dodgy.
Summary
• IR started with almost pure system benchmarking;
• Systems are now good enough that we can look beyond this to the user;
• Methods are being inherited from the user modelling and HCI worlds and adapted for a community used to fast and cheap evaluation.
Mira is an ESPRIT working group on evaluation in IR
Future
• Meeting coming up in Dublin (October) and Glasgow (April).
• Four sub-working groups on:
  * Photographic retrieval for journalists
  * How much consensus is found for relevance judgements?
  * Exploring the possibility of a multimedia test collection
  * Applying a multi-level evaluation framework to IR
More information
• Mira home page: http://www.dcs.gla.ac.uk/mira/
• These slides: http://www.dcs.gla.ac.uk/~mark/research/talks/mpegmira.pdf