The SearchMaster's Toolbox ECIR Industry Day 01 Apr 2010 David Hawking.




UK Customers
• From 2004/5: Staffordshire University, Scottish Care Commission
• From 2009: The Electoral Commission, Digital UK, Hargreaves Lansdown
• From 2010: London School of Economics and Political Science, Incisive Media, British Medical Journal, East Ayrshire Council, ...


“Search is life”


Costs of poor search
• Butler Group: Up to 10% of salary costs wasted through ineffective search
• IDC: A company with 1000 information workers can expect to waste more than $5M p.a. due to poor search
• Accenture: A survey of 1000 middle managers found they spend as long as 2 hrs/day searching for information.


Who's the SearchMaster in your organisation?


Stakeholders expect every SearchMaster to do her duty!

• To make external website search work
  – Sales conversions
  – Information dissemination
  – Reduced inquiry handling load
• To provide effective search of corporate information
  – Happy, productive employees (plus students and other stakeholders)


Give them the tools and they will do the job!

• SearchMaster
• End-user
• Simple
• Powerful


1. The basic search tool
• Should:
  – Have good performance out of the box, without weeks of implementation
  – Be simple to configure
  – Avoid features which are too complex to use or set up
  – Be able to cover your content and scale to the necessary level


2. FineTuner
• Every search deployment is different
  – Web, database, fileshare, Lotus
• The weighting of ranking features must adapt to these differences
• Manual tweaking is fraught with danger
  – Fix one query, break a dozen
• Make a test file and use a tuning tool to learn feature weightings
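To make "weighting of ranking features" concrete, here is a minimal sketch that assumes ranking is a weighted sum of per-document feature scores. The feature names (`content`, `anchortext`) and the scores are hypothetical and do not reflect Funnelback's actual feature set or formula.

```python
from typing import Dict, List

# A document is represented by hypothetical feature scores normalised to [0, 1].
Features = Dict[str, float]


def score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores; a feature missing from a document counts as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank(docs: Dict[str, Features], weights: Dict[str, float]) -> List[str]:
    """Document ids by descending score, ties broken alphabetically by id."""
    return sorted(docs, key=lambda doc_id: (-score(docs[doc_id], weights), doc_id))


if __name__ == "__main__":
    docs = {
        "a.htm": {"content": 0.9, "anchortext": 0.1},
        "b.htm": {"content": 0.4, "anchortext": 0.8},
    }
    print(rank(docs, {"content": 1.0, "anchortext": 0.5}))  # ['a.htm', 'b.htm']
    print(rank(docs, {"content": 1.0, "anchortext": 1.0}))  # ['b.htm', 'a.htm']
```

Nudging a single weight reverses the order. That may fix the query you are looking at while silently changing every other query, which is why weights are better learned against a whole testfile.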


Testfile Desiderata
• Representative of real workload
  – Need an unbiased sample
• Many queries (typically >> 100)
• Multiple weighted answers (where applicable)
• Redirects
• Equivalent answers
• See es.csiro.au/C-TEST/
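One way to honour weighted answers, redirects and equivalent answers when scoring a result list is sketched below. The data layout and the "best weight divided by rank" metric are assumptions for illustration; they are not the C-TEST format or its official measure, and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TestfileEntry:
    query: str
    # Canonical answer URL -> weight; the best answer gets 1.0, lesser answers less.
    answers: Dict[str, float]
    # Redirect or equivalent URL -> the canonical URL it stands for.
    aliases: Dict[str, str] = field(default_factory=dict)

    def canonical(self, url: str) -> str:
        return self.aliases.get(url, url)


def weighted_reciprocal_rank(entry: TestfileEntry, results: List[str], depth: int = 10) -> float:
    """Best (answer weight / rank) within the top `depth` results; 0.0 if none found."""
    best = 0.0
    for rank, url in enumerate(results[:depth], start=1):
        weight = entry.answers.get(entry.canonical(url), 0.0)
        best = max(best, weight / rank)
    return best


if __name__ == "__main__":
    entry = TestfileEntry(
        query="jobs",
        answers={"example.org/jobs.htm": 1.0, "example.org/jobs-faq.htm": 0.5},
        aliases={"example.org/careers": "example.org/jobs.htm"},
    )
    # The redirect at rank 2 is credited as the best answer: 1.0 / 2.
    print(weighted_reciprocal_rank(entry, ["example.org/about.htm", "example.org/careers"]))  # 0.5
```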


Academic Research on Evaluation

• Masses of academic research
• How does it translate to tuning an enterprise search system?
  – Setting good defaults
  – Tuning to specific characteristics in hundreds of customer deployments
• Note: the system starts with no user interaction data.
• Creation of testfiles must be affordable.


Spreadsheet testfile

Query                     Answer URL
employment                health.gov.au/health-career-vacant.htm
jobs                      health.gov.au/health-career-vacant.htm
vacancies                 health.gov.au/health-career-vacant.htm
recruitment               health.gov.au/health-career-vacant.htm
tenders                   health.gov.au/list-of-tenders-and-grants.htm
grants                    health.gov.au/list-of-tenders-and-grants.htm
tenders                   health.gov.au/list-of-tenders-and-grants.htm
mental health             health.gov.au/mental-health-and-wellbeing
mental health strategy    health.gov.au/mental-strategy
aged care                 health.gov.au/aged-care.htm
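A spreadsheet like this is cheap to maintain and easy to load. The sketch below assumes it is exported as tab-separated text with the query in the first column and the answer URL in the second; it groups rows so that a query with several acceptable answers ends up with a set of URLs.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set


def load_testfile(text: str) -> Dict[str, Set[str]]:
    """Group tab-separated (query, answer URL) rows into query -> set of answers."""
    answers: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed rows
        query = " ".join(row[0].lower().split())  # normalise case and spacing
        answers[query].add(row[1].strip())
    return dict(answers)


if __name__ == "__main__":
    sample = (
        "jobs\thealth.gov.au/health-career-vacant.htm\n"
        "tenders\thealth.gov.au/list-of-tenders-and-grants.htm\n"
        "tenders\thealth.gov.au/list-of-tenders-and-grants.htm\n"
    )
    testfile = load_testfile(sample)
    print(len(testfile))  # 2: the repeated "tenders" row collapses into one answer
```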


LSE Case Study


Sources of testfiles at LSE
• A-Z Sitemap (>500 entries)
  – Biased toward anchortext
• Keymatches file (>500 entries)
  – Pessimistic
• Click data (>250 queries with > t clicks)
  – Biased toward clicks – 100% success!
• Pop/crit queries (134 manually judged)

All biased – Use a sampling tool!
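The talk does not say how its sampling tool works. As one hedged possibility, the sketch below draws distinct queries from a query log with probability weighted by log frequency, so the sample mirrors the real workload rather than any one biased source. It uses Efraimidis-Spirakis weighted sampling without replacement, a technique swapped in here for illustration; the log contents are hypothetical.

```python
import heapq
import random
from collections import Counter
from typing import Iterable, List, Optional


def sample_queries(log: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick up to k distinct queries, weighted by how often each appears in the log.

    Efraimidis-Spirakis: give each distinct query the key u ** (1 / freq), with u
    uniform in [0, 1), and keep the k largest keys.
    """
    rng = random.Random(seed)
    freq = Counter(" ".join(q.lower().split()) for q in log if q.strip())
    keyed = ((rng.random() ** (1.0 / n), q) for q, n in freq.items())
    return [q for _, q in heapq.nlargest(k, keyed)]


if __name__ == "__main__":
    log = ["aged care"] * 50 + ["mental health"] * 30 + ["tenders"] * 5 + ["grants"]
    # Frequent queries are more likely to be drawn, but rare ones still get a chance.
    print(sample_queries(log, k=2, seed=42))
```

The sampled queries still need answers judged by hand, as with the 134 pop/crit queries.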


Dimension-at-a-time tuning
[Figure: tuning steps 1, 2 and 3 plotted in the plane of two ranking dimensions, dim1 and dim2]
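Dimension-at-a-time tuning reads like coordinate ascent: sweep one dimension while holding the others fixed, keep the best value, then move on to the next dimension. The sketch below assumes a grid of candidate values per dimension and an `evaluate` callback; in practice that callback would run the testfile under each setting, which is where the millions of query executions come from. The toy objective is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def tune_dimension_at_a_time(
    evaluate: Callable[[Params], float],
    start: Params,
    grid: Dict[str, List[float]],
    max_rounds: int = 5,
) -> Tuple[Params, float]:
    """Coordinate ascent: sweep one dimension at a time, keeping the others fixed.

    `evaluate` would normally run every testfile query under the candidate settings
    and return a mean effectiveness score.
    """
    best = dict(start)
    best_score = evaluate(best)
    for _ in range(max_rounds):
        improved = False
        for dim, values in grid.items():
            for value in values:
                candidate = {**best, dim: value}
                candidate_score = evaluate(candidate)
                if candidate_score > best_score:
                    best, best_score, improved = candidate, candidate_score, True
        if not improved:
            break  # no single-dimension move helps: a local optimum
    return best, best_score


if __name__ == "__main__":
    # Toy objective standing in for testfile effectiveness, peaking at dim1=0.7, dim2=0.2.
    def toy(p: Params) -> float:
        return -((p["dim1"] - 0.7) ** 2 + (p["dim2"] - 0.2) ** 2)

    grid = {"dim1": [i / 10 for i in range(11)], "dim2": [i / 10 for i in range(11)]}
    best, _ = tune_dimension_at_a_time(toy, {"dim1": 0.0, "dim2": 0.0}, grid)
    print(best)  # {'dim1': 0.7, 'dim2': 0.2}
```

Coordinate ascent can stall in a local optimum when dimensions interact strongly, so several rounds (and possibly several starting points) are worth the extra query executions.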


[Chart: Popular/Critical Set scores (y-axis 0 to 30) for the Out of box, As configured, -daat (tuned), -daat20000 (tuned) and -daat0/TAAT (tuned) configurations]


Fine Tuning Summary
• Tuning a large number of dimensions (Funnelback FineTune covers 38)
• Millions of query executions
• Achieves substantial gains


But why do queries still fail?

• Misspelled
  – Europian Conferense on information retreival
• Query words don't match the document's words
  – “door” or “MOPEM” v. “manually operated personnel egress mechanism”
• There is no answer to that question.
  – Maybe there should be
  – Scope issues.


Need more tools!


3. Spelling suggestion tools
• Suggestions may be useful even if words are correctly spelled:
  – Carlton furball club → Carlton football club
• Suggestions based on whole query, not word-by-word
• Don't suggest queries which make no sense in the collection being searched
• Autocompletion: Guide users to the best query
• Context is king
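The whole-query and "makes sense in the collection" principles can be illustrated with a small sketch. Here each word gets candidate terms from the collection vocabulary within an assumed edit distance of 3, and whole-query combinations are scored by how many documents contain all of their terms, then by fewest edits. This is an illustrative technique, not Funnelback's suggestion algorithm; the tiny document collection is hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(docs: Sequence[str]) -> Dict[str, Set[int]]:
    """Inverted index: term -> ids of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def hits(terms: Sequence[str], index: Dict[str, Set[int]]) -> int:
    """Number of documents containing every term."""
    if not terms:
        return 0
    return len(set.intersection(*(index.get(t, set()) for t in terms)))


def suggest(query: str, index: Dict[str, Set[int]], max_dist: int = 3, per_word: int = 5) -> Optional[str]:
    """Suggest a whole-query rewrite whose terms co-occur in more documents, or None."""
    words = query.lower().split()
    options: List[List[Tuple[int, str]]] = []
    for word in words:
        near = sorted((edit_distance(word, term), term) for term in index)
        options.append([p for p in near if p[0] <= max_dist][:per_word] or [(0, word)])
    # Score whole-query combinations, not words one at a time: most co-occurring
    # documents first, then the fewest total edits.
    best = max(
        product(*options),
        key=lambda combo: (hits([t for _, t in combo], index), -sum(d for d, _ in combo)),
    )
    terms = [t for _, t in best]
    if terms != words and hits(terms, index) > hits(words, index):
        return " ".join(terms)
    return None  # keep the user's query: no variant makes more sense in this collection


if __name__ == "__main__":
    docs = [
        "Carlton Football Club home page",
        "Carlton football club membership",
        "cat furball remedies",
    ]
    index = build_index(docs)
    print(suggest("carlton furball club", index))  # carlton football club
```

"furball" is a correctly spelled word that even appears in the collection, but "carlton furball club" matches no document, while "carlton football club" matches two.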


4. Query expansion tools
• Manual rules:
  – Rego → [registration rego]
  – MOPEM → [“manually operated personnel egress mechanism door”]
• Related queries (automatic)
  – Based on co-clicking
• Contextual navigation (on-the-fly)
  – Finding superphrases in a deep result set
• Faceting (semi-automatic)
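Two of these tools are simple enough to sketch. The `expand` function below applies manual rules by turning each query word into a group of alternatives, assumed to be OR-ed as in the bracket notation above. The `related_queries` function treats two queries as related when users clicked on overlapping URLs for them (Jaccard overlap); the click log is hypothetical, built from the URLs in the spreadsheet testfile.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalise(query: str) -> str:
    return " ".join(query.lower().split())


def expand(query: str, rules: Dict[str, List[str]]) -> List[List[str]]:
    """Manual rules: each query word becomes a group of alternatives to be OR-ed."""
    return [list(rules.get(word, [word])) for word in normalise(query).split()]


def related_queries(clicks: Iterable[Tuple[str, str]], query: str, top: int = 3) -> List[str]:
    """Rank other queries by Jaccard overlap between the sets of URLs clicked for them."""
    clicked: Dict[str, Set[str]] = defaultdict(set)
    for q, url in clicks:
        clicked[normalise(q)].add(url)
    target_key = normalise(query)
    target = clicked.get(target_key, set())
    scored = []
    for other, urls in clicked.items():
        if other == target_key or not target:
            continue
        overlap = len(target & urls) / len(target | urls)
        if overlap > 0:
            scored.append((-overlap, other))  # negate so sorting puts best first
    return [other for _, other in sorted(scored)[:top]]


if __name__ == "__main__":
    print(expand("Rego renewal", {"rego": ["registration", "rego"]}))
    # [['registration', 'rego'], ['renewal']]
    clicks = [
        ("jobs", "health.gov.au/health-career-vacant.htm"),
        ("vacancies", "health.gov.au/health-career-vacant.htm"),
        ("recruitment", "health.gov.au/health-career-vacant.htm"),
        ("tenders", "health.gov.au/list-of-tenders-and-grants.htm"),
    ]
    print(related_queries(clicks, "jobs"))  # ['recruitment', 'vacancies']
```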


5. Reporting and alerting tools
• Reporting on queries which:
  – Produced no results
  – Logged behaviour suggesting the searcher's need went unfulfilled
• Alerting when:
  – Submissions of a query (or group of related queries) sharply increase in frequency
• For:
  – Business intelligence
  – Triggering creation of, or changes to, content


Query Spike Alerting
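Spike alerting could be as simple as comparing the latest daily count for a query (or a group of related queries) against its recent history. The sketch below flags a query when today's count exceeds the trailing mean plus k population standard deviations and also passes a minimum count. The window, k and minimum are assumed values, and the counts are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def spike_alerts(daily_counts: Dict[str, List[int]], window: int = 7, k: float = 3.0, min_count: int = 10) -> List[str]:
    """Flag queries whose latest daily count sharply exceeds their recent history.

    daily_counts maps a query (or a group of related queries) to counts per day,
    oldest first; the last element is the day being checked.
    """
    alerts = []
    for query, counts in daily_counts.items():
        if len(counts) < window + 1:
            continue  # not enough history yet
        history, today = counts[-window - 1:-1], counts[-1]
        threshold = mean(history) + k * pstdev(history)
        if today >= min_count and today > threshold:
            alerts.append(query)
    return sorted(alerts)


if __name__ == "__main__":
    counts = {
        "bushfire relief": [2, 3, 1, 2, 2, 3, 2, 40],
        "aged care": [20, 22, 19, 21, 20, 23, 22, 24],
    }
    print(spike_alerts(counts))  # ['bushfire relief']
```

The minimum count stops a jump from 0 to 2 searches on a quiet query from raising an alert.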


Conclusions
• Search is important
• Organisations benefit when someone takes responsibility for effective search – the SearchMaster.
• Academic research into evaluation needs careful translation for use in enterprise search tuning.
• Further tools are needed to overcome poor queries and missing content.

Thanks to Mike Swanson of Oxfam Australia for the Ned Kelly line.
