Transcript
Page 1: Wolf Siberski

Wolf Siberski 1

Wolf Siberski

What do you mean? – Determining the Intent of Keyword Queries on Structured Data

Page 2: Wolf Siberski

Overview

■ Motivation
■ Approaches in keyword search on structured data
■ QUICK – Query Intent Construction for Keywords
  ■ User interaction
  ■ Algorithm
  ■ Evaluation
■ Conclusion

Page 3: Wolf Siberski

The Information Search Process

What is my search objective?
What exactly do I want to know?
How do I express my search request?
Which result satisfies my information need?

[Figure: information search process model, after Sutcliffe/Ennis: Towards a cognitive theory of information retrieval. Stages: Identify problem, Articulate needs, Query formulation, Execute query, Evaluate results. Further labels: Information problem, User goals, Need types, Concepts, Domain knowledge, Information system knowledge, Information System, Results, Unsatisfactory results, Failed search, Successful search.]

Page 4: Wolf Siberski

IMDB Example – Keyword search

[Figure: Sutcliffe/Ennis information search process model, as on page 3]

In which movies did they both act?

Brad Pitt Angelina Jolie

Have they been working together?

Brad Pitt Angelina Jolie
IMDb Brad Pitt Angelina Jolie

Page 5: Wolf Siberski

IMDB Example – Database search

[Figure: Sutcliffe/Ennis information search process model, as on page 3]

In which movies did they both act?

Brad Pitt Angelina Jolie

Are they working together, too?

SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId

Schema:
Movie: PK id, title, year
Actor: PK id, name
MovieCharacter: name, FK1 actsIn, FK2 actedBy

M.Title                       M.Year
101 Biggest Celebrity Oops    2004
Mr. & Mrs. Smith              2005
Stars on Trial                2005
The 72nd Academy Awards       2000
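
The join query on this slide can be tried on a small in-memory database. The following is a minimal sketch using Python's sqlite3 module. The table layout (Movie, Actor, ActsIn) is read off the query itself; the rows and the helper name `movies_with_both` are illustrative assumptions, not an extract of the IMDB data used in the talk.

```python
import sqlite3

def movies_with_both(conn, actor_a, actor_b):
    """Return (title, year) of movies in which both named actors appear."""
    sql = """
        SELECT M.Title, M.Year
        FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
        WHERE A1.Name = ? AND A2.Name = ?
          AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
          AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
        ORDER BY M.Year, M.Title
    """
    return conn.execute(sql, (actor_a, actor_b)).fetchall()

# Toy data (assumption): just enough rows to exercise the self-join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (Id INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
    CREATE TABLE Actor (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ActsIn (ActorId INTEGER, MovieId INTEGER);
    INSERT INTO Movie VALUES (1, 'Mr. & Mrs. Smith', 2005), (2, 'Fight Club', 1999);
    INSERT INTO Actor VALUES (1, 'Brad Pitt'), (2, 'Angelina Jolie');
    INSERT INTO ActsIn VALUES (1, 1), (2, 1), (1, 2);
""")

result = movies_with_both(conn, "Brad Pitt", "Angelina Jolie")
print(result)
```

Even for this simple information need the query joins five table instances, which is the usability gap the slide points at.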

Page 6: Wolf Siberski

Context

■ Trend: general information captured as structured data (DBpedia, LinkedData, etc.)

■ Limited support for complex information needs
  ■ Keywords: Limited expressivity, but user-friendly
  ■ Structured Queries: High expressivity, but difficult to master

New ways to access this data required

Page 7: Wolf Siberski

IR on Structured Data (Incomplete)

■ Not a new idea (Universal Relation, 1984)

1. Relevance Notion for structured data
  ■ Extract data subgraphs (tuple joins) matching the query
  ■ Rank results according to relevance score
  ■ BANKS, DISCOVER, SPARK, EASE, etc.
  ■ Can serve the 'head' of the user distribution, but not the long tail
  ■ Low quality of relevance judgements [Coffman/Weaver, CIKM10]

2. Form builder
  ■ Enable visual construction of user-defined query forms
  ■ Requires exploration of database schema

Page 8: Wolf Siberski

QUICK – Keyword Search on Databases

■ User starts with keyword search

■ QUICK guides user through query construction process

■ Combines
  ■ Ease-of-use of keyword search
  ■ Expressivity of database queries

G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier, 2009. http://dx.doi.org/10.1016/j.websem.2009.07.005

Page 9: Wolf Siberski

QUICK Search Process

[Figure: QUICK search process, alternating between User and QUICK]

1. User enters keywords: Brad Pitt Angelina Jolie
2. QUICK computes possible query intentions and selection options: Is “Brad” part of a movie title? Is “Brad” part of an actor name? …
3. User selects the intended interpretation: “Brad” is part of an actor name
4. QUICK refines the interpretation and computes new selection options
5. User selects the intended query: Find movies where both Brad Pitt and Angelina Jolie are actors
6. QUICK computes the results
7. User evaluates the results:

M.Title             M.Year
101 Biggest Ce…     2004
Mr. & Mrs. Smith    2005
Stars on Trial      2005

[Comment by Wolf Siberski: Goal of QUICK]

Page 10: Wolf Siberski

QUICK – Concepts

■ RDF Schema
■ Query Template
  ■ Query pattern on the schema
  ■ Contains only free variables
■ Semantic Query
  ■ Interpretation of a keyword query
  ■ Produced from query template by binding keywords

[Figure: example RDF schema with the classes Movie (title), MovieCharacter (name), and Actor (name), connected by actsIn (MovieCharacter to Movie) and actedBy (MovieCharacter to Actor). Example query templates: Actor.name; MovieCharacter.name with Actor.name via actedBy; Movie.title with Actor.name via actsIn and actedBy. Example semantic queries for the keywords "brad pitt": Actor.name = "brad pitt"; MovieCharacter.name = "brad" with Actor.name = "pitt"; Movie.title = "brad" with Actor.name = "pitt".]

[Comment by Wolf Siberski: name/name is a strange template]
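
The relationship between a query template and its semantic queries can be sketched in a few lines. This is a minimal illustration, not QUICK's data model: the class names `QueryTemplate` and `SemanticQuery`, the function `bind`, and the rule that every attribute slot must receive at least one keyword are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class QueryTemplate:
    """A query pattern on the schema; every attribute slot is a free variable."""
    slots: tuple  # (class, attribute) pairs, e.g. ("Actor", "name")

@dataclass(frozen=True)
class SemanticQuery:
    """One interpretation of a keyword query: a template plus keyword bindings."""
    template: QueryTemplate
    bindings: tuple  # bindings[i] holds the keywords bound to slots[i]

def bind(template, keywords):
    """Enumerate semantic queries by assigning each keyword to one slot.

    Assumption: a binding is only valid if no slot is left without a keyword.
    """
    n_slots = len(template.slots)
    queries = []
    for assignment in product(range(n_slots), repeat=len(keywords)):
        if set(assignment) != set(range(n_slots)):
            continue  # some slot received no keyword
        bindings = tuple(
            tuple(k for k, slot in zip(keywords, assignment) if slot == i)
            for i in range(n_slots)
        )
        queries.append(SemanticQuery(template, bindings))
    return queries

actor_only = QueryTemplate(slots=(("Actor", "name"),))
movie_actor = QueryTemplate(slots=(("Movie", "title"), ("Actor", "name")))

for query in bind(actor_only, ["brad", "pitt"]) + bind(movie_actor, ["brad", "pitt"]):
    print(query.template.slots, query.bindings)
```

Under these assumptions the single-slot template yields one interpretation (both keywords in the actor name), while the two-slot template yields two, one per way of distributing the keywords.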
Page 11: Wolf Siberski

Query Guide

■ Query Hierarchy
  ■ Semantic queries ordered by sub-query relationship
■ Query Guide
  ■ Graph including paths to all possible queries

[Figure: query guide for the keywords "brad pitt". Partial queries such as Movie.title = "brad", MovieCharacter.name = "brad", Actor.name = "brad", Movie.title = "pitt", MovieCharacter.name = "pitt", and Actor.name = "pitt" lead step by step to complete semantic queries such as Actor.name = "brad pitt", MovieCharacter.name = "brad" with Actor.name = "pitt", and Movie.title = "brad" with Actor.name = "pitt".]


Page 12: Wolf Siberski

QUICK Example: Construction Options

Page 13: Wolf Siberski

QUICK Example: Query List

Page 14: Wolf Siberski

QUICK Example: Results

Page 15: Wolf Siberski

Query Guide Construction – Offline Stage

■ Generate all Query Templates
  ■ Start with one-variable queries
  ■ Produce all possible combinations
  ■ Repeat until max. join path length reached
■ Build Inverted Index
  ■ Terms -> Attributes
  ■ Enables fast keyword-query mapping at runtime
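
Both offline steps can be illustrated with a toy schema. The sketch below is a simplification under stated assumptions: templates are modelled as simple join paths (sequences of classes), whereas real templates may be more general graph patterns; the schema edges, the sample values, and the function names `generate_templates` and `build_inverted_index` are hypothetical.

```python
from collections import defaultdict

# Toy schema graph (assumption): which classes can be joined directly.
SCHEMA_EDGES = {
    "Movie": ["MovieCharacter"],
    "MovieCharacter": ["Movie", "Actor"],
    "Actor": ["MovieCharacter"],
}

def generate_templates(max_joins):
    """Start with one-class templates and extend them one join at a time."""
    frontier = [(cls,) for cls in SCHEMA_EDGES]
    templates = set(frontier)
    for _ in range(max_joins):  # stop at the maximum join path length
        frontier = [t + (nxt,) for t in frontier for nxt in SCHEMA_EDGES[t[-1]]]
        # A path and its reverse describe the same pattern; store one canonical form.
        templates.update(min(t, t[::-1]) for t in frontier)
    return templates

def build_inverted_index(rows):
    """Map each lower-cased term to the attributes whose values contain it."""
    index = defaultdict(set)
    for attribute, value in rows:
        for term in value.lower().split():
            index[term].add(attribute)
    return index

# Toy attribute values (assumption).
ROWS = [
    ("Actor.name", "Brad Pitt"),
    ("Actor.name", "Angelina Jolie"),
    ("Movie.title", "Mr. & Mrs. Smith"),
    ("MovieCharacter.name", "John Smith"),
]

templates = generate_templates(max_joins=1)
index = build_inverted_index(ROWS)
print(len(templates), sorted(index["smith"]))
```

At runtime, looking up each keyword in such an index tells which attributes it can be bound to, so only templates containing those attributes need to be considered.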

Page 16: Wolf Siberski

Query Guide Construction – Online Stage

■ Identify possible queries (leaves of query guide)
  ■ Extract partial query graph from template graph
■ Problem: query space can be very large

Find minimal query guide

■ Cost function: # of steps + # of inspected suggestions
■ Minimal guide: smallest maximum cost
■ Depth/width tradeoff:

[Figure: a guide that is too flat vs. a guide that is too deep. Optimum: ln(n) split]
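
The depth/width tradeoff can be made concrete with a small calculation. The sketch below assumes a perfectly balanced guide in which every step offers the same number of suggestions and the user inspects all of them; this is a simplified reading of the cost function above, not the exact model from the talk, and the function names are hypothetical.

```python
def guide_depth(n_queries, branching):
    """Smallest depth d with branching**d >= n_queries (balanced guide)."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    depth, reachable = 0, 1
    while reachable < n_queries:
        reachable *= branching
        depth += 1
    return depth

def worst_case_cost(n_queries, branching):
    """# of steps + # of inspected suggestions, reading every option at each step."""
    depth = guide_depth(n_queries, branching)
    return depth + depth * branching

# Compare narrow/deep guides with wide/flat ones for 1000 candidate queries.
for b in (2, 4, 32, 1000):
    print(b, worst_case_cost(1000, b))
```

With these assumptions a completely flat guide forces the user to scan every query, a binary guide needs many steps, and a moderate branching factor costs least.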

Page 17: Wolf Siberski

Greedy Query Guide Construction

■ Finding Minimal Guide: NP-Hard

■ Use approach similar to set cover approximation

■ Determine nodes (= refinement options) top-down
■ Greedily select node leading to the lowest cost
  – Cost estimation: minimally incurred cost

■ Repeat until all nodes are covered
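
For reference, the classic greedy set cover approximation that the slide alludes to looks as follows. Note the swap: this sketch selects by coverage (most still-uncovered items), whereas the slide describes selecting by lowest estimated cost, so it shows the underlying scheme rather than QUICK's actual selection rule. The option labels and query ids are made up for the example.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly pick the subset that covers the most still-uncovered items."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name, members = max(subsets.items(), key=lambda kv: len(kv[1] & uncovered))
        if not members & uncovered:
            raise ValueError("remaining items cannot be covered")
        chosen.append(name)
        uncovered -= members
    return chosen

# Hypothetical refinement options and the semantic queries each one leads to.
OPTIONS = {
    "brad in Actor.name": {"q1", "q2", "q3"},
    "brad in Movie.title": {"q4"},
    "pitt in Actor.name": {"q4", "q5"},
    "pitt in Movie.title": {"q2"},
}

selected = greedy_set_cover({"q1", "q2", "q3", "q4", "q5"}, OPTIONS)
print(selected)
```

Two options suffice here because together they reach all five candidate queries; the loop stops as soon as nothing is left uncovered.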

Page 18: Wolf Siberski

Evaluation – Experiment Settings

■ IMDB database
  ■ Semantic Web representation
■ Queries from AOL query log
  ■ Selection criteria
    – Movie-related
    – 2-5 keywords
    – Refers to at least 2 entities
  ■ Manual assessment of query intention
■ Search process
  ■ Manual input of keywords
  ■ Selection of correct option according to query intention

Page 19: Wolf Siberski

Evaluation – Guide Quality

■ Intended construction option usually among top 3
■ Usually 3-5 clicks needed to construct query
■ Effective also for large query spaces

Page 20: Wolf Siberski

Conclusion

■ Query construction with QUICK
  ■ Highly effective construction process
  ■ All intentions can be constructed
  ■ No query language or schema knowledge required
■ Further directions
  ■ Combine with relevance heuristics (IQP)
  ■ More flexible user interaction
    – Use facets for keyword bindings
    – Better multi term support
  ■ Optimized query guide generation
    – Exploit entity notion (QUnits)
    – Progressive query guide creation
  ■ Connect to QbE/Query Form Creation

Page 21: Wolf Siberski

Evaluation – Performance

No. of terms    Initialization time (ms)    Response time (ms)
2               98                          2
3               993                         19
4               16,797                      1,035
>4              31,838                      3,290
All             3,659                       314

■ Initialization takes too much time for long queries
  ■ RDF store as bottleneck (creation of query hierarchy)
■ After initialization, response time is ok

Page 22: Wolf Siberski

Optimizations

■ Identification of semantic queries
  ■ Index template subsets by attribute to enable fast filtering of queries without results
  ■ Enable fast disjunction of template subsets (e.g., 'and' on bitsets)
■ QCG generation
  ■ Parallel subquery computation
  ■ Caching of frequent subqueries
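
The bitset idea can be sketched with plain Python integers. This is one possible reading of the optimization, assuming each attribute keeps a bitset of the templates that use it and that combining attributes means a bitwise AND, as the parenthetical suggests; the template contents and function names are hypothetical.

```python
def build_attribute_bitsets(templates):
    """For each attribute, set bit i if template i uses that attribute."""
    bitsets = {}
    for i, attributes in enumerate(templates):
        for attribute in attributes:
            bitsets[attribute] = bitsets.get(attribute, 0) | (1 << i)
    return bitsets

def templates_using_all(bitsets, attributes, n_templates):
    """AND the per-attribute bitsets and decode the surviving template ids."""
    mask = (1 << n_templates) - 1  # start with every template as a candidate
    for attribute in attributes:
        mask &= bitsets.get(attribute, 0)  # unknown attribute: no template survives
    return [i for i in range(n_templates) if (mask >> i) & 1]

# Toy templates (assumption), each given as the set of attributes it contains.
TEMPLATES = [
    {"Actor.name"},
    {"Movie.title"},
    {"Movie.title", "Actor.name"},
    {"MovieCharacter.name", "Actor.name"},
]

bitsets = build_attribute_bitsets(TEMPLATES)
candidates = templates_using_all(bitsets, ["Movie.title", "Actor.name"], len(TEMPLATES))
print(candidates)
```

A single integer AND per attribute discards all templates that cannot produce a query for the given keyword bindings, before any query is sent to the RDF store.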

Page 23: Wolf Siberski

Misc Ideas

■ Use Google's KDD annotated Named Entity Recognition test set (Piggyback, http://sites.google.com/site/massiciara/)

Page 24: Wolf Siberski

Cross Connections

■ Thomas Gottron: Traditional features (e.g. TF) not useful for very short text
■ Hinrich Schütze: entity related queries often ambiguous
■ Michael Granitzer: cycle of refinement/exploration
■ Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster