Inferring Searcher Intent


Transcript of Inferring Searcher Intent

Eugene Agichtein, Emory University 11 July 2010

AAAI 2010 Tutorial: Inferring Searcher Intent 1


Inferring Searcher Intent

Eugene Agichtein

Emory University

Tutorial Website (for expanded and updated bibliography):

http://ir.mathcs.emory.edu/intent_tutorial/

Instructor contact information:

Email: [email protected]

Web: http://www.mathcs.emory.edu/~eugene/


Tutorial Overview

• Part 1: Search Intent Modeling
– Motivation: how intent inference could help search

– Search intent & information seeking behavior in traditional IR

– Searcher models: from eye tracking to clickthrough mining

• Part 2: Inferring Web Searcher Intent
– Inferring result relevance: clicks

– Richer interaction models: clicks + browsing

• Part 3: Applications and Extensions
– Implicit feedback for ranking

– Contextualized prediction: session modeling

– Personalization, query suggestion, active learning


About the Instructor

• Eugene Agichtein (Ah-ghi-sh-tein)
http://www.mathcs.emory.edu/~eugene/

• Research: Information retrieval and data mining
– Mining search behavior and interactions in web search
– Text mining, information extraction, and question answering

• Relevant experience:
– 2006 – present: Assistant Professor, Emory University
– Summer '07: Visiting Researcher, Yahoo! Research
– 2004-06: Postdoc, Microsoft Research
– 1998-2004: PhD student, Columbia (Databases/IR)


Outline: Search Intent and Behavior

✓ Motivation: how intent inference could help search

• Search intent and information seeking behavior
– Classical models of information seeking

• Web searcher intent

• Web searcher behavior
– Levels of modeling: micro-, meso-, and macro- levels

– Variations in web searcher behavior

– Click models


Some Key Challenges for Web Search

• Query interpretation (infer intent)

• Ranking (high dimensionality)

• Evaluation (system improvement)

• Result presentation (information visualization)


Example: Task-Goal-Search Model


car safety ratings consumer reports


Information Retrieval Process Overview


[Diagram: the information retrieval process. Source Selection → Query Formulation (query: "car safety ratings") → Search → Search Engine Result Page (SERP) → Selection from Ranked List → Examination of Documents → Delivery of Documents. Feedback loops: query reformulation, vocabulary learning, relevance feedback, source reselection.]

Credit: Jimmy Lin, Doug Oard, …


Explicit Intentions in Query Logs

• Match known goals (from ConceptNet) to query logs


Strohmaier et al., K-Cap 2009


Unfortunately, most queries are not so explicit…


Outline: Search Intent and Behavior

✓ Motivation: how intent inference could help search

✓ Search intent and information seeking behavior
✓ Classical models of information seeking

• Web Searcher Intent
– Broder

– Rose

– More recent?

• Web Searcher Behavior
– Levels of modeling: micro-, meso-, and macro- levels

– Variations in web searcher behavior

– Click models

• Challenges and open questions


Information Seeking Funnel

• Wandering: the user does not have an information-seeking goal in mind. May have a meta-goal (e.g., "find a topic for my final paper.")

• Exploring: the user has a general goal (e.g. “learn about the history of communication technology”) but not a plan for how to achieve it.

• Seeking: the user has started to identify information needs that must be satisfied (e.g. “find out about the role of the telegraph in communication.”), but the needs are open-ended.

• Asking: the user has a very specific information need that corresponds to a closed-class question (“when was the telegraph invented?”).


D. Rose, 2008


Models of Information Seeking

• "Information-seeking … includes recognizing … the information problem, establishing a plan of search, conducting the search, evaluating the results, and … iterating through the process." – Marchionini, 1989

– Query formulation

– Action (query)

– Review results

– Refine query


Adapted from: M. Hearst, SUI, 2009


Reviewing Results: Relevance Clues

• What makes information or information objects relevant? What do people look for in order to infer relevance?

– Topicality (subject relevance)

– Extrinsic (task-, goal- specific)

• Information Science "clues research":
– uncover and classify attributes or criteria used for making relevance inferences


Information Scent for Navigation

• Examine clues where to find useful information


Search result listings must provide the user with clues about which results to click.


Dynamic “Berry Picking” Model

• Information needs change during interactions


[Bates, 1989] M.J. Bates. The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5):407–431, 1989.


Information Foraging Theory

• Goal: maximize rate of information gain. Patches of information → websites
• Basic problem: should I continue in the current patch or look for another patch?
• Expected gain from continuing in the current patch determines how long to continue searching in that patch

Pirolli and Card, CHI 1995


Diminishing returns: 80% of users scan only first 3 pages of search results

– Charnov's Marginal Value Theorem
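The patch-leaving rule can be sketched numerically. A minimal illustration of Charnov's marginal value theorem, assuming a diminishing-returns gain curve g(t) = G(1 - e^(-rt)); the curve shape and all parameter values are invented for this sketch, not from the tutorial:

```python
import math

def optimal_leave_time(total_gain, rate, travel_time, dt=0.001, horizon=50.0):
    """Find when to leave a patch under the marginal value theorem,
    for the gain curve g(t) = total_gain * (1 - exp(-rate * t)).

    Leave when the instantaneous gain rate g'(t) drops below the
    overall rate g(t) / (travel_time + t)."""
    t = dt
    while t < horizon:
        gain = total_gain * (1 - math.exp(-rate * t))
        marginal = total_gain * rate * math.exp(-rate * t)  # g'(t)
        average = gain / (travel_time + t)
        if marginal <= average:
            return t
        t += dt
    return horizon
```

As expected under the theorem, a longer travel time between patches (e.g., websites that are harder to reach) makes it optimal to stay longer in the current patch.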


Hotel Search


Goal: Find the cheapest 4-star hotel in Paris.
Step 1: pick a hotel search site
Step 2: scan the list
Step 3: go to Step 1


Example: Hotel Search (cont’d)


Orienteering vs. Teleporting

• Orienteering:

– Searcher issues a quick, imprecise query to get to approximately the right region of the information space

– Searchers follow known paths that require small steps that move them closer to their goal

– Easy (does not require generating a "perfect" query)

• Teleporting:

– Issue (longer) query to jump directly to the target

– Expert searchers issue longer queries

– Requires more effort and experience.

– Until recently, was the dominant IR model


Teevan et al., CHI 2004


Serendipity


Andre et al., CHI 2009


Summary of Models

• Static, berry-picking, information foraging, orienteering, serendipity

• Classical IR Systems research mainly uses the simplest form of relevance (topicality)

• Open questions:

– How people recognize other kinds of relevance

– How to incorporate other forms of relevance (e.g., user goals/needs/tasks) into IR systems


Part 1: Search Intent and Behavior

✓ Motivation: how intent inference could help search

✓ Search intent and information seeking behavior
✓ Classical models of information seeking

• Web Searcher Intent
– Broder

– Rose

– More recent?

• Web Searcher Behavior
– Levels of modeling: micro-, meso-, and macro- levels

– Variations in web searcher behavior

– Click models


Intent Classes (top level only)

User intent taxonomy (Broder 2002)

– Informational – want to learn about something (~40% / 65%)

– Navigational – want to go to that page (~25% / 15%)

– Transactional – want to do something (web-mediated) (~35% / 20%)

• Access a service
• Downloads

• Shop

– Gray areas

• Find a good hub

• Exploratory search “see what’s there”


Example queries:

History nonya food

Singapore Airlines

Jakarta weather

Kalimantan satellite images

Nikon Finepix

Car rental Kuala Lumpur

[from SIGIR 2008 Tutorial, Baeza-Yates and Jones]
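For illustration, the top-level classes above can be approximated with crude surface heuristics. A toy sketch: the keyword lists and rules below are invented assumptions, not from the tutorial; real systems learn such classifiers from click and session data:

```python
def classify_intent(query):
    """Toy heuristic for Broder's (2002) top-level intent classes.
    Keyword lists and rules are illustrative assumptions only."""
    words = query.lower().split()
    # transactional: the user wants to do something (buy, download, ...)
    if any(w in {"buy", "download", "order", "rental", "booking"} for w in words):
        return "transactional"
    # navigational: the query names a site or URL directly
    if any(w.startswith("www.") or w.endswith(".com") for w in words):
        return "navigational"
    # default: informational
    return "informational"
```

Note how the gray areas in the taxonomy show up immediately: a brand query like [Singapore Airlines] carries no surface cue, which is one reason intent inference needs behavioral evidence rather than the query string alone.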



Extended User Goal Taxonomy


Rose et al., 2004


Complex Search

A complex search task refers to cases where:

• searcher needs to conduct multiple searches to locate the information sources needed,

• completing the search task spans over multiple sessions (task is interrupted by other things),

• searcher needs to consult multiple sources of information (all the information is not available from one source, e.g., a book, a webpage, a friend),

• requires a combination of exploration and more directed information finding activities,

• often requires note-taking (cannot hold all the information that is needed to satisfy the final goal in memory), and

• specificity of the goal tends to vary during the search process (often starts with exploration).


Aula and Russell, 2008


Complex Search (Cont’d)


Aula and Russell, 2008


Web Search Queries

• Cultural and educational diversity

• Short queries and impatient interaction

– Few queries posed and few answers seen (first page)

– Reformulation common

• Smaller and different vocabulary

– Not “expert” searchers!

– “Which box do I type in?”


[from SIGIR 2008 Tutorial, Baeza-Yates and Jones]

Intent Distribution by Topic


Query Distribution by Demographics

• Education:

• Ethnicity:

• Gender:


[Weber & Castillo, SIGIR 2010]


Query Demographics 2: Demo

Keywords have demographic signatures

– Microsoft adCenter Demographics Prediction:

http://adlab.msn.com/DPUI/DPUI.aspx

adCenter [posters]

Quantcast


Domain-Specific Intents: Named Entities

• Named Entities (Persons, Orgs, Places) are often searched
– "Brittany Spears"?
• Popular phrases by entity type:


[Yin & Shah, WWW 2010]


Example Intent Taxonomy for Musicians

• Musicians (most popular phrases)


[Yin & Shah, WWW 2010]


Analyzing Searches: Funneling

• What is the intent of customers that type such queries?

• Hint: What they searched before/after?

– Search Funnels: http://adlab.msn.com/searchfunnel/

– How can you catch customers earlier?
– What do customers do when they leave?


Part 1: Search Intent and Behavior

✓ Motivation: how intent inference could help search

✓ Search intent and information seeking behavior
✓ Classical models of information seeking

✓ Web Searcher Intent
✓ Broder

✓ Rose

✓ Demographics

• Web Searcher Behavior
– Levels of modeling: micro-, meso-, and macro- levels

– Variations in web searcher behavior

– Click models


Web Searcher Behavior

• Meso-level: query, intent, and session characteristics
• Micro-level: how searchers interact with result pages
• Macro-level: patterns, trends, and interests


Levels of Understanding Searcher Behavior

• Micro (eye tracking): lowest level of detail, milliseconds

• Meso (field studies): mid-level, minutes to days

• Macro (session analysis): millions of observations, days to months


[Daniel M. Russell, 2007]


Search Behavior: Scales


from: Pirolli, 2008


Information Retrieval Process (User view)


[Diagram: the information retrieval process from the user's view. Source Selection → Query Formulation → Search → Selection from Ranked List → Examination of Documents → Delivery of Documents. Feedback loops: query reformulation, vocabulary learning, relevance feedback, source reselection.]


People Look at Only a Few Results

(Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)


Snippet Views Depend on Rank

Mean: 3.07 Median: 2.00

[Daniel M. Russell, 2007]


Snippet Views and Clicks Depend on Rank

[from Joachims et al., SIGIR 2005]


“Eyes are a Window to the Soul”

• Eye tracking gives information about search interests:
– Eye position
– Pupil diameter
– Saccades and fixations


[Figure: eye-tracking camera setup; gaze patterns during reading vs. visual search]


Micro-level: Examining Results

• Users rapidly scan the search result page

• What they see in lower summaries may influence judgment of a higher result

• Spend most time scrutinizing top results 1 and 2

– Trust the ranking


[Daniel M. Russell, 2007]


POM: Partially Observable Model


Wang et al., WSDM 2010


Result Examination (cont’d)

• Searchers might use the mouse to focus reading attention, bookmark promising results, or not at all.
• Behavior varies with task difficulty and user expertise


[K. Rodden, X. Fu, A. Aula, and I. Spiro, Eye-mouse coordination patterns on web search results pages, Extended Abstracts of ACM CHI 2008]


Result Examination (cont'd): Predicting Eye-Mouse Coordination


Guo & Agichtein, CHI 2010

[Figure: eye-mouse Euclidean distance over time for three example searches, with the prediction threshold marked on each]

Actual vs. predicted eye-mouse coordination patterns: No Coordination (30%), Bookmarking (25%), Eye follows mouse (25%)
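The distance-thresholding view of eye-mouse coordination can be sketched directly: compute the Euclidean distance between gaze and cursor at each time step and flag the steps where the two track each other. The threshold value and input layout here are illustrative assumptions, not parameters from the cited paper:

```python
import math

def coordination_segments(eye_path, mouse_path, threshold=150.0):
    """Label each time step as coordinated (eye-mouse distance at or
    below a pixel threshold) or not. `eye_path` and `mouse_path` are
    parallel lists of (x, y) screen coordinates; both the threshold
    and this input format are assumptions for the sketch."""
    labels = []
    for (ex, ey), (mx, my) in zip(eye_path, mouse_path):
        dist = math.hypot(ex - mx, ey - my)
        labels.append(dist <= threshold)
    return labels
```

Long runs of coordinated steps would correspond to the "eye follows mouse" pattern; a mouse parked on one result while the gaze roams would show up as uncoordinated steps (the "bookmarking" pattern).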


Macro-Level (Session) Analysis

• Can examine theoretical user models in light of empirical data:
– Orienteering?

– Foraging?

– Multi-tasking?

• Search is often a multi-step process:
– Find or navigate to a good site ("orienteering")

– Browse for the answer there: [actor most oscars] vs. [oscars]

• Teleporting – “I wouldn’t use Google for this, I would just go to…”

• Triangulation
– Draw information from multiple sources and interpolate

– Example: “how long can you last without food?”


Users (sometimes) Multi-task


[Daniel M. Russell, 2007]


Kinds of Search+Browsing Behavior


[Daniel M. Russell, 2007]


Parallel Browsing Behavior [Huang & White, HT 2010]

• 57% of all tabbed sessions are parallel browsing

• Can mean multi-tasking

• Common scenario: “branching” in exploring search results


Search Engine Switching Behavior


White et al., CIKM 2009

• 4% of all search sessions contained a switching event

• Switching events:

– 58.6 million switching events in 6-month period

• 1.4% of all Google / Yahoo! / Live queries followed by switch

– 12.6% of all switching events involved same query

– Two-thirds of switching events from browser search box

• Users:

– 72.6% of users used multiple engines in 6-month period

– 50% of users switched search engine within a session


Overview of Search Engine Switching

• Switching is more frequent in longer sessions

White et al., CIKM 2009


Overview of Switching - Survey

• 70.5% of survey respondents reported having switched
– Remarkably similar to the 72.6% observed in logs

• Those who did not switch:
– Were satisfied with current engine (57.8%)

– Believed no other engine would perform better (24.0%)

– Felt that it was too much effort to switch (6.8%)

– Other reasons included brand loyalty, trust, privacy

• Within-session switching:
– 24.4% of switching users did so "Often" or "Always"

– 66.8% of switching users did so “Sometimes”

White et al., CIKM 2009


Reasons for Engine Switching

• Three types of reasons:
– Dissatisfaction with original engine

– Desire to verify or find additional information

– User preference

Other reasons included:

- Loyalty to dest. engine

- Multi-engine apps.

- Hope (!)

White et al., CIKM 2009


Pre-switch Behavior

• Most common are queries and non-SERP clicks

• This is the action immediately before the switch

• What about pre-switch activity across the session?

White et al., CIKM 2009


Pre-switch Behavior (Survey)

"Is there anything about your search behavior immediately preceding a switch that may indicate to an observer that you are about to switch engines?"

• Common answers:

– Try several small query changes in pretty quick succession

– Go to more than the first page of results, again often in quick succession and often without clicks

– Go back and forth from SERP to individual results, without spending much time on any

– Click on lots of links, then switch engine for additional info

– Do not immediately click on something

White et al., CIKM 2009


Post-switch Behavior

• Extending the analysis beyond next action:

– 20% of switches eventually lead to return to origin engine

– 6% of switches eventually lead to use of third engine

• > 50% led to a result click. Are users satisfied?

White et al., CIKM 2009


Post-Switch Satisfaction

• Measures of user effort / activity (# Queries, # Actions)

• Measure of the quality of the interaction

– % queries with No Clicks, # Actions to SAT (>30sec dwell)

• Users issue more queries/actions; seem less satisfied (higher %NoClicks and more actions to SAT)

• Switching queries may be challenging for search engines

Activity        # Queries              # Actions
                Origin   Destination   Origin   Destination
All Queries     3.14     3.70          9.85     11.62
Same Queries    3.08     3.73          9.03     10.25

Success         % NoClicks             # Actions to Sat
Action          Origin   Destination   Origin   Destination
All Queries     49.7     52.7          3.81     4.71
Same Queries    54.5     59.7          3.67     4.61

White et al., CIKM 2009


Search Behavior: Expertise

• Some people are more expert at searching than others

– Search expertise, not domain expertise

– Alternative explanation: Orienteering vs. Teleporting

• Find characteristics of these “advanced search engine users” in an effort to better understand how these users search

• Understanding what advanced searchers are doing could improve the search experience for everyone


[White & Morris, WWW 2007]


Findings – Post-query browsing

Advanced users:
– Traverse trails faster
– Spend less time viewing each Web page
– Follow query trails with fewer steps
– Revisit pages less often
– "Branch" less often

Feature         p_advanced: (Non-advanced → More advanced → Advanced)
                0%       > 0%     ≥ 25%    ≥ 50%    ≥ 75%
Session Secs    701.10   706.21   792.65   903.01   1114.71
Trail Secs      205.39   159.56   156.45   147.91   136.79
Display Secs    36.95    32.94    34.91    33.11    30.67
Num. Steps      4.88     4.72     4.40     4.40     4.39
Num. Revisits   1.20     1.02     1.03     1.03     1.02
Num. Branches   1.55     1.51     1.50     1.47     1.44
%Trails         72.14%   27.86%   .83%     .23%     .05%
%Users          79.90%   20.10%   .79%     .18%     .04%

[White & Morris, 2007]


Search Behavior: Demographics

• Gender differences:

– Query “wagner”

• Women: http://en.wikipedia.org/wiki/Richard_Wagner

• Men: http://www.wagnerspraytech.com/

• Education differences:


[Weber & Castillo, SIGIR 2010]


Re-Finding Behavior

• 40% of the queries led to a click on a result that the same user had clicked on in a past search session.
– Teevan et al., 2007
• What's the URL for this year's SIGIR 2010?
– Does not really matter; it is faster to re-find it


[From Teevan et al, 2007]


What Is Known About Re-Finding

• Re-finding recent topic of interest

• Web re-visitation common [Tauscher & Greenberg]

• People follow known paths for re-finding

– Search engines likely to be used for re-finding

• Query log analysis of re-finding

– Query sessions [Jones & Fain]

– Temporal aspects [Sanderson & Dumais]


[From Teevan et al, 2007]


[Figure: decision tree of repeat search behavior. Branch questions: same query issued before? new query? click on previously clicked results? click on different results? click same and different? Node counts: 7503 (57%), 3100 (24%), 660 (5%), 637 (5%), 635 (5%), 485 (4%), 36 (<1%), 4 (<1%); additional labels: "1 click", "> 1 click", "39%", "Navigational", "Re-finding with different query"]

[From Teevan et al, 2007]



Rank Change Degrades Re-Finding

• Results change rank

• Change in result rank reduces probability of re-click

– No rank change: 88% chance

– Rank change: 53% chance

• Rank change → slower repeat click

– Compared with initial search to click

– No rank change: Re-click is faster

– Rank change: Re-click is slower

[From Teevan et al, 2007]


Aside: Mobile Search…

• Not topic of today’s tutorial

• Some references:

– M. Jones, Mobile Search Tutorial, Mobile HCI, 2009

– K. Church, B. Smyth, K. Bradley, and P. Cotter. A large scale study of European mobile search behaviour. Mobile HCI, 2008

– Kamvar, M., Kellar, M., Patel, R., and Xu, Y. Computers and iphones and mobile phones, oh my!: a logs-based comparison of search users on different devices. WWW 2009

– Kamvar, M. and Baluja, S. 2008. Query suggestions for mobile search: understanding usage patterns, CHI 2008


Part 1: Summary

✓ Understanding user behavior at micro-, meso-, and macro- levels

✓ Theoretical models of information seeking

✓ Web search behavior:

✓ Levels of detail

✓ Search Intent

✓ Variations in web searcher behavior

✓ Keeping found things found


Part 2: Inferring Web Searcher Intent


Tutorial Overview

✓ Part 1: Search Intent Modeling

✓ Motivation: how intent inference could help search

✓ Web search intent & information seeking models

✓ Web searcher behavior models

✓ Part 2: Inferring Web Searcher Intent

– Inferring result relevance: clicks

– Richer interaction models: clicks + browsing

– Contextualizing intent models: personalization


Part 2: Inferring Searcher Intent

• Inferring result relevance: clicks

• Richer behavior models:

– SERP presentation info

– Post-search behavior

– Rich interaction models for SERPs

• Contextualizing intent inference:

– Session-level models

– Personalization


Implicit Feedback

• Users often reluctant to provide relevance judgments

– Some searches are precision-oriented (no “more like this”)

– They’re lazy or annoyed:

– “Was this document helpful?”

• Can we gather relevance feedback without requiring the user to do anything?

• Goal: estimate relevance from behavior



Observable Behavior

Behavior        Minimum Scope
Category        Segment               Object                             Class
Examine         View, Listen          Select (click)
Retain          Print                 Bookmark, Save, Purchase, Delete   Subscribe
Reference       Copy / paste, Quote   Forward, Reply, Link, Cite
Annotate        Mark up               Rate, Publish                      Organize


Clicks as Relevance Feedback

• Limitations:

– Hard to determine the meaning of a click. If the best

result is not displayed, users will click on something

– Presentation bias

– Click duration may be misleading

• People leave machines unattended

• Opening multiple tabs quickly, then reading them all slowly

• Multitasking

• Compare above to limitations of explicit feedback:

– Sparse, inconsistent ratings


“Strawman” Click model: No Bias

• Naive Baseline

– c_di is P( Click=True | Document=d, Position=i )

– r_d is P( Click=True | Document=d )

• Why this baseline?

– We know that r_d is part of the explanation

– Perhaps, for ranks 9 vs. 10, it is the main explanation

– But it is a bad explanation at rank 1 (e.g., from eye tracking)

Attractiveness of summary ~= Relevance of result

[Craswell et al., 2008]
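As a sketch of this naive baseline (the toy log and names are hypothetical), both quantities can be estimated as raw click rates from impression logs, ignoring any position bias:

```python
from collections import defaultdict

def estimate_click_rates(log):
    """Naive baseline: estimate c_di = P(click | doc d, position i) and
    r_d = P(click | doc d) as raw click rates over impressions."""
    shown_di, clicked_di = defaultdict(int), defaultdict(int)
    shown_d, clicked_d = defaultdict(int), defaultdict(int)
    for ranking, clicks in log:                 # one search impression
        for i, d in enumerate(ranking, start=1):
            shown_di[(d, i)] += 1
            shown_d[d] += 1
            if d in clicks:
                clicked_di[(d, i)] += 1
                clicked_d[d] += 1
    c = {k: clicked_di[k] / shown_di[k] for k in shown_di}
    r = {d: clicked_d[d] / shown_d[d] for d in shown_d}
    return c, r

# toy log: (ranking, set of clicked docs)
log = [(["A", "B"], {"A"}), (["A", "B"], set()), (["B", "A"], {"A"})]
c, r = estimate_click_rates(log)   # r["A"] = 2/3; c[("A", 1)] = 0.5
```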


Realistic Click models

• Clickthrough and subsequent browsing behavior of

individual users influenced by many factors

– Relevance of a result to a query

– Visual appearance and layout

– Result presentation order

– Context, history, etc.


De-biasing position (first attempt)

Relative clickthrough for queries with known relevant results

in position 3 (results in positions 1 and 2 are not relevant)

[Figure: relative click frequency at result positions 1, 2, 3, 5, and 10, for all queries, PTR=1, and PTR=3.]

Higher clickthrough at the top non-relevant result than at the top relevant document.

[Agichtein et al., 2006]


Simple Model: Deviation from Expected

• Relevance component: deviation from "expected" click behavior:

Relevance(q, d) = observed click frequency − expected click frequency at position p

[Figure: click-frequency deviation from expected at result positions 1, 2, 3, 5, and 10, for PTR=1 and PTR=3. The deviation is strongly positive at the known relevant position and near zero or negative elsewhere.]

[Agichtein et al., 2006]
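The deviation computation can be sketched directly from the formula above; the background position prior and the observed frequencies below are illustrative, not from the paper:

```python
def click_deviation(observed, expected_by_pos, positions):
    """Relevance estimate: observed click frequency for a query-result
    pair minus the 'expected' (background) click frequency at the
    position where the result was shown."""
    return {d: observed[d] - expected_by_pos[positions[d]] for d in observed}

# hypothetical background click distribution by rank (position prior)
expected = {1: 0.45, 2: 0.20, 3: 0.12}
observed = {"d1": 0.30, "d2": 0.22, "d3": 0.26}   # for one query
positions = {"d1": 1, "d2": 2, "d3": 3}
dev = click_deviation(observed, expected, positions)
# d3 deviates most positively despite fewer raw clicks than d1
```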


Simple Model: Example

• CD: distributional model, extends SA+N

– A clickthrough is counted only if its observed frequency exceeds the expected frequency by more than ε

• The click on result 2 is likely "by chance"

• Generated preferences: 4 > (1, 2, 3, 5), but not 2 > (1, 3)

[Figure: clickthrough frequency deviation at positions 1-5 for an eight-result ranking with clicks marked; only sufficiently large positive deviations generate preferences.]


Simple Model Results

Improves precision

by discarding

“chance” clicks


Another Formulation

• There are two types of user/interaction

– Click based on relevance

– Click based on rank (blindly)

• A.k.a. the OR model:

– Clicks arise from

relevance OR position

– Estimate with logistic regression

[Figure: estimated position-bias term b_i as a function of rank i = 1…10.]

[Craswell et al., 2008]


Linear Examination Hypothesis

• Users are less likely to look at lower ranks, therefore

less likely to click

• This is the AND model

– Clicks arise from

relevance AND examination

– Probability of examination does not depend on what

else is in the list

[Figure: examination probability x_i as a function of rank i = 1…10.]

[Craswell et al., 2008]
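Under the examination (AND) hypothesis, P(click on d at rank i) = x_i · r_d, so given an assumed examination curve x_i, relevance can be estimated as clicks divided by expected examinations. A minimal sketch with a hypothetical curve and toy log:

```python
from collections import defaultdict

def estimate_relevance_examination(log, x):
    """Examination (AND) hypothesis: P(click on d at rank i) = x_i * r_d,
    with x_i a fixed, list-independent probability of examining rank i.
    Estimate r_d = clicks on d / expected examinations of d."""
    clicks, exams = defaultdict(int), defaultdict(float)
    for ranking, clicked in log:
        for i, d in enumerate(ranking, start=1):
            exams[d] += x[i]
            if d in clicked:
                clicks[d] += 1
    return {d: clicks[d] / exams[d] for d in exams}

x = {1: 1.0, 2: 0.5}   # assumed examination curve over ranks
log = [(["A", "B"], {"A"}), (["A", "B"], set()),
       (["B", "A"], {"A"}), (["B", "A"], set())]
r = estimate_relevance_examination(log, x)   # r["A"] = 2/3
```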


Cascade Model

• Users examine the results in rank order

• At each document d:

– Click with probability r_d

– Or continue with probability (1 − r_d)

[Taylor et al., 2008]


Cascade Model (2)

[Diagram: a query returns URL1…URL4 with relevances r_1…r_4; at each position the user clicks (C_i) with probability r_d, or skips and continues to the next result with probability (1 − r_d).]


Cascade Model Example

• 500 users typed a query

• 0 click on result A in rank 1

• 100 click on result B in rank 2

• 100 click on result C in rank 3

• Cascade (with no smoothing) says:

• 0 of 500 clicked A → r_A = 0

• 100 of 500 clicked B → r_B = 0.2

• 100 of remaining 400 clicked C → r_C = 0.25

[Craswell et al., 2008]
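The cascade arithmetic above can be sketched directly: sessions that click stop examining, so the denominator shrinks at each rank. Function and variable names are illustrative:

```python
def cascade_estimates(n_sessions, clicks_by_rank):
    """Cascade model (no smoothing): users scan top-down, and a session
    that clicks stops examining lower ranks, so r at each rank is
    clicks / (sessions still examining that rank)."""
    remaining = n_sessions
    r = []
    for c in clicks_by_rank:
        r.append(c / remaining if remaining else 0.0)
        remaining -= c          # clicking sessions stop the scan
    return r

# the slide's example: 500 sessions; 0 click A (rank 1),
# 100 click B (rank 2), 100 click C (rank 3)
rA, rB, rC = cascade_estimates(500, [0, 100, 100])
# -> rA = 0.0, rB = 0.2, rC = 0.25
```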


Cascade Model Seems Closest to Reality

Best possible: Given the true click counts for ordering BA

[Craswell et al., 2008]


Dynamic Bayesian Net


O. Chapelle & Y. Zhang, A Dynamic Bayesian Network Click Model for Web Search Ranking, WWW 2009

[Diagram: per-result latent variables: did the user examine the URL? was the user attracted to the URL? was the user satisfied by the landing page?]



Dynamic Bayesian Net (results)

predicted relevance

agrees 80% with

human relevance

O. Chapelle, & Y Zhang, A Dynamic Bayesian Network Click

Model for Web Search Ranking, WWW 2009

Use an EM algorithm (similar to forward-backward) to learn the model parameters; the persistence parameter is set manually.
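The generative side of the model can be sketched as a simulation (parameters here are assumed known, whereas the paper learns attractiveness and satisfaction with EM; all values below are illustrative):

```python
import random

def simulate_dbn_session(attract, satisfy, gamma, rng):
    """Generative sketch of the DBN click model: the user scans top-down,
    clicks when attracted to a snippet, stops if the landing page
    satisfies, and otherwise keeps examining with persistence gamma."""
    clicks = []
    for i in range(len(attract)):
        if rng.random() < attract[i]:       # attracted to the snippet -> click
            clicks.append(i)
            if rng.random() < satisfy[i]:   # satisfied by the landing page
                break
        if rng.random() >= gamma:           # abandon the scan
            break
    return clicks

rng = random.Random(0)
sessions = [simulate_dbn_session([0.6, 0.3, 0.2], [0.5, 0.5, 0.5], 0.9, rng)
            for _ in range(1000)]
ctr_rank1 = sum(1 for s in sessions if 0 in s) / 1000
# rank 1 is always examined, so its CTR approaches attract[0] = 0.6
```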


Clicks: Summary So Far

• Simple model accounts for position bias

• Bayes Net model: extension of Cascade model

shown to work well in practice

– Limitations?

• Questions?


Capturing a Click in its Context


[Piwowarski et al., 2009]

Building query chains

• Simple model based on time deltas & query similarities

Analysing the chains

• Layered Bayesian Network (BN) model

Validation of the model

• Relevance of clicked documents

• Boosted trees with features from the BN


Overall process

Time threshold

Similarity threshold

Grouping atomic sessions

[Piwowarski et al., 2009]


Layered Bayesian Network [Piwowarski et al., 2009]


The BN gives the context of a click


P(Chain state = … | observations) = (0.2, 0.4, 0.01, 0.39, 0)

P(Search state = … | observations) = (0.1, 0.42, …)

P(Page state = … | observations) = (0.25, 0.2, …)

P(Click state = … | observations) = (0.02, 0.5, …)

P([not] Relevant | observations) = (0.4, 0.5)

[Diagram: BN layers: Chain, Search, Page, Click, Relevance]

[Piwowarski et al., 2009]


Features for one click

• For each clicked document, compute features:

– (BN) Chain/Page/Action/Relevance state distribution

– (BN) Maximum likelihood configuration, likelihood

– Word confidence values (averaged for the query)

– Time and position related features

• This is associated with a relevance judgment from

an editor and used for learning

[Piwowarski et al., 2009]


Learning with Gradient Boosted Trees

• Use gradient boosted trees (Friedman, 2001),

with a tree depth of 4 (8 for the non-BN-based model)

• Used disjoint train (BN + GBT training) and test sets

– Two sets of sessions S1 and S2 (20 million chains) and

two sets of queries + relevance judgments J1 and J2

(about 1000 queries with behavior data)

– Process (repeated 4 times):

• learn the BN parameters on S1+J1,

• extract the BN features and learn the GBT with S1+J1

• Extract the BN features and predict relevance assessments of

J2 with sessions of S2

[Piwowarski et al., 2009]


Results: Predicting Relevance of Clicked Docs [Piwowarski et al., 2009]


Problem: Users click based on result “Snippets”

• Effect of Caption Features on Clickthrough

Inversions, C. Clarke, E. Agichtein, S. Dumais, R.

White, SIGIR 2007

[Clarke et al., 2007]


Clickthrough Inversions [Clarke et al., 2007]


Relevance is Not the Dominant Factor! [Clarke et al., 2007]


Snippet Features Studied [Clarke et al., 2007]


Feature Importance [Clarke et al., 2007]


Important Words in Snippet [Clarke et al., SIGIR 2007]


Extension: Use Fair Pairs Randomization

[Figures: click data and an example result under the pair randomization; the bars within each randomized pair should be equal if clicks are unbiased.]

[Yue et al., WWW 2010]
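A FairPairs-style randomization can be sketched as follows: adjacent result pairs are flipped at random, so each order within a pair is shown about equally often and clicks on the two results can be compared without position bias (the ranking and counts here are illustrative):

```python
import random

def fairpairs_present(ranking, rng):
    """FairPairs-style randomization: flip each adjacent pair (1,2),
    (3,4), ... with probability 0.5, so within a pair both orders are
    shown equally often and within-pair clicks compare fairly."""
    out = list(ranking)
    for i in range(0, len(out) - 1, 2):
        if rng.random() < 0.5:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

rng = random.Random(1)
shown = [fairpairs_present(["A", "B", "C", "D"], rng) for _ in range(2000)]
frac_A_first = sum(1 for s in shown if s[0] == "A") / 2000
# within the (A, B) pair, each order appears about half the time
```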


Viewing Organic Results vs. Ads

• Ads and Organic results

compete for user

attention

[Figure panels: Navigational vs. Other; Diversity vs. Similarity]


Danescu-Niculescu-Mizil et al., WWW 2010


Part 2: Inferring Searcher Intent

• Inferring result relevance: clicks

• Richer behavior models:

– SERP presentation info

– Richer interaction models: +presentation, +behavior


Richer Behavior Models

• Behavior measures of Interest

– Browsing, scrolling, dwell time

– How to estimate relevance?

• Heuristics

• Learning-based

– General model: Curious Browser [Fox et al., TOIS 2005]

– Query + Browsing [Agichtein et al., SIGIR 2006]

– Active Prediction: [Yun et al., WWW 2010]


Curious Browser [Fox et al., 2003]


Data Analysis

• Bayesian modeling at result and session level

• Trained on 80% and tested on 20%

• Three levels of satisfaction (SAT): very satisfied (VSAT), partially satisfied (PSAT), and dissatisfied (DSAT)

• Implicit measures:

Result-level: Diff Secs, Duration Secs; Scrolled, ScrollCnt, AvgSecsBetweenScroll, TotalScrollTime, MaxScroll; TimeToFirstClick, TimeToFirstScroll; Page, Page Position, Absolute Position; Visits; Exit Type; ImageCnt, PageSize, ScriptCnt; Added to Favorites, Printed

Session-level: Averages of result-level measures (Dwell Time and Position); Query count; Results set count; Results visited; End action

[Fox et al., 2003]


Data Analysis, cont’d [Fox et al., 2003]


Result-Level Findings

1. Dwell time, clickthrough and exit type

strongest predictors of SAT

2. Printing and Adding to Favorites highly

predictive of SAT when present

3. Combined measures predict SAT better

than clickthrough

[Fox et al., 2003]


Result Level Findings, cont’d

Only clickthrough

Combined measures

Combined measures with

confidence of > 0.5 (80-20

train/test split)

[Fox et al., 2003]


Learning Result Preferences in Rich User Interaction Space

• Observed and Distributional features

– Observed features: aggregated values over all user interactions for

each query and result pair

– Distributional features: deviations from the “expected” behavior

for the query

• Represent user interactions as vectors in “Behavior Space”

– Presentation: what a user sees before click

– Clickthrough: frequency and timing of clicks

– Browsing: what users do after the click

[Agichtein et al., 2006]


Features for Behavior Representation [Agichtein et al., SIGIR 2006]

Sample behavior features:

Presentation
ResultPosition – Position of the URL in current ranking
QueryTitleOverlap – Fraction of query terms in result title

Clickthrough
DeliberationTime – Seconds between query and first click
ClickFrequency – Fraction of all clicks landing on page
ClickDeviation – Deviation from expected click frequency

Browsing
DwellTime – Result page dwell time
DwellTimeDeviation – Deviation from expected dwell time for query


Predicting Result Preferences

• Task: predict pairwise preferences

– A judge will prefer Result A > Result B

• Models for preference prediction

– Current search engine ranking

– Clickthrough

– Full user behavior model

[Agichtein et al., SIGIR2006]


User Behavior Model

• Full set of interaction features

– Presentation, clickthrough, browsing

• Train the model with explicit judgments

– Input: behavior feature vectors for each query-page pair in rated results

– Use RankNet [Burges et al., ICML 2005] to discover model weights

– Output: a neural net that can assign a “relevance” score to a behavior feature vector

[Agichtein et al., SIGIR2006]
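The end use of such a model can be sketched as follows: a learned scorer maps each query-result behavior vector to a relevance score, and A > B is predicted when score(A) > score(B). A linear scorer stands in here for the RankNet neural net, and all feature values and weights are illustrative, not learned:

```python
def preference(features_a, features_b, weights):
    """Pairwise preference from behavior features: score each behavior
    vector with a learned model and predict A > B when
    score(A) > score(B).  (Linear scorer as a stand-in for RankNet.)"""
    def score(f):
        return sum(weights[k] * v for k, v in f.items())
    return "A>B" if score(features_a) > score(features_b) else "B>A"

# illustrative weights and behavior vectors (not learned values)
weights = {"ClickFrequency": 2.0, "DwellTime": 0.01, "ResultPosition": -0.1}
a = {"ClickFrequency": 0.4, "DwellTime": 120.0, "ResultPosition": 3}
b = {"ClickFrequency": 0.3, "DwellTime": 15.0, "ResultPosition": 1}
pref = preference(a, b, weights)   # -> "A>B"
```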


Results: Predicting User Preferences

[Figure: precision vs. recall for the Baseline, SA+N, CD, and UserBehavior preference-prediction strategies.]

• Baseline < SA+N < CD << UserBehavior

• Rich user behavior features result in dramatic improvement

[Agichtein et al., SIGIR2006]


Predicting Queries from Browsing Behavior

• Identify “Search Trigger” browse-search patterns

• Distribution of “Search-Browse” patterns:

URLs: movies.about.com/ nationalpriorities.org pds.jpl.nasa.gov/planets


[Cheng et al., WWW 2010]


Summary of Part 2

• Click data contains important information about the

distribution of intents for a query

• For accurate interpretation, must model the (many)

biases present:

– Presentation, demographics, types of intent


Part 3: Applications and Extensions

• Improving search ranking

– Implicit feedback

• Predicting Intent and Behavior

– Query suggestion, ads

• Search personalization


Observable Behavior

Behavior categories (rows), with minimum scope ranging over Segment, Object, and Class:

• Examine: View, Listen
• Retain: Print, Bookmark, Save, Purchase, Delete, Subscribe
• Reference: Copy / paste, Quote, Forward, Reply, Link, Cite
• Annotate: Mark up, Rate, Publish, Organize


Eye Tracking

• Unobtrusive

• Relatively precise (accuracy: ~1° of visual angle)

• Expensive

• Mostly used as a "passive" tool for behavior analysis, e.g., visualized by heatmaps

• We use eye tracking for immediate implicit feedback, taking into account temporal fixation patterns


Using Eye Tracking for Relevance Feedback

1. Starting point: noisy gaze data from the eye tracker

2. Fixation detection and saccade classification

3. Reading (red) and skimming (yellow) detection, line by line

See G. Buscher, A. Dengel, L. van Elst: “Eye Movements as Implicit Relevance Feedback”, in CHI '08

[Buscher et al., 2008]


Three Feedback Methods Compared

Input: viewed documents

Baseline – TF×IDF, based on opened entire documents
Gaze-Filter – TF×IDF, based on read or skimmed passages
Gaze-Length-Filter – Interest(t) × TF×IDF, based on length of coherently read text
Reading Speed – ReadingScore(t) × TF×IDF, based on read vs. skimmed passages containing term t

[Buscher et al., 2008]


Eye Tracking-based RF Results

[Buscher et al., 2008]


Instrumenting SERP Interactions: EMU


• EMU: Firefox + LibX plugin instrumentation → HTTP log

• Track whitelisted sites, e.g., Emory, Google, Yahoo search…

• All SERP events logged (asynchronous HTTP requests)

• 150 public-use machines, 5,000+ opted-in users

[Diagram: browser instrumentation → HTTP server → HTTP log → usage data → data mining & management → train prediction models]


Gui, Agichtein, et al., JCDL 2009


Classifying Research vs. Purchase Intent

• 12 subjects (grad students and staff) asked to

1. Research a product they want to purchase eventually

(Research intent)

2. Search for a best deal on an item they want to

purchase immediately (Purchase intent)

• Eye tracking and browser instrumentation

performed in parallel for some of the subjects


Guo & Agichtein, SIGIR 2010


Research Intent


Guo & Agichtein, SIGIR 2010


Purchase Intent


Guo & Agichtein, SIGIR 2010


Contextualized Intent Inference


Guo & Agichtein, SIGIR 2010


Implementation: Conditional Random Field (CRF) Model


Guo & Agichtein, SIGIR 2010


Results: Ad Click Prediction

• 200%+ precision improvement (within task)


Guo & Agichtein, SIGIR 2010


Application: Learning to Rank from Click Data


[ Joachims 2002 ]


Results


[ Joachims 2002 ]

Summary:

Learned ranking outperforms all base methods in the experiment

→ Learning from clickthrough data is possible

→ Relative preferences are useful training data.


Extension: Query Chains


[Radlinski & Joachims, KDD 2005]


Query Chains (Cont’d)


[Radlinski & Joachims, KDD 2005]


Query Chains (Results)

• Query Chains add slight improvement over clicks


[Radlinski & Joachims, KDD 2005]


Richer Behavior for Dynamic Ranking


[Agichtein et al., SIGIR2006]

Sample Behavior Features (from Lecture 2):

Presentation
ResultPosition – Position of the URL in current ranking
QueryTitleOverlap – Fraction of query terms in result title

Clickthrough
DeliberationTime – Seconds between query and first click
ClickFrequency – Fraction of all clicks landing on page
ClickDeviation – Deviation from expected click frequency

Browsing
DwellTime – Result page dwell time
DwellTimeDeviation – Deviation from expected dwell time for query


Feature Merging: Details

• Value scaling:

– Binning vs. log-linear vs. linear (e.g., μ=0, σ=1)

• Missing Values:

– 0? (meaning for normalized feature values s.t. μ=0?)

• “real-time”: significant architecture/system problems

Query: SIGIR (fake results with fake feature values)

Result URL           BM25  PageRank  …  Clicks  DwellTime  …
sigir2007.org        2.4   0.5       …  ?       ?          …
sigir2006.org        1.4   1.1       …  150     145.2      …
acm.org/sigs/sigir/  1.2   2         …  60      23.5       …

[Agichtein et al., SIGIR2006]


Review: NDCG

• Normalized Discounted Cumulative Gain

• Multiple Levels of Relevance

• DCG:

– contribution of the ith rank position: (2^{y_i} − 1) / log(1 + i)

– Example: a ranking with gains y = (1, 2, 1, 0, 1) has DCG score

1/log(2) + 3/log(3) + 1/log(4) + 0/log(5) + 1/log(6) ≈ 5.45 (natural log)

• NDCG is normalized DCG

– the best possible ranking has score NDCG = 1


Human Judgments

http://jobs.monsterindia.com/details/7902838.html


Results for Incorporating Behavior into Ranking

          MAP    Gain
RN        0.270
RN+ALL    0.321  0.052 (19.13%)
BM25      0.236
BM25+ALL  0.292  0.056 (23.71%)

[Figure: NDCG at K = 1…10 for RN, Rerank-All, and RN+All.]

[Agichtein et al., SIGIR2006]


Which Queries Benefit Most

[Figure: query frequency and average NDCG gain, bucketed by original ranking quality.]

Most gains are for queries with poor original ranking

[Agichtein et al., SIGIR2006]


Extension to Unseen Queries/Documents: Search Trails


[Bilenko and White, WWW 2008]

• Trails start with a search engine query

• Continue until a terminating event

– Another search

– Visit to an unrelated site (social networks, webmail)

– Timeout, browser homepage, browser closing

Probabilistic Model

• IR via language modeling [Zhai-Lafferty, Lavrenko]

• Query-term distribution gives more mass to rare

terms:

• Term-website weights combine dwell time and counts

[Bilenko and White, WWW 2008]
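The idea of combining dwell time and counts into term-website weights can be sketched as follows; the log-dwell weighting, IDF values, and trail data below are illustrative stand-ins, not the paper's exact estimator:

```python
import math
from collections import defaultdict

def trail_relevance(query_terms, trails, idf):
    """Search-trails relevance sketch: accumulate a weight for each
    (term, site) pair from dwell-time-weighted visits on trails whose
    originating query contained the term; score a site for a query as
    the IDF-weighted sum over query terms (rare terms get more mass)."""
    w = defaultdict(float)
    for q_terms, site, dwell_secs in trails:
        for t in q_terms:
            w[(t, site)] += math.log(1 + dwell_secs)
    def rel(site):
        return sum(idf.get(t, 0.0) * w[(t, site)] for t in query_terms)
    return rel

# illustrative data: (query terms, visited site, dwell seconds)
idf = {"mars": 0.5, "rover": 2.0}            # rarer term weighted higher
trails = [({"mars", "rover"}, "nasa.gov", 300),
          ({"mars"}, "candy.example.com", 20)]
rel = trail_relevance({"mars", "rover"}, trails, idf)
# nasa.gov scores higher than candy.example.com for this query
```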


Results: Learning to Rank

Add Rel(q, d_i) as a feature to RankNet

[Figure: NDCG@1, NDCG@3, and NDCG@10 for Baseline, Baseline+Heuristic, Baseline+Probabilistic, and Baseline+Probabilistic+RW.]

[Bilenko and White, WWW 2008]


Personalization


Which Queries to Personalize?

• Personalization benefits ambiguous queries

• Inter-rater reliability (Fleiss’ kappa)

– Observed agreement (Pa) exceeds expected (Pe)

– κ = (Pa-Pe) / (1-Pe)

• Relevance entropy

– Variability in the probability that a result is relevant (Pr)

– S = -Σ Pr log Pr

• Potential for personalization

– Ideal group ranking differs from ideal personal ranking

– P4P = 1 − nDCG_group
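The entropy measure above can be sketched directly; a query whose clicks concentrate on one result has low entropy (little intent variation), while a query with spread-out clicks has high entropy and more potential for personalization (click counts below are illustrative):

```python
import math

def click_entropy(click_counts):
    """Click entropy S = -sum p log p over a query's click distribution
    (base 2); higher entropy suggests more variation in user intent."""
    total = sum(click_counts)
    ps = [c / total for c in click_counts if c]
    return -sum(p * math.log(p, 2) for p in ps)

navigational = click_entropy([98, 1, 1])     # clicks concentrate: low entropy
ambiguous = click_entropy([30, 25, 25, 20])  # clicks spread out: high entropy
```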


Teevan, J, S. T. Dumais, and D. J. Liebling. To personalize or not to

personalize: modeling queries with variation in user intent., SIGIR 2008

[Teevan et al., 2008]


Predicting Ambiguous Queries

Information features, grouped by source (Query vs. Results) and whether search history is required:

Query, no history: Query length; Contains URL; Contains advanced operator; Time of day issued; Number of results (df); Number of query suggests; Reformulation probability

Query, with history: # of times query issued; # of users who issued query; Avg. time of day issued; Avg. number of results; Avg. number of query suggests

Results, no history: Query clarity; ODP category entropy; Number of ODP categories; Portion of non-HTML results; Portion of results from .com/.edu; Number of distinct domains

Results, with history: Result entropy; Avg. click position; Avg. seconds to click; Avg. clicks per user; Click entropy; Potential for personalization

Teevan, J, S. T. Dumais, and D. J. Liebling. To personalize or not to

personalize: modeling queries with variation in user intent., SIGIR 2008

[Teevan et al., 2008]


Clustering Query Refinements by User Intent

• Approach:

– Intent = set of visited documents

– Cluster refinements using document visit distribution vectors

[Example: Mars (candy) vs. Mars (planet)]


[Sadikov et al., WWW 2010]


Approaches to Personalization

1. Pitkow et al., 2002

2. Qiu et al., 2006

3. Jeh et al., 2003

4. Teevan et al., 2005

5. Das et al., 2007


Figure adapted from: Personalized search on the world wide web, by

Micarelli, A. and Gasparetti, F. and Sciarrone, F. and Gauch, S., LNCS 2007


Personalization Research

• Ask the searcher

– Is this relevant?

• Look at searcher’s clicks

• Similarity to content the searcher has seen before

Teevan et al., TOCHI 2010


Ask the Searcher

• Explicit indicator of relevance

• Benefits

– Direct insight

• Drawbacks

– Amount of data limited

– Hard to get answers for the same query

– Unlikely to be available in a real system

Teevan et al., TOCHI 2010


Searcher’s Clicks

• Implicit, behavior-based indicator of relevance

• Benefits

– Possible to collect from all users

• Drawbacks

– People click by mistake or get sidetracked

– Biased towards what is presented

Teevan et al., TOCHI 2010


Similarity to Seen Content

• Implicit content-based indicator of relevance

• Benefits

– Can collect from all users

– Can collect for all queries

• Drawbacks

– Privacy considerations

– Measures of textual similarity noisy

Teevan et al., TOCHI 2010


Evaluating Personalized Search

• Explicit judgments (offline and in situ)

– Evaluate components before system

– NOTE: judgments capture what’s relevant for you, not for users in general

• Deploy system

– Verbatim feedback, Questionnaires, etc.

– Measure behavioral interactions (e.g., click, reformulation, abandonment, etc.)

– Click biases: order, presentation, etc.

– Interleaving for unbiased clicks

• Link implicit and explicit (Curious Browser toolbar)

• From single query to search sessions and beyond
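Interleaving, mentioned above as a way to get unbiased clicks, can be sketched with the standard team-draft scheme (a simplified illustration; document names are made up):

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Team-draft interleaving: merge two rankings so that a click on the
    combined list credits the ranker that contributed the clicked result,
    removing position bias from the comparison."""
    rng = rng or random.Random(0)
    all_docs = set(ranking_a) | set(ranking_b)
    interleaved, team = [], {}
    count = {"A": 0, "B": 0}
    while len(interleaved) < len(all_docs):
        # the team with fewer contributions picks next; ties: coin flip
        if count["A"] < count["B"] or (count["A"] == count["B"] and rng.random() < 0.5):
            side, ranking = "A", ranking_a
        else:
            side, ranking = "B", ranking_b
        doc = next((d for d in ranking if d not in team), None)
        if doc is None:  # this ranker has no unused documents left
            side = "B" if side == "A" else "A"
            ranking = ranking_a if side == "A" else ranking_b
            doc = next(d for d in ranking if d not in team)
        team[doc] = side
        count[side] += 1
        interleaved.append(doc)
    return interleaved, team

def credit(team, clicked_docs):
    """Count clicks per ranker; the ranker with more clicks wins the query."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        wins[team[doc]] += 1
    return wins

ranked, team = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(ranked, credit(team, ["d2"]))
```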


User Control in Personalization (Relevance Feedback)


Ahn, J.-S., Brusilovsky, P., He, D., and Syn, S. Y. Open User Profiles for Adaptive News Systems: Help or Harm? WWW 2007


Personalization Summary

• Lots of relevant content ranked low

• Potential for personalization high

• Implicit measures capture explicit variation

– Behavior-based: Highly accurate

– Content-based: Lots of variation

• Example: Personalized Search

– Behavior + content work best together

– Improves search result clickthrough


New Direction: Active Learning

• Goal: Learn the relevances with as little training data as possible.

• Search involves a three-step process:

1. Given relevance estimates, pick a ranking to display to users.

2. Given a ranking, users provide feedback: user clicks provide pairwise relevance judgments.

3. Given feedback, update the relevance estimates.


[Radlinski & Joachims, KDD 2007]
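The three-step loop can be sketched as follows, with a deliberately simple additive update standing in for the paper's Bayesian machinery (document names and the learning rate are illustrative):

```python
def rank(scores):
    """Step 1: display documents in descending order of estimated relevance."""
    return sorted(scores, key=scores.get, reverse=True)

def update_from_click(scores, clicked, skipped, lr=0.1):
    """Steps 2-3: a click on `clicked` while `skipped` was ranked above it is
    read as the pairwise preference clicked > skipped (Joachims et al.);
    nudge the relevance estimates accordingly."""
    scores[clicked] += lr
    scores[skipped] -= lr

# Hypothetical session: the user repeatedly skips d1 and clicks d2.
scores = {"d1": 0.5, "d2": 0.4, "d3": 0.1}
for _ in range(3):
    shown = rank(scores)                   # step 1: show current best ranking
    update_from_click(scores, "d2", "d1")  # steps 2-3: observed preference
print(rank(scores))  # d2 now ranked above d1
```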


Overview of Approach

• Available information:

1. Have an estimate of the relevance of each result.

2. Can obtain pairwise comparisons of the top few results.

3. Do not have absolute relevance information.

• Goal: Learn the document relevance quickly.

• Addresses four questions:

1. How to represent knowledge about doc relevance.

2. How to maintain this knowledge as we collect data.

3. Given our knowledge, what is the best ranking?

4. What rankings do we show users to get useful data?


[Radlinski & Joachims, KDD 2007]


AAAI 2010 Tutorial: Inferring Searcher Intent 7/11/2010

• Given a fixed query, maintain knowledge about relevance as clicks are observed.

– This tells us which documents we are sure about, and which ones need more data.


1: Representing Document Relevance [Radlinski & Joachims, KDD 2007]
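One simple way to represent this knowledge, consistent with the slide's "sure vs. needs more data" distinction, is to keep pairwise win/loss counts per document and read off a Beta-posterior mean and variance. This is an illustrative stand-in, not the paper's actual model:

```python
def update_estimate(stats, winner, loser):
    """Record one observed pairwise click preference (winner > loser)."""
    stats.setdefault(winner, [0, 0])[0] += 1  # winner's win count
    stats.setdefault(loser, [0, 0])[1] += 1   # loser's loss count

def mean_and_variance(stats, doc):
    """Mean and variance of a Beta(wins+1, losses+1) posterior over the
    document's relevance: low variance = sure, high variance = needs data."""
    wins, losses = stats.get(doc, [0, 0])
    a, b = wins + 1, losses + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

stats = {}
for _ in range(8):
    update_estimate(stats, "d1", "d2")  # d1 consistently preferred over d2
print(mean_and_variance(stats, "d1"))  # high mean, shrinking variance
print(mean_and_variance(stats, "d3"))  # unseen doc: mean 0.5, max variance
```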


• Problem: we could present the ranking based on the current best estimate of relevance.

– Then the data we get would always be about the documents already ranked highly.

• Instead, optimize the ranking shown to users:

1. Pick the top two documents to minimize future loss.

2. Append the current best-estimate ranking.


4: Getting Useful Data [Radlinski & Joachims, KDD 2007]
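A simplified version of this strategy: place the two documents whose relevance is most uncertain in the top two positions, where a click comparison is most informative, and rank the rest by estimated relevance. This is a sketch; the paper picks the pair that minimizes expected future loss:

```python
def exploratory_ranking(estimates):
    """estimates: doc -> (mean relevance, variance).
    Put the two most uncertain documents in slots 1-2 (exploration),
    then append the remaining documents ranked by mean (exploitation)."""
    by_uncertainty = sorted(estimates, key=lambda d: estimates[d][1], reverse=True)
    top_two = by_uncertainty[:2]
    rest = sorted((d for d in estimates if d not in top_two),
                  key=lambda d: estimates[d][0], reverse=True)
    return top_two + rest

# Hypothetical (mean, variance) relevance estimates:
est = {"d1": (0.9, 0.01), "d2": (0.5, 0.30), "d3": (0.4, 0.25), "d4": (0.8, 0.02)}
print(exploratory_ranking(est))  # ['d2', 'd3', 'd1', 'd4']
```

Clicks on the top two positions then yield a pairwise comparison precisely between the documents about which the system knows least.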


4: Exploration Strategies [Radlinski & Joachims, KDD 2007]


Results: TREC Data [Radlinski & Joachims, KDD 2007]

Optimizing for relevance estimates works better than optimizing for ordering


Tutorial Summary

• Understanding user behavior at micro-, meso-, and macro-levels

• Theoretical models of information seeking

• Web search behavior:

– Levels of detail

– Search intent

– Variations in web searcher behavior

– Keeping found things found

– Click models


Tutorial Summary (2)

• Inferring result relevance: clicks

• Richer behavior models:

– SERP presentation info

– Post-search behavior

– Rich interaction models for SERPs

• Contextualizing intent inference:

– Session-level models

– Personalization


Inferring Searcher Intent (Information)

Eugene Agichtein

Emory University

• Tutorial page: http://ir.mathcs.emory.edu/intent_tutorial/

– See the online version for an expanded and updated bibliography

• Contact information for the instructor:

– Eugene Agichtein

– Email: [email protected]

– Homepage: http://www.mathcs.emory.edu/~eugene/


References and Further Reading (1)

• Hearst, M. Search User Interfaces, 2009, Chapter 3, “Models of the Information Seeking Process”: http://searchuserinterfaces.com/

• Teevan, J., Adar, E., Jones, R., and Potts, M. Information Re-Retrieval: Repeat Queries in Yahoo's Logs, SIGIR 2007

• Clarke, C., Agichtein, E., Dumais, S., and White, R. W. The Influence of Caption Features on Clickthrough Patterns in Web Search, SIGIR 2007

• Craswell, N., Zoeter, O., Taylor, M., and Ramsey, B. An Experimental Comparison of Click Position-Bias Models, WSDM 2008

• Dupret, G. and Piwowarski, B. A User Browsing Model to Predict Search Engine Click Data from Past Observations, SIGIR 2008

• White, R. and Morris, D. Investigating the Querying and Browsing Behavior of Advanced Search Engine Users, SIGIR 2007



References and Further Reading (3)

• Kelly, D. and Teevan, J. Implicit Feedback for Inferring User Preference: A Bibliography, SIGIR Forum 37(2), September 2003

• Joachims, T., Granka, L., Pan, B., Hembrooke, H., and Gay, G. Accurately Interpreting Clickthrough Data as Implicit Feedback, SIGIR 2005

• Agichtein, E., Brill, E., Dumais, S., and Ragno, R. Learning User Interaction Models for Predicting Web Search Result Preferences, SIGIR 2006

• Buscher, G., Dengel, A., and van Elst, L. Query Expansion Using Gaze-Based Feedback on the Subdocument Level, SIGIR 2008

• Chapelle, O. and Zhang, Y. A Dynamic Bayesian Network Click Model for Web Search Ranking, WWW 2009

• Piwowarski, B., Dupret, G., and Jones, R. Mining User Web Search Activity with Layered Bayesian Networks or How to Capture a Click in Its Context, WSDM 2009

• Guo, Q. and Agichtein, E. Ready to Buy or Just Browsing? Detecting Web Searcher Goals from Interaction Data, to appear, SIGIR 2010
