Inferring Searcher Intent


Eugene Agichtein
Emory University
http://ir.mathcs.emory.edu/intent_tutorial/
Tutorial Overview
• Part 1: Search Intent Modeling – Motivation: how intent inference could help search
– Search intent & information seeking behavior in traditional IR
– Searcher models: from eye tracking to clickthrough mining
• Part 2: Inferring Web Searcher Intent – Inferring result relevance: clicks
– Richer interaction models: clicks + browsing
• Part 3: Applications and Extensions – Implicit feedback for ranking
– Contextualized prediction: session modeling
About the Instructor
• Research: Information retrieval and data mining
– Mining search behavior and interactions in web search
– Text mining, information extraction, and question answering
• Relevant experience:
– 2006-present: Assistant Professor, Emory University
– Summer 2007: Visiting Researcher, Yahoo! Research
– 2004-06: Postdoc, Microsoft Research
– 1998-2004: PhD student, Columbia University (Databases/IR)
Outline: Search Intent and Behavior
Motivation: how intent inference could help search
• Search intent and information seeking behavior – Classical models of information seeking
• Web searcher intent
• Web searcher behavior – Levels of modeling: micro-, meso-, and macro- levels
– Variations in web searcher behavior
– Click models
Some Key Challenges for Web Search
• Query interpretation (infer intent)
Example: Task-Goal-Search Model
Information Retrieval Process Overview
Explicit Intentions in Query Logs
• Match known goals (from ConceptNet) to query logs
Unfortunately, most queries are not so explicit…
Outline: Search Intent and Behavior
Motivation: how intent inference could help search
Search intent and information seeking behavior – Classical models of information seeking
• Web Searcher Intent – Broder
– More recent taxonomies (e.g., Rose)
• Web Searcher Behavior – Levels of modeling: micro-, meso-, and macro- levels
– Variations in web searcher behavior
– Click models
Information Seeking Funnel
• Wandering: the user does not have an information-seeking goal in mind. May have a meta-goal (e.g., “find a topic for my final paper”).
• Exploring: the user has a general goal (e.g. “learn about the history of communication technology”) but not a plan for how to achieve it.
• Seeking: the user has started to identify information needs that must be satisfied (e.g., “find out about the role of the telegraph in communication”), but the needs are open-ended.
• Asking: the user has a very specific information need that corresponds to a closed-class question (“when was the telegraph invented?”).
Models of Information Seeking
“… search, conducting the search, evaluating the results, and … iterating through the process.”
Reviewing Results: Relevance Clues
• How do searchers decide whether a result is relevant? What do people look for in order to infer relevance?
• What clues do searchers use in making relevance inferences?
Information Scent for Navigation
• Information scent provides the user with clues about which results to click
Dynamic “Berry Picking” Model
[Bates, 1989] M.J. Bates. The design of browsing and berrypicking techniques for the on-line search interface. Online Review, 13(5):407–431, 1989.
Information Foraging: Patch Model
• Goal: maximize rate of information gain
• Basic problem: should I stay in the current patch, and how long to continue searching in that patch? (Charnov’s Marginal Value Theorem; see the sketch below)
• Diminishing returns: 80% of users scan only the first 3 pages of search results
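As a sketch of the patch-leaving rule behind Charnov’s theorem (notation mine, not from the slides): let g(t) be the cumulative information gained after searching a patch for time t, and T the average travel time between patches. The optimal leaving time t* satisfies

```latex
% Charnov's Marginal Value Theorem: leave the current patch when the
% instantaneous gain rate falls to the best achievable average rate.
\[
  g'(t^{*}) \;=\; \frac{g(t^{*})}{T + t^{*}}
\]
```

With diminishing returns (g concave) such a t* exists; intuitively, once a results page stops yielding new information fast enough, the searcher moves on, which is consistent with the 3-page statistic above.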
Hotel Search
Example: Hotel Search (cont’d)
Orienteering vs. Teleporting
• Orienteering:
– Searcher issues a quick, imprecise query to get to approximately the right region of the information space
– Searchers follow known paths that require small steps that move them closer to their goal
– Easy (does not require generating a “perfect” query)
• Teleporting:
– Expert searchers issue longer queries
– Requires more effort and experience.
– Until recently, was the dominant IR model
Serendipity
Summary of Models
• Static, berry-picking, information foraging, orienteering, serendipity
• Classical IR Systems research mainly uses the simplest form of relevance (topicality)
• Open questions:
– How to incorporate other forms of relevance (e.g., user goals/needs/tasks) into IR systems
Part 1: Search Intent and Behavior
Motivation: how intent inference could help search
Search intent and information seeking behavior – Classical models of information seeking
• Web Searcher Intent – Broder
– More recent taxonomies (e.g., Rose)
• Web Searcher Behavior – Levels of modeling: micro-, meso-, and macro- levels
– Variations in web searcher behavior
– Click models
Intent Classes (top level only)
• User intent taxonomy (Broder 2002); a toy classifier sketch follows the list below
– Informational – want to learn about something (~40% / 65%)
– Navigational – want to go to that page (~25% / 15%)
– Transactional – want to do something (web-mediated) (~35% / 20%)
• Examples: access a service, downloads
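To make the taxonomy concrete, here is a toy rule-of-thumb classifier; the cue lists are illustrative assumptions of mine, not part of Broder’s paper or this tutorial:

```python
# Toy illustration of Broder's top-level intent classes.
# Cue lists are illustrative guesses, not a validated classifier.
NAVIGATIONAL_CUES = ("www.", ".com", "homepage", "official site")
TRANSACTIONAL_CUES = ("download", "buy", "order", "install", "watch")

def broder_class(query: str) -> str:
    q = query.lower()
    if any(cue in q for cue in NAVIGATIONAL_CUES):
        return "navigational"   # want to go to that page
    if any(cue in q for cue in TRANSACTIONAL_CUES):
        return "transactional"  # want to do something (web-mediated)
    return "informational"      # default: want to learn about something

print(broder_class("cnn.com"))                   # navigational
print(broder_class("download firefox"))          # transactional
print(broder_class("history of the telegraph"))  # informational
```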
Extended User Goal Taxonomy
Complex Search
A complex search task refers to cases where:
• searcher needs to conduct multiple searches to locate the information sources needed,
• completing the search task spans multiple sessions (the task is interrupted by other things),
• searcher needs to consult multiple sources of information (all the information is not available from one source, e.g., a book, a webpage, a friend),
• requires a combination of exploration and more directed information finding activities,
• often requires note-taking (cannot hold all the information that is needed to satisfy the final goal in memory), and
• specificity of the goal tends to vary during the search process (often starts with exploration).
Complex Search (Cont’d)
Web Search Queries
– Few queries posed and few answers seen (first page)
– Reformulation common
Intent Distribution by Topic
Query Distribution by Demographics
Query Demographics 2: Demo
Keywords have demographic signatures
– Microsoft adCenter Demographics Prediction:
Domain-Specific Intents: Named Entities
Example Intent Taxonomy for Musicians
• Musicians (most popular phrases)
Analyzing Searches: Funneling
• What is the intent of customers who type such queries?
• Hint: what did they search for before/after?
– Search Funnels: http://adlab.msn.com/searchfunnel/
Part 1: Search Intent and Behavior
Motivation: how intent inference could help search
Search intent and information seeking behavior – Classical models of information seeking
Web Searcher Intent – Broder, Rose, Demographics
• Web Searcher Behavior – Levels of modeling: micro-, meso-, and macro- levels
– Variations in web searcher behavior
– Click models
Web Searcher Behavior
Levels of Understanding Searcher Behavior
• Micro (eye tracking): lowest level of detail, milliseconds
• Meso (field studies): mid-level, minutes to days
• Macro (session analysis): millions of observations, days to months
Search Behavior: Scales
Information Retrieval Process (User view)
(Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)
Snippet Views Depend on Rank
Mean: 3.07 Median: 2.00
[Daniel M. Russell, 2007]
Snippet Views and Clicks Depend on Rank [from Joachims et al, SIGIR 2005]
Micro-level: Examining Results
• Eye tracking gives information about which results searchers actually examine
• What they see in lower summaries may influence judgment of a higher result
– Trust the ranking
POM: Partially Observable Model
Result Examination (cont’d)
• Behavior varies with the user and the task
[K. Rodden, X. Fu, A. Aula, and I. Spiro, Eye-mouse coordination patterns on web search results pages, Extended Abstracts of ACM CHI 2008]
Result Examination (cont’d): Predicting Eye-Mouse Coordination
[Figure: eye and mouse positions plotted over time]
Macro-Level (Session) Analysis
• Can examine theoretical user models in light of empirical data: – Orienteering?
– Foraging?
– Multi-tasking?
• Search is often a multi-step process: – Find or navigate to a good site (“orienteering”)
– Browse for the answer there: [actor most oscars] vs. [oscars]
• Teleporting – “I wouldn’t use Google for this, I would just go to…”
• Triangulation – Draw information from multiple sources and interpolate
– Example: “how long can you last without food?”
Users (sometimes) Multi-task
Kinds of Search+Browsing Behavior
• 57% of all tabbed sessions are parallel browsing
• Can mean multi-tasking
Search Engine Switching Behavior
• 4% of all search sessions contained a switching event
• Switching events:
• 1.4% of all Google / Yahoo! / Live queries followed by switch
– 12.6% of all switching events involved same query
– Two-thirds of switching events from browser search box
• Users:
Overview of Search Engine Switching
• Switching is more frequent in longer sessions
White et al., CIKM 2009
Overview of Switching - Survey
• 70.5% of survey respondents reported having switched – Remarkably similar to the 72.6% observed in logs
• Those who did not switch: – Were satisfied with current engine (57.8%)
– Believed no other engine would perform better (24.0%)
– Felt that it was too much effort to switch (6.8%)
– Other reasons included brand loyalty, trust, privacy
• Within-session switching: – 24.4% of switching users did so “Often” or “Always”
– 66.8% of switching users did so “Sometimes”
White et al., CIKM 2009
Reasons for Engine Switching
– Desire to verify or find additional information
– User preference
Pre-switch Behavior
• This is the action immediately before the switch
• What about pre-switch activity across the session?
White et al., CIKM 2009
Pre-switch Behavior (Survey)
• Survey question: “What actions do you take immediately preceding a switch that may indicate to an observer that you are about to switch engines?”
• Common answers:
– Try several small query changes in pretty quick succession
– Go to more than the first page of results, again often in quick succession and often without clicks
– Go back and forth from SERP to individual results, without spending much time on any
– Click on lots of links, then switch engine for additional info
– Do not immediately click on something
White et al., CIKM 2009
Post-switch Behavior
– 20% of switches eventually lead to return to origin engine
– 6% of switches eventually lead to use of third engine
• > 50% led to a result click. Are users satisfied?
White et al., CIKM 2009
Post-Switch Satisfaction
– Metrics: % queries with no clicks; # actions to a SAT action (>30 sec dwell)
• Users issue more queries/actions and seem less satisfied (higher % no-clicks and more actions to SAT)
• Switching queries may be challenging for search engines
[Table: # queries and # actions (activity); % no-clicks and # actions to SAT action (success); origin vs. destination engines]
White et al., CIKM 2009
Search Behavior: Expertise
– Search expertise, not domain expertise
– Alternative explanation: Orienteering vs. Teleporting
• Find characteristics of these “advanced search engine users” in an effort to better understand how these users search
• Understanding what advanced searchers are doing could improve the search experience for everyone
Findings – Post-query browsing
(columns: non-advanced → more advanced → advanced searchers)
Session Secs:   701.10   706.21   792.65   903.01   1114.71
Trail Secs:     205.39   159.56   156.45   147.91   136.79
Display Secs:   36.95    32.94    34.91    33.11    30.67
Num. Steps:     4.88     4.72     4.40     4.40     4.39
Num. Revisits:  1.20     1.02     1.03     1.03     1.02
Num. Branches:
%Trails:        72.14%   27.86%   .83%     .23%     .05%
%Users:         79.90%   20.10%   .79%     .18%     .04%
Search Behavior: Demographics
ReFinding Behavior
• 40% of the queries led to a click on a result that the same user had clicked on in a past search session [Teevan et al., 2007]
– Example: … this year’s SIGIR 2010? Does not really matter …
What Is Known About Re-Finding
• Re-finding recent topic of interest
• Web re-visitation common [Tauscher & Greenberg]
• People follow known paths for re-finding
– Search engines likely to be used for re-finding
• Query log analysis of re-finding
– Query sessions [Jones & Fain]
– Temporal aspects [Sanderson & Dumais]
[Figure: breakdown of repeat-search click behavior (counts and percentages); categories include clicking on a different result]
Rank Change Degrades Re-Finding
• Probability of re-click:
– No rank change: 88% chance
– Rank change: 53% chance
• Time to re-click (compared with initial search-to-click):
– No rank change: re-click is faster
– Rank change: re-click is slower
[From Teevan et al., 2007]
Aside: Mobile Search…
• Some references:
– M. Jones. Mobile Search Tutorial. Mobile HCI, 2009
– K. Church, B. Smyth, K. Bradley, and P. Cotter. A large scale study of European mobile search behaviour. Mobile HCI, 2008
– M. Kamvar, M. Kellar, R. Patel, and Y. Xu. Computers and iphones and mobile phones, oh my!: a logs-based comparison of search users on different devices. WWW 2009
– M. Kamvar and S. Baluja. Query suggestions for mobile search: understanding usage patterns. CHI 2008
Part 1: Summary
Theoretical models of information seeking
Web search behavior:
Levels of detail
Keeping found things found
Tutorial Overview
Motivation: how intent inference could help search
Web search intent & information seeking models
Web searcher behavior models
– Inferring result relevance: clicks
– Contextualizing intent models: personalization
Part 2: Inferring Searcher Intent
• Inferring result relevance: clicks
• Contextualizing intent inference: personalization
Implicit Feedback
– Some searches are precision-oriented (no “more like this”)
– Users are lazy or annoyed by explicit prompts (“Was this document helpful?”)
– Why require the user to do anything?
• Goal: estimate relevance from behavior
[Table: observable behaviors (e.g., click) organized by behavior category and minimum scope]
Clicks as Relevance Feedback
• Limitations:
– Hard to determine the meaning of a click: if the best result is not displayed, users will click on something
– Presentation bias
• People leave machines unattended
• Multitasking
– Sparse, inconsistent ratings
“Strawman” Click model: No Bias
• Naive baseline:
– r_d is P(Click = True | Document = d)
• Why this baseline?
– We know that r_d is part of the explanation
– Perhaps, for ranks 9 vs. 10, it is the main explanation
– It is a bad explanation at rank 1 (e.g., per eye tracking)
Attractiveness of summary ~= Relevance of result
[Craswell et al., 2008]
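A minimal sketch of the strawman estimator, assuming a log of (document, clicked) impression pairs (the log format is my assumption, not the tutorial’s):

```python
from collections import defaultdict

# Position-blind baseline: r_d = P(Click = True | Document = d),
# estimated as clicks(d) / impressions(d), ignoring rank entirely.
def estimate_r(log):
    clicks, impressions = defaultdict(int), defaultdict(int)
    for doc, clicked in log:
        impressions[doc] += 1
        clicks[doc] += int(clicked)
    return {d: clicks[d] / impressions[d] for d in impressions}

log = [("a", True), ("a", False), ("b", True), ("b", True)]
print(estimate_r(log))  # {'a': 0.5, 'b': 1.0}
```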
Realistic Click Models
• Click probability depends on:
– Relevance of a result to a query
– Visual appearance and layout
De-biasing position (first attempt)
Relative clickthrough for queries with known relevant results in position 3 (results in positions 1 and 2 are not relevant)
[Figure: relative clickthrough vs. result position (1, 2, 3, 5, 10), for PTR=1 and PTR=3]
Simple Model: Deviation from Expected
• Relevance component: deviation from “expected”:
Relevance(q, d) = observed click frequency - expected click frequency at position p
[Figure: deviation values by result position]
• CD: distributional model, extends SA+N
– A clickthrough counts as relevance evidence iff its observed frequency exceeds the expected frequency by more than ε
• Click on result 2 likely “by chance”
• 4>(1,2,3,5), but not 2>(1,3)
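A minimal sketch of the deviation computation; the expected-CTR curve below is a placeholder I made up for illustration, not the tutorial’s measured curve:

```python
# Relevance(q, d) = observed click frequency at position p
#                   minus the expected (background) frequency at p.
# Placeholder background curve; in practice it is estimated over
# all queries in the log.
EXPECTED_CTR = {1: 0.40, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}

def deviation(observed_ctr: float, position: int) -> float:
    return observed_ctr - EXPECTED_CTR[position]

# A result at position 3 clicked far more often than expected is
# inferred to be relevant despite its low rank.
print(deviation(0.30, 3))  # 0.2 -> strong positive deviation
```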
[Figure: clickthrough frequency deviation (-0.8 to 0.4) vs. result position]
Simple Model Results
Another Formulation
– Click based on relevance
• A.k.a. the OR model:
– Clicks arise from relevance OR position
– Estimate with logistic regression (see the sketch below)
[Figure: click probability vs. result position]
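A sketch of estimating the OR formulation with logistic regression over one-hot document and position indicators (scikit-learn assumed; the tiny data set is fabricated for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Each impression is (document_id, position); label = clicked or not.
# One-hot encoding both factors lets the model learn a per-document
# relevance coefficient and a per-position bias coefficient jointly.
impressions = [("a", 1), ("a", 3), ("b", 1), ("b", 3), ("a", 2), ("b", 2)]
clicked = [1, 0, 1, 1, 1, 0]

enc = OneHotEncoder()
X = enc.fit_transform(np.array(impressions, dtype=object))
model = LogisticRegression().fit(X, clicked)

# Positive document coefficients suggest relevance; declining
# position coefficients reflect position bias.
print(dict(zip(enc.get_feature_names_out(), model.coef_[0])))
```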
Linear Examination Hypothesis
• Users are less likely to look at lower ranks, and therefore less likely to click
– Clicks arise from relevance AND examination
– Examination depends only on rank, not on what else is in the list
[Figure: examination probability vs. rank i]
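In symbols (my notation, following the standard statement of the hypothesis), click probability factorizes as:

```latex
% Examination hypothesis: a click requires examining the position
% AND finding the document relevant/attractive.
\[
  P(C = 1 \mid d, p) \;=\; e_{p} \cdot r_{d}
\]
% e_p = probability of examining position p (decreasing in p);
% r_d = probability of clicking document d once it is examined.
```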
Cascade Model
• Users scan results from the top; at each document d:
– Click with probability r_d and stop
– Otherwise continue to the next result
[Taylor et al., 2008]
Cascade Model (2)
[Diagram: chain of click observations C1, C2, C3, C4 generated with per-document click probability r_d]
Cascade Model Example
• Out of 500 sessions:
– 0 clicks on result A at rank 1
– 100 clicks on result B at rank 2
– 100 clicks on result C at rank 3
• Cascade (with no smoothing) says:
– 0 of 500 clicked A, so r_A = 0
– 100 of 500 clicked B, so r_B = 0.2
– 100 of the remaining 400 clicked C, so r_C = 0.25
[Craswell et al., 2008]
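A minimal sketch that reproduces the worked example above under the cascade assumption (users scan top-down and stop at the first click):

```python
# Cascade estimate without smoothing:
# r_d = clicks(d) / number of sessions that reach d's rank.
def cascade_estimates(total_sessions, clicks_by_rank):
    remaining = total_sessions
    estimates = []
    for clicks in clicks_by_rank:
        estimates.append(clicks / remaining)
        remaining -= clicks  # clickers stop scanning; the rest go on
    return estimates

# 500 sessions; results A, B, C at ranks 1-3 with 0/100/100 clicks.
print(cascade_estimates(500, [0, 100, 100]))
# [0.0, 0.2, 0.25] -> r_A = 0, r_B = 0.2, r_C = 0.25
```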
Cascade Model Seems Closest to Reality
Best possible: Given the true click counts for ordering BA
[Craswell et al., 2008]
Dynamic Bayesian Network Click Model
[Chapelle & Zhang, A Dynamic Bayesian Network Click Model for Web Search Ranking, WWW 2009]
• Latent variables per result: did the user examine the URL? Was the user satisfied by the landing page?
Dynamic Bayesian Network (cont’d)
Dynamic Bayesian Network (results)
• Use the EM algorithm (similar to forward-backward) to learn the model parameters; the perseverance parameter γ is set manually
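In the paper’s notation (a_u = attractiveness of result u, s_u = satisfaction given a click, γ = perseverance; E_i, C_i, S_i = examination, click, and satisfaction events at rank i), the model is roughly:

```latex
% Dynamic Bayesian Network click model (Chapelle & Zhang, WWW 2009).
\begin{align*}
  P(C_i = 1 \mid E_i = 1) &= a_{u}, & P(C_i = 1 \mid E_i = 0) &= 0,\\
  P(S_i = 1 \mid C_i = 1) &= s_{u}, & P(E_{i+1} = 1 \mid S_i = 1) &= 0,\\
  P(E_{i+1} = 1 \mid E_i = 1, S_i = 0) &= \gamma.
\end{align*}
% Estimated relevance of u is a_u * s_u; a_u and s_u are learned
% with EM, while gamma is set by hand.
```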
Clicks: Summary So Far
• Bayes Net model: extension of the Cascade model, shown to work well in practice
– Limitations?
• Questions?
Capturing a Click in its Context
• Simple model based on time deltas & query similarities
Analysing the chains
• Layered Bayesian Network (BN) model
Validation of the model
• Relevance of clicked documents
Overall process
[Piwowarski et al., 2009]
The BN gives the context of a click
Probability(search state = … | observations)
Features for one click
– (BN) Chain/Page/Action/Relevance state distribution
– Word confidence values (averaged for the query)
– Time and position related features
• This is associated with a relevance judgment from an editor and used for learning
[Piwowarski et al., 2009]
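A sketch of assembling the per-click feature vector described above; the field names are my guesses at the described feature groups, not the paper’s exact schema:

```python
from dataclasses import dataclass

# One training example = features for one click + an editorial label.
# Field names are illustrative, not the paper's schema.
@dataclass
class ClickFeatures:
    state_distribution: list    # P(chain/page/action/relevance state | observations)
    avg_word_confidence: float  # word confidence averaged over the query
    seconds_since_query: float  # time-related feature
    result_position: int        # position-related feature

def to_vector(f: ClickFeatures) -> list:
    return f.state_distribution + [
        f.avg_word_confidence, f.seconds_since_query, float(f.result_position)
    ]

x = ClickFeatures([0.1, 0.7, 0.2], 0.85, 12.4, 3)
print(to_vector(x))  # paired with an editor's relevance judgment for training
```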
Learning with Gradient Boosted Trees
• Use gradient boosted trees (Friedman, 2001) with a tree depth of 4 (8 for the non-BN-based model)
• Used disjoint train (BN + GBT training) and test sets
– Two sets of sessions S1 and S2 (20 million chains) and two sets of queries + relevance judgments J1 and J2 (about 1000 queries with behavior data)
– Process (repeated 4 times):
• Learn the BN parameters on S1 + J1
• Extract the BN features and learn the GBT with S1 + J1
• Extract the BN features and predict relevance assessments of J2 with sessions of S2
[Piwowarski et al., 2009]
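A minimal sketch of the training setup, with scikit-learn’s gradient boosted trees standing in for Friedman’s GBT (the arrays are placeholders; the real features come from the BN and the labels from editorial judgments):

```python
from sklearn.ensemble import GradientBoostingClassifier

# (X1, y1): BN features + judgments from S1 + J1 (training);
# (X2, y2): held-out features + judgments from S2 + J2 (testing).
X1 = [[0.1, 0.7, 12.4], [0.9, 0.2, 3.0], [0.4, 0.6, 8.1], [0.8, 0.1, 2.2]]
y1 = [0, 1, 0, 1]
X2, y2 = [[0.5, 0.4, 6.0]], [1]

# Tree depth 4 for the BN-based model (8 for the non-BN model).
gbt = GradientBoostingClassifier(max_depth=4).fit(X1, y1)
print(gbt.predict(X2), y2)  # predicted vs. actual relevance label
```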
Results: Predicting Relevance of Clicked Docs [Piwowarski et al., 2009]
• Effect of Caption Features on Clickthrough Inversions [C. Clarke, E. Agichtein, S. Dumais, R. White, SIGIR 2007]
Clickthrough Inversions [Clarke et al., 2007]
Relevance is Not the Dominant Factor! [Clarke et al., 2007]
Feature Importance [Clarke et al., 2007]
Important Words in Snippet [Clarke et al., SIGIR 2007]
Extension: Use Fair Pairs Randomization
Viewing Organic Results vs. Ads
• Ads and Organic results
Part 2: Inferring Searcher Intent
Inferring result relevance: clicks
Richer Behavior Models
– Query + Browsing [Agichtein et al., SIGIR 2006]
– Active Prediction: [Yun et al., WWW 2010]
Curious Browser [Fox et al., 2003]
Data Analysis
• Implicit measures:
– Result-Level: Diff Secs; Duration Secs; Page, Page Position, Absolute Position; Visits
– Session-Level: averages of result-level measures (dwell time and position); Results visited; End action
Data Analysis, cont’d [Fox et al., 2003]
Result-Level Findings
1. Clickthrough and dwell time are the strongest predictors of SAT
2. Printing and adding to favorites are predictive of SAT when present
3. Combined measures predict SAT better than clickthrough alone
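Finding 3 above motivates learning a combined predictor over implicit measures; a hedged sketch with a decision tree as a stand-in model (the features and data are fabricated for illustration, and this is not the classifier used in the study):

```python
from sklearn.tree import DecisionTreeClassifier

# Combine result-level implicit measures to predict satisfaction (SAT).
# Columns: dwell seconds, clicked (0/1), scroll count -- illustrative.
X = [[45.0, 1, 2], [3.0, 1, 0], [120.0, 1, 5], [1.0, 0, 0]]
y = [1, 0, 1, 0]  # SAT labels, e.g., from exit questionnaires

model = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(model.predict([[60.0, 1, 3]]))  # predicted SAT for a new interaction
```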