Improving Web Search Ranking by Incorporating User Behavior Information
Eugene Agichtein, Eric Brill, Susan Dumais
Microsoft Research

Web Search Ranking
Rank pages relevant for a query
– Content match, e.g., page terms, anchor text, term weights
– Prior document quality, e.g., web topology, spam features
– Hundreds of parameters
Tune ranking functions on explicit document relevance ratings

Query: SIGIR 2006
Users can help indicate most relevant results

Web Search Ranking: Revisited
Incorporate user behavior information
– Millions of users submit queries daily
– Rich user interaction features (earlier talk)
– Complementary to content and web topology
Some challenges:
– User behavior “in the wild” is not reliable
– How to integrate interactions into ranking
– What is the impact over all queries

Outline
Modelling user behavior for ranking
Incorporating user behavior into ranking
Empirical evaluation
Conclusions

Related Work
Personalization
– Rerank results based on user’s clickthrough and browsing history
Collaborative filtering
– Amazon, DirectHit: rank by clickthrough
General ranking
– Joachims et al. [KDD 2002], Radlinski et al. [KDD 2005]: tuning ranking functions with clickthrough

Rich User Behavior Feature Space
Observed and distributional features
– Aggregate observed values over all user interactions for each query and result pair
– Distributional features: deviations from the “expected” behavior for the query
Represent user interactions as vectors in user behavior space
– Presentation: what a user sees before a click
– Clickthrough: frequency and timing of clicks
– Browsing: what users do after a click

Some User Interaction Features
Presentation
– ResultPosition: position of the URL in current ranking
– QueryTitleOverlap: fraction of query terms in result title
Clickthrough
– DeliberationTime: seconds between query and first click
– ClickFrequency: fraction of all clicks landing on page
– ClickDeviation: deviation from expected click frequency
Browsing
– DwellTime: result page dwell time
– DwellTimeDeviation: deviation from expected dwell time for query
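
To make the distinction between an observed feature and a distributional feature concrete, here is a minimal sketch that derives ClickFrequency and ClickDeviation from a click log. The log layout, the `expected_by_position` table, and the function name are assumptions made for illustration; the slides do not specify how the expected click frequency is estimated.

```python
from collections import Counter, defaultdict


def click_features(clicks, positions, expected_by_position):
    """Compute ClickFrequency and ClickDeviation per (query, url) pair.

    clicks: iterable of (query, url) click events (hypothetical log format).
    positions: dict mapping (query, url) to the result position shown.
    expected_by_position: dict mapping position to an assumed background
        click frequency for that position.
    """
    per_query = defaultdict(Counter)
    for query, url in clicks:
        per_query[query][url] += 1

    features = {}
    for query, counts in per_query.items():
        total = sum(counts.values())
        for url, n in counts.items():
            # Observed feature: fraction of the query's clicks landing on url.
            freq = n / total
            # Distributional feature: deviation from the expected frequency.
            expected = expected_by_position.get(positions[(query, url)], 0.0)
            features[(query, url)] = {
                "ClickFrequency": freq,
                "ClickDeviation": freq - expected,
            }
    return features
```

A result at position 2 that attracts 75% of the clicks, where only 25% would be expected, gets a ClickDeviation of 0.5, which is the kind of signal the distributional features are meant to expose.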

Training a User Behavior Model
Map user behavior features to relevance judgements
RankNet: Burges et al. [ICML 2005]
– Scalable neural net implementation
– Input: user behavior + relevance labels
– Output: weights for behavior feature values
– Used as testbed for all experiments

Training RankNet [Burges et al. 2005]
For query results 1 and 2, present pair of vectors and labels, label(1) > label(2)
– Feature Vector1, Label1: NN output 1
– Feature Vector2, Label2: NN output 2
– Error is a function of both outputs (desire output1 > output2)
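
The slides only say that the error depends on both network outputs. As a sketch, assuming the pairwise cross-entropy cost from the RankNet paper with a target probability of 1 that result 1 ranks above result 2, the error for one training pair could be computed as follows. The function name is illustrative.

```python
import math


def pairwise_loss(output1, output2):
    """Cross-entropy cost for a pair where result 1 should outrank result 2.

    Equals log(1 + exp(-(output1 - output2))), evaluated in a way that
    avoids overflow for large score differences.
    """
    diff = output1 - output2
    if diff > 0:
        return math.log1p(math.exp(-diff))
    # Algebraically identical form for diff <= 0.
    return -diff + math.log1p(math.exp(diff))
```

The cost shrinks toward 0 as output1 grows above output2 and grows roughly linearly when the pair is ordered the wrong way, so minimizing it pushes the network to score the preferred result higher.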

Predicting with RankNet
Present individual vector and get score
– Feature Vector1: NN output

Outline
Modelling user behavior
Incorporating user behavior into ranking
Empirical evaluation
Conclusions

User Behavior Models for Ranking
Use interactions from previous instances of query
– General-purpose (not personalized)
– Only available for queries with past user interactions
Models:
– Rerank, clickthrough only: reorder results by number of clicks
– Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences
– Integrate directly into ranker: incorporate user interactions as features for the ranker

Rerank, Clickthrough Only
Promote all clicked results to the top of the result list
– Re-order by click frequency
Retain relative ranking of un-clicked results
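
The clickthrough-only reranking described on this slide can be sketched in a few lines. The function name and the `click_counts` dictionary are assumptions for illustration, and breaking ties among equally clicked results by their original order is a choice made here, not something the slide states.

```python
def rerank_by_clicks(results, click_counts):
    """Promote clicked results to the top, ordered by click frequency.

    results: list of result ids in the original ranking order.
    click_counts: dict mapping result id to its number of clicks.
    """
    clicked = [r for r in results if click_counts.get(r, 0) > 0]
    unclicked = [r for r in results if click_counts.get(r, 0) == 0]
    # list.sort is stable, so equally clicked results keep their original order.
    clicked.sort(key=lambda r: -click_counts[r])
    # Un-clicked results follow in their original relative order.
    return clicked + unclicked
```

For example, with original order a, b, c, d and clicks only on c (5) and b (2), the reranked list is c, b, a, d.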

Rerank, Preference Predictions
Re-order results by function of preference prediction score
Experimented with different variants
– Using inverse of ranks
– Intuition: scores not comparable, so merge ranks
$$Score(I_d, O_d) = w_I \cdot \frac{1}{I_d + 1} + \frac{1}{O_d + 1}$$
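
A minimal sketch of the rank-merging score on this slide, which adds the inverse of the original rank to a weighted inverse of the rank from the behavior model. It assumes that O_d is the result's rank in the original ordering, I_d is its rank under the predicted preferences, w_I is a tunable weight, ranks start at 1, and results without behavior data fall back to the original-rank term alone. The function names are illustrative.

```python
def merged_score(original_rank, implicit_rank=None, w_i=1.0):
    """Score = w_i / (I_d + 1) + 1 / (O_d + 1), using ranks rather than raw scores."""
    score = 1.0 / (original_rank + 1)
    if implicit_rank is not None:
        # Only results with behavior-based predictions get the extra term.
        score += w_i / (implicit_rank + 1)
    return score


def rerank_by_merged_score(original_order, implicit_order, w_i=1.0):
    """Reorder results by merged score, highest first.

    original_order: result ids as ranked by the original ranker.
    implicit_order: result ids as ranked by the behavior model (may be a subset).
    """
    implicit_rank = {doc: r for r, doc in enumerate(implicit_order, start=1)}
    scored = [
        (merged_score(o, implicit_rank.get(doc), w_i), doc)
        for o, doc in enumerate(original_order, start=1)
    ]
    # Stable sort keeps the original order among results with equal scores.
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored]
```

Working on ranks sidesteps the problem named on the slide: the two rankers produce scores on different scales, while their rank positions are directly comparable.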

Integrate User Behavior Features Directly into Ranker
For a given query
– Merge original feature set with user behavior features when available
– User behavior features computed from previous interactions with same query
Train RankNet on enhanced feature set
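
One way to picture the merge step is as building a single feature dictionary per query and result pair. The sketch below is an illustration only: the function name, the feature names in the usage note, and the choice of 0.0 as the fill value when no behavior data exists are all assumptions, since the slides do not say how missing behavior features are represented.

```python
def merge_features(content_features, behavior_features, behavior_names, missing=0.0):
    """Return the enhanced feature set for one (query, result) pair.

    content_features: dict of the ranker's original features.
    behavior_features: dict of user behavior features, or None if the query
        has no prior interactions.
    behavior_names: names of all behavior features the ranker expects.
    missing: assumed placeholder value when a behavior feature is unavailable.
    """
    merged = dict(content_features)
    available = behavior_features or {}
    for name in behavior_names:
        merged[name] = available.get(name, missing)
    return merged
```

Calling `merge_features({"BM25F": 1.2}, None, ["ClickFrequency"])` yields a vector with ClickFrequency set to the placeholder, so every training example has the same set of features whether or not behavior data exists.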

Outline
Modelling user behavior
Incorporating user behavior into ranking
Empirical evaluation
Conclusions

Evaluation Metrics
Precision at K: fraction of relevant in top K
NDCG at K: normalized discounted cumulative gain
– Top-ranked results most important
$$N_q = M_q \sum_{j=1}^{K} \frac{2^{r(j)} - 1}{\log(1 + j)}$$
MAP: mean average precision
– Average precision for each query: mean of the precision at K values computed after each relevant document was retrieved
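
The three metrics can be sketched as below. This assumes r(j) is a non-negative integer relevance rating for the result at position j and that the normalizer M_q is chosen so that the ideal ordering of the same rated results scores 1; both are assumptions about details the slide leaves implicit, and the function names are illustrative.

```python
import math


def precision_at_k(relevant, k):
    """Fraction of the top k results that are relevant.

    relevant: list of booleans in rank order.
    """
    return sum(relevant[:k]) / k


def dcg_at_k(ratings, k):
    """Sum of (2^r(j) - 1) / log(1 + j) over the top k positions (j starts at 1)."""
    return sum(
        (2 ** r - 1) / math.log(1 + j)
        for j, r in enumerate(ratings[:k], start=1)
    )


def ndcg_at_k(ratings, k):
    """DCG divided by the DCG of the ideal ordering of the same ratings."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0


def average_precision(relevant):
    """Mean of the precision values measured at each relevant result."""
    hits = 0
    total = 0.0
    for j, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / j
    return total / hits if hits else 0.0
```

MAP is then the mean of `average_precision` over all test queries. The logarithmic discount is what makes top-ranked results count most in NDCG.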

Datasets
8 weeks of user behavior data from anonymized opt-in client instrumentation
Millions of unique queries and interaction traces
Random sample of 3,000 queries
– Gathered independently of user behavior
– 1,500 train, 500 validation, 1,000 test
Explicit relevance assessments for top 10 results for each query in sample

Methods Compared
Content only: BM25F
Full Search Engine: RN
– Hundreds of parameters for content match and document quality
– Tuned with RankNet
Incorporating User Behavior
– Clickthrough: Rerank-CT
– Full user behavior model predictions: Rerank-All
– Integrate all user behavior features directly: +All

Content, User Behavior: Precision at K, queries with interactions
BM25 < Rerank-CT < Rerank-All < +All
[Figure: Precision vs. K (K = 1, 3, 5, 10) for BM25, Rerank-CT, Rerank-All, and BM25+All; y-axis from 0.38 to 0.63]
Content, User Behavior: NDCGContent, User Behavior: NDCG
BM25 < Rerank-CT < Rerank-All < +All
0.5
0.52
0.54
0.56
0.58
0.6
0.62
0.64
0.66
0.68
1 2 3 4 5 6 7 8 9 10K
NDCG
BM25Rerank-CTRerank-AllBM25+All

Full Search Engine, User Behavior: NDCG, MAP
Method      MAP     Gain
RN          0.270
RN+ALL      0.321   0.052 (19.13%)
BM25        0.236
BM25+ALL    0.292   0.056 (23.71%)
[Figure: NDCG vs. K (K = 1 to 10) for RN, Rerank-All, and RN+All; y-axis from 0.56 to 0.74]

Impact: All Queries, Precision at K
Fewer than 50% of test queries have prior interactions
+0.06 to +0.12 precision over all test queries
[Figure: Precision vs. K (K = 1, 3, 5, 10) for RN, Rerank-All, and RN+All; y-axis from 0.4 to 0.7]

Impact: All Queries, NDCG
+0.03 to +0.05 NDCG over all test queries
[Figure: NDCG vs. K (K = 1 to 10) for RN, Rerank-All, and RN+All; y-axis from 0.56 to 0.7]

Which Queries Benefit Most
[Figure: Frequency (axis from 0 to 350) and Average Gain (axis from -0.4 to 0.2) plotted over x-axis values 0.1 to 0.6]
Most gains are for queries with poor ranking

Conclusions
Incorporating user behavior into web search ranking dramatically improves relevance
Providing rich user interaction features to ranker is the most effective strategy
Large improvement shown for up to 50% of test queries

Thank you
Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/
Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/

Content, User Behavior: All Queries, Precision at K
BM25 < Rerank-CT < Rerank-All < All
[Figure: Precision vs. K (K = 1, 3, 5, 10) for BM25, Rerank-CT, Rerank-All, and All; y-axis from 0.35 to 0.65]

Content, User Behavior: All Queries, NDCG
BM25 << Rerank-CT << Rerank-All < All
[Figure: NDCG vs. K (K = 1 to 10) for BM25, Rerank-CT, Rerank-All, and All; y-axis from 0.5 to 0.68]

Results Summary
Incorporating user behavior into web search ranking dramatically improves relevance
Incorporating user behavior features into ranking directly is the most effective strategy
Impact on relevance is substantial
Poorly performing queries benefit most