Learn to Rank search results
Learning-to-Rank Search Results
Ganesh Venkataraman
http://www.linkedin.com/in/npcomplete
@gvenkataraman
Audience Classification
[Bar chart: share of the audience with a Search background, an ML background, and a Search + ML background]
Outline
Search overview
Why Learning to Rank (LTR)?
Biases when collecting training data from click logs
– Sampling bias
– Presentation bias
Three basic approaches
– Pointwise
– Pairwise
– Listwise
Key takeaways/summary
tl;dr
Ranking interacts heavily with retrieval and query understanding
Ground truth > features > model*
Listwise > pairwise > pointwise
* Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
Primer on Search
[Diagram: bird’s-eye view of how a search engine works. The user has an information need, issues a query, and selects from the results; the system ranks documents using an IR model]
Pre-Retrieval / Retrieval / Post-Retrieval
Pre-retrieval
– Process the input query: rewrite it, check spelling, etc.
– Hit (potentially several) search nodes with the appropriate query
Retrieval
– Given a query, retrieve all documents matching it, along with a score
Post-retrieval
– Merge-sort results from the different search nodes (see the sketch after this list)
– Attach any information the front end needs to the search results
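The merge step in post-retrieval is essentially a k-way merge of per-node result lists that each arrive already sorted by score. A minimal sketch in Python; the node names and (score, doc_id) result tuples are hypothetical:

```python
import heapq

# Hypothetical per-node results: (score, doc_id) tuples, each list
# already sorted by descending score, as a search node would return.
node_a = [(0.9, "doc12"), (0.4, "doc7")]
node_b = [(0.8, "doc3")]
node_c = [(0.7, "doc42"), (0.1, "doc5")]

# Lazy k-way merge on descending score; only the top results need to
# be pulled when the page size is small.
merged = list(heapq.merge(node_a, node_b, node_c, key=lambda r: -r[0]))
print(merged[:3])  # [(0.9, 'doc12'), (0.8, 'doc3'), (0.7, 'doc42')]
```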
Claim #1: Search is about understanding the query and user intent
Understanding intent
[Diagram: an example query segmented into TITLE, CO, and GEO tags]
– TITLE-237: software engineer, software developer, programmer, …
– CO-1441: Google Inc. (Industry: Internet)
– GEO-7583: Country: US (Lat: 42.3482 N, Long: 75.1890 W)
(Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
Fixing user errors
– Fix typos
– Help users spell names
Claim #2: Search is about understanding systems
The Search Index
Inverted index: mapping from (search) terms to the list of documents they appear in
Forward index: mapping from documents to metadata about them
Posting List
D0 = “it is what it is”
D1 = “what is it”
D2 = “it is a banana”

Term   | Posting list (DocId, Frequency)
-------|--------------------------------
a      | (2, 1)
banana | (2, 1)
is     | (0, 2) (1, 1) (2, 1)
it     | (0, 2) (1, 1) (2, 1)
what   | (0, 1) (1, 1)
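A minimal sketch of how the index above could be built in Python (whitespace tokenization; term positions omitted):

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Build term -> [(doc_id, frequency), ...] posting lists,
    sorted by doc_id (documents are scanned in id order)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, freq in Counter(text.split()).items():
            index[term].append((doc_id, freq))
    return index

docs = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(docs)
print(index["is"])    # [(0, 2), (1, 1), (2, 1)]
print(index["what"])  # [(0, 1), (1, 1)]
```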
Candidate selection for “abraham lincoln”
Posting lists
– “abraham” => {5, 7, 8, 23, 47, 101}
– “lincoln” => {7, 23, 101, 151}
Query = “abraham AND lincoln”
– Retrieved set => {7, 23, 101} (see the intersection sketch below)
Some systems-level issues
– How to represent posting lists efficiently?
– How does one traverse a very long posting list (for words like “the”, “an”, etc.)?
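The AND query is an intersection of sorted posting lists. A two-pointer sketch using the numbers above (real systems speed up long lists, e.g. for “the”, with skip pointers or galloping search):

```python
def intersect(p1, p2):
    """Two-pointer intersection of posting lists of sorted doc IDs."""
    out, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i, j = i + 1, j + 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

abraham = [5, 7, 8, 23, 47, 101]
lincoln = [7, 23, 101, 151]
print(intersect(abraham, lincoln))  # [7, 23, 101]
```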
Claim #3: Search is a ranking problem
What is search ranking?
Ranking
– Find an ordered list of documents according to the relevance between the documents and the query
Traditional search
– f(query, document) => score
Social-network context
– f(query, document, user) => score
– Find an ordered list of documents according to the relevance between documents, query, and user
Why LTR?
Manual models become hard to tune with a very large number of features and non-convex interactions
LTR leverages a large volume of click-through data in an automated way
Crowdsourcing personalized ranking judgments poses unique challenges
Key issues
– How do we collect training data?
– How do we avoid biases?
– How do we train the model?
[Training pipeline: documents for training are turned into features; human evaluation provides labels; features and labels feed the machine learning model]
Training options – crowdsourced judgments
– (query, user, document) -> label
– Labels in {1, 2, 3, 4, 5}; a higher label => better
– Issues: a personalized world makes judgments user-specific; difficult to scale
Mining the click stream
– Approach: clicked = relevant, not clicked = not relevant
– But users scan results top-down, so lower-ranked results may be unfairly penalized
Position Bias
“Accurately interpreting clickthrough data as implicit feedback” – Joachims et al., ACM SIGIR, 2005
– Experiment #1: present users with the normal Google search results; 55.56% of users clicked the first result, 5.56% clicked the second
– Experiment #2: the same result page, but with the 1st and 2nd results flipped; 57.14% clicked the first result, 7.14% clicked the second
Fair Pairs [Radlinski and Joachims, AAAI ’06]
– Randomize the order within adjacent result pairs; clicked = relevant, skipped = not relevant
– Great at dealing with position bias
– Does not invert models
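A simplified sketch of the Fair Pairs randomization, assuming the basic variant: group adjacent results into pairs and independently swap each pair with probability 0.5, so that a click on the lower-shown member of a pair gives position-debiased “clicked > skipped” evidence. (The paper also randomizes the pairing offset, which is omitted here.)

```python
import random

def fair_pairs(ranking, rng=random):
    """Pair up adjacent results and swap each pair with prob. 0.5.
    Clicks can later be interpreted within their pair."""
    presented = []
    for i in range(0, len(ranking) - 1, 2):
        pair = [ranking[i], ranking[i + 1]]
        if rng.random() < 0.5:       # flip this pair
            pair.reverse()
        presented.extend(pair)
    if len(ranking) % 2:             # odd trailing element keeps its slot
        presented.append(ranking[-1])
    return presented

print(fair_pairs(["d1", "d2", "d3", "d4", "d5"]))
# e.g. ['d2', 'd1', 'd3', 'd4', 'd5']
```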
Issue #2 – Sampling Bias
– Users click or skip only what is shown
– What about low-scoring results from the existing model?
– Add low-scoring results as “easy negatives” so the model learns about bad results that were never presented to the user
[Illustration: results on pages 2 through n, which the user never sees, receive label 0]
Avoiding Sampling Bias – Easy negatives
Invasive way
– For a small sample of users, add bad results to the SERP to verify that they were indeed bad
– Not really recommended, since it hurts the user experience
Non-invasive way
– Assume we have a decent model
– Take tail results and add them to the training set as “easy negatives” (see the sketch below)
– A similar approach works for “easy positives”, depending on the application
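A sketch of the non-invasive recipe; the names and cutoff are illustrative: keep the impressed results with their click-derived labels and append a few never-shown, tail-ranked documents as label-0 easy negatives.

```python
def add_easy_negatives(impressed, tail_docs, n_easy=2):
    """impressed: (doc_id, label) pairs from the click logs for one
    query; tail_docs: doc IDs the current model ranks far below
    anything the user was shown. Tail docs enter training with
    label 0 so the model also sees unambiguously bad results."""
    return impressed + [(doc, 0) for doc in tail_docs[:n_easy]]

impressed = [("d1", 5), ("d2", 2), ("d3", 0)]
tail_docs = ["d950", "d951", "d952"]   # e.g. from page n of the ranking
print(add_easy_negatives(impressed, tail_docs))
# [('d1', 5), ('d2', 2), ('d3', 0), ('d950', 0), ('d951', 0)]
```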
How to collect training data?
Implicit relevance judgments from click logs, including clicked and unclicked results from the SERP (avoids position bias)
Add easy negatives (avoids sampling bias)
Mining the click stream
[Illustration: graded relevance labels assigned to results from the click stream (Label = 5: most relevant, Label = 2, Label = 0: least relevant), feeding the model]
Learning to Rank
Pointwise: reduce ranking to binary classification
[Illustration: documents for queries Q1, Q2, and Q3, each marked + (relevant) or - (not relevant), all pooled into a single binary classification problem]
Limitations
– Assumes relevance is absolute
– Relevant documents associated with different queries are put into the same class
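A minimal pointwise sketch: (query, document) examples from all queries are pooled into one binary classification problem, and the classifier’s predicted probability becomes the ranking score. The features and labels below are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per (query, document) pair, pooled across Q1, Q2, Q3;
# hypothetical features, e.g. [bm25_score, title_match].
X = np.array([[2.1, 1.0], [0.3, 0.0], [1.7, 1.0],
              [0.2, 0.0], [1.1, 0.0], [2.5, 1.0]])
y = np.array([1, 0, 1, 0, 0, 1])   # + / - relevance labels

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]
print(np.argsort(-scores))  # documents sorted by descending score
```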
Learning to Rank
Pairwise: reduce ranking to classification of document pairs w.r.t. the same query
– {(Q1, A>B), (Q2, C>D), (Q3, E>F)}
– No longer assumes absolute relevance
– Limitation: does not differentiate inversions at the top vs. the bottom of the ranking
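A pairwise sketch with a linear scorer and a hinge loss on preference pairs, trained by subgradient descent. This is the flavor of loss behind ranking SVMs, not a production trainer, and the data is illustrative:

```python
import numpy as np

def train_pairwise(pairs, dim, lr=0.1, epochs=100):
    """pairs: (x_preferred, x_other) feature vectors for documents
    of the same query. Minimizes the hinge loss
    max(0, 1 - w.(x_preferred - x_other)) over all pairs."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            diff = x_pref - x_other
            if 1.0 - w @ diff > 0.0:   # margin violated
                w += lr * diff          # subgradient step
    return w

# {(Q1, A>B), (Q2, C>D)} style training pairs.
pairs = [(np.array([2.0, 1.0]), np.array([0.5, 0.0])),
         (np.array([1.5, 1.0]), np.array([1.0, 1.0]))]
w = train_pairwise(pairs, dim=2)
print(w @ np.array([2.0, 1.0]) > w @ np.array([0.5, 0.0]))  # True
```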
Listwise approach – DCG
Objective
– Come up with a function that converts an entire ranked result set, each result carrying a relevance label, into a single score
Characteristics of such a function
– Higher relevance in the ranked set => higher score
– Higher relevance at higher positions => higher score
With p documents in the results, where document i has relevance rel_i:
DCG_p = sum_{i=1..p} (2^rel_i - 1) / log2(i + 1)
DCG
Rank | Discounted gain
1    | 3
2    | 4.4
3    | 0.5
DCG = 7.9
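A sketch of the computation; assuming the graded-gain form above, relevance labels [2, 3, 1] reproduce the table (the labels themselves are not shown on the slide):

```python
import math

def dcg(rels):
    """DCG = sum over 1-based positions i of (2**rel_i - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels, start=1))

print(dcg([2, 3, 1]))  # 3.0 + 4.416... + 0.5 = 7.9 (as in the table)
```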
NDCG-based optimization
– NDCG@k = normalized DCG@k; ensures the value is between 0.0 and 1.0
– Since NDCG directly represents the “value” of a particular ranking given the relevance labels, one can formulate ranking directly as maximizing NDCG@k (say, k = 5)
– Directly pluggable into a variety of algorithms, including coordinate ascent
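A sketch of NDCG@k, reusing dcg() from above: divide by the ideal DCG@k (the same labels sorted best-first), which pins the value to [0.0, 1.0]:

```python
def ndcg_at_k(rels, k):
    """NDCG@k = DCG@k / IDCG@k; 1.0 means the ranking is ideal."""
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([2, 3, 1], k=3))  # < 1.0: the best document is not first
print(ndcg_at_k([3, 2, 1], k=3))  # 1.0: ideal ordering
```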
Learning to Rank: summary

Approach  | Pros                                                                | Cons
----------|---------------------------------------------------------------------|-----
Pointwise | Simple to understand and debug; straightforward to use              | Query independent; assumes relevance is absolute
Pairwise  | Assumes relevance is relative; depends on the query                 | Loss function is agnostic to position
Listwise  | Operates directly on ranked lists; loss function aware of position  | More complicated; non-convex loss functions; higher training time
Search Ranking
[Pipeline: click logs -> training data -> model -> offline evaluation -> online A/B testing and debugging]
score = f(query, user, document)
tl;dr revisited
Ranking interacts heavily with retrieval and query understanding
– Query understanding affects intent detection, fixing user errors, etc.
– Retrieval affects candidate selection, speed, etc.
Ground truth > features > model*
– Truth data is affected by biases
Listwise > pairwise > pointwise
– Listwise, while more complicated, avoids some model-level issues of the pairwise and pointwise methods
* Airbnb engineering blog: http://nerds.airbnb.com/architecting-machine-learning-system-risk/
Useful references
“From RankNet to LambdaRank to LambdaMART: An Overview” – Christopher Burges
“Learning to Rank for Information Retrieval” – Tie-Yan Liu
RankLib – implementations of several LTR approaches
LinkedIn search is powered by …
We are hiring!
careers.linkedin.com