Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft...
![Page 1: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/1.jpg)
Learning to Re-rank: Query-Dependent Image
Re-Ranking using Click Data
Manik Varma, Microsoft Research India
Vidit Jain, Yahoo! Labs India
![Page 2: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/2.jpg)
Keyword-based image search
Best actress, Academy awards 2011
• “Natalie Portman”
Highest h-index in computer science
• “Scott Shenker”, Professor @ UC Berkeley
![Page 3: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/3.jpg)
“Scott Shenker”
similar results from other search engines
![Page 4: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/4.jpg)
Our holy grail
![Page 5: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/5.jpg)
• Huge improvements in performance for tail queries
• Only the raw click data
• Efficient use of visual content
In this talk
[Figure: average gain in mean nDCG@20 (%) versus # of clicked images per query, with bins < 20 (16), 20-40 (23), 40-60 (15), 60-100 (21), > 100 (118); y-axis 0 to 14]
[Figure: average gain in mean nDCG@20 (%); y-axis 0 to 8]
![Page 6: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/6.jpg)
Limitation 1: Ignore visual content
Simple visual processing could improve results
![Page 7: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/7.jpg)
• $\mathrm{Score}(x) = w^\top x$ with a fixed $w$
• Query : “taj mahal”
• Query : “delhi”
Limitation 2: Static ranker within a vertical
delhi.jpg
tajmahal.jpg
![Page 8: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/8.jpg)
Limitation 3: Noisy training labels
Query : “night train”
![Page 9: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/9.jpg)
Overview of our solution
query Ranker
Original ranked list
Re-ranked list
f*visual
f*text
Pseudo-click estimation
clicked
![Page 10: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/10.jpg)
• Existing work for text document search
  • linear scanning of ranked list
  • relationship between relevance and clicks
  • [Joachims07, Radlinski08, Agrawal09, Cutrell07, …]
• Our contribution for image search
  • (challenge) 2D display of results
  • (challenge) no model for browsing/click behavior
  • use only raw click data
Novelty in our use of click data
![Page 11: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/11.jpg)
Evidence for clicks-relevance relationship
[Figure: bar chart of clicked items that are relevant (%), comparing document search (short snippet) with image search (thumbnail); y-axis 0 to 90; cited source: [Agichtein06]]
![Page 12: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/12.jpg)
• Rich gets richer
• The curious case of “distracting” images
• Little gain when only a few were clicked
Naïve solution – ClickBoosting
[Figure panels: Click-boosting ranked list; Original ranked list]
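The slide names ClickBoosting but does not spell out the procedure. A minimal sketch, assuming it means promoting clicked images to the top of the list in decreasing order of click count while keeping everything else in its original order. The function name `click_boost` and the toy image ids are hypothetical.

```python
from typing import Dict, List


def click_boost(ranked_ids: List[str], clicks: Dict[str, int]) -> List[str]:
    """Move clicked images to the top, ordered by click count.

    Assumption (not stated on the slide): ties between clicked images and
    the order of unclicked images both follow the original ranking.
    """
    original_pos = {img: i for i, img in enumerate(ranked_ids)}
    clicked = [img for img in ranked_ids if clicks.get(img, 0) > 0]
    unclicked = [img for img in ranked_ids if clicks.get(img, 0) == 0]
    # Most-clicked first; break ties by original position.
    clicked.sort(key=lambda img: (-clicks[img], original_pos[img]))
    return clicked + unclicked


ranked = ["a", "b", "c", "d", "e"]       # original ranked list (toy ids)
clicks = {"c": 50, "e": 5, "a": 5}       # raw click counts (toy values)
boosted = click_boost(ranked, clicks)    # ["c", "a", "e", "b", "d"]
```

Under this reading the "rich gets richer" bullet follows directly: images that already attract clicks are shown higher and collect still more clicks, and a query with only a few clicked images changes very little.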
![Page 13: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/13.jpg)
35
50 20 5 2
Pseudo-click estimation: Regression
Query : “night train”
(#clicks)
![Page 14: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/14.jpg)
23
50 20 5 2
Visual features are not enough
Need both visual and text features
“night rod”
Query : “night train”
(#clicks)
![Page 15: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/15.jpg)
Re-ranking function
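[Diagram: query; Compute text features; Compute visual features (color, texture, shape); labels “query dependent” and “query independent”; Regression producing $y_{\text{text}}$ and $y_{\text{visual}}$; original score $s_O$ and re-ranked score $s_R$]
Score: $s_R(x) = a_1 s_O(x) + a_2 y_{\text{text}}(x) + a_3 y_{\text{visual}}(x)$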
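The score on this slide is a weighted sum of the original ranker score and the two pseudo-click estimates. A minimal sketch of that combination follows. The slides do not say how the weights a1, a2, a3 are chosen, so the values and the toy scores here are hypothetical.

```python
import numpy as np


def rerank(s_o, y_text, y_visual, a=(1.0, 1.0, 1.0)):
    """Combine scores as s_R = a1*s_O + a2*y_text + a3*y_visual.

    Returns the re-ranked scores and the image indices sorted best first.
    The default weights are placeholders, not values from the talk.
    """
    s_o, y_text, y_visual = (
        np.asarray(v, dtype=float) for v in (s_o, y_text, y_visual)
    )
    s_r = a[0] * s_o + a[1] * y_text + a[2] * y_visual
    order = np.argsort(-s_r, kind="stable")  # descending score
    return s_r, order


# Toy example with three images.
s_r, order = rerank(
    s_o=[0.9, 0.8, 0.7],
    y_text=[0.0, 0.5, 0.1],
    y_visual=[0.0, 0.4, 0.3],
)
```

Setting a2 = a3 = 0 recovers the original ranking, which is how the baseline row of the re-scoring results is defined.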
![Page 16: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/16.jpg)
• #features (~3000) >> #clicked images (~10)
• Dimensionality reduction
  • unsupervised: only positive labeled data
  • and the winner is… PCA !!!
• Bayesian formulation of regression
  • Gaussian Process regression
  • prevents over-fitting to small no. of examples
  • non-optimized Matlab: 20 ms for a query with 20 clicked images
Regression: challenges
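To illustrate the dimensionality-reduction step, here is a minimal PCA sketch for the regime on this slide (about 10 clicked images with about 3000 features each). The random feature matrix and the choice of 5 components are placeholders; the talk does not state how many components were kept.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA via SVD of the centered data; returns (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project features onto the retained principal directions."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
X_clicked = rng.normal(size=(10, 3000))   # placeholder features of clicked images
mean, comps = pca_fit(X_clicked, n_components=5)
Z = pca_transform(X_clicked, mean, comps)  # shape (10, 5)
```

With n centered samples, at most n - 1 directions carry any variance, so the reduced dimension is bounded by the number of clicked images rather than by the 3000 raw features.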
![Page 17: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/17.jpg)
• Bing: top 1000 retrieved images for 19 tail queries
  • e.g., “gnats”, “Bern”, “fracture”, “child drinking water”
• Relevance
  • highly relevant, relevant, non-relevant
  • referred to parent documents when needed
  • used only for evaluation; not training
• Evaluation
  • nDCG @ 20 (normalized discounted cumulative gain)
Development set for experiments
![Page 18: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/18.jpg)
Baseline: mean nDCG@20 for Bing = 0.6854
SVR and 1-NN performance in between
Pseudo-click estimation: Regression
| Approach | Mean nDCG@20 | Relative improvement |
|---|---|---|
| Linear Regression | 0.6871 | +0.2% |
| GP Regression | 0.7692 | +12.2% |
![Page 19: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/19.jpg)
$s_R(x) = a_1 s_O(x) + a_2 y_{\text{text}}(x) + a_3 y_{\text{visual}}(x)$
Re-scoring Function
| Approach | Mean nDCG@20 | Relative improvement |
|---|---|---|
| Baseline (a2 = a3 = 0) | 0.6854 | – |
| Baseline + yText (a3 = 0) | 0.7077 | +3.3% |
| Baseline + yVisual (a2 = 0) | 0.6136 | –10.5% |
| Baseline + yText + yVisual | 0.7692 | +12.2% |
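The relative improvements in this table can be reproduced from the mean nDCG@20 values, assuming they are measured against the 0.6854 baseline as (score − baseline) / baseline. A short check, with a hypothetical helper name:

```python
BASELINE = 0.6854  # mean nDCG@20 of the original ranking (a2 = a3 = 0)

# Mean nDCG@20 values copied from the re-scoring table.
results = {
    "Baseline + yText": 0.7077,
    "Baseline + yVisual": 0.6136,
    "Baseline + yText + yVisual": 0.7692,
}


def relative_improvement(score: float, baseline: float = BASELINE) -> float:
    """Percentage change of `score` relative to `baseline`."""
    return 100.0 * (score - baseline) / baseline


for name, score in results.items():
    print(f"{name}: {relative_improvement(score):+.1f}%")
```

The rounded values match the table: the visual estimate alone hurts, while combining it with the text estimate gives the largest gain.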
![Page 20: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/20.jpg)
Evaluation on 193 Queries
[Figure: average gain in mean nDCG@20 (%); y-axis 0 to 8]
![Page 21: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/21.jpg)
Evaluation on 193 Queries
[Figure: average gain in mean nDCG@20 (%) versus # of clicked images per query, with bins < 20 (16), 20-40 (23), 40-60 (15), 60-100 (21), > 100 (118); y-axis 0 to 14]
![Page 22: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/22.jpg)
Query: “fracture”
Bing Our results
Clicks predominately (~92%) on images of bone fracture
![Page 23: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/23.jpg)
Query: “gnats”
Bing Our results
Re-ranked list has only “highly relevant” images
![Page 24: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/24.jpg)
Query: “camel caravan”
Bing Our results
Anecdotally, our results were perceived as more visually pleasant
![Page 25: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/25.jpg)
Query: “turkey”
Bing Our results
Multiple interpretations are retained if manifested by clicks
305 446
81446
![Page 26: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/26.jpg)
Query: “Stargate (1994)”
Bing Our results
Leads to visually diverse results for some queries
![Page 27: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/27.jpg)
• Significant improvement in nDCG@20 over commercial image ranking system
• Use of raw click data
• Address three limitations of existing search engines
  • incorporate visual features
  • user clicks to handle noisy “expert” labels
  • query-dependent re-ranking using GP regression
Conclusions
![Page 28: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/28.jpg)
Thank You!
![Page 29: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/29.jpg)
Additional slides
![Page 30: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/30.jpg)
• Given a ranked list of relevance judgments $R$
• Cumulative Gain at $P$: $\mathrm{CG}_P(R) = \sum_{i=1}^{P} \left(2^{R_i} - 1\right)$
• Discounted Cumulative Gain: $\mathrm{DCG}_P(R) = \sum_{i=1}^{P} \dfrac{2^{R_i} - 1}{\log_2(i+1)}$
• Normalized Discounted Cumulative Gain: $\mathrm{nDCG}_P(R) = \mathrm{DCG}_P(R) / \mathrm{DCG}_P(I)$, where $I$ is the judgment for the ideal ranked list
Measuring Search Performance – nDCG
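The definitions on this slide translate directly into code. A minimal sketch, assuming integer relevance grades (for example 2 = highly relevant, 1 = relevant, 0 = non-relevant; this mapping is an assumption, the slides only name the three levels) and taking the ideal list to be the same judgments sorted in decreasing order:

```python
import math
from typing import Sequence


def dcg_at(rels: Sequence[int], p: int) -> float:
    """DCG_P(R) = sum over i = 1..P of (2^R_i - 1) / log2(i + 1)."""
    return sum(
        (2 ** r - 1) / math.log2(i + 1)
        for i, r in enumerate(rels[:p], start=1)
    )


def ndcg_at(rels: Sequence[int], p: int) -> float:
    """nDCG_P(R) = DCG_P(R) / DCG_P(I), with I the ideal ordering of R."""
    ideal = dcg_at(sorted(rels, reverse=True), p)
    return dcg_at(rels, p) / ideal if ideal > 0 else 0.0


# Toy judged list: relevant, highly relevant, non-relevant.
score = ndcg_at([1, 2, 0], p=20)
```

A perfectly ordered list scores 1.0; swapping the top two results, as in the toy list, lowers the score because the highly relevant image is discounted by log2(3) instead of log2(2).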
![Page 31: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/31.jpg)
• We only have “positive” training data so discriminative methods did not work well (generating negative training data is non-trivial)
• Simple methods did work well
Click Estimation - Dimensionality Reduction
| Approach | Mean nDCG@20 | Relative improvement |
|---|---|---|
| Average click rank | 0.6266 | –8.6% |
| Correlation with score | 0.7209 | +5.2% |
| Correlation with clicks | 0.7409 | +8.1% |
| PCA | 0.7692 | +12.2% |
![Page 32: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/32.jpg)
• Gaussian Process Regression
$$y(x) = k(x, x_{\text{Train}}) \left[ k(x_{\text{Train}}, x_{\text{Train}}) + \sigma^2 I \right]^{-1} y_{\text{Train}} = d^\top(x, x_{\text{Train}})\, y_{\text{Train}} = w^\top \phi(x)$$
• where
  • $y$ is the predicted number of clicks and $y_{\text{Train}}$ the number of clicks for the set of training images
  • $x$ are the features extracted from a novel image
  • $x_{\text{Train}}$ are the training set features
  • $\sigma$ is a noise parameter
  • $k$ is a Gaussian kernel function
Gaussian Process regression
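The predictive mean on this slide is a kernel-weighted combination of the training click counts. A minimal NumPy sketch, assuming an isotropic Gaussian kernel: the `length_scale` and `sigma` values, the 1-D toy features, and the click counts are placeholders rather than settings from the talk.

```python
import numpy as np


def gaussian_kernel(A, B, length_scale=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * length_scale^2)) for all row pairs."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale ** 2))


def gp_predict(X_train, y_train, X_new, sigma=0.1, length_scale=1.0):
    """Mean prediction y(x) = k(x, X) [k(X, X) + sigma^2 I]^{-1} y_train."""
    K = gaussian_kernel(X_train, X_train, length_scale)
    k_star = gaussian_kernel(X_new, X_train, length_scale)
    # Solve the linear system instead of forming the inverse explicitly.
    alpha = np.linalg.solve(K + sigma ** 2 * np.eye(len(X_train)), y_train)
    return k_star @ alpha


X_train = np.array([[0.0], [1.0], [2.0]])   # toy features of clicked images
y_train = np.array([50.0, 20.0, 5.0])       # toy click counts
y_hat = gp_predict(X_train, y_train, np.array([[0.5]]))
```

Because the kernel decays with distance, an image far from every clicked image receives a predicted click count near zero, while an image close to a heavily clicked one inherits much of its count.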
![Page 33: Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.](https://reader036.fdocuments.net/reader036/viewer/2022062619/5517fc32550346c6568b5018/html5/thumbnails/33.jpg)
Click Estimation - Regression
| Approach | Mean nDCG@20 | Relative improvement |
|---|---|---|
| Linear Regression | 0.6871 | +0.2% |
| Support Vector Regression | 0.6997 | +2.1% |
| Nearest Neighbour | 0.7428 | +8.3% |
| GP Regression | 0.7692 | +12.2% |