Fine-tuning Ranking Models: a two-step optimization approach
Vitor
Jan 29, 2008
Text Learning Meeting - CMU
With invaluable ideas from ….
Motivation
• Rank, rank, rank…
  – Web retrieval, movie recommendation, NFL draft, etc.
  – Einat’s contextual search
  – Richard’s set expansion (SEAL)
  – Andy’s context-sensitive spelling correction algorithm
  – Selecting seeds in Frank’s political blog classification algorithm
  – Ramnath’s Thunderbird extension for
    • Email leak prediction
    • Email recipient suggestion
Help your brothers!
• Try Cut Once!, our Thunderbird extension
  – Works well with Gmail accounts
• It’s working reasonably well
• We need feedback
• Leak warnings: hit x to remove a recipient
• Pause or cancel sending of a message
• Timer: the message is sent after 10 seconds by default
• Suggestions: hit + to add a recipient
• Thunderbird plug-in; classifiers/rankers written in JavaScript
Email Recipient Recommendation
Email Recipient Recommendation
[Figure: MAP on the TOCCBCC and CCBCC tasks for the Frequency, Recency, M1uc, M2uc, TFIDF, and KNN baselines (MAP axis roughly 0.15–0.5); 36 Enron users]
Email Recipient Recommendation
[Figure: MAP on the TOCCBCC and CCBCC tasks for Frequency, Recency, M1uc, M2uc, TFIDF, and KNN, with a Threaded variant]
[Carvalho & Cohen, ECIR-08]
Aggregating Rankings
• Many “data fusion” methods, of two types:
  – Normalized scores: CombSUM, CombMNZ, etc.
  – Unnormalized scores: BordaCount, Reciprocal Rank Sum, etc.
• Reciprocal Rank: the sum of the inverse of the rank of the document in each ranking:

RR_q(d) = Σ_{i ∈ Rankings_q} 1 / rank_i(d)

[Aslam & Montague, 2001]; [Ogilvie & Callan, 2003]; [Macdonald & Ounis, 2006]
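The Reciprocal Rank sum above fits in a few lines. A minimal sketch, assuming the input rankings are plain ordered lists of document ids (the function name is illustrative):

```python
def reciprocal_rank_fusion(rankings):
    """Fuse several ranked lists: each document's score is the sum of
    1/rank over every ranking it appears in (ranks start at 1)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / rank
    # Highest aggregated reciprocal-rank score first
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing ["a", "b", "c"] with ["b", "c", "a"] puts b first (score 1 + 1/2) ahead of a (1 + 1/3).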
Aggregated Ranking Results
[Carvalho & Cohen, ECIR-08]
Intelligent Email Auto-completion: TOCCBCC and CCBCC
Can we do better?
• Not by using other features, but by using better ranking methods
• Machine learning to improve ranking, i.e., learning to rank:
  – Many recent methods: ListNet, Perceptrons, RankSVM, RankBoost, AdaRank, Genetic Programming, Ordinal Regression, etc.
  – Mostly supervised
  – Generally small training sets
  – Workshop at SIGIR-07 (Einat was on the PC)
Pairwise-based Ranking
Given a query q and documents d_1, d_2, …, d_T, each with a feature vector d_i = (x_{i1}, x_{i2}, …, x_{im}):

Goal: induce a ranking function f(d) such that

d_i ≻ d_j  ⟺  f(d_i) > f(d_j)

We assume a linear function f:

f(d_i) = ⟨w, d_i⟩ = w_1 x_{i1} + w_2 x_{i2} + … + w_m x_{im}

Therefore, the constraints are:

d_i ≻ d_j  ⟺  ⟨w, d_i − d_j⟩ > 0
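These pairwise constraints can be materialized as difference vectors, so a linear learner only needs to make ⟨w, d_i − d_j⟩ positive. A minimal sketch (the function name and NumPy representation are assumptions, not from the talk):

```python
import numpy as np

def pairwise_differences(docs):
    """Turn an ordered list of feature vectors (most relevant first)
    into the difference vectors d_i - d_j for every pair i < j.
    A correct linear ranker w must satisfy <w, d_i - d_j> > 0 for all of them."""
    diffs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            diffs.append(docs[i] - docs[j])
    return np.array(diffs)
```

A list of T documents yields T(T−1)/2 difference vectors, one per ordered pair.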
Ranking with Perceptrons
• Nice convergence properties and mistake bounds
  – A bound on the number of mistakes/misranks
• Fast and scalable
• Many variants [Collins, 2002; Gao et al., 2005; Elsas et al., 2008]
  – Voting, averaging, committee, pocket, etc.
• General update rule (on a misranked pair):

W^{t+1} = W^{t} + [f(d_R) − f(d_{NR})]

• Here: the averaged version of the perceptron
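A minimal sketch of an averaged ranking perceptron, assuming training data comes as (relevant, non-relevant) feature-vector pairs and using the raw vectors as the representation; names and defaults are illustrative:

```python
import numpy as np

def averaged_rank_perceptron(pairs, epochs=10):
    """Averaged ranking perceptron on (relevant, non-relevant) vector pairs:
    on a misranked pair (<w, d_R - d_NR> <= 0) apply w <- w + (d_R - d_NR),
    then return the average of all intermediate weight vectors."""
    w = np.zeros(len(pairs[0][0]))
    w_sum = np.zeros_like(w)
    count = 0
    for _ in range(epochs):
        for d_r, d_nr in pairs:
            if np.dot(w, d_r - d_nr) <= 0:  # misrank: perceptron update
                w = w + (d_r - d_nr)
            w_sum += w
            count += 1
    return w_sum / count
```

Averaging the intermediate weights, rather than keeping only the final vector, is what gives the variant its stability on small training sets.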
Rank SVM
• Equivalent to maximizing AUC

min_w L_ranksvm(w) = (1/2)‖w‖² + C Σ_i ξ_i

subject to: ⟨w, d_R − d_NR⟩ ≥ 1 − ξ_i,  ξ_i ≥ 0,  ∀(d_R, d_NR)_i ∈ P

Equivalent to:

min_w L_ranksvm(w) = λ‖w‖² + Σ_{(d_R, d_NR) ∈ P} [1 − ⟨w, d_R − d_NR⟩]_+ ,  where λ = 1/(2C)

[Joachims, KDD-02]; [Herbrich et al., 2000]
Loss Function
[Figure: loss plotted as a function of ⟨w, d_R − d_NR⟩ (loss axis 0–2, x axis −3 to 3)]
Loss Function
[Figure: the same loss plot with the sigmoid-based loss overlaid, where x = ⟨w, d_R − d_NR⟩]

1 − 1/(1 + e^{−x}) = e^{−x}/(1 + e^{−x}) = 1 − sigmoid(x)
Loss Functions
• SVMrank:

min_w L_svmrank(w) = λ‖w‖² + Σ_{(d_R, d_NR) ∈ P} [1 − ⟨w, d_R − d_NR⟩]_+

• SigmoidRank (not convex):

min_w L_SigmoidRank(w) = λ‖w‖² + Σ_{(d_R, d_NR) ∈ P} [1 − sigmoid(⟨w, d_R − d_NR⟩)]

where sigmoid(x) = 1/(1 + e^{−x})
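The SigmoidRank objective can be computed directly from the formula above. A sketch, not the talk's implementation (names and the pair representation are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_rank_loss(w, pairs, lam):
    """L(w) = lam * ||w||^2 + sum over pairs of [1 - sigmoid(<w, d_R - d_NR>)],
    a smooth (but non-convex) stand-in for the number of misranked pairs."""
    loss = lam * np.dot(w, w)
    for d_r, d_nr in pairs:
        loss += 1.0 - sigmoid(np.dot(w, d_r - d_nr))
    return loss
```

Each correctly ranked pair contributes close to 0 and each badly misranked pair close to 1, which is what makes the sum approximate the misrank count.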
Fine-tuning Ranking Models
Step 1: a base ranker (e.g., RankSVM, Perceptron, etc.) produces the base ranking model.
Step 2: SigmoidRank (non-convex) refines it, minimizing a very close approximation to the number of misranks, and outputs the final model.
Gradient Descent
w^{(k+1)} = w^{(k)} − η_k ∇L_SigmoidRank(w^{(k)})

Since (∂/∂x) sigmoid(x) = sigmoid(x)[1 − sigmoid(x)]:

∇L_SigmoidRank(w^{(k)}) = 2λw^{(k)} − Σ_{(d_R, d_NR) ∈ P} sigmoid(⟨w, d_R − d_NR⟩)[1 − sigmoid(⟨w, d_R − d_NR⟩)](d_R − d_NR)
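The update rule and gradient above can be sketched as follows; the step size, λ, and iteration count here are illustrative defaults, not the values used in the talk:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_rank_gradient(w, pairs, lam):
    """Gradient of L(w) = lam*||w||^2 + sum [1 - sigmoid(<w, d_R - d_NR>)]:
    grad = 2*lam*w - sum s*(1 - s)*(d_R - d_NR), with s = sigmoid(<w, d_R - d_NR>)."""
    grad = 2.0 * lam * w
    for d_r, d_nr in pairs:
        diff = d_r - d_nr
        s = sigmoid(np.dot(w, diff))
        grad -= s * (1.0 - s) * diff
    return grad

def fine_tune(w0, pairs, lam=0.01, eta=0.5, steps=100):
    """Gradient descent w^{k+1} = w^k - eta * grad L(w^k),
    starting from a base-ranker model w0."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        w = w - eta * sigmoid_rank_gradient(w, pairs, lam)
    return w
```

Because the loss is non-convex, the starting point w0 matters; that is why the base ranker's model, not a random vector, seeds the descent.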
Results in CC prediction
[Figure: MAP on the TOCCBCC and CCBCC tasks for Frequency, Recency, TFIDF, KNN, Percep, Percep+Sigmoid, RankSVM, and RankSVM+Sigmoid; reported values include 0.472, 0.516, 0.479, 0.524, 0.521, and 0.480; 36 Enron users]
Set Expansion (SEAL) Results
[Figure: MAP on SEAL-1, SEAL-2, and SEAL-3 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet, and ListNet+Sigmoid (MAP axis roughly 0.8–0.94)]
[Listnet: Cao et al. , ICML-07]
[Wang & Cohen, ICDM-2007]
Results in Letor
[Figure: MAP on the Letor collections Ohsumed, Trec3, and Trec4 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet, and ListNet+Sigmoid]
Learning Curve
[Figure: AUC vs. epoch (0–30) for Perceptron and RankSVM; TOCCBCC, Enron user lokay-m]
Learning Curve
[Figure: AUC vs. epoch (0–40) for Perceptron and RankSVM; CCBCC, Enron user campbel-m]
Regularization Parameter
[Figure: MAP of RankSVM vs. RankSVM+Sigmoid on TREC3, TREC4, and Ohsumed for regularization parameter C = 10, 1, 0.1, 0.01, and 0.001]
Some Ideas
• Instead of the number of misranks, optimize other loss functions:
  – Mean Average Precision, MRR, etc.
  – Rank term:

Rank(d_i) = 1 + Σ_{j ≠ i} [1 − sigmoid(⟨w, d_i − d_j⟩)]

  – Some preliminary results with Sigmoid-MAP
• Does it work for classification?
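The rank term reads as a smooth estimate of a document's position: each competitor d_j contributes nearly 1 when it outscores d_i and nearly 0 otherwise. A sketch, with the comparison set simplified to all other documents (names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_rank(w, docs, i):
    """Smooth rank estimate of docs[i] under the scorer <w, d>:
    1 + sum over j != i of [1 - sigmoid(<w, docs[i] - docs[j])>].
    Documents that outscore docs[i] contribute close to 1, others close to 0."""
    rank = 1.0
    for j, d_j in enumerate(docs):
        if j != i:
            rank += 1.0 - sigmoid(np.dot(w, docs[i] - d_j))
    return rank
```

Because this estimate is differentiable in w, it can be plugged into rank-dependent metrics such as MAP or MRR and optimized by the same gradient descent.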
Thanks