Probabilistic Information Retrieval - Sumit Bhatia
Sumit Bhatia
July 16, 2009
Sumit Bhatia Probabilistic Information Retrieval 1/23
Overview
1 Introduction: Information Retrieval, IR Models, Probability Basics
2 Probabilistic Ranking Principle: Document Ranking Problem, Probability Ranking Principle
3 The Binary Independence Model
4 OKAPI
5 Discussion
Information Retrieval (IR) Process
1 The user has an information need.
2 Information Need → Query, using a query representation
3 Documents → Document representation
4 The IR system matches the two representations to determine the documents that satisfy the user's information need.
Boolean Retrieval Model
Query = Boolean expression of terms, e.g., Mitra AND Giles
Document = term-document matrix: Aij = 1 iff the i-th term is present in the j-th document.
"Bag of words" representation
No ranking
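The Boolean model above can be sketched in a few lines. This is a toy illustration, not from the slides; the terms and the incidence matrix are invented.

```python
# rows = terms, columns = documents; A[i][j] = 1 iff term i occurs in doc j.
terms = {"mitra": 0, "giles": 1, "retrieval": 2}
A = [
    [1, 0, 1, 1],  # mitra
    [1, 1, 0, 1],  # giles
    [0, 1, 1, 0],  # retrieval
]

def matching_docs(t1, t2):
    """Documents satisfying the Boolean query 't1 AND t2' (AND of the two term rows)."""
    r1, r2 = A[terms[t1]], A[terms[t2]]
    return [j for j in range(len(r1)) if r1[j] and r2[j]]

print(matching_docs("mitra", "giles"))  # [0, 3]
```

Note that the result is an unranked set: documents 0 and 3 both satisfy the query, and the model gives no reason to prefer one over the other.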
Vector Space Model
Query = free-text query, e.g., Mitra Giles
Query and Document → vectors in "term space"
Cosine similarity between the query and document vectors indicates their similarity.
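Cosine ranking in "term space" can be sketched as follows; the term weights are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = [1, 1, 0]  # "mitra giles" as a vector over (mitra, giles, other)
doc1 = [2, 3, 0]   # mentions both query terms
doc2 = [0, 1, 5]   # mostly about something else
ranked = sorted([("doc1", cosine(query, doc1)), ("doc2", cosine(query, doc2))],
                key=lambda x: -x[1])
print(ranked[0][0])  # doc1 ranks first
```

Unlike the Boolean model, this produces a ranking: doc1 scores far higher than doc2 because its direction in term space is closer to the query's.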
Information Retrieval (IR) Process - Revisited
1 The user has an information need.
2 Information Need → Query, using a query representation
3 Documents → Document representation
4 The IR system matches the two representations to determine the documents that satisfy the user's information need.
Problem!
Both the query and document representations are uncertain.
Probability Basics
Chain Rule:
P(A,B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
Partition Rule:
P(B) = P(A,B) + P(Ā,B)
Bayes' Rule:
P(A|B) = P(B|A)P(A) / P(B) = [ P(B|A) / Σ_{X∈{A,Ā}} P(B|X)P(X) ] · P(A)
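A quick numeric check of the rules above; the probabilities are invented for illustration.

```python
p_a = 0.3            # P(A), the prior
p_b_given_a = 0.8    # P(B|A)
p_b_given_na = 0.2   # P(B|not A)

# Partition rule: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_na * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.24 / 0.38 ≈ 0.6316
```

Observing B raises the probability of A from the prior 0.3 to about 0.63, because B is four times as likely under A as under not-A.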
Document Ranking Problem
Problem Statement
Given a set of documents D = {d1, d2, . . . , dn} and a query q, in what order should the subset of relevant documents Dr = {dr1, dr2, . . . , drm} be returned to the user?
Hint: We want the best document at rank 1, the second best at rank 2, and so on.
Solution
Rank by the probability of relevance of each document with respect to the information need (query), i.e., by P(R = 1|d, q).
Probability Ranking Principle
Probability Ranking Principle (van Rijsbergen, 1979)
If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its users will be the best that is obtainable on the basis of those data.
Observation 1: The PRP maximizes the mean probability of relevance at rank k.
Probability Ranking Principle
Case 1: 1/0 loss ⟹ no selection/retrieval costs.
Bayes' Optimal Decision Rule:
d is relevant iff P(R = 1|d, q) > P(R = 0|d, q)
Theorem 1
The PRP is optimal, in the sense that it minimizes the expected loss (Bayes risk) under 1/0 loss.
Case 2: PRP with differential retrieval costs. With C1 and C0 the costs of retrieving a relevant and a non-relevant document respectively, d should be retrieved before d′ iff
C1 · P(R = 1|d, q) + C0 · P(R = 0|d, q) ≤ C1 · P(R = 1|d′, q) + C0 · P(R = 0|d′, q)
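The two decision criteria above can be illustrated with toy numbers (all probabilities and costs here are invented):

```python
# Case 1, 1/0 loss: retrieve d iff P(R=1|d,q) > P(R=0|d,q),
# i.e. iff P(R=1|d,q) > 0.5.
p_rel = 0.6
retrieve = p_rel > 1 - p_rel
print(retrieve)  # True

# Case 2, differential costs: rank by expected retrieval cost
# C1*P(R=1|d,q) + C0*P(R=0|d,q), lower cost first.
def expected_cost(p, c1=0.0, c0=1.0):
    return c1 * p + c0 * (1 - p)

docs = {"d": 0.6, "d_prime": 0.3}
order = sorted(docs, key=lambda name: expected_cost(docs[name]))
print(order)  # ['d', 'd_prime']
```

With C1 = 0 and C0 = 1 the expected cost is just 1 − P(R = 1|d, q), so ranking by increasing cost reduces to the plain PRP ordering.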
Binary Independence Model (BIM)
Assumptions:
1 Binary: documents are represented as binary incidence vectors of terms, d = {d1, d2, . . . , dn}, where di = 1 iff term i is present in d, else 0.
2 Independence: terms occur in documents independently of each other.
3 Relevance of a document is independent of the relevance of other documents. (1)
Implications:
1 Many documents have the same representation.
2 No association between terms is considered.
(1) This is the assumption for the PRP in general.
Binary Independence Model (BIM)
We wish to compute P(R|d, q). We do it in terms of the term incidence vectors ~d and ~q, i.e., we compute P(R|~d, ~q). Using Bayes' Rule, we have:
P(R = 1|~d, ~q) = P(~d|R = 1, ~q) P(R = 1|~q) / P(~d|~q)   (1)
P(R = 0|~d, ~q) = P(~d|R = 0, ~q) P(R = 0|~q) / P(~d|~q)   (2)
Here P(R = 1|~q) is the prior relevance probability.
Binary Independence Model
Computing the odds ratio, we get:
O(R|~d, ~q) = [P(R = 1|~q) / P(R = 0|~q)] × [P(~d|R = 1, ~q) / P(~d|R = 0, ~q)]   (3)
The first term is document independent! What about the second term?
Naive Bayes Assumption:
O(R|~d, ~q) ∝ Π_{t=1}^{m} [P(dt|R = 1, ~q) / P(dt|R = 0, ~q)]   (4)
Binary Independence Model
Observation 1: A term is either present in a document or not.
O(R|~d, ~q) ∝ Π_{t:dt=1} [P(dt = 1|R = 1, ~q) / P(dt = 1|R = 0, ~q)] · Π_{t:dt=0} [P(dt = 0|R = 1, ~q) / P(dt = 0|R = 0, ~q)]   (5)

                          R = 1     R = 0
Term present (dt = 1)     pt        ut
Term absent  (dt = 0)     1 − pt    1 − ut
Binary Independence Model
Assumption: A term not in the query is equally likely to occur in relevant and non-relevant documents.
O(R|~d, ~q) ∝ Π_{t:dt=qt=1} (pt / ut) · Π_{t:dt=0,qt=1} [(1 − pt) / (1 − ut)]   (6)
Manipulating:
O(R|~d, ~q) ∝ Π_{t:dt=qt=1} [pt(1 − ut) / ut(1 − pt)] · Π_{t:qt=1} [(1 − pt) / (1 − ut)]   (7)
The second product is constant for a given query!
Binary Independence Model
RSVd = log Π_{t:dt=qt=1} [pt(1 − ut) / ut(1 − pt)]   (8)
     = Σ_{t:dt=qt=1} log [pt(1 − ut) / ut(1 − pt)]   (9)

Docs       R = 1     R = 0                 Total
di = 1     s         n − s                 n
di = 0     S − s     (N − n) − (S − s)     N − n
Total      S         N − S                 N

Substituting, we get:
RSVd = Σ_{t:dt=qt=1} log [ (s + 1/2)/(S − s + 1/2) ] / [ (n − s + 1/2)/(N − n − S + s + 1/2) ]   (10)
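Equation (10) can be transcribed directly; the collection statistics below are invented for illustration.

```python
import math

def rsv_term(s, S, n, N):
    """Smoothed log odds ratio for one query term present in the document,
    per eq. (10): s of the S relevant docs contain the term, n of the N
    docs overall contain it; 1/2 is added to every cell for smoothing."""
    return math.log(((s + 0.5) / (S - s + 0.5)) /
                    ((n - s + 0.5) / (N - n - S + s + 0.5)))

# Collection of N = 1000 docs, S = 10 judged relevant; a query term that
# occurs in n = 100 docs overall, s = 8 of them relevant:
w = rsv_term(s=8, S=10, n=100, N=1000)
print(w > 0)  # True

# RSV_d is the sum of such weights over terms present in both d and q.
```

The weight is positive here because the term is proportionally far more common among relevant documents (8/10) than in the collection (100/1000); a term with s = 0 would get a negative weight.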
Observations
Probabilities for non-relevant documents can be approximated by collection statistics:
⟹ log [(1 − ut)/ut] = log [(N − n)/n] ≈ log (N/n) = IDF!
It is not so simple for relevant documents:
– Estimating from known relevant documents (not always available)
– Assuming pt is constant is equivalent to IDF weighting only
Difficulties in probability estimation, together with drastic assumptions, make good performance hard to achieve.
OKAPI Weighting Scheme
BIM does not consider term frequencies or document length.
The BM25 weighting scheme (Okapi weighting) was developed to build a probabilistic model sensitive to these quantities.
BM25 is widely used today and has shown good performance in a number of practical systems.
OKAPI Weighting Scheme
RSVd = Σ_{t∈q} { log (N/dft) × [(k1 + 1) tftd] / [k1((1 − b) + b(ld/lav)) + tftd] × [(k3 + 1) tftq] / [k3 + tftq] }
where:
N is the total number of documents,
dft is the document frequency of term t, i.e., the number of documents that contain t,
tftd is the frequency of term t in document d,
tftq is the frequency of term t in query q,
ld is the length of document d,
lav is the average document length,
k1, k3 and b are constants, generally set to 2, 2 and 0.75 respectively.
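A direct transcription of the formula above; the corpus statistics in the example are invented.

```python
import math

def bm25_score(query_tf, doc_tf, doc_len, avg_len, N, df,
               k1=2.0, k3=2.0, b=0.75):
    """BM25 score of one document for one query, per the formula above.
    query_tf/doc_tf map terms to frequencies; df maps terms to document
    frequencies over the N-document collection."""
    score = 0.0
    for t, tf_q in query_tf.items():
        tf_d = doc_tf.get(t, 0)
        if tf_d == 0 or t not in df:
            continue  # terms absent from the document contribute nothing
        idf = math.log(N / df[t])
        doc_part = ((k1 + 1) * tf_d) / (k1 * ((1 - b) + b * doc_len / avg_len) + tf_d)
        query_part = ((k3 + 1) * tf_q) / (k3 + tf_q)
        score += idf * doc_part * query_part
    return score

s = bm25_score(query_tf={"mitra": 1, "giles": 1},
               doc_tf={"mitra": 3, "giles": 1, "ir": 7},
               doc_len=110, avg_len=100, N=10000,
               df={"mitra": 40, "giles": 25})
print(s > 0)  # True
```

Note how the doc_part saturates as tftd grows (bounded by k1 + 1) and how b shifts weight between raw frequency and length-normalized frequency, the two effects BIM ignores.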
What Next?
Similarity between terms and documents: is this sufficient?
JAVA: coffee, computer language, or place?
Time and location of the user?
Different users might want different documents for the same query.
What Next?
Maximum Marginal Relevance [CG98]: rank documents so as to minimize similarity between returned documents.
Result Diversification [Wan09]:
– Rank documents so as to maximize mean relevance, given a variance level.
– The variance here determines the risk the user is willing to take.
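The MMR idea can be sketched greedily: at each step pick the document maximizing λ·rel(d) − (1 − λ)·max similarity to documents already selected. This is a toy sketch with invented relevance and similarity values, not the authors' implementation.

```python
def mmr(rel, sim, lam=0.7, k=2):
    """Greedy MMR reranking: rel maps doc -> relevance to the query,
    sim maps frozenset({d1, d2}) -> pairwise similarity."""
    selected = []
    candidates = set(rel)
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda d: lam * rel[d] -
                       (1 - lam) * max((sim[frozenset((d, s))] for s in selected),
                                       default=0.0))
        selected.append(best)
        candidates.remove(best)
    return selected

rel = {"a": 0.9, "b": 0.85, "c": 0.5}
sim = {frozenset(("a", "b")): 0.95, frozenset(("a", "c")): 0.1,
       frozenset(("b", "c")): 0.2}
print(mmr(rel, sim))  # ['a', 'c']
```

Although "b" is nearly as relevant as "a", it is a near-duplicate of it, so MMR prefers the less relevant but novel "c" at rank 2; with λ = 1 the reranking degenerates to plain relevance ordering.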
References
[CG98] Jaime Carbonell and Jade Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, SIGIR, 1998, pp. 335–336.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[Wan09] Jun Wang, Mean-variance analysis: A new document ranking theory in information retrieval, Advances in Information Retrieval, 2009, pp. 4–16.
QUESTIONS???