Probabilistic Information Retrieval - Sumit Bhatia


Probabilistic Information Retrieval

Sumit Bhatia

July 16, 2009


Overview

1 Introduction: Information Retrieval, IR Models, Probability Basics

2 Probabilistic Ranking Principle: Document Ranking Problem, Probability Ranking Principle

3 The Binary Independence Model

4 OKAPI

5 Discussion


Information Retrieval (IR) Process

1 User has some information needs

2 Information Need → Query using Query Representation

3 Documents → Document Representation

4 IR system matches the two representations to determine the documents that satisfy the user’s information needs.


Boolean Retrieval Model

Query = Boolean expression of terms, e.g., Mitra AND Giles

Document = Term-document Matrix

$A_{ij} = 1$ iff the $i$-th term is present in the $j$-th document.

“Bag of words”

No Ranking

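As a concrete illustration (not part of the original slides), here is a minimal Python sketch of the Boolean model: a binary term-document incidence matrix over three invented documents, queried with Mitra AND Giles by intersecting the matching document sets.

```python
# Toy Boolean retrieval over a binary term-document incidence matrix (illustrative only).
docs = {
    "d1": "mitra and giles study information retrieval",
    "d2": "giles writes about digital libraries",
    "d3": "mitra works on query expansion",
}

vocab = sorted({t for text in docs.values() for t in text.split()})
# incidence[t][d] = 1 iff term t appears in document d ("bag of words", counts ignored).
incidence = {t: {d: int(t in text.split()) for d, text in docs.items()} for t in vocab}

def boolean_and(*terms):
    """Return the set of documents containing every query term (no ranking)."""
    hits = set(docs)
    for t in terms:
        hits &= {d for d, present in incidence.get(t, {}).items() if present}
    return hits

print(boolean_and("mitra", "giles"))  # -> {'d1'}
```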


Vector Space Model

Query = free-text query, e.g., Mitra Giles

Query and Document → vectors in “term space”

Cosine similarity between the query and document vectors indicates how similar they are

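A small sketch (again with invented documents) of vector-space ranking: the query and documents become term-frequency vectors, and documents are sorted by cosine similarity. A real system would typically use tf-idf weights; raw counts are used here to keep the example short.

```python
import math
from collections import Counter

docs = {
    "d1": "mitra and giles study information retrieval",
    "d2": "giles writes about digital libraries",
    "d3": "mitra works on query expansion",
}

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

query = Counter("mitra giles".split())
doc_vectors = {d: Counter(text.split()) for d, text in docs.items()}

# Rank documents by decreasing cosine similarity to the query.
for d in sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]), reverse=True):
    print(d, round(cosine(query, doc_vectors[d]), 3))
```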


Information Retrieval (IR) Process - Revisited

1 User has some information needs

2 Information Need → Query using Query Representation

3 Documents → Document Representation

4 IR system matches the two representations to determine the documents that satisfy the user’s information needs.

Problem!

Both Query and Document Representations are Uncertain


Probability Basics

Chain Rule:

$P(A, B) = P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$

Partition Rule:

$P(B) = P(A, B) + P(\bar{A}, B)$

Bayes Rule:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} = \left[\frac{P(B|A)}{\sum_{X \in \{A,\bar{A}\}} P(B|X)\,P(X)}\right] P(A)$$

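A quick numeric check of the rules above, with arbitrary illustrative probabilities; the denominator P(B) is obtained via the partition rule.

```python
# Arbitrary illustrative numbers: prior P(A) and likelihoods P(B|A), P(B|~A).
p_a = 0.3
p_b_given_a = 0.8
p_b_given_not_a = 0.2

# Partition rule: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # -> 0.632
```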


Document Ranking Problem

Problem Statement

Given a set of documents $D = \{d_1, d_2, \ldots, d_n\}$ and a query $q$, in what order should the subset of relevant documents $D_r = \{d_{r_1}, d_{r_2}, \ldots, d_{r_m}\}$ be returned to the user?

Hint: We want the best document to be at rank 1, second best to be at rank 2, and so on.

Solution

Rank by the probability of relevance of the document w.r.t. the information need (query), i.e., rank by $P(R = 1|d, q)$.


Probability Ranking Principle

Probability Ranking Principle (Rijsbergen, 1979)

If a reference retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data.

Observation 1: Ranking by the PRP maximizes the expected number of relevant documents in the top $k$ (expected precision at rank $k$).


Probability Ranking Principle

Case 1: 1/0 Loss =⇒ No selection/retrieval costs.

Bayes’ Optimal Decision Rule:

$d$ is relevant iff $P(R = 1|d, q) > P(R = 0|d, q)$

Theorem 1

PRP is optimal, in the sense that it minimizes the expected loss (Bayes risk) under 1/0 loss.

Case 2: PRP with differential retrieval costs

Rank $d$ ahead of $d'$ whenever

$$C_1 \cdot P(R = 1|d, q) + C_0 \cdot P(R = 0|d, q) \le C_1 \cdot P(R = 1|d', q) + C_0 \cdot P(R = 0|d', q)$$

where $C_1$ is the cost of retrieving a relevant document and $C_0$ the cost of retrieving a non-relevant one.

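A sketch of both cases with assumed (made-up) relevance probabilities: under 1/0 loss a document is retrieved iff P(R = 1|d, q) > P(R = 0|d, q), and with retrieval costs C1 (relevant) and C0 (non-relevant) documents are ranked by increasing expected cost.

```python
# Assumed relevance probabilities P(R=1|d,q) for three hypothetical documents.
p_rel = {"d1": 0.9, "d2": 0.4, "d3": 0.7}

# Case 1: 1/0 loss -- retrieve d iff P(R=1|d,q) > P(R=0|d,q), i.e. P(R=1|d,q) > 0.5.
retrieved = [d for d, p in p_rel.items() if p > 1 - p]

# Case 2: differential costs; C1 = cost of retrieving a relevant document,
# C0 = cost of retrieving a non-relevant one. Rank by increasing expected cost.
C1, C0 = 0.0, 1.0
ranking = sorted(p_rel, key=lambda d: C1 * p_rel[d] + C0 * (1 - p_rel[d]))

print(retrieved)  # -> ['d1', 'd3']
print(ranking)    # -> ['d1', 'd3', 'd2'] (same order as ranking by P(R=1|d,q) when C1 < C0)
```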


Binary Independence Model (BIM)

Assumptions:

1 Binary: documents are represented as binary incidence vectors of terms, $\vec{d} = (d_1, d_2, \ldots, d_n)$, where $d_i = 1$ iff term $i$ is present in $d$, else it is 0.

2 Independence: terms occur in documents independently of each other.

3 Relevance of a document is independent of the relevance of other documents. [1]

Implications:

1 Many documents have the same representation.

2 No association between terms is considered.

[1] This is the assumption for the PRP in general.


Binary Independence Model (BIM)

We wish to compute $P(R|d, q)$. We do it in terms of the term incidence vectors $\vec{d}$ and $\vec{q}$, i.e., we compute $P(R|\vec{d}, \vec{q})$. Using Bayes’ rule, we have:

$$P(R = 1|\vec{d}, \vec{q}) = \frac{P(\vec{d}|R = 1, \vec{q})\, P(R = 1|\vec{q})}{P(\vec{d}|\vec{q})} \qquad (1)$$

$$P(R = 0|\vec{d}, \vec{q}) = \frac{P(\vec{d}|R = 0, \vec{q})\, P(R = 0|\vec{q})}{P(\vec{d}|\vec{q})} \qquad (2)$$

Here $P(R = 1|\vec{q})$ and $P(R = 0|\vec{q})$ are the prior relevance probabilities.


Binary Independence Model

Computing the odds ratio, we get:

$$O(R|\vec{d}, \vec{q}) = \frac{P(R = 1|\vec{q})}{P(R = 0|\vec{q})} \times \frac{P(\vec{d}|R = 1, \vec{q})}{P(\vec{d}|R = 0, \vec{q})} \qquad (3)$$

The first term is document independent! What about the second term? Applying the Naive Bayes (term independence) assumption:

$$O(R|\vec{d}, \vec{q}) \propto \prod_{t=1}^{m} \frac{P(d_t|R = 1, \vec{q})}{P(d_t|R = 0, \vec{q})} \qquad (4)$$


Binary Independence Model

Observation 1: A term is either present in a document or not.

$$O(R|\vec{d}, \vec{q}) \propto \prod_{t: d_t = 1} \frac{P(d_t = 1|R = 1, \vec{q})}{P(d_t = 1|R = 0, \vec{q})} \cdot \prod_{t: d_t = 0} \frac{P(d_t = 0|R = 1, \vec{q})}{P(d_t = 0|R = 0, \vec{q})} \qquad (5)$$

                           R = 1      R = 0
Term present (d_t = 1)     p_t        u_t
Term absent  (d_t = 0)     1 - p_t    1 - u_t


Binary Independence Model

Assumption: A term not in the query is equally likely to occur in relevant and non-relevant documents.

$$O(R|\vec{d}, \vec{q}) \propto \prod_{t: d_t = q_t = 1} \frac{p_t}{u_t} \cdot \prod_{t: d_t = 0,\, q_t = 1} \frac{1 - p_t}{1 - u_t} \qquad (6)$$

Manipulating:

$$O(R|\vec{d}, \vec{q}) \propto \prod_{t: d_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \cdot \prod_{t: q_t = 1} \frac{1 - p_t}{1 - u_t} \qquad (7)$$

The second product is constant for a given query!


Binary Independence Model

$$RSV_d = \log \prod_{t: d_t = q_t = 1} \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \qquad (8)$$

$$RSV_d = \sum_{t: d_t = q_t = 1} \log \frac{p_t (1 - u_t)}{u_t (1 - p_t)} \qquad (9)$$

Docs        R = 1     R = 0                 Total
d_t = 1     s         n - s                 n
d_t = 0     S - s     (N - n) - (S - s)     N - n
Total       S         N - S                 N

Substituting (with 0.5 added to each count for smoothing), we get:

$$RSV_d = \sum_{t: d_t = q_t = 1} \log \frac{(s + \tfrac{1}{2}) / (S - s + \tfrac{1}{2})}{(n - s + \tfrac{1}{2}) / (N - n - S + s + \tfrac{1}{2})} \qquad (10)$$

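A sketch of Eq. (10): the smoothed per-term weight computed from the contingency counts s, S, n, N, summed over query terms that occur in the document. The counts below are invented for illustration; in practice s and S come from relevance judgments or must be approximated.

```python
import math

def bim_weight(s, S, n, N):
    """Smoothed log odds-ratio weight of Eq. (10) for one query term present in the
    document: s = relevant docs containing the term, S = relevant docs,
    n = docs containing the term, N = total docs (0.5 added to avoid zeros)."""
    rel_odds = (s + 0.5) / (S - s + 0.5)
    nonrel_odds = (n - s + 0.5) / (N - n - S + s + 0.5)
    return math.log(rel_odds / nonrel_odds)

def rsv(query_terms, doc_terms, stats):
    """RSV_d: sum the weights of terms present in both the query and the document."""
    return sum(bim_weight(*stats[t]) for t in query_terms if t in doc_terms and t in stats)

# Invented collection statistics: term -> (s, S, n, N).
stats = {"mitra": (8, 10, 50, 10000), "giles": (6, 10, 200, 10000)}
print(round(rsv({"mitra", "giles"}, {"mitra", "giles", "libraries"}, stats), 2))
```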


Observations

Probabilities for non-relevant documents can be approximated by collection statistics:

$$\log \frac{1 - u_t}{u_t} = \log \frac{N - n}{n} \approx \log \frac{N}{n} = \text{IDF}!$$

It is not so simple for relevant documents:
– Estimate from known relevant documents (not always available).
– Assume $p_t$ is constant, which is equivalent to IDF weighting only.

Difficulties in probability estimation and the drastic assumptions make it hard to achieve good retrieval performance.


OKAPI Weighting Scheme

BIM does not consider term frequencies and document length.

The BM25 weighting scheme (Okapi weighting) was developed to build a probabilistic model sensitive to these quantities.

BM25 today is widely used and has shown good performance in a number of practical systems.


OKAPI Weighting Scheme

$$RSV_d = \sum_{t \in q} \left[ \log \frac{N}{df_t} \times \frac{(k_1 + 1)\, tf_{td}}{k_1 \left( (1 - b) + b \cdot \frac{l_d}{l_{av}} \right) + tf_{td}} \times \frac{(k_3 + 1)\, tf_{tq}}{k_3 + tf_{tq}} \right]$$

where:
$N$ is the total number of documents,
$df_t$ is the document frequency of term $t$, i.e., the number of documents that contain $t$,
$tf_{td}$ is the frequency of term $t$ in document $d$,
$tf_{tq}$ is the frequency of term $t$ in query $q$,
$l_d$ is the length of document $d$,
$l_{av}$ is the average document length,
$k_1$, $k_3$ and $b$ are constants which are generally set to 2, 2 and 0.75 respectively.

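A compact sketch of the BM25 score above, using the same symbols; the query-term-frequency factor with k3 is omitted here, as is common for short queries where tf_tq = 1. The statistics are invented for illustration and would normally come from the index.

```python
import math

def bm25_weight(tf_td, df_t, N, l_d, l_av, k1=2.0, b=0.75):
    """BM25 contribution of one query term: idf times a saturating,
    length-normalized term-frequency factor (k3 query factor dropped)."""
    idf = math.log(N / df_t)
    tf_part = (k1 + 1) * tf_td / (k1 * ((1 - b) + b * l_d / l_av) + tf_td)
    return idf * tf_part

def bm25_score(query_terms, doc_tf, l_d, df, N, l_av):
    """Sum BM25 weights over query terms that occur in the document."""
    return sum(bm25_weight(doc_tf[t], df[t], N, l_d, l_av)
               for t in query_terms if t in doc_tf and t in df)

# Invented statistics: document frequencies, term frequencies, document lengths.
df = {"mitra": 50, "giles": 200}
doc_tf = {"mitra": 3, "giles": 1}
print(round(bm25_score({"mitra", "giles"}, doc_tf, l_d=120, df=df, N=10000, l_av=150), 2))
```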


What Next?

Similarity between terms and documents - is this sufficient?

JAVA: Coffee or Computer Language or Place?

Time and Location of user?

Different users might want different documents for the same query?


What Next?

Maximum Marginal Relevance [CG98] – Rank documents so as to minimize similarity between returned documents.

Result Diversification [Wan09]
– Rank documents so as to maximize mean relevance, given a variance level.
– The variance here determines the risk the user is willing to take.


References

[CG98] Jaime Carbonell and Jade Goldstein, The use of MMR, diversity-based reranking for reordering documents and producing summaries, SIGIR, 1998, pp. 335–336.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

[Wan09] Jun Wang, Mean-variance analysis: A new document ranking theory in information retrieval, Advances in Information Retrieval, 2009, pp. 4–16.


QUESTIONS???
