Introduction to Information Retrieval - Stanford...

7
5/21/17 1 Introduction to Information Retrieval Probabilistic Information Retrieval Christopher Manning and Pandu Nayak Introduction to Information Retrieval FROM BOOLEAN TO RANKED RETRIEVAL … IN TWO STEPS Introduction to Information Retrieval Ranked retrieval § Thus far, our queries have all been Boolean § Documents either match or don’t § Can be good for expert users with precise understanding of their needs and the collection § Can also be good for applications: Applications can easily consume 1000s of results § Not good for the majority of users § Most users incapable of writing Boolean queries § Or they are, but they think it’s too much work § Most users don’t want to wade through 1000s of results. § This is particularly true of web search Ch. 6 Introduction to Information Retrieval Problem with Boolean search: feast or famine § Boolean queries often result in either too few (=0) or too many (1000s) results. § Query 1: “standard user dlink 650” → 200,000 hits § Query 2: “standard user dlink 650 no card found”: 0 hits § It takes a lot of skill to come up with a query that produces a manageable number of hits. § AND gives too few; OR gives too many Ch. 6 Introduction to Information Retrieval Who are these people? Stephen Robertson Keith van Rijsbergen Karen Spärck Jones Introduction to Information Retrieval Why probabilities in IR? User Information Need Documents Document Representation Query Representation How to match? In traditional IR systems, matching between each document and query is attempted in a semantically imprecise space of index terms. Probabilities provide a principled foundation for uncertain reasoning. Can we use probabilities to quantify our uncertainties? Uncertain guess of whether document has relevant content Understanding of user need is uncertain

Transcript of Introduction to Information Retrieval - Stanford...

Page 1: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

1

IntroductiontoInformationRetrieval

Introductionto

InformationRetrieval

ProbabilisticInformationRetrievalChristopherManning andPanduNayak

IntroductiontoInformationRetrieval

FROMBOOLEANTORANKEDRETRIEVAL…INTWOSTEPS

IntroductiontoInformationRetrieval

Rankedretrieval§ Thusfar,ourquerieshaveallbeenBoolean

§ Documentseithermatchordon’t§ Canbegoodforexpertuserswithpreciseunderstandingoftheirneedsandthecollection§ Canalsobegoodforapplications:Applicationscaneasilyconsume1000sofresults

§ Notgoodforthemajorityofusers§ MostusersincapableofwritingBooleanqueries

§ Ortheyare,buttheythinkit’stoomuchwork

§ Mostusersdon’twanttowadethrough1000sofresults.§ Thisisparticularlytrueofwebsearch

Ch. 6 IntroductiontoInformationRetrieval

ProblemwithBooleansearch:feastorfamine§ Booleanqueriesoftenresultineithertoofew(=0)ortoomany(1000s)results.

§ Query1:“standarduserdlink 650”→200,000hits§ Query2:“standarduserdlink 650nocardfound”:0hits

§ Ittakesalotofskilltocomeupwithaquerythatproducesamanageablenumberofhits.§ ANDgivestoofew;ORgivestoomany

Ch. 6

IntroductiontoInformationRetrieval

Who are these people?

Stephen Robertson Keith van RijsbergenKaren Spärck Jones

IntroductiontoInformationRetrieval

WhyprobabilitiesinIR?

User Information Need

DocumentsDocument

Representation

QueryRepresentation

How to match?

In traditional IR systems, matching between each document andquery is attempted in a semantically imprecise space of index terms.

Probabilities provide a principled foundation for uncertain reasoning.Can we use probabilities to quantify our uncertainties?

Uncertain guess ofwhether document has relevant content

Understandingof user need isuncertain

Page 2: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

2

IntroductiontoInformationRetrieval

ProbabilisticIRtopics

1. Classicalprobabilisticretrievalmodel§ Probabilityrankingprinciple,etc.§ Binaryindependencemodel(≈NaïveBayestextcat)§ (Okapi)BM25

2. Bayesiannetworksfortextretrieval3. LanguagemodelapproachtoIR

§ Animportantemphasisinrecentwork

ProbabilisticmethodsareoneoftheoldestbutalsooneofthecurrentlyhottopicsinIR

§ Traditionally:neatideas,butdidn’twinonperformance§ Itseemstobedifferentnow

IntroductiontoInformationRetrieval

Thedocumentrankingproblem§ Wehaveacollectionofdocuments§ Userissuesaquery§ Alistofdocumentsneedstobereturned§ RankingmethodisthecoreofmodernIRsystems:

§ Inwhatorderdowepresentdocumentstotheuser?§ Wewantthe“best”documenttobefirst,secondbestsecond,etc….

§ Idea:Rankbyprobabilityofrelevanceofthedocumentw.r.t.informationneed§ P(R=1|documenti,query)

IntroductiontoInformationRetrieval

§ ForeventsAandB:

§ Bayes’ Rule

§ Odds:

Prior

p(A,B) = p(A∩B) = p(A | B)p(B) = p(B | A)p(A)

p(A | B) = p(B | A)p(A)p(B)

=p(B | A)p(A)

p(B | X)p(X)X=A,A∑

Recallafewprobabilitybasics

O(A) = p(A)p(A)

=p(A)

1− p(A)

Posterior

IntroductiontoInformationRetrieval

“Ifareferenceretrievalsystem’sresponsetoeachrequestisarankingofthedocumentsinthecollectioninorderofdecreasingprobabilityofrelevancetotheuserwhosubmittedtherequest,wheretheprobabilitiesareestimatedasaccuratelyaspossibleonthebasisofwhateverdatahavebeenmadeavailabletothesystemforthispurpose,theoveralleffectivenessofthesystemtoitsuserwillbethebestthatisobtainableonthebasisofthosedata.”

§ [1960s/1970s]S.Robertson,W.S.Cooper,M.E.Maron;van Rijsbergen (1979:113);Manning&Schütze (1999:538)

TheProbabilityRankingPrinciple

IntroductiontoInformationRetrieval

ProbabilityRankingPrinciple

Letx representadocumentinthecollection.LetR representrelevanceofadocumentw.r.t.given(fixed)queryandletR=1 representrelevantandR=0 notrelevant.

p(R =1| x) = p(x | R =1)p(R =1)p(x)

p(R = 0 | x) = p(x | R = 0)p(R = 0)p(x)

p(x|R=1), p(x|R=0) - probability that if a relevant (not relevant) document is retrieved, it is x.

Needtofindp(R=1|x) – probabilitythatadocumentx isrelevant.

p(R=1),p(R=0) - prior probabilityof retrieving a relevant or non-relevantdocument

p(R = 0 | x)+ p(R =1| x) =1

IntroductiontoInformationRetrieval

ProbabilityRankingPrinciple(PRP)§ Simplecase:noselectioncostsorotherutilityconcernsthatwoulddifferentiallyweighterrors

§ PRPinaction:Rankalldocumentsbyp(R=1|x)

§ Theorem:UsingthePRPisoptimal,inthatitminimizestheloss(Bayesrisk)under1/0loss§ Provableifallprobabilitiescorrect,etc.[e.g.,Ripley1996]

Page 3: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

3

IntroductiontoInformationRetrieval

ProbabilityRankingPrinciple

§ Morecomplexcase:retrievalcosts.§ Letd beadocument§ C– costofnotretrievingarelevant document§ C’ – costofretrievinganon-relevant document

§ ProbabilityRankingPrinciple:if

foralld’notyetretrieved,thend isthenextdocumenttoberetrieved

§ Wewon’tfurtherconsidercost/utilityfromnowon

!C ⋅ p(R = 0 | d)−C ⋅ p(R =1| d) ≤ !C ⋅ p(R = 0 | !d )−C ⋅ p(R =1| !d )

IntroductiontoInformationRetrieval

ProbabilityRankingPrinciple§ Howdowecomputeallthoseprobabilities?

§ Donotknowexactprobabilities,havetouseestimates§ BinaryIndependenceModel(BIM)– whichwediscussnext– isthesimplestmodel

§ Questionableassumptions§ “Relevance”ofeachdocumentisindependentofrelevanceofotherdocuments.§ Really,it’sbadtokeeponreturningduplicates

§ Booleanmodelofrelevance§ Thatonehasasinglestepinformationneed

§ Seeingarangeofresultsmightletuserrefinequery

IntroductiontoInformationRetrieval

ProbabilisticRetrievalStrategy

§ Estimatehowtermscontributetorelevance§ Howdootherthingsliketermfrequencyanddocumentlengthinfluenceyourjudgmentsaboutdocumentrelevance?§ NotatallinBIM§ AmorenuancedansweristheOkapi(BM25)formulae[nexttime]

§ Spärck Jones/Robertson

§ Combinetofinddocumentrelevanceprobability

§ Orderdocumentsbydecreasingprobability

IntroductiontoInformationRetrieval

ProbabilisticRanking

Basic concept:“For a given query, if we know some documents that are relevant, terms that occur in those documents should be given greater weighting in searching for other relevant documents.

By making assumptions about the distribution of terms and applying Bayes Theorem, it is possible to derive weights theoretically.”

Van Rijsbergen

IntroductiontoInformationRetrieval

BinaryIndependenceModel§ TraditionallyusedinconjunctionwithPRP§ “Binary”=Boolean:documentsarerepresentedasbinary

incidencevectorsofterms(cf.IIRChapter1):§

§ iff termi ispresentindocumentx.§ “Independence”: termsoccurindocumentsindependently§ Differentdocumentscanbemodeledasthesamevector

),,( 1 nxxx !"=1=ix

IntroductiontoInformationRetrieval

BinaryIndependenceModel§ Queries:binarytermincidencevectors§ Givenqueryq,

§ foreachdocumentd needtocomputep(R|q,d).§ replacewithcomputingp(R|q,x) where x isbinarytermincidencevectorrepresentingd.

§ Interestedonlyinranking§ WilluseoddsandBayes’Rule:

O(R | q, x) = p(R =1| q, x)p(R = 0 | q, x)

=

p(R =1| q)p(x | R =1,q)p(x | q)

p(R = 0 | q)p(x | R = 0,q)p(x | q)

Page 4: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

4

IntroductiontoInformationRetrieval

BinaryIndependenceModel

• Using Independence Assumption:

O(R | q, x) =O(R | q) ⋅ p(xi | R =1,q)p(xi | R = 0,q)i=1

n

p(x | R =1,q)p(x | R = 0,q)

=p(xi | R =1,q)p(xi | R = 0,q)i=1

n

O(R | q, x) = p(R =1| q, x)p(R = 0 | q, x)

=p(R =1| q)p(R = 0 | q)

⋅p(x | R =1,q)p(x | R = 0,q)

Constant for a given query Needs estimation

IntroductiontoInformationRetrieval

BinaryIndependenceModel

• Since xi is either 0 or 1:

O(R | q, x) =O(R | q) ⋅ p(xi =1| R =1,q)p(xi =1| R = 0,q)xi=1

∏ ⋅p(xi = 0 | R =1,q)p(xi = 0 | R = 0,q)xi=0

• Let pi = p(xi =1| R =1,q); ri = p(xi =1| R = 0,q);

• Assume, for all terms not occurring in the query (qi=0) ii rp =

O(R | q, x) =O(R | q) ⋅ p(xi | R =1,q)p(xi | R = 0,q)i=1

n

O(R | q, x) =O(R | q) ⋅ pirixi=1

qi=1

∏ ⋅(1− pi )(1− ri )xi=0

qi=1

IntroductiontoInformationRetrieval

document relevant(R=1) notrelevant(R=0)termpresent xi =1 pi ritermabsent xi =0 (1– pi) (1 – ri)

IntroductiontoInformationRetrieval

All matching terms Non-matching query terms

BinaryIndependenceModel

All matching termsAll query terms

O(R | q, x) =O(R | q) ⋅ pirixi=1

qi=1

∏ ⋅1− ri1− pi

⋅1− pi1− ri

$

%&

'

()

xi=1qi=1

∏ 1− pi1− rixi=0

qi=1

O(R | q, x) =O(R | q) ⋅ pi (1− ri )ri (1− pi )xi=qi=1

∏ ⋅1− pi1− riqi=1

O(R | q, x) =O(R | q) ⋅ pirixi=qi=1

∏ ⋅1− pi1− rixi=0

qi=1

IntroductiontoInformationRetrieval

BinaryIndependenceModel

Constant foreach query

Only quantity to be estimated for rankings

ÕÕ=== -

--

×=11 11

)1()1()|(),|(

iii q i

i

qx ii

ii

rp

prrpqROxqRO !

RetrievalStatusValue:

åÕ==== -

-=

--

=11 )1(

)1(log)1()1(log

iiii qx ii

ii

qx ii

ii

prrp

prrpRSV

IntroductiontoInformationRetrieval

BinaryIndependenceModel

AllboilsdowntocomputingRSV.

åÕ==== -

-=

--

=11 )1(

)1(log)1()1(log

iiii qx ii

ii

qx ii

ii

prrp

prrpRSV

å==

=1;

ii qxicRSV

)1()1(logii

iii pr

rpc--

=

So,howdowecomputeci’s fromourdata?

Theci arelogoddsratiosTheyfunctionasthetermweightsinthismodel

Page 5: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

5

IntroductiontoInformationRetrieval

BinaryIndependenceModel• EstimatingRSVcoefficients intheory• Foreachtermi lookatthistableofdocumentcounts:

Documents

Relevant Non-Relevant Total

xi=1 s n-s n xi=0 S-s N-n-S+s N-n Total S N-S N

Sspi » )(

)(SNsnri -

)()()(log),,,(

sSnNsnsSssSnNKci +---

-=»

• Estimates: For now,assume nozero terms.Remembersmoothing.

)1()1(logii

iii pr

rpc--

=

IntroductiontoInformationRetrieval

Estimation– keychallenge

§ Ifnon-relevantdocumentsareapproximatedbythewholecollection,thenri (prob.ofoccurrenceinnon-relevantdocumentsforquery)isn/Nand

log1− riri

= log N − n− S + sn− s

≈ log N − nn

≈ log Nn= IDF!

IntroductiontoInformationRetrieval

Collectionvs.Documentfrequency

§ Collectionfrequencyoft isthetotal numberofoccurrencesoft inthecollection(incl.multiples)

§ Documentfrequencyisnumberofdocstisin§ Example:

§ Whichwordisabettersearchterm(andshouldgetahigherweight)?

Word Collection frequency Document frequency

insurance 10440 3997

try 10422 8760

Sec. 6.2.1 IntroductiontoInformationRetrieval

Estimation– keychallenge

§ pi (probabilityofoccurrenceinrelevantdocuments)cannotbeapproximatedaseasily

§ pi canbeestimatedinvariousways:§ fromrelevantdocumentsifyouknowsome

§ Relevanceweightingcanbeusedinafeedbackloop

§ constant(CroftandHarpercombinationmatch)– thenjustgetidf weightingofterms(withpi=0.5)

§ proportionaltoprob.ofoccurrenceincollection§ Greiff (SIGIR1998)arguesfor1/3+2/3dfi/N

RSV = log Nnixi=qi=1

IntroductiontoInformationRetrieval

ProbabilisticRelevanceFeedback1. GuessapreliminaryprobabilisticdescriptionofR=1

documents;useittoretrieveasetofdocuments2. Interactwiththeusertorefinethedescription:

learnsomedefinitememberswithR=1andR=03. Re-estimatepi andri onthebasisofthese

§ Ifi appearsinVi withinsetofdocumentsV:pi =|Vi|/|V|§ Orcancombinenewinformationwithoriginalguess(use

Bayesianprior):

4. Repeat,thusgeneratingasuccessionofapproximationstorelevantdocuments

kk++

=|||| )1(

)2(

VpVp ii

iκ is prior

weight

IntroductiontoInformationRetrieval

30

Iterativelyestimatingpi and ri(=Pseudo-relevancefeedback)1. Assumethatpi isconstantoverallxi inqueryandri

asbefore§ pi =0.5(evenodds)foranygivendoc

2. Determineguessofrelevantdocumentset:§ V isfixedsizesetofhighestrankeddocumentsonthis

model3. Weneedtoimproveourguessesforpi andri,so

§ Usedistributionofxi indocsinV.LetVi besetofdocumentscontainingxi§ pi =|Vi|/|V|

§ Assumeifnotretrievedthennotrelevant§ ri =(ni – |Vi|)/(N– |V|)

4. Goto2.untilconvergesthenreturnranking

Page 6: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

6

IntroductiontoInformationRetrieval

PRPandBIM

§ Gettingreasonableapproximationsofprobabilitiesispossible.

§ Requiresrestrictiveassumptions:§ Termindependence§ Termsnotinquerydon’taffecttheoutcome§ Booleanrepresentationofdocuments/queries/relevance

§ Documentrelevancevaluesareindependent§ Someoftheseassumptionscanberemoved§ Problem:eitherrequirepartialrelevanceinformationor

seeminglyonlycanderivesomewhatinferiortermweights

IntroductiontoInformationRetrieval

Removingtermindependence§ Ingeneral,indextermsaren’t

independent§ Dependenciescanbecomplex§ vanRijsbergen (1979)proposed

simplemodelofdependenciesasatree§ ExactlyFriedmanand

Goldszmidt’s TreeAugmentedNaiveBayes(AAAI13,1996)

§ Eachtermdependentononeother

§ In1970s,estimationproblemsheldbacksuccessofthismodel

IntroductiontoInformationRetrieval

Secondstep:Termfrequency§ Rightinthefirstlecture,wesaidthatapageshouldrankhigherifitmentionsawordmore§ Perhapsmodulatedbythingslikepagelength

§ Wemightwantamodelwithtermfrequencyinit.

§ We’llseeaprobabilisticonenexttime– BM25

§ Quicksummaryofvectorspacemodel

IntroductiontoInformationRetrieval

Summary– vectorspaceranking

§ Representthequeryasaweightedtermfrequency/inversedocumentfrequency(tf-idf)vector

§ Representeachdocumentasaweightedtf-idf vector§ Computethecosinesimilarityscoreforthequeryvectorandeachdocumentvector

§ Rankdocumentswithrespecttothequerybyscore§ ReturnthetopK (e.g.,K =10)totheuser

IntroductiontoInformationRetrieval IntroductiontoInformationRetrieval

Cosinesimilarity

Sec. 6.3

Page 7: Introduction to Information Retrieval - Stanford Universityweb.stanford.edu/class/cs276/handouts/lecture11-probir-6up.pdf · Probabilistic Information Retrieval Christopher Manningand

5/21/17

7

IntroductiontoInformationRetrieval

tf-idfweightinghasmanyvariants

Sec. 6.4 IntroductiontoInformationRetrieval

ResourcesS.E.RobertsonandK.Spärck Jones.1976.RelevanceWeightingofSearch

Terms.JournaloftheAmericanSocietyforInformationSciences27(3):129–146.

C.J.vanRijsbergen.1979.InformationRetrieval. 2nded.London:Butterworths,chapter6.[Mostdetailsofmath]http://www.dcs.gla.ac.uk/Keith/Preface.html

N.Fuhr.1992.ProbabilisticModelsinInformationRetrieval.TheComputerJournal,35(3),243–255.[Easiestread,withBNs]

F.Crestani,M.Lalmas,C.J.vanRijsbergen,andI.Campbell.1998.IsThisDocumentRelevant?…Probably:ASurveyofProbabilisticModelsinInformationRetrieval.ACMComputingSurveys 30(4):528–552.http://www.acm.org/pubs/citations/journals/surveys/1998-30-4/p528-crestani/[Addsverylittlematerialthatisn’tinvanRijsbergen orFuhr ]