
Introduction to Information Retrieval

Probabilistic Information Retrieval
Christopher Manning and Pandu Nayak


FROM BOOLEAN TO RANKED RETRIEVAL … IN TWO STEPS


Ranked retrieval
§ Thus far, our queries have all been Boolean
  § Documents either match or don't
  § Can be good for expert users with a precise understanding of their needs and the collection
  § Can also be good for applications: applications can easily consume 1000s of results
§ Not good for the majority of users
  § Most users are incapable of writing Boolean queries
  § Or they are, but they think it's too much work
  § Most users don't want to wade through 1000s of results
  § This is particularly true of web search

Ch. 6

Problem with Boolean search: feast or famine
§ Boolean queries often result in either too few (= 0) or too many (1000s) results.
  § Query 1: "standard user dlink 650" → 200,000 hits
  § Query 2: "standard user dlink 650 no card found" → 0 hits
§ It takes a lot of skill to come up with a query that produces a manageable number of hits.
  § AND gives too few; OR gives too many

Ch. 6


Who are these people?

Stephen Robertson, Keith van Rijsbergen, Karen Spärck Jones


Why probabilities in IR?

[Diagram: the user's Information Need yields a Query Representation; the Documents yield a Document Representation; the system must match the two. Our understanding of the user's need is uncertain, as is our guess of whether a document has relevant content. How to match?]

In traditional IR systems, matching between each document and query is attempted in a semantically imprecise space of index terms.

Probabilities provide a principled foundation for uncertain reasoning. Can we use probabilities to quantify our uncertainties?


Probabilistic IR topics

1. Classical probabilistic retrieval model
   § Probability ranking principle, etc.
   § Binary independence model (≈ Naïve Bayes text categorization)
   § (Okapi) BM25
2. Bayesian networks for text retrieval
3. Language model approach to IR
   § An important emphasis in recent work

Probabilistic methods are one of the oldest but also one of the currently hot topics in IR
§ Traditionally: neat ideas, but didn't win on performance
§ It seems to be different now


The document ranking problem
§ We have a collection of documents
§ User issues a query
§ A list of documents needs to be returned
§ Ranking method is the core of modern IR systems:
  § In what order do we present documents to the user?
  § We want the "best" document to be first, second best second, etc. …
§ Idea: Rank by probability of relevance of the document w.r.t. the information need
  § $P(R = 1 \mid \text{document}_i, \text{query})$

Recall a few probability basics

§ For events A and B:

$p(A, B) = p(A \cap B) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$

§ Bayes' Rule:

$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)} = \frac{p(B \mid A)\,p(A)}{\sum_{X \in \{A, \overline{A}\}} p(B \mid X)\,p(X)}$

where $p(A)$ is the prior and $p(A \mid B)$ is the posterior.

§ Odds:

$O(A) = \frac{p(A)}{p(\overline{A})} = \frac{p(A)}{1 - p(A)}$
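A quick numeric check of these identities (a toy sketch in Python; the numbers are made up):

```python
# Hedged illustration of Bayes' Rule and odds: only the algebra matters here.
p_A = 0.3              # prior p(A)
p_B_given_A = 0.8      # likelihood p(B|A)
p_B_given_notA = 0.2   # p(B|~A)

# Total probability: p(B) = p(B|A)p(A) + p(B|~A)p(~A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' Rule: posterior p(A|B)
p_A_given_B = p_B_given_A * p_A / p_B

# Odds: O(A) = p(A) / (1 - p(A))
odds_A = p_A / (1 - p_A)

print(p_A_given_B)  # ~0.632
print(odds_A)       # ~0.429
```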


The Probability Ranking Principle

"If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, the overall effectiveness of the system to its user will be the best that is obtainable on the basis of those data."

§ [1960s/1970s] S. Robertson, W. S. Cooper, M. E. Maron; van Rijsbergen (1979:113); Manning & Schütze (1999:538)


Probability Ranking Principle

Let x represent a document in the collection. Let R represent relevance of a document w.r.t. a given (fixed) query, with R = 1 for relevant and R = 0 for not relevant.

Need to find p(R = 1 | x) – the probability that a document x is relevant.

$p(R=1 \mid x) = \frac{p(x \mid R=1)\,p(R=1)}{p(x)}$

$p(R=0 \mid x) = \frac{p(x \mid R=0)\,p(R=0)}{p(x)}$

§ p(R=1), p(R=0): prior probability of retrieving a relevant or non-relevant document
§ p(x | R=1), p(x | R=0): probability that if a relevant (non-relevant) document is retrieved, it is x

$p(R=0 \mid x) + p(R=1 \mid x) = 1$


Probability Ranking Principle (PRP)
§ Simple case: no selection costs or other utility concerns that would differentially weight errors
§ PRP in action: rank all documents by p(R = 1 | x)
§ Theorem: Using the PRP is optimal, in that it minimizes the loss (Bayes risk) under 1/0 loss
  § Provable if all probabilities are correct, etc. [e.g., Ripley 1996]
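A minimal sketch of the PRP in action, assuming the probabilities p(R=1|x) have already been estimated somehow (document IDs and values are hypothetical):

```python
# Given estimated probabilities of relevance (however obtained),
# the PRP says: rank documents by decreasing p(R=1|x) -- just a sort.
estimated_p_relevant = {"d1": 0.20, "d2": 0.85, "d3": 0.55}

ranking = sorted(estimated_p_relevant, key=estimated_p_relevant.get, reverse=True)
print(ranking)  # ['d2', 'd3', 'd1']
```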


Probability Ranking Principle

§ More complex case: retrieval costs.
  § Let d be a document
  § C – cost of not retrieving a relevant document
  § C′ – cost of retrieving a non-relevant document
§ Probability Ranking Principle: if

$C' \cdot p(R=0 \mid d) - C \cdot p(R=1 \mid d) \;\le\; C' \cdot p(R=0 \mid d') - C \cdot p(R=1 \mid d')$

for all d′ not yet retrieved, then d is the next document to be retrieved
§ We won't further consider cost/utility from now on


Probability Ranking Principle
§ How do we compute all those probabilities?
  § We do not know the exact probabilities; we have to use estimates
  § The Binary Independence Model (BIM) – which we discuss next – is the simplest model
§ Questionable assumptions
  § "Relevance" of each document is independent of the relevance of other documents.
    § Really, it's bad to keep on returning duplicates
  § Boolean model of relevance
  § That one has a single-step information need
    § Seeing a range of results might let the user refine the query


Probabilistic Retrieval Strategy

§ Estimate how terms contribute to relevance
  § How do other things like term frequency and document length influence your judgments about document relevance?
    § Not at all in BIM
    § A more nuanced answer is the Okapi (BM25) formulae [next time]
  § Spärck Jones / Robertson
§ Combine to find document relevance probability
§ Order documents by decreasing probability


Probabilistic Ranking

Basic concept: "For a given query, if we know some documents that are relevant, terms that occur in those documents should be given greater weighting in searching for other relevant documents.

By making assumptions about the distribution of terms and applying Bayes' Theorem, it is possible to derive weights theoretically."

– van Rijsbergen


Binary Independence Model
§ Traditionally used in conjunction with the PRP
§ "Binary" = Boolean: documents are represented as binary incidence vectors of terms (cf. IIR Chapter 1):
  § $\vec{x} = (x_1, \ldots, x_n)$
  § $x_i = 1$ iff term i is present in document x.
§ "Independence": terms occur in documents independently
§ Different documents can be modeled as the same vector
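A small sketch of the binary incidence representation (the vocabulary and document are made up; note that counts are discarded):

```python
# Represent a document as a binary term incidence vector over a fixed vocabulary.
vocabulary = ["antony", "brutus", "caesar", "calpurnia"]

def incidence_vector(doc_tokens, vocabulary):
    """x_i = 1 iff term i is present in the document (term counts are ignored)."""
    present = set(doc_tokens)
    return [1 if term in present else 0 for term in vocabulary]

doc = "antony and caesar spoke to caesar".split()
print(incidence_vector(doc, vocabulary))  # [1, 0, 1, 0]
```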


Binary Independence Model
§ Queries: binary term incidence vectors
§ Given query q,
  § for each document d we need to compute p(R | q, d).
  § replace with computing p(R | q, x), where x is the binary term incidence vector representing d.
§ Interested only in ranking
§ Will use odds and Bayes' Rule:

$O(R \mid q, \vec{x}) = \frac{p(R=1 \mid q, \vec{x})}{p(R=0 \mid q, \vec{x})} = \frac{\;\frac{p(R=1 \mid q)\,p(\vec{x} \mid R=1, q)}{p(\vec{x} \mid q)}\;}{\;\frac{p(R=0 \mid q)\,p(\vec{x} \mid R=0, q)}{p(\vec{x} \mid q)}\;}$


Binary Independence Model

$O(R \mid q, \vec{x}) = \frac{p(R=1 \mid q, \vec{x})}{p(R=0 \mid q, \vec{x})} = \frac{p(R=1 \mid q)}{p(R=0 \mid q)} \cdot \frac{p(\vec{x} \mid R=1, q)}{p(\vec{x} \mid R=0, q)}$

The first factor is constant for a given query; the second needs estimation.

• Using the independence assumption:

$\frac{p(\vec{x} \mid R=1, q)}{p(\vec{x} \mid R=0, q)} = \prod_{i=1}^{n} \frac{p(x_i \mid R=1, q)}{p(x_i \mid R=0, q)}$

• So:

$O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R=1, q)}{p(x_i \mid R=0, q)}$


Binary Independence Model

• Since each $x_i$ is either 0 or 1:

$O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i=1} \frac{p(x_i=1 \mid R=1, q)}{p(x_i=1 \mid R=0, q)} \cdot \prod_{x_i=0} \frac{p(x_i=0 \mid R=1, q)}{p(x_i=0 \mid R=0, q)}$

• Let $p_i = p(x_i=1 \mid R=1, q)$ and $r_i = p(x_i=1 \mid R=0, q)$.

• Assume, for all terms not occurring in the query ($q_i = 0$), that $p_i = r_i$. Then:

$O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{\substack{x_i=1 \\ q_i=1}} \frac{p_i}{r_i} \cdot \prod_{\substack{x_i=0 \\ q_i=1}} \frac{1-p_i}{1-r_i}$
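A toy sketch of this product form, with hypothetical estimates for O(R|q), p_i, and r_i:

```python
# Compute O(R|q,x) for one document: for each query term, multiply in
# p_i/r_i if the term is present, (1-p_i)/(1-r_i) if absent.
O_prior = 0.05             # O(R|q), constant for the query (made up)
p = [0.6, 0.4, 0.5]        # p_i = p(x_i=1 | R=1, q) for three query terms
r = [0.1, 0.2, 0.5]        # r_i = p(x_i=1 | R=0, q)
x = [1, 0, 1]              # incidence of the query terms in this document

odds = O_prior
for pi, ri, xi in zip(p, r, x):
    odds *= (pi / ri) if xi == 1 else ((1 - pi) / (1 - ri))
print(odds)  # 0.05 * 6 * 0.75 * 1 = 0.225
```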


document                relevant (R=1)    not relevant (R=0)
term present, x_i = 1   p_i               r_i
term absent,  x_i = 0   1 − p_i           1 − r_i


Binary Independence Model

Insert a factor of 1 over the matching terms, then regroup so that one product runs over all query terms:

$O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i=q_i=1} \frac{p_i}{r_i} \cdot \prod_{\substack{x_i=0 \\ q_i=1}} \frac{1-p_i}{1-r_i}$

$= O(R \mid q) \cdot \prod_{x_i=q_i=1} \left( \frac{p_i}{r_i} \cdot \frac{1-r_i}{1-p_i} \right) \cdot \prod_{x_i=q_i=1} \frac{1-p_i}{1-r_i} \cdot \prod_{\substack{x_i=0 \\ q_i=1}} \frac{1-p_i}{1-r_i}$

$O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{x_i=q_i=1} \frac{p_i(1-r_i)}{r_i(1-p_i)} \cdot \prod_{q_i=1} \frac{1-p_i}{1-r_i}$

(The first product is over all matching query terms; the second is over all query terms.)


Binary Independence Model

$O(R \mid q, \vec{x}) = O(R \mid q) \cdot \prod_{q_i=1} \frac{1-p_i}{1-r_i} \cdot \prod_{x_i=q_i=1} \frac{p_i(1-r_i)}{r_i(1-p_i)}$

The first product is constant for each query; the second is the only quantity that needs to be estimated for ranking.

Retrieval Status Value:

$RSV = \log \prod_{x_i=q_i=1} \frac{p_i(1-r_i)}{r_i(1-p_i)} = \sum_{x_i=q_i=1} \log \frac{p_i(1-r_i)}{r_i(1-p_i)}$


Binary Independence Model

All boils down to computing the RSV:

$RSV = \log \prod_{x_i=q_i=1} \frac{p_i(1-r_i)}{r_i(1-p_i)} = \sum_{x_i=q_i=1} \log \frac{p_i(1-r_i)}{r_i(1-p_i)}$

$RSV = \sum_{x_i=q_i=1} c_i, \qquad c_i = \log \frac{p_i(1-r_i)}{r_i(1-p_i)}$

The $c_i$ are log odds ratios. They function as the term weights in this model.

So, how do we compute the $c_i$'s from our data?
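Before turning to that question, a minimal sketch of the scoring side: with the c_i in hand, ranking a document is just a sum over matching query terms (weights and document below are hypothetical):

```python
# RSV = sum of c_i over query terms present in the document.
c = {"standard": 0.4, "user": 0.1, "dlink": 2.3, "650": 1.7}  # hypothetical term weights
doc_terms = {"user", "dlink", "650", "manual"}

rsv = sum(ci for term, ci in c.items() if term in doc_terms)
print(rsv)  # 0.1 + 2.3 + 1.7 ≈ 4.1
```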


Binary Independence Model
• Estimating RSV coefficients in theory
• For each term i, look at this table of document counts:

            Relevant   Non-relevant    Total
x_i = 1     s          n − s           n
x_i = 0     S − s      N − n − S + s   N − n
Total       S          N − S           N

• Estimates:

$p_i \approx \frac{s}{S} \qquad r_i \approx \frac{n-s}{N-S}$

$c_i \approx K(N, n, S, s) = \log \frac{s\,(N-n-S+s)}{(S-s)\,(n-s)}$

• For now, assume no zero terms. Remember smoothing.
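A hedged sketch of these estimates in Python; the add-0.5 smoothing is the standard fix alluded to by "remember smoothing", and all counts are hypothetical:

```python
import math

def c_i(N, n, S, s):
    """Log odds ratio c_i from the contingency table, with add-0.5 smoothing
    so that zero cells don't make the log blow up."""
    p_i = (s + 0.5) / (S + 1.0)            # ~ s / S
    r_i = (n - s + 0.5) / (N - S + 1.0)    # ~ (n - s) / (N - S)
    return math.log(p_i * (1 - r_i) / (r_i * (1 - p_i)))

# Hypothetical counts: 100,000 docs, 250 contain term i; of 10 judged
# relevant docs, 6 contain term i.
print(c_i(N=100_000, n=250, S=10, s=6))
```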


Estimation – key challenge

§ If non-relevant documents are approximated by the whole collection, then $r_i$ (prob. of occurrence in non-relevant documents for the query) is $n/N$ and

$\log \frac{1-r_i}{r_i} = \log \frac{N-n-S+s}{n-s} \approx \log \frac{N-n}{n} \approx \log \frac{N}{n} = \text{IDF}!$


Collection vs. Document frequency

§ Collection frequency of t is the total number of occurrences of t in the collection (incl. multiples)
§ Document frequency is the number of docs t is in
§ Example:

Word        Collection frequency    Document frequency
insurance   10440                   3997
try         10422                   8760

§ Which word is a better search term (and should get a higher weight)? Insurance: its occurrences are concentrated in far fewer documents, so it discriminates better, which is exactly what document-frequency-based (idf) weighting captures.

Sec. 6.2.1

Estimation – key challenge

§ $p_i$ (probability of occurrence in relevant documents) cannot be approximated as easily
§ $p_i$ can be estimated in various ways:
  § from relevant documents if you know some
    § Relevance weighting can be used in a feedback loop
  § as a constant (Croft and Harper combination match) – then we just get idf weighting of terms (with $p_i = 0.5$):
    $RSV = \sum_{x_i=q_i=1} \log \frac{N}{n_i}$
  § proportional to prob. of occurrence in collection
    § Greiff (SIGIR 1998) argues for $1/3 + 2/3 \cdot df_i/N$
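A toy sketch of the Croft and Harper special case, where the RSV reduces to summing idf-like weights (N here is hypothetical; the document frequencies reuse the example above):

```python
import math

# With p_i = 0.5, the c_i cancel down to log(N / n_i): pure idf weighting.
N = 100_000
df = {"insurance": 3997, "try": 8760}   # n_i, borrowed from the earlier table

def rsv_idf(query_terms, doc_terms):
    return sum(math.log(N / df[t]) for t in query_terms
               if t in doc_terms and t in df)

print(rsv_idf({"insurance", "try"}, {"insurance", "policy"}))  # only "insurance" matches
```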


Probabilistic Relevance Feedback
1. Guess a preliminary probabilistic description of the R=1 documents; use it to retrieve a set of documents
2. Interact with the user to refine the description: learn some definite members with R=1 and R=0
3. Re-estimate $p_i$ and $r_i$ on the basis of these
   § If $V_i$ is the subset of the document set V in which term i appears: $p_i = |V_i| / |V|$
   § Or can combine the new information with the original guess (use a Bayesian prior):
     $p_i^{(2)} = \frac{|V_i| + \kappa \, p_i^{(1)}}{|V| + \kappa}$  where $\kappa$ is the prior weight
4. Repeat, thus generating a succession of approximations to the relevant documents
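A one-function sketch of the Bayesian combination in step 3 (the choice κ = 5 is arbitrary, and the counts are hypothetical):

```python
def update_p(V_i, V, p_prev, kappa=5.0):
    """Bayesian-prior combination: p_i^(2) = (|V_i| + kappa*p_i^(1)) / (|V| + kappa)."""
    return (V_i + kappa * p_prev) / (V + kappa)

# Hypothetical: term i occurs in 7 of 10 known-relevant docs; previous guess was 0.5.
print(update_p(V_i=7, V=10, p_prev=0.5))  # (7 + 2.5) / 15 ≈ 0.633
```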


Iteratively estimating $p_i$ and $r_i$ (= pseudo-relevance feedback)
1. Assume that $p_i$ is constant over all $x_i$ in the query, and $r_i$ as before
   § $p_i = 0.5$ (even odds) for any given doc
2. Determine a guess of the relevant document set:
   § V is a fixed-size set of the highest-ranked documents on this model
3. We need to improve our guesses for $p_i$ and $r_i$, so
   § Use the distribution of $x_i$ in the docs in V. Let $V_i$ be the set of documents containing $x_i$
     § $p_i = |V_i| / |V|$
   § Assume if not retrieved then not relevant
     § $r_i = (n_i - |V_i|) / (N - |V|)$
4. Go to 2. until it converges, then return the ranking
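A compact sketch of this loop, assuming documents are sets of terms; the smoothing constants are an added assumption to keep the logs finite:

```python
import math

def prf_rank(query, docs, V_size=5, max_iters=10):
    """Pseudo-relevance feedback: start with p_i = 0.5 and collection-based r_i,
    then re-estimate both from the top-|V| documents until the ranking stabilizes."""
    N = len(docs)
    n = {t: sum(t in d for d in docs) for t in query}   # doc frequency per query term
    p = {t: 0.5 for t in query}                         # step 1: even odds
    r = {t: n[t] / N for t in query}

    def score(doc):
        # RSV with the current estimates (log odds ratio weights c_i)
        return sum(math.log(p[t] * (1 - r[t]) / (r[t] * (1 - p[t])))
                   for t in query if t in doc and 0 < r[t] < 1 and 0 < p[t] < 1)

    ranking = None
    for _ in range(max_iters):
        new_ranking = sorted(range(N), key=lambda i: score(docs[i]), reverse=True)
        if new_ranking == ranking:      # step 4: stop once the ranking converges
            break
        ranking = new_ranking
        V = ranking[:V_size]            # step 2: top-|V| docs assumed relevant
        for t in query:                 # step 3: re-estimate, smoothed to avoid zeros
            Vi = sum(t in docs[i] for i in V)
            p[t] = (Vi + 0.5) / (len(V) + 1)
            r[t] = (n[t] - Vi + 0.5) / (N - len(V) + 1)
    return ranking

docs = [{"dlink", "650"}, {"dlink"}, {"650", "card"}, {"router"}]
print(prf_rank({"dlink", "650"}, docs, V_size=2))  # [0, 1, 2, 3]
```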


PRP and BIM

§ Getting reasonable approximations of probabilities is possible.
§ Requires restrictive assumptions:
  § Term independence
  § Terms not in the query don't affect the outcome
  § Boolean representation of documents/queries/relevance
  § Document relevance values are independent
§ Some of these assumptions can be removed
§ Problem: either require partial relevance information or seemingly can only derive somewhat inferior term weights


Removing term independence
§ In general, index terms aren't independent
§ Dependencies can be complex
§ van Rijsbergen (1979) proposed a simple model of dependencies as a tree
  § Exactly Friedman and Goldszmidt's Tree Augmented Naive Bayes (AAAI 1996)
§ Each term dependent on one other
§ In the 1970s, estimation problems held back the success of this model


Second step: term frequency
§ Right in the first lecture, we said that a page should rank higher if it mentions a word more
  § Perhaps modulated by things like page length
§ We might want a model with term frequency in it.
§ We'll see a probabilistic one next time – BM25
§ Quick summary of the vector space model


Summary – vector space ranking

§ Represent the query as a weighted term frequency / inverse document frequency (tf-idf) vector
§ Represent each document as a weighted tf-idf vector
§ Compute the cosine similarity score for the query vector and each document vector
§ Rank documents with respect to the query by score
§ Return the top K (e.g., K = 10) to the user
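A toy sketch of this pipeline end to end (the four-document collection is made up; tf uses 1 + log10 weighting, one common variant):

```python
import math
from collections import Counter

docs = ["new home sales top forecasts",
        "home sales rise in july",
        "increase in home sales in july",
        "july new home sales rise"]
tokenized = [d.split() for d in docs]
N = len(tokenized)
df = Counter(t for d in tokenized for t in set(d))   # document frequency

def tfidf(tokens):
    """Weighted tf-idf vector: (1 + log10 tf) * log10(N / df)."""
    tf = Counter(tokens)
    return {t: (1 + math.log10(tf[t])) * math.log10(N / df[t])
            for t in tf if t in df}

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(u[t] * v[t] for t in u if t in v) / (nu * nv)

query_vec = tfidf("home sales july".split())
doc_vecs = [tfidf(d) for d in tokenized]
K = 3
top = sorted(range(N), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)[:K]
print(top)   # indices of the top-K documents by cosine score
```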

Cosine similarity

$\cos(\vec{q}, \vec{d}\,) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}\,|\,|\vec{d}\,|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$

Sec. 6.3


tf-idf weighting has many variants

Sec. 6.4

Resources

S. E. Robertson and K. Spärck Jones. 1976. Relevance Weighting of Search Terms. Journal of the American Society for Information Sciences 27(3): 129–146.

C. J. van Rijsbergen. 1979. Information Retrieval. 2nd ed. London: Butterworths, chapter 6. [Most details of the math] http://www.dcs.gla.ac.uk/Keith/Preface.html

N. Fuhr. 1992. Probabilistic Models in Information Retrieval. The Computer Journal 35(3): 243–255. [Easiest read, with BNs]

F. Crestani, M. Lalmas, C. J. van Rijsbergen, and I. Campbell. 1998. Is This Document Relevant? … Probably: A Survey of Probabilistic Models in Information Retrieval. ACM Computing Surveys 30(4): 528–552. http://www.acm.org/pubs/citations/journals/surveys/1998-30-4/p528-crestani/ [Adds very little material that isn't in van Rijsbergen or Fuhr]