Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language...

23
Sumit Bhatia

Transcript of Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language...

Page 1: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Sumit Bhatia

Page 2: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Introduction

Inference Network based Retrieval Model

Experimental Results

Conclusions

2

Page 3: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

What are the important characteristics of aretrieval model?

◦ Document Representation

◦ Query Representation

◦ Matching Function

3

Page 4: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Probabilistic Ranking Principle◦ P(R=1|d,q)

◦ 1-0 Loss model, BM25 Model

Language Models◦ P(d|q)

◦ Query Likelihood

◦ Relevance Based LM

◦ Translation Models

◦ Query document divergence

4

Page 5: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Utilize multiple document representation schemes

5

CiteSeerX Document

TitleFull TextAbstract

Page 6: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Information Need : To find about trade relationshipsbetween India and China

How to combine these results???

6

India China TradeSino India

Trade

Page 7: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Nittany Lions vs Buckeyes

Or

Penn State vs Ohio State

Different vocabularies of document authorand query author

Mapping between related concepts

7

Page 8: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Turtle and Croft (1991) introduced a retrievalmodel based on Inference Networks

Described IR as an inference process

◦ Users has information need I

◦ Expressed through a query Q

◦ A document is “observed”

◦ Estimate the probability that user’s informationneed is satisfied by the document

◦ Different query/document representations providethe evidence

8

Page 9: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Inference Network (Inet)

• Directed Acyclic Graph

• Each node represents an event with a continuous/discrete outcome• In this talk, we assume binary outputs

• Root node event occurs with a prior probability

• Non root nodes store a table of conditionalprobabilities aka Link Matrix describing itsoutcome given parent node

9

Page 10: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

4 types of nodes:

◦ Document Nodes

◦ Representation Nodes

◦ Query Representation Nodes

◦ Information Need Node

10

Page 11: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

One node for each document in the corpus

Corresponds to the event that document di is observed

Only one document observed at a point

11

d1 d2 dn

Page 12: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

A set of representation nodes

Representation = an indexable feature of document, eg. terms, phrases etc.

Each document node acts as a parent for representation nodes

12

r1 r2 rk

Page 13: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Since only one document is observed at a time, we wish to estimate the following for representation nodes:

Also known as belief, bel(rk)

13

r1 r2 rk

Page 14: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Combine beliefs about representation nodes and other query nodes

Enable structured query operators

Need to compute Link Matrices

14

Page 15: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

15

Suppose query node q has n parent nodes with corresponding beliefs p1, p2, …, pn

There are 2n possible parent states and 2 possible outcomes of q

Represented by a 2 x 2n Link Matrix

Page 16: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Represents the event that the information need of user is met

Combines evidence from all other query nodes into a single belief

Documents are ranked by this belief score

16

Page 17: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

17

D

R3R2R1

I

q2q1

Document Network

Query Network

Representation Nodebeliefs can be computedusing LMs

Page 18: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Trec 4,6,7, and 8 datasets

For Trec 4, structured queries were available

For Trec 6,7 and 8 structured queries were generated from title and description fields

Example structured query for free text query “capital punishment deterrent crime”

18

Page 19: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

19

Page 20: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Part of Lemur Toolkit

Based on Inference Networks and LMs

Powerful query language to support structured queries and evidence combination

Different document types – TREC style, XML, HTML, .doc

XML retrieval, sentence retrieval, passage retrieval, extent retrieval

Scalable

20

Page 21: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

Information Retrieval can be considered as aninference process

Inference Networks provide a theoreticallysound framework to infer the probability thata given document satisfies user’s informationneeds

Language Models provide a means toestimate the beliefs/probabilities

21

Page 22: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,

1. D. Metzler and W.B. Croft. Combining the languagemodel and inference network approaches to retrieval.Info. Proc. and Mgt. 40(5):735-750, 2004.

2. Lemur Project Website: http://www.lemurproject.org

3. Trevor Strohman, Donald Metzler, Howard Turtle andW. Bruce Croft. Indri: A language-model based searchengine for complex queries. Umass Technical Report

4. H. Turtle and W. B. Croft, Evaluation of an inferencenetwork based retrieval model. Trans. Inf. Syst.,9(3):187-22, 1991.

22

Page 23: Sumit Bhatiasumitbhatia.net/Talks/inet.pdf · 1. D. Metzler and W.B. Croft. Combining the language model and inference network approaches to retrieval. Info. Proc. and Mgt. 40(5):735-750,