Dynamic Information Retrieval Tutorial

SIGIR Tutorial, July 7th 2014 — Grace Hui Yang, Marc Sloan, Jun Wang; Guest Speaker: Emine Yilmaz — Dynamic Information Retrieval Modeling

Description

Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Examples include large datasets containing sequential data capturing document dynamics and modern IR systems observing user dynamics through interactivity. Existing IR techniques are limited in their ability to optimize over changes, learn with minimal computational footprint and be responsive and adaptive. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. Dynamic IR Modeling is the statistical modeling of IR systems that can adapt to change. It is a natural follow-up to previous statistical IR modeling tutorials with a fresh look on state-of-the-art dynamic retrieval models and their applications including session search and online advertising. The tutorial covers techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and presents to fellow researchers and practitioners a handful of useful algorithms and tools for solving IR problems incorporating dynamics.

http://www.dynamic-ir-modeling.org/

@inproceedings{Yang:2014:DIR:2600428.2602297,
  author    = {Yang, Hui and Sloan, Marc and Wang, Jun},
  title     = {Dynamic Information Retrieval Modeling},
  booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
  series    = {SIGIR '14},
  year      = {2014},
  isbn      = {978-1-4503-2257-7},
  location  = {Gold Coast, Queensland, Australia},
  pages     = {1290--1290},
  numpages  = {1},
  url       = {http://doi.acm.org/10.1145/2600428.2602297},
  doi       = {10.1145/2600428.2602297},
  acmid     = {2602297},
  publisher = {ACM},
  address   = {New York, NY, USA},
  keywords  = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}

Transcript of Dynamic Information Retrieval Tutorial

Page 1: Dynamic Information Retrieval Tutorial

SIGIR Tutorial July 7th 2014

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Emine Yilmaz

Dynamic Information Retrieval

Modeling

Page 2: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 2

Page 3: Dynamic Information Retrieval Tutorial

Age of Empire

Dynamic Information Retrieval Modeling Tutorial 2014 3

Page 4: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval

Dynamic Information Retrieval Modeling Tutorial 2014 4

(Diagram: a User with an information need explores a space of Documents, some of which are Observed documents.)

Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.

Page 5: Dynamic Information Retrieval Tutorial

Evolving IR

Dynamic Information Retrieval Modeling Tutorial 2014 5

Paradigm shifts in IR as new models emerge
e.g. VSM → BM25 → Language Model
Different ways of defining the relationship between query and document

Static → Interactive → Dynamic
Evolution in modeling user interaction with the search engine

Page 6: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 6

Introduction

Static IR

Interactive IR

Dynamic IR

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 7: Dynamic Information Retrieval Tutorial

Conceptual Model – Static IR

Dynamic Information Retrieval Modeling Tutorial 2014 7

(Diagram: Static IR → Interactive IR → Dynamic IR; Static IR uses no feedback.)

Page 8: Dynamic Information Retrieval Tutorial

Characteristics of Static IR

Dynamic Information Retrieval Modeling Tutorial 2014 8

Does not learn directly from user

Parameters updated periodically

Page 9: Dynamic Information Retrieval Tutorial

Static Information Retrieval Model

Dynamic Information Retrieval Modeling Tutorial 2014 9

Learning to Rank

Page 10: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 10

Commonly Used Static IR Models

BM25

PageRank

Language Model

Page 11: Dynamic Information Retrieval Tutorial

Feedback in IR

Dynamic Information Retrieval Modeling Tutorial 2014 11

Page 12: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 12

Introduction

Static IR

Interactive IR

Dynamic IR

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 13: Dynamic Information Retrieval Tutorial

Conceptual Model – Interactive IR

Dynamic Information Retrieval Modeling Tutorial 2014 13

(Diagram: Static IR → Interactive IR → Dynamic IR; Interactive IR exploits feedback.)

Page 14: Dynamic Information Retrieval Tutorial

Interactive User Feedback

Dynamic Information Retrieval Modeling Tutorial 2014 14

Like, dislike, pause, skip

Page 15: Dynamic Information Retrieval Tutorial

Learn the user's taste interactively! At the same time, provide good recommendations!

Dynamic Information Retrieval Modeling Tutorial 2014 15

Interactive Recommender Systems

Page 16: Dynamic Information Retrieval Tutorial

Example - Multi Page Search

Dynamic Information Retrieval Modeling Tutorial 2014 16

Ambiguous Query

Page 17: Dynamic Information Retrieval Tutorial

Example - Multi Page Search

Dynamic Information Retrieval Modeling Tutorial 2014 17

Topic: Car

Page 18: Dynamic Information Retrieval Tutorial

Example - Multi Page Search

Dynamic Information Retrieval Modeling Tutorial 2014 18

Topic: Animal

Page 19: Dynamic Information Retrieval Tutorial

Example – Interactive Search

Dynamic Information Retrieval Modeling Tutorial 2014 19

Click on 'car' webpage

Page 20: Dynamic Information Retrieval Tutorial

Example – Interactive Search

Dynamic Information Retrieval Modeling Tutorial 2014 20

Click on 'Next Page'

Page 21: Dynamic Information Retrieval Tutorial

Example – Interactive Search

Dynamic Information Retrieval Modeling Tutorial 2014 21

Page 2 results:

Cars

Page 22: Dynamic Information Retrieval Tutorial

Example – Interactive Search

Dynamic Information Retrieval Modeling Tutorial 2014 22

Click on 'animal' webpage

Page 23: Dynamic Information Retrieval Tutorial

Example – Interactive Search

Dynamic Information Retrieval Modeling Tutorial 2014 23

Page 2 results:

Animals

Page 24: Dynamic Information Retrieval Tutorial

Example – Dynamic Search

Dynamic Information Retrieval Modeling Tutorial 2014 24

Topic: Guitar

Page 25: Dynamic Information Retrieval Tutorial

Example – Dynamic Search

Dynamic Information Retrieval Modeling Tutorial 2014 25

Diversified Page 1 — Topics: Cars, animals, guitars

Page 26: Dynamic Information Retrieval Tutorial

Toy Example

Dynamic Information Retrieval Modeling Tutorial 2014 26

Multi-Page search scenario

User image searches for “jaguar”

Rank two of the four results over two pages:

r = 0.5,  r = 0.51,  r = 0.9,  r = 0.49  (relevance of the four candidate images)

Page 27: Dynamic Information Retrieval Tutorial

Toy Example – Static Ranking

Dynamic Information Retrieval Modeling Tutorial 2014 27

Ranked according to PRP
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49

Page 28: Dynamic Information Retrieval Tutorial

Toy Example – Relevance Feedback

Dynamic Information Retrieval Modeling Tutorial 2014 28

Interactive Search

Improve 2nd page based on feedback from 1st page

Use clicks as relevance feedback

Rocchio1 algorithm on terms in image webpage

w_q' = α·w_q + (β/|D_r|) Σ_{d∈D_r} w_d − (γ/|D_n|) Σ_{d∈D_n} w_d

The new query is closer to the relevant documents and further from the non-relevant documents.

[1] Rocchio, J. J., '71; Baeza-Yates & Ribeiro-Neto, '99
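As a rough illustration of the Rocchio update above, here is a minimal Python sketch (not from the tutorial); the toy term weights, parameter values and the helper name rocchio are assumptions for illustration only.

```python
# Illustrative sketch: Rocchio update over toy term-weight vectors.
from collections import defaultdict

def rocchio(query, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Return w_q' = alpha*w_q + beta/|Dr| * sum_d w_d - gamma/|Dn| * sum_d w_d."""
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for d in relevant_docs:
        for term, w in d.items():
            new_q[term] += beta * w / len(relevant_docs)
    for d in nonrelevant_docs:
        for term, w in d.items():
            new_q[term] -= gamma * w / len(nonrelevant_docs)
    # Negative weights are usually clipped to zero.
    return {t: w for t, w in new_q.items() if w > 0}

q = {"jaguar": 1.0}
clicked = [{"jaguar": 0.8, "car": 0.6}]      # page-1 click, treated as relevant
skipped = [{"jaguar": 0.7, "animal": 0.5}]   # page-1 skip, treated as non-relevant
print(rocchio(q, clicked, skipped))          # query drifts toward 'car' terms
```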

Page 29: Dynamic Information Retrieval Tutorial

Toy Example – Relevance Feedback

Dynamic Information Retrieval Modeling Tutorial 2014 29

Ranked according to PRP and Rocchio
Page 1: 1. r = 0.9, 2. r = 0.51 — one result is clicked (*)
Page 2 (re-ranked with Rocchio feedback): 1. r = 0.5, 2. r = 0.49

Page 30: Dynamic Information Retrieval Tutorial

Toy Example – Relevance Feedback

Dynamic Information Retrieval Modeling Tutorial 2014 30

No click when searching for animals
Page 1: 1. r = 0.9, 2. r = 0.51 (no clicks)
Page 2: ?, ? — with no feedback there is nothing to update the second-page ranking with

Page 31: Dynamic Information Retrieval Tutorial

Toy Example – Value Function

Dynamic Information Retrieval Modeling Tutorial 2014 31

Optimize both pages using dynamic IR

Bellman equation for value function

Simplified example:

V_t(θ_t, Σ_t) = max_{s_t} [ θ_{s_t} + E( V_{t+1}(θ_{t+1}, Σ_{t+1}) | C_t ) ]

𝜃𝑡, Σ𝑡 = relevance and covariance of documents for page 𝑡

𝐶𝑡 = clicks on page 𝑡

𝑉𝑡 = ‘value’ of ranking on page 𝑡

Maximize value over all pages based on estimating feedback

Page 32: Dynamic Information Retrieval Tutorial

Toy Example - Covariance

Dynamic Information Retrieval Modeling Tutorial 2014 32

Covariance matrix represents similarity between images:

Σ = | 1    0.8   0.1   0    |
    | 0.8  1     0.1   0    |
    | 0.1  0.1   1     0.95 |
    | 0    0     0.95  1    |

Page 33: Dynamic Information Retrieval Tutorial

Toy Example – Myopic Value

Dynamic Information Retrieval Modeling Tutorial 2014 33

For myopic ranking, 𝑉2 = 16.380


Page 34: Dynamic Information Retrieval Tutorial

Toy Example – Myopic Ranking

Dynamic Information Retrieval Modeling Tutorial 2014 34

Page 2 ranking stays the same regardless of clicks


Page 35: Dynamic Information Retrieval Tutorial

Toy Example – Optimal Value

Dynamic Information Retrieval Modeling Tutorial 2014 35

For optimal ranking, 𝑉2 = 16.528


Page 36: Dynamic Information Retrieval Tutorial

Toy Example – Optimal Ranking

Dynamic Information Retrieval Modeling Tutorial 2014 36

If car clicked, Jaguar logo is more relevant on next page


Page 37: Dynamic Information Retrieval Tutorial

Toy Example – Optimal Ranking

Dynamic Information Retrieval Modeling Tutorial 2014 37

In all other scenarios, rank animal first on next page


Page 38: Dynamic Information Retrieval Tutorial

Interactive vs Dynamic IR

Dynamic Information Retrieval Modeling Tutorial 2014 38

Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback received

Dynamic:
• Optimizes over all interactions
• Long term gains
• Models future user feedback
• Also used at beginning of interaction

Page 39: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 39

Introduction

Static IR

Interactive IR

Dynamic IR

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 40: Dynamic Information Retrieval Tutorial

Conceptual Model – Dynamic IR

Dynamic Information Retrieval Modeling Tutorial 2014 40

(Diagram: Static IR → Interactive IR → Dynamic IR; Dynamic IR explores and exploits feedback.)

Page 41: Dynamic Information Retrieval Tutorial

Characteristics of Dynamic IR

Dynamic Information Retrieval Modeling Tutorial 2014 41

Rich interactions

Query formulation

Document clicks

Document examination

eye movement

mouse movements

etc.

Page 42: Dynamic Information Retrieval Tutorial

Characteristics of Dynamic IR

Dynamic Information Retrieval Modeling Tutorial 2014 42

Temporal dependency
(Diagram: the information need I drives every iteration; at iteration i the query q_i yields ranked documents D_i and clicked documents C_i, for iterations 1, 2, …, n.)

Page 43: Dynamic Information Retrieval Tutorial

Characteristics of Dynamic IR

Dynamic Information Retrieval Modeling Tutorial 2014 43

Overall goal
Optimize over all iterations toward the goal (an IR metric or user satisfaction)
Optimal policy

Page 44: Dynamic Information Retrieval Tutorial

Dynamic IR

Dynamic Information Retrieval Modeling Tutorial 2014 44

Dynamic IR explores actions
Dynamic IR learns from the user and adjusts its actions
May hurt performance in a single stage, but improves over all stages

Page 45: Dynamic Information Retrieval Tutorial

Applications to IR

Dynamic Information Retrieval Modeling Tutorial 2014 45

Dynamics found in lots of different aspects of IR

Dynamic Users

Users change behaviour over time, user history

Dynamic Documents

Information Filtering, document content change

Dynamic Queries

Changing query definition, e.g. 'Twitter'

Dynamic Information Needs

Topic ontologies evolve over time

Dynamic Relevance

Seasonal/time of day change in relevance

Page 46: Dynamic Information Retrieval Tutorial

User Interactivity in DIR

Dynamic Information Retrieval Modeling Tutorial 2014 46

Modern IR interfaces

Facets

Verticals

Personalization

Responsive to particular user

Complex log data

Mobile

Richer user interactions

Ads

Adaptive targeting

Page 47: Dynamic Information Retrieval Tutorial

Big Data

Dynamic Information Retrieval Modeling Tutorial 2014 47

Data set sizes are always increasing
Computational footprint of learning to rank
Rich, sequential data

Example: a complex user behaviour model found in the data, taking into account reading, skipping and re-reading behaviours[1]; uses a POMDP.
[1] Yin He et al., '11

Page 48: Dynamic Information Retrieval Tutorial

Online Learning to Rank

Dynamic Information Retrieval Modeling Tutorial 2014 48

Learning to rank iteratively on sequential data
Clicks as implicit user feedback/preference
Often uses multi-armed bandit techniques

Examples: using click models to interpret clicks and a contextual bandit to improve learning[1]; pairwise comparison of rankings using the duelling bandits formulation[2].
[1] Katja Hofmann et al., '11   [2] Yisong Yue et al., '09

Page 49: Dynamic Information Retrieval Tutorial

Evaluation

Dynamic Information Retrieval Modeling Tutorial 2014 49

Use complex user interaction data to assess rankings
Compare ranking techniques in online testing
Minimise user dissatisfaction

Examples: modelling cursor activity and correlating it with eye tracking to validate good or bad abandonment[1]; interleaving search results from two ranking algorithms to determine which is better[2].
[1] Jeff Huang et al., '11   [2] Olivier Chapelle et al., '12

Page 50: Dynamic Information Retrieval Tutorial

Filtering and News

Dynamic Information Retrieval Modeling Tutorial 2014 50

Adaptive techniques to personalize information filtering or news recommendation
Understand the complex dynamics of real world events in search logs
Capture temporal document change[1]

Examples: using relevance feedback to adapt threshold sensitivity over time in information filtering to maximise overall utility[2]; detecting patterns and memes in news cycles and modeling how information spreads[3].
[1] Dennis Fetterly et al., '03   [2] Stephen Robertson, '02   [3] Jure Leskovec et al., '09

Page 51: Dynamic Information Retrieval Tutorial

Advertising

Dynamic Information Retrieval Modeling Tutorial 2014 51

Behavioural targeting and personalized ads
Learn when to display new ads
Maximise profit from available ads

Examples: using a POMDP and ad correlation to find the optimal ad to display to a user[1]; a dynamic click model that can interpret complex user behaviour in logs and apply the results to tail queries and unseen ads[2].
[1] Shuai Yuan et al., '12   [2] Zeyuan Allen Zhu et al., '10

Page 52: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 52

Introduction

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 53: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 53

Introduction

Theory and Models

Why not use supervised learning

Markov Models

Session Search

Reranking

Evaluation

Page 54: Dynamic Information Retrieval Tutorial

Why not use Supervised Learning for Dynamic IR Modeling?

Dynamic Information Retrieval Modeling Tutorial 2014 54

Lack of enough training data

Dynamic IR problems contain a sequence of dynamic interactions

E.g. a series of queries in session

Rare to find repeated sequences (close to zero)

Even in large query logs (WSCD 2013 & 2014, query logs from Yandex)

The chance of finding repeated adjacent query pairs is also low

Dataset     Repeated Adjacent Query Pairs   Total Adjacent Query Pairs   Repeated Percentage
WSCD 2013   476,390                         17,784,583                   2.68%
WSCD 2014   1,959,440                       35,376,008                   5.54%

Page 55: Dynamic Information Retrieval Tutorial

Our Solution

Dynamic Information Retrieval Modeling Tutorial 2014 55

Try to find an optimal solution through a sequence of dynamic interactions
Trial and Error: learn from repeated, varied attempts which are continued until success
No Supervised Learning

Page 56: Dynamic Information Retrieval Tutorial

Trial and Error

Dynamic Information Retrieval Modeling Tutorial 2014 56

q1 – "dulles hotels"

q2 – "dulles airport"

q3 – "dulles airport location"

q4 – "dulles metrostop"

Page 57: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 57

Rich interactions

Query formulation, Document clicks, Document examination,

eye movement, mouse movements, etc.

Temporal dependency

Overall goal

Recap – Characteristics of Dynamic IR

Page 58: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 58

What is a Desirable Model for Dynamic IR?

Model interactions, which means it needs to have placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;
Represent Markov properties to handle the temporal dependency.

A model in a Trial and Error setting will do — a Markov Model will do!

Page 59: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 59

Introduction

Theory and Models

Why not use supervised learning

Markov Models

Session Search

Reranking

Evaluation

Page 60: Dynamic Information Retrieval Tutorial

Markov Property[1] (the "memoryless" property)
For a system, its next state depends only on its current state:

Pr(S_{i+1} | S_i, …, S_0) = Pr(S_{i+1} | S_i)

Markov Process
A stochastic process with the Markov property, e.g. the chain s_0 → s_1 → … → s_i → s_{i+1} → …

Dynamic Information Retrieval Modeling Tutorial 2014 60   [1] A. A. Markov, '06

Page 61: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 61

Markov Chain

Hidden Markov Model

Markov Decision Process

Partially Observable Markov Decision Process

Multi-armed Bandit

Family of Markov Models

Page 62: Dynamic Information Retrieval Tutorial

Markov Chain

Discrete-time Markov process
Example: Google PageRank[1]
(Diagram: web pages A–E, each with its PageRank value, linking to one another.)

PageRank(S) = (1 − α)/N + α Σ_{Y∈Π} PageRank(Y) / L(Y)

where N is the number of pages, L(Y) is the number of outlinks of Y, Π is the set of pages linking to S, and α controls the random jump: with probability 1 − α the random surfer jumps to an arbitrary page.

Dynamic Information Retrieval Modeling Tutorial 2014 62

State S – a web page
Transition probability M
PageRank: how likely a random web surfer will land on a page
The stationary (stable state) distribution of such an MC is PageRank
(S, M)

[1] L. Page et al., '99
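A minimal sketch of PageRank as the stationary distribution of this Markov chain, computed by power iteration; the link graph, damping value and function name are invented for illustration and dangling pages are not handled.

```python
# Sketch: PageRank by power iteration on a small made-up link graph (A..E).
def pagerank(links, alpha=0.85, iters=100):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            # random-jump part plus contributions from pages linking to p
            rank = (1 - alpha) / n
            rank += alpha * sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = rank
        pr = new
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C", "E"], "E": ["A", "D"]}
print(pagerank(links))
```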

Page 63: Dynamic Information Retrieval Tutorial

Hidden Markov Model

A Markov chain whose states are hidden; observable symbols are emitted with some probability according to the states[1].

Dynamic Information Retrieval Modeling Tutorial 2014 63

(Diagram: hidden states s_0 → s_1 → s_2 → … with transition probabilities p_i, each emitting an observation o_i with emission probability e_i.)

s_i – hidden state   p_i – transition probability   o_i – observation
e_i – observation probability (emission probability)

[1] Leonard E. Baum et al., '66

(S, M, O, e)

Page 64: Dynamic Information Retrieval Tutorial

An HMM example for IR

Construct an HMM for each document1

Dynamic Information Retrieval Modeling Tutorial 2014 64

(Diagram: states s_0, s_1, s_2, … emit the query terms t_0, t_1, t_2, … with transition probabilities p_i and emission probabilities e_i.)

s_i – "Document" or "General English"
p_i – a_0 or a_1
t_i – query term
e_i – P(t|D) or P(t|GE)

P(D|q) ∝ Π_{t∈q} ( a_0 P(t|GE) + a_1 P(t|D) )

Document-to-query relevance

[1] Miller et al., '99
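A small sketch of the two-state mixture score above, assuming simple maximum-likelihood term estimates; the toy counts, the mixture weight a1 and the tiny additive guard are made up.

```python
# Sketch: P(q|D) ∝ Π_t (a0 * P(t|GE) + a1 * P(t|D)), scored in log space.
import math

def hmm_score(query_terms, doc_counts, collection_counts, a1=0.8):
    a0 = 1.0 - a1
    doc_len = sum(doc_counts.values())
    coll_len = sum(collection_counts.values())
    log_score = 0.0
    for t in query_terms:
        p_t_doc = doc_counts.get(t, 0) / doc_len       # P(t|D)
        p_t_ge = collection_counts.get(t, 0) / coll_len  # P(t|GE)
        log_score += math.log(a0 * p_t_ge + a1 * p_t_doc + 1e-12)
    return log_score

doc = {"dulles": 3, "airport": 2, "hotel": 1}
collection = {"dulles": 10, "airport": 50, "hotel": 200, "the": 5000}
print(hmm_score(["dulles", "airport"], doc, collection))
```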

Page 65: Dynamic Information Retrieval Tutorial

MDP extends MC with actions and rewards[1]

s_i – state   a_i – action   r_i – reward   p_i – transition probability

Markov Decision Process

Dynamic Information Retrieval Modeling Tutorial 2014 65

(Diagram: s_0 —a_0, r_0→ s_1 —a_1, r_1→ s_2 —a_2, r_2→ s_3 → …)

[1] R. Bellman, '57

(S, M, A, R, γ)

Page 66: Dynamic Information Retrieval Tutorial

Definition of MDP: a tuple (S, M, A, R, γ)

S : state space

M: transition matrix

Ma(s, s') = P(s'|s, a)

A: action space

R: reward function

R(s,a) = immediate reward taking action a at state s

γ: discount factor, 0< γ ≤1

policy π

π(s) = the action taken at state s

Goal is to find an optimal policy π* maximizing the expected total rewards.

Dynamic Information Retrieval Modeling Tutorial 2014 66

Page 67: Dynamic Information Retrieval Tutorial

Policy

Policy: π(s) = a — the action a to select at state s.

π(s0) = move right and up
π(s1) = move right and up
π(s2) = move right

Dynamic Information Retrieval Modeling Tutorial 2014 67 [Slide altered from Carlos Guestrin's ML lecture]

Page 68: Dynamic Information Retrieval Tutorial

Value of Policy

Value: V^π(s) — expected long-term reward starting from s

Start from s0, take action π(s0), receive R(s0):

V^π(s0) = E[ R(s0) + γR(s1) + γ²R(s2) + γ³R(s3) + γ⁴R(s4) + ⋯ ]

Future rewards discounted by γ ∈ [0,1)

Dynamic Information Retrieval Modeling Tutorial 2014 68 [Slide altered from Carlos Guestrin's ML lecture]

Page 69: Dynamic Information Retrieval Tutorial

Value of Policy

Value: V^π(s) — expected long-term reward starting from s

V^π(s0) = E[ R(s0) + γR(s1) + γ²R(s2) + γ³R(s3) + γ⁴R(s4) + ⋯ ]

Future rewards discounted by γ ∈ [0,1)

(Diagram build: from s0 the chain may reach different next states s1, s1', s1'' with rewards R(s1), R(s1'), R(s1'').)

Dynamic Information Retrieval Modeling Tutorial 2014 69 [Slide altered from Carlos Guestrin's ML lecture]

Page 70: Dynamic Information Retrieval Tutorial

Value of Policy

Value: V^π(s) — expected long-term reward starting from s

V^π(s0) = E[ R(s0) + γR(s1) + γ²R(s2) + γ³R(s3) + γ⁴R(s4) + ⋯ ]

Future rewards discounted by γ ∈ [0,1)

(Diagram build: the policy π continues from each possible next state s1, s1', s1'' to second-step states s2, s2', s2'' with their rewards.)

Dynamic Information Retrieval Modeling Tutorial 2014 70 [Slide altered from Carlos Guestrin's ML lecture]

Page 71: Dynamic Information Retrieval Tutorial

Computing the value of a policy

Dynamic Information Retrieval Modeling Tutorial 2014 71

V^π(s0) = E_π[ R(s0, a) + γR(s1, a) + γ²R(s2, a) + γ³R(s3, a) + ⋯ ]
        = E_π[ R(s0, a) + γ Σ_{t=1}^{∞} γ^{t−1} R(s_t, a) ]
        = R(s0, a) + γ E_π[ Σ_{t=1}^{∞} γ^{t−1} R(s_t, a) ]
        = R(s0, a) + γ Σ_{s'} M_{π(s)}(s, s') V^π(s')

(value function; s is the current state, s' a possible next state)

Page 72: Dynamic Information Retrieval Tutorial

Optimality — Bellman Equation

The Bellman equation[1] for an MDP is a recursive definition of the optimal value function V*(·) (the state-value function):

V*(s) = max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V*(s') ]

Dynamic Information Retrieval Modeling Tutorial 2014 72

Optimal Policy

π*(s) = argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V*(s') ]

[1] R. Bellman, '57

Page 73: Dynamic Information Retrieval Tutorial

Optimality — Bellman Equation

The Bellman equation can be rewritten as

V*(s) = max_a Q(s, a)
Q(s, a) = R(s, a) + γ Σ_{s'} M_a(s, s') V*(s')    (action-value function; the relationship between V and Q)

Dynamic Information Retrieval Modeling Tutorial 2014 73

Optimal Policy

π*(s) = argmax_a Q(s, a)

Page 74: Dynamic Information Retrieval Tutorial

MDP algorithms

Dynamic Information Retrieval Modeling Tutorial 2014 74

Model-based approaches: Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning

Solve the Bellman equation → optimal value V*(s) → optimal policy π*(s)

[Bellman, '57, Howard, '60, Puterman and Shin, '78, Singh & Sutton, '96, Sutton & Barto, '98, Richard Sutton, '88, Watkins, '92]

[Slide altered from Carlos Guestrin's ML lecture]

Page 75: Dynamic Information Retrieval Tutorial

Value Iteration

Initialization: initialize V_0(s) arbitrarily

Loop (iteration):
  V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]
  π(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]

Stopping criteria: π(s) is good enough

Dynamic Information Retrieval Modeling Tutorial 2014 75   [1] Bellman, '57

Page 76: Dynamic Information Retrieval Tutorial

Greedy Value Iteration

Initialization: initialize V_0(s) arbitrarily

Iteration:
  V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]

Stopping criteria: ∀s |V_{i+1}(s) − V_i(s)| < ε

Optimal policy:
  π(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]

Dynamic Information Retrieval Modeling Tutorial 2014 76   [1] Bellman, '57

Page 77: Dynamic Information Retrieval Tutorial

Greedy Value Iteration

1. For each state s ∈ S
     Initialize V_0(s) arbitrarily
   End for
2. i ← 0
3. Repeat
   3.1 i ← i + 1
   3.2 For each s ∈ S
         V_i(s) ← max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_{i−1}(s') ]
       End for
   until ∀s |V_i(s) − V_{i−1}(s)| < ε
4. For each s ∈ S
     π(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]
   End for

Algorithm

Dynamic Information Retrieval Modeling Tutorial 2014 77
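A compact Python sketch of greedy value iteration. The transition matrices mirror Ma1/Ma2 from the worked example that follows, but the reward table is only partly given on the slides, so the missing entries are assumed and the resulting numbers are illustrative.

```python
# Sketch: greedy value iteration on a small 3-state, 2-action MDP.
import numpy as np

M = {  # M[a][s, s'] = P(s' | s, a); same shape as Ma1/Ma2 in the example
    0: np.array([[0.3, 0.7, 0.0], [1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    1: np.array([[0.0, 0.0, 1.0], [0.0, 0.2, 0.8], [0.0, 1.0, 0.0]]),
}
R = np.array([[3.0, 6.0], [4.0, 2.0], [8.0, 5.0]])  # R[s, a]; a2 rewards for S2/S3 assumed
gamma, eps = 0.96, 1e-6

V = np.zeros(3)
while True:
    Q = np.stack([R[:, a] + gamma * M[a] @ V for a in M], axis=1)  # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < eps:
        break
    V = V_new
policy = Q.argmax(axis=1)                 # greedy policy w.r.t. the converged values
print("V* =", V.round(3), "policy =", policy)
```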

Page 78: Dynamic Information Retrieval Tutorial

V^(0)(S1) = max{R(S1,a1), R(S1,a2)} = 6
V^(0)(S2) = max{R(S2,a1), R(S2,a2)} = 4
V^(0)(S3) = max{R(S3,a1), R(S3,a2)} = 8

V^(1)(S1) = max{ 3 + 0.96·(0.3·6 + 0.7·4), 6 + 0.96·(1.0·8) }
          = max{ 3 + 0.96·4.6, 6 + 0.96·8.0 }
          = max{ 7.416, 13.68 }
          = 13.68

Greedy Value Iteration

V(s) = max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V(s') ]

Dynamic Information Retrieval Modeling Tutorial 2014 78

M_a1 = | 0.3  0.7  0 |      M_a2 = | 0    0    1.0 |
       | 1.0  0    0 |             | 0    0.2  0.8 |
       | 0.8  0.2  0 |             | 0    1.0  0   |

Page 79: Dynamic Information Retrieval Tutorial

Greedy Value Iteration

V(s) = max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V(s') ]

Dynamic Information Retrieval Modeling Tutorial 2014 79

  i     V^(i)(S1)   V^(i)(S2)   V^(i)(S3)
  0     6           4           8
  1     13.680      9.760       13.376
  2     18.841      17.133      20.380
  3     25.565      22.087      25.759
  …     …           …           …
  200   168.039     165.316     168.793

M_a1 = | 0.3  0.7  0 |      M_a2 = | 0    0    1.0 |
       | 1.0  0    0 |             | 0    0.2  0.8 |
       | 0.8  0.2  0 |             | 0    1.0  0   |

π(S1) = a2,  π(S2) = a1,  π(S3) = a1

Page 80: Dynamic Information Retrieval Tutorial

Policy Iteration

Initialization:
  V_{π0}(s) ← 0,  π_0(s) ← arbitrary policy

Iteration (over i):
  Policy Evaluation (run to convergence):
    V_{πi}(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V_{πi}(s')
  Policy Improvement:
    π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_{πi}(s') ]

Stop criteria: the policy stops changing

Dynamic Information Retrieval Modeling Tutorial 2014 80   [1] Howard, '60

Page 81: Dynamic Information Retrieval Tutorial

Policy Iteration

1. For each state s ∈ S
     V(s) ← 0,  π_0(s) ← arbitrary policy,  i ← 0
   End for
2. Repeat
   2.1 Repeat                               (policy evaluation)
         For each s ∈ S
           V'(s) ← V(s)
           V(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V(s')
         End for
       until ∀s |V(s) − V'(s)| < ε
   2.2 For each s ∈ S                       (policy improvement)
         π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V(s') ]
       End for
   2.3 i ← i + 1
   Until π_i = π_{i−1}

Algorithm

Dynamic Information Retrieval Modeling Tutorial 2014 81
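For comparison, a sketch of policy iteration in which policy evaluation is done exactly by solving the linear system (I − γM_π)V = R_π; the toy rewards and transitions are the same assumed ones as in the value-iteration sketch.

```python
# Sketch: policy iteration with exact policy evaluation on an assumed toy MDP.
import numpy as np

def policy_iteration(R, M, gamma=0.9):
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)          # arbitrary initial policy
    while True:
        # Policy evaluation: solve V = R_pi + gamma * M_pi V
        M_pi = np.array([M[policy[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * M_pi, R_pi)
        # Policy improvement
        Q = np.stack([R[:, a] + gamma * M[a] @ V for a in range(n_actions)], axis=1)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

M = {0: np.array([[0.3, 0.7, 0.0], [1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
     1: np.array([[0.0, 0.0, 1.0], [0.0, 0.2, 0.8], [0.0, 1.0, 0.0]])}
R = np.array([[3.0, 6.0], [4.0, 2.0], [8.0, 5.0]])   # rewards partly assumed
print(policy_iteration(R, M))
```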

Page 82: Dynamic Information Retrieval Tutorial

Modified Policy Iteration

The "Policy Evaluation" step in Policy Iteration is time-consuming, especially when the state space is large.
Modified Policy Iteration calculates an approximate policy evaluation by running just a few (k) iterations.

Greedy Value Iteration (k = 1)  ←  Modified Policy Iteration  →  Policy Iteration (k = ∞)

Dynamic Information Retrieval Modeling Tutorial 2014 82

Page 83: Dynamic Information Retrieval Tutorial

Modified Policy Iteration

1. For each state s ∈ S
     V(s) ← 0,  π_0(s) ← arbitrary policy,  i ← 0
   End for
2. Repeat
   2.1 Repeat k times                        (approximate policy evaluation)
         For each s ∈ S
           V(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V(s')
         End for
   2.2 For each s ∈ S
         π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V(s') ]
       End for
   2.3 i ← i + 1
   Until π_i = π_{i−1}

Algorithm

Dynamic Information Retrieval Modeling Tutorial 2014 83

Page 84: Dynamic Information Retrieval Tutorial

MDP algorithms

Dynamic Information Retrieval Modeling Tutorial 2014 84

Model-based approaches: Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning

Solve the Bellman equation → optimal value V*(s) → optimal policy π*(s)

[Bellman, '57, Howard, '60, Puterman and Shin, '78, Singh & Sutton, '96, Sutton & Barto, '98, Richard Sutton, '88, Watkins, '92]

[Slide altered from Carlos Guestrin's ML lecture]

Page 85: Dynamic Information Retrieval Tutorial

Temporal Difference Learning

Dynamic Information Retrieval Modeling Tutorial 2014 85

Monte Carlo sampling can be used for model-free policy iteration: estimate V^π(s) in "Policy Evaluation" by the average reward of trajectories starting from s. However, parts of those trajectories can be reused, so we estimate V^π(s) by an expectation over the next state:

  V^π(s) ← E[ r + γ V^π(s') | s, π(s) ]

The simplest estimation:   V^π(s) ← r + γ V^π(s')
A smoothed version:        V^π(s) ← α ( r + γ V^π(s') ) + (1 − α) V^π(s)
TD-Learning rule:          V^π(s) ← V^π(s) + α ( r + γ V^π(s') − V^π(s) )

r is the immediate reward, α is the learning rate; the bracketed term in the TD rule is the temporal difference.

Richard Sutton, ‘88

Singh & Sutton, ‘96

Sutton & Barto, ‘98

Page 86: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 86

1. For each state s∈S

Initialize V𝜋(s) arbitrarily

End for

2. For each step in the state sequence

2.1 Initialize s

2.2 repeat

2.2.1 take action a at state s according to 𝜋

2.2.2 observe immediate reward r and the next state 𝑠′

2.2.3 V^π(s) ← V^π(s) + α ( r + γ V^π(s') − V^π(s) )

2.2.4 𝑠 ← 𝑠′

Until s is a terminal state

End for

Algorithm

Temporal Difference Learning
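A sketch of TD(0) evaluation of a fixed policy by sampling transitions from an assumed toy MDP; the rewards, transition probabilities, policy and learning rate are all made up for illustration.

```python
# Sketch: TD(0) policy evaluation, V(s) += alpha * (r + gamma*V(s') - V(s)).
import random

# transitions[s][a] = list of (probability, next_state); rewards[s][a] assumed
transitions = {0: {0: [(0.3, 0), (0.7, 1)], 1: [(1.0, 2)]},
               1: {0: [(1.0, 0)],           1: [(0.2, 1), (0.8, 2)]},
               2: {0: [(0.8, 0), (0.2, 1)], 1: [(1.0, 1)]}}
rewards = {0: {0: 3.0, 1: 6.0}, 1: {0: 4.0, 1: 2.0}, 2: {0: 8.0, 1: 5.0}}
policy = {0: 1, 1: 0, 2: 0}                     # fixed policy to evaluate

def sample_next(s, a):
    u, cumulative = random.random(), 0.0
    for p, s_next in transitions[s][a]:
        cumulative += p
        if u <= cumulative:
            return s_next
    return transitions[s][a][-1][1]

V = {s: 0.0 for s in transitions}
alpha, gamma = 0.05, 0.9
s = 0
for _ in range(50000):
    a = policy[s]
    s_next = sample_next(s, a)
    V[s] += alpha * (rewards[s][a] + gamma * V[s_next] - V[s])   # TD(0) update
    s = s_next
print({k: round(v, 2) for k, v in V.items()})
```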

Page 87: Dynamic Information Retrieval Tutorial

Q-Learning

Dynamic Information Retrieval Modeling Tutorial 2014 87

TD-Learning rule:
  V^π(s) ← V^π(s) + α ( r + γ V^π(s') − V^π(s) )

Q-learning rule:
  Q(s, a) ← Q(s, a) + α ( r + γ max_{a'} Q(s', a') − Q(s, a) )

  V(s) = max_a Q(s, a)
  π*(s) = argmax_a Q*(s, a)
  Q*(s, a) = R(s, a) + γ Σ_{s'} M_a(s, s') max_{a'} Q*(s', a')

Page 88: Dynamic Information Retrieval Tutorial

Q-Learning

Dynamic Information Retrieval Modeling Tutorial 2014 88

1. For each state s ∈ S and a ∈ A
     Initialize Q_0(s, a) arbitrarily
   End for
2. i ← 0
3. For each step in the state sequence
   3.1 Initialize s
   3.2 Repeat
       3.2.1 i ← i + 1
       3.2.2 Select an action a at state s according to Q_{i−1}
       3.2.3 Take action a, observe immediate reward r and the next state s'
       3.2.4 Q_i(s, a) ← Q_{i−1}(s, a) + α ( r + γ max_{a'} Q_{i−1}(s', a') − Q_{i−1}(s, a) )
       3.2.5 s ← s'
       Until s is a terminal state
   End for
4. For each s ∈ S
     π(s) ← argmax_a Q_i(s, a)
   End for

Algorithm
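A sketch of tabular Q-learning with ε-greedy action selection on the same kind of assumed toy MDP; unlike the episodic loop above, this runs as a single continuing task with no terminal state.

```python
# Sketch: tabular Q-learning with epsilon-greedy exploration on a toy MDP.
import random

transitions = {0: {0: ([0, 1], [0.3, 0.7]), 1: ([2], [1.0])},
               1: {0: ([0], [1.0]),         1: ([1, 2], [0.2, 0.8])},
               2: {0: ([0, 1], [0.8, 0.2]), 1: ([1], [1.0])}}
rewards = {0: {0: 3.0, 1: 6.0}, 1: {0: 4.0, 1: 2.0}, 2: {0: 8.0, 1: 5.0}}

alpha, gamma, eps = 0.1, 0.9, 0.1
Q = {s: {a: 0.0 for a in transitions[s]} for s in transitions}
s = 0
for _ in range(50000):
    # epsilon-greedy selection from the current Q estimates
    a = random.choice(list(Q[s])) if random.random() < eps else max(Q[s], key=Q[s].get)
    next_states, probs = transitions[s][a]
    s_next = random.choices(next_states, weights=probs)[0]
    # Q-learning update: bootstrap from the best action in the next state
    Q[s][a] += alpha * (rewards[s][a] + gamma * max(Q[s_next].values()) - Q[s][a])
    s = s_next

policy = {s: max(Q[s], key=Q[s].get) for s in Q}
print(policy)
```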

Page 89: Dynamic Information Retrieval Tutorial

Apply an MDP to an IR Problem

Dynamic Information Retrieval Modeling Tutorial 2014 89

We can model IR systems using a Markov Decision Process
Is there a temporal component?
States – What changes with each time step?
Actions – How does your system change the state?
Rewards – How do you measure feedback or effectiveness in your problem at each time step?
Transition Probability – Can you determine this?
If not, then a model-free approach is more suitable

Page 90: Dynamic Information Retrieval Tutorial

Apply an MDP to an IR Problem - Example

Dynamic Information Retrieval Modeling Tutorial 2014 90

User agent in session search

States – user’s relevance judgement

Action – new query

Reward – information gained

Page 91: Dynamic Information Retrieval Tutorial

Apply an MDP to an IR Problem - Example

Dynamic Information Retrieval Modeling Tutorial 2014 91

Search engine's perspective
What if we can't directly observe the user's relevance judgement?
Click ≠ relevance

Page 92: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 92

Markov Chain

Hidden Markov Model

Markov Decision Process

Partially Observable Markov Decision Process

Multi-armed Bandit

Family of Markov Models

Page 93: Dynamic Information Retrieval Tutorial

POMDP Model

Dynamic Information Retrieval Modeling Tutorial 2014 93

(Diagram: as in the MDP, states s_0, s_1, s_2, s_3, … with actions a_i and rewards r_i, but the states are hidden; observations o_1, o_2, o_3 are emitted and the agent maintains a belief over the hidden states.)

[1] R. D. Smallwood et al., '73

Page 94: Dynamic Information Retrieval Tutorial

POMDP Definition

Dynamic Information Retrieval Modeling Tutorial 2014 94

A tuple (S, M, A, R, γ, O, Θ, B)
S: state space
M: transition matrix
A: action space
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observation set — an observation is a symbol emitted according to a hidden state
Θ: observation function — Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a)
B: belief space — a belief is a probability distribution over the hidden states

Page 95: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 95

The agent uses a state estimator to update its belief about the hidden states:

b' = SE(b, a, o')

b'(s') = P(s' | o', a, b) = P(s', o' | a, b) / P(o' | a, b)
       = Θ(s', a, o') Σ_s M(s, a, s') b(s) / P(o' | a, b)

POMDP → Belief Update
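A small sketch of the belief update b'(s') ∝ Θ(s', a, o') Σ_s M(s, a, s') b(s) for a two-state problem; the transition and observation matrices are invented for illustration.

```python
# Sketch: POMDP belief update for a fixed action a and observation o.
import numpy as np

def belief_update(b, M_a, Theta_a, o):
    """b: current belief over states; M_a[s, s']: P(s'|s, a);
    Theta_a[s', o]: P(o | s', a). Returns the normalised new belief."""
    predicted = b @ M_a                          # sum_s M(s, a, s') b(s)
    unnormalised = Theta_a[:, o] * predicted     # times Theta(s', a, o)
    return unnormalised / unnormalised.sum()     # divide by P(o | a, b)

b = np.array([0.5, 0.5])                         # prior belief
M_a = np.array([[0.9, 0.1], [0.2, 0.8]])         # assumed transition for action a
Theta_a = np.array([[0.7, 0.3], [0.1, 0.9]])     # assumed P(o | s', a)
print(belief_update(b, M_a, Theta_a, o=1))
```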

Page 96: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 96

The Bellman equation for a POMDP:

V(b) = max_a [ r(b, a) + γ Σ_{o'} P(o' | a, b) V(b') ]

A POMDP can be transformed into a continuous belief MDP (B, M', A, r, γ):

B: the continuous belief space
M': transition function  M'_a(b, b') = Σ_{o'∈O} 1_{a,o'}(b', b) · Pr(o' | a, b),
    where 1_{a,o'}(b', b) = 1 if SE(b, a, o') = b', and 0 otherwise
A: action space
r: reward function  r(b, a) = Σ_{s∈S} b(s) R(s, a)

POMDP → Bellman Equation

Page 97: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 97

Solving POMDPs – The Witness Algorithm[1]

A variation of the value iteration algorithm.
The optimal policy of a POMDP is the optimal policy of its belief MDP.

[1] L. Kaelbling et al., '98

Page 98: Dynamic Information Retrieval Tutorial

Policy Tree

Dynamic Information Retrieval Modeling Tutorial 2014 98

• A policy tree of depth i is an i-step non-stationary policy
• As if we run value iteration until the ith iteration

(Diagram: the root action a(h) with i steps to go; each observation o_1 … o_l leads to a subtree whose root action has i−1 steps to go, and so on down to the final actions with 1 step to go.)

Page 99: Dynamic Information Retrieval Tutorial

Value of a Policy Tree

Dynamic Information Retrieval Modeling Tutorial 2014 99

We can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state:

V_h(b) = Σ_{s∈S} b(s) V_h(s)

V_h(s) = R(s, a(h)) + γ Σ_{s'∈S} M_{a(h)}(s, s') Σ_{o_k∈O} Θ(s', a(h), o_k) V_{o_k(h)}(s')

where a(h) is the action at the root node of h and o_k(h) is the (i−1)-step subtree associated with o_k under the root node of h.

Page 100: Dynamic Information Retrieval Tutorial

Idea of the Witness Algorithm

Dynamic Information Retrieval Modeling Tutorial 2014 100

For each action a, compute Γ_i^a, the set of candidate i-step policy trees with action a at their roots.
The optimal value function at the ith step, V_i*(b), is the upper surface of the value functions of all i-step policy trees.

Page 101: Dynamic Information Retrieval Tutorial

Optimal value function

Dynamic Information Retrieval Modeling Tutorial 2014 101

Geometrically, V_i*(b) is piecewise linear and convex:

V_i*(b) = max_{h∈H} V_h(b)

An example for a two-state POMDP: because of the simplex constraint b(s1) + b(s2) = 1, the belief space is one-dimensional.
(Figure: the value functions V_h1(b) … V_h5(b) of individual policy trees over the belief space; only the trees contributing to the upper surface matter, which motivates pruning the set of policy trees.)

Page 102: Dynamic Information Retrieval Tutorial

Outlines of the Witness Algorithm

Dynamic Information Retrieval Modeling Tutorial 2014 102

Algorithm

1. H_1 ← {}
2. i ← 1
3. Repeat
   3.1 i ← i + 1
   3.2 For each a in A
         Γ_i^a ← witness(H_{i−1}, a)      (the inner loop)
       End for
   3.3 Prune ∪_a Γ_i^a to get H_i
   until sup_b |V_i(b) − V_{i−1}(b)| < ε

Page 103: Dynamic Information Retrieval Tutorial

Inner Loop of the Witness Algorithm

Dynamic Information Retrieval Modeling Tutorial 2014 103

Inner loop of the witness algorithm

1. Select a belief b arbitrarily. Generate a best i-step policy tree h_i. Add h_i to an agenda.
2. In each iteration
   2.1 Select a policy tree h_new from the agenda.
   2.2 Look for a witness point b using Z_a and h_new.
   2.3 If such a witness point b is found,
       2.3.1 Calculate the best policy tree h_best for b.
       2.3.2 Add h_best to Z_a.
       2.3.3 Add all the alternative trees of h_best to the agenda.
   2.4 Else remove h_new from the agenda.
3. Repeat the above iteration until the agenda is empty.

Page 104: Dynamic Information Retrieval Tutorial

Other Solutions

Dynamic Information Retrieval Modeling Tutorial 2014 104

QMDP1

MC-POMDP (Monte Carlo POMDP)2

Grid Based Approximation3

Belief Compression4

……

1 Thrun et. al., ‘06 2 Thrun et. al., ‘05 3 Lovejoy, ‘91 4 Roy, ‘03

Page 105: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 105

POMDP                  Dynamic IR
Environment            Documents
Agents                 User, search engine
States                 Queries, user's decision making status, relevance of documents, etc.
Actions                Provide a ranking of documents; weigh terms in the query; add/remove/keep the query terms; switch on or switch off a search technology; adjust parameters for a search technology
Observations           Queries, clicks, document lists, snippets, terms, etc.
Rewards                Evaluation measures (such as DCG, nDCG or MAP); clicking information
Transition matrix      Given in advance or estimated from training data
Observation function   Problem dependent; estimated based on sample datasets

Applying POMDP to Dynamic IR

Page 106: Dynamic Information Retrieval Tutorial

Session Search Example - States

Four states:
S_RT  – Relevant & Exploitation
S_RR  – Relevant & Exploration
S_NRT – Non-Relevant & Exploitation
S_NRR – Non-Relevant & Exploration

Example query transitions: "scooter price" → "scooter stores"; "Hartford visitors" → "Hartford Connecticut tourism"; "Philadelphia NYC travel" → "Philadelphia NYC train"; "distance New York Boston" → maps.bing.com

q0

106 [J. Luo et al., '14]

Page 107: Dynamic Information Retrieval Tutorial

Session Search Example - Actions

(Au, Ase)

User Action(Au)

Add query terms (+Δq)

Remove query terms (-Δq)

keep query terms (qtheme)

clicked documents

SAT clicked documents

Search Engine Action(Ase)

increase/decrease/keep term weights,

Switch on or switch off query expansion

Adjust the number of top documents used in PRF

etc.

107 [ J. Luo et al., ’14]

Page 108: Dynamic Information Retrieval Tutorial

Multi Page Search Example -

States & Actions

Dynamic Information Retrieval Modeling Tutorial 2014 108

State: relevance of documents
Action: ranking of documents
Observation: clicks
Belief: multivariate Gaussian
Reward: DCG over 2 pages

[Xiaoran Jin et al., '13]

Page 109: Dynamic Information Retrieval Tutorial

SIGIR Tutorial July 7th 2014

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Emine Yilmaz

Dynamic Information Retrieval

Modeling

Exercise

Page 110: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 110

Markov Chain

Hidden Markov Model

Markov Decision Process

Partially Observable Markov Decision Process

Multi-Armed Bandit

Family of Markov Models

Page 111: Dynamic Information Retrieval Tutorial

Multi Armed Bandits (MAB)

Dynamic Information Retrieval Modeling Tutorial 2014 111

(Figure: a row of slot machines; the gambler asks "Which slot machine should I select in this round?" and receives a Reward.)

Page 112: Dynamic Information Retrieval Tutorial

Multi Armed Bandits (MAB)

Dynamic Information Retrieval Modeling Tutorial 2014 112

(Figure: the gambler wins on one machine and asks "I won! Is this the best slot machine?" — Reward.)

Page 113: Dynamic Information Retrieval Tutorial

MAB Definition

Dynamic Information Retrieval Modeling Tutorial 2014 113

A tuple (S, A, R, B)
S: hidden reward distribution of each bandit
A: choose which bandit to play
R: reward for playing a bandit
B: belief space, our estimate of each bandit's distribution

Page 114: Dynamic Information Retrieval Tutorial

Comparison with Markov Models

Dynamic Information Retrieval Modeling Tutorial 2014 114

Single-state Markov Decision Process
No transition probability
Similar to a POMDP in that we maintain a belief state
Action = choose a bandit; does not affect the state
Does not 'plan ahead' but intelligently adapts
Somewhere between interactive and dynamic IR

Page 115: Dynamic Information Retrieval Tutorial

Markov Multi Armed Bandits

Dynamic Information Retrieval Modeling Tutorial 2014 115

(Figure: each slot machine 1 … k is itself a Markov Process; "Which slot machine should I select in this round?" — Reward.)

Page 116: Dynamic Information Retrieval Tutorial

Markov Multi Armed Bandits

Dynamic Information Retrieval Modeling Tutorial 2014 116

(Figure: the same bandits, each a Markov Process; the Action of playing a machine interacts with its Markov Process — Reward.)

Page 117: Dynamic Information Retrieval Tutorial

MAB Policy Reward

Dynamic Information Retrieval Modeling Tutorial 2014 117

An MAB algorithm describes a policy π for choosing bandits
Maximise rewards from the chosen bandits over all time steps
Minimize regret:

Regret = Σ_{t=1}^{T} [ Reward(a*) − Reward(a_{π(t)}) ]

the cumulative difference between the optimal reward and the actual reward

Page 118: Dynamic Information Retrieval Tutorial

Exploration vs Exploitation

Dynamic Information Retrieval Modeling Tutorial 2014 118

Exploration
Try out bandits to find which has the highest average reward
Exploitation
Play bandits that are known to pay out higher reward on average (too much exploration leads to poor performance)
MAB algorithms balance exploration and exploitation
Start by exploring more to find the best bandits
Exploit more as the best bandits become known

Page 119: Dynamic Information Retrieval Tutorial

Exploration vs Exploitation

Dynamic Information Retrieval Modeling Tutorial 2014 119

Page 120: Dynamic Information Retrieval Tutorial

MAB – Index Algorithms

Dynamic Information Retrieval Modeling Tutorial 2014 120

Gittins index[1]
Play the bandit with the highest 'Dynamic Allocation Index'
Modelled using an MDP but suffers the 'curse of dimensionality'

ε-greedy[2]
Play the highest-reward bandit with probability 1 − ε; play a random bandit with probability ε

UCB (Upper Confidence Bound)[3]
Play bandit i with the highest  x̄_i + sqrt(2 ln t / T_i)
The chance of playing infrequently played bandits increases over time

1J. C. Gittins. ‘89 2Nicolò Cesa-Bianchi et. al., ‘98 3P. Auer et. al., ‘02
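A sketch of the UCB1 index rule above on simulated Bernoulli arms; the click-through rates are invented (e.g. each arm standing in for an ad or document), and the index is x̄_i + sqrt(2 ln t / T_i).

```python
# Sketch: UCB1 on simulated Bernoulli bandits.
import math, random

true_ctr = [0.05, 0.10, 0.02]          # assumed reward probability per arm
counts = [0] * len(true_ctr)           # T_i: times each arm was played
means = [0.0] * len(true_ctr)          # x-bar_i: empirical mean reward

for t in range(1, 10001):
    if 0 in counts:                    # play every arm once first
        arm = counts.index(0)
    else:
        ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(len(true_ctr))]
        arm = ucb.index(max(ucb))
    reward = 1.0 if random.random() < true_ctr[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]   # incremental mean

print(counts, [round(m, 3) for m in means])             # best arm gets most plays
```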

Page 121: Dynamic Information Retrieval Tutorial

MAB use in IR

Dynamic Information Retrieval Modeling Tutorial 2014 121

Choosing ads to display to users1

Each ad is a bandit

User click through rate is reward

Recommending news articles2

News article is a bandit

Similar to Information Filtering case

Diversifying search results3

Each rank position is an MAB dependent on higher ranks

Documents are bandits chosen by each rank

1Deepayan Chakrabarti et. al. , ‘09 2Lihong Li et. al., ’10 3Radlinski et. al., ‘08

Page 122: Dynamic Information Retrieval Tutorial

MAB Variations

Dynamic Information Retrieval Modeling Tutorial 2014 122

Contextual Bandits1

World has some context x ∈ X (e.g. user location)

Learn policy 𝜋: 𝑋 → 𝐴 that maps context to arms (online or offline)

Duelling Bandits2

Play two (or more) bandits at each time step

Observe relative reward rather than absolute

Learn order of bandits

Mortal Bandits3

Value of bandits decays over time

Exploitation > exploration

1Lihong Li et. al., ‘10 2Yisong Yue et. al., ‘09 3Deepayan Chakrabarti et. al. , ‘09

Page 123: Dynamic Information Retrieval Tutorial

Comparison of Markov Models

Dynamic Information Retrieval Modeling Tutorial 2014 123

MC – a fully observable stochastic process

HMM – a partially observable stochastic process

MDP – a fully observable decision process

MAB – a decision process, either fully or partially observable

POMDP – a partially observable decision process

Model    Actions   Rewards   States
MC       No        No        Observable
HMM      No        No        Unobservable
MDP      Yes       Yes       Observable
POMDP    Yes       Yes       Unobservable
MAB      Yes       Yes       Fixed

Page 124: Dynamic Information Retrieval Tutorial

SIGIR Tutorial July 7th 2014

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Emine Yilmaz

Dynamic Information Retrieval

Modeling

Exercise

Page 125: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 125

Introduction

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 126: Dynamic Information Retrieval Tutorial

TREC Session Tracks (2010-2012)

Given a series of queries {q1, q2, …, qn}, the top 10 retrieval results {D1, …, Dn−1} for q1 to qn−1, and click information
The task is to retrieve a list of documents for the current/last query, qn
Relevance judgment is made based on how relevant the documents are for qn, and how relevant they are for the information need of the entire session (given in the topic description)
No need to segment the sessions

126

Page 127: Dynamic Information Retrieval Tutorial

1.pocono mountains pennsylvania

2.pocono mountains pennsylvania hotels

3.pocono mountains pennsylvania things to do

4.pocono mountains pennsylvania hotels

5.pocono mountains camelbeach

6.pocono mountains camelbeach hotel

7.pocono mountains chateau resort

8.pocono mountains chateau resort attractions

9.pocono mountains chateau resort getting to

10.chateau resort getting to

11.pocono mountains chateau resort directions

TREC 2012 Session 6

127

Information needs:

You are planning a winter vacation to the

Pocono Mountains region in Pennsylvania in

the US. Where will you stay? What will you

do while there? How will you get there?

In a session, queries change constantly

Page 128: Dynamic Information Retrieval Tutorial

Query change is an important form of feedback

We define query change Δq_i as the syntactic editing changes between two adjacent queries:

Δq_i = q_i − q_{i−1}

which includes +Δq_i, the added terms, and −Δq_i, the removed terms.
The unchanged/shared terms are called q_theme, the theme terms.

128

Example:
q1 = "bollywood legislation"
q2 = "bollywood law"
---------------------------------------
Theme term = "bollywood"
Added (+Δq) = "law"

Removed (-Δq) = “legislation”
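A tiny sketch of decomposing adjacent queries into theme, added and removed terms, mirroring the bollywood example above; plain whitespace tokenisation is an assumption.

```python
# Sketch: split two adjacent queries into q_theme, +Δq and -Δq.
def query_change(prev_query, curr_query):
    prev_terms, curr_terms = prev_query.split(), curr_query.split()
    theme = [t for t in curr_terms if t in prev_terms]        # q_theme
    added = [t for t in curr_terms if t not in prev_terms]    # +Δq
    removed = [t for t in prev_terms if t not in curr_terms]  # -Δq
    return theme, added, removed

print(query_change("bollywood legislation", "bollywood law"))
# (['bollywood'], ['law'], ['legislation'])
```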

Page 129: Dynamic Information Retrieval Tutorial

Where do these query changes come from?

Given the TREC Session settings, we consider two sources of query change:
the previous search results that a user viewed/read/examined
the information need

Example:
Kurosawa → Kurosawa wife
'wife' is not in any previous results, but it is in the topic description

However, knowing information needs before search is difficult to achieve

129

Page 130: Dynamic Information Retrieval Tutorial

Previous search results could influence query change in quite complex ways

Merck lobbyists → Merck lobbying US policy

D1 contains several mentions of ‘policy’, such as “A lobbyist who until 2004 worked as senior policy advisor to

Canadian Prime Minister Stephen Harper was hired last month by Merck …”

These mentions are about Canadian policies; while the user adds US policy in q2

Our guess is that the user might be inspired by ‘policy’, but he/she prefers a different sub-concept other than `Canadian policy’

Therefore, for the added terms `US policy’, ‘US’ is the novel term here, and ‘policy’ is not since it appeared in D1. The two terms should be treated differently

130

Page 131: Dynamic Information Retrieval Tutorial

We propose to model session search as a Markov decision process (MDP)

Two agents: the User and the Search Engine

Dynamic Information Retrieval Modeling Tutorial 2014 131

Environment: search results
States: queries
Actions:
  User actions: add/remove/keep the query terms
  Search Engine actions: increase/decrease/keep term weights

Applying MDP to Session Search

Page 132: Dynamic Information Retrieval Tutorial

Search Engine Agent’s Actions

Term      ∈ D_{i−1}   Action      Example
q_theme   Y           increase    "pocono mountain" in s6
q_theme   N           increase    "france world cup 98 reaction" in s28:  france world cup 98 reaction stock market → france world cup 98 reaction
+Δq       Y           decrease    'policy' in s37:  Merck lobbyists → Merck lobbyists US policy
+Δq       N           increase    'US' in s37:  Merck lobbyists → Merck lobbyists US policy
−Δq       Y           decrease    'reaction' in s28:  france world cup 98 reaction → france world cup 98
−Δq       N           no change   'legislation' in s32:  bollywood legislation → bollywood law

132 [J. Luo et al., '14]

Page 133: Dynamic Information Retrieval Tutorial

Query Change retrieval Model (QCM)

The Bellman Equation gives the optimal value for an MDP:

V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V*(s') ]

The reward function is used as the document relevance score function and is derived backwards from the Bellman equation:

Score(q_i, d) = P(q_i | d) + γ · P(q_i | q_{i−1}, D_{i−1}, a) · max_{D_{i−1}} P(q_{i−1} | D_{i−1})

Score(q_i, d): document relevance score
P(q_i | d): current reward / relevance score
P(q_i | q_{i−1}, D_{i−1}, a): query transition model
max_{D_{i−1}} P(q_{i−1} | D_{i−1}): maximum past relevance

133

Page 134: Dynamic Information Retrieval Tutorial

Calculating the Transition Model

According to the query change and the search engine actions, the current reward / relevance score expands to:

Score(q_i, d) = log P(q_i | d)
              + α Σ_{t ∈ q_theme} [ 1 − P(t | d*_{i−1}) ] log P(t | d)              (increase weights for theme terms)
              − β Σ_{t ∈ +Δq_i, t ∈ d*_{i−1}} P(t | d*_{i−1}) idf(t) log P(t | d)   (decrease weights for old added terms)
              + ε Σ_{t ∈ +Δq_i, t ∉ d*_{i−1}} log P(t | d)                          (increase weights for novel added terms)
              − δ Σ_{t ∈ −Δq_i} P(t | d*_{i−1}) log P(t | d)                        (decrease weights for removed terms)

where d*_{i−1} is the maximum-rewarded document from D_{i−1} (next slide) and α, β, ε, δ are model parameters.

134

Page 135: Dynamic Information Retrieval Tutorial

Maximizing the Reward Function

Generate the maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}
That is the document most relevant to q_{i−1}
The relevance score can be calculated as:

P(q_{i−1} | d_{i−1}) = 1 − Π_{t ∈ q_{i−1}} { 1 − P(t | d_{i−1}) }
P(t | d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|

From several options, we choose to use only the document with top relevance:  max_{D_{i−1}} P(q_{i−1} | D_{i−1})

135

Page 136: Dynamic Information Retrieval Tutorial

Scoring the Entire Session

The overall relevance score for a session of queries is aggregated recursively:

Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d)
                      = Score(q_n, d) + γ [ Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d) ]
                      = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d)

136
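A sketch of the recursion in its closed form Σ_i γ^{n−i} Score(q_i, d); the per-query scores and the γ value below are placeholders, with the individual Score(q_i, d) values assumed to come from the QCM scoring on the previous slides.

```python
# Sketch: aggregate per-query QCM scores into a session score.
def session_score(per_query_scores, gamma=0.92):
    """Score_session(q_n, d) = sum_i gamma^(n - i) * Score(q_i, d)."""
    n = len(per_query_scores)
    return sum(gamma ** (n - i) * score
               for i, score in enumerate(per_query_scores, start=1))

scores = [1.2, 0.8, 1.5, 2.1]     # Score(q_1, d) ... Score(q_4, d), illustrative
print(session_score(scores))      # later queries in the session count more
```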

Page 137: Dynamic Information Retrieval Tutorial

Experiments

TREC 2011-2012 query sets, datasets

ClueWeb09 Category B

137

Page 138: Dynamic Information Retrieval Tutorial

Search Accuracy (TREC 2012)

nDCG@10 (official metric used in TREC)

Approach                  nDCG@10   %chg      MAP      %chg
Lemur                     0.2474    -21.54%   0.1274   -18.28%
TREC'12 median            0.2608    -17.29%   0.1440   -7.63%
Our TREC'12 submission    0.3021    -4.19%    0.1490   -4.43%
TREC'12 best              0.3221    0.00%     0.1559   0.00%
QCM                       0.3353    4.10%†    0.1529   -1.92%
QCM+Dup                   0.3368    4.56%†    0.1537   -1.41%

138

Page 139: Dynamic Information Retrieval Tutorial

Search Accuracy (TREC 2011)

nDCG@10 (official metric used in TREC)

Approach                  nDCG@10   %chg      MAP      %chg
Lemur                     0.3378    -23.38%   0.1118   -25.86%
TREC'11 median            0.3544    -19.62%   0.1143   -24.20%
TREC'11 best              0.4409    0.00%     0.1508   0.00%
QCM                       0.4728    7.24%†    0.1713   13.59%†
QCM+Dup                   0.4821    9.34%†    0.1714   13.66%†
Our TREC'12 submission    0.4836    9.68%†    0.1724   14.32%†

139

Page 140: Dynamic Information Retrieval Tutorial

Search Accuracy for Different Session Types

TREC 2012 sessions are classified by:
Product: Factual / Intellectual
Goal quality: Specific / Amorphous

Approach    Intellectual  %chg      Amorphous  %chg     Specific  %chg     Factual  %chg
TREC best   0.3369        0.00%     0.3495     0.00%    0.3007    0.00%    0.3138   0.00%
Nugget      0.3305        -1.90%    0.3397     -2.80%   0.2736    -9.01%   0.2871   -8.51%
QCM         0.3870        14.87%    0.3689     5.55%    0.3091    2.79%    0.3066   -2.29%
QCM+DUP     0.3900        15.76%    0.3692     5.64%    0.3114    3.56%    0.3072   -2.10%

140

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process by studying changes among query transitions and modeling the dynamics.

Page 141: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 141

Introduction

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 142: Dynamic Information Retrieval Tutorial

Multi Page Search

Dynamic Information Retrieval Modeling Tutorial 2014 142

Page 143: Dynamic Information Retrieval Tutorial

Multi Page Search

Dynamic Information Retrieval Modeling Tutorial 2014 143

Page 1 Page 2

2.

1.

2.

1.

Page 144: Dynamic Information Retrieval Tutorial

Relevance Feedback

Dynamic Information Retrieval Modeling Tutorial 2014 144

No UI Changes

Interactivity is Hidden

Private, performed in browser

Page 145: Dynamic Information Retrieval Tutorial

Relevance Feedback

Dynamic Information Retrieval Modeling Tutorial 2014 145

Page 1
• Diverse ranking
• Maximise learning potential
• Exploration vs Exploitation

Page 2
• Clickthroughs or explicit ratings
• Respond to feedback from page 1
• Personalized

Page 146: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 146

Page 147: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 147

N(θ₁, Σ₁)
θ₁ – prior estimate of relevance
Σ₁ – prior estimate of covariance (document similarity, topic clustering)

Page 148: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 148

Rank action for page 1

Page 149: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 149

Page 150: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 150

Feedback from page 1

r ~ N(θ_{s¹}, Σ_{s¹})

Page 151: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 151

Update estimates using r₁ (partition θ₁ and Σ₁ into the shown documents s' and the rest \s'):

θ₁ = [ θ_{\s'} ; θ_{s'} ]       Σ₁ = [ Σ_{\s'}      Σ_{\s',s'} ]
                                     [ Σ_{s',\s'}   Σ_{s'}     ]

θ₂ = θ_{\s'} + Σ_{\s',s'} Σ_{s'}^{−1} ( r₁ − θ_{s'} )
Σ₂ = Σ_{\s'} − Σ_{\s',s'} Σ_{s'}^{−1} Σ_{s',\s'}

Page 152: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 152

Rank using PRP

Page 153: Dynamic Information Retrieval Tutorial

Model

Dynamic Information Retrieval Modeling Tutorial 2014 153

Utility of a Ranking (DCG over the two pages):

U = λ Σ_{j=1}^{M} θ_{s_j¹} / log₂(j+1)  +  (1 − λ) Σ_{j=M+1}^{2M} θ_{s_j²} / log₂(j+1)

Page 154: Dynamic Information Retrieval Tutorial

Model – Bellman Equation

Dynamic Information Retrieval Modeling Tutorial 2014 154

Optimize s¹ to improve U_{s²}

V(θ₁, Σ₁, 1) = max_{s¹} [ λ θ_{s¹}·W¹ + ∫_r max_{s²} (1 − λ) θ_{s²}·W² P(r) dr ]

Page 155: Dynamic Information Retrieval Tutorial

𝜆

Dynamic Information Retrieval Modeling Tutorial 2014 155

Balances exploration and exploitation in page 1

Tuned for different queries

Navigational

Informational

𝜆 = 1 for non-ambiguous search

Page 156: Dynamic Information Retrieval Tutorial

Approximation

Dynamic Information Retrieval Modeling Tutorial 2014 156

Monte Carlo Sampling

≈ max_{s¹} [ λ θ_{s¹}·W¹ + (1 − λ) (1/S) Σ_{r∈O} max_{s²} θ_{s²}·W² P(r) ]

Sequential Ranking Decision

Page 157: Dynamic Information Retrieval Tutorial

Experiment Data

Dynamic Information Retrieval Modeling Tutorial 2014 157

Difficult to evaluate without access to live users

Simulated using 3 TREC collections and relevance judgements
WT10G – Explicit ratings
TREC8 – Clickthroughs
Robust – Difficult (ambiguous) search

Page 158: Dynamic Information Retrieval Tutorial

User Simulation

Dynamic Information Retrieval Modeling Tutorial 2014 158

Rank M documents

Simulated user clicks according to relevance judgements

Update page 2 ranking

Measure at page 1 and 2

Recall

Precision

nDCG

MRR

BM25 – prior ranking model

Page 159: Dynamic Information Retrieval Tutorial

Investigating λ

Dynamic Information Retrieval Modeling Tutorial 2014 159

Page 160: Dynamic Information Retrieval Tutorial

Baselines

Dynamic Information Retrieval Modeling Tutorial 2014 160

𝜆 determined experimentally

BM25

BM25 with conditional update (𝜆 = 1)

Maximum Marginal Relevance (MMR)

Diversification

MMR with conditional update

Rocchio

Relevance Feedback

Page 161: Dynamic Information Retrieval Tutorial

Results

Dynamic Information Retrieval Modeling Tutorial 2014 161

Page 162: Dynamic Information Retrieval Tutorial

Results

Dynamic Information Retrieval Modeling Tutorial 2014 162

Page 163: Dynamic Information Retrieval Tutorial

Results

Dynamic Information Retrieval Modeling Tutorial 2014 163

Page 164: Dynamic Information Retrieval Tutorial

Results

Dynamic Information Retrieval Modeling Tutorial 2014 164

Page 165: Dynamic Information Retrieval Tutorial

Results

Dynamic Information Retrieval Modeling Tutorial 2014 165

Similar results across data sets and metrics
2nd page gain outweighs 1st page losses
Outperformed Maximum Marginal Relevance using MRR to measure diversity
BM25-U is simply the no-exploration case
Similar results when M = 5

Page 166: Dynamic Information Retrieval Tutorial

Results

Dynamic Information Retrieval Modeling Tutorial 2014 166

Page 167: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 167

Introduction

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Page 168: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Evaluation

Emine Yilmaz

University College London

[email protected]

Page 169: Dynamic Information Retrieval Tutorial

Information Retrieval Systems

Match information seekers with the information they seek

Page 170: Dynamic Information Retrieval Tutorial

Retrieval Evaluation: Traditional View

Page 171: Dynamic Information Retrieval Tutorial

Retrieval Evaluation: Dynamic View

Page 172: Dynamic Information Retrieval Tutorial

Retrieval Evaluation: Dynamic View

Page 173: Dynamic Information Retrieval Tutorial

Retrieval Evaluation: Dynamic View

Page 174: Dynamic Information Retrieval Tutorial

Different Approaches to Evaluation

Online Evaluation

Design interactive experiments

Use users’ actions to evaluate the quality

Inherently dynamic in nature

Offline Evaluation

Controlled laboratory experiments

The users’ interaction with the engine is only simulated

Recent work focused on dynamic IR evaluation

Page 175: Dynamic Information Retrieval Tutorial

Online Evaluation

Standard click metrics

Clickthrough rate

Probability user skips over results they have considered (pSkip)

Most recently: Result interleaving


175

Page 176: Dynamic Information Retrieval Tutorial

What is result interleaving? A way to compare rankers online

Given the two rankings produced by two methods

Present a combination of the rankings to users

Team Draft Interleaving (Radlinski et al., 2008)

Interleaving two rankings

Input: Two rankings (“can be seen as teams who pick players”)

Repeat:

o Toss a coin to see which team (ranking) picks next

o Winner picks their best remaining player (document)

o Loser picks their best remaining player (document)

Output: One ranking (2 teams of 5)

Credit assignment: the ranking providing more of the clicked results wins (a minimal sketch follows below)
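Here is a minimal sketch of the procedure just described (coin toss per round, then each team contributes its best remaining document) together with the click-based credit assignment; it is an illustration, not the reference implementation from Radlinski et al.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Interleave two rankings; return the list and which team placed each doc."""
    a, b = list(ranking_a), list(ranking_b)
    interleaved, team_of = [], {}
    while a or b:
        order = ["A", "B"] if random.random() < 0.5 else ["B", "A"]   # coin toss
        for team in order:
            pool = a if team == "A" else b
            while pool and pool[0] in team_of:    # skip docs already placed
                pool.pop(0)
            if pool:
                doc = pool.pop(0)                 # team's best remaining document
                team_of[doc] = team
                interleaved.append(doc)
    return interleaved, team_of

def credit(team_of, clicked_docs):
    """The ranking providing more of the clicked results wins."""
    a = sum(1 for d in clicked_docs if team_of.get(d) == "A")
    b = sum(1 for d in clicked_docs if team_of.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"
```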

Page 177: Dynamic Information Retrieval Tutorial

Team Draft Interleaving

Ranking A
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries)
3. Napa Valley College (www.napavalley.edu/homex.asp)
4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681)
5. Napa Valley Wineries and Wine (www.napavintners.com)
6. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)

Ranking B
1. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
2. Napa Valley – The authority for lodging... (www.napavalley.com)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
5. NapaValley.org (www.napavalley.org)
6. The Napa Valley Marathon (www.napavalleymarathon.org)

Presented Ranking (each result is credited to the team, A or B, that contributed it)
1. Napa Valley – The authority for lodging... (www.napavalley.com)
2. Napa Country, California – Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Wineries – Plan your wine... (www.napavalley.com/wineries)
5. Napa Valley Hotels – Bed and Breakfast... (www.napalinks.com)
6. Napa Valley College (www.napavalley.edu/homex.asp)
7. NapaValley.org (www.napavalley.org)

Page 178: Dynamic Information Retrieval Tutorial

Team Draft Interleaving


B wins!

Page 179: Dynamic Information Retrieval Tutorial

Team Draft Interleaving


B wins!

Repeat over many different queries!

Page 180: Dynamic Information Retrieval Tutorial

Offline Evaluation

Controlled laboratory experiments

The user's interaction with the engine is only simulated

Ask experts to judge each query result

Predict how users behave when they search

Aggregate judgments to evaluate

180

Page 181: Dynamic Information Retrieval Tutorial

Offline Evaluation

Until recently: Metrics assume that user’s information need was not affected by the documents read

E.g. Average Precision, NDCG, …

• Users are more likely to stop searching when they see a highly relevant document

• Lately: Metrics that incorporate the effect of the relevance of documents seen by the user on user behavior

Based on devising more realistic user models

EBU, ERR [Yilmaz et al CIKM10, Chapelle et al CIKM09]

181

Page 182: Dynamic Information Retrieval Tutorial

Modeling User Behavior

Cascade-based models

[Figure: ranked results 1–10 for the example query "black powder ammunition"]

• The user views search results from top to bottom.

• At each rank i, the user has a certain probability of being satisfied.

• The probability of satisfaction is proportional to the relevance grade of the document at rank i.

• Once the user is satisfied with a document, he terminates the search.

Page 183: Dynamic Information Retrieval Tutorial

Rank Biased Precision

[Diagram: after issuing a query, the user either stops or views the next item, repeated down the ranked results 1–10 for the example query "black powder ammunition"]

Page 184: Dynamic Information Retrieval Tutorial

Rank Biased Precision

[Figure: ranked results 1–10 for the example query "black powder ammunition"]

With persistence parameter p (the probability of viewing the next item):

Total utility = Σ_{i=1..∞} rel_i · p^(i−1)

Num. examined docs = Σ_{i=1..∞} i · (1 − p) · p^(i−1) = 1/(1 − p)

RBP = Total utility / Num. examined docs = (1 − p) · Σ_{i=1..∞} p^(i−1) · rel_i
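A minimal implementation of RBP from the closed form above, assuming graded gains in [0, 1] and persistence parameter p:

```python
def rbp(gains, p=0.8):
    """Rank-Biased Precision: (1 - p) * sum_i p^(i-1) * rel_i."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(gains))

# Example: relevant documents at ranks 1 and 3.
print(rbp([1, 0, 1, 0, 0], p=0.8))   # 0.2 * (1 + 0.64) = 0.328
```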

Page 185: Dynamic Information Retrieval Tutorial

Expected Reciprocal Rank [Chapelle et al CIKM09]

[Diagram: after issuing a query, the user examines each item and asks "Relevant?" (no / somewhat / highly); depending on the answer they either stop or view the next item, down the ranked results 1–10 for the example query "black powder ammunition"]

Page 186: Dynamic Information Retrieval Tutorial

Expected Reciprocal Rank [Chapelle et al CIKM09]

[Figure: ranked results 1–10 for the example query "black powder ammunition"]

φ(r): utility of finding "the perfect document" at rank r, with φ(r) = 1/r

ERR = Σ_{r=1..n} φ(r) · P(user stops at position r) = Σ_{r=1..n} (1/r) · R_r · Π_{i=1..r−1} (1 − R_i)

g_i: relevance grade of the i-th document

R_i = (2^{g_i} − 1) / 2^{g_max}: probability of relevance of doc i = probability of stopping at doc i
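A minimal implementation of ERR from the definitions above, assuming graded judgements g_i on a 0..g_max scale:

```python
def err(grades, g_max=4):
    """Expected Reciprocal Rank over relevance grades g_1..g_n."""
    score, p_not_stopped = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        R = (2 ** g - 1) / 2 ** g_max      # probability the user stops at this doc
        score += p_not_stopped * R / r     # utility 1/r of stopping at rank r
        p_not_stopped *= 1 - R
    return score

print(err([4, 0, 2], g_max=4))   # dominated by the highly relevant doc at rank 1
```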

Page 187: Dynamic Information Retrieval Tutorial

Session Evaluation

Example queries in a session: "Paris Luxurious Hotels", "Paris Hilton", "J Lo"

Page 188: Dynamic Information Retrieval Tutorial

What is a good system?

Page 189: Dynamic Information Retrieval Tutorial

Measuring “goodness”

The user steps down a ranked list of documents and observes each one of them until a decision point, and either

a) abandons the search, or

b) reformulates

While stepping down or sideways, the user accumulates utility

Page 190: Dynamic Information Retrieval Tutorial

Evaluation over a single ranked list

[Figure: a ranked list of results 1–10 for one query in an example session with the reformulations "kenya cooking traditional swahili", "kenya cooking traditional", "kenya swahili traditional food recipes"]

Page 191: Dynamic Information Retrieval Tutorial
Page 192: Dynamic Information Retrieval Tutorial

Session DCG [Järvelin et al ECIR 2008]

For each query in the session (e.g. "kenya cooking traditional swahili", then "kenya cooking traditional"), compute a DCG over its ranked list:

DCG(RL_i) = Σ_{r=1..k} (2^{rel(r)} − 1) / log_b(r + b − 1)

Each reformulation is then discounted with log base c:

sDCG = 1/log_c(1 + c − 1) · DCG(RL₁) + 1/log_c(2 + c − 1) · DCG(RL₂) + …
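A small sketch of session DCG from the formulas above: per-query DCG with log base b, and a log-base-c discount over reformulations (the parameter values in the example call are assumptions, not prescribed by the slide).

```python
import math

def dcg(rels, b=2):
    """Per-query DCG: sum_r (2^rel(r) - 1) / log_b(r + b - 1)."""
    return sum((2 ** rel - 1) / math.log(r + b - 1, b)
               for r, rel in enumerate(rels, start=1))

def sdcg(session_rels, b=2, c=4):
    """Session DCG: the i-th reformulation is discounted by 1/log_c(i + c - 1)."""
    return sum(dcg(rels, b) / math.log(i + c - 1, c)
               for i, rels in enumerate(session_rels, start=1))

# Example: two ranked lists in one session (graded judgements per rank).
print(sdcg([[3, 1, 0], [2, 0, 0]], b=2, c=4))
```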

Page 193: Dynamic Information Retrieval Tutorial

Model-based measures

Probabilistic space of users following different paths

Ω is the space of all paths

P(ω) is the probability of a user following a path ω in Ω

M_ω is a measure over a path ω

The session measure is then the expectation over paths, Σ_{ω∈Ω} P(ω) · M_ω

[Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]

Page 194: Dynamic Information Retrieval Tutorial

Probability of a path

Example path probability = probability of abandoning at reformulation 2 (1) × probability of reformulating at rank 3 (2)

[Figure: ranked lists for queries Q1, Q2, Q3 with relevant (R) and non-relevant (N) documents at each rank, illustrating one browsing path]

Page 195: Dynamic Information Retrieval Tutorial

Expected Global Utility [Yang and Lad ICTIR 2009]

1. User steps down ranked results one-by-one

2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks, and reformulates

3. Gains something from relevant documents, accumulating utility

Page 196: Dynamic Information Retrieval Tutorial

[Figure: the Q1–Q3 relevance grid from the previous slide]

(1) Probability of abandoning the session at reformulation i: geometric with parameter p_reform

Page 197: Dynamic Information Retrieval Tutorial

[Figure: the Q1–Q3 relevance grid from the previous slide]

(2) Probability of reformulating at rank j: geometric with parameter p_down

(1) Probability of abandoning the session at reformulation i: geometric with parameter p_reform

Page 198: Dynamic Information Retrieval Tutorial

Expected Global Utility [Yang and Lad ICTIR 2009]

The probability of a user following a path ω:

P(ω) = P(r1, r2, ..., rK)

ri is the stopping and reformulation point in list i

Assumption: stopping positions in each list are independent

P(r1, r2, ..., rK) = P(r1)P(r2)...P(rK)

Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour:

P(ri = r) = (1 − p) · p^(r−1)
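A small sketch of the path probability under the independence assumption above, with the geometric (RBP-style) stopping distribution; the parameter name p is illustrative.

```python
def path_probability(stop_ranks, p=0.8):
    """P(r_1, ..., r_K) = prod_i (1 - p) * p^(r_i - 1), by independence."""
    prob = 1.0
    for r in stop_ranks:
        prob *= (1 - p) * p ** (r - 1)
    return prob

# Example: the user stops at rank 3 in list 1, rank 1 in list 2, rank 2 in list 3.
print(path_probability([3, 1, 2], p=0.5))
```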

Page 199: Dynamic Information Retrieval Tutorial

Conclusions

Recent focus on evaluating the dynamic nature of the search process

Interleaving

New offline evaluation metrics

ERR, EBU

Session evaluation metrics

Page 200: Dynamic Information Retrieval Tutorial

Outline

Dynamic Information Retrieval Modeling Tutorial 2014 200

Introduction

Theory and Models

Session Search

Reranking

Guest Talk: Evaluation

Conclusion

Page 201: Dynamic Information Retrieval Tutorial

Conclusions

Dynamic Information Retrieval Modeling Tutorial 2014 201

Dynamic IR describes a new class of interactive model

Incorporates rich feedback, temporal dependency, and is goal-oriented

The family of Markov models and Multi-Armed Bandit theory are useful in building DIR models

Applicable to a range of IR problems

Useful in applications such as session search and evaluation

Page 202: Dynamic Information Retrieval Tutorial

Dynamic IR Book

Dynamic Information Retrieval Modeling Tutorial 2014 202

Published by Morgan & Claypool

‘Synthesis Lectures on Information Concepts, Retrieval, and Services’

Due March/April 2015 (in time for SIGIR 2015)

Page 203: Dynamic Information Retrieval Tutorial

Acknowledgment

Dynamic Information Retrieval Modeling Tutorial 2014 203

We thank Dr. Emine Yilmaz for giving the guest talk.

We sincerely thank Dr. Xuchu Dong for his help in preparing the tutorial.

We also thank the following colleagues for their comments and suggestions:

Dr. Jamie Callan

Dr. Ophir Frieder

Dr. Fernando Diaz

Dr. Filip Radlinski

Page 204: Dynamic Information Retrieval Tutorial

Dynamic Information Retrieval Modeling Tutorial 2014 204

Page 205: Dynamic Information Retrieval Tutorial

Thank You

Dynamic Information Retrieval Modeling Tutorial 2014 205

Page 206: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 206

Static IR

Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.

The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.

Implicit User Modeling for Personalized Search, Xuehua Shen et. al, CIKM, 2005

A Short Introduction to Learning to Rank. Hang Li, IEICE Transactions 94-D(10): 1854-1862, 2011.

Page 207: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 207

Interactive IR

Relevance Feedback in Information Retrieval, Rocchio, J. J., The SMART Retrieval System (pp. 313-23), 1971

A study in interface support mechanisms for interactive information retrieval, Ryen W. White et. al, JASIST, 2006

Visualizing stages during an exploratory search session, Bill Kules et. al, HCIR, 2011

Dynamic Ranked Retrieval, Cristina Brandt et. al, WSDM, 2011

Structured Learning of Two-level Dynamic Rankings, Karthik Raman et. al, CIKM, 2011

Page 208: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 208

Dynamic IR

A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR’99, pages 214-221.

Threshold setting and performance optimization in adaptive filtering, Stephen Robertson, JIR 2002

A large-scale study of the evolution of web pages, Dennis Fetterly et. al., WWW 2003

Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.

Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, Yisong Yue et. al., ICML 2009

Meme-tracking and the dynamics of the news cycle, Jure Leskovec et. al., KDD 2009

Page 209: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 209

Dynamic IR

Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009

A Novel Click Model and Its Applications to Online Advertising , Zeyuan Allen Zhu et. al., WSDM 2010

A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010

Inferring search behaviors using partially observable markov model with duration (POMD), Yin he et. al., WSDM, 2011

No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search, Jeff Huang et. al., CHI 2011

Balancing Exploration and Exploitation in Learning to Rank Online, Katja Hofmann et. al., ECIR, 2011

Large-Scale Validation and Analysis of Interleaved Search Evaluation, Olivier Chapelle et. al., TOIS 2012

Page 210: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 210

Dynamic IR

Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.

Sequential selection of correlated ads by POMDPs, Shuai Yuan et. al., CIKM 2012

Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.

Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.

Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.

Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In: CIKM'2013, pages 1411-1420.

Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR ’14.

Page 211: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 211

Markov Processes

A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679–684, 1957.

Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.

Dynamic Programming and Markov Processes. R.A. Howard. MIT Press, 1960.

Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.

Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966.

Page 212: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 212

Markov Processes

Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988

Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.

Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992

Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.

Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.

Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.

Page 213: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 213

Markov Processes

Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis Carnegie Mellon. 2003

VDCBPI: an approximate scalable algorithm for large scale POMDPs, P. Poupart and C. Boutilier. In NIPS-2004, pages 1081–1088.

Finding Approximate POMDP solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40,2005.

Probabilistic robotics. S. Thrun, W. Burgard, D. Fox. Cambridge. MIT Press. 2005

Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006

Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. The MIT Press, 2006.

Page 214: Dynamic Information Retrieval Tutorial

References

Dynamic Information Retrieval Modeling Tutorial 2014 214

Markov Processes

The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E.J. Sondik. Operations Research. 1973

Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and Shin M. C. Management Science 24, 1978.

An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591–600, 12 2006.

Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media. 2011

Finite-Time Regret Bounds for the Multiarmed Bandit Problem, Nicolò Cesa-Bianchi, Paul Fischer. ICML 100-108, 1998

Multi-armed bandit allocation indices, Wiley, J. C. Gittins. 1989

Finite-time Analysis of the Multiarmed Bandit Problem, Peter Auer et. al., Machine Learning 47, Issue 2-3. 2002.