Modeling and Detecting Changes in User Satisfaction

Julia Kiseleva*, Eric Crestan, Riccardo Brigo, Roland Dittel

*Eindhoven University of Technology

Microsoft Bing

CIKM 2014

What is User Satisfaction?

[Figure: a user who wants to go to the CIKM conference issues a QUERY and receives a SERP. Pr(Ref.) is the probability that the <QUERY, SERP> pair is followed by a reformulation.]

Assumption: if a “significant” number of users reformulate a query after seeing a particular SERP, it indicates a change in user preferences.
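The assumption above can be made concrete by estimating Pr(Ref.) as the fraction of impressions of a <QUERY, SERP> pair that are followed by a reformulation. The following is a minimal sketch, assuming a hypothetical log of (query, serp_id, was_reformulated) tuples; neither the record layout nor the function name comes from the talk.

```python
from collections import defaultdict


def reformulation_probability(log):
    """Estimate Pr(Ref.) for every (query, serp_id) pair.

    `log` is an iterable of (query, serp_id, was_reformulated) tuples;
    this record layout is an assumption made for illustration.
    """
    shown = defaultdict(int)         # impressions of each (query, SERP) pair
    reformulated = defaultdict(int)  # impressions followed by a reformulation
    for query, serp_id, was_reformulated in log:
        key = (query, serp_id)
        shown[key] += 1
        if was_reformulated:
            reformulated[key] += 1
    return {key: reformulated[key] / shown[key] for key in shown}


# Toy log: 4 impressions of one pair, 3 of them followed by a reformulation.
toy_log = [
    ("cikm conference", "serp-a", True),
    ("cikm conference", "serp-a", True),
    ("cikm conference", "serp-a", False),
    ("cikm conference", "serp-a", True),
]
pr_ref = reformulation_probability(toy_log)
```

What counts as a “significant” share is left open here; it is the role of the detection threshold discussed later in the talk.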

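World May Change User Preferences

[Figure: a timeline showing the same <QUERY, SERP> pair at times t_i and t_{i+1}, with reformulation probabilities Pr_{t_i} and Pr_{t_{i+1}}. The values shown on the slide are not recoverable from the transcript.]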

How Can We Detect the Changes?

[Figure: the same timeline for the <QUERY, SERP> pair at t_i and t_{i+1}. The change is measured as |Pr_{t_i} - Pr_{t_{i+1}}|.]

Detecting Query Reformulation

• There are many definitions of query reformulation in the literature

• We use query expansion:

o “new years wallpaper” is reformulated with “2014”

o “medals Olympics” is reformulated with “2014”

o “ct 40ez” is reformulated with “2013”

o “march 31 holiday” is reformulated with “2014”

o …
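Query expansion, as used in the examples above, can be checked mechanically on consecutive queries. The following is a minimal sketch under a stated assumption: an expansion keeps every term of the original query and adds at least one new term, with case-insensitive whitespace tokenization. The function name is hypothetical and the talk does not specify the exact matching rule.

```python
def expansion_terms(query, next_query):
    """Return the terms added by `next_query` if it expands `query`, else None.

    Assumption for this sketch: an expansion keeps every term of the
    original query and adds at least one new term.
    """
    original = query.lower().split()
    candidate = next_query.lower().split()
    added = [term for term in candidate if term not in original]
    kept_all = all(term in candidate for term in original)
    if kept_all and added:
        return added
    return None


# The slide's examples, written as (query, follow-up query) pairs.
olympics = expansion_terms("medals Olympics", "medals olympics 2014")
tax_form = expansion_terms("ct 40ez", "ct 40ez 2013")
unrelated = expansion_terms("cikm conference", "sigir conference")
```

Here `olympics` and `tax_form` hold the added year, while `unrelated` is `None` because the follow-up query drops a term of the original.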

An Example of the Drift in Reformulation Signal

The Explanation of the Drift

[Figure: before November 2013 vs. after November 2013.]

The Question: “How can we detect this kind of change?”

Change Detection Techniques

• In dynamically changing and non-stationary environments, the data distribution can change over time, yielding the phenomenon of concept drift

• Real concept drift refers to changes in the conditional distribution of the output (i.e., the target variable) given the input (the input features)

• Concept drift types (each shown on the slides as a plot of the data mean over time):

o Sudden/abrupt: disambiguation, such as “flawless Beyoncé”

o Incremental: disambiguation, such as “cikm conference 2014”

o Gradual: breaking news, such as “idaho bus crash investigation”

o Reoccurring concepts: seasonal change, such as “black Friday 2014”

o Outlier (not concept drift)

The overview slide additionally gives “medal olympics 2014” as a disambiguation example.
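The four drift types can be illustrated with synthetic series of the data mean. The following is a minimal sketch: the function names, the levels 0.1 and 0.8, and the seeded random generator are illustrative assumptions, not taken from the talk. The shapes follow the usual definitions in the concept drift literature.

```python
import random


def sudden(n, change_at, old=0.1, new=0.8):
    """The mean switches from `old` to `new` at a single point in time."""
    return [old if t < change_at else new for t in range(n)]


def incremental(n, start, end, old=0.1, new=0.8):
    """The mean moves from `old` to `new` through intermediate values."""
    series = []
    for t in range(n):
        if t < start:
            series.append(old)
        elif t >= end:
            series.append(new)
        else:
            series.append(old + (new - old) * (t - start) / (end - start))
    return series


def gradual(n, start, end, old=0.1, new=0.8, seed=0):
    """Old and new concepts alternate; the new one becomes ever more likely."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    series = []
    for t in range(n):
        if t < start:
            p_new = 0.0
        elif t >= end:
            p_new = 1.0
        else:
            p_new = (t - start) / (end - start)
        series.append(new if rng.random() < p_new else old)
    return series


def reoccurring(n, period, old=0.1, new=0.8):
    """The old concept comes back after every `period` steps of the new one."""
    return [new if (t // period) % 2 else old for t in range(n)]


abrupt_series = sudden(6, change_at=3)
seasonal_series = reoccurring(8, period=2)
```

An outlier, by contrast, would be a single deviating point after which the series returns to its old level, which is why it is not treated as concept drift.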

Detecting Drifts in Reformulation Signal

Query: “cikm conference”

Reformulation: “2014”

[Figure: a timeline starting at t0. Window W0 (size n0, ending at ti) holds the reformulation signal values 0.1, 0.1, 0.2, 0.2, 0.3. Window W1 (size n1, from ti to ti+Δt) holds the values 0.7, 0.8, 0.8, which reflect the upcoming conference event. E(W0) and E(W1) are the expected values of the signal in each window.]

If |E(W0) - E(W1)| > e_out

Then Drift Detected
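The window comparison can be sketched as a scan over the reformulation signal, using the values from the slide. This is a minimal sketch, assuming that E(W) is the arithmetic mean of a window and that the two windows are adjacent. The function name and the threshold value 0.3 are hypothetical placeholders.

```python
from statistics import mean


def first_drift(signal, n0, n1, e_out):
    """Return the first split index where |E(W0) - E(W1)| > e_out, else None.

    W0 is the n0 values before the split, W1 the n1 values from the split on.
    """
    for split in range(n0, len(signal) - n1 + 1):
        w0 = signal[split - n0:split]
        w1 = signal[split:split + n1]
        if abs(mean(w0) - mean(w1)) > e_out:
            return split
    return None


# Reformulation signal for "cikm conference" -> "2014", as read off the slide.
signal = [0.1, 0.1, 0.2, 0.2, 0.3, 0.7, 0.8, 0.8]
split = first_drift(signal, n0=5, n1=3, e_out=0.3)
flat = first_drift([0.1] * 8, n0=5, n1=3, e_out=0.3)
```

With these inputs the drift is reported at the boundary between W0 and W1, and no drift is reported for the flat signal.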

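Calculating Threshold e_out

[Formula: the expression for e_out is not recoverable from the transcript. The slide annotates its inputs: the confidence, the variance of the signal over W = W0 ∪ W1, and m = 1/(1/n0 + 1/n1).]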
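The slide's formula is lost, but its three inputs (a confidence, the variance over W = W0 ∪ W1, and m = 1/(1/n0 + 1/n1)) match the cut threshold of the ADWIN change detector: sqrt((2/m) * var_W * ln(2/δ')) + (2/(3m)) * ln(2/δ'), with δ' = δ/n and n = n0 + n1. The following sketch assumes the talk's e_out has this form, which the transcript does not confirm. The function names and the default δ = 0.05 are hypothetical.

```python
import math
from statistics import mean, pvariance


def adwin_style_threshold(w0, w1, delta=0.05):
    """ADWIN-style cut threshold; assumed, not confirmed, to match e_out."""
    n0, n1 = len(w0), len(w1)
    m = 1.0 / (1.0 / n0 + 1.0 / n1)             # as annotated on the slide
    variance = pvariance(list(w0) + list(w1))   # variance over W = W0 U W1
    delta_prime = delta / (n0 + n1)             # confidence spread over n points
    log_term = math.log(2.0 / delta_prime)
    return math.sqrt(2.0 / m * variance * log_term) + 2.0 / (3.0 * m) * log_term


def drift_detected(w0, w1, delta=0.05):
    """Compare the gap between window means with the threshold."""
    return abs(mean(w0) - mean(w1)) > adwin_style_threshold(w0, w1, delta)


# Two windows of 100 observations each, before and after a shift.
stable = [0.1, 0.2] * 50
shifted = [0.7, 0.8] * 50
shift_found = drift_detected(stable, shifted)
no_shift = drift_detected(stable, stable)
```

The bound tightens as the windows grow. With windows as short as the five and three values drawn on the previous slide it stays far above the observed gap, so a threshold of this form needs many observations per window to fire.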

[Figure: overview of the detection framework, built up over several slides along a timeline.]

• Learn reformulation model M from the user behavior logs collected between t0 and ti

• Detect changes in model M on the incoming user behavior logs (ti to ti+Δt)

o If change detected: Alarm: change of user satisfaction detected for pairs {<Qi, SERPi>}, 1<i<n

o else: Do Nothing

• The slide also lists two items:

1) List of reformulation terms per query

2) List of URLs per reformulation

A later build labels the timeline ti, ti+w1, ti+w1+w2 in place of t0, ti, ti+Δt.
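The learn, detect, alarm flow of the framework can be sketched end to end. This is a minimal sketch with loud simplifications: model M is reduced to one reformulation probability per <Q, SERP> pair, the log records are hypothetical (query, serp_id, was_reformulated) tuples, and the threshold e_out is passed in as a fixed number. None of these choices is confirmed by the talk.

```python
def learn_model(log):
    """Reformulation model M, simplified to Pr(Ref.) per (query, serp_id)."""
    counts = {}
    for query, serp_id, was_reformulated in log:
        shown, reformulated = counts.get((query, serp_id), (0, 0))
        counts[(query, serp_id)] = (shown + 1, reformulated + int(was_reformulated))
    return {pair: ref / shown for pair, (shown, ref) in counts.items()}


def detect_changes(model, incoming_log, e_out):
    """Return the pairs whose reformulation probability moved by more than e_out."""
    incoming = learn_model(incoming_log)
    return sorted(
        pair
        for pair, probability in incoming.items()
        if pair in model and abs(probability - model[pair]) > e_out
    )


def monitor(training_log, incoming_log, e_out):
    """Learn M on the training logs, then check the incoming logs against it."""
    model = learn_model(training_log)
    changed = detect_changes(model, incoming_log, e_out)
    return changed  # a non-empty list is the alarm; an empty list means do nothing


# Toy logs: the first pair drifts from 0.1 to 0.8, the second stays at 0.2.
training = [("cikm conference", "serp-a", i < 1) for i in range(10)]
training += [("black friday", "serp-b", i < 2) for i in range(10)]
incoming = [("cikm conference", "serp-a", i < 8) for i in range(10)]
incoming += [("black friday", "serp-b", i < 2) for i in range(10)]
alarm_pairs = monitor(training, incoming, e_out=0.3)
```

Pairs that appear only in the incoming logs are ignored in this sketch, since the model has no earlier estimate to compare against.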

Experimentation

o The dataset consists of 6 months of behavioral log data from a commercial search engine

o The training window size is one month

o The test window size is two weeks

Evaluation

Results

Conclusion and Future Work

o We successfully leveraged concept drift detection techniques to detect changes in user satisfaction

o The proposed technique works in an unsupervised way

o A large-scale evaluation has been performed

o Classification of the drift type is needed

o Prediction of the lifetime of the drift would help

Questions?