
Page 1:

Personalizing Information Search: Understanding Users and their Interests

Diane Kelly
School of Information & Library Science
University of North Carolina
[email protected]

IPAM | 04 October 2007

Page 2:

What is IR?

Who works on problems in IR?

Where can I find the most recent work in IR?

A TREC primer

Background: IR and TREC

Page 3:

Personalization is a process where retrieval is customized to the individual (not one-size-fits-all searching)

Hans Peter Luhn was one of the first people to personalize IR through selective dissemination of information (SDI) (now called ‘filtering’)

Profiles and user models are often employed to ‘house’ data about users and represent their interests

Figuring out how to populate and maintain the profile or user model is a hard problem

Background: Personalization

Page 4:

Explicit Feedback

Implicit Feedback

User’s desktop

Major Approaches

Page 5:

Explicit Feedback

Page 6:

Term relevance feedback is one of the most widely used and studied explicit feedback techniques

Typical relevance feedback scenarios (examples)

Systems-centered research has found that relevance feedback works (including pseudo-relevance feedback)

User-centered research has found mixed results about its effectiveness

Explicit Feedback
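Added for concreteness (not from the slides): a minimal sketch of Rocchio-style pseudo-relevance feedback, in which the top-ranked documents are simply assumed relevant and their frequent terms are folded back into the query. The toy tokenizer, raw term-frequency weighting, and alpha/beta values are illustrative assumptions.

```python
# Minimal sketch of Rocchio-style pseudo-relevance feedback.
# Assumptions: whitespace tokenization, raw term-frequency weights,
# and illustrative alpha/beta parameters.
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rocchio_expand(query, ranked_docs, k=5, alpha=1.0, beta=0.75, n_terms=10):
    """Expand `query` with terms from the top-k documents, assumed relevant."""
    top = ranked_docs[:k]
    q_vec = Counter(tokenize(query))
    if not top:
        return list(q_vec)
    # Centroid of the pseudo-relevant documents (summed term frequencies).
    centroid = Counter()
    for doc in top:
        centroid.update(tokenize(doc))
    expanded = Counter({t: alpha * w for t, w in q_vec.items()})
    for t, w in centroid.items():
        expanded[t] += beta * w / len(top)
    # Keep the highest-weighted terms as the expanded query.
    return [t for t, _ in expanded.most_common(n_terms)]
```

In an explicit, user-facing variant, the candidate terms would be shown to the user for selection rather than added automatically, which is where the mixed user-centered results mentioned above come in.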

Page 7:

Terms are not presented in context, so it may be hard for users to understand how they can help

The quality of suggested terms is not always good

Users often lack the spare cognitive resources to engage in explicit feedback

Users are too lazy to provide feedback

There are open questions about the sustainability of explicit feedback for long-term modeling

Explicit Feedback

Page 8:

Examples

Page 9:

Examples

BACK

Page 10:

Query Elicitation Study

Users typically pose very short queries

This may be because

users have a difficult time articulating their information needs

traditional search interfaces encourage short queries

Polyrepresentative extraction of information needs suggests obtaining multiple representations of a single information need (reference interview)

Page 11:

Motivation

Research has demonstrated that a positive relationship exists between query length and performance in batch-mode experimental IR

Query expansion is an effective technique for increasing query length, but research has demonstrated that users have some difficulty with traditional term relevance feedback features

Page 12:

Elicitation Form

[Why Know]

[Already Know]

[Keywords]
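A hypothetical sketch (my own illustration; field names and the plain concatenation scheme are assumptions) of how the baseline query and the three form responses could be combined to produce the expanded runs used in the study:

```python
# Hypothetical sketch: build expanded queries from elicitation-form answers.
# Field names and concatenation are assumptions; the study combined baseline
# query terms with responses to questions Q2-Q4.
def build_runs(baseline, q2_why, q3_already_know, q4_keywords):
    return {
        "baseline": baseline,
        "Q2": f"{baseline} {q2_why}",
        "Q3": f"{baseline} {q3_already_know}",
        "Q4": f"{baseline} {q4_keywords}",
        "Q2Q3": f"{baseline} {q2_why} {q3_already_know}",
        "Q2Q4": f"{baseline} {q2_why} {q4_keywords}",
        "Q3Q4": f"{baseline} {q3_already_know} {q4_keywords}",
        "Q234": f"{baseline} {q2_why} {q3_already_know} {q4_keywords}",
    }
```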

Page 13:

Results: Number of Terms

[Figure: bar chart of the mean number of terms by source (N=45), comparing the baseline query with responses to the three elicitation questions (Why, Already Know, Keywords). Baseline queries averaged 2.33 terms; the question responses averaged 9.33, 16.18, and 10.67 terms.]

Page 14:

Experimental Runs

Source of Terms                                      | Run ID
Baseline                                             | baseline
Baseline + Pseudo Relevance Feedback                 | pseudo05, pseudo10, pseudo20, pseudo50
Baseline + Elicitation Form Q2                       | Q2
Baseline + Elicitation Form Q3                       | Q3
Baseline + Elicitation Form Q4                       | Q4
Baseline + Combination of Elicitation Form Questions | Q3Q4, Q2Q3, Q2Q4, Q234

Page 15:

Overall Performance

[Figure: Mean Average Precision (MAP, y-axis .24-.38) by run, in descending order: Q234, Q2Q3, Q2Q4, Q3Q4, Q2, Q3, Q4, pseudo50, baseline, pseudo20, pseudo10, pseudo05. Labeled values: 0.3685 (best run) and 0.2843.]

Page 16:

Query Length and Performance

[Figure: scatter plot of query length (x-axis, 0-40) against Mean Average Precision (y-axis, .28-.38) for the runs baseline, Q2, Q3, Q4, Q2Q3, Q2Q4, Q3Q4, and Q234, with the fitted regression line y = 0.263 + .000265(x), p = .000.]

Page 17:

Major Findings

Users provided lengthy responses to some of the questions

There were large differences in the length of users’ responses to each question

In most cases responses significantly improved retrieval

Query length and performance were significantly related
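As an aside, the reported relationship is an ordinary least-squares fit; here is a sketch of how such a line is estimated (the data points below are placeholders, not the study's):

```python
# Sketch: fitting a query-length vs. MAP regression with ordinary least
# squares. The (length, MAP) pairs are toy placeholders; the slide reports
# y = 0.263 + .000265(x).
import numpy as np

query_length = np.array([2, 9, 11, 16, 20, 26, 30, 38])        # toy values
map_score = np.array([0.28, 0.30, 0.31, 0.33, 0.34, 0.35, 0.36, 0.37])

slope, intercept = np.polyfit(query_length, map_score, deg=1)
print(f"MAP ~= {intercept:.3f} + {slope:.6f} * length")
```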

Page 18:

Implicit Feedback

Page 19:

What is it?

Information about users, their needs and document preferences that can be obtained unobtrusively, by watching users’ interactions and behaviors with systems

What are some examples?

Examine: Select, View, Listen, Scroll, Find, Query, Cumulative measures

Retain: Print, Save, Bookmark, Purchase, Email

Reference: Link, Cite

Annotate/Create: Mark up, Type, Edit, Organize, Label

Implicit Feedback
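To make the taxonomy concrete, here is a minimal logging sketch (my illustration, not the study's instrumentation); the Examine/Retain/Reference/Annotate categories follow the slide, while the field names and storage are assumptions:

```python
# Minimal sketch of an implicit-feedback event logger. The behavior taxonomy
# is from the slide; everything else is an illustrative assumption.
import time
from dataclasses import dataclass, field

CATEGORIES = {
    "examine": {"select", "view", "listen", "scroll", "find", "query"},
    "retain": {"print", "save", "bookmark", "purchase", "email"},
    "reference": {"link", "cite"},
    "annotate": {"mark_up", "type", "edit", "organize", "label"},
}

@dataclass
class ImplicitFeedbackLog:
    events: list = field(default_factory=list)

    def record(self, user_id, url, action):
        category = next((c for c, acts in CATEGORIES.items() if action in acts), None)
        if category is None:
            raise ValueError(f"unknown action: {action}")
        self.events.append({"user": user_id, "url": url, "action": action,
                            "category": category, "time": time.time()})

log = ImplicitFeedbackLog()
log.record("s1", "http://example.com/paper.pdf", "view")
log.record("s1", "http://example.com/paper.pdf", "save")
```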

Page 20:

Why is it important?

It is generally believed that users are unwilling to engage in explicit relevance feedback

It is unlikely that users can maintain their profiles over time

Users generate large amounts of data each time they engage in online information-seeking activities, and the things in which they are ‘interested’ are in this data somewhere

Implicit Feedback

Page 21:

What do we “know” about it?

There seems to be a positive correlation between selection (click-through) and relevance

There seems to be a positive correlation between display time and relevance

What is problematic about it?

Much of the research has been based on incomplete data and general behavior

And has not considered the impact of contextual variables – such as task and a user’s familiarity with a topic – on behaviors

Implicit Feedback
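One standard way to probe such claims is a rank correlation between display time and relevance ratings; a sketch with fabricated toy data (not the deck's analysis):

```python
# Sketch: rank correlation between display time and usefulness ratings.
# The data values below are fabricated purely for illustration.
from scipy.stats import spearmanr

display_seconds = [4, 12, 35, 8, 60, 3, 22, 41]
usefulness = [2, 4, 6, 3, 7, 1, 5, 6]  # 1-7 scale, as in the study

rho, p_value = spearmanr(display_seconds, usefulness)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```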

Page 22:

Implicit Feedback Study

To investigate:

the relationship between behaviors and relevance

the relationship between behaviors and context

To develop a method for studying and measuring behaviors, context and relevance in a natural setting, over time

Page 23:

Method

Approach: naturalistic and longitudinal, but with some control

Subjects/Cases: 7 Ph.D. students

Study period: 14 weeks

Compensation: new laptops and printers

Page 24:

Data Collection

Document Context: Tasks, Topics, Persistence, Familiarity, Endurance, Frequency, Stage

Behaviors: Display Time, Printing, Saving

Relevance: Usefulness
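A hypothetical data-structure sketch of one logged evaluation record, just to make the three groups of variables concrete (the field names and types are my own assumptions):

```python
# Hypothetical record joining the three variable groups from the study design:
# document context, behaviors, and relevance. Field names/types are assumed.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocumentEvaluation:
    # Document context
    task: str              # e.g., "Researching Dissertation"
    topic: str
    persistence: int
    familiarity: int
    endurance: int
    frequency: int
    stage: str
    # Behaviors
    display_time_sec: float
    printed: bool
    saved: bool
    # Relevance
    usefulness: Optional[int] = None   # 1-7 rating, when the subject gave one
```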

Page 25:

Protocol

[Timeline: over the 14-week study, client- and server-side logging ran continuously from start to end; context evaluations and document evaluations were administered from Week 1 through Week 13.]

Page 26:
Page 27:

Results: Description of Data

Subject        | 1        | 2         | 3         | 4         | 5        | 6          | 7
Client         | 2.6 MB   | 6.8 MB    | 3.9 MB    | 2.0 MB    | 1.5 MB   | 21.7 MB    | 4.9 MB
Proxy          | 1.7 GB   | 83 MB     | 39 MB     | 42 MB     | 48 MB    | 2.9 GB     | 2.1 GB
URLs Requested | 15,499   | 5,319     | 3,157     | 3,205     | 3,404    | 14,586     | 11,657
Docs Evaluated | 870 (5%) | 802 (14%) | 384 (12%) | 353 (11%) | 200 (6%) | 1,328 (8%) | 1,160 (10%)
Tasks          | 6        | 11        | 19        | 25        | 12       | 21         | 33
Topics         | 9        | 80        | 17        | 35        | 25       | 40         | 26

Page 28:

Relevance: Usefulness

[Figure: mean usefulness rating (1-7 scale) by subject, with standard deviations: Subject 1: 5.0 (2.40); Subject 2: 4.6 (0.80); Subject 3: 5.3 (2.40); Subject 4: 6.0 (0.80); Subject 5: 5.3 (2.20); Subject 6: 6.1 (2.00); Subject 7: 4.8 (1.65).]

Page 29:

Relevance: Usefulness

[Figure: usefulness ratings (1-7) by week viewed (Weeks 1-14), one panel per subject (1-7).]

Page 30:

Display Time

[Figure: histogram of client display time, with the number of documents (0-280) on the y-axis.]

Page 31:

Display Time & Usefulness

[Figure: mean log display time (0.0-5.0) by usefulness rating (1-7), plotted separately for Subjects 1-7.]

Page 32:

Display Time & Task

[Figure: mean log display time (1.5-4.0) by task number (1-6).]

Tasks

1. Researching Dissertation
2. Shopping
3. Read News
4. Movie Reviews & Schedules
5. Preparing Course
6. Entertainment

Page 33:

Major Findings

Behaviors differed for each subject, but in general

most display times were low

most usefulness ratings were high

not much printing or saving

No direct relationship between display time and usefulness

Page 34:

Major Findings

Main effects for display time and all contextual variables:

Task (5 subjects)

Topic (6 subjects)

Familiarity (5 subjects): lower levels of familiarity were associated with higher display times

No clear interaction effects among behaviors, context and relevance

Page 35:

Personalizing Search

Using the display time, task and relevance information from the study, we evaluated the effectiveness of a set of personalized retrieval algorithms

Four algorithms for using display time as implicit feedback were tested:

1. User

2. Task

3. User + Task

4. General
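The slides do not spell out the algorithms; the sketch below is one plausible reading (an assumption on my part): learn a display-time threshold from past judged documents at each granularity, then treat a new document as implicitly relevant when its display time exceeds that threshold.

```python
# Sketch of display-time thresholding at four granularities
# (General, User, Task, User+Task). The median-based threshold and data
# layout are illustrative assumptions, not the study's actual algorithms.
from statistics import median

def learn_threshold(history, key=None):
    """history: dicts with 'user', 'task', 'display_time' for documents the
    user judged useful; `key` restricts history to one user/task slice."""
    times = [h["display_time"] for h in history
             if key is None or all(h[k] == v for k, v in key.items())]
    return median(times) if times else None

def implicitly_relevant(doc, history, mode="general"):
    key = {
        "general": None,
        "user": {"user": doc["user"]},
        "task": {"task": doc["task"]},
        "user+task": {"user": doc["user"], "task": doc["task"]},
    }[mode]
    threshold = learn_threshold(history, key)
    return threshold is not None and doc["display_time"] >= threshold
```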

Page 36:

Results

[Figure: two panels plotting MAP (y-axis; 0-0.35 and 0-0.6) against iteration (x-axis, 0-20) for the four algorithms: TaskAndUser, TaskOnly, UserOnly, All.]

Page 37:

Major Findings

Tailoring display time thresholds based on task information improved performance, but doing so based on user information did not

There was a lot of variability between subjects, with the user-centered algorithms performing well for some and poorly for others

The effectiveness of most of the algorithms increased with time (and more data)

Page 38:

Some Problems

Page 39:

Relevance

What are we modeling? Does click = relevance?

Relevance is multi-dimensional and dynamic

A single measure does not adequately reflect ‘relevance’

Most pages are likely to be rated as useful, even if the value or importance of the information differs

Page 40:

Definition

Recipe

Page 41:

Weather Forecast

Information about Rocky Mountain Spotted Fever

Page 42:

Paper about Personalization

Page 43:

Page Structure

Some behaviors are more likely to occur on some types of pages

A more ‘intelligent’ modeling function would know when and what to observe and expect

The structure of a page encourages or inhibits certain behaviors

Not all pages are equally useful for modeling a user’s interests

Page 44:

What types of behaviors do you expect here?

And here?

Page 45:

And here?

And here?

Page 46:

The Future

Page 47:

Future

New interaction styles and systems create new opportunities for explicit and implicit feedback

Collaborative search features and query recommendation

Features/Systems that support the entire search process (e.g., saving, organizing, etc.)

QA systems

New types of feedback

Negative

Physiological

Page 48:

Diane Kelly ([email protected])

WEB: http://ils.unc.edu/~dianek/research.html

Collaborators: Nick Belkin, Xin Fu, Vijay Dollu, Ryen White

Thank You

Page 49:

TREC [Text REtrieval Conference]

It’s not this …

Page 50:

What is TREC?

TREC is a workshop series sponsored by the National Institute of Standards and Technology (NIST) and the US Department of Defense.

Its purpose is to build infrastructure for large-scale evaluation of text retrieval technology.

TREC collections and evaluation measures are the de facto standard for evaluation in IR.

TREC comprises different tracks, each of which focuses on different issues (e.g., question answering, filtering).

Page 51:
Page 52:

TREC Collections

Central to each TREC Track is a collection, which consists of three major components:

1. A corpus of documents (typically newswire)

2. A set of information needs (called topics)

3. A set of relevance judgments.

Each Track also adopts particular evaluation measures

Precision and Recall; F-measure

Average Precision (AP) and Mean AP (MAP)
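A short sketch of how AP and MAP are computed from binary relevance judgments (standard definitions; the list format is mine, and it assumes every relevant document appears in the ranked list):

```python
# Average Precision (AP) over one ranked list of binary judgments, and
# Mean Average Precision (MAP) over several lists (topics). Assumes all
# relevant documents are retrieved, as in the slide's example lists.
def average_precision(relevant_flags):
    """relevant_flags: ranked list of booleans (True = relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(lists):
    return sum(average_precision(l) for l in lists) / len(lists)

# The two lists from the "Comparison of Measures" slide that follows:
list1 = [True] * 5 + [False] * 5
list2 = [False] * 5 + [True] * 5
print(average_precision(list1))   # 1.0
print(average_precision(list2))   # ~0.354
```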

Page 53:

Comparison of Measures

Rank | List 1 | Precision | List 2 | Precision
1    | R      | 1/1 = 1.0 | NR     |
2    | R      | 2/2 = 1.0 | NR     |
3    | R      | 3/3 = 1.0 | NR     |
4    | R      | 4/4 = 1.0 | NR     |
5    | R      | 5/5 = 1.0 | NR     |
6    | NR     |           | R      | 1/6 = .167
7    | NR     |           | R      | 2/7 = .286
8    | NR     |           | R      | 3/8 = .375
9    | NR     |           | R      | 4/9 = .444
10   | NR     |           | R      | 5/10 = .50

AP: List 1 = 1.0; List 2 = .354

Page 54:

Learn more about TREC

http://trec.nist.gov

Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and Evaluation in Information Retrieval. Cambridge, MA: MIT Press.

BACK

Page 55:

Example Topic

BACK

Page 56:

Learn more about IR

ACM SIGIR Conference

Sparck Jones, K., & Willett, P. (1997). Readings in Information Retrieval. Morgan Kaufmann Publishers.

Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York, NY: ACM Press.

Grossman, D. A., & Frieder, O. (2004). Information Retrieval: Algorithms and Heuristics. The Netherlands: Springer.

BACK