Personalizing Information Search: Understanding
Users and their Interests
Diane Kelly
School of Information & Library Science
University of North Carolina
[email protected]
IPAM | 04 October 2007
What is IR?
Who works on problems in IR?
Where can I find the most recent work in IR?
A TREC primer
Background: IR and TREC
Personalization is a process where retrieval is customized to the individual (not one-size-fits-all searching)
Hans Peter Luhn was one of the first people to personalize IR through selective dissemination of information (SDI) (now called ‘filtering’)
Profiles and user models are often employed to ‘house’ data about users and represent their interests
Figuring out how to populate and maintain the profile or user model is a hard problem
Background: Personalization
Explicit Feedback
Implicit Feedback
User’s desktop
Major Approaches
Explicit Feedback
Term relevance feedback is one of the most widely used and studied explicit feedback techniques
Typical relevance feedback scenarios (examples)
Systems-centered research has found that relevance feedback works (including pseudo-relevance feedback)
User-centered research has found mixed results about its effectiveness
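As a rough illustration of pseudo-relevance feedback, the sketch below treats the top-ranked documents as relevant without asking the user and adds their most frequent new terms to the query. The function name `expand_query`, its parameters and the sample documents are hypothetical, and raw term counts stand in for whatever term-weighting scheme a real system would use.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, num_docs=10, num_terms=5):
    """Add the most frequent new terms from the top-ranked documents to the query."""
    feedback_docs = ranked_docs[:num_docs]  # assumed relevant, no user judgment
    counts = Counter()
    for doc in feedback_docs:
        # Count only terms that are not already in the query.
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    new_terms = [term for term, _ in counts.most_common(num_terms)]
    return list(query_terms) + new_terms


# Hypothetical ranked list; only the first two documents are used as feedback.
ranked = [
    "tick fever symptoms rash",
    "fever rash treatment",
    "mountain weather forecast",
]
expanded = expand_query(["fever"], ranked, num_docs=2, num_terms=2)
print(expanded)
```

In explicit term relevance feedback the user would pick the terms instead; here the selection step is automatic.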
Explicit Feedback
Terms are not presented in context so it may be hard for users to understand how they can help
Quality of terms suggested is not always good
Users don’t have the additional cognitive resources to engage in explicit feedback
Users are too lazy to provide feedback
Questions about the sustainability of explicit feedback for long-term modeling
Explicit Feedback
Examples
Query Elicitation Study
Users typically pose very short queries
This may be because
users have a difficult time articulating their information needs
traditional search interfaces encourage short queries
Polyrepresentative extraction of information needs suggests obtaining multiple representations of a single information need (reference interview)
Motivation
Research has demonstrated that a positive relationship exists between query length and performance in batch-mode experimental IR
Query expansion is an effective technique for increasing query length, but research has demonstrated that users have some difficulty with traditional term relevance feedback features
Elicitation Form
[Why Know]
[Already Know]
[Keywords]
Results: Number of Terms
[Figure: mean number of terms by source of terms (baseline, Q2, Q3, Q4), N=45. Category labels: Already Know, Why Know, Keywords. Value labels: 9.33, 16.18, 10.67, 2.33.]
Experimental Runs
Source of Terms | Run ID
Baseline | baseline
Baseline + Pseudo Relevance Feedback | pseudo05, pseudo10, pseudo20, pseudo50
Baseline + Elicitation Form Q2 | Q2
Baseline + Elicitation Form Q3 | Q3
Baseline + Elicitation Form Q4 | Q4
Baseline + Combination of Elicitation Form Questions | Q3Q4, Q2Q3, Q2Q4, Q234
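One way to picture the elicitation-form runs is as the baseline query with the text of one or more form responses appended. The sketch below assumes simple concatenation; the slides do not say how the terms were combined or weighted. The function `build_runs` and the sample responses are hypothetical, and the three-question run is labeled Q2Q3Q4 here where the slide uses Q234.

```python
from itertools import combinations


def build_runs(baseline, responses):
    """Return {run_id: query_text}: the baseline plus each subset of form responses."""
    runs = {"baseline": baseline}
    questions = sorted(responses)  # e.g. ["Q2", "Q3", "Q4"]
    for size in range(1, len(questions) + 1):
        for combo in combinations(questions, size):
            run_id = "".join(combo)
            # Assumption: responses are appended to the baseline as plain text.
            runs[run_id] = " ".join([baseline] + [responses[q] for q in combo])
    return runs


# Hypothetical responses to the three elicitation questions.
responses = {
    "Q2": "spread by ticks",
    "Q3": "planning a hiking trip",
    "Q4": "symptoms prevention",
}
runs = build_runs("spotted fever", responses)
for run_id, text in runs.items():
    print(run_id, "->", text)
```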
Overall Performance
[Figure: Mean Average Precision (MAP), axis from .24 to .38, by Run_ID for runs pseudo05, pseudo10, pseudo20, baseline, pseudo50, Q4, Q3, Q2, Q3Q4, Q2Q4, Q2Q3 and Q234. Value labels: 0.3685, 0.2843.]
Query Length and Performance
[Figure: Mean Average Precision (MAP) plotted against query length (0 to 40) for runs baseline, Q2, Q3, Q4, Q2Q3, Q2Q4, Q3Q4 and Q234, with fitted line y = 0.263 + .000265(x), p=.000]
Major Findings
Users provided lengthy responses to some of the questions
There were large differences in the length of users’ responses to each question
In most cases responses significantly improved retrieval
Query length and performance were significantly related
Implicit Feedback
What is it?
Information about users, their needs and document preferences that can be obtained unobtrusively, by watching users’ interactions and behaviors with systems
What are some examples?
Examine: Select, View, Listen, Scroll, Find, Query, Cumulative measures
Retain: Print, Save, Bookmark, Purchase, Email
Reference: Link, Cite
Annotate/Create: Mark up, Type, Edit, Organize, Label
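The four behavior categories above can be read as a lookup table for logged events. The sketch below tallies a hypothetical event log by category; the event names are lowercased versions of the behaviors listed, "Cumulative measures" is left out because it is not a single event, and the names `BEHAVIOR_CATEGORIES`, `categorize` and `summarize` are illustrative only.

```python
from collections import Counter

# Categories and behaviors as listed on the slide (lowercased).
BEHAVIOR_CATEGORIES = {
    "Examine": {"select", "view", "listen", "scroll", "find", "query"},
    "Retain": {"print", "save", "bookmark", "purchase", "email"},
    "Reference": {"link", "cite"},
    "Annotate/Create": {"mark up", "type", "edit", "organize", "label"},
}


def categorize(event):
    """Return the category of a logged behavior, or None if it is not listed."""
    for category, behaviors in BEHAVIOR_CATEGORIES.items():
        if event.lower() in behaviors:
            return category
    return None


def summarize(events):
    """Count logged events per category, ignoring unknown events."""
    return Counter(c for c in map(categorize, events) if c is not None)


# Hypothetical interaction log for one document.
log = ["view", "scroll", "scroll", "print", "bookmark", "hover"]
summary = summarize(log)
print(summary)
```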
Implicit Feedback
Why is it important?
It is generally believed that users are unwilling to engage in explicit relevance feedback
It is unlikely that users can maintain their profiles over time
Users generate large amounts of data each time they engage in online information-seeking activities, and the things in which they are ‘interested’ are somewhere in this data
Implicit Feedback
What do we “know” about it?
There seems to be a positive correlation between selection (click-through) and relevance
There seems to be a positive correlation between display time and relevance
What is problematic about it?
Much of the research has been based on incomplete data and general behavior
It has not considered the impact of contextual variables, such as task and a user’s familiarity with a topic, on behaviors
Implicit Feedback
Implicit Feedback Study
To investigate:
the relationship between behaviors and relevance
the relationship between behaviors and context
To develop a method for studying and measuring behaviors, context and relevance in a natural setting, over time
Method
Approach: naturalistic and longitudinal, but some control
Subjects/Cases: 7 Ph.D. students
Study period: 14 weeks
Compensation: new laptops and printers
Data Collection
Document
Context: Tasks, Topics, Persistence, Familiarity, Endurance, Frequency, Stage
Behaviors: Display Time, Printing, Saving
Relevance: Usefulness
Protocol
[Timeline from START to END, 14 weeks. Labels: Context Evaluation; Document Evaluations; Client- & Server-side Logging; Week 1; Week 13.]
Results: Description of Data
Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7
Client | 2.6 MB | 6.8 MB | 3.9 MB | 2.0 MB | 1.5 MB | 21.7 MB | 4.9 MB
Proxy | 1.7 GB | 83 MB | 39 MB | 42 MB | 48 MB | 2.9 GB | 2.1 GB
URLs Requested | 15,499 | 5,319 | 3,157 | 3,205 | 3,404 | 14,586 | 11,657
Docs Evaluated | 870 (5%) | 802 (14%) | 384 (12%) | 353 (11%) | 200 (6%) | 1,328 (8%) | 1,160 (10%)
Tasks | 6 | 11 | 19 | 25 | 12 | 21 | 33
Topics | 9 | 80 | 17 | 35 | 25 | 40 | 26
Relevance: Usefulness
[Figure: mean usefulness (axis 0.0 to 7.0) by subject (1 to 7). Value labels, with the second figure in parentheses as shown on the slide: 4.8 (1.65), 6.1 (2.00), 5.3 (2.20), 6.0 (0.80), 5.3 (2.40), 4.6 (0.80), 5.0 (2.40).]
Relevance: Usefulness
[Figure: usefulness ratings (1 to 7) by week viewed (weeks 01 to 14), one panel per subject (1 to 7)]
Display Time
[Figure: number of documents (axis 0 to 280) by client display time]
Display Time & Usefulness
[Figure: mean log display time (axis 0.0 to 5.0) by usefulness rating (1 to 7), one series per subject (1 to 7)]
Display Time & Task
[Figure: mean log display time (axis 1.5 to 4.0) by task number (1 to 6), with N shown for each task]
Tasks
1. Researching Dissertation
2. Shopping
3. Read News
4. Movie Reviews & Schedules
5. Preparing Course
6. Entertainment
Major Findings
Behaviors differed for each subject, but in general
most display times were low
most usefulness ratings were high
not much printing or saving
No direct relationship between display time and usefulness
Major Findings
Main effects for display time and all contextual variables:
Task (5 subjects)
Topic (6 subjects)
Familiarity (5 subjects)
Lower levels of familiarity associated with higher display times
No clear interaction effects among behaviors, context and relevance
Personalizing Search
Using the display time, task and relevance information from the study, we evaluated the effectiveness of a set of personalized retrieval algorithms
Four algorithms for using display time as implicit feedback were tested:
1. User
2. Task
3. User + Task
4. General
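The four algorithms differ in how display-time observations are grouped before a threshold is set. The slides do not say how the thresholds were derived, so the sketch below makes two loud assumptions: the threshold for a group is the mean display time observed in that group, and a document is predicted relevant when its display time reaches the threshold. All names (`GROUP_KEYS`, `learn_thresholds`, `predict_relevant`) and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# How each algorithm groups observations before computing a threshold.
GROUP_KEYS = {
    "User": lambda o: (o["user"],),
    "Task": lambda o: (o["task"],),
    "User + Task": lambda o: (o["user"], o["task"]),
    "General": lambda o: (),  # one threshold for everyone
}


def learn_thresholds(observations, algorithm):
    """Assumed rule: the threshold is the mean display time within each group."""
    key = GROUP_KEYS[algorithm]
    times = defaultdict(list)
    for obs in observations:
        times[key(obs)].append(obs["display_time"])
    return {group: mean(values) for group, values in times.items()}


def predict_relevant(obs, thresholds, algorithm):
    """Predict relevance when display time meets the group's threshold."""
    return obs["display_time"] >= thresholds[GROUP_KEYS[algorithm](obs)]


# Hypothetical history: display times in seconds for one user and two tasks.
history = [
    {"user": 1, "task": "news", "display_time": 10},
    {"user": 1, "task": "news", "display_time": 20},
    {"user": 1, "task": "dissertation", "display_time": 100},
    {"user": 1, "task": "dissertation", "display_time": 200},
]
new_doc = {"user": 1, "task": "news", "display_time": 18}

task_thresholds = learn_thresholds(history, "Task")
general_thresholds = learn_thresholds(history, "General")
print(predict_relevant(new_doc, task_thresholds, "Task"))
print(predict_relevant(new_doc, general_thresholds, "General"))
```

With this toy data, 18 seconds is long for a news page but short against the overall average, which is the kind of difference a task-specific threshold is meant to capture.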
Results
[Figure: two line charts of MAP by iteration (0 to 20), with series TaskAndUser, TaskOnly, UserOnly and All]
Major Findings
Tailoring display time thresholds based on task information improved performance, but doing so based on user information did not
There was a lot of variability between subjects, with the user-centered algorithms performing well for some and poorly for others
The effectiveness of most of the algorithms increased with time (and more data)
Some Problems
Relevance
What are we modeling? Does click = relevance?
Relevance is multi-dimensional and dynamic
A single measure does not adequately reflect ‘relevance’
Most pages are likely to be rated as useful, even if the value or importance of the information differs
Definition
Recipe
Weather Forecast
Information about Rocky Mountain Spotted Fever
Paper about Personalization
Page Structure
Some behaviors are more likely to occur on some types of pages
A more ‘intelligent’ modeling function would know when and what to observe and expect
The structure of pages encourages or inhibits certain behaviors
Not all pages are equally useful for modeling a user’s interests
What types of behaviors do you expect here?
And here?
And here?
And here?
The Future
Future
New interaction styles and systems create new opportunities for explicit and implicit feedback
Collaborative search features and query recommendation
Features/Systems that support the entire search process (e.g., saving, organizing, etc.)
QA systems
New types of feedback
Negative
Physiological
Diane Kelly ([email protected])
WEB: http://ils.unc.edu/~dianek/research.html
Collaborators: Nick Belkin, Xin Fu, Vijay Dollu, Ryen White
Thank You
TREC [Text REtrieval Conference]
It’s not this …
What is TREC?
TREC is a workshop series sponsored by the National Institute of Standards and Technology (NIST) and the US Department of Defense.
Its purpose is to build infrastructure for large-scale evaluation of text retrieval technology.
TREC collections and evaluation measures are the de facto standard for evaluation in IR.
TREC consists of different tracks, each of which focuses on different issues (e.g., question answering, filtering).
TREC Collections
Central to each TREC Track is a collection, which consists of three major components:
1. A corpus of documents (typically newswire)
2. A set of information needs (called topics)
3. A set of relevance judgments.
Each Track also adopts particular evaluation measures
Precision and Recall; F-measure
Average Precision (AP) and Mean AP (MAP)
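For reference, the set-based measures named above can be computed as in the sketch below. Precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved, and the F-measure is their weighted harmonic mean. The function name, the `beta` parameter default and the document identifiers are illustrative assumptions, not part of any TREC tool.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    # beta = 1 weights precision and recall equally (the usual F1).
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f


# Hypothetical judgments: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r, f1 = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(round(p, 3), round(r, 3), round(f1, 3))
```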
Comparison of Measures
Rank | List 1 | Precision | List 2 | Precision
1 | R | 1/1 = 1 | NR |
2 | R | 2/2 = 1 | NR |
3 | R | 3/3 = 1 | NR |
4 | R | 4/4 = 1 | NR |
5 | R | 5/5 = 1 | NR |
6 | NR | | R | 1/6 = .167
7 | NR | | R | 2/7 = .286
8 | NR | | R | 3/8 = .375
9 | NR | | R | 4/9 = .444
10 | NR | | R | 5/10 = .50
AP | | 1.0 | | .354
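The AP values in the comparison can be reproduced with a few lines of code. The sketch below averages the precision at each rank where a relevant document (R) appears, assuming every relevant document shows up somewhere in the ranking. The function name is illustrative, and the MAP line treats the two lists as if they were results for two different topics, which is an assumption made only to show the averaging step.

```python
def average_precision(ranking, total_relevant=None):
    """ranking: list of booleans, True where the document at that rank is relevant."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant document
    # Assumption: if not given, the number of relevant documents is the number found.
    denominator = total_relevant if total_relevant is not None else hits
    return sum(precisions) / denominator if denominator else 0.0


list_1 = [True] * 5 + [False] * 5   # R at ranks 1-5, NR at ranks 6-10
list_2 = [False] * 5 + [True] * 5   # NR at ranks 1-5, R at ranks 6-10

ap_1 = average_precision(list_1)
ap_2 = average_precision(list_2)
mean_ap = (ap_1 + ap_2) / 2  # MAP if the lists belonged to two different topics
print(round(ap_1, 3), round(ap_2, 3), round(mean_ap, 3))
```

Both lists have the same precision and recall at rank 10 (5 of 10 retrieved are relevant), so AP is what separates them: it rewards placing relevant documents early.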
Learn more about TREC
http://trec.nist.gov
Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and Evaluation in Information Retrieval, Cambridge, MA: MIT Press.
Example Topic
Learn more about IR
ACM SIGIR Conference
Sparck-Jones, K., & Willett, P. (1997). Readings in Information Retrieval. Morgan Kaufmann Publishers.
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York, NY: ACM Press.
Grossman, D. A., & Frieder, O. (2004). Information Retrieval: Algorithms and Heuristics. The Netherlands: Springer.