Date posted: 15-Jan-2016
Recognizing User Interest and Document Value from Reading and Organizing
Activities in Document Triage
Rajiv Badi, Soonil Bae, J. Michael Moore, Konstantinos Meintanis, Anna Zacchi, Haowei
Hsieh, Frank Shipman and Cathy Marshall
Center for the Study of Digital Libraries &
Department of Computer Science
Texas A&M University
Microsoft Corporation
What is Document Triage?
● People quickly evaluate a large set of documents, selecting which ones to read
● People organize them into a personal information collection
● People re-read the documents, progressively refining the organization
● Knowledge forms incrementally as initial understanding becomes more refined over time
A specific form of information collecting, reading and organizing
Prior Document Triage Study (2004)
● Task: acting as a reference librarian, organize the documents to help a teacher prepare a set of lessons on ethnomathematics
● 24 subjects
● 40 documents from NSDL & Google searches
● Organizing tool: Visual Knowledge Builder (VKB)
● Reading tool: Internet Explorer (IE)
● Logged reading & editing events
● Asked subjects to select the five most and five least useful documents
Initial Document List
Figure labels: Document object; Collection; Metadata; Page title; Page URL; Summary; NSDL Search; Google Search; System-generated visualization based on metadata
Document in a Web Browser
Final Organization Sample
Figure labels: Categories (collections); Background color; Border color; Border thickness
Proactive Support for Document Triage
Motivations:
1. Recognizing user interest and document value
2. Representing user interests
3. Recognizing documents of interest
4. Visualizing interest information
Recognizing User Interest (1)
● Explicit and implicit interest indicators
● Correlation between reading activity and user interest
    ● Reading time, # of visits, # of scrolls, …
● Correlation between organizing activity and user interest
    ● Resize, move, delete, …
● Correlation between document attributes and user interest
    ● # of characters, # of links, # of images, …
Recognizing User Interest (2)
● Prior work has focused on a single application as the source for interest indicators
● Document triage occurs in the context of multiple applications
● An interest profile is the basis for determining, sharing and storing implicit interest across applications
Interest Profile Manager
Figure labels: Reading Application; Organizing Application; Overview Application; Interest Profile Manager; Interest Profile; Interest Models; connections labeled "Communication"
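The idea of several applications reporting to one shared profile can be sketched as follows. This is a minimal illustration, not the actual Interest Profile Manager: the class name, method names, event names and URL are all hypothetical, and in-process method calls stand in for whatever communication mechanism the real system uses.

```python
from collections import defaultdict


class InterestProfileManager:
    """Hypothetical hub that merges interest events from several applications."""

    def __init__(self):
        # profile[doc_url][(application, event)] -> accumulated amount
        self.profile = defaultdict(lambda: defaultdict(float))

    def report(self, application, doc_url, event, amount=1.0):
        """Called by an application to add an event count or a duration."""
        self.profile[doc_url][(application, event)] += amount

    def indicators(self, doc_url):
        """Return a copy of everything recorded for one document."""
        return dict(self.profile.get(doc_url, {}))


manager = InterestProfileManager()
url = "http://example.org/doc1"  # placeholder URL, not from the study

# A reading application and an organizing application report independently.
manager.report("reading", url, "scroll")
manager.report("reading", url, "scroll")
manager.report("reading", url, "reading_time_s", 42.5)
manager.report("organizing", url, "resize")

print(manager.indicators(url))
```

Keying each entry by both application and event keeps reading and organizing evidence separable, so a model can use either source alone or both together.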
Data Analysis (1)
● Document attributes: # of characters, # of links, # of images
● Reading activity: reading time, # of clicks, # of text selections, # of scrolls, # of scrolling direction changes, time spent in scrolling, scroll offset, # of document accesses
● Organizing activity: # of object moves, # of object resizes, # of object deletions, # of content changes, # of background color changes, # of border color changes, # of border width changes
Data Analysis (2)
● Identified correlations of user activity and document attributes with user interest
● Found meaningful interest indicators in user activity
    ● Reading time, # of scrolls, # of resize events, …
● Found meaningful interest indicators in document attributes
    ● # of characters, # of links, # of images, …
● No single indicator can dominantly identify user interest
● Significant differences between individual users' styles
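As an illustration of how one indicator can be checked against explicit ratings, the sketch below computes a Pearson correlation coefficient. The slides do not say which correlation measure was used, so Pearson is an assumption here, and the sample numbers are invented purely to exercise the function.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation information
    return covariance / (spread_x * spread_y)


# Invented example: reading time in seconds vs. rating on the 0 to 2 scale.
reading_time = [12.0, 95.0, 40.0, 150.0, 8.0]
rating = [0.0, 2.0, 1.0, 2.0, 0.0]
print(round(pearson(reading_time, rating), 3))
```

Running the same check per subject rather than over pooled data would be one way to expose the differences in individual styles noted above.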
Interest Models
● Models to estimate average interest in documents
● Statistical models
    ● Reading activity model (data: reading activity)
    ● Organizing activity model (data: organizing activity)
    ● Combined model (data: reading & organizing activity)
● Qualitative model (data: reading & organizing activity)
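The slides do not give the form of the statistical models. Purely as an illustration of how reading and organizing indicators could feed one estimate, the sketch below uses a weighted sum clamped to the 0 to 2 rating scale. The function name, indicator names, weights and bias are all hypothetical and are not taken from the study.

```python
def predict_interest(indicators, weights, bias=1.0):
    """Weighted sum of indicator values, clamped to the 0..2 rating scale."""
    score = bias + sum(
        weights.get(name, 0.0) * value for name, value in indicators.items()
    )
    return max(0.0, min(2.0, score))


# Hypothetical weights; a real model would fit these to logged data.
READING_WEIGHTS = {"reading_time_min": 0.10, "scrolls": 0.02}
ORGANIZING_WEIGHTS = {"resizes": 0.15, "deletions": -0.50}
COMBINED_WEIGHTS = {**READING_WEIGHTS, **ORGANIZING_WEIGHTS}

doc = {"reading_time_min": 3.0, "scrolls": 10, "resizes": 1, "deletions": 0}
print(predict_interest(doc, READING_WEIGHTS))
print(predict_interest(doc, ORGANIZING_WEIGHTS))
print(predict_interest(doc, COMBINED_WEIGHTS))
```

Indicators missing from a weight table contribute nothing, which lets the same document record be scored by the reading-only, organizing-only and combined variants.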
Evaluation (1)
● Same task and topic as in the 2004 study
● 16 subjects
● 40 documents from NSDL & Google searches
● Asked subjects to select the five most and five least useful documents
● Scaled the ratings to a continuous value between 0 (least useful) and 2 (most useful)
● Calculated the absolute value of the difference between the explicit user rating and each model's predicted rating
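The error measure above can be sketched as follows. Expressing the absolute difference as a percentage of the 0 to 2 scale and grouping it into 5% bins is an assumption made here for illustration; the slides do not state how the percentages were derived. The function names and sample pairs are hypothetical.

```python
from collections import Counter


def residual_percent(explicit, predicted, scale_max=2.0):
    """Absolute rating difference as a percentage of the rating scale."""
    return abs(explicit - predicted) / scale_max * 100.0


def bin_label(percent, width=5):
    """Label of the fixed-width bin containing a percentage, e.g. '10 - 15%'."""
    low = int(percent // width) * width
    return f"{low} - {low + width}%"


def error_histogram(pairs):
    """Share of (explicit, predicted) pairs that falls in each error bin."""
    counts = Counter(bin_label(residual_percent(e, p)) for e, p in pairs)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Invented (explicit rating, predicted rating) pairs on the 0..2 scale.
pairs = [(2.0, 1.75), (0.0, 0.25), (1.0, 1.5)]
print(error_histogram(pairs))
```

Computing one such histogram per model gives a direct way to compare how often each model lands close to the explicit ratings.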
Evaluation (2)
● The combined and qualitative models, which use both reading and organizing activity, perform better than the single-activity models
Figure: frequency (0% to 45%) of residue error, in 5% bins from 0 - 5% to 30 - 35%, for the Reading, Organizing, Combined and Qualitative models
Conclusion
● Predictive models based on user activity collected from multiple applications have been built
● Using user activity from multiple applications rather than a single application can improve prediction accuracy
● A software infrastructure, the Interest Profile Manager, has been developed to support these results