Beyond Ranking Optimization, Toward Humane Search - Talk at Seoul National University's Convergence Technology Institute (서울대 융합기술원)
![Page 1: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/1.jpg)
HUMANE INFORMATION SEEKING:
GOING BEYOND THE IR WAY
JIN YOUNG KIM @ SNU DCC
1
![Page 2: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/2.jpg)
Jin Young Kim
• Graduate of SNU EE / Business
• Fifth-year Ph.D. student in UMass Computer Science
• Starting as an Applied Researcher at Microsoft Bing
2
![Page 3: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/3.jpg)
Today’s Agenda
• A brief introduction of IR as a research area
• An example of how we design a retrieval model
• Other research projects and recent trends in IR
3
![Page 4: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/4.jpg)
BACKGROUND
An Information Retrieval Primer
4
![Page 5: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/5.jpg)
Information Retrieval?
• The study of how an automated system can enable
its users to access, interact with, and make sense of
information.
5
[Diagram: User, Query, and Document, connected by edges labeled Issue, Surface, and Visit]
![Page 6: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/6.jpg)
IR Research in Context
• Situated between human interface and system / analytics research
• Aims at satisfying users’ information needs
• Based on large-scale system infrastructure & analytics
• Need for convergence research!
6
[Diagram: Information Retrieval placed among Large-scale System Infra., Large-scale (Text) Analytics, and End-user Interface (UX / HCI / InfoViz)]
![Page 7: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/7.jpg)
Major Problems in IR
• Matching
  • (Keyword) Search: query – document
  • Personalized Search: (user+query) – document
  • Contextual Advertising: (user+context) – advertisement
• Quality
  • Authority / Spam / Freshness
  • Various ways to capture them
• Relevance Scoring
  • Combination of matching and quality features
  • Evaluation is critical for optimal performance
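As a rough illustration of the relevance-scoring bullet above, the sketch below combines one matching feature with a few quality features in a weighted sum. The feature names, the `WEIGHTS` values, and the linear form are assumptions made for this example; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Features:
    match: float      # query-document matching score, assumed to lie in [0, 1]
    authority: float  # quality signal: how authoritative the source is
    spam: float       # quality signal: estimated spam likelihood
    freshness: float  # quality signal: how recent the document is


# Hypothetical hand-set weights; a real system would tune these against evaluation data.
WEIGHTS = {"match": 0.6, "authority": 0.2, "spam": -0.3, "freshness": 0.1}


def relevance_score(f: Features) -> float:
    """Weighted sum of the matching feature and the quality features."""
    return (WEIGHTS["match"] * f.match
            + WEIGHTS["authority"] * f.authority
            + WEIGHTS["spam"] * f.spam
            + WEIGHTS["freshness"] * f.freshness)


def rank(docs: dict) -> list:
    """Return document ids sorted by descending relevance score."""
    return sorted(docs, key=lambda d: relevance_score(docs[d]), reverse=True)


docs = {
    "a": Features(match=0.9, authority=0.1, spam=0.9, freshness=0.1),
    "b": Features(match=0.7, authority=0.8, spam=0.0, freshness=0.5),
}
ranking = rank(docs)
```

Here document `a` matches the query better, but its spam signal pushes it below `b`, which is the kind of trade-off that evaluation is needed to calibrate.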
7
[Diagram: User, Query, and Document, connected by edges labeled Issue, Surface, and Visit]
![Page 8: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/8.jpg)
HUMANE INFORMATION
RETRIEVAL
Going Beyond the IR Way
8
![Page 9: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/9.jpg)
9
Information seeking requires communication.
You need the freedom of expression.
You need someone who understands.
![Page 10: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/10.jpg)
10
Information Seeking circa 2012
Search engine accepts keywords only.
Search engine doesn’t understand you.
![Page 11: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/11.jpg)
11
Toward Humane Information Seeking
• Rich User Interactions: Search / Browsing / Filtering
• Rich User Modeling: Profile / Context / Behavior
![Page 12: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/12.jpg)
The HCIR Way: from Query to Session
HCIR = HCI + IR
[Diagram: USER and SYSTEM exchange a sequence of Action / Response pairs. Actions listed: Filtering / Browsing, Relevance Feedback, … Responses listed: Filtering Conditions, Related Items, … The Interaction History feeds a User Model (Profile, Context, Behavior). Callouts: Rich User Interaction, Rich User Modeling; a second title fragment reads "The IR Way:".]
![Page 13: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/13.jpg)
13
The Rest of the Talk…
Personal Search
• Improving search and browsing for known-item finding
• Evaluating interactions combining search and browsing
Web Search
• User modeling based on reading level and topic
• Providing non-intrusive recommendations for browsing
Book Search
• Analyzing interactions combining search and filtering
![Page 14: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/14.jpg)
PERSONAL SEARCH
Retrieval And Evaluation Techniques
for Personal Information [Thesis]
14
![Page 15: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/15.jpg)
Example: Desktop Search
• Ranking using Multiple Document Types in Desktop Search [SIGIR10]
Example: Search over Social Media
• Evaluating Search in Personal Social Media Collections [WSDM12]
![Page 16: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/16.jpg)
Structured Document Retrieval: Background
• Field Operator / Advanced Search Interface
• User’s search terms are found in multiple fields
16
Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11]
![Page 17: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/17.jpg)
Structured Document Retrieval: Models
• Document-based Retrieval Model
  • Score each document as a whole
• Field-based Retrieval Model
  • Combine evidence from each field
[Figure: Document-based Scoring vs. Field-based Scoring; each panel connects query terms q1 … qm to fields f1 … fn, with field weights w1 … wn]
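The contrast between the two model families can be sketched in a few lines. This is only an illustration under stated assumptions: `log(1 + tf)` stands in for whatever term-scoring function a real model would use, and the field names, weights, and example email are hypothetical.

```python
import math


def document_based_score(query, doc, weights):
    """Merge weighted term counts from all fields first, then score once per query term."""
    score = 0.0
    for term in query:
        combined_tf = sum(weights[f] * tokens.count(term) for f, tokens in doc.items())
        score += math.log(1.0 + combined_tf)
    return score


def field_based_score(query, doc, weights):
    """Score each field separately, then combine the per-field scores using the weights."""
    score = 0.0
    for term in query:
        score += sum(weights[f] * math.log(1.0 + tokens.count(term))
                     for f, tokens in doc.items())
    return score


# Hypothetical email document, split into fields.
doc = {
    "subject": ["registration", "deadline"],
    "to": ["james"],
    "body": ["james", "please", "finish", "registration", "registration"],
}
weights = {"subject": 2.0, "to": 1.0, "body": 0.5}  # assumed fixed field weights
query = ["james", "registration"]

doc_score = document_based_score(query, doc, weights)
field_score = field_based_score(query, doc, weights)
```

Both functions use the same fixed weight per field for every query term; they differ only in whether the weights are applied before or after the term-scoring function.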
17
![Page 18: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/18.jpg)
18
Improved Matching for Email Search / Structured Documents [CIKM09, ECIR09,12]
• Field Relevance
  • Different fields are important for different query terms
  • ‘james’ is relevant when it occurs in <to>
  • ‘registration’ is relevant when it occurs in <subject>
![Page 19: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/19.jpg)
Estimating the Field Relevance
• If the User Provides Feedback
  • A relevant document provides sufficient information
• If No Feedback is Available
  • Combine field-level term statistics from multiple sources
19
[Diagram: field-level term statistics (content, title, from/to) from the Collection + those from the Top-k Docs ≅ those from the Relevant Docs]
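One way to picture the no-feedback case is to mix field-level term statistics from the whole collection with those from the top-k retrieved documents, then normalize across fields to obtain a per-term field distribution. The sketch below is an assumption-laden illustration, not the estimator from the cited papers: the function names, the linear mixture, the source weights, and the `eps` floor are all made up for the example.

```python
from collections import Counter


def field_counts(docs, field):
    """Pool the term counts of one field over a set of documents."""
    counts = Counter()
    for d in docs:
        counts.update(d.get(field, []))
    return counts


def field_relevance(term, sources, fields, eps=1e-6):
    """Estimate P(F | term) by mixing per-source field-level term probabilities.

    sources: list of (docs, weight) pairs, e.g. the collection and the top-k documents.
    """
    raw = {}
    for f in fields:
        p = 0.0
        for docs, weight in sources:
            counts = field_counts(docs, f)
            total = sum(counts.values())
            p += weight * (counts[term] / total if total else 0.0)
        raw[f] = p + eps  # small floor so every field keeps non-zero mass
    z = sum(raw.values())
    return {f: v / z for f, v in raw.items()}


# Hypothetical two-email collection; the first email plays the role of the top-k set.
collection = [
    {"to": ["james"], "subject": ["registration"], "body": ["hello", "james"]},
    {"to": ["mary"], "subject": ["meeting"], "body": ["registration", "form"]},
]
top_k = [collection[0]]
fields = ["to", "subject", "body"]
sources = [(collection, 0.5), (top_k, 0.5)]

fr_james = field_relevance("james", sources, fields)
fr_registration = field_relevance("registration", sources, fields)
```

With this toy data, `james` receives most of its mass on the `to` field and `registration` on `subject`, mirroring the email example on the earlier slide.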
![Page 20: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/20.jpg)
20
Retrieval Using the Field Relevance
• Comparison with Previous Work
• Ranking in the Field Relevance Model
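[Figure: two panels connecting query terms q1 … qm to fields f1 … fn. Previous work uses the same field weights w1 … wn for every query term. The Field Relevance Model uses per-term field weights P(F1|qi) … P(Fn|qi) for each query term qi. Labels: Per-term Field Weight, Per-term Field Score, sum, multiply.]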
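Read together, the labels on this slide suggest a score of the form "for each query term, sum over fields of (per-term field weight × per-term field score), then multiply across query terms". The sketch below follows that reading and should be treated as an interpretation rather than the paper's exact formulation; the `floor` value for unseen terms and all example numbers are assumptions.

```python
import math


def frm_log_score(query, doc_field_probs, field_weights, floor=1e-6):
    """Log of prod_i sum_j P(F_j | q_i) * P(q_i | f_j, d).

    doc_field_probs: field -> {term: probability of the term in that field of the document}
    field_weights:   term  -> {field: weight of that field for that term}
    """
    log_score = 0.0
    for term in query:
        per_term = sum(field_weights[term][f] * probs.get(term, floor)
                       for f, probs in doc_field_probs.items())
        log_score += math.log(per_term)  # multiplying across terms = adding logs
    return log_score


query = ["james", "registration"]

# Hypothetical documents: doc_a has each term in the "expected" field, doc_b has them swapped.
doc_a = {"to": {"james": 0.5}, "subject": {"registration": 0.5}}
doc_b = {"to": {"registration": 0.5}, "subject": {"james": 0.5}}

# Per-term field weights in the style of P(F | q), versus one fixed weight vector for all terms.
per_term = {
    "james": {"to": 0.9, "subject": 0.1},
    "registration": {"to": 0.1, "subject": 0.9},
}
fixed = {
    "james": {"to": 0.5, "subject": 0.5},
    "registration": {"to": 0.5, "subject": 0.5},
}
```

With fixed weights the two documents are indistinguishable, while per-term weights prefer `doc_a`, where `james` appears in `to` and `registration` in `subject`.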
![Page 21: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/21.jpg)
Evaluating the Field Relevance Model
• Retrieval Effectiveness (Metric: Mean Reciprocal Rank)

| Collection | DQL | BM25F | MFLM | FRM-C | FRM-T | FRM-R |
|---|---|---|---|---|---|---|
| TREC | 54.2% | 59.7% | 60.1% | 62.4% | 66.8% | 79.4% |
| IMDB | 40.8% | 52.4% | 61.2% | 63.7% | 65.7% | 70.4% |
| Monster | 42.9% | 27.9% | 46.0% | 54.2% | 55.8% | 71.6% |
21
[Chart: Mean Reciprocal Rank (y-axis, 40% to 80%) for DQL, BM25F, MFLM, FRM-C, FRM-T, and FRM-R on TREC, IMDB, and Monster; the methods are grouped under the labels Fixed Field Weights and Per-term Field Weights]
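For reference, Mean Reciprocal Rank averages 1/rank of the target item over all queries, which suits known-item tasks where each query has a single correct document. A minimal sketch, with made-up rankings and document ids:

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1 / (rank of the target) over queries; a missing target contributes 0."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)


# Three hypothetical queries whose target document is always "d1".
rankings = [
    ["d1", "d2", "d3"],  # target at rank 1
    ["d2", "d1", "d3"],  # target at rank 2
    ["d3", "d2", "d1"],  # target at rank 3
]
targets = ["d1", "d1", "d1"]
mrr = mean_reciprocal_rank(rankings, targets)
```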
![Page 22: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/22.jpg)
Evaluation Challenges for Personal Search
• Evaluation of Personal Search
  • Each is based on its own user study
  • No comparative evaluation has been performed yet
• Solution: Simulated Collections
  • Crawl CS department webpages, docs and calendars
  • Recruit department people for user study
• Collecting User Logs
  • DocTrack: a human-computation search game
  • Probabilistic User Model: a method for user simulation
[CIKM09,SIGIR10,CIKM11]
![Page 23: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/23.jpg)
DocTrack Game
23
Find It!
Target Item
![Page 24: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/24.jpg)
Summary so far…
• Query Modeling for Structured Documents
  • Using the estimated field relevance improves retrieval
  • User’s feedback can help personalize the field relevance
• Evaluation Challenges in Personal Search
  • Simulation of the search task using game-like structures
  • Related work: ‘Find It If You Can’ [SIGIR11]
24
![Page 25: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/25.jpg)
WEB SEARCH
Characterizing Web Content, User Interests, and
Search Behavior by Reading Level and Topic
25
[WSDM12]
![Page 26: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/26.jpg)
Reading level distribution varies across major topical categories
![Page 27: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/27.jpg)
User Modeling by Reading Level and Topic
• Reading Level and Topic
  • Reading Level: proficiency (comprehensibility)
  • Topic: topical areas of interest
• Profile Construction
  • [Diagram: per-document distributions P(R|d) and P(T|d) are aggregated into the user profile P(R,T|u)]
• Profile Applications
  • Improving personalized search ranking
  • Enabling expert content recommendation
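A simple way to picture profile construction is to average, over the documents associated with a user, the joint distribution obtained from each document's reading-level and topic distributions. The sketch below assumes reading level and topic are independent given a document and weights every document equally; both are simplifications chosen for illustration, and the labels are hypothetical.

```python
def build_profile(docs):
    """Aggregate per-document P(R|d) and P(T|d) into a joint user profile P(R,T|u).

    docs: list of (p_r, p_t) pairs, each a dict mapping a label to a probability.
    """
    profile = {}
    for p_r, p_t in docs:
        for r, pr in p_r.items():
            for t, pt in p_t.items():
                # Assumed: R and T independent given d; uniform weight 1/len(docs) per document.
                profile[(r, t)] = profile.get((r, t), 0.0) + pr * pt / len(docs)
    return profile


# Two hypothetical documents with made-up reading-level and topic labels.
visited = [
    ({"low": 1.0}, {"sports": 1.0}),
    ({"low": 0.5, "high": 0.5}, {"science": 1.0}),
]
profile = build_profile(visited)
```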
![Page 28: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/28.jpg)
Profile matching can predict user’s preference over search results
• Metric
  • % of user’s preferences predicted by profile matching
  • Profile matching measured in KL-Divergence of RT profiles
• Results
  • By the degree of focus in user profile
  • By the distance metric between user and website

| User Group | #Clicks | KL_R(u,s) | KL_T(u,s) | KL_RT(u,s) |
|---|---|---|---|---|
| ↑Focused | 5,960 | 59.23% | 60.79% | 65.27% |
| | 147,195 | 52.25% | 54.20% | 54.41% |
| ↓Diverse | 197,733 | 52.75% | 53.36% | 53.63% |
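The matching step can be sketched as computing the KL-divergence from a user profile to each candidate site profile and predicting that the user prefers the closer site. The profiles, the `eps` floor for unseen labels, and the tie-breaking rule are assumptions for this example.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the support of p; q is floored at eps to avoid division by zero."""
    return sum(pv * math.log(pv / max(q.get(k, 0.0), eps))
               for k, pv in p.items() if pv > 0)


def predict_preference(user, site_a, site_b):
    """Predict which site the user prefers: the one whose profile is closer in KL-divergence."""
    return "a" if kl_divergence(user, site_a) <= kl_divergence(user, site_b) else "b"


# Hypothetical reading-level profiles for one user and two websites.
user = {"low": 0.8, "high": 0.2}
site_a = {"low": 0.7, "high": 0.3}
site_b = {"low": 0.1, "high": 0.9}
choice = predict_preference(user, site_a, site_b)
```

The accuracy figures in the table would then correspond to how often such a prediction agrees with the preference observed in click data.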
![Page 29: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/29.jpg)
Comparing Expert vs. Non-expert URLs
• Expert vs. Non-expert URLs taken from [White’09]
[Chart annotations: Higher Reading Level; Lower Topic Diversity]
![Page 30: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/30.jpg)
Enabling Browsing for Web Search
• SurfCanyon®
  • Recommend results based on clicks
Initial results indicate that recommendations are useful for the shopping domain. [Work-in-progress]
![Page 31: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/31.jpg)
BOOK SEARCH
Understanding Book Search Behavior on the Web
31
[Submitted to SIGIR12]
![Page 32: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/32.jpg)
Understanding Book Search on the Web
• OpenLibrary
• User-contributed online digital library
• Dataset: 8M records from the web server log
32
![Page 33: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/33.jpg)
Comparison of Navigational Behavior
• Users entering directly show different behaviors from
users entering via web search engines
33
[Figure panels: Users entering the site directly | Users entering via Google]
![Page 34: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/34.jpg)
Comparison of Search Behavior
34
• Rich interaction reduces the query lengths
• Filtering induces more interactions than search
![Page 35: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/35.jpg)
LOOKING ONWARD
35
![Page 36: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/36.jpg)
Where’s the Future? – Social Search
• The New Bing Sidebar makes search a social activity.
36
![Page 37: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/37.jpg)
Where’s the Future? – Semantic Search
• The New Google serves ‘knowledge’ as well as docs.
37
![Page 38: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/38.jpg)
Where’s the Future? – Siri-like Agent
• The New Google serves ‘knowledge’ as well as docs.
38
![Page 39: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/39.jpg)
An Exciting Future Is Awaiting Us!
• Recommended Readings in IR:
• http://www.cs.rmit.edu.au/swirl12
39
Any Questions?
![Page 40: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/40.jpg)
Selected Publications
• Structured Document Retrieval
  • A Probabilistic Retrieval Model for Semi-structured Data [ECIR09]
  • A Field Relevance Model for Structured Document Retrieval [ECIR12]
• Personal Search
  • Retrieval Experiments using Pseudo-Desktop Collections [CIKM09]
  • Ranking using Multiple Document Types in Desktop Search [SIGIR10]
  • Building a Semantic Representation for Personal Information [CIKM10]
  • Evaluating an Associative Browsing Model for Personal Info. [CIKM11]
  • Evaluating Search in Personal Social Media Collections [WSDM12]
• Web / Book Search
  • Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]
  • Understanding Book Search Behavior on the Web [In submission to SIGIR12]
40
More at @lifidea, or cs.umass.edu/~jykim
![Page 41: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/41.jpg)
My Self-tracking Efforts
• Life-optimization Project (2002-2006)
• LiFiDeA Project (2011-2012)
41
![Page 42: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/42.jpg)
OPTIONAL SLIDES
42
![Page 43: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/43.jpg)
The Great Divide: IR vs. HCI
IR
• Query / Document
• Relevant Results
• Ranking / Suggestions
• Feature Engineering
• Batch Evaluation (TREC)
• SIGIR / CIKM / WSDM
HCI
• User / System
• User Value / Satisfaction
• Interface / Visualization
• Human-centered Design
• User Study
• CHI / UIST / CSCW
43
Can we learn from each other?
![Page 44: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/44.jpg)
The Great Divide: IR vs. RecSys
IR
• Query / Document
• Reactive (given query)
• SIGIR / CIKM / WSDM
RecSys
• User / Item
• Proactive (push item)
• RecSys / KDD / UMAP
44
![Page 45: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/45.jpg)
The Great Divide: IR in CS vs. LIS
IR in CS
• Focus on ranking & relevance optimization
• Batch & quantitative evaluation
• SIGIR / CIKM / WSDM
• UMass / CMU / Glasgow
IR in LIS
• Focus on behavioral study & understanding
• User study & qualitative evaluation
• ASIS&T / JCDL
• UNC / Rutgers / UW
45
![Page 46: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/46.jpg)
Problems & Techniques in IR
• What
  • Data: Format (documents, records and linked data) / Size / Dynamics (static, dynamic, streaming)
  • User & Domain: End User (web and library) / Business User (legal, medical and patent) / System Component (e.g., IBM Watson)
  • Needs: Known-item vs. Exploratory Search / Recommendation
• How
  • System: Indexing and Retrieval (Platforms for Big Data Handling)
  • Analytics: Feature Extraction / Retrieval Model Tuning & Evaluation
  • Presentation: User Interface / Information Visualization
![Page 47: 랭킹 최적화를 넘어 인간적인 검색으로 - 서울대 융합기술원 발표](https://reader034.fdocuments.net/reader034/viewer/2022042700/559580181a28abd8318b4628/html5/thumbnails/47.jpg)
More about the Matching Problem
• Finding Representations
  • Term vector vs. Term distribution
  • Topical category, Reading level, …
• Estimating Representations
  • By counting terms
  • Using automatic classifiers
• Calculating Matching Scores
  • Cosine similarity vs. KL-divergence
  • Combining multiple reps.
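To make the representation choices concrete, the sketch below estimates a term vector by counting terms, normalizes it into a term distribution, and scores a query against documents with cosine similarity. Whitespace tokenization and the example texts are assumptions made for brevity.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent text as raw term counts (estimated by counting terms)."""
    return Counter(text.lower().split())


def term_distribution(text):
    """Represent text as a probability distribution over its terms."""
    counts = term_vector(text)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = term_vector("email search")
doc1 = term_vector("search over personal email")
doc2 = term_vector("book reading level")
sim1 = cosine_similarity(query, doc1)
sim2 = cosine_similarity(query, doc2)
dist = term_distribution("search over personal email")
```

Cosine similarity works directly on count vectors, whereas a KL-divergence score would be computed on normalized distributions such as `dist`, usually with some smoothing for unseen terms.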
47
[Diagram: User, Query, and Document, connected by edges labeled Issue, Surface, and Visit]