![Page 1: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/1.jpg)
Translating Dialects in Search:
Mapping between Specialized Languages of Discourse
and Documentary Languages
Vivien Petras
UC Berkeley School of Information
![Page 2: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/2.jpg)
Overcoming the Language Problem in Search
How can someone searching for violins be made aware that there are also fiddles (and vice versa)?
![Page 3: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/3.jpg)
• The Language Problem in Information Retrieval
• Dialects & Contexts
• The Search Term Recommender
• 4 Research Questions
• Exploratory Web Interface
Outline
![Page 4: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/4.jpg)
“how to obtain the right information for the right user
at the right time” (Chu, 2003)
Decision Process under Uncertainty
Information Retrieval
![Page 5: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/5.jpg)
• Searching the Needle in the Haystack
• Which Needle in which Haystack
• How to express the Needle and the Haystack
Language Problem in Information Retrieval
Decision Process under Uncertainty
![Page 6: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/6.jpg)
Language Mapping
[Diagram, searcher side: Searcher, Concept Space, Question, Search Statement]
[Diagram, author side: Author, Concept Space, Text, Document]
Search Statement and Document: Match!
• Mapping between searcher and IR system
• Mapping between author and IR system
• Mapping between search statement and document
![Page 7: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/7.jpg)
IR = Language Mapping Exercise
Searcher
Concept Space
Question
Search Statement
Document
Match!
Information Retrieval
A search statement needs to describe the:
• searcher’s question (information need)
• documents that are relevant to a searcher’s question
![Page 8: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/8.jpg)
In Linguistics:
unlimited semiosis
In Information Science:
Inter-indexer inconsistency (20-60%)
The Language Problem
![Page 9: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/9.jpg)
How to alleviate language ambiguity?
Ludwig Wittgenstein:
• Language games
• Language regions
Language is disambiguated within contexts and specialized dialects.
Dialects and Contexts
![Page 10: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/10.jpg)
How to alleviate language ambiguity for search term selection?
Support search term selection:
• Within the dialect of a specialized community
• In context
• Using the language of documents (for term matching)
Dialects and Contexts
![Page 11: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/11.jpg)
Search Term Recommender
[Diagram: Search Statement; Search Term Recommender spanning multiple Specialties; "Did you mean…" with a list of Specialty Terms; Information Collection]
![Page 12: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/12.jpg)
Search Term Recommender
![Page 13: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/13.jpg)
• Divide information collection by specialty
• Association between
– specialty terms
– documentary terms (subject metadata)
• Recommend highly associated terms
The Search Term Recommender Methodology
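The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name the association measure, so the sketch uses the share of a specialty's documents containing a term that also carry a given subject heading. The function names `build_recommender` and `recommend` and the toy records are hypothetical, not taken from the presentation.

```python
from collections import Counter, defaultdict


def build_recommender(documents):
    """Count term/subject-heading co-occurrences for ONE specialty.

    `documents` is an iterable of (text_terms, subject_terms) pairs,
    i.e. the records that were assigned to this specialty beforehand.
    """
    pair_counts = defaultdict(Counter)  # text term -> subject heading -> count
    term_counts = Counter()             # text term -> number of documents
    for text_terms, subject_terms in documents:
        for term in set(text_terms):
            term_counts[term] += 1
            for subject in set(subject_terms):
                pair_counts[term][subject] += 1
    return pair_counts, term_counts


def recommend(model, query_terms, k=3):
    """Return the k subject headings most associated with the query terms.

    Assumed association score: co-occurrence count divided by the
    document frequency of the query term, summed over query terms.
    """
    pair_counts, term_counts = model
    scores = Counter()
    for term in query_terms:
        if term_counts[term] == 0:
            continue  # term never seen in this specialty
        for subject, n in pair_counts[term].items():
            scores[subject] += n / term_counts[term]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [subject for subject, _ in ranked[:k]]


# Toy specialty collection (invented records for illustration only).
music_docs = [
    (["violin", "concerto"], ["Violins"]),
    (["fiddle", "folk"], ["Violins", "Folk music"]),
    (["fiddle", "bluegrass"], ["Violins"]),
]
model = build_recommender(music_docs)
print(recommend(model, ["fiddle"]))
```

Building one such model per specialty, plus one over the whole collection, would give the specialty and general recommenders that the later slides compare.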
![Page 14: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/14.jpg)
• Term selection support (query expansion & reformulation)
• Automatic classification
• Terminology mapping
The Search Term Recommender: Applications
![Page 15: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/15.jpg)
1. How can specialties & specialty dialects be identified in an information collection?
2. Do specialty dialects really differ?
3. Is performance improved when focusing on specialty dialects?
4. How specific should specialties be?
Tested on 2 bibliographic collections:
• Inspec
• Medline (Ohsumed collection)
The Search Term Recommender - Questions
![Page 16: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/16.jpg)
• Physics, Electrical and Electronic Engineering, Computers and Control
• Document: author, title, source, publication year, abstract, Inspec thesaurus descriptors, Inspec classification codes
• Test collection:
Inspec
Number of documents: 427,340
Descriptors per document: 6.99
![Page 17: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/17.jpg)
• Biomedicine and Health
• Document: author, title, source, publication year, publication type, abstract, MeSH Headings
• Test collection:
Medline Ohsumed Collection
Number of documents: 168,463
MeSH Headings per document: 3.11
![Page 18: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/18.jpg)
1. How can specialties be identified in an information collection?
2. Do specialty dialects really differ?
3. Is performance improved when focusing on specialty dialects?
4. How specific should specialties be?
Tested on 2 bibliographic collections:
• Inspec
• Medline (Ohsumed collection)
The Search Term Recommender System - Questions
![Page 19: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/19.jpg)
• Domain terminology
• Publication source
• Bibliometric analysis
• Social network analysis
• Subject-specific classification
Determine specialty documents in the collection:
![Page 20: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/20.jpg)
Inspec test collection
• by top-level categories in the Inspec classification
• 3 specialties: Physics, Electrical & Electronic Engineering, Computers & Control
Ohsumed test collection
• by journals grouped by subject
• 33 specialties
Identification of Specialties in an Information Collection
![Page 21: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/21.jpg)
1. How can specialties be identified in an information collection?
2. Do specialty dialects really differ?
3. Is performance improved when focusing on specialty dialects?
4. How specific should specialties be?
Tested on 2 bibliographic collections:
• Inspec
• Medline (Ohsumed collection)
The Search Term Recommender System - Questions
![Page 22: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/22.jpg)
Differences in specialty dialects (specialty term overlap)
Differences in documentary languages (subject metadata term overlap)
Differences in search term recommender suggestions (term suggestion overlap)
Differences in Language
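Each of the three comparisons above reduces to measuring how much two term sets share. A minimal sketch follows, assuming a Jaccard-style overlap (shared terms divided by all distinct terms). The slides do not say which overlap measure was used, and the function name `overlap_share` and the sample vocabularies are hypothetical.

```python
def overlap_share(vocab_a, vocab_b):
    """Share of distinct terms that occur in both vocabularies (Jaccard)."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented vocabularies for two specialties.
physics_terms = {"field", "particle", "wave", "circuit"}
engineering_terms = {"circuit", "wave", "amplifier"}

print(overlap_share(physics_terms, engineering_terms))
```

The same function could be applied to free-text specialty terms, to subject metadata terms, or to the term lists returned by two recommenders for the same query.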
![Page 23: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/23.jpg)
Inspec Dialects (specialty term overlap)
[Venn diagram: Physics, Electrical Engineering, Computers; region percentages as extracted, region assignment not recoverable: 20%, 7%, 13%, 13%, 4%, 33%, 13%]
Terms analyzed: 60,601
Subject metadata term overlap: 87%
Suggested term overlap: 30%
![Page 24: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/24.jpg)
Ohsumed Dialects (specialty term overlap)
[Venn diagram: Communicable Diseases, Gynecology, Orthopedics; region percentages as extracted, region assignment not recoverable: 13%, 29%, 8%, 19%, 2%, 21%, 7%]
Terms analyzed: 11,663
Subject metadata term overlap: 32%
Suggested term overlap: 30%
![Page 25: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/25.jpg)
1. How can specialties be identified in an information collection?
2. Do specialty dialects really differ?
3. Is performance improved when focusing on specialty dialects?
4. How specific should specialties be?
Tested on 2 bibliographic collections:
• Inspec
• Medline (Ohsumed collection)
The Search Term Recommender System - Questions
![Page 26: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/26.jpg)
Comparison: specialty vs. general term suggestions
Automatic classification
![Page 27: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/27.jpg)
Automatic Classification
Title: “A search for clusters of protostars in Orion cloud cores”
Originally assigned terms:
1. Infrared sources (astronomical)
2. Interstellar molecular clouds
3. Pre-main-sequence stars
4. Star associations
Specialty Search Term Recommender:
1. Clouds
2. Clusters of galaxies
3. Interstellar molecular clouds
4. Star clusters
5. Pre-main-sequence stars
General Search Term Recommender:
1. Search problems
2. Clouds
3. Atomic clusters
4. Clusters of galaxies
5. Interstellar molecular clouds
Evaluation (Specialty STR vs. General STR):
Recall (hit rate): 2/4 = 0.5 vs. 1/4 = 0.25
Precision (accuracy): 2/5 = 0.4 vs. 1/5 = 0.2
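The recall and precision figures for this example can be reproduced with a short Python sketch. The term lists are copied from the slide; the function name `hit_rate_and_accuracy` is hypothetical, and exact string matching between suggested and assigned terms is an assumption.

```python
def hit_rate_and_accuracy(assigned, suggested):
    """Return (recall, precision) of suggested terms against assigned terms."""
    hits = len(set(assigned) & set(suggested))
    recall = hits / len(assigned) if assigned else 0.0
    precision = hits / len(suggested) if suggested else 0.0
    return recall, precision


assigned = [
    "Infrared sources (astronomical)",
    "Interstellar molecular clouds",
    "Pre-main-sequence stars",
    "Star associations",
]
specialty_suggestions = [
    "Clouds",
    "Clusters of galaxies",
    "Interstellar molecular clouds",
    "Star clusters",
    "Pre-main-sequence stars",
]
general_suggestions = [
    "Search problems",
    "Clouds",
    "Atomic clusters",
    "Clusters of galaxies",
    "Interstellar molecular clouds",
]

print(hit_rate_and_accuracy(assigned, specialty_suggestions))  # (0.5, 0.4)
print(hit_rate_and_accuracy(assigned, general_suggestions))    # (0.25, 0.2)
```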
![Page 28: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/28.jpg)
Performance of the STR: Inspec
[Precision-recall plot "Inspec specialties and general STRs": Recall (x-axis) and Precision (y-axis), both 0.0 to 0.5; series: Individual Specialty STRs, General STR]
Test Documents: 42,735
Specialties: 3
First 3 suggested:
Recall: 13.6%
Precision: 11.2%
![Page 29: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/29.jpg)
Performance of the STR: Ohsumed
[Precision-recall plot "Ohsumed specialties and general STR": Recall (x-axis) and Precision (y-axis), both 0 to 0.7; series: Individual Specialty STRs, General STR]
Test Documents: 18,733
Specialties: 33
First 3 suggested:
Recall: 26%
Precision: 25.6%
![Page 30: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/30.jpg)
1. How can specialties be identified in an information collection?
2. Do specialty dialects really differ?
3. Is performance improved when focusing on specialty dialects?
4. How specific should specialties be?
Tested on 2 bibliographic collections:
• Inspec
• Medline (Ohsumed collection)
The Search Term Recommender System - Questions
![Page 31: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/31.jpg)
• Language differences
• Collection sizes for training
Specificity of Specialties
![Page 32: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/32.jpg)
Identifying subspecialties by classification hierarchy
– e.g. Computers & Control -- Computer Hardware -- Circuits & Devices
Specificity of Specialties - Inspec
[Precision-recall plot "Four levels of specificity in the Inspec collection": Recall (x-axis, 0.0 to 0.6) and Precision (y-axis, 0.0 to 0.7); series: Sub-sub specialty STR, Sub-specialty STR, Specialty STR, General STR]
Test documents: 2425
Specialties: 3
![Page 33: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/33.jpg)
Identifying subspecialties by journal within subject
– e.g. Orthopedics -- Clinical Orthopaedics & Related Research journal
Specificity of Specialties - Ohsumed
[Precision-recall plot "Three levels of specificity in the Ohsumed Collection": Recall (x-axis) and Precision (y-axis), both 0 to 0.7; series: Journal STR, Specialty STR, General STR]
Test documents: 745
Specialties: 3
![Page 34: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/34.jpg)
Inspec
http://metadata.sims.berkeley.edu/str/inspec/inspec.html
Ohsumed
http://metadata.sims.berkeley.edu/str/ohsumed/ohsumed.html
Exploratory Web Interfaces
![Page 35: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/35.jpg)
1. How can specialties be identified in an information collection?
– Inspec: subject-specific classification
– Ohsumed: journal specialty area
2. Do specialty dialects really differ?
– Inspec specialties: term overlap 50%, suggestions overlap 30%
– Ohsumed specialties: term overlap 30%, suggestions overlap 30%
3. Is performance improved when focusing on specialty dialects?
– Inspec specialties: 10% improvement over general STR
– Ohsumed specialties: 25% improvement over general STR
4. How specific should specialties be?
– Depends on language differences & collection size
Summary
![Page 36: Translating Dialects in Search: Mapping between Specialized Languages of Discourse and Documentary Languages Vivien Petras UC Berkeley School of Information.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649f4e5503460f94c700ad/html5/thumbnails/36.jpg)
Overcoming the Language Problem in Search
Search Term Recommender:
See also:
FIDDLES
50% Discount!
Thank you!