Case Study: JSTOR: A Year Later
JSTOR Case Study: A Year Later
Sharon Garewal, Metadata Librarian
Agenda
Introduction
Where we left off
Where we are now
Maintenance and Editing Process
◦ Training
◦ Documents
◦ Workflows
Where we are going
INTRODUCTION
The Numbers:
1,061 Participating publishers
1,936 Academic journals
8,952,463 Articles and counting
21,737 Books and counting
25,976 19th Century British Pamphlets
Publishing dates: 1545 CE to 2014 CE
138 million searches performed in 2013
JSTOR Subject Areas
Social Sciences
Humanities
History
Science & Mathematics
Business & Economics
Arts
Law
Area Studies
Medicine & Allied Health
Where we left off: Pilot project
Goal: To better understand the use and deployment of thesauri and to test selected thesauri on our disciplines
◦ Vendor: Access Innovations (AI). Data Harmony software: MAIstro for auto-indexing and creating a rulebase.
◦ Selection: 15k articles from 3 disciplines. 3 thesauri selected: NICEM for History, CABI for Science and AMB for Business. Auto-indexed and assigned terms. Added to rulebase to improve indexing.
◦ Lessons learned: Selecting the (right) thesauri is important. Rule building increases accuracy from 70% to 88%. Maintenance needs to be ongoing.
Building of the thesaurus
Access Innovations (AI)
◦ Selected, collected and imported sources from 17+ source vocabularies
◦ Merged the lists, which included sorting terms hierarchically and removing duplicates
◦ Built and tested rules
◦ Used search logs, discipline lists and access to our content
Construction standards
◦ ANSI (American National Standards Institute)
◦ NISO (National Information Standards Organization): ANSI/NISO Z39.19-2005
◦ ISO (International Organization for Standardization): ISO 2788, ISO 5964
◦ BS (British Standards Institution): BS 8723 parts 1-4
AI Business thesaurus
AI Calculus thesaurus
AI Economics thesaurus
AI Geology thesaurus
AI Law thesaurus
AI Psychology thesaurus
ASIS&T
CABI
ERIC
Ethnographic thesaurus
EuroVoc
Getty Arts and Architecture thesaurus
Glossary of Statistics
MeSH (abridged)
NASA Thesaurus
NAL
NICEM
Philosopher’s Index Thesaurus
Statistics Canada
National Transportation Library
Where we are now
Thesaurus was officially delivered: June 2013
Continued editorial relationship with AI
Added two more JSTOR Librarians to thesaurus team
Thesaurus Statistics
◦ Preferred Terms: 56,913
◦ Equivalent Terms: 41,608
◦ Top Terms or Branches: 18
◦ Terms with [at least one] Related Term: 18,965
Rulebase Statistics
◦ Rules: 100,737
JTHES
SharePoint Site
Maintenance and Editing Process: Training
Training PowerPoint
◦ Pre-training day activities: reading through standards, attending meetings…
◦ 3-4 day hands-on training in MAIstro
Weekly tasks
◦ Adding, deleting and moving terms; changing the capitalization of terms; searching the rulebase; interpreting complex rules; adding complex rules
Reviewing a branch
Orient yourself in the branch
Review terms in the branch
Take notes
Research the branch
Organize the branch
◦ Best practices
◦ Decision tree
Keep the following in mind when reviewing terms:
◦ Appropriacy: Is the term appropriate to the target audience?
◦ Belonging: Does the concept fit within the coverage of the thesaurus structure?
◦ Consistency: Is the term stylistically consistent with the other terms in the thesaurus structure?
◦ Currency: Does the term reflect the most current common usage for the concept?
◦ Distinctiveness: Does the term clearly represent a distinction that is important to the audience?
◦ Implication: Does the term imply additional concepts or terms?
◦ Novelty: Does the term refer to a concept that is not already in the thesaurus?
◦ Standardization: Is the term part of an authorized standard vocabulary for which there is a compliance requirement?
◦ Structure: Does a proposed new concept/term, along with others, warrant a new branch in the thesaurus?
◦ Technical Accuracy: Does the term accurately reflect the intended meaning to the intended audience?
◦ Warrant: Can you find explicit warrant (support) for your concept/term in: The JSTOR corpus, its usage, standard vocabularies for which there are compliance requirements? (i.e., user warrant & literary warrant)
From: Weise, C. Criteria for Term Selection in Your Taxonomy, Feb. 1, 2013.
Costume design is the fabrication of clothing for the overall appearance of a character or performer. Costume is specific in the style of dress particular to a nation, a class, or a period…
Review of Costume design branch using Word
Review of Sociology branch using Google Docs
Does the term already exist in the thesaurus?
If no, search JSTOR.
How many search results are there?
◦ Fewer than 100 hits: do not add.
◦ More than 100 hits: investigate.
What is the term about? What journals and topics are associated with the term?
Does the term appear in article titles? Citations? Abstracts? The body of the text? This is subjective.
Add term.
If yes, look at where it lives and see if you can add any NPTs or RTs.
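The decision tree for a candidate term can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` fields, the function name, and the returned strings are hypothetical, the subjective investigation step is reduced to a single editor judgment, and a count of exactly 100 (which the slide does not address) is treated here as "investigate".

```python
from dataclasses import dataclass

# From the slide: terms with fewer than 100 search hits are not added.
HIT_THRESHOLD = 100


@dataclass
class Candidate:
    """Hypothetical record describing a proposed term."""
    term: str
    in_thesaurus: bool     # does the term already exist in the thesaurus?
    search_hits: int       # number of JSTOR search results for the term
    editor_approves: bool  # outcome of the subjective investigation


def review_candidate(c: Candidate) -> str:
    """Walk the decision tree and return the recommended action."""
    if c.in_thesaurus:
        # Existing term: check where it lives and whether NPTs/RTs can be added.
        return "review existing record for NPTs and RTs"
    if c.search_hits < HIT_THRESHOLD:
        return "do not add"
    # Enough hits: the editor investigates journals, topics, and where the
    # term appears (titles, citations, abstracts, body text).
    return "add term" if c.editor_approves else "do not add"
```

Keeping the threshold in one named constant makes it easy to revisit if the team changes the cut-off.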
Simple Rules
Complex rules
Proximity
◦ NEAR: Within 3 words of text-to-match. Used for phrasings of a term, prepositional phrases etc.
◦ WITH: Within the same sentence of text-to-match. This is the most common/default.
◦ AROUND: Within 50 words of text-to-match, which is approximately one paragraph.
◦ MENTIONS: Within 250 words of text-to-match, which is approximately one page. Helps cut down on noise by establishing the broadest area possible. Not used as frequently.
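To make the four proximity windows concrete, here is a minimal Python sketch of how such checks could work. It is not MAIstro's implementation: the function names, the simple tokenizer, the sentence splitter, and the choice to measure distance as the difference in word positions are all assumptions, and it handles single-word operands only.

```python
import re

# Word windows taken from the slide; WITH (same sentence) is handled separately.
WINDOWS = {"NEAR": 3, "AROUND": 50, "MENTIONS": 250}


def tokenize(text):
    """Return lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_words(text, a, b, window):
    """True if some occurrence of a lies within `window` words of b."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def same_sentence(text, a, b):
    """True if a and b occur together in at least one sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(
        a.lower() in tokenize(s) and b.lower() in tokenize(s)
        for s in sentences
    )


def proximity(operator, text, a, b):
    """Evaluate NEAR, WITH, AROUND, or MENTIONS for two single words."""
    if operator == "WITH":
        return same_sentence(text, a, b)
    return within_words(text, a, b, WINDOWS[operator])
```

For example, `proximity("NEAR", text, "temperature", "sample")` asks whether the two words fall within three word positions of each other anywhere in `text`.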
Complex rules continued
ALL CAPS
◦ Text to match: sat IF (ALL CAPS) USE Standardized tests
INITIAL CAPS
◦ Text to match: bush IF (INITIAL CAPS) USE U.S. Presidents
MATCH
◦ Text to match: IF (MATCH “musicianship”) USE Musicianship
BEGINS SENTENCE or ENDS SENTENCE
◦ Text to match: chronicle IF (BEGINS SENTENCE) USE History
◦ Text to match: lol IF (ENDS SENTENCE) USE Humor
Remember to use Booleans! AND, OR, NOT
ELSE & ELSE IF
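The capitalization and sentence-position conditions can be illustrated with a short Python sketch. This is an approximation for discussion, not how MAIstro evaluates rules: the function names and the `suggest` helper are hypothetical, and sentence boundaries are detected naively from `.`, `!` and `?`.

```python
import re


def find_matches(text, text_to_match):
    """Find case-insensitive whole-word occurrences of text_to_match."""
    pattern = re.compile(r"\b" + re.escape(text_to_match) + r"\b", re.IGNORECASE)
    return list(pattern.finditer(text))


def all_caps(text, m):
    """ALL CAPS: the matched text is entirely upper case."""
    return m.group(0).isupper()


def initial_caps(text, m):
    """INITIAL CAPS: first letter upper case, the rest lower case."""
    word = m.group(0)
    return word[:1].isupper() and word[1:] == word[1:].lower()


def begins_sentence(text, m):
    """BEGINS SENTENCE: nothing, or sentence-ending punctuation, precedes the match."""
    before = text[:m.start()].rstrip()
    return before == "" or before[-1] in ".!?"


def ends_sentence(text, m):
    """ENDS SENTENCE: the match is followed by end of text or . ! ?"""
    after = text[m.end():]
    return after.strip() == "" or after[0] in ".!?"


def suggest(text, text_to_match, condition, term):
    """Return term if any occurrence of text_to_match satisfies condition."""
    for m in find_matches(text, text_to_match):
        if condition(text, m):
            return term
    return None
```

With these helpers, `suggest("Students took the SAT in May.", "sat", all_caps, "Standardized tests")` returns the term, while the same rule applied to "The cat sat on the mat." returns `None`. Note that a sentence-initial "Bush" would still satisfy INITIAL CAPS, which is the kind of noise that Boolean combinations and ELSE branches are meant to reduce.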
Testing articles
Choose an article from JSTOR to copy/paste into the Test MAI tab.
Evaluate the list of MAI suggested terms.
MAI Suggested Terms:
Temperature|(54) temperature(54)
Circadian rhythm|(9) temperature compensation(9)
Parametric models|(6) model*(6)
Biochemistry|(5) biochemical(4) biochemistry(1)
Temperature dependence|(5) dependen*(3) temperature dependence(2)
The term on the left side of the | is the MAI suggested term; the term on the right is the word that triggered it.
Hits – System accurately and correctly suggests indexing terms chosen by the editor. No additional rule building is necessary.
Misses – System misses terms the editor uses. Reviewing articles, followed by a gap analysis, is necessary to identify misses. Rule building and possibly additional term building are necessary.
Noise – System suggests terms not used by the editor, or suggests a term the editor uses but whose meaning is not accurately represented. Rule building is necessary.
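As a rough illustration of the evaluation step, the Python sketch below parses suggested-term lines in the `Term|(total) trigger(count)` layout shown in the MAI output and compares the suggested terms with an editor's choices. The function names are hypothetical and the parsing is inferred only from the sample lines, so other output variants may need adjustment. A plain set comparison also cannot detect the second kind of noise (a correct term triggered for the wrong meaning); that still needs an editor's judgment.

```python
import re

# One suggested term per line: "Term|(total) trigger(count) trigger(count) ..."
LINE = re.compile(r"^(?P<term>[^|]+)\|\((?P<total>\d+)\)\s*(?P<rest>.*)$")
TRIGGER = re.compile(r"\s*(.+?)\((\d+)\)")


def parse_line(line):
    """Return (term, total_count, {trigger: count}) for one output line."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    triggers = {t.strip(): int(n) for t, n in TRIGGER.findall(m["rest"])}
    return m["term"].strip(), int(m["total"]), triggers


def compare(suggested_terms, editor_terms):
    """Classify terms as hits, misses, or noise by simple set comparison."""
    suggested, chosen = set(suggested_terms), set(editor_terms)
    return {
        "hits": suggested & chosen,    # suggested and used by the editor
        "misses": chosen - suggested,  # used by the editor, not suggested
        "noise": suggested - chosen,   # suggested, not used by the editor
    }
```

Running `parse_line` over each line gives the suggested terms; passing those and the editor's list to `compare` yields the three groups that drive rule building.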
Maintenance and Editing Process: Documents
How-To Guides
◦ How to configure Unicode settings in web browsers
◦ How to correct capitalization in the term record
◦ How to export a sub-branch
◦ How to install MAIstro on your computer
◦ How to remove Related Terms which appear in the same branch
Term Building Instructions
Rule Building Instructions
Instructions for term and rule building include key terms, definitions and best practices.
Parking Lots
“Parking Lots” are a way to keep track of terms that we want to look into and rules we need to build.
Maintenance and Editing Process: Workflows
Data analysis
◦ General accuracy: Sampling of 1,000 articles across content types and disciplines.
◦ Subject specific: Sampling on specific disciplines and/or journals.
New content assessment
◦ Weekly review of newly signed content
Search log review
◦ Done semiannually; report of searched terms in JSTOR. Review ranking of terminology and how term usage changes over time. Finding new acronyms.
Where we want to go
Implementation onto the platform in 2014
◦ Currently working with teams in JSTOR such as UX and Analytics to run experiments and gather metrics.
Name file
Staffing and resources
◦ Continue to train additional Librarians and create additional workflows.
SMEs
◦ Set up a system to work with SMEs
Thank You
Contact information: [email protected]