Case Study: JSTOR: A Year Later

JSTOR Case Study A Year Later Sharon Garewal, Metadata Librarian


Presented at the 10th annual Data Harmony Users Group meeting on Tuesday, February 11, 2014 by Sharon Garewal of JSTOR. JSTOR is an archive of over 8 million articles, book chapters, and primary source content. Last year at DHUG, a case study on the then-incomplete JSTOR Thesaurus was presented. This presentation will give an update on how the completed thesaurus has been constructed and how branches are currently being reviewed and revised. Training materials and workflow processes, which have been documented for maintenance and editing of the thesaurus, will be shared. Discussions on finding and working with subject matter experts, and the triumphs and challenges encountered along the way, will be highlighted.

Transcript of Case Study: JSTOR: A Year Later

Page 1: Case Study: JSTOR: A Year Later

JSTOR Case Study
A Year Later
Sharon Garewal, Metadata Librarian

Page 2: Case Study: JSTOR: A Year Later

Agenda

Introduction
Where we left off
Where we are now
Maintenance and Editing Process
◦Training
◦Documents
◦Workflows
Where we are going

Page 3: Case Study: JSTOR: A Year Later

INTRODUCTION

The Numbers:
1,061 Participating publishers
1,936 Academic journals
8,952,463 Articles and counting
21,737 Books and counting
25,976 19th Century British Pamphlets
Publishing dates: 1545 CE to 2014 CE
138 Million searches performed in 2013

Page 4: Case Study: JSTOR: A Year Later

JSTOR Subject Areas

Social Sciences
Humanities
History
Science & Mathematics
Business & Economics
Arts
Law
Area Studies
Medicine & Allied Health

Page 5: Case Study: JSTOR: A Year Later

Where we left off: Pilot project

Goal: To better understand the use and deployment of thesauri & test selected thesauri on our disciplines

◦ Vendor: Access Innovations (AI)
  Data Harmony Software: MAIstro for auto-indexing and creating a rulebase.

◦ Selection: 15k articles from 3 disciplines.
  3 thesauri selected: NICEM for History, CABI for Science and AMB for Business
  Auto-indexed and assigned terms. Added to rulebase to improve indexing.

◦ Lessons learned:
  Selecting the (right) thesauri is important
  Rule building increases accuracy from 70% to 88%
  Maintenance needs to be on-going

Page 6: Case Study: JSTOR: A Year Later

Building of the thesaurus

Access Innovations (AI)

◦ Selected, collected and imported sources from 17+ source vocabularies
◦ Merged the lists, which included sorting terms hierarchically and removing duplicates.
◦ Built and tested rules
◦ Used search logs, discipline lists and access to our content

Construction standards

◦ ANSI (American National Standards Institute)
◦ NISO (National Information Standards Organization): ANSI/NISO Z39.19-2005
◦ ISO (International Organization for Standardization): ISO 2788, ISO 5964
◦ BS (British Standards Institution): BS 8723 parts 1-4

Source vocabularies

◦ AI Business thesaurus
◦ AI Calculus thesaurus
◦ AI Economics thesaurus
◦ AI Geology thesaurus
◦ AI Law thesaurus
◦ AI Psychology thesaurus
◦ ASIS&T
◦ CABI
◦ ERIC
◦ Ethnographic thesaurus
◦ EuroVoc
◦ Getty Arts and Architecture thesaurus
◦ Glossary of Statistics
◦ MeSH (abridged)
◦ NASA Thesaurus
◦ NAL
◦ NICEM
◦ Philosopher’s Index Thesaurus
◦ Statistics Canada
◦ National Transportation Library

Page 7: Case Study: JSTOR: A Year Later

Where we are now

Thesaurus was officially delivered: June 2013
Continued editorial relationship with AI
Added two more JSTOR Librarians to thesaurus team

Thesaurus Statistics

◦ Preferred Terms: 56,913
◦ Equivalent Terms: 41,608
◦ Top Terms or Branches: 18
◦ Terms with [at least one] Related Term: 18,965

Rulebase Statistics

◦ Rules: 100,737

Page 8: Case Study: JSTOR: A Year Later

JTHES

Page 9: Case Study: JSTOR: A Year Later

SharePoint Site

Page 10: Case Study: JSTOR: A Year Later

Maintenance and Editing Process: Training

Training PowerPoint

◦Pre-training day activities: reading through standards, attending meetings…
◦3-4 day hands-on training in MAIstro

Weekly tasks

◦Adding, deleting and moving terms; changing the capitalization of terms; searching the rulebase; interpreting complex rules; adding complex rules

Page 11: Case Study: JSTOR: A Year Later

Reviewing a branch

Orient yourself in the branch
Review terms in the branch
Take notes
Research the branch
Organize the branch

◦Best practices
◦Decision tree

Page 12: Case Study: JSTOR: A Year Later

Keep the following in mind when reviewing terms:

◦ Appropriacy: Is the term appropriate to the target audience?

◦ Belonging: Does the concept fit within the coverage of the thesaurus structure?

◦ Consistency: Is the term stylistically consistent with the other terms in the thesaurus structure?

◦ Currency: Does the term reflect the most current common usage for the concept?

◦ Distinctiveness: Does the term clearly represent a distinction that is important to the audience?

◦ Implication: Does the term imply additional concepts or terms?

◦ Novelty: Does the term refer to a concept that is not already in the thesaurus?

◦ Standardization: Is the term part of an authorized standard vocabulary for which there is a compliance requirement?

◦ Structure: Does a proposed new concept/term, along with others, warrant a new branch in the thesaurus?

◦ Technical Accuracy: Does the term accurately reflect the intended meaning to the intended audience?

◦ Warrant: Can you find explicit warrant (support) for your concept/term in: The JSTOR corpus, its usage, standard vocabularies for which there are compliance requirements? (i.e., user warrant & literary warrant)

From: Weise, C. Criteria for Term Selection in Your Taxonomy, Feb. 1, 2013.

Page 13: Case Study: JSTOR: A Year Later

Costume design is the fabrication of clothing for the overall appearance of a character or performer. Costume is specific in the style of dress particular to a nation, a class, or a period…

Page 14: Case Study: JSTOR: A Year Later

Review of Costume design branch using Word

Page 15: Case Study: JSTOR: A Year Later

Review of Sociology branch using Google Docs

Page 16: Case Study: JSTOR: A Year Later

Does the term already exist in the thesaurus?

◦ If no, search JSTOR. How many search results are there?
   Fewer than 100 hits: do not add.
   More than 100 hits: investigate.
     What is the term about? What journals and topics are associated with the term?
     Does the term appear in article titles? Citations? Abstracts? Body of the text? This is subjective.
     Add term.

◦ If yes, look at where it lives and see if you can add any NPTs or RTs.
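The first, mechanical part of this decision tree can be sketched in Python. This is only an illustration, not part of the JSTOR workflow tooling: the names `Decision`, `MIN_HITS` and `triage_candidate` are hypothetical, the hit count is assumed to come from a manual JSTOR search, and treating exactly 100 hits as "investigate" is an assumption because the slide only covers fewer and more than 100. The investigation step itself stays a human judgment call.

```python
from enum import Enum


class Decision(Enum):
    ENRICH_EXISTING = "term exists: look for NPTs or RTs to add"
    DO_NOT_ADD = "too few search hits: do not add"
    INVESTIGATE = "enough hits: investigate before adding"


# Threshold taken from the slide. Treating exactly 100 hits as
# "investigate" is an assumption; the slide does not cover that case.
MIN_HITS = 100


def triage_candidate(term, thesaurus_terms, hit_count):
    """Return the next step for a candidate term.

    thesaurus_terms: iterable of terms already in the thesaurus.
    hit_count: number of JSTOR search results, looked up by the editor.
    """
    existing = {t.casefold() for t in thesaurus_terms}
    if term.casefold() in existing:
        return Decision.ENRICH_EXISTING
    if hit_count < MIN_HITS:
        return Decision.DO_NOT_ADD
    return Decision.INVESTIGATE


if __name__ == "__main__":
    print(triage_candidate("Costume design", ["Costume design"], 5000).value)
    print(triage_candidate("Hypothetical term", [], 42).value)
```

A result of `INVESTIGATE` does not mean the term gets added; it only hands the term to the subjective review of journals, topics and where the term appears.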

Page 17: Case Study: JSTOR: A Year Later

Simple Rules

Page 18: Case Study: JSTOR: A Year Later

Complex rules

Proximity

◦NEAR: Within 3 words of text-to-match. Used for phrasings of a term, prepositional phrases etc.

◦WITH: Within the same sentence of text-to-match. This is the most common/default.

◦AROUND: Within 50 words of text-to-match, which is approximately one paragraph.

◦MENTIONS: Within 250 words of text-to-match, which is approximately one page. Helps cut down on noise by establishing the broadest area possible. Not used as frequently.
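To make the four proximity windows concrete, here is a rough Python sketch. It is not how MAIstro evaluates rules; it only mimics the window sizes listed above. The function names, the simple regex tokenizer, the naive sentence splitter and the restriction to single-word operands are all assumptions made for illustration, as is reading "within N words" as a position difference of at most N.

```python
import re

# Word-window sizes taken from the slide; WITH is handled per sentence.
WINDOWS = {"NEAR": 3, "AROUND": 50, "MENTIONS": 250}


def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_words(text, match, other, window):
    """True if `other` occurs within `window` word positions of `match`."""
    words = tokenize(text)
    match_pos = [i for i, w in enumerate(words) if w == match.lower()]
    other_pos = [i for i, w in enumerate(words) if w == other.lower()]
    return any(abs(i - j) <= window for i in match_pos for j in other_pos)


def same_sentence(text, match, other):
    """True if both words occur in one sentence (naive sentence split)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(
        match.lower() in tokenize(s) and other.lower() in tokenize(s)
        for s in sentences
    )


def proximity(op, text, match, other):
    """Evaluate one of NEAR, WITH, AROUND or MENTIONS."""
    if op == "WITH":
        return same_sentence(text, match, other)
    return within_words(text, match, other, WINDOWS[op])


if __name__ == "__main__":
    sample = "The bank of the river flooded. Interest rates rose later."
    for op in ("NEAR", "WITH", "AROUND", "MENTIONS"):
        print(op, proximity(op, sample, "bank", "rates"))
```

In the sample text, "bank" and "rates" sit in different sentences and six words apart, so the two narrow operators fail while the two wide ones succeed, which is the trade-off between precision and recall that the operators are meant to expose.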

Page 19: Case Study: JSTOR: A Year Later

Complex rules continued

ALL CAPS
◦ Text to match: sat IF (ALL CAPS) USE Standardized tests

INITIAL CAPS
◦ Text to match: bush IF (INITIAL CAPS) USE U.S. Presidents

MATCH
◦ Text to match: IF (MATCH “musicianship”) USE Musicianship

BEGINS SENTENCE or ENDS SENTENCE
◦ Text to match: chronicle IF (BEGINS SENTENCE) USE History
◦ Text to match: lol IF (ENDS SENTENCE) USE Humor

Remember to use Booleans! AND, OR, NOT

ELSE & ELSE IF
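The two capitalization conditions can be illustrated with a small Python sketch. This is not MAIstro rule syntax or its matching engine; `RULES`, `is_all_caps`, `is_initial_caps` and `suggest_terms` are hypothetical names, and the matching is a plain case-insensitive whole-word search followed by a check on how the matched text is capitalized. The two rules are the examples from the slide.

```python
import re


def is_all_caps(matched):
    """True for text like 'SAT'."""
    return matched.isupper()


def is_initial_caps(matched):
    """True for text like 'Bush': first letter upper, the rest lower."""
    return matched[:1].isupper() and matched[1:] == matched[1:].lower()


# (text to match, condition on the matched text, term to suggest)
RULES = [
    ("sat", is_all_caps, "Standardized tests"),
    ("bush", is_initial_caps, "U.S. Presidents"),
]


def suggest_terms(text):
    """Return the terms whose rule condition holds somewhere in the text."""
    suggestions = []
    for text_to_match, condition, term in RULES:
        pattern = re.compile(r"\b" + re.escape(text_to_match) + r"\b", re.IGNORECASE)
        fired = any(condition(m.group(0)) for m in pattern.finditer(text))
        if fired and term not in suggestions:
            suggestions.append(term)
    return suggestions


if __name__ == "__main__":
    print(suggest_terms("Students sat the SAT in spring."))
    print(suggest_terms("The cat sat on a bush."))
```

A check this simple would also fire on "Bush" at the start of a sentence about shrubs, which is one reason such conditions are usually combined with Booleans or proximity operators.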

Page 20: Case Study: JSTOR: A Year Later

Page 21: Case Study: JSTOR: A Year Later

Page 22: Case Study: JSTOR: A Year Later

Testing articles

Choose an article from JSTOR to copy/paste into Test MAI tab.
Evaluate the list of MAI suggested terms.

MAI Suggested Terms:

Temperature|(54) temperature(54)
Circadian rhythm|(9) temperature compensation(9)
Parametric models|(6) model*(6)
Biochemistry|(5) biochemical(4) biochemistry(1)
Temperature dependence|(5) dependen*(3) temperature dependence(2)

The term on the left side of the | is the MAI suggested term; the term on the right is the word that triggered it. 
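For anyone who wants to tabulate this output outside the tool, a suggested-term entry in the format shown above can be split into its parts with a short Python sketch. The layout (`Term|(total) trigger(count) ...`) is inferred only from the sample entries on this slide, and `parse_suggestion` is a hypothetical helper, not a Data Harmony function.

```python
import re

# One entry: suggested term, a pipe, the total count, then the triggers.
ENTRY_RE = re.compile(r"^(?P<term>[^|]+)\|\((?P<total>\d+)\)\s*(?P<triggers>.*)$")
# One trigger: the matched text followed by its count in parentheses.
TRIGGER_RE = re.compile(r"(.+?)\((\d+)\)\s*")


def parse_suggestion(entry):
    """Split one entry into (term, total, {trigger text: count})."""
    m = ENTRY_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    triggers = {
        text.strip(): int(count)
        for text, count in TRIGGER_RE.findall(m.group("triggers"))
    }
    return m.group("term").strip(), int(m.group("total")), triggers


if __name__ == "__main__":
    term, total, triggers = parse_suggestion(
        "Biochemistry|(5) biochemical(4) biochemistry(1)"
    )
    print(term, total, triggers)
```

In the sample entries the trigger counts add up to the total next to the suggested term, for example 4 + 1 = 5 for Biochemistry.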

Hits – System accurately and correctly suggests indexing terms chosen by the editor. No additional rulebuilding is necessary.

Misses – System misses terms the editor uses. Reviewing articles following a gap analysis is necessary to identify misses. Rulebuilding and possibly additional term building is necessary.

Noise – System suggests terms not used by the editor, or suggests a term that the editor uses but whose meaning is not accurately represented. Rulebuilding is necessary.
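The three categories above map onto simple set operations once the editor's terms and the system's suggestions for an article are both available as lists. The Python sketch below is an illustration with a hypothetical function name, `compare_indexing`. It can only detect the first kind of noise (terms the editor did not use); a suggested term that matches the editor's choice but misrepresents the meaning still needs a human reviewer.

```python
def compare_indexing(editor_terms, suggested_terms):
    """Sort terms for one article into hits, misses and noise."""
    editor = set(editor_terms)
    suggested = set(suggested_terms)
    return {
        "hits": sorted(editor & suggested),    # chosen by both
        "misses": sorted(editor - suggested),  # editor only
        "noise": sorted(suggested - editor),   # system only
    }


if __name__ == "__main__":
    result = compare_indexing(
        editor_terms=["Temperature", "Circadian rhythm", "Enzymes"],
        suggested_terms=["Temperature", "Circadian rhythm", "Parametric models"],
    )
    for category, terms in result.items():
        print(category, terms)
```

In this made-up example "Enzymes" is a miss and "Parametric models" is noise, so both would go onto the list of rules to build or revise.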

Page 23: Case Study: JSTOR: A Year Later

Page 24: Case Study: JSTOR: A Year Later

Page 25: Case Study: JSTOR: A Year Later

Maintenance and Editing Process: Documents

How-To Guides

◦How to configure Unicode setting in web browsers
◦How to correct capitalization in the term record
◦How to export a sub-branch
◦How to install MAIstro on your computer
◦How to remove Related Terms which appear in the same branch

Term Building Instructions
Rule Building Instructions

Page 26: Case Study: JSTOR: A Year Later

Instructions for Terms and Rule building include key terms, definitions and best practices.

Page 27: Case Study: JSTOR: A Year Later

Parking Lots

“Parking Lots” are a way to keep track of terms that we want to look into and rules we need to build.

Page 28: Case Study: JSTOR: A Year Later

Maintenance and Editing Process: Workflows

Data analysis
◦General accuracy: Sampling of 1000 articles across content types and disciplines.
◦Subject specific: Sampling on specific disciplines and/or journals.

New content assessment
◦Weekly review of newly signed content

Search log review
◦Done semiannually; report of searched terms in JSTOR. Review ranking of terminology and how term usage changes over time. Finding new acronyms.
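One possible way to draw the general-accuracy sample mentioned under Data analysis is a proportional stratified sample, sketched below in Python. The slides do not say how the 1000 articles are actually selected, so the method, the function name `stratified_sample` and the record fields `content_type` and `discipline` are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(articles, total, seed=0):
    """Sample roughly `total` articles, proportionally per stratum.

    articles: list of dicts with 'content_type' and 'discipline' keys.
    Each (content_type, discipline) pair gets at least one article, so
    the result can differ slightly from `total` because of rounding.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    strata = defaultdict(list)
    for article in articles:
        strata[(article["content_type"], article["discipline"])].append(article)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        quota = max(1, round(total * len(members) / len(articles)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


if __name__ == "__main__":
    corpus = (
        [{"content_type": "journal", "discipline": "History"} for _ in range(80)]
        + [{"content_type": "book", "discipline": "Law"} for _ in range(20)]
    )
    picked = stratified_sample(corpus, total=10)
    print(len(picked), "articles sampled")
```

Sampling per stratum rather than from the whole corpus at once keeps small disciplines and content types from disappearing out of the accuracy check.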

Page 29: Case Study: JSTOR: A Year Later
Page 30: Case Study: JSTOR: A Year Later

Where we want to go

Implementation onto the platform in 2014
◦Currently working with teams in JSTOR such as UX and Analytics to run experiments and gather metrics.

Name file

Staffing and resources
◦Continue to train additional Librarians and create additional workflows.

SMEs
◦Set up a system to work with SMEs

Page 31: Case Study: JSTOR: A Year Later

Thank You

Contact information: [email protected]