Uniting the Silos at the NLI - Association of Jewish...
Uniting the Silos at the NLI: Accomplishments and Challenges
Elhanan Adler
An information silo is a management system incapable of reciprocal operation with other, related information systems. … "Information silo" is a pejorative expression that is useful for describing the absence of operational reciprocity. In Information Technology, the absence of operational reciprocity is between disparate systems also commonly referred to as disparate data systems. Derived variants are "silo thinking", "silo vision", and "silo mentality". Wikipedia 30/5/2013
Silos at the NLI
NLI Catalog
• Names:
– Latin a/b
– Hebrew a/b
– Arabic a/b
– Cyrillic a/b
• Subjects:
– Latin a/b (LCSH)
– Music Dept.
– Mss. subjects
RAMBI (Index to Articles in Jewish Studies)
• Names:
– Latin a/b
– Hebrew a/b
– Cyrillic a/b
• Subjects (unique):
– Latin a/b
– Hebrew a/b
Bibliography of the Hebrew Book (BHB)
• Names:
– Hebrew a/b
Stages
1. Uniting alphabetic sub-silos
2. Enabling cross-silo name searching
3. Enabling cross-silo subject searching
Step 1 – uniting alphabetic sub-silos
• Clustering name headings (RAMBI and NLI catalog): searching in any alphabet retrieves the same name in all the others
• Clustering subjects (RAMBI): searching subject in either Hebrew or English retrieves both
• Converting subjects (NLI catalog): convert remaining non-LCSH subjects to LCSH
• Work in progress
Clustering headings
• Multilingual authority records (multiple 1xx fields) [not standard MARC!]
• ALEPH functionality supports searching one to retrieve the others
• Currently functions in ALEPH OPAC, soon available in new library discovery tool (“Merhav” – Ex Libris Primo)
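A minimal sketch of the clustering idea, assuming each authority record simply holds one parallel heading per script. The record layout, the function names and the sample Latin forms are illustrative assumptions; this is not ALEPH's data model or API.

```python
# Hypothetical clusters: one authority record with several parallel headings
# (the "multiple 1xx fields" idea), keyed here by script for readability.
AUTHORITY_CLUSTERS = [
    {"latin": "Weizmann, Chaim, 1874-1952", "hebrew": "ויצמן, חיים, 1874-1952"},
    {"latin": "Ben-Gurion, David, 1886-1973", "hebrew": "בן גוריון, דוד, 1886-1973"},
]


def build_index(clusters):
    """Map every heading, in any script, to the cluster that contains it."""
    index = {}
    for cluster in clusters:
        for heading in cluster.values():
            index[heading.casefold()] = cluster
    return index


def search(index, heading):
    """Return all parallel forms of a heading, or an empty list if unknown."""
    cluster = index.get(heading.casefold())
    return sorted(cluster.values()) if cluster else []


index = build_index(AUTHORITY_CLUSTERS)
parallel_forms = search(index, "ויצמן, חיים, 1874-1952")
```

Searching with the Hebrew form returns both the Hebrew and the Latin heading, which is the behaviour the slide describes as "searching one to retrieve the others".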
Clustering subjects: RAMBI (June 2013 figures)
• Title/topical/geog. authorities (130/150/151):
– Total: 31,458
– Clustered: 21,714 (10,852 authority clusters) – 69%
– Unclustered: 9,744 – 31% (titles: 35%, topical: 5%, geographical: 60%)
• Title/topical/geog. subjects (630/650/651):
– Total: 608,812
– Clustered: 591,415 – 97%
– Unclustered: 17,396 – 3%
Clustering names: NLI catalog (June 2013 figures)
• Began in Aug. 2012 with 8,876 matches located by VIAF (Virtual International Authority File)
• Ongoing matching and clustering of authority records in current cataloging and using various algorithms
• Personal name authority headings (MARC tag 100):
– Total: 522,542
– Clustered: 42,728 (20,669 authority clusters) – 8%
– Unclustered: 479,813
• Personal name bibliographic headings (100, 700):
– Total: 646,201
– Clustered: 125,529 – 19% of all 100/700, 51% of authority-controlled 100/700
– Unclustered: 218,573
– No authority record: 302,099
Multiscript name authority record
Multiscript search
Retrieves all scripts
Translating LCSH headings to Hebrew (creating English-Hebrew clusters)
• The NLI uses LCSH-style English subject headings for all uniform title, topical and geographic subjects (MARC 630/650/651)
• In order to enable Hebrew subject searching, the NLI is creating parallel Hebrew terms for all LCSH subjects.
• The Hebrew terms are for searching only. The English LCSH terms appear alone in the bibliographic records.
Project timetable
• Announced early 2012
• Begun Dec. 2012
• As of June 2013
– Created authority records for 415,000 verified LCSH headings (21,000 still unverified/questionable)
– Of the verified unique LCSH headings in use, 392,000 (94%) have Hebrew translations
– Of the total LCSH headings in use (2.5 million) 98% have Hebrew translations
How did we do this?
• Created an English-Hebrew ‘dictionary’ of individual subfields (with subfield code), e.g.
• aTefillin, aIsrael, zIsrael, vBibliography, xHistory, y19th century, etc.
• The dictionary currently contains 65,000 entries (which translate 392,000 LCSH headings)
• Each evening all subject authority records are checked against the dictionary, new or changed translations are loaded to the authority file, and a list of subfields still needing translation is produced.
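The subfield dictionary and the nightly check can be sketched roughly as follows. This is an assumption-laden illustration, not the NLI's actual job: the function names, the in-memory data layout and the sample Hebrew translations are invented here, and only the key format (subfield code followed by the English text) comes from the slide.

```python
# Hypothetical dictionary entries, keyed as on the slide: subfield code + English text.
# The Hebrew values are illustrative translations, not taken from the NLI file.
DICTIONARY = {
    "aIsrael": "ישראל",
    "zIsrael": "ישראל",
    "xHistory": "היסטוריה",
    "vBibliography": "ביבליוגרפיה",
}


def translate_heading(subfields, dictionary):
    """Translate a heading given as [(code, text), ...].

    Returns (translated_subfields, missing_keys). translated_subfields is None
    when any subfield lacks a dictionary entry.
    """
    keys = [code + text for code, text in subfields]
    missing = [key for key in keys if key not in dictionary]
    if missing:
        return None, missing
    return [(code, dictionary[code + text]) for code, text in subfields], []


def nightly_run(headings, dictionary):
    """Translate what can be translated and list subfields still needing work."""
    translated, todo = {}, set()
    for heading in headings:
        result, missing = translate_heading(heading, dictionary)
        if result is not None:
            translated[tuple(heading)] = result
        todo.update(missing)
    return translated, sorted(todo)


headings = [
    [("a", "Israel"), ("x", "History")],
    [("a", "Tefillin"), ("v", "Bibliography")],
]
translated, todo = nightly_run(headings, DICTIONARY)
```

Because translation happens per subfield, a single entry such as `xHistory` serves every heading that contains it, which is consistent with a 65,000-entry dictionary covering 392,000 headings.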
The dictionary
• Originally ‘seeded’ with 16,000 translations from Bar-Ilan + additional translations from Open University of Israel
• Many place names (Israel and abroad) loaded from Excel files received
• Ongoing translation based on frequency, subject area, etc. (Excel sheet)
• Batch load additions from the Excel sheet
• Manual additions/corrections
Terms needing translation – can be sorted on various columns
Translation aids
• Google Translate
• Babylon [online dictionary]
• Academy of the Hebrew Language terms database
• Hebrew bibliographic records
• Wikipedia
• Personal knowledge
Copy/paste to Google Translate
Copy/paste from Google Translate
Babylon
Academy of the Hebrew Language
Bibliographic record as source of translation
Wikipedia
Hebrew Wikipedia
Extract headings with translations (column D) and load to dictionary
Hebrew translation in subject authority record
Work is not finished…
• Fixing inconsistencies in translations from different sources
• Fixing inconsistencies in spelling from different sources: male/haser (plene/defective) spelling, different Hebraization of place names (טורקיה/תורכיה), etc.
• Sharing the translated subjects with other Israeli libraries using LCSH
• Cooperative expansion with other libraries (SACO-Israel)
Step 2 – Enabling cross-silo name searching a. NNL-RAMBI
• Move authority control of names in RAMBI to the NLI authority file
• Will make many RAMBI names multi-script searchable
• But:
– Unique names in RAMBI may have a different form in NLI
– Unique names in RAMBI may not be the same person in NLI
– Unique names in RAMBI may be non-unique in NLI (distinguished by dates, etc.)
Step 2 – Enabling cross-silo name searching b. NNL-BHB (Bibl. of the Hebrew Book)
• Names primarily Hebrew
• BHB authority records (about 20,000) are rich in biographical information
• But very different forms of headings
• Stage 1 – link BHB and NLI authority records
• Stage 2 – transfer biographical data from BHB to NLI (RDA tags)
• Stage 3 – add BHB forms/references to NLI authorities to expedite single search of both databases
Sample BHB Authority Record
Different forms of headings
BHB:
• ווייצמאן, חיים בן עוזר
• ראביקוביץ, דליה בת לוי
• בן-גוריון, דוד יוסף בן אביגדור
• משה בן מימון (רמב"ם)
NLI:
• ויצמן, חיים, 1874-1952
• רביקוביץ, דליה, 1936-2005
• בן גוריון, דוד, 1886-1973
• משה בן מימון, 1138-1204
Identify same persons by
• Ignoring some vowel-letters, patronymics, qualifiers
• Checking birth dates (BHB in authority record, NLI in heading)
• Checking for ‘fuzzy’ matches with identical titles
• Manual checking (hopefully a small number of leftovers)
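The first two checks above can be sketched as a normalization key plus a birth-year comparison. Everything in this block is an assumption made for illustration: the choice of vav, yod and alef as the ignorable vowel-letters, the rule of dropping everything from בן/בת onward in the given names, and the function names. The NLI's actual matching algorithms are not described in the slides.

```python
import re

# Assumption: treat vav, yod and alef as the ignorable vowel-letters.
VOWEL_LETTERS = str.maketrans("", "", "ויא")
PATRONYMIC_MARKERS = ("בן", "בת")


def split_heading(heading):
    """Return (name without qualifiers and dates, birth year or None)."""
    heading = re.sub(r"\([^)]*\)", "", heading)  # drop qualifiers in parentheses
    dates = re.search(r"(\d{4})-(?:\d{4})?", heading)
    birth_year = int(dates.group(1)) if dates else None
    name = re.sub(r"\d{4}-(?:\d{4})?", "", heading).strip(" ,")
    return name, birth_year


def squash(text):
    """Remove spaces, hyphens and the assumed vowel-letters."""
    return re.sub(r"[\s\-]", "", text).translate(VOWEL_LETTERS)


def match_key(heading):
    """Build a comparison key: (surname, first given name), both normalized."""
    name, _ = split_heading(heading)
    if "," not in name:
        return squash(name), ""  # no surname: keep the whole name, patronymic included
    surname, given = name.split(",", 1)
    tokens = given.split()
    for i, token in enumerate(tokens):
        if token in PATRONYMIC_MARKERS:  # drop the patronymic and what follows
            tokens = tokens[:i]
            break
    return squash(surname), squash(tokens[0]) if tokens else ""


def same_person(bhb_heading, bhb_birth_year, nli_heading):
    """Keys must agree and the BHB record's birth year must equal the NLI heading's."""
    _, nli_birth_year = split_heading(nli_heading)
    return match_key(bhb_heading) == match_key(nli_heading) and bhb_birth_year == nli_birth_year


# (BHB heading, birth year from the BHB authority record, NLI heading)
SAMPLE_PAIRS = [
    ("ווייצמאן, חיים בן עוזר", 1874, "ויצמן, חיים, 1874-1952"),
    ("ראביקוביץ, דליה בת לוי", 1936, "רביקוביץ, דליה, 1936-2005"),
    ("בן-גוריון, דוד יוסף בן אביגדור", 1886, "בן גוריון, דוד, 1886-1973"),
    ('משה בן מימון (רמב"ם)', 1138, "משה בן מימון, 1138-1204"),
]
results = [same_person(*pair) for pair in SAMPLE_PAIRS]
```

A key this loose will produce false positives on common names, so in practice it would only nominate candidates; the title-based fuzzy check and manual review listed above would still be needed.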
Step 3 – Enabling cross-silo subject searching (Change RAMBI subjects to LCSH)
• RAMBI uses unique subjects (Hebrew or English, now largely clustered within RAMBI)
• RAMBI subjects assume Jewish context
– Education = Jewish education
– Music = Jewish music
– etc.
• Geographic facet is prominent
– France: Antisemitism
– France: Music
– etc.
• RAMBI data in a larger database (discovery tool!) loses the Jewish context
– RAMBI + NLI
– RAMBI via EBSCO Discovery Service (EDS) and ProQuest Central
RAMBI subjects to LCSH
• Identify cases where identical subjects have different context (Education, Music)
• Create crosswalks to transfer RAMBI subjects to LCSH and phase out RAMBI thesaurus in favor of the NLI thesaurus (LCSH + Hebrew translations)
• Since RAMBI is article-level, many specific subjects will have to be created (and translated) in the NLI subject thesaurus.
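One way to picture such a crosswalk is a lookup that makes the implied Jewish context explicit and turns RAMBI's leading geographic facet into a geographic subdivision. The sketch below is hypothetical: the target strings are LCSH-style illustrations that have not been checked against the LCSH file, and the function name and "Place: Topic" parsing rule are assumptions.

```python
# Hypothetical crosswalk: RAMBI topic -> LCSH-style heading with the Jewish
# context spelled out. Target strings are illustrative, not verified LCSH.
CONTEXT_CROSSWALK = {
    "Education": "Jewish education",
    "Music": "Jewish music",
    "Antisemitism": "Antisemitism",
}


def convert_subject(rambi_subject, crosswalk):
    """Convert 'Place: Topic' or 'Topic' to an LCSH-style string.

    Returns None when the topic has no crosswalk entry, so the caller can
    queue it for creation (and translation) in the target thesaurus.
    """
    place, separator, topic = rambi_subject.partition(": ")
    if not separator:
        place, topic = "", rambi_subject
    target = crosswalk.get(topic)
    if target is None:
        return None
    return f"{target}--{place}" if place else target


converted = [
    convert_subject(subject, CONTEXT_CROSSWALK)
    for subject in ["Education", "France: Music", "France: Printing"]
]
```

Subjects that come back as `None` correspond to the article-level topics that would first have to be created in the NLI subject thesaurus.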
Timetable
• 2013
– Complete conversion to LCSH of NLI music and manuscript subjects
– Merge RAMBI name headings with NLI headings and authority control
– Link BHB and NLI name headings
• 2014
– Convert RAMBI subjects to LCSH