Uniting the Silos at the NLI - Association of Jewish...

Post on 03-Jul-2020


Uniting the Silos at the NLI: Accomplishments and Challenges

Elhanan Adler

An information silo is a management system incapable of reciprocal operation with other, related information systems. … "Information silo" is a pejorative expression that is useful for describing the absence of operational reciprocity. In Information Technology, the absence of operational reciprocity is between disparate systems also commonly referred to as disparate data systems. Derived variants are "silo thinking", "silo vision", and "silo mentality". Wikipedia 30/5/2013

Silos at the NLI

NLI Catalog

• Names:
– Latin a/b
– Hebrew a/b
– Arabic a/b
– Cyrillic a/b

• Subjects:
– Latin a/b (LCSH)
– Music Dept.
– Mss. subjects

RAMBI (Index to Articles in Jewish Studies)

• Names:
– Latin a/b
– Hebrew a/b
– Cyrillic a/b

• Subjects (unique):
– Latin a/b
– Hebrew a/b

Bibliography of the Hebrew Book (BHB)

• Names:
– Hebrew a/b

Stages

1. Uniting alphabetic sub-silos

2. Enabling cross-silo name searching

3. Enabling cross-silo subject searching

Step 1 – uniting alphabetic sub-silos

• Clustering name headings (RAMBI and NLI catalog): searching any alphabet retrieves the same name in all others

• Clustering subjects (RAMBI): searching subject in either Hebrew or English retrieves both

• Converting subjects (NLI catalog): convert remaining non-LCSH subjects to LCSH

• Work in progress

Clustering headings

• Multilingual authority records (multiple 1xx fields) [not standard MARC!]

• ALEPH functionality supports searching one to retrieve the others

• Currently functions in ALEPH OPAC, soon available in new library discovery tool (“Merhav” – Ex Libris Primo)
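The clustering idea can be illustrated with a small sketch: one authority record carries several parallel headings, and a lookup on any one of them returns all of them. This is only an illustration of the concept, not ALEPH's implementation; the `ClusterIndex` class, its method names, and the sample record ID are hypothetical, and the Latin-script form is an illustrative example.

```python
class ClusterIndex:
    """Toy index: each authority record holds parallel headings in several scripts."""

    def __init__(self):
        self._forms_by_record = {}   # record id -> list of parallel headings
        self._record_by_form = {}    # normalized heading -> record id

    def add_record(self, record_id, headings):
        # Store every parallel form and point each one back at the record.
        self._forms_by_record[record_id] = list(headings)
        for heading in headings:
            self._record_by_form[heading.casefold()] = record_id

    def search(self, heading):
        # Searching any one form retrieves all forms in the cluster.
        record_id = self._record_by_form.get(heading.casefold())
        if record_id is None:
            return []
        return self._forms_by_record[record_id]


index = ClusterIndex()
# Hypothetical record ID; the Latin-script form is illustrative.
index.add_record("auth-0001", ["Weizmann, Chaim, 1874-1952", "ויצמן, חיים, 1874-1952"])
print(index.search("ויצמן, חיים, 1874-1952"))
```

A real system would normalize headings far more carefully (punctuation, diacritics, dates) before lookup; `casefold()` stands in for that step here.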

Clustering subjects: RAMBI (June 2013 figures)

• Title/topical/geog. authorities (130/150/151):
– Total: 31,458
– Clustered: 21,714 (10,852 authority clusters) – 69%
– Unclustered: 9,744 – 31% (titles: 35%, topical: 5%, geographical: 60%)

• Title/topical/geog. subjects (630/650/651):
– Total: 608,812
– Clustered: 591,415 – 97%
– Unclustered: 17,396 – 3%

Clustering names: NLI catalog (June 2013 figures)

• Began in Aug. 2012 with 8,876 matches located by VIAF (Virtual International Authority File)

• Ongoing matching and clustering of authority records in current cataloging and using various algorithms

• Personal name authority headings (MARC tag 100)
– Total: 522,542
– Clustered: 42,728 (20,669 authority clusters) – 8%
– Unclustered: 479,813

• Personal name bibliographic headings (100, 700)
– Total: 646,201
– Clustered: 125,529 – 19% of all 100/700, 51% of authority controlled 100/700
– Unclustered: 218,573
– No authority record: 302,099

Multiscript name authority record

Multiscript search

Retrieves all scripts

Translating LCSH headings to Hebrew (creating English-Hebrew clusters)

• The NLI uses LCSH-style English subject headings for all uniform title, topical and geographic subjects (MARC 630/650/651)

• In order to enable Hebrew subject searching, the NLI is creating parallel Hebrew terms for all LCSH subjects.

• The Hebrew terms are for searching only. The English LCSH terms appear alone in the bibliographic records.

Project timetable

• Announced early 2012

• Begun Dec. 2012

• As of June 2013

– Created authority records for 415,000 verified LCSH headings (21,000 still unverified/questionable)

– Of the verified unique LCSH headings in use, 392,000 (94%) have Hebrew translations

– Of the total LCSH headings in use (2.5 million) 98% have Hebrew translations

How did we do this?

• Created an English-Hebrew ‘dictionary’ of individual subfields (with subfield code), e.g. aTefillin, aIsrael, zIsrael, vBibliography, xHistory, y19th century, etc.

• The dictionary currently contains 65,000 entries (which translate 392,000 LCSH headings)

• Each evening all subject authority records are checked against the dictionary, new or changed translations are loaded to the authority file, and a list of subfields still needing translation is produced.
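A minimal sketch of this evening pass, assuming the dictionary is keyed by (subfield code, English text) and that a heading is only loaded once every one of its subfields has a translation. The function names, heading IDs, and sample Hebrew entries are illustrative assumptions, not the NLI's actual code or data.

```python
from collections import Counter

# Illustrative entries only; keys are (subfield code, English text).
DICTIONARY = {
    ("a", "Israel"): "ישראל",
    ("z", "Israel"): "ישראל",
    ("x", "History"): "היסטוריה",
    ("v", "Bibliography"): "ביבליוגרפיה",
}


def translate_heading(subfields, dictionary):
    """Translate one heading subfield by subfield.

    Returns (translated subfields, missing keys); the translation is None
    if any subfield is not yet in the dictionary.
    """
    translated, missing = [], []
    for code, text in subfields:
        hebrew = dictionary.get((code, text))
        if hebrew is None:
            missing.append((code, text))
        else:
            translated.append((code, hebrew))
    return (translated if not missing else None), missing


def nightly_pass(headings, dictionary):
    """Check all headings; return loadable translations and a worklist.

    The worklist is ordered by how many headings each untranslated
    subfield blocks, so the most frequent terms can be translated first.
    """
    translations = {}
    needed = Counter()
    for heading_id, subfields in headings.items():
        result, missing = translate_heading(subfields, dictionary)
        if result is not None:
            translations[heading_id] = result
        needed.update(missing)
    return translations, needed.most_common()


headings = {
    "h1": [("a", "Israel"), ("x", "History")],
    "h2": [("a", "Tefillin"), ("x", "History")],
}
loaded, worklist = nightly_pass(headings, DICTIONARY)
print(loaded)    # translations ready to load to the authority file
print(worklist)  # subfields still needing translation, most frequent first
```

Translating at the subfield level is what lets a comparatively small dictionary cover a much larger set of headings: each new entry can complete every heading that reuses that subfield.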

The dictionary

• Originally ‘seeded’ with 16,000 translations from Bar-Ilan + additional translations from Open University of Israel

• Many place names (Israel and abroad) loaded from Excel files received

• Ongoing translation based on frequency, subject area, etc. (Excel sheet)

• Batch load additions from the Excel sheet

• Manual additions/corrections

Terms needing translation – can be sorted on various columns

Translation aids

• Google Translate

• Babylon [online dictionary]

• Academy of the Hebrew Language terms database

• Hebrew bibliographic records

• Wikipedia

• Personal knowledge

Copy/paste to Google Translate

Copy/paste from Google Translate

Babylon

Academy of the Hebrew Language

Bibliographic record as source of translation

Wikipedia

Hebrew Wikipedia

Extract headings with translations (column D) and load to dictionary

Hebrew translation in subject authority record

Work is not finished…

• Fixing inconsistencies in translations from different sources

• Fixing inconsistencies in spelling from different sources: male/haser (plene/defective) spelling, different Hebraization of place names (טורקיה/תורכיה), etc.

• Sharing the translated subjects with other Israeli libraries using LCSH

• Cooperative expansion with other libraries (SACO-Israel)

Step 2 – Enabling cross-silo name searching a. NNL-RAMBI

• Move authority control of names in RAMBI to the NLI authority file

• Will make many RAMBI names multi-script searchable

• But:
– Unique names in RAMBI may have a different form in NLI
– Unique names in RAMBI may not be the same person in NLI
– Unique names in RAMBI may be non-unique in NLI (distinguished by dates, etc.)

Step 2 – Enabling cross-silo name searching b. NNL-BHB (Bibl. of the Hebrew Book)

• Names primarily Hebrew
• BHB authority records (about 20,000) are rich in biographical information
• But very different forms of headings
• Stage 1 – link BHB and NLI authority records
• Stage 2 – transfer biographical data from BHB to NLI (RDA tags)
• Stage 3 – add BHB forms/references to NLI authorities to expedite single search of both databases

Sample BHB Authority Record

Different forms of headings

NLI (dates in heading):

• ויצמן, חיים, 1874-1952
• רביקוביץ, דליה, 1936-2005
• בן גוריון, דוד, 1886-1973
• משה בן מימון, 1138-1204

BHB (same persons, in the same order):

• ווייצמאן, חיים בן עוזר
• ראביקוביץ, דליה בת לוי
• בן-גוריון, דוד יוסף בן אביגדור
• משה בן מימון (רמב"ם)

Identify same persons by

• Ignoring some vowel-letters, patronymics, qualifiers

• Checking birth dates (BHB in authority record, NLI in heading)

• Checking for ‘fuzzy’ matches with identical titles

• Manual checking (hopefully a small number of leftovers)
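The first two checks can be sketched as a normalization step followed by a date comparison. This is a rough illustration under stated assumptions, not the NLI's algorithm: the set of letters treated as vowel-letters, the patronymic pattern (בן/בת after the forename), and the function names are all choices made for this example. Stripping vowel-letters is deliberately aggressive, so matches are only candidates for the later title and manual checks.

```python
import re

# Assumption: these letters are treated as optional vowel-letters.
VOWEL_LETTERS = "אוי"


def skeleton(word):
    """Drop vowel-letters so fuller and sparser spellings compare equal."""
    return "".join(ch for ch in word if ch not in VOWEL_LETTERS)


def match_key(heading):
    """Reduce a heading to (surname skeleton, first forename skeleton)."""
    heading = re.sub(r"\([^)]*\)", "", heading)           # drop qualifiers
    parts = [p.strip() for p in heading.split(",")]
    parts = [p for p in parts if p and not re.fullmatch(r"\d{4}-\d{4}", p)]
    surname = parts[0].replace("-", " ")
    forenames = parts[1] if len(parts) > 1 else ""
    # Cut the patronymic: everything from "בן"/"בת" after the forename.
    forename = re.split(r"\s+(?:בן|בת)\s+", forenames)[0]
    first = forename.split()[0] if forename else ""
    return skeleton(surname), skeleton(first)


def birth_year(nli_heading):
    """NLI carries dates in the heading itself; extract the birth year."""
    found = re.search(r"(\d{4})-", nli_heading)
    return int(found.group(1)) if found else None


def is_candidate(bhb_heading, bhb_birth, nli_heading):
    """True if the keys agree and the birth years do not contradict."""
    if match_key(bhb_heading) != match_key(nli_heading):
        return False
    nli_birth = birth_year(nli_heading)
    return bhb_birth is None or nli_birth is None or bhb_birth == nli_birth


print(is_candidate("ווייצמאן, חיים בן עוזר", 1874, "ויצמן, חיים, 1874-1952"))
```

In this sketch the BHB birth year is passed separately because BHB holds it in the authority record rather than in the heading.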

Step 3 – Enabling cross-silo subject searching (Change RAMBI subjects to LCSH)

• RAMBI uses unique subjects (Hebrew or English, now largely clustered within RAMBI)

• RAMBI subjects assume Jewish context
– Education = Jewish education
– Music = Jewish music
– etc.

• Geographic facet is prominent
– France: Antisemitism
– France: Music
– etc.

• RAMBI data in a larger database (discovery tool!) loses the Jewish context
– RAMBI + NLI
– RAMBI via EBSCO Discovery Service (EDS) and ProQuest Central

RAMBI subjects to LCSH

• Identify cases where identical subjects have different context (Education, Music)

• Create crosswalks to transfer RAMBI subjects to LCSH and phase out RAMBI thesaurus in favor of the NLI thesaurus (LCSH + Hebrew translations)

• Since RAMBI is article-level, many specific subjects will have to be created (and translated) in the NLI subject thesaurus.
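A crosswalk of this kind might look like the sketch below, which restores the implied Jewish context and moves the leading geographic facet into an LCSH-style subdivision. The mapping table, the `crosswalk` function, and the output strings are illustrative assumptions built from the examples on the slides; the results are not verified LCSH headings.

```python
# Context-dependent topics, taken from the slide examples.
# Target strings are illustrative, not verified LCSH headings.
CONTEXT_MAP = {
    "Education": "Jewish education",
    "Music": "Jewish music",
}


def crosswalk(rambi_subject):
    """Convert 'Place: Topic' (or a bare 'Topic') to 'Topic--Place'."""
    place, separator, topic = rambi_subject.partition(":")
    if not separator:
        # No geographic facet: the whole string is the topic.
        topic, place = rambi_subject, ""
    topic = topic.strip()
    place = place.strip()
    # Make the implied Jewish context explicit where needed.
    topic = CONTEXT_MAP.get(topic, topic)
    return f"{topic}--{place}" if place else topic


for subject in ["Education", "France: Music", "France: Antisemitism"]:
    print(subject, "->", crosswalk(subject))
```

A production crosswalk would need a reviewed mapping for every RAMBI term rather than a pattern rule, since many article-level subjects have no existing LCSH equivalent.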

Timetable

• 2013

– Complete conversion to LCSH of NLI music and manuscript subjects

– Merge RAMBI name headings with NLI headings and authority control

– Link BHB and NLI name headings

• 2014

– Convert RAMBI subjects to LCSH