The University of Cambridge Universal Catalogue: a work in progress
Patricia Killiard
Head of IT Services
Cambridge University Library
Libraries in the University of Cambridge UC
University Library
Dependent libraries
• Medical Library
• Scientific Periodicals
• Squire Law Library
• Betty & Gordon Moore Library
• College libraries
• Departmental & Faculty libraries
• Affiliated Institutions
• Other libraries associated with the University
The Union Catalogue: Beginnings and growth
• Began in 1982 with the Union List of Serials – non-MARC records based on a printed list
• 1985 Five libraries began contributing short records for books to a Union Catalogue
• 1987 UC first made available to the public with 53,000 records
• 2002 90+ contributing libraries
• New contributors are still joining
• Software was written in-house and continued to be used until 2002
Standards ...
• Early records were subject to no bibliographic standards to encourage contributions
• Brief records due to cost of disk space in 1980s
• No Authority control, even today
• Independence of colleges, faculties and departments means no overall control of standards ... consequences for the UC
• Serials records were non-MARC until 2002
Pre-2002 Union Catalogue Model
• Consortial model with duplicate bibliographic records
• No authority control
• Completely separate from the authority-controlled file for the University Library
• Separate Union List of Serials which was de-duplicated
• Can still be seen at
http://linux01.lib.cam.ac.uk/Catalogues/OPAC/xunion.shtml
Pre-2002 Union Catalogue
Search Results in pre-2002 Union Catalogue
Cambridge Union List of Serials
Advantages and disadvantages of the old UC model
Advantages
• Ability to request preferred 3 libraries first
• Some patron functionality, e.g. patrons able to view books on loan
• Each library’s holdings could be distinguished immediately
Disadvantages
• Lack of de-duplication in the main Union Catalogue
• Large numbers of search results
• Exclusion of the University Library holdings from the UC
• Separation of serials catalogue from monographs
Voyager vision for Cambridge
• Single de-duplicated Universal Catalogue incorporating all public databases, bringing University Library and other databases together
• Based on authority-controlled records
• All patron functionality possible through the UC
• Libraries able to retain local rights over records and patron functionality
• Local subject headings retained
From Consortial Catalogue to Universal Catalogue
• Department/Faculty and College databases in Voyager have multiple owning libraries - no record sharing
• Could move to a Union Catalogue module by allowing record sharing within databases but ...
– Requires political will
– Is very slow since records would merge on an individual basis
– Interim stage of merging confusing for patrons
Cambridge System Hardware
Universal Catalogue
Feeder databases
Web Server
Hardware specifications
Sun Fire 4800
4 x T3 arrays configured in 2 partner groups
2 x 4 x 750MHz CPUs
16GB memory (8GB for each domain)
Disk space is: 2 x 18GB (used for Solaris) and 2 x 9 x 36GB (in one T3 partner pair) for each domain
Domain A (Hookea) holds all production databases
Domain C (Hookec) holds UC
Web server = Sun 280R
2 x 750MHz UltraSPARC III processors
4GB memory
72GB disk
Test server = Sun 220R
Cambridge Voyager Databases
cambrdgedb University Library and dependent libraries
manuscrpdb Manuscript database
depfacaedb Departments & Faculties A-E
depfacfmdb Departments & Faculties F-M
depfacozdb Departments & Faculties O-Z
collandb Colleges A-N
collpwdb Colleges P-W
otherdb Affiliated Institutions
resourcedb Resource file* (non-UC)
ucdb Universal Catalog
De-duplication
• Indexes used:
– 010, 020, 022, 0350, 0359
• Large proportion of records do not have ISBNs or LCCNs
• De-duplication is very loose
• Resulted in very low levels of de-duplication (3-15%)
• De-duplication may actually reduce as the file accumulates due to addition of older records without control numbers
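The effect described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout (a dict mapping a MARC tag to a list of values) and a simplified exact-match rule on the 010, 020, 022 and 035 fields; it is not Voyager's duplicate detection, only an illustration of why short records without control numbers never de-duplicate.

```python
# Hypothetical record layout: a dict mapping a MARC tag to a list of values.
# The tags below are the control-number fields named on the slide.
MATCH_TAGS = ("010", "020", "022", "035")


def match_points(record):
    """Collect the (tag, value) pairs a record offers for matching."""
    points = set()
    for tag in MATCH_TAGS:
        for value in record.get(tag, []):
            value = value.strip()
            if value:
                points.add((tag, value))
    return points


def load_with_dedup(records):
    """Load records in order; a record sharing any match point with an
    already loaded record is reported as a duplicate of that record."""
    index = {}        # match point -> position of the kept record
    kept = []
    duplicates = []   # (incoming record, record it matched)
    for record in records:
        points = match_points(record)
        hits = [index[p] for p in points if p in index]
        if hits:
            position = hits[0]
            duplicates.append((record, kept[position]))
        else:
            kept.append(record)
            position = len(kept) - 1
        for p in points:
            index.setdefault(p, position)
    return kept, duplicates


# Illustrative data only: two records sharing an ISBN, and two short
# records with identical titles but no control numbers at all.
sample = [
    {"245": ["Example title A"], "020": ["0335203884"]},
    {"245": ["Example title A"], "020": ["0335203884"]},
    {"245": ["Example title B"]},
    {"245": ["Example title B"]},
]
kept, duplicates = load_with_dedup(sample)
print(len(kept), "kept,", len(duplicates), "duplicate(s)")
```

The two short records survive as separate entries even though they are identical, because they offer no match points.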
Replace vs Merge in de-duplication
• Bi-directional merge profile should have been available in 2001.2 but not yet working
• Essential in order to preserve British Education Index and local subject headings in 650._4 and 650._7
• Might be used in future to preserve other fields, e.g. 856 fields
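The difference between replace and merge can be sketched as follows. This is a hedged illustration, not Voyager's bi-directional merge profile: the record layout, the function names, and the rule "copy protected fields from the losing record into the winning one" are all assumptions made for the example. The protected fields are the 650 headings with second indicator 4 or 7 mentioned above.

```python
import copy


def is_protected(field):
    """True for 650 fields whose second indicator is 4 or 7
    (the local and British Education Index headings on the slide)."""
    return field["tag"] == "650" and field["ind2"] in ("4", "7")


def replace_record(winner, loser):
    """Plain replace: the losing record is dropped entirely."""
    return copy.deepcopy(winner)


def merge_record(winner, loser):
    """Assumed merge rule: keep the winner, but carry over any protected
    field from the loser that the winner does not already have."""
    merged = copy.deepcopy(winner)
    for field in loser["fields"]:
        if is_protected(field) and field not in merged["fields"]:
            merged["fields"].append(copy.deepcopy(field))
    return merged


# Illustrative records (hypothetical layout: a list of field dicts).
incoming = {"fields": [
    {"tag": "245", "ind1": "1", "ind2": "0", "value": "Example title"},
    {"tag": "650", "ind1": " ", "ind2": "0", "value": "Education"},
]}
existing = {"fields": [
    {"tag": "245", "ind1": "1", "ind2": "0", "value": "Example title."},
    {"tag": "650", "ind1": " ", "ind2": "7", "value": "Local heading"},
]}

replaced = replace_record(incoming, existing)
merged = merge_record(incoming, existing)
print(len(replaced["fields"]), len(merged["fields"]))
```

With a plain replace the local heading is lost; with the merge rule it is retained alongside the incoming record's fields, while the loser's 245 is not copied.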
Quality Hierarchy
Leader/06  Leader/17  040$a  040$d
*          *          DLC    *
as         *          *      depfacaedb
ab         *          *      depfacaedb
as         *          *      depfacfmdb
ab         *          *      depfacfmdb
as         *          *      depfacozdb
ab         *          *      depfacozdb
as         *          *      collandb
ab         *          *      collandb
as         *          *      collpwdb
ab         *          *      collpwdb
as         *          *      otherdb
ab         *          *      otherdb
*          *          *      cambrdgedb
Trial UC build no. 1: Aug 2001
• First UC build with 2000.1.3 – built before remainder of system went live
• Contributing files were all test loads of data for all libraries - very slow to configure and build
• UC Phase 2 – should have had link back to holdings records but bug in 2000.1.3 prevented it from working
• Upgrade to 2000.2.1 needed to make it work (Oct 2001)
• No UB functionality
• Very generic build using only 010, 020, 022 and 035 to de-duplicate
Trial build no. 2: Nov 2002
• 2 databases: cambrdgedb and depfacaedb with 2001.2 Beta
• Bugs in Sysadmin affected
– Duplicate detection profiles
– Quality hierarchy
– Bi-directional merge
– Saving values in Sysadmin generally
• Build failed several times at pre-bulk stage
Trial no. 3: March 2003
• Began March 2003, again with 2 databases
• Early problems with matching location codes and Oracle database names
• Further pre-bulk problems
• Delayed while databases were clustered in March and upgraded to 2001.2.1 in early April
• Build completed but
– quality hierarchy failed to work
– bi-directional merge failed to work
– unable to test patron functionality
Production build
• 21 July Initial load began with 2 databases: cambrdgedb and depfacaedb
• Indexed and reviewed at this stage
• 22 August load of remaining databases began
• 28 August load and indexing complete
• Currently under review
– Authorities not loaded
– UB not yet enabled
– Bi-directional merge not yet functioning
De-duplication in production build
Database    Processed  Added      Discarded  Rejected  Replaced  Replacement level
Cambrdgedb  1,546,138  1,493,243  203        2,911     49,779    3.2%
Depfacaedb  412,727    339,408    397        59,397    13,523    3.3%
Collandb    481,002    260,311    9,593      136,146   74,951    15.6%
Depfacfmdb  352,619    284,674    1,419      47,660    18,866    5.3%
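As a check on these figures, the sketch below recomputes the replacement level for each database. It assumes the replacement level is Replaced divided by Processed, which reproduces the quoted percentages to within rounding, and it reads the Collandb "Replaced" count as 74,951, which is the only reading consistent with the 15.6% quoted.

```python
# Counts taken from the production build figures above.
# Assumption: the Collandb "Replaced" count is 74,951.
BUILD_COUNTS = {
    "cambrdgedb": {"processed": 1_546_138, "replaced": 49_779},
    "depfacaedb": {"processed": 412_727, "replaced": 13_523},
    "collandb": {"processed": 481_002, "replaced": 74_951},
    "depfacfmdb": {"processed": 352_619, "replaced": 18_866},
}


def replacement_level(processed, replaced):
    """Replaced records as a percentage of records processed
    (assumed definition of 'replacement level')."""
    return 100.0 * replaced / processed


levels = {
    name: replacement_level(c["processed"], c["replaced"])
    for name, c in BUILD_COUNTS.items()
}

for name, level in levels.items():
    print(f"{name}: {level:.2f}%")
```

The college database stands well apart from the other three, which is consistent with the 3-15% range quoted earlier.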
Newton OPAC
UC Search Results
Full Record View
Major issues to tackle
• De-duplication of short records with no match points at present
• Authority control in a non-authority controlled environment
• Presentation of results to users:
– Display doesn’t support multiple libraries in database: shows database name as location rather than holding library
– Public names in OPAC need to be revised to reflect multiple libraries - 60 characters is not always sufficient
Short record with no de-duplication:
Short record de-duplication
Option 1: Additional indexes
• Creation of index solely for de-duplication purposes
• Manual matching by cataloguers
• Addition of local control number in matching records
• Accurate but extremely slow
• However, additional left-anchored indexes for de-duping, like 015 (BNB numbers), would help.
Short record de-duplication
Option 2:
• Combining indexes is probably the best way to tackle the very large numbers of short records
• Algorithm to combine author, title, and publication date would be ideal
Option 3:
• Upgrading all short records through retrocon projects - expensive and not justified if only purpose is de-duplication
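A combined author/title/date key of the kind Option 2 asks for might look like the sketch below. The normalisation rules and function names are assumptions chosen for illustration; no such algorithm is described in the presentation, and a production version would need tuning against real short records.

```python
import re
import unicodedata


def normalise(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_key(author, title, date):
    """Build a combined key from author, title and the first
    four-digit year found in the publication date."""
    year = re.search(r"\d{4}", date or "")
    return (normalise(author), normalise(title), year.group(0) if year else "")


# Illustrative values only: the same work keyed by two cataloguers.
key_a = match_key("Dickens, Charles.", "Bleak House /", "c1853.")
key_b = match_key("DICKENS, Charles", "Bleak house", "1853")
key_c = match_key("Dickens, Charles", "Bleak house", "1868")
print(key_a == key_b, key_a == key_c)
```

Records that differ only in punctuation, case or date qualifiers fall together, while a different publication year keeps editions apart. Whether that trade-off is right for very brief records is a policy decision the key does not settle.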
Serials: a special problem
• Two types of serials records:
– Short Union List of Serials records: identical for all libraries but multiple copies in each database
– Upgraded serials records in all department/faculty and college databases
• Need to ensure that
– Higher quality records from departments etc. take precedence
– Former Union List of Serials records do not diverge by controlling standards as they are upgraded
Authority control in the UC
• Authority records from the University Library database will be loaded into UC
• Local authorities discarded from Voyager build
• No authorities in 7 out of 8 contributing databases
• Options?
– Load authorities into all databases? Too much space
– Introduce authority control into other 7 databases through Web authorities or copying authority records from cambrdgedb - problem of cleaning up existing records
Presentation of search results
• Patrons are interested in library holdings not database holdings
• Location Limits appear to be possible only by database not library
• May be able to work with access control groups and holdings sort groups
• Random order of MFHDs very confusing
Patron issues: UB environment ... but not entirely
• Full patron functionality in the UC OPAC was part of the Cambridge contract but recalls, holds and call slip requests not yet working
• Patron records from all contributing libraries display in OPAC
• Books on loan, requests, blocks, fines and fees from all libraries display in OPAC
• Circulation clustered environment
• UB installed but no reciprocal borrowing
Top Enhancements
• Additional tools for de-duplication, preferably allowing combinations of indexes
• Fix for the multiple MFHDs being delivered in random order - incomprehensible to the user
• ISBN matching not ignoring text after first 10 digits (problem nos. 13283, 58877, etc.)
– 020 __ |a 0335203884 and
– 020 __ |a 0335203884(pbk)
• Link from the UC record to the record in the contributing database would be very useful for Cambridge
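The ISBN enhancement requested above amounts to matching on the leading ISBN only. A minimal sketch of that normalisation follows; the function name is hypothetical and the rule (take the first 10 characters if they form a 10-character ISBN, ignoring hyphens) is an assumption about the desired behaviour, not a description of how Voyager indexes the 020 field.

```python
import re


def isbn_match_key(subfield_a):
    """Return the leading 10-character ISBN from an 020 $a value,
    ignoring hyphens and any trailing qualifier, or None if absent."""
    compact = subfield_a.strip().replace("-", "")
    found = re.match(r"\d{9}[\dXx]", compact)
    return found.group(0).upper() if found else None


# The two forms quoted on the slide, plus a spaced variant.
plain = isbn_match_key("0335203884")
qualified = isbn_match_key("0335203884(pbk)")
spaced = isbn_match_key("0335203884 (pbk)")
print(plain == qualified == spaced)
```

With this key the two 020 values from the slide would match each other, which they do not when the qualifier is treated as part of the number.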
Can be seen at:
http://hookec.lib.cam.ac.uk
University of Cambridge Universal Catalogue