The Future of the Online Catalog Andrew K. Pace NCSU Libraries July 28, 2006 Library Automation:...
The Future of the Online Catalog
Andrew K. Pace
NCSU Libraries
July 28, 2006
Library Automation: Yesterday’s Technology, Tomorrow
What I will cover:
Online catalog: the problem
Brief environmental scan
Endeca: team, timeline, technology
Usability, statistical results, relevance study
Dis-integrated systems / Future Catalogs
What ILS Catalogs Do Well… (liberally stolen from Roy Tennant)
Inventory control: What and where
Known item searching
What ILS Catalogs Don’t Do Well… (liberally stolen from Roy Tennant, and augmented by me)
Any search other than known item
Most anything other than books (serials, e-resources, articles, digital objects)
Logical groupings of results (e.g. FRBR)
Faceted browsing
Relevance ranking
Sideways searching (suggestions, expansion of searches and search targets)
“OPAC Complainers”
“There is certainly no dearth of OPAC complainers. You have Andrew Pace (OPACs suck), and Roy Tennant (You Can’t Put Lipstick on a Pig) writing and presenting about the need for change (more simplicity) in the OPAC world. I can appreciate their arguments for a simpler OPAC (not to mention the rest of the system) but other then [sic] present their arguments, neither has much in the way of suggestions nor have they sparked a movement among librarians or the automation vendors to do anything about the situation.”
- ACRL Blog entry, Oct. 13 2005
NextGen Library Search Tools
RedLightGreen (RLG)
OCLC Fictionfinder
Vivisimo clustered search (Ex Libris, Serials Solutions)
Grokker (EBSCO)
Aquabrowser visual context
Endeca Information Access Platform
OCLC Custom Worldcat and OpenWorldCat
Innovative Interfaces OPAC Pro & Encore
Ex Libris Primo
Polaris, AJAX-Enabled OPAC
SirsiDynix Enterprise Portal System, FAST
Talis, et al Web Services
Georgia Pines and the Library 2.0 Bandwagon
Endeca purchase decision
Lots of topical searches and poor subject access
– Keyword gives too many or too few results – leads to general distrust
– Misunderstanding of authority headings
No relevancy ranking of results
Needed more responsiveness (speed)
Implementation Team
7 representative team members
– Andrew Pace, IT, Chair
– Emily Lynema, IT, ex officio (tech lead)
– Cindy Levine, Research and Information Services
– Erik Moore, IT, ex officio (ILS librarian)
– Charley Pennell, Metadata and Cataloging
– Shirley Rodgers, IT
– Tito Sierra, Digital Library Initiatives
Timeline
– License / negotiation: Spring 2005
– Acquire: Summer 2005
– Implementation: August 2005 – January 12, 2006
Technical Overview
Endeca ProFind co-exists with SirsiDynix Unicorn ILS and Web2 online catalog.
Endeca indexes MARC records exported from Unicorn.
Index is refreshed nightly with records added/updated during previous day.
Endeca ProFind Overview
[Diagram, shown in three builds: Raw MARC data -> NCSU exports and reformats -> Flat text files -> Data Foundry (parse text files) -> Indices -> Navigation Engine -> (HTTP) NCSU Web Application -> (HTTP) Client browser. The builds label one part of the pipeline “Offline - Nightly” and the other “Always Online”.]
Integrating Endeca
Endeca doesn’t understand MARC data / MARC-8 character encoding – translate to UTF-8 text files
Each night a script updates the data indexed by Endeca:
– Exports updated or new MARC records from Unicorn.
– Reformats and merges these records with those already indexed.
– Starts Endeca re-index – completely rebuilding index for the catalog.
Process requires about 4 hours.
Retain Web2 OPAC for some functionality
– Authority searching - known items and cross-references
– Detailed record pages – how to make Endeca -> Web2 link?
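The nightly merge step described on this slide can be sketched roughly as follows. This is an illustration only, not the NCSU script: the record IDs, field names, and the tab-separated layout are all assumptions made for the example, and the MARC-8 to UTF-8 conversion is left out because it needs a MARC library.

```python
# Hypothetical sketch of the nightly "reformat and merge" step.
# Record IDs, field names, and the flat-file layout are assumptions.

def merge_records(indexed, updates):
    """Overlay tonight's exported records on the previously indexed set."""
    merged = dict(indexed)
    merged.update(updates)  # new IDs are added, existing IDs are replaced
    return merged

def to_flat_text(records):
    """Serialize records as one tab-separated line each, sorted by record ID."""
    lines = []
    for rec_id in sorted(records):
        fields = records[rec_id]
        pairs = [f"{name}={value}" for name, value in sorted(fields.items())]
        lines.append("\t".join([rec_id] + pairs))
    return "\n".join(lines)

# Records already in the index (made-up data).
indexed = {
    "rec001": {"title": "Old Title", "author": "Smith"},
    "rec002": {"title": "Another Book", "author": "Jones"},
}
# Records added or updated during the previous day (made-up data).
updates = {
    "rec001": {"title": "Corrected Title", "author": "Smith"},
    "rec003": {"title": "New Arrival", "author": "Lee"},
}

merged = merge_records(indexed, updates)
flat = to_flat_text(merged)
payload = flat.encode("utf-8")  # the text files handed to the indexer are UTF-8
print(flat)
```

Because the whole index is rebuilt each night, the merged file always holds the full catalog rather than only the day's changes.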
Some User Reaction
“This is absolutely the coolest thing I've seen all century.”
- Will Owen, Head of Systems (UNC Libraries)
“Also, I'm really digging the new NCSU library catalog. Very nice.”
- Educause staff (non-librarian)
“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.”
- NCSU Undergrad, Statistics
Basic statistics (March – May 2006)
Requests by Search Type
Search 51%
Search -> Navigation 29%
Navigation 20%
Navigation statistics (March – May 2006)
Navigation Requests by Dimension (number of requests)
Author 70,516
Language 38,074
Subject: Era 38,605
Subject: Region 59,248
Library 87,221
Format 74,985
Subject: Genre 65,545
Subject: Topic 155,856
LC Classification 169,249
Availability 23,848
Navigation statistics (March – May 2006)
Navigation by Dimensions
Subject: Topic 19%
Library 11%
Format 9%
Author 9%
Subject: Genre 8%
Subject: Region 7%
Subject: Era 5%
Language 5%
New 4%
LC Classification 20%
Availability 3%
Sorting statistics (March – May 2006)
Sorting Requests
Most Popular 19%
Title A-Z 13%
Pub Date 53%
Author A-Z 9%
Call Number 6%
Other interesting tidbits… (March 2006)
Authority searching decreased 45%
Keyword searching increased 230%
– Caveat: default catalog search changed from title authority to keyword
~ 5% of keyword searches offered spelling correction or suggestion
– 3.1% - automatic spell correction
– 2.3% - “Did you mean…” suggestion
Usability Testing Trends
10 undergraduate students
– 5 with Endeca catalog
– 5 with old Web2 OPAC
Endeca performed as well as OPAC for known-item searching
– 89% Endeca tasks completed ‘easily’ (8/9)
– 71% OPAC tasks completed ‘easily’ (15/21)
Endeca performs better than OPAC for topical searching
– 61% Endeca tasks completed ‘easily’ (19/31)
– 3% Endeca tasks completed as ‘hard’ (1/31)
– 33% OPAC tasks completed ‘easily’ (13/39)
– 26% OPAC tasks completed as ‘hard’ (10/39)
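The percentages on this slide follow directly from the task counts in parentheses. A small sketch that recomputes them is shown here; the dictionary layout and the `pct` helper are illustrative choices, while the counts themselves come from the slide.

```python
# Recompute the usability percentages from the task counts on the slide.
# The data layout and helper name are illustrative, not from the study.

def pct(part, total):
    """Percentage of tasks, rounded to the nearest whole number."""
    return round(100 * part / total)

# (system, search type, rating) -> (tasks with that rating, total tasks)
counts = {
    ("Endeca", "known-item", "easily"): (8, 9),
    ("OPAC", "known-item", "easily"): (15, 21),
    ("Endeca", "topical", "easily"): (19, 31),
    ("Endeca", "topical", "hard"): (1, 31),
    ("OPAC", "topical", "easily"): (13, 39),
    ("OPAC", "topical", "hard"): (10, 39),
}

rates = {key: pct(part, total) for key, (part, total) in counts.items()}

for (system, kind, rating), rate in rates.items():
    print(f"{system:6} {kind:10} {rating:6} {rate}%")
```

With only 5 students per catalog the task totals are small, so these figures are better read as trends than as precise measurements.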
A study in relevance
Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
100 topical user searches from 1 month in fall 2005
How many of top 5 results relevant?
– 40% relevant in Web2 OPAC
– 68% relevant in Endeca catalog
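One way to compute a "share of the top 5 results that are relevant" figure is sketched below. The slide does not say how the study pooled its judgments, so the pooling across searches is an assumption, and the relevance judgments in the example are made up for illustration.

```python
# Hypothetical sketch of a "relevant in top 5" measure.
# Pooling across searches is an assumption; the sample judgments are invented.

def relevant_in_top_k(judgments, k=5):
    """Count the relevant results among the first k, given booleans in rank order."""
    return sum(1 for is_relevant in judgments[:k] if is_relevant)

def share_relevant_in_top_k(all_judgments, k=5):
    """Fraction of top-k result slots judged relevant, pooled over all searches."""
    relevant = sum(relevant_in_top_k(j, k) for j in all_judgments)
    slots = sum(min(k, len(j)) for j in all_judgments)
    return relevant / slots if slots else 0.0

# Two invented searches; each list holds relevance judgments in rank order.
sample = [
    [True, True, False, True, False],
    [False, True, True, False, False, True],  # the sixth result is ignored
]

share = share_relevant_in_top_k(sample)
print(f"{share:.0%} of top-5 results judged relevant")
```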
Relevance defined
Relevance ranking in Endeca – select from a variety of modules and order them based on importance.
Relevance most important in Keyword Anywhere - searches all fields.
At NCSU…
1. Original query term(s) (no thesaurus, stemming, spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author higher than Table of Contents)
4. Number of fields that contain term(s)
…
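The idea of ordering ranking modules by importance can be sketched as a tuple sort key, where an earlier element always outweighs a later one. This is not Endeca's API or its actual matching logic; the field names, the whitespace tokenization, and the sample records are assumptions made for the example.

```python
# Illustrative sketch of priority-ordered relevance modules as a sort key.
# Not Endeca's implementation: fields, tokenization, and data are assumptions.

FIELD_PRIORITY = ["title", "author", "toc"]  # Title > Author > Table of Contents

def relevance_key(record, query):
    terms = query.lower().split()
    phrase = " ".join(terms)
    text = {f: record.get(f, "").lower() for f in FIELD_PRIORITY}

    # Fields in which every original query term appears as a whole word.
    matching = [
        i for i, f in enumerate(FIELD_PRIORITY)
        if all(term in text[f].split() for term in terms)
    ]
    # 1. Original query terms matched (no thesaurus, stemming, spell correction).
    original_match = bool(matching)
    # 2. Exact phrase match in any field (padded so only whole words match).
    phrase_match = any(f" {phrase} " in f" {t} " for t in text.values())
    # 3. Field ranking: a higher number means a more important matching field.
    best_field = len(FIELD_PRIORITY) - min(matching) if matching else 0
    # 4. Number of fields that contain the terms.
    field_count = len(matching)
    return (original_match, phrase_match, best_field, field_count)

def rank(records, query):
    """Sort records from most to least relevant under the ordered modules."""
    return sorted(records, key=lambda r: relevance_key(r, query), reverse=True)

records = [
    {"id": "B", "title": "Libraries in the Digital Age", "author": "Jones"},
    {"id": "D", "title": "Gardening", "author": "Green"},
    {"id": "C", "title": "Collected Essays", "toc": "digital libraries and their users"},
    {"id": "A", "title": "Digital Libraries", "author": "Smith"},
]

ranked_ids = [r["id"] for r in rank(records, "digital libraries")]
print(ranked_ids)
```

In this sketch record C, which matches the exact phrase only in its table of contents, ranks above record B, which has both terms in the title but not as a phrase, because the phrase module sits above field ranking in the order.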
Future Plans
Ongoing tweaks:
– Continued usability testing
– Relevance ranking algorithms & spell correction thresholds
– Additional browsing options
Endeca 2.0 ideas
– FRBR-ized display
– Discussions with OCLC regarding FAST (Faceted Access to Subject Terms) and FRBR
– Patron-generated refinements (folksonomies?)
– Enrich records with supplemental Web Services content – more usable TOCs, book reviews, etc.
– The death of authority searching (?)
– More integration with QuickSearch, other data repositories, and third-party discovery tools
Stuff to read…
Rethinking how we provide bibliographic services for the University of California by the Bibliographic Services Task Force http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf
The Changing nature of the catalog and its integration with other discovery tools by Karen Calhoun http://www.loc.gov/catdir/calhoun-report-final.pdf
The Changing nature of the catalog and its integration with other discovery tools. Final report. March 17, 2006. Prepared for the Library of Congress by Karen Calhoun: A Critical review by Thomas Mann http://www.guild2910.org/AFSCMECalhounReviewREV.pdf
A “Next Generation” Catalog, Eric Morgan http://dewey.library.nd.edu/morgan/ngc/
Metadata Research Center, SILS http://ils.unc.edu/mrc/
University of Rochester eXtensible Catalog
Toward a 21st Century Catalog, ITAL, Sept. 2006, by Antelman, Lynema, and Pace
From the Calhoun Report
"If one accepts the premise that library collections have value, then library leaders must move swiftly to establish the catalog within the framework of online information discovery systems of all kinds. Because it is catalog data that has made collections accessible over time, to fail to define a strategic future for library catalogs places in jeopardy the legacy of the world's library collections themselves. For this reason, the option of rejecting library catalogs is not considered in this report."
The library system pile
“Seams serve as perceptible boundaries that provide points of reference; without such boundaries readers get ‘lost at sea’ and don’t know where they are in relation to anything else; they can’t perceive either the extent of what they have or what they don’t have.”
- Thomas Mann
The library system puzzle
[Diagram labels: Catalog; Serials; A&I / FT DBs; Web; Digital Repositories; ERM Systems; Guided Navigation; Legacy ILS; Metasearch; IR; GS]
Thank you.
http://www.lib.ncsu.edu/endeca
Andrew Pace, Head, IT