Outbound Harvesting with Encore as a Library Space-Saving Strategy: The Case of HathiTrust Docs
Christopher C. Brown
University of Denver, Penrose Library
(303) [email protected]
Friday, April 15, 2011
This presentation will show how Encore harvesting can be used to mitigate a space problem in a library, substituting online access for the need for physical access to the collection. The government documents collection will be the primary focus.
DR, IR, Digital Texts
Inbound Harvesting
Outbound Harvesting
About University of Denver
Depository since 1909
Historically a 70-75% selective
Now a 4.8% selective, but receive 100% of online cataloging
Adding URLs to historic documents
The Problem
Currently 80% of our paper documents are in storage
We will be remodelling our library – totally displaced for at least 18 months; 100% of documents will be in storage
Government documents will remain in storage after renovation
Partial Solution: Using Encore for Outbound Harvesting
Our users are accustomed to using electronic documents
Need to divert attention away from physical collection holdings
Encore harvesting of Hathi Trust can do this
OCLC report: 15% of HathiTrust public domain materials are government docs*
Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/2011-01.pdf.
OAI-PMH Harvesting
http://www.openarchives.org/
Promotes interoperability standards for dissemination of content
Hathi Trust allows harvesting of its records
Innovative Interfaces' Encore catalog allows for records to be harvested (with the purchase of a harvester connection)
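As a rough illustration of what an OAI-PMH harvest involves, the sketch below builds a `ListRecords` request URL and reads record identifiers and the resumption token out of a response. It is not how Encore's harvester is implemented, which the slides do not describe. The base URL, set name, and sample response are hypothetical placeholders; only the `verb`, `metadataPrefix`, `set`, and `resumptionToken` arguments and the XML namespace come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                           resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in
                   root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response, trimmed to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = build_list_records_url("https://example.org/oai", set_spec="docs")
ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url("https://example.org/oai", resumption_token=token)
```

A harvester repeats the request with each new resumption token until the repository returns none, which is how a large record set is paged through on a schedule.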
Encore Model
[Diagram: the traditional III Millennium ILS with its classic OPAC; Encore (III), the next-gen catalog outside the ILS box; a local site with digital content; and a harvester linking Encore to remote sites with digital content]
• Harvested records appear only in Encore, not in the “classic” catalog
• Harvested records update on a periodic schedule, in our case daily
PD = where docs generally live
Hathi Trust Attributes
From: http://www.hathitrust.org/rights_database
PD vs. PDUS
• Mass identification of copyright status based on bibliographically-derived information:
a) As texts are loaded, a set query in Mirlyn identifies those texts that are:
• US federal government documents, or
• published in the US prior to 1923, or
• published outside of the US before 1870
These are treated as public domain (ATTRIBUTE name=pd) based on bibliographically-derived information (REASON name=bib). We do not restrict access to these materials.
b) Those texts that do not meet these criteria (e.g., US post-1923 and not a government document) are treated as in-copyright (i.e., ATTRIBUTE name=ic and REASON name=bib).
c) An additional attribute is used to represent works published outside the United States between 1870 and 1923 because copyright status for these works depends on the location of the user. Works published outside the US prior to 1923 are in the public domain; however, due to the variations in copyright law in countries outside the US, it is estimated that 1870 is the earliest date works published in these countries may still be under copyright. Therefore, users accessing the volume from US IP addresses will have access to the works published outside the US between 1870 through 1923; however, users with non-US IP addresses will not (ATTRIBUTE name=pdus and REASON name=bib).
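The rules quoted above can be read as a small decision procedure. The sketch below is one interpretation, not HathiTrust's actual code: the function names are hypothetical, and the treatment of the year 1923 itself (counted here as in-copyright) is an assumption, since the quoted text says both "prior to 1923" and "between 1870 through 1923".

```python
def rights_attribute(pub_year, published_in_us, is_us_federal_doc=False):
    """Return 'pd', 'pdus', or 'ic' following the bibliographic rules quoted above."""
    if is_us_federal_doc:
        return "pd"                      # US federal documents: public domain
    if published_in_us:
        return "pd" if pub_year < 1923 else "ic"
    if pub_year < 1870:
        return "pd"                      # published outside the US before 1870
    if pub_year < 1923:                  # assumption: 1923 itself is excluded
        return "pdus"                    # public domain for US users only
    return "ic"


def can_view_full_text(attribute, user_in_us):
    """Whether a user may open the full text, given the rights attribute."""
    if attribute == "pd":
        return True
    if attribute == "pdus":
        return user_in_us                # depends on the user's location
    return False                         # 'ic' is restricted


examples = {
    "1995 US federal document": rights_attribute(1995, True, is_us_federal_doc=True),
    "1910 US monograph": rights_attribute(1910, True),
    "1900 non-US monograph": rights_attribute(1900, False),
    "1950 US monograph": rights_attribute(1950, True),
}
```

Under this reading, government documents always land in `pd`, which matches the slide's note that PD is where docs generally live.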
Public Domain Distribution
Sampling Method
I wanted to see how many government documents were in our HathiTrust harvest
Limit to HathiTrust for a given year
Examine first result on each page of 25 results (4% of results) [limitation: Encore only displays first 1,000 results]
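The sampling approach can be sketched as follows. This is an illustration of the arithmetic only; the sampling described here was done by hand in Encore, and the function names and the example flags are hypothetical.

```python
def sample_positions(total_results, page_size=25, display_cap=1000):
    """1-based positions of the first result on each page that Encore displays."""
    visible = min(total_results, display_cap)   # only the first 1,000 are shown
    return list(range(1, visible + 1, page_size))


def estimate_docs(sample_is_doc, harvest_total):
    """Estimate the share and count of government documents from a sample.

    sample_is_doc: one boolean per sampled record (True = government document).
    harvest_total: number of harvested records for the year being sampled.
    """
    if not sample_is_doc:
        raise ValueError("the sample is empty")
    share = sum(sample_is_doc) / len(sample_is_doc)
    return share, round(share * harvest_total)


# A year with 5,000 results still yields at most 40 sampled records.
positions = sample_positions(5000)

# Made-up example: 3 of 4 sampled records are documents in a harvest of 1,000.
share, estimated = estimate_docs([True, True, False, True], 1000)
```

Because of the 1,000-result cap, a year with many more results is sampled at well under 4%, so estimates for large years rest on a small sample.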
Harvesting Hathi Docs: The Stats
Statistics as of mid-March, 2011
The Docs Sampling columns show the estimated numbers of docs per year and the estimated percentage of docs per year from the Harvest

Date Range   Hathi Totals   Hathi All Pub Domain   Hathi pdus   DU pd Harvest   Docs Sampling   Docs Sampling
                            (pdus + pd)                                         (est. docs)     (est. %)
2000-2009    505,682        14,140                 726          13,369          13,340          99.78%
1990-1999    709,214        29,163                 880          28,164          26,662          94.67%
1980-1989    723,657        33,753                 1,204        32,321          31,370          97.06%
1970-1979    631,110        28,633                 2,046        26,189          25,607          97.78%
1960-1969    546,914        21,244                 1,987        18,991          7,668           40.38%
1950-1959    281,615        20,861                 863          19,893          3,888           19.54%
1940-1949    184,755        17,096                 600          16,253          3,771           23.21%
1930-1939    175,103        16,237                 654          15,317          2,600           16.97%
1920-1929    175,226        66,563                 27,108       28,854          1,529           5.30%
1910-1919    175,148        169,923                75,955       61,230          4,124           6.73%
1900-1909    179,018        153,284                70,900       47,999          2,265           4.72%
1890-1899    112,295        110,605                50,502       34,742          596             1.72%
1880-1889    83,950         82,809                 38,928       23,855          699             2.93%
1870-1879    58,624         57,826                 27,202       17,751          319             1.80%
1860-1869    50,907         50,337                 2,273        45,790          248             0.54%
Total        4,593,218      872,474                301,828      430,718         124,686         28.95%
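As a quick check on the totals row, the short script below re-adds the DU pd Harvest and estimated docs figures from the table and recomputes the overall share of documents. The numbers are copied from the slide; the variable names are only for illustration.

```python
# (date range, DU pd harvest, estimated docs), copied from the table above.
ROWS = [
    ("2000-2009", 13369, 13340), ("1990-1999", 28164, 26662),
    ("1980-1989", 32321, 31370), ("1970-1979", 26189, 25607),
    ("1960-1969", 18991, 7668),  ("1950-1959", 19893, 3888),
    ("1940-1949", 16253, 3771),  ("1930-1939", 15317, 2600),
    ("1920-1929", 28854, 1529),  ("1910-1919", 61230, 4124),
    ("1900-1909", 47999, 2265),  ("1890-1899", 34742, 596),
    ("1880-1889", 23855, 699),   ("1870-1879", 17751, 319),
    ("1860-1869", 45790, 248),
]

harvest_total = sum(harvest for _, harvest, _ in ROWS)
docs_total = sum(docs for _, _, docs in ROWS)

# Overall estimated share of government documents in the harvest, in percent.
docs_share_pct = 100 * docs_total / harvest_total

print(f"{docs_total:,} of {harvest_total:,} harvested records "
      f"({docs_share_pct:.2f}%) are estimated to be documents")
```

The per-decade percentages were presumably aggregated from the per-year samples, so recomputing them from the rounded decade figures may differ in the last digit.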
Hathi Docs Usage in Proportion to Docs Distribution
[Chart: Total Docs and Hathi Docs per year, 1899 to 2009; vertical axis from 0 to 30,000]
Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest); 1976 onward data: CGP
Hathi Harvest in Perspective
Tracking of daily harvesting since harvesting began, April 16, 2010 through January 1, 2011
[Chart: Harvested Records and Harvested Docs over time, horizontal axis from 4/1/2010 to 12/27/2010; vertical axis from 0 to 600,000]
Inclusion of Serials
Although serial holdings do not sort properly, users can figure out what they need.
Access to Older Serials
[Screenshots: Harvested Record, Hathi Trust Record, Hathi Trust Full Text]
And Very Old Serials
[Screenshots: Harvested Record, Hathi Trust Record, Hathi Trust Full Text]
Multivolume Works
Duplicate Holdings
U. of Michigan and U. of California holdings both show in this record
Now, the Bad News: Records are Stripped Down
“Lumber, Lumber, Lumber”
Harvested Record from our Catalog
Notice the multiple duplications of subject headings
Original Record in Hathi Trust
Same record, but subject heading subfields are present
Stripped-Out Fields
008 fixed field data
650 subfields other than “a”
500 notes
5xx shipping list info
300 subfields after “a”
086 SuDocs number
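To see why stripped records end up reading “Lumber, Lumber, Lumber”, the sketch below applies the losses listed above to a made-up record. It only mimics the effect and is not Encore's harvesting code. The record contents, the tuple layout, and the choice to drop only the 500 field (the slide does not say which other 5xx field carries shipping list info) are all assumptions.

```python
# Fields that disappear entirely, and fields that keep only subfield "a".
DROPPED_TAGS = {"008", "086", "500"}
FIRST_SUBFIELD_ONLY = {"300", "650"}


def strip_record(fields):
    """Mimic the field losses listed above.

    fields: list of (tag, [(subfield_code, value), ...]) pairs.
    """
    stripped = []
    for tag, subfields in fields:
        if tag in DROPPED_TAGS:
            continue
        if tag in FIRST_SUBFIELD_ONLY:
            subfields = [(code, value) for code, value in subfields if code == "a"]
        stripped.append((tag, subfields))
    return stripped


# Invented sample record; the subdivisions are placeholders, not real data.
RECORD = [
    ("008", [("", "fixed field data")]),   # control field, no subfield code
    ("086", [("a", "SuDocs number")]),
    ("245", [("a", "Sample title")]),
    ("300", [("a", "12 p."), ("c", "28 cm")]),
    ("500", [("a", "Sample note")]),
    ("650", [("a", "Lumber"), ("x", "Subdivision one")]),
    ("650", [("a", "Lumber"), ("x", "Subdivision two")]),
    ("650", [("a", "Lumber"), ("z", "Subdivision three")]),
]

stripped = strip_record(RECORD)
headings = [value for tag, subs in stripped if tag == "650" for _, value in subs]
```

Three distinct subject headings collapse into three identical ones once only subfield "a" survives, which is the duplication visible in the harvested record.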
Use Stats for Regular Online Docs
Represents clickthroughs from the catalog record to individual government documents over 7+ years.
Use Stats for Hathi Trust?
• Statistics for all Hathi Trust records accessed, not just documents
• Spikes in usage reflect testing by the docs librarian (me), not real users
Statistics from Google Analytics
Conclusions
Encore provides an easy way to add external content to a library catalog experience
HathiTrust records are freely available and are easy to harvest
The Encore-harvested records are stripped-down and inadequate, providing too few access points and inadequate descriptions
The content is superb, containing monographic and serial document holdings that span about 150 years
Overall the project is worth having in our Encore catalog, especially since our legacy documents are all in storage and will remain there
We are considering adding other external collections using Encore, such as Center for Research Libraries digital holdings.