HATHI TRUST: A Shared Digital Repository
HathiTrust Digital Library
Cooperation for Preservation
Outline
• About HathiTrust
  – Mission & Goals
• Background
• What we do
  – Services
• How we do it
  – Governance
  – Partnership & Resources
  – Technology
• Future Directions
What is HathiTrust?
• Shared digital repository
  – Launched in 2008 by 25 institutions (now 26)
  – Initial focus on digitized book and journal content
  – Expanding to non-book/non-journal and born-digital content
  – “Light” archive
• Collaboration
  – Preservation and access
  – Print collections
  – Local services
  – Public good
History
• Michigan Digitization Project, 2004
• “…U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered in cooperation with partner research libraries such as the institutions in the Digital Library Federation…”
History
• Collective agreement with the CIC announced in June 2007
• The CIC agreed to establish a shared digital repository
History
The Partners
• When announced in October 2008, partners included:
  – University of California system
  – CIC (Committee on Institutional Cooperation)
  – University of Virginia
CIC members: University of Chicago, University of Illinois, Indiana University, University of Iowa, University of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, University of Wisconsin-Madison
Columbia University
The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for “elephant”
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy
Content Distribution
As of February 1:
• 5,323,716 total
• 764,481 public domain
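Taken together, the two counts imply that roughly one item in seven was public domain at the time. A minimal Python check of that arithmetic, assuming both figures count the same unit (presumably volumes):

```python
# Counts from the "Content Distribution" slide (as of February 1).
TOTAL = 5_323_716
PUBLIC_DOMAIN = 764_481


def public_domain_share(public_domain: int, total: int) -> float:
    """Return the fraction of the collection that is public domain."""
    if total <= 0:
        raise ValueError("total must be positive")
    return public_domain / total


share = public_domain_share(PUBLIC_DOMAIN, TOTAL)
not_public_domain = TOTAL - PUBLIC_DOMAIN
print(f"{share:.1%} public domain; {not_public_domain:,} not public domain")
# 14.4% public domain; 4,559,235 not public domain
```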
Content Growth
Services
How we do it
Governance
[Diagram: HathiTrust governance structure]
• Executive Committee
• Strategic Advisory Board
• Areas of responsibility: budget/finances, decision-making, policy, planning
Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, UM
• Laine Farley, Executive Director, CDL
• John King, Vice Provost for Academic Information, UM
• Paula Kaufman, University Librarian and Dean of Libraries, UI
• Brian Schottlaender, University Librarian, UCSD
• Ed Van Gemert, Director of Libraries, UW-Madison
• Brenda Johnson, Dean of Libraries, IU
• Brad Wheeler, Chief Information Officer, IU
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM
Strategic Advisory Board
• Ed Van Gemert (Chair), Director of Libraries, UW-Madison
• John Butler, Associate University Librarian for Information Technology, U Minn
• Patricia Cruse, Director, Preservation, CDL
• Bernie Hurley, Director, Library Technologies, UC Berkeley
• R. Bruce Miller, University Librarian, UC Merced
• Sarah Pritchard, University Librarian, Northwestern
• Paul Soderdahl, Director, LIT, U Iowa
• John Wilkin, Executive Director, HathiTrust (ex officio)
Partnership & Resources (1)
• Funded for an initial five years with base funding from partners
• Budget: held separately within the UMich budget system and managed by the Executive Committee
• Cost model: an annual per-GB storage charge, plus a one-time fee on new content to build a capital fund
• Review in the third year of each five-year period
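The cost model above can be sketched as a simple annual bill. This is a hypothetical illustration: the slide gives no rates, so `rate_per_gb_year` and `one_time_fee_per_gb` are placeholders, and treating the one-time fee as charged per GB of new content is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AnnualBill:
    storage: float  # recurring charge for all stored content this year
    capital: float  # one-time fee on newly deposited content (goes to the capital fund)

    @property
    def total(self) -> float:
        return self.storage + self.capital


def annual_bill(stored_gb: float, new_gb: float,
                rate_per_gb_year: float, one_time_fee_per_gb: float) -> AnnualBill:
    """Hypothetical bill: every stored GB (including new content) pays the yearly
    rate; GB deposited this year additionally pay the one-time fee."""
    if min(stored_gb, new_gb, rate_per_gb_year, one_time_fee_per_gb) < 0:
        raise ValueError("inputs must be non-negative")
    return AnnualBill(storage=stored_gb * rate_per_gb_year,
                      capital=new_gb * one_time_fee_per_gb)


# Illustrative numbers only; not actual HathiTrust rates.
bill = annual_bill(stored_gb=10_000, new_gb=1_500,
                   rate_per_gb_year=2.0, one_time_fee_per_gb=5.0)
print(bill.storage, bill.capital, bill.total)  # 20000.0 7500.0 27500.0
```

Keeping the capital-fund contribution as a separate field makes it easy to total what the one-time fees have added to the fund over a five-year funding period.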
Partnership & Resources (2)
• Staff/expertise: highly integrated
  – Project managers, IT and communications staff, copyright experts, administrators (UM, Indiana, and UC taking the lead)
• Working groups
• UM recently hired a Digital Preservation Librarian
• Shared development space
Financial contributions of partners
HathiTrust Functional Framework
Partnership & Resources (3)
• Toward a Cloud Library
  – CLIR, Mellon Foundation
  – OCLC Research, NYU, HathiTrust, ReCAP libraries
• Objective: Characterize the near-term opportunity for externalizing management of academic research collections leveraging capacity of large-scale shared print and digital repositories*
• Outcomes: opportunity and risk assessment based on aggregate collection analysis; draft service agreement enabling a generic consumer library to selectively outsource preservation and access of low-use research collections to large-scale print and digital repositories
*From the RLG Partner Update, January 7, 2010
Partnership & Resources (4)
• CRL (Center for Research Libraries) TRAC (Trustworthy Repositories Audit & Certification) audit
  – Portico and HathiTrust assessments timely
  – “Certification will augment CRL’s strategic archiving of print, and support a responsible transition to electronic-only formats where appropriate.”
  – Work with UC to design a shared print journal archiving effort
  – “With this hybrid strategy CRL hopes to enable its community to accelerate the shift to electronic-only resources in a careful and responsible manner.”
* http://www.crl.edu/archiving-preservation/digital-archives/certification-and-assessment-digital-repositories
Partnership & Resources (5)
• New cost model
• Based on benefits to institutions
  – Public domain
  – In-copyright
• Volumes “held”
Partnership & Resources (6)
• Timeline:
  – Implement in 2013
  – Accept new partners now, with costs based on overlap calculations
• Requirements:
  – Print holdings database
  – Update mechanisms
  – Manual remediation
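One way to read "costs based on overlap calculations" is sketched below. It is a hypothetical allocation, not HathiTrust's published formula: public-domain costs are split evenly across partners, and each in-copyright volume's cost is split among the partners whose print holdings include it. Matching print records to HathiTrust volumes (the print holdings database above) is assumed to be done already; `partner_costs` and the sample holdings are invented for illustration.

```python
from collections import Counter
from typing import Dict, Set


def partner_costs(pd_cost: float,
                  ic_cost_per_volume: float,
                  holdings: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical split: public-domain cost shared equally by all partners;
    each in-copyright volume's cost shared equally by the partners holding it."""
    if not holdings:
        raise ValueError("need at least one partner")
    # Overlap count: how many partners hold each in-copyright volume in print.
    holders = Counter(vol for vols in holdings.values() for vol in vols)
    pd_share = pd_cost / len(holdings)
    return {
        partner: pd_share + sum(ic_cost_per_volume / holders[vol] for vol in vols)
        for partner, vols in holdings.items()
    }


holdings = {
    "A": {"v1", "v2"},  # A is the sole holder of v1 and shares v2 with B
    "B": {"v2"},
}
print(partner_costs(pd_cost=100.0, ic_cost_per_volume=10.0, holdings=holdings))
# {'A': 65.0, 'B': 55.0}
```

Whatever the real weights, an allocation of this shape always sums to the full cost, so the repository is exactly covered while widely held volumes become cheaper for each holder.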
Technology - OAIS
[Diagram: HathiTrust components arranged on the OAIS reference model]
• Producers: Google [OCA], in-house conversion
• Ingest: GRIN, internal data loading, GROOVE (JHOVE)
• Archival package: METS/PREMIS object, TIFF G4/JPEG2000, OCR, MD5 checksums
• Data management: MARC record extensions (Aleph), Rights DB
• Archival storage: Isilon, site replication, TSM, MD5 checksum validation
• Access: Page Turner, HathiTrust API, OAI, GeoIP DB, CNRI Handles, [Solr]
• Dissemination package: METS object, PNG, OCR, PDF
Technology – Architecture
• Inbound validation, standards-based object storage, and related metadata
• Storage in Ann Arbor and Indianapolis
• Encrypted backup to a third location
• Rights database for rights metadata
• Online catalog as source and storage for descriptive metadata
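With copies in Ann Arbor and Indianapolis, each site's files can be re-checked against checksums recorded at ingest. A minimal Python sketch of such an MD5 fixity audit, assuming a manifest that maps relative paths to expected hex digests; the manifest format, the function names, and the demo directories are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large zip or image files are never fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(site_root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries that are missing or whose MD5 no longer matches."""
    failures = []
    for rel_path, expected in manifest.items():
        target = site_root / rel_path
        if not target.is_file() or md5_of(target) != expected.lower():
            failures.append(rel_path)
    return failures


# Demo: two replica directories, one with a silently altered file.
manifest = {"vol.zip": "5d41402abc4b2a76b9719d911017c592"}  # MD5 of b"hello"
with tempfile.TemporaryDirectory() as site_a, tempfile.TemporaryDirectory() as site_b:
    (Path(site_a) / "vol.zip").write_bytes(b"hello")
    (Path(site_b) / "vol.zip").write_bytes(b"hellO")  # simulated corruption
    print(audit(Path(site_a), manifest), audit(Path(site_b), manifest))
    # [] ['vol.zip']
```

Because every replica is compared with the same recorded checksums rather than with each other, a mismatch also indicates which copy needs to be restored.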
Technology - Ingest
• Automatic validation in GROOVE
  – Check the barcode check digit using the Luhn algorithm
  – Fixity check on JPEG2000, TIFF, and UTF-8 files using MD5
  – Well-formedness and embedded metadata check on JPEG2000, TIFF, and UTF-8 files using JHOVE
• Creation of METS and PREMIS
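The barcode check above relies on the Luhn (mod 10) check digit. A minimal sketch, assuming barcodes are plain digit strings whose last digit is the check digit; GROOVE's own implementation is not shown here, and `79927398713` is the standard Luhn test number, not a real library barcode:

```python
def luhn_is_valid(barcode: str) -> bool:
    """True if the last digit of `barcode` is a correct Luhn (mod 10) check digit."""
    if not (barcode.isascii() and barcode.isdigit()) or len(barcode) < 2:
        return False
    total = 0
    # Walk right to left; index 0 is the check digit itself.
    for i, ch in enumerate(reversed(barcode)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit
            d *= 2
            if d > 9:   # equivalent to adding the two digits of d
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(payload: str) -> str:
    """Return the digit that makes `payload` + digit pass the Luhn check."""
    for candidate in "0123456789":
        if luhn_is_valid(payload + candidate):
            return candidate
    raise ValueError("payload must be a non-empty string of digits")


print(luhn_is_valid("79927398713"), luhn_is_valid("79927398710"))  # True False
```

A Luhn check catches every single-digit typo and most adjacent transpositions, which is why it is a cheap first gate before the heavier MD5 and JHOVE checks.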
Technology - Repository
• Isilon storage
• Simple filesystem layout
  – One directory per volume, containing a zip file and a METS file
  – Namespaces let otherwise conflicting identifiers coexist
  – Namespaces for institutions and, if needed, for types of identifiers within an institution
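A sketch of the one-directory-per-volume layout with namespaces, assuming IDs written as `namespace.localid`. The root path, the `.zip`/`.mets.xml` file names, and the sample IDs are hypothetical; the production repository may encode identifiers differently on disk.

```python
from pathlib import PurePosixPath


def volume_paths(root: str, volume_id: str) -> dict:
    """Map a hypothetical 'namespace.localid' ID to its directory, zip, and METS paths."""
    namespace, sep, local_id = volume_id.partition(".")
    if not (sep and namespace and local_id):
        raise ValueError(f"expected 'namespace.localid', got {volume_id!r}")
    if "/" in volume_id or local_id in {".", ".."}:
        raise ValueError("ID would escape its volume directory")
    volume_dir = PurePosixPath(root) / namespace / local_id  # one directory per volume
    return {
        "dir": volume_dir,
        "zip": volume_dir / f"{local_id}.zip",
        "mets": volume_dir / f"{local_id}.mets.xml",
    }


# The same local identifier from two institutions does not collide.
print(volume_paths("/repo", "inst1.12345")["dir"])  # /repo/inst1/12345
print(volume_paths("/repo", "inst2.12345")["dir"])  # /repo/inst2/12345
```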
Technology – METS Object
• Why METS?
  – Can serve as both an Archival Information Package and a Dissemination Information Package
  – Designed to record the relationships between the pieces of complex digital objects
  – Can be created automatically as texts are loaded or reloaded
  – Records preservation actions (PREMIS)
Technology – METS Object
• What’s there?
  – metsHdr with an ID and CREATEDATE
  – Two dmdSecs: MARCXML and mdRef
  – amdSec containing one techMD with PREMIS metadata
  – fileSec with four fileGrps (zip, images, OCR, hOCR)
  – Physical structMap tying files together with metadata (page numbers and features)
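The structure listed above can be sketched with Python's `xml.etree.ElementTree`. This is a skeleton under stated assumptions, not HathiTrust's METS profile: the `ID` values, `USE` labels, MIME types, and catalog URL are invented, and the MARCXML and PREMIS payloads, as well as the `FLocat` file locations, are left empty.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("METS", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(obj_id: str, created: str, catalog_url: str, pages: list) -> ET.Element:
    """Skeleton METS for one volume; `pages` is a list of (sequence, page_label) pairs."""
    mets = ET.Element(q("mets"), {"OBJID": obj_id})
    ET.SubElement(mets, q("metsHdr"), {"ID": "HDR1", "CREATEDATE": created})

    # dmdSec 1: MARCXML record wrapped inline (record body omitted here).
    dmd1 = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(ET.SubElement(dmd1, q("mdWrap"), {"MDTYPE": "MARC"}), q("xmlData"))
    # dmdSec 2: mdRef pointing at the catalog record instead of copying it.
    dmd2 = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD2"})
    ET.SubElement(dmd2, q("mdRef"), {"LOCTYPE": "URL", "MDTYPE": "MARC",
                                     f"{{{XLINK_NS}}}href": catalog_url})

    # amdSec with a single techMD holding PREMIS (events omitted here).
    amd = ET.SubElement(mets, q("amdSec"), {"ID": "AMD1"})
    tech = ET.SubElement(amd, q("techMD"), {"ID": "PREMIS1"})
    ET.SubElement(ET.SubElement(tech, q("mdWrap"), {"MDTYPE": "PREMIS"}), q("xmlData"))

    # fileSec with four fileGrps; FLocat children omitted for brevity.
    file_sec = ET.SubElement(mets, q("fileSec"))
    grp = {use: ET.SubElement(file_sec, q("fileGrp"), {"USE": use})
           for use in ("zip", "image", "ocr", "hocr")}
    ET.SubElement(grp["zip"], q("file"), {"ID": "ZIP1", "MIMETYPE": "application/zip"})

    # Physical structMap: one div per page, pointing at that page's files.
    struct = ET.SubElement(mets, q("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct, q("div"), {"TYPE": "volume"})
    for seq, label in pages:
        page = ET.SubElement(volume, q("div"),
                             {"TYPE": "page", "ORDER": str(seq), "ORDERLABEL": label})
        # Assumes JPEG2000 page images; hOCR is HTML with word coordinates.
        for use, mime in (("image", "image/jp2"), ("ocr", "text/plain"), ("hocr", "text/html")):
            file_id = f"{use.upper()}{seq:08d}"
            ET.SubElement(grp[use], q("file"), {"ID": file_id, "MIMETYPE": mime})
            ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return mets


doc = build_mets("inst1.12345", "2010-02-01T00:00:00Z",
                 "https://catalog.example.org/Record/12345", [(1, "i"), (2, "1")])
print([g.get("USE") for g in doc.iter(q("fileGrp"))])  # ['zip', 'image', 'ocr', 'hocr']
```

Generating the document from code in this way is what makes it practical to rebuild METS automatically whenever a volume is loaded or reloaded.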
Future Directions
Future Directions (1)
Future Directions (2)
Links
• Catalog, full-text search, and Collection Builder
  – http://catalog.hathitrust.org
• METS and PREMIS implementation
  – http://www.hathitrust.org/preservation
• Technical profile
  – http://www.hathitrust.org/technology
• Technical flow diagram
  – http://www.hathitrust.org/documents/HathiTrust-PASIG-200910.pdf
  – http://www.hathitrust.org/documents/HathiTrust-PASIG-notes-200910.pdf
• Rights management
  – http://www.hathitrust.org/rights_management
• TRAC
  – http://www.hathitrust.org/accountability