Cory Snavely Library IT Core Services manager University of Michigan September 2010.

Page 1

Cory Snavely

Library IT Core Services manager

University of Michigan

September 2010

Page 2

www.hathitrust.org

HathiTrust project profile

• Launched October 2008
• 29 member institutions and growing
• primarily Google-scanned materials but also other sources
• 6.7 million volumes, 350 pages average
• 250 terabytes in two US instances
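The profile figures above imply some rough per-volume and per-page sizes. The following is only a back-of-envelope sketch: the slide does not say whether 250 terabytes is the total across both instances or the size of each, so the hypothetical helper profile_estimates takes that as a parameter, and decimal terabytes (10^12 bytes) are assumed.

```python
# Back-of-envelope arithmetic on the figures from the project profile slide.
VOLUMES = 6_700_000        # "6.7 million volumes"
PAGES_PER_VOLUME = 350     # "350 pages average"
STORAGE_TB = 250           # "250 terabytes in two US instances"
BYTES_PER_TB = 10**12      # assumption: decimal terabytes


def profile_estimates(copies_in_total: int) -> dict:
    """Estimate sizes, given how many full copies the 250 TB figure covers.

    copies_in_total=1 treats 250 TB as the size of one instance;
    copies_in_total=2 treats it as the sum of both instances.
    """
    total_pages = VOLUMES * PAGES_PER_VOLUME
    bytes_per_copy = STORAGE_TB * BYTES_PER_TB / copies_in_total
    bytes_per_volume = bytes_per_copy / VOLUMES
    return {
        "total_pages": total_pages,
        "mb_per_volume": bytes_per_volume / 10**6,
        "kb_per_page": bytes_per_volume / PAGES_PER_VOLUME / 10**3,
    }


if __name__ == "__main__":
    for copies in (1, 2):
        print(copies, profile_estimates(copies))
```

Under either reading the collection is on the order of 2.3 billion page images, at roughly tens of megabytes per volume.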

Page 3


Material and Data Flow

[Diagram. Labels: Google or other scanning project; network or media delivery; ingest; storage@UM; sync; storage@IU; web (appears twice); index; catalog; rights database]

Page 4

Content Growth

Page 5

Content Distribution Over Time

* As of July 25, 2010

Page 6


What do I worry about?

Yesterday’s worry | …is a non-issue due to… | …but today’s worry is
Managing too many separate devices | Block/file virtualization | Storage system software reliability and change management.
What if I have to fsck this hulking beast? | Non-volatile journals and online integrity checks |
Bit rot, misdirected writes, … | Online error detection and repair |

• Trend is obvious, but not necessarily bad
• External error detection may be impossible
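For contrast with the in-system detection and repair named on this slide, here is a minimal sketch of an application-level fixity check, one common way to look for bit rot from outside the storage system. It is not the approach described in the talk: the function names (sha256_of, build_manifest, audit) and the choice of SHA-256 are assumptions for illustration, and at the scale described earlier a full rescan like this could be impractical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "corrupt": sorted(
            name
            for name in set(manifest) & set(current)
            if manifest[name] != current[name]
        ),
    }
```

A manifest would be built at ingest and stored separately from the content; a later audit then reports files that have gone missing, appeared unexpectedly, or changed. It can only detect damage, not repair it, and repair would need a second good copy.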

Page 7


What’s the Data Integrity Roadmap?

• Not all systems provide integrity features
• It’s time for the data integrity model of systems to be a primary purchase criterion
• SNIA Data Integrity and Long Term Retention Technical Working Groups may help to surface minimum standards or common approaches; can anyone speak to progress?

Page 8

Questions?

Cory Snavely

[email protected]

www.hathitrust.org