December 16, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types Part 2: Equipment that Supports the Present and the Future

Software curation as a digital preservation service

Keith Webster
Dean of University Libraries
Director of Emerging and Integrative Media Initiatives

@cmkeithw


Software curation – why?


April 1, 2015 3

Archiving Static Content


What About Executable Content?

Games


Application-specific content

• WordPerfect 1.0 document: Can you read it today? 100 years from now?

• Original Wang document: Can you read it today? 100 years from now?

• Simulation model: Can you re-run an old model with new data?


Useful knowledge

Sharable knowledge


• We have spent 20 years converting material to digital form, establishing standards and protocols, and looking after it


We also have a track record in curating born-digital content


And some of us are making progress with social media products


• The rapid development in computing technology and the Internet have opened up new applications for the basic sources of research — the base material of research data — which has given a major impetus to scientific work in recent years.

• Access to research data increases the returns from public investment in this area; reinforces open scientific inquiry; encourages diversity of studies and opinion; promotes new areas of work and enables the exploration of topics not envisioned by the initial investigators.

• The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research.

What about the products of research?


The data may still be discoverable and accessible, but executable?


Data come in different forms, shapes and sizes


[Chart: Operating System Usage Over Time, 2003–2015; share of usage (0% to 80%) for Win8, Win7, Vista, Win2003, Older Win, WinXP, W2000, Win98, Win95, WinNT, Linux, Mac and Mobile]

Why? – Software-dependent content


Old software is required to authentically render old content

Original content in original software (WordPerfect in Windows 95)

Original content in newer software (LibreOffice Writer in Windows Vista)


Research results are at risk of loss without original software

Original content in original software (WordStar for DOS in Microsoft DOS)

[NB: equation predicting tree growth rates includes exponents documented using the upper line of text]

Original content in newer software (LibreOffice Writer in Windows Vista)

[NB: equation layout and meaning changed]


Why? – Software-dependent content

• We need to curate and preserve operating systems to support access to assets that depend on them

• We need to curate and preserve software applications to support access to content that depends on them

• We need to curate and preserve fonts, scripts, plug-ins and other dependencies to support access to content that requires them

• We need to preserve whole desktop environments (e.g. Salman Rushdie’s desktop at Emory University) to support access to the experience of interacting with them

• We need to curate and preserve pre-configured disk images with software already installed on them, for running on emulated hardware


Software Curation – How?


How? – Emulation/Virtualization

• An emulation software package (“emulator”) is used to create a virtual version of one computer within another computer that has different hardware

• Old software can be run on the “emulated” computer hardware just as if it were running on the original physical computer

• Many emulators were originally developed to run old video games
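To make the idea concrete, here is a deliberately tiny sketch of what an emulator does: a fetch-decode-execute loop, written in Python, that interprets the instructions of a made-up guest machine, so the guest program runs on whatever host executes the script. The instruction set (LOAD, ADD, DEC, JNZ, HALT), the registers and the function name run_guest are hypothetical; real emulators such as QEMU or Basilisk II model actual CPUs and devices in far more detail.

```python
from typing import Dict, List, Tuple

# Hypothetical toy guest instruction: (opcode, register, operand)
Instruction = Tuple[str, str, int]


def run_guest(program: List[Instruction], max_steps: int = 10_000) -> Dict[str, int]:
    """Interpret a toy guest program; the host never runs guest code directly."""
    regs: Dict[str, int] = {"A": 0, "B": 0}  # emulated guest registers
    pc = 0                                   # emulated program counter
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, reg, value = program[pc]         # fetch and decode
        pc += 1
        if op == "LOAD":                     # reg <- value
            regs[reg] = value
        elif op == "ADD":                    # reg <- reg + value
            regs[reg] += value
        elif op == "DEC":                    # reg <- reg - 1
            regs[reg] -= 1
        elif op == "JNZ":                    # jump to instruction `value` if reg != 0
            if regs[reg] != 0:
                pc = value
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


# Guest program: add 5 to register A three times
program = [
    ("LOAD", "A", 0),
    ("LOAD", "B", 3),
    ("ADD", "A", 5),   # index 2: loop body
    ("DEC", "B", 0),
    ("JNZ", "B", 2),
    ("HALT", "A", 0),
]
print(run_guest(program))  # {'A': 15, 'B': 0}
```

The point of the sketch is that the guest's registers and program counter are ordinary data on the host; that indirection is what lets software written for one machine run on different hardware, at the cost of speed.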


How? – Emulation/Virtualization

• Emulation is often used to support old hardware devices that require obsolete software (e.g. assembly line management software, scientific instruments, industrial machinery, etc.)

• Emulation is widely used by mobile phone application developers to develop software for phone hardware using desktop PC hardware (i.e. phone hardware is emulated on desktop PCs to build phone-compatible applications)

• Virtualization = emulation but with compatible hardware (some of the host machine’s hardware is used directly by the “virtualized” computer)

• Virtualization bridges the gap between the departure of recently obsolete hardware and the arrival of hardware powerful enough to emulate it


Olive Demo


Execution Fidelity

Ability to precisely reproduce execution

Many moving parts:
• hardware
• operating system
• dynamically linked libraries
• configuration parameters
• language settings
• time zone settings
• …

Very difficult to achieve and then maintain
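One way to make the "many moving parts" tangible is to record some of them. The sketch below, a hypothetical helper built only on the Python standard library, captures a few of the listed parameters (operating system, machine architecture, interpreter version, locale and time zone settings) so that two runs can be compared; any difference flags a setting that could change results. It cannot see everything, such as exact hardware behavior or every dynamically linked library, which is part of why fidelity is so hard to achieve.

```python
import locale
import platform
import time
from typing import Dict, Optional, Tuple


def environment_fingerprint() -> Dict[str, str]:
    """Record a few execution-environment parameters (hypothetical helper)."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "locale": str(locale.getlocale()),
        "time_zone": "/".join(time.tzname),
    }


def differences(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the keys whose values differ between two fingerprints."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    now = environment_fingerprint()
    # Pretend the archived run recorded a different time zone
    archived = dict(now, time_zone="EST/EDT")
    print(differences(now, archived))
```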


Transform into a Scaling Problem

Pack up and carry the entire environment with you (including the OS)

Transitive closure of everything you need

Central idea of a (hardware) virtual machine (VM)
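"Transitive closure of everything you need" can be read as a graph computation: start from the archived item and keep following dependencies until nothing new turns up. The sketch below assumes a hypothetical dependency map (the item names loosely echo the WordStar example earlier and are invented for illustration). A VM sidesteps computing this set precisely by packing up the entire environment, OS included.

```python
from collections import deque
from typing import Dict, List, Set


def transitive_closure(root: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return root plus everything it depends on, directly or indirectly."""
    needed: Set[str] = {root}
    queue = deque([root])
    while queue:  # breadth-first traversal of the dependency graph
        item = queue.popleft()
        for dep in deps.get(item, []):
            if dep not in needed:  # visit each item once, even if there are cycles
                needed.add(dep)
                queue.append(dep)
    return needed


# Hypothetical dependency map for one archived research document
deps = {
    "tree_growth_equation.ws": ["WordStar for DOS", "fonts"],
    "WordStar for DOS": ["Microsoft DOS", "printer drivers"],
    "printer drivers": ["Microsoft DOS"],
    "Microsoft DOS": ["x86 hardware"],
}
closure = transitive_closure("tree_growth_equation.ws", deps)
print(sorted(closure))
```

Missing any single item in this set (a font, a driver) can be enough to change how the content renders, which is the motivation for carrying the whole environment.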


But VMs are Huge!

10 GB VM:
• @ 100 Mbps → at least 800 seconds (13 minutes) to download
• @ 10 Mbps → at least 8000 seconds (over two hours) to download
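The figures on the slide follow from simple arithmetic, assuming decimal units (1 GB = 8,000 megabits) and a link running at its full nominal rate with no protocol overhead; real transfers are slower, hence "at least". The function name below is hypothetical.

```python
def min_download_seconds(size_gb: float, link_mbps: float) -> float:
    """Lower bound on transfer time for size_gb gigabytes over a link_mbps link.

    Assumes decimal units (1 GB = 8,000 megabits) and no protocol overhead.
    """
    megabits = size_gb * 8_000
    return megabits / link_mbps


for mbps in (100, 10):
    secs = min_download_seconds(10, mbps)
    print(f"10 GB @ {mbps} Mbps -> {secs:.0f} s ({secs / 60:.1f} min)")
# 10 GB @ 100 Mbps -> 800 s (13.3 min)
# 10 GB @ 10 Mbps -> 8000 s (133.3 min)
```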

No one will wait that long to look at something briefly!

How do we achieve quick launch?


Internet

Video Streaming


VM Streaming Not So Easy

Access to VM image is not linear

Reference pattern depends on many runtime factors:
• data dependencies
• human interaction
• spatial and temporal locality (program behavior)

Borrow an old idea from operating systems:
• demand paging
• intercept missing VM pieces and fetch over Internet
• prefetching can mask stalls due to demand misses (if hints are good)
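The demand-paging idea can be sketched in a few lines: the VM image is split into fixed-size chunks, a read that touches a missing chunk stalls while that chunk is fetched, and a simple prefetcher also requests the next few chunks on the guess that access is sequential. The class name, chunk size, fetch_chunk callback and "next N chunks" policy are assumptions for illustration, not how VMNetX actually decides what to prefetch; prefetching is also done synchronously here for simplicity, whereas a real client would fetch in the background so it can mask stalls.

```python
from typing import Callable, Dict


class DemandPagedImage:
    """Toy VM image whose fixed-size chunks are fetched only when first needed."""

    def __init__(self, size: int, chunk_size: int,
                 fetch_chunk: Callable[[int], bytes], prefetch: int = 2) -> None:
        self.chunk_size = chunk_size
        self.n_chunks = -(-size // chunk_size)  # ceiling division
        self.fetch_chunk = fetch_chunk          # assumed callback, e.g. an HTTP request
        self.prefetch = prefetch                # hypothetical policy: next N chunks
        self.cache: Dict[int, bytes] = {}
        self.demand_misses = 0                  # reads that stalled on the network

    def _ensure(self, index: int, demand: bool) -> None:
        if 0 <= index < self.n_chunks and index not in self.cache:
            if demand:
                self.demand_misses += 1
            self.cache[index] = self.fetch_chunk(index)

    def read(self, offset: int, length: int) -> bytes:
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for index in range(first, last + 1):
            self._ensure(index, demand=True)    # intercept missing pieces
        for index in range(last + 1, last + 1 + self.prefetch):
            self._ensure(index, demand=False)   # guess that access is sequential
        data = b"".join(self.cache[i] for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + length]


# Usage with an in-memory stand-in for the remote image
remote = bytes(range(256)) * 4                  # 1 KiB "VM image", 16 chunks of 64 bytes


def fetch(index: int) -> bytes:
    return remote[index * 64:(index + 1) * 64]


vm = DemandPagedImage(len(remote), 64, fetch, prefetch=2)
vm.read(0, 10)       # demand miss on chunk 0; chunks 1 and 2 are prefetched
vm.read(64, 100)     # served entirely from prefetched chunks, no stall
print(vm.demand_misses)  # 1
```

The second read shows the payoff the slide describes: when the prefetch hint is right, the guest never waits on the network for those chunks; when access is random, the hints are wasted and every new chunk is a demand miss.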


Olive Implementation


Client Structure

Host environment (e.g. laptop running Linux, with Olive caching):
1. Today’s Hardware (x86)
2. Operating System (Linux) (host OS)
3. VMNetX (demand paging and prefetching of VM state)
4. Virtual Machine Monitor (KVM/QEMU): virtualize host hardware

Guest environment (Virtual Machine, streamed over the Internet from the Olive archive):
5. Hardware emulator (e.g. Basilisk II) (not needed if old hardware was x86)
6. Old Operating System (guest OS) (e.g. Windows 3.1)
7. Old Application (e.g. Great American History Machine)
8. Data file, Script, Simulation Model, etc. (e.g. Excel spreadsheet)
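The numbered layers can also be read as a rule for assembling the stack: the hardware emulator layer (5) is included only when the archived system was not built for x86. The sketch below encodes that rule; the function name, the guest_arch parameter and the example values are hypothetical, borrowed from the examples in the list above.

```python
from typing import List


def client_stack(guest_arch: str, guest_os: str, app: str, content: str) -> List[str]:
    """List the client layers bottom-up (hypothetical helper)."""
    layers = [
        "Today's hardware (x86)",
        "Host OS (Linux)",
        "VMNetX (demand paging and prefetching of VM state)",
        "Virtual machine monitor (KVM/QEMU)",
    ]
    if guest_arch != "x86":
        # Non-x86 hardware must be emulated in software (e.g. Basilisk II for 68k Macs)
        layers.append(f"Hardware emulator for {guest_arch}")
    layers += [f"Guest OS ({guest_os})", f"Old application ({app})", f"Content ({content})"]
    return layers


for layer in client_stack("x86", "Windows 3.1", "Great American History Machine", "Excel spreadsheet"):
    print(layer)
```

For an x86 guest the VMM can virtualize the host CPU directly, so the stack has seven layers; for older non-x86 platforms an extra emulation layer sits inside the VM.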


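Olive Implementation

[Diagram: on a Linux host, the guest OS and guest app run inside the KVM/QEMU VMM; the VM image file is presented through FUSE by the VMNetX client, which keeps a pristine cache and a modified cache and talks to the Olive server (an unmodified web server) via standard HTTP range requests]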
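The cache arrangement in the diagram can be sketched as a copy-on-write overlay: writes from the guest go to the modified cache, reads check the modified cache first, then the pristine cache, and only then fetch the missing chunk from the server with a standard HTTP Range header. Because only plain range requests are needed, an unmodified web server can act as the archive. The class, the chunk size and the injectable get function are assumptions for illustration; the "Range: bytes=first-last" header and the 206 Partial Content reply are standard HTTP.

```python
from typing import Callable, Dict
from urllib.request import Request, urlopen

CHUNK = 128 * 1024  # assumed chunk size in bytes


def range_header(index: int, chunk: int = CHUNK) -> Dict[str, str]:
    """HTTP Range header for one chunk (byte positions are inclusive)."""
    first = index * chunk
    return {"Range": f"bytes={first}-{first + chunk - 1}"}


def http_get(url: str, headers: Dict[str, str]) -> bytes:
    """Fetch a byte range from an ordinary web server."""
    with urlopen(Request(url, headers=headers)) as resp:
        if resp.status != 206:  # 200 would mean the server ignored the Range header
            raise IOError(f"expected 206 Partial Content, got {resp.status}")
        return resp.read()


class OverlayImage:
    """Copy-on-write view of a remote VM image (hypothetical, chunk-granular)."""

    def __init__(self, url: str, get: Callable[[str, Dict[str, str]], bytes] = http_get) -> None:
        self.url = url
        self.get = get
        self.pristine: Dict[int, bytes] = {}  # chunks as downloaded, never changed
        self.modified: Dict[int, bytes] = {}  # chunks the guest has written

    def read_chunk(self, index: int) -> bytes:
        if index in self.modified:
            return self.modified[index]
        if index not in self.pristine:
            self.pristine[index] = self.get(self.url, range_header(index))
        return self.pristine[index]

    def write_chunk(self, index: int, data: bytes) -> None:
        self.modified[index] = data  # the archived copy on the server stays untouched


# Usage with an in-memory stand-in for the web server (no network needed)
def fake_get(url: str, headers: Dict[str, str]) -> bytes:
    first, last = (int(x) for x in headers["Range"][len("bytes="):].split("-"))
    return b"\x00" * (last - first + 1)


img = OverlayImage("https://example.org/vm.img", get=fake_get)  # hypothetical URL
img.write_chunk(3, b"changed")
```

Keeping the two caches separate means a session can always be discarded and restarted from the pristine archived state.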


https://youtu.be/J32NFUIC4m4


Looking Ahead


Many Technical Challenges

Scaling and performance issues:
• VMs keep getting bigger, networks are never fast enough
• clever prefetching techniques

Precise emulation of hardware:
• even x86 extended memory modes not quite right in QEMU (can’t boot Windows 95 in KVM/QEMU)
• exotic hardware platforms
• host compatibility (e.g. CPU flags in x86) vs performance
• hardware performance accelerators (e.g. GPUs)

Multi-VM ensembles (e.g. HPC environments)

Tools for easy building of VMs (physical to virtual?)

Archiving entire cloud services

… many others …

We are a long way from being “done”!


Closing Thoughts

Archiving static content transformed human history

Archiving executable content will be equally transformative

Strong interest from university libraries, philanthropic foundations (e.g. Sloan, Mellon), and national institutions (e.g. National Archives, Library of Congress) to create a public good:

Olive reference library for the nation and the world

Library of Alexandria

I wonder what Isaac’s model would say about this new data?

Reaching back in time

Isaac’s archived VM image

Potential to Transform Scholarship


More information

https://olivearchive.org/