Hi everyone, thanks very much for coming today. And happy Preservation Week!
I’m Lauren Work, and I’m the Digital Preservation Librarian here at UVa. I started in this position in January of this year, so if we have not met yet: hello!
Just as a brief introduction to my place within the library, for those who may not be familiar: I’m part of the small but mighty Content Stewardship group of the library, which is responsible for overseeing the physical and digital preservation of collections, with the goal of making them permanently accessible.
Within this group, I’m part of Preservation Services, and I can be found on the second floor of Alderman Library, in room 202. Please feel free to come say hello.
So, what do I do as the digital preservation librarian here at UVa?
I’m responsible for the sustainable preservation of university digital resources, and “digital resources” runs the gamut from websites, to archival hard drives and floppy disks, to born-digital scholarship.
I personally also see my role as a highly collaborative conduit in the library, connecting and serving digital preservation needs and workflows that range from Special Collections to software development to Collection Access & Description, to scholarly communication and digital humanities, and back again.
So really, much like my coworkers in traditional preservation, I ingest, stabilize, secure, and monitor university resources to make them sustainably accessible. I just get to use my hands a bit less, and stare at screens a bit more, to do so.
I framed this talk as a digital preservation “primer” for a few reasons. First, alliteration is fun!
Second, for those not directly involved in digital preservation work, even within libraries, digital preservation is kind of a broad, sweeping concept.
And short descriptions (such as the one I just gave of my own position) don’t really offer insight into the practice of digital preservation: what does it look like to “do” digital preservation? Today I aim to break these concepts into component parts and give you some use cases to help increase understanding.
Lastly, I framed this talk as a “primer” because I think it’s key to understand the concepts that form the basic building blocks of digital preservation, as ultimately this allows for a better understanding of why and how we approach important digital preservation projects here at UVa.
My goal is for you to leave here today knowing more about digital preservation as a whole, and with an understanding of how the digital preservation work we are undertaking at UVa ties into the community of concepts and best practices surrounding digital preservation.
So. Let’s start at the beginning. What is digital preservation?
The Library of Congress, one of the leaders in the library and preservation field, has this definition:
How many people, based on this definition, would have an idea of what I do all day as your digital preservation librarian?
Here’s a slightly more detailed definition from the Association for Library Collections and Technical Services.
OK, now we’re talking about policy and active management, as well as access to digital content over time: we’re getting closer to what it means to actively engage in digital preservation.
And I want to pause here for a moment to say that I am running us through these definitions both as an introduction to digital preservation as a concept and to provide a quick snapshot of the history of the field.
And I want to provide this history because the mid-to-late ’90s, when this version of the Apple computer was the hottest thing going, was also the time when people in the library and cultural heritage world were really trying to get their heads around what to do with the huge volumes of digital content, both being produced and sitting on obsolete media from scholars and donors on their shelves.
And they were grappling with these definitions and what they meant in terms of policy, standards, stewardship, and technological development.
Strides were first made on the administrative and policy end, as some had argued, successfully, that the biggest impediments to digital preservation were organizational rather than technical.
So approaches from the aughts included the Five Organizational Stages of Digital Preservation from Kenney and McGovern, the Trustworthy Repositories Audit & Certification (TRAC) checklist, and other high-level workflow, policy, and strategy-related work. All very important.
However, much of the progress of these policy- and strategy-related approaches did not address what practitioners, those actually responsible for stewarding, ingesting, and protecting the digital content, should actually be doing to take action on digital preservation.
So, in 2012, the National Digital Stewardship Alliance, an initiative of the National Digital Information Infrastructure and Preservation Program at the Library of Congress, set out to change that with a simple, modular approach to actually doing the work of digital preservation, one that is agnostic to any particular technology platform. It’s meant to help guide both those just beginning digital preservation programs and organizations trying to advance their existing work.
Five major conceptual areas were identified as important areas for action (storage and geographic location, file fixity and data integrity, information security, metadata, and file formats), giving organizations a way to evaluate what they are doing to mitigate the risk of loss and to identify concrete next steps for digital preservation.
This is meant to be a living document, with the understanding that it is a work in progress that will change as the digital preservation field grows and changes, and as technology does the same.
And it helps form the basis of our current approaches to digital preservation, as we’ll see in a minute when we go through use cases here at UVa. But I wanted to provide this brief history to show how the work we do comes out of actionable guidance for practitioners and strategists alike. These approaches are also reflected in the tools and policy that have continued to emerge and develop as people do the work of digital preservation.
So let’s dive into some of the work and planning we’re doing here at UVa.
Scholarship has continued to take many forms over the years, and I could give an entire talk just on preserving the websites, images, databases, audio, social media, and 3D data that now make up components of modern scholarship. And maybe I will in the future!
But today I wanted to focus on our digital preservation work related to one of the most sweeping changes in academia as a whole: electronic theses and dissertations. Many libraries have stopped accepting paper copies for deposit and preservation within the last several years, so preserving digital scholarship for long-term access within an academic repository is of the utmost importance.
Here at UVa we’re currently approaching the preservation of these assets in several ways, but I’ll touch on two today.
The first is at a very granular level: that of PDF/A, the archival form of PDF.
Currently, students submit a plain PDF to Libra, and there are no other format restrictions beyond that. That leaves an opportunity to provide a more stable, standardized form of PDF that will render more reliably in the future.
We’re in the beginning stages, but we are working toward establishing PDF/A as a potential future standard for thesis deposit, as many other academic institutions have done to help better preserve their scholarship.
So what is it? Simply put, PDF/A is an international standard (ISO 19005) for the long-term preservation of PDF documents, and it is widely adopted as a format. It is device-independent, meaning it can be opened in more than one type of reader, and it is entirely self-contained.
Self-contained means that the document itself is structured to contain everything needed for an accurate representation of its contents. In practice, fonts are embedded rather than linked externally, and there is no executable content like JavaScript, no video, and no encryption.
The best way to think about this concept is to recall an experience many of us have had: attempting to open an older document or file that displays broken images, or strange blocks of text where the font should be. PDF/A prevents these problems through self-containment, and gives us a better preservation strategy for our scholarly content.
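To make “self-contained” a little more concrete, here is a rough screening sketch in Python. It is not a PDF/A validator (a dedicated validator such as veraPDF is the right tool for conformance checking); it only scans a file’s raw bytes for the PDF header, for the `pdfaid:part` identifier that PDF/A files declare in their XMP metadata, and for a few features PDF/A prohibits. The function name `screen_pdf` and the choice of patterns are illustrative assumptions, and because many PDFs compress their internal objects, a clean result here proves nothing on its own.

```python
import re

# Features that PDF/A (ISO 19005) prohibits. Spotting these raw tokens is a
# quick hint that a file is not self-contained. Illustrative, not exhaustive.
PROHIBITED = {
    b"/JavaScript": "JavaScript action",
    b"/Launch": "launch action (opens an external file)",
    b"/Encrypt": "encryption dictionary",
}

# PDF/A files declare their part number in XMP metadata, written either as an
# element (<pdfaid:part>1</pdfaid:part>) or an attribute (pdfaid:part="1").
PDFA_PART = re.compile(rb"""pdfaid:part(?:>|=["'])(\d)""")


def screen_pdf(data: bytes) -> dict:
    """Return a rough, byte-level report on a PDF. Not a conformance check."""
    match = PDFA_PART.search(data)
    return {
        "is_pdf": data.startswith(b"%PDF-"),
        "declared_pdfa_part": int(match.group(1)) if match else None,
        "prohibited_features": [
            label for token, label in PROHIBITED.items() if token in data
        ],
    }
```

For a deposited thesis this would be called as `screen_pdf(open("thesis.pdf", "rb").read())`, where the file name is hypothetical. A file that declares no PDF/A part, or that shows prohibited features, is a candidate for conversion before it goes into the repository.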
The second approach to digital preservation is at a higher level, through the Academic Preservation Trust.
UVa is the leader of APTrust, a consortial effort of academic institutions to preserve their digital scholarly content. APTrust pools the resources of its members to build, support, and scale a large, cloud-based repository system. This allows for digital preservation best practices, including geographic distribution, redundancy and recovery of content, fixity checking, storage of metadata, and more.
So these are a few things we’re actively working on for the preservation of digital scholarship.
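Fixity checking, one of the practices listed above, comes down to recomputing a file’s checksum and comparing it with the value recorded at ingest. The sketch below is a minimal illustration of that idea across several geographically distributed copies, not APTrust’s actual implementation; the function names, the choice of SHA-256, and passing replica paths in as a list are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large objects never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_replicas(expected: str, replicas: list[Path]) -> dict[str, bool]:
    """Map each replica path to True if it exists and still matches the ingest checksum."""
    return {
        str(path): path.is_file() and sha256_of(path) == expected
        for path in replicas
    }
```

A `False` entry is the signal to repair that copy from a replica that still verifies. Keeping several copies in different places is what makes that kind of recovery possible.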
What about university or donor digital content that currently exists only on unstable media like floppy disks, CDs, or hard drives? What are we doing to preserve this content?
Pictured is an actual example of a floppy disk from Special Collections.
I am sure many of you have something sitting under your desk or in a closet that you are dying to get rid of, or that you wonder whether we even have the hardware anymore to examine.
Let me introduce you to FRED, our Forensic Recovery of Evidence Device, and an important tool for the digital preservation of legacy media from a wide range of sources at the university.
If that sounds like CSI to you, it’s meant to. The digital preservation field has found itself dovetailing in some ways with crime scene investigation, in that we aim to securely, and without impact, obtain, examine, and stabilize the contents of media, but for the purposes of long-term preservation and provenance, not evidence.
Again, this is similar to my colleagues in traditional preservation: they may receive an item for preservation and notice mold or an insect. They would identify any issues and take standardized steps, such as placing the item in a freezer to stabilize and isolate it. Similarly, I can identify viruses or corrupt files with the FRED and treat them accordingly, to ensure that the content itself is stabilized and that it cannot harm other works, much like isolating a moldy book.
We’ve actually just received our new FRED, and I’m very excited to get it up and running, but I thought we could take a quick spin through what this type of digital preservation looks like when you actually get content off of the floppy disk.
Here is our floppy disk again. On the outside label, it claims to be an “archive” from May to November of 1990.
To review the actual content on this legacy media, we might go through several preservation steps, including:
an inventory and photograph of the item itself;
a careful review of the object for damage that could impact both the preservation of the physical object and our ability to read the media on it for digital ingest;
using hardware to insert the physical disk and connect it to a write-blocked port on the FRED, to ensure we don’t accidentally write to or change any files;
creating the disk image with open-source software called BitCurator;
and running additional checks, such as a virus scan, scans for sensitive information like credit card or Social Security numbers, and checksum creation, so that we can later validate that the disk image has not changed.
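The sensitive-information check in the last step is usually done with dedicated forensic tools (BitCurator bundles bulk_extractor for this kind of scan). As a much simpler, hypothetical sketch of the underlying idea, the Python below flags digit runs that pass the Luhn checksum used by payment card numbers, plus strings shaped like U.S. Social Security numbers. The patterns and the `scan_text` name are illustrative assumptions; they will produce both false positives and misses, so any hit still needs human review.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# U.S. Social Security numbers in the common NNN-NN-NNNN layout.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def scan_text(text: str) -> dict[str, list[str]]:
    """Return possible card numbers (Luhn-valid only) and SSN-shaped strings."""
    cards = [
        m.group()
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(re.sub(r"[ -]", "", m.group()))
    ]
    return {"possible_cards": cards, "possible_ssns": SSN_PATTERN.findall(text)}
```

The Luhn filter is what keeps this from flagging every long number in a document: a random 16-digit string passes only about one time in ten.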
So you may be wondering: why a disk image? For several reasons. A disk image captures every sector of a disk, bit for bit, and preserves the entire file system structure. An image also captures all segments of a disk, including deleted files, versions of files, and unallocated space, which can contain other hidden files. So if we simply copied the files over instead of creating a disk image, we might miss other potentially important information.
So we’ve imaged our disk, and this is what the result looks like. The highlighted file ending in .aff.txt is the information file accompanying our disk image, which is the file ending in .aff right above it. The .aff extension stands for Advanced Forensic Format, an open format for storing disk images and their metadata. What you see in this text file is some of that metadata, including our checksums and some basic technical metadata about the disk.
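Those checksums are what let us confirm, years later, that the image is unchanged. Below is a minimal sketch, assuming the recorded values are MD5 and SHA-1 (a common pairing in forensic imaging metadata) and that we are hashing a raw, sector-by-sector image file. For an .aff container, the recorded values typically describe the underlying disk data rather than the compressed container file, so verification would go through a tool that can read the AFF format. The function names are hypothetical.

```python
import hashlib


def image_digests(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute MD5 and SHA-1 of a raw disk image in a single streaming pass."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def matches_record(path: str, recorded: dict[str, str]) -> bool:
    """True if every checksum recorded at imaging time still matches the file."""
    current = image_digests(path)
    return all(current[alg] == value.strip().lower() for alg, value in recorded.items())
```

Reading the image once and feeding both hash objects avoids a second pass over the file, which matters more for hard drives than for a 1.44 MB floppy.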
And here is a closer view of the documents present on the floppy: you see a PDF, an mbox file, which is an email format, some docs, and something called BibTeX, which is reference management software.
So this type of digital preservation process gets content off of legacy media, where it is at risk and inaccessible, and makes sure that the information is captured, ingested, stabilized, and securely packaged. That in turn allows us to do things like move the digital objects to Special Collections, so they can apply their expertise to the next steps for the collection, and send the digital objects to our repositories.
Changing back to the born-digital part of the preservation puzzle, I want to talk a bit about some of the work we’re doing to help ingest, stabilize, and preserve various forms of communication related to the university, as well as how we are thinking about ways to make these records accessible.
As we’re all well aware, we now have more ways to communicate than ever before. The vast majority of this communication is electronic, and some of it is very ephemeral. I’ll touch on two types of communication we’ve recently been focused on: email and social media, namely Twitter.
We’ll start with email. The vast majority of correspondence, be it from writers who plan to gift their papers and correspondence to Special Collections, university presidents fulfilling their duties, or just a humble digital preservation librarian emailing with their boss many times a day, happens over email.
And we are responsible for the preservation of this content when it comes time for a donor or president to deposit their correspondence. While printing emails may have been desired or necessary in the past, there are now several ways to approach the digital preservation of email messages.
The first, and one of the most basic, is shown on the screen now: downloading your own email archive. This is an example from Gmail. Has anyone done this, even for backup? You can get your calendar, contacts, emails, and so on. This is one way to have people transfer their emails for preservation.
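Gmail’s export (through Google Takeout) delivers mail as an mbox file, the same email format we saw on the floppy disk. Python’s standard-library `mailbox` module can read mbox files, so a first-pass inventory of a donated export might look like the minimal sketch below. The function name `summarize_mbox` and the fields it reports are illustrative choices, not part of any particular preservation tool.

```python
import mailbox


def summarize_mbox(path: str) -> list[dict]:
    """List sender, date, subject, and attachment count for each message in an mbox."""
    box = mailbox.mbox(path, create=False)  # raise instead of creating an empty file
    summary = []
    try:
        for msg in box:
            # walk() visits every MIME part; parts with a filename are attachments.
            attachments = [p.get_filename() for p in msg.walk() if p.get_filename()]
            summary.append({
                "from": msg.get("From", ""),
                "date": msg.get("Date", ""),
                "subject": msg.get("Subject", ""),
                "attachments": len(attachments),
            })
    finally:
        box.close()
    return summary
```

An inventory like this is read-only, so it can run against the deposited file before any appraisal or processing decisions are made.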
But there have also been new developments in the tools available to ensure the secure delivery and preservation of correspondence.
One tool we’ve been working with recently, and for which we are currently developing a test donor use case, is ePADD, an IMLS-funded project out of Stanford: software that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
This tool is great in that it is modular in design, and the different modules (appraisal, processing, discovery, etc.) can be set up in different parts of the library. For instance, as the digital preservation librarian, I may be responsible for the initial ingest in the “appraisal” module, shown here, while Special Collections may house the discovery and delivery modules locally in a reading room.
This modular setup allows me to do things like review email for technical issues, help donors set up their initial ingest, review file types, and ensure content is safe and stable before moving the collection to another module, such as processing.
The amount of data this tool surfaces about contacts, dates, attachments, images, and sensitive information is extremely useful, and it allows you to search, flag content, and view email (including chains of correspondence) to ready the collection for preservation and use.
This is a snapshot of what the processing module looks like in ePADD, from a test I ran on my own email. You can see it’s set up for use in special collections and archives, with the ability to connect the email collection to both a finding aid and a catalog record, assign Collection and Accession IDs, and more.
We’re currently in the testing stages of this tool and working with Special Collections on a use case, so I look forward to hopefully giving another talk on this preservation work in the future.
So what about social media? This is a newer digital preservation puzzle that also highlights issues related to privacy, scale, policy, use, and access.
We have many active student organizations, departments, and events that are related to the history and records of the university, not to mention donors who may be active on social media and wish to deposit those records with the University. How can we preserve this material?
One way, especially when working with the personal archives of an individual, is, as with email, to request their personal Twitter archive for deposit. This gives you the entire history of an account’s tweets, from the beginning of the account, in stable, widely adopted formats that we can ingest into our repositories.
But sometimes we don’t need to preserve the tweets of an individual; rather, we may want to preserve the records surrounding an event, such as the upcoming bicentennial at UVa. In this case, we have one tool currently in use called TAGS, which is open source and freely available for anyone to use with the Twitter API.
This tool allows you to gather tweets around a hashtag in a simple Google spreadsheet format, which you can see here with a test use case I ran around the time of Antonin Scalia’s death earlier this year. This tool is particularly useful in that it captures the immense amount of metadata behind each 140-character tweet, which can be very powerful for preservation and future scholarly use.
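Because TAGS collects tweets into a spreadsheet, the archive can be exported as CSV and examined with ordinary tools. The sketch below counts tweets per account and hashtag frequency. The column names `from_user` and `text` are assumptions about the export layout (check them against your own sheet’s header row), and the function name `summarize_archive` is hypothetical.

```python
import csv
import io
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def summarize_archive(csv_text: str, user_col: str = "from_user",
                      text_col: str = "text") -> tuple[Counter, Counter]:
    """Count tweets per account and hashtag frequency in an exported sheet.

    The default column names are assumptions; adjust them to match the export.
    """
    users, tags = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        users[row[user_col]] += 1
        # Hashtags are case-insensitive on Twitter, so fold them to lowercase.
        tags.update(tag.lower() for tag in HASHTAG.findall(row[text_col]))
    return users, tags
```

Summaries like these are a quick way to check the scope of an event-based capture before it is packaged for preservation.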
We’re also always following the digital preservation and archives community of practice and tool development. An example of a new project we’re following is a Mellon-funded project called Documenting the Now, which aims to address this event-based Twitter capture scenario for scholarly use and preservation. DocNow is geared toward libraries and archives, which now find themselves stewarding and preserving content from social movements and historical events that largely take place online, such as the protests in Ferguson, Missouri, in 2014, around which the grant is based.
The last item I wanted to touch on today is how we’re approaching web archiving at UVa as part of our digital preservation work.
And I thought I’d start by showing off the very first University of Virginia web page captured in the Internet Archive, from 1997. This is our incoming Library Dean John Unsworth’s page at the time.
So if it’s in the Internet Archive, it’s preserved forever, right? Not so much. The wonderful, forward-thinking Internet Archive is many things, but they have explicitly stated that they are not, in fact, a preservation repository, nor do they plan to be. And I’m sure every single one of us, almost daily, stumbles into a 404 error, or goes to a website that suddenly seems to have been taken over entirely by ads. The internet is brutally ephemeral and ever-changing in its structure and content. So how are we approaching the preservation of web-based university content over time?
One of our approaches is through a service some of you may be familiar with called Archive-It. This is a subscription service from the Internet Archive that allows us to scope, test, and run specific crawls on specific websites to preserve selected content at intervals we can set: this may be just a one-time crawl, or an ongoing capture every two weeks.
These crawls are then available for download and preservation as Web ARChive files, or WARCs, an international standard (ISO 28500) for storing web content in an archived state. These files contain both the primary resources, in any format, and secondary content such as metadata, records of duplicate content encountered during the crawl, and more.
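To show what “primary resources plus metadata” means at the byte level, here is a minimal sketch of writing and reading uncompressed WARC/1.0 `resource` records: each record is a version line, named header fields, a blank line, the content block, and two closing CRLFs. It is illustrative only; in practice we rely on the WARCs Archive-It produces, and established libraries such as warcio handle compression, `response` records, and edge cases this sketch ignores. The function names are hypothetical, and the base32 SHA-1 block digest follows a convention common among crawlers rather than a requirement of the format.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone
from typing import BinaryIO, Iterator


def write_resource_record(out: BinaryIO, uri: str, content_type: str, block: bytes) -> None:
    """Append one uncompressed WARC/1.0 'resource' record to a binary stream."""
    sha1_b32 = base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")
    headers = {
        "WARC-Type": "resource",
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI": uri,
        "WARC-Block-Digest": f"sha1:{sha1_b32}",
        "Content-Type": content_type,
        "Content-Length": str(len(block)),
    }
    out.write(b"WARC/1.0\r\n")
    for name, value in headers.items():
        out.write(f"{name}: {value}\r\n".encode("utf-8"))
    out.write(b"\r\n")      # blank line ends the header block
    out.write(block)
    out.write(b"\r\n\r\n")  # every record ends with two CRLFs


def read_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, block) pairs from an uncompressed WARC stream."""
    while stream.readline():  # version line, e.g. b"WARC/1.0\r\n"; empty at EOF
        headers: dict[str, str] = {}
        while line := stream.readline().rstrip(b"\r\n"):
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the closing CRLF CRLF
        yield headers, block
```

Because `Content-Length` tells a reader exactly where each block ends, a WARC can hold any mix of HTML, images, and video without escaping, which is part of why it works well as a preservation container.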
One of the ongoing digital preservation projects we’re working on related to web archives is a consortial effort with two other APTrust members, Penn State and the University of Miami, to create an Archive-It (AIT) to APTrust preservation workflow that will allow us to send these WARC files directly to APTrust for preservation.
So what are we looking to preserve? We’re currently in the nascent stages of our program, but our focus through the testing stage over the last year or so has included capturing digital projects and scholarship, such as the Japanese Text Initiative;
university administration and life, including the Board of Visitors meeting minutes and streaming video, which are only online;
and student organizations and publications, such as the Cavalier Daily.
And that’s not all we’re doing. I mentioned the ephemeral and ever-changing nature of the web. Archive-It is the core of our approach, but as video, audio, and other information on websites continue to shift, grow, and change, we also combine additional web preservation tools with our Archive-It service. These include webrecorder.io, a developing, open-source tool that more easily captures streaming content on websites, as well as tools such as WARCreate, a Google Chrome extension that enables anyone to create Web ARChive (WARC) files from any browsable web page.
There’s plenty of digital preservation work to keep me busy, but I wanted to touch on a few items that we’re continuing to work on, and work toward, with digital preservation at UVa.
The first is the transfer of born-digital content. By this I mean continuously working toward the best, most stable solutions for transferring born-digital content to the library and archives for ingest, preservation, and use, without needing a physical carrier such as a hard drive to do so.
There are developing tools and standards for born-digital file transfer that preserve file structure, allow us to provide metadata, create checksums, and receive notification of delivery, enabling the secure audit and transfer of important digital materials. So we continue to work toward that.
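One widely used standard in this space is BagIt (RFC 8493), which packages files in a `data/` directory alongside manifests of their checksums, so the receiving institution can verify that nothing changed in transit. The sketch below builds and verifies a minimal bag by hand to show the structure. It assumes SHA-256 as the only manifest algorithm, and `make_bag` and `verify_bag` are hypothetical helpers; for real transfers, the Library of Congress’s bagit-python library is a more complete option.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def make_bag(source: Path, bag_dir: Path) -> Path:
    """Copy `source` into a minimal BagIt 1.0 bag with a SHA-256 payload manifest."""
    data_dir = bag_dir / "data"
    shutil.copytree(source, data_dir)  # keeps the donor's original folder structure
    lines, total_bytes, file_count = [], 0, 0
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
        total_bytes += path.stat().st_size
        file_count += 1
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    # Payload-Oxum records total payload bytes and file count for a quick completeness check.
    (bag_dir / "bag-info.txt").write_text(
        f"Bagging-Date: {date.today().isoformat()}\n"
        f"Payload-Oxum: {total_bytes}.{file_count}\n", encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing or no longer match the manifest."""
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            failures.append(rel_path)
    return failures
```

The sender runs `make_bag` before transfer and the library runs `verify_bag` on arrival; an empty list means every file arrived exactly as it was sent.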
The second, related and very important, area of ongoing work is connecting the workflows, procedures, and standards here at UVa. This is a large collaborative effort to connect my digital preservation work to the workflows across the different departments in Special Collections, Collection Access & Description (CAD), and others, to ensure that we’re working together to preserve, describe, and make accessible the great works we have here at UVa.
The last is a bit further out for us as a process currently, but it warrants mentioning as part of striving to achieve the best standards for digital preservation. And that is emulation.
Emulation could provide a strategy for access to content by preserving executable content (such as databases, visualization software, etc.) in its original state and running it on virtual machines. This could replace or supplement other digital preservation strategies for file formats, such as migrating files to newer versions of a format.
The Olive project out of Carnegie Mellon is doing some very interesting work in this area, and we would be interested in pursuing it here at UVa.
In closing, I hope you come away from today knowing a bit more about digital preservation as a whole, and a good deal more about all of our digital preservation work here at UVa.
Thanks.