Archiving Websites Containing Archiving Young Composer …besser.tsoa.nyu.edu/howard/Talks/17archiving-streaming... · 2017-05-16 · 5/16/17 3 Archive-It Issues w/Streaming Media


5/16/17


Archiving Websites Containing Streaming Media

Howard Besser, NYU http://besser.tsoa.nyu.edu/howard/Talks/

Besser - IS&T Archiving 16/5/2017

Archiving Websites Containing Streaming Media

•  Background issues and problems
•  Archiving Young Composer websites
  – Our Technical Collaboration
  – Our Collaboration with Content Creators (preservation, access control, rights, agreements)
  – Workflows
  – How things will look
  – Evaluation

•  Impact beyond this Project


BACKGROUND ISSUES AND PROBLEMS


Web Archiving poses challenges

•  Any given web page may be updated frequently

•  Web links constantly break (404 errors)
•  Few tools/services exist for “Curated” web archiving (Archive-It, CDL’s WAS), and they require significant training/experience to learn, but we do have an internationally accepted format (WARC)
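WARC (ISO 28500) stores each captured HTTP exchange as a record: a version line, named header fields, a blank line, then the content block. As a minimal sketch, assuming WARC/1.0 and a single `response` record wrapping an already-serialized HTTP response, one record could be assembled as below. Real crawlers such as Heritrix also write `warcinfo` and `request` records and payload digests, which this sketch omits.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Build one minimal WARC/1.0 'response' record (no digests, no warcinfo)."""
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts only the content block after the blank line.
        f"Content-Length: {len(http_response)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after its content block.
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    body = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    print(warc_response_record("http://example.com/", body).decode("utf-8"))
```

Records written this way can be concatenated (and usually gzip-compressed per record) to form a `.warc` file.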


Many parameters need to be set for Web Archiving

•  Frequency of crawls
•  Depth of crawls (# of hops)
•  Starting points of crawls (seeds)
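To make the three parameters concrete: seeds are where a crawl starts, depth caps how many link hops from a seed the crawler may go, and frequency decides how often the whole crawl is repeated. The sketch below assumes a hypothetical `CrawlConfig` and an in-memory link graph in place of live fetching, and runs one breadth-first pass limited by hop count. It illustrates the parameters only; it is not how Heritrix or Archive-It implements scoping.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CrawlConfig:
    seeds: list[str]       # starting points of the crawl
    max_hops: int          # depth: link hops allowed away from a seed
    frequency_days: int    # how often the crawl is re-run (unused in a single pass)


def crawl(config: CrawlConfig, links: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first crawl over a link graph; returns each captured URL with its hop count."""
    hops = {seed: 0 for seed in config.seeds}
    queue = deque(config.seeds)
    while queue:
        url = queue.popleft()
        if hops[url] >= config.max_hops:
            continue  # at the depth limit: capture this page but do not follow its links
        for target in links.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops
```

For a hypothetical site `{"/": ["/works", "/bio"], "/works": ["/works/opera"], "/works/opera": ["/works/opera/audio"]}` with `max_hops=2`, the audio page three hops away is never captured, which is one way a too-shallow depth setting silently drops media.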


Other issues for developing good crawls

•  Quality control/assurance
•  Workflows
•  Fidelity to original web pages
•  How end user will navigate and view it
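One common quality-assurance check compares the resources a page requests when rendered against the resources actually present in the capture; anything missing is a candidate for re-crawling. A minimal sketch, assuming both URL lists are already available (for example from a browser's network log and from a WARC index; neither step is shown) and using a hypothetical one-line report format:

```python
def missing_resources(requested: list[str], captured: set[str]) -> list[str]:
    """Return distinct URLs the page requested that are not in the capture, in request order."""
    seen: set[str] = set()
    missing: list[str] = []
    for url in requested:
        if url not in captured and url not in seen:
            seen.add(url)
            missing.append(url)
    return missing


def qa_report(page: str, requested: list[str], captured: set[str]) -> str:
    """Format a one-line QA/QC summary for a single archived page (hypothetical format)."""
    missing = missing_resources(requested, captured)
    status = "OK" if not missing else f"{len(missing)} missing"
    # Count distinct URLs, since a page may request the same resource more than once.
    return f"{page}: {len(set(requested))} requested, {status}"
```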


Archive-It

•  The leading application/service for curated web archiving in North America

•  Run by the Internet Archive, and much more targeted and curated than IA’s Wayback Machine

•  Is based on crawler software developed by IA (Heritrix) in 2003-2004

•  Is very poor at capturing streaming audio or video, and at inserting it properly into a composed web page


Archive-It Issues w/ Streaming Media


Archive-It screenshots generated as part of our project

•  By Lorena Ramirez-López


Archive-It Issues w/ Streaming Media
Firefox version 39.0. Screenshot of Tarik O’Regan’s site taken 2015/10/05


Archive-It Issues w/ Streaming Media
Firefox version 39.0. Screenshot of Ted Hearne’s website taken 2015/10/05


Some sources of streaming issues

•  Problems with capturing resources residing on 3rd-party services (YouTube, Vimeo, SoundCloud)

•  Problems with how faithfully the A/V materials are captured and placed by Archive-It

•  Problems with websites generated through site-building platforms such as Squarespace
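Much of the third-party problem comes from embedded players: the composer's page holds only an `<iframe>` pointing at YouTube, Vimeo, or SoundCloud, while the audio or video sits behind that player on another host. A first diagnostic step can be to list those embeds. The sketch below uses Python's standard `html.parser`; the host list is an illustrative assumption, not the project's classification matrix.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical list of third-party streaming hosts to flag.
THIRD_PARTY_HOSTS = ("youtube.com", "youtube-nocookie.com", "vimeo.com", "soundcloud.com")


class EmbedFinder(HTMLParser):
    """Collect src URLs of <iframe>, <audio>, <video>, and <source> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.embeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "audio", "video", "source"):
            src = dict(attrs).get("src")
            if src:
                self.embeds.append(src)


def classify_embeds(html: str) -> dict[str, list[str]]:
    """Split embedded media URLs into third-party and local (same-site or relative)."""
    finder = EmbedFinder()
    finder.feed(html)
    result: dict[str, list[str]] = {"third_party": [], "local": []}
    for src in finder.embeds:
        host = urlparse(src).hostname or ""  # relative URLs have no hostname
        is_third = any(host == h or host.endswith("." + h) for h in THIRD_PARTY_HOSTS)
        result["third_party" if is_third else "local"].append(src)
    return result
```

Embeds in the `third_party` bucket are the ones whose media a site-scoped crawl is most likely to miss.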


ARCHIVING YOUNG COMPOSER WEBSITES


Archiving Composer Websites
http://www.nyu.edu/about/news-publications/news/2015/03/27/nyu-libraries-to-team-with-internet-archive-to-preserve-high-quality-musical-content-on-the-web.html

•  Collect, preserve, & make available Websites of Composers

•  $480,000 grant from Mellon in 2015 to NYU Library/MIAP/Internet Archive

•  Dealing with the issue that contemporary composer websites go up and down (and also incorporate relationship-building between composer and fans)

•  Addressing the problems of collecting streaming media
•  Also selectively collecting the high-quality versions that are used to generate the streams, and allowing future researchers to see/hear those higher-quality versions


Archiving Composer Websites


•  Develop good and ongoing relationships between Libraries and Composers

•  Develop Trust
  – for developing collections, and continuing to add to them
  – for Policy reasons

•  Examine what types of errors take place
  – how faithfully audiovisual materials are being captured
  – how resources that reside on third-party web services (YouTube, Vimeo, SoundCloud) are (not) displayed within Archive-It’s interface

  – Issues with websites generated through site-building platforms such as Squarespace

•  Find ways to fix those errors

Metrics Accomplished (as of Jan 2017)

•  172 Composer sites crawled, scoped, assessed for quality, & analyzed for problems (feeding into IA development work)

•  800 QA/QC reports generated
•  Initial web archiving agreement from 165 Composers (25 from NPR’s 100)

•  Identified website infrastructures encountered and created a classification matrix


Website Infrastructure encountered


Project Team
•  Jefferson Bailey (Internet Archive)
•  Howard Besser (MIAP)
•  Lori Donovan (Internet Archive)
•  April Hathcock (Lib/ScholComm)
•  Nicole Greenhouse (Lib/ACM)
•  Carol Kassel (Lib/DLTS)
•  Scott Statland (MIAP)
•  Donald Mennerich (Lib/ACM/DLTS)
•  David Millman (Lib/DLTS)
•  Courtney Mumma (Internet Archive)
•  Robin Preiss (Lib/AFC)
•  Lorena Ramirez (MIAP), special thanks!
•  Michael Stoller (Lib/C&RS)
•  Kent Underwood (Lib/AFC)
•  Chela Scott Weber (Lib/AFC)


OUR TECHNICAL COLLABORATION: CRAWLING


NYU/IA Collaboration


Traditional Crawlers


•  Archive-It and other web archives use Heritrix
•  Follow links, capture most web content
•  Less successful with streaming video and dynamic content executed in the browser

•  Umbra helps
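The gap between link-following and browser-based capture can be shown in miniature. An extractor that reads only the HTML as served finds the `href` and `src` values written in the markup, but a player that a script builds at run time never appears there, so it is never requested; only something that executes the page in a browser, as Umbra and Brozzler do, would see it. The sketch below shows just the static half, with made-up URLs; it is not Heritrix's extractor, which also does some speculative link-guessing inside JavaScript.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collect href/src attributes present in the HTML as served (no JavaScript is run)."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


PAGE = """
<a href="/works.html">Works</a>
<div id="player"></div>
<script src="/js/player.js"></script>
<script>
  // At run time this builds an <audio> element and inserts it into #player.
  document.getElementById("player").innerHTML =
      '<audio src="/stream/' + 'piece.mp3"></audio>';
</script>
"""

extractor = StaticLinkExtractor()
extractor.feed(PAGE)
print(extractor.urls)  # the audio stream URL is not among the discovered links
```

Script contents are treated as raw text by `html.parser`, so the `<audio>` element only exists after a browser runs the script.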

BROZZLER!

“browser” | “crawler” = BROZZLER

Logo: Noah Levitt

Brozzler System Architecture v1


OUR COLLABORATION WITH CONTENT CREATORS


Young Composers Corpus

•  Began with NPR’s 2011 list of “100 Composers Under 40”

•  91 of 100 have their own self-contained sites
•  As of 5/2016 have written agreements with 165 Composers (25 of them from NPR’s list)

•  Will recruit 10 of them for enhanced archiving (uncompressed; better than what is on the website)
  – This will require an added appendix to the contract/agreement (which may involve dark archiving and/or restricted access)


Building relationships with Composers

•  Engage them with the idea of preserving their Website

•  Are they willing to give us richer versions of the content on their site?

•  Are they willing to make all (or just part) of the content freely accessible? Do they want to embargo some content in a dark archive?

•  Donor Agreement/Contract


Donor Agreement/Contract

•  Have been working on this with lawyers for approximately one year

•  Have had fairly stable language in it for 6 months, and 2 contracts already signed and returned

•  Defaults to granting us complete rights for reformatting, and to allowing researchers to see/hear all high-quality versions, at minimum on-site
  – Thus far all Composers contacted have agreed to those principles (but not necessarily to the contractual language)


Long Term Preservation for Scholarship


Highest quality; future library processes


Composer indicates restrictions on Access


Who are certified users?
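The access questions raised across these slides (what is freely accessible, what is restricted, what is available at minimum on-site, what stays in a dark archive) reduce to a small policy table. A minimal sketch, assuming four hypothetical restriction levels and a two-flag user model; neither is the project's actual policy or its definition of a certified user.

```python
from dataclasses import dataclass
from enum import Enum


class Restriction(Enum):
    OPEN = "open"            # freely accessible to anyone
    CERTIFIED = "certified"  # certified researchers only, from anywhere
    ON_SITE = "on_site"      # certified researchers, on-site only
    DARK = "dark"            # preserved but not accessible (embargoed)


@dataclass
class User:
    certified: bool  # hypothetical: has the user been certified as a researcher?
    on_site: bool    # hypothetical: is the request coming from an on-site terminal?


def can_access(restriction: Restriction, user: User) -> bool:
    """Decide whether a user may see/hear an item under the composer's chosen restriction."""
    if restriction is Restriction.OPEN:
        return True
    if restriction is Restriction.CERTIFIED:
        return user.certified
    if restriction is Restriction.ON_SITE:
        return user.certified and user.on_site
    return False  # DARK: nobody, until the embargo is lifted
```

Keeping the decision in one function means the composer's choice in the donor agreement maps to a single stored value per item.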


ARCHITECTURE & WORKFLOWS


Architecture & Workflows

•  Full copy of all websites (including richer content) stored in the NYU Repository and accessible through NYU Finding Aids

•  Metadata is in ArchivesSpace
•  Connections built off of the ArchivesSpace back-end API


Current Development work

•  Supplying a separate audio player?
•  Hiring a Digital Archivist
•  Precise forms of navigation between ArchivesSpace, Archive-It, and richer content within NYU’s digital repository

•  API


API
•  What IA needs from NYU API
  – API URL
  – Credentials (username, password) -> Authentication Token()
  – Repository ID
  – Resource ID

•  What IA will return as JSON array
  – Unit Title
  – Creator
  – Date Expression
  – Extent Statement
  – Tech Characteristics
  – [Something based on Access Restriction, i.e. can it be streamed]???

•  We Speak Etruscan, 1993 May 21, 23.5 MB, 1 AIFF file Stereo uncompressed 16 bit/44.1K

•  The Dream of Innocence III, 1998 March 26, 150 MB, 1 AIFF file Stereo uncompressed 16 bit/44.1K
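The API slide lists what IA needs (API URL, credentials exchanged for a token, repository ID, resource ID) and what comes back. The ArchivesSpace backend exposes a login route (`POST /users/{username}/login`, whose JSON reply carries a `session` token that later requests send in the `X-ArchivesSpace-Session` header) and a resource route (`GET /repositories/{repo_id}/resources/{id}`). The sketch below wraps those two calls with Python's standard library. The flattened field names and the `streamable` flag are hypothetical stand-ins for the slide's list, and converting a raw resource record (dates, extents, notes, linked agents) into them is not shown.

```python
import json
import urllib.parse
import urllib.request


def get_session_token(api_url: str, username: str, password: str) -> str:
    """Log in to the ArchivesSpace backend; the JSON reply carries the token as "session"."""
    url = f"{api_url}/users/{urllib.parse.quote(username)}/login"
    data = urllib.parse.urlencode({"password": password}).encode("utf-8")
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)["session"]


def get_resource(api_url: str, token: str, repo_id: int, resource_id: int) -> dict:
    """Fetch one resource record, passing the token in the X-ArchivesSpace-Session header."""
    url = f"{api_url}/repositories/{repo_id}/resources/{resource_id}"
    req = urllib.request.Request(url, headers={"X-ArchivesSpace-Session": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical flattened shape for the fields listed on the slide.
FIELDS = ("unit_title", "creator", "date_expression", "extent_statement",
          "tech_characteristics", "streamable")


def to_json_array(records: list[dict]) -> str:
    """Serialize records as a JSON array, keeping only the known fields (missing ones become null)."""
    return json.dumps([{k: r.get(k) for k in FIELDS} for r in records])


def summary_line(record: dict) -> str:
    """One-line display string in the style of the slide's two examples."""
    parts = (record["unit_title"], record["date_expression"],
             record["extent_statement"], record["tech_characteristics"])
    return ", ".join(parts)
```

Only the two fetch functions touch the network; `to_json_array` and `summary_line` work on already-flattened records, so the display format can be settled before the API mapping is final.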


HOW THINGS MAY LOOK


User Queries

•  User browses through Archive-It
•  User sees that A/V content exists (in some cases it will include richer content, but some of that might be access-restricted)

•  Archive-It hands off user to NYU (either directly to A/V content, or to Finding Aid)


One option for Queries


One option for high quality content

•  On an archived website page listing a composer’s content, the user sees a message that higher quality content is available, with:
  – Access restrictions, if applicable
  – Link to relevant finding aid
  – (looking like the following image)
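The hand-off on the last two slides can be sketched as one function: given what NYU's side reports about an item, build the message shown on the archived page and choose where to send the user. The field names, the example.org placeholder URLs, and the routing rule (open items with a media file go straight to the A/V content, everything else to the finding aid) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RicherContent:
    title: str
    finding_aid_url: str
    media_url: Optional[str]    # direct link to the A/V file, if one exists
    restriction: Optional[str]  # e.g. "on-site only"; None means open access


def handoff(item: RicherContent) -> tuple[str, str]:
    """Return (notice text, URL to hand the user off to) for one archived-page item."""
    notice = f'Higher-quality content is available for "{item.title}".'
    if item.restriction:
        notice += f" Access restrictions: {item.restriction}."
    # Open items with a media link go straight to the A/V content; otherwise, the finding aid.
    if item.restriction is None and item.media_url:
        return notice, item.media_url
    return notice, item.finding_aid_url
```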


EVALUATION


Evaluation for Improvement

•  Composers, and their satisfaction with the ways in which audiences will be able to view archives of their websites

•  Researchers, and whether the content and functionality of these web archives work for them

•  Tweaking what we do in order to better serve Creators and Researchers


Schedule and Methodology for Evaluation

•  Dec 2017: Schedule one-on-one interviews with sets of Composers and Researchers

•  Jan-Mar 2018: One-hour individual sessions with 10 Composers and also with 10 Researchers, having them look at the user interface and conduct queries
  – Composers: Are they satisfied with how audiences will be able to view the archival copies of their websites? Is it better or worse than their own live sites? Are they satisfied with the audio and video placement and quality (as well as options)? Are they content with the Donor Agreement? What changes/improvements might be made to any of these?

  – Researchers: Can they find what they need in the web archive? Is it difficult (clunky) to use? What parts don’t work well or aren’t intuitive? We want to identify what changes in the content, functionality, or navigation features would improve their user experience.

•  Apr-May 2018: Construction of an Evaluation Summary containing the list of improvements/changes that should be made to the Archiving project

•  June-Aug 2018: Implement the changes


IMPACT BEYOND THIS PROJECT


Impact Beyond this Project
•  Archive-It will be able to better handle streaming media, and display it in proper context

•  We will have architectures and workflows for Archive-It to interact with richer local resources (as well as examples of how interaction and navigation can proceed between Archive-It, ArchivesSpace, Finding Aids, and an internal digital repository)

•  Models for interaction between creators and collecting organizations will have been developed (including donor agreements)

•  We will have preserved 100+++ websites of young composers


Archiving Websites Containing Streaming Media

•  http://besser.tsoa.nyu.edu/howard/Talks/
•  http://www.nyu.edu/about/news-publications/news/2015/03/27/nyu-libraries-to-team-with-internet-archive-to-preserve-high-quality-musical-content-on-the-web.html
