Grainne Reilly revision: June 6, 2019 original: Feb 12, 2016
Proposal for Electronic Archiving System (EAS) as Free Open Source Software
EAS at Harvard
EAS is a system that enables ingest, management and basic preservation of email, and also paves the way for access to email. It provides features to identify policy and curatorial issues, e.g. rights management, events tracking etc. EAS does not address the capture of email, nor does it address discovery or email delivery for end users. It focuses on the curation of email in preparation for long term preservation. The project was developed in conjunction with 3 core partners at Harvard University (Schlesinger Library, HU Archives and Countway Library) with 2 additional participants (Harvard Art Museums and GSD Loeb Library). EAS was built to fulfill the needs of the Harvard University partners and is integrated with several other Harvard University systems: AMS, Policy, Wordshack and DRS.
Archiving Email
The lifecycle
1. Collection development
   a. Pre-acquisition appraisal ✗
2. Accessioning
   a. Capture ✗
   b. Normalization ✔
3. Archival Processing
   a. Item-level processing ✔
   b. Bulk processing ✔
   c. Intellectual arrangement ✔
   d. Search capability ✔
   e. Personal/Sensitive information processing ✔
4. Preservation
   a. Packaging ✔
   b. Repository ✔ (via DRS2)
5. Online Discovery ✗
6. Access ✗

✔ - EAS supports
✔ (via DRS2) - EAS supports via DRS2
✗ - EAS does not support

Not every institution will want to follow the entire lifecycle.
The community and the Tools
In June 2015 there was an Archiving Email Symposium hosted by the Library of Congress with over 150 attendees. Attendees included people from the Smithsonian Institution, NARA, Emory University and Stanford University, amongst others. There was interest in tools to help in preserving email. It was also apparent that there is no one tool that covers the entire lifecycle. A combination of tools may help institutions in their efforts to archive email for long term preservation. In fact, many institutions used Aid4Mail and/or Emailchemy to convert email to standard mbox or eml format before using that output as input to the next tool.
Open Sourcing EAS
By open sourcing EAS it is more likely that other institutions will collaborate in making their tools interoperable with EAS. This would be advantageous to Harvard University, where some EAS users have expressed an interest in the use of ePADD as a donor appraisal tool whose output might then be imported into EAS. It would also be advantageous to the community, which lacks a tool like EAS that could be used standalone.
EAS Technology
EAS is written in Java. EASi, the user interface to EAS, is a Java Struts 2 web application that runs in Tomcat. Most of the software used in EAS is open source, with the notable exceptions of the internal LTS software libraries, the Oracle database, and Emailchemy, the software used to convert email from closed, proprietary file formats to the standard EML format. EAS does not provide an API for use by others.
• Tomcat 8
• Java 8
• Struts 2
• Ant
• Gradle
• Maven
• Oracle 12 (commercial)
• Hibernate 5.3.7
• Emailchemy embedded version (commercial)
• Mime4j 0.6
• Solr 4.10.1
• SolrJ
• jQuery 1.8.2
• jQuery UI ThemeRoller
• ajax-solr
• flexigrid
• YUI Grids
• JSP
• CSS
• LTS utilities (proprietary)
• FITS 0.8
• Spring Batch 4.0.1
Since EAS may contain sensitive information, including HRCI, a security architecture was created to protect this data. This security architecture is mainly infrastructure, for example through the use of secure networks and SSH mounting of file systems.
EAS Integration with other Harvard Systems
EAS integrates with several Harvard Library systems, as shown in the diagram above.
AMS (Access Management System)
AMS is an LTS system that provides authentication and basic authorization services for library systems. AMS in turn interacts with HarvardKey and LDAP. It is a web application that makes use of cookies and browser redirects. EAS redirects users' browsers to AMS and inspects the encrypted cookies that AMS creates. EAS makes use of an Access client jar in order to manage this.
Policy
Policy is used for authorization to library systems. EAS makes use of a Policy client jar that is used to perform direct database queries.
Wordshack
Wordshack is the authority control/vocabulary manager for EAS and for DRS2. Wordshack manages admin category, admin flag, email address, person, organization, software and topic terms. These terms are used throughout EAS. Interaction with Wordshack is via a RESTful API; however, for performance reasons, terms are stored locally in the EAS database. EAS makes use of the client jar files provided by Wordshack for interacting with Wordshack.
DRS (Digital Repository Service)
EAS uses DRS version 2 (DRS2) as the long term preservation repository for emails. EAS writes DRS2-specific batches to the file system when pushing items to DRS2 for long term preservation. EAS also interacts with DRS2 via a RESTful API, making use of two client jar files provided by DRS2. This RESTful API is used for several interactions with DRS2, including:
• Collections are created in DRS2 via EASi
• Accounts are retrieved from DRS2 for use in EAS
• Billing codes are retrieved from DRS2 for use in EAS
One code base to serve us all
EAS is to continue to first and foremost serve the Harvard University community. This requires its continued use of LTS-specific systems. To make EAS open source and usable by others outside of Harvard University, it is necessary to disentangle EAS from other LTS systems and from commercial or proprietary software. For the initial release of EAS as OSS we are aiming for a minimum viable product: it will contain core features which will permit it to be deployed and be usable with limited functionality.

It is proposed to manage this through a dependency management build tool and configuration management. There will be one code base from which one of two build versions is produced: the LTS build version and the Open Source build version. The build file for the LTS version will be excluded from the open source GitHub source control repository. Internal LTS jar files should only be used in the Harvard University version of the built system and excluded from the open source dependencies. The open source version should only require open source dependencies and should result in a standalone built system. A later phase will address integration/interoperability with other tools.

EAS makes use of Emailchemy for conversion of emails to the standard EML format. This is a commercial product. It would be beneficial to refactor EAS to permit its use without this commercial product. This would facilitate packaging EAS in a Docker container, since it would be a breach of license to include Emailchemy in a publicly available Docker container.

EAS also makes use of the commercial Oracle database. EAS does not make use of Oracle-specific features and could be configured to also work with the open source PostgreSQL database. This would lower the barrier for adoption of EAS and also permit packaging a prepopulated demo of EAS in a Docker container.
EAS initial refactoring with Anti-corruption layers between Bounded Contexts
An Anti-corruption layer is a concept from Domain Driven Design. In the case of EAS there are several bounded contexts (authentication, access control, controlled vocabulary, collections management etc.) that could benefit from this layer, permitting future implementations to be plugged in more easily. One way of organizing the design of the Anti-corruption layer is as a combination of Facades, Adapters and Translators, along with the communication and transport mechanisms usually needed to talk between systems. Using dependency resolution and configuration management, either the LTS-specific or the default OSS implementations will be available. Configuration will be used to manage feature toggles and feature gates.
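To make the Facade/Adapter/Translator arrangement concrete, here is a minimal Java sketch for the collections management bounded context. All class names (`EasCollection`, `CollectionService`, `ExternalCollectionDto`, `RemoteCollectionAdapter`, `LocalCollectionService`) are hypothetical and do not correspond to existing EAS, DRS2 or LTS classes; the external client is modelled as a plain function so the example runs standalone.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// EAS domain model: the only collection type the rest of EAS sees.
final class EasCollection {
    final String id;
    final String title;

    EasCollection(String id, String title) {
        this.id = Objects.requireNonNull(id);
        this.title = Objects.requireNonNull(title);
    }
}

// Facade: the single entry point EAS code uses for this bounded context.
interface CollectionService {
    EasCollection create(String title);
}

// Hypothetical response shape of an external collections system.
final class ExternalCollectionDto {
    final long collectionId;
    final String displayName;

    ExternalCollectionDto(long collectionId, String displayName) {
        this.collectionId = collectionId;
        this.displayName = displayName;
    }
}

// Translator: maps the external representation onto the EAS domain model.
final class CollectionTranslator {
    private CollectionTranslator() {}

    static EasCollection toDomain(ExternalCollectionDto dto) {
        return new EasCollection("ext-" + dto.collectionId, dto.displayName);
    }
}

// Adapter: implements the facade on top of an external client, so the
// external DTO never leaks beyond this class.
final class RemoteCollectionAdapter implements CollectionService {
    private final Function<String, ExternalCollectionDto> remoteCreate;

    RemoteCollectionAdapter(Function<String, ExternalCollectionDto> remoteCreate) {
        this.remoteCreate = remoteCreate;
    }

    @Override
    public EasCollection create(String title) {
        return CollectionTranslator.toDomain(remoteCreate.apply(title));
    }
}

// Default OSS implementation: minimal information, stored locally only.
final class LocalCollectionService implements CollectionService {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<String, EasCollection> store = new ConcurrentHashMap<>();

    @Override
    public EasCollection create(String title) {
        EasCollection c = new EasCollection("local-" + nextId.getAndIncrement(), title);
        store.put(c.id, c);
        return c;
    }
}
```

Code elsewhere in EAS would depend only on `CollectionService`; which implementation is wired in would be decided by the build and configuration, as described above.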
Example code for use of feature toggle and feature gate:
if (gateClient.isUngated("my.feature.name")) { doFeatureCode(); }
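The `gateClient` in the example is not defined in this document. A minimal sketch of what such a client might look like, assuming gates are read from a `java.util.Properties` source; the class names `PropertiesGateClient` and `FeatureGateExample` and the `feature.<name>=true` key convention are hypothetical.

```java
import java.util.Properties;

// Hypothetical feature-gate client backed by configuration properties.
// A feature is "ungated" (enabled) only when its key is explicitly "true",
// so a missing or misspelled key leaves the feature disabled.
final class PropertiesGateClient {
    private final Properties config;

    PropertiesGateClient(Properties config) {
        this.config = config;
    }

    boolean isUngated(String featureName) {
        return Boolean.parseBoolean(config.getProperty("feature." + featureName, "false"));
    }
}

final class FeatureGateExample {
    private FeatureGateExample() {}

    static String run(PropertiesGateClient gateClient) {
        if (gateClient.isUngated("my.feature.name")) {
            return doFeatureCode();
        }
        return "feature disabled";
    }

    private static String doFeatureCode() {
        return "feature enabled";
    }
}
```

With this approach the LTS and OSS builds could simply ship different property files.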
Future Possibilities
Integration with ePADD
ePADD consists of 4 modules; in order of workflow usage they are: appraisal, processing, discovery and delivery. Mbox files are fed into the appraisal module, and to proceed to the next module it is necessary to export to an archive, an internal, non-standard ePADD artifact. Conceptually, an archive is a collection of indexed messages along with a blob store. This archive then needs to be imported into the next module, and the process repeats for each module. The delivery module does provide the ability to export emails to mbox format, but it may not be lossless. The April 2016 release of ePADD is planned to permit the export of emails to mbox format from the appraisal module; again, it may not be lossless.

The intent of the ePADD appraisal module is for standalone use on a donor's workstation. At Harvard University, it is desirable for donors to be able to use the appraisal module of ePADD and for curators to use the result of that processing in EAS. Some possible approaches for achieving that follow.

With the April 2016 release the donor will be able to export the result of their processing to mbox format, which could then be imported into EAS. EAS could split these mbox files into eml files itself, without the use of Emailchemy (it is easy to identify the start of each new message by the presence of the "From_" line). This would need some mechanism for controlling this. Alternatively, the mbox could be run through some software to produce eml files which could then be imported into EAS. LTS would require that this approach be recorded via events and client agent.

As an alternative, ePADD could provide a client jar file for extracting emails from an archive into mbox or even eml format. EAS could use this to process an ePADD archive. The disadvantage of this approach is that it would only work with Java applications. The ePADD archive contains serialized objects which can only be reliably reconstituted by using the Java language; this limits how portable these archives are.
Potential Roles
• LTS project manager, for managing OSS infrastructure and framework, and for moving EAS to OSS.
• HL project manager, for liaising with the HL community, the external community and HL leadership.
• LTS developers
[Diagram: EAS, DRS, Wordshack, Discovery, Access/Delivery]
Open Source Software Checklist

| Basic Factors | Explanation | Remarks | R | A | C | I |
|---|---|---|---|---|---|---|
| Usefulness | Software should be useful more or less "as is". | OSS version should not include LTS-specific jars. OSS version should not require commercial products. | LTS | ? | ? | ? |
| Interoperability | If the software interoperates with other software tools, the open source project should have well documented, preferably standards-based, interfaces to external code: web services, class interfaces, or other points. | EAS needs to be refactored to provide abstraction/anti-corruption layers where alternate implementations may be plugged in. | LTS | ? | ? | ? |
| License | The software should be released with a license statement, e.g. Apache 2, GPL, LGPL, MIT, BSD, AGPLv3. | Choice is limited by dependency on software with restrictive licenses. If a given dependency is optional, how does that affect the license requirement? | LTS | ? | OGC & HUIT CTO | HL |
| Contributor License Agreement | Many open source projects require this. | | LTS | ? | OGC & HUIT CTO | HL |
| Copyright | At top of each class | | LTS | LTS | OGC & Provost Office & HUIT CTO | HL |
| Patent | Some software includes a patent in addition to the software. | An example is the Facebook React.js library. Investigate HU policies governing this. | LTS | LTS | OGC & Provost Office & HUIT CTO | HL |
| User Documentation | For use by users | | ? | ? | ? | ? |
| Developer Documentation | For use by developers | | LTS | ? | ? | ? |
| Code Documentation | Class level at a minimum | | LTS | ? | ? | ? |
| Source control | GitHub | | LTS | ? | ? | ? |
| Issue Tracking | GitHub | LTS uses Jira. How do we synchronize GitHub issues with internal LTS Jira issues? | LTS | ? | ? | ? |
| Deployment packaging | Should we provide a ready to run implementation? This would enable easier adoption by others. | ePADD provides a packaged version ready for deployment on Windows or Mac. EAS uses some Linux-specific functionality. We could provide a Dockerized version configured for quick setup. | LTS | ? | ? | ? |
| Demo | Should we provide a self-contained demo? | Provide a lot of examples and concentrate on having some really shiny ones to impress users/developers enough to take a closer look. | | | | |
| Contributions | Who should decide what contributions to accept? | Contributions include ideas. | LTS? | HL? | ? | ? |
| Committers | Initially LTS only | | LTS | ? | ? | ? |
| Tests | Do contributors need to provide tests for contributed code: unit tests, integration tests, functional tests? | Initially will only accept ideas and not code. | LTS | ? | ? | ? |
| Documentation | What level of documentation would we require for contributed code? | Initially will not be accepting code. | LTS | ? | ? | ? |
| Support | Need a forum for discussing features, technical issues etc. What forum? | Requires an email list/Google group etc. | LTS | ? | ? | ? |
| Outreach and communications | What forums do we want to post on? What events do we want to present at? | | HL | ? | ? | ? |

R: Responsible, who is assigned to do the work
A: Accountable, who makes the final decision and has ultimate ownership
C: Consulted, who must be consulted before a decision or action is taken
I: Informed, who must be informed that a decision has been taken
HL: Harvard Library
LTS: Library Technology Services
OGC: Office of the General Counsel
Proposed work for Open Sourcing EAS
Miscellaneous: Fields that are mandatory in EAS due to its integration with DRS2 will remain mandatory and will be populated with default values in the OSS version. Future: we may make these mandatory fields configurable.

Item 1: Create a new build process with dependency resolution
Description: The current build process for EAS uses Ant with no dependency resolution. Alternatives are Ivy, Maven or Gradle. First choice is Gradle, second choice is Ivy. Maven is stubbornly "opinionated" and would not accommodate many of our existing LTS projects.
Update: Move to Maven and Docker, and possibly Ansible (ongoing). Need to update the version of Docker/Docker Compose. The LTS change control process has changed and is in flux due to the introduction of Docker and Ansible. Implementation detail: the Java ServiceLoader may be used to facilitate switching implementations of services. Maven can then be used to pull the correct implementation jar into the build.
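A minimal sketch of how `java.util.ServiceLoader` could pick up whichever implementation jar the build puts on the classpath, falling back to an OSS default when none is registered. Only `ServiceLoader` is a real JDK API here; the `DepositorLookup` interface, `DefaultDepositorLookup` and the `Services` helper are hypothetical.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical service interface shared by the LTS and OSS builds.
interface DepositorLookup {
    String name();
}

// Default OSS implementation used when no provider jar is on the classpath.
final class DefaultDepositorLookup implements DepositorLookup {
    @Override
    public String name() {
        return "oss-default";
    }
}

final class Services {
    private Services() {}

    // Returns the first provider registered under META-INF/services/<interface name>,
    // or the supplied fallback if the build did not include any implementation jar.
    static <T> T loadOrDefault(Class<T> type, T fallback) {
        Iterator<T> providers = ServiceLoader.load(type).iterator();
        return providers.hasNext() ? providers.next() : fallback;
    }
}
```

In the LTS build, Maven would add a jar containing, for example, a Policy-backed `DepositorLookup` together with a `META-INF/services/` entry named after the interface's fully qualified name; the OSS build would simply omit that jar and get the default.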
Comments: This enables a customized build for the LTS version versus the OSS version. This must work with the LTS change control process. LTS proprietary jars should be excluded from the OSS build dependencies. Jars from dependent projects (e.g. Hibernate) should be pulled in using dependency management during the build. Question: FITS includes ots.jar, which is a proprietary LTS jar. What is the implication of this? EAS currently uses 93 jar files in addition to those used by FITS and Solr. See Grouper Build/Dependency Management for some reasoning on the choice of a build tool. This is an absolute prerequisite for this project: it is not possible to exclude jar files without it. The build system needs to be set up in order to continue development.
Dependencies: LTS Artifactory instance should be populated with required jars.
Feedback: RS: this is a technical debt project and therefore does not belong in this project but rather in a "technical debt" project.
Proposed phase: Phase 1

Item 2: Abstract out authentication
Description: Abstract out authentication so that it can be configured to:
1. Use AMS for authentication
2. Use authentication information from an XML file
3. Facilitate plugging in of a new authentication mechanism in future
Update: Switch from AMS to CAS. Do include XML/JSON; use JSON Schema for validation of data.
Comments: Authentication is currently closely tied to the user's HUID, which is used throughout the system. The user's email address, which AMS returns via an LDAP lookup, is also used. For security reasons, the LTS version must only work with AMS and a valid HUID. The OSS version should not be configurable to use AMS and should not include the access.jar file. It should fail gracefully if misconfigured.
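A sketch of what the "authentication information from an XML file" option might look like. It assumes the JDK's `Properties.loadFromXML` format, with each entry mapping a user id to `salt:hash` (Base64), and uses PBKDF2 rather than a plain hash for stored passwords. The class name, file layout and iteration count are assumptions; a missing or empty file fails at construction time with a clear message instead of failing obscurely later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import java.util.Properties;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical OSS authenticator. Expected file format (java.util.Properties XML):
// <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
// <properties><entry key="alice">BASE64_SALT:BASE64_HASH</entry></properties>
final class XmlFileAuthenticator {
    private static final int ITERATIONS = 100_000;
    private final Properties users = new Properties();

    XmlFileAuthenticator(InputStream xml) {
        try {
            users.loadFromXML(xml); // also closes the stream
        } catch (IOException e) {
            throw new IllegalStateException("Authentication file is missing or malformed", e);
        }
        if (users.isEmpty()) {
            throw new IllegalStateException("Authentication file defines no users");
        }
    }

    // Returns the user id on success; empty for unknown users or wrong passwords.
    Optional<String> authenticate(String userId, char[] password) {
        String entry = users.getProperty(userId);
        if (entry == null || !entry.contains(":")) {
            return Optional.empty();
        }
        String[] parts = entry.split(":", 2);
        byte[] salt = Base64.getDecoder().decode(parts[0]);
        byte[] expected = Base64.getDecoder().decode(parts[1]);
        // Constant-time comparison of the derived keys.
        return MessageDigest.isEqual(expected, derive(password, salt))
                ? Optional.of(userId) : Optional.empty();
    }

    // Builds a "salt:hash" entry for the XML file, e.g. from an admin tool.
    static String entryFor(char[] password, byte[] salt) {
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(salt) + ":" + b64.encodeToString(derive(password, salt));
    }

    private static byte[] derive(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 is not available", e);
        } finally {
            spec.clearPassword();
        }
    }
}
```

The same interface could later be implemented by CAS or LDAP adapters without the rest of EAS changing.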
Future: Internal database, OAuth, Shibboleth, CAS, LDAP, Active Directory, OpenConnect
Dependencies: (1)
Feedback: AM: implement LDAP for phase 3. GR: depending on feedback from the community, decide on which implementation to use for phase 3.
Proposed phase: Phase 1

Item 3: Abstract out authorization
Description: Abstract out authorization so that it can be configured to:
1. Use Policy for authorization
2. Use authorization information from an XML file
3. Facilitate plugging in of a new authorization mechanism in future
Update: Talk with IAM about making direct API calls to Grouper. Need to set up Grouper groups for EAS; run them by LTS Support. If IAM won't permit direct API calls, stay with using Policy. Still need to use Policy for DRS depositor lookup. For OSS, use authorization information from an XML/JSON file (use JSON Schema for JSON validation).
Comments: Currently the user's HUID is used to look up policy information. For security reasons, the LTS version must only work with Policy and a valid HUID. The OSS version should not be configurable to use Policy. It should fail gracefully if misconfigured.
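A sketch of one way the OSS build could "fail gracefully if misconfigured": only a file-based provider is available, and any other configured provider name (including `policy`) is logged and treated as deny-all, so the system fails closed rather than crashing on a missing Policy client class. The provider names, the user-to-permissions model and all class names are assumptions; parsing the XML/JSON file is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical authorization abstraction.
interface AuthorizationProvider {
    boolean isAuthorized(String userId, String permission);
}

// OSS implementation backed by user -> permissions entries read from a file.
final class FileAuthorizationProvider implements AuthorizationProvider {
    private final Map<String, Set<String>> grants;

    FileAuthorizationProvider(Map<String, Set<String>> grants) {
        this.grants = new HashMap<>(grants);
    }

    @Override
    public boolean isAuthorized(String userId, String permission) {
        return grants.getOrDefault(userId, Collections.emptySet()).contains(permission);
    }
}

final class OssAuthorizationConfig {
    private static final Logger LOG = Logger.getLogger(OssAuthorizationConfig.class.getName());

    private OssAuthorizationConfig() {}

    // Only "file" exists in the OSS build. Anything else, including "policy",
    // is a misconfiguration: report it and deny all access (fail closed).
    static AuthorizationProvider select(String providerName, Map<String, Set<String>> grants) {
        if ("file".equals(providerName)) {
            return new FileAuthorizationProvider(grants);
        }
        LOG.severe("Authorization provider '" + providerName
                + "' is not available in this build; all access will be denied.");
        return (userId, permission) -> false;
    }
}
```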
Future: Internal database, Grouper, LDAP
Dependencies: (1) (2)
Feedback: AM: implement LDAP for phase 3. GR: depending on feedback from the community, decide on which implementation to use for phase 3.
Proposed phase: Phase 1

Item 4: Enable configuration to use PostgreSQL instead of Oracle
Description: EAS is currently configured to use Oracle. It makes no use of Oracle-specific features and could work with PostgreSQL via minor configuration changes, since EAS uses the Hibernate ORM.
Update: Switch all versions to use PostgreSQL (RDS). This will involve work from Sharon's group. It will also involve input from AM/Sharon to ensure it remains at the right security level for HRCI.
Comments: Use of PostgreSQL removes a dependency on a commercial database. This eliminates constraints concerning license restrictions. Use of PostgreSQL:
• Lowers the barrier to adopt EAS (no license to pay)
• Permits the creation of a self-contained, pre-populated EAS demo in a Docker container (it is a breach of the Oracle license to deploy the database in a Docker container)
The LTS version of EAS should continue to use Oracle for performance and operational reasons. The OSS version of EAS should be configurable to use either Oracle or PostgreSQL.
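Because EAS uses Hibernate, switching databases is largely a matter of configuration. A sketch of building the vendor-specific settings; `hibernate.dialect`, `hibernate.connection.driver_class` and `hibernate.connection.url` are standard Hibernate property keys, the dialect class names assume Hibernate 5.3, and the `DatabaseSettings` class and `DatabaseVendor` enum are hypothetical.

```java
import java.util.Properties;

enum DatabaseVendor { ORACLE, POSTGRESQL }

final class DatabaseSettings {
    private DatabaseSettings() {}

    // Builds Hibernate connection settings for the chosen vendor.
    // The JDBC URL comes from deployment configuration, not from code.
    static Properties forVendor(DatabaseVendor vendor, String jdbcUrl) {
        Properties p = new Properties();
        switch (vendor) {
            case ORACLE:
                p.setProperty("hibernate.dialect", "org.hibernate.dialect.Oracle12cDialect");
                p.setProperty("hibernate.connection.driver_class", "oracle.jdbc.OracleDriver");
                break;
            case POSTGRESQL:
                p.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQL95Dialect");
                p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
                break;
            default:
                throw new IllegalArgumentException("Unsupported database vendor: " + vendor);
        }
        p.setProperty("hibernate.connection.url", jdbcUrl);
        return p;
    }
}
```

Entity mappings that use only portable Hibernate features would not need to change; any native SQL queries would still need to be reviewed for PostgreSQL compatibility.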
Dependencies: (1)
Proposed phase: Phase 1

Item 5: Abstract out Accounts (DRS owner codes) and Billing Codes
Description: Owner codes (Accounts) are stored in DRS2, and a local copy is created in the EAS database when enabled for use in EAS. Billing codes are stored in DRS2 and retrieved for use in EAS. Abstract out Accounts and Billing Codes so that EAS can be configured to use:
• Accounts and Billing Codes from DRS2
• Accounts and Billing Codes from an XML file
• Facilitate plugging in of other means of retrieving Accounts and Billing Codes in future
Update: OSS: use XML or JSON configuration; use JSON Schema for validation.
Comments: The LTS version should only work with Accounts and Billing Codes from DRS2. The OSS version should not be configurable to use Accounts and Billing Codes from DRS2.
Future: AM: make it configurable to make accounts and billing codes optional.
Dependencies: (1) (2) (3)
Feedback: RS: should work out the impact on the time it might take to implement bullet 3 above. GR: if we are making it configurable to read this information from an XML file, then we need to create an abstraction layer anyhow. GR: regarding the future option of making this information optional, this information is mandatory in the database and Solr index because it is very important for LTS. Using dummy values from a default XML file will not inconvenience users who do not need this information. The system has not been architected for configuring optionality of database tables/fields, and doing so would require significant work.
Proposed phase: Phase 1
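A sketch of the "Accounts and Billing Codes from an XML file" option using the JDK's DOM parser. The element and attribute names (`accounts`, `account`, `code`, `billingCode`) and the `XmlAccountSource` class are invented for this example and do not reflect any DRS2 format. As the feedback notes, an OSS deployment that does not need this information could ship a default file with dummy values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical file-based source of accounts (owner codes) and their billing codes.
final class XmlAccountSource {
    private XmlAccountSource() {}

    // Parses <accounts><account code="..."><billingCode>...</billingCode></account></accounts>
    // into an ordered map of account code -> billing codes.
    static Map<String, List<String>> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Harden the parser against XML external entity (XXE) attacks.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList accounts = doc.getElementsByTagName("account");
            for (int i = 0; i < accounts.getLength(); i++) {
                Element account = (Element) accounts.item(i);
                List<String> codes = new ArrayList<>();
                NodeList billing = account.getElementsByTagName("billingCode");
                for (int j = 0; j < billing.getLength(); j++) {
                    codes.add(billing.item(j).getTextContent().trim());
                }
                result.put(account.getAttribute("code"), codes);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot read accounts configuration", e);
        }
    }
}
```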
Item 6: Abstract out DRS Collections
Description: Collections are created in DRS2 via the EAS user interface. Minimal collection information is stored in the EAS database and the EAS Solr index. Need the ability to configure EAS to create Collections:
• In DRS2 with minimal information in EAS
• With minimal information only in EAS
Comments: The LTS version should only work with Collections in DRS2. The OSS version should not be configurable to create Collections in DRS2.
Future: A separate project could manage collections. LibraryCloud has a separate project for managing its collections which could be used as a model for a future EAS collections management project. This would require providing an API in EAS for updating the core collection information in the Solr index and the local database.
Dependencies: (1) (2) (3)
Feedback: WG: need to be able to associate items with collections. Agreed: make EAS configurable to only require a title for a collection and not collect any other information on Collections. Then, in future, abstract out creation of collections in other systems. Reduced estimate based upon the above agreement.
Proposed phase: Phase 1

Item 7: Abstract out Wordshack Terms
Description: Enable configuration of EAS to create and use controlled vocabulary terms:
• In Wordshack
• In an XML file
• Facilitate plugging in of other means of managing a controlled vocabulary
Update: Use an updatable XML/JSON file/store (since email addresses are created during import), OR in the OSS version could we just create terms directly in the database? Question: UI for creating terms directly in the database.
Comments: Wordshack terms are intricately tied into the system:
• on the server
• in the user interface (it uses a Wordshack widget in conjunction with a proxy servlet filter)
• in the database
• in the Solr index
The XML/JSON file should be kept simple for the initial release. The LTS version should only work with Wordshack terms. The phase 1 OSS version should not be configurable to work with Wordshack and should not include the Wordshack client jar. The supported email clients are recorded in Wordshack as software terms.
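A sketch of the controlled-vocabulary abstraction with an in-memory default. Because email addresses are created during import, the interface needs a find-or-create operation, not only lookup. The term types mirror the Wordshack term types listed earlier; `ControlledVocabulary`, `InMemoryVocabulary` and `Term` are hypothetical names, and a real OSS implementation would persist terms (to the XML/JSON store or database discussed above) rather than keep them in memory.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

enum TermType { ADMIN_CATEGORY, ADMIN_FLAG, EMAIL_ADDRESS, PERSON, ORGANIZATION, SOFTWARE, TOPIC }

final class Term {
    final long id;
    final TermType type;
    final String value;

    Term(long id, TermType type, String value) {
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

interface ControlledVocabulary {
    Optional<Term> find(TermType type, String value);

    Term findOrCreate(TermType type, String value);
}

final class InMemoryVocabulary implements ControlledVocabulary {
    private final AtomicLong ids = new AtomicLong(1);
    private final Map<String, Term> terms = new ConcurrentHashMap<>();

    // Email addresses are matched case-insensitively here; this is a
    // simplification, since strictly only the domain part is case-insensitive.
    private static String key(TermType type, String value) {
        String v = type == TermType.EMAIL_ADDRESS ? value.toLowerCase(Locale.ROOT) : value;
        return type + "|" + v;
    }

    @Override
    public Optional<Term> find(TermType type, String value) {
        return Optional.ofNullable(terms.get(key(type, value)));
    }

    // computeIfAbsent ensures concurrent imports create each term only once.
    @Override
    public Term findOrCreate(TermType type, String value) {
        Objects.requireNonNull(value, "value");
        return terms.computeIfAbsent(key(type, value), k -> new Term(ids.getAndIncrement(), type, value));
    }
}
```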
Future: Possibly expand to support other controlled vocabularies.
Dependencies: (1) (2) (3)
Feedback: RS: if Wordshack were available as open source, it could be used, and so we might not need to do this work. GR: we do not want to force OSS users to use our implementation of a controlled vocabulary. Also, we do not want to build in a dependency on another project being open sourced in order to open source EAS.
Proposed phase: Phase 1

Item 8: Remove FITS from OSS Version
Description: Enable configuration of EAS to remove FITS.
Comments: FITS is used during import and push to DRS2. During ingest it is only required in order to get file format information. For the OSS version, EAS can get the file format information by issuing the "file" command under Linux (this introduces more non-portable code, but EAS already needs to run under Linux). This should still be configurable, and EAS should fail gracefully if configured to use FITS in the OSS version.
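A sketch of obtaining a MIME type from the Linux `file` command instead of FITS. `--brief` and `--mime-type` are options of the common `file` implementation; the `FileCommandFormatDetector` class is hypothetical. The command is started without a shell, and any failure (command missing, non-zero exit, unexpected output) yields an empty result so the caller can degrade gracefully instead of aborting the import.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;

final class FileCommandFormatDetector {
    // Accept only output shaped like a MIME type ("type/subtype").
    private static final Pattern MIME = Pattern.compile("[\\w.+-]+/[\\w.+-]+");

    private FileCommandFormatDetector() {}

    // Runs `file --brief --mime-type <absolute path>` directly (no shell), so
    // unusual characters in file names are never interpreted as shell syntax.
    static Optional<String> mimeType(Path path) {
        ProcessBuilder pb = new ProcessBuilder(
                "file", "--brief", "--mime-type", path.toAbsolutePath().toString());
        pb.redirectErrorStream(true);
        try {
            Process process = pb.start();
            String firstLine;
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                firstLine = out.readLine();
            }
            int exitCode = process.waitFor();
            if (exitCode != 0 || firstLine == null || !MIME.matcher(firstLine.trim()).matches()) {
                return Optional.empty();
            }
            return Optional.of(firstLine.trim());
        } catch (IOException e) {
            // The `file` command is not installed or could not be started.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```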
Future: See item 9.
Dependencies: (1)
Proposed phase: Phase 1

Item 9: Replace EAS FITS Servlet with OSS FITS Servlet
Description: When EAS was implemented, FITS was included in the project and implemented as a web application (similar to Solr). Since then an open source version of the FITS Servlet has been developed and is almost ready for use. Once the open source version of the FITS Servlet has been released, this should be used by EAS instead of its own implementation.
Update: This has already been implemented, using AM's FITS Docker image. However, the FITS Docker image should be made available on Docker Hub.
Comments: Use of the OSS version of the FITS Servlet will make it easier to keep up to date with the latest version of FITS. This may be managed by dependency resolution. The fits.jar file will still be required by the EAS web application in order to process the output from FITS during import (used in order to populate the file format information for attachments).
Dependencies: OSS version of the FITS Servlet must have been released.
Feedback: Need to align licenses.
Proposed phase: Phase 2
Item 10: Configure Disabling of Push to DRS
Description: Provide the ability to configure EAS to:
• Push to DRS
• Disable Push to DRS
Update: For the OSS version, disable push to DRS but enable package creation; the OSS version must still create a package. It could create a batch identical to a DRS batch, and OSS users could manipulate it themselves to produce what they want. Change descriptors to be less DRS-specific in the OSS version. The LTS version uses OTS, which contains a lot of LTS-specific constants, LTS-specific validation etc. TODO: Tricia and Steve need to establish what is acceptable in the descriptors. Issues with descriptors:
• Contain Wordshack URIs
• Contain URNs
• Contain drsAdmin data (schema)
• Contain hulEventExtension (schema)
For OSS, perhaps simple descriptors should be created using JAXB and not using OTS.
Comments: The RESTful API in item 20 will provide the ability for other projects to pull the information required in order to create a package for preservation. Item 21 would provide the ability to actually create a bag for archiving, making use of the API provided by item 20. Item 11 provides for export of emails and attachments, but not metadata, as an mbox.
Dependencies: (1)
Feedback: RS: need the ability to create a very simple bag. WG: need output, so do need to include the ability to create a preservation package. This could be an appealing deliverable for an IMLS grant. Reduced estimate based on discussions. On further discussion with RS and WG, will not create a bag until we know what would be useful in the bag. Need to discuss this with the community during the workshop.
Proposed phase: Phase 1

Item 11: Add ability to configure EAS to not use Emailchemy
Description: This will remove a dependency on a commercial tool. By removing this dependency it will be possible to package EAS in a Docker container. It also reduces the barrier to use of EASi by removing the necessity to pay for software. EAS should fail early and gracefully if EAS has been configured to not use Emailchemy and a user submits a packet type which can only be processed by Emailchemy.
Comments: Most of the work will be around project build configuration. We do not want to end up with a more onerous deployment in LTS, so we need to make it as automated as possible.
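A sketch of the "fail early and gracefully" check from the description: the packet type is validated against configuration at submission time, before any files are copied or converted. The packet types and their `requiresEmailchemy` flags are illustrative assumptions, not the actual EAS packet types, and the class names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative packet types; which formats truly need Emailchemy is an assumption.
enum PacketType {
    EML(false), MBOX(false), PST(true);

    final boolean requiresEmailchemy;

    PacketType(boolean requiresEmailchemy) {
        this.requiresEmailchemy = requiresEmailchemy;
    }
}

final class UnsupportedPacketException extends RuntimeException {
    UnsupportedPacketException(String message) {
        super(message);
    }
}

final class PacketValidator {
    private final boolean emailchemyEnabled;

    PacketValidator(boolean emailchemyEnabled) {
        this.emailchemyEnabled = emailchemyEnabled;
    }

    // Called at submission time so the user gets an immediate, specific message.
    void validate(PacketType type) {
        if (type.requiresEmailchemy && !emailchemyEnabled) {
            throw new UnsupportedPacketException("Packet type " + type
                    + " requires Emailchemy, which is not enabled in this installation.");
        }
    }

    // The packet types this installation can accept.
    Set<PacketType> supportedTypes() {
        Set<PacketType> supported = EnumSet.noneOf(PacketType.class);
        for (PacketType t : PacketType.values()) {
            if (emailchemyEnabled || !t.requiresEmailchemy) {
                supported.add(t);
            }
        }
        return supported;
    }
}
```

`supportedTypes()` could also be used to limit the packet types the submission UI offers, so the error path is rarely reached.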
Dependencies: (1)
Proposed phase: Phase 1

Item 12: Add handling for eml files
Description: By permitting the submission of eml files in a packet, users will have the option of using whatever tool they like to convert their emails to eml prior to using EAS.
Comments: Should the creator agent be a combination of eml and the tool used to convert to eml? If so, it should be recorded in the controlled vocabulary as a software term.
Dependencies: (1)
Proposed phase: Phase 1

Item 13: Add handling for mbox files by EAS itself
Description: This would permit handling of mbox files without requiring the use of Emailchemy. Many mailboxes can be saved from email servers etc. in mbox format. It appears to be relatively simple to split an mbox file into individual eml files: the start of each new message is identified by the "From_" line (use a regex on /^From / lines).
Comments: EAS should be recorded as the agent in the normalization event.
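A minimal sketch of splitting an mbox file on `From_` lines, as the description suggests. It assumes the writer escaped body lines starting with "From " (as ">From "), so only unescaped `From_` lines begin a message; it does not reverse that escaping, and it drops the `From_` separator, which is not part of the RFC 5322 message. The `MboxSplitter` name is hypothetical, and messages are returned as strings rather than written as eml files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class MboxSplitter {
    // Same test as the /^From / regex in the description.
    private static final Pattern FROM_LINE = Pattern.compile("^From ");

    private MboxSplitter() {}

    // Returns each message (headers and body, CRLF line endings) without its From_ line.
    static List<String> split(Reader mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        try (BufferedReader reader = new BufferedReader(mbox)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (FROM_LINE.matcher(line).find()) {
                    if (current != null) {
                        messages.add(finish(current));
                    }
                    current = new StringBuilder(); // the separator itself is not copied
                } else if (current != null) {
                    current.append(line).append("\r\n");
                }
                // Lines before the first From_ line belong to no message and are skipped.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            messages.add(finish(current));
        }
        return messages;
    }

    // mbox writers put an empty line before each From_ line; drop it from the message.
    private static String finish(StringBuilder message) {
        int n = message.length();
        if (n >= 4 && message.substring(n - 4).equals("\r\n\r\n")) {
            message.setLength(n - 2);
        }
        return message.toString();
    }
}
```

Each returned string could then be saved as an individual eml file, with EAS recorded as the normalization agent.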
Dependencies: (1)
Proposed phase: Phase 1

Item 14: Do a thorough review of libraries used in the EAS OSS version
Description: This is required in order to ensure that we are in compliance with all licenses of libraries used in EAS OSS. Part of this will be to list all the libraries used in the:
• OSS version
• LTS version
This is also a required step in order to set up dependency resolution correctly.
Comments: Can dependency management also handle licenses? We may need to manually include licenses etc.
Proposed phase: Phase 1

Item 15: Do a thorough cleanup of tests
Description: EAS has numerous unit and integration tests which are currently badly organized. These need to be cleaned up. With the refactoring it may make sense to introduce the use of mocks.
Proposed phase: Phase 2

Item 16: Make user interface changes
Description: Use feature request toggles to enable/disable LTS-specific language.
Proposed phase: Phase 1
Item 17: Review for use of public/private/protected/package-level methods
Description: The access modifiers on classes within EAS were not carefully managed. Leaving methods public which should in fact be private can lead to misuse of those methods.
Proposed phase: Phase 2

Item 18: Handle configuration of other jobs
Description: There are several jobs used in EAS. These would need to be configurable (using feature toggles) for the LTS version or the OSS version.
Comments: The LTS version should permit running of these jobs: Loader, Importer, DRS pre archiver, DRS post archiver, DRS packet events archiver, account monthly statistics. The OSS version should not permit running of DRS pre archiver, DRS post archiver and DRS packet events archiver.
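A sketch of gating the jobs per build using the job list above. The enum constants are hypothetical labels for the jobs named in this item, and how EAS actually launches its Spring Batch jobs is not shown; the check would sit in front of whatever launches them.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical labels for the EAS jobs named above; requiresDrs marks DRS2-only jobs.
enum EasJob {
    LOADER(false),
    IMPORTER(false),
    DRS_PRE_ARCHIVER(true),
    DRS_POST_ARCHIVER(true),
    DRS_PACKET_EVENTS_ARCHIVER(true),
    ACCOUNT_MONTHLY_STATISTICS(false);

    final boolean requiresDrs;

    EasJob(boolean requiresDrs) {
        this.requiresDrs = requiresDrs;
    }
}

enum BuildFlavor { LTS, OSS }

final class JobPolicy {
    private JobPolicy() {}

    // LTS permits every job; OSS excludes the jobs that talk to DRS2.
    static Set<EasJob> permittedJobs(BuildFlavor flavor) {
        Set<EasJob> jobs = EnumSet.allOf(EasJob.class);
        if (flavor == BuildFlavor.OSS) {
            jobs.removeIf(job -> job.requiresDrs);
        }
        return jobs;
    }

    // Guard to call before launching a job.
    static void checkRunnable(EasJob job, BuildFlavor flavor) {
        if (!permittedJobs(flavor).contains(job)) {
            throw new IllegalStateException(job + " is not available in the " + flavor + " build");
        }
    }
}
```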
Proposed phase: Phase 1

Item 19: Remove LTS proprietary jars
Description: The util.jar LTS proprietary jar provides functionality that is mostly now available in core Java or in open source libraries. Where possible, the code should be refactored to use these implementations in order to remove reliance on LTS proprietary code. The ldap.jar LTS proprietary jar is not used. Both these jar files should be removed if possible.
Comments: Users of OSS projects need access to the source, so any jar files used in the project should also be open source.
Dependencies: (1)
Proposed phase: Phase 1

Item 20: Implement RESTful API
Description: To make EAS more open for use, it would be beneficial to create a RESTful API.
Comments: This RESTful API could be used by another application to create a bag (see 21). The RESTful API could be used by another application to create and manage collections (see 6). This API must be implemented so that it may be used by external clients via REST and by EAS itself in process.
Dependencies: (1)
Proposed phase: Phase 3

Item 21: Implement LOC Bag creation
Description: Implement creation of a bag which makes use of the in-process API from 20 above. This process should be triggered via the user interface.
Comments: Use https://github.com/LibraryOfCongress/bagit-java to help in bag creation. Question: what should be in the descriptor files? METS seems to not be popular. Need feedback from the community on this. When items are successfully archived to DRS they are deleted from EAS (without generating any delete events). What should happen when a bag is created? Creation of a bag does not mean that the items have been successfully archived.
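To illustrate what a bag minimally contains, here is a JDK-only sketch of the BagIt 1.0 layout (RFC 8493): payload files under `data/`, a `bagit.txt` declaration, and a `manifest-sha256.txt` listing each payload file's checksum and path. The `MinimalBagWriter` class is hypothetical; it writes no `bag-info.txt` or tag manifest, does not percent-encode unusual file names, and leaves the descriptor-file question above open.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class MinimalBagWriter {
    private MinimalBagWriter() {}

    // Writes the payload (relative file name -> content) as a BagIt 1.0 bag rooted at bagRoot.
    static void write(Path bagRoot, Map<String, byte[]> payload) {
        Path root = bagRoot.toAbsolutePath().normalize();
        Path data = root.resolve("data");
        try {
            Files.createDirectories(data);
            StringBuilder manifest = new StringBuilder();
            // Sorted iteration gives a deterministic manifest.
            for (Map.Entry<String, byte[]> entry : new TreeMap<>(payload).entrySet()) {
                Path target = data.resolve(entry.getKey()).normalize();
                if (!target.startsWith(data)) {
                    throw new IllegalArgumentException("Path escapes payload directory: " + entry.getKey());
                }
                Files.createDirectories(target.getParent());
                Files.write(target, entry.getValue());
                // Manifest line: "<checksum> <path relative to the bag root, '/'-separated>"
                String relative = root.relativize(target).toString().replace(File.separatorChar, '/');
                manifest.append(sha256Hex(entry.getValue())).append(' ').append(relative).append('\n');
            }
            Files.write(root.resolve("manifest-sha256.txt"),
                    manifest.toString().getBytes(StandardCharsets.UTF_8));
            Files.write(root.resolve("bagit.txt"),
                    "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String sha256Hex(byte[] content) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

A library such as bagit-java would additionally cover tag manifests and bag verification; the sketch only shows the on-disk structure.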
Dependencies: (18)
Proposed phase: Phase 3

Item 22: Package for deployment
Description: To reduce the barrier to adoption it is desirable to provide a deployable version of EAS.
Comments: EAS uses some "Unix-like" OS-specific commands, and so will not run on Windows (one reason was a bug in the Java File class, which does not handle certain special characters in the file name). EAS could be packaged for Mac using Oracle AppBundler with hdiutil (ePADD does this). It may be best to provide it in a Docker container.
Proposed phase: Phase 1
Proposed Roadmap
Prerequisites/Phase 1
• Item 1: Create a new build process with dependency resolution

Phase 1
• Item 2: Abstract out authentication
• Item 3: Abstract out authorization
• Item 5: Abstract out Accounts (DRS owner codes) and Billing Codes
• Item 6: Abstract out DRS Collections
• Item 7: Abstract out Wordshack Terms
• Item 8: Remove FITS from OSS version
• Item 10: Configure Disabling of Push to DRS
• Item 11: Add ability to configure EAS to not use Emailchemy
• Item 12: Add handling for eml files
• Item 13: Add handling for mbox files by EAS itself
• Item 16: Make user interface changes
• Item 18: Handle configuration of other jobs
• Item 19: Remove LTS proprietary jars
• Item 4: Enable configuration to use PostgreSQL instead of Oracle
• Item 14: Do a thorough review of libraries used in the EAS OSS version
• Item 22: Package for deployment

Phase 2
• Item 15: Do a thorough cleanup of tests
• Item 17: Review for use of public/private/protected/package-level methods
• Item 9: Replace EAS FITS Servlet with OSS FITS Servlet

Phase 3
• Item 20: Implement RESTful API
• Item 21: Implement LOC Bag creation

Phase 4
Details are to be decided by the community. Interoperability is to be as loosely coupled as possible, e.g. via file interchange, RESTful APIs and the like.
• Make EAS interoperable with ePADD
• Make EAS interoperable with BitCurator (redaction)
Resources
Wordshack https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+WordShack
Access Management System https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+Access
Policy Server https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+Policy+Server
DRS2 https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+DRS2
Emailchemy http://www.weirdkid.com/products/emailchemy/
DArcMail http://www.digitalpreservation.gov/meetings/documents/aes15/1_LC_AES_SIA_EmailandCERP_DarcMail_20150602.pdf http://siarchives.si.edu/blog/yes-we%E2%80%99re-still-talking-about-email http://www.history.ncdcr.gov/SHRAB/ar/emailpreservation/mail-account/mail-account_docs.html
Bitcurator http://www.bitcurator.net/
ePADD http://library.stanford.edu/projects/epadd https://github.com/ePADD/epadd https://github.com/ePADD/muse
Lifecycle Tools for Archival Email Stewardship (in progress) https://docs.google.com/spreadsheets/d/1V1N22xnr5e0EbDlZWx58bjYO6rkrMrYH9wGX9-CK8c4/edit?pli=1#gid=986222267
Archiving Email Symposium 2015 http://www.digitalpreservation.gov/meetings/archivingemailsymposium.html
Email related RFCs https://tools.ietf.org/html/rfc5322
Email formats http://www.digitalpreservation.gov/formats/fdd/fdd000388.shtml http://www.digitalpreservation.gov/formats/fdd/fdd000383.shtml
fits http://projects.iq.harvard.edu/fits https://github.com/harvard-lts/fits
Open Source https://wiki.harvard.edu/confluence/display/LibraryTechServices/LTS+Open+Source+Projects, "Introducing the Open Source Maturity Model", "Making an Open Source Project Bloom"
Licenses http://choosealicense.com/licenses/ https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses http://opensource.org/licenses/
Jar files used by EAS
• access.jar (LTS proprietary)
• activation.jar
• antlr-2.7.6.jar
• aopalliance-1.0.jar
• apache-mime4j-0.6.jar
• aspectjrt-1.6.8.jar
• aspectjweaver-1.6.8.jar
• c3p0-0.9.1.jar
• cglib-nodep-2.2.jar
• com.ibm.jbatch-tck-spi-1.0.jar
• commons-cli-1.1.jar
• commons-codec-1.6.jar
• commons-collections-3.1.jar
• commons-configuration-1.5.jar
• commons-fileupload-1.2.1.jar
• commons-httpclient-3.1.jar
• commons-io-2.3.jar
• commons-lang-2.4.jar
• commons-lang3-3.1.jar
• commons-logging-1.1.3.jar
• commons-pool2-2.2.jar
• dom4j-1.6.1.jar
• drs2_services-dto.jar (LTS proprietary)
• drs2_services-util.jar (LTS proprietary)
• easi.jar
• ehcache-1.5.0.jar
• fits.jar
• fluent-hc-4.3.5.jar
• freemarker-2.3.15.jar
• geronimo-stax-api_1.0_spec-1.0.1.jar
• guava-15.0.jar
• hibernate-jpa-2.0-api-1.0.0.Final.jar
• hibernate-testing.jar
• hibernate-tools.jar
• hibernate3.jar
• httpclient-4.3.5.jar
• httpclient-cache-4.3.5.jar
• httpcore-4.3.2.jar
• httpmime-4.3.5.jar
• javassist-3.9.0.GA.jar
• jaxen-core.jar
• jaxen-jdom.jar
• jcl-over-slf4j-1.6.1.jar
• jdom.jar
• jettison-1.1.jar
• jstl.jar
• jta-1.1.jar
• ldap.jar (LTS proprietary)
• log4j-1.2.17.jar
• mail.jar
• mina-core-1.1.7.jar
• noggit-0.5.jar
• ognl-2.7.3.jar
• ojdbc14.jar
• oscache-2.1.jar
• ots.jar (LTS proprietary)
• saxpath.jar
• servlet-api.jar
• slf4j-api-1.7.7.jar
• slf4j-log4j12-1.7.7.jar
• solr-solrj-4.10.1.jar
• spring-aop-3.2.3.RELEASE.jar
• spring-batch-core-2.2.2.RELEASE.jar
• spring-batch-infrastructure-2.2.2.RELEASE.jar
• spring-batch-test-2.2.2.RELEASE.jar
• spring-beans-3.2.3.RELEASE.jar
• spring-context-3.2.3.RELEASE.jar
• spring-context-support-3.2.3.RELEASE.jar
• spring-core-3.2.3.RELEASE.jar
• spring-expression-3.2.3.RELEASE.jar
• spring-jdbc-3.2.0.RELEASE.jar
• spring-orm-3.0.5.RELEASE.jar
• spring-retry-1.0.2.RELEASE.jar
• spring-test-3.2.3.RELEASE.jar
• spring-tx-3.2.3.RELEASE.jar
• standard.jar
• stax2-api-3.0.1.jar
• staxmate-2.0.0.jar
• struts2-core-2.1.8.1.jar
• struts2-json-plugin-2.1.8.1.jar
• swarmcache-1.0RC2.jar
• util.jar (LTS proprietary)
• velocity-1.4.jar
• velocity-tools-generic-1.1.jar
• woodstox-core-lgpl-4.0.7.jar