Cloud Computing for Research & Innovation
Project Directors Group (PDG)

David Fergusson, Francis Crick Institute
Martin Hamilton, Jisc (editor)
Philip Kershaw, STFC, CEDA & JASMIN
Steven Newhouse, EMBL-EBI & ELIXIR
Jacky Pallas, UCL, Farr Institute & eMedLab
Jeremy Yates, STFC DiRAC & SKA
Contents
Executive Summary
Recommendations
1. The UK National E-Infrastructure
2. Cloud Computing within the UK National E-Infrastructure
3. Recommendations
4. Derived Actions
5. Roadmap: A 5 year vision for Cloud in UK Research
Acknowledgements
Annex A: What is the Cloud? A functional view
Annex B: Cloud Computing for Researchers
Annex C: Trust & public cloud
Annex D: NeI and cloud
Executive Summary
Today, research across diverse domains such as physics, engineering, life-science, the environment and social sciences is being driven increasingly by the ability to collect, store and analyse large datasets – so called ‘big data’. The UK’s National E-Infrastructure (NeI) needs to integrate, through high-bandwidth low-latency networks, the computational, data and storage services that researchers need to support their ‘big data’ analysis, so that they can rapidly carry out their world-leading collaborative research programmes.

A key component of the NeI is cloud computing – the elastic, on-demand provisioning of infrastructure, platforms or software – to meet the needs of researchers from both the public and private sectors. Such a hybrid model requires integration of public sector institutional, community and national resources with those available internationally in both the public and private sector.

Given the strategic importance of the NeI, and the growing importance of cloud computing for big data analytics in the research community, members of the e-Infrastructure Project Directors Group (PDG) were asked by the RCUK National e-Infrastructure Group to identify a set of technical and policy recommendations that will improve the accessibility and usability of cloud resources for research, teaching and administration.

This report identifies the major technical and policy issues that are seen to be preventing widespread take-up of cloud services by the UK academic and related community. It provides a 5 year roadmap to investigate these issues and to provide closer integration of public and private sector resources, improving the capability of the UK research community.

David Fergusson, Francis Crick Institute
Jacky Pallas, UCL & eMedLab
Martin Hamilton, Jisc (editor)
Philip Kershaw, STFC, CEDA & JASMIN
Steven Newhouse, EMBL-EBI & ELIXIR
Jeremy Yates, STFC DiRAC & SKA

(We are indebted to a number of colleagues for their contributions to this report – please see the Acknowledgements section for a full list of acknowledgements.)
Recommendations
1. Provide a clear community focus for cloud computing within the UK NeI by establishing a Cloud Computing Working Group that reports directly to the RCUK NeI Group.

A Cloud Special Interest Group (SIG) should also be set up to provide a forum for all members of the UK research community (both consumers and providers) to come together to exchange best practice and experiences. The SIG would be able to provide a grassroots view from across all research domains as to future cloud computing needs, which can then be taken up and turned into a strategy by the Working Group and communicated to the RCUK NeI Group. The Cloud Computing Working Group would be able to establish smaller focused working groups to discuss specific issues as required.

An important task of this group would be to exert influence on the providers of cloud software and scientific software vendors to include the requirements of the UK research community in future releases.

2. Provide a minimal technical integration of the UK NeI resources to promote workload mobility and to reduce technical barriers to entry.

These technical barriers can be reduced by establishing a consistent access model across all UK NeI resources. This should be implemented so as to enable researchers to use a single set of identity credentials to access services at their home institution and to access and move their workloads between local, regional, national and international cloud computing resources as they require. An integrated authentication and authorisation infrastructure (AAI) is needed alongside consistent open interfaces to the resources so as to make workloads mobile between different cloud computing providers. This will foster competition and prevent lock-in.

Considerable investment has already been made in federated AAI initiatives such as Moonshot (which is now available as the Jisc Assent service), but further investment is needed to integrate these AAI technologies into cloud computing platforms and to make NeI resources (and those in the private sector) available.

Virtual research environments have an important role to play in enabling researchers to create software environments tailored to the needs of their application domain and user communities. At the same time, however, there is a need to facilitate the sharing of application environments to enable workloads to be easily deployed on any UK NeI compliant cloud. Container technologies such as Docker are an important enabler for the development of this capability.

3. Equip the research community with the right skills and support to fully exploit UK NeI cloud resources.
Many NeI service providers are finding it challenging to find staff with the right skills to operate their cloud infrastructures and to provide the consultancy necessary for research groups to rapidly and successfully exploit the available cloud resources. RCUK needs to invest in both basic and advanced training for service providers and those working directly with researchers to support their move to cloud computing resources. Investing in staff working in these fields provides them with skills that are potentially very transferable into the private sector as part of normal staff migration. In particular, training needs to be given to those working to support researchers in accessing cloud infrastructures – most likely the IT Support groups at an institution. Recent experience with the deployment of private and community clouds in the UK research community has highlighted the need for the devops role, a person whose skill set bridges the traditional system administration and software developer roles.
4. Make the policy changes needed within RCUK to grow the adoption of cloud computing, and initiate policy actions externally on behalf of the UK cloud computing community.

Cloud computing is acquired on an on-demand basis as required (operating expense) as opposed to through an initial upfront payment (capital expense). RCUK funding models need to adapt to reflect this change for both researchers and community service providers, who may consume commercial cloud resources in a hybrid model. Outside of the public sector, RCUK must continue and develop the activities that have been initiated through Jisc in establishing terms and conditions with commercial cloud providers, to explore how the buying power of the community as a group can make purchasing of these services more effective, efficient and productive.
1. The UK National E-Infrastructure

1.1 Context
The context for the National E-Infrastructure is laid out in A Strategic Vision for UK e-Infrastructure: a roadmap for the development and use of advanced computing, data and networks [1] (Tildesley, 2012).

The main recommendations from the Tildesley report are summarised as follows:

1. Create a ten-year roadmap to define the components of the infrastructure: networks; data and storage; compute; software and algorithms; security and authentication; people and skills.
2. Create secure data and information stores in strategic locations with data-analysis provided through cloud environments, working with open source software.
3. Ensure that important public databases are available to all UK researchers.
4. Provide broad access to the infrastructure for industrial partners, suppliers and Independent Software Vendors (ISVs), as well as the academic community.
5. Assist the development of a portfolio of training modules in computational science, numerical algorithms, grid-computing, parallel programming, cloud computing, data-centric computing, e-science, computer animation and computer graphics.
6. Develop a single coordinating body to drive closer cooperation and enable effective industrial access, while ensuring that UK academe has access to leading edge capability.

Investments by BIS, the Research Councils and HEIs in 2011-12 (£160M), 2012-2013 (£189M) and 2014-15 (£257M) have resulted in core elements of this vision (shown above in bold) being put in place [2].
1.2 Definition
The National E-Infrastructure (NeI) comprises those resources, linked by high bandwidth networks, which provide UK researchers with the computational, data and storage services they need to carry out their world-leading collaborative research programmes. In the future, these resources, which may be located in both the public and private sector, need to be integrated to allow researchers to access much greater capacity and capability than they could possibly locate in their own institutions.

An integrated NeI will allow researchers from all disciplines to:

• Expand the complexity of their simulations/analysis to produce better science.
• Decrease the time needed to obtain these science results.
• Provide a technology platform that will support innovation in both the public and private sector.

[1] https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/32499/12-517-strategic-vision-for-uk-e-infrastructure.pdf
[2] https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/249474/bis-13-1178-e-infrastructure-the-ecosystem-for-innovation-one-year-on.pdf
The RCUK National e-Infrastructure Group (Chair, Morrell), comprised of the 7 research councils, IUK, Jisc and the Met Office, provides strategic oversight of the activities of the NeI Projects through the NeI Project Directors Group. The NeI Project Directors Group (Chair, Yates), which comprises representatives from NeI Projects and Providers (see Annex D) and representatives of Innovate UK, Jisc and the HPC-SIG, is the body charged with integrating the projects and providers in such a way that an authorised researcher can access her/his resources via a simple interface and using only one set of credentials.

There are over 20 Large and Specialist projects that have been set up within the NeI ecosystem, including the Janet6 high bandwidth low latency network, data and compute services for social and economic science, genomics in life sciences and medicine, climate research, particle physics and innovation, and compute services in particle physics. In addition there are over 35 HEIs providing services to their resident academics.

The current state of the National E-Infrastructure is described in The National E-Infrastructure 2014 Survey [3] (Yates & Hamilton, NeI Project Directors Group, 2014) and its future evolution is described in The E-infrastructure Roadmap [4] (Morrell, Chair, RCUK NeI Group, 2014).
1.3 Requirements
In order to establish a researcher-centric infrastructure, the following are necessary:

• The ability for the researcher to discover and access a broad suite of integrated compute, data and storage resources that are accessed by a high-bandwidth low-latency network.
• The development of virtual research environments to more readily enable researchers to transition from the desktop application experience to services deployed on remote infrastructures. This is essential for the long tail of science research to effectively exploit big data.
• A virtual research environment that allows researchers to discover the available and accessible resources, to move data between resources, and to run reproducible and publishable workflows that support open science and open data.
• A common UK Authentication Infrastructure that is interoperable with international identity management infrastructures, so allowing the user to use NeI resources using a single set of identity credentials.
• An Authorisation and Allocation/Accounting Infrastructure that allows research domains and projects to authorise researchers to use appropriate resources, allocate those resources and measure their usage.
• A secure network and storage environment that can offer at-rest and in-flight information assurance to those research projects/communities that have data security concerns.

[3] http://hpc-sig.org/?wpdmdl=492
[4] https://www.epsrc.ac.uk/newsevents/pubs/e-infrastructure-roadmap/
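The authorisation and allocation/accounting requirement listed above can be made a little more concrete with a toy model. What follows is only a sketch under stated assumptions: the `Project` class, its field names and the choice of core-hours as the accounting unit are all hypothetical illustrations, not part of any NeI, Jisc Assent or SAFE interface. Authentication (establishing who the researcher is) is assumed to have happened elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical research project holding a resource allocation."""
    name: str
    allocation_core_hours: float                      # allocation granted to the project
    members: set[str] = field(default_factory=set)    # authorised researcher IDs
    usage_core_hours: dict[str, float] = field(default_factory=dict)

    def authorise(self, researcher_id: str) -> None:
        """Authorisation: the project decides who may use its resources."""
        self.members.add(researcher_id)

    def remaining(self) -> float:
        """Allocation still unspent across all members."""
        return self.allocation_core_hours - sum(self.usage_core_hours.values())

    def record_usage(self, researcher_id: str, core_hours: float) -> None:
        """Accounting: measure usage per researcher against the allocation."""
        if researcher_id not in self.members:
            raise PermissionError(f"{researcher_id} is not authorised on {self.name}")
        if core_hours > self.remaining():
            raise ValueError("request exceeds the project's remaining allocation")
        previous = self.usage_core_hours.get(researcher_id, 0.0)
        self.usage_core_hours[researcher_id] = previous + core_hours


if __name__ == "__main__":
    project = Project(name="demo-genomics", allocation_core_hours=100.0)
    project.authorise("alice")
    project.record_usage("alice", 40.0)
    print(project.remaining())
```

The point of the sketch is the separation of concerns: identity comes from a shared authentication service, while each research domain or project keeps control of who is authorised and how its allocation is consumed.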
2. Cloud Computing within the UK National E-Infrastructure

2.1 Importance of Cloud Computing for Researchers
Cloud computing has enormous potential as an enabling technology for the research community. Computing resources can be provisioned on demand on an as-needed basis and can be made elastic to grow and shrink as a given workload requires. Perhaps most important, it provides a key solution to the challenge of big data by bringing users to the data. In this model, cloud enables the provision of virtual computing environments co-located with centres capable of hosting the huge amounts of data associated with many research activities. These can provide vastly more computing capability than is available in users’ home institutions and avoid the need for the transfer of large data volumes.

Activities are categorised by deployment model (private, public, community or hybrid clouds) and by the service they offer (infrastructure, platform or software) to the user – see Annex A for more details. Many commercial cloud computing services that are now offered to ordinary consumers can benefit members of the research community – see Annex B for more details.
2.2 Current Status
The Cloud Computing Working Group was established in August 2013 following an action from the National e-Infrastructure Project Directors Group. Its role is to foster collaboration and establish best practice for the application of Cloud Computing in the UK research community. The executive summary provided by the working group (Sept 2013) identified a number of areas that need to be addressed:

• There is no obvious entry point for research users; co-ordinated training and guidance is needed.
• A number of centres plan to roll out (or are in the process of rolling out) Private Cloud infrastructures. There is a need to co-ordinate and share experience between groups to establish best practice and avoid fragmentation and duplication.
• There is a need to engage and work with commercial Public Cloud providers and co-ordinate usage with partners in the research community – in terms of technology and policy.
A workshop was organised in November 2013 [5] to bring together members of the research community and Cloud computing experts and practitioners to elicit feedback on the findings of the executive summary. The workshop showed that there is a strong latent interest in the research community to be better informed and guided on how best to make practical use of the technology.

Presentations were co-ordinated around four key themes: use of Public Cloud, deployment of Private Cloud by e-infrastructure facilities, cloud federation and brokering – the bridging together of resources from different Cloud providers.

[5] https://indico.cern.ch/event/281517/timetable/#all
Table 1 lists the main cloud projects funded by UK or EU agencies that are relevant to researchers in the UK. With the increasing cross-border nature of our research collaborations and research projects, the UK has an opportunity to play a leading role in such initiatives and our own domestic clouds must interoperate with these larger cloud infrastructures.

Table 1: A sample of UK and EU funded Cloud Projects

Project | Cloud Technology | Main Function
--- | --- | ---
EGI (EU) | Various | Provide a single federated cloud service from many individual infrastructure providers from across Europe.
EMBL-EBI – Embassy Cloud | VMware | Data Analysis for Bioinformatics.
EUDAT (EU) | OwnCloud | Provide a standard set of services for movement and storage of data built on top of cloud infrastructure
World LHC Compute GRID (STFC) | OpenStack (CMS) | Computational and Data Services for LHC
Square Kilometre Array (STFC) | OpenStack | Computational and Data Services for the SKA
Helix Nebula (EU) | Various Commercial and Open Source | Create a federation of providers and a marketplace for European scientific application domain
JASMIN2 (NERC) | VMware | Data analysis environment for the environmental sciences community
CLIMB (MRC) | OpenStack | Computational and Data Services for Microbial DNA analysis
eMedLab (MRC) | OpenStack | Computational and Data Services for Human DNA and disease analysis
Cambridge-AWS Link | AWS | Test for Hybrid Cloud job submission
NECTAR (Australia) | OpenStack | Australian government funded cloud to support Australian research community [6]
EU T0 | Unknown | Proposal to create a hub of knowledge and expertise to coordinate technological developments to meet the e-Infrastructure needs of different Scientific Communities [7]
European Open Science Cloud | Unknown | Proposal by CERN endorsed by other research laboratories to establish a hybrid IT as a Service to the public research sector in Europe.

[6] https://www.nectar.org.au/research-cloud
2.3 Next Steps
The recent PDG report Imagining the UK National Data Infrastructure [8] suggested that the use of Cloud Technologies is a pre-requisite for the creation of a coherent National Data Infrastructure.

The self-service nature of Cloud and the ability to orchestrate resources make this technology particularly relevant to data driven science.

Cloud is widely seen as the next-generation IT delivery model
• Agile & Flexible
• Utility-based on-demand consumption
• Self-service driving down administrative overhead and maintenance

Public clouds are setting the benchmark of how IT could be delivered to users
• However not all organisations and/or workflows are ready for public cloud

Applications are being written differently today
• More tolerant of failure
• Making use of scale-out architecture

Our data is too large
• Volumes of data are being generated at unprecedented levels
• Most of this data is unstructured

Service requests are too large
• The time to science could get much longer without access to elastic resources
• More and more devices are coming online
• Tablets, phones, laptops, BYOD generation…

Crucially, applications weren’t written to cope with the demand!
• Traditional infrastructure capabilities are being exhausted
• Service uptime, QoS, KPIs and SLAs are slipping

[7] http://www.eu-t0.eu/
[8] https://www.scribd.com/doc/260531862/Imagining-the-UK-National-Data-Infrastructure-Recommendations
2.4 The Current Cloud Picture in the NeI
The fragmented and siloed nature of the NeI at first sight suggests that the resources could be used more effectively with cloud technologies.

In theory workloads could be distributed among the many systems of the NeI.

However several problems present themselves:

• The assets of the NeI are owned by particular research domains and projects.
  o There is no incentive to share resources and no incentive to consolidate resources.
• There is no common identity management system.
• The spare capacity is simply not there in the NeI.
  o Elasticity, which should be a benefit of cloud, is not present in a very full NeI.
• Cloud is a new technology and many of our systems staff and users are unfamiliar with it.
• There is no financial model of the NeI.
  o There is no resource brokering service.
• There is already a community, very hard to size, that makes use of public cloud provision.
  o This suggests the NeI is not working for these people.
Progress, however, has been made:

• AAAI has been invested in with significant buy-in from across the Research Councils.
• Jisc have created relationships with public cloud providers, such as AWS and Google, to create management portals to these services.
• Jisc have partnered with Microsoft [9] and AWS [10] to directly peer the Janet network with their data centres, facilitating the adoption of Hybrid Clouds that mix institutional and cloud provider resources.
• The investments by BIS and the Research Councils have created a sense of coherence and community that did not exist 3 years ago.
  o The creation of Private Clouds in the areas of the Environmental sciences, LHC and SKA data processing, and Bioinformatics.
  o The creation of Software as a Service (SaaS) infrastructures such as the NERC Environmental Workbench and the Environmental genomics – Virtual Desktop.
  o DiRAC and the national service ARCHER already share infrastructures and represent the Physical Sciences and Engineering groupings in the UK.
  o DiRAC and GridPP have already swapped projects between them manually in order to make best use of resources.
  o The co-operation of the Social Scientists, Medical Bioinformatics and Patient Records projects in working with Jisc to produce SafeShare is an example of how difficult projects such as Cloud uptake can be managed in the future.
  o The Infinity Data Centre in Slough has created an environment in which co-location of equipment is now possible and desirable.
  o The leading role of the UK in GÉANT and the EGI should make sure that the UK builds structures that interoperate with and leverage resources made available via EU projects.
  o The improvement to Janet Network speed, latency and security means that systems (and data) can now be distributed. This has been shown to work by GridPP, who can now use the Network as “mobile” storage and are regularly moving PBs a week over Janet.
  o The possible future ability of the Janet Network itself to operate in Self-Service mode is a welcome addition to the Cloud deployment model.
• Jisc and the PDG have surveyed the attitude of the NeI and HEI CIOs to Cloud technologies [11].
• Courses are becoming available to introduce people to the Cloud as an operating system, including ones specifically targeting researchers, such as those run by Microsoft in partnership with the Software Sustainability Institute and Oxford e-Research Centre [12].

[9] https://www.jisc.ac.uk/news/over-18-million-students-and-staff-to-benefit-from-faster-more-secure-cloud-computing-21-may
[10] https://www.jisc.ac.uk/amazon-web-services
[11] https://www.jisc.ac.uk/news/uk-education-divided-in-its-adoption-of-the-cloud-14-jul-2015
[12] http://research.microsoft.com/en-us/projects/azure/training.aspx
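To give a feel for what moving petabytes a week over Janet implies for the network, the short calculation below converts a weekly data volume into the average sustained rate it requires. It is a back-of-envelope sketch only, resting on assumptions that are not in the report: decimal units (1 PB = 10^15 bytes), transfers spread evenly over the whole week, and no allowance for protocol overhead or retransmission. The function name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def sustained_gbit_per_s(petabytes_per_week: float) -> float:
    """Average rate in Gbit/s needed to move the given volume in one week.

    Assumes decimal petabytes (10**15 bytes) and perfectly even utilisation.
    """
    bits = petabytes_per_week * 1e15 * 8
    return bits / SECONDS_PER_WEEK / 1e9


if __name__ == "__main__":
    # One petabyte per week, averaged over the whole week.
    print(f"{sustained_gbit_per_s(1.0):.1f} Gbit/s")  # prints "13.2 Gbit/s"
```

Under these assumptions, each petabyte per week corresponds to roughly 13 Gbit/s of continuous traffic, which is why sustained transfers at this scale depend on a high-bandwidth backbone rather than ordinary campus links.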
3. Recommendations
Although a number of needs have been identified, the members of the Cloud WG have limited resources to address these. A co-ordination role has been proposed to assist the working group chair and committee in organising and facilitating meetings.

3.1 Community Building

• Set up an RCUK NeI Cloud working group and SIG – governance arrangements to be specified. The working group is responsible for the delivery of the (fully resourced) roadmap.
• Liaise with groups in the research community who are rolling out Private Cloud interfaces to their e-infrastructure. Share experience and establish best practice in the use of software frameworks, resourcing for deployment and operations of services.
• Build the community in the UK – present a single face to funders, stakeholders and vendors.
• The working group should identify pilots to be supported – RCUK budget requested for cloud innovation fund.
• Share the knowledge and experience of successful projects in different subject domains to ensure transferability.
• Document examples of the research community’s experiences of using public cloud, providing guidance on what works and what doesn’t. Seed this activity by surveying researchers (recent RCUK grant award PIs?) on their use of public cloud.
3.2 Technical Integration

• If it is to succeed, any technical integration must take its lead from science-driven use cases. Clear focus is needed on which interfaces are required and who is the target user, be they an administrator, software developer or end user researcher.
• Co-ordinate with activities under way to broker or federate resources from multiple Cloud providers and assist e-Infrastructure providers and end-users to best exploit these initiatives. These activities include JANET Cloud, G-Cloud, EGI Federated Cloud Task Force and Helix Nebula.
• Target small amounts of funding to develop UK priorities that aren’t being met elsewhere, seed and influence product evolution.
• Help the UK to make the best use of community and public clouds – e.g. portability of applications and workloads. The purpose of this is to ensure that scientific workloads can be executed on the most efficient and effective infrastructure. The goal is an ecosystem of clouds across domains and providers, supporting inter-disciplinary science.
• Project to explore common cloud management portal – link to AAAI.
• Project to explore emerging container technologies such as Docker as standard/recommended approach to packaging and portability. We need to identify and prioritise use cases for application of this technology.
• Rather than building a single centralised research cloud, look for opportunities to collaborate on efforts already under way in the individual research domains, such as eMedLab and CLIMB, making more effective use of UK infrastructure and public cloud, and sharing/collaborating e.g. around documentation.
• Cost and timing aspects of data storage and egress – e.g. 500TB from AWS would cost roughly $150,000 per annum to store, and $25,000 in egress charges to export from the cloud [13]. Large scale data transfer required to move workloads from e.g. EBI to other facilities requires specialist network infrastructure.
• Access and sharing regime – limitations in Ceph and Swift, as focus has been on high performance. Controls around sharing images are also somewhat absent. How do we act collectively to change this? It will happen eventually by itself, albeit perhaps more slowly than we would like.
• We need to develop the concept of virtual research environments: web-hosted applications that offer easy to use interfaces that can be tailored and customised to meet the needs of users who are more used to point and click desktop applications.
• We should also offer our users the option of virtual desktops. These are simply virtual machines that are generated on demand and allow the users to run their desktop/laptop applications and workflows on much more powerful systems.
• This would help users to transition from the desktop applications that they are more familiar with to cloud hosted applications. This class of user represents our main access problem, and these users come from across the research spectrum.
• There will always be those users who want to log in to a system and submit jobs from the command line. That should not be discouraged.
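The storage and egress figures quoted in the list above (500TB costing roughly $150,000 per annum to store and $25,000 to export) can be reproduced with simple arithmetic. The sketch below is illustrative only: the per-gigabyte rates are back-calculated from the report's round numbers ($0.025 per GB-month and $0.05 per GB), assuming decimal units and a flat rate. They are not quoted from, and should not be read as, current or tiered AWS pricing. The function and parameter names are hypothetical.

```python
GB_PER_TB = 1000  # decimal units assumed


def annual_storage_cost(terabytes: float, usd_per_gb_month: float = 0.025) -> float:
    """Cost of keeping the data stored for twelve months at a flat rate."""
    return terabytes * GB_PER_TB * usd_per_gb_month * 12


def egress_cost(terabytes: float, usd_per_gb: float = 0.05) -> float:
    """One-off cost of exporting the data from the cloud at a flat rate."""
    return terabytes * GB_PER_TB * usd_per_gb


if __name__ == "__main__":
    print(f"storage: ${annual_storage_cost(500):,.0f} per annum")  # storage: $150,000 per annum
    print(f"egress:  ${egress_cost(500):,.0f}")                    # egress:  $25,000
```

Keeping the rates as parameters makes it easy to re-run the estimate with a provider's published prices, or with a brokered community discount of the kind discussed elsewhere in this report.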
3.3 Training and Support

• Training needs to be carefully targeted to the right class of users. End user applications are likely to remain within the scope of a given application domain. However, at the platform and infrastructure tiers there is a need to train and equip administrators and a new class of devops users.
• Support end users in the research community to assist them in how to use services from Cloud providers and co-ordinate with training initiatives through the PDG members.
• For end users, provide a coherent route from desktop application environment through to virtual environments hosted on e-Infrastructure, including community and public cloud.
• A central programme of support akin to ARCHER CSE including training, advice and guidance for exploiting public cloud, working with cloud providers.
• The Research Software Engineering (RSE) community [14] has a pivotal role to play in helping researchers to exploit the potential of cloud technologies for research and innovation – for example by porting common packages and environments to run in the cloud, and packaging with Docker or other suitable container technologies.

[13] http://aws.amazon.com/s3/pricing/
3.4 Policy Issues

• Liaise with Public cloud providers on behalf of e-infrastructure facilities and end-users in the areas of: training – how users can best use public cloud for research, best use of technology to fit with research workloads, policy (SLAs) and funding.
• Further cloud brokerage/peering work by Jisc – building on AWS model with other providers
• Common institutional level management/billing portal, discounts for bulk usage, delegation of budget to PIs, self-service management of researchers by PI/administrator, tools for tracking spending
• Cost characterisation – refreshing the work of the Jisc/EPSRC study from 2011 [15]
• Common approach for use of public cloud in grant proposals (extends to wider NeI?)
• Establish “public cloud as an NeI resource” (Jisc could broker access to this?)
• Policy statement on what data can be put into cloud providers, compliance requirements

[14] http://www.rse.ac.uk/
[15] https://www.epsrc.ac.uk/research/ourportfolio/themes/researchinfrastructure/subthemes/einfrastructure/cloud/
4. Derived Actions

Tables 2 and 3 list suggested actions.

Table 2: Actions for the Future from the Infrastructure perspective.

Action | Owner
--- | ---
Produce a financial model of the NeI. This is the first step for any resource brokering model. | RCUK
Produce a logical map and description of the NeI and its services | Jisc and PDG
Agree AAAI rollout across PDG: (1) Adoption of appropriate Authentication model; (2) Adoption of Authorisation and Accounting Model; (3) Need to Federate with external Access Management Provider | PDG, Jisc, and RCUK
Ask PDG members to suggest possible private clouds and suggest how investments could be better co-ordinated. What can we do to consolidate and build economies of scale to help make ourselves much more elastic? | PDG
Network as a Service models need to be understood and deployed | Jisc
Assess Cloud Brokerage Models | Jisc
Deployment of Network Security services | Jisc
Understand how the NeI can be made elastic by resource pooling and use of Public Cloud | PDG
Ask RCUK to set up a Cloud Working Group to inform and guide the adoption of Cloud in the NeI. | RCUK
Complete NeI to Public Cloud tests at Cambridge. This will provide a prototype for the NeI | Cambridge and AWS
Have a meeting on Cloud Technologies and get consensus on the projects and Cloud Technologies the UK should be supporting. These Projects will include: (1) Suggestions for sensible deployment stacks; (2) Which technologies should we support; (3) NeI interfaces with Cloud Technology – Self Service, and metering; (4) AAAI interfaces with Cloud Technology; (5) Understanding which areas will offer SaaS, PaaS and IaaS; (6) A roadmap for using Cloud for HPC; (7) How can we develop Cloud technologies, particularly in the areas of data management; (8) Can we make Hybridisation easy | PDG
Publish Reports on NeI and the Cloud and submit to RCUK and ELC | PDG
Table 3: Requirements from a researcher point of view

Researcher requirement | Technical Requirement | Technology | Who does this?
--- | --- | --- | ---
Single Sign on | Agreed AAAI and interfaces to Cloud Technology | Jisc Assent for Authentication. Domain provides for Authorisation and accounting | Jisc, RCUK
Knowing where Resources are | “cloud” infrastructures; Looking at queues that I am allowed to look at | VMware, OpenStack, OpenNebula etc | Cloud WG, RCUK
Accessing those resources | Portals | ssh | Jisc, RCUK
Collaborative analysis and processing environments, ability to share data, analysis and results with their peers | Virtual laboratories | | Individual research domains develop with knowledge sharing through the Cloud WG
Building Workflows | Portals | Taverna, Apache Mesos, Apache Spark | Domain Researchers and Software Engineers, Jisc, RCUK
Producing Virtual images and Instances | Access to image and instance creation applications; Portals | e.g. G-Cloud, VMware, OpenStack, OpenNebula, container technologies such as Docker, etc | Domain Private Clouds, Jisc
Running My workflow | Virtualisation; Portals; Batching scheduling software, e.g. qsub | VMware, OpenStack, Apache Spark | Domain
5. Roadmap: A 5 year vision for Cloud in UK Research

Cloud will:

1. Aid the UK economy by allowing UK researchers to do more with less and provide a competitive data infrastructure underpinning UK research, cutting across research sectors and allowing industry to access the same e-infrastructure and its resources. This should increase the productivity of the UK research sector and UK Industry.
2. Enable the sharing of research and knowledge within the UK to improve efficiency and effectiveness, with impact that benefits the UK economy.
3. Provide World Class infrastructure for UK research. Easily accessible cloud infrastructures for everybody.
4. Meet the challenges of data intensive research and the development of data science, particularly the Alan Turing Institute.

This will be done by following a roadmap that will allow our e-infrastructure to make optimal use of cloud technologies to deliver our research services. By understanding which compute workloads (e.g. High Performance Computing, High Throughput Computing, database analysis) could move to the public cloud, and by making “private cloud” facilities compatible with the public cloud, we can allow end users to define their own routes through IT services to achieve their compute goals and pick the appropriate service provider(s).
Roadmap

Year 1
Task | Actions | Stakeholders and Resource
Data Movement | Replicate large scale data from data sources to sinks in a secure fashion, e.g. EBI to eMedLab, EBI to CLIMB | Jisc, eMedLab, CLIMB, EBI. 1 FTE for 3 months
Barriers to using Public Cloud | Explore issues around use of public cloud, e.g. egress charging | Jisc, WG, 1 FTE for 3 months
Brokerage | Brokered community agreement with public cloud providers. This requires a clear, consistent/unified statement from RCUK about how “capital” funds can be used to purchase public cloud capacity. | Jisc, WG, RCUK, 1 FTE for 3 months
Track Existing Activities | UK NeI Cloud projects, EU Projects, Public Cloud usage | Jisc, WG

Year 2
Task | Actions | Stakeholders and Resource
Brokerage Service | Brokered cloud facilities start to become available to researchers | Jisc / 0.3 FTE p.a.
Group based access to resources with single identity | AAAI implementation available to NeI Projects, and an API is developed to allow use by Public Cloud Providers’ AAAI infrastructure. Work is already in progress – SafeShare and the integration of Assent and SAFE. A pilot has already allowed ABFAB (Assent) to interface with OpenStack Keystone. | Jisc, WG, EPCC, ADRC, Farr, eMedLab. 4 FTE per annum in 2015-2016

Year 3
Task | Actions | Stakeholders and Resource
Standard Services Available | General On demand Self Service available as an option for NeI users for Compute and data services | WG, Jisc, RCUK, Research Domains (2-3 FTE for 2015-2016)
 | Management API | WG, Jisc (1 FTE for 1 year)
 | Standardised workflow images (or containers) | Research Domains (Resource TBD)
 | Secure data access within private cloud, including improvements to container security | WG, Jisc (2.5 FTE p.a.)

Year 4
Task | Actions | Stakeholders and Resource
Cloud Federation | Federated data access between private clouds | Jisc, WG, RCUK. 2 FTE for 2016-17

Year 5
Task | Actions | Stakeholders and Resource
Hybrid Model Available | Hybrid model that lets users exploit the best of community specific private cloud and public cloud resources, with seamless/efficient migration of workload and data between federated clouds. Expectation that containers will play a major role in this. | Jisc, WG, RCUK. 2 FTE for 2017-18
Acknowledgements

We would like to thank the following people for giving up their time to provide input into this report:

Neelofer Banglawala, EPCC
Stephen Booth, EPCC
Brendan Bouffler, Amazon Web Services
Craig Box, Google
Peter Boyle, University of Edinburgh/DiRAC
Peter Braam, University of Cambridge/SKA
Steven Bryen, Amazon Web Services
David Britton, University of Glasgow/GridPP
David Chadwick, University of Kent
Tim Chown, Jisc
Jeremy Coles, CERN
Ian Collier, STFC
David Colling, Imperial College
Tim Cutts, Wellcome Trust Sanger Institute
Shaun de Witt, STFC
Ruediger Dorn, Microsoft
Matthew Dovey, Jisc
Tom Eddington, National Oceanography Centre
William Florance, Google
Paul Fretter, Norwich BioScience Institutes
Andy Grant, Atos
Scott Hamilton, Amazon Web Services
James Hetherington, UCL/Software Sustainability Institute
Terry Hewitt, STFC Hartree Centre
Adam Huffman, Francis Crick Institute
Laurence Hurst, University of Birmingham
Ed Jackson, Amazon Web Services
Claire Jenner, UCL
Jens Jensen, STFC
Owain Kenway, UCL
Iain Larmour, EPSRC
George Leaver, University of Manchester
Peter Maccallum, Cancer Research UK Cambridge Institute
Simon McIntosh-Smith, University of Bristol
Matt McNeill, Google
Aleks Nenadic, University of Manchester/Taverna
Dan Perry, Jisc
Robin Pinning, University of Manchester/N8
Alan Real, University of Leeds/HPC-SIG/N8
Barak Regev, Google
Thom Rödde, Google
Mark Rowlands, Amazon Web Services
Andy Richards, University of Oxford/OERC/SES5
David Salmon, Jisc
Jeremy Sharp, Jisc
Rhys Smith, Jisc
Kenji Takeda, Microsoft
John Taylor, University of Cambridge
Andy Turner, EPCC
Simon Thompson, University of Birmingham/CLIMB
Ash Vadgama, AWE
Annex A: What is the Cloud? A functional view

There is widespread misunderstanding about the term ‘Cloud computing’, leading at times to accusations of hype and at other times to conflation with established computing paradigms such as Grid Computing. An understanding of the concepts is essential in order to make informed choices about how it can best be exploited and in what ways it differs from or complements what has gone before.

Cloud computing powers the services of Internet giants like Microsoft, Google and Amazon. That technology is now available to institutions, to learners and to researchers. This is hugely empowering, for example by extending the reach of an individual far beyond what would historically have been possible - for a few tens of dollars a junior researcher, undergraduate or citizen scientist can create the equivalent of a million pound supercomputer for a few hours to carry out a calculation.

Cloud computing is a portmanteau term encompassing everything from infrastructure as a service (essentially renting someone else’s server equipment) through to software as a service (typically websites that someone else runs for you). In the middle, there is a platform tier that provides the micro-services that power the likes of Android and iPhone apps, and also many web delivered services. It is important to take an informed view of cloud services, particularly where risk management, switching costs and sustainability are concerned. At the same time, there is a significant amount of fear, uncertainty and doubt about cloud computing, alongside some genuine concerns.

Today’s concept of cloud computing grew out of two parallel activities that eventually converged: commercial hosting of servers and storage in professionally run data centres, and two key realisations by leading Internet firms. These were that a) they could rent out capacity that would otherwise be spare, and b) opening up their application programming interfaces (APIs) to third party developers would make it possible to create a vibrant ecosystem of applications that no one company could likely construct on its own. It is also commonplace for mobile apps to be built upon a substrate of cloud services, even if this is not readily apparent to the end user.

In the experience of the authors of this report, the US National Institute of Standards and Technology (NIST) Definition of Cloud Computing [16] document provides an excellent starting point for understanding this technology. For its definition it sets out three key areas:

• Essential Characteristics: what are the properties that allow us to say one system is a cloud where another is not? This is key to understanding the capabilities of a given system and the differentiating factors that make cloud technology unique.

[16] http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
• Service Models: what services does a cloud offer and to whom? The three models described can be considered as successive abstractions above the underlying computer hardware upon which a cloud runs. At the lower levels, cloud computing interfaces provide administrators with a powerful capability to deploy whole virtual computing infrastructures. At the opposite end of the scale, end users using an application may have little or no awareness that it is cloud-hosted.

• Deployment Models: to many, ‘cloud’ may be synonymous with large public cloud providers, but a cloud may be deployed in any of a number of deployment models including, for example, a private cloud hosted for the use of a single organisation. The deployment model, or combination of deployment models, is a critical consideration for getting the most from cloud computing technology for any given research project or programme.

In the following, all text quoted in italics is taken verbatim from the NIST document.

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

This cloud model is composed of five essential characteristics, three service models, and four deployment models.
A.1 Essential Characteristics

On-demand self-service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

Broad network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Resource pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacentre). Examples of resources include storage, processing, memory, and network bandwidth.

Rapid elasticity: Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Typically this is done on a pay-per-use or charge-per-use basis. Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
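As an illustration of the measured service characteristic, the following Python sketch shows one way usage could be metered and charged on a pay-per-use basis. It is a minimal sketch, not any provider's billing system: the `UsageMeter` class, the resource names and the unit prices are all hypothetical and chosen purely for the example.

```python
from collections import defaultdict

# Hypothetical unit prices (currency units per unit of usage).
# These are illustrative placeholders, not any real provider's tariff.
UNIT_PRICES = {
    "cpu_hours": 0.05,
    "storage_gb_months": 0.02,
    "egress_gb": 0.09,
}


class UsageMeter:
    """Accumulates metered usage per consumer and per resource type."""

    def __init__(self, unit_prices):
        self.unit_prices = dict(unit_prices)
        # usage[consumer][resource] -> total quantity recorded so far
        self.usage = defaultdict(lambda: defaultdict(float))

    def record(self, consumer, resource, quantity):
        """Record one usage event; only priced resources can be metered."""
        if resource not in self.unit_prices:
            raise KeyError(f"unmetered resource: {resource}")
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        self.usage[consumer][resource] += quantity

    def report(self, consumer):
        """Per-resource usage for one consumer (the 'reported' part)."""
        return dict(self.usage[consumer])

    def charge(self, consumer):
        """Total pay-per-use charge for one consumer."""
        return sum(
            quantity * self.unit_prices[resource]
            for resource, quantity in self.usage[consumer].items()
        )
```

The same usage record is visible to both parties through `report`, which is the transparency the NIST text refers to; the charge is simply the usage multiplied by the agreed unit prices.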
A.2 Service Models

Figure 1. Cloud computing tiers (CC-BY Flickr user Phil Wolff)

Software as a Service (SaaS): The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

SaaS is exemplified in the commercial sector by the likes of the Microsoft Office 365 [17] and Google Apps for Education [18] communications and collaboration suites, and the SalesForce.com [19] Customer Relationship Management (CRM) system. Software vendors are increasingly moving to delivering applications through cloud computing as this reduces the friction of taking up the product - and, more cynically, helps them to retain customers, exploit customers’ data, and upsell customers to other products and services. In the research sector, there is a long history of web-based applications and services to support distributed user communities. These could be classified under the category of SaaS. In recent years, a number of initiatives have grown around the concept of Virtual Research Laboratories: shared workspaces in which scientific users can collaborate, hosted on a cloud infrastructure to give access to greater computing capacity and storage than would otherwise be possible with a desktop application. Examples are the CSIRO Virtual Laboratories hosted on the Australian research cloud, Nectar [20].

[17] http://products.office.com/en-gb/business/Office
[18] https://www.google.com/edu/
[19] http://www.salesforce.com/uk/products/
[20] https://www.nectar.org.au/virtual-laboratories-1
Platform as a Service (PaaS): The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

This is essentially a kit of parts that can be used by developers to simplify the process of building and deploying applications. Examples include facilities for reliably hosting applications at scale that have been written in common programming languages, such as Amazon Elastic Beanstalk [21], Microsoft Azure Web Sites [22] or Google App Engine [23]. Providers often expose programming interfaces into their own software and services, such as the Google Maps API [24], which is widely used to integrate Google Maps into third party sites and services. For research infrastructures, PaaS is largely manifest in the form of command line or virtual desktop access to a virtual machine which has been specifically tailored with applications and libraries for a given application domain. An example is CloudBioLinux [25], a customised version of the popular Ubuntu Linux operating system distribution.

Infrastructure as a Service (IaaS): The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).

This is exemplified by the likes of Amazon Web Services [26] (AWS), Google Compute Engine [27] and Microsoft Azure Virtual Machines [28]. Each of these services gives you near instant access to virtual machines hosted in one of the cloud providers’ data centres, pre-loaded with the operating system and often the application software you require. In the research community there are examples of groups using VMware’s vCloud Director [29] software to provide IaaS, and also OpenNebula [30]. Groups are increasingly turning to OpenStack [31], as described in Annex B.

[21] http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html
[22] http://azure.microsoft.com/en-gb/documentation/services/websites/
[23] https://cloud.google.com/appengine/docs
[24] https://developers.google.com/maps/
[25] http://cloudbiolinux.org/
[26] http://aws.amazon.com/
[27] https://cloud.google.com/compute/
[28] http://azure.microsoft.com/en-gb/services/virtual-machines/
[29] http://www.vmware.com/products/vcloud-suite/
[30] http://opennebula.org/
[31] http://www.openstack.org/
A.3 Deployment Models

Private cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.

Community cloud: The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.

Public cloud: The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.

Hybrid cloud: The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
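The cloud bursting idea mentioned in the hybrid definition can be sketched as a simple placement policy: keep work on the private cloud while it has room, and overflow to the public cloud otherwise. The Python below is a toy model under that assumption; the `Cloud` class, the `schedule` function and the core counts are hypothetical, and a real scheduler would also weigh cost, data location and policy.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """A toy cloud with a fixed number of cores available."""
    name: str
    capacity_cores: int  # use a large value to model an elastic public cloud
    used_cores: int = 0
    jobs: list = field(default_factory=list)

    def can_fit(self, cores):
        return self.used_cores + cores <= self.capacity_cores

    def place(self, job_name, cores):
        self.used_cores += cores
        self.jobs.append(job_name)


def schedule(jobs, private, public):
    """Place each (name, cores) job privately first, bursting to public when full."""
    placement = {}
    for name, cores in jobs:
        target = private if private.can_fit(cores) else public
        if not target.can_fit(cores):
            raise RuntimeError(f"no capacity for job {name}")
        target.place(name, cores)
        placement[name] = target.name
    return placement
```

For example, with a 16-core private cloud, two 8-core jobs stay private and a third job bursts to the public cloud.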
Annex B: Cloud Computing for Researchers

Many of the products and services we refer to here are either open source, or available in a free or low cost public cloud platform tier that researchers could readily experiment with.

B.1 Software as a Service

As noted above, software as a service is really another way of saying that the product is delivered as a website or web service. Software as a Service products are typically hosted by the firm which has produced them, and hence not generally available through private or community clouds. We have subdivided this category into two:

Generalised products that have relevance to researchers

Examples here include a wide range of Internet collaboration suites such as Box [32], Dropbox [33], Google Drive [34] and Microsoft OneDrive [35]. Many of these products have a dual track model that includes a freemium service aimed at consumers, and an enterprise service for paying customers. These tend to have very different terms and conditions: for example, consumer terms and conditions often explicitly rule out any warranty, whereas enterprise agreements will provide a Service Level Agreement including penalty clauses if SLA commitments are not met. Bespoke enterprise agreements are often created that address customers’ particular concerns, following legal and contractual review.

We will make a special mention here of the Google Apps for Education and Microsoft Office 365 collaboration suites, which have been widely taken up by research and education institutions. We can surmise that this is because they offer world-leading facilities at zero cost for the base service. These same services have a significant per-user cost for businesses, but are being widely adopted by firms because of a perception that they significantly undercut the costs of procuring conventional server and storage equipment and then installing and running the closest equivalent software locally - such as Microsoft’s Exchange product. Furthermore, these cloud products are continually being developed and evolved. Google notably take pride in making hundreds of small point releases of Google Apps in a typical twelve month period.

We have also seen a significant amount of interest from researchers in using generalised communication tools such as social media and blogs to disseminate results, to support open practices, and to help stimulate a public debate about their work. Researchers working in scientific computing often exploit products and services aimed at the developer community such as GitHub [36] version control, which offers both a freemium consumer service and an enterprise service.

[32] https://www.box.com/en_GB/home/
[33] https://www.dropbox.com/
[34] https://www.google.com/drive/
[35] https://onedrive.live.com/about/en-us/
Products that are specifically targeted at researchers

In parallel to research adoption of the sort of generalised services listed above, a number of products have emerged that specifically target researchers as a community. These include:

• Professional networking sites such as Mendeley [37], Academia.edu [38] and ResearchGate [39].

• Specialised facilities for sharing code and data such as figshare.com [40].

• Open access journals, repositories and pre-print archives such as arXiv.org [41], PLOS [42], institutional repositories and sites from traditional publishers such as Elsevier’s ScienceDirect [43].

• Equipment catalogues such as equipment.data.ac.uk [44] and Kit-Catalogue [45] from Jisc and EPSRC.

• Lab management software such as Quartzy [46] – a free service which coincidentally links users of lab equipment with vendors to reduce the friction of replacing consumables and equipment more generally.

• Lab tests “as a service” – send off your sample to be processed and download the results.

• Independent Software Vendors (ISVs) of scientific computing software and major hardware providers are in many cases starting to offer their own packages through their own cloud service – e.g. the Atos Extreme Factory [47].
[36] http://github.com
[37] https://www.mendeley.com/
[38] https://www.academia.edu/
[39] http://www.researchgate.net/
[40] http://figshare.com/
[41] http://arxiv.org/
[42] https://www.plos.org/
[43] http://www.sciencedirect.com/
[44] http://equipment.data.ac.uk/
[45] http://www.kit-catalogue.com/
[46] https://www.quartzy.com/
[47] http://www.bull.com/extreme-factory

B.2 Platform as a Service

As noted above, it has become increasingly common for cloud providers to expose the underpinnings of their internal services for other developers to build upon. These tend to be promoted first and foremost as facilities for helping developers to build high volume websites - such as load balancing and high performance, highly available databases. However, cloud providers also often expose “microservices” that are of particular relevance to researchers – e.g. machine learning and data processing facilities. We have picked out some examples below. It is important to note here that many of the key platform tier products are either open source or available in an open source equivalent, facilitating their use in private and community clouds.

Relational databases

Most of the major public cloud providers have some form of relational database platform service. These are usually compatible with existing widely deployed products – e.g. Amazon RDS [48] can be used in place of MySQL simply by changing the Data Source Name (DSN) your code connects to.

NoSQL

Cloud providers often offer so called NoSQL services, which are typically non-relational (schemaless) approaches to storing and manipulating large volumes of information. Examples here include Google Cloud Datastore [49], Amazon DynamoDB [50], several Azure NoSQL services and the open source Cassandra [51], CouchDB [52] and MongoDB [53] products.

Object stores

Software built to use a cloud model has tended to avoid having permanently mounted shared filesystems such as we might see in an HPC cluster. Instead, it is common practice to use object stores such as Amazon S3 [54], Google Cloud Storage [55], Azure Blob Storage [56] or open source packages like Redis [57]. Researchers might be more familiar with more niche (yet very well established in the research community) object store software like the open source iRODS [58], which is not yet offered directly as a service by the major cloud providers.

“Big data”

The gold standard software for handling big data is the Apache Hadoop [59] project, an open source suite which gives roughly equivalent capability to Google’s proprietary BigTable algorithm via its HBase subsystem. Hadoop is widely available from cloud providers as a hosted service, and also often sold to enterprises and institutions as a big data handling solution – e.g. via a packaged appliance/cluster. Subsequent to the success of Hadoop, Google opened up its own internal service as the BigQuery [60] product. It is becoming common for cloud providers to provide “big data as a service” using Hadoop or their own proprietary technology, e.g. Amazon offer their Elastic MapReduce [61] product.

[48] http://aws.amazon.com/rds/
[49] https://cloud.google.com/datastore/docs/concepts/overview
[50] http://aws.amazon.com/documentation/dynamodb/
[51] http://cassandra.apache.org/
[52] http://couchdb.apache.org/
[53] https://www.mongodb.org/
[54] http://aws.amazon.com/s3/
[55] https://cloud.google.com/storage/
[56] http://azure.microsoft.com/storage/
[57] http://redis.io/
[58] http://irods.org/
[59] https://hadoop.apache.org/

Machine learning

Machine learning is generally defined in terms of “teaching” the algorithm to recognise a given target using training datasets. For example, Google recently used a cache of cat videos on YouTube to create a machine learning model that was reliably able to identify cats in videos – and work has now moved on to more complex concepts such as describing the contents of a photograph. Machine learning tools such as Amazon Machine Learning [62], Google Prediction API [63], Azure Machine Learning Studio [64] and Apache Mahout [65] are generally available. Mahout is open source, but not yet offered as a platform service by the major cloud providers, who have their own alternatives.

It is important to note that many platform services expose Application Programming Interfaces (APIs) that are unique to that service. Whilst this is not universally true (e.g. Amazon RDS “looks like” MySQL), there is a potentially significant effort required to migrate from one provider’s platform service to another. Furthermore, some of the associated data, such as a machine learning model, may not be readily exported to another provider. Switching costs will therefore include the regeneration of that dataset.
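The point that a MySQL-compatible hosted database can be adopted by changing only the connection string can be illustrated with a short Python sketch. It assumes a URL-style DSN; the `retarget_dsn` helper, the hostnames and the credentials are hypothetical placeholders, and no database connection is attempted.

```python
from urllib.parse import urlsplit, urlunsplit


def retarget_dsn(dsn, new_host, new_port=None):
    """Return a copy of a URL-style DSN that points at a different host.

    Scheme, credentials, database name and options are left untouched,
    which is why application code does not need to change.
    """
    parts = urlsplit(dsn)
    userinfo = ""
    if parts.username:
        userinfo = parts.username
        if parts.password:
            userinfo += ":" + parts.password
        userinfo += "@"
    port = new_port if new_port is not None else parts.port
    netloc = userinfo + new_host + (f":{port}" if port else "")
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical example values, for illustration only.
LOCAL_DSN = "mysql://analyst:secret@db.example.ac.uk:3306/results"
HOSTED_DSN = retarget_dsn(LOCAL_DSN, "mydb.hosted.example.com")
```

Services with a provider-specific API offer no equivalent one-line switch, which is where the switching costs described above arise.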
B.3 Infrastructure as a Service

This is the cloud tier which researchers in scientific computing may be best equipped to engage with directly, and indeed many researchers are using private, community or hybrid clouds today without perhaps realising it – for example CERN has moved its entire compute infrastructure over to the OpenStack [66] open source cloud software. Infrastructure as a Service typically equates to running up virtual machines (VMs) on the cloud provider’s shared (multi-tenant) infrastructure but it equally applies to storage and networking configuration. Cloud providers have various strategies for preventing these virtual machines from interfering with one another, such as security groups to provide IP address level access controls. Whilst there are a number of virtual machine image formats used, tools are also available to convert from one to another, and in most cases it is possible to take a virtual machine image from one provider and convert it to the format used by another – e.g. from a VMware VMDK disk image or an OpenStack QCOW2 image to an Amazon Machine Image (AMI).
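As a sketch of what image conversion can look like in practice, the Python below wraps the open source `qemu-img` command line tool to convert a VMDK disk image to QCOW2. The choice of `qemu-img` is an assumption made for illustration (the report does not name a specific tool), the file names are placeholders, and turning an image into an Amazon Machine Image would additionally go through the provider's own import process, which is not shown.

```python
import shutil
import subprocess


def build_convert_command(src, dst, src_format="vmdk", dst_format="qcow2"):
    """Build the qemu-img command line for converting between disk image formats."""
    return ["qemu-img", "convert", "-f", src_format, "-O", dst_format, src, dst]


def convert_image(src, dst, src_format="vmdk", dst_format="qcow2"):
    """Run the conversion, failing clearly if qemu-img is not installed."""
    if shutil.which("qemu-img") is None:
        raise RuntimeError("qemu-img not found on PATH")
    command = build_convert_command(src, dst, src_format, dst_format)
    subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes the conversion step easy to log or review before any large image is actually processed.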
[60] https://cloud.google.com/bigquery/
[61] http://aws.amazon.com/elasticmapreduce/
[62] http://aws.amazon.com/machine-learning/
[63] https://cloud.google.com/prediction/
[64] https://studio.azureml.net/
[65] http://mahout.apache.org/
[66] http://www.openstack.org/
Orchestration

Infrastructure as a Service is very well established, with a wide range of orchestration tools from cloud providers and the open source and commercial software communities. At one extreme these permit the user to bring up a complete suite of VMs featuring different applications and configurations, to create virtual data centres. At another extreme we might simply be interested in ensuring that certain software dependencies are installed on a VM. Orchestration examples from cloud providers include Amazon’s CloudFormation [67], Google Cloud Deployment Manager [68] and Azure Automation [69]. There are also provider-agnostic orchestration tools such as OpenNebula [70] (AWS, Azure and private/hybrid clouds).

Canned VMs

One of the more daunting aspects of supporting scientific computing can be the sheer range of software packages in use by an institution’s researchers. Most major cloud providers offer a marketplace of ready-made VMs with commonly used software pre-installed on them, such as Amazon’s AWS Marketplace [71] or Microsoft’s VM Depot [72]. This has the potential to be a huge time saver for both the researcher and the scientific computing team, e.g. for an application with complex dependencies such as the Galaxy [73] bioinformatics suite. However, the researcher may need to be extremely careful about tracking software versions when using ready made VMs, to be sure of being able to reproduce results. For example, it may be advisable to archive the VM used for a particular piece of work, just in case this turns out to no longer be available through the cloud provider at a later date when it is necessary to repeat the work.

Bring Your Own License

It is becoming increasingly common to find both open source and commercial off the shelf scientific computing software available packaged as VMs, e.g. ANSYS provide their FLUENT CFD software in this way through the AWS Marketplace. Commercial software is typically provided under a Bring Your Own License model, where the end user is expected to already be a licensed customer - perhaps with a negotiated license extension to use cloud facilities. Without the license key or a connection to a license server, the software is useless. In some cases the Independent Software Vendor (ISV) operates their own cloud based license server for real time checking of license entitlement against concurrent usage.

[67] http://aws.amazon.com/cloudformation/
[68] https://cloud.google.com/deployment-manager/
[69] http://azure.microsoft.com/en-gb/services/automation/
[70] http://opennebula.org/
[71] https://vmdepot.msopentech.com/List/Index
[72] https://vmdepot.msopentech.com
[73] https://galaxyproject.org/

Quasi-HPC facilities
Of course many researchers are accustomed to running their codes on traditional HPC clusters, and cloud providers have recognised this by providing tools that let the researcher conveniently bring up a large number of VMs with dedicated login nodes and shared storage in the typical HPC cluster model. Examples of these include the Azure HPC Cluster Service [74] and Amazon’s cfncluster tool [75]. These “quasi clusters” often do not have the sort of performant interconnect and high performance parallel filesystem that we might see on a true HPC cluster – although Microsoft now offer compute nodes with InfiniBand interconnect as part of Azure.

For many classes of compute workload this may not be a blocker (e.g. embarrassingly parallel bioinformatics jobs), but further work is still required with commonly used codes to explore performance aspects. Some cloud providers have recognised that there is sufficient demand for high performance interconnects and are making specialist hardware available such as extra large nodes, low latency interconnects, GPUs and even FPGAs.
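To show why embarrassingly parallel work is relatively insensitive to the interconnect, here is a minimal Python sketch in which each task depends only on its own input. The GC-content calculation and the function names are illustrative choices, and a thread pool stands in for what would in practice be separate processes, batch jobs or VMs.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence):
    """Fraction of G and C bases in one DNA sequence; needs no other task's data."""
    if not sequence:
        return 0.0
    gc_bases = sum(1 for base in sequence.upper() if base in "GC")
    return gc_bases / len(sequence)


def run_independent_tasks(sequences, workers=4):
    """Apply the task to every input; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, sequences))
```

Because no task communicates with any other, adding workers only requires distributing the inputs and collecting the outputs. Tightly coupled codes that exchange data at every step are the ones that depend on a low latency interconnect.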
Virtualization, containers and Docker

In an ideal world, it would be possible to simply pick up an application and move it and its dependencies to the compute platform of choice. This is something that the scientific computing community has often addressed by statically linking executables to the libraries that they depend on, but in today’s increasingly complex data processing environment this approach has its limitations. Virtualization at the machine level has made it possible to build the desired environment, e.g. on the researcher’s laptop, and then take a snapshot of it to run on the compute platform of choice. However, the wide range of machine images and hypervisors makes it difficult to generalise this approach – what is needed is some kind of packaging standard. A further wrinkle is that hypervisors typically add an unwelcome overhead to compute and data intensive workloads.

The Docker [76] project attempts to solve these problems using an operating system level virtualization approach, building on Linux kernel features to provide portable, provider agnostic “containers” that encapsulate applications and their dependencies whilst isolating them from each other and improving on the performance of a hypervisor approach to virtualization. This is actually not a new approach, and Docker builds on ideas that may be familiar from Solaris Zones and mainframe operating systems. Docker containers are inherently portable. Support for Docker has been forthcoming from Amazon [77], Google [78] and Microsoft [79] amongst others. Whilst there are other competing technologies, Docker has reached critical mass in terms of mindshare and is widely used within the Internet industry, e.g. by eBay, Yelp, Spotify, Yandex and Baidu. The formation of the Open Container Project [80] in summer 2015, with broad industry support, suggests that application virtualisation is set to rapidly become the norm.

[74] http://azure.microsoft.com/en-gb/solutions/big-compute/
[75] http://aws.amazon.com/hpc/cfncluster/
[76] http://docker.io
[77] http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
[78] https://cloud.google.com/compute/docs/containers
[79] https://azure.microsoft.com/en-gb/documentation/articles/virtual-machines-docker-vm-extension/
[80] https://www.opencontainers.org/
Annex C: Trust & public cloud

Post-Snowden, we have increasingly started to cast a critical eye on services provided by US owned tech firms - have they colluded with the FBI, CIA or NSA to build backdoors into their products? The UK Government has provided [81] a convenient set of Cloud Security Principles that researchers and institutions can use to establish whether a provider has taken adequate care to protect their data. Leading public cloud providers like Amazon [82] and Microsoft [83] have provided their own compliance statements regarding the Cloud Security Principles.

Public cloud providers would also note that it is perfectly possible to create an insecure or unreliable cloud service simply by not following best practice, just as it was always possible to create an insecure or unreliable in-house service. Amazon codify this through their statement about shared responsibilities, shown in the figure below.

Figure 2. Amazon shared responsibility model

For researchers and institutions operating in the UK and the greater European Economic Area, there are particular issues around datasets that, by stipulation, may not leave the country or may not leave Europe as a whole. An example of this would be genome data gathered as part of Genomics England’s 100,000 Genomes project - which must be kept in England [84].

Conversely, the EU-Safe Harbor Agreement [85], which creates a managed process governing the controlled release of European data to the United States, is currently being challenged in the European Court of Justice by Austrian law student Maximilian Schrems, who alleges that it contravenes EU Data Protection legislation [86]. If the Schrems case is successful, EU-Safe Harbor will be struck from the books and US service providers may be forced to open European facilities or be prevented from operating in Europe.

[81] https://www.gov.uk/government/publications/cloud-service-security-principles
[82] http://d0.awsstatic.com/whitepapers/compliance/AWS_CESG_UK_Cloud_Security_Principles.pdf
[83] http://www.microsoft.com/en-gb/enterprise/it-trends/cloud-computing/articles/14-points.aspx
[84] http://www.genomicsengland.co.uk/the-100000-genomes-project/faqs/data-faqs/
[85] http://en.wikipedia.org/wiki/International_Safe_Harbor_Privacy_Principles
[86] http://cjicl.org.uk/2015/04/13/safe-harbor-before-the-eu-court-of-justice/
WemightdrawtheconclusionfromSnowdenthatallformsofdigitaltechnologyhavebeenorwillbeinfiltratedbynationalactors,butitisalsothecasethatthemajorcloudprovidershavebeenworkingtirelesslytoreducetheattacksurface-e.g.byencryptinglinksbetweendatacentres,usingtwofactorauthentication,andencryptingdataintransitandatrest.Forsoundbusinessreasons,cloudprovidersaretryingtoensurethatanauditableandjudicialprocessisfollowedwhenprovidinginformationaboutusersortheirdatatoauthorities,andtofrustrate“trawling”efforts.Forexample,MicrosoftmountedalegalchallengearoundthereleaseofdatafromtheirDublinAzuredatacentretotheUSgovernment87,andGooglepublisharegularTransparencyReport88quantifyinggovernmentrequestsfordata.
To help establish some norms around use of public cloud globally, we have provided below several case studies where cloud technologies are being used for sensitive applications involving personal data and government data, and in cases where intellectual property is a key consideration:
• “The Financial Industry Regulatory Authority (FINRA) in the US is using AWS to analyze and store approximately 30 billion market events every day, saving some $10m-$20m through the move to the cloud” - FINRA [89]
• “In the past, a simple question about genetics linked to a medical condition might take hours, or even days, to execute. By leveraging Google Cloud Platform, the analysis of 1,000 patients’ genomic data, across 218 diseases, generates near real-time results” - Northrop Grumman [90]
• “The Philips HealthSuite digital platform analyzes and stores 15 PB of patient data gathered from 390 million imaging studies, medical records, and patient inputs to provide healthcare providers with actionable data, which they can use to directly impact patient care” - Philips [91]
• “Mount Sinai and their collaborators at Station X are mining the more than 2,000 breast and ovarian tumor and germline DNA sequences (100 TB data) generated by The Cancer Genome Atlas Consortium” - Mount Sinai Medical Centre [92]
• “Instead of having to spend in the order of £50,000 per year on storage, we can expand our cloud storage or buy some tier-three storage instead. That's an order of magnitude cheaper - we can literally knock a zero off that sum when we need to expand” - Homerton Hospital [93]
From a UK perspective we would also note that Google Apps is currently being rolled out across HM Revenue & Customs [94], and the UK Supreme Court has moved to Office 365 [95].
[87] http://www.theguardian.com/technology/2014/dec/14/privacy-is-not-dead-microsoft-lawyer-brad-smith-us-government
[88] http://www.google.com/transparencyreport/
[89] http://aws.amazon.com/solutions/case-studies/finra/
[90] http://googlecloudplatform.blogspot.co.uk/2015/03/personalized-medicine-with-Northrop-Grumman-and-Google-Cloud-Platform.html
[91] http://aws.amazon.com/solutions/case-studies/philips/
[92] http://aws.amazon.com/solutions/case-studies/mt-sinai/
[93] http://www.computing.co.uk/ctg/news/2360094/case-study-how-homerton-hospital-saved-90-per-cent-on-storage-hardware-by-shifting-from-npfit-for-clinical-imaging
[94] http://thenextweb.com/google/2015/06/05/the-uk-tax-man-switches-to-googles-cloud-and-drops-microsoft/
Annex D: NeI and cloud
D.1 Public investment in e-Infrastructure
Investments by BIS, the Research Councils and HEIs in 2011-12 (£160M), 2012-2013 (£189M) and 2014-15 (£257M) have resulted in core elements of the national e-Infrastructure being put in place.
2011-2012
Investments were made in core HPC and Networking infrastructure. In addition, investments were made in the Authentication Infrastructure Moonshot (now known as Jisc Assent). These investments are listed in Table 4 below.
Table 4: 2011-2012 HPC Investments

| HPC Project | RC | Amount / £M |
| --- | --- | --- |
| National Service | EPSRC, NERC | 43 |
| Hartree Centre | STFC | 30 |
| DIRAC | STFC | 15 |
| GRIDPP | STFC | 3 |
| The Genome Analysis Centre (TGAC) | BBSRC | 8 |
| Monsoon | NERC/Met Office | 1 |
| JASMIN2 & CEMS | NERC & UKSA | 7.75 |
| Regional Centres: N8, SES5, MID+, HPC Midlands, ARCHIE-WeSt | EPSRC | 10 |
| JANET Network and Authentication (Moonshot) | Jisc | 31 |
| HPC Data Storage | EPSRC, STFC | 15 |

[95] http://www.microsoft.com/en-gb/enterprise/it-trends/cloud-computing/articles/how-the-cloud-is-future-proofing-it-at-the-uk-supreme-court.aspx
2012-2013
Big Data projects using funds announced by the Government in December 2012 were funded at this time. Major awards have been made to 18 centres in the UK, 16 of which are HEIs. These awards are listed in Table 5. The pre-eminent role of HEIs in managing and providing national and Large Specialist data and compute services to UK academia is emphasised by these awards.
Table 5: Big Data Investments 2012-2013

| Big Data Project | RC | Amount / £M |
| --- | --- | --- |
| Digital transformations in arts and humanities | AHRC | 8 |
| E-infrastructure for biosciences | BBSRC | 13 |
| Research data facility and software development | EPSRC | 8 |
| Administrative data centres | ESRC | 36 |
| Understanding populations | ESRC | 12 |
| Business data safe | ESRC | 14 |
| Biomedical informatics | MRC | 55 |
| NERC Environmental Big Data Initiative | NERC | 13 |
| Square Kilometre Array | STFC | 11 |
| Energy Efficiency Computing Hartree Centre | STFC | 19 |
| Total | | 189 |
Further investments were made as follows:
• The Medical Research Council (MRC) will invest £50 million in bioinformatics, which uses many areas of computer science, statistics, mathematics and engineering to process biological data. This includes £19M to the Farr Institute of Health Informatics Research at nodes in London, Manchester, Wales and Scotland; and the £32M Medical Bioinformatics initiative to fund 5 projects: eMedLab (UCLPartners-Crick-Sanger-EBI), The MRC Consortium for Medical Microbial Bioinformatics, Leeds MRC Medical Bioinformatics Centre and the MRC/UVRI Medical Informatics Centre.
• The Arts and Humanities Research Council (AHRC) invested £4 million in 21 new open data projects. These will make large datasets, ordinarily accessible only to academics, available to the general public.
• The Economic and Social Research Council (ESRC) has invested £14 million in 4 new research centres at Essex, Glasgow, UCL and Leeds Universities, as well as a further £5M in the Administrative Data Service at Essex University. The centres will make data from private sector organisations and local government accessible to researchers investigating anything from transport to obesity. At present the data is being collected by these organisations, but is not being used for research purposes. The programme is divided into 3 phases: Phase 1, to obtain information from government departments and set up the Administrative Data Research Network; Phase 2, to set up Business and Local Government Data Research Centres; and Phase 3, to collect and set up Social Media and Third Sector data.
• The Natural Environment Research Council (NERC) has invested £4.6 million in 24 projects to help the UK research community take advantage of existing environmental data.
• The Engineering and Physical Sciences Research Council (EPSRC) has invested £8M in the Research Data Facility (RDF), which is operated by EPCC.
The RDF is designed to provide research data management and data analysis services for ALL RCUK researchers. Access will be governed by a peer review mechanism. Early projects include the hosting of the DiRAC Code Benchmarking Project.
Other data infrastructure investments include the following:
• The European Bioinformatics Institute at Hinxton (Cambridge) received £75M from BBSRC.
• The Open Data Institute, a not-for-profit funded by Innovate UK (£10M over 5 years, subject to industry investment) and by industry, is a big data undertaking dedicated to providing open access to data from across the public sector in order to enable industrial and academic exploitation.
• In 2012, the Clinical Practice Research Datalink, a £60 million service funded by the MHRA and the National Institute for Health Research, was established to provide patient data for medical research.
• The Government earmarked £100 million for the NHS to sequence the DNA of up to 100,000 patients with cancer and rare diseases, which will include the development of appropriate data infrastructure (NHS).
2014-2015
Three major investments dominated this period:
1. Centre for Cognitive Computing at the Hartree Centre. This was funded at the £115M level with a further £230M from IBM
2. A 10 Pflop Supercomputer for the Met Office (£100M)
3. Alan Turing Centre for Data Science (£42M)
In addition, it was announced that a further £100M would be made available to the SKA Project as part of Big Data Investments.
D.2 NeI, cloud technology & the access agenda
At present the NeI projects are largely hidden from the general research community, and some would say they are largely hidden from even their intended user base.
HPC and HTC are still regarded by researchers as difficult tools to use, even though they are probably no more difficult to use than a piece of lab equipment.
How, then, do we get researchers to regard HPC, Data Intensive Computing and HTC as part of their basic research laboratory?
This requires researchers to view these resources as:
• Viewable. They are manifested tangibly, whether on a desktop or a mobile device
• Easy to interface with, so that users can
  o Submit workflows
  o Construct workflows
  o See what resources are available
  o Check what resources they have used
• Needing only ONE user identity
The diagrams above and below give an example of how this works. As far as the user is concerned, interfacing directly with the resources is no longer needed. What becomes important now are the Cloud Services described earlier in this report.
The need to interact directly with the compute and data resources is removed and is replaced by a set of generic functions in a workflow that allow the user, in effect, to use the same workflow on a multiplicity of systems.
It is this that removes a barrier to usage, by giving the user the on-demand self-service they would like to use.
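To make the idea of generic workflow functions more concrete, the following is a minimal Python sketch and not a description of any existing NeI system. All names in it (Step, Workflow, Backend, LocalBackend, LoggingBackend, run_everywhere) are hypothetical and chosen purely for illustration. It assumes only that each system can accept a workflow and some data through one common "submit" function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class Step:
    """One generic unit of work: a name plus a function applied to the data."""
    name: str
    func: Callable[[List[float]], List[float]]


@dataclass
class Workflow:
    """An ordered list of steps that refers to no particular system."""
    steps: List[Step] = field(default_factory=list)


class Backend(Protocol):
    """Hypothetical common interface every resource would expose."""
    name: str

    def submit(self, workflow: Workflow, data: List[float]) -> List[float]: ...


class LocalBackend:
    """Hypothetical backend that runs every step in the current process."""
    name = "local"

    def submit(self, workflow: Workflow, data: List[float]) -> List[float]:
        for step in workflow.steps:
            data = step.func(data)
        return data


class LoggingBackend:
    """Hypothetical stand-in for a remote system: same steps, plus a usage log."""
    name = "remote-sim"

    def __init__(self) -> None:
        self.usage: List[str] = []

    def submit(self, workflow: Workflow, data: List[float]) -> List[float]:
        for step in workflow.steps:
            self.usage.append(step.name)  # record which steps were run
            data = step.func(data)
        return data


def run_everywhere(workflow: Workflow, data: List[float],
                   backends: List[Backend]) -> Dict[str, List[float]]:
    """Submit the same workflow, unchanged, to each backend in turn."""
    return {b.name: b.submit(workflow, list(data)) for b in backends}


# Construct a workflow once, then submit it to two different systems.
workflow = Workflow([
    Step("scale", lambda xs: [2 * x for x in xs]),
    Step("total", lambda xs: [sum(xs)]),
])
remote = LoggingBackend()
results = run_everywhere(workflow, [1.0, 2.0, 3.0], [LocalBackend(), remote])
print(results)
print(remote.usage)
```

In this sketch the user constructs the workflow once and never addresses a particular machine; the usage log stands in for the ability to check what resources have been used.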
An example in the UK of a functioning Cloud is JASMIN2 [96]. JASMIN2 supports the data analysis requirements of the UK and European climate and earth system modelling community. It consists of multi-Petabyte fast storage co-located with data analysis computing facilities, with satellite installations at Bristol, Leeds and Reading Universities.
JASMIN2 is a successful Cloud deployment that services a key UK research community.
[96] http://jasmin.ac.uk/