Automatic, multi-grained elasticity-provisioning for the Cloud

Translational Cancer Detection Pipeline over CELAR Application and Evaluation

Deliverable no.: D8.3
Date: 29-09-2015

CELAR is funded by the European Commission DG-INFSO Seventh Framework Programme, Contract no.: 317790


D8.3 Translational Cancer Detection Pipeline over CELAR Application and Evaluation 2

Table of Contents

1 Introduction .......... 6
2 Design .......... 8
  2.1 Definitions and Terms .......... 8
  2.2 Actors .......... 9
    2.2.1 Administrator .......... 9
    2.2.2 User .......... 10
    2.2.3 Supervisor .......... 10
      2.2.3.1 Distinguishing SCAN and Supervisor Roles .......... 11
    2.2.4 Infrastructure Provider .......... 11
  2.3 Architecture .......... 11
  2.4 Scheduler .......... 12
    2.4.1 Jobs and Reward .......... 12
    2.4.2 Interaction with the Supervisor .......... 14
    2.4.3 Dynamic Profiling .......... 15
  2.5 Workers .......... 16
  2.6 Distributed Filesystem .......... 16
  2.7 Client Utilities .......... 16
  2.8 SCANv1 vs. SCANv2 .......... 17
  2.9 Summary .......... 18
3 Implementation .......... 19
  3.1 Scheduler .......... 19
  3.2 Workers .......... 19
  3.3 Shared filesystem .......... 19
  3.4 Distributed filesystem .......... 20
  3.5 SCAN + CELAR Deployment .......... 20
4 Example Workload .......... 23
  4.1 GATK .......... 23
  4.2 GROMACS .......... 23
  4.3 CellProfiler .......... 24
5 Evaluation .......... 25
  5.1 Horizontal Scalability .......... 25
  5.2 Job Duration Prediction .......... 26
  5.3 Job Parallelism .......... 27
6 Conclusion and Future Work .......... 29
7 Bibliography .......... 30

List of Figures

Figure 1: Diagram of Actors and relationships in the SCAN system .......... 9
Figure 2: Core functionality of the SCAN Service .......... 12
Figure 3: SCAN CAMF description and the Scheduler's key attributes .......... 20
Figure 4: Constraints and strategies for horizontal scaling of SCAN's worker pool .......... 21
Figure 5: SlipStream view of SCAN newly deployed on Okeanos .......... 22
Figure 6: Time series of relative errors in estimating GATK tool runtime .......... 26


Figure 7: GROMACS performance vs. thread count .......... 27
Figure 8: Number of cores allocated to each GROMACS task .......... 28

List of Abbreviations

NGS   Next Generation Sequencing
API   Application Programming Interface
DB    Database
GUI   Graphical User Interface
PaaS  Platform as a Service
UML   Unified Modeling Language
VM    Virtual Machine
WP    Work Package


Deliverable Title: Translational Cancer Detection Pipeline over CELAR Application and Evaluation
Deliverable no.: D8.3
Filename: CELAR_D8.3.docx
Author(s): Christopher Smowton, Wei Xing
Date: 2015-09-29
Start of the project: 01-10-2013
Duration: 36 months
Project coordinator organization: ATHENA RESEARCH AND INNOVATION CENTER IN INFORMATION COMMUNICATION & KNOWLEDGE TECHNOLOGIES (ATHENA)
Due date of deliverable: 30-09-2015
Actual submission date: 30-09-2015

Dissemination Level

X  PU  Public
   PP  Restricted to other programme participants (including the Commission Services)
   RE  Restricted to a group specified by the consortium (including the Commission Services)
   CO  Confidential, only for members of the consortium (including the Commission Services)

Deliverable status version control

Version  Date                       Author
1        2015-07-20                 Christopher Smowton, Wei Xing
2        2015-09-21                 Christopher Smowton, Wei Xing
3        2015-09-22                 Christopher Smowton, Wei Xing
4        2015-09-28                 Christopher Smowton, Wei Xing
5        2015-09-29                 Christopher Smowton, Wei Xing
6        2015-12-08 [post-review]   Christopher Smowton, Wei Xing

Abstract

The aim of CELAR WP8 is to develop a cloud-based cancer detection application (SCAN) for translational cancer research. The key objective of the SCAN development is to manage computation and data elasticity in biological applications using the CELAR cloud platform. In


this technical report, we present the final design of the SCAN biological analysis platform, describe its implementation, and evaluate its performance, specifically its ability to scale scientific computing, against three biological analysis pipelines that are representative of best practices in biocomputing.

SCAN is a Resource Manager (RM) that coordinates the execution of end-users' biological analyses using a dynamically mutable pool of worker virtual machines. It improves over existing RMs with better support for the elastic cloud execution scenario, and by introducing a novel metric based on user happiness that helps to inform elastic cloud scaling decisions.

Keywords

Translational Cancer Research, Cancer Detection Pipeline, CELAR System Architecture, Automated Elasticity Provisioning, Resource Allocation, Application Management Platform, Elasticity Platform, Cloud Computing, Workflows


1 Introduction

In the field of biological computing, a diverse array of scientific programs and algorithms is used to analyze and model biological processes, from determining DNA mutations to modeling molecular structures and interactions.

SCAN is a platform for executing those analysis programs in an elastic cloud environment. It is a resource manager, keeping track of queues of user analysis requests and a pool of workers that can run them. It is particularly suited to the elastic cloud in that it exposes metrics indicating how well it is currently performing, and allows run-time addition, removal and reconfiguration of workers.

This kind of elastic cloud resource manager can benefit biological researchers by freeing them from having to manually allocate resources to their analyses, a task that often non-expert users are apt to do poorly.

A system or person, called SCAN's Supervisor, is typically expected to monitor SCAN's performance metrics and accordingly to dynamically adjust the number and type of workers that are placed at SCAN's disposal. CELAR works with SCAN by playing this Supervisor role.

Biological computing tasks are, in practice, typically executed using a cluster resource manager, such as the classic Portable Batch System or a more modern system such as Apache YARN. SCAN aims to match and improve upon these systems' capabilities, and in particular to exploit opportunities presented by the elastic cloud:

• Under most widely-deployed resource management schemes, users or administrators are responsible for determining the distribution of resources between jobs (for example, how many cores should a multi-threaded job request?). SCAN adaptively allocates resources, taking into account both the observed properties of the user's task and the current workload level SCAN is experiencing.

• SCAN dynamically profiles users' jobs, characterizing the relationship between their input data and running time, and the relationship between resource allocation and performance (for example, how does execution time scale with thread count?).

• SCAN maintains a distributed filesystem (DFS) and optimizes job assignment to maximize data locality. This is similar to the DFS used with modern cloud-oriented resource managers, such as Apache YARN, but is suitable for use with classical workloads that are more often run in cluster environments.


SCAN was designed at the Cancer Research UK Manchester Institute. Thus while SCAN is a general-purpose resource manager, suitable for any variety of scientific computing or indeed any field that needs job distribution across an elastically allocated worker pool, the authors' particular use case centers around biocomputing and more specifically cancer analysis. Cancer research requires large-scale computing because the instruments used in the field, such as DNA sequencers, often produce large datasets that require significant processing before useful information can be extracted. Performing these analyses quickly is also vital if they are to be made practical for use in day-to-day medicine, for example providing diagnosis from a tumor biopsy within an acceptable timeframe. Cancer research also produces highly variable demand for computing resources due to short-term shifts in researchers' focus; thus it stands to benefit greatly from elastic scaling of the compute resources employed.

SCAN's design was previously sketched in deliverable D8.1 [1] and elaborated in D8.2 [2, 1] along with details of its initial implementation. We also benchmarked the version of SCAN described in D8.2 ("SCANv1") [3] and studied the opportunities for dynamic resource allocation to SCAN jobs [4] since publishing D8.2. In this document, we describe extensions of SCANv1's design (producing SCANv2), describe the implementation of SCANv2, and evaluate SCANv2's functionality using three sample biological applications, picked to exhibit a range of different resource requirements typical of different sub-fields of biocomputing. Whilst SCANv1 was originally specified and designed to tackle cancer research problems, in SCANv2 we generalize to address biocomputing in general, which we argue has similar needs.

We describe SCANv2's design in Section 2, give details of its implementation in Section 3, characterize the biological applications we use to test it in Section 4, evaluate SCANv2 in Section 5, and finally conclude in Section 6.


2 Design

SCAN is a resource manager for biological analysis execution in an elastic cloud environment. SCAN is agnostic of the specific kind of work the users wish to execute in the cloud. However, within the specific subfield of cancer research, SCAN can be applied to manage common workflows such as DNA mutation and structural variation analysis, protein identification from mass spectrometry, or chemical and physical simulation.

2.1 Definitions and Terms

Users are humans that have analysis programs they wish to execute in an elastic cloud environment.

The Administrator is a human that sets up SCAN at the outset, before any work is submitted.

The Supervisor is a human or automated system that continually adjusts the resources (virtual machines and their virtual hardware, including cores, memory and disks) provided to SCAN.

The Infrastructure Provider supplies those resources.

Jobs are the units of work Users submit to SCAN for execution. They consist of a shell script for execution and a class. They may be annotated with an estimated size.

Job classes specify which jobs may be regarded as similar, and whose estimated sizes are therefore comparable, for the purpose of performance modeling. Comparing the estimated sizes of jobs of differing classes is meaningless.

Composite tasks are workflows consisting of multiple jobs: for example, one job might produce an intermediate result that another consumes.

In designing SCAN, we aim to provide a service that is easy for the User to interact with, in that it is easy for them to supply SCAN with enough information to run their jobs effectively. It should also be informative to the Supervisor, so that they can practically devise policy relating SCAN's performance metrics to the resources it should be given. And of course it should run User workloads more effectively, and with less effort, than the average user coordinating their workload manually.


2.2 Actors

Figure 1: Diagram of Actors and relationships in the SCAN system

Figure 1 illustrates the key actors involved with SCAN: the Administrator, who sets the SCAN service up to begin with; the Users, who submit work for SCAN to execute; the Supervisor (which may be a human or another system, such as CELAR), who monitors SCAN's performance; and the Infrastructure Provider, from which the Supervisor acquires workers that the SCAN Service will use to execute Users' jobs.

2.2.1 Administrator

The administrator's role takes place entirely at initialization time, before any jobs are submitted or executed. They must provide initial parameters indicating the classes of task that users can submit, the target completion time for such tasks, and the relationship between input data size, degree of multithreading, and completion time. These measures are defined in more detail in Section 2.4, and serve only as initial values: they are replaced with observed values once enough runs have completed for dynamic profiling data to become meaningful.

The administrator must also commission a virtual or physical machine on which to run the SCAN Scheduler (its central coordinating component), reachable from both potential worker machines and potential users. They may use the Supervisor's virtual-machine commissioning mechanism to achieve this, as is standard practice when CELAR plays the Supervisor role.

They can also provide templates for users, which ease the process of job submission by providing shell scripts that can be parameterized, and viewers, which provide users with


convenient means to observe files in the SCAN system without having to analyse them offline.

2.2.2 User

Users submit jobs to SCAN which they wish to be executed. Jobs can be submitted as an arbitrary shell script, or by specifying the name of a template defined by the administrator along with zero or more parameters. Users must also annotate each job with one of the classes defined by the administrator, indicating that the job is expected to share performance characteristics (response to input data size and multithreading) with other jobs having the same class, and should use the completion time targets associated with that class.

Users' jobs may make use of the SCAN filesystem, in which case the user should provide a list of required inputs and expected outputs when submitting the job (this is described in more detail in Section 2.6). These allow SCAN to optimize for and track data locality. Users may upload and download files to/from the SCAN filesystem, but can also have their jobs source data from, or post data to, an external resource, such as a cloud data service.

The SCAN Service does not take responsibility for dependency-ordering jobs; however, SCAN comes with a plugin for the GATK Queue job management utility, which provides this functionality client-side.

2.2.3 Supervisor

The Supervisor's role is to monitor SCAN and, on the basis of its observed performance, provide it with virtual or physical machines to execute its workload. The service informs the supervisor of basic statistics, such as the queue length, the workload makeup in terms of classes, and the hardware utilization experienced by its existing workers (e.g. CPU and memory utilization). The supervisor should thus choose how many workers to provide to SCAN, and the hardware specification of each one. When workers are to be withdrawn from service, it should choose which one to take away.

It should also monitor the storage needs of the workers, which cooperate to implement the SCAN file system, and dynamically adjust the storage attached to each one to achieve balance.

The supervisor role can be played by a human (probably the same person as the administrator), but for our purposes will always be fulfilled by the CELAR system.


2.2.3.1 Distinguishing SCAN and Supervisor Roles

SCAN and its Supervisor play distinct but closely related roles, which can be easily confused, so we will explicitly distinguish the two here. SCAN's role is similar to cluster- or cloud-based resource managers such as PBS (a cluster resource manager) or YARN (a cloud resource manager). Their goal is to dispatch users' jobs against a pool of workers. A resource manager's pool of workers is usually either static (as in a typical cluster scenario) or under the manager's direct control. SCAN, by contrast, manages a dynamically variable pool of workers: it neither chooses the workers' size and number nor hires them itself. They are dynamically allocated, in varying numbers and sizes, by its external Supervisor, in this case the CELAR system. SCAN needs only to provide CELAR with operational statistics in order for scaling decisions to be reached. Hence, CELAR observes SCAN's performance and decides how many and what kind of workers SCAN should be supplied with, while SCAN controls how those workers are shared between its users.

2.2.4 Infrastructure Provider

The infrastructure provider is whatever service the supervisor employs to provide worker physical or virtual machines for use by the SCAN Service. It should be able to host GNU/Linux instances suitable for running users' required programs, as well as SCAN's required utilities, such as an SSH server. In this document's evaluation section we use KVM virtual machine instances provided through the ~Okeanos service, but other suitable technologies include lightweight container systems such as LXC, or even systems that do not implement container isolation, such as a classical cluster scheduler.

In order for the Supervisor to scale the number and specification of workers provided to SCAN, the Provider should be capable of dynamically acquiring and releasing a variety of different kinds of virtual or physical machine. Ideally it should also be able to attach and detach storage devices from an active worker, thus enabling dynamic storage allocation within the SCAN filesystem.

2.3 Architecture

The SCAN Service implements four key functionalities: scheduling users' jobs to workers, modeling and predicting those jobs' performance, managing the SCAN filesystem, and exposing informative performance metrics to its Supervisor.


Figure 2: Core functionality of the SCAN Service

Figure 2 illustrates the Service's core functionalities unrelated to storage management, which is explored in more detail later: accepting jobs submitted by users, along with annotations that help SCAN predict each job's likely duration; scheduling those jobs against workers; monitoring job execution to generate a superior model in the future; and providing performance metrics to its Supervisor. All depicted functionality is part of the SCAN Scheduler, except for the Workers themselves, which are separate components, and some elements of user job submission, which are executed client-side.

2.4 Scheduler

The SCAN Scheduler is the primary component of the Service. It manages the Worker pool and job queue, and is responsible for computing most performance metrics that are exposed to the Supervisor.

2.4.1 Jobs and Reward

As mentioned previously, Users submit Jobs to the Scheduler as either arbitrary shell scripts, or by naming a template defined by the administrator and supplying parameters, which the scheduler substitutes into the template to produce a shell script for execution. For example, within the domain of cancer research, a job might run a step in a DNA mutation analysis pipeline.

The scheduler maintains a single FIFO queue of pending jobs, and attempts to run each job in turn on one of the workers in its pool.


The user supplies an estimated size for each job they submit, indicating in arbitrary units the amount of work that is expected relative to other jobs of the same class (for example, for many genome analysis tasks the size in bytes, or number of records, of the input file suffices as a job size estimate). The Scheduler selects the optimal degree of parallelism based on this estimate, parameters supplied by the administrator, and the current resources available in the worker pool.

The rules for selecting the appropriate degree of parallelism (number of cores assigned) are as follows:

• The user may specify an upper limit, which is never exceeded.

• Never try to assign more cores than are possessed by the worker with the most cores (occupied or not).

• Among the possible core counts up to whichever limit applies, select the one that will yield the highest Reward (an arbitrary value approximating user happiness). SCAN calculates the expected Reward as follows:

The administrator provides (and the Scheduler successively refines) the job class's expected degree of parallelism, P; the constant time overhead, c; the time per unit work, t; the target completion time, T0; and the reward per hour early/late, Rh. For a job of estimated size S running with n cores, having queued for Q hours and with expected waiting time W, this yields the expected reward R:

R(S, n, W) = Rh × (T0 − (Q + W + c + tS((1 − P) + P/n)))

This consists of the expected single-threaded time c + tS, corrected for the number of cores by assuming that a proportion P of the runtime is perfectly scalable whilst the remaining 1 − P does not scale; the job then earns reward Rh per hour by which its completion differs from the target completion time T0.

The waiting time W depends on the current state of the worker pool: if a worker already has n cores free, it is zero; otherwise it is set based on when enough currently-running jobs are expected to release sufficient worker cores, or when the Supervisor is expected to supply an extra worker, whichever is sooner.

The scheduler selects the degree of parallelism with the highest expected reward. This means that whilst with idle workers and a non-zero parallelism factor P we will always maximize parallelism, SCAN will evaluate the tradeoff between waiting for resources to


become free so a new job can run with more cores, compared to starting sooner with fewer resources.
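The reward formula and core-count selection just described can be sketched as follows. This is a minimal illustration, not SCAN's implementation; the function names, and the `wait_for` helper that estimates W for a given core count, are our own assumptions:

```python
def expected_reward(S, n, W, Q, P, c, t, T0, Rh):
    """Expected reward R(S, n, W) for a job of estimated size S on n cores.

    Q: hours already queued; W: expected further wait for n cores to be free;
    P: perfectly-scalable fraction of the runtime; c: constant overhead;
    t: time per unit work; T0: target completion time; Rh: reward per hour
    early (the same rate is lost per hour late).
    """
    runtime = c + t * S * ((1 - P) + P / n)
    return Rh * (T0 - (Q + W + runtime))


def best_core_count(S, Q, P, c, t, T0, Rh, max_cores, wait_for):
    """Choose the core count with the highest expected reward.

    max_cores: cores on the largest worker, or the user's upper limit if
    lower; wait_for(n): expected wait in hours until n cores are free.
    """
    return max(range(1, max_cores + 1),
               key=lambda n: expected_reward(S, n, wait_for(n),
                                             Q, P, c, t, T0, Rh))
```

With idle workers (`wait_for` returning zero) and P > 0 this always maximizes parallelism; a non-zero wait for larger slots can make a smaller but immediately available allocation win instead.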

The reward scheme just described is an adaptation of our previous work on optimal sizing of biological analysis tasks [4]. That work examined how best to execute a variable biological analysis workload when the number and nature of the worker machines is under SCAN's direct control; here we adapt that scheme to an external controller such as CELAR.

Note that whilst this reward scheme requires the administrator to have some expertise in the performance characteristics of the job classes they expect to run using SCAN, the user is only required to provide a size estimate per job, which is much less difficult. For many file formats it suffices to count bytes or records to provide a useful estimate of processing difficulty.

2.4.2 Interaction with the Supervisor

Ideally, SCAN wants its supervisor to provide workers in a quantity appropriate for the current workload, and with "hardware" specifications appropriate for the tasks that SCAN has to schedule. The workers should be sufficiently multicore that the scheduler does not frequently have to run jobs suboptimally because no sufficiently 'large' slot is available, but not so large that there is significant wastage due to partially occupied workers. They should also have the right CPU-cores-to-main-memory ratio, such that the limits on both are typically encountered simultaneously.

SCAN exposes several statistics to its Supervisor to show what performance bottlenecks are present and to direct performance improvement:

• Queue length. The number of jobs waiting in the queue for resources to become available, or for the supervisor to supply more workers. High values indicate that adding more workers is desirable.

• Total reward earned. The sum of the Reward earned so far from jobs that have completed.

• Reward lost due to queuing. The sum of Reward lost so far due to jobs having to spend time in the queue. Gives an idea of the scale of the queuing problem when considered relative to the total reward.

• Reward lost due to scaling problems. The sum of Reward lost so far because a job ran with a lower degree of parallelism than is ideal, because no workers had enough cores or only partially-occupied workers were available. If significant compared to the total reward, this indicates that supplying workers with more cores in the future may be beneficial.


• Worker CPU core utilization and memory utilization. If both are low and the queue length is zero, this indicates idling workers that may be best released to save money; or, if no worker is completely idle, it may be beneficial to hire more workers with fewer cores each, so that fragmentation does not lead to wastage. If one is full but the other is not, the resource balance is suboptimal and we may be paying for one or the other resource we cannot use. Note that SCAN reports CPU and memory reservations (e.g. SCAN has assigned 2/4 cores to running jobs) rather than actual usage, on the assumption that any reservation underutilization is transient and does not represent an opportunity for opportunistic multitasking.
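By way of illustration, a snapshot of these statistics might look like the following. The field names and values are hypothetical; SCAN's actual reporting format is not specified here:

```python
# Hypothetical snapshot of the statistics SCAN exposes to its Supervisor.
scan_metrics = {
    "queue_length": 7,               # jobs waiting for resources
    "total_reward_earned": 1523.0,   # Reward from completed jobs
    "reward_lost_queuing": 210.5,    # Reward lost to time spent queued
    "reward_lost_scaling": 48.0,     # Reward lost to suboptimal parallelism
    "workers": [
        # Reservations, not measured usage: SCAN reports what it has
        # assigned to running jobs, e.g. 2 of 4 cores.
        {"cores_reserved": 2, "cores_total": 4,
         "mem_reserved_gb": 6.0, "mem_total_gb": 8.0},
    ],
}
```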

The Supervisor can also advise SCAN that its resource demands are excessive (e.g. we are already spending most or all of our available budget to hire workers). In this situation it is desirable for SCAN to run tasks single-threaded even if this results in lower reward, because single-threaded execution usually has lower overhead, leading to higher long-term throughput and more effective clearing of a long queue.

SCAN maintains a threshold, expressed in number of concurrently hired cores, above which it will only run jobs single-threaded for this reason. The Supervisor can request that the threshold be lowered by some factor, and SCAN will raise it by a constant increment per job completed. In this way, SCAN will try to gradually relax the single-thread constraint until it is notified again to economise.
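This threshold mechanism can be sketched as follows; the class name and the particular factor and increment defaults are illustrative assumptions, not values taken from SCAN:

```python
class EconomiseThreshold:
    """Core budget above which SCAN runs jobs single-threaded only."""

    def __init__(self, threshold=64.0, increment=1.0):
        self.threshold = threshold    # concurrently hired cores
        self.increment = increment    # relaxation per completed job

    def supervisor_economise(self, factor=2.0):
        # Supervisor signals that resource demands are excessive:
        # lower the threshold by some factor.
        self.threshold /= factor

    def on_job_completed(self):
        # SCAN raises the threshold by a constant increment per job
        # completed, gradually relaxing the single-thread constraint.
        self.threshold += self.increment

    def must_run_single_threaded(self, hired_cores):
        return hired_cores > self.threshold
```

Because the Supervisor's cut is multiplicative and SCAN's recovery is additive, repeated economise requests bite quickly while the constraint relaxes only slowly as jobs drain.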

2.4.3 Dynamic Profiling

Whilst the SCAN Administrator provides the initial parameters used to calculate the reward to be gained from running jobs, the relationship between users' size estimates, number of cores allocated and completion time is dynamically profiled by the Scheduler and used to improve its model for future tasks.

Once at least 5 jobs of a particular class have completed, linear regression is used to determine the constant time factor involved in running that kind of job and the gradient that depends on the data size estimate provided by users.

Once at least 5 jobs of similar size but different degrees of multithreading have been executed, the Scheduler extrapolates their completion times to find the theoretical completion time of a job with an infinite number of cores assigned. Comparing this to the completion time observed with a single core assigned allows the Scheduler to update its view of the scalability factor, which divides the single-threaded runtime into a perfectly-scalable and a non-scalable proportion.

Clearly these linear models could be improved upon; however, we find that they are adequate for modelling the kinds of biological analysis workloads we are interested in.


D8.3 Translational Cancer Detection Pipeline over CELAR Application and Evaluation 16

2.5 Workers

SCAN Workers run jobs assigned to them by the Scheduler. The Scheduler may assign more than one job to execute concurrently on a worker, and it informs the worker of how many CPU cores and how much memory is assigned to each job as it begins execution.

The worker exposes the number of CPU cores and gigabytes of memory it currently has free to the Supervisor. The Supervisor may use this information to choose, when decommissioning workers to shrink the worker pool, which one is best removed.

2.6 Distributed Filesystem

The SCAN Scheduler and Workers together implement a distributed filesystem for users’ Jobs to store their inputs, outputs and temporary data. The workers pool their local storage, which may be adaptively resized by the Supervisor, whilst the Scheduler maintains the namespace, coordinates data movements and arranges transfers between workers when required.

To use the distributed filesystem (SCANFS), users’ jobs must specify the paths to their required input files and the outputs they are expected to produce. The Scheduler takes data locality into account, preferring workers that already have, or are already receiving, a copy of needed input files. Where locality is not a concern, it also tries to achieve balance by assigning jobs to workers with the largest amount of free local space.
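The placement preference just described could be sketched as follows; the `workers` structure and function name are hypothetical, not SCAN's actual data model:

```python
def choose_worker(workers, inputs):
    """Prefer workers already holding (or receiving) the needed inputs;
    free space breaks ties, and decides outright when no worker has any
    locality advantage. `workers` maps name -> {"files": set of held or
    in-flight files, "free_gb": float} (illustrative structure)."""
    def locality(name):
        return len(inputs & workers[name]["files"])
    return max(workers, key=lambda w: (locality(w), workers[w]["free_gb"]))
```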

In a typical use case, a user with a pipeline of jobs will upload their input data to SCANFS; each pipeline stage will consume either the input or the previous stage’s output, ideally running data-local rather than moving data between workers, before the user finally downloads the final output.

Workers expose an additional metric to the Supervisor, the amount of local free space, so that if possible the Supervisor can attach more storage to that worker, rather than require it to move data to other workers before it can run jobs.

Files in SCANFS are not initially replicated (it is not intended to provide high durability), but replication may arise when data is copied because data-local execution of some job was not possible. If a particular file’s replica count rises above an administrator-specified threshold, copies on the workers with the least free space are deleted.
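The pruning rule can be sketched as follows (function and parameter names are illustrative):

```python
def prune_replicas(holders, free_space, max_replicas):
    """If a file's replica count exceeds the administrator's threshold,
    pick copies to delete from the workers with the least free space,
    until the count falls back to the threshold."""
    if len(holders) <= max_replicas:
        return []
    surplus = len(holders) - max_replicas
    # Least free space first: those copies are the most costly to keep.
    by_space = sorted(holders, key=lambda w: free_space[w])
    return by_space[:surplus]
```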

2.7 Client Utilities

SCAN exposes a graphical web interface that provides job submission and access to any viewers the administrator has configured. It also provides command-line utilities for


uploading and downloading to/from the SCAN filesystem and for examining the state of the SCAN Service.

In order to support graphs of jobs with dependencies (for example a pipeline, or a distributed computation using a fork/join distribution pattern), SCAN also provides a plugin for the GATK Queue job management framework, which provides dependency tracking against a variety of job execution backends. The user can define a Queue script just as they would for conventional cluster execution, adding SCAN-specific attributes such as a size estimate for the job.

Queue then submits each individual task as a SCAN job and polls for job completion to determine when it can submit dependent tasks for execution.
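The submit-then-poll pattern can be sketched as follows; `submit` and `poll_done` stand in for the real SCAN job-submission and completion-polling RPCs, and completion is immediate here purely for illustration:

```python
def run_dependency_graph(deps, submit, poll_done):
    """Submit each task once all of its prerequisites have completed,
    mimicking how the Queue plugin drives SCAN. `deps` maps task name ->
    set of prerequisite task names."""
    pending, done = set(deps), set()
    order = []
    while pending:
        ready = [t for t in pending if deps[t] <= done]
        for t in sorted(ready):
            submit(t)            # hand the task to the scheduler
            order.append(t)
        for t in ready:
            if poll_done(t):     # the real plugin polls periodically
                done.add(t)
                pending.discard(t)
    return order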

2.8 SCANv1 vs. SCANv2

SCANv1, the version of SCAN described in D8.2 at the end of CELAR project year 2, prototyped many of the features described here but also had significant shortcomings. The principal differences between that prototype and this year’s final prototype are:

• SCANv2 adaptively controls the number of cores assigned to each job and can assign jobs to any free worker. SCANv1 required jobs of a particular class to have fixed resource requirements, and would only execute them on workers belonging to the same class.

• SCANv1 workers were homogeneous per class, whereas SCANv2 workers are arbitrarily heterogeneous. This makes it much easier for the supervisor to provide an appropriate balance of worker flavors (virtual hardware specifications).

• SCANv1 required the supervisor to judge horizontal scaling decisions based on crude metrics, such as queue length and worker utilization. SCANv2 implements a more sophisticated reward estimation algorithm that assesses the merits of horizontal scaling, making the supervisor’s role easier.

• SCANv1 used a central filesystem (specifically a CIFS server) to allow jobs to consume the results of other jobs. SCANv2 implements a distributed filesystem instead, spreading the I/O load and improving overall performance.

• SCANv1 provided a shared database similar in purpose and implementation to its central filesystem. We found that this was not useful for the workloads implemented against SCANv2, so we did not pursue this angle; however, we expect the optimal solution for workloads that communicate via a database to have similar characteristics to the SCAN distributed filesystem.


2.9 Summary

We have presented the design of the SCAN system, in which the SCAN Service provides job management and scheduling, automatically optimizing parallelism and storage locality, whilst a monitoring Supervisor such as CELAR observes its performance and adapts the quantity and quality of workers at its disposal. The system maintains a distributed filesystem for user jobs. Users handle scheduling of graphs of dependent jobs themselves, perhaps using the provided GATK Queue plugin.


3 Implementation

In this section we describe the detailed implementation decisions taken in realizing the SCANv2 design.

3.1 Scheduler

The scheduler is implemented in around 3,000 lines of Python, using CherryPy to provide HTTP/RPC support and NumPy to provide the mathematical support functions needed in its dynamic profiling code. Its performance metrics are exposed as HTTP GET RPCs, whilst HTTP POST RPCs implement user commands, such as job submission, and internal commands such as worker registration and deregistration. All RPCs accept URL-formatted arguments and provide JSON-formatted responses in order to provide maximum interoperability with other programming languages.
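The RPC convention (URL-formatted arguments in, JSON out) can be illustrated without CherryPy using only the standard library; the `queue_length` endpoint and the dict-based handler registry below are stand-ins for SCAN's real exposed methods:

```python
import json
from urllib.parse import urlsplit, parse_qs

def handle_rpc(url, handlers):
    """Dispatch an RPC in the scheduler's style: arguments arrive URL-encoded,
    responses leave as JSON, for easy consumption from any language."""
    parts = urlsplit(url)
    args = {k: v[0] for k, v in parse_qs(parts.query).items()}
    result = handlers[parts.path.lstrip("/")](**args)
    return json.dumps(result)

# Hypothetical metric endpoint in the style of SCAN's GET RPCs:
handlers = {"queue_length": lambda: {"queue_length": 3}}
```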

The same CherryPy server provides the user interface, implemented using simple HTML forms, and viewer modules, which the SCAN Administrator provides as CherryPy dispatch files.

3.2 Workers

Workers only need to provide a GNU/Linux environment with passwordless SSH access from the Scheduler. This is typically achieved using a deployment script that installs a public key corresponding to a private key held by the Scheduler. Any remote actions that the Scheduler makes on a Worker are executed over this SSH channel, whilst Worker actions targeting the Scheduler (such as registration and deregistration) are HTTP RPCs against the Scheduler, similar to user commands.

Any user software that needs to be installed on the workers should be taken care of by the user-submitted scripts. It will be installed under the unprivileged user account used to execute the jobs themselves.

3.3 Shared filesystem

Besides the distributed filesystem that is provided for bulk user data, a shared NFS volume is also hosted by the Scheduler and mounted on each Worker in order to provide simple sharing of low-volume control data. For example, the Queue plugin that is shipped with SCAN for user job dependency tracking requires a temporary directory shared by all workers to store control information, and synchronizing such files across the distributed filesystem is impractical. Because of the low volumes of traffic, we do not expect this central filesystem to become a bottleneck at reasonable workloads.


3.4 Distributed filesystem

The primary SCAN filesystem is implemented as a simple userspace distributed filesystem with explicit data movement and replication. The Scheduler maintains a map of the namespace that indicates which Workers hold which files, whilst the Workers act as storage nodes.

A Btrfs volume is used on each Worker node to implement local disk spanning, permitting local volumes to be transparently added and removed whilst user jobs are using the filesystem. The notional shared namespace between nodes is only synchronized when user jobs are explicitly annotated with a need for a particular file. Users specify such files using the symbolic path /scanfs, which SCAN rewrites to the local Btrfs volume’s real mount point.

For example, a user might implement a two-stage pipeline, with a producer that outputs /scanfs/X and a consumer that subsequently uses it. They should annotate both the producer and the consumer. The producer annotation updates the central namespace (and triggers a check to make sure the file was really produced); the consumer annotation indicates that the Scheduler should try to find a Worker that already has a copy of the file, or otherwise schedule a data copy to the chosen Worker before the task is permitted to run. The copy takes place between Workers, not via the Scheduler, using scp. Keys for copying are distributed by the Scheduler when Workers register themselves.
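The path rewriting and worker-to-worker copy can be sketched as follows; the mount point and `scan` user name are assumptions for illustration, not SCAN's real values:

```python
import os

SCANFS_PREFIX = "/scanfs"

def rewrite_scanfs_path(path, local_mount):
    """Rewrite the symbolic /scanfs prefix to the worker's real Btrfs mount
    point, as SCAN does before a job runs."""
    if not path.startswith(SCANFS_PREFIX + "/"):
        raise ValueError("not a SCANFS path: " + path)
    return os.path.join(local_mount, path[len(SCANFS_PREFIX) + 1:])

def scp_copy_command(src_worker, path, dst_worker, mount="/mnt/scanfs"):
    """Build the worker-to-worker scp command for a data copy scheduled by
    the Scheduler (user name and mount point are assumed)."""
    local = rewrite_scanfs_path(path, mount)
    return ["scp",
            "scan@%s:%s" % (src_worker, local),
            "scan@%s:%s" % (dst_worker, local)]
```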

3.5 SCAN+CELAR Deployment

In order to deploy SCAN with CELAR playing the Supervisor role (a combination we will call SCAN+CELAR for short), we must first describe it using CELAR’s Cloud Application Management Framework (CAMF). We define two “application server” components, one representing the SCAN Scheduler and the other its pool of workers. Figure 3 shows this essential description and the attributes of the scheduler component.

Figure 3: SCAN CAMF description and the Scheduler’s key attributes


Next, policy must be declared to guide CELAR in scaling the SCAN worker pool, including picking appropriate virtual machine flavours when commissioning new workers. These are conveyed by specifying constraints (bounds on SCAN’s performance metrics) and strategies (procedures to follow if the constraints are broken).

A simple policy set could specify an upper bound on the scheduler’s job queue length and a lower bound on overall worker utilization, whilst a more involved set could specify constraints on the reward-loss metric elaborated in section 2.4. Figure 4 shows the former case, with a simple horizontal scaling policy entered into CAMF’s GUI.

Figure 4: Constraints and strategies for horizontal scaling of SCAN’s worker pool

We provided JCatascopia [5] probes that convey SCAN’s performance metrics to CELAR’s decision-making components. These are simple Java programs that request metrics (e.g. queue length, or reward lost due to worker under-provision) from SCAN via HTTP RPCs and forward the results to the JCatascopia Agent.

Finally, we provided shell scripts that initialize the SCAN scheduler and worker (the former at start of day, and the latter whenever a worker is commissioned), and which clean up when a worker is about to be decommissioned or have a storage device added or removed.

Both scheduler and worker initialization scripts assume a basic Ubuntu 14.04 image to start with, and then:

• Install SCAN’s dependencies (notably Java, Python and the CherryPy web server)

• Download and build SCAN’s JCatascopia probes

• Set up passwordless unprivileged SSH access between scheduler and workers

• On workers only, establish an initial Btrfs volume using a loopback mount of spare space on the VM’s main disk, which will host the SCAN distributed FS and may be extended with additional storage later.

• On workers only, download and install software needed to run users’ jobs.

• Register workers with the scheduler, marking them ready to enter service.


The script we execute before decommissioning a worker simply notifies the scheduler that it will soon leave service (preventing new jobs from being assigned to it) and waits for work already in progress to finish.

The scripts supplied for adding a disk to a worker explore the /dev/disk tree to find the new disk, persistently store the relationship between the infrastructure’s device identifier and that seen by the operating system, and add it to the worker’s Btrfs volume set. The script for disk removal ensures that all data is moved onto other devices and the device is cleanly removed from the Btrfs volume before the pseudo-physical device can be removed.
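A sketch of the command sequences such scripts might issue, expressed here as Python helpers that only build the argument lists (the mount point is an assumption; `btrfs device add`/`delete` and `btrfs balance start` are the standard Btrfs management commands):

```python
def disk_add_commands(device, mount="/mnt/scanfs"):
    """Commands a hot-add script might run: grow the volume with the new
    device, then rebalance so existing data is spread across it."""
    return [
        ["btrfs", "device", "add", device, mount],
        ["btrfs", "balance", "start", mount],
    ]

def disk_remove_commands(device, mount="/mnt/scanfs"):
    """`btrfs device delete` migrates data off the device before removing
    it from the volume, matching the behaviour described above."""
    return [["btrfs", "device", "delete", device, mount]]
```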

This setup, along with an appropriate CELAR orchestrator image for our target infrastructure, permitted us to successfully and fully automatically deploy SCAN on both the Flexiant and Okeanos cloud providers. Figure 5 shows the view of a new deployment against Okeanos as seen in SlipStream, CELAR’s low-level commissioning and orchestration system [6].

Figure 5: SlipStream view of SCAN newly deployed on Okeanos


4 Example Workload

To evaluate SCANv2, we tested it using three commonly-used biological applications with differing resource needs: the GATK, GROMACS and CellProfiler.

4.1 GATK

The Broad Institute’s Genome Analysis Toolkit [7] is a general-purpose suite of tools for manipulating data involved with DNA sequencing. We tested a particular pipeline for calling genetic mutations starting from aligned DNA reads from a sequencing machine. It consists of 7 distinct stages which improve read alignment, adjust the measured read quality scores, call variants and then filter them by statistical confidence.

Each stage of this process exhibits different resource requirements: some are memory-intensive, some are suitable for multithreaded execution whilst others either don’t support it at all or don’t benefit much, and some have greater storage needs than others. We have previously published a more detailed investigation of the needs of each analysis stage [3].

We configured the SCAN service to allocate a class per pipeline stage, so that SCAN would attempt to characterize each stage’s behavior individually rather than find a compromise solution for all of them together. We supplied each class with initial parameters (which SCAN uses to predict job execution time based on estimated job size and degree of parallelism) resulting from our previous offline profiling work [3]. However, since that work used different hardware, we expect SCAN’s performance to improve once it has observed enough runs to replace those findings with its own dynamic profiling results.

4.2 GROMACS

GROMACS [8] is an open source molecular dynamics suite. It is a general-purpose tool for simulating the behavior of complex molecules immersed in a solvent. We particularly use it to determine the equilibrium structures of proteins and thus determine the effects of alterations to their makeup and conditions.

Much like the GATK, it is commonly used to build a pipeline of analysis stages, each of which subjects the molecule being simulated to different conditions and restraints. We assign a SCAN class for each one, and initialize its parameters based on offline testing.

The stages have similar character in terms of their resource demands (requiring considerably more memory than the GATK, but less storage bandwidth) but exhibit differing scalability. This is because whilst the GATK’s workloads can often be coarsely divided into well-parallelised sub-problems, GROMACS’ problem is inherently monolithic


and division into threads is necessarily fine-grained, with a great deal of inter-thread communication. This means that diminishing returns can be experienced at high thread counts, and GROMACS processes can be strong candidates for shrinkage when resources are tight.

4.3 CellProfiler

CellProfiler [9] is an open source image analysis toolkit, which we use to automatically locate and classify living cells in digitized images of microscope slides. Unlike the GATK and GROMACS, it usually operates single-threaded when run as a batch processing task, and typical task runtimes are much shorter than GATK or GROMACS pipelines. This makes it an ideal “padding” task, able to use spare cores even when fragmented across different machines or only available for a short time.

We configured CellProfiler to convert an input plain image into a coloured diagram indicating where different cells were identified. We vary the analysis difficulty from task to task by submitting different sizes of input image.


5 Evaluation

In this section we present our evaluation of the SCAN prototype, focused on exhibiting its suitability for use with CELAR and identifying directions for improvement.

5.1 Horizontal Scalability

In our first experiment we studied the horizontal scalability of the SCAN resource manager as a whole, running a simple and predictable workload at a variety of worker pool sizes to determine to what degree the centralized aspects of the system limit its ability to scale out.

We ran SCAN under constant load, using a single kind of task (GATK pipeline runs with input size randomly, uniformly distributed between 8 and 12MB). The GATK runs were restricted to run single-threaded at all pipeline stages. The input datasets are far smaller than typical real-world GATK runs, and were picked so that coordination activity at the scheduler was frequent and likely to expose limited scalability in the face of short-running jobs.

All workers in this experiment had 4 vCPUs and 4GB of pseudo-physical memory, hired from the Okeanos cloud service. Neither worker specifications nor task multi-threading were varied, in order to avoid extraneous variables perturbing our measurements. With workers of this size, one Queue coordinator task and one GATK analysis can run concurrently on a worker, with memory being the limiting factor. Therefore, in order to keep the system busy, we ran one client per worker assigned to SCAN, which would repeatedly upload input data, execute the GATK mutation calling pipeline using SCAN and then download the results.

Table 1: Horizontal Scaling Experimental Results

Worker Pool Size   Pipeline Runs Completed   Runtime Mean ± SD (minutes)
5                  10                        7.66 ± 0.48
10                 20                        8.45 ± 0.59
15                 30                        10.27 ± 0.41
20                 60                        10.58 ± 0.81
25                 50                        12.49 ± 0.48

Table 1 shows the results for between 5 and 25 concurrently-active workers. We observe that scaling is imperfect, with the average pipeline run taking 1.63 times as long when the worker pool and workload are enlarged by a factor of 5. There are many possible factors limiting scalability, such as contention for shared resources at the infrastructure level (for example, worker VMs may share physical hardware, or contend for the same network links), or a reduced chance of achieving data locality in the SCAN filesystem.


Overall, however, we conclude that this result shows it is practical to scale SCAN out to moderate worker pool sizes. We plan to extend this experiment to larger worker pools as resource limitations allow.

5.2 Job Duration Prediction

As part of the previous experiment we measured the reliability of SCAN’s job duration prediction mechanism (described in section 2.4.3). The scheduler began with a default naïve estimation of each GATK pipeline stage’s behavior, but then iteratively improved a linear model relating the job’s estimated ‘size’ (in this case simply measuring the number of records submitted for analysis) to its execution time.

Figure 6: Time series of relative errors in estimating GATK tool runtime

Figure 6 shows the series of estimates made by SCAN’s dynamic profiling code in the order they were made, represented as the ratio of actual time to estimated time taken. 1.0 indicates a perfect prediction; 0.5 indicates the actual runtime was half the estimate, and 2.0 indicates the opposite. This particular series of estimates was generated in the previous experiment’s 25-worker run. The initial series of very bad estimates corresponds to a time when one or more GATK tool types has not been run enough times to train SCAN’s model of that kind of task, and very pessimistic estimates are produced. The broad trend for actual-to-estimated ratios of around 1.0 is good news, showing that SCAN can generally predict quite well how long a task will run, which is important for its reward-estimation mechanism described previously. A short-term trend in estimate quality, such as the slope from estimate #174 onwards, most likely indicates a period when resource contention is varying and the model is taking time to adjust to the change in circumstances.


5.3 Job Parallelism

In our next experiment, we subjected the same 25 4-core workers, plus 10 extra 8-core workers, to a sustained load of GROMACS analysis requests from 25 concurrent clients. The workload consumed all available cores, and so SCAN allocated varying numbers of cores to each pipeline stage as it reached the front of the queue, in response to presently available worker cores. We measured the relationship between average completion time for each GROMACS pipeline stage and the number of cores it was allocated.

Figure 7: GROMACS performance vs. thread count

Figure 7 shows the relationship observed for each tool (EM, NVT, NPT and Main are stages of a typical GROMACS analysis pipeline, each of which simulates atomic and molecular interactions under differing constraints). As expected, the trend is broadly for more threads to improve performance, but we see some interesting effects:

Two cores do not improve performance as much as we would expect, and for two tools actually harm performance. This may indicate that when two GROMACS processes are sharing (virtual) hardware, the 3-cores vs. 1-core configuration performs better than the 2-cores vs. 2-cores one, perhaps because the latter involves two processes with conflicting inter-core traffic. However, if cache behavior is indeed to blame, this clearly depends heavily on the underlying implementation of the virtual machines provided as SCAN workers. This indicates that it may be necessary to improve upon SCAN’s current simple model of tasks’ degrees of parallelism (described in 2.4.3), as it is not currently able to model this counterintuitive result.

We also observe that moving from 4 to 8 cores has only slight benefit, indicating that if sufficient work is available, better overall throughput would be achieved by running each individual task with fewer threads. This indicates that there is much to be gained from SCAN’s dynamic adjustment of per-job parallelism based on overall workload, as for this


particular workload SCAN’s users will clearly be happiest with 8 threads assigned per task if SCAN has many idle workers, but with 4 threads per task if there are no spare resources and a queue is building up.
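This latency/throughput tradeoff can be sketched as a simple selection rule (illustrative only; SCAN's actual reward-based policy described in section 2.4 is more involved, and the runtimes below are made-up numbers in the spirit of Figure 7):

```python
def best_thread_count(runtimes, busy):
    """Pick a per-task thread count from profiled mean runtimes, which map
    thread count -> completion time. When idle, minimise individual job
    latency; when a queue is building, maximise throughput per core, i.e.
    minimise core-time consumed per job (threads * runtime)."""
    if busy:
        return min(runtimes, key=lambda n: n * runtimes[n])
    return min(runtimes, key=lambda n: runtimes[n])
```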

Figure 8: Number of cores allocated to each GROMACS task

We also plot (in Figure 8) the number of cores in SCAN’s pool allocated to each GROMACS task as a function of time, showing the effect of adding the 10 8-core workers midway through the experiment. Before they are added, tasks are often assigned odd numbers of cores in order to fit them into spare space rather than wait; after they are added, resources are more plentiful and all tasks are assigned an entire worker, getting either 4 or 8 cores depending on the best worker available when the task reached the front of the queue.

This illustrates that SCAN’s dynamic core allocation mechanism correctly flexes between compelling tasks to share when resources are scarce, towards the beginning of the experiment, and permitting them to use all available resources when they are plentiful, towards the end.


6 Conclusion and Future Work

We have described the design and implementation of SCAN, a resource manager designed for the elastic cloud, managing a pool of dynamically allocated, heterogeneous workers used to execute scientific analysis tasks. It extends the prior art of resource managers (RMs) by marrying the dynamic job-sizing features seen in modern cloud-oriented RMs such as Apache YARN with support for the arbitrary scientific analysis applications that are typically used in practical biological laboratories. It also implements a novel model of user happiness in order to motivate choices regarding the size of the worker pool used by the RM, and the assignment of worker resources to each task under SCAN’s control. SCAN’s happiness-modelling algorithm [4] and preliminary profiling studies [3] have been independently published.

Our evaluation of SCAN demonstrates its horizontal scaling capability as well as its dynamic determination of per-job multithreading, trading off job latency (and therefore individual user satisfaction) against overall throughput (and thus global satisfaction).

SCAN has previously been run under CELAR v0.1¹, exhibiting fully automated horizontal scaling. When the CELAR application framework v0.2 release becomes available in the near future, we intend to evaluate the complete feedback loop between SCAN’s diagnostic metrics (exposed to CELAR to provide advice concerning the workers SCAN needs to run its analysis tasks), the worker VMs that CELAR chooses to commission as a result, and the resultant performance of users’ analyses submitted to SCAN for execution. We also intend to investigate applying prior work in modeling application parallelism to improve on SCAN’s current simplistic view.

Finally, we intend to investigate adapting CELAR from its current heavyweight virtual machine model to work with lightweight isolation systems, such as Linux Containers, or cooperative resource sharing systems such as typical cluster schedulers, in order to combine the convenience and excellent efficiency of these systems with the benefits of dynamic scaling.

¹ Described in deliverable D8.2 [2]; a video of this deployment is also available at http://cognito.cslab.ece.ntua.gr/final/scanFinal2.mp4


7 Bibliography

[1] Dimitrios Tsoumakos, Stalo Sofokleous, Ioannis Liabotis, Vangelis Floros, Christos Loverdos, Wei Xing, "Translational Cancer Detection Pipeline Design," CRUK Manchester Institute, 2014.

[2] Wei Xing, Demetris Antoniades, Andoena Balla, Vangelis Floros, Ioannis Liabotis, Giannis Giannakopoulos, Georgiana Copil, Christopher Smowton, "Translational Cancer Detection Pipeline over CELAR Application V1 and Updated Design," CRUK Manchester Institute, 2014.

[3] Christopher Smowton et al., "Analysing cancer genomics in the elastic cloud," in 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2015.

[4] Christopher Smowton, Georgiana Copil, Hong-Linh Truong, Crispin Miller, and Wei Xing, "Genome Analysis in a Dynamically Scaled Hybrid Cloud," in IEEE 11th International Conference on eScience, 2015.

[5] Athanasios Foudoulis et al., "Cloud Information System," University of Cyprus, 2015.

[6] Konstantin Skaburskas, "CELAR System Prototype," SixSQ, 2015.

[7] Aaron McKenna et al., "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data," Genome Research, vol. 20, no. 9, pp. 1297-1303, 2010.

[8] Herman J. C. Berendsen, David van der Spoel, and Rudi van Drunen, "GROMACS: A message-passing parallel molecular dynamics implementation," Computer Physics Communications, vol. 91, no. 1, pp. 43-56, 1995.

[9] Anne E. Carpenter et al., "CellProfiler: image analysis software for identifying and quantifying cell phenotypes," Genome Biology, vol. 7, no. 10, p. R100, 2006.