
Multicore at RAL

HEP SYSMAN, Jan 2014


What we used to do

• Whole-node queue configured on the CREAM CE
  – Submits to a whole-node queue in Torque, which has:
    resources_default.nodes = 1:ppn=8   (1 node; 8 virtual processors per node)
  – Maui config:
    JOBFLAGS=DEDICATEDNODE
• Used dedicated whole-node (8-core) worker nodes:
  # Whole node queue
  SRCFG[wholenode] STARTTIME=0:00:00 ENDTIME=24:00:00
  SRCFG[wholenode] PERIOD=INFINITY
  SRCFG[wholenode] CLASSLIST=gridWN
  SRCFG[wholenode] HOSTLIST=lcg0990,lcg0991,lcg0992,lcg0993,lcg0994
• More information: http://www.gridpp.rl.ac.uk/blog/tag/whole-node-jobs/
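For context, a whole-node job reaching that setup was roughly equivalent to a direct Torque submission along these lines (a sketch only: the queue name is assumed to match the gridWN class in the reservation above, and the script name is hypothetical):

  qsub -q gridWN -l nodes=1:ppn=8 run_wholenode.sh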


What we used to do

• Problems
  – CREAM CE: nothing stops single-core jobs from being submitted to, and running on, the whole-node queue
    • Mainly a problem with gLite-WMS jobs
    • Therefore limited the whole-node queue to only those VOs which asked for it
    • Still had issues with LHCb SUM tests ending up on the whole-node queue: jobs may be queued for a long time, which is not good for SUM tests
    • Is there any way around this? Partitioned resources, but then resources need to be moved manually between single-core and multi-core jobs
  – Maui wasn't good at scheduling whole-node jobs
    • Even with dedicated nodes!
    • Solution: we wrote our own scheduler (the "Kelly Scheduler"), which ran in parallel with Maui and scheduled the whole-node jobs

The situation after moving from Torque/Maui to HTCondor:

• CREAM CEs
  – Initially had 8-core queues
  – Removed them because ALICE/LHCb were submitting normal single-core SUM test jobs to them
• ARC CEs
  – Ours are configured to have only a single queue
  – Any VO can run multicore jobs if it wants to
    • Jobs just have to request > 1 CPU, e.g. by adding (count=8) to the XRSL (see the sketch below)
    • Any number of CPUs can be requested (2, 4, 8, 31, …), but a job requesting 402 CPUs, for example, is unlikely ever to run
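For illustration, a minimal XRSL job description requesting 8 cores might look like the following; apart from (count=8), the attribute values here are hypothetical boilerplate:

  &(executable="run_multicore.sh")
   (count=8)
   (memory=2000)
   (stdout="out.txt")
   (stderr="err.txt")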


Current status

• HTCondor configuration
  – All WNs are configured to have partitionable slots
  – Each WN has a single partitionable slot with a fixed set of resources (cores, memory, disk, swap, …)
  – These resources can then be divided up as necessary for jobs:
    • Dynamic slots are created from the partitionable slot
    • When dynamic slots exit, they merge back into the partitionable slot
    • When any single resource is used up (e.g. memory), no more dynamic slots can be created
  – This enables us to run jobs with different memory requirements, as well as jobs requiring different numbers of cores
  – Configuration:
    NUM_SLOTS = 1
    SLOT_TYPE_1 = cpus=100%,mem=100%,auto
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1_PARTITIONABLE = TRUE
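To give an idea of how a dynamic slot gets carved out of such a partitionable slot, a submit description requesting an 8-core slot could look like this minimal sketch (script name and resource values are hypothetical, not from this talk):

  universe   = vanilla
  executable = run_payload.sh
  # 8 cores and (illustratively) 16 GB of memory, 20 GB of disk
  request_cpus   = 8
  request_memory = 16000
  request_disk   = 20000000
  queue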


Current status

• Example WN:
  # condor_status lcg1560.gridpp.rl.ac.uk -autoformat Name State Activity SlotType Memory Cpus
  slot1@lcg1560.gridpp.rl.ac.uk     Unclaimed  Idle  Partitionable  101500  0   <- partitionable slot
  slot1_10@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          2500  1
  slot1_11@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          4000  1
  slot1_12@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_13@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          4000  1
  slot1_14@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_15@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_16@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_17@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          4000  1
  slot1_18@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          2500  1
  slot1_19@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_1@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          3000  1
  slot1_20@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          4000  1
  slot1_21@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_22@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          4000  1
  slot1_23@lcg1560.gridpp.rl.ac.uk  Claimed    Busy  Dynamic          3000  1
  slot1_2@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          3000  1
  slot1_3@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          3000  1
  slot1_4@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          3000  1
  slot1_5@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          3000  1
  slot1_6@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          4000  1
  slot1_7@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          3000  1
  slot1_8@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          4000  1
  slot1_9@lcg1560.gridpp.rl.ac.uk   Claimed    Busy  Dynamic          2000  8   <- multicore job


Current status

• If lots of single-core jobs are running, how does a multicore job ever start?
• condor_defrag daemon
  – Finds worker nodes to drain
  – Configuration parameters (sketched as config knobs below) include:
    • How often to run
    • Max number of "whole" machines (we're currently using 300)
    • Max number of concurrently draining machines (we're currently using 60)
    • Expression that specifies which machines to drain
    • Expression that specifies when to cancel draining
    • Expression that specifies which machines are already "whole" machines
    • Expression that specifies which machines are more desirable to drain
  – Our use of it is currently fairly simple
    • E.g. it doesn't know anything about demand (idle multicore jobs)
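For reference, those parameters correspond to condor_defrag configuration knobs roughly as follows; the 300 and 60 are the values quoted above, the interval is illustrative, and the elided expressions are the ones discussed on this and the next slide:

  # How often condor_defrag runs, in seconds (illustrative value)
  DEFRAG_INTERVAL = 600
  # Max number of "whole" machines (our current value)
  DEFRAG_MAX_WHOLE_MACHINES = 300
  # Max number of concurrently draining machines (our current value)
  DEFRAG_MAX_CONCURRENT_DRAINING = 60
  # Which machines to drain, when to cancel, what counts as "whole",
  # and which machines are preferred for draining
  DEFRAG_REQUIREMENTS = ...
  DEFRAG_CANCEL_REQUIREMENTS = ...
  DEFRAG_WHOLE_MACHINE_EXPR = ...
  DEFRAG_RANK = ...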

• Current setup
  – Added accounting groups (fairshares) for ATLAS multicore jobs:
    group_ATLAS.atlas
    group_ATLAS.atlas_pilot
    group_ATLAS.atlas_pilot_multicore
    group_ATLAS.prodatls
    group_ATLAS.prodatls_multicore
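Accounting groups of this kind are declared on the central manager; a minimal sketch, assuming an illustrative dynamic quota rather than our real fairshare values:

  GROUP_NAMES = group_ATLAS.atlas, group_ATLAS.atlas_pilot, group_ATLAS.atlas_pilot_multicore, group_ATLAS.prodatls, group_ATLAS.prodatls_multicore
  # Illustrative share of the pool for the multicore group; real values differ
  GROUP_QUOTA_DYNAMIC_group_ATLAS.atlas_pilot_multicore = 0.05
  GROUP_ACCEPT_SURPLUS = True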

  – Definition of "whole" machines (currently we only require 8 drained cores, not a fully drained node):
    DEFRAG_WHOLE_MACHINE_EXPR = ((Cpus == TotalCpus) || (Cpus >= 8))
  – Changed the expression that specifies which machines are more desirable to drain
    • Default:
      DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput
    • New (as of last week):
      DEFRAG_RANK = ifThenElse(Cpus >= 8, -10, (TotalCpus - Cpus)/(8.0 - Cpus))
    • This made a big improvement
      – Previously only the older 8-core machines were selected for draining
      – Now the machines with the most cores are selected for draining: with 2 idle cores, for example, a 32-core machine ranks (32 - 2)/(8 - 2) = 5, while an 8-core machine ranks only (8 - 2)/(8 - 2) = 1

Current status

• Notes on the above
  – ExpectedMachineGracefulDrainingBadput (used by the default DEFRAG_RANK) is the job runtime in CPU-seconds that would be lost if graceful draining were initiated at the time the machine's ad was published; it assumes jobs will run to their walltime limit
  – Entire machines are drained, but draining is cancelled once a minimum of 8 cores has become available

• Effect of the change to DEFRAG_RANK

• Effect of switching off condor_defrag
  – Would the number of running multicore jobs quickly drop to zero?
  – In fact it decreased slowly, then became constant


Current status

[Chart: draining worker nodes over time, annotated "Cancelled all draining"]

• Time to drain worker nodes
  – Data extracted from the log files on the WNs (/var/log/condor/StartLog), as sketched below
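A minimal sketch of that extraction, assuming one simply compares the timestamps of the drain-related messages in each StartLog (the exact message text varies between HTCondor versions):

  # Drain duration per WN ~= gap between first and last drain-related timestamps
  grep -i drain /var/log/condor/StartLog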


Current status

[Charts (preliminary), past week: jobs idle/running and multicore jobs idle/running]

• The number of running ATLAS multicore jobs has been relatively constant due to fairshares (except when condor_defrag was switched off)


Current status

• Concerns
  – Resources are wasted while slots are draining
    • Currently a fixed number of WNs (60) is always draining
    • Can this be improved? A simple option, perhaps: a cron job which adjusts the defrag parameters (see the sketch below)
  – Do we need to try to make multicore slots "sticky"?
    • i.e. when a multicore job runs, prefer to follow it with another multicore job rather than 8 single-core jobs
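As a sketch of that simple option, a cron job could scale the number of concurrently draining machines with multicore demand using condor_config_val (this assumes remote configuration is enabled; the host name and threshold values are hypothetical):

  #!/bin/sh
  # Count idle multicore jobs (JobStatus == 1 means Idle)
  IDLE_MC=$(condor_q -global -constraint 'JobStatus == 1 && RequestCpus >= 8' -autoformat ClusterId | wc -l)
  # Drain aggressively only when there is multicore demand
  if [ "$IDLE_MC" -gt 0 ]; then
      NEW=60
  else
      NEW=10
  fi
  condor_config_val -name cm.example.org -rset "DEFRAG_MAX_CONCURRENT_DRAINING = $NEW"
  condor_reconfig -name cm.example.org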

• Multiple multicore jobs per WN
  – Currently there will be at most 1 multicore job per WN (unless there aren't many single-core jobs)
  – Modifications to our condor_defrag configuration would be necessary to allow more


Future plans