Optimizing HPC resource allocation through monitoring

29
Optimizing HPC resource allocation through monitoring Alexandre Beche <[email protected]>

Transcript of Optimizing HPC resource allocation through monitoring

Page 1: Optimizing HPC resource allocation through monitoring

Optimizing HPC resource allocation through monitoringAlexandre Beche <[email protected]>

Page 2: Optimizing HPC resource allocation through monitoring

Outlines

1. Context2. Job monitoring3. Beyond Slurm monitoring

Page 3: Optimizing HPC resource allocation through monitoring

Context

Page 4: Optimizing HPC resource allocation through monitoring

Blue Brain Project

HostInstitutionEcole Polytechnique Fédérale deLausanne(EPFL)

DirectorHenryMarkram

Co-DirectorsSeanHill,FelixSchürmann

Teamtoday~85scientists,engineers&staff

Timeline2005foundedatEPFL2011/2012ETHBoardfunding2013-2016SwissNationalResearchInfrastructure

MainInternationalCollaborationsSwitzerland(CSCS,CERN)Israel(HUJI)USA(Yale,ANL,AllenBrain)Spain(UPM)SaudiArabia(KAUST)Europe(HBP)

Geneva

Page 5: Optimizing HPC resource allocation through monitoring

• 111,700 neurons/mm3

• 31,000 neurons

• 55 morphological types • ~ 2 mm thick

• 346 m of axon• 211 m of dendrites

• Excitatory to inhibitory neuron ratio of 86:14 %

• 13 excitatory & 42 inhibitory m-types

• Maximum branch order of m-types:

• 11 electrical types

• 207 morpho-electrical types

• 13 HH type ion channel models

• Ion channel density distribution profiles:

• bAP & EPSP attenuation for 207 morpho-electrical types

Axon

Dendri

tes

Excitatory

Inhibitory

• 0.63 synapses/mm3

• 2258 viable synaptic pathways

• 600 intra-laminar pathways

• 1658 inter-laminar pathways

• 664 excitatory pathways• 1594 inhibitory pathways

• Extrinsic to intrinsic synapse ratio of75:25 %

• Mean synapses/connection

• 6 synapse types

• 3025 possible synaptic pathways

• Total conductance in a single neuron is 971 nS

• 207 synaptomes

• Mean conductance per synapse: 0.85 nS for excitatory & 0.66 nS for inhibitory synapses

• The per synapse conductance of 1.5 nSfor connections between L5TTPCs is the highest in the microrcircuit

Axon

Uniform

Soma

Dendri

tes8 6 6

24 3550 17

4.3 8.5

Exc.

Inh. 274 nS697 nS

Excita

tory

Inhibi

tory

• Space clamp corrected synaptic conductances for 607 pathways

I

II/III

IV

VI

V

100 µm

Data-Driven Modeling & Simulation!

Ø 80authorsØ Jointeffortbetween

computerandneuro-scientists

Ø ReproducibleworkØ Extensible

Markram etal,Cell2015https://bbp.epfl.ch/nmc-portal

Page 6: Optimizing HPC resource allocation through monitoring

HPC Today’s Infrastructure

ProductionHPC• Modeldevelopment• Reconstruction• Simulation• SWdevelopment

4 Compute Racks4096 Compute Nodes5D CN torus0.8 PF/s peak64TB DRAM4.2 PB GPFS storage

64 IONodes3D torus128TB FlashLinux

SystemsResearch• Memoryextension• Applicationcoupling• Interactivesupercomputing• Reproducibility

IBMBlueGene/Q

IBMBlueGene ActiveStorage

ElasticCompute• Visualizationandanalysis• Webservices• SWdevelopment• ContinuousIntegration• ContinuousDeployment

IaaS4 nodes

Viz & Analytics36 nodes

IaaS12 nodes

CEPH GPFSNetAppCEPH

Viz & Analytics14 nodes

GPFS

GVA

Page 7: Optimizing HPC resource allocation through monitoring

HPC Resources usage

Facts:

• 70Users• 35Projects• 3 Clusters

Dailycorehoursavailable

DailyJobsubmitted

Activeuser*

BlueGene 1572864 103 20Lugano cluster 13824 632 53

Genevacluster 4512 259 20

*Userwhosubmittedatleastonejoboverthelastmonth

Page 8: Optimizing HPC resource allocation through monitoring

HPC Resources usage

Facts:

• 70Users• 35Projects• 3 Clusters

Dailycorehoursavailable

DailyJobsubmitted

Activeuser*

BlueGene 1572864 103 20Lugano cluster 13824 632 53

Genevacluster 4512 259 20

*Userwhosubmittedatleastonejoboverthelastmonth

Observations:

• Clustersarescheduling10xmorejobs thanbluegene• Clusternodesareshared(--exclusiveislimited),notbluegene

EmphasisofthepresentationwillbeputonHPCclusters

Page 9: Optimizing HPC resource allocation through monitoring

Job monitoring

Page 10: Optimizing HPC resource allocation through monitoring

Problem description

Extractedfromsreport

InteractiveHPCClustersusageovertime

idle

allocated

Page 11: Optimizing HPC resource allocation through monitoring

Problem description

Extractedfromsreport

InteractiveHPCClustersusageovertime

idle

allocated

Symptoms:Usercan’tgetanallocation

Cause:Clusterisfullyallocated

Solution:1) BuyabiggeroneJ2) Seeifresourcesare

optimallyused

Page 12: Optimizing HPC resource allocation through monitoring

Slurm accounting DB

Knowledge of all jobs / step executed• Average waiting time in the queue• Submission rate by user / project

“sacct” data are indexed into ElasticSearch• Near real-time (every 10 minutes)• Analytics, web report generation

Limitations:• Not natively aware of resources used by the job

Page 13: Optimizing HPC resource allocation through monitoring

BBP Monitoring infrastructure

Scalableandextensibleframework• Basedonopensourcetechnologies• Enabledatacollectionandanalysis

Page 14: Optimizing HPC resource allocation through monitoring

BBP Monitoring infrastructure

Scalableandextensibleframework• Basedonopensourcetechnologies• Enabledatacollectionandanalysis

Nativehostinstrumentation• 10 secondsresolution• 250 metrics pernode

Limitation:Systemmetricsdoesnothaveanyknowledgeaboutworkload

Page 15: Optimizing HPC resource allocation through monitoring

Dashboard 1/3

Jobmetadata

Systemmetrics

Page 16: Optimizing HPC resource allocation through monitoring

Dashboard 2/3

Page 17: Optimizing HPC resource allocation through monitoring

Dashboard 3/3

Page 18: Optimizing HPC resource allocation through monitoring

Non-optimal use case

Un-usedallocationSingle-threadused

Jobstarted

Jobfin

ishe

d

Allocationdetails:• Singlenodes,allcores,

batchpartition

Jobdetails:• CPUbound

Page 19: Optimizing HPC resource allocation through monitoring

Benefits

Developers• Analyze code from a system perspective– Non-intrusive monitoring / negligible (perf) overhead

• Detect code inefficiency / limiting resource– Non optimal parallelization

Page 20: Optimizing HPC resource allocation through monitoring

Benefits

Developers• Analyze code from a system perspective– Non-intrusive monitoring / negligible (perf) overhead

• Detect code inefficiency / limiting resource– Non optimal parallelization

System admins• Analyze system metrics with workload context• Detect non-optimal allocation– Allocation bigger than execution time

Holisticviewenablingcrossteam(competencies)debugging

Page 21: Optimizing HPC resource allocation through monitoring

Ongoing work / limitation

Ongoing work• Creating KPI out of the available metrics– Efficiency of a job (cpu seconds used /reserved)

Page 22: Optimizing HPC resource allocation through monitoring

Ongoing work / limitation

Ongoing work• Creating KPI out of the available metrics– Efficiency of a job (cpu seconds used /reserved)

Limitations• Mainly system metrics so far

– Only memory are collected at cgroup level

• Missing infiniband metrics• Job internals are hidden

Page 23: Optimizing HPC resource allocation through monitoring

Beyond Slurm monitoring

Page 24: Optimizing HPC resource allocation through monitoring

Job workload context

Focus has been put on monitoring the job in the infrastructure

No hint is given to the job internals• Job entering in a given phase

Page 25: Optimizing HPC resource allocation through monitoring

Job workload context

Focus has been put on monitoring the job in the infrastructure

No hint is given to the job internals• Job entering in a given phase

Providing a library to enhance job context• Workload manager agnostic• Not a profiling tool• Lightweight way of sharing information from the job• Allows to ship user-defined metadata

Page 26: Optimizing HPC resource allocation through monitoring

BBP_instrumentations

Page 27: Optimizing HPC resource allocation through monitoring

BBP_instrumentations

Arbitrarytags Reportmetrics

Page 28: Optimizing HPC resource allocation through monitoring

Summary

Detection of non-optimal usage • Un-used allocation• Developers now have tools to understand job behaviors

Internal job monitoring• Allows understanding which resources are consumed by

section of job through user-defined metadata

Correlation of scattered information enable powerful analysis

Page 29: Optimizing HPC resource allocation through monitoring

Acknowledgements

BBP core services & HPC teams