Monitoring of HPC and Embedded Systems

26/08/2016, Dennis Hoppe, EXCESS Workshop

Page 1: Monitoring of HPC and Embedded Systems - EXCESSexcess-project.eu/excess_workshop/Chalmers16/hlrs-monitoring-in... · Monitoring of HPC and Embedded Systems ... •Greening of the

Monitoring of HPC and Embedded Systems

Dennis Hoppe

EXCESS Workshop

Agenda

• Rationale for Monitoring

• ATOM Monitoring Framework

– Monitoring in HPC

– Monitoring of Embedded Systems

• Usage Examples

– EXCESS

– DreamCloud

– PHANTOM

• Summary

WHY MONITORING?

Rationale for Monitoring

Maintenance

Accounting

Storage Monitoring

Hardware Performance

Power and Energy Monitoring

Application Profiling

Demand for Energy Efficiency Mirrored in EU Projects

• CoolEmAll [CoolEmAll, 2011]

• DreamCloud [DreamCloud, 2013]

• ECO2Clouds [ECO2Clouds, 2012]

• ExaSolvers [ExaSolvers, 2013]

• EXCESS [EXCESS, 2013]

• JUNIPER [JUNIPER, 2012]

• PHANTOM [PHANTOM, 2015]

It’s all about saving energy

• Energy consumption is the major challenge in HPC (Exascale Challenge) [Ashby et al., 2010]

– Energy consumption must be a design goal in future algorithm design

– Standardization of interfaces and APIs to collect energy consumption data is needed

– Use of fine-grained measurement tools to evaluate energy saving effects on performance and vice versa

• Greening of the HPC domain will become as important as the greening movement of the automotive domain

Requirements on Current Monitoring Tools [Hoppe et al., 2015]

Key Property Zabbix Nagios OpenNMS

Architecture

Non-Intrusiveness

Scalability

Timeliness

Granularity

Extensibility

Data Storage

Visualization

Adaptability

Predictability

Key properties defined in [Aceto et al., 2013], [Katsaros et al., 2011], [Telesca et al., 2014]

None of the existing monitoring solutions fully satisfies the requirements imposed by current and future projects!

Towards a novel monitoring framework

ATOM MONITORING FRAMEWORK

Key Features of ATOM

• Analyzing the system's run-time context

• Low-intrusive, highly scalable architecture

• Flexible, language independent plug-in system

• RESTful Web service to push and retrieve data

• Light-weight and easy-to-grasp user library

• Integration with PBS resource manager (HPC) for on-demand monitoring of applications and infrastructure

• Web-based front-end for data exploration and analysis
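
The "RESTful Web service to push and retrieve data" feature can be illustrated with a minimal Python sketch that builds an HTTP POST for one metric sample. The host and port are taken from the appendix slide; the route `/metrics` and all payload field names are assumptions made for illustration and are not taken from the ATOM API documentation. The request is only constructed, not sent.

```python
import json
import urllib.request

# Host and port as shown on the "RESTful Web Service" appendix slide.
BASE_URL = "http://mf.excess-project.eu:3030"
# Hypothetical route: the slides do not name any endpoint.
METRICS_ROUTE = "/metrics"


def build_push_request(experiment_id, metric, value, timestamp):
    """Build (but do not send) a POST request carrying one metric sample.

    The payload field names are illustrative assumptions.
    """
    payload = {
        "experiment": experiment_id,
        "type": metric,
        "value": value,
        "@timestamp": timestamp,
    }
    return urllib.request.Request(
        url=BASE_URL + METRICS_ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_push_request(
    "demo-experiment", "power_watts", 412.5, "2016-08-26T10:00:00Z"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once the real route and schema are known from the API documentation, the request could be sent with `urllib.request.urlopen(request)`.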

ATOM Architecture

– MONITOR: ATOM monitoring server

– ACTOR: ATOM metric collector

– Rickshaw (D3.js)

– NodeJS

– Elasticsearch

MONITORING IN HPC

ATOM Setup on the HLRS/EXCESS Cluster

• Cluster is used for software development, testing, profiling, and evaluations within HLRS and for external project partners:

– highly configurable and extensible; current power consumption is roughly between 0.5 and 2.0 kW

– power measurement framework integrated with PBS system; no further performance overhead is induced while profiling applications

HLRS Power and Performance Measurement System

Metric Gathering

• Processor: PAPI-C, RAPL

• Memory: /proc/meminfo, /proc/vmstat, iostat

• Network: PAPI-C

• Graphic Cards: Nvidia SMI

• System: External Measurement System

• Software: ATOM monitoring API (HTTP, C, Java, Python, ...)
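
As an illustration of one of the sources listed above, the following Python sketch parses `/proc/meminfo`-style text and derives a memory usage percentage. It is not the ATOM plug-in itself; the function names and the sample text are made up for this example, and it assumes a kernel that reports `MemAvailable`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer} mapping.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. The unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Share of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


# Made-up sample; on Linux the real text comes from open("/proc/meminfo").read().
SAMPLE = """MemTotal:       16384000 kB
MemFree:         4096000 kB
MemAvailable:    8192000 kB
HugePages_Total:       0
"""

meminfo = parse_meminfo(SAMPLE)
print(memory_used_percent(meminfo))
```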

Sampling Rate of Metrics [Hoppe et al., 2016]

• User-defined rates for each plug-in via configuration file

– external power measurements up to 50 kHz

– stable support for sampling at up to 50 Hz (= 20 ms)

– ATOM allows for a 50 times higher resolution than standard monitoring

MONITORING OF EMBEDDED SYSTEMS

Support for Movidius Myriad2

• MV0182 development board is integrated into the EXCESS cluster

– connections through Ethernet (USB is also supported)

– integrated with PBS resource manager (well-established in HPC)

– shunts and A/D converters are integrated via daughter card MV0198

• Arduino MEGA 2560 microcontroller

– connects to MV0182 through the I2C bus

– connects to node of EXCESS cluster via USB serial interface

• Monitoring plug-in

– collects data from Arduino, and pushes it into the database
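
The collect-and-push step of such a monitoring plug-in could look roughly like the Python sketch below. The wire format (`name=value` pairs separated by `;`), the rail names, and the document fields are all assumptions made for illustration; the slides do not describe what the Arduino actually sends over the USB serial interface.

```python
import json


def parse_reading(line):
    """Parse one hypothetical 'name=value;name=value' line into floats."""
    reading = {}
    for field in line.strip().split(";"):
        if not field:
            continue  # tolerate a trailing separator
        name, sep, raw = field.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {field!r}")
        reading[name.strip()] = float(raw)
    return reading


def to_document(reading, timestamp, host):
    """Wrap a reading in a JSON-ready document (field names are assumed)."""
    document = {"@timestamp": timestamp, "host": host, "type": "movidius_power"}
    document.update(reading)
    return document


# Made-up line standing in for what would be read from the serial port.
line = "core_mW=512.5;ddr_mW=120.0\n"
document = to_document(parse_reading(line), "2016-08-26T10:00:00Z", "node01")
print(json.dumps(document, sort_keys=True))
```

In a real plug-in the line would come from the serial device and the resulting document would be sent to the monitoring server instead of being printed.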

Movidius Myriad2 Experiment Visualization

http://mf.excess-project.eu

Integration of Myriad2 into HPC Workflow

USAGE EXAMPLES

EXCESS (FP7 Project) [EXCESS, 2013]

• Energy-aware scheduling with StarPU

Diagram components: StarPU, ATOM Monitoring, Broker

DreamCloud (FP7 Project) [DreamCloud, 2013]

• Exploit monitoring data to improve task allocation

PHANTOM (H2020 Project) [PHANTOM, 2015]

• Extend ATOM’s support for embedded systems

SUMMARY

Take Away Messages

• ATOM monitoring framework

– is a light-weight and easy-to-use monitoring framework focusing on HPC and embedded system support

– has fundamental performance and energy metric support

– is easily extendable through a convenient plug-in system

– offers users various interfaces to explore profiling data (i.e. front-end, RESTful service; clients in Java, C and Python)

• Increasing demand across multiple projects including EXCESS, DreamCloud, JUNIPER, and PHANTOM

Open Source

• Apache License v2.0

• GitHub (https://github.com/excess-project)

– monitoring-frontend

– monitoring-server

– monitoring-agent

– monitoring-api

– monitoring-setup-ansible

• API documentation

– https://excess-project.github.io/monitoring-server

References

• [Ashby et al., 2010]

– The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee at the US Department of Energy Office of Science, 2010.

• [CoolEmAll, 2011] http://tricoryne.man.poznan.pl

• [DreamCloud, 2013] http://www.dreamcloud-project.eu

• [ECO2Clouds, 2012] http://eco2clouds.eu

• [ExaSolvers, 2009] http://www.parallelintime.org/projects/sppexa.html

• [EXCESS, 2013] http://www.excess-project.eu

• [Hoppe et al., 2015]

– First Prototype of Monitoring Framework for the Conventional HPC and Movidius Platforms, Technical Report FP7-611183 D3.3, EU FP7 Project EXCESS, February 2015.

• [Hoppe et al., 2016]

– Lessons Learned and Final Remarks, Technical Report FP7-611183 D3.5, EU FP7 Project EXCESS, February 2016.

• [JUNIPER, 2012] http://www.juniper-project.eu

• [Katsaros et al., 2011]

– Monitoring: A fundamental Process to provide QoS Guarantees in Cloud based Platforms, Cloud Computing: Methodology, System, and Applications, 2011.

• [Khabi et al., 2016]

– Report on the Final Evaluation Results and Discussion, Technical Report FP7-611183 D5.8, EU FP7 Project EXCESS, August 2016.

• [PHANTOM, 2015] http://www.phantom-project.org

• [Telesca et al., 2014]

– System Performance Monitoring of the ALICE Data Acquisition System with Zabbix, Journal of Physics, 2014.

APPENDIX

RESTful Web Service

http://mf.excess-project.eu:3030

ATOM Client in C

Web Front-End: List of Experiments

Visualization of Metric Data
