Evaluation and implementation of CEP mechanisms …These metrics describe the status of the hosts...

Project report

CERN Summer Student Programme

Evaluation and implementation of CEP mechanisms to act upon infrastructure metrics monitored by Ganglia

Author: Martin Adam

Supervisors: Cristovao Cordeiro, Domenico Giordano

September 4, 2015

IT-SDC

Abstract

The LHC experiments are progressively moving towards computing resources that are provided dynamically by Cloud services. It is important to monitor the health and performance of the virtual machines of these dynamic clusters and to provide early warnings in order to prevent the problems of degraded service and interruptions due to eventual failures of the cluster nodes. The goal of the project is to develop a system that will digest monitoring information coming from the cluster, analyze it almost in real time and provide necessary input for the control engine of the workload management systems of the experiments. The system should be generic and not coupled to any experiment frameworks, so that it can be used by any LHC experiment.

Acknowledgment

I would like to express my special thanks to my supervisors Cristovao Cordeiro and Domenico Giordano for their support, guidance and great patience with me over the summer, to Luca Magnoni for much expert advice throughout the whole project, to my office mate Rocio for company when working late, and to the rest of the section IT-SDC-MI for help during my whole stay at CERN.

Contents

Abstract

1 Introduction
2 Application design
3 Technologies
4 User documentation
5 Processing Ganglia data
    5.1 Data retrieval
    5.2 Data parsing
    5.3 Processing
        5.3.1 Classifying
        5.3.2 Evaluation
    5.4 Alarms
    5.5 Storing
6 Results
7 Conclusion
Bibliography

1 Introduction

The WLCG is integrating cloud technologies, which provide an additional approach for delivering the required computing capacity [1]. With the current monitoring systems the metrics only provide information about the current status of the servers, but no alarm or automatic action is triggered when an anomaly occurs. Providing such alarms could improve the utilization of the cloud infrastructure by flagging the erratically behaving servers, which could then be deleted so that the resources they occupy are released and re-used by another server.

For monitoring the cloud infrastructure the Ganglia monitoring system is being adopted, even though it does not specialize solely in cloud monitoring. Independently of whether the monitored servers are physical or virtual, the purpose of this application is to gather Ganglia's monitoring data, re-use it to detect erratic behavior and promptly issue an alarm, which then gives the administrator or provisioner an opportunity to react and fix the situation.

This project has proven that the concept of using Complex Event Processing (CEP) mechanisms for analyzing live monitoring data and issuing online notifications can be successfully implemented. The outline of this paper is as follows: section 2 describes the general design of the application, section 3 introduces the technologies that were used, section 4 instructs the user how to use the application, section 5 describes the implementation, section 6 provides the results of a test deployment and section 7 provides some conclusions.

2 Application design

The name of the application is DAM-Alarming, where DAM stands for Data Analytics from Monitoring and represents several loosely coupled projects which gather monitoring data from the cloud infrastructure, analyze them and provide output to the users.

DAM-Alarming fetches data from Ganglia using TCP. This data is then parsed into events that are fed to the CEP engine, which processes them using several queries. The output consists of alarms, all of which are inserted into Elasticsearch and some of which are also issued as email notifications.

Figure 1: Application architecture

3 Technologies

Ganglia [2] is a scalable distributed system monitoring tool for high-performance computing systems such as clusters and grids. It allows the user to remotely view live or historical statistics (such as CPU utilization, load averages and network traffic). Ganglia is based on a hierarchical design targeted at federations of clusters. It consists of two daemons: gmond, which collects monitoring metrics, and gmetad, which polls the metrics from gmonds, aggregates them and stores them in a round-robin database.

Esper [3] is an open source event series analysis and event correlation engine. It enables you to detect rich situations in event series (historical or currently arriving) and to trigger custom actions when event conditions occur among event series. Esper is designed for high volume event correlation where millions of events coming in would make it difficult to store them all to later query using classical database architectures. Esper provides a rich Event Processing Language (EPL) to express filtering, aggregation, and joins, possibly over sliding windows of multiple event series. It also includes pattern semantics to express complex temporal causality among events (followed-by relationship).

Elasticsearch [4] is a distributed, open source search and analytics engine, designed for horizontal scalability, reliability, and easy management using schema-free JSON documents. Logstash [5] is a flexible, open source data collection, enrichment, and transportation pipeline. With connectors to common infrastructure for easy integration, Logstash is designed to efficiently process a growing list of log, event, and unstructured data sources for distribution into a variety of outputs, including Elasticsearch. Kibana [6] is an open source, interactive, browser-based analytics and search dashboard for Elasticsearch.

4 User documentation

DAM-Alarming is distributed as a jar file. When it is run from the command line, the user can specify several parameters.

    $ java -jar DAM-Alarming.jar -h
    usage: dam-alarming
        --cluster <arg>   only retrieve from these clusters
     -e                   disable email listener
     -h                   print this message
        --host <arg>      only retrieve from these hosts
     -l <arg>             logging level
     -m                   force parsing into POJO
        --server <arg>    only retrieve from these servers
        --vo <arg>        only retrieve from these VOs

The long parameters cluster, host, server and vo specify the cluster, host, server and VO respectively from which the user wants to retrieve data. The argument for each of them can be a single value or a comma-separated list of values.

The e parameter disables the email listener, which effectively turns off the email notifications, so that alarms are only logged and inserted into Elasticsearch.

The l parameter specifies the logging level. The choices are ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL and OFF. At the DEBUG level, the application logs all of its progress, including the initialization of the Esper engine and the insertion of all the events. At the ERROR level, non-fatal errors are logged, including the stack trace of the exception that caused the error. At the FATAL level, only the errors that cause the application to shut down are logged. The default level is INFO, at which the listener logs all the events that trigger alarms.

Inside the jar file there are two files with the .properties suffix, both of which are used as configuration files. The file log4j.properties configures the log4j logging engine; only the variable logDir, which specifies the directory where the log files are created, may be changed. Apache log4j [7] is a Java-based logging utility. It is easy to use and supports multiple output formats, including formats compatible with Logstash.

The file configuration.properties is the main configuration file. Specific documentation of the configuration directives can be found at the relevant places in the next section.

5 Processing Ganglia data

The application uses a two-threaded architecture. The main thread handles the initialization, the parsing and the processing of the monitoring data fetched by the second thread.

5.1 Data retrieval

The gmetad collects data from the gmonds every 15 seconds, so for consistency DAM-Alarming retrieves data with the same frequency. For this, the application needed a persistent framework to schedule the retrieving thread at a fixed rate. The Quartz scheduler [8] was evaluated, but it added extra complexity with no clear advantage over native scheduling, which was used in the end (specifically the package java.util.concurrent). The scheduler uses a single-thread pool implementation, which prevents retrieving runs from overlapping.
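The fixed-rate scheduling described above can be sketched with java.util.concurrent as follows. This is only a minimal illustration under assumptions: the class name RetrievalScheduler and its methods are hypothetical and are not taken from the application, and the demo uses a shortened period instead of the 15 seconds used in production.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the retrieval scheduling; not the application's actual class. */
final class RetrievalScheduler {
    // A single worker thread: a slow run delays the next one instead of overlapping with it.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Runs the task immediately and then repeatedly at the given fixed rate. */
    ScheduledFuture<?> start(Runnable retriever, long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(retriever, 0, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        RetrievalScheduler s = new RetrievalScheduler();
        // 15 seconds in the application; shortened here so that the demo ends quickly.
        s.start(runs::incrementAndGet, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        s.stop();
        System.out.println("retrieval runs: " + runs.get());
    }
}
```

With scheduleAtFixedRate, a run that takes longer than the period makes the following runs start late, but the same task is never executed concurrently with itself.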

The task of retrieving data from Ganglia is handled by an instance of the class XMLRetriever. On initialization it downloads a JSON file describing each Ganglia monitoring server from which the application is supposed to get monitoring data. The URL of this JSON file is specified in the configuration file under the key mainJson. Every hostname in the JSON file represents one Ganglia monitor. The object indexed by the hostname key must contain the VO, the name of the monitored clusters, the name of the JSON file describing the clusters sending data to this server, and a description:

    {
        "agmegi.cern.ch": {
            "VO": "ATLAS",
            "cluster_cfg": "egi-atlas-gmond-cluster-cfg.json",
            "description": "Ganglia Monitor for all ATLAS VMs in EGI"
        },
        ...
    }

Once this file is downloaded and parsed, the class proceeds to download and parse the JSON files from each monitoring server. If the user specified any cluster, VO or server filtering criteria, they are applied here.

The class implements the java.lang.Runnable interface. When run() is invoked by the scheduler of the main thread, all the clusters are polled for monitoring data. There is a 1.5 second timeout on polling, implemented using a future task:

    FutureTask<String> future = new FutureTask<String>(new Callable<String>() {
        public String call() {
            return getMonitoringSnapshot(monitor);
        }
    });
    try {
        exec.execute(future);
        // wait at most 1.5 seconds for the snapshot
        String out = future.get(1500, TimeUnit.MILLISECONDS);
    }
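The excerpt above omits the handling of the timeout itself. A self-contained sketch of the same idea follows; it is an assumption about how such handling could look, not the application's code. It uses ExecutorService.submit, which returns a Future, instead of an explicit FutureTask, and the class name TimedPoll and the method pollWithTimeout are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper showing a poll that is abandoned after a timeout. */
final class TimedPoll {
    /** Returns the task's result, or null if the task fails or exceeds the timeout. */
    static String pollWithTimeout(ExecutorService exec, Callable<String> task, long timeoutMs) {
        Future<String> future = exec.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                 // interrupt the slow poll
            return null;
        } catch (ExecutionException e) {
            return null;                         // the poll itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        String fast = pollWithTimeout(exec, () -> "<GANGLIA_XML/>", 1500);
        String slow = pollWithTimeout(exec, () -> { Thread.sleep(5000); return "late"; }, 100);
        System.out.println(fast + " / " + slow);
        exec.shutdownNow();
    }
}
```

Cancelling the future on timeout interrupts the worker thread, so a monitor that does not answer in time cannot block the polling of the remaining monitors for long.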

The final XML document, consisting of the monitoring snapshots from all the clusters, is wrapped in the root element REPORT so that it can be parsed as one document. Each cluster report is in turn wrapped in an added element VO. This element has two attributes, NAME and SERVER, which make it easy to identify the VO and the server to which the report belongs: NAME specifies the virtual organization that the cluster serves, and SERVER contains the URL of the monitoring server.
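The wrapping described above can be illustrated with a short sketch. The class ReportAssembler and its nested Snapshot type are hypothetical names introduced only for this example. The sketch assumes that each snapshot is a bare XML element (any XML declaration or DOCTYPE already stripped) and that the attribute values need no escaping; the demo uses plain hostnames as SERVER values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; the names are illustrative, not taken from the application. */
final class ReportAssembler {
    static final class Snapshot {
        final String vo;      // value of the NAME attribute
        final String server;  // value of the SERVER attribute
        final String xml;     // snapshot body without XML declaration or DOCTYPE
        Snapshot(String vo, String server, String xml) {
            this.vo = vo;
            this.server = server;
            this.xml = xml;
        }
    }

    /** Wraps every snapshot in a VO element and all of them in one REPORT root. */
    static String wrap(List<Snapshot> snapshots) {
        StringBuilder sb = new StringBuilder("<REPORT>");
        for (Snapshot s : snapshots) {
            sb.append("<VO NAME=\"").append(s.vo)
              .append("\" SERVER=\"").append(s.server).append("\">")
              .append(s.xml)
              .append("</VO>");
        }
        return sb.append("</REPORT>").toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap(Arrays.asList(
                new Snapshot("ATLAS", "agmegi.cern.ch", "<GANGLIA_XML/>"),
                new Snapshot("LHCb", "lgm.cern.ch", "<GANGLIA_XML/>"))));
    }
}
```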

5.2 Data parsing

The parsing of the raw data is handled by an instance of the class XMLParser, which is created when the application is launched. The parsing is done using the library org.xml.sax [9], which was chosen for its simplicity: a SAX parser does not create an in-memory representation of the XML document; instead it applies a streaming approach, which is sufficient for this use case.

The XML documents are parsed into one map object per host, each of which is then converted into an Esper event. Each event contains several fixed properties describing the host (see table 1).

These fixed properties describe the status of the host's gmond. The list of metrics describing the status of the host itself can be configured via the configuration file using the keys intMetrics and floatMetrics. Each value should contain a comma-separated list of the names of the Ganglia metrics that the user wants to parse as integers or floats respectively. All the metrics can then be accessed from Esper's EPL statements.
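A streaming parse of this kind can be sketched with the JDK's SAX API as shown below. The handler is hypothetical and much simpler than the application's XMLParser: it assumes the REPORT/VO wrapping described in section 5.1 and the usual Ganglia element and attribute names (CLUSTER NAME, HOST NAME/REPORTED/TN, METRIC NAME/VAL), fills only some of the fixed properties, and receives the intMetrics and floatMetrics lists as plain sets.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler; class and method names are illustrative only. */
final class HostReportHandler extends DefaultHandler {
    final List<Map<String, Object>> hosts = new ArrayList<>();
    private final Set<String> intMetrics;
    private final Set<String> floatMetrics;
    private String vo, monitor, cluster;
    private Map<String, Object> current;   // host being parsed, or null

    HostReportHandler(Set<String> intMetrics, Set<String> floatMetrics) {
        this.intMetrics = intMetrics;
        this.floatMetrics = floatMetrics;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        if ("VO".equals(qName)) {
            vo = a.getValue("NAME");
            monitor = a.getValue("SERVER");
        } else if ("CLUSTER".equals(qName)) {
            cluster = a.getValue("NAME");
        } else if ("HOST".equals(qName)) {
            current = new HashMap<>();
            current.put("hostname", a.getValue("NAME"));
            current.put("cluster", cluster);
            current.put("monitor", monitor);
            current.put("VO", vo);
            current.put("reported", Long.valueOf(a.getValue("REPORTED")));
            current.put("tn", Integer.valueOf(a.getValue("TN")));
        } else if ("METRIC".equals(qName) && current != null) {
            String name = a.getValue("NAME");
            if (intMetrics.contains(name)) {
                current.put(name, Integer.valueOf(a.getValue("VAL")));
            } else if (floatMetrics.contains(name)) {
                current.put(name, Float.valueOf(a.getValue("VAL")));
            }   // metrics that are not configured are skipped
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("HOST".equals(qName)) {
            hosts.add(current);   // one map per host, ready to become an event
            current = null;
        }
    }

    /** Parses a wrapped report and returns one property map per host. */
    static List<Map<String, Object>> parse(String xml, Set<String> ints, Set<String> floats) {
        HostReportHandler handler = new HostReportHandler(ints, floats);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse report", e);
        }
        return handler.hosts;
    }

    public static void main(String[] args) {
        String xml = "<REPORT><VO NAME=\"ATLAS\" SERVER=\"agmegi.cern.ch\"><GANGLIA_XML>"
                + "<CLUSTER NAME=\"c1\"><HOST NAME=\"vm1\" REPORTED=\"1440000000\" TN=\"3\">"
                + "<METRIC NAME=\"cpu_idle\" VAL=\"97.5\"/></HOST></CLUSTER>"
                + "</GANGLIA_XML></VO></REPORT>";
        System.out.println(parse(xml, Collections.<String>emptySet(),
                Collections.singleton("cpu_idle")));
    }
}
```

Because the handler only keeps the map of the host currently being read, memory use stays independent of the size of the report, which is the property that motivated the choice of SAX.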

5.3 Processing

Once the events are parsed, they are immediately sent to the Esper engine, which processes them using a four-layer alarming model, similar to the model that the Metis monitoring system [10][11] uses. The input events are the parsed monitoring reports from Ganglia. The first statement classifies the state of the host (see section 5.3.1), outputting the status of the host at that time. The second statement keeps only the events representing a state transition (see section 5.3.2). The third statement filters email notifications to reduce email traffic (see section 5.4).

Table 1: Host metrics parsed into an event

    Key              Type     Note
    hostname         String   Hostname of the VM
    cluster          String   Name of the cluster the VM belongs to
    monitor          String   Hostname of the monitoring server the VM belongs to
    VO               String   Name of the VO the VM belongs to
    reported         Long     Timestamp of the report
    timestampToDate  String   Timestamp of the report translated to a human-readable date
    gmondStart       Integer  Timestamp of when the gmond on the host started
    tn               Integer  Time since the last metric report by the host's gmond
    ip               String   IP
    location         String   Location of the machine as defined in Ganglia
    tmax             Integer  Ganglia internal parameter

Figure 2: Monitoring model

When the engine is initialized, the application loads the .epl files containing modules, which are then deployed; the statements they contain define the application's output. The list of modules to be deployed is specified by the key EPLModules in the configuration file, which makes the application's output easily configurable. There are three files with implemented modules: filter.epl, core.epl and statistics.epl. The filter.epl file contains the module that classifies the state of the VM and creates an event, which is then processed by the statements in the module core located in the file core.epl. The statistics.epl file contains statements that count the total number of raw events and the distribution of notifications over clusters and individual hostnames, and that output directly into the log file.

5.3.1 Classifying

There are two statements in the filter module. The first one filters out the hosts that last reported a long time ago; their report is no longer valid, so they are classified as UNREACHABLE. The second statement chooses one of the states OK, WARNING and ERROR by comparing the cpu_idle metric with fixed thresholds.

    insert into Check
    select
        hostname,
        avg(cpu_idle) as ave,
        cpu_idle as lastVal,
        'cpu_idle' as metricName,
        VO,
        cluster,
        monitor,
        reported,
        timestampToDate,
        'high cpu_idle' as condition,
        case
            when avg(cpu_idle) > 85
                then State.ERROR
            when avg(cpu_idle) > 70
                then State.WARNING
            when cpu_idle is null
                then State.UNKNOWN
            else State.OK
        end as state
    from GangliaReport.win:time(var_timeWindowLength min)
    group by hostname, cluster
    having count(*) > (var_timeWindowLength * 2);

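The having clause at the end prevents averaging over a window that is less than half filled. With one report every 15 seconds, a full window of var_timeWindowLength minutes holds 4 × var_timeWindowLength events per host, so the threshold var_timeWindowLength × 2 corresponds to a half-filled window. The resulting Check event is then inserted and processed by the core statements.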
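To make the logic of the classifying statement easier to follow, here is a plain Java sketch of the same rules for a single host. It only approximates the EPL semantics and does not use Esper; the class CpuIdleClassifier is a hypothetical name, and the exact eviction behavior at the window boundary is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical single-host classifier mirroring the thresholds of the EPL statement. */
final class CpuIdleClassifier {
    enum State { OK, WARNING, ERROR, UNKNOWN }

    private static final class Sample {
        final long timeMillis;
        final Double cpuIdle;   // null when the metric was not reported
        Sample(long timeMillis, Double cpuIdle) {
            this.timeMillis = timeMillis;
            this.cpuIdle = cpuIdle;
        }
    }

    private final Deque<Sample> window = new ArrayDeque<>();
    private final long windowMillis;
    private final int minCount;

    CpuIdleClassifier(int windowMinutes) {
        this.windowMillis = windowMinutes * 60_000L;
        this.minCount = windowMinutes * 2;   // half of the 4 reports expected per minute
    }

    /** Adds a report; returns the state, or null while the window is at most half full. */
    State add(long timeMillis, Double cpuIdle) {
        window.addLast(new Sample(timeMillis, cpuIdle));
        // Evict samples that have left the time window.
        while (window.peekFirst().timeMillis <= timeMillis - windowMillis) {
            window.removeFirst();
        }
        if (window.size() <= minCount) {
            return null;                     // corresponds to the having clause
        }
        double sum = 0;
        int n = 0;
        for (Sample s : window) {
            if (s.cpuIdle != null) {
                sum += s.cpuIdle;
                n++;
            }
        }
        double avg = (n == 0) ? Double.NaN : sum / n;   // NaN fails both comparisons
        if (avg > 85) return State.ERROR;
        if (avg > 70) return State.WARNING;
        if (cpuIdle == null) return State.UNKNOWN;
        return State.OK;
    }

    public static void main(String[] args) {
        CpuIdleClassifier c = new CpuIdleClassifier(1);
        for (int i = 0; i < 5; i++) {
            // one report every 15 seconds, idle CPU throughout
            System.out.println(c.add(i * 15_000L, 95.0));
        }
    }
}
```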

5.3.2 Evaluation

The first of the core statements processes the stream of Check events, groups them by hostname and cluster, and outputs only those that report a different status than the previous one.

    insert into Status
    select * from Check.std:groupwin(hostname, cluster).win:length(2)
    where state is not prev(1, state)
        and prev(1, state) is not null
        and state is not State.UNKNOWN;

It outputs a Status event representing a transition in the host state. All these events are inserted into Elasticsearch. A further filter is then applied to lower the number of events that result in email notifications, so as not to spam the user with false-positive alarms.
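The transition rule of this statement can be paraphrased in plain Java as follows. This is a sketch of the logic only, not of the Esper implementation; the class TransitionFilter and its method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical filter that lets through only state transitions per (cluster, hostname). */
final class TransitionFilter {
    enum State { OK, WARNING, ERROR, UNKNOWN }

    // Last state seen for every host, keyed by cluster and hostname.
    private final Map<String, State> previous = new HashMap<>();

    /**
     * Records the state and returns true if it differs from the previous one,
     * a previous one exists, and the new state is not UNKNOWN.
     */
    boolean isTransition(String hostname, String cluster, State state) {
        State prev = previous.put(cluster + "/" + hostname, state);
        return prev != null && state != prev && state != State.UNKNOWN;
    }

    public static void main(String[] args) {
        TransitionFilter f = new TransitionFilter();
        State[] checks = { State.OK, State.OK, State.ERROR, State.ERROR, State.OK };
        for (State s : checks) {
            System.out.println(s + " -> " + f.isTransition("vm1", "c1", s));
        }
    }
}
```

As in the EPL statement, the first check of a host never counts as a transition because there is no previous state to compare it with.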

5.4 Alarms

When the output of a statement has to be handled differently than just being inserted into another statement, an instance of a class implementing the UpdateListener [12] interface has to be attached to the statement.

The application outputs to a log file for debugging purposes (handled by the SimpleListener class), but the main aim is to alert the user. For this reason there is an EMailListener attached to the top-level statement. The application also features other classes implementing the UpdateListener interface, but apart from the JsonListener, which inserts data into Elasticsearch, they serve mainly statistics and debugging purposes.

Figure 3: Chart from Kibana4 visualising the number of notifications in time

5.5 Storing

Figure 4: Kibana4 pie chart visualizing the distribution of notification types

All the Status events are inserted into Elasticsearch. There are two Kibana dashboards (see footnotes 1 and 2) providing statistics and visualization for alarms (see figure 3). Both feature several groupings and statistics reporting the most erratic clusters and VMs, the distribution of state types (see figure 4) and more. This became a crucial part of the development process, especially when tuning the EPL statements (figure 5 illustrates how Kibana can help with an individual case study).

Figure 5: Chart from Kibana3 visualising the number of notifications in timeper one host

1 https://dashb-es-dev.cern.ch:444/#/dashboard/DAM-Alarming

2 https://dashb-es-dev.cern.ch/_plugin/kibana/hashtag/dashboard/file/default.json/hashtag/#/dashboard/elasticsearch/DAM-CEP


6 Results

The prototype of the application was launched two weeks into development, and the results were presented at the IT-SDC-MI section meeting on 20 July. The prototype was getting data only from the LHCb cloud Ganglia monitoring server (see footnote 3) and had only the filtering statement deployed; the advanced alarming model was yet to be implemented. The test confirmed that this statement could identify a VM behaving erratically.

Figure 6: Top 10 alarming hosts in the first live run

After all the components of the application had been unit tested, the alarming model was improved and the application was launched for all the production Ganglia monitoring servers. Over the first 17 hours 21000 emails were sent, which was considered too many for manual evaluation, so the statistics EPL module was developed, and over the next two days the application was not only sending emails but also grouping data and writing statistics snapshots into a log file. There were two evaluated runs, lasting 16 and 17 hours respectively. The statistics indicated that the distribution of notifications is not uniform over all the VMs; rather, some of them change status rapidly and thus produce a large number of alarms.

After tweaking the thresholds and the length of the window over which the average metrics are computed, the application produced emails at a rate of 500 per hour, so the need for filtering email notifications arose and the third layer of EPL was added (see the top 10 most alarming hosts of run 1 in figure 6 and of run 2 in figure 7; notice the decrease in the total number of notifications for the most alarming host).

3 lgm.cern.ch

Figure 7: Top 10 alarming hosts in second live run after tweaking the filteringstatement

The evaluation of those statistics was time-consuming, so the decision was made to connect the output to Elasticsearch. The resulting data plots (see figure 5) helped in tweaking the statements so that the number of false-positive alarms could be minimized.

The final performance of the application was presented at a White Area lecture [13] on 2 September. The results presented covered a period of 24 hours, from 27 August 12:00 to 28 August 12:00, during which the application, while monitoring 4417 servers, inserted 8737 status updates into Elasticsearch and issued 106 email notifications.

7 Conclusion

The anomaly detection and alarming system based on raw monitoring data coming from Ganglia was implemented successfully, and thus the concept of using CEP to analyze monitoring data from cloud cluster infrastructure was proven. The application was tested on 5 production Ganglia monitoring servers serving ATLAS, CMS and LHCb, processing more than 65 000 events/hour, detecting anomalies and erratic behavior based on the cpu_idle metric, and issuing alarms without spamming the user with emails. Still to be evaluated are the use of more metrics and the possibility of correlating them in one event, as well as more elaborate ways to reduce email traffic, including flapping detection and weekly cluster reports with live notifications based only on the overall cluster performance.

References

[1] Ian Bird. Computing for the Large Hadron Collider. Annual Review of Nuclear and Particle Science. 2011-11-23, vol. 61(1):99-118. DOI: 10.1146/annurev-nucl-102010-130059.

[2] Matt Massie, Bernard Li. Monitoring with Ganglia. 1st ed. Sebastopol, CA: O'Reilly, 2013. ISBN 978-144-9329-709.

[3] Esper official webpage. Referenced: 2015-08-26. URL: http://www.espertech.com/products/esper.php

[4] Elasticsearch official webpage. Referenced: 2015-09-02. URL: https://www.elastic.co/products/elasticsearch

[5] Logstash official webpage. Referenced: 2015-09-02. URL: https://www.elastic.co/products/logstash

[6] Kibana official webpage. Referenced: 2015-09-02. URL: https://www.elastic.co/products/kibana

[7] Log4j official webpage. Referenced: 2015-09-02. URL: http://wiki.apache.org/logging/log4j1

[8] Quartz scheduler official webpage. Referenced: 2015-08-25. URL: http://quartz-scheduler.org/

[9] SAX official webpage. Referenced: 2015-08-25. URL: http://sax.sourceforge.net/

[10] Messaging at CERN. Referenced: 2015-09-03. URL: https://mig.web.cern.ch/mig/

[11] Luca Magnoni. Advanced monitoring with complex stream processing. Referenced: 2015-09-03. URL: https://indico.cern.ch/event/382420/

[12] Esper JavaDoc: UpdateListener interface. Referenced: 2015-09-03. URL: http://www.espertech.com/esper/release-5.2.0/esper-javadoc/com/espertech/esper/client/UpdateListener.html

[13] White Area Lecture: Martin Adam. Evaluation and implementation of CEP mechanisms to act upon infrastructure metrics monitored by Ganglia. Referenced: 2015-09-03. URL: https://indico.cern.ch/event/441530/
