A WEB-BASED SOLUTION TO VISUALIZE
OPERATIONAL MONITORING OF THE LINUX
CLUSTER FOR PROTODUNE DATA
QUALITY MONITORING
BADISA MOSESANE EP-NU
Supervisor: Nektarios Benekos
Department: EP-NU
Table of Contents
Abstract
1. Introduction
2. System Overview
3. Key concepts of Puppet and Foreman as used in Neutrino cluster
4. Getting Started
4.1 Creating a CERN CentOS 7 Puppet-managed OpenStack virtual machine
4.2 Configuring Git environment
4.3 Using Facter to display VM facts / information
5. Creating Puppet manifests that apply to npcmp hostgroup
5.1 Triggering Puppet to run as a background daemon
5.2 Using Hiera to configure modules
5.3 Installing packages using Puppet
5.4 Adding Physical nodes in Foreman
5.5 Visualizing reports from Puppet
6. Conclusion
7. References
Abstract
The Neutrino computing cluster, made up of 300 Dell PowerEdge 1950 U1 nodes, plays an integral
role in the CERN Neutrino Platform (CENF). It represents an effort to foster fundamental
research in the field of neutrino physics by providing a data processing facility. The need for
data quality monitoring, coupled with automated system configuration and remote monitoring of
the cluster, cannot be overemphasized. To achieve this, a software stack has been chosen to
implement automatic propagation of configurations across all the nodes in the cluster. The bulk
of this report discusses the automated configuration management system on this cluster, which
enables the fast online data processing and Data Quality Monitoring (DQM) process for the
Neutrino Platform cluster (npcmp.cern.ch).
1. Introduction
One of the critical tasks during the protoDUNE operation period is Data Quality Monitoring
(DQM) and prompt processing. To run and maintain the data quality monitoring process, a set of
dedicated Linux server nodes has to be installed at the experimental hall (EHN1). The Neutrino
cluster consists of 300 U1 nodes, each with 16 cores and 16 GB of RAM, running CERN CentOS 7
(CC7) as the operating system. There is an evident need for an automated configuration
management system for this cluster, as manual monitoring of such an infrastructure can be
cumbersome. To manage the machines hosted in EHN1, a number of virtual machine (VM) servers are
hosted on the CERN OpenStack cloud infrastructure. A Puppet-managed virtual machine was used
and configured with Puppet to provide a longer-lived service to users.
In the Neutrino cluster, the open source tool Puppet describes machine configurations in a
declarative language, brings machines to a desired state and keeps them there through automation.
Puppet is deployed in this cluster in an agent/master architecture. Puppet is used together
with Foreman, an open source tool that helps with the management of servers by interacting
with Puppet to automate tasks and application deployment. Foreman provides a
robust web interface that allows us to provision and configure cluster nodes and to leverage its
External Node Classifier (ENC) and reporting capabilities to ease the management of Puppet.
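Puppet's declarative style can be illustrated with a minimal resource declaration; the example below is a generic sketch, not taken from the cluster's actual manifests:

```puppet
# Declare the desired state: the NTP package installed and its service running.
# Puppet works out how to reach this state on each run.
package { 'ntp':
  ensure => installed,
}
service { 'ntpd':
  ensure  => running,
  enable  => true,
  require => Package['ntp'],  # configure the service only after the package is present
}
```

On every agent run, Puppet compares the actual state of these resources with the declared state and only makes changes when they differ.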
For a fully operational cluster, the following tasks must be carried out: implementation and
monitoring of all the necessary web-based tools for DQM cluster operational monitoring, including
frontend configuration, server node and user monitoring, job monitoring, batch process
monitoring, site availability, and data management and transfers between CERN EOS, EHN1
and the outside world.
2. System Overview
The Neutrino cluster configuration management is based on Puppet and its suite of tools. The
main components include Facter, PuppetDB, Puppet manifests, Hiera (data), Foreman and
MCollective. A Puppet run occurs several times a day on all the nodes. All hosts are registered
in the Foreman service, a web front end for the configuration management system which records
the hostgroup (cluster of machines) each host belongs to. Each node periodically runs a local
Puppet agent daemon that asks the Puppet master for the configuration with which to configure
itself.
The Puppet master then asks the Foreman service which hostgroup the requesting node belongs to.
The Puppet master determines the configuration of that hostgroup (e.g. the packages needed for a
web server) by reading the Puppet manifests and Hiera data for the hostgroup, stored in a Git
repository. The Puppet master compiles the desired configuration for the node and hands it back
to the node's Puppet agent to apply. If any of the node's configuration is out of step, the
local Puppet agent applies the appropriate changes; otherwise the Puppet agent does nothing.
Foreman uses the concept of hostgroups to group nodes into a single IT service. The Neutrino
hostgroup (npcmp) is divided into sub-hostgroups to allow the service manager to split up the
sub components of the service. The sub-hostgroups are:
• npcmp/development which has development nodes,
• npcmp/balance for load balancing nodes,
• npcmp/frontend for web frontend,
• npcmp/htc for HTCondor batch system and
• npcmp/workers which are compute nodes.
The sub-hostgroups are as shown in the image below. To access the hostgroups, one must be a
member of the np-cmp-admin e-group; they can be viewed at https://judy.cern.ch/hostgroups/
3. Key concepts of Puppet and Foreman as used in Neutrino cluster
★ NPCMP Hostgroup: a property set on each node in Foreman when the node is registered.
All nodes in the npcmp hostgroup are part of the same service: they have a common
configuration and are managed by the same group of people, i.e. the np-cmp-admin
e-group.
★ Manifests: files written in the Puppet DSL which Puppet compiles on every run
for each node. We have two types of manifests: npcmp manifests, located in a GitLab
repository (https://:@gitlab.cern.ch:8443/ai/it-puppet-hostgroup-npcmp.git), which describe
nodes in the npcmp hostgroup; and module manifests, re-usable units of code, typically
each configuring one OS daemon or OS feature. The manifests are stored and versioned
using the Git versioning system.
★ Environments: collections of modules and hostgroups at different development levels.
They are defined in YAML files inside a Git repository. The Puppet masters use this
repository to check out the correct modules and hostgroups for each environment. A
machine belongs to one and only one environment (usually production). Nodes in the
production environment get the released production version of all hostgroup and module
code, while nodes in the qa environment get the pre-release version.
★ Change Control: a critical part of managing services for our cluster is the Change Control
process (QA process), designed to give maximum visibility to configuration changes
going through the system. All changes to shared modules are tested by the author in a
feature branch, then merged into the qa environment to inform others about the change.
★ Hiera: A key/value store with a hierarchical search path. The configuration values (data)
used by Puppet to describe nodes are put here. These are also stored (and versioned)
using the Git versioning system.
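To make the relationship between Hiera data and manifests concrete, here is a sketch; the file path, key name and values are illustrative assumptions, not the cluster's actual configuration:

```yaml
# Hiera data for a hostgroup (hypothetical path and key name)
npcmp_ntp_servers:
  - ntp1.example.cern.ch
  - ntp2.example.cern.ch
```

A hostgroup manifest can then retrieve the value with hiera('npcmp_ntp_servers') and pass it to a module, so that configuration data stays in versioned YAML while the manifests remain generic.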
4. Getting Started
4.1 Creating a CERN CentOS 7 Puppet-managed OpenStack virtual machine
First, we created a second-level hostgroup (npcmp) in Foreman where our virtual machines are
going to land. Next, SSH into aiadm and clone the Git repository containing all the bits of the
npcmp hostgroup into the AFS directory.
In the cloud panel, download the keypair “npcmpkey” under the Access & Security tab, run the
command shown below to make the VM known to Foreman, boot it and wait for the first two
Puppet reports. We spawned four virtual machines, np-cmp-vmldb-01, np-cmp-vmldb-02, np-cmp-
vmldb-03 and np-cmp-vmldb-04, and the instances are displayed in the OpenStack EP npcmp project.
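The exact CERN command is not reproduced here; a roughly equivalent invocation with the standard OpenStack CLI might look like the following (the image and flavor names are assumptions):

```shell
# Boot one of the VMs with the npcmpkey keypair
# (image and flavor names are illustrative, not the actual project settings)
openstack server create \
  --image "CC7 - x86_64" \
  --flavor m2.medium \
  --key-name npcmpkey \
  np-cmp-vmldb-01
```

After the instance boots, it registers with Foreman and the first Puppet reports appear once the agent has run.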
4.2 Configuring Git environment
Our it-puppet-hostgroup-npcmp repository in GitLab relies heavily on the Git version control
system to manage Puppet nodes in the Neutrino cluster. This helps coordinate developers' work
and avoids conflicts when merging code to master.
4.3 Using Facter to display VM facts / information
With the VMs ready and configured with Puppet, we can now navigate to Foreman to see the
latest Puppet reports for each VM. Once inside a VM, we can use the facter command to
display facts about the system such as the operating system, the domain, etc.
A snippet of the output of the above command is shown below:
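The facter CLI can also be queried for individual facts; for example (fact names are standard Facter facts, and any values returned depend on the machine):

```shell
# Print the full fact set
facter
# Query individual facts by name
facter operatingsystem     # the OS, e.g. CentOS
facter fqdn                # the fully qualified domain name of the VM
facter processorcount memorysize
```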
5. Creating Puppet manifests that apply to npcmp hostgroup
We created a skeleton Puppet manifest, “code/manifests/npcmp.pp”, and added the first lines of
Puppet code inside the directory where the Git repository was cloned. For the sub-hostgroup
that takes care of the batch system, we made a manifest for the htc sub-hostgroup.
After writing this manifest we add it to the staging area, commit and push to Git. The new code
becomes visible to the Puppet master in less than a minute.
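A sub-hostgroup manifest of the kind described above might look like the following sketch; the class name follows the usual hostgroup-module naming suggested by the repository layout, but should be treated as an assumption:

```puppet
# code/manifests/htc.pp -- sketch of the htc sub-hostgroup manifest
# (class name and body are illustrative)
class hostgroup_npcmp::htc {
  # A notify resource prints a message in the agent run and the Foreman report,
  # confirming that nodes in npcmp/htc are picking up this manifest.
  notify { 'npcmp/htc node configured by Puppet': }
}
```

Committing and pushing this file makes it available to the Puppet masters, which compile it for every node in the npcmp/htc sub-hostgroup on its next run.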
5.1 Triggering Puppet to run as a background daemon
Since Puppet running as a background daemon can take up to about an hour between runs, it is
useful to trigger a Puppet run manually. On our VM, np-cmp-vmldb-01, we get a root shell and
force a run to apply the new changes made in the manifest. First SSH into
[email protected] and run the command puppet agent -t -v.
The base configuration defined in the manifest file has been fully applied by Puppet. The
highlighted notify message shown in the screenshot above was edited in the init.pp file at it-
puppet-hostgroup-npcmp/code/manifests/init.pp
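The manual trigger described above amounts to the following; the --noop variant is a standard Puppet agent option worth knowing when testing new manifests:

```shell
# Force an immediate Puppet agent run in test mode with verbose output
puppet agent -t -v
# Preview what would change without actually applying anything
puppet agent -t -v --noop
```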
5.2 Using Hiera to configure modules
Modules are configurable via Hiera variables that are set at the hostgroup level. Puppet has a
set of reusable modules, maintained by several IT groups, that can be configured on our nodes.
AFS and SSSD were configured using existing modules, so our VMs have access to AFS and SSSD and
provide an interactive shell. It is worth noting that all the relevant monitoring bits for the
newly running services are brought in by Puppet automatically.
SSSD was configured and AFS was turned off in the load balancing client. Other modules can be
configured in the manifest in a similar manner. The configured SSSD can be used to grant shell
access to a given user. The image below demonstrates how to give a user access to the machines
in a hostgroup.
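Granting a user shell access through the configured SSSD module is done with Hiera data along these lines; the key name and username below are hypothetical placeholders, not the actual module parameters, which should be checked against the module's documentation:

```yaml
# Hostgroup-level Hiera data (key name and username are illustrative)
sssd::interactive_allow_users:
  - someuser
```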
5.3 Installing packages using Puppet
Puppet does away with the worry about dependencies, as it can install packages on behalf of the
user. On all Puppet-managed boxes, the CERN CentOS and EPEL repositories are managed by the
OSrepos module. This module activates yum distro-sync on operating systems of the Red Hat
family with major versions 6 and above. When using this module, repositories are installed in
/etc/yum-puppet.repos.d/. Examples of packages installed using Puppet are Ganglia, Nagios,
EOSclient, etc.
We tell Puppet to install a package by adding it to the hostgroup manifest. More packages can
be installed and made available to all nodes in a similar fashion.
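Installing a package from the hostgroup manifest uses Puppet's standard package resource; the package name below is an illustrative example, not necessarily the exact package used on the cluster:

```puppet
# Install the Ganglia monitoring daemon on every node in the hostgroup
# ('ganglia-gmond' is an illustrative package name from EPEL)
package { 'ganglia-gmond':
  ensure => installed,
}
```

Puppet resolves the package and its dependencies through yum using the repositories managed by the OSrepos module.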
5.4 Adding Physical nodes in Foreman
To add a physical machine in Foreman, we set the desired name, hostgroup, interfaces (IP, MAC,
FQDN) and operating system. Foreman resolves a kickstart template for the physical host and
triggers an installation by first preparing the host for installation and then rebooting the
box to begin the installation process.
After setting the desired parameters, OS, interfaces and additional information, we can perform
a similar task from the CLI by running the ai-installhost and ai-remote-power-control commands
on aiadm. Physical machines must be booted over the network, and to start the installation we
have to reboot the physical machine that was added as a new host in Foreman.
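The CLI path sketched above might look like the following; the host name is illustrative, and the exact arguments of these CERN-internal tools should be checked against the CERN configuration management documentation:

```shell
# Prepare the host for installation (resolves the kickstart template in Foreman)
ai-installhost np-cmp-node-01.cern.ch
# Power-cycle the box so it network-boots into the installer
# (subcommand shown is an assumption)
ai-remote-power-control cycle np-cmp-node-01.cern.ch
```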
5.5 Visualizing reports from Puppet
Foreman allows us to visualize reports from Puppet runs on each individual physical node in our
Neutrino cluster. The Reports tab shown in the image allows us to zoom in on the activity of
the nodes and to view applied, restarted, failed, skipped and pending events.
Foreman also provides a way to view the host configuration chart and the status of hosts in the
cluster. Among other things, it shows hosts that performed modifications without error, hosts
in an error state, hosts with good reports in the last day, hosts with pending changes,
out-of-sync hosts, hosts with no reports and, finally, hosts with alerts disabled. The run
distribution graph shows the number of clients over time within a window of 1500 minutes.
6. Conclusion
The Neutrino computing cluster has seen noticeable progress: most of the nodes, previously bare
metal, are now running the CERN CentOS 7 (CC7) operating system. The configuration management
of the cluster is based on Puppet, together with Foreman for its reporting capabilities. Part
of the future work for this cluster is to set up a notification system to alert users and/or
admins of the Neutrino system with real-time information about the nodes and jobs. The batch
system for the Neutrino cluster is HTCondor, a specialized workload management system for
compute-intensive jobs. A cluster monitoring software will be configured to monitor the
Neutrino cluster, chosen for its capability to scale as we envision adding more nodes to the
cluster.
7. References
CERN. (2017). CERN Configuration Management System User Guide. [online] Available at:
https://configdocs.web.cern.ch/configdocs/ [Accessed 9 Aug. 2017].
CERN. (2017). DpmSetupPuppetInstallation. [online] Available at:
https://twiki.cern.ch/twiki/bin/view/DPM/DpmSetupPuppetInstallation/ [Accessed 9 Aug. 2017].
CERN. (2017). CERN ops. [online] Available at:
https://github.com/cernops/ [Accessed 9 Aug. 2017].