A WEB-BASED SOLUTION TO VISUALIZE
OPERATIONAL MONITORING OF THE LINUX
CLUSTER FOR PROTODUNE DATA
QUALITY MONITORING
BADISA MOSESANE EP-NU
Supervisor: Nektarios Benekos
Department: EP-NU
Table of Contents
Abstract
1. Introduction
2. System Overview
3. Key concepts of Puppet and Foreman as used in Neutrino cluster
4. Getting Started
4.1 Creating a CERN CentOS 7 Puppet-managed OpenStack virtual machine
4.2 Configuring Git environment
4.3 Using Facter to display VM facts / information
5. Creating Puppet manifests that apply to npcmp hostgroup
5.1 Triggering Puppet to run as a background daemon
5.2 Using Hiera to configure modules
5.3 Installing packages using Puppet
5.4 Adding Physical nodes in Foreman
5.5 Visualizing reports from Puppet
6. Conclusion
7. References
Abstract
The Neutrino computing cluster, made up of 300 Dell PowerEdge 1950 U1 nodes, plays an integral
role in the CERN Neutrino Platform (CENF). It represents an effort to foster fundamental
research in the field of neutrino physics by providing a data processing facility. The need for
data quality monitoring, coupled with automated system configuration and remote monitoring of
the cluster, cannot be overemphasized. To achieve this, a software stack has been chosen to
implement automatic propagation of configurations across all the nodes in the cluster. The bulk
of this report discusses the automated configuration management system on this cluster, which
enables the fast online data processing and Data Quality Monitoring (DQM) process for the
Neutrino Platform cluster (npcmp.cern.ch).
1. Introduction
One of the critical tasks during the protoDUNE operation period is Data Quality Monitoring
(DQM) and prompt processing. To run and maintain the data quality monitoring process, a set of
dedicated Linux server nodes has to be installed at the experimental hall (EHN1). The Neutrino
cluster consists of 300 U1 nodes, each with 16 cores and 16 GB of RAM, running CERN CentOS 7
(CC7) as the operating system. There is an evident need for an automated configuration
management system for this cluster, as manual monitoring of such an infrastructure can be
cumbersome. To manage the machines hosted in EHN1, a number of virtual machine (VM) servers are
hosted on the CERN OpenStack cloud infrastructure. A Puppet-managed virtual machine was used
and configured with Puppet to provide a longer-lived service to users.
In the Neutrino cluster, the open source tool Puppet describes machine configurations in a
declarative language, brings machines to a desired state and keeps them there through automation.
Puppet is deployed in this cluster in an agent/master architecture. Puppet is used together
with Foreman, an open source tool that helps with the management of servers by interacting
with Puppet to automate tasks and application deployment. Foreman provides a
robust web interface that allows us to provision and configure cluster nodes and to leverage its
External Node Classifier (ENC) and reporting capabilities to ease the management of Puppet.
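Puppet's declarative style can be illustrated with a minimal resource declaration; the example below is a generic sketch, not taken from the cluster's actual manifests:

```puppet
# Declare the desired state: the NTP package installed and its service running.
# Puppet works out how to reach this state on each run.
package { 'ntp':
  ensure => installed,
}
service { 'ntpd':
  ensure  => running,
  enable  => true,
  require => Package['ntp'],  # configure the service only after the package is present
}
```

On every agent run, Puppet compares the actual state of these resources with the declared state and only makes changes when they differ.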
For a fully operational cluster, the following tasks must be carried out: implementation and
monitoring of all the necessary web-based tools for DQM cluster operational monitoring, including
frontend configuration, server node and user monitoring, job monitoring, batch process
monitoring, site availability, and data management and transfers between CERN EOS, EHN1
and the outside world.
2. System Overview
The Neutrino cluster configuration management is based on Puppet and its suite of tools. The
main components include Facter, PuppetDB, Puppet manifests, Hiera (data), Foreman and
MCollective. A Puppet run occurs several times a day on all the nodes. All hosts are registered
in the Foreman service, a web front end for the configuration management system which records
the hostgroup (cluster of machines) each host belongs to. Each node periodically runs a local
Puppet agent daemon that asks the Puppet master for the configuration with which to configure
itself.
The Puppet master then asks the Foreman service which hostgroup the requesting node belongs to.
The Puppet master determines the configuration of that hostgroup (e.g. the packages needed for a
web server) by reading the Puppet manifests and Hiera data for the hostgroup, stored in a Git
repository. The Puppet master compiles the desired configuration for the node and hands it back
to the node's Puppet agent to apply. If any of the node's configuration is out of step, the
local Puppet agent applies the appropriate changes; otherwise the Puppet agent does nothing.
Foreman uses the concept of hostgroups to group nodes into a single IT service. The Neutrino
hostgroup (npcmp) is divided into sub-hostgroups to allow the service manager to split up the
sub components of the service. The sub-hostgroups are:
• npcmp/development which has development nodes,
• npcmp/balance for load balancing nodes,
• npcmp/frontend for web frontend,
• npcmp/htc for HTCondor batch system and
• npcmp/workers which are compute nodes.
The sub-hostgroups are as shown in the image below. To access the hostgroups, one must be a
member of the np-cmp-admin e-group; they can be viewed at https://judy.cern.ch/hostgroups/
3. Key concepts of Puppet and Foreman as used in Neutrino cluster
★ NPCMP Hostgroup: a property set on each node in Foreman when the node is registered.
All nodes in the npcmp hostgroup are part of the same service: they have a common
configuration and are managed by the same group of people, i.e. the np-cmp-admin
e-group.
★ Manifests: files written in the Puppet DSL which Puppet compiles on every run
for each node. We have two types of manifests: npcmp manifests, located in a GitLab
repository (https://:@gitlab.cern.ch:8443/ai/it-puppet-hostgroup-npcmp.git), which describe
nodes in the npcmp hostgroup; and module manifests, re-usable units of code, typically
each configuring one OS daemon or OS feature. The manifests are stored and versioned
using the Git versioning system.
★ Environments: collections of modules and hostgroups at different development levels.
They are defined in YAML files inside a Git repository. The Puppet masters use this
repository to check out the correct modules and hostgroups for each environment. A
machine belongs to one and only one environment (usually production). Nodes in the
production environment get the released production version of all hostgroup and module
code, while nodes in the qa environment get the pre-release version.
★ Change Control: a critical part of managing services for our cluster is the Change Control
process (QA process), designed to give maximum visibility to configuration changes
going through the system. All changes to shared modules are tested by the author in a
feature branch, then merged into the qa environment to inform others about the change.
★ Hiera: A key/value store with a hierarchical search path. The configuration values (data)
used by Puppet to describe nodes are put here. These are also stored (and versioned)
using the Git versioning system.
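To make the relationship between Hiera data and manifests concrete, here is a sketch; the file path, key name and values are illustrative assumptions, not the cluster's actual configuration:

```yaml
# Hiera data for a hostgroup (hypothetical path and key name)
npcmp_ntp_servers:
  - ntp1.example.cern.ch
  - ntp2.example.cern.ch
```

A hostgroup manifest can then retrieve the value with hiera('npcmp_ntp_servers') and pass it to a module, so that configuration data stays in versioned YAML while the manifests remain generic.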
4. Getting Started
4.1 Creating a CERN CentOS 7 Puppet-managed OpenStack virtual machine
First, we created a second-level hostgroup (npcmp) in Foreman where our virtual machines are
going to land. Next, SSH into aiadm and clone the Git repository containing all the bits of the
npcmp hostgroup into the AFS directory.
In the cloud panel, download the keypair “npcmpkey” under the Access & Security tab, run the
command shown below to make the VM known to Foreman, boot it and wait for the first two
Puppet reports. We spawned four virtual machines, np-cmp-vmldb-01, np-cmp-vmldb-02, np-cmp-
vmldb-03 and np-cmp-vmldb-04, and the instances are displayed in the OpenStack EP npcmp project.
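The exact CERN command is not reproduced here; a roughly equivalent invocation with the standard OpenStack CLI might look like the following (the image and flavor names are assumptions):

```shell
# Boot one of the VMs with the npcmpkey keypair
# (image and flavor names are illustrative, not the actual project settings)
openstack server create \
  --image "CC7 - x86_64" \
  --flavor m2.medium \
  --key-name npcmpkey \
  np-cmp-vmldb-01
```

After the instance boots, it registers with Foreman and the first Puppet reports appear once the agent has run.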
4.2 Configuring Git environment
Our it-puppet-hostgroup-npcmp repository in GitLab relies heavily on the Git version control
system to manage Puppet nodes in the Neutrino cluster. This helps coordinate developers' work
and avoids conflicts when merging code to master.
4.3 Using Facter to display VM facts / information
With the VMs ready and configured with Puppet, we can now navigate to Foreman to see the
latest Puppet reports for each VM. Once inside a VM, we can use the facter command to
display facts about the system such as the operating system, the domain, etc.
A snippet of the output of the above command is shown below:
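The facter CLI can also be queried for individual facts; for example (fact names are standard Facter facts, and any values returned depend on the machine):

```shell
# Print the full fact set
facter
# Query individual facts by name
facter operatingsystem     # the OS, e.g. CentOS
facter fqdn                # the fully qualified domain name of the VM
facter processorcount memorysize
```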
5. Creating Puppet manifests that apply to npcmp hostgroup
We created a skeleton Puppet manifest, “code/manifests/npcmp.pp”, and added the first lines of
Puppet code inside the directory where the Git repository was cloned. For the sub-hostgroup
that takes care of the batch system, we made a manifest for the htc sub-hostgroup.
After writing this manifest we add it to the staging area, commit and push to Git. The new code
becomes visible to the Puppet master in less than a minute.
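A sub-hostgroup manifest of the kind described above might look like the following sketch; the class name follows the usual hostgroup-module naming suggested by the repository layout, but should be treated as an assumption:

```puppet
# code/manifests/htc.pp -- sketch of the htc sub-hostgroup manifest
# (class name and body are illustrative)
class hostgroup_npcmp::htc {
  # A notify resource prints a message in the agent run and the Foreman report,
  # confirming that nodes in npcmp/htc are picking up this manifest.
  notify { 'npcmp/htc node configured by Puppet': }
}
```

Committing and pushing this file makes it available to the Puppet masters, which compile it for every node in the npcmp/htc sub-hostgroup on its next run.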
5.1 Triggering Puppet to run as a background daemon
Since Puppet running as a background daemon can take up to about an hour between runs, it is
useful to trigger a Puppet run manually. On our VM, np-cmp-vmldb-01, we get a root shell and
force a run to apply the new changes made in the manifest. First SSH into
[email protected] and run the command puppet agent -t -v.
The base configuration defined in the manifest file has been fully applied by Puppet. The
highlighted notify message shown in the screenshot above was edited in the init.pp file at it-
puppet-hostgroup-npcmp/code/manifests/init.pp
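The manual trigger described above amounts to the following; the --noop variant is a standard Puppet agent option worth knowing when testing new manifests:

```shell
# Force an immediate Puppet agent run in test mode with verbose output
puppet agent -t -v
# Preview what would change without actually applying anything
puppet agent -t -v --noop
```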
5.2 Using Hiera to configure modules
Modules are configurable via Hiera variables that are set at the hostgroup level. Puppet has a
set of reusable modules, maintained by several IT groups, that can be configured on our nodes.
AFS and SSSD were configured using existing modules, so our VMs have access to AFS and SSSD and
provide an interactive shell. It is worth noting that all the relevant monitoring bits for the
newly running services are brought in by Puppet automatically.
SSSD was configured and AFS was turned off in the load balancing client. Other modules can be
configured in the manifest in a similar manner. The configured SSSD can be used to grant shell
access to a given user. The image below demonstrates how to give a user access to the machines
in a hostgroup.
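Granting a user shell access through the configured SSSD module is done with Hiera data along these lines; the key name and username below are hypothetical placeholders, not the actual module parameters, which should be checked against the module's documentation:

```yaml
# Hostgroup-level Hiera data (key name and username are illustrative)
sssd::interactive_allow_users:
  - someuser
```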
5.3 Installing packages using Puppet
Puppet does away with the worry about dependencies, as it can install packages on behalf of the
user. On all Puppet-managed boxes, the CERN CentOS and EPEL repositories are managed by the
OSrepos module. This module activates yum distro-sync on operating systems of the Red Hat
family with major versions 6 and above. When using this module, repositories are installed in
/etc/yum-puppet.repos.d/. Examples of packages installed using Puppet are Ganglia, Nagios,
EOSclient, etc.
We tell Puppet to install a package by adding it to the hostgroup manifest. More packages can
be installed and made available to all nodes in a similar fashion.
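Installing a package from the hostgroup manifest uses Puppet's standard package resource; the package name below is an illustrative example, not necessarily the exact package used on the cluster:

```puppet
# Install the Ganglia monitoring daemon on every node in the hostgroup
# ('ganglia-gmond' is an illustrative package name from EPEL)
package { 'ganglia-gmond':
  ensure => installed,
}
```

Puppet resolves the package and its dependencies through yum using the repositories managed by the OSrepos module.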
5.4 Adding Physical nodes in Foreman
To add a physical machine in Foreman, we set the desired name, hostgroup, interfaces (IP, MAC,
FQDN) and operating system. Foreman resolves a kickstart template for the physical host and
triggers an installation by first preparing the host for installation and then rebooting the
box to begin the installation process.
After setting the desired parameters, OS, interfaces and additional information, we can perform
a similar task from the CLI by running the ai-installhost and ai-remote-power-control commands
on aiadm. Physical machines must be booted over the network, and to start the installation we
have to reboot the physical machine that was added as a new host in Foreman.
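The CLI path sketched above might look like the following; the host name is illustrative, and the exact arguments of these CERN-internal tools should be checked against the CERN configuration management documentation:

```shell
# Prepare the host for installation (resolves the kickstart template in Foreman)
ai-installhost np-cmp-node-01.cern.ch
# Power-cycle the box so it network-boots into the installer
# (subcommand shown is an assumption)
ai-remote-power-control cycle np-cmp-node-01.cern.ch
```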
5.5 Visualizing reports from Puppet
Foreman allows us to visualize reports from Puppet runs on each individual physical node in our
Neutrino cluster. The Reports tab shown in the image allows us to zoom in on the activity of
the nodes and to view applied, restarted, failed, skipped and pending events.
Foreman also provides a way to view the host configuration chart and the status of hosts in the
cluster. Among other things, it shows hosts that performed modifications without error, hosts
in an error state, hosts with good reports in the last day, hosts with pending changes,
out-of-sync hosts, hosts with no reports and, finally, hosts with alerts disabled. The run
distribution graph shows the number of clients over time within a window of 1500 minutes.
6. Conclusion
The Neutrino computing cluster has seen noticeable progress: most of the nodes, previously bare
metal, are now running the CERN CentOS 7 (CC7) operating system. The configuration management
of the cluster is based on Puppet, together with Foreman for its reporting capabilities. Part
of the future work for this cluster is to set up a notification system to alert users and/or
admins of the Neutrino system with real-time information about the nodes and jobs. The batch
system for the Neutrino cluster is HTCondor, a specialized workload management system for
compute-intensive jobs. A cluster monitoring software will be configured to monitor the
Neutrino cluster, chosen for its capability to scale as we envision adding more nodes to the
cluster.
7. References
CERN. (2017). CERN Configuration Management System User Guide. [online] Available at:
https://configdocs.web.cern.ch/configdocs/ [Accessed 9 Aug. 2017].
CERN. (2017). DpmSetupPuppetInstallation. [online] Available at:
https://twiki.cern.ch/twiki/bin/view/DPM/DpmSetupPuppetInstallation/ [Accessed 9 Aug. 2017].
CERN. (2017). CERN ops. [online] Available at:
https://github.com/cernops/ [Accessed 9 Aug. 2017].