Munin and Nagios

Stig Sandbeck Mathisen

Redpill Linpro AS

2010-10-07

PRODUCTS • DEVELOPMENT • APPLICATION MANAGEMENT • IT OPERATIONS • SUPPORT • TRAINING

Outline

1 Introduction

2 The Munin project

3 Munin master and node

4 Munin plugins

5 Integration with Nagios

6 Example graph set

7 New features in Munin 1.4

8 New features in Munin 2.0

9 Speeding up munin

10 The end

About the speaker

System Administrator - 12 years with open source based system administration. 5 years at Redpill Linpro.

Debian Developer - varnish, munin, puppet, facter

About Redpill Linpro

Leading Nordic provider of professional Open Source services - across the stack.

Presence in Denmark, Finland, Norway and Sweden

Hosting and training facilities in Oslo and Karlstad.

190 employees in Gothenburg, Helsinki, Karlstad, Oslo, Stavanger and Stockholm.

More than 300 customers across the Nordic countries, 60% in the “enterprise tier”

15 years in the Open Source business.

The Munin project

Introduction to Munin

Munin is a networked resource monitoring tool. It gathers data from your systems, creates lots of graphs, and...

can show you trends

can help you predict bottlenecks

can show you old data for comparison with current numbers

can send events to other systems

History

Originally called LRRD, about the time Nagios was still known as Netsaint. The old code is still available as the “LRRD” project at SourceForge.

2002: Started by Linpro

2004: 1.0 released, development moved from CVS to Subversion

2005: 1.2 released

2009: 1.4 released

Munin master and node

Master

The munin master runs from cron. It runs four jobs, each with its own lock. A new cron job can begin while the previous one is still running.

munin-update: Contacts each node, and retrieves plugin configuration and data

munin-graph: Creates graphs from RRD files

munin-limits: Checks for limit breaches

munin-html: Updates the HTML documents

Node

The munin node listens for connections on 4949/TCP, and runs plugins on request. It runs each plugin when the master asks, to retrieve configuration and values.

The node uses Net::Server

Plugins are commonly written in shell, perl, python, ruby, awk...

SNMP plugins run on a node, and query other hosts

Wire protocol

The wire protocol is simple and in clear text. Keywords are “list”, “config” and “fetch”.

Example

# munin node at puppet1.example.org
list
apache_accesses apache_processes [...] uptime users vmstat
config uptime
graph_title Uptime
[...]
.
fetch uptime
uptime.value 148.03
.
quit
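
The same exchange can be driven from a script. What follows is a minimal sketch in Python, not code from Munin itself: `parse_fetch` and `fetch` are hypothetical helper names, and the sketch assumes the node sends a one-line greeting before it accepts commands, as in the example session. Values are kept as strings, on the assumption that a node may report something non-numeric for an unknown value.

```python
import socket


def parse_fetch(lines):
    """Collect 'field.value X' lines into a dict, stopping at the lone '.'."""
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        key, _, raw = line.partition(" ")
        if key.endswith(".value") and raw:
            values[key[: -len(".value")]] = raw
    return values


def fetch(host, plugin, port=4949, timeout=10.0):
    """Connect to a munin node, send 'fetch <plugin>' and return the values."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("r", encoding="utf-8") as reader:
        reader.readline()  # skip the greeting line
        sock.sendall(f"fetch {plugin}\n".encode())
        reply = []
        for line in reader:
            reply.append(line)
            if line.strip() == ".":
                break
        sock.sendall(b"quit\n")
    return parse_fetch(reply)
```

Calling `fetch("puppet1.example.org", "uptime")` against the node in the example would be expected to return a dict with a single `uptime` entry.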

Munin plugins

What is a munin plugin?

A plugin is a standalone executable. It is run by the Munin Node when the Munin Master connects. The plugin prints a clear text key/value list of configuration and values on STDOUT.

plugins

Most plugins are shell scripts, but many are made in perl, python or ruby.

Plugin design

Munin plugins are designed to be simple to develop. If run with the argument “config”, a plugin displays its configuration. If run with no arguments, it outputs its values.

Magic markers inside the plugin list other capabilities, like the optional arguments “autoconf” and “suggest”.

Magic markers

#%# capabilities=autoconf
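
As an illustration of this design, here is a minimal sketch of an uptime plugin in Python. It is not a plugin shipped with Munin: the function names and the field name `uptime` are chosen for the example, and it assumes a Linux-style `/proc/uptime`. For “autoconf” it answers “yes” or “no (reason)”.

```python
#!/usr/bin/env python3
#%# capabilities=autoconf
import sys


def read_uptime(path="/proc/uptime"):
    """Return (uptime_seconds, idle_seconds) from a Linux-style uptime file."""
    with open(path) as handle:
        up, idle = handle.read().split()[:2]
    return float(up), float(idle)


def run(argv, path="/proc/uptime"):
    """Return the lines the plugin prints for the given command line."""
    mode = argv[1] if len(argv) > 1 else ""
    if mode == "autoconf":
        try:
            read_uptime(path)
        except OSError:
            return ["no (cannot read %s)" % path]
        return ["yes"]
    if mode == "config":
        return [
            "graph_title Uptime",
            "graph_vlabel uptime in days",
            "uptime.label uptime",
        ]
    # No argument: print the current value.
    up, _idle = read_uptime(path)
    return ["uptime.value %.2f" % (up / 86400)]


if __name__ == "__main__":
    print("\n".join(run(sys.argv)))
```

Keeping the logic in `run` makes each of the three modes easy to check without a running node.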

Plugin configuration

Each plugin has a sensible default configuration. You can configure plugins in /etc/munin/plugin-conf.d/ on the munin node. You can also configure plugins in munin.conf on the munin master.

Configuration items

user (default is “nobody”)

group (default is “nogroup”)

Environment variables for the plugin

Example

#!/bin/sh

case "$1" in
config)
    echo 'graph_title System Boredom Index'
    echo 'graph_vlabel boredom in %'
    echo 'time.label Total time'
    echo 'bored.label Bored time'
    ;;
*)
    # /proc/uptime holds two numbers: seconds of uptime and seconds spent idle
    awk '{printf "time.value %d\nbored.value %d\n", $1, $2}' /proc/uptime
    ;;
esac

Boring example graph

A single line is described as a label, and given a value.

Several lines can be combined in interesting ways with CDEF.

Example

time.graph no
bored.graph no
foo.cdef time,100,*,bored,/
foo.label Part time bored silly
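
A CDEF is written in reverse Polish notation, so `time,100,*,bored,/` reads as (time × 100) / bored. The sketch below is a toy evaluator in Python for illustration only: `eval_rpn` is a hypothetical name, it handles just the four arithmetic operators rather than the full RRDtool operator set, and the sample numbers are made up.

```python
def eval_rpn(expression, fields):
    """Evaluate a comma-separated RPN expression over named field values."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split(","):
        if token in ops:
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[token](left, right))
        elif token in fields:
            stack.append(float(fields[token]))
        else:
            stack.append(float(token))  # a numeric constant
    if len(stack) != 1:
        raise ValueError("malformed expression: %r" % expression)
    return stack[0]


# Made-up sample values: time=50, bored=200 gives (50 * 100) / 200.
result = eval_rpn("time,100,*,bored,/", {"time": 50.0, "bored": 200.0})
```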

Integration with Nagios

Munin events

Munin generates events whenever a value rises above, or sinks below, a predefined limit. Many plugins support limits, but some do not set them by default.

Example

[cpu]
env.iowait_warning 5
env.iowait_critical 20
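
To make the idea of a limit breach concrete, here is a simplified model in Python. It is a sketch and not the munin-limits implementation: the function names are hypothetical, and it assumes a limit is written as `min:max`, `min:`, `:max`, or a bare number meaning an upper bound.

```python
def parse_range(spec):
    """Parse 'min:max', 'min:', ':max' or 'max' into a (low, high) pair."""
    low_text, sep, high_text = spec.partition(":")
    if not sep:  # a bare number is treated as the upper bound
        low_text, high_text = "", spec
    low = float(low_text) if low_text else None
    high = float(high_text) if high_text else None
    return low, high


def outside(value, spec):
    """True when value is below the lower or above the upper bound."""
    low, high = parse_range(spec)
    return (low is not None and value < low) or (high is not None and value > high)


def classify(value, warning=None, critical=None):
    """Return CRITICAL, WARNING or OK; the critical limit is checked first."""
    if critical is not None and outside(value, critical):
        return "CRITICAL"
    if warning is not None and outside(value, warning):
        return "WARNING"
    return "OK"
```

With the limits from the example, `classify(7, "5", "20")` yields a warning and `classify(25, "5", "20")` a critical.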

Munin and Nagios events

Munin ships with usable defaults to send events to Nagios. Events...

correspond with Nagios events

are processed by a templating system

are sent to a defined contact

Supported events

CRITICAL

WARNING

UNKNOWN

OK

Integration with Nagios

The contact.nagios.text template is defined by default in munin. Use the contact.nagios.command configuration setting to send events.

where?

This is configured in /etc/munin/munin.conf

Configuration for the munin master

On the munin master, we define a contact for nagios using send_nsca, and configure send_nsca to encrypt the messages.

example munin.conf

contact.nagios.command /usr/sbin/send_nsca \
    -H nagios.example.com -c /etc/send_nsca.cfg

example send_nsca.cfg

password="I like cheese!"
encryption_method=8
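
For a passive service check, send_nsca reads one line per result on standard input: host name, service description, return code and plugin output, separated by tabs. The Python sketch below builds such a line and pipes it to the command from the munin.conf example. The helper names are hypothetical, and in a normal setup Munin produces this text itself from the contact.nagios.text template, so this is only a way to test the channel by hand.

```python
import subprocess

# Nagios plugin return codes for the four states Munin reports.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host, service, state, output):
    """Build one tab-separated passive service check result."""
    for field in (host, service, output):
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join([host, service, str(RETURN_CODES[state]), output]) + "\n"


def send(line, server="nagios.example.com", config="/etc/send_nsca.cfg"):
    """Pipe the line to send_nsca with the flags used in the example."""
    subprocess.run(
        ["/usr/sbin/send_nsca", "-H", server, "-c", config],
        input=line.encode(),
        check=True,
    )
```

The host and service names in the line have to match a host and a passive service defined on the Nagios side.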

Configuration for the nagios server

On the nagios side, we set up nsca to receive events from our munin masters.

Example part of nsca.cfg

server_port=5667
nsca_user=nagios
nsca_group=nagios
command_file=/var/run/nagios/nagios.cmd
password="I like cheese!"
decryption_method=8

Nagios, by default, does not accept external commands. We need to enable them so that nsca can send the events to Nagios.

Example part of nagios.cfg

check_external_commands=1
log_external_commands=1
command_file=/var/run/nagios/nagios.cmd

Nagios service configuration

To accept munin services, you will need a passive service check.

example nagios service

passive_checks_enabled 1
active_checks_enabled 0
max_check_attempts 1

For services we have not heard from in a while, we use the “freshness” feature to set the service state to UNKNOWN. Munin reports state on all services every 24 hours by default, so a bit more than this is a sensible default.

example nagios service

check_command check_passive_timeout
check_freshness 1
freshness_threshold 93600 ; 26 hours
normal_check_interval 604800 ; 1 week

The check_passive_timeout command is a dummy script that always returns state UNKNOWN.
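
Such a dummy script can be very small. The sketch below is one possible version in Python, not the script used by the author: the message text and function name are assumptions. It relies on the Nagios plugin convention that exit code 3 means UNKNOWN, and it spells out the threshold arithmetic (26 × 3600 = 93600 seconds).

```python
UNKNOWN = 3  # Nagios plugin exit code for state UNKNOWN

# 24 hours between Munin reports plus 2 hours of slack, in seconds.
FRESHNESS_THRESHOLD = 26 * 60 * 60


def main():
    """Print a status line and return the UNKNOWN exit code."""
    hours = FRESHNESS_THRESHOLD // 3600
    print("UNKNOWN: no passive result received in the last %d hours" % hours)
    return UNKNOWN
```

Installed as an executable, the file would end with `import sys` and `sys.exit(main())`, so that Nagios sees the exit code whenever the freshness check fires.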

Integration with other systems

You can define your own contact text and command to interface with any system you need. Both the text and the command can use a Text::Balanced template to get event data from Munin.

Command run defined in contact.example.command

Text for STDIN defined in contact.example.text

Default STDIN text is defined in: contact.default.text

Text::Balanced templates

The command and text templates use Text::Balanced. This provides variables, tests and loops.

Example

[${var:group};${var:host}] -> ${var:graph_title} ->
warnings: ${loop<,>:wfields ${var:label}=${var:value}} /
criticals: ${loop<,>:cfields ${var:label}=${var:value}}

Example

[example.com;foo] -> HDD temperature -> warnings:
sde=29.00,sda=26.00,sdc=25.00,sdd=26.00,sdb=26.05
/ criticals:
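
To show what the loop construct does, the following Python sketch produces the same kind of text from plain data. It only mimics the template for illustration: it is not how Munin expands templates, the function names are hypothetical, and the exact whitespace of real output may differ.

```python
def render_fields(fields, separator=","):
    """Join label=value pairs, as the loop over wfields or cfields does."""
    return separator.join(f"{label}={value}" for label, value in fields)


def render_alert(group, host, graph_title, wfields, cfields):
    """Build the alert text for one graph from its warning and critical fields."""
    return (
        f"[{group};{host}] -> {graph_title} -> "
        f"warnings: {render_fields(wfields)} / "
        f"criticals: {render_fields(cfields)}"
    )


text = render_alert(
    "example.com",
    "foo",
    "HDD temperature",
    wfields=[("sde", "29.00"), ("sda", "26.00")],
    cfields=[],
)
```

An empty list of critical fields leaves the text after “criticals:” empty, which matches the example output.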

Example: mail

Add the following to /etc/munin/munin.conf:

Example

contact.mail.command /usr/bin/mail -s "[...]" [email protected]

If contact.mail.text is not defined, it will fall back to contact.default.text.

The “[...]” should be replaced with a Text::Balanced template to provide a useful subject.

For mail, you do not need to provide a contact.mail.text, since contact.default.text is designed as a mail template.

Example graph set

Peak in network connections and traffic. We look for corresponding graphs.

Note: logarithmic scale on the “netstat” graph ensures detail is not lost to peaks.

Apache HTTPD has a corresponding peak in traffic. In this case this is a customer publishing a new version of their development tool.

We check the server health. Do we have a bottleneck? It does not look like it, but there is a strange bump in I/O latency and utilisation not related to the network traffic. Also, the periods of 100% utilisation mean we have to talk the customer into getting faster disks or more RAM.

Found more corresponding graphs: “MySQL”. We notice no cached SELECTs? Contact the customer? No corresponding network traffic - looks like a local job. Probably worth mentioning to the customer, but don’t panic.

New features in Munin 1.4

Released in December 2009, after three months of solid development following four years of having the project in “maintenance mode”.

23 committers, with contributions from many more

1500 changesets

100 new plugins, including JVM profiling plugins.

TLS / SSL support

Better SNMP Support

Multigraph plugins

Documentation improvements, including good per-plugin documentation

Better non-Linux support

Multigraph plugins

“You are in a maze of twisty little graphs, all alike”. Multigraph plugins create a tree of graphs, all from one plugin.

nested graphs

fast plugins

new scaling issues on the master

New features in Munin 2.0

Version numbers

Starting with “munin 2.0”, the project has changed what the version numbers signify.

Major version - new features

Minor version - bug fixes

Asynchronous proxy node

This node contacts munin-node periodically, and stores the result. The master connects to the proxy node, and retrieves stored results. The master does not have to wait for nodes to respond. This will increase peak write loads on the master.

[Diagram labels: Munin master; Server; Async proxy node; Munin node]

SSH transport

With Munin 1.4, we have SSH tunnelling. Starting with Munin 2.0, we have a native SSH transport.

[old-style-host]
address host.example.com

[new-style-host]
address ssh://[email protected]:\
    /path/to/stdio-enabled-node --params

For now, this requires the use of the “asynchronous proxy node”, to limit the privileges of the SSH node.

Zooming graphs

No longer locked to specific time periods, you can now drill down in the graphs to look at interesting time periods.

Multi master

With the “Asynchronous Proxy Node”, we can also support multi master setups without conflicts. One master for the customer, one for the hosting provider.

[Diagram labels: Munin master (provider); Munin master (customer); four Munin nodes]

Unresolved issues in 2.0

There is a fair bit of work left before 2.0 can be released.

Performance and scaling

Functional and pretty HTML (got functional so far)

Whatever we broke in 1.4...

Speeding up munin

Scaling munin

On a 1 CPU (2 threads) system, 65k RRD files are...

Updated in 1 minute

Graphed in 40 minutes

The cron job runs every 5 minutes; Houston, we have a problem.
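
The size of the problem follows from simple arithmetic on the numbers above. The Python sketch below works it out, reading “65k” as 65,000 files; the function and key names are made up for the example.

```python
def graph_backlog(files, update_minutes, graph_minutes, cron_minutes):
    """Summarise how far graphing falls behind the cron schedule."""
    return {
        "files_updated_per_second": files / (update_minutes * 60),
        "files_graphed_per_second": files / (graph_minutes * 60),
        "cron_runs_per_graph_pass": graph_minutes / cron_minutes,
    }


stats = graph_backlog(files=65_000, update_minutes=1, graph_minutes=40, cron_minutes=5)
```

One graphing pass spans 40 / 5 = 8 cron intervals, so each pass is still running when the next seven are due.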

FastCGI

FastCGI to the rescue! This gives us graphs on demand. When we have a large number of graphs, it makes little sense to update them all.

The “munin 2.0” CGI grapher can make zoom-able graphs for any time period.

We still need a CGI HTML generator to reduce resource usage for large graph sets. This is not implemented yet.

Storage tuning

Enough RAM for file system caching

Everything on SSD

OpenSolaris / FreeBSD: ZFS ZIL/L2ARC

Linux: FlashCache

Turn off atime

Linux ext3/ext4: mount option data=journal

Munin links

http://munin-monitoring.org/ - project, documentation, bugtracker, svn source code access

http://exchange.munin-monitoring.org/ - Extra plugins for Munin contributed by the Munin community

http://munin-monitoring.org/wiki/HowToContactNagios - In-depth configuration examples for Munin / Nagios integration

http://munin-monitoring.org/wiki/MuninAlertVariables - What to put in the contact.example.text when making templates

Munin book

Gabriele Pohl and Michael Renner have published an entire, thoroughly written book on Munin in German: “Munin - Graphisches Netzwerk- und System-Monitoring” (ISBN 978-3-937514-48-2), published by Open Source Press in cooperation with Linpro.

Questions from the audience?

I was going to write “If I timed this correctly, we should now have around 10 minutes for questions”. I decided against it. What if I missed the time? That would be embarrassing.

Thank you for listening... and I hope to see you at the next OSMC.
