Proactive monitoring with Monit
Barking at daemons
A small open source utility for monitoring Unix systems, with automatic error recovery capabilities.
What Monit can monitor
Files, Dirs and Filesystems
Monitor these items for changes, such as timestamp changes, checksum changes or size changes.
Hosts
Monitor network connections to various servers, either on localhost or on remote hosts. TCP, UDP and Unix Domain Sockets are supported. Network tests can be performed on a protocol level.
System
General system resources on localhost such as overall CPU usage, Memory and Load Average.
Processes
Daemon processes or similar programs running on localhost, such as those started at system boot time from /etc/init.d/
Programs and scripts
Test programs or scripts at certain times, much like cron, but in addition, you can test the exit value of a program and perform an action or send an alert if the exit value indicates an error.
Configuration (i)
◉ Global configuration file at /etc/monitrc.
◉ Sample global configuration:
○ Check services at 30-second intervals:
set daemon 30
# with start delay 240 # optional: delay the first check by 4 minutes
# # (by default Monit checks immediately after it starts)
Configuration (ii)
◉ Set Monit’s logfile:
set logfile /var/log/monit.log
◉ Mail configuration:
set mailserver localhost
# By default Monit will drop alert events if no mail servers are available.
# If you want to keep the alerts for later delivery retry, you can use the
# EVENTQUEUE statement.
set eventqueue
basedir /var/monit # set the base directory where events will be stored
slots 100 # optionally limit the queue size
Configuration (iii)
## Alert email recipient:
set alert [email protected]
## Alert email format:
set mail-format {
from: monit@$HOST
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION
Your faithful employee,
Monit
}
Configuration (iv)
◉ HTTP interface:
set httpd port 2812 and
allow admin:monit # require user 'admin' with password 'monit'
◉ Additional configuration files:
include /etc/monit.d/*
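After editing /etc/monitrc or a file under /etc/monit.d/, it is usually worth checking the syntax before the running daemon picks up the change. A minimal sketch, assuming monit is on the PATH and the caller has the required privileges. The function name apply_monit_config is made up for illustration, while monit -t (syntax check) and monit reload are standard Monit commands:

```shell
# Hypothetical helper: validate the control file, then ask the running
# daemon to re-read it. Nothing is reloaded when validation fails.
apply_monit_config() {
    if monit -t; then
        monit reload
    else
        echo "control file has errors; not reloading" >&2
        return 1
    fi
}
```

A non-zero return means the reload was skipped, so the daemon keeps running with the configuration it already had.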
Basic commands (i)
Controlled from the command line with the monit command:
◉ Start Monit daemon: $ monit
◉ Exit Monit: $ monit quit
◉ Status summary: $ monit summary
◉ Disable monitoring of a named service or all services:
$ monit unmonitor name
$ monit unmonitor all
◉ Enable monitoring:
$ monit monitor name
$ monit monitor all
Basic commands (ii)
◉ Start named service or all services:
$ monit start name
$ monit start all
◉ Stop named service or all services:
$ monit stop name
$ monit stop all
◉ Restart named service or all services:
$ monit restart name
$ monit restart all
Proactive process monitoring
check process tomcat-8 with pidfile /var/run/tomcat-8.pid
start program = "/etc/init.d/tomcat-8 start"
stop program = "/etc/init.d/tomcat-8 stop"
Restart process if it has stopped accepting connections
check process tomcat-8 with pidfile /var/run/tomcat-8.pid
start program = "/etc/init.d/tomcat-8 start"
stop program = "/etc/init.d/tomcat-8 stop"
restart program = "/etc/init.d/tomcat-8 restart"
if failed port 8080 protocol http then restart
Restart process if it has stopped accepting connections, avoiding false positives
check process tomcat-8 with pidfile /var/run/tomcat-8.pid
start program = "/etc/init.d/tomcat-8 start"
stop program = "/etc/init.d/tomcat-8 stop"
restart program = "/etc/init.d/tomcat-8 restart"
if failed port 8080 protocol http for 2 cycles then restart
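A cycle is one pass of the poll loop configured with set daemon, so "for 2 cycles" at a 30-second interval means the port test has to fail on two consecutive checks, roughly a minute, before the restart fires. A quick sketch of that arithmetic. The function name is illustrative and the result is an approximation that ignores how long each check itself takes:

```shell
# cycles_to_seconds INTERVAL CYCLES
# Approximate wall-clock span covered by CYCLES consecutive Monit cycles
# when "set daemon INTERVAL" is in effect.
cycles_to_seconds() {
    echo $(( $1 * $2 ))
}

cycles_to_seconds 30 2   # "for 2 cycles" with "set daemon 30": prints 60
```

The same estimate applies to windows such as "within 5 cycles": raising the poll interval also stretches every cycle-based condition.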
Check process response to requests
check process apache with pidfile /usr/local/apache/logs/httpd.pid
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if failed host www.tildeslash.com port 80 protocol http
and request "/somefile.html"
then restart
if failed port 443 type tcpssl protocol http
with timeout 15 seconds
then restart
Avoid noisy alarms
check process apache with pidfile /usr/local/apache/logs/httpd.pid
start program = "/etc/init.d/httpd start"
stop program = "/etc/init.d/httpd stop"
if failed host www.tildeslash.com port 80 protocol http
and request "/somefile.html"
then restart
if failed port 443 type tcpssl protocol http
with timeout 15 seconds
then restart
if 3 restarts within 5 cycles then unmonitor
Check resources used by process (e.g. DoS attacks)
check process apache with pidfile /usr/local/apache/logs/httpd.pid
start program = "/etc/init.d/httpd start" with timeout 60 seconds
stop program = "/etc/init.d/httpd stop"
if cpu > 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if failed host www.tildeslash.com port 80 protocol http
and request "/somefile.html"
then restart
if failed port 443 type tcpssl protocol http
with timeout 15 seconds
then restart
if 3 restarts within 5 cycles then unmonitor
Monitor filesystem space and inode usage
check filesystem datafs with path /dev/sdb1
start program = "/bin/mount /data"
stop program = "/bin/umount /data"
if space usage > 80% for 5 times within 15 cycles then alert
if space usage > 99% then stop
if inode usage > 30000 then alert
if inode usage > 99% then stop
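To see the kind of number the space usage test compares against, the same percentage can be read from df. A rough sketch, assuming POSIX df -P output, where the fifth column is the capacity. The function names are illustrative, and the value Monit computes may differ slightly from what df reports:

```shell
# parse_df_pct: read "df -P" output on stdin and print the capacity of the
# first listed filesystem as a bare number (no % sign).
parse_df_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# space_usage_pct PATH: capacity of the filesystem that holds PATH.
space_usage_pct() {
    df -P "$1" | parse_df_pct
}
```

For the filesystem in the slide, space_usage_pct /data would print the percentage that the 80% and 99% thresholds are checked against.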
Monitor file checksum (e.g. rootkits)
check file apache with path /usr/sbin/httpd
if failed checksum then alert
if failed uid root then alert
if failed gid root then alert
if failed permission 755 then alert
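As far as the Monit documentation describes it, a checksum test without an expected value compares against the sum Monit computed when it started monitoring the file, so a binary that was already tampered with at that point would pass. Pinning a known-good sum avoids that. A sketch, assuming GNU md5sum is available (Monit accepts MD5 or SHA1 sums). The helper names are illustrative:

```shell
# file_md5 PATH: print only the hex MD5 digest of PATH.
file_md5() {
    md5sum "$1" | awk '{ print $1 }'
}

# checksum_rule PATH: print a Monit statement that pins PATH's current digest.
checksum_rule() {
    echo "if failed checksum and expect the sum $(file_md5 "$1") then alert"
}
```

Running checksum_rule /usr/sbin/httpd on a system known to be clean prints a line that can replace the plain "if failed checksum then alert" statement.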
Monitor a directory that should change
check directory incoming with path /var/data/ftp
if timestamp > 1 hour then alert
Check network interface status
check network eth0 with interface eth0
start program = '/etc/init.d/net.eth0 start'
stop program = '/etc/init.d/net.eth0 stop'
if failed link then restart
Check network link capacity changes
check network eth0 with interface eth0
if changed link capacity then alert
Check network link usage (saturation, bandwidth)
check network eth0 with interface eth0
if saturation > 90% then alert
if upload > 500 kB/s then alert
if total download > 1 GB in last 2 hours then alert
if total download > 10 GB in last day then alert
Check remote host availability by issuing a ping test
check host osoco.es with address osoco.es
if failed ping then alert
Check the content of a response from a web server
check host myserver with address 192.168.1.1
if failed port 80 protocol http
and request /some/path with content = "a string"
then alert
Check connection with custom protocol (MySQL)
check host databaserver with address 192.168.1.1
if failed ping then alert
if failed
port 3306
protocol mysql username foo password bar
then alert
Check custom program status output
check program myscript with path /usr/local/bin/myscript.sh
if status != 0 then alert
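Monit only needs the script's exit status: 0 counts as success, anything else triggers the action. The slides do not show myscript.sh itself, so the following is a hypothetical body that fails when a spool directory holds too many entries. The directory, the limit and the function name are all assumptions:

```shell
#!/bin/sh
# Hypothetical contents for a "check program" script.

# check_spool DIR MAX: return 0 when DIR holds at most MAX entries,
# 1 when it holds more, 2 when DIR does not exist.
check_spool() {
    dir=$1
    max=$2
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 2
    fi
    # Arithmetic expansion strips the padding some wc implementations add.
    count=$(( $(ls -A "$dir" | wc -l) ))
    if [ "$count" -gt "$max" ]; then
        echo "too many entries in $dir: $count > $max"
        return 1
    fi
    echo "ok: $count entries in $dir"
}

# Run the check only when both arguments are given, e.g.
#   myscript.sh /var/spool/myapp 100
if [ "$#" -ge 2 ]; then
    check_spool "$1" "$2"
    exit $?
fi
```

To pass arguments from Monit, the path and its arguments are normally quoted together, for example with path "/usr/local/bin/myscript.sh /var/spool/myapp 100". Treat that form as an assumption to verify against the Monit version in use.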
Check custom program on workdays during the 8 AM hour
check program checkOracleDatabase
with path /var/monit/programs/checkoracle.pl
every "* 8 * * 1-5"
Check service dependencies before start/stop/monitor/unmonitor
check process apache
with pidfile "/usr/local/apache/logs/httpd.pid"
...
depends on httpd
check file httpd with path /usr/local/apache/bin/httpd
if failed checksum then unmonitor
Hierarchy of dependencies
check process apache
...
depends on tomcat
check process tomcat
...
depends on mysql
check process mysql
...
depends on datafs
check filesystem datafs with path /dev/sdb1
start program = "/bin/mount /data"
stop program = "/bin/umount /data"
One interface to rule them all
◉ M/Monit:
○ Monitoring and management of all your Monit hosts.
○ Also works on mobile devices.
○ A one-time payment and the license is perpetual.
One interface to rule them all
◉ Monittr:
○ https://github.com/karmi/monittr
○ Free and very basic option.
Thanks!
This work is licensed under a Creative Commons Attribution 4.0 International License.
You can find me at
◉ @rafael_luque
◉ [email protected]
Cover photo licensed by Edward Conte under a Creative Commons BY-NC license: https://www.flickr.com/photos/edwardconde/11447139646/