Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Transcript of Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring

Page 1: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring

© 2016 All Rights Reserved CONFIDENTIAL #GALAXZ16

Own IT Through Proactive Monitoring

Quis custodiet ipsos custodes?

Who will monitor the monitors themselves?

@jstanley232

Jason Stanley

Enterprise Monitoring Engineer @Secure_24

[email protected]

Github.com/jstanley23

Zenoss Community Forums/IRC: jstanley

Page 2: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Secure-24 has 15 years of experience delivering managed IT operations, application hosting and cloud services to enterprises worldwide. We manage SAP, Oracle, Hyperion, JD Edwards, and other mission critical applications across all industries and for businesses of every size. Our industry-leading client satisfaction rates result from lowering IT operational costs and our relentless focus on superior service and support.

Page 3: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Zenoss is the primary monitoring tool for infrastructure, client devices and applications.

Page 4: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Replaced other monitoring platforms with Zenoss

• Oracle Enterprise Manager

• Solarwinds

• Nimsoft

• Nagios

• Tidal

Page 5: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Primary Zenoss environment

• Zenoss 4.2.5 RPS 538

• 100+ ZenPacks

• 9k+ devices

• 1.7m+ data points

• Dedicated servers

• 3 dedicated Hubs

• 16 dedicated multi-tenant collectors

• 9 customer dedicated collectors

Page 6: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Monitoring from within

Zenoss provides a lot of built-in self-monitoring, and additional ZenPacks extend it.

Zenoss Daemons

› Processes

› Heartbeats

Zenoss Toolbox Scans

Tracebacks and exceptions

ZenPacks

› ZenPacks.zenoss.MySqlMonitor

› ZenPacks.zenoss.RabbitMQ

› ZenPacks.zenoss.Memcached

Page 7: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Daemon monitoring

Built-in Methods

Process

› Most daemon processes are already added

› Polls every 3 minutes

› Monitors CPU, memory, and count

/Status/Heartbeat

› Takes longer to spawn an event than processes

› Can signify issues with the daemon or hub

Note:

› Verify new daemons are added to processes

› Heartbeats only cover the same instance

Page 8: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Zenoss ZenPacks

ZenPacks.zenoss.MySqlMonitor *

› Critical to monitor up/down

› Primary internal use is graphs and trending

ZenPacks.zenoss.RabbitMQ *

› Critical to monitor up/down

› Primary internal use is graphs and trending

ZenPacks.zenoss.Memcached

› Can be monitored internally for up/down

› Users can have a negative experience if it is down

* Should be monitored externally

Page 9: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Zenoss Toolbox Scans and Exceptions Events

https://github.com/zenoss/zenoss.toolbox

Set up scans in crontab to set and forget

All toolbox scans now create events!

Warning:

› Do not run zencatalogscan -f unless zenrelationscan and findposkeyerror have come back clean first.
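The ordering in the warning above can be sketched as a small wrapper. This is only an illustration, not the toolbox's own behaviour: the function names are hypothetical, and it assumes each scan exits with status 0 only when it finds nothing to report, which should be verified against your toolbox version before relying on it.

```python
import subprocess

# Assumption: a toolbox scan exits with status 0 only when it comes back clean.
PREREQUISITE_SCANS = (["zenrelationscan"], ["findposkeyerror"])
CATALOG_FIX = ["zencatalogscan", "-f"]


def run_command(argv):
    """Run one command and return its exit status."""
    return subprocess.call(argv)


def guarded_catalog_fix(run=run_command):
    """Run zencatalogscan -f only if both prerequisite scans come back clean.

    Returns the list of commands that were actually run, in order.
    """
    executed = []
    for argv in PREREQUISITE_SCANS:
        executed.append(argv)
        if run(argv) != 0:
            # A prerequisite scan reported problems: stop before the catalog fix.
            return executed
    executed.append(CATALOG_FIX)
    run(CATALOG_FIX)
    return executed
```

Passing a different `run` callable makes it possible to dry-run the ordering without touching a live Zenoss instance.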

Exceptions and tracebacks

Modelers, datasources and templates can error out

Check your events for sneaky errors:

› Message: traceback

› Message: exception

TALES exceptions come in under the Hub’s full name, as a single event.
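A minimal sketch of the event search described above, assuming the events have already been pulled from Zenoss as dictionaries with a "message" field. The function name and the event shape are hypothetical and chosen for illustration only.

```python
import re

# Match either keyword anywhere in the message, regardless of case.
ERROR_PATTERN = re.compile(r"traceback|exception", re.IGNORECASE)


def find_sneaky_errors(events):
    """Return the events whose message mentions a traceback or an exception."""
    return [e for e in events if ERROR_PATTERN.search(e.get("message", ""))]
```

The same two patterns can be entered directly as message filters in the event console; the function is only useful when the check is scripted.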

Page 10: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Event Monitoring

Event flow in Zenoss is one of the more important aspects of the tool. Without events, you will not be alerted to any issues in your environments.

For this reason, we place special emphasis on monitoring it.

Page 11: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Monitoring from afar

We focus on monitoring Zenoss event flow from a remote location, so that if Zenoss itself goes down we will still be alerted.

Zenoss Webserver

RabbitMQ

› rawevents

› zenevents

› signal

zeneventserver

Synthetic Event Checks

› zeneventd: event processing and transforms

› zeneventserver: changing event state

Page 12: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Web (HTTP) checks

Both zenwebserver and zeneventserver can be monitored with a simple HTTP check.

zenwebserver

› HTTP check on port 8080 against the Dashboard URL, with a regex match on /zport/dmd/Dashboard

zeneventserver

› HTTP check on port 8084 against the zeneventserver API: /zeneventserver/api/1.0/events
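The two checks above could be sketched as follows. This is an illustration under assumptions rather than the script used in the talk: the function names and the host zenoss.example.com are hypothetical, and authentication is not handled.

```python
import re
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return (status_code, body_text) for a GET request."""
    with urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", "replace")


def http_check(url, pattern=None, fetcher=fetch):
    """Return True if the URL answers 200 and, if given, the regex matches the body."""
    try:
        status, body = fetcher(url)
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses from urlopen.
        return False
    if status != 200:
        return False
    return pattern is None or re.search(pattern, body) is not None


# Hypothetical usage, run from a host outside the Zenoss instance:
# http_check("http://zenoss.example.com:8080/zport/dmd/Dashboard", r"/zport/dmd/Dashboard")
# http_check("http://zenoss.example.com:8084/zeneventserver/api/1.0/events")
```

The `fetcher` parameter exists so the check logic can be exercised without a network connection.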

Page 13: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


RabbitMQ

It is very important to monitor the RabbitMQ queues. If something goes wrong with RabbitMQ, event processing in Zenoss is compromised.

For this reason, we monitor the queues remotely, alerting on anything above a certain threshold.*

* This threshold should be set depending on your environment.

We see 3 queues as the most important:

› rawevents: where raw events from the collectors are sent

› zenevents: after events are processed by zeneventd, they are sent here for zeneventserver

› signal: events that match any trigger and need to be processed by a notification are sent here for zenactiond to process
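The threshold logic above can be sketched as below. It assumes the queue depths arrive as JSON in the shape returned by the RabbitMQ management plugin's /api/queues endpoint (a list of objects with "name" and "messages"). The function names are hypothetical, the threshold of 500 is a placeholder, and the short queue names follow the slide; the names on a real instance may be longer.

```python
import json


def parse_queue_depths(json_text):
    """Map queue name to message count from a management-API style queue list."""
    return {q["name"]: q.get("messages", 0) for q in json.loads(json_text)}


def queues_over_threshold(depths, thresholds):
    """Return {queue: depth} for every monitored queue above its threshold.

    A monitored queue that is missing from `depths` is treated as empty here.
    """
    return {
        name: depths[name]
        for name, limit in thresholds.items()
        if depths.get(name, 0) > limit
    }


# Placeholder thresholds; tune these for your environment.
THRESHOLDS = {"rawevents": 500, "zenevents": 500, "signal": 500}
```

A non-empty result from `queues_over_threshold` is what would raise the alert on the remote monitoring side.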

Page 14: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Synthetic Checks

Pre-existing event check

› Checks the functionality of zeneventserver by:

• Acknowledging a pre-existing event *

• Un-acknowledging a pre-existing event *

› Verifies the following are up and running:

• ZenDS

• zeneventserver

• zenwebserver

› Only uses a single event; if the event is closed, a new one must be created

• A script can be used to create the event for you and provide the event ID to use

New event check

› Checks the Zenoss event process by:

• Opening a new event

• Finding the new event

• Verifying the event was modified by a transform

• Closing the event

• Verifying the event was closed

› Verifies the following are up and running:

• ZenDS

• zenwebserver

• zeneventd

• zeneventserver

› Creates a new event each and every time
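The new event check can be sketched as the sequence below. This is not the script from the talk: `client` stands for a hypothetical wrapper around the Zenoss API with create_event, find_event and close_event methods, and the marker field is whatever your own transform is assumed to set on the synthetic event.

```python
import time


def _poll(fetch, accept, attempts, delay, sleep):
    """Call fetch() until it returns something accepted, or attempts run out."""
    for _ in range(attempts):
        result = fetch()
        if result is not None and accept(result):
            return result
        sleep(delay)
    return None


def new_event_check(client, summary, marker_field, marker_value,
                    attempts=5, delay=1.0, sleep=time.sleep):
    """Open, find, verify, close and re-verify one synthetic event.

    Returns "ok" or a short description of the step that failed.
    """
    client.create_event(summary)
    event = _poll(lambda: client.find_event(summary), lambda e: True,
                  attempts, delay, sleep)
    if event is None:
        return "event not found"
    if event.get(marker_field) != marker_value:
        return "transform did not run"
    client.close_event(event["evid"])
    closed = _poll(lambda: client.find_event(summary),
                   lambda e: e.get("eventState") == "Closed",
                   attempts, delay, sleep)
    return "ok" if closed is not None else "event was not closed"
```

Returning the failed step rather than a plain boolean gives a hint about which part of the event pipeline to look at first.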


Page 16: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Takeaways

The script we use for monitoring can be found on the community wiki or on GitHub, along with documentation on how to use it.

http://wiki.zenoss.org/Monitoring_Zenoss

https://github.com/jstanley23/MonitoringZenoss

Page 17: Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring


Question me this