Jason Stanley, Secure-24 - Own IT Through Proactive IT Monitoring
© 2016 All Rights ReservedCONFIDENTIAL #GALAXZ16 1
#GALAXZ16
Own IT Through Proactive Monitoring
Quis custodiet ipsos custodes?
Who will monitor the monitors themselves?
@jstanley232
Jason Stanley
Enterprise Monitoring Engineer @Secure_24
[email protected]
github.com/jstanley23
Zenoss Community Forums/IRC: jstanley
Secure-24 has 15 years of experience delivering managed IT operations, application hosting and cloud services to enterprises worldwide. We manage SAP, Oracle, Hyperion, JD Edwards, and other mission critical applications across all industries and for businesses of every size. Our industry-leading client satisfaction rates result from lowering IT operational costs and our relentless focus on superior service and support.
Zenoss is the primary monitoring tool for infrastructure, client devices and applications.
Replaced other monitoring platforms with Zenoss
• Oracle Enterprise Manager
• Solarwinds
• Nimsoft
• Nagios
• Tidal
Primary Zenoss environment
• Zenoss 4.2.5 RPS 538
• 100+ ZenPacks
• 9k+ devices
• 1.7m+ data points
• Dedicated servers
  • 3 dedicated hubs
  • 16 dedicated multi-tenant collectors
  • 9 customer-dedicated collectors
Monitoring from within
Zenoss provides a good deal of built-in self-monitoring, and additional ZenPacks extend it.
Zenoss Daemons
› Processes
› Heartbeats
Zenoss Toolbox Scans
Tracebacks and exceptions
ZenPacks
› ZenPacks.zenoss.MySqlMonitor
› ZenPacks.zenoss.RabbitMQ
› ZenPacks.zenoss.Memcached
Daemon monitoring
Built-in Methods
Process
› Most daemon processes are already added
› Polls every 3 minutes
› Monitors CPU, memory, and process count
/Status/Heartbeat
› Takes longer to raise an event than process monitoring does
› Can signify issues with the daemon or the hub
Note:
› Verify that new daemons are added to processes
› Heartbeats are same instance only
Zenoss ZenPacks
ZenPacks.zenoss.MySqlMonitor *
› Critical to monitor up/down
› Primary internal use is graphs and trending
ZenPacks.zenoss.RabbitMQ *
› Critical to monitor up/down
› Primary internal use is graphs and trending
ZenPacks.zenoss.Memcached
› Can be monitored internally for up/down
› Users can have a negative experience if it is down
* Should be monitored externally
Zenoss Toolbox Scans and Exception Events
https://github.com/zenoss/zenoss.toolbox
Set up scans in crontab to set and forget
All toolbox scans now create events!
Warning:
› Do not run zencatalogscan -f without zenrelationscan and findposkeyerror coming back clean first.
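The ordering in the warning above can be sketched as a small wrapper that a cron job might call. This is only an illustration: it assumes each toolbox command exits with status 0 when it finds nothing wrong, which should be verified against your toolbox version, and the function name and the injectable `run` parameter are hypothetical.

```python
import subprocess

# Scans that should come back clean before any catalog fix is attempted.
PREREQUISITE_SCANS = (["findposkeyerror"], ["zenrelationscan"])
CATALOG_FIX = ["zencatalogscan", "-f"]


def run_scans(run=subprocess.call):
    """Run the prerequisite scans, then the catalog fix only if both were clean.

    ASSUMPTION: each tool exits with status 0 when it finds nothing wrong.
    Returns the list of commands that were actually executed.
    """
    executed = []
    for command in PREREQUISITE_SCANS:
        executed.append(command)
        if run(command) != 0:
            # A scan reported problems (or failed): stop before touching the catalog.
            return executed
    executed.append(CATALOG_FIX)
    run(CATALOG_FIX)
    return executed
```

Passing `run` as a parameter keeps the ordering logic testable without a Zenoss installation; in production the default `subprocess.call` would be used.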
Exceptions and tracebacks
Modelers, datasources and templates can error out
Check your events for sneaky errors:
› Message: traceback
› Message: exception
TALES exceptions come in under the hub's full name, as a single event.
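As a rough illustration of the message search above, the following sketch filters a list of events for the two keywords. The event shape (a dict with a "message" key) is an assumption made for the example, not the Zenoss event schema.

```python
KEYWORDS = ("traceback", "exception")


def find_error_events(events, keywords=KEYWORDS):
    """Return the events whose message mentions any keyword, case-insensitively.

    ASSUMPTION: each event is a dict with an optional "message" string.
    """
    matches = []
    for event in events:
        message = (event.get("message") or "").lower()
        if any(keyword in message for keyword in keywords):
            matches.append(event)
    return matches
```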
Event Monitoring
Event flow in Zenoss is one of the more important aspects of the tool. Without events, you will not be alerted to any issues in your environments.
For this reason, we place special emphasis on monitoring it.
Monitoring from afar
We focus on monitoring Zenoss event flow from a remote location. That way, if Zenoss itself goes down, we still get alerted.
Zenoss Webserver
RabbitMQ
› rawevents
› zenevents
› signal
zeneventserver
Synthetic Event Checks
› zeneventd: event processing and transforms
› zeneventserver: changing event state
Web (HTTP) checks
Both zenwebserver and zeneventserver can be monitored with a simple HTTP check.
zenwebserver
› HTTP check to port 8080 on the Dashboard URL with a regex: /zport/dmd/Dashboard
zeneventserver
› HTTP check to port 8084 to hit the zeneventserver API: /zeneventserver/api/1.0/events
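A minimal sketch of such an HTTP check, using only the Python standard library. The function name, the `opener` parameter and the 10-second timeout are illustrative choices. Whether a plain unauthenticated GET returns 200 on either endpoint, and what the regex should match in the response, depend on your installation and are assumptions here.

```python
import re
import urllib.request


def http_check(url, pattern=None, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with HTTP 200 and, optionally, the body matches."""
    try:
        with opener(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses, as are
        # socket timeouts and refused connections.
        return False
    return pattern is None or re.search(pattern, body) is not None
```

With a hypothetical host name, the two checks might be called as `http_check("http://zenoss.example.com:8080/zport/dmd/Dashboard", pattern="/zport/dmd/Dashboard")` and `http_check("http://zenoss.example.com:8084/zeneventserver/api/1.0/events")`, run from a machine outside the Zenoss instance.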
RabbitMQ
It is very important to monitor the RabbitMQ queues. If something goes wrong with RabbitMQ, event processing in Zenoss is compromised.
For this reason, we monitor the queues remotely and alert on anything above a certain threshold.*
* This threshold should be set depending on your environment.
We see three queues as the most important:
› rawevents: where raw events from the collectors are sent
› zenevents: after events are processed by zeneventd, they are sent here for zeneventserver
› signal: events that match a trigger and need a notification are sent here for zenactiond to process
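The threshold check can be sketched as follows, assuming the queue depths are available as tab-separated "name, message count" lines such as those printed by `rabbitmqctl list_queues name messages`. How that text is collected remotely is left out. Matching queues by name suffix is an assumption, since the full queue names in your installation may carry a prefix, and the function names are illustrative.

```python
WATCHED = ("rawevents", "zenevents", "signal")


def parse_queue_depths(listing):
    """Parse 'name<TAB>messages' lines, ignoring banner lines such as 'Listing queues ...'."""
    depths = {}
    for line in listing.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().isdigit():
            depths[parts[0].strip()] = int(parts[1])
    return depths


def queues_over_threshold(depths, threshold, watched=WATCHED):
    """Return {queue_name: depth} for watched queues deeper than the threshold.

    ASSUMPTION: a queue is "watched" if its full name ends with one of the short names.
    """
    return {
        name: depth
        for name, depth in depths.items()
        if depth > threshold and any(name.endswith(short) for short in watched)
    }
```

A non-empty result from `queues_over_threshold` would be the condition to alert on; the threshold itself is environment-specific, as noted above.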
Synthetic Checks
Pre-existing event check
› Checks the functionality of zeneventserver by:
  Acknowledging a pre-existing event *
  Un-acknowledging a pre-existing event *
› Verifies the following are up and running:
  ZenDS
  zeneventserver
  zenwebserver
› Only uses a single event; if the event is closed, a new one must be created
  • A script can be used to create the event for you and provide the event ID to use
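The pre-existing event check can be sketched against a hypothetical client object. The `acknowledge`, `unacknowledge` and `get_state` methods below are illustrative names rather than the Zenoss API, and the in-memory client exists only so that the sketch runs on its own. The state names "New" and "Acknowledged" are also assumptions about what the real client would report.

```python
class InMemoryClient:
    """Stand-in for a real Zenoss client; HYPOTHETICAL interface for illustration."""

    def __init__(self, states):
        self.states = dict(states)

    def acknowledge(self, event_id):
        self.states[event_id] = "Acknowledged"

    def unacknowledge(self, event_id):
        self.states[event_id] = "New"

    def get_state(self, event_id):
        return self.states.get(event_id)


def preexisting_event_check(client, event_id):
    """Acknowledge, then un-acknowledge, one event; True only if both changes stick."""
    try:
        client.acknowledge(event_id)
        if client.get_state(event_id) != "Acknowledged":
            return False
        client.unacknowledge(event_id)
        return client.get_state(event_id) == "New"
    except Exception:
        # Any failure talking to the server counts as a failed check.
        return False
```

Because the event is returned to its original state, the same event ID can be reused on every run, which matches the single-event design described above.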
New event check
› Checks the Zenoss event process by:
  Opening a new event
  Finding the new event
  Verifying the event was modified by a transform
  Closing the event
  Verifying the event was closed
› Verifies the following are up and running:
  ZenDS
  zenwebserver
  zeneventd
  zeneventserver
› Creates a new event each and every time
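The five steps of the new event check can be sketched as below. The client methods (`open_event`, `find_event`, `close_event`), the event fields ("id", "summary", "state") and the idea that the transform leaves a marker in the summary are all assumptions made for illustration; they are not the Zenoss API or the author's script.

```python
import uuid


def new_event_check(client, transform_marker):
    """Walk one throwaway event through its lifecycle.

    Returns the name of the step that failed, or None if every step passed.
    `client` is a HYPOTHETICAL object offering open_event, find_event and close_event.
    """
    token = uuid.uuid4().hex  # unique text so this run's event can be found again
    client.open_event(summary="synthetic check " + token)

    event = client.find_event(token)
    if event is None:
        return "find"
    # ASSUMPTION: the transform under test adds a known marker to the summary.
    if transform_marker not in event["summary"]:
        return "transform"

    client.close_event(event["id"])
    closed = client.find_event(token)
    if closed is None or closed["state"] != "Closed":
        return "close"
    return None
```

Returning the failed step, rather than a bare pass/fail, hints at which daemon to look at first: a "find" or "transform" failure points toward event processing, while a "close" failure points toward event state changes.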
Takeaways
The script we use for monitoring, along with documentation on how to use it, can be found on the community wiki or on GitHub.
http://wiki.zenoss.org/Monitoring_Zenoss
https://github.com/jstanley23/MonitoringZenoss
Question me this