David Gildeh (CEO of Dataloop) - Monitoring Nightmares for DevOps at #DOXLON
Monitoring Nightmares for DevOps (AKA What we learnt about Monitoring from talking to over 60 companies)
Our Story
Steven Acreman (CTO)
David Gildeh (CEO)
Colin Hemmings (Chief Architect)
Our Monitoring Nightmare
Application
MySQL Database
OpsView (Nagios)
Logstash, ElasticSearch, Kibana
AppDynamics
Pingdom
Graphite
CollectD
PagerDuty
Amazon AWS
Alfresco JVM
SOLR
Transformations
Browser
Google Analytics
Custom Scripts
Reporting System
SQL DBs
Mixpanel
GoSquared
Geckoboard
Let's start a monitoring company and dress like The Apprentice!
#MONITORINGSUCKS
Sooooo 2011…
Our Sample
The Results
http://blog.dataloop.io/2014/01/30/what-we-learnt-talking-to-60-companies-about-monitoring/
Still Dominated by Nagios & Open-Source
How Tools Change with No. of Servers
Typical Monitoring Stack
Is my site up or down? (External)
What happened? (Logs)
How is my application performing? (APM)
What’s my app actually doing? (Custom Metrics)
Is everything working as expected? (Service)
Dashing (Custom Dashboards)
Nightmare 1: Everyone’s building a Kit Car
StatsD
Nightmare 2: Scaling the Kit Car
Solution 1: SaaS Monitoring Tools
Nightmare 3: Too Many Metrics
Solution 2: Anomaly Detection
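The slide names anomaly detection without saying which method was meant. As a generic illustration only, and not the speaker's or any product's algorithm, here is a minimal sketch that flags a value lying more than a few standard deviations from recent history. The function name `is_anomaly` and the default threshold of 3.0 are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List


def is_anomaly(history: List[float], value: float, threshold: float = 3.0) -> bool:
    """Flag `value` if it deviates from `history` by more than `threshold` sigmas.

    Hypothetical helper for illustration; real systems usually also handle
    seasonality and trends, which this sketch ignores.
    """
    if len(history) < 2:
        # Not enough data to estimate a baseline, so never flag.
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


recent_latency_ms = [100, 102, 98, 101, 99]
normal = is_anomaly(recent_latency_ms, 101)
spike = is_anomaly(recent_latency_ms, 150)
```

The appeal for the "too many metrics" problem is that one rule like this can cover many series without a hand-tuned static threshold per metric.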
Nightmare 4: Spammy Alerts
Solution 3: Alert Best Practices
• Only alert on actionable metrics
• Multi-Condition Alert Rules
• Alert Handlers
• Configurable Nagging
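Two of the practices above, multi-condition rules and configurable nagging, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the API of any tool named in the talk: the `AlertRule` class, its fields, and the metric names `error_rate` and `requests_per_s` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Metrics = Dict[str, float]


@dataclass
class AlertRule:
    """Hypothetical rule: fires only when ALL conditions hold."""
    name: str
    conditions: List[Callable[[Metrics], bool]]
    nag_interval_s: float  # how often to re-notify while still firing
    last_notified: Optional[float] = None

    def firing(self, metrics: Metrics) -> bool:
        return all(cond(metrics) for cond in self.conditions)

    def should_notify(self, metrics: Metrics, now: float) -> bool:
        if not self.firing(metrics):
            # Recovered: reset so the next incident notifies immediately.
            self.last_notified = None
            return False
        if self.last_notified is None or now - self.last_notified >= self.nag_interval_s:
            self.last_notified = now
            return True
        # Still firing, but inside the nag interval: stay quiet.
        return False


# A high error rate only matters when there is real traffic.
rule = AlertRule(
    name="api_degraded",
    conditions=[
        lambda m: m["error_rate"] > 0.05,
        lambda m: m["requests_per_s"] > 10,
    ],
    nag_interval_s=600,
)
```

Combining conditions keeps the alert actionable (errors on an idle service do not page anyone), and the nag interval limits repeat notifications for an incident that is already known.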
Nightmare 5: Continuous Deployment
Solution 4: Agile Monitoring
• AUTOMATION!
• Configuration Management
• Tagging
GOAL: Minimize time/complexity to add & edit checks & alerts
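One way tagging can serve that goal is to attach checks to tags rather than to individual hosts, so that a newly deployed server picks up its monitoring from the tags it carries. The sketch below assumes a hypothetical tag-to-check mapping; the tag names, check names, and the `checks_for` helper are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical mapping from tag to check names; "all" applies to every host.
CHECKS_BY_TAG: Dict[str, List[str]] = {
    "all": ["ping", "disk_usage"],
    "mysql": ["mysql_replication_lag"],
    "web": ["http_200"],
}


def checks_for(tags: Set[str]) -> List[str]:
    """Return the de-duplicated checks a host should run, given its tags."""
    checks: List[str] = []
    # Sort the tags so the result is deterministic regardless of set order.
    for tag in sorted(tags | {"all"}):
        for check in CHECKS_BY_TAG.get(tag, []):
            if check not in checks:
                checks.append(check)
    return checks


web_checks = checks_for({"web"})
db_web_checks = checks_for({"mysql", "web"})
```

With this shape, adding a check means editing one mapping entry, and a configuration management tool only needs to assign tags to hosts.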
Nightmare 6: Data Silos
Solution 5: Make Data Visible
Nightmare 6: Monitoring Micro-Services
Nightmare 7: Adoption outside Ops
Anything Else
Solution 6: Self-Service Monitoring
• Nice UI/pretty web interfaces
• Simple – no manual required
• Account Model
#MONITORINGBLISS