David Gildeh (CEO of Dataloop) - Monitoring Nightmares for DevOps at #DOXLON

Monitoring Nightmares for DevOps (AKA What we learnt about Monitoring from talking to over 60 companies)



Our Story

Steven Acreman (CTO)

David Gildeh (CEO)

Colin Hemmings (Chief Architect)


Our Monitoring Nightmare

Application

MySQL Database

OpsView (Nagios)

Logstash ElasticSearch Kibana

AppDynamics

Pingdom

Graphite

CollectD

PagerDuty

Amazon AWS

Alfresco JVM

SOLR

Transformations

Browser

Google Analytics

Custom Scripts

Reporting System

SQL DBs

Mixpanel

GoSquared

Geckoboard


Let's start a monitoring company and dress like The Apprentice!


#MONITORINGSUCKS

Sooooo 2011…


Our Sample


The Results

http://blog.dataloop.io/2014/01/30/what-we-learnt-talking-to-60-companies-about-monitoring/


Still Dominated by Nagios & Open-Source


How Tools Change with No. of Servers


Typical Monitoring Stack

Is my site up or down? (External)

What happened? (Logs)

How is my application performing? (APM)

What’s my app actually doing? (Custom Metrics)

Is everything working as expected? (Service)

Dashing (Custom Dashboards)


Nightmare 1: Everyone’s building a Kit Car

StatsD


Nightmare 2: Scaling the Kit Car


Solution 1: SaaS Monitoring Tools


Nightmare 3: Too Many Metrics


Solution 2: Anomaly Detection
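
The slide names anomaly detection without prescribing a technique. As one illustrative possibility only, the sketch below flags points that sit far from a rolling mean (a rolling z-score). The function name `find_anomalies`, the `window` size, and the `threshold` are assumptions made for this example and are not taken from the talk.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List, Tuple


def find_anomalies(
    values: Iterable[float],
    window: int = 20,        # assumed: number of previous points used as the baseline
    threshold: float = 3.0,  # assumed: how many standard deviations counts as anomalous
) -> List[Tuple[int, float]]:
    """Return (index, value) pairs that deviate strongly from the rolling baseline."""
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[Tuple[int, float]] = []
    for i, v in enumerate(values):
        # Only score a point once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat baseline (sigma == 0) is skipped to avoid dividing by zero.
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                anomalies.append((i, v))
        history.append(v)
    return anomalies


# A steady metric with a single spike at index 20.
series = [10, 11] * 10 + [50] + [10, 11] * 5
print(find_anomalies(series))  # [(20, 50)]
```

The appeal of this kind of approach for the "too many metrics" problem is that nobody has to hand-pick a static threshold per metric; the trade-off is that the window and threshold still need tuning to avoid noise.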


Nightmare 4: Spammy Alerts


Solution 3: Alert Best Practices

•  Only alert on actionable metrics
•  Multi-Condition Alert Rules
•  Alert Handlers
•  Configurable Nagging
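
To make two of these practices concrete, here is a minimal sketch of a multi-condition alert rule with a configurable nag interval. The `AlertRule` class, its field names, and the metric names are all hypothetical, chosen for illustration; they do not come from the talk or from any particular monitoring product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Metrics = Dict[str, float]


@dataclass
class AlertRule:
    """Hypothetical rule: fires only when ALL conditions hold, and re-notifies on an interval."""
    name: str
    conditions: List[Callable[[Metrics], bool]]
    nag_interval_s: float = 300.0  # assumed default: repeat the notification every 5 minutes
    _last_sent: Optional[float] = field(default=None, repr=False)

    def is_firing(self, metrics: Metrics) -> bool:
        # Multi-condition: every condition must be true for the alert to fire.
        return all(cond(metrics) for cond in self.conditions)

    def evaluate(self, metrics: Metrics, now: float) -> bool:
        """Return True when a notification should be sent at time `now` (seconds)."""
        if not self.is_firing(metrics):
            self._last_sent = None  # recovered: the next failure notifies immediately
            return False
        if self._last_sent is None or now - self._last_sent >= self.nag_interval_s:
            self._last_sent = now
            return True
        return False  # still firing, but inside the nag interval: stay quiet


# Alert on a high error rate only when there is real traffic, nagging every 10 minutes.
rule = AlertRule(
    name="api_errors_under_load",
    conditions=[
        lambda m: m["error_rate"] > 0.05,
        lambda m: m["requests_per_s"] > 10,
    ],
    nag_interval_s=600,
)
bad = {"error_rate": 0.2, "requests_per_s": 50}
print(rule.evaluate(bad, now=0))    # True  (first notification)
print(rule.evaluate(bad, now=60))   # False (suppressed within the nag interval)
print(rule.evaluate(bad, now=600))  # True  (nag again after 10 minutes)
```

Combining conditions keeps the alert actionable (a high error rate on an idle service is rarely worth a page), and the nag interval bounds how often an unresolved problem repeats itself. An alert handler would be whatever acts on the `True` result, for example a call out to a paging service.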


Nightmare 5: Continuous Deployment


Solution 4: Agile Monitoring

•  AUTOMATION!
•  Configuration Management
•  Tagging

GOAL: Minimize time/complexity to add & edit checks & alerts
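
One way tagging can serve that goal is sketched below: checks are declared once against tags, and any host that carries the right tags picks them up automatically, so adding a server requires no edits to the monitoring configuration. The `CHECKS` table, the tag names, and the `checks_for` helper are hypothetical and used only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical check definitions: a check applies to hosts carrying ALL of its required tags.
CHECKS: Dict[str, Set[str]] = {
    "disk_usage": set(),             # no required tags, so it applies to every host
    "mysql_replication": {"mysql"},
    "jvm_heap": {"jvm", "prod"},
}


def checks_for(host_tags: Set[str]) -> List[str]:
    """Return the names of the checks whose required tags are a subset of the host's tags."""
    return sorted(name for name, required in CHECKS.items() if required <= host_tags)


# Tags would typically be assigned by configuration management when a host is provisioned.
print(checks_for({"mysql", "prod"}))  # ['disk_usage', 'mysql_replication']
print(checks_for({"jvm", "prod"}))    # ['disk_usage', 'jvm_heap']
```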


Nightmare 6: Data Silos


Solution 5: Make Data Visible


Nightmare 6: Monitoring Micro-Services


Nightmare 7: Adoption outside Ops

Anything Else


Solution 6: Self-Service Monitoring

•  Nice UI/pretty web interfaces
•  Simple – no manual required
•  Account Model


#MONITORINGBLISS


www.dataloop.io

@dataloopio
