Monitoring at section.io - Operational Intelligence Meetup May 2016
Monitoring at section.io
Operational visibility for both the platform and our users
A modern CDN
• Runs on your local machine and pre-production
• Configuration and deployment via git
• Fast global cache management
• HTTPS and HTTP/2 by default
Open platform
• Integrates with popular open-source
• API driven
• Near real-time log access
• Consistent operational interface
Containers
• Delivery Proxies
• Varnish Cache
• ModSecurity
• Kibana
• Graphite
• Umpire
Gathering data
• Web access logs, syslog, performance data
• Docker Volumes
• Elastic Beats
• Log rotation
Log volume
• 600 million web access logs per week
• 60,000 log entries processed per minute
• 7 days of logs are searchable
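The weekly and per-minute figures above can be checked against each other with a little arithmetic. The sketch below assumes the weekly total is spread evenly across the week, which real traffic will not be, so it gives averages only.

```python
# Figure quoted on the slide.
LOGS_PER_WEEK = 600_000_000
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def average_per_minute(logs_per_week: int) -> float:
    """Average log entries per minute, assuming an even spread over the week."""
    return logs_per_week / MINUTES_PER_WEEK


def average_per_second(logs_per_week: int) -> float:
    """Average log entries per second, assuming an even spread over the week."""
    return average_per_minute(logs_per_week) / 60


if __name__ == "__main__":
    print(f"{average_per_minute(LOGS_PER_WEEK):,.0f} per minute")  # 59,524 per minute
    print(f"{average_per_second(LOGS_PER_WEEK):,.0f} per second")  # 992 per second
```

An average of roughly 59,500 entries per minute is consistent with the 60,000 per minute quoted on the slide.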
Log flow
[Diagram: Delivery networks, Logstash receivers, redis, Logstash processors, Logstash senders, redis, Ops Elasticsearch cluster, Apps Elasticsearch cluster, StatsD, Carbon]
Between about 5 seconds and 2 minutes
Log visibility
• Kibana
• Elasticsearch API
• Traces
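As an illustration of the kind of request the Elasticsearch API accepts, the sketch below builds a search body for recent 5xx responses on one domain. The field names `domain` and `status` are hypothetical; the talk does not list the actual log fields. `@timestamp` is the usual Logstash convention and is assumed here.

```python
import json


def recent_errors_query(domain: str, minutes: int = 15, size: int = 50) -> dict:
    """Build an Elasticsearch search body for recent 5xx responses.

    `domain` and `status` are hypothetical field names for this sketch.
    """
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"domain": domain}},
                    {"range": {"status": {"gte": 500, "lte": 599}}},
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # The JSON printed here would be sent as the body of a _search request.
    print(json.dumps(recent_errors_query("www.example.com"), indent=2))
```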
Beyond logs
• Metrics can optimise common log queries
• Metrics retention:
  • 1 minute granularity for 1 month
  • 1 hour granularity for 13 months
• Graphite, Tessera, and Grafana
• Heroku Umpire
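To get a feel for what this retention policy costs per metric, the sketch below counts the stored datapoints in each archive. It assumes 1 month is 30 days and 13 months is 395 days, and that the metrics live in Graphite's Whisper format at 12 bytes per datapoint; file headers are ignored. In Carbon's `storage-schemas.conf` a policy like this would typically be written as `retentions = 1m:30d,1h:395d`, though the talk does not show the actual configuration.

```python
from dataclasses import dataclass

# Whisper stores each datapoint as a 4-byte timestamp plus an 8-byte value.
BYTES_PER_POINT = 12
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Archive:
    seconds_per_point: int  # granularity of the archive
    retention_days: int     # how long datapoints are kept

    @property
    def points(self) -> int:
        """Number of datapoints the archive holds for one metric."""
        return self.retention_days * SECONDS_PER_DAY // self.seconds_per_point


# Assumed day counts: 1 month = 30 days, 13 months = 395 days.
ARCHIVES = [Archive(60, 30), Archive(3600, 395)]


def total_points(archives: list) -> int:
    return sum(a.points for a in archives)


if __name__ == "__main__":
    for a in ARCHIVES:
        print(a.seconds_per_point, "s granularity:", a.points, "points")
    print("bytes per metric:", total_points(ARCHIVES) * BYTES_PER_POINT)
```

Under these assumptions the one-minute archive holds 43,200 points and the one-hour archive 9,480, so a metric costs about 630 KB on disk.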
Platform monitoring
• CPU utilisation, memory usage, disk space
• Traffic: connections, requests, packets, bytes
  • By partition, node, geo-region, and domain
  • By HTTP response status code
• Log latency, queue depth, processing rate
• Message counts, errors, processing time
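Breakdowns such as "by geo-region" and "by HTTP response status code" are commonly encoded in the metric name when sending to StatsD. The sketch below formats counters, gauges, and timers in the StatsD line protocol and sends them over UDP. The metric naming scheme, the region value, and the host and port are hypothetical; the talk does not show how section.io names its metrics.

```python
import socket


def counter_line(name: str, value: int = 1) -> str:
    """StatsD counter, e.g. a request count."""
    return f"{name}:{value}|c"


def gauge_line(name: str, value: float) -> str:
    """StatsD gauge, e.g. a queue depth."""
    return f"{name}:{value}|g"


def timer_line(name: str, milliseconds: float) -> str:
    """StatsD timer, e.g. a processing time in milliseconds."""
    return f"{name}:{milliseconds}|ms"


def status_metric(region: str, status: int) -> str:
    """Hypothetical naming scheme: traffic.<region>.status.<code>."""
    return f"traffic.{region}.status.{status}"


def send(lines: list, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send newline-separated metrics to a StatsD daemon in one UDP packet."""
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    send([
        counter_line(status_metric("sydney", 200)),
        gauge_line("logstash.queue_depth", 1250),
        timer_line("logstash.processing_time", 42),
    ])
```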
Website monitoring
• Cache hit, miss, pass
  • By content-type
• Response time (median, mean, upper 95%)
• WAF intercepts
  • By rule
  • By country
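The three response-time summaries behave quite differently when a few requests are slow. The sketch below computes them for a small sample; "upper 95%" is taken to mean the largest value among the fastest 95% of samples, similar to what StatsD timers report, which is an assumption about the slide's meaning. The sample values are made up for illustration.

```python
import statistics


def upper_percentile(samples: list, pct: int = 95) -> float:
    """Largest value among the lowest `pct` percent of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Round half up, and always keep at least one sample.
    keep = max(int(len(ordered) * pct / 100 + 0.5), 1)
    return ordered[keep - 1]


def summarise(samples: list) -> dict:
    return {
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "upper_95": upper_percentile(samples, 95),
    }


if __name__ == "__main__":
    # Illustrative response times in milliseconds: mostly fast, two slow outliers.
    times_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14,
                15, 12, 13, 14, 11, 16, 13, 12, 14, 900]
    print(summarise(times_ms))
    # {'median': 13.5, 'mean': 69.5, 'upper_95': 250}
```

Here the median stays near the typical request, the mean is pulled up by the two outliers, and the upper 95% exposes the slow tail while discarding only the single worst sample.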
Internal processes
• Every staff member does on-call
• Every alert is actionable
• Every incident feeds the product backlog
Beyond today
• Yelp ElastAlert
• Custom log fields
• A `tail -f` UI
• Automated anomaly detection
Thank you
Jason Stangroome
Twitter: @jstangroome
https://blog.stangroome.com
https://www.section.io/blog