Monitoring NGINX (plus): key metrics and how-to
Monitoring nginx
Alexis Lê-Quôc, Datadog
@alq
Agenda
• Dramatis personae
• Observations
• Monitoring 1 nginx (plus) with logs
• Monitoring 1 nginx (plus) with metrics
• Monitoring N nginx effectively
@alq CTO at Datadog
Datadog == monitoring
• Monitoring as a service
• Works really well with large, dynamic environments (e.g. clouds)
• Aggregates performance metrics
• Correlates nginx performance with the rest of your infrastructure
Observations
From the field
Some stats
• Across all monitored servers
  • nginx ~10%
  • Apache ~5%
• CPU and CPU/$ is the dominant resource
[Chart: % of instances per core count; x-axis: core count (1, 2, 4, 8, 12, 16, 24, 32); y-axis: % of instances, 0% to 40%]
[Chart: % of instances per type (AWS only); x-axis: EC2 type (c3.l, c3.2xl, c1.xl, c3.8xl, m3.l, c3.xl, m3.m, cc2.8xl, t2.m, c3.4xl, rest); y-axis: % of instances, 0% to 30%]
Monitoring nginx
1. Monitoring with logs
2. Monitoring with status
3. Monitoring with statsd
Monitoring with logs
• Canonical example of log indexers
• Your choice of:
  • logstash
  • splunk
  • logentries, sumologic, loggly, etc.
nginx -> log forwarder -> indexer -> UI
Strengths
• forensics & anomalies
• content-driven analysis
Weaknesses
• low signal-to-noise ratio
• “black box”
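As a rough illustration of log-based monitoring, the sketch below counts responses per status class from access-log lines. It assumes nginx's stock "combined" log format; the sample lines, the `LINE_RE` pattern and the `status_classes` helper are made up for this example and are not part of the talk.

```python
import re
from collections import Counter

# Matches the start of a line in nginx's "combined" access-log format:
# addr - user [time] "request" status bytes ...
LINE_RE = re.compile(
    r'^(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def status_classes(lines):
    """Count responses per status class (2xx, 4xx, ...)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            counts["unparsed"] += 1  # keep track of noise instead of hiding it
            continue
        counts[match.group("status")[0] + "xx"] += 1
    return counts

# Hypothetical sample lines (documentation IP range, invented requests).
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.35.0"',
    '203.0.113.9 - - [10/Oct/2014:13:55:37 +0000] "GET /missing HTTP/1.1" 404 169 "-" "curl/7.35.0"',
    '203.0.113.9 - - [10/Oct/2014:13:55:38 +0000] "POST /api HTTP/1.1" 502 172 "-" "curl/7.35.0"',
]

if __name__ == "__main__":
    print(dict(status_classes(SAMPLE)))
```

In practice a log forwarder and indexer would do this parsing at scale; the point here is only that every signal has to be extracted from text first.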
Monitoring with metrics
• open-source: ngx_http_stub_status_module
  • bare-bones metrics
  • human-readable text presentation
• plus: ngx_http_status_module
  • a lot more metrics for each function
  • JSON format
• Your choice of…
  • Datadog, Nagios, Zabbix, etc. for open-source
  • Datadog for nginx plus
nginx status -> collector -> aggregator -> UI/alerts
Strengths
• lightweight & real-time
• “white box”
Weaknesses
• no insight into content
Simple metrics taxonomy
1. What it measures
• Work or resource
  • Focus on work because work == value
  • Resource analysis is useful to understand performance
• Use Brendan Gregg’s USE method
  • Utilization (% over time)
  • Saturation (queue length)
  • Errors (count over time)
2. Type
• Gauge: sample
• Counter: accumulated sample; must be differentiated over time (turned into a rate) to be meaningful
http://www.brendangregg.com/usemethod.html
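The gauge/counter distinction can be made concrete with a small sketch: a counter only becomes useful once two samples are turned into a rate. The `rate` helper and the sample numbers below are illustrative assumptions, not taken from the talk.

```python
def rate(prev, curr):
    """Turn two counter samples into a per-second rate.

    Each sample is a (timestamp_seconds, counter_value) pair.
    Returns None when no rate can be computed: non-increasing time,
    or a counter that went backwards (e.g. after a server restart).
    """
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0 or v1 < v0:
        return None
    return (v1 - v0) / elapsed

if __name__ == "__main__":
    # 50 more requests over 10 seconds
    print(rate((0, 100), (10, 150)))
```

A gauge such as current connections needs no such step: each sample is meaningful on its own.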
Open-source metrics
Class | Type | Resource/Work | Notes
Current connections | Gauge | Resource | reading, writing, idle
Accepted connections | Counter | Resource |
Handled connections | Counter | Resource | <= accepted (lower when a resource limit is reached)
Requests | Counter | Work | True purpose of the server
• Latency must be measured using logs or statsd.
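To show how these metrics are read in practice, here is a minimal sketch that parses the text produced by ngx_http_stub_status_module. The sample text follows the layout shown in the nginx documentation; the `parse_stub_status` helper and the key names in the returned dictionary are choices made for this example. Fetching the status page over HTTP is left out on purpose.

```python
import re

# Sample stub_status output, laid out as in the nginx documentation.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""

def parse_stub_status(text):
    """Parse stub_status text into a dict of gauges and counters."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    totals = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")
    accepted, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        # gauges: meaningful as sampled
        "active": int(active.group(1)),
        "reading": reading,
        "writing": writing,
        "waiting": waiting,  # idle keep-alive connections
        # counters: only meaningful once turned into rates
        "accepted": accepted,
        "handled": handled,
        "requests": requests,
        # accepted but never handled
        "dropped": accepted - handled,
    }

if __name__ == "__main__":
    print(parse_stub_status(SAMPLE))
```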
Key “plus” metrics
Class | Type | Resource/Work | Notes
5xx Errors | Counter | Work | without log analysis
5xx/sum(Nxx) | Gauge | Work | error rate %
idle/dropped connections | Gauge | Resource | saturation
active/total connections | Gauge | Resource | upstream capacity
Requests | Counter | Work | true purpose of the server
• Latency must be measured using logs or statsd.
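The 5xx/sum(Nxx) error rate can be sketched as follows. The per-class response counters are assumed to have been read from the status JSON already; the dictionary shape, the `error_rate_percent` helper and the sample numbers are illustrative assumptions rather than the module's documented schema.

```python
def error_rate_percent(prev, curr):
    """Percentage of 5xx responses between two counter samples.

    prev and curr map a status class ("1xx" .. "5xx") to a cumulative
    response count. Returns None if there was no traffic or if any
    counter went backwards (e.g. after a restart).
    """
    deltas = {cls: curr[cls] - prev.get(cls, 0) for cls in curr}
    if any(delta < 0 for delta in deltas.values()):
        return None
    total = sum(deltas.values())
    if total == 0:
        return None
    return 100.0 * deltas.get("5xx", 0) / total

# Hypothetical cumulative counters from two consecutive polls.
PREV = {"1xx": 0, "2xx": 1000, "3xx": 50, "4xx": 40, "5xx": 10}
CURR = {"1xx": 0, "2xx": 1960, "3xx": 60, "4xx": 50, "5xx": 30}

if __name__ == "__main__":
    print(error_rate_percent(PREV, CURR))
```

Computing the ratio on deltas rather than on the raw counters gives the error rate for the polling interval, not the average since the server started.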
Monitoring with statsd
nginx -> statsd -> UI/alerts
Strengths
• lightweight, real-time, standard
• custom metrics, content-aware
Weaknesses
• not comprehensive
https://github.com/zebrafishlabs/nginx-statsd
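For readers unfamiliar with statsd, the sketch below shows the plain-text datagram format that any statsd client emits: `name:value|type` sent over UDP. It is not the configuration of the nginx-statsd module linked above; the metric names, the helper functions and the default address (127.0.0.1, port 8125, the conventional statsd port) are assumptions for illustration.

```python
import socket

def format_metric(name, value, kind):
    """Build a statsd datagram: 'c' counter, 'g' gauge, 'ms' timer."""
    return f"{name}:{value}|{kind}"

def send_metric(name, value, kind, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; no reply is expected from statsd."""
    payload = format_metric(name, value, kind).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

if __name__ == "__main__":
    # Hypothetical metric names: one request served in 12 ms.
    send_metric("nginx.requests", 1, "c")
    send_metric("nginx.request_time", 12, "ms")
```

Because the timer type carries a duration, this is one way to obtain the latency figures that the status modules do not expose.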
Example
Monitoring nginx
1. Logs for content analysis (forensics, anomalies, marketing)
2. Status for (white box) performance monitoring
3. statsd for custom metrics
No single method gives you everything you need.
Monitoring a lot of nginx
1. Requires aggregation
2. It’s all about metadata (“pet-to-cattle” mindset)
3. Correlation
Aggregation
• By default for log-based monitoring
• Not by default for metric-based monitoring
Metadata
• Analyze by properties that are not the host identity
• Find anomalies that are not obvious
• Pet-to-cattle evolution: hosts don’t matter, services do
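Aggregating by metadata rather than by host can be sketched in a few lines. The host names, tag keys and values below are invented for the example, and `aggregate_by_tag` is a hypothetical helper, not a Datadog API.

```python
from collections import defaultdict

# Hypothetical per-host samples, each carrying metadata as tags.
SAMPLES = [
    {"host": "i-0a1", "tags": {"role": "edge", "az": "us-east-1a"}, "requests_per_s": 120.0},
    {"host": "i-0b2", "tags": {"role": "edge", "az": "us-east-1b"}, "requests_per_s": 80.0},
    {"host": "i-0c3", "tags": {"role": "api", "az": "us-east-1a"}, "requests_per_s": 45.0},
]

def aggregate_by_tag(samples, tag, metric):
    """Sum a metric across hosts, grouped by the value of one tag."""
    totals = defaultdict(float)
    for sample in samples:
        group = sample["tags"].get(tag, "untagged")
        totals[group] += sample[metric]
    return dict(totals)

if __name__ == "__main__":
    print(aggregate_by_tag(SAMPLES, "role", "requests_per_s"))
    print(aggregate_by_tag(SAMPLES, "az", "requests_per_s"))
```

The host identity never appears in the result: the same samples can be sliced by role, by availability zone, or by any other property attached to them.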
Correlation
• nginx is only one piece of the infrastructure
#plug
www.datadog.com
Thank you!
Questions/Comments? @alq