Monitoring NGINX (plus): key metrics and how-to

Posted on 02-Jul-2015

Description

NGINX just works, and that's why we use it. That does not mean it should be left unmonitored. As a web server, it plays a central role in a modern infrastructure. As a gatekeeper, it sees every interaction with the application. If you monitor it properly, it can explain a lot about what is happening in the rest of your infrastructure. In this talk you will learn about NGINX (plus) metrics, what they mean, and how to use them. You will also learn the different methods for monitoring NGINX (status, statsd, logs) and their pros and cons, illustrated with real data from real servers.

Transcript of Monitoring NGINX (plus): key metrics and how-to

Monitoring nginx
Alexis Lê-Quôc, Datadog

@alq

Agenda
• Dramatis personae
• Observations
• Monitoring 1 nginx (plus) instance with logs
• Monitoring 1 nginx (plus) instance with metrics
• Monitoring N nginx instances effectively

@alq, CTO at Datadog

Datadog == monitoring
• Monitoring as a service
• Works really well with large, dynamic environments (e.g., clouds)
• Aggregates performance metrics
• Correlates nginx performance with the rest of your infrastructure

Observations
From the field

Some stats
• Across all monitored servers:
  • nginx: ~10%
  • Apache: ~5%
• CPU (and CPU per dollar) is the dominant resource

[Chart: % of instances per core count. X-axis: core count (1, 2, 4, 8, 12, 16, 24, 32); y-axis: % of instances, 0% to 40%]

[Chart: % of instances per EC2 instance type (AWS only). X-axis: EC2 type (c3.l, c3.2xl, c1.xl, c3.8xl, m3.l, c3.xl, m3.m, cc2.8xl, t2.m, c3.4xl, rest); y-axis: % of instances, 0% to 30%]

Monitoring nginx
1. Monitoring with logs
2. Monitoring with status
3. Monitoring with statsd

Monitoring with logs

• nginx logs are the canonical example for log indexers
• Your choice of:
  • logstash
  • splunk
  • logentries, sumologic, loggly, etc.

nginx → log forwarder → indexer → UI

Strengths:
• forensics & anomalies
• content-driven analysis

Weaknesses:
• low signal-to-noise ratio
• “black box”
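To make "content-driven analysis" concrete, here is a minimal Python sketch of what an indexer does with access logs: count responses per status class and compute a latency percentile. It assumes a hypothetical custom log_format that is nginx's "combined" format with $request_time (in seconds) appended; the default "combined" format does not record $request_time. The log lines are made up for illustration.

```python
import math
import re
from collections import Counter

# Assumption: hypothetical log_format = nginx "combined" + trailing $request_time.
# The default "combined" format does NOT include $request_time.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?:\d+|-) '
    r'"[^"]*" "[^"]*" (?P<request_time>\d+(?:\.\d+)?)$'
)


def summarize(lines):
    """Count responses per status class and collect sorted latencies (seconds)."""
    classes = Counter()
    latencies = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not follow the assumed format
        classes[match.group("status")[0] + "xx"] += 1
        latencies.append(float(match.group("request_time")))
    latencies.sort()
    return classes, latencies


def percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of an already-sorted list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


# Illustrative (made-up) log lines in the assumed format.
sample = [
    '10.0.0.1 - - [02/Jul/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.43.0" 0.012',
    '10.0.0.2 - - [02/Jul/2015:10:00:01 +0000] "GET /api HTTP/1.1" 502 166 "-" "curl/7.43.0" 0.250',
    '10.0.0.3 - - [02/Jul/2015:10:00:02 +0000] "GET /a HTTP/1.1" 200 1024 "-" "curl/7.43.0" 0.030',
]
classes, latencies = summarize(sample)
print(dict(classes))              # {'2xx': 2, '5xx': 1}
print(percentile(latencies, 50))  # 0.03
```

A real indexer does this at scale and keeps the raw lines for forensics, which is exactly where the low signal-to-noise ratio comes from: every request produces a line whether or not anything interesting happened.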

Monitoring with metrics

• open-source: ngx_http_stub_status_module
  • bare-bones metrics
  • human-readable text presentation
• plus: ngx_http_status_module
  • many more metrics for each function
  • JSON format
• Your choice of…
  • Datadog, Nagios, Zabbix, etc. for open-source
  • Datadog for nginx plus

nginx status → collector → aggregator → UI/alerts

Strengths:
• lightweight & real-time
• “white box”

Weaknesses:
• no insight into content

Simple metrics taxonomy
1. What it measures
   • Work or resource
   • Focus on work, because work == value
   • Resource analysis is useful to understand performance
   • For resources, use Brendan Gregg’s USE method:
     • Utilization (% over time)
     • Saturation (queue length)
     • Errors (count over time)
2. Type
   • Gauge: a sample of the current value
   • Counter: an accumulated value; it must be derived (turned into a rate over time) to be meaningful

http://www.brendangregg.com/usemethod.html
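Deriving a counter is simple arithmetic over two samples, but it has one trap: counters reset to zero when nginx restarts. The Python sketch below is a minimal illustration under that assumption (function name and numbers are hypothetical); it returns no rate rather than a negative one for an interval that spans a reset.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second rate of a monotonically increasing counter between two samples.

    Timestamps are in seconds. Returns None if the counter went backwards,
    e.g. because nginx restarted and its counters were reset to zero.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing timestamps")
    delta = curr_value - prev_value
    if delta < 0:
        return None  # counter reset: no meaningful rate for this interval
    return delta / elapsed


# A requests counter sampled 10 s apart (illustrative numbers).
print(counter_rate(31070465, 0.0, 31070765, 10.0))  # 30.0
```

Gauges need none of this: a sample of "current connections" is already meaningful on its own.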

Open-source metrics

Metric                 Type     Resource/Work  Notes
Current connections    Gauge    Resource       reading, writing, idle
Accepted connections   Counter  Resource
Handled connections    Counter  Resource       ≤ accepted; lower only when a resource limit is reached
Requests               Counter  Work           true purpose of the server

• Latency must be measured using logs or statsd.
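A collector for the open-source module mostly has to parse its human-readable text page. The Python sketch below assumes the four-line layout shown in the nginx documentation for ngx_http_stub_status_module (the sample text uses that layout); the status URL passed to fetch_stub_status is whatever location you exposed stub_status on, so it is left to the caller. The "dropped" value is derived as accepts minus handled, following the table above.

```python
import re
import urllib.request


def _ints(pattern, text):
    """Return the integer groups of the first match of pattern, or raise."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"unexpected stub_status output (pattern {pattern!r})")
    return [int(g) for g in match.groups()]


def parse_stub_status(text):
    """Parse the plain-text page produced by ngx_http_stub_status_module."""
    (active,) = _ints(r"Active connections:\s*(\d+)", text)
    accepts, handled, requests = _ints(
        r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)", text)
    reading, writing, waiting = _ints(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    return {
        "active": active,      # gauge
        "reading": reading,    # gauge
        "writing": writing,    # gauge
        "waiting": waiting,    # gauge: idle client connections
        "accepts": accepts,    # counter
        "handled": handled,    # counter
        "requests": requests,  # counter
        "dropped": accepts - handled,  # counter; nonzero only after a resource limit was hit
    }


def fetch_stub_status(url):
    """Fetch and parse the status page at a caller-supplied URL."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_stub_status(resp.read().decode("ascii"))


SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""
print(parse_stub_status(SAMPLE)["dropped"])  # 0
```

The counters returned here still need to be turned into rates between two polls; only the gauges are usable as-is.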

Key “plus” metrics

Metric                     Type     Resource/Work  Notes
5xx errors                 Counter  Work           available without log analysis
5xx / sum(Nxx)             Gauge    Work           error rate (%)
idle/dropped connections   Gauge    Resource       saturation
active/total connections   Gauge    Resource       upstream capacity
Requests                   Counter  Work           true purpose of the server

• Latency must be measured using logs or statsd.
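The error-rate row can be computed directly from two polls of the plus status JSON. The Python sketch below assumes the ngx_http_status_module layout as we understand it (server_zones → zone → responses with "1xx" … "5xx" keys, and connections → dropped); check the field names against your NGINX Plus version. The zone name "frontend" and all numbers are hypothetical. Unlike the slide's single-snapshot gauge, this version computes the ratio over the polling interval, which reflects recent errors rather than errors since startup.

```python
import json

RESPONSE_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def error_rate_pct(prev, curr, zone):
    """5xx / sum(Nxx), in percent, for one server zone over the polling interval."""
    before = prev["server_zones"][zone]["responses"]
    after = curr["server_zones"][zone]["responses"]
    deltas = {cls: after[cls] - before[cls] for cls in RESPONSE_CLASSES}
    total = sum(deltas.values())
    if total <= 0:
        return None  # no traffic in the interval, or counters were reset
    return 100.0 * deltas["5xx"] / total


def dropped_connections(prev, curr):
    """Connections dropped during the interval (a saturation signal)."""
    return curr["connections"]["dropped"] - prev["connections"]["dropped"]


# Two hypothetical snapshots, trimmed to the fields used above.
prev = json.loads("""{
  "connections": {"accepted": 5000, "dropped": 2, "active": 40, "idle": 60},
  "server_zones": {"frontend": {"responses": {
    "1xx": 0, "2xx": 9000, "3xx": 500, "4xx": 400, "5xx": 100, "total": 10000}}}
}""")
curr = json.loads("""{
  "connections": {"accepted": 5600, "dropped": 5, "active": 45, "idle": 58},
  "server_zones": {"frontend": {"responses": {
    "1xx": 0, "2xx": 9900, "3xx": 550, "4xx": 440, "5xx": 110, "total": 11000}}}
}""")
print(error_rate_pct(prev, curr, "frontend"))  # 1.0
print(dropped_connections(prev, curr))         # 3
```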

Monitoring with statsd

nginx → statsd → UI/alerts

Strengths:
• lightweight, real-time, standard
• custom metrics, content-aware

Weaknesses:
• not comprehensive

https://github.com/zebrafishlabs/nginx-statsd
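Part of what makes statsd "lightweight" and "standard" is its wire format: one short text line per metric, sent over UDP with no acknowledgment. The Python sketch below formats and sends such lines; it illustrates the generic statsd protocol, not the nginx-statsd module's internals. The metric names are hypothetical, and 8125 is the port statsd conventionally listens on.

```python
import socket


def statsd_packet(name, value, metric_type, sample_rate=1.0):
    """Format one metric in the statsd text protocol: name:value|type[|@rate]."""
    if metric_type not in ("c", "ms", "g"):  # counter, timer (ms), gauge
        raise ValueError(f"unsupported statsd metric type: {metric_type!r}")
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    packet = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        packet += f"|@{sample_rate}"  # lets the server scale sampled counts back up
    return packet


def send_metric(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: no connection, no reply, no back-pressure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("utf-8"), (host, port))


print(statsd_packet("nginx.requests", 1, "c"))              # nginx.requests:1|c
print(statsd_packet("nginx.response_time", 12, "ms", 0.1))  # nginx.response_time:12|ms|@0.1
```

Because the sender never waits on the statsd server, emitting a timer per request costs almost nothing on the nginx side; the trade-off is that you only get the metrics you chose to emit.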

Example

Monitoring nginx
1. Logs for content analysis (forensics, anomalies, marketing)
2. Status for (white box) performance monitoring
3. statsd for custom metrics

No single method gives you everything you need.

Monitoring many nginx instances
1. Requires aggregation
2. It’s all about metadata (“pet-to-cattle” mindset)
3. Correlation

Aggregation
• Happens by default in log-based monitoring
• Does not happen by default in metric-based monitoring

Metadata
• Analyze by properties other than the host's identity
• Find anomalies that are not obvious
• Pet-to-cattle evolution: hosts don’t matter, services do
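Aggregating by metadata instead of by host can be sketched in a few lines of Python. This is a minimal illustration, not Datadog's implementation: each sample is a per-host metric (here, hypothetical requests per second) carrying tags such as "role" and "az", and a host is flagged when it deviates from its group's median by more than a chosen ratio. Host names, tags, values, and the 2x threshold are all assumptions.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_tag(samples, tag):
    """Sum a per-host metric across hosts, grouped by one metadata tag."""
    totals = defaultdict(float)
    for s in samples:
        totals[s["tags"].get(tag, "untagged")] += s["value"]
    return dict(totals)


def outliers(samples, tag, ratio=2.0):
    """Hosts whose value is more than `ratio` times above or below their group's median."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["tags"].get(tag, "untagged")].append(s)
    flagged = []
    for members in groups.values():
        mid = median([m["value"] for m in members])
        for m in members:
            if m["value"] > mid * ratio or m["value"] < mid / ratio:
                flagged.append(m["host"])
    return flagged


# Hypothetical requests/s per host, tagged with role and availability zone.
samples = [
    {"host": "web-1", "tags": {"role": "frontend", "az": "us-east-1a"}, "value": 100.0},
    {"host": "web-2", "tags": {"role": "frontend", "az": "us-east-1a"}, "value": 110.0},
    {"host": "web-3", "tags": {"role": "frontend", "az": "us-east-1a"}, "value": 400.0},
    {"host": "web-4", "tags": {"role": "frontend", "az": "us-east-1b"}, "value": 150.0},
]
print(aggregate_by_tag(samples, "az"))  # {'us-east-1a': 610.0, 'us-east-1b': 150.0}
print(outliers(samples, "az"))          # ['web-3']
```

No host name appears in the grouping logic: swapping web-3 for a fresh instance with the same tags changes nothing, which is the pet-to-cattle point.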

Correlation
• nginx is only one piece of the infrastructure

#plug
www.datadog.com

Thank you!
Questions/Comments? @alq