Monitoring NGINX (plus): key metrics and how-to

Transcript
Page 1: Monitoring NGINX (plus): key metrics and how-to

Monitoring nginx

Alexis Lê-Quôc, Datadog

@alq

Page 2

Agenda
• Dramatis personae
• Observations
• Monitoring 1 nginx (plus) with logs
• Monitoring 1 nginx (plus) with metrics
• Monitoring N nginx effectively

Page 3

@alq CTO at Datadog

Page 4

Datadog == monitoring
• Monitoring as a service
• Works really well with large, dynamic environments (e.g. clouds)
• Aggregates performance metrics
• Correlates nginx performance with the rest of your infrastructure

Page 7

Observations

From the field

Page 8

Some stats
• Across all monitored servers:
  • nginx ~10%
  • Apache ~5%
• CPU (and CPU/$) is the dominant resource

Page 9

[Chart: % of instances per core count. X-axis: core count (1, 2, 4, 8, 12, 16, 24, 32); y-axis: % of instances, 0 to 40%.]

Page 10

[Chart: % of instances per type (AWS only). X-axis: EC2 instance type (c3.l, c3.2xl, c1.xl, c3.8xl, m3.l, c3.xl, m3.m, cc2.8xl, t2.m, c3.4xl, rest); y-axis: % of instances, 0 to 30%.]

Page 11

Monitoring nginx
1. Monitoring with logs
2. Monitoring with status
3. Monitoring with statsd

Page 12

Monitoring with logs

• Canonical use case for log indexers
• Your choice of:
  • Logstash
  • Splunk
  • Logentries, Sumo Logic, Loggly, etc.

nginx → log forwarder → indexer → UI
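As a rough illustration of the indexing step in this pipeline, here is a minimal Python sketch that parses access-log lines and reports per-status-class counts plus a 95th-percentile latency. It assumes a hypothetical `log_format` that appends `$request_time` to nginx's `combined` format (the stock `combined` format has no timing field); the names `LINE_RE` and `summarize` are illustrative, and a real indexer (Logstash, Splunk, etc.) would use its own parsers.

```python
import math
import re
from collections import Counter

# nginx "combined" format plus a trailing $request_time field. The trailing
# field is an assumption and needs a custom log_format, for example:
#   log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
#                    '$status $body_bytes_sent "$http_referer" '
#                    '"$http_user_agent" $request_time';
LINE_RE = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<rt>\d+(?:\.\d+)?)'
)


def summarize(lines):
    """Return (counts per status class, nearest-rank p95 request time in seconds)."""
    classes = Counter()
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines written in a different format
        classes[m.group("status")[0] + "xx"] += 1
        times.append(float(m.group("rt")))
    if not times:
        return classes, None
    times.sort()
    # Nearest-rank percentile: smallest sample with >= 95% of samples at or below it.
    rank = math.ceil(0.95 * len(times))
    return classes, times[rank - 1]
```

Because the indexer sees every request line, it can answer content questions (which URL, which client) that a status page cannot; the cost is parsing and storing every line.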

Page 13

Monitoring with logs

nginx → log forwarder → indexer → UI

Strengths
• forensics & anomalies
• content-driven analysis

Weaknesses
• low signal-to-noise ratio
• “black box”

Page 14

Monitoring with metrics

• Open source: ngx_http_stub_status_module
  • bare-bones metrics
  • human-readable text output
• Plus: ngx_http_status_module
  • many more metrics for each function
  • JSON format
• Your choice of…
  • Datadog, Nagios, Zabbix, etc. for open source
  • Datadog for nginx plus

nginx status → collector → aggregator → UI/alerts

Page 15

Monitoring with metrics

nginx status → collector → aggregator → UI/alerts

Strengths
• lightweight & real-time
• “white box”

Weaknesses
• no insight into content

Page 16

Simple metrics taxonomy
1. What it measures
• Work or resource
• Focus on work, because work == value
• Resource analysis is useful to understand performance
• Use Brendan Gregg’s USE method:
  • Utilization (% over time)
  • Saturation (queue length)
  • Errors (count over time)
2. Type
• Gauge: a sample
• Counter: an accumulated sample; needs to be derived to be meaningful

http://www.brendangregg.com/usemethod.html
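To make "needs to be derived" concrete, here is a minimal Python sketch that turns raw counter samples into per-second rates; a gauge needs no such step. The function name and the choice to skip intervals where the counter went backwards (a reset, e.g. after a restart) are assumptions; some tools instead count the post-reset value as the increase.

```python
def counter_rates(samples):
    """Turn (timestamp_seconds, counter_value) samples into (timestamp, per-second rate).

    The rate between two consecutive samples is (v2 - v1) / (t2 - t1).
    Intervals where time does not advance or the value drops are skipped,
    so a counter reset never shows up as a negative rate.
    """
    rates = []
    for (t1, v1), (t2, v2) in zip(samples, samples[1:]):
        if t2 <= t1 or v2 < v1:
            continue  # out-of-order sample or counter reset
        rates.append((t2, (v2 - v1) / (t2 - t1)))
    return rates
```

For example, request-counter samples `[(0, 100), (10, 300), (20, 350)]` yield rates of 20 and 5 requests per second.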

Page 17

Open-source metrics

Class: type, resource/work (notes)
• Current connections: gauge, resource (split into reading, writing, idle)
• Accepted connections: counter, resource
• Handled connections: counter, resource (<= accepted; lower only when a resource limit is hit)
• Requests: counter, work (the true purpose of the server)

• Latency must be measured using logs or statsd.
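The stub_status page is plain text, so collecting these metrics is mostly string parsing. Below is a minimal Python sketch, assuming the page layout shown in the ngx_http_stub_status_module documentation ("Active connections", a "server accepts handled requests" header followed by three numbers, then "Reading/Writing/Waiting", where "Waiting" is the idle count). The function name and the derived `dropped` field are illustrative.

```python
import re


def parse_stub_status(text):
    """Parse the plain-text page served by ngx_http_stub_status_module into a dict."""
    def ints(pattern):
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"unexpected stub_status output: {pattern!r} not found")
        return [int(g) for g in match.groups()]

    (active,) = ints(r"Active connections:\s*(\d+)")
    accepts, handled, requests = ints(
        r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)")
    reading, writing, waiting = ints(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)")
    return {
        # gauges: meaningful as-is
        "active": active, "reading": reading, "writing": writing, "waiting": waiting,
        # counters: cumulative since start, derive a rate before graphing
        "accepts": accepts, "handled": handled, "requests": requests,
        # handled < accepts means some connections were dropped (resource limit hit)
        "dropped": accepts - handled,
    }
```

To feed it, fetch the page with `urllib.request.urlopen(url).read().decode()`, where the URL points at a location you have configured with the `stub_status` directive (e.g. a hypothetical `http://127.0.0.1/nginx_status`).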

Page 18

Key “plus” metrics

Class: type, resource/work (notes)
• 5xx errors: counter, work (available without log analysis)
• 5xx / sum(Nxx): gauge, work (error rate, %)
• Idle/dropped connections: gauge, resource (saturation)
• Active/total connections: gauge, resource (upstream capacity)
• Requests: counter, work (the true purpose of the server)

• Latency must be measured using logs or statsd.
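The error-rate row, 5xx / sum(Nxx), is most useful computed over a recent interval rather than over lifetime totals, so a fresh spike is not diluted by hours of healthy traffic. Here is a minimal Python sketch, assuming each snapshot is a dict of cumulative response counters keyed "1xx" through "5xx", shaped like the per-zone `responses` counters in the nginx plus status JSON (check the exact field names for your version). The function name is hypothetical.

```python
RESPONSE_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def error_rate_5xx(prev, curr):
    """Percentage of 5xx responses among all responses between two counter snapshots.

    prev and curr map "1xx".."5xx" to cumulative response counts (assumed shape).
    Returns None when the rate is undefined: a counter reset or no traffic.
    """
    deltas = {c: curr[c] - prev[c] for c in RESPONSE_CLASSES}
    if any(d < 0 for d in deltas.values()):
        return None  # counters went backwards, e.g. after a restart
    total = sum(deltas.values())  # sum(Nxx) over the interval
    if total == 0:
        return None  # no responses in the interval
    return 100.0 * deltas["5xx"] / total
```

With 10 new 5xx responses out of 1,000 responses in the interval, this returns 1.0 (%).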

Page 19

Monitoring with statsd

nginx → statsd → UI/alerts

Strengths
• lightweight, real-time, standard
• custom metrics, content-aware

Weaknesses
• not comprehensive

https://github.com/zebrafishlabs/nginx-statsd
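statsd is lightweight because each metric travels as one short text line, `name:value|type`, in a UDP datagram that the sender never waits on; `c` marks a counter and `ms` a timer, with an optional `|@rate` suffix for sampled metrics. The Python sketch below shows only that wire format, not the nginx-statsd module itself; the metric names are hypothetical, and 8125 is the conventional statsd port.

```python
import socket


def statsd_line(name, value, metric_type, sample_rate=1.0):
    """Format one metric in the plain-text statsd protocol: name:value|type[|@rate]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"  # tells the server the value was sampled
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric as a UDP datagram; the sender never waits for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `statsd_line("nginx.response_time", 12, "ms", 0.1)` produces `nginx.response_time:12|ms|@0.1`. Because the protocol is this simple, the same statsd server can also receive custom, content-aware metrics from application code alongside what nginx emits.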

Page 20

Example

Page 21

Monitoring nginx
1. Logs for content analysis (forensics, anomalies, marketing)
2. Status for (white-box) performance monitoring
3. statsd for custom metrics

No single method gives you everything you need.

Page 22

Monitoring a lot of nginx
1. Requires aggregation
2. It’s all about metadata (“pet-to-cattle” mindset)
3. Correlation

Page 23

Aggregation
• By default for log-based monitoring
• Not by default for metric-based monitoring

Page 24

Metadata
• Analyze by properties that are not the host identity
• Find anomalies that are not obvious
• Pet-to-cattle evolution: hosts don’t matter, services do
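Aggregating by metadata instead of by host can be sketched in a few lines of Python. This assumes a hypothetical point shape, `{"host": ..., "tags": {...}, "value": ...}`, with tags such as `service`; the outlier rule (more than two population standard deviations from the group mean) is one simple choice for surfacing non-obvious anomalies, not what any particular monitoring product does.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_by_tag(points, tag):
    """Group per-host points by a metadata tag instead of by host name."""
    groups = defaultdict(list)
    for point in points:
        groups[point["tags"].get(tag, "untagged")].append(point)
    return groups


def aggregate_by_tag(points, tag):
    """Sum a per-host metric across all hosts sharing the same tag value."""
    return {key: sum(p["value"] for p in members)
            for key, members in group_by_tag(points, tag).items()}


def outliers_by_tag(points, tag, threshold=2.0):
    """Hosts more than `threshold` standard deviations from their group's mean."""
    flagged = []
    for members in group_by_tag(points, tag).values():
        values = [p["value"] for p in members]
        if len(values) < 3:
            continue  # too few peers to call anything an outlier
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every peer reports the same value
        flagged.extend(p["host"] for p in members
                       if abs(p["value"] - mu) > threshold * sigma)
    return flagged
```

Because hosts are only compared with peers that share a tag value, a newly launched host joins the right group without any per-host configuration, which is the pet-to-cattle point.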

Page 25

Correlation
• nginx is only one piece of the infrastructure
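One simple way to put a number on "nginx moves with the rest of the stack" is the Pearson correlation between two time-aligned series, for instance nginx request rate and CPU on a hypothetical upstream app tier, sampled at the same timestamps. This minimal Python sketch is one possible measure; a high coefficient suggests where to look, not which side causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, time-aligned series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two aligned series of length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(vx * vy)
```

A value near 1.0 means the two series rise and fall together; near -1.0, they move in opposite directions; near 0, there is no linear relationship.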

Page 26

#plug

www.datadog.com

Page 27

Thank you!

Questions/Comments? @alq