@henridf Henri Dubois-Ferriere - Percona · Henri Dubois-Ferriere @henridf. Hello. Henri...

Percona Live, 2018-11-06 Monitoring Kubernetes with Prometheus Henri Dubois-Ferriere @henridf



Percona Live, 2018-11-06

Monitoring Kubernetes with Prometheus

Henri Dubois-Ferriere @henridf


Hello.

Henri Dubois-Ferriere

Technical Director, Sysdig

Doing “observability” for many many years, from network to web apps via many startups.

PhD in CS from EPFL

Repatriate from San Francisco to Switzerland


Outline

● Kubernetes

● Prometheus

● Kubernetes metrics & sources

● Deployment


Monitor why?

● Know about outages before users tell me

● Understand my production environment (or try…)

● Plan/trend/forecast


Kubernetes


Kubernetes

- Container orchestration system

- aka “OS for your cluster”

- Abstracts away the underlying infra

- declarative APIs with control loops
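The "declarative APIs with control loops" idea can be illustrated with a toy reconciliation loop. This is only a sketch: the `Deployment` class and the `reconcile` function are hypothetical names invented for illustration, not Kubernetes APIs. The user declares a desired replica count, and the loop repeatedly nudges the observed count toward it.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical, simplified stand-in for a Kubernetes Deployment."""
    name: str
    spec_replicas: int    # desired state, declared by the user
    status_replicas: int  # observed state, reported by the system


def reconcile(deployment: Deployment) -> str:
    """Run one control-loop step: move observed state one replica toward desired state."""
    if deployment.status_replicas < deployment.spec_replicas:
        deployment.status_replicas += 1
        return "scale-up"
    if deployment.status_replicas > deployment.spec_replicas:
        deployment.status_replicas -= 1
        return "scale-down"
    return "no-op"


def run_until_converged(deployment: Deployment, max_steps: int = 100) -> list:
    """Repeat reconcile() until nothing is left to do; return the actions taken."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(deployment)
        if action == "no-op":
            break
        actions.append(action)
    return actions


app = Deployment(name="my-app", spec_replicas=4, status_replicas=1)
actions = run_until_converged(app)
print(actions, app.status_replicas)
```

The caller never says "start three more replicas"; it only states the target, and the loop works out the steps.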


https://commons.wikimedia.org/wiki/File:Kubernetes.png


Prometheus


Prometheus

❏ Started at SoundCloud in 2012

❏ Motivated by challenges with monitoring dynamic environments

❏ Made public 2015, now second CNCF “graduate”


More than a TSDB

https://prometheus.io/assets/architecture.png


It’s all about the pull

- Prom scrapes targets to get metrics

- Nice side effect: know when target down

- Needs to know what to scrape
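A rough sketch of the pull model and its side effect, under stated assumptions: `scrape_all` and `fake_fetch` are hypothetical helpers, the target addresses are made up, and the fetch function stands in for an HTTP GET of each target's metrics endpoint. The point is that a failed pull is itself recorded (Prometheus exposes this as the `up` metric).

```python
from typing import Callable, Dict, List


def scrape_all(targets: List[str], fetch: Callable[[str], str]) -> Dict[str, dict]:
    """Pull metrics from every target and record whether each scrape succeeded."""
    results = {}
    for target in targets:
        try:
            body = fetch(target)
            results[target] = {"up": 1, "body": body}
        except OSError:
            # A failed pull is itself a signal: the target is down or unreachable.
            results[target] = {"up": 0, "body": ""}
    return results


def fake_fetch(target: str) -> str:
    """Hypothetical stand-in for an HTTP GET of the target's metrics endpoint."""
    if target == "10.0.0.2:9100":
        raise ConnectionRefusedError("connection refused")
    return "http_requests_total 1741\n"


results = scrape_all(["10.0.0.1:9100", "10.0.0.2:9100"], fake_fetch)
down = [target for target, result in results.items() if result["up"] == 0]
print(down)
```

With a push model the monitoring system would have to infer absence from silence; here the scraper observes the failure directly.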


What should Prometheus scrape?

- Service discovery provides answer

- Azure, Consul, GCE, K8S, EC2, ...

- Can also watch a file containing target list
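For the file-based option, a target file is a JSON list of groups, each with a `targets` list and optional `labels`. The sketch below writes such a file; the helper name `write_target_file`, the addresses, and the `job` label value are assumptions made for illustration. It writes to a temporary file and renames it, so that a process watching the file should not read a half-written list.

```python
import json
import os
import tempfile


def write_target_file(path, groups):
    """Atomically write a target list in the JSON layout used by file-based discovery."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    # Rename into place so a watcher never sees a partially written file.
    os.replace(tmp_path, path)


groups = [
    {"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"job": "node"}},
]
target_path = os.path.join(tempfile.mkdtemp(), "targets.json")
write_target_file(target_path, groups)
print(open(target_path).read())
```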


Dimensional data model

Query: http_requests_total{code="200", method="get"}

Metric name: http_requests_total

Selector (aka filter): {code="200", method="get"}


Dimensional data model

Query: http_requests_total{code="200", method="get"}

Response:

http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741

http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920

Label/value pairs (aka dimensions): code="200", method="get", route="/api/users"


Dimensional data model

Query: http_requests_total{code="200", method="get"}

Response:

http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741

http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920

Each response line ends with a timestamp followed by a value (e.g. timestamp 1528706829.115, value 1741).
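The way a selector picks series out of the dimensional model can be sketched in a few lines. This is an illustrative model only, covering equality matchers and nothing else; `matches` and `select` are hypothetical helpers, and the third series (code="500") is made-up data added to show a non-match.

```python
def matches(series_labels: dict, selector: dict) -> bool:
    """True if the series carries every label/value pair the selector asks for."""
    return all(series_labels.get(key) == value for key, value in selector.items())


def select(series: list, metric_name: str, selector: dict) -> list:
    """Return every series with the given metric name whose labels satisfy the selector."""
    return [
        item for item in series
        if item["name"] == metric_name and matches(item["labels"], selector)
    ]


series = [
    {"name": "http_requests_total",
     "labels": {"code": "200", "method": "get", "route": "/api/users"}, "value": 1741},
    {"name": "http_requests_total",
     "labels": {"code": "200", "method": "get", "route": "/api/objects"}, "value": 1920},
    {"name": "http_requests_total",
     "labels": {"code": "500", "method": "get", "route": "/api/users"}, "value": 3},
]
hits = select(series, "http_requests_total", {"code": "200", "method": "get"})
routes = [item["labels"]["route"] for item in hits]
print(routes)
```

The selector names only `code` and `method`, so every `route` value that satisfies both comes back as its own series.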


Metadata discovery

- SD also provides metadata

- Metadata can be mixed in with metrics

- Powerful relabelling feature for label manipulation at ingest
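A simplified model of a relabelling "replace" step may make the idea concrete. It is a sketch, not Prometheus code: `relabel_replace` and `drop_internal` are hypothetical helpers, and the namespace and pod values are made up. The `__meta_kubernetes_namespace` and `__meta_kubernetes_pod_name` labels are the kind of metadata Kubernetes service discovery attaches to a target; the sketch copies them into ordinary labels and then drops the `__`-prefixed ones.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Simplified model of a relabelling 'replace' step applied at ingest."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the regex must match the whole joined value
    if match is None:
        return dict(labels)  # no match: labels pass through unchanged
    # Translate $1-style references into Python's \g<1> form before expanding.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


def drop_internal(labels):
    """Labels starting with '__' are discovery metadata and are not kept on the series."""
    return {key: value for key, value in labels.items() if not key.startswith("__")}


discovered = {
    "__meta_kubernetes_namespace": "prod",          # made-up value
    "__meta_kubernetes_pod_name": "my-app-1234",    # made-up value
    "job": "my-app",
}
step1 = relabel_replace(discovered, ["__meta_kubernetes_namespace"], "(.+)", "namespace")
step2 = relabel_replace(step1, ["__meta_kubernetes_pod_name"], "(.+)", "pod_name")
final = drop_internal(step2)
print(final)
```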


Instrumentation


Off-the-shelf or write your own
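To give a feel for the "write your own" route, here is a minimal sketch of a counter that renders itself in the Prometheus text exposition format. The `Counter` class below is a hypothetical toy written for this example; in practice one would normally reach for an official client library, which handles concurrency, escaping, and other metric types.

```python
class Counter:
    """Hypothetical minimal counter; real client libraries offer far more."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self.values = {}  # maps a sorted tuple of (label, value) pairs to a count

    def inc(self, amount=1, **labels):
        """Increase the counter for one combination of label values."""
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0) + amount

    def render(self):
        """Render all series of this counter in the text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self.values.items()):
            if key:
                label_text = ",".join(f'{name}="{val}"' for name, val in key)
                lines.append(f"{self.name}{{{label_text}}} {value}")
            else:
                lines.append(f"{self.name} {value}")
        return "\n".join(lines) + "\n"


requests = Counter("http_requests_total", "Total HTTP requests handled.")
requests.inc(code="200", method="get")
requests.inc(code="200", method="get")
requests.inc(code="500", method="get")
text = requests.render()
print(text)
```

Serving this text over HTTP is all a scrape target has to do; everything else happens on the Prometheus side.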


Kubernetes metrics


Monitoring resources and methods

- For resources like memory, queues, CPUs, disks…

  - USE Method: Utilization, Saturation, Errors

  - http://www.brendangregg.com/usemethod.html

- For services

  - “RED” Method: Request rate, Error rate, Duration

  - https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
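As a worked illustration of the RED method, the sketch below derives the three signals from two snapshots of monotonically increasing counters. Everything here is assumed for the example: the `rate` and `red_summary` helpers, the counter names, and the numbers. Duration is reduced to a mean for brevity, whereas real setups usually look at latency distributions, and counter resets are not handled.

```python
def rate(earlier, later, seconds):
    """Per-second increase of a counter between two samples (no counter-reset handling)."""
    return (later - earlier) / seconds


def red_summary(before, after, seconds):
    """Derive Request rate, Error ratio and mean Duration from two counter snapshots."""
    request_rate = rate(before["requests"], after["requests"], seconds)
    error_rate = rate(before["errors"], after["errors"], seconds)
    handled = after["requests"] - before["requests"]
    spent = after["duration_seconds_sum"] - before["duration_seconds_sum"]
    return {
        "request_rate": request_rate,                                   # requests per second
        "error_ratio": error_rate / request_rate if request_rate else 0.0,
        "mean_duration": spent / handled if handled else 0.0,           # seconds per request
    }


# Two made-up snapshots taken 60 seconds apart.
before = {"requests": 1000, "errors": 10, "duration_seconds_sum": 50.0}
after = {"requests": 1600, "errors": 22, "duration_seconds_sum": 86.0}
summary = red_summary(before, after, seconds=60)
print(summary)
```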


node_exporter: node metrics

- Host metrics

  - CPU

  - Memory

  - Disk

  - Network

  - ...

- Not K8S specific, but useful as a reference and for totals


cAdvisor: container metrics

- Runs in kubelet (usually, for now...)

- Resource stats about running containers

- Mostly container and node-level labels…

- (k8s: plus namespace and pod_name)


Sample cAdvisor metric queries

Fraction of total cluster memory used (multiply by 100 for a percentage):

sum(container_memory_rss) / sum(machine_memory_bytes)

Memory used by Kubernetes namespace:

sum(container_memory_rss) by (namespace)

Top 5 pods by network I/O (transmit rate):

topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
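What `sum ... by (label)` and `topk` do to a set of labelled samples can be sketched over plain Python data. This is a model of the aggregation semantics only, not of the query engine; `sum_by`, `topk`, the pod names, and all numbers are made up for illustration.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Group samples by one label and add up the values in each group."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


def topk(k, totals):
    """Return the k largest (group, total) pairs, biggest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up per-container memory samples: (labels, bytes).
samples = [
    ({"namespace": "prod", "pod_name": "api-1"}, 300.0),
    ({"namespace": "prod", "pod_name": "api-2"}, 200.0),
    ({"namespace": "dev", "pod_name": "web-1"}, 100.0),
]
machine_memory_total = 2400.0  # made-up sum of memory across all nodes

by_namespace = sum_by(samples, "namespace")
top_pods = topk(2, sum_by(samples, "pod_name"))
fraction_used = sum(value for _, value in samples) / machine_memory_total
print(by_namespace, top_pods, fraction_used)
```

Aggregating by `namespace` discards the `pod_name` dimension, which is why the result has one entry per namespace rather than one per pod.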



Kube-state metrics

$ kubectl get deploy my-app -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  ...
spec:
  replicas: 4
  ...
status:
  replicas: 4
  ...

Metrics created by kube-state-metrics, with label set from this deployment:

kube_deployment_spec_replicas{deployment="my-app", ...}

kube_deployment_status_replicas{deployment="my-app", ...}


Sample kube-state-metrics queries

Deployments with issues:

kube_deployment_spec_replicas != kube_deployment_status_replicas_available

Top 10 longest-running pods (“reverse uptime”):

topk(10, sort_desc(time() - kube_pod_created))
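The "deployments with issues" comparison boils down to matching two metrics on the deployment name and keeping the pairs that disagree. A minimal sketch, assuming the two metrics have already been fetched into dictionaries keyed by deployment name (the `deployments_with_issues` helper and the sample values are hypothetical):

```python
def deployments_with_issues(spec_replicas, available_replicas):
    """Return deployments whose desired and available replica counts differ."""
    issues = {}
    for name, wanted in spec_replicas.items():
        available = available_replicas.get(name)
        # A deployment present on only one side has nothing to compare against.
        if available is not None and available != wanted:
            issues[name] = (wanted, available)
    return issues


spec = {"my-app": 4, "worker": 2}        # made-up desired replica counts
available = {"my-app": 4, "worker": 1}   # made-up available replica counts
issues = deployments_with_issues(spec, available)
print(issues)
```

A non-empty result that persists for more than a few minutes is a reasonable thing to alert on, since a rollout in progress will also show up here briefly.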


Kube core service metrics

- API Server

- etcd3

- kube-dns

- scheduler, controller-manager


Metrics recap

Source                                     Deployment mode   How many                Metrics about
node_exporter                              daemonset         1 per node              node resources
cAdvisor                                   inside kubelet    1 per node              container resources
kube-state-metrics                         deployment        singleton               k8s object state
etcd, API server, controller manager, ...  core service      singleton or HA group   itself


Deploying


Monitoring from the inside

- Monitoring runs inside thing being monitored?

- Yes. It’s fine really. Really, it’s fine.

- (And being outside has own challenges)


Deployment outline

- Metrics services

  - node_exporter

  - kube-state-metrics

  - (cAdvisor usually enabled out of box)

- Prometheus running

  - Storage

  - Read access to API server (for service discovery)

  - Service discovery config for above

  - Service discovery config for apps/services


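Helm-based install

helm fetch --untar stable/prometheus # unpack the chart into ./prometheus

vi prometheus/values.yaml # configure install

helm upgrade -i <release> ./prometheus # <release> is a placeholder name; or manually deploy yaml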


Prometheus operator

- Use Kubernetes API facilities to make Prometheus “native”

- new Prometheus-related objects: `kubectl get prometheus`

- PrometheusRule, ServiceMonitor, Alertmanager, AlertingSpec, ...

- Prometheus configuration abstracted via all these objects

- Young but promising

- Consider more direct route first (hand-rolled or Helm), and Operator once more familiar with challenges of direct route


Thank You.

Henri Dubois-Ferriere @henridf


Pointers

- Prometheus SD for Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

- KSM metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation

- Prometheus Helm chart: https://github.com/helm/charts/tree/master/stable/prometheus

- Prometheus operator: https://github.com/coreos/prometheus-operator

- “A deep dive into Kubernetes metrics” blog series: https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-66936addedae