1
Prometheus, Grafana and SUSE Manager 4
How to securely automate the configuration of
monitoring on large environments
João Cavalheiro
Engineering [email protected]
2
Agenda
1. Monitoring Tools and Best Practices
2. Large Scale Infrastructures
3. Monitoring with SUSE Manager
4. Demo
3
The Importance of Metrics
4
About Prometheus
● Has its own time-series database
● Data collection via pull model over HTTP
● Targets set via static configuration or service discovery
● Prometheus has its own alerting system – Alertmanager
● Metrics have a name, a set of labels, a timestamp and a value
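As a rough illustration of the data model in the last bullet, the sketch below renders one sample in the Prometheus text exposition format, which is what a target serves over HTTP when it is scraped. The `Sample` class and its field names are hypothetical helpers made up for this example, not part of any Prometheus client library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def _escape(label_value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return (
        label_value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


@dataclass
class Sample:
    """Hypothetical container: metric name, labels, value, optional timestamp."""
    name: str
    value: float
    labels: Dict[str, str] = field(default_factory=dict)
    timestamp_ms: Optional[int] = None  # milliseconds since the Unix epoch

    def to_exposition(self) -> str:
        """Render the sample as one line of the text exposition format."""
        label_str = ""
        if self.labels:
            pairs = ",".join(
                f'{key}="{_escape(val)}"'
                for key, val in sorted(self.labels.items())
            )
            label_str = "{" + pairs + "}"
        line = f"{self.name}{label_str} {self.value}"
        if self.timestamp_ms is not None:
            line += f" {self.timestamp_ms}"
        return line


if __name__ == "__main__":
    sample = Sample(
        name="http_requests_total",
        value=1027.0,
        labels={"method": "post", "code": "200"},
        timestamp_ms=1395066363000,
    )
    print(sample.to_exposition())
```

In practice the timestamp is usually omitted by the target and assigned by Prometheus at scrape time.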
5
About Grafana
● Used to query and visualize metrics
● Works with Prometheus, but not exclusively
– Grafana supports multiple backends
– It is possible to combine data from different sources
● Fully customizable
– Each panel has a wide variety of styling and formatting options
– Supports templates
– Collection of add-ons and pre-built dashboards
6
Prometheus and Grafana Look Good Together
7
Monitoring Best Practices
8
Monitoring Best Practices – Alerting
● Few alerts, focus on end-user pain
● Alert on symptoms rather than causes
● Alert on high latency and error rates as high up in the stack as possible
● Alerts must be actionable
● Don’t assume no news is good news!
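A minimal sketch of two of the alerting ideas above: alert on a symptom (the error ratio users actually see) and treat missing data as a problem rather than as good news. The function names, alert names and thresholds (5% errors, 300 seconds of silence) are assumptions chosen for illustration; in a real setup this logic would normally live in Prometheus alerting rules rather than in application code.

```python
from typing import Optional


def error_ratio(errors: float, total: float) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total > 0 else 0.0


def evaluate(
    errors: float,
    total: float,
    last_seen: float,
    now: float,
    max_ratio: float = 0.05,     # hypothetical threshold: 5% errors
    max_silence: float = 300.0,  # hypothetical threshold: 5 minutes without data
) -> Optional[str]:
    """Return an alert name, or None when nothing needs attention."""
    # No news is not good news: stale data is itself a reason to alert.
    if now - last_seen > max_silence:
        return "NoData"
    # Symptom-based check: what share of user requests are failing?
    if error_ratio(errors, total) > max_ratio:
        return "HighErrorRate"
    return None


if __name__ == "__main__":
    print(evaluate(errors=10, total=100, last_seen=1000.0, now=1010.0))
```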
9
Monitoring Best Practices – Clusters
● In Kubernetes, everything is ephemeral by design
● Cluster: Node resource usage, number of nodes and running pods
● Pods: k8s metrics, pod container metrics, and application metrics
● Combine container-level metrics with service-level alerts
● Know your services
10
Monitoring Large Scale Infrastructures
11
Prometheus Hierarchical Federation
12
Prometheus Cross-Service Federation
● Prometheus servers can scrape data from one another
● Common use cases are HA and sanity checks
● Useful for fail-safe alerting
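For reference, one Prometheus server scrapes another through the `/federate` endpoint, selecting series with one or more `match[]` parameters. The sketch below only builds such a URL; the host name `prom-a` and the selector are placeholders, not values from this talk.

```python
from typing import List
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: List[str]) -> str:
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Placeholder server and selector, for illustration only.
    print(federate_url("http://prom-a:9090", ['{job="node"}']))
```

In a real deployment the same selectors would go under `params` in a scrape job whose `metrics_path` is `/federate`.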
13
Prometheus Known Limitations
● Limited performance and scalability beyond a certain size
● Hard to query data from multiple Prometheus Servers
● Samples can be lost under high load
● No reliable long-term data storage
● No dynamic sample rate
14
Prometheus Large-scale Extensions
● Cortex from Grafana Labs
– https://grafana.com/oss/cortex/
● M3 from Uber
– https://eng.uber.com/m3/
● Thanos
– https://thanos.io/
15
Monitoring with SUSE Manager
16
Set Up Your Monitoring Infrastructure
SUSE Manager provides:
● Packages for Prometheus, Grafana and commonly used exporters
● Formulas to automate the setup of Prometheus and Grafana
● Support for Prometheus federations (*)
● Pre-built Grafana dashboards
● Server self-monitoring
(*) In SUSE Manager 4.1
17
Monitor your Client Systems
Add monitoring to your existing systems with:
● Formulas to easily configure exporters
● Prometheus Service Discovery via SUSE Manager API
● Support for existing Prometheus servers
● Grafana dashboard templates
● Alert templates
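To give a feel for what service discovery produces, the sketch below turns a list of systems into the JSON target groups that Prometheus file-based service discovery (`file_sd_configs`) reads. This is not how the SUSE Manager integration is implemented: the `systems` list is a hypothetical stand-in for whatever an inventory API returns, and the `group` label is an assumption. Port 9100 is the node_exporter default.

```python
import json
from typing import Dict, List

NODE_EXPORTER_PORT = 9100  # default port of the Prometheus node_exporter


def to_file_sd(systems: List[Dict[str, str]], port: int = NODE_EXPORTER_PORT) -> List[dict]:
    """Convert inventory entries into Prometheus file_sd target groups."""
    groups = []
    for system in systems:
        groups.append(
            {
                "targets": [f"{system['hostname']}:{port}"],
                # Hypothetical label; any inventory attribute could be used.
                "labels": {"group": system.get("group", "ungrouped")},
            }
        )
    return groups


if __name__ == "__main__":
    # Hypothetical inventory; a real one would come from the management server.
    systems = [
        {"hostname": "web01.example.com", "group": "web"},
        {"hostname": "db01.example.com"},
    ]
    print(json.dumps(to_file_sd(systems), indent=2))
```

Writing this output to a file listed under `file_sd_configs` lets Prometheus pick up new targets without a restart.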
18
Coming Next
● Automatic metric collection from clusters (e.g. CaaSP)
● Improved Service Discovery performance
● Expose minion metrics on a single port
● Encryption and authentication
● More exporters, more dashboards
19
Demo
20
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE, LLC, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.