How to build a container monitoring solution - David Gildeh, CEO and Co-Founder of Outlyer
Monitoring for Cloud Scale Microservices
www.outlyer.com | @outlyerapp
How to build a Container Monitoring Solution
04-May-2017
Some Fun Facts About Docker
• The average host runs 8 containers = 8 times more metrics per host
• The average container runs for 2 days = more metric churn
• 12% of hosts run containers, and that percentage is growing fast
Source: Outlyer Customers
Monitoring Docker – The Basics
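[Diagram: VM Monitoring vs. Container Monitoring. VM Monitoring: a physical server runs a hypervisor hosting two VMs, each with its own OS, a monitoring agent and applications (MySQL, Java, Tomcat in one; MySQL, PHP, Apache in the other). Container Monitoring: the same stack, but each VM runs Docker, with the monitoring agent and each application (MySQL, Java, Tomcat; MySQL, PHP, Apache) in separate containers.]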
Monitoring Docker – The Basics
VM Monitoring
• All processes are accessible from ‘localhost’
• Agent runs in each VM
• Simple plugins can monitor each process
Container Monitoring
• All processes are siloed into containers
• Agent runs on each host, either inside its own container or directly on the host VM
• Can’t monitor inside containers, so you have to monitor each one from the outside, like a remote mini-host
Monitoring Docker – Monitoring Processes
Shell Command Monitoring
Need to run the check via “docker exec” with the container ID
Endpoint Monitoring
Need to inject the container’s IP address on the internal Docker network
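Here is a minimal sketch of both approaches. It assumes the agent shells out to the Docker CLI on the host. The helper names and the `{ip}` placeholder are hypothetical. `docker exec` and `docker inspect -f '{{.NetworkSettings.IPAddress}}'` are standard Docker CLI usage, but the latter only reports an address on the default bridge network. Containers on user-defined networks list their addresses under `.NetworkSettings.Networks` instead.

```python
import shlex
from typing import List


def exec_in_container(container_id: str, check_cmd: str) -> List[str]:
    """Shell command monitoring: argv that runs an existing check inside the container."""
    # shlex.split keeps quoted arguments of the original check intact.
    return ["docker", "exec", container_id] + shlex.split(check_cmd)


def inspect_ip_cmd(container_id: str) -> List[str]:
    """argv that prints the container's IP on the default bridge network."""
    # Double braces are literal here: Docker's Go template syntax, not Python formatting.
    return ["docker", "inspect", "-f", "{{.NetworkSettings.IPAddress}}", container_id]


def endpoint_for_container(url_template: str, container_ip: str) -> str:
    """Endpoint monitoring: inject the container IP into a hypothetical {ip} placeholder."""
    return url_template.format(ip=container_ip)


if __name__ == "__main__":
    print(exec_in_container("333db2d96a40", "cat /sys/fs/cgroup/memory/memory.stat"))
    print(endpoint_for_container("http://{ip}:8080/health", "172.17.0.2"))
```

Each argv list could be passed to `subprocess.run(argv, capture_output=True, text=True)`. The stripped stdout of the IP lookup is the value that gets injected into the endpoint URL.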
Monitoring Docker – Organized Chaos via Orchestration
What Container Monitoring Looks Like in the Real World
Configuration Management is replaced with Auto-discovery
Summary: Everything’s dynamic & needs smart automation.
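In practice, auto-discovery means the agent derives its checks from whatever is running, rather than from a configuration-management tool. Here is a minimal sketch. It assumes the agent periodically runs `docker ps --format '{{.ID}} {{.Image}}'` and matches the bare image name against a rule table. `CHECK_RULES` and the check names in it are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: bare image name -> checks to attach to matching containers.
CHECK_RULES: Dict[str, List[str]] = {
    "mysql": ["mysql_status"],
    "nginx": ["http_status"],
}


def parse_docker_ps(output: str) -> List[Tuple[str, str]]:
    """Parse lines printed by: docker ps --format '{{.ID}} {{.Image}}'."""
    containers: List[Tuple[str, str]] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        container_id, image = line.split(maxsplit=1)
        containers.append((container_id, image.strip()))
    return containers


def image_name(image: str) -> str:
    """Reduce e.g. 'localhost:5000/nginx:1.13' or 'mysql@sha256:...' to 'nginx' / 'mysql'."""
    last = image.rsplit("/", 1)[-1]  # drop registry and repository path
    return last.split("@", 1)[0].split(":", 1)[0]  # drop digest and tag


def discover_checks(output: str) -> Dict[str, List[str]]:
    """Map each running container ID to the checks its image matches."""
    plan: Dict[str, List[str]] = {}
    for container_id, image in parse_docker_ps(output):
        checks = CHECK_RULES.get(image_name(image))
        if checks:
            plan[container_id] = list(checks)
    return plan
```

The agent could run this on every collection cycle and diff the result against the previous plan. That would replace the static configuration step, because containers that appear or disappear pick up or drop their checks automatically.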
Where we started: V1 with cAdvisor
[Diagram: cAdvisor (Go binary) handles collection. Container autodiscovery and metrics (from pseudo-files), a Prometheus scraper and a generic application metric scraper feed a read store. The read store is exposed via a web UI, a REST API and a Prometheus endpoint, and can be exported to StatsD, InfluxDB, Elasticsearch, Redis, Kafka and BigQuery.]
Where we started: V1 with cAdvisor – however…
• Used a lot of memory
• Kept crashing & hard to debug remotely
• Hard to customize
Back to the drawing board: Replace cAdvisor
Back to the drawing board: Use our Prometheus Scraper?
[Diagram: the agent scraping metrics endpoints on ports 9104, 9113 and 3002, labelled “Ports: 9,000-10,000”.]
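One reading of this slide is that reusing a Prometheus scraper would force the agent to guess which exposed ports serve metrics. Exporters commonly listen somewhere in the 9,000-10,000 range. Here is a minimal sketch of that heuristic under this assumption; the `/metrics` path and the port range are assumptions, not Outlyer's code. It also shows the weakness: an endpoint on port 3002 is silently missed.

```python
from typing import Iterable, List

# Assumed heuristic range for Prometheus exporter ports (inclusive of 10000).
EXPORTER_PORT_RANGE = range(9000, 10001)


def candidate_scrape_urls(container_ip: str, exposed_ports: Iterable[int]) -> List[str]:
    """Guess /metrics URLs for a container's exposed ports inside the assumed range."""
    return [
        f"http://{container_ip}:{port}/metrics"
        for port in sorted(set(exposed_ports))
        if port in EXPORTER_PORT_RANGE
    ]


if __name__ == "__main__":
    # Port 3002 falls outside the range, so it is never scraped.
    print(candidate_scrape_urls("172.17.0.5", [3002, 9104, 9113]))
```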
Back to the drawing board: Build our own custom integration
Docker – Getting Metrics Out
Collection Point  | CPU Metrics | Memory Metrics | I/O Metrics | Network Metrics
Pseudo-Files      | Yes         | Yes            | Some        | From 1.6.1
Stats Command     | Basic       | Basic          | From 1.9.0  | Basic
Docker Remote API | Yes         | Yes            | Some        | Yes
Docker Pseudo-Files
$ docker exec $CONTAINER_ID cat /sys/fs/cgroup/memory/memory.stat
cache 532480
rss 44650496
rss_huge 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 244711
pgpgout 233680
pgfault 545794
pgmajfault 0
inactive_anon 8192
active_anon 44703744
inactive_file 102400
active_file 290816
unevictable 0
hierarchical_memory_limit 9223372036854771712
hierarchical_memsw_limit 9223372036854771712
total_cache 532480
total_rss 44650496
total_rss_huge 0
total_mapped_file 0
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 244711
total_pgpgout 233680
total_pgfault 545794
total_pgmajfault 0
total_inactive_anon 8192
total_active_anon 44703744
total_inactive_file 102400
total_active_file 290816
total_unevictable 0
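The pseudo-file format is a series of simple “key value” lines, so parsing it needs no Docker-specific client. Here is a minimal sketch, assuming cgroup v1 as in the example above. The function names and the “unlimited” threshold are illustrative assumptions. A very large `hierarchical_memory_limit`, such as the 9223372036854771712 shown above, means that no memory limit is set on the container.

```python
from typing import Dict

# Assumption: cgroup v1 reports "no limit" as a value near 2**63 (e.g. 9223372036854771712),
# so anything at or above 2**62 is treated as unlimited here.
UNLIMITED_THRESHOLD = 2 ** 62


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse cgroup v1 memory.stat output ("key value" per line) into ints."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def read_memory_stat(path: str = "/sys/fs/cgroup/memory/memory.stat") -> Dict[str, int]:
    """Read and parse the pseudo-file; the default path is as seen inside the container."""
    with open(path) as f:
        return parse_memory_stat(f.read())


def is_unlimited(limit_bytes: int) -> bool:
    """True if a hierarchical_memory_limit value effectively means 'no limit'."""
    return limit_bytes >= UNLIMITED_THRESHOLD
```

From the host, the same text can be obtained with the `docker exec ... cat` command shown above and passed straight to `parse_memory_stat`.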
Docker Stats Command
$ docker stats CONTAINER_ID [CONTAINER_ID...]
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
333db2d96a40 0.19% 50.61 MiB / 1.952 GiB 2.53% 60 kB / 195 kB 0 B / 36.9 kB 35
As of Docker 1.9.0, the stats command also includes block (disk) I/O metrics.
Docker Remote API
GET /containers/{id}/stats
{
"read": "2015-01-08T22:57:31.547920715Z",
"pids_stats": {
”current": 3
},
"networks": {
"eth0": {},
"eth5": {}
},
"memory_stats": {
"stats": {},
"max_usage": 6651904,
"usage": 6537216,
"failcnt": 0,
"limit": 67108864
},
"blkio_stats": { },
"cpu_stats": {
"cpu_usage": {
"percpu_usage": [],
"usage_in_usermode": 50000000,
"total_usage": 100215355,
"usage_in_kernelmode": 30000000
},
"system_cpu_usage": 739306590000000,
"online_cpus": 4,
"throttling_data": {}
},
"precpu_stats": {
"cpu_usage": {},
"system_cpu_usage": 9492140000000,
"online_cpus": 4,
"throttling_data": {}
}
}
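The Remote API returns cumulative counters, so percentages have to be derived from them. Here is a minimal sketch that works on a single stats sample like the one above. It follows the calculation the Docker CLI uses for `docker stats` on Linux. CPU % is the container's CPU-time delta divided by the host's system CPU-time delta, then scaled by the number of online CPUs. The function names are illustrative. The memory figure is simplified to raw usage over limit and ignores any page-cache adjustment the Docker CLI may apply.

```python
from typing import Any, Dict


def cpu_percent(stats: Dict[str, Any]) -> float:
    """CPU % from one /containers/{id}/stats sample (cpu_stats vs precpu_stats)."""
    cpu = stats["cpu_stats"]
    pre = stats.get("precpu_stats", {})
    # Both counters are cumulative; on the first sample precpu values may be empty.
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre.get("cpu_usage", {}).get("total_usage", 0)
    system_delta = cpu["system_cpu_usage"] - pre.get("system_cpu_usage", 0)
    # Prefer online_cpus; fall back to the length of percpu_usage, then to 1.
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * online * 100.0


def memory_percent(stats: Dict[str, Any]) -> float:
    """Simplified memory %: raw usage over limit (no page-cache adjustment)."""
    mem = stats["memory_stats"]
    return mem["usage"] / mem["limit"] * 100.0
```

With the memory values in the sample above (usage 6537216, limit 67108864), `memory_percent` works out to about 9.74%.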
The Winner: Pseudo Files
Collection Point  | Ranking | Reasoning
Pseudo-Files      | 1       | Reliable between Docker versions
Stats Command     | 3       | Basic reporting; only works with Docker
Docker Remote API | 2       | Good reporting, but varies by Docker version and may have networking issues
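For CPU, the cgroup v1 `cpuacct.usage` pseudo-file holds the container's cumulative CPU time in nanoseconds, so two timed reads give a utilisation figure. Here is a minimal sketch, assuming the agent records a wall-clock timestamp with each read. The function names and default path are hypothetical, and 100% here means one fully busy CPU.

```python
def read_cpuacct_usage(path: str = "/sys/fs/cgroup/cpuacct/cpuacct.usage") -> int:
    """Cumulative CPU time of the cgroup in nanoseconds (cgroup v1; path as seen in-container)."""
    with open(path) as f:
        return int(f.read().strip())


def cpu_percent_from_samples(usage0_ns: int, t0_s: float, usage1_ns: int, t1_s: float) -> float:
    """CPU % between two reads of cpuacct.usage; 100.0 means one fully busy CPU."""
    elapsed_ns = (t1_s - t0_s) * 1e9
    if elapsed_ns <= 0:
        raise ValueError("second sample must be taken after the first")
    return (usage1_ns - usage0_ns) / elapsed_ns * 100.0
```

Pairing each read with `time.monotonic()` avoids errors from wall-clock adjustments between samples.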
Making Nagios Plugins Work Against Containers
Shell Command Monitoring
Need to run the check via “docker exec” with the container ID
Endpoint Monitoring
Need to inject the container’s IP address on the internal Docker network
Making Nagios Plugins Work Against Containers: Making it Magic
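The slides do not spell out the mechanism. One plausible way to make existing Nagios plugins “just work” is to let the user write a check once with placeholders, and have the agent fill them in for every discovered container. Here is a minimal sketch under that assumption. The `$container_id`/`$container_ip` placeholders and `render_check` are hypothetical. `check_http -H <host> -p <port>` and `check_procs -C <command>` are standard Nagios plugin usage.

```python
from string import Template
from typing import Mapping


def render_check(template: str, container: Mapping[str, str], run_inside: bool = False) -> str:
    """Fill hypothetical $container_id / $container_ip placeholders for one container."""
    cmd = Template(template).substitute(
        container_id=container["id"], container_ip=container["ip"]
    )
    if run_inside:
        # Shell command monitoring: run the plugin inside the container instead.
        cmd = f"docker exec {container['id']} {cmd}"
    return cmd
```

An endpoint check such as `check_http -H $container_ip -p 80` runs from the agent against each container's internal IP. A shell check with `run_inside=True` is wrapped in `docker exec`, which assumes the plugin binary exists inside the container.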
Making it work with Orchestrators
Services = Pets, Containers = Cattle.
Making it work with Orchestrators
[Diagram: a Kubernetes Service. Image from http://blog.arungupta.me/kubernetes-design-patterns/]
Making it work with Orchestrators: Dimensional Labels
Every container, and every metric it reports, is given the following dimensional labels via Kubernetes:
• Node
• Pod
• Service
• Custom Labels
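Labels are what let metrics survive container churn. A query can aggregate by `service` even as the pods and containers behind it come and go. Here is a minimal sketch, assuming a Prometheus-style series key made of the metric name plus its sorted label set. The helper names and example label values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Tuple


def k8s_labels(node: str, pod: str, service: str,
               custom: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Standard labels plus custom ones (custom labels never override the standard ones)."""
    labels = {"node": node, "pod": pod, "service": service}
    for key, value in (custom or {}).items():
        labels.setdefault(key, value)
    return labels


def series_key(metric: str, labels: Mapping[str, str]) -> str:
    """Identify a series by metric name plus its sorted label set."""
    body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{body}}}"


def sum_by(samples: Iterable[Tuple[Mapping[str, str], float]], label: str) -> Dict[str, float]:
    """Aggregate sample values by one label, e.g. total RSS per service."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)
```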
Making it work with Orchestrators: Host View
Metric Series Churn = Constantly Growing Indexes
ContainerID = 1:
cpu.user=22%
rss=44232322
…
active_file=232232
Swap=0
ContainerID = 2:
cpu.user=22%
rss=44232322
…
active_file=232232
Swap=0
ContainerID = 3:
cpu.user=22%
rss=44232322
…
active_file=232232
Swap=0
Metric Series Churn Solution: Partition indexes by time
https://fabxc.org/blog/2017-04-10-writing-a-tsdb/
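The linked post describes splitting the time-series database into time-bounded blocks, each with its own index. With that layout, series from short-lived containers only bloat the blocks they were alive in, and old blocks can be dropped whole. Here is a toy sketch of that idea, not the Prometheus implementation. It uses an inverted index from label pairs to series, partitioned into fixed-width time blocks; the block width is an arbitrary assumption.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

BLOCK_SECONDS = 2 * 60 * 60  # assumed 2-hour partitions


class PartitionedIndex:
    """Inverted index (label pair -> series keys) split into fixed time blocks."""

    def __init__(self, block_seconds: int = BLOCK_SECONDS) -> None:
        self.block_seconds = block_seconds
        # block start timestamp -> {(label, value) -> series keys seen in that block}
        self.blocks: Dict[int, Dict[Tuple[str, str], Set[str]]] = {}

    def _block_start(self, ts: int) -> int:
        return ts - ts % self.block_seconds

    def add(self, ts: int, series: str, labels: Dict[str, str]) -> None:
        """Index a sample's series in the block covering timestamp ts."""
        block = self.blocks.setdefault(self._block_start(ts), defaultdict(set))
        for pair in labels.items():
            block[pair].add(series)

    def lookup(self, label: str, value: str, start: int, end: int) -> Set[str]:
        """Series matching label=value, searching only blocks overlapping [start, end]."""
        result: Set[str] = set()
        for block_start, block in self.blocks.items():
            if block_start <= end and block_start + self.block_seconds > start:
                result |= block.get((label, value), set())
        return result

    def drop_before(self, ts: int) -> None:
        """Retention: discard whole blocks that end at or before ts."""
        for block_start in [b for b in self.blocks if b + self.block_seconds <= ts]:
            del self.blocks[block_start]
```

A query over a recent window never touches the index entries of containers that died in earlier blocks. Retention becomes a matter of deleting whole blocks rather than pruning one ever-growing index.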
What’s Next?
Services & Tracing