Redundant Devops
about reinventing the wheel
Szabolcs Szabolcsi-Toth @_Nec
Senior Engineer
JSConf Budapest 2017
Curator, Organizer
What’s this about?
• Metrics
• Error Logs
• Logging
• Secret Store
• Service Discovery
• Process Supervision
• Running Programs
• Connecting Services
Metrics
Metrics are
• time bound, historical
• numeric data
• software, network or hardware property
Metrics are great!
• see trends
• mark releases
• notice anomalies: spikes & gaps
• create alerts
Metric delivery
• collect (scrape) or push data?
• collect periodically
• put metric data where it can be collected
Tools for metrics
• prometheus
• graphite
Node best practices
• put your metrics on an accessible endpoint
/metrics /status
• there are node libs to automate this
instrument http
• let the metrics tool do scraping, delivery
• watch those nice graphs ☺
check out grafana
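A minimal sketch of the pull model described above, using only Node's built-in http module. The metric name, the port in the comment and the choice of the Prometheus text format are assumptions for illustration; a real app would more likely use one of the instrumentation libs.

```javascript
const http = require('node:http');

// In-memory counter; the metric name is illustrative, not from the talk.
const counters = { http_requests_total: 0 };

// Render counters in the Prometheus text exposition format.
function renderMetrics(values) {
  return Object.entries(values)
    .map(([name, value]) => `# TYPE ${name} counter\n${name} ${value}\n`)
    .join('');
}

const server = http.createServer((req, res) => {
  if (req.url === '/metrics') {
    // The app only exposes the numbers; the metrics tool scrapes them.
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(renderMetrics(counters));
    return;
  }
  counters.http_requests_total += 1;
  res.end('ok');
});

// server.listen(3000); // uncomment to serve; the port is an assumption
```

The app never pushes anything here: it keeps the numbers in memory and leaves delivery to whoever scrapes /metrics.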
Key metrics
• latency
check for slow queries, create performance tests on them
iterate code, re-test again
do not average, use a histogram
• resource usage
slow memory leaks
disk is getting full
predict resource shortage via trends
latency
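Why "do not average, use a histogram": a small sketch with made-up latency samples and assumed bucket limits. One very slow request barely shows in the mean, but is obvious in the buckets.

```javascript
// Upper bounds in milliseconds; these bucket limits are an assumption for illustration.
const BUCKETS_MS = [50, 100, 250, 500, 1000];

// Count each latency sample into the first bucket whose upper bound it fits;
// samples above the last bound land in a final overflow bucket.
function histogram(samplesMs, bounds = BUCKETS_MS) {
  const counts = new Array(bounds.length + 1).fill(0);
  for (const sample of samplesMs) {
    const index = bounds.findIndex((bound) => sample <= bound);
    counts[index === -1 ? bounds.length : index] += 1;
  }
  return counts;
}

function mean(samplesMs) {
  return samplesMs.reduce((sum, value) => sum + value, 0) / samplesMs.length;
}

// Nine fast requests and one very slow one (invented numbers).
const samples = [20, 22, 25, 21, 23, 24, 20, 22, 23, 2000];
console.log(mean(samples));      // 220
console.log(histogram(samples)); // [ 9, 0, 0, 0, 0, 1 ]
```

The mean of 220 ms describes none of the ten requests; the histogram shows nine fast ones and one outlier.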
Sending metrics is not the job of your app
Error logs
Catch errors as fast as possible!
• instant alert of production errors
• use while feature testing
• keep an eye on it during releases
• aggregate errors in a single service, see all
• catch before the user
Ideal error reports have
• environment of error
build / release / branch / server
• stack trace
exact code location
• custom data
anything that helps identify the problem
Error log delivery
• can happen any time, hopefully rare
• push data
• expect the unexpected, handle the unhandled
• never log secrets
• sampling, throttling, timeout
do not let error logging itself kill your app
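A rough sketch of the throttling idea from the list above. `send` stands for a hypothetical delivery function (for example an HTTP call to an error service), and the limit and window values are assumptions; sampling and timeouts on the delivery call are left out.

```javascript
// Allow at most `limit` reports per `windowMs`; everything else is dropped.
// `send` is a hypothetical delivery function, `now` is injectable for testing.
function createThrottledReporter(send, { limit = 10, windowMs = 60000, now = Date.now } = {}) {
  let windowStart = now();
  let sent = 0;
  let dropped = 0;

  return {
    report(error) {
      // Start a fresh window once the current one has expired.
      if (now() - windowStart >= windowMs) {
        windowStart = now();
        sent = 0;
      }
      if (sent >= limit) {
        dropped += 1;
        return false;
      }
      sent += 1;
      try {
        send(error);
      } catch (sendError) {
        // A failing reporter must never take the app down with it.
      }
      return true;
    },
    droppedCount: () => dropped,
  };
}
```

During an error storm the app then sends a bounded number of reports per window instead of one per failure.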
Tools & services for error reporting
• airbrake
• errbit (airbrake api, open source)
• sentry
• raygun
• rollbar
• …
Integrate, get notified!
• pagerduty
• slack / hipchat
chatops - resolve, react within your chat
Logging
Logging vs Error logging
• logging is anticipated
• error logs are occasional
Log levels
Log levels, recap
• fatal - needs instant intervention, see error logs
• error - inform the user, see error logs
• warn - escalate if happens again
• info - just a step in a regular flow
• debug - full of lines, and traces
Benefits of logging, custom logs
• debug
• custom events
• tracking the usage and behaviour of app
• profile, AB test, product development
Logging in node
• console.log
• bragi
• debug
• npmlog
• winston
Logging in node - general
• has timestamps
• has loglevels
• can be routed to stdout/stderr
• can be formatted
• create or use Correlation ID
Correlation ID quick guide
[Diagram: a cID is passed along between the services and shows up in the logs]
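A small sketch of a logger that covers the points above: timestamps, log levels, output to stdout and a correlation ID. The header name `x-correlation-id` and the JSON field names are assumptions, not a standard.

```javascript
const { randomUUID } = require('node:crypto');

const LEVELS = ['fatal', 'error', 'warn', 'info', 'debug'];

// Build one JSON log line with timestamp, level and correlation ID.
function formatLine(level, correlationId, message, time = new Date()) {
  return JSON.stringify({ time: time.toISOString(), level, cid: correlationId, msg: message });
}

// Reuse the caller's correlation ID when present, otherwise create one.
// The header name is an assumption for illustration.
function correlationIdFrom(headers) {
  return headers['x-correlation-id'] || randomUUID();
}

// `write` defaults to stdout; routing the output further is the collector's job.
function createLogger(correlationId, write = (line) => process.stdout.write(line + '\n')) {
  const logger = {};
  for (const level of LEVELS) {
    logger[level] = (message) => write(formatLine(level, correlationId, message));
  }
  return logger;
}
```

A service would create one logger per incoming request with `createLogger(correlationIdFrom(req.headers))` and pass the same ID on in its outgoing calls, so one request can be followed across the logs of every service it touched.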
Best practices
• just put it to stdout
(docker & kubernetes clearly encourage this)
• let the log collector handle it
• pipe stdout to a file, or whatever you like
• able to set to debug mode at runtime
use signals
• never log secrets
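One way to switch debug mode at runtime with a signal, as a sketch. The choice of SIGUSR2 is an assumption (Node.js reserves SIGUSR1 for starting its debugger), and signals like this are not available on Windows.

```javascript
let debugEnabled = false;

// Debug lines are only written while debug mode is on.
function debug(message) {
  if (debugEnabled) {
    process.stdout.write(`debug ${message}\n`);
  }
}

// Toggle debug logging every time the process receives SIGUSR2.
process.on('SIGUSR2', () => {
  debugEnabled = !debugEnabled;
  process.stdout.write(`info debug logging ${debugEnabled ? 'on' : 'off'}\n`);
});
```

With this in place, `kill -USR2 <pid>` turns debug output on for a running process and the same command turns it off again, with no restart or redeploy.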
Log collectors
• fluentd
• logstash
• syslog-ng
• rsyslog
A good log collector should
• read from stdout / file tail
• use your correlation ID
• remove the burden of transferring your logs
Remote logging
• Stackdriver (fluentd based)
• Elasticsearch (fluentd based)
Sending logs is not the job of your app
Secret Store
Secrets
• passwords / usernames
• db names
• API keys
• private keys
NOT Secret Storage
× source code
× private VCS repositories
× config files
× simple database fields
× ENV variables
Benefits
• ACL, policies
access set of secrets by revocable tokens
• centralized key rotation
edit, update all secrets at one place
• single use access, n-use access
• time bound keys
• audit log
• runtime access
no secrets stored on disk
• build-time access
How it works
[Diagram: Version Control, build server, app server, Secret Store]
Build time
• build server sends: token, secret/name
• Secret Store returns: actual secrets
• secrets built in the deployed code
Run time
• app server sends: token, secret/name
• Secret Store returns: actual secrets
• secrets were requested on app startup, stored only in memory
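The run time path can be sketched as follows. `fetchSecret(token, name)` is a hypothetical client call standing in for whatever API the chosen secret store offers; nothing here is the API of Vault, Keywhiz or any other real product.

```javascript
// Fetch the named secrets once at startup and keep them only in memory.
// `fetchSecret(token, name)` is a hypothetical async client call.
async function loadSecrets(fetchSecret, token, names) {
  const secrets = new Map();
  for (const name of names) {
    secrets.set(name, await fetchSecret(token, name));
  }
  // Expose lookups only, so the whole map is not serialized or logged by accident.
  return { get: (name) => secrets.get(name) };
}
```

The app calls this once before it starts serving, for example `const secrets = await loadSecrets(client, token, ['secret/name'])`, and nothing is written to disk.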
Secret store server
• powerful encryption
• has to be unlocked on start
• secrets are totally inaccessible without unlocking
Secret store services
• HashiCorp Vault
• Amazon KMS
• Docker Swarm
• Keywhiz
Never store your secrets in your source code
Service Discovery
Service discovery can help
• Service Registration
and notify other services of the registered one
• Service Discovery
searching for services?
• Monitoring
is a service active and responding?
• Load Balancing
direct traffic to the new service
How it works
• can act like a DNS
simple use case
internal network
• can write / create configs
more complex
more control
How it works
[Diagram: APP, SD AGENT, Service registry, LB]
• SD AGENT: check PORT, check PID
• Start scraping metrics
• Loadbalancer directs traffic
Service discovery agent
• separate task, job, process
• can be configured what to check
• independent of your app
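To make the division of labour concrete, here is a toy in-memory stand-in for a service registry. The class and method names are invented for this sketch; real registries such as Consul or Eureka are networked services with their own APIs.

```javascript
// In-memory stand-in for a service registry; names are hypothetical.
class ServiceRegistry {
  constructor() {
    this.instances = new Map(); // "name@address" -> { name, address, healthy }
  }

  // Called by the agent, not by the app itself.
  register(name, address) {
    this.instances.set(`${name}@${address}`, { name, address, healthy: false });
  }

  // The agent reports the outcome of its port / PID check here.
  reportCheck(name, address, passed) {
    const instance = this.instances.get(`${name}@${address}`);
    if (instance) {
      instance.healthy = passed;
    }
  }

  // What a load balancer would ask for: only instances that pass their checks.
  healthyAddresses(name) {
    return [...this.instances.values()]
      .filter((instance) => instance.name === name && instance.healthy)
      .map((instance) => instance.address);
  }
}
```

The app appears nowhere in this code: the agent registers it and reports check results, and the load balancer only reads the healthy list.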
Service discovery services
• Apache Zookeeper
• Netflix Eureka
• HashiCorp Consul
• Doozer
• Etcd (can be used to build service discovery)
Registering services is not the job of your app
Process Supervision
Process supervision
• keeping your app working
• based on some property you define
not just process id, but port, ping, http response
• can fail after trying
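The supervision loop can be sketched as a small decision function. `check` and `restart` are hypothetical callbacks (a port, ping or HTTP check and a restart command), kept synchronous here for brevity, and `maxRestarts` is an assumed limit.

```javascript
// Decide what a supervisor should do on each check interval.
// `check` returns true when the app is healthy; `restart` restarts it.
function createSupervisor({ check, restart, maxRestarts = 3 }) {
  let restarts = 0;
  return function tick() {
    if (check()) {
      restarts = 0; // a healthy check resets the retry budget
      return 'healthy';
    }
    if (restarts >= maxRestarts) {
      return 'failed'; // "can fail after trying": give up and leave it to a human
    }
    restarts += 1;
    restart();
    return 'restarted';
  };
}
```

A real supervisor would call `tick` on a timer and raise an alert on 'failed' instead of restarting forever.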
Process supervision in Node-land
• PM2
• forever
Process supervision in general
• monit
manage any process
small footprint
simple
Using Monit
• Pro: monit can instantly restart your failing service
• Con: you might not know why it was failing
Not using Monit
• Pro: you can debug what actually happened
• Con: MTTR* can be relatively high
*Mean Time To Repair
Running Programs
Simple role
• start & stop your app
watch the process itself
handle process state
• send signals to the app
signals can be interpreted as tasks
Running Programs in general
• runit
• upstart
• systemd
• Supervisord
• God
• Circus
A good program runner
• distribution independent
you can migrate your scripts any time
• easy to config
monit + runit (or similar)
• avoid using auto restart in both
can create weird race conditions, they do not know about each other
• use runit to configure app start/stop
• let monit decide when to restart & use runit
Connecting Services
Goals & benefits
• decoupling
separate services
loosen up the connection between them
• scaling
scale up easily when needed
scale down after
HTTP based APIs
vs
Message Queues
HTTP based APIs
[Diagram: Service “1” and Service “2” connected through a LOAD BALANCER]
Message Queues
[Diagram: Service “1” and Service “2” connected through a MESSAGE QUEUE]
HTTP based APIs or
Message Queues?
It depends
HTTP APIs
• async / sync
• remote
• open API
Msg Queues
• async (usually)
• grouped, close
• low latency
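A toy illustration of the decoupling a queue gives. The in-memory class and the message shape are made up for this sketch; a real message queue runs as its own service between the two apps.

```javascript
// In-memory stand-in for a message queue; a real broker runs as a separate service.
class InMemoryQueue {
  constructor() {
    this.messages = [];
  }

  publish(message) {
    this.messages.push(message);
  }

  // Hand over the oldest message, or undefined when the queue is empty.
  consume() {
    return this.messages.shift();
  }
}

const queue = new InMemoryQueue();

// Service "1" publishes without knowing whether Service "2" is running.
queue.publish({ type: 'user.created', id: 1 });
queue.publish({ type: 'user.created', id: 2 });

// Service "2" picks the work up later, at its own pace.
const first = queue.consume();
console.log(first.id); // 1
```

With an HTTP API the caller needs the other service to answer right now; with a queue the message simply waits.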
Lessons learned
Prototype & learn
• use whatever modules and services you like
• get ready to go to live & production environments
• get ready to scale easily
Focus your app
• your app should do its job!
• not sending logs, metrics, notifying service registries or keeping itself running
• keep it simple
Talk to your ops
• they are here to run your app
• can help you a lot
• get on a common ground
• ask the right questions
With many thanks to
Peter Wilcsinszky / @pepov
Ferenc Kovacs / @Tyr43l
Let’s talk! :)
Find me around here, or come visit us in 2 weeks!
JSConf Budapest 2017
Thank you