
Monitoring at a SaaS Startup

Tradeoffs and Tools

Bridget Kromhout

8thbridge.com
➔ small social commerce startup
➔ acquired in the last week by Fluid, Inc.
➔ small dev team
➔ I am the ops team

twisty maze of little shell scripts

bespoke artisanal monitoring

difficult to modify; doesn’t scale

http://www.pcgameshardware.de/screenshots/1280x1024/2007/07/CA01.jpg

New Relic

pros:
  ◆ nice graphs
  ◆ application-level view
  ◆ good error analysis

cons:
  ◆ slow to update
  ◆ many false-positive alerts
  ◆ high prices (better now)

Motivating Change

http://99designs.com/illustrations/contests/illustration-pagerduty-161025/entries

Nagios: as hideous as you remember

https://laur.ie/blog/2014/02/why-ill-be-letting-nagios-live-on-a-bit-longer-thank-you-very-much/

“Horrendous interface”

“Well, it’s more ‘old’ than anything else. At least everything is in the same place as you left it because it’s been the same since 1912.”

“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”

-- @murphy_slaw (via @lozzd)

HBase: monitor all the ports?!?

hbck: the HBase consistency checker

nagios -> bash script -> parsing output of hbck

http://www.ymc.ch/en/how-to-monitor-hbase-health-by-nagios
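The Nagios → bash script → hbck chain has three steps: run hbck, read its summary, and map that summary to a Nagios exit code. Below is a minimal Python sketch of such a check. It assumes hbck ends its report with lines like `N inconsistencies detected.` and `Status: OK` / `Status: INCONSISTENT`, so check this against your HBase version's output. The function names and the 600-second timeout are illustrative choices, not part of any existing plugin.

```python
import re
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed shape of hbck's summary lines (verify on your HBase version).
STATUS_RE = re.compile(r"^Status:\s*(\w+)", re.MULTILINE)
COUNT_RE = re.compile(r"(\d+)\s+inconsistencies detected", re.IGNORECASE)


def classify_hbck_output(text):
    """Map raw hbck output to a (nagios_exit_code, message) pair."""
    status = STATUS_RE.search(text)
    if status is None:
        return UNKNOWN, "UNKNOWN: no Status line in hbck output"
    count = COUNT_RE.search(text)
    n = int(count.group(1)) if count else 0
    if status.group(1) == "OK" and n == 0:
        return OK, "OK: HBase is consistent"
    return CRITICAL, f"CRITICAL: hbck status {status.group(1)}, {n} inconsistencies"


def main():
    """Nagios entry point: run hbck, print one status line, exit with its code."""
    try:
        # hbck can be slow on large clusters; the timeout is an arbitrary choice.
        result = subprocess.run(["hbase", "hbck"], capture_output=True,
                                text=True, timeout=600)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"UNKNOWN: could not run hbck ({exc})")
        sys.exit(UNKNOWN)
    # hbck may log to either stream, so parse both.
    code, message = classify_hbck_output(result.stdout + result.stderr)
    print(message)
    sys.exit(code)
```

Nagios only reads the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output. Installing this as a script whose last line calls `main()` is enough to use it as a check command.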

adding alert after alert after...

http://modiinhub.com/wp-content/uploads/2014/02/logo-mongodb-tagline.png

MMS (MongoDB Monitoring Service)

“cyber” monday: 1988 called; wants its word back.

the rewards of hubris

MMS showed the issue, but we weren’t alerting on it; we didn’t understand the global write lock

If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it. -- @indec

http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Graphite & StatsD

➔ Graphite
  ◆ Store and visualize time-series data
  ◆ http://graphite.readthedocs.org/

➔ StatsD
  ◆ Measure everything! (Timers, counters, events, …)
  ◆ https://github.com/etsy/statsd/
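StatsD makes "measure everything" cheap because clients fire plain-text UDP datagrams of the form `name:value|type`. Common types are `c` for counters, `ms` for timers and `g` for gauges, with an optional `|@rate` suffix for sampled counters. Below is a minimal Python client sketch. It assumes a StatsD daemon on its usual UDP port 8125, and the class and method names are illustrative, not an existing library's API.

```python
import random
import socket


class StatsdClient:
    """Tiny fire-and-forget StatsD client over UDP (illustrative sketch)."""

    def __init__(self, host="127.0.0.1", port=8125, prefix=""):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def build_line(self, name, value, kind, rate=1.0):
        """Build one StatsD line, e.g. 'shop.logins:1|c|@0.1'."""
        line = f"{self.prefix}{name}:{value}|{kind}"
        if rate < 1.0:
            line += f"|@{rate}"
        return line

    def _send(self, name, value, kind, rate=1.0):
        # Client-side sampling: send only a fraction of events. StatsD uses
        # the @rate suffix to scale counters back up.
        if rate < 1.0 and random.random() >= rate:
            return
        try:
            self.sock.sendto(self.build_line(name, value, kind, rate).encode(),
                             self.addr)
        except OSError:
            pass  # metrics must never take the application down

    def incr(self, name, count=1, rate=1.0):
        self._send(name, count, "c", rate)

    def timing(self, name, ms):
        self._send(name, int(ms), "ms")

    def gauge(self, name, value):
        self._send(name, value, "g")
```

Usage: `stats = StatsdClient(prefix="shop.")`, then `stats.incr("checkout.success")` or `stats.timing("checkout.render", 42)`. UDP means a dead StatsD box costs you metrics, not requests.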

Where we were

➔ Graphite 0.9.9 (wanted 0.9.12)
  ◆ over 2 years old
  ◆ missing new features (consolidateBy!)

➔ StatsD was newish, but…
  ◆ hand-rolled
  ◆ running in a screen session
  ◆ on a special snowflake box
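Graphite draws at most one datapoint per pixel. When a series has more points than the graph is wide, graphite-web merges consecutive points, averaging them by default. Graphite's `consolidateBy()` function swaps in another function such as `max`, so short spikes survive on long time ranges. The sketch below only illustrates that idea and is not graphite-web's code. The `points_per_pixel` parameter is a hypothetical simplification of how graphite-web sizes its buckets.

```python
def consolidate(values, points_per_pixel, func="average"):
    """Collapse each run of `points_per_pixel` datapoints into one value.

    None entries (gaps) are skipped; an all-None bucket stays None.
    """
    funcs = {
        "average": lambda xs: sum(xs) / len(xs),
        "sum": sum,
        "max": max,
        "min": min,
    }
    agg = funcs[func]
    out = []
    for i in range(0, len(values), points_per_pixel):
        bucket = [v for v in values[i:i + points_per_pixel] if v is not None]
        out.append(agg(bucket) if bucket else None)
    return out
```

With latencies `[10, 12, 11, 950, 10, 11]` squeezed three points per pixel, averaging turns the 950 ms spike into roughly 324, while `"max"` keeps it at 950.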

Community cookbooks?

➔ Graphite ones are good, but…
  ◆ focus on Apache (we use nginx)
  ◆ we haven’t moved to Chef 11 (gasp!)

➔ StatsD
  ◆ https://github.com/librato/statsd-cookbook
  ◆ launches daemons via upstart
  ◆ generates config file based on attributes
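"Generates config file based on attributes" boils down to rendering node attributes into StatsD's config file. Below is a minimal Python sketch of that step. It is not the librato cookbook's actual template, and the attribute names (`graphite_host`, `graphite_port`, `port`, `flush_interval_ms`) are hypothetical. The output keys `graphiteHost`, `graphitePort`, `port`, `flushInterval` and `backends` are the ones used in StatsD's example config.

```python
import json

# Hypothetical attribute defaults, standing in for a Chef cookbook's
# node['statsd'][...] attributes.
DEFAULT_ATTRIBUTES = {
    "graphite_host": "127.0.0.1",
    "graphite_port": 2003,
    "port": 8125,
    "flush_interval_ms": 10000,
}


def render_statsd_config(attrs):
    """Render a StatsD config file body from a flat attribute dict."""
    merged = {**DEFAULT_ATTRIBUTES, **attrs}
    config = {
        "graphiteHost": merged["graphite_host"],
        "graphitePort": merged["graphite_port"],
        "port": merged["port"],
        "flushInterval": merged["flush_interval_ms"],
        "backends": ["./backends/graphite"],
    }
    # StatsD's config file is a JavaScript object literal; a JSON object
    # is also a valid object literal, so json.dumps produces usable output.
    return json.dumps(config, indent=2, sort_keys=True) + "\n"
```

Keeping defaults in one place and overriding per node is the same design a cookbook's attribute files give you. The config changes only when an attribute does, not when someone edits a file on a snowflake box.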

Graphite cookbook (Part 1)

➔ Install in a virtualenv (Django, uWSGI, nginx)
➔ Dependencies recommended
  ◆ https://github.com/graphite-project/graphite-web/blob/master/requirements.txt
➔ libcairo2-dev package on Ubuntu 12.04 LTS
➔ install Graphite’s 3 parts via pip

Graphite cookbook (Part 2)

➔ graphite-web
  ◆ Django app, renders graphs

➔ whisper
  ◆ fixed-size database for storing time-series data
  ◆ like RRD

➔ carbon
  ◆ carbon-cache.py - stores data
  ◆ carbon-aggregator.py - buffers and aggregates, then forwards to carbon-cache
  ◆ carbon-relay.py - for sharding/replication
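"Fixed-size, like RRD" means each whisper archive has a set number of slots (retention = step × points). New data overwrites the oldest slot instead of growing the file. Below is a simplified in-memory Python sketch of that ring-buffer idea. It is not whisper's on-disk format, and the class name is hypothetical. Like whisper, it stores each point's interval timestamp alongside the value, so a read can tell current data from leftovers of an earlier lap around the ring.

```python
class FixedSizeArchive:
    """In-memory sketch of a whisper/RRD-style archive: `points` slots,
    each holding (interval_start, value); storage never grows."""

    def __init__(self, step, points):
        self.step = step        # seconds per datapoint
        self.points = points    # retention = step * points seconds
        self.slots = [(0, None)] * points

    def _slot(self, interval):
        return (interval // self.step) % self.points

    def update(self, timestamp, value):
        interval = timestamp - (timestamp % self.step)  # align to the step
        self.slots[self._slot(interval)] = (interval, value)

    def fetch(self, start, end):
        """Return [(interval, value)] for each step-aligned interval from
        start up to (not including) end. A slot overwritten by a newer
        interval, or never written, reads as None."""
        first = start - (start % self.step)
        out = []
        for interval in range(first, end, self.step):
            stored_at, value = self.slots[self._slot(interval)]
            out.append((interval, value if stored_at == interval else None))
        return out
```

With `step=60, points=3`, an update at t=300 lands in the slot that held t=120. After that, fetching t=120 returns None rather than a stale value.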

when in doubt: tcpdump is your friend

http://blog.johngoulah.com/2012/10/looking-under-the-covers-of-statsd/
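StatsD traffic is plain text, so something like `tcpdump -A -i lo udp port 8125` (interface and port assumed) shows exactly which metric lines are arriving. If tcpdump isn't handy, a few lines of Python can do the same decoding. The sketch below assumes the usual `name:value|type[|@rate]` format with newline-separated metrics per packet. `sniff()` must listen on a port that the real StatsD daemon is not already bound to. Both function names are hypothetical.

```python
import socket


def parse_statsd_packet(payload):
    """Split a StatsD UDP payload into (name, value, type, sample_rate)
    tuples; unparseable lines come back with type 'malformed'."""
    metrics = []
    for line in payload.decode("utf-8", errors="replace").splitlines():
        if not line.strip():
            continue
        name, _, rest = line.partition(":")
        fields = rest.split("|")
        if not name or len(fields) < 2:
            metrics.append((line, None, "malformed", None))
            continue
        rate = 1.0
        if len(fields) > 2 and fields[2].startswith("@"):
            try:
                rate = float(fields[2][1:])
            except ValueError:
                metrics.append((line, None, "malformed", None))
                continue
        # Keep the value as text: gauges may carry a '+' or '-' prefix.
        metrics.append((name, fields[0], fields[1], rate))
    return metrics


def sniff(host="0.0.0.0", port=8125, count=10):
    """Print the metrics in the next `count` packets sent to host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    try:
        for _ in range(count):
            payload, sender = sock.recvfrom(65535)
            for metric in parse_statsd_packet(payload):
                print(sender[0], metric)
    finally:
        sock.close()
```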



carbon-aggravator (between 0.9.10 & 0.9.12)

# If set true, metric received will be forwarded to DESTINATIONS in addition to
# the output of the aggregation rules. If set false the carbon-aggregator will
# only ever send the output of aggregation.
FORWARD_ALL = True
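FORWARD_ALL decides whether raw metrics survive the aggregator at all. Rules in carbon's aggregation-rules.conf take the form `output_template (frequency) = method input_pattern`, for example `<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests`. Below is a simplified Python simulation of one flush of one such rule, to show what reaches DESTINATIONS with FORWARD_ALL on or off. It is not carbon's implementation: it ignores the frequency, buffering across intervals and rewrite rules. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Aggregation methods accepted in aggregation-rules.conf (sum, avg).
METHODS = {"sum": sum, "avg": lambda xs: sum(xs) / len(xs)}


def rule_regex(input_pattern):
    """Compile '<env>.applications.<app>.*.requests' into a regex:
    '<field>' becomes a named group, '*' matches one path node."""
    parts = []
    for node in input_pattern.split("."):
        if node == "*":
            parts.append(r"[^.]+")
        elif node.startswith("<") and node.endswith(">"):
            parts.append(f"(?P<{node[1:-1]}>[^.]+)")
        else:
            parts.append(re.escape(node))
    return re.compile(r"\.".join(parts) + r"\Z")


def aggregate(datapoints, output_template, method, input_pattern, forward_all):
    """datapoints: list of (metric_path, value) from one interval.
    Returns the (metric_path, value) pairs sent on to DESTINATIONS."""
    regex = rule_regex(input_pattern)
    buckets = defaultdict(list)
    for path, value in datapoints:
        match = regex.match(path)
        if match is None:
            continue
        out_path = output_template
        for field, text in match.groupdict().items():
            out_path = out_path.replace(f"<{field}>", text)
        buckets[out_path].append(value)
    aggregated = [(path, METHODS[method](values))
                  for path, values in sorted(buckets.items())]
    # FORWARD_ALL = True: raw metrics go out alongside the aggregates.
    # FORWARD_ALL = False: only the aggregation output is sent.
    return list(datapoints) + aggregated if forward_all else aggregated
```

With FORWARD_ALL false, `prod.hosts.web1.load` matches no rule and is silently dropped. If an upgrade changes that behavior under you, graphs stop updating.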

Carbonate

whisper-fill.py

backfill datapoints between whisper files
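whisper-fill copies data from a source whisper file into a destination file only where the destination is missing it. This helps when one node lost a few hours that another node still has. Below is a minimal Python sketch of that merge rule on plain dicts. It works on `{interval_timestamp: value}` mappings rather than real whisper files, and the function name is hypothetical.

```python
def fill_gaps(src, dst):
    """Backfill holes in `dst` from `src`, whisper-fill style.

    Both series are dicts {interval_timestamp: value or None}. Existing
    dst values always win; src only fills intervals that are missing or
    None in dst. Neither input is modified.
    """
    filled = dict(dst)
    for ts, value in src.items():
        if value is not None and filled.get(ts) is None:
            filled[ts] = value
    return filled
```

Letting the destination win keeps the backfill safe to rerun: it can close a gap but never rewrite data you already trust.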

2am: sudden drop-off

8am: look at graphs: ?!?!

10am: and we’re back.

What’s next?

❏ finds real problems
❏ actionable alerting
❏ usable by all
❏ …?

the ideal monitoring solution...

http://www.quickmeme.com/img/f5/f512ff9bee084263df5571d3c81388019dcb063173e1dbcfa2babac9274576b6.jpg

What we’re actually using now

StatsD

Application-level error analysis

Alarms for autoscaling

Timers & counters

Log & host-level

Hadoop & HBase visualization

MongoDB

Graphs

Time-series data graphing

client-side plugins

External uptime checks

on-call rotation/alerting

Threshold-based alarms

Dashboard

Discuss!

Twitter: @bridgetkromhout