
Transcript
Page 1: Monitoring at a SAAS Startup: Tradeoffs and Tools

Monitoring at a SaaS Startup

Tradeoffs and Tools

Bridget Kromhout

Page 2: Monitoring at a SAAS Startup: Tradeoffs and Tools

8thbridge.com
small social commerce startup
acquired in the last week by Fluid, Inc.
small dev team
I am the ops team

Page 3: Monitoring at a SAAS Startup: Tradeoffs and Tools

twisty maze of little shell scripts

bespoke artisanal monitoring

difficult to modify; doesn’t scale

http://www.pcgameshardware.de/screenshots/1280x1024/2007/07/CA01.jpg

Page 4: Monitoring at a SAAS Startup: Tradeoffs and Tools

New Relic

pros: nice graphs, application-level view, good error analysis

cons: slow to update, many false-positive alerts, high prices (better now)

Page 5: Monitoring at a SAAS Startup: Tradeoffs and Tools

Motivating Change

http://99designs.com/illustrations/contests/illustration-pagerduty-161025/entries

Page 6: Monitoring at a SAAS Startup: Tradeoffs and Tools

Nagios: as hideous as you remember

Page 7: Monitoring at a SAAS Startup: Tradeoffs and Tools

https://laur.ie/blog/2014/02/why-ill-be-letting-nagios-live-on-a-bit-longer-thank-you-very-much/

“Horrendous interface”

“Well, it’s more “old” than anything else. At least everything is in the same place as you left it because it’s been the same since 1912.”

Page 8: Monitoring at a SAAS Startup: Tradeoffs and Tools

“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”

-- @murphy_slaw (via @lozzd)

Page 9: Monitoring at a SAAS Startup: Tradeoffs and Tools

HBase: monitor all the ports?!?

hbck: the HBase consistency checker

nagios -> bash script -> parsing output of hbck

http://www.ymc.ch/en/how-to-monitor-hbase-health-by-nagios
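The nagios -> bash script -> hbck chain could be sketched in Python roughly as follows. This is an illustration, not the script from the talk or the linked post: the function name is hypothetical, and it assumes the hbck report contains a "Status: OK" or "Status: INCONSISTENT" line and, optionally, an "N inconsistencies detected." line.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hbck(output):
    """Map hbck text output to a (Nagios exit code, message) pair.

    Assumes a "Status: <WORD>" line and, optionally, a line of the
    form "N inconsistencies detected." somewhere in the report.
    """
    status = re.search(r"^Status:\s*(\w+)", output, re.MULTILINE)
    if status is None:
        # Unparseable output should not be reported as healthy.
        return UNKNOWN, "HBASE UNKNOWN - no Status line in hbck output"
    count = re.search(r"(\d+) inconsistencies detected", output)
    detail = f"{count.group(1)} inconsistencies" if count else "count not reported"
    if status.group(1) == "OK":
        return OK, f"HBASE OK - {detail}"
    return CRITICAL, f"HBASE CRITICAL - status {status.group(1)}, {detail}"
```

A wrapper would feed the captured hbck output to check_hbck, print the message, and exit with the returned code so Nagios can interpret the result.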

Page 10: Monitoring at a SAAS Startup: Tradeoffs and Tools

adding alert after alert after...

Page 11: Monitoring at a SAAS Startup: Tradeoffs and Tools

http://modiinhub.com/wp-content/uploads/2014/02/logo-mongodb-tagline.png

Page 12: Monitoring at a SAAS Startup: Tradeoffs and Tools
Page 13: Monitoring at a SAAS Startup: Tradeoffs and Tools

MMS (MongoDB Monitoring Service)

Page 14: Monitoring at a SAAS Startup: Tradeoffs and Tools

“cyber” monday: 1988 called; wants its word back.

the rewards of hubris

MMS showed the issue, but we weren't alerting on it

we didn't understand the global write lock

Page 15: Monitoring at a SAAS Startup: Tradeoffs and Tools

If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it. -- @indec

http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Page 16: Monitoring at a SAAS Startup: Tradeoffs and Tools

Graphite & StatsD

➔ Graphite
◆ Store and visualize time-series data
◆ http://graphite.readthedocs.org/

➔ StatsD
◆ Measure everything! (Timers, counters, events, …)
◆ https://github.com/etsy/statsd/
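For a sense of how little it takes to "measure everything", here is a minimal sketch of emitting a counter and a timer in the StatsD plain-text UDP format. The metric names are hypothetical, and the host and port are assumptions (8125 is the usual StatsD default).

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD datagram: <name>:<value>|<type>[|@<rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, for illustration only.
counter = format_metric("checkout.orders", 1, "c")    # counter increment
timer = format_metric("checkout.latency", 320, "ms")  # timing in milliseconds
```

Because the transport is UDP, the application never blocks on the metrics daemon; the tradeoff is that datagrams can be dropped silently.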

Page 17: Monitoring at a SAAS Startup: Tradeoffs and Tools

Where we were

➔ Graphite 0.9.9 (wanted 0.9.12)
◆ over 2 years old
◆ missing new features (Consolidate by!)

➔ StatsD was newish, but…
◆ hand-rolled
◆ running in a screen session
◆ on a special snowflake box

Page 18: Monitoring at a SAAS Startup: Tradeoffs and Tools

Community cookbooks?

➔ Graphite ones good, but…
◆ focus on Apache (we use nginx)
◆ we haven’t moved to Chef 11 (gasp!)

➔ StatsD
◆ https://github.com/librato/statsd-cookbook
◆ launches daemons via upstart
◆ generates config file based on attributes

Page 19: Monitoring at a SAAS Startup: Tradeoffs and Tools

Graphite cookbook (Part 1)

➔ Install in a virtualenv (django, uwsgi, nginx)
➔ Dependencies recommended
◆ https://github.com/graphite-project/graphite-web/blob/master/requirements.txt
➔ libcairo2-dev package on Ubuntu 12.04 LTS
➔ install graphite’s 3 parts via pip

Page 20: Monitoring at a SAAS Startup: Tradeoffs and Tools

Graphite cookbook (Part 2)

➔ graphite-web
◆ Django app, renders graphs

➔ whisper
◆ fixed-size database for storing time-series data
◆ like RRD

➔ carbon
◆ carbon-cache.py - stores data
◆ carbon-aggregator.py - buffers, then stores
◆ carbon-relay.py - for sharding/replication
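"Fixed-size" means a whisper file's size is determined by its retention settings when it is created, not by how much data arrives. A rough sketch of that arithmetic follows; the byte sizes reflect my understanding of the whisper file layout and should be treated as assumptions, and the retention schedule is hypothetical.

```python
# Byte sizes as I understand the whisper file format; treat as assumptions.
METADATA_SIZE = 16      # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = 12  # offset, seconds per point, number of points
POINT_SIZE = 12         # 4-byte timestamp + 8-byte double value


def whisper_file_size(archives):
    """archives: list of (seconds_per_point, retention_seconds) pairs."""
    points = sum(retention // step for step, retention in archives)
    return METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives) + POINT_SIZE * points


# Hypothetical retention: 10s resolution for 6 hours, then 1m for 7 days.
size = whisper_file_size([(10, 6 * 3600), (60, 7 * 86400)])
```

Under these assumptions the example schedule holds 2160 + 10080 points, so every metric costs about 147 KB on disk from the moment it is first seen.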

Page 21: Monitoring at a SAAS Startup: Tradeoffs and Tools

when in doubt: tcpdump is your friend

http://blog.johngoulah.com/2012/10/looking-under-the-covers-of-statsd/

Page 22: Monitoring at a SAAS Startup: Tradeoffs and Tools

carbon-aggravator (between 0.9.10 & 0.9.12)

# If set true, metric received will be forwarded to
# DESTINATIONS in addition to
# the output of the aggregation rules. If set false
# the carbon-aggregator will
# only ever send the output of aggregation.
FORWARD_ALL = True
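To illustrate what the setting above changes, here is a toy model of an aggregator that buffers datapoints and either forwards the raw metric too or only emits aggregates. It is not carbon's code: the class, the rule mapping, and the metric names are all hypothetical, and summing is just one possible aggregation method.

```python
from collections import defaultdict


class ToyAggregator:
    """Buffers datapoints per output metric and emits one sum per flush."""

    def __init__(self, rules, forward_all):
        self.rules = rules              # {input metric name: output metric name}
        self.forward_all = forward_all
        self.buffers = defaultdict(list)

    def receive(self, metric, value):
        """Return the datapoints sent onward immediately for this input."""
        if metric in self.rules:
            self.buffers[self.rules[metric]].append(value)
        # With forward_all the raw metric is passed through as well.
        return [(metric, value)] if self.forward_all else []

    def flush(self):
        """Emit the aggregated values and clear the buffers."""
        out = [(name, sum(values)) for name, values in sorted(self.buffers.items())]
        self.buffers.clear()
        return out
```

In this model, flipping forward_all to False makes the raw per-host series stop arriving downstream, which is the kind of surprise a default change between versions can produce.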

Page 23: Monitoring at a SAAS Startup: Tradeoffs and Tools

Carbonate

whisper-fill.py

backfill datapoints between whisper files
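The backfill idea can be sketched on plain dictionaries: copy a point from the source only where the destination has a gap. This is a conceptual illustration, not whisper-fill's implementation; the function name and the dict representation are assumptions.

```python
def backfill(src, dst):
    """Copy points from src into dst only where dst has no value.

    Both arguments map timestamp -> value, with None marking a gap.
    Returns a new dict; existing dst values are never overwritten.
    """
    filled = dict(dst)
    for timestamp, value in src.items():
        if value is not None and filled.get(timestamp) is None:
            filled[timestamp] = value
    return filled
```

Never overwriting existing points is what makes this safe to run against a live file after a migration or an outage.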

Page 24: Monitoring at a SAAS Startup: Tradeoffs and Tools

2am: sudden drop-off

8am: look at graphs: ?!?!

10am: and we’re back.

Page 25: Monitoring at a SAAS Startup: Tradeoffs and Tools

What’s next?

Page 26: Monitoring at a SAAS Startup: Tradeoffs and Tools

❏ finds real problems
❏ actionable alerting
❏ usable by all
❏ …?

the ideal monitoring solution...

http://www.quickmeme.com/img/f5/f512ff9bee084263df5571d3c81388019dcb063173e1dbcfa2babac9274576b6.jpg

Page 27: Monitoring at a SAAS Startup: Tradeoffs and Tools

What we’re actually using now

StatsD

Application-level error analysis

Alarms for autoscaling

Timers & counters

Log & host-level

Hadoop & HBase visualization

MongoDB graphs

Time-series data graphing

client-side plugins

External uptime checks

oncall rotation/alerting

Threshold-based alarms

Dashboard

Page 28: Monitoring at a SAAS Startup: Tradeoffs and Tools

Discuss!

Twitter: @bridgetkromhout
Email: [email protected]