Metrics-driven engineering (Velocity 2011)
METRICS-DRIVEN ENGINEERING at Etsy
Kellan Elliott-McCrea, VP of Engineering, @kellan
Tuesday, June 5, 12
What is Etsy?
8.5+ million items in the marketplace
400,000+ active
$300+ million in sales in 2010
~$41 million/month
> $1000 / minute
> 1 billion page views / month
business in over 150 countries
deploy the site, every ~20 minutes
engineering team grew ~4x in 2010
Metrics?
Logs, Graphs, Trends, and Correlations
Metrics Driven?
Making Decisions
How many visitors are using this thing?
Can we deploy that to 100% of our visitors?
Did we make it faster?
Did I just break something?
WHO MAKES THESE GRAPHS?
Well, the Ops team manages the network, racks the servers, installs the monitoring tools, wears the pagers, blah, blah, blah...
Q.A.
but... Engineers build the application.
Dev + Ops
ACCESS
Yes! No.
“Engineers are too busy!”
Here’s the BIG SECRET...
... MAKE IT EASY!
Simple, open source tools
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
Ganglia
★ cluster oriented
★ huge community contributed recipes
★ 2.0 released today (including several Flickr and Etsy patches!)
★ gmetad makes it easy to track custom metrics
Graphite
★ super flexible collection and display
★ per metrics buckets
★ single instance
★ super easy to write and use custom display functions
Logging
Logger::log_error("User login failed. Reason: $msg for $username", "login");
web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ...
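Log lines in this shape are regular enough to pull apart mechanically. The following is a minimal Python sketch, not Etsy's code: the field names (`host`, `level`, `namespace`, `ident`) are guesses at what each bracketed field means, since the slides do not label them, and the number in the fourth bracket is treated as an opaque identifier.

```python
import re
from datetime import datetime

# Matches lines shaped like the example above:
#   host [timestamp] [level] [namespace] [id] message
# Field names are assumptions; the slides do not name the fields.
LINE_RE = re.compile(
    r"^(?P<host>\S+) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>[^\]]+)\] "
    r"\[(?P<namespace>[^\]]+)\] "
    r"\[(?P<ident>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line has a different shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamp layout as shown on the slide, e.g. "Fri Mar 04 16:27:48 2011".
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%a %b %d %H:%M:%S %Y"
    )
    return fields


sample = (
    "web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] "
    "User login failed. Reason: wrong password for ..."
)
record = parse_line(sample)
print(record["host"], record["level"], record["namespace"])
```

Keeping the level and namespace in fixed positions is what makes later counting cheap: a parser never has to understand the free-text message.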
Counting and Timing
http://code.flickr.com/blog/2008/10/27/counting-timing/
Logster
https://github.com/etsy/logster
Forked from ganglia-logtailer:
- Daemon mode (only cron mode)
+ Support for Graphite
+ Simplified parsing scripts
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
Fatals Errors Warnings
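Reducing a wall of log lines to three numbers is mostly pattern matching. A rough Python sketch of that reduction follows; it is a standalone function written for illustration, not Logster's parser interface, and the sample lines are copied from the slide.

```python
import re
from collections import Counter

# Severity sits in the second bracketed field: host [time] [level] [client ...] msg
LEVEL_RE = re.compile(r"^\S+ \[[^\]]+\] \[(?P<level>fatal|error|warning)\] ")


def count_levels(lines):
    """Count fatal/error/warning lines; anything else is ignored."""
    counts = Counter({"fatal": 0, "error": 0, "warning": 0})
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts


lines = [
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!",
    "web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!",
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
    "web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.",
]
counts = count_levels(lines)
print(counts["fatal"], counts["error"], counts["warning"])  # prints: 1 2 1
```

Each count can then be shipped as one metric per run, which is what turns unreadable log volume into a graphable line.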
★ runs out of cron
★ maintains a cursor into log files
★ supports ganglia and graphite
★ custom parsers much easier to write than gmetad
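"Maintains a cursor into log files" means each cron run only reads what was appended since the previous run. One way to sketch that idea in Python is to persist a byte offset in a small state file. This illustrates the concept only and makes no claim about how Logster implements it; the file names are made up for the demo.

```python
import os
import tempfile


def read_new_lines(log_path, state_path):
    """Return lines appended to log_path since the offset saved in state_path."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as state:
            offset = int(state.read().strip() or 0)
    # If the log is now shorter than the cursor (e.g. it was rotated), restart.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
        offset = log.tell()
    # Save the new cursor for the next run.
    with open(state_path, "w") as state:
        state.write(str(offset))
    return data.decode("utf-8", errors="replace").splitlines()


# Demo with throwaway files (hypothetical names).
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "error.log")
state_path = os.path.join(workdir, "error.log.offset")

with open(log_path, "w") as log:
    log.write("first\nsecond\n")
first_run = read_new_lines(log_path, state_path)

with open(log_path, "a") as log:
    log.write("third\n")
second_run = read_new_lines(log_path, state_path)

print(first_run, second_run)
```

A production version would also need to cope with a half-written final line; this sketch ignores that case.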
Apache access logs
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{X-Forwarded-For}i %{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
%{etsy_ab_selections}n
%{etsy_uaid}n
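Putting application notes such as `etsy_ab_selections` into the access log means a question like "is variant B slower?" can be answered from logs alone. The Python sketch below shows one way that might look. The Python field names are invented for readability, the sample lines and their values are entirely made up, and the format of the A/B field is unknown, so it is treated as an opaque string. It also assumes no field contains unquoted spaces, which a multi-hop X-Forwarded-For header would violate.

```python
import re
from collections import defaultdict

# A token is a quoted string, a [bracketed] timestamp, or a run of non-spaces.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|\[[^\]]*\]|\S+')

# One name per field, in the order of the extended LogFormat above.
FIELDS = [
    "forwarded_for", "true_client_ip", "ident", "user", "time", "request",
    "status", "bytes", "referer", "user_agent", "shop_id", "uaid", "vhost",
    "ab_selections", "request_uuid", "api_consumer_key", "api_method_name",
    "php_memory_usage_bytes", "php_time_microsec", "request_time_microsec",
]


def parse_access_line(line):
    """Split one access-log line into named fields, or return None."""
    tokens = TOKEN_RE.findall(line)
    if len(tokens) != len(FIELDS):
        return None
    return dict(zip(FIELDS, tokens))


def mean_request_time_by_variant(lines):
    """Average %D (microseconds) grouped by the raw A/B selection string."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [sum, count]
    for line in lines:
        record = parse_access_line(line)
        if record is None:
            continue
        bucket = totals[record["ab_selections"]]
        bucket[0] += int(record["request_time_microsec"])
        bucket[1] += 1
    return {variant: total / n for variant, (total, n) in totals.items()}


# Fabricated sample lines, for illustration only.
TEMPLATE = (
    '10.0.0.1 10.0.0.1 - - [04/Mar/2011:16:27:48 +0000] "GET /listing/1 HTTP/1.1" '
    '200 5120 "-" "ExampleAgent/1.0" - uaid123 www.example.com {variant} uuid-1 '
    "- - 1048576 90000 {usec}"
)
lines = [
    TEMPLATE.format(variant="variant_a", usec=120000),
    TEMPLATE.format(variant="variant_b", usec=80000),
    TEMPLATE.format(variant="variant_a", usec=100000),
]
result = mean_request_time_by_variant(lines)
print(result)
```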
Graphs
“If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it.” - Erik Kastner
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
StatsD
https://github.com/etsy/statsd/
StatsD::increment("logins.success");
StatsD::timing("gearman.time", $msec);
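Calls like these are cheap because a StatsD client only formats a short string and sends it over UDP. Here is a minimal Python sketch of such a client, assuming a StatsD daemon on localhost at the default port 8125; the function names are made up for illustration and this is not the PHP client from the slides.

```python
import socket


def counter_packet(name, delta=1):
    """StatsD counter line, e.g. b'logins.success:1|c'."""
    return f"{name}:{delta}|c".encode("ascii")


def timing_packet(name, msec):
    """StatsD timer line in milliseconds, e.g. b'gearman.time:320|ms'."""
    return f"{name}:{msec}|ms".encode("ascii")


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: metrics must never break the request."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    except OSError:
        pass  # Dropping a metric is acceptable; failing the caller is not.
    finally:
        sock.close()


send(counter_packet("logins.success"))
send(timing_packet("gearman.time", 320))
```

UDP is the design choice that makes it safe to sprinkle these calls everywhere: if the daemon is down, the application neither blocks nor errors.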
StatsD::timing("gearman.time", $msec);
90th pct, average, lower
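The three series on the timing graph are summaries of all the samples received in one flush interval. A rough Python sketch of how such summaries could be computed follows; it shows the idea only and is not a copy of StatsD's implementation. The sample values are invented.

```python
def summarize_timings(values, pct=90):
    """Summarize one interval of timing samples (milliseconds)."""
    if not values:
        return None
    ordered = sorted(values)
    # Keep the fastest pct% of samples; the slowest kept one is the percentile.
    keep = max(1, int(round(len(ordered) * pct / 100.0)))
    return {
        "lower": ordered[0],
        f"upper_{pct}": ordered[keep - 1],
        "mean": sum(ordered) / len(ordered),
    }


# Nine ordinary samples and one outlier (made-up numbers).
samples = [12, 15, 11, 240, 14, 13, 16, 18, 17, 19]
summary = summarize_timings(samples)
print(summary)  # {'lower': 11, 'upper_90': 19, 'mean': 37.5}
```

The single 240 ms outlier drags the average to 37.5 ms while the 90th percentile stays at 19 ms, which is why it helps to graph both.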
Ad hoc
name value timestamp
echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
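The same "name value timestamp" line can be sent from any language that can open a TCP socket. A small Python sketch, assuming a Graphite plaintext listener on port 2003 as in the shell one-liner; the function names are illustrative and the network call is left commented out so the snippet runs without a server.

```python
import socket
import time


def graphite_line(name, value, timestamp=None):
    """Format one metric in Graphite's plaintext form: 'name value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def send_to_graphite(line, host, port=2003, timeout=2.0):
    """Open a TCP connection, write the line, and close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


# Fixed timestamp so the output is reproducible.
line = graphite_line("events.deploy.site", 1, timestamp=1299256068)
print(repr(line))  # 'events.deploy.site 1 1299256068\n'

# send_to_graphite(line, "graphite.etsycorp.com")  # requires network access
```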
Correlations
echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
Trends + Events
target=drawAsInfinite(events.deploy.site)
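Overlaying events on a trend comes down to adding extra `target` parameters to a Graphite render URL. A minimal Python sketch of building such a URL follows. The helper name and its arguments are made up; the host and metric names are the ones that appear in the slides.

```python
from urllib.parse import urlencode


def render_url(base, metric, events, options=None):
    """Build a render URL: one trend line plus a vertical marker per event."""
    params = [("target", metric)]
    params += [("target", f"drawAsInfinite({event})") for event in events]
    params += list((options or {}).items())
    return f"{base}/render?{urlencode(params)}"


url = render_url(
    "http://graphite.etsycorp.com",
    "webs.errorLog.notExist",
    ["events.deploy.site"],
    {"from": "-1hours"},
)
print(url)
```

With deploys drawn as vertical lines, "did I just break something?" becomes a matter of checking whether the error line jumps right after a marker.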
What Happened?
Holt-Winters
"Forecasting Sales by Exponentially Weighted Moving Averages". Peter
"Aberrant Behavior Detection in Time Series for Network Monitoring".
"Holt-Winters Forecasting Applied to Poisson Processes in Real-Time".
holtWintersConfidence(Upper|Lower)
holtWintersAberration
business metrics with confidence bands == alertable business metrics
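The idea behind confidence bands and aberrations can be sketched outside Graphite. The Python below is a simplified additive Holt-Winters forecast with a smoothed per-season deviation, in the spirit of the aberrant-behavior paper cited above. It is an illustration under stated assumptions, not Graphite's implementation: the smoothing constants, the band width `delta`, and the initialization are all arbitrary choices made here.

```python
def holt_winters_bands(series, season_len, alpha=0.1, beta=0.0035,
                       gamma=0.1, delta=3.0):
    """Return (lower, upper, aberration) lists, one entry per input point."""
    level = float(series[0])          # smoothed baseline
    trend = 0.0                       # smoothed slope
    seasonal = [0.0] * season_len     # smoothed offset per position in season
    deviation = [0.0] * season_len    # smoothed absolute error per position
    lower, upper, aberration = [], [], []

    for i, actual in enumerate(series):
        s = i % season_len
        predicted = level + trend + seasonal[s]
        band = delta * deviation[s]
        lo, hi = predicted - band, predicted + band
        lower.append(lo)
        upper.append(hi)

        # Aberration: how far the observation falls outside the band.
        if actual > hi:
            aberration.append(actual - hi)
        elif actual < lo:
            aberration.append(actual - lo)
        else:
            aberration.append(0.0)

        # Update the smoothed components with the new observation.
        prev_level = level
        level = alpha * (actual - seasonal[s]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[s] = gamma * (actual - level) + (1 - gamma) * seasonal[s]
        deviation[s] = gamma * abs(actual - predicted) + (1 - gamma) * deviation[s]

    return lower, upper, aberration


# A flat metric followed by one spike (made-up numbers).
series = [5.0] * 20 + [50.0]
lower, upper, aberration = holt_winters_bands(series, season_len=4)
print(aberration[-1] > 0)  # True: the spike lies outside the band
```

An alert can then fire on a non-zero aberration instead of a hand-tuned static threshold, which is what makes a seasonal business metric alertable.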
16,000 metrics in GRAPHITE
(plus 32,000 metrics in GANGLIA)
Dashboards
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"></a>
Hard
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
$g->showDeploys(true);
echo $g->getDashboardHTML(280, 220);
Easy!
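The gap between "Hard" and "Easy!" is a small wrapper that knows the house conventions (time range, deploy overlays, thumbnail linked to a full-size graph). A hypothetical Python analogue of such a wrapper is sketched below. The class, its methods, and the choice of deploy metrics are assumptions made for illustration; only the URL parameters and metric names come from the slides.

```python
from html import escape
from urllib.parse import urlencode

# Deploy event metrics to overlay; two of the names used in the long URL above.
DEPLOY_METRICS = ["deploys.config.production", "deploys.web.production"]


class GraphiteGraph:
    """Builds a thumbnail that links to a full-size Graphite graph."""

    def __init__(self, base_url, since="-1hours"):
        self.base_url = base_url
        self.since = since
        self.title = ""
        self.metrics = []      # list of (target, color) pairs
        self.deploys = False

    def set_title(self, title):
        self.title = title

    def add_metric(self, target, color):
        self.metrics.append((target, color))

    def show_deploys(self, flag):
        self.deploys = flag

    def _url(self, width, height, hide_legend):
        params = [("from", self.since), ("width", width), ("height", height),
                  ("title", self.title), ("yMin", 0)]
        if hide_legend:
            params.append(("hideLegend", 1))
        params += [("target", target) for target, _ in self.metrics]
        if self.deploys:
            params += [("target", f"drawAsInfinite({m})") for m in DEPLOY_METRICS]
        colors = [color for _, color in self.metrics]
        if colors:
            params.append(("colorList", ",".join(colors)))
        return f"{self.base_url}/render?{urlencode(params)}"

    def dashboard_html(self, width, height):
        full = self._url(800, 600, hide_legend=False)
        thumb = self._url(width, height, hide_legend=True)
        return f'<a href="{escape(full)}"><img src="{escape(thumb)}"></a>'


g = GraphiteGraph("http://graphite.etsycorp.com")
g.set_title("File Not Found")
g.add_metric("webs.errorLog.notExist", "#00cc00")
g.show_deploys(True)
html = g.dashboard_html(280, 220)
print(html)
```

Five short calls replace a hand-assembled URL, which is the sort of convenience that lets dozens of engineers build their own dashboards.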
48 dashboards by 32 engineers
Application health
High-level visibility
Low MTTD
Confidence
Make metrics
Not that much
codeascraft.etsy.com
github.com/etsy/statsd
github.com/etsy/logster
bitbucket.org/maplebed/ganglia-logtailer
Questions?