Take My Logs. Please!

Post on 08-May-2015

3.879 views 1 download

description

Details on how we capture application data in our access and error logs, as well as how to generate quick reports and graphs from these logs. This talk was presented at O'Reilly's Velocity Online Conference on October 26, 2011.

Transcript of Take My Logs. Please!

Take my logs. Please.

Mike BrittainDirector of Engineering, InfrastructureEtsy.com

mike@etsy.com @mikebrittain

(hello?)

This sounds boooooorrrrring...No, no... hang in there!

25 MM uniques/month150 Countries$300 MM+ sales last year

Apache, PHP, MySQL, PostgreSQL,Memcache, Gearman,Solr, etc.

What’s working?

What’s working?Performance

What’s working?PerformanceOperability

What’s working?PerformanceOperabilitySimplicity

Logging + Trending

App logging(Apache access and error logs)

LogFormat "%h %l %u %t

\"%r\" %>s %b

“Common”

LogFormat "%h %l %u %t

\"%r\" %>s %b

\"%{Referer}i\"

\"%{User-agent}i\""

“Combined”

mod_log_config%f Filename requested

%k # of keepalive requests served on this connection

%T Time taken to serve the request, in seconds

%f Filename requested

%k # of keepalive requests served on this connection

%D Time taken to serve the request, in microseconds

mod_log_config

%f Filename requested

%k # of keepalive requests served on this connection

%D Time taken to serve the request, in microseconds

%{foobar}n Contents of “note” foobar from another module

mod_log_config

apache_note(“foobar”, $whatever);

apache_note()

LogFormat %{True-Client-IP}i %l %t \"%r\"

%>s %b \"%{Referer}i\"

\"%{User-Agent}i\" %V

%{user_id}n %{shop_id}n %{uaid}n

%{ab_selections}n %{request_uid}n

%{api_consumer_key}n

%{api_method_name}n

%{php_bytes}n %{php_microsec}n %D

“Steroids”

LogFormat %{True-Client-IP}i %l %t \"%r\"

%>s %b \"%{Referer}i\"

\"%{User-Agent}i\" %V

%{user_id}n %{shop_id}n %{uaid}n

%{ab_selections}n %{request_uid}n

%{api_consumer_key}n

%{api_method_name}n

%{php_bytes}n %{php_microsec}n %D

“Steroids”

$GLOBALS['timer'] = microtime(true) * 1000000;

$GLOBALS['timer'] = microtime(true) * 1000000;

register_shutdown_function('pageStats');

function pageStats() {

}

$GLOBALS['timer'] = microtime(true) * 1000000;

register_shutdown_function('pageStats');

function pageStats() {

$timer_end = microtime(true) * 1000000;

$diff = $timer_end - $GLOBALS['timer'];

}

$GLOBALS['timer'] = microtime(true) * 1000000;

register_shutdown_function('pageStats');

function pageStats() {

$timer_end = microtime(true) * 1000000;

$diff = $timer_end - $GLOBALS['timer'];

apache_note('php_microsec', $diff);

apache_note('php_bytes',

memory_get_peak_usage());

}

What about “%D”?

LogFormat %{True-Client-IP}i %l %t \"%r\"

%>s %b \"%{Referer}i\"

\"%{User-Agent}i\" %V

%{user_id}n %{shop_id}n %{uaid}n

%{ab_selections}n %{request_uid}n

%{api_consumer_key}n

%{api_method_name}n

%{php_bytes}n %{php_microsec}n %D

“Steroids”

LogFormat %{True-Client-IP}i %l %t \"%r\"

%>s %b \"%{Referer}i\"

\"%{User-Agent}i\" %V

%{user_id}n %{shop_id}n %{uaid}n

%{ab_selections}n %{request_uid}n

%{api_consumer_key}n

%{api_method_name}n

%{php_bytes}n %{php_microsec}n %D

“Steroids”

LogFormat %{True-Client-IP}i %l %t \"%r\"

%>s %b \"%{Referer}i\"

\"%{User-Agent}i\" %V

%{user_id}n %{shop_id}n %{uaid}n

%{ab_selections}n %{request_uid}n

%{api_consumer_key}n

%{api_method_name}n

%{php_bytes}n %{php_microsec}n %D

“Steroids”

LogFormat %{True-Client-IP}i %l %t \"%r\"

%>s %b \"%{Referer}i\"

\"%{User-Agent}i\" %V

%{user_id}n %{shop_id}n %{uaid}n

%{ab_selections}n ...

easy_reg=1; personalize_widget=0;

icon_in_cornflower_blue=1;

“Steroids”

Coming soon...%{locale}n (i18n)

%{platform}n (desktop vs. mobile)

%{locale}n (i18n)

%{platform}n (desktop vs. mobile)

OPS-1805, OPS-1827

etsy.com/careers

Coming soon...

Using something else?time, http method, request uri, response code, referer, user-agent, response time, response memory, custom segmentation fields...

Quick averagesgrep "GET /listing/" access.log | \

awk '{sum=sum+$(NF-1)} END {print sum/NR}'

Quick graphsgrep "GET /listing/" access.log | \

perl -pe "s/.*\[.*\d{4}:(\d{2}):(\d{2}):\d{2}.*\]/\1:\2/" | \

awk '{print $1, $(NF-1)}' > /tmp/pagetimes.dat

gives you...

Quick graphs# /tmp/pagetimes.dat

18:37 251.018:38 252.118:39 253.518:40 251.018:45 250.0

and then...

Quick graphs# GNUPLOT

set terminal png

set output 'listings.png'

set yrange [0:2000]

set xdata time

set timefmt "%d/%B/%Y:%H:%M:%S"

set format x "%H:%M"

plot '/tmp/pagetimes.dat' using 1:2 with points

Quick graphs

Error logsPHP + Apache errors in one fileSimple logging interface

Error logsLevels: error, info, debugNamespace: perf, sql, __class__

Logger::error("Query exceeded 5 sec: $query",

“sql_long_query”);

web0054 [Fri Mar 04 16:27:48 2011] [error]

[sql_long_query] [mk04gw1p71] Query exceeded

5 sec: SELECT * FROM ...

web0054 [Fri Mar 04 16:27:48 2011] [error]

[sql_long_query] [mk04gw1p71] Query exceeded

5 sec: SELECT * FROM ...

$ grep "16:27:48" access.log | wc -l

1527

web0054 [Fri Mar 04 16:27:48 2011] [error]

[sql_long_query] [mk04gw1p71] Query exceeded

5 sec: SELECT * FROM ...

iowerror.log -> request_uid -> access.log

request uri, ab selections, user id, locale, platform, api key, etc.

Filteringtail -f error.log | grep -v “sql_long_query” | ...

web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh noooooooooooweb0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh noooooooooooweb0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh noooooooooooweb0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is fallingweb0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh noooooooooooweb0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.

Trendingfatals errors warnings

LogsterRun by cronMaintains a cursor on log filesSimple parsing & aggregationOutput to Ganglia or Graphite

github.com/etsy

web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed.

Reason: wrong password for ...

^.+ \[.+\] \[(?P<log_level>.+)\]

if (fields['log_level'] == “fatal”): self.fatals += 1

elif (fields['log_level'] == “error”): self.errors += 1

elif (fields['log_level'] == “warning”): self.warnings += 1

...

MetricObject("fatals", (self.fatals / self.duration), "per sec")

MetricObject("errors", (self.errors / self.duration), "per sec")

MetricObject("warning", (self.warnings / self.duration), "per sec")

fatals errors warnings

Logster

Signed-in vs. Signed-out

github.com/etsy

Log a plethora of data.Don’t be afraid to use one file.

Use custom fields to segment data.

Correlate errors to specific requests.

Make f#@k!ng graphs.

Convert rates to trend lines.

Take my logs. Please!

Mike BrittainDirector of Engineering, InfrastructureEtsy.com

mike@etsy.com @mikebrittain

codeascraft.etsy.comgithub.com/etsy

Thank you.