Dash Profiler 200910

DashProfiler Lightweight Code Instrumentation [email protected] - October 2009


Page 1: Dash Profiler 200910

DashProfiler: Lightweight Code Instrumentation

[email protected] - October 2009

Page 2: Dash Profiler 200910

A Problem

A web application ~100K lines of code

Using many external services

If response time goes up... what’s causing it?

Continuous monitoring in production

Must have very low CPU and I/O cost

Minimal code changes

Page 3: Dash Profiler 200910

A Typical Approach

package MyNetIO;

sub send_request {
    my ($hostname, $request) = @_;
    ...send to $hostname...
}

How much time was spent sending the request?

Page 4: Dash Profiler 200910

A Typical Approach

package MyNetIO;

use Time::HiRes qw(time);

sub send_request {
    my ($hostname, $request) = @_;
    my $start = time();
    ...send to $hostname...
    $durations->{MyNetIO}{$hostname} = time() - $start;
}

• Doesn’t record a count, so can’t produce averages.

• Two lines of code. Worse if there are multiple return statements.

• Doesn’t record the time if the function exits via an exception.

• Still need to write code to flush the data.

Page 5: Dash Profiler 200910

A Solution: DashProfiler

Simple

Flexible

Lightweight

Page 6: Dash Profiler 200910

DashProfiler

• Can group samples into granular time units

• Can measure exclusive time in a period

• Can flush to disk at intervals

• Just needs one line of code per sample

Page 7: Dash Profiler 200910

DashProfiler Internals

Built on DBI::Profile, part of the DBI

Aggregates measurements into a data tree

Two-level tree by default:

$root->{ $key1 }->{ $key2 }->[ ...leaf node... ]

$root->{ 'MyNetIO' }->{ $hostname }->[ ...leaf node... ]

Page 8: Dash Profiler 200910

DashProfiler Data

Each leaf node in the tree is a reference to an array:

$root->{ $key1 }->{ $key2 } = [
    106,                  # 0: count of samples at this node
    0.0312958955764771,   # 1: total duration
    0.000490069389343262, # 2: first duration
    0.000176072120666504, # 3: shortest duration
    0.00140702724456787,  # 4: longest duration
    1023115819.83019,     # 5: time of first sample
    1023115819.86576,     # 6: time of last sample
]

First sample populates all

Later samples always update 0, 1, and 6, and may update 3 or 4
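The update rules above can be sketched in a few lines of Perl. This is only an illustration of the leaf layout, not the DBI::Profile implementation: add_sample(), the host name and the timestamps are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DBI::Profile: fold one sample
# (duration, timestamp) into a leaf of a two-level tree.
sub add_sample {
    my ($root, $key1, $key2, $duration, $when) = @_;
    my $leaf = $root->{$key1}{$key2} ||= [];

    if (!@$leaf) {
        # First sample populates all seven slots.
        @$leaf = (1, $duration, $duration, $duration, $duration, $when, $when);
        return $leaf;
    }

    $leaf->[0]++;                                      # 0: count
    $leaf->[1] += $duration;                           # 1: total duration
    $leaf->[3] = $duration if $duration < $leaf->[3];  # 3: shortest
    $leaf->[4] = $duration if $duration > $leaf->[4];  # 4: longest
    $leaf->[6] = $when;                                # 6: time of last sample
    return $leaf;
}

my %tree;
add_sample(\%tree, 'MyNetIO', 'db1.example.com', 0.5,  1000);
add_sample(\%tree, 'MyNetIO', 'db1.example.com', 0.25, 1001);
add_sample(\%tree, 'MyNetIO', 'db1.example.com', 0.75, 1002);

my $leaf = $tree{MyNetIO}{'db1.example.com'};
printf "count=%d total=%.2f avg=%.2f\n",
    $leaf->[0], $leaf->[1], $leaf->[1] / $leaf->[0];
# prints: count=3 total=1.50 avg=0.50
```

Because the count is stored next to the total, an average is always one division away, which is what the hand-rolled approach was missing.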

Page 9: Dash Profiler 200910

DashProfiler By-Time

Optional extra time level in the data tree

$time = int(time() / $granularity) * $granularity;

$root->{ $time }->{ 'MyNetIO' }->{ $hostname }->[ ... ]

So a new sub-tree is grown every $granularity seconds
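To make the bucketing concrete, here is a small sketch with made-up timestamps and a granularity of 10 seconds. The time_bucket() helper and the host name are assumptions for the example, and only a count is kept at each leaf for brevity.

```perl
use strict;
use warnings;

# Round a timestamp down to the start of its bucket.
sub time_bucket {
    my ($time, $granularity) = @_;
    return int($time / $granularity) * $granularity;
}

my $granularity = 10;   # seconds
my %tree;

# Three samples with made-up timestamps; the first two share a bucket.
for my $t (1023115819.83, 1023115813.10, 1023115821.40) {
    my $bucket = time_bucket($t, $granularity);
    $tree{$bucket}{MyNetIO}{'db1.example.com'}[0]++;   # count only, for brevity
}

print join(", ", sort keys %tree), "\n";
# prints: 1023115810, 1023115820
```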

Page 10: Dash Profiler 200910

DashProfiler Config

use DashProfiler;

DashProfiler->add_profile( foo => { } );

DashProfiler->add_profile( foo => {
    granularity       => 10,
    flush_interval    => 600,
    flush_hook        => sub { ... },
    sample_class      => 'DashProfiler::Sample',
    dbi_profile_class => 'DBI::Profile',
    period_exclusive  => ...,
    period_summary    => ...,
    ...
});

Create named profiles

Lots of features

Page 11: Dash Profiler 200910

Without DashProfiler

package MyNetIO;

use Time::HiRes qw(time);

sub send_request {
    my ($hostname, $request) = @_;
    my $start = time();
    ...send to $hostname...
    $durations->{MyNetIO}{$hostname} = time() - $start;
}

Page 12: Dash Profiler 200910

With DashProfiler

package MyNetIO;

use DashProfiler::Import foo_profiler => [ 'MyNetIO' ];

sub send_request {
    my ($hostname, $request) = @_;
    my $sample = foo_profiler( $hostname );
    ...send to $hostname...
}

- DashProfiler::Import imports a pre-curried profiler code ref

- Profilers return a blessed object containing a timestamp

- Object destruction triggers accumulation of the sample

Page 13: Dash Profiler 200910

With DashProfiler

package MyNetIO;

use DashProfiler::Import foo_profiler => [ 'MyNetIO' ];

sub send_request {
    my ($hostname, $request) = @_;
    my $sample = foo_profiler( $hostname );
    ...send to $hostname...
}

foo_profiler: 'foo' is the name of the profile created with add_profile()

'MyNetIO': value to use for 'key1'

$hostname: value to use for 'key2'

Duration is measured when $sample goes out of scope
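The scope-based timing can be illustrated without DashProfiler at all. The sketch below uses a hypothetical My::Sample class (not DashProfiler::Sample) whose DESTROY method records the elapsed time, and it keeps only a count and a total per key. It shows why one line is enough even with early returns or exceptions.

```perl
use strict;
use warnings;

package My::Sample;   # hypothetical class, not DashProfiler::Sample
use Time::HiRes qw(time);

our %totals;          # $totals{$key1}{$key2} = [ count, total duration ]

sub new {
    my ($class, $key1, $key2) = @_;
    return bless { key1 => $key1, key2 => $key2, start => time() }, $class;
}

# Runs when the last reference goes away: normal return, early return,
# or an exception unwinding the stack.
sub DESTROY {
    my ($self) = @_;
    my $leaf = $totals{ $self->{key1} }{ $self->{key2} } ||= [ 0, 0 ];
    $leaf->[0]++;
    $leaf->[1] += time() - $self->{start};
}

package main;

sub send_request {
    my ($hostname, $request) = @_;
    my $sample = My::Sample->new('MyNetIO', $hostname);
    die "connection refused\n" if $request eq 'bad';
    return "ok";
}

send_request('db1.example.com', 'good');
eval { send_request('db1.example.com', 'bad') };   # exception path is still timed

my $count = $My::Sample::totals{MyNetIO}{'db1.example.com'}[0];
print "samples recorded: $count\n";
# prints: samples recorded: 2
```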

Page 14: Dash Profiler 200910

With DashProfiler

package MyNetIO;

use DashProfiler::Import foo_profiler => [ 'MyNetIO' ];

sub send_request {
    my ($hostname, $request) = @_;
    my $sample = foo_profiler( $hostname ) if foo_profiler_enabled();
    ...send to $hostname...
}

foo_profiler_enabled() is an automatically imported compile-time constant

It reduces the cost to zero if the profile is disabled
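The effect of a compile-time constant can be seen with core Perl alone. In this sketch a plain `use constant` stands in for the imported foo_profiler_enabled(), and take_sample() is a hypothetical stand-in for foo_profiler(). When the constant is false, Perl removes the guarded statement at compile time, so the sampler is never called. The sketch declares $sample on its own line, which is a stylistic choice of this example rather than something the slides require.

```perl
use strict;
use warnings;

# Assumption: a plain constant stands in for the imported
# foo_profiler_enabled(); 0 means the profile is disabled.
use constant PROFILER_ENABLED => 0;

my $samples_taken = 0;

# Hypothetical stand-in for foo_profiler().
sub take_sample { $samples_taken++; return {} }

sub send_request {
    my ($hostname) = @_;
    my $sample;
    $sample = take_sample($hostname) if PROFILER_ENABLED;
    return "sent to $hostname";
}

send_request('db1.example.com') for 1 .. 1000;
print "samples taken: $samples_taken\n";
# prints: samples taken: 0
```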

Page 15: Dash Profiler 200910

DashProfiler Flush

Data is written to STDERR on exit, by default

Regular flushing is enabled by specifying a flush_interval

The dbi_profile_class handles the flush. Choices include:

DBI::Profile
DBI::ProfileDumper
DBI::ProfileDumper::Apache

DashProfiler->add_profile( foo => {
    ...,
    flush_interval    => 600,
    dbi_profile_class => 'DBI::ProfileDumper',
    flush_hook        => sub { ... },
    ...
});

Page 16: Dash Profiler 200910

DashProfiler Periods

• Group samples into periods

- e.g. http request to response

- start_sample_period() and end_sample_period()

- counted, to enable averages and totals per period

- can output period counts instead of sample counts

• Measure ‘exclusive’ time

- time from period start to end that’s not been accounted for by other samples

- enabled via period_exclusive option
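As a worked example of 'exclusive' time, the arithmetic looks like this. All the numbers and key names are made up, and the sketch assumes the samples inside the period do not overlap each other.

```perl
use strict;
use warnings;

# Made-up numbers for one period (one HTTP request), in seconds.
my $period_start = 100.00;
my $period_end   = 100.50;

# Durations of the samples taken inside the period (hypothetical keys).
my %sample_durations = (
    'MyNetIO/db1.example.com'   => 0.125,
    'MyNetIO/cache.example.com' => 0.0625,
    'MyTemplate/render'         => 0.25,
);

my $accounted = 0;
$accounted += $_ for values %sample_durations;

# Exclusive time: the part of the period not covered by any sample.
my $exclusive = ($period_end - $period_start) - $accounted;

printf "period=%.4f accounted=%.4f exclusive=%.4f\n",
    $period_end - $period_start, $accounted, $exclusive;
# prints: period=0.5000 accounted=0.4375 exclusive=0.0625
```

Here 0.0625s of the request was spent in code that no sample covers, which is the figure to watch when response time rises but no individual service looks slower.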

Page 17: Dash Profiler 200910

Example Data

Average response times over 24 hours

DashProfiler doesn’t generate graphs itself, but the data can be used to create graphs like these

Page 18: Dash Profiler 200910

Example Data

Worst case response times over 24 hours

Page 19: Dash Profiler 200910

DashProfiler Perspectives

• Each DashProfiler can have multiple DBI Profile objects attached

• Samples accumulate in all attached profiles

• Each profile can have a different Path

• giving different ‘perspectives’ or level of detail

- key1 + key2

- key1 + country + browser type

- key2 + browser type

- ... etc.
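The idea of one sample feeding several trees can be sketched as follows. This is a simplified illustration, not how DBI::Profile implements its Path attribute: the %perspectives table, the context field names and the count/total leaf are all assumptions made for the example.

```perl
use strict;
use warnings;

# Each hypothetical "perspective" is a list of context fields that form
# the path into its own tree.
my %perspectives = (
    by_service => { path => [qw(key1 key2)],            tree => {} },
    by_browser => { path => [qw(key1 country browser)], tree => {} },
);

# Add one sample to every attached perspective.
sub add_sample {
    my ($context, $duration) = @_;
    for my $p (values %perspectives) {
        my $node = $p->{tree};
        # Walk (and create) one tree level per path element.
        $node = $node->{ $context->{$_} } ||= {} for @{ $p->{path} };
        $node->{count}++;
        $node->{total} += $duration;
    }
}

add_sample({ key1 => 'MyNetIO', key2 => 'db1.example.com',
             country => 'IE', browser => 'Firefox' }, 0.25);
add_sample({ key1 => 'MyNetIO', key2 => 'db2.example.com',
             country => 'IE', browser => 'Firefox' }, 0.5);

my $svc = $perspectives{by_service}{tree}{MyNetIO};
my $brw = $perspectives{by_browser}{tree}{MyNetIO}{IE}{Firefox};
printf "by_service hosts=%d, by_browser count=%d total=%.2f\n",
    scalar(keys %$svc), $brw->{count}, $brw->{total};
# prints: by_service hosts=2, by_browser count=2 total=0.75
```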

Page 20: Dash Profiler 200910

DashProfiler Per-Period

• Optional extra ‘per-period’ DBI profile

• Enabled via period_summary option

• Automatically attached and reset by start_sample_period()

• Gives current totals for this period

• Great for ‘debug footers’ on web page showing how much time was spent generating this page

Page 21: Dash Profiler 200910

DashProfiler Cost

Time cost of taking a sample: 0.000021s

- Time to create sample object, destroy it, accumulate the counts

- In hot code can be 0.000015s

- (Timings made on a 2GHz MacBook Pro Intel Core Duo)

- (Could be made much faster by porting sampler class to C)
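A rough way to get a comparable figure on your own hardware is to time a loop that creates and destroys a sample-like object. The sketch below uses a hypothetical My::CostSample class rather than DashProfiler's sampler, so it only indicates the order of magnitude, and the result will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my ($count, $total) = (0, 0);

package My::CostSample;   # hypothetical class, not DashProfiler::Sample

sub new {
    my ($class) = @_;
    return bless { start => Time::HiRes::time() }, $class;
}

# Accumulate a count and a total duration, as a real sampler would.
sub DESTROY {
    my ($self) = @_;
    $count++;
    $total += Time::HiRes::time() - $self->{start};
}

package main;

my $iterations = 100_000;
my $begin = time();
for (1 .. $iterations) {
    my $sample = My::CostSample->new;   # created and destroyed each iteration
}
my $per_sample = (time() - $begin) / $iterations;

printf "%d samples, about %.7fs each\n", $count, $per_sample;
```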