Lies, Damn Lies, and Benchmarks

Steven Lembark, Workhorse [email protected]

“Perl is too slow”

Heard that before? Yeah...

Mostly wrong – can't refute it without data.

Need to benchmark the times.

Damn lies...

Good benchmarks find realistic times.

Most benchmarks prove a point.

They get ignored.

Ignored results are not lazy.

Benchmarking perl

The *NIX “time” command.

Good enough to answer most questions.

Avoids much Benchmarking Stuff (“BS”).

Simplest tool: “time”

real, system, and user times.

real time heavily affected by system load.

system + user better indication of “work”.

real – work = blocked.
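The arithmetic above can be sketched as a pair of small shell functions. This is only an illustration: the names to_seconds and blocked_time are hypothetical, and the sketch assumes the "0m0.005s" format that bash's time prints.

```shell
# Hypothetical helpers, not part of any standard tool.

# Convert a bash-style time value such as "1m2.500s" to seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# blocked = real - (user + sys), all three given in "XmY.YYYs" form.
blocked_time() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.3f\n", r - (u + s) }'
}

# Example: blocked_time 0m0.005s 0m0.000s 0m0.000s
```

With zero user and system time, the whole real time comes back as blocked.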

“bash takes less time to start up”

perl isn't any slower:

Zero work for both.

Real is all blocked.

$ time perl -e 0

real 0m0.005s
user 0m0.000s
sys 0m0.000s

$ time bash /dev/null


BS: Startup Times

If something just ran it is probably in core.

Saves overhead running it the second time.

Run everything twice to benchmark startups.

Multiple runs, or single-user mode, help manage background noise.
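One way to follow the "run it twice" advice is a warm-up run followed by a batch of measured runs. The sketch below is a minimal illustration; repeat_cmd is a hypothetical helper, and timing a shell function this way assumes bash, where time is a keyword.

```shell
# Hypothetical helper: run a command a fixed number of times, discarding
# its output, and print how many runs completed.
repeat_cmd() {
    runs=$1
    shift
    done_runs=0
    while [ "$done_runs" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1
        done_runs=$((done_runs + 1))
    done
    echo "$done_runs"
}

# Suggested use (not run here):
#   repeat_cmd 1 perl -e 0          # warm-up: pull the binary into core
#   time repeat_cmd 10 perl -e 0    # measured runs; divide the times by 10
```

The measured figure includes the loop's own overhead, which should be small next to ten process startups but is not zero.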

Minimizing startup issues

Save kernel calls, context switches, interrupts, latency, transfer I/O...

tmpfs on linux minimizes overhead.

Test with un-loaded system.

Avoid “virtual” systems (CPU, EMC) unless that is what you are testing.

What does startup time tell us?

Opterons are fast?

Useless by itself.

Necessary baseline.

Differences are a warning.

Analyzing startup times.

Big differences usually indicate a problem:

Mis-compiled: “-O0” “-g” on production code.

Mixing 32- and 64-bit code and O/S.

Background noise from other running jobs.

Botched startups leave everything else suspect.
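A quick check for the mis-compiled case might look like the sketch below. flag_warnings is a hypothetical helper; the only perl-specific assumption is that the build's optimizer flags are available as $Config{optimize}.

```shell
# Hypothetical helper: scan a string of compiler flags and report the
# ones the slide warns about (-O0 and -g variants).
flag_warnings() {
    for flag in $1; do
        case "$flag" in
            -O0)               echo "unoptimized: $flag" ;;
            -g|-g[0-9]|-ggdb*) echo "debug: $flag" ;;
        esac
    done
}

# Suggested use (not run here):
#   flag_warnings "$(perl -MConfig -e 'print $Config{optimize}')"
```

No output means neither kind of flag was found in the string.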

Do something!

OK, let's time an operation.

Listing a directory is common enough.

“ls” lists the contents, sorts lexically.

Perl's “glob” is similar.

Trivial pursuit: ls vs glob.

lembark@dizzy etl $ time bash -c '/bin/ls -d /tmp/*'


lembark@dizzy etl $ time perl -e '$\="\n"; $,=" "; print glob "/tmp/*"'


Mostly blocked: 7ms bash vs. 9ms perl.

Failing to clear the screen can skew results!

Remote display, virtual machines.

BS: Milliseconds matter

Really care about 12ms? OK, perl is slower.

Most of the difference is in blocked time.

Hint: perl and shell block at the same rate.

perl compiles a statement, which adds overhead.

Use “ls” for what it is.

Doing more

Search files using their basenames:

Find all of the basenames from “2012.05.05” through “2012.05.16”.

First step: How many files are there?

Times

Compare File::Find with /bin/find.

Roughly same system time, added user for compile.

Shell is faster because it is single-purpose.

$ time find . -type f | wc -l;

18583


$ time perl -MFile::Find -e 'my $i = 0; find sub { -l or -d or ++$i },"."; print $i, "\n"'

18583


Multi-layer pipes

Compare the basename to a regex.

Shell:

find . -type f | xargs -l1 basename |

grep -E '2012.05.(0[5-9]|1[0-6])'

Find files, extract basenames, and search with extended syntax (largely borrowed from Perl).

One-liner with perl, File::Find & File::Basename.

BS: Forks & pipes are “free”.

Real, user, and system time are higher for bash.

xargs has to fork/exec many copies of basename.

System overhead from buffering pipes is also higher.

Plumbing is expensive!

$ time find . -type f | xargs -l1 basename | grep -E '2012.05.(0[5-9]|1[0-6])' | wc -l

1604


$ time perl -MFile::Find=find -MFile::Basename=basename -e 'my $i=0; find sub { -l || -d and return; /2012.05.(?:0[5-9]|1[0-6])/ and ++$i }, "."; print $i, "\n"'

1604
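The fork cost can be isolated with two filters that produce the same basenames. This is an illustration rather than anything from the talk: both function names are hypothetical, and the second relies on the standard ${var##*/} parameter expansion instead of an external program.

```shell
# One fork/exec of basename per path, much as "xargs -l1 basename" does.
basenames_fork() {
    while IFS= read -r path; do
        basename "$path"
    done
}

# Same result for find-style paths using the shell's own parameter
# expansion: no child process per line.
basenames_builtin() {
    while IFS= read -r path; do
        printf '%s\n' "${path##*/}"
    done
}

# Suggested comparison (not run here):
#   time find . -type f | basenames_fork    >/dev/null
#   time find . -type f | basenames_builtin >/dev/null
```

Any gap between the two timings is the price of starting one process per file.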


Replacing content “in place”

perl's “-i” replaces files in place.

Shell pre-opens files, can't “sort -d < a > a”.

Shell requires “sort -d < a > b && mv b a”.

Now imagine filtering a few thousand files...
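For comparison, the shell version of "in place" over many files needs a temporary file and a rename per file. A minimal sketch, assuming mktemp is available (it is common but not in POSIX); sort_in_place is a hypothetical name.

```shell
# Hypothetical helper: sort each named file "in place" by writing to a
# temporary file beside it and renaming over the original on success.
sort_in_place() {
    for file in "$@"; do
        tmp=$(mktemp "$file.XXXXXX") || return 1
        if sort -d < "$file" > "$tmp"; then
            mv "$tmp" "$file"    # the file takes on the temp file's mode
        else
            rm -f "$tmp"
            return 1
        fi
    done
}

# Suggested use (not run here):
#   find . -type f | xargs sh -c '...'   # or: sort_in_place a b c
```

Every file costs a temp file, a sort, and a rename, and the original permissions are not preserved without extra work.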

perl -n & -p with -i

Say you have to update the package names for a few hundred modules from “::Source” to “::RDS”.

Mixing shell with perl:

find . -type f | xargs perl -i -p -e's/::Source\b/::RDS/g';

Exercise: Try writing this in pure shell.

Running it doesn't take long either

Nice division of labor:

find & xargs deal with the names.

perl deals with the regex.

Not much typing either way.

Not much time either.

$ time find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'


What this means to you.

Plumbing and forks are not free.

Single-purpose programs faster for one thing.

Chaining the simpler tools adds overhead.

Languages faster for multi-stage tasks.
