Lies, Damn Lies, and Benchmarks

Lies Damn Lies & Benchmarks Steven Lembark Workhorse Computing [email protected]

Transcript of Lies, Damn Lies, and Benchmarks

Page 1: Lies, Damn Lies, and Benchmarks

Lies, Damn Lies & Benchmarks

Steven Lembark
Workhorse Computing
[email protected]

Page 2: Lies, Damn Lies, and Benchmarks

“Perl is too slow”

Heard that before? Yeah...

Mostly wrong – can't refute it without data.

Need to benchmark the times.

Page 3: Lies, Damn Lies, and Benchmarks

Damn lies...

Good benchmarks find realistic times.

Most benchmarks prove a point.

They get ignored.

Ignored results are not lazy.

Page 4: Lies, Damn Lies, and Benchmarks

Benchmarking perl

The *NIX “time” command.

Good enough to answer most questions.

Avoids much Benchmarking Stuff (“BS”).

Page 5: Lies, Damn Lies, and Benchmarks

Simplest tool: “time”

real, system, and user times.

real time heavily affected by system load.

system + user better indication of “work”.

real – work = blocked.
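The arithmetic on this slide can be sketched in shell. This is a minimal sketch, assuming bash (for the “time” keyword and TIMEFORMAT) and awk; the helper names “measure” and “blocked_ms” are made up for illustration and are not from the talk.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not from the talk.

# Run a command under bash's `time` keyword and print "real user sys"
# in seconds on one line. The command's own output is discarded.
measure() {
    local TIMEFORMAT='%R %U %S'
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# blocked = real - (user + sys), reported in whole milliseconds.
blocked_ms() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.0f\n", (real - (user + sys)) * 1000 }'
}
```

Something like “blocked_ms $(measure perl -e 0)” would then report the blocked time for a bare perl startup on your own system.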

Page 6: Lies, Damn Lies, and Benchmarks

“bash takes less time to start up”

perl isn't any slower:

Zero work for both.

Real is all blocked.

$ time perl -e 0

real 0m0.005s
user 0m0.000s
sys 0m0.000s

$ time bash /dev/null

real 0m0.005s
user 0m0.000s
sys 0m0.000s

Page 7: Lies, Damn Lies, and Benchmarks

BS: Startup Times

If something just ran it is probably in core.

Saves overhead running it the second time.

Run everything twice to benchmark startups.

Multiple runs or single-user mode help manage background noise.
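Repeating a run is easy to script. The sketch below assumes bash; “best_of” is a hypothetical helper, and reporting the minimum is one common convention rather than something the talk prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the talk.

# Run a command several times and print the fastest wall-clock time in
# seconds. The first run pulls the program into memory; taking the
# minimum discards runs that were slowed by background noise.
best_of() {
    local runs=$1 i=0
    shift
    local TIMEFORMAT='%R'
    while [ "$i" -lt "$runs" ]; do
        { time "$@" >/dev/null 2>&1; } 2>&1
        i=$((i + 1))
    done | sort -n | head -n 1
}
```

For example, “best_of 5 perl -e 0” and “best_of 5 bash /dev/null” give two startup figures that are less sensitive to whatever else the machine was doing.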

Page 8: Lies, Damn Lies, and Benchmarks

Minimizing startup issues

Save kernel calls, context switches, interrupts, latency, transfer I/O...

tmpfs on linux minimizes overhead.

Test with un-loaded system.

Avoid “virtual” systems (CPU, EMC) unless that is what you are testing.

Page 9: Lies, Damn Lies, and Benchmarks

What does startup time tell us?

Opterons are fast?

Useless by itself.

Necessary baseline.

Differences are a warning.

Page 10: Lies, Damn Lies, and Benchmarks

Analyzing startup times.

Big differences usually indicate a problem:

Mis-compiled: “-O0” “-g” on production code.

Mixing 32- and 64-bit code and O/S.

Background noise from other running jobs.

Botched startups leave everything else suspect.
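A quick way to check for the first two problems is to ask the system and perl how they were built. This is a sketch only: “build_report” is a hypothetical name, and it assumes “uname”, “getconf”, and (optionally) a perl with its Config data installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the talk.

build_report() {
    # Hardware name reported by the running kernel (e.g. x86_64).
    echo "kernel arch: $(uname -m)"
    # Size of a C long in the userland tools: 32 or 64.
    echo "userland word size: $(getconf LONG_BIT) bits"
    if command -v perl >/dev/null 2>&1; then
        # perl -V:name prints a single build-time configuration value.
        perl -V:optimize 2>/dev/null   # optimizer flags; -O0 or -g would show up here
        perl -V:archname 2>/dev/null   # architecture perl was compiled for
    fi
}
```

A 64-bit kernel paired with a 32-bit perl, or an “optimize” value containing “-O0”, is the kind of mismatch worth resolving before trusting any other timing.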

Page 11: Lies, Damn Lies, and Benchmarks

Do something!

OK, let's time an operation.

Listing a directory is common enough.

“ls” lists the contents, sorts lexically.

Perl's “glob” is similar.

Page 12: Lies, Damn Lies, and Benchmarks

Trivial pursuit: ls vs glob.

lembark@dizzy etl $ time bash -c '/bin/ls -d /tmp/*'

real 0m0.007s
user 0m0.000s
sys 0m0.000s

lembark@dizzy etl $ time perl -e '$\="\n"; $,=" "; print glob "/tmp/*"'

real 0m0.019s
user 0m0.010s
sys 0m0.000s

Mostly blocked: 7ms for bash vs. 9ms for perl (19ms real minus 10ms user).

Failing to clear the screen can skew results!

Remote display, virtual machines.

Page 13: Lies, Damn Lies, and Benchmarks

BS: Milliseconds matter

Really care about 12ms? OK, perl is slower.

Most of the difference is in blocked time.

Hint: perl and shell block at the same rate.

perl compiles a statement, which adds overhead.

Use “ls” for what it is.

Page 14: Lies, Damn Lies, and Benchmarks

Doing more

Search files using their basenames:

Find all of the basenames from “2012.05.05” through “2012.05.16”.

First step: How many files are there?

Page 15: Lies, Damn Lies, and Benchmarks

Times

Compare File::Find with /bin/find.

Roughly same system time, added user for compile.

Shell is faster because it is single-purpose.

$ time find . -type f | wc -l;
18583

real 0m0.080s
user 0m0.020s
sys 0m0.050s

$ time perl -MFile::Find -e 'my $i = 0; find sub { -l or -d or ++$i }, "."; print $i, "\n"'
18583

real 0m0.274s
user 0m0.220s
sys 0m0.050s

Page 16: Lies, Damn Lies, and Benchmarks

Multi-layer pipes

Compare the basename to a regex.

Shell:

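find . -type f | xargs -l1 basename |
grep -E '2012.05.(0[5-9]|1[0-6])'

Find files, extract basenames, and search with extended regex syntax (much of which Perl shares). A plain “(...)” group is used here because POSIX extended regexes do not define Perl's non-capturing “(?:...)”.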

One-liner with perl, File::Find & File::Basename.

Page 17: Lies, Damn Lies, and Benchmarks

BS: Forks & pipes are “free”.

Real, user, and system time are higher for bash.

xargs has to fork/exec many copies of basename.

system overhead from buffering pipes is also higher.

Plumbing is expensive!

$ time find . -type f | xargs -l1 basename | grep -E '2012.05.(0[5-9]|1[0-6])' | wc -l
1604

real 0m29.823s
user 0m0.710s
sys 0m4.220s

$ time perl -MFile::Find=find -MFile::Basename=basename -e 'my $i=0; find sub { -l || -d and return; /2012.05.(?:0[5-9]|1[0-6])/ and ++$i }, "."; print $i, "\n"'
1604

real 0m0.301s
user 0m0.170s
sys 0m0.130s
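To see how much of the shell time comes from the per-file forks, one option is to keep the pipeline but strip the directory part with a single sed process instead of one basename per file. This variant is not from the talk and was not timed here; “count_matches” is a hypothetical name, and the sketch assumes file names without embedded newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the talk.

# Count files under a directory whose basename matches the date range.
# One sed process removes everything up to the last "/" on each line,
# replacing the thousands of basename processes that xargs would start.
count_matches() {
    find "$1" -type f |
        sed 's|.*/||' |
        grep -Ec '2012.05.(0[5-9]|1[0-6])'
}
```

Running “time count_matches .” next to the two commands above separates the cost of the forks from the cost of the pipes themselves.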

Page 18: Lies, Damn Lies, and Benchmarks

Replacing content “in place”

perl's “-i” replaces files in place.

Shell pre-opens files, can't “sort -d < a > a”.

Shell requires “sort -d < a > b && mv b a”.

Now imagine filtering a few thousand files...
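The shell-side workaround can be wrapped in a function. This is a minimal sketch, assuming a “mktemp” that accepts a template argument; “sort_in_place” is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the talk.

# "sort -d < a > a" fails because the shell opens and truncates "a" for
# writing before sort gets to read it. Writing to a temporary file in
# the same directory and renaming it afterwards avoids that.
sort_in_place() {
    local tmp
    tmp=$(mktemp "$1.XXXXXX") || return 1
    if sort -d < "$1" > "$tmp"; then
        mv "$tmp" "$1"      # note: the result carries the temp file's permissions
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Filtering a few thousand files this way means a loop that calls the function once per file, with a sort, a mktemp, and a mv forked each time.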

Page 19: Lies, Damn Lies, and Benchmarks

perl -n & -p with -i

Say you have to update the package names for a few hundred modules from “::Source” to “::RDS”.

Mixing shell with perl:

find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'

Exercise: Try writing this in pure shell.

Page 20: Lies, Damn Lies, and Benchmarks

Running it doesn't take long either

Nice division of labor:

find & xargs deal with the names.

perl deals with the regex.

Not much typing either way.

Not much time either.

$ time find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'

real 0m0.112s
user 0m0.044s
sys 0m0.016s
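If any of the file names might contain spaces, the same division of labor works with null-delimited names. This is a sketch, assuming a find and xargs that support “-print0” and “-0”; “rename_pkg” is a hypothetical name, and the timing above was not re-measured for this variant.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the talk.

# Rewrite "::Source" to "::RDS" in every regular file under a directory.
# find emits NUL-terminated names and xargs -0 reads them, so names with
# spaces or quotes reach perl intact. perl -i edits each file in place.
rename_pkg() {
    find "$1" -type f -print0 |
        xargs -0 perl -i -p -e 's/::Source\b/::RDS/g'
}
```

As with the original one-liner, xargs packs many names into each perl invocation, so the number of forks stays small.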

Page 21: Lies, Damn Lies, and Benchmarks

What this means to you.

Plumbing and forks are not free.

Single-purpose programs faster for one thing.

Chaining the simpler tools adds overhead.

Languages faster for multi-stage tasks.