Transcript of “Pianola: A script-based I/O benchmark” (14 pages)

Page 1:

Lawrence Livermore National Laboratory

Pianola: A script-based I/O benchmark

Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

John May, PSDW08, 17 November 2008

LLNL-PRES-406688

Page 2:

I/O benchmarking: What’s going on here?

Is my computer’s I/O system “fast”?
Is the I/O system keeping up with my application?
Is the app using the I/O system effectively?
What tools do I need to answer these questions?

And what exactly do I mean by “I/O system” anyway?
• For this talk, an I/O system is everything involved in storing data, from the filesystem to the storage hardware

Page 3:

Existing tools can measure general or application-specific performance

IOzone automatically measures I/O system performance for different operations and parameters
• Relatively little ability to customize I/O requests

Many application-oriented benchmarks
• SWarp, MADbench2…

Interface-specific benchmarks
• IOR, //TRACE

[Figure: IOzone “Write new file” surface plot: throughput (KB/sec, 0 to 600,000) vs. file size (64 KB to 4,194,304 KB) and record size (4 KB to 16,384 KB).]

Page 4:

Measuring the performance that matters

System benchmarks only measure general response, not application-specific response

Third-party application-based benchmarks may not generate the stimulus you care about

In-house applications may not be practical benchmarks
• Difficult for nonexperts to build and run
• Nonpublic source cannot be distributed to vendors and collaborators

Need benchmarks that…
• Can be generated and used easily
• Model application-specific characteristics

Page 5:

Script-based benchmarks emulate real apps

Capture trace data from an application and generate the same sequence of operations in a replay benchmark

We began with //TRACE from CMU (Ganger’s group)
• Records I/O events and intervening “compute” times
• Focused on parallel I/O, but much of the infrastructure is useful for our sequential I/O work

[Diagram: Application → Capture library → replay script (open, compute, read, compute, …) → Replay tool = script-based benchmark.]

Page 6:

Challenges for script-based benchmarking: Recording I/O calls at the right level

[Diagram: high-level library calls (fprintf( … ); /* work */ fprintf( … ); /* more work */ fprintf( … ); …) versus the low-level system calls beneath them (write( … );), each captured as a script of open, compute, write, … events.]

Instrumenting at high level
+ Easy with LD_PRELOAD
- Typically generates more events, so logs are bigger
- Need to replicate formatting
- Timing includes computation

Instrumenting at low level
+ Fewer types of calls to capture
+ Instrumentation is at I/O system interface
- Cannot use LD_PRELOAD to intercept all calls

Page 7:

First attempt at capturing system calls: the Linux strace utility

Records any selected set of system calls
Easy to use: just add to the command line
Produces easily parsed output

$ strace -r -T -s 0 -e trace=file,desc ls
     0.000000 execve("/bin/ls", [, ...], [/* 29 vars */]) = 0 <0.000237>
     0.000297 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000047>
     0.000257 fstat64(3, {st_mode=S_IFREG|0644, st_size=64677, ...}) = 0 <0.000033>
     0.000394 close(3) = 0 <0.000015>
     0.000230 open("/lib/librt.so.1", O_RDONLY) = 3 <0.000046>
     0.000289 read(3, ""..., 512) = 512 <0.000028>
     ...

Page 8:

Strace results look reasonably accurate, but overall runtime is exaggerated

[Chart: cumulative I/O time (0–45 sec) vs. execution time (0–450 sec) for Application Read, Application Write, Replay Read, and Replay Write.]

                            Read (sec.)   Write (sec.)   Elapsed (sec.)
Uninstrumented                      --             --              324
Instrumented Application          41.8           11.8              402
Replay                            37.0           11.8              390

Page 9:

For accurate recording, gather I/O calls using binary instrumentation

Can intercept and instrument specific system-level calls
Overhead of instrumentation is paid at program startup
Existing Jockey library works well for x86_32, but not ported to other platforms
Replay can be portable, though

Uninstrumented:
    mov 4(%esp), %ebx
    mov $0x4, %eax
    int $0x80
    ret

Instrumented:
    jmp to trampoline
    nop
    ret

Trampoline:
    save current state
    mov 4(%esp), %ebx
    call my write function
    restore state
    jmp to original code

Page 10:

Issues for accurate replay

Replay engine must be able to read and parse events quickly

Reading script must not interfere significantly with I/O activities being replicated

Script must be portable across platforms

[Diagram: for accurate replay, minimize the I/O impact of reading the script and correctly reproduce inter-event delays while emitting the replayed I/O events (open, compute, write, …).]

Page 11:

Accurate replay: Preparsing, compression, and buffering

Text-formatted output script is portable across platforms
Instrumentation output is parsed into binary format and compressed (~30:1)
• Conversion done on target platform

Replay engine reads and buffers script data during “compute” phases between I/O events

[Diagram: the text script (open, compute, write, …) is preparsed into binary (010010110101010110…), compressed, and buffered.]

Page 12:

Replay timing and profile match original application well

[Chart: cumulative I/O time (0–40 sec) vs. execution time (0–400 sec) for Application Read, Application Write, Replay Read, and Replay Write.]

                            Read (sec.)   Write (sec.)   Elapsed (sec.)
Uninstrumented                      --             --              314
Instrumented Application          35.8           12.8              334
Replay                            35.7           12.5              319

Page 13:

Things that didn’t help

Compressing the text script as it’s generated
• Only 2:1 compression
• Times of the I/O events themselves are not what matters during the instrumentation phase

Replicating the memory footprint
• Memory used by the application is taken from the same pool as the I/O buffer cache
• A smaller application (like the replay engine) should go faster because more buffer space is available
• Replicated the memory footprint by tracking brk() and mmap() calls, but it made no difference!

Page 14:

Conclusions on script-based I/O benchmarking

Gathering accurate I/O traces is harder than it seems
• Currently, no solution is both portable and efficient

Replay is easier, but efficiency still matters

Many possibilities for future work—which matter most?
• File name transformation
• Parallel trace and replay
• More portable instrumentation
• How to monitor mmap’d I/O?