Deep into your applications, performance & profiling
DEEP INTO YOUR APPLICATION ... PERFORMANCE & PROFILING
/ Fabien Arcellier @farcellier
ABOUT @FARCELLIER
Technical Architect, Developer, Life-long learner at Octo Technology
Favourite subjects: DevOps, Performance & Software craftsmanship
WHAT'S ON THE MENU
- What does profiling an application mean?
- How does it work?
- Applying it to a real-world application: memcached
PROFILING IN A FEW WORDS ...
Software profiling is a form of dynamic program analysis that measures, for example:
- the space or time complexity of a program
- the usage of particular instructions
- the frequency and duration of function calls, ...
@copyright wikipedia
TO GET THIS SORT OF REPORT ...
TO HAVE A BETTER VIEW OF WHAT HAPPENS ON YOUR HARDWARE, ...
@copyright highscalability
TO IMPROVE YOUR APPLICATION PERFORMANCE, ...
@copyright macifcourseaularge
You need measurements to continuously improve your application's performance.
TO UNDERSTAND YOUR APPLICATION, ...
You want to understand what is consuming your CPU.
TO MONITOR YOUR SERVER, ...
[Flame graph: app, __libc_start_main, main, dot, mat_mul]
You want to understand what your CPUs are doing.
AT THE BEGINNING THERE IS A PROGRAM ...
int main(void) { return 0; }
int func1(void) { return 0; }
Use gcc to compile it:
gcc app.c -o app
WITH A SIMPLE SYMBOL TABLE ...
readelf: displays information about ELF files
readelf -s app
Num:    Value          Size Type    Bind   Vis      Ndx Name
 45: 0000000000400580     2 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
 46: 00000000004004f8    11 FUNC    GLOBAL DEFAULT   13 func1
...
 57: 0000000000601040     0 NOTYPE  GLOBAL DEFAULT   25 _end
 58: 0000000000400400     0 FUNC    GLOBAL DEFAULT   13 _start
 59: 0000000000601038     0 NOTYPE  GLOBAL DEFAULT   25 __bss_start
 60: 00000000004004ed    11 FUNC    GLOBAL DEFAULT   13 main
...
00000000004004ed: virtual address of the symbol
FUNC: type of the symbol
main: name of the symbol
HOW DOES IT WORK?
 60: 00000000004004ed    11 FUNC    GLOBAL DEFAULT   13 main
CAPTURE EVENTS AND ASSOCIATE THEM WITH SYMBOLS
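As a rough illustration of associating a captured address with a symbol, the sketch below finds which function contains a sampled instruction pointer. It assumes a POSIX shell. The resolve_symbol helper and the sampled address 4004f0 are made up for the example; the symbol rows are copied from the readelf output of app.

```shell
# Symbol rows in `readelf -s` layout: Num Value Size Type Bind Vis Ndx Name
# (values copied from the app example).
symbols='46: 00000000004004f8 11 FUNC GLOBAL DEFAULT 13 func1
58: 0000000000400400 0 FUNC GLOBAL DEFAULT 13 _start
60: 00000000004004ed 11 FUNC GLOBAL DEFAULT 13 main'

# resolve_symbol ADDR: print the FUNC symbol whose [value, value + size)
# range contains the hexadecimal address ADDR, or "??" when none matches.
# Hypothetical helper, not part of any profiler.
resolve_symbol() {
  addr=$((0x$1))
  result='??'
  while read -r _num value size type _bind _vis _ndx name; do
    [ "$type" = FUNC ] || continue
    start=$((0x$value))
    if [ "$addr" -ge "$start" ] && [ "$addr" -lt $((start + size)) ]; then
      result=$name
    fi
  done <<EOF
$symbols
EOF
  printf '%s\n' "$result"
}

resolve_symbol 4004f0   # assumed sample; falls inside main (4004ed + 11 bytes)
```

A real profiler does the same lookup for every captured event, usually with a sorted table and a binary search rather than a linear scan.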
Generally, we can distinguish three types of profilers:
- Instrumented profiling
- Sampling profiling
- Event-based profiling (Java, .Net, ...)
INSTRUMENTED PROFILING
Gprof, Callgrind, ...
Pros:
- Captures all events
- Fine granularity
Cons:
- Slower than raw execution (20 times slower for callgrind)
- Intrusive (modifies the assembly code or emulates a virtual processor)
- What they capture and what they show may differ
TOOLING - CALLGRIND
Callgrind is a call-graph analyzer that comes with Valgrind.
Valgrind is a virtual machine using just-in-time (JIT) compilation techniques.
EXAMPLE WITH A MATRIX CALCULATION
You can instrument your execution with callgrind and explore the results in KCachegrind.
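To make the idea of "capturing all events" concrete, here is a hand-rolled sketch of instrumentation in a POSIX shell: every call goes through a wrapper that records it before running the real function. This is not how callgrind works internally (it instruments machine code inside Valgrind's virtual machine). The traced and call_count helpers are hypothetical, and mat_new and mat_mul are empty stand-ins for the functions of a matrix program.

```shell
# Log file receiving one line per recorded call.
trace_log=$(mktemp)

# traced NAME [ARGS...]: record the call, then run the real function.
traced() {
  printf '%s\n' "$1" >>"$trace_log"
  "$@"
}

# Empty stand-ins (assumption) for the functions of a matrix program.
mat_new() { :; }
mat_mul() { :; }

traced mat_new
traced mat_new
traced mat_new
traced mat_mul

# call_count NAME: number of recorded calls to NAME.
call_count() {
  grep -cx "$1" "$trace_log"
}

call_count mat_new   # prints 3: no call is missed, however short it is
```

The price is the one listed in the cons: each call pays for the bookkeeping, so the instrumented run is slower than the raw one.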
SAMPLING PROFILING
Perf, OProfile, Intel VTune, ...
Pros:
- Only ~5 to 10% slower than raw execution
- Runs on any code
Cons:
- Some events are invisible (anything that happens between two samples)
SANDBOX - WRITE MY OWN SAMPLING PROFILER
To understand how simple a sampling profiler is, write your own thread dump using gdb.
gstack() {
  tmp=$(mktemp)
  echo "thread apply all bt" >"$tmp"
  gdb -batch -nx -q -x "$tmp" -p "$1"
  rm -f "$tmp"
}
Run it at a regular interval to see where your program is spending its time:
while sleep 1; do gstack @pid@ ; done
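Once a few dumps are collected, the innermost frame (#0) of each one tells you where the program was at that instant. The sketch below counts those frames, assuming a POSIX shell with awk. The top_frames helper is hypothetical, and the three dumps are invented sample data in the usual gdb backtrace layout, not real output.

```shell
# top_frames: read gdb backtraces on stdin and print how many times each
# function appears as the innermost frame (#0), most frequent first.
top_frames() {
  awk '$1 == "#0" {
         # "#0  0x... in func (...)" or "#0  func (...)"
         name = ($3 == "in") ? $4 : $2
         count[name]++
       }
       END { for (n in count) print count[n], n }' | sort -rn
}

# Invented sample data: three dumps of a matrix program.
samples='#0  mat_mul (a=0x1, b=0x2) at app.c:20
#1  0x00000000004006d1 in main () at app.c:41
#0  mat_mul (a=0x1, b=0x2) at app.c:22
#1  0x00000000004006d1 in main () at app.c:41
#0  0x0000000000400612 in dot (x=0x1, y=0x2) at app.c:9
#1  0x0000000000400655 in mat_mul (a=0x1, b=0x2) at app.c:21
#2  0x00000000004006d1 in main () at app.c:41'

printf '%s\n' "$samples" | top_frames
```

With real data you would pipe the output of the sampling loop into top_frames; the more samples, the closer the counts get to the actual time distribution.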
TOOLING - PERF & FLAMEGRAPH
- perf instrumentation is available on Linux 2.6.31+ (Ubuntu 11.10 & Red Hat 6)
- perf provides a common interface to hardware counters
- FlameGraph is actively developed by Brendan Gregg
EXAMPLE WITH A MATRIX CALCULATION
[Flame graph: app, __libc_start_main, main, dot, mat_mul]
No time is recorded for mat_new, even though it is called 3 times.
FLAMEGRAPH INSTALLATION
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
sudo ln -s "$PWD/flamegraph.pl" /usr/bin/flamegraph.pl
sudo ln -s "$PWD/stackcollapse-perf.pl" /usr/bin/stackcollapse-perf.pl
sudo ln -s "$PWD/stackcollapse-jstack.pl" /usr/bin/stackcollapse-jstack.pl
sudo ln -s "$PWD/stackcollapse-gdb.pl" /usr/bin/stackcollapse-gdb.pl
WHAT HAPPENS INSIDE MEMCACHED?
COMPILE MEMCACHED
git clone https://github.com/memcached/memcached.git
cd memcached
./configure && make
WHAT'S HIDDEN INSIDE THE MEMCACHED BINARY?
readelf -s ./memcached
...
434: 000000000040edf0    10 FUNC    GLOBAL DEFAULT   13 slabs_rebalancer_resume
435: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND setuid@@GLIBC_2
436: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND event_base_loop
437: 0000000000412fd0   315 FUNC    GLOBAL DEFAULT   13 pause_threads
438: 00000000004135e0    10 FUNC    GLOBAL DEFAULT   13 STATS_LOCK
439: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getaddrinfo@@GLIBC_2
440: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strerror@@GLIBC_2
441: 000000000040f550   201 FUNC    GLOBAL DEFAULT   13 do_item_unlink
442: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND event_init
443: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sleep@@GLIBC_2
444: 0000000000412b40   247 FUNC    GLOBAL DEFAULT   13 assoc_delete
...
WHAT HAPPENS WHEN I WRITE 100 RECORDS TO MEMCACHED
- Run a test with Valgrind (not production friendly)
- Capture CPU usage with gdb
- Capture CPU usage with perf_event
- Capture cache misses with perf_event
MEMCACHED - PROFILING WITH CALLGRIND
Understand what happens internally by following the execution trace.
valgrind --tool=callgrind --instr-atstart=no ./memcached
In another terminal:
callgrind_control -i on
php memcacheset.php
callgrind_control -i off
Open the result in KCachegrind:
kcachegrind callgrind.out.@pid@
MEMCACHED - PROFILING WITH GDB
./memcached &
while sleep 0.1; do gstack 8748; done > stack.txt
cat stack.txt | stackcollapse-gdb.pl | flamegraph.pl > gdb_graph.svg
In another terminal:
php memcacheset.php
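Conceptually, the collapse step in this pipeline turns every sampled stack into one line of the form frame;frame;frame count, root first, which is the folded format flamegraph.pl reads. The sketch below shows that folding with awk in a POSIX shell. It is a simplification, not the stackcollapse-gdb.pl script itself: the fold_stacks helper is hypothetical, and the input is assumed to be already reduced to one function name per line, innermost frame first, with a blank line between samples.

```shell
# fold_stacks: read stacks on stdin (one frame per line, innermost first,
# blank line between samples) and print "root;...;leaf count" lines.
fold_stacks() {
  awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per stack
       {
         line = $NF                    # last line of the record is the root
         for (i = NF - 1; i >= 1; i--) line = line ";" $i
         count[line]++
       }
       END { for (s in count) print s, count[s] }' | sort
}

# Invented sample data: three samples of a matrix program.
stacks='dot
mat_mul
main

dot
mat_mul
main

mat_mul
main'

printf '%s\n' "$stacks" | fold_stacks
```

Identical stacks collapse into a single line with a count, and that count becomes the width of the corresponding box in the flame graph.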
MEMCACHED - PROFILING WITH PERF
We capture events to build the call graph.
perf record -g ./memcached
In another terminal:
php memcacheset.php
To show the report, interactively or as plain text:
perf report
perf report --stdio
MEMCACHED - PROFILING CPU CYCLES WITH PERF
perf script | stackcollapse-perf.pl | flamegraph.pl > graph_stack_missing.svg
[Flame graph]
Some information from the kernel is missing.
MEMCACHED - PROFILING CPU CYCLES WITH PERF - WITH KERNEL STACK TRACE
./memcached &
sudo perf record -a -g -p @pid@
In another terminal:
php memcacheset.php
Generate the flame graph:
perf script | stackcollapse-perf.pl | flamegraph.pl > graph.svg
[Flame graph]
MEMCACHED - PROFILING CACHE MISSES WITH PERF
./memcached &
sudo perf record -e branch-misses -a -g -p @pid@
SYSTEM - WHAT IS YOUR SYSTEM DOING?
sudo perf record -a -g
USE FLAMEGRAPH WITH JAVA
You can export a flame graph from jstack output.
[Flame graph: Logstash contention]
GOING FURTHER
- Perf wiki
- Callgrind docs
- Brendan Gregg website
- How profilers lie: the cases of gprof and KCachegrind
- Intel VTune
TO SUMMARIZE
Prefer:
- perf when you are looking for a bottleneck or want to watch what is happening on a machine
- callgrind when you want to understand what happens in the code and performance is not a requirement