GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard...

44

Transcript of GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard...

Page 1: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

GNU gprof

The gnu Pro�ler

Jay Fenlason and Richard Stallman

Page 2: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

This manual describes the gnu pro�ler� gprof� and how you can use it todetermine which parts of a program are taking most of the execution time�We assume that you know how to write� compile� and execute programs�gnu gprof was written by Jay Fenlason�

This manual was edited January ���� by Je�rey Osier and updated Septem�ber ���� by Brent Baccala�

Copyright c� ��� ���� ����� ��� Free Software Foundation� Inc�

Permission is granted to make and distribute verbatim copies of this manualprovided the copyright notice and this permission notice are preserved on allcopies�

Permission is granted to copy and distribute modi�ed versions of this manualunder the conditions for verbatim copying� provided that the entire resultingderived work is distributed under the terms of a permission notice identicalto this one�

Permission is granted to copy and distribute translations of this manual intoanother language� under the same conditions as for modi�ed versions�

Page 3: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Introduction to Pro�ling �

� Introduction to Pro�ling

Pro�ling allows you to learn where your program spent its time andwhich functions called which other functions while it was executing� Thisinformation can show you which pieces of your program are slower thanyou expected� and might be candidates for rewriting to make your programexecute faster� It can also tell you which functions are being called moreor less often than you expected� This may help you spot bugs that hadotherwise been unnoticed�

Since the pro�ler uses information collected during the actual execution ofyour program� it can be used on programs that are too large or too complexto analyze by reading the source� However� how your program is run willa�ect the information that shows up in the pro�le data� If you don�t usesome feature of your program while it is being pro�led� no pro�le informationwill be generated for that feature�

Pro�ling has several steps�

� You must compile and link your program with pro�ling enabled� SeeChapter Compiling�� page ��

� You must execute your program to generate a pro�le data �le� SeeChapter � Executing�� page ��

� You must run gprof to analyze the pro�le data� See Chapter � Invok�ing�� page ��

The next three chapters explain these steps in greater detail�

Several forms of output are available from the analysis�

The �at pro�le shows how much time your program spent in each func�tion� and how many times that function was called� If you simply want toknow which functions burn most of the cycles� it is stated concisely here�See Section ��� Flat Pro�le�� page ���

The call graph shows� for each function� which functions called it� whichother functions it called� and how many times� There is also an estimateof how much time was spent in the subroutines of each function� This cansuggest places where you might try to eliminate function calls that use a lotof time� See Section �� Call Graph�� page ���

The annotated source listing is a copy of the program�s source code�labeled with the number of times each line of the program was executed�See Section ��� Annotated Source�� page ��

To better understand how pro�ling works� you may wish to read a de�scription of its implementation� See Section ��� Implementation�� page ���

Page 4: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

GNU gprof

Page 5: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter � Compiling a Program for Pro�ling �

� Compiling a Program for Pro�ling

The �rst step in generating pro�le information for your program is tocompile and link it with pro�ling enabled�

To compile a source �le for pro�ling� specify the ��pg� option when yourun the compiler� �This is in addition to the options you normally use��

To link the program for pro�ling� if you use a compiler such as cc to dothe linking� simply specify ��pg� in addition to your usual options� The sameoption� ��pg�� alters either compilation or linking to do what is necessary forpro�ling� Here are examples�

cc �g �c myprog�c utils�c �pgcc �o myprog myprog�o utils�o �pg

The ��pg� option also works with a command that both compiles andlinks�

cc �o myprog myprog�c utils�c �g �pg

If you run the linker ld directly instead of through a compiler such as cc�you may have to specify a pro�ling startup �le �gcrt��o� as the �rst input �leinstead of the usual startup �le �crt��o�� In addition� you would probablywant to specify the pro�ling C library� �libc�p�a�� by writing ��lc�p� insteadof the usual ��lc�� This is not absolutely necessary� but doing this gives younumber�of�calls information for standard library functions such as read andopen� For example�

ld �o myprog �lib�gcrt��o myprog�o utils�o �lc�p

If you compile only some of the modules of the program with ��pg�� youcan still pro�le the program� but you won�t get complete information aboutthe modules that were compiled without ��pg�� The only information youget for the functions in those modules is the total time spent in them� thereis no record of how many times they were called� or from where� This willnot a�ect the �at pro�le �except that the calls �eld for the functions willbe blank�� but will greatly reduce the usefulness of the call graph�

If you wish to perform line�by�line pro�ling� you will also need to specifythe ��g� option� instructing the compiler to insert debugging symbols into theprogram that match program addresses to source code lines� See Section ��� Line�by�line�� page ��

In addition to the ��pg� and ��g� options� you may also wish to specify the��a� option when compiling� This will instrument the program to performbasic�block counting� As the program runs� it will count how many times itexecuted each branch of each �if� statement� each iteration of each �do� loop�etc� This will enable gprof to construct an annotated source code listingshowing how many times each line of code was executed�

Page 6: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

Page 7: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Executing the Program �

� Executing the Program

Once the program is compiled for pro�ling� you must run it in orderto generate the information that gprof needs� Simply run the program asusual� using the normal arguments� �le names� etc� The program shouldrun normally� producing the same output as usual� It will� however� runsomewhat slower than normal because of the time spent collecting and thewriting the pro�le data�

The way you run the program�the arguments and input that you giveit�may have a dramatic e�ect on what the pro�le information shows� Thepro�le data will describe the parts of the program that were activated for theparticular input you use� For example� if the �rst command you give to yourprogram is to quit� the pro�le data will show the time used in initializationand in cleanup� but not much else�

Your program will write the pro�le data into a �le called �gmon�out� justbefore exiting� If there is already a �le called �gmon�out�� its contents areoverwritten� There is currently no way to tell the program to write thepro�le data under a di�erent name� but you can rename the �le afterward ifyou are concerned that it may be overwritten�

In order to write the �gmon�out� �le properly� your program must exitnormally� by returning from main or by calling exit� Calling the low�levelfunction �exit does not write the pro�le data� and neither does abnormaltermination due to an unhandled signal�

The �gmon�out� �le is written in the program�s current working directoryat the time it exits� This means that if your program calls chdir� the�gmon�out� �le will be left in the last directory your program chdir�d to� Ifyou don�t have permission to write in this directory� the �le is not written�and you will get an error message�

Older versions of the gnu pro�ling library may also write a �le called�bb�out�� This �le� if present� contains an human�readable listing of thebasic�block execution counts� Unfortunately� the appearance of a human�readable �bb�out� means the basic�block counts didn�t get written into�gmon�out�� The Perl script bbconv�pl� included with the gprof sourcedistribution� will convert a �bb�out� �le into a format readable by gprof�

Page 8: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

Page 9: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� �code gprof Command Summary �

� gprof Command Summary

After you have a pro�le data �le �gmon�out�� you can run gprof to in�terpret the information in it� The gprof program prints a �at pro�le and acall graph on standard output� Typically you would redirect the output ofgprof into a �le with ����

You run gprof like this�

gprof options �executable��le �pro�le�data��les� � ��� �� out�le�

Here square�brackets indicate optional arguments�

If you omit the executable �le name� the �le �a�out� is used� If you giveno pro�le data �le name� the �le �gmon�out� is used� If any �le is not in theproper format� or if the pro�le data �le does not appear to belong to theexecutable �le� an error message is printed�

You can give more than one pro�le data �le by entering all their namesafter the executable �le name� then the statistics in all the data �les aresummed together�

The order of these options does not matter�

��� Output Options

These options specify which of several output formats gprof should pro�duce�

Many of these options take an optional symspec to specify functions tobe included or excluded� These options can be speci�ed multiple times� withdi�erent symspecs� to include or exclude sets of symbols� See Section ��� Symspecs�� page ���

Specifying any of these options overrides the default ���p �q��� whichprints a �at pro�le and call graph analysis for all functions�

�A�symspec���annotated�source��symspec�

The ��A� option causes gprof to print annotated source code�If symspec is speci�ed� print output only for matching symbols�See Section ��� Annotated Source�� page ��

�b��brief If the ��b� option is given� gprof doesn�t print the verbose blurbs

that try to explain the meaning of all of the �elds in the tables�This is useful if you intend to print out the output� or are tiredof seeing the blurbs�

Page 10: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

GNU gprof

�C�symspec���exec�counts��symspec�

The ��C� option causes gprof to print a tally of functions andthe number of times each was called� If symspec is speci�ed�print tally only for matching symbols�

If the pro�le data �le contains basic�block count records� specif�ing the ��l� option� along with ��C�� will cause basic�block exe�cution counts to be tallied and displayed�

�i��file�info

The ��i� option causes gprof to display summary informationabout the pro�le data �le�s� and then exit� The number ofhistogram� call graph� and basic�block count records is displayed�

�I dirs��directory�path�dirs

The ��I� option speci�es a list of search directories in which to�nd source �les� Environment variable GPROF PATH can alsobe used to convery this information� Used mostly for annotatedsource output�

�J�symspec���no�annotated�source��symspec�

The ��J� option causes gprof not to print annotated source code�If symspec is speci�ed� gprof prints annotated source� but ex�cludes matching symbols�

�L��print�path

Normally� source �lenames are printed with the path componentsuppressed� The ��L� option causes gprof to print the full path�name of source �lenames� which is determined from symbolicdebugging information in the image �le and is relative to thedirectory in which the compiler was invoked�

�p�symspec���flat�profile��symspec�

The ��p� option causes gprof to print a �at pro�le� If symspecis speci�ed� print �at pro�le only for matching symbols� SeeSection ��� Flat Pro�le�� page ���

�P�symspec���no�flat�profile��symspec�

The ��P� option causes gprof to suppress printing a �at pro�le�If symspec is speci�ed� gprof prints a �at pro�le� but excludesmatching symbols�

Page 11: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� �code gprof Command Summary �

�q�symspec���graph��symspec�

The ��q� option causes gprof to print the call graph analysis� Ifsymspec is speci�ed� print call graph only for matching symbolsand their children� See Section �� Call Graph�� page ���

�Q�symspec���no�graph��symspec�

The ��Q� option causes gprof to suppress printing the call graph�If symspec is speci�ed� gprof prints a call graph� but excludesmatching symbols�

�y��separate�files

This option a�ects annotated source output only� Normally�gprof prints annotated source �les to standard�output� Ifthis option is speci�ed� annotated source for a �le named�path�filename� is generated in the �le �filename�ann��

�Z�symspec���no�exec�counts��symspec�

The ��Z� option causes gprof not to print a tally of functionsand the number of times each was called� If symspec is speci�ed�print tally� but exclude matching symbols�

��function�orderingThe ���function�ordering� option causes gprof to print a sug�gested function ordering for the program based on pro�ling data�This option suggests an ordering which may improve paging� tlband cache behavior for the program on systems which supportarbitrary ordering of functions in an executable�

The exact details of how to force the linker to place functions ina particular order is system dependent and out of the scope ofthis manual�

��file�ordering map��leThe ���file�ordering� option causes gprof to print a sug�gested �o link line ordering for the program based on pro�lingdata� This option suggests an ordering which may improve pag�ing� tlb and cache behavior for the program on systems whichdo not support arbitrary ordering of functions in an executable�

Use of the ��a� argument is highly recommended with this op�tion�

The map �le argument is a pathname to a �le which providesfunction name to object �le mappings� The format of the �le issimilar to the output of the program nm�

Page 12: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

c�parse�o��������� T yyparse

c�parse�o��������� C yyerrflag

c�lang�o��������� T maybe�objc�method�name

c�lang�o��������� T print�lang�statistics

c�lang�o��������� T recognize�objc�keyword

c�decl�o��������� T print�lang�identifier

c�decl�o��������� T print�lang�type

� � �

GNU nm ���extern�only� ���defined�only� ��v� ���print�file�name�can be used to create map �le�

�T��traditional

The ��T� option causes gprof to print its output in �traditional�BSD style�

�w width��width�width

Sets width of output lines to width� Currently only used whenprinting the function index at the bottom of the call graph�

�x��all�lines

This option a�ects annotated source output only� By default�only the lines at the beginning of a basic�block are annotated� Ifthis option is speci�ed� every line in a basic�block is annotatedby repeating the annotation for the �rst line� This behavior issimilar to tcov�s ��a��

��demangle��no�demangle

These options control whether C symbol names should bedemangled when printing output� The default is to demanglesymbols� The ��no�demangle option may be used to turn o�demangling�

��� Analysis Options

�a��no�static

The ��a� option causes gprof to suppress the printing of stat�ically declared �private� functions� �These are functions whosenames are not listed as global� and which are not visible outsidethe �le�function�block where they were de�ned�� Time spent inthese functions� calls to�from them� etc� will all be attributed to

Page 13: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� �code gprof Command Summary ��

the function that was loaded directly before it in the executable�le� This option a�ects both the �at pro�le and the call graph�

�c��static�call�graph

The ��c� option causes the call graph of the program to be aug�mented by a heuristic which examines the text space of the ob�ject �le and identi�es function calls in the binary machine code�Since normal call graph records are only generated when func�tions are entered� this option identi�es children that could havebeen called� but never were� Calls to functions that were notcompiled with pro�ling enabled are also identi�ed� but only ifsymbol table entries are present for them� Calls to dynamic li�brary routines are typically not found by this option� Parentsor children identi�ed via this heuristic are indicated in the callgraph with call counts of ����

�D��ignore�non�functions

The ��D� option causes gprof to ignore symbols which are notknown to be functions� This option will give more accuratepro�le data on systems where it is supported �Solaris and HPUXfor example��

�k from�toThe ��k� option allows you to delete from the call graph anyarcs from symbols matching symspec from to those matchingsymspec to�

�l��line The ��l� option enables line�by�line pro�ling� which causes his�

togram hits to be charged to individual source code lines� in�stead of functions� If the program was compiled with basic�blockcounting enabled� this option will also identify how many timeseach line of code was executed� While line�by�line pro�ling canhelp isolate where in a large function a program is spending itstime� it also signi�cantly increases the running time of gprof�and magni�es statistical inaccuracies� See Section ��� SamplingError�� page ��

�m num��min�count�num

This option a�ects execution count output only� Symbols thatare executed less than num times are suppressed�

Page 14: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

�n�symspec���time��symspec�

The ��n� option causes gprof� in its call graph analysis� to onlypropagate times for symbols matching symspec�

�N�symspec���no�time��symspec�

The ��n� option causes gprof� in its call graph analysis� not topropagate times for symbols matching symspec�

�z��display�unused�functions

If you give the ��z� option� gprof will mention all functions inthe �at pro�le� even those that were never called� and that hadno time spent in them� This is useful in conjunction with the��c� option for discovering which routines were never called�

��� Miscellaneous Options

�d�num���debug��num�

The ��d num� option speci�es debugging options� If num is notspeci�ed� enable all debugging� See Section ����� Debugging��page ���

�Oname��file�format�name

Selects the format of the pro�le data �les� Recognized formatsare �auto� �the default�� �bsd�� �magic�� and �prof� �not yet sup�ported��

�s��sum The ��s� option causes gprof to summarize the information in

the pro�le data �les it read in� and write out a pro�le data �lecalled �gmon�sum�� which contains all the information from thepro�le data �les that gprof read in� The �le �gmon�sum� may beone of the speci�ed input �les� the e�ect of this is to merge thedata in the other input �les into �gmon�sum��

Eventually you can run gprof again without ��s� to analyze thecumulative data in the �le �gmon�sum��

�v��version

The ��v� �ag causes gprof to print the current version number�and then exit�

Page 15: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� �code gprof Command Summary ��

��� Depricated Options

These options have been replaced with newer versions that usesymspecs�

�e function�nameThe ��e function� option tells gprof to not print informationabout the function function name �and its children� � �� in thecall graph� The function will still be listed as a child of anyfunctions that call it� but its index number will be shown as��not printed��� More than one ��e� option may be given� onlyone function name may be indicated with each ��e� option�

�E function�nameThe �E function option works like the �e option� but time spentin the function �and children who were not called from anywhereelse�� will not be used to compute the percentages�of�time for thecall graph� More than one ��E� option may be given� only onefunction name may be indicated with each ��E� option�

�f function�nameThe ��f function� option causes gprof to limit the call graphto the function function name and its children �and theirchildren� � ��� More than one ��f� option may be given� onlyone function name may be indicated with each ��f� option�

�F function�nameThe ��F function� option works like the �f option� but only timespent in the function and its children �and their children� � �� willbe used to determine total�time and percentages�of�time for thecall graph� More than one ��F� option may be given� only onefunction name may be indicated with each ��F� option� The ��F�option overrides the ��E� option�

Note that only one function can be speci�ed with each �e� �E� �f or�F option� To specify more than one function� use multiple options� Forexample� this command�

gprof �e boring �f foo �f bar myprogram � gprof�output

lists in the call graph all functions that were reached from either foo or barand were not reachable from boring�

��� Symspecs

Many of the output options allow functions to be included or excludedusing symspecs �symbol speci�cations�� which observe the following syntax�

filename�containing�a�dot

Page 16: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

funcname�not�containing�a�dot linenumber � � any�filename � � � � any�funcname linenumber � �

Here are some sample symspecs�

main�c Selects everything in �le �main�c��the dot in the string tellsgprof to interpret the string as a �lename� rather than as afunction name� To select a �le whose name does not contain adot� a trailing colon should be speci�ed� For example� �odd�� isinterpreted as the �le named �odd��

main Selects all functions named �main�� Notice that there may bemultiple instances of the same function name because some ofthe de�nitions may be local �i�e�� static�� Unless a function nameis unique in a program� you must use the colon notation ex�plained below to specify a function from a speci�c source �le�Sometimes� function names contain dots� In such cases� it is nec�essar to add a leading colon to the name� For example� ���mul�selects function ��mul��

main�c mainSelects function �main� in �le �main�c��

main�c ���Selects line ��� in �le �main�c��

Page 17: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Interpreting �code gprof�s Output ��

� Interpreting gprof�s Output

gprof can produce several di�erent output styles� the most important ofwhich are described below� The simplest output styles ��le information� exe�cution count� and function and �le ordering� are not described here� but aredocumented with the respective options that trigger them� See Section ��� Output Options�� page ��

��� The Flat Pro�le

The �at pro�le shows the total amount of time your program spent ex�ecuting each function� Unless the ��z� option is given� functions with noapparent time spent in them� and no apparent calls to them� are not men�tioned� Note that if a function was not compiled for pro�ling� and didn�trun long enough to show up on the program counter histogram� it will beindistinguishable from a function that was never called�

This is part of a �at pro�le for a small program�

Flat profile�

Each sample counts as ��� seconds�

cumulative self self total

time seconds seconds calls ms�call ms�call name

����� ��� ��� � �� ���� ���� open

���� ���� ��� �� ���� �� offtime

���� ���� ��� � � � � � memccpy

���� ���� ��� � ��� ��� write

���� ���� ��� mcount

���� ���� ���� �� ���� ���� tzset

���� ���� ���� � ���� ���� tolower

���� ���� ���� �� ���� ���� strlen

���� ���� ���� �� ���� ���� strchr

���� ���� ���� ���� ����� main

���� ���� ���� ���� ���� memcpy

���� ���� ���� ���� �� print

���� ���� ���� ���� ���� profil

���� ���� ���� ���� ����� report

� � �

The functions are sorted by �rst by decreasing run�time spent in them� thenby decreasing number of calls� then alphabetically by name� The functions�mcount� and �profil� are part of the pro�ling aparatus and appear in every�at pro�le� their time gives a measure of the amount of overhead due topro�ling�

Just before the column headers� a statement appears indicating how muchtime each sample counted as� This sampling period estimates the margin of

Page 18: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

error in each of the time �gures� A time �gure that is not much larger thanthis is not reliable� In this example� each sample counted as ���� seconds�suggesting a ��� Hz sampling rate� The program�s total execution time was���� seconds� as indicated by the �cumulative seconds� �eld� Since eachsample counted for ���� seconds� this means only six samples were takenduring the run� Two of the samples occured while the program was inthe �open� function� as indicated by the �self seconds� �eld� Each of theother four samples occured one each in �offtime�� �memccpy�� �write�� and�mcount�� Since only six samples were taken� none of these values can beregarded as particularly reliable� In another run� the �self seconds� �eldfor �mcount� might well be ������ or ������� See Section ��� Sampling Error��page �� for a complete discussion�

The remaining functions in the listing �those whose �self seconds� �eldis ������� didn�t appear in the histogram samples at all� However� the callgraph indicated that they were called� so therefore they are listed� sortedin decreasing order by the �calls� �eld� Clearly some time was spent exe�cuting these functions� but the paucity of histogram samples prevents anydetermination of how much time each took�

Here is what the �elds in each line mean�

� time This is the percentage of the total execution time your programspent in this function� These should all add up to �����

cumulative secondsThis is the cumulative total number of seconds the computerspent executing this functions� plus the time spent in all thefunctions above this one in this table�

self secondsThis is the number of seconds accounted for by this functionalone� The �at pro�le listing is sorted �rst by this number�

calls This is the total number of times the function was called� If thefunction was never called� or the number of times it was calledcannot be determined �probably because the function was notcompiled with pro�ling enabled�� the calls �eld is blank�

self ms�callThis represents the average number of milliseconds spent in thisfunction per call� if this function is pro�led� Otherwise� this �eldis blank for this function�

total ms�callThis represents the average number of milliseconds spent in thisfunction and its descendants per call� if this function is pro�led�Otherwise� this �eld is blank for this function� This is the only�eld in the �at pro�le that uses call graph analysis�

Page 19: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Interpreting �code gprof�s Output ��

name This is the name of the function� The �at pro�le is sorted bythis �eld alphabetically after the self seconds and calls �elds aresorted�

��� The Call Graph

The call graph shows how much time was spent in each function andits children� From this information� you can �nd functions that� while theythemselves may not have used much time� called other functions that diduse unusual amounts of time�

Here is a sample call from a small program� This call came from the samegprof run as the �at pro�le example in the previous chapter�

granularity� each sample hit covers byte�s� for ���� of ���� seconds

index time self children called name

�spontaneous�

�� ���� ���� ���� start ��

���� ���� � main � �

���� ���� � on�exit � ��

���� ���� � exit ����

�����������������������������������������������

���� ���� � start ��

� � ���� ���� ���� main � �

���� ���� � report ���

�����������������������������������������������

���� ���� � main � �

��� ���� ���� ���� report ���

���� ���� ��� timelocal ���

���� ��� � print ���

���� ��� ��� fgets � �

���� ���� ��� strncmp �cycle � ����

���� ���� ��� lookup � ��

���� ���� � fopen � �

���� ���� ��� chewtime � ��

���� ���� ��� skipspace ����

�����������������������������������������������

��� ���� ��� ��� ���� �cycle as a whole� ���

��� ��� ��� �� offtime �cycle � ���

���� ���� ��� tzset �cycle � � ��

�����������������������������������������������

The lines full of dashes divide this table into entries� one for each function�Each entry has one or more lines�

In each entry� the primary line is the one that starts with an index numberin square brackets� The end of this line says which function the entry is for�

Page 20: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

The preceding lines in the entry describe the callers of this function and thefollowing lines describe its subroutines �also called children when we speakof the call graph��

The entries are sorted by time spent in the function and its subroutines�

The internal pro�ling function mcount �see Section ��� Flat Pro�le��page ��� is never mentioned in the call graph�

����� The Primary Line

The primary line in a call graph entry is the line that describes thefunction which the entry is about and gives the overall statistics for thisfunction�

For reference� we repeat the primary line from the entry for functionreport in our main example� together with the heading line that shows thenames of the �elds�

index time self children called name

� � �

��� ���� ���� ���� report ���

Here is what the �elds in the primary line mean�

index Entries are numbered with consecutive integers� Each functiontherefore has an index number� which appears at the beginningof its primary line�

Each cross�reference to a function� as a caller or subroutine ofanother� gives its index number as well as its name� The indexnumber guides you if you wish to look for the entry for thatfunction�

� time This is the percentage of the total time that was spent in thisfunction� including time spent in subroutines called from thisfunction�

The time spent in this function is counted again for the callers ofthis function� Therefore� adding up these percentages is mean�ingless�

self This is the total amount of time spent in this function� Thisshould be identical to the number printed in the seconds �eldfor this function in the �at pro�le�

children This is the total amount of time spent in the subroutine callsmade by this function� This should be equal to the sum of all theself and children entries of the children listed directly belowthis function�

called This is the number of times the function was called�

Page 21: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Interpreting �code gprof�s Output ��

If the function called itself recursively� there are two numbers�separated by a ��� The �rst number counts non�recursive calls�and the second counts recursive calls�

In the example above� the function report was called once frommain�

name This is the name of the current function� The index number isrepeated after it�

If the function is part of a cycle of recursion� the cycle number isprinted between the function�s name and the index number �seeSection ���� Cycles�� page ��� For example� if function gnurris part of cycle number one� and has index number twelve� itsprimary line would be end like this�

gnurr �cycle �� ����

����� Lines for a Function�s Callers

A function�s entry has a line for each function it was called by� Theselines� �elds correspond to the �elds of the primary line� but their meaningsare di�erent because of the di�erence in context�

For reference� we repeat two lines from the entry for the function report�the primary line and one caller�line preceding it� together with the headingline that shows the names of the �elds�

index time self children called name

� � �

���� ���� � main � �

��� ���� ���� ���� report ���

Here are the meanings of the �elds in the caller�line for report calledfrom main�

self An estimate of the amount of time spent in report itself whenit was called from main�

children An estimate of the amount of time spent in subroutines ofreport when report was called from main�

The sum of the self and children �elds is an estimate of theamount of time spent within calls to report from main�

called Two numbers� the number of times report was called frommain� followed by the total number of nonrecursive calls toreport from all its callers�

name and index numberThe name of the caller of report to which this line applies�followed by the caller�s index number�

Page 22: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

Not all functions have entries in the call graph� some options togprof request the omission of certain functions� When a callerhas no entry of its own� it still has caller�lines in the entries ofthe functions it calls�

If the caller is part of a recursion cycle� the cycle number isprinted between the name and the index number�

If the identity of the callers of a function cannot be determined� a dummycaller�line is printed which has ��spontaneous�� as the �caller�s name� andall other �elds blank� This can happen for signal handlers�

����� Lines for a Function�s Subroutines

A function�s entry has a line for each of its subroutines�in other words�a line for each other function that it called� These lines� �elds correspondto the �elds of the primary line� but their meanings are di�erent because ofthe di�erence in context�

For reference� we repeat two lines from the entry for the function main�the primary line and a line for a subroutine� together with the heading linethat shows the names of the �elds�

index time self children called name

� � �

� � ���� ���� ���� main � �

���� ���� � report ���

Here are the meanings of the �elds in the subroutine�line for main callingreport�

self An estimate of the amount of time spent directly within reportwhen report was called from main�

children An estimate of the amount of time spent in subroutines ofreport when report was called from main�

The sum of the self and children �elds is an estimate of thetotal time spent in calls to report from main�

called Two numbers� the number of calls to report from main followedby the total number of nonrecursive calls to report� This ratiois used to determine how much of report�s self and childrentime gets credited to main� See Section �� Assumptions��page �

name The name of the subroutine of main to which this line applies�followed by the subroutine�s index number�

If the caller is part of a recursion cycle� the cycle number isprinted between the name and the index number�

Page 23: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Interpreting �code gprof�s Output �

����� How Mutually Recursive Functions AreDescribed

The graph may be complicated by the presence of cycles of recursionin the call graph� A cycle exists if a function calls another function that�directly or indirectly� calls �or appears to call� the original function� Forexample� if a calls b� and b calls a� then a and b form a cycle�

Whenever there are call paths both ways between a pair of functions�they belong to the same cycle� If a and b call each other and b and c calleach other� all three make one cycle� Note that even if b only calls a if itwas not called from a� gprof cannot determine this� so a and b are stillconsidered a cycle�

The cycles are numbered with consecutive integers� When a functionbelongs to a cycle� each time the function name appears in the call graph itis followed by ��cycle number���

The reason cycles matter is that they make the time values in the callgraph paradoxical� The �time spent in children� of a should include the timespent in its subroutine b and in b�s subroutines�but one of b�s subroutinesis a� How much of a�s time should be included in the children of a� when ais indirectly recursive�

The way gprof resolves this paradox is by creating a single entry for thecycle as a whole� The primary line of this entry describes the total timespent directly in the functions of the cycle� The �subroutines� of the cycleare the individual functions of the cycle� and all other functions that werecalled directly by them� The �callers� of the cycle are the functions� outsidethe cycle� that called functions in the cycle�

Here is an example portion of a call graph which shows a cycle containingfunctions a and b� The cycle was entered by a call to a from main� both aand b called c�

index time self children called name

����������������������������������������

��� � � main � �

��� ��� ��� � �� �cycle as a whole� ���

�� � � b �cycle � ���

���� � a �cycle � ���

����������������������������������������

� a �cycle � ���

��� � ��� �� � � b �cycle � ���

a �cycle � ���

� � ��� c ���

����������������������������������������

��� � � main � �

b �cycle � ���

��� ����� ���� � a �cycle � ���

Page 24: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

GNU gprof

� b �cycle � ���

� � ��� c ���

����������������������������������������

�The entire call graph for this program contains in addition an entry formain� which calls a� and an entry for c� with callers a and b��

index time self children called name

�spontaneous�

�� ����� � ��� � start ��

��� ��� � main � �

����������������������������������������

��� ��� � start ��

� � ����� ��� ��� main � �

��� � � a �cycle � ���

����������������������������������������

��� � � main � �

��� ��� ��� � �� �cycle as a whole� ���

�� � � b �cycle � ���

���� � a �cycle � ���

� � ��� c ���

����������������������������������������

� a �cycle � ���

��� � ��� �� � � b �cycle � ���

a �cycle � ���

� � ��� c ���

����������������������������������������

��� � � main � �

b �cycle � ���

��� ����� ���� � a �cycle � ���

� b �cycle � ���

� � ��� c ���

����������������������������������������

� � ��� b �cycle � ���

� � ��� a �cycle � ���

��� ���� � � � c ���

����������������������������������������

The self �eld of the cycle�s primary line is the total time spent in all thefunctions of the cycle� It equals the sum of the self �elds for the individualfunctions in the cycle� found in the entry in the subroutine lines for thesefunctions�

The children �elds of the cycle�s primary line and subroutine lines countonly subroutines outside the cycle� Even though a calls b� the time spentin those calls to b is not counted in a�s children time� Thus� we do notencounter the problem of what to do when the time in those calls to bincludes indirect recursive calls back to a�

Page 25: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Interpreting �code gprof�s Output �

The children �eld of a caller�line in the cycle�s entry estimates theamount of time spent in the whole cycle� and its other subroutines� on thetimes when that caller called a function in the cycle�

The calls �eld in the primary line for the cycle has two numbers� �rst�the number of times functions in the cycle were called by functions outsidethe cycle� second� the number of times they were called by functions in thecycle �including times when a function in the cycle calls itself�� This is ageneralization of the usual split into nonrecursive and recursive calls�

The calls �eld of a subroutine�line for a cycle member in the cycle�sentry says how many time that function was called from functions in thecycle� The total of all these is the second number in the primary line�scalls �eld�

In the individual entry for a function in a cycle� the other functions inthe same cycle can appear as subroutines and as callers� These lines showhow many times each function in the cycle called or was called from eachother function in the cycle� The self and children �elds in these lines areblank because of the di�culty of de�ning meanings for them when recursionis going on�

��� Line�by�line Pro�ling

gprof�s ��l� option causes the program to perform line�by�line pro�ling�In this mode� histogram samples are assigned not to functions� but to indi�vidual lines of source code� The program usually must be compiled with a��g� option� in addition to ��pg�� in order to generate debugging symbols fortracking source code lines�

The �at pro�le is the most useful output table in line�by�line mode� Thecall graph isn�t as useful as normal� since the current version of gprof doesnot propagate call graph arcs from source code lines to the enclosing func�tion� The call graph does� however� show each line of code that called eachfunction� along with a count�

Here is a section of gprof�s output� without line�by�line pro�ling� Notethat ct�init accounted for four histogram hits� and ���� calls to init�block�

Flat profile�

Each sample counts as ��� seconds�

cumulative self self total

time seconds seconds calls us�call us�call name

����� ��� ���� ���� ��� ��� ct�init

Call graph �explanation follows�

Page 26: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

granularity� each sample hit covers � byte�s� for ���� of ��� seconds

index time self children called name

���� ���� ����� name�too�long

���� ���� ������� deflate

���� ���� ������ deflate�fast

���� ���� �� ������ ct�init

��� ��� ���� ���� ���� init�block

Now let�s look at some of gprof�s output from the same program run� thistime with line�by�line pro�ling enabled� Note that ct�init�s four histogramhits are broken down into four lines of source code � one hit occured on eachof lines ���� ���� � and ��� In the call graph� note how ct�init�s ����calls to init�block are broken down into one call from line ���� ���� callsfrom line ��� ���� calls from line ��� and ��� calls from ���

Flat profile�

Each sample counts as ��� seconds�

cumulative self

time seconds seconds calls name

���� ��� ��� ct�init �trees�c�����

���� �� ��� ct�init �trees�c����

���� �� ��� ct�init �trees�c��� �

���� ��� ��� ct�init �trees�c�����

Call graph �explanation follows�

granularity� each sample hit covers � byte�s� for ���� of ��� seconds

time self children called name

���� ���� ����� name�too�long �gzip�c�����

���� ���� ����� deflate �deflate�c�����

���� ���� ����� ct�init �trees�c�����

���� ���� ����� deflate �deflate�c�� ��

���� ���� ������ deflate �deflate�c�����

���� ���� ������ deflate �deflate�c�����

���� ���� ����� deflate �deflate�c�����

���� ���� ������ deflate �deflate�c�����

Page 27: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Interpreting �code gprof�s Output �

���� ���� ������ deflate�fast �deflate�c�����

���� ���� �������� ct�init �trees�c�����

���� ���� ��������� ct�init �trees�c�����

���� ���� �� ������ ct�init �trees�c�����

��� ��� ���� ���� ���� init�block �trees�c�����

��� The Annotated Source Listing

gprof�s ��A� option triggers an annotated source listing� which lists theprogram�s source code� each function labeled with the number of times itwas called� You may also need to specify the ��I� option� if gprof can�t �ndthe source code �les�

Compiling with �gcc � � � �g �pg �a� augments your program with basic�block counting code� in addition to function counting code� This enablesgprof to determine how many times each line of code was exeucted� Forexample� consider the following function� taken from gzip� with line numbersadded�

ulg updcrc�s� n�

uch �s�

� unsigned n�

� �

� register ulg c�

� static ulg crc � �ulg��xffffffffL�

� if �s �� NULL� �

� c � �xffffffffL�

� else �

c � crc�

� if �n� do �

� c � crc�� �tab������

� � while ���n��

� �

� crc � c�

� return c � �xffffffffL�

� �

updcrc has at least �ve basic�blocks� One is the function itself� The ifstatement on line � generates two more basic�blocks� one for each branch ofthe if� A fourth basic�block results from the if on line ��� and the contentsof the do loop form the �fth basic�block� The compiler may also generateadditional basic�blocks to handle various special cases�

Page 28: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

A program augmented for basic�block counting can be analyzed withgprof �l �A� I also suggest use of the ��x� option� which ensures that eachline of code is labeled at least once� Here is updcrc�s annotated source listingfor a sample gzip run�

ulg updcrc�s� n�

uch �s�

unsigned n�

���

register ulg c�

static ulg crc � �ulg��xffffffffL�

�� if �s �� NULL� �

�� c � �xffffffffL�

�� � else �

�� c � crc�

�� if �n� do �

�� �� c � crc�� �tab������

�� �� �� �� � while ���n��

�� crc � c�

�� return c � �xffffffffL�

���

In this example� the function was called twice� passing once through eachbranch of the if statement� The body of the do loop was executed a total of��� times� Note how the while statement is annotated� It began execution��� times� once for each iteration through the loop� One of those times�the last time� it exited� while it branched back to the beginning of the loop���� times�

Page 29: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Inaccuracy of �code gprof Output �

� Inaccuracy of gprof Output

��� Statistical Sampling Error

The run�time �gures that gprof gives you are based on a sampling pro�cess� so they are subject to statistical inaccuracy� If a function runs only asmall amount of time� so that on the average the sampling process ought tocatch that function in the act only once� there is a pretty good chance it willactually �nd that function zero times� or twice�

By contrast� the number�of�calls and basic�block �gures are derived bycounting� not sampling� They are completely accurate and will not varyfrom run to run if your program is deterministic�

The sampling period that is printed at the beginning of the �at pro�lesays how often samples are taken� The rule of thumb is that a run�time�gure is accurate if it is considerably bigger than the sampling period�

The actual amount of error can be predicted� For n samples� the expectederror is the square�root of n� For example� if the sampling period is ����seconds and foo�s run�time is � second� n is ��� samples �� second�����seconds�� sqrt�n� is �� samples� so the expected error in foo�s run�time is��� seconds ��� ���� seconds�� or ten percent of the observed value� Again�if the sampling period is ���� seconds and bar�s run�time is ��� seconds�n is ����� samples� sqrt�n� is ��� samples� so the expected error in bar�srun�time is � second� or one percent of the observed value� It is likely to varythis much on the average from one pro�ling run to the next� �Sometimes itwill vary more��

This does not mean that a small run�time �gure is devoid of information�If the program�s total run�time is large� a small run�time for one functiondoes tell you that that function used an insigni�cant fraction of the wholeprogram�s time� Usually this means it is not worth optimizing�

One way to get more accuracy is to give your program more �but similar�input data so it will take longer� Another way is to combine the data fromseveral runs� using the ��s� option of gprof� Here is how�

�� Run your program once�

� Issue the command �mv gmon�out gmon�sum��

�� Run your program again� the same as before�

�� Merge the new data in �gmon�out� into �gmon�sum� with this command�

gprof �s executable��le gmon�out gmon�sum

�� Repeat the last two steps as often as you wish�

�� Analyze the cumulative data using this command�

gprof executable��le gmon�sum � output��le

Page 30: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

GNU gprof

��� Estimating children Times

Some of the �gures in the call graph are estimates�for example� thechildren time values and all the the time �gures in caller and subroutinelines�

There is no direct information about these measurements in the pro�ledata itself� Instead� gprof estimates them by making an assumption aboutyour program that might or might not be true�

The assumption made is that the average time spent in each call to anyfunction foo is not correlated with who called foo� If foo used � seconds inall� and �� of the calls to foo came from a� then foo contributes secondsto a�s children time� by assumption�

This assumption is usually true enough� but for some programs it is farfrom true� Suppose that foo returns very quickly when its argument is zero�suppose that a always passes zero as an argument� while other callers of foopass other arguments� In this program� all the time spent in foo is in thecalls from callers other than a� But gprof has no way of knowing this� itwill blindly and incorrectly charge seconds of time in foo to the childrenof a�

We hope some day to put more complete data into �gmon�out�� so thatthis assumption is no longer needed� if we can �gure out how� For the nonce�the estimated �gures are usually more useful than misleading�

Page 31: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Answers to Common Questions �

� Answers to Common Questions

How do I �nd which lines in my program were executed the most times�Compile your program with basic�block counting enabled� runit� then use the following pipeline�

gprof �l �C obj�le sort �k � �n �r

This listing will show you the lines in your code executed mostoften� but not necessarily those that consumed the most time�

How do I �nd which lines in my program called a particular function�Use gprof �l and lookup the function in the call graph� Thecallers will be broken down by function and line number�

How do I analyze a program that runs for less than a second�Try using a shell script like this one�

for i in �seq � ����� dofastprogmv gmon�out gmon�out��i

done

gprof �s fastprog gmon�out��

gprof fastprog gmon�sum

If your program is completely deterministic� all the call countswill be simple multiples of ��� �i�e� a function called once ineach run will appear with a call count of �����

Page 32: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

Page 33: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter � Incompatibilities with Unix �code gprof ��

Incompatibilities with Unix gprof

gnu gprof and Berkeley Unix gprof use the same data �le �gmon�out��and provide essentially the same information� But there are a few di�erences�

� gnu gprof uses a new� generalized �le format with support for basic�block execution counts and non�realtime histograms� A magic cookieand version number allows gprof to easily identify new style �les� OldBSD�style �les can still be read� See Section �� File Format�� page ���

� For a recursive function� Unix gprof lists the function as a parent andas a child� with a calls �eld that lists the number of recursive calls�gnu gprof omits these lines and puts the number of recursive calls inthe primary line�

� When a function is suppressed from the call graph with ��e�� gnu gprofstill lists it as a subroutine of functions that call it�

� gnu gprof accepts the ��k� with its argument in the form �from�to��instead of �from to��

� In the annotated source listing� if there are multiple basic blocks on thesame line� gnu gprof prints all of their counts� seperated by commas�

� The blurbs� �eld widths� and output formats are di�erent� gnu gprofprints blurbs after the tables� so that you can see the tables withoutskipping the blurbs�

Page 34: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

Page 35: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Details of Pro�ling ��

Details of Pro�ling

�� Implementation of Pro�ling

Pro�ling works by changing how every function in your program is com�piled so that when it is called� it will stash away some information aboutwhere it was called from� From this� the pro�ler can �gure out what functioncalled it� and can count how many times it was called� This change is madeby the compiler when your program is compiled with the ��pg� option� whichcauses every function to call mcount �or �mcount� or ��mcount� dependingon the OS and compiler� as one of its �rst operations�

The mcount routine� included in the pro�ling library� is responsible forrecording in an in�memory call graph table both its parent routine �the child�and its parent�s parent� This is typically done by examining the stack frameto �nd both the address of the child� and the return address in the originalparent� Since this is a very machine�dependant operation� mcount itself istypically a short assembly�language stub routine that extracts the requiredinformation� and then calls ��mcount�internal �a normal C function� withtwo arguments � frompc and selfpc� ��mcount�internal is responsible formaintaining the in�memory call graph� which records frompc� selfpc� andthe number of times each of these call arcs was transversed�

GCC Version provides a magical function ���builtin�return�address�� which allows a generic mcount function to extract the requiredinformation from the stack frame� However� on some architectures� mostnotably the SPARC� using this builtin can be very computationally expen�sive� and an assembly language version of mcount is used for performancereasons�

Number�of�calls information for library routines is collected by using aspecial version of the C library� The programs in it are the same as in theusual C library� but they were compiled with ��pg�� If you link your programwith �gcc � � � �pg�� it automatically uses the pro�ling version of the library�

Pro�ling also involves watching your program as it runs� and keeping ahistogram of where the program counter happens to be every now and then�Typically the program counter is looked at around ��� times per second ofrun time� but the exact frequency may vary from system to system�

This is done is one of two ways� Most UNIX�like operating systems pro�vide a profil�� system call� which registers a memory array with the kernel�along with a scale factor that determines how the program�s address spacemaps into the array� Typical scaling values cause every to bytes of ad�dress space to map into a single array slot� On every tick of the systemclock �assuming the pro�led program is running�� the value of the programcounter is examined and the corresponding slot in the memory array is incre�mented� Since this is done in the kernel� which had to interrupt the process

Page 36: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

anyway to handle the clock interrupt� very little additional system overheadis required�

However� some operating systems� most notably Linux �� �and earlier��do not provide a profil�� system call� On such a system� arrangements aremade for the kernel to periodically deliver a signal to the process �typicallyvia setitimer���� which then performs the same operation of examining theprogram counter and incrementing a slot in the memory array� Since thismethod requires a signal to be delivered to user space every time a sample istaken� it uses considerably more overhead than kernel�based pro�ling� Also�due to the added delay required to deliver the signal� this method is lessaccurate as well�

A special startup routine allocates memory for the histogram and eithercalls profil�� or sets up a clock signal handler� This routine �monstartup�can be invoked in several ways� On Linux systems� a special pro�ling startup�le gcrt��o� which invokes monstartup before main� is used instead of thedefault crt��o� Use of this special startup �le is one of the e�ects of using�gcc � � � �pg� to link� On SPARC systems� no special startup �les are used�Rather� the mcount routine� when it is invoked for the �rst time �typicallywhen main is called�� calls monstartup�

If the compiler�s ��a� option was used� basic�block counting is also en�abled� Each object �le is then compiled with a static array of counts� ini�tially zero� In the executable code� every time a new basic�block begins �i�e�when an if statement appears�� an extra instruction is inserted to incrementthe corresponding count in the array� At compile time� a paired array wasconstructed that recorded the starting address of each basic�block� Takentogether� the two arrays record the starting address of every basic�block�along with the number of times it was executed�

The pro�ling library also includes a function �mcleanup� which is typ�ically registered using atexit�� to be called as the program exits� and isresponsible for writing the �le �gmon�out�� Pro�ling is turned o�� variousheaders are output� and the histogram is written� followed by the call�grapharcs and the basic�block counts�

The output from gprof gives no indication of parts of your programthat are limited by I�O or swapping bandwidth� This is because samples ofthe program counter are taken at �xed intervals of the program�s run time�Therefore� the time measurements in gprof output say nothing about timethat your program was not running� For example� a part of the programthat creates so much data that it cannot all �t in physical memory at oncemay run very slowly due to thrashing� but gprof will say it uses little time�On the other hand� sampling by run time has the advantage that the amountof load due to other users won�t directly a�ect the output you get�

Page 37: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Details of Pro�ling ��

�� Pro�ling Data File Format

The old BSD�derived �le format used for pro�le data does not contain amagic cookie that allows to check whether a data �le really is a gprof �le�Furthermore� it does not provide a version number� thus rendering changesto the �le format almost impossible� gnu gprof uses a new �le format thatprovides these features� For backward compatibility� gnu gprof continuesto support the old BSD�derived format� but not all features are supportedwith it� For example� basic�block execution counts cannot be accommodatedby the old �le format�

The new �le format is de�ned in header �le �gmon�out�h�� It consists ofa header containing the magic cookie and a version number� as well as somespare bytes available for future extensions� All data in a pro�le data �le is inthe native format of the host on which the pro�le was collected� gnu gprofadapts automatically to the byte�order in use�

In the new �le format� the header is followed by a sequence of records�Currently� there are three di�erent record types� histogram records� call�graph arc records� and basic�block execution count records� Each �le cancontain any number of each record type� When reading a �le� gnu gprof willensure records of the same type are compatible with each other and computethe union of all records� For example� for basic�block execution counts� theunion is simply the sum of all execution counts for each basic�block�

����� Histogram Records

Histogram records consist of a header that is followed by an array of bins�The header contains the text�segment range that the histogram spans� thesize of the histogram in bytes �unlike in the old BSD format� this does not in�clude the size of the header�� the rate of the pro�ling clock� and the physicaldimension that the bin counts represent after being scaled by the pro�lingclock rate� The physical dimension is speci�ed in two parts� a long name ofup to �� characters and a single character abbreviation� For example� a his�togram representing real�time would specify the long name as �seconds� andthe abbreviation as �s�� This feature is useful for architectures that supportperformance monitor hardware �which� fortunately� is becoming increasinglycommon�� For example� under DEC OSF��� the �upro�le� command canbe used to produce a histogram of� say� instruction cache misses� In thiscase� the dimension in the histogram header could be set to �i�cache misses�and the abbreviation could be set to ��� �because it is simply a count� nota physical dimension�� Also� the pro�ling rate would have to be set to � inthis case�

Histogram bins are ���bit numbers and each bin represent an equalamount of text�space� For example� if the text�segment is one thousand

Page 38: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

bytes long and if there are ten bins in the histogram� each bin represents onehundred bytes�

����� Call�Graph Records

Call�graph records have a format that is identical to the one used inthe BSD�derived �le format� It consists of an arc in the call graph and acount indicating the number of times the arc was traversed during programexecution� Arcs are speci�ed by a pair of addresses� the �rst must be withincaller�s function and the second must be within the callee�s function� Whenperforming pro�ling at the function level� these addresses can point anywherewithin the respective function� However� when pro�ling at the line�level� itis better if the addresses are as close to the call�site�entry�point as possible�This will ensure that the line�level call�graph is able to identify exactly whichline of source code performed calls to a function�

����� Basic�Block Execution Count Records

Basic�block execution count records consist of a header followed by asequence of address�count pairs� The header simply speci�es the length ofthe sequence� In an address�count pair� the address identi�es a basic�blockand the count speci�es the number of times that basic�block was executed�Any address within the basic�address can be used�

�� gprofs Internal Operation

Like most programs� gprof begins by processing its options� Dur�ing this stage� it may building its symspec list �sym�ids�c sym�id�add�� if options are speci�ed which use symspecs� gprof maintainsa single linked list of symspecs� which will eventually get turnedinto � symbol tables� organized into six include�exclude pairs � onepair each for the �at pro�le �INCL FLAT�EXCL FLAT�� the callgraph arcs �INCL ARCS�EXCL ARCS�� printing in the call graph�INCL GRAPH�EXCL GRAPH�� timing propagation in the call graph�INCL TIME�EXCL TIME�� the annotated source listing �INCL ANNO�EXCL ANNO��and the execution count listing �INCL EXEC�EXCL EXEC��

After option processing� gprof �nishes building the symspec list byadding all the symspecs in default�excluded�list to the exclude listsEXCL TIME and EXCL GRAPH� and if line�by�line pro�ling is speci�ed�EXCL FLAT as well� These default excludes are not added to EXCL ANNO�EXCL ARCS� and EXCL EXEC�

Next� the BFD library is called to open the object �le� verify that it isan object �le� and read its symbol table �core�c core�init�� using bfd�canonicalize�symtab after mallocing an appropiate sized array of asym�

Page 39: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Details of Pro�ling ��

bols� At this point� function mappings are read �if the ���file�ordering�option has been speci�ed�� and the core text space is read into memory �ifthe ��c� option was given��

gprof�s own symbol table� an array of Sym structures� is now built� Thisis done in one of two ways� by one of two routines� depending on whetherline�by�line pro�ling ���l� option� has been enabled� For normal pro�ling� theBFD canonical symbol table is scanned� For line�by�line pro�ling� every textspace address is examined� and a new symbol table entry gets created everytime the line number changes� In either case� two passes are made throughthe symbol table � one to count the size of the symbol table required� and theother to actually read the symbols� In between the two passes� a single arrayof type Sym is created of the appropiate length� Finally� symtab�c symtab�finalize is called to sort the symbol table and remove duplicate entries�entries with the same memory address��

The symbol table must be a contiguous array for two reasons� First� theqsort library function �which sorts an array� will be used to sort the symboltable� Also� the symbol lookup routine �symtab�c sym�lookup�� which �ndssymbols based on memory address� uses a binary search algorithm which re�quires the symbol table to be a sorted array� Function symbols are indicatedwith an is�func �ag� Line number symbols have no special �ags set� Addi�tionally� a symbol can have an is�static �ag to indicate that it is a localsymbol�

With the symbol table read� the symspecs can now be translated intoSyms �sym�ids�c sym�id�parse�� Remember that a single symspec canmatch multiple symbols� An array of symbol tables �syms� is created� eachentry of which is a symbol table of Syms to be included or excluded from aparticular listing� The master symbol table and the symspecs are examinedby nested loops� and every symbol that matches a symspec is inserted intothe appropriate syms table� This is done twice� once to count the size ofeach required symbol table� and again to build the tables� which have beenmalloced between passes� From now on� to determine whether a symbol ison an include or exclude symspec list� gprof simply uses its standard symbollookup routine on the appropriate table in the syms array�

Now the pro�le data �le�s� themselves are read �gmon�io�c gmon�out�read�� �rst by checking for a new�style �gmon�out� header� then assumingthis is an old�style BSD �gmon�out� if the magic number test failed�

New�style histogram records are read by hist�c hist�read�rec� For the�rst histogram record� allocate a memory array to hold all the bins� and readthem in� When multiple pro�le data �les �or �les with multiple histogramrecords� are read� the starting address� ending address� number of bins andsampling rate must match between the various histograms� or a fatal errorwill result� If everything matches� just sum the additional histograms intothe existing in�memory array�

Page 40: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

� GNU gprof

As each call graph record is read �call�graph�c cg�read�rec�� the par�ent and child addresses are matched to symbol table entries� and a call grapharc is created by cg�arcs�c arc�add� unless the arc fails a symspec checkagainst INCL ARCS�EXCL ARCS� As each arc is added� a linked list ismaintained of the parent�s child arcs� and of the child�s parent arcs� Boththe child�s call count and the arc�s call count are incremented by the record�scall count�

Basic�block records are read �basic�blocks�c bb�read�rec�� but only ifline�by�line pro�ling has been selected� Each basic�block address is matchedto a corresponding line symbol in the symbol table� and an entry made in thesymbol�s bb addr and bb calls arrays� Again� if multiple basic�block recordsare present for the same address� the call counts are cumulative�

A gmon�sum �le is dumped� if requested �gmon�io�c gmon�out�write��

If histograms were present in the data �les� assign them to symbols�hist�c hist�assign�samples� by iterating over all the sample bins andassigning them to symbols� Since the symbol table is sorted in order of as�cending memory addresses� we can simple follow along in the symbol tableas we make our pass over the sample bins� This step includes a symspeccheck against INCL FLAT�EXCL FLAT� Depending on the histogram scalefactor� a sample bin may span multiple symbols� in which case a fractionof the sample count is allocated to each symbol� proportional to the degreeof overlap� This e�ect is rare for normal pro�ling� but overlaps are morecommon during line�by�line pro�ling� and can cause each of two adjacentlines to be credited with half a hit� for example�

If call graph data is present� cg�arcs�c cg�assemble is called� First� if��c� was speci�ed� a machine�dependant routine �find�call� scans througheach symbol�s machine code� looking for subroutine call instructions� andadding them to the call graph with a zero call count� A topological sortis performed by depth��rst numbering all the symbols �cg�dfn�c cg�dfn��so that children are always numbered less than their parents� then mak�ing a array of pointers into the symbol table and sorting it into numeri�cal order� which is reverse topological order �children appear before par�ents�� Cycles are also detected at this point� all members of which areassigned the same topological number� Two passes are now made throughthis sorted array of symbol pointers� The �rst pass� from end to beginning�parents to children�� computes the fraction of child time to propogate toeach parent and a print �ag� The print �ag re�ects symspec handling ofINCL GRAPH�EXCL GRAPH� with a parent�s include or exclude �print orno print� property being propagated to its children� unless they themselvesexplicitly appear in INCL GRAPH or EXCL GRAPH� A second pass� frombeginning to end �children to parents� actually propogates the timings alongthe call graph� subject to a check against INCL TIME�EXCL TIME� Withthe print �ag� fractions� and timings now stored in the symbol structures�

Page 41: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

Chapter �� Details of Pro�ling ��

the topological sort array is now discarded� and a new array of pointers isassembled� this time sorted by propagated time�

Finally� print the various outputs the user requested� which is now fairlystraightforward� The call graph �cg�print�c cg�print� and �at pro�le�hist�c hist�print� are regurgitations of values already computed� Theannotated source listing �basic�blocks�c print�annotated�source� usesbasic�block information� if present� to label each line of code with call counts�otherwise only the function call counts are presented�

The function ordering code is marginally well documented in the sourcecode itself �cg�print�c�� Basically� the functions with the most use and themost parents are placed �rst� followed by other functions with the most use�followed by lower use functions� followed by unused functions at the end�

����� Debugging gprof

If gprof was compiled with debugging enabled� the ��d� option triggersdebugging output �to stdout� which can be helpful in understanding its op�eration� The debugging number speci�ed is interpreted as a sum of thefollowing options�

� Topological sortMonitor depth��rst numbering of symbols during call graphanalysis

� � Cycles Shows symbols as they are identi�ed as cycle heads

�� � TallyingAs the call graph arcs are read� show each arc and how the totalcalls to each function are tallied

� � Call graph arc sortingDetails sorting individual parents�children within each callgraph entry

�� � Reading histogram and call graph recordsShows address ranges of histograms as they are read� and eachcall graph arc

� � Symbol tableReading� classifying� and sorting the symbol table from the ob�ject �le� For line�by�line pro�ling ���l� option�� also shows linenumbers being assigned to memory addresses�

�� � Static call graphTrace operation of ��c� option

�� � Symbol table and arc table lookupsDetail operation of lookup routines

Page 42: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

�� GNU gprof

��� � Call graph propagationShows how function times are propagated along the call graph

�� � Basic�blocksShows basic�block records as they are read from pro�le data�only meaningful with ��l� option�

���� � SymspecsShows symspec�to�symbol pattern matching operation

�� � Annotate sourceTracks operation of ��A� option

Page 43: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

i

Table of Contents

� Introduction to Pro�ling � � � � � � � � � � � � � � � � � � � �

� Compiling a Program for Pro�ling � � � � � � � � � � �

� Executing the Program � � � � � � � � � � � � � � � � � � � � �

� gprof Command Summary � � � � � � � � � � � � � � � � � �

��� Output Options � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��� Analysis Options � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Miscellaneous Options � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Depricated Options � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Symspecs � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��

� Interpreting gprofs Output � � � � � � � � � � � � � � � ��

��� The Flat Pro�le � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� The Call Graph � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��

���� The Primary Line � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Lines for a Function�s Callers � � � � � � � � � � � � � � � � � � � ������ Lines for a Function�s Subroutines � � � � � � � � � � � � � � ����� How Mutually Recursive Functions Are Described

� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Line�by�line Pro�ling � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� The Annotated Source Listing � � � � � � � � � � � � � � � � � � � � � � � � � � � �

� Inaccuracy of gprof Output� � � � � � � � � � � � � � � � ��

��� Statistical Sampling Error � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��� Estimating children Times � � � � � � � � � � � � � � � � � � � � � � � � � � � � �

� Answers to Common Questions � � � � � � � � � � � � �

� Incompatibilities with Unix gprof � � � � � � � � � � ��

Page 44: GNU - Department of Computer Science · GNU gprof The gnu Proler Ja yF enlason and Ric hard Stallman. This man ual describ es the gnu proler gprof and ho wy ou can use it to determine

ii GNU gprof

Details of Pro�ling � � � � � � � � � � � � � � � � � � � � � � � � ����� Implementation of Pro�ling � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ���� Pro�ling Data File Format � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��

���� Histogram Records � � � � � � � � � � � � � � � � � � � � � � � � � � � � ����� Call�Graph Records � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������ Basic�Block Execution Count Records � � � � � � � � � � � ��

��� gprof�s Internal Operation � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ������� Debugging gprof � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ��