123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934 |
- \input texinfo @c -*-texinfo-*-
- @setfilename gprof
- @settitle gprof
- @ifinfo
- This file documents the gprof profiler of the GNU system.
- Copyright (C) 1988 Free Software Foundation, Inc.
- Permission is granted to make and distribute verbatim copies of
- this manual provided the copyright notice and this permission notice
- are preserved on all copies.
- @ignore
- Permission is granted to process this file through Tex and print the
- results, provided the printed document carries copying permission
- notice identical to this one except for the removal of this paragraph
- (this paragraph not being relevant to the printed manual).
- @end ignore
- Permission is granted to copy and distribute modified versions of this
- manual under the conditions for verbatim copying, provided that the entire
- resulting derived work is distributed under the terms of a permission
- notice identical to this one.
- Permission is granted to copy and distribute translations of this manual
- into another language, under the above conditions for modified versions.
- @end ifinfo
- @titlepage
- @center @titlefont{gprof}
- @sp 1
- @center The GNU Profiler
- @sp 2
- @center Jay Fenlason and Richard Stallman
- @sp 4
- This manual describes the GNU profiler, @code{gprof}, and how you can use
- it to determine which parts of a program are taking most of the execution
- time. We assume that you know how to write, compile, and execute programs.
- GNU @code{gprof} was written by Jay Fenlason.
- @sp 8
- Copyright @copyright{} 1988 Free Software Foundation, Inc.
- Permission is granted to make and distribute verbatim copies of
- this manual provided the copyright notice and this permission notice
- are preserved on all copies.
- @ignore
- Permission is granted to process this file through Tex and print the
- results, provided the printed document carries copying permission
- notice identical to this one except for the removal of this paragraph
- (this paragraph not being relevant to the printed manual).
- @end ignore
- Permission is granted to copy and distribute modified versions of this
- manual under the conditions for verbatim copying, provided that the entire
- resulting derived work is distributed under the terms of a permission
- notice identical to this one.
- Permission is granted to copy and distribute translations of this manual
- into another language, under the same conditions as for modified versions.
- @end titlepage
- @ifinfo
- @node Top, Why, Top, (dir)
- @ichapter Profiling a Program: Where Does It Spend Its Time?
- This manual describes the GNU profiler @code{gprof}, and how you can use it
- to determine which parts of a program are taking most of the execution
- time. We assume that you know how to write, compile, and execute programs.
- GNU @code{gprof} was written by Jay Fenlason.
- @menu
- * Why:: What profiling means, and why it is useful.
- * Compiling:: How to compile your program for profiling.
- * Executing:: How to execute your program to generate the
- profile data file @file{gmon.out}.
- * Analyzing:: How to run @code{gprof}, and how to specify
- options for it.
- * Flat Profile:: The flat profile shows how much time was spent
- executing directly in each function.
- * Call Graph:: The call graph shows which functions called which
- others, and how much time each function used
- when its subroutine calls are included.
- * Implementation:: How the profile data is recorded and written.
- * Sampling Error:: Statistical margins of error.
- How to accumulate data from several runs
- to make it more accurate.
- * Assumptions:: Some of @code{gprof}'s measurements are based
- on assumptions about your program
- that could be very wrong.
- * Incompatibilities:: (between GNU @code{gprof} and Unix @code{gprof}.)
- @end menu
- @end ifinfo
- @node Why, Compiling, Top, Top
- @chapter Why Profile
- Profiling allows you to learn where your program spent its time and which
- functions called which other functions while it was executing. This
- information can show you which pieces of your program are slower than you
- expected, and might be candidates for rewriting to make your program
- execute faster. It can also tell you which functions are being called more
- or less often than you expected. This may help you spot bugs that had
- otherwise been unnoticed.
- Since the profiler uses information collected during the actual execution
- of your program, it can be used on programs that are too large or too
- complex to analyze by reading the source. However, how your program is run
- will affect the information that shows up in the profile data. If you
- don't use some feature of your program while it is being profiled, no
- profile information will be generated for that feature.
- Profiling has several steps:
- @itemize @bullet
- @item
- You must compile and link your program with profiling enabled.
- @xref{Compiling}.
- @item
- You must execute your program to generate a profile data file.
- @xref{Executing}.
- @item
- You must run @code{gprof} to analyze the profile data.
- @xref{Analyzing}.
- @end itemize
- The next three chapters explain these steps in greater detail.
- The result of the analysis is a file containing two tables, the
- @dfn{flat profile} and the @dfn{call graph} (plus blurbs which briefly
- explain the contents of these tables).
- The flat profile shows how much time your program spent in each function,
- and how many times that function was called. If you simply want to know
- which functions burn most of the cycles, it is stated concisely here.
- @xref{Flat Profile}.
- The call graph, shows, for each function, which functions called it, which
- other functions it called, and how many times. There is also an estimate
- of how much time was spent in the subroutines of each function. This can
- suggest places where you might try to eliminate function calls that use a
- lot of time. @xref{Call Graph}.
- @node Compiling, Executing, Why, Top
- @chapter Compiling a Program for Profiling
- The first step in generating profile information for your program is
- to compile and link it with profiling enabled.
- To compile a source file for profiling, specify the @samp{-pg} option when
- you run the compiler. (This is in addition to the options you normally
- use.)
- To link the program for profiling, if you use a compiler such as @code{cc}
- to do the linking, simply specify @samp{-pg} in addition to your usual
- options. The same option, @samp{-pg}, alters either compilation or linking
- to do what is necessary for profiling. Here are examples:
- @example
- cc -g myprog.c utils.c -pg
- cc -o myprog myprog.o utils.o -pg
- @end example
- The @samp{-pg} option also works with a command that both compiles and links:
- @example
- cc -o myprog myprog.c utils.c -g -pg
- @end example
- If you run the linker @code{ld} directly instead of through a compiler such
- as @code{cc}, you must specify the profiling startup file
- @file{/lib/gcrt0.o} as the first input file instead of the usual startup
- file @file{/lib/crt0.o}. In addition, you would probably want to specify
- the profiling C library, @file{/usr/lib/libc_p.a}, by writing @samp{-lc_p}
- instead of the usual @samp{-lc}. This is not absolutely necessary, but doing
- this gives you number-of-calls information for standard library functions such
- as @code{read} and @code{open}. For example:
- @example
- ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p
- @end example
- If you compile only some of the modules of the program with @samp{-pg}, you
- can still profile the program, but you won't get complete information about
- the modules that were compiled without @samp{-pg}. The only information
- you get for the functions in those modules is the total time spent in them;
- there is no record of how many times they were called, or from where. This
- will not affect the flat profile (except that the @code{calls} field for
- the functions will be blank), but will greatly reduce the usefulness of the
- call graph.
- So far GNU @code{gprof} has been tested only with C programs, but it ought
- to work with any language in which programs are compiled and linked to form
- executable files. If it does not, please let us know.
- @node Executing, Analyzing, Compiling, Top
- @chapter Executing the Program to Generate Profile Data
- Once the program is compiled for profiling, you must run it in order to
- generate the information that @code{gprof} needs. Simply run the program
- as usual, using the normal arguments, file names, etc. The program should
- run normally, producing the same output as usual. It will, however, run
- somewhat slower than normal because of the time spent collecting and the
- writing the profile data.
- The way you run the program---the arguments and input that you give
- it---may have a dramatic effect on what the profile information shows. The
- profile data will describe the parts of the program that were activated for
- the particular input you use. For example, if the first command you give
- to your program is to quit, the profile data will show the time used in
- initialization and in cleanup, but not much else.
- You program will write the profile data into a file called @file{gmon.out}
- just before exiting. If there is already a file called @file{gmon.out},
- its contents are overwritten. There is currently no way to tell the
- program to write the profile data under a different name, but you can rename
- the file afterward if you are concerned that it may be overwritten.
- In order to write the @file{gmon.out} file properly, your program must exit
- normally: by returning from @code{main} or by calling @code{exit}. Calling
- the low-level function @code{_exit} does not write the profile data, and
- neither does abnormal termination due to an unhandled signal.
- The @file{gmon.out} file is written in the program's @emph{current working
- directory} at the time it exits. This means that if your program calls
- @code{chdir}, the @file{gmon.out} file will be left in the last directory
- your program @code{chdir}'d to. If you don't have permission to write in
- this directory, the file is not written. You may get a confusing error
- message if this happens. (We have not yet replaced the part of Unix
- responsible for this; when we do, we will make the error message
- comprehensible.)
- @node Analyzing, Flat Profile, Executing, Top
- @chapter Analyzing the Profile Data: @code{gprof} Command Summary
- After you have a profile data file @file{gmon.out}, you can run @code{gprof}
- to interpret the information in it. The @code{gprof} program prints a
- flat profile and a call graph on standard output. Typically you would
- redirect the output of @code{gprof} into a file with @samp{>}.
- You run @code{gprof} like this:
- @example
- gprof @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}]
- @end example
- @noindent
- Here square-brackets indicate optional arguments.
- If you omit the executable file name, the file @file{a.out} is used. If
- you give no profile data file name, the file @file{gmon.out} is used. If
- any file is not in the proper format, or if the profile data file does not
- appear to belong to the executable file, an error message is printed.
- You can give more than one profile data file by entering all their names
- after the executable file name; then the statistics in all the data files
- are summed together.
- The following options may be used to selectively include or exclude
- functions in the output:
- @table @code
- @item -a
- The @code{-a} option causes @code{gprof} to ignore static (private)
- functions. (These are functions whose names are not listed as global,
- and which are not visible outside the file/function/block where they
- were defined.) Time spent in these functions, calls to/from them,
- etc, will all be attributed to the function that was loaded directly
- before it in the executable file. This is compatible with Unix
- @code{gprof}, but a bad idea. This option affects both the flat
- profile and the call graph.
- @item -e @var{function_name}
- The @code{-e @var{function}} option tells @code{gprof} to not print
- information about the function (and its children@dots{}) in the call
- graph. The function will still be listed as a child of any functions
- that call it, but its index number will be shown as @samp{[not
- printed]}.
- @item -E @var{function_name}
- The @code{-E @var{function}} option works like the @code{-e} option,
- but time spent in the function (and children who were not called from
- anywhere else), will not be used to compute the percentages-of-time
- for the call graph.
- @item -f @var{function_name}
- The @code{-f @var{function}} option causes @code{gprof} to limit the
- call graph to the function and its children (and their
- children@dots{}).
- @item -F @var{function_name}
- The @code{-F @var{function}} option works like the @code{-f} option,
- but only time spent in the function and its children (and their
- children@dots{}) will be used to determine total-time and
- percentages-of-time for the call graph.
- @item -z
- If you give the @code{-z} option, @code{gprof} will mention all
- functions in the flat profile, even those that were never called, and
- that had no time spent in them.
- @end table
- The order of these options does not matter.
- Note that only one function can be specified with each @code{-e},
- @code{-E}, @code{-f} or @code{-F} option. To specify more than one
- function, use multiple options. For example, this command:
- @example
- gprof -e boring -f foo -f bar myprogram > gprof.output
- @end example
- @noindent
- lists in the call graph all functions that were reached from either
- @code{foo} or @code{bar} and were not reachable from @code{boring}.
- There are two other useful @code{gprof} options:
- @table @code
- @item -b
- If the @code{-b} option is given, @code{gprof} doesn't print the
- verbose blurbs that try to explain the meaning of all of the fields in
- the tables. This is useful if you intend to print out the output, or
- are tired of seeing the blurbs.
- @item -s
- The @code{-s} option causes @code{gprof} to summarize the information
- in the profile data files it read in, and write out a profile data
- file called @file{gmon.sum}, which contains all the information from
- the profile data files that @code{gprof} read in. The file @file{gmon.sum}
- may be one of the specified input files; the effect of this is to
- merge the data in the other input files into @file{gmon.sum}.
- @xref{Sampling Error}.
- Eventually you can run @code{gprof} again without @samp{-s} to analyze the
- cumulative data in the file @file{gmon.sum}.
- @end table
- @node Flat Profile, Call Graph, Analyzing, Top
- @chapter How to Understand the Flat Profile
- @cindex flat profile
- The @dfn{flat profile} shows the total amount of time your program
- spent executing each function. Unless the @samp{-z} option is given,
- functions with no apparent time spent in them, and no apparent calls
- to them, are not mentioned. Note that if a function was not compiled
- for profiling, and didn't run long enough to show up on the program
- counter histogram, it will be indistinguishable from a function that
- was never called.
- @c???
- Here is a sample flat profile for a small program:
- @example
- Each sample counts as 0.01 seconds.
- % time seconds cumsec calls function
- 79.17 0.19 0.19 6 a
- 16.67 0.04 0.23 1 main
- 4.17 0.01 0.24 mcount
- 0.00 0 0.24 1 profil
- @end example
- @noindent
- The functions are sorted by decreasing run-time spent in them. The
- functions @code{mcount} and @code{profil} are part of the profiling
- aparatus and appear in every flat profile; their time gives a measure of
- the amount of overhead due to profiling. (These internal functions are
- omitted from the call graph.)
- The sampling period estimates the margin of error in each of the time
- figures. A time figure that is not much larger than this is not reliable.
- In this example, the @code{seconds} field for @code{mcount} might well be 0
- or 0.02 in another run. @xref{Sampling Error}, for a complete discussion.
- Here is what the fields in each line mean:
- @table @code
- @item % time
- This is the percentage of the total execution time your program spent
- in this function. These should all add up to 100%.
- @item seconds
- This is the total number of seconds the computer spent executing the
- user code of this function.
- @item cumsec
- This is the cumulative total number of seconds the computer spent
- executing this functions, plus the time spent in all the functions
- above this one in this table.
- @item calls
- This is the total number of times the function was called. If the
- function was never called, or the number of times it was called cannot
- be determined (probably because the function was not compiled with
- profiling enabled), the @dfn{calls} field is blank.
- @item function
- This is the name of the function.
- @end table
- @node Call Graph, Implementation, Flat Profile, Top
- @chapter How to Read the Call Graph
- @cindex call graph
- The @dfn{call graph} shows how much time was spent in each function
- and its children. From this information, you can find functions that,
- while they themselves may not have used much time, called other
- functions that did use unusual amounts of time.
- Here is a sample call from a small program. This call came from the
- same @code{gprof} run as the flat profile example in the previous
- chapter.
- @example
- index % time self children called name
- <spontaneous>
- [1] 100.00 0 0.23 0 start [1]
- 0.04 0.19 1/1 main [2]
- ----------------------------------------
- 0.04 0.19 1/1 start [1]
- [2] 100.00 0.04 0.19 1 main [2]
- 0.19 0 1/1 a [3]
- ----------------------------------------
- 0.19 0 1/1 main [2]
- [3] 82.61 0.19 0 1+5 a [3]
- ----------------------------------------
- @end example
- The lines full of dashes divide this table into @dfn{entries}, one for each
- function. Each entry has one or more lines.
- In each entry, the primary line is the one that starts with an index number
- in square brackets. The end of this line says which function the entry is
- for. The preceding lines in the entry describe the callers of this
- function and the following lines describe its subroutines (also called
- @dfn{children} when we speak of the call graph).
- The entries are sorted by time spent in the function and its subroutines.
- The internal profiling functions @code{mcount} and @code{profil}
- (@pxref{Flat Profile}) are never mentioned in the call graph.
- @menu
- * Primary:: Details of the primary line's contents.
- * Callers:: Details of caller-lines' contents.
- * Subroutines:: Details of subroutine-lines' contents.
- * Cycles:: When there are cycles of recursion,
- such as @code{a} calls @code{b} calls @code{a}@dots{}
- @end menu
- @node Primary, Callers, Call Graph, Call Graph
- @section The Primary Line
- The @dfn{primary line} in a call graph entry is the line that
- describes the function which the entry is about and gives the overall
- statistics for this function.
- For reference, we repeat the primary line from the entry for function
- @code{a} in our main example, together with the heading line that shows the
- names of the fields:
- @example
- index % time self children called name
- @dots{}
- [3] 82.61 0.19 0 1+5 a [3]
- @end example
- Here is what the fields in the primary line mean:
- @table @code
- @item index
- Entries are numbered with consecutive integers. Each function
- therefore has an index number, which appears at the beginning of its
- primary line.
- Each cross-reference to a function, as a caller or subroutine of
- another, gives its index number as well as its name. The index number
- guides you if you wish to look for the entry for that function.
- @item % time
- This is the percentage of the total time that was spent in this
- function, including time spent in subroutines called from this
- function.
- The time spent in this function is counted again for the callers of
- this function. Therefore, adding up these percentages is meaningless.
- @item self
- This is the total amount of time spent in this function. This
- should be identical to the number printed in the @code{seconds} field
- for this function in the flat profile.
- @item children
- This is the total amount of time spent in the subroutine calls made by
- this function. This should be equal to the sum of all the @code{self}
- and @code{children} entries of the children listed directly below this
- function.
- @item called
- This is the number of times the function was called.
- If the function called itself recursively, there are two numbers,
- separated by a @samp{+}. The first number counts non-recursive calls,
- and the second counts recursive calls.
- In the example above, the function @code{a} called itself five times,
- and was called once from @code{main}.
- @item name
- This is the name of the current function. The index number is
- repeated after it.
- If the function is part of a cycle of recursion, the cycle number is
- printed between the function's name and the index number
- (@pxref{Cycles}). For example, if function @code{gnurr} is part of
- cycle number one, and has index number twelve, its primary line would
- be end like this:
- @example
- gnurr <cycle 1> [12]
- @end example
- @end table
- @node Callers, Subroutines, Primary, Call Graph
- @section Lines for a Function's Callers
- A function's entry has a line for each function it was called by.
- These lines' fields correspond to the fields of the primary line, but
- their meanings are different because of the difference in context.
- For reference, we repeat two lines from the entry for the function
- @code{a}, the primary line and one caller-line preceding it, together
- with the heading line that shows the names of the fields:
- @example
- index % time self children called name
- @dots{}
- 0.19 0 1/1 main [2]
- [3] 82.61 0.19 0 1+5 a [3]
- @end example
- Here are the meanings of the fields in the caller-line for @code{a}
- called from @code{main}:
- @table @code
- @item self
- An estimate of the amount of time spent in @code{a} itself when it was
- called from @code{main}.
- @item children
- An estimate of the amount of time spent in @code{a}'s subroutines when
- @code{a} was called from @code{main}.
- The sum of the @code{self} and @code{children} fields is an estimate
- of the amount of time spent within calls to @code{a} from @code{main}.
- @item called
- Two numbers: the number of times @code{a} was called from @code{main},
- followed by the total number of nonrecursive calls to @code{a} from
- all its callers.
- @item name and index number
- The name of the caller of @code{a} to which this line applies,
- followed by the caller's index number.
- Not all functions have entries in the call graph; some
- options to @code{gprof} request the omission of certain functions.
- When a caller has no entry of its own, it still has caller-lines
- in the entries of the functions it calls. Since this caller
- has no index number, the string @samp{[not printed]} is used
- instead of one.
- If the caller is part of a recursion cycle, the cycle number is
- printed between the name and the index number.
- @end table
- If the identity of the callers of a function cannot be determined, a
- dummy caller-line is printed which has @samp{<spontaneous>} as the
- ``caller's name'' and all other fields blank. This can happen for
- signal handlers.
- @c What if some calls have determinable callers' names but not all?
- @node Subroutines, Cycles, Callers, Call Graph
- @section Lines for a Function's Subroutines
- A function's entry has a line for each of its subroutines---in other
- words, a line for each other function that it called. These lines'
- fields correspond to the fields of the primary line, but their meanings
- are different because of the difference in context.
- For reference, we repeat two lines from the entry for the function
- @code{main}, the primary line and a line for a subroutine, together
- with the heading line that shows the names of the fields:
- @example
- index % time self children called name
- @dots{}
- [2] 100.00 0.04 0.19 1 main [2]
- 0.19 0 1/1 a [3]
- @end example
- Here are the meanings of the fields in the subroutine-line for @code{main}
- calling @code{a}:
- @table @code
- @item self
- An estimate of the amount of time spent directly within @code{a}
- when @code{a} was called from @code{main}.
- @item children
- An estimate of the amount of time spent in subroutines of @code{a}
- when @code{a} was called from @code{main}.
- The sum of the @code{self} and @code{children} fields is an estimate
- of the total time spent in calls to @code{a} from @code{main}.
- @item called
- Two numbers, the number of calls to @code{a} from @code{main}
- followed by the total number of nonrecursive calls to @code{a}.
- @item name
- The name of the subroutine of @code{a} to which this line applies,
- followed by the subroutine's index number. If the subroutine is
- a function omitted from the call graph, it has no index number,
- so @samp{[not printed]} appears instead.
- If the caller is part of a recursion cycle, the cycle number is
- printed between the name and the index number.
- @end table
- @node Cycles,, Subroutines, Call Graph
- @section How Mutually Recursive Functions Are Described
- @cindex cycle
- @cindex recursion cycle
- The graph may be complicated by the presence of @dfn{cycles of
- recursion} in the call graph. A cycle exists if a function calls
- another function that (directly or indirectly) calls (or appears to
- call) the original function. For example: if @code{a} calls @code{b},
- and @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle.
- Whenever there are call-paths both ways between a pair of functions, they
- belong to the same cycle. If @code{a} and @code{b} call each other and
- @code{b} and @code{c} call each other, all three make one cycle. Note that
- even if @code{b} only calls @code{a} if it was not called from @code{a},
- @code{gprof} cannot determine this, so @code{a} and @code{b} are still
- considered a cycle.
- The cycles are numbered with consecutive integers. When a function
- belongs to a cycle, each time the function name appears in the call graph
- it is followed by @samp{<cycle @var{number}>}.
- The reason cycles matter is that they make the time values in the call
- graph paradoxical. The ``time spent in children'' of @code{a} should
- include the time spent in its subroutine @code{b} and in @code{b}'s
- subroutines---but one of @code{b}'s subroutines is @code{a}! How much of
- @code{a}'s time should be included in the children of @code{a}, when
- @code{a} is indirectly recursive?
- The way @code{gprof} resolves this paradox is by creating a single entry
- for the cycle as a whole. The primary line of this entry describes the
- total time spent directly in the functions of the cycle. The
- ``subroutines'' of the cycle are the individual functions of the cycle, and
- all other functions that were called directly by them. The ``callers'' of
- the cycle are the functions, outside the cycle, that called functions in
- the cycle.
- Here is a portion of the call graph which shows a cycle containing
- functions @code{a} and @code{b}. The cycle was entered by a call to
- @code{a} from @code{main}; both @code{a} and @code{b} called @code{c}.@refill
- @example
- index % time self children called name
- ----------------------------------------
- 1.77 0 1/1 main [2]
- [3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3]
- 1.02 0 3 b <cycle 1> [4]
- 0.75 0 2 a <cycle 1> [5]
- ----------------------------------------
- 3 a <cycle 1> [5]
- [4] 52.85 1.02 0 0 b <cycle 1> [4]
- 2 a <cycle 1> [5]
- 0 0 3/6 c [6]
- ----------------------------------------
- 1.77 0 1/1 main [2]
- 2 b <cycle 1> [4]
- [5] 38.86 0.75 0 1 a <cycle 1> [5]
- 3 b <cycle 1> [4]
- 0 0 3/6 c [6]
- ----------------------------------------
- @end example
- @noindent
- (The entire call graph for this program contains in addition an entry for
- @code{main}, which calls @code{a}, and an entry for @code{c}, with callers
- @code{a} and @code{b}.)
- @example
- index % time self children called name
- <spontaneous>
- [1] 100.00 0 1.93 0 start [1]
- 0.16 1.77 1/1 main [2]
- ----------------------------------------
- 0.16 1.77 1/1 start [1]
- [2] 100.00 0.16 1.77 1 main [2]
- 1.77 0 1/1 a <cycle 1> [5]
- ----------------------------------------
- 1.77 0 1/1 main [2]
- [3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3]
- 1.02 0 3 b <cycle 1> [4]
- 0.75 0 2 a <cycle 1> [5]
- 0 0 6/6 c [6]
- ----------------------------------------
- 3 a <cycle 1> [5]
- [4] 52.85 1.02 0 0 b <cycle 1> [4]
- 2 a <cycle 1> [5]
- 0 0 3/6 c [6]
- ----------------------------------------
- 1.77 0 1/1 main [2]
- 2 b <cycle 1> [4]
- [5] 38.86 0.75 0 1 a <cycle 1> [5]
- 3 b <cycle 1> [4]
- 0 0 3/6 c [6]
- ----------------------------------------
- 0 0 3/6 b <cycle 1> [4]
- 0 0 3/6 a <cycle 1> [5]
- [6] 0.00 0 0 6 c [6]
- ----------------------------------------
- @end example
- The @code{self} field of the cycle's primary line is the total time
- spent in all the functions of the cycle. It equals the sum of the
- @code{self} fields for the individual functions in the cycle, found
- in the entry in the subroutine lines for these functions.
- The @code{children} fields of the cycle's primary line and subroutine lines
- count only subroutines outside the cycle. Even though @code{a} calls
- @code{b}, the time spent in those calls to @code{b} is not counted in
- @code{a}'s @code{children} time. Thus, we do not encounter the problem of
- what to do when the time in those calls to @code{b} includes indirect
- recursive calls back to @code{a}.
- The @code{children} field of a caller-line in the cycle's entry estimates
- the amount of time spent @emph{in the whole cycle}, and its other
- subroutines, on the times when that caller called a function in the cycle.
- The @code{calls} field in the primary line for the cycle has two numbers:
- first, the number of times functions in the cycle were called by functions
- outside the cycle; second, the number of times they were called by
- functions in the cycle (including times when a function in the cycle calls
- itself). This is a generalization of the usual split into nonrecursive and
- recursive calls.
- The @code{calls} field of a subroutine-line for a cycle member in the
- cycle's entry says how many time that function was called from functions in
- the cycle. The total of all these is the second number in the primary line's
- @code{calls} field.
- In the individual entry for a function in a cycle, the other functions in
- the same cycle can appear as subroutines and as callers. These lines show
- how many times each function in the cycle called or was called from each other
- function in the cycle. The @code{self} and @code{children} fields in these
- lines are blank because of the difficulty of defining meanings for them
- when recursion is going on.
- @node Implementation, Sampling Error, Call Graph, Top
- @chapter Implementation of Profiling
- Profiling works by changing how every function in your program is compiled
- so that when it is called, it will stash away some information about where
- it was called from. From this, the profiler can figure out what function
- called it, and can count how many times it was called. This change is made
- by the compiler when your program is compiled with the @samp{-pg} option.
- Profiling also involves watching your program as it runs, and keeping a
- histogram of where the program counter happens to be every now and then.
- Typically the program counter is looked at around 100 times per second of
- run time, but the exact frequency may vary from system to system.
- A special startup routine allocates memory for the histogram and sets up a
- clock signal handler to make entries in it. Use of this special startup
- routine is one of the effects of using @samp{cc -pg} to link. The startup
- file also includes an @code{exit} function which is responsible for writing
- the file @file{gmon.out}.
- Number-of-calls information for library routines is collected by using a
- special version of the C library. The programs in it are the same as in
- the usual C library, but they were compiled with @samp{-pg}. If you link
- your program with @samp{cc -pg}, it automatically uses the profiling
- version of the library.
- The output @code{gprof} gives no indication of parts of your program that
- are limited by I/O or swapping bandwidth. This is because samples of the
- program counter are taken at fixed intervals of run time. Therefore, the
- time measurements in @code{gprof} output say nothing about time that your
- program was not running. For example, a part of the program that creates
- so much data that it cannot all fit in physical memory at once may run very
- slowly due to thrashing, but @code{gprof} will say it uses little time. On
- the other hand, sampling by run time has the advantage that the amount of
- load due to other users won't directly affect the output you get.
- @node Sampling Error, Assumptions, Implementation, Top
- @chapter Statistical Inaccuracy of @code{gprof} Output
- The run-time figures that @code{gprof} gives you are based on a sampling
- process, so they are subject to statistical inaccuracy. If a function runs
- only a small amount of time, so that on the average the sampling process
- ought to catch that function in the act only once, there is a pretty good
- chance it will actually find that function zero times, or twice.
- By contrast, the number-of-calls figures are derived by counting, not
- sampling. They are completely accurate and will not vary from run to run
- if your program is deterministic.
- The @dfn{sampling period} that is printed at the beginning of the flat
- profile says how often samples are taken. The rule of thumb is that a
- run-time figure is accurate if it is considerably bigger than the sampling
- period.
- The actual amount of error is usually more than one sampling period. In
- fact, if a value is @var{n} times the sampling period, the @emph{expected}
- error in it is the square-root of @var{n} sampling periods. If the
- sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second, the
- expected error in @code{foo}'s run-time is 0.1 seconds. It is likely to
- vary this much @emph{on the average} from one profiling run to the next.
- (@emph{Sometimes} it will vary more.)
- This does not mean that a small run-time figure is devoid of information.
- If the program's @emph{total} run-time is large, a small run-time for one
- function does tell you that that function used an insignificant fraction of
- the whole program's time. Usually this means it is not worth optimizing.
- One way to get more accuracy is to give your program more (but similar)
- input data so it will take longer. Another way is to combine the data from
- several runs, using the @samp{-s} option of @code{gprof}. Here is how:
- @enumerate
- @item
- Run your program once.
- @item
- Issue the command @samp{mv gmon.out gmon.sum}.
- @item
- Run your program again, the same as before.
- @item
- Merge the new data in @file{gmon.out} into @file{gmon.sum} with this command:
- @example
- gprof -s @var{executable-file} gmon.out gmon.sum
- @end example
- @item
- Repeat the last two steps as often as you wish.
- @item
- Analyze the cumulative data using this command:
- @example
- gprof @var{executable-file} gmon.sum > @var{output-file}
- @end example
- @end enumerate
- @node Assumptions, Incompatibilities, Sampling Error, Top
- @chapter Estimating @code{children} Times Uses an Assumption
- Some of the figures in the call graph are estimates---for example, the
- @code{children} time values and all the the time figures in caller and
- subroutine lines.
- There is no direct information about these measurements in the profile
- data itself. Instead, @code{gprof} estimates them by making an assumption
- about your program that might or might not be true.
- The assumption made is that the average time spent in each call to any
- function @code{foo} is not correlated with who called @code{foo}. If
- @code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came
- from @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s
- @code{children} time, by assumption.
- This assumption is usually true enough, but for some programs it is far
- from true. Suppose that @code{foo} returns very quickly when its argument
- is zero; suppose that @code{a} always passes zero as an argument, while
- other callers of @code{foo} pass other arguments. In this program, all the
- time spent in @code{foo} is in the calls from callers other than @code{a}.
- But @code{gprof} has no way of knowing this; it will blindly and
- incorrectly charge 2 seconds of time in @code{foo} to the children of
- @code{a}.
- We hope some day to put more complete data into @file{gmon.out}, so that
- this assumption is no longer needed, if we can figure out how. For the
- nonce, the estimated figures are usually more useful than misleading.
- @node Incompatibilities, , Assumptions, Top
- @chapter Incompatibilities with Unix @code{gprof}
- GNU @code{gprof} and Berkeley Unix @code{gprof} use the same data file
- @file{gmon.out}, and provide essentially the same information. But there a
- few differences.@refill
- GNU @code{gprof} does not support the @samp{-c} option which prints a
- static call graph based on reading the machine language of your
- program. We think that program cross-references ought to be based on
- the source files, which can be analyzed in a machine-independent
- fashion.@refill
- For a recursive function, Unix @code{gprof} lists the function as a parent
- and as a child, with a @code{calls} field that lists the number of
- recursive calls. GNU @code{gprof} omits these lines and puts the number of
- recursive calls in the primary line.
- When a function is suppressed from the call graph with @samp{-e}, GNU
- @code{gprof} still lists it as a subroutine of functions that call it.
- The function names printed in GNU @code{gprof} output do not include
- the leading underscores that are added internally to the front of all
- C identifiers on many operating systems.
- The blurbs, field widths, and output formats are different. GNU
- @code{gprof} prints blurbs after the tables, so that you can see the
- tables without skipping the blurbs.
- @contents
- @bye
|