\input texinfo @c -*-texinfo-*-
@setfilename gprof
@settitle gprof

@ifinfo
This file documents the gprof profiler of the GNU system.

Copyright (C) 1988 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions.
@end ifinfo

@titlepage
@center @titlefont{gprof}
@sp 1
@center The GNU Profiler
@sp 2
@center Jay Fenlason and Richard Stallman
@sp 4

This manual describes the GNU profiler, @code{gprof}, and how you can use
it to determine which parts of a program are taking most of the execution
time. We assume that you know how to write, compile, and execute programs.
GNU @code{gprof} was written by Jay Fenlason.
@sp 8

Copyright @copyright{} 1988 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the same conditions as for modified versions.
@end titlepage
@ifinfo
@node Top, Why, Top, (dir)
@ichapter Profiling a Program: Where Does It Spend Its Time?

This manual describes the GNU profiler @code{gprof}, and how you can use it
to determine which parts of a program are taking most of the execution
time. We assume that you know how to write, compile, and execute programs.
GNU @code{gprof} was written by Jay Fenlason.

@menu
* Why::               What profiling means, and why it is useful.
* Compiling::         How to compile your program for profiling.
* Executing::         How to execute your program to generate the
                        profile data file @file{gmon.out}.
* Analyzing::         How to run @code{gprof}, and how to specify
                        options for it.
* Flat Profile::      The flat profile shows how much time was spent
                        executing directly in each function.
* Call Graph::        The call graph shows which functions called which
                        others, and how much time each function used
                        when its subroutine calls are included.
* Implementation::    How the profile data is recorded and written.
* Sampling Error::    Statistical margins of error.
                        How to accumulate data from several runs
                        to make it more accurate.
* Assumptions::       Some of @code{gprof}'s measurements are based
                        on assumptions about your program
                        that could be very wrong.
* Incompatibilities:: Differences between GNU @code{gprof} and Unix
                        @code{gprof}.
@end menu
@end ifinfo
@node Why, Compiling, Top, Top
@chapter Why Profile

Profiling allows you to learn where your program spent its time and which
functions called which other functions while it was executing. This
information can show you which pieces of your program are slower than you
expected, and might be candidates for rewriting to make your program
execute faster. It can also tell you which functions are being called more
or less often than you expected. This may help you spot bugs that might
otherwise have gone unnoticed.

Since the profiler uses information collected during the actual execution
of your program, it can be used on programs that are too large or too
complex to analyze by reading the source. However, how your program is run
will affect the information that shows up in the profile data. If you
don't use some feature of your program while it is being profiled, no
profile information will be generated for that feature.

Profiling has several steps:

@itemize @bullet
@item
You must compile and link your program with profiling enabled.
@xref{Compiling}.

@item
You must execute your program to generate a profile data file.
@xref{Executing}.

@item
You must run @code{gprof} to analyze the profile data.
@xref{Analyzing}.
@end itemize

The next three chapters explain these steps in greater detail.
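As a preview, here is a minimal sketch of the whole procedure, assuming a
program built from a single source file @file{myprog.c} (all of the file
names here are just illustrations):

@example
cc -o myprog myprog.c -pg
myprog
gprof myprog gmon.out > profile-report
@end example

@noindent
If the resulting report looks empty, check that both the compilation and
the link step used @samp{-pg}.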
The result of the analysis is a file containing two tables, the
@dfn{flat profile} and the @dfn{call graph} (plus blurbs which briefly
explain the contents of these tables).

The flat profile shows how much time your program spent in each function,
and how many times that function was called. If you simply want to know
which functions burn most of the cycles, it is stated concisely here.
@xref{Flat Profile}.

The call graph shows, for each function, which functions called it, which
other functions it called, and how many times. There is also an estimate
of how much time was spent in the subroutines of each function. This can
suggest places where you might try to eliminate function calls that use a
lot of time. @xref{Call Graph}.
@node Compiling, Executing, Why, Top
@chapter Compiling a Program for Profiling

The first step in generating profile information for your program is
to compile and link it with profiling enabled.

To compile a source file for profiling, specify the @samp{-pg} option when
you run the compiler. (This is in addition to the options you normally
use.)

To link the program for profiling, if you use a compiler such as @code{cc}
to do the linking, simply specify @samp{-pg} in addition to your usual
options. The same option, @samp{-pg}, alters either compilation or linking
to do what is necessary for profiling. Here are examples:

@example
cc -g -c myprog.c utils.c -pg
cc -o myprog myprog.o utils.o -pg
@end example

The @samp{-pg} option also works with a command that both compiles and links:

@example
cc -o myprog myprog.c utils.c -g -pg
@end example

If you run the linker @code{ld} directly instead of through a compiler such
as @code{cc}, you must specify the profiling startup file
@file{/lib/gcrt0.o} as the first input file instead of the usual startup
file @file{/lib/crt0.o}. In addition, you would probably want to specify
the profiling C library, @file{/usr/lib/libc_p.a}, by writing @samp{-lc_p}
instead of the usual @samp{-lc}. This is not absolutely necessary, but doing
this gives you number-of-calls information for standard library functions such
as @code{read} and @code{open}. For example:

@example
ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p
@end example
If you compile only some of the modules of the program with @samp{-pg}, you
can still profile the program, but you won't get complete information about
the modules that were compiled without @samp{-pg}. The only information
you get for the functions in those modules is the total time spent in them;
there is no record of how many times they were called, or from where. This
will not affect the flat profile (except that the @code{calls} field for
the functions will be blank), but will greatly reduce the usefulness of the
call graph.
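For instance, here is one way you might profile only @file{myprog.c} and
leave @file{utils.c} uninstrumented (a sketch; the file names are
illustrative, and note that the link step still needs @samp{-pg}):

@example
cc -c -pg myprog.c
cc -c utils.c
cc -o myprog myprog.o utils.o -pg
@end example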
So far GNU @code{gprof} has been tested only with C programs, but it ought
to work with any language in which programs are compiled and linked to form
executable files. If it does not, please let us know.
@node Executing, Analyzing, Compiling, Top
@chapter Executing the Program to Generate Profile Data

Once the program is compiled for profiling, you must run it in order to
generate the information that @code{gprof} needs. Simply run the program
as usual, using the normal arguments, file names, etc. The program should
run normally, producing the same output as usual. It will, however, run
somewhat slower than normal because of the time spent collecting and
writing the profile data.

The way you run the program---the arguments and input that you give
it---may have a dramatic effect on what the profile information shows. The
profile data will describe the parts of the program that were activated for
the particular input you use. For example, if the first command you give
to your program is to quit, the profile data will show the time used in
initialization and in cleanup, but not much else.

Your program will write the profile data into a file called @file{gmon.out}
just before exiting. If there is already a file called @file{gmon.out},
its contents are overwritten. There is currently no way to tell the
program to write the profile data under a different name, but you can rename
the file afterward if you are concerned that it may be overwritten.
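For example, you could preserve the data from one run under another name
and analyze it later (the names below are only illustrations; the next
chapter explains how to name a profile data file when running
@code{gprof}):

@example
mv gmon.out myprog.run1
gprof myprog myprog.run1 > profile-run1
@end example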
In order to write the @file{gmon.out} file properly, your program must exit
normally: by returning from @code{main} or by calling @code{exit}. Calling
the low-level function @code{_exit} does not write the profile data, and
neither does abnormal termination due to an unhandled signal.

The @file{gmon.out} file is written in the program's @emph{current working
directory} at the time it exits. This means that if your program calls
@code{chdir}, the @file{gmon.out} file will be left in the last directory
your program @code{chdir}'d to. If you don't have permission to write in
this directory, the file is not written. You may get a confusing error
message if this happens. (We have not yet replaced the part of Unix
responsible for this; when we do, we will make the error message
comprehensible.)
@node Analyzing, Flat Profile, Executing, Top
@chapter Analyzing the Profile Data: @code{gprof} Command Summary

After you have a profile data file @file{gmon.out}, you can run @code{gprof}
to interpret the information in it. The @code{gprof} program prints a
flat profile and a call graph on standard output. Typically you would
redirect the output of @code{gprof} into a file with @samp{>}.

You run @code{gprof} like this:

@example
gprof @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}]
@end example

@noindent
Here square brackets indicate optional arguments.

If you omit the executable file name, the file @file{a.out} is used. If
you give no profile data file name, the file @file{gmon.out} is used. If
any file is not in the proper format, or if the profile data file does not
appear to belong to the executable file, an error message is printed.

You can give more than one profile data file by entering all their names
after the executable file name; then the statistics in all the data files
are summed together.
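For example, assuming you have saved the data files from two separate runs
under the names shown (the names are only illustrations), this command
produces a single report covering both runs:

@example
gprof myprog myprog.run1 myprog.run2 > profile-both
@end example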
The following options may be used to selectively include or exclude
functions in the output:

@table @code
@item -a
The @code{-a} option causes @code{gprof} to ignore static (private)
functions. (These are functions whose names are not listed as global,
and which are not visible outside the file/function/block where they
were defined.) Time spent in these functions, calls to/from them,
etc., will all be attributed to the function that was loaded directly
before it in the executable file. This is compatible with Unix
@code{gprof}, but a bad idea. This option affects both the flat
profile and the call graph.

@item -e @var{function_name}
The @code{-e @var{function}} option tells @code{gprof} to not print
information about the function (and its children@dots{}) in the call
graph. The function will still be listed as a child of any functions
that call it, but its index number will be shown as @samp{[not
printed]}.

@item -E @var{function_name}
The @code{-E @var{function}} option works like the @code{-e} option,
but time spent in the function (and children who were not called from
anywhere else) will not be used to compute the percentages-of-time
for the call graph.

@item -f @var{function_name}
The @code{-f @var{function}} option causes @code{gprof} to limit the
call graph to the function and its children (and their
children@dots{}).

@item -F @var{function_name}
The @code{-F @var{function}} option works like the @code{-f} option,
but only time spent in the function and its children (and their
children@dots{}) will be used to determine total-time and
percentages-of-time for the call graph.

@item -z
If you give the @code{-z} option, @code{gprof} will mention all
functions in the flat profile, even those that were never called, and
that had no time spent in them.
@end table
The order of these options does not matter.

Note that only one function can be specified with each @code{-e},
@code{-E}, @code{-f} or @code{-F} option. To specify more than one
function, use multiple options. For example, this command:

@example
gprof -e boring -f foo -f bar myprogram > gprof.output
@end example

@noindent
lists in the call graph all functions that were reached from either
@code{foo} or @code{bar} and were not reachable from @code{boring}.
There are two other useful @code{gprof} options:

@table @code
@item -b
If the @code{-b} option is given, @code{gprof} doesn't print the
verbose blurbs that try to explain the meaning of all of the fields in
the tables. This is useful if you intend to print out the output, or
are tired of seeing the blurbs.

@item -s
The @code{-s} option causes @code{gprof} to summarize the information
in the profile data files it read in, and write out a profile data
file called @file{gmon.sum}, which contains all the information from
the profile data files that @code{gprof} read in. The file @file{gmon.sum}
may be one of the specified input files; the effect of this is to
merge the data in the other input files into @file{gmon.sum}.
@xref{Sampling Error}.

Eventually you can run @code{gprof} again without @samp{-s} to analyze the
cumulative data in the file @file{gmon.sum}; a short example appears just
after this table.
@end table
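Here is the short example promised above; it merges a fresh
@file{gmon.out} into an existing @file{gmon.sum} and then analyzes the
merged data (the executable name is illustrative):

@example
gprof -s myprog gmon.out gmon.sum
gprof myprog gmon.sum > profile-cumulative
@end example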
@node Flat Profile, Call Graph, Analyzing, Top
@chapter How to Understand the Flat Profile
@cindex flat profile

The @dfn{flat profile} shows the total amount of time your program
spent executing each function. Unless the @samp{-z} option is given,
functions with no apparent time spent in them, and no apparent calls
to them, are not mentioned. Note that if a function was not compiled
for profiling, and didn't run long enough to show up on the program
counter histogram, it will be indistinguishable from a function that
was never called.

@c???
Here is a sample flat profile for a small program:

@example
Each sample counts as 0.01 seconds.

 % time   seconds   cumsec    calls   function
  79.17      0.19     0.19        6   a
  16.67      0.04     0.23        1   main
   4.17      0.01     0.24            mcount
   0.00      0        0.24        1   profil
@end example
@noindent
The functions are sorted by decreasing run-time spent in them. The
functions @code{mcount} and @code{profil} are part of the profiling
apparatus and appear in every flat profile; their time gives a measure of
the amount of overhead due to profiling. (These internal functions are
omitted from the call graph.)

The sampling period estimates the margin of error in each of the time
figures. A time figure that is not much larger than this is not reliable.
In this example, the @code{seconds} field for @code{mcount} might well be 0
or 0.02 in another run. @xref{Sampling Error}, for a complete discussion.

Here is what the fields in each line mean:

@table @code
@item % time
This is the percentage of the total execution time your program spent
in this function. These should all add up to 100%.

@item seconds
This is the total number of seconds the computer spent executing the
user code of this function.

@item cumsec
This is the cumulative total number of seconds the computer spent
executing this function, plus the time spent in all the functions
above this one in this table.

@item calls
This is the total number of times the function was called. If the
function was never called, or the number of times it was called cannot
be determined (probably because the function was not compiled with
profiling enabled), the @dfn{calls} field is blank.

@item function
This is the name of the function.
@end table
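As a check on how these fields fit together, the percentages in the sample
above can be recomputed from the @code{seconds} column and the 0.24-second
total:

@example
a:      0.19 / 0.24 = 79.17%
main:   0.04 / 0.24 = 16.67%
mcount: 0.01 / 0.24 =  4.17%
@end example

@noindent
and each @code{cumsec} value is the running sum of the @code{seconds}
values above it.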
@node Call Graph, Implementation, Flat Profile, Top
@chapter How to Read the Call Graph
@cindex call graph

The @dfn{call graph} shows how much time was spent in each function
and its children. From this information, you can find functions that,
while they themselves may not have used much time, called other
functions that did use unusual amounts of time.

Here is a sample call graph from a small program. This call graph came
from the same @code{gprof} run as the flat profile example in the previous
chapter.

@example
index  % time    self  children  called     name
                                            <spontaneous>
[1]    100.00    0       0.23        0     start [1]
                 0.04    0.19        1/1       main [2]
----------------------------------------
                 0.04    0.19        1/1       start [1]
[2]    100.00    0.04    0.19        1     main [2]
                 0.19    0           1/1       a [3]
----------------------------------------
                 0.19    0           1/1       main [2]
[3]     82.61    0.19    0           1+5   a [3]
----------------------------------------
@end example

The lines full of dashes divide this table into @dfn{entries}, one for each
function. Each entry has one or more lines.

In each entry, the primary line is the one that starts with an index number
in square brackets. The end of this line says which function the entry is
for. The preceding lines in the entry describe the callers of this
function and the following lines describe its subroutines (also called
@dfn{children} when we speak of the call graph).

The entries are sorted by time spent in the function and its subroutines.

The internal profiling functions @code{mcount} and @code{profil}
(@pxref{Flat Profile}) are never mentioned in the call graph.
@menu
* Primary::       Details of the primary line's contents.
* Callers::       Details of caller-lines' contents.
* Subroutines::   Details of subroutine-lines' contents.
* Cycles::        When there are cycles of recursion,
                    such as @code{a} calls @code{b} calls @code{a}@dots{}
@end menu
@node Primary, Callers, Call Graph, Call Graph
@section The Primary Line

The @dfn{primary line} in a call graph entry is the line that
describes the function which the entry is about and gives the overall
statistics for this function.

For reference, we repeat the primary line from the entry for function
@code{a} in our main example, together with the heading line that shows the
names of the fields:

@example
index  % time    self  children  called     name
@dots{}
[3]     82.61    0.19    0           1+5   a [3]
@end example

Here is what the fields in the primary line mean:

@table @code
@item index
Entries are numbered with consecutive integers. Each function
therefore has an index number, which appears at the beginning of its
primary line.

Each cross-reference to a function, as a caller or subroutine of
another, gives its index number as well as its name. The index number
guides you if you wish to look for the entry for that function.

@item % time
This is the percentage of the total time that was spent in this
function, including time spent in subroutines called from this
function.

The time spent in this function is counted again for the callers of
this function. Therefore, adding up these percentages is meaningless.

@item self
This is the total amount of time spent in this function. This
should be identical to the number printed in the @code{seconds} field
for this function in the flat profile.

@item children
This is the total amount of time spent in the subroutine calls made by
this function. This should be equal to the sum of all the @code{self}
and @code{children} entries of the children listed directly below this
function.

@item called
This is the number of times the function was called.

If the function called itself recursively, there are two numbers,
separated by a @samp{+}. The first number counts non-recursive calls,
and the second counts recursive calls.

In the example above, the function @code{a} called itself five times,
and was called once from @code{main}.

@item name
This is the name of the current function. The index number is
repeated after it.

If the function is part of a cycle of recursion, the cycle number is
printed between the function's name and the index number
(@pxref{Cycles}). For example, if function @code{gnurr} is part of
cycle number one, and has index number twelve, its primary line would
end like this:

@example
gnurr <cycle 1> [12]
@end example
@end table
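As a rough check on the sample figures, the call graph's total is the 0.23
seconds attributed to @code{start} (the profiling overhead of @code{mcount}
and @code{profil} is excluded from the call graph), and each @code{% time}
value appears to be @code{self} plus @code{children} divided by that total:

@example
main:  (0.04 + 0.19) / 0.23 = 100.00%
a:     (0.19 + 0)    / 0.23 =  82.61%
@end example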
@node Callers, Subroutines, Primary, Call Graph
@section Lines for a Function's Callers

A function's entry has a line for each function it was called by.
These lines' fields correspond to the fields of the primary line, but
their meanings are different because of the difference in context.

For reference, we repeat two lines from the entry for the function
@code{a}, the primary line and one caller-line preceding it, together
with the heading line that shows the names of the fields:

@example
index  % time    self  children  called     name
@dots{}
                 0.19    0           1/1       main [2]
[3]     82.61    0.19    0           1+5   a [3]
@end example

Here are the meanings of the fields in the caller-line for @code{a}
called from @code{main}:

@table @code
@item self
An estimate of the amount of time spent in @code{a} itself when it was
called from @code{main}.

@item children
An estimate of the amount of time spent in @code{a}'s subroutines when
@code{a} was called from @code{main}.

The sum of the @code{self} and @code{children} fields is an estimate
of the amount of time spent within calls to @code{a} from @code{main}.

@item called
Two numbers: the number of times @code{a} was called from @code{main},
followed by the total number of nonrecursive calls to @code{a} from
all its callers.

@item name and index number
The name of the caller of @code{a} to which this line applies,
followed by the caller's index number.

Not all functions have entries in the call graph; some
options to @code{gprof} request the omission of certain functions.
When a caller has no entry of its own, it still has caller-lines
in the entries of the functions it calls. Since this caller
has no index number, the string @samp{[not printed]} is used
instead of one.

If the caller is part of a recursion cycle, the cycle number is
printed between the name and the index number.
@end table

If the identity of the callers of a function cannot be determined, a
dummy caller-line is printed which has @samp{<spontaneous>} as the
``caller's name'' and all other fields blank. This can happen for
signal handlers.
@c What if some calls have determinable callers' names but not all?
@node Subroutines, Cycles, Callers, Call Graph
@section Lines for a Function's Subroutines

A function's entry has a line for each of its subroutines---in other
words, a line for each other function that it called. These lines'
fields correspond to the fields of the primary line, but their meanings
are different because of the difference in context.

For reference, we repeat two lines from the entry for the function
@code{main}, the primary line and a line for a subroutine, together
with the heading line that shows the names of the fields:

@example
index  % time    self  children  called     name
@dots{}
[2]    100.00    0.04    0.19        1     main [2]
                 0.19    0           1/1       a [3]
@end example

Here are the meanings of the fields in the subroutine-line for @code{main}
calling @code{a}:

@table @code
@item self
An estimate of the amount of time spent directly within @code{a}
when @code{a} was called from @code{main}.

@item children
An estimate of the amount of time spent in subroutines of @code{a}
when @code{a} was called from @code{main}.

The sum of the @code{self} and @code{children} fields is an estimate
of the total time spent in calls to @code{a} from @code{main}.

@item called
Two numbers, the number of calls to @code{a} from @code{main}
followed by the total number of nonrecursive calls to @code{a}.

@item name
The name of the subroutine of @code{main} to which this line applies,
followed by the subroutine's index number. If the subroutine is
a function omitted from the call graph, it has no index number,
so @samp{[not printed]} appears instead.

If the subroutine is part of a recursion cycle, the cycle number is
printed between the name and the index number.
@end table
@node Cycles,, Subroutines, Call Graph
@section How Mutually Recursive Functions Are Described
@cindex cycle
@cindex recursion cycle

The graph may be complicated by the presence of @dfn{cycles of
recursion} in the call graph. A cycle exists if a function calls
another function that (directly or indirectly) calls (or appears to
call) the original function. For example: if @code{a} calls @code{b},
and @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle.

Whenever there are call-paths both ways between a pair of functions, they
belong to the same cycle. If @code{a} and @code{b} call each other and
@code{b} and @code{c} call each other, all three make one cycle. Note that
even if @code{b} calls @code{a} only when it was not itself called from
@code{a}, @code{gprof} cannot determine this, so @code{a} and @code{b} are
still considered a cycle.

The cycles are numbered with consecutive integers. When a function
belongs to a cycle, each time the function name appears in the call graph
it is followed by @samp{<cycle @var{number}>}.

The reason cycles matter is that they make the time values in the call
graph paradoxical. The ``time spent in children'' of @code{a} should
include the time spent in its subroutine @code{b} and in @code{b}'s
subroutines---but one of @code{b}'s subroutines is @code{a}! How much of
@code{a}'s time should be included in the children of @code{a}, when
@code{a} is indirectly recursive?

The way @code{gprof} resolves this paradox is by creating a single entry
for the cycle as a whole. The primary line of this entry describes the
total time spent directly in the functions of the cycle. The
``subroutines'' of the cycle are the individual functions of the cycle, and
all other functions that were called directly by them. The ``callers'' of
the cycle are the functions, outside the cycle, that called functions in
the cycle.

Here is a portion of the call graph which shows a cycle containing
functions @code{a} and @code{b}. The cycle was entered by a call to
@code{a} from @code{main}; both @code{a} and @code{b} called @code{c}.@refill
@example
index  % time    self  children  called     name
----------------------------------------
                 1.77    0           1/1       main [2]
[3]     91.71    1.77    0           1+5   <cycle 1 as a whole> [3]
                 1.02    0             3       b <cycle 1> [4]
                 0.75    0             2       a <cycle 1> [5]
----------------------------------------
                                       3       a <cycle 1> [5]
[4]     52.85    1.02    0           0     b <cycle 1> [4]
                                       2       a <cycle 1> [5]
                 0       0           3/6       c [6]
----------------------------------------
                 1.77    0           1/1       main [2]
                                       2       b <cycle 1> [4]
[5]     38.86    0.75    0           1     a <cycle 1> [5]
                                       3       b <cycle 1> [4]
                 0       0           3/6       c [6]
----------------------------------------
@end example
@noindent
(The entire call graph for this program contains in addition an entry for
@code{main}, which calls @code{a}, and an entry for @code{c}, with callers
@code{a} and @code{b}.)

Here is that complete call graph:
@example
index  % time    self  children  called     name
                                            <spontaneous>
[1]    100.00    0       1.93        0     start [1]
                 0.16    1.77        1/1       main [2]
----------------------------------------
                 0.16    1.77        1/1       start [1]
[2]    100.00    0.16    1.77        1     main [2]
                 1.77    0           1/1       a <cycle 1> [5]
----------------------------------------
                 1.77    0           1/1       main [2]
[3]     91.71    1.77    0           1+5   <cycle 1 as a whole> [3]
                 1.02    0             3       b <cycle 1> [4]
                 0.75    0             2       a <cycle 1> [5]
                 0       0           6/6       c [6]
----------------------------------------
                                       3       a <cycle 1> [5]
[4]     52.85    1.02    0           0     b <cycle 1> [4]
                                       2       a <cycle 1> [5]
                 0       0           3/6       c [6]
----------------------------------------
                 1.77    0           1/1       main [2]
                                       2       b <cycle 1> [4]
[5]     38.86    0.75    0           1     a <cycle 1> [5]
                                       3       b <cycle 1> [4]
                 0       0           3/6       c [6]
----------------------------------------
                 0       0           3/6       b <cycle 1> [4]
                 0       0           3/6       a <cycle 1> [5]
[6]      0.00    0       0           6     c [6]
----------------------------------------
@end example
The @code{self} field of the cycle's primary line is the total time
spent in all the functions of the cycle. It equals the sum of the
@code{self} fields for the individual functions in the cycle, found
in the cycle's entry, in the subroutine lines for these functions.

The @code{children} fields of the cycle's primary line and subroutine lines
count only subroutines outside the cycle. Even though @code{a} calls
@code{b}, the time spent in those calls to @code{b} is not counted in
@code{a}'s @code{children} time. Thus, we do not encounter the problem of
what to do when the time in those calls to @code{b} includes indirect
recursive calls back to @code{a}.

The @code{children} field of a caller-line in the cycle's entry estimates
the amount of time spent @emph{in the whole cycle}, and its other
subroutines, on the times when that caller called a function in the cycle.

The @code{calls} field in the primary line for the cycle has two numbers:
first, the number of times functions in the cycle were called by functions
outside the cycle; second, the number of times they were called by
functions in the cycle (including times when a function in the cycle calls
itself). This is a generalization of the usual split into nonrecursive and
recursive calls.

The @code{calls} field of a subroutine-line for a cycle member in the
cycle's entry says how many times that function was called from functions in
the cycle. The total of all these is the second number in the primary line's
@code{calls} field.

In the individual entry for a function in a cycle, the other functions in
the same cycle can appear as subroutines and as callers. These lines show
how many times each function in the cycle called or was called from each other
function in the cycle. The @code{self} and @code{children} fields in these
lines are blank because of the difficulty of defining meanings for them
when recursion is going on.
@node Implementation, Sampling Error, Call Graph, Top
@chapter Implementation of Profiling

Profiling works by changing how every function in your program is compiled
so that when it is called, it will stash away some information about where
it was called from. From this, the profiler can figure out what function
called it, and can count how many times it was called. This change is made
by the compiler when your program is compiled with the @samp{-pg} option.
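As an aside, one rough way to confirm that an executable really was built
for profiling is to look for the profiling entry point in its symbol table;
this is only a sketch, and the exact symbol name (@code{mcount}, perhaps
with a leading underscore) varies from system to system:

@example
nm myprog | grep mcount
@end example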
Profiling also involves watching your program as it runs, and keeping a
histogram of where the program counter happens to be every now and then.
Typically the program counter is looked at around 100 times per second of
run time, but the exact frequency may vary from system to system.

A special startup routine allocates memory for the histogram and sets up a
clock signal handler to make entries in it. Use of this special startup
routine is one of the effects of using @samp{cc -pg} to link. The startup
file also includes an @code{exit} function which is responsible for writing
the file @file{gmon.out}.

Number-of-calls information for library routines is collected by using a
special version of the C library. The programs in it are the same as in
the usual C library, but they were compiled with @samp{-pg}. If you link
your program with @samp{cc -pg}, it automatically uses the profiling
version of the library.
The output of @code{gprof} gives no indication of parts of your program that
are limited by I/O or swapping bandwidth. This is because samples of the
program counter are taken at fixed intervals of run time. Therefore, the
time measurements in @code{gprof} output say nothing about time that your
program was not running. For example, a part of the program that creates
so much data that it cannot all fit in physical memory at once may run very
slowly due to thrashing, but @code{gprof} will say it uses little time. On
the other hand, sampling by run time has the advantage that the amount of
load due to other users won't directly affect the output you get.
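One rough way to notice such effects, assuming your system has the usual
@code{time} command, is to compare the elapsed (``real'') time of a run
with the CPU time the profile accounts for; a large difference suggests
time spent waiting for I/O or swapping rather than computing:

@example
time myprog
@end example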
@node Sampling Error, Assumptions, Implementation, Top
@chapter Statistical Inaccuracy of @code{gprof} Output

The run-time figures that @code{gprof} gives you are based on a sampling
process, so they are subject to statistical inaccuracy. If a function runs
only a small amount of time, so that on the average the sampling process
ought to catch that function in the act only once, there is a pretty good
chance it will actually find that function zero times, or twice.

By contrast, the number-of-calls figures are derived by counting, not
sampling. They are completely accurate and will not vary from run to run
if your program is deterministic.

The @dfn{sampling period} that is printed at the beginning of the flat
profile says how often samples are taken. The rule of thumb is that a
run-time figure is accurate if it is considerably bigger than the sampling
period.

The actual amount of error is usually more than one sampling period. In
fact, if a value is @var{n} times the sampling period, the @emph{expected}
error in it is the square-root of @var{n} sampling periods. If the
sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second, the
expected error in @code{foo}'s run-time is 0.1 seconds. It is likely to
vary this much @emph{on the average} from one profiling run to the next.
(@emph{Sometimes} it will vary more.)
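Spelled out, the arithmetic behind that example is:

@example
n              = 1 second / 0.01 seconds  = 100 sampling periods
expected error = sqrt(100) * 0.01 seconds = 0.1 seconds
@end example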
This does not mean that a small run-time figure is devoid of information.
If the program's @emph{total} run-time is large, a small run-time for one
function does tell you that that function used an insignificant fraction of
the whole program's time. Usually this means it is not worth optimizing.

One way to get more accuracy is to give your program more (but similar)
input data so it will take longer. Another way is to combine the data from
several runs, using the @samp{-s} option of @code{gprof}. Here is how:

@enumerate
@item
Run your program once.

@item
Issue the command @samp{mv gmon.out gmon.sum}.

@item
Run your program again, the same as before.

@item
Merge the new data in @file{gmon.out} into @file{gmon.sum} with this command:

@example
gprof -s @var{executable-file} gmon.out gmon.sum
@end example

@item
Repeat the last two steps as often as you wish.

@item
Analyze the cumulative data using this command:

@example
gprof @var{executable-file} gmon.sum > @var{output-file}
@end example
@end enumerate
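Put together, such a session might look like this (the program name and
input files are only illustrations):

@example
myprog < input1
mv gmon.out gmon.sum
myprog < input2
gprof -s myprog gmon.out gmon.sum
myprog < input3
gprof -s myprog gmon.out gmon.sum
gprof myprog gmon.sum > profile-summary
@end example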
@node Assumptions, Incompatibilities, Sampling Error, Top
@chapter Estimating @code{children} Times Uses an Assumption

Some of the figures in the call graph are estimates---for example, the
@code{children} time values and all the time figures in caller and
subroutine lines.

There is no direct information about these measurements in the profile
data itself. Instead, @code{gprof} estimates them by making an assumption
about your program that might or might not be true.

The assumption made is that the average time spent in each call to any
function @code{foo} is not correlated with who called @code{foo}. If
@code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came
from @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s
@code{children} time, by assumption.
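In other words, the estimate is simply a proportional share of @code{foo}'s
total time:

@example
time charged to a's children = 5 seconds * (2/5) = 2 seconds
@end example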
This assumption is usually true enough, but for some programs it is far
from true. Suppose that @code{foo} returns very quickly when its argument
is zero; suppose that @code{a} always passes zero as an argument, while
other callers of @code{foo} pass other arguments. In this program, all the
time spent in @code{foo} is in the calls from callers other than @code{a}.
But @code{gprof} has no way of knowing this; it will blindly and
incorrectly charge 2 seconds of time in @code{foo} to the children of
@code{a}.

We hope some day to put more complete data into @file{gmon.out}, so that
this assumption is no longer needed, if we can figure out how. For the
nonce, the estimated figures are usually more useful than misleading.
@node Incompatibilities, , Assumptions, Top
@chapter Incompatibilities with Unix @code{gprof}

GNU @code{gprof} and Berkeley Unix @code{gprof} use the same data file
@file{gmon.out}, and provide essentially the same information. But there
are a few differences.@refill

GNU @code{gprof} does not support the @samp{-c} option which prints a
static call graph based on reading the machine language of your
program. We think that program cross-references ought to be based on
the source files, which can be analyzed in a machine-independent
fashion.@refill

For a recursive function, Unix @code{gprof} lists the function as a parent
and as a child, with a @code{calls} field that lists the number of
recursive calls. GNU @code{gprof} omits these lines and puts the number of
recursive calls in the primary line.

When a function is suppressed from the call graph with @samp{-e}, GNU
@code{gprof} still lists it as a subroutine of functions that call it.

The function names printed in GNU @code{gprof} output do not include
the leading underscores that are added internally to the front of all
C identifiers on many operating systems.

The blurbs, field widths, and output formats are different. GNU
@code{gprof} prints blurbs after the tables, so that you can see the
tables without skipping the blurbs.

@contents
@bye