@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2013, 2015, 2017 Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Statprof
@section Statprof

Statprof is a statistical profiler for Guile.

A simple use of statprof would look like this:

@example
(use-modules (statprof))
(statprof (lambda ()
            (map 1+ (iota 1000000))
            #f))
@end example

This would run the thunk with statistical profiling, finally displaying
a flat table of statistics which could look something like this:

@example
  %     cumulative     self
 time     seconds     seconds    procedure
 57.14    39769.73       0.07    ice-9/boot-9.scm:249:5:map1
 28.57        0.04       0.04    ice-9/boot-9.scm:1165:0:iota
 14.29        0.02       0.02    1+
  0.00        0.12       0.00    <current input>:2:10
---
Sample count: 7
Total time: 0.123490713 seconds (0.201983993 seconds in GC)
@end example

All of the numerical data with the exception of the calls column is
statistically approximate.  In the following column descriptions, and
in all of statprof, ``time'' refers to execution time (both user and
system), not wall clock time.

The @code{% time} column indicates the percentage of the run time spent
inside the procedure itself (not counting children).  It is calculated
as @code{self seconds}, measuring the amount of time spent in the
procedure, divided by the total run time.

@code{cumulative seconds} also counts time spent in children of a
function.  For recursive functions, this can exceed the total time, as
in our example above, because each activation on the stack adds to the
cumulative time.

Finally, the GC time measures the time spent in the garbage collector.
On systems with multiple cores, this time can be larger than the run
time, because it counts time spent in all threads, and GC will run its
``marking'' phase in parallel.  If GC time is a significant fraction of
the run time, that means that most time in your program is spent
allocating objects and cleaning up after those allocations.  To speed
up your program, one good place to start would be to look at how to
reduce the allocation rate.

Statprof's main mode of operation is as a statistical profiler.
However, statprof can also run in a ``precise'' mode.  Pass the
@code{#:count-calls? #t} keyword argument to @code{statprof} to record
all calls:

@example
(use-modules (statprof))
(statprof (lambda ()
            (map 1+ (iota 1000000))
            #f)
          #:count-calls? #t)
@end example

The result has an additional @code{calls} column:

@example
  %     cumulative     self
 time     seconds     seconds      calls    procedure
 82.26        0.73       0.73    1000000    1+
 11.29   420925.80       0.10    1000001    ice-9/boot-9.scm:249:5:map1
  4.84        0.06       0.04          1    ice-9/boot-9.scm:1165:0:iota
[...]
---
Sample count: 62
Total time: 0.893098065 seconds (1.222796536 seconds in GC)
@end example

As you can see, the profile is perturbed: @code{1+} ends up on top,
whereas it was not marked as hot in the earlier profile.  This is
because the overhead of call-counting unfairly penalizes calls.  Still,
this precise mode can be useful at times for making algorithmic
optimizations based on the exact call counts.

@heading Implementation notes

The profiler works by setting the unix profiling signal
@code{ITIMER_PROF} to go off after the interval you define in the call
to @code{statprof-reset}.  When the signal fires, a sampling routine
runs which crawls up the stack, recording all instruction pointers into
a buffer.  After the sample is complete, the profiler resets the
profiling timer to fire again after the appropriate interval.

Later, when profiling stops, that log buffer is analyzed to produce the
``self seconds'' and ``cumulative seconds'' statistics.  A procedure at
the top of the stack counts toward ``self'' samples, and everything on
the stack counts towards ``cumulative'' samples.

While the profiler is running it measures how much CPU time (system and
user -- which is also what @code{ITIMER_PROF} tracks) has elapsed while
code has been executing within the profiler.  Only run time counts
towards the profile, not wall-clock time.  For example, sleeping and
waiting for input or output do not cause the timer clock to advance.
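
The following is a much simplified sketch of the same mechanism,
written directly against Guile's @code{sigaction} and @code{setitimer}
primitives.  It is not statprof's actual code; it merely counts how
often the profiling timer fires while some work runs, rather than
recording stacks:

@example
(define samples 0)

;; Record a ``sample'' each time the profiling timer fires.
(sigaction SIGPROF
  (lambda (signum)
    (set! samples (+ samples 1))))

;; Arm ITIMER_PROF: first expiry after 10000 microseconds of CPU
;; time, then re-arm every 10000 microseconds.
(setitimer ITIMER_PROF 0 10000 0 10000)

(map 1+ (iota 1000000))        ; the code being ``profiled''

;; Disarm the timer and look at how many samples we got.
(setitimer ITIMER_PROF 0 0 0 0)
samples
@end example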

@heading Usage

@deffn {Scheme Procedure} statprof thunk @
       [#:loop loop=1] [#:hz hz=100] @
       [#:port port=(current-output-port)] @
       [#:count-calls? count-calls?=#f] @
       [#:display-style display-style='flat]
Profile the execution of @var{thunk}, and return its return values.

The stack will be sampled @var{hz} times per second, and the thunk
itself will be called @var{loop} times.

If @var{count-calls?} is true, all procedure calls will be recorded.
This operation is somewhat expensive.

After the @var{thunk} has been profiled, print out a profile to
@var{port}.  If @var{display-style} is @code{flat}, the results will be
printed as a flat profile; if it is @code{tree}, they will be printed
as a tree profile.

Note that @code{statprof} requires a working profiling timer.  Some
platforms do not support profiling timers.  @code{(provided?
'ITIMER_PROF)} can be used to check for support of profiling timers.
@end deffn
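
For example, a caller can check for timer support before profiling and
request a tree-style report.  This is only a sketch; any workload will
do in place of the @code{map} call:

@example
(use-modules (statprof))

;; Only profile when the platform has a profiling timer.
(when (provided? 'ITIMER_PROF)
  (statprof (lambda ()
              (map 1+ (iota 1000000))
              #f)
            #:display-style 'tree))
@end example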

Profiling can also be enabled and disabled manually.

@deffn {Scheme Procedure} statprof-active?
Returns @code{#t} if @code{statprof-start} has been called more times
than @code{statprof-stop}, @code{#f} otherwise.
@end deffn

@deffn {Scheme Procedure} statprof-start
@deffnx {Scheme Procedure} statprof-stop
Start or stop the profiler.
@end deffn

@deffn {Scheme Procedure} statprof-reset sample-seconds sample-microseconds count-calls?
Reset the profiling sample interval to @var{sample-seconds} and
@var{sample-microseconds}.  If @var{count-calls?} is true, arrange to
instrument procedure calls as well as collecting statistical profiling
data.
@end deffn

If you use the manual @code{statprof-start}/@code{statprof-stop}
interface, an implicit statprof state will persist starting from the
last call to @code{statprof-reset}, or the first call to
@code{statprof-start}.  There are a number of accessors to fetch
statistics from this implicit state.

@deffn {Scheme Procedure} statprof-accumulated-time
Returns the time accumulated during the last statprof run.
@end deffn

@deffn {Scheme Procedure} statprof-sample-count
Returns the number of samples taken during the last statprof run.
@end deffn
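
Putting the manual interface together, a session might look like the
following sketch, which samples every 10 milliseconds without call
counting and then inspects the implicit state:

@example
(use-modules (statprof))

;; Sample every 10 milliseconds (0 seconds + 10000 microseconds),
;; without call counting.
(statprof-reset 0 10000 #f)
(statprof-start)
(map 1+ (iota 1000000))
(statprof-stop)

(statprof-sample-count)      ; number of samples taken
(statprof-accumulated-time)  ; run time, in seconds
(statprof-display)           ; print a flat profile
@end example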

@deffn {Scheme Procedure} statprof-fold-call-data proc init
Fold @var{proc} over the call-data accumulated by statprof.  This
procedure cannot be called while statprof is active.

@var{proc} will be called with two arguments, @var{call-data} and
@var{prior-result}.
@end deffn

@deffn {Scheme Procedure} statprof-proc-call-data proc
Returns the call-data associated with @var{proc}, or @code{#f} if none
is available.
@end deffn

@deffn {Scheme Procedure} statprof-call-data-name cd
@deffnx {Scheme Procedure} statprof-call-data-calls cd
@deffnx {Scheme Procedure} statprof-call-data-cum-samples cd
@deffnx {Scheme Procedure} statprof-call-data-self-samples cd
Accessors for the fields in a statprof call-data object.
@end deffn
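
For instance, once a profiling run has finished (and while statprof is
not active), the call data can be folded into an association list of
procedure names and self-sample counts.  This is just a sketch of one
possible use:

@example
(use-modules (statprof))

;; Build an alist of (name . self-samples) pairs, one entry per
;; procedure that statprof has call data for.
(define (self-sample-alist)
  (statprof-fold-call-data
   (lambda (call-data prior-result)
     (cons (cons (statprof-call-data-name call-data)
                 (statprof-call-data-self-samples call-data))
           prior-result))
   '()))
@end example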

@deffn {Scheme Procedure} statprof-call-data->stats call-data
Returns an object of type @code{statprof-stats}.
@end deffn

@deffn {Scheme Procedure} statprof-stats-proc-name stats
@deffnx {Scheme Procedure} statprof-stats-%-time-in-proc stats
@deffnx {Scheme Procedure} statprof-stats-cum-secs-in-proc stats
@deffnx {Scheme Procedure} statprof-stats-self-secs-in-proc stats
@deffnx {Scheme Procedure} statprof-stats-calls stats
@deffnx {Scheme Procedure} statprof-stats-self-secs-per-call stats
@deffnx {Scheme Procedure} statprof-stats-cum-secs-per-call stats
Accessors for the fields in a @code{statprof-stats} object.
@end deffn
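
As a sketch of how these accessors fit together, the recorded
statistics for a single procedure can be looked up and converted,
assuming that procedure was sampled during the last run:

@example
(use-modules (statprof))

;; Inspect the recorded statistics for 1+, if any were collected.
(let ((cd (statprof-proc-call-data 1+)))
  (and cd
       (let ((stats (statprof-call-data->stats cd)))
         (list (statprof-stats-proc-name stats)
               (statprof-stats-self-secs-in-proc stats)
               (statprof-stats-cum-secs-in-proc stats)))))
@end example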

@deffn {Scheme Procedure} statprof-display @
       [port=(current-output-port)] [#:style style=flat]
Displays a summary of the statistics collected.  Possible values for
@var{style} include:

@table @code
@item flat
Display a traditional gprof-style flat profile.
@item anomalies
Find statistical anomalies in the data.
@item tree
Display a tree profile.
@end table
@end deffn

@deffn {Scheme Procedure} statprof-fetch-stacks
Returns a list of stacks, as they were captured since the last call to
@code{statprof-reset}.
@end deffn

@deffn {Scheme Procedure} statprof-fetch-call-tree [#:precise? precise?=#f]
Return a call tree for the previous statprof run.

The return value is a list of nodes.  A node is a list of the form:

@example
node ::= (@var{proc} @var{count} . @var{nodes})
@end example

The @var{proc} is a printable representation of a procedure, as a
string.  If @var{precise?} is false, which is the default, then a node
corresponds to a procedure invocation.  If it is true, then a node
corresponds to a return point in a procedure.  Passing @code{#:precise?
#t} allows a user to distinguish different source lines in a procedure,
but usually it is too much detail, so it is off by default.
@end deffn
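
Building on the node grammar above, the following sketch walks such a
tree and prints each node's procedure and sample count, assuming the
return value is a list of nodes as described:

@example
(use-modules (ice-9 format)
             (statprof))

;; Print one line per node, indenting children under their parent.
(define (print-call-tree nodes depth)
  (for-each
   (lambda (node)
     (let ((proc (car node))
           (count (cadr node))
           (children (cddr node)))
       (format #t "~a~a: ~a samples~%"
               (make-string (* 2 depth) #\space) proc count)
       (print-call-tree children (+ depth 1))))
   nodes))

(print-call-tree (statprof-fetch-call-tree) 0)
@end example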

@deffn {Scheme Procedure} gcprof thunk [#:loop]
Like the @code{statprof} procedure, but instead of profiling CPU time,
we profile garbage collection.

The stack will be sampled soon after every garbage collection during
the evaluation of @var{thunk}, yielding an approximate idea of what is
causing allocation in your program.

Since GC does not occur very frequently, you may need to use the
@var{loop} parameter, to cause @var{thunk} to be called @var{loop}
times.
@end deffn
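
For example, an allocation-heavy thunk can be run several times so that
enough collections occur to give a useful profile.  This is only a
sketch, and the loop count is arbitrary:

@example
(use-modules (statprof))

;; Profile allocation by sampling at each GC; run the thunk ten
;; times so that several collections occur.
(gcprof (lambda ()
          (map 1+ (iota 1000000))
          #f)
        #:loop 10)
@end example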