- perf-report(1)
- ==============
- NAME
- ----
- perf-report - Read perf.data (created by perf record) and display the profile
- SYNOPSIS
- --------
- [verse]
- 'perf report' [-i <file> | --input=file]
- DESCRIPTION
- -----------
- This command displays the performance counter profile information recorded
- via perf record.
- OPTIONS
- -------
- -i::
- --input=::
- Input file name. (default: perf.data unless stdin is a fifo)
- -v::
- --verbose::
- Be more verbose. (show symbol address, etc)
- -n::
- --show-nr-samples::
- Show the number of samples for each symbol
- --show-cpu-utilization::
- Show sample percentage for different cpu modes.
- -T::
- --threads::
- Show per-thread event counters. The input data file should be recorded
- with -s option.
- -c::
- --comms=::
- Only consider symbols in these comms. CSV that understands
- file://filename entries. This option will affect the percentage of
- the overhead column. See --percentage for more info.
- --pid=::
- Only show events for given process ID (comma separated list).
- --tid=::
- Only show events for given thread ID (comma separated list).
- -d::
- --dsos=::
- Only consider symbols in these dsos. CSV that understands
- file://filename entries. This option will affect the percentage of
- the overhead column. See --percentage for more info.
- -S::
- --symbols=::
- Only consider these symbols. CSV that understands
- file://filename entries. This option will affect the percentage of
- the overhead column. See --percentage for more info.
- --symbol-filter=::
- Only show symbols that match (partially) with this filter.
- -U::
- --hide-unresolved::
- Only display entries resolved to a symbol.
- -s::
- --sort=::
- Sort histogram entries by given key(s) - multiple keys can be specified
- in CSV format. Following sort keys are available:
- pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.
- Each key has following meaning:
- - comm: command (name) of the task which can be read via /proc/<pid>/comm
- - pid: command and tid of the task
- - dso: name of library or module executed at the time of sample
- - symbol: name of function executed at the time of sample
- - parent: name of function matched to the parent regex filter. Unmatched
- entries are displayed as "[other]".
- - cpu: cpu number the task ran at the time of sample
- - socket: processor socket number the task ran at the time of sample
- - srcline: filename and line number executed at the time of sample. The
- DWARF debugging info must be provided.
- - srcfile: file name of the source file of the sample. Requires DWARF
- information.
- - weight: Event specific weight, e.g. memory latency or transaction
- abort cost. This is the global weight.
- - local_weight: Local weight version of the weight above.
- - transaction: Transaction abort flags.
- - overhead: Overhead percentage of sample
- - overhead_sys: Overhead percentage of sample running in system mode
- - overhead_us: Overhead percentage of sample running in user mode
- - overhead_guest_sys: Overhead percentage of sample running in system mode
- on guest machine
- - overhead_guest_us: Overhead percentage of sample running in user mode on
- guest machine
- - sample: Number of samples
- - period: Raw number of event count of sample
- By default, comm, dso and symbol keys are used.
- (i.e. --sort comm,dso,symbol)
- If --branch-stack option is used, following sort keys are also
- available:
- - dso_from: name of library or module branched from
- - dso_to: name of library or module branched to
- - symbol_from: name of function branched from
- - symbol_to: name of function branched to
- - srcline_from: source file and line branched from
- - srcline_to: source file and line branched to
- - mispredict: "N" for predicted branch, "Y" for mispredicted branch
- - in_tx: branch in TSX transaction
- - abort: TSX transaction abort.
- - cycles: Cycles in basic block
- And default sort keys are changed to comm, dso_from, symbol_from, dso_to
- and symbol_to, see '--branch-stack'.
- If the --mem-mode option is used, the following sort keys are also available
- (incompatible with --branch-stack):
- symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
- - symbol_daddr: name of data symbol being executed on at the time of sample
- - dso_daddr: name of library or module containing the data being executed
- on at the time of the sample
- - locked: whether the bus was locked at the time of the sample
- - tlb: type of tlb access for the data at the time of the sample
- - mem: type of memory access for the data at the time of the sample
- - snoop: type of snoop (if any) for the data at the time of the sample
- - dcacheline: the cacheline the data address is on at the time of the sample
- And the default sort keys are changed to local_weight, mem, sym, dso,
- symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.
- If the data file has tracepoint event(s), following (dynamic) sort keys
- are also available:
- trace, trace_fields, [<event>.]<field>[/raw]
- - trace: pretty printed trace output in a single column
- - trace_fields: fields in tracepoints in separate columns
- - <field name>: optional event and field name for a specific field
- The last form consists of event and field names. If the event name is
- omitted, all events are searched for a matching field name. The matched
- field is shown only for events that have the field. The event name
- supports substring matching, so the user doesn't need to specify the full
- subsystem and event name every time. For example, the 'sched:sched_switch'
- event can be shortened to 'switch' as long as it's not ambiguous. An event
- can also be specified by its index (starting from 1) preceded by '%'.
- So '%1' is the first event, '%2' is the second, and so on.
- The field name can have '/raw' suffix which disables pretty printing
- and shows raw field value like hex numbers. The --raw-trace option
- has the same effect for all dynamic sort keys.
- The default sort keys are changed to 'trace' if all events in the data
- file are tracepoints.
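The dynamic sort key syntax described above can be illustrated with a small sketch. This is not perf's implementation; the helper names `parse_dynamic_key` and `resolve_event` are hypothetical, and the event and field names in the usage note are only examples.

```python
def parse_dynamic_key(key):
    """Split '[<event>.]<field>[/raw]' into (event, field, raw)."""
    raw = key.endswith("/raw")
    if raw:
        key = key[: -len("/raw")]
    event, sep, field = key.partition(".")
    if not sep:
        # No event part was given: the whole key is the field name.
        return None, key, raw
    return event, field, raw


def resolve_event(spec, events):
    """Resolve an event spec against the ordered list of event names.

    '%N' selects the N-th event (starting from 1). Anything else is
    treated as a substring and must match exactly one event.
    """
    if spec.startswith("%"):
        return events[int(spec[1:]) - 1]
    matches = [name for name in events if spec in name]
    if len(matches) != 1:
        raise ValueError(f"event {spec!r} matches {len(matches)} events")
    return matches[0]
```

With events `["sched:sched_switch", "sched:sched_wakeup"]`, the spec `switch` resolves to the first event and `%2` to the second, while `sched` is rejected as ambiguous.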
- -F::
- --fields=::
- Specify output field - multiple keys can be specified in CSV format.
- Following fields are available:
- overhead, overhead_sys, overhead_us, overhead_children, sample and period.
- Also it can contain any sort key(s).
- By default, every sort key not specified in -F will be appended
- automatically.
- -p::
- --parent=<regex>::
- A regex filter to identify parent. The parent is a caller of this
- function and searched through the callchain, thus it requires callchain
- information recorded. The pattern is in the extended regex format and
- defaults to "\^sys_|^do_page_fault", see '--sort parent'.
- -x::
- --exclude-other::
- Only display entries with parent-match.
- -w::
- --column-widths=<width[,width...]>::
- Force each column width to the provided list, for large terminal
- readability. 0 means no limit (default behavior).
- -t::
- --field-separator=::
- Use a special separator character and don't pad with spaces. All
- occurrences of this separator in symbol names (and other output) are
- replaced with a '.' character, which is thus the only invalid separator.
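As a rough illustration of the separator behavior, the sketch below joins output fields without padding and rewrites any separator found inside a field as '.'. The function name `format_row` is hypothetical and the code is not taken from perf.

```python
def format_row(fields, sep):
    """Join fields with sep, unpadded; sep inside a field becomes '.'."""
    return sep.join(str(field).replace(sep, ".") for field in fields)
```

For example, with ',' as the separator, a symbol such as `std::map<int,int>::find` would be printed as `std::map<int.int>::find`, so the number of columns per row stays constant.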
- -D::
- --dump-raw-trace::
- Dump raw trace in ASCII.
- -g::
- --call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
- Display call chains using type, min percent threshold, print limit,
- call order, sort key, optional branch and value. Note that the ordering of
- parameters is not fixed, so any parameter can be given in an arbitrary order.
- One exception is the print_limit, which should be preceded by threshold.
- print_type can be either:
- - flat: single column, linear exposure of call chains.
- - graph: use a graph tree, displaying absolute overhead rates. (default)
- - fractal: like graph, but displays relative rates. Each branch of
- the tree is considered as a new profiled object.
- - folded: call chains are displayed in a line, separated by semicolons
- - none: disable call chain display.
- threshold is a percentage value which specifies a minimum percent to be
- included in the output call graph. Default is 0.5 (%).
- print_limit is only applied when stdio interface is used. It's to limit
- number of call graph entries in a single hist entry. Note that it needs
- to be given after threshold (but not necessarily consecutive).
- Default is 0 (unlimited).
- order can be either:
- - callee: callee based call graph.
- - caller: inverted caller based call graph.
- Default is 'caller' when --children is used, otherwise 'callee'.
- sort_key can be:
- - function: compare on functions (default)
- - address: compare on individual code addresses
- branch can be:
- - branch: include last branch information in callgraph when available.
- Usually more convenient to use --branch-history for this.
- value can be:
- - percent: display overhead percent (default)
- - period: display event period
- - count: display event count
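The parsing rules for the --call-graph argument can be sketched as follows. This is a simplified model built only from the description above, not perf's parser: the function name `parse_call_graph` and the `children` parameter are hypothetical, and the first number seen is assumed to be the threshold and the second the print_limit.

```python
PRINT_TYPES = {"flat", "graph", "fractal", "folded", "none"}
ORDERS = {"callee", "caller"}
SORT_KEYS = {"function", "address"}
VALUES = {"percent", "period", "count"}


def parse_call_graph(arg, children=False):
    """Parse a comma-separated --call-graph argument into a dict."""
    opts = {
        "print_type": "graph",
        "threshold": 0.5,
        "print_limit": 0,
        # Documented default: 'caller' with --children, else 'callee'.
        "order": "caller" if children else "callee",
        "sort_key": "function",
        "branch": False,
        "value": "percent",
    }
    seen_threshold = False
    for token in filter(None, arg.split(",")):
        if token in PRINT_TYPES:
            opts["print_type"] = token
        elif token in ORDERS:
            opts["order"] = token
        elif token in SORT_KEYS:
            opts["sort_key"] = token
        elif token == "branch":
            opts["branch"] = True
        elif token in VALUES:
            opts["value"] = token
        elif not seen_threshold:
            # The first numeric token is taken as the threshold.
            opts["threshold"] = float(token)
            seen_threshold = True
        else:
            # A later numeric token is taken as the print_limit.
            opts["print_limit"] = int(token)
    return opts
```

Keywords are recognized by name, which is why their order does not matter; only the two numbers are positional relative to each other.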
- --children::
- Accumulate callchain of children to parent entry so that they can
- show up in the output. The output will have a new "Children" column
- and will be sorted on the data. It requires that callchains be recorded.
- See the `overhead calculation' section for more details.
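A minimal sketch of the accumulation, assuming each sample carries a callchain ordered from the outermost caller to the sampled function: the "self" value counts only samples taken in the function itself, while the "children" value also counts samples taken in anything it called. The function name `overhead` and the sample data are made up for illustration.

```python
from collections import defaultdict


def overhead(samples):
    """Return (function, children %, self %) rows sorted by children %.

    samples is a list of (callchain, period) pairs, where callchain is
    a list of function names from outermost caller to sampled function.
    """
    total = sum(period for _, period in samples)
    self_period = defaultdict(int)
    children_period = defaultdict(int)
    for chain, period in samples:
        self_period[chain[-1]] += period
        # Count each function once per sample, even if it recurses.
        for function in set(chain):
            children_period[function] += period
    rows = [
        (fn, 100 * children_period[fn] / total, 100 * self_period[fn] / total)
        for fn in children_period
    ]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

With 600 units sampled in `bar` (called from `foo`, called from `main`) and 400 units sampled in `foo` directly, `main` and `foo` both show 100% in the children column, while their self values are 0% and 40%.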
- --max-stack::
- Set the stack depth limit when parsing the callchain, anything
- beyond the specified depth will be ignored. This is a trade-off
- between information loss and faster processing especially for
- workloads that can have a very long callchain stack.
- Note that when using the --itrace option the synthesized callchain size
- will override this value if the synthesized callchain size is bigger.
- Default: 127
- -G::
- --inverted::
- alias for inverted caller based call graph.
- --ignore-callees=<regex>::
- Ignore callees of the function(s) matching the given regex.
- This has the effect of collecting the callers of each such
- function into one place in the call-graph tree.
- --pretty=<key>::
- Pretty printing style. key: normal, raw
- --stdio:: Use the stdio interface.
- --stdio-color::
- 'always', 'never' or 'auto', allowing configuring color output
- via the command line, in addition to via "color.ui" .perfconfig.
- Use '--stdio-color always' to generate color even when redirecting
- to a pipe or file. Using just '--stdio-color' is equivalent to
- using 'always'.
- --tui:: Use the TUI interface, that is integrated with annotate and allows
- zooming into DSOs or threads, among other features. Use of --tui
- requires a tty, if one is not present, as when piping to other
- commands, the stdio interface is used.
- --gtk:: Use the GTK2 interface.
- -k::
- --vmlinux=<file>::
- vmlinux pathname
- --kallsyms=<file>::
- kallsyms pathname
- -m::
- --modules::
- Load module symbols. WARNING: This should only be used with -k and
- a LIVE kernel.
- -f::
- --force::
- Don't do ownership validation.
- --symfs=<directory>::
- Look for files with symbols relative to this directory.
- -C::
- --cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
- be provided as a comma-separated list with no space: 0,1. Ranges of
- CPUs are specified with -: 0-2. Default is to report samples on all
- CPUs.
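The CPU list syntax (comma-separated entries, with '-' for ranges) can be expanded as in the sketch below. The helper name `parse_cpu_list` is hypothetical, and the code assumes well-formed input with no spaces.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

Both forms can be mixed in this sketch, so `0-2,5` expands to CPUs 0, 1, 2 and 5.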
- -M::
- --disassembler-style=:: Set disassembler style for objdump.
- --source::
- Interleave source code with assembly code. Enabled by default,
- disable with --no-source.
- --asm-raw::
- Show raw instruction encoding of assembly instructions.
- --show-total-period:: Show a column with the sum of periods.
- -I::
- --show-info::
- Display extended information about the perf.data file. This adds
- information which may be very large and thus may clutter the display.
- It currently includes: cpu and numa topology of the host system.
- -b::
- --branch-stack::
- Use the addresses of sampled taken branches instead of the instruction
- address to build the histograms. To generate meaningful output, the
- perf.data file must have been obtained using perf record -b or
- perf record --branch-filter xxx where xxx is a branch filter option.
- perf report is able to auto-detect whether a perf.data file contains
- branch stacks and it will automatically switch to the branch view mode,
- unless --no-branch-stack is used.
- --branch-history::
- Add the addresses of sampled taken branches to the callstack.
- This allows examining the path the program took to each sample.
- The data collection must have used -b (or -j) and -g.
- --objdump=<path>::
- Path to objdump binary.
- --group::
- Show event group information together.
- --demangle::
- Demangle symbol names to human readable form. It's enabled by default,
- disable with --no-demangle.
- --demangle-kernel::
- Demangle kernel symbol names to human readable form (for C++ kernels).
- --mem-mode::
- Use the data addresses of samples in addition to instruction addresses
- to build the histograms. To generate meaningful output, the perf.data
- file must have been obtained using perf record -d -W and using a
- special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
- 'perf mem' for simpler access.
- --percent-limit::
- Do not show entries which have an overhead under that percent.
- (Default: 0). Note that this option also sets the percent limit (threshold)
- of callchains. However the default value of callchain threshold is
- different than the default value of hist entries. Please see the
- --call-graph option for details.
- --percentage::
- Determine how to display the overhead percentage of filtered entries.
- Filters can be applied by --comms, --dsos and/or --symbols options and
- Zoom operations on the TUI (thread, dso, etc).
- "relative" means it's relative to filtered entries only so that the
- sum of shown entries will be always 100%. "absolute" means it retains
- the original value before and after the filter is applied.
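The difference between the two modes comes down to which total is used as the denominator. The sketch below works through the arithmetic; the function name `percentages` and the period values are made up for illustration.

```python
def percentages(periods, keep, mode):
    """Compute the overhead percentage of the entries that pass a filter.

    periods maps entry name to its period, keep is the set of entries
    that pass the filter, and mode is 'relative' or 'absolute'.
    """
    if mode not in ("relative", "absolute"):
        raise ValueError(f"unknown mode: {mode!r}")
    shown = {name: p for name, p in periods.items() if name in keep}
    if mode == "relative":
        base = sum(shown.values())    # only the filtered entries
    else:
        base = sum(periods.values())  # everything that was recorded
    return {name: 100.0 * p / base for name, p in shown.items()}
```

Given periods of 500, 300 and 200 for three DSOs and a filter that keeps the first two, "absolute" reports 50% and 30%, while "relative" reports 62.5% and 37.5%, which sum to 100%.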
- --header::
- Show header information in the perf.data file. This includes
- various information like hostname, OS and perf version, cpu/mem
- info, perf command line, event list and so on. Currently only
- --stdio output supports this feature.
- --header-only::
- Show only perf.data header (forces --stdio).
- --itrace::
- Options for decoding instruction tracing data. The options are:
- include::itrace.txt[]
- To disable decoding entirely, use --no-itrace.
- --full-source-path::
- Show the full path for source files for srcline output.
- --show-ref-call-graph::
- When multiple events are sampled, it may not be needed to collect
- callgraphs for all of them. The sample sites are usually nearby,
- and it's enough to collect the callgraphs on a reference event.
- So the user can use the "call-graph=no" event modifier to disable callgraphs
- for the other events to reduce the overhead.
- However, perf report cannot show callgraphs for events that have
- callgraph collection disabled.
- This option extends perf report to show the reference callgraphs,
- collected by the reference event, for the events without callgraphs.
- --socket-filter::
- Only report the samples on the processor socket that match with this filter
- --raw-trace::
- When displaying traceevent output, do not use print fmt or plugins.
- --hierarchy::
- Enable hierarchical output.
- include::callchain-overhead-calculation.txt[]
- SEE ALSO
- --------
- linkperf:perf-stat[1], linkperf:perf-annotate[1]