The opreport utility is the primary utility you will use for getting formatted data out of OProfile. It produces two types of data: image summaries and symbol summaries. An image summary lists the number of samples for individual binary images such as libraries or applications. Symbol summaries provide per-symbol profile data. In the following truncated example, we see an image summary for the whole system:
$ opreport --long-filenames CPU: Intel Sandy Bridge microarchitecture, speed 2401 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 CPU_CLK_UNHALT...| samples| %| ------------------ 22577 28.9011 /usr/bin/Xorg CPU_CLK_UNHALT...| samples| %| ------------------ 16846 74.6158 /proc/kallsyms 2126 9.4167 /usr/bin/Xorg 763 3.3795 /usr/lib64/libpixman-1.so.0.26.2 ... 17402 22.2766 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre/bin/java CPU_CLK_UNHALT...| samples| %| ------------------ 5666 32.5595 anon (tgid:29664 range:0x7f3475000000-0x7f347616ffff) 2312 13.2858 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre/lib/amd64/server/libjvm.so ... 11554 14.7904 /home/user1/oprof-install/bin/operf CPU_CLK_UNHALT...| samples| %| ------------------ 7467 64.6270 /proc/kallsyms 1691 14.6356 /usr/bin/operf 1324 11.4592 /lib64/libc-2.12.so 455 3.9380 /usr/lib64/libstdc++.so.6.0.13 315 2.7263 /ext4 ... ... |
If we had specified --symbols
in the previous command, we would have
gotten a symbol summary of all the images across the entire system. We can restrict this to only
part of the system profile; for example,
below is a symbol summary for the operf program used to collect the profile.
$ opreport -l -p /lib/modules/`uname -r` `which operf` 2>/dev/null | more CPU: Intel Sandy Bridge microarchitecture, speed 2401 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 860 7.4607 kallsyms avtab_search_node 474 4.1121 operf OP_perf_utils::op_write_event(event_union*, unsigned long long) 461 3.9993 kallsyms avc_has_perm_noaudit 455 3.9473 libstdc++.so.6.0.13 /usr/lib64/libstdc++.so.6.0.13 412 3.5742 libc-2.12.so _IO_vfscanf 369 3.2012 kallsyms __d_lookup 350 3.0363 kallsyms sidtab_context_to_sid 274 2.3770 operf OP_perf_utils::op_record_process_exec_mmaps(int, int, int, operf_record*) 232 2.0127 operf operf_process_info::find_mapping_for_sample(unsigned long long, bool) 222 1.9259 kallsyms __link_path_walk 191 1.6570 kallsyms pipe_read 34 0.2950 ext4.ko ext4_mark_iloc_dirty ... |
These are the two basic ways you are most likely to use regularly, but opreport can do a lot more than that, as described below.
--separate[*]
options
whilst profiling, there can be several separate profiles for
a single binary image within a session. Normally the output
will keep these images separated. So, for example, if you profiled
with separation on a per-cpu basis (operf --separate-cpu
),
you would see separate columns in
the output of opreport for each CPU where samples
were recorded. But it can be useful to merge these results back together
to make the report more readable. The --merge
option allows
you to do that.
event:
profile specifications, etc.
This section provides details on how to use the OProfile callgraph feature.
When using the --callgraph
option, you can see what
functions are calling other functions in the output. Consider the
following program:
#include <string.h> #include <stdlib.h> #include <stdio.h> #define SIZE 500000 static int compare(const void *s1, const void *s2) { return strcmp(s1, s2); } static void repeat(void) { int i; char *strings[SIZE]; char str[] = "abcdefghijklmnopqrstuvwxyz"; for (i = 0; i < SIZE; ++i) { strings[i] = strdup(str); strfry(strings[i]); } qsort(strings, SIZE, sizeof(char *), compare); } int main() { while (1) repeat(); } |
When running with the call-graph option, OProfile will record the function stack every time it takes a sample. opreport --callgraph outputs an entry for each function, where each entry looks similar to:
samples % image name symbol name 197 0.1548 cg main 127036 99.8452 cg repeat 84590 42.5084 libc-2.3.2.so strfry 84590 66.4838 libc-2.3.2.so strfry [self] 39169 30.7850 libc-2.3.2.so random_r 3475 2.7312 libc-2.3.2.so __i686.get_pc_thunk.bx ------------------------------------------------------------------------------- |
Here the non-indented line is the function we're focussing upon
(strfry()
). This
line is the same as you'd get from a normal opreport
output.
Above the non-indented line we find the functions that called this
function (for example, repeat()
calls
strfry()
). The samples and percentage values here
refer to the number of times we took a sample where this call was found
in the stack; the percentage is relative to all other callers of the
function we're focussing on. Note that these values are
not call counts; they only reflect the call stack
every time a sample is taken; that is, if a call is found in the stack
at the time of a sample, it is recorded in this count.
Below the line are functions that are called by
strfry()
(called callees).
It's clear here that strfry()
calls
random_r()
. We also see a special entry with a
"[self]" marker. This records the normal samples for the function, but
the percentage becomes relative to all callees. This allows you to
compare time spent in the function itself compared to functions it
calls. Note that if a function calls itself, then it will appear in the
list of callees of itself, but without the "[self]" marker; so recursive
calls are still clearly separable.
You may have noticed that the output lists main()
as calling strfry()
, but it's clear from the source
that this doesn't actually happen. See Section 3, “Interpreting call-graph profiles” for an explanation.
Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading.
For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory
named {anon:anon}/<tgid>.<begin_addr>.<end_addr>
.
As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java,
OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code.
However, when viewing callgraph output, any anonymously mapped code in the callstack
will be attributed to anon (<tgid>: range:<begin_addr>-<end_addr>
,
even if a .jo
ELF file had been created for it. See the example below.
------------------------------------------------------------------------------- 1 2.2727 libj9ute23.so java.bin traceV 2 4.5455 libj9ute23.so java.bin utsTraceV 4 9.0909 libj9trc23.so java.bin fillInUTInterfaces 37 84.0909 libj9trc23.so java.bin twGetSequenceCounter 8 0.0154 libj9prt23.so java.bin j9time_hires_clock 27 61.3636 anon (tgid:10014 range:0x100000-0x103000) java.bin (no symbols) 9 20.4545 libc-2.4.so java.bin gettimeofday 8 18.1818 libj9prt23.so java.bin j9time_hires_clock [self] ------------------------------------------------------------------------------- |
The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of
j9time_hires_clock
, even though the ELF file 10014.jo
was
created for this profile run. Unfortunately, there is currently no way to correlate
that anonymous callgraph entry with its corresponding .jo
file.
Often, we'd like to be able to compare two profiles. For example, when analysing the performance of an application, we'd like to make code changes and examine the effect of the change. This is supported in opreport by giving a profile specification that identifies two different profiles. The general form is of:
$ opreport <shared-spec> { <first-profile> } { <second-profile> } |
We lost our Dragon book down the back of the sofa, so you have to be careful to have spaces around those braces, or things will get hopelessly confused. We can only apologise.
For each of the profiles, the shared section is prefixed, and then the specification is analysed. The usual parameters work both within the shared section, and in the sub-specification within the curly braces.
A typical way to use this feature is with archives created with oparchive. Let's look at an example:
$ operf ./a $ oparchive -o orig ./a # edit and recompile a $ operf ./a # now compare the current profile of a with the archived profile $ opreport --session-dir=`pwd`/oprofile_data/ -xl ./a { archive:./orig } { } CPU: PIII, speed 863.233 MHz (estimated) Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % diff % symbol name 92435 48.5366 +0.4999 a 54226 --- --- c 49222 25.8459 +++ d 48787 25.6175 -2.2e-01 b |
Note that we specified an empty second profile in the curly braces, as we wanted to use the current session; alternatively, we could have specified another archive, or a tgid etc. We specified the binary a in the shared section, so we matched that in both the profiles we're diffing.
As in the normal output, the results are sorted by the number of samples, and the percentage field represents the relative percentage of the symbol's samples in the second profile.
Notice the new column in the output. This value represents the
percentage change of the relative percent between the first and the
second profile: roughly, "how much more important this symbol is".
Looking at the symbol a()
, we can see that it took
roughly the same amount of the total profile in both the first and the
second profile. The function c()
was not in the new
profile, so has been marked with ---
. Note that the
sample value is the number of samples in the first profile; since we're
displaying results for the second profile, we don't list a percentage
value for it, as it would be meaningless. d()
is
new in the second profile, and consequently marked with
+++
.
When comparing profiles between different binaries, it should be clear that functions can change in terms of VMA and size. To avoid this problem, opreport considers a symbol to be the same if the symbol name, image name, and owning application name all match; any other factors are ignored. Note that the check for application name means that trying to compare library profiles between two different applications will not work as you might expect: each symbol will be considered different.
Many applications, typically ones involving dynamic compilation into machine code (just-in-time, or "JIT", compilation), have executable mappings that are not backed by an ELF file. opreport has basic support for showing the samples taken in these regions; for example:
$ opreport /usr/bin/mono -l CPU: ppc64 POWER5, speed 1654.34 MHz (estimated) Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 47 58.7500 mono (no symbols) 14 17.5000 anon (tgid:3189 range:0xf72aa000-0xf72fa000) (no symbols) 9 11.2500 anon (tgid:3189 range:0xf6cca000-0xf6dd9000) (no symbols) . . . . |
Note that, since such mappings are dependent upon individual invocations of
a binary, these mappings are always listed as a dependent image.
Equally, the results are not affected by the --merge
option.
As shown in the opreport output above, OProfile is unable to attribute the samples to any symbol(s) because there is no ELF file for this code. Enhanced support for JITed code is now available for some virtual machines; e.g., the Java Virtual Machine. For details about OProfile output for JITed code, see Section 4, “OProfile results with JIT samples”.
For more information about JIT support in OProfile, see Section 4.1, “Support for dynamically compiled (JIT) code”.
The --xml option can be used to generate XML instead of the usual text format. This allows opreport to eliminate some of the constraints dictated by the two dimensional text format. For example, it is possible to separate the sample data across multiple events, cpus and threads. The XML schema implemented by opreport is found in doc/opreport.xsd. It contains more detailed comments about the structure of the XML generated by opreport.
Since XML is consumed by a client program rather than a user, its structure is fairly static. In particular, the --sort option is incompatible with the --xml option. Percentages are not dislayed in the XML so the options related to percentages will have no effect. Full pathnames are always displayed in the XML so --long-filenames is not necessary. The --details option will cause all of the individual sample data to be included in the XML as well as the instruction byte stream for each symbol (for doing disassembly) and can result in very large XML files.
--accumulated / -a
Accumulate sample and percentage counts in the symbol list.
--callgraph / -c
Show callgraph information.
--debug-info / -g
Show source file and line for each symbol.
--demangle / -D none|normal|smart
none: no demangling. normal: use default demangler (default) smart: use pattern-matching to make C++ symbol demangling more readable.
--details / -d
Show per-instruction details for all selected symbols. Note that, for binaries without symbol information, the VMA values shown are raw file offsets for the image binary.
--exclude-dependent / -x
Do not include application-specific images for libraries, kernel modules and the kernel..
--exclude-symbols / -e [symbols]
Exclude all the symbols in the given comma-separated list.
--global-percent / -%
Make all percentages relative to the whole profile.
--help / -? / --usage
Show help message.
--image-path / -p [paths]
Comma-separated list of additional paths to search for binaries. This is needed to find kernel modules.
--root / -R [path]
A path to a filesystem to search for additional binaries.
--include-symbols / -i [symbols]
Only include symbols in the given comma-separated list.
--long-filenames / -f
Output full paths instead of basenames.
--merge / -m [lib,cpu,tid,tgid,unitmask,all]
Merge any profiles separated in a --separate session.
--no-header
Don't output a header detailing profiling parameters.
--output-file / -o [file]
Output to the given file instead of stdout.
--reverse-sort / -r
Reverse the sort from the default.
--session-dir=dir_path
Use sample database from the specified directory dir_path
instead
of the default location. If this option is not specified, then opreport will search for
samples in <cur_dir>/oprofile_data
first. If that directory does not exist, the standard session-dir of
/var/lib/oprofile
is used
as the session directory.
--show-address / -w
Show the VMA address of each symbol (off by default).
--sort / -s [vma,sample,symbol,debug,image]
Sort the list of symbols by, respectively, symbol address, number of samples, symbol name, debug filename and line number, binary image filename.
--symbols / -l
List per-symbol information instead of a binary image summary.
--threshold / -t [percentage]
Only output data for symbols that have more than the given percentage of total samples. For profiles using multiple events, if the threshold is reached for any event, then all sample data for the symbol is shown.
--verbose / -V [options]
Give verbose debugging output.
--version / -v
Show version.
--xml / -X
Generate XML output.