At this point we have a duplicate-free list of sample files we need to process. But first we need to do some further arrangement: we need to classify each sample file, and we may also need to "invert" the profiles.
It's possible for utilities like opreport to show
data in columnar format: for example, we might want to show the results
of two threads within a process side-by-side. To do this, we need
to classify each sample file into classes; each class corresponds
to one opreport column. The function that handles
this is arrange_profiles(). Each sample file
is added to a particular class. If the sample file is the first in
its class, a template is generated from the sample file. Each template
describes a particular class (thus, in our example above, each template
will have a different thread ID, and this uniquely identifies each
class).
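The classification step can be illustrated with a minimal sketch. It is not the actual arrange_profiles() code: the SampleFile and ProfileClass types and the classify_by_thread() function are hypothetical, and the template is reduced to a thread ID to match the example above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed sample file name.
struct SampleFile {
    std::string image;  // binary image the samples belong to
    int tid;            // thread ID the samples were taken in
};

// Hypothetical class: here the template is just the thread ID.
struct ProfileClass {
    int template_tid;
    std::vector<SampleFile> files;
};

// Add each sample file to the class whose template matches it. The first
// file seen for a thread ID creates the class and thereby its template.
std::vector<ProfileClass> classify_by_thread(const std::vector<SampleFile>& files)
{
    std::vector<ProfileClass> classes;
    std::map<int, std::size_t> index_of;  // tid -> position in classes
    for (const SampleFile& f : files) {
        auto it = index_of.find(f.tid);
        if (it == index_of.end()) {
            it = index_of.emplace(f.tid, classes.size()).first;
            classes.push_back(ProfileClass{f.tid, {}});
        }
        classes[it->second].files.push_back(f);
    }
    return classes;
}
```

With sample files from two threads, this yields two classes, which would correspond to two opreport columns.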
Each class has a list of "profile sets" matching that class's template.
A profile set is either a profile of the primary binary image, or any of
its dependent images. After all sample files have been listed in one of
the profile sets belonging to the classes, we have to name each class and
perform error-checking. This is done by identify_classes(); each class
is checked to ensure that its "axis" is the same as all the others.
This is needed because opreport can't produce results in 3D format:
the classes can only differ in one aspect, such as thread ID or event name.
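The axis check can be sketched as follows. This is only an illustration of the rule described above, not the identify_classes() implementation: the ClassTemplate type and find_axis() function are hypothetical, and only two aspects (event name and thread ID) are modelled.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical template holding two of the aspects a class may vary by.
struct ClassTemplate {
    std::string event;
    int tid;
};

// Return the name of the single aspect that varies across the templates
// (an empty string if none varies). Throw if more than one aspect varies,
// since that could not be shown as a flat set of columns.
std::string find_axis(const std::vector<ClassTemplate>& templates)
{
    std::set<std::string> varying;
    for (std::size_t i = 1; i < templates.size(); ++i) {
        if (templates[i].event != templates[0].event)
            varying.insert("event");
        if (templates[i].tid != templates[0].tid)
            varying.insert("tid");
    }
    if (varying.size() > 1)
        throw std::invalid_argument("classes differ in more than one aspect");
    return varying.empty() ? std::string() : *varying.begin();
}
```

Comparing every template against the first is enough here: if any pair of templates differs in some aspect, at least one of the two also differs from the first in that aspect.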
Remember that if we're using certain profile separation options, such as "--separate=lib", a single binary could be a dependent image to many different binaries. For example, the C library image would be a dependent image for most programs that have been profiled. As it happens, this can cause severe performance problems: without some re-arrangement, these dependent binary images would be opened each time we need to process sample files for each program.
The solution is to "invert" the profiles via invert_profiles().
We create a new data structure
where the dependent binary is first, and the primary binary images using
that dependent binary are listed as sub-images. This eases the
performance problem, as we now only need to open each dependent image
once, when we process the list of inverted profiles.
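The inversion itself amounts to swapping the two levels of a mapping. The sketch below assumes a deliberately simplified representation in which images are identified by name only; the ProfileMap alias and invert_by_dependent() function are hypothetical and do not reflect the data structures invert_profiles() really uses.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: key image -> list of related images.
using ProfileMap = std::map<std::string, std::vector<std::string>>;

// Input maps each primary image to its dependent images. The result maps
// each dependent image to the primary images that use it, so each
// dependent image appears exactly once as a key.
ProfileMap invert_by_dependent(const ProfileMap& by_primary)
{
    ProfileMap by_dependent;
    for (const auto& entry : by_primary)
        for (const std::string& dep : entry.second)
            by_dependent[dep].push_back(entry.first);
    return by_dependent;
}
```

A consumer can then iterate over the result, open each key image once, and process all of the primary images listed under it before moving on.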