2. Collating the candidate sample files

At this point we have a duplicate-free list of the sample files we need to process. Before processing them, two further arrangement steps are needed: we need to classify each sample file, and we may also need to "invert" the profiles.

2.1. Classifying sample files

Utilities like opreport can show data in columnar format: for example, we might want to show the results of two threads within a process side-by-side. To do this, we need to sort the sample files into classes, where each class corresponds to one opreport column. The function that handles this is arrange_profiles(). Each sample file is added to exactly one class. If the sample file is the first in its class, a template is generated from it. Each template describes a particular class (thus, in the two-thread example, each template will have a different thread ID, and this uniquely identifies each class).
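The classification step can be sketched roughly as follows. This is a minimal illustration, not the real arrange_profiles(): the types sample_file_sketch and class_sketch and the function classify_sketch() are hypothetical, and the sketch assumes that the template distinguishes classes by thread ID only.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parsed sample file name.
struct sample_file_sketch {
    std::string image;  // binary image the samples belong to
    std::string event;  // event name
    int tid;            // thread ID
};

// Hypothetical class: a template plus every sample file that matches it.
struct class_sketch {
    sample_file_sketch tmpl;                // copied from the first file in the class
    std::vector<sample_file_sketch> files;  // all files placed in this class
};

// Assumption for this sketch: a file matches a template if the thread IDs agree.
bool matches_sketch(sample_file_sketch const & tmpl, sample_file_sketch const & f)
{
    return tmpl.tid == f.tid;
}

std::vector<class_sketch> classify_sketch(std::vector<sample_file_sketch> const & files)
{
    std::vector<class_sketch> classes;
    for (auto const & f : files) {
        bool placed = false;
        for (auto & c : classes) {
            if (matches_sketch(c.tmpl, f)) {
                c.files.push_back(f);
                placed = true;
                break;
            }
        }
        if (!placed) {
            // First file of a new class: generate the template from it.
            class_sketch c;
            c.tmpl = f;
            c.files.push_back(f);
            classes.push_back(c);
        }
    }
    return classes;
}
```

Given sample files for two threads of one process, this sketch yields two classes, one per thread, which would correspond to two opreport columns.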

Each class has a list of "profile sets" matching that class's template. A profile set is either a profile of the primary binary image, or any of its dependent images. After all sample files have been listed in one of the profile sets belonging to the classes, we have to name each class and perform error-checking. This is done by identify_classes(); each class is checked to ensure that its "axis" is the same as all the others. This is needed because opreport can't produce results in 3D format: the classes can only differ from one another in one aspect, such as thread ID or event name.
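The single-axis rule can be illustrated with a small sketch. It is not the real identify_classes(): the names template_sketch, axis_sketch and find_axis_sketch() are hypothetical, and only two aspects (event name and thread ID) are modelled. Each template is compared against the first one; if two different aspects are found to vary, the classes cannot be laid out as columns and an error is raised.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical class template with only two aspects.
struct template_sketch {
    std::string event;
    int tid;
};

enum class axis_sketch { none, event, tid };

// Returns the one aspect in which the templates differ, or throws
// std::runtime_error if they differ in more than one aspect.
axis_sketch find_axis_sketch(std::vector<template_sketch> const & templates)
{
    axis_sketch result = axis_sketch::none;
    for (std::size_t i = 1; i < templates.size(); ++i) {
        bool event_differs = templates[i].event != templates[0].event;
        bool tid_differs = templates[i].tid != templates[0].tid;

        if (event_differs && tid_differs)
            throw std::runtime_error("classes differ in more than one aspect");

        axis_sketch found = axis_sketch::none;
        if (event_differs)
            found = axis_sketch::event;
        else if (tid_differs)
            found = axis_sketch::tid;

        if (found == axis_sketch::none)
            continue;
        if (result != axis_sketch::none && result != found)
            throw std::runtime_error("classes differ in more than one aspect");
        result = found;
    }
    return result;
}
```

In this sketch, two classes that differ only in thread ID are accepted with the thread ID as the axis, while a pair that differs in both event name and thread ID is rejected.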

2.2. Creating inverted profile lists

Remember that if we're using certain profile separation options, such as "--separate=lib", a single binary could be a dependent image to many different binaries. For example, the C library image would be a dependent image for most programs that have been profiled. As it happens, this can cause severe performance problems: without some re-arrangement, these dependent binary images would be opened each time we need to process sample files for each program.

The solution is to "invert" the profiles via invert_profiles(). We create a new data structure where the dependent binary is first, and the primary binary images using that dependent binary are listed as sub-images. This addresses the performance problem, as now we only need to open each dependent image once, when we process the list of inverted profiles.
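The idea of inversion can be sketched as below. This is only an illustration under simplifying assumptions, not the real invert_profiles(): images are represented by plain path strings, and profile_sketch, inverted_sketch and invert_sketch() are hypothetical names. The input is keyed by primary image; the output is keyed by dependent image, so each dependent image appears exactly once.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical input: one primary image and the dependent images it used.
struct profile_sketch {
    std::string primary;
    std::vector<std::string> dependents;
};

// Hypothetical output: dependent image -> primary images that used it.
using inverted_sketch = std::map<std::string, std::vector<std::string>>;

inverted_sketch invert_sketch(std::vector<profile_sketch> const & profiles)
{
    inverted_sketch inverted;
    for (auto const & p : profiles) {
        for (auto const & dep : p.dependents) {
            // operator[] creates the entry the first time a dependent
            // image is seen; later primaries are appended to the same entry.
            inverted[dep].push_back(p.primary);
        }
    }
    return inverted;
}
```

Iterating over the result visits each dependent image once, which is the point at which a real implementation would open the binary and then process the sample files of every primary image listed under it.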