The compiler can introduce some pitfalls in the annotated source output. The optimizer can move pieces of code in such manner that two line of codes are interlaced (instruction scheduling). Also debug info generated by the compiler can show strange behavior. This is especially true for complex expressions e.g. inside an if statement:
if (a && .. b && .. c &&) |
here the problem come from the position of line number. The available debug
info does not give enough details for the if condition, so all samples are
accumulated at the position of the right brace of the expression. Using
opannotate -a
can help to show the real
samples at an assembly level.
The compiler generally needs to generate "glue" code across function calls, dependent on the particular function call conventions used. Additionally other things need to happen, like stack pointer adjustment for the local variables; this code is known as the function prologue. Similar code is needed at function return, and is known as the function epilogue. This will show up in annotations as samples at the very start and end of a function, where there is no apparent executable code in the source.
You may see that a function is credited with a certain number of samples, but the listing does not add up to the correct total. To pick a real example :
:internal_sk_buff_alloc_security(struct sk_buff *skb) 353 2.342% :{ /* internal_sk_buff_alloc_security total: 1882 12.48% */ : : sk_buff_security_t *sksec; 15 0.0995% : int rc = 0; : 10 0.06633% : sksec = skb->lsm_security; 468 3.104% : if (sksec && sksec->magic == DSI_MAGIC) { : goto out; : } : : sksec = (sk_buff_security_t *) get_sk_buff_memory(skb); 3 0.0199% : if (!sksec) { 38 0.2521% : rc = -ENOMEM; : goto out; 10 0.06633% : } : memset(sksec, 0, sizeof (sk_buff_security_t)); 44 0.2919% : sksec->magic = DSI_MAGIC; 32 0.2123% : sksec->skb = skb; 45 0.2985% : sksec->sid = DSI_SID_NORMAL; 31 0.2056% : skb->lsm_security = sksec; : : out: : 146 0.9685% : return rc; : 98 0.6501% :} |
Here, the function is credited with 1,882 samples, but the annotations
below do not account for this. This is usually because of inline functions -
the compiler marks such code with debug entries for the inline function
definition, and this is where opannotate annotates
such samples. In the case above, memset
is the most
likely candidate for this problem. Examining the mixed source/assembly
output can help identify such results.
This problem is more visible when there is no source file available, in the following example it's trivially visible the sums of symbols samples is less than the number of the samples for this file. The difference must be accounted to inline functions.
/* * Total samples for file : "arch/i386/kernel/process.c" * * 109 2.4616 */ /* default_idle total: 84 1.8970 */ /* cpu_idle total: 21 0.4743 */ /* flush_thread total: 1 0.0226 */ /* prepare_to_copy total: 1 0.0226 */ /* __switch_to total: 18 0.4065 */ |
The missing samples are not lost, they will be credited to another source location where the inlined function is defined. The inlined function will be credited from multiple call site and merged in one place in the annotated source file so there is no way to see from what call site are coming the samples for an inlined function.
When running opannotate, you may get a warning "some functions compiled without debug information may have incorrect source line attributions". In some rare cases, OProfile is not able to verify that the derived source line is correct (when some parts of the binary image are compiled without debugging information). Be wary of results if this warning appears.
Furthermore, for some languages the compiler can implicitly generate functions, such as default copy constructors. Such functions are labelled by the compiler as having a line number of 0, which means the source annotation can be confusing.
Depending on your compiler you can fall into the following problem:
struct big_object { int a[500]; }; int main() { big_object a, b; for (int i = 0 ; i != 1000 * 1000; ++i) b = a; return 0; } |
Compiled with gcc 3.0.4 the annotated source is clearly inaccurate:
:int main() :{ /* main total: 7871 100% */ : big_object a, b; : for (int i = 0 ; i != 1000 * 1000; ++i) : b = a; 7871 100% : return 0; :} |
The problem here is distinct from the IRQ latency problem; the debug line number information is not precise enough; again, looking at output of opannoatate -as can help.
:int main() :{ : big_object a, b; : for (int i = 0 ; i != 1000 * 1000; ++i) : 80484c0: push %ebp : 80484c1: mov %esp,%ebp : 80484c3: sub $0xfac,%esp : 80484c9: push %edi : 80484ca: push %esi : 80484cb: push %ebx : b = a; : 80484cc: lea 0xfffff060(%ebp),%edx : 80484d2: lea 0xfffff830(%ebp),%eax : 80484d8: mov $0xf423f,%ebx : 80484dd: lea 0x0(%esi),%esi : return 0; 3 0.03811% : 80484e0: mov %edx,%edi : 80484e2: mov %eax,%esi 1 0.0127% : 80484e4: cld 8 0.1016% : 80484e5: mov $0x1f4,%ecx 7850 99.73% : 80484ea: repz movsl %ds:(%esi),%es:(%edi) 9 0.1143% : 80484ec: dec %ebx : 80484ed: jns 80484e0 : 80484ef: xor %eax,%eax : 80484f1: pop %ebx : 80484f2: pop %esi : 80484f3: pop %edi : 80484f4: leave : 80484f5: ret |
So here it's clear that copying is correctly credited with of all the samples, but the line number information is misplaced. objdump -dS exposes the same problem. Note that maintaining accurate debug information for compilers when optimizing is difficult, so this problem is not suprising. The problem of debug information accuracy is also dependent on the binutils version used; some BFD library versions contain a work-around for known problems of gcc, some others do not. This is unfortunate but we must live with that, since profiling is pointless when you disable optimisation (which would give better debugging entries).