Zariko's Place

How to profile only a part of the code using callgrind

Callgrind lets you profile an application, and kcachegrind makes it easy to get a picture of where your application spends its CPU time.

Note

Callgrind only records CPU consumption. If your application is slowed down by IO, callgrind won't help you.
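A quick way to check whether a piece of code is CPU-bound before reaching for callgrind is to compare CPU time with wall-clock time. The sketch below is only an illustration: the names Timing, measure and pretendToWaitOnIo are made up for this example, the sleep merely stands in for a wait on IO, and std::clock is assumed to report process CPU time, which holds on POSIX systems but not everywhere.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: CPU time and wall-clock time of one call.
struct Timing {
  double cpu_ms;
  double wall_ms;
};

// Runs work() once and reports how much CPU time and wall time it took.
// Assumption: std::clock() returns process CPU time (true on POSIX).
template <typename Work>
Timing measure(Work work) {
  const std::clock_t cpu_start = std::clock();
  const auto wall_start = std::chrono::steady_clock::now();
  work();
  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timing t;
  t.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  t.wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  return t;
}

// Stand-in for a wait on IO: sleeping consumes wall time but almost no CPU.
void pretendToWaitOnIo() {
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
```

If wall_ms is much larger than cpu_ms, as it should be for measure(pretendToWaitOnIo), the time is spent waiting rather than computing, and a callgrind profile is unlikely to point at the real bottleneck.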

Code with a long initialisation

Sometimes it is convenient to profile only a small part of your code. For example, what if you have a long initialisation phase?

Let's take a small dummy example (code follows). Let's say I'm interested in the CPU consumption of the f1 call.

#include <fstream>
#include <iostream>

using namespace std;

// the long initialisation produces a big tmp.txt file filled with numbers
void longInitialisation() {
  ofstream fout( "tmp.txt", ios_base::out | ios_base::trunc );
  for (int i=0; i<100000; ++i) {
    fout << i << endl;
  }
}

// noinline keeps the function visible in the callgrind report
__attribute__((noinline)) int f2(int i) {
  return i*2;
}

__attribute__((noinline)) int f1(int seed) {
  int res = 0;
  for (int i = 0; i<100; ++i) {
    res += f2(i) * seed;
  }
  return res;
}

int main(int argc, char * argv[]) {
  longInitialisation();
  int a = f1(argc); // take argc as a parameter so the compiler cannot precompute the result
  cout << a << endl;
  return 0;
}

When running this example with callgrind:

valgrind --tool=callgrind ./my_example

The output mainly shows the cost of my longInitialisation function:

../_images/callgrind_piece_full.png

Only showing part of interest

Callgrind provides a nice feature: the application can ask the instrumentation to start and stop. The code becomes:

#include <fstream>
#include <iostream>
#include <valgrind/callgrind.h>

using namespace std;

void longInitialisation() {
  ofstream fout( "tmp.txt", ios_base::out | ios_base::trunc );
  for (int i=0; i<100000; ++i) {
    fout << i << endl;
  }
}

__attribute__((noinline)) int f2(int i) {
  return i*2;
}

__attribute__((noinline)) int f1(int seed) {
  int res = 0;
  for (int i = 0; i<100; ++i) {
    res += f2(i) * seed;
  }
  return res;
}

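int main(int argc, char * argv[]) {
  longInitialisation();             // not instrumented when run with --instr-atstart=no
  CALLGRIND_START_INSTRUMENTATION;  // start recording here
  int a = f1(argc);
  CALLGRIND_STOP_INSTRUMENTATION;   // stop recording
  CALLGRIND_DUMP_STATS;             // write the profile collected so far
  cout << a << endl;
  return 0;
}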

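Three macros defined in the callgrind.h header control the callgrind instrumentation: CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION switch it on and off, and CALLGRIND_DUMP_STATS forces a profile dump at that point in the code.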
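If several places need the same start/stop pair, the macros can be wrapped in a small scope guard so that the stop is never forgotten. This is only a sketch, not something Valgrind provides: ScopedInstrumentation and profiledF1 are hypothetical names, and the no-op fallback macros exist purely so the example still compiles on a machine without the Valgrind headers (assuming a compiler that supports __has_include).

```cpp
// Use the real macros when the Valgrind headers are installed.
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
// Assumption for this sketch only: fall back to no-ops without the headers.
#ifndef CALLGRIND_START_INSTRUMENTATION
#  define CALLGRIND_START_INSTRUMENTATION ((void)0)
#  define CALLGRIND_STOP_INSTRUMENTATION  ((void)0)
#  define CALLGRIND_DUMP_STATS            ((void)0)
#endif

// Hypothetical RAII helper: instrumentation is on for the guard's lifetime.
class ScopedInstrumentation {
public:
  ScopedInstrumentation() { CALLGRIND_START_INSTRUMENTATION; }
  ~ScopedInstrumentation() {
    CALLGRIND_STOP_INSTRUMENTATION;
    CALLGRIND_DUMP_STATS;  // write what was collected inside the scope
  }
  ScopedInstrumentation(const ScopedInstrumentation&) = delete;
  ScopedInstrumentation& operator=(const ScopedInstrumentation&) = delete;
};

// Same dummy functions as in the article.
__attribute__((noinline)) int f2(int i) {
  return i * 2;
}

__attribute__((noinline)) int f1(int seed) {
  int res = 0;
  for (int i = 0; i < 100; ++i) {
    res += f2(i) * seed;
  }
  return res;
}

// Only the call to f1 is instrumented; the guard stops on scope exit.
int profiledF1(int seed) {
  ScopedInstrumentation guard;
  return f1(seed);
}
```

In main, the three macro lines would then be replaced by a single call such as int a = profiledF1(argc);. Outside of valgrind the macros do nothing, so the program behaves the same in a normal run.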

Running callgrind once again (notice the --instr-atstart=no option, which asks the instrumentation not to begin at the start of the program):

valgrind --tool=callgrind --instr-atstart=no ./my_example

And the result is much cleaner:

../_images/callgrind_piece_only_f11.png