This shows you the differences between two versions of the page.
code:profiling [2017/12/22 14:25] phil |
code:profiling [2020/03/29 12:13] (current) |
||
---|---|---|---|
Line 5: | Line 5: | ||
===== Dynamic Memory Allocation ===== | ===== Dynamic Memory Allocation ===== | ||
- | The biggest problems with dynamic memory management are: | + | Typical problems with dynamic memory management are: |
- leaks and | - leaks and | ||
- corruption. | - corruption. | ||
- | While the later is rather tricky to analyse, for memleaks there is | + | While the latter is rather tricky to analyse, for memleaks there is |
''valgrind''. Invocation as follows: | ''valgrind''. Invocation as follows: | ||
<code> | <code> | ||
Line 17: | Line 17: | ||
===== Performance ===== | ===== Performance ===== | ||
- | When programming, the code complexity (O-notation) is the main factor | + | When programming, code complexity (O-notation) is the main factor identifying |
- | identifying CPU-intense algorithms. Reducing the code's complexity often | + | CPU-intense algorithms. Reducing code complexity often doesn't suffice, though. |
- | doesn't suffice, though. E.g. IO-intense operations often lead to delays at | + | E.g. IO-intense operations often lead to delays at run-time which aren't covered |
- | run-time which isn't covered by the O-notation, at all. This means that aside | + | by O-notation at all. This means that aside from complexity analysis, execution |
- | of complexity analysis, there always should be run-time code execution time | + | time should always be measured at run-time, too. |
- | measurement. And this is where ''gprof'' comes into action: | + | |
+ | ==== GProf ==== | ||
+ | |||
+ | ''gprof'' is the GNU profiler; gcc supports it through the ''-pg'' flag. Enabled at compile-time, the | ||
+ | program collects profiling data for later analysis using the ''gprof'' tool: | ||
<code> | <code> | ||
$ gcc -pg -g test.c | $ gcc -pg -g test.c | ||
$ ./a.out | $ ./a.out | ||
$ gprof a.out gmon.out | $ gprof a.out gmon.out | ||
+ | </code> | ||
+ | |||
+ | ==== Perf ==== | ||
+ | |||
+ | On recent kernels, ''perf'' is the best tool for the job. Like the obsolete | ||
+ | OProfile, it can profile the whole system, but it can also be limited to a single | ||
+ | program. Before executing the workload to profile (or while it is | ||
+ | running), call: | ||
+ | <code> | ||
+ | # perf record | ||
+ | </code> | ||
+ | When done, stop recording with ''CTRL-c''. The profiling data is written to //perf.data// in the current directory. To analyse it, call: | ||
+ | <code> | ||
+ | # perf report | ||
</code> | </code> | ||
Line 52: | Line 71: | ||
of //test.c//. | of //test.c//. | ||
+ | If linking happens in a separate step, some additional flags have to be passed to the linker: | ||
+ | <code> | ||
+ | CFLAGS += -fprofile-arcs -ftest-coverage | ||
+ | LDFLAGS += -lgcov --coverage | ||
+ | </code> | ||