Performance analysis by call stack sampling

This method differs significantly from the use of profilers described in Performance analysis. At present it is primarily a manual technique, though profiling tools could evolve to support it.

Example program

This page shows an example of the technique, using a highly artificial program. The example is a distillation of experience with many real projects.

Certain statements in the program are labelled as necessary or unnecessary. In actual software, statements would obviously not be labelled this way, and in general there is no need to ask whether a function call is really necessary unless it affects performance. The technique finds the statements that do affect performance; for those, the question of necessity matters, because a price is being paid for them. In this method, when a statement appears on a significant percentage of call stack samples, it is examined, and that is when the determination of its necessity is made.

 1 void main(){
 2     A(); /*necessary*/
 3     A(); /*unnecessary*/
 4 }
 5 void A(){
 6     B(); /*necessary*/
 7     B(); /*necessary*/
 8 }
 9 void B(){
10     C(); /*necessary*/
11     C(); /*unnecessary*/
12 }
13 void C(){
14     /* necessary hot spot that takes 1 sec */
15 }


Using a profiler

Using a typical profiler, one gets a histogram of the program counter, some information about function timings and call frequencies, and a call graph.

The histogram of the program counter shows:

  • There is a hot spot: line 14 accounts for 100% of the time.
  • It is examined, and since it is necessary, nothing can be done about it.

The function-call instrumentation says:

 routine:  ncalls   avg time (sec)
    A:        2          4
    B:        4          2
    C:        8          1
 call graph:
    main calls A 2 times, for a total of 2 calls
    A calls B 2 times each, for a total of 4 calls
    B calls C 2 times each, for a total of 8 calls

To make sense of this, it is necessary to know what to expect in each routine, and then to look inside the routines that look suspicious. (Real software can have hundreds of routines.)

Using stack sampling

On the other hand, using the call-stack sampling method, attention is paid not so much to functions as to individual function calls, because a call is something one can actually do something about, and the samples show how much time removing it would save.
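
The samples themselves can be collected simply by pausing the program under a debugger at random times and recording the call stack. As a minimal sketch of one possible way to automate the collection (assuming a POSIX system with the glibc backtrace() facility; the function names and the one-second interval are illustrative, not part of the method, and backtrace() is not strictly async-signal-safe, which is usually acceptable for ad-hoc tuning):

 #include <execinfo.h>
 #include <signal.h>
 #include <sys/time.h>
 #include <unistd.h>

 /* Signal handler: write one call-stack sample to stderr. */
 static void take_sample(int sig)
 {
     void *stack[64];
     int depth = backtrace(stack, 64);       /* capture the return addresses    */
     backtrace_symbols_fd(stack, depth, 2);  /* print them without using malloc */
     write(2, "----\n", 5);                  /* separator between samples       */
 }

 /* Ask for SIGPROF roughly once per second of CPU time. */
 static void start_sampling(void)
 {
     struct itimerval interval = { {1, 0}, {1, 0} };
     signal(SIGPROF, take_sample);
     setitimer(ITIMER_PROF, &interval, NULL);
 }

Calling start_sampling() at the start of main would then produce one stack trace roughly every second while the program runs.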

It is not necessary to take a large number of call stack samples, as long as they are taken at random times. However, for this example, suppose a call stack sample is taken about every second. Each sample is written as the line numbers on the stack, outermost call first; for example, 2,6,10,14 means main is at line 2, A at line 6, B at line 10, and C at line 14. The samples are:

2,6,10,14
2,6,11,14
2,7,10,14
2,7,11,14
3,6,10,14
3,6,11,14
3,7,10,14
3,7,11,14

Notice that:

  • Line 14 appears on 100% of samples. It is a hot spot, but it is necessary, so it is left alone.
  • Line 10 appears on 50% of samples. It is necessary, so it is left alone.
  • Line 11 appears on 50% of samples. It is unnecessary, so it is removed.

Result: time reduced by 50% to 4 seconds.
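
After this first removal, B looks like this (the rest of the program is unchanged):

 void B(){
     C(); /*necessary*/
 }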

Take samples again.

  • Line 2 appears on 50% of samples. It is necessary, so it is left alone.
  • Line 3 appears on 50% of samples. It is unnecessary, so it is removed.

Result: time reduced by 50% to 2 seconds.
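
After the second removal, main looks like this:

 void main(){
     A(); /*necessary*/
 }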

Take samples again.

  • Various lines appear on large percentages of samples.
  • None of them is unnecessary, so the process stops.

Overall performance improvement: a factor of 4 (8 seconds reduced to 2).

Typically, once the running time becomes too short to take samples by hand, a temporary outer loop can be added so that the program runs for more than a few seconds.
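
For example, the body of main might temporarily be wrapped like this (the repetition count of 100 is arbitrary, chosen only to stretch the run; the loop is removed once tuning is finished):

 void main(){
     int i;
     for (i = 0; i < 100; i++){
         A(); /*necessary*/
     }
 }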

Discussion

It is not necessary to take a complete or even large set of call stack samples to get the needed information. A small randomly collected set will do.
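
As a rough illustration (assuming the samples are independent and taken at random times), if a call site is on the stack 30% of the time, the probability that it appears on none of 10 samples is 0.7^10 ≈ 0.03, so about 10 samples are almost certain to expose it.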

In actual large software, as the typical depth of the call stack increases, there are more statements to examine for necessity, resulting in more iterations of the method; since the speedup factors from successive iterations multiply, the overall improvement can grow roughly exponentially with the number of iterations.
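
For instance (an illustrative calculation, not taken from the example above), if each of three successive iterations removes half of the remaining time, the combined speedup is 2 × 2 × 2 = 8 times.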

Typically, calls are not so much removed as executed less often. For example, an array search algorithm might spend nearly all of its time in calls to a string comparison function. Changing the search from linear to binary, for example, reduces the number of calls to that function.
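
As a sketch (the sorted table of strings here is illustrative, not part of the example program above), a linear search calls the comparison function about n/2 times per lookup on average, while a binary search over the same sorted table calls it only about log2(n) times:

 #include <string.h>

 /* Linear search: roughly n/2 calls to strcmp per successful lookup. */
 int find_linear(const char *key, const char *table[], int n)
 {
     int i;
     for (i = 0; i < n; i++)
         if (strcmp(key, table[i]) == 0)
             return i;
     return -1;
 }

 /* Binary search over the same sorted table: about log2(n) calls to strcmp. */
 int find_binary(const char *key, const char *table[], int n)
 {
     int lo = 0, hi = n - 1;
     while (lo <= hi) {
         int mid = lo + (hi - lo) / 2;
         int cmp = strcmp(key, table[mid]);
         if (cmp == 0) return mid;
         if (cmp < 0)  hi = mid - 1;
         else          lo = mid + 1;
     }
     return -1;
 }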

Often the calls that take a large percentage of time are not even visible in the source code, such as when objects are constructed or destroyed as they go in and out of scope, or when automatic type conversion occurs. This method finds those as well.

References

  • Dunlavey, “Performance Tuning: Slugging It Out!”, Dr. Dobb’s Journal, Vol. 18, No. 12, November 1993, pp. 18–26.