The Fundamentals of Performance Profilers - What You Need to Know
Profilers are software development tools designed to help you analyze the performance
of your applications and improve poorly performing sections of code. They provide
measurements of how long a routine takes to execute, how often it is called, where
it is called from, and what share of total execution time is spent in that
routine. If you've used a profiler in the past, you'll certainly agree that it is
a wonderful asset during the development and QA process. Did you ever wonder, though,
if the results and timings produced by a profiler are actually correct?
There are different ways to measure the performance of an application while it runs.
Depending on the method used, profiler results will vary; this can affect your ability
to optimize your projects. Profiling methods fall into two broad categories: instrumenting
and sampling. Let's take a look at each.
Instrumentation
Instrumenting profilers insert special code at the beginning and end of each routine
to record when the routine starts and when it exits. With this information, the
profiler aims to measure the actual time taken by the routine on each call. This
type of profiler may also record which other routines are called from a routine.
It can then display the time for the entire routine and also break it down into
time spent locally and time spent on each call to another routine.
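The entry and exit bookkeeping described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not how any particular product
works: the class name InstrumentingProfiler, its fields, and the decorator-based
wrapping are all assumptions made for the example (a binary profiler patches machine
code rather than wrapping functions), and recursive routines are not treated specially.

```python
import functools
import time
from collections import defaultdict


class InstrumentingProfiler:
    """Toy instrumenting profiler (hypothetical): wraps routines with entry/exit hooks."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                 # injectable so the sketch can be tested
        self.calls = defaultdict(int)      # routine name -> number of calls
        self.total = defaultdict(float)    # routine name -> time including callees
        self.local = defaultdict(float)    # routine name -> time excluding callees
        self._stack = []                   # one [name, time_in_callees] per active call

    def instrument(self, func):
        name = func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            frame = [name, 0.0]
            self._stack.append(frame)
            start = self.clock()                    # hook at routine entry
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = self.clock() - start      # hook at routine exit
                self._stack.pop()
                self.calls[name] += 1
                self.total[name] += elapsed
                self.local[name] += elapsed - frame[1]
                if self._stack:
                    # Charge this call's time to the caller's "time in callees".
                    self._stack[-1][1] += elapsed

        return wrapper
```

Decorating a routine with the instrument method of a profiler instance is enough to
start collecting call counts, total time, and local time for it.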
Currently, two types of instrumenting profilers are available on the market: source-code
modifying and binary.
Source-code modifying profilers create several problems. They tend to conflict
with source code control systems. They do not always reliably parse the source they
are supposed to instrument. In fact, since adding and removing instrumentation
can be a bit tricky, source-code modifying profilers often suggest that users work
with a copy of the project source to avoid possible corruption.
Also, at best these profilers can only insert their instrumenting code at the start
of a procedure in source. At this point, procedure setup has already run (for stack
frames, local variables, parameters). In small procedures, setup can be a significant
portion of execution time. Yet, there is no way to time the setup itself using a
source-code modifying profiler.
Binary profilers (or hierarchical profilers) work strictly at runtime.
They insert their instrumentation directly into an application's executable code
once it is loaded in memory. Source code is not required in any way, and thus there
is no risk of corrupting it. Since a binary profiler works anew on each execution,
it is also very easy to find some slow code on one execution, try an improvement
in source, recompile and test again - incremental optimization is supported
in real time.
A binary profiler also inserts its instrumentation right at the first assembly
instruction of each routine. This ensures that routine setup is counted in the timing.
Pitfalls of Instrumentation
The timer calls which an instrumenting profiler inserts at the start and end of
each profiled routine take some time themselves. To account for this, at the start
of each run instrumenting profilers measure the overhead incurred from the instrumenting
process - they calibrate themselves - and they later subtract this overhead
from performance measurements. This usually works out very well.
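A rough sketch of this calibration step, in Python, might look as follows. The
function names measure_overhead and corrected_time are hypothetical, and taking the
minimum over many back-to-back timer reads is just one plausible way to estimate
the cost of a start/stop pair; real profilers may calibrate differently.

```python
import time


def measure_overhead(clock=time.perf_counter, trials=10_000):
    """Estimate the cost of one start/stop timer pair.

    Reads the clock twice in a row with nothing in between and keeps the
    smallest difference seen, which filters out interruptions by other work.
    """
    best = float("inf")
    for _ in range(trials):
        start = clock()
        end = clock()
        best = min(best, end - start)
    return best


def corrected_time(raw_elapsed, calls, overhead_per_call):
    """Subtract the calibrated overhead from a raw measurement.

    The result is clamped at zero so that a very short routine is never
    reported with a negative time.
    """
    return max(0.0, raw_elapsed - calls * overhead_per_call)
```

The clamp in corrected_time hints at the limitation discussed next: for very short
routines, the correction can be as large as the measurement itself.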
However, when a routine is very short, another effect due to the instrumentation
becomes important. Modern processors are quite dependent on order of execution for
branch predictions and other CPU optimizations. Inevitably, inserting a timing operation
at the start and end of a very small routine disturbs the way it would execute in
the CPU, absent the timing calls. If you have a small routine that is called millions
of times, an instrumenting profiler will not yield an accurate time comparison between
this routine and larger routines. If you ignore this, you may spend a great deal of
effort optimizing routines that are not the real bottlenecks.
Sampling
To help address the limitations of instrumenting profilers, sampling profilers
let applications run without any runtime modifications. Nothing is inserted, order
of execution is not affected, and all profiling work is done outside the application’s
process.
The operating system interrupts the CPU at regular intervals (time slices) to execute
process switches. At that point, a sampling profiler records the currently executing
instruction (the execution point) for the application it is profiling. This is about as
short an operation as can be implemented: the contents of one CPU register, the
instruction pointer, are copied to memory. Using debug information linked into the application's executable,
the profiler later correlates the recorded execution points with the routine and
source code line they belong to. What the profiling finally yields is the frequency
with which a given routine or source line was executing during a given period of the
application's run, or over the entire run.
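The correlation step can be illustrated with a small Python sketch. The symbol
table below is invented for the example (the addresses and routine names are
hypothetical, and real debug information is far richer), but the principle is the
same: each recorded execution point is looked up in a sorted list of routine start
addresses, and the hits per routine are counted.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, routine name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1200, "parse_input"),
    (0x1800, "compute"),
    (0x2000, "write_output"),
]
END_OF_CODE = 0x2400                      # first address past the last routine
STARTS = [start for start, _ in SYMBOLS]


def routine_at(address):
    """Return the routine containing address, or None if it is outside the code."""
    if not STARTS[0] <= address < END_OF_CODE:
        return None
    # The owning routine is the last one whose start address is <= address.
    return SYMBOLS[bisect_right(STARTS, address) - 1][1]


def histogram(samples):
    """Map each routine to the fraction of in-range samples that landed in it."""
    hits = Counter(routine_at(address) for address in samples)
    hits.pop(None, None)                  # ignore samples outside the application
    total = sum(hits.values())
    return {name: count / total for name, count in hits.items()} if total else {}
```

With four samples of which three fall inside compute, histogram reports a share of
0.75 for compute. Note that this is a frequency, not a measured duration.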
As the profiler operation is executed less often, and as it is so much simpler than
a time measurement, the overhead is negligible, and the application runs practically
at its real speed.
A sampling profiler is the perfect tool to isolate small, often-called routines
that cause bottlenecks in program execution. The downside is that its evaluations
of time spent are approximations. A fairly fast routine may happen to be executing
at many of the sampling interrupts and so appear slower than it is. To confirm that a
given routine is really slow, it is recommended that the application be run through
the sampling profiler twice.
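A small simulation shows how such a coincidence can arise. All numbers here are
made up for illustration: a hypothetical fast_routine runs for the first 1,000
microseconds of every 10,000-microsecond period, so its true share of the run is
10%. Depending on how the sampling interval and starting offset line up with that
period, the sampled share can be badly off.

```python
def running_at(t_us, period_us, busy_us):
    """Name of the code executing at time t_us in the simulated application."""
    return "fast_routine" if t_us % period_us < busy_us else "other"


def sampled_share(sample_interval_us, offset_us, duration_us,
                  period_us=10_000, busy_us=1_000):
    """Fraction of samples that catch fast_routine executing."""
    times = range(offset_us, duration_us, sample_interval_us)
    hits = sum(running_at(t, period_us, busy_us) == "fast_routine" for t in times)
    return hits / len(times)


# Sampling in lockstep with the routine's period: every sample hits it.
in_phase = sampled_share(10_000, 0, 1_000_000)
# Same interval, shifted by half a period: no sample ever hits it.
out_of_phase = sampled_share(10_000, 5_000, 1_000_000)
# An interval that drifts against the period recovers the true share.
drifting = sampled_share(9_700, 0, 970_000)
```

In this simulation the first two runs report 100% and 0% for a routine whose true
share is 10%, which is why a second run, with different timing, is a useful check.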
Another limitation of sampling is that it tells you only which routine is currently executing,
not where it was called from. A sampling profiler of this kind cannot give you a parent-child
call trace of your application. Nor can it show you that a routine is running
slowly when the time is spent not in its own code but in the routines it calls, either
because it makes many calls or because the called routines are slow. By contrast,
the code which an instrumenting profiler inserts at the beginning of each routine
traces out and records what other routine it is being called from.
How to Ensure Proper Performance Analysis
There are many other differences between profiling methods. But our point is already
made: not all profilers are alike. Each has strengths and weaknesses, and each is
properly applied only to specific aspects of application testing during development.
To accurately and successfully isolate bottlenecks in your code, you must use a
combination of profilers.
AutomatedQA's AQtime is the only
tool on the market today that offers you both a binary profiler (which never touches
your source, of course) and a sampling profiler - in one integrated package.
Together, these AQtime profilers help you hunt down slow code with amazing accuracy.
To learn more about AQtime, write to us at: sales@automatedqa.com