A program trace (or more succinctly, a trace)
is a record of the execution path of a program that has been measured
during program execution. If we can measure a trace from a running
program, we avoid the need to analyze the program's CDFG to determine
the execution path. Traces are typically large. Since we usually need
to look at the entire execution history of a program, traces can be
gigabytes in size.
However, because disks to hold the traces are inexpensive and
workstations to analyze them are fast and cheap, traces are an
appealing way to analyze programs. We have to consider two problems:
how to generate a trace and how to use it to determine program
performance.
A trace can be captured by using hardware methods. For example, a
logic analyzer can be attached to the microprocessor bus to capture
memory bus traffic. There are two major limitations to this approach.
First, the size of hardware trace buffers is usually limited.
The hardware must be able to store instructions at execution speed;
if we use high-speed RAM for the buffer, we are limited to perhaps
several million instructions. This means that we probably cannot
capture a trace of the entire program but only a section of its
execution. Second, if the CPU has an on-chip cache, we will not be able
to see internal memory references to the program or the data.
Alternatively, a microprocessor emulator can be used to trace the PC
over time. Although this technique allows us to see what is going on
inside the chip and not just behavior on the memory bus, it is too slow
to generate long traces. Since the microprocessor must be stopped for
probing and probing is very slow compared to normal execution,
capturing long traces would require exorbitant amounts of time.
Some CPUs have hardware facilities for automatically generating
trace information. For example, the Pentium family microprocessors
generate a special bus cycle, a branch trace message, that shows the
source and/or destination address of a branch [Col97]. If we record only traces,
we can reconstruct the instructions executed within the basic blocks
while greatly reducing the amount of memory required to hold the trace.
There are three major methods for generating a trace by software: PC
sampling, instrumentation instructions, and simulation.
The PC sampling technique uses the operating system or some other
facility to periodically interrupt the program and save away the
current PC value.
This method has the same limitations of any sampling method - if the
sampling period is too slow, we may miss important behavior; in
particular, the wrong sampling interval can be a multiple of some
periodic behavior in the program that could be totally missed.
Alternatively, instructions can be added to the program that will write
out information to the trace buffer.
The instructions required depend on the information to be captured.
The simplest case is to add code to the start of each basic block that
remembers the program execution going through that point. Traces are
often used to analyze cache or superscalar behavior, in which case the
instrumentation code may need to save additional information about the
state of the CPU.
Software-based trace generation methods do add execution overhead
time to the program, but this usually does not affect the measurement.
Simulation methods interpret the instructions to simulate the CPU, and
they can generate the required trace information at the same time.
Obtaining a representative trace requires some knowledge of what the
program does. Someone needs to supply inputs that properly exercise the
program. Program users should have knowledge of the types of data
typically presented to the program.
However, because they may not know which data inputs will cause
worst-case behavior, some collaboration between the program designers
and users may be necessary. The techniques to be described later in
this series can be used to measure how thoroughly a set of inputs
covers the program's control and data flow, giving you an idea of the
representativeness of your traces.