11.2 Tracing

Symbolic debugging is fine for finding and correcting program errors. However, symbolic debugging offers little help in finding any sort of problem that involves an application's interaction with other applications or with the kernel. These sorts of behavioral problems necessitate the tracing of the actual interactions between your application and other software components.

The simplest form of tracing involves monitoring the interactions between a single application and the Linux kernel. This allows you to easily observe any problems that result from passing the wrong parameters to system calls or from making system calls in the wrong sequence.

Observing a single process in isolation is, however, not sufficient in all circumstances. If you are attempting to debug interprocess synchronization problems or time-sensitive issues, for example, you will need a system-wide tracing mechanism that provides you with the exact sequence and timing of events that occur throughout the system. For instance, in trying to understand why the Mars Pathfinder constantly rebooted while on Mars, the Jet Propulsion Laboratory engineers resorted to a system tracing tool for the VxWorks operating system.^[3]

^[3] For a very informative and entertaining account of what happened to the Mars Pathfinder on Mars, read Glenn Reeves' account at http://research.microsoft.com/~mbj/Mars_Pathfinder/Authoritative_Account.html. Glenn was the lead developer for the Mars Pathfinder software.

Fortunately, both single-process tracing and system tracing are available in Linux. The following sections discuss each one.

11.2.1 Single Process Tracing

The main tool for tracing a single process is strace. strace uses the ptrace( ) system call to intercept all system calls made by an application. Hence, it can extract all the system call information and display it in a human-readable format for you to analyze. Because strace is a widely used Linux tool, I do not explain how to use it here, only how to install it for your target. If you would like more details on the usage of strace, see Chapter 14 of Running Linux.

strace is available from http://www.liacs.nl/~wichert/strace/ under a BSD license. For my control module I used strace Version 4.4. Download the package and extract it in your ${PRJROOT}/debug directory. Move to the package's directory, then configure and build strace:

$ cd ${PRJROOT}/debug/strace-4.4
$ CC=powerpc-linux-gcc ./configure --host=$TARGET
$ make

If you wish to statically link against uClibc, add LDFLAGS="-static" to the make command line. Given that strace uses NSS, you need to use a special command line if you wish to link it statically to glibc, as we did for other packages in Chapter 10:

$ make \
> LDLIBS="-static -Wl,--start-group -lc -lnss_files -lnss_dns \
> -lresolv -Wl,--end-group"

When linked against glibc and stripped, strace is 145 KB in size if linked dynamically and 605 KB if linked statically. When linked against uClibc and stripped, strace is 140 KB in size if linked dynamically and 170 KB when linked statically.
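To check figures like these for your own build, a small shell sketch such as the following reports a file's size in KB. The function name size_kb is my own invention, and I am assuming that your toolchain provides a powerpc-linux-strip alongside powerpc-linux-gcc. Strip a copy so that the original binary keeps its symbols.

```shell
# size_kb FILE: print the size of FILE in KB, rounded up.
# (size_kb is a hypothetical helper name, not part of strace or the toolchain.)
size_kb() (
    bytes=$(wc -c < "$1") || exit 1
    echo $(( (bytes + 1023) / 1024 ))
)

# Example, on the host (the strip tool name is an assumption):
#   cp strace strace.stripped
#   powerpc-linux-strip strace.stripped
#   size_kb strace.stripped
```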

Once the binary is compiled, copy it to your target's root filesystem:

$ cp strace ${PRJROOT}/rootfs/usr/sbin

There are no additional steps required to configure strace for use on the target. In addition, strace is used on the target exactly as it is on a normal Linux workstation. See the web page listed earlier or the manpage included with the package if you need more information regarding the use of strace.
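For reference, here is a minimal sketch of the kind of invocation that is often convenient on a target. The wrapper name trace_syscalls, the log path, and the application name are placeholders of my own. The options themselves are standard strace options: -f follows child processes, -tt prefixes each line with a timestamp that includes microseconds, and -o writes the output to a file instead of stderr.

```shell
# trace_syscalls LOGFILE COMMAND [ARGS...]
# Run COMMAND under strace, following child processes (-f), timestamping
# each line (-tt), and writing the result to LOGFILE (-o).
# The wrapper name is hypothetical.
trace_syscalls() (
    log=$1
    shift
    strace -f -tt -o "$log" "$@"
)

# Example, on the target (path and command are placeholders):
#   trace_syscalls /tmp/app.strace /usr/bin/myapp
```

Writing the log to a file keeps strace's output separate from the application's own messages on the console.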

11.2.2 System Tracing

The main system tracing utility for Linux is the Linux Trace Toolkit (LTT), which was introduced and continues to be maintained by this book's author. In contrast with other tracing utilities such as strace, LTT does not use the ptrace( ) mechanism to intercept applications' behavior. Instead, a kernel patch is provided with LTT that instruments key kernel subsystems. The data generated by this instrumentation is then collected by the trace subsystem and forwarded to a trace daemon to be written to disk. The entire process has very little impact on the system's behavior and performance. Extensive tests have shown that the tracing infrastructure has marginal impact when not in use and an impact lower than 2.5% under some of the most stressful conditions.

In addition to reconstructing the system's behavior using the data generated during a trace run, the user utilities provided with LTT allow you to extract performance data regarding the system's behavior during the trace interval. Here's a summary of some of the tasks LTT can be used for:

Debugging interprocess synchronization problems
Understanding the interaction between your application, the other applications in the system, and the kernel
Measuring the time it takes for the kernel to service your application's requests
Measuring the time your application spent waiting because other processes had a higher priority
Measuring the time it takes for an interrupt's effects to propagate throughout the system
Understanding the exact reaction the system has to outside input

To achieve this, LTT's operation is subdivided into four software components:

The kernel instrumentation that generates the events being traced
The tracing subsystem that collects the data generated by the kernel instrumentation into a single buffer
The trace daemon that writes the tracing subsystem's buffers to disk
The visualization tool that post-processes the system trace and displays it in a human-readable form

The first two software components are implemented as a kernel patch and the last two are separate user-space tools. While the first three software components must run on the target, the last one, the visualization tool, can run on the host. In LTT Versions 0.9.5a and earlier, the tracing subsystem was accessed from user space as a device through the appropriate /dev entries. Starting in the development series leading to 0.9.6, however, this abstraction has been dropped following the recommendations of the kernel developers. Hence, though the following refers to the tracing subsystem as a device, newer versions of LTT will not use this abstraction and therefore will not require the creation of any /dev entries on your target's root filesystem.

Given that LTT can detect and handle traces that have different byte ordering, traces can be generated and read on entirely different systems. The traces generated on my PPC-based control module, for example, can be read transparently on an x86 host.

In addition to tracing a predefined set of events, LTT enables you to create and log your own custom events from both user space and kernel space. Have a look at the Examples directory included in the package for practical examples of such custom events. Also, if your target is an x86- or PPC-based system, you can use the DProbes package provided by IBM to add trace points to binaries, including the kernel, without recompiling. DProbes is available under the terms of the GPL from IBM's web site at http://oss.software.ibm.com/developer/opensource/linux/projects/dprobes/.

LTT is available under the terms of the GPL from Opersys's web site at http://www.opersys.com/LTT/. The project's web site includes links to in-depth documentation and a mailing list for LTT users. The current stable release is 0.9.5a, which supports the i386, PPC, and SH architectures. The 0.9.6 release currently in development adds support for the MIPS and the ARM architectures.

11.2.2.1 Preliminary manipulations

Download the LTT package, extract it in your ${PRJROOT}/debug directory, and move to the package's directory for the rest of the installation:

$ cd ${PRJROOT}/debug
$ tar xvzf TraceToolkit-0.9.5a.tgz
$ cd ${PRJROOT}/debug/TraceToolkit-0.9.5

The same online manual that provides detailed instructions on the use of LTT is included with the package under the Help directory.

11.2.2.2 Patching the kernel

For the kernel to generate the tracing information, it needs to be patched. There are kernel patches included with every LTT package in the Patches directory. Since the kernel changes with time, however, it is often necessary to update the kernel patches. The patches for the latest kernels are usually available from http://www.opersys.com/ftp/pub/LTT/ExtraPatches/. For my control module, for example, I used patch-ltt-linux-2.4.19-vanilla-020916-1.14. If you are using a different kernel, try adapting this patch to your kernel version. Unfortunately, it isn't feasible to create a patch for all kernel versions every time a new version of LTT is released. Using LTT would be much simpler if the patch were included as part of the main kernel tree, something your author has been trying to convince the kernel developers to do for some time now. In the case of my control module, I had to fix the patched kernel because some hunks failed to apply.

Given that the binary format of the traces changes over time, a given LTT version cannot read data generated by an arbitrary trace patch version. The -1.14 version appended to the patch name identifies the trace format version generated by this patch. LTT 0.9.5a can read trace data written by patches that use format Version 1.14. It cannot, however, read any other format. If you try opening a trace of a format that is incompatible with the visualization tool, it will display an error and exit. In the future, the LTT developers plan to modify the tools and the trace format to avoid this limitation.
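Since the format version is carried in the patch's file name, you can extract it with a line of shell before deciding whether a patch matches your tools. The following is only a sketch: trace_format_version is a helper name of my own, and it inspects nothing but the name.

```shell
# trace_format_version PATCHNAME: print the text after the last "-" in the
# patch file name, which is where the trace format version is appended.
# (Hypothetical helper; it looks only at the name, not at the patch contents.)
trace_format_version() (
    name=${1##*/}          # drop any leading directory
    echo "${name##*-}"     # keep what follows the last dash
)

# Example:
#   trace_format_version patch-ltt-linux-2.4.19-vanilla-020916-1.14
#   prints: 1.14
```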

Once you've downloaded the selected patch, move it to the kernel's directory and patch the kernel:

$ mv patch-ltt-linux-2.4.19-vanilla-020916-1.14 \
> ${PRJROOT}/kernel/linux-2.4.18
$ cd ${PRJROOT}/kernel/linux-2.4.18
$ patch -p1 < patch-ltt-linux-2.4.19-vanilla-020916-1.14
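Because hunks can fail when the patch and the kernel versions differ, it can help to rehearse the patch first and, once it is applied, to list the reject files that patch leaves behind. The sketch below assumes GNU patch, whose --dry-run option reports what would happen without modifying any file. list_rejects is a helper name of my own.

```shell
# Rehearse the patch without modifying the tree (GNU patch):
#   patch -p1 --dry-run < patch-ltt-linux-2.4.19-vanilla-020916-1.14

# list_rejects [DIR]: print every *.rej file under DIR (default: current
# directory). patch writes one FILE.rej for each file with failed hunks.
# (Hypothetical helper name.)
list_rejects() (
    find "${1:-.}" -name '*.rej' -print | sort
)

# Example, from the kernel's directory after patching:
#   list_rejects
```

Each file listed contains the hunks you have to apply to the corresponding source file by hand.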

You can then configure your kernel as you did earlier. In the main configuration menu, go to the "Kernel tracing" submenu and select "Kernel events tracing support" as built-in. In the patches released prior to LTT 0.9.6pre2, such as the one I am using for my control module, you could also select tracing as a module and load the trace driver dynamically. However, this option has disappeared following the recommendations of the kernel developers to make the tracing infrastructure a kernel subsystem instead of a device.

Proceed on to building and installing the kernel on your target using the techniques covered in earlier chapters.

Though you may be tempted to return to a kernel without LTT once you're done developing the system, I suggest you keep the traceable kernel, since you never know when a bug may occur in the field. The Mars Pathfinder example I provided earlier is a case in point. For the Pathfinder, the JPL engineers applied the "test what you fly and fly what you test" philosophy, as explained in the account I mentioned in the earlier footnote about the Mars Pathfinder problem. Note that the overall maximum system performance cost of tracing is lower than 0.5% when the trace daemon isn't running.

11.2.2.3 Building the trace daemon

As I explained earlier, the trace daemon is responsible for writing the trace data to permanent storage. Though this is a disk on most workstations, on an embedded target it is preferable to dump the trace data to an NFS-mounted filesystem. You could certainly dump it to your target's MTD device, if it has one, but this will almost certainly result in increased wear, given that traces tend to be fairly large.

Return to the package's directory within the ${PRJROOT}/debug directory, and build the trace daemon:

$ cd ${PRJROOT}/debug/TraceToolkit-0.9.5
$ ./configure --prefix=${PREFIX}
$ make -C LibUserTrace CC=powerpc-linux-gcc UserTrace.o
$ make -C LibUserTrace CC=powerpc-linux-gcc LDFLAGS="-static"
$ make -C Daemon CC=powerpc-linux-gcc LDFLAGS="-static"

By setting the value of LDFLAGS to -static, we are generating a binary that is statically linked with LibUserTrace. This won't weigh down the target, since this library is very small. In addition, this will spare us the trouble of having to keep track of an extra library for the target. The trace daemon binary we generated is, nevertheless, still dynamically linked to the C library. If you want it statically linked with the C library, use the following command instead:

$ make -C Daemon CC=powerpc-linux-gcc LDFLAGS="-all-static"

The binary generated is fairly small. When linked against glibc and stripped, the trace daemon is 18 KB in size when linked dynamically and 350 KB when linked statically. When linked against uClibc and stripped, the trace daemon is 16 KB in size when linked dynamically and 37 KB when linked statically.

Once built, copy the daemon and the basic trace helper scripts to the target's root filesystem:

$ cp tracedaemon Scripts/trace Scripts/tracecore Scripts/traceu \
> ${PRJROOT}/rootfs/usr/sbin

The trace helper scripts simplify the use of the trace daemon binary, which usually requires a fairly long command line to use adequately. Look at the LTT documentation for an explanation of the use of each helper script. My experience is that the trace script is the easiest way to start the trace daemon.

At the time of this writing, you need to create the appropriate device entries for the trace device on the target's root filesystem for the trace daemon to interface properly with the kernel's tracing components. Because the device obtains its major number at load time, make sure the major number you use for creating the device is accurate. The simplest way of doing this is to load all drivers in the exact order in which they will usually be loaded and then cat the /proc/devices file to get the list of device major numbers. See Linux Device Drivers for complete details about major number allocation. Alternatively, you can try using the createdev.sh script included in the LTT package. For my control module, the major number allocated to the trace device is 254:^[4]

^[4] I obtained this number by looking at the /proc/devices file on my target after having loaded the trace driver.

$ su -m
Password:
# mknod ${PRJROOT}/rootfs/dev/tracer c 254 0
# mknod ${PRJROOT}/rootfs/dev/tracerU c 254 1
# exit
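The lookup in /proc/devices can be scripted rather than read by eye. The following sketch prints the major number registered under a given name in the character-device section of the file. device_major is a helper name of my own, and the name under which the trace driver registers itself is an assumption here (I use tracer only because it matches the /dev entry), so check your own /proc/devices for the actual name.

```shell
# device_major NAME [FILE]: print the major number registered for the
# character device NAME in FILE (default: /proc/devices).
# Returns non-zero if NAME is not listed. (Hypothetical helper name.)
device_major() (
    awk -v name="$1" '
        /^Character devices:/ { in_char = 1; next }
        /^Block devices:/     { in_char = 0 }
        in_char && $2 == name { print $1; found = 1; exit }
        END                   { exit !found }
    ' "${2:-/proc/devices}"
)

# Example, on the target after loading the trace driver (the registered
# name "tracer" is an assumption):
#   device_major tracer
```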

As I said earlier, if you are using a version of LTT that is newer than 0.9.5a, you may not need to create these entries. Refer to your package's documentation for more information.

11.2.2.4 Installing the visualization tool

The visualization tool runs on the host and is responsible for displaying the trace data in an intuitive way. It can operate either as a command-line utility, dumping the binary trace data in a textual format, or as a GTK-based graphical utility, displaying the trace as a trace graph, as a set of statistics, and as a raw text dump of the trace. The graphical interface is most certainly the simplest way to analyze a trace, though you may prefer the command-line textual dump if you want to run a script on the textual output. If you plan to use the graphical interface, GTK must be installed on your system. Most distributions install GTK by default. If it isn't installed, use your distribution's package manager to install it.

We already moved to the LTT package's directory and configured it in the previous section. All that is left is to build and install the host components:

$ make -C LibLTT install
$ make -C Visualizer install

The visualizer binary, tracevisualizer, has been installed in the ${PREFIX}/bin directory, while helper scripts have been installed in ${PREFIX}/sbin. As with the trace daemon, the helper scripts let you avoid typing long command lines to start the trace visualizer.

11.2.2.5 Tracing the target and visualizing its behavior

You are now ready to trace your target. As I said earlier, to reduce wear, avoid using your target's solid-state storage device for recording the traces. Instead, either write the traces to an NFS-mounted filesystem or, if you would prefer to avoid polluting the traces with NFS-generated network traffic, use a TMPFS-mounted directory to store the traces and copy them to your host after tracing is over.
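If you choose the TMPFS route, the directory can be prepared with a couple of commands. This is a sketch under a few assumptions: your kernel includes TMPFS support, you are root on the target, and the directory and size shown are placeholders you should adapt to the RAM your target can spare. prepare_trace_dir is a helper name of my own.

```shell
# prepare_trace_dir DIR SIZE: create DIR and mount a tmpfs of at most SIZE
# on it (for example 4m). Must run as root on the target.
# (Hypothetical helper name.)
prepare_trace_dir() (
    mkdir -p "$1" || exit 1
    mount -t tmpfs -o "size=$2" tmpfs "$1"
)

# Example, on the target (directory and size are placeholders):
#   prepare_trace_dir /tmp/traces 4m && cd /tmp/traces
```

Capping the size keeps a long trace from consuming all of the target's memory.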

Here is a simple command for tracing the target for 30 seconds:

# trace 30 outt

The outt name specified here is the prefix the command should use for the names of the output files. This command will generate two files: outt.trace, which contains the raw binary trace, and outt.proc, which contains a snapshot of the system's state at trace start. Both these files are necessary for reconstructing the system's behavior on the host using the visualization tool. If those files are stored locally on your target, copy them to your host using your favorite protocol.
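Since the visualization tool needs both files, it is worth checking that the pair is complete before copying anything to the host. Here is a minimal sketch; have_trace_files is a helper name of my own, and the only thing it relies on is the .trace and .proc naming described above.

```shell
# have_trace_files PREFIX: succeed only if both PREFIX.trace and PREFIX.proc
# exist and are non-empty; report any that are missing on stderr.
# (Hypothetical helper name.)
have_trace_files() (
    status=0
    for f in "$1.trace" "$1.proc"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example, on the target:
#   have_trace_files outt && echo "outt is ready to copy"
```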

It is possible that your system may be generating more events than the trace infrastructure can handle. In that case, the daemon will inform you upon exit that it lost events. You can then change the size of the buffers being used or the event set being traced to obtain all the data you need. Look at the documentation included in the package for more information on the parameters accepted by the trace daemon.

Once you've copied the files containing the trace to the host, you can view the trace using:

$ traceview outt

This command opens a window that looks like Figure 11-1.

Figure 11-1. Example LTT trace graph

In this case, the graph shows the interaction between the BusyBox shell and another BusyBox child. On the left side of the visualizer display, you see a list of all the processes that were active during the trace. The Linux kernel is always the bottom entry in that list. On the right side of the display, you see a graph that characterizes the behavior of the system. In that graph, horizontal lines illustrate the passage of time, while vertical lines illustrate a state transition. The graph portion displayed here shows that the system is running kernel code in the beginning. Near the start, the kernel returns control to the sh application, which continues running for a short period of time before making the wait4( ) system call. At that point, control is transferred back to the kernel, which runs for a while before initiating a scheduling change to the task with PID 21. This task starts executing, but an exception occurs, which results in a control transfer to the kernel again.

The graph continues in both directions, and you can scroll left or right to see what happened before or after this trace segment. You can also zoom in and out, depending on your needs.

Using this sort of graph, you can easily identify your applications' interaction with the rest of the system, as I explained earlier. You can also view the same set of events in their raw form by selecting the "Raw Trace" thumbnail, as seen in Figure 11-2.

Figure 11-2. Example LTT raw event list

If you would prefer not to use the graphic tool at all, you can use the tracevisualizer on the command line. In that case, the tracevisualizer command takes the two input files and generates a text file containing the raw event list. This list is the same as the one displayed in the "Raw Trace" thumbnail of the graphic interface. To dump the content of the trace in text, type:

$ tracevisualizer outt.trace outt.proc outt.data

The first two parameters of this command, outt.trace and outt.proc, are the input files I described earlier, and the last parameter, outt.data, is the output file to which the trace's content is dumped in text. You can also use one of the helper scripts, such as tracedump or traceanalyze. We discuss LTT's analysis capabilities and the "Process analysis" thumbnail later in this chapter.
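Once the trace is in text form, ordinary shell tools can be applied to it. The sketch below counts the lines of the dump that contain a given string. I am deliberately assuming nothing about the layout of the dump beyond it being line-oriented text, so the string you search for is whatever appears in your own outt.data. count_events is a helper name of my own.

```shell
# count_events STRING FILE: print how many lines of FILE contain STRING,
# taken literally (-F) rather than as a regular expression.
# Note that grep's exit status is non-zero when the count is 0.
# (Hypothetical helper name.)
count_events() (
    grep -F -c -- "$1" "$2"
)

# Example, on the host (the search string is only an illustration):
#   count_events wait4 outt.data
```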