Recipe 9.15 Tracing Processes

9.15.1 Problem

You want to know what an unfamiliar process is doing.

9.15.2 Solution

To attach to a running process and trace system calls:

# strace -p pid

To trace network system calls:

# strace -e trace=network,read,write ...

9.15.3 Discussion

The strace command lets you observe a given process in detail, printing its system calls as they occur. It decodes the arguments, return values, and errors (if any) for the system calls, showing the information passed between the process and the kernel. (It can also trace signals.) This provides a detailed picture of what the process is doing.

Use the strace -p option to attach to and trace a process, identified by its process ID, say, 12345:

# strace -p 12345

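To detach and stop tracing, interrupt strace (for example, with Ctrl-C) or kill it; the traced process continues running. Other than a performance penalty while it is attached (small for most programs, but noticeable for those that make system calls at a high rate), strace has no effect on the traced process.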

Tracing all system calls for a process can produce overwhelming output, so you can select sets of interesting system calls to print. For monitoring network activity, the -e trace=network option is appropriate. Network sockets often use the generic read and write system calls as well, so trace those too:

$ strace -e trace=network,read,write finger katie@server.example.com
...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
connect(4, {sin_family=AF_INET, 
            sin_port=htons(79), 
            sin_addr=inet_addr("10.12.104.222")}, 16) = 0 
write(4, "katie", 5)                    = 5
write(4, "\r\n", 2)                     = 2
read(4, "Login: katie          \t\t\tName: K"..., 4096) = 244
read(4, "", 4096)                      = 0
...

The trace shows the creation of a TCP socket, followed by a connection to port 79 for the finger service at the IP address for the server. The program then follows the finger protocol by writing the username and reading the response.

By default, strace prints only 32 characters of string arguments, which can lead to the truncated output shown. For a more complete trace, use the -s option to specify a larger maximum data size. Similarly, strace abbreviates some large structure arguments, such as the environment for new processes: supply the -v option to print this information in full.
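As a minimal sketch of these options, the following shell function wraps the earlier network trace with a larger string limit and unabbreviated structures. The function name trace_net, the 256-character limit, and the STRACE override variable are illustrative assumptions, not part of strace:

```shell
# trace_net COMMAND [ARGS...]
# Hypothetical helper: run COMMAND under strace, printing up to 256
# characters of each string argument (-s 256) and unabbreviated
# structures (-v), limited to network, read, and write system calls.
# STRACE is an assumed override for the strace binary (default: strace).
trace_net() {
    "${STRACE:-strace}" -s 256 -v -e trace=network,read,write "$@"
}
```

For example:

$ trace_net finger katie@server.example.com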

You can trace most network activity effectively by following file descriptors: in the previous example, the value is 4 (returned by the socket-creation call, and used as the first argument for the subsequent system calls). Then match these values to the file descriptors displayed in the FD column by lsof. [Recipe 9.14]
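Following a descriptor through a long trace by eye is tedious, so it can help to filter a saved trace. The function below is a rough sketch: the name fd_calls is hypothetical, and it assumes a trace saved with the strace -o option, without -f or timestamp options (which add a prefix to each line):

```shell
# fd_calls FD
# Read strace output on stdin and print the system calls that take FD as
# their first argument, plus any socket() call that returned FD.
# Assumes each line starts with the system call name (no PID or timestamp).
fd_calls() {
    fd=$1
    grep -E "^[a-z0-9_]+\(${fd}[,)]|^socket\(.*\) += ${fd}\$"
}
```

For example, after saving a trace to the file trace.txt (an arbitrary name), list the calls for descriptor 4, and ask lsof about the same descriptor while process 12345 is still running:

$ fd_calls 4 < trace.txt
# lsof -a -p 12345 -d 4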

When you identify an interesting file descriptor, you can print the transferred data in both hexadecimal and ASCII using the options -e [read|write]=fd:

$ strace -e trace=read -e read=4 finger katie@server.example.com
...
read(4, "Login: katie          \t\t\tName: K"..., 4096) = 244
 | 00000  4c 6f 67 69 6e 3a 20 6b  61 74 69 65 20 20 20 20  Login: k atie     |
 | 00010  20 20 20 20 20 20 09 09  09 4e 61 6d 65 3a 20 4b        .. .Name: K |
...

strace watches data transfers much like network packet sniffers do, but it also can observe input/output involving local files and other system activities.

If you trace programs for long periods, ask strace to annotate its output with timestamps. The -t option records absolute times (repeat it for more detail: -tt adds microseconds, and -ttt prints seconds since the epoch instead), the -r option records relative times between system calls, and -T records the time spent in each system call. Finally, add the strace -f option to follow child processes.^[6]

^[6] To follow child processes created by vfork, include the -F option as well, but this requires support from the kernel that is not widely available at press time. Also, strace does not currently work well with multithreaded processes: be sure you have the latest version, and a kernel Version 2.4 or later, before attempting thread tracing.
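With the -T option, each line of the trace ends with the elapsed time in angle brackets, so a saved trace can be ranked by duration. The following is a minimal sketch under that assumption; the function name slowest_calls and its default of five lines are illustrative:

```shell
# slowest_calls [N]
# Read `strace -T` output on stdin and print the N slowest system calls
# (default 5), each prefixed by its elapsed time in seconds and a tab.
slowest_calls() {
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
             # Drop the surrounding angle brackets to get the bare number.
             print substr($0, RSTART + 1, RLENGTH - 2) "\t" $0
         }' | LC_ALL=C sort -rn | head -n "${1:-5}"
}
```

For example, to record a timed trace of process 12345 in the file trace.txt (an arbitrary name) and then list its three slowest calls:

# strace -T -o trace.txt -p 12345
$ slowest_calls 3 < trace.txt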

When following children, strace adds the process ID to each line of the trace so you can tell the processes apart. Alternatively, you can untangle the system calls by directing the trace for each child process to a separate file, using the options: