8.8 Tools for MPI Programs

Many tools are available for developing, testing, and tuning MPI programs. This section describes some of those available from www.mcs.anl.gov/mpi. These tools work with most MPI implementations, not just MPICH2.

8.8.1 Profiling Libraries

The MPI Forum decided not to standardize any particular tool but rather to provide a general mechanism for intercepting calls to MPI functions, which is the capability that tools need. The MPI standard requires every MPI implementation to provide two entry points for each MPI function: its normal MPI_ name and a corresponding PMPI_ name. This strategy allows a user to write a custom version of MPI_Send, for example, that carries out whatever extra work is desired (counting calls or recording timestamps, say) and then calls PMPI_Send to perform the usual operations of MPI_Send. When the user's custom versions of MPI functions are placed in a library that precedes the usual MPI library in the link order, the user's custom code is invoked around every MPI function that has been replaced; functions that have not been replaced resolve to the MPI library as usual.
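The interception mechanism described above can be sketched in C as follows. This is a minimal illustration, not one of the MPE libraries: the counters send_calls and send_elements and the macro HAVE_MPI are names invented for this example. When HAVE_MPI is not defined, a few stand-in definitions take the place of <mpi.h> so that the pattern can be compiled without an MPI installation. The const void * buffer argument follows the MPI-3 C binding; older MPI-2 headers may declare it as void *.

```c
/* Sketch of a profiling-library wrapper built on the PMPI entry points.
 * HAVE_MPI is a hypothetical macro for this example: define it when
 * compiling against a real MPI installation. Without it, small stand-ins
 * replace <mpi.h> so the pattern can be tried without MPI. */
#include <stdio.h>

#ifdef HAVE_MPI
#include <mpi.h>
#else
/* Stand-ins, NOT real MPI: just enough for the wrappers below to compile. */
typedef int MPI_Datatype;
typedef int MPI_Comm;
#define MPI_SUCCESS    0
#define MPI_INT        1
#define MPI_COMM_WORLD 0
static int PMPI_Send(const void *buf, int count, MPI_Datatype datatype,
                     int dest, int tag, MPI_Comm comm)
{
    (void)buf; (void)count; (void)datatype;
    (void)dest; (void)tag; (void)comm;
    return MPI_SUCCESS;
}
static int PMPI_Finalize(void) { return MPI_SUCCESS; }
#endif

/* Illustrative per-process counters kept by the wrapper. */
static int  send_calls = 0;
static long send_elements = 0;

/* Replaces the library's MPI_Send: do the extra work, then delegate
 * to PMPI_Send, which performs the actual send. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    send_calls++;
    send_elements += count;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

/* Replaces MPI_Finalize: report this process's totals, then delegate. */
int MPI_Finalize(void)
{
    fprintf(stderr, "MPI_Send: %d calls, %ld elements\n",
            send_calls, send_elements);
    return PMPI_Finalize();
}
```

With a real MPI installation, one plausible way to use such a file (here hypothetically named wrappers.c) is to compile it with the implementation's compiler wrapper, for example mpicc -DHAVE_MPI -c wrappers.c, and to place the resulting object or library ahead of the MPI library when linking the application. The application source needs no changes, because its calls to MPI_Send and MPI_Finalize then resolve to the wrappers.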

Three such "profiling libraries" and some tools for creating more are provided in the MPE tools. MPE is available at ftp://ftp.mcs.anl.gov/pub/mpi/mpe.tar.gz.

8.8.2 Visualizing Parallel Program Behavior

The detailed behavior of a parallel program is surprisingly difficult to predict. It is often useful to examine a graphical display that shows the exact sequence of states that each process went through and what messages were exchanged at what times and in what order. The data for such a tool can be collected by means of a profiling library. One tool for looking at such log files is Jumpshot [126]. A screenshot of Jumpshot in action is shown in Figure 8.15.

Figure 8.15: Jumpshot displaying message traffic.

The horizontal axis represents time, and there is a horizontal line for each process. The states that processes are in during a particular time interval are represented by colored rectangles. Messages are represented by arrows. It is possible to zoom in for microsecond-level resolution in time.

Part III: Managing Clusters