8.1 Hello World in MPI

To see what an MPI program looks like, we start with the classic "hello world" program. MPI specifies only the library calls to be used in a C, Fortran, or C++ program; consequently, all of the capabilities of the language are available. The simplest "Hello World" program is shown in Figure 8.1.

Start Figure
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello World\n" );
    MPI_Finalize();
    return 0;
}
End Figure

Figure 8.1: Simple "Hello World" program in MPI.

All MPI programs must contain one call to MPI_Init (or MPI_Init_thread, described in Section 9.9) and one to MPI_Finalize. All other[2] MPI routines must be called after MPI_Init and before MPI_Finalize. All C and C++ programs must also include the file 'mpi.h'; Fortran programs must either use the MPI module or include mpif.h.

The simple program in Figure 8.1 is not very interesting. In particular, all processes print the same text. A more interesting version has each process identify itself. This version, shown in Figure 8.2, illustrates several important points. Of particular note are the variables rank and size. Because MPI programs are made up of communicating processes, each process has its own set of variables. In this case, each process has its own address space containing its own variables rank and size (and argc, argv, etc.). The routine MPI_Comm_size returns the number of processes in the MPI job in the second argument. Each of the MPI processes is identified by a number, called the rank, ranging from zero to the value of size minus one. The routine MPI_Comm_rank returns in the second argument the rank of the process. The output of this program might look something like the following:

    Hello World from process 0 of 4
    Hello World from process 2 of 4
    Hello World from process 3 of 4
    Hello World from process 1 of 4
Start Figure
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "Hello World from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
End Figure

Figure 8.2: A more interesting version of "Hello World".

Note that the output is not ordered from processes 0 to 3. MPI does not specify the behavior of other routines or language statements such as printf; in particular, it does not specify the order of output from print statements. However, there are tools, built using MPI, that can provide ordered output of messages.

8.1.1 Compiling and Running MPI Programs

The MPI standard does not specify how to compile and link programs (neither do C or Fortran). However, most MPI implementations provide tools to compile and link programs.

For example, one popular implementation, MPICH, provides scripts to ensure that the correct include directories are specified and that the correct libraries are linked. The script mpicc can be used just like cc to compile and link C programs. Similarly, the scripts mpif77, mpif90, and mpicxx may be used to compile and link Fortran 77, Fortran 90, and C++ programs.

If you prefer not to use these scripts, you need only ensure that the correct paths and libraries are provided. The MPICH implementation provides the switch -show for mpicc that shows the command lines used with the C compiler and is an easy way to find the paths. Note that the name of the MPI library may be 'libmpich.a', 'libmpi.a', or something similar and that additional libraries, such as 'libsocket.a' or 'libgm.a', may be required. The include path may refer to a specific installation of MPI, such as '/usr/include/local/mpich2-1.0/include'.
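
For example, with MPICH, a copy of the "Hello World" program stored in a file named helloworld.c (the file name here is just an example) could be compiled and linked with

    mpicc -o helloworld helloworld.c

and the underlying compiler command line, including the include and library paths, examined with

    mpicc -show -o helloworld helloworld.c

The exact output of -show depends on how the MPI implementation was configured and installed.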

Running an MPI program (in most implementations) also requires a special program, particularly when parallel programs are started by a batch system as described in Chapter 14. Many implementations provide a program mpirun that can be used to start MPI programs. For example, the command

    mpirun -np 4 helloworld

runs the program helloworld using four processes.

The name and command-line arguments of the program that starts MPI programs were not specified by the original MPI standard, just as the C standard does not specify how to start C programs. However, the MPI Forum did recommend, as part of the MPI-2 standard, an mpiexec command and standard command-line arguments to be used in starting MPI programs. A number of MPI implementations, including the all-new version of MPICH, called MPICH2, now provide mpiexec. The name mpiexec was selected because no MPI implementation was using it (many were using mpirun, but with incompatible arguments). The syntax is almost the same as for the MPICH version of mpirun; instead of using -np to specify the number of processes, the switch -n is used:

    mpiexec -n 4 helloworld

The MPI standard defines additional switches for mpiexec; for more details, see Section 4.1, "Portable MPI Process Startup," in the MPI-2 standard. For greatest portability, we recommend that the mpiexec form be used; if your preferred implementation does not support mpiexec, point the maintainers to the MPI-2 standard.

Most MPI implementations will attempt to run each process on a different processor, and most provide a way to select particular processors for each MPI process.

8.1.2 Adding Communication to Hello World

The code in Figure 8.2 does not guarantee that the output will be printed in any particular order. To force a particular order for the output, and to illustrate how data is communicated between processes, we add communication to the "Hello World" program. The revised program implements the following algorithm:

    Find the name of the processor that is running the process
    If the process has rank > 0, then
        send the name of the processor to the process with rank 0
    Else
        print the name of this processor
        for each process with rank greater than zero,
            receive the name of the processor and print it
    Endif

This program is shown in Figure 8.3. The new MPI calls are to MPI_Send and MPI_Recv and to MPI_Get_processor_name. The latter is a convenient way to get the name of the processor on which a process is running. MPI_Send and MPI_Recv can be understood by stepping back and considering the two requirements that must be satisfied to communicate data between two processes:

  1. Describe the data to be sent or the location in which to receive the data

  2. Describe the destination (for a send) or the source (for a receive) of the data.

Start Figure
#include "mpi.h"
#include <stdio.h>
#include <string.h>

int main( int argc, char *argv[] )
{
    int  numprocs, myrank, namelen, i;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    char greeting[MPI_MAX_PROCESSOR_NAME + 80];
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &numprocs );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
    MPI_Get_processor_name( processor_name, &namelen );

    sprintf( greeting, "Hello, world, from process %d of %d on %s",
             myrank, numprocs, processor_name );

    if ( myrank == 0 ) {
        printf( "%s\n", greeting );
        for ( i = 1; i < numprocs; i++ ) {
            MPI_Recv( greeting, sizeof( greeting ), MPI_CHAR,
                      i, 1, MPI_COMM_WORLD, &status );
            printf( "%s\n", greeting );
        }
    }
    else {
        MPI_Send( greeting, strlen( greeting ) + 1, MPI_CHAR,
                  0, 1, MPI_COMM_WORLD );
    }

    MPI_Finalize( );
    return 0;
}
End Figure

Figure 8.3: A more complex "Hello World" program in MPI. Only process 0 writes to stdout; each process sends a message to process 0.

In addition, MPI provides a way to tag messages and to discover information about the size and source of the message. We will discuss each of these in turn.

Describing the Data Buffer

A data buffer typically is described by an address and a length, such as "a,100," where a is a pointer to 100 bytes of data. For example, the Unix write call describes the data to be written with an address and length (along with a file descriptor). MPI generalizes this to provide two additional capabilities: describing noncontiguous regions of data and describing data so that it can be communicated between processors with different data representations. To do this, MPI uses three values to describe a data buffer: the address, the (MPI) datatype, and the number or count of the items of that datatype. For example, a buffer a containing four C ints is described by the triple "a, 4, MPI_INT." There are predefined MPI datatypes for all of the basic datatypes defined in C, Fortran, and C++. The most common datatypes are shown in Table 8.1.
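
For example, inside an MPI program (after MPI_Init has been called), a send of four C ints starting at a could be written as follows. This is only a sketch; the destination rank, tag, and variable name are illustrative and not taken from the figures in this section.

    int a[4] = { 1, 2, 3, 4 };
    /* The triple (a, 4, MPI_INT) describes the buffer:
       address, count, and MPI datatype.  Here the data is
       sent to rank 1 with tag 0 in MPI_COMM_WORLD. */
    MPI_Send( a, 4, MPI_INT, 1, 0, MPI_COMM_WORLD );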

Table 8.1: The most common MPI datatypes. C and Fortran types on the same row are often but not always the same type. The type MPI_BYTE is used for raw data bytes and does not correspond to any particular datatype. The type MPI_PACKED is used for data that was incrementally packed with the routine MPI_Pack. The C++ MPI datatypes have the same name as the C datatypes but without the MPI_ prefix, for example, MPI::INT.

    C            MPI type        Fortran            MPI type
    ------------------------------------------------------------------
    int          MPI_INT         INTEGER            MPI_INTEGER
    double       MPI_DOUBLE      DOUBLE PRECISION   MPI_DOUBLE_PRECISION
    float        MPI_FLOAT       REAL               MPI_REAL
    long         MPI_LONG
    char         MPI_CHAR        CHARACTER          MPI_CHARACTER
                                 LOGICAL            MPI_LOGICAL
                 MPI_BYTE                           MPI_BYTE
                 MPI_PACKED                         MPI_PACKED

Describing the Destination or Source

The destination or source is specified by using the rank of the process. MPI generalizes the notion of destination and source rank by making the rank relative to a group of processes. This group may be a subset of the original group of processes. Allowing subsets of processes and using relative ranks make it easier to use MPI to write component-oriented software (more on this in Section 9.4). The MPI object that defines a group of processes (and a special communication context that will be discussed in Section 9.4) is called a communicator. Thus, sources and destinations are given by two parameters: a rank and a communicator. The communicator MPI_COMM_WORLD is predefined and contains all of the processes started by mpirun or mpiexec. As a source, the special value MPI_ANY_SOURCE may be used to indicate that the message may be received from any of the MPI processes in this program.

Selecting among Messages

The "extra" argument for MPI_Send is a nonnegative integer tag value. This tag allows a program to send one extra number with the data. MPI_Recv can use this value either to select which message to receive (by specifying a specific tag value) or to use the tag to convey extra data (by specifying the wild card value MPI_ANY_TAG). In the latter case, the tag value of the received message is stored in the status argument (this is the last parameter to MPI_Recv in the C binding). This is a structure in C, an integer array in Fortran, and a class in C++. The tag and rank of the sending process can be accessed by referring to the appropriate element of status as shown in Table 8.2.

Table 8.2: Accessing the source and tag after an MPI_Recv.

    C                    Fortran               C++
    --------------------------------------------------------------
    status.MPI_SOURCE    status(MPI_SOURCE)    status.Get_source()
    status.MPI_TAG       status(MPI_TAG)       status.Get_tag()
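
For example, a receive that accepts a message from any source and with any tag, and then examines the status to determine the actual source and tag, might look like the following sketch (the buffer size, datatype, and variable names are illustrative only):

    int buf[10], src, tag;
    MPI_Status status;

    /* Wildcards accept a message from any rank and with any tag */
    MPI_Recv( buf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
              MPI_COMM_WORLD, &status );
    src = status.MPI_SOURCE;   /* rank of the sending process */
    tag = status.MPI_TAG;      /* tag value of the received message */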

Determining the Amount of Data Received

The amount of data received can be found by using the routine MPI_Get_count. For example,

    MPI_Get_count( &status, MPI_CHAR, &num_chars );

returns in num_chars the number of characters sent. The second argument should be the same MPI datatype that was used to receive the message. (Since many applications do not need this information, the use of a routine allows the implementation to avoid computing num_chars unless the user needs the value.)

Our example provides a maximum-sized buffer in the receive. It is also possible to find the amount of memory needed to receive a message by using MPI_Probe, as shown in Figure 8.4.

Start Figure
    char *greeting;
    int num_chars, src;
    MPI_Status status;
    ...
    MPI_Probe( MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status );
    MPI_Get_count( &status, MPI_CHAR, &num_chars );
    greeting = (char *)malloc( num_chars );
    src      = status.MPI_SOURCE;
    MPI_Recv( greeting, num_chars, MPI_CHAR,
              src, 1, MPI_COMM_WORLD, &status );
End Figure

Figure 8.4: Using MPI_Probe to find the size of a message before receiving it.

MPI guarantees that messages are ordered; that is, messages sent from one process to another arrive in the same order in which they were sent. In addition, an MPI_Recv that follows an MPI_Probe will receive the message that the probe returned information on, as long as the same message selection criteria (source rank, communicator, and message tag) are used. Note that in this example, the source for the MPI_Recv is specified as status.MPI_SOURCE, not MPI_ANY_SOURCE, to ensure that the message received is the same as the one about which MPI_Probe returned information.

[2]There are a few exceptions, including MPI_Initialized.



