12.5 Load Balancing

Many applications can be distributed in more than one way over a parallel architecture. Even if one distribution is the natural result of one component of the computation (for instance, setup of a grid and generation of the matrix), a subsequent component (for instance, an eigenvalue calculation) may be so labor intensive that the cost of a full data redistribution may be outweighed by resulting gains in parallel efficiency.

In this section we discuss two packages for graph partitioning: ParMetis and Chaco. These packages aim at finding a partitioning of a graph that assigns roughly equally sized subgraphs to processors, thereby balancing the work load, while minimizing the size of the separators and the consequent communication cost.

12.5.1 ParMetis

ParMetis [99, 84] is a parallel package for mesh or graph partitioning for parallel load balancing. It is based on a three-step coarsening/partitioning/uncoarsening algorithm that the authors claim is faster than multiway spectral bisection. It can be used in several modes, for instance, repartitioning graphs from adaptively refined meshes or partitioning graphs from multiphysics simulations.

The input format of ParMetis, in its serial form, is a variant on compressed matrix storage. The adjacency of each element is stored consecutively (excluding the diagonal, but for each pair u, v storing both (u, v) and (v, u)), with a pointer array indicating where each element's data starts and ends. Both vertex and edge weights can be specified optionally. The parallel version of the graph input format takes blocks of consecutive nodes and allocates these to subsequent processors. An array that is identical on each processor then indicates which range of variables each processor owns. The distributed format uses global numbering of the nodes.

The output of ParMetis is a mapping of node numbers to processors. No actual redistribution is performed.

12.5.2 Chaco

The Chaco [24] package comprises several algorithms for graph partitioning, including inertial, spectral, Kernighan-Lin, and multilevel algorithms. It can be used in two modes:

stand-alone In this mode, input and output are done through files.
library Chaco can be linked to C or Fortran codes, and all data is passed through arrays.

Unlike ParMetis, Chaco runs only sequentially.

Zoltan [128] is a package for dynamic load balancing that builds on top of Chaco. Thanks to an object-oriented design, it is data structure neutral, so it can be interfaced by using existing user data structures.