For sparse matrices, more economical storage can be used, but the most foolproof algorithm is still Gaussian elimination. This is the principle behind SuperLU (Section 12.2.1). In certain applications, especially physics-based ones, the matrix has favorable properties that allow so-called iterative solution methods, which can be much more efficient than Gaussian elimination. The Aztec and PETSc packages are built around such iterative methods (Sections 12.2.2 and 12.2.3).
SuperLU [67, 112] is one of the foremost direct solvers for sparse linear systems. It is available in single-processor, multithreaded, and parallel versions.
One of the aims of SuperLU is to obtain high computational efficiency. To this end it finds cliques in the matrix graph. Eliminating these reduces the cost of the graph algorithms used; and since cliques lead to dense submatrices, it also enables the use of higher-level BLAS routines.
The sequential and threaded versions of SuperLU use partial pivoting for numerical stability. Partial pivoting is avoided in the parallel version, however, because it would lead to large numbers of small messages. Instead, "static pivoting" is used, with repair of zero pivots during run time. To compensate for these numerically suboptimal strategies, the solution process uses iterative refinement to obtain the full available precision.
Like ScaLAPACK, SuperLU has both simple drivers and expert drivers; the latter give the user the opportunity for further steering, return more detailed information, and are more sophisticated in terms of numerical precision.
While SuperLU accepts the user's matrix data structure (which must be in compressed column format), this feature is not as critical here as it is in our discussion of the relative merits of PETSc (Section 12.2.3) and Aztec (Section 12.2.2), for the following reason: since SuperLU is a direct method, it generates large amounts of data for the factorization, so the savings from reusing the user's data, and the extra matrix storage in the parallel case, are relatively unimportant.
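As a concrete illustration, the following sketch calls the sequential simple driver dgssv on a matrix supplied in the compressed column format just mentioned. It assumes a recent SuperLU release (header and option names have varied between versions), and the variables n, nnz, a, asub, xa, and rhs stand for application-supplied data.

#include "slu_ddefs.h"          /* double-precision SuperLU definitions */

SuperMatrix A, L, U, B;
superlu_options_t options;
SuperLUStat_t stat;
int *perm_r = intMalloc(n);     /* row permutation from partial pivoting  */
int *perm_c = intMalloc(n);     /* column permutation chosen for sparsity */
int info;

/* wrap the user's compressed column arrays and the right-hand side */
dCreate_CompCol_Matrix(&A, n, n, nnz, a, asub, xa, SLU_NC, SLU_D, SLU_GE);
dCreate_Dense_Matrix(&B, n, 1, rhs, n, SLU_DN, SLU_D, SLU_GE);

set_default_options(&options);  /* partial pivoting, default column ordering */
StatInit(&stat);

dgssv(&options, &A, perm_c, perm_r, &L, &U, &B, &stat, &info);
/* on success (info == 0) the solution overwrites rhs through B */

StatFree(&stat);

The expert driver dgssvx follows the same pattern but adds equilibration, condition and error estimates, and iterative refinement.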
The parallel version of SuperLU accepts two input modes: one where the matrix is distributed and one where the matrix is replicated on every processor. The former mode is less efficient because it requires more data redistribution.
SuperLU is written in C and cannot easily be used from Fortran. The standard installation comes with its own collection of BLAS routines; one can edit the makefile to ensure that an optimized version of the BLAS library is used.
The Aztec package [57, 6] has linear system solving as its main focus. While it is not as sophisticated as PETSc, it has two advantages:
It has far fewer routines, so the learning curve is conceivably shorter.
It uses the user's matrix data structure, which makes it easier to integrate into existing applications and avoids duplication of storage. (See Section 12.2.3 for a discussion of this issue in PETSc.)
Thus, Aztec is an attractive choice for supplying matrix-vector product and linear system solution routines for use in Arpack (Section 12.3.2).
Aztec is written in C but supports a full set of Fortran interfaces.
Aztec supports a few parallel sparse matrix formats, in particular a parallel form of compressed row storage. The user first partitions the matrix over the parallel processors using the global numbering for the element indices; Aztec then transforms the matrix to a local (on-processor) indexing scheme.
The following code illustrates the gist of an Aztec iterative solution program:
/* convert the matrix from global to local indexing and
   set up the interprocessor communication information   */
AZ_transform(proc_config,&external,
             idx,mat_el,update,
             &update_index,&extern_index,&data_org,
             n_update, index,bpntr,rpntr,&cpntr,
             AZ_MSR_MATRIX);

/* choose solver, preconditioner, and convergence criterion */
AZ_defaults(options,params);
options[AZ_conv]     = AZ_r0;
params[AZ_tol]       = rtol;
options[AZ_solver]   = AZ_bicgstab;
options[AZ_precond]  = AZ_Jacobi;
options[AZ_max_iter] = maxit;

/* permute the input vector to the local ordering, then solve */
AZ_reorder_vec(invec,data_org,update_index,rpntr);
AZ_solve(outvec,invec, options,params,
         index,idx,rpntr,cpntr,bpntr,mat_el,data_org,
         status,proc_config);

/* retrieve the iteration count and the convergence status */
iterations  = status[AZ_its];
convergence = (status[AZ_why]==AZ_normal);
Aztec is no longer under development but has been incorporated in a larger Sandia project, Trilinos [122],[1] that includes linear and nonlinear solvers, with time-stepping methods and eigensolvers planned.
Trilinos is based on an object-oriented design, with matrix and vector classes and an abstract solver interface specified as pure virtual classes. A linear algebra library called Epetra implements this interface, but the user can also write matrix and vector classes, thereby using the Trilinos algorithms on the user's own data structures.
Apart from the Epetra lower layer, Trilinos contains the algorithms of Aztec, plus (among others) Belos (a block Krylov package), IFPACK (which has Schwarz preconditioners with local ILU), and the ML algebraic multilevel package.
PETSc is a library for the solution of partial differential equations. It features tools for manipulating sparse matrix data structures, a sizable repertoire of iterative linear system solvers and preconditioners, and nonlinear solvers and time-stepping methods. Although it is written in C, it includes Fortran and F90 interfaces.
PETSc differs from other libraries in a few aspects. First, it is usable as a toolbox: many of its low-level routines can be used to implement new methods. In particular, PETSc provides tools for parallel computation (VecScatter objects) that offer an abstraction layer over straight MPI communication.
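As a sketch of this toolbox character, the following fragment (assuming a recent PETSc release) gathers arbitrary globally indexed entries of a distributed vector x into a small sequential vector without any explicit MPI calls; nlocal and idx_global are hypothetical application variables.

IS         from, to;
Vec        xlocal;
VecScatter scatter;

/* sequential target vector and the index sets describing the gather */
VecCreateSeq(PETSC_COMM_SELF, nlocal, &xlocal);
ISCreateGeneral(PETSC_COMM_SELF, nlocal, idx_global, PETSC_COPY_VALUES, &from);
ISCreateStride(PETSC_COMM_SELF, nlocal, 0, 1, &to);

/* all communication is hidden inside the scatter object */
VecScatterCreate(x, from, xlocal, to, &scatter);
VecScatterBegin(scatter, x, xlocal, INSERT_VALUES, SCATTER_FORWARD);
VecScatterEnd(scatter, x, xlocal, INSERT_VALUES, SCATTER_FORWARD);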
Second, PETSc's approach to parallelism is very flexible. Many routines operate on local matrices as easily as on distributed ones. Impressively, during the construction of a matrix any processor can specify the value of any matrix element. This approach, for instance, facilitates writing parallel FEM codes because, along processor boundaries, elements belonging to different processors will contribute to the value of the same matrix element.
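A minimal sketch of this situation, assuming a recent PETSc release: each process simply adds its contribution to a global entry, whether or not it owns it, and PETSc sums and communicates the contributions during assembly.

PetscInt    row, col;   /* global indices, possibly owned by another process */
PetscScalar v;          /* this process's element contribution to (row,col)  */

MatSetValues(A, 1, &row, 1, &col, &v, ADD_VALUES);

/* contributions destined for other processes are exchanged here */
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);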
A third difference between PETSc and other packages (often counted as a disadvantage) is that its data structures are internal and not explicitly documented. Unlike Aztec (Section 12.2.2), which accepts the user's matrix, PETSc has its own data structure, built up by passing matrix elements through function calls.
MatCreate(comm,...,&A);
for (i=... )
  for (j= ... )
    MatSetValue(A,...,i,j,value,...);
Thus, the user faces the choice of maintaining duplicate matrices (one in the native user format and one in PETSc format), with the resulting storage overhead, or of using PETSc throughout the code. However, because PETSc provides a large set of operations, many applications can be written using PETSc for all matrix operations; in this case there is no duplicate storage because the only storage is within the PETSc routines. In addition, PETSc provides a way, through what are called "shell" objects, to make direct use of application data structures. This provides a modular alternative to the "reverse communication" approach used by Arpack.
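A minimal sketch of a shell matrix, assuming a recent PETSc release; UserMult and user_ctx are hypothetical application-side names.

/* a "shell" matrix whose action is the application's own matrix-vector routine */
Mat A;
MatCreateShell(comm, nlocal, nlocal, N, N, user_ctx, &A);
MatShellSetOperation(A, MATOP_MULT, (void (*)(void))UserMult);
/* UserMult(Mat,Vec,Vec) applies the operator using the data in user_ctx;
   A can now be passed to any PETSc solver that needs only matrix-vector products */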
Once PETSc data objects have been built, they are used in an object-oriented manner, where the contents and the exact nature of the object are no longer visible:
MatMult(A,X,Y);
Likewise, parameters to the various objects are kept internal:
PCSetType(pc,PCJACOBI);
Of particular relevance in the current context is that after the initial creation of an object, its parallel status is largely irrelevant.
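The following sketch of a complete iterative solve, assuming a recent PETSc release and a matrix A and vectors b and x built as above, runs unchanged whether the objects are sequential or distributed.

KSP ksp;
PC  pc;

KSPCreate(comm, &ksp);
KSPSetOperators(ksp, A, A);   /* operator and preconditioning matrix */
KSPSetType(ksp, KSPGMRES);    /* choice of iterative method          */
KSPGetPC(ksp, &pc);
PCSetType(pc, PCJACOBI);      /* preconditioner, as above            */
KSPSolve(ksp, b, x);
KSPDestroy(&ksp);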
[1] As of this writing, a first public release of Trilinos is scheduled for the second half of 2003.