The purpose of this book is to help you understand the Beowulf approach to parallel computing. We describe here how to select the hardware components of computers and networks, how to configure and install the necessary system software, how to write parallel programs to take advantage of your new machine, and how to manage it for use by others.
This book concentrates on the concepts of Beowulf computing, since computing changes too fast for any detailed "Beowulf manual" to stay up to date for long. Many concepts are common to multiple generations of systems and provide the basis for understanding the changing details of assembling, configuring, using, and managing a cluster.
We don't take a purely abstract approach, however. We give detailed examples drawn from current systems, which will be immediately useful. This book can thus serve as a practical guide to the current state of Beowulf computing as well as a map to the central issues, an understanding of which will have long-lasting value.
Since the first edition appeared, Beowulf computing has expanded rapidly across the full range of cluster sizes. The continuing drop in prices of both computers and networks has meant that more and more users are acquiring small and medium-sized systems for departmental and even personal use. At the high end, clusters are now amply represented in the Top500 list of the most capable machines in the world. Clusters from cluster hardware vendors such as Dell and Linux Networx even appear in the top 25.
Another development contributing to the expansion of the Beowulf community has been the emergence of effective automated cluster setup software. We survey some current systems in Chapter 6.
The availability of both pre-packaged cluster hardware and cluster software greatly simplifies the effort required to get a cluster up and running. Of course it is also possible (and common) for clusters to be assembled "by hand." This book will help you build your Beowulf yourself if that is your choice, and it will help you understand both its hardware and software structure even if you let others attend to the hardware construction and systems software installation.
Many additions and updates to the first edition make this second edition timely and more complete.
A new introductory chapter explains what sorts of applications Beowulf clusters are good for and provides a "road map" for reading the book.
The chapter on PVFS has been entirely rewritten to cover parallel file systems for clusters, including the three systems currently most prominent in the Beowulf community: GPFS, Lustre, and PVFS.
A new chapter on managing clusters covers the issues faced by systems administrators.
A new chapter on tuning networks for clusters includes information on network security. As Linux has matured, the typical Linux distribution has been optimized for interacting with the Internet, which requires strict security policies. This new chapter discusses how to configure your cluster for performance while retaining a secure system.
A new chapter describes the Scyld environment, which provides the illusion of a single system image to both the user and the administrator.
A new chapter describes library and application software for numerical applications. Using a Beowulf no longer requires writing programs; there are already many available applications. Even if it is necessary to write software, existing powerful parallel libraries make it relatively easy to write many kinds of parallel applications. A new section in Chapter 8 shows how libraries written in MPI may be used to write programs that have no explicit use of MPI. Two sample programs that solve a linear and a nonlinear system of equations in parallel illustrate this approach.
A new chapter on parallel programming introduces the basic terms and ideas and presents some simple programming methods based on the manager/worker approach, using powerful scripting languages such as Perl and Python.
The MPI chapters now emphasize the new version of MPICH2, which supports all of MPI-1 and MPI-2, and recommend the use of mpiexec (as suggested by the MPI-2 standard) over mpirun.
As the software for Beowulfs matures, changes are inevitable. Each chapter has been updated to cover the current state of the software. Cluster hardware changes even faster than the software, and hence the hardware chapters have been rewritten, covering new processors and networks.
The high-level structure of the book breaks the huge topic of cluster computing into three parts.
Part I, Enabling Technologies, describes the components, both hardware and software, that go into a Beowulf.
Part II, Parallel Programming, shows how to write application programs for clusters, either by using functions built into Linux or by using any of a number of general-purpose and special-purpose libraries.
Part III, Managing Clusters, covers administration of clusters large and small, and includes a case study of a specific large cluster.