This book may also be used as an introduction to the various areas of Beowulf computing. Each part and, to some extent, each chapter may be read independently of the others. This section makes recommendations based on how you intend to use your cluster, providing a different perspective on the book than that presented in the preceding section. Additional information on all of these topics may be found in the reading list in Appendix B and the URLs in Appendix C.
If you are using a cluster that someone else is operating, you need only learn how to program and run applications.
Part II covers programming clusters. Even if you do not intend to develop your own parallel applications, we recommend reading Chapter 7, which provides an overview of the technologies. For a deeper understanding of the parallel programming technologies, read the chapters on MPI (Chapters 8 and 9) and PVM (Chapters 10 and 11). Even if you plan to write your own parallel software, you should read Chapter 12 on parallel software and libraries. You may find that what you need has already been written!
Once you have your application, you will need to run it. Part III covers tools for managing and using a cluster. Many clusters use some kind of workload management system to mediate use of the cluster among the user community. Chapter 14 provides an overview of the concepts and capabilities of these systems. You should also read the chapter that corresponds to the workload system used on your cluster: Condor (Chapter 15), Maui (Chapter 16), PBS (Chapter 17), or Scyld (Chapter 18). If your application requires a high-performance, parallel I/O system, read Chapter 19 on the Parallel Virtual File System. These chapters cover information of interest to both the system administrator and the cluster user, so skip over material that doesn't apply to you.
First, re-read this chapter and pay close attention to the discussion of application requirements. These requirements will guide you in your choice of cluster components. Chapters 2 and 4 describe the choices of processor, network, and other hardware. Even if you plan to buy a preassembled cluster, these chapters will help you understand the various choices of components and aid you in understanding the specifications of a cluster. Chapter 2 also covers some of the issues of assembling your own cluster.
Operating a cluster requires an understanding of the operating system. Chapter 3 provides a brief introduction along with a discussion of cluster-specific issues. Chapter 6 describes tools for setting up a cluster. An introduction to managing a cluster from the point of view of the system administrator is presented in Chapter 13. Chapter 14 provides an overview of the concepts and capabilities of workload management systems. The chapters on the individual systems cover both their use and their management: Condor (Chapter 15), Maui (Chapter 16), PBS (Chapter 17), and Scyld (Chapter 18). Once the cluster is up and running, you may need to tune the network and operating system. Chapter 3 provides some information on tuning the OS; Chapter 5 discusses techniques for tuning the network and communication systems. Finally, Chapter 20 provides a case study of two generations of a major cluster system, illustrating particular choices and best practices.