5.1 Cluster Network Designs

Just as many styles of node and network hardware exist, so do a wide variety of cluster network designs. To understand why such variety exists is to understand how the choice of network design directly affects the operation of the cluster as a whole. In general, the motivation for such variation comes from the striving to achieve a perfect balance of usability, performance, and security. As a result, the cluster designer has realized that network design impacts all of these in important ways.

5.1.1 Impact of Network Design

Although it would be impossible to fully enumerate how a cluster network design impacts the overall look, feel, and operation of a cluster, there are some key aspects that are directly and substantially affected.

One of the first issues affected by the cluster network is security. Questions include how secure the system is from outsider attacks, how we maintain security over time, and how the cluster fits within institutional security requirements. The cluster network design should directly address all of these issues since the primary security defenses are often implemented inside the institution's network itself.

The cluster's usability is defined by how users interact with the system and what types of applications will use the cluster. Application requirements impact every aspect of the cluster design, and the cluster network is no exception. If the cluster is designed to run a single application, the designers can make very focused decisions about how the user(s) can employ the machine. If the cluster is meant to be a general resource for students, researchers, etc., then intuitiveness and ease of use must be considered.

Finally, we cannot overlook the impact of the cluster network design on application performance. The cluster network may impose bottlenecks that could limit the performance of an application. The designer must be aware that some decisions, while bolstering the security and usability of the cluster, can seriously impact the performance of applications.

5.1.2 Example Designs

Over time, cluster network designs have evolved from simple networks of desktops and servers. Modern designs focus more on the specific realm of high performance computing and thus often mirror network designs that large site administrators have been employing for years. The cluster community has built upon this substantial groundwork to generate a wide variety of network topologies. As we examine some common network designs, we should remember that the examples are a small subset of the many possibilities. For each of the following design descriptions, we could imagine a dozen permutations, each having a different positive or negative impact on overall cluster issues.

The first, and probably the simplest, style of cluster is the fully connected system. In this case, all nodes in the system, as well as any front end servers or login machines, are simply connected to the Internet the same way as any non-cluster server or workstation. The major benefit of this design is obvious: very little work is required to initially bring the system online. While the simplicity of such a design is attractive, the users and administrators of these systems must constantly be aware of all the implications. Security, for instance, will be a major concern. Although each node is easily accessed by legitimate users and administrators from anywhere on the Internet, each system is equally accessible to malicious outside attackers.

A simple optimization would be to reduce the number of systems visible to the Internet. Such a system would have a publicly available front end login machine, with all other nodes hidden behind a firewall and only visible from that front end machine. A user would log into the front end and then have access to cluster nodes. Although such a design provides tighter system security, we still have a machine visible to the Internet. Internet visibility is inherently problematic, but certainly does not make the system impossible to tightly secure. One interesting disadvantage of this design is that users whose work requires compute resources to be Internet accessible are unable to use such a system.

Going one step further, measures could be taken to completely block all access to compute nodes, even from the user. The user would log into a cluster front end (login) machine and would perform local operations such as compilation or preliminary testing. When the user's program is ready to be run on the cluster, it is submitted as a job to the cluster scheduler. When sufficient compute nodes become available, the scheduler runs the job on the user's behalf. Notice, in this design, the cluster nodes are never directly accessed by the user. The nodes are therefore completely hidden to all entities except the scheduler and other cluster services. We further could extend this concept by disallowing users access to the front end machine. Instead, the cluster would only accept jobs from a meta scheduler.

One interesting design simulates a large multi-processor computer with a single system image on a Linux cluster. By running custom operating systems, nodes become nearly invisible to users or outside influences. On such a system, users would employ an OS level mechanism present on the login machine (which may or may not be externally visible) to run processes on the cluster compute nodes. The biggest advantage of this design is ease of use for the application user. The user interface to a single system image avoids the common problems of managing remote processes. Disadvantages may arise when a user needs direct access to the compute nodes, which is prohibited by the nature of the system.

Cluster designers have put tremendous efforts into creating network topologies suited to their individual needs. Many designers have made their experiences and technologies available for other cluster designers to use. For some real life examples of cluster designs, see Chapters 6, 18, and 20.

Armed with an awareness of various cluster network configurations, as well as some of the most importantly impacted issues, the cluster designer can embark on designing a network that optimally addresses individual needs. However, knowing the issues and possibilities at hand is only the first step. We must understand the simplest case of cluster network designs and some of the concepts surrounding their construction. In the sections that follow, we introduce customary communication protocols and give a short overview of Linux networking concepts and services, before delving into the construction of a simple cluster network.