6.2 Hardware Provisioning Challenges and Best Practices

Chapters 2 and 4 describe the critical hardware choices that one has to make when constructing a cluster. In this section, we describe a few "tricks of the trade" that, down the road, can make a huge difference in terms of neatness, maintainability, and, ultimately, reliability. The watchword is organization. Neat cables are not just there to look pretty; they can significantly improve your ability to debug some types of problems on the cluster. Labeling cables and nodes is always helpful, but having a regular layout is essential. For obvious reasons, powers of two (and/or multiples of 8) are natural quantities to deal with in the computing world, and clusters are no different. There are four key areas to focus on in hardware provisioning:

  • Layout of nodes—Rackmount vs. Workstation Towers vs. Blades

  • Cable management

  • Airflow management

  • Power management

Paying attention to these issues for that pile of boxes in the corner will make your cluster last longer and be more stable. Building a cluster is fun and rewarding, so take the small amount of time needed to plan out your physical layout.

Node Layout and Cable Management

Rackmount systems are perhaps the most convenient way to stack nodes into a small space. In Figure 6.2, one can see the front side and back side of a typical racking system. The height of a rackmount node is specified in standard rack units, or "U". One rack unit is 1.75 inches (4.45 cm), and standard-height (2 meter tall) racks have 42U of available space. Rackmount nodes are typically called servers, but there are plenty of rack-mountable hardware chassis that take standard motherboards. As nodes become more densely packed, such as in a rack full of 1U servers, CAP (Cable, Airflow, and Power) becomes of paramount importance. We will take some time to detail these issues for rack-based systems and then comment on how they carry over to shelves of desktops and newer blade servers.
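The rack-unit arithmetic above is easy to script when comparing chassis sizes. The following is a minimal sketch that uses only the 1.75 inch and 42U figures from the text; the function names and the `reserved_u` allowance for switches and PDUs are hypothetical and chosen purely for illustration.

```python
RACK_UNIT_INCHES = 1.75  # height of one rack unit (1U), as given in the text
RACK_HEIGHT_U = 42       # usable space in a standard-height rack


def nodes_per_rack(node_height_u, reserved_u=0):
    """Return how many nodes of the given height (in U) fit in one rack.

    reserved_u is a hypothetical allowance for switches, PDUs, and the like.
    """
    usable_u = RACK_HEIGHT_U - reserved_u
    if node_height_u <= 0 or usable_u < 0:
        raise ValueError("node height must be positive and reserved_u <= 42")
    return usable_u // node_height_u


def stack_height_inches(count, node_height_u):
    """Return the vertical space, in inches, taken by a stack of nodes."""
    return count * node_height_u * RACK_UNIT_INCHES


if __name__ == "__main__":
    for height_u in (1, 2, 4):
        print(f"{height_u}U nodes per rack: {nodes_per_rack(height_u)}")
    print(f"Full rack height: {stack_height_inches(42, 1)} inches")
```

Reserving a few units (for example `nodes_per_rack(1, reserved_u=2)`) gives a more realistic count once network switches share the rack.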

In cable management, groups of four (4) and eight (8) are the tickets to success. In Figure 6.1, one can see 8 power cables in one bundle and 4 Ethernet cables in another bundle, held together with wire ties available from the local home improvement store. To prebundle the cables, just lay them out on the floor and wrap a wire tie every 6–12 inches (15–30 cm). Clip off the excess from each wire tie, and in just a few minutes you have created nice, tidy packages. Do this with all of your cables. You will soon discover that a 128-node cluster can be wired with 16 power bundles and 16 Ethernet bundles. A bit of pre-planning of cable lengths is needed, especially in the case of workstation towers. In this case you might prebundle a set of cables that contains two each of 5, 6, 7, and 8 foot long Ethernet cables. At one end the cables are even so that they plug easily into the Ethernet switch; at the other end, each cable is the correct length to reach a specific machine in a line of towers sitting next to each other on a shelf. If towers are 6 inches wide, then the 8th tower is about four feet (about 1 meter) further away than the first one. If, on the other hand, you have rackmounted 2U servers, then the top server in a "bank" of 8 is only about 15 inches (40 cm) away from the first one. In this case, bundling 8 Ethernet cables of the same length often works well.
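The bundle counts and staggered lengths described above can be checked with a few lines of arithmetic. This is only a sketch under the assumptions stated in the text (8 cables per bundle, towers 6 inches wide, two cables each of 5 through 8 feet); the function names are hypothetical.

```python
import math


def bundles_needed(nodes, cables_per_bundle=8):
    """Return the number of prebundled cable groups needed for a cluster."""
    return math.ceil(nodes / cables_per_bundle)


def offset_of_last_node_inches(nodes_in_bank, spacing_inches):
    """Return how much farther the last node in a bank is than the first."""
    return (nodes_in_bank - 1) * spacing_inches


def staggered_lengths_feet(shortest_ft, longest_ft, copies_each):
    """Return the cable lengths for one bundle, shortest first."""
    lengths = []
    for length in range(shortest_ft, longest_ft + 1):
        lengths.extend([length] * copies_each)
    return lengths


if __name__ == "__main__":
    print("Bundles for 128 nodes:", bundles_needed(128))
    print("Extra reach to 8th tower (inches):",
          offset_of_last_node_inches(8, 6))
    print("One tower bundle (feet):", staggered_lengths_feet(5, 8, 2))
```

The same `bundles_needed` count applies to power and Ethernet alike when both are grouped in eights, which is where the 16 and 16 figures for a 128-node cluster come from.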

Figure 6.1: Cable bundles. Wire ties make 8 power cables into a neat and manageable group.

The power cables are also grouped and bundled. It turns out that power cables are actually quite a headache. They are big, bulky, and heavy, and they are rarely close to the correct length. What you decide to do with the power cables can have a significant impact on cooling. Figure 6.2 illustrates how cables are pulled to the sides of the rack, allowing for unrestricted airflow. This is really one of the compelling reasons to bundle cables: neatness improves the ability of the chassis to cool themselves by getting the cables out of the airflow. Heat kills clusters, and blocking the airflow is a common mistake. High-end nodes often dissipate 150–200 watts, so a rack of 32 such servers is roughly equivalent to four hair dryers running on high. As processor speeds improve, power consumption tends to go up before it comes back down as the semiconductor process is improved.
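The hair dryer comparison is simple multiplication, and it is worth redoing with your own node count and wattage. The sketch below uses the 32-server and 150–200 watt figures from the text; the 1500 watt rating for a hair dryer on high is our assumption, not a number from this chapter.

```python
# Rough heat-load estimate for one rack, using the figures quoted in the text.

HAIR_DRYER_WATTS = 1500  # assumption: a hair dryer on its high setting


def rack_heat_watts(servers, watts_per_server):
    """Return the total power dissipated as heat by identical servers."""
    return servers * watts_per_server


low_watts = rack_heat_watts(32, 150)
high_watts = rack_heat_watts(32, 200)

print(f"Rack load: {low_watts}-{high_watts} W")
print(f"Hair dryers: {low_watts / HAIR_DRYER_WATTS:.1f}"
      f" to {high_watts / HAIR_DRYER_WATTS:.1f}")
```

Under that assumption the rack lands between roughly three and four hair dryers, all of it heat that the airflow through the rack has to carry away.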

Figure 6.2: The back of a rack, showing the clean organization of the cables. Note that the fans are unobstructed.

The power consumption of the nodes means planning the number of circuits and the number of power distribution units. In reality, power is just another network. Take the power consumption seriously: there are many cases of overloaded power strips melting or, worse, catching fire. There are many rules of thumb for how many machines can go on a single circuit. The critical observation is to not get too close to the maximum current of your circuit and to use thick enough power cabling. Standard power distribution units (PDUs) are significantly better than the $2.00 power strip from the clearance table at the local hardware store. PDUs run about $10.00/outlet and have quality cabling that won't overheat even as the current load increases. Remember, a Beowulf cluster is a personal supercomputer, and it has the electricity appetite to match. Network-controlled PDUs generally run about $50/outlet, and these enable you to cycle power remotely. This is a very nice convenience for large installations.
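One way to turn "do not get too close to the maximum current" into numbers is sketched below. The 80% headroom factor, the 20 A / 120 V circuit, and the function names are all assumptions made for illustration; check your local electrical code and your actual measured node draw before relying on any such estimate. The per-outlet prices are the approximate figures quoted above.

```python
import math


def nodes_per_circuit(circuit_amps, volts, watts_per_node, headroom=0.8):
    """Return how many nodes fit on one circuit with a safety margin.

    headroom is the fraction of the circuit rating we allow ourselves to
    use; 0.8 is an assumed rule of thumb, not a figure from the text.
    """
    usable_watts = circuit_amps * volts * headroom
    count = int(usable_watts // watts_per_node)
    if count < 1:
        raise ValueError("a single node exceeds the usable circuit capacity")
    return count


def circuits_needed(nodes, per_circuit):
    """Return the number of circuits required for the whole cluster."""
    return math.ceil(nodes / per_circuit)


def pdu_cost(outlets, dollars_per_outlet):
    """Return the approximate PDU cost for the given number of outlets."""
    return outlets * dollars_per_outlet


if __name__ == "__main__":
    per_circuit = nodes_per_circuit(20, 120, 200)  # assumed 20 A, 120 V
    print("Nodes per circuit:", per_circuit)
    print("Circuits for 32 nodes:", circuits_needed(32, per_circuit))
    print("Standard PDUs: $", pdu_cost(32, 10))
    print("Network-controlled PDUs: $", pdu_cost(32, 50))
```

With these assumed values, a rack of 32 nodes at 200 watts each needs four circuits, which illustrates why power deserves the same planning as the data network.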
