13.8 Software upgrades

Whoever coined the saying that only death and taxes are certain was definitely not a system or cluster administrator. As certain as death and taxes are software upgrades. How many software packages that continue to be useful don't change? Even when a package is stable, the environment around it constantly changes, making new versions necessary to fix new issues derived from this evolving environment.

The impact of a software upgrade can vary tremendously in scope. At the low end are upgrades that do not affect other software packages on the system, such as a new version of a particular numerical library. At the high-impact end are distribution upgrades that change the version of libc, the standard C library, which can have a ripple effect through many of the software packages on a system. Between these extremes lies a large set of upgrades that affect a varying number of users and applications.
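One way to estimate where an upgrade falls in this range is to walk the dependency information in reverse, collecting every package that depends on the one being upgraded, directly or indirectly. The following Python sketch illustrates the idea. The package names and the DEPENDS_ON map are hypothetical and are written by hand here; on a real system the map would have to be built from the package manager's own dependency data.

```python
from collections import deque

# Hypothetical dependency map: package -> packages it depends on directly.
DEPENDS_ON = {
    "libc": [],
    "numlib": ["libc"],
    "mpi": ["libc"],
    "solver": ["mpi"],
    "shell": ["libc"],
}


def affected_by_upgrade(package, depends_on):
    """Return every package that depends, directly or indirectly, on `package`."""
    # Invert the map so we can look up "who depends on X".
    dependents = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk over the reverse dependencies.
    affected = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for pkg in dependents.get(current, ()):
            if pkg not in affected:
                affected.add(pkg)
                queue.append(pkg)
    return affected


if __name__ == "__main__":
    for name in ("numlib", "mpi", "libc"):
        print(name, "->", sorted(affected_by_upgrade(name, DEPENDS_ON)))
```

In this made-up map nothing depends on numlib, so upgrading it touches no other package, while upgrading libc reaches every other package listed. An upgrade of mpi sits in between, affecting only solver.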

Upgrading software on an individual cluster machine is similar to upgrading software on non-cluster machines. In many respects, clusters should be managed like non-cluster machines. If you are dealing with a production cluster that serves a large user community, then all the standard practices should be followed, such as pre-change testing and a carefully planned and communicated migration path.

One of the most critical reasons to upgrade cluster software is to address security vulnerabilities. If your cluster is reachable by the world at large or by potential hackers, you should watch closely for security advisories covering your kernel, system services, and any other software component that could be used to compromise a system. Some of the most useful sources of vulnerability and fix information are:

  1. the vendor supplying the kernel and distribution you use,

  2. the U.S. Department of Energy Computer Incident Advisory Capability, also known as CIAC, which can be found at http://www.ciac.org/ciac/, and

  3. the CERT Coordination Center, a federally funded Internet security research and development center, which can be found at http://www.cert.org/.
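Once an advisory names the version in which a problem is fixed, comparing that version with what is installed is mechanical. The Python sketch below shows one minimal way to do it. The package names, version numbers, and the INSTALLED and FIXED_IN tables are all hypothetical, and the comparison assumes purely numeric dotted versions; real distributions use richer version schemes that call for the package manager's own comparison rules.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.4.18' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed, fixed_in):
    """Return the names of installed packages older than the advisory's fixed version."""
    outdated = []
    for name, fixed in sorted(fixed_in.items()):
        current = installed.get(name)
        # Packages that are not installed cannot be vulnerable on this node.
        if current is not None and parse_version(current) < parse_version(fixed):
            outdated.append(name)
    return outdated


# Hypothetical data: what one node reports, and what the advisories say.
INSTALLED = {"kernel": "2.4.18", "openssh": "3.4", "bind": "9.2.1"}
FIXED_IN = {"kernel": "2.4.20", "openssh": "3.4", "ftpd": "1.0"}

if __name__ == "__main__":
    print(needs_upgrade(INSTALLED, FIXED_IN))
```

Running the same check against the package list of every node gives a quick picture of which machines still need the fix.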

Part III: Managing Clusters