6.6 The OSCAR Toolkit

The Open Source Cluster Application Resource (OSCAR) uses imaging as its primary method of installing the operating system on the compute nodes of a cluster. Because it is image-based, OSCAR supports a wider array of Linux distributions (Red Hat 7.2, 7.3 and Mandrake 8.0 as of this writing) with the same cluster tool stack, but is more limited in its hardware support. Narrower hardware support alongside broader distribution support seems paradoxical. One has to examine how image-based installers actually work to see why this is the case.

6.6.1 How Image-based Installers Work

The most primitive imaging program is the venerable Unix dd command. With dd, one can save, bit-for-bit, a disk partition or an entire disk and store it as a file. The problem is that restoring such an image in a naive way requires the new hardware to be identical in every way. For disks, this level of identity extends down to the geometry and cylinder count. Modern image-based installers start from this basic capability and add some critical features that significantly increase their utility across hardware.

The first key insight into how imaging works is to treat a disk (or partition) image as a file system. Let's digress with an example. Suppose you have a Linux system with a root partition on '/dev/hda1' and a separate partition (e.g. '/scratch') with enough free space to hold a complete image of the root. Then try the following sequence (as root):

  • # dd if=/dev/hda1 of=/scratch/root.image

  • # mkdir /mnt/root

  • # mount /scratch/root.image /mnt/root -o loop

  • # ls -l /mnt/root

As you make changes under '/mnt/root', the contents of '/scratch/root.image' are updated. When you unmount the file system, those changes are saved in the original image file. So it is straightforward to take an image of a system, save it, and update the image using standard tools and tricks. Because the entire root file system is available in an image, there are no limits on what can be done to it. Files (like 'fstab', 'hosts', IP configuration, and more) can be added, edited, or deleted. In fact, because it is the raw file system, it theoretically doesn't matter whether the distribution is Red Hat, Mandrake, Debian, or any of the hundreds of Linux distributions that are out there. In practice, the installer must know something about the file layout to be efficient, and therefore only a small subset of distributions is actually supported by any image-based installer. The one key feature that many admins like about image-based techniques is that they can handcraft a configuration and then take a snapshot. Image-based installers help with the replication of this snapshot.
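Because the mounted image is just a directory tree, ordinary tools can edit it. The following is a minimal sketch for illustration only: the function name add_host_entry, the address, and the node name are hypothetical and are not part of OSCAR or SIS. It appends an entry to the image's 'hosts' file through the mount point and does nothing if the name is already listed.

```shell
# add_host_entry IMAGE_ROOT IP NAME   (hypothetical helper, not an OSCAR tool)
# IMAGE_ROOT is wherever the image is loop-mounted, e.g. /mnt/root.
add_host_entry() {
    image_root=$1 ip=$2 name=$3
    hosts="$image_root/etc/hosts"
    mkdir -p "$image_root/etc"
    touch "$hosts"
    # Look for NAME as a whole field on any non-comment line.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts"; then
        return 0    # already present, leave the file untouched
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
}
```

With the image mounted as in the example above, running add_host_entry /mnt/root 10.0.0.1 node001 would change the file stored inside '/scratch/root.image'.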

The second critical piece of image-based management is the customized installer. The installer must download an image from a server, customize some portions of it for the target node, and then install the updated image on the particular hardware of the node, taking into account small differences in hardware. An example of necessary customization is the network configuration file, which must be updated with the new node's IP address. If this isn't done properly, the nodes would be identical in every way, even down to their IP address, which obviously leads to an unusable cluster. An installer like SystemImager, used in OSCAR, can make several changes based upon differences in node hardware. It supports the most common adjustments without intervention by the administrator: changes in the Ethernet driver, changes in disk drive geometry (but not in disk type), and memory size differences. Because the installer itself is designed to handle a variety of distributions, the onus of basic hardware detection (e.g. disk geometry, network driver) is on the installer and not on the distribution. Resource constraints in supporting the imaging software lead to the reality that only a subset of hardware can be supported. In OSCAR, for example, IDE and SCSI devices are supported by the installer, but IDE and SCSI hardware RAID (e.g. HP Proliant's Integrated Drive Array, '/dev/ida/') is not understood by the installer and hence not supported. A further constraint is that the installer itself is a specialized program that runs a customized Linux kernel. The kernel may not have the complete set of device drivers needed to run your hardware, even if the distribution natively supports your hardware. OSCAR allows users to build customized installation kernels to handle the case where an administrator can manually identify the needed driver.

Even though the above dd-based example is straightforward, installing and customizing images is actually quite complex: to make configuration changes, the installer must understand the file system, its layout, and the location of config files in order to do localization. Small differences, like choosing inetd over xinetd, must be dealt with to manage across distributions.
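To make the IP address customization concrete, here is a hedged sketch of rewriting one node-specific setting inside an unpacked image. It assumes a Red Hat-style interface file at 'etc/sysconfig/network-scripts/ifcfg-eth0' containing an IPADDR= line. The function name set_node_ip is hypothetical, and the actual mechanism used by SystemImager is not described here and may well differ.

```shell
# set_node_ip IMAGE_ROOT IPADDR [IFACE]   (hypothetical helper)
# Assumes a Red Hat-style ifcfg file inside the image tree.
set_node_ip() {
    image_root=$1 ipaddr=$2 iface=${3:-eth0}
    cfg="$image_root/etc/sysconfig/network-scripts/ifcfg-$iface"
    [ -f "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
    new_cfg="$cfg.new.$$"
    # Drop any address inherited from the golden image, then append
    # the address that belongs to this particular node.
    sed '/^IPADDR=/d' "$cfg" > "$new_cfg"
    printf 'IPADDR=%s\n' "$ipaddr" >> "$new_cfg"
    mv "$new_cfg" "$cfg"
}
```

A different distribution may keep this setting in a different file or format, which is exactly why the installer has to know the layout of each distribution it supports.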

6.6.2 Bootstrapping and Configuration

OSCAR assumes a working head node, which generally is installed "by hand" using the tools of the base distribution (Mandrake or Red Hat). The OSCAR toolset is installed afterwards and requires additional configuration steps. The core of OSCAR is a set of tools, all driven by the OSCAR install wizard, that define the set of packages and resources needed to create a disk image. Resources include the drive partitioning layout, which MPI libraries to install, and other OSCAR-specific tools. Once the set of base software (stored as RPMs) is selected, a client image is created. If further customization is needed, the image can be "edited" using SIS (System Installation Suite) tools. If one wants to create other types of nodes (e.g. an NFS server instead of a compute node), or if nodes of the same type have different disk subsystems (IDE and SCSI), the entire process is started again with a different image name. The case of homogeneous hardware (and node function type) is handled easily by this setup. If your cluster has heterogeneous node types and/or different appliance types, then description-based methods generally provide a simpler solution.
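The one-image-per-combination rule can be pictured with a small lookup. This is only an illustrative sketch: the function image_for, the role names, and the image naming scheme are assumptions made for the example and do not come from OSCAR.

```shell
# image_for ROLE DISK   (hypothetical helper and naming scheme)
# Prints the name of the image a node of this role and disk type would need.
image_for() {
    role=$1 disk=$2
    case "$role" in
        compute|nfs) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    case "$disk" in
        ide|scsi) ;;
        # e.g. hardware RAID is not understood by the installer
        *) echo "unsupported disk subsystem: $disk" >&2; return 1 ;;
    esac
    # Every distinct (role, disk) pair maps to its own separately built image.
    printf '%s-%s\n' "$role" "$disk"
}
```

Two roles on two disk subsystems already mean four images to build and maintain, which is where the image-based approach starts to become laborious.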

Once the OSCAR image is built, the wizard guides you through integrating new nodes. OSCAR uses tcpdump to detect DHCP requests; when a new node is seen, a new name is automatically assigned. The SIS installer kernel then takes over: it downloads the correct image from the server and customizes the node by looking up node-specific information in the SIS database. In summary, we annotate the installation steps with the steps that OSCAR takes:

  1. Install Head Node: hand installation, usually using the distribution's installer

  2. Configure Cluster Services on Head Node: follow the installer setup script

  3. Define Configuration of a Compute Node: use the OSCAR wizard to define a client image

  4. For each compute node, repeat:

    1. Detect Ethernet Hardware Address of New Node: use the OSCAR wizard

    2. Install Complete OS onto New Node: SIS disk image downloaded and installed

    3. Complete Configuration of New Node: most customization is already done in the image

  5. Restart Services on Head Node that are cluster-aware (e.g. PBS, Sun Grid Engine): part of the OSCAR install wizard
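Step 4.1 depends on spotting DHCP requests from nodes that have not been seen before. The sketch below shows the general idea under stated assumptions: it reads lines in the style printed by tcpdump with the -e option (which includes the link-level header), takes the first MAC-shaped field on each line as the source address, and prints a name for every address it has not seen yet. The function assign_names and the nodeNNN naming scheme are hypothetical; how OSCAR itself parses the capture and chooses names is not shown here.

```shell
# assign_names [PREFIX]   (hypothetical helper, not the OSCAR implementation)
# Reads tcpdump -e style lines on stdin, prints "NAME MAC" per new source MAC.
assign_names() {
    prefix=${1:-node}
    awk -v prefix="$prefix" '
        BEGIN {
            h = "[0-9A-Fa-f][0-9A-Fa-f]"
            mac = "^" h ":" h ":" h ":" h ":" h ":" h "$"
        }
        {
            # The first field shaped like a MAC is taken as the source address.
            for (i = 1; i <= NF; i++) {
                if ($i ~ mac) {
                    m = tolower($i)
                    if (!(m in seen)) {
                        seen[m] = 1
                        printf "%s%03d %s\n", prefix, ++n, m
                    }
                    break
                }
            }
        }'
}

# Possible live use on the head node (as root; eth0 is an assumed interface):
#   tcpdump -e -n -l -i eth0 'udp and (port 67 or port 68)' | assign_names
```

Repeated requests from the same node are ignored, so a node that retries DHCP several times while booting still receives a single name.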

The key features of OSCAR are that it uses disk images and supports multiple distributions, uses a configuration wizard to create a client image without first installing a golden client, and supports cluster nodes with no previously installed OS. The images have some hardware independence, but differences in disk subsystem type require different images.

Part III: Managing Clusters