Cluster Service

The Cluster Service is a special piece of software that runs on Windows Server 2003 and manages the relationships between the servers in the cluster, which are referred to as nodes. You don't manually install Cluster Service like you did in prior versions of Windows; you simply use the Cluster Administrator console to create a new cluster. We'll walk through that procedure later in this chapter.

Windows Server 2003 clusters are a bit picky about who they'll allow in a cluster. If you've built the first cluster node on Windows Server 2003, Enterprise Edition, all other nodes that are running Windows Server 2003 must also be running Enterprise Edition. The same restriction applies to the Standard and Datacenter editions; the edition of the first cluster node sets the edition that all other Windows Server 2003 nodes must follow. However, you can have clusters that have a mix of nodes running Windows Server 2003 and Windows 2000 Server.

You can build the following three types of clusters:

Single node? In this type of cluster, only one node exists. Essentially, all clusters start out this way: You create the first node and then add more nodes. However, clusters with only one node don't have any failover capabilities, which defeats the purpose of clustering. The only practical use for a single-node cluster is to experiment with the Cluster Service in a lab or classroom.
Single quorum device? The quorum, as we explain in the next section, is a special resource the cluster uses to store the cluster configuration. The quorum is accessible to all nodes in this type of cluster, generally through an external drive array physically attached to each cluster member.
Majority node set? A new type of cluster for Windows Server 2003, this stores the quorum across the nodes, and the nodes might not be physically attached to one another. This type of clustering enables you to create geographically dispersed clusters, but setting them up is very complex because Windows is incredibly picky about little implementation details. Microsoft recommends that you use this type of cluster only if it is provided preconfigured from a server manufacturer, and we tend to concur.

Tip

Most major server computer manufacturers offer preconfigured cluster packages you can buy. These are far and away the best way to get a cluster into your environment because manually setting up your server hardware to support a cluster can be a difficult, time-consuming task. If you decide to build a cluster from scratch, make absolutely certain that your hardware is on Microsoft's special Cluster Hardware Compatibility List (ask the hardware manufacturer about this), and make sure you have the hardware manufacturer's specific directions for setting up a cluster with its equipment.

The next section discusses some of the basic concepts you'll need to start building your own clusters.

Cluster Service Concepts

Imagine you have a single server you want to make into a SQL Server computer. You can simply install Windows and then install SQL Server. Suppose that, for reasons of your own, you install Windows on one hard drive and install SQL Server onto a completely separate RAID array. Now imagine that you make that RAID array an external array, connected to the computer by a small computer system interface (SCSI) cable or by fibre channel. This shouldn't be that hard to imagine; most database servers are set up in this fashion.

Conceptually, all you need to do now to build a cluster is attach a second server to the external drive array. You'll install Windows and SQL Server on that server, too. However, SQL Server's services will remain stopped, and the two servers will communicate via a special service, sending a heartbeat signal to one another. If the first server stops sending the heartbeat, the second server knows that something has gone wrong. It immediately seizes control of the external drive array, starts the SQL Server services, and takes over where its failed companion left off.

Clustering Details

Even though this example omits a few crucial details, it's essentially how an active-passive cluster operates. Active-active clusters are more difficult to conceptualize. Imagine that you have two servers, each running SQL Server and each connected to two external drive arrays. One server owns a single external drive array and stores the SQL Server databases there. If one server fails, the other one detects it and seizes control of its external drive array (now owning two drive arrays total). The second server starts a second copy of SQL Server and takes over where the failed server left off. Now, the remaining server is effectively two SQL Server computers, all running in one box.

For details on how the Cluster Service operates and handles failovers, visit www.samspublishing.com and enter this book's ISBN number (no hyphens or parenthesis) in the Search field; then click the book cover image to access the book details page. Click the Web Resources link in the More Information section, and locate article ID# A011201.

Each cluster, then, is comprised of a variety of resources, which can include the following:

One or more IP addresses? These are addresses users and client computers use to communicate with the cluster. Each cluster node has its own private IP addresses, as well, which nobody really uses.
One or more computer names? As with IP addresses, these belong to the cluster and not to any one cluster node. Users and client computers communicate with these names, and not with the nodes' private computer names.
A quorum resource? This resource contains all the information about the cluster's resources, which node owns each resource, and so forth.
Logical drives? These represent shared drive arrays, which can be external SCSI arrays or fibre channel arrays.
Application services? This might be the DHCP service, SQL Server, or any other clustered application.

Each resource can be owned by only one node at a time. Each resource does, however, have a list of all nodes that could possibly own the resource. When the node that owns a resource fails, the resource's other possible owners detect the failure and one of them takes ownership of the resource. Administrators can also manually transfer resource ownership from one node to another. This enables you to transfer workload off of one node, letting you shut down the node for maintenance without interrupting your users' work.

Resources can have dependencies. For example, most clustered applications require TCP/IP (other network protocols aren't supported under clustering), so a node can't own an application resource unless the node already owns the shared TCP/IP address the application requires. To keep dependencies easy to manage, Windows lets you organize resources into groups. Resource groups can be transferred from node to node as a single unit, so you don't have to worry about forgetting a dependency when transferring ownership of an application to another node.

Creating a New Cluster

To create a new cluster, first make sure your hardware is ready. Then, launch Cluster Administrator on any Windows Server 2003 computer and follow these steps:

Caution

These steps assume that your hardware and device drivers are already properly configured for, and compatible with, clustering. Don't assume that the hardware drivers normally used in Windows Server 2003 will support clustering because that's often not the case. Contact your hardware manufacturer's support division for detailed instructions on how to build a cluster and for the correct device drivers.

If your cluster nodes have a shared storage device, such as an external SCSI drive array, power down all nodes except the first one.
In Cluster Administrator, select File, New, Cluster.
Enter the domain in which the first cluster node exists and its name. This step is necessary to identify the first node because you don't have to perform this process on the first node?you can perform it on any computer capable of running Cluster Administrator.
Enter a name for the cluster; this name must not conflict with any other NetBIOS names on your network.

Provide an IP address to be used by the cluster. The cluster can't use DHCP to obtain an address, so be sure the address you provide isn't in use on your network and won't be issued by a DHCP server to another computer.

Note

You must specify an IP address that's in the same range as the first node's private IP address. Otherwise, the wizard won't be capable of determining the correct subnet mask and you'll see an error message prompting you to enter a different IP address.

Enter the credentials the Cluster Service should use, as shown in Figure 12.1. This account will be granted local Administrator privileges on the node.

Figure 12.1. The account you provide cannot use a blank password.

Tip

For smooth operation of your new cluster, use a domain user account for the Cluster Service. All nodes in the cluster should specify the same account.
Wait for the wizard to install and configure the Cluster Service.

After your first cluster node is fully functional, you can add more nodes to the cluster. Simply right-click the cluster in Cluster Administrator. Then, from the pop-up menu, select New, Node.

Cluster Administrator

You'll use Cluster Administrator to manage your clusters. With it, you can add and remove cluster resources and resource groups, add and remove nodes, and transfer resource ownership from one node to another.

To add a new resource group, just right-click a cluster and select New from the pop-up menu; then select Group. Enter the name and details for the new group, and you're ready to go. To add a new resource to an existing group, right-click the group and select New; then select Resource from the pop-up menu. You'll see a dialog box similar to the one in Figure 12.2, which allows you to specify the type of resource you want to add.

Figure 12.2. You can add any type of new resource from this dialog box.

graphics/12fig02.jpg

In the New Resource dialog box, you'll specify the following:

A name for the new resource.
A brief description of the resource. This is especially useful when you have multiple resources of the same type because it enables other administrators to more easily figure out which is which.
The resource type, which can include clusterable Windows services, file shares, and so forth. We'll discuss those in the next section.
A group to which the new resource will belong.

After clicking Next, you can specify the nodes that are allowed to own the resources. For example, the DHCP Service resource can be owned only by nodes that have the DHCP Service already installed, so you use that condition to limit the potential owners for a DHCP Service resource. After specifying the potential owners, click Next.

Last, you'll specify any resources on which your new resource depends. As shown in Figure 12.3, you can choose from all existing resources as possible dependencies, and you typically must specify at least one resource of the Storage type. That's because most resources, especially applications and services, have to store their data somewhere. In a cluster, that data must be on a cluster-owned (shared) storage location, making the application or service dependent on the availability of that storage location.

Figure 12.3. Specify only the resources necessary for your new resource to run properly.

graphics/12fig03.jpg

Tip

Generally, your resource should be in the same resource group as any resources on which it depends. That way, the entire group can be transferred from node to node as a single unit.

Transferring resources from one node to another is easy. If you want to perform a transfer that tests your resources' capability to fail over from one node to another, simply right-click a resource group and select Initiate Failure from the pop-up menu. The resource will immediately fail and transfer to another possible owner.

Tip

Be sure that all resources have at least two nodes as possible owners. Otherwise, a failed resource won't be capable of transferring to another node.

Note that resources have their own properties that affect their behaviors. To access these properties, right-click a resource and select Properties from the pop-up menu. The Properties dialog box enables you to change the resource's name, description, and list of possible owners; you can also change the resource's dependencies. As shown in Figure 12.4, you can also configure advanced properties. For example, you can configure the Cluster Service to automatically try to restart a service any time it stops, and you can configure how often the Cluster Service checks a service to see whether it's responding.

Figure 12.4. The default values for the advanced properties are sufficient in most environments.

graphics/12fig04.jpg

Resource groups have properties, too, which you can access by right-clicking the group and selecting Properties from the pop-up menu. Group properties include failover and failback policies, which are defined as follows:

Failover policy? Determines how many times a group is allowed to fail from node to node within a given time period. The default is 10 times in 6 hours. If the failovers exceed this threshold, the Cluster Service assumes the group is not working correctly and takes it offline. You must manually restore the group to service after correcting its problems.
Failback policy? Disabled by default, this allows the Cluster Service to move a group back to its original cluster node. You can either allow an immediate failback, in which case the group returns to its original node (referred to as the preferred node) as soon as that node becomes available or specify that failback occur only during certain hours. Specifying hours for failback allows the group to remain where it is until a relatively idle period when users won't be affected by the failback.

Cluster Administrator includes a comprehensive help file that can provide step-by-step instructions for other cluster operations.