There is a limit to how fast you can make your application run. The application has work to do, and that work takes a minimum number of processor instructions to execute. The faster the machine that runs the application, the faster those instructions execute, but there is a limit to processor speeds. As the number of concurrent requests received by your server increases, the application exceeds its target response times once the workload becomes too much for the machine. At this point, you have two options: use a faster machine, or use several machines. The first option is not always available or cost-effective, and ultimately you may use the fastest machine available and still need more power. Any efficient J2EE design should include the possibility that the application will need to be deployed across multiple servers. This is known as horizontal scalability.
Two technologies that enhance the ability to achieve horizontal scalability are clustering and load balancing. Clustering can mean different things in different contexts. In this context, clustering means having a group of machines running the application. Clustering is the mechanism that spreads the application-processing capability across multiple machines; load balancing is the mechanism that ensures that different machines use their various processor capacities efficiently. Currently, scalable web-application architectures consist of many small servers accessed through a load balancer.
Generally, the load-balancing mechanism should route requests to the least-busy resource. However, such routing can be difficult to achieve with some load-balancing mechanisms and may be inappropriate for some applications, especially session-based applications. You should determine the load-balancing mechanism that is most appropriate to your application. Load balancing can be achieved in a number of ways, as described in the following sections.
The mechanism for obtaining the route to a machine from the machine's name is the Domain Name System (DNS), discussed in Section 14.4.4. DNS can supply different IP addresses to separate lookups of the same hostname, providing a simple application-independent load-balancing mechanism. For example, www.oreilly.com could map to 10.12.14.16 and 10.12.14.17 on each alternate lookup. Many Internet sites use DNS load balancing; it is a common and simple load-balancing technique.
DNS load balancing is achieved by using a round-robin mechanism to send each subsequent DNS lookup request to the next entry for that server name. DNS round-robin has no server load measuring mechanisms, so requests can go to overloaded servers, creating ironically unbalanced load balancing.
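You can observe what DNS returns from a Java client: java.net.InetAddress.getAllByName() lists all the addresses mapped to a name. This is a minimal sketch; the "localhost" name in main() is used only so the example runs anywhere, and you would substitute a round-robin-balanced host name to see multiple rotated addresses:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

public class DnsLookup {
    // Return all IP addresses DNS maps to the given host name.
    // A round-robin DNS setup returns several addresses, possibly
    // rotated between successive lookups.
    public static String[] lookup(String host) {
        try {
            return Arrays.stream(InetAddress.getAllByName(host))
                         .map(InetAddress::getHostAddress)
                         .toArray(String[]::new);
        } catch (UnknownHostException e) {
            return new String[0];  // unresolvable host: no addresses
        }
    }

    public static void main(String[] args) {
        // Illustrative host name; use a round-robin-balanced name here
        // to see the multiple addresses discussed above.
        for (String address : lookup("localhost")) {
            System.out.println(address);
        }
    }
}
```

Note that the JVM itself caches lookups (the `networkaddress.cache.ttl` security property controls this), so repeated calls from one client may not observe the rotation.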
The result of a DNS lookup is typically cached at various locations, with caches lasting days (though this is configurable and can be any value down to seconds). Consequently, it is slow to propagate changes when using DNS load balancing, and any one client typically uses the same IP address over multiple connections rather than being directed to alternate servers. These issues can be problematic, but can also be advantageous if transactional or session-based communications are normal for your application.
Also note that DNS load balancing can be used in conjunction with other load-balancing techniques. For example, if you use a load-balancing dispatcher, then you can use DNS to balance multiple load-balancing dispatchers to achieve the optimal case in which the load-balancing mechanism has no single point of failure (DNS lookups are replicated). However, some clients could still see failures due to lookup caching.
A hardware load balancer is a machine at a single IP address that reroutes communications to other machines. For example, it can reroute IP packets by rewriting the IP address in the header and passing the packet to the appropriate machine. This can also be an application-independent load-balancing mechanism. The technique is more complex and more expensive than DNS load balancing, but much more flexible and capable. Multiple hardware load balancers can be used in conjunction with DNS load balancing to achieve application-independent load balancing with no single point of failure (although some clients could see failures if a hardware load balancer fails, due to lookup caching).
Hardware load balancers may come with extra features such as the ability to automatically detect unavailable servers and eliminate them from their routing tables; to intelligently reroute packets based on server load, IP address, port number, or other criteria; and to decrypt encrypted communications before rerouting packets.
A cluster can be implemented with a frontend dispatcher that accepts requests and passes them on to other clustered servers. All requests are directed to the dispatcher, either explicitly (the client doesn't know about any machines "behind" the dispatcher) or redirected (as is done when correctly configured browsers have their requests automatically sent to a proxy server).
The dispatcher (or proxy server) services the request in one of three ways:
The request is satisfied by returning a result (document) cached in the dispatcher. This scenario is common for proxy servers, but unusual for dispatchers.
The request is redirected to another server that services the request and returns the results to the client, either directly or, more commonly, through the dispatcher.
The dispatcher redirects the client to send the request to another server. The HTTP protocol supports this option with the Location header: if a browser connects to a server requesting a particular URL and receives a redirect response specifying a new location, the browser automatically tries to request the new URL.
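A hypothetical redirect response might look like the following (the host name and resource path are illustrative):

```
HTTP/1.1 302 Found
Location: http://www2.example.com/newpage.html
```

The Location header tells the browser where to resend the request, so the dispatcher hands the connection off entirely to the second server.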
A dispatcher could also decrypt encrypted requests before handling or forwarding them, thus centralizing security and offloading some processing from the server cluster.
Decide where any particular document or service is best served and specify the appropriate host machine in the URL. This load-balancing mechanism is straightforward. For example, you could retrieve images from the image server and documents from a separate document server.
URL generation can be done statically or dynamically, but generating documents dynamically can add further overhead. In addition, the URL could be retained by the client and used later or reused when the specified host is no longer the optimal server for the request. Where possible, convert dynamic requests into static ones by replacing URLs served dynamically with ones served statically.
Load balancing is possible by varying how pooled objects are handed out. This type of balancing tends to apply at the application level, where you can create and hand out objects from a pool according to your own algorithm.
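As a sketch of this idea, the pool below hands out whichever pooled object has been checked out least often, a crude stand-in for "least busy." A real pool would also track check-ins and use a better load measure; the generic type parameter stands in for whatever object the application pools (connections, worker handles, and so on):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LoadBalancedPool<T> {
    private static class Entry<E> {
        final E object;
        int checkouts;  // crude load measure: times handed out
        Entry(E object) { this.object = object; }
    }

    private final List<Entry<T>> entries = new ArrayList<>();

    public synchronized void add(T object) {
        entries.add(new Entry<>(object));
    }

    // Hand out the pooled object with the fewest checkouts so far,
    // i.e., apply a "least used" allocation algorithm.
    public synchronized T checkout() {
        Entry<T> least = entries.stream()
                .min(Comparator.comparingInt((Entry<T> e) -> e.checkouts))
                .orElseThrow(() -> new IllegalStateException("empty pool"));
        least.checkouts++;
        return least.object;
    }

    public static void main(String[] args) {
        LoadBalancedPool<String> pool = new LoadBalancedPool<>();
        pool.add("server1");
        pool.add("server2");
        System.out.println(pool.checkout());  // server1 (both unused, first wins)
        System.out.println(pool.checkout());  // server2 (now the least used)
    }
}
```

The allocation policy lives entirely in `checkout()`, so swapping in a different algorithm (random, weighted, and so on) is a local change.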
The connection mechanism in the client can serve as a load-balancing mechanism. The client can even check for an available rerouting server to combine client load balancing with server load balancing. The client connection mechanism should be centrally based, either explicitly by having client objects connect through a connection service, or implicitly using proxy objects in place of server-communicated objects.
One such load-balancing connection mechanism simply selects from a list of available RMI connections sequentially.
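A minimal sketch of such a sequential selector, assuming the connections have already been looked up; plain strings stand in for RMI stubs here so the example is self-contained:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSelector<T> {
    private final List<T> targets;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RoundRobinSelector(List<T> targets) {
        this.targets = targets;
    }

    // Return the next target in sequence, wrapping back to the first
    // when the list is exhausted.
    public T select() {
        // floorMod guards against the counter wrapping negative.
        int index = Math.floorMod(counter.getAndIncrement(), targets.size());
        return targets.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSelector<String> selector =
                new RoundRobinSelector<>(List.of("rmi1", "rmi2", "rmi3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.select());  // rmi1, rmi2, rmi3, rmi1
        }
    }
}
```

In a real deployment the list elements would be remote stubs obtained from the RMI registry, and a failed `select()` target could simply be skipped on retry.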
The application itself should be configured for deployments with different usage patterns. Read-only, read-write, and batch-update bean components should each be deployed in separate application servers, so that each server can be configured appropriately for the component type it hosts. Transactional and nontransactional sessions and messages should be served from separate servers. Nontransactional server components can be replicated easily for horizontal scaling.
Multiple VMs can be used even within individual server machines to help load-balance the application. Scalable servers usually work best with multiple VMs. Using one thread per user can create a bottleneck with large numbers of concurrent users. Stateless sessions are easily load-balanced; replicating transactional sessions is more involved and carries higher overhead. Pseudo-sessions that encode a session identifier into the URLs and are stored globally are probably the best compromise for load-balancing transactional sessions.
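A minimal sketch of the URL-encoding half of such a pseudo-session scheme follows; the `sessionid` parameter name is an assumption, and the servers would use the identifier to look the session up in a globally shared store:

```java
public class SessionUrlRewriter {
    // Append the session identifier to a generated URL so that any
    // server in the cluster receiving the next request can recover
    // the session from the global store.
    public static String encode(String url, String sessionId) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "sessionid=" + sessionId;
    }

    public static void main(String[] args) {
        System.out.println(encode("http://host.example/page", "abc123"));
        // http://host.example/page?sessionid=abc123
    }
}
```

Because the session state lives in the global store rather than in any one server's memory, requests carrying the rewritten URLs can be dispatched to any server.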
Separate web servers should be used for all static content, which can be served much faster and with much lower overhead than dynamic content. Priority queues can provide higher-priority users with a higher quality of service. Section 10.8 discusses queue-based load balancing and network queuing in more detail. The frontend queue can use the network queue as its bottleneck and accept requests only when there are sufficient resources to process them. Try to balance the workload across the various components so that all of them are kept busy; no component should sit idle.
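As a sketch of the priority-queue idea (the `Request` type and priority values here are illustrative, not part of any J2EE API), a queue ordered on a priority field always dispatches the highest-priority pending request first:

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class PriorityDispatch {
    public static class Request {
        public final String user;
        public final int priority;
        public Request(String user, int priority) {
            this.user = user;
            this.priority = priority;
        }
    }

    // Higher priority values are dispatched first (reversed ordering).
    public static PriorityBlockingQueue<Request> newQueue() {
        return new PriorityBlockingQueue<>(
                16, Comparator.comparingInt((Request r) -> r.priority).reversed());
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<Request> queue = newQueue();
        queue.add(new Request("basic", 1));
        queue.add(new Request("premium", 10));
        queue.add(new Request("standard", 5));
        // premium is dispatched first despite arriving after basic
        System.out.println(queue.poll().user);
    }
}
```

Worker threads draining such a queue give high-priority users a better quality of service without any per-user thread dedication.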
Database partitioning schemes, discussed elsewhere in this book, can also assist in load-balancing the application.
In addition to selecting one or more load-balancing mechanisms, you may need to consider optimal load-balancing algorithms for your application. These algorithms include:
Random: Randomly allocate requests to servers.

Minimum load: Allocate requests to the server with the current minimum load.

Round-robin: Successively select the next server in a list, starting again from the first server when the list is exhausted.

Weighted round-robin: Like round-robin, but with some servers listed multiple times.

Performance-based: Allocate requests based on the performance capability of the server.

Load-based: Allocate requests based on the servers' total load capability.

Application-specific: Dynamically allocate to servers based on an application-encoded algorithm.

Geography-based: Allocate requests to the IP address (physically) nearest the client.

Port-based: Allocate requests according to the port number.

Content-based: Allocate requests according to a value within the HTTP header, such as the URL or a cookie.
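The weighted round-robin variant above can be sketched by expanding the server list according to the weights and then cycling through the expanded list with an ordinary round-robin index; the server names and weights below are illustrative:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WeightedRoundRobin {
    // Build the expanded list: a server with weight 3 appears 3 times,
    // so it receives 3 out of every (total weight) requests when the
    // list is cycled through sequentially.
    public static List<String> expand(Map<String, Integer> weights) {
        List<String> expanded = new ArrayList<>();
        weights.forEach((server, weight) -> {
            for (int i = 0; i < weight; i++) {
                expanded.add(server);
            }
        });
        return expanded;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("bigServer", 3);    // three times the capacity
        weights.put("smallServer", 1);
        System.out.println(expand(weights));
        // [bigServer, bigServer, bigServer, smallServer]
    }
}
```

Interleaving the repeated entries (rather than grouping them, as here) smooths out bursts to the heavier-weighted server, at the cost of a slightly more involved expansion step.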