Chapter 12. Designing for High Availability

High-availability architecture is a broad subject whose interlocking complexities span all layers of the OSI (Open Systems Interconnection) model.

Keep in mind that the end user's perception of service availability is the ultimate and most relevant criterion; that perception will be favorable if you have done your job well. Toward that end, high-availability architectures address the following needs:

  • Redundancy: This includes equipment (node) redundancy, topology (link) redundancy, and redundant services available to a user base.

  • Load balancing: The primary purpose of load balancing is to distribute load among the members of a pool or farm of devices. Next-hop redundancy considerations and load balancing are important aspects of such an overall design. Dynamic DNS can achieve a similar result by different means.

  • Clustering: This involves the logical grouping of the constituents of a service. Cluster types include performance clusters, load-balancing clusters, and fault-tolerance clusters. Clustering is another generic approach to presenting one highly robust virtual service to the outside world, with a group of real servers behind the scenes. Dedicated cluster management software maintains the overall picture of cluster controllers and component servers, thus increasing overall availability, robustness, and performance.

  • Heartbeat/keepalives: Heartbeat/keepalive protocols and agents monitor the availability and operational parameters of network elements and services.

  • (D)DoS defenses: Robust high-availability architectures are more likely to withstand or mitigate the effects of (D)DoS attacks; this resilience is an attribute of a sound design.

  • Network failover strategies: These approaches generally combine VRRP/HSRP mechanisms with gratuitous ARP to provide gateway failover.

  • Reliable failure detection and fast recovery/restoration of service: This is the domain of routing protocols. The general goal of modern routing designs is subsecond convergence, which is a mandatory requirement for real-time traffic such as voice or video.
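
To make two of the needs listed above more concrete, the following is a minimal Python sketch of how heartbeat-based failure detection can feed a load-balancing decision. It is an illustration only and does not model any real protocol or product: the class names HeartbeatMonitor and RoundRobinPool, the 3-second timeout, and the node labels are all hypothetical. Timestamps are passed in explicitly so the behavior is deterministic; a real agent would more likely read them from a monotonic clock.

```python
class HeartbeatMonitor:
    """Hypothetical monitor: a node counts as alive only if a heartbeat
    was recorded within the last `timeout` seconds."""

    def __init__(self, nodes, timeout=3.0):
        self.timeout = timeout
        # None means "no heartbeat received yet", which is treated as down.
        self.last_seen = {node: None for node in nodes}

    def record_heartbeat(self, node, now):
        self.last_seen[node] = now

    def is_alive(self, node, now):
        seen = self.last_seen[node]
        return seen is not None and (now - seen) <= self.timeout


class RoundRobinPool:
    """Hypothetical pool: hands out nodes in round-robin order,
    skipping any node the monitor currently considers down."""

    def __init__(self, nodes, monitor):
        self.nodes = list(nodes)
        self.monitor = monitor
        self._next = 0  # index at which the next search starts

    def pick(self, now):
        count = len(self.nodes)
        for offset in range(count):
            idx = (self._next + offset) % count
            node = self.nodes[idx]
            if self.monitor.is_alive(node, now):
                self._next = (idx + 1) % count
                return node
        return None  # no healthy node is left in the pool


monitor = HeartbeatMonitor(["a", "b", "c"], timeout=3.0)
pool = RoundRobinPool(["a", "b", "c"], monitor)

# All three nodes report in at t=0, so load rotates across all of them.
for name in ("a", "b", "c"):
    monitor.record_heartbeat(name, 0.0)
healthy_picks = [pool.pick(1.0) for _ in range(4)]

# Node "b" falls silent; "a" and "c" keep sending heartbeats at t=5.
monitor.record_heartbeat("a", 5.0)
monitor.record_heartbeat("c", 5.0)
degraded_picks = [pool.pick(6.0) for _ in range(3)]
```

The design choice worth noting is the separation of concerns: the monitor only answers whether a node is alive, and the pool only decides where the next request goes. The timeout value is the tuning knob that trades detection speed against false alarms, which is the same tension that drives the subsecond convergence goal for routing protocols.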

This chapter discusses support for such services from a networker's point of view (OSI Layers 1 through 4). The application layers (Layers 5 through 7) are intentionally underrepresented in this chapter because they rely on other mechanisms that are beyond the scope of a network/transport layer discussion.