Increasing Availability

The essential questions for high-availability (HA) designers have always been, and will continue to be, "How can I increase the overall availability of a particular service or application, and what do I have to do to eliminate weak links in the chain or single points of failure?" Tackling these challenges involves thorough planning across all OSI layers and the removal of single points of failure wherever possible. A chain is only as strong as its weakest link. Therefore, it is highly advisable to have at least one backup system, link, or resource available at all times.
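The weakest-link argument can be made concrete with the standard availability arithmetic for components in series and in parallel. The following is a minimal sketch, not a sizing tool: it assumes that components fail independently, and the availability figures and the names router, switch, and server are hypothetical values chosen only for illustration.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(replicas):
    """Availability of a redundant group that is up if any one replica is up."""
    return 1 - prod(1 - a for a in replicas)


# Hypothetical per-component availabilities (assumed, not measured).
router, switch, server = 0.999, 0.999, 0.99

# Every component is a single point of failure.
single_chain = serial_availability([router, switch, server])

# Add one backup for the weakest link (the server) and recompute.
redundant_server = parallel_availability([server, server])
improved_chain = serial_availability([router, switch, redundant_server])

print(f"chain without backup:    {single_chain:.6f}")
print(f"server pair:             {redundant_server:.6f}")
print(f"chain with server backup: {improved_chain:.6f}")
```

Under these assumptions, the chain without a backup is less available than its weakest component (roughly 0.988), while a single backup server lifts the chain to roughly 0.998. At that point the remaining single points of failure, the router and the switch, dominate the result.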

Of course, the effort and cost associated with such an endeavor can easily get out of hand and should, therefore, be governed by common sense and commercial feasibility. This is a particularly interesting topic in times of "best effort" services, because best effort is always dictated by commercial considerations. The particular task of network engineers is to provide highly robust IP infrastructures that support higher-layer redundancy approaches, and the task of systems engineers is to achieve OS resilience with concepts such as clustering or distributed architectures. Together, these form the foundation for high-availability applications (services); a good implementation should result in services that are robust and stable from the point of view of the end user. How this is accomplished means little to the customer.