Chapter 12. Designing for High Availability
High availability is a wide-ranging subject whose interlocking complexities span all layers of the OSI (Open Systems Interconnection) reference model.
Keep in mind that the end user's perception of service availability is the ultimate and most relevant criterion; if you have done your job right, that perception will be favorable. Toward that end, high-availability architectures typically address the following areas:
- Load balancing: Load balancing primarily distributes load among the members of a pool or farm of devices. Together with next-hop redundancy, it is an important aspect of an overall high-availability design. Dynamic DNS can achieve a similar distribution by different means.
- Clustering: Clustering logically groups the components that together deliver a service. Cluster types include performance clusters, load-balancing clusters, and fault-tolerance clusters. Clustering is another generic approach to presenting one highly robust virtual service to the outside world, backed by a group of real servers behind the scenes. Dedicated cluster management software maintains an overall view of the cluster controllers and member servers, which increases overall availability, robustness, and performance.
- (D)DoS defenses: Robust high-availability architectures are more likely to withstand or mitigate the effects of (distributed) denial-of-service attacks; this resilience is an attribute of a sound design.
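The load-balancing and clustering ideas above can be illustrated with a minimal sketch of a round-robin pool that presents one virtual service and takes failed members out of rotation. The class name `ServerPool`, its methods, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical; real load balancers add other scheduling algorithms (for example, weighted or least-connections) and active health probes.

```python
class ServerPool:
    """Hypothetical round-robin pool fronting one virtual service."""

    def __init__(self, members):
        self.members = list(members)      # fixed rotation order
        self.healthy = set(self.members)  # members currently in service
        self._index = 0                   # next position in the rotation

    def mark_down(self, member):
        # A failed health check removes the member from rotation.
        self.healthy.discard(member)

    def mark_up(self, member):
        # A recovered member rejoins the rotation.
        if member in self.members:
            self.healthy.add(member)

    def next_server(self):
        # Walk the pool in order at most once, skipping unhealthy members.
        for _ in range(len(self.members)):
            member = self.members[self._index]
            self._index = (self._index + 1) % len(self.members)
            if member in self.healthy:
                return member
        raise RuntimeError("no healthy servers in pool")


if __name__ == "__main__":
    pool = ServerPool(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
    print([pool.next_server() for _ in range(4)])
    pool.mark_down("192.0.2.11")  # simulate a failed real server
    print([pool.next_server() for _ in range(3)])
```

Clients only ever address the virtual service; when a real server fails, the remaining members absorb its share of the load, which is the essence of both load-balancing and fault-tolerance clusters.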
This chapter discusses support for such services from a networker's point of view (OSI Layers 1 through 4). The upper layers (Layers 5 through 7) are intentionally underrepresented here because they rely on mechanisms beyond the scope of a network- and transport-layer discussion.
Increasing Availability
The essential questions for high-availability (HA) designers have always been, and will continue to be: "How can I increase the overall availability of a specific service or application, and what must I do to eliminate weak links in the chain and single points of failure?" Tackling these challenges requires thorough planning across all OSI layers and the removal of single points of failure wherever possible. A chain is only as strong as its weakest link; therefore, it is highly advisable to have at least one backup system, link, or resource available at all times.
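The weakest-link argument can be made quantitative with the standard reliability-block formulas: components in series multiply their availabilities, A = A1 × A2 × ... × An, while redundant components in parallel fail only if all of them fail, A = 1 − (1 − A1)(1 − A2)...(1 − An). The sketch below assumes independent failures, and the 99.9 percent figures for a router, a link, and a server are illustrative values, not measurements.

```python
from math import prod


def series(*availabilities):
    """Every component must work (a chain): availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities):
    """At least one redundant component must work; assumes independent failures."""
    return 1 - prod(1 - a for a in availabilities)


# Illustrative path: router -> link -> server, each 99.9 % available.
single_path = series(0.999, 0.999, 0.999)

# Same path with one backup for each element (router, link, server).
redundant_path = series(*(parallel(0.999, 0.999) for _ in range(3)))

if __name__ == "__main__":
    print(f"single path:    {single_path:.6%}")
    print(f"redundant path: {redundant_path:.6%}")
```

With these assumed numbers, the single path is roughly 99.70 percent available, whereas adding one backup per element raises it to roughly 99.9997 percent, which is why removing single points of failure pays off so strongly.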
Of course, the effort and cost of such an endeavor can easily get out of hand, so they should be governed by common sense and commercial feasibility. This trade-off is particularly relevant for "best effort" services, because the level of service offered as best effort is always dictated by commercial considerations. The particular task of network engineers is to provide highly robust IP infrastructures that support higher-layer redundancy approaches; the task of systems engineers is to achieve operating system (OS) resilience through concepts such as clustering or distributed architectures. Together, these form the foundation for high-availability applications (services). A good implementation results in robust and stable services from the end user's point of view; how this is accomplished matters little to the customer.
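One way to keep the cost discussion grounded is to translate an availability target into the downtime it permits. The short sketch below does that arithmetic for common "number of nines" targets; the function name is hypothetical, and the 365-day year (ignoring leap years) is a simplifying assumption.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumption: leap years ignored


def downtime_minutes_per_year(availability):
    """Maximum yearly downtime, in minutes, allowed by an availability target."""
    return (1 - availability) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    for nines in range(2, 6):          # 99 % through 99.999 %
        target = 1 - 10 ** -nines
        minutes = downtime_minutes_per_year(target)
        print(f"{target:.3%} -> {minutes:8.1f} minutes/year")
```

Each additional nine cuts the permitted downtime by a factor of ten, from about 5,256 minutes (roughly 3.65 days) per year at 99 percent to about 5.3 minutes per year at 99.999 percent, while the redundancy needed to reach it typically grows far more expensive.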