Chapter 12. Designing for High Availability

High-availability architectures are a wide-ranging subject whose interlocking complexities span every layer of the OSI (Open Systems Interconnection) stack.
Keep in mind that the end user's perception of service availability is the ultimate and most relevant criterion; if you have done your job right, that perception will be favorable. Toward that end, high-availability architectures address the following needs:
  • Redundancy: This includes equipment (node) and topology (link) redundancy precautions, as well as redundant services available to a user base.
  • Load balancing: Load balancing primarily distributes load among the members of a pool or farm of devices. Next-hop redundancy and load balancing are important aspects of such an overall design. Dynamic DNS can achieve a similar result by different means.
  • Clustering: This involves the logical grouping of the constituents of a service. Cluster types include performance clusters, load-balancing clusters, and fault-tolerance clusters. Clustering is another generic approach to presenting one highly robust virtual service to the outside world, with a group of real servers behind the scenes. Dedicated cluster management software maintains the overall picture of cluster controllers and component servers, thus increasing overall availability, robustness, and performance.
  • Heartbeat/keepalives: Heartbeat/keepalive protocols and agents monitor the availability and operational parameters of network elements and services.
  • (D)DoS defenses: A robust high-availability architecture is more likely to withstand or mitigate the effects of (D)DoS attacks; such resilience is an attribute of a sound design.
  • Network failover strategies: These approaches generally combine VRRP/HSRP mechanisms with gratuitous ARP to provide gateway failover.
  • Reliable failure detection and fast recovery/restoration of service: This is the domain of routing protocols. The general goal of modern routing designs is subsecond convergence, which is a mandatory requirement for real-time traffic such as voice or video.
This chapter discusses support for such services from a networker's point of view (OSI Layers 1 through 4). The application layers (Layers 5 through 7) are intentionally underrepresented in this chapter because they use other mechanisms beyond the scope of a network/transport layer discussion.
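The heartbeat/keepalive item in the list above can be illustrated with a small sketch. It is not taken from any particular protocol or product: the class name HeartbeatMonitor, the peer names, and the 3-second dead interval are all hypothetical, and timestamps are passed in explicitly so the logic is easy to follow. The idea is simply that a peer is treated as failed once no heartbeat has arrived within the dead interval.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Treat a peer as failed after `dead_interval` seconds of silence.

    Illustrative only: names and default values are assumptions,
    not part of any real keepalive protocol.
    """

    dead_interval: float = 3.0  # hypothetical timeout in seconds
    last_seen: dict[str, float] = field(default_factory=dict)

    def record(self, peer: str, now: float) -> None:
        """Note that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def is_alive(self, peer: str, now: float) -> bool:
        """A peer is alive if it was heard from within the dead interval."""
        seen = self.last_seen.get(peer)
        return seen is not None and (now - seen) <= self.dead_interval

    def failed_peers(self, now: float) -> list[str]:
        """Return known peers whose heartbeats have timed out, sorted by name."""
        return sorted(p for p in self.last_seen if not self.is_alive(p, now))


if __name__ == "__main__":
    monitor = HeartbeatMonitor(dead_interval=3.0)
    monitor.record("gw-a", now=0.0)
    monitor.record("gw-b", now=0.0)
    monitor.record("gw-a", now=2.5)       # gw-a keeps sending, gw-b goes silent
    print(monitor.failed_peers(now=4.0))  # ['gw-b']
```

A shorter dead interval detects failures sooner but raises the risk of declaring a healthy peer dead after a few lost packets, which is the usual trade-off when tuning keepalive timers.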

Increasing Availability

The essential questions for high-availability (HA) designers have always been (and will continue to be) "How can I increase the overall availability of a particular service or application, and what do I have to do to eliminate weak links in the chain or single points of failure?" Tackling these challenges involves thorough planning across all OSI layers and the removal of single points of failure wherever possible. A chain is only as strong as its weakest link. Therefore, it is highly advisable to have at least one backup system, link, or resource available at all times.
Of course, the effort and cost of such an endeavor can easily get out of hand and should therefore be governed by common sense and commercial feasibility. This is a particularly interesting topic in times of "best effort" services; best effort is always a commercial dictate. The particular task of network engineers is to provide highly robust IP infrastructures that support higher-layer redundancy approaches, and the task of systems engineers is to achieve OS resilience with concepts such as clustering or distributed architectures. Together these form the foundation for high-availability applications (services); a good implementation should result in services that are robust and stable from the end user's point of view. How this is accomplished means little to the customer.
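The weakest-link argument and the advice to keep at least one backup can be made concrete with the standard availability arithmetic. The following is a minimal sketch that assumes component failures are independent, which real deployments only approximate; the function names and the availability figures (0.999 for a link and a router, 0.99 for a server) are hypothetical. Components in series multiply their availabilities, while redundant components in parallel fail only if all of them fail.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def serial_availability(components):
    """Availability of a chain in which every component must work."""
    return prod(components)


def parallel_availability(replicas):
    """Availability of a redundant group in which one working replica suffices.

    Assumes the replicas fail independently of one another.
    """
    return 1.0 - prod(1.0 - a for a in replicas)


def downtime_minutes_per_year(availability):
    """Expected yearly downtime implied by an availability figure."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    link, router, server = 0.999, 0.999, 0.99  # hypothetical figures

    # Single server: the weakest link dominates the chain.
    single = serial_availability([link, router, server])
    print(f"{single:.5f}")  # 0.98802

    # Same chain with a redundant pair of servers.
    pair = parallel_availability([server, server])
    redundant = serial_availability([link, router, pair])
    print(f"{redundant:.5f}")  # 0.99790

    print(round(downtime_minutes_per_year(single)),
          round(downtime_minutes_per_year(redundant)))
```

Under these assumptions, duplicating the weakest component improves the chain far more than duplicating a component that is already highly available, which is one way to keep redundancy spending within commercial limits.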
