System Design For High Availability
Paradoxically, adding more components to an overall system design can undermine efforts to achieve high availability, because complex systems inherently have more potential failure points and are more difficult to implement correctly. Some analysts argue that the most highly available systems adhere to a simple architecture: a single, high-quality, multi-purpose physical system with comprehensive internal hardware redundancy. However, this architecture suffers from the requirement that the entire system must be brought down for patching and operating system upgrades. More advanced system designs allow systems to be patched and upgraded without compromising service availability (see load balancing and failover).
In complex systems, high availability implies that operation is restored without human intervention. For example, an availability limit of 99.999% allows about one second of downtime per day (0.001% of 86,400 seconds is 0.864 seconds), which is impractical to achieve using human labor. The need for human intervention for maintenance actions in a large system will exceed this limit. An availability limit of 99% allows an average of about 15 minutes of downtime per day (14.4 minutes), which is realistic for human intervention.
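The per-day downtime figures come from multiplying the unavailable fraction of time by the 86,400 seconds in a day. A minimal sketch of that arithmetic follows; the function name `downtime_per_day` is illustrative, not taken from any standard or library.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def downtime_per_day(availability: float) -> float:
    """Average allowed downtime in seconds per day for an availability
    given as a fraction (e.g. 0.99999 for 99.999%)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # The unavailable fraction of the day, converted to seconds.
    return (1.0 - availability) * SECONDS_PER_DAY


if __name__ == "__main__":
    for label, a in [("99.999%", 0.99999), ("99%", 0.99)]:
        seconds = downtime_per_day(a)
        print(f"{label}: {seconds:.3f} s/day ({seconds / 60:.1f} min/day)")
```

The same function makes it easy to compare other targets, such as 99.9% (about 86 seconds per day).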
Redundancy is used to create systems with high levels of availability (e.g. aircraft flight computers). Such systems require high levels of failure detectability and the avoidance of common-cause failures. There are two kinds of redundancy: passive redundancy and active redundancy.
Passive redundancy is used to achieve high availability by including enough excess capacity in the design to accommodate a performance decline. The simplest example is a boat with two separate engines driving two separate propellers. The boat continues toward its destination despite failure of a single engine or propeller. A more complex example is multiple redundant power generation facilities within a large system involving electric power transmission. Malfunction of single components is not considered to be a failure unless the resulting performance decline exceeds the specification limits for the entire system.
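One common way to quantify passive redundancy is a k-out-of-n model: the system stays within its specification as long as at least k of its n units are working. The sketch below assumes that units fail independently and share the same availability, which is exactly the assumption that common-cause failures violate. The function name and the 0.99 engine availability used in the note after the code are hypothetical.

```python
from math import comb


def k_out_of_n_availability(k: int, n: int, unit_availability: float) -> float:
    """Probability that at least k of n identical units are working,
    assuming each unit fails independently of the others."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    p = unit_availability
    # Sum the binomial probabilities of exactly i working units, for i >= k.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For the two-engine boat, if one engine is enough (k = 1, n = 2) and each engine is assumed to be available 99% of the time, `k_out_of_n_availability(1, 2, 0.99)` gives about 0.9999, whereas a design that needs both engines (k = 2) gives about 0.9801.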
Active redundancy is used in complex systems to achieve high availability with no performance decline. Multiple items of the same kind are incorporated into a design that includes a method to detect failure and automatically reconfigure the system to bypass failed items using a voting scheme. This approach is used in complex, interconnected computing systems. Internet routing is derived from early work by Birman and Joseph in this area. Active redundancy may introduce more complex failure modes into a system, such as continuous system reconfiguration due to faulty voting logic.
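A voting scheme of this kind can be sketched as a majority voter over identical units, in the style of triple modular redundancy. This is a minimal illustration under stated assumptions: each unit is modeled as a plain Python callable, outputs are compared for exact equality, and `majority_vote` and `RedundantGroup` are hypothetical names. It does not address timing, partial failures, or the distributed agreement problems that linked systems face.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return the majority output and the indices of units that disagree.

    Raises RuntimeError when no strict majority exists, because the voter
    then cannot tell which units have failed.
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority: cannot identify failed units")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


class RedundantGroup:
    """Hypothetical group of redundant units that bypasses outvoted units."""

    def __init__(self, units: Sequence[Callable[..., Hashable]]):
        self.active = list(units)

    def run(self, *args):
        outputs = [unit(*args) for unit in self.active]
        value, dissenters = majority_vote(outputs)
        # Reconfigure: drop every unit that disagreed with the majority.
        self.active = [u for i, u in enumerate(self.active) if i not in dissenters]
        return value


if __name__ == "__main__":
    healthy = lambda x: 2 * x
    faulty = lambda x: 2 * x + 1  # simulated defective unit
    group = RedundantGroup([healthy, healthy, faulty])
    print(group.run(3), len(group.active))
```

Once the group has shrunk to two units, any further disagreement has no majority and the voter raises an error instead of guessing, a small example of how the voting logic itself adds failure modes.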
Zero-downtime system design means that modeling and simulation indicate that the mean time between failures significantly exceeds the period between planned maintenance or upgrade events, or the system's lifetime. Zero downtime involves massive redundancy, which is needed for some types of aircraft and for most kinds of communications satellites. The Global Positioning System is an example of a zero-downtime system.
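Why the MTBF must "significantly" exceed the maintenance interval can be seen with a simple reliability model. The sketch below assumes a constant failure rate, so that time to failure is exponentially distributed and the probability of no failure during an interval t is exp(-t / MTBF); real systems may follow other failure distributions. The one-year interval and the MTBF values are hypothetical.

```python
import math


def survival_probability(interval_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure during an interval, assuming a constant
    failure rate (exponentially distributed time to failure)."""
    if interval_hours < 0 or mtbf_hours <= 0:
        raise ValueError("interval must be >= 0 and MTBF must be > 0")
    return math.exp(-interval_hours / mtbf_hours)


if __name__ == "__main__":
    year = 8760.0  # hours between planned maintenance events (hypothetical)
    for mtbf in (year, 10 * year, 1_000_000.0):
        print(f"MTBF {mtbf:>11,.0f} h: P(no failure in a year) = "
              f"{survival_probability(year, mtbf):.4f}")
```

Under this model, an MTBF equal to the maintenance interval gives only about a 37% chance of reaching the next maintenance event without a failure; an MTBF ten times the interval raises that to about 90%.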
Fault instrumentation can be used in systems with limited redundancy to achieve high availability. Maintenance actions occur during brief periods of downtime, and only after a fault indicator activates. A failure is significant only if it occurs during a mission-critical period.
Modeling and simulation are used to evaluate the theoretical reliability of large systems. The outcome of this kind of model is used to evaluate different design options. A model of the entire system is created, and the model is stressed by removing components. Redundancy simulation involves the N-x criteria: N represents the total number of components in the system, and x is the number of components used to stress the system. N-1 means the model is stressed by evaluating performance with all possible combinations where one component is faulted. N-2 means the model is stressed by evaluating performance with all possible combinations where two components are faulted simultaneously.
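The N-x procedure amounts to enumerating every combination of x faulted components and checking whether the remaining system still meets its specification. A minimal sketch, assuming the system model can be reduced to a yes/no `meets_spec` check on the set of surviving components; the function names, generator names, capacities, and demand are all hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple


def n_minus_x(components: Iterable[str], x: int,
              meets_spec: Callable[[FrozenSet[str]], bool]) -> List[Tuple[str, ...]]:
    """Fault every combination of x components and return the combinations
    after which the remaining system no longer meets its specification."""
    components = list(components)
    violations = []
    for faulted in combinations(components, x):
        remaining = frozenset(components) - set(faulted)
        if not meets_spec(remaining):
            violations.append(faulted)
    return violations


# Hypothetical example: four generating units that must cover a fixed demand.
CAPACITY_MW = {"G1": 400, "G2": 300, "G3": 300, "G4": 200}
DEMAND_MW = 700


def meets_demand(remaining: FrozenSet[str]) -> bool:
    return sum(CAPACITY_MW[g] for g in remaining) >= DEMAND_MW


if __name__ == "__main__":
    print("N-1 violations:", n_minus_x(CAPACITY_MW, 1, meets_demand))
    print("N-2 violations:", n_minus_x(CAPACITY_MW, 2, meets_demand))
```

This hypothetical system passes N-1, since every single outage leaves at least 700 MW, but fails N-2 for four of the six pairs; only losing G2 and G4, or G3 and G4, still leaves 700 MW. Because there are C(N, x) combinations to evaluate, exhaustive N-x checks become expensive quickly as x grows.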