In a distributed system, failure transparency refers to the extent to which errors and subsequent recoveries of hosts and services within the system are invisible to users and applications. For example, if a server fails, but users are automatically redirected to another server and never notice the failure, the system is said to exhibit high failure transparency.
Failure transparency is one of the most difficult types of transparency to achieve since it is often difficult to determine whether a server has actually failed, or whether it is simply responding very slowly. Additionally, it is generally impossible to achieve full failure transparency in a distributed system since networks are unreliable.
There is also usually a trade-off between achieving a high level of failure transparency and maintaining an adequate level of system performance. For example, if a distributed system attempts to mask a transient server failure by having the client try to contact the failed server multiple times, performance of the system may be negatively affected. In this case, it would have been preferable to have given up earlier and tried another server.
Famous quotes containing the words failure and/or transparency:
“Human beings are compelled to live within a lie, but they can be compelled to do so only because they are in fact capable of living in this way. Therefore not only does the system alienate humanity, but at the same time alienated humanity supports this system as its own involuntary masterplan, as a degenerate image of its own degeneration, as a record of peoples own failure as individuals.”
—Václav Havel (b. 1936)
“Life is filigree work.... What is written clearly is not worth much, its the transparency that counts.”
—Louis-Ferdinand Céline (18941961)