Failure Transparency

In a distributed system, failure transparency refers to the extent to which errors and subsequent recoveries of hosts and services within the system are invisible to users and applications. For example, if a server fails, but users are automatically redirected to another server and never notice the failure, the system is said to exhibit high failure transparency.

Failure transparency is one of the most difficult types of transparency to achieve since it is often difficult to determine whether a server has actually failed, or whether it is simply responding very slowly. Additionally, it is generally impossible to achieve full failure transparency in a distributed system since networks are unreliable.

There is also usually a trade-off between achieving a high level of failure transparency and maintaining an adequate level of system performance. For example, if a distributed system attempts to mask a transient server failure by having the client try to contact the failed server multiple times, performance of the system may be negatively affected. In this case, it would have been preferable to have given up earlier and tried another server.

Famous quotes containing the words failure and/or transparency:

    It could be clearly proved that by a practical nullification [by the South] of the Fifteenth Amendment the Republicans have for several years been deprived of a majority in both the House and Senate. The failure of the South to faithfully observe the Fifteenth Amendment is the cause of the failure of all efforts towards complete pacification. It is on this hook that the bloody shirt now hangs.
    Rutherford Birchard Hayes (1822–1893)

    Life is filigree work.... What is written clearly is not worth much, it’s the transparency that counts.
    Louis-Ferdinand Céline (1894–1961)