Resources: BW - Chapter 2 - Reliability and fault tolerance.pdf
The introduction of redundant components into a system so that faults can be detected and tolerated.
Levels of fault tolerance in a system:
- Full fault tolerance ← Most safety-critical systems
  - The system continues to operate in the presence of faults, albeit for a limited period, with no significant loss of functionality or performance.
- Graceful degradation / Fail soft
  - The system continues to operate in the presence of errors, accepting a partial degradation of functionality or performance during recovery or repair.
- Fail safe
  - The system maintains its integrity while accepting a temporary halt in its operation.
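The three levels above can be illustrated with a small sketch. This is not from BW; it is a minimal Python example under assumed names (`Mode`, `read_with_tolerance`, `broken`, `healthy` are all hypothetical). It assumes a reading is taken from a list of redundant sensors: if any replica answers, the fault is masked (full fault tolerance); if none answers but an older value is available, the system continues with reduced quality (graceful degradation); otherwise it stops in a safe state (fail safe).

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the three levels of fault tolerance."""
    FULL = "full fault tolerance"
    DEGRADED = "graceful degradation"
    FAIL_SAFE = "fail safe"


def read_with_tolerance(sensors, last_good=None):
    """Try each redundant sensor in turn and report how the fault was handled.

    sensors   -- list of zero-argument callables; a faulty one raises OSError
    last_good -- an older reading to fall back on, or None if there is none
    Returns a (value, Mode) pair; value is None when the system halts.
    """
    for sensor in sensors:
        try:
            # A working replica masks the fault: no loss of functionality.
            return sensor(), Mode.FULL
        except OSError:
            continue  # this replica is faulty, try the next one
    if last_good is not None:
        # No replica works: keep operating on a stale value (reduced quality).
        return last_good, Mode.DEGRADED
    # Nothing usable: halt, keeping the system's integrity intact.
    return None, Mode.FAIL_SAFE


def broken():
    """Assumed stand-in for a sensor that has failed."""
    raise OSError("sensor fault")


def healthy():
    """Assumed stand-in for a working sensor."""
    return 21.5


if __name__ == "__main__":
    for sensors, fallback in [([broken, healthy], None),
                              ([broken, broken], 20.0),
                              ([broken, broken], None)]:
        value, mode = read_with_tolerance(sensors, last_good=fallback)
        print(mode.value, value)
```

In a real system the fail-safe branch would also drive the actuators to a safe state; here it is only signalled by returning `None`.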
Two discussed techniques: