上QQ阅读APP看书,第一时间看更新
Resiliency
Resiliency is the ability of a system to adapt and keep working in response to changes and challenges. Resiliency can only be discussed in the context of the types of events that a system is resilient towards. A system might be resilient to a few computers getting turned off but may not be resilient to nuclear war.
Resiliency can be broken down into different sub-categories:
- Fault tolerance: The ability of the system to deal with invalid states, bad data, and other problems
- Failure isolation: A problem in one part of the system does not infect other parts of the system. Bad data or system failure in one place does not result in problems elsewhere
- Scalability: A scalable system under heavy use is able to provide additional capacity and is thus resilient to load
- Complexity management: A system that has ways of managing complexity helps it be resilient against human errors
We will now discuss fault tolerance in more detail.