Throttling pattern
Some applications have stringent SLA requirements for performance and scalability, regardless of how many users are consuming the service. In such circumstances, it is important to implement the throttling pattern, because it limits the number of requests that are allowed to execute. The load on an application cannot be predicted accurately in every circumstance. When the load on an application spikes, throttling reduces the pressure on servers and services by controlling resource consumption.
Use this pattern when meeting SLAs is a priority for an application, to prevent some users from consuming more resources than they are allocated, to absorb spikes and bursts in demand, and to optimize the cost of resource consumption. These are valid scenarios for applications built to be deployed on the cloud.
Several strategies can be used to throttle an application. One strategy rejects new requests once a threshold is crossed. Another lets the user know that the request has been queued and will get an opportunity to execute once the number of in-flight requests drops.
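The two strategies can be sketched in a few lines of Python. This is a minimal, single-process illustration, assuming that the threshold is a cap on concurrently executing requests; the names Throttler, Strategy, submit, complete, max_in_flight, and max_queue are all hypothetical and do not come from any particular library. A production throttler would typically also need thread safety and shared state across instances.

```python
from collections import deque
from enum import Enum
from typing import Optional


class Strategy(Enum):
    REJECT = "reject"  # refuse new requests once the threshold is crossed
    QUEUE = "queue"    # hold new requests until capacity frees up


class Throttler:
    """Hypothetical throttler that caps the number of in-flight requests."""

    def __init__(self, max_in_flight: int, strategy: Strategy, max_queue: int = 100):
        self.max_in_flight = max_in_flight  # assumed threshold: concurrent requests
        self.strategy = strategy
        self.max_queue = max_queue          # the queue itself is bounded too
        self.in_flight = set()
        self.waiting = deque()

    def submit(self, request_id: str) -> str:
        # Below the threshold: execute immediately.
        if len(self.in_flight) < self.max_in_flight:
            self.in_flight.add(request_id)
            return "accepted"
        # Over the threshold with the queueing strategy: tell the caller
        # the request is waiting and where it sits in line.
        if self.strategy is Strategy.QUEUE and len(self.waiting) < self.max_queue:
            self.waiting.append(request_id)
            return f"queued (position {len(self.waiting)})"
        # Rejecting strategy, or the queue is full.
        return "rejected"

    def complete(self, request_id: str) -> Optional[str]:
        """Mark a request finished and promote the oldest queued request, if any."""
        self.in_flight.discard(request_id)
        if self.waiting and len(self.in_flight) < self.max_in_flight:
            next_id = self.waiting.popleft()
            self.in_flight.add(next_id)
            return next_id
        return None
```

With Throttler(max_in_flight=2, strategy=Strategy.QUEUE), submitting "a", "b", and "c" returns "accepted", "accepted", and "queued (position 1)"; calling complete("a") then promotes "c". With Strategy.REJECT, the third request is simply rejected, which is cheaper but pushes retry logic onto the caller.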
The following diagram illustrates throttling in a multi-tenant system where each tenant is allocated a fixed resource usage limit. Once a tenant crosses its limit, any additional demand for resources from that tenant is constrained, which keeps enough resources available for the other tenants.
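The per-tenant limit can be sketched as a fixed-window quota. This is an illustrative sketch, assuming each tenant's limit is expressed as a number of resource units per time window and that over-quota demand is rejected rather than queued; the class TenantQuota, its try_consume method, and the tenant names are hypothetical. Fixed-window counting is only one option; a token bucket or sliding window smooths bursts at window boundaries.

```python
import time
from typing import Callable, Dict, Tuple


class TenantQuota:
    """Hypothetical per-tenant throttle: each tenant may consume a fixed
    number of resource units per time window (fixed-window counting)."""

    def __init__(self, limits: Dict[str, int], window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits                # tenant id -> units allowed per window
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so the window can be tested
        # tenant id -> (start of current window, units used in that window)
        self.usage: Dict[str, Tuple[float, int]] = {}

    def try_consume(self, tenant: str, units: int = 1) -> bool:
        limit = self.limits.get(tenant, 0)  # assumption: unknown tenants get nothing
        now = self.clock()
        start, used = self.usage.get(tenant, (now, 0))
        if now - start >= self.window_seconds:
            start, used = now, 0            # new window: reset this tenant's counter
        if used + units > limit:
            self.usage[tenant] = (start, used)
            return False                    # over quota: only this tenant is constrained
        self.usage[tenant] = (start, used + units)
        return True
```

Because each tenant's counter is tracked separately, a tenant that exhausts its allowance receives False for the rest of the window, while calls for other tenants continue to succeed up to their own limits.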