Adaptive concurrency limits are critical for maximizing the performance and reliability of microservices. Learn how to implement them using FluxNinja Aperture.
Highly available and reliable Services are a hallmark of any thriving business in today’s digital economy. As a Service owner, you need to ensure that your Services stay within their SLAs. But when bugs make it into production or user traffic surges unexpectedly, services can slow down under a large volume of requests and fail. If not addressed in time, such failures tend to cascade across your infrastructure, sometimes resulting in a complete outage.
At FluxNinja, we believe that adaptive concurrency limits are the most effective way to ensure services are protected and continue to perform within SLAs.
Concurrency is the number of requests a service can handle at any given time. It is calculated using Little’s Law, which states that in the long-term, steady state of a production system, the average number of items L in the system is the product of the average arrival rate λ and the average time W that an item spends in the system, that is, L = λW. Any excess requests beyond L cannot be served immediately and must be queued or rejected, which can lead to a significant build-up of queues and slow down service response times. However, queues do not build up as long as services stay within their concurrency limits.
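To make Little’s Law concrete, here is a small worked example; the traffic numbers are hypothetical:

```python
# Hypothetical numbers: a service receiving 200 requests/second
# with an average response time of 50 ms.
arrival_rate = 200.0   # λ, requests per second
avg_latency = 0.050    # W, seconds per request

# Little's Law: average concurrency L = λ * W
concurrency = arrival_rate * avg_latency
print(concurrency)  # an average of 10 requests in flight
```

If the service can only sustain, say, 8 requests in flight within its latency SLA, the excess 2 will start queuing, and response times degrade from there.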
Concurrency limits are hard to estimate, especially in fast-moving environments with a large number of interdependent microservices.
This is why dynamically setting concurrency limits (Adaptive Concurrency Limits) based on overall service health is the best way to protect a service and stay within SLAs.
At first glance, both concurrency limits and rate limits seem to do the same job. But they serve very different purposes.
Rate limits are a preventive technique: they prevent misuse of a Service by a particular user, making sure the Service remains available for other users. But this technique does not help if there is a surge in overall traffic that is not attributed to any specific user.
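To make the contrast concrete, here is a minimal per-user token-bucket rate limiter sketch. This is not Aperture's implementation; the class and parameter names are invented for illustration:

```python
import time
from collections import defaultdict

class PerUserRateLimiter:
    """Token-bucket rate limiter keyed by user ID (illustrative sketch)."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = defaultdict(lambda: burst)   # tokens available per user
        self.last = defaultdict(time.monotonic)    # last refill time per user

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        # Refill tokens this user has earned since their last request.
        elapsed = now - self.last[user_id]
        self.last[user_id] = now
        self.tokens[user_id] = min(self.burst,
                                   self.tokens[user_id] + elapsed * self.rate)
        if self.tokens[user_id] >= 1.0:
            self.tokens[user_id] -= 1.0
            return True
        return False
```

Note that the budget is tracked per user: one abusive client gets throttled while others proceed. Nothing here reacts to the service's overall health, which is exactly the gap adaptive concurrency limits fill.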
On the other hand, Adaptive Concurrency Limits are a protective reliability technique. Using Adaptive Concurrency Limits, it is possible to detect when the number of requests to a service exceeds its concurrency limit and have reliability interventions kick in.
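The general idea can be sketched with an AIMD-style (additive-increase, multiplicative-decrease) limiter that adjusts the concurrency limit based on observed latency. This is a simplified illustration of the technique, not Aperture's actual algorithm; all names and constants are invented:

```python
class AdaptiveConcurrencyLimit:
    """AIMD-style concurrency limit driven by observed latency (sketch)."""

    def __init__(self, initial_limit: int = 20,
                 min_limit: int = 1, max_limit: int = 1000):
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.in_flight = 0

    def try_acquire(self) -> bool:
        # Reject (or queue) requests beyond the current limit.
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def on_response(self, latency: float, latency_slo: float) -> None:
        self.in_flight -= 1
        if latency <= latency_slo:
            # Healthy: probe for more capacity (additive increase).
            self.limit = min(self.max_limit, self.limit + 1)
        else:
            # Overloaded: back off quickly (multiplicative decrease).
            self.limit = max(self.min_limit, int(self.limit * 0.9))
```

Because the limit follows measured latency rather than a static guess, it tracks the service's real capacity as deployments, traffic mix, and downstream dependencies change.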
Aperture is an open-source flow control and reliability platform that can help you set Adaptive Concurrency Limits for your services. At the heart of Aperture is a Control System Loop, manifested by:
To showcase how Adaptive Concurrency Limits can be set in practice, let’s take a deep dive into a demo setup of the Aperture agent and controller.
Aperture comes with a playground, pre-configured with a traffic generator, a sample application, and an instance of Grafana that you can use to see various signals generated by a policy.
The screenshot above shows a demo application with three services and a traffic generator named wavepool-generator.
The demo application is an example of a microservices topology, where requests flow from service1 to service2 to service3. Each service adds a delay with a jitter to simulate processing.
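The delay-plus-jitter behavior of each demo service can be sketched like this; the delay values are illustrative, not the playground's actual configuration:

```python
import random
import time

def handle_request(base_delay: float = 0.05, jitter: float = 0.02) -> float:
    """Simulate a service's processing time: a fixed base delay plus
    a random jitter, as the demo services do conceptually."""
    delay = base_delay + random.uniform(0.0, jitter)
    time.sleep(delay)
    return delay
```

The jitter keeps response times realistically noisy, so the latency signals observed by the policy resemble those of a real production service.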