Intro
The idea of this post is to present a pattern that helps improve the performance of a complex information system with very high CPU consumption and processing times. This pattern allows you to:
- Maximize system capacity, reducing exposure to denial of service (DoS)
- Reduce the number of long running requests
- Keep the system more stable
Normal circuit breaker use case
In the following diagram, the system is configured with a circuit breaker implemented as a WebLogic WorkManager that allows 20 concurrent threads serving requests in parallel on each server. In addition, there is a Coherence data grid in place that caches the information while the data remains unchanged.
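To make the throttle concrete, here is a minimal Java sketch that models the WorkManager’s 20-thread cap with a semaphore. This is only an illustration of the idea, not the WebLogic API (in WebLogic itself the limit would be declared as a max-threads-constraint on the WorkManager), and all class and method names here are made up:

```java
import java.util.concurrent.Semaphore;

// Simplified model of a WorkManager max-threads constraint of 20 per server.
public class ThrottledHandler {
    private final Semaphore permits = new Semaphore(20);

    public String handle(String operation) {
        if (!permits.tryAcquire()) {
            // Over capacity: fail fast instead of queueing unbounded work.
            return "503 Service Unavailable";
        }
        try {
            return process(operation); // cache lookup or database call
        } finally {
            permits.release();
        }
    }

    // Placeholder for the real processing (Coherence read or database query).
    private String process(String operation) {
        return "result for " + operation;
    }
}
```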
Let’s suppose that at 12:00:00 we have an instantaneous load like this:
- 8 users requesting the operation /ss/cfa/1230 from Madrid
- 9 users requesting the operation /ss/cfa/1230 from Barcelona
- 12 users requesting the operation /ss/cfa/1234 from San Francisco
Let’s also suppose that the load balancer sends the requests coming from SFO to server1 and the requests coming from MAD & BCN to server3, therefore:
- server1 = 12 concurrent requests < 20 maximum allowed by the WorkManager
- server3 = 8 + 9 = 17 concurrent requests < 20 maximum allowed by the WorkManager
In addition, let’s suppose the information for request /ss/cfa/1230 has just been invalidated in the cache because some data was changed at 11:59:59.
Let’s assume the following processing times:
- Database responds with a linear function of 4000 ms x number of concurrent requests (*)
- Coherence cache performs constantly for these ranges of load = 300 ms
- Application responds with a linear function of 200 ms x number of concurrent requests (*)
- Network time is constant = 500 ms
Response Time Results:
Formula: TOTAL = network + application + backend (database or grid cache)
- SFO = 500 + 12 x 200 + 300 = 500 + 2400 + 300 = 3.2 s (cached requests)
- MAD = 500 + 8 x 200 + 8 x 4000 = 500 + 1600 + 32000 = 34.1 s (uncached requests)
- BCN = 500 + 9 x 200 + 9 x 4000 = 500 + 1800 + 36000 = 38.3 s (uncached requests)
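If you want to play with the model, these figures can be reproduced with a few lines of Java; the constants come straight from the assumptions list, and the class is just a throwaway calculator:

```java
// Reproduces the response-time figures from the model above.
public class ResponseTimeModel {
    static final double NETWORK_MS = 500;  // constant network time
    static final double APP_MS = 200;      // application time per concurrent request
    static final double DB_MS = 4000;      // database time per concurrent request
    static final double CACHE_MS = 300;    // constant Coherence read time

    static double totalSeconds(int concurrent, boolean cached) {
        double backend = cached ? CACHE_MS : concurrent * DB_MS;
        return (NETWORK_MS + concurrent * APP_MS + backend) / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println("SFO = " + totalSeconds(12, true));  // 3.2 s
        System.out.println("MAD = " + totalSeconds(8, false));  // 34.1 s
        System.out.println("BCN = " + totalSeconds(9, false));  // 38.3 s
    }
}
```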
As we can see, the effect of uncached information is response times roughly 10 times longer than those of cached requests, simply because this is a heavy-processing system.
The smart circuit breaker approach
Let’s suppose we can manage requests on each server in such a way that the same request is processed only once, while the rest of the identical concurrent requests are kept waiting until the information from the first request is retrieved.
In this situation server3 will process only one of the 17 uncached concurrent requests that arrived at 12:00:00; the remaining 16 will be kept sleeping until the unique request finishes and stores the result in the cache. See the following diagram.
So the calculation for the unique request is the result of the formula when there is only 1 request in place and it is not cached:
server3.processing.time.for.the.unique.request = 500 + 1 x 200 + 1 x 4000 = 4.7 s
When the unique request finishes, the remaining 16 can be processed, but now they can get the result from the cache, so the processing time is the waiting time plus the time a cached request takes:
RT = wait.time + cached.data.processing.time
server3.processing.time.for.the.rest.16.requests = 4.7 s + (500 + 16 x 200 + 300) = 8.7 s
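A minimal sketch of this “process once, park the duplicates” idea in plain Java, coalescing identical requests on a shared future; the database and cache calls are placeholders standing in for the real backend and Coherence, and callers are assumed to have already missed the cache:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Coalesces identical concurrent requests so the backend is hit only once.
public class SmartCircuitBreaker {

    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();

    public String get(String key) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Duplicate request: sleep until the unique in-flight request finishes.
            return existing.join();
        }
        try {
            String result = loadFromDatabase(key); // the single expensive call
            storeInCache(key, result);             // e.g. put the result into Coherence
            mine.complete(result);
            return result;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // wake the waiters with the failure too
            throw e;
        } finally {
            inFlight.remove(key); // later requests will find the cache instead
        }
    }

    // Placeholders for the real database/Coherence integration.
    private String loadFromDatabase(String key) { return "data for " + key; }
    private void storeInCache(String key, String value) { /* cache.put(key, value); */ }
}
```

The duplicates block on join() while the unique request runs, and anything arriving after it completes finds the result in the cache, which matches the 4.7 s + 4.0 s arithmetic above.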
Conclusions
- The circuit breaker improves response times, reducing them from a range of [34.1-38.3] seconds to a narrower range of [4.7-8.7] seconds, roughly 5 times better!
- Because we process only one unique request and the identical ones are kept waiting, the WorkManager has 19 free threads to process other, different requests; therefore the capacity of the system increases from (20 - 17 = 3) to (20 - 1 = 19) free threads, which is more than six times more.
- With the normal circuit breaker approach, subsequent requests to /ss/cfa/1230 arriving before second 34.1 (only 3 more, because we already have 17 running and the WorkManager limit is 20) will be processed uncached; these extra requests increase the total processing time because of the linear formula mentioned above.
- With the smart circuit breaker, new requests arriving before second 4.7 are put waiting, so the time to finish the request remains constant. In addition, the number of free threads is 19, allowing the system to process more cached or uncached requests other than /ss/cfa/1230. Several of those new requests can potentially be asking for the same information; in that case the smart circuit pattern will kick in as well.
Assumptions & comments
Normally, systems can process several requests in parallel with constant response time while there are enough processors to dedicate one to each request.
But when the number of requests is much greater than the number of processors, response time increases because the scheduler starts to spread compute capacity across all the pending requests. This increase is proportional to:
RT = (concurrent requests - number of processors) / number of processors + overhead
Overhead comes from pool schedulers, timers, context switches, memory management (paging, caches, …), and all the machinery involved in multi-process/multi-thread computation. If the overhead is low compared with the processing time and the number of requests is much higher than the number of CPUs, the response time follows a linear formula:
RT = (concurrent requests / number of processors) x single-request time
For instance, if we have 9 concurrent requests, 1 CPU, and the response time for one request is 4 s, the response time for each request is approximately:
RT ~= (9/1) x 4 = 9 x 4 = 36 s
In general, when systems start to saturate, overheads and internal processing become significant, so the system is no longer working in the optimal constant/linear part of the response curve.
Enjoy 😉