In my previous post, bulkheads, I mentioned that “Since the popularity of service oriented architectures and then microservices, people are talking about bulkheads and some other terms like circuit breakers, timeouts etc. However, I see that many lacks the understanding of what these terms really mean or how to implement them. I decided to cover some of them and wanted to start with Bulkheads.”. In this post I will briefly cover up Circuit breakers.
Not many years ago, people used to plug too many electronical appliances into their circuit, each appliance drew certain amount of current and when the current is resisted, it created so much head in the walls of the house and then the house burns down. Later on there were some optimizations but still houses on fire.
Circuit breakers come to rescue.
This idea can also be applied to software applications with integration points. A software with one or more integrations are destined to fail at any given time. This is a given. Circuit breakers can help prevent operations that is already known not to be healthy.
Circuit breakers are a way to degrade functionality when the system is under stress. Changes in the circuit breakers’ state should always be logged and monitored. Circuit breakers are effective at guarding against integration points, cascading failures, slow responses etc.
In a normal closed state, circuit breakers executes operations as usual. Other services can be invoked, internal operations can proceed. However if any operations fail, circuit breakers store this information, that is including any times. Once the number of failures exceeds a certain threshold, circuit breaker trips and opens the circuit. Any call made to the circuit breaker, it fails immediately.
This is very important, because most of the failures occurs due to blocked threads, race conditions and dead locks. Hence, if a service is not responding or continuously timing out, what is the point of invoking it. All yours threads will be blocked and you will run out of them. Soon, your JVM or runtime will crash.
After a configurable amount of time, circuit breaker can go into half open state, in which calls can pass-through and if all goes well, circuit breaker closes. If things are still failing, circuit breakers opens again.
If circuit breaker is open, you can either let the user know something is not working and check back soon. Or you can have fallback services. The latter is much more better however services unfortunately can not have fallback due to their responsibility and function.
You can have multiple circuit breakers for different purposes such as timeouts, connections refused and other type of failures.
There are several tools that can help you implement circuit breakers in your system. Netflix has an open source project for this purpose called hystrix . You can check out it and see how things work.