Circuit breakers

In my previous post, Bulkheads, I mentioned that “Since the popularity of service oriented architectures and then microservices, people are talking about bulkheads and some other terms like circuit breakers, timeouts etc. However, I see that many lack an understanding of what these terms really mean or how to implement them. I decided to cover some of them and wanted to start with Bulkheads.” In this post I will briefly cover circuit breakers.

Not many years ago, people used to plug too many electrical appliances into a single circuit. Each appliance drew a certain amount of current, and when that current met resistance in the wiring, it generated heat. With enough heat in the walls, the house burned down. Later on there were some improvements, but houses still caught fire.

Circuit breakers came to the rescue.

This idea can also be applied to software with integration points. Software with one or more integrations is destined to fail at some point. This is a given. Circuit breakers can stop you from attempting operations that are already known to be unhealthy.

Circuit breakers are a way to degrade functionality when the system is under stress. Changes in a circuit breaker's state should always be logged and monitored. Circuit breakers are effective at guarding against integration point failures, cascading failures, slow responses etc.

In the normal closed state, a circuit breaker executes operations as usual. Other services can be invoked and internal operations can proceed. However, if an operation fails, the circuit breaker records the failure, including any timeouts. Once the number of failures exceeds a certain threshold, the circuit breaker trips and opens the circuit. While the circuit is open, any call made through the circuit breaker fails immediately.

This is very important, because most failures occur due to blocked threads, race conditions and deadlocks. If a service is not responding or is continuously timing out, what is the point of invoking it? All your threads will be blocked and you will run out of them. Soon, your JVM or runtime will crash.

After a configurable amount of time, the circuit breaker can go into a half-open state, in which calls are allowed to pass through. If all goes well, the circuit breaker closes. If things are still failing, it opens again.
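The closed, open and half-open states described above can be sketched as a small state machine. This is only a minimal illustration, not how Hystrix or any other library implements it. The class name, the parameter names (`failure_threshold`, `reset_timeout`) and the rule that a single successful trial call closes the circuit are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical sketch: names and defaults are illustrative assumptions.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, operation, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of tying up a thread on a sick dependency.
                raise CircuitOpenError("circuit is open; failing fast")
            # The timeout has elapsed: let a trial call through.
            self.state = "half-open"
        try:
            result = operation(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call reopens the circuit immediately.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self):
        # Assumption: one success is enough to close the circuit again.
        self.failures = 0
        self.state = "closed"
```

A caller would wrap each remote invocation, for example `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is whatever function talks to the other service. A real implementation would also log every state change and would need locking if the breaker is shared between threads.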

If the circuit breaker is open, you can either let the user know that something is not working and ask them to check back soon, or you can have fallback services. The latter is much better; unfortunately, some services cannot have a fallback because of their responsibility and function.
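The two options above can be sketched in a few lines. The helper name `call_with_fallback` and the default message are made up for this example. The `primary` callable stands for a call that raises when the circuit is open or the dependency fails.

```python
DEFAULT_MESSAGE = "This feature is temporarily unavailable. Please check back soon."


def call_with_fallback(primary, fallback=None, message=DEFAULT_MESSAGE):
    """Try the primary call; on failure use the fallback, else a user-facing message.

    Hypothetical helper for illustration only.
    """
    try:
        return primary()
    except Exception:
        if fallback is not None:
            # Option 2: degrade gracefully, e.g. serve cached or default data.
            return fallback()
        # Option 1: tell the user something is not working.
        return message
```

For example, a recommendations widget might pass a `fallback` that returns a cached list, while a payment call would have no sensible fallback and would surface the message instead.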

You can have multiple circuit breakers for different purposes, such as timeouts, refused connections and other types of failures.

There are several tools that can help you implement circuit breakers in your system. Netflix has an open source project for this purpose called Hystrix. You can check it out and see how things work.

 

Bulkheads

Since the rise in popularity of service oriented architectures and then microservices, people have been talking about bulkheads and some other terms like circuit breakers, timeouts etc. However, I see that many lack an understanding of what these terms really mean or how to implement them. I decided to cover some of them and wanted to start with bulkheads.

The term comes from ships. In a ship, bulkheads are partitions that can be sealed and closed during an emergency.

If one of the compartments starts taking on water, it can be sealed by closing the hatches. This prevents the water from moving from one compartment to another and sinking the ship.

The same technique can be employed in software and architecture. By partitioning your system you can avoid cascading failures. Bulkheads can be applied to both hardware and application services, so that if one piece of hardware or one application fails, the rest of the system continues functioning. Critical applications should be partitioned and bulkheads should be implemented.

Imagine you have an application A and an application B. Then there is a common service called Service C, which is critical for both apps. In a conventional architecture, both apps call the same shared instance of Service C.

The problem with this architecture is that if Service C goes down for any reason, both apps will be affected. The bulkhead pattern recommends partitioning instead:

Deploying Service C separately for each of the apps provides better stability for the apps. The partition can simply be independent hardware, a separate application host or a separate thread pool. You can also partition an application by deploying it to multiple virtual machines.
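The thread pool variant is the easiest one to show in code. The sketch below gives each calling app its own small pool for calls to the shared service, so one app exhausting its threads cannot starve the other. The `Bulkhead` class and the caller names are assumptions for the example; only `ThreadPoolExecutor` comes from the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor


class Bulkhead:
    """Gives each caller its own thread pool for calls to a shared dependency.

    Hypothetical sketch: the class and its parameters are illustrative.
    """

    def __init__(self, callers, workers_per_caller=2):
        # One isolated pool per caller, e.g. "app_a" and "app_b".
        self._pools = {
            name: ThreadPoolExecutor(
                max_workers=workers_per_caller, thread_name_prefix=name
            )
            for name in callers
        }

    def submit(self, caller, fn, *args, **kwargs):
        # Work is queued only on the caller's own pool.
        return self._pools[caller].submit(fn, *args, **kwargs)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)
```

If every call from `app_a` hangs, `app_a` runs out of workers, but tasks submitted for `app_b` still run because they never share a queue or a thread with `app_a`.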

Today many application servers provide the means to separate runtime environments for applications. You can deploy the same application under different contexts and assign a separate JVM or CLR to each.

Also, today we have Docker and several virtualization products, which make implementing bulkheads easy.

 

Databases for integration is a terrible idea

Using databases as integration end points is a terrible idea, yet we still see implementations of it.


For example, there is a CRM that the enterprise has been using for the last 10 years. Every integration between the CRM and other applications and services has been done through a materialized view or direct access to the database tables. Well, the day has come and the enterprise decides to change or upgrade the CRM. But this will affect all the integration points and there will be breaking changes. Also, it is not possible to do any audit trails, rate limiting or security checks for direct access to databases.

Moreover, there is no documentation for the views and table access; the only way to find out who uses what is for the DB admins to look at the users and ACLs.

In short, using databases for integration is a terrible idea. Instead, you should embrace services as the integration mechanism.
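To make the difference concrete, here is a minimal sketch of a service sitting in front of the CRM data. Everything in it is hypothetical: the `CustomerService` class, the client ids and the internal column name `cust_nm` are invented for the example, and a plain dictionary stands in for the CRM tables. The point is that access checks and an audit trail live in one place, and callers see a stable contract instead of the internal schema.

```python
class CustomerService:
    """Hypothetical service interface in front of CRM data."""

    def __init__(self, store, allowed_clients):
        self._store = store                  # stands in for the CRM's private tables
        self._allowed = set(allowed_clients)
        self.audit_log = []                  # who asked for what, and the outcome

    def get_customer(self, client_id, customer_id):
        if client_id not in self._allowed:
            self.audit_log.append((client_id, customer_id, "denied"))
            raise PermissionError(f"{client_id} may not read customers")
        self.audit_log.append((client_id, customer_id, "ok"))
        row = self._store[customer_id]
        # Expose contract field names, not internal column names, so the
        # CRM schema can change without breaking the callers.
        return {"id": customer_id, "name": row["cust_nm"]}
```

If the CRM is replaced, only the body of `get_customer` has to change; the integrations keep calling the same interface.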

Jeff Bezos, CEO of Amazon, sent the following mandate to the developers (as recounted by Steve Yegge):

1) All teams will henceforth expose their data and functionality through service interfaces.

2) Teams must communicate with each other through these interfaces.

3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter. Bezos doesn’t care.

5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

6) Anyone who doesn’t do this will be fired.

7) Thank you; have a nice day!

Robustness principle for microservices

The robustness principle by Jon Postel can be applied to microservices.

The principle says:

Be conservative in what you do, be liberal in what you accept from others (often reworded as “Be conservative in what you send, be liberal in what you accept”).

Applying this principle to microservices, we need to be conservative in how we expose our services and end points to the outside, but we can be liberal in what we implement within.

When different services expose their contracts with different interchange models, integration can get quite messy and challenging. Using a unified interchange model should help.

You should pay attention to integration between services, but you can be liberal within your services.
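One common way to read Postel's principle at a service boundary is a tolerant reader paired with a strict writer. The sketch below is just that reading, under assumptions: the contract fields `id` and `name` and both function names are invented for the example.

```python
import json

# Assumed contract for the example: the only fields this service promises.
CONTRACT_FIELDS = ("id", "name")


def parse_customer(payload):
    """Liberal in what we accept: ignore unknown fields, tolerate ids sent as strings."""
    data = json.loads(payload)
    return {"id": int(data["id"]), "name": str(data["name"]).strip()}


def render_customer(customer):
    """Conservative in what we send: emit only the agreed contract fields."""
    return json.dumps({field: customer[field] for field in CONTRACT_FIELDS}, sort_keys=True)
```

Internally the service can carry as many extra attributes on a customer as it likes; they simply never leak past `render_customer`.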