Circuit breakers

In my previous post, Bulkheads, I mentioned that "since the rise of service-oriented architectures and then microservices, people have been talking about bulkheads and other terms like circuit breakers, timeouts and so on. However, I see that many lack an understanding of what these terms really mean or how to implement them. I decided to cover some of them and wanted to start with bulkheads." In this post I will briefly cover circuit breakers.

Not so many years ago, people used to plug too many electrical appliances into a single circuit. Each appliance drew a certain amount of current, and when that current met resistance it created enough heat inside the walls to set the house on fire. Wiring improved over time, but houses still burned down.

Circuit breakers came to the rescue.

This idea can also be applied to software applications with integration points. Software with one or more integrations is destined to fail at some point; this is a given. Circuit breakers help prevent calls to operations that are already known to be unhealthy.

Circuit breakers are a way to degrade functionality when the system is under stress. Changes in a circuit breaker's state should always be logged and monitored. Circuit breakers are effective at guarding integration points against cascading failures, slow responses and so on.

In the normal closed state, a circuit breaker executes operations as usual: other services can be invoked and internal operations can proceed. If an operation fails, the circuit breaker records the failure, including timeouts. Once the number of failures exceeds a certain threshold, the circuit breaker trips and opens the circuit. From that point on, any call made through the circuit breaker fails immediately.

This is very important, because most failures occur due to blocked threads, race conditions and deadlocks. If a service is not responding or continuously timing out, what is the point of invoking it? All your threads will be blocked and you will run out of them; soon your JVM or runtime will crash.

After a configurable amount of time, the circuit breaker can go into a half-open state, in which calls are allowed to pass through; if all goes well, the circuit breaker closes. If things are still failing, it opens again.
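To make the closed/open/half-open cycle concrete, here is a minimal hand-rolled sketch in Java. It is not a production implementation; the threshold, cool-down period and the wrapped operation are hypothetical placeholders.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Minimal circuit breaker sketch: closed -> open after N failures,
// open -> half-open after a cool-down period, half-open -> closed on success.
public class SimpleCircuitBreaker {

    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // e.g. 5 consecutive failures
    private final Duration openTimeout;    // how long to stay open before probing
    private State state = State.CLOSED;
    private int failureCount = 0;
    private Instant openedAt;

    public SimpleCircuitBreaker(int failureThreshold, Duration openTimeout) {
        this.failureThreshold = failureThreshold;
        this.openTimeout = openTimeout;
    }

    public synchronized <T> T call(Supplier<T> operation) {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(openTimeout))) {
                state = State.HALF_OPEN;   // let one probe call through
            } else {
                throw new IllegalStateException("Circuit is open, failing fast");
            }
        }
        try {
            T result = operation.get();
            failureCount = 0;              // success: reset and close the circuit
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            failureCount++;
            if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
                state = State.OPEN;        // trip the breaker and remember when
                openedAt = Instant.now();
            }
            throw e;
        }
    }
}
```

A caller would wrap a remote invocation such as `breaker.call(() -> paymentClient.charge(order))` (a made-up client here) and treat the fast failure as the signal to degrade.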

When the circuit breaker is open, you can either let the user know that something is not working and ask them to check back soon, or you can fall back to an alternative service. The latter is much better, but some services simply cannot have a fallback because of their responsibility and function.

You can have multiple circuit breakers for different purposes, such as timeouts, connection refusals and other types of failures.

There are several tools that can help you implement circuit breakers in your system. Netflix has an open source project for this purpose called Hystrix. You can check it out and see how things work.
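To give a rough idea, a Hystrix-style command usually looks something like the sketch below. The `CatalogCommand`, its group key and the remote call are hypothetical; check the Hystrix documentation for the exact API and configuration options.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Hypothetical command wrapping a call to a remote catalog service.
public class CatalogCommand extends HystrixCommand<String> {

    private final String productId;

    public CatalogCommand(String productId) {
        super(HystrixCommandGroupKey.Factory.asKey("CatalogGroup"));
        this.productId = productId;
    }

    @Override
    protected String run() throws Exception {
        // The actual remote call; failures and timeouts are counted by Hystrix.
        return remoteCatalogLookup(productId);
    }

    @Override
    protected String getFallback() {
        // Returned when the circuit is open or the call fails.
        return "catalog-unavailable";
    }

    private String remoteCatalogLookup(String id) {
        // Placeholder for an HTTP or RPC call to the catalog service.
        return "catalog-entry-" + id;
    }
}
```

A caller executes it with something like `new CatalogCommand("42").execute()` and gets the fallback value instead of an exception once the circuit trips.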


Bulkheads

Since the rise of service-oriented architectures and then microservices, people have been talking about bulkheads and other terms like circuit breakers, timeouts and so on. However, I see that many lack an understanding of what these terms really mean or how to implement them. I decided to cover some of them and wanted to start with bulkheads.

The term comes from ships. In a ship, bulkheads are partitions that can be sealed and closed during an emergency. Something like the following:

If one of the compartments starts taking on water, it can be sealed off by closing the hatches, which prevents the water from moving from one compartment to another and sinking the ship.

The same technique can be employed in software and architecture. By partitioning your system you can avoid cascading failures. Bulkheads can be applied to physical and application services so that if one piece of hardware or one application fails, the rest of the system keeps functioning. Critical applications should be partitioned and bulkheads should be implemented.

Imagine you have an application A and an application B, plus a common Service C that is critical to both apps. In a conventional architecture, the design is as follows:

The problem with this architecture is that if Service C goes down for any reason, both apps will be affected. So the bulkheads pattern recommends the following:

Deploying a separate instance of Service C for each app provides better stability. The partition can simply be independent hardware, a separate application host or a dedicated thread pool. You can also partition at the process level by deploying the same application to multiple virtual machines.
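As a rough illustration of the thread-pool flavour of this idea, each consumer of Service C can get its own small, bounded pool, so one misbehaving consumer cannot exhaust all the threads. The pool sizes and class names below are made up for the example.

```java
import java.util.concurrent.*;

// Bulkheads at the thread-pool level: each caller of Service C gets its own
// small, bounded pool, so one slow or flooded path cannot starve the other.
public class ServiceCBulkheads {

    private final ExecutorService poolForAppA =
            new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(20), new ThreadPoolExecutor.AbortPolicy());

    private final ExecutorService poolForAppB =
            new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(20), new ThreadPoolExecutor.AbortPolicy());

    public Future<String> callForAppA(Callable<String> serviceCCall) {
        // If App A floods Service C, only this pool's queue fills up and rejects
        // work; App B's pool is unaffected.
        return poolForAppA.submit(serviceCCall);
    }

    public Future<String> callForAppB(Callable<String> serviceCCall) {
        return poolForAppB.submit(serviceCCall);
    }
}
```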

Today many application servers provide means to separate runtime environments for applications. You can deploy the same application under a different context and assign a separate JVM or CLR to it.

Today we also have Docker and various virtualization software, which make implementing bulkheads easy.


Blue-Green deployments – no more downtime!

Deploying new functionality or a new version of a piece of software always carries the risk of introducing bugs, downtime and chaos. I usually get shivers during deployments :) Some of the things that can go wrong:

  • Application failures
  • Capacity issues
  • Infra failures
  • Scaling issues

If you are practicing continuous delivery correctly, releasing to production is not just another deployment. You need to measure risks, think about what can go wrong, and coordinate and communicate while going to production.

There are several low-risk techniques for releasing software:

  • Blue green deployment
  • Canary releases
  • Dark launching
  • Production immune system
  • Feature toggles

Blue-green deployment is a release management technique that minimizes downtime, avoids outages and provides a way to roll back during deployment.

Initially you need to have at least two identical setups.

With two identical environments, one is called blue and the other is called green; that is why the technique is called blue/green deployment. Companies like Netflix call it red/black deployment, and I have also heard it called A/B deployment. Regardless of the name, the idea behind it is pretty straightforward.

You can have multiple identical setups in geographically distributed data centers, within the same data center, or in the cloud.

A load balancer or a proxy is a must for this process.

While deploying, you cut off the traffic from one setup and have it go to the other setup(s). The idle environment is called blue and the active one is called green. You deploy to the blue environment; once the deployment is done, you run sanity checks and tests, and once the environment is healthy, you switch the traffic onto it. If you see any problems with the blue setup, you can always roll back to the stable version.

The best part of this practice is that deployment happens seamlessly to your clients.

This technique can eliminate downtime due to application deployment. In addition, blue-green deployment reduces risk: if something unexpected happens with your new release on Green, you can immediately roll back to the last version by switching back to Blue.
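As a toy illustration of the switch itself, you can think of the router as holding a single pointer to the active environment: flipping it is the cut-over and flipping it back is the rollback. The class and backend URLs below are made up.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy router: the "switch" in a blue-green deployment is just repointing
// live traffic from one identical environment to the other.
public class BlueGreenRouter {

    private final String blueBackend = "http://blue.internal:8080";
    private final String greenBackend = "http://green.internal:8080";

    // Green is live initially; blue is idle and receives the new release.
    private final AtomicReference<String> active = new AtomicReference<>(greenBackend);

    public String routeRequest() {
        // Every incoming request is forwarded to whichever environment is active.
        return active.get();
    }

    public void cutOverToBlue() {
        // After deploying and smoke-testing blue, send the traffic there.
        active.set(blueBackend);
    }

    public void rollBackToGreen() {
        // If the new release misbehaves, flip straight back.
        active.set(greenBackend);
    }
}
```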

If you have multiple data centers:

Initially, both data centers are active and serving clients.

Take the traffic off Cluster 1 and route it to Cluster 2. Deploy to Cluster 1, check that the deployment is successful, and run the tests.

Take the traffic off Cluster 2 and route it to Cluster 1; your new version is now live. Then deploy to Cluster 2, check that it is successful, and test it.

Finally, route the traffic to both data centers again so both clusters serve clients.

Challenges:

Data store replication: If your application uses data stores across different regions or data centers, replication becomes crucial. Database and schema migrations need to be implemented, and the replication can be uni-directional or bi-directional depending on your needs.

Even if you are using a single data store that feeds both environments, database schema changes need some attention as well.

If your app uses a relational database, blue-green deployment can lead to discrepancies between your Green and Blue databases during an update. To maximize data integrity, configure a single database for backward and forward compatibility.

Service Contract changes: Yet another challenge is updating service contracts while keeping the applications up and running.

Cache warm-up: After a deployment, warming up caches can take some time.

Session management: Your session management strategy is crucial and can be problematic during deployments. Using dedicated session storage helps a lot when doing blue/green deployments.

Today we have Docker, Kubernetes and, of course, the cloud. All of these platforms support and embrace blue-green deployment.

Read more at Martin Fowler's site.


Databases for integration is a terrible idea

Even though databases as integration endpoints are a terrible idea, we still see such implementations.


For example: there is a CRM in the enterprise that has been used for the last 10 years. Every integration with the CRM has been done through a materialized view or direct access to the database tables. Then the day comes and the enterprise decides to change or upgrade the CRM. This affects every integration point and there will be breaking changes. Also, with direct database access it is not possible to do any audit trails, rate limiting or security checks.

Moreover, there is usually no documentation for the views and the table access, unless the DB admins dig through the users and ACLs.

In other words, using databases for integration is a terrible idea. Instead, you should embrace services as the integration mechanism.

Jeff Bezos, CEO of Amazon, sent the following mandate to his developers:

1) All teams will henceforth expose their data and functionality through service interfaces.

2) Teams must communicate with each other through these interfaces.

3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter. Bezos doesn’t care.

5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

6) Anyone who doesn’t do this will be fired.

7) Thank you; have a nice day!

Principles, Practices and Constraints

Principles are an abstract set of related ideas. Practices, on the other hand, are concrete actions and implementations that support principles.

For the same set of principles there may be different practices; for example, the .NET community and the Java community may follow different practices for the same principles.

And then we have constraints, which restrict our activities. While practices are not strict and can be bent, constraints cannot.

Our problem is that we focus so much on practices and constraints that we forget about the principles. Instead, we should always focus on the principles, use the necessary practices and embrace constraints, so that we can come up with better solutions, innovate and optimize.


Business problem of your software

Depending on the software architecture, the business logic, or business problem, lives somewhere specific within the application, if things are done right. This can be on the server side, on the client side, a bit of both, and sometimes within databases (stored procedures).

Image: overview of a three-tier application (from Wikipedia).

We have monolithic applications that usually try to keep the business logic within some services or controllers (if we are using MVC). We usually have domain entities and view models (DTOs) that we send back and forth between layers. The domain entities, or model, in this case don't contain any business logic; they are just data containers so that we can persist the data to data stores via some ORM.

Then there are client-server or smart-client desktop apps, which have part of the business logic on the client and some on the server side (services).

This approach is also called the anemic domain model. You can find several discussions on whether the anemic model is an anti-pattern or not. Even though it is a pretty old way of doing things, it is still applied in many software projects today because of its simplicity.
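For contrast, here is a tiny hypothetical example: the anemic version is a bare data container with the rule living in a service, while the richer version keeps the rule next to the data. All class names and the discount rule are made up.

```java
import java.math.BigDecimal;

// Anemic style: Order carries data only; the business rule lives elsewhere,
// typically in a service or controller.
class AnemicOrder {
    private BigDecimal total;
    public BigDecimal getTotal() { return total; }
    public void setTotal(BigDecimal total) { this.total = total; }
}

class DiscountService {
    void applyLoyaltyDiscount(AnemicOrder order) {
        order.setTotal(order.getTotal().multiply(new BigDecimal("0.90")));
    }
}

// Richer domain model: the same rule is expressed by the entity itself.
class Order {
    private BigDecimal total;
    Order(BigDecimal total) { this.total = total; }

    void applyLoyaltyDiscount() {
        // 10% off; the rule is part of the domain object, not scattered in services.
        this.total = this.total.multiply(new BigDecimal("0.90"));
    }

    BigDecimal total() { return total; }
}
```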

Today many applications lack the necessary documentation of where the business problem, or logic, resides. The worst are those where the business problem lives only in the heads of the developers. That being said, you should have a concrete plan and strategy for the documentation and architecture of your apps, so it is clear where your business logic lives.

Keep in mind that data storage and technology are not your business problem.

Your software sucks

Your software sucks if you have the following signs or symptoms:

Rigidity:

Rigidity is the resistance of software to change. If things are connected in such a complicated way that it is hard to make a change, you have rigidity.

For example, a small change or addition in a monolith can cause changes across several layers of the application. One smell is when one day of work ends up being a week of work.

Rigidity also causes fragility.

Fragility:

Your code is easy to break. This is a very common symptom. One example: if you use setters and getters in your class definitions and you consume your objects through them, imagine you have a property that is an integer and you decide to make it a decimal. If you need to visit several places in your code base just to get the code compiling again, that is a typical sign of bad design.
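A hypothetical illustration of that ripple effect: a single type change on a property forces edits in every consumer that reached in through the getter. The classes here are invented for the example.

```java
import java.math.BigDecimal;

class Invoice {
    // Was: private int amount;  -> changed to BigDecimal for fractional currencies.
    private BigDecimal amount;

    public BigDecimal getAmount() { return amount; }
    public void setAmount(BigDecimal amount) { this.amount = amount; }
}

class ReportPrinter {
    // Every consumer that did arithmetic on the raw int has to change too:
    // "int doubled = invoice.getAmount() * 2;" no longer compiles.
    BigDecimal doubled(Invoice invoice) {
        return invoice.getAmount().multiply(BigDecimal.valueOf(2));
    }
}
```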

How about MVC? MVC promises loose coupling, right? Do you use your models directly in your views? What happens when you make a change to your model? Do you need to go through a lot of views to make them work again?

Coupling causes fragility. You can easily spot or recognize the code that is fragile.

Immobility:

If your code is hard to reuse in the same project or in different projects, that is immobility. You are not using interfaces enough, your classes are too specialized to reuse, or they carry unnecessary dependencies.

Developers tend to write a lot of generic classes or methods for the sake of reuse; however, that can add complexity, and overly generic code is also hard to maintain.

Viscosity:

If your software is easy to hack but hard to fix properly, that is a sign of bad design. When your software requires a change, there is usually more than one way of implementing it. Sometimes developers preserve the design goals and principles, but sometimes they hack their way through, especially when maintaining the design goals is challenging. In a viscous design it is easy to do the wrong thing and hard to do the right thing.

Complexity:

Everything that is too complicated is destined to fail. We love complicated things and problems; we enjoy them. However, software should be as simple as possible. In my previous post about core software design principles I mentioned two types of complexity: accidental and inherent. Inherent complexity is unavoidable because it comes from the problem domain. Accidental complexity, where we make things complicated ourselves, is what we should refrain from.

If something is complicated, that is almost always bad. Look at the technologies people no longer use. In the Java world there was EJB (Enterprise JavaBeans). Almost 60 percent of the projects that implemented EJBs didn't work, and 30 percent of the remainder were so bad that they required enormous time for deployment and configuration, because it was all too complicated.

Duplication:

If you have duplicate code and bad structure in your solution or projects, that is a typical symptom of bad design. The DRY principle tells you not to repeat yourself. Copy-paste is bad: it duplicates not only the code but also the effort. In my opinion there is nothing wrong with a two-line method; it is much better than duplicated code, because if you have a bug, even a small one, it gets duplicated everywhere else too.
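A small, made-up example of what I mean by a two-line method beating copy-paste:

```java
// Duplicated validation copy-pasted into several handlers:
//   if (email == null || !email.contains("@")) throw new IllegalArgumentException("bad email");
// A bug fix (say, also rejecting blank addresses) would have to be repeated everywhere.

// A two-line method keeps the rule in one place:
final class Emails {
    private Emails() {}

    static void requireValid(String email) {
        if (email == null || email.isBlank() || !email.contains("@"))
            throw new IllegalArgumentException("bad email: " + email);
    }
}
```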

Opacity:

If your software is hard to understand, that is a symptom of bad design. This is related to complexity as well. Your code should be clear and easy to understand, not only by you but by other developers as well.


These are the usual signs, smells and symptoms of bad software design. The SOLID principles help you achieve better design regardless of the technology you use, and there are other software design principles you can refer to while developing software.


Pile of shit

Refactoring should certainly take place during every phase of a software project; personally, I wouldn't accept any excuses around it. You can refactor your code and derive reusable components and useful patterns from it, e.g. command-query, data access patterns and so on. Recently I chatted with the project managers of a very large project. We wanted to integrate our crash reporting system into their project, and they were a bit skeptical at first. When they confessed that they use exception handling for flow control, I asked them why they don't refactor; the response was not acceptable.


Yet another project I recently witnessed has 3000 lines of code in a single method. Probably only the person who wrote it can understand it. The compose method pattern can be used on such methods while refactoring, as sketched below.
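Roughly, compose method means breaking the monster method into a handful of intention-revealing steps at the same level of abstraction. A made-up before/after sketch:

```java
// Before: one method does parsing, validation, pricing and persistence inline,
// hundreds (or thousands) of lines long.

// After: the public method reads like a table of contents.
class OrderProcessor {

    void process(String rawRequest) {
        Order order = parse(rawRequest);
        validate(order);
        price(order);
        persist(order);
    }

    private Order parse(String rawRequest) { /* ... */ return new Order(); }
    private void validate(Order order) { /* ... */ }
    private void price(Order order) { /* ... */ }
    private void persist(Order order) { /* ... */ }

    static class Order { }
}
```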

In so many ways, these projects resemble a pile of shit. Yet they are destined to be rewritten, and my curiosity is: will it be any different? I am working on a post called “Software for a change”, which will be published soon. Please follow me for that one.

You don't have to be a very experienced developer to recognize the problems above; use your intuition. If something doesn't feel right, you are probably doing it wrong. If it feels too complicated, you are probably doing it wrong!

Log all the things

Today is logging day. I have published a couple more posts about logging; this is yet another one.

The Log: What every software engineer should know about real-time data’s unifying abstraction

“You can’t fully understand databases, NoSQL stores, key value stores, replication, paxos, hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them.”

“So, a log is not all that different from a file or a table. A file is an array of bytes, a table is an array of records, and a log is really just a kind of table or file where the records are sorted by time. ”

“The two problems a log solves—ordering changes and distributing data—are even more important in distributed data systems. Agreeing upon an ordering for updates (or agreeing to disagree and coping with the side-effects) are among the core design problems for these systems. ”

Great article. Read on.
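In that spirit, a log can be pictured as nothing more than an append-only sequence of time-ordered records that readers consume by offset. A tiny hypothetical sketch:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A log is just records appended in order; readers consume them by offset.
class AppendOnlyLog {

    record Entry(long offset, Instant timestamp, String payload) {}

    private final List<Entry> entries = new ArrayList<>();

    synchronized long append(String payload) {
        long offset = entries.size();
        entries.add(new Entry(offset, Instant.now(), payload));
        return offset;   // the offset is the record's position in time order
    }

    synchronized List<Entry> readFrom(long offset) {
        return List.copyOf(entries.subList((int) offset, entries.size()));
    }
}
```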