Couchbase Client SDK Internals

In the previous post I gave an introduction to Couchbase. In this post I summarize how the client SDKs work internally.

While developing applications that communicate with Couchbase, developers use client SDKs.

In order to communicate with Couchbase, clients create connections, send data over those connections, tear them down, and so forth.

Two things we need to pay attention to are thread safety and connection management. Whenever we work with systems that use TCP/IP, we should be careful about connections, because creating and tearing down connections are expensive operations. Thread safety matters as well: if the client SDK is not thread safe and is used carelessly in a multi-threaded environment or server, it will cause problems.

While most programming languages provide means for concurrency, threading and parallel computing, you will run into problems if the client SDK doesn’t support such functionality. That is why it is essential to know whether the client SDK you will be using in your project is thread safe. Re-using a single object/connection performs much better than repeatedly creating and cleaning up references. With that said, the Java and .NET clients are thread safe: if you create a single instance of the client, you can re-use it. However, if the connection fails, you will need to re-initialize it.

When it comes to connection management, fewer connections is better. You should aim for one connection (one client instance) per application.
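
Below is a minimal sketch of what that looks like with the Couchbase Java SDK, assuming the 3.x style API; the connection string, credentials and bucket name are placeholders. The point is that the client is created once and shared by all threads:

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;

// A single, shared client for the whole application (the SDK objects are thread safe).
public final class CouchbaseHolder {

    // Created once and re-used everywhere; connection details are placeholders.
    private static final Cluster CLUSTER =
            Cluster.connect("couchbase://127.0.0.1", "appUser", "appPassword");

    private static final Bucket BUCKET = CLUSTER.bucket("default");

    private CouchbaseHolder() {
    }

    // All threads share the same underlying connections.
    public static Collection collection() {
        return BUCKET.defaultCollection();
    }
}
```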

For key-based access the memcached binary protocol is used; views, on the other hand, are served over HTTP on a separate port.

When the client is initialized, we supply some basic information, such as where it can fetch the cluster map. The client uses HTTP to get the cluster’s status and configuration data. In return it receives the list of servers (all the servers and their statuses) and the vBucket map (the list of all the vBuckets in the cluster, where each vBucket entry also gives the positions of its replicas).

When the client performs an operation, the document key is hashed (consistent hashing). The result maps to one of the 1024 vBuckets a Couchbase cluster has, and that vBucket determines where the document is stored. Once the vBucket is located, the client simply sends the data to the corresponding server.
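
The snippet below is a simplified illustration of that mapping, not the SDK’s actual implementation (the real clients use a particular CRC32-based hash and the cluster-supplied vBucket map):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketSketch {

    private static final int NUM_VBUCKETS = 1024;

    // Hash the document key and map it onto one of the 1024 vBuckets.
    static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    public static void main(String[] args) {
        int vbucket = vbucketFor("user::1234");
        // The cluster map then tells the client which node owns this vBucket
        // (and where its replicas live), so the operation goes straight to that node.
        System.out.println("user::1234 -> vBucket " + vbucket);
    }
}
```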

Using long polling, the client continuously communicates with a node to retrieve the cluster map, so if anything in the cluster changes, the client becomes aware of it.

For client samples you can check out the Couchbase GitHub page.

Couchbase introduction

We are all aware of Relational Database Management Systems, which are based on the relational model developed by Codd (normalization, foreign keys, joins, etc.). RDBMSs provide features like transactions, locks, ACID properties and joins. Data is stored in tables as rows. However, RDBMSs have some limitations in terms of scalability and schema changes.

Sample_DB_EER

Scalability has been a hot topic for modern web applications since the internet bubble, hence the terms web scale and internet scale. Today, many developers are addicted to the word “scalability”. In developers’ eyes everything should be scalable, and every architecture or design should be web scale. This doesn’t apply to all projects, yet developers want to scale everything.

[Figure: horizontal vs. vertical scaling explained]

Scalability comes in two flavors: scaling out (horizontal) and scaling up (vertical). Scaling up usually means buying more expensive hardware with more CPU, RAM and storage. I have seen data stores with 8 TB of RAM, 128-core CPUs and lots of storage. However, there are limits to what a single machine can do. Scaling out is simply adding more servers/nodes to a cluster. The challenges with scaling out are how to distribute the data across the cluster evenly or logically, how to find it afterwards, how to handle transactions, and so on. Even though there are articles and books claiming that changing a schema in an RDBMS is difficult, today it is easier and can be done online without taking the whole database offline. Many NoSQL stores, on the other hand, are designed and built as schemaless. However, there is always an implicit schema: while accessing data from applications we need to know the data model, so in the application we still need a schema.

NoSQL data stores are no silver bullet, and many projects adopt them only because of the hype (Hype Driven Development). It is essential to consider the product being developed and then pick the correct data store for it.

In this article, a cluster is a collection of instances that are configured as one logical cluster.

Some of the major benefits of NoSQL stores are: being schemaless, which enables rapid application development; easy scaling; and high availability.

Moreover, you need to understand the CAP theorem if you are considering NoSQL stores.

Consistency: Generally speaking, in a distributed system the data across the cluster should be consistent: every component in the system should see the same data.

Availability: The system should be able to serve clients at all times, even if there are failures in some other parts of the system. In short, clients should be able to read, write and update all the time; every request made to the system should receive a response.

Partition Tolerance: In a distributed system, failures in data replication or in communication between nodes shouldn’t stop the system from satisfying user requests. The system should continue functioning even if some parts fail or network traffic is lost.

So the CAP theorem says that in a distributed environment you can have only two of the above three properties; you cannot have all of them. Depending on your product and requirements, you need to pick two.

cap-theorem-venn-diagram

Couchbase goes for AP (availability and partition tolerance) and provides eventual consistency.

couchbase_logo

Couchbase Server is a persistent, distributed, document-based data store that is part of the NoSQL movement. Even though we say Couchbase is document oriented, the difference between document-oriented and key-value stores is a little blurry, since we access objects by their key using a key pattern. Couchbase also provides views, which let developers access objects by other properties as well.

I wrote about Membase back in 2011; there was also CouchDB. The companies behind these two products merged to create Couchbase. Couchbase has the following set of features.

Scalability: Couchbase scales very well and in a very easy operational way. Data is distributed across the nodes of the cluster, so lookups, disk and I/O operations are shared between nodes. Distributing data and load across the cluster is achieved with vBuckets (logical partitions, or shards), and clients use consistent hashing to work with the data. Moreover, Couchbase scales linearly without compromising performance: performance doesn’t degrade as new nodes are added to the cluster.

Schemaless: You don’t need to define an explicit schema, and the data model can be changed easily. In the relational model we need a rigid schema, and yet the schema changes many times over the lifetime of a project. Back in the day this was a challenge for relational stores; today there are migrations for changing schemas without taking the data storage down, and they can be done online. However, care is needed while applying migrations to avoid data loss.

JSON document store: Data is stored in JSON format. JSON is a compact, key-value based data format that is used for many purposes.

Clustering with replication: All nodes within a cluster are identical. Couchbase provides easy, built-in clustering support as well as data replication with automatic failover. Clustering is certainly one of the features that enable high availability.

High availability: It is easy to add and remove nodes to and from a cluster without compromising availability. There is no single point of failure in a Couchbase cluster, since clients are aware of the entire cluster topology, including where every document is located.

Cache: Couchbase has built-in cache support. Documents are kept in RAM, and when the cache is full, documents are ejected.

Homogeneous: Within a Couchbase cluster all the nodes are identical and equal. There is no master node and no single point of failure. Data and load are distributed uniformly across the cluster.

One of the most distinguishing features of Couchbase is that it is very fast. This is mostly due to a solid caching layer inherited from memcached.

A Couchbase instance consists of a data manager and a cluster manager.

data_manager

The cluster manager is responsible for the nodes within the cluster, distribution of the data, handling failover, monitoring, statistics and logging. It generates the cluster map, which clients use to access the data and find out where it is located. Moreover, the cluster manager provides a management console.

The data manager deals with the storage engine, the cache and querying. Clients use the cluster map to find out where the data lives, then work with the data manager to retrieve, store or update it.

A bucket is a container that stores data, similar to a database in an RDBMS. There is no notion of tables in Couchbase; while data is stored as rows in an RDBMS, in Couchbase data is stored as JSON documents in buckets. There are two types of buckets in Couchbase.

In a bucket we have ejection, replication, warmup and rebalancing.

Memcached bucket: Data is stored in memory only; there is no persistence. If the node fails, reboots or shuts down, you lose all the data, since it lives in volatile memory. The maximum document size is 1 MB, and if the bucket runs out of memory, eviction occurs: the oldest data is discarded. Replication is not possible with a memcached bucket. Data is stored as blobs.

Couchbase bucket: Data is stored in memory and persisted to disk. Replication is also performed, which delivers high availability since data is spread across the cluster. Once a client sends data, it is stored in memory and placed on the disk-write queue as well as the replication queue. Clients can read the document from RAM while persistence is happening; this is called eventual persistence. Once the data is persisted to disk, it can survive a reboot or shutdown. The maximum size of a document is 20 MB for a Couchbase bucket.

A document has a unique ID that is used to access it later on, and the value of the document can be anything. A document also has some properties that are used internally: Rev is an internal revision ID used for versioning, Expiration can be set to expire documents, and CAS is used for handling concurrency.

One of the fundamentals developers need to pay attention to is key patterns. While developing an application with Couchbase, key patterns are what let you find documents later on.

While storing a document, you need a key to store it under and to access the same document later on; in that sense Couchbase works like a key-value store. However, you can also access documents without their keys, for example when you want to query by other properties of the documents. Couchbase key patterns exist for better management and development.
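
As an illustration, a common convention (not something Couchbase enforces) is to build keys from a type prefix and an identifier, such as user::1234. The sketch below stores and reads a document that way, assuming the Java SDK 3.x API and the hypothetical CouchbaseHolder from the earlier sketch:

```java
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;

public class KeyPatternExample {

    public static void main(String[] args) {
        // Hypothetical holder from the earlier connection-management sketch.
        Collection collection = CouchbaseHolder.collection();

        // Key pattern: <type>::<id>
        String key = "user::1234";

        JsonObject user = JsonObject.create()
                .put("type", "user")
                .put("name", "Jane Doe")
                .put("email", "jane@example.com");

        collection.upsert(key, user);                               // store by key
        JsonObject loaded = collection.get(key).contentAsObject();  // read back by the same key
        System.out.println(loaded.getString("name"));
    }
}
```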

Virtual buckets come into play when distributing data across the cluster. Every document should have a unique key, just like an RDBMS primary key. Each bucket has 1024 logical partitions called virtual buckets (vBuckets). Using consistent hashing, the cluster manager knows which node to send the data to and which node to retrieve it from.

Couchbase ensures that the most frequently used documents stay in memory, which increases performance. However, when memory cannot hold new incoming documents, old documents are ejected. There is a configurable threshold for ejecting documents.

After a reboot or after importing a backup, Couchbase loads data from disk into memory so it can serve requests. This process is called warmup.

Couchbase can keep up to 3 replicas of the data across the cluster, which enables high availability; the same document can live in 4 different places. During a failure the cluster map is rebuilt and requests are served from the replicas. Moreover, rebalancing takes place when nodes are added to or removed from the cluster; once replication is complete, the cluster map is rebuilt.

It is usually better practice to store everything in a single document. Remember that there are no joins in Couchbase, and you don’t have the normalization you have in an RDBMS. If you store all the data in a single document, you will have some redundancy, with the same data repeated across documents. However, there are also benefits: storing and retrieving documents becomes faster, and no joins are required since everything is in one place. If you split your data across multiple documents, you get something like normalization, which uses less disk space, but there may be a performance penalty since the documents will be spread across the cluster and you will need multiple read operations to fully load the object model. Knowing the pros and cons, you can decide which model to go with.

Atomicity is handled at the document level in Couchbase: a document is always stored as a whole object, never partially, which ensures atomicity. Moreover, on a write operation the client doesn’t wait for the document to be persisted to disk or replicated; the response is returned to the client immediately. However, with the client SDK you can wait until the document is persisted to disk and replicated; as you can imagine, the client then waits for those operations to complete. When reading a document that is not in memory, Couchbase fetches it from disk, stores it in memory and returns it to the client.

The update operation is important. CAS (compare and swap, or check and set) provides a CAS value to the client; while updating a document, the client sends the CAS value along, and if it matches the value on the server, the update goes through. If the CAS value differs from what the server has, an exception is thrown. In an RDBMS, to maintain consistency a client acquires a lock on a record so another client cannot update the same record, and during a bulk update you can even end up with table-level locking, which hurts application performance. In Couchbase, document-level locking is possible, and both optimistic and pessimistic concurrency control are supported.
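
Here is a hedged sketch of optimistic concurrency with CAS, again assuming the Java SDK 3.x API: read the document together with its CAS value, then send the CAS value back with the update; if another client changed the document in between, the SDK reports a CAS mismatch:

```java
import com.couchbase.client.core.error.CasMismatchException;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.GetResult;
import com.couchbase.client.java.kv.ReplaceOptions;

public class CasExample {

    static void renameUser(Collection collection, String key, String newName) {
        GetResult result = collection.get(key);                   // returns the content plus its CAS value
        JsonObject doc = result.contentAsObject().put("name", newName);
        try {
            collection.replace(key, doc,
                    ReplaceOptions.replaceOptions().cas(result.cas())); // succeeds only if the CAS still matches
        } catch (CasMismatchException e) {
            // Someone else updated the document in the meantime: re-read and retry, or give up.
        }
    }
}
```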

When writing a new document, the server only acknowledges to the client that it has received the document. To find out whether the document has actually been persisted to disk, the Observe facility can be used.
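
In the newer Java SDKs the same idea is expressed (as far as I know) by passing durability requirements with the write, so the call only returns once the document has been persisted and/or replicated; a sketch assuming the 3.x API:

```java
import java.time.Duration;

import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.PersistTo;
import com.couchbase.client.java.kv.ReplicateTo;
import com.couchbase.client.java.kv.UpsertOptions;

public class DurableWriteExample {

    static void durableUpsert(Collection collection) {
        JsonObject order = JsonObject.create().put("total", 42.0);

        // Wait until the document is persisted on the active node and replicated to one replica.
        collection.upsert("order::1001", order,
                UpsertOptions.upsertOptions()
                        .durability(PersistTo.ACTIVE, ReplicateTo.ONE)
                        .timeout(Duration.ofSeconds(5)));
    }
}
```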

The Flush operation can be used to remove all of the documents from a bucket. Use it with care.
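
In the Java SDK 3.x this is exposed through the bucket management API (flush has to be enabled on the bucket first); something along these lines:

```java
import com.couchbase.client.java.Cluster;

public class FlushExample {

    static void flushEverything(Cluster cluster) {
        // Removes all documents from the bucket; only works if flush is enabled on it.
        cluster.buckets().flushBucket("default");
    }
}
```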

flush

Moreover, there are atomic counters which can be used for counting operations or access patterns.
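
As a sketch with the Java SDK 3.x, counters live on the binary sub-API; the increment below creates the counter at 0 if it does not exist yet (the key name is made up):

```java
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.kv.CounterResult;
import com.couchbase.client.java.kv.IncrementOptions;

public class CounterExample {

    static long countPageView(Collection collection, String pageId) {
        CounterResult result = collection.binary().increment(
                "pageviews::" + pageId,
                IncrementOptions.incrementOptions().delta(1).initial(0)); // start at 0, then add 1
        return result.content(); // the current counter value
    }
}
```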

Couchbase supports both synchronous and asynchronous connections and operations.
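
For example, the 3.x Java SDK exposes the same operations in blocking and asynchronous flavors (a sketch, with error handling omitted):

```java
import java.util.concurrent.CompletableFuture;

import com.couchbase.client.java.Collection;
import com.couchbase.client.java.kv.GetResult;

public class AsyncExample {

    static void readBothWays(Collection collection, String key) {
        // Synchronous: blocks until the document is returned.
        GetResult sync = collection.get(key);
        System.out.println(sync.contentAsObject());

        // Asynchronous: returns immediately with a future.
        CompletableFuture<GetResult> async = collection.async().get(key);
        async.thenAccept(result -> System.out.println(result.contentAsObject()));
    }
}
```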


True REST

REST is about transferring representations of state, as the name Representational State Transfer implies. There should be hypermedia and self-describing messages; if those are missing, what you have is an HTTP service, not REST. There are a lot of wrong definitions and misunderstandings about REST. REST is an architectural style; it grew out of the design of HTTP and comes with a set of rules and constraints. Using HTTP methods like GET, POST, DELETE, etc. doesn’t by itself mean you are doing REST.

For example, some shopping sites use REST: when the client retrieves a product, hypermedia links come along with it, which means the client holds the state it needs to continue. The user can follow the hypermedia, which may contain further hypermedia.
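
As an illustration (the fields and URLs below are invented), a product representation with hypermedia might look like this, here embedded in a Java text block:

```java
public class HypermediaExample {

    // A hypothetical HAL-style product representation; the "_links" section is the hypermedia.
    static final String PRODUCT_REPRESENTATION = """
            {
              "id": "42",
              "name": "Espresso Machine",
              "price": 199.99,
              "_links": {
                "self":        { "href": "/products/42" },
                "reviews":     { "href": "/products/42/reviews" },
                "add-to-cart": { "href": "/carts/current/items", "method": "POST" }
              }
            }
            """;
}
```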

So in order to call a service RESTful, there has to be hypermedia and transferred state, hence the name Representational State Transfer.

Embracing Monolith

Traditional n-tier

traditional n-tier

This architecture doesn’t really scale. You can see the analysis of software architectures post for more information about it.
Then Eric Evans came up with Domain-Driven Design, and we got the following architecture.

domain_driven_design_n_tier

However, development and architecture should actually be simpler, because HTTP is a very simple protocol. We mostly use HTTP GET and POST: to create resources we use POST, and to retrieve resources we use GET.

web_app_characteristics

GET operations are safe and idempotent; GET can be called over and over, and GETs can be seen as queries. POST is unsafe and not idempotent; POSTs are commands.

One of the handicaps of the traditional layered architecture is that a change in the system has to propagate through all the layers. When there is a change in the UI layer, or a new feature is introduced, all the layers are affected: we have to make changes across all the layers, from presentation to persistence. This can become very tedious and hard to manage.

During development, a lot of merge conflicts can occur as well: if more than one person is changing the layers, there will be conflicts.

changes

In the traditional n-tier architecture the Interface Segregation Principle is usually violated by having too many methods within repositories or services.
A repository usually has methods that are queries and methods that are commands. The interfaces can be separated into queries and commands: CQRS to the rescue.

Features can be collapsed into slices. For example, for a User feature you can have a Query, QueryValidator and QueryHandler, and then a Command, CommandValidator and CommandHandler. All of these can be stacked within a single unit, so everything related to the feature can be found in one place, as in the sketch below.
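
A rough sketch of such a slice in Java (the names are made up; dedicated dispatcher libraries are often used for this, but plain classes are enough to show the idea):

```java
// One file per feature: the query, its validator and its handler live together.
public class GetUser {

    public record Query(String userId) { }

    public record Result(String userId, String name) { }

    public static class QueryValidator {
        boolean isValid(Query query) {
            return query.userId() != null && !query.userId().isBlank();
        }
    }

    public static class QueryHandler {
        Result handle(Query query) {
            // Talk to whatever storage this feature uses; hard-coded here for brevity.
            return new Result(query.userId(), "Jane Doe");
        }
    }
}
```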

Moreover, everything related to a feature can live in one place, such as its JavaScript, CSS, etc. Instead of spreading things all over the project, related artifacts can sit together.

We want to have the following architecture.

featurebased

Validation, authentication and authorization can be domain agnostic and reused in other projects. Every request in the system should be validated.
Features can use different technologies or different storage; for user management, for example, you could use a NoSQL store.

How does this architecture fit the SOLID principles?
SRP (Single Responsibility Principle): one class per feature or request; since every request has one class, it has one reason to change.
OCP (Open/Closed Principle): extend through cross-cutting concerns.
LSP (Liskov Substitution Principle): since there is no inheritance anymore, there is nothing to substitute.
ISP (Interface Segregation Principle): we separate queries from commands.
DIP (Dependency Inversion Principle): only true dependencies and external dependencies are wired in.

While still being a monolith, a feature-based architecture resolves some of the problems the traditional n-tiered architecture has.

We consider the agility of the traditional layered architecture to be low, because changes have to propagate across layers, so responding to change is slow. With a feature-based architecture, responding to change can be fast, since you typically make changes within a single class.

Development with this architecture is also easier, and a team can work on it efficiently without causing many conflicts.

Since feature sets are independent, features can be tested independently, which gives high testability.

Due to its monolithic nature, deployment can be as problematic as with the traditional layered architecture.

Performance and Scalability can be low with this approach as well.


Analysis of Software Architectures

The presentation can be found at Software design principles for evolving architectures.
Layered architecture is the most common architecture; it is also known as n-tiered architecture. For several years many enterprises and companies have employed this architecture in their projects, and it has almost become the de facto standard; therefore it is widely known by most architects, developers and designers.

Components within a layered architecture are organized into horizontal layers, and each layer performs a specific role in the application. Depending on your needs and the complexity of your software you can have N layers, but most applications work with 3-4 layers. Having too many layers is not good and leads to complexity, because you have to maintain all those layers. In a conventional layered architecture you find presentation, business (or service), and data access layers. The presentation layer deals with the look and feel of the application; UI-related work happens here. We usually have Data Transfer Objects carrying data into this layer, and the output is rendered to the client via view models. The business layer is responsible for executing the business rules for requests. The data access layer is responsible for database operations; every request to or from the database passes through this layer.

Layers don’t have to know what other layers do. For example, the business layer doesn’t need to know how the data layer queries the database; it just expects data (or none) when it invokes a method in the data layer. This is where we meet separation of concerns, which is a very powerful feature: each layer is responsible for its own purpose.

The key concern in layered architecture is managing dependencies. If you apply the Dependency Inversion Principle and use TDD (test-driven development), your architecture becomes more robust. You need to ensure that you have test cases for all possible use cases.

Sometimes there is redundancy: a request does no real business processing, yet it still calls the business layer, which simply invokes the data layer. Accepting this redundancy so that every request keeps passing through the layers is what layers of isolation means. If, for some features, you call the data layer directly from the presentation layer, then any change made in the data layer affects both the business layer and the presentation layer.

layered

An important concept in layered architecture is whether a layer is open or closed. If a layer is closed, every request has to go through that particular layer; if it is open, requests may bypass it and move on to the next layer.

Layers of isolation help reduce the complexity of the whole application. Some features and functionality don’t need to go through all the layers; this is where the open/closed layer approach helps simplify implementations.

Layered architecture is a solid, general-purpose architecture and a good starting point when you are not sure which architecture is most suitable for you. You need to watch out for the architecture sinkhole anti-pattern, which describes requests that pass through the layers with little or no processing. If around 20 percent of requests just pass through the layers and 80 percent do real processing, that is fine; if the ratio is the other way around, you are suffering from the sinkhole anti-pattern.

monolith

Moreover, layered architectures can become monoliths with a code base that is hard to maintain.
Layered Architecture Analysis:

Agility: Overall agility is the ability to respond to change in a constantly changing environment. Due to the monolithic nature of the pattern, it can become hard to push changes through all the layers; developers need to pay attention to dependencies and to the isolation of layers.

Ease of deployment: For larger applications deployment can become a problem; a small change might require the whole application to be deployed. If continuous delivery is done right, it might help.

Testability: With mocking and faking, each layer can be tested independently, which makes testing easy.

Performance: While layered applications can perform well, the fact that requests have to go through multiple layers may introduce performance problems.

Scalability: It is not very easy to scale layered applications, because of the tight coupling and monolithic nature of the pattern. However, if the layers are built as independent deployments, scalability can be achieved, though it might be expensive to do so.

Ease of development: This pattern is particularly easy to develop. Many companies adopt it, and most developers know it, understand it and learn to work with it easily.

Event Driven Architecture

Event-driven architecture is a popular distributed, asynchronous architecture pattern used to build scalable applications. The pattern is adaptive and can be applied to small or large-scale applications. Event-driven architecture can be implemented with the mediator topology or the broker topology, and it is essential to understand the difference in order to select the correct topology for the application.

The mediator topology is used when you need orchestration across multiple steps. For example, imagine a trading system where each request has to pass through certain steps such as validation, ordering, shipping and notifying the buyer. Some of these steps may need to be done sequentially and some can be done in parallel.

There are usually four main types of components in this architecture: event queues, the event mediator, event channels and event processors. The client creates an event and sends it to the event queue; the mediator receives the event and passes it to event channels; the event channels deliver the event to event processors, where the event is processed.

eventdriven

The event mediator doesn’t perform or know any business logic; it only orchestrates the events and knows the steps required for each event type. Business logic and processing happen in the event processors. Message queues or topics can be used as event channels to deliver events to the processors. Event processors are self-contained, independent and decoupled from the rest of the architecture; ideally, each event processor is responsible for only one event type.
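
A bare-bones sketch of the mediator topology in Java (in-memory and single-process, with no real queues; the event type and steps are invented for illustration):

```java
import java.util.List;
import java.util.Map;

public class MediatorSketch {

    interface EventProcessor {
        void process(String payload);
    }

    // The mediator knows the steps for each event type but contains no business logic itself.
    static class EventMediator {
        private final Map<String, List<EventProcessor>> stepsByEventType;

        EventMediator(Map<String, List<EventProcessor>> stepsByEventType) {
            this.stepsByEventType = stepsByEventType;
        }

        void dispatch(String eventType, String payload) {
            for (EventProcessor step : stepsByEventType.getOrDefault(eventType, List.of())) {
                step.process(payload); // in a real system each step would go over an event channel
            }
        }
    }

    public static void main(String[] args) {
        List<EventProcessor> orderSteps = List.of(
                payload -> System.out.println("validate " + payload),
                payload -> System.out.println("ship " + payload),
                payload -> System.out.println("notify buyer of " + payload));

        EventMediator mediator = new EventMediator(Map.of("order-placed", orderSteps));
        mediator.dispatch("order-placed", "order::1001");
    }
}
```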

Usually an enterprise service bus, a queue or a hub is used as the event mediator. Choosing the right technology and implementation reduces risk.

The broker topology, unlike the mediator topology, doesn’t use any central orchestration. A simple queue or hub sits between event processors, and each event processor knows the next event processor that should handle the event.

Eventdriven_broker

Event-driven architecture is a relatively complex pattern to implement because of its distributed and asynchronous nature. You can face problems such as network partitioning, failure of the mediator, reconnection logic and so on. Since this is a distributed, asynchronous pattern, you are in trouble if you need transactions: you will need a transaction coordinator, and transactions in distributed systems are very hard to manage. It is not easy to apply a standard unit-of-work pattern here.

Yet another challenging concept here is contracts. Architects claim that service contracts should be defined up front, and they are expensive to change.

Event Driven Architecture Analysis:

Agility: Since events and event processors are decoupled and can be maintained independently, the agility of this pattern is high. Changes can be made quickly and easily without affecting the overall system.

Ease of deployment: Since this architecture is decoupled, it is easy to deploy as well. Components can be deployed independently and registered with the mediator; the broker topology is even simpler.

Testability: While testing individual components is easy, testing the overall application can be challenging, so end-to-end testing is hard.

Performance: Event-driven architecture can perform very well since it is asynchronous; moreover, event channels and event processors can work in parallel since they are decoupled.

Scalability: Event-driven architecture can scale very well; since components are decoupled, they can scale independently.

Ease of development: It is not easy to develop this architecture. Contracts need to be defined well, and error handling and retry mechanisms must be implemented properly.

Microkernel architecture pattern

The microkernel architecture pattern is also known as the plugin architecture pattern. It is ideal for product-based applications and consists of two kinds of components: the core system and plug-in modules. The core system usually contains minimal business logic, but it ensures that the necessary plugins are loaded, unloaded and run. Many operating systems use this pattern, hence the name microkernel.

Plugins can be independent of each other and hence decoupled. The core system has a registry where plugins are registered, and through it the core system knows where to find them and how to run them.
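
A minimal sketch of the idea in Java (the names are invented; a real system would load plugins dynamically, for example from a plugin directory or via ServiceLoader):

```java
import java.util.HashMap;
import java.util.Map;

public class MicrokernelSketch {

    // The contract every plug-in module has to fulfill.
    interface Plugin {
        String name();
        void run(String input);
    }

    // The core system only knows how to register and run plugins.
    static class CoreSystem {
        private final Map<String, Plugin> registry = new HashMap<>();

        void register(Plugin plugin) {
            registry.put(plugin.name(), plugin);
        }

        void run(String pluginName, String input) {
            Plugin plugin = registry.get(pluginName);
            if (plugin != null) {
                plugin.run(input);
            }
        }
    }

    public static void main(String[] args) {
        CoreSystem core = new CoreSystem();
        core.register(new Plugin() {
            public String name() { return "spell-checker"; }
            public void run(String input) { System.out.println("checking: " + input); }
        });
        core.run("spell-checker", "hello wrld");
    }
}
```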

microkernel

Even though this pattern is very suitable for desktop applications, it can also be used in web applications. In fact, many different architectural patterns can appear as a plugin of the whole system. For product-based applications where new features and functionality will be added over time, the microkernel architecture is a great choice.

Microkernel Architecture Analysis:

Agility: Since plugins can be developed independently and registered to the core system, microkernel architecture has high agility.

Ease of deployment: Depending on how the core system is implemented, deployment can be done without restarting the whole system.

Testability: If plugins are developed independently, as they should be, they can be tested independently and in isolation. Plugins can also be mocked when testing the core system.

Performance: It depends on how many plugins you are running, but performance can be tuned.

Scalability: If the complete system is deployed as a single unit, it would be hard to scale this system.

Ease of development: It is not easy to develop this architecture. Implementing the core system and the registry can be difficult, and plugin contracts and data exchange models add to the problem.

Microservices Architecture Pattern

Even though microservices are fairly new, they have captured a lot of attention rather quickly as an alternative to monolithic applications and service-oriented architecture. One of the core concepts is separately deployable units, which enable high scalability and ease of deployment and delivery. The most important concept is the service component, which contains business logic and processes. Designing service components with the right granularity is essential and challenging. Service components are decoupled, distributed, independent of each other, and accessible through a well-known protocol.

Microservices developed in response to the problems of monolithic and service-oriented applications. Monolithic applications usually contain tightly coupled layers, which makes deployment and delivery difficult; for example, if the application breaks every time there is a change, that is a big problem arising from coupling. Microservices separate the application into multiple deployable units, which makes development, deployment and testing easier. While service-oriented architecture is very powerful and enables heterogeneous connectivity and loose coupling, it comes at a high cost: it is complex, expensive, difficult to understand and implement, and usually overkill for most applications. Microservices simplify this.

microservices

It is perfectly normal to have redundant code across service components; you are allowed to violate the DRY principle while developing microservices. With separately deployable units, deployment becomes much easier. Some of the challenges are the contracts between service components and the availability of service components.

Microservices Architecture Analysis:

Agility: Since service components are decoupled and can be developed independently, microservices architecture has high agility. Separately deployable units can respond to change rather quickly.

Ease of deployment: Microservices have an advantage over other architectural patterns because service components are separately deployable units.

Testability: Testing can be done independently for service components. Testability is high.

Performance: Depends on the service components and distributed nature of this particular pattern.

Scalability: Scalability is high because of separately deployable units.

Ease of development: Separate service components can be implemented independently.

Comparison


Ultimate Guide for Evolving Architectures

In my previous post I covered some of the fundamental software design principles. In this post you will find a guide for evolving architectures.

Let’s start by defining architecture. What is software architecture? You can find the Wikipedia definition of software architecture as follows: “Software architecture refers to the high level structures of a software system, the discipline of creating such structures, and the documentation of these structures. These structures are needed to reason about the software system.”

Is there a similarity between the architecture of buildings and the architecture of software? How about city planning?

[Figure: architecture and design illustration]

Actually, software architecture can be seen as the set of design decisions that are hard to change; it can also be defined as the shared understanding of the system among the people who are leading or working on the project.

When Kent Beck works on a project, he asks people to describe the system with four objects. If people describe it using similar objects, there is a shared understanding; if the descriptions differ, there is a misunderstanding that should be resolved.

In most enterprises there are employees with architecture titles such as enterprise architect, system architect, solution architect and so on. There are lots of titles that end with “architect”. It sounds really cool, too.

One of the anti-patterns today is that a person with the “architect” title walks in and creates the entire architecture at the beginning. The problem with this approach is that we are trying to create something we don’t yet have much information about, so the risk is high. That is why one of the principles of good software architecture is to avoid big up-front design.

timevsunder

As time goes by in a project, we understand the domain, the features and the problems we are trying to solve much better. In the beginning we don’t have much of an idea about them, which is why big up-front design is a bad idea. The architecture has to evolve; it shouldn’t be decided up front, and it shouldn’t be a set of static artifacts defined at the start of the project. Each evolution should have a reason, a value and an impact on the product, and should be deliverable.

This leads us to the principle of the last responsible moment, reversibility, and the YAGNI (you aren’t gonna need it, yet) principle.

The principle of the last responsible moment is about postponing decisions until you cannot postpone them any further. A typical violation is selecting frameworks, databases, plugins, etc. right at the beginning of the project; we could call this FDD (Framework Driven Development). Sometimes developers pick technologies to boost their resumes, which is called RDD (Resume Driven Development). Selecting technologies, frameworks or plugins can wait until you understand the problem domain and have a proper solution for it.

The YAGNI principle is very similar to the last responsible moment: if you don’t need a certain implementation right now, don’t work on it. Wait until you see value in it, then implement it.

Reversibility is critical because, as we said, software architecture is the set of design decisions that are hard to change. If, in a software project, decisions are reversible, we can claim it has a solid architecture. Ask yourself: how can I back out of a design decision I made early on? Decisions should be easy to change; if rolling back a design decision is too expensive, there is probably a problem with the architecture. Irreversibility is a core driver of complexity: if you can make a process reversible, you can simplify things.

So, instead of big up-front design, we should try to understand the problems we are trying to solve: define the features or functionality we need, evaluate their value and architectural impact by putting them in a table, and sort them by value and architectural impact. Start with the item that has the highest value and architectural impact. With this approach you can make well-balanced decisions early on.

design economy

Martin Fowler calls the graph above the design stamina hypothesis. We have a cumulative set of features and functionality. If every new feature gets harder and harder to add because of the existing code base, the design is not robust; if you can easily add new features, the design is good. This is why software architecture is important. With too much up-front design, your velocity of adding new features is not as high as with a balanced design; with no design at all, your velocity is also lower than with a balanced design. When the architecture of the system is not good, you can’t easily add new features and you end up losing customers, which brings us to the economics.

I want to mention one more time: minimize frameworks and libraries. Today in many companies there is a tendency towards creating frameworks, plugins and platforms. Concentrate on your business, solve your problems, and refactor towards reusable components.

Make it work, then make it better real soon. Let your architecture evolve around the product you are working on. Always keep in mind “the reason it all exists”. Keep it simple and stupid. Maintain the vision and keep focus. Also, nothing works as expected. 🙂


Software design principles you need to know!!

Good design is almost impossible to get right the first time; several iterations are required for a successful design. Software design should be proactive. Software is always rewritten, because software has to constantly evolve. If the cost of changing the software is minimal, that is a good design.

The KISS principle: keep it simple and stupid. We create complexity very quickly, and complexity makes code difficult to change. Simple keeps focus; simple makes it easier to understand the problem and solve it. There is a difference between simple and easy: easy is familiar, while simple is not necessarily familiar.

We deal with inherent and accidental complexity. Inherent complexity comes from the domain and is natural; accidental complexity is what developers create and should be avoided. Good design hides inherent complexity and eliminates accidental complexity.

YAGNI: you aren’t gonna need it. This principle is about postponing decisions or implementations. If you don’t need something right now, don’t do it; don’t do it until you see value in doing it. Postpone until you cannot postpone anymore.

The principle of cohesion. Cohesive code is narrowly focused: it takes one responsibility and does it well. We want software to change, but in an easy and inexpensive way; if one piece of code does several things, it has to change for many reasons. Cohesive code usually has a single responsibility, so cohesion is related to the Single Responsibility Principle: for high cohesion we need to be able to state the responsibility of a method clearly. Cohesion can be applied at the class and the method level. Long methods are a bad idea because they are hard to test, hard to debug, lead to duplication and are hard to reuse; long methods usually mean low cohesion and high coupling. We want high cohesion and low coupling, and removing or reducing coupling creates better design; coupling to interfaces instead of classes is better. Methods should define an abstraction at a single, high level, and many methods can be combined to define bigger abstractions; this is called the compose method pattern, or the single level of abstraction principle. Abstractions can be combined to describe the bigger picture. Methods should be cohesive and small, which makes them easy to understand, and combining small, cohesive methods helps create better code bases.
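
A small, made-up illustration of the compose method idea: the top-level method reads as a sequence of intention-revealing steps, each of them small and cohesive:

```java
import java.util.List;

public class OrderService {

    record Order(List<String> items) { }

    // The top-level method stays at a single level of abstraction.
    void placeOrder(Order order) {
        validate(order);
        applyDiscounts(order);
        persist(order);
    }

    private void validate(Order order) {
        if (order.items().isEmpty()) {
            throw new IllegalArgumentException("order has no items");
        }
    }

    private void applyDiscounts(Order order) {
        // each small method has one narrow responsibility
    }

    private void persist(Order order) {
        // save the order via the repository
    }
}
```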

DRY (don’t repeat yourself) tells us not to duplicate code and effort. As the Pragmatic Programmers put it, every piece of knowledge in a system must have a single, authoritative, unambiguous representation. We should not duplicate code, which leads to duplication of effort; continuous refactoring helps reduce code duplication.

The principle of least knowledge, a.k.a. the Law of Demeter, states that you shouldn’t depend on the inner details of other methods or classes. For example, if you are using a method to process credit cards, apart from the method signature you don’t need to know or care how the method is implemented.

  • Single responsibility principle.
  • Open/Closed principle.
  • Liskov Substitution principle.
  • Interface Segregation Principle.
  • Dependency Inversion Principle.

Single responsibility principle:
While developing software, the underlying components should each have a single responsibility. The benefit of this principle is loose coupling and high cohesion: a component should be specialized in doing only one thing.

Open/Closed principle:
According to the open/closed principle, software components/entities should be open for extension and closed for modification. The benefit of this principle is also loose coupling and high cohesion, and abstraction and polymorphism are the keys to achieving it. Imagine a method with many if statements in it: whenever you want to add something or change the logic, you need to open that method and modify it. When a change is required you have two options: modify the current code, or add a new method or class. Of course adding a small new piece of code is much easier and more flexible. In order to build extensible software, you need to know the domain and the business.
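
A small, made-up illustration: instead of a method full of if statements on the discount type, new behaviour is added by writing a new class against the same abstraction, leaving existing code untouched:

```java
public class OcpExample {

    // The abstraction that new behaviour extends; existing callers never change.
    interface DiscountPolicy {
        double apply(double price);
    }

    static class NoDiscount implements DiscountPolicy {
        public double apply(double price) { return price; }
    }

    static class BlackFridayDiscount implements DiscountPolicy {
        public double apply(double price) { return price * 0.8; }
    }

    // Closed for modification: adding a new discount means adding a new class, not editing this method.
    static double checkout(double price, DiscountPolicy policy) {
        return policy.apply(price);
    }

    public static void main(String[] args) {
        System.out.println(checkout(100.0, new BlackFridayDiscount())); // prints 80.0
    }
}
```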

Liskov Substitution principle:
While developing software it is highly recommended to program to an interface rather than to an implementation; this is also known as programming to a contract. In this context, programming to an interface covers both interfaces and abstract classes. Thanks to polymorphism, this principle says you should be able to substitute one implementation with another. Inheritance should be used only for substitutability.

Interface segregation principle:
While developing software we said that programming to an interface is good. However, it is not good to create one single large interface and implement that. Interface segregation suggests that interfaces should be highly cohesive and specialized, which obviously gives us high cohesion.

Dependency Inversion principle:
The dependency inversion principle tells us that we should build our software to depend on abstractions, not implementations, which gives us loose coupling. In this context, abstraction means programming to an interface, just to be clear.