jQuery Timeago plugin

In many applications we display the creation or modification date and time of a resource, and often we want this information in a more readable form that is easy to grasp, such as "Post created 10 minutes ago" or "Image uploaded 3 weeks ago".

This information can be generated on either the server or the client side.

For the client side, there is the timeago plugin, which is convenient and easy to use.

Here is a server-side time ago method written in C#.
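
As a minimal sketch of the server-side idea in Python (not the C# method referred to above), the function below turns a timestamp into a phrase such as "10 minutes ago". The name time_ago, the unit table and the "just now" wording are all assumptions of this sketch, and months and years are approximated as 30 and 365 days. Both datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone

# Unit sizes in seconds, largest first. Months and years are approximations
# (30 and 365 days); that simplification is an assumption of this sketch.
_UNITS = [
    ("year", 365 * 24 * 3600),
    ("month", 30 * 24 * 3600),
    ("week", 7 * 24 * 3600),
    ("day", 24 * 3600),
    ("hour", 3600),
    ("minute", 60),
]


def time_ago(then, now=None):
    """Describe how long ago `then` happened, relative to `now`."""
    now = now or datetime.now(timezone.utc)
    seconds = int((now - then).total_seconds())
    if seconds < 0:
        return "in the future"
    if seconds < 60:
        return "just now"
    # Pick the largest unit that fits at least once.
    for name, size in _UNITS:
        count = seconds // size
        if count >= 1:
            suffix = "" if count == 1 else "s"
            return f"{count} {name}{suffix} ago"
    return "just now"  # not reached: the minute unit always matches above
```

Passing `now` explicitly keeps the function easy to test; in a web application it would usually be left out so the current UTC time is used.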

RabbitMQ tutorial

RabbitMQ is yet another message queue. It is open source, robust, easy to use, reliable, portable and scalable, with high throughput and low latency. RabbitMQ is written in Erlang and is based on AMQP, the Advanced Message Queuing Protocol.

AMQP is an open standard that defines a protocol for systems to exchange messages. AMQP defines both the interaction between the parties of the system and the low-level representation of the messages being exchanged, i.e. the wire format for messages.

Before going into the details of RabbitMQ, let’s cover some basic terms, definitions and use cases. A message is an entity that is transferred between different components of an architecture or infrastructure. Messaging, or message queuing, is a style of communication between applications or components of one or more systems that enables a loosely coupled architecture: it is the sending and receiving of messages between different parts of the system. A message can have several formats, from plain text to serialized binary.

A messaging infrastructure provides several benefits as follows:

Interoperability: Your infrastructure can have different components in different technologies and languages. Having a messaging queue in the middle provides interoperability so independent components work seamlessly, unaware of other platforms.

Loose coupling: Loose coupling is one of the best practices in both programming and software design, and a messaging queue helps achieve it easily. You can break your software into components that have no dependencies on each other and can run independently.

Some of the benefits of the loosely coupled producer and consumer paradigm are as follows:

  • Producers or consumers can fail without impacting each other.
  • The performance of each side has no effect on the other.
  • The number of producer and consumer instances can grow and shrink to enable scalability.
  • Producers and consumers can be located in different locations.

Scalability: Loose coupling brings scalability along. If you design your application so that its components are loosely coupled, you can scale it easily.

Portability: You can port your messaging queue to any architecture.

Reliability: Most message queues provide reliable delivery, so that clients won’t lose any data. Message queues can work with many data stores, such as databases, file systems or caches. To avoid data loss, it is essential to persist the data delivered to the queue to persistent storage. There are several persistence schemes implemented in different message queues; an interesting one that ActiveMQ uses, if no persistence is set up, is temp files via a message dispatcher. We would like to make sure that when we send a message, the queue receives it; likewise, when we ask for a message, we want to get one and have it removed from the queue. These operations are done with acknowledgements, similar to TCP.

Support for protocols: Most message queues support multiple transport protocols, such as TCP, HTTP and STOMP.

Enter the producer-consumer problem.

The producer-consumer problem is a classic problem in computer science that deals with synchronization between processes. We have a producer, which produces and feeds data, and a consumer, which consumes it. The producer and the consumer share a common buffer to send and receive data respectively. The great benefit of this pattern is parallelization of work: while producers are producing data, consumers can consume it at the same time. Moreover, you can have multiple producers as well as multiple consumers. Incorrect implementations can cause problems such as deadlocks and race conditions.

The producer-consumer problem can use a queue as the buffer, and a blocking queue suits this purpose; synchronization still has to be handled carefully in order to avoid inconsistent states. With a blocking queue, producers send data to the queue while consumers request messages from it and process them for whatever purpose they need. To avoid overflow, the queue blocks producers from overloading it with too much data until more capacity is available; this is also called throttling. Likewise, if the queue is empty, consumers simply wait until there is a new message on the queue to be processed.
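
The blocking-queue behaviour described above can be sketched with Python's standard library, where queue.Queue with a maxsize blocks producers when it is full and blocks consumers when it is empty. The function names, the SENTINEL stop marker and the "double each item" processing step are placeholders chosen for this sketch.

```python
import queue
import threading

SENTINEL = None  # hypothetical marker that tells a consumer to stop


def producer(buffer, items):
    for item in items:
        # put() blocks while the queue is full, which throttles the producer.
        buffer.put(item)


def consumer(buffer, results, lock):
    while True:
        # get() blocks while the queue is empty.
        item = buffer.get()
        if item is SENTINEL:
            break
        with lock:
            results.append(item * 2)  # stand-in for real processing


def run(num_items=100, num_consumers=3, capacity=10):
    buffer = queue.Queue(maxsize=capacity)  # bounded buffer
    results = []
    lock = threading.Lock()
    consumers = [
        threading.Thread(target=consumer, args=(buffer, results, lock))
        for _ in range(num_consumers)
    ]
    for thread in consumers:
        thread.start()
    prod = threading.Thread(target=producer, args=(buffer, range(num_items)))
    prod.start()
    prod.join()
    # One sentinel per consumer so that every consumer thread exits.
    for _ in consumers:
        buffer.put(SENTINEL)
    for thread in consumers:
        thread.join()
    return sorted(results)
```

A small capacity makes the throttling visible: the producer can never be more than `capacity` items ahead of the consumers.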

According to AMQP, the following are the core concepts:

A broker is a middleware application that receives messages produced by publishers and delivers them to consumers or to another broker.

A virtual host provides multi-tenancy: it isolates clients from each other, which also provides security.

A channel is an abstraction layer that provides connectivity between the broker and a producer or consumer. Channels isolate units of communication so that they don’t interfere with each other. It is possible to have multiple channels within a single connection, and this improves performance, since creating and destroying TCP connections is expensive.

An exchange is the initial destination that messages flow to. Routing rules are applied in the exchange to determine their destinations. Routing rules can be direct (point to point), topic (publish-subscribe) or fanout (multicast).

A queue is the final destination of messages, where they wait, ready to be consumed.

A binding is the virtual connection between an exchange and a queue that enables messages to flow.
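
To make the relationship between exchanges, bindings and queues concrete, here is a toy in-memory model of the three routing rules. It is not RabbitMQ's API and does not talk to a broker; the Exchange class and its methods are invented for this sketch. The topic matcher follows the AMQP convention that `*` matches exactly one dot-separated word and `#` matches zero or more.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' absorbs zero words, or one word and then tries again.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


class Exchange:
    """Toy exchange: keeps bindings and decides which queues get a message."""

    def __init__(self, kind):
        if kind not in ("direct", "topic", "fanout"):
            raise ValueError(f"unknown exchange type: {kind}")
        self.kind = kind
        self.bindings = []  # list of (binding_key, queue_name)

    def bind(self, queue_name, binding_key=""):
        self.bindings.append((binding_key, queue_name))

    def route(self, routing_key):
        """Return the names of the queues that should receive the message."""
        targets = []
        for binding_key, queue_name in self.bindings:
            if self.kind == "fanout":
                hit = True  # fanout ignores the routing key
            elif self.kind == "direct":
                hit = binding_key == routing_key
            else:
                hit = topic_matches(binding_key, routing_key)
            if hit and queue_name not in targets:
                targets.append(queue_name)
        return targets
```

For example, a topic exchange with one queue bound to `*.error` and another bound to `#` sends a message with routing key `db.error` to both queues, and one with `db.info` only to the second.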

There can be different topologies for queuing.

Applications that use RabbitMQ need to establish a persistent connection to it. Once the connection is established, logical channels can be used for sending and receiving messages. Creating a connection is an expensive operation; therefore, once a connection is set up, multiple channels can be used over it to communicate with the broker.

Queues can have the following properties:

Durable: The queue stays declared even after a broker restart.

AutoDelete: The queue is deleted automatically once its last consumer unsubscribes.

Exclusive: The queue can be used only by the connection that declared it, and is deleted when that connection closes.

Message properties are as follows:

contentType: Messages travel between systems as byte arrays. The content type, such as JSON, can be sent along with the message.

contentEncoding: You can use a specific encoding when serializing messages into byte arrays.

messageId: The message ID is used to trace messages in a system. Message IDs should usually be unique; a UUID or GUID can be used.

deliveryMode: This parameter can be non-persistent (1) or persistent (2). A persistent message is written to disk, so that it can survive a broker restart, provided the queue it sits in is durable.

autoAck: Strictly speaking this is a consumer setting rather than a message property. When it is enabled, the broker treats a message as acknowledged as soon as it has been delivered and can remove it from the queue. When it is disabled, the consumer must acknowledge each message explicitly before the broker removes it.
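
The properties above map fairly directly onto the Python client pika (version 1.x), where they appear in snake_case. The following is only a sketch: the queue name task_queue, the host localhost and the helper names are assumptions, and the publish and consume functions need a running broker and the pika package, which is why the import sits inside them.

```python
import uuid


def message_properties(persistent=True):
    """Collect the per-message properties discussed above as keyword arguments."""
    return {
        "content_type": "application/json",
        "content_encoding": "utf-8",
        "message_id": str(uuid.uuid4()),
        "delivery_mode": 2 if persistent else 1,  # 2 = persistent, 1 = non-persistent
    }


def publish(body, queue_name="task_queue", host="localhost"):
    import pika  # imported lazily so message_properties() works without pika installed

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    # Queue properties: survives restarts, shared between connections, not auto-deleted.
    channel.queue_declare(queue=queue_name, durable=True, exclusive=False, auto_delete=False)
    channel.basic_publish(
        exchange="",  # default exchange routes by queue name
        routing_key=queue_name,
        body=body.encode("utf-8"),
        properties=pika.BasicProperties(**message_properties()),
    )
    connection.close()


def consume(queue_name="task_queue", host="localhost"):
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=queue_name, durable=True, exclusive=False, auto_delete=False)

    def on_message(ch, method, properties, body):
        print(properties.message_id, body.decode("utf-8"))
        # Manual acknowledgement: only now may the broker drop the message.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=queue_name, on_message_callback=on_message, auto_ack=False)
    channel.start_consuming()  # blocks until the consumer is stopped
```

With auto_ack=False a message that is delivered but never acknowledged (for example because the consumer crashed) is redelivered, which is usually the safer choice for work that must not be lost.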

NoSQL, CAP Theorem, an introduction

With vertical scalability, we throw bigger, more powerful boxes at the problem and pray that it scales.

When a single big box runs out of capacity, we need another, even bigger box, and then we have to worry about consistency, latency, replication, fault tolerance and so forth. To optimize replication and communication, we are tempted to disable logging and journaling just to improve performance, which is not desirable at all. Sooner or later, we run into many problems managing these boxes.

Then we come to a point where we want to take some load off the database: enter distributed caching. Caching doesn’t fully solve our problems, because we are still dealing with the relational data model, joins, schema changes, normalization and queries.

We come to a point at which we want our application and data stores to be massively scalable, fault tolerant and consistent. While trying to achieve all these properties, we face several problems.

Over the last two decades, relational database management systems (RDBMS) became very popular for several reasons, simplicity being the major one. With the development of the powerful SQL (Structured Query Language), relational databases became the center of database systems. With SQL, everyone could manipulate and define data easily, and SQL became an ANSI standard.

Enter transactions. An RDBMS with SQL needs transactions, and in my opinion transactions are a very important property for a data store that has to be aware of state. With transactions, most people think of commit, rollback and ACID. Commit means that if everything goes well with an operation, we can safely store the data; if something fails and would leave an inconsistent state, we roll back in order to avoid it.

ACID is rather a large topic. It stands for Atomicity, Consistency, Isolation and Durability. Atomic means all or nothing: either every change in the transaction is committed or none is. Consistent means that a transaction takes the data from one valid state to another; users of the data never see a half-applied state that violates its rules. Isolated means that concurrent transactions do not interfere with each other: each one behaves as if it were the only one operating on the data, even when several clients work on the data at the same time. Without isolation, concurrent operations could leave the data in an inconsistent state. Durable means that once a transaction is committed, its changes survive, even through a crash, until another transaction modifies the data.

Having said that about transactions and ACID properties, it is rather easy to implement transactions in a single application compared to a distributed system. When you want to implement transactions across a distributed system, you have to consider the whole system, which introduces transaction managers, synchronization and so on. Distributed transactions introduce several complexities, and fault tolerance must be implemented very carefully.
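
The commit and rollback behaviour on a single database can be sketched with Python's built-in sqlite3 module. The accounts table, the names and the amounts are made up for this example; the point is only that the two updates of a transfer are either both applied or both undone.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically; return True on commit."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    ok = transfer(conn, "alice", "bob", 30)       # commits both updates
    failed = transfer(conn, "alice", "bob", 500)  # rolls back both updates
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return ok, failed, balances
```

After the failed transfer the balances are exactly what the first transfer left behind, which is the atomicity property in action.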

In order to scale a database, sharding was introduced: dividing up the data into meaningful clusters based on a common key. For example, sharding by the initial letter of a person’s name gives you 26 or so servers, from A to Z, and you distribute your data across them accordingly. Selecting the shard key is very important. There are other ways of sharding as well, such as feature or functional sharding, in which you split your databases by functionality: for example, user data in one database and product data in another. Key-based sharding, as described above, can be tweaked to fit what you need, and hashing can also be used in this scheme. Another way to shard is to use a lookup table (a hash table or dictionary). The disadvantage, if there is a single lookup table, is that this table becomes the bottleneck and a single point of failure. However, these days there are several fault-tolerant distributed hash tables.
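
As a minimal sketch of key-based sharding, the two functions below map a key to a shard number, first by initial letter as in the A to Z example and then by hashing. The function names and the fallback to shard 0 for non-alphabetic names are assumptions of this sketch.

```python
import hashlib
import string


def shard_by_initial(name, num_shards=26):
    """Sharding on the first letter of a name, as in the A to Z example."""
    first = name.strip().lower()[:1]
    if not first or first not in string.ascii_lowercase:
        return 0  # assumption: anything that is not a-z goes to shard 0
    return string.ascii_lowercase.index(first) % num_shards


def shard_by_hash(key, num_shards):
    """Hash-based sharding: a stable hash decides which shard owns the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The initial-letter scheme is easy to reason about but tends to be uneven, since far more names start with some letters than with others. Hashing spreads keys more evenly, at the price that adding or removing a shard changes where most keys live unless something like consistent hashing is used.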

Sharding introduced shared-nothing architectures: the entities within the shards are not aware of each other and operate independently. Shared-nothing architecture has become very popular in the last few years. With shared nothing, there is no dependency between the entities of your system. There is no central control and no master node; all of the nodes are the same.

Cassandra is a decentralized, distributed, fault-tolerant, elastically scalable, tunably consistent, column-oriented database. It was designed based on the Amazon Dynamo and Google Bigtable models. Cassandra is decentralized and distributed in such a way that its clients are not aware of the distribution: Cassandra acts as a single entity whether it is distributed or not. There is no central authority and there are no master nodes. This is a great benefit in terms of failures: there is no single point of failure, so if one of your Cassandra servers dies, the whole system keeps performing as if nothing happened. This is often called server symmetry: all the nodes are symmetric. Because Cassandra nodes are identical and there is no central authority, this decentralized model greatly increases availability.

High availability and fault tolerance are must-have aspects of a distributed system: whatever is happening inside the system, it should be seamless to its clients. High availability means continuing to satisfy client requests. In daily computing you face all sorts of problems, from software bugs to hardware failures, and it is very hard to know when a problem will occur. These problems should be anticipated and handled very carefully. Under any kind of error, the system should still be available and do its job.

Consistency is one of the most important parts of any distributed system. While working with big data and distributed data stores, you will have to deal with it. There are flavors of consistency, such as strict consistency, causal consistency and eventual consistency.

Strict consistency means that you get the most recent data from any node of your distributed system. This is great! However, you have to synchronize your data across all the servers before you can actually satisfy the request, so you pay in latency and in synchronization across your cluster. So there is a trade-off. Strict consistency is mostly used in financial applications and systems; when your data is very important, you most likely want this consistency level. It relies on a global clock across the distributed system and works based on global timestamps, and latency makes this problem really hard.

Causal consistency, on the other hand, offers weaker consistency. We still aim for consistent data across the distributed system, but instead of timestamps and a global clock we are interested in events and their causal order: we act upon the occurrence of events rather than a global clock.

Eventual consistency dictates that the data will propagate to all the servers at some point; we don’t know when, but it will. This is the weakest type of consistency. Basically, you write some data to a data store and it becomes available on all the nodes after some time.

CAP theorem! According to the CAP theorem, you can only have two of consistency, availability and partition tolerance. You cannot have all three properties at the same time in a system. You can have either consistency and availability, consistency and partition tolerance, or availability and partition tolerance. The theorem was conjectured by Eric Brewer and proven by Seth Gilbert and Nancy Lynch. At the very center of this theorem is data replication: availability and consistency come into play when you replicate your data across the nodes of your system. Consistency is relative to the replication factor. In Cassandra, consistency is tunable, which means you can configure how many of the replicas must acknowledge a read or a write. You can increase or decrease your consistency level, and eventually the data will be in a consistent state across your servers. If you require your data to be consistent across all the replicas, then availability suffers: every write has to reach all of those nodes, so it won’t be available right away, and it fails if one of them is down. If you require consistency on only a small set of nodes, then you will have higher availability. I really like the fact that you can configure the consistency level with Cassandra. This is a great benefit.
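
The trade-off behind tunable consistency can be put into a small rule of thumb. With a replication factor N, a write acknowledged by W replicas and a read that consults R replicas, a read is guaranteed to see the latest write whenever R + W > N, because the two sets of replicas must then overlap. The sketch below only does this arithmetic; the function names are invented here and it is not Cassandra's API.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set overlaps every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, quorum reads and quorum writes (2 + 2 > 3) give this guarantee while tolerating one node being down. Writing to all 3 replicas and reading from 1 gives it too, but a single failed node then blocks writes. Reading and writing with 1 replica each is the most available setting and is only eventually consistent.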

This article (“Starbucks doesn’t use two phase commit”) is a really great read on two-phase commit.