An in-depth article that explains how a relational database handles an SQL query and the basic components inside a database.Great read!
What Every Programmer Should Know About Memory. Not really, pretty long paper but contains lot of useful information.
A repository dedicated to the Lambda Architecture (LA). Examples and good practices around the LA.
For the last 6-7 years I have been reading and hearing about CAP theorem alot. But it had all started back in 2000 by Eric Brewer with a paper namely “Towards Robust Distributed Systems“. So Dr Brewer talks about STATE, Persistent STATE, ACID vs BASE. And claims that you can have at most two of the following properties : Consistency, Availability, Tolerance to network Partitions.
Of course we have concurrency which adds up to the problem.
In 2002, professor Nancy Lynch and Seth Gilbert formalized “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services“.
The CAP Theorem says that it is impossible to build
an implementation of read-write storage in an asynchronous network that
satisfies all of the following three properties:
* Availability – will a request made to the data store always eventually complete?
* Consistency – will all executions of reads and writes seen by all nodes be atomic or linearizably, consistent?
* Partition tolerance – the network is allowed to drop any messages.
So CAP theorem tells us that we can’t build a system that is always available, responds to every request, consistent, returns the expected results.
When we look at NoSQL databases, they resemble a simple register that you can do set(x) and get() operations. So in most systems you write and read data, which is called read-write storage.
In database systems atomic ( Linearizability) transaction can be simple defined as all or nothing.
Availability in this context means a system should always respond, returning error messages all the time doesn’t count. Respond can return in sub milliseconds to hours or even more but eventually respond should arrive. So this is a strong and weak requirement. Strong in the sense that the system should always return a response but weak because the system can take a very long time to respond.
Partition means a network losing messages, ie: there is a disconnect in the network. Networks are unreliable.
A client writes to one node and there is a network partition between nodes, when the client tries to read from other, we need to make some decisions about the response and it is common for network partitions in large distributed systems.
Replicated databases maintain copies of the same data on multiples nodes, ie: couchbase to provide lower latency to users and to prevent data loss. This is really cool, however applications using these storage should be aware that different nodes might have inconsistency.
Below you will find a paper about the Critique of CAP theorem. nice read.
Even though we find caching very attractive and implement it whenever we need low latency below you will find an article mentioning about the drawbacks of caching. There are some points I agree that it is hard to debug and causes weird bugs when not implemented correctly.
Interested in data science? Below you will find some free books.
Here is a break down of hadoop ecosystem.