NoSQL Systems

MongoDB, HBase, Cassandra and other NoSQL systems are quite different from relational database management systems (RDBMS), and they represent a paradigm shift in how we store and retrieve data. The trend started largely at Google, whose papers on MapReduce (large-scale data processing, used for tasks such as building the web search index) and Bigtable (a distributed storage system) inspired many of the systems that followed.

The name NoSQL is generally read as “Not only SQL”: the term covers storage models other than relational tables, such as:
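  • Graph Stores - Data stored as nodes and edges, e.g. social connections and networks
  • Wide Column Storage - Data stored in wide, flexible columns instead of fixed rows
  • Key-value Stores - Every single item in the database is stored as an attribute name (the key) together with its value
  • Document Databases - Each key is paired with a structured document, such as JSON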
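
As a rough illustration, the four models can be mimicked with plain Python data structures. This is only a sketch: the variable names, the sample records and the `friends_of_friends` helper are made up for illustration and do not correspond to any particular product's API.

```python
# Hypothetical sample data: plain Python structures standing in for each model.

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:42:name": "Asha", "user:42:city": "Pune"}

# Document database: each key maps to a nested, self-describing document.
document = {
    "user:42": {"name": "Asha", "city": "Pune", "tags": ["admin", "beta"]},
}

# Wide-column store: row key -> column family -> columns.
# Rows do not have to share the same set of columns.
wide_column = {
    "user:42": {"profile": {"name": "Asha", "city": "Pune"}, "stats": {"logins": 17}},
    "user:43": {"profile": {"name": "Ben"}},
}

# Graph store: nodes plus edges describing relationships.
graph = {"Asha": {"Ben", "Chen"}, "Ben": {"Asha"}, "Chen": {"Asha"}}


def friends_of_friends(g, person):
    """People two hops away from `person`, excluding direct friends and the person."""
    direct = g.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= g.get(friend, set())
    return two_hops - direct - {person}


print(friends_of_friends(graph, "Ben"))
```

The graph query shows why a dedicated model helps: following relationships is a direct traversal rather than a series of table joins.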

The key benefit of NoSQL systems is scalability and high performance under heavy load. With an RDBMS, scaling traditionally means putting more computing power into a single box: faster processors and more memory. NoSQL systems are distributed by nature, so a cluster of machines can work together as one database, and the cluster can be scaled out to thousands of computers.

NoSQL systems came into widespread use with the rise of cloud computing. The advantage of cloud computing is that it provides computing resources on demand, and the well-known “pay as you go” model has proved very successful for both cloud providers and cloud consumers. There is no upfront cost, so startups can get going at minimal expense, and large organisations need not spend money on complicated servers; they can instead use a hybrid cloud model to meet the demands of their customers.

Having introduced NoSQL systems, let’s look at one of their main aspects: the CAP theorem, which states that it is impossible for a distributed computer system to simultaneously provide all three of the following:
  1. Consistency - All nodes see the same data at the same time
  2. Availability - Every request from a database client receives a response, even if it returns an older version of the data
  3. Partition Tolerance - The system keeps working even when network failures cut off communication between some of its servers

The above diagram describes various NoSQL systems and where they fall under the CAP theorem. According to the CAP theorem, only two of the three properties can be achieved at any given time. The most widely used combination for social networking sites is availability with partition tolerance: for such sites, briefly stale data is usually acceptable, and this leads to the idea of eventual consistency. Eventual consistency means that a distributed database will not be consistent immediately after a write, but that all copies of the data will converge on the same value once updates stop arriving.
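
The idea of eventual consistency can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular database works: the `Replica` class and `sync` function are hypothetical, and conflicts are resolved with a simple last-write-wins rule based on a caller-supplied timestamp.

```python
class Replica:
    """One copy of the data; stores a (timestamp, value) pair per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Assumed conflict rule: last write wins, so keep the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries both ways so each replica ends up with the newest version."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("status", "hello", timestamp=1)
sync(r1, r2)

r1.write("status", "hello world", timestamp=2)  # r2 has not heard about this yet
stale = r2.read("status")                       # r2 still answers, with older data

sync(r1, r2)                                    # replicas exchange updates
fresh = r2.read("status")                       # now both replicas agree
```

Between the second write and the second `sync`, the two replicas disagree, yet both remain available for reads. That window of disagreement is what an eventually consistent system accepts in exchange for availability.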

High availability is easiest to achieve with a loosely coupled approach to fetching data that performs as little computation as possible per request. This is exactly what NoSQL systems provide. Their get and set operations take advantage of the absence of a strict schema, together with redundant data, to achieve availability. Since storage is cheap today, increasing redundancy to gain performance is not a big problem. In an RDBMS, by contrast, the schema is strict and data is stored in highly normalised tables. The key idea behind normalisation is to reduce redundancy in the data. The cost is that fetching data spread across many tables requires joins, and such queries can become computation intensive and slow. Slow request processing is undesirable in this fast-paced world, so NoSQL systems fill the gap: highly redundant data often leads to simple queries that can be answered quickly.
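
The trade-off between normalised and redundant data can be illustrated with a small Python sketch. The tables, documents and function names below are invented for illustration; in-memory dictionaries stand in for both kinds of database.

```python
# Normalised layout: three "tables", each fact stored exactly once.
users = {1: {"name": "Asha"}}
products = {
    10: {"title": "Keyboard", "price": 30},
    11: {"title": "Mouse", "price": 12},
}
orders = [
    {"order_id": 100, "user_id": 1, "product_id": 10},
    {"order_id": 101, "user_id": 1, "product_id": 11},
]


def order_history_normalised(user_id):
    """Combine three tables per request, much as a SQL join would."""
    name = users[user_id]["name"]
    history = []
    for order in orders:
        if order["user_id"] == user_id:
            product = products[order["product_id"]]
            history.append(
                {"user": name, "title": product["title"], "price": product["price"]}
            )
    return history


# Denormalised layout: one document per user, with product details copied in.
user_docs = {
    1: {
        "name": "Asha",
        "orders": [
            {"title": "Keyboard", "price": 30},
            {"title": "Mouse", "price": 12},
        ],
    }
}


def order_history_denormalised(user_id):
    """A single key lookup returns everything the request needs."""
    doc = user_docs[user_id]
    return [{"user": doc["name"], **order} for order in doc["orders"]]
```

Both functions return the same result, but the second needs only one lookup. The price of that simplicity is the redundancy: if a product title changes, every document that holds a copy of it has to be updated.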

Partition tolerance, mentioned above, deserves a closer look, and it is easiest to understand alongside how the database tier scales. With an RDBMS, the database tier of an application is quite restricted: often a single server with fast processors and a lot of memory is used. The problem arises when scaling up, because growing the database layer means buying bigger or additional costly servers. Adding servers to an existing stack is a problem in itself, and the cost climbs steeply. NoSQL systems take a different approach, following the distributed designs that Google popularised with systems such as MapReduce. A NoSQL database can use multiple machines (nodes) for the database tier, with the data partitioned across them so that the nodes function together as one system. Because these nodes communicate over a network, the network can fail and split the cluster into groups that cannot reach each other; partition tolerance is the ability of the database to keep working when that happens.

Hadoop is one of the technologies used for running such clusters. Note: the machines in these clusters are typically inexpensive commodity computers rather than high-end servers. Together, the cluster functions as a high-performing server while providing a cheaper way to manage data.
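
Spreading data over several nodes can be sketched as follows. This is a minimal illustration under stated assumptions: the node names, the `node_for` function and the `Cluster` class are hypothetical, dictionaries stand in for real machines, and keys are placed by a simple hash-modulo rule.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical machine names


def node_for(key, nodes=NODES):
    """Pick a node for a key using a stable hash (the same key always maps to the same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class Cluster:
    """Several nodes presenting themselves as one key-value store."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        # One dictionary per node stands in for that machine's local storage.
        self.storage = {node: {} for node in self.nodes}

    def set(self, key, value):
        self.storage[node_for(key, self.nodes)][key] = value

    def get(self, key):
        return self.storage[node_for(key, self.nodes)].get(key)


cluster = Cluster()
for i in range(100):
    cluster.set(f"user:{i}", {"id": i})

# How many keys each node ended up holding.
counts = {node: len(items) for node, items in cluster.storage.items()}
print(counts)
```

Each node holds only part of the data, so capacity grows by adding machines. One caveat of the modulo rule used here is that changing the number of nodes moves most keys to a different node, which is why real systems often use schemes such as consistent hashing instead.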