First, "NoSQL" does not mean that SQL is absent altogether. For this reason the term is also read as "not only SQL", and many NoSQL databases provide a SQL-like query interface. What sets NoSQL databases apart is that they are based on a non-relational, schema-free data model. In contrast, relational databases such as Oracle Database store data in tables that have a fixed schema (predefined columns and column types). These tables have relations between them, and SQL (Structured Query Language) is used to access and query them. Schema-free (schema-less) means that each row of data in a NoSQL database may have the same set of columns as other rows or a different one, and the data type of a column is not fixed. Data models supported by NoSQL databases include the document store (for example MongoDB and Couchbase), the wide-column store (for example Apache Cassandra), and the key-value store (for example Oracle NoSQL Database).

What is the need for NoSQL?

Relational databases have been used for decades, so why is a new type of database, the NoSQL database, needed? NoSQL databases were developed to meet the following requirements of modern web-scale applications:

- An increase in the volume of data stored about users and objects, also termed big data.
- Streaming data, for example the data generated by online transactions and user sessions.
- The rate at which big data arrives, which is increasing exponentially each year.
- An increase in the frequency at which the data is accessed.
- Fluctuations in data usage.
- The increased processing and performance required to handle big data.
- Ultra-high availability.
- Data that is unstructured or semi-structured.

A subset of these reasons is often summarized as the 3 Vs or 4 Vs of big data: Volume, Variety, Velocity, and Veracity.
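The schema-free model described in the opening paragraph, and the Variety of data listed above, can be illustrated with a small in-memory Python sketch. The `insert_fixed` and `insert_schema_free` functions and the sample records are hypothetical. Real databases enforce schemas in their own ways; for example, a relational column can also allow NULL, which this sketch ignores.

```python
# Fixed schema (relational style): predefined columns with predefined types.
USER_SCHEMA = {"id": int, "name": str, "email": str}


def insert_fixed(table, row):
    """Reject rows whose columns or column types differ from the schema."""
    if set(row) != set(USER_SCHEMA):
        raise ValueError(f"columns {sorted(row)} do not match the schema")
    for column, column_type in USER_SCHEMA.items():
        if not isinstance(row[column], column_type):
            raise ValueError(f"column {column!r} must be {column_type.__name__}")
    table.append(row)


# Schema-free (NoSQL style): any set of fields is accepted, and the same
# field may hold values of different types in different records.
def insert_schema_free(collection, record):
    collection.append(record)


users_table, users_collection = [], []
records = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "tags": ["admin", "beta"]},  # different columns
    {"id": "u-3", "name": "Alan", "email": "alan@example.com"},  # id is a string
]

for r in records:
    insert_schema_free(users_collection, r)
    try:
        insert_fixed(users_table, r)
    except ValueError as err:
        print("fixed schema rejected:", err)

print(len(users_collection), len(users_table))  # 3 1
```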
SQL-based relational databases were not designed for the scalability, agility, and performance requirements of modern applications that access and process big data in real time. Most RDBMSs offer scalability and high availability features, but NoSQL databases are designed to provide higher levels of both. Big data is growing exponentially, and the number of concurrent users of web applications has grown from a few hundred or a few thousand to several million. Big data is also not static: new data keeps being added after the initial data has been stored. The load is not predictable either. A web application accessed by millions of users today will not necessarily keep that many users for any predictable period, and the number of users could drop to a few thousand within a day or a few days.

A traditional relational database is based on a single-server architecture, so the single database server is a single point of failure (SPOF). A highly available database must distribute its data across a cluster of servers rather than rely on a single server. NoSQL databases provide the distributed, scalable architecture required for big data. "Distributed" means that the data in a NoSQL database is spread across a cluster of servers, so if one server becomes unavailable, another server is used. Distribution is a capability rather than a requirement of a NoSQL database: a small-scale NoSQL database may consist of only one server.

The fixed-schema data model used by relational databases makes it necessary to break data into small sets and store them in separate tables, each with its own schema. Decomposing large tables into smaller, related tables is called database normalization. Normalized databases require table joins and complex queries to assemble the required data. In contrast, NoSQL data models typically store data in denormalized form.
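The difference between a normalized and a denormalized layout can be sketched with plain Python data structures. The customer and order records are hypothetical. The `customer_with_orders` function only emulates the result a SQL join would compute; it does not reflect how a database actually executes joins.

```python
# Normalized (relational style): customers and orders live in separate
# "tables" linked by customer_id, so reading a customer's orders needs a join.
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def customer_with_orders(customer_id):
    """Emulate SELECT ... FROM customers JOIN orders USING (customer_id)."""
    customer = next(c for c in customers if c["customer_id"] == customer_id)
    return {
        **customer,
        "orders": [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders
            if o["customer_id"] == customer_id
        ],
    }


# Denormalized (document style): the orders are embedded in the customer
# document, so a single lookup by key returns everything.
customer_doc = {
    "customer_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}

print(customer_with_orders(1) == customer_doc)  # True
```

Denormalization has a cost: if the same data is embedded in many documents, an update must change every copy. Normalization avoids that duplication.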
In a document store, for example, each document is typically self-contained and embeds related data rather than referencing other documents. Self-contained documents are easier to store, transfer, and query.

Advantages of NoSQL Databases

In this section we cover the advantages of NoSQL databases.

Scalability

NoSQL databases are easily scalable, which makes their capacity elastic. Why is scalability important? Suppose you are running a database with a fixed capacity while the web site traffic fluctuates, sometimes rising well above that capacity and sometimes falling below it. A fixed-capacity database cannot serve the requests in excess of its capacity, and when the load is below capacity, that capacity goes unused. Scalability is the ability to match capacity to the workload.

Two kinds of scalability are available: horizontal and vertical. With horizontal scalability, or scaling out, new servers/machines are added to the database cluster. With vertical scalability, or scaling up, the capacity of the same server or machine is increased. Vertical scalability has several limitations:

- Adding capacity typically requires the database to be shut down, which incurs downtime.
- A single server has an upper limit on capacity.
- A single server is a single point of failure: if it fails, the database becomes unavailable.

Relational databases traditionally scale vertically, whereas NoSQL databases are designed to scale horizontally, which avoids these limitations. Additional server nodes may be added to a cluster without depending on the other nodes in the cluster. Ideally, the capacity of a NoSQL database scales linearly. If you add 4 servers to a single server, the total capacity becomes five times the original rather than something less because of performance loss. The NoSQL cluster does not have to be shut down to add new servers.
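The arithmetic of scaling out can be sketched in a few lines of Python. The function names and the `efficiency` parameter are hypothetical. `efficiency=1.0` models the ideal linear scaling described above, and lower values stand in for coordination overhead that a real cluster may incur. The request rates are made-up sample numbers.

```python
import math


def cluster_capacity(nodes, per_node_capacity, efficiency=1.0):
    """Total capacity of a scale-out cluster of identical nodes.

    efficiency=1.0 is ideal linear scaling; values below 1.0 are a
    hypothetical allowance for coordination overhead.
    """
    return nodes * per_node_capacity * efficiency


def nodes_needed(load, per_node_capacity, efficiency=1.0):
    """Smallest number of identical nodes whose combined capacity covers the load."""
    return max(1, math.ceil(load / (per_node_capacity * efficiency)))


# One server handling 1,000 requests/s plus 4 more servers gives 5x capacity.
print(cluster_capacity(5, 1000))  # 5000.0
# Traffic rises to 4,200 requests/s, then falls back to 900 requests/s.
print(nodes_needed(4200, 1000))   # 5
print(nodes_needed(900, 1000))    # 1
```

Because capacity is added or removed one node at a time, the cluster can follow the load up and down. A vertically scaled server would have to be sized for the peak.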
Ease of scalability comes from the shared-nothing architecture of NoSQL databases. The monolithic architecture of traditional SQL databases is not suited to the flexible requirements of storing and processing big data. Traditional databases support a scale-up (vertical scaling) architecture, in which additional resources are added to a single machine. In contrast, NoSQL databases provide a scale-out (horizontal scaling), shared-nothing architecture, in which additional machines are added to the cluster. In a shared-nothing architecture the nodes in a cluster do not share any resources. Through a process called sharding, the data is partitioned and distributed evenly across the cluster, which balances the load.

Ultra-high Availability

Why is high availability important? Because interactive real-time applications serving many users need to be available all the time. Such an application cannot be taken offline for maintenance, software or hardware upgrades, or capacity increases. NoSQL databases are designed to minimize downtime, though different NoSQL databases provide different levels of support for online maintenance and upgrades.

Commodity Hardware

NoSQL databases are designed to be installed on commodity hardware instead of high-end hardware. Commodity hardware makes scaling out easy: simply add another machine. The new machine does not even need the same specification and configuration as the existing machines in the NoSQL database cluster.

Flexible Schema or No Schema

Relational databases store data in a fixed tabular format whose schema must be defined before data is added. NoSQL databases either do not require a schema to be defined or provide a flexible, dynamic schema. Some NoSQL databases, such as Oracle NoSQL Database and Apache Cassandra, provide for a flexible schema definition. Others, such as Couchbase, are schema-less in that no schema is defined at all.
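With a flexible schema, documents written before and after a change in the data's shape can sit side by side, and the application handles the difference when reading. The following minimal Python sketch assumes a hypothetical `profiles` collection to which an `email` field was added partway through. The `contact_line` function is likewise illustrative.

```python
# Documents written before and after a hypothetical change that added "email".
# Storing the newer shape needs no ALTER TABLE or data migration.
profiles = [
    {"_id": 1, "name": "Ada"},                                  # older shape
    {"_id": 2, "name": "Grace", "email": "grace@example.com"},  # newer shape
]


def contact_line(doc):
    """Read a document, tolerating the absence of the newer field."""
    return f'{doc["name"]} <{doc.get("email", "no email on file")}>'


for doc in profiles:
    print(contact_line(doc))

# A query on an optional field simply skips documents that lack it.
with_email = [d["_id"] for d in profiles if "email" in d]
print(with_email)  # [2]
```

The flexibility moves part of the schema handling into application code. Readers must be written to cope with every shape the data has ever had.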
Support for flexible schemas or no schemas makes NoSQL databases suitable for structured, semi-structured, and unstructured data. In an agile development setting the schema of the stored data may need to change, so NoSQL databases fit such an environment well. Dissimilar data may be stored together. Flexible schemas speed up development and keep code integration from being interrupted by schema modifications. They also greatly reduce the need for database administration.

Big Data

NoSQL databases are designed for big data, which may amount to tens or even hundreds of petabytes (PB). Big data is usually associated with a large number of users and a large number of transactions.

Object-Oriented Programming

The data models provided by NoSQL databases map naturally to object-oriented programming, which is both easy to use and flexible. Most NoSQL databases provide client APIs in object-oriented programming languages such as Java, PHP, and Ruby. These APIs typically support simple put and get operations to add and retrieve data.

Performance

Why is performance important? Because interactive real-time applications require low latency for read and write operations across all types and sizes of workloads, and they need to serve millions of concurrent users under varying workloads. The shared-nothing architecture of NoSQL databases provides low latency, high availability, reduced susceptibility to the failure of critical components, and reduced bandwidth requirements. Performance in a NoSQL database cluster is designed not to degrade as new nodes are added.

Failure Handling

NoSQL databases typically handle server failure automatically by failing over to another server. Why is automatic failover important? Because without it, if a node in the cluster failed while handling a workload, the application would fail and become unavailable.
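The put/get interface, the sharding described under Scalability, and automatic failover can be tied together in a small in-memory sketch. This is a toy model and does not describe how any particular NoSQL product works. The `Node` and `Cluster` classes, the SHA-256 hash-mod-N placement, and the default of two copies per key are all hypothetical choices made for illustration.

```python
import hashlib


class Node:
    """A hypothetical server node holding an in-memory key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True  # set to False to simulate a server failure


class Cluster:
    """Toy shared-nothing cluster: hash-based sharding with N-way replication."""

    def __init__(self, names, replicas=2):
        self.nodes = [Node(n) for n in names]
        # Never place more copies than there are nodes.
        self.replicas = min(replicas, len(self.nodes))

    def owners(self, key):
        """Primary node for the key, followed by the nodes holding its replicas."""
        # A stable digest (unlike Python's per-process randomized hash()) picks
        # the primary; the replicas go on the next nodes in list order.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key, value):
        """Write the value to every live owner of the key."""
        live = [node for node in self.owners(key) if node.up]
        if not live:
            raise RuntimeError(f"no live replica for {key!r}")
        for node in live:
            node.data[key] = value

    def get(self, key):
        """Read from the primary, failing over to a replica if it is down."""
        for node in self.owners(key):
            if node.up:
                return node.data.get(key)
        raise RuntimeError(f"no live replica for {key!r}")


# Usage: write a record, take its primary offline, and read it again.
cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("user:42", {"name": "Ada"})
primary = cluster.owners("user:42")[0]
primary.up = False  # simulate the failure of the primary server
print(cluster.get("user:42"))  # {'name': 'Ada'}, served by the replica
```

The sketch leaves out two things a real system must handle. First, a node that comes back online has missed writes and must be resynchronized. Second, plain hash-mod-N placement reshuffles most keys when a node is added, which is why many distributed databases use consistent hashing instead.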
NoSQL databases typically run on a cluster of servers and are designed on the assumption that some nodes will fail. Because data is distributed and replicated across the nodes, the database has no single point of failure. The failure of a single node is handled transparently by transferring the load of the failed server to another server.

Less Administration

NoSQL databases are easier to install and administer, with less need for specialized DBAs. A developer can handle much of the administration of a NoSQL database, though a specialized NoSQL DBA is still recommended. Schemas are flexible and do not need to be modified periodically. Failure detection and failover are automatic and require no user intervention. Data is distributed across the cluster automatically through auto-sharding, and replication to the nodes in a cluster is also automatic. When a new server node is added to a cluster, data is automatically distributed and replicated to it as required.

Cloud Enabled

Cloud computing has made unprecedented capacity and flexibility in the choice of infrastructure available. Cloud service providers such as Amazon Web Services (AWS) offer fully managed NoSQL database services as well as the option to develop custom NoSQL database services.

Advantages of RDBMS

Although much has been said about their merits, NoSQL databases are not without drawbacks. Relational databases have advantages in the following areas.

Transactional Properties

Many NoSQL databases do not provide the Atomicity, Consistency, Isolation, and Durability (ACID) transactional guarantees that relational databases do, or they provide them only in a limited scope. Atomicity ensures that either all of the operations within a transaction are performed or none are, so the database never holds a partially completed transaction. Consistency ensures that each transaction takes the database from one valid state to another, so integrity constraints are never violated.
Isolation ensures that concurrent transactions do not interfere with one another: a transaction does not see the intermediate data of other transactions until those transactions have completed. Durability ensures that once a transaction has committed, its changes persist even if the system later fails. NoSQL databases typically provide Basically Available, Soft state, and Eventually consistent (BASE) properties instead. Basically Available means that a NoSQL database returns a response to every request. That response could be a failure to provide the requested data, or the requested data could be returned in an inconsistent state. Soft state means that the system may be in transition, during which time its state is not consistent. Eventually consistent means that once the database stops receiving input, its state eventually becomes consistent as the data is replicated to the different nodes in the cluster as required. While a NoSQL database is receiving input, however, it does not wait for its state to become consistent before accepting more data.

Stable and Reliable

NoSQL databases are still relatively new to the field of databases and are not as functionally stable and reliable as the established relational databases.

Established Vendor Support

Many NoSQL databases, such as MongoDB and Apache Cassandra, are open-source projects. Their vendor support is not as long-established as the official support provided for databases such as Oracle Database or IBM DB2.

Conclusion

NoSQL databases such as MongoDB, Apache Cassandra, or Couchbase are unlikely ever to completely replace relational databases such as Oracle Database. NoSQL databases are designed for a different use case: big, unstructured data, for example the web-scale data used by search engines. Small-scale enterprises (and even some larger ones) will continue to use relational databases for their superior transactional properties, stability and reliability, and established support base.