RDB and NoSQL, what is different?
💡 CS Knowledges Study
What is a Database?
According to Oracle - one of the biggest DB companies - a database is defined as:
“A database is an organised collection of structured information, or data, typically stored electronically in a computer system.”
It is managed with a database management system (DBMS) such as MySQL, PostgreSQL, MongoDB and Amazon DynamoDB. Here, MySQL and PostgreSQL is a relational database (RDB) while MongoDB and Amazon DynamoDB is a Non SQL (NoSQL) (a.k.a. Not only SQL).
RDB
Oracle, MySQL, PostgreSQL, SQLite, MariaDB, Microsoft SQL Server
- The most widely used approach for managing data.
- Data is organised into tables - relations.
- Typically employ Structured Query Language (SQL).
Limitations
- Difficult to scale horizontally (Scale out).
- Because it is designed to ensure consistency.
- Difficult to manage unstructured data - such as email messages, photos, videos, etc.
- Because it is designed to manage structured data, or data that aligns with a predefined data type.
NoSQL
- Designed to maximise availability at the expense of consistency.
- Aims for eventual consistency.
- Reads an earlier version of it instead of receiving an error whereas RDB returns an error.
Document based DB
MongoDB, Couchbase, Apache, CouchDB
- Stores data in the form of documents.
- Similar to Key-Value Store - each document has a unique identifier (key).
- The difference is that Document based DB cares about the type of data (Metadata) it stores, whereas Key Value based DB treats the data as opaque - it’s up to the application.
- Data and Metadata are stored hierarchically in JSON-based documents inside the database.
- Stores all the data of a given object in a single document.
- As JSON, BSON, XML, or YAML documents.
- Highly scalable, with sharding being a common horizontal scaling strategy.
- Excellent for keeping large amounts of unrelated, complex information that varies in structure.
- Use cases: E-Commerce, Blogging, Analytics Platforms, and Content Management Systems.
Key Value DB
Redis, Memcached, Amazon DynamoDB, Riak
- Stores and manages associative arrays - a.k.a. a dictionary or hash table.
- Values can be from integers or strings to JSON structures.
- Stores data as a single collection without any structure or relation.
- Treats any data as an opaque blob; it’s up to the application to understand how it’s structured.
- Highly performant, efficient, and scalable.
- Use cases: Caching, Message Queuing, and Session Management.
Column based DB
Apache, Cassandra, Apache HBase, ClickHouse
- Stores data in columns.
- Each column is stored in a separate file or region in the system’s storage.
- Appears in record order.
- The first entry in one column is related to the first entry in other columns.
- Pros - Good where an application needs to frequently perform aggregate functions.
- Such as finding the average or sum total of data in a column.
- Cons - The load performance tends to be slow.
- Because each column must be written separately and data is often kept compressed.
Graph based DB
Neo4j, ArangoDB, OrientDB
- Subcategory of the Document based DB but the graph databases add an extra layer to the Document based DB by highlighting the relationships between individual documents.
- Concepts:
- Node - Individual entity.
- Property - Relevant information related to individual nodes.
- Edge (a.k.a. Graph or Relationship) - Representation of how two nodes are related.
- Undirected - shows a connection between nodes.
- Directed - when edges have different meanings depending on where the direction originated.
(Reference: https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models)
Table of Comparison Between RDB and NoSQL
| RDB | NoSQL | |
|---|---|---|
| Data | Structured Data | Unstructured Data |
| Big Data Processing | Performance decreased | Applicable |
| Schema | Structured Schema exists | Either there’s no Schema or easily changed |
| Transaction | Consistent with transaction | No transaction |
| Search | Complex one such as JOIN etc. | Simple search provided |
| Scalability | Not good for cluster | Good for cluster |
| Lisence | Might be expensive | Open Source |
| Services | Oracle, MySQL, PostgreSQL | Cassandra, MongoDB, Amazon DynamoDB |
Selection Criteria
- Operation and maintenance costs
- Service stability
- Performance
- Scalability
- Security
(Image Source: https://en.pingcap.com/blog/how-to-efficiently-choose-the-right-database-for-your-applications/)