3 minute read

💡 CS Knowledges Study

What is a Database?

According to Oracle - one of the biggest DB companies - a database is defined as:

“A database is an organised collection of structured information, or data, typically stored electronically in a computer system.”

It is managed with a database management system (DBMS) such as MySQL, PostgreSQL, MongoDB and Amazon DynamoDB. Here, MySQL and PostgreSQL is a relational database (RDB) while MongoDB and Amazon DynamoDB is a Non SQL (NoSQL) (a.k.a. Not only SQL).

RDB

Oracle, MySQL, PostgreSQL, SQLite, MariaDB, Microsoft SQL Server

  • The most widely used approach for managing data.
  • Data is organised into tables - relations.
  • Typically employ Structured Query Language (SQL).

Limitations

  • Difficult to scale horizontally (Scale out).
    • Because it is designed to ensure consistency.
  • Difficult to manage unstructured data - such as email messages, photos, videos, etc.
    • Because it is designed to manage structured data, or data that aligns with a predefined data type.

NoSQL

  • Designed to maximise availability at the expense of consistency.
    • Aims for eventual consistency.
    • Reads an earlier version of it instead of receiving an error whereas RDB returns an error.

Document based DB

MongoDB, Couchbase, Apache, CouchDB

  • Stores data in the form of documents.
    • Similar to Key-Value Store - each document has a unique identifier (key).
    • The difference is that Document based DB cares about the type of data (Metadata) it stores, whereas Key Value based DB treats the data as opaque - it’s up to the application.
    • Data and Metadata are stored hierarchically in JSON-based documents inside the database.
  • Stores all the data of a given object in a single document.
    • As JSON, BSON, XML, or YAML documents.
  • Highly scalable, with sharding being a common horizontal scaling strategy.
  • Excellent for keeping large amounts of unrelated, complex information that varies in structure.
    • Use cases: E-Commerce, Blogging, Analytics Platforms, and Content Management Systems.

Key Value DB

Redis, Memcached, Amazon DynamoDB, Riak

  • Stores and manages associative arrays - a.k.a. a dictionary or hash table.
    • Values can be from integers or strings to JSON structures.
  • Stores data as a single collection without any structure or relation.
  • Treats any data as an opaque blob; it’s up to the application to understand how it’s structured.
  • Highly performant, efficient, and scalable.
    • Use cases: Caching, Message Queuing, and Session Management.

Column based DB

Apache, Cassandra, Apache HBase, ClickHouse

  • Stores data in columns.
    • Each column is stored in a separate file or region in the system’s storage.
  • Appears in record order.
    • The first entry in one column is related to the first entry in other columns.
  • Pros - Good where an application needs to frequently perform aggregate functions.
    • Such as finding the average or sum total of data in a column.
  • Cons - The load performance tends to be slow.
    • Because each column must be written separately and data is often kept compressed.

Graph based DB

Neo4j, ArangoDB, OrientDB

  • Subcategory of the Document based DB but the graph databases add an extra layer to the Document based DB by highlighting the relationships between individual documents.
  • Concepts:
    • Node - Individual entity.
    • Property - Relevant information related to individual nodes.
    • Edge (a.k.a. Graph or Relationship) - Representation of how two nodes are related.
      • Undirected - shows a connection between nodes.
      • Directed - when edges have different meanings depending on where the direction originated.

(Reference: https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models)

Table of Comparison Between RDB and NoSQL

  RDB NoSQL
Data Structured Data Unstructured Data
Big Data Processing Performance decreased Applicable
Schema Structured Schema exists Either there’s no Schema or easily changed
Transaction Consistent with transaction No transaction
Search Complex one such as JOIN etc. Simple search provided
Scalability Not good for cluster Good for cluster
Lisence Might be expensive Open Source
Services Oracle, MySQL, PostgreSQL Cassandra, MongoDB, Amazon DynamoDB

Selection Criteria

  • Operation and maintenance costs
  • Service stability
  • Performance
  • Scalability
  • Security
Decision Tree for RDB
Decision Tree for RDB
Decision Tree for NoSQL
Decision Tree for NoSQL

(Image Source: https://en.pingcap.com/blog/how-to-efficiently-choose-the-right-database-for-your-applications/)