Right DB for your next scalable Project

Ayush Singh
3 min readAug 23, 2022

Choosing databases don’t impact your functional requirements but non-functional requirements

Don’t know which DB to use when ????

The choice of database depends on some factors predominently :-
1) Structure of the data (e.g :- very stuctured, non structured, etc)
2) Query pattern that you have.
3) Amount of scale that needs to be handled.
4) CAP theorem :-

CAP Theorem

Caching solution (key : value) (e.g redis):-

Use cases :-
1) When querying a database, you don’t want to repeatedly query it.
2) When making a remote call to different service and that has high latency, you may want to cache the results.

Non — querying storage (blob storage) (e.g s3) -> Not a DB:

Use cases:-
1) Storing images,videos, songs, etc.
Note :- Along with this(e.g: s3) , we might want to use CDN to distribute this data in geographically lots of location.

Text Searching Capability (e.g Elastic Search, Solr) -> Special cases of Document DB:

Use cases:-
1) To provide text searching capability on products that you are selling(e.g amazon), movies that you have to show (e.g Netflix) , handle even fuzzy search.

Time Series DB (metrics, monitoring) (e.g Influx DB, Open TS DB) :

Use cases:-
In these dbs, the update happen only in Sequential append only mode related to time. Here we dont do random reads, we give a range and that’s how we read.

Data Warehouse (Provide various querying capability on largely dump data) (e.g Hadoop)

Use cases:-
When you have a lot of information, and you want to store all those information in certain kind of datastore for analytics requirement (e.g most sought after item).
Note:- These are not generally used for transactional systems but used for offline reporting.

How to choose among databases ??

Prerequisetes :-
Structured data :- Info that can be easily modelled in table formation. (e.g :- User information like email, profile, etc)

Logics to decide DB

Combination of one or more DB scenarios :-

  1. Suppose you are building an e-commerce platform. Now in inventory you want to make sure you dont oversell the item (e.g :- only one item left and 4 requests came for purchase)

Sol :- In this scenario we need ACID properties therefore we need RDBMS but if the e-commerce is well scaled, the data would be ever increasing as each day new orders come in and the data is never purged, so in this case we need Columnar DB.

What we can do here is use both RDBMS and Columnar DB

RDBMS :- Store for orders which are just placed and not yet delivered to the customer.

Columnar DB :- Once the order is delivered to the customer, remove it from RDBMS and put in Columnar DB.

2. Suppose in a scaled e-commerce, you want to get all users who have bought T-shirt in last 5 days.

Sol :- There are many brands selling T-shirts. So there would be lot of item ids of T shirt. On top of all these item ids there will be lot of orders.
This orders can be either in Columnar dB or in RDBMS, now if we want to do random queries like who bought T-shirt, who bought pants, who bought pants of certain quality => This requires Document DB.

So what I can do is store subset of order information in Document DB for random querying purpose, which has data like user_id, order_id, timestamp.On this Document DB I can run the query and get list of users and list of orders. Now take this order_id and query on RDBMS and Columnar DB.

Other DBs :-

Graph DB is really used a lot when sql joins get complicated.

--

--