Database Clustering and Sharding
Context
Microservices are being adopted. Each microservice has an isolated database (area).
Problem
- The database is a potential point of failure
- If it fails, the microservice(s) that depend on it might not be able to offer their service
Solution
Cluster or shard the database(s) the microservices use. Clustering or sharding the database layer increases the resilience of the microservices and improves their scalability.
Depending on how storage isolation per microservice is applied, this can range from clustering one shared database to scaling a separate database per microservice.
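To make the sharding half of the solution concrete, the following is a minimal sketch of hash-based shard routing. The shard names and the `shard_for` helper are hypothetical and do not come from any of the sources; it only illustrates how a record key could be mapped to one of several database instances.

```python
import hashlib

# Hypothetical shard identifiers (assumption); in practice these would be
# connection targets for the database instances of one microservice.
SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Map a record key to exactly one shard using a stable hash."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would route the same key differently
    # after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for customer in ("customer-1", "customer-2", "customer-3"):
        print(customer, "->", shard_for(customer))
```

Plain modulo routing like this remaps most keys when the number of shards changes; consistent hashing is a common alternative when shards are added or removed often.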
Maturity
Proposed, requires evaluation.
Sources of Evidence
L25:
- "The database cluster pattern" (originally by Richardson)
- each microservice has a subset of tables accessible only from that microservice => preserves data consistency
- equal to "shared db server" pattern => one single db
- Reported usages use separated DB schema for each service
- one proposed replicating data across the DBs of each service
- (+) improves scalability; suited to very high data traffic
- (+) allows moving db to dedicated hardware
- (-) complexity due to cluster architecture
- (-) risk of failure (due to another component + distribution)
L40:
- Using microservices with decomposition = sharding the overall data
- (+) improves scalability and performance when storing and accessing large data volumes
- Shards kept consistent with respect to a centralized storage or through distributed protocol
- Sharding and security threats: need to verify (determine trustworthiness) and protect against forging (modification) during sharding
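The point about verifying shard data and protecting it against modification can be illustrated with a keyed message authentication code. This is only a sketch under assumptions: the source does not name a mechanism, and the `sign_record` and `verify_record` helpers and the shared secret are hypothetical.

```python
import hashlib
import hmac


def sign_record(secret: bytes, payload: bytes) -> str:
    """Compute an HMAC-SHA256 tag for a record before it is written to a shard."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_record(secret: bytes, payload: bytes, tag: str) -> bool:
    """Check that a record read from a shard still matches its tag."""
    expected = sign_record(secret, payload)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    secret = b"demo-secret"  # assumption: a key shared by trusted writers and readers
    tag = sign_record(secret, b"order:17;total:99")
    print(verify_record(secret, b"order:17;total:99", tag))   # untouched record
    print(verify_record(secret, b"order:17;total:999", tag))  # modified record
```

A tag like this detects modification of a record by anyone without the key; it does not by itself establish whether the shard holding the record is trustworthy.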
LN43:
- possible to select db engine per microservice => database per service pattern
- more freedom to select tools
- designing and scaling database easier if there are fewer tables and microservice has full control of data and schemas
- some microservices can even use no database
- e.g. persist to disk instead
LM48:
- Context: microservice migration; describes an example project (FX Core) and compares it back to the monolith
- Use Docker Swarm
- Some services utilize clustering
- like RabbitMQ, Icinga, all dbs: RadioNodeList, Cassandra, PostgreSQL
- due to availability requirements: are critical components of infrastructure
- deployment on each Swarm cluster node in each data center
- => can keep running as long as there is a single data center available with an active Swarm node
Interview D:
- which constraints do I have for used databases?
- replication mechanisms
- consistency levels
- anecdote: had a customer who wanted to use Cassandra for the wrong reasons
- cool piece of tech
- Merkle Trees, Gossip protocols
- much fun, but almost nobody understands what that means
- but if I need a high level of consistency => not built for that
- it's a last-man-standing technology
- takes write requests as long as there is one node available
- consistency, who cares
- cool piece of tech
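The trade-off behind the interview remarks on replication and consistency levels can be made explicit with the usual quorum overlap rule for replicated stores: a read is guaranteed to touch at least one replica holding the latest acknowledged write only if the number of write acknowledgements plus the number of replicas read exceeds the replica count. The sketch below is a simplified model, not Cassandra's API; the function and parameter names are hypothetical.

```python
def reads_see_latest_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True if every read set overlaps every write set (W + R > N)."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acknowledgement counts must be between 1 and replicas")
    return write_acks + read_acks > replicas


def write_failures_tolerated(replicas: int, write_acks: int) -> int:
    """Number of replicas that may be down while writes are still accepted."""
    return replicas - write_acks


if __name__ == "__main__":
    n = 3  # assumed replication factor
    # "Last man standing": one acknowledgement is enough for writes and reads.
    print(reads_see_latest_write(n, 1, 1), write_failures_tolerated(n, 1))
    # Majority quorum on both sides: overlap is guaranteed, fewer failures tolerated.
    print(reads_see_latest_write(n, 2, 2), write_failures_tolerated(n, 2))
```

With one acknowledgement per operation, writes survive the loss of all but one replica, but reads may return stale data; raising the acknowledgement counts buys consistency at the cost of availability.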