Bulkheads

Context

Microservices are being adopted. Communication between microservices happens over an unreliable network.

Problem

  • A failing or degraded downstream microservice can exhaust a shared resource (e.g. a thread pool) on the upstream microservice. Once that resource is exhausted, the upstream microservice can no longer process any incoming requests, including requests that do not involve the failing downstream service. The failure cascades through the system.

Solution

Use bulkheads that partition the critical resource into isolated resource pools. For example, each downstream service can have its own thread pool for outgoing requests. If one resource pool becomes exhausted, the others are not affected. Part of the service's functionality remains available, which prevents cascading failures.
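
The thread-pool partitioning described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `Bulkhead` class, the `BulkheadFullError` exception, the downstream names "orders" and "payments", and the pool size of 2 are all hypothetical and chosen only for the example. Each downstream service gets its own pool, and a saturated pool rejects new calls immediately instead of queueing them.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BulkheadFullError(RuntimeError):
    """Raised when a partition has no free capacity (hypothetical name)."""


class Bulkhead:
    """One isolated thread pool with a bounded number of in-flight calls."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._pool = ThreadPoolExecutor(
            max_workers=max_concurrent, thread_name_prefix=name
        )
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def submit(self, fn, *args, **kwargs) -> Future:
        # Fail fast instead of queueing when this partition is saturated.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except Exception:
            self._slots.release()
            raise
        # Free the slot once the call finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def demo() -> dict:
    # Assumed setup: two downstream services, each with its own partition.
    orders = Bulkhead("orders", max_concurrent=2)
    payments = Bulkhead("payments", max_concurrent=2)
    release = threading.Event()

    def hanging_call() -> str:
        # Simulates a degraded downstream service that does not answer.
        release.wait(timeout=5)
        return "late"

    def healthy_call() -> str:
        return "ok"

    # Saturate the "orders" partition with calls that hang.
    stuck = [orders.submit(hanging_call) for _ in range(2)]

    # A further "orders" call is rejected right away ...
    try:
        orders.submit(hanging_call)
        rejected = False
    except BulkheadFullError:
        rejected = True

    # ... while the "payments" partition still has all of its threads.
    payment_result = payments.submit(healthy_call).result(timeout=1)

    release.set()
    late = [f.result(timeout=5) for f in stuck]
    orders.shutdown()
    payments.shutdown()
    return {
        "orders_rejected": rejected,
        "payments_result": payment_result,
        "orders_late": late,
    }


if __name__ == "__main__":
    print(demo())
```

In this sketch the hanging "orders" calls occupy only the threads of their own partition, so the "payments" call still completes. Rejecting immediately when a partition is full is one possible design choice; a bounded queue or a timeout on acquiring a slot would be alternatives.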

Infrastructure components such as service meshes may offer bulkhead implementations out of the box.

Maturity

More data required (only 3 sources).

Sources of Evidence

L35:

  • Bulkheads
    • fault isolation within a microservice
    • a shared thread pool is used to make API calls to multiple microservices => thread pool resources can be quickly exhausted when one of the downstream services degrades
    • resource exhaustion => incapable of processing new requests
    • => assign independent thread pool for each type of dependent microservice being called

L61:

  • mentions the bulkhead pattern as a finding in one examined primary study (P42)
  • supports fault isolation within a microservice

Interview B:

  • Service mesh can take over bulkhead pattern