Degradation of Functionality
Context
Microservices are in use or are planned to be adopted. Microservices communicate with each other over the network to a certain degree. Because the network is unreliable, this inevitably leads to failures.
Problem
- Failures at downstream services (e.g. unavailability) cascade to other services.
- Failures of single services endanger the health of the overall system.
- Independent deployment is not possible because some microservices require other microservices to be available.
- There is a mandatory starting order.
Solution
When a downstream service fails, degrade the service functionality instead of stopping service altogether, in order to minimize the impact of partial outages.
The basis for degrading functionality is a service cut with low domain coupling. Additionally, we advise looking for domain-motivated alternatives that can stand in for the failed functionality.
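A minimal sketch of such a domain-motivated alternative, loosely modelled on the search fallback reported in the evidence below. The names `search_with_fallback`, `primary` and `alternative` are hypothetical and chosen only for illustration; a real service would catch narrower exception types than `Exception`.

```python
from typing import Callable, List, Tuple

SearchFn = Callable[[str], List[str]]


def search_with_fallback(query: str, primary: SearchFn,
                         alternative: SearchFn) -> Tuple[List[str], bool]:
    """Return (results, degraded).

    The primary search is tried first. If it fails, a simpler
    alternative answers the request and the result is flagged as
    degraded so that callers can react (e.g. show a hint in the UI).
    """
    try:
        return primary(query), False
    except Exception:  # assumption: any error counts as a downstream failure
        return alternative(query), True
```

The `degraded` flag keeps the reduced quality visible to the caller instead of hiding it.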
Data replication mechanisms allow a service to fall back on replicated data instead of failing. However, the replicated data might be out of date, so it should be evaluated whether this is acceptable in the specific case.
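The following sketch illustrates one possible way to fall back on a local copy while bounding its staleness. The class `ReplicaFallback`, the `max_age_s` limit and the status strings are assumptions made for this example, not part of any particular replication mechanism.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReplicaFallback:
    """Serve the last locally replicated value when the downstream call fails."""

    def __init__(self, fetch: Callable[[str], str], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # call to the downstream service
        self._max_age_s = max_age_s    # how old a replica may be and still be served
        self._clock = clock
        self._replica: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Tuple[Optional[str], str]:
        """Return (value, status); status is 'fresh', 'stale' or 'unavailable'."""
        try:
            value = self._fetch(key)
        except Exception:  # assumption: any error counts as a downstream failure
            entry = self._replica.get(key)
            if entry is not None and self._clock() - entry[1] <= self._max_age_s:
                return entry[0], "stale"
            return None, "unavailable"
        # Successful call: refresh the local copy.
        self._replica[key] = (value, self._clock())
        return value, "fresh"
```

Whether a stale value is acceptable, and for how long, is a domain decision; the `max_age_s` parameter only makes that decision explicit.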
Degrading functionality contributes to the self-containment and independence of microservices. It lays the foundation for independent deployments without a mandatory starting order and for zero-downtime upgrades.
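One way to avoid a mandatory starting order is to connect to a dependency on first use rather than at startup. The sketch below assumes a hypothetical `connect` factory that returns a callable client; `LazyDependency` and its `default` parameter are likewise illustrative names.

```python
from typing import Callable, Optional

Client = Callable[[str], str]


class LazyDependency:
    """Connect on first use so the owning service can start before its dependency."""

    def __init__(self, connect: Callable[[], Client]) -> None:
        self._connect = connect        # not invoked here: startup never blocks
        self._client: Optional[Client] = None

    def call(self, request: str, default: str) -> str:
        """Forward the request, or return `default` while the dependency is down."""
        try:
            if self._client is None:
                self._client = self._connect()
            return self._client(request)
        except Exception:  # assumption: any error counts as a downstream failure
            self._client = None        # force a reconnect on the next call
            return default
```

With this shape, the service can be deployed and started in any order and recovers on its own once the dependency becomes reachable.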
Maturity
Proposed, evaluation required.
Sources of Evidence
L6:
- Gradually degrading system functionality contributes to resilience
L20:
- Even if a microservice uses functions of other microservices, it can be deployed without them being available
- deployed microservices continue to be running even if dependencies are not there
- even if partly working
- (+) simplifies orchestration and deployment
- => contributes to reliability, fault isolation, live updates
L35:
- ElasticPress: fallback to a MySQL-powered search method when the Elasticsearch instance was unreachable or returned an error
L61:
- tolerate failures by either
- recovering as fast as possible
- or by gracefully degrading its functionalities
Interview D:
- Degradation of service on failures
- only possible with low domain coupling (don't confuse with technical coupling)
- need to compensate when the communication partner is not answering in order to remain able to act
- tradeoff with reuse of code/components/functionality
Interview E:
- Context: how to handle errors by showing alternatives
- Known in customer-facing UI areas where you want to satisfy the customers
- domains where a delay of milliseconds causes a huge loss in revenue
- => let customer feel an elevated performance by showing ghost elements or something prerendered
Interview F:
- Context: resilience and fault tolerance
- which dependencies do I have to other services? What happens if they are not available?
- example: degradation
- you need to think about how this is okay or if I need to pull up my feet as well