Rollbacks

Context

Microservices are being adopted. Multiple microservice instances are being deployed. There is a service discovery mechanism in place.

Problem

  • Faulty releases lead to downtime of parts of the system or of the system as a whole

Solution

Prepare for potential rollbacks when deploying a new release of a microservice.

However good the quality assurance measures are, chances are that a faulty release will eventually make it into production. Thus, this case needs to be prepared for in order to keep downtime as short as possible, ideally close to zero. Cluster managers, cloud platforms, or serverless platforms might offer capabilities for rollbacks.

The enabling factors are the immutable server pattern facilitated by containers and the stateless design of microservices. However, there is a need to roll back schema migrations in the database as well.
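
The schema concern can be addressed with reversible migrations. A minimal sketch of the idea, with all names hypothetical and not tied to any concrete migration tool: each migration carries an "up" and a "down" step, so a release rollback can also revert the database schema by replaying the "down" steps in reverse order.

```python
# Hypothetical sketch of reversible schema migrations; a toy in-memory
# "schema" stands in for a real database.

class Migration:
    def __init__(self, name, up, down):
        self.name = name
        self.up = up      # callable applying the schema change
        self.down = down  # callable reverting the schema change

class MigrationRunner:
    def __init__(self):
        self.applied = []  # stack of applied migrations

    def migrate(self, migrations):
        for m in migrations:
            m.up()
            self.applied.append(m)

    def rollback(self, steps=1):
        # Revert the most recent migrations in reverse order.
        for _ in range(min(steps, len(self.applied))):
            m = self.applied.pop()
            m.down()

schema = {"orders": ["id"]}
add_col = Migration(
    "add_status_column",
    up=lambda: schema["orders"].append("status"),
    down=lambda: schema["orders"].remove("status"),
)
runner = MigrationRunner()
runner.migrate([add_col])   # schema now has the "status" column
runner.rollback()           # schema reverted to its previous shape
```

The key design point is that a migration is only safe to ship alongside a rollback-ready release if its "down" step exists and is tested.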

We advise automating the rollback process early, as it will become unmanageable with an increasing number of microservices. If you aim for zero downtime, we recommend linking the rollback mechanism with other zero-downtime deployment techniques that keep the old version available for a rollback for a certain period.
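
The automated rollback decision described above can be sketched as follows; `deploy_with_rollback`, the release names, and the health check are all illustrative assumptions, not a concrete platform API.

```python
# Minimal sketch of an automated rollback: the previous release stays
# available, and a failing health check on the new release triggers an
# automatic rollback to it.

def deploy_with_rollback(releases, new_release, healthy):
    """Deploy new_release; roll back to the previous release if the
    health check fails. `releases` is the ordered deployment history,
    `healthy` a callable probing the given release."""
    previous = releases[-1] if releases else None
    releases.append(new_release)          # old version stays available
    if healthy(new_release):
        return new_release                # release accepted
    releases.pop()                        # automated rollback
    return previous

history = ["v1.3", "v1.4"]
# A faulty v1.5 fails its health check and is rolled back to v1.4:
active = deploy_with_rollback(history, "v1.5", healthy=lambda r: r != "v1.5")
```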

Maturity

Proposed, to be evaluated.

Sources of Evidence

L9:

  • containers => support fast rollback
  • problem occurs in new release
    • no need to debug and fix in the production environment
    • just rollback to previous container image
    • wait until new images with fixes become available

L16:

  • Immutable server pattern => if something in the new service is faulty: the old version of the application can easily be redeployed by replacing the new version again
  • routers to switch between versions (e.g. load balancer)
    • rollback: just reroute traffic back to the old version
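
The router-based switch described in these notes can be sketched with a toy stand-in for a real load balancer (all names hypothetical): rollback is just repointing traffic to the old version, which is kept running.

```python
# Toy router in front of two versions of the same service.

class VersionRouter:
    def __init__(self, old, new):
        self.backends = {"old": old, "new": new}
        self.active = "new"   # traffic initially goes to the new release

    def handle(self, request):
        return self.backends[self.active](request)

    def rollback(self):
        self.active = "old"   # reroute traffic; no redeploy needed

router = VersionRouter(old=lambda req: f"v1:{req}",
                       new=lambda req: f"v2:{req}")
router.handle("ping")   # served by the new version
router.rollback()
router.handle("ping")   # served by the old version again
```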

L17:

  • Context: interview study
  • automatically replace faulty services with previous version of same service to deliver correct output without need to restart the whole application

L19:

  • Context: leveraging Docker to do microservices
  • rollback to any version of the application as needed
    • map corresponding version of docker container to code-level docker compose file
    • push it back to mainstream dedicated chageback-orchestrate github repository

L20:

  • 19 of the selected studies highlight that microservices can be deployed and *undeployed* independently

L23:

  • Kubernetes provides automated rollouts and rollbacks and self-healing features

L25:

  • Self-healing property: failed service easily restarted or replaced by previous versions

L31:

  • MP14: Container orchestrators make it easy to gradually replace a service's instances and to roll back gradually in case of a failure

L58:

  • keep two different versions of the same service running in parallel
    • QA on the new version
    • possibility to rollback to old version if needed
    • => no downtime

LN48:

  • Context: service update
  • microservices must be able to constantly update
  • enterprise applications must have zero downtime => reduce update time
    • => ensure application can roll back to its previous state in case of any failure
    • proposed model serves the purpose of rolling updates with minimum downtime
  • virtualization (VM or container) helps to reduce effort for rolling updates
  • ensure for every update previous version is up and running
    • => in case of failure: roll back to previous state
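
The rolling-update scheme in this source can be sketched as follows; `rolling_update` and the crash condition are purely illustrative assumptions. Instances are replaced one at a time, and the first failure rolls back every instance touched so far.

```python
# Sketch of a rolling update that keeps the previous version available
# so a failure mid-update can be fully reverted.

def rolling_update(instances, new_version, starts_ok):
    """Replace each instance's version in place; on failure, restore
    the old version on instances updated so far and report failure."""
    old_versions = list(instances)
    for i in range(len(instances)):
        instances[i] = new_version
        if not starts_ok(i, new_version):
            # Roll back every instance touched so far.
            for j in range(i + 1):
                instances[j] = old_versions[j]
            return False
    return True

fleet = ["v1", "v1", "v1"]
# The new version fails on the second instance => full rollback:
ok = rolling_update(fleet, "v2", starts_ok=lambda i, v: i != 1)
```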

Interview B:

  • Assume a service responds with wrong payload or similar
  • fallback: an old service
    • fast automated rollback
  • (+) enables trust for reduced coordination
  • Importance to recover service brought into production
    • fast rollback
      • includes database
    • best automated
    • in cases where determined the service doesn't work as planned
    • Best case: doesn't go into production thanks to tests
      • Practice: it happens
    • bringing into production easy, getting out of production often neglected
      • is a challenge
    • start automating early, as rollbacks will eventually require automation when the number of services increases

Interview F:

  • Context: coordinated release, planned ahead
  • if something happened despite the integration in a dedicated integration environment
    • roll back the service that broke the system
    • then wait until other microservices adapted to release
  • Context: how hard is it to roll back
    • a microservice brings everything it needs, is locally confined
    • and it usually was detected in the staging environment
    • that means no production data => test data set
    • destroying that was not a big issue => recreate test data set easily
    • wouldn't want to do this in the production environment
      • in the worst case: use backups to restore the state after a breaking change