Rollbacks

Context

Microservices are being adopted. Multiple microservice instances are being deployed. There is a service discovery mechanism in place.

Problem

  • Faulty releases lead to downtime of parts of the system or of the system as a whole

Solution

Prepare for potential rollbacks when deploying a new release of a microservice.

However good the quality assurance measures are, chances are that a faulty release will eventually make it into production. Thus, this case needs to be prepared for in order to keep downtime as short as possible, ideally close to zero. Cluster managers, cloud platforms, or serverless platforms might offer capabilities for rollbacks.

The enabling factors are the immutable server pattern facilitated by containers and the stateless design of microservices. However, there is a need to roll back schema migrations in the database as well.
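
The schema concern can be addressed with reversible migrations. A minimal sketch of the idea, with all names hypothetical and not tied to any concrete migration tool: each migration carries an "up" and a "down" step, so a release rollback can also revert the database schema by replaying the "down" steps in reverse order.

```python
# Hypothetical sketch of reversible schema migrations; a toy in-memory
# "schema" stands in for a real database.

class Migration:
    def __init__(self, name, up, down):
        self.name = name
        self.up = up      # callable applying the schema change
        self.down = down  # callable reverting the schema change

class MigrationRunner:
    def __init__(self):
        self.applied = []  # stack of applied migrations

    def migrate(self, migrations):
        for m in migrations:
            m.up()
            self.applied.append(m)

    def rollback(self, steps=1):
        # Revert the most recent migrations in reverse order.
        for _ in range(min(steps, len(self.applied))):
            m = self.applied.pop()
            m.down()

schema = {"orders": ["id"]}
add_col = Migration(
    "add_status_column",
    up=lambda: schema["orders"].append("status"),
    down=lambda: schema["orders"].remove("status"),
)
runner = MigrationRunner()
runner.migrate([add_col])   # schema now has the "status" column
runner.rollback()           # schema reverted to its previous shape
```

The key design point is that a migration is only safe to ship alongside a rollback-ready release if its "down" step exists and is tested.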

We advise automating the rollback process early, as it will become unmanageable with an increasing number of microservices. If you aim for zero downtime, we recommend linking the rollback mechanism with other zero-downtime deployment techniques that keep the old version available for a rollback for a certain period.
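
The automated rollback decision described above can be sketched as follows; `deploy_with_rollback`, the release names, and the health check are all illustrative assumptions, not a concrete platform API.

```python
# Minimal sketch of an automated rollback: the previous release stays
# available, and a failing health check on the new release triggers an
# automatic rollback to it.

def deploy_with_rollback(releases, new_release, healthy):
    """Deploy new_release; roll back to the previous release if the
    health check fails. `releases` is the ordered deployment history,
    `healthy` a callable probing the given release."""
    previous = releases[-1] if releases else None
    releases.append(new_release)          # old version stays available
    if healthy(new_release):
        return new_release                # release accepted
    releases.pop()                        # automated rollback
    return previous

history = ["v1.3", "v1.4"]
# A faulty v1.5 fails its health check and is rolled back to v1.4:
active = deploy_with_rollback(history, "v1.5", healthy=lambda r: r != "v1.5")
```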

Maturity

Proposed, to be evaluated.

Sources of Evidence

L9:

  • containers => support fast rollback
  • problem occurs in new release
    • no need to debug and fix in the production environment
    • just rollback to previous container image
    • wait until new images with fixes become available

L16:

  • Immutable server pattern => if something in the new service is faulty: the old version of the application can easily be redeployed by replacing the new version again
  • routers to switch between versions (e.g. load balancer)
    • rollback: just reroute traffic back to the old version
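
The router-based switch described in these notes can be sketched with a toy stand-in for a real load balancer (all names hypothetical): rollback is just repointing traffic to the old version, which is kept running.

```python
# Toy router in front of two versions of the same service.

class VersionRouter:
    def __init__(self, old, new):
        self.backends = {"old": old, "new": new}
        self.active = "new"   # traffic initially goes to the new release

    def handle(self, request):
        return self.backends[self.active](request)

    def rollback(self):
        self.active = "old"   # reroute traffic; no redeploy needed

router = VersionRouter(old=lambda req: f"v1:{req}",
                       new=lambda req: f"v2:{req}")
router.handle("ping")   # served by the new version
router.rollback()
router.handle("ping")   # served by the old version again
```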

L17:

  • Context: interview study
  • automatically replace faulty services with previous version of same service to deliver correct output without need to restart the whole application

L19:

  • Context: leveraging Docker to do microservices
  • rollback to any version of the application as needed
    • map corresponding version of docker container to code-level docker compose file
    • push it back to mainstream dedicated chageback-orchestrate github repository

L20:

  • 19 of the selected studies highlight that microservices can be deployed and *undeployed* independently

L23:

  • Kubernetes provides automated rollouts and rollbacks and self-healing features

L25:

  • Self-healing property: failed service easily restarted or replaced by previous versions

L31:

  • MP14: Container orchestrators make it easy to gradually replace a service's instances and to roll back gradually in case of a failure

L58:

  • keep two different versions of the same service running in parallel
    • QA on the new version
    • possibility to rollback to old version if needed
    • => no downtime

LN48:

  • Context: service update
  • microservices must be able to constantly update
  • enterprise applications must have zero downtime => reduce update time
    • => ensure application can roll back to its previous state in case of any failure
    • proposed model serves the purpose of rolling updates with minimum downtime
  • virtualization (VM or container) helps to reduce effort for rolling updates
  • ensure for every update previous version is up and running
    • => in case of failure: roll back to previous state
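
The rolling-update scheme in this source can be sketched as follows; `rolling_update` and the crash condition are purely illustrative assumptions. Instances are replaced one at a time, and the first failure rolls back every instance touched so far.

```python
# Sketch of a rolling update that keeps the previous version available
# so a failure mid-update can be fully reverted.

def rolling_update(instances, new_version, starts_ok):
    """Replace each instance's version in place; on failure, restore
    the old version on instances updated so far and report failure."""
    old_versions = list(instances)
    for i in range(len(instances)):
        instances[i] = new_version
        if not starts_ok(i, new_version):
            # Roll back every instance touched so far.
            for j in range(i + 1):
                instances[j] = old_versions[j]
            return False
    return True

fleet = ["v1", "v1", "v1"]
# The new version fails on the second instance => full rollback:
ok = rolling_update(fleet, "v2", starts_ok=lambda i, v: i != 1)
```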

Interview B:

  • Assume a service responds with wrong payload or similar
  • fallback: an old service
    • fast automated rollback
  • (+) enables trust for reduced coordination
  • Importance to recover service brought into production
    • fast rollback
      • includes database
    • best automated
    • in cases where determined the service doesn't work as planned
    • Best case: doesn't go into production thanks to tests
      • Practice: it happens
    • bringing into production easy, getting out of production often neglected
      • is a challenge
    • start automating early, as rollbacks will eventually require automation when the number of services increases

Interview F:

  • Context: coordinated release, planned ahead
  • if something happened despite the integration in a dedicated integration environment
    • roll back the service that broke the system
    • then wait until other microservices adapted to release
  • Context: how hard is it to roll back
    • a microservice brings everything it needs, is locally confined
    • and it usually was detected in the staging environment
    • that means no production data => test data set
    • destroying that was not a big issue => recreate test data set easily
    • wouldn't want to do this in the production environment
      • in the worst case: use backups to restore the state after a breaking change