Auto-Scale Instances Based on Metrics
Context
Microservices are being adopted. Each microservice can be deployed with a configurable number of instances. There are mechanisms to discover service instances and to balance the load between the instances of a microservice.
Problem
- Manual scaling by starting and stopping microservice instances becomes very laborious as the number of microservices grows
- Load spikes need to be known in advance
- The manual process is error-prone
Solution
Implement an auto-scaling mechanism for the instances of the microservices. The mechanism should use monitoring metrics to track the load of each microservice and scale the number of instances up or down depending on that load.
Depending on where you deploy your microservice application, the deployment environment might already offer auto-scaling features. Cloud providers usually offer their own rule-based reactive runtime scheduling mechanisms, and cluster managers like Kubernetes come with auto-scaling capabilities as well.
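To make the rule-based reactive approach concrete, here is a minimal Python sketch of such a scaler loop. The metric source and the scaling backend (get_average_cpu, get_instance_count, set_instance_count) are hypothetical placeholders for whatever monitoring system and deployment environment are actually used; thresholds and limits are illustrative.

```python
import time


# Hypothetical hooks into the monitoring system and the deployment environment.
def get_average_cpu(service: str) -> float:
    """Return the average CPU utilization (0-100) across all instances."""
    raise NotImplementedError  # placeholder: query your monitoring system here


def get_instance_count(service: str) -> int:
    raise NotImplementedError  # placeholder: ask the deployment environment


def set_instance_count(service: str, count: int) -> None:
    raise NotImplementedError  # placeholder: instruct the deployment environment


def autoscale(service: str, upper: float = 70.0, lower: float = 30.0,
              min_instances: int = 2, max_instances: int = 10,
              interval_s: float = 30.0) -> None:
    """Rule-based reactive scaling: add an instance when the upper threshold
    is exceeded, remove one below the lower threshold, within min/max."""
    while True:
        cpu = get_average_cpu(service)
        instances = get_instance_count(service)
        if cpu > upper and instances < max_instances:
            set_instance_count(service, instances + 1)
        elif cpu < lower and instances > min_instances:
            set_instance_count(service, instances - 1)
        time.sleep(interval_s)
```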
An emerging research topic is performance modeling and prediction, which might further optimize automated microservice instance scaling.
Maturity
Proposed, requires evaluation.
Sources of Evidence
L3:
- scaling each service individually (I assume manually) is a major effort and error-prone
L14:
- Cloud infrastructure can adapt to the workload automatically through elastic capacity management
- at Otto.de: monitor CPU and the number of incoming requests => automatic capacity management
- Developers can deploy and scale without any support from the operations team
L15:
- Serverless benchmarking that scales automatically => elasticity
L23:
- Kubernetes provides horizontal scaling
- manually or automatically based on CPU load
- automatic elasticity is agnostic to workload behavior and performance objectives
- AWS, Google Cloud, Rackspace, Azure, and IBM clouds provide reactive runtime scheduling techniques for scaling based on threshold rules
- e.g. average CPU is over 40 percent
- challenge: learn and predict microservice-specific workload (see the sketch after this list)
- request arrival pattern
- CPU usage pattern
- memory usage pattern
- I/O system behaviors
- request processing time distributions
- network usage patterns
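To make the listed workload characteristics concrete, the sketch below shows a hypothetical per-microservice workload profile that a predictive scaler could learn from; the field names are illustrative and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkloadProfile:
    """Workload characteristics of one microservice (illustrative fields)."""
    service: str
    request_arrivals_per_s: List[float] = field(default_factory=list)  # request arrival pattern
    cpu_usage_percent: List[float] = field(default_factory=list)       # CPU usage pattern
    memory_usage_mb: List[float] = field(default_factory=list)         # memory usage pattern
    io_ops_per_s: List[float] = field(default_factory=list)            # I/O system behavior
    processing_time_ms: List[float] = field(default_factory=list)      # request processing time distribution
    network_bytes_per_s: List[float] = field(default_factory=list)     # network usage pattern
```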
L30:
- Virtualization and infrastructure technologies such as Docker + Kubernetes allow for auto-scaling
- Use additional metrics
- e.g. on container level: CPU, memory, startup times of containers
L31:
- Migration Pattern "Deploy into a cluster and orchestrate containers"
- Cluster / Container orchestrator should allow for restarting containers on failure and auto-scaling
- Declarative configuration => cluster management tool should handle auto-scaling
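As an example of such declarative configuration, a Kubernetes HorizontalPodAutoscaler can be expressed as follows (shown here as a Python dictionary; in practice this is usually a YAML manifest). The deployment name and all numbers are illustrative.

```python
import json

# Declarative auto-scaling: Kubernetes keeps the replica count between
# minReplicas and maxReplicas based on observed CPU utilization.
hpa_manifest = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "orders-service"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "orders-service",
        },
        "minReplicas": 2,
        "maxReplicas": 10,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

print(json.dumps(hpa_manifest, indent=2))  # kubectl also accepts JSON manifests
```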
L34:
- Auto-scaling as part of DevOps practices
L49:
- PaaS cloud platforms manage load balancing, service naming, etc., and provide middleware for message communication
- to scale the system automatically
L61:
- Serverless platform handles monitoring and logging
- on different levels: OS, container, communication, ...
- does automatic horizontal scaling
- Service orchestration as part of system level management (includes auto-scaling)
L63:
- Context: MiCADO concept and architecture
- Application definition layer: includes infrastructure and QoS specifications to define service requirements
- continuous collection of metrics from running application
- Optimization decision maker component analyzes data on performance/cost
- Deployment executor makes adjustments to scale the infrastructure up or down
- Users can adjust requirements of microservices at runtime
- Implementation of MiCADO prototype
- Prometheus to collect information about services
- processed by Alert manager
- Alert executor instructs Occopus to launch or shut down cloud instances if a bottleneck is detected or the infrastructure is underutilized (see the sketch below)
- MiCADO prototype evaluation
- scale up app node: 300s
- scale down app node: 12s
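A rough sketch of the monitoring-to-scaling loop described for the MiCADO prototype: a metric is fetched via Prometheus' HTTP query API and a scaling action is triggered when a bottleneck or underutilization is detected. The thresholds, the metric expression, and the scale_out/scale_in hooks (standing in for the Alert executor instructing Occopus) are assumptions for illustration.

```python
import requests  # third-party HTTP client

PROMETHEUS_URL = "http://localhost:9090"  # assumed Prometheus endpoint


def query_average_cpu(job: str) -> float:
    """Fetch average CPU usage for a scrape job via Prometheus' query API."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": f'avg(rate(process_cpu_seconds_total{{job="{job}"}}[1m])) * 100'},
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


# Hypothetical stand-ins for the Alert executor instructing Occopus.
def scale_out(node_type: str) -> None:
    print(f"launching one more {node_type} instance")


def scale_in(node_type: str) -> None:
    print(f"shutting down one {node_type} instance")


def decide(job: str, node_type: str, upper: float = 80.0, lower: float = 20.0) -> None:
    cpu = query_average_cpu(job)
    if cpu > upper:      # bottleneck detected
        scale_out(node_type)
    elif cpu < lower:    # underutilized infrastructure
        scale_in(node_type)
```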
LN21:
- industrial systems run on highly complex infrastructures such as auto-scaling microservice clusters and service meshes
- => challenges on execution tracing and visualization
LN43:
- current developments in cloud services make automatic scaling of resources very easy and cost-efficient
- microservices can make the most of this automatic scaling
- monoliths cannot take full advantage
- if Docker is used in the production environment => multiple tools exist to handle scaling, deployment, and management of containers
- e.g. Kubernetes => makes solving these challenges easier
- horizontal scaling among others
LM47:
- Context: SLR with tactics to achieve quality attributes
- Auto-scaling
- reactive: threshold-based
- scaler decides to scale in or out according to upper and lower thresholds on the monitored data
- CPU utilization, HTTP latency, HTTP throughput, message queue metrics
- if a threshold is exceeded: increase or decrease the instances of a microservice
- thresholds should be adaptable
- can be optimized by heuristics to achieve specific goals
- proactive: workload prediction
- predict number of requests for each microservice
- convert the prediction into the probably required resources while still achieving the SLA
- match resource supply to the predicted demand: available and cost-efficient
- usually uses an optimization algorithm
- needs the system to run for a period of time to ensure the accuracy of predictions
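A minimal sketch of this proactive tactic: predict the upcoming request rate from recent observations (here with a simple moving average as a stand-in for the optimization algorithms mentioned in the source) and convert it into a number of instances, assuming a per-instance capacity derived from the SLA. All numbers are illustrative.

```python
import math
from typing import Sequence


def predict_request_rate(recent_rates: Sequence[float], window: int = 5) -> float:
    """Naive predictor: moving average over the last `window` samples.
    Real approaches use forecasting/optimization algorithms and need the
    system to run for a while before predictions become accurate."""
    samples = list(recent_rates)[-window:]
    return sum(samples) / len(samples) if samples else 0.0


def required_instances(predicted_rps: float,
                       capacity_rps_per_instance: float,
                       min_instances: int = 1,
                       max_instances: int = 20) -> int:
    """Convert a predicted request rate into the probably required number of
    instances, assuming each instance serves `capacity_rps_per_instance`
    requests per second while still meeting the SLA."""
    needed = math.ceil(predicted_rps / capacity_rps_per_instance)
    return max(min_instances, min(max_instances, needed))


# Example: recent request rates (requests per second) of one microservice
history = [120.0, 140.0, 180.0, 210.0, 250.0]
print(required_instances(predict_request_rate(history), capacity_rps_per_instance=50.0))  # -> 4
```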
Interview A:
- Kubernetes does auto-scaling
- Replica sets: the platform makes sure there are X instances, distributed across different nodes or even ESX hosts (different fire sections in the server room)
- => resilience: handles failure (restart)
- Combination with load balancing
- Work with min and max!
- Standard Metrics: CPU, RAM
- Liveness probes and health checks
- On different levels: Node, OS, Kubernetes, Pod, Service, Namespace, ...
- fast detection
- easier to localize in error case
- Learning: define max resources
- Otherwise this might lead to cluster overflow (an upwards spiral in case of a bug)
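Reflecting the advice to work with min/max and to define maximum resources, the fragment below shows a container spec with explicit resource requests/limits and a liveness probe (again written as a Python dictionary; in practice part of a YAML Deployment manifest). The image name, paths, and values are made up.

```python
# Fragment of a pod/container spec: explicit requests and limits keep a
# misbehaving service from consuming the whole cluster (the "upwards spiral"),
# and the liveness probe lets the platform restart unhealthy instances.
container_spec = {
    "name": "orders-service",
    "image": "registry.example.com/orders-service:1.0",  # hypothetical image
    "resources": {
        "requests": {"cpu": "250m", "memory": "256Mi"},
        "limits": {"cpu": "500m", "memory": "512Mi"},
    },
    "livenessProbe": {
        "httpGet": {"path": "/health", "port": 8080},
        "initialDelaySeconds": 10,
        "periodSeconds": 15,
    },
}
```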
Interview B:
- automated up and down scaling
Interview C:
- auto-scaling as complex topic
- platforms, libs, frameworks can offer help but no complete abstraction
- but you still need to understand the problem and handle it yourself
- configure them correctly
- know the right scenarios for the right cases