Auto-Scale Instances Based on Metrics
Context
Microservices are being adopted. Each microservice can be deployed with a configurable number of instances. There are mechanisms to discover service instances and to balance the load between the instances of a microservice.
Problem
- Manual scaling by starting and stopping microservice instances becomes very laborious as the number of microservices grows
- Load spikes need to be known in advance
- The manual process is error-prone
Solution
Implement an auto-scaling mechanism for the instances of the microservices. The mechanism should use monitoring metrics to track the load of each microservice and scale the number of instances up or down depending on that load.
Depending on where you deploy your microservice application, the deployment environment might already offer auto-scaling features. Cloud providers usually offer their own rule-based reactive runtime scheduling mechanisms, and cluster managers like Kubernetes come with auto-scaling capabilities as well.
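To make the rule-based reactive approach concrete, here is a minimal Python sketch of such a scaler loop. The metric source and the scaling backend (get_average_cpu, get_instance_count, set_instance_count) are hypothetical placeholders for whatever monitoring system and deployment environment are actually used; thresholds and limits are illustrative.

```python
import time


# Hypothetical hooks into the monitoring system and the deployment environment.
def get_average_cpu(service: str) -> float:
    """Return the average CPU utilization (0-100) across all instances."""
    raise NotImplementedError  # placeholder: query your monitoring system here


def get_instance_count(service: str) -> int:
    raise NotImplementedError  # placeholder: ask the deployment environment


def set_instance_count(service: str, count: int) -> None:
    raise NotImplementedError  # placeholder: instruct the deployment environment


def autoscale(service: str, upper: float = 70.0, lower: float = 30.0,
              min_instances: int = 2, max_instances: int = 10,
              interval_s: float = 30.0) -> None:
    """Rule-based reactive scaling: add an instance when the upper threshold
    is exceeded, remove one below the lower threshold, within min/max."""
    while True:
        cpu = get_average_cpu(service)
        instances = get_instance_count(service)
        if cpu > upper and instances < max_instances:
            set_instance_count(service, instances + 1)
        elif cpu < lower and instances > min_instances:
            set_instance_count(service, instances - 1)
        time.sleep(interval_s)
```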
An emerging research topic is performance modeling and prediction, which might further optimize automated microservice instance scaling.
Maturity
Proposed, requires evaluation.
Sources of Evidence
L3:
- scaling each service individually (I assume manually) is a major effort and error-prone
L14:
- Cloud infrastructure can adapt to the workload automatically through elastic capacity management
- at Otto.de: monitor CPU and the number of incoming requests => automatic capacity management
- Developers can deploy and scale without any support from the operations team
L15:
- Serverless benchmarking that scales automatically => elasticity
L23:
- Kubernetes provides horizontal scaling
- manually or automatically based on CPU load
- automatic elasticity is agnostic to workload behavior and performance objectives
- AWS, Google Cloud, Rackspace, Azure, and IBM clouds provide reactive runtime scheduling techniques for scaling based on threshold rules
- e.g. average CPU is over 40 percent
- challenge: learn and predict microservice-specific workload (see the sketch after this list)
- request arrival pattern
- CPU usage pattern
- memory usage pattern
- I/O system behaviors
- request processing time distributions
- network usage patterns
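To make the listed workload characteristics concrete, the sketch below shows a hypothetical per-microservice workload profile that a predictive scaler could learn from; the field names are illustrative and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkloadProfile:
    """Workload characteristics of one microservice (illustrative fields)."""
    service: str
    request_arrivals_per_s: List[float] = field(default_factory=list)  # request arrival pattern
    cpu_usage_percent: List[float] = field(default_factory=list)       # CPU usage pattern
    memory_usage_mb: List[float] = field(default_factory=list)         # memory usage pattern
    io_ops_per_s: List[float] = field(default_factory=list)            # I/O system behavior
    processing_time_ms: List[float] = field(default_factory=list)      # request processing time distribution
    network_bytes_per_s: List[float] = field(default_factory=list)     # network usage pattern
```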
L30:
- Virtualization and infrastructure technologies such as Docker + Kubernetes allow for auto-scaling
- Use additional metrics
- e.g. on container level: CPU, memory, startup times of containers
L31:
- Migration Pattern "Deploy into a cluster and orchestrate containers"
- Cluster / Container orchestrator should allow for restarting containers on failure and auto-scaling
- Declarative configuration => cluster management tool should handle auto-scaling
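As an example of such declarative configuration, a Kubernetes HorizontalPodAutoscaler can be expressed as follows (shown here as a Python dictionary; in practice this is usually a YAML manifest). The deployment name and all numbers are illustrative.

```python
import json

# Declarative auto-scaling: Kubernetes keeps the replica count between
# minReplicas and maxReplicas based on observed CPU utilization.
hpa_manifest = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "orders-service"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "orders-service",
        },
        "minReplicas": 2,
        "maxReplicas": 10,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

print(json.dumps(hpa_manifest, indent=2))  # kubectl also accepts JSON manifests
```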
L34:
- Auto-scaling as part of DevOps practices
L49:
- PaaS cloud platforms manage load balancing, service naming, etc., and provide middleware for message communication
- to scale the system automatically
L61:
- Serverless platform handles monitoring and logging
- on different levels: OS, container, communication, ...
- does automatic horizontal scaling
- Service orchestration as part of system level management (includes auto-scaling)
L63:
- Context: MiCADO concept and architecture
- Application definition layer: includes infrastructure and QoS specifications to define service requirements
- continuous collection of metrics from running application
- Optimization decision maker component analyzes data on performance/cost
- Deployment executor makes adjustments to scale the infrastructure up or down
- Users can adjust requirements of microservices at runtime
- Implementation of MiCADO prototype
- Prometheus to collect information about services
- processed by Alert manager
- Alert executor instructs Occopus to launch or shut down cloud instances if a bottleneck is detected or the infrastructure is underutilized (see the sketch below)
- MiCADO prototype evaluation
- scale up app node: 300s
- scale down app node: 12s
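A rough sketch of the monitoring-to-scaling loop described for the MiCADO prototype: a metric is fetched via Prometheus' HTTP query API and a scaling action is triggered when a bottleneck or underutilization is detected. The thresholds, the metric expression, and the scale_out/scale_in hooks (standing in for the Alert executor instructing Occopus) are assumptions for illustration.

```python
import requests  # third-party HTTP client

PROMETHEUS_URL = "http://localhost:9090"  # assumed Prometheus endpoint


def query_average_cpu(job: str) -> float:
    """Fetch average CPU usage for a scrape job via Prometheus' query API."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": f'avg(rate(process_cpu_seconds_total{{job="{job}"}}[1m])) * 100'},
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


# Hypothetical stand-ins for the Alert executor instructing Occopus.
def scale_out(node_type: str) -> None:
    print(f"launching one more {node_type} instance")


def scale_in(node_type: str) -> None:
    print(f"shutting down one {node_type} instance")


def decide(job: str, node_type: str, upper: float = 80.0, lower: float = 20.0) -> None:
    cpu = query_average_cpu(job)
    if cpu > upper:      # bottleneck detected
        scale_out(node_type)
    elif cpu < lower:    # underutilized infrastructure
        scale_in(node_type)
```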
LN21:
- industrial systems run on highly complex infrastructures such as auto-scaling microservice clusters and service meshes
- => challenges on execution tracing and visualization
LN43:
- current developments in cloud services make automatic scaling of resources very easy and cost-efficient
- microservices can make the most of this automatic scaling
- monoliths cannot take full advantage
- if Docker is used in the production environment => multiple tools exist to handle scaling, deployment, and management of containers
- e.g. Kubernetes => makes solving these challenges easier
- horizontal scaling among others
LM47:
- Context: SLR with tactics to achieve quality attributes
- Auto-scaling
- reactive: threshold-based
- scaler decides to scale in or out according to upper and lower thresholds on the monitored data
- CPU utilization, HTTP latency, HTTP throughput, message queue metrics
- if a threshold is exceeded: increase or decrease the instances of a microservice
- thresholds should be adaptable
- can be optimized by heuristics to achieve specific goals
- proactive: workload prediction
- predict number of requests for each microservice
- convert the prediction into the probably required resources while still achieving the SLA
- match resource supply to the predicted demand: available and cost-efficient
- usually uses an optimization algorithm
- needs the system to run for a period of time to ensure the accuracy of predictions
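A minimal sketch of this proactive tactic: predict the upcoming request rate from recent observations (here with a simple moving average as a stand-in for the optimization algorithms mentioned in the source) and convert it into a number of instances, assuming a per-instance capacity derived from the SLA. All numbers are illustrative.

```python
import math
from typing import Sequence


def predict_request_rate(recent_rates: Sequence[float], window: int = 5) -> float:
    """Naive predictor: moving average over the last `window` samples.
    Real approaches use forecasting/optimization algorithms and need the
    system to run for a while before predictions become accurate."""
    samples = list(recent_rates)[-window:]
    return sum(samples) / len(samples) if samples else 0.0


def required_instances(predicted_rps: float,
                       capacity_rps_per_instance: float,
                       min_instances: int = 1,
                       max_instances: int = 20) -> int:
    """Convert a predicted request rate into the probably required number of
    instances, assuming each instance serves `capacity_rps_per_instance`
    requests per second while still meeting the SLA."""
    needed = math.ceil(predicted_rps / capacity_rps_per_instance)
    return max(min_instances, min(max_instances, needed))


# Example: recent request rates (requests per second) of one microservice
history = [120.0, 140.0, 180.0, 210.0, 250.0]
print(required_instances(predict_request_rate(history), capacity_rps_per_instance=50.0))  # -> 4
```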
Interview A:
- Kubernetes does auto-scaling
- Replica sets: the platform makes sure there are X instances, distributed across different nodes or even ESX hosts (different fire sections in the server room)
- => resilience: handles failure (restart)
- Combination with load balancing
- Work with min and max!
- Standard Metrics: CPU, RAM
- Liveness probes and health checks
- On different levels: Node, OS, Kubernetes, Pod, Service, Namespace, ...
- fast detection
- easier to localize in error case
- Learning: define max resources
- Otherwise this might lead to cluster overflow (an upwards spiral in case of a bug)
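Reflecting the advice to work with min/max and to define maximum resources, the fragment below shows a container spec with explicit resource requests/limits and a liveness probe (again written as a Python dictionary; in practice part of a YAML Deployment manifest). The image name, paths, and values are made up.

```python
# Fragment of a pod/container spec: explicit requests and limits keep a
# misbehaving service from consuming the whole cluster (the "upwards spiral"),
# and the liveness probe lets the platform restart unhealthy instances.
container_spec = {
    "name": "orders-service",
    "image": "registry.example.com/orders-service:1.0",  # hypothetical image
    "resources": {
        "requests": {"cpu": "250m", "memory": "256Mi"},
        "limits": {"cpu": "500m", "memory": "512Mi"},
    },
    "livenessProbe": {
        "httpGet": {"path": "/health", "port": 8080},
        "initialDelaySeconds": 10,
        "periodSeconds": 15,
    },
}
```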
Interview B:
- automated up and down scaling
Interview C:
- auto-scaling as complex topic
- platforms, libs, frameworks can offer help but no complete abstraction
- but you still need to understand the problem and handle it yourself
- configure them correctly
- know the right scenarios for the right cases