
Cluster Management by Container Orchestrator

Context

Microservices are adopted. Microservices have infrastructure services or other microservices as dependencies. There is no sophisticated deployment environment yet. Potentially, there is a single-node deployment.

Problem

  • An increasing number of containers makes it hard to schedule them to different nodes by hand
  • Automation via configuration management tools (e.g. Ansible, Chef) and other scripts can't cope with the complexity required to schedule microservices to different nodes
  • The overhead puts pressure on the people responsible for operations

Solution

Introduce a cluster manager, e.g., a container orchestrator, that handles the deployment of the microservices' deployment artifacts to a cluster of servers. Letting the cluster manager do the heavy lifting of deployment in an automated manner relieves the pressure on those responsible for operations and deployment. An artifact registry can serve as the point of integration with a continuous deployment pipeline.
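
As a minimal sketch of this solution, a Kubernetes Deployment manifest can declare how many instances of a microservice the orchestrator should keep running and which image to pull from the artifact registry. All names, the registry host, and the version tag below are hypothetical placeholders:

```yaml
# Hypothetical sketch: the orchestrator keeps three replicas of a
# microservice running, pulling the image that the CI pipeline
# previously pushed to the artifact registry.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:1.0.0  # artifact registry as pipeline integration point
          ports:
            - containerPort: 8080
```

Applying this manifest hands the scheduling, restarting, and placement of the instances over to the cluster manager instead of operations staff.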

Container orchestrators provide advanced functionality for service management, such as service discovery, load balancing, rolling upgrades, and automated scaling.

Kubernetes has emerged as the de-facto industry standard for container orchestration.

In comparison to deploying to a cloud platform, a cluster manager provides more portability and reduces the risk of cloud vendor lock-in. Different cloud providers offer managed Kubernetes instances, but Kubernetes can also be self-hosted in a private data center.

Utilizing a cluster manager requires know-how of the specific tooling. We advise having a look at our best practices on how to overcome the knowledge demands of microservices to ensure microservice team autonomy.

We found a conflicting meaning of the word orchestration in the microservice domain: next to container orchestration, it is used in the context of workflow orchestration. We therefore suggest speaking explicitly of workflow orchestration and container orchestration instead of leaving room for ambiguity.

Maturity

Proposed, to be evaluated.

Sources of Evidence

L3:

  • Context: Backtory Migration Case
    • CI: Gitlab -> Jenkins -> Docker Artifactory -> Kubernetes / CoreOS Deploy
      • for each microservice!!!
  • Pattern: Deploy into a cluster and orchestrate containers
    • reduce difficulties around deployment of many instances from different services
    • reduce operations team's resistance to development team's changes

L5:

  • Selecting platform significantly influences architecture of microservices system
  • Container orchestration tools + structured PaaS solutions => various features make deployment and operations trivial
    • selecting right solution is critical
    • each technology comes with a set of assumptions and opinions the designer has to follow to fully utilize the platform's potential
  • Related keywords: operations, orchestration deployment, scaling, auto-scale, rolling upgrades, images, container

L8:

  • 4th wave: container orchestration technologies
    • e.g. Mesos, Kubernetes, Docker Swarm, AWS Elastic Container Service, Nomad
    • automate container allocation and management tasks
    • abstract away the underlying physical or virtual infrastructure from service developers
  • 1st generation: lightweight containers (e.g. LXC) + deployment to container orchestration
  • Container orchestration present in 1st-3rd gen; replaced in 4th gen by FaaS layer

L9:

  • Service orchestrator: manages all services and decides how many instances need to be run and where
    • based on certain policies, e.g. co-location or anti-colocation
    • launches containers along with their sidekicks on the same host
    • tracks status of container and takes action accordingly, e.g. relaunch on other host if host restarts
  • OpenStack example: HA framework simplifies managing, monitoring, and recovering OpenStack services
    • each service considered as resource and managed by an executable
    • resource agent has to follow certain specs (OCF)
      • (-) complexity
      • (-) additional burden on developers
      • (-) difficult for operations to debug
    • operators have to manipulate resource agents for individual resources
    • Microservices => packaged as containers
      • HA framework: only implement monitoring mechanism to inquire status of service
      • Service fails => kill existing container, create new one in its place
      • "Cattle vs. pet" principle
        • (+) flexibility
        • (+) simplicity
        • (+) efficiency
  • other clustering tools:
    • fleet, Kubernetes, Docker Swarm

L12:

  • Kubernetes has good integration with CoreOS; tool for easy deployment of containers on a cluster
  • easily fetch container from private repository
  • deploy to a cluster with different policies, e.g. 3 always available instances
  • Example: GitLab -> Jenkins -> Docker/Artifactory -> Kubernetes -> to cluster of CoreOS instances

L13:

  • Kubernetes = system that provides capabilities in a cloud / clustered environment (isolation, load balancing, ...)
    • lightweight, simple, and self-healing service management
    • high scalability and simplicity => promotes containers
  • Pod = group of containers as deployable unit
    • all containers in pod share same fate
    • destroying faulty pod => routing fabric will route to healthy one => does not harm
    • startup faster than VMs using containers
  • makes a lot of sense with microservices and containers
  • Alternatives
    • Mesos: built similarly to the Linux kernel, at a different level of abstraction
      • master(s), slave(s) and framework(s)
      • frameworks to manage different type of workloads
      • hierarchical resource management solution => increases scalability and reduces latency in resource scheduling
      • highly scalable (10,000s of nodes) => strong for microservices management
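
The pod notes above can be sketched as a manifest. A minimal, hypothetical example of a pod grouping a service container with a sidecar, both sharing the same fate and network namespace (image names are placeholders):

```yaml
# Hypothetical sketch: two containers deployed as one unit; if the pod
# is destroyed, both containers go with it, and the routing fabric
# directs traffic to a healthy replacement.
apiVersion: v1
kind: Pod
metadata:
  name: catalog-service
spec:
  containers:
    - name: catalog
      image: registry.example.com/catalog:1.2.0
    - name: log-forwarder   # sidecar sharing the pod's network and lifecycle
      image: registry.example.com/log-forwarder:0.3.1
```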

L14:

  • deployment on compute cluster / cloud
    • using containers running in cluster-management infrastructures
    • e.g. Apache Mesos
    • schedule containers onto nodes in a compute cluster and manage load balancing among containers

L18:

  • separate from service discovery problem: "orchestration"
  • orchestration = process of deploying containers on a cluster of machines
  • CoreOS, Mesos/Mesosphere, OpenShift, CloudFoundry, Kubernetes, Brooklyn/Clocker, Shipyard, Crane, ...
  • Kubernetes and Mesosphere include service discovery features
  • Adoption of orchestration solution often a very heavy undertaking requiring highly specialized deployment environments

L23:

  • Different tools can manage containers in the cloud
    • Docker Swarm provides native clustering for Docker containers
      • turns pool of Docker hosts into single virtual Docker host
      • all tools communicating with the Docker daemon can use Swarm to transparently scale to multiple hosts
      • Docker container manager represents basic container-oriented technology
    • Kubernetes = os tech for automating deployment, operations, and scaling of containerized applications
      • groups containers into logical units for easy management and discovery, e.g. based on resource requirements and other constraints
      • horizontal scaling of applications
        • manually
        • automatically based on CPU load
      • automated rollouts and rollbacks and self-healing features
    • Magnum = the OpenStack API service
      • makes container orchestration engines (e.g. Swarm, K8s) available as first class resources in OpenStack managed datacenter
      • uses the Heat service to schedule and operate system images (Docker, Kubernetes, ...) => runs images on VMs or bare-metal clusters
    • Google Container Engine = commercial service relying on Docker and K8s for cluster management and orchestration
      • AWS EC2
      • Rackspace Carina
  • Microservices can have different dominating requirements: storage, computation, communication
    • => container scheduling platforms must be combined to minimize workload interference and contention
    • Balancing resource consumption and performance critical for decision where to deploy microservices
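
The grouping "based on resource requirements and other constraints" and the balancing of resource consumption noted above can be sketched with resource requests and limits, which the Kubernetes scheduler uses for placement decisions; the service name and the values are hypothetical:

```yaml
# Hypothetical sketch: requests inform the scheduler's placement decision;
# limits cap consumption to reduce interference with co-located workloads.
apiVersion: v1
kind: Pod
metadata:
  name: billing-service
spec:
  containers:
    - name: billing
      image: registry.example.com/billing:2.0.0
      resources:
        requests:
          cpu: "250m"      # scheduler places the pod on a node with this much free CPU
          memory: "256Mi"
        limits:
          cpu: "500m"      # hard cap to limit contention
          memory: "512Mi"
```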

L24:

  • microservices can ideally be packaged, provisioned, and orchestrated through the cloud using lightweight container technology
  • DevOps requirements reflect industry desire that production-ready microservices should follow key dev practices of DevOps CD pipeline
    • easy source code access from version control
    • support for DevOps practices like CI, automated testing, dependency management, automated deployment, and container orchestration
  • R9: Container Orchestration support
    • containers can be automatically scheduled and orchestrated on top of any physical or virtualized computing environment
    • most commonly used container orchestrator tools: Docker Swarm, Kubernetes, Mesos
    • automated support to address key challenges of microservice deployment
      • service discovery
      • load balancing
      • rolling upgrades
  • When benchmarking microservice architectures: ideally support for multiple tools for container orchestration (among others)
    • 1 of 4 reviewed OS apps has multiple deployment tools; none of the others use container orchestrators

L28:

  • Common practice to control scale-up or scale-out performance by changing the number of containers with orchestration tools such as Swarm and Kubernetes
    • container should only include one process => fine-grained control, but more containers
    • an increased number of containers on an OS is expected to increase impacts on performance

L30:

  • Cluster management tools (e.g. K8s) automatically manage and deploy services on pool of machines, continuously adapt deployment structure at runtime
  • microservices are more difficult to operate (distributedness)
  • sophisticated (container-based) virtualization and infrastructure technologies such as Docker & K8s emerged
    • facilitate operations with functionality such as rolling updates, automated scaling, rebalancing in case of node failure
    • => deployments much more dynamic and volatile than traditional applications
    • => monitoring and performance modelling challenges
  • Need for additional metrics for auto-scaling
    • container level (e.g. CPU, memory, startup times) to decide scaling up or down used by K8s

L31:

  • Migration practice MP14: Deploy into a cluster and orchestrate containers
  • Context:
    • CI in place
    • Production-ready container images available for each service
    • large number of services exist => deployment and redeployment complex, cumbersome, and unmanageable
  • Problem:
    • How is an instance deployed into the cluster?
    • How is the (re)deployment of all services orchestrated with the least effort?
  • Solution:
    • System to manage cluster of computing nodes
      • deploy images on demand,
      • with specified number of instances and on different nodes
      • handle failure of instances, restart failed nodes or instances
      • provide means for auto-scaling services
      • an internal name resolution strategy serves some services as a service registry => internal name instead of IP
  • Challenges:
    • cluster manager = single point of failure
      • => highly available deployment with no single point of failure
  • Technologies:
    • Mesos + Marathon
    • Kubernetes
  • All three case studies applied this pattern
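
In Kubernetes terms, the MP14 solution above could translate into an anti-affinity rule that spreads instances across different nodes, plus a Service that gives consumers a stable internal name instead of an IP. All names and values are hypothetical:

```yaml
# Hypothetical sketch for MP14: spread 3 replicas across different nodes
# and expose them under a stable internal DNS name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: payment-service
              topologyKey: kubernetes.io/hostname   # no two replicas on the same node
      containers:
        - name: payment-service
          image: registry.example.com/payment-service:1.0.0
---
# Consumers resolve "payment-service" via cluster DNS; no IPs needed.
apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  selector:
    app: payment-service
  ports:
    - port: 80
      targetPort: 8080
```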

L37:

  • Context: IoT Framework based on microservice architecture
  • use Docker and Kubernetes for deployment

L41:

  • Context: Experience report from banking domain
  • Services hosted in Linux Containers on Docker Swarm cluster
    • enables to use Docker tools
      • Docker Compose to deploy whole architecture
      • Internal Docker registry to host all images
    • enables CI/CD to fully automate deployment
      • e.g. rolling updates
  • Orchestration by Docker Swarm
    • Failed services can be automatically restarted => self-healing
    • service discovery
    • load balancing
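
The Swarm setup described in this report (Compose-based deployment, rolling updates, self-healing restarts) can be sketched in a Compose file for `docker stack deploy`; the service name, image, and values are hypothetical:

```yaml
# Hypothetical docker-compose.yml for a Swarm stack: a replicated
# service with a rolling-update policy and automatic restarts.
version: "3.8"
services:
  accounts:
    image: registry.example.com/accounts:1.4.0
    deploy:
      replicas: 3
      update_config:
        parallelism: 1          # replace one container at a time (rolling update)
        delay: 10s
      restart_policy:
        condition: on-failure   # failed containers are restarted (self-healing)
```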

L43:

  • Container technology mainstream with development of container management and orchestration systems
    • Kubernetes, Docker Swarm, AWS container services

L47:

  • Docker Swarm: decoupling of containers and machines supporting them
    • Docker Compose allows for Distributed Application Bundles
      • Define set of services
      • only depend on presence of a Docker host IP address and access credentials
    • best low-coupling-to-complexity ratio in combination with web-hook-based integration

L54:

  • Cluster Manager
    • each service inside its own container
    • "Cluster Scheduler" and "Cluster Manager": find which host is suitable to which container and connect them
      • (+) more efficient resource usage
      • (+) follow user placement constraints
      • (+) make service always available
    • Tools: Kubernetes and CoreOS, Deis based on Docker, Docker Swarm, Apache Mesos and Mesosphere Marathon
      • Docker Swarm simplest scheduler

L61:

  • service orchestration as part of infrastructure services
  • need for system level management
    • health management, autoscaling to leverage underlying infrastructure efficiently

L63:

  • Context: MiCADO concept and architecture
  • goal: dynamic and automated resource supply for applications
  • two phases:
    • (1) optimized deployment
      • requires high-level description of application: app topology, QoS parameters as cost and performance requirements, security policies
      • passed to Coordination/Orchestration component
        • has components to supply security and actual deployment values
      • passed to deployment executor => runs the app on targeted cloud infrastructure
    • (2) run-time orchestration
      • continuous collection of metrics, passed to Coordination/Orchestration component
      • decision making on adjustments => passed to deployment executor
  • users can adjust any requirement during runtime
  • generic architecture to identify a modular and pluggable framework, components can be easily substituted => tech neutral
  • multiple layers
    • (1) application layer => run application code
    • (2) application definition layer => define requirements of apps and their interconnectivity using application descriptions
    • (3) orchestration layer
      • (a) coordination interface API => utilize orchestration functionality to dynamically scale
      • (b) microservice discovery and execution layer => keep track of running services, starts and shuts down microservices
      • (c) microservice coordination logic layer => information of how execution env is performing gathered, detect bottlenecks, launch or shut down cloud instances, move microservices to other worker node
      • (d) cloud interface API => encapsulates cloud APIs to layers above
      • (e) security, privacy, and trust services
    • (4) cloud interface layer => means to launch and shut down cloud instances, can support multiple clouds
    • (5) cloud instance layer => contains instances by IaaS cloud providers, run containers executing microservices
  • prototype implementation
    • Technologies: Swarm master node, Consul, Prometheus, Occopus, HAProxy
  • performance evaluation
    • deploy application took 140 seconds in their experiment
    • create infrastructure: 320s
    • destroy infrastructure: 15s
    • scale up app node: 300s
    • scale down app node: 12s

LN21:

  • microservice containers can be managed by clusters with configuration management, service discovery, service registry, load balancing between running infrastructure frameworks
  • examples: Spring Cloud, Mesos, Kubernetes, Docker Swarm

LN42:

  • Technologies for microservice orchestration: Azure Container services, ConductR, Spring Suite
  • Kubernetes as DevOps tool

LN43:

  • Challenge: handle orchestration of microservices in production
  • solution: many new tools to do so like Kubernetes and Mesos
  • Kubernetes provides horizontal scaling, service discovery, load balancing etc.

LN44:

  • Context: security issues per layer
  • Orchestration layer
    • threat example: compromising discovery, registering malicious nodes within system, redirecting communication to them
      • attacker inside orchestration perimeter can eavesdrop and manipulate traffic
    • mitigation example: protection of orchestration platforms and their components
      • not well investigated area

LM43:

  • Context: SLR findings about microservices in DevOps
  • S16: DevOps based approach "Neo-Metropolis"
    • uses, among others, Mesos to deal with scalability and elasticity across different cloud platforms
  • S27 recommends using Kubernetes to scale microservices in prod
  • S25: Jolie redeployment optimizer to achieve optimal deployment
    • Zephyrus generates optimal architecture of MSA
    • Jolie Enterprise framework to deploy and manage microservices
    • Jolie Reconfiguration Coordinator interacts with other two components for optimized deployment
    • interpretation: reconfiguration coordination manages the cluster
  • DevOps infrastructure to ease deployment
    • among others, Kubernetes to address frequent and optimal deployment
    • Kubernetes used frequently in industry

LM45:

  • Context: interviews and insights from multiple cases on technologies and sw quality in MSA
  • C1-S1
    • migration to Docker + Kubernetes planned to increase operability
  • C5-S8
    • Preparation for Kubernetes and impl of 12-factor app principles seen as difficult
  • C9-S13
    • Politically motivated changes: CloudFoundry to Kubernetes
      • made mastering new tech challenging
  • Two participants positive about CloudFoundry: more developer friendly than Kubernetes
    • Kubernetes powerful but complex
  • Portability profits from container tech like Docker and Kubernetes
    • easy to change platform
  • Kubernetes provides liveness probes => reliability rated positively

LM47:

  • Context: SLR with tactics to achieve quality attributes
  • 7 of 12 studies: resource management and allocation tactic relies on horizontal duplication to scale resources
    • containers and VMs can be orchestrated by central manager
      • in charge of scaling
      • allocation to VM/cloud provider
    • requirement: orchestrator requires (performance) monitoring and parameter setting

LM48:

  • Context: microservice migration describes an example project (FX Core) and compares it back to the monolith
  • MSA can be deployed to a single node, but no scalability => deploy to cluster
    • utilize resources from multiple hosts as single system
    • elasticity: increase and decrease with additional hosts
  • usually comes with containerization
  • services actively run in parallel
  • infrastructure and dbs use clustering mechanisms
    • depend on requirements like performance, consistency, availability
    • usually included, e.g., RabbitMQ or Redis
  • use Docker Swarm for all services
    • Docker compose: deploy all services with dependencies for local testing
      • exactly the same environment as production (even though locally deployed) => same reliability between environments
    • Docker registry for container images
    • CI builds container images and saves in registry
  • orchestration necessary for managing service containers and infra
    • without: need for whole system running on every host (vertically scaled monolith)
    • without: develop and maintain features for large scale operations
  • example open-source orchestration tech: Kubernetes, Mesosphere Marathon, Docker Swarm
    • provide features like service discovery, load balancing, cluster management, replication of services, distribution of replicas across nodes
  • load balancing and service discovery implemented as part of Swarm
    • hostname lookup to IP of concrete container
    • use service discovery only to reach RabbitMQ, RabbitMQ as discovery between service instances
    • distributes load to the Redis instances
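
The hostname-based discovery described above can be sketched in a Compose file: within the Swarm overlay network, a service name resolves to the container's virtual IP, so application services reach the broker by hostname rather than a hard-coded address. The worker image and environment variable are hypothetical:

```yaml
# Hypothetical sketch: "rabbitmq" is resolved via Swarm's built-in DNS,
# so the worker never needs a concrete container IP.
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3
  worker:
    image: registry.example.com/worker:0.9.0
    environment:
      BROKER_URL: amqp://rabbitmq:5672   # service name used as hostname
```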

Interview A:

  • When speaking about containers, you'll speak about Kubernetes pretty soon
    • Container as basis for k8s
    • Microservice => kubernetes service
    • orchestration
    • auto-scaling
      • default metrics: CPU and RAM
      • define the MAX of instances => otherwise potential cluster overflow if endless loop bug
    • replica sets => 3 instances on different nodes / ESX hosts / fire compartments
      • work with min and max
    • load balancing => coping with failure
    • monitoring on different levels
  • Completely new skillset, new job descriptions
  • 1st iteration
    • Started out with 5-6 containers
    • "we don't even need Docker Swarm" - is now even deprecated
    • we do it with Ansible => coachings
    • Upgrade to Ansible Galaxy: need to version ansible roles with container roles etc.
    • => Ansible alone not enough
  • 2nd iteration
    • reached 50 containers => hell
    • hard/tedious to model with Ansible: dependencies, etc.
    • supported port offsets, different stages, pipelines, and test environments
    • but didn't suffice at some point of time
  • 3rd iteration: end of 2017, early 2018
    • Kubernetes, 2nd most important OS project after Linux
    • Fear that this would overstrain the developers since it was, again, a new technology
    • Developers busy with other stuff: making microservices cloud-native (configurable from outside, log to stdout, ...)
    • DevOps team managed the deployment
    • Introduction of deployment YAML, kubectl
  • 4th iteration
    • got too complex
    • need for Helm Charts
    • again, new know-how
    • DevOps team writes helm chart => is the product that is delivered to customer
    • one command to update software, db schemata, new secrets, volume claims, ...
  • Operate their own "cloud"
  • Challenges
    • Certification by the BSI; only SLES and Red Hat enterprise-supported
      • Is K8s a runtime, an OS, an application?
      • Who enterprise-supports the k8s?
    • Bringing K8s to production: 9 person years
      • cheaper in cloud, but was not an option
      • OpenStack, OpenShift, Rancher, plain Ubuntu, SUSE CaaSP?
    • Pace of updates (for security) is a challenge
      • Hard API changes
      • Need for the resources to manage that
  • vanilla-k8s as common denominator
    • otherwise coordination of which OS runs on customer hardware to deploy to, etc.
    • stay compatible to all vendors!
  • use virtualization in between, VMware as hypervisor
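
The auto-scaling setup described in the interview (default CPU metric, a hard maximum to avoid cluster overflow from an endless-loop bug) could look roughly like the following HorizontalPodAutoscaler; the target name and all numbers are hypothetical:

```yaml
# Hypothetical sketch: scale between 3 and 10 replicas based on average
# CPU utilization; maxReplicas caps runaway scaling (e.g. endless-loop bug).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 3        # matches the "3 instances on different nodes" baseline
  maxReplicas: 10       # hard upper bound against cluster overflow
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```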

Interview B:

  • Kubernetes cluster as potential point of solving problems
    • service discovery, resilience, load balancing