Sidecars / Service Meshes

Context

Microservices are being adopted, and instances of multiple microservices are being deployed to a deployment environment.

Problem

Setting up and configuring all the required infrastructure components is too complex to manage.
- E.g., monitoring, service discovery, load balancing, communication encryption, etc.

Solution

Use sidecar deployments or a service mesh to simplify the setup and configuration of your microservice infrastructure.

Sidecar processes or containers take over repetitive and cross-cutting logic of microservices. Service meshes build on sidecars to offer a fully integrated solution for managing service-to-service communication. As they don't affect the microservice code itself, sidecars and service meshes contribute to the technological freedom of microservices and their development autonomy.

The following features can be provided by sidecars or service meshes and thus no longer require an additional infrastructure component:

Service instance discovery
Load balancing
Simple fault tolerance implementations
Monitoring
- Tracing
Security
- Service-to-service authentication
- Communication encryption
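
As a rough illustration of the first two features, the following Python sketch shows how a sidecar-like proxy could resolve a logical service name to one of several instances using round-robin load balancing. The `SidecarProxy` class, its methods, and the addresses are hypothetical and not taken from any of the cited technologies; a real sidecar runs as a separate process or container next to the microservice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SidecarProxy:
    """Hypothetical in-process stand-in for a sidecar proxy."""

    # Logical service name -> known instance addresses (assumed to be
    # kept up to date by some discovery mechanism).
    registry: Dict[str, List[str]] = field(default_factory=dict)
    # Per-service position of the round-robin cursor.
    cursors: Dict[str, int] = field(default_factory=dict)

    def register(self, service: str, address: str) -> None:
        """Add an instance address for a service."""
        self.registry.setdefault(service, []).append(address)

    def resolve(self, service: str) -> str:
        """Return the next instance address in round-robin order."""
        instances = self.registry.get(service)
        if not instances:
            raise LookupError(f"no instance registered for {service!r}")
        index = self.cursors.get(service, 0) % len(instances)
        self.cursors[service] = index + 1
        return instances[index]
```

The microservice only needs to know the logical name (for example `proxy.resolve("orders")`); which instance is chosen stays an infrastructure concern.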

At the macro-architecture level, teams have to decide which of these functionalities are part of a microservice itself, part of the API facade, or part of the service mesh. Deliberately decide where to put the domain logic in your architecture.
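
One way to picture this split, following the retry example from Interview B: a simple retry needs no domain knowledge and can live in the mesh, while the reaction to a retry that finally failed is a domain decision and stays in the service. The Python sketch below is only an illustration under that assumption; `mesh_retry`, `recommend_products`, and the `"bestsellers"` fallback are hypothetical names, and in a real mesh the retry would happen transparently in the sidecar rather than through a function call.

```python
from typing import Callable


def mesh_retry(call: Callable[[], str], attempts: int = 3) -> str:
    """Infrastructure-level retry: repeats the call without domain knowledge."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            # Re-raise only after the last attempt has failed.
            if attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")


def recommend_products(fetch_recommendations: Callable[[], str]) -> str:
    """Service-level logic: picks a domain alternative once retries are exhausted."""
    try:
        return mesh_retry(fetch_recommendations)
    except ConnectionError:
        # Domain alternative (hypothetical): a business decision that
        # belongs in the service, not in the mesh.
        return "bestsellers"
```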

Because sidecars act as network proxies for microservices, they can also contribute to resilience testing by emulating specific failure scenarios (L35).
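
A minimal sketch of this idea, assuming the proxy sits on the outbound path of a microservice: it forwards calls to the upstream service and, with a configurable probability, aborts them to emulate a failure. `FaultInjectingProxy` and its parameters are hypothetical and do not reproduce the Gremlin framework from L35 or the API of any specific proxy.

```python
import random
from typing import Callable, Optional


class FaultInjectingProxy:
    """Hypothetical proxy that can abort outbound calls for resilience tests."""

    def __init__(
        self,
        upstream: Callable[[str], str],
        abort_probability: float = 0.0,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.upstream = upstream
        self.abort_probability = abort_probability
        # An injectable random source keeps test runs reproducible.
        self.rng = rng or random.Random()

    def call(self, request: str) -> str:
        """Forward the request, or raise an injected failure instead."""
        if self.rng.random() < self.abort_probability:
            raise ConnectionError("injected fault")
        return self.upstream(request)
```

Because the fault is injected in the proxy, the microservice code stays unchanged while its reaction to the failure can be observed.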

Maturity

Proposed, to be evaluated.

Sources of Evidence

L8:

8th wave: side-car technologies
- Examples: Prana, Envoy, SmartStack
- encapsulate communication-related features
  - service discovery
  - protocol-specific communication libraries
  - fault-tolerant communication
- => abstract that from service developers
- transparent service (network) intermediates
  - natural point of monitoring
- original idea: improve reusability by encapsulating functionality into sidecar
- each sidecar is self-contained
  - (+) features of fault-tolerant comm libs to any new programming language
  - (+) increases development autonomy
10th wave: service mesh technologies
- Examples: Linkerd, Istio, Conduit
- build on sidecar technologies to provide fully integrated service-to-service communication monitoring and management environment
- (+) dynamically monitor and manage behavior of multiple distributed sidecars by centralized control plane
- (+) fine-grained control over variety of service-to-service communication features
  - service discovery, load balancing, fault tolerance, message routing, and security
Comparison FaaS / Serverless
- unclear if sidecars and service meshes would still be necessary
- Existing FaaS platforms don't yet provide all traffic management features
- Sidecar-like functions could act as intermediaries for all function-to-function interactions
  - Higher-level control plane function could monitor and manage sidecar functions' behavior => new kind of service mesh

L9:

Configuration State Manager
- Each container has a "sidekick" process => monitors container and (de)registers service in CSM
  - => contributes to solving service discovery problem
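
The sidekick idea can be sketched roughly as follows in Python: on each monitoring cycle the sidekick checks its container and registers or deregisters the service accordingly. The class, the function, and the health check callback are hypothetical stand-ins, not the interface of the system described in L9.

```python
from typing import Callable, Dict, Set


class ConfigurationStateManager:
    """Hypothetical in-memory stand-in for the CSM."""

    def __init__(self) -> None:
        self.services: Dict[str, Set[str]] = {}

    def register(self, service: str, address: str) -> None:
        self.services.setdefault(service, set()).add(address)

    def deregister(self, service: str, address: str) -> None:
        self.services.get(service, set()).discard(address)


def sidekick_check(
    csm: ConfigurationStateManager,
    service: str,
    address: str,
    is_healthy: Callable[[], bool],
) -> None:
    """One monitoring cycle: (de)register the service based on container health."""
    if is_healthy():
        csm.register(service, address)
    else:
        csm.deregister(service, address)
```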

L18:

Serfnode = non-intrusive Docker image advertising any Docker container to a dynamically formed cluster of Serfnode containers
- Features
  - solves service discovery problem
  - enables event-based communication
  - Monitoring and self-healing mechanism for added resiliency
  - lightweight
  - platform-agnostic
  - easy to integrate with existing system
  - is extensible (e.g. file system synchronization problem)
- Usage
  - No modification of container necessary
  - No installation beyond downloading image and providing minimal configuration
    - which container to run
    - identifier for the service provided by that container
    - run the Serfnode container
      - ensures the service container is running and maintains local registry of services in cluster

L35:

Context: Resilience testing framework "Gremlin"
Leverage a service proxy to implement Gremlin Agent
- acts as Layer-7 router, handling outbound calls from a microservice
- natural access to messages passing through application
- Ways of implementing proxy
  - As sidecar
    - AirBnB, Yelp
    - standalone process in same container / VM
    - communication between microservice and proxy over loopback network interface
      - reliable and low overhead
  - Language-specific libraries
    - Netflix, Spotify
- Proxy remains immune to evolution of application logic
- Support fault injection => resilience testing

LN21:

Industrial systems run on highly complex infrastructures as auto-scaling microservice clusters and service meshes
service mesh = dedicated infrastructure layer for service-to-service communication
=> complexity and heterogeneity pose a challenge for tracing and visualisation

Interview A:

Challenge of 2021: encrypt communication between containers
- Istio as Service Mesh (based on Envoy)
  - solves the problem by attaching a sidecar via K8s
  - containers only communicate via localhost with Envoy container
  - Envoy starts secured communication with comm. partner
- Trend is moving toward Linkerd
  - They will use it as Service Mesh, currently in evaluation phase
Service mesh
- much more fine-granular tracing
- network policies
- encrypted communication
- = next huge know-how hurdle to overcome
- migration => breaking changes, since it requires whitelisting

Interview B:

Service mesh can take over several jobs
- service discovery
- resilience patterns
- load balancing patterns
- security and access restriction
=> What is part of gateway, what part of mesh, what part of service?
- Decision on macro-architecture level
- Every domain-knowledge specific decision: within service (e.g. domain alternatives)
- Fail fast: into gateway
- not fail fast since triggered from within system but not domain specific: into service mesh
- Security
  - Basic security in gateway: DoS protection
  - Fine-granular security: Service Mesh
Service mesh as a wrapper to a service
- uses a sidecar to provide functionality
  - security and timeout patterns
    - timeout
    - bulkhead pattern
    - retry pattern
      - simple retry requires no domain knowledge => can be done within service mesh, but after retry failed: in service to use domain alternative
Don't use elaborate rule engines => distributes business logic over infrastructure

Interview D:

What can we solve with infrastructure by service mesh?
- omission failure => retry
- detect timeouts, but not react to timeouts
