
Dead Letter Queue

Context

Microservices are being adopted. Communication between microservices happens over an unreliable network using a messaging broker. There is no mechanism to detect events that fail to be received or processed.

Problem

  • Messages are sometimes not received because of the unreliable network, or the processing of an event fails.
    • Since there is no detection mechanism, the system cannot react to these cases.
    • Inconsistent application states across microservices might arise.

Solution

Introduce a dead letter queue for every messaging queue in use.

Messages that failed to be delivered or processed are put into a dead letter queue, indicating that the consumer did not handle them. Compensation mechanisms, ranging from automated to manual solutions, can build on the dead letter queues.
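A minimal in-process sketch of the solution, assuming a consumer that retries a message a fixed number of times (`max_attempts`, a hypothetical parameter; the pattern itself does not prescribe retries) before moving it to that queue's dead letter queue. The `deque`s stand in for a real broker, and `Message`, `QueueWithDLQ` and `consume` are illustrative names, not part of any library.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Message:
    body: str
    attempts: int = 0      # how many times processing has failed so far
    last_error: str = ""   # kept so a compensation step knows what went wrong


@dataclass
class QueueWithDLQ:
    """A messaging queue paired with its own dead letter queue (hypothetical structure)."""
    main: Deque[Message] = field(default_factory=deque)
    dead_letters: Deque[Message] = field(default_factory=deque)


def consume(queue: QueueWithDLQ, handler: Callable[[str], None],
            max_attempts: int = 3) -> List[str]:
    """Drain the main queue; a message that fails max_attempts times goes to the DLQ."""
    processed: List[str] = []
    while queue.main:
        msg = queue.main.popleft()
        try:
            handler(msg.body)
            processed.append(msg.body)
        except Exception as exc:
            msg.attempts += 1
            msg.last_error = repr(exc)
            if msg.attempts < max_attempts:
                queue.main.append(msg)          # retry later, behind the other messages
            else:
                queue.dead_letters.append(msg)  # give up: park it for compensation
    return processed


if __name__ == "__main__":
    def handler(body: str) -> None:
        if body == "bad":
            raise ValueError("cannot parse payload")

    q = QueueWithDLQ(main=deque([Message("a"), Message("bad"), Message("b")]))
    print(consume(q, handler))   # ['a', 'b']
    for dead in q.dead_letters:  # input for an automated or manual compensation step
        print(dead.body, dead.attempts, dead.last_error)
```

The dead letter queue is just data here; whatever reads it (a retry job, an alert, a person) is the compensation mechanism the pattern leaves open.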

Maturity

More data required (only 2 sources).

Sources of Evidence

LM48:

  • Context: microservice migration; describes an example project (FX Core) and compares it with the original monolith
  • usage of RabbitMQ between business services
  • acknowledgement required after a message has been processed successfully
    • otherwise: the message is delivered to another replica
    • if no replica can handle the message: it is sent to an error queue
      • notifies developers
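The FX Core flow above (acknowledge only after successful processing, otherwise redeliver to another replica, finally route the message to an error queue that notifies developers) can be sketched as below. This is a simulation with plain callables, not RabbitMQ client code: each replica is a hypothetical function that returns `True` to acknowledge, and `ErrorQueue` and `deliver` are illustrative names. In RabbitMQ itself, a queue can be given a dead letter exchange through the `x-dead-letter-exchange` queue argument, and messages rejected without requeueing are routed there.

```python
from typing import Callable, List, Optional

# A replica is modelled as a function that returns True once it has processed
# the message (the acknowledgement) and False or an exception otherwise.
Replica = Callable[[str], bool]


class ErrorQueue:
    """Hypothetical error queue that notifies developers on every new entry."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.messages: List[str] = []
        self.notify = notify

    def put(self, message: str, reason: str) -> None:
        self.messages.append(message)
        self.notify(f"message {message!r} moved to error queue: {reason}")


def deliver(message: str, replicas: List[Replica],
            error_queue: ErrorQueue) -> Optional[int]:
    """Offer the message to each replica until one acknowledges it.

    Returns the index of the acknowledging replica, or None if no replica
    could handle the message and it was sent to the error queue.
    """
    failures: List[str] = []
    for index, replica in enumerate(replicas):
        try:
            if replica(message):
                return index  # acknowledged after successful processing
            failures.append(f"replica {index}: not acknowledged")
        except Exception as exc:  # a crashing replica counts as a missing ack
            failures.append(f"replica {index}: {exc!r}")
    error_queue.put(message, "; ".join(failures))
    return None


if __name__ == "__main__":
    def never_acks(message: str) -> bool:
        return False

    def rejects_poison(message: str) -> bool:
        return message != "poison"

    errors = ErrorQueue(notify=print)  # stand-in for e-mail or chat alerts
    print(deliver("order-1", [never_acks, rejects_poison], errors))  # 1
    print(deliver("poison", [never_acks, rejects_poison], errors))   # None
```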

Interview B:

  • need to cope with messages
    • being sent more than once
    • not being sent
    • not being received
  • need for dead letter queues
    • listen to them
    • react to them after a certain timeout
    • ensures that some compensation / notification happens within a limited time
  • the timeout of a dead letter queue needs to be motivated by domain knowledge (e.g., 10 min vs. 3 s), depending on the use case
    • adding this domain knowledge is a huge challenge for enterprise developers
    • Example from the automotive domain
      • strict ordering of some messages was important for exactly one use case; for the others, order did not matter
      • but the queue was built to process everything sequentially, to be on the safe side
      • one item could block the whole queue, so the system became slow and prone to failures
      • what makes more sense: take the blocking element, put it into another queue [interpretation: dead letter queue], and deal with it
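The interview notes above can be combined into one sketch, under one possible reading of the timeout remarks: a single consumer processes messages in order but moves a failing message aside into a dead letter queue instead of letting it stall the queue, and a separate check flags dead letters whose domain-specific timeout has expired so that compensation or notification can start. The message types `payment` and `report`, their 3 s / 10 min timeouts, and the names `SequentialConsumer` and `overdue` are hypothetical. Parking a message changes the processing order, so this only suits message types whose order does not matter; in the automotive example, that was every use case but one.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Iterable, List, Tuple


@dataclass
class DeadLetter:
    message_type: str
    body: str
    parked_at: float  # clock reading when the message was moved aside


@dataclass
class SequentialConsumer:
    """Processes messages in order but parks a failing one instead of blocking."""
    handler: Callable[[str, str], None]
    dead_letters: Deque[DeadLetter] = field(default_factory=deque)
    clock: Callable[[], float] = time.monotonic

    def run(self, queue: Deque[Tuple[str, str]]) -> List[str]:
        processed: List[str] = []
        while queue:
            message_type, body = queue.popleft()
            try:
                self.handler(message_type, body)
                processed.append(body)
            except Exception:
                # Take the blocking element out of the queue so the rest keeps flowing.
                self.dead_letters.append(DeadLetter(message_type, body, self.clock()))
        return processed


def overdue(dead_letters: Iterable[DeadLetter], timeouts: Dict[str, float],
            now: float, default_timeout: float) -> List[DeadLetter]:
    """Dead letters whose per-type timeout has expired and that need a reaction."""
    return [d for d in dead_letters
            if now - d.parked_at >= timeouts.get(d.message_type, default_timeout)]


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo

    def handler(message_type: str, body: str) -> None:
        if body == "corrupt":
            raise ValueError("cannot process")

    consumer = SequentialConsumer(handler, clock=lambda: fake_now[0])
    queue = deque([("payment", "p1"), ("payment", "corrupt"), ("report", "r1")])
    print(consumer.run(queue))  # ['p1', 'r1']: the corrupt message did not block r1

    timeouts = {"payment": 3.0, "report": 600.0}  # seconds, domain-specific
    fake_now[0] = 5.0
    for letter in overdue(consumer.dead_letters, timeouts, fake_now[0], 60.0):
        print(f"compensate or notify: {letter.body} ({letter.message_type})")
```

Keeping the timeouts in a per-type table mirrors the interview's point that the right value (seconds or minutes) comes from domain knowledge rather than from the messaging infrastructure.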