Domain-Motivated Alternatives
Context
Microservices are in use or planned to be adopted. They communicate with each other over the network to a certain degree. Because the network is unreliable, this inevitably leads to failures.
Problem
- There are no reactions to failures, which leads to cascading failures or an unsatisfying user experience.
- There are some technical reactions to failures that reduce the frequency of resulting defects, but failures still cascade or lead to an unsatisfying user experience.
- Implementations of failure handling are sometimes unclear or do not match the needs of users.
Solution
Use domain-motivated alternatives instead of technically motivated alternatives.
The distributed nature of microservices makes error handling more complex. Downstream dependencies might not answer due to network issues, or defects in one service might affect other microservices. We suggest facing these troubles with creativity on the domain level by thinking about domain-motivated alternatives.
Thinking in domain alternatives should start early, in the requirements analysis phase. Alternative paths therefore need to be treated as first-class citizens next to the success paths from the requirements engineering phase onward.
Examples
For example, what should happen if a video call service is currently not available? The domain motivates us to still bring both parties into contact, maybe by offering a chat functionality or providing a phone number instead.
Another example: what should happen in a web shop when the recommendation functionality for further products fails or is not reachable? Instead of showing nothing or an error to the user, you could use a different microservice to show top-selling or trending products. Our understanding of the domain tells us that this alternative might still be appealing to customers and lead to further product sales.
What if the alternative also doesn't work? Consider adding a further fallback mechanism, for example a list of top sellers that is pre-generated once a day and served statically via a content delivery network (CDN). Domain alternatives might also differ for every client type and depend on the business value the functionality provides. Thus, domain knowledge is needed to come up with these alternatives.
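The web shop example can be read as a chain of alternatives that is tried in order. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the product type, and the decision to show an empty list as the last resort are all hypothetical and would have to come from people with domain knowledge.

```python
from typing import Callable, Iterable, List

Product = str  # placeholder type for this sketch


def first_available(sources: Iterable[Callable[[], List[Product]]]) -> List[Product]:
    """Return the result of the first source that answers with a non-empty list."""
    for source in sources:
        try:
            result = source()
        except Exception:
            # A failing source is not an error for the user; try the next alternative.
            continue
        if result:
            return result
    # Assumed last resort: show nothing rather than an error.
    return []


# Hypothetical stand-ins for the three alternatives from the example.
def personalized_recommendations() -> List[Product]:
    raise ConnectionError("recommendation service not reachable")


def trending_products() -> List[Product]:
    raise TimeoutError("top-seller service timed out")


def static_top_sellers() -> List[Product]:
    # Stands in for the once-a-day pre-generated list served via a CDN.
    return ["kettle", "toaster", "blender"]


products = first_available(
    [personalized_recommendations, trending_products, static_top_sellers]
)
print(products)
```

The order of the list is the domain decision; the loop itself is deliberately trivial.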
Maturity
Proposed, evaluation required.
Sources of Evidence
L8:
- Fallback mechanisms in the service itself
L35:
- ElasticPress: fallback to the MySQL-powered search method when the Elasticsearch instance was unreachable or returned an error
LN43:
- Provide a fallback method on error
- e.g., a product service fails to fetch personalized recommendations
- fall back to returning default recommendations tied to the product
- or return no recommendations => UI handle the case
- => User doesn't even notice that the microservice is down
Interview B:
- Change way of thinking
- not: is the response positive or negative?
- but incorporate and also focus on alternative / "what-if" paths in the thinking
- thinking in domain alternatives
- since no catching exceptions in async communication
- no try-catch
- What are domain alternatives?
- stop thinking too technical, but with domain knowledge
- Example: search with two search services
- one full-text and one normal search; or only the full-text one is used
- No answer within time frame
- Domain alternative: switch to the other search service as alternative
- or have multiple requests in parallel and provide best result in given time span
- Example at fluege.de - search only shows results delivered within a certain time span - not all, not none - but the best we can get within the time span
- Example: give recommendations in webshop
- Recommendation service incorporating customer data and history
- No answer within time frame
- Domain alternative: don't show an error but best sellers / trending charts (high probability that the customer likes something from it as well)
- If that also doesn't work: pre-generated list of best sellers
- somewhere cached as snapshot
- static delivery via CDN
- Trick: question the domain functionality => is the reality transactional?
- not as transactional as we might think!
- Example: opening a bank account
- fill out many forms
- mandatory fields: without cannot go further
- example: birthdate of wife
- can't open bank account
- in reality bank account would have been opened, but without wife having access
- DDD, distributedness, etc. force us to think about domain alternatives
- There are a lot of domain alternatives
- => need to have contact person with domain knowledge to choose one
- Location of where to do domain alternatives
- gateway
- service itself acting as a facade for other services
- domain alternatives into service itself
- Domain alternative can be device (client type) specific!
- UI integration at Netflix some time ago
- If a service was not available => there was static content as fallback
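The search example from the Interview B notes (several requests in parallel, best result within a given time span) could be sketched as follows. This is an illustration under assumptions, not the interviewee's implementation: the two search functions, their delays, and the time span are hypothetical, and "best result" is simplified to "everything that arrived in time".

```python
import asyncio
from typing import Awaitable, Callable, List

SearchFn = Callable[[str], Awaitable[List[str]]]


async def search_within(searches: List[SearchFn], query: str, time_span: float) -> List[str]:
    """Query all search services in parallel and return what arrived in time.

    Assumes at least one search function; asyncio.wait rejects an empty set.
    """
    tasks = [asyncio.ensure_future(search(query)) for search in searches]
    done, pending = await asyncio.wait(tasks, timeout=time_span)
    for task in pending:
        task.cancel()  # too late: drop it instead of keeping the user waiting
    results: List[str] = []
    for task in tasks:  # keep the order in which the services were listed
        if task in done and not task.cancelled() and task.exception() is None:
            results.extend(task.result())
    return results


# Hypothetical services: one slow full-text search, one fast simple search.
async def full_text_search(query: str) -> List[str]:
    await asyncio.sleep(2.0)  # simulates a service that misses the time span
    return [f"full-text hit for {query}"]


async def simple_search(query: str) -> List[str]:
    await asyncio.sleep(0.01)
    return [f"simple hit for {query}"]


hits = asyncio.run(search_within([full_text_search, simple_search], "flights", 0.3))
print(hits)
```

Whether partial results are acceptable, and how long the time span may be, is again a question for the domain rather than for the code.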
Interview C:
- otto.de
- where in system is eventual consistency not okay?
- customer gets delivery in 2-3 days, even with prime next day
- system has to be running really badly if order processing does not happen within 8-10 hours
- change of catalogue
- what if synced over night? No problem!
- bank transfer
- I'm not interested in it happening consistently right now, but within 2-3 days
- 2-step mode: reserve money
- bank hopes there are not 2 reservations too shortly after one another
- usually cannot use my card that fast twice to go over my limit
- the actual processing happens over night because it is expensive (HW security modules, transactions, 2-phase commit over multiple institutions, ...)
- bank: what are the odds that something goes wrong? very low!
- how much would it cost to build system better? much more!
- => accept the risk and that's it!
Interview D:
- How to deal with failures in distributed system?
- Matter of domain knowledge, not technical!
- It's already too late when you are in the "catch" block
- Needs to be sorted out beforehand!
- In the requirements analysis!
- Examples:
- log into statistics log
- e-commerce order: put into another medium => "will process it later"
- => alternatives depend on the business value
- Standard reaction pattern vs. domain alternative
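The e-commerce example from the Interview D notes (put the order into another medium and process it later) might look like the sketch below. The file-based "other medium", the function names, and the user-facing messages are assumptions made for illustration; the notes do not say which medium was meant.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def place_order(order: dict, submit: Callable[[dict], None], parking_dir: Path) -> str:
    """Submit an order; if that fails, park it in another medium for later."""
    try:
        submit(order)
        return "Your order has been placed."
    except ConnectionError:
        # Assumed domain decision: an order is too valuable to lose, so it is
        # written to a file that a later job can pick up and process.
        parked = parking_dir / f"order-{order['id']}.json"
        parked.write_text(json.dumps(order))
        return "We received your order and will process it later."


def unreachable_order_service(order: dict) -> None:
    # Hypothetical stand-in for an order service that is currently down.
    raise ConnectionError("order service not reachable")


with tempfile.TemporaryDirectory() as tmp:
    message = place_order(
        {"id": 42, "item": "kettle"}, unreachable_order_service, Path(tmp)
    )
    parked_files = sorted(p.name for p in Path(tmp).iterdir())

print(message, parked_files)
```

The point of the notes still holds: the alternative was decided before the `except` block was written, not inside it.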
Interview E:
- Context: eventual consistency between microservices
- inconsistencies have never been a problem for the interviewee
- user doesn't expect everything to be consistent within the same milliseconds
- usual case: a few seconds and it's consistent again
- by the time I hit the refresh button, it is there
- if it is a problem: clearly communicate it
- "thanks for your input, it will be available to the other modules in a few minutes"
- "your input was received and is currently processed"
- urge towards ACID comes from our education, we got good grades doing ACID and DRY
Interview F:
- Context: consistency
- strong consistency only within a microservice instance
- implication: cannot just write down sequences because many things can happen in between
- esp. when going from service to service
- design strategies to handle these cases and decide what is okay for me
- Is it okay if user gets response asynchronously instead of an interactive answer?
- Need to think about these things!
- Lots of possibilities you need to look at in the specific case