The pace of change in IT is unlike that of any other discipline. Even dog years are long by comparison. It’s an environment that primes us so thoroughly for what comes next that we often don’t pause to ask why. This is very much the case with cloud computing, which has become the sine qua non of every CIO’s strategy and often gives rise to vague mandates such as ‘Cloud First’.
This phenomenon also applies to application architecture, where microservices and composability are all the rage. The underlying rationale makes sense: break complex applications into function-specific components and assemble the pieces you need. After all, this is what software engineers do; they create and implement libraries of functionality that can be assembled in almost limitless ways. The result is an integrated collection of components that is more elegant than the monolithic application it replaces.
Technical elegance, though, isn’t always better. To illustrate why, we need to turn to probability. Not the complex stuff like Bayesian decision theory or even Poisson distributions; I’m talking about availability, which in IT means the percentage of time a system is able to perform its function. A system that is 99.9% (‘three-nines’) available can perform its function all but 0.1% of the time. Pretty good, right? Of course, the answer is: it depends. Some applications are fine with three-nines. For others, four-nines, five-nines, or even higher are more appropriate. How would you feel, for instance, about getting on an airplane if there were a 1-in-1000 chance of “downtime”?
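To put the ‘nines’ in concrete terms, here is a short Python sketch that converts an availability level into the maximum downtime it permits. It assumes a 30-day month and a 365-day year (my own choices for illustration), and `max_downtime_minutes` is a hypothetical helper name, not part of any library.

```python
# Convert an availability level into the maximum downtime it permits.
# Assumption: a 30-day month and a 365-day year (illustrative choices only).
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes


def max_downtime_minutes(availability: float, period_minutes: int) -> float:
    """Return the downtime (in minutes) allowed by `availability` over a period.

    `max_downtime_minutes` is a hypothetical helper written for this example.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    levels = [("three-nines", 0.999), ("four-nines", 0.9999), ("five-nines", 0.99999)]
    for label, a in levels:
        per_month = max_downtime_minutes(a, MINUTES_PER_MONTH)
        per_year = max_downtime_minutes(a, MINUTES_PER_YEAR)
        print(f"{label:<12} {per_month:8.2f} min/month {per_year:9.2f} min/year")
```

Under these assumptions, three-nines allows about 43 minutes of downtime a month (roughly 8.8 hours a year), while five-nines allows well under a minute a month.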
From Component to System Availability
The aggregate availability of a system whose components must all be working for the system to work is the product of the availabilities of its components (assuming the components fail independently). For example, a system with 3 interdependent components, each with 99.9% availability, has an availability of 99.9% × 99.9% × 99.9% ≈ 99.7%. The following figure illustrates the availability of a system with multiple components, each with identical availability. The number of system components is shown on the x-axis, and the aggregate system availability is shown on the y-axis (plotted on a logarithmic scale). The maximum potential monthly downtime for a given availability level is shown on the secondary y-axis.
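[Figure: aggregate system availability (y-axis, logarithmic scale) versus number of components (x-axis), with maximum monthly downtime on a secondary y-axis]

Notation: $A_c$ = component availability, $U_c$ = component unavailability, $A_s$ = system availability, $U_s$ = system unavailability, $N_c$ = number of components.

$$U_c = 1 - A_c \qquad \text{e.g. } U_c = 1 - 99.9\% = 0.1\%$$

$$A_s = A_c^{N_c} \approx 1 - N_c \cdot U_c \qquad \text{e.g. } A_s \approx 1 - 10 \cdot 0.1\% = 99.0\%$$

$$U_s = 1 - A_s \approx N_c \cdot U_c$$

The approximation holds when $N_c \cdot U_c$ is small.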
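The relationship between the exact product of component availabilities and the linear approximation $1 - N_c \cdot U_c$ is easy to check numerically. The Python sketch below assumes independent component failures and a strictly serial dependency chain (every component must be up for the system to be up); the function names are hypothetical and chosen for this example.

```python
import math


def series_availability(component_availabilities: list[float]) -> float:
    """Exact availability of a system that needs every component to be up.

    Assumes failures are independent, so the system availability is the
    product of the component availabilities (A_s = A_c ** N_c when all are equal).
    """
    return math.prod(component_availabilities)


def series_availability_approx(a_c: float, n_c: int) -> float:
    """First-order approximation A_s ~= 1 - N_c * U_c, valid when N_c * U_c is small."""
    u_c = 1.0 - a_c  # component unavailability U_c
    return 1.0 - n_c * u_c


if __name__ == "__main__":
    a_c = 0.999  # three-nines per component
    for n_c in (1, 3, 10, 50, 100):
        exact = series_availability([a_c] * n_c)
        approx = series_availability_approx(a_c, n_c)
        print(f"N_c={n_c:>3}  exact={exact:.4%}  approx={approx:.4%}")
```

For a handful of components the two agree closely. The approximation slightly understates availability, and the gap widens as $N_c$ grows: with 100 three-nines components the exact product is about 90.5% versus 90.0% from the approximation. Either way, the system is far less available than any single component.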
Implications for product managers, solutions architects, and CTOs
The point of this article is not to condemn composable applications; it’s to encourage thoughtful, intentional design. Just because everyone seems to be doing something doesn’t mean you must. In the case of application and service architecture, this means viewing the system holistically. In that spirit, I offer the following:
Occam’s razor: entities should not be multiplied unnecessarily
Albert Einstein: Everything should be made as simple as possible, but no simpler
KISS principle: Keep It Simple …
- Design for component failure, and minimize the impact of each failure.
- Make components decoupled and asynchronous whenever possible so that the loss of one component does not cascade to others.
- Make critical components redundant and automatically scalable.
- Avoid stateful components whenever possible.
- Understand service interdependencies and failure modes.
- Eliminate unnecessary complexity (KISS).
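The redundancy guideline can be quantified with the same probability rules. Assuming the replicas of a component fail independently and failover is instantaneous (optimistic assumptions; correlated failures and imperfect failover will erode the benefit), a component deployed as $k$ interchangeable copies is unavailable only when all $k$ copies are down, so its availability is $1 - (1 - A_c)^k$. The Python sketch below, with hypothetical function names, applies this to a ten-component serial system in which every component has three-nines availability.

```python
import math


def redundant_availability(a_c: float, replicas: int) -> float:
    """Availability of a component deployed as `replicas` interchangeable copies.

    Assumes independent failures and instantaneous failover: the component is
    down only if every replica is down at the same time.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - a_c) ** replicas


def series_availability(component_availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up (independent failures)."""
    return math.prod(component_availabilities)


if __name__ == "__main__":
    a_c, n_c = 0.999, 10  # ten serially dependent three-nines components
    for replicas in (1, 2, 3):
        a_component = redundant_availability(a_c, replicas)
        a_system = series_availability([a_component] * n_c)
        print(f"replicas={replicas}  component={a_component:.6%}  system={a_system:.6%}")
```

Under these assumptions, duplicating every component lifts the ten-component system from about 99.0% to about 99.999%. Redundancy is therefore a powerful lever, but it also adds components, failover logic, and operational cost, which is why it pairs naturally with keeping the component count as low as the design allows.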