Bryan Ford, a researcher at Yale, has just released a very interesting 5-page paper, “Icebergs in the Clouds: the Other Risks of Cloud Computing”, which provides a compelling technical and historical explanation of the potential for emergent chain-reaction failures in commercial cloud computing offerings.
He takes a nuanced approach, acknowledging that his analysis is speculative and forward-looking, but arguing that the increasing reliance on public cloud computing makes this an area that deserves research attention now.
To quote selectively from the synopsis linked above: “As diverse, independently developed cloud services share ever more fluidly and aggressively multiplexed hardware resource pools, unpredictable interactions between load-balancing and other reactive mechanisms could lead to dynamic instabilities or ‘meltdowns.’ Non-transparent layering structures, where alternative cloud services may appear independent but share deep, hidden resource dependencies, may create unexpected and potentially catastrophic failure correlations, reminiscent of financial industry crashes.”
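The point about hidden resource dependencies can be made concrete with a little arithmetic. The sketch below is my own illustration, not something taken from Ford's paper: it assumes two cloud services that a buyer believes are independent, each failing on its own with probability `p_own`, plus one hidden shared component that fails with probability `q_shared`. The function name and all of the numbers are hypothetical, and the three failure events are assumed to be independent of one another.

```python
def joint_outage_probability(p_own: float, q_shared: float) -> float:
    """Probability that two services are down at the same time.

    Assumptions (illustrative only): each service fails by itself with
    probability p_own, both depend on one hidden shared component that
    fails with probability q_shared, and the three events are independent.
    """
    both_fail_on_their_own = p_own * p_own
    # Either the shared layer fails (taking both services down), or it
    # survives and both services happen to fail independently.
    return q_shared + (1.0 - q_shared) * both_fail_on_their_own


p = 0.01   # hypothetical chance that either service fails by itself
q = 0.005  # hypothetical chance that the hidden shared layer fails

assumed = joint_outage_probability(p, 0.0)  # what the buyer believes
actual = joint_outage_probability(p, q)     # with the hidden dependency

print(f"assumed joint outage: {assumed:.6f}")
print(f"actual joint outage:  {actual:.6f}")
print(f"underestimated by a factor of about {actual / assumed:.0f}")
```

With these made-up figures, the two services are roughly fifty times more likely to fail together than a buyer who assumed independence would expect, even though the shared component is more reliable than either service.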
When cloud computing buyers do have a concern about risk, it tends to be narrowly focused on data confidentiality. Many security practitioners are unfortunately encouraging a simplistic belief that if we just deal with data secrecy, then we can use cloud services for everything. The types of failures that have already occurred suggest a greater need to look at the broader risk issues, especially reliability and data loss.
In his introduction, Ford explains that “Non-transparent layering structures, where alternative cloud services may appear independent but share deep, hidden resource dependencies, may create unexpected and potentially catastrophic failure correlations, reminiscent of financial industry crashes.” In a 2010 blog entry, ‘Toxic Clouds’, I drew multiple parallels between the run-up to the financial services meltdown and the willingness of cloud buyers to accept complex but non-transparent offerings. In both cases, the mechanisms that allow a relatively small set of resources to be leveraged into high levels of value for a huge number of customers also mean that a single failure can have widespread impact. In both cases, complexity inevitably leads to brittleness, unless explicit measures are taken to anticipate and prevent the emergence of destabilizing behaviors. In the case of the financial markets, greed won out over governance. It remains to be seen what will happen with public cloud computing.