In the Disney cartoon Dumbo, a misfit elephant discovers that his ears are so large that he has the ability to fly. Lacking confidence in this unusual capability, he is understandably reluctant to fully exploit it. His shrewd friends, a pack of crows, come up with an ingenious psychological ploy. They give him a single feather and explain that it has magical properties: as long as he clutches it in his trunk, he will be able to fly. This ruse is wildly successful, enabling the young pachyderm to soar with the crows.
This story is compelling because it illustrates a common form of human behavior. Desiring to do something, and unfamiliar with the associated risks, we often latch onto talismans, superstitiously hoping that they will protect us from unforeseen disasters. We do the same thing in the IT world, vainly clutching contractual SLAs to gain the false confidence that they will enable us to fly safely in the public cloud.
A typical example is a security document I recently reviewed which explained how this particular enterprise could safely rely on a specific SaaS vendor to reliably protect, and if necessary recover, their data, because it was ‘protected by a high level of SLA’. AN SLA IS NO MORE THAN AN EXPRESSION OF INTENT; IT IS NOT EVIDENCE OF DELIVERABILITY.
I am by no means saying that you should not seek very specific service level agreements in your contracts. What I am saying is that the mere fact that a vendor promises to do something does not mean that you as the buyer can rely on it to happen.
If your business depends upon having access to data or services that are hosted externally, then you need either a tested contingency plan that functions entirely independently of that provider, or specific evidence that the provider is not only reliably making offline backups that are consistent with your recovery point objectives, but also has a proven ability to restore your data within your recovery time objectives.
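To make the distinction between a promise and evidence concrete, here is a minimal sketch of how such evidence might be checked against your objectives. The function names and inputs are hypothetical: it assumes the provider can give you a log of completed backup times and the measured durations of actual restore tests, which many providers will not supply.

```python
from datetime import datetime, timedelta

def meets_rpo(backup_times, rpo, now):
    """True if no gap between backups (or since the last one) exceeds the RPO."""
    if not backup_times:
        return False  # no backup evidence at all
    times = sorted(backup_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(now - times[-1])  # exposure since the most recent backup
    return max(gaps) <= rpo

def meets_rto(restore_test_durations, rto):
    """True only if restore tests exist and every one finished within the RTO."""
    # An untested restore counts as a failure: a promise is not evidence.
    return bool(restore_test_durations) and max(restore_test_durations) <= rto
```

Note that an empty list of restore tests fails the check by design. That is the situation an SLA alone leaves you in.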
I have to admit that I’m completely at a loss to come up with a test that would reliably demonstrate that a service provider could restore even a small part of its client data, say a petabyte belonging to a couple of hundred thousand digital tenants, within a few days of losing it. As a recent case in point, in what was a minuscule disaster, earlier this year it took Google 4 days to restore what they described as 0.02% of their Gmail users.
What form of evidence could demonstrate that a provider with hundreds of thousands of tenants, if not millions, could, after some sort of data-eating disaster, reliably restore that data from offline backups and link it back into the accounts and applications such that it could be used again by their customers? How long would it take for the highly-leveraged administrative staff of a cloud services provider to complete such an operation? Where in the recovery queue would you be?
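The queue question can be framed as simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a model of any real provider: the function name is hypothetical, and the restore throughput is a figure you would have to assume, since providers rarely publish one. It also assumes tenants are restored strictly one after another.

```python
def days_until_restored(tb_ahead_of_you, your_tb, restore_tb_per_day):
    """Estimate the days until your data is back, given your place in the queue.

    tb_ahead_of_you:    terabytes belonging to tenants restored before you
    your_tb:            terabytes of your own data
    restore_tb_per_day: assumed aggregate restore throughput of the provider
    """
    if restore_tb_per_day <= 0:
        raise ValueError("restore throughput must be positive")
    return (tb_ahead_of_you + your_tb) / restore_tb_per_day

# Hypothetical numbers: a petabyte (1000 TB) lost in total, you hold 5 TB of it,
# and the provider can restore 200 TB per day.
first_in_queue = days_until_restored(0, 5, 200)
last_in_queue = days_until_restored(995, 5, 200)
```

Under these assumed numbers, the tenant at the front of the queue waits well under an hour, while the tenant at the back waits five days. Nothing in a typical SLA tells you which of those tenants you will be.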
An SLA from a public cloud service promising some sort of recoverability can be a crow feather, clutched in the trunk of the enterprise elephant, giving it the false courage to fly in the public cloud. I hope that this is not a lesson that your organization will have to learn the hard way.
Recent Gartner research on this topic includes ‘Black Swans’ Are Sure to Fly in the Public Cloud, and The Realities of Cloud Services Downtime: What You Must Know and Do.