Every evening for several decades, a number of American television stations announced that it was 10pm, and asked the public service question “Do you know where your children are?” Anyone using a cloud computing service should be asking the same question about their data.
Over the next few months, I’m going to be researching an area of cloud computing risk that hasn’t received adequate attention: data continuity and recovery.
Theoretically, the cloud computing model should be a resilient one, and a number of vendors claim that their model is built to automatically replicate data to an alternate site, protecting their customers from the risk of hardware failure, or even site failure. I have no trouble believing this.
What I do have trouble with is accepting unsubstantiated vendor claims that this is a more reliable mechanism than anything I can do for myself. There is no perfect mechanism for backing up data, but if I choose to be responsible for backing up my own data, I’ve got quite a bit of useful knowledge about the reliability of the mechanisms I choose, and the degree to which the processes are performed. I can verify the integrity and completeness of the copies, I can store them offsite and post armed guards, and I can periodically test to ensure that restoration is possible. None of this is foolproof, but it can be reliable to whatever degree I desire.
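To make "verify the integrity and completeness of the copies" concrete, here is a minimal sketch of one way I could do that check myself. It assumes the backup is a plain directory tree mirroring the source; the function names `file_digest` and `compare_backup` are my own, purely illustrative, and real backup tools typically use their own formats and verification commands.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_backup(source: Path, backup: Path) -> dict:
    """Check each file under `source` against its counterpart under `backup`.

    Assumes (hypothetically) that the backup is an uncompressed copy with
    the same relative paths as the source.
    """
    missing, corrupted = [], []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            missing.append(rel.as_posix())      # completeness: copy absent
        elif file_digest(src) != file_digest(dst):
            corrupted.append(rel.as_posix())    # integrity: contents differ
    return {"missing": missing, "corrupted": corrupted}
```

Calling `compare_backup(Path("/data"), Path("/mnt/offsite/data"))` (both paths are placeholders) returns two lists, and two empty lists mean every source file has a byte-identical copy. It says nothing about whether a restore would actually work, which is why periodic restore tests still matter.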
If I choose instead to rely on a cloud service provider, I have no ability to know where the primary data is, let alone have an ability to verify that redundant copies of all my data exist in a different site. I have no ability to know the likelihood that my provider would be able to restore my data in case of an accident, let alone restore something important that I accidentally deleted.
And if my data in the cloud is being backed up in real time, it raises another significant question: if the original data is corrupted, won’t the same corruption affect the copy? Mistakes and errors replicate at the speed of the cloud. What if data loss occurs as the result of some sort of cascading failure, or external attack? Isn’t it reasonable to assume that this would affect all copies of the data? Traditional backups are inherently more reliable in that offline data is insulated from failure modes that are inherent to real-time online redundancy models.
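The difference between real-time replication and an offline copy can be shown with a toy model. This is only a sketch under simplifying assumptions: the `Mirror` and `SnapshotBackup` classes are hypothetical, an in-memory dictionary stands in for the data store, and nothing here describes how any particular provider works.

```python
class Mirror:
    """Real-time replica: every write to the primary is applied at once."""

    def __init__(self):
        self.data = {}

    def on_write(self, key, value):
        self.data[key] = value


class SnapshotBackup:
    """Offline copy: captures the primary only when snapshot() is called."""

    def __init__(self):
        self.snapshots = []

    def snapshot(self, primary):
        self.snapshots.append(dict(primary))  # independent point-in-time copy

    def restore(self, index=-1):
        return dict(self.snapshots[index])


primary = {}
mirror = Mirror()
backup = SnapshotBackup()


def write(key, value):
    """Write to the primary and replicate to the mirror immediately."""
    primary[key] = value
    mirror.on_write(key, value)


write("report", "Q3 figures")
backup.snapshot(primary)             # offline copy taken while data is good
write("report", "\x00\x00garbage")   # simulated corruption of the original

mirror_copy = mirror.data["report"]      # the corrupted value
offline_copy = backup.restore()["report"]  # the value from before corruption
```

In this model the mirror faithfully reproduces the bad write, while the snapshot still holds the earlier value. A snapshot taken after the corruption would capture it too, so the protection depends on keeping older copies long enough to notice the problem.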
If you don’t know where your data is, can you confirm that it will be there when you need it most?
What evidence do you have from your provider that their proprietary technology is reliable?