From Stan Zaffos, Research VP and John Morency, Research VP
Episodic stories of cloud service and storage providers suffering unplanned downtime that resulted in lost data accessibility or, in some cases, actual loss of user data should remind users that if data must truly be protected from technology and site failures, software errors, and human errors, it must be stored on at least two different technologies located in different locations, which may or may not be controlled by separate organizations.
While some would argue that this is excessive, history has shown that:
• Human foresight cannot predict every possible permutation of unexpected events that can lead to loss of data accessibility or of the data itself, so 100% data availability cannot be guaranteed;
• Software and microcode can never be guaranteed to be bug-free;
• Facilities and infrastructure always have problems waiting to happen;
• Operational procedures, particularly those related to handling infrequent events, are rarely if ever tested under real-world conditions, and they are sometimes executed by operators with little experience following them; and
• Companies, even those with the best of intentions, are not always in control of the events that impact them.
There are two big reasons that cloud storage failures get so much attention from the press. The first is that they affect many users across different organizations and are therefore, by definition, highly visible. The second is that large cloud service and storage providers run continuously evolving infrastructures that can never be fully tested at scale, and are therefore more vulnerable to software bugs and mismatches between parameter settings and workloads than storage arrays that vendors do test at scale. It is also useful to know whether the cost per managed gigabyte or terabyte you are paying includes supplementary data backup within the cloud; some providers offer backup as a separate managed service tier.
There are two practical approaches to using two different cloud storage providers simultaneously to protect against cloud provider failures. The first is to use a gateway appliance that connects to two or more cloud providers at the same time. The second is to take advantage of the fact that many cloud providers support AWS-compatible REST APIs and have the application write directly to two clouds, as sketched below. While this nominally doubles the cost of cloud storage, the economics may still work depending on the prices, terms and conditions that you have negotiated.
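As a rough illustration of the second approach, the Python sketch below writes the same object to two S3-compatible endpoints using the boto3 library. The endpoints, bucket names, and object key are hypothetical placeholders, and credentials are assumed to come from boto3's normal configuration chain; this is a sketch of the idea, not a hardened implementation.

```python
# Sketch: write one object to two S3-compatible cloud providers.
# Endpoints and bucket names below are hypothetical placeholders.
import boto3

PROVIDERS = [
    {"endpoint_url": "https://s3.amazonaws.com", "bucket": "example-primary-bucket"},
    {"endpoint_url": "https://objects.second-provider.example.com", "bucket": "example-secondary-bucket"},
]


def put_to_both(key: str, data: bytes) -> None:
    """Write the same object to every configured provider; report any failures."""
    failures = []
    for provider in PROVIDERS:
        client = boto3.client("s3", endpoint_url=provider["endpoint_url"])
        try:
            client.put_object(Bucket=provider["bucket"], Key=key, Body=data)
        except Exception as exc:  # narrow to botocore.exceptions.ClientError in real code
            failures.append((provider["endpoint_url"], str(exc)))
    if failures:
        raise RuntimeError(f"One or more cloud writes failed: {failures}")


if __name__ == "__main__":
    put_to_both("backups/orders.csv", b"order_id,amount\n1,9.99\n")
```

A gateway appliance (the first approach) moves this fan-out logic out of the application, but the underlying write-to-two-places principle is the same.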
For users keeping their data in-house, the choice of technologies used to store business-critical data should depend on RPO and RTO requirements. Assuming that the production copy of the data is kept on disk arrays with no single points of failure (SPOFs) other than microcode bugs, many users will use snapshots to recover from logical corruption, controller-based replication to another like system located at the disaster recovery (D/R) site for protection from technology and site failures, and disk-to-disk (D2D) or disk-to-disk-to-tape (D2D2T) backups as their recovery of last resort. A reasonable variation on this theme, and one that more directly follows the advice to use two different technologies to protect mission-critical data, is to use software- or network-based replication to a different storage system located at the D/R site. A simple sketch of how RPO and RTO targets can be checked against such a layered design follows.
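To make the RPO/RTO dependency concrete, the sketch below checks hypothetical protection tiers (snapshots, replication, D2D backup) against stated recovery objectives. All interval and duration figures are illustrative assumptions, not measurements from any particular product.

```python
# Illustrative check of recovery objectives against a layered protection design.
# All interval and duration figures are hypothetical assumptions.
from dataclasses import dataclass


@dataclass
class ProtectionTier:
    name: str
    worst_case_data_loss_hours: float   # bounds the achievable RPO
    estimated_restore_hours: float      # bounds the achievable RTO


def meets_objectives(tier: ProtectionTier, rpo_hours: float, rto_hours: float) -> bool:
    """A tier satisfies the objectives if it can lose no more data than the RPO
    allows and can restore service within the RTO."""
    return (tier.worst_case_data_loss_hours <= rpo_hours
            and tier.estimated_restore_hours <= rto_hours)


tiers = [
    ProtectionTier("array snapshots (hourly)", worst_case_data_loss_hours=1.0, estimated_restore_hours=0.25),
    ProtectionTier("controller-based replication to D/R site", 0.1, 2.0),
    ProtectionTier("D2D backup (nightly)", 24.0, 8.0),
]

rpo, rto = 4.0, 6.0  # hypothetical business requirements, in hours
for tier in tiers:
    verdict = "meets" if meets_objectives(tier, rpo, rto) else "does not meet"
    print(f"{tier.name}: {verdict} RPO={rpo}h / RTO={rto}h")
```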
For more Gartner blog coverage of this topic, see Jay Heiser's post, "Updating a cloud is like organ transplants without anesthesia."