Robert Desisto

A member of the Gartner Blog Network

Robert P. Desisto
VP Distinguished Analyst
14 years at Gartner
24 years IT industry

Robert Desisto is a Vice President and Distinguished Analyst in Gartner Research. He is responsible for managing the software as a service (SaaS) research agenda. His research focuses primarily on the use of SaaS as a delivery model for applications.

Amazon’s Cloud Outage Should Cause Pause

by Robert Desisto  |  April 21, 2011

While some may write that companies should have negotiated better Service Level Agreements (SLAs) with Amazon, I ask: does it really matter with such a long outage? Can an SLA really repair damage to your business? Just imagine your customer contact center in the cloud being down for 20 hours. I am sure your customers would take comfort in knowing your company has a good SLA. I am not here to say Amazon’s outage will cause the death of cloud computing. But long written explanations of zones, regions, etc. are good information about what happened while avoiding the obvious issue: we need to pause before rushing to the cloud, especially the more mission critical the application is to the business.

One CEO posted the following on their web site to their customers today:

We are unfortunately still down. This is beyond frustrating for us as well… as you can imagine. This has been a widespread, day-long outage that you may have seen reported in the media. Amazon Web Services has been experiencing major issues with failures on multiple levels.

These issues have taken down thousands of major sites with it. This outage has affected us as well as a number of other related sites.

We have a new server up that is restored from our backups, but it would mean losing almost an entire day’s worth of data. Our hope is that we can wait a bit longer for Amazon to get its act together and be humming back along with zero data loss. Worst case scenario at this point is sometime tonight or early AM we make the call to go back to the backups and things will be back to normal (minus yesterday’s data, which would need to be re-input)

We are very sorry for the problems and wish there was more we could do.

The lesson for today is not about Amazon, although we have learned important tactical considerations. What is more important is to understand that cloud outages do happen and will happen in the future. We have published a number of good research notes on mitigating risk, but there is no way to avoid risk completely. What we can do is pause and evaluate which applications we are willing to move to the cloud, knowing there will be unmitigated risks. That is something we should all take greater care in doing.
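To put the SLA question in numbers, here is a minimal sketch of the downtime arithmetic. The 20-hour outage length comes from the contact center example above, and the 99.95% annual figure and the "five nines" figure are the ones cited in the comments below. The function names and the 365-day year are assumptions made for illustration; this is plain arithmetic, not how Amazon measures availability or calculates service credits.

```python
# Downtime arithmetic for availability percentages (illustrative only).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted over the period at a given availability."""
    return period_hours * (1 - availability_pct / 100)


def achieved_availability_pct(outage_hours, period_hours=HOURS_PER_YEAR):
    """Availability over the period if the only downtime is one outage."""
    return 100 * (1 - outage_hours / period_hours)


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.95)
    five_nines_minutes = allowed_downtime_hours(99.999) * 60
    after_outage = achieved_availability_pct(20)

    print(f"99.95% over a year allows about {budget:.2f} hours of downtime")
    print(f"99.999% over a year allows about {five_nines_minutes:.1f} minutes")
    print(f"One 20-hour outage leaves about {after_outage:.2f}% for the year")
```

Under these assumptions, 99.95% over a year leaves a budget of roughly 4.4 hours, so a single 20-hour outage uses more than four times that budget. Whatever credit an SLA pays out, the hours of lost business remain.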

Category: Cloud Software as a Service Uncategorized

5 responses so far ↓

  • 1 Leon Katsnelson   April 22, 2011 at 1:11 am

    I think it is always wise to give serious thought to which apps to move to the cloud and which ones to hold back for on-premises deployments. I do however think that collectively we are overreacting to the cloud outages. This Amazon US-East outage was long and painful, no doubt, and it caused loss of revenue to many. However, I feel that when it comes to cloud outages our responses are driven much more by the visibility of the outage than by the probability of failure. Every data center out there will experience an outage … this is just a fact of life. When an outage occurs in a cloud provider facility the visibility is immediate and widespread. But what about spectacular failures in a company’s own data center?
    DB2University.com runs in US-East-1. We were both lucky and well prepared because we use DB2 HADR and did not suffer an outage (http://freedb2.com/2011/04/21/cloud-crash-has-a-silver-lining/). I am sure that had we run the site in our own data centers (not an option as this is a community effort) we would have been at as much risk of an outage … maybe even more. For all the knocking that Amazon and other cloud providers are getting, they run a pretty impressive operation and historically have delivered uptimes that are the stuff of envy for most data center operators.

  • 2 Mike T.   April 22, 2011 at 2:01 am

    Folks, Amazon’s Cloud anything IS NOT an Enterprise class solution! The DNA of all this is from when they monetized their e-tailing platform for other retail firms (and shared revenue with them). Their approach to cloud is hard to live with because one-size-fits-most does not provide the flexibility that many firms demand. For those buyers of this service that have set their companies up without diligence for use case, shame on you! Your ignorance isn’t Amazon’s fault.

  • 3 John C   April 22, 2011 at 6:04 am

    Mike,

    Amazon sold this as 99.999% reliability and consistently sells it as Enterprise class. That makes the suffering Amazon’s fault. I’ve got backed up instances locally, etc. But the guy quoted lost a day. Their organization has pretty good disaster recovery.

    Amazon will lose big here because they cannot credibly sell this as an enterprise solution. That’s the underpinning of the whole model. The only option for them is to redesign the whole offering to credibly dismiss the possibility of recurrence. Few will pay to double store their data in multiple locations; that was part of the offering, and the price.

  • 4 sasamat   April 22, 2011 at 4:08 pm

    Saying that Amazon is not enterprise-class is like saying an IBM x-server is not enterprise-class. Not very helpful and not very meaningful.

    Amazon is a provider of compute and storage, like the aforementioned server. Give that server RAID direct-attached-storage, or dual-homing to a SAN, power from two UPSs and a mirror image of itself in another data center and you can perform synchronization between the two. Lo and behold: enterprise-class computing!

    All of the above can be achieved with Amazon using different ‘Availability Regions’ and the appropriate software.

    The reality is that the majority of Amazon’s clients are startups (many in the social networking space) that are willing to take the risk (or don’t comprehend it) in return for scalability, agility and price. Another significant group of clients are enterprises in search of cheap agile compute for problems requiring mass horizontal scalability, but not persistence.

    The really fascinating question behind this outage is the economic one, i.e. what level of risk/cost ratio are companies willing to tolerate for Information Technology.

    Countless small enterprises that make heavy use of IT don’t have diesel backup and rely on their electrical utility to provide adequate uptime… sans SLA I might add. This is exactly the calculation that anyone using Amazon and its ilk is making–whether they are aware of it or not.

    The cloud is all about economics, as are public electrical utilities, and we are in an important phase in the maturation of a field whose economics have long been misunderstood, to say the least.

  • 5 Lydia Leong   April 23, 2011 at 12:36 am

    @John C: Amazon did not bill it as five-9s reliability. EC2 is SLA’d at 99.95% for a year, multi-AZ. EBS and RDS have no SLA at all.