Gartner Blog Network

Amazon’s Cloud Outage Should Cause Pause

by Robert Desisto  |  April 21, 2011  |  5 Comments

While some may write that companies should have negotiated better Service Level Agreements with Amazon, I ask: does it really matter with such a long outage? Can an SLA really repair damage to your business? Just imagine your customer contact center in the cloud being down for 20 hours. I am sure your customers would gain comfort knowing your company has a good SLA. I am not here to say Amazon’s outage will cause the death of cloud computing. But long written explanations of zones, regions, etc. are good information about what happened that avoid the obvious issue: we need to pause before rushing to the cloud, especially the more mission-critical the application is to the business.

One CEO posted the following on their web site to their customers today:

We are unfortunately still down. This is beyond frustrating for us as well… as you can imagine. This has been a widespread, day-long outage that you may have seen reported in the media. Amazon Web Services has been experiencing major issues with failures on multiple levels.

These issues have taken down thousands of major sites with it. This outage has affected us as well as a number of other related sites.

We have a new server up that is restored from our backups, but it would mean losing almost an entire day’s worth of data. Our hope is that we can wait a bit longer for Amazon to get its act together and be humming back along with zero data loss. Worst case scenario at this point is sometime tonight or early AM we make the call to go back to the backups and things will be back to normal (minus yesterday’s data, which would need to be re-input)

We are very sorry for the problems and wish there was more we could do.

The lesson for today is not about Amazon, although we have learned important tactical considerations. What is more important is to understand that cloud outages do happen and will happen in the future. We have published a number of good research notes on mitigating risk, but there is no way to completely avoid risk. We should all take greater care in pausing to evaluate which applications we are willing to move to the cloud, knowing there will be unmitigated risks.

Robert P. Desisto
VP Distinguished Analyst
14 years at Gartner
24 years IT industry

Robert Desisto is a Vice President and Distinguished Analyst in Gartner Research. He is responsible for managing the software as a service (SaaS) research agenda. His research focuses primarily on the use of SaaS as a delivery model for applications.

Thoughts on Amazon’s Cloud Outage Should Cause Pause

  1. I think it is always wise to give serious thought to which apps to move to the cloud and which ones to hold back for on-premises deployment. I do, however, think that collectively we are overreacting to the cloud outages. This Amazon US-East outage was long and painful, no doubt, and it caused loss of revenue to many. However, I feel that when it comes to cloud outages, our responses are driven much more by the visibility of the outage than by the probability of failure. Every data center out there will experience an outage … this is just a fact of life. When an outage occurs in a cloud provider facility, the visibility is immediate and widespread. But what about spectacular failures in a company’s own data center? runs in US-East-1. We were both lucky and well prepared because we use DB2 HADR and did not suffer an outage. I am sure that had we run the site in our own data centers (not an option, as this is a community effort), we would have been at as much risk of an outage … maybe even more. For all the knocking that Amazon and other cloud providers are getting, they run a pretty impressive operation and historically have delivered uptimes that are the stuff of envy for most data center operators.

  2. Mike T. says:

    Folks, Amazon’s Cloud anything IS NOT an Enterprise class solution! The DNA of all this is from when they monetized their e-tailing platform for other retail firms (and shared revenue with them). Their approach to cloud is hard to live with because one-size-fits-most does not provide the flexibility that many firms demand. For those buyers of this service who have set their companies up without due diligence on the use case, shame on you! Your ignorance isn’t Amazon’s fault.

  3. John C says:

    Amazon sold this as 99.999% reliability and consistently sells it as Enterprise class. That makes the suffering Amazon’s fault. I’ve got instances backed up locally, etc. But the guy quoted lost a day. Their organization has pretty good disaster recovery.

    Amazon will lose big here because they cannot credibly sell this as an enterprise solution. That’s the underpinning of the whole model. Their only option is to redesign the whole offering to credibly rule out the possibility of recurrence. Few will pay to double-store their data in multiple locations; that was part of the offering, and the price.

  4. sasamat says:

    Saying that Amazon is not enterprise-class is like saying an IBM x-server is not enterprise-class. Not very helpful and not very meaningful.

    Amazon is a provider of compute and storage, like the aforementioned server. Give that server RAID direct-attached-storage, or dual-homing to a SAN, power from two UPSs and a mirror image of itself in another data center and you can perform synchronization between the two. Lo and behold: enterprise-class computing!

    All of the above can be achieved with Amazon using different ‘Availability Regions’ and the appropriate software.

    The reality is that the majority of Amazon’s clients are startups (many in the social networking space) that are willing to take the risk (or don’t comprehend it) in return for scalability, agility and price. Another significant group of clients are enterprises in search of cheap agile compute for problems requiring mass horizontal scalability, but not persistence.

    The really fascinating question behind this outage is the economic one, i.e. what level of risk/cost ratio are companies willing to tolerate for Information Technology.

    Countless small enterprises that make heavy use of IT don’t have diesel backup and rely on their electrical utility to provide adequate uptime… sans SLA I might add. This is exactly the calculation that anyone using Amazon and its ilk is making–whether they are aware of it or not.

    The cloud is all about economics, as are public electrical utilities, and we are in an important phase in the maturation of a field whose economics have long been misunderstood, to say the least.

  5. Lydia Leong says:

    @John C: Amazon did not bill it as five-9s reliability. EC2 is SLA’d at 99.95% for a year, multi-AZ. EBS and RDS have no SLA at all.
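
For a sense of scale, the availability figures debated in this thread (99.95% and 99.999%) can be turned into hours of permitted downtime and set against the 20-hour outage imagined in the post. The sketch below is only a back-of-the-envelope calculation: it assumes a 365-day year, treats downtime as a single yearly budget, and ignores how any real SLA measures or excludes downtime. The function names are illustrative and do not come from Amazon or Gartner.

```python
# Back-of-the-envelope availability arithmetic (illustrative names only).
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def availability_after_outage(outage_hours: float) -> float:
    """Yearly availability (percent) if one outage is the only downtime."""
    return (1 - outage_hours / HOURS_PER_YEAR) * 100


if __name__ == "__main__":
    for target in (99.95, 99.999):
        budget = allowed_downtime_hours(target)
        print(f"{target}% allows about {budget:.2f} hours of downtime per year")
    achieved = availability_after_outage(20)
    print(f"A single 20-hour outage leaves about {achieved:.2f}% availability")
```

Under these assumptions, 99.95% works out to roughly 4.4 hours of downtime a year and 99.999% to roughly 5 minutes, so a 20-hour outage exceeds either budget several times over.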

Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.