While some may argue that companies should have negotiated better Service Level Agreements with Amazon, I ask: does it really matter with an outage this long? Can an SLA really repair the damage to your business? Just imagine your cloud-hosted customer contact center being down for 20 hours. I am sure your customers would take great comfort in knowing your company has a good SLA. I am not here to say Amazon's outage will cause the death of cloud computing. But explaining it away with long write-ups about zones, regions, and so on, while useful for understanding what happened, sidesteps the obvious issue. We need to pause before rushing to the cloud, and the more mission-critical an application is to the business, the more caution it deserves.
One CEO posted the following message to customers on the company's website today:
We are unfortunately still down. This is beyond frustrating for us as well… as you can imagine. This has been a widespread, day-long outage that you may have seen reported in the media. Amazon Web Services has been experiencing major issues with failures on multiple levels.
These issues have taken down thousands of major sites with it. This outage has affected us as well as a number of other related sites.
We have a new server up that is restored from our backups, but it would mean losing almost an entire day's worth of data. Our hope is that we can wait a bit longer for Amazon to get its act together and be humming back along with zero data loss. Worst case scenario at this point is sometime tonight or early AM we make the call to go back to the backups and things will be back to normal (minus yesterday's data, which would need to be re-input).
We are very sorry for the problems and wish there was more we could do.
Today's lesson is not really about Amazon, although we have learned some important tactical considerations. What matters more is understanding that cloud outages do happen and will continue to happen. We have published a number of good research notes on mitigating risk, but there is no way to avoid risk completely. We should all take greater care in pausing to evaluate which applications we are willing to move to the cloud, knowing that some risks will remain unmitigated.