by Kyle Hilgendorf | April 20, 2012 | 2 Comments
I just returned from full days at the OpenStack conference and analyst day in San Francisco. For full transparency, I’ve been somewhat skeptical about the size of the OpenStack movement and the crippling effect of too many players and the competitive nature of those involved for moving the initiative forward. Perhaps some of this comes from me having little to no open source background. Perhaps I am right. Perhaps I am wrong.
I walk away from this week being convinced of one thing. Those involved in OpenStack are “all-in”. And only a few key industry players are absent (VMware, Microsoft, Amazon, Oracle, Citrix)…but it is pretty obvious why each of those players would not be involved. To be accurate, Citrix is still minimally involved.
But let’s look at some of those who are committed and were very vocal at the conference. Rackspace, HP, IBM, Red Hat, Intel, Dell, Cisco, AT&T, Canonical, SUSE, Nebula, and MANY others (165+), too many to name them all. My apologies.
Those listed above are some heavy pillars in the IT industry, and more importantly the open source world. We are talking about companies that have significant open source experience and wisdom. As mentors and industry luminaries have taught me, the organization of the open source foundation matters more than anything else in a movement like this. Citrix is attempting to make this exact point by moving CloudStack to the Apache Software Foundation (the most proven such foundation). Is it a concern that OpenStack did not go there first? Maybe. But the currently forming OpenStack Foundation is making key progress and apparently has looked extensively at both the ASF and Eclipse in order to build a foundation with the best aspects of other open source foundations. The emerging governance model, structure, leadership, and process are coming together. The goal is to have the foundation in place by Q3 2012. If OpenStack meets that objective and is successful, there may be no stopping OpenStack.
But are there still too many companies involved? Probably, but that may also shake itself out. Companies will come and go. However, I no longer consider the size a threat that could destroy the initiative. When you consider the complexity of a cloud stack (not to be confused with Citrix’s CloudStack), you realize how intricate and expansive it really is. It is not as though every company involved is participating in every OpenStack project. Nova, the core compute project, has perhaps the most involvement, but the love is being spread around. Thought-leading networking companies like Cisco, Brocade, Nicira, and Internap are focusing their efforts on Project Quantum (Network). I was told this week that even though there are really only a few major Linux distros in terms of market share, there are still over 700 Linux ecosystem players. OpenStack is not even close to that size yet, and one could argue the potential market for cloud stacks is significantly larger than the one for server operating systems.
OpenStack does not get to waltz into the party without a fight, though. Many others will have something to say, and they are pillars themselves. Chris Kemp, CEO of Nebula and former NASA CTO, delivered a keynote where he said OpenStack is not competing with VMware or Amazon Web Services. He said both have completely different use cases in the industry. Yet the theme throughout the entire conference, reiterated even by the keynote speakers immediately following Kemp, was that VMware and AWS are squarely in the crosshairs as the core competition. Don’t think VMware, AWS, Microsoft, Oracle, Citrix, and possibly others will stay idle.
One of the ways OpenStack is targeting VMware and AWS is by marketing choice and avoiding vendor lock-in as key benefits against them. I do have to issue one caution here. Does OpenStack desire to offer choice and interoperability? Yes. Are we anywhere near that reality? Not even close. The OpenStack platform is open. Public and private cloud providers building solutions on top of OpenStack, however, are doing many interesting (and closed) things to add value (for market differentiation) to the stack that will introduce lock-in. For instance, Rackspace, in its Next Generation Cloud, built its own customer management portal, choosing to deploy a more robust portal than the default from OpenStack. While this portal is not a technology lock-in per se, it surely will be a process, management, and support lock-in. Customers will find it difficult to lift and shift from Rackspace to HP or OpenStack Provider X because of the effort involved to learn, train, and deploy their solutions into a new management portal. Possibly even more difficult is the fact that OpenStack is hypervisor agnostic (to a degree). If one cloud provider is running KVM under OpenStack and another is running XenServer, the complexity of moving and converting workloads cannot be overstated. I could go on with many more examples of value-added lock-ins that exist, and if you are a Gartner for Technology Professionals customer, give me a call; I would love to discuss.
I’m excited to track this market over the next several months and years. I’ll get a chance to have some similar discussions with Citrix and CloudStack in early May and I hope to bring more key insights back from there.
Category: AWS Citrix Cloud IaaS OpenStack Private Cloud Providers Tags: AWS, CloudStack, Iaas, OpenStack, Rackspace, VMware
by Kyle Hilgendorf | April 16, 2012 | Comments Off
Last week I wrote a short blog on some not-so-common differentiators among public cloud IaaS providers. This week, Gartner published a large corresponding research project that I wanted to highlight.
“Evaluation Criteria for Public Cloud IaaS Providers” (Gartner for Technology Professionals subscription required) is the result of a year’s worth of customer interactions and personal testing. The research document covers important criteria by which enterprise customers should evaluate public cloud IaaS providers.
The document has 163 criteria components broken down into eight categories:
- Cross Service
- Support & Communication
- Service Levels
- Price & Billing
Within each category, I assigned one of three category ratings to each criterion:
- Required – Criteria that Gartner considers essential for IaaS providers to be capable of hosting production applications and to be considered “enterprise-grade.” IaaS solutions meeting less than all of the required criteria may still be employed for less-critical workloads or for very specific use cases where there is some work-around for a missing piece.
- Preferred – Criteria that Gartner considers nice to have and which are often those features that separate or differentiate good services from the best services. When evaluating IaaS providers, customers should always ask to see road maps that specify when providers plan to meet missing preferred criteria.
- Optional – Criteria that may be unique to specific use cases, or emerging criteria that will be more important as time progresses.
The research document (68 pages in length) includes a downloadable spreadsheet that allows customers to cut and paste into RFIs/RFPs or to change the weighting themselves for individual business requirements.
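As a rough illustration of how re-weighting the three ratings might work in practice, here is a minimal Python sketch. It is not taken from the research document or its spreadsheet: the numeric weights, the `evaluate` helper, and the sample criteria names are all hypothetical and exist only to show the arithmetic.

```python
from dataclasses import dataclass

# Hypothetical weights per rating; customers would substitute their own.
WEIGHTS = {"required": 3.0, "preferred": 2.0, "optional": 1.0}


@dataclass(frozen=True)
class Criterion:
    name: str
    rating: str  # "required", "preferred", or "optional"


def evaluate(criteria, met):
    """Return (weighted score between 0 and 1, names of unmet required criteria)."""
    total = sum(WEIGHTS[c.rating] for c in criteria)
    earned = sum(WEIGHTS[c.rating] for c in criteria if c.name in met)
    missing_required = [
        c.name for c in criteria if c.rating == "required" and c.name not in met
    ]
    score = earned / total if total else 0.0
    return score, missing_required


# Illustrative criteria only; these names are not quoted from the research.
criteria = [
    Criterion("Status dashboard hosted independently", "required"),
    Criterion("Per-hour billing granularity", "preferred"),
    Criterion("Published road map", "optional"),
]

score, gaps = evaluate(
    criteria,
    met={"Status dashboard hosted independently", "Published road map"},
)
print(f"score={score:.2f}, unmet required={gaps}")
```

Reporting unmet required criteria separately from the score reflects the idea that a provider missing a required item may still suit less-critical workloads, but should not be hidden behind an otherwise high total.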
I am very proud of this research and many customer discussions went into creating the criteria list. I am confident that it will help many organizations cut down the time it takes to build evaluation criteria and questions for IaaS providers.
Category: Cloud Evaluation Gartner IaaS Providers Tags: Cloud, Criteria, Evaluation, Iaas, Providers
by Kyle Hilgendorf | April 11, 2012 | 4 Comments
I recently took a customer phone dialog regarding key differentiators in a public cloud IaaS offering. The customer wanted to discuss differentiators among services that are not commonly considered. The customer had already considered differences among scalability, geographic offerings, VM catalogs, pricing, security controls, network options, and storage tiers. They were essentially asking about “periphery” differences or those things that might not be immediately obvious.
I kept a running list of the things we discussed and thought that this blog would prove valuable for many others. I will not take the time to describe each in detail here, but I welcome comments and debate below. As always, Gartner for Technology Professionals (GTP) customers can schedule a call with me at any time.
This is not an exhaustive list and is not in any particular order:
- Graphical User Interface / Management Console
- Provider Transparency
- Payment Models
- Billing (granularity)
- Enterprise management capabilities (asset, deploy, change, incident, problem, …)
- Ecosystem of partners & user community
- Support levels
- Simplicity of service vs. Feature set
- SLAs (breadth of, definition, and clarity)
- APIs (robustness & documentation)
Each of the above bullets can warrant a lengthy discussion in itself. But as you are considering public cloud IaaS offerings, do not forget these items.
Category: Cloud Evaluation IaaS Tags: Assessment, Cloud, Iaas, Providers, Transparency, Vendor Management
by Kyle Hilgendorf | March 12, 2012 | 6 Comments
Late Friday evening, Microsoft released their root cause analysis (RCA) for the Azure Leap Day Bug outage. My last two blog posts chronicled what I heard from Azure customers regarding the outage.
I want to share that I was very pleased with the level of detail in Microsoft’s RCA. As we learned with the AWS EBS outage in 2011, an RCA or Post Mortem is one of the best insights into architecture, testing, recovery, and communication plans in existence at a cloud provider. Microsoft’s RCA was no exception.
I encourage all current and prospective Azure customers to read and digest the Azure RCA. There is significant insight and knowledge around how Azure is architected, much more so than customers have received in the past. It is also important for customers to gauge how a provider responds to an outage. We continuously advise clients to pay close attention to how providers respond to issues, degradations, and outages of service.
I do not want to copy the RCA, but here are a few bullet points I’d like to highlight.
- It’s eerie how similar the leap day outage at Azure was to AWS’ EBS outage. Both involved software bugs and human errors. Both were cascading issues. Both issues went unnoticed longer than necessary. As a result, both companies have implemented key traps in their service to catch errors like this sooner and to prevent them from spreading.
- Microsoft decidedly suspended service management in order to stop or slow the spread of the issue. Microsoft made this decision with very good reason. Customers would have appreciated knowing the rationale around this decision right away and Microsoft is committing to improving real time communication.
- The actual leap day bug issue and resolution was identified, tested, and rolled out within 12 hours. That is pretty fast. The other issues resulted from unfortunate timing of upgrading software at the time the bug hit, as well as a human error in trying to resolve some other software issues. Microsoft even admits, “in our eagerness to get the fix deployed, we had overlooked the fact that the update package we created….[was] incompatible.”
- Even though the human error only affected seven Azure clusters, those clusters happened to contain Access Control (ACS) and Service Bus services, thereby taking those key services offline. As I spoke to customers the last two weeks, it became quite clear that without such key services as ACS and Service Bus, many other functions of Azure are unusable.
- Microsoft took steps to prevent the outage from worsening. Had these steps not been taken, we might have seen a much bigger issue.
- The issues with the Health Dashboard were a result of increased load and traffic. Microsoft will be addressing this problem.
- Microsoft understands that real-time communication must improve during an outage and is taking steps to improve.
- A 33% service credit is being applied to all customers of the affected services, regardless of whether they were affected. This 33% credit is quickly becoming a de facto standard for cloud outages. Customers appreciate this offer, as it spares customers and providers alike from having to deal with SLA claims and the administrative overhead involved.
As a final note, Microsoft stated in the RCA many times that they would be working to improve many different processes. I hope that as time moves forward, Microsoft continues to use their blog to share more specifics about the improvements in those processes and the progress against achieving those goals.
What did you think of the Azure RCA?
Category: Cloud Microsoft Outage Providers Tags: Azure, Cloud, Microsoft, Outage, Transparency
by Kyle Hilgendorf | March 9, 2012 | 2 Comments
On February 29, 2012 (leap day), Microsoft Windows Azure experienced a significant cloud service outage. Microsoft announced the outage and resolution in their public blogs (http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx and http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/window-azure-service-disruption-resolved.aspx). After the outage, I was able to interview customers of Azure that expressed to me that the outage was very impactful. During the outage last week, I summarized on this blog some high level points about the outage that customers had quickly sent me. However, now that the dust has settled and I’ve had an opportunity to personally interview more Azure customers, I wanted to take this opportunity and provide deeper insight.
Every customer I spoke to agreed to do so under strict confidence. This is always of primary importance to Gartner. I am very thankful to be in the unique position where I get direct and specific details from customers and will always respect their confidentiality. Therefore, I have anonymized all the details. Readers can be certain however, that the below points came directly from real customers using Windows Azure services. I will deliberately not replay the insights from my previous blog, but they still apply. While these insights are specific to Microsoft Windows Azure, they can be applied to any cloud service and I encourage customers and providers alike to consider the learnings. Let’s look at the new insights.
- Communication from Microsoft should have been better – Every customer I spoke to mentioned this. Even 2-3 days after the outage, some customers had not received any formal communication. One customer informed me that they received some personal emails from support friends at Microsoft but nothing official. Those customers that did get formal communication received a very brief synopsis email of the outage stating that the issue started at 5:45pm PST on February 28th and that customers may have experienced issues with Access Control 2.0, Marketplace, Service Bus, and Access Control & Caching Portal. This is a different list of services than those posted in the public blog by VP Bill Laing, however the services listed in the email more closely align to what the Azure Health Dashboard displayed at various points during the outage. The history of the Azure Health Dashboard also shows service interruptions or degradations with the following services on February 29 and/or March 1: SQL Azure Data Sync, Management Portal, Compute, and Service Management. Depending on which communication customers reviewed, there was conflicting information.
- Customers are frustrated with the lack of transparency by Microsoft – The Azure blog announced services were restored for most customers by 2:57am PST on the 29th. Yet, every customer I spoke to informed me that they experienced widespread issues until late on the 29th. I was specifically told by multiple customers that they did not see services come back online until approximately 8pm PST on the 29th, essentially canceling the entire day for them on the 29th, especially for those on the US east coast or in Europe. One customer told me, “we live in a transparent culture. Services go down, but the best practice is ultimate transparency.” Customer sentiment was that Microsoft was not honest immediately during the outage and continued to post conflicting information regarding the outage and its breadth.
- The service outage was far more impactful than advertised – Customers informed me of outages or issues with all of the following services: Access Control 2.0, Service Bus, SQL Azure Data Sync, SQL Azure Database, SQL Azure Management Portal, Windows Azure Compute, and Windows Azure Marketplace. Furthermore, many of these services were having issues in multiple Azure regions. Even though these services were offline at different times, more than one customer informed me that the integrated nature of Azure services means that when even one service is offline, it can keep the other services from working properly. For example, when SQL Azure and Azure Compute were online, Azure Data Sync was not. When a customer relies on Data Sync for connecting SQL and Compute services, all three services end up being unavailable. When Access Control was offline, users could not authenticate, rendering the backend application unusable. Furthermore, when Service Management capabilities are offline, customers cannot execute the administrative tasks that would help them redeploy in other regions or implement business continuity plans. The key learning here is that even if a single component of a cloud service is offline, the impact for an individual customer could be far reaching throughout the other services in the cloud.
- Customers are not leaving Azure, but they are brainstorming options – A consistent theme among Azure customers is that this outage by itself is not driving away current business from Azure. In fact, most customers have been pleased with the service over the past months and years. Most customers are willing to give Microsoft a “black swan” pass on the actual technical issue, but hope it causes Microsoft to improve upon the first few bullet points. With that said, some customers are considering options to protect themselves further from an Azure outage in the future. As mentioned in the previous bullet point, Azure by itself was not able to offer the resiliency and availability to sustain this outage for customers. Because this was a wide-reaching software bug, most regions and services were affected at some point. Customers concluded after the outage that the only true protection against such a widespread software bug is to build a multi-provider or hybrid operating strategy. Therefore, customers are looking at possibilities to maintain some services locally on-premises or enlisting a secondary provider. The challenge in the latter is that very few legitimate .NET and SQL Server as-a-service alternatives exist. Microsoft may be contributing to this problem by building up Azure to such a large offering and cannibalizing its own channel of partners. One customer informed me that they would love to see Microsoft resell Azure to other providers. Other customers that are looking at an on-premises deployment are weighing the costs and risks to do so as compared to the business lost in a single business day. Building such architecture can be quite costly.
- Customers were surprised at the lack of “press” – This is an interesting insight. More than one customer informed me that they were surprised how little information was published regarding the Azure outage and how few customers were publicly complaining about the outage. In comparison to cloud outages in 2011, customers expressed that the news and twitter traffic was much lower. One customer informed me that they were actually wondering if this was an indication of how few customers are in production with Azure and whether they are one of the few in that situation. That did not make them feel very good. However, as I learned from other customers later, many customers deliberately refrained from commenting publicly or in venues such as Twitter because they did not want to elevate to the public that they were having an outage as a result of the Azure outage. As an analyst I have to wonder whether admitting use of public cloud services is a good PR move or a bad PR move.
- Customers are not bothering with SLA claims – Most customers when asked about submitting an SLA claim responded that they were not going to waste their time. To begin, many of the customers complained about the Azure standard SLA, concluding that it is open for interpretation and highly beneficial to Microsoft. One customer even informed me that Microsoft told them this outage did not violate the Azure SLAs. Regardless of whether the outage violated the SLA or not, customers commonly shared that submitting an SLA claim is not worth the time and effort. After all, these businesses lost nearly a day of service and are focusing their time and effort on making sure services are restored, working, and more resilient for the future. Customers did express that it would be welcomed if Microsoft proactively offered them a credit for this outage as a sign of good will and to lessen any need to go through the hassle of submitting an SLA claim. AWS did this in April of 2011 for all customers and it was a popular move. One customer did tell me that Microsoft extended a compensation offer to them after the outage.
- Customers need better health status of Azure – As I mentioned in my blog last week, cloud providers need to host their health dashboard outside of their own service and be prepared for large amounts of traffic to the dashboard in the event of an outage. The Azure Health Dashboard was frequently unavailable during the outage, making it hard for customers to understand what was going on. Current health status is very important, especially for those customers that desperately want to try to leverage other regions or services to bring capabilities back online. Customers are therefore urging Microsoft to take this advice, and some customers are looking at third-party options that can monitor Azure health from the outside.
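The cascading effect customers described, where one offline component makes otherwise healthy services unusable, can be sketched as a simple dependency check. This is only an illustration: the `effectively_usable` function and the dependency map are hypothetical, loosely modeled on the Data Sync and Access Control examples above, and do not represent Azure's actual architecture.

```python
def effectively_usable(service, online, depends_on, _visiting=None):
    """Return True if `service` is online and so is everything it depends on.

    `online` is a set of service names currently reported as up.
    `depends_on` maps a service name to the services it requires.
    """
    visiting = _visiting or set()
    if service not in online:
        return False
    if service in visiting:
        # Dependency cycle: this service is already being checked further up.
        return True
    visiting = visiting | {service}
    return all(
        effectively_usable(dep, online, depends_on, visiting)
        for dep in depends_on.get(service, ())
    )


# Hypothetical customer application and its dependencies.
depends_on = {
    "customer_app": ["Compute", "SQL Azure", "Data Sync", "Access Control"],
    "Data Sync": ["SQL Azure"],
}

# Data Sync is offline; everything else reports as up.
online = {"customer_app", "Compute", "SQL Azure", "Access Control"}

print(effectively_usable("Compute", online, depends_on))       # True
print(effectively_usable("customer_app", online, depends_on))  # False
```

A per-service dashboard would show four of five components as healthy here, while the customer's application is effectively down, which is the gap between advertised and experienced impact that customers reported.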
We are near the 10-day commitment by Microsoft to deliver the Root Cause Analysis. Customers should pay close attention to the root cause analysis as often such documentation will provide insights and learnings into not only the architecture of the cloud service, but also the commitment by the cloud service provider to customers. I hope the analysis will be transparent into what happened, what Microsoft has learned from it, how it will be prevented in the future, and what help Microsoft is offering to Azure customers to avoid impacts in the future.
Cloud outages are a sad and unfortunate event. However, if we learn from them, build better services, increase transparency, and guide towards better application design, then we can make something great out of something bad.
Category: Cloud Microsoft Outage Providers Tags: Azure, Cloud, Microsoft, Outage, Providers
by Kyle Hilgendorf | February 29, 2012 | 9 Comments
Today, Microsoft Windows Azure had an advertised outage. As of writing this blog, the outage is still in recovery mode. I spent the morning talking to a handful of Azure customers via phone, email, and Twitter. Here are some observations becoming quite evident and important learnings for cloud customers and cloud providers:
- Cloud providers continue to track cloud outages/issues based only on availability whereas it must also include performance and response metrics
- Service dashboards continue to rely on the underlying cloud service being online
- Customers can never get enough information during the outage from the provider
- We all know outages are a fact of life, but in the midst of one, pain is real
- Customer application design needs to continue to evolve
Let me dive into each of these points with my own commentary.
- Cloud providers continue to track cloud outages/issues based only on availability whereas it must also include performance and response metrics: Azure’s health dashboard and communication originally stated that only 3.8% of customers were affected by this outage. There was no context around where the 3.8% came from or how it was measured, but I spoke to several customers this morning that suspect they were not included in the 3.8%. Just recently, the percentages were increased at the dashboard. Based upon region, the latest affected customer percentages are 6.7%, 37%, and 28% (and may still change). I was informed by some customers that various Azure roles (web, worker, VM) are up and online for many of these customers but that service performance is degraded to the point of being unusable. Because most provider SLAs are based upon uptime and availability, and not performance or response, these customers may not be counted as affected. You can follow some of my interactions via Twitter (@kylehilgendorf) from this morning to see a couple of examples. Providers MUST start including performance and response SLAs in their standard service. A degraded service is often as impactful as a down service. A great quote came in on Twitter this morning via @qthrul, “…a falling tower is ‘up’ until it is ‘down’.” A falling tower is not very useful for most customers.
- Service dashboards continue to rely on the underlying cloud service being online: The Azure Service Dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) has been experiencing very intermittent availability. Throughout this morning, I have had about a 25%-30% success rate of getting the dashboard to load. I’ve been informing providers frequently that service health systems and dashboards must be hosted independently from the provider’s cloud service. If the cloud service is down or degraded, customers had better be able to see the status at all times. I recently finished a lengthy document on evaluation criteria for public IaaS providers that will publish in the near future, and one of those criteria specifically states this as a requirement. If the service dashboard is the primary vessel by which cloud providers communicate outage updates, it must be up while the service is down.
- Customers can never get enough information during the outage from the provider: Looking back to 2011 and the AWS and Microsoft outages it became very clear that frequent status updates are paramount during an outage. AWS led the way with 30-45 min outage updates through their painful EBS outage and Ireland issues. While updates don’t solve the problem, they do demonstrate customer advocacy and concern. Some customers told me this morning they feel completely in the dark. There is no reason why a cloud provider should not have a dedicated communication team providing at least 30 min updates throughout the entire outage. Microsoft seems to be in a good cadence late this morning on more frequent updates, but there were large gaps in updates when the outage first occurred. More important in my opinion however, is a thorough post-mortem on the outage once the service has been restored. This should come within 3-4 days of the outage and must be very open and honest about the root cause, the fix, and the take-aways for the future. Providers please note, the world is very smart. If a provider even tries to mask or hide any of the details, it will come back to reflect negatively. Honesty wins.
- We all know outages are inevitabilities, but in the midst of one, pain is real: I’ve heard from some customers very impacted and as a result very frustrated and disappointed. When a cloud service has a good track record, we all admit that an outage will happen at some point. Yet, in the middle of an outage, emotion gets involved. Therefore, see point #5.
- Customer application design needs to continue to evolve: Similar to previous cloud outages, customer application design must continue to evolve to account for possible (some would say probable) cloud outages and issues. No cloud service is identical to another, and each has its own unique design and configuration options. Most cloud services have the concept of zones and regions from a geographical or hosting location standpoint. In most cloud outages, not every zone or region is affected. Therefore, the best-prepared applications will be those designed cross-zone and cross-region to avoid an outage or degradation in any one area. However, this comes with extreme complexity and an increase in cost, many times 3x-10x the cost advertised by providers. If you will be running a critical application at a cloud provider, expect an outage, design for resiliency, and be prepared to pay for it. This may also mean that you have to hire or retain some very skilled cloud staff.
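To make the first point above concrete, here is a minimal Python sketch of a health check that looks at response time as well as availability. It is an illustration under stated assumptions, not any provider's actual method: the `classify` function, the probe format, and both thresholds (99% success, 500 ms at the 95th percentile) are hypothetical.

```python
import math


def classify(probes, min_success=0.99, max_p95_ms=500.0):
    """Classify a service from probe results.

    `probes` is a list of (succeeded, latency_ms) tuples.
    Both thresholds are hypothetical defaults, not provider SLA values.
    """
    if not probes:
        return "unknown"
    latencies = sorted(latency for succeeded, latency in probes if succeeded)
    if not latencies:
        return "down"
    success_rate = len(latencies) / len(probes)
    # Nearest-rank 95th percentile of successful response times.
    rank = math.ceil(len(latencies) * 95 / 100)
    p95 = latencies[rank - 1]
    if success_rate < min_success or p95 > max_p95_ms:
        return "degraded"
    return "healthy"


# Every probe succeeds, so an availability-only metric reports 100% uptime,
# yet a tenth of the requests take four seconds.
probes = [(True, 120.0)] * 18 + [(True, 4000.0)] * 2
print(classify(probes))
```

An uptime-only measure would count this service as fully available, while the latency check flags it as degraded, which matches the situation several customers described.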
It is always a sad day as a cloud analyst to see these outages. However, it seems that significant change in the industry, at both a provider and customer level, only tends to come after an emergency.
I’d love your comments here. Let’s engage in a conversation.
Category: Cloud Microsoft Outage Providers Tags: Azure, Cloud, Microsoft, Outage
by Kyle Hilgendorf | February 3, 2012 | 9 Comments
I’m pretty vocal when it comes to challenging Cloud Service Providers (CSPs) regarding increasing the amount of public transparency they share with not only customers but with prospects. On a very regular basis, I take calls from Gartner clients about the challenges in evaluating CSPs and the frustration with the lack of published information that exists at most providers.
I’ve seen some CSPs make some very good strides lately in terms of improving websites and publishing architectural and security related information. One particular aspect where the industry has seen very little improvement is transparency with audits.
A common discussion for me at Gartner has centered on SAS 70 Type II audits, and now SSAE 16 / SOC 1 reports. The latter has replaced SAS 70 and having an SSAE 16 audit and SOC 1 report completed by an independent third party is table stakes for competing in the public cloud services market. There are many problems with the SSAE 16 audit, namely that CSPs still get to designate which control objectives an auditing agency verifies. If a CSP does a poor job at logical access security, they could choose not to have the third party audit them against that control. It seems unfair and a loophole. As such, customers actually do need to see the SOC 1 report and must sign a confidentiality agreement with each provider to do so. That does not scale well.
But why a confidentiality agreement? Why don’t CSPs simply publish their SOC 1 report online? I’ve spent the last month talking to a number of CSPs about this. I get the token response that it would divulge sensitive security configurations that if published would put the cloud service in jeopardy of being attacked/exposed. My response to that is, “Ok, but let’s get creative.” I have not been able to understand why a CSP cannot publish a summary report listing each of the controls that were audited and the relative findings for each objective. There is a stark difference in mentioning that a third party confirmed security surveillance cameras are in place versus actually listing each physical location of all individual cameras.
Well, after having several in-depth conversations with many providers, I believe our crosshairs need not focus on the CSPs as much as the auditing agencies. More than a few of the CSPs have apparently gone to their auditing agency and requested the right to publish the SOC 1 report publicly. All providers that have done this were denied that ability. The auditing agency holds the copyright to the report and the legal agreements of the audit restrict the CSP from publishing without auditor consent.
A few providers claim they have gone further and have asked the auditor if they can take portions of the report and publish them as an executive summary or FAQ to highlight for customers the controls and summarized results. Again, those providers were not able to obtain the rights to do so.
What are these auditing agencies / large consulting companies needing to hide? If they truly are independent, third parties, why can’t they stand behind their report publicly? If not the entire report, why not a summary of findings?
Providers are not 100% absolved of responsibility here either. Even if the auditing agency refuses to release any information from the report, the provider should still publicly list the controls that it asked the auditor to examine. That would be a big step for many providers and would at least start to level the playing field for customer evaluations. Furthermore, the best CSPs will put more emphasis on obtaining ISO 27001 certification, which does provide a base standard for controls.
I would love to hear from you on this. Are you a customer that is tired of signing agreements simply to confirm controls? Are you a provider that wants to publish more information but is restricted by auditors? Are you an auditor that would like to have a deeper discussion? Please contact me.
Category: Cloud Evaluation Providers Tags: Audit, Cloud, Evaluation, SSAE 16
by Kyle Hilgendorf | December 16, 2011 | 2 Comments
I’ve focused a significant amount of effort in 2011 in assisting our clients through assessments of various cloud providers, namely at the IaaS level. The topic has been so popular in fact, that I presented an “Evaluating Cloud Providers” session at our Gartner Catalyst 2011 conference as well as a free Gartner webinar (which is available on replay).
We have several pieces of research in the works that we are excited will further assist customers in evaluating cloud providers in early 2012.
However, I would be remiss if I did not call attention to a very encouraging announcement recently made by the Cloud Security Alliance. I’ve personally been an advocate for the CSA and the effort they’ve put into improving security standards within cloud computing. The recent announcement concerns a public cloud provider registry named STAR. The intent of STAR is to provide a publicly accessible registry where cloud providers publish the security controls that they offer in their service.
Most cloud providers in my recent experience have become quite good and open in sharing their security controls with prospective clients, but it is very time consuming for clients to hop from provider to provider, ask to see these controls, and document the controls for comparison. Furthermore, many of the providers still require a signed NDA with the client to share the controls.
My hope with STAR is that most providers opt in, as this is exactly the type of registry and knowledge sharing location that customers want. However, there is one potential risk. The CSA is a member-driven organization, and many of the public cloud providers are key members. There is a risk that the members will tune the security criteria over time to best match their capabilities. Yet I have faith that the consensus opinion of many providers (i.e., competitors) will triumph over collusion, and we at Gartner will keep a close eye on this. It is a positive sign that the CSA does not require a cloud provider to be a CSA member in order to be listed in STAR. As a result, there really is no excuse for a cloud provider not to opt in. If you are a significant customer at a major cloud provider and you also believe in this, encourage your provider to participate.
This entire entry is my own personal opinion, not an official position from Gartner.
Category: Cloud CSA Evaluation Providers Tags: Cloud, CSA, Evaluation, Providers
by Kyle Hilgendorf | August 29, 2011 | 3 Comments
VMworld kicked off this week with a flurry of announcements and improvements and I wanted to highlight two of the important ones for my coverage area.
Global Connect – this new announcement is intriguing for enterprise customers. The goal is for vCloud Datacenter partners to develop global relationships with one another to provide customers true global coverage for vCloud hosting while only having to maintain a relationship with a vCloud Datacenter partner in their region. The announcement kicks off with Bluelock (US), Softbank (Japan), and Singtel (Singapore).
While it will take some time for the technology to be put in place and work effectively, my concerns are not around technology. The non-technical side of this announcement will prove to be extremely challenging. Each of the providers must establish legal agreements with the others in such a way that Customer X can deploy a workload to Bluelock and then later move it or copy it to Singtel or Softbank (or vice versa). It is hard enough for Customer X to establish satisfactory terms and conditions with any single provider today. For a customer to establish those terms and then for the provider to pass them on to other providers is daunting to consider.
Surely these providers will be motivated, but tracking the agreements will be an area to watch closely. Furthermore, customers will now have additional context to consider when negotiating agreements. Pay close attention to language around the mobility of workloads from provider to provider.
If VMware can successfully broker their providers through this Global Connect initiative, it squarely places vCloud Datacenter providers (or the ecosystem) in line to compete with Amazon Web Services from a global availability perspective, an area where, until this announcement, vCloud was getting beaten badly.
vCloud Connector 1.5 - Earlier this year I published a research document named Moving Applications to the Cloud: Finding Your Right Path. The document provides guidance around the process to move virtual machines from internal data centers to public cloud providers. In the document, I highlighted several current concerns with vCloud Director and vCloud Connector 1.0. At the document publication time there were issues such as no network transmission intelligence, no restartable protocol, and the fact that vCloud Connector had to make a copy of the OVF/VMDK and store it in a temporary holding area. All of these factors contributed to really poor VM mobility performance.
It is great to see that version 1.5 addresses some of these issues. vCloud Connector no longer requires copying the OVF/VMDK to a temporary location. Furthermore, a restartable protocol has now been introduced, so a single network hiccup will no longer interrupt the transmission and require you to start over. Both advancements are welcome.
However, as noted in the document, the biggest problem in V2C mobility is inconsistent or slow internet upload speeds from customers to vCloud Datacenter providers. I would still like to see network intelligence built into vCloud Director and Connector (e.g., compression, acceleration, deduplication). Adding these enhancements will really start to change the game for VM mobility performance and move the industry from a nice concept to reality. VMware’s stiff competitor, Citrix, is already aggressively innovating in this space with their NetScaler CloudBridge announcement earlier in the year. Perhaps VMware can build, acquire, or partner with someone to bring similar capabilities to vCloud.
I hope to see many of you at VMworld this week. It is always a great week and a wonderful chance to learn, network, and gain perspective.
Category: Cloud Hybrid IaaS Mobility Providers vCloud VMware Tags: Availability, Mobility, V2C, vCloud, VMware
by Kyle Hilgendorf | August 23, 2011 | 1 Comment
I just finished two webinars last week on Evaluating Cloud Providers. Attendance was really fantastic and I hope our research has helped a variety of companies in their journey of evaluating cloud providers. Below are my thoughts from the sessions.
Category: Cloud Evaluation Hosting IaaS Providers Tags: Assessment, Cloud, Evaluation, Providers, Transparency, Vendor Management