by Kyle Hilgendorf | March 12, 2012 | 6 Comments
Late Friday evening, Microsoft released their root cause analysis (RCA) for the Azure Leap Day Bug outage. My last two blog posts chronicled what I heard from Azure customers regarding the outage.
I want to share that I was very pleased with the level of detail in Microsoft’s RCA. As we learned with the AWS EBS outage in 2011, an RCA or Post Mortem is one of the best insights into architecture, testing, recovery, and communication plans in existence at a cloud provider. Microsoft’s RCA was no exception.
I encourage all current and prospective Azure customers to read and digest the Azure RCA. There is significant insight and knowledge around how Azure is architected, much more so than customers have received in the past. It is also important for customers to gauge how a provider responds to an outage. We continuously advise clients to pay close attention to how providers respond to issues, degradations, and outages of service.
I do not want to copy the RCA, but here are a few bullet points I’d like to highlight.
- It’s eerie how similar the leap day outage at Azure was to AWS’ EBS outage. Both involved software bugs and human errors. Both were cascading issues. Both issues went unnoticed longer than necessary. As a result, both companies have implemented key traps in their services to catch errors like this sooner and to prevent them from spreading.
- Microsoft deliberately suspended service management in order to stop or slow the spread of the issue. Microsoft made this decision with very good reason. Customers would have appreciated knowing the rationale for this decision right away, and Microsoft is committing to improving real-time communication.
- The actual leap day bug issue and resolution was identified, tested, and rolled out within 12 hours. That is pretty fast. The other issues resulted from unfortunate timing of upgrading software at the time the bug hit, as well as a human error in trying to resolve some other software issues. Microsoft even admits, “in our eagerness to get the fix deployed, we had overlooked the fact that the update package we created….[was] incompatible.”
- Even though the human error only affected seven Azure clusters, those clusters happened to contain Access Control (ACS) and Service Bus services, thereby taking those key services offline. As I spoke to customers the last two weeks, it became quite clear that without such key services as ACS and Service Bus, many other functions of Azure are unusable.
- Microsoft took steps to prevent the outage from worsening. Had these steps not been taken, we might have seen a much bigger issue.
- The issues with the Health Dashboard were a result of increased load and traffic. Microsoft will be addressing this problem.
- Microsoft understands that real-time communication must improve during an outage and is taking steps to improve it.
- A 33% service credit is being applied to all customers of the affected services, regardless of whether they were affected. This 33% credit is quickly becoming a de facto standard for cloud outages. Customers appreciate this offer, as it spares customers and providers alike from having to deal with SLA claims and the administrative overhead involved.
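For readers who have not met a leap-day defect before, here is a minimal sketch of one common form: computing "one year from today" by adding one to the year. This is an illustration only and not Microsoft's code; the function names are hypothetical, and the RCA itself remains the authority on what actually happened inside Azure.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Adds one to the year and keeps month and day unchanged.
    # Raises ValueError for February 29, because the following
    # year has no such day.
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    # Same idea, but falls back to February 28 when the target
    # year is not a leap year.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)
```

The naive version works on 365 days out of every 366, which is exactly why this kind of defect can sit unnoticed until leap day arrives.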
As a final note, Microsoft stated in the RCA many times that they would be working to improve many different processes. I hope that as time moves forward, Microsoft continues to use their blog to share more specifics about the improvements in those processes and the progress against achieving those goals.
What did you think of the Azure RCA?
Category: Cloud Microsoft Outage Providers Tags: Azure, Cloud, Microsoft, Outage, Transparency
by Kyle Hilgendorf | March 9, 2012 | 2 Comments
On February 29, 2012 (leap day), Microsoft Windows Azure experienced a significant cloud service outage. Microsoft announced the outage and resolution in their public blogs (http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx and http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/window-azure-service-disruption-resolved.aspx). After the outage, I was able to interview customers of Azure that expressed to me that the outage was very impactful. During the outage last week, I summarized on this blog some high level points about the outage that customers had quickly sent me. However, now that the dust has settled and I’ve had an opportunity to personally interview more Azure customers, I wanted to take this opportunity and provide deeper insight.
Every customer I spoke to agreed to do so under strict confidence. This is always of primary importance to Gartner. I am very thankful to be in the unique position where I get direct and specific details from customers and will always respect their confidentiality. Therefore, I have anonymized all the details. Readers can be certain however, that the below points came directly from real customers using Windows Azure services. I will deliberately not replay the insights from my previous blog, but they still apply. While these insights are specific to Microsoft Windows Azure, they can be applied to any cloud service and I encourage customers and providers alike to consider the learnings. Let’s look at the new insights.
- Communication from Microsoft should have been better – Every customer I spoke to mentioned this. Even 2-3 days after the outage, some customers had not received any formal communication. One customer informed me that they received some personal emails from support friends at Microsoft but nothing official. Those customers that did get formal communication received a very brief synopsis email of the outage stating that the issue started at 5:45pm PST on February 28th and that customers may have experienced issues with Access Control 2.0, Marketplace, Service Bus, and Access Control & Caching Portal. This is a different list of services than those posted in the public blog by VP Bill Laing, however the services listed in the email more closely align to what the Azure Health Dashboard displayed at various points during the outage. The history of the Azure Health Dashboard also shows service interruptions or degradations with the following services on February 29 and/or March 1: SQL Azure Data Sync, Management Portal, Compute, and Service Management. Depending on which communication customers reviewed, there was conflicting information.
- Customers are frustrated with the lack of transparency by Microsoft – The Azure blog announced services were restored for most customers by 2:57am PST on the 29th. Yet, every customer I spoke to informed me that they experienced widespread issues until late on the 29th. I was specifically told by multiple customers that they did not see services come back online until approximately 8pm PST on the 29th, essentially canceling the entire day for them, especially for those on the US east coast or in Europe. One customer told me, “we live in a transparent culture. Services go down, but the best practice is ultimate transparency.” Customer sentiment was that Microsoft was not honest immediately during the outage and continued to post conflicting information regarding the outage and its breadth.
- The service outage was far more impactful than advertised – Customers informed me of outages or issues with all of the following services: Access Control 2.0, Service Bus, SQL Azure Data Sync, SQL Azure Database, SQL Azure Management Portal, Windows Azure Compute, and Windows Azure Marketplace. Furthermore, many of these services were having issues in multiple Azure regions. Even though these services were offline at different times, more than one customer informed me that the integrated nature of Azure services means that even if only one service is offline, it can keep the other services from working properly. For example, when SQL Azure and Azure Compute were online, Azure Data Sync was not. When a customer relies on Data Sync for connecting SQL and Compute services, all three services end up being unavailable. When Access Control was offline, users could not authenticate, rendering the backend application unusable. Furthermore, when Service Management capabilities are offline, customers cannot execute any of the administrative tasks that would help them redeploy in other regions or implement business continuity plans. The key learning here is that even if a single component of a cloud service is offline, the impact for an individual customer could be far reaching throughout the other services in the cloud.
- Customers are not leaving Azure, but they are brainstorming options – A consistent theme among Azure customers is that this outage by itself is not driving away current business from Azure. In fact, most customers have been pleased with the service over the past months and years. Most customers are willing to give Microsoft a “black swan” pass on the actual technical issue, but hope it causes Microsoft to improve upon the first few bullet points. With that said, some customers are considering options to protect themselves further from an Azure outage in the future. As mentioned in the previous bullet point, Azure by itself was not able to offer the resiliency and availability to sustain this outage for customers. Because this was a wide-reaching software bug, most regions and services were affected at some point. Customers concluded after the outage that the only true protection against such a widespread software bug is to build a multi-provider or hybrid operating strategy. Therefore, customers are looking at possibilities to maintain some services locally on-premises or enlisting a secondary provider. The challenge in the latter is that very few legitimate .NET and SQL Server as-a-service alternatives exist. Microsoft may be contributing to this problem by building up Azure to such a large offering and cannibalizing its own channel of partners. One customer informed me that they would love to see Microsoft resell Azure to other providers. Other customers that are looking at an on-premises deployment are weighing the costs and risks to do so as compared to the business lost in a single business day. Building such architecture can be quite costly.
- Customers were surprised at the lack of “press” – This is an interesting insight. More than one customer informed me that they were surprised how little information was published regarding the Azure outage and how few customers were publicly complaining about the outage. In comparison to cloud outages in 2011, customers expressed that the news and twitter traffic was much lower. One customer informed me that they were actually wondering if this was an indication of how few customers are in production with Azure and whether they are one of the few in that situation. That did not make them feel very good. However, as I learned from other customers later, many customers deliberately refrained from commenting publicly or in venues such as Twitter because they did not want to elevate to the public that they were having an outage as a result of the Azure outage. As an analyst I have to wonder whether admitting use of public cloud services is a good PR move or a bad PR move.
- Customers are not bothering with SLA claims – Most customers when asked about submitting an SLA claim responded that they were not going to waste their time. To begin, many of the customers complained about the Azure standard SLA, concluding that it is open for interpretation and highly beneficial to Microsoft. One customer even informed me that Microsoft told them this outage did not violate the Azure SLAs. Regardless of whether the outage violated the SLA or not, customers commonly shared that submitting an SLA claim is not worth the time and effort. After all, these businesses lost nearly a day of service and are focusing their time and effort on making sure services are restored, working, and better resilient for the future. Customers did express that it would be welcomed if Microsoft proactively offered them a credit for this outage as a sign of good will and to lessen any need to go through the hassle of submitting an SLA claim. AWS did this in April of 2011 for all customers and it was a popular move. One customer did tell me that Microsoft extended a compensation offer to them after the outage.
- Customers need better health status of Azure – As I mentioned in my blog last week, cloud providers need to host their health dashboard outside of their own service and be prepared for large amounts of traffic to the dashboard in the event of an outage. The Azure Health Dashboard was frequently unavailable during the outage, making it hard for customers to understand what was going on. Current health status is very important, especially for those customers that desperately want to try to leverage other regions or services to bring capabilities back online. Customers are therefore urging Microsoft to take this advice, and some customers are looking at third-party options that can monitor Azure health from the outside.
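The point about integrated services can be made concrete with a small sketch. The dependency map below is hypothetical (the capability names and groupings are my own illustration, not an Azure API); it simply encodes the examples customers gave me, where a capability is usable only if every service it depends on is up.

```python
from typing import Dict, Set

# Hypothetical map of customer-facing capabilities to the platform
# services each one needs. Names are illustrative assumptions.
DEPENDENCIES: Dict[str, Set[str]] = {
    "user_login": {"access_control"},
    "order_pipeline": {"compute", "sql_database", "data_sync"},
    "redeploy_to_other_region": {"service_management"},
}


def usable_capabilities(services_up: Set[str]) -> Dict[str, bool]:
    # A capability is usable only when all of its required
    # services are in the set of services currently up.
    return {
        capability: required <= services_up
        for capability, required in DEPENDENCIES.items()
    }


# Compute and SQL are online, but Data Sync, Access Control,
# and Service Management are not.
status = usable_capabilities({"compute", "sql_database"})
```

With only Compute and SQL online, every capability in this toy map comes back unusable, even though two of the three services behind the pipeline are technically "up". That is the gap between a provider's per-service status and what a customer actually experiences.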
We are near the 10-day commitment by Microsoft to deliver the Root Cause Analysis. Customers should pay close attention to the root cause analysis as often such documentation will provide insights and learnings into not only the architecture of the cloud service, but also the commitment by the cloud service provider to customers. I hope the analysis will be transparent into what happened, what Microsoft has learned from it, how it will be prevented in the future, and what help Microsoft is offering to Azure customers to avoid impacts in the future.
Cloud outages are a sad and unfortunate event. However, if we learn from them, build better services, increase transparency, and guide towards better application design, then we can make something great out of something bad.
Category: Cloud Microsoft Outage Providers Tags: Azure, Cloud, Microsoft, Outage, Providers
by Kyle Hilgendorf | February 29, 2012 | 9 Comments
Today, Microsoft Windows Azure had an advertised outage. As of writing this blog, the outage is still in recovery mode. I spent the morning talking to a handful of Azure customers via phone, email, and Twitter. Here are some observations becoming quite evident and important learnings for cloud customers and cloud providers:
- Cloud providers continue to track cloud outages/issues based only on availability whereas it must also include performance and response metrics
- Service dashboards continue to rely on the underlying cloud service being online
- Customers can never get enough information during the outage from the provider
- We all know outages are a fact of life, but in the midst of one, pain is real
- Customer application design needs to continue to evolve
Let me dive into each of these points with my own commentary.
- Cloud providers continue to track cloud outages/issues based only on availability whereas it must also include performance and response metrics: Azure’s health dashboard and communication originally stated that only 3.8% of customers were affected by this outage. There was no context around where the 3.8% came from or how it was measured, but I spoke to several customers this morning that suspect they were not included in the 3.8%. Just recently, the percentages were increased at the dashboard. Based upon region, the latest affected customer percentages are 6.7%, 37%, and 28% (and may still change). I was informed by some customers that various Azure roles (web, worker, VM) are up and online for many of these customers but that service performance is degraded to the point of being unusable. Because most provider SLAs are based upon uptime and availability, and not performance or response, these customers may not be counted as affected. You can follow some of my interactions via Twitter (@kylehilgendorf) from this morning to see a couple of examples. Providers MUST start including performance and response SLAs into their standard service. A degraded service is often as impactful as a down service. A great quote came in on twitter this morning via @qthrul, “…a falling tower is ‘up’ until it is ‘down’.” A falling tower is not very useful for most customers.
- Service dashboards continue to rely on the underlying cloud service being online: The Azure Service Dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) has been experiencing very intermittent availability. Throughout this morning, I have had about a 25%-30% success rate of getting the dashboard to load. I’ve been informing providers frequently that service health systems and dashboards must be hosted independently from the provider’s cloud service. If the cloud service is down or degraded, customers had better be able to see the status at all times. I recently finished a lengthy document on evaluation criteria for public IaaS providers that will publish in the near future, and one of those criteria specifically states this as a requirement. If the service dashboard is the primary vessel by which cloud providers communicate outage updates, it must be up while the service is down.
- Customers can never get enough information during the outage from the provider: Looking back to 2011 and the AWS and Microsoft outages it became very clear that frequent status updates are paramount during an outage. AWS led the way with 30-45 min outage updates through their painful EBS outage and Ireland issues. While updates don’t solve the problem, they do demonstrate customer advocacy and concern. Some customers told me this morning they feel completely in the dark. There is no reason why a cloud provider should not have a dedicated communication team providing at least 30 min updates throughout the entire outage. Microsoft seems to be in a good cadence late this morning on more frequent updates, but there were large gaps in updates when the outage first occurred. More important in my opinion however, is a thorough post-mortem on the outage once the service has been restored. This should come within 3-4 days of the outage and must be very open and honest about the root cause, the fix, and the take-aways for the future. Providers please note, the world is very smart. If a provider even tries to mask or hide any of the details, it will come back to reflect negatively. Honesty wins.
- We all know outages are inevitabilities, but in the midst of one, pain is real: I’ve heard from some customers very impacted and as a result very frustrated and disappointed. When a cloud service has a good track record, we all admit that an outage will happen at some point. Yet, in the middle of an outage, emotion gets involved. Therefore, see point #5.
- Customer application design needs to continue to evolve: Similar to previous cloud outages, customer application design must continue to evolve to account for possible (some would say probable) cloud outages and issues. No cloud service is identical to another, and each has its own unique design and configuration options. Most cloud services have the concept of zones and regions from a geographical or hosting location standpoint. In most cloud outages, not every zone or region is affected. Therefore, the best-prepared applications will be those designed cross-zone and cross-region to avoid an outage or degradation in any one area. However, this comes with extreme complexity and increased cost, often 3x-10x the cost advertised by providers. If you will be running a critical application at a cloud provider, expect an outage, design for resiliency, and be prepared to pay for it. This may also mean that you have to hire or retain some very skilled cloud staff.
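To illustrate the first point above, here is a rough sketch of a status check that treats response time as part of health rather than looking at availability alone. Everything in it is an assumption for illustration: the `Probe` type, the thresholds, and the three status labels are hypothetical and not taken from any provider's SLA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    ok: bool                     # did the request succeed at all?
    latency_ms: Optional[float]  # response time; None when the request failed


def classify(probes: List[Probe],
             min_success_rate: float = 0.99,
             max_latency_ms: float = 2000.0) -> str:
    # Thresholds are illustrative defaults, not real SLA terms.
    if not probes:
        return "unknown"
    successes = [p for p in probes if p.ok]
    if not successes:
        return "down"
    success_rate = len(successes) / len(probes)
    slow = [p for p in successes
            if p.latency_ms is None or p.latency_ms > max_latency_ms]
    # "Up but unusable" counts as degraded: too many failures, or
    # more than half of the successful responses were too slow.
    if success_rate < min_success_rate or len(slow) > len(successes) / 2:
        return "degraded"
    return "up"
```

An availability-only check would call a service "up" as long as requests eventually succeed; a check like this one reports the falling tower as "degraded" while it is still falling. A probe like this is also only useful if it runs from outside the cloud service it is measuring.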
It is always a sad day as a cloud analyst to see these outages. However, it seems that significant change in the industry, at both a provider and customer level, only tends to come after an emergency.
I’d love your comments here. Let’s engage in a conversation.
Category: Cloud Microsoft Outage Providers Tags: Azure, Cloud, Microsoft, Outage
by Kyle Hilgendorf | February 3, 2012 | 9 Comments
I’m pretty vocal when it comes to challenging Cloud Service Providers (CSPs) regarding increasing the amount of public transparency they share with not only customers but with prospects. On a very regular basis, I take calls from Gartner clients about the challenges in evaluating CSPs and the frustration with the lack of published information that exists at most providers.
I’ve seen some CSPs make some very good strides lately in terms of improving websites and publishing architectural and security related information. One particular aspect where the industry has seen very little improvement is transparency with audits.
A common discussion for me at Gartner has centered on SAS 70 Type II audits, and now SSAE 16 / SOC 1 reports. The latter has replaced SAS 70 and having an SSAE 16 audit and SOC 1 report completed by an independent third party is table stakes for competing in the public cloud services market. There are many problems with the SSAE 16 audit, namely that CSPs still get to designate which control objectives an auditing agency verifies. If a CSP does a poor job at logical access security, they could choose not to have the third party audit them against that control. It seems unfair and a loophole. As such, customers actually do need to see the SOC 1 report and must sign a confidentiality agreement with each provider to do so. That does not scale well.
But why a confidentiality agreement? Why don’t CSPs simply publish their SOC 1 report online? I’ve spent the last month talking to a number of CSPs about this. I get the token response that it would divulge sensitive security configurations that if published would put the cloud service in jeopardy of being attacked/exposed. My response to that is, “Ok, but let’s get creative.” I have not been able to understand why a CSP cannot publish a summary report listing each of the controls that were audited and the relative findings for each objective. There is a stark difference in mentioning that a third party confirmed security surveillance cameras are in place versus actually listing each physical location of all individual cameras.
Well after having several in depth conversations with many providers, I believe our cross hairs need not focus on the CSPs as much as the auditing agencies. More than a few of the CSPs have apparently gone to their auditing agency and requested the right to publish the SOC 1 report publicly. All providers that have done this were denied that ability. The auditing agency holds the copyright to the report and the legal agreements of the audit restrict the CSP from publishing without auditor consent.
A few providers claim they have gone further and have asked the auditor if they can take portions of the report and publish them as an executive summary or FAQ to highlight for customers the controls and summarized results. Again, those providers were not able to obtain the rights to do so.
What are these auditing agencies / large consulting companies needing to hide? If they truly are independent, third parties, why can’t they stand behind their report publicly? If not the entire report, why not a summary of findings?
Providers are not 100% absolved of any responsibility here either. Even if the auditing agency refuses to release any information from the report, the provider should still publicly list the controls that the provider asked the auditor to look after. That would be a big step for many providers and would at least start to level-set the playing field for customer evaluations. Furthermore, the best CSPs will put more emphasis on obtaining ISO 27001 certification, which does provide a base standard for controls.
I would love to hear from you on this. Are you a customer that is tired of signing agreements simply to confirm controls? Are you a provider that wants to publish more information but are restricted by auditors? Are you an auditor that would like to have a deeper discussion? Please contact me.
Category: Cloud Evaluation Providers Tags: Audit, Cloud, Evaluation, SSAE 16
by Kyle Hilgendorf | December 16, 2011 | 2 Comments
I’ve focused a significant amount of effort in 2011 in assisting our clients through assessments of various cloud providers, namely at the IaaS level. The topic has been so popular in fact, that I presented an “Evaluating Cloud Providers” session at our Gartner Catalyst 2011 conference as well as a free Gartner webinar (which is available on replay).
We have several pieces of research in the works that we are excited will further assist customers in evaluating cloud providers in early 2012.
However, I would be remiss if I did not call attention to the fact that a very encouraging announcement was recently made by the Cloud Security Alliance. I’ve personally been an advocate for the CSA and the effort they’ve put into improving security standards within cloud computing. The recent announcement is in regards to a public cloud provider registry named STAR. The intent of STAR is to provide a publicly accessible registry where cloud providers publish the security controls that they offer in their service.
Most cloud providers in my recent experience have become quite good and open in sharing their security controls with prospective clients, but it is very time consuming for clients to hop from provider to provider, ask to see these controls, and document the controls for comparison. Furthermore, many of the providers still require a signed NDA with the client to share the controls.
My hope with STAR is that most providers opt in, as this is exactly the type of registry and knowledge sharing location that customers want. However, there is one potential risk. The CSA is a member-driven organization, and many of the public cloud providers are key members. There is a risk that the members will tune the security criteria over time to best match their capabilities. Yet I have faith that the consensus opinion of many providers (i.e. competitors) will triumph over collusion and we as Gartner will keep a close eye on this. It is a positive sign that the CSA does not require a cloud provider to be a CSA member in order to be listed in STAR. As a result, there really is no excuse for a cloud provider to not opt in. If you are a significant customer at a major cloud provider and you also believe in this, encourage your provider to participate.
This entire entry is my own personal opinion, not an official position from Gartner.
Category: Cloud CSA Evaluation Providers Tags: Cloud, CSA, Evaluation, Providers
by Kyle Hilgendorf | August 29, 2011 | 3 Comments
VMworld kicked off this week with a flurry of announcements and improvements and I wanted to highlight two of the important ones for my coverage area.
Global Connect – this new announcement is intriguing for enterprise customers. The goal is for vCloud Datacenter partners to develop global relationships with one another to provide customers true global coverage for vCloud hosting while only having to maintain a relationship with a vCloud Datacenter partner in their region. The announcement kicks off with Bluelock (US), Softbank (Japan), and Singtel (Singapore).
While this will take some time for the technology to be put in place and work effectively, my concerns are not around technology. The non-technical side of this announcement will prove to be extremely challenging. For each of the providers to establish legal agreements with one another in such a way that Customer X can deploy a workload to Bluelock and then later move it or copy it to Singtel or Softbank (or vice versa) will surely prove to be challenging. It is hard enough for Customer X to establish satisfactory terms and conditions with any single provider today. For a customer to be able to establish those terms and then for the provider to take those terms and pass on to other providers is daunting to consider.
Surely these providers will be motivated, but tracking the agreements will be an area to watch closely. Furthermore, customers will now have additional context to consider when negotiating agreements. Pay close attention to language around the mobility of workloads from provider to provider.
If VMware can successfully broker their providers through this Global Connect initiative, it squarely places vCloud Datacenter providers (or the ecosystem) in line to compete with Amazon Web Services from a global availability perspective, an area until this announcement where vCloud was getting beat badly.
vCloud Connector 1.5 - Earlier this year I published a research document named, Moving Applications to the Cloud: Finding Your Right Path. The document provides guidance around the process to move virtual machines from internal data centers to public cloud providers. In the document, I highlighted several current concerns with vCloud Director and vCloud Connector 1.0. At the document publication time there were issues such as no network transmission intelligence, restart protocol, and the fact that vCloud Connector had to temporarily make a copy of the OVF/VMDK and store it in a temporary holding area. All of these factors contributed to really poor VM mobility performance.
It is great to see that version 1.5 addresses some of these issues. vCloud Connector no longer requires a copy of the OVF/VMDK to a temporary location. Furthermore, a restartable protocol has now been introduced. Now a single network hiccup will not completely interrupt the transmission and require you to start over. Both advancements are welcome.
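To show why a restartable protocol matters, here is a minimal sketch of the general idea using local files: a copy that resumes from however many bytes the destination already holds instead of starting over. This is not vCloud Connector's actual protocol, which I have not seen documented at this level; the function name and chunk size are hypothetical, and the sketch assumes the partial destination is an intact prefix of the source.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value


def resumable_copy(src_path: str, dst_path: str,
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst, resuming after the bytes dst already contains.

    Returns the number of bytes written during this call.
    Assumes any existing dst content is an intact prefix of src.
    """
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what was already transferred
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written
```

If the transfer is interrupted, calling the function again picks up at the last byte written. A real implementation would also verify the already-transferred portion (for example with checksums) before trusting it.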
However, as noted in the document, the biggest problem in V2C Mobility is inconsistent/slow internet upload speeds from customers to vCloud Datacenter providers. I would still like to see network intelligence built into vCloud Director and Connector (e.g. compression, acceleration, de-dup). Adding these enhancements will really start to change the game for VM mobility performance and move the industry from a nice concept to reality. VMware’s stiff competitor, Citrix, is already aggressively innovating in this space with their Netscaler CloudBridge announcement earlier in the year. Perhaps VMware can build, acquire, or partner with someone to bring similar capabilities to vCloud.
I hope to see many of you at VMworld this week. It is always a great week and wonderful chance to learn, network, and gain perspective.
Category: Cloud Hybrid IaaS Mobility Providers vCloud VMware Tags: Availability, Mobility, V2C, vCloud, VMware
by Kyle Hilgendorf | August 23, 2011 | 1 Comment
I just finished two webinars last week on Evaluating Cloud Providers. Attendance was really fantastic and I hope our research has helped a variety of companies in their journey of evaluating cloud providers. Below are my thoughts from the sessions.
Category: Cloud Evaluation Hosting IaaS Providers Tags: Assessment, Cloud, Evaluation, Providers, Transparency, Vendor Management
by Kyle Hilgendorf | August 12, 2011 | 2 Comments
In my recently published Gartner IT1 guidance document, “Moving Applications to the Cloud: Finding Your Right Path”, I highlighted a major contributing factor to the failure of VM to cloud mobility being the Internet bottleneck between enterprise data centers and public cloud providers. In fact, for many large enterprise organizations, achieving uploads of 2-3 GB/hour across the Internet is a best-case scenario today.
I wanted to highlight two recent announcements in the industry that will hopefully help solve this bottleneck soon.
First, Citrix announced NetScaler Cloud Bridge at the Citrix Synergy conference in May. As the name aptly applies, the product aims to bridge connections between corporate data centers and cloud providers. It is a layer 2 network bridge that includes end-to-end security (IPSEC) and network performance optimization. Cloud Bridge accomplishes the network optimization through compression, packet optimizations, and de-duplication. Performance improvement data is not yet available, but I plan to follow up on that in the near future.
More recently, Amazon Web Services announced AWS Direct Connect, which allows organizations to establish a dedicated circuit between corporate infrastructure/applications and AWS at either 1 Gbps or 10 Gbps. AWS already has an intriguing Virtual to Cloud (V2C) tool (AWS VM Import) that works well for converting Windows Server 2008 VMware-based images to AWS AMIs, but adoption is slow due to Internet bandwidth bottlenecks. It simply hasn’t been practical to convert and move large VMs yet. By establishing a dedicated circuit to AWS, the bottleneck is largely erased for customers that are serious about moving large VMs or large amounts of data to AWS quickly.
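A quick back-of-the-envelope calculation shows the size of the gap. The sketch below compares the 2-3 GB/hour Internet figure with the 1 Gbps and 10 Gbps circuit speeds. The 100 GB VM size is a hypothetical example, and the circuit numbers assume the full line rate with no protocol overhead, so treat them as a theoretical best case.

```python
def link_rate_gb_per_hour(gbps: float, efficiency: float = 1.0) -> float:
    """Convert a link speed in gigabits/second to gigabytes/hour.

    efficiency is an assumed fraction of line rate actually achieved
    (1.0 means theoretical peak, which real transfers will not reach).
    """
    return gbps / 8 * 3600 * efficiency


def transfer_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to move size_gb at the given sustained rate."""
    return size_gb / rate_gb_per_hour


vm_size_gb = 100  # hypothetical VM image size

scenarios = [
    ("Internet at 2 GB/hour", 2.0),
    ("Internet at 3 GB/hour", 3.0),
    ("1 Gbps dedicated circuit", link_rate_gb_per_hour(1)),
    ("10 Gbps dedicated circuit", link_rate_gb_per_hour(10)),
]

for label, rate in scenarios:
    print(f"{label}: {transfer_hours(vm_size_gb, rate):.2f} hours")
```

Under these assumptions, the 100 GB image takes roughly 33 to 50 hours over the Internet, versus about 13 minutes at the theoretical peak of a 1 Gbps circuit. Even at a fraction of line rate, the dedicated circuit changes the conversation.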
Personally, I think both solutions will be very successful and have different use cases. Cloud Bridge is more versatile and will improve and secure connections to several cloud providers. AWS Direct Connect is for those customers highly dedicated to using AWS for serious data transfers. Both solutions will need some more time to bake, however. NetScaler Cloud Bridge will need time to proliferate into all the major cloud providers, as it will require a presence at each one. AWS Direct Connect is currently only available as a cross-connect from the Ashburn, VA Equinix facility.
We are approaching a “meet me in the middle” solution where most enterprises need a lot of on-premises hosting, some traditional managed hosting and setup (like dedicated circuits, design assistance, concierge-level support), and also some cloud (for workloads that need scalability or can handle elasticity). The hybrid cloud picture is coming together. Both of these announcements are positive improvements in supporting this concept.
I’m excited to see the adoption rate and early feedback on both solutions and am wondering what other players might get serious in this market soon. How about you?
Category: AWS Citrix Cloud Dedicated Hosting Hybrid IaaS Mobility Tags: AWS, Citrix, Cloud, Hosting, Hybrid
by Kyle Hilgendorf | July 21, 2011 | Comments Off
Chris Wolf and I will be participating in the first ever Catalyst Twitter Chat at the Gartner Catalyst conference next week in San Diego. I am thrilled to be part of this event and the topic is extremely popular right now. For those who know Chris, he has some of the best technical depth of anyone around in virtualization. I will be bringing my background on cloud service providers and their capabilities for conducting virtual to cloud migrations. In fact, my first Gartner IT1 research paper centers around this very topic. IT1 clients can access “Moving Applications to the Cloud: Finding Your Right Path” now.
If you will be at Catalyst next week, please stop by and participate face to face with us. If you are not at Catalyst or prefer to interact virtually, please engage with us via Twitter and submit questions per the guidelines below. We’re looking forward to a great week!
Analyst Chat invitation
@Gartner_inc will be hosting an “Analyst Chat” via Twitter during Gartner’s Catalyst Conference.
When: Wednesday, July 27th at 5:30pm PT.
Where: Aqua 302
Details: Join Chris Wolf (@cswolf) and Kyle Hilgendorf (@kylehilgendorf) to discuss VM and Cloud Mobility – What You Should Worry About. This discussion will cover:
• Important considerations: Converting the VM is the easy part!
• Architectural and product pitfalls that directly impact mobility options
• What options do current cloud providers and server virtualization vendors offer?
• What independent software options exist?
• How will this market mature and how long will it take?
Migrating VMs to and from the public cloud is not for the faint of heart. Yet organizations have compelling arguments for VM to cloud (IaaS) mobility. A market is emerging around cloud brokers and orchestrators, but it is very immature. What are the key aspects when considering moving VMs to or from IaaS cloud providers, or even between your own private data centers? Why isn’t it as simple as converting the VM file format?
To participate and track the conversation, Twitter users should key into #CAT11. We would encourage you to share comments, insights, and questions to help further discuss VM and cloud mobility. This chat will be open to all, and we look forward to seeing you on Twitter.
• Use #CAT11 to tap into the discussion.
• New question/topic every 10 minutes
• DM questions to @Gartner_inc without hashtag
Category: Catalyst Cloud Gartner IaaS Mobility Virtualization Tags: Cloud, Mobility, Virtualization, VM
by Kyle Hilgendorf | May 31, 2011 | 4 Comments
Citrix’s Project Olympus announcement last week at Synergy regarding commercial support for OpenStack officially signaled the beginning of an intriguing battle for the attention of enterprise cloud computing environments.
VMware is the industry behemoth in the data center with the vast majority of the server virtualization market. Their vCloud initiative over the past 2 years has been trying to retain enterprise clients by offering a private, hybrid, and public cloud thread that binds things together. Very few options existed for enterprise clients unless those clients wanted to venture away from VMware, try something on their own, or bleed the edge with innovative cloud service providers.
The OpenStack announcement, driven by Citrix, Rackspace, and Dell, is the counterargument to vCloud, and it has support from many companies. But which route will enterprises take?
I hear from many clients the growing concerns about staying 100% all-in with VMware and about the cost of VMware. Gartner advises clients to keep options open, yet at the same time, VMware has a proven track record, credibility, and the simplest migration story thus far. If we can be sure of anything, VMware-VMware will work well.
OpenStack looks to be the open source alternative. If OpenStack makes it easy to migrate and interoperate between vSphere in the data center and OpenStack, then I believe it is “game on”. I have no doubt that will be a major focus, and those involved have a vested interest in making it work well. Further evidence is that OpenStack supports vSphere as a back-end hypervisor and Citrix even wrote the integrations.
But, are too many companies involved in OpenStack? At what point does “groupthink” cripple innovation? Today everyone is excited, developing, partnering, and innovating. When will greedy hands start to mess things up? When will OpenStack “forks” develop and will we find ourselves in interoperability hell? Individual companies have a hard enough time providing backwards and forwards compatibility with their own products. How will a conglomerate of companies fare any better? Count me optimistically skeptical. I am hopeful, but very cautious. Will we see variants of OpenStack all over the place, similar to the state of Linux today? Will Citrix be the thread that holds it all together?
The world appears to be out to get VMware right now. That’s business. Kings of the hill are always the target, just ask Microsoft. Is OpenStack the legitimate threat? I am excited to watch this battle take place and am anxious to see how it unfolds. How do you think it will transpire?
Category: Citrix Cloud IaaS OpenStack Private Cloud vCloud VMware Tags: Citrix, Cloud, Iaas, Olympus, OpenStack, vCloud, VMware