Chris Wolf

A member of the Gartner Blog Network

Research VP
6 years at Gartner
19 years IT industry

Chris Wolf is a Research Vice President for the Gartner for Technical Professionals research team. He covers server and client virtualization and private cloud computing.

VMware vSphere 4.1: Not the Typical .1 Release

by Chris Wolf  |  July 16, 2010  |  3 Comments

On Monday VMware announced the release of vSphere 4.1. VMware published a document describing the new features, and I also recommend checking out VMware CTO Steve Herrod’s post as well as Eric Siebert’s great vSphere 4.1 links post. I had planned to link to several good perspectives on the vSphere 4.1 release myself, but Siebert’s “links” post has them all, and I suggest you take a look.

Granted, I’m a little late to the vSphere 4.1 discussion, but I thought I’d add my two cents nonetheless.

First, the release doesn’t feel like a “.1” update. The feature additions and performance enhancements are fairly significant. Of course, version numbers are subjective anyway.

Rather than summarize the new features, I thought it would be best to post a few thoughts about each of the major new additions.

Scalability

vSphere 4.1 offers scalability to 3,000 VMs per cluster, 10,000 online VMs (up to 15,000 registered VMs) per vCenter management server, and up to 30,000 VMs managed across vCenter servers in Linked Mode. Cluster scalability to 32 physical nodes, introduced in vSphere 4.0, also remains. Oftentimes vendor scalability numbers are more about marketing bragging rights than practical implementations. Many storage vendors, for example, require a professional services engagement before they will allow an organization to scale a cluster beyond as few as 8 nodes. It’s good that VMware will support 32-node scalability, but what are its partners saying? It’s time for VMware’s two VCE coalition partners (i.e., Cisco and EMC), at a minimum, to step up and offer support clarity. Reference architectures at this scale (and associated support statements) would be ideal. Let’s hear from other storage and server partners while we’re at it. NetApp? Hitachi? HP? IBM? Fujitsu? Dell?

There are efficiency gains to be had with larger cluster deployments, and I know of many clients (not just service providers) who would like to scale their physical clusters to larger numbers, but more guidance is needed from every vendor in the virtualization stack. Kudos to VMware for leading at the hypervisor layer, but its hardware partners need to join the conversation. Published reference architectures would not only add credibility to VMware’s scalability claims, but also give our clients the information they need to scale their own virtual infrastructures.

Storage and Network I/O Control

At the vSphere 4.0 launch in April 2009, I mentioned to Steve Herrod that I planned to ding VMware on the remaining limitations of the Distributed Resource Scheduler (DRS) service – specifically the lack of I/O accounting (you can read my 2009 post on the subject here). To Steve’s credit, he said “Go ahead. We deserve it.” At the time he knew that I was echoing concerns expressed to me by various VMware customers further along in their virtualization maturity, so VMware was getting this feedback from multiple sources. That being said, vSphere 4.1’s Storage and Network I/O Control features are a big step toward intelligent workload balancing and quality of service (QoS) guarantees that extend to both storage and network I/O. For more information on the Storage I/O Control feature, take a look at Duncan Epping’s post on the subject. The VMware paper “VMware Network I/O Control: Architecture, Performance, and Best Practices” does a very good job describing the Network I/O Control feature.

At this point, Storage and Network I/O Control are a big step in the right direction. VMware VMs are better equipped to get the storage and network I/O they require when they need it. Network I/O Control is especially useful for prioritizing different traffic types (e.g., VM, IP storage, or vMotion). What I really like about Storage I/O Control is that it’s configured at the datastore level. This means that Storage I/O Control is enforced across all VMs accessing a datastore (e.g., a LUN), regardless of which physical host they reside on.

To be clear, both Storage and Network I/O Control are a good first step, but they are not yet complete. Both technologies rely on a shares algorithm, meaning that access is a percentage of overall resource availability; this is not the same as guaranteeing a specific amount of network or storage throughput to a given VM (the sketch below shows why). Also, neither feature integrates with VMware’s DRS service, so when DRS decides where to place a VM, only available CPU and memory are considered. Ideally, DRS will evolve to consider network and storage I/O requirements, along with non-technical criteria such as security zoning requirements, when determining the best host for a VM.
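To make the distinction concrete, here is a minimal Python sketch of how a proportional-share scheme behaves. The VM names, share values, and the 400 MB/s datastore throughput figure are hypothetical, and this is a simplification for illustration, not VMware’s actual implementation:

```python
# Minimal sketch of proportional-share (shares-based) allocation.
# All names and numbers are hypothetical illustration values.

def allocate(total_mbps, shares_by_vm):
    """Split available throughput among active VMs in proportion to shares."""
    total_shares = sum(shares_by_vm.values())
    return {vm: total_mbps * s / total_shares for vm, s in shares_by_vm.items()}

datastore_mbps = 400  # hypothetical aggregate throughput of one datastore

# Two active VMs: the high-shares VM gets 300 MB/s.
print(allocate(datastore_mbps, {"db01": 1500, "web01": 500}))

# Two more VMs become active: the same VM now gets only 200 MB/s,
# even though its shares setting never changed.
print(allocate(datastore_mbps, {"db01": 1500, "web01": 500,
                                "web02": 500, "batch01": 500}))
```

The high-shares VM’s absolute throughput shrinks as the datastore gets busier, which is exactly why shares alone cannot serve as a throughput guarantee.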

You have to walk before you can run, and VMware did a very nice job with the Network and Storage I/O Control implementations. Let’s hope that we’re even closer to true QoS by the next release.

AD Authentication

VMware’s vCenter management server has offered Active Directory authentication for some time, but many of our clients have continually asked VMware to extend AD authentication to the individual hypervisor level. Prior to vSphere 4.1, administrators could only log in to ESX or ESXi hypervisors using a local account (typically root). Organizations that wanted tighter security at the individual hypervisor required a third-party add-on such as HyTrust. While hypervisor-level AD authentication was a key value-add for HyTrust, its other key features, such as security policy enforcement and audit logging, will ensure its longevity.
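For those planning to adopt the new capability, here is a hedged sketch of joining an ESXi 4.1 host to a domain from a management workstation by driving the vSphere CLI’s vicfg-authconfig command from Python. The hostname, domain name, and account names are hypothetical, and the option names should be verified against the vSphere CLI 4.1 documentation before use:

```python
# Hedged sketch: join an ESXi 4.1 host to Active Directory remotely via
# the vSphere CLI's vicfg-authconfig. Host, domain, and accounts below
# are hypothetical; verify option names against the vCLI documentation.
import getpass
import subprocess

host = "esx01.example.com"    # hypothetical ESXi host
domain = "corp.example.com"   # hypothetical AD domain

cmd = [
    "vicfg-authconfig",
    "--server", host,
    "--username", "root",
    "--password", getpass.getpass(f"root password for {host}: "),
    "--authscheme", "AD",
    "--joindomain", domain,
    "--adusername", "svc-esxjoin",  # hypothetical AD account with join rights
    "--adpassword", getpass.getpass("AD join account password: "),
]
subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```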

Simultaneous Live Migrations

Simultaneous live migration capabilities were significantly expanded in the 4.1 release as well. With 10 GbE network connectivity, up to 8 simultaneous live migration jobs are supported; up to 4 concurrent jobs are possible over 1 GbE. This capability is a big deal because live migrations are the key enabler for non-disruptive scheduled hardware maintenance. Consider a physical server that hosts 32 VMs: live migrating 8 VMs at a time will take considerably less time than live migrating 1 VM at a time (a limitation of other hypervisors). Network capacity, along with available CPU resources, will determine how quickly the live migrations actually occur. Nonetheless, the improvements in simultaneous live migration capabilities give VMware a decided advantage over competitors in this area.
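Some rough arithmetic shows why the concurrency matters: a single migration stream often cannot fill a 10 GbE link by itself. The per-stream ceiling, per-VM memory footprint, and link speed below are hypothetical illustration values; real migration times also depend on page dirtying rates and available CPU:

```python
# Back-of-envelope host evacuation times. All figures are hypothetical
# illustration values, not VMware specifications.

vms_on_host = 32
vm_memory_gb = 4     # hypothetical average VM memory footprint
link_gbps = 10       # 10 GbE vMotion network
per_vm_gbps = 2      # hypothetical ceiling for a single migration stream

def evacuation_secs(concurrent):
    # Aggregate throughput is capped by the link or by what the
    # concurrent streams can push, whichever is lower.
    aggregate_gbps = min(link_gbps, concurrent * per_vm_gbps)
    total_gbits = vms_on_host * vm_memory_gb * 8
    return total_gbits / aggregate_gbps

print(f"1 at a time: {evacuation_secs(1):.0f} s")  # ~512 s
print(f"8 at a time: {evacuation_secs(8):.0f} s")  # ~102 s
```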

Memory Compression

Memory compression is another key vSphere 4.1 feature. Eric Sloof does a nice job describing the feature here. Memory compression works by compressing overcommitted memory pages and storing them in physical RAM. The idea is that compressed memory held in physical RAM will outperform overcommitted memory paged to disk; VMware claims performance up to 1,000 times better than swapping memory to disk. At this point I’m intrigued by the feature and am eager to hear the results our clients are seeing in their labs.
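Order-of-magnitude arithmetic makes a claim of that scale plausible. The latency figures below are hypothetical round numbers, not VMware measurements:

```python
# Why reclaiming a page from a compressed RAM cache can beat paging it
# in from disk. Both latency figures are hypothetical round numbers.

disk_page_in_us = 8000   # ~8 ms to seek and read a 4 KB page from disk
ram_decompress_us = 20   # decompress a compressed page already held in RAM

print(f"~{disk_page_in_us / ram_decompress_us:.0f}x faster from compressed RAM")

# The trade-off: a page compressed 2:1 still occupies half a page of
# physical RAM, so the feature spends memory to avoid disk latency.
```

Even with conservative figures the gap is two to three orders of magnitude, which is the neighborhood VMware is claiming.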

Last Release with ESX Architecture

One last item of importance with the vSphere 4.1 release: VMware announced that this is the last release of the ESX hypervisor. Moving forward, only the ESXi hypervisor will be shipped. Many of our clients ran Linux management agents and custom shell scripts on the Red Hat Enterprise Linux (RHEL)-based service console included with the ESX hypervisor. If you have ESX tools or agents that still rely on the RHEL console OS, it’s time to start planning your migration away from them. Once you upgrade to the next (post-4.1) version of vSphere, all tools, scripts, and agents written to execute in the ESX service console will be left behind. VMware has been telling customers for several years that this change was coming, so hopefully you have been preparing for the end of life of the ESX console. If you are deploying new VMware virtual infrastructures, you should only be deploying the ESXi hypervisor at this point; deploy the ESX hypervisor only where absolutely required (e.g., to support a third-party management product).
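As a starting point for that migration, here is a hedged sketch of running a former console task remotely: listing a host’s registered VMs from a management workstation by calling the vSphere CLI’s vmware-cmd from Python. The host and credentials are hypothetical, and the flags should be checked against the vSphere CLI documentation:

```python
# Hedged sketch: replace a service-console script with a remote call to
# the vSphere CLI's vmware-cmd. Host and credentials are hypothetical;
# verify the flags against the vCLI documentation before relying on this.
import getpass
import subprocess

host = "esx01.example.com"   # hypothetical ESXi host
password = getpass.getpass(f"Password for root@{host}: ")

# List registered VMs remotely (formerly run inside the ESX console).
result = subprocess.run(
    ["vmware-cmd", "-H", host, "-U", "root", "-P", password, "-l"],
    capture_output=True, text=True, check=True,
)
for vmx_path in result.stdout.splitlines():
    print(vmx_path)
```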

Category: Cloud, Server Virtualization, Virtualization

3 responses so far ↓

  • 1 jp   July 16, 2010 at 11:35 pm

    I just hope the scripted install capabilities get better. I have tested ESXi 4.1 server deployment kickstart scripts and am quite disappointed. I’m also befuddled by the fact that I cannot seem to create two network portgroups on different VLANs with different gateways on a single vSwitch – wth?? That is just fubar’ed and does not work for me… Also, as a longtime Linux guy, I feel a bit screwed over, as the server deployment capabilities are obviously not yet up to speed for my taste, so perhaps I will ride out ESX with the full service console for now, as the debugging of the scripting really sucks too. I’m sad VMware did this to their admins. I hope it does not cost them in the long run… :(

  • 2 Gareth James   July 20, 2010 at 11:49 pm

    VMware ALERT: VMware View Composer 2.0.x is not supported in a vSphere vCenter Server 4.1

    http://blogs.vmware.com/kb/2010/07/vmware-alert-vmware-view-composer-20x-is-not-supported-in-a-vsphere-vcenter-server-41.html

    This is the second time in as many releases that VMware’s own hypervisor doesn’t support their own VDI offering!

  • 3 Chris Wolf   July 21, 2010 at 10:35 am

    Hi Gareth,

    Thanks for posting this tip. You make a good point, but I also think it’s fair to point out that most large software vendors have similar issues when it comes to support across all of their product lines. In many cases the vendor’s ecosystem products (e.g., View, CapacityIQ) will lag in support for the platform. Once the QA process completes on the GA platform release, the updates required for other products to support the newer platform are made available. Many OS vendors have historically followed a similar pattern.

    Regardless, your point is well taken. This speaks to why good change control is so important. Organizations should not upgrade a major platform (such as the virtualization hypervisor) until they are assured that all products that integrate with the platform are supported.