On Monday VMware announced the release of vSphere 4.1. VMware published a document describing the new features; I also recommend VMware CTO Steve Herrod’s post and Eric Siebert’s great vSphere 4.1 links post. I was planning to link to several good perspectives on the vSphere 4.1 release, but Siebert’s “links” post has them all and I suggest you take a look.
Granted, I’m a little late to the vSphere 4.1 discussion, but I thought I’d add my two cents nonetheless.
First, the release doesn’t feel like a “.1” update. The feature additions and performance enhancements are fairly significant. Of course, product version numbers are subjective anyway.
Rather than summarize the new features, I thought it would be best to post a few thoughts about each of the major new additions.
Scalability
vSphere 4.1 offers scalability to 3,000 VMs per cluster, 10,000 online VMs (up to 15,000 registered VMs) per vCenter management server, and up to 30,000 VMs managed across vCenter servers in Linked Mode. Cluster scalability to 32 physical nodes, which was introduced in vSphere 4.0, also remains. Oftentimes vendor scalability numbers are more about marketing bragging rights than practical implementations. Many storage vendors, for example, require a professional services investment before allowing an organization to scale a cluster beyond as few as 8 nodes. It’s good that VMware will support 32-node scalability, but what are its partners saying? It’s time for VMware’s two VCE coalition partners (i.e., Cisco and EMC), at a minimum, to step up and offer clear support statements. Reference architectures at this scale (and associated support statements) would be ideal. Let’s hear from other storage and server partners while we’re at it. NetApp? Hitachi? HP? IBM? Fujitsu? Dell?
There are efficiency gains to be had with larger cluster deployments and I know of many clients (not just service providers) who would like to scale their physical clusters to larger numbers, but more guidance is needed from all vendors in the virtualization stack. Kudos to VMware for leading at the hypervisor layer, but their hardware vendors need to join the conversation. Published reference architectures would not only add credibility to VMware’s scalability claims, but also give our clients information they need to scale their own virtual infrastructures.
Storage and Network I/O Control
At the vSphere 4.0 launch in April 2009, I mentioned to Steve Herrod that I planned to ding VMware on the remaining limitations of the Distributed Resource Scheduler (DRS) service, specifically the lack of I/O accounting (you can read my 2009 post on the subject here). To Steve’s credit, he said “Go ahead. We deserve it.” At the time he knew that I was echoing concerns expressed to me by various VMware customers further along in their maturity, so VMware was getting this feedback from multiple sources. With that context, vSphere 4.1’s Storage and Network I/O Control features are a big step toward intelligent workload balancing and quality of service (QoS) guarantees that extend to both storage and network I/O. For more information on the Storage I/O Control feature, take a look at Duncan Epping’s post on the subject. The VMware paper “VMware Network I/O Control: Architecture, Performance, and Best Practices” does a very good job describing the Network I/O Control feature.
At this point, Storage and Network I/O Control are a big step in the right direction. VMs are better equipped to get the storage and network I/O they require when they need it. Network I/O Control is especially useful for prioritizing different traffic types (e.g., VM, IP storage, or vMotion). What I really like about Storage I/O Control is that it’s configured at the datastore level. This means that Storage I/O Control is enforced across all VMs accessing a datastore (e.g., a LUN), regardless of which physical host they reside on.
To be clear, both Storage and Network I/O Control are a good first step, but they are not yet complete. Both technologies rely on a shares algorithm, meaning a VM’s access is a proportional slice of overall resource availability. This is not the same as guaranteeing a specific amount of network or storage throughput to a given VM. Also, neither feature integrates with VMware’s DRS service, so when DRS decides where to place a VM, only available CPU and memory are considered. Ideally, DRS will evolve to consider network and storage I/O requirements, along with non-technical criteria such as security zoning requirements, when determining the best host for a VM.
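To make the distinction concrete, here is a minimal sketch (purely illustrative, not VMware’s actual implementation) of how a shares-based scheme behaves. A VM’s absolute throughput depends on how many peers are contending, which is exactly why shares are not a throughput guarantee:

```python
# Illustrative proportional-share allocation (a sketch, not ESXi's
# real scheduler): each VM receives bandwidth in proportion to its
# share count, so its absolute throughput varies with contention.

def allocate_by_shares(total_capacity, shares_by_vm):
    """Divide total_capacity among VMs in proportion to their shares."""
    total_shares = sum(shares_by_vm.values())
    return {vm: total_capacity * s / total_shares
            for vm, s in shares_by_vm.items()}

# With two equal-share VMs on a 1000 MB/s datastore, each gets half...
light = allocate_by_shares(1000, {"vm-a": 1000, "vm-b": 1000})
# ...but the same 1000 shares yields less once more VMs contend.
heavy = allocate_by_shares(1000, {"vm-a": 1000, "vm-b": 1000,
                                  "vm-c": 1000, "vm-d": 1000})
```

The same share value produces different absolute throughput in the two scenarios, which is the gap between shares and a true QoS guarantee.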
You have to walk before you can run, and VMware did a very nice job with the Network and Storage I/O Control implementations. Let’s hope that we’re even closer to true QoS by the next release.
Active Directory Integration
VMware’s vCenter management server has offered Active Directory authentication for some time, but many of our clients have continually asked VMware to extend AD authentication to the individual hypervisor level. Prior to vSphere 4.1, administrators could log in to ESX or ESXi hypervisors only with a local account (typically root). Organizations that wanted tighter security at the individual hypervisor level required a third-party add-on such as HyTrust. While hypervisor-level AD authentication was a key value-add for HyTrust, its other key features, such as security policy enforcement and audit logging, should ensure its longevity.
Simultaneous Live Migrations
Simultaneous live migration capabilities were significantly expanded in the 4.1 release as well. With 10 GbE network connectivity, up to 8 simultaneous live migration jobs are supported. Up to 4 concurrent live migration jobs are possible over 1 GbE. This capability is a big deal because live migrations are the key enabler for non-disruptive scheduled hardware maintenance. Consider a physical server that hosts 32 VMs. Clearly, live migrating 8 VMs at a time will take considerably less time than live migrating 1 VM at a time (a limitation of other hypervisors). Network capacity, along with available CPU resources, will determine how quickly the live migrations actually complete. Nonetheless, improvements in simultaneous live migration capabilities give VMware a decided advantage over competitors in this area.
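A quick back-of-the-envelope sketch (hypothetical numbers, ignoring per-VM transfer-time variance) shows why the higher concurrency limit matters when evacuating a host:

```python
import math

def migration_rounds(vm_count, concurrent_limit):
    """Sequential rounds needed to evacuate a host, assuming
    migrations are simply batched at the concurrency limit."""
    return math.ceil(vm_count / concurrent_limit)

# 32 VMs: 4 rounds at 8 concurrent (10 GbE), 8 rounds at 4
# concurrent (1 GbE), versus 32 rounds migrating one at a time.
```

This is a simplification, since in practice rounds overlap and individual migrations vary in duration, but the rough ratio explains why maintenance windows shrink so dramatically.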
Memory Compression
Memory compression is another key vSphere 4.1 feature. Eric Sloof does a nice job describing the feature here. Memory compression works by compressing overcommitted memory pages and storing them in physical RAM. The idea is that compressed memory kept in physical RAM will outperform overcommitted memory paged to disk. VMware claims performance gains of up to 1,000x over swapping memory to disk. At this point I’m intrigued by the feature and am eager to hear the results our clients are seeing in their labs.
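As a toy illustration of the underlying trade-off (this is not ESXi’s actual compression scheme), compressing a page costs some CPU but keeps the data in RAM, where it can be recovered orders of magnitude faster than a disk read:

```python
import zlib

PAGE_SIZE = 4096  # a typical guest page size

# Toy sketch of the memory-compression trade-off (not ESXi's real
# algorithm): a page that compresses to fit in a smaller RAM slot
# can be decompressed on access instead of being read back from disk.
page = b"A" * PAGE_SIZE  # a highly compressible page, the best case
compressed = zlib.compress(page)

assert len(compressed) < PAGE_SIZE          # worth keeping in RAM
assert zlib.decompress(compressed) == page  # lossless round trip
```

Real guest pages compress far less evenly than this best-case buffer, which is presumably why the hypervisor only keeps pages that compress well enough to be worth the RAM they occupy.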
Last Release with ESX Architecture
One last item of importance: VMware announced that vSphere 4.1 is the last release of the ESX hypervisor. Moving forward, only the ESXi hypervisor will be shipped. Many of our clients ran Linux management agents and custom shell scripts on the Red Hat Enterprise Linux (RHEL)-based service console that was included with the ESX hypervisor. If you have ESX tools or agents that still rely on the RHEL console OS, it’s time to start planning your migration away from those tools. Once you upgrade to the next (post-4.1) version of vSphere, all tools, scripts, and agents written to execute in the ESX service console will be left behind. VMware has been telling customers for several years that this change was coming, so hopefully you have been preparing for the end of life of the ESX console. If you are deploying new VMware virtual infrastructures, you should only be deploying the ESXi hypervisor at this point. The ESX hypervisor should only be deployed where absolutely required (e.g., to support a third-party management product).