Many IT organizations today are being asked to do more with less: budgets are being reduced, and data center expansion projects are being curtailed or cancelled altogether. Faced with the harsh realities of a difficult economic climate, data center managers will need to focus on creating the most efficient operating environments possible in order to extend the life of existing data centers. These efficiencies can be gained through many avenues: increasing compute densities, creating cold aisle containment systems, making more effective use of outside air. But the key component over time will be having an easily understood metric to gauge just how efficient the data center is, and how much efficiency has been gained on an ongoing basis.
What’s the issue? With increased awareness of the environmental impact data centers can have, there has been a flurry of activity around the need for a data center efficiency metric. Many of those proposed, including PUE and DCiE, attempt to map a direct relationship between total facility power delivered and IT equipment power used. While these metrics provide a high-level benchmark for comparison between data centers, they do not provide any criteria for showing incremental improvements in efficiency over time. Most importantly, they do not allow for monitoring the effective use of the power supplied; they capture only the difference between power supplied and power consumed. There is no negative or positive effect from utilizing the compute resources at hand more efficiently: a data center with a PUE of 2.0, running all of its x86 servers at 5% utilization, could have the same PUE as another data center running the same number of servers at 50% utilization, effectively producing 10 times the compute capacity of the first.
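That blind spot can be made concrete with a small sketch. The figures below are illustrative assumptions, not measurements, but they show how two sites with an identical PUE can differ tenfold in useful output:

```python
# Illustrative sketch (assumed numbers): two data centers with identical PUE
# but very different effective compute output for the same power.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power delivered / IT equipment power used."""
    return total_facility_kw / it_equipment_kw

# Both sites: 1,000 kW delivered to the facility, 500 kW reaching IT gear.
site_a_pue = pue(1000, 500)  # servers averaging 5% utilization
site_b_pue = pue(1000, 500)  # same servers averaging 50% utilization

print(site_a_pue, site_b_pue)  # 2.0 2.0 -- the metric cannot tell them apart

# Yet site B delivers roughly 10x the useful compute for the same power:
relative_compute_b_vs_a = 0.50 / 0.05
print(relative_compute_b_vs_a)  # 10.0
```

PUE answers "how much of the power reaches the IT equipment?" but never "what is that equipment doing with it?", which is exactly the gap the rest of this note addresses.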
The PPE Metric: A more effective way to look at energy consumption is to analyze the effective use of power by existing IT equipment, relative to the performance of that equipment. This may sound intuitively obvious (who wouldn’t want more efficient IT?), but the numbers are striking: a typical x86 server will consume between 60% and 70% of its total power load even when running at low utilization levels. Raising utilization levels has only a nominal impact on power consumed, yet a significant impact on effective performance per kilowatt. Pushing IT resources toward higher effective performance per kilowatt can have a twofold effect: improving energy consumption (putting energy to work) and extending the life of existing assets through increased throughput.
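A quick back-of-the-envelope calculation illustrates the leverage. The server wattages and utilization figures here are assumed for illustration (a hypothetical 400 W server), consistent with the 60%-70% idle draw cited above:

```python
# Hedged sketch with assumed figures: an x86 server drawing ~65% of its
# nameplate power at low utilization, and only modestly more when busy.

def performance_per_kw(utilization: float, power_kw: float) -> float:
    """Relative compute output (utilization as a proxy) per kilowatt drawn."""
    return utilization / power_kw

NAMEPLATE_KW = 0.400  # assumed 400 W server

low  = performance_per_kw(0.10, 0.65 * NAMEPLATE_KW)  # ~10% busy, drawing 260 W
high = performance_per_kw(0.60, 0.85 * NAMEPLATE_KW)  # ~60% busy, drawing 340 W

print(round(high / low, 1))  # 4.6 -- power rose ~31%, performance per kW ~4.6x
```

Under these assumptions, driving utilization from 10% to 60% raises power draw by under a third while multiplying useful work per kilowatt more than fourfold.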
If major IT assets were evaluated in this manner, it would become clear that not only can more efficient environments be created, but that individual asset utilization levels can be increased, effectively improving the performance per square foot within the data center and potentially deferring the construction of a new data center.
At Gartner we have created a metric to help demonstrate this effect. Called the Power to Performance Effectiveness (PPE) metric, it was developed to help identify, at the device level, where efficiencies could be gained. Unlike other metrics, PPE does not compare actual performance to hypothetical maximums; rather, it is designed to allow users to define their own optimal maximum performance levels and then compare average performance against that optimum.
There are three critical components that come into play, only one of which is outside the primary control of IT: rack density levels, server utilization levels, and energy consumption. Rack density levels are usually mandated by IT management and, as often as not, are defined based on power levels and the potential heat load that specific rack densities might generate. In a typical data center today, rack densities of 50%-60% are very common, yielding an average of 25 1U server slots per rack. Server utilization, especially in x86 environments, is often at the low end of the range, averaging between 7% and 15% in many organizations today. One of the key drivers for virtualization has been to improve these levels, driving servers up toward 60%-70% average utilization. Driving servers to higher utilization levels does not dramatically increase power consumption, and PPE is designed to capture that as well. Optimal power can therefore be defined not against total compute output, but against realistic compute output, compared to the energy used.
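The idea can be sketched in a few lines. This is not Gartner's published PPE formula (see the forthcoming research note for that); it is a rough illustration, with assumed numbers, of comparing observed performance per kilowatt against a user-defined optimum built from the three components named above:

```python
# Rough sketch of the idea behind PPE (not the published formula):
# compare observed effectiveness against a user-defined realistic optimum.

def effectiveness(util: float, density: float, power_kw: float) -> float:
    """Relative useful work per kW: utilization x rack fill, over energy drawn."""
    return (util * density) / power_kw

# Assumed current state: 50% rack fill, 10% server utilization, 8 kW per rack.
actual = effectiveness(util=0.10, density=0.50, power_kw=8.0)

# User-defined optimum: 80% fill, 60% utilization (virtualized), 11 kW per rack.
optimal = effectiveness(util=0.60, density=0.80, power_kw=11.0)

ppe_like = actual / optimal
print(round(ppe_like, 2))  # 0.14 -- far below 1.0, i.e. large headroom in place
```

A ratio well under 1.0 signals that the existing footprint can absorb substantial growth before new power, cooling, or floor space is needed, which is the decision the next paragraph turns to.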
Since PPE looks at not only the energy required but also asset utilization levels, the results quickly point out the percentage of growth available within an existing configuration. With a combination of higher virtualization levels and increased rack densities, it’s likely most rack environments will support existing growth rates for quite some time. And yes, we must assume that both power and cooling are available to support these higher densities. If not, an analysis of the cost of adding power and cooling versus the cost of building out a new data center might in fact change the overall decision-making process.
Bottom Line: PPE is not the end-all of power and performance monitoring, but it was designed to give IT managers a view of performance levels within their data centers and a means to compare that performance to realistic potential (optimal) performance levels, rather than just a hypothetical maximum. Using PPE on an ongoing basis will yield a clear view of how power use and performance are changing over time, and how an organization’s overall data center efficiency is improving. A detailed review of PPE will be published on the Gartner research site shortly.