It’s no surprise that we’ve been treated to many year-end lists and predictions for Hadoop (and everything else IT) in 2013. I’ve never been that much of a fan of those exercises, but I’ve been asked so much lately that I’ve succumbed. Herewith, the first of a series of posts on what I see as the 4 Ps of Hsdoop in the year ahead: performance, projects, platforms and players.
Performance concerns are inevitable as technologies move from early adopters, who are already tweaking everything they build as a matter of course, to mainstream firms, where the value of the investment is always expected to be validated in part by measuring and demonstrating performance superiority. It also becomes an issue when the 3rd or 4th project comes along with a workload profile different from those that came before – and it doesn’t perform as well as those heady first experiments. Getting it right with Hadoop is as much art as science today – the tools are primitive or nonexistent, the skills are more scarce than the tools, and experience – and therefore comparative measurement – is hard to come by.
What’s coming: newly buffed up versions of key management tools. It’s one method of differentiating distributions in a largely common set of software – Hortonworks doubling down on open source Apache Ambari, Cloudera enhancing Cloudera Manager, MapR’s updated Control System (as well as their continued touting of DIY favorites Nagios and Ganglia.) EMC, HP, IBM and other megavendors are continuing to instrument their existing, and familiar, enterprise tools to reach this exploding market. It will be a busy bazaar.
Resources are proliferating to help: published work like Eric Sammer’s Hadoop Operations (somewhat Cloudera-centric but very well organized and useful). A plethora of Slideshare presentations designed to help navigate the arcana of cluster optimization, workload management, configuration optimization, are appearing.
Performance has figured in a number of proof of concept (POC) tests pitting distributions against one another that I’ve heard about from Gartner clients. Some have been inconclusive; some have had clear winners. As we’ve seen in DBMS POCs over the years, your data and your workloads matter, and your results may differ from others’. I’ve seen replacements of “first distributions” by another, as performance or differing functionality comes to the fore. I’ve even seen a case where a Cassandra-based alternative won out over the Hadoop distributions.
Next time: projects proliferate.
Category: hadoop hbase hdfs mapreduce sqoop big-data cloudera emc hortonworks ibm biginsights
Tags: apache flume hadoop hbase hdfs hive mapreduce pig sqoop zookeeper biginsights cloudera emc hortonworks ibm mapr
Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.