Gartner Blog Network

Supported Hadoop Stack Continues Expansion

by Merv Adrian  |  December 24, 2015

For the past year and a half I’ve been tracking the path from 6 broadly supported (4 or more distributors) “Hadoop” projects in 2012 to 15 in June 2014, and now 17 in December 2015. Expansion continues. Clarity? Not so much. As I said in “Now, What is Hadoop?”:

You will find that you have to dig to find answers to the obvious question “If I pay for a support subscription, what will be supported?” “Support” in this analysis means if you pay for a subscription, that explicitly includes support for the named project.

The chart here is based on conversations with and/or web documentation from Amazon, Cloudera, Hortonworks, IBM, MapR, and Pivotal. Public documentation of distribution contents mostly remains incomplete, though IBM’s page does a good job.

So: what is “broadly supported” Hadoop in December 2015? The Apache Hadoop web site still names Hadoop Common, Hadoop Distributed File System (HDFS™), Hadoop YARN and Hadoop MapReduce, and gives them a common release number. I leave Common out and call that 3 projects.

Other projects supported by all the vendors include HBase, Hive, Oozie, Parquet, Pig, Spark, and ZooKeeper – for a total of 10 projects supported by all.

(Spark has SQL, Streaming, graph, ML and time series libraries. Support varies; ask your vendor.)

Flume, Hue, Solr, and Sqoop are supported by 5. That gets us to 14 projects.

Avro, Kafka and Mahout have 4 supporters. And now we’re up to 17 projects.
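For readers who want to check the arithmetic, here is a minimal sketch of the tally in Python. It assumes that "supported by all" means all six distributors named above, and it counts core Hadoop as three projects with Common left out, as in the text. The names `SUPPORTERS`, `running_totals`, and `broadly_supported` are hypothetical, introduced only for illustration; the chart itself remains the authoritative source.

```python
from collections import Counter

# Number of distributors (out of the six surveyed) that support each project,
# as stated in the post. Assumption: "supported by all" means all six.
# Core Hadoop is counted as three projects; Hadoop Common is left out.
SUPPORTERS = {
    "HDFS": 6, "YARN": 6, "MapReduce": 6,
    "HBase": 6, "Hive": 6, "Oozie": 6, "Parquet": 6,
    "Pig": 6, "Spark": 6, "ZooKeeper": 6,
    "Flume": 5, "Hue": 5, "Solr": 5, "Sqoop": 5,
    "Avro": 4, "Kafka": 4, "Mahout": 4,
}


def running_totals(supporters):
    """Map each supporter count to the cumulative number of projects
    that have at least that many supporters."""
    by_count = Counter(supporters.values())
    total = 0
    totals = {}
    for n in sorted(by_count, reverse=True):
        total += by_count[n]
        totals[n] = total
    return totals


def broadly_supported(supporters, threshold=4):
    """Return the projects backed by at least `threshold` distributors."""
    return sorted(name for name, n in supporters.items() if n >= threshold)


totals = running_totals(SUPPORTERS)
print(totals)  # {6: 10, 5: 14, 4: 17}
```

The running totals reproduce the sequence in the text: 10 projects supported by all, 14 with 5 or more supporters, and 17 with 4 or more.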

[Chart: Hadoop projects supported by each distributor, December 2015]

What happens when we drop below 4 supporters? We get to differentiation – places where the distributors are providing “their own” SQL interface, or security/governance stack, or management console. It’s not obvious, though, because many are not named “[distributor] ProjectX” but “Apache ProjectX.” More on that in the next post in this year-end series.


