Gartner Blog Network

Hadoop Tracker – March 2017

by Merv Adrian  |  March 16, 2017

Stack expansion has ground to a halt. The last time an Apache project was added to the list of those most supported by leading Hadoop distribution vendors was July 2016, when Kafka joined the other 14 then commonly included. Since then, no broad support for new projects has emerged.

As of March 2017, Google Cloud Dataproc has been added to the tracker, and the number of projects supported by 5 or 6 of the (now) 6 distributors has dropped back to 14. Google supports 6.

Another addition this time is a row at the top of the tracker counting the identifiable, directly used projects that the vendors list. Some, like Apache Calcite, are used primarily by other components rather than by users directly, so they are not listed.

Coming soon: what differentiates the players’ packages is what they do not have in common. Which pieces are those?

Updated twice 3/17
Cloudera Kafka version corrected, Amazon EMR HBase version corrected (note also that Amazon’s includes S3 support), Hortonworks Hive and Kafka versions updated. 

[Image: Hadoop Tracker table, screenshot taken March 17, 2017]



Category: elastic-mapreduce  amazon-web-services  apache  accumulo  ambari  avro  flume  hadoop  hbase  hdfs  hive  kafka  mapreduce  oozie  apache-parquet  pig  zookeeper  cloudera  data-and-analytics-strategies  gartner  google  hortonworks  ibm  biginsights  mapr  open-source  

Tags: amazon  apache  flume  hadoop  hbase  hdfs  hive  mapreduce  oozie  pig  spark  sqoop  yarn  zookeeper  big-data-2  biginsights  cdh  cloudera  gartner  hortonworks  ibm  mapr  microsoft  

Merv Adrian
Research VP
9 years with Gartner
40 years in IT industry

Merv Adrian is an analyst following database and adjacent technologies as extreme data transforms assumptions about what to persist as well as when, where and how. He also watches the way the software/hardware boundary…


Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.