Welcome to my co-author, Gartner analyst Sanjeev Mohan
It’s been an eventful six months since Merv published the last of these trackers. The Hadoop ecosystem is far from dead, contrary to what many pundits predicted. Cloudera Data Platform (CDP) has begun to ship in bare-metal, public cloud and private cloud versions. MapR is now HPE Ezmeral Data Fabric. Microsoft has decided to support its own Hadoop distribution in the cloud for HDInsight. As the data below shows, most of the key components are actively being updated.
It is true that one doesn’t hear about Hadoop as much, because:
- It is no longer fashionable
- Its core components have achieved a level of maturity and stability that doesn’t need major revamps.
However, at Gartner, we encounter many clients running workloads at extreme scale using one of the flavors of Hadoop, on premises and in the cloud. It has become the workhorse that quietly delivers in the background and doesn’t attract much attention. Increasingly, it doesn’t even use HDFS and runs directly on cloud service providers’ (CSPs’) object stores, and Cloudera has just rolled out Apache Ozone, an open source object store. There are other interesting enhancements we are eagerly awaiting, such as broader integration of Kubernetes, dueling machine learning libraries, the competition among optimization layers like Arrow and Presto, and more.
Recognizing that dynamism, we’ve included the current releases of both Cloudera legacy offerings, CDH and HDP, here. They are far more widely deployed than CDP and will be for some time, so currency and support will be top-of-mind issues. CDP is represented by Runtime, which is common across multiple offerings in Cloudera’s new product architecture. And Amazon’s EMR, the first commercial Hadoop offering, is represented by two releases: Amazon EMR 5.31 and 6, the latter of which will have Hadoop 3.X and Spark 3.X series components. The community continues to be composed of both those who have to have the newest shiny objects and more conservative users. Amazon is maintaining currency on both release lines and tells us “For now, we suggest using Amazon EMR 5.3. We expect Amazon EMR 6.1 to launch in September, at which point we suggest using Amazon EMR 6.1.”
Finally, note that Cloudera’s packaging means some newer projects it supports, like Apache Flink or Apache NiFi, don’t show up here because you don’t get them in Runtime, only in specific use case offerings for operational data or data in motion. We dealt with a similar question when at first we did not find Apache Knox in Google Cloud Dataproc. It’s listed elsewhere, as an optional component included in Component Gateway, which is available at no additional charge. So recognize that those Apache projects have somewhat broader support in the market than they may appear to have here in this format. But even looking just at this set, component activity tells a story of a dynamic community that continues to birth new projects and incorporate them, even as some older ones fade. More on that in our next post.
As always, comments are invited. This is a huge, sprawling community and we can’t keep track of everything. Please send corrections, questions, experiences and opinions. Always happy to hear from you.