November 2020 Hadoop Distribution Apache Project Tracker

By Merv Adrian | November 07, 2020

Since our last tracker post, there has been a major version release: HPE Ezmeral Data Fabric (formerly MapR) version 6.2 was released in September, its first major release since the acquisition of MapR. You may be forgiven for not noticing – there has not been a great deal of product marketing communication about it. At Gartner I hear a lot of questions about the status of the offering. To their credit, the HPE team has been diligently connecting with customers and they claim a strong retention rate, and even expansions, in their base.

I’ve updated the matrix to reflect the components of the new release. A few notes for those unfamiliar with how the HPE offering works: HPE has its own versions of some core Apache pieces, and that is noted, for example, in the first three lines below. Two sets of pieces are in any release: Core, and what is still called the MapR Ecosystem Pack (MEP). The former includes support for the Apache Hadoop components – HDFS, MapReduce and YARN. MEP 7.0 supports Hadoop 2.7.4 as far as I can tell. The current Apache versions are 2.10.1 and 3.1.4. Why are there two? That is another conversation, but I suspect most readers already know. Let me know if you think I ought to post about it.

HPE also directly documents the subprojects of Apache Kafka, and I’ve indicated that with the word “multiple” in the matrix. Connect, REST and Schema Registry are at 5.1.2, while Streams is at 2.1.1. Other pieces they list include Livy 0.5.0, their own Data Access Gateway, and the MapR Object Store. In general, there are many differences in the details – but this will of course be true in any “Hadoop” based stack from any of the vendors. Should you choose to migrate from one to another, those pieces must be carefully understood.

[update 11/10/20] HPE tells me they continue to monitor and evaluate Hadoop 3.0, but their customers are rapidly adopting containers/Kubernetes for resource isolation/management instead of YARN.
HPE also has an Apache Spark offering for Kubernetes as part of its Ezmeral portfolio: HPE Ezmeral Container Platform with the integrated HPE Ezmeral Data Fabric. In addition, HPE Ezmeral ML Ops includes machine learning tools like TensorFlow and Jupyter out of the box. This is another example of vendors dispersing the stack across multiple product offerings, making it a little more challenging to keep track of what we used to call “Hadoop.”


  • Sanjeev Mohan says:

    Thanks, Merv, for another excellent update and for keeping us informed. I am curious why Ezmeral 6.2 has Hadoop 2.7.4 and not Hadoop 3.x, which has now been out for a while.