For the past year and a half I’ve been tracking the path from 6 broadly supported (4 or more distributors) “Hadoop” projects in 2012 to 15 in June 2014, and now 17 in December 2015. Expansion continues. Clarity? Not so much. As I said in “Now, What is Hadoop?”:
You will find that you have to dig for answers to the obvious question “If I pay for a support subscription, what will be supported?” “Support” in this analysis means that a paid subscription explicitly includes support for the named project.
The chart here is based on conversations with and/or web documentation from Amazon, Cloudera, Hortonworks, IBM, MapR, and Pivotal. Public documentation of distribution contents mostly remains incomplete, though IBM’s page does a good job.
So: what is “broadly supported” Hadoop in December 2015? The Apache Hadoop web site still names Hadoop Common, Hadoop Distributed File System (HDFS™), Hadoop YARN and Hadoop MapReduce, and gives them a common release number. I leave Common out and call that 3 projects.
The other projects supported by all the vendors are HBase, Hive, Oozie, Parquet, Pig, Spark, and ZooKeeper, for a total of 10 projects supported by all.
(Spark has SQL, Streaming, graph, ML and time series libraries. Support varies; ask your vendor.)
Flume, Hue, Solr, and Sqoop are supported by 5 of the 6 vendors. That gets us to 14 projects.
Avro, Kafka, and Mahout have 4 supporters. And now we’re up to 17 projects.
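The running tally above is simple arithmetic, but for anyone who wants to rework it as vendor support shifts, here is a minimal Python sketch. The project lists come straight from this post; the variable and function names are my own and purely illustrative, not anything a vendor publishes.

```python
# Tiers of vendor support for "Hadoop" projects, December 2015, as listed in
# this post (6 distributors surveyed). Keys are the number of distributors
# supporting each project; names here are illustrative only.
tiers = {
    6: ["HDFS", "YARN", "MapReduce", "HBase", "Hive", "Oozie",
        "Parquet", "Pig", "Spark", "ZooKeeper"],
    5: ["Flume", "Hue", "Solr", "Sqoop"],
    4: ["Avro", "Kafka", "Mahout"],
}


def cumulative_counts(tiers):
    """Return (supporters, projects_in_tier, running_total), widest support first."""
    total = 0
    rows = []
    for supporters in sorted(tiers, reverse=True):
        n = len(tiers[supporters])
        total += n
        rows.append((supporters, n, total))
    return rows


for supporters, n, total in cumulative_counts(tiers):
    print(f"supported by {supporters}: {n} projects, running total {total}")
```

Adding or moving a project between tiers and rerunning gives the new “broadly supported” count (4 or more distributors) without redoing the sums by hand.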
What happens when we drop below 4 supporters? We get to differentiation: places where the distributors are providing “their own” SQL interface, or security/governance stack, or management console. It’s not obvious, though, because many are not named “[distributor] ProjectX” but “Apache ProjectX.” More on that in the next post in this year-end series.