Hopefully, that title got your attention. A recursive acronym is self-referential, as in “GNU’s Not Unix.” The term first appeared in the book Gödel, Escher, Bach: An Eternal Golden Braid and is likely more familiar to tech folks who know GNU. So how did I conclude that Hadoop, whose name origin we know, fits the definition? Easy: like everyone else, I’m redefining Hadoop to suit my own purposes.
Let’s start with the obvious one. Of course, Doug Cutting named Hadoop after his child’s toy elephant.
And in its early days, as I discussed in my post about the changing composition of distributions a few months back, the story was simpler. Hadoop was HDFS, MapReduce and some utilities. As those utilities were formalized, became projects in their own right, and gained support from commercial distributors, the list grew: Pig, Hive, HBase, and ZooKeeper were Hadoop too. And a few months ago, as I noted, Accumulo, Avro, Cascading, Flume, Mahout, Oozie, Spark, Sqoop, and YARN had joined the list.
YARN is the one that really matters here, not just because it means the list of components will change, but because in its wake that list will change Hadoop’s meaning. YARN enables Hadoop to be more than a brute-force, batch-oriented blunt instrument for analytics and ETL jobs. It can be an interactive analytic tool, an event processor, a transactional system, and a governed, secure system for complex, mixed workloads. At Strata this week, we’ll talk about its integration with Red Hat’s middleware, its cautious alliance with Spark as a MapReduce replacement, its alliance with data wrangling tools from startups and Teradata, its connection, via Sentry, to security stacks… and more.
So yes, many of us are redefining Hadoop as we add new pieces: new use cases and new projects that change its very nature. My answer to “What is Hadoop?”
OK – it’s a bit cute. But hopefully, it got your attention. Hadoop’s journey is just beginning, and there is much more change ahead.