Hopefully, that title got your attention. A recursive acronym – the term first appeared in the book Gödel, Escher, Bach: An Eternal Golden Braid and is likely more familiar to tech folks who know Gnu – is self-referential (as in “Gnu’s not Unix.”) So how did I conclude Hadoop, whose name origin we know, fits the definition? Easy – like everyone else, I’m redefining Hadoop to suit my own purposes.
Let’s start with the obvious one. Of course, Doug Cutting named Hadoop after his child’s toy elephant, seen here.
And in its early days, as I discussed in my post about the changing composition of distributions a few months back, the story was simpler. Hadoop was HDFS, MapReduce and some utilities. As those utilities got formalized and became projects themselves and were supported by commercial distributors, the list grew: Pig, Hive, HBase, and Zookeeper were Hadoop too. And a few months ago, as I noticed, Accumulo, Avro, Cascading, Flume, Mahout, Oozie, Spark, Sqoop, and YARN had joined the list.
YARN is the one that really matters here because it doesn’t just mean the list of components will change, but because in its wake the list of components will change Hadoop’s meaning. YARN enables Hadoop to be more than a brute force, batch blunt instrument for analytics and ETL jobs. It can be an interactive analytic tool, an event processor, a transactional system, a governed, secure system for complex, mixed workloads. At Strata this week, we’ll talk about its integration with Red Hat’s middleware, its cautious alliance with Spark for MapReduce replacement, its alliance with data wrangling tools from startups and Teradata, its connection, via Sentry, to security stacks… and more.
So yes, many of us are redefining Hadoop as we add new pieces – new use cases, new projects that change its very nature. My answer to “What is Hadoop”?
OK – it’s a bit cute. But hopefully, it got your attention. Hadoop’s journey is just beginning, and there is much more change ahead.