2013 promises to be a banner year for Apache Hadoop, platform providers, related technologies – and analysts who try to sort it out. I’ve been wrestling with ways to make sense of it for Gartner clients bewildered by a new set of choices, and for them and myself, I’ve built a stack diagram that describes possible functional layers of a Hadoop-based model.
The model is not exhaustive, and it continually evolves. It serves as a scratchpad for my ongoing collection of market, usage and technical data – every project/product name in the layers is a link to a separate slide in a large deck I use to follow developments. As you can see below, it contains many Apache and non-Apache pieces – projects, products, vendors – open and closed source. Some are quite low level – for example, Trevni can be thought of as a format used inside Avro – but I include them at least in part because I keep track of “moving parts,” and in the world of open source, that means a lot of pieces that are independent of one another.
Part of the effort so far has been on relating this model to Gartner’s Information Capabilities Framework, an enormously useful view of the verbs we use to compose our semantic use cases in building business applications. My colleague Ted Friedman and I just used the two models to assess how Hadoop stacks up as a Data Integration solution. Not surprisingly, I suppose, we found it wanting. You can see our research here if you’re a Gartner client.
I expect further refinement of this stack in the weeks ahead, and more offerings at each layer as it evolves. I’m trying to keep it simple – at 6 layers it’s already getting heavy, and I’d hate to add more. But that may be unavoidable. Your feedback here will be helpful – please offer comments if you have any! As a guide to choice, simplicity is a much-desired, but often unobtainable, objective.