Gartner Blog Network


Where Art Thou Hadoop?

by Svetlana Sicular  |  October 14, 2014

The Hadoop ecosystem is like a kaleidoscope, where particles keep colliding, tumbling and forming mesmerizing patterns created by the reflections. My research note, “What Matters When Comparing Hadoop Distributions,” is finally out. I’ve been writing it for four months. Whenever I felt it was ready, there were some principal points that I still had to resolve, because the Hadoop kaleidoscope kept turning. Hadoop distribution vendors were changing their stances, and clients were seeking guidance on more and more Hadoop-related subjects. What’s even more interesting, over this time a whole wave of Hadoop ecosystem products became more visible in the kaleidoscope: Databricks / Apache Spark, 0xdata H2O and Adatao are examples.

I’d like to offer the main points from my research, which can help enterprises get a snapshot of the Hadoop kaleidoscope. Be aware that many new announcements will come from Strata / Hadoop World this week, keeping the beautiful and evanescent pictures in motion.

Key Findings

  • Commercial Hadoop distributions eliminate the complexity of building a Hadoop stack on your own. They ensure indemnification and provide support for open-source software.
  • There are more similarities than differences among commercial Hadoop distributions: All Hadoop distributions include the core open-source Apache Hadoop projects, many other open-source projects and a smaller set of distribution-specific components. Most distribution-specific components deliver functionality comparable to that of other distributions. This makes vendor lock-in concerns unfounded for the majority of use cases.
  • Hadoop distributions will improve and will deviate from their current state. Gartner expects new technologies in the Hadoop ecosystem in the near future.
  • Hadoop’s value is not only in its features and capabilities: given the growing maturity of YARN resource management, it is also becoming the de facto cluster management standard.
  • Given that Hadoop is engineering-driven, certain gaps important to the business could get low priority or may be overlooked.
  • For many organizations, big data initiatives are the cutting edge of their innovation. Talented and experienced distribution vendors are often not just service providers but innovation partners and the source of new ideas in the enterprise.

Recommendations

  • Cost should not be a key factor in deciding to implement Hadoop on your own. Acquire a commercial Hadoop distribution for your on-premises implementation to address unavoidable technology challenges.
  • Partnerships between Hadoop distribution vendors and your key software or hardware suppliers are a key factor in selecting a Hadoop distribution. Determine how a Hadoop distribution fits into your overall architecture.
  • The majority of your time would be better spent on determining the value of Hadoop to your enterprise, rather than on choosing among Hadoop distributions.
  • Your long-term architectures will evolve along with Hadoop. In light of rapid changes and upcoming Hadoop improvements, focus your architecture on your immediate use cases.

Follow Svetlana on Twitter @Sve_Sic

Category: big-data  data-paprazzi  hadoop  information-everywhere  innovation  inquire-within  

Tags: big-data  biginsights  cloudera  data-paprazzi  hadoop-distribution  hortonworks  innovation  mapr  pivotal-hd  vendors  

Svetlana Sicular
Research Director
3 years at Gartner
21 years IT industry

Svetlana Sicular has a uniquely combined experience of Fortune 500 IT and business leadership, product management at world-class software vendors, and Big Four consulting. She primarily handles inquiries in the areas of data management strategy, ...



Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.