The Hadoop ecosystem is like a kaleidoscope, where particles keep colliding, tumbling and forming mesmerizing patterns in the reflections. My research note, “What Matters When Comparing Hadoop Distributions,” is finally out. I spent four months writing it. As soon as I felt it was ready, new principal points came up that I had to resolve, because the Hadoop kaleidoscope kept turning. Hadoop distribution vendors were changing their stances, and clients were seeking guidance on more and more Hadoop-related subjects. Even more interesting, over this time a whole wave of Hadoop ecosystem products became more visible in the kaleidoscope: Databricks / Apache Spark, 0xdata H2O and Adatao are examples.
I’d like to offer the main points from my research, which can help enterprises get a snapshot of the Hadoop kaleidoscope. Be aware that many new announcements will come from Strata / Hadoop World this week and keep these beautiful, evanescent pictures in motion.
- Commercial Hadoop distributions eliminate the complexity of building a Hadoop stack on your own. They provide indemnification and support for open-source software.
- There are more similarities than differences among commercial Hadoop distributions. All of them include the core open-source Apache Hadoop projects, many other open-source projects and a smaller set of distribution-specific components. Most distribution-specific components deliver functionality comparable to that of other distributions. This makes vendor lock-in concerns unfounded for the majority of use cases.
- Hadoop distributions will improve and will deviate from their current state. Gartner expects new technologies in the Hadoop ecosystem in the near future.
- Hadoop’s value lies not only in its features and capabilities. Given the growing maturity of YARN resource management, Hadoop is also becoming the de facto cluster management standard.
- Because Hadoop is engineering-driven, certain gaps that matter to the business could get low priority or be overlooked.
- For many organizations, big data initiatives are the cutting edge of their innovation. Talented and experienced distribution vendors are often not just service providers but innovation partners and the source of new ideas in the enterprise.
- Cost should not be a key factor in deciding to implement Hadoop on your own. Acquire a commercial Hadoop distribution for your on-premises implementation to address unavoidable technology challenges.
- Partnerships between Hadoop distribution vendors and your key software or hardware suppliers are a major factor in selecting a Hadoop distribution. Determine how a Hadoop distribution fits into your overall architecture.
- The majority of your time would be better spent on determining the value of Hadoop to your enterprise, rather than on choosing among Hadoop distributions.
- Your long-term architectures will evolve along with Hadoop. In light of rapid changes and upcoming Hadoop improvements, focus your architecture on your immediate use cases.
Follow Svetlana on Twitter @Sve_Sic