Gartner Blog Network


IBM Ends Hadoop Distribution, Hortonworks Expands Hybrid Open Source

by Merv Adrian  |  June 21, 2017  |  2 Comments

IBM has followed Intel and EMC/Pivotal in abandoning efforts to make a business of Hadoop distributions, and followed Microsoft in making Hortonworks its supplying partner. At the former Hadoop Summit, now called Dataworks (itself a sign of the shift from Hadoop-centric positioning), IBM announced it will discontinue its IBM Open Platform/BigInsights offering, and will instead OEM Hortonworks’ HDP. In a seven-year agreement, IBM will provide its Data Science Experience to Hortonworks, which will make it a part of the marketed HDP distribution. Hortonworks CEO Rob Bearden noted in a discussion with analysts that “we were already shipping the same bits” and that the ODPi relationship they share had helped to make the commonalities more obvious and easily rationalized (though the ODPi platform is still a small subset of the typical stack used by customers).

Like Hortonworks’ earlier Enterprise Data Warehouse Optimization offering, this new packaging strategy will permit the firm to combine its open source components with partner pieces that are not open source. Hybrid open source combines “the free stuff” everyone shares with proprietary bits that help differentiate and monetize the whole package, or so the theory goes. Clearly both vendors hope the go-to-market strategy will play in Peoria as well as it does in the Silicon Valley venture capital community.

The extra-cost, non-Apache piece here is IBM’s Big SQL, which IBM calls the ultimate hybrid engine, permitting concurrent, optimized use of Hive, HBase, Spark and other sources using a single database connection. This echoes the Pivotal exit last year – Pivotal’s HAWQ was supposed to blunt the performance advantage of Cloudera’s Impala over Hive. It didn’t happen. Hortonworks has had to keep plowing R&D into Hive. Whether Big SQL will be different is hardly clear – IBM has not made convincing progress in the market with it.

The two firms will also up the ante on the data governance front, which both believe will be a key driver of demand in mainstream firms in the months ahead. They will advance the integration of IBM BigIntegrate, BigQuality and Information Governance Catalog with the Apache Atlas project Hortonworks has spearheaded. Becoming the go-to providers for a large base of IBM customers, who already rely on IBM for much of their governance stack and hope to extend its coverage to new data, appears promising. The deal signals that the elevation of governance and security as issues will not only continue, but be increasingly pursued by players with real credibility and experience with both. The huge majority of Hadoop adopters who are currently stalled in attempting to get to broad production use will need to deal with those issues, as well as the serious skills and performance gaps they struggle with today, if and when their pilots and first projects gain internal acceptance.

For IBM, the partnership will permit redirecting its internal resources from BigInsights to work on machine learning, Spark and governance. IBM likely had nearly as many developers as customers on BigInsights, and many of its users were apparently given the offering as part of larger deals – usage stories have been few and far between, despite some suggestions in the press (though not directly from IBM) of hundreds of users. Both companies can focus their story higher up the stack and continue the evolution of the Hadoop narrative from “components and platforms” to “use cases and solutions.” Will that drive footprint expansion in today’s customer base, and break down the barriers Hadoop has been bouncing off? It remains to be seen.

Such packages demonstrate what Gartner calls the “disaggregation” of the Hadoop stack – the recognition that building a solution requires both more and less than the Hadoop distribution itself. This is a commercial dilemma for the pure plays – and is one of the reasons their numbers are shrinking even as they move away from being “Hadoop companies” to something else.

There are other challenges ahead for Hortonworks – they’ll have to accommodate and convert some IBM customers, and maybe convince them to pay. Moreover, the IBM deal does not help them much in the cloud. Their Microsoft partnership will continue to do that, but the choice between those two partners will be another interesting challenge for the Hortonworks sales force to navigate in the months ahead.

 


Merv Adrian
Research VP
5 years with Gartner
38 years in IT industry

Merv Adrian is an analyst following database and adjacent technologies as extreme data transforms assumptions about what to persist as well as when, where and how. He also watches the way the software/hardware boundary…


Thoughts on IBM Ends Hadoop Distribution, Hortonworks Expands Hybrid Open Source


  1. Bjorn Tuft says:

    Hi Merv, the writing was on the wall. Many IBM customers want the comfort of support from a big partner, but they do not get the Big Data software from that same partner; they get it from GitHub, and for most issues they go to Google and specialised sites. Maybe Cloudera and Hortonworks will act as gatekeepers: if they support a project in their distribution, then it is mature enough for enterprise use. The two companies will consecrate components.

    • Merv Adrian says:

      They are not alone, of course – there are other distributors. But who will be the “supporter” of record is a question the market has not answered yet, and that uncertainty is a challenge. Thanks for the comment.




Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.