This is a joint blog post between Nick Heudecker and Merv Adrian.
It’s Strata week here in San Jose, and with that comes a flood of new announcements on products, partners and funding. Today’s big announcement came in the form of the Open Data Platform (ODP). A number of companies have signed on, but in short, it’s got some Hadoopers, some service providers and systems integrators, as well as some analytics apps vendors.
There are a variety of arguments both for and against the ODP. Statler and Waldorf tried to encapsulate them here. They are surprisingly knowledgeable.
Statler The ODP’s primary objective is to create a standard set of components for Hadoop, thus ensuring enterprise customers aren’t locked into a specific vendor. Applications are portable, and end user organizations know what they’re getting if they pick the ODP standard. This is for the enterprise end users of Hadoop.
Waldorf “Standard”? For Hadoop? Excuse me, don’t we have one already? It’s called Apache Hadoop. Distributors add their own pieces, but there is a core that everybody agrees to. And APIs that tie it all together. But nobody has to pay to get in. You qualify with useful code, voted on by your peers. This is clearly for vendors, by vendors. These guys have Platinum members, Gold members – what’s that money for?
Statler Don’t kid yourself – Apache has Platinum sponsors too. And the timing of this announcement isn’t coincidental. Being a charitable organization doesn’t mean being altruistic – it’s a tax status. That said, it’s worth talking about how ODP relates to the Apache Software Foundation (ASF). As I see it, ODP doesn’t compete with the ASF. The ASF provides a governance model around open source software development, while ODP hopes to provide a vendor-led packaging model. Currently, the Hadoop vendors are fighting proxy battles in the ASF using committers. That destroys the spirit and undermines the purpose of the ASF. There have been allusions to a “fauxpen” process dominated by a few players packing the committees. Shifting these discussions to the ODP is the right move.
Waldorf And yet one of those dominant players is a charter member here. And “minority players” seem to have been able to get things like Drill and Phoenix in – because their code was good enough to get voted through the Incubator.
This is not about discussion – it’s about innovation. Apache’s job is to drive more innovation – let a thousand flowers bloom. If someone wants a better/richer engine than MapReduce, somebody creates Tez. If HBase isn’t secure enough, somebody else creates Accumulo. This new group pushes a least common denominator; it’s a frozen snapshot for its members to support until they deem a new one is ready.
Statler And you think that’s a bad thing? Every vendor’s Hadoop distribution is constantly changing. One month Spark is wholly unsupported, while the next month some components are supported, others are beta, while still others are shipped but unsupported. Repeat this for every component in the ever-expanding Hadoop stack. ODP offers stability end users can invest in. That stability offers a catalyst for mainstream Hadoop adoption.
Waldorf Sorry, I’m still not buying that. Pivotal used to, but now they don’t have to invest in the pieces that aren’t “common” anymore – not that they were doing so. Teradata doesn’t have a distribution and didn’t contribute that much either. IBM has a distro, and they contributed some, but mostly their “special sauce.” Hortonworks picks up the lead and offers support for at least one of the members – suddenly their role is very different. Maybe that is what the fees are for…
Statler Maybe. To get back to the question posed in the title – who asked for ODP? Taken together with Pivotal’s other announcements about open sourcing HAWQ, Greenplum and other pieces, and about partnering with Hortonworks, yes, it looks like ODP positions Hortonworks as the Hadoop arms dealer for the other players. Basing an open data platform on a single vendor’s packaging casts some doubt on “open.”
Waldorf Exactly. And it’s not just who wants it – who needs it? Aren’t the vendors already free to add their own pieces now? In fact they have to, to differentiate themselves. So are they saying the previous compatibility wasn’t compatible enough? Or are they creating a club they get to be the leading members in? Maybe this is Pivotal’s way to reduce its investment in a failing effort to build a proprietary way to capture a slice of this trend. Declare victory and retreat. And a way for Hortonworks to get included in many of their sales, and pick up some revenue for themselves.
Statler In the long run, ODP’s effectiveness at defining a certified core set of Hadoop components is an open question. But the long run doesn’t mean much in Silicon Valley.
Waldorf Or, you might say, in Redmond. It reminds me of ODBC. Microsoft’s term used to be “embrace and extend.” We’ll use your innovations. You keep on doing that. We’ll make some special pieces around the edge and monetize them atop that base. And you guys spend your time and effort on what the rest of us share. I notice they’re not in this announcement.
This simply institutionalizes a dichotomy in favor of a few favored players. Who wants it? As Cloudera suggests, the paying members, and it’s not clear who else. It’s ironic that Hortonworks is one of the founders of an organization that wants to add an anchor slowing innovation in the open source free-for-all it has been the flag-bearer for.
Gartner note: our recent webinar asked how many attendees considered vendor lock-in a barrier to investment in Hadoop. It came in dead last, with around 1% selecting it. More on that in a future post.
Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.