BYOH – Hadoop’s a Platform. Get Used To It.

By Merv Adrian | October 09, 2013 | 6 Comments


When is a technology offering a platform? Arguably, when people build products assuming it will be there, extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig and Hive varies, as do capabilities and levels of partnership in development, integration and co-marketing. Some vendors fall into many categories – for example, Pentaho and IBM, at opposite ends of the size spectrum, interact with Hadoop through development tools, data integration, BI, and other ways. A few category examples, by no means exhaustive:

Analytic Platforms: Kognitio – an analytics-focused in-memory database with SQL:2011 support – offers significant Hadoop support. JethroData adds indexes for SQL and stores them in HDFS to accelerate BI tools, while Splunk’s offering called – gotta love it – Hunk takes a similar approach. SAS recently added a partnership with Hortonworks, extending existing capabilities. For integrated marketing, RedPoint Global offers a true YARN-enabled engine with a rich array of capabilities that many tool vendors would envy.

Application Performance Management: longtime stalwart Compuware is bringing its portfolio to both Hadoop and leading NoSQL offerings.

BI Tools: specialists like Alpine Data Labs, Datameer, Karmasphere and Platfora position themselves as targeted at Hadoop environments. Traditional players like SAP BusinessObjects may represent themselves as connecting via Hive, and increasingly some, like Alteryx, QlikView and Tableau, are partnering with emerging distribution-specific stack components like Cloudera’s Impala.

Database: some vendors, like IBM and Teradata, offer their own distributions, and even appliances. Others, like Actian, Calpont, Oracle and Microsoft, partner with pureplay vendors. All provide connectors, interfaces to their own management tools, etc. MarkLogic adds an “enterprise NoSQL” flavor; RainStor adds an archiving solution for a highly compressed Hadoop environment.

Data Integration: Informatica and Talend both support HDFS and even have specific offerings for ETL, data quality, etc. Revelytix Loom offers data preparation and metadata creation capabilities to shorten time-to-use cycles.

Development platforms: Continuuity is out to an early lead here, but is hardly alone and won’t be the last. For example, SQL Server development tool player Red Gate will enter the market soon.

Hadoop as a Service: Altiscale, Amazon, Qubole, Rackspace, Savvis, and Xplenty (which masks Hadoop development complexity) offer varying degrees of control and surrounding capabilities – and marketing, as the links demonstrate.

In-memory data grid (IMDG) engine: GridGain offers GGFS, one of several HDFS substitutes, and, like ScaleOut hServer, offers an in-memory grid for executing MapReduce code. Longtime IMDG player Terracotta has added a Hadoop connector.

Lifecycle Management: WANdisco is offering ALM and support for highly available distributed network deployments, and has recently partnered with Hortonworks.

Platform Performance Management: Appfluent, already providing visibility and performance optimization for Oracle, Teradata, IBM DB2 and PureData for Analytics (aka Netezza) platforms, has now added a Hadoop offering as well.

Search: numerous approaches and players here. One of the more interesting is LucidWorks, which leverages Lucene and Solr for search-based use cases on a Hadoop infrastructure.

Security: Dataguise, Gazzang, Protegrity, and Zettaset offer varying components of full-stack security hardening for a Hadoop deployment.

Stream Processing: DataTorrent, TIBCO StreamBase, Vitria and Zoomdata are among the early players here.

Workload Automation: BMC’s Control-M is already in place for a need that will become more significant as adoption rises and efficiency becomes more of an issue.

This is just a bare smattering of the evolving ecosystem, and I’d be delighted to have your additions, recommendations and comments. I will update this post to include them. Please jump in.

The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.



  • @Merv – great market landscape; good to see how far-reaching it is. There are clearly many tools connecting to various constituents of Hadoop – HDFS, Hive, etc. – but Hadoop really becomes a platform when tools run inside/on top of it.

    In the data integration/data quality space, that means running the transformation and cleansing jobs inside Hadoop, through MapReduce code generation for example (a bit like ELT jobs running inside an RDBMS). In the BI space that would mean crunching queries in MapReduce, not just extracting the raw data to load into a distinct cube.

    I know this is the direction the market is taking, but not all legacy architectures will be able to transition to that concept.

    • Merv Adrian says:

      Thanks, Yves. It is really just a quick sketch; doing it right would take many pages. I’m hoping folks weigh in and say “Add this” – and I will, keeping this as a living post for a while.
      There’s a good discussion to be had here about how the architecture will evolve, starting with “which pieces have to be in the stack to consider it Hadoop?” – and that will change as YARN proliferates and people use other, non-MR engines in their stacks. But certainly MR will remain the vehicle of choice for DI work in the HDFS-and-alternatives data layer.
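
      [Ed. note] The pushdown idea discussed in this exchange can be sketched concretely. Below is a minimal, hypothetical pair of map/reduce functions of the kind a Hadoop Streaming data-quality job would run in-cluster. The record layout (id, name, email), the field names and the dedup rule are all illustrative assumptions, not any vendor’s actual generated code:

      ```python
      # Sketch of ELT-style pushdown: the cleansing/transform logic runs
      # inside Hadoop (e.g. as a Hadoop Streaming mapper and reducer)
      # instead of in an external ETL server. The CSV layout is made up.

      def clean_map(line):
          """Mapper: parse one raw CSV record, filter malformed rows,
          standardize the email field. Returns (key, cleaned_value) or None."""
          parts = line.rstrip("\n").split(",")
          if len(parts) != 3:
              return None                 # data-quality filter, in-cluster
          cust_id, name, email = (p.strip() for p in parts)
          if "@" not in email:
              return None                 # reject invalid email addresses
          return (cust_id, "%s,%s" % (name, email.lower()))

      def dedupe_reduce(pairs):
          """Reducer: after the shuffle groups records by key, keep the
          first cleaned record per customer id (simple deduplication)."""
          seen = {}
          for key, value in pairs:
              seen.setdefault(key, value)
          return sorted(seen.items())
      ```

      In a real deployment these two functions would be wired up with Hadoop Streaming (`-mapper`/`-reducer`) or emitted by a DI tool’s code generator, so the data never leaves the cluster until it is already clean – exactly the “ELT inside the platform” pattern Yves describes.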

  • Immo Salo says:

    Good post about what Hadoop is all about: the ecosystem! Far too often Hadoop is seen only as a cheap alternative for storing massive amounts of data or batch processing it. Seeing it from a platform or ecosystem perspective emphasises the potential it has to utilise the stored (or streamed) data.

    I believe that the wide range of appliances and other so-called big data solutions that provide integration to Hadoop or are “powered by Hadoop” makes it more and more obvious that this platform is here to stay (HP’s HAVEn, IBM’s InfoSphere BigInsights, Microsoft’s Parallel Data Warehouse, etc.).

  • Pierce Lamb says:

    Great overview. One category you might consider adding is ‘In-Memory Hadoop’ a la ScaleOut Software, Grid Gain, Terracotta, Pivotal etc. Basically the vendors that are combining in-memory computing technology with Hadoop to enable developers to run MapReduce on live, fast-changing data and speed up job execution.

    For reference

    ScaleOut hServer (disclaimer, I’m a ScaleOut employee)

    Grid Gain accelerator for Hadoop

    Terracotta BigMemory Hadoop Connector

    Pivotal (not yet launched)

  • Nice overview. Don’t forget the ODBC connection plumbing offered by Simba. Most of the major Hadoop players (Cloudera, MapR, Hortonworks, Intel, Microsoft, etc.) offer our ODBC drivers as part of their distributions, and we also offer solutions for MongoDB, Cassandra, Salesforce, Google BigQuery and others.

    We enable the ecosystem to talk to each other in a standardized and reliable way, via the most common interface.