When is a technology offering a platform? Arguably, when people build products assuming it will be there, extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig and Hive varies, as do capabilities and levels of partnership in development, integration and co-marketing. Some vendors are in many categories – for example, Pentaho and IBM, at opposite ends of the size spectrum, interact with Hadoop in development tools, data integration, BI, and other ways. A few category examples, by no means exhaustive:
Analytic Platforms: Kognitio – an analytics-focused in-memory database with SQL 2011 support – offers significant Hadoop support. Jethrodata adds indexes for SQL and stores them in HDFS to accelerate BI tools, while Splunk’s offering, called – gotta love it – Hunk, takes a similar approach. SAS recently added a partnership with Hortonworks, extending existing capabilities. For integrated marketing, RedPoint Global offers a true YARN-enabled engine with a rich array of capabilities that many tool vendors would envy.
Application Performance Management: longtime stalwart Compuware is bringing its portfolio to both Hadoop and leading NoSQL offerings.
BI Tools: specialists like Alpine Data Labs, Datameer, Karmasphere and Platfora position themselves as built for Hadoop environments. Traditional players like SAP Business Objects may represent themselves as connecting via Hive, and increasingly we will see some, like Alteryx, Qlikview and Tableau, partnering with emerging distribution-specific stack components like Cloudera’s Impala.
Database: some vendors, like IBM and Teradata, offer their own distributions, and even appliances. Others, like Actian, Calpont, Oracle and Microsoft, partner with pure-play vendors. All provide connectors, interfaces to their own management tools, etc. MarkLogic adds an “enterprise NoSQL” flavor; Rainstor adds an archiving solution for a highly compressed Hadoop environment.
Data Integration: Informatica and Talend both support HDFS and even have specific offerings for ETL, data quality, etc. Revelytix Loom offers data prep and metadata creation capabilities to shorten time-to-use cycles.
Hadoop as a Service: Altiscale, Amazon, Qubole, Rackspace, Savvis, and Xplenty (who mask Hadoop development complexity) offer varying degrees of control and surrounding capabilities – and marketing, as the links demonstrate.
In-memory data grid (IMDG) engine: Gridgain offers GGFS, one of several HDFS substitutes, and, like ScaleOut hServer, offers an in-memory grid for execution of MapReduce code. Longtime IMDG player Terracotta has added a Hadoop connector.
Lifecycle Management: WANdisco is offering ALM and support for highly available distributed network deployments, and has recently partnered with Hortonworks.
Platform Performance Management: Appfluent, already providing visibility and performance optimization for Oracle, Teradata, and IBM's DB2 and PureData for Analytics (aka Netezza) platforms, has now added a Hadoop offering as well.
Search: numerous approaches and players here. One of the more interesting is LucidWorks, leveraging Lucene and Solr for search-based use cases on a Hadoop infrastructure.
Workload Automation: BMC’s Control-M is already in place for a need that will become more significant as adoption rises and efficiency becomes more of an issue.
This is just a bare smattering of the evolving ecosystem, and I’d be delighted to have your additions, recommendations and comments. I will update this post to include them. Please jump in.