When is a technology offering a platform? Arguably, when people build products assuming it will be there. Or extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig and Hive varies, as do capabilities and levels of partnership in development, integration and co-marketing. Some vendors are in many categories – for example, Pentaho and IBM, at opposite ends of the size spectrum, interact with Hadoop in development tools, data integration, BI, and other ways. A few category examples, by no means exhaustive:
Analytic Platforms: Kognitio – an analytics-focused in-memory database with SQL 2011 support – offers significant Hadoop support. Jethrodata adds indexes for SQL and stores them in HDFS to accelerate BI tools, while Splunk’s offering called – gotta love it – Hunk has a similar approach. SAS recently added a partnership with Hortonworks, extending existing capabilities. For integrated marketing, RedPoint Global offers a true YARN-enabled engine with a rich array of capabilities that many tool vendors would envy.
Application Performance Management: longtime stalwart Compuware is bringing its portfolio to both Hadoop and leading NoSQL offerings.
BI Tools: specialists like Alpine Data Labs, Datameer, Karmasphere and Platfora position themselves as targeted for Hadoop environments. Traditional players like SAP Business Objects may represent themselves as connecting via Hive, and increasingly some, like Alteryx, QlikView and Tableau, are partnering with emerging distribution-specific stack components like Cloudera’s Impala.
Database: some vendors, like IBM and Teradata, offer their own distributions, and even appliances. Others, like Actian, Calpont, Oracle and Microsoft, partner with pure-play vendors. All provide connectors, interfaces to their own management tools, etc. MarkLogic adds an “enterprise NoSQL” flavor; Rainstor adds an archiving solution for a highly compressed Hadoop environment.
Data Integration: Informatica and Talend both support HDFS and even have specific offerings for ETL, data quality, etc. Revelytix Loom offers data prep and metadata creation capabilities to shorten time-to-use cycles.
Hadoop as a Service: Altiscale, Amazon, Qubole, Rackspace, Savvis, and Xplenty (who mask Hadoop development complexity) offer varying degrees of control and surrounding capabilities – and marketing, as the links demonstrate.
In-memory data grid (IMDG) engine: GridGain offers GGFS, one of several HDFS substitutes, and, like ScaleOut hServer, offers an in-memory grid for execution of MapReduce code. Longtime IMDG player Terracotta has added a Hadoop connector.
Lifecycle Management: WANdisco is offering ALM and support for highly available distributed network deployments, and has recently partnered with Hortonworks.
Platform Performance Management: Appfluent, already providing visibility and performance optimization for Oracle, Teradata and IBM DB2 and PureData for Analytics (aka Netezza) platforms, has now added a Hadoop offering as well.
Search: numerous approaches and players here. One of the more interesting is LucidWorks, leveraging Lucene and Solr for search-based use cases on a Hadoop infrastructure.
Workload Automation: BMC’s Control-M is already in place for a need that will become more significant as adoption rises and efficiency becomes more of an issue.
This is just a bare smattering of the evolving ecosystem, and I’d be delighted to have your additions, recommendations and comments. I will update this post to include them. Please jump in.
Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.