Merv Adrian

A member of the Gartner Blog Network

Entries Categorized as 'Oozie'


What Is Hadoop….Now?

by Merv Adrian  |  June 28, 2014  |  4 Comments

In February 2012, Gartner published How to Choose The Right Apache Hadoop Distribution (available to clients). At the time, the leading distributors were Cloudera, EMC (now Pivotal), Hortonworks (pre-GA), IBM, and MapR. These players all supported six Apache projects: HDFS, MapReduce, Pig, Hive, HBase, and Zookeeper. Things have changed. [updated June 29] We included DataStax (a distributor of Apache Cassandra) […]


Category: Accumulo, Apache, Apache Yarn, Avro, Cascading, Cloudera, Falcon, Flume, Gartner, Giraph, Hadoop, Hbase, HDFS, Hive, Hortonworks, Hue, IBM, Knox, Lucene, Mahout, MapR, MapReduce, Oozie, Pig, Pivotal, Spark, Sqoop, Storm, Tez, YARN, Zookeeper

Hadoop is in the Mind of the Beholder

by Merv Adrian  |  March 24, 2014  |  11 Comments

This post was jointly authored by Merv Adrian (@merv) and Nick Heudecker (@nheudecker) and appears on both blogs. In the early days of Hadoop (versions up through 1.x), the project consisted of two primary components: HDFS and MapReduce. One thing to store the data in an append-only file model, distributed across an arbitrarily large number […]


Category: Accumulo, Ambari, Apache, Apache Drill, Apache Yarn, Big Data, BigInsights, Cloudera, Elastic MapReduce, Gartner, Giraph, Hadoop, Hbase, HCatalog, HDFS, Hive, Hortonworks, IBM, Intel, Lucene, MapR, MapReduce, Oozie, open source, OSS, Pig, Solr, Sqoop, Storm, YARN, Zookeeper

Hadoop Summit Recap Part Two – SELECT FROM hdfs WHERE bigdatavendor USING SQL

by Merv Adrian  |  July 15, 2013  |  10 Comments

Probably the most widespread, and commercially imminent, theme at the Summit was “SQL on Hadoop.” Since last year, many offerings have been touted and debated, and some have even shipped. In this post, I offer a brief look at where things stood at the Summit and how we got there. To net it out: offerings today […]


Category: Apache, Apache Drill, Apache Yarn, Aster, Big Data, Cloudera, data warehouse, DBMS, Gartner, Hadapt, Hadoop, HCatalog, HDFS, Hive, Hortonworks, IBM, MapR, MapReduce, Microsoft, Netezza, Oozie, Oracle, Rainstor, RDBMS, Real-time, SQL Server, Sqoop, Teradata, YARN

Hadoop 2013 – Part Two: Projects

by Merv Adrian  |  February 21, 2013  |  1 Comment

In Part One of this series, I pointed out that significant attention is being lavished on performance in 2013. In this installment, the topic is projects, which are proliferating precipitously. One of my most frequent client inquiries is “which of these pieces make Hadoop?” As recently as a year ago, the question was pretty simple for […]


Category: Accumulo, Ambari, Apache, Apache Drill, Apache Yarn, BigInsights, Cassandra, Cloudera, Dataguise, EMC, Gartner, Giraph, graph databases, Hadapt, Hadoop, Hbase, HCatalog, HDFS, Hive, Hortonworks, Hstreaming, IBM, InfoSphere, Lucene, MapReduce, Mahout, Oozie, open source, Pig, Rainstor, Serengeti, Solr, SQLstream, Sqoop, VMware, Zookeeper