There is a romantic notion of leaving the past behind and embracing the future unencumbered. Previous mistakes forgotten, we can venture forward to accomplish great things to the amazement of friends, colleagues and casual onlookers.
This is the promise made by BI and analytics vendors in the Hadoop-only ecosystem. After all, if your data moves to Hadoop, why concern yourself with data stored in legacy data warehouses? But judging by the audience response to a polling question asked during a webinar on Hadoop 2.0, you can’t escape your past. You can only embrace it.
On January 16th, Merv Adrian and I presented webinars discussing the impact of Hadoop 2.0. As part of the webinar, we asked our audience three polling questions. One of them asked participants which SQL-on-Hadoop methods they were most likely to use to access data stored in HDFS and HBase. We asked this question because of the SQL arms race under way in the Hadoop ecosystem. Here are the responses:
These results indicate that existing analytics tools lead, while Hadoop-specific tools trail significantly. Most respondents said they were most likely to use interfaces provided by existing analytics tool providers. Hive came in second, indicating increasing familiarity and comfort with the most established SQL-on-Hadoop interface. The Hadoop-specific BI specialists, however, tied for last. There are a few possible reasons for this apparent reluctance, such as tool maturity, availability of skills and existing investment in analytics tools. However, the understanding that data exists in more places than just Hadoop may be a core factor in attendee responses. If your data lives in warehouses and marts as well as Hadoop, your analytics tools must cope with that reality.