Over the last two years, several companies have rushed to get SQL-on-Hadoop products or projects to market. Having a familiar SQL interface makes the data stored in Hadoop more accessible, and therefore more useful to larger parts of the organization. Search, another capability broadly available from several Hadoop vendors, enables more use cases for a different set of audiences.
This rush for SQL-on-Hadoop has left the developer market effectively underserved. But here’s the reality: if you can’t accomplish your task with SQL or even Pig, it’s time to break out the editor or IDE and start writing code. That means writing MapReduce (or tomorrow, Spark?), which has its own challenges:
- Development tool support is fairly limited.
- Application deployment and management tooling is lacking.
- Testing and debugging are difficult, if not impossible (the same can be said for just about any distributed system).
- Integrating with non-HDFS data sources requires a lot of custom code.
None of these challenges is new or unknown, and developers have simply coped with them, with mixed success. But Hadoop is growing up. The workloads it handles are increasing in priority and complexity. Developers on Hadoop need the same empowerment as BI/analytics users.
This push for developer empowerment on the broader Hadoop stack went largely unnoticed at June’s Hadoop Summit, but a number of companies are filling the gap, including Concurrent, Continuuity, and BMC (with its Control-M product). And the ubiquitous Spring Framework has several stories to tell, with Spring-Hadoop and Spring-Batch.
What’s interesting, at least to me, is that the traditional Hadoop vendors, with the exception of Pivotal, are largely absent from the effort to empower developers. Has the developer base been abandoned in favor of the enterprise, or is this a natural evolution of a data management application?
Update: Apparently Cloudera is leading the development of the Kite SDK. Kite looks like a good start at addressing some of the pain points developers frequently encounter, such as building ETL pipelines and working with Maven.
Another Update: Milind Bhandarkar reminded me about Spring-XD.