Gartner Blog Network


Spark Restarts the Data Processing Race

by Nick Heudecker  |  July 6, 2014  |  4 Comments

It’s still early days for Apache Spark, but you’d be forgiven for thinking otherwise based on the corporate sponsorship at Spark Summit. For only the second conference on a very early technology, the list of notable sponsors is impressive: IBM, SAP, Amazon Web Services, SanDisk and Red Hat. SAP also announced Spark integration with HANA, its flagship DBMS appliance. Other companies, like MapR and DataStax, also announced (or reinforced) partnerships with Databricks, the Spark commercializer.

Given the relative immaturity of this open source project, why are these companies – particularly the large vendors – rushing to support Spark? I think there are a few things happening here.

First, after building out integration with MapReduce, integrating with Spark was easy. SAP’s integration with Spark uses Smart Data Access, the same method used for MapReduce integration. I imagine it’s only a matter of time before similar integration occurs with Teradata’s QueryGrid or IBM’s BigSQL, among others. After all, this looks a lot like external tables, something the DBMS vendors have been doing for at least a decade.

The ease of integration explains only part of the sudden interest in Spark. More important is the need not to be left out of the next iteration in data processing. While Hadoop is an important component of any data management discussion today, it had a long road to credibility. Many vendors simply took a “wait and see” approach to Hadoop, and they waited too long. I don’t think the same mistake will happen with Spark. Customers are less resistant to open source options, and large vendors need to get behind every project with momentum to compete with startups.

It’s too early to pick winners and losers. The incumbent vendors are upping their game, while much of the messaging coming from the Hadoop distribution vendors is confusing. However this shakes out, it should make a great show for the rest of 2014.

Category: big-data  

Tags: hadoop  spark  

Nick Heudecker
Research Director
4 years at Gartner
18 years IT Industry

Nick Heudecker is an Analyst in Gartner's Research and Advisory Data Management group.


Thoughts on Spark Restarts the Data Processing Race


  1. […] Source: Spark Restarts the Data Processing Race […]

  2. […] leveraging YARN to enable a new, much broader set of use cases. (See Nick Heudecker’s blog for a recent assessment.) It has a commercializer in Databricks, which has shown great skill in […]





Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.