Gartner Blog Network


Uncle Sam Wants You (Data Scientists)!

by Nick Heudecker  |  February 27, 2015

This is a guest post from Gartner analyst Jim Hare.

Each Strata + Hadoop World conference has its own personality and this one was no different. The buzz at the San Jose venue was the fact that President Barack Obama had attended the event. Well, sort of: he delivered the opening keynote presentation via taped video. With fanfare, he told the attendees that:

a) The U.S. Federal Government had hired Dr. DJ Patil* to be its first Chief Data Scientist
b) Over 135,000 data sets have been made available by the government on data.gov**
c) The Chief Data Scientist Office (CDSO) will focus initially on healthcare and climate change.

The question I’m asking myself is, “Does the U.S. Government really need a Chief Data Scientist?” Having a Federal Chief Data Scientist sounds trendy, but does the White House really need its own? Don’t the other government agencies have one already? Frankly, I’m more impressed with the government making more and more data sources available, and making the data open, accessible, and machine-readable. Of course, who knows about the quality of the data? But hey, at least it’s free.

What we really need is leadership to establish policies and ensure there continues to be open access to more public data sources. My worry is that this data.gov thing is just the current fad and it won’t receive the long-term focus and investments to make it useful. Time will tell.

And, speaking of Data Scientists, isn’t there a shortage of quants in the industry? And yet, the Feds are actively hiring data scientists to join the White House U.S. Digital Service: http://www.whitehouse.gov/digital/united-states-digital-service

Shouldn’t the government be helping academic institutions graduate more data scientists instead of pilfering the ones we already need?

Other Observations from Strata + Hadoop World

1. Who’s the audience for Strata these days?

I’m trying to figure out who is the target audience for Strata. It seems the conference chairs are trying to cast a wide net and attract a wide audience of both big data practitioners and business types. If that’s their objective, it is time to run Strata as two separate programs – one for business, one for techies.

While there were several tracks focused on Hadoop (e.g. Hadoop Platform, Hadoop in Action, Hadoop and Beyond), the real interest was around Spark and Machine Learning to analyze the data stored in Hadoop (HDFS). The Hadoop conversation itself is now more about how to make it operational and manageable.

2. Open Data Platform (ODP) Initiative – what’s that?

With much controversy (and skepticism), a group of vendors announced that they had joined forces to do something about the “open but fragmented” nature of Hadoop. The goal of the initiative is to optimize testing between the ecosystem’s vendors to help enterprises implement big data apps faster. I’m all for better testing and allowing organizations to mix and match technologies and have confidence that the vendors have done interoperability testing. But do we really need another “initiative”? Couldn’t these vendors simply have said to each other, “Hey, here’s a copy of my software. Give me a copy of yours and let’s both test to make sure they work together with Hadoop?”

If you want to get more thoughts about ODP, my colleagues Nick and Merv wrote an excellent and amusing blog post titled “Who asked for ODP?” Check it out: https://blogs.gartner.com/nick-heudecker/who-asked-for-odp/

3. Where were the attendees?

With the exception of the Opening Reception and Booth Crawl, the Expo floor seemed to have less traffic than last fall’s Strata event in New York. The booths that seemed to have more traffic included vendors that offered data preparation, data visualization, and real-time analytics. Spot checks with several people suggested that attendees were maximizing their conference time at the breakout and training presentations.

4. Where is the ROI in big data?

While it was great to see a whole track dedicated to the business side of big data, I was disappointed in the lack of specific ROI data in the breakout presentations. For big data to really gain traction, businesses need to know more than what is possible with big data; they need real-world ROI data to help them build their business cases. Hey, how about a couple of new business tracks at the fall Strata Conference: one called the “Business ROI of Big Data,” presented by actual customers using big data in production, and another focused on “Big Data Failures and Lessons Learned”? These types of tracks would not only increase the number of business attendees (project sponsors) but also get more big data projects approved.

* You can read more about the mission of the Chief Data Scientist: http://www.whitehouse.gov/blog/2015/02/19/memo-american-people-us-chief-data-scientist-dr-dj-patil

** Fact check: the data.gov website states that 117,212 data sets are available



