Merv Adrian

A member of the Gartner Blog Network






Security for Hadoop? Don’t Look Now…

by Merv Adrian  |  January 21, 2014

“Not looking” at security and privacy seems to be the posture of people implementing Hadoop, based on recent data Gartner has collected. This is troubling, and paradoxical. In an era when the privacy of data, from government surveillance to medical record-keeping to “creepy” marketing initiatives and password breaches, has been in the news regularly, it is hard to understand why professionals implementing Hadoop are not paying attention.

The data here comes from a recent webinar I conducted with my colleague Nick Heudecker. We had over 600 attendees, and during the discussion we offered several polling questions. One had to do with barriers to Hadoop adoption. We had 213 responses to that question.

You can see the results below and two things leap out: only 2% of the respondents see lack of robust security as a barrier, and half of the respondents feel that they do not have a sufficiently defined value proposition. More on the latter in another post.

[Chart: poll results on barriers to Hadoop adoption]

For me, the nearly non-existent response to the security issue is shocking. Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there are numerous concerns. These include the use of external, unvetted data and of data in file systems that lack any protection, and the separation of Hadoop initiatives in most organizations from IT governance. Add to that the kinds of use cases Hadoop is being pointed at: sensitive health care information; personal data in retail systems; telephone usage; social media connection and sentiment analytics. All of them give us pause.

I’ve pointed to security as a key issue facing the Hadoop community in 2014 for some time now. The fact that the problem is not getting attention only reinforces my belief that we will see major problems as Hadoop goes mainstream.


Category: Big Data, Gartner, Hadoop, Security

19 responses so far ↓

  • 3 Owen O'Malley   January 21, 2014 at 5:26 pm

    Merv,

    Many of our customers, especially those from large companies, are extremely concerned with security and are pushing us to raise the capabilities of Hadoop. I suspect that your audience was strongly biased toward the group that is trying to use Hadoop and not the ones trying to ensure the safety of the corporation’s data. One of the areas being worked on is encryption for data in HDFS, so that data at rest can be encrypted.

    As the architect of the effort to add strong authentication to Hadoop, I knew the effort succeeded when I was blocked from poking around a test cluster where I hadn’t been configured as an admin. If you know of security holes in Hadoop’s strong authentication, authorization, or auditing, please send a report to the project’s security list: security@hadoop.apache.org. We take security vulnerabilities quite seriously.

  • 5 Balaji Ganesan   January 22, 2014 at 6:47 am

    I would agree with Owen in that many customers have started looking seriously into security when considering Hadoop, a big change from a year or two back, when getting Hadoop deployed and managing it was the biggest priority. Data, for most companies, is their competitive differentiation factor, and it is a no-brainer that it needs to be well protected.

    That said, there is a lot more work left for the community in building common frameworks for authentication, authorization and encryption. Specialist security companies can incorporate these frameworks in building advanced security features, or what I call “end of the mile” work.

  • 6 Merv Adrian   January 22, 2014 at 10:49 pm

    Thanks to you both for your comments. There is a big stack to consider when thinking about security for Hadoop. I do not by any means believe what’s there is not functional, and Owen, I know you’ve done great work on it. But there are plenty of other dimensions to be thought of. And it’s a great concern that so many customers do not yet seem to be taking it seriously. Hopefully the ongoing conversation will help!

  • 7 Joe Travaglini   January 23, 2014 at 12:40 pm

    Security is a major concern for any substantial data-oriented project, and big data projects only amplify the stakes. I find it less likely that security “isn’t a concern” for the poll respondents than that it’s not their *main* concern. To say that “2% see security as a barrier” is misleading. Instead, the results of this poll only reiterate the nascent state of Hadoop and related projects on the Enterprise adoption curve. Once the majority can define their objectives and strategy, acquire the right resources, and figure out how to roll out Hadoop in a meaningful way, I would imagine security will bubble up the list.

  • 8 David Corrigan   January 23, 2014 at 2:44 pm

    Merv, this is a great blog to highlight a real issue. We see this in our client base too. Many clients start off “innocently” piloting Hadoop as a research project. For some reason, a lot of companies think that pilots and research projects are more secure than production usage! It’s often only once they creep towards implementation that security is raised as an issue, often by the governance or security teams, and then it needs to be addressed. This can be a real speed bump in a Hadoop rollout. I’d argue that aside from complying with regulations and avoiding data breaches, addressing security up front helps Hadoop projects get into production faster, by avoiding delays and rework when, inevitably, security is raised as an issue.

    Great blog on an important topic, Merv!

    Dave

  • 9 Merv Adrian   January 23, 2014 at 8:09 pm

    Dave, thanks for the kind words. Your observation reflects what we’ve seen for years in test data: it’s often copies of production data without the protection the production system has. Add to that outside data we know nothing about and you can see why it’s a recipe for problems.

  • 10 Vinay Shukla   January 23, 2014 at 8:32 pm

    Merv,

    Echoing earlier comments, I congratulate you on highlighting an important issue. Security has been an afterthought for far too long. Often security isn’t thought of as a feature.

    Most projects apply security at the end.

    To stop this practice we need to actively engage the audience and figure out how we can apply security without sacrificing usability and performance too much.

  • 11 Jeff W   January 27, 2014 at 3:09 pm

    Mr. Adrian,

    I’ve been reading up on your great posts about Hadoop and wanted to ask if there were any materials (books, whitepapers, etc.) you could recommend on Hadoop a) in general and b) related to security, as my company is seriously considering building a Hadoop ecosystem for data analytics. I have been asked to start performing a review of Hadoop, specifically its security.

    Thank You,
    Jeff

  • 12 Owen O'Malley   January 27, 2014 at 4:28 pm

    Jeff,
    Here is a paper that we wrote about adding security to Hadoop:

    http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf

    You can also look at the original design:

    https://issues.apache.org/jira/secure/attachment/12428537/security-design.pdf

    Finally, here is a presentation that I did at Yahoo as we were rolling out the first secure Hadoop clusters.

    http://www.slideshare.net/oom65/hadoop-security-architecture

  • 13 Zettaset and MicroStrategy Deliver Secure, Big Data Insight #MSTRworld - WebPronto   January 28, 2014 at 9:03 pm

    [...] the security vulnerabilities that exist “at every layer of the stack,” according to a blog post penned by Gartner analyst Merv [...]

  • 15 Merv Adrian   January 29, 2014 at 3:56 am

    Owen, thanks for providing the deep technical look; anyone interested should dig in. It’s good to expose the efforts underway, and you and your colleagues are going to be busy for some time. Hopefully, over time, these capabilities will become the norm, and become features that are easily accessible, installed and managed. Today, one needs to go deep into this level of detail to ensure the protection enterprise customers will want. One hopes that they will become more cognizant of the fact that they should want it.

  • 16 Jeff W   January 29, 2014 at 6:23 pm

    Owen,

    Thank you for the information you’ve provided. I will dig into it ASAP. Greatly appreciated.

    Jeff

  • 17 Jeff W   January 29, 2014 at 7:04 pm

    Owen,

    In an earlier post, you mentioned that encryption for data in HDFS is being worked on. Is there an ETA for that?

    Thx again,
    Jeff

  • 18 Owen O'Malley   January 31, 2014 at 4:29 pm

    Jeff,
    I expect the crypto file system to land in trunk in March or so. (It is an encrypting file system that layers over HDFS so that users can choose to encrypt some directories and not others.) You’ll reference it like:

    cfs://hdfs@mynn.example.com/my/path

    which will store encrypted data in:

    hdfs://mynn.example.com/my/path

    — Owen (owen@hortonworks.com)
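
The path mapping described in the comment above can be illustrated with a short sketch. It assumes only what the two example URIs show: a `cfs` scheme whose authority has the form `<underlying scheme>@<namenode host>`, and a path that carries over unchanged. The function name `cfs_to_hdfs` is hypothetical and is not part of any Hadoop API; the actual crypto file system may resolve paths differently.

```python
from urllib.parse import urlsplit


def cfs_to_hdfs(cfs_uri: str) -> str:
    """Hypothetical helper: return the underlying location for a cfs:// URI.

    Assumes the authority is "<underlying scheme>@<host>", as in the
    example cfs://hdfs@mynn.example.com/my/path from the comment above.
    """
    parts = urlsplit(cfs_uri)
    if parts.scheme != "cfs":
        raise ValueError(f"expected a cfs:// URI, got {cfs_uri!r}")

    # Split "hdfs@mynn.example.com" into the wrapped scheme and the host.
    inner_scheme, sep, host = parts.netloc.partition("@")
    if not sep or not inner_scheme or not host:
        raise ValueError(f"expected '<scheme>@<host>' authority in {cfs_uri!r}")

    # The path is assumed to carry over unchanged to the underlying store.
    return f"{inner_scheme}://{host}{parts.path}"


if __name__ == "__main__":
    print(cfs_to_hdfs("cfs://hdfs@mynn.example.com/my/path"))
```

The sketch only models naming. It says nothing about how keys are managed or how the data is encrypted, which the comment does not describe.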

  • 19 Merv Adrian   January 31, 2014 at 6:39 pm

    Thanks, Owen, for exposing this here so developers and architects can see it coming. Many of my clients tell me they wish they had more time to track stuff through the process, and the more channels the better!