“Not looking” at security and privacy seems to be the posture of people implementing Hadoop, based on recent data Gartner has collected. This is troubling, and paradoxical. In an era when the privacy of data, from government surveillance to medical record-keeping to “creepy” marketing initiatives and password breaches, has been in the news regularly, it is hard to understand why professionals implementing Hadoop are not paying attention.
The data here comes from a recent webinar I conducted with my colleague Nick Heudecker. We had over 600 attendees, and during the discussion we offered several polling questions. One had to do with barriers to Hadoop adoption. We had 213 responses to that question.
You can see the results below and two things leap out: only 2% of the respondents see lack of robust security as a barrier, and half of the respondents feel that they do not have a sufficiently defined value proposition. More on the latter in another post.
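To put those percentages in perspective, here is a rough back-of-the-envelope sketch that converts the reported shares into approximate head counts. It uses only the figures quoted above (213 responses, over 600 attendees, 2%, and "half"). The function and variable names are made up for illustration, and because the shares are rounded as reported, the counts are approximate.

```python
# Figures quoted in the post. The 2% and "half" shares are as reported,
# so the head counts derived from them are approximate.
RESPONSES = 213
ATTENDEES_AT_LEAST = 600  # the post says "over 600", so this is a lower bound


def approx_count(share: float, total: int = RESPONSES) -> float:
    """Convert a reported share of respondents into an approximate head count."""
    return share * total


security = approx_count(0.02)    # share citing lack of robust security
value_prop = approx_count(0.50)  # share citing an undefined value proposition

# Attendance was above 600, so the true response rate is below this bound.
response_rate_upper_bound = RESPONSES / ATTENDEES_AT_LEAST

print(f"security as a barrier: about {security:.1f} respondents")
print(f"value proposition: about {value_prop:.1f} respondents")
print(f"response rate: at most {response_rate_upper_bound:.1%} of attendees")
```

In other words, only a handful of people in the room picked security, against more than a hundred who picked the value proposition.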
For me, the nearly non-existent response to the security issue is shocking. Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there are numerous concerns. These include the use of external unvetted data and of data in file systems that lack any protection, and the separation of Hadoop initiatives in most organizations from IT governance. Add to that the kinds of use cases Hadoop is being pointed at: sensitive health care information; personal data in retail systems; telephone usage; social media connection and sentiment analytics – all of them give us pause.
I’ve pointed to security as a key issue facing the Hadoop community in 2014 for some time now. The fact that the problem is not getting attention only reinforces my belief that we will see major problems as Hadoop goes mainstream.
The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.
Many of our customers, especially those from large companies, are extremely concerned with security and are pushing us to raise the capabilities of Hadoop. I suspect that your audience was strongly biased toward the group that is trying to use Hadoop rather than the one trying to ensure the safety of the corporation’s data. One of the areas being worked on is adding encryption for data in HDFS, so that data at rest can be encrypted.
As the architect of the effort to add strong authentication to Hadoop, I knew the effort succeeded when I was blocked from poking around a test cluster where I hadn’t been configured as an admin. If you know of security holes in Hadoop’s strong authentication, authorization, or auditing, please send a report to the project’s security list: email@example.com. We take security vulnerabilities quite seriously.
I would agree with Owen that many customers have started looking seriously into security when considering Hadoop, a big change from a year or a couple of years back, when getting Hadoop deployed and managing it was the biggest priority. Data, for most companies, is their competitive differentiation factor, and it is a no-brainer that it needs to be well protected.
That said, there is a lot more work left for the community in building common frameworks for authentication, authorization and encryption. Specialist security companies can incorporate these frameworks in building advanced security features, or what I call “end of the mile” work.
Thanks to you both for your comments. There is a big stack to consider when thinking about security for Hadoop. I do not by any means believe what’s there is not functional – and Owen, I know you’ve done great work on it. But there are plenty of other dimensions to be thought of. And it’s a great concern that so many customers do not yet seem to be taking it seriously. Hopefully the ongoing conversation will help!
Security is a major concern for any substantial data-oriented project – big data projects only amplify the stakes. I find it less likely that security “isn’t a concern” for the poll respondents than that it’s simply not their *main* concern. To say that “2% see security as a barrier” is misleading. Instead, the results of this poll only reiterate the nascent state of Hadoop and related projects on the enterprise adoption curve. Once the majority can define their objectives and strategy, acquire the right resources, and figure out how to roll out Hadoop in a meaningful way, I would expect security to bubble up the list.
Merv, this is a great blog to highlight a real issue. We see this in our client base too. Many clients start off “innocently” piloting Hadoop as a research project. For some reason, a lot of companies think that pilots and research projects are more secure than production usage! It’s often only once they creep towards implementation that security is raised as an issue, often by the governance or security teams, and then it needs to be addressed. This can be a real speed bump in a Hadoop rollout. I’d argue that, aside from complying with regulations and avoiding data breaches, addressing security up front helps Hadoop projects get into production faster, by avoiding delays and rework when, inevitably, security is raised as an issue.
Great blog on an important topic, Merv!
Dave, thanks for the kind words. Your observation reflects what we’ve seen for years in test data – it’s often copies of production data without the protection the production system has. Add to that the addition of outside data we know nothing about and you can see why it’s a recipe for problems.
Echoing earlier comments, I congratulate you on highlighting an important issue. Security has been an afterthought for far too long. Often security isn’t thought of as a feature.
Most projects apply security at the end.
To stop this practice we need to actively engage the audience and figure out how we can apply security without sacrificing usability and performance too much.
I’ve been reading up on your great posts about Hadoop and wanted to ask if there were any materials (books, whitepapers, etc.) you could recommend on Hadoop a) in general and b) related to security, as my company is seriously considering building a Hadoop ecosystem for data analytics. I have been asked to start performing a review of Hadoop, specifically its security.
Here is a paper that we wrote about adding security to Hadoop:
You can also look at the original design:
Finally, here is a presentation that I did at Yahoo as we were rolling out the first secure Hadoop clusters.
Owen, thanks for providing the deep technical look – anyone interested should dig in. It’s good to expose the efforts underway, and you and your colleagues are going to be busy for some time. Hopefully, over time, these capabilities will become the norm, and become features that are easily accessible, installed and managed. Today, one needs to go deep into this level of detail to ensure the protection enterprise customers will want. One hopes that they will become more cognizant of the fact that they should want it.
Thank you for the information you’ve provided. I will dig into them asap. Greatly appreciated.
In an earlier post, you mentioned that adding encryption for data in HDFS is being worked on. Is there an ETA for that?
I expect the crypto file system to land in trunk in March or so. It is an encrypting file system that layers over HDFS so that users can choose to encrypt some directories and not others. You’ll reference it like:
which will store encrypted data in:
— Owen (firstname.lastname@example.org)
Thanks, Owen, for exposing this here so developers and architects can see it coming. Many of my clients tell me they wish they had more time to track stuff through the process, and the more channels the better!