One of my projects this year aims to clarify how companies can secure the Hadoop ecosystem. The Hadoop ecosystem is huge and has many moving parts that are internally managed by their own ZooKeeper :D.
The challenge is that firms that are going to use Hadoop have to secure an environment that is distributed by nature. While MapReduce and distributed computing have existed for decades, the idea of securing distributed computing and High Performance Computing (HPC) environments is rather recent. The job for the security folks is daunting: no single person can understand all the components and interactions well enough to ensure that the data is truly protected.
The Hadoop Security Research Note will answer questions such as:
- Are the levels of protection offered by Apache Sentry, Apache Accumulo or Project Rhino enough to protect your data?
- Is the available authentication and authorization (authN/Z) good enough to be compared with traditional data protection measures?
- Should you restrict the use of Hadoop components to only HBase or Hive?
- What are the best practices to protect data across the board: from ingestion all the way down to the “cell”?
- Does the result then still do justice to the promise of Big Data?
However, as the blind seer in the movie O Brother, Where Art Thou? tells us: “the treasure you seek shall not be the treasure you find.” During my journey I have gained some interesting insights, and I would like to engage with people who have an opinion about this.
Got a Big Data security story to share? WIN or FAIL? Hit the comments or email me privately (Gartner client NDA will cover it, if you are a client).