Gartner Blog Network


More on Security Data Lakes – And FAIL!

by Anton Chuvakin  |  August 29, 2018

Naturally, all of you have read my famous “Why Your Security Data Lake Project Will FAIL!” [note: Anton’s ego wrote this line :-)]

Today I read a great Gartner note on data lake failures in general (“How to Avoid Data Lake Failures” [Gartner access required]). Thus, I wanted to share a few bits that, in my experience, are VERY relevant to security data lake efforts I’ve seen in recent years. So:

  • “Proponents of data lakes often exaggerate their benefits by promoting them as enterprisewide solutions to all data and analytics problems.” – indeed, we’ve seen the exact same thing with security data lakes! Of course, then the reality hits: you build a huge pile of dirty data poo – and nothing else …
  • “Data lakes are rarely started with a definite goal in mind, but rather with nebulous aspirations […]” – same is often seen with security data lakes.
  • “Avoid confusing a data lake implementation with a data and analytics strategy. A data lake is just infrastructure […]” – this is pretty much what I said in the post.
  • “The popular view is that a data lake will be the one destination for all the data in their enterprise and the optimal platform for all their analytics.” – the paper later explains that, generally speaking, this is very false, because it rests on 3 false assumptions. This is false even if scoped down to all security-relevant data.
  • The paper later describes several exciting FAIL scenarios, all of which I’ve seen with security data lakes. For example, “single version of the truth” as a failure scenario often means a single version of raw unusable data that nobody wants and nobody knows how to use.
  • Another “failway” is “Data Lake Is My Data and Analytics Strategy” with its juicy “ego-driven perspective on data lakes: they see them as means by which to be viewed as thought leaders […]” that results in an “all the useless data, none of the insight” situation.
  • Yet another FAIL comes from “Infinite Data Lake” confusion. Imagine lots of useless data … now imagine a lot of useless data a year later. Two years. Five years. What is worse than unusable data? OLD unusable data that has even less context. NOW: useless. TWO YEARS LATER: that much more useless at huge hardware cost!
  • Finally, they close with: “The goal of gathering all data in one location was never truly achieved in the data warehousing world. It’s unlikely to be achieved in the data lake world, either […]”

Note that this post intentionally does not quote any of the recommendations from the paper. Sorry, but you have to read the paper for that (because policy).

Enjoy!

Category: analytics  big-data  security  





