
More on Security Data Lakes – And FAIL!

By Anton Chuvakin | August 29, 2018

security, analytics, Data and Analytics Strategies

Naturally, all of you have read my famous “Why Your Security Data Lake Project Will FAIL!” [note: Anton’s ego wrote this line :-)]

Today I read a great Gartner note on data lake failures in general (“How to Avoid Data Lake Failures” [Gartner access required]). Thus, I wanted to share a few bits that, in my experience, are VERY relevant to security data lake efforts I’ve seen in recent years. So:

  • “Proponents of data lakes often exaggerate their benefits by promoting them as enterprisewide solutions to all data and analytics problems.” – indeed, we’ve seen the exact same thing with security data lakes! Of course, then the reality hits: you build a huge pile of dirty data poo – and nothing else …
  • “Data lakes are rarely started with a definite goal in mind, but rather with nebulous aspirations […]” – the same is often seen with security data lakes.
  • “Avoid confusing a data lake implementation with a data and analytics strategy. A data lake is just infrastructure […]” – this is pretty much what I said in the post.
  • “The popular view is that a data lake will be the one destination for all the data in their enterprise and the optimal platform for all their analytics.” – the paper later explains that, generally speaking, this is very false, because it rests on three false assumptions. This is false even if scoped down to all security-relevant data.
  • The paper later describes several exciting FAIL scenarios, all of which I’ve seen with security data lakes. For example, “single version of the truth” as a failure scenario often means a single version of raw unusable data that nobody wants and nobody knows how to use.
  • Another “failway” is “Data Lake Is My Data and Analytics Strategy” with its juicy “ego-driven perspective on data lakes: they see them as means by which to be viewed as thought leaders […]” that results in an “all of the useless data, none of the insight” situation.
  • Yet another FAIL comes from “Infinite Data Lake” confusion. Imagine lots of useless data … now imagine lots of useless data a year later. Two years. Five years. What is worse than unusable data? OLD unusable data that has even less context. NOW: useless. TWO YEARS LATER: that much more useless, and at a huge hardware cost!
  • Finally, they close with: “The goal of gathering all data in one location was never truly achieved in the data warehousing world. It’s unlikely to be achieved in the data lake world, either […]”

Note that this post intentionally does not quote any of the recommendations from the paper. Sorry, but you have to read the paper for that (because policy).

