
Big Data Analytics Mindset – What Is It?

By Anton Chuvakin | November 18, 2013 | 2 Comments

security | philosophy | analytics | Data and Analytics Strategies

One common thread seen among those who actually do use big data tools and related analytic approaches for security is their analytic mindset. Not tools. Not algorithms. Not hordes of data scientists. Not methods, and not even specific approaches – but a mindset.

How do we define this mindset and turn it into something teachable to other organizations?

Let’s start here:

  • Are you into consuming security products or exploring data? Do you feel that you need a security appliance for everything?
  • Do you say “give me the data” or “give me out-of-the-box content, canned rules, signatures”?
  • Do you just want to be shown “what you need to know” or are you willing to figure out what you need to know from the data you have?
  • Would you rather learn “what your data is trying to tell you” or “what the latest stuff the vendors have on sale is”?

At this point, if you just want “a box”, the path of big data analytics is not for you. An analytic mindset seems to determine the success of a big data initiative for security more than anything else. The organizations that succeed with big data for security all subscribe to this view. They all state that for the foreseeable future, there will be no “boxed security big data analytics” products (except for some narrow and specific problems solved by specific tools).

Along the same lines, somebody recently asked me: “Do I need to toss out my SIEM and buy ‘a big data product’?” NO, SILLY!!! You need to try using your SIEM to actually analyze the data inside it. Only once you have analyzed the data inside your SIEM to its maximum potential might you need to look beyond it to other tools and approaches. But start from data exploration, not from tool replacement!

Therefore, the best analytics “starter pack” is the one you can build on the data and tools you already have. If you have an RDBMS full of logs, flows or context data – start there. Leverage the data you have collected to make better decisions; use traditional BI tools on that database to see what emerges (some of the current ‘big data for security’ champions started like that). In fact, if all you have is Excel and a bunch of exported reports – well, start exploring there!
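As a rough illustration of “start with the database you have,” here is a minimal sketch using Python's built-in sqlite3 module. The `logs` table, its columns, and the sample rows are all made up for the example; a real log store will have a different schema, and the question you ask of it should be your own.

```python
import sqlite3

# Hypothetical schema and sample rows: stand-ins for whatever log
# database you already have. Nothing here reflects a real product.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (ts TEXT, src_ip TEXT, dst_port INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("2013-11-18T10:00:00", "10.0.0.5", 443, "allow"),
        ("2013-11-18T10:00:05", "10.0.0.5", 443, "allow"),
        ("2013-11-18T10:01:00", "10.0.0.7", 22, "deny"),
        ("2013-11-18T10:02:00", "10.0.0.7", 22, "deny"),
        ("2013-11-18T10:03:00", "10.0.0.7", 22, "deny"),
        ("2013-11-18T10:04:00", "10.0.0.9", 8443, "allow"),
    ],
)


def top_denied_sources(connection, limit=5):
    """Return (src_ip, denied_count) rows, busiest sources first."""
    query = """
        SELECT src_ip, COUNT(*) AS denied
        FROM logs
        WHERE action = 'deny'
        GROUP BY src_ip
        ORDER BY denied DESC, src_ip
        LIMIT ?
    """
    return connection.execute(query, (limit,)).fetchall()


for src_ip, denied in top_denied_sources(conn):
    print(src_ip, denied)
```

The point is not this particular query but the habit: ask one simple question of data you already hold, look at the answer, and let it suggest the next question.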

The evolution then continues like this: ask questions of the data you have -> get a useful answer -> become more data-driven -> gather more data -> ask more useful questions.

Organizations then start to naturally “think data first”: new threat pops up? Let’s go into our data and see what is up, then create new analytic approaches to detect and investigate it – rather than start whining “what tool do I buy next?” No amount of Hadoop will give you big data analytics without a mindset. As I found out, this mindset and data curiosity matter most; by the way, the importance of mindset is also well established for indicator hunting and anomaly detection, such as when using network forensics and ETDR tools (also see Alert-driven vs Exploration-driven Security Analysis).

So, go and build your own data analytic discipline! Build an analytic-centric and data-centric mindset – rather than buy or download any particular big data technology. Start data-driven – not tool-driven (and, yes, Hadoop is a tool too – and one that is often hard to implement, operate and utilize, especially when you lack clarity about your purpose or goals). You cannot solve a mindset problem by buying technology; you need a mindset for leveraging data differently.

The only path is to shift the thinking, learn to be data-centric and data-driven and then solve problems that call for bigger data. Such a culture change has to happen for big data approaches to become pervasive across the industry. And yes, this includes a willingness to explore, follow leads, and occasionally arrive at dead ends and algorithms that don’t work.

In fact, when I asked those few (REALLY few!) organizations that do advanced analytics on large-scale security data about particular algorithms, no single list of “top useful algorithms” emerged. Machine learning (ML), Bayesian methods, clustering, and various data mining and text mining methods were mentioned, but none were highlighted as “must use.” What was a must? Again, it was a mindset and a willingness to dip into a toolbox of algorithms to throw at the data…

Finally, some quick tips:

  • Got a SIEM? Go beyond vendor reports: run queries directly against the backend, then extract and visualize the results.
  • Got a little other data relevant to security? Try open source mining tools, write scripts to analyze and profile data, and look at the data and see what it is trying to tell you …
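To make the “write scripts to analyze and profile data” tip concrete, here is a small sketch using only the Python standard library. The CSV content, the column names, and the helper functions `profile_column` and `rare_values` are invented for illustration; in practice you would point this at a report exported from your own tools.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for an exported report; "XX" is a
# placeholder for a country you would not expect to see.
SAMPLE_EXPORT = """user,dst_country,dst_port
alice,US,443
bob,US,443
alice,US,443
carol,DE,443
bob,US,8443
dave,XX,443
"""


def profile_column(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)


def rare_values(counts, max_count=1):
    """Return values seen at most max_count times, sorted for stable output."""
    return sorted(value for value, n in counts.items() if n <= max_count)


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
country_counts = profile_column(rows, "dst_country")

print(country_counts.most_common())  # the common case first
print(rare_values(country_counts))   # the oddities worth a second look
```

Frequency counts and a list of rare values are about as simple as profiling gets, but they are often enough to surface the first question worth chasing.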

To summarize, while conceptually, security is becoming a big data analytics problem, practically, it won’t become that for you if you keep investing in prevention and buying boxes.

There you have it! Now, GO EXPLORE YOUR DATA!!!


The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.



  • Matthew Gardiner says:

    Totally agree. In short, the way we put it is that the security analyst (or at least some of the analysts on the team) needs to evolve into a hunter of security-related anomalies. These anomalies then prompt an investigation to see if they lead anywhere interesting. The tools, big data or otherwise, are used to aid the hunt and the investigation. The vast majority of hunters don’t need fancy algorithms yet (ML, Bayesian, etc.); what they do need is a relatively large scale of data against which they can apply often fairly simple rules (let me know if any of my users “move” too fast with their consecutive VPN logins, let me know if any session is using SSL over an unusual port, or let me know if we get any emails that have suspicious links in them, etc.). These are all simple algorithms that any IT person could understand how to calculate manually, but the trick is executing them in near real-time across tens of thousands of users and hundreds of thousands of sessions. And also the trick is to know what is unusual for your organization. “We don’t do business in Country X, so why do we have so much traffic to Country X?”
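One of the simple rules mentioned in the comment above, users who “move” too fast between consecutive VPN logins, might be sketched roughly as follows. This is an illustration only, not the commenter's implementation: the `Login` record, the `too_fast` function, and the 900 km/h threshold are assumptions, and the code assumes each login has already been geolocated to a latitude and longitude.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Login:
    user: str
    when: datetime
    lat: float  # assumed to come from an IP geolocation step not shown here
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def too_fast(logins, max_kmh=900.0):
    """Yield consecutive login pairs per user whose implied speed exceeds max_kmh."""
    last_seen = {}
    for login in sorted(logins, key=lambda entry: entry.when):
        prev = last_seen.get(login.user)
        if prev is not None:
            hours = (login.when - prev.when).total_seconds() / 3600
            km = haversine_km(prev.lat, prev.lon, login.lat, login.lon)
            # Comparing km against max_kmh * hours avoids dividing by zero
            # when two logins share the same timestamp.
            if km > max_kmh * hours:
                yield prev, login
        last_seen[login.user] = login


logins = [
    Login("alice", datetime(2013, 11, 18, 9, 0), 40.71, -74.01),   # New York
    Login("alice", datetime(2013, 11, 18, 10, 0), 51.51, -0.13),   # London
    Login("bob", datetime(2013, 11, 18, 9, 0), 40.71, -74.01),
    Login("bob", datetime(2013, 11, 18, 17, 0), 40.71, -74.01),
]

for first, second in too_fast(logins):
    print(first.user, first.when, "->", second.when)
```

As the comment notes, the rule itself is simple enough to work out by hand; the hard parts are running it at scale in near real-time and choosing a threshold that reflects what is actually unusual for your organization.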

  • Matt, thanks for your insightful comment. Indeed, the hunting mindset trumps the specific algorithms. Sadly, it is also harder to teach and to impart to people. “Use that box” is so much easier, yet not effective against these threats on today’s messy networks….