Back-to-School Special: Big Data Analytics

By Martin Kihn | August 26, 2015 | 2 Comments

More tragic than the dying bleats of summer is this sad story: Big Data is not intrinsically suited to analytics.

I’m sorry, but it’s true. Distributed storage and massive scale can make data hard to find. Lack of normalization and structure means an analyst often has a lot of hamster-wheeling to do just to know what the data is — much less combine it with other sources. Add to this the challenge of simultaneous batch and streaming processing and a legacy of marketing analysts who are whizzes with Adobe and Google Analytics but not SQL, much less statistical programming languages such as SAS and R.

Advanced analytics for marketing is not synonymous with Big Data, and vice versa. Adoption of pure Big Data analytics, such as MapReduce/NoSQL engines, remains below 5% across most industries and company sizes, according to Gartner’s 2014 survey of analytics spending intentions. A recent Gartner survey found that 80% of analytics use cases still require a traditional data warehouse. Early exceptions are seen in media, services and communications industries, which have been aggressive in building out marketing analytics teams and staffing up centers of excellence, according to Gartner’s Survey of Data-Driven Marketing, 2015. (Clients can enjoy it here.)

Many analytics techniques can and do make use of Big Data stores, generally by transforming the data into structured or semistructured formats first (a minimal sketch of that step follows the list below). These include:

  • Data mining and predictive analytics
  • Text and speech analytics
  • Video analytics
  • Social media and sentiment analysis
  • Location and sensor analytics
  • Machine learning

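To make that transformation step concrete, here is a minimal sketch in Python that flattens semi-structured JSON clickstream records into a flat, columnar file an analyst could query or load into a warehouse. The field names and records are hypothetical.

    import csv
    import json

    # Hypothetical semi-structured clickstream records, one JSON object per line,
    # as they might land in a Hadoop cluster or other "data lake."
    raw_lines = [
        '{"user_id": "u123", "event": "page_view", "page": "/configurator", "ts": "2015-08-01T10:02:11Z"}',
        '{"user_id": "u123", "event": "click", "element": "add_option", "ts": "2015-08-01T10:03:40Z"}',
    ]

    # Flatten each record into a fixed set of columns; missing fields become empty strings.
    columns = ["user_id", "event", "page", "element", "ts"]
    with open("clickstream_structured.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for line in raw_lines:
            record = json.loads(line)
            writer.writerow({col: record.get(col, "") for col in columns})
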
Because its data is often stored as unstructured files rather than structured tables, Hadoop originally did not work with SQL. And NoSQL databases offered little support for ad hoc queries from analysts. Even basic questions could (and can) require programming skills. Open source and commercial markets have been working feverishly to fill the gap. Apache Hadoop provides SQL capabilities via Apache Hive, which is based on MapReduce. Commercial technologies from Cloudera (Impala) and Hortonworks, among others, provide ways to access Big Data stores for analysis. Pivotal offers HAWQ, a SQL-on-Hadoop query engine, to analyze data lakes. Other BI vendors have developed platforms with Big Data support, including Alteryx, Qlik and Splunk (Hunk).
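
To show what SQL access to a Hadoop store can look like from an analyst’s desk, the sketch below uses the open source PyHive client to run an ordinary HiveQL query from Python. The host, table and column names are hypothetical placeholders, not a reference to any particular vendor’s stack.

    # Requires the open source PyHive package (pip install pyhive); the host,
    # port, table and columns below are hypothetical placeholders.
    from pyhive import hive

    conn = hive.connect(host="hadoop-gateway.example.com", port=10000)
    cursor = conn.cursor()

    # An ordinary ad hoc query; Hive compiles it into jobs that run against
    # files stored across the Hadoop cluster.
    cursor.execute(
        "SELECT campaign_id, COUNT(*) AS visits "
        "FROM web_events "
        "WHERE event_date >= '2015-08-01' "
        "GROUP BY campaign_id"
    )
    for campaign_id, visits in cursor.fetchall():
        print(campaign_id, visits)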

Machine learning. An area of growing interest and relevance to marketers is machine learning. This is defined as the use of software to find high-order interactions and patterns within large amounts of data in ways that surpass human capabilities. Big Data stores can provide a wealth of historical data that has simply been too voluminous and unstructured for human analysts to interrogate. And the volume and complexity of data and interactions on social networks, across marketing and advertising channels, from mobile apps and sensors, such as in-store beacons – all are ripe for scrutiny by smart machines. Frameworks for machine learning, including MLlib, the widely used open source library that ships with Spark, take advantage of in-memory processing. Companies such as FICO and Microsoft offer machine learning through SaaS.
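
As a rough illustration of the workflow (not any vendor’s specific offering), the sketch below trains a simple logistic regression with Spark’s MLlib from Python; the toy “visits and pages per visit predict conversion” data is invented.

    # A minimal MLlib sketch using PySpark; labels, features and values are invented.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    # Toy training data: (label = converted or not, features = [visits, pages_per_visit]).
    train = spark.createDataFrame(
        [
            (1.0, Vectors.dense([8.0, 5.2])),
            (0.0, Vectors.dense([1.0, 1.1])),
            (1.0, Vectors.dense([5.0, 3.7])),
            (0.0, Vectors.dense([2.0, 1.4])),
        ],
        ["label", "features"],
    )

    # Fit the model; Spark distributes the work and keeps data in memory where it can.
    model = LogisticRegression(maxIter=10).fit(train)
    print(model.coefficients, model.intercept)

    spark.stop()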

Open source tools. Clearly, open source permeates Big Data storage and processing. The same is true of analytics. Commercial statistical packages such as SAS and SPSS have long been used by advanced marketing analysts, and they are of course still relevant. Most analysts are also aware of open source tools widely used in the context of Big Data. These include R, Python and Weka (now Pentaho Data Mining). They provide an ever-evolving and powerful set of methods and an active global community of users who are continually adding features to support marketers’ needs.
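
To give a flavor of how little ceremony these tools require, here is a hedged sketch of a toy predictive model in Python using the open source scikit-learn library; the campaign-response data and feature meanings are invented for illustration.

    # Toy predictive model with scikit-learn; features are [emails_opened, is_returning_visitor].
    from sklearn.tree import DecisionTreeClassifier

    X = [[3, 1], [10, 0], [2, 1], [8, 1]]  # invented historical behavior
    y = [0, 1, 0, 1]                       # 1 = responded to the campaign

    clf = DecisionTreeClassifier(max_depth=2).fit(X, y)

    # Score a new contact who opened 7 emails and has visited before.
    print(clf.predict([[7, 1]]))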

Lower cost, certain features and continual updates make open source analytics tools an attractive complement to commercial software in some situations. (Common data exploration tools such as Tableau support R.) For example, Audi USA’s digital marketing agency AKQA developed a number of R models to deliver more personalized images and content on the website for returning visitors, as well as to suggest options for the car configurator based on the user’s previous behavior.
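
AKQA’s actual models are not public; purely to illustrate the idea, here is a toy sketch in Python (rather than R) that suggests configurator options from a visitor’s previous choices using simple co-occurrence counts. The option names and sessions are hypothetical.

    # Suggest options that were most often chosen alongside what the visitor already picked.
    from collections import Counter
    from itertools import combinations

    # Hypothetical past configuration sessions, each a set of chosen options.
    sessions = [
        {"sport_suspension", "19in_wheels", "red_paint"},
        {"sport_suspension", "19in_wheels", "carbon_trim"},
        {"comfort_seats", "panoramic_roof"},
        {"sport_suspension", "carbon_trim"},
    ]

    # Count how often each pair of options appears together.
    pair_counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(session), 2):
            pair_counts[(a, b)] += 1

    def suggest(chosen, top_n=3):
        """Rank options that co-occur most often with what the visitor already chose."""
        scores = Counter()
        for (a, b), count in pair_counts.items():
            if a in chosen and b not in chosen:
                scores[b] += count
            elif b in chosen and a not in chosen:
                scores[a] += count
        return [option for option, _ in scores.most_common(top_n)]

    print(suggest({"sport_suspension"}))  # options most often chosen with sport_suspension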

Talent gaps. Advanced analytics talent is in short supply everywhere, and the “data science gap” is particularly acute in marketing. Gartner’s March 2014 survey of data-driven marketers found that 54% of organizations developed big data analysts internally, 32% relied on consultants and only 13% were able to hire outside talent. Gartner clients report real problems with both training and retention.

The good news is that skills are becoming more common as analysts enroll in coursework or teach themselves how to fish. Big Data infrastructure and statistical languages are not easy to master, but they are relatively easy for an experienced analyst to start using. Online learning modules and communities such as Stack Overflow abound. As skills improve at the same time that traditional marketing analytics tools add more capabilities in familiar interfaces, we expect the pain to subside.

2 Comments

  • Will Thiel says:

    “Big Data Analytics” always makes me pause.

    A few months ago I posed this question to our analytics team: theoretically, what volume of information do we need in order to run any reasonable statistical or machine learning analysis with a robust confidence level? I specified a hypothetical business with a hypothetical big data store. I tried to engineer a complex, worst case scenario.

    The answer: 15GB. The hard disk capacity of a typical PC circa 2000. It seemed so absurd that I padded it with a factor of safety when I told other colleagues.

    It turns out that 15GB is actually a pretty massive amount of information. Just because data stores have gotten many orders of magnitude larger doesn’t mean the useful information contained within them has expanded all that much. All but a tiny fraction of that data is just repeating the same statistical information over and over.

    This makes me wonder if “Big Data Analytics” is even a rational undertaking – or should we focus our efforts on finding ways to first compress the glut of data into its critical 15GB? After all, this is the approach that evolution applied when faced with a similar challenge: analyzing streaming, multi-channel datasets that were bigger than “Big,” to make high-impact decisions, in real time, with slow, non-scalable, fault-prone, flesh-based computers.

    • Martin Kihn says:

      Great point, Will – in addition to confusing “Big Data” and “advanced analytics” (they are not the same thing), I think a lot of marketers in particular are trying to get “big” for the sake of being big – Gartner has a prediction (I think it was last year) that 90% of the data in “data lakes” will never be used.