BlueKai CEO Omar Tawakol has an insightful blog post on AllThingsD, “More Data Beats Better Algorithms — Or Does It?” After recapping the mounting evidence that more data beats better algorithms, he (as the title suggests) reveals the twist: what really matters are the connections, or patterns, that algorithms discover in the data.
What does it really mean to compare data with algorithms? Omar cites two compelling examples in which weaker algorithms with more data beat more sophisticated algorithms with less data: the response of Anand Rajaraman’s Stanford students to the Netflix Challenge, and Google’s famous PageRank algorithm. Although their goals differed, the two share one important trait: the desire to predict what users of a service will find most useful, a crucial skill in marketing.
Omar’s central point is this:
“Algorithms shouldn’t be one-way filters that take data out and put them to use outside of the system. Rather, the algorithm output is itself data which enhances the data asset.”
This observation questions whether there’s a meaningful competition here at all, and it suggests a more useful way of thinking about data and algorithms. We tend to think of data as a vast and volatile collection of facts and observations that can be expressed as cells in a big, virtual table (the bigger the better). We then think of algorithms as sorting and searching through that data, joining and aggregating it, and finally rendering it in a way that gives us answers. The distinction between these two domains is often reflected in separate organizations: one charged with harvesting, storing, and provisioning data (usually an IT function), and another charged with its algorithmic analysis and business application (a business function). These groups might even compete for enterprise budgets, which makes the relative importance of their contributions a material concern.
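But, as Omar points out, the output of the algorithms is also data, which other algorithms can in turn process in a layered path toward deeper understanding. Look-alike targeting, a staple of data management platforms (DMPs) such as BlueKai, provides a foundational example: a model scores users by how closely they resemble a seed audience, such as known customers, and those scores become new audience data that downstream targeting can build on.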
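To make the idea of algorithm output becoming data concrete, here is a minimal Python sketch of layered annotation. It assumes a toy audience table in which each user is just a set of interest tags, and it uses Jaccard overlap with the closest seed user as the look-alike score; real DMP look-alike models are far more sophisticated, and the names `UserRecord`, `annotate_lookalike`, and `annotate_segment` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """One row in a hypothetical audience table: raw attributes plus derived ones."""
    user_id: str
    interests: set[str]
    derived: dict[str, float] = field(default_factory=dict)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two attribute sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotate_lookalike(users: list[UserRecord], seed_ids: set[str]) -> None:
    """Layer 1 (assumed scoring rule): similarity to the closest seed user.

    The score is written back onto each record instead of being exported,
    so it becomes part of the data asset that later layers can read.
    """
    seeds = [u for u in users if u.user_id in seed_ids]
    for user in users:
        if user.user_id in seed_ids:
            # Seed users resemble themselves perfectly by definition.
            user.derived["lookalike_score"] = 1.0
            continue
        user.derived["lookalike_score"] = max(
            (jaccard(user.interests, s.interests) for s in seeds), default=0.0
        )


def annotate_segment(users: list[UserRecord], threshold: float = 0.5) -> None:
    """Layer 2: reads the layer-1 score as ordinary data to set a segment flag."""
    for user in users:
        score = user.derived.get("lookalike_score", 0.0)
        user.derived["in_lookalike_segment"] = float(score >= threshold)


if __name__ == "__main__":
    users = [
        UserRecord("u1", {"travel", "golf", "luxury"}),
        UserRecord("u2", {"travel", "golf"}),
        UserRecord("u3", {"gaming", "pizza"}),
    ]
    annotate_lookalike(users, seed_ids={"u1"})
    annotate_segment(users)
    for u in users:
        print(u.user_id, u.derived)
```

The design choice worth noting is that the second layer never calls the first; it reads `lookalike_score` from the record like any other attribute. That is the sense in which the algorithm’s output enhances the data asset rather than leaving the system as a one-way filter.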
Omar and others note that this layered processing is apparently how our brains work: they don’t just store raw sensory data for later analysis; they filter and integrate that data from the start, using associations with memory and attention to reduce it to a level that our relatively limited capacity for noticing and considering it can manage. According to several sources, most data never reaches the level of conscious consideration, although it often influences our decisions unconsciously. So, if the goal is to predict or influence how people will act, perhaps our analytical architecture should reflect this kind of layered structure. Can an organization work like this?
Omar concludes by acknowledging, “If you have to choose, having more data does indeed trump a better algorithm. However, what is better than just having more data on its own is also having an algorithm that annotates the data with new linkages and statistics which alter the underlying data asset.” I would argue that if you feel you have to choose, you may be on the wrong track. Algorithms can be simple or complex, costly or quick, but if they aren’t thoughtfully integrated into your big data architecture at many levels, you might be focusing on the wrong things. Or be deafened by the noise.