Gartner Blog Network

BYO Big Data Quality

by Svetlana Sicular  |  May 16, 2014  |  2 Comments

In the absence of best practices for big data quality, individual companies are coming up with their own solutions. Of course, the problems come first. Let’s look at an example from Paytronix, a cool company providing loyalty management for restaurant chains, including my favorite, Panera Bread. Paytronix is converging social, mobile, cloud and big data for its business (aha! The Nexus of Forces!). And, by the way, cutting-edge technologies help a lot in attracting top talent to the company. But first things first: Paytronix had a big data quality problem. Here is the description:

  • Over a quarter of its clients (restaurants) do not ask customers for their age
  • Of the customers who are asked, 18% leave the age blank
  • Of those who answer, approximately 10% are blatant liars
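To see how quickly these losses compound, here is a back-of-the-envelope sketch. It rests on two assumptions the figures above do not state: that every restaurant serves roughly the same number of customers, and that each percentage applies to whoever is left after the previous step. The function name `usable_age_fraction` is hypothetical, chosen only for illustration.

```python
def usable_age_fraction(not_asked: float, blank: float, liars: float) -> float:
    """Share of customers whose self-reported age survives all three losses.

    Hypothetical helper. Assumes equal customer volume per restaurant and
    that each rate applies to the population remaining after the prior step.
    """
    asked = 1.0 - not_asked              # restaurants that ask for age
    answered = asked * (1.0 - blank)     # ...and the customer filled it in
    plausible = answered * (1.0 - liars) # ...and the answer is not a blatant lie
    return plausible


# "Over a quarter" makes 0.25 a lower bound, so this is a best case.
print(f"{usable_age_fraction(0.25, 0.18, 0.10):.0%}")  # prints 55%
```

Under these assumptions, at best about 55% of customers end up with a usable age, and only blatant lies are removed, so subtler misreporting would shrink that share further.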

All of the above means that identifying families with kids is a huge challenge (spoiler: Paytronix successfully met the challenge). People with kids tend to be younger. They tend to fill restaurants earlier in the evening. The average check is higher when an order includes a kids meal (I can confirm this for our own orders at Panera). That’s why restaurants often want to market to people with children: when they offer a kids meal coupon, they get 25% more redemptions. But! What customers say is different from what they do. (Aren’t we all customers?) In other words, here is the picture, instructive for parents:

Source: Paytronix

Big data quality is new and different: traditional models do not work, familiar standards do not apply, and typical metrics miss the mark. Most important, people’s mentality has to change when they assure the quality of big data. My colleague Martin Reynolds likes to cite this: “most people are woefully muddled information processors who often stumble along ill-chosen shortcuts to reach bad conclusions.” The quote appeared in Newsweek in 1987, BC (before Cloudera, the first commercial Hadoop distribution vendor). In other words, the problem is eternal, although it was less visible in data management simply because there was not much data management in 1987. The same Newsweek issue still advertised typewriters, the best in the world.

Wikipedia gives a daunting list of cognitive biases, and each bias is a big data quality factor, because quality applies to the resulting analysis, to intermediate results, and to iterative data science. In the case of Paytronix, segmentation was biased. Biases also apply to data mashups: to evaluating the granularity, trustworthiness and dependencies of participating data sets. Sometimes biases even affect whether particular data sources are present or absent. Martin Reynolds shared with me the most astonishing example of cognitive bias.

Paytronix solved its big data quality problem by deciding not to change how people think. They validated the data by handing it, as cubes in a familiar BI tool, to good old people. By the way, crowdsourcing is another excellent big data quality method that relies on people. But that is the subject of my next post, where I will tell what vendors are doing about big data quality, and maybe even about big data governance. As DBAs like to say, stay tuned.



Svetlana Sicular
Research Director
3 years at Gartner
21 years IT industry

Svetlana Sicular has a unique combination of experience in Fortune 500 IT and business leadership, product management at world-class software vendors, and Big Four consulting. She primarily handles inquiries in the areas of data management strategy, ...

Thoughts on BYO Big Data Quality

  1. […] BYO Big Data Quality […]

  2. Svetlana, great highlight of a practical yet innovative big data use case from Pentaho customer Paytronix. They are applying big data integration and analytics capabilities to the core of their operations. Exciting!


Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.