Gartner Blog Network

Filling in the Blanks with Synthetic Data

by Andrew White  |  April 13, 2020

We have been using synthetic data for years. In the April 4, 2020 edition of the Wall Street Journal I read ‘How the Census Bureau Fills in the Blanks’. The article explains how the US Census has evolved over the years and what processes it employs to collect data about the country’s residents.

It is a long and laborious process, often starting with a manual, paper-based communication to households. Previous electoral rolls and other sources are used to create an initial list of those to be surveyed. One key purpose of the survey is to learn where people live, which changes over time, in order to realign funding for things like public services.

Once we receive our paper-based notification, we are all encouraged to complete the survey online. I seem to remember that 20 years ago, two censuses ago, the survey itself was paper-based. It was, and still is, short and sweet, simply tracking household members, their ages, sex, and the proverbial range of ethnic options.

But such vast data collection efforts are notoriously incomplete. Response rates vary around the country, but they will never be anywhere near 100%. So what does the government do? It fills in the blanks! In order for all the funding allocations to add up to 100% (don’t they always?), the census columns also need to add up to 100%. So synthetic data techniques are employed to round up, round out and generally make stuff up.

I jest, of course. As the article explains, other sources are subsequently used to fill gaps, or to cross-check and estimate what the gaps might be. For example, previous tax records might help validate or round out household membership. There are other sources too.
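To make the idea concrete, here is a minimal sketch of gap filling in Python. It is an illustration only, not the Census Bureau's actual method: the household IDs, the `admin_records` source and the `fill_in_blanks` helper are all hypothetical, and the median fallback is a crude stand-in for whatever statistical model a real program would use.

```python
from statistics import median

# Hypothetical survey responses: household id -> reported size (None = no response).
survey = {"H1": 3, "H2": None, "H3": 2, "H4": None, "H5": 4}

# Hypothetical auxiliary source (think prior administrative records),
# which covers only some households.
admin_records = {"H2": 5, "H3": 2}


def fill_in_blanks(survey, admin_records):
    """Return (filled, provenance) so every household has a size and a source label."""
    observed = [size for size in survey.values() if size is not None]
    fallback = median(observed)  # assumes at least one household responded
    filled, provenance = {}, {}
    for household, size in survey.items():
        if size is not None:
            # A real response always wins.
            filled[household], provenance[household] = size, "reported"
        elif household in admin_records:
            # Otherwise borrow the value from the auxiliary source.
            filled[household] = admin_records[household]
            provenance[household] = "admin"
        else:
            # Last resort: a synthetic value derived from the observed responses.
            filled[household], provenance[household] = fallback, "imputed"
    return filled, provenance


filled, provenance = fill_in_blanks(survey, admin_records)
for household in filled:
    print(household, filled[household], provenance[household])
```

Keeping a provenance label next to each value matters: anyone using the filled-in table can still tell a reported number from a borrowed or synthetic one.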

I suggested a few months ago that 2020 might be the year of synthetic data. See “Will 2020 Be the Year of Synthetic Data?” With the Covid-19 crisis, and now an economic crisis, in full swing around us, I might have understated the point. Firms of all kinds are starting to accelerate cost optimization methods and to prepare initial plans for possible investments in the opportunities that will emerge once economies re-start in uneven lurches. To do this, firms need more information, more insight, and faster and more autonomous decision-making apparatus. There is not enough data to meet this need. Or rather, there might be too much data and not enough of it that you can use. Synthetic data might help fill in the blanks.

Andrew White
Research VP
8 years at Gartner
22 years IT industry

Andrew White is a Distinguished Analyst and VP. His roles include Chief of Research and Content Lead for Data and Analytics. His main research focus is data and analytics strategy, platforms, and governance.
