Gartner Blog Network

Will 2020 Be the Year of Synthetic Data?

by Andrew White  |  January 9, 2020  |  1 Comment

Digital is a reality.  It is evolving to accommodate and exploit AI to help automate work and drive innovation; and so the hype around AI and ML continues.  For an increasing number of organizations, leaders are beginning to ask, “Where is the beef?”  All the proofs of concept and investments need to start paying the bills.  See Toolkit: Presentation for Key Findings From the 2020 Board of Directors Survey.

With this in mind, I was most interested to read this week’s Economist.  In it was the Technology Quarterly: A new revolution.  Among the articles was one about China and AI: Data – a New Trinity.  This article was really interesting, as it explored the role and position China has in the use of AI.

One key point drawn out is that those uses of AI and ML that have access to data in all its kinds, forms and colors have an advantage over those that don’t.  Examples are given: one is a Chinese firm that apparently has 300,000 staff tagging data every minute of every day, feeding its ML technologies to drive more effective AI solutions.  Such numbers are pretty scary.

But then I remembered synthetic data.  We noted in our recent Hype Cycle for Data Science and Machine Learning the following about synthetic data: Synthetic data is utilized in use cases where the available data is limited, incomplete or cannot be sourced easily. Simulation and generative techniques can be used to increase the available training data.  If you think about it, if the competition has more data than you, but you have some data you can work with, perhaps you can reduce any gaps with synthetic data.  So I wonder – when will synthetic data be (yet another) next big thing?
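
To make the idea of "increasing the available training data" a little more concrete, here is a minimal sketch in Python. It is not a method described in the Hype Cycle; it simply assumes one very basic generative trick, blending pairs of real records to create new ones. The function name, the toy height and weight figures, and the fixed seed are all hypothetical choices made for illustration.

```python
import random


def augment_by_interpolation(rows, n_new, seed=0):
    """Return n_new synthetic rows, each a random blend of two real rows.

    Illustrative assumption only: real synthetic-data tools use far richer
    simulation or generative models than straight-line interpolation.
    """
    if len(rows) < 2:
        raise ValueError("need at least two real rows to interpolate")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rows, 2)  # pick two distinct real rows
        t = rng.random()            # blend weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Hypothetical toy data: (height_cm, weight_kg) measurements.
real_rows = [(170.0, 65.0), (180.0, 80.0), (160.0, 55.0), (175.0, 72.0)]

synthetic_rows = augment_by_interpolation(real_rows, n_new=6)
training_rows = real_rows + synthetic_rows  # 4 real + 6 synthetic rows
```

Every synthetic row here sits between two real ones, so the approach can fill gaps inside the range you already observe but cannot invent genuinely new situations. That limitation is a fair caution for the wider argument: synthetic data may narrow a gap with a data-rich competitor, but only as far as the real data you start with allows.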

Category: machine-learning  synthetic-data  

Andrew White
Research VP
8 years at Gartner
22 years IT industry

Andrew White is a Distinguished Analyst and VP. His roles include Chief of Research and Content Lead for Data and Analytics. His main research focus is data and analytics strategy, platforms, and governance.

Thoughts on Will 2020 Be the Year of Synthetic Data?

  1. Jordi Lopez says:

    For large tech firms like Google, Apple, and Amazon, gathering data is less of an issue compared to other companies. Indeed, they have an almost limitless supply of diverse data streams through their products/services, creating the perfect ecosystem for data scientists to train their algorithms. For smaller companies, access to these datasets is limited, expensive, or non-existent.

    In addition to solving AI’s data collection problem, businesses must also contend with intense competition.

    The reality is that the cost of data acquisition is high, and it keeps many from even starting. However, synthetic data can help change this situation. Synthetically generated data can help companies and researchers build data repositories needed to train and even pre-train machine learning models.

    Jordi Lopez

Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.