
It’s all about the data – not the algorithm

By Andrew White | November 12, 2019

Data Strategy, Data Ownership, Data, Artificial Intelligence, Algorithm Economy, AI/ML

I have blogged in the past about the importance of data in the AI-infused world emerging around us; see Microsoft Targets Mother of All Professional-People Master Data with LinkedIn Buy. I have explored how some large software vendors have been securing data sources to drive their AI engines, and every few months our own team re-evaluates which matters more: the data or the algorithm. I even introduced the idea of ‘rare data’ (as in rare earth elements) in my blog the other day; see The Ultimate Source of Differentiation: Rare Data.

But an article on the front page of today’s US print edition of the Wall Street Journal adds fuel to the fire: Google secretly acquired access to an awful lot of healthcare data. See Google amassed personal medical records. This move (known internally at Google as Project “Nightingale”) supports my earlier point that securing data to power algorithms is ultimately the most competitive of weapons. If a firm can build up a treasure trove of data and ensure it controls access and licensing, that firm will have a major advantage.

In the healthcare business, today, we are talking about volume. We are not yet talking about rare data or synthetic data. We are talking about a wide variety of data types for thousands, even millions, of patients, collected over time. All of this data is needed to train the machine learning (ML) algorithms. Google is trying to build a competitive capability in using ML algorithms to improve patient healthcare outcomes.

In Prediction Machines, by Ajay Agrawal, Joshua Gans and Avi Goldfarb (Harvard Business Review Press, 2018), the authors claim, “AI will increase incentives to own data.” (p. 178). They also say: “Imitation is easy. After you have done all the work of training an AI, that AI’s workings are effectively exposed to the world and can be replicated.” (p. 203)

So, there you have it. Data wins, ultimately. It may take time, and the rate at which data becomes more important than data science will vary across use cases and industries.


The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.
