A debate resurfaced in our team recently as we were discussing plans for research topics next year. I am not actually sure why we do such things; the ‘year’ is an artificial thing we created to organize certain activities. Often it is more a strait-jacket that prevents continuous processes like innovation. But I digress.
We were discussing ideas and brainstorming how they connect and impact each other, and one of our team said, “The model doesn’t matter to me. It’s the data that counts.” I can tell you this comment came from someone in our team who covers the data side of D&A. I expected an immediate response from those on the analytics side of the team, but for four seconds they were held in check, perhaps stunned.
The silence was finally broken, and a rallying cry about how data is cheap to store, copy and move came to the rescue of models and analytics. But then the serious debate got going: which matters most as a source of differentiation? We mused that what counts as the source of differentiation changes over time, so there is a when that matters too. We went further and concluded that the real question is: for whom, and when, is the what a source of differentiation?
But I awoke Saturday morning with several themes and topics playing through my mind that I have been thinking about for some time. One concerned IBM and Oracle, and their data as a service (DaaS) strategy (compared to Amazon and Google’s IaaS strategy); the other was the techniques used to train ML models and algorithms.
In the case of the first, I used to work on understanding Oracle’s strategy. It’s been over a year now since I was watching it closely, but they were clearly focused on moving both their core DB business and their apps business to the cloud. The point being that the cloud here means infrastructure, which is not the largest revenue generator for Oracle but is a strength of some of its competitors. However, Google and Amazon have no ERP or enterprise business apps. So Oracle and SAP have a major advantage over Google and Amazon. Finally, out of interest, I felt the real battle might be in the PaaS space, since that is where custom and configured apps as a service are assembled and where packaged apps and app dev connect. I have blogged on this: How to Talk Cloud to Business Leaders and The Battle for the Cloud Has Not Even Started Yet.
Either way, IBM and to a degree Oracle were acquiring or licensing data sources that would or could be used to train ML models and feed other data science projects. In other words, IBM and Oracle were acquiring data sources that would be used in AI solutions. But what if IBM and Oracle ended up owning the data that all important business apps need? Would Amazon and Google have to concede defeat to Oracle and IBM? Would owning the data that feeds the world’s algorithms make the algorithms irrelevant? Is data the ultimate source of differentiation?
The other idea in my head clearly hearkens back to my previous role as head of product strategy and new product development at a software company, many moons ago. There I learned a lot about neural networks and simulated annealing and how they worked. I also developed plans to bring to market neural-network-based causal forecasting engines. The training of those engines was critical. Guess what? This is all relevant today!
I woke up dreaming about running a software company developing alternative data sourcing strategies for pre-trained ML engines and for programs to train them. My subconscious was clearly wrestling with the question: does the data trump the analytic?
As it stands, with just one coffee out of the way and while watching Ireland stuff Samoa in the Rugby World Cup in Japan, it would appear this:
- If there is a single source of differentiation it would be data
- This single-source idea may not be relevant for most markets, most of the time. This is because there will be ample noise and variation in models, and in combinations of data and model, for any use case.
- Given the uses of data, there will be ample opportunity and time for differentiation to be achieved along the conversion life-cycle of data:
  - Raw data
  - Understood data (e.g. described)
  - Processed data (e.g. analyzed, or analytic)
  - Decision (convergence of business process and analytic) or insight
  - Execution (convergence of process and reality) or application
- There is also the emerging focus on synthetic data which is just as intriguing a thought
The question may become:
- How quickly can a model be used to extract value from the data?
If the time frame is short, the importance of the model will rise, as the market will seek innovation in the model. If the time frame is long, the importance of the data will rise.
And this question, the speed of converting one state to the next, could be applied to each stage of the conversion life-cycle of data. So, data is king, but it may not matter everywhere, to everyone, all the time. Only in the last instance. This is somewhat analogous to any production chain one learns about in business, supply chain, or playing any sim-game. In such chains, rare earths are the scarce materials that drive critical production.
So, if there are rare earth elements, are there rare data? Is the Coca-Cola formula a form of rare data? Does any company have control over critical rare data? Do you, as your company’s CDO, have a handle on your company’s “rare data” or rare analytics or rare algorithms?
Anyway, I need a second coffee and the match is almost over. That’s enough thinking for one day.