Blog post

The Role of Machine Learning on Master Data Management (MDM)

By Andrew White | February 22, 2017 | 3 Comments

Tags: MDM, Master Data Management, Machine Learning, Deep Learning

There is a lot of hype (as you know) around Artificial Intelligence (AI), machine learning and specifically deep learning (complex neural networks). You also know (if you have been keeping up with the news) that we are all users of such techniques in many everyday tools. But recently the technology has gotten a little too close for comfort. Some vendors in the data space, specifically those focused on data quality, MDM and data management, have started talking about how deep learning will change the use of those tools significantly. At this point, I am not so sure. I think there is great promise but, as with many technologies, we need to be clear about how we plan to use them.

For example, deep learning might help us discover where our master data is kept. Finding our master data is a hard task: copies of it are embedded all over the place, inside and between business systems, in a complex landscape of on-premises and cloud apps. Deep learning might be able to “spot” where the most frequently referenced data resides (much as the famous cats were “discovered” in the YouTube experiments). The same concept sits at the heart of tools (think of IBM’s Watson) that sift through diagnoses, recipes and concertos, breaking each down into its constituent elements and “discovering” (really, it’s a form of classification) each one. But does this change MDM?
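To make the “discovery as classification” idea concrete, here is a minimal sketch, not anything a specific vendor ships: score which columns in a landscape look like candidate master data from simple profile features. A deep-learning approach would learn these features from the data itself; the hand-coded scoring function below merely stands in for a trained classifier, and all names, features and weights are hypothetical.

```python
# Hypothetical sketch: scoring which columns look like shared "master" data
# from simple profile features. The weights stand in for a trained model.

def profile(column_values, systems_referencing):
    """Compute toy profile features for one column."""
    n = len(column_values)
    distinct = len(set(column_values)) / n if n else 0.0
    nulls = sum(1 for v in column_values if v is None) / n if n else 1.0
    return {"distinct_ratio": distinct, "null_ratio": nulls,
            "system_count": systems_referencing}

def master_data_score(features):
    """Heuristic stand-in for a classifier: columns referenced by many
    systems, with high distinctness and few nulls, score highest."""
    return (0.5 * min(features["system_count"] / 5.0, 1.0)
            + 0.3 * features["distinct_ratio"]
            + 0.2 * (1.0 - features["null_ratio"]))

# A widely shared identifier scores higher than a sparsely used note field.
customer_id = profile(["C1", "C2", "C3", "C4"], systems_referencing=6)
free_text_note = profile(["hello", None, None, "misc"], systems_referencing=1)
print(master_data_score(customer_id) > master_data_score(free_text_note))
```

The point of the sketch is only that “where does master data reside” is a ranking/classification problem over observable signals, which is exactly the kind of problem such techniques are suited to.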

We have had access to semantic discovery tools for years. But finding where our master data exists is not the same as MDM; it is just one part of the overall set of tasks needed to sustain MDM. In fact, there are two other tasks (among many others) that are quite different, and for which we don’t need, and cannot use, deep learning. The first task is determining “what is your master data” and the second concerns the enforcement of the policies that sustain it. The former should take no more than an hour with the right business people in the room; you simply ask the business users such things as:

  • What is the most important data (at a conceptual level) that is needed to make business process A work as planned?
  • How much of this data is also needed to make business process B work as planned?
  • How much less data can you use to make business process C work as planned?

Once you get to the point where the business users are arguing over the 9th or 10th attribute, you are done. Close the meeting with the miraculous conclusion that those ten attributes are “it”. Get on with MDM. You don’t need to be perfect, you don’t need a consultant, and you don’t need a long list of 20 different master data objects. When you get into those heady situations you are not looking at master data at all; you are probably looking at shared data or application data.

The second task is at the other extreme: the enforcement of policy. It is the work of policy enforcement, sustaining the level of data quality and the effectiveness of the workflows executed to meet data quality and business process KPIs, that actually brings home the bacon with your MDM program. The rubber meets the road in MDM not with the discovery of where master data resides (though that is a key step). It meets the road when you can manage the exceptions that would otherwise hold business processes hostage to data, and when you can assure process integrity and drive improvements in outcomes. Deep learning can help an MDM program with that middle step, finding out where the data might reside, and that will save a lot of time and money in the overall MDM implementation. But let’s not get carried away. Deep learning will not make MDM go away. We just need to keep our feet on the ground and understand the kinds of problems that deep learning can help with.

That’s my story and I am sticking with it (until you tell me otherwise) 🙂

The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.



  • Yves de Montcheuil says:

    Andrew, great insights, thanks for sharing them.

    I think there is another area where AI, and more specifically Machine Learning, can help, and that’s in the stewardship phase of MDM, where a data steward needs to make decisions on survivorship and record merging, sometimes applying empirical or intuitive rules of precedence.

    I surmise that with AI we could go much further than the basic rules most tools implement today and address more cases, therefore sending fewer of them to the human expert. And of course, the system could learn from the choices the steward makes, becoming more and more effective.
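    The kind of learning described here can be sketched very simply. The following is an illustrative toy, not any product’s API: past steward merge/keep decisions become labeled training examples, and new candidate pairs are classified by their nearest past case. The record fields, features and 1-nearest-neighbour choice are all assumptions for the sake of the example.

```python
# Illustrative sketch: learning a merge/no-merge rule for record pairs
# from past steward decisions, so fewer cases reach the human expert.
from difflib import SequenceMatcher

def features(rec_a, rec_b):
    """Similarity features for a candidate duplicate pair (hypothetical)."""
    name_sim = SequenceMatcher(None, rec_a["name"], rec_b["name"]).ratio()
    same_zip = 1.0 if rec_a["zip"] == rec_b["zip"] else 0.0
    return (name_sim, same_zip)

def predict(history, pair_features):
    """1-nearest-neighbour over past steward decisions."""
    def dist(f):
        return sum((a - b) ** 2 for a, b in zip(f, pair_features))
    return min(history, key=lambda h: dist(h[0]))[1]

# Decisions the steward already made: (features, decision)
history = [
    ((0.95, 1.0), "merge"),
    ((0.90, 1.0), "merge"),
    ((0.55, 0.0), "keep"),
    ((0.40, 0.0), "keep"),
]

pair = features({"name": "Acme Corp", "zip": "10001"},
                {"name": "Acme Corporation", "zip": "10001"})
print(predict(history, pair))  # → "merge"
```

    As the history of steward decisions grows, the automated rule covers more cases and only the genuinely ambiguous pairs need human review.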

  • Andrew White says:

    Dear Yves,
    Thanks for stopping by and leaving a comment.

    Yes, I think you are right. Deep learning should be able to spot patterns of behavior that worked before, and patterns that did not. But I would suggest that, given information stewardship includes this work of exception-based problem solving, there are some high-value opportunities to “solve” this now: IT just needs to ask the business users. If there are enough exceptions to analyze, then deep learning can help (perhaps later). But IT needs to get business users on board first. Many organizations tend to overthink this stuff, don’t you think?

    And note, in our definition of information stewardship apps, we already have provision for “playbooks”. These playbooks are just that – templates that describe workable steps from past experiences.

    Thanks again – Andrew

  • Yves de Montcheuil says:

    Hi Andrew,

    I absolutely agree. AI is just a tool that IT can use to further automate processes, but it does require buy-in from business users.

    Where there is commonly a misperception, though, is on the size of the “training set”. Most people assume that millions of records of past behavior are needed for Machine Learning to become efficient. This is not the case: hundreds of cases already constitute a great starting point to train an engine.

    Thanks – Yves