Andrew White

A member of the Gartner Blog Network

Andrew White
Research VP
8 years at Gartner
22 years IT industry

Andrew White is a research vice president and agenda manager for MDM and Analytics at Gartner. His main research focus is master data management (MDM) and the drill-down topic of creating the "single view of the product" using MDM of product data. He was co-chair…

Why Applications Cannot Be Responsible for Data Quality (in other applications)

by Andrew White  |  December 3, 2009  |  6 Comments

Information Management ran an interesting article yesterday: “You Build it, You Break It, You Fix It: Why Applications Must Be Responsible for Data Quality”.  The premise of the article is that business applications often create data for their own use, and this data is often used in other applications.  If the original application shares data of dubious quality, then that original application (or, more precisely, its designer or its user) should take it upon itself to clean the data up before it is shared with another.

The article puts it this way:

When a system creates data, and when that data leaves that system, the data should be checked and corrected.  Bad data should be viewed as a hazardous material that should not be transported. The moment you generate data, you have the implicit responsibility to establish its accuracy and integrity.  Distributing good data to your competitors is unacceptable; distributing bad data to your team is irresponsible. And when bad data is ignored, it’s negligence.

I think this article is writing about a big problem but reaching an incorrect solution.  The key error (for me) emerges when the article concludes that one application can understand the context in which its output is used by another application.  When did applications (or designers of applications, or users of applications) ever concern themselves with the context in which their data may or may not be used by any number of other applications, many of which had not even been designed when the first application was?

If business applications are developed or deployed one at a time (which is how most of the world works), we know for a fact that:

  1. data might be perfectly accurate and fit for use (in application A), and
  2. the same data can be of little value or interest for application B if the context changes

Is this a “fault” of the first application?  No, not necessarily.  It is hard to guess what my wife is thinking – even when she is talking to me about what is on her mind.  It’s hard for me to figure out what I am thinking about, half the time.  Is it a design flaw that leads to this issue?  I don’t think so.  I don’t think that 20 years ago we all got together and said, “hey guys, let’s design applications in silos so that we confuse each other with different data definitions.”  It just happened – and it happened even when we tried explicitly to stop it happening.  It is not a design flaw; it is a weakness in the view of what the design turned out to be.

I do not think that even in 2009 we can ask any application (among its peers) to take upon itself the responsibility of “cleaning” up its data before it is shared.  No single application can understand what “clean” means in the context of another application’s use.  Data might be “100%” accurate – for one purpose – and be totally useless for another.  Why try to force every application to do something it was never designed to do?  Why replicate this responsibility across all these siloed applications?

I think we would all agree that context is key – that knowing the context in which information is being used is required in order to determine whether the information is fit for purpose.  But that context has to be supplied at the time of use (i.e. as part of the business application, service, or inquiry).  Disciplines like MDM standardize and centralize how master data is governed across the myriad contexts that exist across the firm.  That model seems to work conceptually, and work over the six-year life cycle of MDM shows that it is beginning to work well in practice too.
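To make the point about context concrete, here is a minimal sketch in Python.  Everything in it is an assumption made for illustration: the `ProductRecord` fields, the two context names (`order_entry` and `shipping`) and the rules themselves are hypothetical and do not come from any real MDM product.  It only shows that the same record can be fit for one use and unfit for another, and that the decision is made with rules supplied at the time of use rather than by the authoring application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ProductRecord:
    """Hypothetical product record authored by 'application A'."""
    sku: str
    description: str
    weight_kg: Optional[float] = None  # not every consumer needs this


# A rule returns an error message, or None when the record passes.
Rule = Callable[[ProductRecord], Optional[str]]


def require_sku(record: ProductRecord) -> Optional[str]:
    return None if record.sku.strip() else "sku is missing"


def require_weight(record: ProductRecord) -> Optional[str]:
    if record.weight_kg is not None and record.weight_kg > 0:
        return None
    return "weight_kg is required in this context"


# Assumed, centrally governed rules, keyed by the context of use.
RULES_BY_CONTEXT: Dict[str, List[Rule]] = {
    "order_entry": [require_sku],
    "shipping": [require_sku, require_weight],
}


def fitness_errors(record: ProductRecord, context: str) -> List[str]:
    """Check a record against the rules of the context that wants to use it."""
    errors = []
    for rule in RULES_BY_CONTEXT[context]:
        message = rule(record)
        if message is not None:
            errors.append(message)
    return errors


record = ProductRecord(sku="A-100", description="Widget")
print(fitness_errors(record, "order_entry"))  # []
print(fitness_errors(record, "shipping"))     # ['weight_kg is required in this context']
```

The record is “clean” for order entry and unusable for shipping, and the order entry application had no way of knowing that when it created the record.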

Focusing on solving this problem, “one application at a time”, is “business as usual” and the sign of lethargy and inertia.  This approach will waste your money faster than a consulting project. 

However, I love the comment, “Distributing good data to your competitors is unacceptable; distributing bad data to your team is irresponsible.”  So I believe the goal of the writer is valid – it is just that, architecturally, asking the applications (which cause the problem in the first place) to solve a problem that depends on context the application cannot understand is the wrong approach.

Category: Application Architecture Data Quality MDM

6 responses so far ↓

  • 1 Tweets that mention Why Applications Cannot Be Responsible for Data Quality -- Topsy.com   December 3, 2009 at 2:55 pm

    [...] This post was mentioned on Twitter by Jim Harris, Jeroen Blankendaal. Jeroen Blankendaal said: Interesting: http://blogs.gartner.com/andrew_white/2009/12/03/why-applications-cannot-be-responsible-for-data-quality/ [...]

  • 2 Bloemlezing 49 « De Kadenzer Courant   December 6, 2009 at 1:11 pm

    [...] question in DWH land is always who is actually responsible for the quality of data. This article does not contain ‘the’ answer, insofar as one exists. But it does at least indicate [...]

  • 3 Evan Levy   January 7, 2010 at 4:57 pm

    Andrew-

    Thanks for taking the time to read and weigh in on my Information Management blog post. I figured I’d clarify my assertions a bit more relative to source systems owning data quality.

    You’re essentially saying that each system that creates data can’t be held responsible for another system mis-using or re-purposing the data differently from the way in which it was originally defined. No argument there. The responsibilities of the operational system shouldn’t include supporting data re-purposing any more than a furniture polish should be used as a dessert topping.

    However, the real point I’m focusing on–and one I’m seeing in spades at client sites–is that the data in question isn’t even meaningful in the context of the originating application. For instance, customer names that contain a meaningless string of symbols, or U.S. zip codes with characters in them.

    The problem is sloppiness. Application developers document their designs and are responsible for sharing their data, but then don’t adhere to the rigor that they themselves established. The whole point of development methodologies and packaged applications was to address these challenges. Data should be validated before it’s stored.
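As an illustration of validating data before it is stored, here is a minimal Python sketch covering the two examples mentioned above (customer names made of symbols, and U.S. zip codes containing letters).  The function names, the in-memory `table` and the exact rules are assumptions made for this sketch; a real application would use its own documented rules.

```python
import re
from typing import Dict, List

# U.S. ZIP: five digits, optionally followed by a hyphen and four digits.
US_ZIP = re.compile(r"[0-9]{5}(-[0-9]{4})?")

# Assumed rule: besides letters, a name may contain only these characters.
NAME_PUNCTUATION = set(" .'-")


def name_is_plausible(name: str) -> bool:
    """True if the name has at least one letter and no unexpected symbols."""
    text = name.strip()
    has_letter = any(ch.isalpha() for ch in text)
    only_allowed = all(ch.isalpha() or ch in NAME_PUNCTUATION for ch in text)
    return has_letter and only_allowed


def validation_errors(name: str, zip_code: str) -> List[str]:
    """Collect every problem found, so the user can fix them in one pass."""
    errors = []
    if not name_is_plausible(name):
        errors.append("customer name is not meaningful")
    if US_ZIP.fullmatch(zip_code.strip()) is None:
        errors.append("zip code is not a valid U.S. ZIP")
    return errors


def store_customer(table: List[Dict[str, str]], name: str, zip_code: str) -> None:
    """Reject bad input at the system of origin instead of storing it."""
    errors = validation_errors(name, zip_code)
    if errors:
        raise ValueError("; ".join(errors))
    table.append({"name": name.strip(), "zip": zip_code.strip()})
```

With these rules, `store_customer(table, "Mary O'Brien", "30301")` appends a row, while a name such as `"#$%@!"` or a zip code such as `"3O3O1"` (with the letter O) raises `ValueError` and nothing is stored.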

    This isn’t a problem of data sharing. This is a problem that the data in the native system is corrupt. The fact is that developers aren’t focused on code rigor or data accuracy. The data content is never standardized and validated before it’s even processed by its own application.

    The ramifications are real and tangible. People look at their credit reports and find addresses they never lived at. Mail is misdirected or undelivered. Penalties for noncompliance rise into the millions. Until data is meaningful and correct at the system of origin, there can never really be a “single version of truth.”

    Evan Levy
    Baseline Consulting

  • 4 Andrew White   January 7, 2010 at 7:02 pm

    Hi Evan,

    Thanks for responding. I read your response and have to agree with your analysis and conclusion. I also see a huge weakness in quality assurance at the “source”. This week I sat with 4 organizations from different industries, all wrestling with (among other things) MDM issues; one key part of this originates in the source systems where master data (and other reference data) is authored.

    However, one issue that troubles me concerns the controls, edits, and rules that have, for many years, been designed into the various business applications that litter our organizations. The question the users were asking was: if MDM is going to be used to assure a “single view” of master data (agreed – in a shared environment), and as such, MDM will have to “mirror” the edits, rules and checks found in originating source systems, why bother ever creating such edits, rules and checks [for master data] in applications if they are to be assured elsewhere [in MDM systems]?

    This is a great question. 99.9% of all business application and business intelligence tools, applications and solutions “do not play well with others”. We know that sharing master data is hard to do. Now we are talking about sharing business rules. Users are asking, don’t we need master business rule management?

    I think we do. And I think we already have a bunch of technology, tools, and disciplines that can help. Perhaps it will evolve from the link between business rules engines and MDM solutions…

    What do you think?

    Thanks again,
    Andrew

  • 5 Evan Levy   January 12, 2010 at 5:38 pm

    Hi Andrew-

    Your clients are probably making the same assumption that many of ours do: that the logic contained within business applications to address data accuracy is the same for master data. But MDM can be implemented at varying scopes or levels within a company. It’s not uncommon to implement MDM at an organizational level, and branch out incrementally to the enterprise.

    The business applications we’ve been talking about are built to automate specific business processes. But business processes are not enterprise in nature – they’re very specific. The challenge is understanding the scope of the MDM needs – and then determining whether the scope of the data in individual business applications overlaps.

    If the goal of MDM is to align all systems so they have a consistent view of reference details and allow them to exchange data with one another, then it’s important to ensure that the business scope and data context is consistent. Our MDM clients have their biggest challenges when the desire to implement enterprise MDM assumes that all systems (departmental, organizational, enterprise, and even 3rd-party) can be adapted to use the new “enterprise master reference.” The trouble is that it’s impractical to assume that the information policies or definitions across these various systems are the same. In fact, we know they aren’t—otherwise we could already move data between systems without any problems.

    Anyone who’s ever implemented a big ERP solution has run into this, but the standard answer in those environments is to simply force everyone to conform. The whole premise of MDM is to be able to establish data standards that can be leveraged across multiple and different processes and systems that often, as you say, “don’t play well with others.” We don’t want to have to reengineer the business to establish master data. The benefit to MDM is that it meets the company where it is, idiosyncratic data and all!

    The whole point of MDM is to link the data – but loosely couple the systems and the processes. If data is going to be shared outside of the system of origin, then every accessible value needs to conform to the value’s definition. Data standardization is critical when data content is moved outside a system. Integrating various sources occurs through a translation process that is invisible and specific to an individual source system. The MDM hub has a set of centrally managed and administered rules that allow it to interact with each individual source system. When data moves between the hub and the system, these rules (containing transformation, conversion, and other logic) determine which data can be used and converted in order to be mastered.
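The idea of centrally managed, source-specific translation rules can be sketched roughly as follows in Python.  The source names (`crm`, `billing`), the `country` field and the mapping values are hypothetical, chosen only to illustrate the idea; this is not how any particular MDM hub is implemented.

```python
from typing import Dict

# Assumed hub-side rules: per source system, per field, a mapping from the
# source's own value to the hub's standard value.
TRANSLATION_RULES: Dict[str, Dict[str, Dict[str, str]]] = {
    "crm": {"country": {"USA": "US", "United Kingdom": "GB"}},
    "billing": {"country": {"840": "US", "826": "GB"}},
}


def master_view(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Return only the fields the hub can convert; the source record is untouched."""
    rules = TRANSLATION_RULES[source]
    mastered = {}
    for field, value in record.items():
        mapping = rules.get(field)
        if mapping is None:
            continue  # the hub has no rule for this field, so it is not mastered
        if value in mapping:
            mastered[field] = mapping[value]
        # values the rules cannot convert are left out of the mastered view
    return mastered


crm_row = {"country": "USA", "segment": "gold"}
print(master_view("crm", crm_row))                  # {'country': 'US'}
print(master_view("billing", {"country": "840"}))   # {'country': 'US'}
print(crm_row)                                      # {'country': 'USA', 'segment': 'gold'}
```

Each source keeps its own values, and the translation lives in one place where it can be governed.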

    Part of the whole point of MDM is to allow a loose coupling of systems. This means that while every customer is supposed to be identified by the MDM hub, not every application system has to modify its values to match the MDM hub’s values. As you point out in your blog post, it’s not practical for every system to change its data – changing the data could break the system. The focus of MDM is to allow the hub to identify the customer (or product or location) based on the contents of the application(s). The idea is that the source system should be able to retain its unique, non-enterprise values to support its business processes. The MDM hub will use the values that it can to support enterprise-level needs, but it knows that the source system can’t contribute to the mastering process for all data elements.
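One common way to picture this loose coupling is a cross-reference held by the hub, which links each source system’s own key to a master identifier without changing anything in the source.  The Python sketch below is an assumption-laden illustration: the class name, the source names and the identifiers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class CrossReference:
    """Hub-side links from (source system, local key) to a master identifier."""

    def __init__(self) -> None:
        self._master_by_local: Dict[Tuple[str, str], str] = {}

    def link(self, source: str, local_id: str, master_id: str) -> None:
        """Record that a source's own key refers to the given master entity."""
        self._master_by_local[(source, local_id)] = master_id

    def master_id(self, source: str, local_id: str) -> Optional[str]:
        """Look up the master identifier, or None if the hub has no link."""
        return self._master_by_local.get((source, local_id))

    def local_keys(self, master_id: str) -> List[Tuple[str, str]]:
        """All (source, local key) pairs known to refer to one master entity."""
        return sorted(
            key for key, value in self._master_by_local.items() if value == master_id
        )


xref = CrossReference()
xref.link("crm", "C-17", "M-1")
xref.link("billing", "0042", "M-1")
print(xref.master_id("crm", "C-17"))  # M-1
print(xref.local_keys("M-1"))         # [('billing', '0042'), ('crm', 'C-17')]
```

Both systems keep their own customer keys; only the hub knows that they refer to the same customer.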

    This is why establishing consistent data management within each specific application system is necessary prior to implementing MDM. This means that some work has to be done to establish the right rules as you configure your MDM hub. But, as I’ve said in more than one presentation, you can’t master data that you haven’t managed.

    Evan Levy
    Baseline Consulting

  • 6 Why Applications Cannot Be Responsible for Data Quality (in other applications) « DQ Blogger   January 22, 2010 at 12:43 pm

    [...] source [...]