Several recent discussions suggest that people still find it challenging to move their enterprise data quality towards a more mature state.
I continue to see organizations struggle to secure resources for improving the quality of their critical data. Often this is because they cannot effectively communicate what is actually broken to the people who should care and who are able to help. First and foremost, the far too general concept of ‘good data quality’ must be translated into appropriate data quality. In other words, data quality that is ‘fit for purpose for business operations’. Since business operations use measures to assess how well (or poorly) they are doing, this means tying those measures (e.g. KPIs) to the data that business operations create and consume. That brings us closer to why, out of all the possible data quality rules, certain ones are the appropriate ones, and those rules then tell us, using metrics, what ‘fit for purpose’ actually means.
A metrics-based approach to assessing data quality helps remove the assumptions, politics and emotion often associated with information, giving organizations something more factual on which to justify, focus and monitor their efforts. With this solid foundation of facts and focus, rather than unsubstantiated assumptions, misconceptions and apathy, we might actually begin to better manage the quality of our data. If you haven’t yet established which data is critical to your enterprise, then you absolutely should. Once you are clear on that, assess what ‘fit for purpose’ means for each of your critical data elements against certain more objective characteristics, such as:
- Accuracy (for now) — whether the data values being held reflect the properties of the real-world object or event that the data is intended to model. I say ‘for now’ because data is temporal, but I haven’t got the space here to explain why this dimension is actually more complex – maybe I should try it in Twitter 😉
- Consistency — whether the values of attributes managed or presented in multiple locations are the same
- Existence — whether a value is being held for a particular attribute
- Integrity — whether all expected relationships between data in multiple data stores, tables and files etc. are intact
- Validity — whether the values held fall within the allowable domain of values established for an attribute
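To make this a little more concrete, here is a minimal sketch in Python of how four of these dimensions might be turned into simple pass-rate metrics. Everything in it is an assumption made for illustration: the sample records, the attribute names, the allowed country codes and the function names are all hypothetical, and a real rule set would come from your own critical data elements and KPIs. Accuracy is deliberately left out, because it needs a trusted real-world reference to compare against.

```python
# Hypothetical sample data: customer records held in two systems, plus orders.
crm_customers = [
    {"id": 1, "email": "ann@example.com", "country": "GB"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "raj@example.com", "country": "ZZ"},
]
billing_customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": "raj@example.com"},
]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 3},
    {"order_id": 102, "customer_id": 9},  # refers to a customer that does not exist
    {"order_id": 103, "customer_id": 2},
]
ALLOWED_COUNTRIES = {"GB", "US", "DE"}  # assumed domain of valid values


def pass_rate(results):
    """Share of checks that passed (1.0 when there is nothing to check)."""
    results = list(results)
    return sum(results) / len(results) if results else 1.0


def existence(rows, attribute):
    """Existence: is a value being held for the attribute?"""
    return pass_rate(row.get(attribute) not in (None, "") for row in rows)


def validity(rows, attribute, allowed):
    """Validity: does the value fall within the allowable domain?"""
    return pass_rate(row.get(attribute) in allowed for row in rows)


def consistency(rows_a, rows_b, key, attribute):
    """Consistency: is the attribute the same in both locations?"""
    b_by_key = {row[key]: row for row in rows_b}
    shared = [row for row in rows_a if row[key] in b_by_key]
    return pass_rate(
        row.get(attribute) == b_by_key[row[key]].get(attribute) for row in shared
    )


def integrity(child_rows, foreign_key, parent_rows, parent_key):
    """Integrity: does every child record point at an existing parent?"""
    parent_ids = {row[parent_key] for row in parent_rows}
    return pass_rate(row[foreign_key] in parent_ids for row in child_rows)


scores = {
    "existence(email)": existence(crm_customers, "email"),
    "validity(country)": validity(crm_customers, "country", ALLOWED_COUNTRIES),
    "consistency(email)": consistency(crm_customers, billing_customers, "id", "email"),
    "integrity(orders -> customers)": integrity(orders, "customer_id", crm_customers, "id"),
}

for rule, score in scores.items():
    print(f"{rule}: {score:.0%}")
```

Each score is simply the fraction of records that pass a rule, which is the kind of number that can be tracked over time and set against a business measure. Which threshold counts as ‘fit for purpose’ is a business decision, not something the code can decide.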
There are actually quite a lot more data quality dimensions that you could use, and the above are described only in shorthand, but here’s the thing: choose the ones that make sense, are practical and move you forward. You can get cleverer as you go. What I may do next week is follow up on these dimensions with the more subjective ones and explain how these relate to trust.
If this has sparked an interest in data quality topics (this or others), do let me know. I am in fact endlessly fascinated by this subject and by information in general, which inevitably leads to geek-off competitions with my lovely colleagues at Gartner.