Date: 2023-07-05

I blogged a few months ago about the high level of hype and confusion across Data and Analytics. Here is the original blog from March 2023: Summing Up Three Days at Gartner’s Data and Analytics Conference in Orlando, Florida, USA. The hype and confusion I registered in the US were also visible when I attended our D&A Conference in London, UK, a few weeks later. Interestingly, the hype and confusion were markedly lower when I was in China just a few weeks after that.

One of the points of confusion is with catalogs – or data catalogs – or analytics catalogs or metrics stores. The fact that there are different names is one thing. Here I repeat what I wrote in the original blog:

Use cases for a data catalog

Analytics use cases are quite different to governance use cases. Too often they are conflated. Several 1-1s asked how to get business folks involved and excited to work with a data catalog in support of a governance program. That is the wrong question. For governance, business is already interested but they would only really care about what should be named a glossary. Even the data dictionary might be used selectively by a steward (in the business) for root cause analysis. Those are subsets of a much larger catalog. See Quick Answer: What Are Differences Between a Data Dictionary, Business Glossary and Data Catalog?

I was on a briefing today with a vendor (who will remain nameless) where they described what is in their “data catalog”. After I listened and probed, it seems there is a lot less “catalog” in the catalog! Here is what the vendor said, and I paraphrase: “The catalog is the place where we inventory your data. We also store all the history, lineage, policies, data owners, rules, conditions, and supporting data to help with governance.” This is not much of a catalog at all, and is closer to being a D&A governance and stewardship solution.

To catalog, or not to catalog

As I noted above, there is a clear use-case for a catalog in analytics. When a business analyst or data science leader is building a model, they will often start out by searching a catalog for data. This may include a search for data sets and even previous analytics models that might be leveraged. These catalogs could be called data catalogs (for data or data sets), or metrics stores (for models and metrics). Why some part of the market started to use “metrics store” and another part went with “data catalog” or “analytics catalog” is beyond me. They are all catalogs. They are inventories of things.
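To make the "inventory of things" point concrete, here is a minimal sketch of the analytics use case: one inventory that holds data sets, models and metrics, and a keyword search over it. Everything in it (the `CatalogEntry` and `Catalog` names, the fields, the sample entries) is a hypothetical illustration, not any vendor's product or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One inventoried thing. All field names are illustrative assumptions."""
    name: str
    kind: str  # "dataset", "model" or "metric"
    description: str
    tags: frozenset = field(default_factory=frozenset)


class Catalog:
    """A catalog is just an inventory plus a way to search it."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def search(self, keyword, kind=None):
        """Return entries whose name, description or tags contain the keyword."""
        needle = keyword.lower()
        hits = []
        for entry in self._entries:
            if kind is not None and entry.kind != kind:
                continue
            haystack = " ".join(
                [entry.name, entry.description, *sorted(entry.tags)]
            ).lower()
            if needle in haystack:
                hits.append(entry)
        return hits


# Hypothetical sample inventory.
catalog = Catalog()
catalog.register(CatalogEntry(
    "customer_orders", "dataset", "Order lines by customer", frozenset({"sales"})))
catalog.register(CatalogEntry(
    "churn_model_v2", "model", "Predicts customer churn", frozenset({"retention"})))
catalog.register(CatalogEntry(
    "net_revenue", "metric", "Revenue after returns", frozenset({"sales", "finance"})))

# An analyst starting a new model searches across everything at once...
customer_hits = catalog.search("customer")
# ...or narrows the search to one kind of thing, as a "metrics store" would.
sales_metrics = catalog.search("sales", kind="metric")
```

Whether the entries are called data sets or metrics, the structure and the search are the same, which is why separate names for the same kind of inventory add little.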

With what this vendor called “a data catalog for data governance” during today’s briefing, they were really speaking and selling to the D&A governance use case. And both use cases, analytics and governance, are being sold a catalog of some kind. This is the source of confusion. The governance use case is nothing like the analytics use case. No business role in their right mind sits there during a normal day and asks themselves, “I wonder what data to look at, today?” The world of governance is exception based. It is not about developing metrics and models for analysis – though there are some metrics and analysis that have to be done. That is confusing the point.

Being clear about the use case

The work we do in D&A governance and stewardship (setting and enforcing policy) should not take place “inside a catalog” at all but “inside a solution designed to serve the needs of governance and stewardship”. You would not use an EDW when you need ERP, nor would you talk about the database that sits at the foot of the ERP application; it’s included. Stewardship and governance solutions are what we should be referring to, not catalogs. Even if a data catalog plays a role deep inside the body of the governance and stewardship solution, that solution is not a catalog of any kind. ERP is not a database, though it contains one. We should split out and clarify the roles and use cases at which technology is being thrown.

One last point: Back to the use cases. I am taking rather too many calls from clients who are “turning off” their catalogs and marketplaces. In the catalog scenario, the problem is that the wrong catalog was sold. Perhaps a catalog that was originally designed for analytics has been augmented with some governance capability, and sold into that use case. Some time later the end user realizes there is a gap between what they really need, and what they have. The reverse is also true: selling a governance solution, that happens to include a catalog, into the analytics use case. That client also gets unhappy about 5 months into the project.

Ask what you can do for the user, not what the user can do for you

Business roles will not govern data in a catalog – they have no use for it, no need for it, and it’s just silly. They WOULD have some opportunity to govern the glossary! And guess what – there are other names that we have collectively invented over the years that tend to mean the same thing:

Glossary

Business metadata, and

Master data.

Those are the things business folks would care about, since they use that data every day. The other 96% of data in the catalog has little meaning to them. And for the most part, never will.

Even more confusion

I saw a slide during the vendor briefing and it is quite consistent with general messages in the market. The slide contrasted privacy applications with security applications and with governance applications. This is inconsistent. Privacy and security applications are examples of governance apps. More precisely, they are all applications that address different governance policy classes. What was documented under governance apps by this vendor was really a mix of other types. One or two were policy-focused solutions themselves, such as data quality.

There was also reference to stewardship solutions that should be used across all policy areas, including privacy and security. In effect there should be solutions that address a range of policy classes such as security/access, privacy, quality, retention, ethics and standards. See Effective Data and Analytics Governance Includes a Range of Policy Types.
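As a rough sketch of that idea, the snippet below models the policy classes named above as one enumeration and runs a single stewardship routine across all of them, surfacing only the exceptions. The `PolicyClass`, `Policy` and `find_exceptions` names, the two sample policies and the sample records are all assumptions made for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class PolicyClass(Enum):
    """The policy classes listed in the text."""
    SECURITY_ACCESS = "security/access"
    PRIVACY = "privacy"
    QUALITY = "quality"
    RETENTION = "retention"
    ETHICS = "ethics"
    STANDARDS = "standards"


@dataclass
class Policy:
    """A hypothetical policy: a name, its class, and a violation check."""
    name: str
    policy_class: PolicyClass
    is_violated: Callable[[dict], bool]


def find_exceptions(records, policies):
    """One stewardship routine for every policy class: report only violations."""
    exceptions = []
    for record in records:
        for policy in policies:
            if policy.is_violated(record):
                exceptions.append(
                    (record["id"], policy.policy_class, policy.name))
    return exceptions


# Hypothetical policies from two different classes.
policies = [
    Policy("email must be present", PolicyClass.QUALITY,
           lambda r: not r.get("email")),
    Policy("no consent, no marketing use", PolicyClass.PRIVACY,
           lambda r: bool(r.get("marketing")) and not r.get("consent")),
]

# Hypothetical records: the first is compliant, the second is not.
records = [
    {"id": 1, "email": "a@example.com", "marketing": True, "consent": True},
    {"id": 2, "email": "", "marketing": True, "consent": False},
]

exceptions = find_exceptions(records, policies)
```

The steward only ever sees the second record, once per violated policy, which fits the exception-based nature of governance work: privacy and quality are handled by the same stewardship routine, not by separate categories of application.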

Back to the future

After writing this blog, I asked myself: Surely I had written this blog before, yes? Yes, I had. Here it is: When is a Catalog No Longer a Catalog? This was from 2018. If you can afford the time, it will make you smile. It’s the same message!