SearchDataManagement Asks: “Is it too soon for unstructured data governance?” I say, “It’s long overdue.”
I had no choice: as soon as I read the headline, I had to click through to the article. I had received an email alert from SearchDataManagement about an article called “Is it too soon for unstructured data governance?” The focus was big data, since much of that tends to be unstructured. What is actually meant by “unstructured,” though, is that there are many different data sets from different sources, and each may have either no structure at all (unlikely) or its own structure (taxonomy, conformed dimensions, schema, metadata data model, semantic model, etc.). And so the resulting “lack of governance” actually means “no single lens or compliance information model with which all sources must comply for entry (perhaps into the data lake).” In other words: no information governance, no barrier to entry.
I commented on this problem recently in relation to the growing popularity of data lakes. See “Making Sense of the Information in your Data Lake – adding structure.”
I feel the question is a bit of a ruse. How can you hope to repeat or build on the knowledge and insight gleaned from any analysis of your data if you don’t preserve some form of structure? The answer should be self-evident. What I think is different these days is the degree and form of information governance that should be overlaid on our data lakes and stores. The classic Enterprise Data Warehouse (EDW) had too much: all data had to comply 100% to gain entry into the warehouse. At the other extreme, a data lake has no barrier to entry at all: no conformed dimension requirements. What we need these days is something in between those two extremes, and I have heard the same point recently from end users who have worked with data lakes. So no, it’s not too soon for unstructured data governance. It is long, long overdue.
Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.