Gartner Blog Network


A Data Lake without any information governance is a data cesspool

by Andrew White  |  September 9, 2015  |  3 Comments

I had the good fortune of speaking at two briefings last week – one in Dallas and the other in Houston.  The sun was out, the heat was on, and the conversations with clients were awesome.  I was presenting on sustaining and updating a business-relevant digital information strategy.

As part of each briefing, the attending analysts also take part in 1-1's (one-on-one meetings) with attendees.  So even though I spoke as part of the event, I also took part in a number of 1-1's in both locations.  All told, I had about 20 one-on-one conversations with attendees.  I was amazed to note that in about 75% of them, "data lakes" were mentioned.  Sometimes this was the main focus of the conversation; other times it was part of the wider need to manage, govern and/or exploit information and analytics.

What was even more interesting was the emerging uniformity in the questions asked, in the understanding of the data lake, and in the lack of information governance around it.  It seems the idea that a data lake is just a place to collect data – all kinds of data, in any state – is becoming quite widespread.  For those firms that have experimented with a data lake, further notable discoveries surface quite quickly:

  • A data lake does not, on its own, bring any capability to support the broad topic of information governance; and without information governance, no one can reuse someone else's work or build further insight on it.
  • Vendors in and around the data lake space are all now talking about "governance" or "information governance" – even if that means nothing more than an updated PowerPoint slide.  In some cases the talk may extend to metadata management and even, if you are lucky, "data lineage".

It seems we still have quite a bit to discover:

  • What exactly is information governance?
  • What range of technology capability is needed to sustain information governance?
  • How does information governance change in a data lake/big data situation versus a traditional data warehouse, or even an operational business application environment?

This will keep us all quite busy for a while, I would think.



Andrew White
Research VP
8 years at Gartner
22 years IT industry

Andrew White is a Distinguished Analyst and VP. His roles include Chief of Research and Content Lead for Data and Analytics. His main research focus is data and analytics strategy, platforms, and governance.


Thoughts on A Data Lake without any information governance is a data cesspool


  1. One important aspect of governance is to have clarity over which user has access to which data. Using technology that provides the level of granularity needed to match every business need is critical. This also feeds directly into better security – what we refer to as data-centric security at BlueTalon.

  2. Andrew White says:

    Hi Isabelle,
    Thanks for dropping by and leaving a comment. I agree with you – user access is certainly part of an overall information governance framework.
    Thanks again,
    Andrew

  3. Nice piece. Elsewhere on the web, Martin Fowler of Thoughtworks discusses the important distinction between the Data Lake and the Lake Shore, where lakes handle raw data and shores handle curated information.

    The Data Lake/Lake Shore approach is an evolution of the traditional data warehouse model: the Operational Data Store maps to the data lake (Hadoop), and the data warehouse is morphing into lake shores thanks to the enhanced capabilities of NoSQL databases such as graph and search databases. Lake shores will power systems of insight and machine learning capabilities because they represent contextual, governed data.




Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.