To assure you that Gartner analysts are human and have a wry sense of humor, here is a story. Last Friday I was wrapping up my day. I had just taken a briefing from a vendor that sports a ‘managed data lake’ platform. Well, if you ever want some free entertainment and you have run out of your favorite Netflix program to binge-watch, just ask a Gartner analyst, “What is a platform, anyway?” If you didn’t know, this is an easy jibe to get the mind of the analyst going.
Well, it was Friday, and I was headed out. But as I sat there processing the last of my emails, the message this vendor had put into my head created a graphic. I could not help myself, so I put the graphic down, and in so doing, a specific image came to mind. The background goes like this – and this is not exactly scientific, let alone “correct”:
- A data lake is really just a staging area for data between a number of sources and some kind of consumer/consumption. We have had staging areas before. However, some vendors – and so some clients – are treating the “staging area” as if it were the end, not the means to something else.
- A data lake, by definition, is un-managed data. Now, what does this mean? It was meant to imply that there is no serious amount of control or governance (policy setting) or stewardship (policy enforcement). Clearly, the fact that there is a store of some kind means there is some management – but maybe you get the idea.
- Some vendors started to offer cool tools to access the data lake and offer up insights to analytics types. Think of these as “analytic apps”. These are not unlike, conceptually, the “analytic apps” that were built on top of data warehouses in the past.
- Some vendors went further and offered up entire platforms – that is, sets of broad capability that could serve up any number of analytical uses. These platforms ended up developing a range of services that, to all intents and purposes, offer “management” capability of the lake. The funny thing about this is that the capabilities look uncannily like the same capabilities we all tried to deploy on top of our data warehouses. And you know how well that went.
- This last batch of vendors (and I digress here for a moment) also “discovered” the idea of ‘data discovery’ and in so doing they thought they had invented sliced bread. Of course, the reality is that we had access to “semantic discovery” tools long ago. But at least this new lot had some interesting capabilities with AI and inference. Anyway, back to the main story…
So a picture came into my head that tried to capture the difference between the data lake itself (all the data gathered together) and that part of the data lake that was subject to “management” (according to the vendor I last spoke with). So here is the image I came up with:
Since the graphic reminded me of a cake, I came up with the glib line, ‘from data lake to data cake’. I then realized that I had missed the more recent “analytic platform” vendors so I came up with an adaptation that looked like this:
I then sent these to one of our email distribution lists “for fun”. After all, it was Friday evening. Well, the humor of my colleagues was overflowing. Here are some of the nice ideas that came back. From Mike Rollings we had the data cake “tug boat”:
And to top off the evening’s entertainment we had this from Doug Laney, the ultimately embellished data lake (er, cake) concept: