By Mark Beyer and Cindi Howson
How do I get to my data faster? Every analyst in every organization asks this question almost as soon as they start moving up the skills curve. The answer is simple and direct—provide agility to the competent and protect the uninitiated from their over-confidence. Modern BI and analysis tools are highly agile—making them potentially more dangerous and even threatening to people invested in maintaining the status quo. Who are these skilled analysts? And, who is in jeopardy? Is this freedom from the data management elite oligarchy, or are we on the cusp of anarchy?
A few weeks ago, the BI team published a Technology Insight on what constitutes a modern BI platform. A few key characteristics include:
- No intervening semantic model required
- Data warehouse not a prerequisite
- Self-service data preparation for business users (different from full-blown ELT)
The definition of a modern BI platform presents a stark contrast to what has defined Enterprise Reporting and Analytic platforms of the last two decades—and it is time for a change. Upheaval, maybe?
BI platforms and analysts THRIVE in the world of semantics. The existence of a semantic layer is part of what made BI platforms so great. It makes no difference if it is a universe in SAP BusinessObjects or a Framework Manager model in IBM Cognos, or something else. The semantic tier provides a nice, friendly business view of the data and ensures governance and re-usability. That is the theory. But data forms and formats are a semantic as well. Semantic to semantic interpretation is the key to communication and generally, an interpretation semantic is required to hook things together. When the interpretation tier is missing, the communication becomes “fixed” and gradually stale. The data warehouse was supposed to be an interpreting tier—between data at the source and data in the analysis use-case. Unfortunately, data models are “fixed” semantics as well. The 80/20 rule of data warehousing was born—eighty percent of analysis only uses twenty percent of the data and in a given, “fixed” model.
The BI semantic tier became unfriendly because users started pursuing the other twenty percent of analysis, which is not easily done with the fixed data warehouse model; the result was convoluted representations of complex combinations of data from the warehouse with other systems and in varied models. Then came hub-and-spoke, and it soon was all about data management and not analytic outcomes. Rules of what could and couldn’t be done started to appear. The data managers lost the business user focus: they locked down the warehouse and effectively “kicked out” the twenty-percenters. Arguably, the highest value analysis is found in that twenty percent. It represents innovation, new insight, and alternative outcomes, and it expands to include outliers as indicators or drivers.
A slight divergence, pet-peeve time. How many times have you heard someone say, “We are only arguing semantics?” Probably quite often. Let me give all IT professionals some very pointed advice—never say this in a data management meeting (and don’t say it in any meeting with business users, because they’ll quietly think, “this is why we don’t talk to IT”). Data is a semantic representation of information. It is a multi-level, abstracted representation of logical concepts that are attempting to identify a commonly held context. In other words, data IS semantics—IT professionals who dismiss semantics should start looking for new work.
So, we end up with two semantic designers—the analysts and the data managers—and they each have their favorite tools and architectural preferences. If your analysis demands a set of well modeled data that is pervasive in use, needs persistence for audit and trending and can tolerate even minimal latency to get it prepared—then even the most modern BI platform should leverage a repository where the data is ready and waiting. The Modern BI and Analytics Platform includes that.
But what if the company goes through a merger and you now have a new set of data? The business should not have to wait a year before they can analyze data from the acquired company. And what about external data sources, such as social data or economic data? That is a demand for multi-level semantics with at least one interpretation tier.
When the requirements are not understood, you need a model-as-you-go approach. Transform the newly acquired product codes and mappings as you analyze. This becomes the requirements definition for a more scalable solution down the road.
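To make the model-as-you-go idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the product codes, the `code_map` dictionary, and the `analyze` helper are hypothetical and do not come from any particular BI tool. The point is only that the analysis runs against a partial mapping, reports what it could not interpret, and the analyst extends the mapping on the spot.

```python
from collections import Counter

# Hypothetical mapping from the acquired company's product codes to our own.
# It starts incomplete and grows as the analyst works through the data.
code_map = {"AX-100": "P-001", "AX-200": "P-002"}

# Hypothetical sales rows from the acquired company: (product code, units sold).
acquired_sales = [("AX-100", 5), ("AX-200", 3), ("AX-300", 7), ("AX-100", 2)]


def analyze(rows, mapping):
    """Total units per mapped product code, plus the codes that have no mapping yet."""
    totals = Counter()
    unmapped = set()
    for code, units in rows:
        target = mapping.get(code)
        if target is None:
            unmapped.add(code)      # surface the gap instead of failing
        else:
            totals[target] += units
    return dict(totals), sorted(unmapped)


# First pass: the analysis works, and it reports what it could not interpret.
totals, unmapped = analyze(acquired_sales, code_map)

# The analyst recognizes AX-300 and extends the mapping mid-analysis.
code_map["AX-300"] = "P-003"
totals_after, unmapped_after = analyze(acquired_sales, code_map)
```

The mapping that accumulates in `code_map` is exactly the kind of artifact that can later serve as the requirements definition for a more scalable solution.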
We all know that the most-skilled analysts are the best for developing those requirements. They have titles like “category manager” or “operations team lead” or “actuary” and more. Some of them are steeped in source system knowledge and can recognize the difference between a data error and an actual outlier with a glance. These users don’t need much interpretation in their semantics because they ARE the interpreter—they just need access to start developing those models. Will they use the same tools as more casual users? Will they use the same tools the same way? The answer is probably not. They need agility. They need to control their own semantic model—some way, somehow.
Databases have provided external file and table views for years. Data virtualization tools provide the capability to link logical models (designed and built by analysts, not IT) with physical data assets of any structure, including the mythical “unstructured” data types. But so can modern BI platforms. And familiar tools that can quickly turn exploration and model discovery into a robust, repeatable delivery are easier to use than data management tools. Tools such as Tableau allow for modeling on the fly, as you analyze. What we need in the modern BI platform is the next step: promote that user-centric model to a shared area, for re-usability and governance. Ideally, leverage the smarts of the system to advise when a data model should be promoted, and allow for sharing at a granular level, down to the individual metric calculation. The difference is in the starting point. It’s all about semantics.
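As a rough illustration of a system advising when something should be promoted, the Python sketch below flags individual metric calculations once enough distinct users rely on them. The usage log, the metric names, and the thresholds (`min_users`, `min_queries`) are all assumptions invented for this example; a real platform would define its own signals and cut-offs.

```python
from collections import defaultdict

# Hypothetical usage log: one (user, metric) entry per query against a user-built model.
usage_log = [
    ("ana", "gross_margin"), ("ben", "gross_margin"), ("cy", "gross_margin"),
    ("ana", "gross_margin"), ("ana", "churn_score"), ("ana", "churn_score"),
]


def promotion_candidates(log, min_users=3, min_queries=4):
    """Return metrics used by enough distinct users, often enough, to suggest promotion."""
    users = defaultdict(set)
    queries = defaultdict(int)
    for user, metric in log:
        users[metric].add(user)
        queries[metric] += 1
    return sorted(
        metric
        for metric in queries
        if len(users[metric]) >= min_users and queries[metric] >= min_queries
    )


candidates = promotion_candidates(usage_log)
```

Here `gross_margin` qualifies because three different people depend on it, while `churn_score` stays where it is: one analyst's working definition, not yet a shared one. The decision to actually promote would still rest with the information stewards.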
The data warehouse has come a long way, but bluntly, it never went anywhere until it had a face, and BI tools were that early face. Now, data management and integration have to include new styles of deployment. Underlying infrastructure should support adept business users in pursuing their business focus and protect the novice from misinterpretation or mistakes. The challenge for information stewards will be to determine when new data and new definitions should be shared with others, while also giving analysts the freedom to use tools and techniques to do their jobs.
The ability to promote user-generated models to a governed model is a key evaluation criterion for a modern BI platform. As a sweeping generalization, most are weak in this area and there is room for improvement across the industry. More of a pipe dream is the concept of open models, across tools (see The Rise of Data Discovery Has Set the Stage for a Major Strategic Shift in the BI and Analytics Platform Market http://www.gartner.com/document/3075117). Imagine a user creating a model in, say, Qlik that SAP BusinessObjects could then consume, or vice versa. Or, let me take that data preparation process I created in Alteryx and extend it with Informatica’s PowerCenter (or Denodo through data virtualization). Most attempts at interoperability of data models and data integration for BI platforms have been through third parties, with limited adoption.
But, even better, what if I could promote multiple models together and then run a rationalization process that examines user queries and frequency of use for data objects, or even ties them to the Quarterly Earnings statement to detect the lag between new analyses and their effect on bottom-line results? This is the challenge for the modern analytics environment.
- You need a semantic tier somewhere. Why not the BI Platform just as much as the database or an independent semantic virtual tier? But modeling and deploying a repository is not necessarily the starting point.
- The data warehouse repositories can feed a BI tool, and a user-generated data model may in turn shape the design of the repository.
- Modern BI tools give users flexibility. That doesn’t mean no governance. It means the governance model includes freedom, with business and IT as partners. You want the right amount of governance, which in some cases may be none at all (“no management required” is still a governance model).
- Make sure your governance model includes management that assesses at what point in time to promote content to a shared, governed area—and when it needs to stay right where it is.