
The Logical Data Warehouse and its Jobs to be Done

By Henry Cook | January 28, 2018 | 1 Comment

Data and Analytics Leaders, Data Management Solutions

Over the holidays I reread parts of Clayton Christensen's fabulous books on innovation and disruptive change. If you have not read The Innovator's Dilemma, The Innovator's Solution or Competing Against Luck, I highly recommend them. Systems architects will be interested in his discussion of monolithic and modular architectures, and their relationship to the dynamics of innovation and commoditization. I wrote up notes on how his ideas apply to the Logical Data Warehouse (LDW). It wasn't obvious which paper I might put this in; then my friend and colleague Mei suggested it would make a good topic for a blog, so here it is! I'll summarize the theory, then show how it relates to the evolution of the LDW. I think it explains a lot of what we see.

What are Jobs to be Done?

Classic disruptive innovations include Sony with the Walkman, digital photography (which Kodak had to compete with), and Apple's iPhone, which had the best product launch presentation ever. Christensen's theory of "jobs to be done" revolutionized thinking about innovation. His thesis is that customers buy a product or service because they wish to make progress with a "job", or problem, in their life. They wish to "hire" something to help them make progress with that job.

Someone may want something to keep them entertained in a dentist's waiting room. They can "hire" the New York Times, a freesheet newspaper, a smartphone or a pack of playing cards. The newspaper is not competing only by being better written and better informed than other newspapers. For this job, it competes against completely different products that can fill the same need in the customer's life.

Christensen observes that market incumbents often compete by adding features, attempting to differentiate themselves and justify a higher price. However, by doing this they can also become overqualified, and too costly, for newly emerging jobs.

For example, companies selling photographic film and equipment competed by increasing the quality and features of their cameras. Then the disposable camera was invented. Many people wanted to take pictures inexpensively on trips and at parties. They were not expert photographers, so they did not value the expert features of existing cameras and were not prepared to pay for them. However, they were perfectly happy to buy some inexpensive film (which happened to have a camera wrapped around it). The first film manufacturer to realize this succeeded in greatly expanding its market, but not by turning casual users into expert photographers. Instead, it found and filled a latent need, a new job to be done.

The Jobs to be Done by an LDW

Job #1 – The Data Warehouse Job to be Done

We see a similar pattern in the Data Management Solutions for Analytics (DMSA) market space. There were already several vendors with richly featured products doing “Job #1” for these systems – the analysis of terabytes of structured data.

However, with the arrival of so-called "big data", a shift occurred. Loading the new, less structured data types into the existing RDBMS products would be hard, because the data models had to be defined in advance. It would also be prohibitively expensive to hold and process the very large volumes of data, possibly petabytes, under the existing software licensing models. The arrival of this data created a new job to be done: Job #2. People found that the newer NoSQL / non-relational systems were better suited to holding and analyzing this data. Open-source, MapReduce-style software fitted this set of requirements from both a functionality and a cost point of view.

Job #2 – The Job to be Done of the Data Lake

Customers who needed to do Job #2 were not concerned about all the other jobs that a traditional RDBMS would do, that is, Job #1. After all, they already had existing RDBMS systems that did that.

Organizations added Hadoop and other non-relational systems to their portfolios. From a DMSA perspective, this explains the need for, and rise of, a different kind of analytical engine. However, the new systems were standalone. This was OK at the time, since they provided additional value that existing systems could not, for technical or economic reasons. Note that the emergence of new platforms that addressed Job #2 did not automatically mean they could perform Job #1 in the same way as the original platforms.

As the market and technology matured, it became apparent that treating the existing and new systems in an uncoordinated way was sub-optimal. It is better to treat these systems as complementary engines rather than competing solutions, as I noted in my previous blog. For example, when analyzing sensor data from a manufacturing line, it is useful to combine it with the structured data that describes the machinery and what it is being used to produce. When this is required, it is obviously better if you can do it quickly and efficiently.

Thus Job #3 arose: the integration of the new and the old engines. This is the job of the LDW, the next-generation data warehouse. This next-generation DMSA is itself constructed from other DMSAs.

Job #3 – Treating All Data and Analysis as a Unified LDW System

Fulfilling Job #3 allows us to analyze data independently of applications, no matter what form it is in or where it comes from. This should ring a bell: it was the mission of the original data warehouse. The LDW does with logical integration what the original DW did with physical integration.
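To make logical integration a little more concrete, here is a minimal Python sketch of the manufacturing example above. It assumes two stand-in engines: an in-memory SQLite database playing the analytic RDBMS (Job #1) and a list of JSON lines playing raw sensor files in a data lake (Job #2). All table, field and function names are hypothetical. The "logical" layer asks each engine for the part of the work it is suited to and joins the results at query time, instead of first copying everything into one physical store. Real implementations typically rely on data virtualization or federated query engines; this only illustrates the principle, not any product.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical stand-in for the analytic RDBMS (Job #1): structured machine data.
def make_rdbms() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE machine (machine_id TEXT PRIMARY KEY, model TEXT, product TEXT)"
    )
    conn.executemany(
        "INSERT INTO machine VALUES (?, ?, ?)",
        [("m1", "Press-200", "brackets"), ("m2", "Lathe-9", "shafts")],
    )
    return conn

# Hypothetical stand-in for the data lake (Job #2): raw sensor events as JSON lines.
# Fields vary per event; no schema was defined up front.
LAKE_EVENTS = [
    '{"machine_id": "m1", "temp_c": 71.5}',
    '{"machine_id": "m1", "temp_c": 74.0, "vibration": 0.3}',
    '{"machine_id": "m2", "temp_c": 65.0}',
]

def avg_temp_by_machine(lines):
    """Apply a schema at read time and aggregate inside the 'lake' engine."""
    totals = defaultdict(lambda: [0.0, 0])  # machine_id -> [sum, count]
    for line in lines:
        event = json.loads(line)
        acc = totals[event["machine_id"]]
        acc[0] += event["temp_c"]
        acc[1] += 1
    return {mid: s / n for mid, (s, n) in totals.items()}

def logical_query(conn, lake_lines):
    """Job #3 sketch: combine results from both engines at query time,
    leaving the raw sensor data where it is."""
    lake_result = avg_temp_by_machine(lake_lines)
    rows = conn.execute(
        "SELECT machine_id, model, product FROM machine ORDER BY machine_id"
    )
    return [
        (mid, model, product, lake_result.get(mid))
        for mid, model, product in rows
    ]

if __name__ == "__main__":
    for row in logical_query(make_rdbms(), LAKE_EVENTS):
        print(row)
```

Each output row pairs the model and product from the relational side with the average temperature computed on the lake side, so machine m1 gets 72.75, the mean of its two readings.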

For technical architects, it will be important, and interesting, to watch how this develops. Analytic RDBMS vendors can also respond to the new needs, including support for unstructured data types, in a couple of ways. They can:

  • Provide support for more voluminous data on less costly storage devices, and make use of commodity processing power. These "hot and cold data" techniques are well understood. At the same time, they can add new data operators to process JSON, XML, text or other less structured data types.
  • Alternatively, integrate the non-relational distributed engines into a federated structure with their original RDBMS core engines. Multi-model DBMSs are now becoming common.
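The "hot and cold data" idea in the first option can be illustrated with a simple age-based placement rule. This is a minimal sketch under stated assumptions: the 90-day threshold, the Partition type and the partition names are all hypothetical, and a real product may use richer policies based on access frequency, cost or service levels.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Partition:
    name: str            # hypothetical partition name, e.g. table plus quarter
    last_accessed: date  # when a query last touched this partition

def assign_tiers(partitions, today, hot_days=90):
    """Place partitions used within the last `hot_days` days on fast (hot)
    storage and everything older on cheaper (cold) storage. This only decides
    placement; moving the data is left to the storage layer."""
    cutoff = today - timedelta(days=hot_days)
    tiers = {"hot": [], "cold": []}
    for p in partitions:
        tier = "hot" if p.last_accessed >= cutoff else "cold"
        tiers[tier].append(p.name)
    return tiers

if __name__ == "__main__":
    parts = [
        Partition("sales_2017_q4", date(2018, 1, 10)),
        Partition("sales_2016_q1", date(2016, 4, 2)),
    ]
    print(assign_tiers(parts, today=date(2018, 1, 28)))
    # {'hot': ['sales_2017_q4'], 'cold': ['sales_2016_q1']}
```

Keying the rule on last access rather than creation date keeps frequently queried history on fast storage; either choice is a policy decision, not a fixed technique.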

In the early stages of big data, there was an assumption that the new technology would replace the existing RDBMS technology. That has clearly not happened.

With hindsight, it is clear how these different processing engines complement each other. Each component does a job, or part of a job, that another is less suited to do.

New technologies will continue to emerge and evolve. By taking a modular and integrated approach, as the LDW does, our ability to blend them together to meet new requirements and derive new benefits will evolve with them.

There are many other aspects of product innovation and design discussed in these books, so the next time you need some good beach reading, they make a great choice, particularly the first.
