In my last blog, I mentioned some books I’d recently rediscovered. This reminded me of two more books that many IT architects would find interesting – and that in turn reminded me of a concept that comes up repeatedly in my calls and is worth passing on as it can be a big help.
The concept is: designing a Logical Data Warehouse (LDW) is like town planning, not like designing a building. When looked at from this perspective, the design of an LDW is much more straightforward.
But I’m getting ahead of myself. The books are by Christopher Alexander: Notes on the Synthesis of Form and A Pattern Language. Alexander was an architect of buildings and of urban spaces. In his books Alexander talks about the process of design, how shapes are combined and decomposed to meet a design aim. He has influenced many fields, including enterprise IT architecture, as this article in the design magazine Slate illustrates. The book about IT architecture Design Patterns is acknowledged to be heavily influenced by him – see the Code to Joy blog for a reference.
When designing an LDW we are reconciling very different information requirements. We already understand that it is not possible to reconcile these very different requirements using a single server; therefore we need to design a system that is an integration of multiple servers. Those multiple servers typically include a data warehouse, a data lake, data marts, maybe an operational data store, and other specialist data stores and servers.
In the past, when designing a central data warehouse, or a data lake, we’ve focused on a single system. This is like defining the architecture of a single building. However, with the LDW we are, of necessity, designing an architecture with multiple servers, or processing engines.
I’d like to propose we think about this in terms of city planning. In this analogy I’ll use Barcelona in Spain as an example. Different areas of a city serve different purposes. Barcelona is on the Mediterranean coast and has beach areas, mainly devoted to leisure. This is one of the main attractions of the city (see picture above). But there are also downtown city office and retail areas, manufacturing and industrial estates, and suburbs where people live. There is an airport located out of town, and a railway station in the city center.
There are other specialist areas too. After Sao Paulo, Brazil and Mexico City, Barcelona’s Camp Nou is the third biggest football stadium in the world. It has a capacity of 110,000 people and at every match practically every seat is taken. This clearly has to be a dedicated facility. However, also consider what happens when 100,000 people leave the football stadium at the same time. Apart from the issue of how to accommodate that number of people during the football match, we have to consider the connections: how to move them in and out of the stadium at the start and end of the match. In the LDW we have very big servers and need to consider how to accommodate all the data in each. However, it’s equally important to think about how to facilitate large volumes of data moving in and out of the servers. This may not be necessary between all components, but certainly between the data warehouse and data lake components we’d expect to make use of some kind of mass data movement capability.
Being on the Mediterranean coast has other implications. Those fortunate enough to visit Barcelona know it is not just a pleasant tourist destination. It is also a thriving commercial port, for container cargo ships, cruise liners and very large passenger ferries. You can see all this from the unique vantage point of a very special restaurant atop a tower that overlooks the sea, which I highly recommend, if you’re lucky enough to be there.
All this commercial activity is essential to the economy of Barcelona, but needs to be kept separate from the leisure activities around the beach areas. Both areas need to be easily accessible. This is analogous to having to handle large volumes of data, and undertake different kinds of processing, but having to keep them separate. We do this in the data warehouse with data subject areas, or by spinning out specialist processing to marts, or maybe to the operational data store. In the data lake we do this by keeping data in different ‘zones’. We do it at a higher level in the LDW by arranging our data and distributing our workload among these separate servers.
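To make the zoning idea concrete, here is a minimal sketch of a "zoning plan" that assigns each kind of workload to the component dedicated to it. The workload names, the component names and the `place` and `group_by_component` helpers are all assumptions made up for illustration; a real LDW would hold this kind of mapping in its own metadata or workload management tooling.

```python
from collections import defaultdict
from enum import Enum


class Workload(Enum):
    """Illustrative workload types (hypothetical, not an exhaustive list)."""
    KPI_REPORTING = "regular KPI and tactical reporting"
    EXPLORATION = "exploratory analysis on raw data"
    OPERATIONAL = "operational reporting"
    SPECIALIST = "specialist processing"


# Hypothetical zoning plan: each workload type has one "district" (component)
# that is designed to serve it well.
ZONING_PLAN = {
    Workload.KPI_REPORTING: "data_warehouse",
    Workload.EXPLORATION: "data_lake",
    Workload.OPERATIONAL: "operational_data_store",
    Workload.SPECIALIST: "data_mart",
}


def place(workload: Workload) -> str:
    """Return the component that the zoning plan dedicates to this workload."""
    return ZONING_PLAN[workload]


def group_by_component(workloads):
    """Group a list of workloads by the component each one is placed on."""
    grouped = defaultdict(list)
    for workload in workloads:
        grouped[place(workload)].append(workload)
    return dict(grouped)
```

Calling `place(Workload.KPI_REPORTING)` returns `"data_warehouse"` under this plan. The point of keeping the plan as explicit data is that adding a new district, such as a data lake, becomes a new entry rather than a change to the areas that already work.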
Away from the coastal areas, the leisure area and commercial port, are the residential areas, many of which are very traditional. Those that live in these areas value the traditional design of the streets and apartment blocks.
It is not necessary, or desirable, to have the design of these areas influenced by the commercial parts of the city. In fact, that separation is part of the charm of living in this city. When Barcelona added the Fira Conference Centre, one of the largest in Europe, with capacity for many thousands of people and with multiple football-field-sized conference halls, this did not impact the existing residential areas.
In the same way we wish to make large scale industrial changes to our LDW without affecting the familiar, friendly, regular reporting already in place. There will be a lot of regular KPI and tactical reporting that needs to remain undisturbed by the addition of, say, a data lake. That means not just the functional results, but also the user experience in terms of reliability, response times and availability of the reports. Being able to keep these as separate areas supports this: for example, adding the data lake to provide extra function, rather than trying to use it to replace all existing function.
We can see that it is possible for different parts of the city to adopt different roles, each part being very good at meeting a particular purpose and contributing to the whole by doing a particular job exceptionally well. But we also want the city to work well as a city.
It is necessary to be able to move people smoothly between the different areas. In the picture above we see the metro station for the Sagrada Família, the famous basilica designed by Gaudí. It connects that major tourist site to the rest of the city quickly and efficiently. The metro system does a great job of connecting all the major areas within Barcelona. There are two aspects to this. There is a well-defined metro map with consistent naming and annotation that describes how the metro connects all the parts of the city.
This is like the architecture diagram and common metadata we use in the LDW to describe the different parts of the LDW in a consistent way. There is also a general purpose transport mechanism to move people from anywhere to anywhere else – much like the ETL or data virtualization we might use as a general “anywhere to anywhere” data fabric. Sometimes we need extra capacity, like we saw above with the football stadium. When 110,000 people hit the streets the surrounding infrastructure, roads and metro, needs to be able to absorb them. This is like the high speed parallel import, load and export utilities we use between particular components of the LDW. Thus we see that some of the connections between components need particular attention. It may also be useful to build some kind of high capacity data transport that other components can plug into.
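The two kinds of connection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the catalog entries, the `BULK_LINKS` set, the row threshold and the `choose_transport` function are hypothetical names invented here, not part of any product. The idea is that every component is described once in common metadata (the "metro map"), the general-purpose fabric connects anything to anything, and only selected pairs get a dedicated high-capacity path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry in the common metadata: a consistently named LDW component."""
    name: str
    role: str


# Hypothetical catalog, playing the part of the metro map.
CATALOG = {
    "warehouse": Component("warehouse", "regular KPI and tactical reporting"),
    "lake": Component("lake", "raw and exploratory data"),
    "mart": Component("mart", "specialist analysis"),
    "ods": Component("ods", "operational reporting"),
}

# Pairs of components that have a dedicated high-capacity link
# (assumption: only warehouse <-> lake, as the post suggests).
BULK_LINKS = {frozenset({"warehouse", "lake"})}

# Arbitrary illustrative threshold above which a bulk utility is worthwhile.
BULK_THRESHOLD_ROWS = 10_000_000


def choose_transport(source: str, target: str, rows: int) -> str:
    """Pick 'bulk' for large moves over a dedicated link, else 'fabric'."""
    for name in (source, target):
        if name not in CATALOG:
            raise KeyError(f"unknown component: {name}")
    if source == target:
        raise ValueError("source and target must be different components")
    has_bulk_link = frozenset({source, target}) in BULK_LINKS
    if has_bulk_link and rows >= BULK_THRESHOLD_ROWS:
        return "bulk"    # high speed parallel load/export utilities
    return "fabric"      # general "anywhere to anywhere" ETL or virtualization
```

For instance, `choose_transport("warehouse", "lake", 50_000_000)` selects the bulk path, while a small transfer between the same pair, or any transfer between a mart and the operational data store, falls back to the general fabric. Which pairs deserve a bulk link, and at what volume, is a design decision for each LDW.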
It is necessary to ensure that traffic can enter, exit and disperse within the city efficiently. Barcelona has the “Avinguda Diagonal”, the “diagonal avenue”, which as its name suggests runs diagonally across the city. You can see it clearly in satellite pictures (see below, with yellow highlighter on the name of the avenue). It intersects with a motorway that brings you to the city, at the left, then it stretches across the city.
Contrary to what you might expect, this main road is not ugly or out of place; it is a wide boulevard that stretches across the city. Within the analogy we’re making for the LDW, this would be the equivalent of high speed, parallel ETL or other integration software.
By thinking about the different regions in the city, and the connections between them, it is possible to have each area fulfill its main purpose without being compromised by conflicting requirements, while the city as a whole fulfills all the requirements of those that live there.
As we see at the left, in Barcelona it is possible for traditional activities to be supported, like horse riding on the beach, alongside modern requirements such as telecommunications. The space age Montjuïc Communications Tower visible in the distance was designed as part of the park for the summer Olympics in 1992 to be used for the transmission of live TV coverage (Barcelona also has a second telecoms tower which handles present day telecoms traffic, designed by the British architect Norman Foster). These are not conflicting requirements; the city easily supports both the traditional and the modern simultaneously. It does this by having different areas and buildings dedicated to specific tasks and enabling the connections between them.
We can visit Barcelona and stand in the Sagrada Família (see below) and marvel at the unique space it represents. This does not preclude taking advantage of all the other parts and functions of the city, nor does the rest of the city encroach on the experience we have standing inside this unique building. This is because the city works as a whole.
In a similar way we can use the different parts of the LDW, its marts, warehouse and lakes, and enjoy the full benefit each can give us, while easily transitioning between the components to maximize benefit. It’s unnecessary to give ourselves the problem of trying to use one data server to do everything. We can design our multiple data server LDW to do all the specific things we might want to do and to use them in combination – through a modicum of common metadata and data interchange design.
If you have the time, I’d encourage you to become familiar with the work of Christopher Alexander, and also to think of the design of the LDW as city planning rather than the design of a building – I think you’ll find that if you do this many design decisions become straightforward.
By the way – from February to October we at Gartner will be running our popular Data and Analytics Summits at various locations around the world (See: gartner.com/events/bi). If you can attend one of these it will be possible to learn about, and discuss, this topic and many others. I’ll be at the London event in March and happy to chat.
/* Thanks to Unsplash for the free graphics I’ve used in this post, and to Google for the satellite photo */