This is part 1 of a 2-part series. You can find part 2 right here. But start with 1. Trust me.
The mighty data management platform – the DMP – is the soul of modern marketing. And Gartner’s top marketing-related blog post for all of 2015 was titled “What Does a DMP Do, Anyway?” – so obviously there’s an appetite for instruction.
Hence my latest (and longest) attempt to chalk out what these magic boxes really do. Of course, my science does not apply to every DMP and is not a good sub for hard study. But it might bring you closer to the rainbow.
First, I list the Secrets (for the post-literate #millennials) and then provide some real detail behind each one of them (for aspiring #ninjas).
Top 10 Amazing Secrets of DMPs
- DMPs store data in two different ways
- They collect data like everyone else – with tags, APIs and uploads
- They are in the business of labeling (and relabeling) people
- They use outside partners to help map data to users
- Their “user profile” is not supposed to be a complete customer profile
- What they call an “audience” is just a segment
- They were designed to build targets for advertising
- They were also designed to personalize websites
- They make many decisions based on predefined rules
- [Revealed below]
1. DMPs store data in two different ways
DMPs have to store a lot of information. This info is organized (as much as possible) around an individual user ID. DMPs are user-level information systems. Their goal, their purpose, is to understand users – and then join them together into groups.
So obviously, one way DMPs store data is in massive user-level data stores. Hadoop will do for this purpose. So will Google’s BigTable and even relational databases and appliances like those things you have in your IT closet. All the juju we talk about below – attributes and data elements and so on – can sit in this big store until show time.
But this one type of storage is not enough. If DMPs only stored user profiles they would be identical to databases. The problem is they’re used in ad tech and marketing – what we’ve called mad-tech – and need to respond to requests for help about individual users, at least 1 million times per second, let’s say. Within about 50 milliseconds.
Hadoop won’t do this. No data mart or lake or bivouac, no matter how groovy, will do this fast enough. So DMPs rise to the challenge of ultra-fast response by using a second data store, related to the first. This second store is in a specific format that is optimized for ultra-fast response – what we call “fast read” – and super-zippy lookup times. One format of choice is key-value and common options here include Amazon’s Dynamo and Redis.
The key-value store is simply a list of keys and values and includes a frequently-updated subset of important pieces of information extracted from the database. It’s faster because it doesn’t include a lot of fancy information included in other types of data stores – like the order of the values or references to other tables – just the label and the value.
For example, Turn (which has a DMP and DSP) has described its DMP’s two data stores as an Audience User Profile (AUP) and a Run-time User Profile (RUP), which is optimized for speed. The RUP includes bits of information needed to inform real-time bidding decisions on an ad exchange. Example: a user’s membership in a segment (e.g., “recent website visitor to retarget” or “in market for an Icelandic horse farm” or simply “rich old zombie in Memphis”).
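To make the two-store idea concrete, here is a minimal Python sketch. It is not Turn's design or any vendor's: the profile fields, the `build_runtime_store` helper and the in-memory dictionaries are all assumptions standing in for a big profile database and a real key-value store.

```python
# Hypothetical full profiles, standing in for the large user-level data store.
full_profiles = {
    "u1": {"segments": ["recent_site_visitor"], "last_seen": "2016-01-10", "page_views": 42},
    "u2": {"segments": [], "last_seen": "2015-11-02", "page_views": 3},
}


def build_runtime_store(profiles):
    """Extract only what a real-time decision needs: userID -> set of segments."""
    return {user_id: frozenset(p["segments"]) for user_id, p in profiles.items()}


# The second store: a small, frequently rebuilt subset keyed by userID.
runtime_store = build_runtime_store(full_profiles)


def in_segment(user_id, segment):
    """Fast-read lookup: one key fetch, one membership test."""
    return segment in runtime_store.get(user_id, frozenset())
```

In a real system the dictionary would be replaced by a networked key-value store and refreshed on a schedule; the point is only that the run-time lookup never touches the full profile.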
2. They collect data like everyone else – with tags, APIs and uploads
Activity tags: These are used to track things users do on your sites. You set these up in the DMP, of course, and include: name, location, parameters. Parameters are bits of info that are passed back to the DMP, and you need to predefine them. For example, you can pass back the userID, the activity name (“CheckOut”), and info available on the page like order size.
Media tags: These are used to track impressions and clicks for media. Of course, media data are also collected by DSPs, SSPs, ad servers and attribution platforms, among others, but DMPs often need this information in order to improve their ad targeting (see Secret #7).
Mobile SDKs: DMPs also have SDKs, at least for iOS and Android. They work the way analytics SDKs work: code included in the app itself sends data to the DMP’s servers. They include pre-defined events like “Bought Something” or custom events (defined by you). (Since iOS devices block third-party cookies by default, it can be impossible to tie events back to userIDs, but that’s a topic for another day.)
APIs: Tags are used to collect information from browsers. DMPs can also collect data using web services APIs, which exchange JSON objects back and forth from the web server to the DMP server. A lot of information can be passed back and forth this way, behind the scenes, but APIs require diligent setup, care and feeding.
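As a rough illustration of the difference between a tag and an API call, the sketch below sends the same “CheckOut” event both ways. The endpoint `dmp.example.com` and the parameter names `uid`, `activity` and `order_size` are made up for the example; a real DMP defines its own.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# One hypothetical activity event with predefined parameters.
event = {"uid": "u123", "activity": "CheckOut", "order_size": "3"}

# Tag style: the browser requests a URL and the parameters ride in the query string.
tag_url = "https://dmp.example.com/activity?" + urlencode(event)

# API style: the web server sends the same event to the DMP server as a JSON body.
api_body = json.dumps(event, sort_keys=True)

# What the DMP side would recover in each case.
received_from_tag = {k: v[0] for k, v in parse_qs(urlsplit(tag_url).query).items()}
received_from_api = json.loads(api_body)
```

Either route delivers the same three values; the tag depends on the user's browser firing it, while the API call happens server to server.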
Now in this Secret, we are talking about the ways that DMPs collect online data in the now. There are also a number of chillaxed ways they can collect so-called offline data that do not involve tags. But we will cover these in a later Secret. In fact, the next one …
3. They are in the business of labeling (and relabeling) people
The data that sits in DMPs comes from many different sources. That’s the point. One of the major functions of a DMP is to combine data from both online and offline sources into a single coherent record, associated with a userID. Online sources include the tagged up sites mentioned above, as well as any other data the client (or her buddies) wants to send to the DMP, preferably attached to a userID.
But what about offline data? Outsiders may be surprised to learn that offline is not the same as non-digital. It’s all digital, friends. Offline as digital marketers use the term just means data that is not collected and sent in real-time through the internet wires. It is information that is sent at some point – hours, days, months – after it is collected, often as a simple comma-separated value (csv) text file. Usually some work is required to figure out which userID the data points belong to.
Yes, DMPs can handle your offline data. They have to. It’s great stuff. Offline data can include critical pieces of information such as which people bought what items in a store, in the real world or online, or more generally how valuable a person is as a customer (loyalty tiers). It can include purchased data such as where a customer lives or if they like dogs. The most common offline data is stuff from a client’s own CRM system (including lifetime value and things they’ve done in the past) and so-called third-party data, which is sold by companies like Nielsen/eXelate, Acxiom and Oracle Data Cloud (ODC).
DMPs are directly integrated with the most popular data providers (including those just mentioned). What does “integration” mean here? It means that the DMP has set up many FTP connections that can upload files in carefully defined formats in large batches, usually once a day, like a vitamin. After they are uploaded, the files are copied into the big database and – in parallel, using the magic of Big Data – mapped record-by-record to previously known userIDs.
In other words, DMPs take files sent by outside systems on a schedule and update their userID records with this new info.
Sounds easy. However, stop for a moment to ponder. On the one hand, you’ve got – say – dozens of outside companies scattered near and far . . . and you need information from them every single day that is:
- useful to you; and
- in a format you can understand
Obviously, you’ll need to work something out with the third parties. This calls for itching and scratching. The “useful” part can be determined in a contract: send me X data you have on Y group of people at $1.00 per thousand people (or whatever). The “format” part is trickier. In database terms, the DMP needs this third-party data mapped to a schema that it can decode. (A schema is just a format that ensures my machine – the DMP – knows what exact data points your machine – the third-party data provider – is sending.)
DMPs solve the schema problem by requiring third-party data to be in a certain format. This exercise is made easier by the insight that DMPs are in a fundamental sense categorization machines. They are in the business of labeling people: 45ish… In-market for a BMW…. Loves Bernese mountain dogs… Wants a swimming pool…
As one group of DMP architects put it:
“The key insight is that all incoming data [to the DMP] is meant to categorize users, so by definition it is categorical data. Based on this observation, data providers are required to build a taxonomy of categories for each data set.”
In other words, the data provider must organize its data into categories and subcategories. So all the data sent to the DMP is basically a list of users and an associated list of which predefined categories/subcategories they belong to.
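Here is one way that “list of users plus list of categories” arrangement could look in Python. The category IDs, the taxonomy paths and the feed records are invented for illustration, and dropping unknown category IDs is just one possible policy.

```python
# Hypothetical taxonomy a data provider might supply: category ID -> category path.
taxonomy = {
    101: ("Auto", "In-Market", "BMW"),
    202: ("Pets", "Dogs", "Bernese Mountain Dog"),
    303: ("Home", "Wants Swimming Pool"),
}

# Hypothetical daily feed: each record is a provider user ID plus category IDs.
feed = [
    ("p-001", [101, 202]),
    ("p-002", [303, 999]),  # 999 is not in the taxonomy and will be skipped
]


def label_users(feed, taxonomy):
    """Turn each user's category IDs into readable labels, ignoring unknown IDs."""
    labeled = {}
    for user_id, category_ids in feed:
        labeled[user_id] = [
            " > ".join(taxonomy[c]) for c in category_ids if c in taxonomy
        ]
    return labeled


labels = label_users(feed, taxonomy)
```

Because both sides agree on the taxonomy up front, the daily file only needs to carry IDs, which keeps it small and unambiguous.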
4. They use outside partners to help map data to users
Of course, all the information in the world is not worth much unless it can be tied to the DMP’s userID, which in turn belongs to a single person (ideally). Otherwise it is just a random piece of information about a random person. An exception here would be a dataset comprised of anonymous cookies that a third-party provider like Nielsen/eXelate says have a certain important feature (usually they’re in market for some particular product). But we don’t really need a DMP to do this.
Online data sent to a DMP is often tied to a userID at the source through a process called “ID matching.” This requires a DMP’s tag to be on at least one page of the site where the customerID (the one being matched) is known. Usually, this happens right after a user logs in: their customerID is passed as a query string parameter to the DMP and voila – the DMP maps it to the DMP’s userID. Or another client system that maintains a different ID (like Adobe Analytics) can match its ID to the DMP’s userID and share information at leisure.
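The login-page match can be sketched in a few lines of Python. The `customer_id` parameter name, the example URL and the in-memory `id_map` are assumptions; the DMP's userID is taken here to come from its own cookie on the request.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical match table kept by the DMP: client customerID -> DMP userID.
id_map = {}


def handle_tag_request(url, dmp_cookie_user_id):
    """Record a customerID-to-userID match when the tag URL carries a customerID."""
    params = parse_qs(urlsplit(url).query)
    customer_ids = params.get("customer_id")
    if customer_ids:
        id_map[customer_ids[0]] = dmp_cookie_user_id
    return id_map


# The tag fires right after login, with the customerID in the query string.
handle_tag_request("https://dmp.example.com/match?customer_id=C-789", "dmp-u42")
```

Once the pair is stored, anything the client later sends under `C-789` can be filed under the DMP's own userID.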
But what about offline data? How is that tied to a DMP’s userID?
One way is for the client to send it to the DMP directly. For example, a client might have a bunch of retail locations and want to send information about what loyal customers bought recently. They can FTP a file to the DMP that contains the customerID, date, amount, parameters (defined in the DMP, e.g., type of item). Of course, for this to work the customerID must have been previously mapped to the DMP userID. Each line of such a file might look like this:
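As a purely illustrative sketch (the column order, the sample values and the `id_map` lookup are all assumptions, not any vendor's actual format), such a file and the mapping step could be handled in Python roughly as follows.

```python
import csv
import io

# Hypothetical customerID -> DMP userID mapping, established earlier by ID matching.
id_map = {"C-789": "dmp-u42"}

# Hypothetical file contents: customerID, date, amount, parameters.
sample_file = (
    "C-789,2016-01-15,129.99,item_type=shoes\n"
    "C-000,2016-01-15,20.00,item_type=socks\n"
)


def load_purchases(text, id_map):
    """Attach each purchase line to a DMP userID; collect customerIDs with no match."""
    matched, unmatched = [], []
    for customer_id, date, amount, params in csv.reader(io.StringIO(text)):
        user_id = id_map.get(customer_id)
        record = {"date": date, "amount": float(amount), "params": params}
        if user_id is None:
            unmatched.append(customer_id)
        else:
            matched.append((user_id, record))
    return matched, unmatched


matched, unmatched = load_purchases(sample_file, id_map)
```

The second line in the sample shows why the earlier mapping matters: a customerID the DMP has never seen has nowhere to go.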
But what if the client wants information from a system that it does not own and – even worse – that is not already mapped to a DMP’s userID or even its own customer ID?
The answer here is: Acxiom LiveRamp. Or Neustar. Or a smaller provider or custom setup. But usually, it’s Acxiom LiveRamp. (Which is what makes that platform such a magnet for the glitterati of the digerati.) In generic terms, a data “onboarding” platform.
Onboarding platforms are simply very large, continually updated indexes that map userIDs and personal information across different data files from different systems. Their process of working with DMPs goes like this:
- Onboarder and DMP sync their cookies – i.e., when a browser fires the DMP’s tag, the DMP sends its userID to the onboarder, who then either finds its own cookie already in the browser or sets one and returns both to the DMP
- Onboarder obtains – or already has – personally identifiable information (aka P.I.I., such as email or mobile number) about people, usually by partnering with large publishers or other companies that require users to register
- Onboarder links DMP userID with its P.I.I., which it can use as an “offline key” to map records from third-party data providers or non-cookied sources
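The three steps above can be sketched as a pair of lookups. Everything here is hypothetical: the cookie and user IDs, the sample email, and in particular the choice to hash emails with SHA-256 before using them as the offline key, which is an assumption for the example rather than a description of how any onboarder works.

```python
import hashlib


def hash_email(email):
    """Normalize and hash an email so it can serve as an offline key (assumed scheme)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Step 1: cookie sync gives the onboarder a link from its cookie to the DMP userID.
cookie_sync = {"onb-c1": "dmp-u42"}

# Step 2: the onboarder already knows which P.I.I. goes with its cookie.
pii_to_cookie = {hash_email("pat@example.com"): "onb-c1"}


def onboard(offline_records):
    """Step 3: map offline records keyed by email onto DMP userIDs."""
    out = {}
    for email, attributes in offline_records:
        cookie = pii_to_cookie.get(hash_email(email))
        dmp_id = cookie_sync.get(cookie)
        if dmp_id is not None:
            out.setdefault(dmp_id, {}).update(attributes)
    return out


onboarded = onboard([
    (" Pat@Example.com ", {"likes_dogs": 1}),
    ("nobody@example.com", {"likes_dogs": 0}),  # no known cookie, so dropped
])
```

Records that cannot be chained from P.I.I. to cookie to userID simply fall out, which is why match rates matter so much in this business.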
There are obvious benefits to scale in the onboarding business. In fact, the industry itself could benefit from a single reliable source that mapped cookies to accurate, updated P.I.I. and back again. (Leaving aside the issue of monopoly prices.) That’s the reason there is a clearly dominant player in both onboarding (Acxiom LiveRamp) and the exercise of mapping individuals to multiple mobile devices (Tapad).
5. Their “user profile” is not supposed to be a complete customer profile
We can sometimes think of a DMP’s “user profile” as a spiritual single view of the customer, a mystically actualized and comprehensive portrait that contains just about anything you’d want to know about a person for marketing purposes. This is not true. It is neither practical, nor useful, nor true.
DMPs are not philosophers and do not exist to admire us.
They’re here for a reason: to put labels on people that can be used to sell them things.
So it won’t be a surprise to learn that a DMP’s typical user profile is incomplete and even skimpy. As we saw above, it should not even contain all the data in a client’s own marketing or commerce systems (unless they’re small or confused). It contains just what it needs to do the job.
DMP “user profiles” contain userIDs, of course. These are the keys. But what else? It is not going too far out onto the arroyo to say that the bulk of what a DMP’s user profile contains can be given a single label: attributes.
What is an attribute? It is a label or other value attached to a userID. It can be binary (Y/N), categorical (High Value/Medium Value/Low Value), or a value (usually banded into categories, actually: see previous parentheses). Each attribute has a label, a description and a value. The label can be anything and is strictly defined by the client. For example, if it matters to your business whether the customer is male or female, there is certainly an attribute assigned to each userID called “female” with values 0 or 1. Attribute values are integers. More complicated attributes can take up to a certain number of different values (often 255, for memory purposes). And each attribute has a predefined ID code of its own so there is no ambiguity.
You can see that updating (or creating) user profiles in a DMP just requires you to send a text file in the proper format: with userID in one column and an integer value for each column of attributes (with a set of column/attribute labels in a separate file).
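A small Python sketch of that two-file arrangement follows. The file layouts, the attribute names `female` and `value_tier`, and the sample rows are assumptions chosen to match the description, not a real DMP's upload spec.

```python
import csv
import io

# Hypothetical labels file: one row per attribute, in column order.
labels_file = "attr_id,label\n1,female\n2,value_tier\n"

# Hypothetical data file: userID first, then one integer per attribute column.
data_file = "u1,1,2\nu2,0,0\n"


def load_profiles(labels_text, data_text):
    """Combine the labels file and the data file into userID -> {label: value}."""
    labels = [row["label"] for row in csv.DictReader(io.StringIO(labels_text))]
    profiles = {}
    for row in csv.reader(io.StringIO(data_text)):
        user_id, values = row[0], [int(v) for v in row[1:]]
        if len(values) != len(labels):
            raise ValueError(f"row for {user_id} has {len(values)} values, expected {len(labels)}")
        profiles[user_id] = dict(zip(labels, values))
    return profiles


profiles = load_profiles(labels_file, data_file)
```

Keeping the labels in a separate file means the daily data file is nothing but IDs and small integers.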
Although we’ve said attributes can have many different values, the truth is they’re almost always used by the DMP to make a decision – and as an input into a decision, all attributes collapse into binary values. A person either is (“1”) or is not (“0”) in the group I am interested in for my decision. Usually, these decisions are not based on a single attribute – although they can be. For example, ad retargeting campaigns are often based on the single attribute “Visited My Website Recently” (Y/N). But more often, decisions are based on a bunch of attributes put together that – collectively – can tell the DMP what action to recommend.
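Here is a tiny sketch of several attributes collapsing into one yes/no answer. The attribute names, the tier encoding (0 = low, 1 = medium, 2 = high) and the rule itself are hypothetical.

```python
def is_in_target(profile):
    """Collapse a multi-valued profile into a single 1/0 decision input."""
    visited = profile.get("visited_recently", 0) == 1
    high_value = profile.get("value_tier", 0) >= 2  # assumed encoding: 2 = high value
    return 1 if (visited and high_value) else 0


decisions = [
    is_in_target({"visited_recently": 1, "value_tier": 2}),
    is_in_target({"visited_recently": 1, "value_tier": 1}),
    is_in_target({"visited_recently": 0, "value_tier": 2}),
]
```

However many values `value_tier` can take, the decision only ever sees whether the person clears the bar.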
All of which brings us, happily, to . . . audiences.
That’s all for now. You can enjoy the exciting second half – Secrets #6 – #10 – right here. Peace.
The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.