How Does a DMP Really Work?
By Martin Kihn | July 30, 2015
As we were saying, the mighty DMP — Data Management Platform — is much beloved of digital marketers and advertisers. It’s a “nexus system,” lying at the crux of so much that is groovy; and it’s a bridge from your own information to the world of advertising and even content personalization.
Yet like Elon Musk or the Ginsu 2000, the DMP does so much — and is so little understood. For a series of a few posts, I’m going to get into some detail about how the DMP actually works. If you’re an ace DMP’er, please go take a maze run. For the rest of us, center.
Now, recall what a DMP does. Four things:
- Takes data in — from you (the user), other companies, and data it collects itself
- Groups it — into segments, mainly, which DMPs insist on calling “audiences” because they are obsessed with pretending they work in television
- Sends instructions — to place ads (usually via its wingman, the Demand-Side Platform [DSP]), change the content on a web page or email, etc.
- Measures the impact; improves the instructions
So we have our four pillars of the glamorous DMP. (By the way, if you’re wondering what kinds of skills you’ll need (or need to contract) to run a DMP successfully, they more or less line up with the above. You’ll need (1) a data wrangler, (2) a media person, (3) marketing operations, and (4) an analytics ninja.)
Without drifting too far off the trail, you can think of the DMP as a smart database. In the early days, DMPs were quite literally built on an Oracle or IBM database. In fact, I am looking at a “Platform Overview” schematic of one of the pioneering DMPs from a few years ago. It contains an Oracle database that is connected to the client’s CRM system via batch file transfer, and it also collects data from the client’s websites and other systems via pixels and data importing; this information is shared with a “bidder,” which connects to ad exchanges via APIs.
More recently, you’ll find DMPs sitting in the cloud — most likely somewhere out among the Amazon Web Services so beloved of SaaS vendors. I’m looking at another schematic for a newer DMP that proudly touts Amazon S3 data storage as well as “real-time, NoSQL data stores” (including Amazon’s Dynamo and open source Cassandra). Data is collected via pixels and “cloud ETL.” So it’s basically the same as the above with a degree in Big Data.
To understand the soul of the DMP, we will need to get into:
- How it collects information [note: you can substitute the word “data” for “information” if you like: personally, I’m tired of hearing the word data and think we all need a break]
- How it defines and maintains information
- What audiences are
- How decisions are made, and
- What the analytics looks like.
Before we jump into an empty pool, though, let’s talk about two words that come up a lot when we’re talking about DMPs.
- IDs — You will notice as you work with DMPs that there is a lot of talk about IDs — customer IDs, DMP IDs, platform IDs, device IDs. An ID usually represents a person, but a person has many IDs. You probably have hundreds without knowing it. Unless it has a key (a way of mapping one ID to another), the DMP won’t work as well. The DMP itself will give each individual an ID, and you probably have a customer ID for your customers. It follows that one early step in the process of booting up a DMP is providing the DMP with your customer IDs, so it knows who the information you’re providing maps to. And of course providing interesting information about each of those customers. [I am going to bracket the discussion of personally identifiable information (PII) vs. de-identified and anonymous information; we’ll just say here there are ways to use what you know about people without unduly invading their privacy. You have been served.]
- Attributes — If IDs are people, “attributes” are things we know about people. To a marketer, a person is just a collection of attributes. These attributes are often binary Y/N (male = Y, high income = N, recent customer = Y), or have a narrow range of values. Sometimes DMPs limit the number of values an attribute can have to 256 (the number of distinct values that fit in a single byte of storage). For this reason, attributes that are numerical are often banded into ranges, and categorical variables are also often turned into numbers. For example, for the attribute “customer-age”: under 18 can be “0,” 18-25 can be “1,” and so on. Likewise, for “customer-stage,” new customers can be “0,” old loyalists can be “5,” with ramps in between.
You may say: Why not just create a variable called “age” and dump the age in there, and another one called “customer-stage” and put in the number of days since they first bought something? You could. But you have to remember how DMPs think. They are treating each attribute as a discrete piece of information. Statistics deals in probabilities. If you’re providing information, the system will treat it as though it could have some value. If you let it, the DMP will happily waste time and energy trying to figure out if there’s any difference between how an 18-year-old acts and how a 19-year-old acts, when (believe me) there isn’t. To make the system more efficient (faster, more accurate), provide it only with attributes that could possibly have some meaning. Which is just a thoughtful subset of everything you know about your customers.
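To make the banding idea concrete, here is a minimal Python sketch of turning a raw age into a band code before it ever reaches the DMP. Only the first two bands (under 18, then up to 25) come from the example above; the remaining band edges, the names AGE_BAND_EDGES and band_value, and the decision to put an 18-year-old in band 1 are assumptions made for illustration, not any vendor's actual rules.

```python
from bisect import bisect_right

# Hypothetical band edges. Each number is the first value of the NEXT band,
# so [18, 26, ...] means: under 18 -> 0, 18-25 -> 1, 26-35 -> 2, and so on.
# Only the first two bands are taken from the text; the rest are assumed.
AGE_BAND_EDGES = [18, 26, 36, 51, 66]

# The text mentions DMPs that cap an attribute at 256 distinct values.
MAX_DISTINCT_VALUES = 256


def band_value(value, edges):
    """Return the band code (0, 1, 2, ...) for a raw numeric value."""
    if len(edges) + 1 > MAX_DISTINCT_VALUES:
        raise ValueError("too many bands for a 256-value attribute")
    if value < 0:
        raise ValueError("value cannot be negative")
    # bisect_right counts how many edges are <= value, which is the band code.
    return bisect_right(edges, value)


def band_age(age):
    """Band a customer's age using the hypothetical edges above."""
    return band_value(age, AGE_BAND_EDGES)
```

The same band_value helper would work for “customer-stage” if you supplied edges in days since first purchase; the point is that the DMP receives a handful of codes rather than every possible age.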
How DMPs Collect Data
If you think about it, before you came along, your DMP was just a bunch of empty memory waiting for you to arrive. Sure, it has easy access to data vendors, who are willing to sell anyone groups of (anonymous) cookies or device IDs that may respond to your advertising, or are otherwise useful. But it doesn’t know anything about you. Think of it as a stranger you meet at a conference. Other than the obvious fact that you’re both into marketing and very attractive, there is no mutual understanding. Yet.
Your DMP collects data in three ways:
- Onboarding — it takes data you send it and stores it for future use
- Tags — it collects information itself from your digital properties, like websites, wherever you put data collection tags
- APIs — the all-important server-to-server exchange of carefully formatted data objects (which are like little files)
Onboarding: Onboarding works quite simply: you create data files that include information you think the DMP should know about your customers, format them carefully, and send them to the DMP. These files are often sent as flat files in CSV (comma separated values) form and uploaded as a batch. They will probably look something like a whole bunch of rows of this:
You will be required to provide a file that describes the format of the file you are sending — a “decoder” for the data file. This tells the DMP what it’s looking at when it’s looking at your big CSV file full of numbers, letters and commas. Typical fields in the decoder file would be things like:
- Attribute name — what it’s called (e.g., “loyalty-status” or “home-state”)
- Attribute ID — an 8-digit number assigned by the DMP to that attribute (each gets a number)
- Attribute type — binary or another value (string, integer, decimal)
- Banding rules — if you are providing continuous values in bands (like the age example above)
- Description — for your reference
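As a rough picture of what one entry in such a decoder might hold, here is a small Python sketch built from the fields just listed. The class name DecoderEntry, the allowed type labels, and the sample attribute IDs are all invented for illustration; a real DMP will have its own format and its own ID assignments.

```python
from dataclasses import dataclass
from typing import Optional

# Type labels taken from the list above; the exact spellings are assumed.
ALLOWED_TYPES = {"binary", "string", "integer", "decimal"}


@dataclass(frozen=True)
class DecoderEntry:
    name: str                            # e.g. "loyalty-status"
    attribute_id: str                    # 8 digits, kept as text to preserve leading zeros
    attr_type: str                       # one of ALLOWED_TYPES
    banding_rules: Optional[str] = None  # only for banded continuous values
    description: str = ""                # for your own reference

    def __post_init__(self) -> None:
        # Reject entries that do not look like the fields described above.
        if len(self.attribute_id) != 8 or not self.attribute_id.isdigit():
            raise ValueError(f"attribute ID must be 8 digits: {self.attribute_id!r}")
        if self.attr_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown attribute type: {self.attr_type!r}")


# Hypothetical entries; the IDs below are made up, not issued by any DMP.
DECODER = [
    DecoderEntry("loyalty-status", "10000001", "integer",
                 banding_rules="0 = new customer, 5 = old loyalist",
                 description="Banded customer stage"),
    DecoderEntry("home-state", "10000002", "string",
                 description="State of the billing address"),
]
```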
For the above string of numbers, the decoder table might read:
“customernumber, testgroup, funnelstage, recent-customer, avg-order-size, first-order-date …”
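Here is a minimal Python sketch of how the column names in a decoder like this could be paired with the rows of a CSV batch file. The six column names come from the decoder line above (the real list continues past them); the two sample rows and the function name read_rows are invented purely for illustration.

```python
import csv
import io

# Column names from the decoder line above. The original list is longer;
# only the six names shown there are used here.
COLUMNS = [
    "customernumber", "testgroup", "funnelstage",
    "recent-customer", "avg-order-size", "first-order-date",
]

# Invented sample rows, standing in for a real batch file.
SAMPLE_FILE = (
    "100234,B,3,Y,2,2014-11-02\n"
    "100235,A,1,N,0,2015-06-17\n"
)


def read_rows(text, columns=COLUMNS):
    """Pair each CSV row with the decoder's column names."""
    records = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(row)}"
            )
        records.append(dict(zip(columns, row)))
    return records


records = read_rows(SAMPLE_FILE)
```

Without the decoder, the first row is just a string of numbers, letters and commas; with it, records[0]["funnelstage"] is something the DMP can treat as an attribute.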
What about Acxiom/LiveRamp? At this stage, many marketers begin to muse about data onboarding platforms such as LiveRamp and Neustar. Smart. These platforms are in the business of matching information from different systems to the same customer ID. They do this through an elaborate process of “stitching” IDs (mapping and cross-mapping) and anonymizing and de-anonymizing (since ethical marketers and data vendors are careful about how they use personal information).
Platforms like LiveRamp work in a way similar to DMPs themselves, and most DMPs work with platforms like LiveRamp all the time. The marketer sends a file of user records to the platform, IDs are matched to the marketer’s own and outside information sources, anonymized and sent back to the marketer. For example: I send my customer file from my POS system and my customer file from my email system to LiveRamp, which matches the two files using a common field (like an email address or customer number), and then adds new info about household income and age estimates from Acxiom. Personal identities can then be stripped (de-identified) and replaced with random IDs. The onboarding platform could then send the file back to me enhanced with the additional info I can use to help me segment or label the customers.
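The match, enrich, and de-identify flow in that example can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how LiveRamp or any other platform actually works: the sample records, the field names, the email normalization, and the use of a random UUID as the replacement ID are all hypothetical.

```python
import uuid

# Invented sample data standing in for the POS file, the email-system file,
# and third-party enrichment keyed by the same common field (email).
POS_FILE = [
    {"email": "ann@example.com", "last_purchase": "2015-07-01"},
    {"email": "bob@example.com", "last_purchase": "2015-05-12"},
]
EMAIL_FILE = [
    {"email": " Ann@Example.com", "opens_newsletter": "Y"},
]
ENRICHMENT = {
    "ann@example.com": {"income_band": 3, "age_band": 2},
}


def normalize(email):
    """Make the common field comparable across systems."""
    return email.strip().lower()


def match_and_deidentify(pos, email_system, enrichment):
    """Merge records on email, add outside attributes, then drop the email."""
    merged = {}
    for source in (pos, email_system):
        for record in source:
            key = normalize(record["email"])
            attrs = {k: v for k, v in record.items() if k != "email"}
            merged.setdefault(key, {}).update(attrs)
    for key, attrs in merged.items():
        attrs.update(enrichment.get(key, {}))
    # Replace the personal identifier with a random ID before returning.
    return [{"id": uuid.uuid4().hex, **attrs} for attrs in merged.values()]


result = match_and_deidentify(POS_FILE, EMAIL_FILE, ENRICHMENT)
```

Each record in result carries attributes from every source that matched, plus a random ID, and no email address.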
However you do it, sharing your own data with the DMP lies at the heart of the DMP’s value. Why? As you’ll see, this information and how it is segmented is the best way for the DMP to get to know you. It learns what a “good customer” looks like, in terms it understands (i.e., attributes). And it uses this to go find more, and to make “okay customers” more valuable.
That’s enough for today … next time, we’ll talk about Tags, APIs, and How DMPs Organize Data … until then, I’m @martykihn