How Cross-Device Identity Matching Works (part 1)
By Martin Kihn | August 09, 2016
Cross-device identity is the thing that tells marketers and other nosy types that Device A and Device B and Browser C all belong to the same Person X. Devices (and browsers, which we’ll call devices for convenience) are matched to people. These people may be known to us by name or they may remain quite mysterious. But here we are concerned only with the process of matching together the devices that all point to the same person.
How is this done?
When we talked to our friends at Adelphic recently, they clued us in to their U.S. Patent 8,438,184 B1. Adelphic is a mobile demand-side platform (DSP), in the business of helping advertisers target people mainly on mobile devices, so they’re interested in identity. Why? If they’ve served an ad to Person X on Device A, they’d like to know that fact before they bid good money on yet another ad to the same person on Device B. Or if they happen to know something interesting about Person X (say, they like Bernese mountain dogs), then when that person shows up on a new Device C, Adelphic can put in a high bid for their Bernese mountain dog tea cozy-selling client.
Boom! But only – of course – if they know that Devices A, B and C all belong to the same Bernese mountain dog lover Marty, I mean, Person X.
How? That’s the topic of the patent, which I have read in some detail. (Patent reading is about as fun as going to Coney Island to count the sand, so you’re welcome.) I summarize highlights for the people.
It turns out that the “matcher” (e.g., Adelphic) is in the business of building up a large database of individuals. Each individual has a unique identifier (we can call it the Master ID) and a set of known Attributes. These Attributes are things the database knows about the person based on previous interactions. And – importantly – each Master ID record also tries to include other IDs that are uniquely tied to particular devices.
So at its core, the matcher is a system that is confronted with device identifiers that it bumps up against its Master ID database over and over in an attempt to see if that device belongs with an existing Master ID, i.e., a person they know. If not, a new Master ID is created and becomes part of the matching game.
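To make the shape of that database concrete, here is a minimal in-memory sketch. The class and field names (`MasterRecord`, `MasterIdDatabase`, `by_device`) are hypothetical illustrations, not Adelphic's schema; the point is simply that each person carries a Master ID, a bag of Attributes and a set of device IDs, plus a reverse index so a device can be looked up quickly.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """One person in the matcher's database (hypothetical schema)."""
    master_id: str
    attributes: dict[str, str] = field(default_factory=dict)
    device_ids: set[str] = field(default_factory=set)


class MasterIdDatabase:
    """Toy in-memory store keyed by Master ID, with a reverse index from device ID."""

    def __init__(self) -> None:
        self.records: dict[str, MasterRecord] = {}
        self.by_device: dict[str, str] = {}  # device ID -> Master ID

    def create(self, device_id: str, attributes: dict[str, str]) -> MasterRecord:
        """Mint a new Master ID for a device that matched nobody we know."""
        record = MasterRecord(str(uuid.uuid4()), dict(attributes), {device_id})
        self.records[record.master_id] = record
        self.by_device[device_id] = record.master_id
        return record

    def attach(self, master_id: str, device_id: str,
               attributes: dict[str, str]) -> MasterRecord:
        """Link a newly matched device (and what we learned from it) to a known person."""
        record = self.records[master_id]
        record.device_ids.add(device_id)
        record.attributes.update(attributes)
        self.by_device[device_id] = master_id
        return record
```

A real matcher would sit on a distributed store rather than Python dicts, but the create-or-attach choice is the same.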
Some of you may wonder: “Is it using deterministic or probabilistic methods?” The answer is: both.
First, it will try to determine if there is a literal match – that is, if the device identifier is already in the database. If so, there is a direct match that we call deterministic. If not, we need Plan B: probabilistic matching. Plan B uses some fancy math (which I will sketch out in part 2) to figure out if the various Attributes it’s seeing are similar enough to Attributes in the Master ID database to make it PROBABLE we have a match.
If not, rinse and repeat: New Master ID, attached to these Attributes, and associated with the device ID.
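The Plan A / Plan B / new-ID cascade can be sketched as a single function. This is an illustration under assumptions: `resolve`, `probabilistic_match` and `new_master_id` are hypothetical names, and the probabilistic step is passed in as a plug-in because the actual scoring math is left for part 2.

```python
from typing import Callable, Optional

# Hypothetical plug-in: given incoming Attributes, return the best-matching
# Master ID, or None if nobody is similar enough.
ProbabilisticMatcher = Callable[[dict], Optional[str]]


def resolve(device_id: str,
            attributes: dict,
            device_index: dict,
            probabilistic_match: ProbabilisticMatcher,
            new_master_id: Callable[[], str]) -> tuple[str, str]:
    """Return (master_id, how), where how is 'deterministic', 'probabilistic' or 'new'."""
    # Plan A: literal lookup of the device identifier.
    if device_id in device_index:
        return device_index[device_id], "deterministic"
    # Plan B: compare the Attributes against people we already know.
    candidate = probabilistic_match(attributes)
    if candidate is not None:
        device_index[device_id] = candidate
        return candidate, "probabilistic"
    # Neither worked: mint a new Master ID and remember this device under it.
    master_id = new_master_id()
    device_index[device_id] = master_id
    return master_id, "new"
```

Note that a probabilistic hit also writes the device ID into the index, so the next time that device shows up it matches deterministically.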
This is all somewhat less mystical than it sounds. I will take a moment here to list the specific pieces of information that Adelphic’s patent mentions. It’s interesting because it shows how sparse info can get in ad tech.
Deterministic matches are commonly made by finding one-to-one identity with:
- Cookie – any unique browser cookie set by the advertiser or its agents, like a DSP
- Phone number
- Email address
- UserID – explicit identifier that is similar to a cookie, known only to the advertiser or its agents
- Device ID – some of the ones listed in the patent don’t exist anymore (like Apple’s beloved UDID and various open standards that didn’t take); in practice, this is limited to the ADID and IDFA, available only to app developers on Android and iOS, respectively
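Exact matching only works if both sides spell an identifier the same way. A common ad-tech practice (an assumption here; the patent summary above does not spell it out) is to normalize emails and phone numbers and then hash them, so the same person produces the same lookup key everywhere. The function names below are hypothetical, and the phone rule assumes U.S./NANP 10-digit numbers.

```python
import hashlib
import re


def email_key(email: str) -> str:
    """Trim and lowercase an email, then SHA-256 it into a stable lookup key."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def phone_key(phone: str, default_country_code: str = "1") -> str:
    """Keep digits only; prepend a country code to 10-digit numbers (assumes NANP)."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country_code + digits
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


def device_id_key(device_id: str) -> str:
    """IDFA and ADID values are UUID strings; compare them case-insensitively."""
    return device_id.strip().lower()
```

With keys like these, a deterministic match is just a dictionary lookup.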
These identifiers are often sent with the HTTP request as query parameters. This means something that looks like a URL is sent to the matcher’s server, with a string of identifiers appended after the matcher’s address (the query string).
The matcher has set up an API that can be pinged with this request. It sends back a match (if it exists) and other relevant info (ditto). Of course, many times a marketer will not have an email, phone number or even known cookie. In practice, this kind of deterministic identity is useful for people who have explicitly given us information – usually because they are a customer or prospect – and this is stored either in their browser (e.g., cookie) or on the web page itself (e.g., email).
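On the matcher's side, answering such a ping can be as simple as parsing the query string and checking each identifier against an index. In this sketch the URL, the parameter names in `DETERMINISTIC_PARAMS` and the response shape are all hypothetical; every matcher defines its own API.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter names, checked in this order of preference.
DETERMINISTIC_PARAMS = ("idfa", "adid", "uid", "cookie", "email_sha256", "phone_sha256")


def deterministic_lookup(request_url: str, index: dict) -> dict:
    """Pull identifiers out of the query string and return the first exact match."""
    params = parse_qs(urlsplit(request_url).query)
    for name in DETERMINISTIC_PARAMS:
        for value in params.get(name, []):
            key = (name, value)
            if key in index:
                return {"match": True, "master_id": index[key], "matched_on": name}
    return {"match": False}
```

For example, `deterministic_lookup("https://match.example.com/v1/match?cookie=abc&idfa=6d92078a", {("idfa", "6d92078a"): "M42"})` reports a match on `idfa`; a request carrying only an unknown cookie comes back with `"match": False` and falls through to probabilistic matching.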
Most people will not have a deterministic identifier attached to their API request. So we go to probabilistic matching. Here the matcher will use any and all information it can find. Some of it is simply the kind of information that is always sent back and forth when machines on the internet communicate. This information is contained in the IAB specs and is routinely part of any exchange of data on the wires.
It comes in two flavors: device and system data; and so-called behavioral data.
Device and system data spelled out by the patent include:
- OS type and version
- Device brand and model
- Clock setting
- Time zone
- Language default
These seem quite mundane but contain more information than we think. OS versions can get very specific. While default language = English doesn’t say much, I’ve heard that clock settings measured down to multiple decimal places can be quite revealing.
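One standard way to put a number on "more information than we think" is self-information: if a share p of all devices has a given combination of values, observing it is worth −log₂(p) bits. This is a general measure, not something taken from the patent, and the population you feed it is assumed to be toy data.

```python
import math
from collections import Counter


def surprisal_bits(population: list[dict], profile: dict, keys: list[str]) -> float:
    """Bits of identifying information in profile's values for `keys`.

    Computed as -log2(share of the population with exactly those values);
    rarer combinations score higher. The profile must occur in the population.
    """
    def combo(p: dict) -> tuple:
        return tuple(p.get(k) for k in keys)

    counts = Counter(combo(p) for p in population)
    matching = counts[combo(profile)]
    if matching == 0:
        raise ValueError("profile's values do not occur in the population")
    return -math.log2(matching / len(population))
```

In a toy population of four devices where three share `en-US`, language alone is worth about 0.4 bits, but if the combination of OS version, time zone and language is unique, it is worth the full 2 bits needed to single out one device in four. The bits add up quickly as attributes are combined.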
Then there is so-called behavioral data. (I say “so-called” because it isn’t always about what we humans call behaviors.)
Attribute data of this type are:
- HTTP header – text sent behind the scenes with the HTTP request, including things like the date and time, accepted character sets and various settings
- User agent – the browser and version
- App launch time
- Page load time
- Referrer – the previous page visited, which the browser usually reports along with the request
- Plug-ins used
- Geography (latitude / longitude)
- URL – specific page person is on; this can also be visited to determine the type of content being viewed (more on this below)
- Typing frequency
- Gesture – these last two apply to apps, if you can capture some patterns in the person’s interaction with the app; I have no idea how often this is used and can’t find any real information about it, other than the obvious fact that people can have different typing frequencies and gestures. (If you know, DM me at @martykihn or comment below.)
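Several of the attributes above arrive for free in every web request. The sketch below turns raw request headers plus the current page URL into a small attribute dictionary; the function and attribute names are hypothetical, while `User-Agent`, `Accept-Language` and `Referer` (HTTP's historical spelling of "referrer") are real header names.

```python
from urllib.parse import urlsplit


def behavioral_attributes(headers: dict, page_url: str) -> dict:
    """Turn request headers and the current page URL into matchable Attributes.

    Attribute names are hypothetical; header lookups are case-insensitive,
    as HTTP header names are.
    """
    h = {k.lower(): v for k, v in headers.items()}
    page = urlsplit(page_url)
    attrs = {
        "user_agent": h.get("user-agent", ""),
        "accept_language": h.get("accept-language", ""),
        "page_host": page.hostname or "",
        "page_path": page.path,
    }
    referrer = h.get("referer")
    if referrer:
        attrs["referrer_host"] = urlsplit(referrer).hostname or ""
    return attrs
```

App-side signals such as launch time, typing frequency and gestures would have to come from an SDK inside the app rather than from headers, so they are left out here.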
This does not seem like a whole lot of information to describe a unique person, and it isn’t. But combined, it can be helpful. For example, if I open a browser and visit a Bernese mountain dog site and my referrer is a CrossFit gym in Bedford Hills, NY – well, that’s pretty unique. If I did the same thing on my iPad last week – which, to be honest, I probably did – there exists a Master ID with those behavioral Attributes and (assuming there are not thousands of people who do exactly the same thing, which there are NOT) the matcher can call my two devices a likely match. Based purely on a URL and a Referrer.
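The intuition that a rare URL-plus-referrer pair is "pretty unique" can be sketched as a rarity-weighted overlap score. This is a simple stand-in chosen for illustration, not the patent's method (that math is the subject of part 2); `best_probabilistic_match` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def best_probabilistic_match(incoming: dict, people: dict, threshold: float):
    """Score each Master ID by the rarity of the attribute values it shares with `incoming`.

    A shared value held by k of N people contributes log2(N / k) bits, so a rare
    referrer outweighs a common language setting. Returns the best-scoring
    Master ID if its score reaches `threshold`, else None.
    """
    n = len(people)
    # How many people hold each (attribute, value) pair.
    holders = Counter(item for attrs in people.values() for item in set(attrs.items()))
    best_id, best_score = None, 0.0
    for master_id, attrs in people.items():
        shared = set(incoming.items()) & set(attrs.items())
        score = sum(math.log2(n / holders[item]) for item in shared)
        if score > best_score:
            best_id, best_score = master_id, score
    return best_id if best_score >= threshold else None
```

With four known people, an incoming device that shares a common language (held by three people), the Bernese URL (held by two) and the CrossFit referrer (held by one) with the same Master ID scores about 0.4 + 1 + 2 = 3.4 bits and clears a 2-bit threshold; a device that shares only the language does not.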
Of course, the matcher is rarely so lucky. How does it make probabilistic matches, generally?
That’s the topic of next week’s part 2. Watch this space.