How Cross-Device Identity Matching Works (part 1)

By Martin Kihn | August 09, 2016 | 6 Comments

Cross-device identity is the thing that tells marketers and other nosy types that Device A and Device B and Browser C all belong to the same Person X. Devices (and browsers, which we’ll call devices for convenience) are matched to people. These people may be known to us by name or they may remain quite mysterious. But here we are concerned only with the process of matching devices together that all point to someone.

How is this done?

Our friends at Adelphic recently clued us in to their U.S. Patent 8,438,184 B1. Adelphic is a mobile demand-side platform, in the business of helping advertisers target people mainly on mobile devices, so they’re interested in identity. Why? If they’ve served an ad to Person X on Device A, they’d like to know that fact before they bid good money on yet another ad to the same person on Device B. Or if they happen to know something interesting about Person X (say, they like Bernese mountain dogs), then when that person shows up on a new Device C, Adelphic can put in a high bid for their Bernese mountain dog tea cozy-selling client.

Boom! But only – of course – if they know that Devices A, B and C all belong to the same Bernese mountain dog lover Marty, I mean, Person X.

How? That’s the topic of the patent, which I have read in some detail. (Patent reading is about as fun as going to Coney Island to count the sand, so you’re welcome.) I summarize highlights for the people.

ADELPHIC’S PATENT

It turns out that the “matcher” (e.g., Adelphic) is in the business of building up a large database of individuals. Each individual has a unique identifier (we can call it the Master ID) and a set of known Attributes. These Attributes are things that the database knows about them based on previous interactions. And – importantly – the record for each Master ID also tries to include other IDs that are uniquely tied to particular devices.

So at its core, the matcher is a system that is confronted with device identifiers that it bumps up against its Master ID database over and over in an attempt to see if that device belongs with an existing Master ID, i.e., a person they know. If not, a new Master ID is created and becomes part of the matching game.

Some of you may wonder: “Is it using deterministic or probabilistic methods?” The answer is: both.

First, it will try to determine if there is a literal match – that is, if the device identifier is already in the database. If so, there is a direct match that we call deterministic. If not, we need Plan B: probabilistic matching. Plan B uses some fancy math (which I will sketch out below) to figure out if the various Attributes it’s seeing are similar enough to Attributes in the Master ID database to make it PROBABLE we have a match.

If not, rinse and repeat: New Master ID, attached to these Attributes, and associated with the device ID.
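
To make the flow concrete, here is a minimal Python sketch of the three steps just described: deterministic lookup, then a probabilistic comparison, then a new Master ID. The names (`Matcher`, `MasterRecord`, `match`) and the 0.5 threshold are illustrative assumptions of mine, not anything from the patent. The similarity score is a plain Jaccard overlap of Attribute sets, used only as a stand-in for the patent's math.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MasterRecord:
    """One known individual: a Master ID, linked device IDs, and Attributes."""
    master_id: int
    device_ids: set = field(default_factory=set)
    attributes: set = field(default_factory=set)


class Matcher:
    def __init__(self, threshold=0.5):
        self.records = {}           # master_id -> MasterRecord
        self.device_index = {}      # device_id -> master_id
        self.threshold = threshold  # hypothetical cutoff for a "probable" match
        self._next_id = count(1)

    @staticmethod
    def similarity(a, b):
        # Jaccard overlap: a placeholder score, not the patent's method.
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def match(self, device_id, attributes):
        attributes = set(attributes)

        # Step 1: deterministic. Is this device ID already in the database?
        if device_id in self.device_index:
            return self.device_index[device_id], "deterministic"

        # Step 2: probabilistic. Compare Attributes with every known record.
        best = max(
            self.records.values(),
            key=lambda r: self.similarity(r.attributes, attributes),
            default=None,
        )
        if best is not None and self.similarity(best.attributes, attributes) >= self.threshold:
            best.device_ids.add(device_id)
            self.device_index[device_id] = best.master_id
            return best.master_id, "probabilistic"

        # Step 3: no match. Create a new Master ID tied to this device.
        record = MasterRecord(next(self._next_id), {device_id}, attributes)
        self.records[record.master_id] = record
        self.device_index[device_id] = record.master_id
        return record.master_id, "new"


matcher = Matcher()
print(matcher.match("device-A", {"url:dogs", "referrer:gym", "tz:eastern"}))  # (1, 'new')
print(matcher.match("device-A", set()))                                       # (1, 'deterministic')
print(matcher.match("device-B", {"url:dogs", "referrer:gym"}))                # (1, 'probabilistic')
```

Comparing against every record is fine for a toy example. A database of real size would presumably need some kind of index to narrow the candidates first, but that is outside this sketch.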

[Figure: Simplified Version of Match Process]

This is all somewhat less mystical than it sounds. I will take a moment here to list the specific pieces of information that Adelphic’s patent mentions. It’s interesting because it shows how sparse info can get in ad tech.

Deterministic matches are commonly made by finding one-to-one identity with:

  • Cookie – any unique browser cookie set by the advertiser or its agents, like a DSP
  • Phone number
  • Email address
  • UserID – explicit identifier that is similar to a cookie, known only by advertiser or its agents
  • Device ID – some of the ones listed in the patent don’t exist anymore (like Apple’s beloved UDID and various open standards that didn’t take); practically, this is limited to ADID (Android) and IDFA (iOS), which are available only to app developers

These identifiers are often sent with the HTTP request as a query parameter. This means something that looks like a URL is sent to the matcher’s server and it contains a string after the matcher’s address, like:

www.matcher.com/api/identityrequest?cookie=XYX2883838383aasdf&email=person@site.com

The matcher has set up an API that can be pinged with this request. It sends back a match (if it exists) and other relevant info (ditto). Of course, many times a marketer will not have an email, phone number or even known cookie. In practice, this kind of deterministic identity is useful for people who have explicitly given us information – usually because they are a customer or prospect – and this is stored either in their browser (e.g., cookie) or on the web page itself (e.g., email).
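
Here is a rough Python sketch of both ends of such a request, using only the standard library. The endpoint is a hypothetical one modeled on the example address, and the function names and the `phone` key are my own assumptions. Note that `urlencode` joins parameters with `&` and percent-encodes characters such as `@`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint modeled on the example address in the text.
ENDPOINT = "https://www.matcher.com/api/identityrequest"


def build_identity_request(identifiers):
    """Sender side: append whatever identifiers are known as query parameters."""
    known = {key: value for key, value in identifiers.items() if value}
    return f"{ENDPOINT}?{urlencode(known)}" if known else ENDPOINT


def parse_identity_request(url):
    """Matcher side: recover the identifiers from the query string."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in query.items()}


url = build_identity_request({
    "cookie": "XYX2883838383aasdf",
    "email": "person@site.com",
    "phone": None,  # unknown identifiers are simply left out
})
print(url)
print(parse_identity_request(url))
```

A request with no identifiers at all comes back as the bare endpoint, which is the case where the matcher has to fall back on probabilistic matching.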

Most people will not have a deterministic identifier attached to their API request. So we go to probabilistic matching. Here the matcher will use any and all information it can find. Some of it is simply the kind of information that is always sent back and forth when machines on the internet communicate. This information is contained in the IAB specs and is routinely part of any exchange of data on the wires.

It comes in two flavors: device and system data; and so-called behavioral data.

Device and system data spelled out by the patent include:

  • OS type and version
  • Device brand and model
  • Clock setting
  • Time zone
  • Speed
  • Language default

These seem quite mundane but contain more information than we think. OS versions can get very specific. While default language = English doesn’t say much, I’ve heard that clock settings down to multiple decimal points can be quite revealing.

Then there is so-called behavioral data. (I say “so-called” because it isn’t always about what we humans call behaviors.)

Attribute data of this type are:

  • HTTP header – this is hidden text sent with the HTTP request and includes things like the date and time, characters accepted, various settings
  • User agent – the browser and version
  • App launch time
  • Page load time
  • Referrer – previous page visited is usually stored in the browser
  • Plug-ins used
  • Geography (latitude / longitude)
  • URL – specific page person is on; this can also be visited to determine the type of content being viewed (more on this below)
  • Typing frequency
  • Gesture – these last two apply to apps, if you can capture some patterns in the person’s interaction with the app; I have no idea how often this is used and can’t find any real information about it, other than the obvious fact that people can have different typing frequencies and gestures. (If you know, DM me at @martykihn or comment below.)
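
Several items on this list arrive with every web request, so collecting them takes little more than reading the headers. A minimal Python sketch, assuming the headers are available as a plain dictionary; the function name and the output keys are made up for illustration, while the header names themselves (`User-Agent`, `Referer`, `Accept-Language`, `Accept-Charset`) are standard HTTP.

```python
def attributes_from_request(headers, url):
    """Collect matching Attributes from one request's headers and URL."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    attributes = {
        "user_agent": h.get("user-agent"),
        "referrer": h.get("referer"),  # the header really is spelled "Referer"
        "language": h.get("accept-language"),
        "charset": h.get("accept-charset"),
        "url": url,
    }
    # Drop anything the request did not supply.
    return {key: value for key, value in attributes.items() if value is not None}


# Hypothetical request, for illustration only.
print(attributes_from_request(
    {"User-Agent": "ExampleBrowser/1.0", "Referer": "https://gym.example/"},
    "https://dogs.example/bernese",
))
```

App-side signals such as launch time, typing frequency and gestures would need code running inside the app, so they are left out here.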

This does not seem like a whole lot of information to describe a unique person, and it isn’t. But combined, it can be helpful. For example, if I open a browser and visit a Bernese mountain dog site and my referrer is a Crossfit gym in Bedford Hills, NY – well, that’s pretty unique. If I did the same thing on my iPad last week – which, to be honest, I probably did – there exists a MasterID with those behavioral Attributes and (assuming there are not thousands of people who do exactly the same thing, which there are NOT) the matcher can call us a likely match. Based purely on a URL and a Referrer.
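
The narrowing effect is easy to see in a toy example. The sketch below, in Python, filters a made-up table of Master IDs first by one Attribute and then by two. The records, domain names and the `candidates` helper are all invented for illustration.

```python
def candidates(records, **observed):
    """Return Master IDs whose stored Attributes agree with every observed value."""
    return [
        master_id
        for master_id, attrs in records.items()
        if all(attrs.get(key) == value for key, value in observed.items())
    ]


# Made-up records, for illustration only.
records = {
    "M1": {"url": "bernese-dogs.example", "referrer": "crossfit-bedford-hills.example"},
    "M2": {"url": "bernese-dogs.example", "referrer": "search.example"},
    "M3": {"url": "news.example", "referrer": "crossfit-bedford-hills.example"},
    "M4": {"url": "bernese-dogs.example", "referrer": "social.example"},
}

print(candidates(records, url="bernese-dogs.example"))
# ['M1', 'M2', 'M4']
print(candidates(records, referrer="crossfit-bedford-hills.example"))
# ['M1', 'M3']
print(candidates(records, url="bernese-dogs.example",
                 referrer="crossfit-bedford-hills.example"))
# ['M1']
```

Neither Attribute is unique on its own, but together they leave a single candidate.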

Of course, the matcher is rarely so lucky. How does it make probabilistic matches, generally?

That’s the topic of next week’s part 2. Watch this space.

6 Comments
  2. August 11, 2016 at 9:28 pm
    Mary Foltz says:

    Martin, thank you for a very insightful read. I don’t have information on how an external system could read your (or my) typing frequency or gesture. But to do so, wouldn’t they have to store something on the device itself to collect that data? The other items on the behavioral list seem to have an interactive component that would be readable by an external process that is communicating with the device, say waiting for a plug-in to load so that pages can be served up. What do you think?

    • August 11, 2016 at 11:46 pm
      Martin Kihn says:

      Yes – it’s a mystery to me. I can’t find info but still poking around. Apparently there are data availability and privacy problems with these methods, so it’s possible nobody uses them for matching. They were mentioned in passing in the patent – but it’s hypothetical. Still, sounded cool, huh?

  3. August 12, 2016 at 8:08 am
    Tomas Salfischberger says:

    Hi Mary, Martin,

    To my knowledge, there are 2 ways in 2 places to capture typing frequency / behavior. The first way is within an app. When an app has a tracking SDK installed by the owner of the app, it is possible to register typing and other types of interactions and behavior within this specific app. This could, for example, be done by the app publisher to sell the data for monetization purposes.

    The second is web based; it is possible using JavaScript to track the way users interact with the pages they visit, including behavior such as where you move your mouse and how you type. Facebook introduced something similar a while ago: http://newsfeed.time.com/2013/12/16/facebook-is-keeping-track-of-every-post-you-write-and-dont-publish/

    In practice, I have not seen either of them being used for probabilistic identity management. I’m also not sure how accurate those attributes are compared to the other options (cookies, IP address, email, plain old login, etc.). They are, however, often used for analytics purposes to create heatmaps, for example, to map how people interact with a page.

    Tomas

    • August 12, 2016 at 2:04 pm
      Martin Kihn says:

      Boom! Great answer Tomas – I think you are right. I’ve heard from a few probabilistic vendors since writing this and they all tell me they don’t use typing frequency or gestures for matching. Appreciate the clarity here.

  4. August 18, 2016 at 1:28 pm
    Ketharaman Swaminathan - GTM360 Marketing Solutions says:

    Great post. From personal experience, I totally agree with you that “patent reading is about as (much) fun as going to Coney Island to count the sand”. Just that I’d replace Coney Island with Chowpatty or Juhu beach, two of the beaches in Mumbai that are closest to where I live in India! On a side note, I’ve always wondered if there’s an app that would summarize patent filings in plain English using 21st century illustrations.
