
A Marketer’s TL;DR on Bayes Theorem

By Martin Kihn | December 02, 2016 | 1 Comment

Here is Bayes Theorem itself, in neon:

P(A|B) = P(B|A) × P(A) / P(B)

Last time, we dawdled on its greatness and made outrageous claims about Thomas Bayes, 18th century cleric, founding ad tech. Today, we move along. We need to explain what this Bayes Theorem is to the people, in what my poetic colleague Jake Sorofman refers to as balloon animal shapes.

So here goes.

Look at the neon sign again. Here’s how we read it. The letters A and B are events — things that might (or might not) happen in the world. The big P refers to “probability.” And the vertical line means “given that” or — as the film announcer says — “in a world where …”

So the neon sign says something like this:

The probability of A happening in a world where B has happened is equal to … (The probability of B happening in a world where A has happened) times (The probability of A happening at all) divided by (The probability of B happening at all).

(Bayesians have terms for each of the elements of the formula: the terms in the numerator are called the “Likelihood” and the “Prior,” and the term underneath is sometimes called the “Marginal.”)
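
For those who prefer to see it typed out, here is a minimal sketch of the formula as a Python function. The function name `posterior` and the numbers fed into it are made up for illustration; only the arithmetic comes from the theorem.

```python
def posterior(likelihood: float, prior: float, marginal: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    likelihood -- P(B|A), the chance of B in a world where A has happened
    prior      -- P(A), the chance of A happening at all
    marginal   -- P(B), the chance of B happening at all
    """
    if marginal <= 0:
        raise ValueError("marginal must be greater than zero")
    return likelihood * prior / marginal


# Hypothetical inputs, chosen only to show the arithmetic.
example = posterior(likelihood=0.8, prior=0.1, marginal=0.2)
print(round(example, 3))
```

With these invented inputs, the top of the fraction is 0.8 × 0.1 and the bottom is 0.2, so the answer lands at about 0.4.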

Easy, right? And just so we’re clear, “probability” is a word that implies we really don’t know what is going to happen. It describes a world of mystery, which is where we live. It is related to likelihood or odds and is expressed as a number between 0.0 and 1.0.

0.0 means “never happen” and 1.0 means “definitely happen.” So of course, 0.5 means “maybe yes, maybe no.”

Now let’s talk about advertising. Imagine that you’re trying to predict whether someone will click on a banner ad. Of course, you run the Bernese Mountain Dog Club of Greater New York (BMDCNY) and would like to advertise for new members. You would like to put the ad in front of people who are likely to click on it, and you are excited to apply Bayes Theorem here.

In addition to loving Swiss farm dogs, you admire a well-targeted ad.

In this case, event A is “a person clicks on the BMDCNY ad” and event B is “the BMDCNY ad is put in front of a person.” To be more succinct, let’s call event A “click” and event B “impression.”

What does Bayes Theorem tell us here?

The probability of a click happening on an impression is equal to … (The probability that a past click came from that type of impression) times (The probability of a click happening at all) divided by (The probability of that type of impression happening at all).

Some of you like jargon. I don’t, but I will admit we are in the realm here of conditional probability. Bayes was interested in the likelihood that some event would happen given that something else had happened. In our case, that a click would happen given that a particular impression occurred.
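
To make the click-and-impression version concrete, here is a small sketch using counts from an imaginary past campaign. Every number below is invented, and "type of impression" stands in for whatever kind of impression you care about. The sketch also counts the answer directly, as a sanity check that the theorem and plain counting agree.

```python
# Hypothetical counts from an imaginary past BMDCNY campaign (not real data).
total_impressions = 10_000   # every ad we served
total_clicks = 50            # every click we got
type_impressions = 2_000     # impressions of the type we care about
type_clicks = 30             # clicks that came from that type of impression

likelihood = type_clicks / total_clicks          # P(impression type | click)
prior = total_clicks / total_impressions         # P(click)
marginal = type_impressions / total_impressions  # P(impression type)

# Bayes Theorem: P(click | impression type)
p_click_given_type = likelihood * prior / marginal

# Direct count of the same quantity, for comparison.
direct = type_clicks / type_impressions

print(round(p_click_given_type, 4))
print(round(direct, 4))
```

Both routes come out at about 0.015 with these made-up counts, three times the campaign-wide click rate of 0.005.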

This is the essence of the Bayesian worldview: that we can use our hard-won experience and beliefs to inform our predictions about the future.

Now, what do we mean by “particular impression”? In practical terms, the advertiser only knows certain things about each impression: what we’d call its attributes or features. These are things like the website it appears on, the format of the ad (e.g., 300×250 or 728×90, display or video), and what we know about the target (e.g., likely gender, likely income). All these attributes constitute the “impression.”

If we’ve run a campaign in the past, we have some information already about what makes people click. We can see what % of past clicks came from 300×250’s, what % came from men, and so on. Naive Bayes treats all these attributes as independent (an assumption that greatly simplifies the math), so the first term in our numerator in Bayes Theorem is simply a big multiplication problem.

If our current ad is sized 300×250, targeted at a male, and so on, our first numerator term is:

(% of past clicks that were on 300×250) x (% of past clicks that came from males) x etc. …

As you can see, the number is probably quite small. It is then multiplied by the “prior” – the probability that anyone clicks on an ad in our campaign – and divided by the “marginal,” or the chance that this combination of attributes occurs at all.
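
Here is a rough sketch of that whole calculation in Python. All the percentages are hypothetical, as are the function and variable names. One assumption to flag loudly: the marginal is computed by adding up the "click" and "no click" cases, each with the attributes treated as independent. That is a common way to do it in Naive Bayes, not something spelled out above.

```python
from math import prod


def naive_bayes_click_probability(attrs, p_attr_given_click,
                                  p_attr_given_no_click, p_click):
    """Estimate P(click | attributes), treating attributes as independent."""
    # Likelihood: the "big multiplication problem" over the attributes.
    likelihood_click = prod(p_attr_given_click[a] for a in attrs)
    likelihood_no_click = prod(p_attr_given_no_click[a] for a in attrs)

    numerator = likelihood_click * p_click  # likelihood x prior
    # Assumption: marginal = chance of these attributes with a click
    # plus chance of these attributes without a click.
    marginal = numerator + likelihood_no_click * (1 - p_click)
    return numerator / marginal


# Hypothetical shares: of past clicks (or non-clicks), what fraction
# had each attribute.
p_attr_given_click = {"300x250": 0.5, "male": 0.6}
p_attr_given_no_click = {"300x250": 0.4, "male": 0.5}
p_click = 0.01  # hypothetical prior: 1% of impressions get clicked

estimate = naive_bayes_click_probability(
    ["300x250", "male"], p_attr_given_click, p_attr_given_no_click, p_click
)
print(round(estimate, 4))
```

With these invented shares, the estimate comes out somewhat above the 1% prior, because both attributes show up more often among clicks than among non-clicks.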

What comes out is an estimate of how likely that particular impression is to get clicked. The beauty of this type of set-up is that whatever happens this time, it can be put into the memory banks to inform the next formula; we have more information to use next time.
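
The "memory banks" idea can be sketched as a running tally. This is a deliberately simple illustration with a hypothetical class name: it just keeps counts, and the prior is recomputed from whatever has been recorded so far.

```python
class ClickMemory:
    """Running tally of impressions and clicks (hypothetical helper)."""

    def __init__(self):
        self.impressions = 0
        self.clicks = 0

    def record(self, clicked: bool) -> None:
        """Store the outcome of one impression."""
        self.impressions += 1
        if clicked:
            self.clicks += 1

    def prior(self) -> float:
        """Current estimate of P(click) from everything seen so far."""
        if self.impressions == 0:
            raise ValueError("no impressions recorded yet")
        return self.clicks / self.impressions


memory = ClickMemory()
# Hypothetical history: 99 impressions without a click, then one click.
for _ in range(99):
    memory.record(False)
memory.record(True)
before = memory.prior()   # 1 click in 100 impressions

memory.record(True)       # one more impression, and it gets clicked
after = memory.prior()    # 2 clicks in 101 impressions
print(before, round(after, 4))
```

Each new outcome nudges the prior, so the next prediction starts from slightly better information than the last one.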

So you can see how Bayes Theorem is ideally suited to ad tech. It’s a universe where we don’t know anything for sure but we have a lot of information about what’s happened in the past, and we’re constantly learning. Click.

1 Comment

  • Suresh says:

    Thanks for sharing your informative article. I recommend my students to follow your blog posts as they give very good insights. I teach digital marketing and expect my students to read the blog posts at marketo.
    Regards, suresh