Brian O’Kelley had a problem. The 25-year-old Princeton grad was parked at a Silicon Alley startup called Right Media, run by two refugees from DoubleClick. This was when digital advertising was young, when people clicked on ads, when dot-coms were on the midnight watch and O’Kelley (by all accounts, a very clever developer) was pushing the ad server toward its final country.
He was creating programmatic advertising. It was 2003.
Those of us in the region remember those days. Everybody wanted to work at DoubleClick. Everybody. It was not owned by Google, yet; it was better than Google. It was an ad server, which in 2003 meant it sat like a dumb brick between sellers and buyers of banner ads.
The process was far from aerobic. It was digital but ugly. Here’s how it moved. I am American Express. I have a new charge card I want to sell to small business owners. I email my rep at Inc.com and figure out how much I’m going to pay per thousand impressions — say, $5.00 CPM. I go into DoubleClick, pull down the menu for Inc.com … select “Small Business Section” and “300×250” and a date range … upload my ad with the orange “CLICK TO APPLY” button … enter my 100,000 impressions and bloop.
My ad runs in the small business section of Inc.com 100,000 times (more or less), and I get a bill for $500.
Now imagine a campaign. I’m running millions of impressions on hundreds of sites, all with different prices. DoubleClick can show me the clicks and my sales rep can go through a list every week — if she’s special — and pull out the sites whose victims aren’t whacking the orange button. She does this by hand. She uses Excel.
Often, she’s busy. Campaigns roll as ordered, some troll at Digitas gives me a campaign rollup report every month, and next quarter I try something else.
Automation seemed in order. Automated optimization. Computers do this.
But O’Kelley’s immediate problem, the one that drove him into the arms of our story’s unlikely 18th-century hero, was pricing. CPM was familiar to buyers. It’s used in the magazine business and is like the GRPs used for television and radio. Each exposure has a price tag, set in advance.
There is another way to price, of course. Digital advertisers already knew it. It was called Google. Cost-per-click (CPC) charges buyers only when somebody does something. A more general form of this is called cost-per-action (CPA). You can see how paying for actions like clicks would be preferable to advertisers throughout the land. The exception would be big brand money wedged way, way up the funnel. Brand people really do expect an action, just not anytime this year.
We are circling the question here, namely: How can the price of an ad get closer to its real value to the buyer?
American Express knows how many people who click on its Google text ad fill out an application, get approved, become lucky Cardmembers. It’s a formula AmEx can back into, an estimate of the value of each click. But for banner ads — what? I have to estimate how much an orange button wafted in front of someone’s nose may be worth in a while and change my mind next month.
To help answer the question, someone (perhaps at Right Media) came up with the idea of “effective CPM” or eCPM. The formula is:
eCPM = Cost per Action x Action Rate x 1000
Or, more realistically (since clicks are easy to measure):
eCPM = CPC x CTR x 1000
This formula makes it possible to compare campaigns. It’s not ideal. But using the best numbers we have in this world (clicks and impressions), it approximates an ideal. In balloon shapes:
What the Payoff Is to Me (Advertiser) x How Often It Happens = The Most I’m Willing to Pay (Unless I’m Stupid)
Something like that. Now, you already know what the payoff is — it’s your new card applications. But how do you know how often it will happen? You can look at your monthly report from Digitas, but you already have that. What about before you decide to run the campaign, when you’re figuring out what to pay to whom? Or during the campaign, to get better results? Wouldn’t it be great if there were some way to predict success?
Wouldn’t it be great if we could predict the click-through-rate?
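Before we get to prediction, here is the eCPM arithmetic above made concrete. This is only a sketch in Python. The function names (ecpm, cpm_cost) and the click prices and click-through rates for the two sites are invented for illustration; the $5.00 CPM and 100,000 impressions are the Inc.com buy from earlier.

```python
def cpm_cost(impressions: int, cpm: float) -> float:
    """Bill for a flat CPM buy: price per thousand impressions."""
    return impressions / 1000 * cpm


def ecpm(cost_per_action: float, action_rate: float) -> float:
    """Effective CPM = cost per action x action rate x 1000."""
    return cost_per_action * action_rate * 1000


# The flat buy from the Inc.com example: 100,000 impressions at $5.00 CPM.
flat_bill = cpm_cost(100_000, 5.00)

# Hypothetical sites: a click is worth $2.00 to me on both,
# but people click four times as often on site A.
site_a = ecpm(cost_per_action=2.00, action_rate=0.004)  # 0.4% CTR
site_b = ecpm(cost_per_action=2.00, action_rate=0.001)  # 0.1% CTR

print(f"Flat buy bill: ${flat_bill:.2f}")
print(f"Site A eCPM:   ${site_a:.2f}")
print(f"Site B eCPM:   ${site_b:.2f}")
```

On these made-up numbers, a $5.00 CPM is a bargain on site A (worth about $8.00 per thousand to me) and a bad deal on site B (worth about $2.00). Everything hinges on the click-through rate, which is exactly the number nobody knows in advance.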
Our Netscape Nostradamus, O’Kelley, fired up the academic journals and came across some ancient wisdom. A sermon-spouting sage actually gave him the shibboleth to unlock the secrets of the click-through-rate.
What he stumbled on was Bayes’ Theorem, named for Thomas Bayes, an 18th-century nonconformist British minister.
Today, Bayes’ Theorem is taught in every machine learning class and the (simplified) Naive Bayes algorithm is included in every standard ML library. But in 2003, there were no machine learning classes and no standard ML libraries.
If O’Kelley really was the first to lionize this algo and apply it to his click-through prediction problem, then what we are looking at here is a very significant moment in the annals of the craft. The kind of conditional probability problem Bayes addressed is all over ad tech to this day. It is the very air of advertising.
“[Naive Bayes] wasn’t too accurate,” O’Kelley told Mike Smith in Targeted, where this story is sketched. “But it was a hell of a lot more accurate than the algorithms being used by others. They were using an abacus. I was using a slow calculator. But it was accurate enough.”
Leaving O’Kelley and Right Media aside for the moment — they did fine — we’ll sidle into a bigger question:
What is Bayes’ Theorem, anyway?
Which brings us to our humble cleric who invented ad tech.
The Bad Preacher
The few who admired the living Thomas Bayes would be shocked by his fame. He was not the big man of his day. Although he lived at a time when everybody and their alias had a printing press and could not shut up — a time like our own — he published very little, anonymously. We don’t even know what he looked like: no verified portrait exists.
A frustrated biographer wrote: “There is very little primary source material on Bayes and his work.” A few books, three letters, a notebook, some coded scrawls on electrical experiments. As Stephen Stigler of the University of Chicago says, “Bayes was a minor figure in his own time, but an icon to our age.”
Bayes’ Theorem was almost buried with him.
Bayes was born at the turn of the 18th century and died before the American Revolution. He inherited a kitchen utensil fortune but worked as a Presbyterian minister in a spa town forty miles from London. Mathematics was a hobby, as it was for many holy men of relative leisure in those days, and he had a practical mind.
For example, an acquaintance of his wrote an essay to support the idea that smart people are naturally nicer than dumb ones. Bayes disagreed. He replied that there did not seem to him to be any relationship at all between “intelligence and kind and benevolent actions.”
Which is not very nice but certainly true.
Another man wrote a piece that said, given a lot of observations, the mean of a small sample is a better estimate of the observations’ real mean than any single observation. What may seem like common sense struck Bayes as missing the point. It depends, he said, on the accuracy of the measuring instrument.
Evidently, Bayes was not one of the era’s great preachers. We do not really know why but can — unfairly — throw out a guess. In those days, a good sermon was a long one and his audiences were known to drift off, particularly on warm days. He may have been dull, even humorless. A biographer wrote that he was a “quiet man, of earnest thought and abiding faith and of immense intellectual stature. …”
After his father the utensil mogul died, he became a wealthy hombre and retired to a very large library to write a bit and make no noise. On his death he left his papers to an acquaintance named Richard Price, who recognized Bayes’ originality and two years later published Bayes’ “Essay Towards Solving a Problem in the Doctrine of Chances,” which contains the famous Theorem.
Price was a much more interesting man than Bayes. Widely loved for “the unaffected sweetness of his disposition,” he was a gregarious character, also a minister of independent means. He set up a kind of Algonquin Round Table of illuminati, entertaining bold-face names like David Hume, Adam Smith, Mary Wollstonecraft … and proto-American ninjas like Thomas Jefferson, John Adams, Thomas Paine and Benjamin Franklin.
The revolution in inductive probability was not the only revolution Price supported.
Price wrote an introduction to Bayes’ posthumous paper. He said: “I find among his papers a very ingenious solution of this problem … that has never before been solved.” How much tinkering Price himself did is not really known. A friend said he took two years to “undertake the task of completing Mr. Bayes’ solution” which was “imperfect.”
It also seems to have been independently discovered by the French mathematician Pierre-Simon Laplace and, in the 20th century, “further developed” by Sir Harold Jeffreys. But at any rate, Price gave the credit to Bayes and that is where it goes today.
What is “this problem” that Bayes set out to solve?
Price described it like this. Bayes tried to find a way to know “the probability that an event has to happen” given that “we know nothing concerning it but that, under the same circumstances, it has happened a certain number of times, and failed a certain other number of times.”
In other words, Bayes wanted to predict the probability of an event’s occurring in the future based only on what has happened in the past. Rather than running down to the nearest race track, Price took a spiritual angle. He saw in Bayes’ work a kind of proof that “the frame of the world must be the effect of wisdom and power of an intelligent cause.”
Believe it or not, the man who published Bayes’ Theorem did so because he believed it could prove the existence of God.
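For the less spiritually inclined, the problem Price describes (an event that “has happened a certain number of times, and failed a certain other number of times”) can be sketched in a few lines of Python. This is an illustration under a stated assumption, not O’Kelley’s code and not Naive Bayes. It takes “we know nothing” to mean that every probability from 0 to 1 is equally likely before any data arrive. Under that assumption, the estimate after seeing the data is the formula usually called Laplace’s rule of succession. The function name and the click counts are hypothetical.

```python
from fractions import Fraction


def posterior_mean(successes: int, failures: int) -> Fraction:
    """Estimated probability of the event after observing the counts.

    Assumes a uniform prior on the unknown probability, which gives a
    Beta(successes + 1, failures + 1) posterior; this returns its mean.
    """
    return Fraction(successes + 1, successes + failures + 2)


# Hypothetical banner: 3 clicks in 1,000 impressions.
clicks, no_clicks = 3, 997
raw_ctr = Fraction(clicks, clicks + no_clicks)
estimate = posterior_mean(clicks, no_clicks)

print(f"Raw click-through rate: {float(raw_ctr):.4%}")
print(f"Posterior mean:         {float(estimate):.4%}")
print(f"With no data at all:    {posterior_mean(0, 0)}")
```

With no observations the function returns 1/2, the midpoint of total ignorance. As the counts pile up, the estimate drifts toward the raw rate. A uniform prior is a poor fit for banner ads, where clicks are rare, but it shows the mechanism: start with a guess, then let the evidence move it.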
As an aside, I’ll mention that the paper was a rousing success. Price was elected to the Royal Society and moved on to do statistical work in insurance, demography, politics and supply-side ad platforms … oh, wait, we’re ahead of ourselves.
(If you’re looking for a good short biography of Thomas Bayes, there is one right here.)
And if you’re looking for a simple explanation of Bayes’ Theorem and how it’s used in the world of ad tech today … well, that’s what I’ll try to do next time. If you use Naive Bayes or any Bayesian method in your ad work, please leave a comment or let me know @martykihn.