Yes — I said it.
We’re going to talk about putting together a recommender system (otherwise known as a recommendation engine) in the programming language Python. With code. What’s more, recommendation engines use machine learning, so my diabolical purpose here is clear: to demystify predictive analytics, machine learning, recommenders and Python for the people.
This is not a Ph.D. and is not for the ninjas. You know who you are and I invite you to enjoy another blogger.
Having spent a few months building my own basic recommender system in — perhaps you saw this coming? — Python, I can tell you there is nothing to fear. You may need great genius to be a great data scientist, but you do not need it to do data science. Human learning can understand machine learning. Let’s prove this to ourselves now.
The goal of a recommender system is to make product or service recommendations to people. Of course, these recommendations should be for products or services they’re more likely to want to buy or consume. In a word, recommenders want to identify items that are more relevant.
Relevance is at the heart of modern marketing. One-to-one relevance is the worry bead of all multichannel campaign management, mar-tech, ad-tech, mad-tech and digital marketing hub platforms. User-level personalization is where we all want to go and recommendation engines are one of the best early examples of how this can work.
So what is a recommendation system? As we’ve said elsewhere at incredible length, there are in fact a number of different types of recommenders. They all start with the goal of matching consumers — we’ll call them users — with the right products and services — we’ll call them items. They differ in their analytical approach, which in turn is limited by the information that the system starts with.
There are two basic types of recommenders. One uses something called content-based filtering. The other uses something called collaborative filtering. Generally speaking, content-based systems are simpler but come up with less interesting recommendations. Collaborative systems can get very complicated and unwieldy and require a lot of user-generated data, but they’re the state of the art.
Collaborative filtering requires a data source that can tell the recommender what a bunch of users felt about a bunch of items. An example here is Netflix or Yelp, which maintain large databases of user ratings for large numbers of items. This rating-type data is at the heart of many recommenders. Where ratings aren’t available or are too sparse, a marketer can use what we could call “implicit” proxies for ratings, such as purchases, repeat purchases, and even website behavior.
For example, an ecommerce company could say that a user who buys Bernese mountain dog tea cozies every week rates that item highly, and another who lingers for 20 minutes on a tabby cat bed display “likes” tabby cat beds. But it’s not as good as an actual rating.
Collaborative filtering comes in a number of flavors. The two most common are item-item filtering and user-item filtering. Item-item filtering will take a particular item, find people who liked that item, and find other items that those people (or people similar to them) also liked. It takes items and outputs other items as recommendations. On the other hand, user-item filtering will take a particular person, find people who are similar to that person based on similar ratings, and recommend items those similar people liked.
You can see how both flavors use the ratings as a data source but have different inputs (in one case, items, in another, people).
A simple way to think about the different types of recommenders is:
- Content Filtering: “If you liked this item, you might also like …”
- Item-Item Collaborative Filtering: “Customers who liked this item also liked …”
- User-Item Collaborative Filtering: “Customers who are similar to you also liked …”
It is this last type of recommender that can be used, with some manipulation, to predict how a particular person might rate an item they haven’t already rated. Recommender historians among you will immediately recognize this as the infamous Netflix Prize challenge of song and story. And in fact, the Netflix Prize is responsible for making recommenders sexy and has inspired millions of lines of code — and probably even more blog posts, including this one and this one.
1. Item-Item Content Recommenders
Manual Labels. Content recommenders don’t need ratings or even implicit user preferences. They ignore users. This makes them somewhat more feasible in practice — it’s always difficult to get people to rate things without annoying them — but doesn’t mean they’re necessarily simplistic.
As you’ve probably guessed, there are two different types of content recommenders also. (There are at least two types of everything in data science. I have no idea why.) One type we’ll call Pandora. This type takes a bunch of items (aka songs), manually assigns a list of descriptors (what machine learners call “features”), and calculates how close or far away each song is from each other song, based on these features. Items that are closer together are called “similar” and items that are not … well, are not.
This type of system is very good at finding item-songs that are like other item-songs you liked, but not so hot at finding something new. There’s also the question of how to calculate the “close or far away” metric. This is approached using a distance algorithm, and there are literally dozens of them. The most commonly applied to recommenders, as far as I can tell, are Euclidean distance, Pearson correlation and cosine similarity.
Sound intimidating? Let’s calculate a Euclidean distance together, shall we? You have Item #1 with 1’s and 0’s indicating yes/no for each of 100 features. You have Item #2 with the same. Subtract one row from the other, square each difference, add up the squares, and take the square root. That’s the Euclidean distance between the items in feature-space. Whew.
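For the curious, here is a minimal sketch of that calculation in plain Python. The two items and their five features (instead of 100, to keep it readable) are made up for illustration, and the function name is my own.

```python
import math

def euclidean_distance(item_a, item_b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(item_a) != len(item_b):
        raise ValueError("items must have the same number of features")
    # Subtract feature by feature, square each gap, add them up, take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(item_a, item_b)))

# Hypothetical items: 1 means the feature applies, 0 means it does not.
item_1 = [1, 0, 1, 1, 0]
item_2 = [1, 1, 0, 1, 0]

distance = euclidean_distance(item_1, item_2)
print(round(distance, 3))
```

The two items disagree on two features, so the distance is the square root of 2. Identical items come out at zero.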
It gets even easier. Many vendors I talk to walk out their “machine learning algorithms” as though they’re whispering a MacArthur Fellowship they just happened to win. Do not fear the machine learning algorithm, friends. Do not assume these vendors know anything more than three words. We will do one right now.
Neighborhood Methods. A common approach to determining the similarity (or difference) among items is to have the machine put them into groups. This is called neighborhooding or clustering and is an example of unsupervised learning. (“Unsupervised” is an adorable term that means you don’t have any particular value you’re trying to predict. It does not always imply that the machine has run amok and started its devious plotting … although it might.)
First, arrange your items into rows, with one item per row. The columns are our features. There are 1’s and 0’s to indicate you know what. We will use the — ahem — machine learning algorithm called K-means clustering to tell us which items are similar to which other items. First, we go get a Ph.D. … then … oh, wait, no. Somebody already did that.
Fire up your Python console. We machine learning amigos rely heavily on a massive library of prewritten code called scikit-learn. This is a very well maintained, entirely free, and dazzlingly comprehensive cornucopia that includes all those fancy “machine learning algorithms” so beloved of people trying to sell you software. And it includes K-means clustering.
So we load up our library thus: “from sklearn.cluster import KMeans”. And then:
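Something along these lines, as a minimal sketch. The six made-up items, their four yes/no features and the choice of two clusters are assumptions for illustration, not the data behind this post.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 6 items (rows) by 4 yes/no features (columns).
items = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])

# n_clusters=2 is an assumption; random_state just makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(items)

# Items that share a label landed in the same neighborhood.
same_cluster_as_first = [i for i, lab in enumerate(labels) if lab == labels[0]]
print(labels, same_cluster_as_first)
```

The label numbers themselves are arbitrary. What matters is which items end up sharing one.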
That’s it. You’ve executed a machine learning algorithm. You can confidently assert you are not only a data scientist but a worthy recipient of large contract dollars.
2. Item-Item Collaborative Recommenders
Market Basket. Another type of content-focused recommender we’ll call, um, Amazon 1.0. (Amazon still uses this system at times, I’m told.) The Amazon 1.0 approach is also called a “market basket” recommender. Why? It simply looks at which items tend to be purchased with other items and recommends these frequent ride-alongs. A nice and clear description by a couple of Amazon dudes is right here.
Technically, we’ve moved from content-based recommenders to collaborative recommenders now. It’s worth a moment to hover again on the difference. In the recommender science world, anything is “collaborative” if it uses information about individual user behaviors or attitudes. Our Pandora example would have worked without a single user/listener — it could happily recommend similar songs to no one in the dark. This Amazon 1.0 groove will not. It needs user/people to do something. In this case, they’re buying things in the same shopping trip; later we’ll see situations where the user/people gave things a rating. In either case, because you’ve got user/people level info, you’re doing “collaborative” recommending.
(Who’s collaborating? Nobody knows. I suppose you could say the users are collaborating with the mag wheels of commerce without even knowing it. But it’s one of those terms we just have to inhale.)
Back to Amazon 1.0. This method can lead to a certain kind of madness. We’re contemplating a big two-dimensional matrix with every single item you sell along one axis and every single item you sell along the other axis, and every single cell filled in with some distance or correlation measure. So the size of this matrix is your entire catalogue … squared. Which is fine if you sell 1,000 items, since your matrix is only 1,000,000 squares big. But if you sell a lot more, you’re burdened.
On the other hand, not every item gets purchased with every other item. Some pairs are so rare you can ditch them. And a lot of the heavy lifting can be done offline, meaning it can be precomputed overnight. When a person appears on a website, a simple lookup can be done to whatever item they’re considering and the high-correlaters returned pronto. This makes reco’s feasible in web time.
If you’re interested, the way Amazon describes their algorithm (in English, not Python), is like this: “For each item in the catalog / For each customer who bought it / For each other item bought by that customer / Record that the customer bought both items.” After that, you just figure the similarity between items based on this handy record. Done.
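Here is a rough Python rendering of that little poem, as a sketch only. The customers and baskets are invented, and I loop customer by customer rather than item by item, which produces the same record of co-purchases. For "similarity" I simply use the raw count of shared buyers, which is a simplification and not necessarily what Amazon does.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "ann": {"tea cozy", "teapot", "cat bed"},
    "bob": {"tea cozy", "teapot"},
    "cat": {"cat bed", "cat toy"},
    "dan": {"tea cozy", "teapot", "cat toy"},
}

# Count how many customers bought each pair of items.
co_counts = defaultdict(int)
for basket in purchases.values():
    for item_a, item_b in combinations(sorted(basket), 2):
        co_counts[(item_a, item_b)] += 1
        co_counts[(item_b, item_a)] += 1  # store both directions for easy lookup

def bought_with(item, top_n=3):
    """Items most often bought by the same customers as `item`, best first."""
    pairs = [(other, n) for (first, other), n in co_counts.items() if first == item]
    return sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_n]

print(bought_with("tea cozy"))
```

The counting loop is the part you would run overnight. The `bought_with` lookup is the cheap part you would run in web time.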
It’s like a little poem. A dull poem, perhaps, but more lucrative than most.
Rating Based. In a happy world, your users have actually taken the time to provide explicit feedback on your items. Suppose you — oh, I don’t know — sell books, or movies, and your name is — oh — B+N or Netflix. Or suppose you run a ratings service like GoodReads (owned by Amazon) or Yelp (owned by the people). Users give things star ratings and these can be used to make recommendations. How?
Well, one way is pretty similar to what we just talked about above. Let’s reminisce. In both the Pandora and the Amazon 1.0 examples we:
- Had a list of items
- Calculated a “distance” between each item
- Recommended items that were “close” to items the user liked/bought
In the Pandora case, the distance number was calculated based on similarity of features. For Amazon, it was based on how often an item was bought with another item(s). Now we have star ratings. I know you can smell this one down the interstate: we’re going to:
- Take a list of items
- Calculate a “distance” between each item
- Recommend items that are “close” to other items the user liked
Déjà vu all over again. In this case, this happy world, users have told us what they like and don’t like. They’ve gone to the trouble to rate things from 1 to 5 stars. It’s a simple matter now of writing a little piece of Python that calculates the distance between item 1 and item 2 based on all those delightful star ratings. Like this:
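A hedged sketch of such a function follows. The ratings dictionary, the item and user names and the min_overlap cutoff are made up for illustration; pearsonr comes from scipy.stats.

```python
from scipy.stats import pearsonr

# Hypothetical star ratings: item -> {user: stars}.
ratings = {
    "item_1": {"ann": 5, "bob": 4, "cat": 1, "dan": 2, "eve": 5},
    "item_2": {"ann": 4, "bob": 5, "cat": 2, "dan": 1},
    "item_3": {"ann": 1, "bob": 2, "cat": 5, "dan": 4, "eve": 1},
}

def item_similarity(item_a, item_b, min_overlap=3):
    """Pearson's r between two items, using only reviewers who rated both."""
    overlap = sorted(set(ratings[item_a]) & set(ratings[item_b]))
    if len(overlap) < min_overlap:
        return None  # too few common reviewers to say anything (assumed cutoff)
    stars_a = [ratings[item_a][user] for user in overlap]
    stars_b = [ratings[item_b][user] for user in overlap]
    r, _p_value = pearsonr(stars_a, stars_b)
    return r

r_12 = item_similarity("item_1", "item_2")
r_13 = item_similarity("item_1", "item_3")
print(r_12, r_13)
```

A value near +1 means people who liked one item tended to like the other; a value near -1 means fans of one tended to pan the other.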
Now, let’s face reality. Not every user will rate every item. (In fact, you’re lucky to get 2% of possible ratings filled in the real world.) So there’s a horrible sparsity problem. Our solution here is to pull a list of all the users who rated item 1, another list of all the users who rated item 2, and find out where they overlap. (This is the “overlap” value above, which is just a list of reviewers who happened to review both items.) Now that we know the common reviewers, we pull a list of all the reviews (by these reviewers) of item 1 and of item 2 and calculate the distance between them. In our example, we use Pearson’s r correlation coefficient to figure out this distance (that’s the “pearsonr” thing). SciPy already wrote that code for you; it lives in scipy.stats.
In this way, we can see if people who liked item 1 were also likely to like item 2 (based on a relatively short distance, or high correlation, between the ratings). Of course, this evaluation works in the opposite direction also: like 1, hate 2, high distance. Why do we care? Obviously, if we encounter a victim who likes item 1 but has not yet encountered item 2, we use our collaborative calculations to figure out whether or not said victim is likely to like it.
It’s like having our minds read by a calculator.
3. User-Item Collaborative Recommenders
All of the above started with the item itself. Pandora didn’t need the user at all. Amazon took an item of interest and recommended others based on their appearance in real shopping carts. It took in an item and churned out other item reco’s. This last approach uses some of the very same basic methods we’ve already seen but starts in a different place. We start here with the people.
With you, my friend. And me. People with hopes and dreams and desires for product recommendations. It starts with one of us and tries to predict what we would like.
How? Thanks for asking. The basic approach is again steeped, soaked, encased within the idea of similarity. (I’m thinking I’m probably applying these terms similarity and distance too loosely, but I’m a loose fellow, an approximator; in this context, they’re getting at the same idea: something is “similar” to something else if it is less “distant.”)
User-item collaborative recommenders work like this:
- Take a bunch of people who have rated things
- Calculate the “distance” between each pair of people based on their ratings
- Find the other people who are most similar to you
- Recommend things to you that people like you also liked
Sounds easy. It’s not. I’ve belabored some of the practical problems, including data sparsity and the fact that this system doesn’t work at all if you (the user) haven’t rated anything. But in our utopian blogosphere, here in this moment, we’ll ignore these issues and give you a simple way to envision the secret center of these recommenders. That is, a way to calculate the “similarity” between two users. This is even simpler than the Pearson’s hoohaw draped above.
Ready? Here it is:
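One way such a function might look, as a sketch under assumptions: the users, items and ratings are invented, and I average the absolute gaps so that a +2 and a -2 do not cancel each other out.

```python
# Hypothetical ratings table: user -> {item: stars}. Missing items are unrated.
ratings = {
    "ann": {"pizza": 5, "sushi": 3, "tacos": 4},
    "bob": {"pizza": 4, "sushi": 2},
    "cat": {"pizza": 1, "tacos": 5},
}

items = sorted({item for stars in ratings.values() for item in stars})

# Mean rating per item, used to fill in reviews a user never wrote.
item_mean = {}
for item in items:
    stars = [r[item] for r in ratings.values() if item in r]
    item_mean[item] = sum(stars) / len(stars)

def user_distance(user_1, user_2):
    """Average absolute gap between two users' ratings (smaller = more alike)."""
    gaps = []
    for item in items:
        stars_1 = ratings[user_1].get(item, item_mean[item])
        stars_2 = ratings[user_2].get(item, item_mean[item])
        gaps.append(abs(stars_1 - stars_2))
    return sum(gaps) / len(gaps)

print(user_distance("ann", "bob"), user_distance("ann", "cat"))
```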
This function just subtracts all the reviews of user 1 from user 2’s and returns the average. That’s it. It’s Euclidean in spirit. It’s easy. It requires one whopping workaround to deal with the sparse data problem: for every item, I assumed that a missing review was actually equal to the mean rating for that item. So everybody rated everything whether they wanted to or not, either in reality or by assuming the mean. It’s a stretch, but what can we do? Real-life data science takes a lot of hard turns.
The way the Netflix Prize was set up, the competing teams were asked to predict star ratings for movies that particular people had not yet rated. Contestants were given a lot of user id’s and ratings. Obviously, collaborative filtering of a more impressive nature was called for, and delivered. Rating-based user similarities were calculated. But in general, how can we go from a list of user similarities to a predicted star rating on a 5-point scale?
Here’s one approach that seemed to work nicely for some of the teams. They deconstructed the star rating into different components. Three logical components even us remedial types can jump onto the bus with are these:
- Average rating for the item — this is the best starting point we have … PLUS:
- User bias — because individual people can be harder or easier graders than the average … PLUS:
- User similarity to fans (or haters) — reflecting how much we look like people who liked the item (or hated it)
These three things can be put into a simple formula that churns out a star rating prediction. Amazing, isn’t it. And the first two components are super easy to calculate. In fact, here are two lines of code that return the average rating for each item (here called a “bus” for “business,” since I was looking at restaurant recommenders) … and the “user_bias,” which is simply the average of that user’s rating difference from the mean of all raters.
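With pandas, those two lines might look roughly like this. The tiny reviews table and its column names ("user", "bus", "stars") are assumptions for illustration.

```python
import pandas as pd

# Hypothetical reviews: one row per (user, business) star rating.
reviews = pd.DataFrame({
    "user":  ["ann", "ann", "bob", "bob", "cat", "cat"],
    "bus":   ["diner", "bistro", "diner", "bistro", "diner", "bistro"],
    "stars": [5, 4, 3, 2, 4, 3],
})

# Average rating for each business.
bus_mean = reviews.groupby("bus")["stars"].mean()

# Each user's average gap from the business mean: positive means an easy grader.
user_bias = (reviews["stars"] - reviews["bus"].map(bus_mean)).groupby(reviews["user"]).mean()

print(bus_mean)
print(user_bias)
```

In this toy table ann rates everything one star above the crowd, bob one star below, and cat lands right on the average.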
Now, the user similarity to fans or haters is just a similarity calculation that we saw above. In this case, it’s applied to a subset of users who count as “fans” (which could be defined as people who rated the item above the mean) or “haters” (the opposite). For kicks, it might look something like this:
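A sketch of one possibility, with plenty of assumptions: the ratings are invented, the distance only looks at items both users rated (a shortcut, not the mean-filling trick), and the score is simply "distance to haters minus distance to fans." It also assumes every item has at least one fan and one hater, and that the user shares at least one rated item with each of them.

```python
# Hypothetical ratings: user -> {item: stars}.
ratings = {
    "ann": {"diner": 5, "bistro": 4, "cafe": 5},
    "bob": {"diner": 4, "bistro": 5, "cafe": 4},
    "cat": {"diner": 1, "bistro": 2, "cafe": 1},
    "dan": {"diner": 2, "bistro": 1, "cafe": 2},
    "eve": {"diner": 5, "bistro": 5},  # eve has not rated the cafe
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def user_distance(user_1, user_2):
    """Average absolute star gap over items both users rated."""
    shared = set(ratings[user_1]) & set(ratings[user_2])
    return mean(abs(ratings[user_1][i] - ratings[user_2][i]) for i in shared)

def fans_and_haters(item):
    """Split the item's raters at its mean rating."""
    raters = {u: r[item] for u, r in ratings.items() if item in r}
    item_mean = mean(raters.values())
    fans = [u for u, stars in raters.items() if stars > item_mean]
    haters = [u for u, stars in raters.items() if stars <= item_mean]
    return fans, haters

def fan_similarity(user, item):
    """Positive if `user` sits closer to the fans, negative if closer to the haters."""
    fans, haters = fans_and_haters(item)
    dist_to_fans = mean(user_distance(user, f) for f in fans)
    dist_to_haters = mean(user_distance(user, h) for h in haters)
    return dist_to_haters - dist_to_fans

print(fans_and_haters("cafe"), fan_similarity("eve", "cafe"))
```

Here eve looks a lot like the cafe's fans and not much like its haters, so her score comes out positive.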
Which might — kicking again, and some more — end up with a simple function that added up the item mean, the user bias, and the adjustment for similarity to fans or haters. Thus:
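As a sketch, assuming the three components have already been computed and put on a star scale: the function name, the example numbers and the clamping to the 1-to-5 range are my own additions for illustration.

```python
def predict_stars(item_mean, user_bias, fan_adjustment, low=1.0, high=5.0):
    """Add the three components, then clamp the result to the star scale."""
    raw = item_mean + user_bias + fan_adjustment
    return max(low, min(high, raw))

# Hypothetical inputs: a 3.6-star restaurant, a slightly generous grader,
# and a small bump because the user resembles the item's fans.
predicted = predict_stars(item_mean=3.6, user_bias=0.3, fan_adjustment=0.4)
print(round(predicted, 1))
```

How much weight the fan-or-hater adjustment deserves is a tuning question, and the right answer depends on your data.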
And that’s it. Machine learning! Algorithms! Correlation. Collaborative filtering. Recommendation engines. And you’re wondering: is that it? Is machine learning really like Oz, where the man behind the curtain is a man behind a curtain?
Of course not. It’s real and the smartest people in the room are working on it now. (Not this room, some other room; but you know what I mean.) Recommender systems themselves are a thriving subspecialty undergoing rapid improvement. It will be particularly wonderful when they can figure out how to dispense with explicit ratings entirely and simply use implicit cues, such as purchases, behaviors, comments, and motion.
And I haven’t even mentioned all the linear algebra techniques applied by the winning Netflix Prizers: things like principal component analysis, singular value decomposition and matrix factorization, all of which can find underlying structure in those large sparse monsters recommenders have to face.
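To give a taste of one of these, here is a minimal NumPy sketch of a truncated singular value decomposition on a toy ratings matrix. The matrix, the mean-filling of missing ratings and the choice of k = 2 are all assumptions for illustration; this is nowhere near what the Netflix Prize teams actually built.

```python
import numpy as np

# Hypothetical user-by-item star matrix; 0 marks a missing rating.
stars = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 1, 5, 0],
    [0, 1, 4, 5],
], dtype=float)

# Fill gaps with each item's mean over the ratings that do exist (an assumption).
rated = stars > 0
item_mean = stars.sum(axis=0) / rated.sum(axis=0)
filled = np.where(rated, stars, item_mean)

# Keep only the k strongest singular values to get a low-rank approximation.
k = 2
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(approx, 2))
```

The cells of approx that line up with the zeros in stars can be read as rough guesses at the missing ratings.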
But you’ve got the flavor, flav. Do not fear the machines. That is my wish for you (and me) for 2016. Peace.