Personalization and Popularity Bias

Is popularity bias just a convenient excuse?

Michael Roizner
3 min read · Dec 26, 2023

A common issue in recommender systems is a lack of personalization: users are shown mostly popular but not particularly relevant documents.

In the community, there’s a well-known problem of popularity bias. But what exactly is it? Bias implies a systematic error. But where is the error in this case? And does it really exist?


In simple terms, popularity bias refers to the ‘rich get richer’ situation, where popular documents are recommended by the system disproportionately more often than less popular ones. There can be various reasons for this, and the literature discusses different aspects of the phenomenon. It’s important to distinguish these reasons, because doing so greatly aids in debugging the system.

In the work on recommendations, it is very useful to distinguish two important steps:

  1. Training a model to predict user response to a recommended item. In the simplest case, this is just the probability of a click; more generally, E(engagement | item, user, context).
  2. Actually building recommendations using this model. Simple ranking by predicted score is not optimal, although it is a good baseline.
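The two steps can be sketched in a few lines of Python. Here the "model" is a toy lookup table standing in for a trained predictor, and all names are illustrative, not from any particular system:

```python
# Step 1: a stand-in for a trained E(engagement | item, user, context) model.
def predicted_engagement(user, item, context=None):
    # Toy scores keyed by (user, item); a real model would be learned.
    scores = {("u1", "a"): 0.9, ("u1", "b"): 0.3, ("u1", "c"): 0.6}
    return scores.get((user, item), 0.0)

# Step 2: building recommendations on top of the model.
# Plain top-k ranking by predicted score -- a baseline, not the optimum.
def recommend(user, items, context=None, k=2):
    ranked = sorted(items, key=lambda i: predicted_engagement(user, i, context),
                    reverse=True)
    return ranked[:k]

print(recommend("u1", ["a", "b", "c"]))  # ['a', 'c']
```

Keeping these two steps separate matters because, as discussed below, "popularity bias" can refer to a problem in either of them.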

In many discussions, popularity bias implies the suboptimality of step 2. That is, even if a more popular item is more likely to elicit a positive response from the user, it might be better to recommend a less popular item. There can be several reasons for this — both user-centric (in the long run, a click on a popular item may be less valuable for this user than a click on an unpopular one) and from the perspective of the entire ecosystem (this might slightly worsen the experience for this user, but it balances the consumption distribution across the entire database of items). These are generally reasonable thoughts, but we must honestly admit: we are sacrificing engagement at the moment of a specific query for a brighter future.

The simplest way to implement this idea (and, in my opinion, other methods haven’t strayed far from this) is to penalize an item for its popularity. This is closely related to PMI, which we discussed in the post about two-tower networks.
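A minimal sketch of such a penalty, assuming we simply divide the predicted engagement by popularity raised to a tunable exponent (the function name and the numbers are illustrative):

```python
def penalized_score(engagement_score, item_popularity, alpha=0.5):
    # alpha = 0 -> pure engagement ranking; larger alpha -> stronger
    # popularity penalty (alpha = 1 divides the score by popularity,
    # which is the PMI-like correction in log space).
    return engagement_score / (item_popularity ** alpha)

# A popular item with a higher raw score can now lose to a niche item:
print(penalized_score(0.10, 0.20))  # popular item, ~0.224
print(penalized_score(0.05, 0.01))  # niche item, 0.5
```

The exponent alpha is exactly the knob that trades current engagement for the "brighter future" mentioned above, so it is worth tuning it against long-term metrics rather than picking it by intuition.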

In other instances, popularity bias is related to the first point: the imbalance of items hinders our ability to effectively train the model E(engagement | item, user, context). Specifically, the model might poorly account for user features and essentially just learn E(engagement | item), closely associated with popularity (by the way, in this post, I sometimes refer not to P(item) but to E(engagement | item) as popularity). This is a very tangible problem. However, I don’t quite understand why it’s called a bias.
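One quick diagnostic for this failure mode is to check how much the model’s predictions actually vary across users for a fixed item: near-zero variation everywhere suggests the model has collapsed to E(engagement | item). A sketch of that check, with a hypothetical model interface:

```python
import statistics

def personalization_spread(model, users, items, context=None):
    """Average per-item standard deviation of predictions across users.
    A value near zero means the model is effectively ignoring user
    features and has collapsed to E(engagement | item)."""
    spreads = []
    for item in items:
        preds = [model(u, item, context) for u in users]
        spreads.append(statistics.pstdev(preds))
    return statistics.mean(spreads)

# A model that ignores the user has zero spread:
popularity_only = lambda u, i, c: {"a": 0.9, "b": 0.1}[i]
print(personalization_spread(popularity_only, ["u1", "u2"], ["a", "b"]))  # 0.0
```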

The right fix depends on the specific model. A few suggestions:

  • Ensure that the model includes informative personal features.
  • Introduce a separate component within the model that is responsible for popularity, so that the rest of the model can focus on user-specific signal.
  • If the model learns item embeddings, check how well they have been learned. For example, by looking at the most similar items to a given one.
  • If using negative sampling, take popularity into account in the sampling distribution. Just remember to multiply it back at serving time to recover E(engagement | …), as discussed in the same post.
  • And simply check that the model has learned properly. Yes, this is not so simple. It is part of a rather complex but critically important topic called ML Debugging.
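To illustrate the negative-sampling point above: if negatives are sampled proportionally to popularity without a training-time correction, the model learns a roughly PMI-like score, and the log-popularity has to be added back at serving time (i.e. popularity is multiplied back in probability space). A sketch under that assumption, with illustrative names:

```python
import math

def serving_score(model_logit, item_popularity):
    """Recover an E(engagement | ...)-style ranking score from a model
    trained with popularity-proportional negative sampling and no
    training-time correction: add log-popularity back to the logit."""
    return model_logit + math.log(item_popularity)

def logq_corrected_logit(logit, sampling_prob):
    """The training-time alternative (logQ correction): subtract
    log q(item) from the logit during training instead, so the learned
    logit can be used directly at serving time."""
    return logit - math.log(sampling_prob)
```

Whichever side the correction lives on, the key point is the same: the sampling distribution must be accounted for somewhere, or popular items end up systematically pushed down as negatives.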

By the way, regarding “disproportionately more often.” No one promised that with simple ranking, the probability of being recommended would be proportional to the popularity or CTR of the document. That’s not the case at all. Perhaps that’s why they call it a bias?

In my experience, there have been many cases where teams (a) don’t consider what exactly they are calling popularity bias and what causes it, and (b) face a lack of personalization simply because the model E(engagement | …) was trained poorly.

It’s very important to understand whether this is simply how the world works (popular but less relevant items really do get better responses on average) or whether we have trained the model poorly.

Much more often, popularity bias is just a popular myth that conceals system bugs.

The importance of a good engagement model should not be underestimated.