Beyond External Embeddings: Integrating User Histories for Enhanced Recommendations
Leveraging External Embeddings in Recommender Systems: A Practical Guide
In recommender systems, you frequently encounter situations where you have access to ‘external embeddings.’ These are…
If the object space of this external history coincides with the main space of recommended objects (or significantly intersects with it, or is explicitly connected to it), the case is simple: we add another event type to the system, external interaction, and have all models consider this type of history with recommended objects as well. Even if the recommended and external objects are not explicitly connected, they can sometimes be linked implicitly, by training a content model that "matches" external objects to internal ones.
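As a sketch of such a content match, assuming content embeddings for both catalogs already exist (the matrices, their values, and the function name here are purely illustrative), one can link each external object to its nearest internal objects by cosine similarity:

```python
import numpy as np

# Hypothetical pre-computed content embeddings (e.g. from a text encoder),
# one matrix per catalog; rows are embedding vectors.
external_emb = np.array([[1.0, 0.0], [0.0, 1.0]])              # external objects A0, A1
internal_emb = np.array([[0.8, 0.6], [0.6, 0.8], [1.0, 0.0]])  # internal objects B0..B2

def match_external_to_internal(ext, intl, top_k=1):
    """For each external object, find the most similar internal objects
    by cosine similarity of their content embeddings."""
    ext = ext / np.linalg.norm(ext, axis=1, keepdims=True)
    intl = intl / np.linalg.norm(intl, axis=1, keepdims=True)
    sims = ext @ intl.T                       # pairwise cosine similarities
    return np.argsort(-sims, axis=1)[:, :top_k]

print(match_external_to_internal(external_emb, internal_emb))  # → [[2] [1]]
```

In production the embeddings would come from a trained content model (text, image, etc.), and the nearest-neighbor search would use an ANN index rather than a dense matrix product.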
If the external and internal objects are entirely different, things become more interesting. There may still exist statistical patterns that can be learned. One can either search for them explicitly ("users who interact with external object A go on to interact with internal object B more often than other users"), or train models similar to SLIM: linear models over cross-features of the form [user interacted with external object A; we are scoring internal object B].
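The explicit-search variant can be as simple as a lift statistic; a minimal sketch on toy interaction sets (all ids and counts are made up):

```python
# Toy interaction logs: sets of user ids, assumed collected separately
# for an external object A and an internal object B.
users_with_A = {1, 2, 3, 4}
users_with_B = {2, 3, 4, 8}
all_users = set(range(10))

def lift(users_a, users_b, universe):
    """How much more likely a user who interacted with A is to interact
    with B, compared to an average user (lift > 1 means a positive link)."""
    p_b = len(users_b) / len(universe)                   # base rate of B
    p_b_given_a = len(users_a & users_b) / len(users_a)  # rate of B among A's users
    return p_b_given_a / p_b

print(lift(users_with_A, users_with_B, all_users))  # 0.75 / 0.4 = 1.875
```

At scale one would also apply a significance correction (or smoothing), since rare pairs produce noisy lift values.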
SLIM: A Fast and Interpretable Baseline for Recommender Algorithms
Continuing from our post about linear models, today we’ll delve into a specific case — Sparse Linear Methods (SLIM)…
Beyond Counters: Linear Models in Recommendations
Linear models are nearly the simplest tools in machine learning, but they shouldn’t be underestimated. They are…
However, in my opinion, the most suitable method (though somewhat harder to debug) is two-tower networks. Even when trained without external history, they do not rely on the fact that the object spaces coincide: it is still beneficial not to share the object embeddings between the left and right towers but to train them separately. This makes it possible to use external history in the user tower alone. The network will ultimately embed both the user and the recommended object into a common semantic space.
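A minimal forward-pass sketch of this setup: separate (untrained) embedding tables per tower, with external history consumed only by the user tower. All table sizes, the mean-pooling, and the random projection are illustrative stand-ins for learned layers:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Separate, non-shared embedding tables, as described in the text:
# one for internal items inside the user tower's history encoder,
# one for external objects, and one for the items being scored.
hist_item_table = rng.normal(size=(100, DIM))   # internal items in user history
ext_item_table = rng.normal(size=(50, DIM))     # external objects
right_item_table = rng.normal(size=(100, DIM))  # item (right) tower

W_user = rng.normal(size=(2 * DIM, DIM))  # untrained projection into the shared space

def user_tower(internal_history, external_history):
    """Mean-pool both histories, concatenate, and project into the
    common semantic space; a real model would use learned layers."""
    h_int = hist_item_table[internal_history].mean(axis=0)
    h_ext = ext_item_table[external_history].mean(axis=0)
    return np.concatenate([h_int, h_ext]) @ W_user

def item_tower(item_id):
    return right_item_table[item_id]

def score(internal_history, external_history, item_id):
    # Dot product of the two towers' outputs, the standard two-tower score.
    return float(user_tower(internal_history, external_history) @ item_tower(item_id))

print(score([3, 7, 42], [5, 9], 10))
```

Because the item tower never sees the external catalog, the external history only shapes where the user lands in the shared space, which is exactly what allows disjoint object spaces.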
Two-Tower Networks and Negative Sampling in Recommender Systems
Understand the key elements that power advanced recommendation engines
One can train a separate network on the external history, or simply feed the external history as an additional input to the two-tower network. The second option is likely more powerful, since external and internal histories can influence each other in non-trivial ways (especially with self-attention). However, it is harder to train: the network may simply ignore the additional, less informative, and noisier signal. I would therefore start with the first option; it is also much easier to verify that such a model learns something sensible.
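To make the joint-input variant concrete: the two histories are concatenated into one sequence, so internal and external events can attend to each other. A toy sketch with a single untrained attention head (all shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 4

def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections,
    just to show tokens attending across both histories."""
    logits = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ x

# Toy embedded histories; in the joint variant they form one sequence.
internal_hist = rng.normal(size=(3, DIM))  # 3 internal events
external_hist = rng.normal(size=(2, DIM))  # 2 external events
joint = np.concatenate([internal_hist, external_hist], axis=0)

user_vector = self_attention(joint).mean(axis=0)  # pooled user representation
print(user_vector.shape)  # (4,)
```

In the separate-network variant, `external_hist` would instead be encoded by its own model whose output is used downstream, which keeps the two signals easy to inspect independently.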
A separate question arises: how exactly to use an additional model with external history? There are several options:
- Features in a ranking model
- Product rules (boosting recommendations from such models)
- More sophisticated techniques like modifying losses/targets in ranking
I strongly recommend always starting with option 1, features. Even if it is not the most effective method on its own, without it the other options will not work well, and they can often even harm the system.
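Mechanically, option 1 is just one more column in the ranker's feature matrix; a toy sketch (the feature names and values are invented):

```python
import numpy as np

# Existing ranking features for (user, candidate) pairs, plus the score
# produced by the external-history model.
base_features = np.array([
    [0.9, 0.1],   # e.g. [ctr_prediction, freshness]
    [0.4, 0.8],
])
external_model_score = np.array([0.7, 0.2])

# Append it as one more column that the ranker (GBDT, neural net, ...)
# is free to use or ignore.
ranking_input = np.column_stack([base_features, external_model_score])
print(ranking_input.shape)  # (2, 3)
```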
However, sometimes adding features does not help at all (the model simply does not use them), because the dataset contains no historical examples where the system recommended something related to the external history. The approach is therefore: deploy the features, add candidates, and, if necessary, boost those candidates for a limited time. After that, the model begins to use the features effectively, and the boost can be turned off. This method has worked well for us.
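The temporary boost in this rollout scheme can be a simple product rule; a sketch, where the multiplier value and all names are hypothetical and would be tuned and toggled in practice:

```python
def blended_score(rank_score, from_external_candidates, boost_active, boost=1.3):
    """Product-rule boost: multiply the ranker's score for candidates that
    came from the external-history model while the boost is active.
    Once the ranker has learned to use the new features, setting
    boost_active=False turns the boost off without a code change."""
    if boost_active and from_external_candidates:
        return rank_score * boost
    return rank_score

print(blended_score(0.5, True, True))   # boosted: 0.65
print(blended_score(0.5, True, False))  # boost turned off: 0.5
```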