In a recommendation system, we typically use an instance of tfrs.layers.factorized_top_k.ScaNN or tfrs.layers.factorized_top_k.BruteForce to retrieve the nearest neighbors of a query vector. Is there a convenient way to retrieve the furthest neighbors instead?
I looked at the code for tfrs.layers.factorized_top_k.BruteForce and realized I could create a nearly identical class (BruteForceFurthest) that returns the furthest neighbors by changing one line of code. Here is the relevant excerpt from the call() method.
scores = self._compute_score(queries, self._candidates)
values, indices = tf.math.top_k(-scores, k = k) # Negate the scores to get the furthest neighbors
return values, tf.gather(self._identifiers, indices)
As you can see, I think it is as simple as negating the scores so that the “top K” scores are really the bottom K scores.
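To make the idea concrete outside of TensorFlow, here is a minimal pure-Python sketch of the same trick. It assumes dot-product scores; the names furthest_k and dot and the toy vectors are hypothetical and not part of TFRS. As in the excerpt above, the returned values are the negated scores.

```python
import heapq

def dot(u, v):
    """Dot-product score between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def furthest_k(query, candidates, identifiers, k):
    """Return (negated scores, identifiers) of the k lowest-scoring candidates."""
    scores = [dot(query, c) for c in candidates]
    # Taking the top k of the negated scores selects the k smallest original scores.
    top = heapq.nlargest(k, range(len(scores)), key=lambda i: -scores[i])
    return [-scores[i] for i in top], [identifiers[i] for i in top]

# Toy data, made up for illustration only.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [-0.8, 0.2], [0.1, 0.9], [-0.3, -0.5]]
identifiers = ["a", "b", "c", "d"]

values, ids = furthest_k(query, candidates, identifiers, k=2)
print(ids)     # the two candidates with the lowest dot-product scores
print(values)  # their scores, negated
```

If you need the original scores rather than the negated ones, negate values again before returning them.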
How about inverting the distance function? Any transformation of the distance calculation that reverses its ordering would do: (x * -1), (x ^ -1), etc.
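One caveat worth checking before picking a transformation: negation reverses the ranking for any real-valued scores, while the reciprocal only reverses it when all scores share the same sign (as true distances do). The sketch below illustrates this with made-up scores; the ranking helper is hypothetical.

```python
def ranking(scores, transform):
    """Indices ordered from highest to lowest transformed score."""
    return sorted(range(len(scores)), key=lambda i: transform(scores[i]), reverse=True)

# Mixed-sign scores, e.g. dot products (toy values for illustration).
scores = [0.9, -0.8, 0.1, -0.3]

nearest_first = ranking(scores, lambda x: x)
negated = ranking(scores, lambda x: -x)
reciprocal = ranking(scores, lambda x: 1.0 / x)

print(negated == nearest_first[::-1])     # negation gives the exact reverse order
print(reciprocal == nearest_first[::-1])  # the reciprocal does not for mixed signs

# With strictly positive distances, the reciprocal does reverse the order.
distances = [1.0, 2.0, 4.0]
recip_positive = ranking(distances, lambda x: 1.0 / x)
plain_positive = ranking(distances, lambda x: x)
print(recip_positive == plain_positive[::-1])
```

So for dot-product or cosine scores, which can be negative, negation is the safer choice.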
It’s also interesting to consider that when the “distance” is computed over multiple dimensions, Euclidean distance or RMS is the metric we humans tend to gravitate towards. But any Minkowski distance (see “Minkowski distance” on Wikipedia) could be used. Changing the order p of the metric can produce very different N-body graphs from the same original coordinates.
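As a small illustration of how the order p can change which point counts as nearest, here is a sketch using the standard Minkowski formula. The points and the helper names minkowski and nearest are made up for the example.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

# Toy coordinates chosen so that the nearest point depends on p.
query = (0.0, 0.0)
points = {"a": (3.0, 3.0), "b": (5.0, 0.0)}

def nearest(p):
    """Name of the point closest to the query under order p."""
    return min(points, key=lambda name: minkowski(query, points[name], p))

print(nearest(1))  # Manhattan: a is 6 away, b is 5 away
print(nearest(2))  # Euclidean: a is about 4.24 away, b is 5 away
```

The same coordinates give a different nearest (and therefore furthest) neighbor depending on p, so the choice of metric matters as much as the direction of the ranking.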
I figured I could simply use the same distance and score calculation that tfrs.layers.factorized_top_k.BruteForce uses by default but negate the scores, as shown in the code excerpt I provided.
In any event, I ended up implementing another class that combines the “positive” and “negative” retrieval models and averages the respective “nearest neighbors” and “furthest neighbors” scores.