I have been trying to wrap my head around TF-Agents (mainly its multi-armed bandits) and TF Recommenders, and in particular the difference between the two. Both have a way to use contextual features. I don’t remember whether exploration is built into TF Recommenders, but it would be quite easy to add “manually” in code: instead of always showing the best recommendation, choose something from the top 5 with some probability.
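To make the “manual” exploration idea concrete, here is a minimal sketch of what I have in mind. It is plain NumPy and does not use either library's API; the function name `pick_with_exploration` and the parameters `epsilon` and `k` are my own placeholders, and `scores` stands for whatever ranking scores the recommender produces.

```python
import numpy as np

def pick_with_exploration(scores, epsilon=0.1, k=5, rng=None):
    """Return the index of the item to show.

    With probability 1 - epsilon the top-scoring item is returned;
    otherwise an item is drawn uniformly from the top k (which still
    includes the best one).
    """
    if rng is None:
        rng = np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.size)
    # Indices of the k highest scores, best first.
    top_k = np.argsort(scores)[::-1][:k]
    if rng.random() < epsilon:
        return int(rng.choice(top_k))  # explore within the top k
    return int(top_k[0])               # exploit the best item
```

Setting `epsilon=0` gives the usual “always show the best” behaviour, so the exploration rate can be tuned or decayed over time.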
As an illustrative example, suppose I had a dataset with columns like:
price | quantity | seasonality_features | item_features
Couldn’t both packages be used to get a recommended price, with quantity as the reward and with contextual features such as time of day and some item features as input? The price could be bucketed so that it is categorical for this example.
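For reference, this is roughly how I picture the bandit framing: each price bucket is an arm, the context is the seasonality and item features, and the reward is the quantity sold. The sketch below is plain NumPy, not the TF-Agents or TF Recommenders API. The class name, the epsilon-greedy strategy, the per-arm ridge regression, and all the numbers in the usage part are my own assumptions for illustration.

```python
import numpy as np

class EpsilonGreedyPriceBandit:
    """Epsilon-greedy contextual bandit with one ridge-regression
    reward model per price bucket (arm)."""

    def __init__(self, n_arms, context_dim, epsilon=0.1, reg=1.0, seed=0):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        # Per-arm statistics: A = reg * I + sum(x x^T), b = sum(reward * x).
        self.A = np.stack([reg * np.eye(context_dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, context_dim))

    def predict(self, context):
        # Solve A theta = b for every arm, then score the context.
        theta = np.linalg.solve(self.A, self.b[..., None])[..., 0]
        return theta @ context

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.b)))  # explore: random arm
        return int(np.argmax(self.predict(context)))    # exploit: best estimate

    def update(self, arm, context, reward):
        # Only the arm that was actually shown gets updated.
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context


# Hypothetical usage: three price buckets and a 3-dimensional context.
price_buckets = [4.99, 9.99, 14.99]
bandit = EpsilonGreedyPriceBandit(n_arms=len(price_buckets), context_dim=3)
context = np.array([1.0, 0.5, 0.2])  # e.g. bias term, time of day, item feature
arm = bandit.choose(context)
quantity_sold = 12.0                 # made-up observed reward
bandit.update(arm, context, quantity_sold)
```

The recommended price would then be `price_buckets[arm]`. Whether something like this maps more naturally onto TF-Agents bandits or onto a TF Recommenders ranking model is exactly the part I am unsure about.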