I found this:
…there is a new version of the algorithm that uses only a Q function and disposes of the V function. It also adds automatic discovery of the weight of the entropy term called the ‘temperature’
Looking at the current code, I do not see this temperature parameter. Or is it just named differently?
Reference:
Hi Janis, those are two different algorithms, CQL-SAC and SAC.
CqlSacAgent implements the CQL algorithm for continuous control domains from “Conservative Q-Learning for Offline Reinforcement Learning” (Kumar et al., 2020).
The SACAgent uses the target_entropy and the Automating Entropy Adjustment described in the paper. That adjustment is the automatic tuning of the temperature you are asking about.
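To make the idea concrete, here is a minimal sketch of one temperature update in plain Python. It is not the agent's actual code: the function names `alpha_loss` and `alpha_step`, the learning rate, and the choice to optimize `log_alpha` rather than the temperature directly are all assumptions made for illustration. The loss is the usual temperature objective, the mean of `-alpha * (log_pi + target_entropy)` over sampled actions.

```python
import math

def alpha_loss(log_alpha, log_probs, target_entropy):
    """Temperature loss: mean of -alpha * (log_pi + target_entropy).

    log_probs are log pi(a|s) for actions sampled from the current policy.
    (Hypothetical helper, for illustration only.)
    """
    alpha = math.exp(log_alpha)
    return sum(-alpha * (lp + target_entropy) for lp in log_probs) / len(log_probs)

def alpha_step(log_alpha, log_probs, target_entropy, lr=0.01):
    """One plain gradient-descent step on log_alpha (assumed parameterization)."""
    alpha = math.exp(log_alpha)
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_gap  # derivative of the loss w.r.t. log_alpha
    return log_alpha - lr * grad

# Policy entropy (about -1.0) is below the target (-0.5): temperature goes up.
raised = alpha_step(0.0, [1.0, 1.0], target_entropy=-0.5)

# Policy entropy (about 3.0) is above the target (-1.0): temperature goes down.
lowered = alpha_step(0.0, [-3.0], target_entropy=-1.0)
```

So the temperature grows when the policy is less random than the target entropy asks for, and shrinks when it is more random. That is why you set a target entropy instead of picking the temperature by hand.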