Soft Actor-Critic Algorithms and Applications
See Section 5, "Automating Entropy Adjustment for Maximum Entropy RL".
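For reference, the temperature objective from Section 5 of the paper is

J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\left[ -\alpha \log \pi_t(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \right]

where \bar{\mathcal{H}} is the target entropy. Stable Baselines3 optimizes with respect to \log \alpha instead of \alpha (which keeps the coefficient positive); the stationary point, where the policy entropy matches \bar{\mathcal{H}}, is the same.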
ent_coef (Union[str, float]) – Entropy regularization coefficient (equivalent to the inverse of the reward scale in the original SAC paper); controls the exploration/exploitation trade-off. Set it to 'auto' to learn it automatically (or 'auto_0.1' to use 0.1 as the initial value).
Sources: SAC - Stable Baselines3 1.2.1a2 documentation; stable_baselines3.sac.sac (source) - Stable Baselines3 1.2.1a2 documentation
# In _setup_model(): learn log(alpha) rather than alpha so the coefficient stays positive
self.log_ent_coef = th.log(th.ones(1, device=self.device) * init_value).requires_grad_(True)
self.ent_coef_optimizer = th.optim.Adam([self.log_ent_coef], lr=self.lr_schedule(1))
# In train(): current coefficient, detached so the actor/critic losses do not backprop into it
ent_coef = th.exp(self.log_ent_coef.detach())
# Temperature loss; log_prob is log pi(a|s) for actions sampled from the current policy,
# and the detach() keeps the gradient flowing only into log_ent_coef
ent_coef_loss = -(self.log_ent_coef * (log_prob + self.target_entropy).detach()).mean()
ent_coef_losses.append(ent_coef_loss.item())
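The update above can be run in isolation. The following is a minimal sketch, assuming a fixed dummy batch of log-probabilities in place of a real policy (the toy values `target_entropy = -2.0`, the batch size, and the learning rate are illustrative, not SB3 defaults):

```python
import torch as th

target_entropy = -2.0  # SB3 heuristic would be -dim(action_space)
init_value = 1.0       # initial entropy coefficient (as with ent_coef="auto_1.0")

# Learn log(alpha) so that alpha = exp(log_alpha) stays positive
log_ent_coef = th.log(th.ones(1) * init_value).requires_grad_(True)
optimizer = th.optim.Adam([log_ent_coef], lr=1e-2)

for _ in range(100):
    # Stand-in for log pi(a|s) on a sampled batch; entropy ~1.0 here,
    # which is above the target of -2.0, so alpha should shrink
    log_prob = th.full((64,), -1.0)
    ent_coef_loss = -(log_ent_coef * (log_prob + target_entropy).detach()).mean()
    optimizer.zero_grad()
    ent_coef_loss.backward()
    optimizer.step()

ent_coef = th.exp(log_ent_coef.detach())
```

Because the batch entropy exceeds the target, the loss pushes `log_ent_coef` down and `ent_coef` ends below its initial value of 1.0; with a batch whose entropy were below the target, it would grow instead.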