
SMM intrinsic motivation signs #29

Open
AOS55 opened this issue Oct 28, 2022 · 1 comment

Comments


AOS55 commented Oct 28, 2022

Hey,

Could anyone clarify the signs in the intrinsic reward for SMM? The code has:

intr_reward = pred_log_ratios + self.latent_ent_coef * h_z + self.latent_cond_ent_coef * h_z_s.detach()

The original paper in equation 3 has:

r_z(s) = log(p*(s)) - log(rho_pi(s|z)) + log(p(z|s)) - log(p(z))

Why do we add log(rho_pi(s|z)) (== pred_log_ratios) and log(p(z)) (== self.latent_ent_coef) rather than subtract them as in equation 3? Sorry if this is obvious 😄
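For reference, here is equation 3 written out term by term as a small Python sketch, so the signs are easy to compare against the code line. The function name, argument names, and probabilities are made up for illustration and do not come from URLB or the original SMM code.

```python
import math


def smm_reward_eq3(log_p_star, log_rho, log_p_z_given_s, log_p_z):
    """r_z(s) with the signs exactly as written in equation 3 above."""
    return log_p_star - log_rho + log_p_z_given_s - log_p_z


# Made-up probabilities, purely to make the signs concrete.
r = smm_reward_eq3(
    log_p_star=math.log(0.5),
    log_rho=math.log(0.25),
    log_p_z_given_s=math.log(0.5),
    log_p_z=math.log(0.25),
)
print(r)
```

With these numbers the two subtracted terms are negative logs, so subtracting them raises the reward; flipping either sign would lower it instead.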


chhas commented Jan 29, 2023

I'm also wondering about the signs of the terms within SMM's intrinsic reward.
Regarding pred_log_ratios: the VAE in the original SMM implementation returns the negated log_prob (= h_s_z), and the intrinsic reward then negates it again.
Hence, URLB's intrinsic reward might be correct w.r.t. the sign of h_s_z, because URLB's VAE does not negate the log_prob in the first place.
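A minimal sketch of that sign bookkeeping, assuming the two conventions are as described above. The helper names and the value of log_prob are hypothetical stand-ins, not code from either repository.

```python
import math


def vae_negated(log_prob):
    # Stand-in for a VAE that returns the negated log_prob (loss-style value).
    return -log_prob


def vae_plain(log_prob):
    # Stand-in for a VAE that returns log_prob without negating it.
    return log_prob


log_prob = math.log(0.25)  # made-up value

# Convention A: the VAE negates, and the reward negates again.
term_a = -vae_negated(log_prob)

# Convention B: the VAE does not negate, and the reward adds the value as-is.
term_b = vae_plain(log_prob)

print(term_a == term_b)
```

This only shows that the two conventions contribute the same term to the reward; it does not by itself settle whether that term matches the sign in equation 3.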
