
Encoder updating in ICM implementations #10

Open
xf-zhao opened this issue Jan 25, 2022 · 1 comment

xf-zhao commented Jan 25, 2022

Hi, thank you all for this remarkable work. I found the code to be very well constructed.

I have one question about the ICM implementation. I noticed that the encoder is only updated via the loss of the forward+inverse prediction model, and is not updated when the critic networks update (since obs is detached when calling self.update_critic), even though there is a parameter update_encoder=True that should control this behaviour (see url_benchmark/agent/icm.py, lines 118-125, also shown below).

        if not self.update_encoder:
            obs = obs.detach()
            next_obs = next_obs.detach()

        # update critic
        metrics.update(
            self.update_critic(obs.detach(), action, reward, discount,
                               next_obs.detach(), step))
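To make the behaviour described above concrete, here is a minimal, self-contained sketch of the control flow. The class and function names (FakeTensor, critic_receives_encoder_grads) are hypothetical stand-ins, not the repo's actual classes; the sketch only models whether gradients would reach the encoder under each setting of the flag.

```python
# Hypothetical stand-in for a tensor, tracking only whether gradients
# would flow back to the encoder through it.
class FakeTensor:
    def __init__(self, grad_to_encoder=True):
        self.grad_to_encoder = grad_to_encoder

    def detach(self):
        # Detaching cuts the path back to the encoder.
        return FakeTensor(grad_to_encoder=False)


def critic_receives_encoder_grads(update_encoder):
    obs = FakeTensor()
    if not update_encoder:
        obs = obs.detach()
    # Mirrors the quoted icm.py snippet: obs is detached *again* when
    # passed to the critic update, so the flag has no effect here.
    obs_for_critic = obs.detach()
    return obs_for_critic.grad_to_encoder


# With the unconditional second detach, encoder gradients from the
# critic loss are always cut, regardless of the flag.
assert critic_receives_encoder_grads(True) is False
assert critic_receives_encoder_grads(False) is False
```

Under this reading, setting update_encoder=True still leaves the encoder trained only by the forward+inverse prediction loss, which is the behaviour the question is about.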

I guess this was a choice made after testing with it both on and off? But if so, it raises another question: the encoder used with ICM is trained during the pretraining procedure, while the randomly initialized one ("random init" in the paper) is not. So when comparing them, we cannot conclude that the representations learned with ICM are better than those from random exploration.

Thank you in advance!

@seolhokim

I have the same question. DDPG updates the encoder when training the critic, but APT-ICM trains the encoder only when training ICM. In my view, that does not seem sufficient.
