
[WIP] Implements Hindsight Experience Replay #361

Open
wants to merge 62 commits into master

Conversation

prabhatnagarajan (Contributor)

No description provided.

@prabhatnagarajan changed the title from "[WIP] adds an empty HER class" to "[WIP] Implements Hindsight Experience Replay" on Nov 26, 2018
ummavi (Member) commented May 29, 2019

Here are a couple of differences from the original paper I noticed:

  • Using the target network to pick actions during evaluation. From the paper:

    Apart from using the target network for computing Q-targets for the critic we also use it in testing episodes as it is more stable than the main network.

  • Actor output regularisation. From the paper:

    In order to prevent tanh saturation and vanishing gradients we add the square of their preactivations to the actor’s cost function.

    This might help performance by encouraging the actor to take smaller actions, leading to finer control (see the sketch below).
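
A minimal sketch of how that penalty could enter a DDPG-style actor loss (Chainer-style, mirroring the F.sum / F.square usage in this PR; the function name and the action_l2_coef argument are illustrative, not from the PR, and plain actions stand in for the paper's pre-tanh activations):

import chainer.functions as F

def actor_loss_with_output_penalty(q, onpolicy_actions, batch_size,
                                    action_l2_coef=1.0):
    # Standard DDPG actor objective: maximise Q, i.e. minimise -Q.
    loss = -F.sum(q) / batch_size
    # Actor output regularisation from the HER paper: penalise the squared
    # magnitude of the actor's outputs (the paper penalises the pre-tanh
    # activations) to discourage tanh saturation and vanishing gradients.
    loss += action_l2_coef * F.sum(F.square(onpolicy_actions)) / batch_size
    return loss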

Please verify/be advised of the following:

  • The paper mentions training for 200 (epochs) × 50 (cycles) × 16 (episodes), which has been approximated here as 200 × 50 × 16 × 50 time-steps (see the arithmetic sketch after this list). What happens when a goal is reached prematurely (before 50 steps)?
  • The scale of the additive Gaussian noise for the explorer is set to 20% (perhaps based on the report and reference implementations), whereas the original paper reports it as 5%.
  • Environment version: The original release of the Fetch environments (v0) was modified (v1) so that the table is fixed to the floor (see: Remove joints from table to avoid that Fetch can slide it to cheat openai/gym#962). Unsure how this affects the results reported in the paper.
    A recent pull request has also slightly modified the joint angles for the slide environment (see: Fixes fetch/slide environment. openai/gym#1511).
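
For reference, the arithmetic behind that approximation (a plain sketch; the variable names are illustrative, not from the PR):

epochs, cycles, episodes_per_cycle, steps_per_episode = 200, 50, 16, 50
total_episodes = epochs * cycles * episodes_per_cycle    # 160,000 episodes
total_steps = total_episodes * steps_per_episode         # 8,000,000 time-steps
# If goals are reached before 50 steps, a fixed 8M-step budget covers more
# than 160,000 episodes, so the two budgets are not exactly equivalent.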

prabhatnagarajan (Contributor, Author)

herFetchReach-v1
herFetchPush-v1
herFetchPickAndPlace-v1
herFetchSlide-v1

ummavi self-assigned this Jul 17, 2019
if self.obs_normalizer:
    batch['state'] = self.obs_normalizer(batch['state'],
                                         update=False)
    batch['next_state'] = self.obs_normalizer(batch['state'],
Shouldn't this be

Suggested change
- batch['next_state'] = self.obs_normalizer(batch['state'],
+ batch['next_state'] = self.obs_normalizer(batch['next_state'],
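
Spelled out, the corrected pair of calls would presumably look like this (a sketch, assuming obs_normalizer takes the same update=False flag as in the diff above):

if self.obs_normalizer:
    batch['state'] = self.obs_normalizer(batch['state'], update=False)
    batch['next_state'] = self.obs_normalizer(batch['next_state'], update=False)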

@@ -251,6 +265,9 @@ def compute_actor_loss(self, batch):

        # Since we want to maximize Q, loss is negation of Q
        loss = - F.sum(q) / batch_size
        if self.l2_action_penalty:
            loss += self.l2_action_penalty \
                * F.square(onpolicy_actions) / batch_size
Should this also include an F.sum term around the F.square?
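
If so, the penalty term would become something like the following (a sketch of the suggested fix, not code from the PR):

if self.l2_action_penalty:
    loss += self.l2_action_penalty \
        * F.sum(F.square(onpolicy_actions)) / batch_size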
