
[WIP] Implements Hindsight Experience Replay #361

Open
wants to merge 62 commits into master

Conversation

prabhatnagarajan (Contributor)

No description provided.

@prabhatnagarajan changed the title from "[WIP] adds an empty HER class" to "[WIP] Implements Hindsight Experience Replay" on Nov 26, 2018
ummavi (Member) commented May 29, 2019

Here are a couple of differences from the original paper I noticed:

  • Using the target network to pick actions during evaluation. From the paper:

    Apart from using the target network for computing Q-targets for the critic we also use it in testing episodes as it is more stable than the main network.

  • Actor output regularisation. From the paper:

    In order to prevent tanh saturation and vanishing gradients we add the square of their preactivations to the actor’s cost function.

    This might help performance by encouraging the actor to take smaller actions, leading to finer control (see the sketch below).
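
A minimal sketch of how that penalty could enter a DDPG-style actor loss (Chainer-style, mirroring the F.sum / F.square usage in this PR; the function name and the action_l2_coef argument are illustrative, not from the PR, and plain actions stand in for the paper's pre-tanh activations):

import chainer.functions as F

def actor_loss_with_output_penalty(q, onpolicy_actions, batch_size,
                                    action_l2_coef=1.0):
    # Standard DDPG actor objective: maximise Q, i.e. minimise -Q.
    loss = -F.sum(q) / batch_size
    # Actor output regularisation from the HER paper: penalise the squared
    # magnitude of the actor's outputs (the paper penalises the pre-tanh
    # activations) to discourage tanh saturation and vanishing gradients.
    loss += action_l2_coef * F.sum(F.square(onpolicy_actions)) / batch_size
    return loss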

Please verify/be advised of the following:

  • The paper mentions training for 200 (epochs) × 50 (cycles) × 16 (episodes), which has been approximated here as 200 × 50 × 16 × 50 time-steps (see the arithmetic sketch after this list). What happens when a goal is reached prematurely (before 50 steps)?
  • The scale of the additive Gaussian noise for the explorer is set to 20% (perhaps based on the report and reference implementations), whereas the original paper reports it as 5%.
  • Environment version: The original release of the Fetch environments (v0) was modified (v1) so that the table is fixed to the floor (see: Remove joints from table to avoid that Fetch can slide it to cheat openai/gym#962). Unsure how this affects the results reported in the paper.
    A recent pull request has also slightly modified the joint angles for the slide environment (see: Fixes fetch/slide environment. openai/gym#1511).
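
For reference, the arithmetic behind that approximation (a plain sketch; the variable names are illustrative, not from the PR):

epochs, cycles, episodes_per_cycle, steps_per_episode = 200, 50, 16, 50
total_episodes = epochs * cycles * episodes_per_cycle    # 160,000 episodes
total_steps = total_episodes * steps_per_episode         # 8,000,000 time-steps
# If goals are reached before 50 steps, a fixed 8M-step budget covers more
# than 160,000 episodes, so the two budgets are not exactly equivalent.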

prabhatnagarajan (Contributor, Author)

herFetchReach-v1
herFetchPush-v1
herFetchPickAndPlace-v1
herFetchSlide-v1

ummavi self-assigned this Jul 17, 2019
if self.obs_normalizer:
    batch['state'] = self.obs_normalizer(batch['state'],
                                         update=False)
    batch['next_state'] = self.obs_normalizer(batch['state'],
Shouldn't this be

Suggested change
- batch['next_state'] = self.obs_normalizer(batch['state'],
+ batch['next_state'] = self.obs_normalizer(batch['next_state'],
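
Spelled out, the corrected pair of calls would presumably look like this (a sketch, assuming obs_normalizer takes the same update=False flag as in the diff above):

if self.obs_normalizer:
    batch['state'] = self.obs_normalizer(batch['state'], update=False)
    batch['next_state'] = self.obs_normalizer(batch['next_state'], update=False)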

@@ -251,6 +265,9 @@ def compute_actor_loss(self, batch):

        # Since we want to maximize Q, loss is negation of Q
        loss = - F.sum(q) / batch_size
        if self.l2_action_penalty:
            loss += self.l2_action_penalty \
                * F.square(onpolicy_actions) / batch_size
Should this also include an F.sum term around the F.square?
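
If so, the penalty term would become something like the following (a sketch of the suggested fix, not code from the PR):

if self.l2_action_penalty:
    loss += self.l2_action_penalty \
        * F.sum(F.square(onpolicy_actions)) / batch_size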
