Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Wrappers]: TimeAwareObservation #1490

Merged
merged 10 commits into from
Nov 5, 2020
Merged

Conversation

zuoxingdong
Copy link
Contributor

No description provided.

super(TimeAwareObservation, self).__init__(env)
low = np.append(self.observation_space.low, 0.0)
high = np.append(self.observation_space.high, np.inf)
self.observation_space = Box(low, high, dtype=np.float32)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add some asserts here to make sure it is not accidentally being used where it cannot be used with some arcane error message downstream (i.e. check that observation space is Box with dtype=np.float32). Another subtle possibility for a problem is limits of observation space / normalization. Imagine low=-1.0, high=1.0 - then time steps after the second step will be out of bounds. Even more subtly, if the timesteps observation is on the different scale with the rest of the observations, that can mess up the neural network training (so one will have to resort to VecNormalize-like wrapper from baselines approaches)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two assertions are added for checking Box with np.float32. Indeed, for environments that beyond very short horizons, it can be problematic for training neural networks. Perhaps we could add a documentation in the docstring about that ? Or displays a warning msg (but this warning will be redundant if the user already wrap with VecNormalize-like wrappers).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we provide a horizon parameter, and then normalize the time observation by that value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhokhov That's a great idea ! Should we provide this functionality as an optional flag, so that the user could decide to give a horizon parameter or by default letting it be incremented ?

@TheScalper
Copy link

This PR is & year old. Is it still relevant or can it be close?

@jkterry1
Copy link
Collaborator

jkterry1 commented Nov 2, 2020

@pzhokhov do the commits address your concerns?

@pzhokhov
Copy link
Collaborator

pzhokhov commented Nov 5, 2020

Close enough, let's merge it.

@pzhokhov pzhokhov merged commit 28c42b6 into openai:master Nov 5, 2020
zlig pushed a commit to zlig/gym that referenced this pull request Sep 6, 2021
* Create time_aware_observation.py

* Update __init__.py

* Create test_time_aware_observation.py

* Update time_aware_observation.py

* Update time_aware_observation.py

* Update time_aware_observation.py

Co-authored-by: pzhokhov <peterz@openai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants