-
Notifications
You must be signed in to change notification settings - Fork 8.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Wrappers]: TimeAwareObservation #1490
Conversation
super(TimeAwareObservation, self).__init__(env) | ||
low = np.append(self.observation_space.low, 0.0) | ||
high = np.append(self.observation_space.high, np.inf) | ||
self.observation_space = Box(low, high, dtype=np.float32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add some assert
s here to make sure it is not accidentally being used where it cannot be used with some arcane error message downstream (i.e. check that observation space is Box with dtype=np.float32). Another subtle possibility for a problem is limits of observation space / normalization. Imagine low=-1.0, high=1.0 - then time steps after the second step will be out of bounds. Even more subtly, if the timesteps observation is on the different scale with the rest of the observations, that can mess up the neural network training (so one will have to resort to VecNormalize-like wrapper from baselines approaches)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two assertions are added for checking Box with np.float32. Indeed, for environments that beyond very short horizons, it can be problematic for training neural networks. Perhaps we could add a documentation in the docstring about that ? Or displays a warning msg (but this warning will be redundant if the user already wrap with VecNormalize-like wrappers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we provide a horizon parameter, and then normalize the time observation by that value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pzhokhov That's a great idea ! Should we provide this functionality as an optional flag, so that the user could decide to give a horizon parameter or by default letting it be incremented ?
This PR is & year old. Is it still relevant or can it be close? |
@pzhokhov do the commits address your concerns? |
Close enough, let's merge it. |
* Create time_aware_observation.py * Update __init__.py * Create test_time_aware_observation.py * Update time_aware_observation.py * Update time_aware_observation.py * Update time_aware_observation.py Co-authored-by: pzhokhov <peterz@openai.com>
No description provided.