Brax info style #194
You're right that option 3 handles the "no data" scenario much more clearly! For high-performance code that uses JAX or PyTorch, one critical requirement is that array shapes are static: they must not change size as a result of the calculations themselves. With these tools, the only way to achieve option 3 would be to copy the data off the GPU/TPU and perform the manipulation on the CPU. Unfortunately, the latency of that device copy at every environment step would kill training throughput. We're definitely happy to consider ways to make the API clearer, but one hard performance requirement is that if an env is vectorized with 2048 envs, it always returns exactly 2048 dones, 2048 rewards, and so on.
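The static-shape constraint can be illustrated with a minimal NumPy sketch (array names and sizes are made up; 8 envs stand in for 2048). Filtering down to only the environments that have data, which is what a paired style needs, yields an array whose length depends on the data. Masking with `np.where` keeps one slot per environment. In JAX, the boolean-indexing version would fail to trace under `jax.jit` because its output shape is data-dependent, while the masked version traces fine.

```python
import numpy as np

num_envs = 8  # stands in for e.g. 2048 vectorized environments
rng = np.random.default_rng(0)
reward = rng.normal(size=num_envs).astype(np.float32)
done = np.zeros(num_envs, dtype=bool)
done[[1, 5]] = True  # pretend two episodes just ended

# Boolean indexing: the result length depends on how many envs are done,
# so the shape changes from step to step.
finished_rewards = reward[done]
print(finished_rewards.shape)  # (2,)

# Masking: one slot per environment, envs that are not done are zeroed,
# so the shape is always (num_envs,).
masked_rewards = np.where(done, reward, 0.0)
print(masked_rewards.shape)  # (8,)
```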
@erikfrey Thanks for the response. You are right, option 2 is our only choice for hardware-accelerated info. An example is the following:
We could enable this additional-key idea with a vector parameter that enables some or all of the keys.
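One way the additional-key idea could look, as a sketch under assumptions: each info entry `key` is stored as a fixed-size array of values, plus a boolean array under a `"_" + key` name marking which environments actually reported it. The helper `add_info` and the key `episode_return` are hypothetical names for illustration, not a confirmed Gym or Brax API.

```python
import numpy as np

def add_info(infos, env_info, env_index, num_envs):
    """Merge one sub-environment's info dict into a batched dict (hypothetical).

    Each key gets a fixed-size value array plus a boolean mask stored under
    "_" + key that records which environments actually set that key.
    """
    for key, value in env_info.items():
        if key not in infos:
            # Preallocate one slot per environment. The dtype is taken from
            # the first value seen; unset slots keep the default 0, and the
            # mask is what distinguishes them from real data.
            infos[key] = np.zeros(num_envs, dtype=np.asarray(value).dtype)
            infos["_" + key] = np.zeros(num_envs, dtype=bool)
        infos[key][env_index] = value
        infos["_" + key][env_index] = True
    return infos

num_envs = 4
infos = {}
# Env 0 reports a genuine 0.0, env 2 reports 12.5, envs 1 and 3 report nothing.
add_info(infos, {"episode_return": 0.0}, env_index=0, num_envs=num_envs)
add_info(infos, {"episode_return": 12.5}, env_index=2, num_envs=num_envs)
```

Both arrays always have `num_envs` entries, so the static-shape requirement holds while the mask resolves the "no data" ambiguity.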
Defer to you all in Gym: whatever is convenient and makes sense, let's do it. My instinct would be to use masking operations, e.g.:

```python
import numpy as np

obs, reward, done, info = env.step(...)
# Zero out the reward of every env whose episode just ended, then sum the rest.
episode_only_reward = np.sum(np.where(done, 0, reward))
```

That's what comes naturally when writing JAX code, but I don't know about Gym overall. Do what's right for Gym!
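Following the masking suggestion above, a statistic over only the environments that just finished an episode can be computed without changing any array shapes. A minimal NumPy sketch; `mean_over_finished` and `episode_return` are hypothetical names, not Gym or Brax fields.

```python
import numpy as np

def mean_over_finished(values, done):
    """Mean of `values` over environments where `done` is True.

    Uses masking instead of boolean indexing, so every intermediate array
    keeps shape (num_envs,). Returns 0.0 if no environment finished.
    """
    masked_sum = np.sum(np.where(done, values, 0.0))
    count = np.sum(done)
    return masked_sum / np.maximum(count, 1)

episode_return = np.array([10.0, 3.0, 7.0, 1.0])
done = np.array([True, False, True, False])
print(mean_over_finished(episode_return, done))  # 8.5
```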
Closing, as OpenAI Gym now implements a similar Brax-style info.
To use Brax with OpenAI Gym, we are open to modifying the info style used for our vector environments.
There are three info styles that I'm aware of, each with its own advantages and disadvantages: classic (option 1), Brax (option 2) and paired (option 3).
1. Classic (this is currently implemented in Gym)
Example infos
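For illustration, the same step information might look like this in each of the three styles: a made-up sketch (not taken from Gym or Brax) for 3 environments where only environment 1 finished an episode with a return of 5.0. The key name `episode_return` is hypothetical.

```python
import numpy as np

# Classic: one dict per sub-environment; envs without data have no key.
classic_info = ({}, {"episode_return": 5.0}, {})

# Brax style: one fixed-size array per key; envs without data hold a
# "no data" placeholder (0.0 here).
brax_info = {"episode_return": np.array([0.0, 5.0, 0.0])}

# Paired: per key, a dict mapping env index -> value, only for envs
# that actually have data.
paired_info = {"episode_return": {1: 5.0}}

# Looking up env 1's value in each style.
print(classic_info[1].get("episode_return"))  # 5.0
print(brax_info["episode_return"][1])         # 5.0
print(paired_info["episode_return"].get(1))   # 5.0
```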
One of the disadvantages of the Brax info style is that it needs placeholder "no data" values (e.g. 0 for ints, None for objects, False for bools).
The problem is that it is quite reasonable for an environment to return a genuine 0, e.g. from a wrapper that computes the total reward over an episode or the number of times an event occurs.
One way to avoid this would be to switch to an object array whenever one of these "no data" values actually appears in the info, but changing the dtype at runtime seems hacky. Another option would be to warn users when a "no data" value has been encountered.
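The ambiguity and the object-dtype workaround can be seen in a small NumPy sketch (the counts are made up). With a 0 placeholder, a genuine zero count is indistinguishable from missing data. Switching to `dtype=object` with `None` keeps them apart, but the array's dtype then depends on the data.

```python
import numpy as np

# Event counts reported by 3 environments; env 1 reported nothing,
# while env 0 genuinely counted zero events.
reported = {0: 0, 2: 4}

# Brax style with a 0 placeholder: env 0 and env 1 look identical.
counts = np.zeros(3, dtype=np.int64)
for env, value in reported.items():
    counts[env] = value
print(counts[0] == counts[1])  # True, although only env 0 has data

# Object-dtype workaround: None marks "no data", but the array is no
# longer a plain integer array.
counts_obj = np.full(3, None, dtype=object)
for env, value in reported.items():
    counts_obj[env] = value
print(counts_obj[0] is None, counts_obj[1] is None)  # False True
print(counts_obj.dtype)  # object
```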
The advantage of the Brax info style is speed: the info arrays are fast to create, and checking whether an environment has info only requires indexing by the environment number.
One of the advantages of the paired info style is that the "no data" problem outlined above does not arise: the actual data is paired with its environment, and environments without data are simply not included in the dictionary. A second advantage is that finding all environments with particular data is just `info[key].keys()`. The disadvantage of this style is that checking whether an environment has data requires first checking that the environment key exists and then indexing with it. I haven't tested the performance impact of using dictionaries instead of NumPy arrays, so this may be important to consider.

Is there any additional reasoning for Brax to select option 2 over option 3?
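The paired-style access patterns described above, and one possible conversion between a paired entry and a fixed-size value array plus boolean mask, can be sketched as follows. This is an illustration only: `paired_to_masked` is a hypothetical helper, and the performance question raised above is not measured here.

```python
import numpy as np

num_envs = 4
paired_info = {"episode_return": {0: 0.0, 3: 9.0}}

# All environments that have data for a key.
envs_with_data = sorted(paired_info["episode_return"].keys())  # [0, 3]

# Checking one environment: a membership test, then a lookup.
env = 3
if env in paired_info["episode_return"]:
    value = paired_info["episode_return"][env]

def paired_to_masked(entries, num_envs, fill=0.0):
    """Hypothetical helper: convert {env_index: value} into a fixed-size
    value array plus a boolean mask of which envs have data."""
    values = np.full(num_envs, fill, dtype=np.float64)
    mask = np.zeros(num_envs, dtype=bool)
    for i, v in entries.items():
        values[i] = v
        mask[i] = True
    return values, mask

values, mask = paired_to_masked(paired_info["episode_return"], num_envs)

# And back again: only the masked slots become dictionary entries.
recovered = {int(i): float(values[i]) for i in np.flatnonzero(mask)}
```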