Brax info style #194

Closed
pseudo-rnd-thoughts opened this issue May 4, 2022 · 4 comments

Comments

@pseudo-rnd-thoughts
Contributor

pseudo-rnd-thoughts commented May 4, 2022

To use brax with OpenAI Gym, we are open to modifying the info style used for our vector environments.
There are three options for info styles (that I'm aware of), each with its own advantages and disadvantages.

1. Classic (this is currently implemented in gym)

[
     env_0_info,
     env_1_info,
     ...
]

2. Brax (this is the style in brax)

{
     key: np.array([env_0_info[key], env_1_info[key], ...]),
     ....
}

3. Paired (this is a novel style that has the advantages of both styles, imo)

{
     key: {
          0: env_0_info[key],
          1: env_1_info[key],
          ...
     }, 
     ...
}

Example infos

env_0_info = {'reward': 1}
env_1_info = {'reward': 3}
env_2_info = {'truncation': True}

classic_info = [{'reward': 1}, {'reward': 3}, {'truncation': True}]
brax_info = {
    'reward': np.array([1, 3, 0]),
    'truncation': np.array([False, False, True])
}
paired_info = {
    'reward': {0: 1, 1: 3},
    'truncation': {2: True}
}
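
For illustration, here is a minimal sketch of how the classic list above could be converted into the other two styles. The helper names `to_brax_style` and `to_paired_style` and the `fill` argument are hypothetical; they are not part of gym or brax.

```python
import numpy as np

def to_brax_style(classic_info, fill):
    """Stack per-env dicts into one dict of arrays.

    `fill` (hypothetical) maps each key to its "no data" placeholder.
    """
    keys = {key for env_info in classic_info for key in env_info}
    return {
        key: np.array([env_info.get(key, fill[key]) for env_info in classic_info])
        for key in keys
    }

def to_paired_style(classic_info):
    """Map each key to {env_index: value}, skipping envs without that key."""
    paired = {}
    for env_index, env_info in enumerate(classic_info):
        for key, value in env_info.items():
            paired.setdefault(key, {})[env_index] = value
    return paired

classic_info = [{'reward': 1}, {'reward': 3}, {'truncation': True}]
brax_info = to_brax_style(classic_info, fill={'reward': 0, 'truncation': False})
paired_info = to_paired_style(classic_info)
```

Note that the Brax-style conversion needs a placeholder for every key, while the paired conversion does not.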

One disadvantage of the Brax info style is that it needs placeholder "no data" values (i.e. 0 for ints, None for objects, False for bools).
The problem is that an environment could quite reasonably return 0 as real data, e.g. a wrapper that computes the total reward over an episode or counts the number of times an event occurs.
One way to avoid this could be to use an object dtype if one of these "no data" values actually appears in the info, but changing the dtype at runtime seems hacky. Another option could be to raise a warning telling users that a "no data" value has been encountered.
The advantage of the Brax info style is speed: creating the info, and checking whether info exists for an environment, only requires indexing by the environment number.

One advantage of the paired info style is that the "no data" problem outlined above is not an issue: the actual data is paired with its environment, and any environment without data is simply not included in the dictionary. A second advantage is that finding all environments with particular data is just info[key].keys(). The disadvantage of this style is that checking whether an environment has data means first checking that the environment key exists and then indexing it. I haven't tested the performance impact of using dictionaries instead of numpy arrays, so this may be important to consider.
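
A rough sketch of the lookup trade-off, using the example infos from earlier in this comment. The `paired_lookup` helper is hypothetical and only here to show the two-step check.

```python
import numpy as np

brax_info = {'reward': np.array([1, 3, 0])}
paired_info = {'reward': {0: 1, 1: 3}}

def paired_lookup(info, key, env_index, default=None):
    """Return info[key][env_index], or `default` if either level is missing."""
    return info.get(key, {}).get(env_index, default)

# Brax style: a single index, but the 0 for env 2 could be real or a placeholder.
brax_value = brax_info['reward'][2]

# Paired style: env 2 has no entry, so the absence of data is explicit.
paired_value = paired_lookup(paired_info, 'reward', 2)

# All environments that reported a reward.
envs_with_reward = sorted(paired_info['reward'].keys())
```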

Is there any additional reasoning for brax to select option 2 over option 3?

@erikfrey
Collaborator

erikfrey commented May 5, 2022

You're right that 3 handles the "no data" scenario much more clearly!

For high-performance code that uses Jax or PyTorch, one critical requirement is that array shapes are static and do not change in size as a result of the calculations themselves. Using these tools, the only way you could achieve option 3 would be to copy the data off the GPU/TPU and perform this manipulation on CPU. Unfortunately the latency of running this device copy at every environment step would kill your training throughput.

Definitely happy to consider ways to make the API clearer, but one hard performance requirement is that if an env is vectorized with 2048 envs, it always returns exactly 2048 dones, 2048 rewards, and so on.
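
To make the static-shape point concrete, here is a small NumPy sketch (NumPy is used as a stand-in; the values are made up for illustration). Boolean indexing produces an output whose length depends on the data, which is the kind of operation that jit-compiled code generally cannot express, while masking keeps one entry per environment.

```python
import numpy as np

num_envs = 4
reward = np.array([1.0, 0.5, 2.0, 0.0])
done = np.array([False, True, False, True])

# Data-dependent shape: the length equals the number of True entries in `done`.
finished_rewards = reward[done]

# Static shape: always one entry per env, with 0.0 where the env is not done.
masked_rewards = np.where(done, reward, 0.0)
```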

@pseudo-rnd-thoughts
Contributor Author

pseudo-rnd-thoughts commented May 6, 2022

@erikfrey Thanks for the response. You are right, option 2 is our only choice for hardware-accelerated info.
My only thought is that we could add an optional parameter that adds an additional _key entry for keys that could legitimately contain a "no data" value; this entry would be a boolean array indicating whether the info "actually" exists for each environment.

An example is the following

env_1_info = {'reward': 0, 'action': 1}
env_2_info = {'reward': 1, 'action': 4}
env_3_info = {'action': 2}

info = {
     'reward': np.array([0, 1, 0]),
     '_reward': np.array([True, True, False]),
     'action': np.array([1, 4, 2])
}

We could enable this additional-key idea with a vector parameter that turns it on for some or all of the keys.
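
A minimal sketch of what that could look like, assuming a hypothetical `stack_infos` helper with `fill` and `mask_keys` arguments (none of these names exist in gym or brax):

```python
import numpy as np

def stack_infos(env_infos, fill, mask_keys=()):
    """Stack per-env dicts into arrays, adding a boolean '_key' mask on request.

    `fill` maps each key to its "no data" placeholder; `mask_keys` lists the
    keys that should also get a '_key' array marking which envs reported data.
    """
    keys = {key for env_info in env_infos for key in env_info}
    info = {}
    for key in keys:
        info[key] = np.array([env_info.get(key, fill[key]) for env_info in env_infos])
        if key in mask_keys:
            info['_' + key] = np.array([key in env_info for env_info in env_infos])
    return info

env_infos = [{'reward': 0, 'action': 1}, {'reward': 1, 'action': 4}, {'action': 2}]
info = stack_infos(env_infos, fill={'reward': 0, 'action': 0}, mask_keys=('reward',))
```

Every array still has exactly one entry per environment, so the shapes stay fixed.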

@erikfrey
Collaborator

erikfrey commented May 6, 2022

Defer to you all in Gym - whatever is convenient and makes sense, let's do it.

My instinct would be to use masking operations, e.g.:

obs, reward, done, info = env.step(...)
episode_only_reward = np.sum(np.where(done, 0, reward))

That's what comes naturally when writing Jax code, but I don't know about Gym overall. Do what's right for Gym!
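
As a sketch of how masking could combine with the `_reward` array proposed earlier in the thread (the info values below are the example ones, not output from a real env):

```python
import numpy as np

info = {
    'reward': np.array([0, 1, 0]),
    '_reward': np.array([True, True, False]),
}

valid = info['_reward']

# Sum only the rewards that were actually reported.
total_reward = np.sum(np.where(valid, info['reward'], 0))

# Average over reporting envs, guarding against the case where none reported.
num_reporting = np.sum(valid)
mean_reward = total_reward / np.maximum(num_reporting, 1)
```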

@pseudo-rnd-thoughts
Contributor Author

Closing, as OpenAI Gym now implements a similar Brax-style info.


2 participants