Brax info style #194

pseudo-rnd-thoughts · 2022-05-04T14:34:19Z

To use brax with openai gym, we are open to modifying the info style that is used for our vector environments.
There are three options for info styles (that I'm aware of), each with its own advantages and disadvantages:

1. Classic (this is currently implemented in gym)

[
     env_0_info,
     env_1_info,
     ...
]

2. Brax (this is the style used in brax)

{
     key: np.array([env_0_info[key], env_1_info[key], ...]),
     ...
}

3. Paired (this is a novel style that, imo, has the advantages of both styles)

{
     key: {
          0: env_0_info[key],
          1: env_1_info[key],
          ...
     }, 
     ...
}

Example infos

env_0_info = {'reward': 1}
env_1_info = {'reward': 3}
env_2_info = {'truncation': True}

classic_info = [{'reward': 1}, {'reward': 3}, {'truncation': True}]
brax_info = {
    'reward': np.array([1, 3, 0]),
    'truncation': np.array([False, False, True])
}
paired_info = {
    'reward': {0: 1, 1: 3},
    'truncation': {2: True}
}
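The three layouts can be derived from the classic list with two small helpers. This is a minimal sketch, assuming each per-env info is a flat dict of Python scalars; the helper names `to_brax` and `to_paired` and the per-dtype "no data" fillers (False, 0, None) are illustrative choices, not part of gym or brax.

```python
import numpy as np


def to_brax(classic_info):
    """Classic list of per-env dicts -> one array per key (one entry per env)."""
    num_envs = len(classic_info)
    keys = {key for info in classic_info for key in info}
    brax_info = {}
    for key in keys:
        # Pick the dtype and "no data" filler from the first env that has the key.
        sample = next(info[key] for info in classic_info if key in info)
        if isinstance(sample, bool):  # check bool before int (bool subclasses int)
            array = np.zeros(num_envs, dtype=bool)  # "no data" = False
        elif isinstance(sample, (int, float)):
            array = np.zeros(num_envs, dtype=type(sample))  # "no data" = 0
        else:
            array = np.full(num_envs, None, dtype=object)  # "no data" = None
        for env_id, info in enumerate(classic_info):
            if key in info:
                array[env_id] = info[key]
        brax_info[key] = array
    return brax_info


def to_paired(classic_info):
    """Classic list of per-env dicts -> {key: {env_id: value}} for envs with data."""
    paired_info = {}
    for env_id, info in enumerate(classic_info):
        for key, value in info.items():
            paired_info.setdefault(key, {})[env_id] = value
    return paired_info


classic_info = [{'reward': 1}, {'reward': 3}, {'truncation': True}]
brax_info = to_brax(classic_info)
paired_info = to_paired(classic_info)
```

With the example infos, `brax_info` holds `[1, 3, 0]` and `[False, False, True]`, and `paired_info` equals the dict shown above.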

One of the disadvantages of the Brax info style is that it needs a "no data" value for environments that did not report a key (e.g. 0 for ints, None for objects, False for bools).
The problem is that an environment could quite reasonably return one of these values as real data, e.g. 0 from a wrapper that computes the total reward over an episode or counts the number of times an event occurs.
One way to avoid this would be to switch to an object array whenever one of these "no data" values actually appears in the info, but changing the dtype at runtime seems hacky. Another option would be to warn users when a "no data" value is encountered.
The advantage of the Brax info style is speed: the info is fast to create, and checking the info for an environment is just indexing by the environment number.
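A tiny sketch of the ambiguity, assuming a hypothetical `reward_array` helper that fills missing rewards with 0: a real reward of 0 and a missing reward produce the same Brax-style array, while per-env lookup stays a plain index.

```python
import numpy as np


def reward_array(classic_info):
    # Brax-style: envs without a 'reward' key get the "no data" value 0.
    return np.array([info.get('reward', 0) for info in classic_info])


# Env 1 genuinely reports a total episode reward of 0 ...
real_zero = reward_array([{'reward': 5}, {'reward': 0}])
# ... while here env 1 reports no reward at all.
missing = reward_array([{'reward': 5}, {}])

# The arrays are identical, so the distinction is lost.
print(np.array_equal(real_zero, missing))  # True

# Reading the info of one env is just indexing by its number.
print(missing[1])  # 0
```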

One of the advantages of the paired info style is that the "no data" problem outlined above does not arise: the actual data is paired with its environment, and any environment without data is simply not included in the dictionary. A second advantage is that finding all environments with a particular piece of data is just info[key].keys(). The disadvantage of this style is that checking whether an environment has data requires first checking that the environment key exists and then indexing it. I haven't tested the performance impact of using dictionaries instead of numpy arrays, so this may be important to consider.
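Both sides of the paired style in a few lines. This is an illustrative sketch; `get_env_info` is a hypothetical helper, not an existing gym function.

```python
paired_info = {
    'reward': {0: 1, 1: 3},
    'truncation': {2: True},
}

# Advantage: the envs that reported a key are simply the dict's keys.
envs_with_reward = sorted(paired_info['reward'].keys())
print(envs_with_reward)  # [0, 1]


# Disadvantage: reading one env needs membership checks before indexing,
# both for the info key and for the env id.
def get_env_info(info, key, env_id, default=None):
    if key in info and env_id in info[key]:
        return info[key][env_id]
    return default


print(get_env_info(paired_info, 'reward', 1))             # 3
print(get_env_info(paired_info, 'reward', 2))             # None (env 2 sent no reward)
print(get_env_info(paired_info, 'truncation', 0, False))  # False
```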

Is there any additional reasoning for brax to select option 2 over option 3?


erikfrey · 2022-05-05T21:43:36Z

You're right that 3 handles the "no data" scenario much more clearly!

For high-performance code that uses Jax or PyTorch, one critical requirement is that array shapes are static and do not change in size as a result of the calculations themselves. Using these tools, the only way you could achieve option 3 would be to copy the data off the GPU/TPU and perform this manipulation on CPU. Unfortunately the latency of running this device copy at every environment step would kill your training throughput.

Definitely happy to consider ways to make the API clearer, but one hard performance requirement is that if an env is vectorized with 2048 envs, it always returns exactly 2048 dones, 2048 rewards, and so on.
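The shape constraint can be illustrated with NumPy standing in for Jax/PyTorch (the env count and random values below are made up). A masked operation keeps exactly one entry per env, whereas keeping only the envs that have data, as the paired style does, yields an output whose length depends on the values.

```python
import numpy as np

num_envs = 8
rng = np.random.default_rng(0)
reward = rng.normal(size=num_envs)
done = rng.random(num_envs) < 0.3

# Static shape: one entry per env, regardless of what `done` contains.
masked = np.where(done, reward, 0.0)

# Dynamic shape: the length equals the number of done envs. In Jax, boolean
# indexing like this is rejected inside jax.jit because the output shape
# depends on runtime values.
filtered = reward[done]

print(masked.shape)    # (8,)
print(filtered.shape)  # (k,) where k = number of done envs
```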

pseudo-rnd-thoughts · 2022-05-06T19:15:19Z

@erikfrey Thanks for the response, you are right, option 2 is our only choice for hardware accelerated info.
My only thought is that we could add an optional parameter that adds an extra _key entry for any key whose real data could coincide with the "no data" value: a boolean array recording whether the info "actually" exists for each environment.

For example:

env_1_info = {'reward': 0, 'action': 1}
env_2_info = {'reward': 1, 'action': 4}
env_3_info = {'action': 2}

info = {
     'reward': np.array([0, 1, 0]),
     '_reward': np.array([True, True, False]),
     'action': np.array([1, 4, 2])
}

We could control this extra-key idea with a vector environment parameter that enables it for some or all of the keys.
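A minimal sketch of the proposal, assuming a hypothetical `to_masked_brax` helper whose `mask_keys` argument plays the role of the vector parameter (None meaning every key) and a single `no_data` filler of 0. It reproduces the example above.

```python
import numpy as np


def to_masked_brax(classic_info, mask_keys=None, no_data=0):
    """Hypothetical helper: classic infos -> Brax-style arrays plus `_key` masks.

    mask_keys: keys that also get a boolean `_key` array (None means every key).
    no_data: filler used where an env did not report the key.
    """
    keys = sorted({key for info in classic_info for key in info})
    out = {}
    for key in keys:
        out[key] = np.array([info.get(key, no_data) for info in classic_info])
        if mask_keys is None or key in mask_keys:
            # True where the env actually reported this key.
            out['_' + key] = np.array([key in info for info in classic_info])
    return out


env_infos = [
    {'reward': 0, 'action': 1},
    {'reward': 1, 'action': 4},
    {'action': 2},
]
info = to_masked_brax(env_infos, mask_keys={'reward'})
# info == {'action': array([1, 4, 2]),
#          'reward': array([0, 1, 0]),
#          '_reward': array([ True,  True, False])}
```

Passing `mask_keys=None` would also add `_action`, which here would be all True.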

erikfrey · 2022-05-06T19:49:57Z

Defer to you all in Gym - whatever is convenient and makes sense, let's do it.

My instinct would be to use masking operations, e.g.:

obs, reward, done, info = env.step(...)
episode_only_reward = np.sum(np.where(done, 0, reward))

That's what comes naturally when writing Jax code, but I don't know about Gym overall. Do what's right for Gym!
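A runnable sketch combining the np.where pattern from the snippet above with the proposed `_reward` mask. The `done` values are made up, and averaging over reporting envs is just one illustrative use of the mask; every array keeps one entry per env throughout.

```python
import numpy as np

# Brax-style info with a `_reward` mask, as in the proposal above.
info = {
    'reward': np.array([0, 1, 0]),
    '_reward': np.array([True, True, False]),
}
done = np.array([True, False, False])

# np.where pattern: zero out entries instead of dropping them.
not_done_reward = np.sum(np.where(done, 0, info['reward']))

# Same idea with the mask: env 0's real 0 counts, env 2's "no data" 0 does not.
has_reward = info['_reward']
mean_reported = np.sum(np.where(has_reward, info['reward'], 0)) / max(has_reward.sum(), 1)

print(not_done_reward)  # 1
print(mean_reported)    # 0.5
```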

pseudo-rnd-thoughts · 2022-06-09T11:10:11Z

Closing, as openai gym now implements a similar brax-style info.

gianlucadecola mentioned this issue May 7, 2022

New info API for vectorized environments #2657 openai/gym#2773

Merged

pseudo-rnd-thoughts closed this as completed Jun 9, 2022
