Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding multidiscrete feature #328

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

Jogima-cyber
Copy link

Hello there ! I'd like to add the capability to handle multidiscrete action space. Here is how I would do that :

  • Add multidiscrete to env config yaml
  • if multidiscrete is set to True then user must add nvec attribute to his model. nvec format is [nb. of actions for first independent action set, nb. of actions for second independent action set, ect.]
  • If multidiscrete set to True, then output logits must be treated differently, I'll take ppo_multidiscrete as inspiration, softmax must be applied for each independent set of actions

@YuriCat
Copy link
Contributor

YuriCat commented Nov 7, 2022

Hi, @Jogima-cyber ! Long time no see, and thanks for your great suggestion!
I also tried the multiple action situation this year, so I'll check if it can be put in the main code.

@Jogima-cyber
Copy link
Author

I'm currently working on it ! First thing I'm doing right now is integrating a home made simple multidiscrete environment into HandyRL in order to test my upcoming multidicrete integration. I'll push it as soon as I'm finished (it's gonna include a non-multidiscrete option).
I'm not sur I'll need to add a multidiscrete variable into config, maybe just nvec being defined in model should be enough for the library to know it should handle a multidiscrete case.
I need this feature right now, so I figured I could add it globally instead on just my fork, if you want it though.

@YuriCat
Copy link
Contributor

YuriCat commented Nov 8, 2022

@Jogima-cyber
I pushed feature/multi_unit branch into my fork.
master...YuriCat:HandyRL:feature/multi_unit

In this branch, each agent can output env.num_units() actions in each turn.
I've confirmed that it works with TicTacToe!

@Jogima-cyber
Copy link
Author

Jogima-cyber commented Nov 8, 2022

Thank you @YuriCat, that's almost exactly what I needed. I'm working on your fork, I just need to add a moment "unit_mask", because some of my independent action sets may be empty depending on the state (meaning no action can be taken among the actions of the action set).

@YuriCat
Copy link
Contributor

YuriCat commented Nov 8, 2022

@Jogima-cyber

unit_mask

Indeed! We need unit_mask if there are absent units.

@Jogima-cyber
Copy link
Author

Btw. I've output the shapes of some vectors and in line feature/multi_unit/train.py:l204 I've got shapes that I don't think are normal :

  • for total_advantages I got torch.Size([1, 32, 1, 2304, 2304])
  • meaning that for (-log_selected_policies * total_advantages) I got torch.Size([1, 32, 1, 2304, 2304])
    Where 32 is my BS and 2304 is my unit number.
    I'm investigating to understand more parts of your code in order to know if it's normal behaviour or not.

@YuriCat
Copy link
Contributor

YuriCat commented Nov 10, 2022

@Jogima-cyber
.unsqueeze(-1) may be necessary for either of the two.

@Jogima-cyber
Copy link
Author

Jogima-cyber commented Nov 10, 2022

Yeah I found where to add a .unsqueeze(-1) : feature/multi_unit/train.py:l234 we have log_rhos = log_selected_t_policies.detach() - log_selected_b_policies but log_selected_t_policies is of shape (1, 32, 1, 2304, 1) but log_selected_b_policies is of shape (1, 32, 1, 1, 2304) and therefore torch is applying broadcasting. I've solved the issue by adding .unsqueeze(-1) and it should work for my case but I didn't take into account all the cases HandyRL is covering like RNN, turn based training and so on, so I don't think my fixing apply generally for HandyRL.

@YuriCat
Copy link
Contributor

YuriCat commented Nov 18, 2022

@Jogima-cyber Updated my branch feature/numti_unit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants