How to make the model consider immediate reward? #275
Comments
@glitter2626 In this branch #225, we removed
@YuriCat Thanks for your clear reply. I also have another question, about solo training. In self-play, the model always plays against itself. However, when we build a training batch, we only randomly select one agent's episode. Is there a reason why the episodes of all agents are not considered in the same batch, e.g. for training stability? Can I simply comment out that line so that episodes of all agents are considered in the same batch? [update]
@glitter2626 Could you please try this fix #276? Of course, we can train all the players at the same time, but we had not carefully checked such cases. Thank you for pointing this out!
@YuriCat Thanks for your fast solution again! I asked this question because I want to use this awesome framework for a multi-agent cooperative problem. I revised some of the loss-calculation steps as described in the VDN paper, but my current solution seems to have some bugs; I will try to fix them. All in all, I just wanted to thank you for your contributions. I learned a lot from you.
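For reference, the core of the VDN idea mentioned above is an additive decomposition: the joint value is the sum of per-agent values, trained against a shared team reward. This is a minimal sketch of that idea only, not HandyRL's actual loss code; the function name, tensor shapes, and the MSE/TD form are assumptions for illustration.

```python
import torch

def vdn_value_loss(per_agent_values, team_reward, next_per_agent_values, gamma=0.99):
    """VDN-style loss sketch: sum per-agent values into a joint value and
    regress it toward a TD target built from the shared team reward.

    per_agent_values:      (batch, n_agents) value estimates at step t
    next_per_agent_values: (batch, n_agents) value estimates at step t+1
    team_reward:           (batch,) reward shared by all agents
    """
    v_tot = per_agent_values.sum(dim=1)                                   # additive decomposition
    target = team_reward + gamma * next_per_agent_values.sum(dim=1).detach()
    return torch.nn.functional.mse_loss(v_tot, target)
```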
I want to use the immediate reward from the environment to teach my RL model. As described in the documentation, I implemented the "reward" function in the "Environment" class.
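For context, here is a minimal, self-contained sketch of the kind of per-step reward() method discussed here. Only the idea that the Environment class exposes a reward() function comes from the thread; the dict-of-players return convention, the method names, and the toy scoring logic are assumptions and should be checked against the actual documentation and base class.

```python
# Toy two-player environment (not from the repository) illustrating an
# immediate-reward hook next to the final outcome.
class ToyEnvironment:
    def __init__(self):
        self.score = {0: 0, 1: 0}        # cumulative score per player
        self.prev_score = {0: 0, 1: 0}   # score at the previous step

    def players(self):
        return [0, 1]

    def step(self, actions):
        # hypothetical update: each player's action value adds to its score
        for p, a in actions.items():
            self.score[p] += a

    def reward(self):
        # immediate reward for the current step: the change in score,
        # returned as {player: reward} so each player gets its own signal
        r = {p: self.score[p] - self.prev_score[p] for p in self.players()}
        self.prev_score = dict(self.score)
        return r

    def outcome(self):
        # final result (win/lose/draw) that the value loss compares against
        if self.score[0] == self.score[1]:
            return {0: 0, 1: 0}
        winner = max(self.players(), key=lambda p: self.score[p])
        return {p: (1 if p == winner else -1) for p in self.players()}
```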
However, when I checked the loss-calculation flow in train.py, losses['v'] seems to consider only the value output by the model and the outcome from the environment. I also found that losses['r'] takes the rewards from the environment into account.
Does this mean that my model also needs to output a "return" value?
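One way to read the loss flow described above: losses['v'] regresses a 'value' output toward the final outcome, while losses['r'] regresses a separate 'return' output toward targets built from the per-step rewards, so the model would indeed need an extra output head. A hedged PyTorch sketch follows; the output-dict keys ('policy', 'value', 'return'), the tanh on the value head, and the layer sizes are assumptions based on the thread's wording and should be verified against the keys train.py actually reads.

```python
import torch
import torch.nn as nn

class PolicyValueReturnModel(nn.Module):
    """Sketch of a model with an extra 'return' head next to 'policy' and 'value'.
    Key names are assumptions inferred from losses['v'] / losses['r'] above."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)    # predicts the final outcome
        self.return_head = nn.Linear(hidden, 1)   # predicts the return built from immediate rewards

    def forward(self, obs):
        h = self.body(obs)
        return {
            'policy': self.policy_head(h),
            # tanh assumes a win/lose outcome in [-1, 1]
            'value': torch.tanh(self.value_head(h)),
            'return': self.return_head(h),
        }
```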