How to make model consider immediate reward? #275

Closed

glitter2626 opened this issue Feb 12, 2022 · 4 comments

Comments

glitter2626 commented Feb 12, 2022

I want to use the immediate reward from the environment to train my RL model. As described in the documentation, I implemented the "reward" function in the "Environment" class.

However, when I checked the loss calculation flow in train.py, losses['v'] seems to consider only the value output by the model and the outcome from the environment, while losses['r'] does take the rewards from the environment into account.

Does this mean that my model also needs to output a "return" value?
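
A minimal sketch of the distinction described above, assuming the value head targets the final outcome and the return head targets the discounted cumulative immediate rewards. The function and its signature are hypothetical, not HandyRL's actual train.py code:

```python
import torch

def value_and_return_targets(rewards, outcome, gamma=1.0):
    # rewards: tensor of shape (T,) with per-step immediate rewards
    # outcome: scalar terminal result of the game (e.g. +1 win / -1 loss)
    T = rewards.shape[0]
    # the value head's target is the final outcome, broadcast to every step
    value_target = torch.full((T,), float(outcome))
    # the return head's target is the discounted cumulative sum of future rewards
    return_target = torch.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = float(rewards[t]) + gamma * acc
        return_target[t] = acc
    return value_target, return_target
```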

YuriCat (Contributor) commented Feb 13, 2022

@glitter2626
Thanks for your great question!
Yes, in the main code the "return" head predicts the cumulative sum of the immediate rewards.

In the branch of #225, we removed the outcome; there, the value predicts the cumulative sum of the immediate rewards (possibly multidimensional). Gamma can also be set as a list, so that each reward dimension has its own discount factor.
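
For illustration, a short sketch of per-dimension discounting with multidimensional rewards, assuming rewards of shape (T, D) and one gamma per dimension; the names here are illustrative, not the branch's actual API:

```python
import torch

def discounted_returns(rewards, gammas):
    # rewards: (T, D) tensor, one column per reward dimension
    # gammas:  list of D discount factors, one per dimension
    T, D = rewards.shape
    g = torch.tensor(gammas, dtype=rewards.dtype)
    returns = torch.zeros_like(rewards)
    acc = torch.zeros(D, dtype=rewards.dtype)
    for t in reversed(range(T)):
        acc = rewards[t] + g * acc   # dimension-wise discounting
        returns[t] = acc
    return returns

# e.g. discounted_returns(torch.rand(10, 2), gammas=[0.99, 0.9])
```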

glitter2626 (Author) commented Feb 14, 2022

@YuriCat Thanks for your clear reply.

I also have a question about solo training. In self-play the model always plays against itself, but when we build a training batch, we randomly select only one agent's episode.

Is there a reason why the episodes of all agents are not included in the same batch, e.g. training stability? Can I simply comment that part out so that the episodes of all agents are considered in the same batch?

[update]
After commenting out the solo-training selection, my training loss became very unstable :(

YuriCat (Contributor) commented Feb 15, 2022

@glitter2626 Could you please try this fix, #276?

Of course, we can train all the players at the same time, but we had not carefully checked such cases. Thank you for pointing it out!
Note that the inputs might be biased if the observations contain the same data.
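
For illustration only (this is not HandyRL's actual batcher), the two sampling strategies discussed above could look like the following; episode and sample_trajectories are hypothetical names:

```python
import random

def sample_trajectories(episode, all_players=False):
    # episode: dict mapping player id -> list of that player's transitions
    if all_players:
        # every player's view of the same self-play game goes into the batch;
        # shared observations are then duplicated, which can bias training
        return [t for player in episode for t in episode[player]]
    # default behavior: pick a single player's trajectory at random
    player = random.choice(list(episode))
    return episode[player]
```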

glitter2626 (Author) commented Feb 16, 2022

@YuriCat Thanks again for your fast solution!

I asked this question because I want to use this awesome framework for a multi-agent cooperative problem. I revised some of the loss calculation steps as described in the VDN paper, but my current solution seems to have some bugs. I will try to fix it.
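
For reference, a rough sketch of the VDN idea mentioned above (Sunehag et al.): the joint value is the additive sum of per-agent values, and a single loss is taken on that sum against a shared team target. This is illustrative only, not the actual modification being described:

```python
import torch
import torch.nn.functional as F

def vdn_loss(per_agent_values, team_return):
    # per_agent_values: (B, N) value estimates from N cooperating agents
    # team_return:      (B,) shared cumulative team reward target
    joint_value = per_agent_values.sum(dim=1)   # additive value decomposition
    return F.mse_loss(joint_value, team_return)
```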

All in all, I just wanted to thank you for your contributions; I learned a lot from you.
