How do I know when I've reached an optimum while training #153

Open

Quetzalcohuatl opened this issue Mar 7, 2021 · 3 comments
Quetzalcohuatl commented Mar 7, 2021

For example, when training Tic-Tac-Toe, is the optimum reached when the win rate == 0.50? So far my win rate is always above 0.50. I haven't used the evaluate function yet because I feel like the win_rate printed after every epoch is already an evaluation?

YuriCat commented Mar 7, 2021

The opponent in the evaluation phase is a random player by default.

I think the win rate of a perfect player versus a random player in Tic-Tac-Toe is about 98%, because a random player sometimes happens to choose the correct actions.
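
To sanity-check that figure, here is a small standalone Python sketch (independent of HandyRL) that enumerates the full Tic-Tac-Toe game tree and computes the exact win probability of a player who maximizes its winning chances against a uniformly random opponent. Note the assumption that draws count as non-wins, which may differ from how HandyRL's win_rate scores them.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def win_prob(board, to_move, strong):
    """Probability that `strong` eventually wins, assuming `strong` maximizes its
    win probability and the other player picks moves uniformly at random.
    Draws count as non-wins here."""
    w = winner(board)
    if w is not None:
        return 1.0 if w == strong else 0.0
    moves = [i for i, c in enumerate(board) if c == '.']
    if not moves:
        return 0.0  # draw
    nxt = 'O' if to_move == 'X' else 'X'
    values = [win_prob(board[:m] + to_move + board[m + 1:], nxt, strong) for m in moves]
    if to_move == strong:
        return max(values)              # strong player takes the best move
    return sum(values) / len(values)    # random opponent averages over legal moves

if __name__ == "__main__":
    print("strong player moves first :", win_prob("." * 9, "X", "X"))
    print("strong player moves second:", win_prob("." * 9, "X", "O"))
```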

Generally speaking, an "optimal" policy cannot be defined in multi-player games; instead, the maximum-entropy Nash equilibrium is often taken as the representative policy.

Quetzalcohuatl (Author) commented

YuriCat, can you add an argument to the train function that evaluates against a different agent? For example, I want to evaluate against my model from 20 epochs ago to see whether it is improving or not. How can I do this? I only see it supported in evaluate.py, not in train.py.
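
Until something like that exists in train.py, a head-to-head comparison can be scripted outside the training loop. The harness below is hypothetical, not HandyRL's API: `load_agent` and `play_game` are placeholder names for whatever checkpoint-loading and match-playing utilities your setup provides, and the checkpoint paths are illustrative.

```python
def load_agent(checkpoint_path):
    """Placeholder: restore an agent/policy from a saved checkpoint file."""
    raise NotImplementedError

def play_game(first_agent, second_agent):
    """Placeholder: play one game; return 1 if first_agent wins, 0 for a draw, -1 if it loses."""
    raise NotImplementedError

def head_to_head(path_new, path_old, n_games=200):
    """Score of the newer checkpoint against the older one, with draws worth 0.5."""
    new_agent, old_agent = load_agent(path_new), load_agent(path_old)
    score = 0.0
    for g in range(n_games):
        if g % 2 == 0:                       # alternate who moves first
            score += (play_game(new_agent, old_agent) + 1) / 2
        else:
            score += 1 - (play_game(old_agent, new_agent) + 1) / 2
    return score / n_games                   # > 0.5 suggests the newer model improved

# e.g. head_to_head("models/100.pth", "models/80.pth")  # paths are illustrative
```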

YuriCat commented Mar 8, 2021

Thanks for your suggestion.
Selecting opponents is something we are considering right now.
Do you have a good idea for how to specify an old model in the configuration?

By the way, comparing against the model from just before the current one may give an interesting result, since policies trained by RL sometimes end up cycling, like Rock-Paper-Scissors.
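
One common way to dampen that cycling is to evaluate (or train) against a pool of past checkpoints rather than only the most recent model. A minimal sketch of such a pool follows; the class, the `keep_every` snapshot cadence, and the method names are assumptions for illustration, not part of HandyRL.

```python
import random

class OpponentPool:
    """Hypothetical helper: keep periodic snapshots of the training model
    and sample opponents from the whole history."""

    def __init__(self, keep_every=20):
        self.keep_every = keep_every   # assumed snapshot cadence, in epochs
        self.snapshots = []            # stored checkpoint paths

    def maybe_add(self, epoch, checkpoint_path):
        if epoch % self.keep_every == 0:
            self.snapshots.append(checkpoint_path)

    def sample_opponent(self):
        # Uniform over history; weighting recent checkpoints more heavily is a common variant.
        return random.choice(self.snapshots) if self.snapshots else None
```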
