
Doubt regarding Batch Size #22

Open
owaisCS opened this issue Sep 11, 2023 · 2 comments
owaisCS commented Sep 11, 2023

In Tables 14(a) and 14(b), you mention a batch size of 4096 for pretraining and a batch size of 1024 for finetuning.
Could you clarify whether these batch sizes are per GPU or global (across all GPUs on all nodes)?
Could you also share the GPU memory size used to obtain the reported values?

Could the logs of pretraining and finetuning be made available?

dbolya (Contributor) commented Sep 12, 2023

The batch sizes in the appendix are global unless stated otherwise, and so are the learning rates, so you can use any number of GPUs as long as the total batch size adds up to that number. We used a mix of A100 40GB and A100 80GB GPUs in our experiments, so most configs should fit on 40GB (or smaller) GPUs when using 64 of them.
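
For concreteness, here is a minimal sketch of how a global batch size maps onto a per-GPU batch size; the variable names and the gradient-accumulation knob are illustrative, not taken from the repo's configs:

```python
# Illustrative only: relate the global batch size from the appendix to a
# per-GPU batch size for a given GPU count and a (hypothetical) gradient
# accumulation setting.
global_batch_size = 4096   # pretraining batch size from Table 14(a)
num_gpus = 64              # e.g., 64 x A100
grad_accum_steps = 1       # hypothetical accumulation knob

per_gpu_batch_size = global_batch_size // (num_gpus * grad_accum_steps)
assert per_gpu_batch_size * num_gpus * grad_accum_steps == global_batch_size
print(per_gpu_batch_size)  # 64 samples per GPU per optimizer step
```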

If you run out of memory, you can always use a lower per-GPU batch size and compensate either by increasing the number of GPUs (which is equivalent) or by reducing the learning rate (which might not exactly reproduce the result, since training uses AdamW).
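
If you do end up lowering the total batch size, one common heuristic is linear learning-rate scaling; the sketch below illustrates the general rule rather than an exact recipe, and the base learning rate is a placeholder, not the value from the appendix tables:

```python
# Illustrative linear learning-rate scaling when the global batch size shrinks.
base_lr = 1e-3            # placeholder for the appendix (global) learning rate
base_batch_size = 4096    # global batch size the learning rate was tuned for
new_batch_size = 1024     # smaller global batch size that actually fits

scaled_lr = base_lr * new_batch_size / base_batch_size
print(scaled_lr)          # 0.00025 under this linear-scaling heuristic
```

As mentioned, this heuristic may not exactly reproduce the results under AdamW.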

I'll look into seeing if we can release some training graphs.


owaisCS commented Sep 19, 2023

Thank you for the response.

Kindly upload the pretraining and fine-tuning logs.
