Suggestion: AMP support #200

Open
digitalspecialists opened this issue May 15, 2021 · 3 comments
Comments

@digitalspecialists

It would be useful to have native AMP mixed precision support for those with limited GPU resources.
https://pytorch.org/docs/stable/amp.html

@ikki407
Member

ikki407 commented May 20, 2021

Thank you for your suggestion! I haven't used it before. How effective is it?
Do you have any ideas for implementation?

@digitalspecialists
Author

digitalspecialists commented May 20, 2021

AMP stands for Automatic Mixed Precision. It became native to PyTorch in version 1.6 (mid-2020), replacing NVIDIA's Apex extension roughly two years after Apex was introduced. I try to use it wherever I can for practical reasons. I haven't tried to patch HandyRL with it.

PyTorch publishes some benchmarks here [0]. They report that the accuracy impact is generally <0.1% on standard benchmarks and the speedup is generally 1.5-2x. You can also usually nearly double the batch size. That accords with my experience on a 2080 Ti.

Adding AMP is straightforward.

The minimal steps are:

Declare a scaler once for the lifetime of the training session:

# Create a GradScaler once at the beginning of training.
scaler = torch.cuda.amp.GradScaler()

Wrap all forward passes with the autocast context and scale the loss:

# Runs the forward pass with autocasting.
with torch.cuda.amp.autocast():
    outputs = self.model(inputs)
    loss = self.criterion(outputs, targets)

# Scales loss.  Calls backward() on scaled loss to create scaled gradients.
scaler.scale(loss).backward()

# scaler.step() first unscales the gradients of the optimizer's params, then calls optimizer.step()
scaler.step(self.optimizer)

# Updates the scale for next iteration
scaler.update()

PyTorch provides some examples here [1]. If you accumulate gradients, modify gradients between backward() and step() (e.g. gradient clipping), or work with DataParallel, there are a few additional minor steps; a sketch of the clipping case follows below.
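For example, a minimal sketch of gradient clipping under AMP, following the patterns in [1] (max_grad_norm here is a hypothetical config value, not something HandyRL defines):

# Backward pass on the scaled loss as usual.
scaler.scale(loss).backward()

# Unscale the optimizer's gradients in place so clipping sees their true magnitudes.
scaler.unscale_(self.optimizer)

# Clip as usual; the gradients are now unscaled.
torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_grad_norm)

# scaler.step() notices unscale_ was already called and does not unscale again.
scaler.step(self.optimizer)
scaler.update()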

Implementations sometimes expose USE_PARALLEL and USE_AMP as configurable parameters for users, along the lines of the sketch below.
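As a rough sketch of what such a switch could look like (the use_amp flag is just an assumed config value, not HandyRL's actual config), both autocast and GradScaler accept an enabled argument, so one code path can serve both modes:

# use_amp would come from the training config; False falls back to full precision.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

with torch.cuda.amp.autocast(enabled=use_amp):
    outputs = self.model(inputs)
    loss = self.criterion(outputs, targets)

# With enabled=False these scaler calls are effectively pass-throughs.
scaler.scale(loss).backward()
scaler.step(self.optimizer)
scaler.update()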

This is just a suggestion. With RL, most users would probably be CPU-bound rather than GPU-bound. For those with 32+ cores and a single GPU, it may be useful.

[0] https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/
[1] https://pytorch.org/docs/stable/notes/amp_examples.html

@ikki407
Member

ikki407 commented May 20, 2021

Thanks for the details. That is awesome! From your comments, I feel it would be convenient to expose AMP as a config option. Although some verification is required, it doesn't seem difficult to introduce. I think we need to be careful when using DataParallel and clip_grad_norm (though I'm a bit worried it could get complicated...). We, the HandyRL team, will consider AMP support.

Thanks
