RuntimeError: expected scalar type Float but found Half with DCNv2 at backward #2187

zehuichen123 · 2022-08-10T13:39:08Z

Hi, I am trying to use fp16 in BEVDet (actually BEVDepth implemented on the BEVDet codebase) with mmcv==1.4.0, torch==1.9, torchvision==0.10.0. However, I encountered this problem:

Traceback (most recent call last):
  File "./tools/train.py", line 224, in <module>
    main()
  File "./tools/train.py", line 220, in main
    meta=meta)
  File "/nfs/chenzehui/code/BEVDet/mmdet3d/apis/train.py", line 208, in train_model
    meta=meta)
  File "/nfs/chenzehui/code/BEVDet/mmdet3d/apis/train.py", line 177, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 224, in after_train_iter
    self.loss_scaler.scale(runner.outputs['loss']).backward()
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/torch/autograd/function.py", line 87, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/torch/autograd/function.py", line 204, in wrapper
    outputs = fn(ctx, *args)
  File "/nfs/chenzehui/others/miniconda3/envs/bevdet/lib/python3.7/site-packages/mmcv/ops/modulated_deform_conv.py", line 129, in backward
    with_bias=ctx.with_bias)
RuntimeError: expected scalar type Float but found Half

There are some related issues (e.g. #1004) about DCN in fp16, but I noticed that they all occur in the forward pass, not in the backward function.


zehuichen123 · 2022-08-11T02:07:07Z

I rechecked the code and found that when I change DCNv2 to DCN (https://github.com/HuangJunJie2017/BEVDet/blob/8bd2c041b249b5fc52cbdd1cfe45834cb98f7e00/mmdet3d/models/necks/view_transformer.py#L307), this error disappears. So does this bug only affect DCNv2?

grimoire · 2022-08-11T03:39:34Z

I guess it is because the bias has the wrong type.
We have a fix in 9b49fcc. You can use the latest MMCV, or apply that change to your code if you want to stay on 1.4.0.
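For readers following along, the kind of mismatch described above can be sketched in plain PyTorch: under fp16 training the activations are Half, while a layer's weight and bias parameters may still be Float. The helper below, `align_dtypes`, is a hypothetical name introduced only for illustration. It is not an MMCV function and is not claimed to be what commit 9b49fcc does; it only shows one way to cast parameters to the activation dtype before they reach a custom op.

```python
import torch


def align_dtypes(x, weight, bias=None):
    """Cast weight and bias to the dtype of x (hypothetical helper, not an MMCV API)."""
    weight = weight.type_as(x)
    if bias is not None:
        # A float32 bias next to float16 activations is the kind of mismatch
        # that can raise "expected scalar type Float but found Half".
        bias = bias.type_as(x)
    return x, weight, bias


x = torch.randn(2, 4).half()  # fp16 activations, as produced under mixed precision
w = torch.randn(3, 4)         # float32 parameter
b = torch.randn(3)            # float32 bias
x, w, b = align_dtypes(x, w, b)
print(x.dtype, w.dtype, b.dtype)
```

After the call, all three tensors share the dtype of `x`, so a kernel that requires one scalar type for every argument no longer sees mixed inputs.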

quhaoooo · 2022-09-09T05:53:07Z

> I guess it is because the bias has wrong type. We have a fix in 9b49fcc. You can use the latest MMCV or update the code if you want to use 1.4.0

I got the same problem when I use AMP, and updating the code does not solve it:

zehuichen123 · 2022-09-09T07:53:07Z

@quhaoooo You can simply set bias=False to avoid this problem. It seems that bias is not so important.
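A minimal sketch of this workaround, assuming the layer is built from an MMCV-style config dict with `type='DCNv2'` and that extra keys such as `bias` are forwarded to the layer constructor. The helper `disable_dcnv2_bias` and the example config (including `HypotheticalNeck`) are made up for illustration; they are not part of MMCV or BEVDet.

```python
import copy


def disable_dcnv2_bias(cfg):
    """Return a copy of cfg with bias=False on every dict whose type is 'DCNv2'.

    Hypothetical helper for illustration; not an MMCV or BEVDet function.
    """
    cfg = copy.deepcopy(cfg)  # leave the caller's config untouched

    def visit(node):
        if isinstance(node, dict):
            if node.get('type') == 'DCNv2':
                node['bias'] = False
            for value in node.values():
                visit(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                visit(item)

    visit(cfg)
    return cfg


# Made-up config used only to exercise the helper.
example_cfg = dict(
    neck=dict(
        type='HypotheticalNeck',
        conv_cfg=dict(type='DCNv2', deform_groups=1),
        extra=[dict(type='Conv2d'), dict(type='DCNv2')],
    ))
patched = disable_dcnv2_bias(example_cfg)
print(patched['neck']['conv_cfg'])
```

Note that if the model code also passes `bias` explicitly when it builds the layer, the value has to be changed there instead of in the config.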

quhaoooo · 2022-09-13T06:17:04Z

> @quhaoooo You can simply set bias=False to avoid this problem. It seems that bias is not so important.

I just set bias=False, but got this problem:

zhouzaida assigned grimoire Aug 10, 2022

zehuichen123 closed this as completed Aug 18, 2022
