
The custom CUDA extensions may be incompatible with PyTorch 1.10.0; possibly missing dependency ninja #9

Open
fxmarty opened this issue Jan 25, 2022 · 7 comments

Comments


fxmarty commented Jan 25, 2022

Hello,

I am running the training in a Google Colab instance.

My code is roughly the following (in different cells):

from google.colab import drive
drive.mount('/content/drive')

%cd "drive/My Drive/DirectVoxGO"

!pip install -r requirements.txt

import torch
assert(torch.__version__ == '1.10.0+cu111')

!pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu111.html

!python run.py --config configs/blendedmvs/Jade.py --render_test  # train

The first error that comes up (from the last line above) is that I need ninja:

Using /root/.cache/torch_extensions/py37_cu111 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py37_cu111/adam_upd_cuda...
Traceback (most recent call last):
  File "run.py", line 13, in <module>
    from lib import utils, dvgo
  File "/content/drive/MyDrive/MAREVA_project/DirectVoxGO/lib/utils.py", line 11, in <module>
    from .masked_adam import MaskedAdam
  File "/content/drive/MyDrive/MAREVA_project/DirectVoxGO/lib/masked_adam.py", line 10, in <module>
    verbose=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1418, in _write_ninja_file_and_build_library
    verify_ninja_availability()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py", line 1474, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

Then, I do

!pip install ninja
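
For anyone hitting the same error: a quick sanity check (a sketch, not part of the repository) to confirm that the freshly installed ninja binary is actually visible to PyTorch:

# Sketch: confirm that ninja is on PATH and usable by PyTorch's JIT builder.
import shutil
import torch.utils.cpp_extension as cpp_ext

print(shutil.which('ninja'))         # path to the ninja executable, or None
print(cpp_ext.is_ninja_available())  # True once PyTorch can invoke it
cpp_ext.verify_ninja_availability()  # raises the RuntimeError above otherwise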

Rerunning !python run.py --config configs/blendedmvs/Jade.py --render_test, I now get a very verbose output, which I put on pastebin because it is too long: https://pastebin.com/Um0cMJfg . Note that despite the warnings, the training does start at the end.

Is this output expected? If these warnings/errors are due to PyTorch 1.10.0 and are critical, I would suggest disabling the custom CUDA extensions by default so that training can run on several PyTorch versions, not only 1.8.1.
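
For what it's worth, here is a minimal sketch of such an opt-in/fallback pattern; the source file names and the pure-PyTorch fallback are assumptions for illustration, not the repository's actual code:

import torch
from torch.utils.cpp_extension import load

# Hypothetical source layout, for illustration only.
sources = ['lib/cuda/adam_upd.cpp', 'lib/cuda/adam_upd_kernel.cu']

adam_upd_cuda = None  # stays None -> use the plain PyTorch optimizer step
if torch.cuda.is_available():
    try:
        adam_upd_cuda = load(name='adam_upd_cuda', sources=sources, verbose=True)
    except (RuntimeError, OSError) as err:
        print('Could not build the CUDA extension, falling back to plain PyTorch:', err)

# Callers would then branch on `adam_upd_cuda is None` and run a non-fused
# optimizer update when the extension is unavailable.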

Thanks a lot!

@sunset1995
Owner

Thanks for reminding me of the dependency :)

Yes, this is expected the first time you run the code; I turn on the verbose mode of the PyTorch JIT compilation module for debugging purposes.
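
The long log only appears when the extension is (re)built; the compiled module is cached under the torch_extensions root shown in the first line of the log, so later runs skip the compilation. A tiny sketch (the path is just an example) of redirecting that cache:

# Sketch: point PyTorch's JIT extension cache somewhere explicit; deleting the
# directory forces a recompilation (and the verbose log) on the next run.
import os
os.environ['TORCH_EXTENSIONS_DIR'] = '/content/torch_extensions'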


fxmarty commented Jan 26, 2022

Great to know - I'll use the version with the custom kernels then. Thanks!

Edit: Maybe another question @sunset1995: I saw you increased what seems to be the batch size here https://github.com/sunset1995/DirectVoxGO/blob/main/run.py#L90 from 16 to 65536. Is it safe to reduce it, or is it actually leveraged by the rendering kernels? In the previous version of the library, even 16 was too high to run on my GPU, which has little memory. I have not yet tried to run inference with the new version, but I will let you know how it goes memory-wise. (Edit again: reducing it to 32768 is fine for my memory this time. Weird.)
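
In case it helps others with small GPUs, a rough sketch for probing a workable ray-batch size; render_step is a hypothetical stand-in for one training/rendering iteration, not an actual DirectVoxGO function:

# Sketch: try decreasing candidate batch sizes and report peak GPU memory.
import torch

def probe_batch_sizes(render_step, candidates=(65536, 32768, 16384, 8192)):
    for n_rand in candidates:
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()
        try:
            render_step(n_rand)  # hypothetical: one iteration over n_rand rays
        except RuntimeError as err:
            if 'out of memory' in str(err):
                print(f'N_rand={n_rand}: out of memory')
                continue
            raise
        peak_mib = torch.cuda.max_memory_allocated() / 2**20
        print(f'N_rand={n_rand}: peak memory {peak_mib:.0f} MiB')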

Edit again: Just to let you know (you are probably already aware): the old trained models have missing keys with the new version; see e.g. this error during inference:

Traceback (most recent call last):
  File "/home/felix/Documents/Mines/3A/Option/Mini-projet/directvoxgo-mareva/DirectVoxGO/run.py", line 518, in <module>
    model = utils.load_model(dvgo.DirectVoxGO, ckpt_path).to(device)
  File "/home/felix/Documents/Mines/3A/Option/Mini-projet/directvoxgo-mareva/DirectVoxGO/lib/utils.py", line 62, in load_model
    model = model_class(**ckpt['model_kwargs'])
  File "/home/felix/Documents/Mines/3A/Option/Mini-projet/directvoxgo-mareva/DirectVoxGO/lib/dvgo.py", line 98, in __init__
    self.mask_cache = MaskCache(
  File "/home/felix/Documents/Mines/3A/Option/Mini-projet/directvoxgo-mareva/DirectVoxGO/lib/dvgo.py", line 395, in __init__
    alpha = 1 - torch.exp(-F.softplus(density + st['model_kwargs']['act_shift']) * st['model_kwargs']['voxel_size_ratio'])
KeyError: 'act_shift'
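
A small sketch (the checkpoint path is a placeholder) for inspecting which kwargs an old checkpoint actually stores, to see what the new code expects but is missing:

# Sketch: list the kwargs saved in an old checkpoint; e.g. 'act_shift' is
# absent there, which is what triggers the KeyError above.
import torch

ckpt = torch.load('path/to/old_checkpoint.tar', map_location='cpu')
print(sorted(ckpt['model_kwargs'].keys()))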

Newly trained models are fine (and run faster, which is cool!).


aarrushi commented Feb 2, 2022

Hi,
I tried "pip install ninja" but I am still getting the error "RuntimeError: Ninja is required to load C++ extensions".
Is there any other dependency to take care of?

Also, is there a way to disable the new optimization-related changes in the new repo if they are causing issues?

@sunset1995
Owner

It seems that Windows has some issues with this.
Maybe you can try this: zhanghang1989/PyTorch-Encoding#167.

@Learningm

I solved the "RuntimeError: Ninja is required to load C++ extensions" problem by:

  1. installing torch 1.8.1 with CUDA 11.1
  2. pip install ninja
  3. sudo apt-get install ninja-build

A quick environment check is sketched below.
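
# Sketch: verify the torch / CUDA versions and that a ninja binary is on PATH.
import shutil
import torch

print(torch.__version__)      # expecting something like 1.8.1+cu111
print(torch.version.cuda)     # expecting 11.1
print(shutil.which('ninja'))  # provided by pip's ninja or apt's ninja-build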

@aarrushi

@Learningm did you resolve it on Windows as well?

@Learningm

@aarrushi I resolved it on Linux.
