ModuleNotFoundError: No module named 'yolox.layers.fast_cocoeval' #1253

Closed
npurson opened this issue Apr 18, 2022 · 5 comments

npurson commented Apr 18, 2022

After pulling the latest commit 6513f769fa500b3c7ad23b90a91dcbd8402be330, the following error was raised when evaluation began.

2022-04-17 20:45:12 | INFO     | yolox.core.trainer:271 - epoch: 10/150, iter: 1810/1849, mem: 20322Mb, iter_time: 0.405s, data_time: 0.002s, total_loss: 7.3, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.2, cls_loss: 1.4, lr: 9.966e-04, size: 608, ETA: 1 day, 5:08:26
2022-04-17 20:45:16 | INFO     | yolox.core.trainer:271 - epoch: 10/150, iter: 1820/1849, mem: 20322Mb, iter_time: 0.363s, data_time: 0.002s, total_loss: 6.6, iou_loss: 2.3, l1_loss: 0.0, conf_loss: 2.7, cls_loss: 1.6, lr: 9.966e-04, size: 512, ETA: 1 day, 5:08:16
2022-04-17 20:45:19 | INFO     | yolox.core.trainer:271 - epoch: 10/150, iter: 1830/1849, mem: 20322Mb, iter_time: 0.303s, data_time: 0.002s, total_loss: 6.4, iou_loss: 2.5, l1_loss: 0.0, conf_loss: 2.6, cls_loss: 1.3, lr: 9.966e-04, size: 544, ETA: 1 day, 5:07:58
2022-04-17 20:45:25 | INFO     | yolox.core.trainer:271 - epoch: 10/150, iter: 1840/1849, mem: 20322Mb, iter_time: 0.667s, data_time: 0.033s, total_loss: 7.6, iou_loss: 2.6, l1_loss: 0.0, conf_loss: 3.3, cls_loss: 1.7, lr: 9.965e-04, size: 672, ETA: 1 day, 5:08:31
2022-04-17 20:45:29 | INFO     | yolox.core.trainer:362 - Save weights to ./YOLOX_outputs/yolox_m
2022-04-17 20:55:07 | INFO     | yolox.evaluators.coco_evaluator:235 - Evaluate in main process...
2022-04-17 20:55:15 | INFO     | yolox.evaluators.coco_evaluator:268 - Loading and preparing results...
2022-04-17 20:55:20 | INFO     | yolox.evaluators.coco_evaluator:268 - DONE (t=4.88s)
2022-04-17 20:55:20 | INFO     | pycocotools.coco:366 - creating index...
2022-04-17 20:55:20 | INFO     | pycocotools.coco:366 - index created!
2022-04-17 20:55:25 | INFO     | yolox.core.trainer:206 - Training of experiment is done and the best AP is 0.00
2022-04-17 20:55:25 | ERROR    | yolox.core.launch:147 - An error has been caught in function '_distributed_worker', process 'ForkProcess-1' (4461), thread 'MainThread' (140443267692352):
Traceback (most recent call last):

  File "/running_package/yolox/yolox/layers/jit_ops.py", line 83, in load
    return importlib.import_module(self.absolute_name())
           |         |             |    -> <function FastCOCOEvalOp.absolute_name at 0x7fb63e57b048>
           |         |             -> <yolox.layers.jit_ops.FastCOCOEvalOp object at 0x7fb63dcc01d0>
           |         -> <function import_module at 0x7fbb7d3709d8>
           -> <module 'importlib' from '/usr/lib64/python3.6/importlib/__init__.py'>

  File "/usr/lib64/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           |          |           |    |        |        -> 0
           |          |           |    |        -> None
           |          |           |    -> 0
           |          |           -> 'yolox.layers.fast_cocoeval'
           |          -> <function _gcd_import at 0x7fbb7dd1fe18>
           -> <module 'importlib._bootstrap' (frozen)>
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked

ModuleNotFoundError: No module named 'yolox.layers.fast_cocoeval'

npurson commented Apr 18, 2022

Oh, I see: the error above is raised because the C++ extensions fail to load, since no ninja is installed. I guess there should be more mechanisms to guarantee forward compatibility?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
                -> ModuleSpec(name='yolox.tools.train', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7fbb7d0ce7f0>, origin='/...
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
         |     -> {'__name__': '__main__', '__doc__': None, '__package__': 'yolox.tools', '__loader__': <_frozen_importlib_external.SourceFileL...
         -> <code object <module> at 0x7fbb7d121270, file "/running_package/yolox/tools/train.py", line 5>

  File "/running_package/yolox/tools/train.py", line 140, in <module>
    args=(exp, args),
               -> Namespace(batch_size=64, cache=True, ckpt=None, devices=8, dist_backend='nccl', dist_url=None, exp_file='exps/custom/yolox_m....

  File "/running_package/yolox/yolox/core/launch.py", line 95, in launch
    start_method=start_method,
                 -> 'fork'

  File "/usr/local/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
    process.start()
    |       -> <function BaseProcess.start at 0x7fbb7a16ee18>
    -> <ForkProcess(ForkProcess-1, started)>

  File "/usr/lib64/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
    |    |        |    |      -> <ForkProcess(ForkProcess-1, started)>
    |    |        |    -> <staticmethod object at 0x7fbb7a16b5c0>
    |    |        -> <ForkProcess(ForkProcess-1, started)>
    |    -> None
    -> <ForkProcess(ForkProcess-1, started)>
  File "/usr/lib64/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
           |     -> <ForkProcess(ForkProcess-1, started)>
           -> <class 'multiprocessing.popen_fork.Popen'>
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
    |    |       -> <ForkProcess(ForkProcess-1, started)>
    |    -> <function Popen._launch at 0x7fba88bdd950>
    -> <multiprocessing.popen_fork.Popen object at 0x7fba88bd6f60>
  File "/usr/lib64/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
           |           -> <function BaseProcess._bootstrap at 0x7fbb7a18a620>
           -> <ForkProcess(ForkProcess-1, started)>
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
    |    -> <function BaseProcess.run at 0x7fbb7a16ed90>
    -> <ForkProcess(ForkProcess-1, started)>
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
    |    |        |    |        |    -> {}
    |    |        |    |        -> <ForkProcess(ForkProcess-1, started)>
    |    |        |    -> (<function _distributed_worker at 0x7fb86f536f28>, 0, (<function main at 0x7fba895802f0>, 8, 8, 0, 'nccl', 'tcp://127.0.0.1:5...
    |    |        -> <ForkProcess(ForkProcess-1, started)>
    |    -> <function _wrap at 0x7fb86fb3fae8>
    -> <ForkProcess(ForkProcess-1, started)>

  File "/usr/local/lib64/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
    |  |   -> (<function main at 0x7fba895802f0>, 8, 8, 0, 'nccl', 'tcp://127.0.0.1:56604', (\u2552\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2564\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550...
    |  -> 0
    -> <function _distributed_worker at 0x7fb86f536f28>

> File "/running_package/yolox/yolox/core/launch.py", line 147, in _distributed_worker
    main_func(*args)
    | 
    -> <function main at 0x7fba895802f0>

  File "/running_package/yolox/tools/train.py", line 117, in main
    trainer.train()
    |       -> <function Trainer.train at 0x7fb861e00510>
    -> <yolox.core.trainer.Trainer object at 0x7fba88bde828>

  File "/running_package/yolox/yolox/core/trainer.py", line 75, in train
    self.train_in_epoch()
    |    -> <function Trainer.train_in_epoch at 0x7fb861e1ed08>
    -> <yolox.core.trainer.Trainer object at 0x7fba88bde828>

  File "/running_package/yolox/yolox/core/trainer.py", line 85, in train_in_epoch
    self.after_epoch()
    |    -> <function Trainer.after_epoch at 0x7fba8956e510>
    -> <yolox.core.trainer.Trainer object at 0x7fba88bde828>

  File "/running_package/yolox/yolox/core/trainer.py", line 232, in after_epoch
    self.evaluate_and_save_model()
    |    -> <function Trainer.evaluate_and_save_model at 0x7fba8956e7b8>
    -> <yolox.core.trainer.Trainer object at 0x7fba88bde828>

  File "/running_package/yolox/yolox/core/trainer.py", line 336, in evaluate_and_save_model
    evalmodel, self.evaluator, self.is_distributed
    |          |    |          |    -> True
    |          |    |          -> <yolox.core.trainer.Trainer object at 0x7fba88bde828>
    |          |    -> <yolox.evaluators.coco_evaluator.COCOEvaluator object at 0x7fba54a49128>
    |          -> <yolox.core.trainer.Trainer object at 0x7fba88bde828>
    -> YOLOX(
         (backbone): YOLOPAFPN(
           (backbone): CoaT(
             (serial_stages): ModuleList(
               (0): SerialStage(
                 ...

  File "/running_package/yolox/yolox/exp/yolox_base.py", line 318, in eval
    return evaluator.evaluate(model, is_distributed, half)
           |         |        |      |               -> False
           |         |        |      -> True
           |         |        -> YOLOX(
           |         |             (backbone): YOLOPAFPN(
           |         |               (backbone): CoaT(
           |         |                 (serial_stages): ModuleList(
           |         |                   (0): SerialStage(
           |         |                     ...
           |         -> <function COCOEvaluator.evaluate at 0x7fba8956d268>
           -> <yolox.evaluators.coco_evaluator.COCOEvaluator object at 0x7fba54a49128>

  File "/running_package/yolox/yolox/evaluators/coco_evaluator.py", line 195, in evaluate
    eval_results = self.evaluate_prediction(data_list, statistics)
                   |    |                   |          -> tensor([ 38.8999,   2.7598, 624.0000], device='cuda:0')
                   |    |                   -> [{'image_id': 397133, 'category_id': 1, 'bbox': [380.73602294921875, 65.4205322265625, 121.3046875, 285.68450927734375], 'sco...
                   |    -> <function COCOEvaluator.evaluate_prediction at 0x7fba8956d378>
                   -> <yolox.evaluators.coco_evaluator.COCOEvaluator object at 0x7fba54a49128>

  File "/running_package/yolox/yolox/evaluators/coco_evaluator.py", line 276, in evaluate_prediction
    cocoEval = COCOeval(cocoGt, cocoDt, annType[1])
               |        |       |       -> ['segm', 'bbox', 'keypoints']
               |        |       -> <pycocotools.coco.COCO object at 0x7fba891ffd68>
               |        -> <pycocotools.coco.COCO object at 0x7fb6443c3860>
               -> <class 'yolox.layers.fast_coco_eval_api.COCOeval_opt'>

  File "/running_package/yolox/yolox/layers/fast_coco_eval_api.py", line 24, in __init__
    self.module = FastCOCOEvalOp().load()
    |             -> <class 'yolox.layers.jit_ops.FastCOCOEvalOp'>
    -> <yolox.layers.fast_coco_eval_api.COCOeval_opt object at 0x7fb63cc589e8>

  File "/running_package/yolox/yolox/layers/jit_ops.py", line 87, in load
    return self.jit_load(verbose)
           |    |        -> True
           |    -> <function JitOp.jit_load at 0x7fb63d6a5b70>
           -> <yolox.layers.jit_ops.FastCOCOEvalOp object at 0x7fb63dcc01d0>

  File "/running_package/yolox/yolox/layers/jit_ops.py", line 107, in jit_load
    verbose=verbose,
            -> True

  File "/usr/local/lib64/python3.6/site-packages/torch/utils/cpp_extension.py", line 1091, in load
    keep_intermediates=keep_intermediates)
                       -> True

  File "/usr/local/lib64/python3.6/site-packages/torch/utils/cpp_extension.py", line 1302, in _jit_compile
    is_standalone=is_standalone)
                  -> False

  File "/usr/local/lib64/python3.6/site-packages/torch/utils/cpp_extension.py", line 1373, in _write_ninja_file_and_build_library
    verify_ninja_availability()
    -> <function verify_ninja_availability at 0x7f8ef56ab400>

  File "/usr/local/lib64/python3.6/site-packages/torch/utils/cpp_extension.py", line 1429, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")

RuntimeError: Ninja is required to load C++ extensions

@GOATmessi8
Member

@FateScript I think we should add ninja to the requirements and fix this compatibility issue.
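
One way such a compatibility mechanism could look is sketched below. The traceback shows `load` in `jit_ops.py` first trying `importlib.import_module` and then falling back to `jit_load`; this sketch adds a third step that returns `None` when the JIT build also fails, so the caller can switch to a slower pure-Python path (for example the plain pycocotools evaluator). The names `load_with_fallback`, `jit_build`, and `hypothetical_pkg.fast_ext` are illustrative assumptions, not YOLOX APIs, and nothing in this thread says the project adopted this approach.

```python
import importlib


def load_with_fallback(module_name, jit_build=None):
    """Return a compiled module if possible, else None so callers can fall back.

    module_name: dotted name of a prebuilt extension module.
    jit_build:   optional zero-argument callable that compiles and returns the
                 module (a hypothetical stand-in for a JIT build step).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        pass  # Not prebuilt; try compiling on the fly.
    if jit_build is None:
        return None
    try:
        return jit_build()
    except (RuntimeError, OSError) as err:
        # For example: "Ninja is required to load C++ extensions".
        print(f"JIT build of {module_name} failed ({err}); using the slow path")
        return None


def broken_build():
    """Simulates the failure reported in this issue."""
    raise RuntimeError("Ninja is required to load C++ extensions")


# Hypothetical module name; the import fails, the build fails, and we get None.
fast_ext = load_with_fallback("hypothetical_pkg.fast_ext", broken_build)
use_fast_eval = fast_ext is not None
```

With this shape, a missing build tool degrades evaluation speed instead of aborting a training run.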

@FateScript
Member

Did you install ninja, @npurson? It's required by our txt file here.
If you did install ninja, such an error is weird and I will check it carefully.


npurson commented Apr 18, 2022

> Did you install ninja, @npurson? It's required by our txt file here. If you did install ninja, such an error is weird and I will check it carefully.

I do have ninja installed in my pip env, but ninja is not in the $PATH, i.e.,

$ pip list | grep ninja
ninja                              1.10.2.3
WARNING: You are using pip version 21.0; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
$ ninja --version
bash: ninja: command not found

I assume this could be solved with the following commands from issue #167 (RuntimeError: Ninja is required to load C++ extension):

wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
sudo unzip ninja-linux.zip -d /usr/local/bin/
sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force 

But I can't get sudo privileges on the cluster.
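
Without sudo, one option may be to find the `ninja` executable that pip already installed and put its directory on `PATH` for the current process. The sketch below uses only the standard library and assumes pip placed the script in either the interpreter's scripts directory or the per-user `bin` directory; the helper names are made up for illustration, and a cluster with a different layout may keep the script elsewhere.

```python
import os
import shutil
import site
import sysconfig


def candidate_script_dirs():
    """Directories where pip usually places console scripts such as `ninja`."""
    dirs = [sysconfig.get_path("scripts")]
    # `pip install --user` puts scripts under <user base>/bin on Linux and macOS.
    dirs.append(os.path.join(site.getuserbase(), "bin"))
    # Drop duplicates while keeping order.
    return list(dict.fromkeys(d for d in dirs if d))


def find_executable(name, extra_dirs):
    """Look for `name` on PATH first, then in `extra_dirs`; return a path or None."""
    found = shutil.which(name)
    if found is None:
        found = shutil.which(name, path=os.pathsep.join(extra_dirs))
    return found


def prepend_to_path(directory, env=None):
    """Prepend `directory` to PATH in `env` (defaults to os.environ) if missing."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        env["PATH"] = os.pathsep.join([directory] + parts)
    return env.get("PATH", "")


def ensure_on_path(name):
    """Make `name` reachable through PATH if it can be found; return its path or None."""
    exe = find_executable(name, candidate_script_dirs())
    if exe is not None:
        prepend_to_path(os.path.dirname(exe))
    return exe
```

Calling `ensure_on_path("ninja")` near the top of a launch script, before any extension is built, would let child processes inherit the updated `PATH`. The same effect can be had by exporting `PATH` in the shell once the directory is known.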

BTW, the environment works fine with the previous version of the code, so I guess this has something to do with forward compatibility.


FateScript commented Apr 18, 2022

@npurson after running pip install ninja in the CLI, ninja will be installed automatically (and ninja --version will also work).
If you would like the code to run as expected, just rerun pip install -v -e . in the YOLOX folder and see if everything works well.
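
Since the failure in this thread only surfaced at the first evaluation, after ten epochs of training, a pre-flight check before launching may be worthwhile. The sketch below runs `ninja --version` the same way the shell session above does and reports whether it works; `tool_version` and `require_tool` are hypothetical helper names, not part of YOLOX or PyTorch.

```python
import shutil
import subprocess


def tool_version(name, timeout=10):
    """Return the output of `<name> --version`, or None if it cannot be run."""
    exe = shutil.which(name)
    if exe is None:
        return None  # Same situation as "bash: ninja: command not found".
    try:
        result = subprocess.run(
            [exe, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.decode().strip()


def require_tool(name):
    """Raise early, before training starts, if `name` is not runnable."""
    version = tool_version(name)
    if version is None:
        raise RuntimeError(f"{name} is not runnable from this environment; fix PATH first")
    return version
```

Calling `require_tool("ninja")` at the start of a training script would turn a late evaluation crash into an immediate, readable error.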
