RecursionError: maximum recursion depth exceeded while calling a Python object while training recognition model #13785
-
This is the issue mentioned in various threads, but I cannot solve it.
(paddleocr) nazmuddoha_ansary@ML-DEV:/backup2/PaddleOCR$ python3 tools/train.py -c configs/rec/rec_svtrnet_bn_en.yml
[2024/08/30 04:26:04] ppocr INFO: Architecture :
[2024/08/30 04:26:04] ppocr INFO: Backbone :
[2024/08/30 04:26:04] ppocr INFO: depth : [3, 6, 3]
[2024/08/30 04:26:04] ppocr INFO: embed_dim : [64, 128, 256]
[2024/08/30 04:26:04] ppocr INFO: img_size : [32, 100]
[2024/08/30 04:26:04] ppocr INFO: last_stage : True
[2024/08/30 04:26:04] ppocr INFO: local_mixer : [[7, 11], [7, 11], [7, 11]]
[2024/08/30 04:26:04] ppocr INFO: mixer : ['Local', 'Local', 'Local', 'Local', 'Local', 'Local', 'Global', 'Global', 'Global', 'Global', 'Global', 'Global']
[2024/08/30 04:26:04] ppocr INFO: name : SVTRNet
[2024/08/30 04:26:04] ppocr INFO: num_heads : [2, 4, 8]
[2024/08/30 04:26:04] ppocr INFO: out_channels : 192
[2024/08/30 04:26:04] ppocr INFO: out_char_num : 25
[2024/08/30 04:26:04] ppocr INFO: patch_merging : Conv
[2024/08/30 04:26:04] ppocr INFO: prenorm : False
[2024/08/30 04:26:04] ppocr INFO: Head :
[2024/08/30 04:26:04] ppocr INFO: name : CTCHead
[2024/08/30 04:26:04] ppocr INFO: Neck :
[2024/08/30 04:26:04] ppocr INFO: encoder_type : reshape
[2024/08/30 04:26:04] ppocr INFO: name : SequenceEncoder
[2024/08/30 04:26:04] ppocr INFO: Transform :
[2024/08/30 04:26:04] ppocr INFO: name : STN_ON
[2024/08/30 04:26:04] ppocr INFO: num_control_points : 20
[2024/08/30 04:26:04] ppocr INFO: stn_activation : none
[2024/08/30 04:26:04] ppocr INFO: tps_inputsize : [32, 64]
[2024/08/30 04:26:04] ppocr INFO: tps_margins : [0.05, 0.05]
[2024/08/30 04:26:04] ppocr INFO: tps_outputsize : [32, 100]
[2024/08/30 04:26:04] ppocr INFO: algorithm : SVTR
[2024/08/30 04:26:04] ppocr INFO: model_type : rec
[2024/08/30 04:26:04] ppocr INFO: Eval :
[2024/08/30 04:26:04] ppocr INFO: dataset :
[2024/08/30 04:26:04] ppocr INFO: data_dir : /backup2/__archive__/data
[2024/08/30 04:26:04] ppocr INFO: label_file_list : ['/backup2/__archive__/data/val.txt']
[2024/08/30 04:26:04] ppocr INFO: name : SimpleDataSet
[2024/08/30 04:26:04] ppocr INFO: transforms :
[2024/08/30 04:26:04] ppocr INFO: DecodeImage :
[2024/08/30 04:26:04] ppocr INFO: channel_first : False
[2024/08/30 04:26:04] ppocr INFO: img_mode : BGR
[2024/08/30 04:26:04] ppocr INFO: RecResizeImg :
[2024/08/30 04:26:04] ppocr INFO: image_shape : [3, 48, 320]
[2024/08/30 04:26:04] ppocr INFO: KeepKeys :
[2024/08/30 04:26:04] ppocr INFO: keep_keys : ['image', 'label']
[2024/08/30 04:26:04] ppocr INFO: loader :
[2024/08/30 04:26:04] ppocr INFO: batch_size_per_card : 16
[2024/08/30 04:26:04] ppocr INFO: drop_last : False
[2024/08/30 04:26:04] ppocr INFO: num_workers : 2
[2024/08/30 04:26:04] ppocr INFO: shuffle : False
[2024/08/30 04:26:04] ppocr INFO: Global :
[2024/08/30 04:26:04] ppocr INFO: cal_metric_during_train : True
[2024/08/30 04:26:04] ppocr INFO: character_dict_path : /backup2/__archive__/data/mixed_dict.txt
[2024/08/30 04:26:04] ppocr INFO: checkpoints : None
[2024/08/30 04:26:04] ppocr INFO: d2s_train_image_shape : [3, 64, 256]
[2024/08/30 04:26:04] ppocr INFO: distributed : False
[2024/08/30 04:26:04] ppocr INFO: epoch_num : 20
[2024/08/30 04:26:04] ppocr INFO: eval_batch_step : [0, 5]
[2024/08/30 04:26:04] ppocr INFO: infer_img : doc/imgs_words_en/word_10.png
[2024/08/30 04:26:04] ppocr INFO: infer_mode : False
[2024/08/30 04:26:04] ppocr INFO: log_smooth_window : 20
[2024/08/30 04:26:04] ppocr INFO: max_text_length : 40
[2024/08/30 04:26:04] ppocr INFO: pretrained_model : ./rec_svtr_tiny_none_ctc_en_train/best_accuracy
[2024/08/30 04:26:04] ppocr INFO: print_batch_step : 10
[2024/08/30 04:26:04] ppocr INFO: save_epoch_step : 1
[2024/08/30 04:26:04] ppocr INFO: save_inference_dir : None
[2024/08/30 04:26:04] ppocr INFO: save_model_dir : ./output/rec/svtr/
[2024/08/30 04:26:04] ppocr INFO: save_res_path : ./output/rec/predicts_svtr_tiny.txt
[2024/08/30 04:26:04] ppocr INFO: use_gpu : True
[2024/08/30 04:26:04] ppocr INFO: use_space_char : False
[2024/08/30 04:26:04] ppocr INFO: use_visualdl : False
[2024/08/30 04:26:04] ppocr INFO: Loss :
[2024/08/30 04:26:04] ppocr INFO: name : CTCLoss
[2024/08/30 04:26:04] ppocr INFO: Metric :
[2024/08/30 04:26:04] ppocr INFO: main_indicator : acc
[2024/08/30 04:26:04] ppocr INFO: name : RecMetric
[2024/08/30 04:26:04] ppocr INFO: Optimizer :
[2024/08/30 04:26:04] ppocr INFO: beta1 : 0.9
[2024/08/30 04:26:04] ppocr INFO: beta2 : 0.99
[2024/08/30 04:26:04] ppocr INFO: epsilon : 1e-08
[2024/08/30 04:26:04] ppocr INFO: lr :
[2024/08/30 04:26:04] ppocr INFO: learning_rate : 0.0005
[2024/08/30 04:26:04] ppocr INFO: name : Cosine
[2024/08/30 04:26:04] ppocr INFO: warmup_epoch : 2
[2024/08/30 04:26:04] ppocr INFO: name : AdamW
[2024/08/30 04:26:04] ppocr INFO: no_weight_decay_name : norm pos_embed
[2024/08/30 04:26:04] ppocr INFO: one_dim_param_no_weight_decay : True
[2024/08/30 04:26:04] ppocr INFO: weight_decay : 0.05
[2024/08/30 04:26:04] ppocr INFO: PostProcess :
[2024/08/30 04:26:04] ppocr INFO: name : CTCLabelDecode
[2024/08/30 04:26:04] ppocr INFO: Train :
[2024/08/30 04:26:04] ppocr INFO: dataset :
[2024/08/30 04:26:04] ppocr INFO: data_dir : /backup2/__archive__/data
[2024/08/30 04:26:04] ppocr INFO: label_file_list : ['/backup2/__archive__/data/train.txt']
[2024/08/30 04:26:04] ppocr INFO: name : SimpleDataSet
[2024/08/30 04:26:04] ppocr INFO: transforms :
[2024/08/30 04:26:04] ppocr INFO: DecodeImage :
[2024/08/30 04:26:04] ppocr INFO: channel_first : False
[2024/08/30 04:26:04] ppocr INFO: img_mode : BGR
[2024/08/30 04:26:04] ppocr INFO: RecAug : None
[2024/08/30 04:26:04] ppocr INFO: CTCLabelEncode : None
[2024/08/30 04:26:04] ppocr INFO: AttnLabelEncode : None
[2024/08/30 04:26:04] ppocr INFO: RecResizeImg :
[2024/08/30 04:26:04] ppocr INFO: image_shape : [3, 48, 320]
[2024/08/30 04:26:04] ppocr INFO: KeepKeys :
[2024/08/30 04:26:04] ppocr INFO: keep_keys : ['image', 'label', 'length']
[2024/08/30 04:26:04] ppocr INFO: loader :
[2024/08/30 04:26:04] ppocr INFO: batch_size_per_card : 16
[2024/08/30 04:26:04] ppocr INFO: drop_last : True
[2024/08/30 04:26:04] ppocr INFO: num_workers : 2
[2024/08/30 04:26:04] ppocr INFO: shuffle : True
[2024/08/30 04:26:04] ppocr INFO: profiler_options : None
[2024/08/30 04:26:04] ppocr INFO: train with paddle 2.6.1 and device Place(gpu:0)
[2024/08/30 04:26:04] ppocr INFO: Initialize indexs of datasets:['/backup2/__archive__/data/train.txt']
[2024/08/30 04:26:04] ppocr INFO: Initialize indexs of datasets:['/backup2/__archive__/data/val.txt']
W0830 04:26:04.211480 125000 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.2, Runtime API Version: 11.8
W0830 04:26:04.212136 125000 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
[2024/08/30 04:26:04] ppocr INFO: train dataloader has 31 iters
[2024/08/30 04:26:04] ppocr INFO: valid dataloader has 4 iters
[2024/08/30 04:26:04] ppocr WARNING: The shape of model params head.fc.weight [192, 168] not matched with loaded params head.fc.weight [192, 37] !
[2024/08/30 04:26:04] ppocr WARNING: The shape of model params head.fc.bias [168] not matched with loaded params head.fc.bias [37] !
[2024/08/30 04:26:04] ppocr INFO: load pretrain successful from ./rec_svtr_tiny_none_ctc_en_train/best_accuracy
[2024/08/30 04:26:04] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 5 iterations
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/dataloader_iter.py", line 603, in _thread_loop
batch = self._get_data()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
batch.reraise()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/worker.py", line 187, in reraise
raise self.exc_type(msg)
RecursionError: DataLoader worker(0) caught RecursionError with message:
Traceback (most recent call last):
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 136, in __getitem__
outs = transform(data, self.ops)
File "/backup2/PaddleOCR/ppocr/data/imaug/__init__.py", line 72, in transform
data = op(data)
File "/backup2/PaddleOCR/ppocr/data/imaug/rec_img_aug.py", line 58, in __call__
img = tia_distort(img, random.randint(3, 6))
File "/backup2/PaddleOCR/ppocr/data/imaug/text_image_aug/augment.py", line 63, in tia_distort
dst = trans.generate()
File "/backup2/PaddleOCR/ppocr/data/imaug/text_image_aug/warp_mls.py", line 41, in generate
return self.gen_img()
File "/backup2/PaddleOCR/ppocr/data/imaug/text_image_aug/warp_mls.py", line 162, in gen_img
nx = np.clip(nx, 0, src_w - 1)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2169, in clip
return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
return bound(*args, **kwds)
RecursionError: maximum recursion depth exceeded
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/worker.py", line 372, in _worker_loop
batch = fetcher.fetch(indices)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/fetcher.py", line 77, in fetch
data.append(self.dataset[idx])
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 151, in __getitem__
return self.__getitem__(rnd_idx)
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 151, in __getitem__
return self.__getitem__(rnd_idx)
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 151, in __getitem__
return self.__getitem__(rnd_idx)
[Previous line repeated 969 more times]
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 140, in __getitem__
data_line, traceback.format_exc()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 167, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 120, in format_exception
return list(TracebackException(
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/exceptiongroup/_formatting.py", line 96, in __init__
self.stack = traceback.StackSummary.extract(
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 366, in extract
f.line
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 288, in line
self._line = linecache.getline(self.filename, self.lineno).strip()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/linecache.py", line 30, in getline
lines = getlines(filename, module_globals)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/linecache.py", line 46, in getlines
return updatecache(filename, module_globals)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/linecache.py", line 86, in updatecache
if len(cache[filename]) != 1:
RecursionError: maximum recursion depth exceeded while calling a Python object
Traceback (most recent call last):
File "/backup2/PaddleOCR/tools/train.py", line 255, in <module>
main(config, device, logger, vdl_writer, seed)
File "/backup2/PaddleOCR/tools/train.py", line 208, in main
program.train(
File "/backup2/PaddleOCR/tools/program.py", line 305, in train
for idx, batch in enumerate(train_dataloader):
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
data.zip
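For context on the nested tracebacks above: the inner error comes from SimpleDataSet.__getitem__, which retries a random index by calling itself recursively whenever a transform fails (`return self.__getitem__(rnd_idx)`, repeated 969 times in the trace). If every sample fails, as happens when a label-encode transform rejects all of them, the recursion exhausts Python's stack. The toy classes below (my own sketch, not PaddleOCR's actual code) reproduce that failure mode and contrast it with an iterative retry loop that fails fast with a readable error:

```python
import random

class RecursiveRetryDataset:
    """Toy model of SimpleDataSet.__getitem__: on a bad sample it
    recursively retries a random index. If every sample is bad,
    the recursion never terminates and Python raises RecursionError."""
    def __init__(self, n, bad):
        self.n, self.bad = n, bad  # 'bad' = indices whose transforms fail

    def __getitem__(self, idx):
        if idx in self.bad:
            rnd_idx = random.randint(0, self.n - 1)
            return self.__getitem__(rnd_idx)  # unbounded recursion
        return idx

class IterativeRetryDataset(RecursiveRetryDataset):
    """Same retry policy, but with a loop and a retry cap, so a
    fully broken pipeline raises a clear error instead of
    exhausting the stack."""
    def __getitem__(self, idx, max_retries=100):
        for _ in range(max_retries):
            if idx not in self.bad:
                return idx
            idx = random.randint(0, self.n - 1)
        raise RuntimeError("all retried samples failed; check transforms")

# Every sample is bad: the recursive version overflows the stack,
# the iterative version raises a readable error instead.
try:
    RecursiveRetryDataset(4, bad={0, 1, 2, 3})[0]
except RecursionError:
    print("RecursionError, as in the report")

try:
    IterativeRetryDataset(4, bad={0, 1, 2, 3})[0]
except RuntimeError as e:
    print("RuntimeError:", e)
```

The point is that the RecursionError is a symptom: the real problem is whatever makes every sample fail its transforms in the first place.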
Replies: 3 comments
-
Thanks for the feedback, I will try to reproduce the issue. |
-
There's something wrong with your configuration file. The `Train` section has both `AttnLabelEncode` and `CTCLabelEncode`, but there is no label encoder at all in the `Eval` section. Here is a corrected config:
Global:
use_gpu: True
use_space_char: False
character_dict_path: ./train_data/data/mixed_dict.txt
epoch_num: 20
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec/svtr/
save_epoch_step: 1
# evaluation is run every 5 iterations after the 0th iteration
eval_batch_step: [0, 5]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
max_text_length: 40
infer_mode: False
save_res_path: ./output/rec/predicts_svtr_tiny.txt
d2s_train_image_shape: [3, 64, 256]
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.99
epsilon: 1.e-8
weight_decay: 0.05
no_weight_decay_name: norm pos_embed
one_dim_param_no_weight_decay: True
lr:
name: Cosine
learning_rate: 0.0005
warmup_epoch: 2
Architecture:
model_type: rec
algorithm: SVTR
Transform:
name: STN_ON
tps_inputsize: [32, 64]
tps_outputsize: [32, 100]
num_control_points: 20
tps_margins: [0.05,0.05]
stn_activation: none
Backbone:
name: SVTRNet
img_size: [32, 100]
out_char_num: 25 # W//4 or W//8 or W//12
out_channels: 192
patch_merging: 'Conv'
embed_dim: [64, 128, 256]
depth: [3, 6, 3]
num_heads: [2, 4, 8]
mixer: ['Local','Local','Local','Local','Local','Local','Global','Global','Global','Global','Global','Global']
local_mixer: [[7, 11], [7, 11], [7, 11]]
last_stage: True
prenorm: False
Neck:
name: SequenceEncoder
encoder_type: reshape
Head:
name: CTCHead
Loss:
name: CTCLoss
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/data
label_file_list: ["train_data/data/train.txt"]
transforms:
- DecodeImage:
img_mode: BGR
channel_first: False
- RecAug: null
- CTCLabelEncode: null
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys: [image, label, length]
loader:
shuffle: True
drop_last: True
batch_size_per_card: 16
num_workers: 1
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/data
label_file_list: ["./train_data/data/val.txt"]
transforms:
- DecodeImage:
img_mode: BGR
channel_first: False
- CTCLabelEncode: null
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys: [image, label]
loader:
shuffle: False
drop_last: False
batch_size_per_card: 16
num_workers: 1
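A brief note on why mixing the two encoders breaks every sample (this is a hedged toy sketch with hypothetical function names, not PaddleOCR's actual classes): each label encoder replaces the text label in the sample dict with its own integer encoding, so when they are chained, the second encoder receives an already-encoded label instead of a string, fails, and returns None. Returning None is the signal the dataset treats as "drop this sample and retry", which is what drives the endless retries seen in the traceback:

```python
# Toy label encoders: the first converts the text label to id lists,
# the second expects a string label and gives up on anything else.
def ctc_label_encode(data, char_to_id):
    text = data["label"]
    if not isinstance(text, str):
        return None  # already encoded or otherwise unusable
    data["label"] = [char_to_id[c] for c in text if c in char_to_id]
    return data

def attn_label_encode(data, char_to_id):
    text = data["label"]
    if not isinstance(text, str):
        return None  # receives the id list produced above -> sample dropped
    # hypothetical start/stop tokens 0 and 1 around the character ids
    data["label"] = [0] + [char_to_id[c] for c in text] + [1]
    return data

char_to_id = {c: i + 2 for i, c in enumerate("abc")}
data = {"label": "abc"}
data = ctc_label_encode(data, char_to_id)   # ok: label is now [2, 3, 4]
data = attn_label_encode(data, char_to_id)  # gets a list, not a string
print(data)  # None -> every sample is dropped, triggering a retry
```

That is why removing `AttnLabelEncode` from the `Train` transforms (and adding `CTCLabelEncode` to `Eval`, since CTCLoss/CTCLabelDecode need CTC-encoded labels there too) fixes the training run.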
-
Thank you for the quick fix.