RecursionError: maximum recursion depth exceeded while calling a Python object while training recognition model #13785
-
This is the issue mentioned in various threads, but I cannot solve it.
(paddleocr) nazmuddoha_ansary@ML-DEV:/backup2/PaddleOCR$ python3 tools/train.py -c configs/rec/rec_svtrnet_bn_en.yml
[2024/08/30 04:26:04] ppocr INFO: Architecture :
[2024/08/30 04:26:04] ppocr INFO: Backbone :
[2024/08/30 04:26:04] ppocr INFO: depth : [3, 6, 3]
[2024/08/30 04:26:04] ppocr INFO: embed_dim : [64, 128, 256]
[2024/08/30 04:26:04] ppocr INFO: img_size : [32, 100]
[2024/08/30 04:26:04] ppocr INFO: last_stage : True
[2024/08/30 04:26:04] ppocr INFO: local_mixer : [[7, 11], [7, 11], [7, 11]]
[2024/08/30 04:26:04] ppocr INFO: mixer : ['Local', 'Local', 'Local', 'Local', 'Local', 'Local', 'Global', 'Global', 'Global', 'Global', 'Global', 'Global']
[2024/08/30 04:26:04] ppocr INFO: name : SVTRNet
[2024/08/30 04:26:04] ppocr INFO: num_heads : [2, 4, 8]
[2024/08/30 04:26:04] ppocr INFO: out_channels : 192
[2024/08/30 04:26:04] ppocr INFO: out_char_num : 25
[2024/08/30 04:26:04] ppocr INFO: patch_merging : Conv
[2024/08/30 04:26:04] ppocr INFO: prenorm : False
[2024/08/30 04:26:04] ppocr INFO: Head :
[2024/08/30 04:26:04] ppocr INFO: name : CTCHead
[2024/08/30 04:26:04] ppocr INFO: Neck :
[2024/08/30 04:26:04] ppocr INFO: encoder_type : reshape
[2024/08/30 04:26:04] ppocr INFO: name : SequenceEncoder
[2024/08/30 04:26:04] ppocr INFO: Transform :
[2024/08/30 04:26:04] ppocr INFO: name : STN_ON
[2024/08/30 04:26:04] ppocr INFO: num_control_points : 20
[2024/08/30 04:26:04] ppocr INFO: stn_activation : none
[2024/08/30 04:26:04] ppocr INFO: tps_inputsize : [32, 64]
[2024/08/30 04:26:04] ppocr INFO: tps_margins : [0.05, 0.05]
[2024/08/30 04:26:04] ppocr INFO: tps_outputsize : [32, 100]
[2024/08/30 04:26:04] ppocr INFO: algorithm : SVTR
[2024/08/30 04:26:04] ppocr INFO: model_type : rec
[2024/08/30 04:26:04] ppocr INFO: Eval :
[2024/08/30 04:26:04] ppocr INFO: dataset :
[2024/08/30 04:26:04] ppocr INFO: data_dir : /backup2/__archive__/data
[2024/08/30 04:26:04] ppocr INFO: label_file_list : ['/backup2/__archive__/data/val.txt']
[2024/08/30 04:26:04] ppocr INFO: name : SimpleDataSet
[2024/08/30 04:26:04] ppocr INFO: transforms :
[2024/08/30 04:26:04] ppocr INFO: DecodeImage :
[2024/08/30 04:26:04] ppocr INFO: channel_first : False
[2024/08/30 04:26:04] ppocr INFO: img_mode : BGR
[2024/08/30 04:26:04] ppocr INFO: RecResizeImg :
[2024/08/30 04:26:04] ppocr INFO: image_shape : [3, 48, 320]
[2024/08/30 04:26:04] ppocr INFO: KeepKeys :
[2024/08/30 04:26:04] ppocr INFO: keep_keys : ['image', 'label']
[2024/08/30 04:26:04] ppocr INFO: loader :
[2024/08/30 04:26:04] ppocr INFO: batch_size_per_card : 16
[2024/08/30 04:26:04] ppocr INFO: drop_last : False
[2024/08/30 04:26:04] ppocr INFO: num_workers : 2
[2024/08/30 04:26:04] ppocr INFO: shuffle : False
[2024/08/30 04:26:04] ppocr INFO: Global :
[2024/08/30 04:26:04] ppocr INFO: cal_metric_during_train : True
[2024/08/30 04:26:04] ppocr INFO: character_dict_path : /backup2/__archive__/data/mixed_dict.txt
[2024/08/30 04:26:04] ppocr INFO: checkpoints : None
[2024/08/30 04:26:04] ppocr INFO: d2s_train_image_shape : [3, 64, 256]
[2024/08/30 04:26:04] ppocr INFO: distributed : False
[2024/08/30 04:26:04] ppocr INFO: epoch_num : 20
[2024/08/30 04:26:04] ppocr INFO: eval_batch_step : [0, 5]
[2024/08/30 04:26:04] ppocr INFO: infer_img : doc/imgs_words_en/word_10.png
[2024/08/30 04:26:04] ppocr INFO: infer_mode : False
[2024/08/30 04:26:04] ppocr INFO: log_smooth_window : 20
[2024/08/30 04:26:04] ppocr INFO: max_text_length : 40
[2024/08/30 04:26:04] ppocr INFO: pretrained_model : ./rec_svtr_tiny_none_ctc_en_train/best_accuracy
[2024/08/30 04:26:04] ppocr INFO: print_batch_step : 10
[2024/08/30 04:26:04] ppocr INFO: save_epoch_step : 1
[2024/08/30 04:26:04] ppocr INFO: save_inference_dir : None
[2024/08/30 04:26:04] ppocr INFO: save_model_dir : ./output/rec/svtr/
[2024/08/30 04:26:04] ppocr INFO: save_res_path : ./output/rec/predicts_svtr_tiny.txt
[2024/08/30 04:26:04] ppocr INFO: use_gpu : True
[2024/08/30 04:26:04] ppocr INFO: use_space_char : False
[2024/08/30 04:26:04] ppocr INFO: use_visualdl : False
[2024/08/30 04:26:04] ppocr INFO: Loss :
[2024/08/30 04:26:04] ppocr INFO: name : CTCLoss
[2024/08/30 04:26:04] ppocr INFO: Metric :
[2024/08/30 04:26:04] ppocr INFO: main_indicator : acc
[2024/08/30 04:26:04] ppocr INFO: name : RecMetric
[2024/08/30 04:26:04] ppocr INFO: Optimizer :
[2024/08/30 04:26:04] ppocr INFO: beta1 : 0.9
[2024/08/30 04:26:04] ppocr INFO: beta2 : 0.99
[2024/08/30 04:26:04] ppocr INFO: epsilon : 1e-08
[2024/08/30 04:26:04] ppocr INFO: lr :
[2024/08/30 04:26:04] ppocr INFO: learning_rate : 0.0005
[2024/08/30 04:26:04] ppocr INFO: name : Cosine
[2024/08/30 04:26:04] ppocr INFO: warmup_epoch : 2
[2024/08/30 04:26:04] ppocr INFO: name : AdamW
[2024/08/30 04:26:04] ppocr INFO: no_weight_decay_name : norm pos_embed
[2024/08/30 04:26:04] ppocr INFO: one_dim_param_no_weight_decay : True
[2024/08/30 04:26:04] ppocr INFO: weight_decay : 0.05
[2024/08/30 04:26:04] ppocr INFO: PostProcess :
[2024/08/30 04:26:04] ppocr INFO: name : CTCLabelDecode
[2024/08/30 04:26:04] ppocr INFO: Train :
[2024/08/30 04:26:04] ppocr INFO: dataset :
[2024/08/30 04:26:04] ppocr INFO: data_dir : /backup2/__archive__/data
[2024/08/30 04:26:04] ppocr INFO: label_file_list : ['/backup2/__archive__/data/train.txt']
[2024/08/30 04:26:04] ppocr INFO: name : SimpleDataSet
[2024/08/30 04:26:04] ppocr INFO: transforms :
[2024/08/30 04:26:04] ppocr INFO: DecodeImage :
[2024/08/30 04:26:04] ppocr INFO: channel_first : False
[2024/08/30 04:26:04] ppocr INFO: img_mode : BGR
[2024/08/30 04:26:04] ppocr INFO: RecAug : None
[2024/08/30 04:26:04] ppocr INFO: CTCLabelEncode : None
[2024/08/30 04:26:04] ppocr INFO: AttnLabelEncode : None
[2024/08/30 04:26:04] ppocr INFO: RecResizeImg :
[2024/08/30 04:26:04] ppocr INFO: image_shape : [3, 48, 320]
[2024/08/30 04:26:04] ppocr INFO: KeepKeys :
[2024/08/30 04:26:04] ppocr INFO: keep_keys : ['image', 'label', 'length']
[2024/08/30 04:26:04] ppocr INFO: loader :
[2024/08/30 04:26:04] ppocr INFO: batch_size_per_card : 16
[2024/08/30 04:26:04] ppocr INFO: drop_last : True
[2024/08/30 04:26:04] ppocr INFO: num_workers : 2
[2024/08/30 04:26:04] ppocr INFO: shuffle : True
[2024/08/30 04:26:04] ppocr INFO: profiler_options : None
[2024/08/30 04:26:04] ppocr INFO: train with paddle 2.6.1 and device Place(gpu:0)
[2024/08/30 04:26:04] ppocr INFO: Initialize indexs of datasets:['/backup2/__archive__/data/train.txt']
[2024/08/30 04:26:04] ppocr INFO: Initialize indexs of datasets:['/backup2/__archive__/data/val.txt']
W0830 04:26:04.211480 125000 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.2, Runtime API Version: 11.8
W0830 04:26:04.212136 125000 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
[2024/08/30 04:26:04] ppocr INFO: train dataloader has 31 iters
[2024/08/30 04:26:04] ppocr INFO: valid dataloader has 4 iters
[2024/08/30 04:26:04] ppocr WARNING: The shape of model params head.fc.weight [192, 168] not matched with loaded params head.fc.weight [192, 37] !
[2024/08/30 04:26:04] ppocr WARNING: The shape of model params head.fc.bias [168] not matched with loaded params head.fc.bias [37] !
[2024/08/30 04:26:04] ppocr INFO: load pretrain successful from ./rec_svtr_tiny_none_ctc_en_train/best_accuracy
[2024/08/30 04:26:04] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 5 iterations
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/dataloader_iter.py", line 603, in _thread_loop
batch = self._get_data()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/dataloader_iter.py", line 752, in _get_data
batch.reraise()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/worker.py", line 187, in reraise
raise self.exc_type(msg)
RecursionError: DataLoader worker(0) caught RecursionError with message:
Traceback (most recent call last):
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 136, in __getitem__
outs = transform(data, self.ops)
File "/backup2/PaddleOCR/ppocr/data/imaug/__init__.py", line 72, in transform
data = op(data)
File "/backup2/PaddleOCR/ppocr/data/imaug/rec_img_aug.py", line 58, in __call__
img = tia_distort(img, random.randint(3, 6))
File "/backup2/PaddleOCR/ppocr/data/imaug/text_image_aug/augment.py", line 63, in tia_distort
dst = trans.generate()
File "/backup2/PaddleOCR/ppocr/data/imaug/text_image_aug/warp_mls.py", line 41, in generate
return self.gen_img()
File "/backup2/PaddleOCR/ppocr/data/imaug/text_image_aug/warp_mls.py", line 162, in gen_img
nx = np.clip(nx, 0, src_w - 1)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2169, in clip
return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
return bound(*args, **kwds)
RecursionError: maximum recursion depth exceeded
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/worker.py", line 372, in _worker_loop
batch = fetcher.fetch(indices)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/fetcher.py", line 77, in fetch
data.append(self.dataset[idx])
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 151, in __getitem__
return self.__getitem__(rnd_idx)
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 151, in __getitem__
return self.__getitem__(rnd_idx)
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 151, in __getitem__
return self.__getitem__(rnd_idx)
[Previous line repeated 969 more times]
File "/backup2/PaddleOCR/ppocr/data/simple_dataset.py", line 140, in __getitem__
data_line, traceback.format_exc()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 167, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 120, in format_exception
return list(TracebackException(
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/exceptiongroup/_formatting.py", line 96, in __init__
self.stack = traceback.StackSummary.extract(
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 366, in extract
f.line
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/traceback.py", line 288, in line
self._line = linecache.getline(self.filename, self.lineno).strip()
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/linecache.py", line 30, in getline
lines = getlines(filename, module_globals)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/linecache.py", line 46, in getlines
return updatecache(filename, module_globals)
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/linecache.py", line 86, in updatecache
if len(cache[filename]) != 1:
RecursionError: maximum recursion depth exceeded while calling a Python object
Traceback (most recent call last):
File "/backup2/PaddleOCR/tools/train.py", line 255, in <module>
main(config, device, logger, vdl_writer, seed)
File "/backup2/PaddleOCR/tools/train.py", line 208, in main
program.train(
File "/backup2/PaddleOCR/tools/program.py", line 305, in train
for idx, batch in enumerate(train_dataloader):
File "/home/nazmuddoha_ansary/anaconda3/envs/paddleocr/lib/python3.9/site-packages/paddle/io/dataloader/dataloader_iter.py", line 826, in __next__
self._reader.read_next_list()[0]
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:175)
data.zip
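For context on the nested tracebacks above: the inner error comes from SimpleDataSet.__getitem__, which retries a random index by calling itself recursively whenever a transform fails (`return self.__getitem__(rnd_idx)`, repeated 969 times in the trace). If every sample fails, as happens when a label-encode transform rejects all of them, the recursion exhausts Python's stack. The toy classes below (my own sketch, not PaddleOCR's actual code) reproduce that failure mode and contrast it with an iterative retry loop that fails fast with a readable error:

```python
import random

class RecursiveRetryDataset:
    """Toy model of SimpleDataSet.__getitem__: on a bad sample it
    recursively retries a random index. If every sample is bad,
    the recursion never terminates and Python raises RecursionError."""
    def __init__(self, n, bad):
        self.n, self.bad = n, bad  # 'bad' = indices whose transforms fail

    def __getitem__(self, idx):
        if idx in self.bad:
            rnd_idx = random.randint(0, self.n - 1)
            return self.__getitem__(rnd_idx)  # unbounded recursion
        return idx

class IterativeRetryDataset(RecursiveRetryDataset):
    """Same retry policy, but with a loop and a retry cap, so a
    fully broken pipeline raises a clear error instead of
    exhausting the stack."""
    def __getitem__(self, idx, max_retries=100):
        for _ in range(max_retries):
            if idx not in self.bad:
                return idx
            idx = random.randint(0, self.n - 1)
        raise RuntimeError("all retried samples failed; check transforms")

# Every sample is bad: the recursive version overflows the stack,
# the iterative version raises a readable error instead.
try:
    RecursiveRetryDataset(4, bad={0, 1, 2, 3})[0]
except RecursionError:
    print("RecursionError, as in the report")

try:
    IterativeRetryDataset(4, bad={0, 1, 2, 3})[0]
except RuntimeError as e:
    print("RuntimeError:", e)
```

The point is that the RecursionError is a symptom: the real problem is whatever makes every sample fail its transforms in the first place.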
Replies: 3 comments
-
Thanks for the feedback, I will try to reproduce the issue. |
-
There's something wrong with your configuration file. The `Train` section has both `AttnLabelEncode` and `CTCLabelEncode`, but there is no label encoder at all in the `Eval` section. Here is a corrected config:
Global:
use_gpu: True
use_space_char: False
character_dict_path: ./train_data/data/mixed_dict.txt
epoch_num: 20
log_smooth_window: 20
print_batch_step: 10
save_model_dir: ./output/rec/svtr/
save_epoch_step: 1
# evaluation is run every 5 iterations after the 0th iteration
eval_batch_step: [0, 5]
cal_metric_during_train: True
pretrained_model:
checkpoints:
save_inference_dir:
use_visualdl: False
infer_img: doc/imgs_words_en/word_10.png
max_text_length: 40
infer_mode: False
save_res_path: ./output/rec/predicts_svtr_tiny.txt
d2s_train_image_shape: [3, 64, 256]
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.99
epsilon: 1.e-8
weight_decay: 0.05
no_weight_decay_name: norm pos_embed
one_dim_param_no_weight_decay: True
lr:
name: Cosine
learning_rate: 0.0005
warmup_epoch: 2
Architecture:
model_type: rec
algorithm: SVTR
Transform:
name: STN_ON
tps_inputsize: [32, 64]
tps_outputsize: [32, 100]
num_control_points: 20
tps_margins: [0.05,0.05]
stn_activation: none
Backbone:
name: SVTRNet
img_size: [32, 100]
out_char_num: 25 # W//4 or W//8 or W//12
out_channels: 192
patch_merging: 'Conv'
embed_dim: [64, 128, 256]
depth: [3, 6, 3]
num_heads: [2, 4, 8]
mixer: ['Local','Local','Local','Local','Local','Local','Global','Global','Global','Global','Global','Global']
local_mixer: [[7, 11], [7, 11], [7, 11]]
last_stage: True
prenorm: False
Neck:
name: SequenceEncoder
encoder_type: reshape
Head:
name: CTCHead
Loss:
name: CTCLoss
PostProcess:
name: CTCLabelDecode
Metric:
name: RecMetric
main_indicator: acc
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/data
label_file_list: ["train_data/data/train.txt"]
transforms:
- DecodeImage:
img_mode: BGR
channel_first: False
- RecAug: null
- CTCLabelEncode: null
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys: [image, label, length]
loader:
shuffle: True
drop_last: True
batch_size_per_card: 16
num_workers: 1
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/data
label_file_list: ["./train_data/data/val.txt"]
transforms:
- DecodeImage:
img_mode: BGR
channel_first: False
- CTCLabelEncode: null
- RecResizeImg:
image_shape: [3, 48, 320]
- KeepKeys:
keep_keys: [image, label]
loader:
shuffle: False
drop_last: False
batch_size_per_card: 16
num_workers: 1
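A brief note on why mixing the two encoders breaks every sample (this is a hedged toy sketch with hypothetical function names, not PaddleOCR's actual classes): each label encoder replaces the text label in the sample dict with its own integer encoding, so when they are chained, the second encoder receives an already-encoded label instead of a string, fails, and returns None. Returning None is the signal the dataset treats as "drop this sample and retry", which is what drives the endless retries seen in the traceback:

```python
# Toy label encoders: the first converts the text label to id lists,
# the second expects a string label and gives up on anything else.
def ctc_label_encode(data, char_to_id):
    text = data["label"]
    if not isinstance(text, str):
        return None  # already encoded or otherwise unusable
    data["label"] = [char_to_id[c] for c in text if c in char_to_id]
    return data

def attn_label_encode(data, char_to_id):
    text = data["label"]
    if not isinstance(text, str):
        return None  # receives the id list produced above -> sample dropped
    # hypothetical start/stop tokens 0 and 1 around the character ids
    data["label"] = [0] + [char_to_id[c] for c in text] + [1]
    return data

char_to_id = {c: i + 2 for i, c in enumerate("abc")}
data = {"label": "abc"}
data = ctc_label_encode(data, char_to_id)   # ok: label is now [2, 3, 4]
data = attn_label_encode(data, char_to_id)  # gets a list, not a string
print(data)  # None -> every sample is dropped, triggering a retry
```

That is why removing `AttnLabelEncode` from the `Train` transforms (and adding `CTCLabelEncode` to `Eval`, since CTCLoss/CTCLabelDecode need CTC-encoded labels there too) fixes the training run.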
-
Thank you for the quick fix.