Question about the DAML model #22
Comments
--train : --error Please tell me where the error is. |
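@yinzhiqiangluvlzx Hi, I just tested it and it works without problems. My training command:
python3 main.py train --model=DAML --num_fea=2 --batch_size=16
The test command:
python3 main.py test --model=DAML --num_fea=2 --batch_size=16 --pth_path='./checkpoints/DAML_Digital_Music_data_default.pth'
Judging from the error message, some parameters on your side were probably left unchanged, so the training and test configurations do not match. |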
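OK, thanks for the reply. After fixing batch_size it runs. A single 2080 Ti (11 GB) is too slow, so I would like to train on three cards together, but saving the model fails. How should this be solved? I have also raised this question in another DAML issue.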
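On December 23, 2020 at 14:33, HT Liu <notifications@github.com> wrote:
@yinzhiqiangluvlzx Hi, I just tested it and it works without problems. My training command:
python3 main.py train --model=DAML --num_fea=2 --batch_size=16
The test command:
python3 main.py test --model=DAML --num_fea=2 --batch_size=16 --pth_path='./checkpoints/DAML_Digital_Music_data_default.pth'
Judging from the error message, some parameters on your side were probably left unchanged, so the training and test configurations do not match.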
|
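@FKCHAN As already mentioned in that issue, saving a multi-GPU model differs slightly from the single-GPU case: https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-torch-nn-dataparallel-models The plan is to wrap the model with pytorch-lightning later, for better and simpler support of parallel training. This is expected before the Spring Festival. |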
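For reference, the linked tutorial saves a `torch.nn.DataParallel` model through `model.module.state_dict()`, so the stored keys carry no `module.` prefix. If a checkpoint was already saved from the wrapped model, one common workaround is to strip that prefix before loading. Below is a minimal sketch of that step on plain dictionaries. It is not code from this repository; the helper name `strip_module_prefix` and the key `user_net.weight` are hypothetical.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only strip the prefix when it is actually present at the start of the key.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a checkpoint saved from a DataParallel-wrapped model.
# The values would be tensors in practice; strings are used here for illustration.
wrapped = OrderedDict([
    ("module.predict_net.model.fm_V", "tensor_a"),
    ("module.user_net.weight", "tensor_b"),  # hypothetical key
])
unwrapped = strip_module_prefix(wrapped)
print(list(unwrapped))
```

With PyTorch, the cleaned dictionary could then be passed to `load_state_dict` on the unwrapped model.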
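OK, then for now I will train and test in turn. Looking forward to it. Keep up the good work!
On December 23, 2020 at 14:41, HT Liu <notifications@github.com> wrote:
@FKCHAN As already mentioned in that issue, saving a multi-GPU model differs slightly from the single-GPU case: https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-torch-nn-dataparallel-models
The plan is to wrap the model with pytorch-lightning later, for better and simpler support of parallel training. This is expected before the Spring Festival.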
|
Training the DAML model works fine, but loading the checkpoint at test time raises an error:
Traceback (most recent call last):
  File "", line 1, in <module>
  File "G:\yzq\pycharm\PyCharm 2019.1.2\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "G:\yzq\pycharm\PyCharm 2019.1.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "G:/yzq/Rec/Neu-Review-Rec/main.py", line 210, in <module>
    fire.Fire()
  File "G:\yzq\anaconda3\envs\pytorch\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "G:\yzq\anaconda3\envs\pytorch\lib\site-packages\fire\core.py", line 468, in _Fire
    target=component.__name__)
  File "G:\yzq\anaconda3\envs\pytorch\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "G:/yzq/Rec/Neu-Review-Rec/main.py", line 155, in test
    model.load(opt.pth_path)
  File "G:\yzq\Rec\Neu-Review-Rec\framework\models.py", line 49, in load
    self.load_state_dict(torch.load(path),False)
  File "G:\yzq\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Model:
	size mismatch for predict_net.model.fm_V: copying a param with shape torch.Size([16, 128]) from checkpoint, the shape in current model is torch.Size([128, 10]).
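When running the model I only changed fea=2, and training took 2 days to finish. I made no changes at test time either. I searched around for this error and found nothing. Has the author run into this before? Thank you!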
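Note that passing `False` as the `strict` argument of `load_state_dict` only tolerates missing or unexpected keys. A parameter whose shape differs still raises the error above. A quick way to narrow such a problem down is to compare the parameter shapes stored in the checkpoint with those of the freshly built model. The sketch below does this on plain dictionaries of shapes, using the two shapes from the traceback. The helper name `find_shape_mismatches` is hypothetical and not part of this repository.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters present in both with different shapes."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Skip parameters the model does not have; only report real shape conflicts.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes taken from the error message above. With PyTorch they could be collected as
# {k: tuple(v.shape) for k, v in torch.load(path).items()} and, likewise, from model.state_dict().
checkpoint_shapes = {"predict_net.model.fm_V": (16, 128)}
model_shapes = {"predict_net.model.fm_V": (128, 10)}

for name, (saved, current) in find_shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {saved} vs current model {current}")
```

A difference like this one usually means the model was built with different options at test time than at training time, so the first thing to check is that both commands pass the same flags.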