Using OPUS-MT with DeepSpeed #83

Open
rlenain opened this issue Oct 20, 2022 · 0 comments
rlenain commented Oct 20, 2022

Hello,

I am trying to use OPUS-MT together with DeepSpeed compression (examples can be found at https://github.com/microsoft/DeepSpeedExamples under model_compression).

I am running into an issue where the exact same code works if I use t5-small, but no longer works if I switch to Helsinki-NLP/opus-mt-zh-en. The error is:

Traceback (most recent call last):
  File "translation/run_translation.py", line 686, in <module>
    main()
  File "translation/run_translation.py", line 603, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 1504, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 1742, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 2486, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/trainer.py", line 2518, in compute_loss
    outputs = model(**inputs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/models/marian/modeling_marian.py", line 1455, in forward
    return_dict=return_dict,
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/models/marian/modeling_marian.py", line 1229, in forward
    return_dict=return_dict,
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/transformers/models/marian/modeling_marian.py", line 751, in forward
    embed_pos = self.embed_positions(input_shape)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/deepspeed/compression/basic_layer.py", line 130, in forward
    self.sparse)
  File "/home/CORP/r.lenain/miniconda3/envs/mt_opus-mt/lib/python3.7/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not torch.Size
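
For reference, here is a minimal, self-contained sketch of what I think is happening (the two classes below are simplified stand-ins that I wrote, not the real transformers or DeepSpeed classes). Marian's sinusoidal positional embedding is called with the input shape rather than the input ids, so any replacement module whose forward() passes its argument straight to F.embedding fails with exactly this TypeError:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeBasedPositionalEmbedding(nn.Embedding):
    # Marian-style positional embedding: forward() receives the shape of the
    # input (a torch.Size) and builds the position ids itself.
    def forward(self, input_shape):
        bsz, seq_len = input_shape[:2]
        positions = torch.arange(seq_len, dtype=torch.long, device=self.weight.device)
        return super().forward(positions)

class CompressedEmbeddingLike(nn.Embedding):
    # Generic compression-style wrapper: forward() hands its argument directly
    # to F.embedding, which only accepts a Tensor of indices.
    def forward(self, indices):
        return F.embedding(indices, self.weight)

input_ids = torch.randint(0, 100, (2, 7))

original = ShapeBasedPositionalEmbedding(512, 16)
print(original(input_ids.shape).shape)   # works: torch.Size([7, 16])

compressed = CompressedEmbeddingLike(512, 16)
compressed(input_ids.shape)              # TypeError: argument 'indices' must be Tensor, not torch.Size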

Has anyone ever encountered this issue?
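
In case it helps, a rough workaround I am considering (untested sketch; none of this is an official API of either library) is to keep references to the original positional-embedding modules and put them back after DeepSpeed's compression pass has rewritten the model:

from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

# Marian's positional embeddings expect an input shape, so they must not be
# replaced with a generic index-based embedding by the compression pass.
orig_encoder_pos = model.model.encoder.embed_positions
orig_decoder_pos = model.model.decoder.embed_positions

# ... initialize DeepSpeed compression on `model` here ...

# Restore the shape-based positional embeddings after compression.
model.model.encoder.embed_positions = orig_encoder_pos
model.model.decoder.embed_positions = orig_decoder_pos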
