
Add warning message if model uses input_ids that include padding tokens, but no attention_mask is provided. #16136

Closed
patrickvonplaten opened this issue Mar 14, 2022 · 15 comments · Fixed by #24510


@patrickvonplaten
Contributor

Good first issue

A common error is that a user forwards a batched tensor of input_ids that includes padding tokens, e.g. input_ids = torch.tensor([["hello", "this", "is", "a", "long", "string"], ["hello", "<pad>", "<pad>", "<pad>", "<pad>", "<pad>"]]).

In this case, the attention_mask should be provided as well. Otherwise the output hidden_states will be incorrectly computed. This is quite a common silent error IMO.
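
To make the failure concrete, here is a minimal sketch (not from the thread; the checkpoint and sentences are arbitrary) showing how the padded sequence's hidden states silently change when the attention_mask is dropped:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Batching a long and a short sentence pads the short one.
batch = tokenizer(
    ["hello this is a long string", "hello"],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    with_mask = model(**batch).last_hidden_state
    without_mask = model(input_ids=batch["input_ids"]).last_hidden_state

# Without the mask, the short sequence attends to its padding tokens,
# so its hidden states differ from the correctly masked run.
print(torch.allclose(with_mask, without_mask))  # False
```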

With @LysandreJik @sgugger, we have decided not to automatically create the attention_mask that masks out the padding tokens in this case, for the reasons explained here: #15479 (comment). However, as pointed out in #15479, we should IMO at least display a warning, since this error happens a lot.

As a good first issue, one could first add such a warning to BertModel, which would go something like:

```python
if attention_mask is None and (input_ids == self.config.pad_token_id).any():
    logger.warning("input_ids contains padding tokens but no attention_mask was given; outputs may be incorrect.")
```

What do you think @sgugger @LysandreJik ?

@sgugger
Collaborator

sgugger commented Mar 14, 2022

Models usually don't know the right pad token ID, as pointed out in the issue (I'm also not sure that community-contributed models, or models not as heavily used as BERT, have the right pad token ID in their configs), so I'm not in favor of this. Plus, checking the inputs at each forward pass would slow down performance.

I agree that it's a common error, and it would make a very nice addition to the troubleshooting guide IMO, but I'm not sure we can add anything to the library that properly warns users without hurting performance or raising a lot of false alarms.

@patrickvonplaten
Contributor Author

Hmm, I think we can be pretty confident that self.config.pad_token_id inside the model is the correct padding token. Agreed that performance would suffer a bit here. I think putting it in the troubleshooting guide is a good idea cc @stevhliu
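
As a quick sanity check (a hypothetical snippet, not from the thread), one can compare the tokenizer's pad token ID with the model config's for a given checkpoint:

```python
from transformers import AutoConfig, AutoTokenizer

# For bert-base-uncased, both report 0 ([PAD]), supporting the claim
# that self.config.pad_token_id is trustworthy for well-maintained models.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
config = AutoConfig.from_pretrained("bert-base-uncased")
print(tokenizer.pad_token_id, config.pad_token_id)  # 0 0
```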

@stevhliu
Member

Yay more content for the troubleshooting guide! I'll work on a PR for this 👍

@Pawank06

Pawank06 commented Feb 1, 2023

Hey, @patrickvonplaten can I work on this issue?

@patrickvonplaten
Contributor Author

Sure, that'd be great. Just to make sure we don't do duplicate work here - @ydshieh, you haven't started on this one yet, no?

@ydshieh
Collaborator

ydshieh commented Feb 3, 2023

Hi, @Pawank06 @patrickvonplaten

Not really. In Sep. 2022, I rebased the branch @patrickvonplaten created, add_important_warning_padding_attention_mask, but then turned my focus to other tasks.

@Pawank06, maybe you can pull that branch, rebase it on the latest main, and continue what @patrickvonplaten has done? Don't hesitate to ask if you need any help ❤️

@Pawank06

Pawank06 commented Feb 3, 2023

@ydshieh @patrickvonplaten OK, can you assign this issue to me, and could you please share the file path?

@anruijian
Contributor

@ydshieh @Pawank06 Hello, if no one is actively working on this issue, I am willing to take a look and continue the work!

@ydshieh
Collaborator

ydshieh commented Feb 22, 2023

@anruijian Let's wait a bit for @Pawank06's response :-) Thank you for expressing interest 💯

@Pawank06 removed their assignment Feb 22, 2023
@anruijian
Contributor

@ydshieh Sure. It seems @Pawank06 removed the assignment.

@ydshieh
Collaborator

ydshieh commented Feb 23, 2023

I see. @anruijian, you can take a look at this comment, and let me know if you have any questions before working on it. Thank you!

@anruijian
Contributor

@ydshieh I have checked the add_important_warning_padding_attention_mask branch and would like to confirm my understanding of the current status and next steps before proceeding with my work. As of now, the task has been completed for the torch version. The next steps involve adding an equivalent warning function to the TensorFlow and Flax versions, more specifically in FlaxPreTrainedModel / modeling_flax_bert.py and TFPreTrainedModel / modeling_tf_bert.py. Thank you!

@ydshieh
Collaborator

ydshieh commented Feb 27, 2023

Hi @anruijian. No, the torch part is not finished yet. @patrickvonplaten added a method warn_if_pad_token_in_input_ids_no_attention_mask in src/transformers/modeling_utils.py, and only used that method in a single modeling file, src/transformers/models/bert/modeling_bert.py.

The goal is to apply the same change made in modeling_bert.py to the other PyTorch modeling files in transformers (GPT2, Bart, T5, etc.), wherever it makes sense - mostly in the places where we have:

```python
elif input_ids is not None:
    input_shape = input_ids.size()
```
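
For orientation, here is a rough, self-contained sketch of the kind of check being discussed. Only the method name comes from the branch mentioned above; the signature, body, and message are assumptions (in the library it would presumably be a method on PreTrainedModel reading self.config.pad_token_id), not the actual implementation:

```python
import logging

import torch

logger = logging.getLogger(__name__)


def warn_if_pad_token_in_input_ids_no_attention_mask(input_ids, attention_mask, pad_token_id):
    """Warn when padded input_ids are forwarded without an attention_mask."""
    # Nothing to check if a mask was passed or the pad token is unknown.
    if attention_mask is not None or pad_token_id is None:
        return
    if (input_ids == pad_token_id).any():
        logger.warning(
            "input_ids contains padding tokens but no attention_mask was given. "
            "The padded positions will be attended to and may silently corrupt "
            "the outputs. Please pass an attention_mask."
        )


# Example trigger: pad_token_id 0, second row padded, no mask passed.
warn_if_pad_token_in_input_ids_no_attention_mask(
    torch.tensor([[101, 7592, 102], [101, 102, 0]]), None, 0
)
```

In the modeling files, the call would slot in right before input_shape = input_ids.size() in the branch quoted above.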

@hackyon
Contributor

hackyon commented Jun 26, 2023

@patrickvonplaten @ydshieh It looks like none of the pull requests have been merged yet; I'd like to take a stab at this issue if that's OK. Thanks.

@ydshieh
Collaborator

ydshieh commented Jun 26, 2023

@hackyon Sure!

You can probably continue the work from the branch opened in #21916.
