
Add warning message if model uses input_ids that include padding tokens, but no attention_mask is provided. #16136

Closed
patrickvonplaten opened this issue Mar 14, 2022 · 15 comments · Fixed by #24510


@patrickvonplaten
Contributor

Good first issue

A common error is that a user forwards a batched tensor of input_ids that includes padding tokens, e.g. input_ids = torch.tensor([["hello", "this", "is", "a", "long", "string"], ["hello", "<pad>", "<pad>", "<pad>", "<pad>", "<pad>"]]).

In this case, the attention_mask should be provided as well. Otherwise the output hidden_states will be incorrectly computed. This is quite a common silent error IMO.
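
To make the failure concrete, here is a minimal sketch (not from the thread; the checkpoint and sentences are arbitrary) showing how the padded sequence's hidden states silently change when the attention_mask is dropped:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Batching a long and a short sentence pads the short one.
batch = tokenizer(
    ["hello this is a long string", "hello"],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    with_mask = model(**batch).last_hidden_state
    without_mask = model(input_ids=batch["input_ids"]).last_hidden_state

# Without the mask, the short sequence attends to its padding tokens,
# so its hidden states differ from the correctly masked run.
print(torch.allclose(with_mask, without_mask))  # False
```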

With @LysandreJik @sgugger, we have decided not to automatically create the attention_mask that masks out the padding tokens in this case, for the reasons explained here: #15479 (comment). However, as pointed out in #15479, we should IMO at least display a warning, since this error happens a lot.

As a good first issue, one could first add such a warning to BertModel, which would go something like:

```python
if attention_mask is None and (input_ids == self.config.pad_token_id).any():
    logger.warning("input_ids contains padding tokens but no attention_mask was given; outputs may be incorrect.")
```

What do you think @sgugger @LysandreJik ?

@sgugger
Collaborator

sgugger commented Mar 14, 2022

Models usually don't know the right pad token ID, as pointed out in the issue (I'm also not sure that community-contributed models, or models not as heavily used as BERT, have the right pad token ID in their configs), so I'm not in favor of this. Plus, checking the inputs at each forward pass would slow down performance.

I agree that it's a common error, and it would make a very nice addition to the troubleshooting guide IMO, but I'm not sure we can add anything to the library that properly warns users without hurting performance or raising a lot of false alarms.

@patrickvonplaten
Contributor Author

Hmm, I think we can be pretty confident that self.config.pad_token_id inside the model is the correct padding token. Agreed that performance would suffer a bit here. I think putting it in the troubleshooting guide is a good idea cc @stevhliu
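
As a quick sanity check (a hypothetical snippet, not from the thread), one can compare the tokenizer's pad token ID with the model config's for a given checkpoint:

```python
from transformers import AutoConfig, AutoTokenizer

# For bert-base-uncased, both report 0 ([PAD]), supporting the claim
# that self.config.pad_token_id is trustworthy for well-maintained models.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
config = AutoConfig.from_pretrained("bert-base-uncased")
print(tokenizer.pad_token_id, config.pad_token_id)  # 0 0
```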

@stevhliu
Member

Yay more content for the troubleshooting guide! I'll work on a PR for this 👍

@Pawank06

Pawank06 commented Feb 1, 2023

Hey, @patrickvonplaten can I work on this issue?

@patrickvonplaten
Contributor Author

Sure, that'd be great. Just to make sure we don't do duplicate work here - @ydshieh, you haven't started on this one yet, no?

@ydshieh
Collaborator

ydshieh commented Feb 3, 2023

Hi, @Pawank06 @patrickvonplaten

Not really. In Sep. 2022, I rebased the branch @patrickvonplaten created, add_important_warning_padding_attention_mask, but then turned my focus to other tasks.

@Pawank06, maybe you can pull that branch, rebase it on the latest main, and continue what @patrickvonplaten has done? Don't hesitate to ask if you need any help ❤️

@Pawank06

Pawank06 commented Feb 3, 2023

@ydshieh @patrickvonplaten OK, can you assign this issue to me, and could you please share the file path?

@anruijian
Contributor

@ydshieh @Pawank06 Hello, if no one is actively working on this issue, I am willing to take a look and continue the work!

@ydshieh
Collaborator

ydshieh commented Feb 22, 2023

@anruijian Let's wait a bit for @Pawank06's response :-) Thank you for expressing interest 💯

@Pawank06 removed their assignment Feb 22, 2023
@anruijian
Contributor

@ydshieh Sure. It seems @Pawank06 removed the assignment.

@ydshieh
Collaborator

ydshieh commented Feb 23, 2023

I see. @anruijian, you can take a look at this comment, and let me know if you have any questions before working on it. Thank you!

@anruijian
Contributor

@ydshieh I have checked the add_important_warning_padding_attention_mask branch and would like to confirm my understanding of the current status and next steps before proceeding with my work. As of now, the task has been completed for the torch version. The next steps involve adding an equivalent warning function to the TensorFlow and Flax versions, more specifically in FlaxPreTrainedModel / modeling_flax_bert.py and TFPreTrainedModel / modeling_tf_bert.py. Thank you!

@ydshieh
Collaborator

ydshieh commented Feb 27, 2023

Hi @anruijian. No, the torch part is not finished yet. @patrickvonplaten added a method warn_if_pad_token_in_input_ids_no_attention_mask in src/transformers/modeling_utils.py, and only used that method in a single modeling file, src/transformers/models/bert/modeling_bert.py.

The goal is to apply the same change made in modeling_bert.py to the other PyTorch modeling files in transformers (GPT2, Bart, T5, etc.), wherever it makes sense - mostly in the places where we have:

```python
elif input_ids is not None:
    input_shape = input_ids.size()
```
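
For orientation, here is a rough, self-contained sketch of the kind of check being discussed. Only the method name comes from the branch mentioned above; the signature, body, and message are assumptions (in the library it would presumably be a method on PreTrainedModel reading self.config.pad_token_id), not the actual implementation:

```python
import logging

import torch

logger = logging.getLogger(__name__)


def warn_if_pad_token_in_input_ids_no_attention_mask(input_ids, attention_mask, pad_token_id):
    """Warn when padded input_ids are forwarded without an attention_mask."""
    # Nothing to check if a mask was passed or the pad token is unknown.
    if attention_mask is not None or pad_token_id is None:
        return
    if (input_ids == pad_token_id).any():
        logger.warning(
            "input_ids contains padding tokens but no attention_mask was given. "
            "The padded positions will be attended to and may silently corrupt "
            "the outputs. Please pass an attention_mask."
        )


# Example trigger: pad_token_id 0, second row padded, no mask passed.
warn_if_pad_token_in_input_ids_no_attention_mask(
    torch.tensor([[101, 7592, 102], [101, 102, 0]]), None, 0
)
```

In the modeling files, the call would slot in right before input_shape = input_ids.size() in the branch quoted above.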

@hackyon
Contributor

hackyon commented Jun 26, 2023

@patrickvonplaten @ydshieh It looks like none of the pull requests have been merged yet; I'd like to take a stab at this issue if that's OK. Thanks.

@ydshieh
Collaborator

ydshieh commented Jun 26, 2023

@hackyon Sure!

You can probably continue the work from the branch opened in #21916.
