
[In progress] Add warning padding attention mask #21916

Closed

Conversation

anruijian
Contributor

What does this PR do?

Fixes #16136

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger
Collaborator

sgugger commented Mar 3, 2023

cc @gante

Member

@gante left a comment


I share the concerns @sgugger stated in this comment: if the user is not passing the attention mask but is doing things right, all these checks will be done at each forward pass, probably causing unnecessary slowdowns.

Why not a simple logger.warning_once() if attention_mask is None? It is guaranteed to run only once at train time, is much shorter in terms of code, and accounts for special tokens that may exist in future models.

@anruijian
Contributor Author

Thank you for the comment!

Based on my understanding, this line of code ensures the check only happens once, so it should not significantly impact performance.

The current warning method only issues warnings when an attention_mask is actually needed (because padding tokens are present in the input) but none is provided. In other cases, where an attention_mask is not required, no warning is issued. The additional check on special tokens allows a more detailed warning message.

I agree that your suggested method is more concise and efficient, but it may generate warnings even when an attention_mask is not needed.

Since it's my first time contributing to the community, I don't have a strong opinion towards either solution. The original work is by @ydshieh and @patrickvonplaten. Perhaps they have additional insights and can suggest a more effective solution.
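For concreteness, here is a minimal sketch of the pad-token-based check described above, assuming a model with self.config.pad_token_id and a self.warnings_issued dict like the one quoted below. The method name follows the one discussed later in this thread, but the body is only an illustrative assumption, not the actual implementation in the PR:

def warn_if_pad_token_in_input_ids_no_attention_mask(self, input_ids, attention_mask):
    # Nothing to check if a mask was passed or if no pad token is configured.
    if attention_mask is not None or self.config.pad_token_id is None:
        return
    # Only warn once per model instance.
    if self.warnings_issued.get("pad_token_in_input_ids", False):
        return
    if (input_ids == self.config.pad_token_id).any():
        logger.warning(
            "Padding tokens were found in `input_ids`, but no `attention_mask` was passed. "
            "This can lead to incorrect attention weights; we recommend passing an `attention_mask`."
        )
        self.warnings_issued["pad_token_in_input_ids"] = True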

@ydshieh
Collaborator

ydshieh commented Mar 7, 2023

Why not a simple logger.warning_once()

This is recently introduced :-)


if self.warnings_issued.get("pad_token_in_input_ids", False):
    # if warning has already been thrown don't throw again
    return

This makes the warning appear once, cc @gante. But it would be nice if we could reuse the recently added logger.warning_once. I will leave @gante to make the final call.
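As an aside, a warning_once-style helper can be sketched with functools.lru_cache, which caches on the message string so each distinct warning is emitted at most once per process. This is only an illustration of the idea, not necessarily how logger.warning_once is implemented in the library:

import functools
import logging

logger = logging.getLogger(__name__)

@functools.lru_cache(None)
def warning_once(message: str) -> None:
    # The cache is keyed on `message`, so repeated calls with the same
    # message are no-ops after the first one.
    logger.warning(message)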

@gante
Member

gante commented Mar 7, 2023

@anruijian It checks input_ids until there is a batch in which a pad_token_id exists. If a user is working on a problem where they have no pad_token_id on their data and they don't pass the attention_mask, there is a check made every forward pass. I'd strongly advocate for a simple warning when the attention_mask is not passed 🤗

As a side note, we have related problems at other points in the code base. Getting into the habit of passing the attention_mask would really make everyone happier!
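As a concrete illustration of that habit (a hedged sketch; the checkpoint name is just a placeholder), the tokenizer already returns the mask alongside the input ids, so it only needs to be forwarded to the model:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModel.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

inputs = tokenizer(
    ["a short prompt", "a slightly longer prompt here"],
    padding=True,
    return_tensors="pt",
)
# `inputs` contains both `input_ids` and `attention_mask`; passing both
# lets the model ignore the padded positions.
outputs = model(**inputs)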

@anruijian
Contributor Author

@gante Just to confirm before updating the PR: we are going to remove the warn_if_pad_token_in_input_ids_no_attention_mask method and use logger.warning_once in forward():

def forward(...):
    ...
    if attention_mask is None:
        logger.warning_once(
            "\nWe strongly recommend passing an `attention_mask` to avoid possibly incorrectly computing the"
            " attention weights. "
        )
    ...

@gante
Member

gante commented Mar 8, 2023

@anruijian correct :) I would add a short example in the warning, such as (e.g. to correctly mask the pad tokens), but I'll leave that up to you!

@anruijian
Contributor Author

@gante

def forward(...):
    ...
    if attention_mask is None:
        logger.warning_once(
            "\nWe strongly recommend passing an `attention_mask` to avoid possibly incorrectly computing the"
            " attention weights. Example to correctly mask the pad tokens: model(input_ids, attention_mask=attention_mask)."
            " See https://huggingface.co/docs/transformers/v4.23.1/en/troubleshooting#incorrect-output-when-padding-tokens-arent-masked for more details."
        )
    ...

Does this example look good to you? I also linked the official doc on this issue. Not sure if it's too long. Let me know what you think. Thanks!
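For context, the logger used in the snippet above would typically come from the library's logging utilities; this import is an assumption about the surrounding module, which is not shown in this diff:

from transformers.utils import logging

logger = logging.get_logger(__name__)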

@gante
Member

gante commented Apr 3, 2023

@anruijian sounds good to me! (A minor nit: the link is for v4.23 of the docs, should be https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked instead)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
