
[In progress] Add warning padding attention mask #21916

Closed

Conversation

anruijian
Contributor

What does this PR do?

Fixes #16136

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger
Collaborator

sgugger commented Mar 3, 2023

cc @gante

Member

@gante left a comment


I share the concerns @sgugger stated in this comment: if the user is not passing the attention mask but is doing things right, all these checks will be done at each forward pass, probably causing unnecessary slowdowns.

Why not a simple logger.warning_once() if attention_mask is None? It is guaranteed to run only once at train time, is much shorter in terms of code, and accounts for special tokens that may exist in future models.

@anruijian
Contributor Author

Thank you for the comment!

Based on my understanding, this line of code ensures the check only happens once, so it should not significantly impact performance.

The current warning method only issues warnings when an attention_mask is actually needed (because padding tokens are present in the input) but none is provided. In other cases, where an attention_mask is not required, no warning is issued. The additional check on special tokens allows a more detailed warning message.

I agree that your suggested method is more concise and efficient, but it may generate warnings even when an attention_mask is not needed.

Since it's my first time contributing to the community, I don't have a strong opinion towards either solution. The original work is by @ydshieh and @patrickvonplaten. Perhaps they have additional insights and can suggest a more effective solution.
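For concreteness, here is a minimal sketch of the pad-token-based check described above, assuming a model with self.config.pad_token_id and a self.warnings_issued dict like the one quoted below. The method name follows the one discussed later in this thread, but the body is only an illustrative assumption, not the actual implementation in the PR:

def warn_if_pad_token_in_input_ids_no_attention_mask(self, input_ids, attention_mask):
    # Nothing to check if a mask was passed or if no pad token is configured.
    if attention_mask is not None or self.config.pad_token_id is None:
        return
    # Only warn once per model instance.
    if self.warnings_issued.get("pad_token_in_input_ids", False):
        return
    if (input_ids == self.config.pad_token_id).any():
        logger.warning(
            "Padding tokens were found in `input_ids`, but no `attention_mask` was passed. "
            "This can lead to incorrect attention weights; we recommend passing an `attention_mask`."
        )
        self.warnings_issued["pad_token_in_input_ids"] = True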

@ydshieh
Collaborator

ydshieh commented Mar 7, 2023

Why not a simple logger.warning_once()

This is recently introduced :-)


if self.warnings_issued.get("pad_token_in_input_ids", False):
    # if warning has already been thrown don't throw again
    return

This makes the warning appear once, cc @gante. But it would be nice if we could reuse the recently added logger.warning_once. I will leave @gante to make the final call.
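As an aside, a warning_once-style helper can be sketched with functools.lru_cache, which caches on the message string so each distinct warning is emitted at most once per process. This is only an illustration of the idea, not necessarily how logger.warning_once is implemented in the library:

import functools
import logging

logger = logging.getLogger(__name__)

@functools.lru_cache(None)
def warning_once(message: str) -> None:
    # The cache is keyed on `message`, so repeated calls with the same
    # message are no-ops after the first one.
    logger.warning(message)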

@gante
Member

gante commented Mar 7, 2023

@anruijian It checks input_ids until there is a batch in which a pad_token_id exists. If a user is working on a problem where they have no pad_token_id on their data and they don't pass the attention_mask, there is a check made every forward pass. I'd strongly advocate for a simple warning when the attention_mask is not passed 🤗

As a side note, we have related problems at other points in the code base. Getting into the habit of passing the attention_mask would really make everyone happier!
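As a concrete illustration of that habit (a hedged sketch; the checkpoint name is just a placeholder), the tokenizer already returns the mask alongside the input ids, so it only needs to be forwarded to the model:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModel.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

inputs = tokenizer(
    ["a short prompt", "a slightly longer prompt here"],
    padding=True,
    return_tensors="pt",
)
# `inputs` contains both `input_ids` and `attention_mask`; passing both
# lets the model ignore the padded positions.
outputs = model(**inputs)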

@anruijian
Contributor Author

@gante Just to confirm before updating the PR: we are going to remove the warn_if_pad_token_in_input_ids_no_attention_mask method and use logger.warning_once in forward():

def forward(...):
    ...
    if attention_mask is None:
        logger.warning_once(
            "\nWe strongly recommend passing an `attention_mask` to avoid possibly incorrectly computing the"
            " attention weights. "
        )
    ...

@gante
Member

gante commented Mar 8, 2023

@anruijian correct :) I would add a short example in the warning, such as (e.g. to correctly mask the pad tokens), but I'll leave that up to you!

@anruijian
Contributor Author

@gante

def forward(...):
    ...
    if attention_mask is None:
        logger.warning_once(
            "\nWe strongly recommend passing an `attention_mask` to avoid possibly incorrectly computing the"
            " attention weights. Example to correctly mask the pad tokens: model(input_ids, attention_mask=attention_mask)."
            " See https://huggingface.co/docs/transformers/v4.23.1/en/troubleshooting#incorrect-output-when-padding-tokens-arent-masked for more details."
        )
    ...

Does this example look good to you? I also linked the official doc on this issue. Not sure if it's too long. Let me know what you think. Thanks!
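For context, the logger used in the snippet above would typically come from the library's logging utilities; this import is an assumption about the surrounding module, which is not shown in this diff:

from transformers.utils import logging

logger = logging.get_logger(__name__)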

@gante
Member

gante commented Apr 3, 2023

@anruijian sounds good to me! (A minor nit: the link is for v4.23 of the docs, should be https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked instead)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
