[In progress] Add warning padding attention mask #21916
Conversation
cc @gante
I share the concerns of @sgugger stated in this comment -- If the user is not passing the attention mask, but doing things right, all these checks will be done at each forward pass, probably causing unnecessary slowdowns.
Why not a simple `logger.warning_once()` if `attention_mask is None`? It is guaranteed to run only once at train time, is much shorter in terms of code, and accounts for special tokens that may exist in future models.
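For context, a rough sketch (not the PR's actual code, and with made-up helper names) contrasting the per-forward pad-token scan with the constant-time check suggested above:

```python
import torch

def check_every_forward(input_ids: torch.Tensor, pad_token_id: int) -> bool:
    # Scans the whole batch for pad tokens on every forward call; the cost grows with
    # batch size and sequence length even when the user is already doing things right.
    return bool((input_ids == pad_token_id).any())

def check_once(attention_mask) -> bool:
    # The suggested alternative: a constant-time None check, paired with
    # logger.warning_once() so the message is emitted a single time.
    return attention_mask is None
```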
Thank you for the comment! Based on my understanding, this line of code makes the check run only once, so it should not significantly impact performance. The current warning method only issues a warning when pad tokens actually appear in the input. I agree that your suggested method is more concise and efficient, but it may also warn when no pad tokens are present. Since it's my first time contributing to the community, I don't have a strong opinion towards either solution. The original work is by @ydshieh and @patrickvonplaten. Perhaps they have additional insights and can suggest a more effective solution.
This was recently introduced :-)
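For readers who have not seen it, `warning_once` deduplicates identical messages. A quick standalone check, assuming a transformers version recent enough to include it:

```python
from transformers.utils import logging

logger = logging.get_logger(__name__)

for _ in range(3):
    # Identical messages are cached, so this prints only on the first iteration
    logger.warning_once("We strongly recommend passing an `attention_mask` ...")
```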
if self.warnings_issued.get("pad_token_in_input_ids", False):
    # if warning has already been thrown don't throw again
    return
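For readers skimming the thread, a minimal self-contained sketch of how a per-instance flag like `warnings_issued` suppresses repeat warnings; it mirrors the snippet above but is not the library's exact code:

```python
import logging

logger = logging.getLogger(__name__)

class ToyModel:
    """Illustration only: a per-instance "warn once" flag, not transformers' implementation."""

    def __init__(self, pad_token_id: int):
        self.pad_token_id = pad_token_id
        self.warnings_issued = {}  # one boolean entry per warning key

    def forward(self, input_ids, attention_mask=None):
        already_warned = self.warnings_issued.get("pad_token_in_input_ids", False)
        if attention_mask is None and not already_warned:
            if any(token == self.pad_token_id for row in input_ids for token in row):
                logger.warning("Pad tokens found in input_ids but no attention_mask was passed.")
                self.warnings_issued["pad_token_in_input_ids"] = True  # never warn again on this instance
        # ... the actual forward computation would go here ...
```

Calling `ToyModel(pad_token_id=0).forward([[1, 2, 0]])` twice warns only on the first call.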
@anruijian It checks whether the warning has already been issued, so it is only thrown once per model instance. As a side note, we have related problems at other points in the code base. Getting into the habit of passing the `attention_mask` explicitly helps avoid them.
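As a usage note (a sketch with a generic checkpoint, not part of the PR): tokenizers already return the mask, so passing it along is usually a one-line change.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# With padding=True the tokenizer inserts pad tokens *and* returns the matching attention_mask
inputs = tokenizer(
    ["a short sentence", "a much longer sentence than the first one"],
    padding=True,
    return_tensors="pt",
)

# Forwarding **inputs passes attention_mask along, so pad positions are excluded from attention
outputs = model(**inputs)
```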
@gante Just to confirm before updating the PR: we are going to remove the existing pad-token check and replace it with something like the following?

def forward(...):
    ...
    if attention_mask is None:
        logger.warning_once(
            "\nWe strongly recommend passing an `attention_mask` to avoid possibly incorrectly computing the"
            " attention weights. "
        )
    ...
@anruijian correct :) I would add a short example in the warning, such as `model(input_ids, attention_mask=attention_mask)`.
def forward(...):
    ...
    if attention_mask is None:
        logger.warning_once(
            "\nWe strongly recommend passing an `attention_mask` to avoid possibly incorrectly computing the"
            " attention weights. Example to correctly mask the pad tokens: model(input_ids, attention_mask=attention_mask)."
            " See https://huggingface.co/docs/transformers/v4.23.1/en/troubleshooting#incorrect-output-when-padding-tokens-arent-masked for more details."
        )
    ...

Does this example look good to you? I also linked the official doc for this issue. Not sure if it's too long. Let me know what you think. Thanks!
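To illustrate the behaviour this warning guards against (a sketch with a generic checkpoint, not part of the PR): dropping the attention_mask on a padded batch changes the hidden states of the shorter, padded sequence.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer(
    ["hello", "a noticeably longer sentence that forces padding of the first one"],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    masked = model(**inputs).last_hidden_state
    unmasked = model(input_ids=inputs["input_ids"]).last_hidden_state  # attention_mask omitted

# The padded (shorter) sequence's hidden states typically differ once the mask is dropped
print(torch.allclose(masked[0], unmasked[0], atol=1e-5))  # usually False
```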
@anruijian sounds good to me! (A minor nit: the link is for v4.23 of the docs; it should point to the latest version instead.)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
Fixes #16136
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.