[Tokenizer] Fix decode output with space in decode_token #9010

Merged

DrownFish19 (Collaborator)

PR types

Bug fixes

PR changes

Others

Description

Fix the output of decode_token when the decoded text starts with a space.
For example, if new_text = " 123" (a space followed by 123) and prefix_text = " " (a single space), clean_up_tokenization_spaces changes new_text to "123", removing the leading space.
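The toy model below is a minimal sketch of why a cleaned-up new_text causes trouble. It assumes that decode_token follows the common incremental-detokenization pattern: decode a prefix window and a longer window, then return the part of new_text after len(prefix_text). The vocabulary, `toy_decode`, and its cleanup rule are hypothetical. The rule was chosen only to reproduce the example in the description (" " stays " ", while " 123" becomes "123"). This is not PaddleNLP's implementation.

```python
from typing import List

# Hypothetical toy vocabulary: token id -> text piece.
VOCAB = {0: " ", 1: "123"}


def toy_decode(ids: List[int], clean_up_tokenization_spaces: bool) -> str:
    """Join token pieces and optionally apply a toy cleanup step.

    Assumption: the cleanup strips leading spaces only when the text has
    non-space content, so " " stays " " but " 123" becomes "123".
    """
    text = "".join(VOCAB[i] for i in ids)
    if clean_up_tokenization_spaces and text.strip():
        text = text.lstrip()
    return text


def decode_token(ids: List[int], prefix_offset: int, read_offset: int,
                 clean_up: bool) -> str:
    """Return newly generated text by diffing two decodes (sketch)."""
    prefix_text = toy_decode(ids[prefix_offset:read_offset], clean_up)
    new_text = toy_decode(ids[prefix_offset:], clean_up)
    if len(new_text) > len(prefix_text):
        # Slicing assumes new_text still begins with prefix_text;
        # the cleanup step can break that assumption.
        return new_text[len(prefix_text):]
    return ""


if __name__ == "__main__":
    ids = [0, 1]
    print(repr(decode_token(ids, 0, 1, clean_up=True)))   # '23' ("1" is lost)
    print(repr(decode_token(ids, 0, 1, clean_up=False)))  # '123'
```

In this toy model, skipping the cleanup for both intermediate decodes keeps prefix_text a true prefix of new_text. The description does not say whether the merged change works exactly this way.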

paddle-bot (bot) commented on Aug 26, 2024

Thanks for your contribution!

codecov (bot) commented on Aug 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.96%. Comparing base (56d293d) to head (0cd1c52).
Report is 77 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9010      +/-   ##
===========================================
- Coverage    54.08%   53.96%   -0.13%     
===========================================
  Files          650      652       +2     
  Lines       103915   104929    +1014     
===========================================
+ Hits         56200    56621     +421     
- Misses       47715    48308     +593     

DrownFish19 changed the title from "[Tokenizer] Fix decode output with space in decode_token." to "[Tokenizer] Fix decode output with space in decode_token" on Aug 27, 2024
ZHUI (Collaborator) left a comment

LGTM

DrownFish19 merged commit c93bada into PaddlePaddle:develop on Sep 19, 2024
10 of 12 checks passed
DrownFish19 deleted the dev_20240826_fix_decode_token branch on September 19, 2024 at 08:31