[Feature] Add remove_eol_characters hook #12

Merged (9 commits) on Sep 13, 2022

C1rN09 (Contributor) commented Sep 13, 2022

This PR adds a new script/pre-commit-hook, named remove-improper-eol-in-cn-docs.

The script resolves the extra whitespace that appears in rendered Chinese docs: HTML renders a single line break inside a paragraph as a space, which is unwanted between Chinese characters. This is a long-standing issue, as discussed here.

To solve this, the script finds and removes end-of-line characters that split a natural Chinese paragraph. For example,

这是一个,
像诗一样的
测试

will be changed to

这是一个,像诗一样的测试

However, the following cases stay unchanged:

  • Docs written in English

This is,
a poem-like
test

  • Natural paragraphs (split by 2+ eol characters in Markdown)

这是一个

测试
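
The behavior described above can be sketched in Python as follows. This is only an illustration, not the hook's actual implementation: the function name `remove_improper_eol`, the use of the CJK Unified Ideographs range to detect Chinese text, and the rule "join two adjacent lines when both contain Chinese characters" are all assumptions.

```python
import re

# Assumption: "Chinese text" is approximated by the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def remove_improper_eol(text: str) -> str:
    """Join adjacent lines when both contain CJK characters.

    Blank lines (paragraph breaks) and lines without CJK characters
    are left untouched. Hypothetical helper, for illustration only.
    """
    lines = text.split("\n")
    out = [lines[0]]
    for line in lines[1:]:
        prev = out[-1]
        # An empty line never matches _CJK, so paragraph breaks are preserved.
        if _CJK.search(prev) and _CJK.search(line):
            out[-1] = prev + line  # drop the end-of-line character
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(remove_improper_eol("这是一个,\n像诗一样的\n测试"))
```

A purely line-based rule like this would also join Chinese lines inside lists, tables, or code blocks, so a real hook would likely need to skip those structures.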

zhouzaida (Collaborator) commented Sep 13, 2022

C1rN09 (Contributor, Author) commented Sep 13, 2022

Tested in MMEngine with the following config:

  - repo: https://github.com/C1rN09/pre-commit-hooks.git
    rev: 02421bc
    hooks:
      - id: check-copyright
        args: ["mmengine", "tests"]
      - id: remove-improper-eol-in-cn-docs

Seems good.

zhouzaida merged commit 1bed9a0 into open-mmlab:master on Sep 13, 2022