Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgraded rivo/uniseg to latest version, switched StringWidth/Truncate to speedier version #63

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

rivo
Copy link

@rivo rivo commented Jul 27, 2022

The rivo/uniseg package has received a major update which also includes methods for grapheme cluster parsing that are much faster than the previously used Graphemes class.

I've upgraded your package accordingly and updated the relevant code to use these faster methods. It would be great if you could merge these changes.

Thank you!

ps. I noticed that some automatic checks did not complete successfully because they are still running on Go 1.15. Would you like me to look into upgrading them to the current version (1.18)?

@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2022

Codecov Report

Merging #63 (0a21090) into master (f9d5553) will increase coverage by 0.25%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   93.56%   93.82%   +0.25%     
==========================================
  Files           3        3              
  Lines         171      178       +7     
==========================================
+ Hits          160      167       +7     
  Misses          6        6              
  Partials        5        5              
Impacted Files Coverage Δ
runewidth.go 95.10% <100.00%> (+0.25%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@rivo
Copy link
Author

rivo commented Sep 11, 2022

I upgraded again to the latest uniseg version. Also implemented the "Quick Fix" solution outlined in #59 (comment).

Also updated CI to use current Go versions. Everything passes now.

@@ -166,6 +167,16 @@ func emoji(out io.Writer, in io.Reader) error {
})
}

// We also want regional indicator symbols (flags) to be part of the Emoji
// table. They are U+1F1E6..U+1F1FF.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add link to documentation of the specification?

Copy link
Author

@rivo rivo Sep 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://www.unicode.org/reports/tr51/#def_emoji_flag_sequence documents how flags are constructed using Regional Indicator code points.

Regional Indicator code points are not classified as Extended_Pictographic so they don't show up in your emoji table. But for the sake of calculating the width, they behave the same as other emojis. So the simplest solution is to add them to your emoji table. Alternatively, you could add the detection of Regional Indicators to all other parts of your code. That would be overkill, in my opinion, but it's up to you.

In any case, you'll want StringLength("🇯🇵") == 2. That's what this is for.

You'll find them in the same file (look for "Regional Indicator"):

https://unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

go.mod Outdated
@@ -1,5 +1,5 @@
module github.com/mattn/go-runewidth

go 1.9
go 1.18
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to keep go1.16 but do you have something problem?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, rivo/uniseg uses generics and the new build tag syntax, both of which were introduced with Go 1.18.

I could probably downgrade it in go-runewidth to a previous version but that old version was much slower than the new version. If I do, Go 1.16 will work.

Let me know how you'd like to proceed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What version of uniseg is it possible to build with go-runewidth with 1.16?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v0.3.4

But if you use v0.3.4, you also need to make adjustments to your code.

Here's my suggestion: I prepare a second PR with the same output as this one, but based on uniseg v.0.3.4 and the older Go version. We can leave this PR (#63) open and you can merge it once you're ready to switch to Go 1.18.

What do you think of this?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about to separate files for runewidth_go118.go and runewidth_go117.go ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just submitted a commit that does this but it gives me merge conflicts. It looks like you made/accepted other changes in the meantime. I'm not able to resolve these conflicts, only you can.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was able to resolve the merge conflict. Please have a look.

@rivo
Copy link
Author

rivo commented Sep 18, 2022

Do you still need information or any action from me here?

@derekperkins
Copy link

👍 to getting this merged

@derekperkins
Copy link

@mattn is there anything blocking this from merging and cutting a new release?

@junegunn
Copy link

junegunn commented Jan 4, 2024

@rivo Thanks for the patch. I can confirm that this fixes a display issue of my program.

cl, s, _, state = uniseg.FirstGraphemeClusterInString(s, state)
for index, r := range cl {
if index == 0 && inTable(r, emoji) {
chWidth = 2 // Not the optimal solution but it will work in most cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this incorrectly reports the width of some characters, such as (0x25b6), which is in both the emoji and ambiguous tables.

Related: junegunn/fzf#3588

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it turns out that emojis can have multiple representations. There is a default for each emoji and your example or, say, ☺, defaults to "text presentation". And there are code points that can be added to force a specific presentation. To make things worse, many systems don't respect the Unicode specification and then there is the question of whether fonts will support the different representations.

go-runewidth does not go into this detail. This is also why the comment here says "in most cases". It's a simple approximation. If you want this emoji presentation flag to be considered, you can also use uniseg directly which has had string width calculation for about two years now. uniseg.StringWidth("\u25b6") will report as "1".

From what I can see, uniseg will be much more accurate than go-runewidth. However, go-runewidth is mostly in line with most terminals which also tend to use the simple wcwidth functionality. Depending on your application, there may be some benefit to "making the same mistakes" as the environment it is run in. For example, iTerm2 on macOS which is a very popular terminal application, does not render the rainbow flag correctly:

image

This flag obviously has a width of 2 but iTerm2 assigns a width of 1. If you need your application to be in line with iTerm2, it may be better to make that same mistake. go-runewidth is probably your best bet in that case. (Although I don't know what it reports for this specific example, it might not have the same issue as iTerm2.) But if you need something that is accurate, no matter what, I would suggest using uniseg to calculate rune/string widths.

As a final note, VS Code renders the flag correctly:

image

There is no official spec for these widths. Therefore, everyone rolls their own implementation. It's a mess.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comment. Really appreciated. I'll see if I can use uniseg.StringWidth instead.

@jesseduffield
Copy link

Hi @mattn just checking if there's anything specific blocking this merge?

@rivo Are you using this branch yourself in another project? Is your fork something I could safely use myself?

@rivo
Copy link
Author

rivo commented Jun 23, 2024

@jesseduffield I'm using this package but not this branch. After tons and tons of hours and effort getting into the details of Unicode and terminals, including writing my own version of a character width package (https://github.com/rivo/uniseg?tab=readme-ov-file#monospace-width), I realize that which package to use really depends on what you're trying to achieve. go-runewidth will be more or less compatible with most terminals out there. Most of them use the widespread wcwidth implementation which, unfortunately, is old and buggy. Many emojis don't render properly, spacing is wrong is lots of cases. And similar issues have been found here, in go-runewidth. But if your application runs in these terminals, you'll probably want to make the same "mistakes". It doesn't help if a character is really 2 spaces wide but the terminal will always advance by 1 space. So go-runewidth is your best bet in doing what the terminal is doing.

On the other hand, the uniseg package attempts to give you the "real" width of a character. If your application has control over where a character is placed, I would suggest using uniseg instead. For example, if you write your own text editor GUI (e.g. something like VSCode) where you don't rely on terminals, I would not suggest using go-runewidth but uniseg instead.

Regarding this branch, I don't know what the plans are. This whole thing is a complicated topic and I totally understand if @mattn doesn't have the time to get into the details. Looks like there's not much happening in this project anymore anyway.

Maybe I should add wcwidth compatibility to uniseg at some point. Then people could simply switch over.

@mattn
Copy link
Owner

mattn commented Jun 23, 2024

I got some errors in this branch.

./runewidth_go118.go:188:28: undefined: uniseg.FirstGraphemeClusterInString
./runewidth_go118.go:214:33: undefined: uniseg.FirstGraphemeClusterInString
./runewidth_go118.go:244:33: undefined: uniseg.FirstGraphemeClusterInString

@rivo
Copy link
Author

rivo commented Jun 23, 2024

@mattn I haven't spent time on this PR anymore as it seemed like you didn't want to follow up on it / merge it. I can fix these issues but only if you indicate that you will merge it. Otherwise, it would be a wasted effort. Please let me know.

Also, your go.mod still lists Go 1.9 as a requirement. uniseg requires 1.18 due to the use of generics. Would you consider bumping your Go version to 1.18 as well? It would make things much easier.

@jesseduffield
Copy link

@rivo thanks for that detailed explanation, that's really helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants