ImTextCharFromUtf8 excludes a range of unicode characters #832

Closed
josh04 opened this issue Sep 15, 2016 · 8 comments

Comments


josh04 commented Sep 15, 2016

Using a sample paragraph which includes the text " ‘possible worlds’ " in TextWrapped, ImGui would not print any characters from the first quote onwards. (Unicode codepoint 0x91)

As far as I can tell, in the case of UTF-8 characters which are greater than 0x80 but less than 0xE0, ImTextCharFromUtf8 fails to recognise the character and returns a null, truncating the string to that point. I'm not knowledgeable about UTF-8 enough to say exactly what ImTextCharFromUtf8 is doing or for what purpose in excluding these values, but replacing

*out_char = 0; return 0;

with

*out_char = *str; return 1;

on line 950 of imgui.cpp has resolved my issue, but it obviously might cause others.

@josh04 josh04 changed the title ImTextCharFromUtf8 excludes a range of unicode charac ImTextCharFromUtf8 excludes a range of unicode characters Sep 15, 2016
Owner

ocornut commented Sep 16, 2016

0x91 isn't a quote character, according to
http://www.unicode.org/charts/PDF/U0080.pdf
https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

I know for a fact that I used the copyright symbol (U+00A9, UTF-8 0xC2 0xA9).

I am not mega familiar with UTF-8, but I'm not sure a single byte between 0x80 and 0xBF translates to a valid code point.

Could you dump the hex data for the string and confirm that you are indeed passing UTF-8 to it and not extended ASCII? And/or provide a "portable" repro, portable in the sense of using \xFF-style byte escapes within a literal so it can be copied across.

Also see
http://www.utf8-chartable.de/
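
To make the byte ranges discussed above concrete, here is a minimal decoding sketch. `DecodeUtf8` is a hypothetical name and this is not Dear ImGui's `ImTextCharFromUtf8`; it only handles 1- to 3-byte sequences on zero-terminated input and does not reject overlong forms. It shows why 0xC2 0xA9 decodes to U+00A9 while a lone 0x91 cannot start a sequence.

```cpp
// Hypothetical sketch, not Dear ImGui's ImTextCharFromUtf8.
// Decodes one code point from the zero-terminated byte string `s` into *out.
// Returns the number of bytes consumed, or 0 on malformed input.
// Limitations: 1- to 3-byte sequences only, overlong forms are not rejected.
int DecodeUtf8(const unsigned char* s, unsigned int* out)
{
    if (s[0] < 0x80)                       // 0xxxxxxx: plain ASCII
    {
        *out = s[0];
        return 1;
    }
    if (s[0] >= 0xC0 && s[0] < 0xE0 && (s[1] & 0xC0) == 0x80)   // 110xxxxx 10xxxxxx
    {
        *out = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return 2;
    }
    if (s[0] >= 0xE0 && s[0] < 0xF0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80)
    {
        *out = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
        return 3;
    }
    *out = 0;                              // includes a lone byte in 0x80..0xBF, such as 0x91
    return 0;
}
```

A byte in 0x80..0xBF has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it is only meaningful after a lead byte.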

Owner

ocornut commented Sep 16, 2016

And
https://en.wikipedia.org/wiki/UTF-8#Description

Possibly you aren't passing valid UTF-8, because it is a confusing thing to do with compilers pre-dating C++11. Newer compilers allow for u8"this is a utf8 literal".
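
As a rough illustration of the two approaches mentioned in this thread (explicit byte escapes and `u8` literals), the sketch below spells the copyright sign both ways. The variable names are made up for the example. One assumption to note: from C++20 onwards `u8` literals have type `const char8_t[]`, so the cast is there to keep the snippet compiling across standards.

```cpp
#include <cstring>

// U+00A9 written with explicit byte escapes. The bytes are UTF-8 regardless
// of the source file's encoding or the compiler's execution character set.
const char* kCopyrightEscaped = "\xC2\xA9 2016";

// The same text as a u8 literal with a universal character name.
// The cast is a no-op in C++11/14/17 and converts from char8_t in C++20.
const char* kCopyrightU8 = reinterpret_cast<const char*>(u8"\u00A9 2016");

// Returns true when both spellings produce identical bytes.
bool CopyrightLiteralsMatch()
{
    return std::strcmp(kCopyrightEscaped, kCopyrightU8) == 0;
}
```

Either string can then be handed to any function that expects UTF-8 in a `const char*`.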

Author

josh04 commented Sep 16, 2016

You're entirely correct, I'm passing through some improperly converted UTF-16 from C# and that's where 0x91 corresponds to a smart quote (http://www.fileformat.info/info/charset/UTF-16/list.htm). I could have sworn I checked http://www.utf8-chartable.de/ before posting the report, but apparently not. That'll teach me to file bug reports at 2am.

Thanks for your help, and thanks for working on such a useful library!

@josh04 josh04 closed this as completed Sep 16, 2016
Owner

ocornut commented Sep 16, 2016

It's sort of unfortunate, and a cause of recurrent first-time issues for many users.
I was just wondering if maybe we could add a helper imgui function to check the content of a UTF-8 string and display it (e.g. as a hex dump). At least that would make it so everyone who has character-related problems can run it and see what they are passing. I'll add that idea to my notes!
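
One possible shape for the "check the content" half of that idea, sketched without any ImGui dependency. `FindFirstInvalidUtf8Byte` is a hypothetical name, not an existing Dear ImGui function, and the check is structural only: it does not reject overlong encodings, surrogates, or code points above U+10FFFF.

```cpp
// Hypothetical helper, not part of Dear ImGui.
// Returns the offset of the first byte that breaks UTF-8 structure in the
// zero-terminated string `text`, or -1 if the whole string is well-formed.
// For a sequence cut short by the end of the string, the returned offset is
// that of the terminator.
int FindFirstInvalidUtf8Byte(const char* text)
{
    const unsigned char* s = reinterpret_cast<const unsigned char*>(text);
    int i = 0;
    while (s[i] != 0)
    {
        int extra;                          // continuation bytes expected after the lead
        if      (s[i] < 0x80) extra = 0;    // ASCII
        else if (s[i] < 0xC0) return i;     // stray continuation byte, e.g. a lone 0x91
        else if (s[i] < 0xE0) extra = 1;
        else if (s[i] < 0xF0) extra = 2;
        else if (s[i] < 0xF8) extra = 3;
        else                  return i;     // 0xF8..0xFF never appear in UTF-8
        for (int k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return i + k;               // missing or malformed continuation byte
        i += extra + 1;
    }
    return -1;
}
```

Running something like this on the string before passing it to a text widget would have flagged the 0x91 byte from the original report at its exact offset.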

Author

josh04 commented Sep 17, 2016

Just to elaborate further on what got me into this tangle, in case anyone gets here by Google: it turns out that 0x0091 isn't a defined character in UTF-16 either. It's reserved for private use, so Windows opts to treat it as a smart quote to match the earlier Windows-1252 code page. Converting to UTF-8 with C#'s text encoding functions correctly translates this to 0xE2 0x80 0x98, the three-byte UTF-8 sequence for a smart quote (U+2018). ProggyClean.ttf doesn't have a glyph for that code point, so with the string correctly converted I get a ? in place of the quote.

However, ProggyClean.ttf DOES have a glyph at 0x91, for Windows-1252. So if I fail to convert the string and amend ImTextCharFromUtf8 to let the malformed character through, I get the correct glyph. Someone should develop a unicode ProggyClean, I guess.
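
For reference, the conversion described above (Windows-1252 byte 0x91 to U+2018 to the bytes E2 80 98) can be sketched as follows. Both function names are hypothetical, and the Windows-1252 table is deliberately partial: it covers only the four curly quote bytes and passes everything else through unchanged, which is not correct for the rest of the 0x80..0x9F range.

```cpp
// Hypothetical, partial mapping: only the Windows-1252 curly quotes are handled.
unsigned int Cp1252QuoteToUnicode(unsigned char b)
{
    switch (b)
    {
    case 0x91: return 0x2018;   // left single quotation mark
    case 0x92: return 0x2019;   // right single quotation mark
    case 0x93: return 0x201C;   // left double quotation mark
    case 0x94: return 0x201D;   // right double quotation mark
    default:   return b;        // assumption: caller only passes quotes or ASCII
    }
}

// Hypothetical helper: writes code point `cp` as UTF-8 into `out` (at least
// 4 bytes) and returns the number of bytes written. Surrogates and values
// above U+10FFFF are not checked.
int EncodeUtf8(unsigned int cp, unsigned char* out)
{
    if (cp < 0x80)
    {
        out[0] = static_cast<unsigned char>(cp);
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        out[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        out[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
    out[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
    out[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
    out[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    return 4;
}
```

Whether the resulting code point actually renders still depends on the font having a glyph for it, which is the ProggyClean limitation described above.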


MrSapps commented Sep 17, 2016

Owner

ocornut commented Sep 17, 2016

That does the same thing:

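// Displays `count` bytes as two-digit hex values, `line_limit` per row (0 = all on one row).
// u8 is assumed here to be a typedef for unsigned char.
void    ImDumpHex(const u8* ptr, int count, int line_limit)
{
    for (int n = 0; n < count; n++)
    {
        // Stay on the current row unless this byte starts a new group of line_limit
        if (n > 0 && (line_limit == 0 || (n % line_limit) != 0))
            ImGui::SameLine();
        ImGui::Text("%02X", ptr[n]);
    }
}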

Owner

ocornut commented Sep 17, 2016

The difference in size between those two blurbs of code is also maybe a gentle reminder of how stupidly wrong and inefficient the C++ stream/string libraries are. Not only is the code 10 times bigger, but it is also probably 100 times slower, involving heap allocations, etc. Stay away from this madness :)

ocornut added a commit that referenced this issue May 3, 2022
…ng issues and font loading issues. Simplified code + extracted DebugNodeFontGlyph().

Helper to diagnose issues such as #4866, #3558, #3436, #2233, #1880, #1780, #905, #832, #762, #726, #609, #565, #307)