ImTextCharFromUtf8 excludes a range of unicode characters #832

josh04 · 2016-09-15T23:44:57Z

When I pass a sample paragraph that includes the text " ‘possible worlds’ " to TextWrapped, ImGui does not print any characters from the first quote onwards (Unicode codepoint 0x91).

As far as I can tell, for UTF-8 characters which are greater than 0x80 but less than 0xE0, ImTextCharFromUtf8 fails to recognise the character and returns a null, truncating the string at that point. I don't know UTF-8 well enough to say exactly what ImTextCharFromUtf8 is doing or why it excludes these values, but replacing

*out_char = 0; return 0;

with

*out_char = *str; return 1;

on line 950 of imgui.cpp has resolved my issue, though it obviously might cause others.

ocornut · 2016-09-16T07:32:39Z

0x91 isn't a quote character, according to
http://www.unicode.org/charts/PDF/U0080.pdf
https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

I know for a fact that I used the copyright symbol (U+00A9, UTF-8 0xC2 0xA9).

I am not very familiar with UTF-8, but I don't think a single byte between 0x80 and 0xBF translates to a valid code point.

Could you dump the hex data for the string and confirm that you are indeed passing UTF-8 to it and not extended ASCII? And/or provide a "portable" repro, portable in the sense of using \xFF-style byte escapes within the literal so it can be copied across machines.

Also see
http://www.utf8-chartable.de/
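
To make the point about bytes 0x80..0xBF concrete, here is a minimal sketch that classifies a byte by the role it can play in UTF-8. `Utf8SequenceLength` is a hypothetical helper written for this illustration, not an imgui function, and it only inspects the first byte: it does not reject the overlong lead bytes 0xC0/0xC1 or the out-of-range leads 0xF5..0xF7.

```cpp
// Hypothetical helper (not part of imgui): expected length of a UTF-8
// sequence judging only from its first byte. Returns 0 when the byte
// cannot start a sequence.
static int Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;   // 0xxxxxxx: plain ASCII
    if (lead < 0xC0) return 0;   // 10xxxxxx: continuation byte, invalid on its own
    if (lead < 0xE0) return 2;   // 110xxxxx: lead byte of a 2-byte sequence
    if (lead < 0xF0) return 3;   // 1110xxxx: lead byte of a 3-byte sequence
    if (lead < 0xF8) return 4;   // 11110xxx: lead byte of a 4-byte sequence
    return 0;                    // 0xF8..0xFF never start a UTF-8 sequence
}
```

With this, a lone 0x91 yields 0 (it can only be a continuation byte), while 0xC2, the first byte of the copyright symbol, yields 2.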

ocornut · 2016-09-16T07:36:02Z

And
https://en.wikipedia.org/wiki/UTF-8#Description

Possibly you aren't passing valid UTF-8, because that is a confusing thing to do with compilers pre-dating C++11. Newer compilers allow u8"this is a utf8 literal".
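
As a sketch of what a "portable" repro could look like, the literals below spell the UTF-8 bytes out with hex escapes, so they do not depend on the source file encoding or the compiler. The variable names are made up for this example. Note that from C++20 onwards a u8"..." literal has type const char8_t[], so it no longer converts implicitly to const char*; explicit byte escapes avoid that question too.

```cpp
// U+00A9 COPYRIGHT SIGN written as explicit UTF-8 bytes (0xC2 0xA9).
static const char kCopyrightUtf8[] = "\xC2\xA9 2016";

// U+2018 and U+2019 (curly single quotes) around the phrase from the report.
// The literal is split after each escape run: a hex escape consumes every hex
// digit that follows it, so text starting with a letter such as 'b' or 'e'
// would otherwise be absorbed into the escape.
static const char kQuotedUtf8[] = "\xE2\x80\x98" "possible worlds" "\xE2\x80\x99";
```

Such a string can then be passed as ImGui::TextWrapped("%s", kQuotedUtf8); whether the quotes render still depends on the loaded font containing those glyphs.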

josh04 · 2016-09-16T08:56:45Z

You're entirely correct, I'm passing through some improperly converted UTF-16 from C# and that's where 0x91 corresponds to a smart quote (http://www.fileformat.info/info/charset/UTF-16/list.htm). I could have sworn I checked http://www.utf8-chartable.de/ before posting the report, but apparently not. That'll teach me to file bug reports at 2am.

Thanks for your help, and thanks for working on such a useful library!

ocornut · 2016-09-16T08:59:11Z

It's sort of unfortunate, and a recurrent cause of first-time issues for many users.
I was just wondering if maybe we could add a helper imgui function to check and display the content of a UTF-8 string (e.g. as a hex dump). At least that way everyone who has character-related problems could run it and see what they are passing. I'll add that idea to my notes!

josh04 · 2016-09-17T10:03:34Z

Just to elaborate further on what got me into this tangle, in case anyone gets here via Google: it turns out that 0x0091 isn't a defined character in UTF-16 either. It's a control code reserved for private use, so Windows opts to treat it as a smart quote to match the earlier Windows-1252 code page. Converting to UTF-8 with C#'s text encoding functions correctly translates this to 0xE2 0x80 0x98, the three-byte UTF-8 sequence for a smart quote (U+2018). ProggyClean.ttf doesn't have a glyph for that code point, so with the string correctly converted I get a ? in place of the quote.

However, ProggyClean.ttf DOES have a glyph at 0x91, following Windows-1252. So if I fail to convert the string and amend ImTextCharFromUtf8 to let the malformed character through, I get the correct glyph. Someone should develop a Unicode ProggyClean, I guess.
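
The byte values above can be checked with a short sketch of the standard UTF-8 encoding rules. `EncodeUtf8` and `Cp1252QuoteToUnicode` are hypothetical helpers written for this illustration; they are neither imgui nor C# functions, and the second one covers only the four curly quotes at 0x91..0x94 rather than the whole Windows-1252 table.

```cpp
// Hypothetical helper: map the four Windows-1252 curly-quote bytes to the
// Unicode code points they stand for; any other value is returned unchanged.
static unsigned int Cp1252QuoteToUnicode(unsigned char b)
{
    switch (b)
    {
    case 0x91: return 0x2018; // left single quotation mark
    case 0x92: return 0x2019; // right single quotation mark
    case 0x93: return 0x201C; // left double quotation mark
    case 0x94: return 0x201D; // right double quotation mark
    default:   return b;
    }
}

// Hypothetical helper: encode one code point as UTF-8 into out[0..3].
// Returns the number of bytes written, or 0 for surrogates and values
// above U+10FFFF.
static int EncodeUtf8(unsigned int cp, unsigned char out[4])
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; // UTF-16 surrogates are not valid code points to encode
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding Cp1252QuoteToUnicode(0x91), i.e. U+2018, gives the three bytes E2 80 98, whereas encoding U+0091 directly gives C2 91, which is the control code rather than a quote.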

MrSapps · 2016-09-17T10:23:33Z

Maybe someone can adapt this: https://gist.github.com/paulsapps/cbd037b3d1b063927b719e489197aa27

ocornut · 2016-09-17T10:42:08Z

That does the same thing:

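// Prints `count` bytes as two-digit hex values, starting a new row every
// `line_limit` bytes (0 = keep everything on one row).
void    ImDumpHex(const unsigned char* ptr, int count, int line_limit)
{
    for (int n = 0; n < count; n++)
    {
        // Stay on the current row unless this byte starts a new one.
        if (n > 0 && (line_limit == 0 || (n % line_limit) != 0))
            ImGui::SameLine();
        ImGui::Text("%02X", ptr[n]);
    }
}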

ocornut · 2016-09-17T10:46:27Z

The difference in size between those two blurbs of code is also maybe a gentle reminder of how stupidly wrong and inefficient the C++ stream/string libraries are. Not only is the code 10 times bigger, it is also probably 100 times slower, involving heap allocations, etc. Stay away from this madness :)

…ng issues and font loading issues. Simplified code + extracted DebugNodeFontGlyph(). Helper to diagnose issues such as #4866, #3558, #3436, #2233, #1880, #1780, #905, #832, #762, #726, #609, #565, #307)

josh04 changed the title ~~ImTextCharFromUtf8 excludes a range of unicode charac~~ ImTextCharFromUtf8 excludes a range of unicode characters Sep 15, 2016

josh04 closed this as completed Sep 16, 2016

ocornut added the font/text label Apr 1, 2020
