Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C#: Always decode dotnet output as UTF-8 #74065

Merged
merged 1 commit into from
Jun 7, 2023

Conversation

RedworkDE
Copy link
Member

Even tho this seems like a straight forward fix and testing shows that this appears correct, but console handling on Windows is complicated and as far as I can tell this shouldn't work as is, so I am not sure what I am missing here and if this can cause issues in some cases.

SPOILER: One quick run through the Windows console complications
  • A console is supposed to represent a virtual VGA text mode display, most oddities of the system are because of this.
  • Every console is associated with 1 or more processes and every process with 0 or 1 console, e.g. multiple process can share a console
  • Every console has a input and an output encoding that can only be get and set by processes that use that console
  • Sub processes use their parent console (there are ways to change that, but they are not used here)
  • If the parent doesn't have a console, and the child uses the console subsystem, a new console is allocated.
  • Now for the relevant .NET bits:
  • Writing to stdout uses the console output encoding by default.
  • Reading from the redirected stdout of a child process uses the console output encoding by default.
  • Thus as long as the parent & child process use the same console / didn't change their encoding from the system default everything should be passed correctly
  • The failure mode I would expect with this is that the localized string contains characters that cannot be encoded and thus get replaced with the fallback char (generally question marks)
  • But in reality it looks like dotnet always writes UTF8 output, and thus Godot decodes things incorrectly (unless the system default encoding also is UTF8)

If anyone has an idea what causes the dotnet output to be UTF-8 encoded, that would be great to know, so that we can ensure that this doesn't cause any regressions anywhere.

The upstream change is not relevant for the issues, as it it only include in the .NET 8 previews.

If not, changing the windows default code page to utf-8 also fixes the issues, both without the fix and any potential regressions with this fix. (also it's probably a good things to do regardless)

@RedworkDE RedworkDE requested a review from a team as a code owner February 27, 2023 17:07
@RedworkDE
Copy link
Member Author

Since the issues seem to be Windows only, I made the encoding changes only when running on windows.

I did a bunch more source diving and found that the output of --list-sdks is encoded vfwprintf as multibyte characters of the current locale, which means ???.
For the other commands the output is generated by MSBuild which is a massive system and it is not super easy to trace which parts are relevant, but most likely the output is generated by Console.WriteLine which should use the encoding as elaborated above, but the whole https://github.com/dotnet/dotnet repo also contains a whole bunch of places where the encoding is unconditionally set to UTF8 and it is not exactly clear if / when any of them are relevant.

After having spent way too much time on this and still not understanding the whole system:

  • This appears to fix the issue
  • Any possible regressions are very limited in scope, and not very important
  • I'd say merge this fix
  • IF we get any reports of regressions, either recommend to them to configure their windows to use utf8 globally or if there is a significant number of users that experience regressions, revert it and recommend to the users that had issues before to configure their windows to use utf8 globally
  • ¯\_(ツ)_/¯

@nagilson
Copy link

nagilson commented Mar 8, 2023

If anyone has an idea what causes the dotnet output to be UTF-8 encoded, that would be great to know.

It looks like you found the biggest source: https://github.com/dotnet/sdk/blob/af67718a9fce6ba01796ed06df44299fe17da67b/src/Cli/Microsoft.DotNet.Cli.Utils/UILanguageOverride.cs#L27 but yes, dotnet new and dotnet watch and others can also do the same and it is a fool's errand to try to tell from our codebases. Besides that, MSBuild will also have a similar change upcoming which makes it use utf8.

--list-sdks output comes from multiple repositories and executables on the system, one being the runtime host repo, the other the .NET SDK, which is why it was different.

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. TIWAGOS as I'm not a .NET maintainer, but I don't see any blocker for merging.

Copy link
Member

@raulsntos raulsntos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks reasonable to me too, so I'm also in favor of merging.

@akien-mga akien-mga merged commit ec999b2 into godotengine:master Jun 7, 2023
@akien-mga
Copy link
Member

Thanks!

@RedworkDE RedworkDE deleted the net-dotnet-encoding branch June 7, 2023 11:36
@YuriSizov
Copy link
Contributor

Cherry-picked for 4.0.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong Chinese encoding for MSBuild errors in the editor
5 participants