Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell needs much deeper thought to truly support Unicode #2933

Open
tig opened this issue Oct 25, 2023 · 18 comments
Open

Cell needs much deeper thought to truly support Unicode #2933

tig opened this issue Oct 25, 2023 · 18 comments
Assignees
Labels
design Issues regarding Terminal.Gui design (bugs, guidelines, debates, etc...) v2 For discussions, issues, etc... relavant for v2
Milestone

Comments

@tig
Copy link
Collaborator

tig commented Oct 25, 2023

I hate the design I came up with for this for v2:

/// <summary>
/// Represents a single row/column in a Terminal.Gui rendering surface
/// (e.g. <see cref="LineCanvas"/> and <see cref="ConsoleDriver"/>).
/// </summary>
public class Cell {
	/// <summary>
	/// The list of Runes to draw in this cell. If the list is empty, the cell is blank. If the list contains
	/// more than one Rune, the cell is a combining sequence.
	/// (See #2616 - Support combining sequences that don't normalize)
	/// </summary>
	public List<Rune> Runes { get; set; } = new List<Rune> ();

	/// <summary>
	/// The attributes to use when drawing the Glyph.
	/// </summary>
	public Attribute? Attribute { get; set; }

	/// <summary>
	/// Gets or sets a value indicating whether this <see cref="T:Terminal.Gui.Cell"/> has
	/// been modified since the last time it was drawn.
	/// </summary>
	public bool IsDirty { get; set; }
}

It forces simple code to do cell.Runes[0] which makes code hard to read.

I propose changing this to:

/// <summary>
/// Represents a single row/column in a Terminal.Gui rendering surface
/// (e.g. <see cref="LineCanvas"/> and <see cref="ConsoleDriver"/>).
/// </summary>
public class Cell {
	/// <summary>
	/// The character to display. If <see cref="Rune"/> is <see langword="null"/>, then <see cref="Rune"/> is ignored.
	/// </summary>
	public Rune Rune { get; set; }

	/// <summary>
	/// The combining mark for <see cref="Rune"/> that when combined makes this Cell a combining sequence that could
	/// not be normalized to a single Rune.
	/// If <see cref="CombiningMark"/> is <see langword="null"/>, then <see cref="CombiningMark"/> is ignored.
	/// </summary>
	/// <remarks>
	/// Only valid in the rare case where <see cref="Rune"/> is a combining sequence that could not be normalized to a single Rune.
	/// </remarks>
	internal Rune CombiningMark { get; set; }

	/// <summary>
	/// The attributes to use when drawing the Glyph.
	/// </summary>
	public Attribute? Attribute { get; set; }

	/// <summary>
	/// Gets or sets a value indicating whether this <see cref="T:Terminal.Gui.Cell"/> has
	/// been modified since the last time it was drawn.
	/// </summary>
	public bool IsDirty { get; set; }
}

When so, all code that currently does cell.Runes[0] will become cell.Rune.

Only ConsoleDriver will need to use CombiningMark (see #2932)

See #2939

@tig tig added design Issues regarding Terminal.Gui design (bugs, guidelines, debates, etc...) v2 For discussions, issues, etc... relavant for v2 labels Oct 25, 2023
tig added a commit that referenced this issue Oct 25, 2023
* refactored Cell

* Fixed usings

* Wierd CI/Cd unit test failure. Will this fix?
@BDisp
Copy link
Collaborator

BDisp commented Oct 25, 2023

It forces simple code to do cell.Runes[0] which makes code hard to read.

I agree.

There will never be more than 2 Runes in a Cell.

I've doubt about because I think that can have more than 2 runes. So I suggest to make this as follow in case of doubt.
public List<Rune> CombiningMark { get; set; } = new List<Rune> ();

@tig
Copy link
Collaborator Author

tig commented Oct 25, 2023

It forces simple code to do cell.Runes[0] which makes code hard to read.

I agree.

There will never be more than 2 Runes in a Cell.

I've doubt about because I think that can have more than 2 runes. So I suggest to make this as follow in case of doubt. public List<Rune> CombiningMark { get; set; } = new List<Rune> ();

I've researched this. There are no known examples of unicode codepoints where there will be more than one combining mark.

Regardless, in the current PR CombingMark is commented out, so feel free to change this to the following in #2932

public List<Rune> CombiningMarks { get; set; } = new List<Rune> ();

@BDisp
Copy link
Collaborator

BDisp commented Oct 25, 2023

@tig thinking better this change will have complicate more than before because when you replace the previous column with the normalized rune list, now you have to deal with the new property CombiningMarks. In my opinion the previous code was better.

@BDisp
Copy link
Collaborator

BDisp commented Oct 25, 2023

Without say you broke all my idea about this. You could wait to merge first and do the change later.

@tig
Copy link
Collaborator Author

tig commented Oct 25, 2023

@tig thinking better this change will have complicate more than before because when you replace the previous column with the normalized rune list, now you have to deal with the new property CombiningMarks. In my opinion the previous code was better.

Yes, this will complicate ConsoleDriver and potentailly ConsoleDriver implementations, but it absolutely simplifies normal app developer's lives.

If you have an alternative solution, I'm all ears, but only if it does not make library user's code more complex.

@tig tig changed the title Cell.Runes is clumsy Cell needs much deeper thought Oct 29, 2023
@tig tig changed the title Cell needs much deeper thought Cell needs much deeper thought to truly support Unicode Oct 29, 2023
@a-usr
Copy link

a-usr commented Nov 9, 2023

Why don't we (or you?) somewhat combine both of your proposals/ideas?

the result would look something like this:

/// <summary>
/// Represents a single row/column in a Terminal.Gui rendering surface
/// (e.g. <see cref="LineCanvas"/> and <see cref="ConsoleDriver"/>).
/// </summary>
public class Cell
  {
    /// <summary>
    /// The character to display. If <see cref="Rune"/> is <see langword="null"/>, then <see cref="Rune"/> is ignored.
    /// </summary>
    /// <remarks>
    /// Stores and reads from <see cref="Runes"/>[0]
    /// </remarks>
    public Rune Rune { get; set; } // { get => Runes[0]; set => {Runes[0] = value}; }

    /// <summary>
    /// The list where the Runes are stored. 
    /// </summary>
    /// <remarks>
    /// Could be made internal, but keeping this public prevents older apps from breaking as long as they operate within the lists size limit.
    /// </remarks>
    public List<Rune> Runes { get; set; } // = new List<Rune>(2){null, null}  // Limits List size to two Items. Can be mmade larger if required, but may help detect faulty code that else would result in improperly drawn cells

    /// <summary>
    /// The combining mark for <see cref="Rune"/> that when combined makes this Cell a combining sequence that could
    /// not be normalized to a single Rune.
    /// If <see cref="CombiningMark"/> is <see langword="null"/>, then <see cref="CombiningMark"/> is ignored. 
    /// </summary>
    /// <remarks>
    /// Only valid in the rare case where <see cref="Rune"/> is a combining sequence that could not be normalized to a single Rune. Stores and reads from <see cref="Runes"/>[1]
    /// </remarks>
    public Rune CombiningMark { get; set; } // { get => Runes[1]; set => {Runes[1] = value}; }

    /// <summary>
    /// The attributes to use when drawing the Glyph.
    /// </summary>
    public Attribute? Attribute { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this <see cref="T:Terminal.Gui.Cell"/> has
    /// been modified since the last time it was drawn.
    /// </summary>
    public bool IsDirty { get; set; }
  }

}

Effectively, Rune and CombiningMark point to the objects positioned at the index 0 and 1 respectively inside of the list Runes. This both enables the use of the newer Properties Rune and CombiningMark, but also iterating through Runes for the ConsoleDrivers (wich i assume was @BDisp's idea wich he mentioned) and prevents old apps from breaking. Combine Runes with the Obsolete Attribute and both the compiler and intellisense issue a warning to not use this!

@BDisp
Copy link
Collaborator

BDisp commented Nov 9, 2023

@tig already did the PR #2934, which is well done.
The obsolete Attribute is what only work for now with old terminal but I think they will exist for a long time, yet. The MS guys gave up to add more features for older terminals because increase the bugs and conflicts. The only leverage we can do with them is allowing render non-bmp code points when the font support them.

@BDisp
Copy link
Collaborator

BDisp commented Nov 9, 2023

👨+ \u200D +👩+ \u200D +👧 will result in 👨‍👩‍👧 (\U0001F468\u200D\U0001F469\u200D\U0001F467)
\u200D isn't a combining mark but is a format Unicode category.

conhost:

image

WT force 16 colors:

image

WT true color:

image

@BDisp
Copy link
Collaborator

BDisp commented Nov 9, 2023

@tig I did these changes on my PR tig#16.

@a-usr
Copy link

a-usr commented Nov 9, 2023

Isn't Rune per definition Unicode, just like char is ascii?

@BDisp
Copy link
Collaborator

BDisp commented Nov 9, 2023

Isn't Rune per definition Unicode, just like char is ascii?

Right.

@a-usr
Copy link

a-usr commented Nov 10, 2023

Isn't Rune per definition Unicode, just like char is ascii?

Right.

But rune doesn't support combining unicode characters?

@BDisp
Copy link
Collaborator

BDisp commented Nov 10, 2023

But rune doesn't support combining unicode characters?

Yes, support. The reason for puts them on another property is because they should not occupy another column but the same as the Rune.

Edit:
If the rune has a width of two column then the next column should be null '\0' to not be outputted.

@a-usr
Copy link

a-usr commented Nov 13, 2023

But rune doesn't support combining unicode characters?

Yes, support. The reason for puts them on another property is because they should not occupy another column but the same as the Rune.

Edit:
If the rune has a width of two column then the next column should be null '\0' to not be outputted.

That makes sense.
How exactly are you printing unicode? Are you parsing the unicode codepoint into an ansi escape sequence? Or how else do you do it?

@BDisp
Copy link
Collaborator

BDisp commented Nov 13, 2023

How exactly are you printing unicode? Are you parsing the unicode codepoint into an ansi escape sequence? Or how else do you do it?

On the v1 that is done on the override AddRune of each driver. Before the Contents only existed on the NetDriver. Then for unit tests was also implemented on the FakeDriver. With the Border implementation and for use the 3D effect, the Contents field was included on the ConsoleDriver abstract class and overridden on all drivers with duplicated code.
On v2 @tig did a wonderfully job by remove the abstract keyword from the AddRune and the Contents for doing the common code in one place. The overridden UpdateScreen of each driver is responsible to collect all the information from the Contents field and translate to the respective buffer that each driver understand. By using the StringBuilder class some driver can recognize non-bmp as well, but the older WriteConsoleOutput only recognize char type and the cursor position cannot be handled. The only way to it support non-bmp is considering the non-bmp as a wide character with two columns. It will print them but if a non-bmp has only a width of one column it will add a space, because each buffer column contain a char (bmp).
The CursesDriver is the only one than cant print non-bmp and I still don't understand why.

@tig
Copy link
Collaborator Author

tig commented Nov 14, 2023

The CursesDriver is the only one than cant print non-bmp and I still don't understand why.

This is the relevant code:
image

I started down the path of experimenting with the other Curses APIs for wide-chars to try to figure out what the heck Curses is doing, but decided to focus on Windows first. I'm still experimenting with how to do all this right on Windows.

I think a key to getting Curses working right is to use APIs other than mvaddch/mvaddstr like

https://linux.die.net/man/3/curs_add_wch

I really do not like the current architecture of the Curses driver. It was clearly hacked together by Miguel back in the day and has some really suspect code. I hope to significantly simplify and modernize it and I'd rather not spend any more energy trying to hack the current code to work. For example, this is just nuts:

image

Note TrueColor doesn't work yet either.

@tig
Copy link
Collaborator Author

tig commented Jan 23, 2024

This is super-relevant and super-cool:

A new Unicode Terminal Complex Script Support, or TCSS proposal

@tig
Copy link
Collaborator Author

tig commented Jan 23, 2024

@tig tig added this to the V2 Beta milestone Jul 11, 2024
@tig tig self-assigned this Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Issues regarding Terminal.Gui design (bugs, guidelines, debates, etc...) v2 For discussions, issues, etc... relavant for v2
Projects
Status: 🏗 Approved - In progress
Development

No branches or pull requests

3 participants