Unicode "negative squared latin" letters not picked up by normalisation algorithm #17120

kara-louise · 2024-09-05T09:28:37Z

Steps to reproduce:

Read the following line of characters

🅻🅸🅲🅴

Actual behavior:

The characters are not normalised. The spoken result depends on the synthesiser used. The Unicode character numbers are sent to a Braille display.

Expected behavior:

If normalisation is enabled, the characters should be normalised.

NVDA logs, crash dumps and other attachments:

n/a

System configuration

NVDA installed/portable/running from source:

installed

NVDA version:

NVDA version 2024.4beta2

Windows version:

Windows 10 Version 22H2 (OS Build 19045.4842)

Name and version of other software in use when reproducing the issue:

n/a

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

Same issue occurs with alpha-33832,9d15b169 (2025.1.0.33832)

If NVDA add-ons are disabled, is your problem still occurring?

yes

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

n/a

Adriani90 · 2024-09-05T13:45:02Z

@LeonarddeR is it possible to extend the algorithm? This impacts other alphanumeric supplements as well, e.g. regional indicator symbol letters, negative circled letters and some other symbols.
Here is the complete list:
https://en.wiktionary.org/wiki/Appendix:Unicode/Enclosed_Alphanumeric_Supplement

Most of the symbols in that list work perfectly though.

ABuffEr · 2024-09-05T14:32:33Z

Hi,
Not sure if it's useful, but on the subject of extending the algorithm: I noticed that in NVDA (Python 3.11.9) unicodedata.unidata_version returns 14.0.0, while this package currently bumps it to 15.1.0.
Maybe it could be included as an external dependency, to keep everything up to date.

LeonarddeR · 2024-09-05T16:08:54Z

It is possible to extend the algorithm by expanding textUtils.unicodeNormalize. What's the idea behind these negative squared letters? I don't think an update of unicodedata will normalize them properly.
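A quick check with Python's standard unicodedata module illustrates the point about updating unicodedata. This is only a sketch: the helper name `nfkc_changes` is made up for illustration, and it assumes NFKC is the normalization form in use. To the best of my knowledge, the negative squared letters carry no compatibility decomposition in the Unicode Character Database, unlike the plain squared or mathematical letters, so a newer data version alone would not change the result.

```python
import unicodedata

# Unicode data version bundled with this Python build (14.0.0 on Python 3.11).
print(unicodedata.unidata_version)


def nfkc_changes(char: str) -> bool:
    """Return True if NFKC normalization rewrites the given character."""
    return unicodedata.normalize("NFKC", char) != char


samples = (
    "\U0001D46A",  # MATHEMATICAL BOLD ITALIC CAPITAL C
    "\U0001F13B",  # SQUARED LATIN CAPITAL LETTER L
    "\U0001F17B",  # NEGATIVE SQUARED LATIN CAPITAL LETTER L
)
for char in samples:
    # decomposition() returns an empty string when no mapping is defined.
    print(
        unicodedata.name(char),
        repr(unicodedata.decomposition(char)),
        nfkc_changes(char),
    )
```

The first two samples have a decomposition mapping and are rewritten by NFKC; the negative squared letter has none and passes through unchanged.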

kara-louise · 2024-09-06T04:26:15Z

What's the idea behind these negative squared letters?

@LeonarddeR I assume that the original use for them is for scientific notation since a lot of similar characters are prefixed with the word "mathematical". Not sure why this lot aren't though.
These days they're mostly used (as in the example in my original comment) as another sort of "fancy text", i.e. to give the appearance of formatted text in places where you can't use it, such as social media screen names. I originally saw what I pasted on Mastodon.
There are websites that will convert what you type into the Unicode characters of your choosing, and presumably that was one of the options.

LeonarddeR · 2024-09-06T05:47:37Z

@ABuffEr this unicodedata package received its last update over a year ago. I think it is unlikely that that package will fix these cases anyway.

ABuffEr · 2024-09-06T07:11:14Z

@ABuffEr this unicodedata package received its last update over a year ago. I think it is unlikely that that package will fix these cases anyway.

I imagine that is because, according to this page, Unicode 15.1.0 is the latest version, released in 2023. The 14.0.0 release, on the other hand, dates back to 2021. That said, I don't know whether this would make any difference here.

Adriani90 · 2024-09-08T14:46:06Z

@LeonarddeR the Enclosed Alphanumeric Supplement has been part of Unicode since version 5.2, and the last symbols were added in 2020, so 14.0 should already contain all these symbols.
They are usually used to make text stand out visually. They are also used very often in Japanese contexts, and in cases such as indicating country flags.

I guess if the normalization algorithm cannot handle them, we would have to add them to the symbols.dic file, right? There are probably about 70 symbols from this block that are not supported so far.

LeonarddeR · 2024-09-09T06:02:35Z

Adding them to symbols.dic is an option yes, but that will still not normalize them when speaking by word or line.
I'd personally create a supplementary Unicode normalization dictionary in code:

supplement = {
    "🅻": "L",
    # ... remaining characters
}

Then feed that to str.maketrans and use the result of a call to str.translate in textUtils.unicodeNormalize.
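The approach described above can be sketched as follows. This is a minimal illustration, not NVDA's actual code: the names `NEGATIVE_SQUARED_A`, `SUPPLEMENT_TABLE` and `normalize_with_supplement` are hypothetical, NFKC is assumed as the base normalization form, and only the 26 negative squared capital letters (U+1F170 to U+1F189) are covered.

```python
import unicodedata

# Assumption: NEGATIVE SQUARED LATIN CAPITAL LETTER A..Z occupy the
# contiguous range U+1F170..U+1F189, so the table can be generated.
NEGATIVE_SQUARED_A = 0x1F170

# Hypothetical supplementary dictionary, e.g. {"🅻": "L", ...}.
supplement = {
    chr(NEGATIVE_SQUARED_A + offset): chr(ord("A") + offset)
    for offset in range(26)
}

# str.maketrans turns the dictionary into a translation table once,
# so str.translate can apply it cheaply on every call.
SUPPLEMENT_TABLE = str.maketrans(supplement)


def normalize_with_supplement(text: str) -> str:
    """Apply NFKC first, then the supplementary replacements."""
    return unicodedata.normalize("NFKC", text).translate(SUPPLEMENT_TABLE)


print(normalize_with_supplement("🅻🅸🅲🅴"))
```

Generating the table from a code point range keeps the dictionary short, but any hand-maintained mapping would work the same way with str.maketrans.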

seanbudd · 2024-09-10T00:21:17Z

@LeonarddeR - I think these should just go in the symbols dictionary - I don't think a supplementary dictionary for normalization is going to be very maintainable

CyrilleB79 · 2024-09-10T08:08:49Z

Before discussing a solution, let's focus on the expected result.

Regarding the character "🅻":

if normalization is on, I'd expect it to be just replaced by an "L" character.
if normalization is off, I would want to be able to know that it's not a normal "L". Whether I hear "negative squared latin L" (which unfortunately would also need to be translated) or something reported by the synth (e.g. "letter 1F17B" with eSpeak) needs to be discussed. But in any case, it should be something different from what is reported when normalization is on.

SaschaCowley · 2024-09-11T00:14:11Z

Personally, my preference would probably be to have it replaced by "L" when doing anything but reading by character, but when reading by character have it read as "negative squared latin L".

XLTechie · 2024-09-11T00:47:04Z

Agreed with @SaschaCowley, that seems the most logical way of covering the needs of most users.

kara-louise · 2024-09-11T02:56:21Z

The Unicode ASCII add-on by Sukil Etxenike from the Spanish add-ons store is able to sort of normalise the above characters. I say "sort of" because, for some reason, they appear as "[L][I][C][E]".
I don't know what that add-on is doing differently from other normalisation tools such as the Unicode Normalization Test Page, which can't normalise them. So it might be worth investigating that add-on's source code to see how it works.

Adriani90 · 2024-09-11T05:41:34Z

Personally, my preference would probably be to have it replaced by "L" when doing anything but reading by character, but when reading by character have it read as "negative squared latin L".

That would be inconsistent with the other normalized alphanumeric characters. The full Unicode name can be retrieved by an add-on, as we currently do with other normalized characters.

CyrilleB79 · 2024-09-11T06:43:17Z

Personally, my preference would probably be to have it replaced by "L" when doing anything but reading by character, but when reading by character have it read as "negative squared latin L".

Do you mean with normalization on? This would be inconsistent with the process applied to other normalized characters such as "𝑪". When normalization is enabled, we would expect for "🅻":

"L normalized" when navigating by character
"L" in all other cases, i.e. navigating by word, line, during say all, etc.
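The expected behaviour listed above can be sketched as a small function. Everything here is hypothetical (the names `_SUPPLEMENT`, `_normalize` and `report`, and the use of NFKC plus a supplementary table for the negative squared letters); it only illustrates the two cases and is not how NVDA builds its speech output.

```python
import unicodedata

# Hypothetical supplementary table for U+1F170..U+1F189 (negative squared A..Z).
_SUPPLEMENT = str.maketrans(
    {chr(0x1F170 + offset): chr(ord("A") + offset) for offset in range(26)}
)


def _normalize(text: str) -> str:
    """NFKC normalization followed by the supplementary replacements."""
    return unicodedata.normalize("NFKC", text).translate(_SUPPLEMENT)


def report(text: str, by_character: bool) -> str:
    """Return what would be reported with normalization enabled."""
    normalized = _normalize(text)
    if by_character and normalized != text:
        # Character navigation: flag that the character was normalized.
        return f"{normalized} normalized"
    # Word, line and say-all navigation: only the normalized text.
    return normalized


print(report("🅻", by_character=True))
print(report("🅻🅸🅲🅴", by_character=False))
```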

Adding "🅻" to a symbol file may achieve something useful for the user, but it does not give a UX similar to the one seen on characters where normalization already works.

So if we want to discuss what can be heard when normalization is off (e.g. "negative squared latin L"), I'd recommend doing it in a separate issue.

LeonarddeR · 2024-09-11T08:07:35Z

If these characters go in the symbol dictionary, this has nothing to do with normalization, and normalization will not work when reading by word or line.

ABuffEr · 2024-09-11T08:53:27Z

Personally, my preference would probably be to have it replaced by "L" when doing anything but reading by character, but when reading by character have it read as "negative squared latin L".
Do you mean with normalization on? This would be inconsistent with the process applied to other normalized characters such as "𝑪".

In fact, you get "normalized C", completely missing that it is a bold italic styled C. Too flattened, in my opinion.
So, if I understand correctly, I agree with @SaschaCowley, even if it requires a change against the current situation.

Adriani90 · 2024-09-11T09:27:19Z

In fact, you get "normalized C", completely missing that is a Bold Italic styled C.

That's exactly the expected behavior in this case. Bold, script, squared, circled or whatever are details that are usually totally irrelevant when reading the text with a screen reader, because in these special cases the properties serve visual purposes only. No one would pronounce these characters with their full Unicode name. I agree that in some use cases, e.g. if you want to write a publication yourself and need these characters to meet sighted users' needs, you do need these properties, but then you can use the character info add-on to get the full Unicode name. Making the pronunciation follow the Unicode standard is too verbose. That is the experience in JAWS, and to be honest it is horrible to explore a publication with such alphanumeric characters and hear the whole Unicode names, even when navigating character by character, which is sometimes needed.
So the current normalization style in NVDA is the most convenient way to handle these characters. But if this is not achievable with these alphanumeric supplements, I suggest we try with symbols.dic and at least get the characters announced in some situations.

Adriani90 · 2024-09-11T09:29:26Z

An alternative would be to integrate the character info add-on into NVDA itself and report the full Unicode name by pressing e.g. NVDA+comma, but then we still don't have a database of Unicode names that is fully translated into several languages.
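For reference, Python's standard library already exposes the English Unicode names, which shows both what such a command could report and why translation remains an open problem. The helper name `describe` is hypothetical, and this sketch says nothing about how the add-on itself works.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the Unicode name of a character, or its code point if unnamed."""
    # unicodedata.name only knows the English names from the Unicode
    # Character Database; the second argument is the fallback value.
    return unicodedata.name(char, f"U+{ord(char):04X}")


print(describe("🅻"))
print(describe("𝑪"))
```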

Adriani90 · 2024-09-11T09:31:21Z

Another alternative would be to retrieve the formatting of these characters from Unicode and include it in the NVDA+F command, so that you can get the formatting of a character on demand as well, but I guess this would be a huge workload.

CyrilleB79 · 2024-09-11T09:40:46Z

@ABuffEr and @Adriani90, have you read #17120 (comment)?

The initial request is that the normalization (as currently implemented in NVDA) also work with some more characters, namely the negative squared letters.

If you wish to discuss other topics, please, please, open a new issue. These new topics include:

improvements for the normalization feature
re-defining the name with which a character is reported (e.g. with an extra symbol file or using the existing one)
integrating some features of the Character Information add-on into NVDA, such as Unicode character description

Thanks.

XLTechie · 2024-09-11T10:11:33Z

Do we need three normalizing modes? Off, full name when reading by character, fully normalized always (the current behavior)?

Adriani90 · 2024-09-11T20:49:00Z

So back to the issue: "negative" in this context means that the letter has inverted colors, so negative letters are white on a dark background. However, even this detail is not important for the screen reader user when reading the text; it is something that could be announced on demand by retrieving the full Unicode name.

So if it is possible to include these letters in the normalization, as is already done for all other alphanumeric letters, this would be ideal.

seanbudd added the p4 (https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority), good first issue (GitHub features these at https://github.com/nvaccess/nvda/contribute) and triaged (has been triaged, issue is waiting for implementation) labels on Sep 10, 2024
