Reordering of ligature substitution rules is considered harmful #1727

Open
khaledhosny opened this issue Nov 16, 2023 · 10 comments

Comments

@khaledhosny
Collaborator

The Feature File Specification, §5.d, states that:

A contiguous set of ligature rules does not need to be ordered in any particular way by the font editor; the implementation software must do the appropriate sorting. So:

sub f f     by f_f;
sub f i     by f_i;
sub f f i   by f_f_i;
sub o f f i by o_f_f_i;

will produce an identical representation in the font as:

sub o f f i by o_f_f_i;
sub f f i   by f_f_i;
sub f f     by f_f;
sub f i     by f_i;

There are several issues with this:

  1. It is very surprising to users: the code has one order and the binary silently gets a different one, and the order matters because it controls which substitution is applied first.
  2. There is no way to prevent this automatic reordering other than splitting each substitution into its own lookup, which is wasteful and unnecessary.
  3. The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.

I think this sorting should be deprecated and dropped, or if back-compatibility is a concern, have a way to disable it.

@frankrolf
Member

I remember a time when the mantra “longer ligatures first” was important. I only found out about the reordering when trying to demonstrate this problem in one of my workshops.

I can see how this behavior might be considered a theoretical problem, but I think the benefits outweigh this concern. It seems natural for users to write shorter substitutions first.

That said, do you have a practical example where this re-sorting would cause actual harm?

FWIW, the sorting algorithm seems to be here:
https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

@khaledhosny
Collaborator Author

@skef
Collaborator

skef commented Nov 16, 2023

I don't see us just removing this part of the spec. Documenting the ordering requirement could be valuable, although there are a lot of things like this in the older parts of the spec and that horse may have left the barn. (We can document what AFDKO does, but that doesn't mean other implementations will update their algorithms if those differ.)

We could add a flag to disable the sorting, but that would operate on a font-wide basis.

Seems like it might be better to add some sort of "explicit" command, similar to "subtable", that blocks any reordering within a lookup at the point where it is used.

@khaledhosny
Collaborator Author

FWIW, the sorting algorithm seems to be here:
https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

This sorts by length and GID, which is doubly bad. Sorting by length is understandable, though misguided, but sorting by GID makes no sense.

  1. The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.

Case in point, FontTools only sorts by length https://github.com/fonttools/fonttools/blob/fa59ada1b557bc304c592a2ca91c6b99ff6d241d/Lib/fontTools/otlLib/builder.py#L1570
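To make the difference concrete, here is a minimal Python sketch that applies two candidate orderings to the same rules. The glyph IDs are made up, the tie-break direction (ascending GID) is an assumption, and neither function is the actual makeotf or FontTools code.

```python
# Hypothetical glyph order; real GIDs depend on the font.
GID = {"f": 10, "i": 11, "l": 12, "t": 13}

# Rules in source order: (component glyphs, ligature glyph).
rules = [
    (("f", "t"), "f_t"),
    (("f", "f", "i"), "f_f_i"),
    (("f", "l"), "f_l"),
    (("f", "i"), "f_i"),
]

def by_length_then_gid(rules):
    # Longest first; ties broken by component GIDs (assumed ascending).
    return sorted(rules, key=lambda r: (-len(r[0]), [GID[g] for g in r[0]]))

def by_length_only(rules):
    # Longest first; Python's sort is stable, so ties keep source order.
    return sorted(rules, key=lambda r: -len(r[0]))

print([lig for _, lig in by_length_then_gid(rules)])
print([lig for _, lig in by_length_only(rules)])
```

With these made-up GIDs, the two functions agree on putting f_f_i first but order the three two-glyph rules differently, so two compilers using different tie-breaks would emit different binaries from the same source.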

@khaledhosny khaledhosny changed the title Reordering of ligature substitution is considered harmful Reordering of ligature substitution rules is considered harmful Nov 16, 2023
@Lorp

Lorp commented Nov 16, 2023

Is the sort by glyphId simply to ensure consistent results between different sort algos?

@khaledhosny
Collaborator Author

I don’t think there is any point in sorting by GID, as it changes the meaning of the code and is far worse than sorting by length, since that one is at least potentially desirable.

@Lorp

Lorp commented Nov 17, 2023

Right, I was assuming the sort by GID was a secondary sort after the sort by length. Still, that could be confusing if you have some equal-length subs that you need to happen in sequence.

@anthrotype
Member

anthrotype commented Jan 22, 2024

FontTools only sorts by length

Well, actually it sorts by length first and then alphabetically by the ligature component glyph names.
fea-rs, I believe, sorts by length and then GID, similar to makeotf if I understand correctly.
I can see situations where the sorting is undesirable altogether. Ideally one should be able to opt out. For the default behavior I suppose we should stick to one officially documented ordering.

@cmyr

cmyr commented Jan 23, 2024

So I've been revisiting this question along with @anthrotype, because there was a slight difference in the sorting behaviour of fea-rs (rust) and feaLib (python, fonttools) for these ligature rules, and for purposes of testing we try to have these two tools generate the same output wherever it is (ahem) feasible.

Currently, fea-rs matches afdko, but feaLib uses glyph names, not GIDs, to determine the ordering within a given LigatureSet table. We are now looking at standardizing on a single sorting approach, that accounts only for length, and is stable (in the order declared in the input) for ligatures within a ligature set. That is, given the following FEA,

sub f i by f_i;
sub f f f by f_f_f;
sub f f by f_f;
sub f f i by f_f_i;

we will end up with the final ordering,

f_f_f
f_f_i
f_i
f_f
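As a rough illustration of that approach (a sketch, not the fea-rs or feaLib implementation), the rules can be grouped into ligature sets by first glyph and each set sorted by descending length with a stable sort. The function name `build_ligature_sets` is hypothetical.

```python
from collections import defaultdict

def build_ligature_sets(rules):
    """Group rules by first glyph, then sort each set longest-first.

    sorted() is stable, so rules of equal length keep declaration order.
    """
    sets = defaultdict(list)
    for components, ligature in rules:
        sets[components[0]].append((components, ligature))
    return {
        first: sorted(entries, key=lambda e: -len(e[0]))
        for first, entries in sets.items()
    }

# The FEA rules above, in declaration order.
rules = [
    (("f", "i"), "f_i"),
    (("f", "f", "f"), "f_f_f"),
    (("f", "f"), "f_f"),
    (("f", "f", "i"), "f_f_i"),
]

ligature_sets = build_ligature_sets(rules)
print([lig for _, lig in ligature_sets["f"]])
# ['f_f_f', 'f_f_i', 'f_i', 'f_f']
```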

In thinking about this, I have been trying to understand @khaledhosny's concerns about the sorting behaviour, specifically by trying to come up with some example of input text + ligature rules where the (unexpected) sorting behaviour could interfere with the designer's intentions, and I'm struggling to come up with any.

My current understanding:

  • Ligatures (within a lookup) are always grouped into ligature sets, keyed by their first glyph.
  • Within a ligature set, the only possible 'interference' is when one ligature is a prefix of another (e.g. f f is a prefix of f f i); in that case, if the shorter one occurs earlier in the set, the longer ligature will be unreachable.
  • If ligatures are of equal length, then one cannot be a prefix of the other, by definition.
  • If ligatures do not start with the same glyph, then they will end up in different ligature sets anyway, and declaration order is irrelevant.
  • I cannot come up with a good argument for not ordering longer ligatures ahead of shorter ones. If we were not going to do this, then I think we should just drop the longer ones completely, since they are going to be unreachable and are just dead bytes.
  • The example in the spec (and quoted in the original issue here) is slightly misleading, since o f f i is going to end up in a different ligature set than f f i, and will always be applied before f f i if it occurs, since the logical cursor will match the o before seeing the f.
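The points above can be checked with a small simulation. This is a deliberately simplified model of ligature substitution (first match within the set for the current glyph, no lookup flags or glyph skipping), and `apply_ligatures` is a hypothetical helper, not part of any shaping library.

```python
def apply_ligatures(glyphs, ligature_sets):
    """Walk the glyph run; at each position try the set for that glyph in order."""
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in ligature_sets.get(glyphs[i], []):
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:
            # No ligature matched at this position; pass the glyph through.
            out.append(glyphs[i])
            i += 1
    return out

short_first = {"f": [(("f", "f"), "f_f"), (("f", "f", "i"), "f_f_i")]}
long_first = {"f": [(("f", "f", "i"), "f_f_i"), (("f", "f"), "f_f")]}

# With the prefix listed first, f_f_i can never be reached.
print(apply_ligatures(["f", "f", "i"], short_first))  # ['f_f', 'i']
print(apply_ligatures(["f", "f", "i"], long_first))   # ['f_f_i']

# o f f i lives in the "o" set, so it matches before any "f" rule is tried.
with_o = {"o": [(("o", "f", "f", "i"), "o_f_f_i")], **long_first}
print(apply_ligatures(["o", "f", "f", "i"], with_o))  # ['o_f_f_i']
```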

Am I missing anything? Does anyone have an example of an input string and a set of ligature rules where the sorting behaviour would confound the designer's intentions?

I think it would be nice, if the spec is going to suggest sorting, that it define how that sorting should occur, and I think that a sorting that considers only length and otherwise respects declaration order is the simplest; but I don't think this is hugely important, since as far as I can tell it should have no impact on the shaping behaviour.

@anthrotype
Member

Thanks Colin for clarifying the non-issue. We should not be talking about the ordering of ligatures in general (as they appear in the feature.fea) but about the order within a given ligature set keyed by first glyph, with the ligature sets themselves always necessarily sorted by glyph ID as per the OpenType spec (no matter what the FEA or the font developer says).
I agree that not ordering longer ligatures ahead of shorter ones may lead to some becoming unreachable -- why even bother having an f_f_i ligature if f_f would always match first?! So it makes sense to keep sorting ligatures within a set by the number of ligature components.
I also now see that even for different ligatures of equal length (within a set), it doesn't really matter in which order they appear: either they will match the input string or they will not. So for these, the only reason for specifying some order is consistency across implementations. We can sort by GID (like makeotf and fea-rs do), by glyph name (like fonttools does), or not sort these (equal-length ligatures with the same first glyph) and keep them in the same order as written in the FEA. I think overall the latter is the least effort for anybody, so +1 to this.
