Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Letter and word spacing #171

Merged
merged 6 commits into from
Jan 8, 2023

Conversation

therahedwig
Copy link
Contributor

Implements letter and word spacing. This patch applies on top of #169

image

As discussed in #166 , LTR has the spacing added to the advance, RTL added to offset and advance, TTB spacing 'removed' from the advance because TTB advance is negative.

Fixes #166

@therahedwig
Copy link
Contributor Author

I've rebased it o it doesn't apply on the baseline patch anymore, because I am increasingly skeptical I can get that one to work.

src/raqm.c Show resolved Hide resolved
src/raqm.c Show resolved Hide resolved
src/raqm.c Outdated Show resolved Hide resolved
src/raqm.c Show resolved Hide resolved
src/raqm.c Show resolved Hide resolved
@khaledhosny
Copy link
Collaborator

I think the letter spacing logic needs to be reworked. You currently iterate over input code points, but you need to iterate over grapheme clusters (not to be confused with HarfBuzz clusters) and add the letter spacing to the last code point in the cluster.

Then, when adding the spacing to the glyphs themselves, you need to add it to the last glyph (in visual order) in any sequence of glyphs that has the same HarfBuzz cluster. You can test Latin text with decomposed combining marks (e.g. x̣v̇) and Arabic text with vowel marks (e.g. وَرِقْ).

Furthermore, you need to disable ligatures when letter spacing is used (at least liga features, but I think browsers disable a few other features as well), otherwise text with ligatures will be unevenly spaced.

@therahedwig
Copy link
Contributor Author

therahedwig commented May 7, 2022

I think the letter spacing logic needs to be reworked. You currently iterate over input code points, but you need to iterate over grapheme clusters (not to be confused with HarfBuzz clusters) and add the letter spacing to the last code point in the cluster.

Then, when adding the spacing to the glyphs themselves, you need to add it to the last glyph (in visual order) in any sequence of glyphs that has the same HarfBuzz cluster. You can test Latin text with decomposed combining marks (e.g. x̣v̇) and Arabic text with vowel marks (e.g. وَرِقْ).

Ok, so I had some trouble getting this to work. Namely, when I reuse raqm's grapheme break functionality, and then use the harfbuzz clusters to apply spacing, the spacing just did not get applied, especially with grapheme clusters that had more than two characters. Part of my confusion here is also that : if harfbuzz clusters and grapheme clusters are not the same thing, why are we applying these to the last entry in the grapheme cluster, furthermore, if we are only applying letter spacing once per harfbuzz cluster, why do we worry about grapheme clusters at all?

I currently have offset added to RTL characters that aren't the last in their cluster, which works, but I now realize that's somewhat inconsistent and potentially very troublsome in the future, and will try to adjust this now.

Furthermore, you need to disable ligatures when letter spacing is used (at least liga features, but I think browsers disable a few other features as well), otherwise text with ligatures will be unevenly spaced.

I've implemented this in the letter-spacing function, using clig and liga, as those are what css-fonts specifies are the common ligatures. I could also turn off dlig, hlig and calt, as all those are also considered ligatures according to css-fonts. I am reusing the set_font_feature function here, as that reduces code duplication, but that does rely on letter-spacing to happen after setting the font-features. If you'd prefer this to happen just before shaping, please say so.

Furthermore, I am having a bit of trouble with the tests: It seems deja vu sans doesn't do decomposition for latin with diacritics, so testing letterspacing on latin is somewhat tricky. Is this why we should be using the grapheme clusters?

EDIT: I forgot to say that I did apply a check for grapheme breaks for wordspacing, as I've noticed that browsers don't consider a space with a combining mark a word space. I have also fixed RTL now.

@therahedwig
Copy link
Contributor Author

Ok, after a night of sleep, I managed to get grapheme breaking to work by checking whether the loop is at the start of a cluster instead of the end, and then adding the spacing to the first codepoint.

I can't get my head around applying it to the last code point, even though I understand there might be a situation where, say, the grapheme cluster goes from 3 to 9 and the harfbuzz cluster from 3 to 5. I've tried the following, which would need adjustment for RTL, but should work on LTR:

diff --git a/src/raqm.c b/src/raqm.c
index 2e7ac71..452532a 100644
--- a/src/raqm.c
+++ b/src/raqm.c
@@ -1086,28 +1086,25 @@ _raqm_set_spacing (raqm_t *rq,
 
   for (size_t i = start; i < end; i++)
   {
-    bool set_spacing = i == 0;
+    bool set_spacing = i == rq->text_len - 1;
     if (!set_spacing)
-      set_spacing = _raqm_allowed_grapheme_boundary (rq->text[i-1], rq->text[i]);
+      set_spacing = _raqm_allowed_grapheme_boundary (rq->text[i], rq->text[i+1]);
 
     if (set_spacing)
     {
       if (word_spacing)
       {
-        if (_raqm_allowed_grapheme_boundary (rq->text[i], rq->text[i+1]))
+        /* CSS word seperators, word spacing is only applied on these.*/
+        if (rq->text[i] == 0x0020  || /* Space */
+            rq->text[i] == 0x00A0  || /* No Break Space */
+            rq->text[i] == 0x1361  || /* Ethiopic Word Space */
+            rq->text[i] == 0x10100 || /* Aegean Word Seperator Line */
+            rq->text[i] == 0x10101 || /* Aegean Word Seperator Dot */
+            rq->text[i] == 0x1039F || /* Ugaric Word Divider */
+            rq->text[i] == 0x1091F)   /* Phoenician Word Separator */
         {
-          /* CSS word seperators, word spacing is only applied on these.*/
-          if (rq->text[i] == 0x0020  || /* Space */
-              rq->text[i] == 0x00A0  || /* No Break Space */
-              rq->text[i] == 0x1361  || /* Ethiopic Word Space */
-              rq->text[i] == 0x10100 || /* Aegean Word Seperator Line */
-              rq->text[i] == 0x10101 || /* Aegean Word Seperator Dot */
-              rq->text[i] == 0x1039F || /* Ugaric Word Divider */
-              rq->text[i] == 0x1091F)   /* Phoenician Word Separator */
-          {
-            rq->text_info[i].spacing_after = spacing;
-            rq->text_info[i].spacing_is_percentage = percentage;
-          }
+          rq->text_info[i].spacing_after = spacing;
+          rq->text_info[i].spacing_is_percentage = percentage;
         }
       }
       else
@@ -2175,11 +2172,22 @@ _raqm_shape (raqm_t *rq)
       pos = hb_buffer_get_glyph_positions (run->buffer, &len);
       info = hb_buffer_get_glyph_infos (run->buffer, &len);
 
+      unsigned int cluster_length = 0;
+      unsigned int current_cluster = 0;
       for (unsigned int i = 0; i < len; i++)
       {
         _raqm_ft_transform (&pos[i].x_advance, &pos[i].y_advance, matrix);
         _raqm_ft_transform (&pos[i].x_offset, &pos[i].y_offset, matrix);
 
+        if (current_cluster == info[i].cluster)
+        {
+          cluster_length += 1;
+        }
+        else
+        {
+          current_cluster = info[i].cluster;
+          cluster_length = 0;
+        }
         bool set_spacing = false;
         if (run->direction == HB_DIRECTION_RTL)
         {
@@ -2193,7 +2201,7 @@ _raqm_shape (raqm_t *rq)
             set_spacing = info[i].cluster != info[i+1].cluster;
         }
 
-        _raqm_text_info rq_info = rq->text_info[info[i].cluster];
+        _raqm_text_info rq_info = rq->text_info[current_cluster + cluster_length];
 
         if (rq_info.spacing_after != 0 && set_spacing)
         {

The problem I am seeing is that Deja Vu Sans and Noto Sans will give different results for letter spacing for letters with excessive combining marks, with the former spacing out each letter while the latter clumps some letters. With the current solution, all my test-cases work, though I am worried about the edge case identified earlier on in the post.

I've also added disabling the other opentype features.

It isn't actually parrt of CSS and seems odd in any case.
@therahedwig
Copy link
Contributor Author

Ok, half a year later I haven't really found an example of the problem I was worried about, despite my other text work, and in hindsight it may have also been a bug somewhere else...

I've removed 'percentage' based spacing because it is not actually in the CSS spec, and I must've mixed something up.

@khaledhosny khaledhosny merged commit d63abe5 into HOST-Oman:master Jan 8, 2023
@khaledhosny
Copy link
Collaborator

Thanks! (and sorry for the late review).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Letter spacing(Tracking) and word spacing
2 participants