Re: Treatment of Variation Selectors in the Client

I think looking at a more general mechanism would be good, both for the UVS case and also for the Emoji ligature cases.

I continue to suspect that that mechanism would probably not wind up helping most "traditional" ligatures, because there won't be enough of them to warrant a separate patch, and aggregating a bunch of them together artificially into a "ligature patch" would be counterproductive (especially if you're effectively always going to load that extra patch in practice, because liga is on by default and the source glyphs are high-frequency relative to the script). (Maybe there are languages where ligatures are prevalent enough that it would help there.)

So the trick will be coming up with the cases where it will really help and (I suspect) knowing where to cut things off so that cases that don't need help won't get in the way.

Skef
________________________________
From: Garret Rieger <grieger@google.com>
Sent: Friday, October 18, 2024 10:23 AM
To: Skef Iterum <siterum@adobe.com>
Cc: John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
Subject: Re: Treatment of Variation Selectors in the Client


In HarfBuzz we treat UVS as normal codepoints; that is, they are specified as part of the input unicode set. However, we also run a special glyph closure against cmap14. In the closure we check for UVS sequences that can be activated and pull in any glyphs required to support those sequences. HarfBuzz always treats the input unicode set as unsorted, so we'll include any alternate glyphs that could be reached by any ordering of the input unicode set. For example, if you run a subsetting operation that asks for the unicode set:

{CJK Codepoint 1, CJK Codepoint 2, VS2}, where VS2 causes a non-default glyph swap for those codepoints, then the retained glyph set will be expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK 2 VS2 Glyph}, but won't include other alternate glyphs reachable only via codepoints not included in the input.
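
The closure described above can be sketched roughly as follows. This is a toy model, not HarfBuzz's actual code: the function name, the `uvs_map` dictionary (standing in for cmap format 14), the glyph names, and the two CJK codepoints are all assumptions made for illustration.

```python
# Toy model of a cmap14-style closure. Hypothetical names throughout;
# this does not reproduce HarfBuzz's implementation.

VS2 = 0xFE01  # VARIATION SELECTOR-2
VS3 = 0xFE02  # VARIATION SELECTOR-3

def uvs_glyph_closure(input_unicodes, default_cmap, uvs_map):
    """Return the glyphs retained for an unordered input codepoint set."""
    # Default glyphs for every requested codepoint that the font maps.
    glyphs = {default_cmap[cp] for cp in input_unicodes if cp in default_cmap}
    for (base, selector), glyph in uvs_map.items():
        # Ordering is ignored: a sequence counts as reachable as soon as
        # both the base and the selector are in the input set.
        if base in input_unicodes and selector in input_unicodes:
            glyphs.add(glyph)
    return glyphs

# Two arbitrary CJK codepoints standing in for "CJK Codepoint 1" and "2".
default_cmap = {0x4E08: "cjk1.default", 0x4E09: "cjk2.default"}
uvs_map = {
    (0x4E08, VS2): "cjk1.vs2",
    (0x4E09, VS2): "cjk2.vs2",
    (0x4E08, VS3): "cjk1.vs3",  # not reachable unless VS3 is requested
}

retained = uvs_glyph_closure({0x4E08, 0x4E09, VS2}, default_cmap, uvs_map)
```

With this input the VS3 alternate is left out, because VS3 is not in the requested set.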

Now for IFT there are two ways you could handle encoding a font that uses UVS with what we currently have:

  1.  Always include all possible alternate glyphs in the patches that contain the base glyph. This is of course wasteful if the alternates aren't needed.
  2.  Use a trick similar to how we handle VF axis extension: have a table-keyed patch, matched only on a single VS codepoint, which changes the set of glyph-keyed patches listed in the font to ones that include the appropriate alternate glyphs. The downside of this approach is that because this patch needs to be full invalidation, it incurs a full extra round trip.
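
The trade-off between the two options can be illustrated with a small model. Everything here is hypothetical (the patch lists, glyph names, the chosen codepoint, and the way round trips are counted); it only mirrors the description above and is not how an IFT client is actually implemented.

```python
# Toy comparison of the two encodings. All names and data are made up.

VS2 = 0xFE01   # VARIATION SELECTOR-2
CJK1 = 0x4E08  # arbitrary CJK codepoint used as the base

# Option 1: the glyph keyed patch always carries every alternate.
OPTION1 = [{"codepoints": {CJK1}, "glyphs": {"cjk1.default", "cjk1.vs2"}}]

# Option 2: the initial patch list omits alternates; a table keyed patch
# matched on VS2 alone swaps in a list whose patches include them.
OPTION2_INITIAL = [{"codepoints": {CJK1}, "glyphs": {"cjk1.default"}}]
OPTION2_AFTER_VS2 = [{"codepoints": {CJK1}, "glyphs": {"cjk1.default", "cjk1.vs2"}}]

def matched_glyphs(patch_list, input_cps):
    """Union the glyphs of every patch whose codepoints intersect the input."""
    glyphs = set()
    for patch in patch_list:
        if patch["codepoints"] & input_cps:
            glyphs |= patch["glyphs"]
    return glyphs

def load_option1(input_cps):
    # One round of glyph keyed fetches; the alternates always come along.
    return 1, matched_glyphs(OPTION1, input_cps)

def load_option2(input_cps):
    if VS2 in input_cps:
        # Round 1 fetches the table keyed patch, round 2 fetches the
        # glyph keyed patches from the list it installs.
        return 2, matched_glyphs(OPTION2_AFTER_VS2, input_cps)
    return 1, matched_glyphs(OPTION2_INITIAL, input_cps)
```

In this model option 1 pays in bytes when the selector is never used, while option 2 pays in latency when it is.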

Neither of these options is great. The fundamental problem we run into is that codepoint sets are matched via intersection, so it's not currently possible to express "I want patch X only if codepoint a AND codepoint b are present" (codepoints are always matched with OR). Skef's suggestion to add a UVSRecords, which would act like FeatureRecords, gets around this by effectively introducing a second codepoint set which is matched with AND, since the matching algorithm uses AND between the top-level sets. However, since this same problem comes up in more places than just UVS sequences (e.g. ligatures), I think we should look for a more general solution: a mechanism which allows multiple codepoint sets to be attached to an entry and requires all of the sets to intersect for the entry to match (e.g. intersection(input codepoints, set 1) AND intersection(input codepoints, set 2)). Format 2 has the notion of a copy index, which allows constructing an entry by unioning other entries together; we could introduce an alternate mode on this which treats the combined entries as all needing to report intersections for the top-level entry to match. This would give the ability to create patches that pull in only the needed alternate glyphs when a UVS codepoint is present. I'll need to look at how to incorporate this into the spec without making things too complicated.
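
The difference between the current OR matching and the proposed "all sets must intersect" mode can be sketched as below. This is a simplified model of codepoint matching only; the function names and example sets are assumptions and do not come from the spec.

```python
# Toy model of entry matching on codepoints only. Hypothetical names.

def matches_any(codepoint_set, input_codepoints):
    # Current behaviour as described above: a single shared codepoint
    # is enough for the entry to match (OR within the set).
    return bool(codepoint_set & input_codepoints)

def matches_all_sets(codepoint_sets, input_codepoints):
    # Proposed conjunctive mode: every attached set must intersect the
    # input, i.e. an AND over per-set intersections.
    return all(matches_any(s, input_codepoints) for s in codepoint_sets)

bases = {0x4E08, 0x4E09}   # arbitrary CJK base codepoints
selectors = {0xFE01}       # VARIATION SELECTOR-2

# Folding the selector into one set matches on a base alone, so the
# alternates patch would load even when no selector is in use.
loads_without_selector = matches_any(bases | selectors, {0x4E08})

# With two sets that must both intersect, the patch is only selected
# when a base and the selector are both present.
base_only = matches_all_sets([bases, selectors], {0x4E08})
base_and_selector = matches_all_sets([bases, selectors], {0x4E08, 0xFE01})
```

The same shape would cover a ligature, with one set per component codepoint.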

On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:
Perhaps the thinking is that USVs could be applied downstream?
It's probably something like this, but there's still a missing underlying explanation. After all, you might need any given codepoint downstream and the premise of subsetting is that you know what you will and won't need. One could instead treat the SVs like default-active layout features, putting them in a list that's added to the unicodes by default but allowing you to override that. But that's not what seems to have happened.
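
The alternative mentioned above, treating the SVs like default-active layout features, might look something like this. It is a hypothetical sketch only: as noted, it is not what the subsetter appears to do, and the function and parameter names are invented for illustration.

```python
# Hypothetical sketch: variation selectors as a default-on list that the
# caller can override. Not an existing HarfBuzz option.

# VS1..VS16 (U+FE00..U+FE0F) plus VS17..VS256 (U+E0100..U+E01EF).
DEFAULT_SELECTORS = frozenset(range(0xFE00, 0xFE10)) | frozenset(range(0xE0100, 0xE01F0))

def effective_unicodes(requested, selectors=DEFAULT_SELECTORS):
    """Add the default selector list to the requested codepoints.

    Passing selectors=frozenset() opts out, much like disabling a
    default-active layout feature.
    """
    return set(requested) | set(selectors)

with_defaults = effective_unicodes({0x4E08})
without_defaults = effective_unicodes({0x4E08}, selectors=frozenset())
```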

If I had to guess, I would say that the rationale for how HarfBuzz works is probably "this functionality isn't widely understood and maybe not even known, so we shouldn't rely on users specifically adding the variation selectors they might need." And if that's more or less what happened, I'm not sure the same answer should apply to IFT, because the spec strongly encourages providing everything in the font, and we plan to do that; it's just a question of where. So as long as "the client" (or some clients) can know whether it's about to use an SV, it might make sense to patch more cleverly on that basis. (And if some clients don't know, they can always add the SVs into the codepoint list, at the cost of loading extra patches.)

Skef
________________________________
From: John Hudson <john@tiro.ca>
Sent: Thursday, October 17, 2024 4:57 PM
To: public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
Subject: Re: Treatment of Variation Selectors in the Client

On 2024-10-17 15:52, Skef Iterum wrote:
> Beyond that, though, the question is how flexible we can be in
> satisfying the glyph closure requirement. It seems like the strategy
> used for static subsets (again, if I'm reading the code right) is to
> treat the variation selectors as "extra", not considering them as part
> of the list of unicodes to be preserved (or not). So, for example,
> even if VS 1 isn't in the list of codepoints to be preserved, you can
> still get glyphs only accessible using VS 1. It's not clear to me why
> that's the case.

Perhaps the thinking is that USVs could be applied downstream?

I’m giving a talk at UTW next week that touches on applying formatting
control characters in buffered states to affect text modes for readers.
My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be
applied to USVs.

J.


--

John Hudson
Tiro Typeworks Ltd    http://www.tiro.com/

Tiro Typeworks is physically located on islands
in the Salish Sea, on the traditional territory
of the Snuneymuxw and Penelakut First Nations.

__________

EMAIL HOUR
In the interests of productivity, I am only dealing
with email towards the end of the day, typically
between 4PM and 5PM. If you need to contact me more
urgently, please use other means.

Received on Friday, 18 October 2024 21:09:54 UTC