Re: Treatment of Variation Selectors in the Client

In HarfBuzz we treat UVS as normal codepoints; that is, they are specified
as part of the input unicodes set. However, we also run a special glyph
closure against the cmap format 14 subtable (cmap14). In the closure we
check for UVS sequences that can be activated and pull in any glyphs
required to support those sequences. HarfBuzz always treats the input
unicode set as unordered, so we'll include any alternate glyphs that could
be reached by any ordering of the input unicode set. For example, if you
run a subsetting operation that asks for the unicode set:

{CJK Codepoint 1, CJK Codepoint 2, VS2}, where VS2 causes a non-default
glyph swap for those codepoints, then the retained glyph set will be
expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK
2 VS2 Glyph}, but won't include other alternate glyphs that are only
reachable via codepoints not included in the input.
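
To make the closure behaviour concrete, here is a rough Python sketch. It is
not the HarfBuzz implementation: the function name, the dict-based cmap, and
the (base, selector) -> glyph map are all made up for illustration, and it
only models non-default UVS mappings.

```python
def uvs_glyph_closure(unicodes, cmap, non_default_uvs):
    """Glyphs to retain for an unordered input codepoint set.

    cmap:            base codepoint -> default glyph (hypothetical layout)
    non_default_uvs: (base codepoint, selector) -> alternate glyph
                     (a simplified stand-in for cmap14 non-default mappings)
    """
    unicodes = set(unicodes)  # ordering of the input is deliberately ignored
    glyphs = {cmap[cp] for cp in unicodes if cp in cmap}
    for (base, selector), glyph in non_default_uvs.items():
        # A sequence can be activated only if both halves were requested.
        if base in unicodes and selector in unicodes:
            glyphs.add(glyph)
    return glyphs


# Made-up codepoints and glyph names, mirroring the example above.
CJK1, CJK2, CJK3 = 0x4E00, 0x4E01, 0x4E02
VS2, VS3 = 0xFE01, 0xFE02
cmap = {CJK1: "cjk1", CJK2: "cjk2", CJK3: "cjk3"}
non_default_uvs = {
    (CJK1, VS2): "cjk1.vs2",
    (CJK2, VS2): "cjk2.vs2",
    (CJK1, VS3): "cjk1.vs3",  # VS3 not requested, so not retained
    (CJK3, VS2): "cjk3.vs2",  # CJK3 not requested, so not retained
}

retained = uvs_glyph_closure({CJK1, CJK2, VS2}, cmap, non_default_uvs)
print(sorted(retained))  # ['cjk1', 'cjk1.vs2', 'cjk2', 'cjk2.vs2']
```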

Now for IFT there are two ways you could handle encoding a font that uses
UVS with what we currently have:

   1. Always include all possible alternate glyphs in the patches that
   contain the base glyph. This is of course wasteful if the alternates aren't
   needed.
   2. Use a trick similar to how we handle VF axis extension: have a table
   keyed patch which is matched only on a single VS codepoint and which
   changes the set of glyph keyed patches listed in the font to ones that
   include the appropriate alternate glyphs. The downside of this approach is
   that because this patch needs to be full invalidation, it incurs a full
   extra round trip.

Neither of these options is great. The fundamental problem we run into is
that codepoint sets are matched via intersection, so it's not currently
possible to express "I want patch X only if codepoint a AND codepoint b are
present" (codepoints are always matched with OR). Skef's suggestion to add
a UVSRecords, which acts like FeatureRecords, gets around this by
effectively introducing a second codepoint set which is matched with AND,
since the matching algorithm uses AND between the top level sets.

However, since this same problem comes up in more places than just UVS
sequences (e.g. ligatures), I think we should look for a more general
solution: a mechanism which allows multiple codepoint sets to be attached
to an entry and requires all of the sets to intersect the input for the
entry to match (e.g. intersection(input codepoints, set 1) AND
intersection(input codepoints, set 2)). Format 2 has the notion of a copy
index, which allows constructing an entry by unioning other entries
together. We could introduce an alternate mode on this which treats the
combined entries as all needing to report intersections for the top level
entry to match. This would give the ability to create patches that pull in
only the needed alternate glyphs when a UVS codepoint is present. I'll need
to look at how to incorporate this into the spec without making things too
complicated.
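
As a rough illustration of the matching rule I have in mind, here is a small
Python sketch. None of this is spec text or an actual encoding: the Entry
class, the require_all flag, and the patch names are hypothetical, and the
sketch only contrasts today's OR matching with the proposed "all sets must
intersect" mode.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical patch map entry carrying several codepoint sets."""
    patch: str
    codepoint_sets: Tuple[FrozenSet[int], ...]
    require_all: bool = False  # False: union (OR); True: proposed AND mode


def matches(entry, input_codepoints):
    """True if the entry should be selected for the given input codepoints."""
    hits = [bool(s & input_codepoints) for s in entry.codepoint_sets]
    return all(hits) if entry.require_all else any(hits)


# Made-up codepoints: two CJK bases and one variation selector.
CJK1, CJK2, VS2 = 0x4E00, 0x4E01, 0xFE01
bases = frozenset({CJK1, CJK2})

entries = [
    Entry("base-glyphs", (bases,)),
    # Matches only when a base AND the selector are both in the input.
    Entry("vs2-alternates", (bases, frozenset({VS2})), require_all=True),
]


def selected(input_codepoints):
    """Names of the patches whose entries match the input codepoints."""
    return [e.patch for e in entries if matches(e, frozenset(input_codepoints))]


print(selected({CJK1}))       # ['base-glyphs']
print(selected({CJK1, VS2}))  # ['base-glyphs', 'vs2-alternates']
```

With something like this the alternates patch is only requested when the
client actually sees the selector alongside one of the base codepoints, and
the same construct would cover ligature-style cases.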

On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:

> Perhaps the thinking is that USVs could be applied downstream?
>
> It's probably something like this, but there's still a missing underlying
> explanation. After all, you might need *any* given codepoint downstream
> and the premise of subsetting is that you know what you will and won't
> need. One could instead treat the SVs like default-active layout features,
> putting them in a list that's added to the unicodes by default but allowing
> you to override that. But that's not what seems to have happened.
>
> If I had to guess I would say that the rationale for how HarfBuzz works is
> probably "this functionality isn't widely understood and maybe even not
> known, so we shouldn't rely on users specifically adding the variation
> selectors they might need." And if that's more or less what happened I'm
> not sure the same answer should apply to IFT, because the spec strongly
> encourages providing *everything* in the font, and we plan to do that,
> it's just a question of *where*. So as long as "the client" (or *some* clients)
> can know whether it's about to use an SV, it might make sense to patch more
> cleverly on that basis. (And if some clients don't know, they can always
> add the SVs into the codepoint list, at the cost of loading extra patches.)
>
> Skef
> ------------------------------
> *From:* John Hudson <john@tiro.ca>
> *Sent:* Thursday, October 17, 2024 4:57 PM
> *To:* public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
> On 2024-10-17 15:52, Skef Iterum wrote:
> > Beyond that, though, the question is how flexible we can be in
> > satisfying the glyph closure requirement. It seems like the strategy
> > used for static subsets (again, if I'm reading the code right) is to
> > treat the variation selectors as "extra", not considering them as part
> > of the list of unicodes to be preserved (or not). So, for example,
> > even if VS 1 isn't in the list of codepoints to be preserved, you can
> > still get glyphs only accessible using VS 1. It's not clear to me why
> > that's the case.
>
> Perhaps the thinking is that USVs could be applied downstream?
>
> I’m giving a talk at UTW next week that touches on applying formatting
> control characters in buffered states to affect text modes for readers.
> My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be
> applied to USVs.
>
> J.
>
>
> --
>
> John Hudson
> Tiro Typeworks Ltd
> http://www.tiro.com/
>
> Tiro Typeworks is physically located on islands
> in the Salish Sea, on the traditional territory
> of the Snuneymuxw and Penelakut First Nations.
>
> __________
>
> EMAIL HOUR
> In the interests of productivity, I am only dealing
> with email towards the end of the day, typically
> between 4PM and 5PM. If you need to contact me more
> urgently, please use other means.
>
>
>

Received on Friday, 18 October 2024 17:24:16 UTC