Re: Treatment of Variation Selectors in the Client

I put together a draft PR which adds a general purpose mechanism for
handling UVS and other similar situations where you want to load patches
only if multiple conditions are met: https://github.com/w3c/IFT/pull/222

This works pretty similarly to my proposal above:

   - The logical entry structure we use throughout the spec has been
   updated to have one or more attached subset definitions (previously there
   was only one).
   - The intersection check requires that all attached subset definitions
   match the input for the whole entry to match.
   - Updated format 2 to provide a way to specify entries with more than
   one subset definition.
   - Format 1 remains unmodified and always produces entries with only one
   subset definition.
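
As a rough illustration of the matching rule described in the list above,
here is a minimal sketch. It is not taken from the spec or the PR: the names
`Entry` and `entry_matches` are hypothetical, each subset definition is
reduced to a plain codepoint set (features and design space are ignored),
and the codepoint values are arbitrary examples.

```python
from dataclasses import dataclass


# Hypothetical model, not from the spec: a subset definition is treated
# as nothing more than a set of codepoints.
@dataclass(frozen=True)
class Entry:
    patch: str
    subset_definitions: tuple[frozenset[int], ...]


def entry_matches(entry: Entry, input_codepoints: set[int]) -> bool:
    # Within one subset definition any shared codepoint is enough (OR);
    # across subset definitions every one must intersect (AND).
    return all(defn & input_codepoints for defn in entry.subset_definitions)


# Example values chosen for illustration only.
VS = 0xE0101  # an ideographic variation selector
base_codepoints = frozenset({0x8FBB, 0x845B})
uvs_entry = Entry("alternates.patch", (base_codepoints, frozenset({VS})))

print(entry_matches(uvs_entry, {0x8FBB}))      # base codepoint only
print(entry_matches(uvs_entry, {0x8FBB, VS}))  # base codepoint plus selector
```

With only a base codepoint in the input the entry does not match; it matches
once the variation selector is present as well, which is the "load this patch
only if both conditions are met" behaviour.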


Re: ligatures, agreed that this approach is likely overkill for most
ligature cases, where only a small number of ligatures are attached to
specific characters. This mechanism will be primarily useful for things
like UVS, where there are enough alternates to make it worthwhile to
put them in patches of their own.

On Fri, Oct 18, 2024 at 3:47 PM Skef Iterum <siterum@adobe.com> wrote:

> Or, rather, knowing where to cut things off ...
>
> Skef
> ------------------------------
> *From:* Skef Iterum <siterum@adobe.com>
> *Sent:* Friday, October 18, 2024 2:09 PM
> *To:* Garret Rieger <grieger@google.com>
> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <
> public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
> I think looking at a more general mechanism would be good, both for the
> UVS case and also for the Emoji ligature cases.
>
> I continue to suspect that that mechanism would probably not wind up
> helping most "traditional" ligatures, because there won't be enough of them
> to warrant a separate patch, and aggregating a bunch of them together
> artificially into a "ligature patch" would be counterproductive (especially
> if you're effectively always going to load that extra patch in practice,
> because liga is on by default and the source glyphs are high-frequency
> relative to the script). (Maybe there are languages where ligatures are
> prevalent enough that it would help there.)
>
> So the trick will be coming up with the cases where it will really help
> and (I suspect) not knowing where to cut things off so that cases that
> don't need help won't get in the way.
>
> Skef
> ------------------------------
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Friday, October 18, 2024 10:23 AM
> *To:* Skef Iterum <siterum@adobe.com>
> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <
> public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
>
> *EXTERNAL: Use caution when clicking on links or opening attachments.*
>
>
> In harfbuzz we treat UVS as normal codepoints, that is they are specified
> as part of the input unicodes set. However, we also run a special glyph
> closure against cmap14. In the closure we check for UVS sequences that can
> be activated and pull in any required glyphs to support those sequences.
> Harfbuzz always treats the input unicode set as unsorted so we'll include
> any alternate glyphs that could be reached by any ordering of the input
> unicode set. For example if you run a subsetting operation that asks for
> the unicode set:
>
> {CJK Codepoint 1, CJK Codepoint 2, VS2} where VS2 causes a non default
> glyph swap for those codepoints then the retained glyph set will be
> expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK
> 2 VS2 Glyph}, but won't include other alternate glyphs reachable via
> codepoints not included in the input.
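
The closure behaviour described above can be sketched roughly as follows.
This is not harfbuzz code and does not use its API: the tables `CMAP` and
`CMAP14`, the glyph names, and the function `uvs_glyph_closure` are all
hypothetical, and only non-default UVS mappings are modelled.

```python
# Hypothetical data, not a real font: a default cmap plus a cmap14-like
# table mapping (base codepoint, variation selector) -> non-default glyph.
CMAP = {0x4E00: "cjk1.default", 0x4E01: "cjk2.default", 0x4E02: "cjk3.default"}
VS2 = 0xFE01  # VARIATION SELECTOR-2
VS3 = 0xFE02  # VARIATION SELECTOR-3
CMAP14 = {
    (0x4E00, VS2): "cjk1.vs2",
    (0x4E01, VS2): "cjk2.vs2",
    (0x4E00, VS3): "cjk1.vs3",
    (0x4E02, VS2): "cjk3.vs2",
}


def uvs_glyph_closure(unicodes: set[int]) -> set[str]:
    # Start from the default glyphs of the requested codepoints.
    glyphs = {CMAP[cp] for cp in unicodes if cp in CMAP}
    # The input is treated as unordered: a sequence counts as reachable
    # whenever both its base and its selector are in the input set.
    for (base, selector), glyph in CMAP14.items():
        if base in unicodes and selector in unicodes:
            glyphs.add(glyph)
    return glyphs


retained = uvs_glyph_closure({0x4E00, 0x4E01, VS2})
print(sorted(retained))
```

The VS3 alternate and the alternate for the third codepoint are left out
because VS3 and that codepoint are not in the input set.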
>
> Now for IFT there are two ways you could handle encoding a font that uses
> UVS with what we currently have:
>
>    1. Always include all possible alternate glyphs in the patches that
>    contain the base glyph. This is of course wasteful if the alternates aren't
>    needed.
>    2. Use a trick similar to how we handle VF axis extension: have a
>    table keyed patch, matched only on a single VS codepoint, which
>    changes the set of glyph keyed patches listed in the font to ones that
>    include the appropriate alternate glyphs. The downside of this approach is
>    that because this patch needs to be full invalidation, it incurs a full
>    extra round trip.
>
> Neither of these options is great. The fundamental problem we run into is
> that codepoint sets are matched via intersection, so it's not currently
> possible to express I want patch X only if codepoint a AND codepoint b are
> present (codepoints are always matched with OR). Skef's suggestion to add a
> UVSRecords which acts like FeatureRecords gets around this by effectively
> introducing a second codepoint set which is matched with AND, since the
> matching algorithm uses AND between the top level sets. However, since this
> same problem comes up in more places than just UVS sequences (e.g.
> ligatures) I think we should look for a more general solution and find a
> way to include a mechanism which allows for multiple codepoint sets to be
> attached to an entry and require all the sets to intersect for the entry to
> match (eg. intersection(input codepoints, set 1) AND intersection(input
> codepoints, set 2)). Format 2 has the notion of a copy index which
> constructs an entry by unioning other entries together; we could introduce
> an alternate mode on this which treats the combined entries as all needing
> to report intersections for the top level entry to match. This would give
> the ability to create patches that pull in only the needed alternate glyphs
> when a UVS codepoint is present. I'll need to look at how to incorporate
> this into the spec without making things too complicated.
>
> On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:
>
> Perhaps the thinking is that USVs could be applied downstream?
>
> It's probably something like this, but there's still a missing underlying
> explanation. After all, you might need *any* given codepoint downstream
> and the premise of subsetting is that you know what you will and won't
> need. One could instead treat the SVs like default-active layout features,
> putting them in a list that's added to the unicodes by default but allowing
> you to override that. But that's not what seems to have happened.
>
> If I had to guess I would say that the rationale for how HarfBuzz works is
> probably "this functionality isn't widely understood and maybe even not
> known, so we shouldn't rely on users specifically adding the variation
> selectors they might need." And if that's more or less what happened I'm
> not sure the same answer should apply to IFT, because the spec strongly
> encourages providing *everything* in the font, and we plan to do that,
> it's just a question of *where*. So as long as "the client" (or *some* clients)
> can know whether it's about to use an SV, it might make sense to patch more
> cleverly on that basis. (And if some clients don't know, they can always
> add the SVs into the codepoint list, at the cost of loading extra patches.)
>
> Skef
> ------------------------------
> *From:* John Hudson <john@tiro.ca>
> *Sent:* Thursday, October 17, 2024 4:57 PM
> *To:* public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
>
> On 2024-10-17 15:52, Skef Iterum wrote:
> > Beyond that, though, the question is how flexible we can be in
> > satisfying the glyph closure requirement. It seems like the strategy
> > used for static subsets (again, if I'm reading the code right) is to
> > treat the variation selectors as "extra", not considering them as part
> > of the list of unicodes to be preserved (or not). So, for example,
> > even if VS 1 isn't in the list of codepoints to be preserved, you can
> > still get glyphs only accessible using VS 1. It's not clear to me why
> > that's the case.
>
> Perhaps the thinking is that USVs could be applied downstream?
>
> I’m giving a talk at UTW next week that touches on applying formatting
> control characters in buffered states to affect text modes for readers.
> My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be
> applied to USVs.
>
> J.
>
>
> --
>
> John Hudson
> Tiro Typeworks Ltd
> http://www.tiro.com/
>
> Tiro Typeworks is physically located on islands
> in the Salish Sea, on the traditional territory
> of the Snuneymuxw and Penelakut First Nations.
>
> __________
>
> EMAIL HOUR
> In the interests of productivity, I am only dealing
> with email towards the end of the day, typically
> between 4PM and 5PM. If you need to contact me more
> urgently, please use other means.
>
>
>

Received on Monday, 21 October 2024 19:09:52 UTC