Re: Treatment of Variation Selectors in the Client from Garret Rieger on 2024-10-21 (public-webfonts-wg@w3.org from October 2024)

From: Garret Rieger <grieger@google.com>
Date: Mon, 21 Oct 2024 15:38:45 -0600
To: Skef Iterum <siterum@adobe.com>
Cc: John Hudson <john@tiro.ca>, "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
Message-ID: <CAM=OCWYwks72FJUx-P0sNH=eHOOvh1bqAsGHLODroCiiboBCfg@mail.gmail.com>
I've been thinking about this more, specifically in regards to the separate
issue of detecting the readiness of the font to render specific
codepoint/feature/design space combinations. I wrote down my current
thinking in a new issue here: https://github.com/w3c/IFT/issues/223

The TL;DR is that providing a mechanism to load patches that requires the
simultaneous presence of multiple codepoints unfortunately hampers our
ability to easily detect the readiness of the font to render specific parts
of text sequences. Readiness detection at the codepoint level is likely
going to be required to integrate this technology with browsers, so I think
we probably don't want to add such a mechanism unless we can come up with a
solution to the readiness problem as part of the mechanism.
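
To make the readiness concern concrete, here is a minimal Python sketch. It
is an illustration under assumptions, not the mechanism in the IFT spec or in
the linked issue: the entry table, the patch names, and both functions are
hypothetical. With one codepoint set per entry, readiness can be answered for
a single codepoint in isolation; once an entry requires several sets to
match, the answer depends on which other codepoints appear in the same text.

```python
# Hypothetical patch map: patch id -> list of codepoint sets attached to the
# entry. A one-element list models today's entries; a longer list models an
# entry that only matches when every set intersects the input.
CJK1, VS2 = 0x4E00, 0xFE01

ENTRIES = {
    "base.patch": [{CJK1}],
    "uvs.patch": [{CJK1}, {VS2}],
}


def needed_patches(text_codepoints, entries):
    """Patches whose attached sets all intersect the codepoints in the text."""
    return {
        patch
        for patch, sets in entries.items()
        if all(text_codepoints & s for s in sets)
    }


def is_ready(text_codepoints, entries, applied):
    """The font is ready for this text once every needed patch is applied."""
    return needed_patches(text_codepoints, entries) <= applied


applied = {"base.patch"}
print(is_ready({CJK1}, ENTRIES, applied))       # CJK1 on its own
print(is_ready({CJK1, VS2}, ENTRIES, applied))  # CJK1 together with VS2
```

With only base.patch applied, the first call reports ready and the second
does not, even though both ask about the same base codepoint. That is the
sense in which a per-codepoint readiness check stops being sufficient.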

On Mon, Oct 21, 2024 at 1:09 PM Garret Rieger <grieger@google.com> wrote:

> I put together a draft PR which adds a general purpose mechanism for
> handling UVS and other similar situations where you want to load patches
> only if multiple conditions are met: https://github.com/w3c/IFT/pull/222
>
> This works pretty similarly to my proposal above:
>
>    - The logical entry structure we use throughout the spec has been
>    updated to have one or more attached subset definitions (previously there
>    was only one).
>    - The intersection check requires all attached subset definitions
>    match the input for the whole entry to match.
>    - Updated format 2 to provide a way to specify entries with more than
>    one subset definition.
>    - Format 1 remains unmodified and always produces entries with only
>    one subset definition.
>
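
The logical entry structure described in the list above can be sketched
roughly as follows. This is a simplified model, not the spec's definition:
the class and field names are made up for illustration, design space is
omitted, and treating an empty set as "no constraint" is an assumption of
the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetDefinition:
    codepoints: frozenset = frozenset()
    feature_tags: frozenset = frozenset()

    def intersects(self, target: "SubsetDefinition") -> bool:
        # Simplifying assumption: an empty set in this definition places no
        # constraint on that dimension; a non-empty set must share a member.
        cp_ok = not self.codepoints or bool(self.codepoints & target.codepoints)
        ft_ok = not self.feature_tags or bool(self.feature_tags & target.feature_tags)
        return cp_ok and ft_ok


@dataclass(frozen=True)
class Entry:
    patch_uri: str
    definitions: tuple  # one or more SubsetDefinition values

    def matches(self, target: SubsetDefinition) -> bool:
        # Every attached definition must match for the whole entry to match.
        return all(d.intersects(target) for d in self.definitions)


# A format-1 style entry carries exactly one definition.
single = Entry("base.patch", (SubsetDefinition(frozenset({0x4E00, 0x4E01})),))

# A format-2 style entry may carry several; here base characters AND VS2.
multi = Entry(
    "uvs.patch",
    (
        SubsetDefinition(frozenset({0x4E00, 0x4E01})),
        SubsetDefinition(frozenset({0xFE01})),
    ),
)

target = SubsetDefinition(frozenset({0x4E00}))
print(single.matches(target), multi.matches(target))
```

In this model the single-definition entry matches a target containing only
U+4E00, while the two-definition entry does not match until U+FE01 is also
part of the target.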
>
> Re: ligatures, agreed that this approach is likely overkill for most
> ligature cases where you only have a small number of them attached to
> specific characters. This mechanism will be primarily useful for things
> like UVS where there are a large enough number of alternates to make it
> worthwhile to have the alternates in patches of their own.
>
> On Fri, Oct 18, 2024 at 3:47 PM Skef Iterum <siterum@adobe.com> wrote:
>
>> Or, rather, knowing where to cut things off ...
>>
>> Skef
>> ------------------------------
>> *From:* Skef Iterum <siterum@adobe.com>
>> *Sent:* Friday, October 18, 2024 2:09 PM
>> *To:* Garret Rieger <grieger@google.com>
>> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <
>> public-webfonts-wg@w3.org>
>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>
>> I think looking at a more general mechanism would be good, both for the
>> UVS case and also for the Emoji ligature cases.
>>
>> I continue to suspect that that mechanism would probably not wind up
>> helping most "traditional" ligatures, because there won't be enough of them
>> to warrant a separate patch, and aggregating a bunch of them together
>> artificially into a "ligature patch" would be counterproductive (especially
>> if you're effectively always going to load that extra patch in practice,
>> because liga is on by default and the source glyphs are high-frequency
>> relative to the script). (Maybe there are languages where ligatures are
>> prevalent enough that it would help there.)
>>
>> So the trick will be coming up with the cases where it will really help
>> and (I suspect) not knowing where to cut things off so that cases that
>> don't need help won't get in the way.
>>
>> Skef
>> ------------------------------
>> *From:* Garret Rieger <grieger@google.com>
>> *Sent:* Friday, October 18, 2024 10:23 AM
>> *To:* Skef Iterum <siterum@adobe.com>
>> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <
>> public-webfonts-wg@w3.org>
>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>
>> In HarfBuzz we treat UVS as normal codepoints, that is, they are specified
>> as part of the input unicodes set. However, we also run a special glyph
>> closure against cmap14. In the closure we check for UVS sequences that can
>> be activated and pull in any required glyphs to support those sequences.
>> HarfBuzz always treats the input unicode set as unsorted so we'll include
>> any alternate glyphs that could be reached by any ordering of the input
>> unicode set. For example if you run a subsetting operation that asks for
>> the unicode set:
>>
>> {CJK Codepoint 1, CJK Codepoint 2, VS2}, where VS2 causes a non-default
>> glyph swap for those codepoints, then the retained glyph set will be
>> expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK
>> 2 VS2 Glyph}, but won't include other alternate glyphs reachable via
>> codepoints not included in the input.
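
The closure behaviour described above can be modelled in a few lines of
Python. This is only a sketch of the idea and does not use HarfBuzz's API:
the function name, the glyph names, and the dictionary standing in for the
cmap format 14 data are all hypothetical.

```python
def uvs_glyph_closure(input_codepoints, default_cmap, uvs_map):
    """Return the glyphs retained for an unordered set of input codepoints.

    default_cmap: codepoint -> default glyph name
    uvs_map: (base codepoint, variation selector) -> non-default glyph name
             (a stand-in for a cmap format 14 subtable)
    """
    codepoints = set(input_codepoints)
    glyphs = {default_cmap[cp] for cp in codepoints if cp in default_cmap}
    # The input is treated as unordered, so any (base, selector) pair whose
    # two members are both present can be activated.
    for (base, selector), glyph in uvs_map.items():
        if base in codepoints and selector in codepoints:
            glyphs.add(glyph)
    return glyphs


CJK1, CJK2, CJK3 = 0x4E00, 0x4E01, 0x4E02
VS2, VS3 = 0xFE01, 0xFE02

default_cmap = {CJK1: "cjk1.default", CJK2: "cjk2.default", CJK3: "cjk3.default"}
uvs_map = {
    (CJK1, VS2): "cjk1.vs2",
    (CJK2, VS2): "cjk2.vs2",
    (CJK3, VS2): "cjk3.vs2",  # base not requested below, so not retained
    (CJK1, VS3): "cjk1.vs3",  # selector not requested below, so not retained
}

print(sorted(uvs_glyph_closure({CJK1, CJK2, VS2}, default_cmap, uvs_map)))
```

For the input {CJK1, CJK2, VS2} the result holds the two default glyphs and
the two VS2 alternates, and leaves out the alternates that would need CJK3 or
VS3.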
>>
>> Now for IFT there are two ways you could handle encoding a font that uses
>> UVS with what we currently have:
>>
>>    1. Always include all possible alternate glyphs in the patches that
>>    contain the base glyph. This is of course wasteful if the alternates aren't
>>    needed.
>>    2. Use a trick similar to how we handle VF axis extension: have a
>>    table keyed patch which is matched only on a single VS codepoint and which
>>    changes the set of glyph keyed patches listed in the font to ones that
>>    include the appropriate alternate glyphs. The downside of this approach is
>>    that, because this patch needs to be full invalidation, it incurs a full
>>    extra round trip.
>>
>> Neither of these options is great. The fundamental problem we run into
>> is that codepoint sets are matched via intersection, so it's not currently
>> possible to express "I want patch X only if codepoint a AND codepoint b are
>> present" (codepoints are always matched with OR). Skef's suggestion to add a
>> UVSRecords which acts like FeatureRecords gets around this by effectively
>> introducing a second codepoint set which is matched with AND, since the
>> matching algorithm uses AND between the top level sets. However, since this
>> same problem comes up in more places than just UVS sequences (e.g.
>> ligatures), I think we should look for a more general solution and find a
>> way to include a mechanism which allows for multiple codepoint sets to be
>> attached to an entry and requires all the sets to intersect for the entry to
>> match (e.g. intersection(input codepoints, set 1) AND intersection(input
>> codepoints, set 2)). Format 2 has the notion of a copy index, which
>> constructs an entry by unioning other entries together; we could introduce
>> an alternate mode on this which treats the combined entries as all needing
>> to report intersections for the top level entry to match. This would give
>> the ability to create patches that pull in only the needed alternate glyphs
>> when a UVS codepoint is present. I'll need to look at how to incorporate
>> this into the spec without making things too complicated.
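
The difference between the existing union behaviour and the proposed
alternate mode can be sketched like this. The two function names are
hypothetical and the sketch ignores features and design space; it only
contrasts OR-style matching over a merged set with requiring every combined
set to intersect.

```python
def matches_union(input_codepoints, child_sets):
    """Union mode: merge the child sets, then test a single intersection."""
    merged = set().union(*child_sets)
    return bool(input_codepoints & merged)


def matches_conjunctive(input_codepoints, child_sets):
    """Alternate mode: every child set must intersect the input."""
    return all(bool(input_codepoints & s) for s in child_sets)


CJK1, CJK2, VS2 = 0x4E00, 0x4E01, 0xFE01
base_chars = {CJK1, CJK2}
selector = {VS2}

for text in ({CJK1}, {VS2}, {CJK1, VS2}):
    print(
        sorted(hex(cp) for cp in text),
        matches_union(text, [base_chars, selector]),
        matches_conjunctive(text, [base_chars, selector]),
    )
```

Union mode matches all three inputs, so a patch of VS2 alternates would be
loaded for plain CJK text as well. The conjunctive mode matches only the
input that contains both a base character and the selector.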
>>
>> On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:
>>
>> Perhaps the thinking is that USVs could be applied downstream?
>>
>> It's probably something like this, but there's still a missing underlying
>> explanation. After all, you might need *any* given codepoint downstream
>> and the premise of subsetting is that you know what you will and won't
>> need. One could instead treat the SVs like default-active layout features,
>> putting them in a list that's added to the unicodes by default but allowing
>> you to override that. But that's not what seems to have happened.
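
The alternative mentioned in this paragraph, a default-active list of
selectors that the caller can switch off, might look something like the
sketch below. It is not how HarfBuzz behaves or is configured; the function
and parameter names are invented for illustration. The two codepoint ranges
are the standard Unicode variation selector blocks (VS1 to VS16 and VS17 to
VS256).

```python
# Unicode variation selectors: U+FE00..U+FE0F and U+E0100..U+E01EF.
DEFAULT_SELECTORS = frozenset(range(0xFE00, 0xFE10)) | frozenset(
    range(0xE0100, 0xE01F0)
)


def effective_unicodes(requested, default_selectors=DEFAULT_SELECTORS,
                       drop_selectors=False):
    """Add the default selector list to the requested codepoints.

    Passing drop_selectors=True overrides the default, so only selectors
    that were requested explicitly remain in the result.
    """
    requested = set(requested)
    if drop_selectors:
        return requested
    return requested | default_selectors


with_defaults = effective_unicodes({0x4E00})
without_defaults = effective_unicodes({0x4E00}, drop_selectors=True)
print(0xFE01 in with_defaults, 0xFE01 in without_defaults)
```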
>>
>> If I had to guess I would say that the rationale for how HarfBuzz works
>> is probably "this functionality isn't widely understood and maybe even not
>> known, so we shouldn't rely on users specifically adding the variation
>> selectors they might need." And if that's more or less what happened I'm
>> not sure the same answer should apply to IFT, because the spec strongly
>> encourages providing *everything* in the font, and we plan to do that,
>> it's just a question of *where*. So as long as "the client" (or *some* clients)
>> can know whether it's about to use an SV, it might make sense to patch more
>> cleverly on that basis. (And if some clients don't know, they can always
>> add the SVs into the codepoint list, at the cost of loading extra patches.)
>>
>> Skef
>> ------------------------------
>> *From:* John Hudson <john@tiro.ca>
>> *Sent:* Thursday, October 17, 2024 4:57 PM
>> *To:* public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>
>> On 2024-10-17 15:52, Skef Iterum wrote:
>> > Beyond that, though, the question is how flexible we can be in
>> > satisfying the glyph closure requirement. It seems like the strategy
>> > used for static subsets (again, if I'm reading the code right) is to
>> > treat the variation selectors as "extra", not considering them as part
>> > of the list of unicodes to be preserved (or not). So, for example,
>> > even if VS 1 isn't in the list of codepoints to be preserved, you can
>> > still get glyphs only accessible using VS 1. It's not clear to me why
>> > that's the case.
>>
>> Perhaps the thinking is that USVs could be applied downstream?
>>
>> I’m giving a talk at UTW next week that touches on applying formatting
>> control characters in buffered states to affect text modes for readers.
>> My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be
>> applied to USVs.
>>
>> J.
>>
>>
>> --
>>
>> John Hudson
>> Tiro Typeworks Ltd
>> http://www.tiro.com/
>>
>> Tiro Typeworks is physically located on islands
>> in the Salish Sea, on the traditional territory
>> of the Snuneymuxw and Penelakut First Nations.
>>
>> __________
>>
>> EMAIL HOUR
>> In the interests of productivity, I am only dealing
>> with email towards the end of the day, typically
>> between 4PM and 5PM. If you need to contact me more
>> urgently, please use other means.
>>
>>
>>
Received on Monday, 21 October 2024 21:39:08 UTC