Re: Treatment of Variation Selectors in the Client from Garret Rieger on 2024-11-14 (public-webfonts-wg@w3.org from November 2024)

From: Garret Rieger <grieger@google.com>
Date: Wed, 13 Nov 2024 17:00:37 -0700
To: Skef Iterum <siterum@adobe.com>
Cc: John Hudson <john@tiro.ca>, "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
Message-ID: <CAM=OCWYYEaCbRbBasTW3hD6Sv2wEKkKMEXyU1GemFuyxkzSy7Q@mail.gmail.com>
To follow up on this:
1. We added a mechanism to the IFT spec which can handle the UVS case:
https://github.com/w3c/IFT/pull/222. For example, this allows construction
of mappings which require the presence of a UVS codepoint AND one or more
base codepoints before a patch is loaded.
2. Harfbuzz and fonttools have been fixed to correctly handle subsetting
fonts with cmap14 UVS selectors:
https://github.com/harfbuzz/harfbuzz/pull/4912 and
https://github.com/fonttools/fonttools/pull/3672

I believe that fixes the outstanding issues in supporting UVS selectors in
IFT fonts.
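
As a rough illustration of the matching rule in item 1 (not text from the
spec), here is a minimal Python sketch. It assumes a simplified model where
a subset definition is only a codepoint set (real ones also cover features
and design space). The names SubsetDefinition, Entry, select_patches and the
patch file names are hypothetical. Within one set any codepoint may match
(OR); across the sets attached to an entry, every set must match (AND).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetDefinition:
    """Hypothetical, simplified subset definition: codepoints only."""
    codepoints: frozenset

    def intersects(self, input_codepoints):
        # OR semantics: any shared codepoint counts as a match.
        return not self.codepoints.isdisjoint(input_codepoints)


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry: a patch plus one or more subset definitions."""
    patch: str
    subset_definitions: tuple

    def matches(self, input_codepoints):
        # AND semantics: every attached subset definition must intersect.
        return all(sd.intersects(input_codepoints)
                   for sd in self.subset_definitions)


def select_patches(entries, input_codepoints):
    """Return the patches of all entries that match the input, in order."""
    return [e.patch for e in entries if e.matches(input_codepoints)]


BASE = frozenset({0x8FBB, 0x845B})   # two base codepoints
VS = frozenset({0xE0100})            # one variation selector

ENTRIES = [
    # Loaded whenever a base codepoint is present.
    Entry("base.patch", (SubsetDefinition(BASE),)),
    # Loaded only when a base codepoint AND the selector are present.
    Entry("uvs.patch", (SubsetDefinition(BASE), SubsetDefinition(VS))),
]
```

With these entries, select_patches(ENTRIES, {0x8FBB}) selects only the base
patch, while adding 0xE0100 to the input also selects the UVS patch. The
selector on its own selects nothing.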

On Mon, Oct 21, 2024 at 3:38 PM Garret Rieger <grieger@google.com> wrote:

> I've been thinking about this more, specifically with regard to the
> separate issue of detecting the readiness of the font to render specific
> codepoint/feature/design space combinations. I wrote down my current
> thinking in a new issue here: https://github.com/w3c/IFT/issues/223
>
> The TL;DR is that providing a mechanism to load patches that requires the
> simultaneous presence of multiple codepoints unfortunately hampers our
> ability to easily detect the readiness of the font to render specific parts
> of text sequences. Readiness detection at the codepoint level is likely
> going to be required to integrate this technology with browsers, so I think
> we probably don't want to add such a mechanism unless we can come up with a
> solution to the readiness problem as part of the mechanism.
>
> On Mon, Oct 21, 2024 at 1:09 PM Garret Rieger <grieger@google.com> wrote:
>
>> I put together a draft PR which adds a general purpose mechanism for
>> handling UVS and other similar situations where you want to load patches
>> only if multiple conditions are met: https://github.com/w3c/IFT/pull/222
>>
>> This works pretty similarly to my proposal above:
>>
>>    - The logical entry structure we use throughout the spec has been
>>    updated to have one or more attached subset definitions (previously there
>>    was only one).
>>    - The intersection check requires all attached subset definitions
>>    match the input for the whole entry to match.
>>    - Updated Format 2 to provide a way to specify entries with more than
>>    one subset definition.
>>    - Format 1 remains unmodified and always produces entries with only
>>    one subset definition.
>>
>>
>> Re: ligatures, agreed that this approach is likely overkill for most
>> ligature cases where you only have a small number of them attached to
>> specific characters. This mechanism will be primarily useful for things
>> like UVS where there are a large enough number of alternates to make it
>> worthwhile to have the alternates in patches of their own.
>>
>> On Fri, Oct 18, 2024 at 3:47 PM Skef Iterum <siterum@adobe.com> wrote:
>>
>>> Or, rather, knowing where to cut things off ...
>>>
>>> Skef
>>> ------------------------------
>>> *From:* Skef Iterum <siterum@adobe.com>
>>> *Sent:* Friday, October 18, 2024 2:09 PM
>>> *To:* Garret Rieger <grieger@google.com>
>>> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <
>>> public-webfonts-wg@w3.org>
>>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>>
>>> I think looking at a more general mechanism would be good, both for the
>>> UVS case and also for the Emoji ligature cases.
>>>
>>> I continue to suspect that that mechanism would probably not wind up
>>> helping most "traditional" ligatures, because there won't be enough of them
>>> to warrant a separate patch, and aggregating a bunch of them together
>>> artificially into a "ligature patch" would be counterproductive (especially
>>> if you're effectively always going to load that extra patch in practice,
>>> because liga is on by default and the source glyphs are high-frequency
>>> relative to the script). (Maybe there are languages where ligatures are
>>> prevalent enough that it would help there.)
>>>
>>> So the trick will be coming up with the cases where it will really help
>>> and (I suspect) not knowing where to cut things off so that cases that
>>> don't need help won't get in the way.
>>>
>>> Skef
>>> ------------------------------
>>> *From:* Garret Rieger <grieger@google.com>
>>> *Sent:* Friday, October 18, 2024 10:23 AM
>>> *To:* Skef Iterum <siterum@adobe.com>
>>> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <
>>> public-webfonts-wg@w3.org>
>>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>>
>>>
>>> In harfbuzz we treat UVS as normal codepoints, that is they are
>>> specified as part of the input unicodes set. However, we also run a special
>>> glyph closure against cmap14. In the closure we check for UVS sequences
>>> that can be activated and pull in any required glyphs to support those
>>> sequences. Harfbuzz always treats the input unicode set as unsorted so
>>> we'll include any alternate glyphs that could be reached by any ordering of
>>> the input unicode set. For example if you run a subsetting operation that
>>> asks for the unicode set:
>>>
>>> {CJK Codepoint 1, CJK Codepoint 2, VS2} where VS2 causes a non default
>>> glyph swap for those codepoints then the retained glyph set will be
>>> expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK
>>> 2 VS2 Glyph}, but won't include other alternate glyphs reachable via
>>> codepoints not included in the input.
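
The closure behaviour described above can be sketched in Python. This is an
illustration of the idea only, not HarfBuzz's actual code: the function
uvs_glyph_closure, the dict-based cmap and cmap14 models, and the glyph names
are all hypothetical. Because the input is treated as an unordered set, an
alternate glyph is kept whenever both its base codepoint and its selector
appear anywhere in the input.

```python
def uvs_glyph_closure(unicodes, cmap, non_default_uvs):
    """Return the glyphs retained for an unordered input codepoint set.

    cmap:            base codepoint -> default glyph (hypothetical model)
    non_default_uvs: selector -> {base codepoint -> alternate glyph},
                     a simplified view of cmap14 non-default mappings
    """
    unicodes = set(unicodes)

    # Default glyphs for every requested codepoint the font maps.
    glyphs = {cmap[cp] for cp in unicodes if cp in cmap}

    # For each requested selector, pull in alternates whose base codepoint
    # was also requested. Ordering of the input is irrelevant.
    for selector in unicodes:
        for base, glyph in non_default_uvs.get(selector, {}).items():
            if base in unicodes:
                glyphs.add(glyph)
    return glyphs


VS2, VS3 = 0xFE01, 0xFE02

CMAP = {
    0x4E00: "cjk1.default",
    0x4E01: "cjk2.default",
    0x4E02: "cjk3.default",
}

NON_DEFAULT_UVS = {
    VS2: {0x4E00: "cjk1.vs2", 0x4E01: "cjk2.vs2", 0x4E02: "cjk3.vs2"},
    VS3: {0x4E00: "cjk1.vs3"},
}

retained = uvs_glyph_closure({0x4E00, 0x4E01, VS2}, CMAP, NON_DEFAULT_UVS)
```

Here retained holds the two default glyphs plus the two VS2 alternates. The
alternates for the third codepoint and for VS3 are left out because those
codepoints are not in the input.
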
>>>
>>> Now for IFT there are two ways you could handle encoding a font that
>>> uses UVS with what we currently have:
>>>
>>>    1. Always include all possible alternate glyphs in the patches that
>>>    contain the base glyph. This is of course wasteful if the alternates aren't
>>>    needed.
>>>    2. Use a trick similar to how we handle VF axis extension: have a
>>>    table keyed patch which is matched only on a single VS codepoint and which
>>>    changes the set of glyph keyed patches listed in the font to ones that
>>>    include the appropriate alternate glyphs. The downside of this approach is
>>>    that because this patch needs to be full invalidation, it incurs a full
>>>    extra round trip.
>>>
>>> Neither of these options is great. The fundamental problem we run into
>>> is that codepoint sets are matched via intersection, so it's not currently
>>> possible to express "I want patch X only if codepoint a AND codepoint b are
>>> present" (codepoints are always matched with OR). Skef's suggestion to add a
>>> UVSRecords which acts like FeatureRecords gets around this by effectively
>>> introducing a second codepoint set which is matched with AND, since the
>>> matching algorithm uses AND between the top level sets. However, since this
>>> same problem comes up in more places than just UVS sequences (e.g.
>>> ligatures) I think we should look for a more general solution and find a
>>> way to include a mechanism which allows for multiple codepoint sets to be
>>> attached to an entry and require all the sets to intersect for the entry to
>>> match (e.g. intersection(input codepoints, set 1) AND intersection(input
>>> codepoints, set 2)). Format 2 has the notion of a copy index which allows
>>> constructing an entry by unioning other entries together; we could introduce
>>> an alternate mode on this which treats the combined entries as all needing
>>> to report intersections for the top level entry to match. This would give
>>> the ability to create patches that pull in only the needed alternate glyphs
>>> when a UVS codepoint is present. I'll need to look at how to incorporate
>>> this into the spec without making things too complicated.
>>>
>>> On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:
>>>
>>> Perhaps the thinking is that UVSs could be applied downstream?
>>>
>>> It's probably something like this, but there's still a missing
>>> underlying explanation. After all, you might need *any* given codepoint
>>> downstream and the premise of subsetting is that you know what you will and
>>> won't need. One could instead treat the SVs like default-active layout
>>> features, putting them in a list that's added to the unicodes by default
>>> but allowing you to override that. But that's not what seems to have
>>> happened.
>>>
>>> If I had to guess I would say that the rationale for how HarfBuzz works
>>> is probably "this functionality isn't widely understood and maybe even not
>>> known, so we shouldn't rely on users specifically adding the variation
>>> selectors they might need." And if that's more or less what happened I'm
>>> not sure the same answer should apply to IFT, because the spec strongly
>>> encourages providing *everything* in the font, and we plan to do that,
>>> it's just a question of *where*. So as long as "the client" (or *some* clients)
>>> can know whether it's about to use an SV, it might make sense to patch more
>>> cleverly on that basis. (And if some clients don't know, they can always
>>> add the SVs into the codepoint list, at the cost of loading extra patches.)
>>>
>>> Skef
>>> ------------------------------
>>> *From:* John Hudson <john@tiro.ca>
>>> *Sent:* Thursday, October 17, 2024 4:57 PM
>>> *To:* public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
>>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>>
>>>
>>> On 2024-10-17 15:52, Skef Iterum wrote:
>>> > Beyond that, though, the question is how flexible we can be in
>>> > satisfying the glyph closure requirement. It seems like the strategy
>>> > used for static subsets (again, if I'm reading the code right) is to
>>> > treat the variation selectors as "extra", not considering them as part
>>> > of the list of unicodes to be preserved (or not). So, for example,
>>> > even if VS 1 isn't in the list of codepoints to be preserved, you can
>>> > still get glyphs only accessible using VS 1. It's not clear to me why
>>> > that's the case.
>>>
>>> Perhaps the thinking is that UVSs could be applied downstream?
>>>
>>> I’m giving a talk at UTW next week that touches on applying formatting
>>> control characters in buffered states to affect text modes for readers.
>>> My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be
>>> applied to UVSs.
>>>
>>> J.
>>>
>>>
>>> --
>>>
>>> John Hudson
>>> Tiro Typeworks Ltd
>>> <http://www.tiro.com/>
>>>
>>> Tiro Typeworks is physically located on islands
>>> in the Salish Sea, on the traditional territory
>>> of the Snuneymuxw and Penelakut First Nations.
>>>
>>> __________
>>>
>>> EMAIL HOUR
>>> In the interests of productivity, I am only dealing
>>> with email towards the end of the day, typically
>>> between 4PM and 5PM. If you need to contact me more
>>> urgently, please use other means.
>>>
>>>
>>>
Received on Thursday, 14 November 2024 00:01:01 UTC