- From: Garret Rieger <grieger@google.com>
- Date: Mon, 21 Oct 2024 13:09:29 -0600
- To: Skef Iterum <siterum@adobe.com>
- Cc: John Hudson <john@tiro.ca>, "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
- Message-ID: <CAM=OCWbNPaO5pY6MMFFgpj-rWT=SrcuQDQ+1Z_zTNMNeo1pWCg@mail.gmail.com>
I put together a draft PR that adds a general-purpose mechanism for handling UVS and similar situations where you want to load patches only if multiple conditions are met: https://github.com/w3c/IFT/pull/222

This works much like my proposal above:

- The logical entry structure we use throughout the spec has been updated to have one or more attached subset definitions (previously there was only one).
- The intersection check requires that all attached subset definitions match the input for the whole entry to match.
- Format 2 has been updated to provide a way to specify entries with more than one subset definition.
- Format 1 remains unmodified and always produces entries with only one subset definition.

Re: ligatures, agreed that this approach is likely overkill for most ligature cases, where you only have a small number of them attached to specific characters. This mechanism will be primarily useful for things like UVS, where there are enough alternates to make it worthwhile to have the alternates in patches of their own.

On Fri, Oct 18, 2024 at 3:47 PM Skef Iterum <siterum@adobe.com> wrote:

> Or, rather, knowing where to cut things off ...
>
> Skef
> ------------------------------
> *From:* Skef Iterum <siterum@adobe.com>
> *Sent:* Friday, October 18, 2024 2:09 PM
> *To:* Garret Rieger <grieger@google.com>
> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
> I think looking at a more general mechanism would be good, both for the UVS case and also for the Emoji ligature cases.
>
> I continue to suspect that that mechanism would probably not wind up helping most "traditional" ligatures, because there won't be enough of them to warrant a separate patch, and aggregating a bunch of them together artificially into a "ligature patch" would be counterproductive (especially if you're effectively always going to load that extra patch in practice, because liga is on by default and the source glyphs are high-frequency relative to the script). (Maybe there are languages where ligatures are prevalent enough that it would help there.)
>
> So the trick will be coming up with the cases where it will really help and (I suspect) not knowing where to cut things off so that cases that don't need help won't get in the way.
>
> Skef
> ------------------------------
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Friday, October 18, 2024 10:23 AM
> *To:* Skef Iterum <siterum@adobe.com>
> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
> *EXTERNAL: Use caution when clicking on links or opening attachments.*
>
> In harfbuzz we treat UVS as normal codepoints, that is, they are specified as part of the input unicodes set. However, we also run a special glyph closure against cmap14. In the closure we check for UVS sequences that can be activated and pull in any required glyphs to support those sequences. Harfbuzz always treats the input unicode set as unsorted, so we'll include any alternate glyphs that could be reached by any ordering of the input unicode set.
> For example, if you run a subsetting operation that asks for the unicode set:
>
> {CJK Codepoint 1, CJK Codepoint 2, VS2}
>
> where VS2 causes a non-default glyph swap for those codepoints, then the retained glyph set will be expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK 2 VS2 Glyph}, but won't include other alternate glyphs reachable via codepoints not included in the input.
>
> Now for IFT there are two ways you could handle encoding a font that uses UVS with what we currently have:
>
> 1. Always include all possible alternate glyphs in the patches that contain the base glyph. This is of course wasteful if the alternates aren't needed.
> 2. Use a trick similar to how we handle VF axis extension: have a table keyed patch, matched only on a single VS codepoint, which changes the set of glyph keyed patches listed in the font to ones that include the appropriate alternate glyphs. The downside of this approach is that, because this patch needs to be full invalidation, it incurs a full extra round trip.
>
> Neither of these options is great. The fundamental problem we run into is that codepoint sets are matched via intersection, so it's not currently possible to express "I want patch X only if codepoint a AND codepoint b are present" (codepoints are always matched with OR). Skef's suggestion to add a UVSRecords which acts like FeatureRecords gets around this by effectively introducing a second codepoint set which is matched with AND, since the matching algorithm uses AND between the top level sets. However, since this same problem comes up in more places than just UVS sequences (e.g. ligatures), I think we should look for a more general solution and find a way to include a mechanism which allows multiple codepoint sets to be attached to an entry and requires all the sets to intersect for the entry to match (e.g. intersection(input codepoints, set 1) AND intersection(input codepoints, set 2)). Format 2 has the notion of a copy index, which allows constructing an entry by unioning other entries together; we could introduce an alternate mode on this which treats the combined entries as all needing to report intersections for the top level entry to match. This would give the ability to create patches that pull in only the needed alternate glyphs when a UVS codepoint is present. I'll need to look at how to incorporate this into the spec without making things too complicated.
>
> On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:
>
> Perhaps the thinking is that USVs could be applied downstream?
>
> It's probably something like this, but there's still a missing underlying explanation. After all, you might need *any* given codepoint downstream and the premise of subsetting is that you know what you will and won't need. One could instead treat the SVs like default-active layout features, putting them in a list that's added to the unicodes by default but allowing you to override that. But that's not what seems to have happened.
>
> If I had to guess I would say that the rationale for how HarfBuzz works is probably "this functionality isn't widely understood and maybe even not known, so we shouldn't rely on users specifically adding the variation selectors they might need." And if that's more or less what happened I'm not sure the same answer should apply to IFT, because the spec strongly encourages providing *everything* in the font, and we plan to do that, it's just a question of *where*. So as long as "the client" (or *some* clients) can know whether it's about to use an SV, it might make sense to patch more cleverly on that basis. (And if some clients don't know, they can always add the SVs into the codepoint list, at the cost of loading extra patches.)
>
> Skef
> ------------------------------
> *From:* John Hudson <john@tiro.ca>
> *Sent:* Thursday, October 17, 2024 4:57 PM
> *To:* public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
> *Subject:* Re: Treatment of Variation Selectors in the Client
>
> EXTERNAL: Use caution when clicking on links or opening attachments.
>
> On 2024-10-17 15:52, Skef Iterum wrote:
> > Beyond that, though, the question is how flexible we can be in satisfying the glyph closure requirement. It seems like the strategy used for static subsets (again, if I'm reading the code right) is to treat the variation selectors as "extra", not considering them as part of the list of unicodes to be preserved (or not). So, for example, even if VS 1 isn't in the list of codepoints to be preserved, you can still get glyphs only accessible using VS 1. It's not clear to me why that's the case.
>
> Perhaps the thinking is that USVs could be applied downstream?
>
> I’m giving a talk at UTW next week that touches on applying formatting control characters in buffered states to affect text modes for readers. My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be applied to USVs.
>
> J.
>
> --
>
> John Hudson
> Tiro Typeworks Ltd
> http://www.tiro.com/
>
> Tiro Typeworks is physically located on islands
> in the Salish Sea, on the traditional territory
> of the Snuneymuxw and Penelakut First Nations.
>
> __________
>
> EMAIL HOUR
> In the interests of productivity, I am only dealing
> with email towards the end of the day, typically
> between 4PM and 5PM. If you need to contact me more
> urgently, please use other means.
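
The matching rule discussed in this thread (OR within a codepoint set, AND across the sets attached to one entry) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Entry` class, `entry_matches`, `select_patches`, the patch names, and the chosen code points are all hypothetical, and subset definitions are reduced to plain codepoint sets rather than the spec's actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a patch map entry (not the spec's encoding)."""
    patch: str
    # One or more codepoint sets; every one of them must intersect the input.
    subset_definitions: tuple


def entry_matches(entry, input_codepoints):
    # OR within a set: any shared codepoint makes that set intersect.
    # AND across sets: all attached sets must intersect for the entry to match.
    return all(input_codepoints & s for s in entry.subset_definitions)


def select_patches(entries, input_codepoints):
    # Keep the patches of every entry whose attached sets all intersect.
    return [e.patch for e in entries if entry_matches(e, input_codepoints)]


# Arbitrary example values, chosen only for illustration.
CJK_1, CJK_2 = 0x4E00, 0x4E01
VS2 = 0xFE01  # VARIATION SELECTOR-2

entries = [
    # Single set: behaves like today's intersection check.
    Entry("base-glyphs", (frozenset({CJK_1, CJK_2}),)),
    # Two sets: needs a base codepoint AND the variation selector.
    Entry("vs2-alternates", (frozenset({CJK_1, CJK_2}), frozenset({VS2}))),
]

print(select_patches(entries, {CJK_1}))
print(select_patches(entries, {CJK_1, VS2}))
print(select_patches(entries, {VS2}))
```

In this sketch the alternates patch is selected only when the input contains both a base codepoint and VS2, which is the behaviour the thread is after for UVS: neither the base codepoint alone nor the selector alone pulls in the alternates.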
Received on Monday, 21 October 2024 19:09:52 UTC