- From: Garret Rieger <grieger@google.com>
- Date: Wed, 13 Nov 2024 17:00:37 -0700
- To: Skef Iterum <siterum@adobe.com>
- Cc: John Hudson <john@tiro.ca>, "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
- Message-ID: <CAM=OCWYYEaCbRbBasTW3hD6Sv2wEKkKMEXyU1GemFuyxkzSy7Q@mail.gmail.com>
To follow up on this:

1. We added a mechanism to the IFT spec which can handle the UVS case: https://github.com/w3c/IFT/pull/222. For example, this allows construction of mappings which require the presence of a UVS codepoint AND one or more base codepoints before a patch is loaded.
2. Harfbuzz and fonttools have been fixed to correctly handle subsetting fonts with cmap14 UVS selectors: https://github.com/harfbuzz/harfbuzz/pull/4912 and https://github.com/fonttools/fonttools/pull/3672

I believe that fixes the outstanding issues in supporting UVS selectors in IFT fonts.

On Mon, Oct 21, 2024 at 3:38 PM Garret Rieger <grieger@google.com> wrote:

> I've been thinking about this more, specifically in regards to the
> separate issue of detecting the readiness of the font to render specific
> codepoint/feature/design space combinations. I wrote down my current
> thinking in a new issue here: https://github.com/w3c/IFT/issues/223
>
> The TL;DR is that providing a mechanism to load patches that requires the
> simultaneous presence of multiple codepoints unfortunately hampers our
> ability to easily detect the readiness of the font to render specific parts
> of text sequences. Readiness detection at the codepoint level is likely
> going to be required to integrate this technology with browsers, so I think
> we probably don't want to add such a mechanism unless we can come up with a
> solution to the readiness problem as part of the mechanism.
>
> On Mon, Oct 21, 2024 at 1:09 PM Garret Rieger <grieger@google.com> wrote:
>
>> I put together a draft PR which adds a general purpose mechanism for
>> handling UVS and other similar situations where you want to load patches
>> only if multiple conditions are met: https://github.com/w3c/IFT/pull/222
>>
>> This works pretty similarly to my proposal above:
>>
>> - The logical entry structure we use throughout the spec has been
>> updated to have one or more attached subset definitions (previously there
>> was only one).
>> - The intersection check requires that all attached subset definitions
>> match the input for the whole entry to match.
>> - Updated format 2 to provide a way to specify entries with more than
>> one subset definition.
>> - Format 1 remains unmodified and always produces entries with only
>> one subset definition.
>>
>>
>> Re: ligatures, agreed that this approach is likely overkill for most
>> ligature cases where you only have a small number of them attached to
>> specific characters. This mechanism will be primarily useful for things
>> like UVS where there are a large enough number of alternates to make it
>> worthwhile to have the alternates in patches of their own.
>>
>> On Fri, Oct 18, 2024 at 3:47 PM Skef Iterum <siterum@adobe.com> wrote:
>>
>>> Or, rather, knowing where to cut things off ...
>>>
>>> Skef
>>> ------------------------------
>>> *From:* Skef Iterum <siterum@adobe.com>
>>> *Sent:* Friday, October 18, 2024 2:09 PM
>>> *To:* Garret Rieger <grieger@google.com>
>>> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
>>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>>
>>> I think looking at a more general mechanism would be good, both for the
>>> UVS case and also for the Emoji ligature cases.
>>>
>>> I continue to suspect that that mechanism would probably not wind up
>>> helping most "traditional" ligatures, because there won't be enough of them
>>> to warrant a separate patch, and aggregating a bunch of them together
>>> artificially into a "ligature patch" would be counterproductive (especially
>>> if you're effectively always going to load that extra patch in practice,
>>> because liga is on by default and the source glyphs are high-frequency
>>> relative to the script). (Maybe there are languages where ligatures are
>>> prevalent enough that it would help there.)
>>>
>>> So the trick will be coming up with the cases where it will really help
>>> and (I suspect) not knowing where to cut things off so that cases that
>>> don't need help won't get in the way.
>>>
>>> Skef
>>> ------------------------------
>>> *From:* Garret Rieger <grieger@google.com>
>>> *Sent:* Friday, October 18, 2024 10:23 AM
>>> *To:* Skef Iterum <siterum@adobe.com>
>>> *Cc:* John Hudson <john@tiro.ca>; public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
>>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>>
>>> In harfbuzz we treat UVS as normal codepoints, that is, they are
>>> specified as part of the input unicodes set. However, we also run a special
>>> glyph closure against cmap14. In the closure we check for UVS sequences
>>> that can be activated and pull in any required glyphs to support those
>>> sequences. Harfbuzz always treats the input unicode set as unsorted so
>>> we'll include any alternate glyphs that could be reached by any ordering of
>>> the input unicode set. For example if you run a subsetting operation that
>>> asks for the unicode set:
>>>
>>> {CJK Codepoint 1, CJK Codepoint 2, VS2} where VS2 causes a non-default
>>> glyph swap for those codepoints then the retained glyph set will be
>>> expanded to {CJK 1 Default Glyph, CJK 2 Default Glyph, CJK 1 VS2 Glyph, CJK
>>> 2 VS2 Glyph}, but won't include other alternate glyphs reachable via
>>> codepoints not included in the input.
>>>
>>> Now for IFT there are two ways you could handle encoding a font that
>>> uses UVS with what we currently have:
>>>
>>> 1. Always include all possible alternate glyphs in the patches that
>>> contain the base glyph. This is of course wasteful if the alternates aren't
>>> needed.
>>> 2. Use a trick similar to how we handle VF axis extension: have a
>>> table keyed patch which is matched only on a single VS codepoint and which
>>> changes the set of glyph keyed patches listed in the font to ones that
>>> include the appropriate alternate glyphs. The downside of this approach is
>>> that because this patch needs to be full invalidation this incurs a full
>>> extra round trip.
>>>
>>> Neither of these options is great. The fundamental problem we run into
>>> is that codepoint sets are matched via intersection, so it's not currently
>>> possible to express "I want patch X only if codepoint a AND codepoint b are
>>> present" (codepoints are always matched with OR). Skef's suggestion to add a
>>> UVSRecords which acts like FeatureRecords gets around this by effectively
>>> introducing a second codepoint set which is matched with AND, since the
>>> matching algorithm uses AND between the top level sets. However, since this
>>> same problem comes up in more places than just UVS sequences (e.g.
>>> ligatures) I think we should look for a more general solution and find a
>>> way to include a mechanism which allows for multiple codepoint sets to be
>>> attached to an entry and require all the sets to intersect for the entry to
>>> match (e.g. intersection(input codepoints, set 1) AND intersection(input
>>> codepoints, set 2)). Format 2 has the notion of a copy index which
>>> constructs an entry by unioning other entries together; we could introduce
>>> an alternate mode on this which treats the combined entries as all needing
>>> to report intersections for the top level entry to match. This would give
>>> the ability to create patches that pull in only the needed alternate glyphs
>>> when a UVS codepoint is present. I'll need to look at how to incorporate
>>> this into the spec without making things too complicated.
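
The matching rule proposed above (OR within a codepoint set, AND across the sets attached to an entry) can be sketched in Python. This is a minimal illustration only, not the IFT spec's algorithm or encoding: the `Entry` class, the function names, the patch names, and the two base codepoints are all hypothetical, and real matching also involves features and design space, which are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical model of a patch map entry (not the spec's encoding)."""
    patch: str
    codepoint_sets: tuple  # one or more frozenset[int], all of which must match


def entry_matches(entry, input_codepoints):
    # Within a single set the check is an intersection (OR over codepoints).
    # Across the attached sets the results are combined with AND.
    return all(cps & input_codepoints for cps in entry.codepoint_sets)


def patches_to_load(entries, input_codepoints):
    return [e.patch for e in entries if entry_matches(e, input_codepoints)]


# Two CJK base codepoints, chosen arbitrarily for illustration.
BASE_1, BASE_2 = 0x4FAE, 0x5026
VS2 = 0xFE01  # VARIATION SELECTOR-2

entries = [
    # Single set: behaves like the existing OR-only matching.
    Entry("base-glyphs", (frozenset({BASE_1, BASE_2}),)),
    # Two sets: needs a base codepoint AND the variation selector.
    Entry("vs2-alternates", (frozenset({BASE_1, BASE_2}), frozenset({VS2}))),
]

print(patches_to_load(entries, frozenset({BASE_1})))       # ['base-glyphs']
print(patches_to_load(entries, frozenset({BASE_1, VS2})))  # ['base-glyphs', 'vs2-alternates']
print(patches_to_load(entries, frozenset({VS2})))          # []
```

With a single attached set an entry behaves as before; the alternates patch is selected only when the input contains both a base codepoint and the selector, which is the case the two existing workarounds cannot express.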
>>>
>>> On Fri, Oct 18, 2024 at 2:51 AM Skef Iterum <siterum@adobe.com> wrote:
>>>
>>> Perhaps the thinking is that USVs could be applied downstream?
>>>
>>> It's probably something like this, but there's still a missing
>>> underlying explanation. After all, you might need *any* given codepoint
>>> downstream and the premise of subsetting is that you know what you will and
>>> won't need. One could instead treat the SVs like default-active layout
>>> features, putting them in a list that's added to the unicodes by default
>>> but allowing you to override that. But that's not what seems to have
>>> happened.
>>>
>>> If I had to guess I would say that the rationale for how HarfBuzz works
>>> is probably "this functionality isn't widely understood and maybe even not
>>> known, so we shouldn't rely on users specifically adding the variation
>>> selectors they might need." And if that's more or less what happened I'm
>>> not sure the same answer should apply to IFT, because the spec strongly
>>> encourages providing *everything* in the font, and we plan to do that,
>>> it's just a question of *where*. So as long as "the client" (or *some* clients)
>>> can know whether it's about to use an SV, it might make sense to patch more
>>> cleverly on that basis. (And if some clients don't know, they can always
>>> add the SVs into the codepoint list, at the cost of loading extra patches.)
>>>
>>> Skef
>>> ------------------------------
>>> *From:* John Hudson <john@tiro.ca>
>>> *Sent:* Thursday, October 17, 2024 4:57 PM
>>> *To:* public-webfonts-wg@w3.org <public-webfonts-wg@w3.org>
>>> *Subject:* Re: Treatment of Variation Selectors in the Client
>>>
>>> On 2024-10-17 15:52, Skef Iterum wrote:
>>> > Beyond that, though, the question is how flexible we can be in
>>> > satisfying the glyph closure requirement. It seems like the strategy
>>> > used for static subsets (again, if I'm reading the code right) is to
>>> > treat the variation selectors as "extra", not considering them as part
>>> > of the list of unicodes to be preserved (or not). So, for example,
>>> > even if VS 1 isn't in the list of codepoints to be preserved, you can
>>> > still get glyphs only accessible using VS 1. It's not clear to me why
>>> > that's the case.
>>>
>>> Perhaps the thinking is that USVs could be applied downstream?
>>>
>>> I’m giving a talk at UTW next week that touches on applying formatting
>>> control characters in buffered states to affect text modes for readers.
>>> My focus is ZWNJ and ZWJ in Indic scripts, but the same concept can be
>>> applied to USVs.
>>>
>>> J.
>>>
>>>
>>> --
>>>
>>> John Hudson
>>> Tiro Typeworks Ltd
>>> http://www.tiro.com/
>>>
>>> Tiro Typeworks is physically located on islands
>>> in the Salish Sea, on the traditional territory
>>> of the Snuneymuxw and Penelakut First Nations.
>>>
>>> __________
>>>
>>> EMAIL HOUR
>>> In the interests of productivity, I am only dealing
>>> with email towards the end of the day, typically
>>> between 4PM and 5PM. If you need to contact me more
>>> urgently, please use other means.
Received on Thursday, 14 November 2024 00:01:01 UTC