Re: Some More Thoughts on Adding Encoder Requirements

I think my view on the suggested approach is best illustrated by my disagreement with how you describe the Brotli specification when it comes to the sort of editorial directions we are discussing. The initial summary paragraph starts with the phrase "This specification defines a lossless compressed data format ...", and section 11 is all in terms of an original file. If that specification were written according to how we've been discussing things, "lossless" would be taken out of that summary sentence and section 11 would probably be removed or reorganized, because both imply that there's an original file to be reproduced, which is a restrictive way of looking at the Brotli format. Why think there needs to be an original file at all?

So, that specification is written the way I would expect it to be. Obviously, the bulk of the content is about the format and decoding the format, as those are the fixed points of the system while the details of the encoder are open. But editorially the document is still organized around the expected case: compressing an "original" file into the format and then decompressing that format to get back the original data. I haven't read the whole thing in detail this afternoon, but it seems like any more abstract uses of the format are more or less left to the imagination.

And I think that's entirely appropriate. Specifications like these should be organized around the central case, leaving other cases either to the imagination or to brief asides. So yes, I do think we should be discussing the closure requirement, for example, in terms of an original file. Specifically because putting it in more abstract terms risks people not understanding what the requirement actually is, what it is actually intended to preserve.

You say:
I don't agree with the assertion that we've made it so complicated that we can't explain how to maintain functional equivalence during augmentation:
Let me emphasize: I don't think this either. What I think is that the editorial direction you prefer is likely to make it so. We have designed a system with the primary goal of being able to transfer parts of a font to a client while preserving the overall behavior of the font on the client side, relative to the content it renders, and you say that talking in terms of an "original font" is "somewhat problematic" because it could be used for other things.

I reiterate what I said in the meeting: If we make the documentation too abstract, too removed from the central case, we increase the risk that the writer of an encoder will fail to understand or appreciate aspects of that central case and fail to reproduce the original behavior. And because we have taken an "everything is permitted" editorial approach, we won't be able to point to requirements or even "strong guidance" indicating that the encoder is not written as it should be. And if things go that way people might rightly conclude that the system is not reliable because it isn't reliable in practice, and it won't be widely adopted.

Skef
________________________________
From: Garret Rieger <grieger@google.com>
Sent: Wednesday, February 7, 2024 3:08 PM
To: Skef Iterum <siterum@adobe.com>
Cc: w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
Subject: Re: Some More Thoughts on Adding Encoder Requirements





On Tue, Feb 6, 2024 at 4:51 PM Skef Iterum <siterum@adobe.com<mailto:siterum@adobe.com>> wrote:
A few quick thoughts on this:


  1.
Is there some reason we couldn't treat any potential "pre-subsetting" of the original font as a separate step conceptually? That is, couldn't we describe the behavior of the IFT-encoded font in relation to the "original font", while acknowledging that one might have some reason to subset a given actual font in a given situation before encoding it? Even if some encoder has that functionality built in, I don't think we have to account for it in the spec.

This is effectively equivalent to framing the equivalence requirement around the fully expanded IFT font, which could be a subset of some original font. However, using an original font is more restrictive because it requires there to be an original font. That creates the assumption that IFT encoding is a process applied only to existing fonts. I'm advocating that we treat an IFT font + patches as a self-contained font that could have been produced by any number of different means, including directly from authoring sources or by transforming an existing font. Essentially we start from the point that an IFT font exists, and don't care about how it was produced. Once we have an existing IFT font we impose certain requirements on it (and in turn on the process that produced it) to ensure it will behave consistently under the application of the client-side algorithms.

  1.
The closure requirement section in the current IFTB docs is part of the explanation of how to build an IFTB encoder: "you need to solve this problem when putting glyphs in bins". Just saying that the general behavior of the font needs to be preserved, in relation to something, somehow, is not, in my opinion, close to enough guidance. So one way or another we're going to have to figure out how to add a section with very close to that sort of information, including the "collapse bins, duplicate, or put into bin 0" material and what those decisions need to preserve.
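As a toy sketch of the kind of bin decision this refers to (bins modeled as sets of glyph IDs and deps as a hypothetical dependency graph, not IFTB's actual data structures; this shows only the "put into bin 0" option):

```python
# Hypothetical sketch: when a bin's glyph closure reaches glyphs stored
# elsewhere, one option is to copy those glyphs into bin 0 (the
# always-loaded bin). Merging bins or duplicating into the referencing
# bin are the other options the IFTB docs mention.

def glyph_closure(gids, deps):
    """Transitive closure of gids over `deps` (component references,
    substitutions, ...)."""
    seen, stack = set(), list(gids)
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(deps.get(g, ()))
    return seen

def fix_closure_via_bin0(bins, deps):
    """Ensure each bin's closure is covered by that bin plus bin 0 by
    duplicating any outside glyphs into bin 0."""
    bin0 = set(bins[0])
    rest = [set(b) for b in bins[1:]]
    for b in rest:
        for g in glyph_closure(b, deps):
            if g not in b:
                bin0.add(g)
    return [bin0] + rest

bins = [{0}, {1, 2}, {3}]
deps = {3: [2]}   # glyph 3 references glyph 2, which lives in another bin
fixed = fix_closure_via_bin0(bins, deps)
# loading bin 0 plus the bin containing glyph 3 now also provides glyph 2
```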

I'm planning on including non-normative guidance on how an encoder should function in the new encoder section; this email was meant primarily to discuss the normative requirements we want to include in that section in addition to the non-normative guidance.


  1.
If I were on the W3C review committee and saw the document we seem to be heading towards, I think I would say something like "If you've made the spec so complicated and/or flexible that you aren't able to concisely explain what the encoding process needs to do to preserve a font's behavior on the client side, maybe you've made the spec too complicated and/or flexible."

It's not a specification's job to explain how something should be done. The spec exists primarily to enable interoperability of different implementations. As such, the main focus will be on the details critical to interoperability; that's why there's such a heavy focus on the client side in the current draft. If the client-side behaviour is well defined, then encoders can predict exactly how the client will operate given a particular encoding. This allows an encoder to generate an encoding that produces whatever desired outcome the encoder is targeting. A good example of this is the Brotli specification (https://datatracker.ietf.org/doc/html/rfc7932): it describes how the Brotli encoding format works and how to decompress it, but provides no guidance on how to actually build a high-quality compressor implementation, which is a significantly complicated undertaking (see: https://datatracker.ietf.org/doc/html/rfc7932#section-11).

I don't agree with the assertion that we've made it so complicated that we can't explain how to maintain functional equivalence during augmentation:
1. We already have two prototype encoder implementations (the IFTB one, and the one I wrote) that demonstrate how to maintain the functional equivalence property in an encoder implementation. These two prototype encoders are also good quality (with room for improvement of course) as they've already demonstrated smaller transfer costs than existing font loading methods. We will also want to write some documentation outside of the specification that discusses in much more detail how to approach building an encoder based on what we've learned developing the prototypes. This could be referenced from the spec as a non-normative reference.
2. A simple but functional encoder can be demonstrated using only a font subsetter and some basic recursive logic (plus serialization logic for the IFT table, but that's straightforward). Several high quality open source font subsetters exist, and font subsetting at this point is a well understood problem.
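As an illustration of the "subsetter plus basic recursive logic" point above, here is a toy sketch. Fonts are modeled as codepoint-to-glyph-data dicts and subset() is a stand-in for a real subsetter (e.g. hb-subset); the IFT table serialization is omitted:

```python
# Toy sketch of a minimal encoder built from a subsetter primitive and
# recursive partitioning. Not a real IFT encoder: patches here are raw
# subsets rather than binary patches referenced from an IFT table.

def subset(font, codepoints):
    """Stand-in subsetter: keep only the requested codepoints."""
    return {cp: data for cp, data in font.items() if cp in codepoints}

def encode(font, codepoints, min_chunk=2):
    """Recursively split the codepoint space into a base font plus
    independently loadable patch chunks."""
    cps = sorted(codepoints)
    if len(cps) <= min_chunk:
        return subset(font, set(cps)), []
    mid = len(cps) // 2
    base, patches = encode(font, cps[:mid], min_chunk)
    # Everything outside the base half becomes another patch chunk.
    return base, patches + [subset(font, set(cps[mid:]))]

# Applying every patch should reproduce the full input font.
font = {cp: "glyph-%d" % cp for cp in range(0x41, 0x49)}  # A..H
base, patches = encode(font, font.keys())
expanded = dict(base)
for p in patches:
    expanded.update(p)
assert expanded == font
```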

I do plan to add some guidance on how to approach an encoder to the specification, as well as more explanation of how the pieces all fit together and operate, as I think that will help readers understand how the technology is meant to function.


Skef
________________________________
From: Garret Rieger <grieger@google.com<mailto:grieger@google.com>>
Sent: Tuesday, February 6, 2024 3:06 PM
To: w3c-webfonts-wg (public-webfonts-wg@w3.org<mailto:public-webfonts-wg@w3.org>) <public-webfonts-wg@w3.org<mailto:public-webfonts-wg@w3.org>>
Subject: Some More Thoughts on Adding Encoder Requirements




Following discussions at the last wg call I’ve been thinking about what requirements we may want to place on the encoder/encoded fonts. Here’s what I’ve come up with.


First it helps to look at the requirements that previous IFT proposals have imposed on the server side/encoder side of the technology.


  *   Binned Incremental Font Transfer: this approach uses what's called the “closure” requirement:

“The set of glyphs contained in the chunks loaded through the GID and feature maps must be a superset of those in the GID closure of the font subset description.”


  *   Patch Subset: this approach uses a rendering equivalence requirement:

“When a subsetted font is used to render text using any combination of the subset codepoints, layout features<https://docs.microsoft.com/en-us/typography/opentype/spec/featuretags#>, or design-variation space<https://docs.microsoft.com/en-us/typography/opentype/spec/otvaroverview#terminology> it must render identically to the original font. This includes rendering with the use of any optional typographic features that a renderer may choose to use from the original font, such as hinting instructions.”
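The IFTB closure requirement quoted above is mechanically checkable. As a toy sketch (glyph dependencies and chunks modeled as sets; glyph_closure() is a hypothetical stand-in for a real closure computation over components, GSUB substitutions, and so on):

```python
# Toy model of the IFTB closure requirement: the glyphs contained in
# the loaded chunks must be a superset of the glyph closure of the
# subset description.

def glyph_closure(initial_gids, deps):
    """Transitive closure of initial_gids over the dependency graph."""
    seen, stack = set(), list(initial_gids)
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(deps.get(g, ()))
    return seen

def satisfies_closure_requirement(loaded_chunks, deps, initial_gids):
    loaded = set().union(*loaded_chunks)
    return glyph_closure(initial_gids, deps) <= loaded

deps = {1: [2], 2: [3]}   # glyph 1 needs glyph 2, which needs glyph 3
assert satisfies_closure_requirement([{1, 2}, {3}], deps, {1})
assert not satisfies_closure_requirement([{1, 2}], deps, {1})
```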



In thinking it through, I don't think we'll be able to use the glyph closure requirement from IFTB in the new IFT specification. The closure requirement in the context of the new IFT approach is too narrow: to be correct, it must assume that glyph IDs, outlines, and non-outline data are stable. However, under the new IFT proposal glyph IDs, outlines, and all other data in the font are allowed to change while still maintaining functional equivalence to the original font.


Ultimately what we want is a requirement that ensures the intermediate subsets are functionally equivalent to some font, which naturally leads to something like the rendering equivalence requirement used in Patch Subset. However, this also has some issues, which I'll discuss later.


The next thing we need to define is equivalence to what? The two previous approaches define equivalence to the “original font”. However, I think this is somewhat problematic:

  *   It’s quite likely that most encoder implementations will want to optionally perform some amount of subsetting on the input font as part of generating the encoding. The current prototype encoder does exactly that. The IFTB draft spec states that any subsetting operations must be done prior to invoking the encoder, but I think forcing those to be discrete operations artificially limits encoder implementations.

  *   In some cases there may not actually be an “original font”. Consider the case where IFT encoding is built directly into a font editor or font compilation tool chain. You could be going directly from font authoring sources to an IFT encoded font.


Instead we should look at the IFT font and collection of patches as logically being a font, which may be derived from and equivalent to some original non-incremental font, or subset of that original font. From the IFT base font and collection of patches we can produce the fully expanded font that it represents using the extension algorithm<https://garretrieger.github.io/IFT/Overview.html#extending-font-subset> with the input subset definition having sets that match all things. The iteration will eventually terminate and end with a font that is fully expanded and no longer incremental.


I suggest that we use this fully expanded font as the basis from which to require equivalence. Specifically, we could require that the font produced after any application of the extension algorithm<https://garretrieger.github.io/IFT/Overview.html#extending-font-subset> is a subset (as defined here<https://garretrieger.github.io/IFT/Overview.html#font-subset-info>) of the fully expanded font.
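As a toy illustration of that proposed requirement (fonts and patches modeled as dicts purely for illustration; the real subset relation is the one defined in the IFT draft, and real patch application is binary):

```python
# Toy model: the result of any extension must be a "subset" of the
# fully expanded font, i.e. every entry it contains appears unchanged
# in the fully expanded font.

def fully_expand(base, patches):
    """Models extension with a match-all subset definition, iterated
    until no patches remain."""
    font = dict(base)
    for p in patches:
        font.update(p)
    return font

def is_subset_font(candidate, full):
    """Every entry of `candidate` appears unchanged in `full`."""
    return all(full.get(k) == v for k, v in candidate.items())

base = {0x41: "A"}
patches = [{0x42: "B"}, {0x43: "C"}]
full = fully_expand(base, patches)
partial = fully_expand(base, patches[:1])   # extension stopped early
assert is_subset_font(partial, full)
```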

In addition, we want the IFT augmentation process to be consistent: when utilizing dependent patches, the extension algorithm<https://garretrieger.github.io/IFT/Overview.html#extending-font-subset> allows the order of application of dependent patches to be selected by the implementation. We should require that all possible orderings of patch application produce the same end result. The current definition of the extension algorithm<https://garretrieger.github.io/IFT/Overview.html#extending-font-subset> already implicitly makes this assumption.
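A conformance-style check for that consistency requirement could look like this toy sketch (patch application modeled as dict merging; in practice the comparison would be byte equality of the resulting font files):

```python
# Toy model: every permutation of dependent-patch application must
# produce an identical result font.
from itertools import permutations

def apply_in_order(base, patches):
    font = dict(base)
    for p in patches:
        font.update(p)   # stand-in for binary patch application
    return font

def is_order_independent(base, patches):
    results = [apply_in_order(base, order)
               for order in permutations(patches)]
    return all(r == results[0] for r in results)

# Disjoint patches commute; conflicting patches do not.
assert is_order_independent({}, [{1: "x"}, {2: "y"}])
assert not is_order_independent({}, [{1: "x"}, {1: "y"}])
```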


Concretely here’s what I’m considering adding to the current draft:


  1.  Define how to fully expand an IFT font (can also re-use this definition in the offline section).

  2.  Add a section which provides guidance and requirements for encoder implementations. Require that, for any valid subset definition, after the application of the extension algorithm<https://garretrieger.github.io/IFT/Overview.html#extending-font-subset> on an IFT font:

     *   The encoding MUST be consistent: the result font must be equal (binary equivalence) regardless of the ordering of patch application chosen during the execution of the extension algorithm.

     *   The result SHOULD (strongly encouraged) render equivalently to the fully expanded font when using content fully covered by the subset definition.


I’m proposing we use SHOULD for requirement 2b because it will be problematic to define rendering equivalence in a way that we could reasonably conformance-test. The main issue I see is that there are many existing font renderers that can in some cases have different behaviours (e.g. shaping implementations are definitely not consistent). Testing against all renderers is obviously infeasible, and specifying one specific renderer to test against is not good either. Also, you could end up in a case where it isn’t possible to satisfy the requirement on two different renderers at the same time (probably unlikely, but not something that can be ruled out). Lastly, as far as I’m aware there’s no canonical specification for how a font is rendered, so we can’t reference that either.


Another option is to use a weaker equivalence requirement based around something like codepoint presence in the cmap, but as that would provide no useful guarantees as to the end result of rendering I’m not sure it’s worthwhile.

Received on Thursday, 8 February 2024 00:30:51 UTC