Re: About dynamic IFT after the proposed changes from Garret Rieger on 2023-10-28 (public-webfonts-wg@w3.org from October 2023)

From: Garret Rieger <grieger@google.com>
Date: Fri, 27 Oct 2023 19:22:44 -0600
To: Skef Iterum <siterum@adobe.com>
Cc: "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
Message-ID: <CAM=OCWbyDzywwXLL9rGrXwOY+G+BgkfLiZQnU3uPN7im2YELFQ@mail.gmail.com>
The approach that I had in mind would be something along the lines of
encoding the path along the graph in the id string. Except you don't need
to encode the full path, you would only need to identify the current node
and the destination node to the server. This is because it doesn't matter
how you reach a node, the subset at that node will always be the same. For
example, loading subset a, then b, then c is no different from loading
subset a, then c, then b. Either path lands you on the same font, which is
the union of subsets a, b, and c.

Given that, this is how I envisioned implementing a dynamic version (working
off the assumption that the configuration is fixed):
1. You start with a list of subset definitions that the input font will be
partitioned across.
2. Each of these is assigned a numeric id (assume this mapping is fixed).
3. The id string for a given patch is then formed by encoding two sets into
a binary representation: first the set of ids for partitions that the
current file has, second the set of ids to be added. This could be done
using SparseBitSets (or some other binary encoding of a set of integers).
The binary encoding is then run through base64 to produce a URL-safe string
token that identifies that particular patch.
4. Now your dynamic backend upon receiving a request with a particular id
string can reverse the base64 and decode the binary encoding to reconstruct
the two sets. This gives it all the information it needs to produce two
subsets: one that matches what the client currently has and one that is an
extended version. From there the shared brotli patch can be created. In
this model the patch would also update the IFT table in the font, and in
particular would replace all of the id strings to reflect the change to the
current subset.
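
As a rough sketch of steps 3 and 4, the id string round trip could look
something like the following. This is only an illustration: it uses a plain
bitmask in place of a SparseBitSet, a one-byte length prefix to separate the
two sets, and the URL-safe base64 alphabet with padding stripped. The
function names and the byte layout are assumptions made for the example, not
anything defined by the proposal.

```python
import base64


def _set_to_bytes(ids):
    """Pack a set of small non-negative ints into a little-endian bitmask."""
    mask = 0
    for i in ids:
        mask |= 1 << i
    # Always emit at least one byte so the empty set is representable.
    return mask.to_bytes(max(1, (mask.bit_length() + 7) // 8), "little")


def _bytes_to_set(data):
    """Inverse of _set_to_bytes."""
    mask = int.from_bytes(data, "little")
    return frozenset(i for i in range(mask.bit_length()) if (mask >> i) & 1)


def encode_patch_id(current, added):
    """Hypothetical layout: [len(current bitmask)] [current bitmask] [added bitmask]."""
    cur = _set_to_bytes(current)
    add = _set_to_bytes(added)
    if len(cur) > 255:
        raise ValueError("current set too large for the 1-byte length prefix")
    payload = bytes([len(cur)]) + cur + add
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


def decode_patch_id(token):
    """Recover (current, added) from a token produced by encode_patch_id."""
    padded = token + "=" * (-len(token) % 4)
    payload = base64.urlsafe_b64decode(padded)
    n = payload[0]
    return _bytes_to_set(payload[1:1 + n]), _bytes_to_set(payload[1 + n:])
```

The server side would call decode_patch_id on the token from the request URL
and cut the two subsets from the resulting id sets.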

An important property of this setup is that at no point do we have to
calculate the full graph ahead of time. The graph emerges dynamically as
you start walking it.
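
A small sketch of what "the graph emerges as you walk it" might mean in
practice: if a node is identified only by the set of subset ids it contains,
the outgoing edges can be computed on demand and the order of the steps does
not matter. The names below (ALL_IDS, next_nodes, walk) are made up for the
illustration.

```python
ALL_IDS = frozenset({0, 1, 2, 3})  # assumed fixed configuration


def next_nodes(node, all_ids=ALL_IDS):
    """Nodes reachable in one step by adding a single missing subset id."""
    node = frozenset(node)
    return [node | {i} for i in sorted(all_ids - node)]


def walk(start, steps):
    """Apply a sequence of 'add these ids' steps; the result is just a union."""
    node = frozenset(start)
    for added in steps:
        node = node | frozenset(added)
    return node
```

Here walk({0}, [{1}, {2}]) and walk({0}, [{2}, {1}]) end on the same node,
and nothing beyond the nodes actually visited is ever materialized.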

This all probably sounds pretty familiar because it essentially acts like a
simplified version of the fully dynamic patch subset approach.

To give a concrete example let's say we have a font and want to partition
into 4 subsets: latin, greek, cyrillic, vietnamese. The root contains latin
and we assign the subsets numeric ids:

latin -> 0
greek -> 1
cyrillic -> 2
vietnamese -> 3

The IFT mapping table of the base font will have three patches listed. The
mapping from subset definition to ids would be:

greek -> [{0}, {1}]
cyrillic -> [{0}, {2}]
vietnamese -> [{0}, {3}]

The client wanting to add cyrillic to its font sends a request to a URL
containing the cyrillic id string [{0}, {2}]. The server can decode that id
and from it cut two subsets: one that contains latin and one that contains
latin and cyrillic. To the second subset an IFT table is added with
updated mappings:

greek -> [{0,2}, {1}]
vietnamese -> [{0,2}, {3}]
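
The mapping update in this example can be sketched as a pure function of the
set of ids the client already has. This assumes the simple case where each
entry adds exactly one subset; SUBSET_IDS, mappings_for and apply_patch are
illustrative names only.

```python
SUBSET_IDS = {"latin": 0, "greek": 1, "cyrillic": 2, "vietnamese": 3}


def mappings_for(have):
    """Mapping entries for a font holding the ids in `have`.

    Each entry is (ids the font has, ids the patch adds), mirroring the
    [{...}, {...}] notation used in the example.
    """
    have = frozenset(have)
    return {
        name: (have, frozenset({sid}))
        for name, sid in SUBSET_IDS.items()
        if sid not in have
    }


def apply_patch(current, added):
    """Mappings the server would write into the extended subset."""
    return mappings_for(frozenset(current) | frozenset(added))
```

With these definitions, mappings_for({0}) gives the three entries of the
base font, and apply_patch({0}, {2}) gives the greek and vietnamese entries
with {0,2} as the current set.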

Finally the server computes the binary diff between these two subsets and
returns that to the client. In this example it would also be possible for
the IFT patch mapping to contain combinations of subsets. For example:

greek -> [{0}, {1}]
cyrillic -> [{0}, {2}]
vietnamese -> [{0}, {3}]
greek + cyrillic -> [{0}, {1, 2}]
greek + vietnamese -> [{0}, {1, 3}]
cyrillic + vietnamese -> [{0}, {2, 3}]
greek + cyrillic + vietnamese -> [{0}, {1, 2, 3}]

This would allow the client to jump to any combination of subsets as the
next step.
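
For illustration, the combination entries above can be generated by
enumerating every non-empty combination of the subsets the font does not yet
have. The helper name and the " + " label format are assumptions for the
example. Note that the number of entries is 2^n - 1 for n missing subsets,
so a real encoder would presumably want to limit which combinations it
lists.

```python
from itertools import combinations

SUBSET_IDS = {"latin": 0, "greek": 1, "cyrillic": 2, "vietnamese": 3}


def combination_mappings(have, subset_ids=SUBSET_IDS):
    """One mapping entry per non-empty combination of missing subsets."""
    have = frozenset(have)
    # Sort by id so labels come out in a stable order.
    missing = sorted((sid, name) for name, sid in subset_ids.items() if sid not in have)
    entries = {}
    for r in range(1, len(missing) + 1):
        for combo in combinations(missing, r):
            label = " + ".join(name for _, name in combo)
            entries[label] = (have, frozenset(sid for sid, _ in combo))
    return entries
```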

On Tue, Oct 24, 2023 at 12:46 PM Skef Iterum <siterum@adobe.com> wrote:

> As today's discussion is sinking in there's one thing I'm curious about:
>
> With static IFT under the new proposal the encoder will take the font file
> and some configuration and arrive at a patch graph, perhaps all at once and
> perhaps step by step but either way (I presume) starting from the root. In
> that model the URL for each patch file is just a token embedded in two
> types of place (the map in the source file, the name of the target file).
> So they could be picked at random for all it matters.
>
> With the new proposal for dynamic IFT the "target file" won't exist. That
> means that the URL in that case needs to map, somehow, to a pair of
> parameter sets (codepoints, features, axes): the parameter set of the
> source file, and the parameter set of the target file.
>
> Assuming the configuration is fixed, and therefore the graph for a given
> file will be deterministic, one way to do this is to derive the URL as a
> hash of the two parameter sets and then walk the whole graph on each
> request to find the right node. With a lot of nodes this might be costly.
>
> Alternatively, the URL could encode a path along the graph, and then you
> would just need to generate and walk those particular nodes, assuming
> that's possible.
>
> A third option is to require that the encoder generate and output the
> entire graph even in any partially or fully dynamic use case, and then the
> server side could consult the file to get the mapping (with the storage
> format presumably optimized for this). The map might be large but if it's
> only on the server side that's probably not of much importance.
>
> So what I'm wondering is which of these strategies is the current
> thinking, or is there some better option?
>
> Skef
>
Received on Saturday, 28 October 2023 01:23:09 UTC