Subject: Re: Reporting my findings on Action 123 (http://www.w3.org/Fonts/WG/track/actions/open)

From: Jonathan Kew <jfkthame@googlemail.com>
Date: Fri, 03 Jan 2014 20:51:08 +0000
To: David Kuettel <kuettel@google.com>, "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
CC: "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
Message-ID: <52C722BC.5030707@gmail.com>

On 11/12/13 22:27, David Kuettel wrote:
> On Wed, Dec 11, 2013 at 8:46 AM, Levantovsky, Vladimir
> <Vladimir.Levantovsky@monotype.com
> <mailto:Vladimir.Levantovsky@monotype.com>> wrote:
>
>     Folks,
>
>     I suspect that I have a bug in my code and that the evaluated number
>     of bytes saved is incorrect (I didn't account for all the different
>     cases where deltas can be equal to "0"). However, the number of
>     points that can be eliminated is not going to be affected by it
>     since I evaluated them using their actual x/y coordinates. So, for
>     now please disregard the number of bytes saved.
>
>
> Great catch Vlad!  That is a bummer though, the estimated byte savings
> were significant for some of the fonts.  I have tentatively updated the
> online spreadsheet accordingly (greying out the "Bytes saved" /
> "Bytes/point" columns, for now, but can remove them completely).
>
>
>     The real question is whether eliminating predictable points will
>     produce any meaningful savings *after* the entropy coding step is
>     applied. Since all coordinates in the glyf table are expressed as
>     deltas - I wonder how the entropy coder is taking care of them (and
>     I suspect that it is quite good dealing with the deltas).
>
>
> Definitely.  Once the optimization has been added to the reference
> compression tool (thank you again for volunteering to take this on
> Jonathan), we can gather the post-Brotli numbers and then review them
> all together.

I've begun to look at this a bit, and the impression I'm getting is that 
it is -not- going to be worthwhile to do the predictable-points 
elimination, because the entropy coding achieves equally good 
compression anyhow without this step.

In more detail, here's what I've done to investigate so far:

First, I patched the woff2 code from the font-compression-reference 
repository to identify on-curve points that are exactly midway between 
their preceding and following off-curve points, and omit such 
"predictable" points from the data generated by the GlyfEncoder class.

Note that this patch produces glyph data that is not actually valid; the 
"predictable" points are being completely discarded and no flag is left 
to indicate where they need to be restored. This is because AFAICS the 
flags stream generated by GlyfEncoder does not currently have any free 
bits (it's not simply a copy of the TrueType contour point flags, where 
there are a couple of unused bits). So we'd have to revise the encoding 
of the coordinates, or find somewhere else to slip in the single flag 
bit needed to indicate "add a predicted point here".
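
The midpoint test described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual patch to the woff2 code: the `Point` tuple layout and the function name `predictable_indices` are assumptions made for the example, and it treats each contour as a closed loop of integer coordinates.

```python
from typing import List, Tuple

# Hypothetical point layout for this sketch: (x, y, on_curve).
Point = Tuple[int, int, bool]


def predictable_indices(contour: List[Point]) -> List[int]:
    """Return indices of on-curve points that lie exactly midway between
    their preceding and following off-curve points in a closed contour."""
    n = len(contour)
    found: List[int] = []
    if n < 3:
        return found  # too few points to have two distinct neighbours
    for i, (x, y, on_curve) in enumerate(contour):
        if not on_curve:
            continue
        px, py, prev_on = contour[i - 1]        # index -1 wraps to the last point
        nx, ny, next_on = contour[(i + 1) % n]  # wrap to the first point
        if prev_on or next_on:
            continue
        # Compare doubled coordinates so that odd sums are never rounded.
        if 2 * x == px + nx and 2 * y == py + ny:
            found.append(i)
    return found


example = [(0, 0, False), (5, 5, True), (10, 10, False), (10, 0, True)]
print(predictable_indices(example))
```

In the example contour only the point at index 1 qualifies; the point at index 3 has off-curve neighbours but is not at their midpoint. A decoder would still need one flag bit per dropped point to know where to reinsert it.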

Despite this shortcoming, I expected that simply finding and discarding 
the "predictable" points, and then running the resulting glyph data 
through the entropy coder, should give a reasonable indication of how 
much difference this makes. If anything, the result will be marginally 
smaller than it really ought to be, as we've thrown away one bit of data 
(per predicted point) that in fact needs to be preserved; so this test 
slightly favours the optimization.

So using the patch described above, I compared the overall size of a 
collection of fonts (.ttf files from a standard Windows fonts directory) 
when compressed to woff2 format with and without discarding the 
predicted points.

With some fonts, discarding the points did reduce the size of the final 
.woff2 file, in one case by as much as 2%. However, only a couple of 
(fairly small) fonts achieved anything like this; in the vast majority 
of cases the difference was a small fraction of a percent.

More interestingly, while about 1/3 of the fonts did show -some- benefit 
(although usually a tiny one), twice as many fonts actually compressed 
-worse- when the predictable points had been discarded. Summing the 
results over the entire directory of fonts, the total size of the .woff2 
files (about 57MB) -increased- by 0.1%.


By way of an example, here's what happens when I try this with Hei.ttf, 
one of the large Asian fonts found on OS X:

$ ls -l Hei.ttf
-rw-r--r--  1 jkew  staff  7502752 11 Dec 22:00 Hei.ttf

First, try woff2 compression without the predicted-points optimization:

$ ./woff2_compress Hei.ttf
Processing Hei.ttf => Hei.woff2
transformed_glyf length = 5781196

The result is a file compressed to 51.53% of its original size:

$ ls -l Hei.woff2
-rw-r--r--  1 jkew  staff  3866220  3 Jan 17:06 Hei.woff2

Then let's try discarding those points:

$ WOFF2_DROP_PREDICTED_POINTS=1 ./woff2_compress Hei.ttf
Processing Hei.ttf => Hei.woff2
transformed_glyf length = 5773728
total points: 1005476
    predicted: 3638 (0.36%)

So we have discarded 3638 points, resulting in a transformed glyf table 
that is 7468 bytes smaller; but look what happens next:

$ ls -l Hei.woff2
-rw-r--r--  1 jkew  staff  3868048  3 Jan 17:09 Hei.woff2

The final compressed file is 1828 bytes LARGER than before!

(There are some possible variations in exactly how the deltas for the 
points are stored, but the basic issue remains the same: even though we 
can slightly reduce the size of the transformed glyf table, this may 
well -not- result in a reduction in the eventual file size.)
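
For reference, the arithmetic behind the Hei.ttf figures can be reproduced with a few lines of Python. All sizes are copied from the output above; the only added assumption is the last line, which estimates the raw (pre-entropy-coding) cost of the missing flag at one bit per dropped point.

```python
# Sizes in bytes, copied from the ls and woff2_compress output above.
ttf_size = 7502752
woff2_plain = 3866220        # without dropping predicted points
woff2_dropped = 3868048      # with WOFF2_DROP_PREDICTED_POINTS=1
glyf_plain = 5781196         # transformed_glyf length, plain
glyf_dropped = 5773728       # transformed_glyf length, points dropped
total_points = 1005476
predicted = 3638

glyf_saving = glyf_plain - glyf_dropped        # 7468 bytes smaller before entropy coding
woff2_growth = woff2_dropped - woff2_plain     # 1828 bytes larger after entropy coding
ratio_pct = 100 * woff2_plain / ttf_size       # compressed size as % of original
predicted_pct = 100 * predicted / total_points

# Assumption: one raw flag bit per dropped point, rounded up to whole bytes.
flag_bytes = (predicted + 7) // 8              # 455 bytes

print(f"glyf saving:  {glyf_saving} bytes")
print(f"woff2 growth: {woff2_growth} bytes")
print(f"ratio:        {ratio_pct:.2f}%")
print(f"predicted:    {predicted_pct:.2f}%")
print(f"flag cost:    {flag_bytes} bytes (raw)")
```

So the pre-compression saving of about 2 bytes per dropped point turns into a net loss after entropy coding, even before the flag bits that a valid encoding would need are added back.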


This example isn't a one-off anomaly; the results from the Windows font 
directory show that there's a strong possibility that eliminating the 
"predictable" points may harm rather than help the overall 
compressibility of the glyf table. To determine whether this 
optimization is worth applying to any given font, we'd have to run the 
full entropy-coding process twice, once with and once without the 
predictable-point removal, and see which comes out smaller.
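
A rough sketch of that "compress both ways and keep the smaller" step is shown below. It is illustrative only: `choose_smaller` is a hypothetical helper, not part of the woff2 code, and `zlib.compress` from the Python standard library stands in for the real entropy coder purely so that the example is self-contained.

```python
import zlib
from typing import Callable, Tuple


def choose_smaller(
    full: bytes,
    pruned: bytes,
    compress: Callable[[bytes], bytes] = zlib.compress,  # stand-in coder (assumption)
) -> Tuple[str, bytes]:
    """Compress both variants of the glyph data and return the smaller
    result, labelled "full" or "pruned". Ties go to "full", since the
    pruned form needs extra work to decode."""
    compressed_full = compress(full)      # first full compression pass
    compressed_pruned = compress(pruned)  # second full compression pass
    if len(compressed_pruned) < len(compressed_full):
        return "pruned", compressed_pruned
    return "full", compressed_full


label, data = choose_smaller(b"abcabcabcabc" * 50, b"abcabcabc" * 50)
print(label, len(data))
```

Note that the compressor is invoked twice for every font, which is where the doubling of compression time comes from.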

Given how expensive the entropy-compression process is, I don't think 
this is worth doing; the need to compare the two versions of the glyf 
data means it would virtually double the compression time, for a minimal 
(if any) gain. So my conclusion is that the predicted-point optimization 
is not something we should include in the WOFF2 preprocessing step after 
all.

JK
Received on Friday, 3 January 2014 20:51:37 UTC