- From: Jonathan Kew <jfkthame@googlemail.com>
- Date: Fri, 03 Jan 2014 20:51:08 +0000
- To: David Kuettel <kuettel@google.com>, "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
- CC: "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
On 11/12/13 22:27, David Kuettel wrote:
> On Wed, Dec 11, 2013 at 8:46 AM, Levantovsky, Vladimir
> <Vladimir.Levantovsky@monotype.com> wrote:
>
>> Folks,
>>
>> I suspect that I have a bug in my code and that the evaluated number
>> of bytes saved is incorrect (I didn't account for all the different
>> cases where deltas can be equal to "0"). However, the number of
>> points that can be eliminated is not going to be affected by it
>> since I evaluated them using their actual x/y coordinates. So, for
>> now please disregard the number of bytes saved.
>
>
> Great catch, Vlad! That is a bummer, though; the estimated byte savings
> were significant for some of the fonts. I have tentatively updated the
> online spreadsheet accordingly (greying out the "Bytes saved" /
> "Bytes/point" columns, for now, but can remove them completely).
>
>
>> The real question is whether eliminating predictable points will
>> produce any meaningful savings *after* the entropy coding step is
>> applied. Since all coordinates in the glyf table are expressed as
>> deltas, I wonder how well the entropy coder handles them (and
>> I suspect that it is quite good at dealing with the deltas).
>
>
> Definitely. Once the optimization has been added to the reference
> compression tool (thank you again for volunteering to take this on,
> Jonathan), we can gather the post-Brotli numbers and then review them
> all together.
I've begun to look at this a bit, and the impression I'm getting is that
it is -not- going to be worthwhile to do the predictable-points
elimination, because the entropy coding achieves equally good
compression anyhow without this step.
In more detail, here's what I've done to investigate so far:
First, I patched the woff2 code from the font-compression-reference
repository to identify on-curve points that are exactly midway between
their preceding and following off-curve points, and omit such
"predictable" points from the data generated by the GlyfEncoder class.
Note that this patch produces glyph data that is not actually valid; the
"predictable" points are being completely discarded and no flag is left
to indicate where they need to be restored. This is because AFAICS the
flags stream generated by GlyfEncoder does not currently have any free
bits (it's not simply a copy of the TrueType contour point flags, where
there are a couple of unused bits). So we'd have to revise the encoding
of the coordinates, or find somewhere else to slip in the single flag
bit needed to indicate "add a predicted point here".
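The "predictable point" test itself is simple. The following is a minimal Python sketch, not the actual GlyfEncoder patch: it assumes each contour is a closed list of (x, y, on_curve) tuples in integer font units, and `find_predictable_points` is a hypothetical name. It is the same midpoint rule TrueType uses for implied on-curve points between two consecutive off-curve points.

```python
from typing import List, Tuple

# A contour point: (x, y, on_curve), in integer font units.
Point = Tuple[int, int, bool]


def find_predictable_points(contour: List[Point]) -> List[int]:
    """Return indices of on-curve points lying exactly midway between
    their preceding and following off-curve points.

    Contours are closed, so the neighbours of the first and last points
    wrap around.
    """
    n = len(contour)
    predictable: List[int] = []
    if n < 3:
        return predictable
    for i, (x, y, on_curve) in enumerate(contour):
        if not on_curve:
            continue
        px, py, prev_on = contour[i - 1]        # i - 1 == -1 wraps to the last point
        nx, ny, next_on = contour[(i + 1) % n]
        if prev_on or next_on:
            continue
        # Compare doubled coordinates so the test stays exact in integers
        # (an odd sum means no integer point can be the exact midpoint).
        if 2 * x == px + nx and 2 * y == py + ny:
            predictable.append(i)
    return predictable


contour = [(0, 0, False), (5, 5, True), (10, 10, False), (20, 0, True)]
print(find_predictable_points(contour))  # [1]
```

Only index 1 qualifies: (5, 5) is the midpoint of (0, 0) and (10, 10), whereas (20, 0) is not the midpoint of its neighbours (10, 10) and (0, 0).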
Despite this shortcoming, I expected that simply finding and discarding
the "predictable" points, and then running the resulting glyph data
through the entropy coder, should give a reasonable indication of how
much difference this makes. On average, we'd expect the result to be
marginally smaller than it really ought to be, as we've thrown away one
bit of data (per predicted point) that in fact needs to be preserved.
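That extra bit per predicted point has nowhere obvious to go. The sketch below assumes my reading of the WOFF2 transformed-glyf triplet encoding (worth checking against the spec draft): the high bit of each flag byte marks an off-curve point, and the low seven bits select one of 128 coordinate-triplet formats. `decode_woff2_glyph_flag` is a hypothetical helper, not a function from the woff2 code.

```python
def decode_woff2_glyph_flag(flag: int) -> tuple[bool, int]:
    """Split a transformed-glyf flag byte into (on_curve, triplet_index).

    Assumption: a set high bit means off-curve; the low seven bits index
    a 128-entry table of X/Y coordinate formats.
    """
    if not 0 <= flag <= 0xFF:
        raise ValueError("flag must be a single byte")
    on_curve = (flag & 0x80) == 0
    return on_curve, flag & 0x7F


# Every one of the 256 byte values decodes to a distinct (on_curve, format)
# pair, so no value is left over to mean "insert a predicted point here".
decoded = {decode_woff2_glyph_flag(b) for b in range(256)}
print(len(decoded))  # 256
```

Under this assumption, signalling a predicted point would need either a wider flag, a separate bit stream, or a change to the coordinate encoding.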
So using the patch described above, I compared the overall size of a
collection of fonts (.ttf files from a standard Windows fonts directory)
when compressed to woff2 format with and without discarding the
predicted points.
With some fonts, discarding the points did reduce the size of the final
.woff2 file, in one case by as much as 2%. However, only a couple of
(fairly small) fonts achieved anything like this; in the vast majority
of cases the difference was a small fraction of a percent.
More interestingly, while about 1/3 of the fonts did show -some- benefit
(although usually a tiny one), twice as many fonts actually compressed
-worse- when the predictable points had been discarded. Summing the
results over the entire directory of fonts, the total size of the .woff2
files (about 57MB) -increased- by 0.1%.
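The directory-wide tally amounts to simple bookkeeping over pairs of .woff2 sizes. Here is a minimal Python sketch, assuming both sizes per font have already been collected. `summarize` is a hypothetical helper, and the single input is the Hei.ttf pair from the example that follows, not the Windows results.

```python
from typing import Dict, Tuple


def summarize(results: Dict[str, Tuple[int, int]]) -> Tuple[int, int, int, float]:
    """results maps font name -> (baseline_size, size_with_points_dropped).

    Returns (fonts_improved, fonts_worse, fonts_unchanged, total_change_percent);
    a positive percentage means the dropped-points files are larger in total.
    """
    if not results:
        raise ValueError("no results to summarize")
    improved = sum(1 for base, dropped in results.values() if dropped < base)
    worse = sum(1 for base, dropped in results.values() if dropped > base)
    unchanged = len(results) - improved - worse
    total_base = sum(base for base, _ in results.values())
    total_dropped = sum(dropped for _, dropped in results.values())
    change = 100.0 * (total_dropped - total_base) / total_base
    return improved, worse, unchanged, change


improved, worse, unchanged, change = summarize({"Hei.ttf": (3866220, 3868048)})
print(improved, worse, unchanged, f"{change:+.3f}%")
```

Comparing summed sizes rather than averaging per-font percentages keeps the large fonts, where most of the bytes are, from being outweighed by many small ones.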
By way of an example, here's what happens when I try this with Hei.ttf,
one of the large Asian fonts found on OS X:
$ ls -l Hei.ttf
-rw-r--r-- 1 jkew staff 7502752 11 Dec 22:00 Hei.ttf
First, try woff2 compression without the predicted-points optimization:
$ ./woff2_compress Hei.ttf
Processing Hei.ttf => Hei.woff2
transformed_glyf length = 5781196
The result is a file compressed to 51.53% of its original size:
$ ls -l Hei.woff2
-rw-r--r-- 1 jkew staff 3866220 3 Jan 17:06 Hei.woff2
Then let's try discarding those points:
$ WOFF2_DROP_PREDICTED_POINTS=1 ./woff2_compress Hei.ttf
Processing Hei.ttf => Hei.woff2
transformed_glyf length = 5773728
total points: 1005476
predicted: 3638 (0.36%)
So we have discarded 3638 points, resulting in a transformed glyf table
that is 7468 bytes smaller; but look what happens next:
$ ls -l Hei.woff2
-rw-r--r-- 1 jkew staff 3868048 3 Jan 17:09 Hei.woff2
The final compressed file is 1828 bytes LARGER than before!
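To make the arithmetic in this example explicit, the figures can be rechecked in a few lines of Python. Every number is copied from the runs above; nothing here reruns the compressor. The last line turns the earlier "one bit per predicted point" caveat into a rough byte count.

```python
# Sizes in bytes, copied from the woff2_compress runs above.
ttf_size = 7502752
glyf_baseline, glyf_dropped = 5781196, 5773728
woff2_baseline, woff2_dropped = 3866220, 3868048
total_points, predicted_points = 1005476, 3638

glyf_saved = glyf_baseline - glyf_dropped
woff2_delta = woff2_dropped - woff2_baseline

print(f"woff2 / ttf size:       {woff2_baseline / ttf_size:.2%}")        # 51.53%
print(f"predicted points:       {predicted_points / total_points:.2%}")  # 0.36%
print(f"glyf bytes saved:       {glyf_saved}")                           # 7468
print(f"glyf bytes per point:   {glyf_saved / predicted_points:.2f}")    # 2.05
print(f"woff2 size change:      {woff2_delta:+d}")                       # +1828
# Restoring the points would still need one flag bit each (uncompressed).
print(f"flag bits needed:       {predicted_points} (~{predicted_points / 8:.0f} bytes)")  # ~455
```

So roughly 2 bytes per point disappear from the transformed glyf table, yet the Brotli output grows by 1828 bytes, which is more than the ~455 bytes of flag data the discarded points would have cost.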
(There are some possible variations in exactly how the deltas for the
points are stored, but the basic issue remains the same: even though we
can slightly reduce the size of the transformed glyf table, this may
well -not- result in a reduction in the eventual file size.)
This example isn't a one-off anomaly; the results from the Windows font
directory show that there's a strong possibility that eliminating the
"predictable" points may harm rather than help the overall
compressibility of the glyf table. To determine whether this
optimization is worth applying to any given font, we'd have to run the
full entropy-coding process twice, once with and once without the
predictable-point removal, and see which comes out smaller.
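If one did want to apply the optimization selectively, the per-font decision would look roughly like this minimal Python sketch. `pick_smaller` is a hypothetical helper, and `lzma.compress` is only a stand-in for Brotli so that the sketch runs with the standard library. A real pipeline would compress the two transformed glyf streams with Brotli.

```python
import lzma
from typing import Callable, Tuple


def pick_smaller(baseline: bytes, with_points_dropped: bytes,
                 compress: Callable[[bytes], bytes] = lzma.compress) -> Tuple[str, bytes]:
    """Entropy-code both candidate glyf streams and keep the smaller result.

    This always costs two full compressions per font.
    """
    a = compress(baseline)
    b = compress(with_points_dropped)
    # Prefer the baseline on ties: it needs no extra decoder support.
    if len(b) < len(a):
        return "dropped", b
    return "baseline", a
```

Both candidates are compressed unconditionally, because there is no cheap way to predict which one the entropy coder will favour. That is where the doubled compression time comes from.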
Given how expensive the entropy-compression process is, I don't think
this is worth doing; the need to compare the two versions of the glyf
data means it would virtually double the compression time, for a minimal
(if any) gain. So my conclusion is that the predicted-point optimization
is not something we should include in the WOFF2 preprocessing step after
all.
JK
Received on Friday, 3 January 2014 20:51:37 UTC