- From: Jonathan Kew <jfkthame@googlemail.com>
- Date: Fri, 03 Jan 2014 20:51:08 +0000
- To: David Kuettel <kuettel@google.com>, "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
- CC: "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
On 11/12/13 22:27, David Kuettel wrote:
> On Wed, Dec 11, 2013 at 8:46 AM, Levantovsky, Vladimir
> <Vladimir.Levantovsky@monotype.com> wrote:
>
>     Folks,
>
>     I suspect that I have a bug in my code and that the evaluated number
>     of bytes saved is incorrect (I didn't account for all the different
>     cases where deltas can be equal to "0"). However, the number of
>     points that can be eliminated is not going to be affected by it
>     since I evaluated them using their actual x/y coordinates. So, for
>     now please disregard the number of bytes saved.
>
>
> Great catch Vlad! That is a bummer though, the estimated byte savings
> were significant for some of the fonts. I have tentatively updated the
> online spreadsheet accordingly (greying out the "Bytes saved" /
> "Bytes/point" columns, for now, but can remove them completely).
>
>
>     The real question is whether eliminating predictable points will
>     produce any meaningful savings *after* the entropy coding step is
>     applied. Since all coordinates in the glyf table are expressed as
>     deltas - I wonder how the entropy coder is taking care of them (and
>     I suspect that it is quite good dealing with the deltas).
>
>
> Definitely. Once the optimization has been added to the reference
> compression tool (thank you again for volunteering to take this on
> Jonathan), we can gather the post-Brotli numbers and then review them
> all together.

I've begun to look at this a bit, and the impression I'm getting is that it is -not- going to be worthwhile to do the predictable-points elimination, because the entropy coding achieves equally good compression anyhow without this step.

In more detail, here's what I've done to investigate so far:

First, I patched the woff2 code from the font-compression-reference repository to identify on-curve points that are exactly midway between their preceding and following off-curve points, and omit such "predictable" points from the data generated by the GlyfEncoder class.

Note that this patch produces glyph data that is not actually valid; the "predictable" points are being completely discarded and no flag is left to indicate where they need to be restored. This is because AFAICS the flags stream generated by GlyfEncoder does not currently have any free bits (it's not simply a copy of the TrueType contour point flags, where there are a couple of unused bits). So we'd have to revise the encoding of the coordinates, or find somewhere else to slip in the single flag bit needed to indicate "add a predicted point here".

Despite this shortcoming, I expected that simply finding and discarding the "predictable" points, and then running the resulting glyph data through the entropy coder, should give a reasonable indication of how much difference this makes. On average, we'd expect the result to be marginally smaller than it really ought to be, as we've thrown away one bit of data (per predicted point) that in fact needs to be preserved.

So using the patch described above, I compared the overall size of a collection of fonts (.ttf files from a standard Windows fonts directory) when compressed to woff2 format with and without discarding the predicted points.

With some fonts, discarding the points did reduce the size of the final .woff2 file, in one case by as much as 2%. However, only a couple of (fairly small) fonts achieved anything like this; in the vast majority of cases the difference was a small fraction of a percent.

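For illustration only, the test for a "predictable" point described above (an on-curve point exactly midway between its preceding and following off-curve points) might look something like the Python sketch below. This is not the actual patch to GlyfEncoder, which is C++; the `(x, y, on_curve)` tuple representation, the function name, and the assumption that a contour is closed (so the neighbours wrap around) are all made up for the example.

```python
from typing import List, Tuple

# Hypothetical point representation: (x, y, on_curve), in font units.
Point = Tuple[int, int, bool]


def predictable_indices(contour: List[Point]) -> List[int]:
    """Return indices of on-curve points lying exactly midway between
    their off-curve neighbours in a closed contour (assumed to wrap)."""
    n = len(contour)
    if n < 3:
        return []  # too few points to have two distinct neighbours
    found = []
    for i, (x, y, on_curve) in enumerate(contour):
        if not on_curve:
            continue
        px, py, prev_on = contour[i - 1]        # index -1 wraps to the last point
        nx, ny, next_on = contour[(i + 1) % n]  # wrap to the first point
        if prev_on or next_on:
            continue  # both neighbours must be off-curve control points
        # Compare doubled coordinates so the midpoint test stays in integers.
        if 2 * x == px + nx and 2 * y == py + ny:
            found.append(i)
    return found


contour = [(0, 0, True), (0, 10, False), (5, 15, True), (10, 20, False), (20, 0, True)]
print(predictable_indices(contour))  # -> [2]
```

Dropping the points found this way is only half of a real encoder: as noted above, a flag bit would still be needed somewhere to tell the decoder where to put each point back.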
More interestingly, while about 1/3 of the fonts did show -some- benefit (although usually a tiny one), twice as many fonts actually compressed -worse- when the predictable points had been discarded. Summing the results over the entire directory of fonts, the total size of the .woff2 files (about 57MB) -increased- by 0.1%.

By way of an example, here's what happens when I try this with Hei.ttf, one of the large Asian fonts found on OS X:

  $ ls -l Hei.ttf
  -rw-r--r--  1 jkew  staff  7502752 11 Dec 22:00 Hei.ttf

First, try woff2 compression without the predicted-points optimization:

  $ ./woff2_compress Hei.ttf
  Processing Hei.ttf => Hei.woff2
  transformed_glyf length = 5781196

The result is a file compressed to 51.53% of its original size:

  $ ls -l Hei.woff2
  -rw-r--r--  1 jkew  staff  3866220  3 Jan 17:06 Hei.woff2

Then let's try discarding those points:

  $ WOFF2_DROP_PREDICTED_POINTS=1 ./woff2_compress Hei.ttf
  Processing Hei.ttf => Hei.woff2
  transformed_glyf length = 5773728
  total points: 1005476 predicted: 3638 (0.36%)

So we have discarded 3638 points, resulting in a transformed glyf table that is 7468 bytes smaller; but look what happens next:

  $ ls -l Hei.woff2
  -rw-r--r--  1 jkew  staff  3868048  3 Jan 17:09 Hei.woff2

The final compressed file is 1828 bytes LARGER than before!

(There are some possible variations in exactly how the deltas for the points are stored, but the basic issue remains the same: even though we can slightly reduce the size of the transformed glyf table, this may well -not- result in a reduction in the eventual file size.)

This example isn't a one-off anomaly; the results from the Windows font directory show that there's a strong possibility that eliminating the "predictable" points may harm rather than help the overall compressibility of the glyf table.

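For anyone wanting to cross-check the Hei.ttf figures quoted above, the arithmetic can be reproduced with a few lines of Python. The numbers are copied from the command output in this message; the variable names are just for the example.

```python
# Sizes in bytes, taken from the ls / woff2_compress output above.
original_ttf = 7502752
woff2_baseline = 3866220   # without dropping predicted points
woff2_dropped = 3868048    # with WOFF2_DROP_PREDICTED_POINTS=1
glyf_baseline = 5781196    # transformed_glyf length, baseline
glyf_dropped = 5773728     # transformed_glyf length, points dropped
total_points = 1005476
predicted_points = 3638

ratio = woff2_baseline / original_ttf
glyf_saving = glyf_baseline - glyf_dropped
size_increase = woff2_dropped - woff2_baseline
predicted_share = predicted_points / total_points

print(f"compressed to {ratio:.2%} of original")          # 51.53%
print(f"transformed glyf smaller by {glyf_saving} bytes")  # 7468
print(f"final .woff2 larger by {size_increase} bytes")     # 1828
print(f"predicted points: {predicted_share:.2%}")          # 0.36%
```

So a saving of 7468 bytes before entropy coding turns into a loss of 1828 bytes after it, for this font.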
To determine whether this optimization is worth applying to any given font, we'd have to run the full entropy-coding process twice, once with and once without the predictable-point removal, and see which comes out smaller. Given how expensive the entropy-compression process is, I don't think this is worth doing; the need to compare the two versions of the glyf data means it would virtually double the compression time, for a minimal (if any) gain.

So my conclusion is that the predicted-point optimization is not something we should include in the WOFF2 preprocessing step after all.

JK
Received on Friday, 3 January 2014 20:51:37 UTC