Re: Reporting my findings on Action 123 (http://www.w3.org/Fonts/WG/track/actions/open)

Thank you for the thorough exploration, collaboration and analysis,
Jonathan and Vlad!  It's great to have fully explored this idea.
Happy New Year to everyone as well!


On Mon, Jan 6, 2014 at 7:03 AM, Levantovsky, Vladimir
<Vladimir.Levantovsky@monotype.com> wrote:

> Hi Jonathan, all,
>
> Happy New Year!
> Regarding the experiment Jonathan conducted during the holiday break -
> first of all, thank you very much, Jonathan, for taking the time to
> make the changes and run the tests. Because the "predictable" points
> are most often located at the extrema of the outline contours (and
> thus usually have one coordinate unchanged while the other has the
> same delta as the neighboring point), I suspected that eliminating
> the predictable points would not make much difference once we compare
> the compressed data sizes. Still, seeing an increase in compressed
> data size after the input data stream was reduced (even if only
> slightly) is totally unexpected, and speaks volumes about the ability
> of the new entropy coder to pick up and deflate repeated data
> patterns.
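>
> (A concrete illustration with made-up coordinates: at an extremum
> with off-curve neighbors at (100, 200) and (140, 200), the
> predictable on-curve midpoint is (120, 200). The delta stream then
> carries (20, 0) followed by (20, 0) - an exact repetition that an
> entropy coder encodes very cheaply. Dropping the midpoint replaces
> those two identical deltas with a single (40, 0), which saves raw
> bytes but also removes exactly the kind of repeated pattern the
> coder thrives on.)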
>
> Let's discuss this in detail this week during our telcon on Wednesday.
>
> Thank you,
> Vlad
>
>
> > -----Original Message-----
> > From: Jonathan Kew [mailto:jfkthame@googlemail.com]
> > Sent: Friday, January 03, 2014 3:51 PM
> > To: David Kuettel; Levantovsky, Vladimir
> > Cc: public-webfonts-wg@w3.org
> > Subject: Re: Reporting my findings on Action 123
> > (http://www.w3.org/Fonts/WG/track/actions/open)
> >
> > On 11/12/13 22:27, David Kuettel wrote:
> > > On Wed, Dec 11, 2013 at 8:46 AM, Levantovsky, Vladimir
> > > <Vladimir.Levantovsky@monotype.com> wrote:
> > >
> > >     Folks,
> > >
> > >     I suspect that I have a bug in my code and that the evaluated
> > >     number of bytes saved is incorrect (I didn't account for all
> > >     the different cases where deltas can be equal to "0").
> > >     However, the number of points that can be eliminated is not
> > >     going to be affected by it, since I evaluated them using
> > >     their actual x/y coordinates. So, for now please disregard
> > >     the number of bytes saved.
> > >
> > >
> > > Great catch Vlad!  That is a bummer, though; the estimated byte
> > > savings were significant for some of the fonts.  I have
> > > tentatively updated the online spreadsheet accordingly (greying
> > > out the "Bytes saved" / "Bytes/point" columns for now, but I can
> > > remove them completely).
> > >
> > >
> > >     The real question is whether eliminating predictable points
> > >     will produce any meaningful savings *after* the entropy
> > >     coding step is applied. Since all coordinates in the glyf
> > >     table are expressed as deltas, I wonder how the entropy coder
> > >     handles them (and I suspect that it is quite good at dealing
> > >     with deltas).
> > >
> > >
> > > Definitely.  Once the optimization has been added to the
> > > reference compression tool (thank you again for volunteering to
> > > take this on, Jonathan), we can gather the post-Brotli numbers
> > > and then review them all together.
> >
> > I've begun to look at this a bit, and the impression I'm getting is
> > that it is -not- going to be worthwhile to do the predictable-points
> > elimination, because the entropy coding achieves equally good
> > compression anyhow without this step.
> >
> > In more detail, here's what I've done to investigate so far:
> >
> > First, I patched the woff2 code from the font-compression-reference
> > repository to identify on-curve points that are exactly midway between
> > their preceding and following off-curve points, and omit such
> > "predictable" points from the data generated by the GlyfEncoder class.
> >
> > Note that this patch produces glyph data that is not actually valid;
> > the "predictable" points are being completely discarded and no flag is
> > left to indicate where they need to be restored. This is because AFAICS
> > the flags stream generated by GlyfEncoder does not currently have any
> > free bits (it's not simply a copy of the TrueType contour point flags,
> > where there are a couple of unused bits). So we'd have to revise the
> > encoding of the coordinates, or find somewhere else to slip in the
> > single flag bit needed to indicate "add a predicted point here".
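> >
> > Purely as a sketch of the kind of change that would be needed (not
> > a proposal for the actual encoding; the helper name is made up),
> > one could emit a side stream carrying one bit per retained point,
> > set whenever a predicted point has to be re-inserted after it:
> >
> >   // Hypothetical side stream: pack one "re-insert a predicted
> >   // point after this one" bit per retained point, MSB first.
> >   std::vector<uint8_t> bits;
> >   for (size_t i = 0; i < points.size(); ++i) {
> >     if (i % 8 == 0) bits.push_back(0);
> >     if (PredictedPointFollows(points, i))  // assumed helper
> >       bits[i / 8] |= 0x80 >> (i % 8);
> >   }
> >
> > at a cost of one extra bit per point ahead of the entropy coder.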
> >
> > Despite this shortcoming, I expected that simply finding and discarding
> > the "predictable" points, and then running the resulting glyph data
> > through the entropy coder, should give a reasonable indication of how
> > much difference this makes. On average, we'd expect the result to be
> > marginally smaller than it really ought to be, as we've thrown away one
> > bit of data (per predicted point) that in fact needs to be preserved.
> >
> > So using the patch described above, I compared the overall size of a
> > collection of fonts (.ttf files from a standard Windows fonts
> > directory) when compressed to woff2 format with and without discarding
> > the predicted points.
> >
> > With some fonts, discarding the points did reduce the size of the final
> > .woff2 file, in one case by as much as 2%. However, only a couple of
> > (fairly small) fonts achieved anything like this; in the vast majority
> > of cases the difference was a small fraction of a percent.
> >
> > More interestingly, while about 1/3 of the fonts did show -some-
> > benefit (although usually a tiny one), twice as many fonts actually
> > compressed -worse- when the predictable points had been discarded.
> > Summing the results over the entire directory of fonts, the total
> > size of the .woff2 files (about 57MB) -increased- by 0.1%.
> >
> >
> > By way of an example, here's what happens when I try this with Hei.ttf,
> > one of the large Asian fonts found on OS X:
> >
> > $ ls -l Hei.ttf
> > -rw-r--r--  1 jkew  staff  7502752 11 Dec 22:00 Hei.ttf
> >
> > First, try woff2 compression without the predicted-points optimization:
> >
> > $ ./woff2_compress Hei.ttf
> > Processing Hei.ttf => Hei.woff2
> > transformed_glyf length = 5781196
> >
> > The result is a file compressed to 51.53% of its original size:
> >
> > $ ls -l Hei.woff2
> > -rw-r--r--  1 jkew  staff  3866220  3 Jan 17:06 Hei.woff2
> >
> > Then let's try discarding those points:
> >
> > $ WOFF2_DROP_PREDICTED_POINTS=1 ./woff2_compress Hei.ttf
> > Processing Hei.ttf => Hei.woff2
> > transformed_glyf length = 5773728
> > total points: 1005476
> >     predicted: 3638 (0.36%)
> >
> > So we have discarded 3638 points, resulting in a transformed glyf table
> > that is 7468 bytes smaller; but look what happens next:
> >
> > $ ls -l Hei.woff2
> > -rw-r--r--  1 jkew  staff  3868048  3 Jan 17:09 Hei.woff2
> >
> > The final compressed file is 1828 bytes LARGER than before!
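> >
> > (For scale: 7468 bytes saved across 3638 discarded points is about
> > 2 bytes per point before compression - and the one flag bit per
> > predicted point that a valid encoding would still need comes to
> > roughly 455 of those bytes anyway.)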
> >
> > (There are some possible variations in exactly how the deltas for the
> > points are stored, but the basic issue remains the same: even though we
> > can slightly reduce the size of the transformed glyf table, this may
> > well -not- result in a reduction in the eventual file size.)
> >
> >
> > This example isn't a one-off anomaly; the results from the Windows font
> > directory show that there's a strong possibility that eliminating the
> > "predictable" points may harm rather than help the overall
> > compressibility of the glyf table. To determine whether this
> > optimization is worth applying to any given font, we'd have to run the
> > full entropy-coding process twice, once with and once without the
> > predictable-point removal, and see which comes out smaller.
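> >
> > In code terms the check would amount to something like this sketch
> > (the function name and flag are illustrative, not the actual API):
> >
> >   // Compress twice and keep whichever output is smaller.
> >   std::string plain = CompressToWoff2(font, /*drop_predicted=*/false);
> >   std::string dropped = CompressToWoff2(font, /*drop_predicted=*/true);
> >   bool use_optimization = dropped.size() < plain.size();
> >
> > i.e. two full Brotli passes over the glyf data for every font.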
> >
> > Given how expensive the entropy-compression process is, I don't think
> > this is worth doing; the need to compare the two versions of the glyf
> > data means it would virtually double the compression time, for a
> > minimal (if any) gain. So my conclusion is that the predicted-point
> > optimization is not something we should include in the WOFF2
> > preprocessing step after all.
> >
> > JK

Received on Thursday, 9 January 2014 22:12:03 UTC