Announcing new font compression project

Greetings, web font enthusiasts.

The growth in adoption of web fonts over the past two years has been
stunning. One thing still holding people back from using web fonts is
concern over file size and the delay in text rendering until the font is
fully loaded. We believe that better compression of font files will make
web fonts even more appealing for designers, and make the user experience
better. We also believe that lossless compression is quite practical, and
is important because it will be completely transparent to designers and
users alike, with no degradation or concerns over reliability and testing.

We have been researching a new lossless compression format, and are now
releasing it as open source and asking for a public discussion. The code
name for the project is "WOFF Ultra Condensed", and the hope is for it to
be considered by the W3C as a future evolution of the WOFF standard. To
give a flavor of the kind of improvements to expect, running compression
over all the fonts in the Google Web Fonts project yields a mean gain of
26.9% compared to WOFF. Large CJK fonts benefit particularly - as one
dramatic example, the Nanum Myeongjo font is 48.5% smaller than the
corresponding WOFF. More experiments will follow.

The code and documentation of the draft wire format are here:

http://code.google.com/p/font-compression-reference/

http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensed.pdf
http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensedfileformat.pdf

The intent of this proposal is to preserve everything that has made WOFF
great and successful, just providing better compression. The initial WOFF
header, including the metadata features, is completely unchanged from WOFF,
with the exception of the signature.

There's more documentation inside the project, but here is a brief overview
of what's going on inside that makes these levels of compression possible:

First, the entropy coding is LZMA, which offers significant gains compared
with zlib (gzip).
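
One reason LZMA can beat zlib is that it can find matches much further back
in the stream (deflate is limited to a 32 KB window). The sketch below uses
Python's standard lzma and zlib modules on a contrived payload, a 64 KB
random block repeated twice, so it only illustrates the window effect and
says nothing about real font data. The compare_sizes helper and the payload
are made up for this illustration and are not part of the project.

```python
import lzma
import random
import zlib


def compare_sizes(data):
    """Return the raw size and the zlib/LZMA compressed sizes, in bytes."""
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


# Contrived payload (an assumption for illustration, not font data):
# a 64 KB block of pseudo-random bytes, repeated once. The repeat lies
# beyond deflate's 32 KB window, but well within LZMA's dictionary.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
payload = block + block

sizes = compare_sizes(payload)
print(sizes)
```

On this payload zlib cannot see the repetition, while LZMA encodes the
second copy as a long match. Gains on actual fonts depend on the
preprocessing described next and should be measured on real files.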

Second, there is preprocessing that removes much of the redundancy in the
TrueType format (which was designed for quick random access rather than
maximal packing into a stream).

Third, the directory header is packed using Huffman coding and a dictionary
of common table values, saving over 200 bytes (particularly important for
small subsets).

There is also a provision for combining multiple tables into a single
entropy coding stream, which can save both the CPU time and file size
overhead of having many small streams.
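
The size side of this trade-off can be sketched with Python's standard lzma
module. The table contents below are invented bytes, the helper names are
hypothetical, and Python's default .xz container has its own framing
overhead that differs from the draft wire format, so the numbers are only
indicative of the general effect.

```python
import lzma


def compress_separately(tables):
    """Compress each table as its own stream."""
    return [lzma.compress(data) for data in tables.values()]


def compress_combined(tables):
    """Concatenate all tables, compress once, and record each length."""
    lengths = [(tag, len(data)) for tag, data in tables.items()]
    blob = lzma.compress(b"".join(tables.values()))
    return blob, lengths


def split_combined(blob, lengths):
    """Decompress the single stream and cut it back into tables."""
    raw = lzma.decompress(blob)
    out, pos = {}, 0
    for tag, size in lengths:
        out[tag] = raw[pos:pos + size]
        pos += size
    return out


# Invented stand-ins for small tables (not real font data).
tables = {
    "cvt ": bytes(range(0, 64, 2)),
    "fpgm": bytes([0xB0, 0x00, 0x2C] * 12),
    "prep": bytes([0xB8, 0x01, 0xFF, 0x85] * 10),
    "gasp": bytes([0, 1, 0, 2, 0, 8, 0, 2, 0xFF, 0xFF, 0, 3]),
}

separate_total = sum(len(s) for s in compress_separately(tables))
blob, lengths = compress_combined(tables)
print(separate_total, len(blob))
```

With tables this small, the per-stream framing dominates, which is why a
single stream comes out smaller here. The lengths list plays the role that
a table directory would play in a real format.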

We consider the format to be lossless, in the sense that the _contents_ of
the font file are preserved 100%. That said, the decompressed font is not
bit-identical to the source font, as there are many irrelevant details such
as padding and redundant ways of encoding the same data (for example, it's
perfectly valid, but inefficient, to repeat flag bytes in a simple glyph
instead of using the repeat code). A significant amount of the compression
is due to stripping these out. One way of thinking about the losslessness
guarantee is that running a valid font through compression and
decompression should yield exactly the same TTX representation as the
original font. Further, we plan to build an extensive test suite to
validate this assertion.
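
The flag-byte example above can be made concrete. In a TrueType simple
glyph, bit 3 (0x08) of a flag byte means the following byte is a count of
additional repetitions. The sketch below is a minimal illustration of that
rule and not the project's code; the function names are hypothetical. It
shows two different byte sequences that decode to the same logical flags,
which is the sense in which the contents are preserved even though the
bytes are not.

```python
REPEAT_FLAG = 0x08  # bit 3 of a simple-glyph flag byte


def decode_flags(encoded, num_points):
    """Expand an encoded flag array into one logical flag per point."""
    flags = []
    i = 0
    while len(flags) < num_points:
        flag = encoded[i]
        i += 1
        logical = flag & 0xF7  # drop the repeat bit from the logical flag
        flags.append(logical)
        if flag & REPEAT_FLAG:
            count = encoded[i]  # number of additional repetitions
            i += 1
            flags.extend([logical] * count)
    return flags


def encode_flags(flags):
    """Re-encode logical flags, using the repeat code for runs."""
    out = bytearray()
    i = 0
    while i < len(flags):
        run = 1
        while (i + run < len(flags) and flags[i + run] == flags[i]
               and run < 256):
            run += 1
        if run > 1:
            out.append(flags[i] | REPEAT_FLAG)
            out.append(run - 1)
        else:
            out.append(flags[i])
        i += run
    return bytes(out)


# Valid but verbose: eight flag bytes, none using the repeat code.
verbose = bytes([0x01] * 6 + [0x00, 0x01])
compact = encode_flags(decode_flags(verbose, 8))
print(len(verbose), len(compact))  # prints "8 4"
```

Both verbose and compact describe the same eight points, so a tool that
dumps logical flags (as TTX does) would show no difference between them.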

In this proposal, we've tried to strike a balance between complexity and
aggressiveness of compression. The biggest gains by far come from better
compression of the glyf table (and eliminating the loca table altogether),
so basically this proposal squeezes this table to the maximum. We estimate
that somewhere between 0.5% and 1% each can be gained by (1) eliminating
lsb's from the hmtx table, and (2) compressing the cmap using a technique
similar to CFF. The source code includes compression algorithms for both of
these, but we can't be 100% sure about the gains because we haven't written
the corresponding decompression code. A big concern is overall spec
complexity: We want to make it practical for people to implement, test for
conformance, etc. We'd really love to hear people's thoughts on this, in
particular, whether it's worth going after every last bit of possible
compression.
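
On eliminating the loca table: loca only records where each glyph starts in
glyf, so a decoder that knows each glyph's length can regenerate it. Below
is a minimal sketch of that idea, assuming the long (32-bit) loca format
and padding each glyph to a 4-byte boundary. Both are choices made for this
illustration, the function name is hypothetical, and the draft format's
actual reconstruction rules are in the project documentation.

```python
import struct


def build_glyf_and_loca(glyphs):
    """Rebuild glyf and a long-format loca from per-glyph byte strings.

    loca holds len(glyphs) + 1 big-endian uint32 offsets; the last one
    marks the end of the glyf table.
    """
    glyf = bytearray()
    offsets = [0]
    for data in glyphs:
        glyf += data
        glyf += b"\0" * (-len(glyf) % 4)  # pad to a 4-byte boundary
        offsets.append(len(glyf))
    loca = struct.pack(">%dL" % len(offsets), *offsets)
    return bytes(glyf), loca


# Invented glyph records: 10 bytes, an empty glyph, and 7 bytes.
glyphs = [b"\x00" * 10, b"", b"\x01" * 7]
glyf, loca = build_glyf_and_loca(glyphs)
print(struct.unpack(">4L", loca))  # prints "(0, 12, 12, 20)"
```

An empty glyph shows up as two equal consecutive offsets, which is how loca
represents glyphs with no outline.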

This is an open source project, and we encourage participation from the
whole community. I'd also like to thank a number of people who have
contributed so far: the compression code is based on sfntly (by the Google
Internationalization team), the decompression code is built on top of OTS
(the OpenType Sanitizer), and a number of pleasant discussions with Vlad
Levantovsky and John Daggett have helped improve it. Many of the ideas, and
some particulars of the glyf table compression, are based on Monotype
Imaging's MicroType Express format, which is now available under licensing
terms friendly to both open-source and proprietary use; see
http://monotypeimaging.com/aboutus/mtx-license.aspx. Also thanks to Kenichi
Ishibashi for doing an integration into Chromium so we can test it in real
browsers (this will also be released soon).

We're looking forward to the discussion!

Raph Levien
Engineer, Google Web Fonts

Received on Wednesday, 28 March 2012 07:51:48 UTC