RE: Announcing new font compression project from Richard Fink on 2012-03-28 (www-font@w3.org from January to March 2012)

From: Richard Fink <rfink@readableweb.com>
Date: Wed, 28 Mar 2012 12:20:07 -0400
To: "'Raph Levien'" <raph@google.com>, <www-font@w3.org>, "'Dave Crossland'" <dcrossland@google.com>, "'Levantovsky, Vladimir'" <Vladimir.Levantovsky@MonotypeImaging.com>
Message-ID: <002d01cd0cfe$a4aadce0$ee0096a0$@com>
Raph said:

 

>"Many of the ideas, and some particulars of the glyf table compression, are
>based on Monotype Imaging's MicroType Express format, which is now available
>under open-source and proprietary friendly licensing terms,"

 

Kudos to Vlad and Monotype for seeing the light, finally - that there was
something to be gained by letting MTX run loose and settling for whatever
brownie points (among other opportunities) that might come along with doing
that. It is good that MTX is no longer tied to the Windows platform or,
better said, that it's clear that it no longer is if it ever actually was.
The fog has cleared.

 

I have been sitting on the CPP for EOTFAST for a couple of years, never
having made up my mind what to do with it and then just losing track. (As
per my arrangement with co-author Philip Taylor, it was my call where and
when and how.) There is also a version 2 of EOTFAST that was never released
which uses a Perl library to add some additional features and
error-checking.

 

For what it's worth, I'll make the effort to get it up on the EOTFAST site
very soon for whoever wants to make use of it and/or post it elsewhere with
related code. I'll post a few notices in prominent spots when I've done so.

 

BTW - thanks to Twardoch for pointing out there's a Windows-independent
implementation of MTX available:

 

>There already is an open-source implementation of EOT with MicroType
>Express compression, it's in the Java source of Google's sfntly library.

 

Raph also said:

 

>The growth in adoption of web fonts over the past two years has been
>stunning

 

I forget the source, but one estimate - which struck me as reasonable - is
that 8% of sites are now using at least one web font. That IS stunning.

 

And now, the incredibly quick rise of the Mobile Web has bumped us all back
to the days before broadband and the miserly counting of bytes-per-page. Any
and all tools that address that situation are most welcome.

 

Good Luck.

 

Rich

 

 

From: Raph Levien [mailto:raph@google.com] 
Sent: Tuesday, March 27, 2012 6:08 PM
To: www-font@w3.org
Subject: Announcing new font compression project

 

Greetings, web font enthusiasts.

 

The growth in adoption of web fonts over the past two years has been
stunning. One of the reasons holding people back from using web fonts is
concern over file size and the delay in text rendering until the font is
fully loaded. We believe that better compression of font files will make web
fonts even more appealing for designers, and make the user experience
better. We also believe that lossless compression is quite practical, and is
important because it will be completely transparent to designers and users
alike, with no degradation or concerns over reliability and testing.

 

We have been researching a new lossless compression format, and are now
releasing it as open source and asking for a public discussion. The code
name for the project is "WOFF Ultra Condensed", and the hope is for it to be
considered by the W3C as a future evolution of the WOFF standard. To give a
flavor of the kind of improvements to expect, running compression over all
the fonts in the Google Web Fonts project yields a mean gain of 26.9%
compared to WOFF. Large CJK fonts benefit particularly well - as one
dramatic example, the Nanum Myeongjo font is 48.5% smaller than the
corresponding WOFF. More experiments will follow.

 

The code and documentation of the draft wire format are here:

 

 <http://code.google.com/p/font-compression-reference/>

 

http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensed.pdf

http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensedfileformat.pdf

 

The intent of this proposal is to preserve everything that has made WOFF
great and successful, just providing better compression. The initial WOFF
header, including the metadata features, is completely unchanged from WOFF,
with the exception of the signature.

 

There's more documentation inside the project, but here is a brief overview
of what's going on inside that makes these levels of compression possible:

 

First, the entropy coding is LZMA, which offers significant gains compared
with zlib (gzip).
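
As a rough illustration of the entropy-coding point, here is a minimal Python sketch that compresses the same bytes with zlib and with LZMA and reports the sizes. It uses only the standard library. The function name compressed_sizes and the choice of the legacy .lzma container (FORMAT_ALONE) are assumptions made for this example and are not taken from the draft wire format. The actual gain depends entirely on the input data.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the raw, zlib, and LZMA sizes of `data`, in bytes."""
    zlib_blob = zlib.compress(data, 9)
    # Assumption: FORMAT_ALONE (legacy .lzma container) is used here only
    # because its header is small; the draft format may frame LZMA differently.
    lzma_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

    # Both coders are lossless: the round trip must return the input.
    assert zlib.decompress(zlib_blob) == data
    assert lzma.decompress(lzma_blob) == data

    return {"raw": len(data), "zlib": len(zlib_blob), "lzma": len(lzma_blob)}


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; a real test would read a font file.
    sample = bytes(range(256)) * 400
    print(compressed_sizes(sample))
```

To try it on a real font, read the file in binary mode and pass the bytes to compressed_sizes.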

 

Second, there is preprocessing that removes much of the redundancy in the
TrueType format (which was designed for quick random access rather than
maximal packing into a stream). Third, the directory header is packed using
Huffman coding and a dictionary of common table values, saving over 200
bytes (particularly important for small subsets).

 

There is also a provision for combining multiple tables into a single
entropy coding stream, which can save both the CPU time and file size
overhead of having many small streams.
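
The file-size side of this can be sketched in a few lines of Python: each independently compressed stream pays its own framing cost, so many tiny tables compressed one by one tend to come out larger than the same bytes compressed as a single stream. The helper name separate_vs_combined and the use of the default .xz container are assumptions for illustration; the draft format's own framing will have different overheads.

```python
import lzma
from typing import List, Tuple


def separate_vs_combined(tables: List[bytes]) -> Tuple[int, int]:
    """Return (total size of per-table streams, size of one combined stream)."""
    # One LZMA stream per table: every stream carries its own header and footer.
    separate = sum(len(lzma.compress(table)) for table in tables)
    # One LZMA stream over the concatenation of all tables.
    combined = len(lzma.compress(b"".join(tables)))
    return separate, combined


if __name__ == "__main__":
    # Twenty small synthetic "tables" of 40 bytes each (not real font data).
    small_tables = [bytes([i]) * 40 for i in range(20)]
    separate, combined = separate_vs_combined(small_tables)
    print("separate streams:", separate, "bytes")
    print("combined stream: ", combined, "bytes")
```

A real decoder would also need the table boundaries, which this sketch does not record.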

 

We consider the format to be lossless, in the sense that the _contents_ of
the font file are preserved 100%. That said, the decompressed font is not
bit-identical to the source font, as there are many irrelevant details such
as padding and redundant ways of encoding the same data (for example, it's
perfectly valid, but inefficient, to repeat flag bytes in a simple glyph
instead of using the repeat code). A significant amount of the compression
is due to stripping these out. One way of thinking about the losslessness
guarantee is that running a valid font through compression and decompression
should yield exactly the same TTX representation as the original font.
Further, we plan to build an extensive test suite to validate this
assertion.
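
The flag-byte example can be made concrete with a small Python sketch. In a TrueType simple glyph, bit 3 (0x08) of a flag byte is the repeat bit, and when it is set the next byte gives the number of additional repetitions. The two functions below (decode_flags and encode_flags, names chosen for this illustration) show that a verbose encoding and a repeat-coded one describe the same logical flags, so rewriting one as the other changes the bytes but not the contents. This is not the project's code, and encode_flags assumes its input flags have the repeat bit cleared.

```python
REPEAT_FLAG = 0x08  # bit 3 of a simple-glyph flag byte


def decode_flags(data: bytes, num_points: int) -> list:
    """Expand an encoded flag array into one logical flag per point."""
    flags = []
    i = 0
    while len(flags) < num_points:
        flag = data[i]
        i += 1
        logical = flag & ~REPEAT_FLAG & 0xFF  # the repeat bit is encoding detail only
        flags.append(logical)
        if flag & REPEAT_FLAG:
            count = data[i]  # number of additional repetitions
            i += 1
            flags.extend([logical] * count)
    return flags


def encode_flags(flags: list) -> bytes:
    """Encode logical flags (repeat bit clear), using the repeat code for runs."""
    out = bytearray()
    i = 0
    while i < len(flags):
        run = 1
        # A repeat count is one byte, so a run covers at most 256 points.
        while i + run < len(flags) and flags[i + run] == flags[i] and run < 256:
            run += 1
        if run > 1:
            out.append(flags[i] | REPEAT_FLAG)
            out.append(run - 1)
        else:
            out.append(flags[i])
        i += run
    return bytes(out)


if __name__ == "__main__":
    verbose = bytes([0x01] * 5 + [0x00, 0x01])  # valid, but repeats flag bytes
    logical = decode_flags(verbose, 7)
    compact = encode_flags(logical)
    print(len(verbose), "bytes verbose,", len(compact), "bytes with repeat code")
    print(decode_flags(compact, 7) == logical)
```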

 

In this proposal, we've tried to strike a balance between complexity and
aggressiveness of compression. The biggest gains by far come from better
compression of the glyf table (and eliminating the loca table altogether),
so basically this proposal squeezes this table to the maximum. We estimate
that somewhere between 0.5% and 1% each can be gained by (1) eliminating
lsb's from the hmtx table, and (2) compressing the cmap using a technique
similar to CFF. The source code includes compression algorithms for both of
these, but we can't be 100% sure about the gains because we haven't written
the corresponding decompression code. A big concern is overall spec
complexity: We want to make it practical for people to implement, test for
conformance, etc. We'd really love to hear people's thoughts on this, in
particular, whether it's worth going after every last bit of possible
compression.
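
One way to see why the loca table can be dropped: it holds only the byte offset of each glyph within the glyf table, so a decoder that knows the glyph lengths can rebuild it. The Python sketch below does that under stated assumptions: the function name build_loca is hypothetical, glyphs are padded to 4-byte boundaries (a common convention, not something the message specifies), and the short format stores each offset divided by two. It is not the reference implementation.

```python
import struct


def build_loca(glyph_lengths, long_format=True) -> bytes:
    """Rebuild a loca table (numGlyphs + 1 offsets) from glyph byte lengths."""
    offsets = [0]
    for length in glyph_lengths:
        padded = (length + 3) & ~3  # assumption: pad each glyph to 4 bytes
        offsets.append(offsets[-1] + padded)

    if long_format:
        # Long format: big-endian uint32 byte offsets.
        return struct.pack(">%dL" % len(offsets), *offsets)
    # Short format: big-endian uint16, each storing offset / 2.
    return struct.pack(">%dH" % len(offsets), *(o // 2 for o in offsets))


if __name__ == "__main__":
    # Three hypothetical glyphs; the second is empty (zero length).
    table = build_loca([10, 0, 6])
    print(struct.unpack(">4L", table))
```

An empty glyph shows up as two equal consecutive offsets, which is how loca marks a glyph with no outline.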

 

This is an open source project, and we encourage participation from the
whole community. I'd also like to thank a number of people who have
contributed so far: the compression code is based on sfntly (by the Google
Internationalization team), the decompression code is built on top of OTS
(the OpenType Sanitizer), and a number of pleasant discussions with Vlad
Levantovsky and John Daggett have helped improve it. Many of the ideas, and
some particulars of the glyf table compression, are based on Monotype
Imaging's MicroType Express format, which is now available under open-source
and proprietary friendly licensing terms, see
http://monotypeimaging.com/aboutus/mtx-license.aspx. Also thanks to Kenichi
Ishibashi for doing an integration into Chromium so we can test it in real
browsers (this will also be released soon).

 

We're looking forward to the discussion!

 

Raph Levien

Engineer, Google Web Fonts
Received on Wednesday, 28 March 2012 16:19:56 UTC