- From: Raph Levien <raph@google.com>
- Date: Tue, 27 Mar 2012 15:08:09 -0700
- To: www-font@w3.org
- Message-ID: <CAFQ67bPNE8MbsjvNKaRb1RQOMmFqNwP4-KTx7iuM1jptkAk-ng@mail.gmail.com>
Greetings, web font enthusiasts. The growth in adoption of web fonts over the past two years has been stunning. One of the things holding people back from using web fonts is concern over file size and the delay in text rendering until the font is fully loaded. We believe that better compression of font files will make web fonts even more appealing to designers and improve the user experience. We also believe that lossless compression is quite practical, and it is important because it is completely transparent to designers and users alike, with no degradation or concerns over reliability and testing.

We have been researching a new lossless compression format, and we are now releasing it as open source and inviting public discussion. The code name for the project is "WOFF Ultra Condensed", and our hope is for it to be considered by the W3C as a future evolution of the WOFF standard.

To give a flavor of the kind of improvements to expect, running compression over all the fonts in the Google Web Fonts project yields a mean gain of 26.9% compared to WOFF. Large CJK fonts benefit particularly: as one dramatic example, the compressed Nanum Myeongjo font is 48.5% smaller than the corresponding WOFF. More experiments will follow.

The code and the documentation of the draft wire format are here:

http://code.google.com/p/font-compression-reference/
http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensed.pdf
http://wiki.font-compression-reference.googlecode.com/git/img/WOFFUltraCondensedfileformat.pdf

The intent of this proposal is to preserve everything that has made WOFF great and successful, while providing better compression. The initial WOFF header, including the metadata features, is completely unchanged from WOFF, with the exception of the signature.
There's more documentation inside the project, but here is a brief overview of what goes on inside that makes these levels of compression possible. First, the entropy coding is LZMA, which offers significant gains over zlib (gzip). Second, there is preprocessing that removes much of the redundancy in the TrueType format (which was designed for quick random access rather than maximal packing into a stream). Third, the directory header is packed using Huffman coding and a dictionary of common table values, saving over 200 bytes (particularly important for small subsets). There is also a provision for combining multiple tables into a single entropy-coding stream, which can save both the CPU time and the file-size overhead of having many small streams.

We consider the format to be lossless, in the sense that the _contents_ of the font file are preserved 100%. That said, the decompressed font is not bit-identical to the source font, as there are many irrelevant details such as padding and redundant ways of encoding the same data (for example, it is perfectly valid, but inefficient, to repeat flag bytes in a simple glyph instead of using the repeat code). A significant amount of the compression comes from stripping these out. One way of thinking about the losslessness guarantee is that running a valid font through compression and decompression should yield exactly the same TTX representation as the original font. Further, we plan to build an extensive test suite to validate this assertion.

In this proposal, we've tried to strike a balance between complexity and aggressiveness of compression. The biggest gains by far come from better compression of the glyf table (and from eliminating the loca table altogether), so this proposal basically squeezes that table to the maximum. We estimate that somewhere between 0.5% and 1% each can be gained by (1) eliminating lsbs from the hmtx table, and (2) compressing the cmap using a technique similar to CFF.
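As a rough illustration of the first point (LZMA versus zlib), the two codecs can be compared directly with Python's standard library. The byte stream below is a synthetic stand-in, not a real font table, so the exact ratio will differ for real glyf data; this only shows the kind of comparison involved.

```python
# Compare zlib (the entropy coder WOFF uses) against LZMA on the same
# byte stream. The data here is synthetic, loosely mimicking the
# repetitive structure of a glyf table; it is NOT real font data.
import lzma
import zlib

data = b"".join(bytes([i % 7, i % 5, 0, 0]) for i in range(20000))

zlib_size = len(zlib.compress(data, level=9))
lzma_size = len(lzma.compress(data, preset=9))

print(f"raw:  {len(data)} bytes")
print(f"zlib: {zlib_size} bytes")
print(f"lzma: {lzma_size} bytes")
```

Both codecs are lossless, so the choice is purely a size/CPU trade-off; LZMA's larger dictionary and range coder are where the gains reported above come from.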
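To make the repeat-code example concrete: in a TrueType simple glyph, a flag byte may set the REPEAT bit (0x08), in which case the next byte gives the number of additional times that flag applies, instead of writing the same byte over and over. The sketch below is purely illustrative (it is not the proposal's actual code) and shows how a run-length encoder collapses the redundancy that a naive, byte-per-point encoding leaves in place.

```python
# Illustrative sketch of the TrueType simple-glyph repeat code:
# a flag byte with bit 0x08 (REPEAT) set is followed by a count of
# additional repetitions. Not the proposal's actual implementation.
REPEAT = 0x08

def pack_flags(flags):
    """Encode a list of per-point flag bytes using the repeat code."""
    out = bytearray()
    i = 0
    while i < len(flags):
        run = 1
        # Count how many following points share the same flag byte
        # (the repeat count is a single byte, so cap the run at 256).
        while i + run < len(flags) and flags[i + run] == flags[i] and run < 256:
            run += 1
        if run > 1:
            out.append(flags[i] | REPEAT)
            out.append(run - 1)  # number of *extra* repetitions
        else:
            out.append(flags[i] & ~REPEAT)
        i += run
    return bytes(out)

# A font that repeats every flag byte is valid but wasteful; the
# repeat code shrinks this 16-entry flag array to 5 bytes.
flags = [0x01] * 10 + [0x23] + [0x01] * 5
packed = pack_flags(flags)
print(len(flags), "->", len(packed), "bytes")  # prints "16 -> 5 bytes"
```

Because both encodings decode to the same point flags, stripping the redundant form is invisible at the TTX level, which is exactly the sense of "lossless" described above.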
The source code includes compression algorithms for both of these, but we can't be 100% sure about the gains because we haven't written the corresponding decompression code yet. A big concern is overall spec complexity: we want to make it practical for people to implement, test for conformance, and so on. We'd really love to hear people's thoughts on this, in particular on whether it's worth going after every last bit of possible compression.

This is an open source project, and we encourage participation from the whole community. I'd also like to thank a number of people who have contributed so far: the compression code is based on sfntly (by the Google Internationalization team), the decompression code is built on top of OTS (the OpenType Sanitizer), and a number of pleasant discussions with Vlad Levantovsky and John Daggett have helped improve it. Many of the ideas, and some particulars of the glyf table compression, are based on Monotype Imaging's MicroType Express format, which is now available under licensing terms friendly to both open-source and proprietary implementations; see http://monotypeimaging.com/aboutus/mtx-license.aspx. Also, thanks to Kenichi Ishibashi for integrating it into Chromium so we can test it in real browsers (this will also be released soon).

We're looking forward to the discussion!

Raph Levien
Engineer, Google Web Fonts
Received on Wednesday, 28 March 2012 07:51:48 UTC