RE: Announcing new font compression project from Levantovsky, Vladimir on 2012-04-02 (www-font@w3.org from April to June 2012)

From: Levantovsky, Vladimir <Vladimir.Levantovsky@MonotypeImaging.com>
Date: Mon, 2 Apr 2012 15:54:44 +0000
To: Just Fill Bugs <mozbugbox@yahoo.com.au>, "www-font@w3.org" <www-font@w3.org>
Message-ID: <79E5B05BFEBAF5418BCB714B43F441990C5A72@wob-mail-01.agfamonotype.org>

On Sunday, April 01, 2012 2:25 AM Just Fill Bugs [mailto:mozbugbox@yahoo.com.au] wrote:
> 
> The removal of random access to the glyf table is bad. It won't do
> anything good to CJK fonts with the tiny bit space saving.
> 

Just to clarify - random access to the glyf table is _not_ removed by applying the new compression. It works exactly the same way as WOFF does today: when the glyf table is compressed, you need to decompress it first to be able to read the data. The same is true of the new "WOFF Ultra Condensed" compression, with one caveat: the loca table is removed from the font when the glyf table is compressed (because the glyf data is going to be optimized and the old offsets will no longer be valid), and it is then re-created on the fly when the glyf table is decompressed. You don't lose any functionality at all; the only difference is that the loca table data does not travel from the server to the UA but is created by the UA on the fly.
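
To illustrate why nothing is lost, here is a rough sketch of how a UA could re-create loca once it has the decompressed glyph records in hand. This is not code from the proposal: the function name rebuild_loca and the 4-byte padding of each glyph are assumptions made for the example. The only facts relied on are that loca holds one offset per glyph plus a final end offset, stored as uint32 in the long format or as offset/2 in uint16 in the short format.

```python
import struct

def rebuild_loca(glyph_records, long_format=True):
    """Recreate loca table bytes from per-glyph byte strings (hypothetical helper).

    Assumes each glyph record is padded to a 4-byte boundary when written
    into the glyf table; that padding choice is an assumption of this sketch.
    """
    offsets = [0]
    for record in glyph_records:
        padded_len = (len(record) + 3) & ~3  # round up to a multiple of 4
        offsets.append(offsets[-1] + padded_len)
    if long_format:
        # Long format: n + 1 big-endian uint32 offsets.
        return struct.pack(">%dL" % len(offsets), *offsets)
    # Short format: n + 1 big-endian uint16 values holding offset / 2.
    return struct.pack(">%dH" % len(offsets), *(o // 2 for o in offsets))
```

An empty glyph simply yields two equal consecutive offsets, so the re-created table carries the same information the original one did.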

> It might take only 1 second to load a 100KB Latin font, while for a
> 3MB CJK font, it will take 30 seconds. Are you sure users are willing
> to wait for 30 seconds before seeing the webpages pop up with some
> fancy font faces?
> 

Could you please explain this in more detail? How is it different from WOFF 1.0?

> glyf should be allowed to be compressed by segment and there could be a
> new table similar to loca which maps glyph index (maybe by range) to
> compressed glyf segments. Then a browser can start showing text in the
> new font face progressively while the whole font file is being downloaded
> in the background. The raw glyf data can be optimized such that the
> most frequently used glyphs for certain language are grouped together.
> 

This is a tough nut to crack. First and foremost, when data is adaptively entropy-coded (with either gzip or LZMA), the compressed dataset does not include the token dictionary - it is built on the fly by the compressor and then rebuilt by the decompressor. For this to happen, the decompressor has to work through the dataset from start to finish; it cannot jump into the middle.
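
A small experiment with Python's zlib module (the same DEFLATE coding that gzip uses) shows the effect. It is only a sketch: the sample data is a made-up stand-in for glyf data, and decode_from_middle is a hypothetical helper name.

```python
import zlib

# Hypothetical stand-in for glyf data: many similar, repetitive records.
data = b"".join(b"glyph-%05d:" % i + bytes([i % 7]) * 20 for i in range(2000))
stream = zlib.compress(data, 9)

# Decoding from the start works even with only the first half of the stream,
# because the decoder rebuilds the same history the encoder had at each point.
decoder = zlib.decompressobj()
head = decoder.decompress(stream[: len(stream) // 2])

def decode_from_middle(compressed, offset):
    """Try to start decoding at an arbitrary byte offset; None on failure."""
    try:
        return zlib.decompressobj().decompress(compressed[offset:])
    except zlib.error:
        return None

# Starting halfway in, the decoder has neither the stream header nor the
# history it needs, so this is expected to fail rather than return the tail.
middle = decode_from_middle(stream, len(stream) // 2)
```

So a partial download can be decoded progressively from the beginning, but there is no way to pick out a glyph range from the middle of the stream on its own.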

You can selectively decompress a specific data range if two conditions are met:
- you have the complete dictionary available on the decoder side beforehand, and
- because your tokens are variable-length codewords, the compressed dataset is organized in such a way that you can find where one token ends and another one starts.
It can be (and has been) done: the ISO/IEC 14496-18 standard uses a variation of the MTX compression that provides exactly this capability, so that fonts can be compressed, transferred to a client and rendered without the need to decompress the whole font. But you pay for it with a noticeable increase in the dataset size, which works directly against the problem we are trying to solve.
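
The size trade-off can be tried out with a much simpler technique than the one in ISO/IEC 14496-18. The sketch below does not implement that standard; it compresses glyph records in independent zlib segments and keeps an index of where each segment starts, which is one naive way to get selective access. The function names, the index layout and the 2-byte length prefix are all assumptions of this example.

```python
import zlib

def compress_segments(glyph_records, glyphs_per_segment):
    """Compress glyph records in independent segments; return (index, blob)."""
    index = []          # one (first_glyph, byte_offset, byte_length) per segment
    blob = bytearray()
    for first in range(0, len(glyph_records), glyphs_per_segment):
        chunk = glyph_records[first:first + glyphs_per_segment]
        # Length-prefix each record so it can be located again after decoding.
        raw = b"".join(len(r).to_bytes(2, "big") + r for r in chunk)
        packed = zlib.compress(raw, 9)
        index.append((first, len(blob), len(packed)))
        blob += packed
    return index, bytes(blob)

def read_glyph(index, blob, glyphs_per_segment, glyph_id):
    """Decompress only the segment that holds glyph_id and return its record."""
    first, offset, length = index[glyph_id // glyphs_per_segment]
    raw = zlib.decompress(blob[offset:offset + length])
    pos = 0
    for _ in range(glyph_id - first):
        # Skip over the preceding records in this segment.
        pos += 2 + int.from_bytes(raw[pos:pos + 2], "big")
    size = int.from_bytes(raw[pos:pos + 2], "big")
    return raw[pos + 2:pos + 2 + size]
```

Each segment starts with an empty history and carries its own stream overhead, and the index has to be shipped as well, so comparing len(blob) against a single zlib.compress() of all the records on real font data would show what the selective access costs.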

IMO, the easy alternative for CJK would be to simply create a number of CJK font subsets (with the first subset being small and yet sufficient to display the content of the first page), and then assemble those subsets as a single font family using CSS.
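
As a sketch of what that could look like, the helper below generates one @font-face rule per subset, all under the same family name. It assumes the CSS unicode-range descriptor is used to tell the UA which subset covers which characters; the function name, family name, file names and ranges are made up for illustration.

```python
def font_face_rules(family, subsets):
    """Build @font-face rules that join several subset files into one family.

    subsets is a list of (url, [(first_codepoint, last_codepoint), ...]).
    """
    rules = []
    for url, ranges in subsets:
        unicode_range = ", ".join("U+%04X-%04X" % (lo, hi) for lo, hi in ranges)
        rules.append(
            "@font-face {\n"
            '  font-family: "%s";\n'
            '  src: url("%s") format("woff");\n'
            "  unicode-range: %s;\n"
            "}" % (family, url, unicode_range)
        )
    return "\n".join(rules)

# Hypothetical split: a small first subset plus one larger remainder.
css = font_face_rules("MyFace", [
    ("myface-first-page.woff", [(0x4E00, 0x4EFF)]),
    ("myface-rest.woff", [(0x4F00, 0x9FFF)]),
])
```

With rules like these, a UA that honors unicode-range only needs to fetch the subsets whose ranges actually occur in the page text.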

Regards,
Vlad

> 
> I'd rather lose a little bit of compression gain over having to wait for a
> complete CJK font to be downloaded before anything shows up.
> 
> Of course this compression by segment for the glyf table can be
> optional to please some latin font users who hate to download extra
> 10KB cache-able data.
> 
> A possible alternative I can see is to provide something similar to
> the FTC format which contains partial subfonts of the same typeface.
> Each subfont contains a part of a complete cmap range for the same
> font. A browser can download and cache just parts of a big font. This
> sounds more like a hack but could be easier to implement for browsers.
> It is also cleaner to maintain on the server side since we just use a
> single WOFF font with subfonts instead of myface-0-1000.woff,
> myface-1001-2000.woff, myface-2001-3000.woff...
> 
> 
>

Received on Monday, 2 April 2012 15:55:13 UTC